Removing spans/divs with style attributes from HTML

Occasionally I have to clean up some HTML code – mostly because parts of it were pasted into a CMS like WordPress from rich text editor like Word.

I’ve noticed that the formatting I want to remove is mostly based on span and div elements with a style attribute. Therefore, I’ve written a simple Python script based on BeautifulSoup4 which will replace certain tags with their contents if they have a style attribute. While in some cases some other formatting might be destroyed by such a script, it is very useful for some recurring usecases.

Mehr lesen

Advantages and disadvantages of hugepages

In a previous post, I’ve written about how to check and enable transparent hugepages in Linux globally.

Although this post is important if you actually have a usecase for hugepages, I’ve seen multiple people getting fooled by the prospect that hugepages will magically increase performance. However, hugepaging is a complex topic and, if used in the wrong way, might easily decrease overall performance. Mehr lesen

In-place trimming/stripping in C

For an explanation of in-place algorithms see my previous post on zero-copy in-place splitting

The problem

You have a C string possibly containing whitespace at the beginning and/or the end.

char* s = " abc   \n\r";

Using an in-place algorithm, you want to remove the whitespace from this string.

Doing this is also possible using boost::algorithm::trim, but it has the same caveats as boost::algorithm::split as discussed in my previous post about C splitting Mehr lesen

How to interpret smartctl messages like ‘Error: UNC at LBA’?

When running smartctl on your hard drive, you often get a plethora of information that can be hard to interpret for unexperienced users. This post attempts to provide aid in interpreting what the technical reasons behind the error messages are. If you’re looking for advice on whether to replace your hard drive, the only guidance I can give you is it might fail any time, so better backup your data, but it might also run for many years to come.. Furthermore, this article does not describe basic SMART WHEN_FAILED checking but rather interpretation of more subtle signs of possibly impending HDD failures.

Mehr lesen

Accurate calculation of PT100/PT1000 temperature from resistance

TL;DR for impatient readers

PT100/PT1000 temperatures calculation suffers from accuracy issues for large sub-zero temperatures. UliEngineering implements a polynomial-fit based algorithm to provide 58.6 \mu{\degree}C peak-error over the full defined temperature range from -200 {\degree}C to +850 °C.

Use this code snippet (replace pt1000_ by pt100- to use PT100 coefficients) to compute an accurate temperature (in degrees celsius) e.g. for a resistane of 829.91 Ω of a PT1000 sensor.

from UliEngineering.Physics.RTD import pt1000_temperature
# The following calls are equivalent and print -43.2316359463
print(pt1000_temperature("829.91 Ω"))
print(pt1000_temperature(829.91))

You install the library (compatible to Python 3.2+) using

$ pip3 install git+https://github.com/ulikoehler/UliEngineering.git

Mehr lesen

Reusing your calendars, the pythonic way

Yesterday got a calendar for 2016. An interesting question came up my mind: When can I reuse this calendar, and for which year can I reuse which old calendar?

The 1st January 2015 was a Thursday. The same day in 2016 is a Friday. Once you follow this pattery you will quickly recognize that the base period of seven years is disrupted by leap years.

It quickly turns out that for some years it takes decades until you can reuse a calendar: 2016 is a leap yer, so you can not reuse it for 2044.

However, there’s a neat quirk that is currently unimplemented in online services like whencanireusethiscalendar.com: You can partially reuse a calendar.

Mehr lesen

Automated domain name extraction from Let’s Encrypt certificate transparency logs

A few days ago, Let’s Encrypt into public beta. At the time of writing this article, almost 120k certificateshave been issued, including the certificate for TechOverflow.

I really like the Let’s Encrypt service and I believe it might actually change the way people perceive HTTPS encryption. However, there is one rarely-mentioned side-effect when protecting your domains with their certificates.

Let’s Encrypt publishes certificate transparency logs at crt.sh. This transparency does not come without side-effects, however: crt.sh effectively publishes.

In other words, hiding sites from the public by not publishing their (sub-)domain names anywhere will not work when you issue a certificate for the domain on services like Let’s Encrypt.

Mehr lesen