Python

How to load sample dataset in Pandas

This code loads an example dataset in Pandas but requires an internet connection:

import pandas as pd
iris = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')

This is how the iris dataset looks like:

For a more detailed answer with other options on how to import example datasets, see this StackOverflow post.

Posted by Uli Köhler in pandas, Python

How to format IPv6 address in groups of 32 bits in Python

def format_ipv6addr_group2(addr):
    """
    Format IPv6 addresses in 4 \n-separated groups of 32 bits
    Returns a string
    """
    addr_s = str(addr.exploded)
    return f"{addr_s[:10]}\n{addr_s[10:20]}\n{addr_s[20:30]}\n{addr_s[30:40]}"

Usage example:

addr = ipaddress.IPv6Address("fd66:6cbb:8c10:1234:4567:89ab:cdef:0123")
print(format_ipv6addr_group2(addr))

Output:

fd66:6cbb:
8c10:1234:
4567:89ab:
cdef:0123

 

Posted by Uli Köhler in Python

How to generate random IPv6 addresses in a given network using Python

This code generates random IPv6 addresses in a given network using Python’s ipaddress module:

import ipaddress
import random

def random_ipv6_addr(network):
    """
    Generate a random IPv6 address in the given network
    Example: random_ipv6_addr("fd66:6cbb:8c10::/48")
    Returns an IPv6Address object.
    """
    net = ipaddress.IPv6Network(network)
    # Which of the network.num_addresses we want to select?
    addr_no = random.randint(0, net.num_addresses)
    # Create the random address by converting to a 128-bit integer, adding addr_no and converting back
    network_int = int.from_bytes(net.network_address.packed, byteorder="big")
    addr_int = network_int + addr_no
    addr = ipaddress.IPv6Address(addr_int.to_bytes(16, byteorder="big"))
    return addr

# Usage example
print(random_ipv6_addr("fdce:4879:a1e9::/48"))
# Prints e.g. fdce:4879:a1e9:e351:1a01:be9:4d9a:157d

It works by first converting the IPv6 network address to binary and then adding a random host number. After that, it will be converted back to an IPv6Address object.

Posted by Uli Köhler in Networking, Python

How to modify file inside a ZIP file using Python

Python provides the zipfile module to read and write ZIP files. Our previous posts Python example: List files in ZIP archive and Downloading & reading a ZIP file in memory using Python show how to list and read files inside a ZIP file.

In this example, we will show how to copy the files from one ZIP file to another and modify one of the files in the progress. This is often the case if you want to use ZIP file formats like ODT or LBX as templates, replacing parts of the text content of a file.

import zipfile

with zipfile.ZipFile(srcfile) as inzip, zipfile.ZipFile(dstfile, "w") as outzip:
    # Iterate the input files
    for inzipinfo in inzip.infolist():
        # Read input file
        with inzip.open(inzipinfo) as infile:
            if inzipinfo.filename == "test.txt":
                content = infile.read()
                # Modify the content of the file by replacing a string
                content = content.replace("abc", "123")
                # Write conte
                outzip.writestr(inzipinfo.filename, content)
            else: # Other file, dont want to modify => just copy it
                

After opening both the input file and the output ZIP using

with zipfile.ZipFile(srcfile) as inzip, zipfile.ZipFile(dstfile, "w") as outzip:

we iterate through all the files in the input ZIP file:

for inzipinfo in inzip.infolist():

In case we’ve encountered the file we want to modify, which is identified by it’s filename test.txt:

if inzipinfo.filename == "test.txt":

we read and modify the content ….

with inzip.open(inzipinfo) as infile:
    content = infile.read().replace("abc", "123")

… and write the modified content to the output ZIP:

outzip.writestr("test.txt", content)

Otherwise, if the current file is not the file we want to modify,  we just copy the file to the output ZIP using

outzip.writestr(inzipinfo.filename, infile.read())

Note that the algorithm will always .read() the file from the input ZIP, hence its entire content will be temporarily stored in memory. Therefore, it doesn’t work well for files which are large when uncompressed.

Posted by Uli Köhler in Python

How to fix Python asyncio RuntimeError: There is no current event loop in thread …

Problem:

You are trying to run your Python application, but you see an error message like

Traceback (most recent call last):
  [...]
  File "/mnt/KATranslationCheck/CrowdinLogin.py", line 38, in get_crowdin_tokens
    return asyncio.get_event_loop().run_until_complete(async_get_crowdin_tokens(username, password))
  File "/usr/lib/python3.6/asyncio/events.py", line 694, in get_event_loop
    return get_event_loop_policy().get_event_loop()
  File "/usr/lib/python3.6/asyncio/events.py", line 602, in get_event_loop
    % threading.current_thread().name)
RuntimeError: There is no current event loop in thread 'worker 3'.

Solution:

You are trying to run asyncio.get_event_loop() in some thread other than the main thread – however, asyncio only generates an event loop for the main thread.

Use this function instead of asyncio.get_event_loop():

import asyncio

def get_or_create_eventloop():
    try:
        return asyncio.get_event_loop()
    except RuntimeError as ex:
        if "There is no current event loop in thread" in str(ex):
            loop = asyncio.new_event_loop()
            asyncio.set_event_loop(loop)
            return asyncio.get_event_loop()

It will first try asyncio.get_event_loop(). In case that doesn’t work, it will generate a new event loop for the current thread using

loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)

and then returns this event loop.

Note that while this works well for generating the event loop, but depending on the way you use the event loop, you might encounter further error messages like

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/pyppeteer/launcher.py", line 305, in launch
    return await Launcher(options, **kwargs).launch()
  File "/usr/local/lib/python3.6/dist-packages/pyppeteer/launcher.py", line 157, in launch
    signal.signal(signal.SIGINT, _close_process)
  File "/usr/lib/python3.6/signal.py", line 47, in signal
    handler = _signal.signal(_enum_to_int(signalnum), _enum_to_int(handler))
ValueError: signal only works in main thread

that are usually not as easy to fix and require some restructuring of your program.

Posted by Uli Köhler in Python

How to fix pyppeteer pyppeteer.errors.BrowserError: Browser closed unexpectedly:

Problem:

You want to run your Pyppeteer application on Linux, but you see an error message like

Traceback (most recent call last):
  File "PyppeteerExample.py", line 15, in <module>
    asyncio.get_event_loop().run_until_complete(main())
  File "/usr/lib/python3.6/asyncio/base_events.py", line 484, in run_until_complete
    return future.result()
  File "PyppeteerExample.py", line 6, in main
    browser = await launch()
  File "/usr/local/lib/python3.6/dist-packages/pyppeteer/launcher.py", line 305, in launch
    return await Launcher(options, **kwargs).launch()
  File "/usr/local/lib/python3.6/dist-packages/pyppeteer/launcher.py", line 166, in launch
    self.browserWSEndpoint = get_ws_endpoint(self.url)
  File "/usr/local/lib/python3.6/dist-packages/pyppeteer/launcher.py", line 225, in get_ws_endpoint
    raise BrowserError('Browser closed unexpectedly:\n')
pyppeteer.errors.BrowserError: Browser closed unexpectedly:

Solution:

In most cases, the underlying error for this error message is Puppetteer’s libX11-xcb.so.1: cannot open shared object file: No such file or directory. In order to fix that, you need to install dependency libraries for Chromium which is used internally by Puppeteer / Pyppeteer:

sudo apt install -y gconf-service libasound2 libatk1.0-0 libc6 libcairo2 libcups2 libdbus-1-3 libexpat1 libfontconfig1 libgcc1 libgconf-2-4 libgdk-pixbuf2.0-0 libglib2.0-0 libgtk-3-0 libnspr4 libpango-1.0-0 libpangocairo-1.0-0 libstdc++6 libx11-6 libx11-xcb1 libxcb1 libxcomposite1 libxcursor1 libxdamage1 libxext6 libxfixes3 libxi6 libxrandr2 libxrender1 libxss1 libxtst6 ca-certificates fonts-liberation libappindicator1 libnss3 lsb-release xdg-utils wget

 

Posted by Uli Köhler in Pyppeteer, Python

How to get hostmask/netmask for given prefix length in Python

In order to get the host mask for e.g. a /112 IPv6 prefix, use:

import ipaddress
# Get netmask for a /112 prefix
ipaddress.IPv6Network("::/112").netmask

# Get host mask for a /112 prefix
ipaddress.IPv6Network("::/112").hostmask

 

Posted by Uli Köhler in Networking, Python

How to fix Python3 TypeError: unsupported operand type(s) for &: ‘bytes’ and ‘bytes’

Problem:

You want to perform bitwise boolean operations on bytes() arrays in Python, but you see an error message like

TypeError: unsupported operand type(s) for &: 'bytes' and 'bytes'

or

TypeError: unsupported operand type(s) for |: 'bytes' and 'bytes'

or

TypeError: unsupported operand type(s) for ^: 'bytes' and 'bytes'

Solution:

Python can’t perform bitwise operations directly on byte arrays. However, you can use the code from How to perform bitwise boolean operations on bytes() in Python3:

def bitwise_and_bytes(a, b):
    result_int = int.from_bytes(a, byteorder="big") & int.from_bytes(b, byteorder="big")
    return result_int.to_bytes(max(len(a), len(b)), byteorder="big")

def bitwise_or_bytes(a, b):
    result_int = int.from_bytes(a, byteorder="big") | int.from_bytes(b, byteorder="big")
    return result_int.to_bytes(max(len(a), len(b)), byteorder="big")

def bitwise_xor_bytes(a, b):
    result_int = int.from_bytes(a, byteorder="big") ^ int.from_bytes(b, byteorder="big")
    return result_int.to_bytes(max(len(a), len(b)), byteorder="big")

# Example usage:

a = bytes([0x00, 0x01, 0x02, 0x03])
b = bytes([0x03, 0x02, 0x01, 0xff])

print(bitwise_and_bytes(a, b)) # b'\x00\x00\x00\x03'
print(bitwise_or_bytes(a, b)) # b'\x03\x03\x03\xff'
print(bitwise_xor_bytes(a, b)) # b'\x03\x03\x03\xfc'
Posted by Uli Köhler in Python

How to perform bitwise boolean operations on bytes() in Python3

Performing bitwise operations on bytes() instances in Python3.2+ is easy but not straightforward:

  1. Use int.from_bytes(...) to acquire an integer representing the byte array
  2. Perform bitwise operations with said integer
  3. Use result.to_bytes(...) to convert back the integer to a bytes() array

Note that for the result to make any sense, you need to ensure that both bytes() instances have the same length.

Python code:

def bitwise_and_bytes(a, b):
    result_int = int.from_bytes(a, byteorder="big") & int.from_bytes(b, byteorder="big")
    return result_int.to_bytes(max(len(a), len(b)), byteorder="big")

def bitwise_or_bytes(a, b):
    result_int = int.from_bytes(a, byteorder="big") | int.from_bytes(b, byteorder="big")
    return result_int.to_bytes(max(len(a), len(b)), byteorder="big")

def bitwise_xor_bytes(a, b):
    result_int = int.from_bytes(a, byteorder="big") ^ int.from_bytes(b, byteorder="big")
    return result_int.to_bytes(max(len(a), len(b)), byteorder="big")

Example usage:

a = bytes([0x00, 0x01, 0x02, 0x03])
b = bytes([0x03, 0x02, 0x01, 0xff])

print(bitwise_and_bytes(a, b)) # b'\x00\x00\x00\x03'
print(bitwise_or_bytes(a, b)) # b'\x03\x03\x03\xff'
print(bitwise_xor_bytes(a, b)) # b'\x03\x03\x03\xfc'

 

Posted by Uli Köhler in Python

Bitwise operation with IPv6 addresses and networks in Python

Python3 features the easy-to-use ipaddress library providing many calculations. However, bitwise boolean operators are not available for addresses.

This post shows you how to perform bitwise operations with IPv6Address() objects. We’ll use the following strategy:

  1. Use .packed to get a binary bytes() instance of the IP address
  2. Use int.from_bytes() to acquire an integer representing the binary address
  3. Perform bitwise operations with said integer
  4. Use result.to_bytes(16, ...) to convert back the integer to a bytes() array in the correct byte order
  5. Construct an IPv6Address() object from the resulting byte array.

Python code:

import ipaddress

def bitwise_and_ipv6(addr1, addr2):
    result_int = int.from_bytes(addr1.packed, byteorder="big") & int.from_bytes(addr2.packed, byteorder="big")
    return ipaddress.IPv6Address(result_int.to_bytes(16, byteorder="big"))

def bitwise_or_ipv6(addr1, addr2):
    result_int = int.from_bytes(addr1.packed, byteorder="big") | int.from_bytes(addr2.packed, byteorder="big")
    return ipaddress.IPv6Address(result_int.to_bytes(16, byteorder="big"))

def bitwise_xor_ipv6(addr1, addr2):
    result_int = int.from_bytes(addr1.packed, byteorder="big") ^ int.from_bytes(addr2.packed, byteorder="big")
    return ipaddress.IPv6Address(result_int.to_bytes(16, byteorder="big"))

Example usage:

a = ipaddress.IPv6Address('2001:16b8:2703:8835:9ec7:a6ff:febe:96b1')
b = ipaddress.IPv6Address('2001:16b8:2703:4241:9ec7:a6ff:febe:96b1')

print(bitwise_and_ipv6(a, b)) # IPv6Address('2001:16b8:2703:1:9ec7:a6ff:febe:96b1')
print(bitwise_or_ipv6(a, b)) # IPv6Address('2001:16b8:2703:ca75:9ec7:a6ff:febe:96b1')
print(bitwise_xor_ipv6(a, b)) # IPv6Address('0:0:0:ca74::')

Similarly, you can use the code in order to manipulate IPv6Network() instances:

a = ipaddress.IPv6Network('2001:16b8:2703:8835:9ec7:a6ff:febe::/112')
b = ipaddress.IPv6Network('2001:16b8:2703:4241:9ec7:a6ff:febe::/112')

print(bitwise_and_ipv6(a.network_address, b.network_address)) # IPv6Address('2001:16b8:2703:1:9ec7:a6ff:febe:0')
print(bitwise_or_ipv6(a.network_address, b.network_address)) # IPv6Address('2001:16b8:2703:ca75:9ec7:a6ff:febe:0')
print(bitwise_xor_ipv6(a.network_address, b.network_address)) # IPv6Address('0:0:0:ca74::')

Note that the return type will always be IPv6Address() and never IPv6Network() since the result of the bitwise operation doesn’t have any netmask associated with it.

Besides .network_address you can also use other properties of IPv6Address() instances like .broadcast_address or .hostmask or .netmask.

Posted by Uli Köhler in Networking, Python

How to get current page URL in pyppeteer

In pyppeteer you can use

url = await page.evaluate("() => window.location.href")

in order to get the current URL. Note that page.evaluate() runs whatever Javascript your give it – hence you can use your Javascript skills in order to create the desired effect.

Full example

import asyncio
from pyppeteer import launch

async def main():
    browser = await launch()
    page = await browser.newPage()
    await page.goto('https://www.techoverflow.net')

    # Get the URL and print it
    url = await page.evaluate("() => window.location.href")
    print(url) # prints https://www.techoverflow.net/

    # Cleanuip
    await browser.close()

asyncio.get_event_loop().run_until_complete(main())

 

Posted by Uli Köhler in Pyppeteer, Python

How simulate click using pyppeteer

In order to click a button or a link using the the pyppeteer library, you can use page.evaluate().

If you have an <button> element or a link (<a>) like

<button id="mybutton">

you can use

# Now click the search button    
await page.evaluate(f"""() => {{
    document.getElementById('mybutton').dispatchEvent(new MouseEvent('click', {{
        bubbles: true,
        cancelable: true,
        view: window
    }}));
}}""")

in order to generate a MouseEvent that simulates a click. Note that page.evaluate() will run any Javascript code you pass to it, so you can use your Javascript skills in order to create the desired effect

Also see https://gomakethings.com/how-to-simulate-a-click-event-with-javascript/ for more details on how to simulate mouse clicks in pure Javascript without relying on jQuery.

Note that page.evaluate() will just run any Javascript code you give it, so you can put your Javascript skills to use in order to manipulate the page.

Full example

This example will open https://techoverflow.net, enter a search term into the search field, click the search button and then create a screenshot

import asyncio
from pyppeteer import launch

async def main():
    browser = await launch()
    page = await browser.newPage()
    await page.goto('https://techoverflow.net')

    # Fill content into the search field
    content = "pypetteer"
    await page.evaluate(f"""() => {{
        document.getElementById('s').value = '{content}';
    }}""")

    # Now click the search button    
    await page.evaluate(f"""() => {{
        document.getElementById('searchsubmit').dispatchEvent(new MouseEvent('click', {{
            bubbles: true,
            cancelable: true,
            view: window
        }}));
    }}""")

    # Wait until search results page has been loaded
    await page.waitForSelector(".archive-title")

    # Now take screenshot and exit
    await page.screenshot({'path': 'screenshot.png'})
    await browser.close()

asyncio.get_event_loop().run_until_complete(main())

The result will look like this:

Posted by Uli Köhler in Pyppeteer, Python

How to fill <input> field using pyppeteer

In order to fill an input field using the pyppeteer library, you can use page.evaluate().

If you have an <input> element like

<input name="myinput" id="myinput" type="text">

you can use

content = "My content" # This will be filled into <input id="myinput"> !
await page.evaluate(f"""() => {{
    document.getElementById('myinput').value = '{content}';
}}""")

Note that page.evaluate() will just run any Javascript code you give it, so you can put your Javascript skills to use in order to manipulate the page.

Full example

This example will open https://techoverflow.net, enter a search term into the search field and then create a screenshot

#!/usr/bin/env python3
import asyncio
from pyppeteer import launch

async def main():
    browser = await launch()
    page = await browser.newPage()
    await page.goto('https://techoverflow.net')
    
    # This example fills content into the search field
    content = "My search term"
    await page.evaluate(f"""() => {{
        document.getElementById('s').value = '{content}';
    }}""")

    # Make screenshot
    await page.screenshot({'path': 'screenshot.png'})
    await browser.close()

asyncio.get_event_loop().run_until_complete(main())

The result will look like this:

Posted by Uli Köhler in Pyppeteer, Python

How to get duration of WAV file in Python (minimal example)

Use

duration_seconds = mywav.getnframes() / mywav.getframerate()

to get the duration of a WAV file in seconds.

Full example:

import wave

with wave.open("myaudio.wav") as mywav:
    duration_seconds = mywav.getnframes() / mywav.getframerate()
    print(f"Length of the WAV file: {duration_seconds:.1f} s")

 

Posted by Uli Köhler in Audio, Python

How to create filename containing date/time in Python

In datalogging quiten often you have to create a new log file once you start to log data.

Often it’s convenient to include the current date and time in the log file.  In Python, this is pretty easy to do:

from datetime import datetime

filename = f"Temperature log-{datetime.now():%Y-%m-%d %H-%m-%d}.csv"

This will create filenames like

Temperature log 2020-06-17 22-37-41.csv
Temperature log 2019-12-31 00-15-55.csv

Note that if you use another Date/time format, you need to avoid special characters that must not occur in filenames. The rules for which filename is correct are much easier on Linux than on Windows, but since you should be compatible with both operating systems, you should always check the Windows rules.

These characters are forbidden for Windows filenames:

<>:"/\|?*

The date-time format we used above, %Y-%m-%d %H-%m-%d is specially crafted in order to avoid colons in ISO-8601-like date/time formats such as 2020-04-02 11:45:33 since colons would be illegal in Windows filenames (they would work in Linux filenames, though). %Y-%m-%d %H-%m-%d only contains spaces and dash (-) characters in order to avoid any issues with filename rules.

Posted by Uli Köhler in Python

What fraction of the year has passed until a given Timestamp in pandas?

To compute what fraction of the year has passed since the start of the year, use this function:

import pandas as pd

def fraction_of_year_passed(date):
    """Compute what fraction of the current year has already passed up to the given date"""
    start_of_year = pd.Timestamp(now.year, 1, 1)
    start_of_next_year = pd.Timestamp(now.year + 1, 1, 1)
    # Compute seconds in entire year and seconds since start of year
    entire_year_seconds = (start_of_next_year - start_of_year).total_seconds()
    seconds_since_start_of_year = (date - start_of_year).total_seconds()
    return seconds_since_start_of_year / entire_year_seconds

Usage example:

print(fraction_of_year_passed(pd.Timestamp("2020-03-01"))) # prints 0.16393442622950818

Detailed explanation:

First, we define that start of the calendar year date belongs to, and the start of the calendar year after that:

start_of_year = pd.Timestamp(now.year, 1, 1)
start_of_next_year = pd.Timestamp(now.year + 1, 1, 1)

Now we compute the number of seconds in the entire year and the number of seconds passed between the start of the year and date:

entire_year_seconds = (start_of_next_year - start_of_year).total_seconds()
seconds_since_start_of_year = (date - start_of_year).total_seconds()

The rest is simple: Just divide seconds_since_start_of_year / entire_year_seconds to obtain what fraction of the year has passed until date.

Posted by Uli Köhler in pandas, Python

How to compute number of days in a year in Pandas

In our previous post we showed how to used the pendulum library in order to compute the number of days in a given year using the pendulum library.

This post shows how to achieve the same using pandas:

import pandas as pd
def number_of_days_in_year(year):
    start = pd.Timestamp(year, 1, 1)
    end = pd.Timestamp(year + 1, 1, 1)
    return (end - start).days)

Usage example:

print(number_of_days_in_year(2020)) # Prints 366
print(number_of_days_in_year(2021)) # Prints 365

Explanation:

First, we define the start date to be the first day (1st of January) of the year we’re interested in:

start = pd.Timestamp(year, 1, 1)

Now we generate the end date, which is the 1st of January of the following year:

end = pd.Timestamp(year + 1, 1, 1)

The rest is simple: Just compute the difference (end – start) and ask pandas to give us the number of days:

(end - start).days

 

Posted by Uli Köhler in pandas, Python

How to generate range of dates in pandas

In this example, we’ll create a list of pandas Timestamp objects that represent 100 consecutive days, starting at a fixed date:

start_date = pd.Timestamp("2020-03-01")

Generating the 100 consecutive days is easy:

all_days = [start_date + pd.Timedelta(d, "days") for d in range(100)]

Note that range(100) will generate all numbers from 0 up to and including 99. Hence, [pd.Timedelta(d, "days") for d in range(100)] will generate a list of Timedeltas that represent 0 days, 1 days, 2 days, …, 99 days.

Full example:

import pandas as pd

start_date = pd.Timestamp("2020-03-01")
all_days = [start_date + pd.Timedelta(d, "days") for d in range(100)]

print(all_days)

 

Posted by Uli Köhler in pandas, Python