Python

How to compute number of days in a year in Pandas

In our previous post we showed how to used the pendulum library in order to compute the number of days in a given year using the pendulum library.

This post shows how to achieve the same using pandas:

import pandas as pd
def number_of_days_in_year(year):
    start = pd.Timestamp(year, 1, 1)
    end = pd.Timestamp(year + 1, 1, 1)
    return (end - start).days)

Usage example:

print(number_of_days_in_year(2020)) # Prints 366
print(number_of_days_in_year(2021)) # Prints 365

Explanation:

First, we define the start date to be the first day (1st of January) of the year we’re interested in:

start = pd.Timestamp(year, 1, 1)

Now we generate the end date, which is the 1st of January of the following year:

end = pd.Timestamp(year + 1, 1, 1)

The rest is simple: Just compute the difference (end – start) and ask pandas to give us the number of days:

(end - start).days

 

Posted by Uli Köhler in pandas, Python

How to generate range of dates in pandas

In this example, we’ll create a list of pandas Timestamp objects that represent 100 consecutive days, starting at a fixed date:

start_date = pd.Timestamp("2020-03-01")

Generating the 100 consecutive days is easy:

all_days = [start_date + pd.Timedelta(d, "days") for d in range(100)]

Note that range(100) will generate all numbers from 0 up to and including 99. Hence, [pd.Timedelta(d, "days") for d in range(100)] will generate a list of Timedeltas that represent 0 days, 1 days, 2 days, …, 99 days.

Full example:

import pandas as pd

start_date = pd.Timestamp("2020-03-01")
all_days = [start_date + pd.Timedelta(d, "days") for d in range(100)]

print(all_days)

 

Posted by Uli Köhler in pandas, Python

How to compute number of days in a year in Python using Pendulum

Also see How to compute number of days in a year in Pandas

We can use the excellent pendulum library to find the number of days in a given year

import pendulum

def number_of_days_in_year(year):
    start = pendulum.date(year, 1, 1)
    end = start.add(years=1)
    return (end - start).in_days()

Usage example:

print(number_of_days_in_year(2020)) # Prints 366
print(number_of_days_in_year(2021)) # Prints 365

Explanation:

First, we define the start date to be the first day (1st of January) of the year we’re interested in:

start = pendulum.date(year, 1, 1)

Now we use pendulum‘s add function to add exactly one year to that date. This will always result in the 1st of January of the year after the given year:

end = start.add(years=1)

The rest is simple: Just ask pendulum to give us the number of days in the difference between end and start:

(end - start).in_days()

 

Posted by Uli Köhler in Python

How to skip first element of a Generator/Iterator in Python

Use the skip_first() utility function from UliEngineering:

First, install UliEngineering using

pip install --user UliEngineering

Note that UliEngineering requires Python 3.3+.

Now you can use skip_first() like this:

from UliEngineering.Utils.Iterable import skip_first

for v in skip_first(v for v in [1,2,3,4,5]):
    print(v) # Prints 2,3,4,5

skip_first() will work for any Iterable or Iterator.

Don’t want to install UliEngineering?

Copy the skip_first() utility function into your own code:

import collections

def skip_first(it):
    """
    Skip the first element of an Iterator or Iterable,
    like a Generator or a list.
    This will always return a generator or raise TypeError()
    in case the argument's type is not compatible
    """
    if isinstance(it, collections.Iterator):
        try:
            next(it)
            yield from it
        except StopIteration:
            return
    elif isinstance(it, collections.Iterable):
        yield from skip_first(it.__iter__())
    else:
        raise TypeError(f"You must pass an Iterator or an Iterable to skip_first(), but you passed {it}")

 

Posted by Uli Köhler in Python

How to fix NumPy timedelta64 TypeError: Invalid datetime unit “min” in metadata

Problem:

You want to construct a NumPy timedelta64 from a value in minutes using

np.timedelta64(1, 'min')

but you see an error message like

Traceback (most recent call last):
  File "test.py", line 3, in <module>
    delta = np.timedelta64(1, 'min')
TypeError: Invalid datetime unit "min" in metadata

Solution:

numpy uses m as specifier for minutes, not min! Change your code to

np.timedelta64(1, 'm')

 

Posted by Uli Köhler in Python

Split pandas DataFrame every time a Series is True

In our previous post we explored how to Split pandas DataFrame every time a column is True.

This slightly modified function also works if the given Series is not a column in the DataFrame:

def split_dataframe_by_series(df, series):
    """
    Split a DataFrame where the given series is True. Yields a number of dataframes
    """
    previous_index = df.index[0]

    for split_point in df[series].index:
        yield df[previous_index:split_point]
        previous_index = split_point
    # Yield remainder of dataset
    try:
        yield df[split_point:]
    except UnboundLocalError:
        pass # There is no split point => Ignore

Full example

We’ll use the ZeroCrossing column we built in our previous post on How to detect value change in pandas string column/series which itself builds on our post on How to create pandas time series DataFrame example dataset. Based on that example we add the modified utility function shown above:

import pandas as pd

# Load pre-built time series example dataset
df = pd.read_csv("https://techoverflow.net/datasets/timeseries-example.csv", parse_dates=["Timestamp"])
df.set_index("Timestamp", inplace=True)

# Create a new column containing "Positive" or "Negative"
df["SinePositive"] = (df["Sine"] >= 0).map({True: "Positive", False: "Negative"})
# Create "change" column (boolean)
df["ZeroCrossing"] = df["SinePositive"].shift() != df["SinePositive"]
# Set first entry to False
df["ZeroCrossing"].iloc[0] = False

def split_dataframe_by_series(df, series):
    """
    Split a DataFrame where the given series is True. Yields a number of dataframes
    """
    previous_index = df.index[0]

    for split_point in df[series].index:
        yield df[previous_index:split_point]
        previous_index = split_point
    # Yield remainder of dataset
    try:
        yield df[split_point:]
    except UnboundLocalError:
        pass # There is no split point => Ignore

# Print result
split_frames = list(split_dataframe_by_series(df, df["ZeroCrossing"]))
print(f"Split DataFrame into {len(split_frames)} separate frames by zero-crossing")

Note that converting the result of split_dataframe_to_series() into a list might not be neccessary depending on your application. If possible, I recommend directly iterating the data frames using a for loop, e.g.:

for df_section in split_dataframe_by_series(df, df["ZeroCrossing"]):
    pass # TODO: Your code goes here!

 

Posted by Uli Köhler in pandas, Python

Split pandas DataFrame every time a column is True

TL;DR

If the Series you want to use to split is a column in the DataFrame, continue reading this post. Else, read Split pandas DataFrame every time a Series is True.

Use this utility function:

def split_dataframe_by_column(df, column):
    """
    Split a DataFrame where a column is True. Yields a number of dataframes
    """
    previous_index = df.index[0]

    for split_point in df[df[column]].index:
        yield df[previous_index:split_point]
        previous_index = split_point
    # Yield remainder of dataset
    try:
        yield df[split_point:]
    except UnboundLocalError:
        pass # There is no split point => Ignore

# Usage example:
list(split_dataframe_by_column(df, "ZeroCrossing"))

Note that one or more of those dataframes might be empty.

Full example:

We’ll use the ZeroCrossing column we built in our previous post on How to detect value change in pandas string column/series which itself builds on our post on How to create pandas time series DataFrame example dataset. Based on that example we add the utility function shown above:

import pandas as pd

# Load pre-built time series example dataset
df = pd.read_csv("https://techoverflow.net/datasets/timeseries-example.csv", parse_dates=["Timestamp"])
df.set_index("Timestamp", inplace=True)

# Create a new column containing "Positive" or "Negative"
df["SinePositive"] = (df["Sine"] >= 0).map({True: "Positive", False: "Negative"})
# Create "change" column (boolean)
df["ZeroCrossing"] = df["SinePositive"].shift() != df["SinePositive"]
# Set first entry to False
df["ZeroCrossing"].iloc[0] = False

def split_dataframe_by_column(df, column):
    """Split a DataFrame where a column is True. Yields a number of dataframes"""
    previous_index = df.index[0]

    for split_point in df[df[column]].index:
        yield df[previous_index:split_point]
        previous_index = split_point
    # Yield remainder of dataset
    try:
        yield df[split_point:]
    except UnboundLocalError:
        pass # There is no split point => Ignore

# Print result
split_frames = list(split_dataframe_by_column(df, "ZeroCrossing"))
print(f"Split DataFrame into {len(split_frames)} separate frames by zero-crossing")
# This prints "Split DataFrame into 20 separate frames by zero-crossing"

 

Posted by Uli Köhler in pandas, Python

Get index where column is True in pandas

TL;DR

Simply use

df[df["ZeroCrossing"]].index

Full example:

We’ll use the ZeroCrossing column we built in our previous post on How to detect value change in pandas string column/series which itself builds on our post on How to create pandas time series DataFrame example dataset. Based on that example, we only modify the last line:

import pandas as pd

# Load pre-built time series example dataset
df = pd.read_csv("https://techoverflow.net/datasets/timeseries-example.csv", parse_dates=["Timestamp"])
df.set_index("Timestamp", inplace=True)

# Create a new column containing "Positive" or "Negative"
df["SinePositive"] = (df["Sine"] >= 0).map({True: "Positive", False: "Negative"})
# Create "change" column (boolean)
df["ZeroCrossing"] = df["SinePositive"].shift() != df["SinePositive"]
# Set first entry to False
df["ZeroCrossing"].iloc[0] = False

# Print result
print(df[df["ZeroCrossing"]].index)

This prints

DatetimeIndex(['2020-05-25 20:05:10.040874', '2020-05-25 20:05:10.090874',
               '2020-05-25 20:05:10.140874', '2020-05-25 20:05:10.190874',
               '2020-05-25 20:05:10.240874', '2020-05-25 20:05:10.290874',
               '2020-05-25 20:05:10.340874', '2020-05-25 20:05:10.390874',
               '2020-05-25 20:05:10.440774', '2020-05-25 20:05:10.490874',
               '2020-05-25 20:05:10.540874', '2020-05-25 20:05:10.590874',
               '2020-05-25 20:05:10.640774', '2020-05-25 20:05:10.690874',
               '2020-05-25 20:05:10.740874', '2020-05-25 20:05:10.790874',
               '2020-05-25 20:05:10.840874', '2020-05-25 20:05:10.890774',
               '2020-05-25 20:05:10.940874'],
              dtype='datetime64[ns]', name='Timestamp', freq=None)
Posted by Uli Köhler in pandas, Python

How to convert pandas Timedelta to seconds

TL;DR

Use

my_timedelta / np.timedelta64(1, 's')

Full example

import pandas as pd
import numpy as np
import time

# Create timedelta
t1 = pd.Timestamp("now")
time.sleep(3)
t2 = pd.Timestamp("now")
my_timedelta = t2 - t1

# Convert timedelta to seconds
my_timedelta_in_seconds = my_timedelta / np.timedelta64(1, 's')
print(my_timedelta_in_seconds) # prints 3.00154

 

Posted by Uli Köhler in pandas, Python

How to detect value change in pandas string column/series

TL;DR

In order to get a series that is True every time the input string column changes, use

my_column_changes = df["MyStringColumn"].shift() != df["MyStringColumn"]

The first value of this Series will always be True since the value is considered to be NaN before the start of the series (due to the behaviour of shift()). In order to force the first value to be False, use

my_column_changes.iloc[0] = False

In order to get the rows in the dataframe where the column changes, use

df[my_column_changes]

or use this one-liner:

df[df["MyStringColumn"].shift() != df["MyStringColumn"]]

In order to assign this value to a new column in the DataFrame, use e.g.

df["MyStringColumnChanges"] = df["MyStringColumn"].shift() != df["MyStringColumn"]

Full example:

First we load our example from our previous post on How to create pandas time series DataFrame example dataset:

import pandas as pd

# Load pre-built time series example dataset
df = pd.read_csv("https://techoverflow.net/datasets/timeseries-example.csv", parse_dates=["Timestamp"])
df.set_index("Timestamp", inplace=True)

Now we create a new column that contains Positive if the sine wave value in the "Sine" column is positive or "Negative" if that value is negative:

df["SinePositive"] = (df["Sine"] >= 0).map({True: "Positive", False: "Negative"})

Now we create the ZeroCrossing column using the method shown above:

# Create "change" column (boolean)
df["ZeroCrossing"] = df["SinePositive"].shift() != df["SinePositive"]

… and set the first entry to False since we don’t consider the start of the series to be a zero crossing:

df["ZeroCrossing"].iloc[0] = False

Now we can use

df[df["ZeroCrossing"]]

to show the rows in the DataFrame where the zero crossing happened:

                                    Sine   Cosine SinePositive  ZeroCrossing
Timestamp                                                                   
2020-05-25 20:05:10.040874 -6.283144e-03 -0.99998     Negative          True
2020-05-25 20:05:10.090874  6.283144e-03  0.99998     Positive          True
2020-05-25 20:05:10.140874 -6.283144e-03 -0.99998     Negative          True
2020-05-25 20:05:10.190874  6.283144e-03  0.99998     Positive          True
2020-05-25 20:05:10.240874 -6.283144e-03 -0.99998     Negative          True
2020-05-25 20:05:10.290874  6.283144e-03  0.99998     Positive          True
2020-05-25 20:05:10.340874 -6.283144e-03 -0.99998     Negative          True
2020-05-25 20:05:10.390874  6.283144e-03  0.99998     Positive          True
2020-05-25 20:05:10.440774 -2.450532e-15 -1.00000     Negative          True
2020-05-25 20:05:10.490874  6.283144e-03  0.99998     Positive          True
2020-05-25 20:05:10.540874 -6.283144e-03 -0.99998     Negative          True
2020-05-25 20:05:10.590874  6.283144e-03  0.99998     Positive          True
2020-05-25 20:05:10.640774 -1.960673e-15 -1.00000     Negative          True
2020-05-25 20:05:10.690874  6.283144e-03  0.99998     Positive          True
2020-05-25 20:05:10.740874 -6.283144e-03 -0.99998     Negative          True
2020-05-25 20:05:10.790874  6.283144e-03  0.99998     Positive          True
2020-05-25 20:05:10.840874 -6.283144e-03 -0.99998     Negative          True
2020-05-25 20:05:10.890774  4.901063e-15  1.00000     Positive          True
2020-05-25 20:05:10.940874 -6.283144e-03 -0.99998     Negative          True

 

Full example code:

import pandas as pd

# Load pre-built time series example dataset
df = pd.read_csv("https://techoverflow.net/datasets/timeseries-example.csv", parse_dates=["Timestamp"])
df.set_index("Timestamp", inplace=True)

# Create a new column containing "Positive" or "Negative"
df["SinePositive"] = (df["Sine"] >= 0).map({True: "Positive", False: "Negative"})
# Create "change" column (boolean)
df["ZeroCrossing"] = df["SinePositive"].shift() != df["SinePositive"]
# Set first entry to False
df["ZeroCrossing"].iloc[0] = False

# Print result
print(df[df["ZeroCrossing"]])

 

Posted by Uli Köhler in pandas, Python

Matplotlib custom SI-prefix unit tick formatter

You can use UliEngineering’s format_value() to easily make a custom matplotlib tick formatter that formats values with SI-prefixes like k, M, G, T, …

Formatting the Y axis ticks

The following example formats the Y axis in the Unit J (Joule). For example, 100000 would be formatted as 100 kJ.

The formatter function we use is

def format_joules(value, pos=None):
    return format_value(value, 'J')

In order to set the formatter, function, use

# Set our formatter as Y axis formatter
plt.gca().yaxis.set_major_formatter(mtick.FuncFormatter(format_joules))

Example:

import matplotlib.ticker as mtick
from UliEngineering.EngineerIO import format_value
from matplotlib import pyplot as plt

def format_joules(value, pos=None):
    return format_value(value, 'J')

# Set our formatter as Y axis formatter
plt.gca().yaxis.set_major_formatter(mtick.FuncFormatter(format_joules))

Formatting the X axis ticks

In order to format the X axis ticks instead, use the same formatter function but activate it using

# Set our formatter as Y axis formatter
plt.gca().xaxis.set_major_formatter(mtick.FuncFormatter(format_joules))

How to set the number of decimal places

UliEngineering’s format_value() allows you set the decimal places using e.g. significant_digits=4

def format_joules(value, pos=None):
    return format_value(value, 'J', significant_digits=4)

Full example

This example generates the Y axis plot shown above

import matplotlib.ticker as mtick
from UliEngineering.EngineerIO import format_value
from matplotlib import pyplot as plt
plt.style.use("ggplot")
import numpy as np

def format_joules(value, pos=None):
    return format_value(value, 'J')

# Set our formatter as Y axis formatter
plt.gca().yaxis.set_major_formatter(mtick.FuncFormatter(format_joules))

# Generate test data
test_data = np.arange(1, 1.2e6)
plt.plot(test_data)
plt.gcf().set_size_inches(10,5)
plt.savefig("/ram/mpl-si-formatter.svg")
Posted by Uli Köhler in Python

How to create pandas time series DataFrame example dataset

TL;DR: Use our pre-built example dataset like this:

# Load pre-built time series example dataset
df = pd.read_csv("https://datasets.techoverflow.net/timeseries-example.csv", parse_dates=["Timestamp"])
df.set_index("Timestamp", inplace=True)

How to build your own time series example dataset

In our previous post Easily generate sine/cosine waveform data in Python using UliEngineering we showed how to generate sine and cosine waves using UliEngineering.

In this post, we show how to create a pandas DataFrame containing sine and cosine data to be used as a sample time series dataset.

First, we generate the sine and cosine wave data:

import pandas as pd
import numpy as np
from UliEngineering.SignalProcessing.Simulation import sine_wave, cosine_wave

# Configure the properties of the sine wave here
frequency = 10.0 # 10 Hz sine / cosine wave
samplerate = 10000 # 10 kHz
nseconds = 1 # Generate 1 second of data

sine = sine_wave(frequency=frequency, samplerate=samplerate, length=nseconds)
cosine = cosine_wave(frequency=frequency, samplerate=samplerate, length=nseconds)
nsamples = len(sine) # How many values we have in the data arrays

After that, we define the timestamp where the dataset starts:

start_timestamp = pd.Timestamp('now')

Now we can create a list of Timestamp objects representing the points in time where the signal has been sampled:

# Create timestamps by offsetting
timedelta = pd.Timedelta(1/samplerate, 'seconds')
timestamps =  [start_timestamp + i * timedelta for i in range(nsamples)]

Now we’re reading to create the DataFrame object:

df = pd.DataFrame(index=timestamps, data={
    "Sine": sine,
    "Cosine": cosine
})
df.index.name = 'Timestamp'

Now we can use df.plot() to plot the dataset:

# Use nice plotting style
from matplotlib import pyplot as plt
plt.style.use("ggplot")
# Plot dataset
df.plot()
# Make figure larger
plt.gcf().set_size_inches(10, 5)

Additionally we can export the dataset as CSV using

df.to_csv("/ram/timeseries-example.csv")

This example file is also available online at https://techoverflow.net/datasets/timeseries-example.csv

Full example:

#!/usr/bin/env python3
import pandas as pd
import numpy as np
from UliEngineering.SignalProcessing.Simulation import sine_wave, cosine_wave
# Configure the properties of the sine wave here
frequency = 10.0 # 10 Hz sine / cosine wave
samplerate = 10000 # 10 kHz
nseconds = 1 # Generate 1 second of data
sine = sine_wave(frequency=frequency, samplerate=samplerate, length=nseconds)
cosine = cosine_wave(frequency=frequency, samplerate=samplerate, length=nseconds)
nsamples = len(sine) # How many values we have in the data arrays

start_timestamp = pd.Timestamp('now')

# Create timestamps by offsetting
timedelta = pd.Timedelta(1/samplerate, 'seconds')
timestamps =  [start_timestamp + i * timedelta for i in range(nsamples)]

df = pd.DataFrame(index=timestamps, data={
    "Sine": sine,
    "Cosine": cosine
})
df.index.name = 'Timestamp'

df.to_csv("timeseries-example.csv")

 

Posted by Uli Köhler in pandas, Python

How to get last 10 minutes of a pandas DataFrame

In our previous post we showed how to subtract 5 minutes from a pandas DataFrame:

pd.Timestamp('now') - pd.Timedelta(10, 'minutes')

We can also use this knowledge in order to get the last 10 minutes of a pandas DataFrame. In our example, we assume that df[“Timestamp”] contains the timestamp. First, we get the last timestamp in the dataset using

# Use this if the timestamp is the index of the DataFrame
last_ts = df.index.iloc[-1]

or

# ... or use this if the timestamp is in a colum
last_ts = df["Timestamp"].iloc[-1]

Next, we define the first timestamp that shall be considered by subtracting 10 minutes from last_ts:

first_ts = last_ts - pd.Timedelta(10, 'minutes')

Now we can filter the DataFrame using

# Use this if the Timestamp is in a column
filtered_df = df[df["Timestamp"] >= first_ts]

or

# Use this if the Timestamp is the index of the DataFrame
filtered_df = df[df.index >= first_ts]

By filtering, we don’t need the DataFrame to be sorted and the original order will be maintained.

Full example:

This example loads our pre-built time series example dataset from our previous post How to create pandas time series DataFrame example dataset. The code loads that dataset (which is 1 second long) and takes the last 0.5 seconds from it.

import pandas as pd

# Load example dataset
df = pd.read_csv("https://techoverflow.net/datasets/timeseries-example.csv", parse_dates=["Timestamp"])
df.set_index("Timestamp", inplace=True)

# Use this if the timestamp is the index of the DataFrame
last_ts = df.index[-1]

first_ts = last_ts - pd.Timedelta(0.5, 'seconds')

filtered_df = df[df.index >= first_ts]

# Plot the result
filtered_df.plot()

 

Posted by Uli Köhler in pandas, Python

How to subtract 5 minutes from pandas Timestamp

In our previous post we showed how to create a pandas Timestamp representing the current point in time:

pd.Timestamp('now')

You can subtract 5 minutes from that timestamp by using Timedelta(5, 'minutes'):

pd.Timestamp('now') - pd.Timedelta(5, 'minutes')
Posted by Uli Köhler in pandas, Python

How to create pandas ‘now’ Timestamp

In order to create a pandas Timestamp representing the current point in time, use

pd.Timestamp('now')

This will create a Timestamp in the current timezone.

Full example:

import pandas as pd

now = pd.Timestamp('now')

print(now) # Prints e.g. Timestamp('2020-05-25 19:02:31.051836')

 

Posted by Uli Köhler in pandas, Python

How to get last row of Pandas DataFrame

Use .iloc[-1] to get the last row (all columns) of a pandas DataFrame, for example:

my_dataframe.iloc[-1]

 

Posted by Uli Köhler in pandas, Python

Python threading minimal example

This minimal example shows how to create a thread that prints Hello world every second:

import threading
import time

def my_thread_func():
    while True:
        print("Hello world")
        time.sleep(1)

my_thread = threading.Thread(target=my_thread_func)
my_thread.start()

 

Posted by Uli Köhler in Python

How to fix Python unittest __init__() takes 1 positional argument but 2 were given

Problem:

You are trying to run your Python unit tests using the unittest package, but you see this unspecific stack trace:

Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/lib/python3.7/unittest/__main__.py", line 18, in <module>
    main(module=None)
  File "/usr/lib/python3.7/unittest/main.py", line 100, in __init__
    self.parseArgs(argv)
  File "/usr/lib/python3.7/unittest/main.py", line 124, in parseArgs
    self._do_discovery(argv[2:])
  File "/usr/lib/python3.7/unittest/main.py", line 244, in _do_discovery
    self.createTests(from_discovery=True, Loader=Loader)
  File "/usr/lib/python3.7/unittest/main.py", line 154, in createTests
    self.test = loader.discover(self.start, self.pattern, self.top)
  File "/usr/lib/python3.7/unittest/loader.py", line 349, in discover
    tests = list(self._find_tests(start_dir, pattern))
  File "/usr/lib/python3.7/unittest/loader.py", line 414, in _find_tests
    yield from self._find_tests(full_path, pattern, namespace)
  File "/usr/lib/python3.7/unittest/loader.py", line 406, in _find_tests
    full_path, pattern, namespace)
  File "/usr/lib/python3.7/unittest/loader.py", line 460, in _find_test_path
    return self.loadTestsFromModule(module, pattern=pattern), False
  File "/usr/lib/python3.7/unittest/loader.py", line 124, in loadTestsFromModule
    tests.append(self.loadTestsFromTestCase(obj))
  File "/usr/lib/python3.7/unittest/loader.py", line 93, in loadTestsFromTestCase
    loaded_suite = self.suiteClass(map(testCaseClass, testCaseNames))
  File "/usr/lib/python3.7/unittest/suite.py", line 24, in __init__
    self.addTests(tests)
  File "/usr/lib/python3.7/unittest/suite.py", line 57, in addTests
    for test in tests:
TypeError: __init__() takes 1 positional argument but 2 were given

Solution:

You have at least one test case like this one:

class MyTest(unittest.TestCase):
    def __init__(self):
        self.x = 1.0

    def test_stuff(self):
        assert(self.x == 1.0)

Overriding __init__(...) is not possible in this way when using unittest. You need to use setUp() instead.

Usually, just replacing def __init__(self): by def setUp(self): will do the trick. unittests will call setUp() automatically.

Our example will look like this:

class MyTest(unittest.TestCase):
    def setUp(self):
        self.x = 1.0

    def test_stuff(self):
        assert(self.x == 1.0)

If the error still persists, check if you have more testcases overriding the __init__() method.

Posted by Uli Köhler in Python

Two easy ways to download a file using Python requests

requests does not provide a

Option 1: Use requests_download:

First, install requests_download using

sudo pip3 install requests_download

or equivalent.

Now you can use it like this:

from requests_download import download

download(url, filename)

It also has built-in progress bar support:

from requests_download import download, HashTracker, ProgressTracker
from progressbar import DataTransferBar # sudo pip3 install progressbar2

progress = ProgressTracker(DataTransferBar())

download(pdfUrl, filename, trackers=(progress,))

Option 2: Do it yourself:

Use this snippet in your code:

import requests
import shutil

def requests_download_file(url, filename):
    with requests.get(url, stream=True) as response:
        with open(filename, 'wb') as fout:
            shutil.copyfileobj(response.raw, fout)
Posted by Uli Köhler in Python