Problem:
You have a concurrent.futures executor, e.g.
import concurrent.futures
executor = concurrent.futures.ThreadPoolExecutor(64)
Using this executor, you want to map a function over an iterable in parallel (e.g. parallel download of HTTP pages).
In order to aid interactive execution, you want to use tqdm to provide a progress bar showing the fraction of futures that have completed.
Solution:
You can use this function:
from tqdm import tqdm
import concurrent.futures

def tqdm_parallel_map(executor, fn, *iterables, **kwargs):
    """
    Like executor.map(fn, *iterables), but displays a tqdm-based progress bar
    and yields results in completion order, not input order.

    Does not support timeout or chunksize as executor.submit is used internally

    **kwargs is passed to tqdm.
    """
    # As with executor.map(), the iterables are zipped so that fn receives
    # one argument from each iterable per call
    futures_list = [executor.submit(fn, *args) for args in zip(*iterables)]
    for f in tqdm(concurrent.futures.as_completed(futures_list), total=len(futures_list), **kwargs):
        yield f.result()
Note that internally, executor.submit() is used, not executor.map(), because there is no way of calling concurrent.futures.as_completed() on the iterator returned by executor.map().
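The difference can be seen in a minimal sketch that uses only the standard library. The function square is a hypothetical stand-in for real work such as a download. executor.map() hands back plain results, while executor.submit() hands back Future objects, which is what concurrent.futures.as_completed() expects:

```python
import concurrent.futures


def square(x):
    # Hypothetical stand-in for real work such as an HTTP download
    return x * x


with concurrent.futures.ThreadPoolExecutor(4) as executor:
    # executor.map() returns an iterator over plain results, not Future objects
    mapped = list(executor.map(square, range(5)))

    # executor.submit() returns one Future per call, which as_completed() accepts
    futures = [executor.submit(square, i) for i in range(5)]
    completed = [f.result() for f in concurrent.futures.as_completed(futures)]
```

Here mapped is always in input order, whereas completed contains the same values in whatever order the calls happened to finish.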
Usage example if you are not interested in the actual values:
import concurrent.futures
executor = concurrent.futures.ThreadPoolExecutor(64)

# Run my_func with arguments ranging from 1 to 10000, in parallel
for _ in tqdm_parallel_map(executor, lambda i: my_func(i), range(1, 10001)):
    pass
In case you care about the return value of my_func(), use this snippet instead:
import concurrent.futures
executor = concurrent.futures.ThreadPoolExecutor(64)

# Run my_func with arguments ranging from 1 to 10000, in parallel
for result in tqdm_parallel_map(executor, lambda i: my_func(i), range(1, 10001)):
    # result is the return value of my_func().
    # NOTE: result is not necessarily in the same order as the input iterable!
    # Whichever parallel execution of my_func() finishes first will be printed first!
    print(result)
Note: In contrast to executor.map(), this function does NOT yield the results in the same order as the input.
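If you need the results in input order but still want progress reporting, one possible approach is to remember the position of each future and fill in a result list as futures complete. The following is a sketch under assumptions, not part of the solution above: ordered_parallel_map and its progress parameter are hypothetical names, and the progress wrapper is passed in as an argument so that the sketch itself only depends on the standard library.

```python
import concurrent.futures


def ordered_parallel_map(executor, fn, *iterables, progress=None, **kwargs):
    """
    Like executor.map(fn, *iterables), but returns a list in input order.
    progress is an optional tqdm-like callable (hypothetical parameter);
    **kwargs is passed to it.
    """
    # Remember the input position of every submitted future
    future_to_index = {
        executor.submit(fn, *args): index
        for index, args in enumerate(zip(*iterables))
    }
    completed = concurrent.futures.as_completed(future_to_index)
    if progress is not None:
        # Wrap the completion iterator so the bar advances as futures finish
        completed = progress(completed, total=len(future_to_index), **kwargs)
    results = [None] * len(future_to_index)
    for future in completed:
        # Store each result at the position of its input
        results[future_to_index[future]] = future.result()
    return results


with concurrent.futures.ThreadPoolExecutor(8) as executor:
    squares = ordered_parallel_map(executor, lambda i: i * i, range(10))
```

To get the progress bar, pass progress=tqdm after from tqdm import tqdm. Unlike the generator above, this variant only returns once every call has finished, so it trades incremental results for ordering.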