Statistics

How to estimate the number of surgeries per year for countries without relevant data

For many countries, including the full range from underdeveloped to highly developed countries, no data about the number of surgeries per year is available.

However, the 2008 Lancet publication Weiser et al.

An estimation of the global volume of surgery: a modelling strategy based on available data, DOI 10.1016/s0140-6736(08)60878-8

provides a simple method of roughly estimating this number.

According to table 2, countries can be assigned one of several expenditure stata regarding their health care spending, each with an assigned Mean estimated surgical rate per 100 000 population :

  • Poor-expenditure countries: 295
  • Low-expenditure countries: 2255
  • Middle-expenditure countries: 4248
  • High-expenditure countries: 11110

How to classify countries:

According to the above-cited paper, the countries shall be classified by the per-capita total yearly expediture on health:

  • Poor-expenditure countries: Less than 100$
  • Low-expenditure countries: Between 101$ and 401$
  • Middle-expenditure countries: Between 401$ and 1000$
  • High-expenditure countries: More than 1000$

There are many source from where you can acquire this data. One of the easiest-to-use sources is the WHO country profile page. For example, you can see that Angola had a per-capity health spending of 51$ in 2020. This clearly puts Angola into the Poor-expenditure countries considering our criteria above.

Calculation example:

Consider a middle-expediture country with a population of 22.8 million inhabitants.

According to the formula above, a rough estimate for the number of surgical proceducers in said country can be calculated by:

\text{Number of surgeries} \approx \frac{\text{Population number}}{100000} \cdot \text{Middle expenditure mean surgical rate}

Regarding our example:

\text{Number of surgeries} \approx \frac{22.8*10^6}{100000} \cdot 4248 = 968544

Hence, we can roughly estimate that 968544 surgical procedures are performed in said country per year.

Posted by Uli Köhler in Statistics

Number of inpatient surgeries performed in Canada in 2021-2022 (Statistics)

According to CIHI, 1349787 inpatient surgeries have been performed in Canada in 2021-2022.

Source:

This number has been extracted from the CIHI Hospital Stays in Canada page, Inpatient Hospitalization, Surgery and Newborn Statistics material. In worksheet 4. Top 10 inp surgeries you can take any of the numbers regarding the total of Canada, e.g. Cell A6 (= Number of inpatient fracture surgeries 2021-2022), and divide it by the respective value in Column E, the percentage of that surgery of all inpatient surgeries.

The time range of the data is the canadian fiscal year, i.e. April 1, 2021, to March 31, 2022 (Source: The metadata PDF for the abovementioned source table). See the metadata PDF for discussion about coverage and other caveats which apply to the data.

Posted by Uli Köhler in Medical devices, Statistics

Simulating survival data for Kaplan-Meier plots in Python

Libraries like lifelines provide a plethora of example datasets that one can work with. However, for many tasks you need to simulate specific behaviour in survival curves.

In this post, we demonstrate a simple algorithm to generate survival data in a format comparable to the one used in the lifelines example datasets like load_leukemia().

The generation algorithm is based on the following assumptions:

  • There is a strict survival plateau with a given survival probability starting at a given point in time
  • The progression from 100% survival, t=0 to the survival plateau is approximately linear (i.e. if you would generate an infinite number of datapoints, the survival curve would be linear)
  • No censoring events shall be generated except for censoring all surviving participants at the end point of the timeline.

Code:

import numpy as np
import random
from lifelines import KaplanMeierFitter

def simulate_survival_data_linear(N, survival_plateau, t_plateau, t_end):
    """
    Generate random simulated survival data using a linear model
    
    Keyword parameters
    ------------------
    N : integer
        Number of entries to generate
    survival_plateau : float
        The survival probability of the survival plateau
    t_plateau : float
        The time point where the survival plateau starts
    t_end : float
        The time point where all surviving participants will be censored.
    
    Returns
    -------
    A dict with "Time" and "Event" numpy arrays: 0 is censored, 1 is event
    """
    data = {"Time": np.zeros(N), "Event": np.zeros(N)}

    for i in range(N):
        r = random.random()
        if r <= survival_plateau:
            # Event is censoring at the end of the time period
            data["Time"][i] = t_end
            data["Event"][i] = 0
        else: # Event occurs
            # Normalize where we are between 100% and the survival plateau
            p = (r - survival_plateau) / (1 - survival_plateau)
            # Linear model: Time of event linearly depends on uniformly & randomly chosen position
            #  in range (0...tplateau)
            t = p * t_plateau
            data["Time"][i] = t
            data["Event"][i] = 1
    return data

# Example usage
data1 = simulate_survival_data_linear(250, 0.2, 18, 24)
data2 = simulate_survival_data_linear(250, 0.4, 17.2, 24)

Given data1 and data2 (see the usage example at the end of the code) you can plot them using

# Plot bad subgroup
kmf1 = KaplanMeierFitter()
kmf1.fit(data1["Time"], event_observed=data1["Event"], label="Bad subgroup")
ax = kmf1.plot()

# Plot good subgroup
kmf2 = KaplanMeierFitter()
kmf2.fit(data2["Time"], event_observed=data2["Event"], label="Good subgroup")
ax = kmf2.plot(ax=ax)

# Set Y axis to fixed scale
ax.set_ylim([0.0, 1.0])

Thi

Do not want a survival plateau?

Just set t_end = t_survival:

# Example usage
data1 = simulate_survival_data_linear(250, 0.2, 24, 24)
data2 = simulate_survival_data_linear(250, 0.4, 24, 24)
# Code to plot: See above

What happens if you have a low number of participants?

Let’s use 25 instead of 250 as above:

# Example usage
data1 = simulate_survival_data_linear(25, 0.2, 24, 24)
data2 = simulate_survival_data_linear(25, 0.4, 24, 24)
# Plot code: See above

Although we generated the data with the same data, the difference is much less clear in this example, especially towards the end of the timeline (note however that the data is generated randomly, so you might see a different result). You can see a large portion of the confidence intervals overlappings near t=24. In other words, based on this data it is not clear that the two groups of patients are significantly different (in other words, P \geq 0.05)

Posted by Uli Köhler in Python, Statistics

How to use custom legend label in lifelines Kaplan-Meier plot?

Using the lifelines library, you can easily plot Kaplan-Meier plots, e.g. as seen in our previous post Minimal Python Kaplan-Meier Plot example:

from lifelines.datasets import load_leukemia
from lifelines import KaplanMeierFitter
df = load_leukemia()

kmf = KaplanMeierFitter()
kmf.fit(df['t'], df['Rx']) # t = Timepoints, Rx: 0=censored, 1=event
kmf.plot()

What if you want a custom label instead of KM_estimates to appear in the legend?

Use kmf.fit(..., label='Your label'). Since we use the leukemias dataset for this example, we use the label 'Leukemia'

Full example:

from lifelines.datasets import load_leukemia
from lifelines import KaplanMeierFitter
# Load datasets
df_leukemia = load_leukemia()

# Fit & plot leukemia dataset
kmf_leukemia = KaplanMeierFitter()
kmf_leukemia.fit(df_leukemia['t'], df_leukemia['Rx'], label="Leukemia")
ax = kmf_leukemia.plot()
# Set Y axis to fixed scale
ax.set_ylim([0.0, 1.0])

Posted by Uli Köhler in Python, Statistics

How plot multiple Kaplan-Meier curves using lifelines

Using the lifelines library, you can easily plot Kaplan-Meier plots, e.g. as seen in our previous post Minimal Python Kaplan-Meier Plot example:

from lifelines.datasets import load_leukemia
from lifelines import KaplanMeierFitter
df = load_leukemia()

kmf = KaplanMeierFitter()
kmf.fit(df['t'], df['Rx']) # t = Timepoints, Rx: 0=censored, 1=event
kmf.plot()

What if you want to plot multiple survival curves?

The call to kmf.plot() returns a Matplotlib ax object which you can use on a second kmf2.plot() call als argument: kmf2.plot(ax=ax).

Full example (note that we also set a fixed Y range of [0.0, 1.0], see How to make Y axis start from 0 in lifelines Kaplan-Meier plots):

from lifelines.datasets import load_leukemia, load_lymphoma
from lifelines import KaplanMeierFitter
# Load datasets
df_leukemia = load_leukemia()
df_lymphoma = load_lymphoma()

# Fit & plot leukemia dataset
kmf_leukemia = KaplanMeierFitter()
kmf_leukemia.fit(df_leukemia['t'], df_leukemia['Rx'], label="Leukemia")
ax = kmf_leukemia.plot()

# Fit & plot lymphoma dataset
kmf_lymphoma = KaplanMeierFitter()
kmf_lymphoma.fit(df_lymphoma['Time'], df_lymphoma['Censor'], label="Lymphoma")
ax = kmf_lymphoma.plot(ax=ax)

# Set Y axis to fixed scale
ax.set_ylim([0.0, 1.0])

Posted by Uli Köhler in Python, Statistics

How to make Y axis start from 0 in lifelines Kaplan-Meier plots

Using the lifelines library, you can easily plot Kaplan-Meier plots, e.g. as seen in our previous post Minimal Python Kaplan-Meier Plot example:

from lifelines.datasets import load_leukemia
from lifelines import KaplanMeierFitter
df = load_leukemia()

kmf = KaplanMeierFitter()
kmf.fit(df['t'], df['Rx']) # t = Timepoints, Rx: 0=censored, 1=event
kmf.plot()

What if you want to make the Y axis start from 0.0 and not from approx. 0.2 as in this example?

Remember that lifelines just calls matplotlib internally and km.plot() returns the ax object that you can use to manipulate the plot. In this specific case, you can use

ax.set_ylim([0.0, 1.0])

to stop autoranging the Y axis and set its range to fixed [0.0, 1.0].

Full example:

from lifelines.datasets import load_leukemia
from lifelines import KaplanMeierFitter
df = load_leukemia()

kmf = KaplanMeierFitter()
kmf.fit(df['t'], df['Rx']) # t = Timepoints, Rx: 0=censored, 1=event
ax = kmf.plot()
# Set Y axis range to [0.0, 1.0]
ax.set_ylim([0.0, 1.0])

Posted by Uli Köhler in Python, Statistics

How to hide confidence interval in lifelines Kaplan-Meier plots

Using the lifelines library, you can easily plot Kaplan-Meier plots, e.g. as seen in our previous post Minimal Python Kaplan-Meier Plot example:

from lifelines.datasets import load_leukemia
from lifelines import KaplanMeierFitter
df = load_leukemia()

kmf = KaplanMeierFitter()
kmf.fit(df['t'], df['Rx']) # t = Timepoints, Rx: 0=censored, 1=event
kmf.plot()

What if you just want to show the dark blue curve and hide the light-blue confidence interval?

Easy: Just use km.plot(ci_show=False). ci in ci_show means confidence interval. Example:

from lifelines.datasets import load_leukemia
from lifelines import KaplanMeierFitter
df = load_leukemia()

kmf = KaplanMeierFitter()
kmf.fit(df['t'], df['Rx']) # t = Timepoints, Rx: 0=censored, 1=event
kmf.plot(ci_show=False)

Posted by Uli Köhler in Python, Statistics

Minimal Python Kaplan-Meier Plot example

Install the lifelines library:

sudo pip3 install lifelines

Now you can use this snippet to create a basic Kaplan-Meier plot from an example dataset included with the library:

from lifelines.datasets import load_leukemia
from lifelines import KaplanMeierFitter
# Load example dataset
df = load_leukemia()

# Create model from data
kmf = KaplanMeierFitter()
kmf.fit(df['t'], df['Rx']) # t = Timepoints, Rx: 0=censored, 1=event
# Plot model's survival curve
kmf.plot()

Note that different datasets might have different names for the time column (t in this example) and the event/censoring column (Rx in this example)

We exported the plot using

import matplotlib.pyplot as plt 
plt.savefig("Kaplan-Meier-Example-Leukemias.svg")

 

Posted by Uli Köhler in Python, Statistics