04. DABEST Introduction

Modified

March 28, 2025

DABEST is a package that performs estimation statistics available on Python and R. With Jupyter Notebook you can try DABEST-Python.

import pandas as pd
import dabest
from palmerpenguins import load_penguins
Pre-compiling numba functions for DABEST...
Compiling numba functions: 100%|████████████████| 11/11 [00:00<00:00, 67.69it/s]
Numba compilation complete!
penguins = load_penguins() 

# If you had trouble installing the penguins package, you can also read in the data from the csv file by uncommenting the line below.
# penguins = pd.read_csv("penguins.csv")
penguins.head()
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex year
0 Adelie Torgersen 39.1 18.7 181.0 3750.0 male 2007
1 Adelie Torgersen 39.5 17.4 186.0 3800.0 female 2007
2 Adelie Torgersen 40.3 18.0 195.0 3250.0 female 2007
3 Adelie Torgersen NaN NaN NaN NaN NaN 2007
4 Adelie Torgersen 36.7 19.3 193.0 3450.0 female 2007
penguins_analyse = dabest.load(data=penguins, 
                           x="species", y="bill_length_mm",
                           idx=("Adelie", "Chinstrap", "Gentoo")
                          )
penguins_analyse.mean_diff
DABEST v2025.03.27
==================
                  
Good evening!
The current time is Thu Mar 27 23:02:20 2025.

The unpaired mean difference between Adelie and Chinstrap is 10.0 [95%CI 9.14, 11.0].
The p-value of the two-sided permutation t-test is 0.0, calculated for legacy purposes only. 

The unpaired mean difference between Adelie and Gentoo is 8.71 [95%CI 8.02, 9.42].
The p-value of the two-sided permutation t-test is 0.0, calculated for legacy purposes only. 

5000 bootstrap samples were taken; the confidence interval is bias-corrected and accelerated.
Any p-value reported is the probability of observing theeffect size (or greater),
assuming the null hypothesis of zero difference is true.
For each p-value, 5000 reshuffles of the control and test labels were performed.

To get the results of all valid statistical tests, use `.mean_diff.statistical_tests`
penguins_analyse.mean_diff.plot(raw_marker_size = 1.5, fig_size=(7, 7), raw_label="Bill length (mm)")

# penguins.dropna().melt(id_vars="species", value_vars="metric").head()

Even easier estimation statistics

The DABEST library has also been developed into a web application at estimationstats.com