04. DABEST Introduction

Modified

October 19, 2023

DABEST is a package for estimation statistics, available for both Python and R. You can try DABEST-Python in a Jupyter Notebook.

import pandas as pd
import dabest
from palmerpenguins import load_penguins
penguins = load_penguins() 

# If you had trouble installing the palmerpenguins package, you can instead read the data from the CSV file by uncommenting the line below.
# penguins = pd.read_csv("penguins.csv")
penguins.head()
   species     island  bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g     sex  year
0   Adelie  Torgersen            39.1           18.7              181.0       3750.0    male  2007
1   Adelie  Torgersen            39.5           17.4              186.0       3800.0  female  2007
2   Adelie  Torgersen            40.3           18.0              195.0       3250.0  female  2007
3   Adelie  Torgersen             NaN            NaN                NaN          NaN     NaN  2007
4   Adelie  Torgersen            36.7           19.3              193.0       3450.0  female  2007
penguins_analyse = dabest.load(data=penguins, 
                           x="species", y="bill_length_mm",
                           idx=("Adelie", "Chinstrap", "Gentoo")
                          )
penguins_analyse.mean_diff
DABEST v2023.02.14
==================
                  
Good evening!
The current time is Thu Oct 19 12:57:27 2023.

The unpaired mean difference between Adelie and Chinstrap is 10.0 [95%CI 9.14, 10.9].
The p-value of the two-sided permutation t-test is 0.0, calculated for legacy purposes only. 

The unpaired mean difference between Adelie and Gentoo is 8.71 [95%CI 8.03, 9.41].
The p-value of the two-sided permutation t-test is 0.0, calculated for legacy purposes only. 

5000 bootstrap samples were taken; the confidence interval is bias-corrected and accelerated.
Any p-value reported is the probability of observing the effect size (or greater),
assuming the null hypothesis of zero difference is true.
For each p-value, 5000 reshuffles of the control and test labels were performed.

To get the results of all valid statistical tests, use `.mean_diff.statistical_tests`
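
The report above rests on two resampling ideas: a bootstrap confidence interval for the mean difference and a permutation test for the p-value. The sketch below illustrates both with the standard library only. It is not DABEST's implementation: the function names (`mean_diff`, `bootstrap_ci`, `permutation_p`) and the toy measurements are made up for illustration, and it uses a plain percentile interval as a simpler stand-in for the bias-corrected and accelerated interval that DABEST reports.

```python
import random
import statistics


def mean_diff(control, test):
    """Difference in means, test minus control."""
    return statistics.fmean(test) - statistics.fmean(control)


def bootstrap_ci(control, test, n_resamples=5000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the unpaired mean difference.

    Assumption: a percentile interval, not the BCa interval DABEST uses.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        # Resample each group independently, with replacement.
        c = rng.choices(control, k=len(control))
        t = rng.choices(test, k=len(test))
        diffs.append(mean_diff(c, t))
    diffs.sort()
    low = diffs[int((alpha / 2) * n_resamples)]
    high = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    return low, high


def permutation_p(control, test, n_reshuffles=5000, seed=0):
    """Two-sided permutation p-value for the mean difference."""
    rng = random.Random(seed)
    observed = abs(mean_diff(control, test))
    pooled = list(control) + list(test)
    n_control = len(control)
    extreme = 0
    for _ in range(n_reshuffles):
        # Reshuffle the group labels by shuffling the pooled values.
        rng.shuffle(pooled)
        d = mean_diff(pooled[:n_control], pooled[n_control:])
        if abs(d) >= observed:
            extreme += 1
    return extreme / n_reshuffles


# Made-up toy measurements, not the penguins data.
control = [38.2, 39.1, 37.5, 40.3, 36.7, 39.5, 38.9, 41.1]
test = [48.5, 49.2, 47.1, 50.3, 46.9, 51.0, 48.8, 49.7]

diff = mean_diff(control, test)
ci_low, ci_high = bootstrap_ci(control, test)
p_value = permutation_p(control, test)
print(f"mean difference = {diff:.2f} [95% CI {ci_low:.2f}, {ci_high:.2f}]")
print(f"permutation p-value = {p_value}")
```

When two groups barely overlap, few or none of the reshuffles produce a difference as large as the observed one, so a counting scheme like this returns a p-value at or near 0.0. Under that scheme, a value of 0.0 is better read as "smaller than 1 / number of reshuffles" than as exactly zero.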
penguins_analyse.mean_diff.plot(raw_marker_size = 4, fig_size=(6, 6), swarm_label="Bill length (mm)")

penguins.dropna().melt(id_vars="species", value_vars="year").head()
  species variable  value
0  Adelie     year   2007
1  Adelie     year   2007
2  Adelie     year   2007
3  Adelie     year   2007
4  Adelie     year   2007
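
The reshaping step above can be easier to follow on a tiny table. The sketch below, using a made-up two-row DataFrame (not the penguins data), shows how `melt` turns several measurement columns into long format: the `id_vars` column is repeated, each original column name goes into `variable`, and each measurement goes into `value`.

```python
import pandas as pd

# Made-up rows that mimic three of the penguins columns.
toy = pd.DataFrame({
    "species": ["Adelie", "Gentoo"],
    "bill_length_mm": [39.1, 46.1],
    "bill_depth_mm": [18.7, 13.2],
})

# One output row per (species, measurement column) pair.
long_format = toy.melt(
    id_vars="species",
    value_vars=["bill_length_mm", "bill_depth_mm"],
)
print(long_format)
```

The result has one row for every combination of input row and melted column, so two rows and two value columns give four rows.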