# Tutorial¶

 1 2 3 4 5  import numpy as np import pandas as pd import dabest print("We're using DABEST v{}".format(dabest.__version__)) 
We're using DABEST v0.3.1


## Create dataset for demo¶

Here, we create a dataset to illustrate how dabest functions. In this dataset, each column corresponds to a group of observations.

  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34  from scipy.stats import norm # Used in generation of populations. np.random.seed(9999) # Fix the seed so the results are replicable. # pop_size = 10000 # Size of each population. Ns = 20 # The number of samples taken from each population # Create samples c1 = norm.rvs(loc=3, scale=0.4, size=Ns) c2 = norm.rvs(loc=3.5, scale=0.75, size=Ns) c3 = norm.rvs(loc=3.25, scale=0.4, size=Ns) t1 = norm.rvs(loc=3.5, scale=0.5, size=Ns) t2 = norm.rvs(loc=2.5, scale=0.6, size=Ns) t3 = norm.rvs(loc=3, scale=0.75, size=Ns) t4 = norm.rvs(loc=3.5, scale=0.75, size=Ns) t5 = norm.rvs(loc=3.25, scale=0.4, size=Ns) t6 = norm.rvs(loc=3.25, scale=0.4, size=Ns) # Add a gender column for coloring the data. females = np.repeat('Female', Ns/2).tolist() males = np.repeat('Male', Ns/2).tolist() gender = females + males # Add an id column for paired data plotting. id_col = pd.Series(range(1, Ns+1)) # Combine samples and gender into a DataFrame. df = pd.DataFrame({'Control 1' : c1, 'Test 1' : t1, 'Control 2' : c2, 'Test 2' : t2, 'Control 3' : c3, 'Test 3' : t3, 'Test 4' : t4, 'Test 5' : t5, 'Test 6' : t6, 'Gender' : gender, 'ID' : id_col }) 

Note that we have 9 groups (3 Control samples and 6 Test samples). Our dataset also has a non-numerical column indicating gender, and another column indicating the identity of each observation.

This is known as a ‘wide’ dataset. See this writeup for more details.

 1  df.head() 
Control 1 Test 1 Control 2 Test 2 Control 3 Test 3 Test 4 Test 5 Test 6 Gender ID
0 2.793984 3.420875 3.324661 1.707467 3.816940 1.796581 4.440050 2.937284 3.486127 Female 1
1 3.236759 3.467972 3.685186 1.121846 3.750358 3.944566 3.723494 2.837062 2.338094 Female 2
2 3.019149 4.377179 5.616891 3.301381 2.945397 2.832188 3.214014 3.111950 3.270897 Female 3
3 2.804638 4.564780 2.773152 2.534018 3.575179 3.048267 4.968278 3.743378 3.151188 Female 4
4 2.858019 3.220058 2.550361 2.796365 3.692138 3.276575 2.662104 2.977341 2.328601 Female 5

Before we create estimation plots and obtain confidence intervals for our effect sizes, we need to load the data and the relevant groups.

We simply supply the DataFrame to dabest.load(). We also must supply the two groups you want to compare in the idx argument as a tuple or list.

 1  two_groups_unpaired = dabest.load(df, idx=("Control 1", "Test 1"), resamples=5000) 

Calling this Dabest object gives you a gentle greeting, as well as the comparisons that can be computed.

 1  two_groups_unpaired 
DABEST v0.3.1
=============

Good afternoon!
The current time is Mon Oct 19 17:12:44 2020.

Effect size(s) with 95% confidence intervals will be computed for:
1. Test 1 minus Control 1

5000 resamples will be used to generate the effect size bootstraps.


### Changing statistical parameters¶

If the dataset contains paired data (ie. repeated observations), specify this with the paired keyword. You must also pass a column in the dataset that indicates the identity of each observation, using the id_col keyword.

 1 2  two_groups_paired = dabest.load(df, idx=("Control 1", "Test 1"), paired=True, id_col="ID") 
 1  two_groups_paired 
DABEST v0.3.1
=============

Good afternoon!
The current time is Mon Oct 19 17:12:44 2020.

Paired effect size(s) with 95% confidence intervals will be computed for:
1. Test 1 minus Control 1

5000 resamples will be used to generate the effect size bootstraps.


You can also change the width of the confidence interval that will be produced.

 1  two_groups_unpaired_ci90 = dabest.load(df, idx=("Control 1", "Test 1"), ci=90) 
 1  two_groups_unpaired_ci90 
DABEST v0.3.1
=============

Good afternoon!
The current time is Mon Oct 19 17:12:44 2020.

Effect size(s) with 90% confidence intervals will be computed for:
1. Test 1 minus Control 1

5000 resamples will be used to generate the effect size bootstraps.


## Effect sizes¶

dabest now features a range of effect sizes:
• the mean difference (mean_diff)

• the median difference (median_diff)

• Cohen’s d (cohens_d)

• Hedges’ g (hedges_g)

• Cliff’s delta (cliffs_delta)

Each of these are attributes of the Dabest object.

 1  two_groups_unpaired.mean_diff 
DABEST v0.3.1
=============

Good afternoon!
The current time is Mon Oct 19 17:12:44 2020.

The unpaired mean difference between Control 1 and Test 1 is 0.48 [95%CI 0.221, 0.768].
The p-value of the two-sided permutation t-test is 0.001.

5000 bootstrap samples were taken; the confidence interval is bias-corrected and accelerated.
The p-value(s) reported are the likelihood(s) of observing the effect size(s),
if the null hypothesis of zero difference is true.
For each p-value, 5000 reshuffles of the control and test labels were performed.

To get the results of all valid statistical tests, use .mean_diff.statistical_tests

For each comparison, the type of effect size is reported (here, it’s the “unpaired mean difference”). The confidence interval is reported as: [confidenceIntervalWidth LowerBound, UpperBound]

This confidence interval is generated through bootstrap resampling. See Bootstrap Confidence Intervals for more details.

Since v0.3.0, DABEST will report the p-value of the non-parametric two-sided approximate permutation t-test. This is also known as the Monte Carlo permutation test.

For unpaired comparisons, the p-values and test statistics of Welch’s t test, Student’s t test, and Mann-Whitney U test can be found in addition. For paired comparisons, the p-values and test statistics of the paired Student’s t and Wilcoxon tests are presented.

 1 2  pd.options.display.max_columns = 50 two_groups_unpaired.mean_diff.results 
control test control_N test_N effect_size is_paired difference ci bca_low bca_high bca_interval_idx pct_low pct_high pct_interval_idx pvalue_permutation permutation_count bootstraps resamples random_seed pvalue_welch statistic_welch pvalue_students_t statistic_students_t pvalue_mann_whitney statistic_mann_whitney
0 Control 1 Test 1 20 20 mean difference False 0.48029 95 0.220869 0.767721 (140, 4889) 0.215697 0.761716 (125, 4875) 0.001 5000 [-0.157303571150468, -0.025932185794146356, 0.... 5000 12345 0.002094 -3.308806 0.002057 -3.308806 0.001625 83.0
 1  two_groups_unpaired.mean_diff.statistical_tests 
control test control_N test_N effect_size is_paired difference ci bca_low bca_high pvalue_permutation pvalue_welch statistic_welch pvalue_students_t statistic_students_t pvalue_mann_whitney statistic_mann_whitney
0 Control 1 Test 1 20 20 mean difference False 0.48029 95 0.220869 0.767721 0.001 0.002094 -3.308806 0.002057 -3.308806 0.001625 83.0

Let’s compute the Hedges’ g for our comparison.

 1  two_groups_unpaired.hedges_g 
DABEST v0.3.0
=============

Good afternoon!
The current time is Mon Oct 19 17:12:46 2020.

The unpaired Hedges' g between Control 1 and Test 1 is 1.03 [95%CI 0.349, 1.62].
The p-value of the two-sided permutation t-test is 0.001.

5000 bootstrap samples were taken; the confidence interval is bias-corrected and accelerated.
The p-value(s) reported are the likelihood(s) of observing the effect size(s),
if the null hypothesis of zero difference is true.
For each p-value, 5000 reshuffles of the control and test labels were performed.

To get the results of all valid statistical tests, use .hedges_g.statistical_tests
 1  two_groups_unpaired.hedges_g.results 
control test control_N test_N effect_size is_paired difference ci bca_low bca_high bca_interval_idx pct_low pct_high pct_interval_idx pvalue_permutation permutation_count bootstraps resamples random_seed pvalue_welch statistic_welch pvalue_students_t statistic_students_t pvalue_mann_whitney statistic_mann_whitney
0 Control 1 Test 1 20 20 Hedges' g False 1.025525 95 0.349394 1.618579 (42, 4724) 0.472844 1.74166 (125, 4875) 0.001 5000 [-0.3617512915188043, -0.06120428036887727, 0.... 5000 12345 0.002094 -3.308806 0.002057 -3.308806 0.001625 83.0

## Producing estimation plots¶

To produce a Gardner-Altman estimation plot, simply use the .plot() method. You can read more about its genesis and design inspiration at Robust and Beautiful Statistical Visualization.

Every effect size instance has access to the .plot() method. This means you can quickly create plots for different effect sizes easily.

 1 two_groups_unpaired.mean_diff.plot(); 
 1  two_groups_unpaired.hedges_g.plot(); 

Instead of a Gardner-Altman plot, you can produce a Cumming estimation plot by setting float_contrast=False in the plot() method. This will plot the bootstrap effect sizes below the raw data, and also displays the the mean (gap) and ± standard deviation of each group (vertical ends) as gapped lines. This design was inspired by Edward Tufte’s dictum to maximise the data-ink ratio.

 1  two_groups_unpaired.hedges_g.plot(float_contrast=False); 

For paired data, we use slopegraphs (another innovation from Edward Tufte) to connect paired observations. Both Gardner-Altman and Cumming plots support this.

 1  two_groups_paired.mean_diff.plot(); 
 1  two_groups_paired.mean_diff.plot(float_contrast=False); 

The dabest package also implements a range of estimation plot designs aimed at depicting common experimental designs.

The multi-two-group estimation plot tiles two or more Cumming plots horizontally, and is created by passing a nested tuple to idx when dabest.load() is first invoked.

Thus, the lower axes in the Cumming plot is effectively a forest plot, used in meta-analyses to aggregate and compare data from different experiments.

 1 2 3 4 5  multi_2group = dabest.load(df, idx=(("Control 1", "Test 1",), ("Control 2", "Test 2") )) multi_2group.mean_diff.plot(); 

The multi-two-group design also accomodates paired comparisons.

 1 2 3 4 5 6 7  multi_2group_paired = dabest.load(df, idx=(("Control 1", "Test 1"), ("Control 2", "Test 2") ), paired=True, id_col="ID" ) multi_2group_paired.mean_diff.plot(); 

The shared control plot displays another common experimental paradigm, where several test samples are compared against a common reference sample.

This type of Cumming plot is automatically generated if the tuple passed to idx has more than two data columns.

 1 2 3 4  shared_control = dabest.load(df, idx=("Control 1", "Test 1", "Test 2", "Test 3", "Test 4", "Test 5", "Test 6") ) 
 1  shared_control 
DABEST v0.3.0
=============

Good afternoon!
The current time is Mon Oct 19 17:12:54 2020.

Effect size(s) with 95% confidence intervals will be computed for:
1. Test 1 minus Control 1
2. Test 2 minus Control 1
3. Test 3 minus Control 1
4. Test 4 minus Control 1
5. Test 5 minus Control 1
6. Test 6 minus Control 1

5000 resamples will be used to generate the effect size bootstraps.

 1  shared_control.mean_diff 
DABEST v0.3.0
=============

Good afternoon!
The current time is Mon Oct 19 17:12:58 2020.

The unpaired mean difference between Control 1 and Test 1 is 0.48 [95%CI 0.221, 0.768].
The p-value of the two-sided permutation t-test is 0.001.

The unpaired mean difference between Control 1 and Test 2 is -0.542 [95%CI -0.914, -0.211].
The p-value of the two-sided permutation t-test is 0.0042.

The unpaired mean difference between Control 1 and Test 3 is 0.174 [95%CI -0.295, 0.628].
The p-value of the two-sided permutation t-test is 0.479.

The unpaired mean difference between Control 1 and Test 4 is 0.79 [95%CI 0.306, 1.31].
The p-value of the two-sided permutation t-test is 0.0042.

The unpaired mean difference between Control 1 and Test 5 is 0.265 [95%CI 0.0137, 0.497].
The p-value of the two-sided permutation t-test is 0.0404.

The unpaired mean difference between Control 1 and Test 6 is 0.288 [95%CI -0.00441, 0.515].
The p-value of the two-sided permutation t-test is 0.0324.

5000 bootstrap samples were taken; the confidence interval is bias-corrected and accelerated.
The p-value(s) reported are the likelihood(s) of observing the effect size(s),
if the null hypothesis of zero difference is true.
For each p-value, 5000 reshuffles of the control and test labels were performed.

To get the results of all valid statistical tests, use .mean_diff.statistical_tests
 1  shared_control.mean_diff.plot(); 

dabest thus empowers you to robustly perform and elegantly present complex visualizations and statistics.

 1 2 3 4  multi_groups = dabest.load(df, idx=(("Control 1", "Test 1",), ("Control 2", "Test 2","Test 3"), ("Control 3", "Test 4","Test 5", "Test 6") )) 
 1  multi_groups 
DABEST v0.3.0
=============

Good afternoon!
The current time is Mon Oct 19 17:12:58 2020.

Effect size(s) with 95% confidence intervals will be computed for:
1. Test 1 minus Control 1
2. Test 2 minus Control 2
3. Test 3 minus Control 2
4. Test 4 minus Control 3
5. Test 5 minus Control 3
6. Test 6 minus Control 3

5000 resamples will be used to generate the effect size bootstraps.

 1  multi_groups.mean_diff 
DABEST v0.3.0
=============

Good afternoon!
The current time is Mon Oct 19 17:13:02 2020.

The unpaired mean difference between Control 1 and Test 1 is 0.48 [95%CI 0.221, 0.768].
The p-value of the two-sided permutation t-test is 0.001.

The unpaired mean difference between Control 2 and Test 2 is -1.38 [95%CI -1.93, -0.895].
The p-value of the two-sided permutation t-test is 0.0.

The unpaired mean difference between Control 2 and Test 3 is -0.666 [95%CI -1.3, -0.103].
The p-value of the two-sided permutation t-test is 0.0352.

The unpaired mean difference between Control 3 and Test 4 is 0.362 [95%CI -0.114, 0.887].
The p-value of the two-sided permutation t-test is 0.161.

The unpaired mean difference between Control 3 and Test 5 is -0.164 [95%CI -0.404, 0.0742].
The p-value of the two-sided permutation t-test is 0.208.

The unpaired mean difference between Control 3 and Test 6 is -0.14 [95%CI -0.398, 0.102].
The p-value of the two-sided permutation t-test is 0.282.

5000 bootstrap samples were taken; the confidence interval is bias-corrected and accelerated.
The p-value(s) reported are the likelihood(s) of observing the effect size(s),
if the null hypothesis of zero difference is true.
For each p-value, 5000 reshuffles of the control and test labels were performed.

To get the results of all valid statistical tests, use .mean_diff.statistical_tests
 1  multi_groups.mean_diff.plot(); 

### Using long (aka ‘melted’) data frames¶

dabest can also work with ‘melted’ or ‘long’ data. This term is so used because each row will now correspond to a single datapoint, with one column carrying the value and other columns carrying ‘metadata’ describing that datapoint.

More details on wide vs long or ‘melted’ data can be found in this Wikipedia article. The pandas documentation gives recipes for melting dataframes.

  1 2 3 4 5 6 7 8 9 10 11 12  x='group' y='metric' value_cols = df.columns[:-2] # select all but the "Gender" and "ID" columns. df_melted = pd.melt(df.reset_index(), id_vars=["Gender", "ID"], value_vars=value_cols, value_name=y, var_name=x) df_melted.head() # Gives the first five rows of df_melted. 
Gender ID group metric
0 Female 1 Control 1 2.793984
1 Female 2 Control 1 3.236759
2 Female 3 Control 1 3.019149
3 Female 4 Control 1 2.804638
4 Female 5 Control 1 2.858019

When your data is in this format, you will need to specify the x and y columns in dabest.load().

 1 2 3 4  analysis_of_long_df = dabest.load(df_melted, idx=("Control 1", "Test 1"), x="group", y="metric") analysis_of_long_df 
DABEST v0.3.0
=============

Good afternoon!
The current time is Mon Oct 19 17:13:03 2020.

Effect size(s) with 95% confidence intervals will be computed for:
1. Test 1 minus Control 1

5000 resamples will be used to generate the effect size bootstraps.

 1  analysis_of_long_df.mean_diff.plot(); 

### Controlling plot aesthetics¶

Changing the y-axes labels.

 1 2  two_groups_unpaired.mean_diff.plot(swarm_label="This is my\nrawdata", contrast_label="The bootstrap\ndistribtions!"); 

Color the rawdata according to another column in the dataframe.

 1  multi_2group.mean_diff.plot(color_col="Gender"); 
 1  two_groups_paired.mean_diff.plot(color_col="Gender"); 

Changing the palette used with custom_palette. Any valid matplotlib or seaborn color palette is accepted.

 1  multi_2group.mean_diff.plot(color_col="Gender", custom_palette="Dark2"); 
 1  multi_2group.mean_diff.plot(custom_palette="Paired"); 

You can also create your own color palette. Create a dictionary where the keys are group names, and the values are valid matplotlib colors.

You can specify matplotlib colors in a variety of ways. Here, I demonstrate using named colors, hex strings (commonly used on the web), and RGB tuples.

 1 2 3 4 5 6 7  my_color_palette = {"Control 1" : "blue", "Test 1" : "purple", "Control 2" : "#cb4b16", # This is a hex string. "Test 2" : (0., 0.7, 0.2) # This is a RGB tuple. } multi_2group.mean_diff.plot(custom_palette=my_color_palette); 

By default, dabest.plot() will desaturate the color of the dots in the swarmplot by 50%. This draws attention to the effect size bootstrap curves.

You can alter the default values with the swarm_desat and halfviolin_desat keywords.

 1 2 3  multi_2group.mean_diff.plot(custom_palette=my_color_palette, swarm_desat=0.75, halfviolin_desat=0.25); 

You can also change the sizes of the dots used in the rawdata swarmplot, and those used to indicate the effect sizes.

 1 2  multi_2group.mean_diff.plot(raw_marker_size=3, es_marker_size=12); 

Changing the y-limits for the rawdata axes, and for the contrast axes.

 1 2  multi_2group.mean_diff.plot(swarm_ylim=(0, 5), contrast_ylim=(-2, 2)); 

If your effect size is qualitatively inverted (ie. a smaller value is a better outcome), you can simply invert the tuple passed to contrast_ylim.

 1 2  multi_2group.mean_diff.plot(contrast_ylim=(2, -2), contrast_label="More negative is better!"); 

You can add minor ticks and also change the tick frequency by accessing the axes directly.

Each estimation plot produced by dabest has 2 axes. The first one contains the rawdata swarmplot; the second one contains the bootstrap effect size differences.

  1 2 3 4 5 6 7 8 9 10 11 12  import matplotlib.ticker as Ticker f = two_groups_unpaired.mean_diff.plot() rawswarm_axes = f.axes[0] contrast_axes = f.axes[1] rawswarm_axes.yaxis.set_major_locator(Ticker.MultipleLocator(1)) rawswarm_axes.yaxis.set_minor_locator(Ticker.MultipleLocator(0.5)) contrast_axes.yaxis.set_major_locator(Ticker.MultipleLocator(0.5)) contrast_axes.yaxis.set_minor_locator(Ticker.MultipleLocator(0.25)) 
  1 2 3 4 5 6 7 8 9 10 11  f = multi_2group.mean_diff.plot(swarm_ylim=(0,6), contrast_ylim=(-3, 1)) rawswarm_axes = f.axes[0] contrast_axes = f.axes[1] rawswarm_axes.yaxis.set_major_locator(Ticker.MultipleLocator(2)) rawswarm_axes.yaxis.set_minor_locator(Ticker.MultipleLocator(1)) contrast_axes.yaxis.set_major_locator(Ticker.MultipleLocator(0.5)) contrast_axes.yaxis.set_minor_locator(Ticker.MultipleLocator(0.25)) 

### Creating estimation plots in existing axes¶

Implemented in v0.2.6 by Adam Nekimken.

dabest.plot has an ax keyword that accepts any Matplotlib Axes. The entire estimation plot will be created in the specified Axes.

  1 2 3 4 5 6 7 8 9 10 11 12 13  from matplotlib import pyplot as plt f, axx = plt.subplots(nrows=2, ncols=2, figsize=(15, 15), gridspec_kw={'wspace': 0.25} # ensure proper width-wise spacing. ) two_groups_unpaired.mean_diff.plot(ax=axx.flat[0]); two_groups_paired.mean_diff.plot(ax=axx.flat[1]); multi_2group.mean_diff.plot(ax=axx.flat[2]); multi_2group_paired.mean_diff.plot(ax=axx.flat[3]); 

In this case, to access the individual rawdata axes, use name_of_axes to manipulate the rawdata swarmplot axes, and name_of_axes.contrast_axes to gain access to the effect size axes.

 1 2 3 4 5  topleft_axes = axx.flat[0] topleft_axes.set_ylabel("New y-axis label for rawdata") topleft_axes.contrast_axes.set_ylabel("New y-axis label for effect size") f 

### Applying style sheets¶

Implemented in v0.2.0.

dabest can apply matplotlib style sheets to estimation plots. You can refer to this gallery of style sheets for reference.

 1 2  import matplotlib.pyplot as plt plt.style.use("dark_background") 
 1  multi_2group.mean_diff.plot();