# Tutorial¶

## Load Libraries¶

```
>>> import numpy as np
>>> import pandas as pd
>>> import dabest
>>> print("We're using DABEST v{}".format(dabest.__version__))
```

```
We're using DABEST v0.2.1
```

## Create dataset for demo¶

Here, we create a dataset to illustrate how `dabest`

functions. In
this dataset, each column corresponds to a group of observations.

```
>>> from scipy.stats import norm # Used in generation of populations.
>>> np.random.seed(9999) # Fix the seed so the results are replicable.
>>> Ns = 20 # The number of samples taken from each population
>>> # Create samples
>>> c1 = norm.rvs(loc=3, scale=0.4, size=Ns)
>>> c2 = norm.rvs(loc=3.5, scale=0.75, size=Ns)
>>> c3 = norm.rvs(loc=3.25, scale=0.4, size=Ns)
>>> t1 = norm.rvs(loc=3.5, scale=0.5, size=Ns)
>>> t2 = norm.rvs(loc=2.5, scale=0.6, size=Ns)
>>> t3 = norm.rvs(loc=3, scale=0.75, size=Ns)
>>> t4 = norm.rvs(loc=3.5, scale=0.75, size=Ns)
>>> t5 = norm.rvs(loc=3.25, scale=0.4, size=Ns)
>>> t6 = norm.rvs(loc=3.25, scale=0.4, size=Ns)
>>> # Add a `gender` column for coloring the data.
>>> females = np.repeat('Female', Ns/2).tolist()
>>> males = np.repeat('Male', Ns/2).tolist()
>>> gender = females + males
>>> # Add an `id` column for paired data plotting.
>>> id_col = pd.Series(range(1, Ns+1))
>>> # Combine samples and gender into a DataFrame.
>>> df = pd.DataFrame({'Control 1' : c1, 'Test 1' : t1,
... 'Control 2' : c2, 'Test 2' : t2,
... 'Control 3' : c3, 'Test 3' : t3,
... 'Test 4' : t4, 'Test 5' : t5,
... 'Test 6' : t6,
... 'Gender' : gender, 'ID' : id_col
... })
```

Note that we have 9 samples (3 Control samples and 6 Test samples). Our dataset also has a non-numerical column indicating gender, and another column indicating the identity of each observation.

This is known as a ‘wide’ dataset. See this writeup for more details.

```
df.head()
```

Control 1 | Test 1 | Control 2 | Test 2 | Control 3 | Test 3 | Test 4 | Test 5 | Test 6 | Gender | ID |
---|---|---|---|---|---|---|---|---|---|---|

2.793984 | 3.420875 | 3.324661 | 1.707467 | 3.816940 | 1.796581 | 4.440050 | 2.937284 | 3.486127 | Female | 1 |

3.236759 | 3.467972 | 3.685186 | 1.121846 | 3.750358 | 3.944566 | 3.723494 | 2.837062 | 2.338094 | Female | 2 |

3.019149 | 4.377179 | 5.616891 | 3.301381 | 2.945397 | 2.832188 | 3.214014 | 3.111950 | 3.270897 | Female | 3 |

2.804638 | 4.564780 | 2.773152 | 2.534018 | 3.575179 | 3.048267 | 4.968278 | 3.743378 | 3.151188 | Female | 4 |

2.858019 | 3.220058 | 2.550361 | 2.796365 | 3.692138 | 3.276575 | 2.662104 | 2.977341 | 2.328601 | Female | 5 |

## Loading Data¶

Before we create estimation plots and obtain confidence intervals for our effect sizes, we need to load the data and the relevant groups.

We simply supply the DataFrame to `dabest.load()`

. We also must supply
the two groups you want to compare in the `idx`

argument as a tuple or
list.

```
>>> two_groups_unpaired = dabest.load(df, idx=("Control 1", "Test 1"),
... resamples=5000)
```

Calling this Dabest object gives you a gentle greeting, as well as the comparisons that can be computed.

```
>>> two_groups_unpaired
```

```
DABEST v0.2.1
=============
Good afternoon!
The current time is Mon Mar 11 16:19:24 2019.
Effect size(s) with 95% confidence intervals will be computed for:
1. Test 1 minus Control 1
5000 resamples will be used to generate the effect size bootstraps.
```

### Changing statistical parameters¶

If the dataset contains paired data (ie. repeated observations), specify
this with the `paired`

keyword. You must also pass a column in the
dataset that indicates the identity of each observation, using the
`id_col`

keyword.

```
>>> two_groups_paired = dabest.load(df, idx=("Control 1", "Test 1"),
... paired=True, id_col="ID")
```

```
>>> two_groups_paired
```

```
DABEST v0.2.1
=============
Good afternoon!
The current time is Mon Mar 11 16:19:25 2019.
Paired effect size(s) with 95% confidence intervals will be computed for:
1. Test 1 minus Control 1
5000 resamples will be used to generate the effect size bootstraps.
```

You can also change the width of the confidence interval that will be produced.

```
>>> two_groups_unpaired_ci90 = dabest.load(df,
... idx=("Control 1", "Test 1"),
... ci=90)
```

```
>>> two_groups_unpaired_ci90
```

```
DABEST v0.2.1
=============
Good afternoon!
The current time is Mon Mar 11 16:19:25 2019.
Effect size(s) with 90% confidence intervals will be computed for:
1. Test 1 minus Control 1
5000 resamples will be used to generate the effect size bootstraps.
```

## Effect sizes¶

`dabest`

now features a range of effect sizes:

- the mean difference (
`mean_diff`

) - the median difference (
`median_diff`

) - Cohen’s *d* (
`cohens_d`

) - Hedges’ *g* (
`hedges_g`

) - Cliff’s delta (
`cliffs_delta`

)

Each of these are attributes of the Dabest object.

```
>>> two_groups_unpaired.mean_diff
```

DABEST v0.2.1 ============= Good afternoon! The current time is Mon Mar 11 16:19:25 2019. The unpaired mean difference between Control 1 and Test 1 is 0.48 [95%CI 0.205, 0.774]. The two-sided p-value of the Mann-Whitney test is 0.000813. 5000 bootstrap samples were taken; the confidence interval is bias-corrected and accelerated. The p-value(s) reported are the likelihood(s) of observing the effect size(s), if the null hypothesis of zero difference is true. To get the results of all valid statistical tests, use .mean_diff.statistical_tests

For each comparison, the type of effect size is reported (here, it’s the “unpaired mean difference”).

The confidence interval is reported as: [*confidenceIntervalWidth*
*LowerBound*, *UpperBound*], and is generated through bootstrap resampling.
See Bootstrap Confidence Intervals for more details.

By default, DABEST will report the two-sided p-value of the most conservative test that is appropriate for the effect size. This is the statistical test that does not assume normality of the underlying populations, and does not assume that both of them do not share the same variance (ie. heteroscadacity).

You can access the results as a `pandas DataFrame`

.

```
>>> two_groups_unpaired.mean_diff.results
```

control | test | effect_size | is_paired | difference | ci | bca_low | bca_high | bca_interval_idx | pct_low | pct_high | pct_interval_idx | bootstraps | resamples | random_seed | pvalue_welch | statistic_welch | pvalue_students_t | statistic_students_t | pvalue_mann_whitney | statistic_mann_whitney | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|

0 | Control 1 | Test 1 | mean difference | False | 0.48029 | 95 | 0.205161 | 0.773647 | (145, 4893) | 0.197427 | 0.758752 | (125, 4875) | [-0.05989473868674011, -0.018608309424335, 0.0... | 5000 | 12345 | 0.002094 | -3.308806 | 0.002057 | -3.308806 | 0.000813 | 83.0 |

You can use `.mean_diff.statistical_tests`

to
obtain the p-values and test statistics for all relavant statistical
tests.

```
>>> two_groups_unpaired.mean_diff.statistical_tests
```

control | test | effect_size | is_paired | difference | ci | bca_low | bca_high | pvalue_welch | statistic_welch | pvalue_students_t | statistic_students_t | pvalue_mann_whitney | statistic_mann_whitney | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|

0 | Control 1 | Test 1 | mean difference | False | 0.48029 | 95 | 0.205161 | 0.773647 | 0.002094 | -3.308806 | 0.002057 | -3.308806 | 0.000813 | 83.0 |

Let’s compute the Hedges’ g for our comparison.

```
>>> two_groups_unpaired.hedges_g
```

DABEST v0.2.1 ============= Good afternoon! The current time is Mon Mar 11 16:19:26 2019. The unpaired Hedges' g between Control 1 and Test 1 is 1.03 [95%CI 0.317, 1.62]. The two-sided p-value of the Mann-Whitney test is 0.000813. 5000 bootstrap samples were taken; the confidence interval is bias-corrected and accelerated. The p-value(s) reported are the likelihood(s) of observing the effect size(s), if the null hypothesis of zero difference is true. To get the results of all valid statistical tests, use .hedges_g.statistical_tests

```
>>> two_groups_unpaired.hedges_g.results
```

control | test | effect_size | is_paired | difference | ci | bca_low | bca_high | bca_interval_idx | pct_low | ... | pct_interval_idx | bootstraps | resamples | random_seed | pvalue_welch | statistic_welch | pvalue_students_t | statistic_students_t | pvalue_mann_whitney | statistic_mann_whitney | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|

0 | Control 1 | Test 1 | Hedges' g | False | 1.025525 | 95 | 0.316506 | 1.616235 | (42, 4725) | 0.44486 | ... | (125, 4875) | [-0.1491709040527835, -0.0504066101302326, 0.0... | 5000 | 12345 | 0.002094 | -3.308806 | 0.002057 | -3.308806 | 0.000813 | 83.0 |

1 rows × 21 columns

## Producing estimation plots¶

To produce a **Gardner-Altman estimation plot**, simply use the
`.plot()`

method. You can read more about its genesis and design
inspiration here.

Every effect size instance has access to the `.plot()`

method. This
means you can quickly create plots for different effect sizes easily.

```
>>> two_groups_unpaired.mean_diff.plot()
```

```
>>> two_groups_unpaired.hedges_g.plot()
```

Instead of a Gardner-Altman plot, you can produce a **Cumming estimation
plot** by setting `float_contrast=False`

in the `plot()`

method. This will plot the bootstrap effect sizes below the raw data.

The mean (gap) and ± standard deviation of each group (vertical ends) is plotted as a gapped line, an inspiration from Edward Tufte’s dictum to maximise data-ink ratio.

```
>>> two_groups_unpaired.hedges_g.plot(float_contrast=False)
```

For paired data, we use slopegraphs (another innovation from Edward Tufte) to connect paired observations.

```
>>> two_groups_paired.mean_diff.plot()
```

```
>>> two_groups_paired.mean_diff.plot(float_contrast=False)
```

The `dabest`

package also implements a range of estimation plot
designs aimed at depicting common experimental designs.

The **multi-two-group estimation plot** tiles two or more Cumming plots
horizontally, and is created by passing a *nested tuple* to idx when
`dabest.load()`

is first invoked.

Thus, the lower axes in the Cumming plot is effectively a forest plot, used in meta-analyses to aggregate and compare data from different experiments.

```
>>> multi_2group = dabest.load(df, idx=(("Control 1", "Test 1",),
... ("Control 2", "Test 2")
... ))
>>> multi_2group.mean_diff.plot()
```

The multi-two-group design also accomodates paired comparisons.

```
>>> multi_2group_paired = dabest.load(df, idx=(("Control 1", "Test 1",),
... ("Control 2", "Test 2")
... ),
... paired=True, id_col="ID"
... )
>>> multi_2group_paired.mean_diff.plot()
```

The **shared control plot** displays another common experimental
paradigm, where several test samples are compared against a common
reference sample.

This type of Cumming plot is automatically generated if the tuple passed
to `idx`

has more than two data columns.

```
>>> shared_control = dabest.load(df, idx=("Control 1", "Test 1",
... "Test 2", "Test 3",
... "Test 4", "Test 5", "Test 6")
... )
```

```
>>> shared_control
```

```
DABEST v0.2.1
=============
Good afternoon!
The current time is Mon Mar 11 16:19:31 2019.
Effect size(s) with 95% confidence intervals will be computed for:
1. Test 1 minus Control 1
2. Test 2 minus Control 1
3. Test 3 minus Control 1
4. Test 4 minus Control 1
5. Test 5 minus Control 1
6. Test 6 minus Control 1
5000 resamples will be used to generate the effect size bootstraps.
```

```
shared_control.mean_diff
```

DABEST v0.2.1 ============= Good afternoon! The current time is Mon Mar 11 16:19:32 2019. The unpaired mean difference between Control 1 and Test 1 is 0.48 [95%CI 0.205, 0.774]. The two-sided p-value of the Mann-Whitney test is 0.000813. The unpaired mean difference between Control 1 and Test 2 is -0.542 [95%CI -0.915, -0.206]. The two-sided p-value of the Mann-Whitney test is 0.00572. The unpaired mean difference between Control 1 and Test 3 is 0.174 [95%CI -0.273, 0.647]. The two-sided p-value of the Mann-Whitney test is 0.205. The unpaired mean difference between Control 1 and Test 4 is 0.79 [95%CI 0.325, 1.33]. The two-sided p-value of the Mann-Whitney test is 0.0266. The unpaired mean difference between Control 1 and Test 5 is 0.265 [95%CI 0.0115, 0.497]. The two-sided p-value of the Mann-Whitney test is 0.0206. The unpaired mean difference between Control 1 and Test 6 is 0.288 [95%CI 0.00913, 0.524]. The two-sided p-value of the Mann-Whitney test is 0.0137. 5000 bootstrap samples were taken; the confidence interval is bias-corrected and accelerated. The p-value(s) reported are the likelihood(s) of observing the effect size(s), if the null hypothesis of zero difference is true. To get the results of all valid statistical tests, use .mean_diff.statistical_tests

```
>>> shared_control.mean_diff.plot()
```

`dabest`

thus empowers you to robustly perform and elegantly present
complex visualizations and statistics.

```
>>> multi_groups = dabest.load(df,
... idx=(("Control 1", "Test 1",),
... ("Control 2", "Test 2", "Test 3"),
... ("Control 3", "Test 4", "Test 5", "Test 6")
... )
... )
```

```
>>> multi_groups
```

```
DABEST v0.2.1
=============
Good afternoon!
The current time is Mon Mar 11 16:19:33 2019.
Effect size(s) with 95% confidence intervals will be computed for:
1. Test 1 minus Control 1
2. Test 2 minus Control 2
3. Test 3 minus Control 2
4. Test 4 minus Control 3
5. Test 5 minus Control 3
6. Test 6 minus Control 3
5000 resamples will be used to generate the effect size bootstraps.
```

```
>>> multi_groups.mean_diff
```

DABEST v0.2.1 ============= Good afternoon! The current time is Mon Mar 11 16:19:35 2019. The unpaired mean difference between Control 1 and Test 1 is 0.48 [95%CI 0.205, 0.774]. The two-sided p-value of the Mann-Whitney test is 0.000813. The unpaired mean difference between Control 2 and Test 2 is -1.38 [95%CI -1.93, -0.905]. The two-sided p-value of the Mann-Whitney test is 1.3e-05. The unpaired mean difference between Control 2 and Test 3 is -0.666 [95%CI -1.29, -0.0788]. The two-sided p-value of the Mann-Whitney test is 0.0219. The unpaired mean difference between Control 3 and Test 4 is 0.362 [95%CI -0.111, 0.901]. The two-sided p-value of the Mann-Whitney test is 0.182. The unpaired mean difference between Control 3 and Test 5 is -0.164 [95%CI -0.398, 0.0747]. The two-sided p-value of the Mann-Whitney test is 0.0778. The unpaired mean difference between Control 3 and Test 6 is -0.14 [95%CI -0.4, 0.0937]. The two-sided p-value of the Mann-Whitney test is 0.22. 5000 bootstrap samples were taken; the confidence interval is bias-corrected and accelerated. The p-value(s) reported are the likelihood(s) of observing the effect size(s), if the null hypothesis of zero difference is true. To get the results of all valid statistical tests, use .mean_diff.statistical_tests

```
>>> multi_groups.mean_diff.plot()
```

### Using long (aka ‘melted’) data frames¶

`dabest`

can also work with ‘melted’ or ‘long’ data.
This term isso used because each row will now correspond to a single datapoint,
with one column carrying the value and other columns carrying ‘metadata’
describing that datapoint.

More details on wide vs long or ‘melted’ data can be found in this Wikipedia article. The pandas documentation gives recipes for melting dataframes.

```
>>> x = 'group'
>>> y = 'metric'
>>> value_cols = df.columns[:-2] # select all but the "Gender" and "ID" columns.
>>> df_melted = pd.melt(df.reset_index(),
... id_vars=["Gender", "ID"],
... value_vars=value_cols,
... value_name=y,
... var_name=x)
>>> df_melted.head() # Gives the first five rows of `df_melted`.
```

Gender | ID | group | metric | |
---|---|---|---|---|

0 | Female | 1 | Control 1 | 2.793984 |

1 | Female | 2 | Control 1 | 3.236759 |

2 | Female | 3 | Control 1 | 3.019149 |

3 | Female | 4 | Control 1 | 2.804638 |

4 | Female | 5 | Control 1 | 2.858019 |

When your data is in this format, you will need to specify the `x`

and
`y`

columns in `dabest.load()`

.

```
>>> analysis_of_long_df = dabest.load(df_melted,
... idx=("Control 1", "Test 1"),
... x="group", y="metric")
>>> analysis_of_long_df
```

```
DABEST v0.2.1
=============
Good afternoon!
The current time is Mon Mar 11 16:19:36 2019.
Effect size(s) with 95% confidence intervals will be computed for:
1. Test 1 minus Control 1
5000 resamples will be used to generate the effect size bootstraps.
```

```
>>> analysis_of_long_df.mean_diff.plot()
```

### Controlling plot aesthetics¶

Changing the y-axes labels.

```
>>> two_groups_unpaired.mean_diff.plot(swarm_label="This is my\nrawdata",
contrast_label="The bootstrap\ndistribtions!")
```

Color the rawdata according to another column in the dataframe.

```
>>> multi_2group.mean_diff.plot(color_col="Gender")
```

```
>>> two_groups_paired.mean_diff.plot(color_col="Gender")
```

Changing the palette used with `custom_palette`

. Any valid matplotlib
or seaborn color palette is accepted.

```
>>> multi_2group.mean_diff.plot(color_col="Gender",
... custom_palette="Dark2")
```

```
>>> multi_2group.mean_diff.plot(custom_palette="Paired")
```

You can also create your own color palette. Create a dictionary where the keys are group names, and the values are valid matplotlib colors.

You can specify matplotlib colors in a variety of ways. Here, I demonstrate using named colors, hex strings (commonly used on the web), and RGB tuples.

```
>>> my_color_palette = {"Control 1" : "blue",
... "Test 1" : "purple",
... "Control 2" : "#cb4b16", # This is a hex string.
... "Test 2" : (0., 0.7, 0.2) # This is a RGB tuple.
... }
>>> multi_2group.mean_diff.plot(custom_palette=my_color_palette)
```

By default, `dabest`

will
desaturate
the color of the dots in the swarmplot by 50%.
This draws attention to the effect size bootstrap curves.

You can alter the default values with the `swarm_desat`

and
`halfviolin_desat`

keywords.

```
>>> multi_2group.mean_diff.plot(custom_palette=my_color_palette,
... swarm_desat=0.75,
... halfviolin_desat=0.25)
```

You can also change the sizes of the dots used in the rawdata swarmplot, and those used to indicate the effect sizes.

```
>>> multi_2group.mean_diff.plot(raw_marker_size=3, es_marker_size=12)
```

Changing the y-limits for the rawdata axes, and for the contrast axes.

```
>>> multi_2group.mean_diff.plot(swarm_ylim=(0, 5),
... contrast_ylim=(-2, 2))
```

If your effect size is qualitatively inverted (ie. a smaller value is a
better outcome), you can simply invert the tuple passed to
`contrast_ylim`

.

```
>>> multi_2group.mean_diff.plot(contrast_ylim=(2, -2),
>>> contrast_label="More negative is better!")
```

You can add minor ticks and also change the tick frequency by accessing the axes directly.

Each estimation plot produced by `dabest`

has 2 axes. The first one
contains the rawdata swarmplot; the second one contains the bootstrap
effect size differences.

```
>>> import matplotlib.ticker as Ticker
>>> f = two_groups_unpaired.mean_diff.plot()
>>> rawswarm_axes = f.axes[0]
>>> contrast_axes = f.axes[1]
>>> rawswarm_axes.yaxis.set_major_locator(Ticker.MultipleLocator(1))
>>> rawswarm_axes.yaxis.set_minor_locator(Ticker.MultipleLocator(0.5))
>>> contrast_axes.yaxis.set_major_locator(Ticker.MultipleLocator(0.5))
>>> contrast_axes.yaxis.set_minor_locator(Ticker.MultipleLocator(0.25))
```

```
>>> f = multi_2group.mean_diff.plot(swarm_ylim=(0,6),
contrast_ylim=(-3, 1))
>>> rawswarm_axes = f.axes[0]
>>> contrast_axes = f.axes[1]
>>> rawswarm_axes.yaxis.set_major_locator(Ticker.MultipleLocator(2))
>>> rawswarm_axes.yaxis.set_minor_locator(Ticker.MultipleLocator(1))
>>> contrast_axes.yaxis.set_major_locator(Ticker.MultipleLocator(0.5))
>>> contrast_axes.yaxis.set_minor_locator(Ticker.MultipleLocator(0.25))
```

With v0.2.1, `dabest`

can now apply matplotlib style
sheets
to estimation plots. You can refer to this
gallery
of style sheets for reference.

```
>>> import matplotlib.pyplot as plt
>>> plt.style.use("dark_background")
```

```
>>> multi_2group.mean_diff.plot()
```