.. _Tutorial:

================
Tutorial: Basics
================

Load Libraries
--------------

.. code-block:: python3
    :linenos:

    import numpy as np
    import pandas as pd
    import dabest

    print("We're using DABEST v{}".format(dabest.__version__))

.. parsed-literal::

    We're using DABEST v2023.02.14

Create dataset for demo
-----------------------

Here, we create a dataset to illustrate how ``dabest`` functions. In
this dataset, each column corresponds to a group of observations.

.. code-block:: python3
    :linenos:

    from scipy.stats import norm  # Used in generation of populations.

    np.random.seed(9999)  # Fix the seed so the results are replicable.
    Ns = 20  # The number of samples taken from each population.

    # Create samples.
    c1 = norm.rvs(loc=3, scale=0.4, size=Ns)
    c2 = norm.rvs(loc=3.5, scale=0.75, size=Ns)
    c3 = norm.rvs(loc=3.25, scale=0.4, size=Ns)

    t1 = norm.rvs(loc=3.5, scale=0.5, size=Ns)
    t2 = norm.rvs(loc=2.5, scale=0.6, size=Ns)
    t3 = norm.rvs(loc=3, scale=0.75, size=Ns)
    t4 = norm.rvs(loc=3.5, scale=0.75, size=Ns)
    t5 = norm.rvs(loc=3.25, scale=0.4, size=Ns)
    t6 = norm.rvs(loc=3.25, scale=0.4, size=Ns)

    # Add a `gender` column for coloring the data.
    # Note the integer division, so `np.repeat` receives an int.
    females = np.repeat('Female', Ns // 2).tolist()
    males = np.repeat('Male', Ns // 2).tolist()
    gender = females + males

    # Add an `id` column for paired data plotting.
    id_col = pd.Series(range(1, Ns + 1))

    # Combine samples and gender into a DataFrame.
    df = pd.DataFrame({'Control 1': c1, 'Test 1': t1,
                       'Control 2': c2, 'Test 2': t2,
                       'Control 3': c3, 'Test 3': t3,
                       'Test 4': t4, 'Test 5': t5, 'Test 6': t6,
                       'Gender': gender, 'ID': id_col})

Note that we have 9 groups (3 Control samples and 6 Test samples). Our
dataset also has a non-numerical column indicating gender, and another
column indicating the identity of each observation.

This is known as a ‘wide’ dataset. See this
`writeup `__
for more details.

.. code-block:: python3
    :linenos:

    df.head()

==  =========  ========  =========  ========  =========  ========  ========  ========  ========  ======  ==
..  Control 1  Test 1    Control 2  Test 2    Control 3  Test 3    Test 4    Test 5    Test 6    Gender  ID
==  =========  ========  =========  ========  =========  ========  ========  ========  ========  ======  ==
0   2.793984   3.420875  3.324661   1.707467  3.816940   1.796581  4.440050  2.937284  3.486127  Female  1
1   3.236759   3.467972  3.685186   1.121846  3.750358   3.944566  3.723494  2.837062  2.338094  Female  2
2   3.019149   4.377179  5.616891   3.301381  2.945397   2.832188  3.214014  3.111950  3.270897  Female  3
3   2.804638   4.564780  2.773152   2.534018  3.575179   3.048267  4.968278  3.743378  3.151188  Female  4
4   2.858019   3.220058  2.550361   2.796365  3.692138   3.276575  2.662104  2.977341  2.328601  Female  5
==  =========  ========  =========  ========  =========  ========  ========  ========  ========  ======  ==

Loading Data
------------

Before we create estimation plots and obtain confidence intervals for
our effect sizes, we need to load the data and the relevant groups.

We simply supply the DataFrame to ``dabest.load()``. We must also supply
the two groups we want to compare in the ``idx`` argument, as a tuple or
list.

.. code-block:: python3
    :linenos:

    two_groups_unpaired = dabest.load(df, idx=("Control 1", "Test 1"), resamples=5000)

Calling this ``Dabest`` object gives you a gentle greeting, as well as
the comparisons that can be computed.

.. code-block:: python3
    :linenos:

    two_groups_unpaired

.. parsed-literal::

    DABEST v2023.02.14
    ==================

    Good evening!
    The current time is Sun Aug 29 18:00:54 2021.

    Effect size(s) with 95% confidence intervals will be computed for:
    1. Test 1 minus Control 1

    5000 resamples will be used to generate the effect size bootstraps.

Changing statistical parameters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You can change the width of the confidence interval that will be
produced by manipulating the ``ci`` argument.

.. code-block:: python3
    :linenos:

    two_groups_unpaired_ci90 = dabest.load(df, idx=("Control 1", "Test 1"), ci=90)

.. code-block:: python3
    :linenos:

    two_groups_unpaired_ci90

.. parsed-literal::

    DABEST v2023.02.14
    ==================

    Good afternoon!
    The current time is Mon Oct 19 17:12:44 2020.

    Effect size(s) with 90% confidence intervals will be computed for:
    1. Test 1 minus Control 1

    5000 resamples will be used to generate the effect size bootstraps.

Effect sizes
------------

``dabest`` now features a range of effect sizes:

- the mean difference (``mean_diff``)
- the median difference (``median_diff``)
- `Cohen’s d `__ (``cohens_d``)
- `Hedges’ g `__ (``hedges_g``)
- `Cliff’s delta `__ (``cliffs_delta``)

Each of these is an attribute of the ``Dabest`` object.

.. code-block:: python3
    :linenos:

    two_groups_unpaired.mean_diff

.. parsed-literal::

    DABEST v2023.02.14
    ==================

    Good evening!
    The current time is Sun Aug 29 18:10:44 2021.

    The unpaired mean difference between Control 1 and Test 1 is 0.48 [95%CI 0.221, 0.768].
    The p-value of the two-sided permutation t-test is 0.001, calculated for legacy purposes only.

    5000 bootstrap samples were taken; the confidence interval is bias-corrected and accelerated.
    Any p-value reported is the probability of observing the effect size (or greater),
    assuming the null hypothesis of zero difference is true.
    For each p-value, 5000 reshuffles of the control and test labels were performed.

    To get the results of all valid statistical tests, use `.mean_diff.statistical_tests`

For each comparison, the type of effect size is reported (here, it’s the
“unpaired mean difference”). The confidence interval is reported as:

[*confidenceIntervalWidth* *LowerBound*, *UpperBound*]

This confidence interval is generated through bootstrap resampling. See
:doc:`bootstraps` for more details.
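To see where such an interval comes from, the resampling loop can be sketched in plain ``numpy``. Note that DABEST computes a bias-corrected and accelerated (BCa) interval; the simpler *percentile* interval below, computed on made-up data, is only meant to convey the mechanics:

```python
import numpy as np

rng = np.random.default_rng(9999)

# Two illustrative samples (not the tutorial's dataset).
control = rng.normal(loc=3.0, scale=0.4, size=20)
test = rng.normal(loc=3.5, scale=0.5, size=20)

observed = test.mean() - control.mean()

# Resample each group with replacement and recompute the mean difference.
boots = np.array([
    rng.choice(test, size=test.size, replace=True).mean()
    - rng.choice(control, size=control.size, replace=True).mean()
    for _ in range(5000)
])

# A 95% percentile interval; DABEST reports the BCa-corrected version.
low, high = np.percentile(boots, [2.5, 97.5])
print(f"mean difference: {observed:.3f} [95%CI {low:.3f}, {high:.3f}]")
```

The interval is simply the spread of the 5000 recomputed effect sizes; the BCa correction additionally adjusts the percentile cut-points for bias and skew.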

Since v0.3.0, DABEST will report the p-value of the `non-parametric two-sided approximate permutation t-test `__. This is also known as the Monte Carlo permutation test.

For unpaired comparisons, the p-values and test statistics of `Welch's t test `__, `Student's t test `__, and the `Mann-Whitney U test `__ are also provided. For paired comparisons, the p-values and test statistics of the `paired Student's t `__ and `Wilcoxon `__ tests are presented.
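The permutation p-value reported above can likewise be sketched in a few lines: pool the two groups, shuffle the labels many times, and count how often the shuffled mean difference is at least as extreme as the observed one. This is an illustrative Monte Carlo version on made-up data, not DABEST's exact implementation:

```python
import numpy as np

rng = np.random.default_rng(12345)

# Illustrative samples (not the tutorial's dataset).
control = rng.normal(loc=3.0, scale=0.4, size=20)
test = rng.normal(loc=3.5, scale=0.5, size=20)

observed = test.mean() - control.mean()
pooled = np.concatenate([control, test])

n_perm = 5000
count = 0
for _ in range(n_perm):
    shuffled = rng.permutation(pooled)
    # Reassign the first 20 values as "control" and the rest as "test".
    diff = shuffled[20:].mean() - shuffled[:20].mean()
    if abs(diff) >= abs(observed):  # Two-sided test.
        count += 1

pvalue = count / n_perm
print(f"approximate permutation p-value: {pvalue:.4f}")
```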

.. code-block:: python3
    :linenos:

    pd.options.display.max_columns = 50
    two_groups_unpaired.mean_diff.results

(The single-row ``results`` DataFrame is shown transposed here for readability.)

======================  ===================================================
control                 Control 1
test                    Test 1
control_N               20
test_N                  20
effect_size             mean difference
is_paired               None
difference              0.48029
ci                      95
bca_low                 0.220869
bca_high                0.767721
bca_interval_idx        (140, 4889)
pct_low                 0.215697
pct_high                0.761716
pct_interval_idx        (125, 4875)
bootstraps              [0.6686169333655454, 0.4382051534234943, 0.665...
resamples               5000
random_seed             12345
permutations            [-0.17259843762502491, 0.03802293852634886, -0...
pvalue_permutation      0.001
permutation_count       5000
permutations_var        [0.026356588154404337, 0.027102495439046997, 0...
pvalue_welch            0.002094
statistic_welch         -3.308806
pvalue_students_t       0.002057
statistic_students_t    -3.308806
pvalue_mann_whitney     0.001625
statistic_mann_whitney  83.0
======================  ===================================================

.. code-block:: python3
    :linenos:

    two_groups_unpaired.mean_diff.statistical_tests

(The single-row DataFrame is shown transposed here for readability.)

======================  ================
control                 Control 1
test                    Test 1
control_N               20
test_N                  20
effect_size             mean difference
is_paired               None
difference              0.48029
ci                      95
bca_low                 0.220869
bca_high                0.767721
pvalue_permutation      0.001
pvalue_welch            0.002094
statistic_welch         -3.308806
pvalue_students_t       0.002057
statistic_students_t    -3.308806
pvalue_mann_whitney     0.001625
statistic_mann_whitney  83.0
======================  ================

Let’s compute the Hedges’ *g* for our comparison.

.. code-block:: python3
    :linenos:

    two_groups_unpaired.hedges_g

.. parsed-literal::

    DABEST v2023.02.14
    ==================

    Good evening!
    The current time is Sun Aug 29 18:12:17 2021.

    The unpaired Hedges' g between Control 1 and Test 1 is 1.03 [95%CI 0.349, 1.62].
    The p-value of the two-sided permutation t-test is 0.001, calculated for legacy purposes only.

    5000 bootstrap samples were taken; the confidence interval is bias-corrected and accelerated.
    Any p-value reported is the probability of observing the effect size (or greater),
    assuming the null hypothesis of zero difference is true.
    For each p-value, 5000 reshuffles of the control and test labels were performed.

    To get the results of all valid statistical tests, use `.hedges_g.statistical_tests`
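For intuition about this standardised effect size: Cohen's *d* divides the mean difference by a pooled standard deviation, and Hedges' *g* multiplies *d* by a small-sample bias correction. The sketch below uses the common pooled-SD and approximate-correction formulas on toy data; DABEST's exact implementation may differ in detail:

```python
import numpy as np

def cohens_d(x, y):
    # Pooled standard deviation with (n1 + n2 - 2) degrees of freedom.
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * np.var(x, ddof=1)
                  + (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2)
    return (np.mean(y) - np.mean(x)) / np.sqrt(pooled_var)

def hedges_g(x, y):
    # Approximate small-sample correction J = 1 - 3 / (4 * dof - 1).
    dof = len(x) + len(y) - 2
    return cohens_d(x, y) * (1 - 3 / (4 * dof - 1))

# Toy data: g is slightly smaller than d, since the correction shrinks it.
x = [2.8, 3.2, 3.0, 2.8, 2.9]
y = [3.4, 3.5, 4.4, 4.6, 3.2]
print(round(cohens_d(x, y), 3), round(hedges_g(x, y), 3))
```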

.. code-block:: python3
    :linenos:

    two_groups_unpaired.hedges_g.results

(The single-row DataFrame is shown transposed here for readability.)

======================  ===================================================
control                 Control 1
test                    Test 1
control_N               20
test_N                  20
effect_size             Hedges' g
is_paired               None
difference              1.025525
ci                      95
bca_low                 0.349394
bca_high                1.618579
bca_interval_idx        (42, 4724)
pct_low                 0.472844
pct_high                1.74166
pct_interval_idx        (125, 4875)
bootstraps              [1.1337301267831184, 0.8311210968422604, 1.539...
resamples               5000
random_seed             12345
permutations            [-0.3295089865590538, 0.07158401210924781, -0....
pvalue_permutation      0.001
permutation_count       5000
permutations_var        [0.026356588154404337, 0.027102495439046997, 0...
pvalue_welch            0.002094
statistic_welch         -3.308806
pvalue_students_t       0.002057
statistic_students_t    -3.308806
pvalue_mann_whitney     0.001625
statistic_mann_whitney  83.0
======================  ===================================================

Producing estimation plots
--------------------------

To produce a **Gardner-Altman estimation plot**, simply use the
``.plot()`` method. You can read more about its genesis and design
inspiration at :doc:`robust-beautiful`.

Every effect size instance has access to the ``.plot()`` method, so you
can quickly create plots for different effect sizes.

.. code-block:: python3
    :linenos:

    two_groups_unpaired.mean_diff.plot();

.. image:: _images/tutorial_27_0.png

.. code-block:: python3
    :linenos:

    two_groups_unpaired.hedges_g.plot();

.. image:: _images/tutorial_28_0.png

Instead of a Gardner-Altman plot, you can produce a **Cumming estimation
plot** by setting ``float_contrast=False`` in the ``plot()`` method.
This will plot the bootstrap effect sizes below the raw data, and will
also display the mean (gap) and ± standard deviation of each group
(vertical ends) as gapped lines. This design was inspired by Edward
Tufte’s dictum to maximise the data-ink ratio.

.. code-block:: python3
    :linenos:

    two_groups_unpaired.hedges_g.plot(float_contrast=False);

.. image:: _images/tutorial_30_0.png

The ``dabest`` package also implements a range of estimation plot
designs aimed at depicting common experimental designs.

The **multi-two-group estimation plot** tiles two or more Cumming plots
horizontally, and is created by passing a *nested tuple* to ``idx`` when
``dabest.load()`` is first invoked.

Thus, the lower panel of the Cumming plot is effectively a `forest
plot `__, used in
meta-analyses to aggregate and compare data from different experiments.

.. code-block:: python3
    :linenos:

    multi_2group = dabest.load(df, idx=(("Control 1", "Test 1"),
                                        ("Control 2", "Test 2")
                                        ))

    multi_2group.mean_diff.plot();

.. image:: _images/tutorial_35_0.png

The **shared control plot** displays another common experimental
paradigm, where several test samples are compared against a common
reference sample.

This type of Cumming plot is automatically generated if the tuple passed
to ``idx`` has more than two data columns.

.. code-block:: python3
    :linenos:

    shared_control = dabest.load(df, idx=("Control 1", "Test 1",
                                          "Test 2", "Test 3",
                                          "Test 4", "Test 5", "Test 6")
                                 )

.. code-block:: python3
    :linenos:

    shared_control

.. parsed-literal::

    DABEST v2023.02.14
    ==================

    Good evening!
    The current time is Tue Aug 31 23:39:22 2021.

    Effect size(s) with 95% confidence intervals will be computed for:
    1. Test 1 minus Control 1
    2. Test 2 minus Control 1
    3. Test 3 minus Control 1
    4. Test 4 minus Control 1
    5. Test 5 minus Control 1
    6. Test 6 minus Control 1

    5000 resamples will be used to generate the effect size bootstraps.

.. code-block:: python3
    :linenos:

    shared_control.mean_diff

.. parsed-literal::

    DABEST v2023.02.14
    ==================

    Good evening!
    The current time is Tue Aug 31 23:42:39 2021.

    The unpaired mean difference between Control 1 and Test 1 is 0.48 [95%CI 0.221, 0.768].
    The p-value of the two-sided permutation t-test is 0.001, calculated for legacy purposes only.

    The unpaired mean difference between Control 1 and Test 2 is -0.542 [95%CI -0.914, -0.211].
    The p-value of the two-sided permutation t-test is 0.0042, calculated for legacy purposes only.

    The unpaired mean difference between Control 1 and Test 3 is 0.174 [95%CI -0.295, 0.628].
    The p-value of the two-sided permutation t-test is 0.479, calculated for legacy purposes only.

    The unpaired mean difference between Control 1 and Test 4 is 0.79 [95%CI 0.306, 1.31].
    The p-value of the two-sided permutation t-test is 0.0042, calculated for legacy purposes only.

    The unpaired mean difference between Control 1 and Test 5 is 0.265 [95%CI 0.0137, 0.497].
    The p-value of the two-sided permutation t-test is 0.0404, calculated for legacy purposes only.

    The unpaired mean difference between Control 1 and Test 6 is 0.288 [95%CI -0.00441, 0.515].
    The p-value of the two-sided permutation t-test is 0.0324, calculated for legacy purposes only.

    5000 bootstrap samples were taken; the confidence interval is bias-corrected and accelerated.
    Any p-value reported is the probability of observing the effect size (or greater),
    assuming the null hypothesis of zero difference is true.
    For each p-value, 5000 reshuffles of the control and test labels were performed.

    To get the results of all valid statistical tests, use `.mean_diff.statistical_tests`

.. code-block:: python3
    :linenos:

    shared_control.mean_diff.plot();

.. image:: _images/tutorial_42_0.png

``dabest`` thus empowers you to robustly perform and elegantly present
complex visualizations and statistics.

.. code-block:: python3
    :linenos:

    multi_groups = dabest.load(df, idx=(("Control 1", "Test 1"),
                                        ("Control 2", "Test 2", "Test 3"),
                                        ("Control 3", "Test 4", "Test 5", "Test 6")
                                        ))

.. code-block:: python3
    :linenos:

    multi_groups

.. parsed-literal::

    DABEST v2023.02.14
    ==================

    Good evening!
    The current time is Tue Aug 31 23:47:40 2021.

    Effect size(s) with 95% confidence intervals will be computed for:
    1. Test 1 minus Control 1
    2. Test 2 minus Control 2
    3. Test 3 minus Control 2
    4. Test 4 minus Control 3
    5. Test 5 minus Control 3
    6. Test 6 minus Control 3

    5000 resamples will be used to generate the effect size bootstraps.

.. code-block:: python3
    :linenos:

    multi_groups.mean_diff

.. parsed-literal::

    DABEST v2023.02.14
    ==================

    Good evening!
    The current time is Tue Aug 31 23:48:17 2021.

    The unpaired mean difference between Control 1 and Test 1 is 0.48 [95%CI 0.221, 0.768].
    The p-value of the two-sided permutation t-test is 0.001, calculated for legacy purposes only.

    The unpaired mean difference between Control 2 and Test 2 is -1.38 [95%CI -1.93, -0.895].
    The p-value of the two-sided permutation t-test is 0.0, calculated for legacy purposes only.

    The unpaired mean difference between Control 2 and Test 3 is -0.666 [95%CI -1.3, -0.103].
    The p-value of the two-sided permutation t-test is 0.0352, calculated for legacy purposes only.

    The unpaired mean difference between Control 3 and Test 4 is 0.362 [95%CI -0.114, 0.887].
    The p-value of the two-sided permutation t-test is 0.161, calculated for legacy purposes only.

    The unpaired mean difference between Control 3 and Test 5 is -0.164 [95%CI -0.404, 0.0742].
    The p-value of the two-sided permutation t-test is 0.208, calculated for legacy purposes only.

    The unpaired mean difference between Control 3 and Test 6 is -0.14 [95%CI -0.398, 0.102].
    The p-value of the two-sided permutation t-test is 0.282, calculated for legacy purposes only.

    5000 bootstrap samples were taken; the confidence interval is bias-corrected and accelerated.
    Any p-value reported is the probability of observing the effect size (or greater),
    assuming the null hypothesis of zero difference is true.
    For each p-value, 5000 reshuffles of the control and test labels were performed.

    To get the results of all valid statistical tests, use `.mean_diff.statistical_tests`

.. code-block:: python3
    :linenos:

    multi_groups.mean_diff.plot();

.. image:: _images/tutorial_47_0.png

Using long (aka ‘melted’) data frames
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

``dabest`` can also work with ‘melted’ or ‘long’ data. This format is so
named because each row corresponds to a single datapoint, with one
column carrying the value and the other columns carrying ‘metadata’
describing that datapoint.

More details on wide vs long or ‘melted’ data can be found in this
`Wikipedia
article `__. The
`pandas
documentation `__
gives recipes for melting dataframes.

.. code-block:: python3
    :linenos:

    x = 'group'
    y = 'metric'

    value_cols = df.columns[:-2]  # Select all but the "Gender" and "ID" columns.

    df_melted = pd.melt(df.reset_index(),
                        id_vars=["Gender", "ID"],
                        value_vars=value_cols,
                        value_name=y,
                        var_name=x)

    df_melted.head()  # Gives the first five rows of `df_melted`.

==  ======  ==  =========  ========
..  Gender  ID  group      metric
==  ======  ==  =========  ========
0   Female  1   Control 1  2.793984
1   Female  2   Control 1  3.236759
2   Female  3   Control 1  3.019149
3   Female  4   Control 1  2.804638
4   Female  5   Control 1  2.858019
==  ======  ==  =========  ========

When your data is in this format, you will need to specify the ``x`` and
``y`` columns in ``dabest.load()``.

.. code-block:: python3
    :linenos:

    analysis_of_long_df = dabest.load(df_melted, idx=("Control 1", "Test 1"),
                                      x="group", y="metric")

    analysis_of_long_df

.. parsed-literal::

    DABEST v2023.02.14
    ==================

    Good evening!
    The current time is Tue Aug 31 23:51:12 2021.

    Effect size(s) with 95% confidence intervals will be computed for:
    1. Test 1 minus Control 1

    5000 resamples will be used to generate the effect size bootstraps.

.. code-block:: python3
    :linenos:

    analysis_of_long_df.mean_diff.plot();

.. image:: _images/tutorial_52_0.png
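
Finally, a note on saving these figures: in recent versions of DABEST, ``.plot()`` returns a matplotlib ``Figure`` (an assumption worth checking against your installed version), so the standard ``savefig`` call applies. A matplotlib-only sketch, using a stand-in figure in place of an estimation plot:

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # Non-interactive backend; safe for scripts.
import matplotlib.pyplot as plt

# Stand-in for the Figure returned by e.g. two_groups_unpaired.mean_diff.plot().
fig, ax = plt.subplots(figsize=(4, 3))
ax.plot([0, 1], [0, 1])

# Save at print resolution; bbox_inches="tight" trims surplus whitespace.
out_path = os.path.join(tempfile.gettempdir(), "estimation_plot.png")
fig.savefig(out_path, dpi=300, bbox_inches="tight")
```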