API

Loading Data

dabest.load(data, idx, x=None, y=None, paired=False, id_col=None, ci=95, resamples=5000, random_seed=12345)

Loads data in preparation for estimation statistics.

This is designed to work with pandas DataFrames.

Parameters:
  • data (pandas DataFrame) –
  • idx (tuple) – Column names (if ‘x’ is not supplied) or category names (if ‘x’ is supplied). This can be expressed as a tuple of tuples, with each inner tuple producing its own contrast plot.
  • x, y (strings, default None) – Column names for data to be plotted on the x-axis and y-axis.
  • paired (boolean, default False) –
  • id_col (default None) – Required if paired is True.
  • ci (integer, default 95) – The confidence interval width. The default of 95 produces 95% confidence intervals.
  • resamples (integer, default 5000) – The number of bootstrap resamples used to generate the confidence intervals.
  • random_seed (int, default 12345) – This integer is used to seed the random number generator during bootstrap resampling, ensuring that the confidence intervals reported are replicable.
Returns:

  • A Dabest object.

Example

Load libraries.

>>> import numpy as np
>>> import pandas as pd
>>> import scipy as sp
>>> import dabest

Create dummy data for demonstration.

>>> np.random.seed(88888)
>>> N = 10
>>> c1 = sp.stats.norm.rvs(loc=100, scale=5, size=N)
>>> t1 = sp.stats.norm.rvs(loc=115, scale=5, size=N)
>>> df = pd.DataFrame({'Control 1' : c1, 'Test 1': t1})

Load the data.

>>> my_data = dabest.load(df, idx=("Control 1", "Test 1"))
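To make the ci, resamples, and random_seed parameters more concrete, here is a minimal sketch of a percentile bootstrap confidence interval for a mean difference, written with the standard library only. It is an illustration under simplifying assumptions, not dabest's implementation; the function name percentile_bootstrap_ci and the sample values are made up for this example.

```python
import random


def percentile_bootstrap_ci(control, test, ci=95, resamples=5000,
                            random_seed=12345):
    """Percentile bootstrap CI for mean(test) - mean(control).

    Illustrative only; this is not how dabest is implemented.
    """
    rng = random.Random(random_seed)  # seeding makes the interval replicable
    boots = []
    for _ in range(resamples):
        # Resample each group with replacement, keeping its original size.
        c = rng.choices(control, k=len(control))
        t = rng.choices(test, k=len(test))
        boots.append(sum(t) / len(t) - sum(c) / len(c))
    boots.sort()
    tail = (100 - ci) / 200  # probability mass trimmed from each tail
    low_idx = round(resamples * tail)
    high_idx = min(resamples - 1, round(resamples * (1 - tail)))
    return boots[low_idx], boots[high_idx]


# Made-up measurements for two groups of ten observations each.
control = [95, 98, 100, 101, 103, 104, 99, 97, 102, 100]
test = [112, 115, 118, 113, 117, 120, 114, 116, 111, 119]

low, high = percentile_bootstrap_ci(control, test)
print(f"95% CI for the mean difference: [{low:.2f}, {high:.2f}]")
```

Calling the function twice with the same random_seed returns the same interval, and a larger ci gives an interval at least as wide.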

Plotting Data

dabest._classes.EffectSizeDataFrame.plot(self, color_col=None, raw_marker_size=6, es_marker_size=9, swarm_label=None, contrast_label=None, swarm_ylim=None, contrast_ylim=None, custom_palette=None, swarm_desat=0.5, halfviolin_desat=1, halfviolin_alpha=0.8, float_contrast=True, show_pairs=True, group_summaries=None, group_summaries_offset=0.1, fig_size=None, dpi=100, swarmplot_kwargs=None, violinplot_kwargs=None, slopegraph_kwargs=None, reflines_kwargs=None, group_summary_kwargs=None, legend_kwargs=None)

Creates an estimation plot for the effect size of interest.

Parameters:
  • color_col (string, default None) – Column to be used for colors.
  • raw_marker_size (float, default 6) – The diameter (in points) of the marker dots plotted in the swarmplot.
  • es_marker_size (float, default 9) – The size (in points) of the effect size points on the difference axes.
  • swarm_label, contrast_label (strings, default None) – Set labels for the y-axis of the swarmplot and the contrast plot, respectively. If swarm_label is not specified, it defaults to “value”, unless a column name was passed to y. If contrast_label is not specified, it defaults to the effect size being plotted.
  • swarm_ylim, contrast_ylim (tuples, default None) – The desired y-limits of the raw data (swarmplot) axes and the difference axes respectively, each as a tuple. These will be autoscaled to sensible values if they are not specified.
  • custom_palette (dict, list, or matplotlib color palette, default None) – This keyword accepts a dictionary with {‘group’:’color’} pairings, a list of RGB colors, or a specified matplotlib palette. This palette will be used to color the swarmplot. If color_col is not specified, then each group will be colored in sequence according to the default palette currently used by matplotlib. Please take a look at the seaborn commands color_palette and cubehelix_palette to generate a custom palette. Both these functions generate a list of RGB colors. See: https://seaborn.pydata.org/generated/seaborn.color_palette.html https://seaborn.pydata.org/generated/seaborn.cubehelix_palette.html The named colors of matplotlib can be found here: https://matplotlib.org/examples/color/named_colors.html
  • swarm_desat (float, default 0.5) – Decreases the saturation of the colors in the swarmplot by the desired proportion. Uses seaborn.desaturate() to achieve this.
  • halfviolin_desat (float, default 1) – Decreases the saturation of the colors of the half-violin bootstrap curves by the desired proportion. Uses seaborn.desaturate() to achieve this.
  • halfviolin_alpha (float, default 0.8) – The alpha (transparency) level of the half-violin bootstrap curves.
  • float_contrast (boolean, default True) – Whether or not to display the halfviolin bootstrapped difference distribution alongside the raw data.
  • show_pairs (boolean, default True) – If the data is paired, whether to show the raw data as a slopegraph, with a line joining each pair of observations (True), or as a swarmplot (False).
  • group_summaries (['mean_sd', 'median_quartiles', 'None'], default None) – Plots the summary statistics for each group. If ‘mean_sd’, then the mean and standard deviation of each group are plotted as a notched line beside each group. If ‘median_quartiles’, then the median and the 25th and 75th percentiles of each group are plotted instead. If ‘None’, the summaries are not shown.
  • group_summaries_offset (float, default 0.1) – If group summaries are displayed, they will be offset from the raw data swarmplot groups by this value.
  • fig_size (tuple, default None) – The desired dimensions of the figure as a (length, width) tuple.
  • dpi (int, default 100) – The dots per inch of the resulting figure.
  • swarmplot_kwargs (dict, default None) – Pass any keyword arguments accepted by the seaborn swarmplot command here, as a dict. If None, the following keywords are passed to sns.swarmplot: {'size': raw_marker_size}.
  • violinplot_kwargs (dict, default None) – Pass any keyword arguments accepted by the matplotlib pyplot.violinplot command here, as a dict. If None, the following keywords are passed to violinplot: {'widths': 0.5, 'vert': True, 'showextrema': False, 'showmedians': False}.
  • reflines_kwargs (dict, default None) – This will change the appearance of the zero reference lines. Pass any keyword arguments accepted by the matplotlib Axes hlines command here, as a dict. If None, the following keywords are passed to Axes.hlines: {'linestyle': 'solid', 'linewidth': 0.75, 'zorder': 2, 'color': default y-tick color}.
  • group_summary_kwargs (dict, default None) – Pass any keyword arguments accepted by the matplotlib.lines.Line2D command here, as a dict. This will change the appearance of the vertical summary lines for each group, if group_summaries is not ‘None’. If None, the following keywords are passed to matplotlib.lines.Line2D: {'lw': 2, 'alpha': 1, 'zorder': 3}.
  • legend_kwargs (dict, default None) – Pass any keyword arguments accepted by the matplotlib Axes legend command here, as a dict. If None, the following keywords are passed to matplotlib.Axes.legend: {'loc': 'upper left', 'frameon': False}.
Returns:

  • A matplotlib.figure.Figure with 2 Axes.
  • The first axes (accessible with FigName.axes[0]) contains the raw data swarmplot; the second axes (accessible with FigName.axes[1]) shows the bootstrap distributions and effect sizes (with confidence intervals).
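The swarm_desat and halfviolin_desat parameters are described as being passed to seaborn.desaturate(). As a rough illustration of what such a proportion does, the sketch below scales the saturation channel of an RGB color using only the standard library. It is an approximation written for this page, assuming desaturation multiplies the HLS saturation by the given proportion; it is not seaborn's or dabest's code.

```python
import colorsys


def desaturate(rgb, prop):
    """Scale the saturation of an (r, g, b) color by prop (0 to 1).

    A stand-in written for illustration; the real plotting code is
    documented as calling seaborn.desaturate() instead.
    """
    if not 0 <= prop <= 1:
        raise ValueError("prop must be between 0 and 1")
    h, l, s = colorsys.rgb_to_hls(*rgb)
    # Hue and lightness are kept; only the saturation channel shrinks.
    return colorsys.hls_to_rgb(h, l, s * prop)


pure_red = (1.0, 0.0, 0.0)
print(desaturate(pure_red, 1))    # proportion 1 leaves the color unchanged
print(desaturate(pure_red, 0.5))  # half the original saturation
print(desaturate(pure_red, 0))    # proportion 0 removes all saturation
```

Under this reading, a value of 1 leaves colors untouched and smaller values mute them.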

Examples

Create a Gardner-Altman estimation plot for the mean difference.

>>> my_data = dabest.load(df, idx=("Control 1", "Test 1"))
>>> fig1 = my_data.mean_diff.plot()

Create a Gardner-Altman plot for the Hedges’ g effect size.

>>> fig2 = my_data.hedges_g.plot()

Create a Cumming estimation plot for the mean difference.

>>> fig3 = my_data.mean_diff.plot(float_contrast=False)

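Create a paired Gardner-Altman plot. Because id_col is required when paired is True, an identifier column (here named "ID", chosen for illustration) is added first.

>>> df["ID"] = range(len(df))
>>> my_data_paired = dabest.load(df, idx=("Control 1", "Test 1"),
...                              paired=True, id_col="ID")
>>> fig4 = my_data_paired.mean_diff.plot()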

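Create a multi-group Cumming plot. This assumes df also contains "Control 2" and "Test 2" columns, which the dummy data above does not define.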

>>> my_multi_groups = dabest.load(df, idx=(("Control 1", "Test 1"),
...                                        ("Control 2", "Test 2"))
...                               )
>>> fig5 = my_multi_groups.mean_diff.plot()

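Create a shared control Cumming plot. This assumes df also contains "Test 2" and "Test 3" columns, which the dummy data above does not define.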

>>> my_shared_control = dabest.load(df, idx=("Control 1", "Test 1",
...                                          "Test 2", "Test 3")
...                                 )
>>> fig6 = my_shared_control.mean_diff.plot()
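The examples above pass idx either as a flat tuple or as a tuple of tuples. The sketch below shows one way to read those two forms, assuming (as the shared control example suggests) that the first name in each tuple is the control and every later name is compared against it. The helper contrasts_from_idx is hypothetical and is not part of dabest.

```python
def contrasts_from_idx(idx):
    """List the (control, test) comparisons implied by an idx value.

    Returns one list of pairs per contrast plot. Hypothetical helper
    for illustration; dabest does not expose a function by this name.
    """
    # Treat a flat tuple of names as a single group.
    if all(isinstance(name, str) for name in idx):
        idx = (idx,)
    # Within each group, the first name is taken to be the control.
    return [[(group[0], test) for test in group[1:]] for group in idx]


two_groups = ("Control 1", "Test 1")
multi_groups = (("Control 1", "Test 1"), ("Control 2", "Test 2"))
shared_control = ("Control 1", "Test 1", "Test 2", "Test 3")

for idx in (two_groups, multi_groups, shared_control):
    print(contrasts_from_idx(idx))
```

A flat tuple yields a single plot whose comparisons all share one control, while a tuple of tuples yields one plot per inner tuple.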

Computing Effect Sizes

class dabest._classes.TwoGroupsEffectSize(control, test, effect_size, is_paired=False, ci=95, resamples=5000, random_seed=12345)

Compute the effect size between two groups.

Parameters:
  • control, test (array-like) – These should be numerical iterables.
  • effect_size (string) – One of the following: ‘mean_diff’, ‘median_diff’, ‘cohens_d’, ‘hedges_g’, or ‘cliffs_delta’.
  • is_paired (boolean, default False) –
  • resamples (int, default 5000) – The number of bootstrap resamples to be taken.
  • ci (float, default 95) – The confidence interval width. The default of 95 produces 95% confidence intervals.
  • random_seed (int, default 12345) – random_seed is used to seed the random number generator during bootstrap resampling. This ensures that the confidence intervals reported are replicable.
Returns:

  • A TwoGroupsEffectSize object, with the following attributes:
  • difference (float) – The effect size of the difference between the control and the test.
  • effect_size (string) – The type of effect size reported.
  • is_paired (boolean) – Whether or not the difference is paired (i.e. repeated measures).
  • ci (float) – Returns the width of the confidence interval, in percent.
  • alpha (float) – Returns the significance level of the statistical test as a float between 0 and 1.
  • resamples (int) – The number of resamples performed during the bootstrap procedure.
  • bootstraps (numpy ndarray) – The generated bootstraps of the effect size.
  • random_seed (int) – The number used to initialise the numpy random seed generator, i.e. seed_value from numpy.random.seed(seed_value) is returned.
  • bca_low, bca_high (float) – The lower and upper limits, respectively, of the bias-corrected and accelerated confidence interval.
  • pct_low, pct_high (float) – The lower and upper limits, respectively, of the percentile confidence interval.
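For orientation, the five accepted effect_size values can be sketched with their common textbook definitions for unpaired data. These functions are illustrative assumptions written with the standard library; dabest's own calculations may differ in detail, for example in the exact small-sample correction used for Hedges' g or in how paired data is handled.

```python
import math
import statistics


def mean_diff(control, test):
    return statistics.mean(test) - statistics.mean(control)


def median_diff(control, test):
    return statistics.median(test) - statistics.median(control)


def cohens_d(control, test):
    # Mean difference standardised by the pooled sample standard deviation.
    n1, n2 = len(control), len(test)
    pooled_var = ((n1 - 1) * statistics.variance(control)
                  + (n2 - 1) * statistics.variance(test)) / (n1 + n2 - 2)
    return mean_diff(control, test) / math.sqrt(pooled_var)


def hedges_g(control, test):
    # Cohen's d with the widely used approximate small-sample correction.
    n = len(control) + len(test)
    return cohens_d(control, test) * (1 - 3 / (4 * n - 9))


def cliffs_delta(control, test):
    # Share of (test, control) pairs where test is larger, minus the
    # share where it is smaller.
    more = sum(t > c for t in test for c in control)
    less = sum(t < c for t in test for c in control)
    return (more - less) / (len(control) * len(test))


control = [1, 2, 3, 4, 5]
test = [3, 4, 5, 6, 7]
for func in (mean_diff, median_diff, cohens_d, hedges_g, cliffs_delta):
    print(func.__name__, round(func(control, test), 3))
```

All five are written as test minus control, so a positive value means the test group is larger.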

Examples

>>> import numpy as np
>>> import scipy as sp
>>> import dabest
>>> np.random.seed(12345)
>>> control = sp.stats.norm.rvs(loc=0, size=30)
>>> test = sp.stats.norm.rvs(loc=0.5, size=30)
>>> effsize = dabest.TwoGroupsEffectSize(control, test, "mean_diff")
>>> effsize
The unpaired mean difference is -0.253 [95%CI -0.782, 0.241]
5000 bootstrap samples. The confidence interval is bias-corrected
and accelerated.
>>> effsize.to_dict()
{'alpha': 0.05,
 'bca_high': 0.2413346581369784,
 'bca_interval_idx': (109, 4858),
 'bca_low': -0.7818088458343655,
 'bootstraps': array([-1.09875628, -1.08840014, -1.08258695, ...,  0.66675324,
         0.75814087,  0.80848265]),
 'ci': 95,
 'difference': -0.25315417702752846,
 'effect_size': 'mean difference',
 'is_paired': False,
 'pct_high': 0.25135646125431527,
 'pct_interval_idx': (125, 4875),
 'pct_low': -0.763588353717278,
 'pvalue_brunner_munzel': nan,
 'pvalue_kruskal': nan,
 'pvalue_mann_whitney': 0.2600723060808019,
 'pvalue_paired_students_t': nan,
 'pvalue_students_t': 0.34743913903372836,
 'pvalue_welch': 0.3474493875548965,
 'pvalue_wilcoxon': nan,
 'random_seed': 12345,
 'resamples': 5000,
 'statistic_brunner_munzel': nan,
 'statistic_kruskal': nan,
 'statistic_mann_whitney': 406.0,
 'statistic_paired_students_t': nan,
 'statistic_students_t': 0.9472545159069105,
 'statistic_welch': 0.9472545159069105,
 'statistic_wilcoxon': nan}
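The 'alpha' and 'pct_interval_idx' entries in the output above can be related to ci and resamples with a little arithmetic. The sketch below is an assumption about that relationship rather than dabest's own code, and both helper names are hypothetical. For ci=95 and resamples=5000 it gives the same values as the output: 0.05 and (125, 4875).

```python
def alpha_from_ci(ci):
    """Convert a confidence interval width in percent to alpha."""
    return (100 - ci) / 100


def percentile_interval_idx(ci, resamples):
    """Positions of the percentile limits in the sorted bootstraps.

    Half of alpha is trimmed from each tail of the sorted bootstrap
    array (hypothetical helper, for illustration only).
    """
    low = round(resamples * (100 - ci) / 200)
    return low, resamples - low


print(alpha_from_ci(95))                  # 0.05
print(percentile_interval_idx(95, 5000))  # (125, 4875)
```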