effsize

A range of functions to compute various effect sizes.


two_group_difference


def two_group_difference(
    control:list | tuple | np.ndarray, # Accepts lists, tuples, or numpy ndarrays of numeric types.
    test:list | tuple | np.ndarray, # Accepts lists, tuples, or numpy ndarrays of numeric types.
    is_paired:NoneType=None, # If not None, returns the paired Cohen's d
    effect_size:str='mean_diff', # Any one of the following effect sizes: ["mean_diff", "median_diff", "cohens_d", "hedges_g", "cliffs_delta"]
)->float: # The desired effect size.

Computes the following metrics for control and test:

- Unstandardized mean difference
- Standardized mean differences (paired or unpaired)
    * Cohen's d
    * Hedges' g
- Median difference
- Cliff's Delta
- Cohen's h (distance between two proportions)

See the Wikipedia entry on effect size for background.

effect_size:

mean_diff:      This is simply the mean of `control` subtracted from
                the mean of `test`.

cohens_d:       This is the mean of `control` subtracted from the
                mean of `test`, divided by the pooled standard deviation
                of `control` and `test`. The pooled SD is computed as:

                       (n1 - 1) * var(control) + (n2 - 1) * var(test)
                sqrt (   -------------------------------------------  )
                                         (n1 + n2 - 2)

                where n1 and n2 are the sizes of control and test
                respectively.

hedges_g:       This is Cohen's d corrected for bias via multiplication
                with the following correction factor:

                                gamma(n/2)
                J(n) = ------------------------------
                       sqrt(n/2) * gamma((n - 1) / 2)

                where n = (n1 + n2 - 2).

median_diff:    This is the median of `control` subtracted from the
                median of `test`.
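
The correction factor J(n) given under hedges_g can be evaluated directly from its definition. The following is a minimal sketch using only the standard library; the helper name `hedges_correction` is hypothetical and not part of this module, and the use of log-gamma (to avoid overflow for large samples) is a choice made here for illustration, not necessarily what the library does.

```python
import math


def hedges_correction(n1: int, n2: int) -> float:
    """Evaluate J(n) with n = n1 + n2 - 2 (hypothetical helper for illustration)."""
    n = n1 + n2 - 2
    if n < 2:
        # gamma((n - 1) / 2) is undefined at 0, so at least 2 degrees of freedom are needed.
        raise ValueError("n1 + n2 - 2 must be at least 2")
    # log J(n) = lgamma(n/2) - 0.5 * log(n/2) - lgamma((n - 1)/2)
    log_j = math.lgamma(n / 2) - 0.5 * math.log(n / 2) - math.lgamma((n - 1) / 2)
    return math.exp(log_j)


small = hedges_correction(5, 5)      # n = 8, about 0.903
large = hedges_correction(500, 500)  # n = 998, very close to 1
print(small, large)
```

The factor is always below 1 and approaches 1 as the sample sizes grow, so the correction matters most for small groups.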


func_difference


def func_difference(
    control:list | tuple | np.ndarray, # NaNs are automatically discarded.
    test:list | tuple | np.ndarray, # NaNs are automatically discarded.
    func, # Summary function to apply.
    is_paired:str, # If not None, computes func(test - control). If None, computes func(test) - func(control).
)->float:

Applies func to control and test, and then returns the difference.
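
To make the paired and unpaired branches concrete, here is a rough standalone sketch of the behaviour described in the signature comments. It uses plain lists and the standard library rather than NumPy; the name `func_difference_sketch` is hypothetical, and dropping a whole pair when either value is NaN is an assumption about how NaNs are discarded in the paired case.

```python
import math
import statistics


def func_difference_sketch(control, test, func, is_paired=None):
    """Illustrative re-statement of the documented behaviour (not the library code)."""
    if is_paired is not None:
        if len(control) != len(test):
            raise ValueError("paired data must have the same length")
        # Assumption: a pair is dropped if either member is NaN.
        diffs = [t - c for c, t in zip(control, test)
                 if not (math.isnan(c) or math.isnan(t))]
        return func(diffs)  # func(test - control)
    clean_control = [c for c in control if not math.isnan(c)]
    clean_test = [t for t in test if not math.isnan(t)]
    return func(clean_test) - func(clean_control)  # func(test) - func(control)


control = [1.0, 2.0, 3.0, float("nan")]
test = [2.0, 4.0, 9.0, 7.0]

unpaired = func_difference_sketch(control, test, statistics.median)
# Any non-None value selects the paired branch in this sketch.
paired = func_difference_sketch(control, test, statistics.median, is_paired="paired")
print(unpaired, paired)  # 3.5 2.0
```

For the mean, the two branches agree whenever no values are dropped; for other summary functions such as the median they generally differ, as the example shows.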



cohens_d


def cohens_d(
    control:list | tuple | np.ndarray, test:list | tuple | np.ndarray,
    is_paired:str=None, # If not None, the paired Cohen's d is returned.
)->float:

Computes Cohen’s d for test vs. control.

If is_paired is None, returns:

\[ \frac{\bar{X}_2 - \bar{X}_1}{s_{pooled}} \]

where

\[ s_{pooled} = \sqrt{\frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}} \]

If is_paired is not None, returns:

\[ \frac{\bar{X}_2 - \bar{X}_1}{s_{avg}} \]

where

\[ s_{avg} = \sqrt{\frac{s_1^2 + s_2^2}{2}} \]

Notes:

  • The sample variance (and standard deviation) uses N-1 degrees of freedom. This is an application of Bessel’s correction, and yields the unbiased sample variance.

References:

- https://en.wikipedia.org/wiki/Bessel%27s_correction
- https://en.wikipedia.org/wiki/Standard_deviation#Corrected_sample_standard_deviation
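
The two formulas for Cohen’s d can be checked by hand with a short standalone sketch. It uses the standard library's `statistics.variance`, which applies the N-1 denominator mentioned in the notes. The function name `cohens_d_sketch` is hypothetical, and the sketch omits any NaN handling the library may perform.

```python
import math
import statistics


def cohens_d_sketch(control, test, is_paired=None):
    """Cohen's d following the formulas above (illustrative only)."""
    n1, n2 = len(control), len(test)
    var1 = statistics.variance(control)  # s_1^2, N-1 denominator
    var2 = statistics.variance(test)     # s_2^2, N-1 denominator
    mean_diff = statistics.mean(test) - statistics.mean(control)
    if is_paired is not None:
        if n1 != n2:
            raise ValueError("paired data must have the same length")
        divisor = math.sqrt((var1 + var2) / 2)  # s_avg
    else:
        divisor = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))  # s_pooled
    return mean_diff / divisor


control = [1.0, 2.0, 3.0, 4.0, 5.0]
print(cohens_d_sketch(control, [2.0, 4.0, 6.0, 8.0, 10.0]))        # about 1.2
print(cohens_d_sketch(control, [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]))  # about 1.342
```

When both groups have the same size, s_pooled and s_avg reduce to the same value, so the paired and unpaired results coincide for complete data.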


cohens_h


def cohens_h(
    control:list | tuple | np.ndarray, test:list | tuple | np.ndarray
)->float:

Computes Cohen’s h for test vs. control.

Notes:

  • The input data are assumed to be binary, i.e. a series of 0s and 1s, with a dict mapping the 0s and 1s to the actual labels, e.g. {1: "Smoker", 0: "Non-smoker"}.
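
As a rough illustration of what Cohen’s h measures, the sketch below uses the standard textbook definition: the difference between the arcsine-transformed proportions of 1s in each group. The function name `cohens_h_sketch` is hypothetical, and the sign convention (test minus control) is an assumption chosen to match the other effect sizes on this page.

```python
import math


def cohens_h_sketch(control, test):
    """Cohen's h for two binary (0/1) samples, textbook definition (illustrative)."""
    for sample in (control, test):
        if any(v not in (0, 1) for v in sample):
            raise ValueError("inputs must contain only 0s and 1s")
    p_control = sum(control) / len(control)
    p_test = sum(test) / len(test)
    # Arcsine transformation: phi = 2 * asin(sqrt(p))
    phi_control = 2 * math.asin(math.sqrt(p_control))
    phi_test = 2 * math.asin(math.sqrt(p_test))
    return phi_test - phi_control


# With labels {1: "Smoker", 0: "Non-smoker"}: 2 of 10 smokers vs. 5 of 10 smokers.
control = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
test = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
print(cohens_h_sketch(control, test))  # about 0.6435
```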


hedges_g


def hedges_g(
    control:list | tuple | np.ndarray, test:list | tuple | np.ndarray, is_paired:str=None
)->float:

Computes Hedges’ g for test vs. control. It first computes Cohen’s d, then calculates a correction factor based on the total degrees of freedom using the gamma function.




cliffs_delta


def cliffs_delta(
    control:list | tuple | np.ndarray, test:list | tuple | np.ndarray
)->float:

Computes Cliff’s delta for two samples.
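
Cliff’s delta can be read as the proportion of (test, control) pairs in which the test value is larger, minus the proportion in which it is smaller. The brute-force sketch below follows that definition directly; it is meant for illustration only, the name `cliffs_delta_sketch` is hypothetical, and the library may well compute the same quantity by a different route.

```python
def cliffs_delta_sketch(control, test):
    """Cliff's delta by direct pairwise comparison (illustrative, O(n1 * n2))."""
    greater = sum(1 for t in test for c in control if t > c)
    less = sum(1 for t in test for c in control if t < c)
    # Ties contribute to neither count.
    return (greater - less) / (len(test) * len(control))


print(cliffs_delta_sketch([1, 2, 3], [2, 3, 4]))  # 5/9, about 0.556
print(cliffs_delta_sketch([1, 2, 3], [4, 5, 6]))  # 1.0, complete separation
```

The result ranges from -1 (every test value below every control value) to 1 (every test value above every control value).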



weighted_delta


def weighted_delta(
    difference, bootstrap_dist_var
):

Computes the weighted deltas, where the weight is the inverse of the pooled group difference.
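
The description does not spell out the formula, so the following is only a sketch under a stated assumption: each weight is taken to be the reciprocal of the corresponding entry in `bootstrap_dist_var` (a conventional inverse-variance weighting), and the result is the weighted average of `difference`. The name `weighted_delta_sketch` is hypothetical.

```python
def weighted_delta_sketch(difference, bootstrap_dist_var):
    """Inverse-variance weighted average of the differences (assumed formula)."""
    if len(difference) != len(bootstrap_dist_var):
        raise ValueError("difference and bootstrap_dist_var must have the same length")
    # Assumption: weight_i = 1 / bootstrap_dist_var_i
    weights = [1.0 / v for v in bootstrap_dist_var]
    return sum(d * w for d, w in zip(difference, weights)) / sum(weights)


# The first difference has the smaller variance, so it dominates the result.
print(weighted_delta_sketch([1.0, 3.0], [1.0, 4.0]))  # about 1.4
```

Under this assumption, equal variances reduce the result to the plain mean of the differences.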