effsize
two_group_difference
def two_group_difference(
control:list | tuple | np.ndarray, # Accepts lists, tuples, or numpy ndarrays of numeric types.
test:list | tuple | np.ndarray, # Accepts lists, tuples, or numpy ndarrays of numeric types.
is_paired:NoneType=None, # If not None, returns the paired Cohen's d
effect_size:str='mean_diff', # Any one of the following effect sizes: ["mean_diff", "median_diff", "cohens_d", "hedges_g", "cliffs_delta"]
)->float: # The desired effect size.
Computes the following metrics for control and test:
- Unstandardized mean difference
- Standardized mean differences (paired or unpaired)
* Cohen's d
* Hedges' g
- Median difference
- Cliff's Delta
- Cohen's h (distance between two proportions)
See the Wikipedia entry here
effect_size:
mean_diff: This is simply the mean of `control` subtracted from
the mean of `test`.
cohens_d: This is the mean of control subtracted from the
mean of test, divided by the pooled standard deviation
of control and test. The pooled SD is computed as:
\[ s_{pooled} = \sqrt{\frac{(n_1 - 1)\,\mathrm{var}(\text{control}) + (n_2 - 1)\,\mathrm{var}(\text{test})}{n_1 + n_2 - 2}} \]
where n1 and n2 are the sizes of control and test
respectively.
hedges_g: This is Cohen's d corrected for bias via multiplication
with the following correction factor:
\[ J(n) = \frac{\Gamma(n/2)}{\sqrt{n/2}\;\Gamma\left(\frac{n - 1}{2}\right)} \]
where n = (n1 + n2 - 2).
median_diff: This is the median of `control` subtracted from the
median of `test`.
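As a worked illustration of the formulas above, take two hypothetical samples, control = [1, 2, 3, 4, 5] and test = [3, 4, 5, 6, 7]. Here n1 = n2 = 5, so n = 8. Both sample variances (with N - 1 degrees of freedom) equal 2.5, and mean_diff and median_diff are both 5 - 3 = 2:
\[ s_{pooled} = \sqrt{\frac{(5 - 1)(2.5) + (5 - 1)(2.5)}{5 + 5 - 2}} = \sqrt{2.5} \approx 1.581, \qquad d = \frac{5 - 3}{1.581} \approx 1.265 \]
\[ J(8) = \frac{\Gamma(4)}{\sqrt{4}\;\Gamma(3.5)} = \frac{6}{2 \times 3.323} \approx 0.903, \qquad g = J(8) \times d \approx 1.142 \]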
func_difference
def func_difference(
control:list | tuple | np.ndarray, # NaNs are automatically discarded.
test:list | tuple | np.ndarray, # NaNs are automatically discarded.
func, # Summary function to apply.
is_paired:str, # If not None, computes func(test - control). If None, computes func(test) - func(control).
)->float:
Applies func to control and test, and then returns the difference.
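A minimal sketch of the behaviour described above, assuming array-like numeric inputs that have equal length in the paired case. The name `func_difference_sketch` is hypothetical, and dropping a whole pair whenever either of its values is NaN is an assumption about how "NaNs are automatically discarded" applies to paired data.

```python
import numpy as np


def func_difference_sketch(control, test, func, is_paired=None):
    """Hypothetical sketch of func_difference (not the library's code)."""
    c = np.asarray(control, dtype=float)
    t = np.asarray(test, dtype=float)
    if is_paired is not None:
        # Paired (assumption): drop any pair in which either value is NaN,
        # then summarise the per-observation differences with func.
        keep = ~(np.isnan(c) | np.isnan(t))
        return float(func(t[keep] - c[keep]))
    # Unpaired: drop NaNs from each group independently, then subtract.
    return float(func(t[~np.isnan(t)]) - func(c[~np.isnan(c)]))
```

With `func=np.median` the two branches generally differ: the paired branch returns the median of the differences, while the unpaired branch returns the difference of the medians.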
cohens_d
def cohens_d(
control:list | tuple | np.ndarray, test:list | tuple | np.ndarray,
is_paired:str=None, # If not None, the paired Cohen's d is returned.
)->float:
Computes Cohen’s d for test vs. control.
If is_paired is None, returns:
\[ \frac{\bar{X}_2 - \bar{X}_1}{s_{pooled}} \]
where
\[ s_{pooled} = \sqrt{\frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}} \]
If is_paired is not None, returns:
\[ \frac{\bar{X}_2 - \bar{X}_1}{s_{avg}} \]
where
\[ s_{avg} = \sqrt{\frac{s_1^2 + s_2^2}{2}} \]
Notes:
- The sample variance (and standard deviation) uses N - 1 degrees of freedom. This is an application of Bessel’s correction, and yields the unbiased sample variance.
References:
- https://en.wikipedia.org/wiki/Bessel%27s_correction
- https://en.wikipedia.org/wiki/Standard_deviation#Corrected_sample_standard_deviation
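The two Cohen's d formulas above can be sketched with NumPy as follows. `cohens_d_sketch` is a hypothetical name; like the description above, it treats any non-None `is_paired` as a request for the average-SD version, and it does no NaN handling.

```python
import numpy as np


def cohens_d_sketch(control, test, is_paired=None):
    """Hypothetical sketch of the pooled-SD and average-SD Cohen's d."""
    c = np.asarray(control, dtype=float)
    t = np.asarray(test, dtype=float)
    n1, n2 = len(c), len(t)
    # ddof=1 gives sample variances with N - 1 degrees of freedom (Bessel).
    var1, var2 = np.var(c, ddof=1), np.var(t, ddof=1)
    if is_paired is None:
        # Unpaired: pooled SD weighted by each group's degrees of freedom.
        denom = np.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    else:
        # Paired: square root of the average of the two variances.
        denom = np.sqrt((var1 + var2) / 2)
    return float((t.mean() - c.mean()) / denom)
```

For equal group sizes the two denominators coincide; they diverge only when n1 and n2 differ.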
cohens_h
def cohens_h(
control:list | tuple | np.ndarray, test:list | tuple | np.ndarray
)->float:
Computes Cohen’s h for test vs. control.
Notes:
- The input data are assumed to be binary, i.e. a series of 0s and 1s, with a dict mapping the 0s and 1s to the actual labels, e.g. {1: “Smoker”, 0: “Non-smoker”}.
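Cohen's h is conventionally defined as the difference of arcsine-transformed proportions, h = 2 arcsin(√p_test) - 2 arcsin(√p_control). A minimal sketch, assuming 0/1 inputs and the test-minus-control sign used elsewhere on this page; `cohens_h_sketch` is a hypothetical name.

```python
import numpy as np


def cohens_h_sketch(control, test):
    """Hypothetical sketch: Cohen's h for two binary (0/1) samples."""
    # The mean of a 0/1 series is the proportion of 1s.
    p_control = np.mean(np.asarray(control, dtype=float))
    p_test = np.mean(np.asarray(test, dtype=float))
    # Arcsine-transform each proportion, then take test minus control.
    phi_control = 2 * np.arcsin(np.sqrt(p_control))
    phi_test = 2 * np.arcsin(np.sqrt(p_test))
    return float(phi_test - phi_control)
```

The arcsine transform makes equal differences in h roughly equally detectable regardless of where the proportions lie in [0, 1], which a raw difference of proportions does not.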
hedges_g
def hedges_g(
control:list | tuple | np.ndarray, test:list | tuple | np.ndarray, is_paired:str=None
)->float:
Computes Hedges’ g for test vs. control. It first computes Cohen’s d, then calculates a correction factor based on the total degrees of freedom using the gamma function.
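A minimal unpaired sketch of these steps: compute Cohen's d with the pooled SD, then multiply by J(n) with n = n1 + n2 - 2. The helper names are hypothetical. J is evaluated through `math.lgamma` because Γ(n/2) overflows a float for large samples; the common closed-form approximation J ≈ 1 - 3/(4n - 1) is included only for comparison and is not necessarily what this library uses. The sketch assumes at least four observations in total (n ≥ 2).

```python
import math

import numpy as np


def hedges_correction(n1: int, n2: int) -> float:
    """Exact J(n) for n = n1 + n2 - 2 (hypothetical helper name)."""
    n = n1 + n2 - 2
    # Gamma(n/2) / Gamma((n-1)/2), computed in log space to avoid overflow.
    log_ratio = math.lgamma(n / 2) - math.lgamma((n - 1) / 2)
    return math.exp(log_ratio) / math.sqrt(n / 2)


def hedges_correction_approx(n1: int, n2: int) -> float:
    """Common closed-form approximation J(n) ~ 1 - 3 / (4n - 1)."""
    n = n1 + n2 - 2
    return 1 - 3 / (4 * n - 1)


def hedges_g_sketch(control, test) -> float:
    """Hypothetical unpaired Hedges' g: Cohen's d (pooled SD) times J(n)."""
    c = np.asarray(control, dtype=float)
    t = np.asarray(test, dtype=float)
    n1, n2 = len(c), len(t)
    pooled_var = ((n1 - 1) * np.var(c, ddof=1)
                  + (n2 - 1) * np.var(t, ddof=1)) / (n1 + n2 - 2)
    d = (t.mean() - c.mean()) / math.sqrt(pooled_var)
    return float(d * hedges_correction(n1, n2))
```

Because J(n) < 1 and approaches 1 as n grows, the correction shrinks d noticeably only for small samples.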
cliffs_delta
def cliffs_delta(
control:list | tuple | np.ndarray, test:list | tuple | np.ndarray
)->float:
Computes Cliff’s delta for two samples.
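Cliff's delta counts, over all (test, control) pairs, how often a test value exceeds a control value minus how often it falls below, divided by n1 × n2. A minimal NumPy sketch, assuming a positive delta means test values tend to be larger (the test-vs-control sign used elsewhere on this page); `cliffs_delta_sketch` is a hypothetical name.

```python
import numpy as np


def cliffs_delta_sketch(control, test) -> float:
    """Hypothetical sketch: Cliff's delta for test vs. control."""
    c = np.asarray(control, dtype=float)
    t = np.asarray(test, dtype=float)
    # Broadcast to an (n_test, n_control) grid comparing every pair once.
    greater = np.sum(t[:, None] > c[None, :])
    less = np.sum(t[:, None] < c[None, :])
    # Ties count as neither, so the result lies in [-1, 1].
    return float((greater - less) / (len(c) * len(t)))
```

The comparison grid uses memory proportional to n1 × n2. An equivalent value is 2U / (n1 n2) - 1, where U is the Mann-Whitney U statistic of test relative to control with ties counted as one half, which avoids building the grid for large samples.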
weighted_delta
def weighted_delta(
difference, bootstrap_dist_var
):
Computes the weighted delta: a weighted average of the differences, where the weight of each difference is the inverse of its variance (`bootstrap_dist_var`).
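Read as an inverse-variance weighted mean, the computation can be sketched as below. This assumes `difference` and `bootstrap_dist_var` are equal-length arrays holding per-experiment deltas and their variances; `weighted_delta_sketch` is a hypothetical name, not this library's implementation.

```python
import numpy as np


def weighted_delta_sketch(difference, bootstrap_dist_var) -> float:
    """Hypothetical sketch: inverse-variance weighted mean of the deltas."""
    diff = np.asarray(difference, dtype=float)
    var = np.asarray(bootstrap_dist_var, dtype=float)
    # Smaller variance (a more precise estimate) gets a larger weight.
    weight = 1.0 / var
    return float(np.sum(weight * diff) / np.sum(weight))
```

With equal variances this reduces to the plain mean of the differences.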