import numpy as np
import pandas as pd
import scipy as sp
import dabest
Loading Data
Loading data and relevant groups
load
load (data, idx=None, x=None, y=None, paired=None, id_col=None, ci=95, resamples=5000, random_seed=12345, proportional=False, delta2=False, experiment=None, experiment_label=None, x1_level=None, mini_meta=False)
Loads data in preparation for estimation statistics.
This is designed to work with pandas DataFrames.
Parameter | Type | Default | Details |
---|---|---|---|
data | pandas DataFrame | | |
idx | NoneType | None | List of column names (if ‘x’ is not supplied) or of category names (if ‘x’ is supplied). This can be expressed as a tuple of tuples, with each individual tuple producing its own contrast plot |
x | NoneType | None | Column name(s) of the independent variable. This can be expressed as a list of 2 elements if and only if ‘delta2’ is True; otherwise it can only be a string. |
y | NoneType | None | Column name of the dependent variable, i.e. the data to be plotted on the y-axis. |
paired | NoneType | None | The pairing scheme of the experiment. If ‘paired’ is None, the data are treated as unpaired in subsequent calculations. If ‘paired’ is ‘baseline’, then within each tuple of ‘idx’ every other group is paired with the first group (the control). If ‘paired’ is ‘sequential’, then within each tuple of ‘idx’ each group is paired with the group before it (as control). |
id_col | NoneType | None | Name of the column that identifies each subject or sample. Required if ‘paired’ is ‘baseline’ or ‘sequential’. |
ci | int | 95 | The confidence interval width. The default of 95 produces 95% confidence intervals. |
resamples | int | 5000 | The number of resamples taken to generate the bootstraps which are used to generate the confidence intervals. |
random_seed | int | 12345 | This integer is used to seed the random number generator during bootstrap resampling, ensuring that the confidence intervals reported are replicable. |
proportional | bool | False | Whether the data are binary. When True, the data must consist only of the values 0 and 1; proportion data containing non-numeric values, such as the strings ‘yes’ and ‘no’, are not supported. When False or not provided, the data are assumed to be continuous. |
delta2 | bool | False | Indicator of delta-delta experiment |
experiment | NoneType | None | The name of the column of the dataframe which contains the label of experiments |
experiment_label | NoneType | None | |
x1_level | NoneType | None | A list of strings specifying the order of subplots for delta-delta plots. This can be expressed as a list of 2 elements if and only if ‘delta2’ is True; otherwise it can only be a string. |
mini_meta | bool | False | Indicator of weighted delta calculation. |
Returns | A Dabest object. |
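The ‘x’, ‘y’, ‘paired’ and ‘id_col’ parameters described above can be illustrated with a small sketch. The column names `ID`, `group` and `value` and the helper `load_examples` are hypothetical, chosen only for this illustration. `dabest` is imported inside the helper so that the data-preparation part runs on its own.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(88888)
N = 10

# Wide format: one column per group, plus a subject identifier for pairing.
wide_df = pd.DataFrame({
    "ID": np.arange(N),
    "Control 1": rng.normal(loc=100, scale=5, size=N),
    "Test 1": rng.normal(loc=115, scale=5, size=N),
})

# Long format: one row per observation, with the group label in its own column.
long_df = wide_df.melt(id_vars="ID", var_name="group", value_name="value")


def load_examples(wide, long):
    """Hypothetical helper showing two ways of calling dabest.load."""
    import dabest

    # Paired data: id_col names the column that links rows of the same subject.
    paired = dabest.load(wide, idx=("Control 1", "Test 1"),
                         paired="baseline", id_col="ID")
    # Long-format data: idx lists category names found in column x.
    unpaired = dabest.load(long, x="group", y="value",
                           idx=("Control 1", "Test 1"))
    return paired, unpaired
```

With wide-format data ‘idx’ refers to column names; with long-format data it refers to the categories found in the ‘x’ column.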
prop_dataset
prop_dataset (group:Union[list,tuple,numpy.ndarray,dict], group_names:Optional[list]=None)
Convenience function to generate a dataframe of binary data.
Parameter | Type | Default | Details |
---|---|---|---|
group | Union[list, tuple, np.ndarray, dict] | | Accepts lists, tuples, or numpy ndarrays of numeric types. |
group_names | Optional[list] | None | |
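To show what a dataframe of binary data looks like, here is a minimal sketch that builds one by hand from counts of zeros and ones. It does not call `prop_dataset` and makes no assumption about how that function interprets its arguments. The `counts` dictionary and its values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical counts of (zeros, ones) observed in each group.
counts = {"Control 1": (8, 2), "Test 1": (5, 5)}

# Expand each pair of counts into a column of 0s followed by 1s.
binary_df = pd.DataFrame({
    name: np.repeat([0, 1], [n_zero, n_one])
    for name, (n_zero, n_one) in counts.items()
})
```

Each column here has the same length, which a wide-format DataFrame built from arrays requires.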
Example
Create dummy data for demonstration.
np.random.seed(88888)
N = 10
c1 = sp.stats.norm.rvs(loc=100, scale=5, size=N)
t1 = sp.stats.norm.rvs(loc=115, scale=5, size=N)
df = pd.DataFrame({"Control 1": c1, "Test 1": t1})
Load the data.
my_data = dabest.load(df, idx=("Control 1", "Test 1"))
my_data
DABEST v2024.03.29
==================
Good afternoon!
The current time is Tue Mar 19 15:34:58 2024.
Effect size(s) with 95% confidence intervals will be computed for:
1. Test 1 minus Control 1
5000 resamples will be used to generate the effect size bootstraps.
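The roles of ‘ci’, ‘resamples’ and ‘random_seed’ can be illustrated with a plain percentile bootstrap of the mean difference. This is only a sketch of the general idea: the function `bootstrap_mean_diff_ci` is hypothetical, and DABEST's own interval calculation may differ from it.

```python
import numpy as np


def bootstrap_mean_diff_ci(control, test, ci=95, resamples=5000, random_seed=12345):
    """Percentile bootstrap CI for mean(test) - mean(control)."""
    rng = np.random.default_rng(random_seed)
    control = np.asarray(control, dtype=float)
    test = np.asarray(test, dtype=float)

    diffs = np.empty(resamples)
    for i in range(resamples):
        # Resample each group with replacement, keeping its original size.
        c = rng.choice(control, size=control.size, replace=True)
        t = rng.choice(test, size=test.size, replace=True)
        diffs[i] = t.mean() - c.mean()

    # For ci=95 this takes the 2.5th and 97.5th percentiles.
    alpha = (100 - ci) / 2
    low, high = np.percentile(diffs, [alpha, 100 - alpha])
    return test.mean() - control.mean(), low, high


rng = np.random.default_rng(88888)
c1 = rng.normal(loc=100, scale=5, size=10)
t1 = rng.normal(loc=115, scale=5, size=10)
diff, low, high = bootstrap_mean_diff_ci(c1, t1)
print(f"mean difference = {diff:.2f}, 95% CI [{low:.2f}, {high:.2f}]")
```

Because the generator is seeded, calling the function again with the same inputs returns the same interval.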
Load binary data for a proportion plot.
np.random.seed(88888)
N = 10
c1 = np.random.binomial(1, 0.2, size=N)
t1 = np.random.binomial(1, 0.5, size=N)
df = pd.DataFrame({"Control 1": c1, "Test 1": t1})
my_data = dabest.load(df, idx=("Control 1", "Test 1"), proportional=True)
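Since ‘proportional’ expects values of 0 and 1 rather than strings such as ‘yes’ and ‘no’, string-coded responses need to be converted before loading. The following is a minimal sketch of one way to do that with pandas. The `raw` data and the `mapping` dictionary are hypothetical.

```python
import pandas as pd

# Hypothetical raw responses recorded as strings.
raw = pd.DataFrame({
    "Control 1": ["no", "no", "yes", "no", "no"],
    "Test 1": ["yes", "no", "yes", "yes", "no"],
})

# Map each label to 0 or 1; labels missing from the mapping become NaN.
mapping = {"no": 0, "yes": 1}
binary = raw.apply(lambda col: col.map(mapping))

# Fail early if any label was not covered by the mapping.
if binary.isna().any().any():
    raise ValueError("unexpected label in the raw data")

binary = binary.astype(int)
```

The resulting `binary` DataFrame contains only 0s and 1s and could then be passed to `dabest.load` with `proportional=True`.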