plot_tools

A set of convenience functions used for producing plots in dabest.

sankeydiag

 sankeydiag (data:pandas.core.frame.DataFrame, xvar:str, yvar:str,
             left_idx:str, right_idx:str, left_labels:list=None,
             right_labels:list=None, palette:Union[str,dict]=None,
             ax=None, flow:bool=True, sankey:bool=True,
             one_sankey:bool=False, width:float=0.4,
             right_color:bool=False, align:str='center', alpha:float=0.65,
             **kwargs)

Reads a melted pd.DataFrame and draws multiple sankey diagrams on a single axes, using the values in column yvar grouped according to the values in column xvar. left_idx is the value in column xvar placed on the left side of each sankey diagram; right_idx is the value in column xvar placed on the right side.

| Parameter | Type | Default | Details |
|---|---|---|---|
| data | pd.DataFrame | | The melted DataFrame to plot. |
| xvar | str | | x column to be plotted. |
| yvar | str | | y column to be plotted. |
| left_idx | str | | The value in column xvar that is on the left side of each sankey diagram. |
| right_idx | str | | The value in column xvar that is on the right side of each sankey diagram. If len(left_idx) == 1, left_idx is broadcast to the same length as right_idx; otherwise, left_idx must have the same length as right_idx. |
| left_labels | list | None | Labels for the left side of the diagram. The diagram will be sorted by these labels. |
| right_labels | list | None | Labels for the right side of the diagram. The diagram will be sorted by these labels. |
| palette | Union[str, dict] | None | |
| ax | NoneType | None | matplotlib axes to be drawn on. |
| flow | bool | True | If True, draw the sankey in a flow; otherwise, draw a 1 vs 1 sankey diagram for each group comparison. |
| sankey | bool | True | If True, draw the sankey diagram; otherwise, draw a barplot. |
| one_sankey | bool | False | Set by the driver function in plotter.py. If True, draw the sankey diagram across the whole raw data axes. |
| width | float | 0.4 | The width of each sankey diagram. |
| right_color | bool | False | If True, each strip of the diagram will be colored according to the corresponding left labels. |
| align | str | center | The alignment of each sankey diagram; can be ‘center’ or ‘left’. |
| alpha | float | 0.65 | The transparency of each strip. |
| kwargs | | | |
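The broadcasting rule described for left_idx and right_idx can be sketched in plain Python. This is only an illustration of the rule as documented above, not dabest's implementation: the helper name `pair_indices` and the use of `ValueError` are assumptions.

```python
def pair_indices(left_idx, right_idx):
    """Pair each left group with a right group (hypothetical helper)."""
    # Accept either a single string or a list of strings on each side.
    left = [left_idx] if isinstance(left_idx, str) else list(left_idx)
    right = [right_idx] if isinstance(right_idx, str) else list(right_idx)

    # A single left value is repeated to match the number of right values.
    if len(left) == 1:
        left = left * len(right)

    # Otherwise both sides must already have the same length.
    if len(left) != len(right):
        raise ValueError("left_idx and right_idx must have the same length")

    return list(zip(left, right))


pairs = pair_indices("Control", ["Test 1", "Test 2"])
```

Each resulting pair would correspond to one left/right comparison, that is, one sankey diagram on the axes.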

single_sankey

 single_sankey (left:np.array, right:np.array, xpos:float=0,
                left_weight:np.array=None, right_weight:np.array=None,
                colorDict:dict=None, left_labels:list=None,
                right_labels:list=None, ax=None, flow:bool=True,
                sankey:bool=True, width=0.5, alpha=0.65, bar_width=0.2,
                error_bar_on:bool=True, strip_on:bool=True,
                one_sankey:bool=False, right_color:bool=False,
                align:str='center')

Makes a single Sankey diagram showing the proportion flow from left to right. The original code is from https://github.com/anazalea/pySankey; it was changed to normalize each diagram’s height to 1.

| Parameter | Type | Default | Details |
|---|---|---|---|
| left | np.array | | Data on the left of the diagram. |
| right | np.array | | Data on the right of the diagram; len(left) == len(right). |
| xpos | float | 0 | The starting point on the x-axis. |
| left_weight | np.array | None | Weights for the left labels. If None, all weights are 1. |
| right_weight | np.array | None | Weights for the right labels. If None, the corresponding left_weight values are used. |
| colorDict | dict | None | Input format: {‘label’: ‘color’}. |
| left_labels | list | None | Labels for the left side of the diagram. The diagram will be sorted by these labels. |
| right_labels | list | None | Labels for the right side of the diagram. The diagram will be sorted by these labels. |
| ax | NoneType | None | matplotlib axes to be drawn on. |
| flow | bool | True | If True, draw the sankey in a flow; otherwise, draw a 1 vs 1 sankey diagram for each group comparison. |
| sankey | bool | True | If True, draw the sankey diagram; otherwise, draw a barplot. |
| width | float | 0.5 | |
| alpha | float | 0.65 | |
| bar_width | float | 0.2 | |
| error_bar_on | bool | True | If True, draw an error bar for each group comparison. |
| strip_on | bool | True | If True, draw a strip for each group comparison. |
| one_sankey | bool | False | If True, only draw one sankey diagram. |
| right_color | bool | False | If True, each strip of the diagram will be colored according to the corresponding left labels. |
| align | str | center | If ‘center’, the diagram will be centered on each xtick; if ‘edge’, the diagram will be aligned with the left edge of each xtick. |
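To make "proportion flow" concrete, the following sketch counts how many observations move from each left value to each right value and divides by the total, so that all strips together have height 1. It is a simplified, unweighted illustration; the function name `flow_proportions` is hypothetical and the code is not taken from dabest or pySankey.

```python
from collections import Counter


def flow_proportions(left, right):
    """Return {(left_label, right_label): share of all observations}."""
    if len(left) != len(right):
        raise ValueError("left and right must have the same length")

    # Count each (left, right) transition once per paired observation.
    counts = Counter(zip(left, right))
    total = len(left)

    # Dividing by the total normalizes the diagram height to 1.
    return {pair: n / total for pair, n in counts.items()}


flows = flow_proportions([0, 0, 1, 1], [0, 1, 1, 1])
```

Here half of the observations stay at 1, a quarter stay at 0, and a quarter move from 0 to 1.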

width_determine

 width_determine (labels, data, pos='left')

Calculates normalized width positions for a set of labels based on their associated data.

This function is designed to determine width positions for plotting or graphical representation. It takes into account the cumulative weight of each label in the data and adjusts their positions accordingly. The function allows for adjusting the position of labels to either the ‘left’ or ‘right’.

Parameters:
- labels (list): A list of labels whose width positions are to be calculated.
- data (DataFrame): A pandas DataFrame containing the data used for calculating width positions. The DataFrame should have columns corresponding to the ‘pos’ and ‘posWeight’.
- pos (str, optional): The position of labels. It can be either ‘left’ or ‘right’. Defaults to ‘left’.

Returns:
- defaultdict: A dictionary where each key is a label and the value is another dictionary with keys ‘bottom’, ‘top’, and ‘pos’, representing the calculated width positions.

Note: The function assumes that the data DataFrame contains columns named after the value of ‘pos’ and an additional column named ‘posWeight’ which represents the weight of each label.
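The idea of positions derived from cumulative weights can be illustrated with a simplified sketch that stacks labels into bands between 0 and 1. It assumes weights are given as a plain dictionary rather than a DataFrame, returns only ‘bottom’ and ‘top’, and leaves out any further adjustment the real function applies; `cumulative_bands` is a hypothetical name.

```python
def cumulative_bands(labels, weights):
    """Stack labels into normalized [bottom, top] bands (illustrative only)."""
    total = sum(weights[label] for label in labels)
    bands = {}
    bottom = 0.0
    for label in labels:
        # Each label's height is its share of the total weight.
        height = weights[label] / total
        bands[label] = {"bottom": bottom, "top": bottom + height}
        # The next label starts where this one ends.
        bottom += height
    return bands


bands = cumulative_bands([0, 1], {0: 1, 1: 3})
```

With weights 1 and 3, label 0 occupies the band from 0 to 0.25 and label 1 the band from 0.25 to 1.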


normalize_dict

 normalize_dict (nested_dict, target)

Normalizes the values in a nested dictionary based on a target dictionary.

This function iterates through a nested dictionary, calculates the sum of values for each key across all sub-dictionaries, and then normalizes these values according to a target dictionary. The normalization is performed such that the values in each sub-dictionary are proportionally scaled to match the corresponding ‘right’ values in the target dictionary.

Parameters:
- nested_dict (dict of dict): A nested dictionary where each key maps to another dictionary. The values in these inner dictionaries are subject to normalization.
- target (dict): A dictionary with the target values for normalization. Each key in nested_dict should have a corresponding key in target, and each target[key] should be a dictionary with a ‘right’ key containing the target normalization value.

Returns:
- dict: The normalized nested dictionary. The original nested_dict is modified in place.

Note:
- If the sum of values for a particular key in nested_dict is zero, the normalized value is set to 0.
- If a key in a sub-dictionary of nested_dict does not exist in the target dictionary, the corresponding ‘right’ value from the target dictionary is directly assigned.
- The function modifies the input nested_dict in place and also returns it.
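The scaling described above can be sketched as follows. This is a minimal illustration under the assumption that every inner key has an entry in target; it covers the zero-sum case but not the missing-key case from the notes, and the name `normalize_columns` is hypothetical.

```python
def normalize_columns(nested, target):
    """Rescale inner values so each inner key sums to target[key]['right']."""
    # Sum the values of each inner key across all sub-dictionaries.
    totals = {}
    for inner in nested.values():
        for key, value in inner.items():
            totals[key] = totals.get(key, 0) + value

    # Scale proportionally; a zero total yields 0 instead of dividing by zero.
    for inner in nested.values():
        for key in inner:
            total = totals[key]
            inner[key] = inner[key] * target[key]["right"] / total if total else 0

    # Like the documented function, modify in place and also return.
    return nested


flows = {"a": {"a": 2, "b": 1}, "b": {"a": 2, "b": 3}}
target = {"a": {"right": 0.5}, "b": {"right": 0.5}}
result = normalize_columns(flows, target)
```

After the call, the values under inner key ‘a’ sum to 0.5 and so do the values under ‘b’, while the ratios within each key are preserved.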


check_data_matches_labels

 check_data_matches_labels (labels, data, side:str)

Checks that the labels and data match in the sankey diagram, and enforces that labels and data are lists. Raises an exception if the labels and data do not match.

| Parameter | Type | Details |
|---|---|---|
| labels | | list of input labels |
| data | | Pandas Series of input data |
| side | str | ‘left’ or ‘right’ on the sankey diagram |
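A check of this kind can be sketched as a set comparison between the supplied labels and the distinct values in the data. The sketch below is an assumption about what "match" means, uses `ValueError` because the exception type is not stated here, and `check_labels` is a hypothetical name.

```python
def check_labels(labels, data, side):
    """Raise if labels and the distinct data values differ (illustrative)."""
    # Coerce both inputs to lists so any iterable is accepted.
    labels = list(labels)
    values = list(data)

    missing = set(values) - set(labels)  # values with no label
    unused = set(labels) - set(values)   # labels with no data
    if missing or unused:
        raise ValueError(
            f"{side} labels and data do not match: "
            f"unlabelled={sorted(missing)}, unused={sorted(unused)}"
        )
    return labels, values


labels, values = check_labels([0, 1], [0, 1, 1, 0], side="left")
```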

error_bar

 error_bar (data:pandas.core.frame.DataFrame, x:str, y:str,
            type:str='mean_sd', offset:float=0.2, ax=None,
            line_color='black', gap_width_percent=1, pos:list=[0, 1],
            method:str='gapped_lines', **kwargs:dict)

Function to plot the standard deviations as vertical errorbars. The mean is a gap defined by negative space.

This function combines the functionality of gapped_lines(), proportional_error_bar(), and sankey_error_bar().

| Parameter | Type | Default | Details |
|---|---|---|---|
| data | pd.DataFrame | | This DataFrame should be in ‘long’ format. |
| x | str | | x column to be plotted. |
| y | str | | y column to be plotted. |
| type | str | mean_sd | Choose from [‘mean_sd’, ‘median_quartiles’]. Plots the summary statistics for each group. If ‘mean_sd’, the mean and standard deviation of each group are plotted as a gapped line. If ‘median_quartiles’, the median and the 25th and 75th percentiles of each group are plotted instead. |
| offset | float | 0.2 | Give a single float (that will be used as the x-offset of all gapped lines), or an iterable containing the list of x-offsets. |
| ax | NoneType | None | If a matplotlib Axes object is specified, the gapped lines will be plotted in order on this axes. If None, the current axes (plt.gca()) is used. |
| line_color | str | black | The color of the gapped lines. |
| gap_width_percent | int | 1 | The width of the gap in the gapped lines, as a percentage of the y-axis span. |
| pos | list | [0, 1] | |
| method | str | gapped_lines | The method to use for drawing the error bars. Options are: ‘gapped_lines’, ‘proportional_error_bar’, and ‘sankey_error_bar’. |
| kwargs | dict | | |
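The geometry of a gapped line can be sketched without any plotting: compute the summary statistics for one group, then split the vertical line into a lower and an upper segment separated by a gap around the central value. The sketch assumes the sample standard deviation, the inclusive quartile method from the standard library, and a gap centered on the central value; `gapped_segments` is a hypothetical name, not a dabest function.

```python
from statistics import mean, quantiles, stdev


def gapped_segments(values, kind="mean_sd", y_span=1.0, gap_width_percent=1):
    """Return ((low, gap_start), (gap_end, high)) for one group."""
    if kind == "mean_sd":
        centre = mean(values)
        spread = stdev(values)  # sample standard deviation (assumption)
        low, high = centre - spread, centre + spread
    elif kind == "median_quartiles":
        # Quartile method is an assumption; other methods give other cut points.
        low, centre, high = quantiles(values, n=4, method="inclusive")
    else:
        raise ValueError(f"unknown kind: {kind}")

    # The gap is a percentage of the y-axis span, split around the centre.
    half_gap = y_span * gap_width_percent / 100 / 2
    return (low, centre - half_gap), (centre + half_gap, high)


lower, upper = gapped_segments(
    [1, 2, 3, 4, 5], kind="median_quartiles", y_span=4.0, gap_width_percent=10
)
```

The two returned segments are what would be drawn as vertical lines at the group's x-offset, with the empty space between them marking the mean or median.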

get_swarm_spans

 get_swarm_spans (coll)

Given a matplotlib Collection, returns the x and y spans of the collection, or None if they cannot be obtained.
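One way to picture this is a helper that reads the point offsets from the collection and takes the minimum and maximum along each axis. The sketch below relies only on the collection exposing `get_offsets()`, as matplotlib collections do; the return layout, the helper name `spans_from_offsets`, and the stand-in class `FakeCollection` are assumptions made so the example runs without matplotlib.

```python
def spans_from_offsets(coll):
    """Return ((x_min, x_max), (y_min, y_max)), or None on failure."""
    try:
        offsets = list(coll.get_offsets())
        xs = [float(x) for x, _ in offsets]
        ys = [float(y) for _, y in offsets]
        # min()/max() raise ValueError on an empty collection.
        return (min(xs), max(xs)), (min(ys), max(ys))
    except (AttributeError, TypeError, ValueError):
        return None


class FakeCollection:
    """Hypothetical stand-in for a matplotlib Collection."""

    def __init__(self, offsets):
        self._offsets = offsets

    def get_offsets(self):
        return self._offsets


spans = spans_from_offsets(FakeCollection([(0.9, 1.0), (1.1, 3.0)]))
```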


halfviolin

 halfviolin (v, half='right', fill_color='k', alpha=1, line_color='k',
             line_width=0)

SwarmPlot

 SwarmPlot (data:pd.DataFrame, x:str, y:str, ax:axes.Subplot,
            order:List=None, hue:str=None,
            palette:Union[Iterable,str]='black', zorder:float=1,
            size:float=5, side:str='center', jitter:float=1)

Initialize a SwarmPlot instance.

| Parameter | Type | Default | Details |
|---|---|---|---|
| data | pd.DataFrame | | The input data as a pandas DataFrame. |
| x | str | | The column in the DataFrame to be used as the x-axis. |
| y | str | | The column in the DataFrame to be used as the y-axis. |
| ax | axes.Subplot | | Matplotlib AxesSubplot object on which the plot is drawn. |
| order | List | None | The order in which x-axis categories should be displayed. Default is None. |
| hue | str | None | The column in the DataFrame that determines the grouping for color. If None (the default), grouping is assumed to be by x. |
| palette | Union[Iterable, str] | black | The color palette to be used for plotting. Default is “black”. |
| zorder | float | 1 | The z-order for drawing the swarm plot relative to other matplotlib drawings. Default is 1. |
| size | float | 5 | |
| side | str | center | The side on which points are swarmed (“center”, “left”, or “right”). Default is “center”. |
| jitter | float | 1 | Determines the distance between points. Default is 1. |
| Returns | None | | |

swarmplot

 swarmplot (data:pandas.core.frame.DataFrame, x:str, y:str,
            ax:matplotlib.axes._subplots.AxesSubplot, order:List=None,
            hue:str=None, palette:Union[Iterable,str]='black',
            zorder:float=1, size:float=5, side:str='center',
            jitter:float=1, is_drop_gutter:bool=True,
            gutter_limit:float=0.5, **kwargs)

API to plot a swarm plot.

| Parameter | Type | Default | Details |
|---|---|---|---|
| data | pd.DataFrame | | The input data as a pandas DataFrame. |
| x | str | | The column in the DataFrame to be used as the x-axis. |
| y | str | | The column in the DataFrame to be used as the y-axis. |
| ax | axes.Subplot | | Matplotlib AxesSubplot object on which the plot is drawn. |
| order | List | None | The order in which x-axis categories should be displayed. Default is None. |
| hue | str | None | The column in the DataFrame that determines the grouping for color. If None (the default), grouping is assumed to be by x. |
| palette | Union[Iterable, str] | black | The color palette to be used for plotting. Default is “black”. |
| zorder | float | 1 | The z-order for drawing the swarm plot relative to other matplotlib drawings. Default is 1. |
| size | float | 5 | |
| side | str | center | The side on which points are swarmed (“center”, “left”, or “right”). Default is “center”. |
| jitter | float | 1 | Determines the distance between points. Default is 1. |
| is_drop_gutter | bool | True | If True, drop points that hit the gutters; otherwise, readjust them. |
| gutter_limit | float | 0.5 | The limit for points hitting the gutters. |
| kwargs | | | |
| Returns | axes._subplots.Subplot or axes._axes.Axes | | Matplotlib AxesSubplot object on which the swarm plot has been drawn. |
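The interaction of is_drop_gutter and gutter_limit can be pictured with a small sketch on x-offsets measured from the center of a group. It assumes that "hitting the gutter" means lying beyond the limit on either side and that "readjusting" means clamping to the limit; both are assumptions for illustration, and `apply_gutter` is a hypothetical name.

```python
def apply_gutter(offsets, gutter_limit=0.5, is_drop_gutter=True):
    """Drop or clamp x-offsets that lie beyond the gutter limit."""
    if is_drop_gutter:
        # Keep only points that stay within the gutters.
        return [x for x in offsets if abs(x) <= gutter_limit]
    # Otherwise pull out-of-range points back onto the limit.
    return [max(-gutter_limit, min(gutter_limit, x)) for x in offsets]


kept = apply_gutter([-0.7, 0.1, 0.6])
clamped = apply_gutter([-0.7, 0.1, 0.6], is_drop_gutter=False)
```

Dropping keeps the swarm shape honest at the cost of hiding points, while clamping keeps every point visible but stacks the overflow at the edge.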