plot_tools
dabest
.
sankeydiag
sankeydiag (data:pandas.core.frame.DataFrame, xvar:str, yvar:str, left_idx:str, right_idx:str, left_labels:list=None, right_labels:list=None, palette:Union[str,dict]=None, ax=None, flow:bool=True, sankey:bool=True, one_sankey:bool=False, width:float=0.4, right_color:bool=False, align:str='center', alpha:float=0.65, **kwargs)
Reads a melted pd.DataFrame and draws multiple Sankey diagrams on a single axes, using the values in column yvar grouped by the values in column xvar. For each diagram, left_idx is the value in column xvar shown on the left side, and right_idx is the value in column xvar shown on the right side.
Parameter | Type | Default | Details |
---|---|---|---|
data | pd.DataFrame | | |
xvar | str | | x column to be plotted. |
yvar | str | | y column to be plotted. |
left_idx | str | | the value in column xvar that is on the left side of each sankey diagram |
right_idx | str | | the value in column xvar that is on the right side of each sankey diagram; if len(left_idx) == 1, left_idx is broadcast to the same length as right_idx, otherwise it should have the same length as right_idx |
left_labels | list | None | labels for the left side of the diagram. The diagram will be sorted by these labels. |
right_labels | list | None | labels for the right side of the diagram. The diagram will be sorted by these labels. |
palette | Union[str, dict] | None | |
ax | NoneType | None | matplotlib axes to be drawn on |
flow | bool | True | if True, draw the sankey in a flow, else draw 1 vs 1 Sankey diagram for each group comparison |
sankey | bool | True | if True, draw the sankey diagram, else draw barplot |
one_sankey | bool | False | determined by the driver function on plotter.py, if True, draw the sankey diagram across the whole raw data axes |
width | float | 0.4 | the width of each sankey diagram |
right_color | bool | False | if True, each strip of the diagram will be colored according to the corresponding left labels |
align | str | center | the alignment of each sankey diagram, can be ‘center’ or ‘left’ |
alpha | float | 0.65 | the transparency of each strip |
kwargs | | | |
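The pairing rule described for left_idx and right_idx can be illustrated with a minimal sketch. This is not dabest code: `pair_indices` is a hypothetical helper, and it assumes both arguments are given as lists of values taken from column xvar.

```python
def pair_indices(left_idx, right_idx):
    """Pair each left group with a right group (hypothetical helper)."""
    left, right = list(left_idx), list(right_idx)
    if len(left) == 1:
        # A single left value is repeated to match the number of right values.
        left = left * len(right)
    if len(left) != len(right):
        raise ValueError("left_idx and right_idx must have the same length")
    return list(zip(left, right))


# One control group compared against two test groups.
pairs = pair_indices(["Control"], ["Test 1", "Test 2"])
print(pairs)
```

Each resulting pair would correspond to one Sankey diagram on the shared axes.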
single_sankey
single_sankey (left:np.array, right:np.array, xpos:float=0, left_weight:np.array=None, right_weight:np.array=None, colorDict:dict=None, left_labels:list=None, right_labels:list=None, ax=None, flow:bool=True, sankey:bool=True, width=0.5, alpha=0.65, bar_width=0.2, error_bar_on:bool=True, strip_on:bool=True, one_sankey:bool=False, right_color:bool=False, align:bool='center')
Makes a single Sankey diagram showing proportion flow from left to right. Original code from https://github.com/anazalea/pySankey, with changes added to normalize each diagram’s height to 1.
Parameter | Type | Default | Details |
---|---|---|---|
left | np.array | | data on the left of the diagram |
right | np.array | | data on the right of the diagram, len(left) == len(right) |
xpos | float | 0 | the starting point on the x-axis |
left_weight | np.array | None | weights for the left labels, if None, all weights are 1 |
right_weight | np.array | None | weights for the right labels, if None, all weights are the corresponding left_weight |
colorDict | dict | None | input format: {‘label’: ‘color’} |
left_labels | list | None | labels for the left side of the diagram. The diagram will be sorted by these labels. |
right_labels | list | None | labels for the right side of the diagram. The diagram will be sorted by these labels. |
ax | NoneType | None | matplotlib axes to be drawn on |
flow | bool | True | if True, draw the sankey in a flow, else draw 1 vs 1 Sankey diagram for each group comparison |
sankey | bool | True | if True, draw the sankey diagram, else draw barplot |
width | float | 0.5 | |
alpha | float | 0.65 | |
bar_width | float | 0.2 | |
error_bar_on | bool | True | if True, draw error bar for each group comparison |
strip_on | bool | True | if True, draw strip for each group comparison |
one_sankey | bool | False | if True, only draw one sankey diagram |
right_color | bool | False | if True, each strip of the diagram will be colored according to the corresponding left labels |
align | str | center | if ‘center’, the diagram will be centered on each xtick, if ‘edge’, the diagram will be aligned with the left edge of each xtick (the signature annotates this parameter as bool, but the accepted values are strings) |
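To make the idea of a diagram normalized to height 1 concrete, here is a minimal sketch of how paired left/right observations could be turned into flow proportions. `flow_proportions` is a hypothetical helper, not part of dabest, and it assumes every observation has weight 1 (the documented default when left_weight is None).

```python
from collections import Counter


def flow_proportions(left, right):
    """Share of observations in each (left, right) flow; shares sum to 1."""
    if len(left) != len(right):
        raise ValueError("left and right must have the same length")
    counts = Counter(zip(left, right))
    total = len(left)
    return {pair: n / total for pair, n in counts.items()}


# Four paired binary observations (before, after).
left = [0, 0, 1, 1]
right = [0, 1, 1, 1]
flows = flow_proportions(left, right)
print(flows)
```

Because the shares sum to 1, the strips of one diagram fill a unit-height column regardless of the sample size.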
width_determine
width_determine (labels, data, pos='left')
Calculates normalized width positions for a set of labels based on their associated data.
This function is designed to determine width positions for plotting or graphical representation. It takes into account the cumulative weight of each label in the data and adjusts their positions accordingly. The function allows for adjusting the position of labels to either the ‘left’ or ‘right’.
Parameters:
- labels (list): A list of labels whose width positions are to be calculated.
- data (DataFrame): A pandas DataFrame containing the data used for calculating width positions. The DataFrame should have columns corresponding to the ‘pos’ and ‘posWeight’.
- pos (str, optional): The position of labels. It can be either ‘left’ or ‘right’. Defaults to ‘left’.

Returns:
- defaultdict: A dictionary where each key is a label and the value is another dictionary with keys ‘bottom’, ‘top’, and ‘pos’, representing the calculated width positions.

Note: The function assumes that the data DataFrame contains columns named after the value of ‘pos’ and an additional column named ‘posWeight’ which represents the weight of each label.
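The cumulative stacking described above can be sketched in plain Python. This is an illustration only, not the dabest implementation: `stacked_positions` is a hypothetical helper that takes a plain dict of weights instead of a DataFrame, omits the ‘pos’ entry, and adds no spacing between labels.

```python
def stacked_positions(labels, weights):
    """Stack labels bottom to top; each height is the label's share of the total."""
    total = sum(weights[label] for label in labels)
    positions = {}
    bottom = 0.0
    for label in labels:
        height = weights[label] / total if total else 0.0
        positions[label] = {"bottom": bottom, "top": bottom + height}
        bottom += height  # the next label starts where this one ends
    return positions


positions = stacked_positions(["no", "yes"], {"no": 3, "yes": 1})
print(positions)
```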
normalize_dict
normalize_dict (nested_dict, target)
Normalizes the values in a nested dictionary based on a target dictionary.
This function iterates through a nested dictionary, calculates the sum of values for each key across all sub-dictionaries, and then normalizes these values according to a target dictionary. The normalization is performed such that the values in each sub-dictionary are proportionally scaled to match the corresponding ‘right’ values in the target dictionary.
Parameters:
- nested_dict (dict of dict): A nested dictionary where each key maps to another dictionary. The values in these inner dictionaries are subject to normalization.
- target (dict): A dictionary with the target values for normalization. Each key in nested_dict should have a corresponding key in target, and each target[key] should be a dictionary with a ‘right’ key containing the target normalization value.

Returns:
- dict: The normalized nested dictionary. The original nested_dict is modified in place.

Note:
- If the sum of values for a particular key in nested_dict is zero, the normalized value is set to 0.
- If a key in a sub-dictionary of nested_dict does not exist in the target dictionary, the corresponding ‘right’ value from the target dictionary is directly assigned.
- The function modifies the input nested_dict in place and also returns it.
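One possible reading of this normalization is sketched below. It is an assumption-laden illustration, not the library code: `normalize_sketch` is a hypothetical name, inner keys are taken to be the labels whose totals must match target[key][‘right’], and the missing-key case from the note is left out.

```python
def normalize_sketch(nested, target):
    """Rescale inner values so each inner key sums to target[key]['right']."""
    # Sum each inner key across all sub-dictionaries.
    totals = {}
    for sub in nested.values():
        for key, value in sub.items():
            totals[key] = totals.get(key, 0) + value
    # Scale every value by target / total; a zero total yields 0.
    for sub in nested.values():
        for key in sub:
            if totals[key] == 0:
                sub[key] = 0
            else:
                sub[key] = sub[key] * target[key]["right"] / totals[key]
    return nested  # same object, modified in place


flows = {"a": {"x": 1, "y": 1}, "b": {"x": 3, "y": 0}}
target = {"x": {"right": 0.8}, "y": {"right": 0.2}}
result = normalize_sketch(flows, target)
print(result)
```

After the call, the values stored under ‘x’ sum to 0.8 and those under ‘y’ sum to 0.2, up to floating-point rounding.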
check_data_matches_labels
check_data_matches_labels (labels, data, side:str)
Checks that the labels and data match in the Sankey diagram, and enforces that labels and data are lists. Raises an exception if the labels and data do not match.
Parameter | Type | Details |
---|---|---|
labels | | list of input labels |
data | | Pandas Series of input data |
side | str | ‘left’ or ‘right’ on the sankey diagram |
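A check of this kind might look like the sketch below. `check_labels_sketch` is a hypothetical stand-in, and it assumes that "match" means the labels and the data contain the same set of distinct values; the exception type used here is also an assumption.

```python
def check_labels_sketch(labels, data, side):
    """Return labels and data as lists, or raise if their distinct values differ."""
    labels, values = list(labels), list(data)
    if set(labels) != set(values):
        raise ValueError(
            f"{side} labels {sorted(set(labels))} do not match "
            f"{side} data {sorted(set(values))}"
        )
    return labels, values


labels, values = check_labels_sketch(("no", "yes"), ["yes", "no", "yes"], side="left")
print(labels, values)
```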
error_bar
error_bar (data:pandas.core.frame.DataFrame, x:str, y:str, type:str='mean_sd', offset:float=0.2, ax=None, line_color='black', gap_width_percent=1, pos:list=[0, 1], method:str='gapped_lines', **kwargs:dict)
Function to plot the standard deviations as vertical errorbars. The mean is a gap defined by negative space.
This function combines the functionality of gapped_lines(), proportional_error_bar(), and sankey_error_bar().
Parameter | Type | Default | Details |
---|---|---|---|
data | pd.DataFrame | | This DataFrame should be in ‘long’ format. |
x | str | | x column to be plotted. |
y | str | | y column to be plotted. |
type | str | mean_sd | Choose from [‘mean_sd’, ‘median_quartiles’]. Plots the summary statistics for each group. If ‘mean_sd’, then the mean and standard deviation of each group are plotted as a gapped line. If ‘median_quartiles’, then the median and the 25th and 75th percentiles of each group are plotted instead. |
offset | float | 0.2 | Give a single float (that will be used as the x-offset of all gapped lines), or an iterable containing the list of x-offsets. |
ax | NoneType | None | If a matplotlib Axes object is specified, the gapped lines will be plotted in order on this axes. If None, the current axes (plt.gca()) is used. |
line_color | str | black | The color of the gapped lines. |
gap_width_percent | int | 1 | The width of the gap in the gapped lines, as a percentage of the y-axis span. |
pos | list | [0, 1] | |
method | str | gapped_lines | The method to use for drawing the error bars. Options are: ‘gapped_lines’, ‘proportional_error_bar’, and ‘sankey_error_bar’. |
kwargs | dict | | |
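The geometry of a single gapped line can be sketched with the standard library alone. This is a hedged illustration rather than dabest's code: `gapped_segments` and its `y_span` argument are hypothetical, the sample standard deviation is assumed for ‘mean_sd’, and quartiles are computed with linear interpolation.

```python
import statistics


def gapped_segments(values, kind="mean_sd", gap_width_percent=1, y_span=1.0):
    """Return the lower and upper vertical segments of one gapped line."""
    if kind == "mean_sd":
        centre = statistics.mean(values)
        spread = statistics.stdev(values)  # sample SD (an assumption)
        low, high = centre - spread, centre + spread
    elif kind == "median_quartiles":
        low, centre, high = statistics.quantiles(values, n=4, method="inclusive")
    else:
        raise ValueError(f"unknown summary type: {kind}")
    # The gap around the central value is a percentage of the y-axis span.
    half_gap = gap_width_percent / 100 * y_span / 2
    return (low, centre - half_gap), (centre + half_gap, high)


lower, upper = gapped_segments([1, 2, 3, 4, 5], kind="median_quartiles", y_span=4)
print(lower, upper)
```

The central statistic is never drawn; it is implied by the empty space between the two segments.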
get_swarm_spans
get_swarm_spans (coll)
Given a matplotlib Collection, will obtain the x and y spans for the collection. Will return None if this fails.
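The span computation can be illustrated on plain (x, y) pairs. `spans_sketch` is a hypothetical helper, and representing each span as a (min, max) tuple is an assumption; with a real matplotlib Collection, the points would typically come from its `get_offsets()` method.

```python
def spans_sketch(points):
    """Return ((x_min, x_max), (y_min, y_max)) for (x, y) pairs, or None on failure."""
    try:
        xs, ys = zip(*points)
        return (min(xs), max(xs)), (min(ys), max(ys))
    except (TypeError, ValueError):
        # An empty or malformed input has no spans.
        return None


print(spans_sketch([(0, 1), (2, 5), (1, 3)]))
print(spans_sketch([]))
```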
halfviolin
halfviolin (v, half='right', fill_color='k', alpha=1, line_color='k', line_width=0)
SwarmPlot
SwarmPlot (data:pd.DataFrame, x:str, y:str, ax:axes.Subplot, order:List=None, hue:str=None, palette:Union[Iterable,str]='black', zorder:float=1, size:float=5, side:str='center', jitter:float=1)
Initialize a SwarmPlot instance.
Parameter | Type | Default | Details |
---|---|---|---|
data | pd.DataFrame | | The input data as a pandas DataFrame. |
x | str | | The column in the DataFrame to be used as the x-axis. |
y | str | | The column in the DataFrame to be used as the y-axis. |
ax | axes.Subplot | | Matplotlib AxesSubplot object for which the plot would be drawn on. |
order | List | None | The order in which x-axis categories should be displayed. Default is None. |
hue | str | None | The column in the DataFrame that determines the grouping for color. If None (by default), it assumes that it is being grouped by x. |
palette | Union[Iterable, str] | black | The color palette to be used for plotting. Default is “black”. |
zorder | float | 1 | The z-order for drawing the swarm plot wrt other matplotlib drawings. Default is 1. |
size | float | 5 | |
side | str | center | The side on which points are swarmed (“center”, “left”, or “right”). Default is “center”. |
jitter | float | 1 | Determines the distance between points. Default is 1. |
Returns | None | | |
swarmplot
swarmplot (data:pandas.core.frame.DataFrame, x:str, y:str, ax:matplotlib.axes._subplots.AxesSubplot, order:List=None, hue:str=None, palette:Union[Iterable,str]='black', zorder:float=1, size:float=5, side:str='center', jitter:float=1, is_drop_gutter:bool=True, gutter_limit:float=0.5, **kwargs)
API to plot a swarm plot.
Parameter | Type | Default | Details |
---|---|---|---|
data | pd.DataFrame | | The input data as a pandas DataFrame. |
x | str | | The column in the DataFrame to be used as the x-axis. |
y | str | | The column in the DataFrame to be used as the y-axis. |
ax | axes.Subplot | | Matplotlib AxesSubplot object for which the plot would be drawn on. |
order | List | None | The order in which x-axis categories should be displayed. Default is None. |
hue | str | None | The column in the DataFrame that determines the grouping for color. If None (by default), it assumes that it is being grouped by x. |
palette | Union[Iterable, str] | black | The color palette to be used for plotting. Default is “black”. |
zorder | float | 1 | The z-order for drawing the swarm plot wrt other matplotlib drawings. Default is 1. |
size | float | 5 | |
side | str | center | The side on which points are swarmed (“center”, “left”, or “right”). Default is “center”. |
jitter | float | 1 | Determines the distance between points. Default is 1. |
is_drop_gutter | bool | True | If True, drop points that hit the gutters; otherwise, readjust them. |
gutter_limit | float | 0.5 | The limit for points hitting the gutters. |
kwargs | | | |
Returns | axes._subplots.Subplot or axes._axes.Axes | | Matplotlib AxesSubplot object for which the swarm plot has been drawn on. |
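The effect of is_drop_gutter and gutter_limit can be sketched on a list of horizontal offsets from a category's centre. `apply_gutter` is a hypothetical helper, not the dabest implementation; it assumes a point "hits the gutter" when its absolute offset exceeds gutter_limit, and that "readjust" means clamping the point to the limit.

```python
def apply_gutter(offsets, gutter_limit=0.5, is_drop_gutter=True):
    """Drop or clamp horizontal offsets that fall outside the gutter limit."""
    if is_drop_gutter:
        # Points beyond the limit are removed from the plot.
        return [x for x in offsets if abs(x) <= gutter_limit]
    # Otherwise points are pulled back onto the gutter edge.
    return [max(-gutter_limit, min(gutter_limit, x)) for x in offsets]


offsets = [-0.7, -0.2, 0.0, 0.4, 0.9]
dropped = apply_gutter(offsets)
clamped = apply_gutter(offsets, is_drop_gutter=False)
print(dropped, clamped)
```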