API reference#
stampede#
STAMPede - STAMP data Exploration and Differential Expression
- stampede.read_cosmx(slides, samples_df, adata_file, samples_df_columns=None, metadata_df_columns=None, data_dir=None, overwrite=True, verbose=True, **kwargs)#
Read exprMat_file for each slide, convert the contents to sparse anndata objects, and concatenate the results.
- Parameters:
slides (dict) – a dictionary with the slide number as keys, and a dictionary as values. The value dict must contain the keys “exprmat” and “metadata”, which should map to the matching respective files
samples_df (DataFrame) – a dataframe with sample metadata to be added to adata.obs
adata_file (str) – filepath to write the adata object to
samples_df_columns (list) – list of columns in samples_df to add to adata.obs (default: all)
metadata_df_columns (list) – list of columns in the metadata file to add to adata.obs (default: all)
data_dir (str) – optional filepath prefix
overwrite (bool) – overwrite existing output
verbose (bool) – provide written feedback
**kwargs – keyword arguments passed to pd.read_csv
- Return type:
- Returns:
the value of the adata_file argument
- stampede.validate_input(slides, samples_df, data_dir=None)#
Check the contents of the slides dictionary and samples_df for expected keys and columns, respectively.
- Parameters:
- Return type:
- Returns:
Nothing
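The layout of the slides dictionary and the kind of check described above can be sketched in plain Python. This is an illustration only: the file names are hypothetical, check_slides is a made-up helper rather than stampede.validate_input, and the assumption that the required sample columns equal the default "sample_md_columns" config entry is not confirmed by this page.

```python
# Hypothetical file names; only the key layout follows the documentation above.
slides = {
    1: {"exprmat": "slide1_exprMat_file.csv", "metadata": "slide1_metadata_file.csv"},
    2: {"exprmat": "slide2_exprMat_file.csv", "metadata": "slide2_metadata_file.csv"},
}

# Stand-in for the column names of samples_df.
samples_columns = ["sample", "slide", "fovs", "condition"]

REQUIRED_SLIDE_KEYS = ("exprmat", "metadata")
# Assumption: the expected columns match the default "sample_md_columns".
REQUIRED_SAMPLE_COLUMNS = ("sample", "slide", "fovs")


def check_slides(slides, samples_columns):
    """Return a list of problems; an empty list means the input looks valid."""
    problems = []
    for slide, files in slides.items():
        for key in REQUIRED_SLIDE_KEYS:
            if key not in files:
                problems.append(f"slide {slide}: missing key '{key}'")
    for column in REQUIRED_SAMPLE_COLUMNS:
        if column not in samples_columns:
            problems.append(f"samples_df: missing column '{column}'")
    return problems


problems = check_slides(slides, samples_columns)
broken = check_slides({3: {"exprmat": "slide3_exprMat_file.csv"}}, ["sample"])
```

Here problems is empty, while broken lists the missing "metadata" key and the missing sample columns.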
stampede.pp#
preprocessing functions
- stampede.pp.binarize(adata, verbose=True)#
Binarize the values in adata.X
- stampede.pp.cell_qc_postfilter(adata)#
Compute metadata after filtering
- stampede.pp.combine_obs_columns(adata, columns, column_name, delim='_')#
Create a new column in adata.obs by combining all columns with the delimiter.
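A minimal sketch of what combining columns with a delimiter amounts to, using a list of dicts as a stand-in for adata.obs. The helper combine_columns and the example values are hypothetical; converting values to strings before joining is an assumption.

```python
# adata.obs stand-in: one dict per cell (hypothetical values).
obs = [
    {"slide": 1, "sample": "A", "condition": "control"},
    {"slide": 1, "sample": "B", "condition": "treated"},
]


def combine_columns(rows, columns, column_name, delim="_"):
    """Add `column_name` to every row by joining the values of `columns`."""
    for row in rows:
        row[column_name] = delim.join(str(row[c]) for c in columns)
    return rows


combine_columns(obs, ["slide", "sample"], "slide_sample")
```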
- stampede.pp.detection_rates(adata, column, normalize=True)#
Calculate gene detection rates per group in the specified column of adata.obs.
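As an illustration of a per-group detection rate, the sketch below counts, for each group, the cells in which a gene has a non-zero value. It assumes that "detected" means a count above zero and that normalize=True divides by the number of cells in the group; neither is confirmed by this page, and the function is a plain-Python stand-in, not the stampede implementation.

```python
from collections import defaultdict

genes = ["GeneA", "GeneB"]
# One row of counts per cell (stand-in for adata.X) and its group label
# (stand-in for a column in adata.obs). All values are made up.
counts = [[3, 0], [0, 0], [1, 2], [0, 5]]
groups = ["s1", "s1", "s2", "s2"]


def detection_rates(counts, groups, genes, normalize=True):
    """Per group, count the cells in which each gene is detected (count > 0)."""
    detected = defaultdict(lambda: [0] * len(genes))
    n_cells = defaultdict(int)
    for row, group in zip(counts, groups):
        n_cells[group] += 1
        hits = detected[group]  # creates the group even without detections
        for j, value in enumerate(row):
            if value > 0:
                hits[j] += 1
    result = {}
    for group, hits in detected.items():
        if normalize:
            # Assumption: normalizing means dividing by the group's cell count.
            hits = [h / n_cells[group] for h in hits]
        result[group] = dict(zip(genes, hits))
    return result


rates = detection_rates(counts, groups, genes)
raw = detection_rates(counts, groups, genes, normalize=False)
```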
- stampede.pp.dim_red(adata, n_dims=50, use_genes=None, key_added='X_svd', random_state=42)#
Dimensionality reduction using Term Frequency Latent Semantic Indexing.
- Parameters:
- Return type:
- Returns:
Nothing, updates adata.obsm and adata.uns
- stampede.pp.filter_cells(adata, falsecode_max=5, negprobe_max=3, ntranscript_min=0, ntranscript_max=inf, area_min=25, area_max=100, filter_columns=None, filter_internalqc=False, verbose=True)#
Filter adata.obs by a set of qc_params.
- Parameters:
adata (AnnData) – adata object
falsecode_max (int) – maximum number of false codes the cell may have
negprobe_max (int) – maximum number of negative probes the cell may have
ntranscript_min (int) – minimum number of transcripts the cell must have
ntranscript_max (int) – maximum number of transcripts the cell may have
area_min (int) – minimum area (in pixels) the cell must have
area_max (int) – maximum area (in pixels) the cell may have
filter_columns (list) – a list of additional columns to filter by. Columns must be (convertible to) boolean; cells with False values are removed.
filter_internalqc (bool) – filter by columns qcCellsPassed and qcFlagsFOV.
verbose (bool) – provide written feedback
- Return type:
- Returns:
the filtered adata object
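The thresholds above can be read as a conjunction of range checks per cell. The sketch below applies the documented defaults to a few made-up cells. The obs column names are hypothetical, and treating every bound as inclusive is an assumption.

```python
import math

# adata.obs stand-in; the column names are hypothetical.
cells = [
    {"cell": "c1", "falsecode": 0, "negprobe": 1, "ntranscript": 120, "area": 60},
    {"cell": "c2", "falsecode": 7, "negprobe": 0, "ntranscript": 300, "area": 55},
    {"cell": "c3", "falsecode": 1, "negprobe": 0, "ntranscript": 80, "area": 140},
    {"cell": "c4", "falsecode": 2, "negprobe": 3, "ntranscript": 45, "area": 25},
]


def passes_qc(cell, falsecode_max=5, negprobe_max=3, ntranscript_min=0,
              ntranscript_max=math.inf, area_min=25, area_max=100):
    """True if the cell is within every threshold (bounds assumed inclusive)."""
    return (
        cell["falsecode"] <= falsecode_max
        and cell["negprobe"] <= negprobe_max
        and ntranscript_min <= cell["ntranscript"] <= ntranscript_max
        and area_min <= cell["area"] <= area_max
    )


kept = [cell["cell"] for cell in cells if passes_qc(cell)]
strict = [cell["cell"] for cell in cells if passes_qc(cell, ntranscript_min=50)]
```

With the defaults, c2 fails on false codes and c3 on area; raising ntranscript_min to 50 additionally removes c4.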
- stampede.pp.filter_edges(adata, all_edges=0, left=0, top=0, right=0, bottom=0, slide=None, verbose=True)#
Filter cells based on their distance to one or more edges of their FOV. Uses the largest distance specified per edge.
- Parameters:
adata – adata object
all_edges (int) – minimum distance from any edge in pixels
left (int) – minimum distance from the left edge in pixels (x = xmin + left)
top (int) – minimum distance from the top edge in pixels (y = ymin + top)
right (int) – minimum distance from the right edge in pixels (x = xmax - right)
bottom (int) – minimum distance from the bottom edge in pixels (y = ymax - bottom)
slide (int) – which slide to filter (default: all)
verbose (bool) – provide written feedback
- Returns:
the filtered adata object
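The edge parameters describe a rectangle inside the FOV. A rough sketch of that geometry, assuming that the largest of all_edges and the edge-specific value applies to each edge, that the cell position is a single (x, y) point, and that cells exactly on the boundary are kept. The helper names and the FOV bounds are hypothetical.

```python
def edge_margins(all_edges=0, left=0, top=0, right=0, bottom=0):
    """Per edge, use the larger of the general and the edge-specific distance."""
    return {
        "left": max(all_edges, left),
        "top": max(all_edges, top),
        "right": max(all_edges, right),
        "bottom": max(all_edges, bottom),
    }


def inside(x, y, fov, margins):
    """True if the point lies within the shrunken FOV rectangle."""
    return (
        fov["xmin"] + margins["left"] <= x <= fov["xmax"] - margins["right"]
        and fov["ymin"] + margins["top"] <= y <= fov["ymax"] - margins["bottom"]
    )


# Hypothetical FOV bounds and cell positions, in pixels.
fov = {"xmin": 0, "xmax": 1000, "ymin": 0, "ymax": 800}
cells = {"a": (5, 400), "b": (500, 400), "c": (995, 400), "d": (500, 30)}

margins = edge_margins(all_edges=10, top=50)
kept = [name for name, (x, y) in cells.items() if inside(x, y, fov, margins)]
```

Cells a and c are too close to the left and right edges, and d is within 50 pixels of the top edge, so only b remains.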
- stampede.pp.filter_genes(adata, ncell_min=0, ncell_max=inf, ntranscript_min=0, ntranscript_max=inf, filter_columns=None, verbose=True)#
Filter adata.var by a set of qc_params.
- Parameters:
adata (
AnnData) – adata object
ncell_min (int) – minimum number of cells the gene is found in.
ncell_max (int) – maximum number of cells the gene is found in.
ntranscript_min (int) – minimum number of transcripts the gene must have.
ntranscript_max (int) – maximum number of transcripts the gene may have.
filter_columns (str|list) – a list of additional columns to filter by. Columns must be (convertible to) boolean; genes with False values are removed.
verbose (bool) – provide written feedback
- Return type:
- Returns:
the filtered adata object
- stampede.pp.gene_qc(adata, mult=1, noise_threshold=None, overwrite=True)#
Add QC parameters to adata.var.
- About the Signal-to-noise filter:
Approach from https://doi.org/10.1038/s41467-025-64990-y Wang et al. “Systematic benchmarking of imaging spatial transcriptomics platforms in FFPE tissues” Nat Com, 2025.
Calculate the mean expression and standard deviation (STD) of the negative control probes. Flag genes with an average expression < mean + mult × STD of the control probes.
The paper used mult=2.
- Parameters:
adata (AnnData) – an adata object
noise_threshold (float|Iterable) – manually specify the minimum mean_Transcript threshold. If None, use the filter specified above.
mult (int|float) – if noise_threshold is None, mult is used in the noise threshold computation specified above.
overwrite (bool) – overwrite existing qc columns
- Return type:
- Returns:
Nothing, updates adata.var
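The signal-to-noise rule can be worked through with a few numbers. The sketch below uses made-up mean expression values, and it assumes the sample standard deviation; whether stampede uses the sample or population form is not stated here.

```python
import statistics

# Hypothetical mean expression of negative control probes and of genes.
negprobe_means = [0.02, 0.04, 0.03, 0.05, 0.01]
gene_means = {"GeneA": 0.90, "GeneB": 0.04, "GeneC": 0.06}


def noise_threshold(control_means, mult=1):
    """mean + mult * standard deviation of the control probe expression."""
    mean = statistics.mean(control_means)
    std = statistics.stdev(control_means)  # sample std; an assumption
    return mean + mult * std


threshold = noise_threshold(negprobe_means, mult=2)
below_noise = sorted(g for g, m in gene_means.items() if m < threshold)
```

With mult=2 the threshold is roughly 0.0616, so GeneB and GeneC would be flagged while GeneA would not.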
- stampede.pp.gene_qc_postfilter(adata)#
Compute metadata after filtering
- stampede.pp.knn_count_smoothing(adata, layer='binary', layer_added=None, neighbors_key='neighbors', verbose=True)#
For each cell, replace its gene vector with the average of its KNN neighborhood.
Runs sc.pp.neighbors if it has not run. See https://scanpy.readthedocs.io/en/stable/api/generated/scanpy.pp.neighbors.html
- Parameters:
- Return type:
- Returns:
Nothing, updates adata.layers and adata.X
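Averaging over a KNN neighborhood can be sketched without any dependencies. The neighbor indices below are hypothetical stand-ins for the graph computed by sc.pp.neighbors, and counting the cell itself as part of its neighborhood is an assumption.

```python
# Binary detection vectors per cell (stand-in for adata.layers["binary"]).
cells = [
    [1, 0, 1],
    [1, 1, 0],
    [0, 0, 1],
    [0, 1, 1],
]
# Hypothetical neighborhood per cell; each list includes the cell itself.
neighborhoods = [
    [0, 1],
    [1, 0],
    [2, 3],
    [3, 2],
]


def knn_smooth(cells, neighborhoods):
    """Replace each cell's vector with the mean over its neighborhood."""
    n_genes = len(cells[0])
    smoothed = []
    for members in neighborhoods:
        smoothed.append(
            [sum(cells[i][j] for i in members) / len(members) for j in range(n_genes)]
        )
    return smoothed


smoothed = knn_smooth(cells, neighborhoods)
```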
- stampede.pp.pseudobulk(adata, column, layer='binary')#
Generate a pseudobulk table (genes x samples) for all samples in the sample_column and the cluster in the cluster_column, if specified.
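A genes x samples pseudobulk table can be pictured as a per-group aggregation of the per-cell values. The sketch below sums the values of each gene within each group; that the aggregation is a sum is an assumption, and the helper is a plain-Python illustration rather than the stampede function.

```python
genes = ["GeneA", "GeneB"]
# Rows are cells, columns are genes (stand-in for the "binary" layer).
binary = [[1, 0], [1, 1], [0, 1], [1, 1]]
samples = ["s1", "s1", "s2", "s2"]  # stand-in for adata.obs[column]


def pseudobulk(values, groups, genes):
    """Sum the per-cell values of each gene within every group."""
    table = {gene: {} for gene in genes}
    for row, group in zip(values, groups):
        for gene, value in zip(genes, row):
            table[gene][group] = table[gene].get(group, 0) + value
    return table


table = pseudobulk(binary, samples, genes)
```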
- stampede.pp.slide_qc(adata, slides, add_cols=None, data_dir=None)#
Use the fov_positions file to create a dataframe with metadata columns per slide and fov, and store this in adata.uns[“fov_metadata”]. Additionally adds columns to adata.obs reflecting the distance from the cell to the camera’s FOV edge.
- Parameters:
adata (AnnData) – adata object generated using the slides dict
slides (dict) – a dictionary with the slide number as keys, and a dictionary as values. The value dict must contain the keys “exprmat” and “metadata”, which should map to the matching respective files
add_cols (Iterable|str) – additional columns to visualize (e.g. conditions)
data_dir (str) – optional filepath prefix
- Return type:
- Returns:
Nothing, updates adata.uns and adata.obs
stampede.pl#
plotting functions
- stampede.pl.avg_per_pixel(adata, column, fill_cell_area=True, normalize_cell_area=True, log1p=False, cmap='gist_rainbow', background_color='black', figsize=(20, 15), subplot_kwargs=None, plot_kwargs=None)#
Plot the average values of the given column over all FOVs. Colors only the cell’s center pixel, unless fill_cell_area is set to True (slow).
- Parameters:
adata (AnnData) – an adata object
column (str) – a column in adata.obs with numeric values
fill_cell_area (bool) – distribute the column value over all pixels covered by the cell, assuming square cells
normalize_cell_area (bool) – if fill_cell_area is True, normalize the column value over the cell area
log1p (bool) – normalize the final values per pixel?
cmap (tuple[float,float,float] |str|tuple[float,float,float,float] |tuple[tuple[float,float,float] |str,float] |tuple[tuple[float,float,float,float],float]) – colormap (default: “gist_rainbow”)
background_color (tuple[float,float,float] |str|tuple[float,float,float,float] |tuple[tuple[float,float,float] |str,float] |tuple[tuple[float,float,float,float],float]) – color for pixels with 0 values (default: “black”)
figsize (tuple) – figure size
subplot_kwargs (dict) – kwargs passed to plt.subplots
plot_kwargs (dict) – kwargs passed to the main plotting function
- Return type:
- Returns:
matplotlib figure and array of axes
- stampede.pl.column_distribution(adata, column, axis=None, min_quantile=0.0, max_quantile=0.95, subplot_kwargs=None, plot_kwargs=None)#
Plot the distribution of values for a column present in either adata.obs or adata.var.
- Parameters:
adata (AnnData) – an adata object.
column (str) – a column in either adata.obs or adata.var
axis (int) – specify if the column name is present in both obs (0) and var (1).
min_quantile (float) – lowest quantile of values to plot
max_quantile (float) – highest quantile of values to plot
subplot_kwargs (dict) – kwargs passed to plt.subplots
plot_kwargs (dict) – kwargs passed to the main plotting function
- Return type:
- Returns:
matplotlib figure and array of axes
- stampede.pl.correlations(adata, xcolumn, ycolumn, log1p_xcolumn=False, log1p_ycolumn=False, color_xcolumn=None, color_ycolumn=None, cmap_2d='Blues', min_quantile=0.0, max_quantile=0.99, bins_1d='auto', bins_2d='auto', stat='percent', figsize=(8, 7), subplot_kwargs=None, plot_kwargs=None)#
Plot the distributions and 2D correlation between two columns in adata.obs.
- Parameters:
adata (AnnData) – an adata object
xcolumn (str) – column in adata.obs to plot on the x-axis
ycolumn (str) – column in adata.obs to plot on the y-axis
log1p_xcolumn (bool) – normalize the xcolumn?
log1p_ycolumn (bool) – normalize the ycolumn?
color_xcolumn (tuple[float,float,float] |str|tuple[float,float,float,float] |tuple[tuple[float,float,float] |str,float] |tuple[tuple[float,float,float,float],float]) – color of the xcolumn plot
color_ycolumn (tuple[float,float,float] |str|tuple[float,float,float,float] |tuple[tuple[float,float,float] |str,float] |tuple[tuple[float,float,float,float],float]) – color of the ycolumn plot
cmap_2d (tuple[float,float,float] |str|tuple[float,float,float,float] |tuple[tuple[float,float,float] |str,float] |tuple[tuple[float,float,float,float],float]) – colormap of the 2d correlation plot
min_quantile (float) – lowest quantile of values to plot
max_quantile (float) – highest quantile of values to plot
bins_1d (str|int) – number of bins on the 1-dimensional histogram plots
bins_2d (str|int) – number of bins on the 2-dimensional histogram plot
stat (str) – which statistic to plot, see sns.histplot for more details
figsize (tuple) – figure size
subplot_kwargs (dict) – kwargs passed to plt.subplots
plot_kwargs (dict) – kwargs passed to the main plotting function
- Return type:
- Returns:
matplotlib figure and array of axes
- stampede.pl.dim_red(adata, columns, obsm_key='X_svd', cmap='tab10', n_dims=6, subset_size=2000, random_state=42)#
Grid plot visualizing a range of reduced dimensions.
- Parameters:
adata (AnnData) – adata object
columns (str|Iterable) – one or more columns in adata.obs to plot. One multiplot per column
obsm_key (str) – key in adata.obsm with dim_red output
cmap (tuple[float,float,float] |str|tuple[float,float,float,float] |tuple[tuple[float,float,float] |str,float] |tuple[tuple[float,float,float,float],float]) – colormap
n_dims (int|tuple) – number of dimensions to plot, or a tuple with dimensions
subset_size (int) – subsample the data to this number (per column)
random_state (int) – random seed value
- Return type:
- Returns:
a list of tuples with matplotlib figure and axis
- stampede.pl.ncell_per_condition(adata, columns, offset_between_conditions=1, palette='terrain', subplot_kwargs=None, plot_kwargs=None, text_kwargs=None)#
Plot the number of cells per condition in a column in adata.obs.
- Parameters:
adata (AnnData) – an adata object
columns (str|list) – one or more columns in adata.obs to visualize, in order of significance
offset_between_conditions (int|list) – distance between different conditions. Can be a single value, or a list of offset values for each column (length=len(columns)-1)
palette (tuple[float,float,float] |str|tuple[float,float,float,float] |tuple[tuple[float,float,float] |str,float] |tuple[tuple[float,float,float,float],float] |dict[str,str]) – color palette (default: “terrain”)
subplot_kwargs (dict) – kwargs passed to plt.subplots
plot_kwargs (dict) – kwargs passed to the main plotting function
text_kwargs (dict) – kwargs passed to ax.set_xticks and ax.set_yticks
- Return type:
- Returns:
matplotlib figure and array of axes
- stampede.pl.paired_binomial_glm_volcano(df, symbol_column='index', or_column='odds_ratio', pvalue_column='padj', separation_column='perfect_separation', pval_thresh=0.05, l2or_thresh=0.75, to_label=5, drop_perfect_separation=True, subplot_kwargs=None, plot_kwargs=None, text_kwargs=None)#
Generate a volcano plot from the detection_rates results dataframe.
- Parameters:
df (DataFrame) – a dataframe
symbol_column (str) – column name of gene IDs to use
or_column (str) – column name of odds ratios
pvalue_column (str) – column name of the adjusted p values to be converted to -log10 p-values
separation_column (str) – boolean column denoting perfect separations
pval_thresh (float) – threshold pvalue_column for genes to be significant
l2or_thresh (float) – threshold for the log2 odds ratios to be considered significant
to_label (int|list|None) – the number of top genes (down and up each) to be labeled
drop_perfect_separation (bool) – whether to drop the genes with perfect separations
subplot_kwargs (dict) – kwargs passed to plt.subplots
plot_kwargs (dict) – kwargs passed to the main plotting function
text_kwargs (dict) – kwargs passed to ax.text
- Return type:
- Returns:
matplotlib figure and axis object
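How the two thresholds partition genes on a volcano plot can be sketched as follows. The rows are made up, the classify helper is hypothetical, and it assumes strict inequalities and a threshold on the absolute log2 odds ratio; the plotting function may differ in these details.

```python
import math

# Hypothetical rows from a results dataframe with odds_ratio and padj columns.
results = [
    {"gene": "GeneA", "odds_ratio": 4.0, "padj": 0.001},
    {"gene": "GeneB", "odds_ratio": 0.25, "padj": 0.01},
    {"gene": "GeneC", "odds_ratio": 1.2, "padj": 0.0001},
    {"gene": "GeneD", "odds_ratio": 8.0, "padj": 0.2},
]


def classify(row, pval_thresh=0.05, l2or_thresh=0.75):
    """Return volcano coordinates (x, y) and a label for one gene."""
    x = math.log2(row["odds_ratio"])   # effect size on the x-axis
    y = -math.log10(row["padj"])       # significance on the y-axis
    if row["padj"] < pval_thresh and abs(x) > l2or_thresh:
        label = "up" if x > 0 else "down"
    else:
        label = "not significant"
    return x, y, label


labels = {row["gene"]: classify(row)[2] for row in results}
```

GeneC is significant but has a small effect, and GeneD has a large effect but misses the p-value threshold, so neither would be highlighted.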
- stampede.pl.pydeseq2_volcano(df, symbol_column='index', log2fc_column='log2FoldChange', pvalue_column='padj', basemean_column='baseMean', pval_thresh=0.05, log2fc_thresh=0.75, to_label=5, subplot_kwargs=None, plot_kwargs=None, text_kwargs=None)#
Generate a volcano plot from a pyDESeq2 results dataframe.
Adapted from mousepixels/sanbomics
- Parameters:
df (DataFrame) – a pyDESeq2 results dataframe
symbol_column (str) – column name of gene IDs to use
log2fc_column (str) – column name of log2 Fold-Change values
pvalue_column (str) – column name of the adjusted p values to be converted to -log10 p-values
basemean_column (str) – column name of base mean values for each gene
pval_thresh (float) – threshold pvalue_column for points to be significant
log2fc_thresh (float) – threshold for the absolute value of the log2 fold change to be considered significant
to_label (int|list|None) – If an int is passed, that number of top down and up genes will be labeled. If a list of gene IDs is passed, only those will be labeled
subplot_kwargs (dict) – kwargs passed to plt.subplots
plot_kwargs (dict) – kwargs passed to the main plotting function
text_kwargs (dict) – kwargs passed to ax.text
- Return type:
- Returns:
matplotlib figure and axis object
- stampede.pl.scree(adata, obsm_key='X_svd')#
Scree plot
- stampede.pl.sketch(adata, obs_column='subset', use_rep='X_svd', plot_kwargs=None)#
Scatterplot highlighting the cells that were sampled. Requires the full adata object.
- Parameters:
- Return type:
- Returns:
matplotlib figure and array of axes
- stampede.pl.slide_qc(adata, columns=None, figsize=None, subplot_kwargs=None, plot_kwargs=None, legend_kwargs=None)#
Plot the values from one or more QC columns in adata.uns[“fov_metadata”] (added by slide_qc_data()). Specify columns to limit the number of plots.
- Parameters:
adata (AnnData) – an adata object
columns (str|Iterable) – columns in adata.uns[“fov_metadata”] to plot (default: all)
figsize (tuple) – figure size tuple, will be multiplied by the number of plots
subplot_kwargs (dict) – kwargs passed to plt.subplots
plot_kwargs (dict) – kwargs passed to the main plotting function
legend_kwargs (dict) – kwargs passed to the legends
- Return type:
- Returns:
matplotlib figure and array of axes
- stampede.pl.value_distribution(adata, layer=None, min_quantile=0.0, max_quantile=0.95, subplot_kwargs=None, plot_kwargs=None)#
Plot the number of occurrences of values in the dataset.
- Parameters:
adata (AnnData) – an adata object.
layer (str) – the layer the values are drawn from (default: X)
min_quantile (float) – lowest quantile of values to plot
max_quantile (float) – highest quantile of values to plot
subplot_kwargs (dict) – kwargs passed to plt.subplots
plot_kwargs (dict) – kwargs passed to the main plotting function
- Return type:
- Returns:
matplotlib figure and array of axes
- stampede.pl.violin(adata, columns, inner='quart', fill=False, cut=0, log_scale=False, figsize=None, subplot_kwargs=None, plot_kwargs=None)#
Violin plots for one or more columns in adata.obs.
Wraps seaborn’s violinplot. See https://seaborn.pydata.org/generated/seaborn.violinplot.html
- Parameters:
adata (AnnData) – an adata object
inner (str) – See sns.violinplot for more details
fill (bool) – See sns.violinplot for more details
cut (int) – See sns.violinplot for more details
log_scale (bool|Sequence[bool]) – See sns.violinplot for more details
figsize (tuple) – figure size tuple, will be multiplied by the number of plots
subplot_kwargs (dict) – kwargs passed to plt.subplots
plot_kwargs (dict) – kwargs passed to the main plotting function
- Return type:
- Returns:
matplotlib figure and array of axes
stampede.tl#
analysis tools
- stampede.tl.paired_binomial_glm(df, adata, column, test_condition, reference_condition, condition_column='condition', covariate_columns=None, random_state=42)#
- Runs paired sample-level binomial GLM:
gene_detection_rate ~ condition + covariate(s)
- Parameters:
df (DataFrame) – dataframe with detection rates per gene per sample
adata (AnnData) – the adata from which the detection rates were obtained
column (str) – the column in adata.obs from which the detection rate df column names were obtained
test_condition (str) – the condition to compare (e.g., “treated”)
reference_condition (str) – the baseline condition (e.g., “control”)
condition_column (str) – column with the conditions
covariate_columns (str|list) – column(s) with covariates (e.g. “batch”)
random_state (int) – random seed value
- Return type:
- Returns:
per-gene results including beta, odds_ratio, pval, padj
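The model formula above can be written out per gene. The form below assumes a logit link, which the odds_ratio output suggests but this page does not state. The symbols are introduced here for illustration only: p_gs is the detection rate of gene g in sample s, the condition indicator is 1 for test_condition and 0 for reference_condition, and the gamma terms are covariate coefficients.

```latex
\log\frac{p_{gs}}{1 - p_{gs}}
  = \beta_{g0}
  + \beta_{g1}\,\mathrm{condition}_{s}
  + \sum_{k} \gamma_{gk}\,\mathrm{covariate}_{sk},
\qquad
\mathrm{odds\_ratio}_{g} = e^{\beta_{g1}}
```

Under this reading, an odds ratio above 1 means the gene is detected more often in the test condition than in the reference condition.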
- stampede.tl.pydeseq2(counts, adata, column, test_condition, reference_condition, condition_column, covariate_columns=None, inference=None, n_cpus=16, return_objects=False, dds_kwargs=None, ds_kwargs=None)#
pyDESeq2 wrapper for pseudobulk DGE
- Parameters:
counts (DataFrame) – dataframe with counts per gene per sample
adata (AnnData) – the adata from which the counts were obtained
column (str) – column in adata.obs with groups to compare
test_condition (str) – the condition to compare (e.g., “treated”)
reference_condition (str) – the baseline condition (e.g., “control”)
condition_column (str) – column with the conditions
covariate_columns (str|list) – column(s) with covariates (e.g. “batch”)
inference (Inference) – pyDESeq2 inference class instance
n_cpus (int) – number of threads to use
return_objects (bool) – return the DeseqDataSet, DeseqStats and the results_df. If False, only return the results_df
dds_kwargs (dict) – kwargs passed to DeseqDataSet
ds_kwargs (dict) – kwargs passed to DeseqStats
- Returns:
pydeseq2 output
- stampede.tl.sketch(adata, n=None, frac=0.05, use_rep='X_svd', obs_column='subset', random_seed=42, return_subset=False, **kwargs)#
Subset the cells in adata using GeoSketch.
- Parameters:
adata (AnnData) – adata object
n (int) – the number of cells to keep. If None, frac will be used instead.
frac (float) – the fraction of cells to keep. Only used if n is None.
use_rep (str) – use the indicated representation.
obs_column (str) – add this column to adata.obs with boolean values if the cell is kept.
random_seed (int) – random seed passed to numpy.
return_subset (bool) – if True, return a subset adata object.
**kwargs – kwargs passed to geosketch.gs.
- Return type:
- Returns:
The subset anndata object (if specified)
Configuration#
- stampede.config#
A dictionary with package-specific settings that may be altered at runtime. Accessed using import stampede as st; st.config.
Keys may not be added or removed, but values may be changed.
Default config items:
{
# columns found in the exprmat_file that represent metadata
"exprmat_md_columns": ["fov", "cell_ID"],
# columns found in the metadata_file that represent metadata
"metadata_md_columns": ["fov", "cell_ID"],
# columns found in the sample_file that represent metadata
"sample_md_columns": ["sample", "slide", "fovs"],
# directory to write (temporary) adata objects to
"adata_dir": "adatas",
}
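The rule that values may change while keys may not can be illustrated with a small dict subclass. This is a sketch of the behavior only: FixedKeyDict is a hypothetical class, not the type stampede uses, the path "results/adatas" is made up, and methods such as update or pop are left unguarded for brevity.

```python
class FixedKeyDict(dict):
    """Dictionary whose values can change but whose set of keys cannot."""

    def __setitem__(self, key, value):
        if key not in self:
            raise KeyError(f"unknown config key: {key!r}")
        super().__setitem__(key, value)

    def __delitem__(self, key):
        raise KeyError("config keys cannot be removed")


# Two of the default items listed above.
config = FixedKeyDict({
    "exprmat_md_columns": ["fov", "cell_ID"],
    "adata_dir": "adatas",
})

config["adata_dir"] = "results/adatas"  # changing a value is allowed

try:
    config["new_key"] = 1  # adding a key is rejected
    added = True
except KeyError:
    added = False

try:
    del config["adata_dir"]  # removing a key is rejected
    removed = True
except KeyError:
    removed = False
```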