API reference#

stampede#

STAMPede - STAMP data Exploration and Differential Expression

stampede.read_cosmx(slides, samples_df, adata_file, samples_df_columns=None, metadata_df_columns=None, data_dir=None, overwrite=True, verbose=True, **kwargs)#

Read exprMat_file for each slide, convert the contents to sparse anndata objects, and concatenate the results.

Parameters:
  • slides (dict) – a dictionary with the slide number as keys and a dictionary as values. Each value dict must contain the keys “exprmat” and “metadata”, which should map to the respective files

  • samples_df (DataFrame) – a dataframe with sample metadata to be added to adata.obs

  • adata_file (str) – filepath to write the adata object to

  • samples_df_columns (list) – list of columns in samples_df to add to adata.obs (default: all)

  • metadata_df_columns (list) – list of columns in the metadata file to add to adata.obs (default: all)

  • data_dir (str) – optional filepath prefix

  • overwrite (bool) – overwrite existing output

  • verbose (bool) – provide written feedback

  • **kwargs – keyword arguments passed to pd.read_csv

Return type:

str

Returns:

the value of the adata_file argument

stampede.validate_input(slides, samples_df, data_dir=None)#

Check the contents of the slides dictionary and samples_df for expected keys and columns, respectively.

Parameters:
  • slides (dict) – a dictionary with the slide number as keys and a dictionary as values. Each value dict must contain the keys “exprmat” and “metadata”, which should map to the respective files

  • samples_df (DataFrame) – a dataframe with sample metadata

  • data_dir (str) – optional filepath prefix

Return type:

None

Returns:

Nothing
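
To illustrate the expected input, the sketch below builds a slides dictionary and checks it for the two required keys. The file names and the helpers missing_slide_keys and resolve are hypothetical and not part of stampede; only the keys “exprmat” and “metadata” and the idea of a data_dir prefix come from the descriptions above.

```python
import os

# Hypothetical file names; only the "exprmat" and "metadata" keys are documented.
slides = {
    1: {"exprmat": "slide1_exprMat_file.csv", "metadata": "slide1_metadata_file.csv"},
    2: {"exprmat": "slide2_exprMat_file.csv"},  # "metadata" is missing on purpose
}

REQUIRED_KEYS = ("exprmat", "metadata")


def missing_slide_keys(slides):
    """Return {slide: [missing keys]} for slides lacking a required key."""
    problems = {}
    for slide, files in slides.items():
        missing = [key for key in REQUIRED_KEYS if key not in files]
        if missing:
            problems[slide] = missing
    return problems


def resolve(path, data_dir=None):
    """Prefix a path with data_dir, assuming data_dir is a plain directory prefix."""
    return os.path.join(data_dir, path) if data_dir else path


problems = missing_slide_keys(slides)
```

Here problems reports that slide 2 lacks its “metadata” entry. How validate_input itself reports such problems is not stated in this reference.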

stampede.pp#

preprocessing functions

stampede.pp.binarize(adata, verbose=True)#

Binarize the values in adata.X

Parameters:
  • adata (AnnData) – adata object

  • verbose (bool) – provide written feedback

Return type:

None

Returns:

Nothing, updates adata.layers and adata.X
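
A minimal sketch of what binarizing a count matrix means, using nested lists in place of adata.X. It assumes that every count above zero becomes 1; the exact rule and the layer names stampede uses are not given here.

```python
# Toy count matrix (rows: cells, columns: genes).
counts = [
    [0, 3, 1],
    [2, 0, 0],
]

# Assumption: every count above zero becomes 1, zeros stay 0.
binary = [[1 if value > 0 else 0 for value in row] for row in counts]
```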

stampede.pp.cell_qc_postfilter(adata)#

Compute metadata after filtering

Parameters:
  • adata (AnnData) – an adata object

Return type:

None

Returns:

Nothing, updates adata.obs

stampede.pp.combine_obs_columns(adata, columns, column_name, delim='_')#

Create a new column in adata.obs by combining all columns with the delimiter.

Parameters:
  • adata (AnnData) – an adata object

  • columns (list) – a list of columns in adata.obs to combine

  • column_name (str) – the name for the new column

  • delim (str) – the delimiter to use while joining the columns

Returns:

Nothing, updates adata.obs
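
The row-wise join can be sketched with a plain dictionary standing in for adata.obs. The helper combine_columns and the toy column names are hypothetical; converting values to strings before joining is an assumption.

```python
# Toy stand-in for adata.obs: column name -> one value per cell.
obs = {
    "sample": ["A", "A", "B"],
    "slide": [1, 2, 1],
}


def combine_columns(obs, columns, delim="_"):
    """Join the values of the given columns row by row."""
    rows = zip(*(obs[column] for column in columns))
    return [delim.join(str(value) for value in row) for row in rows]


obs["sample_slide"] = combine_columns(obs, ["sample", "slide"])
```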

stampede.pp.detection_rates(adata, column, normalize=True)#

Calculate gene detection rates per group in the specified column of adata.obs.

Parameters:
  • adata (AnnData) – adata object

  • column (str) – column in adata.obs with groups to compare

  • normalize (bool) – normalize detection rates for sample quality

Return type:

DataFrame

Returns:

a dataframe with gene detection rates per group (normalized if normalize is True)
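
As a rough sketch, an unnormalized detection rate can be read as the fraction of cells in a group in which a gene is detected. The code below uses toy lists and a hypothetical helper; the normalization for sample quality is not specified in this reference and is deliberately left out.

```python
from collections import defaultdict

# Toy data: one group label per cell, and binary detections (cells x genes).
groups = ["ctrl", "ctrl", "treated", "treated"]
genes = ["GeneA", "GeneB"]
detected = [
    [1, 0],
    [1, 1],
    [0, 1],
    [0, 1],
]


def detection_rates(detected, groups, genes):
    """Return {group: {gene: fraction of cells with the gene detected}}."""
    totals = defaultdict(lambda: [0] * len(genes))
    n_cells = defaultdict(int)
    for row, group in zip(detected, groups):
        n_cells[group] += 1
        for i, value in enumerate(row):
            totals[group][i] += 1 if value > 0 else 0
    return {
        group: {gene: totals[group][i] / n_cells[group] for i, gene in enumerate(genes)}
        for group in totals
    }


rates = detection_rates(detected, groups, genes)
```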

stampede.pp.dim_red(adata, n_dims=50, use_genes=None, key_added='X_svd', random_state=42)#

Dimensionality reduction using Term Frequency Latent Semantic Indexing.

Parameters:
  • adata (AnnData) – adata object

  • n_dims (int) – number of dimensions to produce

  • use_genes (str) – Boolean column in adata.var with True for genes to be used in dimensionality reduction.

  • key_added (str) – key in adata.obsm for function output

  • random_state (int) – random seed value

Return type:

None

Returns:

Nothing, updates adata.obsm and adata.uns

stampede.pp.filter_cells(adata, falsecode_max=5, negprobe_max=3, ntranscript_min=0, ntranscript_max=inf, area_min=25, area_max=100, filter_columns=None, filter_internalqc=False, verbose=True)#

Filter adata.obs by a set of qc_params.

Parameters:
  • adata (AnnData) – adata object

  • falsecode_max (int) – maximum number of false codes the cell may have

  • negprobe_max (int) – maximum number of negative probes the cell may have

  • ntranscript_min (int) – minimum number of transcripts the cell must have

  • ntranscript_max (int) – maximum number of transcripts the cell may have

  • area_min (int) – minimum area (in pixels) the cell must have

  • area_max (int) – maximum area (in pixels) the cell must have

  • filter_columns (list) – a list of additional columns to filter by. Columns must be (convertible to) boolean; cells with False values are removed.

  • filter_internalqc (bool) – filter by columns qcCellsPassed and qcFlagsFOV.

  • verbose (bool) – provide written feedback

Return type:

AnnData

Returns:

the filtered adata object
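
The thresholds can be pictured as a set of bounds that every cell must satisfy. The sketch below uses the default values from the signature; the per-cell key names, the helper passes_qc, and the choice of inclusive bounds are assumptions rather than documented behaviour.

```python
import math

# Toy per-cell QC values; the key names are hypothetical.
cells = [
    {"id": "c1", "falsecode": 0, "negprobe": 1, "ntranscript": 120, "area": 60},
    {"id": "c2", "falsecode": 9, "negprobe": 0, "ntranscript": 300, "area": 50},
    {"id": "c3", "falsecode": 1, "negprobe": 0, "ntranscript": 80, "area": 10},
]

# Defaults copied from the filter_cells signature.
params = {
    "falsecode_max": 5,
    "negprobe_max": 3,
    "ntranscript_min": 0,
    "ntranscript_max": math.inf,
    "area_min": 25,
    "area_max": 100,
}


def passes_qc(cell, p):
    """Assumption: all bounds are inclusive."""
    return (
        cell["falsecode"] <= p["falsecode_max"]
        and cell["negprobe"] <= p["negprobe_max"]
        and p["ntranscript_min"] <= cell["ntranscript"] <= p["ntranscript_max"]
        and p["area_min"] <= cell["area"] <= p["area_max"]
    )


kept = [cell["id"] for cell in cells if passes_qc(cell, params)]
```

In this toy input, c2 exceeds falsecode_max and c3 falls below area_min, so only c1 is kept.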

stampede.pp.filter_edges(adata, all_edges=0, left=0, top=0, right=0, bottom=0, slide=None, verbose=True)#

Filter cells based on their distance to one or more edges of their FOV. If both all_edges and an edge-specific distance are given, the larger of the two is used for that edge.

Parameters:
  • adata (AnnData) – adata object

  • all_edges (int) – minimum distance from any edge in pixels

  • left (int) – minimum distance from the left edge in pixels (x = xmin + left)

  • top (int) – minimum distance from the top edge in pixels (y = ymin + top)

  • right (int) – minimum distance from the right edge in pixels (x = xmax - right)

  • bottom (int) – minimum distance from the bottom edge in pixels (y = ymax - bottom)

  • slide (int) – which slide to filter (default: all)

  • verbose (bool) – provide written feedback

Returns:

the filtered adata object
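
A sketch of the edge logic described by the parameters, under stated assumptions: the FOV bounds are given as a toy dictionary, cells exactly on a boundary are kept, and both helper functions are hypothetical.

```python
def edge_margins(all_edges=0, left=0, top=0, right=0, bottom=0):
    """Per edge, use the larger of all_edges and the edge-specific distance."""
    return {
        "left": max(all_edges, left),
        "top": max(all_edges, top),
        "right": max(all_edges, right),
        "bottom": max(all_edges, bottom),
    }


def inside(x, y, fov, margins):
    """Assumption: cells exactly on the boundary are kept."""
    return (
        fov["xmin"] + margins["left"] <= x <= fov["xmax"] - margins["right"]
        and fov["ymin"] + margins["top"] <= y <= fov["ymax"] - margins["bottom"]
    )


fov = {"xmin": 0, "xmax": 100, "ymin": 0, "ymax": 100}  # toy FOV bounds in pixels
margins = edge_margins(all_edges=5, left=20)
cells = {"near_left": (10, 50), "center": (50, 50), "near_top": (50, 3)}
kept = [name for name, (x, y) in cells.items() if inside(x, y, fov, margins)]
```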

stampede.pp.filter_genes(adata, ncell_min=0, ncell_max=inf, ntranscript_min=0, ntranscript_max=inf, filter_columns=None, verbose=True)#

Filter adata.var by a set of qc_params.

Parameters:
  • adata (AnnData) – adata object

  • ncell_min (int) – minimum number of cells the gene is found in.

  • ncell_max (int) – maximum number of cells the gene is found in.

  • ntranscript_min (int) – minimum number of transcripts the gene must have.

  • ntranscript_max (int) – maximum number of transcripts the gene may have.

  • filter_columns (str | list) – one or more additional columns to filter by. Columns must be (convertible to) boolean; genes with False values are removed.

  • verbose (bool) – provide written feedback

Return type:

AnnData

Returns:

the filtered adata object

stampede.pp.gene_qc(adata, mult=1, noise_threshold=None, overwrite=True)#

Add QC parameters to adata.var.

About the Signal-to-noise filter:

Approach from https://doi.org/10.1038/s41467-025-64990-y Wang et al. “Systematic benchmarking of imaging spatial transcriptomics platforms in FFPE tissues” Nat Com, 2025.

Calculate the mean expression and standard deviation (STD) of the negative control probes. Flag genes with average expression < mean + mult × STD of the control probes.

The paper used mult=2.

Parameters:
  • adata (AnnData) – an adata object

  • noise_threshold (float | Iterable) – manually specify the minimum mean_Transcript threshold. If None, use the filter specified above.

  • mult (int | float) – if noise_threshold is None, mult is used in the noise threshold computation specified above.

  • overwrite (bool) – overwrite existing qc columns

Return type:

None

Returns:

Nothing, updates adata.var
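
The signal-to-noise rule above can be sketched with the standard library. The probe and gene values are made up, the helper name is hypothetical, and using the population standard deviation is an assumption; the reference does not say which estimator stampede uses.

```python
from statistics import mean, pstdev

# Toy mean expression values; gene names and numbers are made up.
negprobe_means = [0.01, 0.02, 0.03, 0.02]
gene_means = {"GeneA": 0.50, "GeneB": 0.025, "GeneC": 0.04}


def noise_threshold(negprobe_means, mult=1):
    """mean + mult * STD of the negative control probes.

    Assumption: population standard deviation (ddof=0).
    """
    return mean(negprobe_means) + mult * pstdev(negprobe_means)


threshold = noise_threshold(negprobe_means, mult=2)
below_noise = sorted(gene for gene, value in gene_means.items() if value < threshold)
```

With mult=2 the toy threshold is about 0.034, so only GeneB is flagged as below the noise level.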

stampede.pp.gene_qc_postfilter(adata)#

Compute metadata after filtering

Parameters:
  • adata (AnnData) – an adata object

Return type:

None

Returns:

Nothing, updates adata.var

stampede.pp.knn_count_smoothing(adata, layer='binary', layer_added=None, neighbors_key='neighbors', verbose=True)#

For each cell, replace its gene vector with the average of its KNN neighborhood.

Runs sc.pp.neighbors if it has not been run yet. See https://scanpy.readthedocs.io/en/stable/api/generated/scanpy.pp.neighbors.html

Parameters:
  • adata (AnnData) – adata object

  • layer (str) – name of the adata layer to use for smoothing

  • layer_added (str) – key in adata.layers for function output (default: “KNN_binary_mean”)

  • neighbors_key (str) – See sc.pp.neighbors for details

  • verbose (bool) – provide written feedback

Return type:

None

Returns:

Nothing, updates adata.layers and adata.X
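
The averaging step can be sketched as follows, with nested lists in place of the layer and an explicit neighbor list in place of the scanpy neighbor graph. Whether a cell counts as part of its own neighborhood is an assumption here, and knn_mean is a hypothetical helper.

```python
# Toy binary layer (cells x genes) and a neighbor list per cell.
binary = [
    [1, 0],
    [0, 0],
    [1, 1],
]
# Assumption: each neighborhood includes the cell itself.
neighbors = [
    [0, 1],
    [1, 0, 2],
    [2, 1],
]


def knn_mean(values, neighbors):
    """Replace each cell's gene vector with the mean over its neighborhood."""
    n_genes = len(values[0])
    smoothed = []
    for hood in neighbors:
        smoothed.append(
            [sum(values[cell][gene] for cell in hood) / len(hood) for gene in range(n_genes)]
        )
    return smoothed


smoothed = knn_mean(binary, neighbors)
```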

stampede.pp.pseudobulk(adata, column, layer='binary')#

Generate a pseudobulk table (genes x samples) by summing the values of the chosen layer per group in the specified column of adata.obs.

Parameters:
  • adata (AnnData) – adata object

  • column (str) – column in adata.obs with groups to compare

  • layer (str) – name of the adata layer to aggregate

Return type:

DataFrame

Returns:

a dataframe with summed layer values per sample
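
A minimal sketch of the aggregation, assuming the pseudobulk value is the plain sum of the layer over all cells sharing a label. It uses toy lists and a nested dictionary instead of an AnnData object and a DataFrame; the helper is hypothetical.

```python
from collections import defaultdict

genes = ["GeneA", "GeneB"]
samples = ["s1", "s1", "s2"]  # one label per cell
layer = [
    [1, 0],
    [1, 1],
    [0, 1],
]


def pseudobulk(layer, samples, genes):
    """Return {gene: {sample: summed value}} (genes x samples)."""
    table = {gene: defaultdict(int) for gene in genes}
    for row, sample in zip(layer, samples):
        for gene, value in zip(genes, row):
            table[gene][sample] += value
    return {gene: dict(per_sample) for gene, per_sample in table.items()}


table = pseudobulk(layer, samples, genes)
```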

stampede.pp.slide_qc(adata, slides, add_cols=None, data_dir=None)#

Use the fov_positions file to create a dataframe with metadata columns per slide and FOV, and store this in adata.uns[“fov_metadata”]. Additionally adds columns to adata.obs reflecting the distance from the cell to the camera’s FOV edge.

Parameters:
  • adata (AnnData) – adata object generated using the slides dict

  • slides (dict) – a dictionary with the slide number as keys and a dictionary as values. Each value dict must contain the keys “exprmat” and “metadata”, which should map to the respective files

  • add_cols (Iterable | str) – additional columns to visualize (e.g. conditions)

  • data_dir (str) – optional filepath prefix

Return type:

None

Returns:

Nothing, updates adata.uns and adata.obs

stampede.pl#

plotting functions

stampede.pl.avg_per_pixel(adata, column, fill_cell_area=True, normalize_cell_area=True, log1p=False, cmap='gist_rainbow', background_color='black', figsize=(20, 15), subplot_kwargs=None, plot_kwargs=None)#

Plot the average values of the given column over all FOVs. Colors only the cell’s center pixel, unless fill_cell_area is set to True (slow).

Parameters:
Return type:

tuple[Figure, list[Axes]]

Returns:

matplotlib figure and array of axes

stampede.pl.column_distribution(adata, column, axis=None, min_quantile=0.0, max_quantile=0.95, subplot_kwargs=None, plot_kwargs=None)#

Plot the distribution of values for a column present in either adata.obs or adata.var.

Parameters:
  • adata (AnnData) – an adata object.

  • column (str) – a column in either adata.obs or adata.var

  • axis (int) – which axis to use if the column name is present in both adata.obs (0) and adata.var (1).

  • min_quantile (float) – lowest quantile of values to plot

  • max_quantile (float) – highest quantile of values to plot

  • subplot_kwargs (dict) – kwargs passed to plt.subplots

  • plot_kwargs (dict) – kwargs passed to the main plotting function

Return type:

tuple[Figure, Axes]

Returns:

matplotlib figure and axis object

stampede.pl.correlations(adata, xcolumn, ycolumn, log1p_xcolumn=False, log1p_ycolumn=False, color_xcolumn=None, color_ycolumn=None, cmap_2d='Blues', min_quantile=0.0, max_quantile=0.99, bins_1d='auto', bins_2d='auto', stat='percent', figsize=(8, 7), subplot_kwargs=None, plot_kwargs=None)#

Plot the distributions and 2D correlation between two columns in adata.obs.

Parameters:
Return type:

tuple[Figure, list[Axes]]

Returns:

matplotlib figure and array of axes

stampede.pl.dim_red(adata, columns, obsm_key='X_svd', cmap='tab10', n_dims=6, subset_size=2000, random_state=42)#

Grid plot visualizing a range of reduced dimensions.

Parameters:
Return type:

list[tuple[Figure, Axes]]

Returns:

a list of tuples with matplotlib figure and axis

stampede.pl.ncell_per_condition(adata, columns, offset_between_conditions=1, palette='terrain', subplot_kwargs=None, plot_kwargs=None, text_kwargs=None)#

Plot the number of cells per condition in a column in adata.obs.

Parameters:
Return type:

tuple[Figure, list[Axes]]

Returns:

matplotlib figure and array of axes

stampede.pl.noise_threshold(adata, bins=50, **kwargs)#

Parameters:

stampede.pl.paired_binomial_glm_volcano(df, symbol_column='index', or_column='odds_ratio', pvalue_column='padj', separation_column='perfect_separation', pval_thresh=0.05, l2or_thresh=0.75, to_label=5, drop_perfect_separation=True, subplot_kwargs=None, plot_kwargs=None, text_kwargs=None)#

Generate a volcano plot from a dataframe of binomial GLM results on detection rates (e.g. the output of stampede.tl.paired_binomial_glm).

Parameters:
  • df (DataFrame) – a dataframe

  • symbol_column (str) – column name of gene IDs to use

  • or_column (str) – column name of odds ratios

  • pvalue_column (str) – column name of the adjusted p values to be converted to -log10 p-values

  • separation_column (str) – boolean column denoting perfect separations

  • pval_thresh (float) – threshold on pvalue_column below which genes are considered significant

  • l2or_thresh (float) – threshold for the log2 odds ratios to be considered significant

  • to_label (int | list | None) – the number of top genes (down and up each) to be labeled

  • drop_perfect_separation (bool) – whether to drop the genes with perfect separations

  • subplot_kwargs (dict) – kwargs passed to plt.subplots

  • plot_kwargs (dict) – kwargs passed to the main plotting function

  • text_kwargs (dict) – kwargs passed to ax.text

Return type:

tuple[Figure, Axes]

Returns:

matplotlib figure and axis object
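
The coordinates and significance calls of such a volcano plot can be sketched without any plotting. The rows below are made up, classify is a hypothetical helper, and the exact comparison rule (strict inequalities, absolute log2 odds ratio) is an assumption based on the parameter descriptions.

```python
import math

# Toy results rows; the values are made up.
results = [
    {"gene": "GeneA", "odds_ratio": 4.0, "padj": 0.001},
    {"gene": "GeneB", "odds_ratio": 0.25, "padj": 0.01},
    {"gene": "GeneC", "odds_ratio": 1.2, "padj": 0.0001},
    {"gene": "GeneD", "odds_ratio": 8.0, "padj": 0.2},
]


def classify(row, pval_thresh=0.05, l2or_thresh=0.75):
    """Assumption: significant if padj < pval_thresh and |log2 OR| > l2or_thresh."""
    l2or = math.log2(row["odds_ratio"])
    if row["padj"] < pval_thresh and abs(l2or) > l2or_thresh:
        return "up" if l2or > 0 else "down"
    return "ns"


# x = log2 odds ratio, y = -log10 adjusted p-value, plus the significance call.
points = [
    (row["gene"], math.log2(row["odds_ratio"]), -math.log10(row["padj"]), classify(row))
    for row in results
]
labels = {gene: label for gene, _, _, label in points}
```

GeneC has a very small adjusted p-value but a log2 odds ratio near zero, and GeneD has a large effect but a non-significant p-value, so both end up in the "ns" group.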

stampede.pl.pydeseq2_volcano(df, symbol_column='index', log2fc_column='log2FoldChange', pvalue_column='padj', basemean_column='baseMean', pval_thresh=0.05, log2fc_thresh=0.75, to_label=5, subplot_kwargs=None, plot_kwargs=None, text_kwargs=None)#

Generate a volcano plot from a pyDESeq2 results dataframe.

Adapted from mousepixels/sanbomics

Parameters:
  • df (DataFrame) – a pyDESeq2 results dataframe

  • symbol_column (str) – column name of gene IDs to use

  • log2fc_column (str) – column name of log2 Fold-Change values

  • pvalue_column (str) – column name of the adjusted p values to be converted to -log10 p-values

  • basemean_column (str) – column name of base mean values for each gene

  • pval_thresh (float) – threshold on pvalue_column below which points are considered significant

  • log2fc_thresh (float) – threshold for the absolute value of the log2 fold change to be considered significant

  • to_label (int | list | None) – If an int is passed, that number of top down and up genes will be labeled. If a list of gene IDs is passed, only those will be labeled

  • subplot_kwargs (dict) – kwargs passed to plt.subplots

  • plot_kwargs (dict) – kwargs passed to the main plotting function

  • text_kwargs (dict) – kwargs passed to ax.text

Return type:

tuple[Figure, Axes]

Returns:

matplotlib figure and axis object

stampede.pl.scree(adata, obsm_key='X_svd')#

Scree plot

Parameters:
  • adata (AnnData) – adata object

  • obsm_key (str) – key in adata.obsm with dim_red output

Return type:

tuple[Figure, Axes]

Returns:

matplotlib figure and axis object

stampede.pl.sketch(adata, obs_column='subset', use_rep='X_svd', plot_kwargs=None)#

Scatterplot highlighting the cells that were sampled. Requires the full adata object.

Parameters:
  • adata (AnnData) – adata object

  • obs_column (str) – column in adata.obs with boolean values if the cell is kept

  • use_rep (str) – use the indicated representation

  • plot_kwargs (dict) – kwargs passed to the main plotting function

Return type:

tuple[Figure, Axes]

Returns:

matplotlib figure and axis object

stampede.pl.slide_qc(adata, columns=None, figsize=None, subplot_kwargs=None, plot_kwargs=None, legend_kwargs=None)#

Plot the values from one or more QC columns in adata.uns[“fov_metadata”] (added by stampede.pp.slide_qc). Specify columns to limit the number of plots.

Parameters:
  • adata (AnnData) – an adata object

  • columns (str | Iterable) – columns in adata.uns[“fov_metadata”] to plot (default: all)

  • figsize (tuple) – figure size tuple, will be multiplied by the number of plots

  • subplot_kwargs (dict) – kwargs passed to plt.subplots

  • plot_kwargs (dict) – kwargs passed to the main plotting function

  • legend_kwargs (dict) – kwargs passed to the legends

Return type:

tuple[Figure, list[Axes]]

Returns:

matplotlib figure and array of axes

stampede.pl.value_distribution(adata, layer=None, min_quantile=0.0, max_quantile=0.95, subplot_kwargs=None, plot_kwargs=None)#

Plot the number of occurrences of values in the dataset.

Parameters:
  • adata (AnnData) – an adata object.

  • layer (str) – the layer the values are drawn from (default: X)

  • min_quantile (float) – lowest quantile of values to plot

  • max_quantile (float) – highest quantile of values to plot

  • subplot_kwargs (dict) – kwargs passed to plt.subplots

  • plot_kwargs (dict) – kwargs passed to the main plotting function

Return type:

tuple[Figure, list[Axes]]

Returns:

matplotlib figure and array of axes

stampede.pl.violin(adata, columns, inner='quart', fill=False, cut=0, log_scale=False, figsize=None, subplot_kwargs=None, plot_kwargs=None)#

Violin plots for one or more columns in adata.obs.

Wraps seaborn’s violinplot. See https://seaborn.pydata.org/generated/seaborn.violinplot.html

Parameters:
  • adata (AnnData) – an adata object

  • columns (str | list) – one or more columns in adata.obs

  • inner (str) – See sns.violinplot for more details

  • fill (bool) – See sns.violinplot for more details

  • cut (int) – See sns.violinplot for more details

  • log_scale (bool | Sequence[bool]) – See sns.violinplot for more details

  • figsize (tuple) – figure size tuple, will be multiplied by the number of plots

  • subplot_kwargs (dict) – kwargs passed to plt.subplots

  • plot_kwargs (dict) – kwargs passed to the main plotting function

Return type:

tuple[Figure, list[Axes]]

Returns:

matplotlib figure and array of axes

stampede.tl#

analysis tools

stampede.tl.paired_binomial_glm(df, adata, column, test_condition, reference_condition, condition_column='condition', covariate_columns=None, random_state=42)#

Runs a paired sample-level binomial GLM:

gene_detection_rate ~ condition + covariate(s)

Parameters:
  • df (DataFrame) – dataframe with detection rates per gene per sample

  • adata (AnnData) – the adata from which the detection rates were obtained

  • column (str) – the column in adata.obs from which the detection rate df column names were obtained

  • test_condition (str) – the condition to compare (e.g., “treated”)

  • reference_condition (str) – the baseline condition (e.g., “control”)

  • condition_column (str) – column with the conditions

  • covariate_columns (str | list) – column(s) with covariates (e.g. “batch”)

  • random_state (int) – random seed value

Return type:

DataFrame | None

Returns:

per-gene results including beta, odds_ratio, pval, padj

stampede.tl.pydeseq2(counts, adata, column, test_condition, reference_condition, condition_column, covariate_columns=None, inference=None, n_cpus=16, return_objects=False, dds_kwargs=None, ds_kwargs=None)#

pyDESeq2 wrapper for pseudobulk DGE

Parameters:
  • counts (DataFrame) – dataframe with counts per gene per sample

  • adata (AnnData) – the adata from which the counts were obtained

  • column (str) – column in adata.obs with groups to compare

  • test_condition (str) – the condition to compare (e.g., “treated”)

  • reference_condition (str) – the baseline condition (e.g., “control”)

  • condition_column (str) – column with the conditions

  • covariate_columns (str | list) – column(s) with covariates (e.g. “batch”)

  • inference (Inference) – pyDESeq2 inference class instance

  • n_cpus (int) – number of threads to use

  • return_objects (bool) – return the DeseqDataSet, DeseqStats and the results_df. If False, only return the results_df

  • dds_kwargs (dict) – kwargs passed to DeseqDataSet

  • ds_kwargs (dict) – kwargs passed to DeseqStats

Returns:

the pyDESeq2 results dataframe; if return_objects is True, the DeseqDataSet, DeseqStats and results dataframe

stampede.tl.sketch(adata, n=None, frac=0.05, use_rep='X_svd', obs_column='subset', random_seed=42, return_subset=False, **kwargs)#

Subset the cells in adata using GeoSketch.

Parameters:
  • adata (AnnData) – adata object

  • n (int) – the number of cells to keep. If None, frac will be used instead.

  • frac (float) – the fraction of cells to keep. Only used if n is None.

  • use_rep (str) – use the indicated representation.

  • obs_column (str) – add this column to adata.obs with boolean values if the cell is kept.

  • random_seed (int) – random seed passed to numpy.

  • return_subset (bool) – if True, return a subset adata object.

  • kwargs – kwargs passed to geosketch.gs.

Return type:

AnnData | None

Returns:

The subset anndata object (if return_subset is True)

Configuration#

stampede.config#

A dictionary with package specific settings that may be altered during runtime. Accessed using import stampede as st; st.config.

Keys may not be added or removed, but values may be changed.

Default config items:

{
    # columns found in the exprmat_file that represent metadata
    "exprmat_md_columns": ["fov", "cell_ID"],
    # columns found in the metadata_file that represent metadata
    "metadata_md_columns": ["fov", "cell_ID"],
    # columns found in the sample_file that represent metadata
    "sample_md_columns": ["sample", "slide", "fovs"],
    # directory to write (temporary) adata objects to
    "adata_dir": "adatas",
}
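
The rule that values may change while keys stay fixed can be sketched with a small dict subclass. This is an illustration only: FixedKeyDict is a hypothetical class, and how stampede.config enforces the rule is not shown in this reference.

```python
class FixedKeyDict(dict):
    """A dict whose values can change but whose keys cannot be added or removed."""

    def __setitem__(self, key, value):
        if key not in self:
            raise KeyError(f"unknown config key: {key!r}")
        super().__setitem__(key, value)

    def __delitem__(self, key):
        raise KeyError("config keys cannot be removed")


config = FixedKeyDict({"adata_dir": "adatas"})
config["adata_dir"] = "results/adatas"  # allowed: the key already exists
```

The sketch only guards item assignment and deletion; methods such as update or pop would need the same treatment in a complete implementation.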