Tutorial 3: Mouse Brain Analysis (Slide-seq)

> This tutorial demonstrates how to identify spatial domains in mouse brain Slide-seq data with SemanticST.

Import necessary packages

from sklearn import metrics
import torch
import copy
import os
import random
import numpy as np
from semanticst.loading_batches import PrepareDataloader
from semanticst.loading_batches import Dataloader
import scanpy as sc
import matplotlib.pyplot as plt
import pandas as pd
from pathlib import Path
import torch.utils.data as data
from semanticst.main import Config
import warnings
warnings.filterwarnings("ignore")
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

Read data and select device

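# Use the second GPU if CUDA is available (change the index to match your machine), otherwise the CPU.
device = torch.device('cuda:1' if torch.cuda.is_available() else 'cpu')
print(f"You are using *{device}*")
# Path to the Slide-seq AnnData file; replace it with the location of your own copy.
BASE_PATH = Path('/media/rokny/DATA1/Roxana/Data/6.Mouse_Hippocampus_Tissue/filtered_feature_bc_matrix.h5ad')
spot_paths = BASE_PATH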

You are using *cuda:1*

Tip: Before training, filter this dataset in two steps. First, remove cells with fewer than 20 detected genes. Second, remove genes that are expressed in fewer than 50 cells.

dataset="Mouse Brain_Slideseq"
adata = sc.read_h5ad(spot_paths)
adata.var_names_make_unique()
sc.pp.filter_cells(adata, min_genes=20)
sc.pp.filter_genes(adata, min_cells=50)
print(adata)

AnnData object with n_obs × n_vars = 51398 × 14288
    obs: 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes', 'n_genes'
    var: 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts', 'n_cells'
    obsm: 'spatial'
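
To make the two thresholds concrete, here is a toy sketch in NumPy that mirrors the logic of the two filters on a small count matrix. It does not call scanpy; the matrix and the thresholds of 2 are made up for illustration, whereas the tutorial uses 20 and 50.

```python
import numpy as np

# Toy count matrix: 4 cells (rows) x 3 genes (columns); values are made up.
counts = np.array([
    [5, 0, 1],
    [0, 0, 2],
    [3, 1, 4],
    [0, 0, 0],
])

min_genes = 2  # illustrative; the tutorial uses 20
min_cells = 2  # illustrative; the tutorial uses 50

# Step 1: keep cells with at least `min_genes` detected (non-zero) genes.
genes_per_cell = (counts > 0).sum(axis=1)
counts = counts[genes_per_cell >= min_genes]

# Step 2: keep genes expressed in at least `min_cells` of the remaining cells.
cells_per_gene = (counts > 0).sum(axis=0)
filtered = counts[:, cells_per_gene >= min_cells]

print(filtered.shape)  # (2, 2)
```

Because the gene filter runs after the cell filter, a gene is counted only in the cells that survived the first step, which is the same order as in the tutorial code.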

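Filter out peripheral spots

The code below keeps the 95% of spots closest to the median spatial coordinate and discards the rest.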

x_center, y_center = np.median(adata.obsm['spatial'][:, 0]), np.median(adata.obsm['spatial'][:, 1])
distances = np.sqrt((adata.obsm['spatial'][:, 0] - x_center)**2 + (adata.obsm['spatial'][:, 1] - y_center)**2)
threshold_distance = np.percentile(distances, 95)  # Adjust this percentile as needed
filtered_indices = distances <= threshold_distance
adata = adata[filtered_indices].copy()
plt.figure(figsize=(8, 8))
plt.scatter(adata.obsm['spatial'][:, 0], adata.obsm['spatial'][:, 1], s=5, alpha=0.7)
plt.xlabel('Spatial coordinate X')
plt.ylabel('Spatial coordinate Y')
plt.title('Filtered Spot Distribution')
plt.gca().invert_yaxis()
plt.show()

../../_images/a5d7b4161f295a8d252ca5766205b9d61ea91035e705cbbb44e7f5cb8588dade.png
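
The radial filter above can also be wrapped in a small reusable function. The sketch below is one way to do that; `filter_by_radius` is a hypothetical helper name, not part of SemanticST, and the toy coordinates are random.

```python
import numpy as np

def filter_by_radius(coords, percentile=95.0):
    """Hypothetical helper: boolean mask of points whose distance to the
    median centre is at or below the given percentile of all distances."""
    coords = np.asarray(coords, dtype=float)
    center = np.median(coords, axis=0)
    distances = np.linalg.norm(coords - center, axis=1)
    threshold = np.percentile(distances, percentile)
    return distances <= threshold

# Toy data: 1000 random points, five of which are pushed far from the rest.
rng = np.random.default_rng(0)
toy = rng.normal(size=(1000, 2))
toy[:5] += 50
mask = filter_by_radius(toy)
print(mask.sum(), "of", len(toy), "points kept")
```

With an AnnData object this would be used as `adata = adata[filter_by_radius(adata.obsm['spatial'])].copy()`. Lowering `percentile` trims more of the periphery.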

Train the model

Available data types: "Xenium", "Visium", "Stereo", "Slide".
Specify the type of ST data, because different ST technologies require different preprocessing steps. You can also choose between mini-batch training, intended for large datasets, and full-dataset training, intended for smaller ones.

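from semanticst.SemanticST_main import Semantic as Trainer
dtype = "Slide"  # one of the data types listed above
config = Config(device=device, dtype=dtype, use_mini_batch=False)  # full-dataset training
config_used = copy.copy(config)  # shallow copy of the configuration, kept for reference
model = Trainer(adata, config)
adata = model.train()  # the returned AnnData carries adata.obsm['emb_decoder'], used for clustering below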

🚀 Welcome to SemanticST! 🚀

📢 Recommendation: If your dataset contains more than 40000 spots or cells, we suggest using **mini-batch training** for efficiency.

✅ Using Full Dataset Training (No Mini-Batching). 🔥
Begin to train ST data...
building sparse Matrix
Graph constructed!
Done!
320320
Learning Semantic graphs

Training: 100%|████████████████████████████| 250/250 [00:14<00:00, 17.30epoch/s]

Semantic Graph Learning Completed

Feature Learning Epochs: 100%|██████████████| 1000/1000 [38:37<00:00,  2.32s/it]
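
The startup message recommends mini-batch training above 40000 spots or cells, while this run used full-dataset training (`use_mini_batch=False`). If you prefer to follow the recommendation automatically, a minimal sketch could look like this. `should_use_mini_batch` is a hypothetical helper, not a SemanticST function, and the default threshold is taken from the message above.

```python
def should_use_mini_batch(n_spots, threshold=40_000):
    """Hypothetical helper: True when the spot count exceeds the
    threshold quoted in the SemanticST startup message."""
    return n_spots > threshold

print(should_use_mini_batch(51_398))  # True
print(should_use_mini_batch(3_000))   # False
```

The result can be passed straight to the configuration, for example `Config(device=device, dtype=dtype, use_mini_batch=should_use_mini_batch(adata.n_obs))`. Treat the threshold as a guideline: whether full-dataset training fits depends on the available GPU memory.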

Clustering

from sklearn.decomposition import PCA
pca = PCA(n_components=20, random_state=1) 
embedding = pca.fit_transform(adata.obsm['emb_decoder'].copy())
adata.obsm['emb_pca'] = embedding
sc.pp.neighbors(adata, use_rep='emb_pca')
sc.tl.umap(adata)
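
The tutorial does not say why 20 components were chosen. One common sanity check, sketched here on toy data, is to look at the cumulative explained variance of the fitted PCA. The toy matrix stands in for `adata.obsm['emb_decoder']`; its shape and the five latent factors are assumptions made for illustration only.

```python
import numpy as np
from sklearn.decomposition import PCA

# Toy embedding: 500 spots x 64 features built from 5 latent factors plus small noise.
rng = np.random.default_rng(1)
latent = rng.normal(size=(500, 5))
mixing = rng.normal(size=(5, 64))
toy_emb = latent @ mixing + 0.01 * rng.normal(size=(500, 64))

pca = PCA(n_components=20, random_state=1)
reduced = pca.fit_transform(toy_emb)

# Cumulative share of the total variance captured by the first k components.
cumulative = np.cumsum(pca.explained_variance_ratio_)
print(reduced.shape)
print(f"first 5 components: {cumulative[4]:.3f}, all 20: {cumulative[-1]:.3f}")
```

If the curve flattens well before the last component, fewer components would probably give a similar neighbor graph; if it is still rising at 20, more may be worth trying.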

sc.tl.louvain(adata, resolution=0.5)

sc.tl.leiden(adata, resolution=0.35)

#adata.obsm['spatial'][:, 1] = -1*adata.obsm['spatial'][:, 1]

adata.uns['colors']= [
    "#A6CEE3",  # Light Blue
    "#1F78B4",  # Blue
    "#B2DF8A",  # Light Green
    "#33A02C",  # Green
    "#FB9A99",  # Light Pink
    "#E31A1C",  # Red
    "#FDBF6F",  # Light Orange
    "#FF7F00",  # Orange
    "#CAB2D6",  # Lavender
    "#6A3D9A",  # Purple
    "#FFFF99",  # Yellow
    "#B15928"   # Brown
]
plt.rcParams["figure.figsize"] = (3, 3)
sc.pl.embedding(adata, basis="spatial", color="leiden", s=5, palette=adata.uns['colors'], show=False, title='SemanticST')
plt.axis('off')

(735.0915, 5868.5385, -5746.2294999999995, -612.9804999999999)

../../_images/5661493dd0f138e67549504ec521594531d407a3de1dd1faaff531f9ca294f85.png
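
The palette above has 12 colours, so a clustering with more than 12 clusters cannot give every cluster its own colour. A quick check before plotting, sketched with a hypothetical helper `palette_covers` and toy labels, could look like this:

```python
def palette_covers(labels, palette):
    """Hypothetical helper: True if the palette has at least one colour
    per distinct cluster label."""
    return len(set(labels)) <= len(palette)

palette = ["#A6CEE3", "#1F78B4", "#B2DF8A"]  # first three colours of the tutorial palette
labels = ["0", "1", "1", "2", "0", "3"]      # toy labels with four clusters
print(palette_covers(labels, palette))       # False
```

On the real data this would be called as `palette_covers(adata.obs['leiden'], adata.uns['colors'])`. If it returns False, either lower the clustering resolution or extend the palette.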

plt.rcParams["figure.figsize"] = (3, 3)
sc.pl.embedding(adata, basis="spatial", color="louvain", s=5, palette=adata.uns['colors'], show=False, title='SemanticST')
plt.axis('off')

(735.0915, 5868.5385, -5746.2294999999995, -612.9804999999999)

../../_images/85f07f725a9e9338675d4ddaceaefa8addf255d110968e0a943ccbb16a2fc655.png

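from semanticst.utils import clustering
n_cluster = 10   # number of spatial domains to fit
tool = 'mclust'  # clustering method
# Cluster the decoder embedding; the plot below reads the labels from adata.obs['domain'].
clustering(adata, seed=2025, n_clusters=n_cluster, method=tool, key='emb_decoder')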

R[write to console]:                    __           __ 
   ____ ___  _____/ /_  _______/ /_
  / __ `__ \/ ___/ / / / / ___/ __/
 / / / / / / /__/ / /_/ (__  ) /_  
/_/ /_/ /_/\___/_/\__,_/____/\__/   version 6.1.1
Type 'citation("mclust")' for citing this R package in publications.

fitting ...
  |======================================================================| 100%
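
The tutorial now holds three partitions of the same spots (Louvain, Leiden, and mclust), and no ground-truth annotation is provided. One way to quantify how much two of them agree is the adjusted Rand index from `sklearn.metrics`, which is already imported at the top. The sketch below uses toy labels in place of `adata.obs['leiden']` and `adata.obs['domain']`.

```python
from sklearn import metrics

# Toy labels standing in for two clustering results on six spots.
leiden_like = ["0", "0", "1", "1", "2", "2"]
domain_like = [3, 3, 1, 1, 2, 2]  # same grouping, different label names
other = [1, 2, 3, 1, 2, 3]        # a different grouping

ari_same = metrics.adjusted_rand_score(leiden_like, domain_like)
ari_diff = metrics.adjusted_rand_score(leiden_like, other)
print(ari_same)  # 1.0
print(ari_diff)
```

The score depends only on which spots are grouped together, not on the label names, so it can be applied directly as `metrics.adjusted_rand_score(adata.obs['leiden'], adata.obs['domain'])`. A high value indicates agreement between methods, not correctness of either one.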

adata.uns['colors']= [
    "#A6CEE3",  # Light Blue
    "#1F78B4",  # Blue
    "#B2DF8A",  # Light Green
    "#33A02C",  # Green
    "#FB9A99",  # Light Pink
    "#E31A1C",  # Red
    "#FDBF6F",  # Light Orange
    "#FF7F00",  # Orange
    "#CAB2D6",  # Lavender
    "#6A3D9A",  # Purple
    "#FFFF99",  # Yellow
    "#B15928"   # Brown
]
plt.rcParams["figure.figsize"] = (3, 3)
sc.pl.embedding(adata, basis="spatial", color="domain", s=6, palette=adata.uns['colors'], show=False, title='SemanticST')
plt.axis('off')

(735.0915, 5868.5385, -5746.2294999999995, -612.9804999999999)

../../_images/3d0b313022435a57ade13f1c5ee0babee58e0299984d78e63437c1b474aefade.png