
10X Single-Cell and 10X Spatial Transcriptomics Joint Analysis, Part 7: CellDART


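Hello everyone! Today I'm introducing a new method, CellDART, for the joint analysis of 10X single-cell and 10X spatial transcriptomics data. I have already shared quite a few posts on this kind of joint analysis; they are listed briefly below for readers who want to dig deeper.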

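10X single-cell and spatial joint analysis: cell2location

A summary of integration methods for 10X spatial transcriptomics and 10X single-cell data

10X single-cell and spatial joint analysis, Part 3: Spotlight

10X single-cell and spatial joint analysis, Part 4: DSTG

10X single-cell and spatial joint analysis, Part 5: spatialDWLS

10X single-cell and spatial joint analysis, Part 6 (incorporating the per-spot cell counts into the single-cell mapping): Tangram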

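New methods keep appearing, so we need independent judgment to decide which one fits our own analysis. Below I first go through the core content of the paper and then finish with example code showing how the method is applied.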

Abstract

Deciphering the cellular composition in genome-wide spatially resolved transcriptomic data is a critical task to clarify the spatial context of cells in a tissue. (This really is important.) The authors developed a new method, CellDART, which estimates the spatial distribution of cells defined by single-cell level data using domain adaptation of neural networks (what that means is covered further down) and applied it to the spatial mapping of human lung tissue. The neural network that predicts the cell proportion in a pseudospot, a virtual mixture of cells from single-cell data, is translated to decompose the cell types in each spatial barcoded region (this is the standard idea behind deconvolution methods). The software was then evaluated on two datasets, mouse brain and human dorsolateral prefrontal cortex tissue; the results look good, as is usual for a methods paper. CellDART is expected to help to elucidate the spatial heterogeneity of cells and their close interactions in various tissues.

Main (Introduction): a summary

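Revolutionary techniques now make it possible to capture gene expression across a tissue while keeping its spatial location. Their resolution ranges from several cells per spot (the level of 10X spatial transcriptomics), down to single-cell resolution (spatial transcriptomics at single-cell level is still quite challenging), and even to the subcellular level (BGI has made notable progress on subcellular spatial transcriptomics).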

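  • The one real limitation of spatial transcriptomics today is that a single spot contains multiple cells. In particular, a tissue with a high level of heterogeneity, such as cancer, consists of a variety of cells in each small domain of the tissue (this limitation has a big impact). Thus, the identification of different cell types in each spot is a crucial task to understand the spatial context of pathophysiology using a spatially resolved transcriptome.
    Methods for the joint analysis of 10X spatial transcriptomics and 10X single-cell data currently fall into two camps. One maps the datasets onto each other through anchors, with Seurat and scanpy as typical examples; the other is deconvolution, with SPOTlight and cell2location as typical examples. Deconvolution methods are the majority. Within deconvolution, calculating the proportion of cell types defined by scRNA-seq data from spots of spatially resolved transcriptomic data can be considered a domain adaptation task (the literal Chinese rendering of this term is clumsy, but the idea is still the deconvolution one). A model that predicts cell fractions from the gene expression profile of a group of cells can be transferred to predict the spatial cell-type distribution. (Joint single-cell and spatial analysis really is important.)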

In this paper, we propose CellDART as a method incorporating adversarial discriminative domain adaptation (ADDA) to infer cell fractions in spatial transcriptomic datasets. A pseudospot is constructed by randomly selecting cells from the scRNA-seq data, so its cell-type proportions are known. The neural network model that extracts the cell composition from the gene expression profile of a pseudospot is then adapted to the domain of the spatial transcriptomic data. Consequently, the joint analysis of spatial and single-cell transcriptomics yields the spatial cell composition and reveals the spatial heterogeneity of cells. Some familiarity with machine learning helps in following how the model is transferred across domains; with that in place, the method can be applied in practice.

Result: first, how well does the software perform?

Analysis of the spatial arrangement of cells using the CellDART technique across datasets derived from human and mouse brains.

The two example datasets are human dorsolateral prefrontal cortex and mouse brain. First, the annotated data:

Next, the marker genes (they do not look very specific to me).

Groups of cells demonstrated clearly distinguishable gene expression profiles, which were characterized by cell type-specific marker genes.

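Building pseudospots: a fixed number of cells (k = 8) is randomly selected from the single-cell dataset and mixed with random weights. In total, 20,000 pseudospots were generated this way, each made up of 8 cells.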

In the second step, the composite gene expression values are computed from the marker genes.
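The two steps above can be sketched in NumPy as follows. This is a minimal illustration and not the `random_mix` function from the CellDART code: the name `make_pseudospots`, its arguments, and the choice of drawing uniform weights and normalizing them are assumptions made for this example.

```python
import numpy as np

def make_pseudospots(expr, labels, n_types, k=8, n_spots=20000, seed=0):
    """Mix k random cells per pseudospot with random weights (hypothetical helper).

    expr:   (n_cells, n_genes) expression matrix from single-cell data
    labels: (n_cells,) integer cell-type label per cell
    Returns (mixed expression, cell-type fractions) for every pseudospot.
    """
    rng = np.random.default_rng(seed)
    n_cells = expr.shape[0]
    # Pick k cells for each pseudospot.
    idx = rng.integers(0, n_cells, size=(n_spots, k))
    # Random mixing weights, normalized so that each row sums to 1.
    w = rng.random((n_spots, k))
    w /= w.sum(axis=1, keepdims=True)
    # Composite expression: weighted sum of the selected cells.
    mixed = np.einsum("sk,skg->sg", w, expr[idx])
    # Known ground truth: accumulate each weight onto its cell type.
    frac = np.zeros((n_spots, n_types))
    np.add.at(frac, (np.arange(n_spots)[:, None], labels[idx]), w)
    return mixed, frac

# Toy data: 50 cells, 30 marker genes, 4 cell types.
rng = np.random.default_rng(1)
expr = rng.poisson(2.0, size=(50, 30)).astype(float)
labels = rng.integers(0, 4, size=50)
mixed, frac = make_pseudospots(expr, labels, n_types=4, k=8, n_spots=100)
print(mixed.shape, frac.shape)
```

Because the weights are normalized, every row of `frac` sums to 1, which is what lets the pseudospots serve as labeled training data.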

A neural network was trained to accurately decompose pseudospots, while another network, the domain classifier, was trained to discriminate real spatially resolved transcriptomes from pseudospots.

The weights of the neural networks were updated during training to predict cell fractions while fooling the domain classifier, so that it could no longer discriminate real spots from pseudospots. (This part is a bit hard to grasp.) As a result, the feature extractor and the source classifier were trained to estimate cell fractions in both pseudospots and real spatial spots through adversarial domain adaptation.
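To make the adversarial step a little more concrete, here is a NumPy sketch of the two objectives involved. It only evaluates loss values on toy predictions and does no training. The function names, the KL-divergence loss for cell fractions, and the label-flipping formulation are assumptions for illustration, not a copy of the CellDART implementation.

```python
import numpy as np

EPS = 1e-7

def kl_divergence(true_frac, pred_frac):
    """Mean KL divergence between known and predicted cell fractions."""
    t = np.clip(true_frac, EPS, 1.0)
    p = np.clip(pred_frac, EPS, 1.0)
    return float(np.mean(np.sum(t * np.log(t / p), axis=1)))

def binary_cross_entropy(is_real, p_real):
    """Mean cross-entropy for the domain label (1 = real spot, 0 = pseudospot)."""
    p = np.clip(p_real, EPS, 1.0 - EPS)
    return float(-np.mean(is_real * np.log(p) + (1 - is_real) * np.log(1 - p)))

def adversarial_losses(true_frac, pred_frac, is_real, p_real, alpha=1.0):
    """Return (domain classifier loss, embedder + source classifier loss)."""
    # Step 1: the domain classifier is scored against the correct domain labels.
    domain_loss = binary_cross_entropy(is_real, p_real)
    # Step 2: the embedder is rewarded when the domain classifier is fooled,
    # expressed here by scoring the same outputs against flipped labels.
    fool_loss = binary_cross_entropy(1 - is_real, p_real)
    source_loss = kl_divergence(true_frac, pred_frac)
    return domain_loss, source_loss + alpha * fool_loss

# Fractions are only known for the two pseudospots; domain labels cover all four spots.
true_frac = np.array([[0.5, 0.5], [1.0, 0.0]])
pred_frac = np.array([[0.6, 0.4], [0.9, 0.1]])
is_real = np.array([0, 0, 1, 1])
confident = np.array([0.1, 0.1, 0.9, 0.9])  # domains are easy to tell apart
confused = np.array([0.5, 0.5, 0.5, 0.5])   # domains look alike

d1, g1 = adversarial_losses(true_frac, pred_frac, is_real, confident)
d2, g2 = adversarial_losses(true_frac, pred_frac, is_real, confused)
print(d1 < d2, g2 < g1)
```

The point of the toy numbers: when the domain classifier separates the domains confidently, its own loss is low but the embedder's loss is high, and the reverse holds once the two domains become indistinguishable.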

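To recap so far: pseudospots are first built from the single-cell data (8 cells each), so the composition of every pseudospot is known exactly. A neural network is then trained on these pseudospots together with the real spots, with one classifier predicting cell-type fractions and another telling the two domains apart. After this optimization, the network can infer the cell-type proportions of the real spatial spots.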

Now the deconvolution results:

This result is very similar to the one shown on the official website.
Next, the human data:

Judging by the example data, the results are decent.

A Comparative Analysis of CellDART against Alternative Integration Tools Within Human Brain Tissue

  • The other three tools are Scanorama, Cell2location, and RCTD.
First, Scanorama:

Scanorama demonstrated a limited number of excitatory neurons with layer-specific distribution characteristics. In contrast, the identified Ex_2_L5, Ex_4_L6, Ex_9_L5/6, and Ex_10_L2/4 populations exhibited distributions that did not conform to established distribution patterns. Notably, this approach appears to have clear shortcomings.

Next, Cell2location:

In the case of Cell2location, both excitatory neurons and non-neuronal cells failed to exhibit layer-specific distribution patterns except for a few cell types (Ex_4_L6, Oligos_1, and Micro_Micro). (Cell2location also falls short.)

Third, RCTD:

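Finally, for RCTD, a few excitatory neurons (Ex_2_L5 and Ex_10_L2_4) showed high cell fractions in specific layers; the other excitatory neurons, however, showed varied distribution patterns.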

The authors were crafty here: none of the three tools they compared against is a commonly used one.

Receiver operating characteristic (ROC) curve analysis was used to compare the performance of the four tools in predicting the layer-specific distribution of excitatory neurons.
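As an illustration of how such a comparison can be scored, the sketch below computes the area under the ROC curve from predicted cell fractions and a binary "spot lies in the expected layer" label. The function `roc_auc` and the toy numbers are invented for this example; the paper's exact evaluation procedure is not reproduced here.

```python
import numpy as np

def roc_auc(in_layer, score):
    """AUC as the probability that a positive spot outranks a negative one."""
    in_layer = np.asarray(in_layer, dtype=bool)
    score = np.asarray(score, dtype=float)
    pos, neg = score[in_layer], score[~in_layer]
    # Compare every positive with every negative; ties count as half.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)

# 1 = spot belongs to the layer where the cell type is expected.
in_layer = [1, 1, 1, 0, 0, 0]
good_tool = [0.8, 0.7, 0.6, 0.2, 0.1, 0.3]  # high fractions only inside the layer
weak_tool = [0.8, 0.2, 0.1, 0.7, 0.3, 0.6]  # fractions largely unrelated to the layer

print(roc_auc(in_layer, good_tool), roc_auc(in_layer, weak_tool))
```

An AUC near 1 means the predicted fraction separates in-layer from out-of-layer spots well, while a value near 0.5 is no better than chance.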

Of course, you hardly need to read the paper to guess that the authors' own tool performs best.

Result 3: Spatial heterogeneity of human lung tissue studied with CellDART

CellDART was further employed in the analysis of healthy lung spatial transcriptomic datasets.

The joint analysis results:

Across both lung 1 and lung 2 datasets, each cell type exhibited divergent distribution profiles within the segmented tissue regions.

To summarize, CellDART was capable of accurately localizing the spatial distribution of diverse cell populations in normal human lung tissue.

The paper's conclusion:

In conclusion, CellDART can estimate the spatial cell composition of complex, highly heterogeneous tissues by aligning single-cell and spatial transcriptomics datasets. The suggested approach can help elucidate the spatial interaction of various cells in close proximity and track cell-level transcriptomic changes while preserving the spatial context.

Method: a look at the algorithm

CellDART: Cell type inference with domain adaptation
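  • First, a feature embedder is defined; it computes 64-dimensional embedding features from the gene expression data of spatial spots or pseudospots.
  • The feature embedder consists of two fully connected layers, each followed by batch normalization and an ELU activation function.
  • The first layer outputs 1024 dimensions and the second outputs 64.
  • Source and domain classifiers are also defined: the source classifier predicts the cell fractions in each spot, and the domain classifier distinguishes pseudospots from real spots.
  • The domain classifier has two fully connected layers and takes the 64-dimensional embedding features as input.
  • The source classifier is a single layer connected directly to the embedding features of the feature embedder.
  • The feature embedder combined with the source classifier is called the source classification model, and combined with the domain classifier it is called the domain classification model. The two models share the same feature embedder.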
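
The layer layout listed above can be sketched as a plain NumPy forward pass with random, untrained weights. This is only meant to show how the two models share one embedder. The 32-unit hidden layer of the domain classifier, the toy input sizes, and the simplified batch normalization (batch statistics only, no learned scale or shift) are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(n_in, n_out):
    """Randomly initialized weights and bias of one fully connected layer."""
    return rng.normal(0.0, 0.05, size=(n_in, n_out)), np.zeros(n_out)

def batch_norm(x, eps=1e-5):
    # Normalize each feature over the batch (no learned scale/shift here).
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

def elu(x):
    return np.where(x > 0, x, np.exp(np.minimum(x, 0.0)) - 1.0)

def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

n_genes, n_types = 200, 5        # toy sizes, chosen for the example
W1, b1 = dense(n_genes, 1024)    # embedder layer 1: 1024 units
W2, b2 = dense(1024, 64)         # embedder layer 2: 64-dim embedding
Ws, bs = dense(64, n_types)      # source classifier: a single layer
Wd1, bd1 = dense(64, 32)         # domain classifier layer 1 (32 is assumed)
Wd2, bd2 = dense(32, 2)          # domain classifier layer 2: pseudo vs real

def embed(x):
    h = elu(batch_norm(x @ W1 + b1))
    return elu(batch_norm(h @ W2 + b2))

def source_model(x):
    """Cell-type fractions per spot; shares embed() with domain_model."""
    return softmax(embed(x) @ Ws + bs)

def domain_model(x):
    """Probabilities of (pseudospot, real spot) for each input row."""
    h = elu(batch_norm(embed(x) @ Wd1 + bd1))
    return softmax(h @ Wd2 + bd2)

spots = rng.random((16, n_genes))
fractions = source_model(spots)
domains = domain_model(spots)
print(embed(spots).shape, fractions.shape, domains.shape)
```

Since both models call the same `embed`, any update that makes the embedding less informative about the domain also changes the features the source classifier sees, which is the coupling the adversarial training relies on.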

Finally, the example code.

CellDART Example Code: mouse brain

Load modules

    import scanpy as sc
    import pandas as pd
    import numpy as np
    import seaborn as sns
    import matplotlib.pyplot as plt

    # Local modules from the CellDART code; they need to be on the import path
    import da_cellfraction
    from utils import random_mix

    from sklearn.manifold import TSNE

1. Data load

load scanpy data - 10x datasets

    sc.set_figure_params(facecolor="white", figsize=(8, 8))
    sc.settings.verbosity = 3

    # Download the two 10x Visium mouse brain sections through scanpy
    adata_spatial_anterior = sc.datasets.visium_sge(
        sample_id="V1_Mouse_Brain_Sagittal_Anterior"
    )
    adata_spatial_posterior = sc.datasets.visium_sge(
        sample_id="V1_Mouse_Brain_Sagittal_Posterior"
    )

    # Normalize total counts per spot
    for adata in [
        adata_spatial_anterior,
        adata_spatial_posterior,
    ]:
        sc.pp.normalize_total(adata, inplace=True)

![](https://ad.itadn.com/c/weblog/blog-img/images/2025-07-12/mwMyDx35LgFR1WSHurzqtO8n0pja.png)

Single cell Data: GSE115746

Download the data from GEO and use the two files 'GSE115746_cells_exon_counts.csv' and 'GSE115746_complete_metadata_28706-cells.csv'.

    # Transpose so that rows are cells and columns are genes
    adata_cortex = sc.read_csv('../data/GSE115746_cells_exon_counts.csv').T
    adata_cortex_meta = pd.read_csv('../data/GSE115746_complete_metadata_28706-cells.csv', index_col=0)
    # Keep only the metadata rows for cells present in the count matrix
    adata_cortex_meta_ = adata_cortex_meta.loc[adata_cortex.obs.index,]
    adata_cortex.obs = adata_cortex_meta_
    adata_cortex.var_names_make_unique()

    # Preprocessing
    # Annotate mitochondrial genes as 'mt'; matched case-insensitively because
    # mouse gene symbols are usually written with a lowercase 'mt-' prefix
    adata_cortex.var['mt'] = adata_cortex.var_names.str.lower().str.startswith('mt-')
    sc.pp.calculate_qc_metrics(adata_cortex, qc_vars=['mt'], percent_top=None, log1p=False, inplace=True)
    sc.pp.normalize_total(adata_cortex)

    # PCA and clustering: known markers with 'cell_subclass'
    sc.tl.pca(adata_cortex, svd_solver='arpack')
    sc.pp.neighbors(adata_cortex, n_neighbors=10, n_pcs=40)
    sc.tl.umap(adata_cortex)
    sc.tl.leiden(adata_cortex, resolution=0.5)
    sc.pl.umap(adata_cortex, color=['leiden', 'cell_subclass'])

![](https://ad.itadn.com/c/weblog/blog-img/images/2025-07-12/WJYVUDMEsdrbvut59go6TqichRfH.png)

    # Rank marker genes for each annotated cell_subclass
    sc.tl.rank_genes_groups(adata_cortex, 'cell_subclass', method='wilcoxon')
    sc.pl.rank_genes_groups(adata_cortex, n_genes=20, sharey=False)

Select same gene features

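    # res_genes_ is not defined anywhere in the original post. Assumption: it is the
    # union of the top marker genes per cell_subclass from rank_genes_groups above.
    df_genelists = pd.DataFrame(adata_cortex.uns['rank_genes_groups']['names'])
    num_markers = 20  # assumed number of markers kept per cell type
    res_genes_ = list(set(df_genelists.head(num_markers).to_numpy().ravel()))

    adata_spatial_anterior.var_names_make_unique()
    inter_genes = [val for val in res_genes_ if val in adata_spatial_anterior.var.index]
    print('Selected Feature Gene number', len(inter_genes))
    adata_cortex = adata_cortex[:, inter_genes]
    adata_spatial_anterior = adata_spatial_anterior[:, inter_genes]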

Array of single cell & spatial data

  • Single cell data with labels
  • Spatial data without labels
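
    mat_sc = adata_cortex.X
    # Convert the sparse spatial matrix to a dense NumPy array
    mat_sp = adata_spatial_anterior.X.toarray()

    df_sc = adata_cortex.obs

    # Encode the cell_subclass labels as integers.
    # Note: set() ordering of strings can change between Python sessions, so the
    # index of a given cell type (used in numlist further down) is not fixed.
    lab_sc_sub = df_sc.cell_subclass
    sc_sub_dict = dict(zip(range(len(set(lab_sc_sub))), set(lab_sc_sub)))
    sc_sub_dict2 = dict((y, x) for x, y in sc_sub_dict.items())
    lab_sc_num = [sc_sub_dict2[ii] for ii in lab_sc_sub]
    lab_sc_num = np.asarray(lab_sc_num, dtype='int')
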
![](https://ad.itadn.com/c/weblog/blog-img/images/2025-07-12/xqdM1kG6FnJwEC4vroH9utWVQsLI.png)
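
Because the ordering of a Python set of strings can differ between interpreter sessions, the index-to-name mapping built above is not guaranteed to be reproducible. One possible alternative, shown here on toy labels, is `np.unique` with `return_inverse=True`, which sorts the names and returns the integer codes in one step. The variable names mirror the ones above, but the toy labels are invented for this example.

```python
import numpy as np

lab_sc_sub = np.array(["Vip", "L4", "Astro", "L4", "Vip", "Sst"])  # toy labels

# names is sorted alphabetically; codes[i] is the index of cell i's type in names.
names, codes = np.unique(lab_sc_sub, return_inverse=True)
sc_sub_dict = dict(enumerate(names))   # index -> cell type name
lab_sc_num = codes.astype(int)         # integer label per cell

print(sc_sub_dict)
print(lab_sc_num)
```

If the metadata column contains missing values, casting it to strings first (for example with `.astype(str)`) avoids a sorting error on mixed types.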

2. Generate mixture from single cell data and preprocessing

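    # This example mixes 5 cells per pseudospot and generates 5,000 pseudospots
    # (the description earlier in the post mentions k=8 and 20,000)
    sc_mix, lab_mix = random_mix(mat_sc, lab_sc_num, nmix=5, n_samples=5000)

    def log_minmaxscale(arr):
        """log1p-transform, then min-max scale each row (sample) to [0, 1]."""
        arr = np.log1p(np.asarray(arr))
        row_min = arr.min(axis=1, keepdims=True)
        row_max = arr.max(axis=1, keepdims=True)
        return (arr - row_min) / (row_max - row_min)

    sc_mix_s = log_minmaxscale(sc_mix)
    mat_sp_s = log_minmaxscale(mat_sp)
    mat_sc_s = log_minmaxscale(mat_sc)
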
![](https://ad.itadn.com/c/weblog/blog-img/images/2025-07-12/qs3rmtZbJxu6j9IXwzFMLKSGfnoH.png)

3. Training: Adversarial domain adaptation for cell fraction estimation

Parameters

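  • alpha: loss weight of the domain classifier in the adversarial learning
  • alpha_lr: learning rate for training the domain classifier (alpha_lr × 0.001)
  • emb_dim: embedding dimension (size of the feature vector)
  • batch_size: minibatch size used during training
  • n_iterations: number of adversarial training iterations
  • initial_train: if true, the classification model is first trained on its own before adversarial domain adaptation
  • initial_train_epochs: number of epochs for the initial training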

    embs, clssmodel = da_cellfraction.train(
        sc_mix_s, lab_mix, mat_sp_s,
        alpha=1, alpha_lr=5, emb_dim=64, batch_size=512,
        n_iterations=2000,
        initial_train=True,
        initial_train_epochs=10,
    )

4. Predict cell fraction of spots and visualization

    pred_sp = clssmodel.predict(mat_sp_s)

    def plot_cellfraction(visnum):
        """Plot the predicted fraction of cell type index visnum on the tissue image."""
        adata_spatial_anterior.obs['Pred_label'] = pred_sp[:, visnum]
        sc.pl.spatial(
            adata_spatial_anterior,
            img_key="hires",
            color='Pred_label',
            palette='Set1',
            size=1.5,
            legend_loc=None,
            title=sc_sub_dict[visnum],
        )

![](https://ad.itadn.com/c/weblog/blog-img/images/2025-07-12/0ONYy3sMblSBJoPjQE1W5T4degD2.png)

    # Indices of the cell types to plot (see sc_sub_dict for the names)
    numlist = [2, 3, 7, 8, 12, 13, 18]

    for num in numlist:
        plot_cellfraction(num)

Methodologically this follows the same idea as deconvolution, but it brings in a new concept and is well worth a try.

Life is good, and even better with you.
