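MMGPL: Multimodal Medical Data Analysis with Graph Prompt Learning | Literature Digest: Large Models and Multimodal Diagnosis Applied to Alzheimer's and Parkinson's Disease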
Title
MMGPL: Multimodal Medical Data Analysis with Graph Prompt Learning
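01
Literature Digest: Introduction
Neurological disorders include autism spectrum disorder (ASD) (Lord et al., 2018) and Alzheimer's disease (AD) (Scheltens et al., 2021). These disorders severely impair patients' social, language, and cognitive abilities and are regarded worldwide as a serious public health problem (Feigin et al., 2020). Notably, no effective treatment has yet been found for most neurological disorders (such as ASD and AD), so diagnostic tools are urgently needed to enable early intervention and slow disease progression (Wingo et al., 2021; Zhu et al., 2022).
Over the past decade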
Abstract
Prompt learning has demonstrated impressive efficacy in fine-tuning multimodal large models for a wide range of downstream tasks. Nonetheless, applying existing prompt learning methods to the diagnosis of neurological disorders still suffers from two issues: (i) existing methods typically treat all patches equally, despite the fact that only a small number of patches in neuroimaging are relevant to the disease, and (ii) they ignore the structural information inherent in the brain connection network, which is crucial for understanding and diagnosing neurological disorders. To tackle these issues, we introduce a novel prompt learning model that learns graph prompts during the fine-tuning of multimodal models for diagnosing neurological disorders. Specifically, we first leverage GPT-4 to obtain relevant disease concepts and compute the semantic similarity between these concepts and all patches. Secondly, we reduce the weight of irrelevant patches according to the semantic similarity between each patch and the disease-related concepts. Moreover, we construct a graph among tokens based on these concepts and employ a graph convolutional network layer to extract the structural information of the graph, which is used to prompt the pre-trained multimodal models for diagnosing neurological disorders. Extensive experiments demonstrate that our method achieves superior performance for neurological disorder diagnosis compared with state-of-the-art methods and is validated by clinicians.
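The patch reweighting step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the max-over-concepts score, the softmax normalization, and the `temperature` parameter are all assumptions chosen for illustration, and the embeddings are toy vectors rather than outputs of a real encoder.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def token_weights(tokens, concepts, temperature=0.1):
    """Assumed scoring rule: softmax over tokens of each token's
    highest cosine similarity to any disease-related concept."""
    scores = [max(cosine(t, c) for c in concepts) for t in tokens]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def reweight(tokens, weights):
    """Scale each token embedding by its weight, shrinking irrelevant tokens."""
    return [[w * x for x in t] for t, w in zip(tokens, weights)]

# Toy example: three patch tokens and one concept embedding (hypothetical values).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
concepts = [[1.0, 0.0]]
weights = token_weights(tokens, concepts)
weighted_tokens = reweight(tokens, weights)
```

In this toy setting the token aligned with the concept receives the largest weight and the orthogonal token the smallest, which is the qualitative behavior the abstract describes.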
Method
Using the Transformer architecture (Vaswani et al., 2017) as the encoder for multimodal data has become a popular choice in modern multimodal large models, because it can effectively integrate information from multiple modalities. For example, pre-trained vision-language models such as CLIP (Radford et al., 2021) encode images and text with separate Transformer backbones (e.g., ViT). To extract a sample representation, the architecture has two key components: (i) tokenization, which converts the raw data into tokens; and (ii) encoding, which passes all tokens through attention-based feature-extraction layers.
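The two components named above, tokenization and attention-based encoding, can be sketched as follows. This is a simplified illustration, not the encoder used in the paper: it tokenizes a 2D image into non-overlapping patches and applies one single-head self-attention step with identity query/key/value projections (an assumption made to keep the example free of learned parameters). The function names are hypothetical.

```python
import math

def patchify(image, patch):
    """Split an H x W image (list of rows) into flattened patch x patch tokens."""
    h, w = len(image), len(image[0])
    tokens = []
    for r in range(0, h - patch + 1, patch):
        for c in range(0, w - patch + 1, patch):
            tokens.append([image[r + i][c + j]
                           for i in range(patch) for j in range(patch)])
    return tokens

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(tokens):
    """Single-head scaled dot-product attention with identity Q/K/V
    projections (a simplification; real encoders learn these projections)."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        attn = softmax(scores)
        out.append([sum(a * v[i] for a, v in zip(attn, tokens)) for i in range(d)])
    return out

# Toy 4 x 4 "image" with pixel values 0..15, split into 2 x 2 patches.
image = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
tokens = patchify(image, 2)
encoded = self_attention(tokens)
```

Each output token is a convex combination of the input tokens, so the encoding mixes information across patches, which is the property that lets the same encoder integrate tokens from several modalities.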
Conclusion
In this paper, we introduced a graph prompt fine-tuning approach for neurological disorder diagnosis that accounts for both irrelevant patches and the structural information among tokens in multimodal medical data. Specifically, we conducted concept learning using the semantic similarity between each token and disease-related concepts to reduce the weights of irrelevant tokens. Additionally, through graph prompt learning with concept embeddings, we aimed to bridge the gap between multimodal models and neurological disease diagnosis. Experimental results showed that our method outperformed existing approaches in diagnosing neurological diseases. These findings highlight the potential of the proposed method for advancing neurological disease diagnosis.
Figure

Fig. 1. Overview of the proposed MMGPL framework, which contains three key modules: a multimodal data tokenizer (light blue block), a concept learner (light green block), and a graph prompt learner (light yellow block). First, MMGPL divides the multimodal medical data into multiple patches and projects them into a unified embedding space (Section 3.2). Next, MMGPL prompts GPT-4 to generate disease-related concepts and adjusts the token weights according to their semantic similarity to these concepts (Section 3.3). MMGPL then constructs a graph over the tokens to capture structural information, which is used to prompt a unified encoder (Section 3.4). Finally, MMGPL uses the outputs of the unified encoder to predict the label of the subject.
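The graph construction and graph convolution steps in the Fig. 1 pipeline can be sketched as follows. The source does not state how edges are defined here, so the thresholded cosine similarity, the `threshold` value, the added self-loops, and the symmetric normalization are assumptions; the layer itself is the standard GCN propagation rule H' = ReLU(Â H W), and all function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def build_adjacency(features, threshold=0.5):
    """Assumed edge rule: connect tokens whose features have cosine
    similarity >= threshold; every token also gets a self-loop."""
    n = len(features)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j or cosine(features[i], features[j]) >= threshold:
                adj[i][j] = 1.0
    return adj

def normalize(adj):
    """Symmetric normalization D^{-1/2} A D^{-1/2}; self-loops keep degrees > 0."""
    deg = [sum(row) for row in adj]
    n = len(adj)
    return [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def gcn_layer(adj_norm, h, weight):
    """One graph convolution: ReLU(adj_norm @ h @ weight)."""
    z = matmul(matmul(adj_norm, h), weight)
    return [[max(0.0, v) for v in row] for row in z]

# Toy example: three tokens with 2-dimensional features (hypothetical values).
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
adj = build_adjacency(features)
identity = [[1.0, 0.0], [0.0, 1.0]]  # stands in for a learned weight matrix
graph_prompt = gcn_layer(normalize(adj), features, identity)
```

In the toy example the first two tokens are linked and therefore averaged by the layer, while the third token, which has no neighbors, keeps its own features. In a trained model the identity matrix would be replaced by learned weights.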

Fig. 2. Performance of MMGPL with different component combinations on all datasets, where 'B' denotes the baseline method, 'B+G' adds graph prompt learning to the baseline, 'B+W' adds token weights to the baseline, and 'B+W+G' combines both token weights and graph prompt learning.

Fig. 3. Performance of MMGPL with different modalities.

Fig. 4. Heat maps generated by MMGPL for different subjects in the ADNI dataset.

Fig. 5. A visualization of the concept-similarity graph on the ADNI dataset. The horizontal and vertical axes denote concepts and tokens, respectively. Different color regions represent concepts associated with distinct categories: red for NC-related concepts, green for LMCI-related concepts, and blue for AD-related concepts.

Fig. 6. Visualization of the quantified influence of different concepts on the ADNI dataset. Concepts are shown on the left and categories on the right. The width of each line connecting a concept to a category reflects the magnitude of the corresponding weight, and the weight values are annotated.
Table

Table 1 Diagnostic performance (mean and standard deviation) of all methods on all datasets. Note: ADNI-3CLS and ADNI-4CLS denote three-class (NC/LMCI/AD) and four-class (NC/EMCI/LMCI/AD) classification, respectively.

Table 2 Comparison of MMGPL with related works in terms of scalability. Note: ✓ (vanilla) indicates that the vanilla implementation supports only two modalities and is difficult to extend to additional modalities.
