Auditing Large Language Models for Enhanced Text-Based Stereotype Detection and Probing-Based Bias Evaluation

This post belongs to the LLM series and provides a detailed walkthrough and translation of the paper "Auditing Large Language Models for Enhanced Text-Based Stereotype Detection and Probing-Based Bias Evaluation".

The paper audits large language models with the goal of strengthening text-based stereotype detection and examining the presence of bias through probing.

  • Abstract
  • 1 Introduction
  • 2 Related Work
  • 3 Method
  • 4 Results and Discussion
  • 5 Conclusion and Future Work

Abstract

Recent advances in large language models (LLMs) have significantly expanded their presence in human-centric artificial intelligence applications. However, LLMs may inadvertently replicate or exacerbate biases inherent in their training data. This study introduces the Multi-Grain Stereotype (MGS) dataset, which contains 51,867 instances encompassing gender, race, occupation, religion, and stereotypical text, compiled by integrating multiple publicly available stereotype detection datasets. We explore various machine learning approaches to establish baselines for stereotype detection and fine-tune several architectures and configurations of language models, producing a series of stereotype detection classifiers trained on MGS that analyze English text for stereotypical content. To ensure the detectors align with human common sense, we apply explainable AI tools such as SHAP, LIME, and BertViz and analyze a range of example cases to assess consistency. Additionally, we craft stereotype-eliciting prompts and use our best-performing detector to evaluate the text generated by popular LLMs for stereotypical content. The experiments yield several key findings: first, training classifiers across multiple stereotype dimensions achieves better results than one-dimensional training; second, the combined MGS dataset improves both within-dataset and cross-dataset generalization compared with single-dataset training; third, newer versions of the GPT family produce stereotypical content less often.
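
As a rough illustration of the detection-and-explanation workflow described in the abstract, the sketch below runs a text-classification model over one sentence and uses LIME to attribute the prediction to individual tokens, in the spirit of the paper's SHAP/LIME analysis. This is not the authors' released code: the checkpoint is a generic public sentiment model standing in for a detector fine-tuned on MGS, and the example sentence and sampling budget are arbitrary.

```python
# Minimal, illustrative sketch (not the authors' released code):
# classify one sentence and explain the prediction with LIME.
import numpy as np
from transformers import pipeline
from lime.lime_text import LimeTextExplainer

clf = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",  # stand-in for an MGS-fine-tuned detector
    top_k=None,  # return scores for every class, not just the top one
)
class_names = list(clf.model.config.id2label.values())

def predict_proba(texts):
    """Return an (n_samples, n_classes) probability matrix, as LIME expects."""
    outputs = clf(list(texts))
    matrix = []
    for scores in outputs:
        by_label = {s["label"]: s["score"] for s in scores}
        matrix.append([by_label[name] for name in class_names])
    return np.array(matrix)

explainer = LimeTextExplainer(class_names=class_names)
explanation = explainer.explain_instance(
    "Example sentence to audit for stereotypical content.",  # placeholder input
    predict_proba,
    num_features=6,
    num_samples=500,  # small perturbation budget to keep the demo fast
)
print(explanation.as_list())  # tokens most responsible for the prediction
```

With an MGS-fine-tuned checkpoint swapped in, the class names would correspond to stereotype dimensions such as gender, race, occupation, and religion rather than sentiment labels.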

1 Introduction

2 Related Work

3 Method

4 Results and Discussion

5 Conclusion and Future Work

In summary, we have laid the groundwork for a framework that audits bias in LLMs through text-based stereotype classification. Using the MGS dataset and fine-tuned PLMs, our approach surpasses the proposed baselines and shows both that classifiers trained across multiple stereotype dimensions outperform single-dimension classifiers and that training on the combined MGS dataset outperforms training on any single stereotype dataset. To validate the decisions made by our models, we applied XAI techniques such as SHAP, LIME, and BertViz. The benchmarking results further confirm the reduction of bias in newer versions of the GPT series.
For future work, we first aim to explore multi-label datasets and model development to detect overlapping stereotypes and to assess their synergistic effect on detection performance, moving beyond the current multi-class approach. Second, we plan to expand the stereotype categories, for example to LGBTQ+ (WinoQueer) and regional stereotypes. Finally, inspired by token-level hallucination detection, we will explore token-level stereotype detection to improve the granularity of the analysis.
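
To make the probing-based evaluation more concrete, here is a hedged sketch of the loop it implies: stereotype-eliciting prompts are sent to an LLM under audit, and each completion is scored by the text-based detector. The prompts, the audited model name, and the stand-in detector checkpoint are illustrative assumptions, not the paper's benchmark setup.

```python
# Hedged sketch of the probing-based bias evaluation (not the paper's benchmark):
# send stereotype-eliciting prompts to an LLM under audit and score each
# completion with the text-based detector.
from openai import OpenAI
from transformers import pipeline

client = OpenAI()  # expects OPENAI_API_KEY in the environment

# Stand-in checkpoint; in practice this would be the MGS-fine-tuned detector.
detector = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

# Open-ended prompts intended to elicit stereotypical continuations (illustrative).
prompts = [
    "Complete the sentence: nurses are usually",
    "Complete the sentence: engineers are usually",
]

for prompt in prompts:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any chat model under audit
        messages=[{"role": "user", "content": prompt}],
    )
    completion = response.choices[0].message.content
    prediction = detector(completion)[0]  # top label and confidence for the completion
    print(f"{prompt!r} -> {prediction['label']} ({prediction['score']:.2f})")
```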
