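Paper notes: Evaluating the Performance of Large Language Models on GAOKAO Benchmark
1 Approach
Uses zero-shot prompting: each exam question is converted into an input for ChatGPT
For math questions, formulas are converted to LaTeX before being given to the model
Subjective questions are scored by professional teachers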
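
The conversion step above can be sketched as follows. This is a minimal illustration, not the paper's actual prompt: the `ExamQuestion` class, the `PROMPT_TEMPLATE` wording, and the `build_zero_shot_prompt` helper are all hypothetical names introduced here, and the question text is assumed to already have its formulas written in LaTeX.

```python
from dataclasses import dataclass


@dataclass
class ExamQuestion:
    """Hypothetical container for one exam question."""
    subject: str        # e.g. "math"
    question_type: str  # e.g. "multiple choice"
    text: str           # question body, formulas already in LaTeX


# Assumed template: the exact wording used in the paper is not given in these notes.
PROMPT_TEMPLATE = (
    "You are taking the {subject} section of the GAOKAO.\n"
    "Question type: {question_type}\n"
    "Question: {text}\n"
    "Answer:"
)


def build_zero_shot_prompt(question: ExamQuestion) -> str:
    """Build a single prompt that contains no worked examples (zero-shot)."""
    return PROMPT_TEMPLATE.format(
        subject=question.subject,
        question_type=question.question_type,
        text=question.text.strip(),
    )


if __name__ == "__main__":
    q = ExamQuestion(
        subject="math",
        question_type="multiple choice",
        text=r"If $x^2 - 3x + 2 = 0$, then $x$ equals: A. 1 or 2  B. -1 or -2",
    )
    print(build_zero_shot_prompt(q))
```

The prompt carries only the instruction and the question itself, which is what makes it zero-shot; a few-shot variant would prepend solved examples before the `Question:` line.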
2 Data
The national A and B exam papers from 2010 to 2022, 13 years in total

3 Conclusions
3.1 Zero-shot total GAOKAO scores of different models


3.2 Subjective and objective question scores by subject



3.3 Scores by year
