
Reading Subtext: Evaluating Large Language Models on Short Story Summarization with Writers


This post is part of the LLM article series and presents a translation of the paper "Reading Subtext: Evaluating Large Language Models on Short Story Summarization with Writers".

  • Abstract
  • 1 Introduction
  • 2 Related Work
  • 3 Writers and Data
  • 4 Summary Generation
  • 5 Human Evaluation
  • 6 Results
  • 7 Discussion
  • 8 Conclusion

Abstract

We evaluate LLMs on the highly challenging task of summarizing short stories. Importantly, we work directly with the authors to ensure that the stories have not been shared online, and to obtain informed evaluations of summary quality based on the authors' own judgments. We examine the problem with quantitative and qualitative methods grounded in narrative theory. We find that the models make faithfulness errors in over 50% of cases; at their best, however, they can still provide insightful thematic analysis. We further show that LLM judgments of summary quality diverge from the writers' feedback.

1 Introduction

2 Related Work

3 Writers and Data

4 Summary Generation

5 Human Evaluation

6 Results

7 Discussion

8 Conclusion

We partner with renowned writers who provide us with original short stories that they have not yet published, and we carefully examine the quality of the summaries LLMs generate for these narratives. By developing an evaluation framework grounded in narrative theory, we establish both quantitative and qualitative metrics for assessing the effectiveness of these story summaries. Because the stories were never part of any LLM's training data, the assessment is unbiased. Through this process, we find that LLMs can follow long-form narratives and conduct in-depth thematic analysis, yet they still struggle to reliably interpret latent subtext, particularly emotional resonance and narrative tone. Our methodology illustrates how collaboration with domain experts can move beyond the conventional practice of evaluating LLMs on data they may already have seen during training.
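As a rough illustration of the kind of pipeline described above, the sketch below generates a summary of an unpublished story with an LLM API and then records a writer's ratings and free-form comments. The prompt wording, the rating attributes, and the helper `collect_writer_ratings` are illustrative assumptions, not the authors' actual setup.

```python
# Minimal sketch of a "summarize, then have the writer rate it" loop.
# Assumptions: OpenAI's chat API is used for generation; the prompt text,
# the rating attributes, and collect_writer_ratings() are hypothetical.
from dataclasses import dataclass, field
from openai import OpenAI

ATTRIBUTES = ["coverage", "faithfulness", "coherence", "analysis"]  # assumed rubric

@dataclass
class Evaluation:
    summary: str
    ratings: dict = field(default_factory=dict)  # attribute -> score from 1 to 5
    comments: str = ""                           # free-form qualitative feedback

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def summarize(story_text: str, model: str = "gpt-4") -> str:
    """Ask the model for a one-paragraph summary of an unpublished story."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "You summarize short stories faithfully."},
            {"role": "user",
             "content": f"Summarize the following story in one paragraph:\n\n{story_text}"},
        ],
    )
    return response.choices[0].message.content

def collect_writer_ratings(summary: str) -> Evaluation:
    """Hypothetical stand-in for the human step: the story's author rates the summary."""
    print(summary)
    ev = Evaluation(summary=summary)
    for attr in ATTRIBUTES:
        ev.ratings[attr] = int(input(f"Rate {attr} (1-5): "))
    ev.comments = input("Qualitative comments: ")
    return ev

if __name__ == "__main__":
    with open("unpublished_story.txt", encoding="utf-8") as f:
        story = f.read()
    evaluation = collect_writer_ratings(summarize(story))
    print(evaluation.ratings)
```

In practice the rating step would be a survey given to the story's author rather than a console prompt; the point of the sketch is only to show how generated summaries and writer judgments can be paired for later quantitative and qualitative analysis.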
