
Practical Prompt Engineering: Prompting Tips and Techniques for ChatGPT


Owing to their text-to-text nature, large language models (LLMs) demonstrate remarkable versatility by handling multiple tasks through a single system. Initially, this capability was showcased via zero and few-shot learning techniques, exemplified by models such as GPT-2 and GPT-3 [5, 6]. Once fine-tuned to better align with human preferences and instructions, these models exhibit even greater potential, enabling a range of popular generative applications like coding assistants, information-seeking dialogue agents, and chat-based search experiences.

Due to the applications they make possible, large language models (LLMs) have quickly gained attention from both research communities and popular culture. During this period of growth, we have also witnessed the emergence of a complementary field known as prompt engineering. From a high-level perspective, LLMs function by first accepting text (referred to as prompts) as input and then generating textual output that can be analyzed for useful information, such as classifications, summaries, translations, or other forms of processing. The flexibility of this approach is advantageous. However, it is equally important to carefully craft the input prompts in order to maximize the likelihood of the LLM producing the desired output.

Prompt engineering is an empirical science that studies how different prompting strategies can be used to improve LLM performance. Although many approaches exist, this overview aims to build an understanding of the general mechanics of prompting, as well as a few fundamental but incredibly effective prompting techniques, such as zero/few-shot learning and instruction prompting. Along the way, we will pick up practical tips and takeaways that can be adopted immediately to become more effective prompt engineers and LLM practitioners.

(created by author)

Understanding LLMs. Due to its focus upon prompting, this overview will not explain the history or mechanics of language models. To gain a better general understanding of language models (which is an important prerequisite for deeply understanding prompting), I’ve written a variety of overviews that are available. These overviews are listed below (in order of importance):

  • Language Modeling Basics (GPT and GPT-2) [link]

  • Language Model Scaling (GPT-3) [link]

  • Modern [link] and Specialized [link] LLMs

    PaLM, T5 (parts one and two), LLaMA (parts one and two)

Prompting at a Glance

Language models are capable of solving diverse tasks by leveraging their generic text-to-text architecture (from [1])

Given the current buzz around large language models (LLMs), we might ask: what core capability makes them so powerful? There is no single answer (e.g., model architecture, vast pre-training datasets, and human feedback all play a role), but one major advantage of LLMs is their generic text-to-text interface. These models excel at predicting the next token in a sequence, and with proper training and optimization, this simple skill can be leveraged to solve an impressive variety of tasks!

To solve a task, we simply provide a textual input that contains the relevant information and extract the answer from the model's textual output. This unified approach can be used for translation, summarization, question answering, classification, and more. However, things are not (quite) this simple: the accuracy of an LLM can vary significantly depending on the wording and structure of the prompt (i.e., the input text) it receives. In other words, prompt engineering matters a great deal.
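To make this unified interface concrete, the sketch below wraps two different tasks in the same prompt-building function. This is a minimal illustration: `make_prompt` is a hypothetical helper, not part of any library.

```python
# Different tasks share a single text-to-text interface: a task description
# plus input data, combined into one prompt string for the LLM.

def make_prompt(task_description: str, input_text: str) -> str:
    """Combine a task description and input data into a single prompt."""
    return f"{task_description}\n\n{input_text}"

# The same interface expresses translation, classification, summarization, etc.
translation = make_prompt(
    "Translate the following sentence to French:",
    "I love machine learning.",
)
classification = make_prompt(
    "Classify the sentiment of this review as positive or negative:",
    "The food was cold and the service was slow.",
)
```

The LLM's output for each prompt would then be parsed for the translation or the predicted label, respectively.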

What is prompt engineering?

“Prompt engineering is a relatively new discipline for developing and optimizing prompts to efficiently use LMs for a wide variety of applications and research topics.” — from [2]

Given the importance of properly crafting our prompt contents for achieving useful results with an LLM, prompt engineering has garnered significant attention in recent months. However, it’s an empirical discipline; discovering the optimal prompts usually relies on heuristic methods and demands experimentation. We can enhance our prompts by tracking and maintaining versions over time while experimenting with various ideas to identify what works best.

Guiding the LLM by issuing an instruction (created by author)

Key components. There are many ways to construct a prompt, but most prompts are built from the same few (optional) core elements:

  • Input data: the data the LLM is actually expected to process (e.g., the sentence being translated or classified, the document being summarized, etc.).

  • Exemplars: one of the most effective ways to teach an LLM correct behavior is to include concrete input-output pairs in the prompt.

  • Instruction: instead of demonstrating correct behavior with concrete examples, we can simply describe, in text, what the model should do; see below.

  • Indicators: providing input to the LLM in a clear, consistent format is helpful, so indicators can be used to delineate the different sections of a prompt; see below.

  • Context: beyond the components above, we may want to provide the model with additional information or context.

Using indicators to structure a prompt in various ways (created by author)
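The components above can be sketched as a small prompt builder that assembles the optional elements into one string, using section labels as indicators. The function name and label choices here are illustrative assumptions, not a standard format.

```python
# Sketch of assembling the (optional) core prompt components: context,
# instruction, exemplars, and input data, delineated by indicators.

def build_prompt(instruction=None, exemplars=None, input_data=None, context=None):
    """Assemble a prompt from optional components, using indicators
    (section labels like 'Instruction:') to separate each part."""
    parts = []
    if context:
        parts.append(f"Context: {context}")
    if instruction:
        parts.append(f"Instruction: {instruction}")
    if exemplars:
        demos = "\n".join(f"Input: {x}\nOutput: {y}" for x, y in exemplars)
        parts.append(f"Exemplars:\n{demos}")
    if input_data:
        parts.append(f"Input: {input_data}")
    return "\n\n".join(parts)

prompt = build_prompt(
    instruction="Classify the sentiment of the sentence as positive or negative.",
    exemplars=[("Great movie!", "positive"), ("Terrible acting.", "negative")],
    input_data="I really enjoyed this film.",
)
```

Because every component is optional, the same builder can produce zero-shot, few-shot, or instruction-only prompts.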

General tips. The details of prompt engineering differ a lot depending on the model being used and what task we are trying to solve. However, there are a few generally-accepted principles for prompt engineering that are helpful to keep in mind [1, 3].

  • Start simple: begin with a simple prompt, then modify it incrementally while tracking empirical results.

  • Be direct: if we want the LLM to match a particular style or format, we should say so explicitly and directly; spelling out exactly what we want makes the message clear.

  • Be specific: ambiguity is the enemy of every prompt engineer. We should make the prompt detailed and specific without going overboard and providing an input that is too long (i.e., there are limits on how long the prompt can be!).

  • Exemplars are powerful: if describing what we want is difficult, providing concrete examples of correct outputs or behavior for several different input types can be very helpful.

Depicting the context window of a language model (created by author)

As we explore different prompting techniques and approaches, we must remember that only a limited amount of information can be included in the prompt. Every LLM has a pre-defined context window that limits the total number of tokens it can process at once, and context window sizes vary across models. Notably, larger context windows are a key trend in current LLM development; for example, GPT-4 has a context window of up to 32,000 tokens (4x that of previous versions).
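Exact token counts depend on the model's tokenizer, so the sketch below uses a whitespace word count only as a crude proxy to illustrate budgeting a prompt against a context window. The function name and the 256-token reservation for the output are arbitrary assumptions.

```python
def fits_context(prompt: str, context_window: int, reserved_for_output: int = 256) -> bool:
    """Check whether a prompt (roughly) fits in the model's context window,
    leaving room for generated output. Whitespace splitting is only a crude
    proxy for real subword tokenization."""
    approx_tokens = len(prompt.split())
    return approx_tokens + reserved_for_output <= context_window

short = "Translate this sentence to French: I love machine learning."
ok = fits_context(short, context_window=4096)          # plenty of room
too_long = fits_context("word " * 5000, context_window=4096)  # exceeds the window
```

In practice, a real tokenizer for the target model should be used, since subword tokenizers typically produce more tokens than a whitespace split.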

Common Prompting Techniques


Zero and few-shot learning (from [4, 5, 6])

Although LLMs have recently exploded in popularity (e.g., due to models like ChatGPT), prompting is not a new idea. Early language models like GPT [4] solved downstream tasks primarily via fine-tuning. With GPT-2 [5], researchers began using zero-shot learning to solve multiple downstream tasks with a single foundation model. Then, as language models grew even larger, GPT-3 showed that they could be effective few-shot learners [6].

Zero-Shot Learning

(from [6])

Zero-shot learning is straightforward in concept: we give the LLM a task description and relevant input data, and it generates the result; see above. Thanks to their extensive pre-training, LLMs are often quite good at solving tasks this way and can handle a relatively large number of tasks effectively; see the examples below (generated with GPT-3.5).

Zero-shot learning with GPT-3.5 (created by author)

Zero-shot learning was studied extensively by models like GPT-2 and works well in some cases. But what do we do when zero-shot learning fails to solve our task? In many cases, we can drastically improve an LLM's performance by providing more specific and explicit information. In particular, we can start adding examples of the desired output to the prompt, allowing the model to copy the patterns it sees in this data.

Few-Shot Learning

Beyond just a task description, we can augment the prompt with high-quality input-output examples. This technique forms the basis of few-shot learning: an approach that aims to improve LLM performance by providing explicit examples of correct behavior. When applied well and with the right model, few-shot learning is remarkably effective, as shown by the breakthrough capabilities of LLMs like GPT-3 [6]; see below.

(from [3])

However, learning how to make the most of an LLM's few-shot learning abilities can be complicated. Which examples should be included in the prompt? Is there a correct way to structure the prompt? Do changes to the prompt substantially impact the LLM's performance?

Most LLMs are quite sensitive to how their prompts are constructed, which makes prompt engineering both difficult and important. Although recent models like GPT-4 appear to be less sensitive to small perturbations in the prompt [2], the research community has provided valuable guidelines for using few-shot learning effectively:

  • Example ordering matters: shuffling few-shot examples can significantly change LLM performance, and including more few-shot examples does not fix this issue.

  • The distribution of labels in the few-shot examples matters and should match the true label distribution of the underlying data. Interestingly, label correctness (i.e., whether each input is paired with its correct label) matters less.

  • LLMs tend to repeat themselves when given few-shot examples, especially favoring the final example in the prompt (i.e., recency bias).

  • Examples included in the prompt should be diverse and randomly ordered.
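The guidelines above can be sketched as a small few-shot prompt builder that randomizes exemplar order with a fixed seed. The function name and prompt format are illustrative assumptions, not a standard.

```python
import random

def few_shot_prompt(task_description, exemplars, test_input, seed=0):
    """Build a few-shot prompt. Exemplars are shuffled deterministically,
    since ordering can bias the model (see the guidelines above)."""
    rng = random.Random(seed)
    shuffled = list(exemplars)
    rng.shuffle(shuffled)
    demos = "\n".join(f"Input: {x}\nOutput: {y}" for x, y in shuffled)
    return f"{task_description}\n\n{demos}\nInput: {test_input}\nOutput:"

prompt = few_shot_prompt(
    "Classify the sentiment of each sentence as positive or negative.",
    [("Great movie!", "positive"), ("Terrible acting.", "negative"),
     ("Loved the soundtrack.", "positive")],
    "I really enjoyed this film.",
)
```

Ending the prompt with "Output:" is one way to use an indicator that nudges the model to produce only the label.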

Optimal data sampling. Selecting examples that are diverse, randomly ordered, and relevant to the test example is best. Beyond these basic intuitions, however, a significant amount of research has examined how to choose optimal exemplars for a prompt, e.g., via diversity-based selection [8], uncertainty-based selection [9], or selection based on similarity to the test example [10].
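As a toy illustration of similarity-based selection, word overlap can rank candidate exemplars against the test input. This is a crude stand-in for the embedding-based retrieval studied in [10]; every name here is a hypothetical sketch.

```python
def select_exemplars(candidates, test_input, k=2):
    """Return the k (input, output) pairs whose inputs share the most words
    with the test input -- a rough proxy for semantic similarity."""
    test_words = set(test_input.lower().split())

    def overlap(pair):
        return len(test_words & set(pair[0].lower().split()))

    return sorted(candidates, key=overlap, reverse=True)[:k]

pool = [
    ("The cat sat on the mat.", "animal"),
    ("Stocks fell sharply today.", "finance"),
    ("A dog chased the cat.", "animal"),
]
chosen = select_exemplars(pool, "my cat naps on the mat", k=2)
```

A production system would instead embed the candidate inputs and the test input with a sentence encoder and rank by cosine similarity.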

(from [3])

Few-shot learning versus fine-tuning. Before proceeding, I want to clarify a key point of confusion that often arises in this context. Few-shot learning does not involve fine-tuning. The few-shot learning approach incorporates examples into the model prompt, which then serves as relevant context for generating accurate outputs—a process referred to as "in-context learning." The model's parameters remain unchanged through this method. In contrast, fine-tuning explicitly involves training the model (i.e., updating its weights through backpropagation) on a selected dataset.

Instruction Prompting

Using an instruction-tuned LLM as a coding assistant (from [15])

Few-shot learning is incredibly powerful, but it has one notable drawback: exemplars consume a lot of tokens. Because an LLM's context window is limited, we may want to explore prompting approaches that do not consume so many tokens. For example, can we just explain the correct behavior to the LLM in words? The short answer is yes! This technique, which simply includes written instructions as part of the prompt, is known as instruction prompting, and it works best with a particular kind of LLM.

Instruction tuning and alignment. Recent language model development has focused heavily on improving instruction-following ability. Pre-trained LLMs do not follow instructions well out of the box, but teaching them to do so makes them much better at accomplishing what users want (i.e., improves human alignment). Instruction-following LLMs power a variety of useful applications, from information-seeking dialogue agents (e.g., ChatGPT) to coding assistants (e.g., Codex [13]); see below.

(from [13, 14])

As discussed extensively in prior posts, the first phase of developing an LLM applies a language modeling objective over a vast, unlabeled corpus of text. During this phase, the model learns to accurately predict the next token, but its output tends to be generic or unhelpful for practical applications, and it often fails to follow complex instructions. To encourage such behavior, we need to go beyond basic pre-training.

Developing LLMs that follow instructions.

Aligning LLMs based on human feedback

Crafting useful instructions. Given access to an LLM that has been trained to follow instructions, we can accomplish a lot via effective prompting. Here are some key tips and ideas for instruction prompting:

  • Like the rest of the prompt, instructions should be specific and detailed.

  • We should avoid telling the LLM not to do something; instead, we should tell it what to do.

  • Using an input format that clearly identifies the instruction within the prompt helps the model understand and follow it.

Different forms of instruction prompting (created by author)
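A minimal sketch of the last tip: using a delimiter as an indicator to separate the instruction from the input data. The delimiter choice and function name are assumptions for illustration.

```python
def instruction_prompt(instruction: str, input_data: str, delimiter: str = '"""') -> str:
    """Wrap the input data in a delimiter so the model can clearly tell
    where the instruction ends and the data begins."""
    return f"{instruction}\n\n{delimiter}\n{input_data}\n{delimiter}"

prompt = instruction_prompt(
    "Summarize the text below in one sentence.",
    "Large language models solve many tasks through a single text-to-text interface.",
)
```

Clear delimiters also reduce the chance that the model confuses instructions with content inside the input data.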

Role prompting. Another interesting prompting technique that is tangentially related to instruction prompting is role prompting, which assigns a “role” or persona to the model. This role is assigned within the prompt via a textual snippet such as:

  • You are a famous mathematician.
  • You are a doctor.
  • You are a music expert.

Interestingly, recent LLMs are capable of assuming and maintaining such a role throughout a conversation [18]; see below.

Role prompting with LaMDA (from [18])

Going further, role prompting is not just a fun trick. Giving the LLM a role can actually improve performance; for instance, prompting GPT-3 to act as an expert mathematician can improve its ability to solve arithmetic problems (as shown on learnprompting.org/docs/basics/roles). That said, this technique only works reliably in certain cases.
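With chat-style APIs, a role is commonly assigned via a system message. The sketch below only builds the message list in the common `{"role": ..., "content": ...}` chat format; no API call is made, and the helper name is an assumption.

```python
def role_messages(role: str, question: str) -> list:
    """Build a chat-style message list that assigns a persona via the
    system message before asking the user's question."""
    return [
        {"role": "system", "content": f"You are {role}."},
        {"role": "user", "content": question},
    ]

messages = role_messages("a brilliant mathematician", "What is 1241 * 1521?")
```

The same list could then be passed to a chat completion endpoint, keeping the persona active for the entire conversation.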

When we assign the AI a specific role, we give it relevant context. This context helps the AI understand the question better, and with a better understanding, the AI often gives more useful answers.

Instruction prompting in the real world.

Prompts for metadata extraction and PII recognition in the ChatGPT retrieval plugin (created by author)

Within these prompts, the LLM is given clear and thorough instructions describing how to perform its intended function. A few properties of these instructions stand out:

  • The desired output format (e.g., json or true/false) is explicitly stated.

  • The instruction uses a structured format (i.e., indicators) to delineate important information.

  • The LLM's task (e.g., identifying personally identifiable information (PII) or extracting metadata) is explicitly stated in the prompt.

  • Interestingly, these prompts tell the model not to do certain things in several cases, a practice that is generally discouraged.

Given the limitations of LLMs, completely relying on them for critical tasks like PII detection may not be the best idea. Nonetheless, these prompts demonstrate the huge potential of instruction prompting: rather than writing an entire program or service, solving many problems can be as simple as writing a clear, concise instruction prompt.
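As a hedged reconstruction (not the exact prompt used by the ChatGPT retrieval plugin), an instruction prompt for PII detection might state the task, fix the output format to true/false, and delimit the input like this:

```python
def pii_detection_prompt(text: str) -> str:
    """Build an instruction prompt that asks the model to flag PII and
    restricts the output to a fixed true/false format."""
    return (
        "Determine whether the text below contains personally identifiable "
        "information (PII), such as names, email addresses, or phone numbers.\n"
        "Respond with only true or false.\n\n"
        f'Text: """{text}"""\n'
        "Answer:"
    )

prompt = pii_detection_prompt("Contact John Doe at john@example.com.")
```

The fixed true/false format makes the model's response trivial to parse downstream.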

Takeaways

"Writing a really great prompt for a chatbot persona is an amazingly high-leverage skill and an early example of programming in a little bit of natural language." — Sam Altman

If there is one takeaway from this overview, it is that designing effective prompts (i.e., prompt engineering) is a significant part of applying LLMs in practice. Due to their generic text-to-text structure, language models are incredibly versatile and can be used to solve a variety of tasks. However, to perform well, they must receive detailed and appropriate context. Although optimal prompting strategies differ across models and tasks, several high-level principles significantly increase the chance of success.

Zero to few-shot learning. Due to their extensive (pre-)training, LLMs contain a huge amount of information and can solve many tasks out of the box. To do this, we simply provide a task description and relevant input data, and the LLM generates the output (i.e., zero-shot learning). However, zero-shot learning is limited by the amount of context the model receives about the task; to improve on it, we should use few-shot learning by adding exemplars to the prompt.

Instruction-following LLMs. Although it performs well, few-shot learning consumes a lot of tokens, which is a problem given the limited context windows of most LLMs. To address this, we can adopt instruction prompting, which provides a precise textual description of the desired behavior rather than capturing it through concrete examples of correct output. Instruction prompting is powerful, but it requires a model that has been fine-tuned (e.g., via instruction tuning or RLHF) to work well: pre-trained LLMs are not good at following instructions out of the box.

Tricks and tips. Prompt engineering comes with many tricks and best practices. Often, these techniques change with each new model release (e.g., GPT-4 handles unstructured prompts much better than earlier models [2]), but a few principles have remained useful for quite a while. First, start with a simple prompt and add complexity gradually. As the prompt evolves, aim to be specific and detailed while avoiding excessive length (the context window is limited). Finally, to really maximize LLM performance, use few-shot learning, instruction prompting, or more sophisticated approaches.

Closing Remarks

Thanks a lot for reading this article. I am Cameron R. Wolfe, Director of AI at Rebuy. My research focuses on the empirical and theoretical foundations of deep learning. You can find my other writings on Medium. If you liked this post, please follow me on Twitter (@cwolferesearch) or subscribe to my Deep (Learning) Focus newsletter, where I provide accessible overviews of popular AI papers to help readers understand these topics more deeply.

Bibliography

[1] Raffel, Colin, et al. "Exploring the limits of transfer learning with a unified text-to-text transformer." The Journal of Machine Learning Research 21.1 (2020): 5485–5551.

[2] Saravia, Elvis, et al. "Prompt Engineering Guide." https://github.com/dair-ai/Prompt-Engineering-Guide (2022).

[3] Weng, Lilian. "Prompt Engineering." Lil'Log (Mar 2023). https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/.

[4] Radford, Alec, et al. "Improving language understanding by generative pre-training." (2018).

[5] Radford, Alec, et al. "Language models are unsupervised multitask learners." (2019).

[6] Brown, Tom, et al. "Language models are few-shot learners." Advances in Neural Information Processing Systems 33 (2020): 1877–1901.

[7] Zhao, Tony Z., et al. "Calibrate before use: Improving few-shot performance of language models." ICML (2021).

[8] Su, Hongjin, et al. "Selective annotation makes language models better few-shot learners." arXiv preprint arXiv:2209.01975 (2022).

[9] Diao, Shizhe, et al. "Active prompting with chain-of-thought for large language models." arXiv preprint arXiv:2302.12246 (2023).

[10] Liu, Jiachang, et al. "What makes good in-context examples for GPT-3?" arXiv preprint arXiv:2101.06804 (2021).

[11] Wei, Jason, et al. "Chain-of-thought prompting elicits reasoning in large language models." arXiv preprint arXiv:2201.11903 (2022).

[12] Wei, Jason, et al. "Finetuned language models are zero-shot learners." arXiv preprint arXiv:2109.01652 (2021).

[13] Chen, Mark, et al. "Evaluating large language models trained on code." arXiv preprint arXiv:2107.03374 (2021).

[14] Ouyang, Long, et al. "Training language models to follow instructions with human feedback." Advances in Neural Information Processing Systems 35 (2022): 27730–27744.

[15] Touvron, Hugo, et al. "LLaMA: Open and efficient foundation language models." arXiv preprint arXiv:2302.13971 (2023).

[16] Iyer, Srinivasan, et al. "OPT-IML: Scaling language model instruction meta learning through the lens of generalization." arXiv preprint (2022).

[17] Glaese, Amelia, et al. "Improving alignment of dialogue agents via targeted human judgements." arXiv preprint (2022).

[18] Thoppilan, Romal, et al. "LaMDA: Language models for dialog applications." arXiv preprint (2022).

https://towardsdatascience.com/practical-prompt-engineering-74e96130abc4
