transformers: Generation with LLMs
This text explains how to use the Hugging Face Transformers library to run large language models (LLMs) for text generation, focusing on the following:
Model loading: how to load a model from pretrained weights and place it on a GPU for inference.
Tokenization: how to configure the tokenizer with left padding, as required by decoder-only architectures.
Generation strategies: the different ways of calling generate and how they affect output length and diversity.
Prompt management: applying chat templates to ensure correctly formatted prompts and the expected results.
Common pitfalls: frequent prompting and generation mistakes and how to fix them.
Source: https://huggingface.co/docs/transformers/main/en/llm_tutorial

The stopping condition is decided by the model itself: the model should learn when to emit an end-of-sequence (EOS) token. If it never does, generation stops once the configured maximum sequence length is reached.
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained(
"mistralai/Mistral-7B-v0.1", device_map="auto", load_in_4bit=True
)
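Note that device_map="auto" relies on the accelerate package and load_in_4bit=True on bitsandbytes. In recent transformers releases the same 4-bit loading is usually expressed through a quantization config; a sketch of that variant (assuming the same packages are installed):
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
# Equivalent 4-bit loading via an explicit quantization config
quantization_config = BitsAndBytesConfig(load_in_4bit=True)
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    device_map="auto",
    quantization_config=quantization_config,
)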
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1", padding_side="left")
model_inputs = tokenizer(["A list of colors: red, blue"], return_tensors="pt").to("cuda")
generated_ids = model.generate(**model_inputs)
tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
'A list of colors: red, blue, green, yellow, orange, purple, pink,'
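As noted above, generation stops either at the EOS token or at a length limit; both can be set explicitly on the call. A minimal sketch reusing model_inputs from above (the parameter values are illustrative):
# Stop at the EOS token or after 30 new tokens, whichever comes first
generated_ids = model.generate(
    **model_inputs, max_new_tokens=30, eos_token_id=tokenizer.eos_token_id
)
tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]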
tokenizer.pad_token = tokenizer.eos_token # Most LLMs don't have a pad token by default
model_inputs = tokenizer(
["A list of colors: red, blue", "Portugal is"], return_tensors="pt", padding=True
).to("cuda")
generated_ids = model.generate(**model_inputs)
tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
['A list of colors: red, blue, green, yellow, orange, purple, pink,',
'Portugal is a country in southwestern Europe, on the Iber']
There are many generation strategies, and sometimes the default values may not be right for your use case. If the outputs don't match what you expect, the most common pitfalls and their fixes are listed below.
Generated output is too short or too long
If not specified in a GenerationConfig file, generate returns up to 20 tokens by default. Set max_new_tokens in your generate call to control the maximum number of new tokens it can return. Keep in mind that LLMs (more precisely, decoder-only models) also return the input prompt as part of the output.
model_inputs = tokenizer(["A sequence of numbers: 1, 2"], return_tensors="pt").to("cuda")
# By default, the output will contain up to 20 tokens
generated_ids = model.generate(**model_inputs)
tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
'A sequence of numbers: 1, 2, 3, 4, 5'
# Setting `max_new_tokens` allows you to control the maximum length
generated_ids = model.generate(**model_inputs, max_new_tokens=50)
tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
'A sequence of numbers: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,'
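If you prefer to keep these limits in one reusable object instead of passing them on every call, they can also be bundled in a GenerationConfig; a sketch (min_new_tokens is an extra, illustrative guard against overly short outputs):
from transformers import GenerationConfig
# Bundle length limits in a reusable config and pass it per call
generation_config = GenerationConfig(max_new_tokens=50, min_new_tokens=10)
generated_ids = model.generate(**model_inputs, generation_config=generation_config)
tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]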
Incorrect generation mode
By default, unless specified otherwise in a GenerationConfig file, generate selects the most likely token at each iteration (greedy decoding). Depending on the task this may be undesirable; creative tasks such as chatbots or essay writing benefit from sampling, enabled with do_sample=True.
# Set seed for reproducibility -- you don't need this unless you want full reproducibility
from transformers import set_seed
set_seed(42)
model_inputs = tokenizer(["I am a cat."], return_tensors="pt").to("cuda")
# LLM + greedy decoding = repetitive, boring output
generated_ids = model.generate(**model_inputs)
tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
'I am a cat. I am a cat. I am a cat. I am a cat'
# With sampling, the output becomes more creative!
generated_ids = model.generate(**model_inputs, do_sample=True)
tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
'I am a cat. Specifically, I am an indoor-only cat. I'
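Sampling can be tuned further with parameters such as temperature, top_k and top_p, which generate accepts directly; the values below are only illustrative:
# Higher temperature flattens the distribution (more diverse output);
# top_k / top_p restrict sampling to the most plausible tokens.
generated_ids = model.generate(
    **model_inputs, do_sample=True, temperature=0.7, top_k=50, top_p=0.9
)
tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]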
Wrong padding side
LLMs are decoder-only architectures, which means they keep iterating on your input prompt. If the inputs do not all have the same length, they need to be padded. Because LLMs are not trained to continue from pad tokens, the inputs must be left-padded. Also make sure to pass the attention mask to generate so the model knows which positions to ignore.
# With right padding (the tokenizer default), the 1st sequence, which is shorter,
# gets its padding on the right side, and generation fails to capture the logic.
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
tokenizer.pad_token = tokenizer.eos_token  # Most LLMs don't have a pad token by default
model_inputs = tokenizer(
    ["1, 2, 3", "A, B, C, D, E"], padding=True, return_tensors="pt"
).to("cuda")
generated_ids = model.generate(**model_inputs)
tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
'1, 2, 33333333333'
# With left-padding, it works as expected!
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1", padding_side="left")
tokenizer.pad_token = tokenizer.eos_token # Most LLMs don't have a pad token by default
model_inputs = tokenizer(
["1, 2, 3", "A, B, C, D, E"], padding=True, return_tensors="pt"
).to("cuda")
generated_ids = model.generate(**model_inputs)
tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
'1, 2, 3, 4, 5, 6,'
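On the attention mask mentioned above: the tokenizer already returns it next to input_ids, and unpacking with **model_inputs passes both. Written out explicitly, the same call looks like this:
# `model_inputs` holds both tensors; passing the attention mask lets the model
# ignore the pad positions during generation.
list(model_inputs.keys())  # ['input_ids', 'attention_mask']
generated_ids = model.generate(
    input_ids=model_inputs["input_ids"],
    attention_mask=model_inputs["attention_mask"],
)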
Wrong prompt
Some models and tasks expect a specific input prompt format to work properly. When that format is not applied, you get a silent performance degradation: the model kind of works, but not as well as it would if you followed the expected prompt.
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-alpha")
model = AutoModelForCausalLM.from_pretrained(
"HuggingFaceH4/zephyr-7b-alpha", device_map="auto", load_in_4bit=True
)
set_seed(0)
prompt = """How many helicopters can a human eat in one sitting? Reply as a thug."""
model_inputs = tokenizer([prompt], return_tensors="pt").to("cuda")
input_length = model_inputs.input_ids.shape[1]
generated_ids = model.generate(**model_inputs, max_new_tokens=20)
print(tokenizer.batch_decode(generated_ids[:, input_length:], skip_special_tokens=True)[0])
"I'm not a thug, but i can tell you that a human cannot eat"
# Oh no, it did not follow our instruction to reply as a thug! Let's see what happens when we write
# a better prompt and use the right template for this model (through `tokenizer.apply_chat_template`)
set_seed(0)
messages = [
    {
        "role": "system",
        "content": "You are a friendly chatbot who always responds in the style of a thug",
    },
    {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
]
model_inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to("cuda")
input_length = model_inputs.shape[1]
generated_ids = model.generate(model_inputs, do_sample=True, max_new_tokens=20)
print(tokenizer.batch_decode(generated_ids[:, input_length:], skip_special_tokens=True)[0])
'None, you thug. How bout you try to focus on more useful questions?'
# As we can see, it followed a proper thug style 😎
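To see the exact string the chat template produces before tokenization, apply_chat_template can also return plain text rather than token IDs; a small sketch:
# Render the template to a string to inspect the role markers and special
# tokens the model expects
prompt_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt_text)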
