Steel-LLM: From Scratch to Open Source -- A Personal Journey in Building a Chinese-Centric LLM

Basic Information
- 📝 Original paper: https://arxiv.org/abs/2502.06635
- 👥 Authors: Qingshui Gu, Shu Li, Tianyu Zheng, Zhaoxiang Zhang
- 🏷️ Keywords: Soft Mixture of Experts, open-source, resource-efficient, Chinese-centric LLM, Flash Attention
- 📚 Categories: Natural Language Processing, Machine Learning
Abstract
Steel-LLM is a Chinese-centric language model developed from scratch with the goal of creating a high-quality, open-source model despite limited computational resources. Launched in March 2024, the project aimed to train a 1-billion-parameter model on a large-scale dataset, prioritizing transparency and the sharing of practical insights to assist others in the community. The training process primarily focused on Chinese data, with a small proportion of English data included, addressing gaps in existing open-source LLMs by providing a more detailed and practical account of the model-building journey. Steel-LLM has demonstrated competitive performance on benchmarks such as CEVAL and CMMLU, outperforming early models from larger institutions. This paper provides a comprehensive summary of the project’s key contributions, including data collection, model design, training methodologies, and the challenges encountered along the way, offering a valuable resource for researchers and practitioners looking to develop their own LLMs. The model checkpoints and training script are available at https://github.com/zhanshijinwat/Steel-LLM.
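Among the keywords above, Soft Mixture of Experts refers to a routing scheme in which every token contributes softly to every expert "slot" instead of being hard-assigned to one expert. The paper's exact integration of Soft-MoE into Steel-LLM is not detailed in this summary, so the sketch below is a generic, minimal NumPy illustration of the idea; the names `Phi` (learned slot-embedding matrix) and `experts` (one feed-forward function per slot) are illustrative assumptions, not the authors' API.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def soft_moe_layer(X, Phi, experts):
    """Minimal Soft-MoE sketch (one slot per expert).

    X:       (n_tokens, d)   token representations
    Phi:     (d, n_slots)    learned slot embeddings (assumed name)
    experts: list of n_slots callables, each mapping (d,) -> (d,)
    """
    logits = X @ Phi                      # (n_tokens, n_slots) token-slot affinities
    D = softmax(logits, axis=0)           # dispatch weights: per slot, mix over tokens
    C = softmax(logits, axis=1)           # combine weights: per token, mix over slots
    slots = D.T @ X                       # (n_slots, d) soft input to each expert
    Y_slots = np.stack([f(s) for f, s in zip(experts, slots)])
    return C @ Y_slots                    # (n_tokens, d) tokens re-assembled from slots
```

Because dispatch and combine are dense softmaxes rather than top-k hard routing, the layer is fully differentiable and avoids load-balancing losses, at the cost of every token touching every slot.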
Paper Summary
One-sentence takeaway
This paper presents Steel-LLM, a Chinese-centric, open-source language model developed from scratch under limited computational resources; by prioritizing transparency and sharing practical insights, it aims to help others in the community build their own models.
…
Read the full article: https://www.llamafactory.cn/daily-paper/detail/?id=1160
