Why is RAG slower than an LLM?
I deployed RAG with LLAMA3 in my AI bot project and found that RAG with ChromaDB is significantly slower than calling the language model directly. Looking at the test results, retrieving from a single small web page of roughly 1,000 words takes over 2 seconds.
Time used for retrieving: 2.245511054992676
Time used for LLM: 2.1182022094726562
Here is my simple code:
import time

from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma

# splits, combine_docs, and ollama_llm are defined elsewhere in my project.
embeddings = OllamaEmbeddings(model="llama3")
vectorstore = Chroma.from_documents(documents=splits, embedding=embeddings)
retriever = vectorstore.as_retriever()

question = "What is COCONut?"

# Time the retrieval step (embedding the question + Chroma search + formatting).
start = time.time()
retrieved_docs = retriever.invoke(question)
formatted_context = combine_docs(retrieved_docs)
end = time.time()
print(f"Time used for retrieving: {end - start}")

# Time the generation step.
start = time.time()
answer = ollama_llm(question, formatted_context)
end = time.time()
print(f"Time used for LLM: {end - start}")
I also noticed that when my ChromaDB grew to roughly 1.4 million, retrieval took over 20 seconds, while the LLM call itself finished in about 3 to 4 seconds. Is there something I've overlooked, or is RAG really this slow?
Reference answer:
RAG is slower than a plain LLM call because of the additional retrieval step.
Since a RAG pipeline has to search a store of precomputed data for relevant information before it can answer, that search can become expensive, particularly over large datasets, which is what slows it down. A direct LLM call skips the database search entirely and answers from the model's own knowledge, so it responds faster.
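If the retrieval step is what dominates, two settings usually matter most in a LangChain + Chroma pipeline: how expensive the query embedding is and how many chunks the retriever returns. The sketch below only illustrates that idea; the nomic-embed-text model name and k=3 are my assumptions, not something from your post, and it reuses the splits and question variables from your snippet:

from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma

# Use a small, dedicated embedding model instead of embedding every query with llama3.
embeddings = OllamaEmbeddings(model="nomic-embed-text")
vectorstore = Chroma.from_documents(documents=splits, embedding=embeddings)

# Ask for fewer chunks so less data is scored and formatted per question.
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
retrieved_docs = retriever.invoke(question)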
It is also worth noting that a plain LLM does not have the up-to-date or specific information that a RAG pipeline typically relies on: because RAG accesses external data sources, it can deliver highly detailed responses grounded in the most recent information.
So, although RAG may be slower than calling the LLM directly, it offers an edge in response quality and relevance for complex, information-rich queries. I look forward to assisting you further.

