
[RAG] [vector_stores034] Analysis of the Basic Elasticsearch Example

news 2026/6/26 5:19:24

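1. Objective

This example shows the basic usage of Elasticsearch as a vector store integrated with LlamaIndex. It splits a Paul Graham essay into chunks, embeds them with an open-source embedding model, loads them into Elasticsearch, and then runs a query to demonstrate basic vector retrieval.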
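To make the "split into chunks" step concrete, here is a minimal sketch of overlapping word-based chunking. It is only a toy stand-in: in the example itself LlamaIndex does the splitting internally when the index is built, and the function name `chunk_words` and its parameters are hypothetical, chosen for illustration.

```python
def chunk_words(text, chunk_size=50, overlap=10):
    """Split text into overlapping word-based chunks (toy stand-in, not LlamaIndex's splitter)."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


# Tiny demonstration on a synthetic 12-word "document".
sample = " ".join(f"w{i}" for i in range(12))
chunks = chunk_words(sample, chunk_size=5, overlap=2)
for chunk in chunks:
    print(chunk)
```

The overlap means the end of one chunk is repeated at the start of the next, so a sentence that straddles a boundary is still retrievable from at least one chunk.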

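2. Tech Stack and Core Dependencies

Elasticsearch: a search engine that supports both full-text search and vector search
llama-index-vector-stores-elasticsearch: the vector store package that integrates LlamaIndex with Elasticsearch
llama-index-embeddings-huggingface: the package that integrates LlamaIndex with Hugging Face embedding models
llama-index: the core LlamaIndex framework
BAAI/bge-small-en-v1.5: the open-source model used for text embedding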

3. Environment Setup

3.1 Installing Dependencies

%pip install -qU llama-index-vector-stores-elasticsearch llama-index-embeddings-huggingface llama-index

3.2 Importing the Required Libraries

# imports
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.vector_stores.elasticsearch import ElasticsearchStore
from llama_index.core import StorageContext

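3.3 Setting the OpenAI API Key

# set up OpenAI
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")

Note: although the example sets an OpenAI API key, the embeddings here come from the Hugging Face model, not from OpenAI. The key is most likely needed for answer generation: the query engine in section 4.4 relies on LlamaIndex's default LLM, which is an OpenAI model unless Settings.llm is set to something else.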

4. Implementation

4.1 Data Preparation

Download the Paul Graham essay:

!mkdir -p 'data/paul_graham/'
!wget -nv 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'

4.2 Configuring the Embedding Model

Use Hugging Face's BAAI/bge-small-en-v1.5 as the embedding model:

from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core import Settings

# define embedding function
Settings.embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-small-en-v1.5"
)

4.3 Loading Documents and Building the Index

# load documents
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()

# define index
vector_store = ElasticsearchStore(
    es_url="http://localhost:9200",  # see the Elasticsearch Vector Store docs for more authentication options
    index_name="paul_graham_essay",
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)

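Tips:

Make sure the Elasticsearch service is running at http://localhost:9200
If you use a remote Elasticsearch instance or need authentication, see the authentication options in the Elasticsearch Vector Store documentation
The index_name parameter sets the name of the index created in Elasticsearch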
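Before building the index, it can help to confirm that something resembling Elasticsearch is actually answering at the configured URL. The following is a minimal sketch using only the standard library; the helper name `es_is_reachable` is hypothetical, and it assumes an unauthenticated node whose root endpoint returns a JSON document containing a `version` field. A secured cluster would reject this anonymous request, in which case the function simply reports `False`.

```python
import json
import urllib.error
import urllib.request


def es_is_reachable(es_url="http://localhost:9200", timeout=2.0):
    """Return True if a GET on the root URL yields a JSON object with a 'version' key."""
    try:
        with urllib.request.urlopen(es_url, timeout=timeout) as resp:
            info = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error (e.g. 401), bad URL, or non-JSON body.
        return False
    return isinstance(info, dict) and "version" in info


if __name__ == "__main__":
    print("Elasticsearch reachable:", es_is_reachable())
```

Running this check first turns a connection problem into an explicit message, rather than an error raised partway through index construction.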

4.4 Querying the Data

# Query Data
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")
print(response)

Sample output:

The author worked on writing and programming outside of school. They wrote short stories and tried writing programs on an IBM 1401 computer. They also built a microcomputer kit and started programming on it, writing simple games and a word processor.

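5. Results

With this example, users can:

Load the Paul Graham essay into an Elasticsearch vector store
Embed the text with an open-source embedding model (BAAI/bge-small-en-v1.5)
Run natural-language queries and get back the passages relevant to the question
Retrieve content by vector similarity rather than by keyword matching alone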
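The difference between vector similarity and keyword matching can be illustrated with a small sketch. The 3-dimensional vectors below are made up by hand for illustration; they are not outputs of BAAI/bge-small-en-v1.5, whose embeddings have far more dimensions. Cosine similarity is used here as one common similarity measure, and the helper names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def keyword_overlap(query, text):
    """Number of lowercase whitespace-separated tokens shared by query and text."""
    return len(set(query.lower().split()) & set(text.lower().split()))


# Hand-made toy vectors (assumption: NOT real model embeddings).
query_text, query_vec = "childhood hobbies", [0.9, 0.1, 0.2]
passages = {
    "He wrote short stories before college": [0.8, 0.2, 0.3],
    "The server listens on port 9200": [0.1, 0.9, 0.1],
}

scores = {text: cosine_similarity(query_vec, vec) for text, vec in passages.items()}
best = max(scores, key=scores.get)

for text, score in scores.items():
    print(f"{score:.3f}  overlap={keyword_overlap(query_text, text)}  {text}")
print("best match:", best)
```

Neither passage shares a word with the query, so a pure keyword match cannot rank them, while the vector scores still separate the related passage from the unrelated one.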

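6. Implementation Walkthrough

Environment preparation: install the required packages, including the Elasticsearch vector store and Hugging Face embedding integrations
Data acquisition: download the Paul Graham essay as sample data
Embedding model configuration: set Hugging Face's BAAI/bge-small-en-v1.5 as the text embedding model
Vector store initialization: create an ElasticsearchStore instance connected to the local Elasticsearch service
Index construction: build the vector index from the documents with StorageContext and VectorStoreIndex
Query execution: create a query engine and run a natural-language query
Result display: print the query result to show the effect of vector retrieval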

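7. Suggested Extensions

Advanced retrieval strategies: see the [Elasticsearch Vector Store](https://docs.llamaindex.ai/en/stable/examples/vector_stores/ElasticsearchIndexDemo/) example to explore other retrieval strategies, such as dense vector retrieval, sparse vector retrieval, keyword search, and hybrid search
Metadata filtering: attach metadata to documents and filter queries on it
Custom embedding models: try other open-source embedding models or train your own
Batch processing: streamline the handling of large document batches to speed up index construction
Query tuning: adjust query parameters such as top_k and the similarity threshold to get more precise results
Security configuration: add authentication and authorization to Elasticsearch to protect the data
Performance monitoring: monitor Elasticsearch performance metrics and optimize query response time
Multilingual support: configure analyzers and embedding models suited to different languages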
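The query-tuning suggestion above can be sketched as a small helper that retrieves the top-k nodes and then drops low-scoring ones. The function name `retrieve_with_cutoff` and the `min_score` parameter are hypothetical; the helper assumes `index` is the `VectorStoreIndex` built in section 4.3 and relies on `index.as_retriever(similarity_top_k=...)` and the `score` attribute of the returned nodes. What counts as a sensible threshold depends on how the store computes scores, so any cutoff value should be calibrated on real queries.

```python
def retrieve_with_cutoff(index, question, top_k=5, min_score=None):
    """Retrieve up to top_k nodes and optionally drop those scoring below min_score."""
    retriever = index.as_retriever(similarity_top_k=top_k)
    results = retriever.retrieve(question)
    if min_score is None:
        return results
    # Keep only results that carry a score and meet the threshold.
    return [r for r in results if r.score is not None and r.score >= min_score]


# Usage (assumes `index` exists as built in section 4.3):
# hits = retrieve_with_cutoff(index, "What did the author do growing up?", top_k=3, min_score=0.5)
# for hit in hits:
#     print(hit.score, hit.node.get_content()[:80])
```

Working at the retriever level returns the raw passages with their scores, which makes it easier to judge the effect of top_k and the threshold before an LLM rewrites them into an answer.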

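8. Summary

The basic Elasticsearch example shows how to integrate Elasticsearch with LlamaIndex as a vector store and implement basic vector retrieval. By combining an open-source embedding model with Elasticsearch's search capabilities, users can build a semantic search system. The example lays the groundwork for more complex vector retrieval applications, such as document question-answering systems and content recommendation engines. Elasticsearch's scalability and rich query features make it a strong candidate for building large-scale vector retrieval applications.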
