[RAG] [vector_stores033] Elasticsearch Auto-Retrieval

news 2026/6/26 4:52:30

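Case Objective

This case shows how to implement auto-retrieval with the Elasticsearch vector store and LlamaIndex. Auto-retrieval is an advanced retrieval technique in which an LLM infers suitable metadata filters and a query string from a natural-language query.

In this case you will learn:

How to use Elasticsearch as the vector store backend
How to describe the metadata of a vector store
How to implement auto-retrieval with VectorIndexAutoRetriever
How to let the LLM infer metadata filters automatically
How to combine semantic search with metadata filtering for more precise retrieval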

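Auto-retrieval

Infers metadata filters automatically from a natural-language query

Vector store

Uses Elasticsearch as the vector store backend

Metadata filtering

Supports filter conditions on several metadata types

Semantic search

Combines semantic search with metadata filtering to improve retrieval precision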

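Tech Stack and Core Dependencies

This case uses the following tech stack and dependencies:

LlamaIndex, Elasticsearch, OpenAI, Python

Core packages:

llama-index-vector-stores-elasticsearch
llama-index
openai

Key components:

ElasticsearchStore: connects to the Elasticsearch vector store
VectorIndexAutoRetriever: implements auto-retrieval
VectorStoreInfo: describes the vector store content and its metadata
MetadataInfo: describes a single metadata field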

Environment Setup

1. Install the required packages

    pip install llama-index-vector-stores-elasticsearch
    pip install llama-index

2. Configure the OpenAI API key

    import os
    import getpass

    # Prompt for the key so that it is not hard-coded in the script
    os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")

    import openai

    openai.api_key = os.environ["OPENAI_API_KEY"]
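The prompt above asks for the key on every run. As a small variation, the following sketch prompts only when OPENAI_API_KEY is unset or empty. The helper name ensure_openai_key and its prompt parameter are illustrative and not part of the original case.

```python
import getpass
import os


def ensure_openai_key(prompt=getpass.getpass):
    """Return the OpenAI key, prompting only if the variable is unset or empty."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        # Hypothetical helper: falls back to an interactive prompt
        key = prompt("OpenAI API Key:")
        os.environ["OPENAI_API_KEY"] = key
    return key
```

Calling ensure_openai_key() once at the top of the script is then enough; passing a different prompt callable makes the helper usable in non-interactive runs.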

3. Import the required modules

    import logging
    import sys

    # Send INFO-level logs to stdout so the retriever's inferred
    # query string, filters, and top_k are visible
    logging.basicConfig(stream=sys.stdout, level=logging.INFO)
    logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

    from llama_index.core import VectorStoreIndex, StorageContext
    from llama_index.vector_stores.elasticsearch import ElasticsearchStore
    from llama_index.core.schema import TextNode
    from llama_index.core.retrievers import VectorIndexAutoRetriever
    from llama_index.core.vector_stores import MetadataInfo, VectorStoreInfo

4. Start the Elasticsearch service

Make sure an Elasticsearch service is running locally. The default address is http://localhost:9200
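Before building the index it can help to confirm that the service actually answers. The sketch below uses only the standard library and assumes an unsecured local instance at the address above (no TLS, no authentication); the function name elasticsearch_info is illustrative. A secured cluster would answer with an HTTP error, which this helper also reports as unreachable.

```python
import json
import urllib.error
import urllib.request


def elasticsearch_info(url="http://localhost:9200", timeout=3.0):
    """Return the JSON document served at the Elasticsearch root URL, or None."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            data = json.loads(response.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a non-JSON body
        return None
    return data if isinstance(data, dict) else None


info = elasticsearch_info()
if info is None:
    print("Elasticsearch is not reachable at http://localhost:9200")
else:
    print("Connected to Elasticsearch", info.get("version", {}).get("number"))
```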

Implementation

1. Define the sample data

    # Define text nodes that carry metadata
    nodes = [
        TextNode(
            text=(
                "A bunch of scientists bring back dinosaurs and mayhem breaks"
                " loose"
            ),
            metadata={"year": 1993, "rating": 7.7, "genre": "science fiction"},
        ),
        TextNode(
            text=(
                "Leo DiCaprio gets lost in a dream within a dream within a dream"
                " within a ..."
            ),
            metadata={
                "year": 2010,
                "director": "Christopher Nolan",
                "rating": 8.2,
            },
        ),
        TextNode(
            text=(
                "A psychologist / detective gets lost in a series of dreams within"
                " dreams within dreams and Inception reused the idea"
            ),
            metadata={"year": 2006, "director": "Satoshi Kon", "rating": 8.6},
        ),
        TextNode(
            text=(
                "A bunch of normal-sized women are supremely wholesome and some"
                " men pine after them"
            ),
            metadata={"year": 2019, "director": "Greta Gerwig", "rating": 8.3},
        ),
        TextNode(
            text="Toys come alive and have a blast doing so",
            metadata={"year": 1995, "genre": "animated"},
        ),
    ]

2. Build the Elasticsearch vector index

    # Create the Elasticsearch vector store
    vector_store = ElasticsearchStore(
        index_name="auto_retriever_movies", es_url="http://localhost:9200"
    )
    storage_context = StorageContext.from_defaults(vector_store=vector_store)

    # Create the vector index (embeds the nodes and writes them to Elasticsearch)
    index = VectorStoreIndex(nodes, storage_context=storage_context)

3. Define the VectorIndexAutoRetriever

    # Describe the vector store content and its metadata fields
    vector_store_info = VectorStoreInfo(
        content_info="Brief summary of a movie",
        metadata_info=[
            MetadataInfo(
                name="genre",
                description="The genre of the movie",
                type="string or list[string]",
            ),
            MetadataInfo(
                name="year",
                description="The year the movie was released",
                type="integer",
            ),
            MetadataInfo(
                name="director",
                description="The name of the movie director",
                type="string",
            ),
            MetadataInfo(
                name="rating",
                description="A 1-10 rating for the movie",
                type="float",
            ),
        ],
    )

    # Create the auto-retriever
    retriever = VectorIndexAutoRetriever(
        index, vector_store_info=vector_store_info
    )
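The declared fields are only useful if the nodes actually carry them, and in the sample data several fields are sparse. The following standard-library sketch counts how many sample nodes carry each declared field. It works on plain dicts copied from the data above rather than on TextNode objects, and the helper field_coverage is illustrative, not a LlamaIndex function.

```python
# Metadata of the five sample nodes, copied from the example data as plain dicts.
node_metadata = [
    {"year": 1993, "rating": 7.7, "genre": "science fiction"},
    {"year": 2010, "director": "Christopher Nolan", "rating": 8.2},
    {"year": 2006, "director": "Satoshi Kon", "rating": 8.6},
    {"year": 2019, "director": "Greta Gerwig", "rating": 8.3},
    {"year": 1995, "genre": "animated"},
]

# Field names declared through MetadataInfo in this step.
declared_fields = {"genre", "year", "director", "rating"}


def field_coverage(metadata_list, declared):
    """Count how many nodes carry each declared field and list undeclared keys."""
    counts = {name: 0 for name in sorted(declared)}
    undeclared = set()
    for metadata in metadata_list:
        for key in metadata:
            if key in counts:
                counts[key] += 1
            else:
                undeclared.add(key)
    return counts, sorted(undeclared)


counts, undeclared = field_coverage(node_metadata, declared_fields)
print(counts)      # {'director': 3, 'genre': 2, 'rating': 4, 'year': 5}
print(undeclared)  # []
```

Only two of the five nodes have a genre and only three have a director, so an exact-match filter on either field can only ever match that subset.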

4. Run auto-retrieval queries

    # Query 1: movies directed by Christopher Nolan before 2020
    results = retriever.retrieve(
        "What are 2 movies by Christopher Nolan were made before 2020?"
    )
    print(results)

    # Query 2: science fiction movies directed by Andrei Tarkovsky
    results = retriever.retrieve(
        "Has Andrei Tarkovsky directed any science fiction movies"
    )
    print(results)
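Printing the raw result list is hard to read. A minimal sketch of a formatter follows; it assumes each returned item exposes score and node, and that the node exposes metadata and get_content(), as LlamaIndex's NodeWithScore and TextNode do. The helper format_results is illustrative, and the stand-in object with its arbitrary placeholder score exists only so the snippet runs without a live index.

```python
from types import SimpleNamespace


def format_results(results):
    """Render retrieval results as one line per hit: score, metadata, text."""
    lines = []
    for hit in results:
        score = "n/a" if hit.score is None else f"{hit.score:.3f}"
        lines.append(f"{score} | {hit.node.metadata} | {hit.node.get_content()}")
    return "\n".join(lines) if lines else "(no results)"


# Stand-in for demonstration only; real hits come from retriever.retrieve(...).
# The score is an arbitrary placeholder, not a measured value.
fake_hit = SimpleNamespace(
    score=0.8123,
    node=SimpleNamespace(
        metadata={"year": 2010, "director": "Christopher Nolan", "rating": 8.2},
        get_content=lambda: "Leo DiCaprio gets lost in a dream within a dream",
    ),
)
print(format_results([fake_hit]))
print(format_results([]))
```

With the real retriever, print(format_results(results)) would replace print(results).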

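Results

This case demonstrates auto-retrieval on top of Elasticsearch, with the following effects:

Automatic metadata filtering: the system infers suitable metadata filters from a natural-language query
Semantic search: semantic search is combined with metadata filtering to improve retrieval precision
Flexible query handling: queries that combine several conditions are supported, for example a director constraint together with a bound on the release year
Query understanding: the LLM interprets the intent of the query and chooses the query string, filters, and top_k used for retrieval, all of which appear in the log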

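Key observations:

For the query "What are 2 movies by Christopher Nolan were made before 2020?", the system infers the filters director="Christopher Nolan" and year<2020
For the query "Has Andrei Tarkovsky directed any science fiction movies", the sample output shows that the system infers the filter director="Andrei Tarkovsky" and uses "science fiction" as the semantic query string rather than as a genre filter
Metadata filtering and semantic search are applied together, so results must satisfy the filters and are ranked by semantic relevance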
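To make the effect of such filters concrete, the sketch below applies them to the sample metadata in memory. It is a conceptual illustration only: it does not reproduce how LlamaIndex or Elasticsearch evaluate filters, the (field, operator, value) triple format and the helper apply_filters are assumptions made for this example, and the texts are shortened.

```python
import operator

# Sample metadata copied from the example nodes (texts shortened for brevity).
movies = [
    {"text": "scientists bring back dinosaurs", "year": 1993, "rating": 7.7,
     "genre": "science fiction"},
    {"text": "a dream within a dream", "year": 2010,
     "director": "Christopher Nolan", "rating": 8.2},
    {"text": "dreams within dreams", "year": 2006,
     "director": "Satoshi Kon", "rating": 8.6},
    {"text": "normal-sized women", "year": 2019,
     "director": "Greta Gerwig", "rating": 8.3},
    {"text": "toys come alive", "year": 1995, "genre": "animated"},
]

OPERATORS = {"==": operator.eq, "<": operator.lt, ">": operator.gt}


def apply_filters(items, filters):
    """Keep items satisfying every (field, op, value) triple; a missing field never matches."""
    kept = []
    for item in items:
        if all(
            field in item and OPERATORS[op](item[field], value)
            for field, op, value in filters
        ):
            kept.append(item)
    return kept


nolan_before_2020 = apply_filters(
    movies, [("director", "==", "Christopher Nolan"), ("year", "<", 2020)]
)
tarkovsky = apply_filters(movies, [("director", "==", "Andrei Tarkovsky")])

print([m["text"] for m in nolan_before_2020])  # ['a dream within a dream']
print(tarkovsky)  # []
```

In this in-memory version the Tarkovsky filter leaves nothing to rank, because no sample node lists him as director; the semantic query string only orders whatever survives the filters.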

Sample output (second query). Each message appears twice because the logging setup registers two stdout handlers, one with the default "INFO:logger:message" format and one that prints the bare message:

    INFO:llama_index.indices.vector_store.retrievers.auto_retriever.auto_retriever:Using query str: science fiction
    Using query str: science fiction
    INFO:llama_index.indices.vector_store.retrievers.auto_retriever.auto_retriever:Using filters: {'director': 'Andrei Tarkovsky'}
    Using filters: {'director': 'Andrei Tarkovsky'}
    INFO:llama_index.indices.vector_store.retrievers.auto_retriever.auto_retriever:Using top_k: 2
    Using top_k: 2