
[RAG] [vector_stores047] Lantern Vector Store Index Example

news 2026/6/22 3:26:16

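Case Objectives

This case shows how to combine a PostgreSQL database and the Lantern extension with the LlamaIndex framework to implement vector search and hybrid search. The main goals are:

Create a vector index backed by Lantern
Tune search performance with HNSW index parameters
Implement hybrid search (vector search plus full-text search)
Create an index from an existing vector store
Configure the text search language parameter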

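Tech Stack and Core Dependencies

Core Technologies

LlamaIndex: a framework for building document indexes and running queries over them
PostgreSQL: a relational database that serves as the foundation of the vector store
Lantern: a vector extension for PostgreSQL that provides vector search
OpenAI: used to generate text embedding vectors

Core Dependencies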

pip install llama-index-vector-stores-lantern
pip install llama-index-embeddings-openai
pip install psycopg2-binary
pip install asyncpg

Environment Setup

Before starting, complete the following setup steps:

1. Install the required packages

%pip install llama-index-vector-stores-lantern
%pip install llama-index-embeddings-openai
!pip install psycopg2-binary llama-index asyncpg

2. Configure the OpenAI API key

import os
import openai  # needed before openai.api_key can be set

os.environ["OPENAI_API_KEY"] = "<your_key>"
openai.api_key = "<your_key>"
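Hardcoding the key in a notebook makes it easy to leak. A minimal sketch of an alternative, assuming the key has already been exported in the shell that launched the notebook; `require_api_key` is a hypothetical helper, not part of LlamaIndex or the OpenAI SDK:

```python
import os


def require_api_key(name="OPENAI_API_KEY", environ=None):
    """Return the API key stored under `name`, or raise if it is missing.

    `environ` defaults to os.environ; passing a dict makes the helper
    easy to exercise without touching the real environment.
    """
    env = os.environ if environ is None else environ
    key = env.get(name, "").strip()
    # Treat the tutorial placeholder as "not configured".
    if not key or key == "<your_key>":
        raise RuntimeError(f"{name} is not set; export it before running the notebook")
    return key
```

Calling `require_api_key()` at the top of the notebook fails early with a clear message instead of surfacing later as an authentication error during embedding.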

3. Configure the embedding model

from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core import Settings

# Set the global embedding model
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")
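The embedding model chosen here determines the `embed_dim` value that the vector store is given later (1536 in this case). A small sketch of a sanity check, assuming the models are used at their default output size; `KNOWN_DIMS` and `check_embed_dim` are hypothetical names introduced for illustration:

```python
# Default output sizes of some OpenAI embedding models.
KNOWN_DIMS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "text-embedding-ada-002": 1536,
}


def check_embed_dim(model, embed_dim):
    """Return embed_dim if it matches the model's default size, else raise."""
    if model not in KNOWN_DIMS:
        raise KeyError(f"unknown model {model!r}; add its dimension to KNOWN_DIMS")
    expected = KNOWN_DIMS[model]
    if embed_dim != expected:
        raise ValueError(
            f"{model} produces {expected}-dimensional vectors, got embed_dim={embed_dim}"
        )
    return embed_dim


# The combination used in this case passes the check.
check_embed_dim("text-embedding-3-small", 1536)
```

A mismatch between the two values would typically show up only at insert time as a database error, so checking up front can save a round trip.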

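4. Create the PostgreSQL database

import psycopg2

connection_string = "postgresql://postgres:postgres@localhost:5432"
# Illustrative name for a dedicated demo database. The connection above opens
# the default "postgres" database, and PostgreSQL refuses to drop the database
# a session is currently connected to, so a separate name is used here.
db_name = "vector_db"
conn = psycopg2.connect(connection_string)
conn.autocommit = True  # CREATE/DROP DATABASE cannot run inside a transaction

with conn.cursor() as c:
    c.execute(f"DROP DATABASE IF EXISTS {db_name}")
    c.execute(f"CREATE DATABASE {db_name}")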
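The DROP and CREATE statements interpolate `db_name` directly into SQL, which is fine for a fixed constant but risky if the name ever comes from user input. A minimal sketch of a guard, assuming a deliberately conservative rule (ASCII letters, digits, and underscores, at most 63 characters); `safe_identifier` is a hypothetical helper, not a psycopg2 function:

```python
import re

# Conservative subset of what PostgreSQL accepts as an unquoted identifier.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def safe_identifier(name):
    """Return `name` unchanged if it is a plain identifier, else raise."""
    if len(name) > 63 or not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"refusing to use {name!r} as a database name")
    return name


print(safe_identifier("vector_db"))
```

With this in place, the statement could be built as `f"CREATE DATABASE {safe_identifier(db_name)}"`.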

Implementation

1. Import the required libraries

from llama_index.core import SimpleDirectoryReader, StorageContext
from llama_index.core import VectorStoreIndex
from llama_index.vector_stores.lantern import LanternVectorStore
import textwrap
import openai
from sqlalchemy import make_url

2. Load the document data

# Create the directory and download the data
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'

# Load the documents
documents = SimpleDirectoryReader("./data/paul_graham").load_data()
print("Document ID:", documents[0].doc_id)

3. Create the Lantern vector store and index

url = make_url(connection_string)
vector_store = LanternVectorStore.from_params(
    database=db_name,
    host=url.host,
    password=url.password,
    port=url.port,
    user=url.username,
    table_name="paul_graham_essay",
    embed_dim=1536,  # OpenAI embedding dimension
)

storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context, show_progress=True
)
query_engine = index.as_query_engine()
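To make it clearer which pieces of the connection string end up in `from_params`, here is a rough sketch that performs the same decomposition with the standard library's `urlsplit` as a stand-in for SQLAlchemy's `make_url`. The `connection_params` helper and the `vector_db` database name are illustrative assumptions:

```python
from urllib.parse import urlsplit


def connection_params(connection_string, database):
    """Split a PostgreSQL URL into the keyword arguments used for the store."""
    parts = urlsplit(connection_string)
    return {
        "database": database,
        "host": parts.hostname,
        "password": parts.password,
        "port": parts.port,      # parsed as an int, or None if absent
        "user": parts.username,
    }


params = connection_params(
    "postgresql://postgres:postgres@localhost:5432", "vector_db"
)
print(params)
```

The database name is passed separately because the connection string in this case does not include one.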

4. Run queries

# Ask what the author did
response = query_engine.query("What did the author do?")
print(textwrap.fill(str(response), 100))

# Ask what happened in the mid 1980s
response = query_engine.query("What happened in the mid 1980s?")
print(textwrap.fill(str(response), 100))

5. Create an index from an existing vector store

vector_store = LanternVectorStore.from_params(
    database=db_name,
    host=url.host,
    password=url.password,
    port=url.port,
    user=url.username,
    table_name="paul_graham_essay",
    embed_dim=1536,  # OpenAI embedding dimension
    m=16,  # HNSW M parameter
    ef_construction=128,  # HNSW ef construction parameter
    ef=64,  # HNSW ef search parameter
)

index = VectorStoreIndex.from_vector_store(vector_store=vector_store)
query_engine = index.as_query_engine()

response = query_engine.query("What did the author do?")
print(textwrap.fill(str(response), 100))
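In HNSW generally, `m` bounds the number of neighbors each node keeps in the graph, `ef_construction` is the size of the candidate list while the graph is built, and `ef` is the size of the candidate list at query time; larger values tend to improve recall at the cost of memory and speed. The result an HNSW index approximates is an exact nearest-neighbor scan, which can be sketched in plain Python. Cosine similarity is chosen here only for illustration; the distance Lantern actually uses depends on how the index is configured:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def exact_top_k(query, vectors, k):
    """Score every stored vector against the query and keep the best k keys."""
    scored = [(cosine_similarity(query, vec), key) for key, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [key for _, key in scored[:k]]


# Toy two-dimensional "embeddings" keyed by node id.
vectors = {"a": [1.0, 0.0], "b": [0.7, 0.7], "c": [0.0, 1.0]}
top = exact_top_k([1.0, 0.1], vectors, k=2)
print(top)
```

The scan touches every vector, so its cost grows linearly with the table size. HNSW trades a small amount of recall for visiting only a fraction of the nodes.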

6. Implement hybrid search

# Create a vector store that supports hybrid search
hybrid_vector_store = LanternVectorStore.from_params(
    database=db_name,
    host=url.host,
    password=url.password,
    port=url.port,
    user=url.username,
    table_name="paul_graham_essay_hybrid_search",
    embed_dim=1536,  # OpenAI embedding dimension
    hybrid_search=True,
    text_search_config="english",
)

storage_context = StorageContext.from_defaults(
    vector_store=hybrid_vector_store
)
hybrid_index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)

# Create the hybrid query engine
hybrid_query_engine = hybrid_index.as_query_engine(
    vector_store_query_mode="hybrid", sparse_top_k=2
)

hybrid_response = hybrid_query_engine.query(
    "Who does Paul Graham think of with the word schtick"
)
print(hybrid_response)
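Hybrid search produces two ranked lists, one from the vector index and one from full-text search, which then have to be merged. The sketch below uses reciprocal rank fusion, a common general-purpose technique. It is shown only to illustrate the idea and is not necessarily how `LanternVectorStore` combines its results; the node ids are made up:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists of ids; each hit contributes 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


dense_hits = ["n1", "n2", "n3"]   # hypothetical vector-search ranking
sparse_hits = ["n3", "n4"]        # hypothetical full-text ranking
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)
```

A node that appears in both lists, such as `n3` here, accumulates score from each and rises to the top, which is the behavior that makes hybrid search useful for rare keywords like "schtick".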