
[RAG] [vector_stores038] Firestore Vector Store Example

news 2026/8/3 15:13:04

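Goal

This example shows how to use Google Firestore as a vector database and integrate it with LlamaIndex for document storage and similarity search. Firestore is a serverless document database from Google Cloud that scales automatically with load.

In this example you will learn:

How to configure a Google Cloud project and a Firestore database
How to store and retrieve vector data with FirestoreVectorStore
How to run vector similarity search queries
How to apply metadata filters to narrow the search results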

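Tech Stack and Core Dependencies

Core dependencies

Package	Purpose
llama-index-vector-stores-firestore	LlamaIndex integration package for Firestore
llama-index-embeddings-huggingface	HuggingFace embedding model integration
llama-index	LlamaIndex core framework
google-cloud-firestore	Google Cloud Firestore client library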

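Tech stack

Google Firestore (GCP)

Serverless document database with automatic scaling and strong consistency

HuggingFace embeddings

The BAAI/bge-small-en-v1.5 model generates the vector representations of the text

Document processing

LlamaIndex's SimpleDirectoryReader loads the documents

Vector index

LlamaIndex's VectorStoreIndex builds the index and serves queries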

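Environment Setup

Google Cloud project setup

Complete the following steps before running the example:

Create a Google Cloud project: create a new project in the Google Cloud console
Enable the Firestore API: enable the Firestore API in the API library
Create a Firestore database: follow the Firestore documentation to create a database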

Install dependencies

%pip install --quiet llama-index
%pip install --quiet llama-index-vector-stores-firestore llama-index-embeddings-huggingface
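After installing, it can help to confirm which of the packages from the dependency table are actually present in the active environment. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical and not part of LlamaIndex.

```python
from importlib import metadata

# Package names taken from the dependency table in this article.
REQUIRED_PACKAGES = [
    "llama-index",
    "llama-index-vector-stores-firestore",
    "llama-index-embeddings-huggingface",
    "google-cloud-firestore",
]


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(REQUIRED_PACKAGES).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Any package reported as missing can be installed with the same %pip command used above.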

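Set the Google Cloud project ID

# Set your Google Cloud project ID
PROJECT_ID = "YOUR_PROJECT_ID"  # replace with your project ID

# Point the gcloud command-line tool at that project
!gcloud config set project {PROJECT_ID}

Tip: if you do not know your project ID, run one of the following commands:

gcloud config list: show the current configuration
gcloud projects list: list all projects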
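Leaving the placeholder "YOUR_PROJECT_ID" in place is an easy mistake to make. The sketch below shows one possible guard, assuming that an unset or placeholder value should fall back to the GOOGLE_CLOUD_PROJECT environment variable; the function `resolve_project_id` is a hypothetical helper, and whether that variable is set in your environment is an assumption.

```python
import os

PLACEHOLDER = "YOUR_PROJECT_ID"  # the placeholder value used in the notebook cell


def resolve_project_id(explicit=None, env=None):
    """Return the first usable project ID, or raise if none is configured."""
    env = os.environ if env is None else env
    # Assumption: GOOGLE_CLOUD_PROJECT holds the project ID when it is set.
    candidates = [explicit, env.get("GOOGLE_CLOUD_PROJECT")]
    for candidate in candidates:
        # Skip empty values and the untouched placeholder.
        if candidate and candidate != PLACEHOLDER:
            return candidate
    raise ValueError(
        "No project ID configured; set PROJECT_ID or GOOGLE_CLOUD_PROJECT."
    )
```

For example, resolve_project_id(PROJECT_ID) could be called before the gcloud command so that a forgotten placeholder fails early with a clear message.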

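Authentication

In a Colab environment, authenticate with the following code:

from google.colab import auth

# Authenticate the current Colab user
auth.authenticate_user()

Note: if you are running this notebook in Vertex AI Workbench, follow the setup instructions for authentication instead.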

Implementation

1. Import the required libraries

from llama_index.core import (
    SimpleDirectoryReader,
    VectorStoreIndex,
    StorageContext,
    ServiceContext,
    Settings,
)
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.vector_stores.firestore import FirestoreVectorStore

2. Load the documents

# Load the Paul Graham essay data
documents = SimpleDirectoryReader(
    "../../examples/data/paul_graham"
).load_data()
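The relative path above only resolves when the notebook runs from a matching directory layout, so it can be worth checking the directory before handing it to the reader. This is a small sketch with a hypothetical helper, `list_input_files`, that uses only pathlib and is not part of LlamaIndex.

```python
from pathlib import Path


def list_input_files(data_dir):
    """Return the sorted regular files directly under data_dir, failing early if unusable."""
    root = Path(data_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"Data directory not found: {root}")
    files = sorted(p for p in root.iterdir() if p.is_file())
    if not files:
        raise ValueError(f"Data directory contains no files: {root}")
    return files
```

Calling list_input_files("../../examples/data/paul_graham") before SimpleDirectoryReader gives a clear error when the essay data has not been downloaded to that location.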

3. Configure the embedding model

# Set up the HuggingFace embedding model
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

4. Create the Firestore vector store

# Name of the Firestore collection that will hold the vectors
COLLECTION_NAME = "test_collection"

# Create the Firestore vector store
store = FirestoreVectorStore(collection_name=COLLECTION_NAME)

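5. Build the vector index

# Create the storage context backed by the Firestore vector store
storage_context = StorageContext.from_defaults(vector_store=store)

# ServiceContext is deprecated in recent LlamaIndex releases, so the
# embedding model is registered on the global Settings object instead.
# Setting llm to None disables the LLM so only the embedding model is used.
Settings.embed_model = embed_model
Settings.llm = None

# Build the vector index from the documents
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)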

6. Run a query

# Create the query engine
query_engine = index.as_query_engine()

# Run the query
res = query_engine.query("What did the author do growing up?")

# Print the most relevant document chunk
print(str(res.source_nodes[0].text))
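To make the query step less of a black box, here is a conceptual sketch of what a vector similarity search does: score every stored vector against the query vector and keep the best matches. It assumes cosine similarity as the metric and uses toy 3-dimensional vectors; the real embeddings, the distance measure, and the index used by Firestore are not shown in this article, and the functions below are illustrative only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vector, stored, k=2):
    """Return the k (score, text) pairs most similar to query_vector, best first."""
    scored = [(cosine_similarity(query_vector, vec), text) for text, vec in stored]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy data: (text, vector) pairs standing in for embedded document chunks.
stored = [
    ("writing and programming", [0.9, 0.1, 0.0]),
    ("painting in Florence", [0.1, 0.9, 0.1]),
    ("starting a company", [0.0, 0.2, 0.9]),
]

results = top_k([1.0, 0.2, 0.0], stored, k=2)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

A brute-force scan like this is only practical for small collections; a vector database delegates the same ranking to an index on the server side.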

7. Apply metadata filtering

from llama_index.core.vector_stores.types import (
    MetadataFilters,
    MetadataFilter,
)

# Build the metadata filter.
# This assumes the indexed nodes carry an "author" metadata field.
filters = MetadataFilters(
    filters=[MetadataFilter(key="author", value="Paul Graham")]
)

# Create a query engine that applies the filter
query_engine = index.as_query_engine(filters=filters)

# Run the filtered query
res = query_engine.query("What did the author do growing up?")
print(str(res.source_nodes[0].text))
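Conceptually, an exact-match metadata filter restricts the candidate set to nodes whose metadata contains the requested key with the requested value. The sketch below illustrates that idea on plain dictionaries; the node layout and the helpers `matches` and `filter_nodes` are assumptions made for illustration, not the LlamaIndex or Firestore implementation.

```python
def matches(metadata, required):
    """True when every required key is present with exactly the required value."""
    return all(
        key in metadata and metadata[key] == value
        for key, value in required.items()
    )


def filter_nodes(nodes, required):
    """Keep only the nodes whose metadata satisfies every required key/value pair."""
    return [node for node in nodes if matches(node["metadata"], required)]


# Toy nodes: only the first one carries the metadata the filter asks for.
nodes = [
    {"text": "essay chunk", "metadata": {"author": "Paul Graham"}},
    {"text": "other chunk", "metadata": {"author": "Someone Else"}},
    {"text": "untagged chunk", "metadata": {}},
]

kept = filter_nodes(nodes, {"author": "Paul Graham"})
print([node["text"] for node in kept])
```

Note that a node with no "author" key is excluded, so documents loaded without that metadata would not be returned by the filtered query.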

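Firebase/Firestore features:

Automatic scaling: no servers to manage; capacity scales with demand
Real-time sync: data changes are pushed to subscribed clients in real time
Offline support: data can be accessed offline and synced later
Security rules: fine-grained control over data access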

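Results

This example walks through the complete Firestore vector store workflow:

Configuring a Google Cloud project and a Firestore database
Converting documents to vectors with a HuggingFace embedding model
Indexing the Paul Graham essay data into the Firestore vector store
Running vector similarity search queries to retrieve relevant document chunks
Applying metadata filters to narrow the search results

Example of the expected output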

What I Worked On February 2021 Before college the two main things I worked on, outside of school, were writing and programming. I didn't write essays. I wrote what beginning writers were supposed to write then, and probably still are: short stories. My stories were awful. They had hardly any plot, just characters with strong feelings, which I imagined made them deep. The first programs I tried writing were on the IBM 1401 that our school district used for what was then called "data processing." This was in 9th grade, so I was 13 or 14. The school district's 1401 happened to be in the basement of our junior high school, and my friend Rich Draves and I got permission to use it. It was like a mini Bond villain's lair down there, with all these alien-looking machines — CPU, disk drives, printer, card reader — sitting up on a raised floor under bright fluorescent lights.