
AI Development - Python - Milvus Vector Database (2-7: Creating a Collection with an Explicit Schema)

news 2026/3/27 7:29:01

Milvus Vector Database: Creating a Collection and Verifying Its State in Practice

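Milvus is a widely used open-source vector database, and the Collection is its core unit for storing and managing vector data. Using the pymilvus SDK, this article walks through the Collection creation flow, the meaning of the key parameters, and how to verify a Collection's state.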

The code:

# Filter the pkg_resources deprecation warning raised by a pymilvus dependency (emitted under newer setuptools releases)
# This only silences an unrelated compatibility warning so the console output stays clean
import warnings
warnings.filterwarnings("ignore", message=".*pkg_resources is deprecated as an API.*")
from pymilvus import DataType, FieldSchema, CollectionSchema, Collection, connections, utility

# ====================== 1. Define the field structure (FieldSchema) ======================
# Define the primary key field: id
# - name: field name
# - dtype: data type (INT64 integer)
# - is_primary: whether this field is the primary key (True means it is)
# - description: field description
id_field = FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, description="primary id")

# Define a scalar field: age
# - dtype: INT64 integer, stores the user's age
age_field = FieldSchema(name="age", dtype=DataType.INT64, description="age")

# Define the vector field: embedding
# - dtype: FLOAT_VECTOR (vector of floats)
# - dim: vector dimension (128; must match the dimension of the vectors actually inserted)
embedding_field = FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128, description="vector")

# Define a partition key field: position
# - dtype: VARCHAR string type
# - max_length: maximum string length
# - is_partition_key: whether this field is the partition key (True means it is)
# Note: this field is NOT added to the schema when the Collection is created below; to use it, add it to the fields list
position_field = FieldSchema(name="position", dtype=DataType.VARCHAR, max_length=256, is_partition_key=True)

# ====================== 2. Define the overall Collection structure (CollectionSchema) ======================
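# Create the Collection schema (similar to a table definition in a relational database)
# - fields: list of all fields in the Collection (only id/age/embedding are used here)
# - auto_id: whether primary keys are generated automatically (False means the caller supplies the id for the is_primary=True field)
# - enable_dynamic_field: whether dynamic fields are enabled (True allows inserting fields not defined in the schema)
# - description: overall description of the Collection
schema = CollectionSchema(
    fields=[id_field, age_field, embedding_field],
    auto_id=False,
    enable_dynamic_field=True,
    description="desc of a collection",
)

# ====================== 3. Connect to the Milvus server and create the Collection ======================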
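# Connect to the Milvus server
# - host: IP address of the Milvus server
# - port: port of the Milvus server (default 19530)
connections.connect(host="192.168.211.128", port=19530)
# connections.connect() manages connections globally. Its core job is:
#
#     Connect to the given Milvus server and register that connection under an alias (default 'default') in the SDK's global connection registry.
#     Every later operation (creating a Collection, inserting data, querying, ...) automatically uses this default connection unless you pass another alias explicitly.
#
# In other words, the return value of connections.connect(...) is not needed: the SDK remembers the connection by alias, and later calls pick it up on their own, as the code below shows.

# Name of the Collection to create
collection_name = "table_3"

# Create the Collection (the core operation)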
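# - name: Collection name
# - schema: the Collection structure defined above
# - using: connection alias to use (default 'default')
# - shards_num: number of shards (1 keeps all data in a single shard; more shards can raise write concurrency)
collection1 = Collection(
    name=collection_name,
    schema=schema,
    using='default',
    shards_num=1,
)

# ========== Ways to inspect status and metadata ==========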
# Method 1: view the Collection's detailed metadata (the most commonly used)
print("=== Collection 详细信息 ===")
collection_info = collection1.describe()
print(collection_info)

# Method 2: check whether the Collection exists (returns True/False)
print("\n=== 验证 Collection 是否存在 ===")
is_exist = utility.has_collection(collection_name, using='default')
print(f"Collection '{collection_name}' 是否存在：{is_exist}")

# Method 3: check the Collection's load state (whether it is loaded into memory for querying)
print("\n=== Collection 加载状态 ===")
load_state = utility.load_state(collection_name, using='default')
print(f"Collection 加载状态：{load_state}")

# Method 4: list all Collections that have been created
print("\n=== 所有 Collection 列表 ===")
all_collections = utility.list_collections(using='default')
print(f"当前 Milvus 中的 Collection 列表：{all_collections}")
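
The schema above fixes `embedding` at 128 dimensions and sets `auto_id=False`, so every row needs an explicit integer `id`, an integer `age`, and a 128-element vector, while `enable_dynamic_field=True` lets extra keys through. Below is a minimal pure-Python sketch for checking rows before handing them to the SDK. The helper name `validate_row` and the dict-per-row layout are assumptions made for illustration; they are not part of pymilvus.

```python
from numbers import Integral, Real

EMBEDDING_DIM = 128  # mirrors dim=128 on the embedding field above


def _is_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(value, Integral) and not isinstance(value, bool)


def validate_row(row, dim=EMBEDDING_DIM):
    """Return a list of problems found in one row dict (empty list means OK)."""
    problems = []
    # auto_id=False: the caller must supply the primary key
    if not _is_int(row.get("id")):
        problems.append("id must be an integer")
    if not _is_int(row.get("age")):
        problems.append("age must be an integer")
    vec = row.get("embedding")
    if not isinstance(vec, (list, tuple)) or len(vec) != dim:
        problems.append(f"embedding must have exactly {dim} elements")
    elif not all(isinstance(x, Real) and not isinstance(x, bool) for x in vec):
        problems.append("embedding elements must be numbers")
    # Extra keys are not flagged because enable_dynamic_field=True
    return problems


good = {"id": 1, "age": 30, "embedding": [0.0] * 128, "city": "x"}
bad = {"age": 30, "embedding": [0.0] * 64}
print(validate_row(good))
print(validate_row(bad))
```

Running such a check on the client side is optional, since the server also rejects rows that do not match the schema; it simply moves the error closer to the code that built the row.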

Output:

=== Collection 详细信息 ===
{'collection_name': 'table_3', 'auto_id': False, 'num_shards': 1, 'description': 'desc of a collection', 'fields': [{'field_id': 100, 'name': 'id', 'description': 'primary id', 'type': <DataType.INT64: 5>, 'params': {}, 'is_primary': True}, {'field_id': 101, 'name': 'age', 'description': 'age', 'type': <DataType.INT64: 5>, 'params': {}}, {'field_id': 102, 'name': 'embedding', 'description': 'vector', 'type': <DataType.FLOAT_VECTOR: 101>, 'params': {'dim': 128}}], 'functions': [], 'aliases': [], 'collection_id': 464357759084926662, 'consistency_level': 2, 'properties': {'timezone': 'UTC'}, 'num_partitions': 1, 'enable_dynamic_field': True}

=== 验证 Collection 是否存在 ===
Collection 'table_3' 是否存在：True

=== Collection 加载状态 ===
Collection 加载状态：NotLoad

=== 所有 Collection 列表 ===
当前 Milvus 中的 Collection 列表：['table_1', 'table_2', 'table_3', 'two_table', 'one_talbe', 'three_table', 'quick_setup', 'custom_quick_setup']
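
The load state above is `NotLoad` because a freshly created Collection has no index yet and has not been loaded into memory. The sketch below shows the usual next step, assuming `connections.connect(...)` has already been called as in the code above. The helper name `index_and_load` and the index settings (`IVF_FLAT`, `L2`, `nlist=128`) are illustrative choices, not something the original code specifies.

```python
def index_and_load(collection_name, using="default"):
    """Build a vector index, load the Collection, and return its load state."""
    # Imported lazily so the function can be defined without a running server
    from pymilvus import Collection, utility

    # Open the existing Collection by name (no schema needed once it exists)
    collection = Collection(name=collection_name, using=using)

    # A vector index is needed before load(); these parameters are assumptions
    collection.create_index(
        field_name="embedding",
        index_params={
            "index_type": "IVF_FLAT",
            "metric_type": "L2",
            "params": {"nlist": 128},
        },
    )

    # Load the Collection into memory so it can serve searches and queries
    collection.load()
    return utility.load_state(collection_name, using=using)
```

Calling `index_and_load("table_3")` against the server used above would be expected to report a loaded state rather than `NotLoad`, though the exact result depends on the server version and configuration.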
