
A Step-by-Step Tutorial: CRUD Operations on the Milvus Vector Database with Python 3.6 and pymilvus 1.1.0

news 2026/6/18 13:17:25

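Python 3.6 and the Milvus Vector Database in Practice: Building an AI Application from Scratch

In artificial intelligence and machine learning, vector databases are becoming a core tool for working with high-dimensional data. Milvus, an open-source vector database, is popular with developers for its efficient similarity search. This article starts from scratch and walks through the core Milvus operations step by step, using Python 3.6 and pymilvus 1.1.0.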

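1. Environment Setup and Installation

Before starting, make sure the development environment is configured correctly. Python 3.6 is a stable release that is compatible with pymilvus 1.1.0. The steps are as follows.

First, create a clean Python virtual environment:

python3.6 -m venv milvus_env
source milvus_env/bin/activate   # Linux/macOS
milvus_env\Scripts\activate      # Windows

Next, install pymilvus 1.1.0, using the Tsinghua mirror to speed up the download:

pip install pymilvus==1.1.0 -i https://pypi.tuna.tsinghua.edu.cn/simple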

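Verify the installation. Note that the pymilvus 1.x distribution is imported under the module name `milvus`:

import milvus
print(milvus.__version__)  # should print 1.1.0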

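Note: make sure your Milvus server is up and running. A test instance can be started quickly with Docker (the tag below is the 1.1.0 CPU image; check it against the tags published for your release):
docker run -d --name milvus_cpu -p 19530:19530 milvusdb/milvus:1.1.0-cpu-d050721-5e559c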

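2. Connecting to Milvus and Managing Collections

Once the client is installed, the first step is to connect to the Milvus server. The following shows how to connect and create a collection:

from milvus import Milvus, MetricType, Status  # pymilvus 1.x installs the `milvus` module

# Connect to the server
milvus = Milvus(host='localhost', port='19530')

# Check the connection; server_status() returns a (Status, reply) pair
status, reply = milvus.server_status()
print(f"Server status: {reply}")

# Collection parameters
collection_params = {
    'collection_name': 'image_features',
    'dimension': 512,              # vector dimension
    'index_file_size': 1024,       # segment file size (MB)
    'metric_type': MetricType.L2   # L2 (Euclidean) distance
}

# Create the collection
status = milvus.create_collection(collection_params)
if status.code == Status.SUCCESS:
    print("Collection created")
else:
    print(f"Failed to create collection: {status.message}")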
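To get a feel for what `dimension` and `index_file_size` imply together, the arithmetic can be sketched as below. This is a rough estimate only: it assumes each value is stored as a 4-byte float and ignores IDs and any per-file overhead, and the helper names are hypothetical, not part of pymilvus.

```python
def raw_vector_bytes(dimension, bytes_per_value=4):
    """Bytes needed for one vector, assuming 4-byte float values (an assumption)."""
    return dimension * bytes_per_value


def vectors_per_segment(dimension, index_file_size_mb):
    """Rough upper bound on vectors per segment file, ignoring IDs and metadata."""
    segment_bytes = index_file_size_mb * 1024 * 1024
    return segment_bytes // raw_vector_bytes(dimension)


print(raw_vector_bytes(512))           # 2048
print(vectors_per_segment(512, 1024))  # 524288
```

Under these assumptions, a 1024 MB segment holds roughly half a million 512-dimensional vectors, which gives a sense of scale for the small sample sizes used in this tutorial.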

With the collection created, a few basic operations are available.

Check whether a collection exists:

status, exists = milvus.has_collection('image_features')
print(f"Collection exists: {exists}")

Get collection statistics:

status, info = milvus.get_collection_stats('image_features')
print(f"Collection stats: {info}")

Drop a collection:

status = milvus.drop_collection('image_features')

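3. Working with Vector Data End to End

The core job of a vector database is creating, reading, updating, and deleting vector data. Each step is covered below.

3.1 Inserting Vectors

Suppose we have a set of 512-dimensional feature vectors extracted from images that need to be inserted into Milvus:

import numpy as np

# Generate 100 random 512-dimensional vectors as sample data
vectors = np.random.rand(100, 512).tolist()

# Insert the vectors; Milvus assigns IDs when none are supplied
status, ids = milvus.insert(
    collection_name='image_features',
    records=vectors
)
if status.code == Status.SUCCESS:
    print(f"Inserted {len(ids)} vectors")
    print(f"First 5 IDs: {ids[:5]}")
else:
    print(f"Insert failed: {status.message}")

# Flush so the data is persisted
milvus.flush(['image_features'])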
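Insert errors are easier to diagnose when the data is checked on the client first. The following is a minimal sketch of such a check, assuming vectors are plain Python sequences of numbers; `validate_vectors` is a hypothetical helper, not a pymilvus function.

```python
import math


def validate_vectors(vectors, dimension):
    """Return the vectors as lists of floats, or raise ValueError on bad input."""
    cleaned = []
    for i, vec in enumerate(vectors):
        if len(vec) != dimension:
            raise ValueError(f"vector {i} has {len(vec)} values, expected {dimension}")
        row = [float(x) for x in vec]
        if not all(math.isfinite(x) for x in row):
            raise ValueError(f"vector {i} contains NaN or infinity")
        cleaned.append(row)
    return cleaned


print(validate_vectors([[1, 2], [3, 4]], dimension=2))  # [[1.0, 2.0], [3.0, 4.0]]
```

The cleaned list could then be passed as `records` to the insert call, so a dimension mismatch surfaces with the offending row number rather than as a server-side error.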

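3.2 Querying and Searching Vectors

Similarity search is the core capability of Milvus. A basic search looks like this:

# Build a query: a list containing one 512-dimensional vector
query_vector = np.random.rand(1, 512).tolist()

# nprobe balances search accuracy against speed
search_params = {"nprobe": 16}

status, results = milvus.search(
    collection_name='image_features',
    query_records=query_vector,
    top_k=5,   # return the 5 most similar results
    params=search_params
)
if status.code == Status.SUCCESS:
    print("Search results:")
    pairs = zip(results.id_array[0], results.distance_array[0])
    for rank, (vector_id, distance) in enumerate(pairs, start=1):
        print(f"{rank}. ID: {vector_id}, distance: {distance:.4f}")
else:
    print(f"Search failed: {status.message}")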
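Conceptually, a top-k search under the L2 metric ranks stored vectors by their distance to the query. The sketch below does this by brute force in plain Python. It is an illustration of the idea, not how Milvus implements search, and it uses squared Euclidean distance as the score, which produces the same ranking as plain Euclidean distance.

```python
import heapq


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def brute_force_top_k(vectors, query, top_k):
    """Return [(index, distance), ...] for the top_k nearest vectors, closest first."""
    scored = ((squared_l2(vec, query), idx) for idx, vec in enumerate(vectors))
    return [(idx, dist) for dist, idx in heapq.nsmallest(top_k, scored)]


data = [[0.0, 0.0], [1.0, 1.0], [3.0, 4.0], [0.5, 0.0]]
print(brute_force_top_k(data, [0.0, 0.0], top_k=2))  # [(0, 0.0), (3, 0.25)]
```

A small reference implementation like this can also serve as a sanity check: on a tiny dataset, the IDs returned by an exact search should match the brute-force ranking.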

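3.3 Deleting and Counting Vectors

Managing vector data also involves deleting vectors and counting them:

# Delete vectors by ID (placeholder IDs; in practice use the IDs returned by insert())
ids_to_delete = [1, 2, 3]
status = milvus.delete_entity_by_id('image_features', ids_to_delete)
milvus.flush(['image_features'])  # make sure the deletion is persisted

# Count the vectors in the collection
status, count = milvus.count_entities('image_features')
print(f"Vectors in collection: {count}")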

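4. Building Indexes and Tuning Performance

To make searches faster, create a suitable index on the collection:

from milvus import IndexType

# In pymilvus 1.x the index type and its parameters are passed separately;
# the distance metric is the one chosen when the collection was created.
index_params = {'nlist': 128}  # number of cluster centroids

# Create an IVF_FLAT (inverted file) index
status = milvus.create_index('image_features', IndexType.IVF_FLAT, index_params)
if status.code == Status.SUCCESS:
    print("Index created")
else:
    print(f"Failed to create index: {status.message}")

# Inspect the index
status, index_info = milvus.get_index_info('image_features')
print(f"Index info: {index_info}")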
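The roles of `nlist` and `nprobe` are easier to see in a toy inverted-file index. The sketch below is a deliberate simplification written for illustration: it picks random vectors as centroids instead of running k-means, and it is not the Milvus implementation. All function names are hypothetical.

```python
import random


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_ivf(vectors, nlist, seed=0):
    """Pick nlist vectors as centroids and bucket every vector under its nearest one."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, nlist)
    buckets = [[] for _ in range(nlist)]
    for idx, vec in enumerate(vectors):
        nearest = min(range(nlist), key=lambda c: squared_l2(vec, centroids[c]))
        buckets[nearest].append(idx)
    return centroids, buckets


def ivf_search(vectors, centroids, buckets, query, top_k, nprobe):
    """Scan only the nprobe buckets whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda c: squared_l2(query, centroids[c]))
    candidates = [idx for c in order[:nprobe] for idx in buckets[c]]
    candidates.sort(key=lambda idx: (squared_l2(vectors[idx], query), idx))
    return candidates[:top_k]


rng = random.Random(42)
data = [[rng.random() for _ in range(8)] for _ in range(200)]
centroids, buckets = build_ivf(data, nlist=16)
query = [rng.random() for _ in range(8)]

fast = ivf_search(data, centroids, buckets, query, top_k=5, nprobe=4)
exact = ivf_search(data, centroids, buckets, query, top_k=5, nprobe=16)
print(len(set(fast) & set(exact)), "of 5 exact neighbors found with nprobe=4")
```

With `nprobe` equal to `nlist` every bucket is scanned and the result is exact. Smaller `nprobe` values scan fewer candidates and may miss some true neighbors, which is the accuracy and speed trade-off the search parameter controls.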

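Once an index is built, searches over large collections are usually much faster. To compare the two cases, time the same query without and with the index. On a small collection such as the 100 sample vectors above, the difference is mostly noise:

import time
from milvus import IndexType

search_params = {"nprobe": 16}

def timed_search():
    """Run one top-5 search and return the elapsed time in seconds."""
    start = time.time()
    milvus.search(collection_name='image_features', query_records=query_vector,
                  top_k=5, params=search_params)
    return time.time() - start

# Without an index: drop it so the search scans every vector
milvus.drop_index('image_features')
no_index_time = timed_search()

# With an index: rebuild it and measure again
milvus.create_index('image_features', IndexType.IVF_FLAT, {'nlist': 128})
with_index_time = timed_search()

print(f"Search time without index: {no_index_time:.4f}s")
print(f"Search time with index: {with_index_time:.4f}s")
print(f"Speedup: {no_index_time / with_index_time:.1f}x")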

5. Hands-On: Building a Complete Image Search System

Now we combine what we have covered into a simple image feature search system:

from milvus import Milvus, IndexType, MetricType, Status


class MilvusImageSearch:
    def __init__(self, host='localhost', port='19530'):
        self.milvus = Milvus(host=host, port=port)
        self.collection_name = 'image_features'

    def setup_collection(self, dimension=512):
        """Create the collection, dropping any existing one with the same name."""
        status, exists = self.milvus.has_collection(self.collection_name)
        if exists:
            self.milvus.drop_collection(self.collection_name)
        params = {
            'collection_name': self.collection_name,
            'dimension': dimension,
            'index_file_size': 1024,
            'metric_type': MetricType.L2
        }
        return self.milvus.create_collection(params)

    def insert_features(self, features):
        """Insert image feature vectors and return their IDs."""
        # Accept a single vector as well as a list of vectors
        if features and not isinstance(features[0], (list, tuple)):
            features = [features]
        status, ids = self.milvus.insert(
            collection_name=self.collection_name,
            records=features
        )
        if status.code != Status.SUCCESS:
            raise RuntimeError(f"Insert failed: {status.message}")
        self.milvus.flush([self.collection_name])
        return ids

    def search_similar(self, query_feature, top_k=5):
        """Search for similar images; returns a list of (id, distance) pairs."""
        search_params = {"nprobe": 16}
        status, results = self.milvus.search(
            collection_name=self.collection_name,
            query_records=[query_feature],
            top_k=top_k,
            params=search_params
        )
        if status.code != Status.SUCCESS:
            raise RuntimeError(f"Search failed: {status.message}")
        return list(zip(results.id_array[0], results.distance_array[0]))

    def create_index(self):
        """Create an IVF_FLAT index to speed up searches."""
        return self.milvus.create_index(
            self.collection_name, IndexType.IVF_FLAT, {'nlist': 128}
        )

    def close(self):
        """Close the connection."""
        self.milvus.close()

With this class, managing image features is straightforward:

import numpy as np

# Initialize the system
search_system = MilvusImageSearch()

# Set up the collection
search_system.setup_collection(dimension=512)

# Simulate inserting some image features
features = np.random.rand(10, 512).tolist()  # 512-dimensional features for 10 images
ids = search_system.insert_features(features)
print(f"Inserted feature IDs: {ids}")

# Build the index
search_system.create_index()

# Search for similar images
query_feature = np.random.rand(512).tolist()  # random query feature
similar_images = search_system.search_similar(query_feature, top_k=3)
print("Top 3 most similar images:")
for img_id, distance in similar_images:
    # Heuristic: map the distance to a similarity percentage
    print(f"ID: {img_id}, similarity: {1 / (1 + distance):.2%}")

# Close the connection
search_system.close()
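The `1 / (1 + distance)` conversion used when printing results is a display heuristic rather than a calibrated score. Pulled out into a small helper (a hypothetical name, shown only to make the mapping explicit), it behaves like this:

```python
def distance_to_similarity(distance):
    """Map a non-negative distance to a score in (0, 1]; zero distance gives 1.0."""
    if distance < 0:
        raise ValueError("distance must be non-negative")
    return 1.0 / (1.0 + distance)


for d in (0.0, 1.0, 3.0):
    print(f"distance={d} -> similarity={distance_to_similarity(d):.2%}")
```

The score falls monotonically as the distance grows, so it preserves the ranking, but the percentages depend on the scale of the feature vectors and should not be read as probabilities.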

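6. Common Problems and Debugging Tips

Various problems can come up when using Milvus in practice. Here are fixes for some common ones.

Connection troubleshooting:

Symptom	Likely cause	Resolution
Connection timeout	The Milvus service is not running	Check the service with `docker ps` or `systemctl status milvus`
Authentication failure	Wrong username/password at a proxy or gateway in front of Milvus (Milvus 1.x itself has no username/password login)	Check the connection parameters against the server-side configuration
Port unreachable	A firewall is blocking the port	Check that port 19530 is open with `telnet <host> 19530`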

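Performance recommendations:

When inserting in bulk, insert 1000 to 5000 vectors per call rather than one vector at a time
Tune nlist to the data volume: the more data, the larger nlist should be
Set nprobe sensibly at search time to balance accuracy and speed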
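The batching recommendation can be sketched as a small helper that slices the data and calls an insert function once per batch. The insert function is passed in as a parameter so the sketch runs without a server; `iter_batches`, `insert_in_batches`, and `fake_insert` are hypothetical names, and the default batch size of 2000 is simply a value inside the suggested range.

```python
def iter_batches(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def insert_in_batches(insert_fn, vectors, batch_size=2000):
    """Call insert_fn once per batch and collect the IDs it returns."""
    all_ids = []
    for batch in iter_batches(vectors, batch_size):
        all_ids.extend(insert_fn(batch))
    return all_ids


# Stand-in for a real insert call: returns one fake ID per vector.
def fake_insert(batch):
    return list(range(len(batch)))


ids = insert_in_batches(fake_insert, [[0.0] * 4] * 4500, batch_size=2000)
print(len(ids))  # 4500
```

In real use, `insert_fn` would wrap the client's insert call, check the returned status, and return the IDs, with a single flush after the last batch.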

Error handling:

try:
    status, results = milvus.search(
        collection_name='image_features',
        query_records=query_vector,  # already a list of vectors
        top_k=5,
        params={"nprobe": 16}
    )
    if status.code != Status.SUCCESS:
        raise RuntimeError(f"Search failed: {status.message}")
    # Process the search results
    for vector_id, distance in zip(results.id_array[0], results.distance_array[0]):
        print(f"ID: {vector_id}, distance: {distance}")
except Exception as e:
    print(f"Error: {e}")
    # One possible recovery: reconnect
    milvus.close()
    milvus = Milvus(host='localhost', port='19530')
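Reconnect-and-retry logic of this kind can be factored into a generic wrapper. The following is a minimal sketch that works with any callable; `call_with_retry` and its parameters are hypothetical, and the flaky function only simulates a dropped connection.

```python
import time


def call_with_retry(operation, attempts=3, delay_seconds=0.5, on_failure=None):
    """Call operation() until it succeeds or the attempts run out."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception as error:
            last_error = error
            if on_failure is not None:
                on_failure(attempt, error)  # e.g. log the error or reconnect
            if attempt < attempts:
                time.sleep(delay_seconds)
    raise RuntimeError(f"operation failed after {attempts} attempts") from last_error


# Stand-in for a flaky call: fails twice, then succeeds.
calls = {"count": 0}


def flaky_search():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated connection drop")
    return "ok"


result = call_with_retry(flaky_search, attempts=3, delay_seconds=0)
print(result, calls["count"])  # ok 3
```

The `on_failure` hook is where a reconnect step could go, so the retry policy stays separate from the client-specific recovery code.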

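Resource monitoring:

# Server version and status; both calls return a (Status, value) pair
status, version = milvus.server_version()
print(f"Server version: {version}")
status, reply = milvus.server_status()
print(f"Server status: {reply}")

# List the partitions of the collection
status, partitions = milvus.list_partitions('image_features')
print(f"Partitions: {partitions}")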

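In real projects, I have found that index parameters have the largest effect on performance. After repeated testing, an IVF_FLAT index with nlist=4096 usually gives a good balance for collections on the order of a million vectors. In addition, calling the compact interface periodically can reclaim storage space and improve query performance:

status = milvus.compact('image_features')
print(f"Compaction status: {status}")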
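For the nlist choice discussed above, a starting value can be estimated from the collection size. The sketch below uses the commonly quoted rule of thumb of 4 times the square root of the vector count and rounds up to a power of two. Both the rule and the rounding are assumptions made here for illustration, and any value should be confirmed by measuring on your own data.

```python
import math


def suggest_nlist(num_vectors):
    """Rule of thumb (an assumption): 4 * sqrt(n), rounded up to a power of two."""
    if num_vectors <= 0:
        raise ValueError("num_vectors must be positive")
    target = 4 * math.sqrt(num_vectors)
    return 2 ** math.ceil(math.log2(target))


print(suggest_nlist(1_000_000))  # 4096
print(suggest_nlist(10_000))     # 512
```

For one million vectors the estimate lands on 4096, which agrees with the value reported from testing above.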
