AI Application Development: Vector Operations Explained

news 2026/5/8 14:46:20

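Abstract: Vector operations are a mathematical foundation of most AI algorithms. Whether it is word-vector representations in natural language processing, image feature extraction in computer vision, or similarity computation in recommender systems, vector operations play an essential role. This article systematically covers the definition and representation of vectors, basic operations (addition, subtraction, scalar multiplication), the dot and cross products, cosine similarity, vector norms, and orthogonality and projection, each accompanied by a NumPy implementation. Typical application scenarios (Word2Vec word vectors, image feature vectors, similarity computation in recommender systems, and the Query/Key/Value vectors of the attention mechanism) show how widely vector operations are used in AI. Mastering these basics lays a solid mathematical foundation for further study of machine learning and deep learning.

Keywords: vector operations; NumPy; cosine similarity; vector norm; word vectors; attention mechanism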

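1. Introduction

In artificial intelligence and machine learning, the vector is one of the most basic forms of data representation. A single number forms a one-dimensional vector, a group of numbers forms a multi-dimensional vector, and the core work of an AI model is to perform mathematical operations on these vectors in order to extract patterns, learn features, and make predictions.

This article walks through the core concepts of vector operations, covering not only the mathematics but also the code and the application scenarios. The code examples are written in Python with NumPy and can be run directly; a few of them additionally import other libraries such as scikit-learn.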

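2. Definition and Representation of Vectors

2.1 What Is a Vector

A vector is a mathematical object that has both magnitude and direction. In a program we usually represent a vector as a one-dimensional array. For example:

1-dimensional vector: has a single component, e.g. [3.14]
n-dimensional vector: has n components, e.g. [1.0, 2.5, -0.3, 4.7]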

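2.2 Representing Vectors in NumPy

    import numpy as np

    # 1-dimensional vector (one component)
    v1 = np.array([3.14])
    print(f"1-dimensional vector: {v1}, shape: {v1.shape}")

    # 4-dimensional vector (four components)
    v2 = np.array([1.0, 2.5, -0.3, 4.7])
    print(f"4-dimensional vector: {v2}, shape: {v2.shape}")

    # Column vector (actually a 2-D array with n rows and 1 column)
    column_vec = np.array([[1], [2], [3]])
    print(f"Column vector:\n{column_vec}, shape: {column_vec.shape}")

    # Row vector (a 2-D array with 1 row and 3 columns, the transpose of the column vector)
    row_vec = np.array([[1, 2, 3]])
    print(f"Row vector:\n{row_vec}, shape: {row_vec.shape}")

Output:

    1-dimensional vector: [3.14], shape: (1,)
    4-dimensional vector: [ 1.   2.5 -0.3  4.7], shape: (4,)
    Column vector:
    [[1]
     [2]
     [3]], shape: (3, 1)
    Row vector:
    [[1 2 3]], shape: (1, 3)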

2.3 Ways to Create Vectors

    import numpy as np

    # Create from a list
    vec_from_list = np.array([1, 2, 3, 4, 5])

    # All-zeros vector
    zeros_vec = np.zeros(5)
    print(f"Zeros vector: {zeros_vec}")

    # All-ones vector
    ones_vec = np.ones(5)
    print(f"Ones vector: {ones_vec}")

    # Arithmetic sequence with a fixed step
    arange_vec = np.arange(0, 10, 2)  # 0, 2, 4, 6, 8
    print(f"arange vector: {arange_vec}")

    # Evenly spaced values over an interval (endpoints included)
    linspace_vec = np.linspace(0, 1, 5)  # 0, 0.25, 0.5, 0.75, 1
    print(f"linspace vector: {linspace_vec}")

    # Random vector
    random_vec = np.random.randn(5)  # standard normal distribution
    print(f"Random vector: {random_vec}")
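When a random vector needs to be reproducible, NumPy's Generator API can be used in place of np.random.randn. The sketch below is only an illustration; the seed value 42 is an arbitrary assumption.

```python
import numpy as np

# A seeded generator: the same seed yields the same sequence on every run
rng = np.random.default_rng(42)       # seed chosen arbitrarily for illustration
random_vec = rng.standard_normal(5)   # 5 samples from the standard normal distribution

# A second generator with the same seed reproduces the vector exactly
rng_again = np.random.default_rng(42)
same_vec = rng_again.standard_normal(5)

print(random_vec.shape)                      # (5,)
print(np.array_equal(random_vec, same_vec))  # True
```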

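3. Basic Vector Operations

3.1 Vector Addition and Subtraction

Vector addition and subtraction add or subtract corresponding components, and require both vectors to have the same dimension.

    import numpy as np

    # Define two vectors
    a = np.array([1, 2, 3])
    b = np.array([4, 5, 6])

    # Vector addition: a + b = [1+4, 2+5, 3+6] = [5, 7, 9]
    c_add = a + b
    print(f"Vector addition a + b = {c_add}")

    # Vector subtraction: a - b = [1-4, 2-5, 3-6] = [-3, -3, -3]
    c_sub = a - b
    print(f"Vector subtraction a - b = {c_sub}")

    # Check: addition and subtraction are inverse operations
    print(f"(a + b) - b = {a + b - b}")  # should equal a

Geometric meaning: vector addition places the vectors head to tail in the coordinate system, and the sum runs from the tail of the first vector to the head of the last; the difference a - b is the vector pointing from the tip of b to the tip of a.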
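One practical caveat about the same-dimension requirement: NumPy raises an error for most mismatched lengths, but a length-1 array is silently broadcast. The sketch below illustrates both cases; the helper add_vectors is a hypothetical name introduced only for this example.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5])   # length 2: incompatible with a
c = np.array([10])     # length 1: will be broadcast

# Mismatched lengths raise ValueError
try:
    a + b
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)  # True

# A length-1 array is stretched to match, which is not vector addition
broadcast_sum = a + c
print(broadcast_sum.tolist())  # [11, 12, 13]

def add_vectors(u, v):
    """Add two vectors, insisting that their shapes match exactly."""
    if u.shape != v.shape:
        raise ValueError(f"shape mismatch: {u.shape} vs {v.shape}")
    return u + v

print(add_vectors(a, np.array([4, 5, 6])).tolist())  # [5, 7, 9]
```

An explicit shape check like this is one way to keep broadcasting from hiding a dimension bug.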

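3.2 Scalar Multiplication

Multiplying a vector by a scalar multiplies every component of the vector by that scalar.

    import numpy as np

    v = np.array([1, 2, 3])
    scalar = 2.5

    # Scalar multiplication: 2.5 * [1, 2, 3] = [2.5, 5.0, 7.5]
    result = scalar * v
    print(f"{scalar} * {v} = {result}")

    # Negative scalar (reverses the direction)
    neg_result = -0.5 * v
    print(f"-0.5 * {v} = {neg_result}")

    # Scalar division (multiplication by the reciprocal)
    div_result = v / 2
    print(f"{v} / 2 = {div_result}")

Geometric meaning: scalar multiplication scales the length (norm) of the vector by the absolute value of the scalar, and a negative scalar reverses its direction.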
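The geometric statement can be checked numerically. The following is a small sketch with an illustrative vector whose norm is 5:

```python
import numpy as np

v = np.array([3.0, 4.0])  # |v| = 5

for k in (2.0, -0.5):
    scaled = k * v
    # The norm of k*v equals |k| times the norm of v
    print(k, np.linalg.norm(scaled), abs(k) * np.linalg.norm(v))

# A negative scalar yields a vector pointing the opposite way,
# so its cosine with v is -1
flipped = -0.5 * v
cos_angle = np.dot(v, flipped) / (np.linalg.norm(v) * np.linalg.norm(flipped))
print(cos_angle)  # -1.0
```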

3.3 NumPy Functions for the Basic Operations

    import numpy as np

    a = np.array([1, 2, 3])
    b = np.array([4, 5, 6])

    # The same operations through NumPy functions
    add_result = np.add(a, b)
    sub_result = np.subtract(a, b)
    mul_result = np.multiply(a, 3)  # np.multiply handles scalar multiplication

    print(f"np.add: {add_result}")
    print(f"np.subtract: {sub_result}")
    print(f"np.multiply: {mul_result}")

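4. Vector Inner Product (Dot Product)

4.1 Definition of the Inner Product

The inner product (dot product) of two vectors multiplies corresponding components and sums the results. For two n-dimensional vectors $\mathbf{a}$ and $\mathbf{b}$:

Formula: $$\mathbf{a} \cdot \mathbf{b} = \sum_{i=1}^{n} a_i b_i = a_1 b_1 + a_2 b_2 + \cdots + a_n b_n$$

4.2 Geometric Meaning of the Inner Product

The inner product is closely tied to the angle between the vectors: $$\mathbf{a} \cdot \mathbf{b} = |\mathbf{a}| |\mathbf{b}| \cos\theta$$

Here $\theta$ is the angle between the two vectors. For vectors of fixed length, the formula shows how the inner product depends on direction:

When $\theta = 0^\circ$ (same direction), the inner product is at its maximum, the product of the two norms
When $\theta = 90^\circ$ (orthogonal), the inner product is 0
When $\theta = 180^\circ$ (opposite direction), the inner product is at its minimum, the negative of the product of the two norms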
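The three cases listed above can be reproduced with a short sketch; the vectors are illustrative values chosen so that the norms are 2 and 3.

```python
import numpy as np

a = np.array([2.0, 0.0])
cases = {
    "same direction (0 deg)": np.array([3.0, 0.0]),
    "orthogonal (90 deg)": np.array([0.0, 3.0]),
    "opposite (180 deg)": np.array([-3.0, 0.0]),
}

results = {}
for label, b in cases.items():
    dot = float(np.dot(a, b))
    norm_product = float(np.linalg.norm(a) * np.linalg.norm(b))
    results[label] = (dot, norm_product)
    # The dot product ranges from +|a||b| down to -|a||b|
    print(f"{label}: a.b = {dot}, |a||b| = {norm_product}")
```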

4.3 Code Implementation

    import numpy as np

    # Define two vectors
    a = np.array([1, 2, 3])
    b = np.array([4, 5, 6])

    # Method 1: np.dot (recommended)
    dot_result = np.dot(a, b)
    print(f"np.dot(a, b) = {dot_result}")

    # Method 2: the @ operator (Python 3.5+)
    at_result = a @ b
    print(f"a @ b = {at_result}")

    # Method 3: element-wise product followed by np.sum
    sum_result = np.sum(a * b)
    print(f"np.sum(a * b) = {sum_result}")

    # Method 4: manual implementation
    def my_dot(a, b):
        """Compute the inner product with an explicit loop."""
        result = 0
        for i in range(len(a)):
            result += a[i] * b[i]
        return result

    print(f"Manual implementation: {my_dot(a, b)}")

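4.4 Verifying the Properties of the Inner Product

    import numpy as np

    # Verify the geometric meaning of the inner product
    a = np.array([1, 0])
    b = np.array([1, 0])

    # Norms of the vectors
    norm_a = np.linalg.norm(a)
    norm_b = np.linalg.norm(b)

    # Inner product
    dot_ab = np.dot(a, b)

    # Cosine of the angle; clip guards against rounding slightly outside [-1, 1]
    cos_theta = np.clip(dot_ab / (norm_a * norm_b), -1.0, 1.0)
    theta = np.arccos(cos_theta)

    print(f"Vector a: {a}")
    print(f"Vector b: {b}")
    print(f"a·b = {dot_ab}")
    print(f"|a| = {norm_a}, |b| = {norm_b}")
    print(f"cos(θ) = {cos_theta}")
    print(f"Angle θ = {np.degrees(theta):.2f}°")  # should be 0°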

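5. Vector Cross Product

5.1 Definition of the Cross Product

The cross product used here is defined for three-dimensional vectors, and its result is again a vector.

Formula: $$\mathbf{a} \times \mathbf{b} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{vmatrix}$$

Expanding the determinant gives: $$\mathbf{a} \times \mathbf{b} = (a_2 b_3 - a_3 b_2, a_3 b_1 - a_1 b_3, a_1 b_2 - a_2 b_1)$$

5.2 Geometric Meaning of the Cross Product

The direction of the cross product is perpendicular to the plane containing the two original vectors (right-hand rule)
The magnitude of the cross product equals the area of the parallelogram spanned by the two vectors: $|\mathbf{a} \times \mathbf{b}| = |\mathbf{a}| |\mathbf{b}| \sin\theta$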
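Both geometric properties can be seen on a parallelogram lying in the xy-plane. This is a minimal sketch with illustrative edge vectors (base 3, height 2, so the area should be 6):

```python
import numpy as np

# Two edges of a parallelogram in the xy-plane
a = np.array([3.0, 0.0, 0.0])
b = np.array([1.0, 2.0, 0.0])

n = np.cross(a, b)
area = float(np.linalg.norm(n))
print(n.tolist())  # [0.0, 0.0, 6.0]
print(area)        # 6.0

# The result is perpendicular to both inputs
perp_a = float(np.dot(n, a))
perp_b = float(np.dot(n, b))
print(perp_a, perp_b)  # 0.0 0.0

# Swapping the operands flips the direction (right-hand rule)
n_swapped = np.cross(b, a)
print(n_swapped.tolist())  # [0.0, 0.0, -6.0]
```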

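5.3 Code Implementation

    import numpy as np

    # Define two 3-D vectors
    a = np.array([1, 2, 3])
    b = np.array([4, 5, 6])

    # Compute the cross product with np.cross
    cross_result = np.cross(a, b)
    print(f"a = {a}")
    print(f"b = {b}")
    print(f"a × b = {cross_result}")

    # The result is perpendicular to both inputs, so both dot products are 0
    print(f"(a × b) · a = {np.dot(cross_result, a)}")
    print(f"(a × b) · b = {np.dot(cross_result, b)}")

    # Verify |a × b| = |a||b|sin(θ), taking θ independently from the dot product
    # (arcsin of the cross-product ratio cannot distinguish acute from obtuse angles)
    norm_a = np.linalg.norm(a)
    norm_b = np.linalg.norm(b)
    norm_cross = np.linalg.norm(cross_result)
    theta = np.arccos(np.dot(a, b) / (norm_a * norm_b))
    print(f"|a × b| = {norm_cross:.6f}")
    print(f"|a||b|sin(θ) = {norm_a * norm_b * np.sin(theta):.6f}")
    print(f"Angle θ = {np.degrees(theta):.2f}°")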

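5.4 Cross Product versus Inner Product

    import numpy as np

    a = np.array([1, 2, 3])
    b = np.array([4, 5, 6])

    # Cross product (3-D vectors)
    cross = np.cross(a, b)
    print(f"Cross product a × b = {cross}")

    # Inner product
    dot = np.dot(a, b)
    print(f"Inner product a · b = {dot}")

    # Note: the inner product is a scalar, the cross product is a vector
    print(f"Inner product type: {type(dot)}, cross product type: {type(cross)}")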

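6. Cosine Similarity

6.1 The Cosine Similarity Formula

Cosine similarity measures how similar the directions of two vectors are; its value lies in [-1, 1]:

$$\text{cosine\_similarity}(\mathbf{a}, \mathbf{b}) = \frac{\mathbf{a} \cdot \mathbf{b}}{|\mathbf{a}| |\mathbf{b}|} = \cos\theta$$

The value is 1 when the two vectors point in exactly the same direction, -1 when they point in exactly opposite directions, and 0 when they are orthogonal.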

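6.2 Difference from the Inner Product

Property	Inner product	Cosine similarity
Range	$(-\infty, +\infty)$	$[-1, 1]$
Affected by magnitude	Yes	No (depends only on direction)
Interpretation	Projection length times norm	Cosine of the angle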
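The "affected by magnitude" row can be illustrated by scaling one of the vectors: the inner product grows with the scale, while the cosine similarity stays the same. A minimal sketch with illustrative vectors (both assumed non-zero):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between a and b (both assumed non-zero)."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 1.0, 0.5])

for scale in (1.0, 10.0, 100.0):
    scaled_b = scale * b
    # dot scales linearly with the factor; cosine does not change
    print(f"scale={scale}: dot={np.dot(a, scaled_b):.2f}, "
          f"cosine={cosine_similarity(a, scaled_b):.4f}")
```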

6.3 Code Implementation

    import numpy as np

    def cosine_similarity(a, b):
        """
        Compute the cosine similarity of two vectors.
        Args:
            a: vector (NumPy array)
            b: vector (NumPy array)
        Returns:
            the cosine similarity (0.0 if either vector is the zero vector)
        """
        # Inner product
        dot_product = np.dot(a, b)
        # Norm of each vector
        norm_a = np.linalg.norm(a)
        norm_b = np.linalg.norm(b)
        # Avoid division by zero
        if norm_a == 0 or norm_b == 0:
            return 0.0
        return dot_product / (norm_a * norm_b)

    def cosine_distance(a, b):
        """Cosine distance = 1 - cosine similarity."""
        return 1 - cosine_similarity(a, b)

    # Test vectors
    a = np.array([1, 0])
    b = np.array([1, 0])
    c = np.array([0, 1])
    d = np.array([-1, 0])

    print(f"a={a}, b={b}: cosine similarity = {cosine_similarity(a, b):.4f}")  # 1.0 (same direction)
    print(f"a={a}, c={c}: cosine similarity = {cosine_similarity(a, c):.4f}")  # 0.0 (orthogonal)
    print(f"a={a}, d={d}: cosine similarity = {cosine_similarity(a, d):.4f}")  # -1.0 (opposite)

6.4 Computing Cosine Similarity with scikit-learn

    from sklearn.metrics.pairwise import cosine_similarity
    import numpy as np

    # Cosine similarity between many pairs of vectors (each row is one vector)
    A = np.array([[1, 0, 0],
                  [1, 0, 0],
                  [0, 1, 0]])

    B = np.array([[1, 0, 0],
                  [0, 0, 1],
                  [0, 1, 0]])

    # Similarity matrix: entry [i, j] compares A[i] with B[j]
    sim_matrix = cosine_similarity(A, B)
    print("Cosine similarity matrix:")
    print(sim_matrix)

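7. Vector Norms

A vector norm measures the "length" or "size" of a vector.

7.1 L1 Norm

The L1 norm is the sum of the absolute values of the components, also called the Manhattan norm (the L1 norm of a difference vector is the Manhattan distance):

$$|\mathbf{a}|_1 = \sum_{i=1}^{n} |a_i|$$

    import numpy as np

    a = np.array([3, -4, 5])

    # L1 norm: |3| + |-4| + |5| = 12
    l1_norm = np.linalg.norm(a, ord=1)
    print(f"L1 norm: {l1_norm}")  # 12.0

    # Manual check
    l1_manual = np.sum(np.abs(a))
    print(f"Manual L1: {l1_manual}")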

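7.2 L2 Norm

The L2 norm is the Euclidean norm, the square root of the sum of the squared components (the L2 norm of a difference vector is the Euclidean distance):

$$|\mathbf{a}|_2 = \sqrt{\sum_{i=1}^{n} a_i^2}$$

    import numpy as np

    a = np.array([3, -4, 5])

    # L2 norm: sqrt(9 + 16 + 25) = sqrt(50) ≈ 7.07
    l2_norm = np.linalg.norm(a, ord=2)
    print(f"L2 norm: {l2_norm}")  # 7.071...

    # Manual check
    l2_manual = np.sqrt(np.sum(a**2))
    print(f"Manual L2: {l2_manual}")

    # Unit vector: a vector whose L2 norm is 1
    unit_vector = a / l2_norm
    print(f"Unit vector: {unit_vector}, norm: {np.linalg.norm(unit_vector)}")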

7.3 Infinity Norm

The infinity norm is the largest absolute value among the components:

$$|\mathbf{a}|_\infty = \max(|a_1|, |a_2|, \ldots, |a_n|)$$

    import numpy as np

    a = np.array([3, -4, 5])

    # Infinity norm: max(|3|, |-4|, |5|) = 5
    inf_norm = np.linalg.norm(a, ord=np.inf)
    print(f"Infinity norm: {inf_norm}")  # 5.0

    # Manual check
    inf_manual = np.max(np.abs(a))
    print(f"Manual infinity norm: {inf_manual}")

7.4 An Intuitive Picture of the Different Norms

    import numpy as np

    # Shape of the 2-D unit ball under each norm:
    # L1 norm: diamond
    # L2 norm: circle
    # L∞ norm: square

    a = np.array([3, -4])
    print(f"L1 norm: {np.linalg.norm(a, ord=1):.2f}")       # 7.00
    print(f"L2 norm: {np.linalg.norm(a, ord=2):.2f}")       # 5.00
    print(f"L∞ norm: {np.linalg.norm(a, ord=np.inf):.2f}")  # 4.00
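The nesting of the three unit balls (diamond inside circle inside square) corresponds to the ordering L∞ ≤ L2 ≤ L1 for any vector. The sketch below checks this on a few sample points of the L2 unit circle; the number of sample points is an arbitrary choice.

```python
import numpy as np

# Sample points on the L2 unit circle
angles = np.linspace(0.0, 2.0 * np.pi, 9)
points = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # shape (9, 2)

# Norm of each row under the three norms
l1 = np.linalg.norm(points, ord=1, axis=1)
l2 = np.linalg.norm(points, ord=2, axis=1)
linf = np.linalg.norm(points, ord=np.inf, axis=1)

for p, n1, n2, ninf in zip(points, l1, l2, linf):
    print(f"({p[0]:+.2f}, {p[1]:+.2f})  Linf={ninf:.3f}  L2={n2:.3f}  L1={n1:.3f}")

# The ordering holds for every point (small tolerance for rounding)
ordering_holds = bool(np.all(linf <= l2 + 1e-12) and np.all(l2 <= l1 + 1e-12))
print(ordering_holds)  # True
```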

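7.5 Where Norms Are Used

    import numpy as np

    # Scenario 1: L1 regularization (produces sparse solutions)
    # Used for feature selection in machine learning

    # Scenario 2: L2 regularization (weight decay)
    # Helps prevent overfitting

    # Scenario 3: distance metrics
    a = np.array([1, 2, 3])
    b = np.array([4, 5, 6])

    # Euclidean distance (L2 distance)
    euclidean_dist = np.linalg.norm(a - b)
    print(f"Euclidean distance: {euclidean_dist:.4f}")

    # Manhattan distance (L1 distance)
    manhattan_dist = np.linalg.norm(a - b, ord=1)
    print(f"Manhattan distance: {manhattan_dist:.4f}")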
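Scenarios 1 and 2 appear only as comments, so here is a rough sketch of the two penalty terms. The weight vector and the strength lam are hypothetical values, and conventions differ between libraries (some include a factor of 1/2 in the L2 term).

```python
import numpy as np

def l1_penalty(weights, lam):
    """L1 regularization term: lam * sum(|w_i|)."""
    return lam * np.linalg.norm(weights, ord=1)

def l2_penalty(weights, lam):
    """L2 regularization term: lam * sum(w_i^2), i.e. the squared L2 norm."""
    return lam * np.dot(weights, weights)

w = np.array([0.5, -2.0, 0.0, 1.5])  # hypothetical model weights
lam = 0.1                             # hypothetical regularization strength

penalty_l1 = l1_penalty(w, lam)  # 0.1 * (0.5 + 2.0 + 0.0 + 1.5)
penalty_l2 = l2_penalty(w, lam)  # 0.1 * (0.25 + 4.0 + 0.0 + 2.25)
print(f"L1 penalty: {penalty_l1:.2f}")  # 0.40
print(f"L2 penalty: {penalty_l2:.2f}")  # 0.65
```

In training, such a term would be added to the data loss; the L2 term penalizes large weights much more heavily than small ones, while the L1 term penalizes all non-zero weights in proportion to their size.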

8. Orthogonality and Projection

8.1 Definition of Orthogonality

Two vectors are orthogonal (perpendicular) if their inner product is 0:

$$\mathbf{a} \perp \mathbf{b} \iff \mathbf{a} \cdot \mathbf{b} = 0$$

    import numpy as np

    # Two orthogonal vectors
    a = np.array([1, 0, 0])
    b = np.array([0, 1, 0])

    # Check orthogonality
    dot = np.dot(a, b)
    print(f"a · b = {dot}")  # 0

    is_orthogonal = np.isclose(dot, 0)
    print(f"Orthogonal: {is_orthogonal}")

    # A more general test with a tolerance for floating-point error
    def are_orthogonal(a, b, tolerance=1e-10):
        return np.abs(np.dot(a, b)) < tolerance

    print(f"Orthogonality test: {are_orthogonal(a, b)}")

8.2 The Concept of Projection

The projection of vector $\mathbf{a}$ onto vector $\mathbf{b}$ (written $\text{proj}_{\mathbf{b}} \mathbf{a}$) is the component of $\mathbf{a}$ along the direction of $\mathbf{b}$.

Projection formula: $$\text{proj}_{\mathbf{b}} \mathbf{a} = \frac{\mathbf{a} \cdot \mathbf{b}}{|\mathbf{b}|^2} \mathbf{b}$$
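As a worked instance of the formula, take $\mathbf{a} = (2, 3)$ and $\mathbf{b} = (1, 1)$ (values chosen here for illustration):

$$\mathbf{a} \cdot \mathbf{b} = 2 \cdot 1 + 3 \cdot 1 = 5, \qquad |\mathbf{b}|^2 = 1^2 + 1^2 = 2$$

$$\text{proj}_{\mathbf{b}} \mathbf{a} = \frac{5}{2} (1, 1) = (2.5, 2.5)$$

The remainder $\mathbf{a} - \text{proj}_{\mathbf{b}} \mathbf{a} = (-0.5, 0.5)$ is orthogonal to $\mathbf{b}$, since $(-0.5)(1) + (0.5)(1) = 0$.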

8.3 Computing the Projection

    import numpy as np

    def project_onto(a, b):
        """
        Project vector a onto vector b.
        Returns a tuple (projection vector, scalar coefficient).
        """
        # projection = (a·b / |b|²) * b
        b_norm_sq = np.dot(b, b)  # squared norm of b
        if b_norm_sq == 0:
            raise ValueError("b must not be the zero vector")
        # Scalar coefficient of the projection
        scalar = np.dot(a, b) / b_norm_sq
        # Projection vector
        proj = scalar * b
        return proj, scalar

    # Example: project [3, 4] onto [1, 0]
    a = np.array([3, 4])
    b = np.array([1, 0])

    proj, scalar = project_onto(a, b)
    print(f"Vector a: {a}")
    print(f"Vector b: {b}")
    print(f"Projection vector: {proj}")
    print(f"Projection coefficient: {scalar}")
    print(f"Check: norm of the projection = {np.linalg.norm(proj):.4f}")

    # Another example: decompose a into a part along b and a part perpendicular to b
    a = np.array([2, 3])
    b = np.array([1, 1])
    proj, scalar = project_onto(a, b)
    print(f"\nVector a: {a}")
    print(f"Vector b: {b}")
    print(f"Projection vector: {proj}")
    print("Check: a = proj + perp")
    perp = a - proj
    print(f"Perpendicular component: {perp}")
    print(f"proj · perp = {np.dot(proj, perp):.10f}")  # close to 0

9. Applications of Vectors in AI

9.1 Word Vectors (Word2Vec, Embeddings)

Word vectors map words into a dense vector space so that semantically similar words lie close together in that space.

    import numpy as np

    def cosine_similarity(a, b):
        """Compute the cosine similarity."""
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

    # Mock Word2Vec vectors (simplified; real applications use a pretrained model)
    # Typical dimensions are 50, 100, 200, 300, and so on
    word_vectors = {
        "king": np.array([0.8, 0.3, 0.5, 0.1]),
        "queen": np.array([0.7, 0.4, 0.6, 0.2]),
        "man": np.array([0.6, 0.5, 0.2, 0.8]),
        "woman": np.array([0.5, 0.6, 0.3, 0.7]),
        "prince": np.array([0.75, 0.35, 0.45, 0.15]),
        "apple": np.array([0.1, 0.9, 0.1, 0.1]),
        "orange": np.array([0.15, 0.85, 0.1, 0.1]),
    }

    # Similarity between word vectors
    print("=== Word vector similarity ===")
    print(f"king vs queen: {cosine_similarity(word_vectors['king'], word_vectors['queen']):.4f}")
    print(f"man vs woman: {cosine_similarity(word_vectors['man'], word_vectors['woman']):.4f}")
    print(f"apple vs orange: {cosine_similarity(word_vectors['apple'], word_vectors['orange']):.4f}")
    print(f"king vs apple: {cosine_similarity(word_vectors['king'], word_vectors['apple']):.4f}")

    # Word vector arithmetic: king - man + woman ≈ queen
    king = word_vectors['king']
    man = word_vectors['man']
    woman = word_vectors['woman']
    queen = word_vectors['queen']

    # Compute the analogy vector
    analogy = king - man + woman
    similarity_to_queen = cosine_similarity(analogy, queen)
    print(f"\nSimilarity of king - man + woman to queen: {similarity_to_queen:.4f}")
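In practice an analogy is usually resolved by searching the vocabulary for the word closest to the analogy vector, leaving out the query words themselves. The sketch below does this with the same toy vectors as the example above (they are not from a trained model); most_similar is a hypothetical helper name.

```python
import numpy as np

# Toy vectors copied from the example above
word_vectors = {
    "king": np.array([0.8, 0.3, 0.5, 0.1]),
    "queen": np.array([0.7, 0.4, 0.6, 0.2]),
    "man": np.array([0.6, 0.5, 0.2, 0.8]),
    "woman": np.array([0.5, 0.6, 0.3, 0.7]),
    "prince": np.array([0.75, 0.35, 0.45, 0.15]),
    "apple": np.array([0.1, 0.9, 0.1, 0.1]),
    "orange": np.array([0.15, 0.85, 0.1, 0.1]),
}

def most_similar(query_vec, vectors, exclude=()):
    """Return (word, cosine similarity) pairs, most similar first."""
    scores = []
    for word, vec in vectors.items():
        if word in exclude:
            continue
        sim = np.dot(query_vec, vec) / (np.linalg.norm(query_vec) * np.linalg.norm(vec))
        scores.append((word, float(sim)))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

analogy = word_vectors["king"] - word_vectors["man"] + word_vectors["woman"]
ranking = most_similar(analogy, word_vectors, exclude=("king", "man", "woman"))
best_word = ranking[0][0]
print(best_word)  # queen
for word, sim in ranking:
    print(f"{word}: {sim:.4f}")
```

With these hand-picked vectors "queen" ranks first, only narrowly ahead of "prince"; real embeddings behave less tidily.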

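9.2 Image Feature Vectors

An image feature vector is an abstract vector representation of image content, commonly used for tasks such as image classification and retrieval.

    import numpy as np

    def extract_simple_image_features(image_pixels):
        """
        Simplified image feature extraction (real systems use CNNs and similar models).
        In practice, image features usually come from:
        - CNN feature maps (e.g. intermediate layers of ResNet or VGG)
        - Classical descriptors such as SIFT and HOG
        - Encoders of vision-language models such as CLIP
        """
        # A few summary statistics serve as simplified features
        features = np.array([
            np.mean(image_pixels),            # mean pixel value
            np.std(image_pixels),             # standard deviation of pixel values
            np.median(image_pixels),          # median
            np.percentile(image_pixels, 25),  # 25th percentile
            np.percentile(image_pixels, 75),  # 75th percentile
        ])
        return features

    def cosine_similarity(a, b):
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

    # Mock pixel data (real applications would load actual images)
    # Four images, each simplified to 100 pixel values
    np.random.seed(42)
    image1_pixels = np.random.rand(100) * 255                        # image 1: random content
    image2_pixels = image1_pixels + np.random.randn(100) * 10        # image 2: similar to image 1
    image3_pixels = np.random.rand(100) * 255                        # image 3: random content
    image4_pixels = 255 - image1_pixels + np.random.randn(100) * 10  # image 4: negative of image 1

    # Extract the feature vectors
    feat1 = extract_simple_image_features(image1_pixels)
    feat2 = extract_simple_image_features(image2_pixels)
    feat3 = extract_simple_image_features(image3_pixels)
    feat4 = extract_simple_image_features(image4_pixels)

    # Note: these statistics are all positive and describe only the pixel distribution,
    # so the cosine similarity tends to be high for every pair; learned features
    # are needed to separate images by content.
    print("=== Image feature vector similarity ===")
    print(f"Image 1 vs image 2 (similar content): {cosine_similarity(feat1, feat2):.4f}")
    print(f"Image 1 vs image 3 (different content): {cosine_similarity(feat1, feat3):.4f}")
    print(f"Image 1 vs image 4 (negative): {cosine_similarity(feat1, feat4):.4f}")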

9.3 Similarity Computation in Recommender Systems

    import numpy as np

    def cosine_similarity(a, b):
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

    def euclidean_distance(a, b):
        return np.linalg.norm(a - b)

    # User-item rating matrix (a simplified recommender system)
    # Each row is a user, each column the rating of one item (0 = not rated)
    user_item_matrix = np.array([
        [5, 4, 0, 0, 1],  # user 1: likes items 0 and 1, mildly likes item 4
        [4, 0, 5, 0, 2],  # user 2: likes items 0 and 2
        [0, 5, 0, 4, 0],  # user 3: likes items 1 and 3
        [0, 0, 4, 5, 0],  # user 4: likes items 2 and 3
    ])

    # User feature vectors (rating vectors)
    user1 = user_item_matrix[0]
    user2 = user_item_matrix[1]
    user3 = user_item_matrix[2]

    print("=== User similarity ===")
    print(f"User 1 vs user 2 (cosine): {cosine_similarity(user1, user2):.4f}")
    print(f"User 1 vs user 3 (cosine): {cosine_similarity(user1, user3):.4f}")
    print(f"User 2 vs user 3 (cosine): {cosine_similarity(user2, user3):.4f}")

    # User-based collaborative filtering
    def recommend_for_user(user_idx, user_item_matrix, top_n=2):
        """Find the most similar users and recommend items they rated that this user has not."""
        user_vector = user_item_matrix[user_idx]
        similarities = []
        for i, other_vector in enumerate(user_item_matrix):
            if i != user_idx:
                sim = cosine_similarity(user_vector, other_vector)
                similarities.append((i, sim))
        # Sort by similarity, highest first
        similarities.sort(key=lambda x: x[1], reverse=True)
        # Recommend items the similar users rated and the current user has not (rating 0)
        recommendations = []
        for other_user_idx, sim in similarities[:top_n]:
            other_ratings = user_item_matrix[other_user_idx]
            for item_idx, rating in enumerate(other_ratings):
                if rating > 0 and user_vector[item_idx] == 0:
                    recommendations.append((item_idx, other_user_idx, sim, rating))
        return recommendations

    print("\n=== Recommendations for user 1 ===")
    recommendations = recommend_for_user(0, user_item_matrix)
    for item, similar_user, sim, rating in recommendations:
        # similar_user is a 0-based row index, so add 1 to match the user numbering above
        print(f"Recommend item {item} (from similar user {similar_user + 1}, similarity {sim:.4f}, rating {rating})")
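When many users are involved, the pairwise loop is commonly replaced by a single matrix product over L2-normalized rows. The following is a sketch of that idea using the same rating matrix; it assumes no user has an all-zero row, since such a row would cause a division by zero.

```python
import numpy as np

user_item_matrix = np.array([
    [5, 4, 0, 0, 1],
    [4, 0, 5, 0, 2],
    [0, 5, 0, 4, 0],
    [0, 0, 4, 5, 0],
], dtype=float)

# Normalize every row to unit L2 norm (assumes no all-zero rows)
row_norms = np.linalg.norm(user_item_matrix, axis=1, keepdims=True)
unit_rows = user_item_matrix / row_norms

# Entry [i, j] is the cosine similarity between user i and user j
similarity = unit_rows @ unit_rows.T
print(np.round(similarity, 4))
```

The diagonal is 1 (each user compared with themselves) and the matrix is symmetric, so only the upper triangle carries new information.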

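9.4 Query/Key/Value Vectors in the Attention Mechanism

The attention mechanism is the core of the Transformer architecture, and the Query (Q), Key (K), and Value (V) vectors are its central concepts.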
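The scaled dot-product attention implemented by the code in this section is usually written as follows, where $d_k$ is the dimension of the key vectors:

$$\text{Attention}(Q, K, V) = \text{softmax}\left(\frac{Q K^\top}{\sqrt{d_k}}\right) V$$

With $Q$ of shape $(n_q, d_k)$, $K$ of shape $(n_k, d_k)$, and $V$ of shape $(n_k, d_v)$, the product $Q K^\top$ holds the $n_q \times n_k$ dot products between every query and every key, the softmax turns each row into weights that sum to 1, and the result has shape $(n_q, d_v)$. Dividing by $\sqrt{d_k}$ keeps the dot products from growing with the dimension, which would otherwise push the softmax toward a one-hot distribution.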

    import numpy as np

    def softmax(x, axis=-1):
        """Numerically stable softmax."""
        exp_x = np.exp(x - np.max(x, axis=axis, keepdims=True))
        return exp_x / np.sum(exp_x, axis=axis, keepdims=True)

    def attention(Q, K, V):
        """
        Scaled dot-product attention.
        Args:
            Q: queries (..., seq_len_q, d_k)
            K: keys (..., seq_len_k, d_k)
            V: values (..., seq_len_k, d_v)
        Returns:
            output (..., seq_len_q, d_v) and the attention weights
        """
        d_k = K.shape[-1]
        # 1. Dot products of Q and K. swapaxes transposes only the last two axes,
        #    so batched (3-D) inputs work as well as 2-D ones.
        scores = Q @ np.swapaxes(K, -1, -2)  # (..., seq_len_q, seq_len_k)
        # 2. Scale
        scores_scaled = scores / np.sqrt(d_k)
        # 3. Softmax gives the attention weights
        attention_weights = softmax(scores_scaled, axis=-1)
        # 4. Weighted sum of V
        output = attention_weights @ V
        return output, attention_weights

    # Simplified example: one query, several key-value pairs
    # Sequence length 3, d_k = d_v = 4
    np.random.seed(42)
    seq_len = 3
    d_model = 4

    # Mock Q, K, V (in a real model they come from linear projections of the input)
    # The query is the vector for "machine"
    Q = np.array([[0.5, 0.3, 0.1, 0.2]])  # 1 query, for "machine"

    # The sequence is ["machine", "learning", "algorithm"]
    K = np.array([[0.5, 0.3, 0.1, 0.2],   # key of "machine"
                  [0.2, 0.4, 0.6, 0.1],   # key of "learning"
                  [0.1, 0.1, 0.2, 0.8]])  # key of "algorithm"

    V = np.array([[0.5, 0.3, 0.1, 0.2],   # value of "machine"
                  [0.3, 0.5, 0.6, 0.2],   # value of "learning"
                  [0.2, 0.2, 0.3, 0.9]])  # value of "algorithm"

    # Compute attention
    output, attention_weights = attention(Q, K, V)

    print("=== Attention example ===")
    print(f"Query shape: {Q.shape}")
    print(f"Key shape: {K.shape}")
    print(f"Value shape: {V.shape}")
    print("\nAttention weights:")
    print(f"  attention on 'machine': {attention_weights[0, 0]:.4f}")
    print(f"  attention on 'learning': {attention_weights[0, 1]:.4f}")
    print(f"  attention on 'algorithm': {attention_weights[0, 2]:.4f}")
    print("\nOutput vector:")
    print(f"  shape: {output.shape}")
    print(f"  values: {output}")

    # A simple illustration of multi-head attention
    def multi_head_attention(x, num_heads=2, d_model=4):
        """Simplified multi-head attention without learned projections."""
        # Simplification: use the input x directly as Q, K and V.
        # A real implementation applies weight matrices W_q, W_k, W_v.
        Q = x[..., :d_model]
        K = x[..., :d_model]
        V = x[..., :d_model]
        # Split the feature axis into heads of d_model // num_heads features each
        Q_split = np.split(Q, num_heads, axis=-1)
        K_split = np.split(K, num_heads, axis=-1)
        V_split = np.split(V, num_heads, axis=-1)
        heads_output = []
        for h in range(num_heads):
            out, _ = attention(Q_split[h], K_split[h], V_split[h])
            heads_output.append(out)
        # Concatenate the outputs of all heads
        output = np.concatenate(heads_output, axis=-1)
        return output

    # Mock input sequence (batch_size=1, seq_len=3, d_model=4)
    x = np.random.randn(1, 3, 4)
    print("\nMulti-head attention example:")
    print(f"Input shape: {x.shape}")
    multi_head_output = multi_head_attention(x)
    print(f"Output shape: {multi_head_output.shape}")

10. Putting It Together: Document Similarity

    import numpy as np

    def cosine_similarity(a, b):
        """Compute the cosine similarity (0.0 if either vector is zero)."""
        norm_a = np.linalg.norm(a)
        norm_b = np.linalg.norm(b)
        if norm_a == 0 or norm_b == 0:
            return 0.0
        return np.dot(a, b) / (norm_a * norm_b)

    def tfidf_vectorize(documents, vocabulary):
        """
        Simplified TF-IDF vectorization.
        In practice, use sklearn.feature_extraction.text.TfidfVectorizer.
        """
        n_docs = len(documents)
        n_vocab = len(vocabulary)
        vocab_index = {word: i for i, word in enumerate(vocabulary)}
        vectors = np.zeros((n_docs, n_vocab))
        for doc_idx, doc in enumerate(documents):
            words = doc.lower().split()
            word_count = {}
            for word in words:
                if word in vocab_index:
                    word_count[word] = word_count.get(word, 0) + 1
            # TF: term count divided by document length
            for word, count in word_count.items():
                tf = count / len(words)
                vectors[doc_idx, vocab_index[word]] = tf
        # IDF: smoothed form ln((1 + n_docs) / (1 + df)) + 1 (one common choice)
        df = np.count_nonzero(vectors, axis=0)
        idf = np.log((1 + n_docs) / (1 + df)) + 1
        return vectors * idf

    # Example documents
    documents = [
        "machine learning algorithm",
        "deep learning neural network",
        "machine learning model",
        "natural language processing",
    ]

    # Build the vocabulary (sorted so that the column order is reproducible)
    vocabulary = sorted(set(" ".join(documents).split()))
    print(f"Vocabulary: {vocabulary}")

    # Vectorize
    doc_vectors = tfidf_vectorize(documents, vocabulary)
    print(f"\nDocument vector matrix shape: {doc_vectors.shape}")
    print(f"Document vectors:\n{doc_vectors}")

    # Compute document similarity
    print("\n=== Document similarity matrix ===")
    n_docs = len(documents)
    similarity_matrix = np.zeros((n_docs, n_docs))

    for i in range(n_docs):
        for j in range(n_docs):
            sim = cosine_similarity(doc_vectors[i], doc_vectors[j])
            similarity_matrix[i, j] = sim
            print(f"Document {i} vs document {j}: {sim:.4f}")

    print(f"\nSimilarity matrix:\n{similarity_matrix}")