当前位置：首页 > news >正文

别再傻傻分不清了！PyTorch中矩阵的⊕、⊙、⊗操作符与*、@、torch.mul()的保姆级对照指南

news 2026/4/19 21:55:37

PyTorch矩阵操作符完全指南：从数学符号到代码实现

刚接触深度学习时，最让人头疼的莫过于论文中那些神秘的数学符号和实际代码之间的对应关系。⊕、⊙、⊗这些看似简单的符号，在PyTorch中到底该用+、*还是@？为什么有时候*能得到预期结果，有时候却会报错？本文将彻底解析这些符号在PyTorch中的实现方式，帮助你在阅读论文和编写代码时不再困惑。

1. 数学符号与PyTorch操作符对照表

在深入探讨之前，我们先建立一个清晰的对应关系表：

数学符号	数学含义	PyTorch操作符	等价函数调用	广播支持
⊕	逐元素加法	`+`	`torch.add()`	是
⊙	逐元素乘法	`*`	`torch.mul()`	是
⊗	矩阵乘法	`@`	`torch.matmul()`	部分

注意：PyTorch中的*操作符执行的是逐元素乘法(⊙)，而不是矩阵乘法(⊗)，这是新手最常见的误区之一。

2. 逐元素操作：⊕和⊙的实现

2.1 逐元素加法(⊕)

逐元素加法要求两个张量形状相同，或者满足广播规则。PyTorch提供了多种实现方式：

import torch # 创建两个相同形状的矩阵 A = torch.tensor([[1, 2], [3, 4]]) B = torch.tensor([[5, 6], [7, 8]]) # 三种等价实现方式 result1 = A + B # 使用+运算符 result2 = torch.add(A, B) # 使用add函数 result3 = A.add(B) # 使用张量的add方法

当形状不同但满足广播条件时，PyTorch会自动扩展较小的张量：

# 矩阵与标量相加 C = torch.tensor([[1, 2], [3, 4]]) scalar = 10 print(C + scalar) # 输出: tensor([[11, 12], [13, 14]]) # 矩阵与向量相加 D = torch.tensor([[1, 2], [3, 4]]) vector = torch.tensor([10, 20]) print(D + vector) # 输出: tensor([[11, 22], [13, 24]])

2.2 逐元素乘法(⊙)

与加法类似，逐元素乘法也有多种实现方式：

# 三种等价实现方式 result1 = A * B # 使用*运算符 result2 = torch.mul(A, B) # 使用mul函数 result3 = A.mul(B) # 使用张量的mul方法

广播机制同样适用：

# 矩阵与标量相乘 print(A * 2) # 输出: tensor([[2, 4], [6, 8]]) # 矩阵与向量相乘 vector = torch.tensor([2, 3]) print(A * vector) # 输出: tensor([[2, 6], [6, 12]])

3. 矩阵乘法(⊗)的多种实现

矩阵乘法是线性代数中的核心操作，PyTorch提供了丰富的实现方式：

3.1 基本矩阵乘法

# 创建两个矩阵 E = torch.tensor([[1, 2], [3, 4]]) F = torch.tensor([[5, 6], [7, 8]]) # 四种等价实现方式 result1 = E @ F # 使用@运算符(Python 3.5+) result2 = torch.matmul(E, F) result3 = torch.mm(E, F) # 专门用于2D矩阵 result4 = E.mm(F) # 张量方法版本

重要区别：torch.mm()仅支持2D矩阵，而torch.matmul()支持更高维度的张量，并实现了广播。

3.2 高维张量的矩阵乘法

当处理批量矩阵乘法时（如在神经网络中处理一批输入），matmul会自动应用广播：

# 批量矩阵乘法示例 batch1 = torch.randn(10, 3, 4) # 10个3x4矩阵 batch2 = torch.randn(10, 4, 5) # 10个4x5矩阵 result = torch.matmul(batch1, batch2) # 结果为10个3x5矩阵

3.3 向量与矩阵的乘法

PyTorch会自动处理向量和矩阵的乘法，无需显式转置：

matrix = torch.tensor([[1, 2], [3, 4]]) vector = torch.tensor([5, 6]) # 矩阵乘以向量 print(matrix @ vector) # 输出: tensor([17, 39]) # 向量乘以矩阵 print(vector @ matrix) # 输出: tensor([23, 34])

4. 广播机制详解

广播是PyTorch中强大的特性，它允许不同形状的张量进行运算。理解广播规则对正确使用⊕、⊙操作至关重要。

4.1 广播规则

从最后一个维度开始向前比较
维度大小相等或其中一个为1时可以进行广播
缺失的维度被视为1

# 广播示例1 A = torch.ones(3, 1, 4) B = torch.ones(2, 1) print((A + B).shape) # 输出: torch.Size([3, 2, 4]) # 广播示例2 C = torch.ones(5, 3, 4, 1) D = torch.ones(3, 1, 2) print((C * D).shape) # 输出: torch.Size([5, 3, 4, 2])

4.2 常见广播错误

# 不兼容的形状 try: A = torch.ones(3, 4) B = torch.ones(2, 4) print(A + B) except RuntimeError as e: print(f"错误: {e}") # 输出: 错误: The size of tensor a (3) must match...

5. 实际应用场景与性能考量

5.1 计算机视觉中的典型应用

在计算机视觉中，这些操作符无处不在：

# 特征图相加(⊕) feature_map1 = torch.randn(1, 64, 256, 256) feature_map2 = torch.randn(1, 64, 256, 256) combined_features = feature_map1 + feature_map2 # 注意力权重应用(⊙) attention_weights = torch.rand(1, 1, 256, 256) weighted_features = feature_map1 * attention_weights # 全连接层计算(⊗) weight_matrix = torch.randn(512, 1024) input_features = torch.randn(32, 1024) # 批量大小32 output = input_features @ weight_matrix.T

5.2 性能优化技巧

对于大型矩阵乘法，使用torch.matmul而不是多个torch.mm

就地操作可以节省内存：

A.add_(B) # 就地加法，比A = A + B更高效

使用torch.einsum表达复杂张量操作：

# 使用einsum实现批量矩阵乘法 result = torch.einsum('bij,bjk->bik', batch1, batch2)

6. 常见陷阱与调试技巧

6.1 形状不匹配问题

# 错误的矩阵乘法 A = torch.randn(3, 4) B = torch.randn(5, 6) try: C = A @ B except RuntimeError as e: print(f"矩阵乘法错误: {e}")

6.2 自动广播导致的意外结果

# 意外的广播行为 A = torch.tensor([[1, 2, 3]]) B = torch.tensor([[1], [2], [3]]) print(A * B) # 输出3x3矩阵，而不是预期的错误

6.3 类型不匹配问题

# 类型不匹配 A = torch.tensor([1, 2, 3], dtype=torch.float32) B = torch.tensor([1, 2, 3], dtype=torch.int64) try: print(A + B) except RuntimeError as e: print(f"类型错误: {e}")

调试技巧：

使用.shape检查张量形状
使用.dtype检查数据类型
对于复杂表达式，分步计算并检查中间结果

7. 进阶话题：自定义操作符行为

PyTorch允许通过重载魔术方法来定义自定义张量操作：

class CustomTensor(torch.Tensor): @staticmethod def __add__(self, other): print("自定义加法操作") return super().__add__(other) @staticmethod def __matmul__(self, other): print("自定义矩阵乘法") return super().__matmul__(other) A = CustomTensor([1, 2, 3]) B = CustomTensor([4, 5, 6]) _ = A + B # 会打印"自定义加法操作" _ = A @ B # 会打印"自定义矩阵乘法"

查看全文

http://www.jsqmd.com/news/667972/