[TensorFlow] tf.matmul on multi-dimensional tensors

Originally published at: https://zhuanlan.zhihu.com/p/138731311

We all learned two-dimensional matrix multiplication in linear algebra, but tf.matmul can also handle higher-dimensional tensors, for example:

    import tensorflow as tf
    import numpy as np
    a = tf.random.uniform([2, 1, 2, 3])
    b = tf.random.uniform([1, 3, 3, 2])
    c = tf.matmul(a, b)

So what is c?

Let me state the conclusion up front: no matter how many dimensions the tensors have, tf.matmul first does an ordinary matrix multiplication over the last two dimensions, then repeats it across the remaining (batch) dimensions.

Multi-dimensional tf.matmul(a, b) places two requirements on the shapes:

1. The size of a along axis=-1 (its last dimension) must equal the size of b along axis=-2. For example, the trailing 3 of a [3, 2, 3] tensor and the middle 3 of a [3, 3, 2] tensor, as in the example below.

2. Every other dimension of a and b (everything except axis=-1 and axis=-2) must, position by position, either be equal or have one of the two sizes be 1.

For example, tf.matmul between a tensor of shape [3, 2, 3] and a tensor of shape [3, 3, 2]:

    In [84]: import tensorflow as tf
    ...: import numpy as np
    ...: a = tf.random.uniform([3, 2, 3])
    ...: b = tf.random.uniform([3, 3, 2])
    ...: c = tf.matmul(a, b)
    ...: c.shape
    ...:
    ...:
    
    Out[84]: TensorShape([3, 2, 2])
    
    In [87]: tf.matmul(a[0],b[0])
    Out[87]:
    <tf.Tensor: id=374, shape=(2, 2), dtype=float32, numpy=
    array([[1.4506222 , 1.323427  ],
       [0.28268352, 0.2917934 ]], dtype=float32)>
    
    In [88]: tf.matmul(a[1],b[1])
    Out[88]:
    <tf.Tensor: id=383, shape=(2, 2), dtype=float32, numpy=
    array([[1.0278544 , 0.4219831 ],
       [0.865297  , 0.87740964]], dtype=float32)>
    
    In [89]: c
    Out[89]:
    <tf.Tensor: id=365, shape=(3, 2, 2), dtype=float32, numpy=
    array([[[1.4506222 , 1.323427  ],
        [0.28268352, 0.2917934 ]],
    
       [[1.0278544 , 0.4219831 ],
        [0.865297  , 0.8774096 ]],
    
       [[0.5752927 , 0.13066964],
        [0.5343988 , 0.2741483 ]]], dtype=float32)>

As you can see, tf.matmul between a [3, 2, 3] tensor and a [3, 3, 2] tensor can be understood in two steps:

Step 1: over axes 1 and 2, perform an ordinary two-dimensional matrix multiplication of a [2, 3] matrix with a [3, 2] matrix, giving a [2, 2] result.

Step 2: along axis=0, pair the i-th slice of a with the i-th slice of b and apply step 1 to each pair, which yields the final [3, 2, 2] output.
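
To make the two steps concrete, here is a minimal check (a sketch assuming TensorFlow 2.x with eager execution): the batched result should equal stacking the per-slice 2-D products.

    import tensorflow as tf

    # Reproduce the [3, 2, 3] x [3, 3, 2] case from above.
    a = tf.random.uniform([3, 2, 3])
    b = tf.random.uniform([3, 3, 2])

    # Steps 1 and 2 done by tf.matmul in one call.
    c_batched = tf.matmul(a, b)                                     # shape [3, 2, 2]

    # The same thing done by hand: one 2-D matmul per axis=0 slice, then stack.
    c_manual = tf.stack([tf.matmul(a[i], b[i]) for i in range(3)])  # shape [3, 2, 2]

    print(tf.reduce_all(tf.abs(c_batched - c_manual) < 1e-6).numpy())  # True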

If the axis=0 sizes of a and b don't match, you get an error:

    In [95]: import tensorflow as tf
    ...: import numpy as np
    ...: a = tf.random.uniform([2, 2, 3])
    ...: b = tf.random.uniform([3, 3, 2])
    ...: c = tf.matmul(a, b)
    ...: c.shape
    ...:
    ...:
    ---------------------------------------------------------------------------
    InvalidArgumentError                      Traceback (most recent call last)
    <ipython-input-95-462c4976a35a> in <module>
      3 a = tf.random.uniform([2, 2, 3])
      4 b = tf.random.uniform([3, 3, 2])
    ----> 5 c = tf.matmul(a, b)
      6 c.shape
      7
    
    D:\S\Anaconda3_v3\lib\site-packages\tensorflow_core\python\util\dispatch.py in wrapper(*args, **kwargs)
    178     """Call target, and fall back on dispatchers if there is a TypeError."""
    179     try:
    --> 180       return target(*args, **kwargs)
    181     except (TypeError, ValueError):
    182       # Note: convert_to_eager_tensor currently raises a ValueError, not a
    
    D:\S\Anaconda3_v3\lib\site-packages\tensorflow_core\python\ops\math_ops.py in matmul(a, b, transpose_a, transpose_b, adjoint_a, adjoint_b, a_is_sparse, b_is_sparse, name)
       2725         b = conj(b)
       2726         adjoint_b = True
    -> 2727       return batch_mat_mul_fn(a, b, adj_x=adjoint_a, adj_y=adjoint_b, name=name)
       2728
       2729     # Neither matmul nor sparse_matmul support adjoint, so we conjugate
    
    D:\S\Anaconda3_v3\lib\site-packages\tensorflow_core\python\ops\gen_math_ops.py in batch_mat_mul_v2(x, y, adj_x, adj_y, name)
       1700       else:
       1701         message = e.message
    -> 1702       _six.raise_from(_core._status_to_exception(e.code, message), None)
       1703   # Add nodes to the TensorFlow graph.
       1704   if adj_x is None:
    
    D:\S\Anaconda3_v3\lib\site-packages\six.py in raise_from(value, from_value)
    
    InvalidArgumentError: In[0] and In[1] must have compatible batch dimensions: [2,2,3] vs. [3,3,2] [Op:BatchMatMulV2] name: MatMul/       

But when one of a and b has size 1 along axis=0, there is no error:

    In [90]: import tensorflow as tf
    ...: import numpy as np
    ...: a = tf.random.uniform([1, 2, 3])
    ...: b = tf.random.uniform([3, 3, 2])
    ...: c = tf.matmul(a, b)
    ...: c.shape
    ...:
    ...:
    Out[90]: TensorShape([3, 2, 2])
    
    In [91]: c
    Out[91]:
    <tf.Tensor: id=398, shape=(3, 2, 2), dtype=float32, numpy=
    array([[[0.59542704, 0.60751694],
        [0.19115494, 0.36344892]],
    
       [[1.0542538 , 0.75257593],
        [0.26940605, 0.24408351]],
    
       [[1.1716111 , 0.4058628 ],
        [0.09086016, 0.28043625]]], dtype=float32)>
    
    In [92]: tf.matmul(a[0],b[0])
    Out[92]:
    <tf.Tensor: id=407, shape=(2, 2), dtype=float32, numpy=
    array([[0.59542704, 0.60751694],
       [0.19115494, 0.36344892]], dtype=float32)>
    
    In [93]: tf.matmul(a[0],b[1])
    Out[93]:
    <tf.Tensor: id=416, shape=(2, 2), dtype=float32, numpy=
    array([[1.0542538 , 0.7525759 ],
       [0.26940605, 0.2440835 ]], dtype=float32)>
    
    In [94]: tf.matmul(a[0],b[2])
    Out[94]:
    <tf.Tensor: id=425, shape=(2, 2), dtype=float32, numpy=
    array([[1.1716112 , 0.4058628 ],
       [0.09086016, 0.28043625]], dtype=float32)>

This still follows the rule above: multiply over the last two dimensions first, then assemble the results one by one. The only difference is that, because a's size along axis=0 is 1, that single slice of a is paired with every axis=0 slice of b. (The code and its output make this clearest.)
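
A minimal sketch (again assuming TensorFlow 2.x eager execution) that double-checks this pairing: the single slice a[0] is reused against every slice of b.

    import tensorflow as tf

    a = tf.random.uniform([1, 2, 3])   # size 1 along axis=0
    b = tf.random.uniform([3, 3, 2])
    c = tf.matmul(a, b)                # shape [3, 2, 2]

    # Every output slice pairs b[i] with the one and only slice a[0].
    for i in range(3):
        same = tf.reduce_all(tf.abs(c[i] - tf.matmul(a[0], b[i])) < 1e-6)
        print(same.numpy())            # True, True, True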

So for the three-dimensional case the conclusion is:

Multiply over the last two dimensions first, then repeat across the remaining dimension.

Three-dimensional tf.matmul(a, b) places two requirements on the shapes:

1. The size of a along axis=2 (its last dimension) must equal the size of b along axis=1.

2. The sizes of a and b along axis=0 must either be equal or one of them must be 1.

Now look at higher dimensions, for example the four-dimensional case.

    In [96]: import tensorflow as tf
    ...: import numpy as np
    ...: a = tf.random.uniform([2, 1, 2, 3])
    ...: b = tf.random.uniform([2, 3, 3, 2])
    ...: c = tf.matmul(a, b)
    ...: c.shape
    ...:
    ...:
    Out[96]: TensorShape([2, 3, 2, 2])
    
    In [97]: c
    Out[97]:
    <tf.Tensor: id=454, shape=(2, 3, 2, 2), dtype=float32, numpy=
    array([[[[1.0685383 , 1.9015994 ],
         [1.1457413 , 1.5246255 ]],
    
        [[0.953201  , 1.5544493 ],
         [0.7639411 , 1.4360913 ]],
    
        [[0.67427766, 0.49847895],
         [0.499685  , 0.39281937]]],
    
    
       [[[0.42752475, 0.7453967 ],
         [0.3735991 , 0.74812794]],
    
        [[0.54442215, 0.6510606 ],
         [0.6632798 , 0.38497943]],
    
        [[0.3459217 , 0.96300673],
         [0.45035997, 0.90772474]]]], dtype=float32)>
    
    In [98]: tf.matmul(a[0],b[0])
    Out[98]:
    <tf.Tensor: id=463, shape=(3, 2, 2), dtype=float32, numpy=
    array([[[1.0685383 , 1.9015994 ],
        [1.1457413 , 1.5246255 ]],
    
       [[0.953201  , 1.5544493 ],
        [0.7639411 , 1.4360913 ]],
    
       [[0.67427766, 0.49847895],
        [0.499685  , 0.39281937]]], dtype=float32)>
    
    In [99]: tf.matmul(a[1],b[1])
    Out[99]:
    <tf.Tensor: id=472, shape=(3, 2, 2), dtype=float32, numpy=
    array([[[0.42752475, 0.7453967 ],
        [0.3735991 , 0.74812794]],
    
       [[0.54442215, 0.6510606 ],
        [0.6632798 , 0.38497943]],
    
       [[0.3459217 , 0.96300673],
        [0.45035997, 0.90772474]]], dtype=float32)>

This is consistent with the three-dimensional case: tf.matmul is applied level by level, and everything reduces to two-dimensional matrix multiplications over the last two axes.
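
To spell this out, a small sketch (assuming TensorFlow 2.x eager execution) that decomposes the 4-D example above all the way down to plain 2-D products; note that a is also broadcast along axis=1 (size 1 against 3).

    import tensorflow as tf

    a = tf.random.uniform([2, 1, 2, 3])
    b = tf.random.uniform([2, 3, 3, 2])
    c = tf.matmul(a, b)                # shape [2, 3, 2, 2]

    # Every innermost slice is a plain 2-D product: c[i, j] == a[i, 0] @ b[i, j].
    ok = all(
        tf.reduce_all(tf.abs(c[i, j] - tf.matmul(a[i, 0], b[i, j])) < 1e-6).numpy()
        for i in range(2) for j in range(3)
    )
    print(ok)  # True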

Likewise, it also works when one of the axis=0 sizes is 1:

    In [100]: import tensorflow as tf
     ...: import numpy as np
     ...: a = tf.random.uniform([2, 1, 2, 3])
     ...: b = tf.random.uniform([1, 3, 3, 2])
     ...: c = tf.matmul(a, b)
     ...: c.shape
     ...:
     ...:
    Out[100]: TensorShape([2, 3, 2, 2])

There is no need to go through that again.

Final conclusion: no matter how many dimensions the tensors have, multiplication is done over the last two dimensions first, then repeated across the remaining dimensions.

Multi-dimensional tf.matmul(a, b) places two requirements on the shapes:

1. The size of a along axis=-1 (its last dimension) must equal the size of b along axis=-2.

2. Every other dimension of a and b (everything except axis=-1 and axis=-2) must, position by position, either be equal or have one of the two sizes be 1. A small shape-inference sketch follows this list.
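
The two rules can be turned into a small shape-inference helper. This is a hypothetical sketch (the function name matmul_output_shape is mine, and it assumes NumPy >= 1.20 for np.broadcast_shapes), not a TensorFlow API:

    import numpy as np

    def matmul_output_shape(shape_a, shape_b):
        """Predict the output shape of tf.matmul(a, b) from the two rules above."""
        # Rule 1: a's size along axis=-1 must equal b's size along axis=-2.
        if shape_a[-1] != shape_b[-2]:
            raise ValueError(f"inner dimensions differ: {shape_a[-1]} vs {shape_b[-2]}")
        # Rule 2: the remaining (batch) dimensions must broadcast, i.e. be equal or 1.
        batch = np.broadcast_shapes(tuple(shape_a[:-2]), tuple(shape_b[:-2]))
        return list(batch) + [shape_a[-2], shape_b[-1]]

    print(matmul_output_shape([2, 1, 2, 3], [1, 3, 3, 2]))  # [2, 3, 2, 2]
    print(matmul_output_shape([3, 2, 3], [3, 3, 2]))        # [3, 2, 2]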

Here are a few more examples where the numbers of dimensions don't match, to give a feel for the behavior:

    In [105]: import tensorflow as tf
     ...: import numpy as np
     ...: a = tf.random.uniform([2, 1, 2, 3])
     ...: b = tf.random.uniform([1, 3, 2])
     ...: c = tf.matmul(a, b)
     ...: c.shape
    Out[105]: TensorShape([2, 1, 2, 2])
    
    In [106]: import tensorflow as tf
     ...: import numpy as np
     ...: a = tf.random.uniform([2, 1, 2, 3])
     ...: b = tf.random.uniform([7, 3, 2])
     ...: c = tf.matmul(a, b)
     ...: c.shape
    Out[106]: TensorShape([2, 7, 2, 2])
    
    In [107]: import tensorflow as tf
     ...: import numpy as np
     ...: a = tf.random.uniform([2, 1, 2, 3])
     ...: b = tf.random.uniform([7, 9, 3, 2])
     ...: c = tf.matmul(a, b)
     ...: c.shape
    ---------------------------------------------------------------------------
    InvalidArgumentError                      Traceback (most recent call last)
    <ipython-input-107-ff6e40117cf7> in <module>
      3 a = tf.random.uniform([2, 1, 2, 3])
      4 b = tf.random.uniform([7, 9, 3, 2])
    ----> 5 c = tf.matmul(a, b)
      6 c.shape
    
    D:\S\Anaconda3_v3\lib\site-packages\tensorflow_core\python\util\dispatch.py in wrapper(*args, **kwargs)
    178     """Call target, and fall back on dispatchers if there is a TypeError."""
    179     try:
    --> 180       return target(*args, **kwargs)
    181     except (TypeError, ValueError):
    182       # Note: convert_to_eager_tensor currently raises a ValueError, not a
    
    D:\S\Anaconda3_v3\lib\site-packages\tensorflow_core\python\ops\math_ops.py in matmul(a, b, transpose_a, transpose_b, adjoint_a, adjoint_b, a_is_sparse, b_is_sparse, name)
       2725         b = conj(b)
       2726         adjoint_b = True
    -> 2727       return batch_mat_mul_fn(a, b, adj_x=adjoint_a, adj_y=adjoint_b, name=name)
       2728
       2729     # Neither matmul nor sparse_matmul support adjoint, so we conjugate
    
    D:\S\Anaconda3_v3\lib\site-packages\tensorflow_core\python\ops\gen_math_ops.py in batch_mat_mul_v2(x, y, adj_x, adj_y, name)
       1700       else:
       1701         message = e.message
    -> 1702       _six.raise_from(_core._status_to_exception(e.code, message), None)
       1703   # Add nodes to the TensorFlow graph.
       1704   if adj_x is None:
    
    D:\S\Anaconda3_v3\lib\site-packages\six.py in raise_from(value, from_value)
    
    InvalidArgumentError: In[0] and In[1] must have compatible batch dimensions: [2,1,2,3] vs. [7,9,3,2] [Op:BatchMatMulV2] name: MatMul/

Even when a and b have different numbers of dimensions it still works; the rule is right-alignment: shapes are lined up from their trailing dimensions, and missing leading dimensions are treated as 1 (the same as NumPy broadcasting).
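
A sketch of what right-alignment means in the [2, 1, 2, 3] x [7, 3, 2] case above (assuming TensorFlow 2.x eager execution): b's shape is effectively padded on the left to [1, 7, 3, 2] before broadcasting, so every b[j] is paired with every a[i, 0].

    import tensorflow as tf

    a = tf.random.uniform([2, 1, 2, 3])
    b = tf.random.uniform([7, 3, 2])
    c = tf.matmul(a, b)                # shape [2, 7, 2, 2]

    # c[i, j] is the 2-D product of a[i, 0] and b[j].
    ok = all(
        tf.reduce_all(tf.abs(c[i, j] - tf.matmul(a[i, 0], b[j])) < 1e-6).numpy()
        for i in range(2) for j in range(7)
    )
    print(ok)  # True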

Finally, consider the multi-dimensional case of tf.matmul(a, b, transpose_b=True):

    In [111]: import tensorflow as tf
     ...: import numpy as np
     ...: a = tf.random.uniform([2, 1, 2, 3])
     ...: b = tf.random.uniform([2, 1, 2, 3])
     ...: c = tf.matmul(a, b, transpose_b=True)
     ...: c.shape
    Out[111]: TensorShape([2, 1, 2, 2])
    
    In [112]: import tensorflow as tf
     ...: import numpy as np
     ...: a = tf.random.uniform([2, 1, 2, 3])
     ...: b = tf.random.uniform([1, 5, 2, 3])
     ...: c = tf.matmul(a, b, transpose_b=True)
     ...: c.shape
    Out[112]: TensorShape([2, 5, 2, 2])

transpose_b only transposes the last two dimensions of b, so that the two-dimensional matrix multiplication lines up; the batch dimensions broadcast exactly as before.
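
A quick sketch (assuming TensorFlow 2.x) showing that transpose_b=True is equivalent to explicitly transposing b's last two axes with tf.linalg.matrix_transpose, while the batch dimensions still broadcast as before:

    import tensorflow as tf

    a = tf.random.uniform([2, 1, 2, 3])
    b = tf.random.uniform([1, 5, 2, 3])

    c1 = tf.matmul(a, b, transpose_b=True)             # shape [2, 5, 2, 2]
    c2 = tf.matmul(a, tf.linalg.matrix_transpose(b))   # swap b's last two axes by hand

    print(c1.shape)                                        # (2, 5, 2, 2)
    print(tf.reduce_all(tf.abs(c1 - c2) < 1e-6).numpy())   # True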


If you found this useful, please give it a like. Thanks!
