
Computing matrix dimensions in TF: tf.matmul on multi-dimensional tensors


Everyone learns 2-D matrix multiplication in linear algebra, but tf.matmul can also handle multi-dimensional tensors. For example:

    import tensorflow as tf
    import numpy as np

    a = tf.random.uniform([2, 1, 2, 3])
    b = tf.random.uniform([1, 3, 3, 2])
    c = tf.matmul(a, b)

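What is c?

The conclusion first: no matter how many dimensions the tensors have, tf.matmul performs a matrix multiplication over the last two dimensions and repeats it across the remaining dimensions.

A multi-dimensional tf.matmul(a, b) places two requirements on the shapes:

1. The size of a along axis=-1 must equal the size of b along axis=-2. In the example above, the last entry of a's shape [2, 1, 2, 3] is 3, and the second-to-last entry of b's shape [1, 3, 3, 2] is also 3.

2. For every other axis (everything except axis=-1 and axis=-2), the sizes of a and b must either be equal, or one of them must be 1.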
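The two requirements can be written down as a small shape check. The following is only a sketch in plain Python: `matmul_output_shape` is a hypothetical helper introduced here for illustration (it is not part of TensorFlow), and it assumes both shapes have the same number of dimensions.

```python
def matmul_output_shape(a_shape, b_shape):
    """Predict the shape of matmul(a, b) for two shapes of equal rank (>= 2).

    Hypothetical helper: it mirrors the two rules in the text,
    not TensorFlow's actual implementation.
    """
    if len(a_shape) != len(b_shape) or len(a_shape) < 2:
        raise ValueError("this sketch only handles equal ranks of at least 2")

    # Rule 1: the size of a along axis=-1 must equal the size of b along axis=-2.
    if a_shape[-1] != b_shape[-2]:
        raise ValueError(f"inner dimensions differ: {a_shape[-1]} vs {b_shape[-2]}")

    # Rule 2: every other axis must be equal on both sides, or one side must be 1.
    batch = []
    for da, db in zip(a_shape[:-2], b_shape[:-2]):
        if da != db and 1 not in (da, db):
            raise ValueError(f"incompatible batch dimensions: {da} vs {db}")
        batch.append(db if da == 1 else da)

    # The last two axes follow ordinary 2-D matrix multiplication.
    return batch + [a_shape[-2], b_shape[-1]]


print(matmul_output_shape([2, 1, 2, 3], [1, 3, 3, 2]))  # [2, 3, 2, 2]
```

For the opening example, the batch axes (2, 1) and (1, 3) combine to (2, 3), and the last two axes give a 2 x 2 matrix.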

For example, here is tf.matmul applied to a tensor of shape [3, 2, 3] and a tensor of shape [3, 3, 2]:

    In [84]: import tensorflow as tf
        ...: import numpy as np
        ...: a = tf.random.uniform([3, 2, 3])
        ...: b = tf.random.uniform([3, 3, 2])
        ...: c = tf.matmul(a, b)
        ...: c.shape
    Out[84]: TensorShape([3, 2, 2])

    In [87]: tf.matmul(a[0],b[0])
    Out[87]:
    <tf.Tensor: id=374, shape=(2, 2), dtype=float32, numpy=
    array([[1.4506222 , 1.323427  ],
           [0.28268352, 0.2917934 ]], dtype=float32)>

    In [88]: tf.matmul(a[1],b[1])
    Out[88]:
    <tf.Tensor: id=383, shape=(2, 2), dtype=float32, numpy=
    array([[1.0278544 , 0.4219831 ],
           [0.865297  , 0.87740964]], dtype=float32)>

    In [89]: c
    Out[89]:
    <tf.Tensor: id=365, shape=(3, 2, 2), dtype=float32, numpy=
    array([[[1.4506222 , 1.323427  ],
            [0.28268352, 0.2917934 ]],

           [[1.0278544 , 0.4219831 ],
            [0.865297  , 0.8774096 ]],

           [[0.5752927 , 0.13066964],
            [0.5343988 , 0.2741483 ]]], dtype=float32)>

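As the output shows, tf.matmul on a [3, 2, 3] tensor and a [3, 3, 2] tensor can be understood in two steps:

Step 1: over axis=1 and axis=2, perform the 2-D matrix multiplication of a [2, 3] matrix with a [3, 2] matrix, which gives a [2, 2] result.

Step 2: along axis=0, take the corresponding slices of a and b one pair at a time and apply step 1 to each pair. Stacking the results gives a tensor of shape [3, 2, 2].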
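To make the two steps concrete, here is a minimal sketch in plain Python that uses nested lists instead of tensors. The names `matmul_2d` and `batched_matmul_3d` are made up for this illustration, and the sketch assumes a and b have the same size along axis=0.

```python
def matmul_2d(x, y):
    """Step 1: ordinary 2-D matrix product of nested lists x (m x k) and y (k x n)."""
    return [[sum(x[i][p] * y[p][j] for p in range(len(y)))
             for j in range(len(y[0]))]
            for i in range(len(x))]


def batched_matmul_3d(a, b):
    """Step 2: walk along axis=0 and apply the 2-D product to each pair of slices."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal sizes along axis=0")
    return [matmul_2d(a_i, b_i) for a_i, b_i in zip(a, b)]


# a has shape [2, 2, 3] and b has shape [2, 3, 2], so c has shape [2, 2, 2].
a = [[[1, 2, 3], [4, 5, 6]],
     [[1, 0, 0], [0, 1, 0]]]
b = [[[1, 0], [0, 1], [1, 1]],
     [[7, 8], [9, 10], [11, 12]]]

c = batched_matmul_3d(a, b)
print(c)  # [[[4, 5], [10, 11]], [[7, 8], [9, 10]]]
```

Each c[i] equals matmul_2d(a[i], b[i]), which is the same relationship the transcript shows between c and tf.matmul(a[0], b[0]), tf.matmul(a[1], b[1]).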

If the sizes of a and b along axis=0 do not match, an error is raised:

    In [95]: import tensorflow as tf
        ...: import numpy as np
        ...: a = tf.random.uniform([2, 2, 3])
        ...: b = tf.random.uniform([3, 3, 2])
        ...: c = tf.matmul(a, b)
        ...: c.shape
    ---------------------------------------------------------------------------
    InvalidArgumentError                      Traceback (most recent call last)
    <ipython-input-95-462c4976a35a> in <module>
          3 a = tf.random.uniform([2, 2, 3])
          4 b = tf.random.uniform([3, 3, 2])
    ----> 5 c = tf.matmul(a, b)
          6 c.shape
          7

    D:SAnaconda3_v3libsite-packagestensorflow_corepythonutildispatch.py in wrapper(*args, **kwargs)
        178     """Call target, and fall back on dispatchers if there is a TypeError."""
        179     try:
    --> 180       return target(*args, **kwargs)
        181     except (TypeError, ValueError):
        182       # Note: convert_to_eager_tensor currently raises a ValueError, not a

    D:SAnaconda3_v3libsite-packagestensorflow_corepythonopsmath_ops.py in matmul(a, b, transpose_a, transpose_b, adjoint_a, adjoint_b, a_is_sparse, b_is_sparse, name)
       2725         b = conj(b)
       2726         adjoint_b = True
    -> 2727       return batch_mat_mul_fn(a, b, adj_x=adjoint_a, adj_y=adjoint_b, name=name)
       2728
       2729     # Neither matmul nor sparse_matmul support adjoint, so we conjugate

    D:SAnaconda3_v3libsite-packagestensorflow_corepythonopsgen_math_ops.py in batch_mat_mul_v2(x, y, adj_x, adj_y, name)
       1700       else:
       1701         message = e.message
    -> 1702       _six.raise_from(_core._status_to_exception(e.code, message), None)
       1703   # Add nodes to the TensorFlow graph.
       1704   if adj_x is None:

    D:SAnaconda3_v3libsite-packagessix.py in raise_from(value, from_value)

    InvalidArgumentError: In[0] and In[1] must have compatible batch dimensions: [2,2,3] vs. [3,3,2] [Op:BatchMatMulV2] name: MatMul/

However, when one of a and b has size 1 along axis=0, no error is raised:

    In [90]: import tensorflow as tf
        ...: import numpy as np
        ...: a = tf.random.uniform([1, 2, 3])
        ...: b = tf.random.uniform([3, 3, 2])
        ...: c = tf.matmul(a, b)
        ...: c.shape
    Out[90]: TensorShape([3, 2, 2])

    In [91]: c
    Out[91]:
    <tf.Tensor: id=398, shape=(3, 2, 2), dtype=float32, numpy=
    array([[[0.59542704, 0.60751694],
            [0.19115494, 0.36344892]],

           [[1.0542538 , 0.75257593],
            [0.26940605, 0.24408351]],

           [[1.1716111 , 0.4058628 ],
            [0.09086016, 0.28043625]]], dtype=float32)>

    In [92]: tf.matmul(a[0],b[0])
    Out[92]:
    <tf.Tensor: id=407, shape=(2, 2), dtype=float32, numpy=
    array([[0.59542704, 0.60751694],
           [0.19115494, 0.36344892]], dtype=float32)>

    In [93]: tf.matmul(a[0],b[1])
    Out[93]:
    <tf.Tensor: id=416, shape=(2, 2), dtype=float32, numpy=
    array([[1.0542538 , 0.7525759 ],
           [0.26940605, 0.2440835 ]], dtype=float32)>

    In [94]: tf.matmul(a[0],b[2])
    Out[94]:
    <tf.Tensor: id=425, shape=(2, 2), dtype=float32, numpy=
    array([[1.1716112 , 0.4058628 ],
           [0.09086016, 0.28043625]], dtype=float32)>

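The same steps still apply: first multiply over the last two dimensions, then assemble the results along axis=0. The only difference is that a has size 1 along axis=0, so its single slice a[0] is reused and multiplied with b[0], b[1] and b[2] in turn. The code and outputs above confirm this.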
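The reuse of the size-1 slice can be spelled out as an index mapping. The sketch below is plain Python with a hypothetical helper, `paired_indices`, that lists which slice of a is multiplied with which slice of b along axis=0. It describes the behaviour observed above and is not TensorFlow code.

```python
def paired_indices(a_size, b_size):
    """List the (a index, b index) pairs along axis=0 that get multiplied."""
    if a_size != b_size and 1 not in (a_size, b_size):
        raise ValueError(f"incompatible sizes along axis=0: {a_size} vs {b_size}")
    n = b_size if a_size == 1 else a_size
    # A side with size 1 always contributes its only slice, index 0.
    return [(0 if a_size == 1 else i, 0 if b_size == 1 else i) for i in range(n)]


print(paired_indices(1, 3))  # [(0, 0), (0, 1), (0, 2)]
print(paired_indices(3, 3))  # [(0, 0), (1, 1), (2, 2)]
```

The first line corresponds to the [1, 2, 3] by [3, 3, 2] example, where c[i] equals tf.matmul(a[0], b[i]). The second corresponds to the earlier [3, 2, 3] by [3, 3, 2] example.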

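This gives the conclusion for the three-dimensional case:

First multiply the matrices formed by the last two dimensions, then repeat across the remaining dimension.

A three-dimensional tf.matmul(a, b) places two requirements on the shapes:

1. The size of a along axis=2 must equal the size of b along axis=1 (the usual inner-dimension rule of matrix multiplication).

2. The sizes of a and b along axis=0 must either be equal, or one of them must be 1.

Now consider higher dimensions, for example the four-dimensional case.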

    In [96]: import tensorflow as tf
        ...: import numpy as np
        ...: a = tf.random.uniform([2, 1, 2, 3])
        ...: b = tf.random.uniform([2, 3, 3, 2])
        ...: c = tf.matmul(a, b)
        ...: c.shape
    Out[96]: TensorShape([2, 3, 2, 2])

    In [97]: c
    Out[97]:
    <tf.Tensor: id=454, shape=(2, 3, 2, 2), dtype=float32, numpy=
    array([[[[1.0685383 , 1.9015994 ],
             [1.1457413 , 1.5246255 ]],

            [[0.953201  , 1.5544493 ],
             [0.7639411 , 1.4360913 ]],

            [[0.67427766, 0.49847895],
             [0.499685  , 0.39281937]]],


           [[[0.42752475, 0.7453967 ],
             [0.3735991 , 0.74812794]],

            [[0.54442215, 0.6510606 ],
             [0.6632798 , 0.38497943]],

            [[0.3459217 , 0.96300673],
             [0.45035997, 0.90772474]]]], dtype=float32)>

    In [98]: tf.matmul(a[0],b[0])
    Out[98]:
    <tf.Tensor: id=463, shape=(3, 2, 2), dtype=float32, numpy=
    array([[[1.0685383 , 1.9015994 ],
            [1.1457413 , 1.5246255 ]],

           [[0.953201  , 1.5544493 ],
            [0.7639411 , 1.4360913 ]],

           [[0.67427766, 0.49847895],
            [0.499685  , 0.39281937]]], dtype=float32)>

    In [99]: tf.matmul(a[1],b[1])
    Out[99]:
    <tf.Tensor: id=472, shape=(3, 2, 2), dtype=float32, numpy=
    array([[[0.42752475, 0.7453967 ],
            [0.3735991 , 0.74812794]],

           [[0.54442215, 0.6510606 ],
            [0.6632798 , 0.38497943]],

           [[0.3459217 , 0.96300673],
            [0.45035997, 0.90772474]]], dtype=float32)>

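The effect is the same as in the three-dimensional case: tf.matmul is applied level by level, and every level ultimately reduces to a 2-D matrix multiplication over the last two dimensions.

Likewise, it also works when one of the sizes along axis=0 is 1: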

    In [100]: import tensorflow as tf
         ...: import numpy as np
         ...: a = tf.random.uniform([2, 1, 2, 3])
         ...: b = tf.random.uniform([1, 3, 3, 2])
         ...: c = tf.matmul(a, b)
         ...: c.shape
    Out[100]: TensorShape([2, 3, 2, 2])

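There is no need to go through the details again.

Whatever the number of dimensions, tf.matmul first multiplies over the last two dimensions and then repeats that operation across all the other dimensions.

A multi-dimensional tf.matmul(a, b) places two requirements on the shapes:

1. The size of a along axis=-1 must equal the size of b along axis=-2.

2. For every other axis (everything except axis=-1 and axis=-2), the sizes of a and b must either be equal, or one of them must be 1.

Here are a few more examples in which the number of dimensions differs, to build intuition: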

    In [105]: import tensorflow as tf
         ...: import numpy as np
         ...: a = tf.random.uniform([2, 1, 2, 3])
         ...: b = tf.random.uniform([1, 3, 2])
         ...: c = tf.matmul(a, b)
         ...: c.shape
    Out[105]: TensorShape([2, 1, 2, 2])

    In [106]: import tensorflow as tf
         ...: import numpy as np
         ...: a = tf.random.uniform([2, 1, 2, 3])
         ...: b = tf.random.uniform([7, 3, 2])
         ...: c = tf.matmul(a, b)
         ...: c.shape
    Out[106]: TensorShape([2, 7, 2, 2])

    In [107]: import tensorflow as tf
         ...: import numpy as np
         ...: a = tf.random.uniform([2, 1, 2, 3])
         ...: b = tf.random.uniform([7, 9, 3, 2])
         ...: c = tf.matmul(a, b)
         ...: c.shape
    ---------------------------------------------------------------------------
    InvalidArgumentError                      Traceback (most recent call last)
    <ipython-input-107-ff6e40117cf7> in <module>
          3 a = tf.random.uniform([2, 1, 2, 3])
          4 b = tf.random.uniform([7, 9, 3, 2])
    ----> 5 c = tf.matmul(a, b)
          6 c.shape

    D:SAnaconda3_v3libsite-packagestensorflow_corepythonutildispatch.py in wrapper(*args, **kwargs)
        178     """Call target, and fall back on dispatchers if there is a TypeError."""
        179     try:
    --> 180       return target(*args, **kwargs)
        181     except (TypeError, ValueError):
        182       # Note: convert_to_eager_tensor currently raises a ValueError, not a

    D:SAnaconda3_v3libsite-packagestensorflow_corepythonopsmath_ops.py in matmul(a, b, transpose_a, transpose_b, adjoint_a, adjoint_b, a_is_sparse, b_is_sparse, name)
       2725         b = conj(b)
       2726         adjoint_b = True
    -> 2727       return batch_mat_mul_fn(a, b, adj_x=adjoint_a, adj_y=adjoint_b, name=name)
       2728
       2729     # Neither matmul nor sparse_matmul support adjoint, so we conjugate

    D:SAnaconda3_v3libsite-packagestensorflow_corepythonopsgen_math_ops.py in batch_mat_mul_v2(x, y, adj_x, adj_y, name)
       1700       else:
       1701         message = e.message
    -> 1702       _six.raise_from(_core._status_to_exception(e.code, message), None)
       1703   # Add nodes to the TensorFlow graph.
       1704   if adj_x is None:

    D:SAnaconda3_v3libsite-packagessix.py in raise_from(value, from_value)

    InvalidArgumentError: In[0] and In[1] must have compatible batch dimensions: [2,1,2,3] vs. [7,9,3,2] [Op:BatchMatMulV2] name: MatMul/

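tf.matmul also works when a and b have different numbers of dimensions. The rule is that the shapes are aligned from the right.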
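One way to read "aligned from the right" is that the shorter shape is padded with leading 1s before the two rules are applied. The sketch below follows that reading in plain Python. Both `align_right` and `broadcast_matmul_shape` are hypothetical helpers written for this illustration, and they assume every shape has at least two dimensions.

```python
def align_right(a_shape, b_shape):
    """Pad the shorter shape with leading 1s so both shapes have the same rank."""
    rank = max(len(a_shape), len(b_shape))
    a_pad = [1] * (rank - len(a_shape)) + list(a_shape)
    b_pad = [1] * (rank - len(b_shape)) + list(b_shape)
    return a_pad, b_pad


def broadcast_matmul_shape(a_shape, b_shape):
    """Predict the result shape after right-aligning the two shapes."""
    a_pad, b_pad = align_right(a_shape, b_shape)
    # Inner dimensions must agree, as in 2-D matrix multiplication.
    if a_pad[-1] != b_pad[-2]:
        raise ValueError(f"inner dimensions differ: {a_pad[-1]} vs {b_pad[-2]}")
    batch = []
    for da, db in zip(a_pad[:-2], b_pad[:-2]):
        # Remaining axes must be equal, or one of them must be 1.
        if da != db and 1 not in (da, db):
            raise ValueError(f"incompatible batch dimensions: {da} vs {db}")
        batch.append(db if da == 1 else da)
    return batch + [a_pad[-2], b_pad[-1]]


print(align_right([2, 1, 2, 3], [7, 3, 2]))             # ([2, 1, 2, 3], [1, 7, 3, 2])
print(broadcast_matmul_shape([2, 1, 2, 3], [1, 3, 2]))  # [2, 1, 2, 2]
print(broadcast_matmul_shape([2, 1, 2, 3], [7, 3, 2]))  # [2, 7, 2, 2]
```

Under the same reading, [2, 1, 2, 3] against [7, 9, 3, 2] is rejected because the leading axes 2 and 7 are neither equal nor 1, which agrees with the error in the last transcript.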

Next, consider the multi-dimensional case of tf.matmul(a, b, transpose_b=True):

    In [111]: import tensorflow as tf
         ...: import numpy as np
         ...: a = tf.random.uniform([2, 1, 2, 3])
         ...: b = tf.random.uniform([2, 1, 2, 3])
         ...: c = tf.matmul(a, b, transpose_b=True)
         ...: c.shape
    Out[111]: TensorShape([2, 1, 2, 2])

    In [112]: import tensorflow as tf
         ...: import numpy as np
         ...: a = tf.random.uniform([2, 1, 2, 3])
         ...: b = tf.random.uniform([1, 5, 2, 3])
         ...: c = tf.matmul(a, b, transpose_b=True)
         ...: c.shape
    Out[112]: TensorShape([2, 5, 2, 2])

transpose_b=True only transposes the last two dimensions of b, so that the 2-D matrix multiplication lines up.
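In terms of shapes, this amounts to swapping the last two entries of b's shape before the inner-dimension check. A small sketch in plain Python, where `swap_last_two` is a hypothetical helper used only for illustration:

```python
def swap_last_two(shape):
    """Shape of a tensor after transposing only its last two axes."""
    return list(shape[:-2]) + [shape[-1], shape[-2]]


a_shape = [2, 1, 2, 3]
b_shape = [1, 5, 2, 3]

# Without the transpose the inner dimensions disagree:
# a's axis=-1 has size 3, but b's axis=-2 has size 2.
print(a_shape[-1] == b_shape[-2])  # False

# With transpose_b=True, b is treated as if its shape were [1, 5, 3, 2].
b_eff = swap_last_two(b_shape)
print(b_eff)                       # [1, 5, 3, 2]
print(a_shape[-1] == b_eff[-2])    # True
```

The leading axes (1 and 5 here) are untouched by the transpose and still follow the "equal or 1" rule, which is why the result in the transcript has shape [2, 5, 2, 2].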

-----------------------------------------------------------

If you found this useful, please give it a like. Thanks!
