RuntimeError: Expected tensor for argument #1 'input' to have the same device as tensor for argument #2 'weight'


RuntimeError: Expected tensor for argument #1 'input' to have the same device as tensor for argument #2 'weight'; but device 1 does not equal 0 (while checking arguments for cudnn_convolution)

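I'm recording this bug here along with the details. At first I assumed the problem was in the multi-GPU device configuration, but further digging showed that the cause was the way the model called its own internal functions.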

Most of what follows is adapted from a PyTorch issue.

Take the two modules below as an example: with DataParallel, testModule raises the error, while testModule2 does not.

```python
import torch
from torch import nn


class testModule(nn.Module):
    def __init__(self):
        super(testModule, self).__init__()
        self.g = nn.Conv2d(in_channels=1, out_channels=1,
                           kernel_size=1, stride=1, padding=0)
        # Stores a method bound to *this* instance: this is the culprit.
        self.operation_function = self._realOperation

    def forward(self, x):
        output = self.operation_function(x)
        return output

    def _realOperation(self, x):
        x = self.g(x)
        return x


class testModule2(nn.Module):
    def __init__(self):
        super(testModule2, self).__init__()
        self.g = nn.Conv2d(in_channels=1, out_channels=1,
                           kernel_size=1, stride=1, padding=0)

    def forward(self, x):
        # self.g is looked up on whichever replica is running.
        x = self.g(x)
        return x


if __name__ == '__main__':
    input = torch.rand(4, 1, 1, 1).cuda()
    # Move the models to the GPU before wrapping, so the script also runs on a single GPU.
    net = testModule().cuda()
    net2 = testModule2().cuda()
    gpu_num = torch.cuda.device_count()
    print('GPU NUM: {:2d}'.format(gpu_num))
    if gpu_num > 1:
        net = torch.nn.DataParallel(net, list(range(gpu_num)))
        net2 = torch.nn.DataParallel(net2, list(range(gpu_num)))
    out2 = net2(input)  # works
    print(out2.size())
    out = net(input)    # raises the RuntimeError when 2+ GPUs are used
    print(out.size())
```

```python
self.operation_function = self._realOperation
```

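The cause is this line in testModule, which stores a bound method as an ordinary module attribute. When DataParallel replicates the module onto the GPUs, non-tensor attributes like this are simply copied. As a result, the operation_function attribute of every replica refers to the same bound method, and that method is bound to the original instance. Every replica therefore calls the original module's self.g, whose parameters all live on GPU 0, so the replica running on GPU 1 feeds a GPU 1 input into GPU 0 weights and raises the error.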
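The mechanism can be reproduced without any GPU. The sketch below is a simplified stand-in, not PyTorch's actual code: `FakeConv`, `FakeModule` and `replicate` are hypothetical names, and `replicate` assumes (as recent PyTorch versions do roughly, via a shallow copy of the module's `__dict__`) that plain attributes are shared by reference while only the submodule is replaced with a per-device copy. Devices are plain integers here.

```python
import copy


class FakeConv:
    """Hypothetical stand-in for nn.Conv2d: only remembers its device."""

    def __init__(self, device):
        self.device = device

    def __call__(self, x_device):
        # Mimic the device check that raises the RuntimeError in PyTorch.
        if x_device != self.device:
            raise RuntimeError(
                f"input on device {x_device} but weight on device {self.device}")
        return x_device


class FakeModule:
    """Mirrors testModule: stores a bound method in __init__."""

    def __init__(self):
        self.g = FakeConv(device=0)
        # The problematic line: a method bound to *this* instance.
        self.operation_function = self._real_operation

    def _real_operation(self, x_device):
        return self.g(x_device)

    def forward(self, x_device):
        return self.operation_function(x_device)


def replicate(module, device):
    """Hypothetical replication: shallow copy, then give the replica its own g."""
    replica = copy.copy(module)          # plain attributes are shared by reference
    replica.g = FakeConv(device=device)  # only the submodule is "moved" to `device`
    return replica


original = FakeModule()
replica = replicate(original, device=1)

# The replica's stored method is still bound to the original instance...
print(replica.operation_function.__self__ is original)
# ...so it uses the original g on device 0 and fails for an input on device 1.
try:
    replica.forward(1)
except RuntimeError as err:
    print("RuntimeError:", err)
```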

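In testModule2, forward looks up self.g dynamically each time it runs on a replica, so self.g resolves to that replica's own g, whose parameters have already been copied to the corresponding GPU.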
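For contrast, here is the same hypothetical setup (`FakeConv`, `FakeModule2` and `replicate` are illustrative names, not PyTorch APIs) with the testModule2 pattern: because `forward` reads `self.g` at call time, each replica uses its own copy and no device mismatch occurs.

```python
import copy


class FakeConv:
    """Hypothetical stand-in for nn.Conv2d: only remembers its device."""

    def __init__(self, device):
        self.device = device

    def __call__(self, x_device):
        if x_device != self.device:
            raise RuntimeError(
                f"input on device {x_device} but weight on device {self.device}")
        return x_device


class FakeModule2:
    """Mirrors testModule2: forward looks up self.g at call time."""

    def __init__(self):
        self.g = FakeConv(device=0)

    def forward(self, x_device):
        # `self` is whichever object forward is called on, so a replica
        # resolves self.g to its own per-device copy.
        return self.g(x_device)


def replicate(module, device):
    """Hypothetical replication: shallow copy, then give the replica its own g."""
    replica = copy.copy(module)
    replica.g = FakeConv(device=device)
    return replica


original = FakeModule2()
replicas = [replicate(original, device=d) for d in (0, 1)]
# Each replica receives an input on its own device and succeeds.
print([r.forward(d) for d, r in enumerate(replicas)])
```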

Solution:

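Do any method binding inside forward instead of storing it in __init__. In other words, custom operations such as self._realOperation should be called from within forward.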
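A minimal sketch of that fix applied to testModule, assuming only that PyTorch is installed. `testModuleFixed` is a hypothetical name, and the CPU fallback and default `DataParallel` wrapping are additions so the snippet also runs without multiple GPUs.

```python
import torch
from torch import nn


class testModuleFixed(nn.Module):
    """Hypothetical variant of testModule with the call moved into forward."""

    def __init__(self):
        super().__init__()
        self.g = nn.Conv2d(in_channels=1, out_channels=1,
                           kernel_size=1, stride=1, padding=0)
        # No bound method is stored as an attribute any more.

    def forward(self, x):
        # self is the replica DataParallel is running, so self._realOperation
        # and self.g resolve to that replica's parameters on its own GPU.
        return self._realOperation(x)

    def _realOperation(self, x):
        return self.g(x)


if __name__ == '__main__':
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    net = testModuleFixed().to(device)
    if torch.cuda.device_count() > 1:
        net = nn.DataParallel(net)  # uses all visible GPUs by default
    out = net(torch.rand(4, 1, 1, 1, device=device))
    print(out.size())
```

Since `_realOperation` is now looked up through `self` inside `forward`, it behaves like testModule2.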

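Alternatively, self.operation_function could be written as another method of the class instead (I tried this briefly and it did not seem to work), so this approach may have problems of its own (I'm not sure whether it does).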
