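Deep Learning Paper: Learning to Resize Images for Computer Vision Tasks and Its PyTorch Implementation
This post covers a deep-learning-based image resizing method and its application to computer vision tasks, along with a corresponding PyTorch implementation.
Paper: Learning to Resize Images for Computer Vision Tasks
PDF: https://arxiv.org/pdf/2103.09950.pdf
PyTorch code: https://github.com/shanglianlm0525/CvPytorch
PyTorch code: https://github.com/shanglianlm0525/PyTorch-Networks
1 Overview
Resizing plays a key role in image preprocessing: it brings images of different sizes to a common scale. However, the resize techniques in common use are still fairly dated, and they do not adapt to the data or to the task at hand. The Google Research team developed a learned, adaptive resizer module that improves performance while requiring only a small change to the preprocessing stage. According to the paper, replacing the fixed resizer with this module improves performance on computer vision tasks.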

2 Resizer
The architecture of the proposed resizer model is shown in the figure below:

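The architecture has two key characteristics: (1) bilinear feature resizing, and (2) a skip connection, which makes it possible to combine the bilinearly resized image with the CNN features.
The first characteristic allows features computed at the original resolution to be incorporated into the model. The skip connection makes learning easier, because the resizer model can pass the bilinearly resized image directly to the baseline task.
Unlike a typical encoder-decoder architecture, the proposed architecture can resize an image to any target size and aspect ratio. In addition, the performance of the learned resizer depends very little on the choice of bilinear interpolation, which suggests that the bilinear resizer can easily be swapped for other off-the-shelf methods.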
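
The two characteristics above can be illustrated with a minimal sketch. It is not the paper's model: the layer sizes, the `bilinear_resize` helper, and the zero initialization of the output convolution are assumptions made purely for illustration. Features are computed at the original resolution, resized bilinearly to an arbitrary target size, and added to the bilinearly resized image through a skip connection. With the output convolution zeroed, the learned branch contributes nothing, so the result equals the plain bilinear resize.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def bilinear_resize(t, size):
    # Hypothetical helper; align_corners=False is an assumption, not taken from the paper.
    return F.interpolate(t, size=size, mode="bilinear", align_corners=False)


torch.manual_seed(0)
image = torch.rand(1, 3, 480, 640)   # input at its original resolution
target_size = (224, 320)             # arbitrary target (H, W) with a different aspect ratio

# (1) Bilinear feature resizing: features come from the original-resolution image.
extract = nn.Conv2d(3, 16, kernel_size=7, padding=3)
resized_features = bilinear_resize(extract(image), target_size)

# Project the resized features back to image channels.
to_image = nn.Conv2d(16, 3, kernel_size=7, padding=3)
nn.init.zeros_(to_image.weight)      # zero init is for illustration only
nn.init.zeros_(to_image.bias)

# (2) Skip connection: learned residual plus the bilinearly resized image.
baseline = bilinear_resize(image, target_size)
output = baseline + to_image(resized_features)

print(tuple(output.shape))
print(torch.equal(output, baseline))
```

Because the learned branch is only a residual on top of the bilinear path, the module can fall back to ordinary bilinear resizing, which is one way to read the claim that the skip connection makes learning easier.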
3 Experiments

PyTorch code:
    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    from functools import partial
    
    """
    Learning to Resize Images for Computer Vision Tasks
    https://arxiv.org/pdf/2103.09950.pdf
    """
    
    def conv1x1(in_chs, out_chs=16):
        return nn.Conv2d(in_chs, out_chs, kernel_size=1, stride=1, padding=0)
    
    
    def conv3x3(in_chs, out_chs=16):
        return nn.Conv2d(in_chs, out_chs, kernel_size=3, stride=1, padding=1)
    
    
    def conv7x7(in_chs, out_chs=16):
        return nn.Conv2d(in_chs, out_chs, kernel_size=7, stride=1, padding=3)
    
    
    class ResBlock(nn.Module):
        # The residual addition in forward() requires in_chs == out_chs.
        def __init__(self, in_chs, out_chs=16):
            super(ResBlock, self).__init__()
            self.layers = nn.Sequential(
                conv3x3(in_chs, out_chs),
                nn.BatchNorm2d(out_chs),
                nn.LeakyReLU(0.2),
                conv3x3(out_chs, out_chs),
                nn.BatchNorm2d(out_chs)
            )
    
        def forward(self, x):
            identity = x
            out = self.layers(x)
            out += identity
            return out
    
    
    class Resizer(nn.Module):
        def __init__(self, in_chs, out_size, n_filters=16, n_res_blocks=1, mode='bilinear'):
            super(Resizer, self).__init__()
            # F.interpolate only accepts align_corners for the interpolating modes listed here.
            self.interpolate_layer = partial(F.interpolate, size=out_size, mode=mode,
                align_corners=(True if mode in ('linear', 'bilinear', 'bicubic', 'trilinear') else None))
            # Feature extraction at the original input resolution.
            self.conv_layers = nn.Sequential(
                conv7x7(in_chs, n_filters),
                nn.LeakyReLU(0.2),
                conv1x1(n_filters, n_filters),
                nn.LeakyReLU(0.2),
                nn.BatchNorm2d(n_filters)
            )
            # Residual blocks applied at the target resolution.
            self.residual_layers = nn.Sequential()
            for i in range(n_res_blocks):
                self.residual_layers.add_module(f'res{i}', ResBlock(n_filters, n_filters))
            self.residual_layers.add_module('conv3x3', conv3x3(n_filters, n_filters))
            self.residual_layers.add_module('bn', nn.BatchNorm2d(n_filters))
            self.final_conv = conv7x7(n_filters, in_chs)
    
        def forward(self, x):
            # Skip path: the input image resized directly.
            identity = self.interpolate_layer(x)
            conv_out = self.conv_layers(x)
            # Feature resizing to the target size.
            conv_out = self.interpolate_layer(conv_out)
            conv_out_identity = conv_out
            res_out = self.residual_layers(conv_out)
            res_out += conv_out_identity
            out = self.final_conv(res_out)
            out += identity
            return out
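
As a rough sketch of how a learned resizer fits into the preprocessing stage, the example below places a resizer in front of a toy classifier and runs one joint training step, so that the task loss also updates the resizer. Everything here is an assumption for illustration: `TinyResizer` is a simplified, hypothetical stand-in (not the `Resizer` class above and not the paper's exact model), and the classifier, input sizes, optimizer, and random data are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyResizer(nn.Module):
    """Hypothetical, simplified stand-in for a learned resizer."""

    def __init__(self, in_chs, out_size, n_filters=16):
        super().__init__()
        self.out_size = out_size
        self.features = nn.Sequential(
            nn.Conv2d(in_chs, n_filters, kernel_size=7, padding=3),
            nn.LeakyReLU(0.2),
            nn.BatchNorm2d(n_filters),
        )
        self.to_image = nn.Conv2d(n_filters, in_chs, kernel_size=7, padding=3)

    def _resize(self, t):
        return F.interpolate(t, size=self.out_size, mode="bilinear", align_corners=False)

    def forward(self, x):
        # Bilinear skip path plus a learned residual built from resized features.
        return self._resize(x) + self.to_image(self._resize(self.features(x)))


torch.manual_seed(0)
resizer = TinyResizer(in_chs=3, out_size=(64, 64))

# Toy task network (placeholder); any model that accepts 64x64 inputs would do.
classifier = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 10),
)

# A single optimizer over both modules, so the resizer is trained by the task loss.
params = list(resizer.parameters()) + list(classifier.parameters())
optimizer = torch.optim.SGD(params, lr=0.01)

images = torch.rand(4, 3, 96, 128)       # random stand-in batch
labels = torch.randint(0, 10, (4,))      # random stand-in labels

resized = resizer(images)                # learned resize replaces a fixed resize step
logits = classifier(resized)
loss = F.cross_entropy(logits, labels)

optimizer.zero_grad()
loss.backward()
optimizer.step()

print(tuple(resized.shape), tuple(logits.shape))
```

To try the full model instead, `TinyResizer` can be replaced by `Resizer(in_chs=3, out_size=(64, 64))` from the listing above; the rest of the training step stays the same.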