Image-to-Image Translation with Conditional Adversarial Networks: Paper Notes


Venue and date: Computer Vision and Pattern Recognition (CVPR), CCF-A, Nov 2016

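Introduction: Many tasks in vision (semantic segmentation, converting satellite images to maps) and in graphics (image generation, colorization) can be framed as image-to-image translation. This paper proposes a general-purpose approach to such problems: conditional GANs (cGANs).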

Contents

Main contributions
Model
- Overview
- Architecture
  - Generator
    - Encoder-Decoder
    - U-Net decoder
  - Discriminator
- Objective
Limitations
Analysis and conclusions
- Ablation studies
  - L1 and cGAN are both needed
  - U-Net outperforms the plain Encoder-Decoder
  - Patch size affects the results; $70\times 70$ works best
- Other notes

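Main contributions

Shows that conditional GANs give good results on many image-to-image translation tasks.
Proposes a framework that produces reasonable results, and analyzes how much each of its components matters.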

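Model

Overview

Unlike an ordinary GAN, a cGAN uses the input image x in addition to the random noise z, and x is given to both the Generator and the Discriminator. In other words, a GAN learns a mapping from noise z to output y, while a conditional GAN learns a mapping from image x and noise z to output y:
$G: \{x, z\} \to y$

The model has two parts:

Generator: a U-Net rather than a plain Encoder-Decoder.
Discriminator: a PatchGAN, which judges each $70 \times 70$ patch of the image as real or fake and therefore outputs a matrix of scores rather than a single value (unlike the discriminator of the original GAN).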

Architecture

Generator

Based on U-Net: an Encoder-Decoder plus skip connections.

Let Ck denote a Convolution-BatchNorm-ReLU layer with k filters. CDk denotes a Convolution-BatchNorm-Dropout-ReLU layer with a dropout rate of 50%. All convolutions are 4 × 4 spatial filters applied with stride 2. Convolutions in the encoder, and in the discriminator, downsample by a factor of 2, whereas in the decoder they upsample by a factor of 2.

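Encoder-Decoder

Structure
encoder: C64-C128-C256-C512-C512-C512-C512-C512
decoder: CD512-CD512-CD512-C512-C256-C128-C64

After the last decoder layer, a convolution maps to the number of output channels (usually 3, but 2 for colorization), followed by a Tanh activation.
Exception to the notation above: BatchNorm is not applied to the first encoder layer (C64).
The encoder uses LeakyReLU with slope 0.2; the decoder uses plain ReLU.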
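To make the effect of eight stride-2 layers concrete, the sketch below walks the encoder listing and records the spatial size after each layer. The 256×256 input and the padding of 1 are my assumptions (these notes do not state them), and `conv_out`, `deconv_out`, and `encoder_sizes` are illustrative helper names.

```python
# Spatial size along one axis through the encoder listed above.
# Assumptions (not stated in these notes): 256x256 input, padding 1.

ENCODER = [64, 128, 256, 512, 512, 512, 512, 512]  # filters per Ck layer


def conv_out(n, kernel=4, stride=2, pad=1):
    """Output size of a strided convolution along one spatial axis."""
    return (n + 2 * pad - kernel) // stride + 1


def deconv_out(n, kernel=4, stride=2, pad=1):
    """Output size of a transposed convolution along one spatial axis."""
    return (n - 1) * stride - 2 * pad + kernel


def encoder_sizes(n):
    """Spatial size after each encoder layer, starting from an n x n input."""
    sizes = []
    for _ in ENCODER:
        n = conv_out(n)
        sizes.append(n)
    return sizes


print(encoder_sizes(256))  # [128, 64, 32, 16, 8, 4, 2, 1]
print(deconv_out(1))       # 2: each decoder layer doubles the size again
```

Under these assumptions the bottleneck is a single spatial position, so everything the plain Encoder-Decoder passes on to the decoder has to fit through it.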

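U-Net decoder

As I understand it, this is where the skip connections come in: activations from encoder layer i are concatenated onto decoder layer n-i, which changes the channel counts in the decoder.

Structure: CD512-CD1024-CD1024-C1024-C1024-C512-C256-C128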
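One reading that reproduces the numbers in this listing is that each entry is the channel count entering a decoder layer after the skip concatenation. This is my interpretation, not something these notes state. The sketch below derives those counts from the encoder listing and the plain decoder listing; `unet_decoder_inputs` is an illustrative name.

```python
# Channel counts entering each U-Net decoder layer, assuming each skip
# connection concatenates the mirrored encoder output onto the decoder output.

ENCODER = [64, 128, 256, 512, 512, 512, 512, 512]
DECODER = [512, 512, 512, 512, 256, 128, 64]  # plain decoder output widths


def unet_decoder_inputs(encoder, decoder):
    n = len(encoder)
    inputs = [encoder[-1]]  # first decoder layer sees only the bottleneck
    for i, width in enumerate(decoder):
        skip = encoder[n - 2 - i]     # mirrored encoder layer
        inputs.append(width + skip)   # concatenation adds the channel counts
    return inputs


print(unet_decoder_inputs(ENCODER, DECODER))
# [512, 1024, 1024, 1024, 1024, 512, 256, 128]
```

The result matches the eight entries of the U-Net decoder listing, which is why the widths there are roughly double those of the plain decoder.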

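Discriminator

Structure: C64-C128-C256-C512

After the last layer, a convolution maps to a 1-dimensional output, followed by a Sigmoid function.
Exception to the notation above: BatchNorm is not applied to the first layer (C64). All ReLUs are LeakyReLU with slope 0.2.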
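The "70" in the PatchGAN name is the receptive field of one output score, and it can be checked with the usual backward recurrence. The sketch below does so under assumed strides of 2, 2, 2, 1, 1 (four Ck layers plus the final convolution). That stride pattern is what I recall from common implementations; it is not given in these notes, which state stride 2 throughout. The 256×256 input and padding of 1 are assumptions too.

```python
# Receptive field and output-map size of a stack of convolutions.
# Each layer is (kernel, stride), listed from input to output.

PATCHGAN_70 = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]  # assumed strides


def receptive_field(layers):
    """Input pixels along one axis that influence a single output value."""
    rf = 1
    for kernel, stride in reversed(layers):
        rf = (rf - 1) * stride + kernel
    return rf


def output_size(n, layers, pad=1):
    """Spatial size of the score map for an n x n input."""
    for kernel, stride in layers:
        n = (n + 2 * pad - kernel) // stride + 1
    return n


print(receptive_field(PATCHGAN_70))    # 70
print(output_size(256, PATCHGAN_70))   # 30
print(receptive_field([(4, 2)] * 5))   # 94
```

With the assumed strides, each score covers a 70×70 patch and a 256×256 input gives a 30×30 score matrix. With stride 2 in all five layers the same recurrence gives 94, so the patch size depends on the stride assumption.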

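Objective

The conditional GAN objective is shown below; G tries to minimize it while D tries to maximize it.
$\mathcal{L}_{cGAN}(G, D)= \mathbb{E}_{x, y}[\log D(x, y)]+\mathbb{E}_{x, z}[\log (1-D(x, G(x, z)))]$

The objective of the ordinary (unconditional) GAN, whose discriminator does not observe x, is
$\mathcal{L}_{GAN}(G, D)= \mathbb{E}_{x, y}[\log D(y)]+\mathbb{E}_{x, z}[\log (1-D(G(x, z)))]$

To keep the generated output close to the ground truth, the paper adds an L1 distance term.
$\mathcal{L}_{L1}(G)=\mathbb{E}_{x,y,z}[\|y-G(x,z)\|_1]$

The final objective is:
$G^{*}=\arg \min _{G} \max _{D} \mathcal{L}_{cGAN}(G, D)+\lambda \mathcal{L}_{L1}(G)$

The adversarial loss makes the image look realistic, and the L1 term keeps the output close to the ground truth.
The L1 term is poor at capturing high-frequency detail but does capture low-frequency structure, while the PatchGAN captures the high frequencies and can also be understood as a form of texture/style loss.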
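As a rough numerical illustration of how the two terms combine, the sketch below evaluates the objective on hand-written numbers. It is not the paper's training code: the discriminator outputs are plain probabilities typed in by hand, images are flat lists of pixel values, and the value passed for `lam` is a placeholder assumption (these notes do not give λ). All function names are illustrative.

```python
import math


def cgan_value(d_real, d_fake):
    """Mean of log D(x, y) plus mean of log(1 - D(x, G(x, z)))."""
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def l1_distance(y, g_out):
    """L1 norm of the difference between target and generated pixels."""
    return sum(abs(a - b) for a, b in zip(y, g_out))


def full_objective(d_real, d_fake, y, g_out, lam):
    """L_cGAN + lambda * L_L1 for a single (target, output) pair."""
    return cgan_value(d_real, d_fake) + lam * l1_distance(y, g_out)


# A confident, correct discriminator scores higher than an undecided one,
# which is the direction D pushes in.
print(cgan_value([0.9], [0.1]) > cgan_value([0.5], [0.5]))  # True

# lam=100.0 is a placeholder; it only sets how strongly L1 is weighted.
print(full_objective([0.5], [0.5], [0.0, 1.0], [0.5, 0.5], lam=100.0))
```

G lowers the total both by fooling D (raising its score on generated images) and by shrinking the L1 term; λ sets the balance between the two.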

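Limitations

In specific domains such as colorization, the results are sometimes worse than those of purpose-built methods.
The model is effective on graphics and image-processing tasks where the output is highly detailed or photographic. For vision problems, however, the goal (predicting an output close to the ground truth) may be less ambiguous than in graphics tasks, and a reconstruction loss such as L1 is mostly sufficient.

Even so, the cGAN in this paper does manage the feat of generating labels.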

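Analysis and conclusions

Ablation studies

The ablations compare the L1 term against the GAN term, and the conditional GAN against the unconditional GAN.

L1 and cGAN are both needed

L1 alone gives blurry results; cGAN alone gives sharper results but introduces visual artifacts.
When uncertain, L1 leans toward an in-between (grayish) color, which reduces the colorfulness of the output, whereas a cGAN can learn that such colors are "unrealistic".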
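The pull toward an in-between color follows from a general fact: the constant that minimizes the summed absolute error over a set of values is their median. The toy sketch below checks this by grid search for one pixel whose plausible colors are mostly saturated. The sample values are made up for illustration, and each color is reduced to a single number in [0, 1].

```python
# Plausible values for one pixel: two dark-saturated, two bright-saturated,
# and one mid-gray option (made-up numbers for illustration).
SAMPLES = [0.1, 0.1, 0.5, 0.9, 0.9]


def l1_cost(c, samples):
    """Total absolute error if the model always predicts c."""
    return sum(abs(c - s) for s in samples)


def best_constant(samples, steps=100):
    """Grid search over [0, 1] for the prediction with the lowest L1 cost."""
    grid = [i / steps for i in range(steps + 1)]
    return min(grid, key=lambda c: l1_cost(c, samples))


print(best_constant(SAMPLES))  # 0.5
```

Four of the five plausible answers are saturated, yet the L1-optimal prediction is the gray one in the middle. An adversarial loss has no such incentive, since a gray output can be recognized as unlike the real data.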

U-Net outperforms the plain Encoder-Decoder

U-Net merely adds skip connections to the Encoder-Decoder, yet the results are better.

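Patch size affects the results; $70\times 70$ works best

The PixelGAN (a $1\times 1$ GAN) has no effect on spatial sharpness, but it does increase the colorfulness of the results.
The $16\times16$ PatchGAN promotes sharp outputs and improves FCN-scores, but it introduces tiling artifacts.

The FCN-score evaluates generated images by running a pretrained classifier on them and scoring its predictions.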
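As a sketch of the scoring step only, the snippet below computes per-pixel accuracy between a predicted label map and the label map the image was generated from. The pretrained classifier is left out entirely: `pred` stands in for its output, and the small label maps are made up. Per-pixel accuracy is one natural metric for this kind of score; treating it as the metric here is my assumption.

```python
def per_pixel_accuracy(pred, target):
    """Fraction of pixels whose predicted class equals the target class."""
    correct = 0
    total = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            correct += int(p == t)
            total += 1
    return correct / total


# Hypothetical 2x2 label maps: class ids per pixel.
target = [[1, 0],
          [0, 2]]
pred = [[1, 1],
        [0, 2]]  # stand-in for the classifier's output on a generated image

print(per_pixel_accuracy(pred, target))  # 0.75
```

The idea is that a realistic generated image should be classified the same way a real one would be, so a higher score suggests a more recognizable result.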

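The $286\times286$ ImageGAN does not visibly improve output quality and actually lowers the FCN-score. (Possibly because it has more parameters and greater depth, which makes it harder to train.)

In short, a $70\times70$ patch is the one to use from here on.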

Other notes

With only 400 training images, training takes under two hours on a single Pascal Titan X GPU and already gives decent results.

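Euclidean distance tends to produce blurry results. Designing a good loss function usually takes expert knowledge, but GANs can learn the loss function themselves.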
