
【GANs】Generative Adversarial Nets

  • 1 GAN
    • 1.1 Introduction to GANs
    • 1.2 Core Idea and Objective Function
    • 1.3 GAN Code
    • 1.4 Derivation of the Global Optimum
    • 1.5 Outlook on GAN Research Directions

1 GAN

1.1 Introduction to GANs

The Generative Adversarial Nets (GANs) framework trains two networks, a generator and a discriminator, against each other. The core concept is a minimax game: the generator tries to approximate the data distribution, while the discriminator tries to distinguish real samples from generated ones. This adversarial interaction reaches equilibrium when the generator replicates the training data distribution. The framework and its theoretical foundations were introduced in [Goodfellow et al., 2014], and numerous later variants have improved stability, efficiency, and sample diversity, expanding the practical utility of GANs in fields such as computer vision, natural language processing, and reinforcement learning. Despite this success, training dynamics, mode collapse, and evaluation metrics remain active areas of research.

Discriminative model (network): common label-based supervised learning tasks such as classification and regression use discriminative models. Such a model analyzes the data set to estimate the conditional distribution p(y|x) of the output variable y given the input variable x.

Generative model (network): a generative model instead aims to estimate the joint probability distribution p(x, y). By modeling the density of the input samples x, it can also derive the conditional distribution p(x|y).
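The two views are connected by the product rule and Bayes' rule (standard identities, not specific to GANs):

p(x, y) = p(x \mid y)\,p(y), \qquad p(y \mid x) = \frac{p(x, y)}{p(x)}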

In practice, discriminative models usually outperform generative models on classification tasks. But in more creative lines of work, notably, the generative model is the more important one.

1.2 Core Idea and Objective Function

The generator and discriminator compete continuously until the system reaches an equilibrium, at which point the discriminator can no longer tell the generator's output images from real ones.


What we want to learn is a data distribution like that of the training set: the generator maps noise z to samples, and its induced distribution p_g should approach the real data distribution.

However, as we know, seeking this probability distribution directly runs into extremely complex computations. This is exactly why the approach taken by the authors deserves attention.

The approximation process


x \sim P_{data}
z \sim P_z

  • Training D first (we want D to perform well):
    if x is from Pdata : D(x) ↑ --→ log(D(x)) ↑

    if z is from Pz : D(G(z)) ↓ --→ 1 - D(G(z)) ↑ <=> log(1 - D(G(z))) ↑

This gives the objective function for D:

\max_D \left( \mathbb{E}_{x \sim P_{data}}[\log D(x)] + \mathbb{E}_{z \sim P_z}[\log(1 - D(G(z)))] \right)
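As a quick illustration (a minimal NumPy sketch with a hypothetical toy discriminator, not the Keras model used later; `value_for_D` is an ad-hoc name), this quantity can be estimated on a batch by Monte Carlo:

    import numpy as np

    def value_for_D(D, real_batch, fake_batch):
        # Monte-Carlo estimate of E[log D(x)] + E[log(1 - D(G(z)))]
        return np.mean(np.log(D(real_batch))) + np.mean(np.log(1.0 - D(fake_batch)))

    # toy 1-D example: D is a sigmoid stand-in that maps a sample to a probability
    D = lambda x: 1.0 / (1.0 + np.exp(-x))
    real = np.random.normal(2.0, 1.0, size=128)    # stand-in "real" samples
    fake = np.random.normal(-2.0, 1.0, size=128)   # stand-in generated samples
    print(value_for_D(D, real, fake))              # larger value => better D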

  • Now look at training G (we want G to perform well):
    z is from Pz : D(G(z)) ↑ --→ 1 - D(G(z)) ↓ <=> log(1 - D(G(z))) ↓

This gives the objective function for G:
\min_G \mathbb{E}_{z \sim P_z}[\log(1 - D(G(z)))]

Combining the requirements on the two models D and G gives the overall objective function (the core formula of the original GAN paper):

\min_G \max_D V(D, G) = \mathbb{E}_{x \sim P_{data}}[\log D(x)] + \mathbb{E}_{z \sim P_z}[\log(1 - D(G(z)))]

Training against this objective drives the generator and the discriminator to an equilibrium of their adversarial game.

A global optimum exists and the objective converges; we prove this after presenting the code.

1.3 GAN Code

    # GAN_2014.py
    import os
    import numpy as np
    import matplotlib.pyplot as plt
    from tensorflow.keras import Sequential, Model
    from tensorflow.keras.layers import Dense, Reshape, Input, Flatten
    from tensorflow.keras.layers import LeakyReLU, BatchNormalization
    from tensorflow.keras.datasets import mnist
    from tensorflow.keras.optimizers import Adam

    class GAN():
        def __init__(self):
            self.latent_dim = 100
            self.img_rows = 28
            self.img_cols = 28
            self.channel = 1
            self.img_shape = (self.img_rows, self.img_cols, self.channel)

            # build and compile the discriminator
            self.discriminator = self.build_discriminator()
            optimizer = Adam(0.0002, 0.5)
            self.discriminator.compile(loss='binary_crossentropy',
                                       optimizer=optimizer,
                                       metrics=['accuracy'])

            # the generator turns noise into images
            self.generator = self.build_generator()

            # in the combined model only the generator is trained
            self.discriminator.trainable = False
            z = Input(shape=(self.latent_dim,))
            img = self.generator(z)
            validity = self.discriminator(img)

            # stacked model: noise -> generated image -> validity score
            self.combined = Model(z, validity)
            self.combined.compile(loss='binary_crossentropy', optimizer=optimizer)

        # build the generator model
        def build_generator(self):
            model = Sequential()
            # fully connected layer; the input noise is a 100-element vector
            model.add(Dense(256, input_dim=self.latent_dim))
            model.add(LeakyReLU(alpha=0.2))
            model.add(BatchNormalization(momentum=0.8))     # BN layer
            model.add(Dense(512))
            model.add(LeakyReLU(alpha=0.2))
            model.add(BatchNormalization(momentum=0.8))     # BN layer

            model.add(Dense(1024))
            model.add(LeakyReLU(alpha=0.2))
            model.add(BatchNormalization(momentum=0.8))     # BN layer

            # output size equals the number of image pixels
            model.add(Dense(np.prod(self.img_shape), activation='tanh'))
            model.add(Reshape(self.img_shape))              # reshape to 28*28*1
            model.summary()                                 # print layer parameters

            noise = Input(shape=(self.latent_dim,))
            img = model(noise)

            return Model(noise, img)                        # noise in, generated image out

        # build the discriminator model
        def build_discriminator(self):
            # takes an image and predicts real/fake: img --> label
            model = Sequential()
            model.add(Flatten(input_shape=self.img_shape))  # flatten the image
            model.add(Dense(512))
            model.add(LeakyReLU(alpha=0.2))
            model.add(Dense(256))
            model.add(LeakyReLU(alpha=0.2))
            model.add(Dense(1, activation='sigmoid'))       # probability in [0, 1]

            model.summary()                                 # print layer parameters

            img = Input(shape=self.img_shape)
            validity = model(img)              # predicted probability for the input image
            return Model(img, validity)        # image in, validity out

        def train(self, epochs, batch_size=128, sample_interval=50):
            # load the MNIST handwritten-digit dataset, shape: 60000*28*28
            (X_train, _), (_, _) = mnist.load_data()
            # rescale pixel values to (-1, 1) to match the tanh output of G
            X_train = X_train / 127.5 - 1.
            X_train = np.expand_dims(X_train, axis=3)       # expand to 60000*28*28*1

            valid = np.ones((batch_size, 1))    # label 1 for a batch of real images
            fake = np.zeros((batch_size, 1))    # label 0 for a batch of fake images
            for epoch in range(epochs):
                # ---------------------------
                # train the discriminator
                # ---------------------------
                # pick batch_size random real samples
                idx = np.random.randint(0, X_train.shape[0], batch_size)
                imgs = X_train[idx]                         # batch_size*28*28*1 real samples
                noise = np.random.normal(0, 1, (batch_size, self.latent_dim))
                # use the generator to turn noise into fake images
                gen_imgs = self.generator.predict(noise)
                d_loss_real = self.discriminator.train_on_batch(imgs, valid)
                d_loss_fake = self.discriminator.train_on_batch(gen_imgs, fake)
                d_loss = 0.5 * np.add(d_loss_fake, d_loss_real)
                # discriminator training done for this step

                # ---------------------------
                # train the generator
                # ---------------------------
                noise = np.random.normal(0, 1, (batch_size, self.latent_dim))  # batch_size*100
                # training G against 'valid' labels maximizes log D(G(z)),
                # the non-saturating heuristic from the original paper
                g_loss = self.combined.train_on_batch(noise, valid)

                print('%d [D loss: %f, acc.: %.2f%%] [G loss: %f]' % (epoch, d_loss[0], 100 * d_loss[1], g_loss))

                # save a grid of samples every sample_interval epochs
                if epoch % sample_interval == 0:
                    self.sample_images(epoch)

        def sample_images(self, epoch):
            r, c = 5, 5
            noise = np.random.normal(0, 1, (r * c, self.latent_dim))
            gen_imgs = self.generator.predict(noise)

            # rescale images from (-1, 1) back to (0, 1)
            gen_imgs = 0.5 * gen_imgs + 0.5

            os.makedirs('images', exist_ok=True)   # make sure the output dir exists
            fig, axs = plt.subplots(r, c)
            cnt = 0
            for i in range(r):
                for j in range(c):
                    axs[i, j].imshow(gen_imgs[cnt, :, :, 0], cmap='gray')
                    axs[i, j].axis('off')
                    cnt += 1
            fig.savefig('images/%d.png' % epoch)
            plt.close()


    if __name__ == '__main__':
        gan = GAN()
        gan.train(epochs=30000, batch_size=32, sample_interval=200)

tree

    test
    │  GAN_2014.py
    └─ images

1.4 Derivation of the Global Optimum


x \sim P_{data}
z \sim P_z

The main goal, or core task, is to build a generative model of the data.

Standard generative model: modeling the generator means determining the probability distribution P_g with parameters \theta_g, typically via maximum likelihood estimation, which is equivalent to \arg \min_{\theta_g} KL(P_{data} \| P_g).
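To see this equivalence in one line (a standard derivation: the entropy term \mathbb{E}_{x \sim P_{data}}[\log P_{data}(x)] does not depend on \theta_g):

\arg\max_{\theta_g} \mathbb{E}_{x \sim P_{data}}[\log P_g(x;\theta_g)] = \arg\min_{\theta_g} \mathbb{E}_{x \sim P_{data}}\left[\log \frac{P_{data}(x)}{P_g(x;\theta_g)}\right] = \arg\min_{\theta_g} KL(P_{data} \| P_g)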

KL divergence, as defined in information theory, measures the discrepancy between two probability distributions; here it quantifies, in the information-theoretic sense, the gap between the target distribution P_{data} and the model distribution P_g.

GANs instead approach P_{data} through adversarial learning.

Define the value function \min_G \max_D V(D, G) = \mathbb{E}_{x \sim P_{data}}[\log D(x)] + \mathbb{E}_{x \sim P_g}[\log(1 - D(x))], where x = G(z) and samples distributed according to the generator's parameters are written x \sim P_g.


First fix G and maximize V over D, i.e. \max_D V(D, G).

Derivation: write both expectations as integrals over x (the second via the substitution x = G(z), so that z \sim P_z corresponds to x \sim P_g):

V(D, G) = \int P_{data}(x) \log(D(x))\,dx + \int P_g(x) \log(1 - D(x))\,dx

For each fixed x, differentiate the integrand with respect to D and set the derivative to zero:

\frac{\partial}{\partial D}\left[ P_{data} \log(D) + P_g \log(1 - D) \right] = P_{data}\frac{1}{D} - P_g\frac{1}{1 - D} = 0

The maximum is attained where this derivative vanishes, which yields D^*_G = \frac{P_{data}}{P_{data} + P_g}. That is, with G fixed, V attains its maximum at D^*_G = \frac{P_{data}}{P_{data} + P_g}.
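A quick numerical sanity check (an illustrative NumPy sketch with made-up density values, not part of the original derivation): for fixed values of P_{data}(x) and P_g(x), the pointwise integrand is indeed maximized at D = P_{data} / (P_{data} + P_g):

    import numpy as np

    p_data, p_g = 0.7, 0.3                         # made-up density values at some fixed x
    D = np.linspace(1e-6, 1 - 1e-6, 100001)
    f = p_data * np.log(D) + p_g * np.log(1 - D)   # pointwise integrand of V
    print(D[np.argmax(f)])                         # ~ 0.7 = p_data / (p_data + p_g)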

Substitute the optimum: plugging D^*_G = \frac{P_{data}}{P_{data}+P_g} into V, we then minimize over G (i.e. \min_G V(D^*_G, G)). The substitution gives

V(D^*_G, G) = \mathbb{E}_{x \sim P_{data}}\left[\log \frac{P_{data}}{P_{data}+P_g}\right] + \mathbb{E}_{x \sim P_g}\left[\log \frac{P_g}{P_{data}+P_g}\right] = -\log 4 + 2\,\mathrm{JSD}(P_{data} \| P_g) \ge -\log 4

Since the Jensen-Shannon divergence is non-negative, '=' holds when P_{data} = \frac{P_{data}+P_g}{2} = P_g, i.e. exactly when P_{data} = P_g.

At that point D^*_G = \frac{P_{data}}{P_{data}+P_g} = \frac{1}{2}: the discriminator can do no better than chance, training reaches the stable point, and the global optimum is achieved.
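This global minimum can also be checked numerically (an illustrative sketch over discrete toy distributions; the function names are ad hoc):

    import numpy as np

    def kl(p, q):
        # Kullback-Leibler divergence for discrete distributions
        return np.sum(p * np.log(p / q))

    def v_at_optimal_d(p_data, p_g):
        # V(D*_G, G) = -log 4 + 2*JSD(P_data || P_g) for discrete distributions
        m = 0.5 * (p_data + p_g)
        return -np.log(4.0) + kl(p_data, m) + kl(p_g, m)

    p_data = np.array([0.2, 0.5, 0.3])
    print(v_at_optimal_d(p_data, np.array([0.4, 0.4, 0.2])))  # > -log 4 ~ -1.386
    print(v_at_optimal_d(p_data, p_data.copy()))              # exactly -log 4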

1.5 Outlook on GAN Research Directions

GAN-related research

GANs have remained hot since they were proposed: top venues such as CVPR have published hundreds of related papers per year for several years running, with roughly several hundred core papers in 2020 and more than a thousand in 2021. Research teams keep improving the generator and discriminator network architectures, and innovatively combining the recent Transformer architecture with the GAN framework has brought notable progress in both generation quality and discrimination accuracy.

Creative work
Excellent at image generation, style transfer, and learning creative styles.

Image restoration, quality enhancement
Also widely applied in image inpainting, quality enhancement, and scene restoration.

Video generation, semantic fusion
Also widely applied in video generation and semantic fusion.

Paper directions

Architecture
Redesign the generator and discriminator networks, or the overall system structure, to improve operational efficiency and explore cutting-edge design choices.

Innovative research from the angle of improving the training efficiency of GANs.

Application
Integrate existing techniques, or make breakthrough applications in emerging domains, to create new value.

Training Strategy
Start from the training strategy, e.g. adding pretrained models, to optimize network performance.
