What technology is behind AI-generated images: how AI can learn to generate pictures of cats
by Thomas Simonini
AI can learn to generate images: by training deep neural networks on large sets of example images, a system can progressively adjust its parameters and synthesize new, high-quality images that resemble the data it was trained on.
In 2014, the research paper Generative Adversarial Nets (GAN) by Goodfellow et al. was a breakthrough in the field of generative models.
Yann LeCun, a renowned researcher, called adversarial nets the most interesting idea in machine learning in the last ten years.
Today, thanks to this architecture, we're going to build an AI that generates realistic pictures of cats. How awesome is that!
Check out my GitHub repository for the full working code: https://github.com/simoninithomas/CatDCGAN. It will help if you already have some experience with Python, Deep Learning, and TensorFlow, and some knowledge of CNNs (Convolutional Neural Networks).
If you're new to Deep Learning, I highly recommend this excellent series of articles:
Machine Learning is Fun! The world's easiest introduction to Machine Learning (medium.com)
What is DCGAN?
Deep Convolutional Generative Adversarial Networks (DCGAN) are a deep learning architecture that generates outputs similar to the data in their training set.

This model replaces the fully connected layers of the original generative adversarial network with convolutional layers.
To explain how DCGAN works, let's use the metaphor of an art expert and a counterfeiter.
The counterfeiter (a.k.a. "the generator") tries to produce fake Van Gogh paintings and pass them off as real.

On the other hand, the art expert (a.k.a. "the discriminator") uses their knowledge of real Van Gogh paintings to try to catch the counterfeiter.
Over time, the art expert gets better at detecting counterfeit paintings, and the counterfeiter gets better at faking them.
As we can see, DCGANs are composed of two separate deep neural networks competing against each other.
- The generator is the counterfeiter: it tries to produce data that looks real. It has no idea what the real data looks like, but it learns to adjust based on feedback from the other model.
- The discriminator is the inspector: it tries to identify fake data (by comparing it with real data) while avoiding false positives on the real data as much as possible. The output of this model is used in the backpropagation of the generator.
- The generator takes a random noise vector and generates a picture.
- This picture is fed into the discriminator, which compares the training set against the generated image.
- The discriminator returns a number between 0 (fake image) and 1 (real image).
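To make this exchange concrete, here is a minimal, framework-agnostic sketch of one round of the game in plain Python. The generator and discriminator arguments stand in for the real networks we build below, and the loss formulas follow the standard GAN objective; this is an illustration, not the notebook's code.

```python
import numpy as np

def gan_round(generator, discriminator, real_images, z_dim=100):
    # 1. The generator turns random noise into a batch of fake images.
    z = np.random.uniform(-1, 1, size=(len(real_images), z_dim))
    fake_images = generator(z)

    # 2. The discriminator scores every image between 0 (fake) and 1 (real).
    score_real = discriminator(real_images)   # should be pushed toward 1
    score_fake = discriminator(fake_images)   # should be pushed toward 0

    # 3. Two opposing objectives: the discriminator wants to separate the
    #    batches, the generator wants its fakes to be scored as real.
    d_loss = -np.mean(np.log(score_real) + np.log(1.0 - score_fake))
    g_loss = -np.mean(np.log(score_fake))
    return d_loss, g_loss
```

Each network is then updated to reduce its own loss, which is exactly what the TensorFlow code in the rest of this tutorial does.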
Let's create a DCGAN!
Now, we’re ready to create our AI.
In this part, we're going to focus on the main elements of the model. If you want to see the whole code, the notebook is available here: https://github.com/simoninithomas/CatDCGAN/blob/master/Cat DCGAN.ipynb
Inputs
Here, we create the input placeholders: inputs_real for the discriminator and inputs_z for the generator.
Note that we use two learning rates, one for the generator and one for the discriminator.
DCGANs are very sensitive to hyperparameters, so it's really important to tune them precisely.
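Concretely, the placeholders can be defined roughly like this; the shapes and names are a TensorFlow 1.x sketch and may differ slightly from the notebook:

```python
import tensorflow as tf

def model_inputs(image_width, image_height, image_channels, z_dim):
    # Real images fed to the discriminator
    inputs_real = tf.placeholder(
        tf.float32, (None, image_width, image_height, image_channels),
        name='inputs_real')

    # Random noise vector fed to the generator
    inputs_z = tf.placeholder(tf.float32, (None, z_dim), name='inputs_z')

    # Two separate learning rates, one per network
    learning_rate_g = tf.placeholder(tf.float32, name='learning_rate_g')
    learning_rate_d = tf.placeholder(tf.float32, name='learning_rate_d')

    return inputs_real, inputs_z, learning_rate_g, learning_rate_d
```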
The discriminator and the generator
We use tf.variable_scope for two reasons.
First, we want all variable names to start with generator or discriminator. This will help us when we train the two networks later.
Second, we want to reuse these networks with different inputs:
- For the generator: we train it, but we also sample fake images from it after training.
- For the discriminator: we need to share variables between the fake and real input images.
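Here is a minimal sketch of that scoping pattern. The layer bodies are just placeholders; the point is the scope names, and the reuse flags that let us call each network more than once with shared weights:

```python
import tensorflow as tf

def generator(z, is_train=True):
    # Every variable created here is prefixed with 'generator/'.
    # When we sample after training, we call it again with is_train=False.
    with tf.variable_scope('generator', reuse=not is_train):
        out = tf.layers.dense(z, 4 * 4 * 3)           # stand-in for the real layers
        return tf.tanh(tf.reshape(out, (-1, 4, 4, 3)))

def discriminator(images, reuse=False):
    # Every variable created here is prefixed with 'discriminator/'.
    with tf.variable_scope('discriminator', reuse=reuse):
        flat = tf.reshape(images, (-1, 4 * 4 * 3))    # stand-in for the real layers
        logits = tf.layers.dense(flat, 1)
        return tf.sigmoid(logits), logits

# The discriminator runs twice, on real and on generated images, sharing weights:
input_z = tf.placeholder(tf.float32, (None, 100), name='input_z')
input_real = tf.placeholder(tf.float32, (None, 4, 4, 3), name='input_real')
g_model = generator(input_z)
d_model_real, d_logits_real = discriminator(input_real)
d_model_fake, d_logits_fake = discriminator(g_model, reuse=True)
```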
Now let's build the discriminator. Remember: it takes either a real or a fake image as input and outputs a score.
Some technical remarks:
- The principle is to double the number of filters at each convolution layer.
- Downsampling is not recommended. Instead, we use only strided convolutional layers.
- We use batch normalization at every layer except the input layer, because it reduces covariance shift. For more information, check this great article.
- We use Leaky ReLU as the activation function, because it helps avoid the vanishing gradient effect.
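Putting those remarks together, a discriminator for 32 x 32 x 3 inputs could look roughly like this. It is a sketch: the kernel sizes, filter counts and image resolution are illustrative, not necessarily the ones used for the cat dataset:

```python
import tensorflow as tf

def discriminator(x, reuse=False, alpha=0.2):
    """Sketch of a DCGAN discriminator for 32x32x3 inputs (sizes are illustrative)."""
    with tf.variable_scope('discriminator', reuse=reuse):
        # 32x32x3 -> 16x16x64: strided convolution instead of pooling,
        # no batch normalization on the input layer
        h1 = tf.layers.conv2d(x, 64, 5, strides=2, padding='same')
        h1 = tf.nn.leaky_relu(h1, alpha)

        # 16x16x64 -> 8x8x128: the number of filters doubles at each layer
        h2 = tf.layers.conv2d(h1, 128, 5, strides=2, padding='same')
        h2 = tf.layers.batch_normalization(h2, training=True)
        h2 = tf.nn.leaky_relu(h2, alpha)

        # 8x8x128 -> 4x4x256
        h3 = tf.layers.conv2d(h2, 256, 5, strides=2, padding='same')
        h3 = tf.layers.batch_normalization(h3, training=True)
        h3 = tf.nn.leaky_relu(h3, alpha)

        # Flatten and map to a single score: sigmoid gives 0 (fake) to 1 (real)
        flat = tf.reshape(h3, (-1, 4 * 4 * 256))
        logits = tf.layers.dense(flat, 1)
        return tf.sigmoid(logits), logits
```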
Then, we create the generator. Remember: it takes a random noise vector (z) as input and outputs a fake image, thanks to transposed convolution layers.
The idea is that at each layer we halve the number of filters and double the size of the image.
The generator has been found to perform best with tanh as the output activation function.
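A matching generator sketch, mirroring the discriminator in reverse, again with illustrative sizes:

```python
import tensorflow as tf

def generator(z, output_channels=3, is_train=True, alpha=0.2):
    """Sketch of a DCGAN generator: noise vector z -> 32x32 image (sizes are illustrative)."""
    with tf.variable_scope('generator', reuse=not is_train):
        # Project the noise and reshape it into a small 4x4 feature map
        h1 = tf.layers.dense(z, 4 * 4 * 512)
        h1 = tf.reshape(h1, (-1, 4, 4, 512))
        h1 = tf.layers.batch_normalization(h1, training=is_train)
        h1 = tf.nn.leaky_relu(h1, alpha)

        # 4x4x512 -> 8x8x256: each transposed convolution doubles the image
        # size and halves the number of filters
        h2 = tf.layers.conv2d_transpose(h1, 256, 5, strides=2, padding='same')
        h2 = tf.layers.batch_normalization(h2, training=is_train)
        h2 = tf.nn.leaky_relu(h2, alpha)

        # 8x8x256 -> 16x16x128
        h3 = tf.layers.conv2d_transpose(h2, 128, 5, strides=2, padding='same')
        h3 = tf.layers.batch_normalization(h3, training=is_train)
        h3 = tf.nn.leaky_relu(h3, alpha)

        # 16x16x128 -> 32x32x3, tanh keeps pixel values in [-1, 1]
        logits = tf.layers.conv2d_transpose(h3, output_channels, 5, strides=2, padding='same')
        return tf.tanh(logits)
```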
Discriminator and generator losses
Because we train the generator and the discriminator at the same time, we need to calculate losses for both networks.
We want the discriminator to output 1 when it thinks an image is real and 0 when it thinks an image is fake. Therefore, we need to set up the losses to reflect that.
The discriminator loss is the sum of the losses for real and fake images:
d_loss = d_loss_real + d_loss_fake
d_loss_real is the loss when the discriminator predicts that an image is fake when it is actually a real image. It is calculated as follows:
- Use d_logits_real, and labels are all 1 (since all the real data is real).
- Label smoothing: to help the discriminator generalize better, we reduce the labels slightly from 1.0 to 0.9. In code: labels = tf.ones_like(tensor) * (1 - smooth), where the smooth parameter controls how much the labels are reduced.
d_loss_fake is the loss when the discriminator predicts that an image is real when it is actually a fake image.
- Use d_logits_fake, and labels are all 0.
The generator loss again uses d_logits_fake from the discriminator. This time, all labels are 1, because the generator wants to fool the discriminator.
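Put together, the three losses can be written roughly as follows, assuming d_logits_real and d_logits_fake are the discriminator's outputs on the real and generated batches:

```python
import tensorflow as tf

def model_loss(d_logits_real, d_logits_fake, smooth=0.1):
    """Sketch of the DCGAN losses, given the discriminator logits."""
    cross_entropy = tf.nn.sigmoid_cross_entropy_with_logits

    # Real images should be labeled 1, smoothed down to 0.9 to help generalization
    d_loss_real = tf.reduce_mean(cross_entropy(
        logits=d_logits_real,
        labels=tf.ones_like(d_logits_real) * (1 - smooth)))

    # Generated images should be labeled 0
    d_loss_fake = tf.reduce_mean(cross_entropy(
        logits=d_logits_fake,
        labels=tf.zeros_like(d_logits_fake)))

    d_loss = d_loss_real + d_loss_fake

    # The generator wants the discriminator to answer 1 on its fakes
    g_loss = tf.reduce_mean(cross_entropy(
        logits=d_logits_fake,
        labels=tf.ones_like(d_logits_fake)))

    return d_loss, g_loss
```

Note how g_loss reuses d_logits_fake, but with labels of 1: the generator is rewarded exactly when the discriminator is fooled.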
Optimizers
After calculating the losses, we need to update the generator and the discriminator separately.
To do that, we get the variables for each part by calling tf.trainable_variables(). This creates a list of all the variables we've defined in our graph.
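A sketch of those two optimizers, relying on the generator/discriminator name prefixes set up earlier (the beta1 value is a common DCGAN choice, not necessarily the notebook's):

```python
import tensorflow as tf

def model_opt(d_loss, g_loss, lr_d, lr_g, beta1=0.5):
    """Sketch of separate Adam optimizers for the two networks."""
    t_vars = tf.trainable_variables()
    d_vars = [v for v in t_vars if v.name.startswith('discriminator')]
    g_vars = [v for v in t_vars if v.name.startswith('generator')]

    # Batch normalization keeps its moving statistics in UPDATE_OPS,
    # so we run those updates alongside each training step
    with tf.control_dependencies(tf.get_collection(tf.GraphKeys.UPDATE_OPS)):
        d_train_opt = tf.train.AdamOptimizer(lr_d, beta1=beta1).minimize(d_loss, var_list=d_vars)
        g_train_opt = tf.train.AdamOptimizer(lr_g, beta1=beta1).minimize(g_loss, var_list=g_vars)

    return d_train_opt, g_train_opt
```

Restricting each optimizer with var_list is what keeps a discriminator step from touching the generator's weights, and vice versa.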
Training
Here, we’re implementing the training function.
The idea is relatively simple:
- We’re saving the model each five epochs.
我们每五个时期保存一次模型。
- We’re saving a picture in images folder each ten batches trained.
每训练十批,我们就会将图片保存在images文件夹中。
- We display the g_loss, d_loss, and the generated image every 15 epochs, because the Jupyter notebook can run into problems if too many pictures are displayed (a condensed sketch of the whole loop follows below).
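For reference, here is a condensed sketch of such a loop. The get_batches helper, the saving logic and the learning-rate values are placeholders standing in for the notebook's actual utilities:

```python
import numpy as np
import tensorflow as tf

def train(epochs, batch_size, z_dim, get_batches, inputs, optimizers, losses):
    """Condensed sketch of the training loop described above."""
    input_real, input_z, lr_g, lr_d = inputs      # placeholders from model_inputs
    d_opt, g_opt = optimizers                     # ops from model_opt
    d_loss, g_loss = losses                       # tensors from model_loss
    saver = tf.train.Saver()

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for epoch in range(epochs):
            for i, batch_images in enumerate(get_batches(batch_size)):
                batch_z = np.random.uniform(-1, 1, size=(batch_size, z_dim))
                feed = {input_real: batch_images, input_z: batch_z,
                        lr_d: 0.0002, lr_g: 0.0002}   # illustrative values
                sess.run(d_opt, feed_dict=feed)
                sess.run(g_opt, feed_dict=feed)

                if i % 10 == 0:
                    pass  # every ten batches: save a generated picture to the images folder

            if epoch % 5 == 0:
                saver.save(sess, './models/model.ckpt')   # save the model every five epochs
            if epoch % 15 == 0:
                print(epoch, sess.run([d_loss, g_loss], feed_dict=feed))
```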
Alternatively, you can generate realistic images directly from the pre-trained model, which will spare you the 20 hours of training time.
How to run it
You can't run this on an ordinary personal computer, unless you have your own GPUs or are ready to wait about ten years for the results!
Instead, you must use cloud GPU services, such as AWS or FloydHub.
Personally, I trained this DCGAN for about 20 hours using Microsoft Azure and its Deep Learning Virtual Machine (https://azuremarketplace.microsoft.com/en-us/marketplace/apps/microsoft-ads.dsvm-deep-learning).
Disclaimer: I have no business relationship with Azure. I just loved their excellent customer service.
If you have trouble running the model on a virtual machine, follow this excellent article here.
That's all! I hope this tutorial has been helpful!
If you’ve improved the model, don’t hesitate to make a pull request.
If you have any thoughts, comments, or want to share your results with me, please feel free to leave a comment below. You are welcome to contact me via email at hello@simoninithomas.com. Follow my Twitter account @ThomasSimonini for updates.
If you liked my article, please click the ? below so that other people can see this post here on Medium. And don't forget to follow me!
Cheers!
Translated from: https://www.freecodecamp.org/news/how-ai-can-learn-to-generate-pictures-of-cats-ba692cb6eae4/
