[Deep Learning Paper Notes][Image Reconstruction] Understanding Deep Image Representations by Inverting Them

Mahendran et al., "Understanding Deep Image Representations by Inverting Them," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015. (Citations: 142).

Given the representation of an image at an intermediate layer, to what extent is it possible to reconstruct the original image?

Note that representations are usually designed to be invariant to nuisance variations in the image data, such as changes in illumination and viewpoint. Consequently, a representation cannot be inverted uniquely: many different images map to the same representation.
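As a toy illustration of this non-uniqueness (not an example from the paper), a 2x2 max-pooling step records only the largest value in each window, not where it sits, so two different inputs can share exactly the same representation. The function `max_pool_2x2` and the arrays below are hypothetical.

```python
import numpy as np

def max_pool_2x2(x: np.ndarray) -> np.ndarray:
    """Non-overlapping 2x2 max pooling of an (H, W) array with even H and W."""
    h, w = x.shape
    # Element [i, a, j, b] of the reshaped array is x[2i + a, 2j + b].
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# Two different 4x4 "images": each nonzero value moved within its own 2x2 window.
x1 = np.array([[0, 1, 0, 0],
               [0, 0, 0, 2],
               [3, 0, 0, 0],
               [0, 0, 4, 0]], dtype=float)
x2 = np.array([[1, 0, 0, 0],
               [0, 0, 2, 0],
               [0, 0, 0, 0],
               [3, 0, 0, 4]], dtype=float)

print(max_pool_2x2(x1))  # identical pooled output for both inputs
print(max_pool_2x2(x2))
```

Given only the pooled output, nothing distinguishes `x1` from `x2`, which is why the inversion needs an additional prior to pick one answer.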

2 Idea

Starting from random noise, search for an image whose representation best matches the given representation:

$$X^* = \arg\min_X \; \|\Phi(X) - A\|_F^2 + \lambda R(X)$$

where A is the given representation, Φ(X) is the corresponding representation of image X, λ weights the regularizer, and R(X) is a regularizer capturing a natural-image prior.

Note that after optimization, the training-set mean image must be added back to X, since the network operates on mean-subtracted inputs.
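The search can be sketched in a few lines under heavy simplifying assumptions: a hypothetical one-layer representation Φ(X) = ReLU(W X) with random weights stands in for a CNN layer, X is a flattened vector, R(X) = ‖X‖₂² is a placeholder prior (not one of the paper's regularisers), and plain gradient descent is used for simplicity. The names `phi`, `objective_and_grad` and `invert` are illustrative. There is no mean image in this toy, so nothing is added back at the end.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 16, 64                               # flattened image size, feature size
W = rng.normal(size=(m, n)) / np.sqrt(n)    # hypothetical fixed "layer" weights

def phi(x):
    """Toy representation Phi(X) = ReLU(W X), standing in for a CNN layer."""
    return np.maximum(W @ x, 0.0)

def objective_and_grad(x, A, lam):
    """||Phi(x) - A||^2 + lam * R(x), with R(x) = ||x||_2^2 as a placeholder prior."""
    z = W @ x
    r = np.maximum(z, 0.0) - A
    loss = np.sum(r ** 2) + lam * np.sum(x ** 2)
    # Chain rule through the ReLU: only active units (z > 0) pass gradient.
    grad = W.T @ (2.0 * r * (z > 0)) + 2.0 * lam * x
    return loss, grad

def invert(A, lam=1e-3, lr=0.02, steps=2000):
    x = rng.normal(size=n)                  # start from random noise
    history = []
    for _ in range(steps):
        loss, grad = objective_and_grad(x, A, lam)
        history.append(loss)
        x -= lr * grad                      # plain gradient-descent step
    return x, history

x_true = rng.normal(size=n)
A = phi(x_true)                             # the "given" representation
x_rec, history = invert(A)
print(f"objective: {history[0]:.3f} -> {history[-1]:.3f}")
```

For a real network the gradient of Φ would come from backpropagation rather than the hand-written chain rule used here.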

3 Regularisers

Discriminatively trained representations may discard a significant amount of low-level image statistics, since these are usually not useful for high-level tasks. A regulariser is therefore needed to supply a natural-image prior in place of the discarded information.

3.1 p-norm

Choosing a relatively large exponent (p = 6 works well in experiments) encourages the image to stay within a target range instead of diverging.
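A minimal numpy sketch of this regulariser, assuming the usual form R(X) = ‖X‖_p^p = Σ|X_i|^p and assuming pixels have been rescaled so that the target range is roughly [-1, 1]. The function name `p_norm_reg` is hypothetical. With p = 6, values inside the range cost almost nothing, while values just outside it are penalized steeply.

```python
import numpy as np

def p_norm_reg(x: np.ndarray, p: int = 6):
    """Return R(X) = sum |x_i|^p and its gradient p * |x_i|^(p-1) * sign(x_i)."""
    value = np.sum(np.abs(x) ** p)
    grad = p * np.abs(x) ** (p - 1) * np.sign(x)
    return value, grad

# Penalty on a single pixel value as it moves outside [-1, 1].
for v in (0.5, 1.0, 1.5, 2.0):
    print(v, p_norm_reg(np.array([v]))[0])
```

A value of 0.5 costs about 0.016 while 2.0 costs 64, so the optimizer is pushed hard back toward the target range without being constrained much inside it.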

3.2 Total Variation (TV)

The TV regulariser penalizes intensity changes between neighboring pixels (edges), encouraging images composed of piecewise-constant regions.
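For reference, the β-TV regulariser discussed below can be written as follows, where X_{i,j} is the pixel in row i and column j (for colour images the sum also runs over channels). Setting β = 1 gives the standard discrete TV, i.e. the sum of gradient magnitudes.

```latex
R_{\mathrm{TV}}^{\beta}(X) = \sum_{i,j} \left( (X_{i,j+1} - X_{i,j})^2 + (X_{i+1,j} - X_{i,j})^2 \right)^{\beta/2}
```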

When the CNN contains max-pooling layers, β = 1 produces spikes in the reconstruction. With β = 1 the TV measures the total amount of intensity change, so its penalty grows only linearly with the height of a jump and does not strongly discourage such isolated spikes, which then appear as artifacts. Choosing β < 1 makes the problem worse. Choosing β > 1 eliminates these artifacts, but at the cost of fine detail: large intensity jumps, including genuine edges, are now penalized superlinearly, so edges tend to be smoothed out.
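To see why the exponent matters, here is a small numpy sketch of the discrete β-TV using forward differences (the function name `tv_beta` and the toy spike image are illustrative). For an isolated spike of height h on a flat image, every nonzero term is proportional to h^β, so doubling the spike multiplies its cost by 2^β: roughly 1.41 for β = 0.5, 2 for β = 1 and 4 for β = 2.

```python
import numpy as np

def tv_beta(x: np.ndarray, beta: float) -> float:
    """Discrete beta-TV of a 2-D array using forward differences."""
    dx = x[:-1, 1:] - x[:-1, :-1]   # horizontal differences X[i, j+1] - X[i, j]
    dy = x[1:, :-1] - x[:-1, :-1]   # vertical differences   X[i+1, j] - X[i, j]
    return float(np.sum((dx ** 2 + dy ** 2) ** (beta / 2)))

# An isolated spike of height h on a flat 5x5 image.
for beta in (0.5, 1.0, 2.0):
    costs = []
    for h in (1.0, 2.0):
        img = np.zeros((5, 5))
        img[2, 2] = h
        costs.append(tv_beta(img, beta))
    print(f"beta={beta}: doubling the spike multiplies the TV by {costs[1] / costs[0]:.2f}")
```

The larger β is, the more expensive tall, narrow peaks become relative to their height, which is the mechanism behind both the spike suppression and the loss of sharp edges.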

4 Balancing the Different Terms

Dividing the reconstruction loss by $\|A\|_F^2$ typically brings it into the range [0, 1).
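A minimal sketch of the normalized loss, assuming a non-negative (post-ReLU) feature map; the function name `normalized_loss` and the random feature map are hypothetical. A perfect match gives 0 and the trivial all-zero reconstruction gives exactly 1. The ratio is also unchanged when both feature maps are rescaled, which is one reason a single λ can be reasonable across layers with very different feature magnitudes.

```python
import numpy as np

def normalized_loss(phi_x: np.ndarray, A: np.ndarray) -> float:
    """Reconstruction loss ||phi_x - A||_F^2 divided by ||A||_F^2.

    With axis and ord left as None, np.linalg.norm returns the 2-norm of the
    flattened array, i.e. the Frobenius norm for feature maps of any shape.
    """
    return float(np.linalg.norm(phi_x - A) ** 2 / np.linalg.norm(A) ** 2)

rng = np.random.default_rng(0)
A = np.maximum(rng.normal(size=(8, 6, 6)), 0.0)  # hypothetical post-ReLU feature map

print(normalized_loss(A, A))                     # perfect match
print(normalized_loss(np.zeros_like(A), A))      # trivial all-zero reconstruction
print(normalized_loss(0.5 * A, A))               # partial match
print(normalized_loss(5 * A, 10 * A))            # same match, both maps rescaled
```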

5 Result
See Fig. Reconstructions from all convolutional layers remain photographically faithful to the image, although detail decreases progressively with depth. Reconstructions from the fully connected (FC) layers are composites of parts that are similar to, but not identical with, those in the original image: the FC layers lose information about where objects are located but retain some of the discriminative features needed for recognition.
