
[paper] 00035 Synthetic Depth-of-Field with a Single-Camera Mobile Phone


Google Research

The paper's title highlights two points:

1. Synthetic Depth-of-Field

2. Single-Camera Mobile Phone

Key Words:

A shallow depth of field blurs both the foreground and the background, leaving only a narrow region of the image in sharp focus.

dual-pixel:

Dual-pixel technology effectively splits each pixel into two distinct imaging regions: each pixel contains two photodiodes placed side by side beneath a single microlens.

alpha matte:

A matte is a layer (or any of its channels) that specifies the transparent areas of that layer or another layer.
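
The matte drives standard alpha compositing. A minimal NumPy sketch (the function name is illustrative, not from the paper):

```python
import numpy as np

def composite(foreground, background, alpha):
    """Alpha-compositing: C = alpha * F + (1 - alpha) * B, per pixel.

    `alpha` is a single-channel matte in [0, 1]: 1 keeps the foreground,
    0 keeps the background, and fractional values blend the two.
    """
    a = alpha[..., None]  # broadcast the matte over the color channels
    return a * foreground + (1.0 - a) * background
```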

Bayer plane:

One of the single-color planes (R, Gr, Gb, or B) that make up the Bayer mosaic of a raw sensor image.

Background:

A cell phone camera essentially produces all-in-focus images.

AIM:

The system is introduced as an innovative tool for generating images with shallow depth-of-field effects. It is capable of computationally generating these images using both an affordable, portable mobile camera and an intuitive, user-friendly interface that requires only one button to operate.

Result:

Our system is capable of processing a 5.4-megapixel image within just four seconds on a mobile device. It operates entirely without user intervention and offers exceptional reliability, making it accessible to users without specialized knowledge.

1. Introduction

Some methods: a) two cameras

b) time-of-flight or structured-light direct depth sensor

c) Lens Blur

Our method:

Our system combines two distinct technologies in such a way that it can still operate when only one of them is available. A neural network is used to segment people and the objects they hold. In addition, on devices equipped with dual-pixel (DP) auto-focus hardware, we use the sensor to capture a tiny light field with a baseline of roughly 1 millimeter.

The first:

The second:

This shutter technology is designed for imaging in high-dynamic-range and low-light conditions, and is widely used in mobile camera devices.

Robert Anderson, David Gallup, Jonathan T. Barron, Janne Kontkanen, Noah Snavely, Carlos Hernández, Sameer Agarwal, and Steven M. Seitz. 2016. Jump: Virtual reality video. SIGGRAPH Asia 2016.

Jonathan T. Barron, Andrew Adams, Yi-Chang Shih, and Carlos Hernández. 2015. Fast bilateral-space stereo for synthetic defocus. CVPR 2015.

Johannes Kopf, Michael F. Cohen, Dani Lischinski, and Matt Uyttendaele. 2007. Joint bilateral upsampling. SIGGRAPH 2007.

We present a calibration procedure.

summary:

Our rendering technique splits the scene into several layers by depth, turns each pixel into a translucent disk whose size depends on its depth, and then composites the layers with weights derived from their depth values.
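
The layer blending step can be sketched with the standard back-to-front "over" operator (a simplified stand-in for the paper's depth-weighted scheme; names are illustrative):

```python
import numpy as np

def composite_layers(layers):
    """Back-to-front 'over' compositing of depth layers.

    `layers` is a list of (color, alpha) pairs ordered far to near;
    color is (H, W, 3), alpha is (H, W). Each (already blurred) layer
    is blended over the result accumulated so far.
    """
    color0, alpha0 = layers[0]
    out = color0 * alpha0[..., None]
    for color, alpha in layers[1:]:
        a = alpha[..., None]
        out = color * a + out * (1.0 - a)
    return out
```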

Another problem:

A typical mobile camera's wide-angle field of view is poorly suited to portraits: it forces photographers to stand close to their subjects, which distorts the perspective of faces.

2. Related work

--- Carlos Hernández. 2014. Lens Blur in the new Google Camera app. Google Research Blog: http://research.googleblog.com/2014/04/lens-blur-in-new-google-camera-app.html

---etc.

3. Person segmentation

Our contributions consist of: (a) employing training and data collection methods to develop a fast and precise segmentation model that can be deployed on a mobile device, and (b) implementing edge-aware filtering to enhance the upsampling of masks generated by neural networks.

3.1 Data Collection

Selecting a comprehensive range of poses, discarding images deemed inadequate for training, correcting inaccurate polygon masks, and so on.

Over a nine-month period, each time we improved the training data we saw a corresponding increase in the quality of our defocused portraits.

3.2 Training

The network accepts a four-channel image with dimensions of 256 × 256 pixels. Three channels correspond to an RGB image that has been resized and padded while maintaining its aspect ratio. The fourth channel represents the face's location through a posterior map of an isotropic Gaussian distribution centered around the face detection box, which has a standard deviation of 21 pixels and is normalized to one at its mean position.
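
The fourth channel can be reproduced directly from this description; a sketch assuming a pixel-coordinate face box (function name is illustrative):

```python
import numpy as np

def face_prior(h, w, face_box, sigma=21.0):
    """Fourth input channel: an isotropic Gaussian centered on the face
    detection box, with sigma = 21 px and value 1 at its mean.

    face_box = (x0, y0, x1, y1) in pixel coordinates.
    """
    cx = (face_box[0] + face_box[2]) / 2.0
    cy = (face_box[1] + face_box[3]) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    d2 = (xs - cx) ** 2 + (ys - cy) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))
```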

3.3 Inference

At inference time, we are given a color image and the facial landmarks produced by a face detector. For each detected face, the model predicts a segmentation mask covering the corresponding person.

3.4 Edge-Aware Filtering of a Segmentation Mask

Previous work has observed that mask boundaries frequently align with image edges, so we adopt a similar strategy: an edge-aware filtering step that increases the resolution of the low-resolution mask M(x) predicted by the neural network.
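
A brute-force sketch of such edge-aware upsampling, in the spirit of joint bilateral upsampling [Kopf et al. 2007] (this simplified, unoptimized version is illustrative, not the paper's exact filter):

```python
import numpy as np

def joint_bilateral_upsample(mask_lo, guide_hi, sigma_s=1.0, sigma_r=0.1, r=2):
    """Upsample a low-res mask so its boundaries follow edges in a
    high-res grayscale guide image (values in [0, 1]).

    Each high-res output pixel is a weighted average of nearby low-res
    mask samples; the weights combine spatial closeness (sigma_s, in
    low-res pixels) with guide-image similarity (sigma_r).
    """
    H, W = guide_hi.shape
    h, w = mask_lo.shape
    out = np.zeros((H, W))
    for y in range(H):
        for x in range(W):
            ly, lx = y * h / H, x * w / W  # position in low-res coords
            num = den = 0.0
            for j in range(int(ly) - r, int(ly) + r + 1):
                for i in range(int(lx) - r, int(lx) + r + 1):
                    if not (0 <= j < h and 0 <= i < w):
                        continue
                    # guide value at this sample's high-res position
                    gy = min(int(j * H / h), H - 1)
                    gx = min(int(i * W / w), W - 1)
                    ws = np.exp(-((ly - j) ** 2 + (lx - i) ** 2)
                                / (2 * sigma_s ** 2))
                    wr = np.exp(-(guide_hi[y, x] - guide_hi[gy, gx]) ** 2
                                / (2 * sigma_r ** 2))
                    num += ws * wr * mask_lo[j, i]
                    den += ws * wr
            out[y, x] = num / den
    return out
```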

3.5 Accuracy and Efficiency

The model's cost, measured with the TensorFlow Model Benchmark Tool, is 3.07 GFLOPs, significantly lower than the 607 GFLOPs of PortraitFCN+ and the 3160 GFLOPs of Mask R-CNN.

4. Depth from dual-pixel camera

Dual-pixel (DP) auto-focus hardware works by splitting each pixel in half: the left half of each pixel integrates light over the right half of the aperture, and vice versa.

This system is typically employed for autofocus, where it is often referred to as phase-detection auto-focus.

Some techniques can compute depth but require more than two views.

Example:

--- Edward H. Adelson and John Y. A. Wang. 1992. Single lens stereo with a plenoptic camera. TPAMI (1992).

We build upon the stereo work of Barron et al.

--- Jonathan T. Barron, Andrew Adams, Yi-Chang Shih, and Carlos Hernández. 2015. Fast bilateral-space stereo for synthetic defocus. CVPR 2015.

We therefore build upon the stereo work of Barron et al. [2015] and the edge-aware optimization of Anderson et al. [2016] to create a stereo algorithm that is tractable at high resolutions and well suited to defocus: its output closely follows the edges of the input image.

4.1 Computing Disparity

For denoising, we maintain a circular buffer of the last nine frames of raw and DP data captured by the camera.

We compute disparity by taking each non-overlapping 8×8 tile in the first view and searching over a range of −3 to +3 pixels in the corresponding location of the second view, at DP resolution.
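
The tile matching step can be sketched as a brute-force SSD search (illustrative single-channel version, not the paper's optimized implementation):

```python
import numpy as np

def tile_disparity(left, right, tile=8, search=3):
    """For each non-overlapping `tile` x `tile` block of `left`, find the
    horizontal shift in [-search, search] that minimizes the sum of
    squared differences (SSD) against `right`.
    """
    h, w = left.shape
    disp = np.zeros((h // tile, w // tile), dtype=int)
    for ty in range(h // tile):
        for tx in range(w // tile):
            y0, x0 = ty * tile, tx * tile
            patch = left[y0:y0 + tile, x0:x0 + tile]
            best, best_d = np.inf, 0
            for d in range(-search, search + 1):
                xs = x0 + d
                if xs < 0 or xs + tile > w:
                    continue  # shifted window falls outside the image
                ssd = np.sum((patch - right[y0:y0 + tile, xs:xs + tile]) ** 2)
                if ssd < best:
                    best, best_d = ssd, d
            disp[ty, tx] = best_d
    return disp
```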

Several heuristics gauge confidence: the magnitude of the SSD loss, the horizontal gradient energy within each tile, the presence of a close secondary minimum, and the agreement of the disparity with adjacent tiles.

4.2 Imaging Model and Calibration

The equation has two important consequences. First, disparity depends on the focus distance z and is zero when the depth D equals it (D = z). Second, disparity is an affine function of inverse depth, with coefficients that are constant across the image.
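
In code, such an affine-in-inverse-depth model looks like this (the scale `beta` is a hypothetical stand-in for the coefficients the paper's calibration recovers):

```python
def dp_disparity(depth, focus_z, beta=30.0):
    """Affine model in inverse depth: d(D) = beta * (1/D - 1/z).

    Disparity is exactly zero at the focus distance z; `beta` bundles
    the baseline and optics into one illustrative scale factor.
    """
    return beta * (1.0 / depth - 1.0 / focus_z)
```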

4.3 Combining Disparity and Segmentation

4.4 Edge-Aware Filtering of Disparity

We use the bilateral solver [Barron and Poole 2016] to turn the noisy disparities into a smooth, edge-aware disparity map that is well suited to shallow depth-of-field rendering.

5 RENDERING

5.1 Precomputing the blur parameters

5.2 Applying the blur

One obvious solution is to simply reexpress the scatter as a gather.
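
For a spatially constant blur radius, the scatter and gather formulations coincide; a brute-force sketch of the gather form (illustrative, grayscale only):

```python
import numpy as np

def disk_blur_gather(image, radius):
    """Constant-radius disk blur expressed as a gather: each output
    pixel averages every input pixel whose blur disk covers it. With a
    spatially constant radius this equals scattering each input pixel
    as a uniform disk.
    """
    h, w = image.shape
    out = np.zeros((h, w))
    for y in range(h):
        for x in range(w):
            acc = cnt = 0.0
            for j in range(max(0, y - radius), min(h, y + radius + 1)):
                for i in range(max(0, x - radius), min(w, x + radius + 1)):
                    if (j - y) ** 2 + (i - x) ** 2 <= radius ** 2:
                        acc += image[j, i]
                        cnt += 1.0
            out[y, x] = acc / cnt
    return out
```

With a depth-varying radius the two formulations differ, which is why the paper's renderer handles occlusion and layer ordering explicitly.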

5.3 Producing the final image

Final image with synthetic noise

6 RESULTS

Three pipelines:

1. DP + Segmentation

2. DP only

3. Segmentation only

7 DISCUSSION AND FUTURE WORK
