Computer Vision: Common Questions and Notes
A collection of questions and reference material compiled from study notes, intended to help with reviewing and consolidating the related topics. These notes were written some time ago and have not been further revised or updated since.
CV Questions
- What is machine vision?
Machine vision (MV) covers the technologies and methods that provide imaging-based automatic inspection and analysis for applications such as automatic inspection, process control, and industrial robot guidance.
Input: image, video, Output: inspection and analysis
Goal: give computers super-human-level perception
- Typical perception channel
Raw image measurements are transformed into an interpretation of the scene through 'fancy math'; the analysis focuses on these two ends of the pipeline (the input imagery and the resulting interpretation).
- Common Applications
Automated Visual Inspection (AVI), Object Recognition (OR), Face Detection (FD), Face Makeovers (FM), Vision in Cars (VIC), Image Stitching (IS), Virtual Fitting (VF), VR, Kinect Fusion (KF), 3D Reconstruction (DR)
- Subject connection
Image processing: Digital image processing employs digital computers to analyze and generate digital images through algorithms. As a specialized field within digital signal processing, digital image processing offers numerous benefits compared to analog image processing.
Computer Graphics: Computer graphics acts as a specialized discipline that utilizes computers to generate images.
Pattern Recognition: Pattern recognition refers to the automatic detection of recurring patterns and regularities in data.
Visual Computing: Visual Computing is considered an interdisciplinary scientific field that focuses on how computers are able to achieve advanced understanding from multimedia sources.
Difference between Computer Vision and Machine Vision: Computer vision focuses on the automation of image capture and processing, emphasizing image analysis. To elaborate, computer vision aims not just at observation (seeing), but also at processing such observations to derive meaningful outcomes. Machine vision, on the other hand, involves applying computer vision techniques within industrial settings, thereby constituting a specialized branch within computer vision.
Artificial intelligence: Within computer science, AI research is defined as the study of intelligent agents—any system functioning within an environment with the aim of perceiving it and acting upon it in ways that maximize its potential for successful goal achievement.[1] A more detailed explanation characterizes AI as a system's capacity to process external information accurately, learn from such data, and utilize those insights to attain specific objectives and perform tasks through adaptive mechanisms.
- Vision Process
- Feature extraction and region segmentation (low level).
- Model construction and pattern representation (middle level).
- Description and understanding (high level).
- Difficulties faced by Machine Vision
- Image ambiguity: when a 3D scene is projected onto a 2D image, depth and unobservable parts are lost, so 3D objects with different shapes can produce identical projections on the image plane.
- Environmental factors: lighting conditions, object shapes, surface colors, camera parameters, and the spatial relationships between camera and scene all influence the resulting image.
- Knowledge guidance: depending on the knowledge context, the same image can lead to different interpretations.
- Large amounts of data: gray-scale, color, and depth images all carry a huge amount of information, which requires substantial storage and makes fast processing difficult.
- Human Vision System
Physical structure: The Human Visual System (HVS) is built from optical components, the retina, and the visual processing pathway.
Note: the details of the HVS are skipped for now; if more time is available, the remaining HVS material will be covered later.
- Key technologies in a computer vision system
- Image processing (noise reduction by smoothing, normalization, handling of missing values and outliers)
- Feature extraction from images (shape, texture, color, and spatial-relationship features)
- Image recognition (models such as GoogLeNet and ResNet)
- Image formation
The imaging process is complex, and its outcome is inherently stochastic.
An image basically consists of:
- Illumination component i(x, y)
- Reflection component r(x, y)
So the 2D function representation of the image is:
f(x, y) = i(x, y) * r(x, y)
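As a small sketch of this multiplicative model (the array sizes and the illumination/reflectance values below are made-up illustrative choices, not taken from these notes):

```python
import numpy as np

# Illumination i(x, y): a smooth horizontal gradient simulating uneven lighting.
i = np.linspace(0.2, 1.0, 256).reshape(1, -1).repeat(256, axis=0)

# Reflectance r(x, y): surface albedo in [0, 1], here a bright square on a dark background.
r = np.full((256, 256), 0.3)
r[96:160, 96:160] = 0.9

# Image formation: f(x, y) = i(x, y) * r(x, y)  (element-wise product, not convolution).
f = i * r
print(f.min(), f.max())   # brightness depends on both illumination and reflectance
```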
- Human eye brightness perception range
The total dynamic range (TDR) of this system spans from 1×10⁻² to 1×10⁶, resulting in a contrast ratio c defined as c = B_max / B_min, which equals approximately 1×10⁸. The relative contrast percentage, denoted as c_r, is calculated using the formula c_r = (B - B₀) × 100% / B₀, where B₀ represents the background brightness level and B corresponds to the object's brightness level.
Correlation between subjective brightness denoted by S and actual brightness denoted by B:
S = K \ln{B} + K_0
- Brightness adaptability
Vision is sensitive to contrast rather than to absolute brightness.
Weber's law:
If an object's brightness differs from the surrounding background brightness I by \Delta I, the just-noticeable ratio \Delta I / I is approximately constant over a wide range of brightness; its value is about 0.02 and is called the Weber ratio.
\frac{\Delta I}{I} \approx 0.02
Mach band effect: the visual system is less sensitive to very high and very low spatial frequencies and more sensitive to intermediate spatial frequencies. As a result, a brightness overshoot appears at abrupt changes in brightness, which enhances the outlines perceived by the human eye.
- Color imaging model
Light itself has no color; color is a physiological and psychological phenomenon produced by human vision.
Light wave: light is an electromagnetic wave whose radiated energy depends on its wavelength.
The Young–Helmholtz theory (trichromatic theory): The three types of cone photoreceptors can be categorized into short-wavelength preferring (violet), medium-wavelength preferring (green), and long-wavelength preferring (red) cone photoreceptors.
- Color property
Hue: the degree to which a stimulus can be described as similar to, or different from, stimuli described as red, green, blue, and yellow.
Saturation: the colorfulness of an area judged in proportion to its brightness (the ratio of colorfulness to brightness).
Intensity: the degree of brightness the human eye perceives from a color stimulus.
Grassmann's laws:
- First law: two colors appear different if they differ in dominant wavelength, luminance, or purity. Corollary: every color has a complementary color; mixing a color with its complement desaturates it until it becomes achromatic (gray/white) light.
- Second law: when either of two mixed components changes, the color of the mixture changes. Corollary: mixing two non-complementary colors produces an intermediate hue that depends on their proportions, and a saturation that depends on the distance between the two hues.
- Third law: light sources with different spectral power distributions can look identical. Corollary 1: such visually identical sources produce the same result when mixed with any other source. Corollary 2: they also produce the same result when subtracted (filtered out) from any other source.
- Fourth law: the luminance of a mixture equals the sum of the luminances of its components.
- Color
the outcome of interaction between the light from the environment and our visual system
- Color Space
- Linear color space
- RGB color space
- HSV color space
- CIE XYZ
- White Balance
White balance (WB) is the process of removing color casts so that objects that are white in reality are rendered white in the image.
Color temperature characterizes the distribution of light energy radiated by a blackbody at a given surface temperature.
Von Kries adaptation:
- Apply a scaling factor to each channel.
- The more general transform is an arbitrary 3\times 3 matrix.
Best way: gray card:
- Take a picture of a neutral object.
- Deduce the weight of each channel from it.
Brightest pixel assumption (non-saturated):
- Highlights usually have the same color as the light source.
- Weight each channel in inverse proportion to the value of its brightest pixels (a small per-channel scaling sketch follows this list).
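A minimal sketch of this per-channel (Von Kries-style) correction, covering both the gray-world and the brightest-pixel assumptions; the function name, the random stand-in image, and the use of NumPy are assumptions made for illustration:

```python
import numpy as np

def white_balance(img, method="gray_world"):
    """Von Kries-style correction: one scaling factor per color channel."""
    img = img.astype(np.float64)
    if method == "gray_world":
        gains = img.mean(axis=(0, 1))           # assume the average scene color is neutral
    else:
        gains = img.reshape(-1, 3).max(axis=0)  # brightest pixels ~ color of the light source
    gains = gains.mean() / gains                # weights inversely proportional to channel response
    balanced = img * gains
    return np.clip(balanced, 0, 255).astype(np.uint8)

photo = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)   # stand-in for a real photo
corrected = white_balance(photo, method="brightest_pixel")
```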
Gamut mapping
- Gamut: the set of all pixel colors that occur in an image.
- Determine a transformation that maps the gamut of this image onto the gamut of a reference image taken under white light.
- Mathematical representation of an image
An image records the optical power at wavelength λ absorbed at position (x, y) of the camera's imaging surface at time t: I = f(x, y, λ, t). Typical image formats include:
- Binary image
- Grayscale image
- Indexed image
- RGB image
- Common concepts
pixel neighborhood: 4-neighborhood(N_{4}(p)), 8-neighborhood(N_{8}(p));
pixel adjacency ===> pixel connectivity;
Template(filter, mask) + convolution ===> filtering, smoothing, sharpening;
Convolution operation properties:
- Smoothing: the fine structure of the functions is smoothed out.
- Diffusion: the support (interval) expands and the energy distribution spreads out.
Application of convolution:
- Deconvolution
- Remove noise
- Feature enhancement
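A minimal sketch of template-plus-convolution filtering as described above (the 3×3 averaging kernel, the random test image, and the use of scipy.ndimage are illustrative assumptions):

```python
import numpy as np
from scipy import ndimage

img = np.random.rand(128, 128)                # stand-in grayscale image

# A 3x3 averaging template (mask); convolving with it smooths the image.
kernel = np.ones((3, 3)) / 9.0
smoothed = ndimage.convolve(img, kernel, mode="reflect")

# Diffusive property: energy spreads out, so local fluctuations are attenuated.
print(img.std(), smoothed.std())              # the smoothed image has lower variance
```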
- Pixels distance
Properties of a distance (metric) function:
- D(p, q) \geq 0 [non-negativity / separation axiom]
- D(p, q) = 0 \iff p = q [identity of indiscernibles]
- D(p, q) = D(q, p) [symmetry]
- D(p, r) \leq D(p, q) + D(q, r) [subadditivity / triangle inequality]
Common distance metric functions:
- Euclidean distance: D_{E}(p, q) = \sqrt{(x - s)^2 + (y - t)^2}
- City-block (Manhattan) distance: D_{4}(p, q) = |x - s| + |y - t|
- Chessboard distance: D_{8}(p, q) = \max(|x - s|, |y - t|)
- p-norm: \|x\|_p = \left(\sum_i |x_i|^p\right)^{1/p}
Frobenius norm: \|A\|_F = \sqrt{\sum_{i}\sum_{j} |a_{ij}|^2}
In an image, an L2-norm constraint cannot distinguish the edge tangent direction from the gradient direction, nor textured regions from flat regions; as a result, edges tend to be blurred by this kind of restoration.
An L1-norm constraint diffuses only along the tangential direction of edges, which helps preserve image edges, but noise suppression is weaker and staircase-like patch artifacts can appear.
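A small sketch computing the three pixel distances defined above for two pixels p = (x, y) and q = (s, t); the coordinates are arbitrary example values:

```python
import numpy as np

def pixel_distances(p, q):
    """Euclidean (D_E), city-block (D_4) and chessboard (D_8) distances."""
    (x, y), (s, t) = p, q
    d_e = np.hypot(x - s, y - t)            # Euclidean distance
    d_4 = abs(x - s) + abs(y - t)           # city-block (Manhattan) distance
    d_8 = max(abs(x - s), abs(y - t))       # chessboard distance
    return d_e, d_4, d_8

print(pixel_distances((0, 0), (3, 4)))      # (5.0, 7, 4)
```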
- Statistical characteristics of images
Information entropy: H = -\sum_{i=1}^k p_i \log_2{p_i}
Mean gray level: \bar{f} = \frac{\sum_{i=0}^{M-1}\sum_{j=0}^{N-1}f(i,j)}{MN}; it reflects the average reflected intensity of the regions in the image.
Mode of the gray levels
Median gray level
Gray variance: S = \frac{\sum_{i=0}^{M-1}\sum_{j=0}^{N-1}[f(i, j) - \bar{f}]^2}{MN}
Grayscale range: f_{range}(i, j) = f_{max}(i, j) - f_{min}(i, j)
Covariance measures the correlation between two M \times N images f and g. The covariance S_{fg} (= S_{gf}) is S_{fg} = \frac{1}{MN}\sum_{i=0}^{M-1}\sum_{j=0}^{N-1}[f(i, j) - \bar{f}][g(i, j) - \bar{g}].
Correlation coefficient: r_{fg} = \frac{S^2_{fg}}{S_{ff}S_{gg}}
A histogram is a function that counts the number of pixels at each gray level, i.e., the frequency of each gray level in the image; histograms are additive.
Integral optical density (IOD) combines image area and density: IOD = \int_{0}^{x_{max}} \int_{0}^{y_{max}} D(x, y)\, dx\, dy. For a target defined by threshold M, the mean gray level within its boundary is MGL = \frac{IOD(M)}{A(M)} = \frac{\int_{M}^{\infty} D H(D)\, dD}{\int_{M}^{\infty} H(D)\, dD}
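A minimal sketch computing some of these statistics (histogram, entropy, mean, and variance) for a grayscale image; the random 8-bit test image is an assumption:

```python
import numpy as np

img = np.random.randint(0, 256, (100, 100))           # stand-in 8-bit grayscale image

hist, _ = np.histogram(img, bins=256, range=(0, 256))
p = hist / hist.sum()                                  # gray-level probabilities p_i
entropy = -np.sum(p[p > 0] * np.log2(p[p > 0]))        # H = -sum p_i log2 p_i

mean = img.mean()                                      # mean gray level
variance = ((img - mean) ** 2).mean()                  # gray variance S
print(entropy, mean, variance)
```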
- Method for converting color image into grayscale image
- Weighted average method
- Average method
- Maximum method
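A minimal sketch of the three conversion methods listed above (the Rec. 601 luma weights used in the weighted-average variant are a common choice, not something specified in these notes):

```python
import numpy as np

rgb = np.random.randint(0, 256, (64, 64, 3)).astype(np.float64)   # stand-in color image
r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]

gray_weighted = 0.299 * r + 0.587 * g + 0.114 * b    # weighted average method
gray_average = rgb.mean(axis=2)                       # average method
gray_maximum = rgb.max(axis=2)                        # maximum method
```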
- Image enhancement
Enhance or refine specific characteristics of an image, such as edges, contours, and contrast, so that the image is displayed clearly for observation and is suitable for further analysis and processing.
- Basic operations on digital images
- point operations
- algebraic operations: remove superimposed noise; produce image-overlay effects
- logical operations
- geometric transformations
- Interpolation
  - Nearest-neighbor: the simplest spatial interpolation method.
  - Bilinear: a widely used method for image scaling (a small resizing sketch follows this list).
  - Bicubic: produces smooth and accurate results.
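A minimal bilinear resizing sketch (a direct NumPy implementation written for illustration; the function and variable names are not from the notes):

```python
import numpy as np

def resize_bilinear(img, new_h, new_w):
    """Scale a grayscale image with bilinear interpolation."""
    h, w = img.shape
    ys = np.linspace(0, h - 1, new_h)
    xs = np.linspace(0, w - 1, new_w)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]                      # vertical interpolation weights
    wx = (xs - x0)[None, :]                      # horizontal interpolation weights
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bottom = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy

small = np.arange(16, dtype=float).reshape(4, 4)
big = resize_bilinear(small, 8, 8)               # 4x4 image scaled up to 8x8
```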
- Image noise
Image noise refers to stochastic fluctuations in the tone and hue details of images, which is typically categorized as a component of electronic noise.
Effects: 1. blurs the image; 2. can overwhelm the visual content; 3. makes image analysis more difficult.
Features:
- Irregular (random) distribution and magnitude
- Correlation between the noise and the image
- The additive (superposition) property of noise
Noise classification:
- Additive and multiplicative noise
  - Additive noise typically refers to thermal noise and flicker noise.
  - Multiplicative noise is usually caused by channel imperfections; its randomness results from time-varying system behavior (such as fading or Doppler effects) or from nonlinearity.
- External noise and internal noise
- Stationary and non-stationary noise
Image noise model:
Gaussian noise
The probability density function p for a Gaussian random variable z is described by the following equation:
p_G(z) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(z - u)^2}{2\sigma^2}}
Salt-and-pepper noise (impulse noise)
Other models: Rayleigh noise, Erlang (gamma) noise, exponential noise, uniform noise, etc. (a small sketch of the Gaussian and salt-and-pepper models follows).
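A minimal sketch of the Gaussian and salt-and-pepper models (noise levels and the uniform test image are arbitrary illustrative values):

```python
import numpy as np

img = np.full((128, 128), 128.0)                  # stand-in uniform gray image

# Gaussian noise: add samples drawn from N(0, sigma^2) to every pixel.
gaussian_noisy = img + np.random.normal(loc=0.0, scale=15.0, size=img.shape)

# Salt-and-pepper (impulse) noise: flip a small fraction of pixels to 0 or 255.
sp_noisy = img.copy()
mask = np.random.rand(*img.shape)
sp_noisy[mask < 0.02] = 0                         # pepper
sp_noisy[mask > 0.98] = 255                       # salt
```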
Common noise removal methods:
- Image enhancement techniques ( enhance the visual clarity of images )
- Image restoration and recovery processes ( mitigating the impact of noise interference )
Classification of image denoising algorithms:
- Spatial-domain filtering
- Transform-domain filtering
- Morphological denoising (opening and closing operations can remove noise)
- Partial differential equation (PDE) methods (anisotropic PDEs can preserve edges while removing noise from fine detail)
- Variational models (build an energy functional of the image and find the minimizer, which yields a smoothed image)
- Image filtering
Image filtering emphasizes the spatial information within an image, reducing or eliminating noise and irrelevant data, serving as a technique for image correction or enhancement. The fundamental aspect of image filtering is the implementation of a neighborhood operation.
Classification:
- Spatial-domain filtering (local neighborhood operations implemented with a window or convolution kernel)
- Frequency-domain filtering (operates on the Fourier spectrum of the image; acts over a larger spatial extent and can, for example, remove periodic noise)
- Spatial filter
- Gaussian spatial filtering ———— smoothing and noise reduction
- Mean filtering ———— neighborhood average
- Median filtering ———— neighborhood median
- Sharpening spatial filters ———— enhance image detail or sharpen blurred detail
  - Laplace operator
  - Sobel operator
Definition of linear filters:
A linear filter processes a time-varying input signal (such as audio or video) to produce an output signal, subject to the constraint of linearity: the response to a weighted sum of inputs equals the same weighted sum of the individual responses.
Primary linear spatial filter:
- Low-pass filters: Effectively smooth images while removing noise
- High-pass filters: Perform edge enhancement and edge extraction
- Band-pass filters: Capable of removing specific frequencies while being less commonly employed in image enhancement
Primary non-linear spatial filter:
- Median filtering: smooths the image and removes impulse noise while preserving detail (a small comparison with mean filtering follows this list).
- Maximum filtering: finds the brightest points in an image.
- Minimum filtering: finds the darkest points in an image.
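A minimal comparison of the nonlinear median filter with a linear mean filter on impulse noise (the noisy test image and the use of scipy.ndimage are assumptions):

```python
import numpy as np
from scipy import ndimage

img = np.full((64, 64), 100.0)
img[np.random.rand(64, 64) < 0.05] = 255                # sprinkle impulse (salt) noise

mean_filtered = ndimage.uniform_filter(img, size=3)     # linear: spreads the impulses out
median_filtered = ndimage.median_filter(img, size=3)    # nonlinear: largely removes them

print(np.abs(mean_filtered - 100).max(), np.abs(median_filtered - 100).max())
```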
Main purpose of the smoothing filter:
- Remove insignificant fine detail before larger-scale image processing
- Bridge small gaps in lines and curves
- Reduce noise
- Soften over-sharpened regions
- Image synthesis: create shading, soft edges, and hazy effects
- Smoothing Filter —— Mean Filter
The neighborhood averaging method assumes that the image exhibits strong spatial correlation between adjacent pixels and has relatively independent noise components.
Advantage:
- Standard linear denoising approach
- Straightforward and efficient, capable of effectively eliminating Gaussian noise.
Disadvantage:
While suppressing noise it also blurs the image, especially at edges and fine details; the larger the neighborhood, the stronger the denoising but also the more severe the blurring.
Improvement:
- Address the limitations of simple neighborhood averaging.
- The key issues are how to choose the neighborhood's size, shape, and orientation, how many points take part in the average, and how to weight each point within the neighborhood.
- Smoothing Filter —— Order Statistical Filter
Main characteristics:
- Invariance to certain input signals: monotonically increasing/decreasing and periodic signals pass through unchanged.
- Denoising capability.
- Nonlinearity: there is no simple frequency-domain characterization.
- Excellent at reducing impulse noise while preserving image detail.
- Differential filter —— first order differential
Horizontal direction: \frac{\partial f}{\partial x} \approx f(i + 1, j) - f(i, j); vertical direction: \frac{\partial f}{\partial y} \approx f(i, j + 1) - f(i, j)
Post-processing:
Either add a positive constant to all output values so they remain non-negative, or take the absolute value of every output.
- Undirected first order sharpening
Cross Differential Algorithm (Roberts Algorithm):
g(i, j) = |f(i + 1, j + 1) - f(i, j)| + |f(i+1, j) - f(i, j+1)|
Sobel sharpening:
g(i, j) = \sqrt{d_x^2(i, j) + d_y^2(i, j)}, \quad
d_x = \left[\begin{array}{ccc} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{array}\right], \quad
d_y = \left[\begin{array}{ccc} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{array}\right]
Prewitt operator has been widely used for its excellent performance in edge detection.
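A minimal Sobel gradient-magnitude sketch using the d_x and d_y templates above (the synthetic step-edge image and the use of scipy.ndimage are illustrative assumptions):

```python
import numpy as np
from scipy import ndimage

img = np.zeros((64, 64))
img[:, 32:] = 255                                # vertical step edge

dx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
dy = dx.T
gx = ndimage.convolve(img, dx)                   # response to the horizontal template
gy = ndimage.convolve(img, dy)                   # response to the vertical template
g = np.hypot(gx, gy)                             # gradient magnitude g(i, j)
```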
Three guidelines:
- Low error rate: the edge detector should respond to all real edges and only to edges; non-edges must be rejected.
- Good edge localization: the distance between a detected edge and the true edge should be as small as possible.
- Single response: a single edge should produce only one response, not multiple responses.
Canny edge detection algorithm:
- Step 1: Smooth the image with a Gaussian filter.
- Step 2: Compute the gradient magnitude and direction using finite differences of the first-order partial derivatives.
- Step 3: Apply non-maximum suppression to the gradient magnitude.
  - Check whether the value of pixel C is the maximum along the gradient direction within its 8-neighborhood.
  - If C's value is lower than either of the two neighboring pixels along the gradient direction, C is not a local maximum and is discarded as an edge point.
  - After non-maximum suppression, a binary image is obtained in which non-edge points have gray value 0.
- Step 4: Detect and link edges with a double-threshold algorithm.
  - Set two thresholds th_1 and th_2 on the non-maximum-suppressed image, with th_1 = \frac{2}{5} \times th_2.
  - Set to zero every pixel whose gradient value is below th_1 to obtain image 1.
  - Set to zero every pixel whose gradient value is below th_2 to obtain image 2.
  - Use image 1 to complete and connect the edge contours of image 2.
- Connect edges:
  - Step 1: Scan image 2. When a non-zero pixel p(x, y) is encountered, trace the contour starting at p(x, y) until its endpoint q(x, y).
  - Step 2: For the endpoint q(x, y) of image 2, examine the corresponding neighborhood in image 1. If a non-zero pixel s(x, y) exists in that neighborhood, add it to image 2 as r(x, y) and repeat Step 1 starting from r(x, y).
  - Step 3: When all pixels along the contour have been processed, mark the contour as visited and return to Step 1 to find the next contour.
  - Step 4: Repeat Steps 1-3 until no new contour is found in image 2.
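A minimal usage sketch of the whole pipeline with OpenCV's cv2.Canny; the synthetic image, the pre-smoothing parameters, and the two thresholds (kept at the 2:5 ratio described above) are illustrative assumptions:

```python
import numpy as np
import cv2

# Synthetic test image: a bright rectangle on a dark background.
img = np.zeros((128, 128), dtype=np.uint8)
cv2.rectangle(img, (32, 32), (96, 96), 200, -1)
img = cv2.GaussianBlur(img, (5, 5), 1.4)          # step 1: Gaussian smoothing

th2 = 100
th1 = int(0.4 * th2)                              # th1 = (2/5) * th2
edges = cv2.Canny(img, th1, th2)                  # steps 2-4: gradient, NMS, double threshold
```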
- Differential filter —— second order differential
Laplace sharpening operator:
The Laplacian operator is a scalar rather than a vector; it is linear and rotation invariant (isotropic), which is why it is widely used in image processing.
\frac{\partial^2 f}{\partial x^2} = f(i+1, j) - 2 f(i, j) + f(i-1, j)
\nabla^2f = \frac{\partial^2f}{\partial x^2} + \frac{\partial^2f}{\partial y^2}
Common Laplacian operator templates:
An isotropic filter's response is independent of the direction of the discontinuities in the image being filtered.
The edge-sharpening effect can be compared across several templates, e.g. the 4-neighborhood mask H_1 = \left[\begin{array}{ccc} 0 & 1 & 0 \\ 1 & -4 & 1 \\ 0 & 1 & 0 \end{array}\right]
The Laplacian operator is more sensitive to noise than first-order operators, and it produces a double response at some edges in the image.
Improvement strategy:
- Improve edge detection based on the characteristics of human vision.
- Smooth the image first and then form a new template; combining the Laplacian operator with a smoothing filter makes edge detection more robust (a small sharpening sketch follows this list).
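A minimal sharpening sketch consistent with the 4-neighborhood discretization above, smoothing first as the improvement strategy suggests (the test image and the use of scipy.ndimage are assumptions):

```python
import numpy as np
from scipy import ndimage

img = np.random.rand(128, 128) * 255

smoothed = ndimage.gaussian_filter(img, sigma=1.0)              # smooth before differentiating
lap_mask = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], float)  # 4-neighborhood Laplacian mask
lap_response = ndimage.convolve(smoothed, lap_mask)

sharpened = np.clip(smoothed - lap_response, 0, 255)            # subtract because the center weight is -4
```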
- Laplacian of Gaussian (LoG)
First smooth the image with a Gaussian function and then apply the Laplacian operator; this yields the Laplacian-of-Gaussian (LoG) operator:
LoG(x, y) = \left( \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} \right) \frac{1}{2 \pi \sigma^2} e^{ - \frac{x^2 + y^2}{2 \sigma^2} } = \frac{-1}{2 \pi \sigma^4} \left(2 - \frac{x^2 + y^2}{\sigma^2}\right) e^{ - \frac{x^2 + y^2}{2\sigma^2}}
The main idea and steps of the algorithm:
- Filtering: first smooth the image f(x, y); the smoothing filter is chosen as a Gaussian, in line with the characteristics of human vision.
- Enhancement: apply the Laplacian to the smoothed image g(x, y).
- Detection: the edge criterion is the zero crossing of the second derivative (h(x, y) = 0), which corresponds to a maximum of the first derivative (a small LoG sketch follows this list).
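A minimal LoG sketch using scipy.ndimage.gaussian_laplace followed by a simple zero-crossing test; the sign-change check shown is one common heuristic, and the synthetic image is an assumption:

```python
import numpy as np
from scipy import ndimage

img = np.zeros((128, 128))
img[40:90, 40:90] = 255                           # synthetic bright square

log = ndimage.gaussian_laplace(img, sigma=2.0)    # filtering + enhancement in one step

# Detection: mark pixels where the LoG response changes sign against a right or lower neighbor.
s = np.sign(log)
zero_cross = (s[:-1, :-1] != s[:-1, 1:]) | (s[:-1, :-1] != s[1:, :-1])
```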
- Wallis algorithm
Since human visual perception is approximately logarithmic, logarithmic processing can be used to improve the sharpening:
g(i, j) = \log{f(i, j)} - \frac{1}{4}S, \quad S = \log{f(i-1, j)} + \log{f(i+1, j)} + \log{f(i, j-1)} + \log{f(i, j+1)}
Notes:
- To avoid taking the logarithm of zero, $\log(f(i,j)+1)$ is used in practice.
- Because $\ln(256) \approx 5.55$ is relatively small, the result is rescaled, e.g. $46 \times \log(f(i,j)+1)$, to cover the full output range.
- Comparison of the Sobel and Laplacian operators
The boundaries produced by the Sobel operator are not very smooth and carry less edge detail, but they are clearer. The boundaries produced by the Laplacian operator carry a great deal of detail, but they are less clear.
- Comparison of spatial-domain sharpening methods
As first-order differential operators, the Prewitt and Sobel operators attenuate noise to some extent; the Sobel operator is more sensitive to gradient changes than the Prewitt operator and therefore detects edges better. The Laplacian operator is a second-order derivative operator that responds strongly both to gradient changes and to noise or other disturbances.
- Frequency domain enhancement principle
What the frequency domain is: the space defined by the frequency variables (u, v).
A function f(t) with period T that satisfies the Dirichlet conditions on the interval $[-T/2, T/2]$ can be expanded as a Fourier series on that interval: $f_T(t) = \frac{a_0}{2} + \sum_{n=1}^{\infty}(a_n\cos(n\omega t) + b_n\sin(n\omega t)) = \sum_{n=-\infty}^{\infty}c_ne^{jn\omega t}$
The Dirichlet conditions are:
- a finite number of discontinuities of the first kind within one period
- a finite number of maxima and minima within one period
- the signal is absolutely integrable
How fast the signal changes determines its frequency content:
- Noise, edges, and abrupt changes contribute mainly to the high-frequency components of the image.
- Background regions and slowly varying areas contribute mainly to the low-frequency components of the image.
Key features of frequency-domain processing of images:
- The total energy is preserved, but its distribution is changed.
- It helps extract specific features from the image.
- Efficient frequency-domain algorithms reduce the computational load.
The steps for frequency-domain image enhancement are:
1. Multiply the input image by $(-1)^{(x + y)}$ to center the transform.
2. Compute the 2D Fourier transform of the result to obtain F(u, v).
3. Multiply F(u, v) by the designed transfer function H(u, v): G(u, v) = H(u, v)F(u, v).
4. Compute the inverse Fourier transform of the result of step 3: $g(x, y) = F^{-1}[G(u, v)]$.
5. Take the real part of the result and multiply it by $(-1)^{(x+y)}$ to obtain the final g(x, y).
Design of the transfer function H(u, v):
- Filters are mostly designed empirically.
- The correspondence between frequency components and visual features of an image can guide the choice of filter.
- The design of two-dimensional digital filters is based on approximating mathematical and statistical criteria.
- Frequency domain filtering
Frequency-domain filtering emphasizes or attenuates particular spatial frequencies of an image; by modifying the frequency components it alters the pixel data, with the goals of reducing noise and enhancing image clarity.
Low-pass filters:
- Rectangular filter: $f(x)= \mathrm{rect}(\frac{x}{2a}) \Rightarrow F(s) = 2a \frac{\sin(2\pi as)}{2\pi as}$
- Triangular filter
- Ideal low-pass filter: $H(u, v) = 1$ if $D(u, v) \le D_0$ and $0$ otherwise, where $D(u, v) = \sqrt{u^2 + v^2}$
- Butterworth low-pass filter: $H(u, v) = \frac{1}{1 + [D(u, v)/D_0]^{2n}}$, where $n$ is the order of the filter
- Exponential low-pass filter: $H(u, v) = e^{-[D(u, v)/D_0]^{n}}$. The transition between low and high frequencies is smooth, with faster attenuation at high frequencies and slower attenuation at low frequencies; it denoises well while keeping blurring moderate.
- Gaussian low-pass filter (a special case of the exponential low-pass filter): $H(u, v) = e^{-D^2(u, v)/2\sigma^2}$
High-pass filters:
The high-contrast regions and distinctive features of an image are concentrated in the high-frequency band, and image blur is mainly caused by weak high-frequency components.
A high-pass transfer function can be obtained from a low-pass one: $H_{hp}(u, v) = 1 - H_{lp}(u, v)$. As $D_0$ increases, the degree of sharpening becomes more pronounced.
Butterworth high-pass filter: $H(u, v) = \frac{1}{1 + [D_0/D(u, v)]^{2n}}$
High-frequency enhancement filter: add a constant to the transfer function of a high-pass filter to bring back some of the low-frequency components.
High-boost filter: multiply the original image by an amplification factor A and subtract the low-pass image. The resulting high-boost image is $G_{HB}(u, v) = A \times F(u, v) - F_L(u, v) = (A-1)F(u, v) + F_H(u, v)$. When A = 1 this is an ordinary high-pass filter; when A > 1 it is a high-frequency enhancement filter.
Band-pass and band-stop filters:
An n-th order radially symmetric Butterworth band-stop filter (whose complement gives the Butterworth band-pass filter) is $H(u, v) = \frac{1}{1 + \left[\frac{D(u, v)W}{D^2(u, v)-D_0^2}\right]^{2n}}$, where W is the width of the stop band (the cutoff radius range) and $D_0$ is the center radius of the stop band.
Homomorphic filtering:
Homomorphic filtering processes the signal through a corresponding homomorphic system under a generalized superposition principle. It is a frequency-domain technique that separates signal components by compressing the dynamic range and increasing the contrast.
Basic idea: amplifying the high-frequency components while attenuating the low-frequency components increases the contrast and reduces the effect of multiplicative noise.
Steps:
- Logarithm: $f(x, y) = i(x, y) r(x, y) \Rightarrow \ln{f(x, y)} = \ln{i(x, y)} + \ln{r(x, y)}$
- Fourier transform: $F(u, v) = I(u, v) + R(u, v)$
- Apply the designed filter H(u, v): $H(u, v) F(u, v) = H(u, v)I(u, v) + H(u, v) R(u, v)$
- Inverse Fourier transform: $h_f(x, y) = h_i(x, y) + h_r(x, y)$
- Exponential: $g(x, y) = \exp{h_f(x, y)} = \exp{h_i(x, y)} \cdot \exp{h_r(x, y)}$
Design of H(u, v): $H_{homo}(u, v) = [H_H - H_L]H_{high}(u, v) + H_L$
- Comparison of frequency-domain and spatial-domain techniques
  - Basis: spatial-domain techniques operate on (subsets of) pixels at their spatial positions, while frequency-domain techniques analyze and process the global spectrum of the image.
  - Function: smoothing and sharpening filters in the spatial domain correspond to low-pass and high-pass filtering in the frequency domain.
  - Computation: spatial processing convolves the image with a template, while frequency-domain processing multiplies the spectrum by a transfer function.
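A minimal sketch of the five-step frequency-domain procedure above, using a Gaussian low-pass transfer function H(u, v); the test image and the cutoff D0 are illustrative assumptions:

```python
import numpy as np

img = np.random.rand(128, 128) * 255
M, N = img.shape

# Steps 1-2: center the spectrum with (-1)^(x+y) and take the 2D Fourier transform.
x, y = np.meshgrid(np.arange(N), np.arange(M))
F = np.fft.fft2(img * (-1.0) ** (x + y))

# Step 3: multiply by a Gaussian low-pass transfer function H(u, v) = exp(-D^2 / (2 D0^2)).
u, v = np.meshgrid(np.arange(N) - N / 2, np.arange(M) - M / 2)
D = np.sqrt(u ** 2 + v ** 2)
D0 = 20.0
G = np.exp(-(D ** 2) / (2 * D0 ** 2)) * F

# Steps 4-5: inverse transform, take the real part, undo the centering.
g = np.real(np.fft.ifft2(G)) * (-1.0) ** (x + y)
```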
- Short-time Fourier transform
A small window is slid along the signal and the transform is applied to the signal inside the window, which highlights the signal's local characteristics.
Defects:
- The size and shape of the window are fixed and independent of time and frequency, which makes non-stationary signals hard to analyze.
- High-frequency components last only briefly while low-frequency components persist for a long time; ideally a small time window would be used for high frequencies and a large one for low frequencies, which the STFT cannot provide.
- The lack of an orthogonal basis makes computation inconvenient.
- Wavelet Transform
Advantages:
- Inherits and improves the localization property of the STFT.
- Overcomes the drawback of a window size that does not change with frequency, and it does not lack a discrete orthogonal basis.
It extracts changes at a "specified time" and a "specified frequency" in a signal.
Orthogonal basis:
- Two different descriptions that are independent of each other are said to be orthogonal.
- If the basis can represent all objects completely, it is called a complete basis.
Wavelet: a special waveform with limited length and an average value of 0. Two key characteristics: a brief duration with sudden changes in frequency and amplitude, and an average that tends to zero over its finite support.
Features: 1. compact (or near-compact) support in the time domain; 2. an oscillating pattern that alternates between positive and negative values with a zero DC component.
Wavelet analysis:
- A signal is decomposed into a series of wavelets obtained by scaling and translating a mother wavelet, so the wavelets are the basis functions of the wavelet transform.
- The wavelet transform can be understood as the counterpart of the Fourier transform in which the sine and cosine waves are replaced by a family of scaled and translated wavelet functions.
The continuous wavelet transform (CWT) is $C(scale, position) = \int_{-\infty}^{+\infty}f(t)\psi(scale, position, t)dt$. The CWT yields a large number of wavelet coefficients C.
CWT steps:
- Take a wavelet and compare it with the leading portion of the signal.
- Compute the correlation coefficient C, which indicates how closely the wavelet matches this section of the signal.
- Shift the wavelet to the right and repeat steps 1 and 2 until the whole signal is covered.
- Scale the wavelet and repeat steps 1-3.
- Iterate over all scales.
Discrete Wavelet Transform (DWT): if both the scale and translation parameters are chosen as powers of 2 ($2^j$), the transform is called a dyadic wavelet transform, a special case of the DWT; usually "DWT" refers to this two-scale wavelet transform.
Mallat Algorithm: a signal decomposition method, also known as two-channel subband coding in digital signal processing. Dyadic wavelets are used in image edge detection and in image compression and reconstruction.
A low-pass filter produces the approximation value A (Approximations) of the signal; a high-pass filter produces the detail value D (Details), which corresponds to small scale factors.
Wavelet decomposition tree: filtering a real digital signal with this filter pair produces twice as much data as the original signal. Based on the Nyquist sampling theorem, downsampling is introduced: every other sample in each channel is kept. The resulting discrete wavelet transform coefficients are denoted cD and cA respectively.
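A minimal two-dimensional DWT sketch, assuming the PyWavelets package (pywt) is available; the Haar wavelet and the random test image are illustrative choices. The decomposition gives the approximation and detail subbands discussed above, and the inverse transform is included for completeness:

```python
import numpy as np
import pywt

img = np.random.rand(64, 64)

# Single-level 2D DWT: approximation cA plus horizontal/vertical/diagonal details.
cA, (cH, cV, cD) = pywt.dwt2(img, "haar")
print(cA.shape)                          # (32, 32): each subband is downsampled by 2

# Wavelet synthesis (reconstruction) from the same coefficients.
rec = pywt.idwt2((cA, (cH, cV, cD)), "haar")
print(np.allclose(rec, img))             # True up to floating-point error
```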
- Wavelet reconstruction
Reconstructing the signal from its wavelet transform coefficients is called wavelet reconstruction or wavelet synthesis. The approximation and the detail of the signal are obtained from the approximation and detail coefficients individually, by setting the other set of coefficients to zero.
Filter selection for reconstruction: quadrature mirror filters (QMF systems).
Wavelet packet:
- used for image compression
- details have the same properties as approximations, and both can be decomposed further
- an N-level decomposition produces $2^N$ different paths
- Commonly used wavelet functions
A wavelet function must satisfy two requirements: it must oscillate, and its amplitude must be confined to a brief interval of time (it is localized).
Haar wavelet: $\psi(t) = \begin{cases}1, & 0 \le t < 1/2\\ -1, & 1/2 \le t < 1\\ 0, & \text{otherwise}\end{cases}$, with $\hat{\psi}(\omega) = i\frac{4}{\omega}e^{-i\omega/2} \sin^2(\omega/4)$.
Advantages: good localization and high computational efficiency (fast algorithms); it performs well in image processing tasks such as edge detection and denoising, although it is limited for non-stationary signals with rapidly changing high-frequency content. It has compact support in the time domain with non-zero interval (0, 1), it is orthogonal and symmetric, and it is currently the only orthogonal wavelet that is both symmetric and compactly supported. Defect: it is discontinuous.
Daubechies wavelet: an orthogonal wavelet (also called the db wavelet).
Morlet wavelet: used in seismic signal analysis; a single-frequency complex sinusoid with a Gaussian envelope and good symmetry.
Gaussian wavelet: $\psi(t) = -\frac{1}{\sqrt{2\pi}} te^{-t^2/2}, \quad \hat{\psi}(\omega) = i\omega e^{-\omega^2/2}$; it has good symmetry and is used to extract step-type boundaries.
Marr wavelet (Mexican hat): $\psi(t) = \frac{2}{\sqrt{3\sqrt{\pi}}} (1 - t^{2}) e^{-t^{2}/2}, \quad \hat{\psi}(\omega) = \sqrt{\frac{8}{3}}\,\pi^{1/4}\,\omega^{2} e^{-\omega^{2}/2}$. It is the second derivative of the Gaussian function and plays a significant role in edge detection of signals and images; it is mainly used to extract roof-type boundaries and Dirac edges.
Features: exponential decay, no compact support; good localization in the time-frequency domain; symmetric about the 0 axis.
…and many more wavelets exist.
- Applications of the discrete wavelet transform in image processing
  - image feature extraction
  - image compression
  - data hiding and image watermarking
  - image fusion
- Image pyramid
An image pyramid is a sequence of images of progressively decreasing resolution, all derived from the same original image. Examples: the Gaussian pyramid and the Laplacian pyramid.
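A minimal Gaussian pyramid sketch using OpenCV's cv2.pyrDown; the number of levels and the synthetic image are assumptions:

```python
import numpy as np
import cv2

img = (np.random.rand(256, 256) * 255).astype(np.uint8)

pyramid = [img]
for _ in range(3):                         # three successively halved resolutions
    pyramid.append(cv2.pyrDown(pyramid[-1]))

print([level.shape for level in pyramid])  # (256, 256), (128, 128), (64, 64), (32, 32)
```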
