Color in Complex Scenes: The Paper and Its Translation

Color in Complex Scenes

Type: Academic Journal, Author: Steven K. Shevell, Link: sichub, Select: ★★★★★, Status: In progress, Notes: a review of research on how object color is described in complex scenes, also covering background hue and color constancy, Year: 2008

Color in Complex Scenes

Key Words
chromatic adaptation, equivalent-uniform-background hypothesis, color constancy, color and form, color and motion

Abstract
The appearance of an object or surface is largely shaped by light from other visible objects and surfaces. This review focuses on color perception in complex visual environments, where multiple regions of differing colors are simultaneously or successively present, as occurs in natural observation. Two key characteristics distinguish the chromatic representation resulting from a complex scene from that of an isolated light patch. First, in complex scenes, an object's color is not entirely determined by the light it emits that reaches the observer's eye. Second, the chromatic representation of a complex scene influences not just hue, saturation, and brightness but also other sensory dimensions such as shape, texture, orientation, contour, depth perception, and motion perception. These two characteristics form the foundation of this review, which explores color perception under varying spatial or temporal contextual conditions, including aspects of color constancy and how chromatic contributions shape our understanding of orientation and motion.

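The color of an object or surface is strongly influenced by light from other objects and surfaces. This review considers color perception when regions of several different colors are present in view, as is typical in natural viewing. First, in a complex scene the color of an object is not fully determined by the light reaching the eye from that object. Second, the chromatic representation of a complex scene affects not only hue, saturation, and brightness but also the perception of shape, texture, and object segmentation. These two properties frame the review, which considers color perception with context that varies over space or time, including color constancy, and the contribution of color to the perception of orientation, contour, depth, and motion.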

INTRODUCTION

Color vision offers many benefits. Beyond enriching visual experience, a color image conveys more information than a black-and-white one (see Figure 1a). Flowers that pass unnoticed in a black-and-white image stand out distinctly in color. Objects are also detected and recognized more readily in color than in monochrome, similar elements are grouped more easily, and scenes are remembered better in color than in grayscale.

During natural observation, colored objects are perceived within a complex natural environment. The neural mechanisms for color representation in intricate scenes are influenced by the intricate and often subtle characteristics of the natural environment. These mechanisms enable the extraction of an object's chromatic property from incident light entering the eyes, which is partially dependent on illumination sources and shading effects. Furthermore, these neural processes convey information about scene structure through space and time. In this review, we explore how contextual factors shape color perception and also examine how chromatic representations contribute to other visual perceptions such as shape, depth, movement, and object segmentation.

Color is Not in Light

The practice of describing light by its hue, for example a “yellow light,” confuses a perceptual phenomenon (the hue yellow) with the physical world (light). True, a particular physical wavelength near 580 nm appears yellow, 540nm lime green, 600nm orange, and 660nm red. But a yellow indistinguishable from 580 nm is seen also with a mixture of only 540nm and 600nm; or only 540 nm and 660nm; or an infinite number of other light mixtures. The identical color percept from all these physically distinct lights is mediated by identical neural responses evoked by the lights. The determination of color by neural responses to light—not apprehension of color within the light rays themselves—is a foundational and functional property of color vision.

The Significance of Context in Color Perception

The world we perceive would be chaotic if an object's color were determined solely by the wavelengths of light reaching the eye from it. A hen's egg that appears white at an outdoor market would appear yellow in a kitchen lit by a standard screw-in light bulb, and an eggplant that appears purple outdoors would appear orange indoors. This does not occur because the neural representation of color compensates for the change in reflected wavelengths when illumination shifts from outdoor to indoor light. More broadly, perceptual processes use both the spectral and the spatial distribution of light to keep object colors nearly stable despite variation in the wavelengths reaching the eye.

These neural processes cannot operate when light from an object is seen with no context. Imagine two sheets of paper: one is ordinary white paper, and the other is printed with ink that reflects primarily long wavelengths. In daylight, the first sheet appears white and the second appears red. However, if the white sheet is lit only by long-wavelength light and the printed sheet by daylight, both sheets send similar wavelengths to the eye. With no other information to distinguish them, the two sheets must appear identical. In natural viewing, light reflected from surrounding objects (that is, context) provides the information that lets us perceive the papers as different in color under varying illumination.

Classical theories of color vision primarily examine isolated beams of light that do not account for contextual information (Wright 1946). These theories highlight essential characteristics of photoreceptors, which convert physical light into neural signals during the initial phase of vision. However, context significantly influences responses within the retina, lateral geniculate nucleus, and visual cortex. This makes it understandable why nearby stimuli or previously viewed lights can alter the perceived color appearance of a fixed physical object. For instance, Figure 2 demonstrates how words like ANNUAL and REVIEW—printed in the same ink—exhibit different color appearances when placed in distinct contextual settings (similarly, OF and PSYCHOLOGY share a common ink). In natural settings, retinal images exhibit variegation across both space and time. Such intricate visual stimuli evoke neural processes underlying natural scene color perception but remain beyond the scope of analysis using isolated light patches.

Trichromacy and Why It Fails to Explain Color in Natural Viewing

Almost all human observers can match any light with a mixture of three primary lights. This is trichromacy, and it follows from neural coding: each uniform, isolated patch of light is encoded by exactly three distinct neural responses. If two physically different lights produce identical neural responses, they are indistinguishable. However, trichromacy does not explain what colors look like; it explains only which lights look alike.
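The matching idea can be sketched numerically. The example assumes toy Gaussian cone sensitivities (the peaks and widths are hypothetical, not measured cone fundamentals) and three monochromatic primaries at 450, 540, and 660 nm chosen only for illustration. It solves for the primary weights whose mixture gives the same three cone responses as a 580 nm test light.

```python
import numpy as np

# Wavelength samples (nm) across the visible range.
wl = np.arange(400, 701, 10)

def gaussian(peak, width):
    """Toy spectral sensitivity: a Gaussian over wavelength (an assumption)."""
    return np.exp(-0.5 * ((wl - peak) / width) ** 2)

# Hypothetical L-, M-, and S-cone sensitivities (rows); NOT measured fundamentals.
cones = np.stack([gaussian(565, 50), gaussian(535, 45), gaussian(440, 30)])

def monochromatic(nm):
    """Unit-energy light at one sampled wavelength."""
    return (wl == nm).astype(float)

def cone_responses(spectrum):
    """The three numbers (L, M, S) that encode an isolated light."""
    return cones @ spectrum

# Three illustrative primaries (columns) and a 580 nm test light.
primaries = np.stack(
    [monochromatic(450), monochromatic(540), monochromatic(660)], axis=1
)
test = monochromatic(580)

# Solve (cones @ primaries) @ weights = cones @ test for the three weights.
weights = np.linalg.solve(cones @ primaries, cone_responses(test))
match = primaries @ weights  # spectrum of the matching mixture

# A negative weight means that primary would be added to the test side
# of the match instead of to the mixture.
print("primary weights:", weights)
print("test responses: ", cone_responses(test))
print("match responses:", cone_responses(match))
```

The mixture and the test light have different spectra but identical cone responses, which is all that trichromacy asserts. Nothing in the calculation says what hue either light evokes.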

The distinction between the neural responses that determine trichromacy and the neural representation of hue is crucial yet often overlooked. The three types of cone photoreceptor that mediate color vision were formerly called red, green, and blue cones. Just as "yellow light" confuses a percept with physical light, "blue cone" confuses a percept with a specific type of neuron. This cone type responds to light well beyond 500 nm, including wavelengths that appear yellowish green in isolation. Nor does the percept of blue depend on "blue cone" activity alone; blue can be elicited by small, sharply focused retinal stimuli that do not fall on any such cone (Hofer et al. 2005). Modern terminology avoids color terms in receptor names; the cones are labeled L (long-wavelength sensitive), M (middle-wavelength sensitive), and S (short-wavelength sensitive). Trichromacy arises from the responses of these three cone types, not from the hues they evoke.

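Trichromacy implies that any isolated patch of light can be matched by a mixture of three primaries. The range of colors seen with all possible mixtures of the primaries, however, is small compared with the gamut of colors experienced in everyday life. This is because context not only changes color appearance (Figure 2) but also greatly expands the range of colors that can be seen. A mixture seen in isolation never appears reddish brown, dark brown, or gray, because these colors are seen only when other light is also present (that is, in context). Trichromacy also implies that two physically different lights that appear identical in the dark will appear identical in any shared context. The two lights evoke the same neural responses at the first stage of vision, so they can be substituted for each other (Grassmann 1853). Context can change the appearance of a light, but it does so equally for both lights; trichromacy therefore cannot reveal what the lights actually look like in a complex scene.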

To distinguish between light and the percept it evokes, light is described here by physical properties such as wavelength or spectral distribution. For example, light with equal energy at all wavelengths has a 'broadband' spectral distribution; terms like 'nonselective' are avoided for such distributions. Color names such as brown, red, green, pink, and white are reserved for percepts. The term 'achromatic' refers to percepts of white, gray, or black, and brightness refers to the perceived intensity of emitted light.

INTRODUCTION

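Viewing in color has many advantages. It enriches visual experience, and a color image carries more information than a black-and-white one (Figure 1a). Flowers on a bush may pass unnoticed as background detail in a monochrome image but are spotted quickly in color (Domini & Lucas, 2001; Mollon, 2000; Sumner & Mollon, 200). Objects are detected more easily in color than in monochrome (Mollon, 1989; Figure 1b), are easier to classify (Figure 1c), and are remembered better.

In natural viewing, colored objects are seen within a complex scene. When light enters the eye, neural processes transform the signals so that the chromatic and brightness properties of an object can be inferred from light that depends in part on the illuminant and on shading. These processes also convey how the scene is structured over space. This review considers how context affects color perception, and how color contributes to other aspects of vision such as shape, depth, motion, and segmentation.

Functions of color in vision. (a) The frog known as the blue-jeans frog (Dendrobates pumilio) is easily recognized by its color (photograph by D. Montero). (b) Color is very helpful for sorting and identifying fruit (photograph by D.I. Thompson).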

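Color is Not in Light

Describing light by its hue confuses two things: the color we perceive (the hue yellow) and the physical nature of light. It is true that light near 580 nm appears yellow, 540 nm appears lime green, 600 nm appears orange, and 660 nm appears red. But a yellow indistinguishable from 580 nm is also seen with a mixture of only 540 nm and 600 nm, with a mixture of only 540 nm and 660 nm, or with countless other mixtures. These physically different lights give the same percept because they evoke the same neural responses. The determination of color by neural responses to light, rather than by anything within the light rays themselves, is a fundamental functional property of color vision.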

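The Significance of Context in Color Perception

The world we see would be chaotic if the color of an object were determined directly by the wavelengths reaching the eye from it. An egg that looks white and an eggplant that looks purple in the shop would turn yellow and orange, respectively, in a kitchen lit by an ordinary screw-in bulb. This does not happen, because the neural representation of the colors of the egg and the eggplant compensates for the change in ambient light. More generally, perception draws on spectral information and its spatial distribution, so the colors we see indoors stay nearly constant even though the objects send different wavelengths to the eye.

These neural processes rely on information from the light that surfaces receive and cannot operate without context about the illuminant. Consider an ordinary white sheet of paper and a sheet printed with ink that reflects only long wavelengths. In daylight the two look clearly different: one white, the other red. If, however, the white sheet is lit with long-wavelength light only while the printed sheet is lit by daylight, the light reaching the eye from each has the same wavelengths. With no information to tell them apart, the two sheets must look the same. This does not arise in natural viewing, because light reflected from other objects in view supplies the information needed to distinguish the colors of the papers under different illumination.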

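Classical theories of color vision mainly concern isolated patches of light seen without context (Wright 1946). They reveal basic properties of the photoreceptors, which convert physical light into neural signals. Context, however, strongly alters responses at several stages of vision, including the retina, the lateral geniculate nucleus, and the visual cortex. It is therefore unsurprising that light viewed earlier or in neighboring regions can change the color appearance of a physically fixed stimulus. Figure 2 shows that the same ink takes on different color appearances in different contexts. In natural viewing, the retinal image is variegated in both space and time. Such complex stimuli engage the neural mechanisms of color perception in natural scenes, mechanisms that cannot be observed with an isolated patch of light.

ANNUAL and REVIEW are printed in the same ink (see the connecting bar at the W). The difference in their color comes from the different patterns in which they are embedded. Likewise, OF and PSYCHOLOGY are printed in the same ink (see the connecting bar below the F).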

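Trichromacy and Why It Fails to Explain Color in Natural Viewing

Nearly all human observers can match any spectral distribution of light with a mixture of three primary lights. This rests on neural coding: each uniform, isolated patch of light is encoded by three distinct neural responses. If two physically different spectral distributions produce the same three responses, they cannot be told apart. Trichromacy therefore explains which lights look alike, but it cannot explain what colors look like.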

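The distinction between the neural responses that determine trichromacy and the neural representation of hue is often overlooked. The three types of photoreceptor that mediate color vision were once misleadingly called red, green, and blue cones. Just as 'yellow light' confuses a percept with physical light, the term 'blue cone' confuses a percept with a particular type of neuron. This cone type does not respond only below 500 nm; its response extends well beyond that, to light that appears yellowish in isolation rather than blue. Conversely, a blue hue can be experienced with small, sharp retinal stimuli that fall on no so-called blue cone (Hofer et al. 2005). Modern terminology uses no color terms in receptor names: the cones are called L, M, and S for long-, middle-, and short-wavelength sensitive. Trichromacy results from the responses of the L, M, and S cones, not from the hues they evoke.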

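Trichromacy means that any isolated patch of light can be matched by a mixture of three primary lights. The range of colors seen with all mixtures of the primaries, however, is relatively small, because context not only changes color appearance (Figure 2) but also greatly expands the range of colors we can perceive. A mixture seen alone never appears brown, amber, or gray; these colors are seen only when other light is present (that is, in context). Trichromacy also implies that two physically different lights that look the same in isolation will look the same in any shared context, because the two lights evoke the same neural responses at the first stage of vision (Grassmann 1853). Context may change how the lights appear, but it does so equally for both; trichromacy therefore cannot reveal what lights actually look like in a complex scene.

To distinguish light from the percept it evokes, light is described here by its wavelength or spectral distribution. A spectral distribution with equal energy at every wavelength is called 'broadband.' Color names such as brown, red, green, pink, and white are reserved for percepts, and white, gray, and black are classed as achromatic. Luminance and radiance are physical measures of light, whereas brightness refers to the perceived intensity of emitted light.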

CONTEXT IN COMPLEX SCENES

Complex Context Cannot be Reduced to Simple Context

The simplest context is a uniform background field. It has been studied for more than a century (Chichilnisky & Wandell 1995; Jameson & Hurvich 1972; Shevell 1982; von Kries 1905; Walraven 1976; for a review, see Shevell 2003). Theories that explain color changes caused by uniform backgrounds could help explain the role of context in complex scenes if a complex context could be treated as a uniform one. This idea is the equivalent-uniform-background hypothesis.

The hypothesis has alternative formulations. The strongest holds that a complex context is equivalent to a uniform field at the space-average chromaticity and luminance. Here, averaging occurs at the level of physical light, which reduces the context to a trichromatic description of the space-averaged stimulus. A second version holds that the effect of each region in a complex background can be modeled as if that region were a uniform background, with the effects of the separate regions combined to give the influence of the whole context. This implies aggregation at a neural level. A third formulation holds that some (unspecified) uniform background can fully substitute for any complex context.

The equivalent-uniform-background hypothesis underlies several accounts of color perception (Bauml 1995; Brainard & Wandell 1992; Valberg & Lange-Malecki 1988), yet none of its versions agrees with observers' reports from complex scenes. The space-average version is tested with different nonuniform backgrounds that share the same space-average chromaticity and luminance. Such backgrounds still produce different color appearances, even when the judged region sits within a uniform surround so that contrast at its edge is constant (Barnes et al. 1984; Mausfeld & Andres 2003). Moreover, introducing retinal disparity, which changes the perceived depth relations in the scene, alters color perception without changing the space-average light reaching each eye (Yang & Shevell 2003).
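The logic of the space-average test can be illustrated with a small sketch. It assumes a background described as an 8 x 8 grid of patches, each a hypothetical (L, M, S) cone-excitation triple; the numbers are illustrative and not taken from any of the cited studies. Two backgrounds are built with the same space average, one uniform and one variegated, which is exactly the pairing the space-average version of the hypothesis predicts to be equivalent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical mean cone excitations (L, M, S) of the background.
mean_lms = np.array([0.6, 0.3, 0.1])

# Uniform background: every patch has the mean excitation.
uniform = np.ones((8, 8, 3)) * mean_lms

# Variegated background: the same mean plus zero-mean chromatic noise.
noise = rng.normal(0.0, 0.02, size=(8, 8, 3))
noise -= noise.mean(axis=(0, 1))  # force the spatial mean of the noise to zero
variegated = uniform + noise

def space_average(background):
    """Average cone excitation over all patches of the background."""
    return background.mean(axis=(0, 1))

print("uniform average:   ", space_average(uniform))
print("variegated average:", space_average(variegated))
print("variegated spread: ", variegated.std(axis=(0, 1)))
```

The two backgrounds are indistinguishable by their space average, so any difference in the color they induce must come from the spatial variation itself.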

Aggregating the neural influences from each distinct region fails the test of independence. Consider three lights: 540 nm (which appears yellowish green), 660 nm (which appears red), and a broadband light that appears white. Initial measurements establish the color change caused by a uniform background composed of only one of these lights. As expected, a uniform broadband background has little effect on perceived color. Next, consider two backgrounds, each with two separate regions: background (a) has a 540 nm region and a broadband region, and background (b) has a 660 nm region and the identical broadband region. Under independence, the broadband region should exert the same influence whether it is paired with 540 nm or with 660 nm; further, that influence should be negligible, given its small effect when presented alone.

However, measurements show a substantial change in color appearance when the broadband region is introduced within either the 540 nm or the 660 nm background. Furthermore, the color change caused by adding the broadband region to 540 nm [background (a)] is in the opposite direction to the change caused by adding it to 660 nm [background (b)]. Thus even simple backgrounds with two chromaticities violate the independence assumption. Independence also fails when colors induced by patterns at a single spatial frequency are compared with those induced by compound patterns containing several frequencies; the compound patterns reveal nonlinear spatial interactions (Zaidi et al. 1992).

The version holding that some uniform background is equivalent to any complex context cannot account for shifts in color appearance caused specifically by chromatic variegation within a scene (Brenner & Cornelissen 2002, Ekroll et al. 2004, Shevell & Wei 2000, Singer & D’Zmura 1994). For example, lights that appear green, red, blue, or yellow (rectangles in the upper panel of Figure 3) lose most of their color when viewed against a background pattern with strong chromatic contrast (lower panel; from Brown & MacLeod 1997). This gamut compression in high-contrast contexts persists even with a thin gray "grout" separating the elements of the background mosaic, which rules out local contrast at the edges of the rectangles as the cause of the color differences between the two panels of Figure 3. Furthermore, gamut compression is separable for color and for brightness (Brown & MacLeod 1997).

Color Perception with Context that Varies Over Space or Time

Context that varies over space or time can affect distinct processes at different levels of the visual system. Because of imperfections in the eye's optics, background light at spatial frequencies above about four cycles per degree alters the number of quanta absorbed by receptors in neighboring regions (Smith et al. 2001). The spread of light depends on the wavelengths in the stimulus and increases with spatial frequency (Marimont & Wandell 1994).

With complex context, distinct neural processes can be affected by both the mean and the variability of the context over time or space. For instance, consider a spatially uniform background that varies smoothly in time at 1 Hz between two chromaticities. Light at the physical average of the two appears nearly achromatic, while the colors of other lights are compressed along the axis of chromatic variation (Webster & Mollon 1994, 1995). Imagine a set of lights that all appear blue but differ slightly in hue because of relatively small differences in L-, M-, and S-cone stimulation. After adaptation to a steady uniform background at their average chromaticity, the lights appear in a range of hues distributed around white. If a temporally varying field with the same average chromaticity replaces the steady background, a further change occurs: the temporal variation compresses the range of the postreceptoral chromatic neural representation. Moreover, the compression is selective for nearly any direction of chromatic oscillation, which implies a neural representation that goes beyond the three dimensions of trichromacy.

Steady backgrounds with spatial variation, as in the lower panel of Figure 3, can produce compression comparable to that caused by temporal variation. When the chromaticities within a background mosaic are randomly selected from a line in color space, comparable compression is observed. Color-selective compression along any chromatic direction again indicates a nontrichromatic process within the cortex (Webster et al., 2002).

A central neural mechanism is indicated by chromatic spatial context presented to one eye, which causes a similar shift in the color of a light presented to either the same eye or the other eye (Shevell & Wei, 20xx). The color differences seen in Figure 2 may likewise reflect a cortical neural process. The spatial-frequency tuning and chromatic selectivity of these shifts implicate a receptive field with S-cone center-surround antagonism (+S/−S), as proposed by Shevell and Monnier. No retinal neurons with such a receptive field are known; however, physiological studies have identified cortical neurons with S-cone center-surround organization.

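Chromatic variation in the background affects perceived chromatic contrast as well as hue. A region with fixed L-M chromatic contrast appears lower in contrast within a background of high rather than low L-M contrast. The reduction in perceived contrast is selective, though not completely so: perceived L-M contrast is compressed more by an isoluminant background that varies the stimulation of the L and M cones than by one that varies only S-cone stimulation (Figure 4); similarly, S-cone contrast is compressed more by an isoluminant background with S-cone variation alone (Singer & D’Zmura 1994). Perceived chromatic contrast thus depends on the chromatic variation within the isoluminant background, which points to a central neural mechanism.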

Color variation in natural scenes highlights the functional importance of these neural processes. Direction-specific chromatic compression allows adaptation to the particular color gamut of an individual scene or natural environment. That gamut depends on both (a) the range of colors present across different natural settings and (b) scene-specific chromatic changes that occur under different illumination spectra.

Chromatic neural responses in the retina, lateral geniculate nucleus, and visual cortex have been studied extensively [reviewed by Gegenfurtner (2003) and Solomon & Lennie (2007)]. A common view is that cortical responses show a more diverse range of chromatic selectivity than the responses of subcortical neurons. These multiple levels of chromatic representation imply that a full understanding of color perception in complex scenes requires understanding the influence of context at every level.

CONTEXT IN COMPLEX SCENES

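Complex Context Cannot be Reduced to Simple Context

The simplest context is a uniform background field, which has been studied for more than a century (Chichilnisky & Wandell 1995; Jameson & Hurvich 1972; Shevell 1982; von Kries 1905; Walraven 1976; for a review, see Shevell 2003). The idea that theories of color change with uniform backgrounds can account for context in complex scenes is the equivalent-uniform-background hypothesis.

The hypothesis comes in alternative forms. The strongest holds that a complex background is equivalent to a uniform background with the same space-average chromaticity and luminance; averaging is done at the level of physical light, so the context reduces to a trichromatic description of the space-averaged stimulus. Another form holds that the color change from each region of a complex background can be represented by that region's effect as a uniform background, with the effects of the several regions combined in some way to explain the effect of the whole; this implies integration at a neural level. A third form proposes that some (unspecified) uniform background can completely replace any complex background.

Combining the neural influences from separate regions, however, fails the test of independence.

The proposal that some uniform background is equivalent to any complex background fails to explain changes in color appearance caused by chromatic variegation within the scene (Brenner & Cornelissen 2002; Ekroll et al. 2004; Shevell & Wei 2000; Singer & D’Zmura 1994). For example, lights of different chromaticities (rectangles at the top of Figure 3) lose most of their color when seen within a background pattern of strong contrast (bottom panel of Figure 3; from Brown & MacLeod 1997). Gamut compression still occurs with a high-contrast background even when thin gray gaps ("grout") separate the elements of the mosaic. This rules out local contrast at the edges of the rectangles as the cause of the color differences between the two panels of Figure 3. Notably, gamut compression is separable for color and for brightness.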

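Color Perception with Context that Varies Over Space or Time

Context that varies over space or time can affect separate processes at different levels of the visual system. At spatial frequencies above 4 cycles per degree, imperfections in the optics of the eye spread light, so that the background changes the quantity of light absorbed by receptors in nearby regions (Smith et al. 2001). The spread depends on the wavelengths in the stimulus and grows with spatial frequency (Marimont & Wandell 1994).

With complex context, the mean and the variability of the background over time and space affect separate neural processes. For example, with a spatially uniform background that oscillates slowly in time at 1 Hz between two chromaticities, light at the physical average appears nearly achromatic, while the colors of other lights are compressed along the axis of oscillation (Webster & Mollon 1994, 1995). Consider a set of lights that all appear blue but differ slightly in hue because of small differences in L-, M-, and S-cone stimulation. After adaptation to a steady uniform background at their average chromaticity, the lights appear in a range of different hues, which is well explained by adaptation that rescales the L-, M-, and S-cone responses. When a temporally oscillating field with the same average chromaticity replaces the steady background, however, the range of the postreceptoral chromatic neural representation is further compressed. Because the compression is selective for nearly any direction of chromatic oscillation, it indicates a higher-dimensional, nontrichromatic process, probably at a cortical level.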

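Spatial variation has a similar effect (lower panel of Figure 3): when the background is a mosaic with chromaticities drawn at random from a line in color space, a similar degree of compression results. Color-selective compression along any given chromatic direction indicates a nontrichromatic process (Webster et al. 2002).

A central neural mechanism is revealed when chromatic spatial context is presented to one eye: it causes the same shift in the color of a light seen by the same eye or by the other eye (Shevell & Wei 2000). The color differences in Figure 2 may thus reflect a cortical neural process. The spatial-frequency tuning and chromatic selectivity of the color shifts implicate a receptive field with an S-cone antagonistic center and surround (+S/−S) (Shevell & Monnier 2005). No neurons with this property have been found in the retina (Calkins 2001; Dacey 1996, 2000), but physiological studies confirm that such neurons exist in cortex (Conway 2001, Solomon et al. 2004).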

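Chromatic variation in the background affects perceived chromatic contrast as well as hue. A region with a fixed amplitude of L/M-cone chromatic contrast looks different depending on the amplitude of L/M variation in the background: perceived contrast is lower within a background of high L/M variation than within one of low L/M variation (Figure 4; Singer & D’Zmura 1994). The reduction is selective: perceived L/M contrast is compressed more by an isoluminant background with L/M variation than by an isoluminant background with only S-cone variation (Webster & Mollon 1997, Webster et al. 2002). The compression also transfers between the eyes: chromatic variation presented to one eye affects what is seen by the other (Webster & Mollon 1995). These findings suggest that adjusting chromatic sensitivity to the color variation of a particular scene and its illumination is a key part of adapting to natural visual environments.

The chromatically selective neural responses of the retina, lateral geniculate nucleus, and visual cortex have been studied systematically [reviewed by Gegenfurtner (2003) and Solomon & Lennie (2007)]. Cortical responses show greater diversity in chromatic selectivity than subcortical neurons. The several levels of chromatic representation mean that understanding color perception in complex scenes requires understanding the influence of context at each level.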

COLOR CONSTANCY

Most objects we see are visible only because they reflect light from an illuminant such as a lamp or the sun. When the lights go out or the day ends, objects become invisible. This reveals a fundamental problem: the light reflected from an object depends on both its spectral reflectance and the spectrum of the light illuminating it. In normal viewing, however, an object's color remains remarkably stable despite changes in illumination, as though it were determined by the object's reflectance alone. Color constancy refers to this stability of an object's perceived color across lighting conditions, despite changes in the spectrum of light reaching the observer. The egg and the eggplant discussed earlier, which keep their colors under different lighting, are examples.

Constancy would be a minor issue if changes in illumination caused only small differences in the light entering the eye, but this is not the case. If an object's color were determined solely by the light it reflects, perceived colors would vary dramatically with illumination. The Macbeth ColorChecker (Munsell Color Laboratory, New Windsor, New York) is a mosaic of colored patches that shows a full range of colors under daylight (Figure 5, top). Under an indoor tungsten light, the same mosaic reflects less short-wavelength light to the eye; this change would produce large shifts in perceived color if the light from each region alone dictated its color (Figure 5, bottom). In general, the change in receptoral stimulation from a single object due to everyday variation in illumination is comparable to the difference in receptoral stimulation between objects of contrasting hue (Shevell 2003).

Context provides biologically available information that is necessary, though not sufficient, for color constancy. Theories of constancy differ in the mechanisms by which context is held to influence the perceived color of objects.

Color Constancy is Imperfect

Perceived color lies closer to the color expected from the wavelengths reflected by the object than to perfect constancy.

Receptoral Responses are Always Ambiguous

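The light from an object absorbed by each type of cone (L, M, or S), denoted $Q_L$, $Q_M$, or $Q_S$, is

$$Q_L = \int E(\lambda)\,R(\lambda)\,q_L(\lambda)\,d\lambda, \qquad Q_M = \int E(\lambda)\,R(\lambda)\,q_M(\lambda)\,d\lambda, \qquad Q_S = \int E(\lambda)\,R(\lambda)\,q_S(\lambda)\,d\lambda,$$

where the integrals are taken over wavelengths $\lambda$ in the visible spectrum (400 to 700 nm); $E(\lambda)$ is the unknown energy of the illuminant at each wavelength; $R(\lambda)$ is the unknown proportion of incident light that the object reflects at each wavelength; and $q_L(\lambda)$, $q_M(\lambda)$, and $q_S(\lambda)$ are the spectral sensitivities of the L, M, and S cones. Only the spectral reflectance $R(\lambda)$ carries the object's intrinsic chromatic property; however, the biologically available information ($Q_L$, $Q_M$, and $Q_S$) depends on the product $E(\lambda)R(\lambda)$ at each wavelength.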

For natural scenes, the integrals yielding $Q_L$, $Q_M$, and $Q_S$ are closely approximated by sums over values taken at $10\ \text{nm}$ intervals (that is, at $400$, $410$, $420, \ldots$, $700\ \text{nm}$). In that case, $E(\lambda)$, $R(\lambda)$, $q_L(\lambda)$, $q_M(\lambda)$, and $q_S(\lambda)$ are each vectors of $31$ values, and the light absorbed by each cone type is

$$Q_k = \sum_{\lambda = 400,\,410,\,\ldots,\,700} E(\lambda)\,R(\lambda)\,q_k(\lambda), \qquad k \in \{L, M, S\}.$$

The sum makes clear that including more objects in view (that is, context), each with its own spectral reflectance $R(\lambda)$, does not by itself solve the constancy problem: each new object adds three measurable quantities ($Q_L$, $Q_M$, and $Q_S$) but also introduces an unknown reflectance vector of 31 values, from $R(400)$ to $R(700)$. The receptoral responses therefore never provide enough information to determine an object's chromatic property. Theories of color constancy must make assumptions about the visual system or the physical environment to resolve this ambiguity.
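The ambiguity can be made concrete with a short sketch of the discrete sum. It assumes toy Gaussian cone sensitivities and idealized step-shaped spectra for the two papers described in the Introduction; all of the numerical values are hypothetical. Because the cones see only the product of illuminant and reflectance at each wavelength, a white paper under long-wavelength light and a long-wavelength-reflecting paper under broadband light give the same three quantal catches.

```python
import numpy as np

wl = np.arange(400, 701, 10)  # 400, 410, ..., 700 nm: 31 samples

def gaussian(peak, width):
    """Toy spectral sensitivity (an assumption, not a measured fundamental)."""
    return np.exp(-0.5 * ((wl - peak) / width) ** 2)

# Hypothetical q_L, q_M, q_S as rows of a 3 x 31 matrix.
cones = np.stack([gaussian(565, 50), gaussian(535, 45), gaussian(440, 30)])

def quantal_catch(E, R):
    """Q_k = sum over wavelength of E * R * q_k, for k = L, M, S."""
    return cones @ (E * R)

# Idealized illuminants (assumed spectra).
daylight = np.ones(wl.size)                 # flat, broadband
long_only = np.where(wl >= 600, 1.0, 0.0)   # long wavelengths only

# Idealized reflectances (assumed spectra).
white_paper = np.full(wl.size, 0.9)         # reflects all wavelengths equally
red_paper = 0.9 * long_only                 # reflects only long wavelengths

Q_white_under_long = quantal_catch(long_only, white_paper)
Q_red_under_day = quantal_catch(daylight, red_paper)

print(Q_white_under_long)
print(Q_red_under_day)
```

The two reflectances differ at most wavelengths, yet the quantal catches agree, so no computation on the three responses from one surface alone can tell the papers apart.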

Modeling Illumination and Reflectance

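With N objects, each with its own reflectance R(λ), all under the same illuminant with energy distribution E(λ), sampling every 10 nm leaves 31 unknown values for each object's reflectance plus 31 unknown values for the illuminant, a total of 31(N+1) unknowns. Each object yields three cone quantal absorptions (QL, QM, QS), so only 3N neural responses are available to determine them, far too little biologically available information to recover the reflectances. The number of unknowns can be reduced by expressing the illuminant E(λ) as a weighted sum of three known, fixed energy distributions e1(λ), e2(λ), e3(λ). Then only three weights a1, a2, a3 specify E(λ) = a1 e1(λ) + a2 e2(λ) + a3 e3(λ), which cuts the 31 illuminant unknowns to three. Similarly, each object's reflectance can be assumed to be a weighted sum of three known, fixed distributions r1(λ), r2(λ), r3(λ), so that three weights b1, b2, b3 specify the reflectance of any object: R(λ) = b1 r1(λ) + b2 r2(λ) + b3 r3(λ). Under these assumptions, together with a reference patch whose reflectance RSTANDARD(λ) is known or assumed, the 3N quantal absorptions suffice to determine the weights b1, b2, b3 of each object and thus to achieve color constancy (Buchsbaum 1980, Sallstrom 1973).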

The assumptions that all illuminants and reflectances can be expressed as a weighted sum of just three spectral distributions are reasonable approximations of the variations of natural illumination that occur with weather and over the course of a day ( Judd et al. 1964) and of the spectral reflectances found in natural scenes (Cohen 1964, Dannemiller 1992). Neither assumption is perfect but neither is color constancy. If reflectance is assumed to be a weighted sum of only two distributions, so R(λ) = b1 r1(λ) + b2 r2(λ), then the 3N quantal catches (N ≥ 3) give constancy without requiring a reflectance standard in view (Maloney & Wandell 1986). Other models of illumination and reflectance invoke alternative assumptions (reviewed by Shevell 2003).
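The three-weight linear models can be sketched as a pair of 3 x 3 linear solves. The sketch below is an illustration under stated assumptions, not the procedure of any cited paper: the cone sensitivities are toy Gaussians, and the basis functions (constant, slope, curvature over wavelength) are hypothetical stand-ins for e1, e2, e3 and r1, r2, r3. A reference surface with known reflectance gives the illuminant weights a1, a2, a3; with the illuminant known, each object's three quantal catches give its weights b1, b2, b3.

```python
import numpy as np

wl = np.arange(400, 701, 10)
x = (wl - 550) / 150.0  # wavelength rescaled to [-1, 1]

def gaussian(peak, width):
    """Toy spectral sensitivity (an assumption, not a measured fundamental)."""
    return np.exp(-0.5 * ((wl - peak) / width) ** 2)

cones = np.stack([gaussian(565, 50), gaussian(535, 45), gaussian(440, 30)])

# Hypothetical basis functions, shared here by illuminants and reflectances.
illum_basis = np.stack([np.ones_like(x), x, x ** 2])  # e1, e2, e3
refl_basis = np.stack([np.ones_like(x), x, x ** 2])   # r1, r2, r3

def recover_illuminant_weights(Q_ref, R_ref):
    """Solve for a1..a3 from the quantal catches of a known reference surface."""
    M = cones @ (illum_basis * R_ref).T  # M[k, i] = sum of q_k * e_i * R_ref
    return np.linalg.solve(M, Q_ref)

def recover_reflectance_weights(Q_obj, a):
    """Solve for b1..b3 of an object once the illuminant weights are known."""
    E = a @ illum_basis
    M = cones @ (refl_basis * E).T       # M[k, j] = sum of q_k * r_j * E
    return np.linalg.solve(M, Q_obj)

# Simulated scene (all values hypothetical).
a_true = np.array([1.0, 0.3, -0.2])
b_true = np.array([0.4, 0.2, 0.1])
E_true = a_true @ illum_basis
R_ref = np.full(wl.size, 0.5)            # reference surface, assumed known
R_obj = b_true @ refl_basis

Q_ref = cones @ (E_true * R_ref)         # catches from the reference surface
Q_obj = cones @ (E_true * R_obj)         # catches from the object

a_hat = recover_illuminant_weights(Q_ref, R_ref)
b_hat = recover_reflectance_weights(Q_obj, a_hat)
print("illuminant weights: ", a_hat)
print("reflectance weights:", b_hat)
```

The recovery is exact here only because the simulated spectra lie exactly in the assumed three-dimensional spaces; with real spectra the linear models are approximations, so the recovered weights would be approximate as well.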

Estimating Illumination

Since Helmholtz (1866/1962), researchers have attributed stable color perception of objects to our ability to remove variations in spectral illumination (volume II, p.287). This is often misinterpreted as implying that the intrinsic chromatic property of an object—its reflectance [denoted as $R(λ)$ ]—can be directly determined by observing or inferring neural representations related to the illuminant. The underlying idea may stem from the fact that sensory stimuli from objects are based on the product $E(λ)R(λ)$ , which could theoretically reveal $R(λ)$ if a complete neural representation of the illuminant alone were available. However, simply observing or measuring illuminant provides only a trichromatic neural representation: specifically, how much each type of cone absorbs light, represented by $Q_L$ , $Q_M$ , and $Q_S$ . This accessible biological information is insufficient for determining either the spectral energy distribution of illumination ( $E(λ)$ ) or the intrinsic chromatic property ( $R(λ)$ ), as demonstrated by Maloney (1999).

Although difficult, estimating the illuminant remains a central element of theories of color constancy, which often incorporate models of illumination and reflectance (as discussed above). Estimates of the illuminant may rely on the receptor responses to light from the whole scene (the "gray world" assumption) or on the brightest patch in view (Land & McCann 1971), though these approaches ignore valuable physical cues to the illuminant. Such cues include reflections from smooth surfaces, shadows cast by objects, mutual reflections among objects in a scene, and the correlation between luminance and color within a scene (Golz & MacLeod 2002; MacLeod & Golz 2003; Maloney 2002; reviewed by Smithson 2005). Some cues require a three-dimensional interpretation of the scene (Maloney 1999), and natural scenes often contain more than one light source (Yang & Shevell 2003), which complicates the estimate. Because cue reliability varies from scene to scene, two questions arise: which cues should be used or ignored, and how the selected cues should be combined into an estimate of the illuminant. This is the cue combination problem (Maloney 2002); ideally, the combination takes account of the reliability and value of each cue.
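The two simple estimators mentioned above can be sketched in a few lines. The sketch assumes a scene summarized as one (L, M, S) response per patch, treats L + M as a stand-in for luminance, and models the illuminant as a per-cone scale factor; the patch values and function names are hypothetical. Both estimators recover only the chromaticity of the illuminant, not its intensity, so the comparison uses normalized values.

```python
import numpy as np

def gray_world_estimate(scene_lms):
    """Illuminant estimate: mean cone response over all patches in the scene."""
    return scene_lms.mean(axis=0)

def brightest_patch_estimate(scene_lms):
    """Illuminant estimate: cone responses of the patch with the largest L + M."""
    luminance = scene_lms[:, 0] + scene_lms[:, 1]  # L + M as a luminance proxy
    return scene_lms[np.argmax(luminance)]

def chromaticity(lms):
    """Normalize so that only the relative cone responses remain."""
    return lms / lms.sum()

# Hypothetical surfaces, expressed as per-cone reflectance factors.
surfaces = np.array([
    [0.8, 0.2, 0.2],
    [0.2, 0.8, 0.2],
    [0.2, 0.2, 0.8],
    [0.9, 0.9, 0.9],   # near-white patch
    [0.4, 0.4, 0.4],   # gray patch
])

# Hypothetical illuminant with little short-wavelength light.
illuminant = np.array([1.2, 1.0, 0.6])
scene = surfaces * illuminant  # cone responses from each patch

print("true chromaticity:     ", chromaticity(illuminant))
print("gray-world estimate:   ", chromaticity(gray_world_estimate(scene)))
print("brightest-patch guess: ", chromaticity(brightest_patch_estimate(scene)))
```

Both estimates are correct in this scene only because it was built to satisfy their assumptions: the surfaces average to gray, and the brightest patch is spectrally neutral. A scene dominated by one color would bias the gray-world estimate, which is one reason additional cues matter.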

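One approach to estimating the illuminant exploits regularities of light in natural environments. The spectral properties of natural illuminants are not random, so when the receptoral responses are ambiguous these regularities can be used to infer the most likely combination of illuminant and surface reflectance. The ambiguity is the one noted in the introduction: physically different lights can produce identical cone absorptions and therefore appear identical in color. Prior knowledge about which illuminants and reflectances are likely helps to resolve such cases, and the perceived color is the joint result of these factors. A Bayesian method for estimating the illuminant and the associated color percept combines this prior knowledge with the observed data (Brainard et al. 2006).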

Relational Color Constancy

Color constancy refers to the stable perceived color of objects under illuminants of different spectral composition. This definition is based on color appearance. A related problem, known as relational color constancy (Foster & Nascimento 1994), poses an analogous question: when the light from a scene changes, is the change perceived as a change in the objects or solely as a change in spectral illumination? Relational color constancy provides the functional capability to distinguish a physical change in an object (e.g., ripening fruit) from a change in ambient lighting (e.g., long-wavelength light at sunset). Even if an object's appearance changes with the illumination, its inferred chromatic properties remain unchanged when the change in the visual stimulus is attributed entirely to the illumination.

For an ideal visual system observing surfaces under a single illuminant, standard and relational color constancy are closely connected, yet in practice the two types of constancy are empirically distinguishable (Foster et al. 1997). Relational color constancy is measured in experiments where observers judge whether the difference between two stimuli results from a change in illumination or a change in the objects' reflectances. Humans make this discrimination accurately and reliably. The ability can be modeled by the relative excitation across space within each cone type (L, M, or S), which varies very little with changes in spectral illumination (Foster & Nascimento 1994). Notably, when a scene undergoes a natural change in spectral illumination that slightly alters the spatial pattern of relative excitation for a cone type, observers often misattribute the change to the objects' reflectances. Conversely, when nearly the same change is artificially adjusted so that the relative excitation across space is preserved for each cone type, observers attribute it solely to illumination (Nascimento & Foster 1997).
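The ratio-based account can be made concrete with a small sketch. It assumes the illuminant change acts as an independent gain on each cone type, which preserves the ratios exactly; the tolerance, the function names, and the excitation values are hypothetical, not taken from the cited studies.

```python
import numpy as np

def cone_excitation_ratios(excitations):
    """Ratios of each surface's (Q_L, Q_M, Q_S) to those of the first surface."""
    e = np.asarray(excitations, dtype=float)
    return e[1:] / e[0]

def classify_change(before, after, tolerance=0.02):
    """Label a scene change by whether spatial cone-excitation ratios are preserved."""
    r_before = cone_excitation_ratios(before)
    r_after = cone_excitation_ratios(after)
    largest_relative_change = np.max(np.abs(r_after - r_before) / r_before)
    return "illuminant" if largest_relative_change <= tolerance else "reflectance"

# Hypothetical cone excitations for three surfaces under a first illuminant.
before = np.array([[0.30, 0.25, 0.10],
                   [0.60, 0.50, 0.40],
                   [0.15, 0.20, 0.35]])

# Illuminant change modeled as per-cone gains: ratios across space are unchanged.
after_illuminant = before * np.array([1.4, 1.0, 0.6])

# Reflectance change: one surface's L-cone excitation changes on its own.
after_reflectance = before.copy()
after_reflectance[1, 0] *= 1.3

print(classify_change(before, after_illuminant))
print(classify_change(before, after_reflectance))
```

With real spectra the ratios are only approximately invariant under a change of illuminant, which is why the sketch uses a tolerance rather than an exact comparison.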

COLOR CONSTANCY

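Most objects are visible only because an illuminant (such as a lamp or the sun) provides light that they reflect. When the lamp is switched off or the sun sets, the objects can no longer be seen. This points to a fundamental issue: the light reaching the eye from an object depends on both the object's selective spectral reflectance and the spectral composition of the illuminant. Yet in ordinary viewing the perceived color of an object remains relatively stable under different illuminants, as if only the object's spectral reflectance (which is independent of the illuminant) determined its color. This stability is called color constancy. It could not occur if perceived color depended only on the spectral distribution of the light reaching the eye from the object; recall the example of the eggs and the eggplant discussed earlier.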

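If changes in illumination altered the spectral distribution of light reaching the eye only slightly, perceptual constancy would be of little interest, but the opposite is true. If the color of an object were determined solely by the light it reflects, its color would change markedly with the illuminant. A mosaic of colored squares known as the Macbeth ColorChecker (Figure 5, top) shows a wide range of colors in sunlight. When the same mosaic is lit by a tungsten lamp (Figure 5, bottom), which emits relatively little short-wavelength light, the light reflected to the eye shifts substantially. In everyday viewing, the change in stimulation of a single receptor type caused by a change of illuminant is comparable to the difference in stimulation between objects of different hue under one illuminant.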

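Context provides the biological information that is necessary (though not sufficient) to support color constancy.
Different views of how context affects the color appearance of objects are examined.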

Color Constancy is Imperfect

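Color perception is much closer to constancy than would be expected from the wavelengths of light reflected from objects, but constancy is not perfect.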

Receptoral Responses are Always Ambiguous

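The light from an object that is absorbed by each cone type L, M, or S is $Q_L$ , $Q_M$ , or $Q_S$ , respectively:

$Q_L = \int E(λ)R(λ)q_L(λ)\,dλ$ , $Q_M = \int E(λ)R(λ)q_M(λ)\,dλ$ , $Q_S = \int E(λ)R(λ)q_S(λ)\,dλ$ ,

where the wavelength λ spans the visible spectrum from 400 to 700 nm; $E(λ)$ is the energy of the illuminating light at each wavelength; $R(λ)$ is the proportion of light that the object's surface reflects at each wavelength; and $q_L(λ)$ , $q_M(λ)$ , and $q_S(λ)$ are the spectral sensitivities of the L, M, and S cones. Information about the intrinsic color of the object is carried only by its spectral reflectance $R(λ)$ . As the equations show, however, the biologically available signal depends on the product $E(λ)R(λ)$ at each wavelength.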

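In natural viewing, the integrals $Q_L$ , $Q_M$ , and $Q_S$ can be computed with high accuracy by summing samples taken at 10 nm intervals (400, 410, 420, …, 700 nm). $E(λ)$ , $R(λ)$ , $q_L(λ)$ , $q_M(λ)$ , and $q_S(λ)$ are then each a vector of 31 values, and the light absorbed by each cone type is the sum over these 31 wavelengths of the corresponding products.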
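The 31-sample summation can be written out directly. In the sketch below the cone sensitivities are placeholder Gaussians rather than measured cone fundamentals, and the illuminant and reflectance spectra are invented for illustration; only the sampling scheme (400 to 700 nm in 10 nm steps) comes from the text.

```python
import numpy as np

wavelengths = np.arange(400, 701, 10)   # 400, 410, ..., 700 nm: 31 samples

def gaussian(peak_nm, width_nm):
    """Placeholder spectral sensitivity; NOT a measured cone fundamental."""
    return np.exp(-0.5 * ((wavelengths - peak_nm) / width_nm) ** 2)

# Assumed stand-ins for q_L, q_M, q_S.
q = {"L": gaussian(565, 50), "M": gaussian(535, 45), "S": gaussian(445, 30)}

E = np.ones(31)                    # hypothetical flat illuminant E(lambda)
R = np.linspace(0.2, 0.8, 31)      # hypothetical reflectance rising with wavelength

def cone_catch(E, R, q_cone, step_nm=10.0):
    """Approximate the integral of E * R * q over wavelength by a 31-term sum."""
    return float(np.sum(E * R * q_cone) * step_nm)

Q = {name: cone_catch(E, R, q_cone) for name, q_cone in q.items()}
print(Q)

# The ambiguity in the text: doubling the light and halving the reflectance
# leaves every cone absorption unchanged.
Q_alt = {name: cone_catch(2 * E, R / 2, q_cone) for name, q_cone in q.items()}
print(all(np.isclose(Q[name], Q_alt[name]) for name in Q))
```

The last two lines show why the cone absorptions alone cannot separate illuminant from reflectance: only the product of the two enters the sum.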

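This formulation makes clear that adding more objects to the scene (that is, context) does not by itself supply enough information. Each additional object has its own unknown spectral reflectance $R(λ)$ , a vector of 31 unknown values between 400 and 700 nm, yet contributes only three measurable quantities: $Q_L$ , $Q_M$ , and $Q_S$ . The receptoral responses alone therefore remain insufficient to specify the color properties of objects. Theories of color constancy must rely on assumptions about the visual system or the physical world to resolve this inherent ambiguity.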

Modeling Illumination and Reflectance

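Consider N objects lit by a single illuminant with energy distribution E(λ). With summation at 10 nm intervals, each object requires 31 unknown values to describe its reflectance, R(λ_i) for i = 1 to 31, and the illuminant requires a further 31 unknown values, E(λ_i) for i = 1 to 31, giving 31N + 31 unknowns in total. Each object provides only three quantal-absorption signals (Q_L, Q_M, Q_S), for 3N known values, which is far too few equations to determine the unknowns uniquely. To simplify the problem, assume that the illuminant is a weighted sum of three fixed energy distributions, E(λ) = a₁e₁(λ) + a₂e₂(λ) + a₃e₃(λ), so that it is described by only three unknown weights a₁, a₂, a₃. Likewise, assume that the reflectance of each object is a weighted sum of three fixed reflectance functions, R(λ) = b₁r₁(λ) + b₂r₂(λ) + b₃r₃(λ), so that each object has only three unknown weights b₁, b₂, b₃. Under these assumptions, if the scene contains a reference surface of known reflectance (R_STANDARD), the illuminant weights can be recovered from its cone signals, and the weights b₁, b₂, b₃ of each other object can then be solved for, which yields color constancy (Buchsbaum 1980; Sallstrom 1973).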
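A minimal numerical sketch of this three-weight linear model follows. The basis functions (constant, linear, quadratic) and the Gaussian cone sensitivities are placeholders chosen only so that the example runs; they are assumptions, not the bases used by Buchsbaum or Sallstrom. The sketch recovers the illuminant weights from a reference surface of known reflectance and then the reflectance weights of another object.

```python
import numpy as np

wavelengths = np.arange(400, 701, 10)           # 31 samples, 400..700 nm
x = (wavelengths - 550) / 150.0                 # wavelength rescaled to [-1, 1]

# Placeholder bases e1..e3 and r1..r3 (assumed, for illustration only).
e_basis = np.stack([np.ones_like(x), x, x**2], axis=1)
r_basis = e_basis.copy()

def gaussian(peak_nm, width_nm):
    return np.exp(-0.5 * ((wavelengths - peak_nm) / width_nm) ** 2)

# Placeholder sensitivities q_L, q_M, q_S (not measured cone fundamentals).
q = np.stack([gaussian(565, 50), gaussian(535, 45), gaussian(445, 30)], axis=1)

# T[k, i, j] = sum over wavelength of q_k * e_i * r_j,
# so that Q_k = sum over i, j of a_i * b_j * T[k, i, j].
T = np.einsum("wk,wi,wj->kij", q, e_basis, r_basis)

def cone_catches(a, b):
    """(Q_L, Q_M, Q_S) for illuminant weights a and reflectance weights b."""
    return np.einsum("kij,i,j->k", T, a, b)

def estimate_illuminant(Q_standard, b_standard):
    """Solve the 3x3 linear system for a, given a surface of known reflectance."""
    M = np.einsum("kij,j->ki", T, b_standard)
    return np.linalg.solve(M, Q_standard)

def estimate_reflectance(Q_object, a):
    """Solve the 3x3 linear system for b once the illuminant weights are known."""
    M = np.einsum("kij,i->kj", T, a)
    return np.linalg.solve(M, Q_object)

a_true = np.array([1.0, 0.3, -0.2])        # hypothetical illuminant weights
b_standard = np.array([0.5, 0.0, 0.0])     # reference surface: flat 50% reflectance
b_true = np.array([0.4, 0.2, -0.1])        # hypothetical object weights

a_hat = estimate_illuminant(cone_catches(a_true, b_standard), b_standard)
b_hat = estimate_reflectance(cone_catches(a_true, b_true), a_hat)
print(np.round(a_hat, 6), np.round(b_hat, 6))
```

The recovery is exact here only because the simulated illuminant and reflectances lie exactly in the assumed three-dimensional spaces; real spectra do so only approximately.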

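The assumption that illuminants and reflectances can each be described as a weighted sum of only three spectral distributions is a good approximation for natural scenes, for example for daylight as it varies with weather and time of day (Judd et al. 1964; Cohen 1964; Dannemiller 1992). These assumptions nonetheless have limits and do not fully account for the properties of color constancy. If surface reflectance is further assumed to be a weighted sum of only two spectral distributions, R(λ) = b₁r₁(λ) + b₂r₂(λ), then color constancy can be achieved without a reference standard in view (Maloney & Wandell 1986). Alternative assumptions about illumination and reflectance are reviewed by Shevell (2003).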

Estimating Illumination

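Since Helmholtz (1866/1962), the stable color perception of objects has been attributed to our ability to discount differences in spectral illumination. This is sometimes misread as meaning that the intrinsic chromatic property of an object, its reflectance $R(λ)$ , is determined by viewing the light source directly (or by inferring an equivalent neural representation) so that its influence can be removed. The underlying idea may be that, given a neural representation of the illuminant, the receptoral stimulation from an object, which depends on the product $E(λ)R(λ)$ , would reveal $R(λ)$ . Viewing the illuminant, however, provides only a trichromatic neural representation: the quantities of light absorbed by each cone type, $Q_L$ , $Q_M$ , and $Q_S$ . This biologically available information is too limited to determine either the energy distribution of the illuminant $E(λ)$ or the reflectance of the object $R(λ)$ (Maloney 1999).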

However, in this case

An alternative approach to estimating the illuminant relies on particular regularities of natural environments.

Relational Color Constancy

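Objects have a stable perceived color under illuminants of different spectral composition. This appearance-based definition of constancy resembles the problem of relational color constancy (Foster & Nascimento 1994), which asks: when the light in a scene changes, is the change perceived as a change in the objects or only in the illumination? Relational constancy makes it possible to distinguish a physical change in an object (such as fruit ripening) from a change in ambient lighting.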

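For an ideal visual system viewing surfaces under a single illuminant, standard and relational color constancy are closely related. In practice, however, the two kinds of constancy can be dissociated experimentally (Foster et al. 1997). Experiments on relational color constancy ask observers to judge whether the difference between two stimuli is due to a change in illumination or a change in the reflectances of the objects. Human observers make this discrimination reliably and accurately. The ability can be modeled by the relative excitation across space within each cone type (L, M, or S), which changes very little with a change in spectral illumination (Foster & Nascimento 1994). Indeed, a scene that undergoes only a natural change in spectral illumination, but with a slight change in the relative excitation across space for a cone type, is often misjudged as a change in reflectance, whereas nearly the same change, artificially adjusted to preserve the relative excitation across space for each cone type, is misjudged as a change in illumination alone (Nascimento & Foster 1997).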

Color vision primarily involves the perception of hue and saturation. In essence, color vision refers to the ability to discern between two light sources irrespective of their brightness levels. However, chromatic neural representations also play a role in additional visual percepts like shape, texture, and object segmentation. For instance, color vision enables the identification of objects hidden in grayscale images (Figure 1a), where chromatic representations facilitate effective object segmentation. Context is crucial for understanding how color vision contributes to form perception and motion perception—functions that inherently involve relations within a stimulus across space and/or time. A notable indicator that hue perception differs from form perception is observed in cerebral achromatopsia patients who lose hue sensation but retain normal form perception for chromatic patterns.

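Several related questions concern the contribution of chromatic neural responses to form, motion, and other percepts. First, is there evidence that purely chromatic stimuli drive the perception of form or motion? Second, if so, are chromatic representations better or worse than luminance-based (spectrally nonselective) representations? Third, how do chromatic and luminance representations interact to determine perceived form or motion? Fourth, are there particular conditions under which the two kinds of information combine to alter perceived form or motion? The first two questions have received wide attention over the past 20 years (see the reviews by Regan 2000 and Gegenfurtner & Kiper 2003), but the interaction between chromatic and luminance representations in form and motion perception has only recently begun to be studied. To address these questions, first consider how form, luminance, and color are related in the physical world and how the visual system initially encodes these physical properties.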

Correlations Among Cone and Cone-Opponent Responses

An image conveys information about form and motion through its spatiotemporal distribution of luminance and chromaticity. Color vision is beneficial only if luminance and chromaticity provide at least partly independent neural representations, because a one-to-one correspondence between luminance changes and chromatic changes would leave color with no additional information to encode. In daylight vision, the retinal image sampled by the L, M, and S cones (Figure 6a) is transformed into three postreceptoral responses: a luminance response that sums L- and M-cone signals, and two chromatically selective responses, one comparing L- and M-cone responses (the L-M response) and the other comparing S-cone responses with the summed L- and M-cone responses (the S-(L+M) response). The two chromatic responses are often erroneously labeled "red-green" and "yellow-blue," respectively; they encode chromatic information but do not map onto single hue pairs (Knoblauch & Shevell 2001, Mollon & Cavonius 1987, Wuerger et al. 2005).

The three postreceptoral responses decorrelate the cone signals; that is, they remove information that is redundant in the signals by virtue of the close overlap in the cones’ spectral sensitivities (Buchsbaum & Gottschalk 1983, Fine et al. 2003, Johnson et al. 2005, Ruderman et al. 1998, Zaidi 1997; but see also Lee et al. 2002). The responses of each cone type to the image in Figure 6b are shown in Figure 6c, where cone-response magnitudes are represented by gray level. The Pearson correlation coefficient R between the image pixel values of each pair of cone responses is 0.96 for L and M, 0.78 for M and S, and 0.73 for L and S. These values are typical for natural scenes (Ruderman et al. 1998) and Munsell papers (McIlhagga & Mullen 1997). They show that the L-, M-, and S-cone responses are in most instances highly correlated. The postreceptoral responses formed by combining cone signals are shown in Figure 6e. The figure is drawn in black and white to reinforce the point that each response alone cannot signal hue. Hue is only made explicit at a later stage, where the postreceptoral responses (or subsequent recodings of them) are compared to each other (DeValois & DeValois 1993, Wuerger et al. 2005). The correlations among the postreceptoral responses in Figure 6e are much smaller than the correlations among the cone signals: 0.17 for L+M with L-M, 0.14 for L+M with S-(L+M), and -0.16 for L-M with S-(L+M). Decorrelation is a fundamental property of sensory systems (Barlow & Foldiak 1989, Field 1994, Simoncelli & Olshausen 2001) and means that the L-M, S-(L+M), and L+M signals are largely independent. There are, however, some consistent correlations among the postreceptoral responses in natural scenes, and these can be important for perception. Correlations between “redness” and “luminance” (L-M with L+M) are exploited for color constancy (Golz & MacLeod 2002). The negative correlation between L-M and S-(L+M) in Figure 6e is found across scenes (Johnson et al. 2005, Webster & Mollon 1997) and is especially pronounced in scenes with arid landscapes and blue skies (Webster & Mollon 1997), revealing a tendency for colors to fall along a continuum between the hues blue and yellow.
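The decorrelating effect of sum and difference signals can be illustrated with synthetic data. The sketch below is a toy model, not an analysis of the image in Figure 6: L and M "pixel" responses are simulated as a shared component plus small independent noise, with noise levels chosen arbitrarily, and S cones are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pixels = 10_000

# Toy cone responses: a component shared by L and M plus independent noise.
shared = rng.normal(size=n_pixels)
L = shared + 0.2 * rng.normal(size=n_pixels)
M = shared + 0.2 * rng.normal(size=n_pixels)

def pearson(a, b):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])

luminance = L + M        # L+M postreceptoral signal
red_green = L - M        # L-M postreceptoral signal

r_cones = pearson(L, M)
r_post = pearson(luminance, red_green)
print(f"L with M:     {r_cones:.2f}")
print(f"L+M with L-M: {r_post:.2f}")
```

Because the simulated L and M signals have nearly equal variance, their sum and difference are almost uncorrelated even though L and M themselves are highly correlated. Natural images depart from this idealization, which is consistent with the small but nonzero correlations reported for Figure 6e.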

The relative independence of the L-M and S-(L+M) responses implies that individuals with color deficiencies lack specific form information normally supplied by chromatic coding. Approximately 8% of men are born without a normal L- or M-cone photopigment (Smith & Pokorny 2003). A quarter of these individuals are dichromats, who lack functional L cones or functional M cones altogether, so their color vision depends solely on the pathway that compares S-cone signals with those of the remaining L or M cones. Loss of functional S cones, which results in tritanopia, is exceedingly rare. The color information missing in dichromats can be modeled by converting a natural image into a two-cone rather than the usual three-cone representation. Figure 7b,c shows that color differences are reduced for protanopes and deuteranopes compared with normal observers: the red flowers no longer stand out against the green foliage. A tritanope, on the other hand, might be expected to miss the violet flower. The figure shows that what is missing from the visual world of dichromats is not only the ability to perceive certain hues; they may also fail to detect certain objects.

Correlations Among Higher-Order Representations

The strongest correlations among the three postreceptoral neural representations are found not among pixel intensities but rather among pixel-intensity relations. The relations among points in an image define higher-order structures such as edges, contours, shapes, and textures. In the three postreceptoral representations, the higher-order structures have similar mathematical properties (Parraga et al. 2002, Wachtler et al. 2001) and for a given scene are positively correlated. For example, in Figure 6e, the shape of the violet flower is visible in both the S-(L+M) and L+M postreceptoral representations, and the upper edge of the terracotta pot can be seen in all three responses. The edge maps (Figure 6f ) reinforce the point that the three postreceptoral responses are closely related when image structure is considered (Fine et al. 2003, Johnson et al. 2005).

The higher-order covariation among postreceptoral responses to everyday scenes arises in part because a change in chromaticity at an object boundary is often accompanied by a change in luminance, so that several postreceptoral responses change together. Given the critical role of object boundaries in object recognition, neurons in the visual cortex might reasonably be expected to encode chromatic and luminance contrast jointly, and findings consistent with this expectation have been reported (Horwitz et al. 2005, Johnson et al. 2001; reviewed by Solomon & Lennie 2007).

Does this mean that color is made redundant when relations rather than point intensities are considered? For illuminated objects in natural viewing, the answer is no. Object boundaries are usually accompanied by both luminance and chromatic changes. Shadows and shading, however, often produce luminance variations without chromatic changes (Kingdom et al. 2004; Rubin & Richards 1982). These relations are illustrated in Figure 8. Chromatic variation therefore tends to be a more dependable indicator of material boundaries than luminance variation (e.g., Switkes et al. 1988), especially where there are significant shadows or shading.

Experimental evidence shows that the visual system uses these chromatic-luminance relations in ways that directly affect perception. The luminance grating in Figure 9a appears nearly flat, yet combining it with the orthogonally oriented chromatic grating in Figure 9b produces a "plaid" (Figure 9c) that appears strongly corrugated in depth. In the plaid, the luminance variations are uncorrelated with the chromatic variations, which prompts the interpretation that the plaid is a material surface with shading (just as the luminance-only changes in Figure 8 are interpreted as shadows). The shading is characteristic of a corrugated surface under oblique illumination. Conversely, when luminance and chromatic variations coincide, as in Figure 9d where a second chromatic grating aligned with the luminance grating is added, the impression of depth is much reduced. The luminance changes are then perceived as belonging to the material rather than to the illumination. These findings show that color vision helps to segment the retinal image into perceived material and illumination components. The assumption that chromatic variation signals material can, however, occasionally produce striking perceptual errors. The building in Figure 10 appears to be painted orange over two-thirds of its surface, but the orange portion is actually sunset light falling on a spectrally nonselective surface.

The relationship between color and perceived shape is bidirectional. Consider a card painted magenta on its left half and white on its right half, folded vertically along the central boundary so that it is concave, with the fold farther from the viewer than the edges. Light reflected from the magenta half falls on the white half and gives it a pinkish tinge (Bloj et al. 1999). The fold can be made to appear nearer than the edges by viewing the card through an optical device that reverses binocular disparities, although neither the card nor the light it reflects has changed. The white half then appears a markedly deeper pink. This substantial shift in color appearance shows that the visual system uses shape information to compensate for interreflection. When the perceived geometry is consistent with interreflection, the pinkish light on the white side is attributed to light reflected from the magenta side; when the perceived geometry rules this out, as for a convex fold resembling a peaked roof seen from above, the pink is attributed to the surface itself.

The spatial relations between color and luminance also play a significant role in the perception of transparency, as when surfaces are viewed through a translucent material. Spectrally unselective transparencies, such as dark glasses or the simulated transparency in the central region of Figure 11a, reduce light level without altering chromatic content. In scenes with chromatic variation, abrupt changes in luminance without corresponding chromatic changes can therefore be interpreted as shadows or as spectrally unselective transparencies (Kingdom et al. 2004). This does not imply that continuity of color across the transparency border is essential for perceiving transparency. A chromatic change that is consistent along the border (e.g., always toward blue) produces the impression of a colored transparency (D’Zmura et al. 1997). The critical factor for transparency perception appears to be the consistency of chromatic changes rather than their absence. This principle also applies to related phenomena such as neon color spreading (Figure 12; Anderson 1997).

Equiluminance

Despite highlighting the significance of interactions between color and brightness representations in perceived shape, as previously elaborated, a majority of studies employ equiluminant stimuli to evaluate the role of color in shape or motion perception. Equiluminant stimuli (i.e., stimuli with constant brightness) are crafted specifically to induce exclusively chromatic neural representations. This approach is often utilized to address: Are neural responses driven solely by chromaticity inherently superior or inferior compared to those evoked by luminance? However, a challenge arises when comparing color and brightness for a given shape/motion task: performance typically enhances with stimulus contrast, yet there remains no universal metric to equate color contrast with brightness contrast. To circumvent this issue, researchers may opt for behavioral measures independent of physical contrast metrics. One such measure defines contrast relative to the threshold required for detecting a stimulus. By defining contrast in this manner, tasks involving purely chromatic stimuli (equiluminance) generally demand higher levels of contrast—measured in multiples—to achieve equivalent performance thresholds compared to their luminance counterparts—tasks involving stimuli with brightness contrasts (Morgan & Aiba 1985; Mullen & Boulton 1992; Simmons & Kingdom 1994; Webster et al. 1990). The precise underlying cause remains unclear but may stem from either elevated internal noise levels or reduced neural efficiency in processing chromatic form/motion mechanisms relative to those involved in color detection (Kingdom & Simmons 2000; Solomon & Lennie 2007).

Note that equiluminant stimuli have a fundamental limitation: they cannot reveal how chromaticity and luminance together shape the perception of form and motion. In a given natural scene, for instance, the contribution of chromatic contrast to some kinds of form or motion may be negligible because the luminance contrast alone is sufficient to drive performance to asymptotic levels. Conversely, some benefits of color vision emerge only when chromatic and luminance variations are both present.

Color and Spatial Resolution

When the L-M and S-(L+M) component images from Figure 6 are rendered in colors that isolate their respective postreceptoral representations (Figure 13b,c), they show a notable lack of fine detail compared with both the original (Figure 6b) and the L+M image (Figure 13a). Two related factors are responsible. First, chromatic variations in natural scenes tend to be coarser than luminance variations, mainly because shading and shadows add luminance contrast with little accompanying chromatic variation. Second, color vision is a low spatial-resolution system (Granger & Heurtley 1973; Mullen 1985), as shown by the desaturation of the yellow and blue regions in Figure 13d when it is viewed from a distance. Because of this poor spatial resolution, purely chromatic edges are blurred in their neural representation. Observers are nevertheless unaware of this blur in everyday vision, even though they can discriminate blur well at equiluminance (Webster et al. 2006; Wuerger et al. 2001). The reason may be that color spreads into regions bounded by sharp luminance contrast (Boynton 1978; Mollon 1995; Pinna et al. 2001), an effect exploited by watercolor artists.

Color and Position, Orientation, Contour, and Texture

Contours significantly delineate object shapes, underscoring their critical role in object recognition (Marr 1982).

Dense arrays of local orientations generate textures, which play a crucial role in vision by enabling segmentation of a scene into distinct surfaces and the determination of their three-dimensional shapes. Equiluminant texture variations are detectable through visual mechanisms specifically tuned for color perception (Cardinal & Kiper 2003), yet unlike contour linking, they are not independent of isochromatic texture variations (Pearson & Kingdom 2002). Equiluminant textures are known to evoke an impression of three-dimensional shape (Troscianko et al. 1991, Zaidi & Li 2006), suggesting that chromatic signals contribute to neural processes involved in shape perception through texture.

A luminance-defined texture boundary can be camouflaged by superimposed chromatic variation that varies randomly between red and green. Red-green dichromats, who do not perceive this chromatic variation, can break the camouflage, which gives them a rare advantage over observers with normal color vision (Morgan et al. 1992).

Color and Stereopsis

Small differences between the two eyes' views of a scene form the foundation of stereoscopic depth perception (Howard & Rogers 2002; Julesz 1971). A striking finding is that the depth target in a random-dot stereogram, which is visible only under stereoscopic viewing, disappears when the display is made equiluminant (de Weert 1979; Gregory et al. 1977; Livingstone et al. 2004; Livingstone & Hubel). This has led some researchers to suggest that color vision may be stereoblind (Livingstone et al.; Livingstone & Hubel). However, stereo mechanisms sensitive to chromaticity have been identified with simpler stimuli such as bars or grating patches (de Weert & Sazda; Kingdom & Simmons; Scharff & Geisler). The poor depth quality of equiluminant random-dot stereograms likely stems from an inability to integrate numerous local chromatic depth signals into a precise surface definition (Kingdom et al.).

A key prerequisite for achieving successful stereopsis is ensuring correct pairing of corresponding scene elements across each viewpoint. Spurious matches are commonly observed in scenes featuring dense arrangements of similar elements distributed across multiple depth planes. However, even when these elements exhibit comparable luminance contrasts, orientations, and sizes, chromatic discrepancies can diminish false match counts provided that the visual system aligns corresponding color responses. Studies have shown that introducing chromatic disparities within complex stereo displays effectively reduces false match rates, thereby enhancing stereopsis (den Ouden et al. 2005; Jordan et al. 1990; Julesz 1971).
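The way a color constraint prunes false matches can be shown with a toy one-dimensional example. The element positions, colors, and disparity limit below are hypothetical, and the matcher is a deliberately naive illustration rather than a model of human stereopsis.

```python
def candidate_matches(left, right, max_disparity, use_color):
    """List (left_index, right_index) pairs that could correspond.

    Each element is (horizontal_position, color). A pair is a candidate when
    the positions differ by at most max_disparity and, if use_color is set,
    the colors are identical.
    """
    matches = []
    for i, (x_left, color_left) in enumerate(left):
        for j, (x_right, color_right) in enumerate(right):
            within_range = abs(x_left - x_right) <= max_disparity
            same_color = color_left == color_right
            if within_range and (same_color or not use_color):
                matches.append((i, j))
    return matches

# Hypothetical dense row of similar elements; element i in the left eye
# truly corresponds to element i in the right eye (disparity of 1).
left_eye = [(0, "red"), (2, "green"), (4, "red"), (6, "green")]
right_eye = [(1, "red"), (3, "green"), (5, "red"), (7, "green")]

without_color = candidate_matches(left_eye, right_eye, max_disparity=3, use_color=False)
with_color = candidate_matches(left_eye, right_eye, max_disparity=3, use_color=True)

true_matches = {(i, i) for i in range(len(left_eye))}
print(len(without_color) - len(true_matches), "false candidates without color")
print(len(with_color) - len(true_matches), "false candidates with color")
```

All true correspondences survive the color constraint while most spurious candidates are removed, which is the sense in which chromatic differences can help resolve the correspondence problem.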

Color and Motion

Equiluminant objects seem to exhibit slower motion compared to their luminance counterparts (Cavanagh et al. 1984, Lu et al. 1999, Mullen & Boulton 1992, Troscianko & Fahle 1988), and the target shapes in random-dot kinematograms—only visible when the dots are moving (analogous to the random-dot stereograms mentioned above)—pose challenges at equiluminance (Livingstone & Hubel 1987, Ramachandran & Gregory 1978). Furthermore, motion of simple stimuli such as equiluminant chromatic L-M grating patches can be masked by the addition of randomly moving luminance "noise" (Yoshizawa et al. 2000). This alone suggests that chromatic signals may contribute to a shared mechanism for processing chromatic-luminance motion. Yet, the discovery that moving chromatic L-M noise doesn't obscure luminance motion (unless at very high contrasts) weakens this hypothesis (Yoshizawa et al. 2000).

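Another view is that the motion perceived in simple L-M chromatic stimuli arises from spurious luminance signals generated within the visual system, possibly related to temporal differences between L- and M-cone signals (Mullen et al. 2003). This view conflicts, however, with other evidence concerning chromatic motion signals that may interact with luminance signals (e.g., Cropper & Derrington 1996; Dobkins & Albright 1993; Gegenfurtner & Hawken 1996; Morgan & Ingle 1994; reviewed by Cropper & Wuerger 2005). On the spurious-luminance view, the chromatic contribution to motion is at best very weak.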

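The consensus leans more toward a chromatic contribution to motion for complex objects (e.g., Cropper & Derrington 1995; Yoshizawa et al. 2000). Some argue, however, that for simple and complex objects alike, motion perceived at equiluminance depends mainly on a general-purpose, attention-driven, higher-order mechanism that tracks any change in figure-ground relations (e.g., Lu et al. 1999). Such a mechanism is supported by our ability to attentively track moving equiluminant objects (Cavanagh 1992).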

Debates about whether color contributes to motion perception also arise from studies of global motion (Figure 15). In global-motion displays, most elements (the "distractors") move randomly, while a subset (the "targets") move coherently. There is disagreement about whether coherent motion is perceptible at equiluminance (e.g., Bilodeau & Faubert 1999; Ruppertsberg et al. 2006). Studies using nonequiluminant displays, in which all elements have luminance contrast, found that a color difference between targets and distractors reduces the number of target dots needed to identify the direction of motion (Croner & Albright 1997; Figure 15). This suggests a possible chromatic contribution to global motion perception. However, when subjects are prevented from selectively attending to the targets (Snowden & Edmunds 1999), or when displays are designed to render selective attention ineffective (Li & Kingdom 2001a), the advantage disappears. Consequently, while global-motion experiments with nonequiluminant displays indicate that color can help to identify moving objects that are otherwise camouflaged, they do not demonstrate that chromatic signals contribute to motion processing itself. A similar lack of evidence for a chromatic contribution comes from studies in which local motion signals generate an impression of global three-dimensional structure (Li & Kingdom 2001b).
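The structure of such a display, and the way a color cue allows attention to select the targets, can be sketched as follows. The dot counts, coherence level, and helper names are hypothetical, and the "observer" is just a vector average of dot directions, not a model from the cited studies.

```python
import numpy as np

def make_dot_directions(n_dots, coherence, target_direction, rng):
    """Return motion directions (radians) and a boolean mask marking targets.

    A fraction `coherence` of the dots (the targets) share `target_direction`;
    the remaining dots (the distractors) move in uniformly random directions.
    """
    n_targets = int(round(coherence * n_dots))
    directions = rng.uniform(0.0, 2.0 * np.pi, size=n_dots)
    is_target = np.zeros(n_dots, dtype=bool)
    is_target[:n_targets] = True
    directions[is_target] = target_direction
    return directions, is_target

def mean_direction(directions):
    """Circular mean of a set of directions, in radians."""
    return float(np.arctan2(np.sin(directions).mean(), np.cos(directions).mean()))

rng = np.random.default_rng(1)
directions, is_target = make_dot_directions(
    n_dots=100, coherence=0.2, target_direction=0.5, rng=rng
)

# Pooling over every dot gives an estimate contaminated by the distractors.
pooled_estimate = mean_direction(directions)

# If the targets differ in color, attention can select them before pooling.
selected_estimate = mean_direction(directions[is_target])

print(round(pooled_estimate, 3), round(selected_estimate, 3))
```

In this sketch the benefit of color comes entirely from selecting which dots enter the motion computation, which matches the attentional interpretation described above rather than a chromatic input to motion processing itself.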

There is a caveat, however, to these conclusions. It was pointed out above that in mixed color-luminance stimuli there are likely to be cases in which color contributes little or nothing to the task because the luminance contrast alone drives asymptotic performance. Thus, it is possible that besides the role of color in attentional cuing, there is a weak chromatic contribution to global motion, but in nonequiluminant displays it does not manifest itself because it is swamped by the luminance signal. Some support for this idea comes from physiological studies of the chromatic sensitivity of neurons in monkey middle temporal (MT) area, the dorsal pathway component specialized for global motion processing (reviewed by Gegenfurtner & Kiper 2003). Most studies reveal that MT neurons are sensitive to both L-M (Gegenfurtner et al. 1994; Thiele et al. 1999, 2001) and S-(L+M) chromatic contrast (Barberini et al. 2005), though sensitivity to L-M chromatic contrast is generally low (e.g., Gegenfurtner et al. 1994). Importantly, however, when sufficiently high luminance contrast is present, the chromatic contribution from L-M contrast becomes negligible (Thiele et al. 1999). This might be a physiological correlate of contradictory psychophysical findings, such as perceptible global motion at equiluminance (Ruppertsberg et al. 2006) and absence of chromatic contributions with nonequiluminant stimuli (Li & Kingdom 2001a,b; Snowden & Edmunds 1999).

COLOR REPRESENTATIONS IN SUPPORT OF VISUAL NEEDS BEYOND COLOR

The study of color vision is primarily concerned with the perception of hue and saturation. By this definition

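Many related questions arise about how chromatic neural responses contribute to form, motion, and other percepts. First, is there evidence that purely chromatic stimuli can drive the perception of form or motion? Second, if such evidence exists, how do chromatic representations compare with spectrally nonselective representations that depend only on luminance? Third, how do chromatic and spectrally nonselective representations interact to determine the perceived form or motion? Finally, under particular conditions, do different combinations of chromatic and spectrally nonselective information alter the perceived form or motion? Research on the first two questions has made important progress over the past 20 years (see the reviews by Regan 2000 and Gegenfurtner & Kiper 2003), but it remains unclear when interactions between chromatic and other signals change perceived form or motion. To understand these questions, first consider the relations among chromaticity, luminance, and form in the physical world and how the visual system initially processes these physical properties.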

Correlations Among Cone and Cone-Opponent Responses

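An image conveys information about form and motion through the spatial and temporal pattern of its luminance and chromaticity. If variations in luminance corresponded one-to-one with variations in chromaticity, color would carry no additional information and the benefit of color vision would be lost. The L, M, and S cones of the retina sample the image, and their signals are transformed into three postreceptoral responses (Derrington et al. 1984): a luminance response that sums the L- and M-cone signals, and two chromatically selective responses, one taking the difference between L- and M-cone responses (the L-M response) and the other comparing the S-cone response with the summed L- and M-cone responses (the S-(L+M) response). These postreceptoral chromatic responses are often erroneously labeled "red-green" and "yellow-blue" (Knoblauch & Shevell 2001, Mollon & Cavonius 1987, Wuerger et al. 2005).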

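Stages of chromatic neural response: (a) spectral sensitivities of the L, M, and S cones; (b) the original image; (c) responses of each cone type to the image; (d) combination of cone signals into postreceptoral responses; (e) postreceptoral responses to the image; (f) edge maps of the postreceptoral responses. The original image is from the McGill Calibrated Colour Image Database: http://tabby.vision.mcgill.ca.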

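The three postreceptoral responses decorrelate the cone signals; that is, they remove information that is redundant because of the close overlap of the cones' spectral sensitivities (Buchsbaum & Gottschalk 1983, Fine et al. 2003, Johnson et al. 2005, Ruderman et al. 1998, Zaidi 1997; but see also Lee et al. 2002). The response of each cone type to the image in Figure 6b is shown in Figure 6c, with response magnitude represented by gray level. The Pearson correlation coefficient R between image pixel values is 0.96 for L and M, 0.78 for M and S, and 0.73 for L and S. These values are typical for natural scenes (Ruderman et al. 1998) and Munsell papers (McIlhagga & Mullen 1997), and they show that L-, M-, and S-cone responses are in most cases highly correlated. The postreceptoral responses formed by combining cone signals are shown in Figure 6e, which is drawn in black and white to emphasize that no single response can signal hue. Hue becomes explicit only at a later stage, where the postreceptoral responses (or subsequent recodings of them) are compared with one another (DeValois & DeValois 1993, Wuerger et al. 2005). The correlations among the postreceptoral responses in Figure 6e are much smaller than those among the cone signals: 0.17 for L+M with L-M, 0.14 for L+M with S-(L+M), and -0.16 for L-M with S-(L+M). Decorrelation is a fundamental property of sensory systems (Barlow & Foldiak 1989, Field 1994, Simoncelli & Olshausen 2001) and means that the L-M, S-(L+M), and L+M signals are largely independent. Nevertheless, some consistent correlations among postreceptoral responses exist in natural scenes and can be important for perception. Correlations between "redness" and "luminance" (L-M with L+M) are exploited for color constancy (Golz & MacLeod 2002). The negative correlation between L-M and S-(L+M) in Figure 6e is found across scenes (Johnson et al. 2005, Webster & Mollon 1997) and is especially pronounced in scenes with arid landscapes and blue skies (Webster & Mollon 1997), revealing a tendency for colors to fall along a continuum between blue and yellow.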

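The relative independence of the L-M and S-(L+M) responses means that individuals with color deficiencies lack some of the form information that chromatic coding provides. About 8% of men are born lacking a normal L- or M-cone photopigment (Smith & Pokorny 2003). A quarter of them are dichromats, who lack functional L or M cones altogether and whose color vision relies solely on the pathway involving S-cone signals. Loss of functional S cones also produces dichromacy (tritanopia), but it is rare. The color information missing in dichromats can be modeled by converting a natural image into a two-cone rather than the normal three-cone representation. Because dichromats cannot experience the full range of hues seen by trichromats, any model that simulates the color appearance experienced by dichromats must make some assumptions (Viénot et al. 1995). Nevertheless, Figure 7b,c shows that for red-green dichromats the color difference of the red flowers is reduced compared with normal observers: the flowers no longer stand out against the green leaves. A tritanope, on the other hand, might be expected to miss the violet flower. These results show that what dichromats lack is not only the experience of certain hues but also the ability to detect certain objects.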

Correlations Among Higher-Order Representations

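The strongest correlations among the three postreceptoral neural representations are found not among pixel intensities but among the relations between pixel intensities. The relations among points in an image define higher-order structures such as edges, contours, shapes, and textures. In the three postreceptoral representations, these higher-order structures have similar mathematical properties (Parraga et al. 2002, Wachtler et al. 2001). For example, in Figure 6e the shape of the violet flower is visible in both the S-(L+M) and L+M representations, and the upper edge of the terracotta pot can be seen in all three responses. The edge maps (Figure 6f) reinforce the point that the three postreceptoral responses are closely related when image structure is considered (Fine et al. 2003, Johnson et al. 2005).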

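The higher-order correlations among postreceptoral responses to everyday scenes probably arise because chromatic changes at most object boundaries are accompanied by luminance changes, so that when one postreceptoral response changes, another changes with it. Notably, many neurons in the visual cortex are sensitive to both chromatic and luminance contrast (Horwitz et al. 2005, Johnson et al. 2001; reviewed by Solomon & Lennie 2007).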

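Does considering relations rather than point intensities therefore make color redundant? For illuminated objects in natural viewing, the answer is no. Object boundaries are usually accompanied by both luminance and chromatic changes, whereas shadows and shading tend to produce luminance changes without corresponding chromatic changes (Kingdom et al. 2004; Rubin & Richards 1982). These relations are illustrated in Figure 8. Where shadows and shading are present, chromatic information is therefore a more reliable indicator of surface boundaries than luminance information (Switkes et al. 1988).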

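At the border between the grass and the curb, color, luminance, and texture all change markedly; at the border of the cast shadow, luminance changes markedly, color changes little, and texture is essentially unchanged.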

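Experimental evidence shows that the visual system uses these chromatic-luminance relations in ways that directly affect perception. The luminance grating in Figure 9a looks almost flat, yet when it is combined with the orthogonally oriented chromatic grating in Figure 9b, the resulting "plaid" (Figure 9c) appears markedly corrugated in depth (an example of shape from shading). The luminance variations are unrelated to the chromatic variations, which prompts the perceptual interpretation that the plaid in Figure 9c is a material surface with shading (just as the changes in luminance but not chromaticity in Figure 8 are interpreted as shadows). Such shading is characteristic of an obliquely lit corrugated material, and that is what is perceived. If, on the other hand, the luminance variations were accompanied by corresponding chromatic variations, the percepts of shading and depth should be lost. This is what happens. Adding a second chromatic grating aligned with the luminance grating, as in Figure 9d, strongly reduces the impression of corrugated depth. The luminance variations are now seen as belonging to the material rather than as shading from the illumination. These percepts from chromatic-luminance patterns (Kingdom 2003) show that color vision helps to segment the retinal image into perceived material and illumination components, which is critical for object perception. The assumption that chromatic variation belongs to material can occasionally produce striking perceptual errors. The building in Figure 10 looks as if two-thirds of it is painted orange, but the orange part is in fact sunset light falling on a spectrally nonselective reflecting surface.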

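Color can both promote and inhibit shape from shading. (a) Right-oblique luminance grating. (b) Left-oblique chromatic grating. (c) Combining (a) and (b) produces the impression of a surface corrugated in depth. (d) Adding a second chromatic grating aligned with the luminance grating in (a) reduces the impression of depth.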
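The stimulus logic of Figure 9 can be sketched numerically. The snippet below is an illustration only, not the stimuli used in the cited study: it builds a luminance grating and two chromatic modulations (one orthogonal, one aligned) as plain sinusoids and measures how strongly each chromatic pattern is correlated with the luminance pattern. The grid size, spatial frequency, and the names `oblique_grating` and `pattern_correlation` are assumptions made for the example.

```python
import numpy as np

def oblique_grating(n, cycles, direction):
    """Sinusoidal grating on an n x n grid, values in [-1, 1].

    direction=+1 and direction=-1 give two gratings at right angles to each
    other; calling them "right-oblique" and "left-oblique" is illustrative.
    """
    coords = np.arange(n) / n
    x, y = np.meshgrid(coords, coords)
    return np.sin(2 * np.pi * cycles * (x + direction * y))

def pattern_correlation(a, b):
    """Pearson correlation between two images, taken pixel by pixel."""
    return float(np.corrcoef(a.ravel(), b.ravel())[0, 1])

n, cycles = 128, 4
luminance = oblique_grating(n, cycles, +1)          # as in panel (a)
chroma_orthogonal = oblique_grating(n, cycles, -1)  # as in panel (b)
chroma_aligned = luminance.copy()                   # extra grating in panel (d)

r_orthogonal = pattern_correlation(luminance, chroma_orthogonal)
r_aligned = pattern_correlation(luminance, chroma_aligned)

print(f"orthogonal plaid: r = {r_orthogonal:.3f}")
print(f"aligned gratings: r = {r_aligned:.3f}")
```

In the orthogonal case the two modulations are uncorrelated, the condition under which the text says luminance is read as shading; in the aligned case they are perfectly correlated, the condition under which luminance is attributed to the material.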

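The relation between color and form runs in both directions: color shapes perceived form, and form in turn shapes perceived color. Imagine a card painted magenta on its left half and white on its right half. When the card is folded vertically along the central dividing line so that the fold lies farther from the observer than the outer edges, the concave shape causes some of the light from the magenta side to be reflected onto the other side, giving the white side a pinkish cast (Bloj et al. 1999). The change in color that follows a change in perceived shape is striking, and it shows that the visual system uses shape information to compensate for light reflected between surfaces. When the perceived geometry is consistent with such mutual reflection (the concave fold), the visual system tends to discount the magenta component of the light from the white side, attributing it to the facing magenta surface. When the perceived geometry rules mutual reflection out (for example, when the card is seen as convex, like a pitched roof viewed from a helicopter), the magenta component is instead attributed to the white side itself.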

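The spatial relationship between color and luminance is also important for the perception of transparency, that is, for seeing a surface through a transparent material. A spectrally nonselective transparency, such as sunglasses or the central region of Figure 11a, reduces light intensity without altering chromatic content, much as shadows typically produce a change in luminance but not in chromaticity (Kingdom et al. 2004), although sharp borders and X-junctions (Kanizsa 1979) strengthen the impression of a transparency (Kingdom et al. 2004, Ripamonti & Westland 2003). This does not mean, however, that chromatic continuity across the border of the transparency is a prerequisite for perceiving it. As long as the chromatic change across the border is consistent in direction along the whole length of the border (for example, always toward blue, as in Figure 11d), the impression of transparency remains strong, but the transparency now appears colored, like a tinted acetate (D'Zmura et al. 1997, Fulvio et al. 2006, Khang & Zaidi 2002, Ripamonti & Westland 2003). The critical condition for perceived transparency thus appears to be a consistent chromatic change across the border, not the absence of chromatic change. Consistency of this kind is also an important factor in the phenomenon known as neon color spreading (Figure 12; Anderson 1997, van Tuijl 1975).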

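Color and transparency. (a) Simulation of a spectrally nonselective transparency on a multicolored background. (b) Rotating the central region disrupts the perception of transparency. (c) Random chromatic changes across the border reduce the perception of transparency. (d) A consistent chromatic change across the border promotes the impression of a colored transparency.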

Equiluminance

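The preceding sections considered how chromaticity and luminance interact in the perception of shape, and why that interaction matters. Most existing studies, however, have used equiluminant stimuli to assess the contribution of chromaticity to shape or motion perception. Such experiments are designed to activate only the neural responses that carry color and to examine what those responses can do alone. The question usually asked is whether performance on a given task is better or worse when driven by chromatic responses alone than when driven by luminance responses. The comparison faces a difficulty: performance on a given shape or motion task generally varies with stimulus contrast, and there is no universal standard for equating contrast in the two conditions. One solution is to quantify performance with a measure that does not depend on physical contrast, for example a behavioral measure.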

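Note that equiluminant stimuli have an inherent limitation: they cannot reveal how color and luminance jointly affect shape or motion perception. In a given natural scene, for example, color may contribute very little to some shape or motion judgment because the luminance contrast between regions is already sufficient for performance to reach its asymptotic level. Conversely, cases in which color and luminance vary together can better reveal the advantages that color confers.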

Color and Spatial Resolution

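When the L-M and S-(L+M) component images from Figure 6 are displayed in the colors that each postreceptoral representation signals (Figure 13b,c), they show a marked loss of detail compared with the original image (Figure 6b) and the L+M image (Figure 13a). Two related factors probably contribute. First, chromatic variation in natural scenes is typically sparser than luminance variation (that is, it is patchier). This is mainly because occlusions and shadows add variation to the luminance representation but usually produce no chromatic difference (as discussed above). Second, human color vision has poor spatial resolution (Granger & Heurtley 1973, Mullen 1985); for example, yellow and blue regions viewed from several meters away appear noticeably desaturated. This low spatial resolution blurs purely chromatic edges in their neural representation. Although we judge blur fairly accurately at equiluminance (Webster et al. 2006, Wuerger et al. 2001), we are not aware of chromatic blur in everyday vision (see, for example, Figure 6b and Wandell 1995, p. 7). This may be because perceived color tends to spread into regions bounded by salient luminance-contrast edges (Boynton 1978, Mollon 1995, Pinna et al. 2001), an effect that watercolor painters often exploit.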

Color and Position, Orientation, Contour, and Texture

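Contours capture key features of an object's shape (Marr 1982). Early stages of the visual system register the orientation and position of local contour segments; these segments are then linked into complete contours whose shape is encoded at higher levels of the visual system. Studies using equiluminant test patterns have revealed the contribution of color to a range of spatial attributes (Krauskopf & Forte 2002, Morgan & Aiba 1985). The attributes examined include orientation judgments (Beaudot & Mullen 2005, Clifford et al. 2003, Webster et al. 1990) and blur judgments (Webster et al. 2006, Wuerger et al. 2001). Contour linking has also been studied (McIlhagga & Mullen 1996, Mullen et al. 2000). In the task illustrated in Figure 14a, the observer must detect a path formed by aligned elements; a path composed of equiluminant elements with chromatic contrast is readily detected (not shown; McIlhagga & Mullen 1996). Performance falls markedly, however, when the path elements alternate between equiluminant and isochromatic patterns, or between L-M and S-(L+M) patterns, which indicates that contour linking is selective for color and luminance dimension.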

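Textures generated by dense arrays of local orientations are important to the visual system: spatial variation in texture can segment a scene into surfaces and also specify their three-dimensional shape. Equiluminant texture variations can be detected by the visual system (Cardinal & Kiper 2003, McIlhagga et al. 1990, Pearson & Kingdom 2002). Detection is mediated by color mechanisms that are broadly tuned for color (Cardinal & Kiper 2003). Although these mechanisms differ from those responding to isochromatic texture variations, the two are not independent of each other (Pearson & Kingdom 2002). In addition, equiluminant texture variations can elicit the perception of three-dimensional shape (Troscianko et al. 1991, Zaidi & Li 2006), which indicates that chromatic signals contribute to the neural processes that derive three-dimensional shape from texture.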

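A texture boundary defined by luminance is concealed by randomly distributed red-green variation. This gives red-green dichromats a rare advantage: such observers can break the camouflage, because their color deficiency leaves them unable to see the masking red-green variation (Morgan et al. 1992).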

Color and Stereopsis

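The small differences between the two eyes' views of a scene are the basis of stereoscopic depth perception (Howard & Rogers 2002, Julesz 1971). In a random-dot stereogram the depth is visible only with stereoscopic viewing, and it disappears when the stereogram is presented at equiluminance (de Weert 1979, Gregory 1977, Livingstone 1996, Livingstone & Hubel 1987). This has led some researchers to conclude that color vision cannot support stereopsis (Livingstone 1996, Livingstone & Hubel 1987). Depth can nevertheless be perceived at equiluminance with simple stimuli such as bars or grating patches (de Weert & Sadza 1983, Kingdom & Simmons 2000, Scharff & Geisler 1992), which suggests that there may be stereo mechanisms specifically sensitive to chromaticity (Simmons & Kingdom 1997, Ts'o et al. 2001). The poor quality of depth in equiluminant random-dot stereograms may reflect a specific deficit in integrating local chromatic depth signals (Kingdom et al. 1999).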

Color and Motion

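At equiluminance (Cavanagh et al. 1984, Lu et al. 1999, Mullen & Boulton 1992, Troscianko & Fahle 1988), the target shape in a random-dot kinematogram is difficult for observers to identify (Livingstone & Hubel 1987, Ramachandran & Gregory 1978). In some cases, such as equiluminant L-M grating patterns, adding a flickering component with random motion direction effectively masks the perceived motion (Yoshizawa et al. 2000). Taken alone, however, this last result cannot be attributed to chromatic signals feeding into a common color-luminance motion mechanism.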

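Another view holds that the perceived motion of simple L-M chromatic stimuli may be mediated by luminance artifacts generated within the visual system (Mullen et al. 2003). This proposal is at odds with studies supporting a genuinely chromatic contribution to the motion of simple stimuli, although other studies lend it some support by pointing to a role for luminance signals (Cropper & Derrington 1996, Dobkins & Albright 1993, Gegenfurtner & Hawken 1996, Morgan & Ingle 1994). Even in studies that find a chromatic contribution to the motion of simple stimuli, that contribution is at most very small.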

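There is more agreement about a chromatic contribution to motion for objects defined by chromatic contrast rather than by color alone (Cropper & Derrington 1995, Yoshizawa et al. 2000). It has also been argued, however, that for simple and complex objects alike, perceived motion at equiluminance is produced by a general, attention-dependent, high-level process that registers only figure-ground relationships, whether these are defined by color, luminance, or texture (Lu et al. 1999). Such a mechanism may explain how an equiluminant moving object can still be tracked effectively even when the perception of its motion is severely impaired (Cavanagh 1992).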

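Whether there is a genuinely chromatic contribution to motion, as opposed to a general attention-based figure-ground mechanism, has been examined with global-motion displays. In these displays most elements, the "distractors," move in random directions, while a subset, the "targets," move in one particular direction. Whether coherent motion can be perceived in such displays at equiluminance is disputed (Bilodeau & Faubert 1999 argue against; Ruppertsberg et al. 2006 argue for). Studies using non-equiluminant global-motion displays, in which all elements have luminance contrast, show that introducing a chromatic difference between target and distractor elements reduces the number of target dots needed to identify the direction of motion (Croner & Albright 1997; Figure 15). This might suggest that chromaticity contributes to global motion. The advantage disappears, however, if observers are prevented from attending selectively to the target dots, or if the experimental design renders selective attention ineffective. Thus, although experiments with non-equiluminant displays show that color is a useful cue for identifying moving objects that would otherwise be camouflaged, they provide no firm evidence that chromatic differences actually contribute to motion processing. Likewise, studies in which local motion signals generate an impression of global three-dimensional structure have found no chromatic contribution to global motion in non-equiluminant displays.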
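The structure of such a display, and the selective-attention account of the color advantage, can be made concrete with a toy simulation. This is a sketch under stated assumptions, not a model of the visual system or of any cited experiment: motion is pooled by a simple vector sum, the dot counts are arbitrary, and the "red"/"green" labels and the function `pooled_direction` are hypothetical names introduced for illustration.

```python
import numpy as np

def pooled_direction(directions):
    """Direction (radians) of the vector sum of unit motion vectors."""
    return float(np.arctan2(np.sin(directions).sum(),
                            np.cos(directions).sum()))

rng = np.random.default_rng(1)
n_dots, n_targets = 400, 200
target_direction = 0.0  # rightward, in radians

# Targets share one direction; distractors move in random directions.
is_target = np.zeros(n_dots, dtype=bool)
is_target[:n_targets] = True
directions = np.where(is_target,
                      target_direction,
                      rng.uniform(-np.pi, np.pi, n_dots))

# Hypothetical color cue: targets are "red", distractors are "green".
colors = np.where(is_target, "red", "green")

# Pooling over every dot mixes distractor noise into the estimate.
estimate_all = pooled_direction(directions)

# Selecting dots by color before pooling removes the distractors entirely.
estimate_red = pooled_direction(directions[colors == "red"])

print(f"all dots: {estimate_all:+.3f} rad")
print(f"red only: {estimate_red:+.3f} rad")
```

In this sketch the benefit of color comes entirely from choosing which dots to pool, not from any chromatic input to the motion computation itself, which is the distinction the paragraph above draws.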

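Note that with stimuli combining chromatic and luminance contrast, not every chromatic variation can be expected to affect task performance. As noted earlier, color adds nothing where luminance differences alone already drive performance to its asymptote, and in that case global-motion processing would appear to depend mainly on luminance signals. Yet the finding that color contributes little or nothing to global motion persists in non-equiluminant displays (reviewed by Gegenfurtner et al. 2003). It has been proposed that a specialized mechanism in the visual system, exemplified by neurons in area MT, integrates local signals, including chromatic ones, into a global motion percept (Gegenfurtner & Kiper 2003). Most experimental evidence indicates that MT neurons are sensitive to both L-M (Gegenfurtner et al. 1994; Thiele et al. 1999, 2001) and S-(L+M) chromatic contrast (Barberini et al. 2005). Notably, when luminance contrast is sufficiently high, the contribution of L-M contrast to MT responses becomes very small (Thiele et al. 1999). This last finding may be the physiological counterpart of the seemingly contradictory psychophysical results described above: global motion can be perceived at equiluminance (Ruppertsberg et al. 2006), yet color makes no significant contribution to global motion with non-equiluminant stimuli (Li & Kingdom 2001a,b; Snowden & Edmunds 1999).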

CONCLUSION

Complex scenes reveal how important variation in the spectral composition of light is for perception. Chromatic variation within a scene, over space or time, changes the hue in which a given light appears; it also makes it possible to separate an object's own color from the light illuminating it, even though receptor responses are ambiguous with respect to the spectral distribution of the illumination. Furthermore, chromatic features of complex scenes affect more than color perception: they also influence perceived orientation, shape, texture, and object segmentation.

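Ambiguities in the early encoding of light signals are resolved by neural mechanisms, and these processes often exploit properties of the natural world. This implies that research using only equiluminant chromatic patterns is fundamentally limited, because such patterns lack the complexity of natural scenes that is needed to study two implicit aspects of normal vision: (a) the consistent (or inconsistent) variation of chromaticity with luminance, and (b) the relative importance of color when chromatically selective and spectrally nonselective neural responses contribute to the same percept. The multiple levels of chromatic neural representation underline the importance of complex visual stimuli in which color and luminance vary together; stimuli with substantial variation in both are especially likely to advance our understanding of the role of color in visual perception.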
