Computer vision Introduction
Introduction:
What is computer vision:
Definitions:
To make useful decisions about real physical objects and scenes based on sensed images (Stockman and Shapiro).
Extracting descriptions of the world from pictures or sequences of pictures (Forsyth and Ponce).
Related disciplines:
Image processing: manipulation of an image.
Computer graphics: digitally synthesizing images.
The differences between these fields can be summarized as follows:
Image processing: image -> image
Computer vision: image -> description
Computer graphics: description -> image
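To make the three mappings concrete, here is a minimal Python sketch on a tiny grayscale image stored as a list of rows. The function names (`invert`, `describe`, `render`) and the bounding-box "description" format are hypothetical choices made for illustration; real systems use far richer descriptions.

```python
# Toy illustration of the three mappings; all names here are hypothetical.

def invert(image):
    """Image processing: image -> image (invert 8-bit intensities)."""
    return [[255 - p for p in row] for row in image]

def describe(image, threshold=128):
    """Computer vision: image -> description (bounding box of bright pixels)."""
    coords = [(r, c) for r, row in enumerate(image)
              for c, p in enumerate(row) if p >= threshold]
    if not coords:
        return None  # nothing bright enough to describe
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return {"top": min(rows), "left": min(cols),
            "bottom": max(rows), "right": max(cols)}

def render(description, height, width):
    """Computer graphics: description -> image (draw a filled bright box)."""
    return [[255 if (description["top"] <= r <= description["bottom"]
                     and description["left"] <= c <= description["right"]) else 0
             for c in range(width)]
            for r in range(height)]

scene = {"top": 1, "left": 1, "bottom": 2, "right": 3}
image = render(scene, height=4, width=5)   # description -> image
recovered = describe(image)                # image -> description
negative = invert(image)                   # image -> image
```

In this toy case `describe` recovers exactly the box that `render` drew, but only because the scene is trivially simple.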
Pattern recognition: recognising and classifying stimuli in images and other datasets.
Biological vision: understanding visual perception in humans and animals (studied in Neuroscience, Psychology, and Psychophysics).
Why is it important:
- Biological motivation:
Understanding how we see.
Vision is the main way in which we experience the world.
Evolutionarily important: 50% of the cerebral cortex is devoted to vision.
- Artificial motivation:
Want machines to interact with the world.
Digital images are everywhere.
Lots of applications…
- Applications of machine vision:
Industrial inspection, quality control
Robot navigation
Autonomous vehicles
Medical image analysis
Object/face/character recognition
…
For an example of machine vision applied in industry, see: Computer vision in industry
Why is it difficult:
Well-posed and ill-posed:
Mapping from world to image (3D to 2D) is unique (well-posed): this is a “forward problem” (i.e. imaging).
Mapping from image to world (2D to 3D) is NOT unique (ill-posed): this is an “inverse problem” (i.e. vision).
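The asymmetry between the forward and inverse problems can be sketched with a simple pinhole camera model. The pinhole model, the unit focal length, and the function names `project` and `back_project` are assumptions made for this illustration; the notes themselves do not specify a camera model.

```python
# Minimal sketch assuming an ideal pinhole camera at the origin looking along +Z.

def project(point, focal_length=1.0):
    """Forward problem: each 3D point (X, Y, Z) maps to exactly one 2D point."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the camera (Z > 0)")
    return (focal_length * x / z, focal_length * y / z)

def back_project(image_point, depth, focal_length=1.0):
    """Inverse problem: a 2D point only fixes a ray; depth must be supplied."""
    u, v = image_point
    return (u * depth / focal_length, v * depth / focal_length, depth)

near = (1.0, 2.0, 4.0)
far = (2.0, 4.0, 8.0)       # twice as far away and twice as large an offset
p_near = project(near)
p_far = project(far)        # same image point as p_near

# Every choice of depth gives a different 3D point consistent with the image.
candidates = [back_project(p_near, depth) for depth in (1.0, 4.0, 8.0)]
```

Both `near` and `far` land on the same image point, and `back_project` cannot choose between the candidates without extra information such as a known depth or object size.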

One image → many interpretations
Problem is ill-posed.
One object → many images
Problem is exponentially large.
For any given image there are many objects that could have generated that image. This is solved using constraints or priors, which make some interpretations more likely than others (the brain usually produces one interpretation from the many possible ones).
Vision scales exponentially:
A single object seen from different viewpoints can vary greatly in appearance (object orientation, retinal location, scale, etc. all affect appearance). The resulting images have very little similarity.

A single object seen under different lighting conditions can vary greatly in appearance.

Objects forming a single category can vary greatly in appearance.

Images usually contain multiple objects. This leads to background clutter and occlusion.
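One way to read the "exponential" claim is combinatorial: if each source of variation is discretised independently, the number of distinct viewing conditions is the product of the levels per factor, so it grows exponentially with the number of factors. The sketch below only counts combinations. The factor names come from the list above, but the number of levels per factor is invented purely for illustration.

```python
# Hypothetical, coarse discretisation of each source of variation.
levels = {
    "orientation": 12,
    "retinal_location": 9,
    "scale": 5,
    "lighting": 8,
}

def count_conditions(levels_per_factor):
    """Number of distinct combinations when factors vary independently."""
    total = 1
    for n in levels_per_factor:
        total *= n
    return total

total_conditions = count_conditions(levels.values())

# With n levels per factor, k factors give n ** k combinations.
growth = [10 ** k for k in range(1, 5)]
```

Even with these modest made-up numbers, a single object already has thousands of viewing conditions, before accounting for within-category variation, clutter, or occlusion.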

Need for constraints or priors:
To solve the two major challenges for computer vision, i.e., one image -> many interpretations, and one object -> many images, we need to employ constraints or priors.
Perception involves inference: we must combine prior information about the world with evidence from our senses (e.g. vision) to infer what is in the world.
For example, contextual information from the whole image enables us to disambiguate parts of the image.
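Combining a prior with sensory evidence is commonly formalised with Bayes' rule: posterior ∝ prior × likelihood. The sketch below is one possible illustration, not a method stated in these notes. The two interpretations ("convex" and "concave") and all probabilities are hypothetical numbers chosen so that the image evidence alone is ambiguous.

```python
def posterior(prior, likelihood):
    """Bayes' rule over a discrete set of interpretations."""
    unnormalised = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(unnormalised.values())
    return {h: value / evidence for h, value in unnormalised.items()}

# Hypothetical numbers: the image is equally well explained by both shapes...
likelihood = {"convex": 0.5, "concave": 0.5}
# ...but prior knowledge makes one interpretation more plausible.
prior = {"convex": 0.8, "concave": 0.2}

post = posterior(prior, likelihood)
best = max(post, key=post.get)   # the single interpretation that is reported
```

Because the likelihoods are equal here, the posterior simply follows the prior; with informative evidence the two sources would be traded off against each other.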
We can categorise some sources of priors as follows:
- prior knowledge:
  - learned familiarity with certain objects
  - knowledge of the image formation process in general
- prior exposure:
  - recent / preceding sensory input
- current context:
  - surrounding visual scene (and concurrent input in other sensory modalities)
