Computer Vision

Computer Vision is about enabling computers to see, perceive and understand the world around them. This is achieved through a combination of hardware and software. Algorithms and models are built by training computers on large numbers of images and videos. An understanding of human vision also informs the design of these algorithms and models. In fact, computer vision is a complex interdisciplinary field at the intersection of engineering, computer science, mathematics, biology, psychology, physics, and more.

Since the early 2010s, neural network approaches have greatly advanced computer vision. But given the sophistication of human vision, much more needs to be done.

Computer vision is an important part of Artificial Intelligence. This is because we "see" less with our eyes and more with our mind. It's also because it spans multiple disciplines.

Discussion

How does Computer Vision compare with human vision?
Computers first see every image as a matrix of numbers. It's the job of algorithms to transform these low-level numbers into lines, shapes and objects. This isn't so different from human vision where the retina triggers signals that are then processed by the visual cortex in the brain, leading to perception.
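To make the "matrix of numbers" idea concrete, here is a minimal sketch using plain Python lists. The pixel values and the helper name `horizontal_differences` are made up for illustration; real systems would use an image library and far more robust edge detectors.

```python
# A tiny 5x5 grayscale "image": 0 is black, 255 is white.
# The left two columns are dark and the right three are bright.
image = [
    [0, 0, 255, 255, 255],
    [0, 0, 255, 255, 255],
    [0, 0, 255, 255, 255],
    [0, 0, 255, 255, 255],
    [0, 0, 255, 255, 255],
]


def horizontal_differences(img):
    """Absolute difference between each pixel and its right-hand neighbour."""
    return [
        [abs(row[x + 1] - row[x]) for x in range(len(row) - 1)]
        for row in img
    ]


diffs = horizontal_differences(image)

# In every row, the largest jump in brightness marks where the vertical edge is.
edge_columns = [row.index(max(row)) for row in diffs]
```

Here the largest difference in each row sits between columns 1 and 2, which is how low-level numbers start to become a "line".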
Many CV systems use Convolutional Neural Networks (CNNs). A CNN is a model inspired by how the human visual cortex works, processing visual sensory inputs via a hierarchy of layers of neurons. While more work is needed to achieve the accuracy of human vision, CNNs have brought us the best results so far. CNNs led to Deep CNNs, where the idea is to match 2D templates rather than construct 3D models. This again is inspired by our own vision system.
Among the obvious differences: CV can see 360 degrees; CV is not limited to visible light; CV is not affected by fatigue or physiology; CV sees uniformly across the field of view, whereas human peripheral vision is better in low-light conditions; and CV has its own biases but is free from the biases and optical illusions that affect humans.
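The core operation of a convolutional layer can be sketched as sliding a small kernel over the image and summing elementwise products. The sketch below is illustrative only: it uses a hand-written Sobel kernel, whereas a CNN learns its kernel values during training, and the function name `correlate2d` is an assumption for this example.

```python
def correlate2d(img, kernel):
    """Slide the kernel over the image (no padding) and sum elementwise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(img) - kh + 1
    out_w = len(img[0]) - kw + 1
    out = []
    for y in range(out_h):
        row = []
        for x in range(out_w):
            total = 0
            for j in range(kh):
                for i in range(kw):
                    total += img[y + j][x + i] * kernel[j][i]
            row.append(total)
        out.append(row)
    return out


# Sobel kernel that responds to left-to-right increases in brightness.
SOBEL_X = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]

# 5x5 image with a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1, 1] for _ in range(5)]

response = correlate2d(image, SOBEL_X)
```

The response is strong where the window straddles the edge and zero over the flat bright region, which is the kind of low-level feature that early CNN layers tend to pick up.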
Isn't Computer Vision similar to image processing?
Computer vision vs image processing explained. Source: Isikdogan 2018.
Image processing comes from the disciplines of Electrical Engineering and Signal Processing whereas computer vision is from Computer Science and Artificial Intelligence. Image processing takes in an image, enhances the image in some way, and outputs an image. Computer vision is more about image analysis with the goal of extracting features, segments and objects from the image.
Adjusting the contrast in an image or sharpening the edges via a digital filter are image processing tasks. Adding colour to a monochrome image, detecting faces or describing the image are computer vision tasks. It's common to combine the two. For example, an image is first enhanced and then given to computer vision. Computer vision can detect faces or eyes, then image processing improves facial skin tone or removes red eye.
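As a small illustration of an image processing task (image in, enhanced image out), the following sketch stretches the contrast of a dull grayscale image. The function name `stretch_contrast` and the sample values are assumptions made for this example.

```python
def stretch_contrast(img, new_min=0, new_max=255):
    """Linearly rescale pixel values so they span new_min..new_max."""
    flat = [p for row in img for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A perfectly flat image has no contrast to stretch.
        return [row[:] for row in img]
    scale = (new_max - new_min) / (hi - lo)
    return [[round(new_min + (p - lo) * scale) for p in row] for row in img]


# A low-contrast image: all values are squeezed between 100 and 140.
dull = [
    [100, 110, 120],
    [110, 120, 130],
    [120, 130, 140],
]

enhanced = stretch_contrast(dull)
```

The output is still an image of the same size. A computer vision step would instead go on to extract something from it, such as a label or a region.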
Let's note that Machine Learning can be used for both CV and image processing, although it's more commonly used for CV.
How is Computer Vision (CV) related to Machine Vision (MV)?
Machine vision is more an engineering approach to enable machines to see. It's about image sensors (cameras), image acquisition, and image processing. For example, it's used on production lines to detect manufacturing defects or ensure that products are labelled correctly. Machine vision is commonly used in controlled settings, has strong assumptions (colour, shape, lighting, orientation, etc.) and therefore works reliably.
Computer vision incorporates everything that machine vision does but adds value by way of image analysis. Thus, machine vision can be seen as a subset of computer vision. CV makes greater use of automation and algorithms, including machine learning, but the line between CV and MV is blurry. Typically, vision systems in industrial settings can be considered as MV.
What are some applications of Computer Vision?
Google Maps uses CV on satellite or aerial images to create 3D models. Source: Miller 2014.
CV has far-reaching applications. Wikipedia's category on this topic has many sub-categories and dozens of pages:
- Recognition Tasks: Recognition of different entities including face, iris, gesture, handwriting, optical character, number plate, and traffic sign.
- Image Tasks: Automation of image search, synthesis, annotation, inspection, and retrieval.
- Applications: Enabling entire applications such as augmented reality, sign language translation, automated lip reading, remote sensing, mobile mapping, traffic enforcement camera, red light camera, pedestrian detection and video content analysis.
Facebook uses CV for detecting faces and tagging images automatically. Google's able to give relevant results for an image search because it analyzes image content. Microsoft Kinect uses stereo vision. Iris or face recognition are being used for surveillance or for biometric identification. Self-driving cars employ a variety of visual processing tasks to drive safely.
Gauss Surgical uses real-time ML-based image analysis to determine blood loss in patients. Amazon Go uses CV for tracking shoppers in the store and enabling automated checkout. CV has been used to study society and demographics, and to predict income, crime rates, and more.
Could you describe some common tasks in Computer Vision?
Labels and bounding boxes in object detection. Source: Murali 2017.
Image Segmentation groups pixels that have similar attributes such as colour, intensity or texture. It's a better representation of the image to simplify further processing. This can be subdivided into semantic or instance segmentation. For instance, the former means that persons and cats are segmented; the latter means that each person and each cat is segmented.
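The difference between semantic and instance segmentation can be sketched with a toy binary mask. This is only an illustration: real instance segmentation relies on learned models, and simple connected-component labelling, as used here, can only separate objects that don't touch. The function name `label_instances` is hypothetical.

```python
from collections import deque


def label_instances(mask):
    """Label 4-connected regions of 1s in a binary mask; returns (labels, count)."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] == 1 and labels[y][x] == 0:
                # Found an unlabelled foreground pixel: start a new instance.
                count += 1
                labels[y][x] = count
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] == 1 and labels[ny][nx] == 0):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


# Semantic view: every "cat" pixel is 1, regardless of which cat it belongs to.
semantic_mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 1, 1],
]

# Instance view: each separate blob gets its own label.
instance_labels, n_instances = label_instances(semantic_mask)
```

The semantic mask says only "these pixels are cat", while the instance labels distinguish cat 1 from cat 2.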
Image Classification is about giving labels to an image based on its content. Thus, the image of a cat would be labelled as "cat" with high probability.
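Classifiers often turn raw scores into probabilities with a softmax function. The sketch below assumes hypothetical scores for three labels. How those scores are produced (for example, by a CNN) is outside the scope of this example.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


labels = ["cat", "dog", "car"]
scores = [4.0, 1.0, 0.5]  # hypothetical raw scores from a classifier

probs = softmax(scores)
best = labels[probs.index(max(probs))]
```

With these made-up scores, "cat" receives most of the probability mass, matching the idea of a label assigned "with high probability".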
Object Detection is about detecting objects and placing bounding boxes. Objects are also categorized and labelled. In a two-stage detector, boxing and classification are done separately. A one-stage detector will combine the two. Object detection leads to Object Tracking in video applications.
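A measure commonly used to compare a predicted bounding box with a reference box is intersection over union (IoU). The article doesn't name it, so treat this as supplementary. The sketch assumes boxes are given as `(x_min, y_min, x_max, y_max)` tuples.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Width and height of the overlap are clamped at zero for disjoint boxes.
    inter = max(0, ix_max - ix_min) * max(0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


predicted = (5, 5, 15, 15)
reference = (0, 0, 10, 10)
overlap = iou(predicted, reference)
```

An IoU of 1 means the boxes coincide and 0 means they don't overlap at all. Detection pipelines typically pick a cut-off in between to decide whether a prediction counts as a match.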
Image Restoration attempts to enhance the image. Image Reconstruction is about filling in missing parts of the image. With Image Colourization, we add colour to a monochrome image. With Style Transfer we transform an image based on the style (colour, texture) of another image.
What's the typical data pipeline in Computer Vision?
A typical computer vision data pipeline. Source: Thompson 2015.
A typical CV pipeline includes image acquisition using image sensors; pre-processing to enhance the image such as reducing noise; feature extraction that would reveal lines, edges, shapes, textures or motion; image segmentation to identify areas or objects of interest; high-level processing (also called post-processing) as relevant to the application; and finally, decision making such as classifying a medical scan as true or false for tumour.
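The stages above can be sketched as a chain of small functions. This is a toy illustration under several assumptions: the "camera" is a synthetic 7x7 frame, feature extraction and segmentation are collapsed into one thresholding step for brevity, and all function names and the `min_area` cut-off are hypothetical.

```python
def acquire():
    """Stand-in for a camera: dark frame, bright 3x3 blob, one noisy pixel."""
    img = [[10] * 7 for _ in range(7)]
    for y in range(2, 5):
        for x in range(2, 5):
            img[y][x] = 200
    img[6][6] = 255  # isolated "salt" noise
    return img


def preprocess(img):
    """3x3 median filter; border pixels use whatever neighbours exist."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = sorted(
                img[j][i]
                for j in range(max(0, y - 1), min(h, y + 2))
                for i in range(max(0, x - 1), min(w, x + 2))
            )
            row.append(window[len(window) // 2])
        out.append(row)
    return out


def segment(img, threshold=128):
    """Mark pixels brighter than the threshold as foreground."""
    return [[1 if p > threshold else 0 for p in row] for row in img]


def measure_area(mask):
    """High-level processing: count foreground pixels."""
    return sum(sum(row) for row in mask)


def decide(area, min_area=3):
    """Decision: is there an object large enough to report?"""
    return area >= min_area


frame = acquire()
clean = preprocess(frame)
mask = segment(clean)
area = measure_area(mask)
object_present = decide(area)
```

Note that the median filter removes the noisy pixel but also erodes the corners of the blob, a reminder that each pre-processing choice affects the stages that follow.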
Could you mention some algorithms that power Computer Vision?
Here's a small and incomplete selection of algorithms. For pre-processing, thresholding is a simple and effective method, with variants such as conventional (fixed), Otsu's globally optimal, and adaptive local thresholding. Filters are commonly used: median filter, top-hat filter, low-pass filter; plus filters for edge detection: Roberts, Laplacian, Prewitt, Sobel, and more.
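As an example of one of these methods, here is a minimal sketch of Otsu's global threshold, which picks the grey level that maximises the variance between the two resulting classes. It assumes integer pixel values in the range 0 to 255 given as a flat list. The sample data is made up.

```python
def otsu_threshold(pixels, levels=256):
    """Return threshold t maximising between-class variance; pixels <= t form class 0."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(i * hist[i] for i in range(levels))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in class 0 so far
    sum0 = 0  # sum of grey levels in class 0 so far
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


# Two clearly separated groups: dark pixels around 20-30, bright around 200-220.
pixels = [20] * 60 + [30] * 40 + [200] * 50 + [220] * 50
t = otsu_threshold(pixels)
binary = [1 if p > t else 0 for p in pixels]
```

For this data the threshold falls between the dark and bright groups, so the binary result separates them cleanly.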
For feature-point extraction, we can use HOG, SIFT and SURF. Hough Transform is another feature extraction technique. Viola-Jones algorithm is for object or face detection in real time. There's also the PCA approach called eigenfaces for face recognition.
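The Viola-Jones detector is known for evaluating rectangular features quickly with the help of an integral image (summed-area table). The article doesn't go into this detail, so the sketch below is supplementary. The function names are assumptions.

```python
def integral_image(img):
    """ii[y][x] holds the sum of img over rows < y and columns < x."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, top, left, bottom, right):
    """Sum of pixels in rows top..bottom-1 and columns left..right-1 via four lookups."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
ii = integral_image(image)

whole = rect_sum(ii, 0, 0, 3, 3)
bottom_right = rect_sum(ii, 1, 1, 3, 3)
```

Once the table is built, the sum over any rectangle costs four lookups regardless of its size, which is what makes evaluating many rectangular features in real time feasible.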
The Lucas-Kanade and Horn-Schunck algorithms are useful for optical flow calculation. The mean-shift algorithm and the Kalman filter are used for object tracking. Graph Cuts are useful for image segmentation. For 3D work, NDT, ICP, CPD, SGM, and SGBM algorithms are useful.
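To give a feel for how a Kalman filter smooths noisy position measurements, here is a deliberately simplified scalar sketch. It assumes a constant-position model with made-up noise variances. Practical trackers usually use a state vector that includes velocity, along with matrix forms of the same equations.

```python
def kalman_1d(measurements, process_var=1e-3, measurement_var=1.0):
    """Scalar Kalman filter with a constant-position model."""
    estimate = measurements[0]
    error_var = 1.0
    estimates = [estimate]
    for z in measurements[1:]:
        # Predict: position is assumed unchanged, but uncertainty grows.
        error_var += process_var
        # Update: blend prediction and measurement according to the gain.
        gain = error_var / (error_var + measurement_var)
        estimate += gain * (z - estimate)
        error_var *= (1 - gain)
        estimates.append(estimate)
    return estimates


# Noisy readings of an object's x-position that is really near 5.0.
measurements = [5.2, 4.7, 5.1, 4.9, 5.3, 4.8, 5.0, 5.1]
estimates = kalman_1d(measurements)
```

The estimates fluctuate less than the raw measurements, which is the property that makes the filter useful for following an object from frame to frame.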
Bresenham's line algorithm is for drawing lines in raster graphics. To relate corresponding points in stereo images, use the fundamental matrix.
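Bresenham's line algorithm can be written with integer arithmetic only. The following is a sketch of a common all-directions variant. The function name is chosen for this example.

```python
def bresenham(x0, y0, x1, y1):
    """Return the grid points on a line from (x0, y0) to (x1, y1), endpoints included."""
    points = []
    dx = abs(x1 - x0)
    dy = -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy  # running error term, kept as an integer
    while True:
        points.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:  # step in x
            err += dy
            x0 += sx
        if e2 <= dx:  # step in y
            err += dx
            y0 += sy
    return points


line = bresenham(0, 0, 4, 2)
# → [(0, 0), (1, 1), (2, 1), (3, 2), (4, 2)]
```

Each step moves at most one pixel along each axis, so the rasterized line has no gaps.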
From the world of machine learning algorithms we have CNNs and Deep CNNs. We also have SVM, KNN, and more.
What are the current challenges in Computer Vision?
It's been shown that "adversarial" images in which pixels are selectively changed can trick image classification systems. For example, Google Cloud Vision API thinks it's looking at a dog when really the scene has skiers.
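The idea behind such adversarial changes can be sketched on a toy linear classifier. Everything here is hypothetical: the weights, the four-pixel "image", the labels and the step size `epsilon` are made up, and real attacks target deep networks rather than a hand-written linear score. The perturbation nudges every pixel slightly in the direction that lowers the score, in the spirit of gradient-sign attacks.

```python
def score(weights, bias, pixels):
    """Linear score: positive means 'skier', otherwise 'dog' (toy labels)."""
    return sum(w * p for w, p in zip(weights, pixels)) + bias


def classify(weights, bias, pixels):
    return "skier" if score(weights, bias, pixels) > 0 else "dog"


def sign(v):
    return (v > 0) - (v < 0)


def perturb(weights, pixels, epsilon):
    """Shift each pixel by at most epsilon against the sign of its weight."""
    return [
        min(1.0, max(0.0, p - epsilon * sign(w)))
        for w, p in zip(weights, pixels)
    ]


weights = [0.5, -0.3, 0.8, -0.6]  # hypothetical trained weights
bias = -0.2
original = [0.6, 0.4, 0.5, 0.5]   # pixel intensities in the range 0..1
epsilon = 0.1

adversarial = perturb(weights, original, epsilon)

label_before = classify(weights, bias, original)
label_after = classify(weights, bias, adversarial)
```

No pixel changes by more than `epsilon`, yet the predicted label flips. This is the essence of why small, selective pixel changes can mislead a classifier.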
Algorithms are capable of deductive reasoning but are poor at understanding context, analogies and inductive reasoning. For example, CV can recognize a book but the same book when used as a doorstop will be seen only as a book. In other words, CV is incapable of understanding a scene.
While CV has progressed with object recognition, accuracy can suffer if the background is cluttered with details or the object is shown under different lighting or from a different angle. In other words, invariant object recognition is still a challenge.
There are also challenges in creating low-powered CV solutions that can be used in smartphones and drones. Embedded vision is becoming mainstream in automotive, wearables, gaming, surveillance, and augmented reality with a focus towards object detection, gesture recognition, and mapping functions.
What software tools would you recommend for doing Computer Vision?
A 2017 developer survey shows TensorFlow's popularity. Source: DataBricks 2019.
The easiest way to use CV in your applications is to invoke CV APIs: Microsoft Azure Computer Vision, AWS Rekognition, Google Cloud Vision, IBM Watson Visual Recognition, Cloud Sight, Clarifai, and more. These cover image classification, face detection/recognition, emotion detection, optical character recognition (OCR), text detection, landmark detection, content moderation, and more.
OpenCV is a popular multiplatform tool with C/C++, Java and Python bindings, though GPU acceleration typically needs a build with its CUDA or OpenCL support enabled rather than working out of the box. For coding directly in Python, there's also NumPy, SciPy and scikit-image. SimpleCV is great for prototyping before you adopt OpenCV for more serious work.
Computer Vision Toolbox from MathWorks is a paid tool but it can simplify the design and testing of CV algorithms including 3D vision and video processing. C# and .NET developers can use AForge.NET/Accord.NET for image processing. For using CNNs, TensorFlow is a popular tool. CUDA Toolkit can help you get the best performance out of GPUs. Try Tesseract for OCR.
For more tools, see eSpace on Medium, ResearchGate and Computer Vision Online.

Milestones

1957

The world's first digitized photograph. Source: NIST 2007.

Russell Kirsch at the National Bureau of Standards (now called NIST) asks, "What would happen if computers could look at pictures?" Kirsch and his colleagues develop equipment to scan a photograph and represent it in the world of computers. They scan a 5cm x 5cm photograph of Kirsch's infant son into an array of 176 x 176 pixels.

1959

Hubel and Wiesel experiment on how cats see. Source: Demush 2019.

Neurophysiologists David Hubel and Torsten Wiesel discover that a cat's visual cortex is activated not by objects but by simple structures such as oriented edges. It's only decades later that we use this in the design of CNNs.

1963

Larry Roberts publishes his PhD thesis at MIT. His idea is to create a 3D representation based on perspectives contained in 2D pictures. This is done by transforming images into line drawings. Soon after this, Roberts joins DARPA and becomes one of the founders of the ARPANET that eventually evolves into the Internet. Roberts is considered the father of computer vision.

1966

The summer of 1966 is considered the official birth of computer vision. Seymour Papert of MIT's AI Lab defines the "Summer Vision Project". The idea is to do segmentation and pattern recognition on real-world images. The project proves too challenging for its time and not much is achieved.

1971

Research in computer vision continues in the direction suggested by Roberts. David Huffman and Max Clowes independently publish line labelling algorithms. Lines are labelled (convex, concave, occluded) and then used to discern the shape of objects.

1974

To help blind people read, Kurzweil Computer Products comes up with a program to do OCR. This comes at a time when funding for and confidence in AI are at their lowest point, a period now called the AI Winter of the 1970s.

1979

Marr's representational framework for vision. Source: Leymarie 2006.

David Marr suggests a bottom-up approach to computer vision (his book is published posthumously in 1982). He states that vision is hierarchical. It doesn't start with high-level objects. Rather, it starts with low-level features (edges, curves, corners) from which higher level details are built up. In the 1980s, this leads to greater focus on low-level processing and goes on to influence deep learning systems. Marr's work is now considered a breakthrough in computer vision.

1980

In 1980, Kunihiko Fukushima designs a neural network called Neocognitron for pattern recognition. It's inspired by the earlier work of Hubel and Wiesel. It includes many convolutional layers and may be called the first deep neural network. Later this decade and into the early 1990s, math and stats begin to play a more significant role in computer vision. Some examples of math-inspired contributions include the Lucas-Kanade algorithm (1981) for optical flow calculation, the Canny edge detector (1986), and eigenfaces for facial recognition (1991).

1990

In the early 1990s, in criticism of Marr's approach, goal-oriented computer vision emerges. The idea is that we often don't need 3D models of the world. For example, a self-driving car only needs to know if the object is moving away from or towards the vehicle. One of the proponents of this approach is Yiannis Aloimonos.

2012

AlexNet wins the annual ImageNet object classification competition with the idea that depth of a neural network is important for accurate results. AlexNet uses five convolutional layers followed by three fully connected layers. It's considered a breakthrough in computer vision, inspiring further research in its footsteps. In 2015, Microsoft's ResNet of 152 layers obtains better accuracy.


Cite As

Devopedia. 2020. "Computer Vision." Version 4, July 23. Accessed 2023-11-12. https://devopedia.org/computer-vision

Contributed by
2 authors

Last updated on
2020-07-23 14:27:30

