Computer Vision

Computer Vision is an interdisciplinary field. Source: Frolov 2018, slide 7.

Computer Vision is about enabling computers to see, perceive and understand the world around them. This is achieved through a combination of hardware and software. Computers are trained using lots of images/videos and algorithms/models are built. An understanding of human vision also informs the design of these algorithms/models. In fact, computer vision is a complex interdisciplinary field at the intersection of engineering, computer science, mathematics, biology, psychology, physics, and more.

Since the early 2010s, neural network approaches have greatly advanced computer vision. But given the sophistication of human vision, much more needs to be done.

Computer vision is an important part of Artificial Intelligence. This is because we "see" less with our eyes and more with our mind. It's also because it spans multiple disciplines.


  • How does Computer Vision compare with human vision?

    Computers first see every image as a matrix of numbers. It's the job of algorithms to transform these low-level numbers into lines, shapes and objects. This isn't so different from human vision where the retina triggers signals that are then processed by the visual cortex in the brain, leading to perception.
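
    As a minimal sketch of this idea, the snippet below stores a tiny grayscale image as a matrix of intensities and compares neighbouring pixels to locate a vertical edge. The 4x6 image and the helper name horizontal_gradient are made up for illustration. Plain Python lists are used so the example runs without third-party libraries; real code would more likely load an image with a library such as OpenCV.

```python
# A hypothetical 4x6 grayscale image: 0 is black, 255 is white.
# The left half is dark and the right half is bright.
image = [
    [10, 10, 10, 200, 200, 200],
    [10, 10, 10, 200, 200, 200],
    [10, 10, 10, 200, 200, 200],
    [10, 10, 10, 200, 200, 200],
]

def horizontal_gradient(img):
    """Return, for each row, the differences between horizontally adjacent pixels."""
    return [[row[x + 1] - row[x] for x in range(len(row) - 1)] for row in img]

gradient = horizontal_gradient(image)

# The position of the largest jump in the first row marks the edge:
# index 2 means the edge lies between columns 2 and 3.
edge_column = max(range(len(gradient[0])), key=lambda x: abs(gradient[0][x]))
print(edge_column)
```

    The low-level numbers become a higher-level statement ("there is an edge between columns 2 and 3") only after an algorithm has processed them.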

    CV commonly uses Convolutional Neural Networks (CNNs). A CNN is a model inspired by how the human visual cortex works, processing visual sensory inputs via a hierarchy of layers of neurons. While more work is needed to achieve the accuracy of human vision, CNNs have brought us the best results so far. CNNs lead to Deep CNNs where the idea is to match 2D templates rather than construct 3D models. This again is inspired by our own vision system.
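
    The core operation of a single convolutional layer can be sketched as follows. This is only an illustration under simplifying assumptions: the 2x2 kernel is hand-picked rather than learned, there is one layer instead of a hierarchy, and the names convolve2d and relu are chosen for this example. As in most CNN libraries, the kernel is not flipped, so strictly speaking this is cross-correlation.

```python
def convolve2d(img, kernel):
    """Slide the kernel over the image ('valid' mode, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(img) - kh + 1
    out_w = len(img[0]) - kw + 1
    return [
        [
            sum(img[y + i][x + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]

def relu(feature_map):
    """Keep positive responses and set negative ones to zero."""
    return [[max(0, v) for v in row] for row in feature_map]

# Hypothetical 4x4 input with a dark-to-bright vertical edge.
img = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-picked kernel that responds to dark-to-bright vertical edges.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

feature_map = relu(convolve2d(img, vertical_edge))
print(feature_map)
```

    The feature map is non-zero only where the edge sits. In a trained CNN, many such kernels are learned from data and stacked in layers, so that later layers respond to shapes and objects rather than edges.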

    Among the obvious differences: CV can see 360 degrees; CV is not limited to visible light; CV is not affected by fatigue or physiology; CV sees uniformly across the field of view, whereas human peripheral vision is better in low-light conditions; and CV systems have their own biases but are free from the biases and optical illusions that affect humans.

  • Isn't Computer Vision similar to image processing?
    Computer vision vs image processing explained. Source: Isikdogan 2018.

    Image processing comes from the disciplines of Electrical Engineering and Signal Processing whereas computer vision is from Computer Science and Artificial Intelligence. Image processing takes in an image, enhances the image in some way, and outputs an image. Computer vision is more about image analysis with the goal of extracting features, segments and objects from the image.

    Adjusting the contrast in an image or sharpening the edges via a digital filter are image processing tasks. Adding colour to a monochrome image, detecting faces or describing the image are computer vision tasks. It's common to combine the two. For example, an image is first enhanced and then given to computer vision. Computer vision can detect faces or eyes, then image processing improves facial skin tone or removes red eye.
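
    To make the "image in, image out" nature of image processing concrete, here is a minimal sketch of linear contrast stretching. The function name stretch_contrast and the 2x2 sample image are assumptions made for this example.

```python
def stretch_contrast(img, lo=0, hi=255):
    """Linearly rescale intensities so that they span the range [lo, hi]."""
    flat = [p for row in img for p in row]
    p_min, p_max = min(flat), max(flat)
    if p_max == p_min:
        # A completely flat image has no contrast to stretch.
        return [row[:] for row in img]
    scale = (hi - lo) / (p_max - p_min)
    return [[round(lo + (p - p_min) * scale) for p in row] for row in img]

# Hypothetical low-contrast image: values occupy only 100..150.
low_contrast = [
    [100, 110],
    [120, 150],
]

enhanced = stretch_contrast(low_contrast)
print(enhanced)
```

    The output is still an image of the same size. A computer vision step would instead return something about the image, such as a label or the location of a face.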

    Let's note that Machine Learning can be used for both CV and image processing, although it's more commonly used for CV.

  • How is Computer Vision (CV) related to Machine Vision (MV)?

    Machine vision is more an engineering approach to enable machines to see. It's about image sensors (cameras), image acquisition, and image processing. For example, it's used on production lines to detect manufacturing defects or ensure that products are labelled correctly. Machine vision is commonly used in controlled settings, has strong assumptions (colour, shape, lighting, orientation, etc.) and therefore works reliably.

    Computer vision incorporates everything that machine vision does but adds value by way of image analysis. Thus, machine vision can be seen as a subset of computer vision. CV makes greater use of automation and algorithms, including machine learning, but the line between CV and MV is blurry. Typically, vision systems in industrial settings can be considered as MV.

  • What are some applications of Computer Vision?
    Google Maps uses CV on satellite or aerial images to create 3D models. Source: Miller 2014.

    CV has far-reaching applications. Wikipedia's category on this topic has many sub-categories and dozens of pages:

    • Recognition Tasks: Recognition of different entities including face, iris, gesture, handwriting, optical character, number plate, and traffic sign.
    • Image Tasks: Automation of image search, synthesis, annotation, inspection, and retrieval.
    • Applications: Enabling entire applications such as augmented reality, sign language translation, automated lip reading, remote sensing, mobile mapping, traffic enforcement camera, red light camera, pedestrian detection and video content analysis.

    Facebook uses CV for detecting faces and tagging images automatically. Google is able to give relevant results for an image search because it analyzes image content. Microsoft Kinect uses stereo vision. Iris or face recognition is being used for surveillance or for biometric identification. Self-driving cars employ a variety of visual processing tasks to drive safely.

    Gauss Surgical uses real-time ML-based image analysis to determine blood loss in patients. Amazon Go uses CV for tracking shoppers in the store and enabling automated checkout. CV has been used to study society and demographics, and to predict income, crime rates, and more.

  • Could you describe some common tasks in Computer Vision?
    Labels and bounding boxes in object detection. Source: Murali 2017.

    Image Segmentation groups pixels that have similar attributes such as colour, intensity or texture. It's a better representation of the image to simplify further processing. This can be subdivided into semantic or instance segmentation. For instance, the former means that persons and cats are segmented; the latter means that each person and each cat is segmented.
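
    The difference between semantic and instance segmentation can be illustrated with a toy example. The sketch below starts from a hypothetical semantic mask (every "cat" pixel is 1) and uses connected-component labelling to give each separate blob its own id. This is only a stand-in: real instance segmentation typically relies on learned models and can separate objects that touch, which this approach cannot.

```python
from collections import deque

def label_instances(mask):
    """Give each 4-connected blob of 1s in a binary mask its own integer id."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    next_id = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] == 1 and labels[y][x] == 0:
                # Found an unlabelled pixel: start a new instance and
                # flood-fill outwards with a breadth-first search.
                next_id += 1
                labels[y][x] = next_id
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                                   (cy, cx - 1), (cy, cx + 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] == 1
                                and labels[ny][nx] == 0):
                            labels[ny][nx] = next_id
                            queue.append((ny, nx))
    return labels, next_id

# Hypothetical semantic mask: 1 = "cat" pixel, 0 = background.
semantic_mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 1, 1],
]

instance_labels, count = label_instances(semantic_mask)
print(count)
print(instance_labels)
```

    The semantic mask only says which pixels are "cat"; the instance labels additionally say which cat each pixel belongs to.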

    Image Classification is about giving labels to an image based on its content. Thus, the image of a cat would be labelled as "cat" with high probability.
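
    A classifier usually produces one raw score per class, and these are commonly turned into probabilities with the softmax function. The sketch below assumes three made-up classes and hypothetical scores; no actual model is involved.

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtracting the maximum improves numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

labels = ["cat", "dog", "car"]
scores = [4.0, 1.0, 0.5]  # hypothetical scores from some classifier

probs = softmax(scores)
best = max(range(len(labels)), key=lambda i: probs[i])
print(labels[best], round(probs[best], 2))
```

    Here the highest score belongs to "cat", so that label is reported together with its probability.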

    Object Detection is about detecting objects and placing bounding boxes. Objects are also categorized and labelled. In a two-stage detector, boxing and classification are done separately. A one-stage detector will combine the two. Object detection leads to Object Tracking in video applications.

    Image Restoration attempts to enhance the image. Image Reconstruction is about filling in missing parts of the image. With Image Colourization, we add colour to a monochrome image. With Style Transfer we transform an image based on the style (colour, texture) of another image.

  • What's the typical data pipeline in Computer Vision?
    A typical computer vision data pipeline. Source: Thompson 2015.

    A typical CV pipeline includes image acquisition using image sensors; pre-processing to enhance the image such as reducing noise; feature extraction that would reveal lines, edges, shapes, textures or motion; image segmentation to identify areas or objects of interest; high-level processing (also called post-processing) as relevant to the application; and finally, decision making such as classifying a medical scan as true or false for tumour.
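
    The stages above can be sketched as a chain of small functions. To keep the example short, it works on a single hypothetical scanline (one row of pixels) rather than a full image, and it omits the application-specific high-level processing stage. All function names, sample values and thresholds are assumptions made for illustration.

```python
def acquire():
    """Stand-in for a sensor: a hypothetical scanline with one noisy spike."""
    return [10, 12, 250, 11, 10, 200, 210, 205, 198, 12]

def preprocess(signal):
    """Pre-processing: 3-sample median filter; the two end samples are kept."""
    out = signal[:]
    for i in range(1, len(signal) - 1):
        out[i] = sorted(signal[i - 1:i + 2])[1]
    return out

def extract_features(signal):
    """Feature extraction: edge strength as the jump between neighbours."""
    return [abs(b - a) for a, b in zip(signal, signal[1:])]

def segment(signal, threshold=128):
    """Segmentation: mark bright samples as the region of interest."""
    return [1 if v > threshold else 0 for v in signal]

def decide(mask, min_size=3):
    """Decision making: report an object if the region is large enough."""
    return sum(mask) >= min_size

raw = acquire()
clean = preprocess(raw)
edges = extract_features(clean)
mask = segment(clean)
found = decide(mask)
print(clean, mask, found)
```

    Without the pre-processing step, the isolated spike of 250 would be marked as part of the bright region, which shows why noise reduction normally comes before segmentation.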

  • Could you mention some algorithms that power Computer Vision?

    Here's a small and incomplete selection of algorithms. For pre-processing, thresholding is a simple and effective method: conventional, Otsu global optimal, adaptive local. Filters are commonly used: median filter, top-hat filter, low-pass filter; plus filters for edge detection: Roberts, Laplacian, Prewitt, Sobel, and more.
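
    As an illustration of thresholding, here is a minimal sketch of Otsu's method, which picks the global threshold that maximises the variance between the two resulting classes. It operates on a flat list of 8-bit pixel values; the sample pixels are invented for the example. Libraries such as OpenCV and scikit-image provide their own implementations.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t that maximises between-class variance.

    Pixels with value <= t form one class and the rest form the other.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0
    for t in range(levels):
        w_bg += hist[t]           # number of pixels with value <= t
        sum_bg += t * hist[t]     # sum of those pixel values
        if w_bg == 0:
            continue              # no pixels in the lower class yet
        w_fg = total - w_bg
        if w_fg == 0:
            break                 # all pixels are in the lower class
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t

# Hypothetical pixels: a dark group and a bright group.
pixels = [20, 22, 25, 30, 200, 210, 220, 230]
t = otsu_threshold(pixels)
binary = [1 if p > t else 0 for p in pixels]
print(t, binary)
```

    Any threshold between the two groups separates them equally well; this sketch returns the lowest such value.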

    For feature-point extraction, we can use HOG, SIFT and SURF. Hough Transform is another feature extraction technique. Viola-Jones algorithm is for object or face detection in real time. There's also the PCA approach called eigenfaces for face recognition.

    Lucas-Kanade algorithm and Horn-Schunck algorithm are useful for optical flow calculation. Mean-shift algorithm and Kalman filter are for object tracking. Graph Cuts are useful for image segmentation. For 3D work, NDT, ICP, CPD, SGM, and SGBM algorithms are useful.
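
    To give a feel for how a Kalman filter helps in tracking, the sketch below smooths noisy measurements of an object's x-coordinate. It is deliberately simplified: the state is a single number, and the object is assumed to be nearly stationary. Practical trackers usually model position and velocity in two dimensions using matrices. The noise variances and measurements are made-up values.

```python
def kalman_track(measurements, process_var=1e-3, meas_var=4.0):
    """Scalar Kalman filter assuming the tracked position is nearly constant."""
    estimate, error = measurements[0], 1.0
    history = [estimate]
    for z in measurements[1:]:
        # Predict: position unchanged, uncertainty grows by the process noise.
        error += process_var
        # Update: blend prediction and measurement using the Kalman gain.
        gain = error / (error + meas_var)
        estimate += gain * (z - estimate)
        error *= (1 - gain)
        history.append(estimate)
    return history

# Hypothetical noisy detections of an object's x-coordinate, frame by frame.
noisy_x = [50.0, 52.1, 48.3, 51.2, 49.5, 50.4, 47.9, 50.8]
smoothed = kalman_track(noisy_x)
print([round(s, 2) for s in smoothed])
```

    The filtered track varies less than the raw detections because each new measurement is weighted by how much the filter trusts it relative to its own prediction.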

    Bresenham's line algorithm is for drawing lines in raster graphics. To relate corresponding points in stereo images, use Fundamental Matrix.
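
    Bresenham's line algorithm can be sketched in a few lines. The version below is the common integer-only formulation that handles all directions; the function name bresenham and the sample endpoints are chosen for this example.

```python
def bresenham(x0, y0, x1, y1):
    """Return the integer pixel coordinates on the line from (x0, y0) to (x1, y1)."""
    points = []
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1   # step direction along x
    sy = 1 if y0 < y1 else -1   # step direction along y
    err = dx + dy               # running error term
    while True:
        points.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:            # step in x
            err += dy
            x0 += sx
        if e2 <= dx:            # step in y
            err += dx
            y0 += sy
    return points

print(bresenham(0, 0, 5, 2))
```

    Only integer additions and comparisons are used, which is why the algorithm suits raster graphics.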

    From the world of machine learning algorithms we have CNNs and Deep CNNs. We also have SVM, KNN, and more.

  • What are the current challenges in Computer Vision?

    It's been shown that "adversarial" images in which pixels are selectively changed can trick image classification systems. For example, Google Cloud Vision API thinks it's looking at a dog when really the scene has skiers.

    Algorithms are capable of deductive reasoning but are poor with understanding context, analogies and inductive reasoning. For example, CV can recognize a book but the same book when used as a doorstop will be seen only as a book. In other words, CV is incapable of understanding a scene.

    While CV has progressed with object recognition, accuracy can suffer if the background is cluttered with details or the object is shown under different lighting in a different angle. In other words, invariant object recognition is still a challenge.

    There are also challenges in creating low-powered CV solutions that can be used in smartphones and drones. Embedded vision is becoming mainstream in automotive, wearables, gaming, surveillance, and augmented reality with a focus towards object detection, gesture recognition, and mapping functions.

  • What software tools would you recommend for doing Computer Vision?
    A 2017 developer survey shows TensorFlow's popularity. Source: DataBricks 2019.

    The easiest way to add CV to your applications is to invoke CV APIs: Microsoft Azure Computer Vision, AWS Rekognition, Google Cloud Vision, IBM Watson Visual Recognition, Cloud Sight, Clarifai, and more. These cover image classification, face detection/recognition, emotion detection, optical character recognition (OCR), text detection, landmark detection, content moderation, and more.

    OpenCV is a popular multiplatform tool with C/C++, Java and Python bindings; but it doesn't have native GPU support. For coding directly in Python, there's also NumPy, SciPy and scikit-image. SimpleCV is great for prototyping before you adopt OpenCV for more serious work.

    Computer Vision Toolbox from MathWorks is a paid tool but it can simplify the design and testing of CV algorithms including 3D vision and video processing. C# and .NET developers can use AForge.NET/Accord.NET for image processing. For using CNNs, TensorFlow is a popular tool. CUDA Toolkit can help you get the best performance out of GPUs. Try Tesseract for OCR.

    For more tools, see eSpace on Medium, ResearchGate and Computer Vision Online.


The world's first digitized photograph. Source: NIST 2007.

Russell Kirsch at the National Bureau of Standards (now called NIST) asks, "What would happen if computers could look at pictures?" Kirsch and his colleagues develop equipment to scan a photograph and represent it in the world of computers. They scan a 5cm x 5cm photograph of Kirsch's infant son into an array of 176 x 176 pixels.

Hubel and Wiesel experiment on how cats see. Source: Demush 2019.

Neurophysiologists David Hubel and Torsten Wiesel discover that a cat's visual cortex is activated not by objects but by simple structures such as oriented edges. It's only decades later that we use this in the design of CNNs.


Larry Roberts publishes his PhD thesis at MIT. His idea is to create a 3D representation based on perspectives contained in 2D pictures. This is done by transforming images into line drawings. Soon after this, Roberts joins DARPA and becomes one of the founders of the ARPANET that eventually evolves into the Internet. Roberts is considered the father of computer vision.


The summer of 1966 is considered the official birth of computer vision. Seymour Papert of MIT's AI Lab defines the "Summer Vision Project". The idea is to do segmentation and pattern recognition on real-world images. The project proves too challenging for its time and not much is achieved.


Research in computer vision continues in the direction suggested by Roberts. David Huffman and Max Clowes independently publish line labelling algorithms. Lines are labelled (convex, concave, occluded) and then used to discern the shape of objects.


To help blind people read, Kurzweil Computer Products comes up with a program to do OCR. This comes at a time when funding and confidence in AI are at their lowest point, a period now called the AI Winter of the 1970s.

Marr's representational framework for vision. Source: Leymarie 2006.

David Marr suggests a bottom-up approach to computer vision (his book is published posthumously in 1982). He states that vision is hierarchical. It doesn't start with high-level objects. Rather, it starts with low-level features (edges, curves, corners) from which higher level details are built up. In the 1980s, this leads to greater focus on low-level processing and goes on to influence deep learning systems. Marr's work is now considered a breakthrough in computer vision.

Interconnected layers of Fukushima's Neocognitron. Source: Fukushima 1980, fig. 2.

In 1980, Kunihiko Fukushima designs a neural network called Neocognitron for pattern recognition. It's inspired by the earlier work of Hubel and Wiesel. It includes many convolutional layers and may be called the first deep neural network. Later this decade, math and stats begin to play a more significant role in computer vision. Some examples of math-inspired contributions include Lucas-Kanade algorithm (1981) for flow calculation, Canny edge detector (1986), and eigenface for facial recognition (1991).


In the early 1990s, in criticism of Marr's approach, goal-oriented computer vision emerges. The idea is that we often don't need 3D models of the world. For example, a self-driving car only needs to know if the object is moving away or towards the vehicle. One of the proponents of this approach is Yiannis Aloimonos.


In 2012, AlexNet wins the annual ImageNet object classification competition with the idea that depth of a neural network is important for accurate results. AlexNet uses five convolutional layers followed by three fully connected layers. It's considered a breakthrough in computer vision, inspiring further research in its footsteps. In 2015, Microsoft's ResNet of 152 layers obtains better accuracy.


  1. Alyamkin, Sergei, Matthew Ardi, Alexander C. Berg, Achille Brighton, Bo Chen, Yiran Chen, Hsin-Pai Cheng, Zichen Fan, Chen Feng, Bo Fu, Kent Gauen, Abhinav Goel, Alexander Goncharenko, Xuyang Guo, Soonhoi Ha, Andrew Howard, Xiao Hu, Yuanjun Huang, Donghyun Kang, Jaeyoun Kim, Jong Gook Ko, Alexander Kondratyev, Junhyeok Lee, Seungjae Lee, Suwoong Lee, Zichao Li, Zhiyu Liang, Juzheng Liu, Xin Liu, Yang Lu, Yung-Hsiang Lu, Deeptanshu Malik, Hong Hanh Nguyen, Eunbyung Park, Denis Repin, Liang Shen, Tao Sheng, Fei Sun, David Svitov, George K. Thiruvathukal, Baiwu Zhang, Jingchi Zhang, Xiaopeng Zhang, and Shaojie Zhuo. 2019. "Low-Power Computer Vision: Status, Challenges, Opportunities." Preprint accepted by Journal on Emerging and Selected Topics in Circuits and Systems, via arXiv, April 15. Accessed 2019-05-28.
  2. Amerland, David. 2017a. "Computer Vision and Why It is so Difficult." Towards Data Science, via Medium, October 10. Accessed 2019-05-28.
  3. Amerland, David. 2017b. "Your Mind Does All The Seeing." Blog, The Sniper Mind, September 11. Accessed 2019-05-28.
  4. Ashken, Sam. 2016. "What is Computer Vision? Human Vision V.S. Computer Vision." Blog, Blippar, August 02. Accessed 2019-05-28.
  5. Bhatia, Richa. 2017. "Forget Singularity: Even Computer Vision Is A Difficult Problem To Solve." Analytics India Magazine, November 06. Accessed 2019-05-28.
  6. Bisht, Pooja. 2019. "What is the Difference between Image Processing and Computer Vision." House of Bots, May 02. Accessed 2019-05-28.
  7. Brownlee, Jason. 2019. "9 Applications of Deep Learning for Computer Vision." Machine Learning Mastery, March 12. Accessed 2019-05-28.
  8. Chung, Won Taek. 2014. "What are the major algorithms in computer vision?" Quora, September 03. Accessed 2019-05-28.
  9. ClearView Imaging. 2019. "The difference between computer vision and machine vision." Blog, ClearView Imaging. Accessed 2019-05-28.
  10. Conlon, Austin. 2013. "What are the major open problems in computer vision?" Quora, April 06. Accessed 2019-05-28.
  11. Crouch, Elizabeth. 2019. "Computer Vision vs. Machine Vision — What’s the Difference?" Blog, Appen, April 04. Accessed 2019-05-28.
  12. DataBricks. 2019. "TensorFlow." Glossary. Accessed 2019-05-28.
  13. Demush, Rostyslav. 2019. "A Brief History of Computer Vision (and Convolutional Neural Networks)." Hackernoon, via Medium, February 27. Accessed 2019-05-28.
  14. eSpace. 2017. "Tools to help you dive into Computer Vision." eSpace Technologies, March 01. Accessed 2019-05-28.
  15. Frolov, Stanislav. 2018. "Computer Vision – From traditional approaches to deep neural networks." inovex GmbH, via SlideShare, February 28. Accessed 2019-05-28.
  16. Fukushima, Kunihiko. 1980. "Neocognitron: A Self-organizing Neural Network Model for a Mechanism of Pattern Recognition Unaffected by Shift in Position." Biological Cybernetics 36, pp. 193-202, Springer-Verlag. Accessed 2019-05-30.
  17. Gebru, Timnit. 2019. "Using Computer Vision to Study Society: Methods and Challenges." IDSS Special Seminars, MIT, March 11. Accessed 2019-05-28.
  18. Huang, T. S. 1996. "Computer Vision: Evolution and Promise." 5th International Conference on High technology: Imaging science and technology, pp. 13-20. Accessed 2019-05-28.
  19. Isikdogan, Leo. 2018. "Computer Vision vs Image Processing." YouTube, September 15. Accessed 2019-05-28.
  20. Jedng. 2018. "Top 10 Computer Vision APIs: AWS, Microsoft, Google and more." Blog, Rakuten, October 23. Accessed 2019-05-28.
  21. Lazar, Aaron. 2018. "Top 10 Tools for Computer Vision." Packt Pub, April 05. Accessed 2019-05-28.
  22. Le, James. 2018. "The 5 Computer Vision Techniques That Will Change How You See The World." Heartbeat, via Medium, April 12. Accessed 2019-05-28.
  23. Leymarie, Frédéric Fol. 2006. "Computing & the Arts." Dept. of Computing, Goldsmiths College, University of London, October 10. Accessed 2019-05-30.
  24. Linn, Allison. 2015. "Microsoft researchers win ImageNet computer vision challenge." The AI Blog, Microsoft, December 10. Accessed 2019-05-30.
  25. MathWorks. 2019. "Computer Vision Toolbox." Accessed 2019-05-28.
  26. Miller, Greg. 2014. "The secret of Google Maps' accuracy revealed." Wired, December 09. Accessed 2019-05-28.
  27. Murali, Shravan. 2017. "An analysis on computer vision problems." Medium, September 13. Accessed 2019-05-28.
  28. NIST. 2007. "Fiftieth Anniversary of First Digital Image Marked." News, NIST, May 24. Accessed 2019-05-30.
  29. Nowicki, P. D., A. Ndika, J. Kemppainen, J. Cassidy, M. Forness, S. Satish, and N. Hassan. 2018. "Measurement of Intraoperative Blood Loss in Pediatric Orthopaedic Patients: Evaluation of a New Method." Journal of the American Academy of Orthopaedic Surgeons. Global research & reviews, 2(5), e014. doi:10.5435/JAAOSGlobal-D-18-00014. Accessed 2019-05-28.
  30. Padhan, Asmita. 2019. "What is computer vision? What are its applications?" Skyfi Labs, February 08. Accessed 2019-05-28.
  31. Seif, George. 2018. "How to do everything in Computer Vision." Towards Data Science, via Medium, December 13. Accessed 2019-05-28.
  32. Snow, Jackie. 2017. "Computer Vision Algorithms Are Still Way Too Easy to Trick." MIT Technology Review, December 20. Accessed 2019-05-28.
  33. Thompson, Mike. 2015. "Conquer the Challenge of Integrating Efficient Embedded Vision." Electronic Design, May 04. Accessed 2019-05-28.
  34. Tillman, Maggie. 2019. "What is Amazon Go, where is it, and how does it work?" Pocket-lint, February 18. Accessed 2019-05-28.
  35. Timothy, Tawose Olamide. 2014. "Image Segmentation." SlideShare, May 08. Accessed 2019-05-28.
  36. Turk, Matthew and Alex Pentland. 1991. "Eigenfaces for Recognition." Journal of Cognitive Neuroscience, vol. 3, no. 1, MIT. Accessed 2019-05-30.
  37. Vidolab. 2018. "Computer Vision History: Milestones and Breakthroughs." Vidolab Blog, December 24. Accessed 2019-05-28.
  38. Wikipedia. 2018. "Category:Applications of computer vision." Wikipedia, May 03. Accessed 2019-05-28.
  39. Wikipedia. 2019a. "Computer vision." Wikipedia, May 09. Accessed 2019-05-28.
  40. Wikipedia. 2019b. "AlexNet." Wikipedia, May 08. Accessed 2019-05-28.
  41. Zdziarski, Zbigniew. 2018. "The Early History of Computer Vision." July 13. Accessed 2019-05-28.

Further Reading

  1. Jonasson, Claes. 2018. "Seeing vs. perceiving — how we interact with the world." Medium, July 07. Accessed 2019-05-28.
  2. Le, James. 2018. "The 5 Computer Vision Techniques That Will Change How You See The World." Heartbeat, via Medium, April 12. Accessed 2019-05-28.
  3. Murali, Shravan. 2017. "An analysis on computer vision problems." Medium, September 13. Accessed 2019-05-28.
  4. Frolov, Stanislav. 2018. "Computer Vision – From traditional approaches to deep neural networks." inovex GmbH, via SlideShare, February 28. Accessed 2019-05-28.
  5. Demush, Rostyslav. 2019. "A Brief History of Computer Vision (and Convolutional Neural Networks)." Hackernoon, via Medium, February 27. Accessed 2019-05-28.

Cite As

Devopedia. 2020. "Computer Vision." Version 4, July 23. Accessed 2020-11-24.
Contributed by
2 authors

Last updated on
2020-07-23 14:27:30
  • Convolutional Neural Network
  • Artificial Intelligence
  • OpenCV
  • Facial Recognition
  • Automatic Number Plate Recognition
  • Image Retrieval