A snapshot of two root-to-leaf branches of ImageNet: mammal sub-tree and vehicle sub-tree. Source: Ye 2018, fig. A1-A.

ImageNet is a large dataset of over 14 million images, designed by academics for computer vision research. It was the first of its kind in terms of scale. Images are organized and labelled in a hierarchy.

In machine learning, and deep neural networks in particular, models are trained on a vast dataset of varied images. They are required to learn useful features from these training images; once learned, these features can be used to classify images and perform many other computer vision tasks. ImageNet gives researchers a common set of images on which to benchmark their models and algorithms.

It's fair to say that ImageNet has played an important role in the advancement of computer vision.


  • Where is ImageNet useful and how has it advanced computer vision?

    ImageNet is useful for many computer vision applications such as object recognition, image classification and object localization.

    Prior to ImageNet, researchers wrote one algorithm to identify dogs, another to identify cats, and so on. After training with ImageNet, the same algorithm could be used to identify different objects.

    The diversity and size of ImageNet meant that a computer looked at and learned from many variations of the same object: different camera angles, lighting conditions, and so on. Models built from such extensive training were better at many computer vision tasks, and indeed algorithms performed better once trained on the ImageNet dataset. ImageNet convinced researchers that large datasets were essential for algorithms and models to work well.

    Samy Bengio, a Google research scientist, has said of ImageNet, "Its size is by far much greater than anything else available in the computer vision community, and thus helped some researchers develop algorithms they could never have produced otherwise."

  • What are some technical details of ImageNet?

    ImageNet consists of 14,197,122 images organized into 21,841 subcategories. These subcategories can be considered as sub-trees of 27 high-level categories. Thus, ImageNet is a well-organized hierarchy that makes it useful for supervised machine learning tasks.

    On average, there are over 500 images per subcategory. The category "animal" is the most widely covered, with 3822 subcategories and 2799K images. The "appliance" category averages 1164 images per subcategory, the most for any category. Among the categories with the fewest images are "amphibian", "appliance", and "utensil".

    As many as 1,034,908 images have been annotated with bounding boxes. For example, if an image contains a cat as its main subject, the coordinates of a rectangle that bounds the cat are also published on ImageNet. This makes it useful for computer vision tasks such as object localization and detection.
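    A common operation on such bounding box annotations is intersection-over-union (IoU), used to score a predicted box against the ground truth. Below is a minimal sketch; the `(x1, y1, x2, y2)` corner format is an illustrative assumption, not ImageNet's published annotation format.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# A predicted box partially overlapping an annotated box.
print(iou((10, 10, 50, 50), (30, 30, 70, 70)))  # 0.142857...
```

    An IoU above some threshold (often 0.5) is typically counted as a correct localization.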

    There's also support for Scale-Invariant Feature Transform (SIFT), a computer vision technique for detecting local features in an image. ImageNet gives researchers 1000 subcategories with SIFT features, covering about 1.2 million images.

    Images vary in resolution but it's common practice to train deep learning models on sub-sampled images of 256x256 pixels.
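    Sub-sampling to a fixed resolution can be as simple as nearest-neighbour selection. A toy sketch in pure Python, with a 4x4 grid standing in for a full-resolution image and 2x2 standing in for the 256x256 target:

```python
def subsample(pixels, out_w, out_h):
    """Nearest-neighbour downsample of a 2D pixel grid (list of rows)."""
    in_h, in_w = len(pixels), len(pixels[0])
    return [
        [pixels[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]

img = [[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12],
       [13, 14, 15, 16]]
print(subsample(img, 2, 2))  # [[1, 3], [9, 11]]
```

    In practice, libraries that offer anti-aliased resizing are used instead of this naive selection.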

  • Could you explain how ImageNet defined the subcategories?
    Treemap visualization of first-level subcategories of geological formations. Source: Gershgorn 2017.

    In fact, ImageNet did not define these subcategories on its own but derived these from WordNet. WordNet is a database of English words linked together by semantic relationships. Words of similar meaning are grouped together into a synonym set, simply called synset. Hypernyms are synsets that are more general. Thus, "organism" is a hypernym of "plant". Hyponyms are synsets that are more specific. Thus, "aquatic" is a hyponym of "plant".

    This hierarchy makes ImageNet useful for computer vision tasks. If a model is not sure about a subcategory, it can classify the image higher up the hierarchy, where the error probability is lower. For example, a model unsure that it's looking at a rabbit can simply classify it as a mammal.
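    This fallback up the hierarchy can be sketched with a toy hypernym tree. The class names, confidence boost per level, and threshold below are all illustrative assumptions, not ImageNet's actual synsets or any published procedure:

```python
# Child -> parent (hypernym) links in a toy hierarchy.
HYPERNYM = {"rabbit": "mammal", "mammal": "animal", "dog": "mammal"}

def backoff(label, confidence, threshold=0.8):
    """Walk up the hypernym chain until confident enough or at the root.

    Assumes confidence grows as labels get more general; the 0.15 boost
    per level is an arbitrary illustrative value.
    """
    while confidence < threshold and label in HYPERNYM:
        label = HYPERNYM[label]
        confidence += 0.15
    return label

print(backoff("rabbit", 0.7))  # 'mammal'
print(backoff("rabbit", 0.9))  # 'rabbit'
```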

    While WordNet has 100K+ synsets, only the nouns have been considered by ImageNet.

  • How were the images labelled in ImageNet?

    In the early stages of the ImageNet project, a quick calculation showed that with only a few people labelling, it would take 19 years to label the images collected for ImageNet. But in the summer of 2008, the researchers learned about an Amazon service called Mechanical Turk. This meant that image labelling could be crowdsourced: people all over the world would label the images for a small fee.

    Humans make mistakes, so checks are needed to catch them. Each worker is given a task of 100 images. In each task, 6 "gold standard" images with known labels are included. At most 2 errors are allowed on these gold images; otherwise the task has to be restarted.

    In addition, the same image is labelled by three different humans. When there's disagreement, such ambiguous images are resubmitted to another human with tighter quality threshold (only one allowed error on the standard images).
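    These checks can be sketched as two steps: validate the worker against the embedded gold-standard images, then accept a label only when the three workers agree. The function names and sample data below are illustrative, not ImageNet's actual tooling:

```python
def passes_gold_check(answers, gold, max_errors=2):
    """A task is valid only if the worker misses at most max_errors of
    the gold-standard images embedded in it."""
    errors = sum(1 for img, label in gold.items() if answers.get(img) != label)
    return errors <= max_errors

def consensus(labels):
    """Accept a label when all three workers agree; otherwise the image
    is ambiguous and gets resubmitted with a tighter quality threshold."""
    if labels.count(labels[0]) == len(labels):
        return labels[0]
    return None  # ambiguous: resubmit

gold = {"img1": "cat", "img2": "dog", "img3": "cat",
        "img4": "bird", "img5": "dog", "img6": "cat"}
worker = {"img1": "cat", "img2": "dog", "img3": "dog",
          "img4": "bird", "img5": "dog", "img6": "cat"}
print(passes_gold_check(worker, gold))     # True (1 error allowed)
print(consensus(["cat", "cat", "cat"]))    # 'cat'
print(consensus(["cat", "dog", "cat"]))    # None
```

    The resubmission step would call `passes_gold_check` with `max_errors=1`, matching the tighter threshold described above.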

  • How are the images of ImageNet licensed?

    Images for ImageNet were collected from various online sources. ImageNet doesn't own the copyright for any of the images. This has implications for how ImageNet shares the images with researchers.

    For public access, ImageNet provides image thumbnails and the URLs from which the original images were downloaded. Researchers can use these URLs to download the original images. Those who wish to use the images for non-commercial or educational purposes can create an account on ImageNet and request access, which allows direct download of images from ImageNet. This is useful when the original sources of images are no longer available.

    The dataset can be explored via a browser-based user interface. Alternatively, there's also an API. Researchers may want to read the API Documentation. This documentation also shares how to download image features and bounding boxes.

  • What is the ImageNet Challenge and what's its connection with the dataset?
    Performance of winning entries of ILSVRC 2010-2014. Source: Russakovsky et al. 2014, fig. 9.

    ImageNet Large Scale Visual Recognition Challenge (ILSVRC) was an annual computer vision contest held between 2010 and 2017. It's also called ImageNet Challenge.

    For this challenge, the training data is a subset of ImageNet: 1000 synsets, 1.2 million images. Images for validation and test are not part of ImageNet and are taken from Flickr and via image search engines. There are 50K images for validation and 150K images for testing. These are hand-labeled with the presence or absence of 1000 synsets.

    The Challenge included three tasks: image classification, single-object localization (since ILSVRC 2011), and object detection (since ILSVRC 2013). More difficult tasks build upon these; in particular, image classification is the common denominator for many other computer vision tasks. Video-related tasks, not part of the main competition, were added in ILSVRC 2015: object detection from video and scene classification.

    For more information, read the current state-of-the-art on image classification for ImageNet.

  • What is meant by a pretrained ImageNet model?

    A model trained on ImageNet has essentially learned to identify both low-level and high-level features in images. However, in a real-world application such as medical image analysis or handwriting recognition, models have to be trained from data drawn from those application domains. This is time consuming and sometimes impossible due to lack of sufficient annotated training data.

    One solution is to use the weights of a model trained on ImageNet as a starting point for other computer vision tasks. This reduces the burden of training from scratch: a much smaller annotated domain-specific training set may be sufficient. By 2018, this approach had been proven in a number of tasks including object detection, semantic segmentation, human pose estimation, and video recognition.
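    The idea can be illustrated with a toy sketch: keep a "pretrained" feature extractor frozen and fit only a small linear head on the new task. The feature function and data below are made up for illustration; real transfer learning would reuse actual ImageNet-trained weights from a deep CNN.

```python
def pretrained_features(x):
    """Stand-in for a frozen ImageNet-trained backbone: it maps raw
    input to a feature vector and its weights are never updated."""
    return [x[0] + x[1], x[0] - x[1], 1.0]  # last element acts as a bias

def train_head(data, lr=0.1, epochs=50):
    """Fit only a small linear head (a perceptron) on the frozen features."""
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for x, y in data:
            f = pretrained_features(x)
            pred = 1 if sum(wi * fi for wi, fi in zip(w, f)) > 0 else 0
            for i in range(3):
                w[i] += lr * (y - pred) * f[i]
    return w

def predict(w, x):
    f = pretrained_features(x)
    return 1 if sum(wi * fi for wi, fi in zip(w, f)) > 0 else 0

# Tiny domain-specific dataset: label is 1 when x0 + x1 > 1.
data = [((0.2, 0.1), 0), ((0.9, 0.8), 1), ((0.4, 0.3), 0), ((0.7, 0.9), 1)]
w = train_head(data)
print(predict(w, (0.95, 0.9)))  # 1
```

    Only the three head weights are learned here; the point is that a handful of labelled examples suffice once good features already exist.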

  • How is Tiny ImageNet related to ImageNet?

    Tiny ImageNet and its associated competition are part of Stanford University's CS231N course. It was created for students to practise their skills in building models for image classification.

    The Tiny ImageNet dataset has 100,000 images across 200 classes. Each class has 500 training images, 50 validation images, and 50 test images. Thus, the dataset has 10,000 test images. The entire dataset can be downloaded from a Stanford server.

    Tiny ImageNet is a strict subset of ILSVRC2014. Labels and bounding boxes are provided for training and validation images but not for test images. All images have a resolution of 64x64. Since the average resolution of ImageNet images is 482x418 pixels, images in Tiny ImageNet may suffer: objects can be cropped out, too tiny, or distorted. With such a small training dataset, overfitting can occur, so data augmentation is usually applied to help models generalize better.
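    Simple augmentations such as horizontal flips and crops can be sketched in pure Python on a pixel grid, with a 3x3 toy image standing in for the 64x64 Tiny ImageNet images:

```python
def hflip(pixels):
    """Mirror an image (2D list of rows) left-to-right."""
    return [row[::-1] for row in pixels]

def crop(pixels, size, top, left):
    """Take a size x size patch; in practice top/left are drawn randomly
    each epoch so the model sees shifted variants of the same image."""
    return [row[left:left + size] for row in pixels[top:top + size]]

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
print(hflip(img))        # [[3, 2, 1], [6, 5, 4], [9, 8, 7]]
print(crop(img, 2, 0, 1))  # [[2, 3], [5, 6]]
```

    Each augmented variant counts as an extra training example, which is what helps against overfitting on small datasets.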

    Similarly, Imagenette and Imagewoof are other subsets of ImageNet, created by fast.ai.

  • What are the criticisms or shortcomings of ImageNet?

    Though ImageNet has a large number of classes, most of them don't represent everyday entities. One researcher, Samy Bengio, commented that the WordNet categories don't reflect the interests of common people. He added, "Most people are more interested in Lady Gaga or the iPod Mini than in this rare kind of diplodocus".

    Images are not uniformly distributed across subcategories. One research team, examining 200 subcategories, found that the top 11 accounted for 50% of the images, followed by a long tail.

    When classifying people, ImageNet uses labels that are racist, misogynist and offensive. People are treated as objects. Their photos have been used without their knowledge.

    One study noted that ImageNet lacks geodiversity: most of the data represents North America and Europe, while China and India appear in only 1% and 2.1% of the images respectively. This implies that models trained on ImageNet will not work well when applied to the developing world.

    Another study from 2016 found that 30% of ImageNet's image URLs are broken. This is about 4.4 million annotations lost. Copyright laws prevent caching and redistribution of these images by ImageNet itself.



George A. Miller and his team at Princeton University start working on WordNet, a lexical database for the English language. It's essentially a combination of a dictionary and a thesaurus, enabling applications in Natural Language Processing (NLP).


Fei-Fei Li at the University of Illinois Urbana-Champaign gets the idea for ImageNet. The prevailing conviction among AI researchers at this time is that algorithms are more important and data is secondary. Li instead proposes that lots of data reflecting the real world would improve accuracy. By now, WordNet itself is mature, with version 3.0 getting released in December.


Fei-Fei Li meets Christiane Fellbaum of Princeton University, a WordNet researcher. Li adopts WordNet for ImageNet.


In July, ImageNet has 0 images. By December, ImageNet reaches 3 million images categorized across 6000+ synsets. By April 2010, the count is 11 million images across 15,000+ synsets. This would have been impossible for a couple of researchers on their own, but is made possible via crowdsourcing on Amazon's Mechanical Turk platform.


ImageNet is presented for the first time at the Conference on Computer Vision and Pattern Recognition (CVPR) in Florida by researchers from the Computer Science Department, Princeton University.


The first ever ImageNet Challenge is organized in conjunction with the PASCAL Visual Object Classes Challenge 2010 (VOC2010), a well-known image recognition competition in Europe.

AlexNet triggers a wave of better solutions to the ImageNet classification problem. Source: von Zitzewitz 2017, fig. 11.

ImageNet becomes the world's largest academic user of Mechanical Turk, where the average worker identifies 50 images per minute. The year 2012 also sees a big breakthrough for both Artificial Intelligence and ImageNet. AlexNet, a deep convolutional neural network, achieves a top-5 classification error rate of 16%, down from the previous best of 26%. Its approach is adapted by many others, leading to lower error rates in the following years.


The best human-level top-5 error rate for classifying ImageNet data is 5.1%, and GoogLeNet comes closest among neural networks with 6.66%. Then PReLU-Net becomes the first neural network to surpass human-level performance, achieving a 4.94% top-5 error rate.


This year witnesses the final ImageNet Competition. Top-5 classification error drops to 2.3% and the task is now considered solved. Subsequently, the competition is hosted on Kaggle.


EfficientNet claims top-5 classification accuracy of 97.1% and top-1 accuracy of 84.4% on ImageNet, dethroning its predecessor GPipe (December 2018) by a meagre 0.1% in both top-1 and top-5 accuracies.

ImageNet wins the prestigious Longuet-Higgins Prize. Source: Menezes 2019.

ImageNet wins Longuet-Higgins Prize at CVPR 2019, a retrospective award that recognizes a CVPR paper for having significant impact and enduring relevancy on computer vision research over a 10-year period.

Four misclassified images of ImageNet-A. Source: Greene 2019.

ImageNet-A fools the best AI models 98% of the time, due to their over-reliance on colour, texture and background cues. Unlike adversarial attacks in which images are modified, ImageNet-A consists of 7500 unmodified images handpicked from ImageNet. This shows that current AI models are not robust to new data.


  1. Brownlee, Jason. 2019. "A Gentle Introduction to the ImageNet Large Scale Visual Recognition Challenge (ILSVRC)." Machine Learning Mastery, May 01. Accessed 2019-06-15.
  2. Crawford, Kate and Trevor Paglen. 2019. "Excavating AI: The Politics of Training Sets for Machine Learning." The AI Now Institute, NYU, September 19. Accessed 2019-09-28.
  3. Deng, Jia, Wei Dong, Richard Socher, Li-Jia Li, Kai Li and Fei-Fei Li. 2009. "ImageNet: A large-scale hierarchical image database." 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, pp. 248-255. Accessed 2019-04-27.
  4. fastai. 2019. "fastai/imagenette." GitHub, October 02. Accessed 2019-10-25.
  5. Gershgorn, Dave. 2017. "The data that transformed AI research—and possibly the world." Quartz, July 26. Accessed 2019-05-31.
  6. Greene, Tristan. 2019. "AI fails to recognize these nature images 98% of the time." The Next Web, July 19. Accessed 2019-08-09.
  7. Gupta, Sumit. 2018. "High Accuracy & Faster Deep Learning with High Resolution Images & Large Models." Medium, May 04. Accessed 2019-07-02.
  8. Hansen, Lucas. 2015. "Tiny ImageNet Challenge Submission." Stanford University. Accessed 2019-07-02.
  9. Hempel, Jessi. 2018. "Fei-Fei Li's Quest To Make Ai Better For Humanity." Wired, November 13. Accessed 2019-06-01.
  10. ImageNet. 2010a. "Summary and Statistics." Accessed 2019-06-18.
  11. ImageNet. 2010b. "ImageNet attribute labels." v1.0, August 02. Accessed 2019-06-17.
  12. ImageNet. 2015. "Large Scale Visual Recognition Challenge 2015 (ILSVRC2015)." Accessed 2019-07-02.
  13. ImageNet. 2017a. "Large Scale Visual Recognition Challenge 2017 (ILSVRC2017)." Accessed 2019-07-02.
  14. ImageNet. 2017b. "Large Scale Visual Recognition Challenge 2017 (ILSVRC2017)." Accessed 2019-07-02.
  15. ImageNet. 2019a. "About ImageNet." Accessed 2019-07-03.
  16. ImageNet. 2019b. "Download FAQ." Accessed 2019-06-15.
  17. ImageNet. 2019c. "Homepage." Accessed 2019-06-15.
  18. ImageNet. 2019d. "Download API." Accessed 2019-07-03.
  19. Karpathy, Andrej. 2014. "What I learned from competing against a ConvNet on ImageNet." Blog, on GitHub IO, September 02. Accessed 2019-06-19.
  20. Li, Fei-Fei. 2010. "ImageNet: crowdsourcing, benchmarking & other cool things." Stanford University. Accessed 2019-05-31.
  21. Loria, Steven. 2013. "Tutorial: What is WordNet? A Conceptual Introduction Using Python." September 20. Accessed 2019-07-02.
  22. Markoff, John. 2012. "Seeking a Better Way to Find Web Images." The New York Times, November 19. Accessed 2019-05-31.
  23. Menezes, Erica. 2019. "Most impactful paper: ImageNet -- No surprise here!" Twitter, June 18. Accessed 2019-06-19.
  24. Norman, Jeremy. 2019. "George A. Miller Begins WordNet, a Lexical Database." History of Information. Accessed 2019-07-02.
  25. Papers With Code. 2019. "Image Classification on ImageNet." Papers With Code. Accessed 2019-05-31.
  26. Peng, Tony. 2019. "CVPR 2019 Attracts 9K Attendees; Best Papers Announced; ImageNet Honoured 10 Years Later." SyncedReview, via Medium, edited by Michael Sarazen, June 19. Accessed 2019-06-19.
  27. Rahman, Shafin, Salman Khan, and Fatih Porikli. 2018. "Zero-Shot Object Detection: Learning to Simultaneously Recognize and Localize Novel Concepts." arXiv, March 16. Accessed 2019-07-03.
  28. Ruder, Sebastian. 2018. "NLP's ImageNet moment has arrived." July 12. Accessed 2019-10-14.
  29. Russakovsky, Olga, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. 2014. "ImageNet Large Scale Visual Recognition Challenge." International Journal of Computer Vision, vol. 115, no. 3, September. Accessed 2019-07-03.
  30. Shankar, Shreya, Yoni Halpern, Eric Breck, James Atwood, Jimbo Wilson, and D. Sculley. 2017. "No Classification without Representation: Assessing Geodiversity Issues in Open Data Sets for the Developing World." 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, US. Accessed 2019-07-03.
  31. Tiny ImageNet. 2019. "Tiny ImageNet Visual Recognition Challenge." Accessed 2019-06-17.
  32. Tsang, Sik-Ho. 2018. "Review: PReLU-Net — The First to Surpass Human-Level Performance in ILSVRC 2015 (Image Classification)." Coinmonks, via Medium, September 03. Accessed 2019-06-20.
  33. von Zitzewitz, Gustav. 2017. "Survey of neural networks in autonomous driving." Advanced Seminar Summer Semester, Technische Universitat Munchen. Accessed 2019-06-01.
  34. Wijnveen, Arjan. 2016. "How copyright is causing a decay in public datasets." LinkedIn Pulse, November 28. Accessed 2019-06-15.
  35. Wikipedia. 2019. "Scale-invariant feature transform." Wikipedia, June 22. Accessed 2019-07-02.
  36. WordNet. 2019. "Current Version." WordNet, Princeton University. Accessed 2019-07-02.
  37. Wu, Jiayu, Qixiang Zhang, and Guoxi Xu. 2017. "Tiny ImageNet Challenge." CS231n, Stanford University. Accessed 2019-06-17.
  38. Ye, Tengqi. 2018. "Visual Object Detection from Lifelogs using Visual Non-lifelog Data." Researchgate, January. Accessed 2019-06-20.

Further Reading

  1. Li, Fei-Fei. 2010. "ImageNet: crowdsourcing, benchmarking & other cool things." Stanford University. Accessed 2019-05-31.
  2. Gershgorn, Dave. 2017. "The data that transformed AI research—and possibly the world." Quartz, July 26. Accessed 2019-05-31.
  3. Karpathy, Andrej. 2014. "What I learned from competing against a ConvNet on ImageNet." Blog, on GitHub IO, September 02. Accessed 2019-06-19.
  4. Simon, Julien. 2017a. "ImageNet — part 1: going on an adventure." Medium, September 19. Accessed 2019-06-20.
  5. Simon, Julien. 2017b. "ImageNet — part 2: going on an adventure." Medium, September 24. Accessed 2019-06-20.
  6. Norena, Sebastian. 2018. "How to get Images from ImageNet with Python in Google Colaboratory." Coinmonks, via Medium. August 18. Accessed 2019-06-20.

Cite As

Devopedia. 2019. "ImageNet." Version 15, October 25. Accessed 2020-11-24.