Units of Information
We wish to quantify information or data for the purpose of storage or transfer. This helps us plan the capacity of storage systems or the bandwidth of communication systems. Service providers also use these numbers to determine pricing.
The basic unit of information is the bit, short for binary digit. It takes only two values, 0 or 1. All other units of information are derived from the bit. For example, 8 bits make a byte, which is the most commonly used unit. Since the early 2010s, the amount of data generated worldwide has grown massively. This has led to the use of higher units such as petabytes and exabytes.
In the world of quantum computing, the qubit is the basic unit of information.
Discussion

What's the rationale for trying to quantify information?
It's been said that a picture is worth a thousand words. In digital systems, this is not a satisfactory statement. What if there's a quantitative way to compare information regardless of the form, be it a picture, a movie, printed text or a song?
Outside the world of digital systems, different units exist. For a library, the number of books it has on its shelves is an essential metric. For a printer, the number of pages is more suitable but this too is not a uniform measure since the page size of a book is different from that of a newspaper. For a publisher or an author, the number of words is a better way to convey the amount of data.
In digital systems, a common unit makes it easier to design systems regardless of the type of data being handled. How much storage should you install for a desktop machine? How many images can you store on your SD card? How long will it take to download a 90-minute full HD movie? These questions can be answered more readily.

What are the different units of information?
Eight bits make a byte or octet. A thousand bytes make a kilobyte. A thousand kilobytes make a megabyte. The largest unit we have (as of 2019) is the yottabyte, which is \(10^{24}\) bytes. While higher units (brontobytes, geopbytes, etc.) have been suggested, they're not official.
We also have multiples of bits: kilobits, megabits, and so on. While storage is often quoted in bytes, communication bandwidth is often quoted in bits. For example, we may say that a file is 300 KB (kilobytes) in size; or a broadband connection gives a maximum of 4 Mbps (megabits per second) download speed. Web browsers, however, give download speeds in bytes per second.
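To make the bit/byte distinction concrete, here is a minimal Python sketch that converts a link speed quoted in megabits per second into bytes per second and estimates a download time. The function names are illustrative, decimal prefixes are assumed (1 Mbps = 1,000,000 bits per second), and protocol overhead is ignored, so real downloads will be somewhat slower.

```python
BITS_PER_BYTE = 8

def mbps_to_bytes_per_second(mbps):
    """Convert megabits per second (decimal prefix) to bytes per second."""
    return mbps * 1_000_000 / BITS_PER_BYTE

def download_seconds(size_bytes, mbps):
    """Idealized download time, ignoring protocol overhead and latency."""
    return size_bytes / mbps_to_bytes_per_second(mbps)

# The figures from the text: a 300 KB file over a 4 Mbps connection.
rate = mbps_to_bytes_per_second(4)
seconds = download_seconds(300_000, 4)
print(f"4 Mbps is {rate:,.0f} bytes per second")
print(f"A 300 KB file takes about {seconds:.1f} s")
```

A 4 Mbps link therefore moves at most 500,000 bytes per second, which is the figure a browser would show in its download panel.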
Bytes are commonly used. In ASCII, each letter of the English alphabet is represented by a byte. Short emails are on the order of a few hundred bytes. Typed text of a few pages is a few kilobytes. An image downloaded from the web could be a few hundred kilobytes. A song encoded in MP3 is a few megabytes. A movie is hundreds of megabytes or even a few gigabytes. During production, the movie Avatar took up 1 petabyte of storage.

What units of information are of concern to a programmer?
The byte is the basic unit of memory or storage for programmers, although bit-level manipulation is possible. Since a bit can store two values, a byte can store \(2^8=256\) different values.
From the perspective of programming, four bits make a nibble. It's common to write a byte value as two hexadecimal digits, one per nibble. For example, 126 is 0x7E. These nibbles are also conveniently shown when debugging memory contents. Nibbles were useful in early computer systems that used packed BCD.
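The relationship between a byte, its two nibbles and its hexadecimal form can be sketched in Python as follows. The helper names `split_nibbles` and `pack_bcd` are illustrative, and the BCD helper handles only two-digit decimal numbers.

```python
def split_nibbles(byte):
    """Return the (high, low) nibbles of a value in the range 0..255."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a value in the range 0..255")
    high = byte >> 4      # upper four bits
    low = byte & 0x0F     # lower four bits
    return high, low

def pack_bcd(number):
    """Pack a decimal number 0..99 into one byte, one digit per nibble."""
    if not 0 <= number <= 99:
        raise ValueError("expected a value in the range 0..99")
    tens, ones = divmod(number, 10)
    return (tens << 4) | ones

high, low = split_nibbles(126)
print(f"126 = 0x{126:02X}, high nibble {high:X}, low nibble {low:X}")
print(f"42 in packed BCD is stored as 0x{pack_bcd(42):02X}")
```

In packed BCD the hexadecimal dump of a byte reads the same as the decimal number it encodes, which is why nibbles suited those early systems.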
Two words make a doubleword, but what exactly is a word? The length of a word depends on the processor architecture. In a 16-bit system, a word is defined as two bytes. In a 64-bit system, a word is eight bytes. A processor executes instructions, accesses data in memory via addresses, and processes or represents numbers in words or multiples of words. Thus, programmers who write low-level code need to be aware of the word size.
In the C programming language, short must be at least 16 bits; long at least 32 bits; and long long at least 64 bits.
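One way to inspect these widths on a given machine, without writing C, is Python's `ctypes` module, which mirrors the C types of the compiler that built the interpreter. This is a sketch under two assumptions: bytes are 8 bits wide, and the pointer width is used as a rough proxy for the word size, which does not hold on every architecture.

```python
import ctypes
import struct

# Minimum widths in bits, as quoted in the text above.
MINIMUM_BITS = {"short": 16, "long": 32, "long long": 64}

C_TYPES = {
    "short": ctypes.c_short,
    "long": ctypes.c_long,
    "long long": ctypes.c_longlong,
}

def bit_width(ctype):
    """Width of a C type in bits, assuming 8-bit bytes."""
    return ctypes.sizeof(ctype) * 8

for name, ctype in C_TYPES.items():
    print(f"{name}: {bit_width(ctype)} bits (minimum {MINIMUM_BITS[name]})")

# Pointer width is a common, though imperfect, proxy for the word size.
pointer_bits = struct.calcsize("P") * 8
print(f"pointer: {pointer_bits} bits")
```

The output differs between platforms. For instance, `long` is commonly 64 bits on 64-bit Linux but 32 bits on 64-bit Windows, which is why the language only guarantees minimums.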

How is a kibibyte different from a kilobyte?
The prefixes kilo, mega, giga and others come from the decimal system, in which they denote powers of ten. But computer systems operate on the binary system. Hence, memory and storage capacities are often quoted in powers of two. Thus, a kilobyte came to mean 1,024 bytes, not 1,000 bytes. This led to confusion since these prefixes in the SI metric system refer to powers of ten.
To resolve this confusion, the International Electrotechnical Commission (IEC) approved in 1998, and published in 1999, a new set of units that represent powers of two. Thus, a kibibyte has 1,024 bytes, a mebibyte has 1,024 kibibytes, and so on.
However, industry has not widely adopted the IEC units. Thus, Windows 10 continues to report 203 GB disk capacity as 189 GB when it should say 189 GiB.
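The disk capacity example above can be reproduced with a short Python sketch. The constant and function names are illustrative; the only inputs are the definitions 1 GB = \(10^9\) bytes and 1 GiB = \(2^{30}\) bytes.

```python
KB, KIB = 10**3, 2**10    # kilobyte vs kibibyte, in bytes
GB, GIB = 10**9, 2**30    # gigabyte vs gibibyte, in bytes

def gb_to_gib(gigabytes):
    """Convert a decimal gigabyte figure to binary gibibytes."""
    return gigabytes * GB / GIB

print(f"1 KiB = {KIB} bytes, 1 KB = {KB} bytes")
print(f"203 GB = {gb_to_gib(203):.2f} GiB")
print(f"A GiB is {(GIB / GB - 1) * 100:.1f}% larger than a GB")
```

The gap between the two conventions widens with each prefix: about 2.4% at kilo and about 7.4% at giga, which is why the mismatch is most visible on large disks.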

How is information related to entropy?
The word information is often loosely taken to mean data. We assume that a file of size 1 MB carries 1 MB of information. However, from the perspective of information theory, data is not equal to information. In information theory, information is defined mathematically as the amount of uncertainty or entropy. A die throw has more uncertainty than a coin toss, and therefore has more information to convey.
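The coin and die comparison can be made quantitative with Shannon's entropy formula, \(H = -\sum_i p_i \log_2 p_i\), which gives the result in bits. The Python sketch below assumes a fair coin and a fair six-sided die; the biased coin is an extra illustrative case and the function name is hypothetical.

```python
import math

def entropy_bits(probabilities):
    """Shannon entropy in bits of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

fair_coin = [1 / 2] * 2
fair_die = [1 / 6] * 6
biased_coin = [0.9, 0.1]   # illustrative: a mostly predictable coin

print(f"fair coin:   {entropy_bits(fair_coin):.3f} bits")
print(f"fair die:    {entropy_bits(fair_die):.3f} bits")
print(f"biased coin: {entropy_bits(biased_coin):.3f} bits")
```

A fair coin carries exactly 1 bit per toss and a fair die about 2.585 bits per throw, while the biased coin carries less than 1 bit because its outcome is easier to predict.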
An uncompressed bitmap image has a lot of spatial redundancy in its pixel values. In other words, a pixel value can be used to predict values of neighbouring pixels. Image compression techniques exploit this redundancy. Thus, a compressed image is closer to the mathematical definition of information. But an MP3 song could contain repetitions of the chorus. Also, once we've heard the song and remember it well, it conveys less information the next time we hear it.
Therefore, the phrase "units of information" should be interpreted as "units of data/storage/memory".
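The gap between data size and information content can be observed with a general-purpose compressor. The sketch below uses Python's `zlib` module as a stand-in for the image compression techniques mentioned above, which is an assumption made for simplicity. It compares a highly redundant buffer, where every byte predicts the next, with random bytes that have almost no redundancy.

```python
import os
import zlib

def compression_ratio(data):
    """Compressed size divided by original size (lower means more redundancy)."""
    return len(zlib.compress(data, 9)) / len(data)

SIZE = 100_000
redundant = b"\x80" * SIZE        # like a flat grey image: one repeated pixel value
random_like = os.urandom(SIZE)    # unpredictable bytes, close to maximum entropy

print(f"redundant data:  ratio {compression_ratio(redundant):.4f}")
print(f"random data:     ratio {compression_ratio(random_like):.4f}")
```

Both buffers occupy the same 100,000 bytes of storage, yet the first shrinks to a tiny fraction of its size while the second barely changes, which illustrates why storage units measure data rather than information.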
Milestones
1928
Hartley studies the problem of measuring information. Given \(s\) symbols and \(n\) selections, he notes that information \(H = n \log s\). He describes this as,
our practical measure of information the logarithm of the number of possible symbol sequences
1948
Claude Shannon publishes his paper titled A Mathematical Theory of Communication. He introduces the word bit as a contraction of binary digit. He credits J. W. Tukey for coining this word.
1956
During the design of the IBM 7030 Stretch, IBM's first transistorized supercomputer, Werner Buchholz coins the term byte for an ordered collection of bits. It represents the smallest amount of data that a computer can process. At this time, the byte is not defined as eight bits.
1960s
During this decade, 7-bit ASCII is standardized. More importantly, IBM introduces its System/360 product line that uses the 8-bit Extended Binary Coded Decimal Interchange Code (EBCDIC). This leads to the popular use of 8-bit storage systems. Thus, a byte comes to represent 8 bits.
1975
The prefixes exa and peta are standardized as part of the International System of Units (SI).
1991
The prefixes zetta and yotta are standardized as part of the International System of Units (SI).
1999
The International Electrotechnical Commission (IEC) publishes new units of information based on powers of two: kibibyte, mebibyte, gibibyte, and so on. The letters "bi" replace the last two letters of the SI prefixes. While some systems have adopted these units, many others have not.
References
 Economist. 2010. "All too much." The Economist, February 27. Accessed 2019-05-20.
 Forret, Peter. 2005. "Binary confusion: kilobytes and kibibytes." February 04. Accessed 2019-05-20.
 Harish. 2014. "C Programming #01: Bit, Byte etc." Harish Note, June 01. Accessed 2019-05-20.
 Hartley, R.V.L. 1928. "Transmission of Information." Bell System Technical Journal, vol. 7, no. 3, pp. 535–563, July. Accessed 2020-07-22.
 Khan Academy Labs. 2014. "Measuring information." Journey into Information Theory, on YouTube, April 28. Accessed 2020-07-22.
 Kwiatkowski, Sebastian. 2018. "Entropy is a measure of uncertainty." Towards Data Science, via Medium, October 06. Accessed 2019-05-20.
 LINFO. 2005. "Bit Definition." The Linux Information Project, March 05. Updated 2006-04-05. Accessed 2019-05-20.
 Norman, Jeremy. 2019. "Werner Buchholz Coins the Term Byte." Jeremy Norman & Co., Inc., May 19. Accessed 2019-05-20.
 Petrov, Christo. 2019. "Big Data Statistics 2019." TechJury, March 22. Accessed 2019-05-20.
 Ramić, Omer. 2017. "Conversion and difference kilobyte to kibibyte, megabyte to mebibyte." Blog, January 01. Accessed 2019-05-20.
 Science Epic. 2015. "What is the Largest Unit of Storage?" Science Epic, April 14. Accessed 2019-05-20.
 Shannon, C. E. 1948. "A Mathematical Theory of Communication." The Bell System Technical Journal, vol. 27, pp. 379–423, 623–656, July, October. Accessed 2019-05-20.
 Technopedia. 2019. "Word Size." Accessed 2019-05-20.
 Wikipedia. 2019a. "Units of information." Wikipedia, May 15. Accessed 2019-05-20.
 Wikipedia. 2019b. "C data types." Wikipedia, May 10. Accessed 2019-05-20.
 Wikipedia. 2019c. "Binary-coded decimal." Wikipedia, May 05. Accessed 2019-05-20.
 Wikipedia. 2019d. "International System of Units." Wikipedia, May 20. Accessed 2019-05-20.
 Wikiversity. 2018. "Data sizes and speeds." Wikiversity, November 21. Accessed 2019-05-20.
 Zee. 2010. "Believe it or not: Avatar takes 1 petabyte of storage space, equivalent to a 32 YEAR long MP3." The Next Web, January 01. Accessed 2019-05-20.
Further Reading
 LINFO. 2005. "Bit Definition." The Linux Information Project, March 05. Updated 2006-04-05. Accessed 2019-05-20.
 Wikiversity. 2018. "Data sizes and speeds." Wikiversity, November 21. Accessed 2019-05-20.
 Forret, Peter. 2005. "Binary confusion: kilobytes and kibibytes." February 04. Accessed 2019-05-20.