Units of Information

We wish to quantify information or data for the purpose of storage or transfer. This helps us plan the capacity of storage systems or the bandwidth of communication systems. Service providers also use these numbers to determine pricing.

The basic unit of information is called the bit, which is short for binary digit. It takes only two values, 0 or 1. All other units of information are derived from the bit. For example, 8 bits make a byte, which is the most commonly used unit. Since the early 2010s, we've been seeing massive growth in the amount of data being generated worldwide. This has led to the use of higher units such as petabytes and exabytes.

In the world of quantum computing, the qubit is the basic unit of information.

Discussion

• What's the rationale for trying to quantify information?

It's been said that a picture is worth a thousand words. In digital systems, this is not a satisfactory statement. What if there's a quantitative way to compare information regardless of the form, be it a picture, a movie, printed text or a song?

Outside the world of digital systems, different units exist. For a library, the number of books it has on its shelves is an essential metric. For a printer, the number of pages is more suitable but this too is not a uniform measure since the page size of a book is different from that of a newspaper. For a publisher or an author, the number of words is a better way to convey the amount of data.

In digital systems, a common unit makes it easier to design systems regardless of the type of data being handled. How much storage should you install for a desktop machine? How many images can you store on your SD card? How long will it take to download a 90-minute full HD movie? These questions can be answered more readily.

• What are the different units of information?

Eight bits make a byte or octet. A thousand bytes make a kilobyte. A thousand kilobytes make a megabyte. The largest unit we have (in 2019) is a yottabyte, which is $$10^{24}$$ bytes. While higher units (brontobytes, geopbytes, etc.) have been suggested, they're not official.

We also have multiples of bits: kilobits, megabits, and so on. While storage is often quoted in bytes, communication bandwidth is often quoted in bits. For example, we may say that a file is 300KB (kilobytes) in size; or a broadband connection gives a maximum of 4Mbps (megabits per second) download speed. Web browsers, however, give download speeds in bytes per second.
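The bits-versus-bytes distinction above can be made concrete with a little arithmetic. The following is a minimal sketch using the 300KB file and 4Mbps link from the text; the function names are made up for illustration, decimal (SI) multiples are assumed, and protocol overhead and latency are ignored.

```python
# Decimal (SI) multiples, as assumed in this sketch.
BITS_PER_BYTE = 8
KILO = 1_000
MEGA = 1_000_000


def mbps_to_bytes_per_second(mbps: float) -> float:
    """Convert a link rate in megabits per second to bytes per second."""
    return mbps * MEGA / BITS_PER_BYTE


def transfer_seconds(size_bytes: float, mbps: float) -> float:
    """Ideal transfer time, ignoring protocol overhead and latency."""
    return size_bytes / mbps_to_bytes_per_second(mbps)


rate = mbps_to_bytes_per_second(4)         # 4 Mbps link
seconds = transfer_seconds(300 * KILO, 4)  # 300 KB file
print(f"{rate / KILO:.0f} KB/s, {seconds:.1f} s")  # 500 KB/s, 0.6 s
```

A 4Mbps connection therefore moves at most 500KB each second, which is the figure a browser reporting in bytes per second would show.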

Bytes are commonly used. Each letter of the English alphabet is represented by a byte in ASCII and ASCII-compatible encodings. Short emails are on the order of a few hundred bytes. Typed text of a few pages is a few kilobytes. An image downloaded on the web could be a few hundred kilobytes. A song encoded in MP3 is a few megabytes. A movie is hundreds of megabytes or even a few gigabytes. In production, the movie Avatar took up 1 petabyte of storage.
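The one-byte-per-letter rule can be checked directly. This is a small sketch, assuming ASCII and UTF-8 encodings; the sample strings are arbitrary. It also shows that a character outside the English alphabet may need more than one byte in UTF-8.

```python
# English text: one byte per character in ASCII (and in UTF-8).
text = "Units of Information"
encoded = text.encode("ascii")
print(len(text), len(encoded))  # 20 20

# An accented letter takes two bytes in UTF-8.
accented = "é"
print(len(accented), len(accented.encode("utf-8")))  # 1 2
```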

• What units of information are of concern to a programmer?

Byte is the basic unit of memory or storage for programmers, although bit-level manipulation is possible. Since a bit can store two values, a byte can store $$2^8=256$$ different values.

From the perspective of programming, four bits make a nibble. It's common to write a byte value in hexadecimal form of two nibbles. For example, 126 is 0x7E. These nibbles are also conveniently shown when debugging memory contents. Nibbles were useful in early computer systems that used packed BCD.
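As an illustration of the byte and nibble relationship described above, here is a minimal sketch that splits the value 126 into its two nibbles using shifts and masks. The variable names are chosen for this example only.

```python
value = 126

# A byte holds 2**8 = 256 distinct values: 0 through 255.
assert 0 <= value < 2 ** 8

# Split the byte into its upper and lower four bits.
high_nibble = (value >> 4) & 0xF  # 0x7
low_nibble = value & 0xF          # 0xE

print(f"0x{value:02X}")                       # 0x7E
print(f"{value:08b}")                         # 01111110
print(f"{high_nibble:X} {low_nibble:X}")      # 7 E

# Reassemble the byte from the two nibbles.
rebuilt = (high_nibble << 4) | low_nibble
print(rebuilt == value)                       # True
```

Each hexadecimal digit maps to exactly one nibble, which is why memory dumps show bytes as pairs of hex digits.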

Two words make a doubleword, but what exactly is a word? The length of a word depends on the processor architecture. In a 16-bit system, a word is defined as two bytes. In a 64-bit system, a word is eight bytes. A processor executes instructions, accesses data in memory via addresses, and processes or represents numbers in words or their multiples. Thus, programmers who write low-level code need to be aware of the word size.

In the C programming language, short must be at least 16 bits; long at least 32 bits; and long long at least 64 bits.
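The actual sizes on a given machine can be inspected from Python through the standard ctypes module, which mirrors the platform's C types. This is only a sketch: it assumes 8-bit bytes, and it uses the pointer size as a rough proxy for the word size, which is an assumption of this example rather than a definition.

```python
import ctypes

BITS_PER_BYTE = 8  # assumption: the platform uses 8-bit bytes

# C type name, ctypes equivalent, minimum width in bits as stated above.
c_types = [
    ("short", ctypes.c_short, 16),
    ("long", ctypes.c_long, 32),
    ("long long", ctypes.c_longlong, 64),
]

sizes = {}
for name, ctype, min_bits in c_types:
    sizes[name] = ctypes.sizeof(ctype) * BITS_PER_BYTE
    print(f"{name}: {sizes[name]} bits (minimum {min_bits})")

# Pointer width, used here as a rough stand-in for the word size.
pointer_bits = ctypes.sizeof(ctypes.c_void_p) * BITS_PER_BYTE
print(f"pointer: {pointer_bits} bits")
```

The printed widths vary by platform. For instance, long is commonly 32 bits on 64-bit Windows but 64 bits on 64-bit Linux, which is why only the minimums are guaranteed.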

• How is a kibibyte different from a kilobyte?

The prefixes kilo, mega, giga and others are based on the decimal system, in which they denote powers of ten. But computer systems operate on the binary system. Hence, memory and storage capacities are often quoted in powers of two. Thus, a kilobyte came to mean 1,024 bytes, not 1,000 bytes. This led to confusion since these prefixes in the SI metric system refer to powers of ten.

To set right this confusion, the International Electrotechnical Commission (IEC) approved in 1998, and published in 1999, a new set of units that represented powers of two. Thus, a kibibyte had 1,024 bytes, a mebibyte had 1,024 kibibytes, and so on.

However, industry has not widely adopted the IEC units. Thus, Windows 10 continues to report 203GB disk capacity as 189GB when it should say 189GiB.
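The 203GB versus 189GiB discrepancy follows directly from the two definitions. Here is a minimal sketch of the conversion; the constant and function names are chosen for this example.

```python
GB = 10 ** 9    # gigabyte: decimal (SI) prefix
GIB = 2 ** 30   # gibibyte: binary (IEC) prefix


def bytes_to_gib(n_bytes: int) -> float:
    """Express a byte count in gibibytes."""
    return n_bytes / GIB


disk_bytes = 203 * GB
print(f"{disk_bytes / GB:.0f} GB = {bytes_to_gib(disk_bytes):.0f} GiB")
# 203 GB = 189 GiB

# The gap widens with each prefix: about 2.4% at kilo, 7.4% at giga.
print(f"{(2 ** 10 / 10 ** 3 - 1) * 100:.1f}%")  # 2.4%
print(f"{(GIB / GB - 1) * 100:.1f}%")           # 7.4%
```

The same number of bytes is being reported in both cases. Only the size of the unit differs.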

• How is information related to entropy?

The word information is often loosely taken to mean data. We assume that a file of size 1MB carries 1MB of information. However, from the perspective of information theory, data is not equal to information. In information theory, information is defined mathematically as the amount of uncertainty or entropy. The throw of a die has more uncertainty than a coin toss, and therefore has more information to convey.
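The coin and die comparison can be quantified with Shannon's entropy formula, $$H=-\sum_i p_i \log_2 p_i$$, which gives the result in bits. The sketch below assumes a fair coin and a fair six-sided die; the function name is chosen for this example.

```python
import math


def entropy_bits(probabilities):
    """Shannon entropy in bits of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


coin = entropy_bits([0.5, 0.5])   # fair coin
die = entropy_bits([1 / 6] * 6)   # fair six-sided die

print(f"coin: {coin:.3f} bits")   # coin: 1.000 bits
print(f"die:  {die:.3f} bits")    # die:  2.585 bits
```

A fair coin toss carries exactly one bit, while a die throw carries about 2.585 bits. An outcome that is certain carries none.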

An uncompressed bitmap image has a lot of spatial redundancy in its pixel values. In other words, a pixel value can be used to predict values of neighbouring pixels. Image compression techniques exploit this redundancy. Thus, a compressed image is closer to the mathematical definition of information. But even compressed data can carry redundancy: an MP3 song could contain repetitions of the chorus. Also, once we've heard the song and remember it well, it conveys less information the next time we hear it.

Therefore, the phrase "units of information" should be interpreted as "units of data/storage/memory".

Milestones

1928

Hartley studies the problem of measuring information. Given $$s$$ symbols and $$n$$ selections, he notes that information $$H=n\log s$$. He describes this as,

"our practical measure of information the logarithm of the number of possible symbol sequences"

1948

Claude Shannon publishes his paper titled A Mathematical Theory of Communication. He introduces the word bit as a contraction of binary digit. He credits J. W. Tukey for coining this word.

Jul 1956

During the design of IBM 7030 Stretch, IBM's first transistorized supercomputer, Werner Buchholz coins the term byte for an ordered collection of bits. It represents the smallest amount of data that a computer can process. At this time, the byte is not defined as eight bits.

1960

During this decade, 7-bit ASCII is standardized. More importantly, IBM introduces its System/360 product line that uses 8-bit Extended Binary Coded Decimal Interchange Code (EBCDIC). This leads to the popular use of 8-bit storage systems. Thus, a byte comes to represent 8 bits.

1975

The prefixes exa and peta are standardized as part of the International System of Units (SI).

1991

The prefixes zetta and yotta are standardized as part of the International System of Units (SI).

Jan 1999

The International Electrotechnical Commission (IEC) publishes new units of information based on powers of two: kibibyte, mebibyte, gibibyte, and so on. The letters "bi" replace the last two letters of the SI prefixes. While some systems have adopted these units, many others have not.

Cite As

Devopedia. 2020. "Units of Information." Version 5, July 22. Accessed 2021-09-09. https://devopedia.org/units-of-information
Contributed by
2 authors

Last updated on
2020-07-22 16:31:12