A byte (of 8 bits) has a limited range of 256 values. When a value is beyond this range, it has to be stored in multiple bytes. For example, the number 753 is 0x02F1 in hexadecimal, so it requires at least two bytes of storage. The order in which these two bytes are stored in memory can differ: byte 0x02 can be stored at the lower memory address followed by 0xF1, or vice versa.
Programs must conform to the byte ordering as supported by the processor. If not, 0x02F1 might be wrongly interpreted as 0xF102, which is the number 61698 in decimal system. Byte ordering is also important when data is transferred across a network or between systems using different ordering.
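The arithmetic above can be checked with a short C sketch. The helper names store_be16, store_le16 and load_be16 are hypothetical, chosen for illustration. Each byte is built explicitly with shifts, so the result does not depend on the byte ordering of the machine running the code.

```c
#include <assert.h>
#include <stdint.h>

/* Write a 16-bit value into buf[0..1], high-order byte first (big-endian). */
static void store_be16(uint8_t buf[2], uint16_t v) {
    buf[0] = (uint8_t)(v >> 8);   /* high-order byte at the lower address */
    buf[1] = (uint8_t)(v & 0xFF); /* low-order byte at the higher address */
}

/* Write a 16-bit value into buf[0..1], low-order byte first (little-endian). */
static void store_le16(uint8_t buf[2], uint16_t v) {
    buf[0] = (uint8_t)(v & 0xFF);
    buf[1] = (uint8_t)(v >> 8);
}

/* Interpret buf[0..1] as big-endian, whatever order the bytes were written in. */
static uint16_t load_be16(const uint8_t buf[2]) {
    return (uint16_t)((buf[0] << 8) | buf[1]);
}
```

Storing 753 little-endian yields the bytes F1 02. Reading those same bytes as big-endian gives 0xF102, that is, 61698.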
Byte ordering is an attribute of the processor, not the operating system running on it.
Which are the different byte orderings present in computing systems?
The two common orderings are:
- Little-Endian: The low-order byte is stored at the lower address. This is also called Intel order since Intel's x86 family of CPUs popularized this ordering. Intel, AMD, PDP-11 and VAX are little-endian systems.
- Big-Endian: The high-order byte is stored at the lower address. This is also called Motorola order since Motorola's PowerPC architecture used this ordering. Motorola 68K and IBM mainframes are big-endian systems.
Some processors support both orderings and are therefore called Bi-Endian. PowerPC and Itanium are bi-endian. Bi-Endian processors can switch between the two orderings. Most RISC architectures (SPARC, PowerPC, MIPS) were originally big-endian but are now configurable. While ARM processors are bi-endian, the default is to run them as little-endian systems, as seen in the Raspberry Pi.
Due to the popular adoption of x86-based systems (Intel, AMD, etc.) and ARM, little-endian systems have come to dominate the market.
Could you compare host-byte vs network-byte ordering?
So that computers with different byte orderings can exchange data and still interpret it correctly, the convention is to always send data on the network in big-endian format. This is called network-byte ordering. The ordering used on a computer is called host-byte ordering.
A host system may be little-endian, but when sending data onto the network, it must first convert the data into big-endian format. Likewise, a little-endian machine must convert network data into little-endian before processing it.
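Four common functions do these conversions: htonl() and htons() convert a 32-bit long and a 16-bit short from host to network byte order, while ntohl() and ntohs() convert them from network to host byte order.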
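A minimal sketch of serializing integers for the network, assuming a POSIX system where htonl(), htons(), ntohl() and ntohs() are declared in <arpa/inet.h> (Windows declares them in <winsock2.h> instead). The helper names are hypothetical. memcpy copies the converted value into a byte buffer, which is how it would typically be placed into a packet.

```c
#include <arpa/inet.h>  /* htonl, htons, ntohl, ntohs (POSIX) */
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Serialize a 32-bit value into a buffer in network (big-endian) order. */
static void put_u32_net(uint8_t out[4], uint32_t host_value) {
    uint32_t net = htonl(host_value);  /* host order -> network order */
    memcpy(out, &net, sizeof net);     /* bytes now sit high-order first */
}

/* Read a 32-bit value that arrived in network order. */
static uint32_t get_u32_net(const uint8_t in[4]) {
    uint32_t net;
    memcpy(&net, in, sizeof net);
    return ntohl(net);                 /* network order -> host order */
}

/* The same pattern for 16-bit values, using htons and ntohs. */
static void put_u16_net(uint8_t out[2], uint16_t host_value) {
    uint16_t net = htons(host_value);
    memcpy(out, &net, sizeof net);
}

static uint16_t get_u16_net(const uint8_t in[2]) {
    uint16_t net;
    memcpy(&net, in, sizeof net);
    return ntohs(net);
}
```

On any host, put_u32_net(buf, 0x12345678) should leave the bytes 12 34 56 78 in buf. On a big-endian host the conversion calls do nothing, so the same code remains correct there.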
Even if your code doesn't do networking, data may be stored in a different ordering. For example, file data may be stored in big-endian while your machine is little-endian. In some cases, CPU instructions are available for conversion. Intel's x86 instruction set has BSWAP to swap byte ordering. Since ARMv6, REV swaps byte order and SETEND sets the endianness.
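When data sits in a file or buffer rather than arriving through a socket, a portable approach is to assemble values from individual bytes. The sketch below, with hypothetical helper names, shows a four-byte swap (the operation BSWAP and REV perform in hardware) and a reader for big-endian 32-bit fields. Compilers such as GCC and Clang may recognize the shift-and-mask pattern and emit a single swap instruction, but that is an optimization, not a guarantee.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the four bytes of a 32-bit value (what BSWAP on x86 or REV on ARM does). */
static uint32_t swap32(uint32_t v) {
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}

/* Decode a big-endian 32-bit field from a byte buffer (e.g. data read from a file).
   Works on any host because the value is built arithmetically rather than by
   reinterpreting memory in host order. */
static uint32_t read_be32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}
```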
Aren't conversions redundant for big-endian machines, since they already conform to network-byte ordering?
It's good programming practice to invoke these conversions since this makes your code portable. The same codebase can be compiled for a little-endian machine and it will work. Calling the conversion functions on a big-endian machine has no effect.
A little-endian system sending data to another little-endian system doesn't strictly need to convert, but converting anyway keeps the code portable across systems.
Are there situations where byte ordering doesn't matter?
When data is stored and processed as a sequence of single bytes (not shorts or longs), byte ordering doesn't matter. No conversions are required when sending or receiving such data over the network.
ASCII strings are stored as a sequence of single bytes, so byte ordering doesn't matter. For Unicode strings, byte ordering depends on the encoding used. With UTF-8, ordering doesn't matter since the encoding is a sequence of single bytes. With UTF-16, each code unit occupies two bytes, so byte ordering matters.
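To make the difference concrete, here is a sketch for the character é (U+00E9). The helpers are hypothetical and deliberately limited: utf16_unit_bytes handles a single 16-bit code unit, and utf8_encode_small handles only code points below U+0800. UTF-16 yields 00 E9 or E9 00 depending on the chosen order, while UTF-8 always yields C3 A9.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one UTF-16 code unit as two bytes in the requested order. */
static void utf16_unit_bytes(uint16_t unit, int big_endian, uint8_t out[2]) {
    if (big_endian) {
        out[0] = (uint8_t)(unit >> 8);
        out[1] = (uint8_t)(unit & 0xFF);
    } else {
        out[0] = (uint8_t)(unit & 0xFF);
        out[1] = (uint8_t)(unit >> 8);
    }
}

/* Encode a code point below U+0800 as UTF-8 and return its length in bytes.
   The byte sequence is fixed by the encoding, so there is no order to choose. */
static size_t utf8_encode_small(uint16_t cp, uint8_t out[2]) {
    assert(cp < 0x800);                       /* this sketch covers 1- and 2-byte forms only */
    if (cp < 0x80) {
        out[0] = (uint8_t)cp;                 /* ASCII range: one byte */
        return 1;
    }
    out[0] = (uint8_t)(0xC0 | (cp >> 6));     /* lead byte carries the top 5 bits */
    out[1] = (uint8_t)(0x80 | (cp & 0x3F));   /* continuation byte carries the low 6 bits */
    return 2;
}
```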
When storing TIFF images, byte ordering matters since pixels are stored as words. GIF and JPEG images don't care about byte ordering since storage is not word oriented.
What's the purpose of Byte Order Mark (BOM)?
An alternative to using host-to-network and network-to-host conversions is to send the data along with an indication of the ordering being used. This ordering is indicated with two additional bytes, known as the Byte Order Mark (BOM).
The BOM could have any agreed-upon value, but 0xFEFF is common. If a machine reads it as 0xFFFE, the data's ordering differs from the machine's ordering and conversion is required before processing the data further.
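The check can be sketched by looking at the first two bytes directly, which is equivalent to reading them as a host-order 16-bit value and comparing the result with 0xFEFF or 0xFFFE. This assumes 0xFEFF as the agreed BOM and a stream of 16-bit units; the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum byte_order { ORDER_UNKNOWN, ORDER_BIG, ORDER_LITTLE };

/* Inspect the first two bytes for the BOM value 0xFEFF.
   FE FF: the sender wrote high-order bytes first; FF FE: low-order bytes first. */
static enum byte_order detect_bom(const uint8_t *data, size_t len) {
    if (len < 2) return ORDER_UNKNOWN;
    if (data[0] == 0xFE && data[1] == 0xFF) return ORDER_BIG;
    if (data[0] == 0xFF && data[1] == 0xFE) return ORDER_LITTLE;
    return ORDER_UNKNOWN;  /* no BOM: the order must be agreed some other way */
}

/* Read 16-bit unit i (the BOM itself is unit 0) using the detected order. */
static uint16_t read_unit(const uint8_t *data, size_t i, enum byte_order order) {
    assert(order != ORDER_UNKNOWN);
    const uint8_t *p = data + 2 * i;
    return order == ORDER_BIG ? (uint16_t)((p[0] << 8) | p[1])
                              : (uint16_t)((p[1] << 8) | p[0]);
}
```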
BOM adds overhead. For example, sending two bytes of data incurs an overhead of two additional bytes. Problems can also arise if a program forgets to add the BOM, or if the data payload happens to start with the BOM value by coincidence.
A BOM is not required for UTF-8, whose code units are single bytes. However, some editors such as Notepad on Windows may include a BOM (three bytes, EF BB BF) to indicate UTF-8 encoding.
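Since a UTF-8 BOM signals only the encoding and carries no byte-order information, a reader commonly just skips it. A minimal sketch, with a hypothetical function name:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return how many leading bytes to skip: 3 if the buffer starts with the
   UTF-8 BOM (EF BB BF), otherwise 0. */
static size_t utf8_bom_length(const uint8_t *data, size_t len) {
    if (len >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
        return 3;
    return 0;
}
```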
What are the pros and cons of little-endian and big-endian systems?
In little-endian systems, just reading the byte at the lowest address is enough to know whether the number is odd or even. This may be an advantage for low-level processing. Big-endian systems have a similar advantage: the byte at the lowest address tells us whether a signed integer is positive or negative.
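Both advantages come down to inspecting the single byte at the lowest address. In the sketch below, each array holds a 32-bit integer exactly as it would appear in memory, lowest address first. The function names are hypothetical, and the sign check assumes two's-complement representation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Little-endian: the lowest-address byte is the low-order byte; its bit 0 gives parity. */
static bool le_is_odd(const uint8_t bytes[4]) {
    return (bytes[0] & 0x01) != 0;
}

/* Big-endian: the lowest-address byte is the high-order byte; its top bit is the
   sign bit of a two's-complement signed integer. */
static bool be_is_negative(const uint8_t bytes[4]) {
    return (bytes[0] & 0x80) != 0;
}
```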
Typecasting to a narrower type (say, to int8_t) is easier in little-endian systems, since the low-order bytes sit at the same address as the original value. Because of the simple relationship between address offset and byte number, little-endian can also be easier for writing math routines.
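The narrowing argument can be illustrated by laying out a 32-bit value both ways and asking where its low-order byte sits. This sketch uses hypothetical helpers, models memory with plain byte arrays rather than real pointer casts, and uses unsigned types to avoid implementation-defined signed conversions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Lay out a 32-bit value in memory order, lowest address first. */
static void layout32(uint32_t v, int big_endian, uint8_t out[4]) {
    for (size_t i = 0; i < 4; i++) {
        size_t shift = big_endian ? 8 * (3 - i) : 8 * i;
        out[i] = (uint8_t)(v >> shift);
    }
}

/* Offset where a narrower read must start to obtain the low-order `width` bytes. */
static size_t narrow_offset(size_t width, int big_endian) {
    return big_endian ? 4 - width : 0;  /* little-endian: always the original address */
}
```

For 0x12345678, the low-order byte 0x78 is at offset 0 in the little-endian layout but at offset 3 in the big-endian layout.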
During low-level debugging, memory dumps show bytes from low address to high address, in left-to-right order. Big-endian systems store the most significant byte first, which matches this left-to-right reading and makes debugging easier. For the same reason, binary-to-decimal routines are easier.
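Formatting four bytes the way a memory dump does, lowest address on the left, shows why big-endian dumps are easier to read. The helper name dump4 is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format 4 bytes like a memory dump: lowest address on the left. */
static void dump4(const uint8_t b[4], char out[12]) {
    snprintf(out, 12, "%02X %02X %02X %02X",
             (unsigned)b[0], (unsigned)b[1], (unsigned)b[2], (unsigned)b[3]);
}
```

For the value 0x12345678, the big-endian layout prints as 12 34 56 78, while the little-endian layout prints as 78 56 34 12.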
For the most part, programmers have to deal with both systems. Each system evolved separately, so it's hard to complain about not having a single system.
Is the concept of endianness applicable for instructions?
Endianness is applicable for multi-byte numeric values. Instructions are not numeric values and therefore endianness is not relevant. However, an instruction may contain 16-bit integers, addresses or other values. The byte ordering of these parts is important.
For example, the 8051 has an LCALL instruction that stores the address of the next instruction on the stack. The address is pushed onto the stack in little-endian format. However, LCALL instructions contain 16-bit addresses that are in big-endian format.
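The two byte orders inside one LCALL can be modelled in a few lines. This is a simplified, hypothetical model written for illustration, not an emulator. It assumes the standard 8051 behaviour: a three-byte LCALL (opcode byte, then address high byte, then low byte) and a stack that grows upward, with SP pointing at the last byte pushed and the low byte of the return address pushed first.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, minimal model of the parts of an 8051 that matter here. */
struct cpu8051 {
    uint8_t iram[256];  /* internal RAM, where the stack lives */
    uint8_t sp;         /* stack pointer: points at the last byte pushed */
};

/* The 16-bit target follows the opcode, high byte first (big-endian).
   The opcode byte insn[0] is not decoded in this sketch. */
static uint16_t lcall_target(const uint8_t insn[3]) {
    return (uint16_t)((insn[1] << 8) | insn[2]);
}

/* Push the return address low byte first. Because the stack grows upward,
   the low byte lands at the lower address (little-endian). */
static void push_return_address(struct cpu8051 *cpu, uint16_t ret) {
    cpu->iram[++cpu->sp] = (uint8_t)(ret & 0xFF);
    cpu->iram[++cpu->sp] = (uint8_t)(ret >> 8);
}
```

With SP at its reset value of 0x07, pushing the return address 0x0103 stores 0x03 at 0x08 and 0x01 at 0x09: the stacked address reads little-endian, while the operand inside the instruction reads big-endian.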
How can I check the endianness of my system?
On Linux, the command lscpu can be used to find the endianness.
Developers can also write a simple program to determine the endianness of the host machine. In a C program, for example, store two bytes in memory. Then use a short *ptr to point to the lower address. Dereferencing this pointer yields a short value, which tells us whether the machine is little-endian or big-endian.
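Here is a minimal C version of that check. It stores the bytes 01 00 and reads them back as one 16-bit value. Instead of dereferencing a short * into a byte array, it copies the bytes with memcpy, which sidesteps alignment and strict-aliasing problems, and it uses uint16_t so the width is exactly 16 bits. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Store two known bytes, then read them back as a single 16-bit value. */
static bool host_is_little_endian(void) {
    const uint8_t bytes[2] = {0x01, 0x00};  /* 0x01 at the lower address */
    uint16_t value;
    memcpy(&value, bytes, sizeof value);    /* safe alternative to a short * cast */
    return value == 0x0001;                 /* low-order byte came first: little-endian */
}
```

A caller might print host_is_little_endian() ? "little-endian" : "big-endian". GCC and Clang also predefine __BYTE_ORDER__ for a compile-time answer.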
Do systems differ in the ordering of bits within a byte?
Bits within a byte are commonly numbered as Bit0 for the least significant bit and Bit7 for the most significant bit. Thus, bit numbering in a 32-bit integer will be left-to-right order in big-endian, and right-to-left in little-endian. However, some systems such as the OpenXC vehicle interface use the opposite numbering in which the least significant bit is Bit7. Note that in either case, the content remains the same, only the numbering is different.
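The two numbering schemes can be expressed as two ways of indexing the same byte. In this sketch, with hypothetical names, bit_lsb0 follows the common convention where Bit0 is the least significant bit, and bit_msb0 follows the opposite scheme described above. For any byte, bit n in one scheme equals bit 7 - n in the other, so the content is unchanged.

```c
#include <assert.h>
#include <stdint.h>

/* Common numbering: Bit0 is the least significant bit. */
static int bit_lsb0(uint8_t byte, unsigned n) {
    assert(n < 8);
    return (byte >> n) & 1;
}

/* Opposite numbering: Bit7 is the least significant bit. */
static int bit_msb0(uint8_t byte, unsigned n) {
    assert(n < 8);
    return (byte >> (7 - n)) & 1;
}
```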
Danny Cohen in his classic paper on the subject of endianness notes some examples where bit numbering was inconsistent in early computer systems. For example, M68000 was big-endian but the bit numbering resembled little-endian.
In digital interfaces, bit ordering matters. In the Serial Peripheral Interface (SPI), it can be configured based on what both devices support. In I2C, the most significant bit is sent first. In UART, either ordering is possible, but it must be configured identically at both ends; when it isn't specified, sending the least significant bit first is usually assumed.
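The effect of bit order on a serial line can be sketched by listing the order in which a byte's bits leave the transmitter. The functions are hypothetical and model only the data bits, not start, stop or clock signals. If the two ends disagree, a byte such as 0x41 sent least significant bit first is reassembled as 0x82 by a receiver expecting most significant bit first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* List a byte's bits in transmission order.
   msb_first = 1 models I2C (and a common SPI setting); 0 models the usual UART default. */
static void serialize_bits(uint8_t byte, int msb_first, uint8_t out[8]) {
    for (size_t i = 0; i < 8; i++) {
        unsigned bit = msb_first ? 7 - (unsigned)i : (unsigned)i;
        out[i] = (uint8_t)((byte >> bit) & 1);
    }
}

/* Rebuild a byte from bits received in the given order. */
static uint8_t deserialize_bits(const uint8_t in[8], int msb_first) {
    uint8_t byte = 0;
    for (size_t i = 0; i < 8; i++) {
        unsigned bit = msb_first ? 7 - (unsigned)i : (unsigned)i;
        byte |= (uint8_t)((in[i] & 1) << bit);
    }
    return byte;
}
```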
The first samples of an ARM processor came out in 1985. It was only in 1992, with the release of ARM6, that the processor became bi-endian. Legacy big-endian is supported for both instructions and data. Otherwise, instructions are little-endian, and data is little-endian or big-endian as configured. While byte ordering can be configured in software for some ARM processors, for others such as the ARM Cortex-M3, the order is determined by a configuration pin that's sampled at reset.
- ARM. 2007. "Cortex™-M3: Technical Reference Manual." Revision r1p1, June 13. Accessed 2019-04-27.
- ARM. 2009. "ARM1176JZ-S™: Technical Reference Manual." Revision r0p7, November 27. Accessed 2019-04-27.
- Arora, Himanshu. 2018. "Linux lscpu Command Tutorial for Beginners (5 Examples)." HowtoForge, March 01. Accessed 2019-04-27.
- Azad, Kalid. 2006. "Understanding Big and Little Endian Byte Order." Better Explained, September 19. Accessed 2019-04-25.
- Cohen, Danny. 1980. "On Holy Wars and a Plea for Peace." IEN 137, IETF, April 01. Accessed 2019-04-25.
- Doulos. 2019. "Byte Swapping using ARMv6 and ARMv7-A/R instructions." Accessed 2019-04-25.
- Edwards, Philip Eddie. 2004. "Memory." November 04. Updated by Ian Harries. Accessed 2019-04-25.
- Gillespy, Thurman and Alan Rowberg. 1993. "Radiological images on personal computers: Introduction and fundamental principles of digital images." Journal of Digital Imaging, vol. 6, no. 2, pp. 81-7, June. Accessed 2019-04-25.
- Grusin, Mike. 2013. "Serial Peripheral Interface (SPI)." SparkFun, January 14. Accessed 2019-04-27.
- Hackjutsu. 2016. "Network Byte Orders." Hackjutsu Dojo, August 09. Accessed 2019-04-25.
- Ippolito, Greg. 2017. "Endianness: Big and Little Endian Byte Order." Tutorial, YoLinux. Accessed 2020-07-20.
- Ishida, Richard. 2010. "The byte-order mark (BOM) in HTML." W3C, August 10. Updated 2013-01-31. Accessed 2020-08-28.
- Keil. 2004. "Appendix E: Byte Ordering." Cx51 User's Guide, Keil Embedded Development Tools, Keil, ARM. Accessed 2019-04-25.
- Kholodov, Igor. 2007. "Little Endian Example." CIS-77 Introduction to Computer Systems, Computer Information Systems Department, Bristol Community College. Accessed 2019-04-25.
- Kleinespel, Conrad. 2017. "Endianness and the coin toss." October 30. Accessed 2019-04-25.
- Lindblom, Jim. 2012. "Serial Communication." SparkFun, December 18. Accessed 2019-04-27.
- Nguyen, Uy. 2018. "Big Endian vs Little Endian." April 30. Accessed 2019-04-25.
- OpenBSD. 2019. "HTONL(3)." Library Functions Manual, OpenBSD, February 13. Accessed 2019-04-27.
- OpenXC Docs. 2017. "Bit Numbering and Byte Order." OpenXC Docs, Ford Motor Company, September 07. Accessed 2019-04-25.
- PCMag. 2019. "Definition of: byte order." Encyclopedia, PCMag. Accessed 2019-04-25.
- Poser, Bill. 2003. "The Origin of the Terms Big-Endian and Little-Endian." Linguistics 538: Computational Methods in Linguistic Research, The Department of Linguistics, University of Pennsylvania. Accessed 2019-04-27.
- Rubenstein, Dan. 2003. "Socket Programming." Computer Science, Columbia University. Accessed 2019-04-25.
- Saha, Dola. 2018. "Serial Communication: I2C, UART & USB." Cyber-Physical Systems, ICEN 553/453, University at Albany. Accessed 2019-04-27.
- Schwartz, David. 2019. "Endianness of instructions." Stackoverflow, April 26. Accessed 2019-04-27.
- The Centre for Computing History. 2019. "Digital Micro PDP-11/23." Accessed 2019-04-27.
- Unicode. 2017. "UTF-8, UTF-16, UTF-32 & BOM." Frequently Asked Questions, Unicode, June 27. Accessed 2019-04-25.
- Wikipedia. 2018. "History of the Berkeley Software Distribution." Wikipedia, December 31. Accessed 2019-04-27.
- Wikipedia. 2019b. "Comparison of instruction set architectures." Wikipedia, March 10. Accessed 2019-04-25.
- Wikipedia. 2019c. "ARM architecture." Wikipedia, April 26. Accessed 2019-04-27.
- bryanpkc. 2017. "Booting the Raspberry Pi into big-endian mode?" Raspberry Pi Forum, May 15. Accessed 2019-04-27.