Byte Ordering

Examples of systems in different byte ordering. Source: Gillespy and Rowberg 1993, table 2.

A byte (of 8 bits) has a limited range of 256 values. When a value is beyond this range, it has to be stored in multiple bytes. For example, the number 753 is 0x02F1 in hexadecimal format and requires at least two bytes of storage. The order in which these two bytes are stored in memory can differ: byte 0x02 can be stored at the lower memory address followed by 0xF1, or vice versa.

Programs must conform to the byte ordering as supported by the processor. If not, 0x02F1 might be wrongly interpreted as 0xF102, which is the number 61698 in decimal system. Byte ordering is also important when data is transferred across a network or between systems using different ordering.
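
The misinterpretation described above can be sketched in C. This is only an illustration: the helper names read_u16_big and read_u16_little are hypothetical, and each one combines two bytes, given in increasing address order, under one of the two conventions.

```c
#include <stdint.h>

/* Hypothetical helper: interpret two bytes (lower address first) as big-endian.
   The byte at the lower address is the high-order byte. */
uint16_t read_u16_big(const uint8_t b[2]) {
    return (uint16_t)(((unsigned)b[0] << 8) | b[1]);
}

/* Hypothetical helper: interpret the same two bytes as little-endian.
   The byte at the lower address is the low-order byte. */
uint16_t read_u16_little(const uint8_t b[2]) {
    return (uint16_t)(((unsigned)b[1] << 8) | b[0]);
}
```

With the bytes 0x02 and 0xF1 stored at increasing addresses, read_u16_big returns 0x02F1 (753) while read_u16_little returns 0xF102 (61698).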

Byte ordering is an attribute of the processor, not the operating system running on it.

Discussion

  • Which are the different byte orderings present in computing systems?
    Comparing Little-Endian and Big-Endian ordering. Source: Hackjutsu 2016.

    The two common orderings are:

    • Little-Endian: Low-order byte is stored at a lower address. This is also called Intel order since Intel's x86 family of CPUs popularized this ordering. Intel, AMD, PDP-11 and VAX are little-endian systems.
    • Big-Endian: High-order byte is stored at a lower address. This is also called Motorola order since Motorola's PowerPC architecture used this ordering. Motorola 68K and IBM mainframes are big-endian systems.

    Some processors support both orderings and are therefore called Bi-Endian. PowerPC and Itanium are bi-endian. Bi-Endian processors can switch between the two orderings. Most RISC architectures (SPARC, PowerPC, MIPS) were originally big-endian but are now configurable. While ARM processors are bi-endian, the default is to run them as little-endian systems, as seen in the Raspberry Pi.

    Due to the popular adoption of x86-based systems (Intel, AMD, etc.) and ARM, little-endian systems have come to dominate the market.

  • Could you compare host-byte vs network-byte ordering?
    Illustrating the use of htonl and ntohl functions. Source: Adapted from Rubenstein 2003, slide 20.

    Computers that use different byte orderings must still interpret exchanged data correctly. The convention is therefore to always send data on the network in big-endian format. We call this network-byte ordering. The ordering used on a computer is called host-byte ordering.

    A host system may be little-endian but when sending data into the network, it must convert data into big-endian format. Likewise, a little-endian machine must first convert network data into little-endian before processing it.

    Four common functions to do these conversions are (for 32-bit long and 16-bit short) htonl, ntohl, htons and ntohs.
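
    As a minimal sketch of how these functions are typically used, assuming a POSIX system that provides them in <arpa/inet.h>: the helper names put_u32_net and get_u32_net are hypothetical and are not part of any library.

```c
#include <arpa/inet.h>  /* htonl, ntohl, htons, ntohs (POSIX) */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: write a 32-bit host-order value into a buffer
   in network (big-endian) order. */
void put_u32_net(uint8_t *buf, uint32_t host_value) {
    uint32_t net_value = htonl(host_value);    /* host -> network */
    memcpy(buf, &net_value, sizeof net_value);
}

/* Hypothetical helper: read a 32-bit network-order value from a buffer
   and return it in host order. */
uint32_t get_u32_net(const uint8_t *buf) {
    uint32_t net_value;
    memcpy(&net_value, buf, sizeof net_value);
    return ntohl(net_value);                   /* network -> host */
}
```

    Whatever the host ordering, put_u32_net(buf, 0x0A0B0C0D) leaves the bytes 0A 0B 0C 0D in the buffer, and get_u32_net recovers the original value.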

    Even if your code doesn't do networking, data may be stored in a different ordering. For example, file data may be stored in big-endian while your machine is little-endian. In some cases, CPU instructions may be available for conversion. Intel's 64-bit instruction set has BSWAP to swap byte order. Since ARMv6, REV swaps byte order and SETEND sets the endianness.
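
    Where no dedicated instruction or library call is at hand, a byte swap can be written with shifts and masks. The sketch below is one portable way to do it; the names swap32 and swap16 are hypothetical. Some compilers may translate this pattern into a single byte-swap instruction, though that isn't guaranteed.

```c
#include <stdint.h>

/* Reverse the order of the four bytes in a 32-bit value. */
uint32_t swap32(uint32_t x) {
    return ((x & 0x000000FFu) << 24) |
           ((x & 0x0000FF00u) << 8)  |
           ((x & 0x00FF0000u) >> 8)  |
           ((x & 0xFF000000u) >> 24);
}

/* Reverse the order of the two bytes in a 16-bit value. */
uint16_t swap16(uint16_t x) {
    return (uint16_t)((x << 8) | (x >> 8));
}
```

    For instance, swap16(0x02F1) gives 0xF102, and applying either function twice returns the original value.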

  • Aren't conversions redundant for big-endian machines since they already conform to network-byte ordering?

    It's good programming practice to invoke these conversions since this makes your code portable. The same codebase can be compiled for a little-endian machine and it will work. Calling the conversion functions on a big-endian machine has no effect.

    While a little-endian system sending data to another little-endian system need not do any conversion, it's good to convert. This makes the code more portable across systems.

  • Are there situations where byte ordering doesn't matter?
    Endianness in different encodings of Unicode. Source: Unicode 2017.

    When data is stored and processed as a sequence of single bytes (not shorts or longs), then byte ordering doesn't matter. No conversions are required when receiving or sending such data into the network.

    ASCII strings are stored as a sequence of single bytes. Byte ordering doesn't matter. Byte ordering for Unicode strings depends on the type of encoding used. If encoding is UTF-8, ordering doesn't matter since encoding is a sequence of single bytes. If encoding is UTF-16, then byte ordering matters.
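
    To make the difference concrete, here is a small sketch that encodes one code point in each form. The function names are hypothetical, and for brevity the helpers handle only a limited range: code points U+0000 to U+FFFF for UTF-16 (no surrogate pairs) and U+0800 to U+FFFF for the three-byte UTF-8 form.

```c
#include <stdint.h>

/* UTF-16 big-endian: high-order byte first. */
void utf16be_encode_bmp(uint16_t cp, uint8_t out[2]) {
    out[0] = (uint8_t)(cp >> 8);
    out[1] = (uint8_t)(cp & 0xFF);
}

/* UTF-16 little-endian: low-order byte first. */
void utf16le_encode_bmp(uint16_t cp, uint8_t out[2]) {
    out[0] = (uint8_t)(cp & 0xFF);
    out[1] = (uint8_t)(cp >> 8);
}

/* UTF-8, three-byte form (U+0800..U+FFFF): the byte sequence is fixed
   by the encoding itself, so there is only one possible order. */
void utf8_encode_3byte(uint16_t cp, uint8_t out[3]) {
    out[0] = (uint8_t)(0xE0 | (cp >> 12));
    out[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (uint8_t)(0x80 | (cp & 0x3F));
}
```

    For U+20AC (the euro sign), UTF-16 yields 20 AC or AC 20 depending on the ordering, while UTF-8 always yields E2 82 AC.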

    When storing TIFF images, byte ordering matters since pixels are stored as words. GIF and JPEG images don't care about byte ordering since storage is not word oriented.

  • What's the purpose of Byte Order Mark (BOM)?
    BOM used with UTF-16 encoding. Source: Ishida 2010.

    An alternative to using host-to-network and network-to-host conversions is to send the data along with an indication of the ordering that's being used. This ordering is indicated with two additional bytes, known as the Byte Order Mark (BOM).

    The BOM could have any agreed upon value but 0xFEFF is common. If a machine reads this as 0xFFFE, it implies that ordering is different from the machine's ordering and conversion is required before processing the data further.

    BOM adds overhead. For example, sending two bytes of data incurs an overhead of two additional bytes. Problems can also arise if a program forgets to add the BOM or if the data payload starts with the BOM by coincidence.

    BOM is not required for UTF-8 since it's a byte-oriented encoding. However, some editors such as Notepad on Windows may include a BOM (three bytes, EF BB BF) to indicate UTF-8 encoding.
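
    A receiver could check for these marks by inspecting the first bytes of the data. The following is a minimal sketch, not a complete detector: the type bom_kind and the function detect_bom are hypothetical names, and other marks such as those for UTF-32 are not handled.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { BOM_NONE, BOM_UTF8, BOM_UTF16_BE, BOM_UTF16_LE } bom_kind;

/* Inspect the leading bytes of a buffer and report which BOM, if any,
   is present. Bytes are examined in the order they were received. */
bom_kind detect_bom(const uint8_t *data, size_t len) {
    if (len >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
        return BOM_UTF8;        /* EF BB BF */
    if (len >= 2 && data[0] == 0xFE && data[1] == 0xFF)
        return BOM_UTF16_BE;    /* 0xFEFF with high-order byte first */
    if (len >= 2 && data[0] == 0xFF && data[1] == 0xFE)
        return BOM_UTF16_LE;    /* 0xFEFF with low-order byte first */
    return BOM_NONE;
}
```

    Because the check works on individual bytes, its result doesn't depend on the ordering of the machine running it.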

  • What are the pros and cons of little-endian and big-endian systems?
    Little-endian byte ordering seen when debugging. Source: Kholodov 2007.

    In little-endian systems, reading just the byte at the lowest address is enough to know if the number is odd or even. This may be an advantage for low-level processing. Big-endian systems have a similar advantage: the byte at the lowest address tells us if a signed integer is positive or negative.

    Typecasting (say from int16_t to int8_t) is easier in little-endian systems. Because of the simple relationship between address offset and byte number, little-endian can be easier for writing math routines.

    During low-level debugging, programmers can see bytes stored from low address to high address, in left-to-right ordering. Big-endian systems store in the same left-to-right order and this makes debugging easier. For the same reason, binary to decimal routines are easier.

    For the most part, programmers have to deal with both systems. Each system evolved separately and therefore it's hard to complain about not having a single system.

  • Is the concept of endianness applicable for instructions?

    Endianness is applicable for multi-byte numeric values. Instructions are not numeric values and therefore endianness is not relevant. However, an instruction may contain 16-bit integers, addresses or other values. The byte ordering of these parts is important.

    For example, 8051 has a LCALL instruction that stores the address of the next instruction on the stack. Address is pushed to stack in little-endian format. However, LJMP and LCALL instructions contain 16-bit addresses that are in big-endian format.

  • How can I check the endianness of my system?

    On Linux, the command lscpu can be used to find endianness. It's part of util-linux and may not be available on a Mac by default.

    Developers can also write a simple program to determine the endianness of the host machine. In a C program, for example, store two bytes in memory. Then use a short *ptr to point to the lower address. Dereference this pointer to obtain a short value, which will tell us if the machine is little-endian or big-endian.

  • Do systems differ in the ordering of bits within a byte?

    Bits within a byte are commonly numbered as Bit0 for the least significant bit and Bit7 for the most significant bit. Thus, bit numbering in a 32-bit integer will be left-to-right order in big-endian, and right-to-left in little-endian. However, some systems such as the OpenXC vehicle interface use the opposite numbering in which the least significant bit is Bit7. Note that in either case, the content remains the same, only the numbering is different.

    Danny Cohen in his classic paper on the subject of endianness notes some examples where bit numbering was inconsistent in early computer systems. For example, M68000 was big-endian but the bit numbering resembled little-endian.

    In digital interfaces, bit ordering matters. In Serial Peripheral Interface (SPI), this can be configured based on what both devices support. In I2C, most significant bit is sent first. In UART, either ordering is fine and must be configured correctly at both ends. If not, sending least significant bit first is usually assumed.
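
    When two ends of a link disagree on bit order, one side has to reverse the bits of each byte. The sketch below shows one straightforward way to do this in software; the name reverse_bits8 is hypothetical, and real peripherals often offer a configuration setting instead.

```c
#include <stdint.h>

/* Reverse the bit order within one byte: Bit0 becomes Bit7, and so on. */
uint8_t reverse_bits8(uint8_t b) {
    uint8_t r = 0;
    for (int i = 0; i < 8; i++) {
        r = (uint8_t)((r << 1) | (b & 1u));  /* move the lowest bit of b into r */
        b >>= 1;
    }
    return r;
}
```

    For example, 0xF1 (1111 0001) becomes 0x8F (1000 1111).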

Milestones

1970

PDP-11 released by DEC is probably the first computer to adopt little-endian ordering.

1980

The terms big-endian and little-endian are used for the first time in the context of byte ordering. The terms are inspired by Jonathan Swift's novel titled Gulliver's Travels.

1983

Byte order conversion functions htonl, ntohl, htons and ntohs are introduced in BSD 4.2 release.

1992

First samples of an ARM processor came out in 1985. It's only in 1992 with the release of ARM6 that the processor becomes bi-endian. Legacy big-endian is supported for both instructions and data. Otherwise, instructions are little-endian. Data is little-endian or big-endian as configured. While byte ordering can be configured in software for some ARM processors, for others such as ARM-Cortex M3, the order is determined by a configuration pin that's sampled at reset.

Sample Code

  • // Source: http://hackjutsu.com/2016/08/09/Network%20Byte%20Orders/
    // Accessed: 2019-04-25
     
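    // Program to determine the endianness of your machine
    #include <stdio.h>
    #include <stdlib.h>   // for exit()
    int main(int argc, char **argv) {
       (void)argc; (void)argv;   // command-line arguments are not used
       // The union lets us view the same storage as a short or as bytes
       union {
          short s;
          char c[sizeof(short)];
       }un;
       un.s = 0x0102;
       if (sizeof(short) == 2) {
          // c[0] is the byte at the lower address
          if (un.c[0] == 1 && un.c[1] == 2)
             printf("big-endian\n");
          else if (un.c[0] == 2 && un.c[1] == 1)
             printf("little-endian\n");
          else
             printf("unknown\n");
       }
       else {
          // sizeof yields size_t, which is printed with %zu
          printf("sizeof(short) = %zu\n", sizeof(short));
       }
       exit(0);
    }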
     

References

  1. ARM. 2007. "Cortex™-M3: Technical Reference Manual." Revision r1p1, June 13. Accessed 2019-04-27.
  2. ARM. 2009. "ARM1176JZ-S™: Technical Reference Manual." Revision r0p7, November 27. Accessed 2019-04-27.
  3. Arora, Himanshu. 2018. "Linux lscpu Command Tutorial for Beginners (5 Examples)." HowtoForge, March 01. Accessed 2019-04-27.
  4. Azad, Kalid. 2006. "Understanding Big and Little Endian Byte Order." Better Explained, September 19. Accessed 2019-04-25.
  5. Cohen, Danny. 1980. "On Holy Wars and a Plea for Peace." IEN 137, IETF, April 01. Accessed 2019-04-25.
  6. Doulos. 2019. "Byte Swapping using ARMv6 and ARMv7-A/R instructions." Accessed 2019-04-25.
  7. Edwards, Philip Eddie. 2004. "Memory." November 04. Updated by Ian Harries. Accessed 2019-04-25.
  8. Gillespy, Thurman and Alan Rowberg. 1993. "Radiological images on personal computers: Introduction and fundamental principles of digital images." Journal of Digital Imaging, vol. 6, no. 2, pp. 81-7, June. Accessed 2019-04-25.
  9. Grusin, Mike. 2013. "Serial Peripheral Interface (SPI)." SparkFun, January 14. Accessed 2019-04-27.
  10. Hackjutsu. 2016. "Network Byte Orders." Hackjutsu Dojo, August 09. Accessed 2019-04-25.
  11. Ippolito, Greg. 2017. "Endianness: Big and Little Endian Byte Order." Tutorial, YoLinux. Accessed 2020-07-20.
  12. Ishida, Richard. 2010. "The byte-order mark (BOM) in HTML." W3C, August 10. Updated 2013-01-31. Accessed 2020-08-28.
  13. Keil. 2004. "Appendix E: Byte Ordering." Cx51 User's Guide, Keil Embedded Development Tools, Keil, ARM. Accessed 2019-04-25.
  14. Kholodov, Igor. 2007. "Little Endian Example." CIS-77 Introduction to Computer Systems, Computer Information Systems Department, Bristol Community College. Accessed 2019-04-25.
  15. Kleinespel, Conrad. 2017. "Endianness and the coin toss." October 30. Accessed 2019-04-25.
  16. Lindblom, Jim. 2012. "Serial Communication." SparkFun, December 18. Accessed 2019-04-27.
  17. Nguyen, Uy. 2018. "Big Endian vs Little Endian." April 30. Accessed 2019-04-25.
  18. OpenBSD. 2019. "HTONL(3)." Library Functions Manual, OpenBSD, February 13. Accessed 2019-04-27.
  19. OpenXC Docs. 2017. "Bit Numbering and Byte Order." OpenXC Docs, Ford Motor Company, September 07. Accessed 2019-04-25.
  20. PCMag. 2019. "Definition of: byte order." Encyclopedia, PCMag. Accessed 2019-04-25.
  21. Poser, Bill. 2003. "The Origin of the Terms Big-Endian and Little-Endian." Linguistics 538: Computational Methods in Linguistic Research, The Department of Linguistics, University of Pennsylvania. Accessed 2019-04-27.
  22. Rubenstein, Dan. 2003. "Socket Programming." Computer Science, Columbia University. Accessed 2019-04-25.
  23. Saha, Dola. 2018. "Serial Communication: I2C, UART & USB." Cyber-Physical Systems, ICEN 553/453, University at Albany. Accessed 2019-04-27.
  24. Schwartz, David. 2019. "Endianness of instructions." Stackoverflow, April 26. Accessed 2019-04-27.
  25. The Centre for Computing History. 2019. "Digital Micro PDP-11/23." Accessed 2019-04-27.
  26. Unicode. 2017. "UTF-8, UTF-16, UTF-32 & BOM." Frequently Asked Questions, Unicode, June 27. Accessed 2019-04-25.
  27. Wikipedia. 2018. "History of the Berkeley Software Distribution." Wikipedia, December 31. Accessed 2019-04-27.
  28. Wikipedia. 2019b. "Comparison of instruction set architectures." Wikipedia, March 10. Accessed 2019-04-25.
  29. Wikipedia. 2019c. "ARM architecture." Wikipedia, April 26. Accessed 2019-04-27.
  30. bryanpkc. 2017. "Booting the Raspberry Pi into big-endian mode?" Raspberry Pi Forum, May 15. Accessed 2019-04-27.

Further Reading

  1. Azad, Kalid. 2006. "Understanding Big and Little Endian Byte Order." Better Explained, September 19. Accessed 2019-04-25.
  2. Cohen, Danny. 1980. "On Holy Wars and a Plea for Peace." IEN 137, IETF, April 01. Accessed 2019-04-25.
  3. Wikipedia. 2019a. "Endianness." Wikipedia, April 21. Accessed 2019-04-27.

Cite As

Devopedia. 2020. "Byte Ordering." Version 7, August 28. Accessed 2023-11-12. https://devopedia.org/byte-ordering
Contributed by
2 authors


Last updated on
2020-08-28 08:47:13