Byte Ordering

A byte (of 8 bits) can hold only 256 distinct values. A larger value must be stored in multiple bytes. For example, the number 753 is 0x02F1 in hexadecimal and needs at least two bytes of storage. These two bytes can be stored in memory in either order: 0x02 at the lower address followed by 0xF1, or 0xF1 at the lower address followed by 0x02.

Programs must conform to the byte ordering supported by the processor. Otherwise, 0x02F1 might be misinterpreted as 0xF102, which is 61698 in decimal. Byte ordering also matters when data is transferred across a network or between systems that use different orderings.
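The arithmetic behind the 753 versus 61698 example can be checked with a small sketch in C. The helper names `read_be16` and `read_le16` are illustrative, not from any library; each takes two bytes in address order and combines them under one of the two interpretations.

```c
#include <stdint.h>

/* Treat the byte at the lower address (b[0]) as the high-order byte:
   the big-endian interpretation. */
uint16_t read_be16(const uint8_t b[2]) {
    return (uint16_t)((b[0] << 8) | b[1]);
}

/* Treat the byte at the lower address (b[0]) as the low-order byte:
   the little-endian interpretation. */
uint16_t read_le16(const uint8_t b[2]) {
    return (uint16_t)(b[0] | (b[1] << 8));
}
```

Given the memory bytes {0x02, 0xF1}, `read_be16` yields 0x02F1 (753) while `read_le16` yields 0xF102 (61698): the same two bytes, two different numbers.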

Byte ordering is an attribute of the processor, not the operating system running on it.

Discussion

Which are the different byte orderings present in computing systems?
Comparing Little-Endian and Big-Endian ordering. Source: Hackjutsu 2016.
Two common orderings are:
- Little-Endian: Low-order byte is stored at a lower address. This is also called Intel order since Intel's x86 family of CPUs popularized this ordering. Intel, AMD, PDP-11 and VAX are little-endian systems.
- Big-Endian: High-order byte is stored at a lower address. This is also called Motorola order since Motorola's 68000 (68K) family used this ordering. Motorola 68K processors and IBM mainframes are big-endian systems.
Some processors support both orderings and are therefore called Bi-Endian. PowerPC and Itanium are bi-endian. Bi-Endian processors can switch between the two orderings. Most RISC architectures (SPARC, PowerPC, MIPS) were originally big-endian but are now configurable. While ARM processors are bi-endian, the default is to run them as little-endian systems, as seen in the Raspberry Pi.
Due to the popular adoption of x86-based systems (Intel, AMD, etc.) and ARM, little-endian systems have come to dominate the market.
Could you compare host-byte vs network-byte ordering?
Illustrating the use of htonl and ntohl functions. Source: Adapted from Rubenstein 2003, slide 20.
Computers with different byte orderings must still interpret exchanged data correctly, so the convention is to always send data on the network in big-endian format. This is called network-byte ordering. The ordering used on a particular computer is called host-byte ordering.
A host system may be little-endian, but when sending data into the network it must convert the data into big-endian format. Likewise, a little-endian machine must first convert network data into little-endian before processing it.
Four common functions perform these conversions: htonl and ntohl for 32-bit ("long") values, and htons and ntohs for 16-bit ("short") values.
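A minimal sketch in C of how these functions are typically used when preparing and parsing a wire buffer. It assumes a POSIX system where they are declared in `<arpa/inet.h>`; the helpers `put_net32`, `get_net32` and `port_to_net` are hypothetical names for illustration.

```c
#include <arpa/inet.h>  /* POSIX: htonl, htons, ntohl, ntohs */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: serialize a 32-bit value into a 4-byte buffer
   in network byte order (big-endian), e.g. before send(). */
void put_net32(uint8_t out[4], uint32_t host_value) {
    uint32_t net = htonl(host_value);   /* host -> network order */
    memcpy(out, &net, sizeof net);      /* copy the bytes as laid out in memory */
}

/* Hypothetical helper: parse a 4-byte network-order buffer back into
   a host-order value, e.g. after recv(). */
uint32_t get_net32(const uint8_t in[4]) {
    uint32_t net;
    memcpy(&net, in, sizeof net);
    return ntohl(net);                  /* network -> host order */
}

/* 16-bit values such as port numbers use htons/ntohs. */
uint16_t port_to_net(uint16_t port) {
    return htons(port);
}
```

After `put_net32(buf, 0x01020304)`, `buf` holds 01 02 03 04 on any host, because `htonl` reorders the bytes only where the host is little-endian.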
Even if your code doesn't do networking, data may be stored in a different ordering. For example, file data may be stored in big-endian while your machine is little-endian. In some cases, CPU instructions are available for conversion. Intel's x86 instruction set, including its 64-bit extension, has BSWAP to swap byte ordering. Since ARMv6, ARM provides REV to reverse byte order and SETEND to set the endianness of data accesses.
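For file formats with a fixed byte order, one portable approach is to assemble values with shifts, which operate on numeric values rather than on memory layout and so give the same result on any host. The sketch below assumes a big-endian 32-bit field; `load_be32` and `swap32` are illustrative names. GCC and Clang also offer `__builtin_bswap32`, and compilers may turn the plain-C swap below into a single BSWAP or REV instruction, though that is an optimization, not a guarantee.

```c
#include <stdint.h>

/* Read a 32-bit big-endian field (e.g. from a file header) out of a
   byte buffer. Works identically on little- and big-endian hosts. */
uint32_t load_be32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Reverse the byte order of a 32-bit value in plain C. */
uint32_t swap32(uint32_t x) {
    return ((x & 0x000000FFu) << 24) | ((x & 0x0000FF00u) << 8) |
           ((x & 0x00FF0000u) >> 8)  | ((x & 0xFF000000u) >> 24);
}
```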
Aren't conversions redundant for big-endian machines since they already conform to network-byte ordering?
It's good programming practice to invoke these conversions since this makes your code portable. The same codebase can be compiled for a little-endian machine and it will work. Calling the conversion functions on a big-endian machine has no effect.
Similarly, a little-endian system sending data to another little-endian system need not do any conversion, but converting anyway is good practice because it keeps the code portable to systems with a different ordering.
Are there situations where byte ordering doesn't matter?
Endianness in different encodings of Unicode. Source: Unicode 2017.
When data is stored and processed as a sequence of single bytes (not shorts or longs), then byte ordering doesn't matter. No conversions are required when receiving or sending such data into the network.
ASCII strings are stored as a sequence of single bytes, so byte ordering doesn't matter. For Unicode strings, byte ordering depends on the encoding used. If the encoding is UTF-8, ordering doesn't matter since the encoding is a sequence of single bytes. If the encoding is UTF-16, each code unit occupies two bytes, so byte ordering matters.
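To see why UTF-16 is order-sensitive, here is a sketch that writes a single UTF-16 code unit in either byte order. It handles only characters in the Basic Multilingual Plane (no surrogate pairs), and `put_utf16_unit` is a hypothetical helper. For "é" (U+00E9), UTF-8 is always the byte sequence C3 A9, whereas UTF-16 gives 00 E9 in big-endian and E9 00 in little-endian.

```c
#include <stdint.h>

/* Write one UTF-16 code unit (BMP only in this sketch) as two bytes,
   high byte first if big_endian is nonzero, low byte first otherwise. */
void put_utf16_unit(uint8_t out[2], uint16_t unit, int big_endian) {
    if (big_endian) {
        out[0] = (uint8_t)(unit >> 8);    /* high byte at lower address */
        out[1] = (uint8_t)(unit & 0xFF);
    } else {
        out[0] = (uint8_t)(unit & 0xFF);  /* low byte at lower address */
        out[1] = (uint8_t)(unit >> 8);
    }
}
```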
When storing TIFF images, byte ordering matters since pixels are stored as words. GIF and JPEG images don't care about byte ordering since storage is not word oriented.
What's the purpose of Byte Order Mark (BOM)?
BOM used with UTF-16 encoding. Source: Ishida 2010.
An alternative to using host-to-network and network-to-host conversions is to send the data along with an indication of the ordering being used. This ordering is indicated with two additional bytes, known as the Byte Order Mark (BOM).
The BOM could have any agreed-upon value but 0xFEFF is common. If a machine reads this as 0xFFFE, the sender's ordering differs from the machine's ordering and conversion is required before processing the data further.
BOM adds overhead. For example, sending two bytes of data incurs an overhead of two additional bytes. Problems can also arise if a program forgets to add the BOM or if the data payload happens to start with the BOM value by coincidence.
A BOM is not required for UTF-8, which is a sequence of single bytes. However, some editors such as Notepad on Windows may include a BOM (three bytes, EF BB BF) to indicate UTF-8 encoding.
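A reader can check the first bytes of a buffer to decide how to decode it. This is a minimal sketch under stated assumptions: the enum and `detect_bom` are hypothetical names, and UTF-32 BOMs (whose little-endian form also begins FF FE) are deliberately ignored.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical classification of a text buffer by its leading BOM. */
enum bom_kind { BOM_NONE, BOM_UTF8, BOM_UTF16_BE, BOM_UTF16_LE };

/* U+FEFF is FE FF when written big-endian and FF FE when written
   little-endian; in UTF-8 it is EF BB BF. *bom_len receives the number
   of bytes to skip before the actual text. */
enum bom_kind detect_bom(const uint8_t *buf, size_t len, size_t *bom_len) {
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF) {
        *bom_len = 3;
        return BOM_UTF8;
    }
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF) {
        *bom_len = 2;
        return BOM_UTF16_BE;
    }
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE) {
        *bom_len = 2;
        return BOM_UTF16_LE;
    }
    *bom_len = 0;
    return BOM_NONE;  /* no BOM: ordering must be agreed some other way */
}
```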
What are the pros and cons of little-endian and big-endian systems?
Little-endian byte ordering seen when debugging. Source: Kholodov 2007.
In little-endian systems, the byte at the lowest address is the least significant, so reading just that byte is enough to know if the number is odd or even. This may be an advantage for low-level processing. Big-endian systems have a similar advantage: the byte at the lowest address is the most significant and holds the sign bit, so it tells us whether a signed integer is negative.
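Both shortcuts read only the first byte in memory, regardless of the integer's width. A sketch, assuming two's-complement signed integers; `le_is_odd` and `be_is_negative` are illustrative names that take a pointer to the integer's first byte.

```c
#include <stdbool.h>
#include <stdint.h>

/* Little-endian: p[0] is the least significant byte, and its bit 0
   decides odd or even. */
bool le_is_odd(const uint8_t *p) {
    return (p[0] & 0x01) != 0;
}

/* Big-endian, two's complement: p[0] is the most significant byte,
   and its top bit is the sign bit. */
bool be_is_negative(const uint8_t *p) {
    return (p[0] & 0x80) != 0;
}
```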
Narrowing a value in memory (say from int16_t to int8_t) is easier in little-endian systems, since the low-order byte sits at the same address whatever the width. Because of this simple relationship between address offset and byte significance, little-endian can also be easier for writing math routines.
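The offset-to-significance relationship can be written as one loop: in little-endian storage, the byte at offset i carries weight 256^i. The sketch below uses a hypothetical `load_le` helper for widths of 1 to 8 bytes; reading the same address with a narrower width simply drops the high-order bytes, which is why narrowing needs no address adjustment.

```c
#include <stddef.h>
#include <stdint.h>

/* Read an unsigned little-endian integer of n bytes (1..8) starting at p.
   Byte i contributes p[i] * 256^i. */
uint64_t load_le(const uint8_t *p, size_t n) {
    uint64_t v = 0;
    for (size_t i = 0; i < n; i++) {
        v |= (uint64_t)p[i] << (8 * i);
    }
    return v;
}
```

For the bytes 2A 00 00 00, reading 1, 2 or 4 bytes from the same address all give 42.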
During low-level debugging, memory dumps show bytes from low address to high address, in left-to-right order. Big-endian systems store the most significant byte first, matching the left-to-right way we write numbers, which makes such dumps easier to read. For the same reason, binary-to-decimal routines are easier.
For the most part, programmers have to deal with both systems. Each system evolved separately and therefore it's hard to complain about not having a single system.
Is the concept of endianness applicable for instructions?
Endianness is applicable for multi-byte numeric values. Instructions are not numeric values and therefore endianness is not relevant. However, an instruction may contain 16-bit integers, addresses or other values. The byte ordering of these parts is important.
For example, the 8051 has an LCALL instruction that stores the address of the next instruction on the stack. The address is pushed onto the stack in little-endian format. However, the LJMP and LCALL instructions themselves contain 16-bit addresses in big-endian format.
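The two orderings in the 8051 example can be modelled in a few lines of C. This is a simplified model, not an emulator: `decode_long_target` and `push_return_address` are hypothetical names, and the `stack` array and `sp` variable stand in for internal RAM and the stack pointer. LCALL (opcode 0x12) and LJMP (opcode 0x02) encode the target as opcode, high byte, low byte. On a call, the 8051 increments SP and pushes the low byte of the return address first, then the high byte, so the low byte ends up at the lower address.

```c
#include <stdint.h>

/* Operand of LCALL/LJMP: bytes 1 and 2 hold the target address
   high byte first, i.e. big-endian within the instruction. */
uint16_t decode_long_target(const uint8_t insn[3]) {
    return (uint16_t)((insn[1] << 8) | insn[2]);
}

/* Simplified model of LCALL pushing the return address onto an
   upward-growing stack: low byte first, so it lands at the lower
   address (little-endian). */
void push_return_address(uint8_t stack[256], uint8_t *sp, uint16_t ret) {
    stack[++*sp] = (uint8_t)(ret & 0xFF);  /* low byte first */
    stack[++*sp] = (uint8_t)(ret >> 8);    /* then high byte */
}
```

With SP at its reset value of 0x07, an LCALL at address 0x0100 pushes return address 0x0103 as 03 at 0x08 and 01 at 0x09.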
How can I check the endianness of my system?
On Linux, the command lscpu reports the byte order. On macOS, which does not ship lscpu, the command sysctl hw.byteorder prints 1234 on a little-endian machine.
Developers can also write a simple program to determine the endianness of the host machine. In a C program, for example, store two bytes in memory. Then use a short *ptr to point to the lower address. Dereference this pointer to obtain a short value, which will tell us if the machine is little-endian or big-endian.
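A sketch of the same idea in C, with one adjustment: instead of reading a byte array through a `short *`, which breaks C's aliasing rules, it stores a known 16-bit value and inspects its first byte through an `unsigned char *`, which C permits. It also shows a compile-time alternative using the `__BYTE_ORDER__` and `__ORDER_LITTLE_ENDIAN__` macros predefined by GCC and Clang; these are compiler extensions, not ISO C. The names `host_is_little_endian` and `HOST_LE_AT_COMPILE_TIME` are hypothetical.

```c
#include <stdint.h>

/* Runtime check: returns 1 on a little-endian host, 0 on big-endian. */
int host_is_little_endian(void) {
    uint16_t probe = 0x0102;
    const unsigned char *p = (const unsigned char *)&probe;
    return p[0] == 0x02;  /* low-order byte at lowest address? */
}

/* Compile-time check where the compiler supports it; -1 means unknown. */
#if defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__)
#  define HOST_LE_AT_COMPILE_TIME (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)
#else
#  define HOST_LE_AT_COMPILE_TIME (-1)
#endif
```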
Do systems differ in the ordering of bits within a byte?
Bits within a byte are commonly numbered as Bit0 for the least significant bit and Bit7 for the most significant bit. Thus, bit numbering in a 32-bit integer will be left-to-right order in big-endian, and right-to-left in little-endian. However, some systems such as the OpenXC vehicle interface use the opposite numbering in which the least significant bit is Bit7. Note that in either case, the content remains the same, only the numbering is different.
Danny Cohen, in his classic paper on endianness, notes some examples where bit numbering was inconsistent in early computer systems. For example, the M68000 was big-endian but its bit numbering resembled little-endian.
In digital interfaces, bit ordering matters. In Serial Peripheral Interface (SPI), it can be configured based on what both devices support. In I2C, the most significant bit is sent first. In UART, either ordering is possible and must be configured identically at both ends; when unspecified, least significant bit first is the usual default.
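The difference between the two serial bit orders is simply which end of the byte goes out first. A sketch that lists the bits of one byte in transmission order; `wire_order` is a hypothetical helper, with `msb_first = 1` modelling I2C (or an MSB-first SPI setting) and `msb_first = 0` modelling the usual UART default.

```c
#include <stdint.h>

/* Fill bits[0..7] with the bits of `byte` in the order they are sent. */
void wire_order(uint8_t byte, int msb_first, uint8_t bits[8]) {
    for (int i = 0; i < 8; i++) {
        int bit = msb_first ? 7 - i : i;  /* bit index sent at step i */
        bits[i] = (byte >> bit) & 1u;
    }
}
```

For 0x01, an MSB-first link sends seven zeros and then a one, while an LSB-first link sends the one first. A receiver configured for the wrong order would read 0x80.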

Milestones

1970

The PDP-11, released by DEC, is probably the first computer to adopt little-endian ordering.

1980

The terms big-endian and little-endian are used for the first time in the context of byte ordering. The terms are inspired by Jonathan Swift's novel titled Gulliver's Travels.

1983

Byte order conversion functions htonl, ntohl, htons and ntohs are introduced in BSD 4.2 release.

1992

First samples of an ARM processor came out in 1985. It's only in 1992, with the release of ARM6, that the processor becomes bi-endian. Legacy big-endian is supported for both instructions and data. Otherwise, instructions are little-endian and data is little-endian or big-endian as configured. While byte ordering can be configured in software for some ARM processors, for others such as the ARM Cortex-M3, the order is determined by a configuration pin that's sampled at reset.

Sample Code

// Source: http://hackjutsu.com/2016/08/09/Network%20Byte%20Orders/
// Accessed: 2019-04-25
 
// Program to determine the endianness of your machine
#include <stdio.h>
#include <stdlib.h>   /* exit */
int main(int argc, char **argv) {
   union {
      short s;
      char c[sizeof(short)];
   }un;
   un.s = 0x0102;
   if (sizeof(short) == 2) {
      if (un.c[0] == 1 && un.c[1] == 2)
         printf("big-endian\n");
      else if (un.c[0] == 2 && un.c[1] == 1)
         printf("little-endian\n");
      else
         printf("unknown\n");
   }
   else {
      printf("sizeof(short) = %zu\n", sizeof(short));
   }
   exit(0);
}

; Source: https://www.doulos.com/knowhow/arm/Hints_and_Tips/Byte_Swapping/
; Accessed 2019-04-27
 
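; ARM assembly instructions can be used to change processor endianness
; when we need to convert large blocks of data.
; len is in bytes and is assumed to be a multiple of 16; otherwise the
; loop reads and writes past the end of the block.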
 
__asm void *be2le4w(uint32_t *blk, size_t len)
{
    CMP  r1, #0           ; cover trivial case
    BXEQ lr
 
    PUSH { r4, r5 }       ; set up stack
    MOV  r12, r0          ; save pointer
 
loop
    SETEND BE             ; switch to big endian
    LDM  r12, { r2-r5 }   ; and load four words
                          ;   from memory
    SETEND LE             ; switch to little endian
    STM  r12!, { r2-r5 }  ; store four words to memory 
                          ;   and advance pointer
    SUBS r1, r1, #16      ; decrement counter
                          ;   (16 bytes transferred)
    BGT  loop             ; finished?
 
exit
    POP  { r4, r5 }       ; restore registers
    BX   lr               ; return
}

// Source: https://uynguyen.github.io/2018/04/30/Big-Endian-vs-Little-Endian/
// Accessed: 2019-04-27
 
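// Illustrating byte order conversion functions from <libkern/OSByteOrder.h>,
// called here from Objective-C.
#import <Foundation/Foundation.h>
#import <libkern/OSByteOrder.h>

int main(void) {
    @autoreleasepool {
        // The source decodes @"001E653A" with a custom dataWithHexString:
        // category; Foundation has no such method, so the same four bytes
        // are built directly here.
        const uint8_t raw[] = { 0x00, 0x1E, 0x65, 0x3A };
        NSData *data = [NSData dataWithBytes:raw length:sizeof raw];
        const uint8_t *bytes = (const uint8_t *)data.bytes;

        /* Functions for loading little endian to host endianness. */
        uint16_t firstInLittle = OSReadLittleInt16(bytes, 0);  // 0x1E00 = 7680
        uint16_t secondInLittle = OSReadLittleInt16(bytes, 2); // 0x3A65 = 14949

        /* Functions for loading big endian to host endianness. */
        uint16_t firstInBig = OSReadBigInt16(bytes, 0);   // 0x001E = 30
        uint16_t secondInBig = OSReadBigInt16(bytes, 2);  // 0x653A = 25914

        /* Functions for storing host endianness to little endian. */
        uint8_t byte16[2];
        OSWriteLittleInt16(byte16, 0, firstInLittle);  // byte16 = [0x00, 0x1E]

        NSLog(@"%d %d %d %d", firstInLittle, secondInLittle, firstInBig, secondInBig);
    }
    return 0;
}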

References


Cite As

Devopedia. 2020. "Byte Ordering." Version 7, August 28. Accessed 2023-11-12. https://devopedia.org/byte-ordering

Contributed by
2 authors

Last updated on
2020-08-28 08:47:13
