Error checking routines

V0.01 21-Feb-03

1. Introduction

There are many circumstances when it is important to know that the data you are dealing with has no errors in. For example, if you are sending data to your robot over a radio link, if this data gets corrupted, you may end up steering left when you wanted to go right. I'm sure you can imagine worse consequences than this too! There are other occasions where data may be stored in EEPROM on board, or very important data may be in RAM, and it is essential that this data is confirmed to be correct before using it.

<p>There are two levels of protection that can be performed in these situations - error detection, and error correction. Just detecting that the data is corrupt may be sufficient. For example, if a radio data message is sent continuously from the transmitter to the robot, and one packet is determined to be corrupted, it may be OK just to throw it away and wait for the next packet to come along. However, if a message is only sent once, and it is detected to be corrupted, you may want to correct the errors locally.

 

2. Efficiency

Whether the errors are to be corrected or just detected, it is necessary to transmit extra information along with the data, so it can be compared. The simplest form may be to send every message twice, then if the two messages disagree, then an error has been detected.

This is quite an inefficient method though. To get N data bytes across the link, we had to transmit 2N bytes. That is an efficiency of 50%. In general, if we transmit N data bytes, and we need to send X more bytes to detect errors, then the efficiency is given by:

e = N / (N + X)

 

3. Error detection schemes

Sending everything twice appears to be a very inefficient method if all we want to do is determine whether an error occurred or not.

3.1. Parity

A more efficient scheme is built into many UARTs (Universal Asynchronous Receiver/Transmitters) which most microcontroller will have, and every PC will have. This is Parity Checking. This involves sending one extra bit at the end of every byte. The bit is set to 1 if there are an odd number of bits in the byte, and set to 0 if there are an even number. Therefore, including the parity bit itself, there will always be an even number of bits in every byte. For example:

Byte

+ Parity bit

Result

1 1 1 0 0 0 1 1

1

1 1 1 0 0 0 1 1 1

0 0 0 0 0 0 0 0

0

0 0 0 0 0 0 0 0 0

1 0 1 0 1 0 1 0

0

1 0 1 0 1 0 1 0 0

This scheme is called Even parity because it always ends up with an even number of 1s. There is another method called Odd parity, which is the same except the parity bit is 1 if there is an even number of bits in the original byte, and 0 otherwise. This would result in an odd number of 1s.

Parity checking is built into most UARTs, and is well worth using since it requires no extra work on your part. The UART will have a status register with a flag to indicate whether a parity error has occurred. Often, on micro- controllers, these can be made to cause an interrupt if a parity error occurs.

Assuming that 8-bit bytes are sent, the efficiency of parity checking is

e = 8 / (8 + 1) = 89%

 

3.2. Checksums

In addition to using parity checking, adding a checksum is fairly easy to implement. This involves adding every byte that is sent together, and sending the result after the last byte of data.

For example, assuming the data bytes to send are (in hexadecimal) 1A 3F 55 20, then adding all these together gives CE. Therefore the complete message will be 1A 3F 55 20 CE. At the receiving end, all but the final bytes are added together and checked against the final byte. If they are different, an error has occurred.

Note that the addition must be done by throwing away any carries into the next byte. For example (in hexadecimal), 90 + AC = 13C. However, we can only send bytes, so the 1 carried into the third column is thrown away, and the checksum is then 3C. However many data bytes are added together, the result is still just one byte wide. If you really wanted to, you could make the checksum 2 bytes wide - it is entirely up to you, but it will affect the efficiency. Adding and throwing away the carry like this is called modulo addition. The example above shows adding "modulo 256". This means that if the result ever gets larger than 256, then the carry is thrown away (256 is taken away from it), and the result will always be less than 256.

The efficiency of using checksums is:

e = N / (N + C)

where N is the number of data bytes sent, and C is the number of checksum bytes. For a 10 byte message with a single byte checksum, the efficiency is 91%.

Sometimes the checksum byte is negated, i.e. multiplied by –1 before being sent. This means that at the receiver, all the received bytes can be added together, then the result should equal zero. This makes coding at the receiver a little bit easier. Using the example data above:

Data = 1A 3F 55 20
Sum of all bytes = CE, so checksum = 32 (this is the two’s complement of CE)
Transmitted message = 1A 3F 55 20 32
At the receiver, adding all these bytes up (modulo 256) = 00

 

This function adds on a 1 byte checksum to a message.

/********************************************************/
/* AddChecksum                     */
/* Adds a checksum to a message             */
/* Parameters:                     */
/* Message – A null terminated string with room enough */
/*      for one more character – the checksum.  */
/********************************************************/
void AddChecksum(unsigned char *Message)
{
  unsigned char Checksum = 0x00;
  char *MessagePtr = Message;
 
  /* Go through the message adding the bytes up */
  while (*MessagePtr != 0)
    Checksum += *MessagePtr++;
 
  /* Now place the checksum and a new string */
  /* terminator at the end of the message. */
  *MessagePtr++ = Checksum;
  *MessagePtr = ‘\0’;
}

At the receiving end, the message is decoded…

/*********************************************************/
/* CheckMessage                     */
/* Check that the received message is error free     */
/* Parameters:                     */
/* Message – A null terminated string with the checksum */
/*      as the last byte before the terminator.  */
/* Returns:                       */
/* 0   Message checksum was correct         */
/* -1   Message checksum was wrong.         */
/*********************************************************/
int Checksum(unsigned char *Message)
{
  unsigned char Checksum = 0x00;
  char *MessagePtr = Message;
 
  /* Go through the message adding the bytes up */
  while (*MessagePtr != 0)
    Checksum += *MessagePtr++;
 
  /* Un-add the checksum byte, and compare with the checksum */
  Checksum –= *(MessagePtr-2);
  if (Checksum != *(MessagePtr-2))
    return –1;
  else
    return 0;
}

Here are functions that do the same thing, but use a negated checksum. Note that the receiving end is slightly simpler…

/********************************************************/
/* AddChecksum                     */
/* Adds a checksum to a mesaage             */
/* Parameters:                     */
/* Message – A null terminated string with room enough */
/*      for one more character – the checksum.  */
/********************************************************/
void AddChecksum(unsigned char *Message)
{
  unsigned char Checksum = 0x00;
  char *MessagePtr = Message;
 
  /* Go through the message adding the bytes up */
  while (*MessagePtr != 0)
    Checksum += *MessagePtr++;
 
  /* Now place the negated checksum and a new   */
  /* string terminator at the end of the message. */
  *MessagePtr++ = -Checksum;
  *MessagePtr = ‘\0’;
}

At the receiving end, the message is decoded…

/*********************************************************/
/* CheckMessage                     */
/* Check that the received message is error free     */
/* Parameters:                     */
/* Message – A null terminated string with the checksum */
/*      as the last byte before the terminator.  */
/* Returns:                       */
/* 0   Message checksum was correct         */
/* -1   Message checksum was wrong.         */
/*********************************************************/
int Checksum(unsigned char *Message)
{
  unsigned char Checksum = 0x00;
  char *MessagePtr = Message;
 
  /* Go through the message adding the bytes up */
  while (*MessagePtr != 0)
    Checksum += *MessagePtr++;
 
  /* Un-add the checksum byte, and compare with the checksum */
  if (Checksum != 0x00)
    return –1;
  else
    return 0;
}

3.3. Cyclic Redundancy Checking

The problem with checksums is that data is very often made up of long strings of exactly the same byte, often zeroes. However many zeroes are added together, the checksum answer will always be the same. It would be advantageous to have a function that comes up with a non-zero answer that also depends on the length of the data. Such a function is the CRC. There are sites on the web which explain exactly what it is and how it works, and here are some links to good ones.

http://www2.rad.com/networks/1994/err_con/crc.htm

http://www.4d.com/ACIDOC/CMU/CMU79909.HTM

http://www.semtech.com/support/crc-8_tutorial_pre.pdf

Ethernet packets have a 32-bit CRC. Many disk formats include a CRC at some level.

There are two basic forms of code that perform the calculation. The first does all the bit shifting and logical operations that the CRC defines. The second uses more program memory, but is a lot faster because it uses a lookup table to bypass a lot of the calculation. Many examples can be found on the web in addition the functions presented here.

/*********************************************************/
/* Crc8                         */
/* Calculates an 8-bit CRC from a data message using the */
/* CCITT-8 Polynomial, expressed as           */
/* X^8 + X^5 + X^4 + 1                 */
/* Parameters:                     */
/* DataPtr – Pointer to message.            */
/* Count - Length of the message.           */
/* Returns:                       */
/* Calculated CRC                   */
/*********************************************************/
unsigned char Crc8(const void* DataPtr, unsigned char Count)
{
  unsigned char Crc ;
  unsigned char Index ;
  unsigned char InputData ;
 
  Crc = CRC_SEED ;
 
  for ( ; Count > 0 ; Count--)
  {
    InputData = * (unsigned char*) DataPtr;
    for (Index = 0 ; Index < BITS_PER_BYTE ; Index++)
    {
      if ((InputData ^ Crc ) & 0x01)
      {
        /* shift and subtract poly */
        Crc = (( Crc ^ CRC_POLY ) >> 1 ) | 0x80 ;
      }
      else
      {
        /* transparent shift */
        Crc >>= 1 ;
      }
      InputData >>= 1 ;
    }
    /* move on to next byte */
    ((unsigned char*)DataPtr)++ ;
  }
  return Crc ;
}

Here’s a function that creates a 16-bit CRC. It is called for each byte that you want to add to an existing CRC. To use it, start with a CRC of zero, then call this function for each byte you want to add, and update the CRC with the result of the function.

/*********************************************************/
/* Crc                         */
/* Calculates a new 16-bit CRC from a new data byte and */
/* the previous CRC value.               */
/* X^8 + X^5 + X^4 + 1                 */
/* Parameters:                     */
/* data – next item of data to add.           */
/* accum – previous CRC value.             */
/* Returns:                       */
/* Updated CRC value                  */
/*********************************************************/
 
#define POLYNOMIAL 0x1021
unsigned int crc(unsigned int data, unsigned int accum)
{
  int i;
 
  data <<= 8;
  for (i = 0; i < 8; i++)
  {
    if ((data ^ accum) & 0x8000)
      accum = (accum << 1) ^ POLYNOMIAL;
    else
      accum <<= 1;
    data <<= 1;
  }
  return accum;
}

A site with plenty of implementation details can be found here: http://www.repairfaq.org/filipg/LINK/F_crc_v35.html

The efficiency of a CRC protected message is the same as that for the checksum…

e = N / (N + C)

where N is the number of bytes sent, and C is the number of bytes of the CRC.

4. Error correction schemes

To correct an error requires a bit more complexity. The simplest form is similar to the simplest form of detection scheme I mentioned above – send the message three times over. Then if there are any discrepancies, just take a 2 – 1 vote over what the bit value in question should be. The efficiency of this scheme is obviously only 33% which is a bit low. We can get better than this though

4.1. Hamming codes

This involves determining the parity of bytes, but in a special way. Say we wanted to send the same message as in section 3.2: 1A 3F 55 20. First, since parity deals with bits, we must write this message down at bit level:

 
1A =
3F =
55 =
20 =
Par =
Bytes
0 0 0 1 1 0 1 0
0 0 1 1 1 1 1 1
0 1 0 1 0 1 0 1
0 0 1 0 0 0 0 0
0 1 0 1 0 0 0 0
Even Parity
1
0
0
1
<- Even parity
 

Here, we have calculated the even parity for each byte, and also calculated the even parity for each column of bits. The result is 32 bits (4 bytes) of data, and 12 bits of parity information. This can be sent as a 6 byte message:

1A 3F 55 20 50 09

where the fifth byte is the column parity, and the sixth is the row parity. At the receiving end, the same calculations can be performed, and the parities checked. If they all match, then there is no problem.

But what happens if one bit is in error? Let’s invert the second highest bit of the second byte, changing 3F to 7F:

 
1A =
3F =
55 =
20 =
Par =
Bytes
0 0 0 1 1 0 1 0
0 0 1 1 1 1 1 1
0 1 0 1 0 1 0 1
0 0 1 0 0 0 0 0
0 1 0 1 0 0 0 0
Even Parity
1
0
0
1
<- Even parity
 

The bit in error, and the parity bits that now wrong are shown in red. Because we know which row and which column the faulty bit is in, we know the exact bit that is in error. Because we’re dealing with binary numbers, we also know what the correct bit should be, so we can fix it at the receiver.

Functions for performing the encoding and decoding are shown below. These are fixed for 8 byte messages, and produce two error correcting bytes.

Two ancillary functions are required, for calculating the number of bits set in a byte, and generating the even parity bit:

/********************************************************/
/* BitCount                       */
/* Counts number of bits set in byte.         */
/********************************************************/
int BitCount(unsigned char Byte)
{
  int Count = 0;
  unsigned char Mask = 0x80;
 
  /* Check each bit from the MSB to LSB */
  for (Mask = 0x80 ; Mask != 0x00 ; Mask >>= 1)
    if ((Byte & Mask) != 0)
      Count++;
 
  return Count;
}
 
/********************************************************/
/* EvenParity                     */
/* Return 1 if even parity, else 0.           */
/********************************************************/
unsigned char EvenParity(unsigned char Byte)
{
  /* If there is an odd number of 1s... */
  if (BitCount(Byte) & 0x01)
    return 1;
  else
    return 0;
}

This is the encoding (transmitter side) function:

/********************************************************/
/* AddHammingBytes                   */
/* Adds a checksum to a message             */
/* Parameters:                     */
/* Message - A null terminated string with room enough */
/*      for two more characters.         */
/********************************************************/
 
void AddHammingBytes(unsigned char *Message)
{
  unsigned char RowParity = 0x00;
  unsigned char ColParity = 0x00;
  unsigned char Temp = 0x00;
  unsigned char Mask;
  char i,j;
 
  /* Calculate the row parity */
  for (i=0 ; i<8 ; i++)
  {
    /* Move parity bit into LSB */
    RowParity = (RowParity << 1) | EvenParity(Message[i]);
  }
 
  /* Calculate the column parity */
  Mask = 0x80;
  for (j=0 ; j<8 ; j++)
  {
    for (i=0 ; i<8 ; i++)
    {
      /* Add modulo 1 (XOR) bits into appropriate ColParity bit */
      if ((Message[i] & Mask) > 0)
        ColParity = ColParity ^ Mask;
    }
    Mask = Mask >> 1;
  }
 
  Message[8] = ColParity;
  Message[9] = RowParity;
  Message[10] = '\0';
}

And here is the receiving side function:

/**********************************************************/
/* CorrectBytes                     */
/* Corrects single bit errors from message with 2 Hamming */
/* bytes appended.                    */
/* Parameters:                      */
/* Message - A null terminated string with room enough */
/*      for two more characters.          */
/* Returns:                       */
/* 1 - Message was OK                  */
/* 0 - Message was in error but has been fixed.     */
/* -1 - Message was in error but can't fix it.      */
/**********************************************************/
 
#define MESSAGE_OK      1
#define MESSAGE_FIXED     0
#define MESSAGE_UNFIXABLE   -1
 
int CorrectBytes(unsigned char *Message)
{
  unsigned char RowParity = 0x00;
  unsigned char ColParity = 0x00;
  unsigned char Temp = 0x00;
  unsigned char Mask;
  unsigned char RowError = 0;
  char i,j;
 
  /* First we calculate the row and column parities in the same */
  /* was as the encoding function, and see if they are correct. */
 
  /* Calculate the row parity */
  for (i=0 ; i<8 ; i++)
  {
    /* Move parity bit into LSB */
    RowParity = (RowParity << 1) | EvenParity(Message[i]);
  }
 
  /* Calculate the column parity */
  Mask = 0x80;
  for (j=0 ; j<8 ; j++)
  {
    for (i=0 ; i<8 ; i++)
    {
      /* Add modulo 1 (XOR) bits into appropriate ColParity bit */
      if ((Message[i] & Mask) > 0)
        ColParity = ColParity ^ Mask;
    }
    Mask = Mask >> 1;
  }
 
  if (Message[8] != ColParity ||
    Message[9] != RowParity)
  {
    /* There has been an error */
 
    /* Set the bits where the discrepancies occur. */
    ColParity = ColParity ^ Message[8];
    RowParity = RowParity ^ Message[9];
 
    /* If there are more than one bit set in either of these, */
    /* then we can't fix the error.             */
    if (BitCount(ColParity) > 1 || BitCount(RowParity) > 1)
      return MESSAGE_UNFIXABLE;
 
    /* If one of the row or column parity bytes is zero, the error */
    /* must have been in the parity byte, so just ignore it.   */
    if (BitCount(ColParity) == 0 || BitCount(RowParity) == 0)
      return MESSAGE_FIXED;
 
    /* Get the integer number of the row in error */
    while (RowParity != 0x80)
    {
      RowError++;
      RowParity <<= 1;
    }
 
    /* And invert the bit that is in error */
    Message[RowError] ^= ColParity;
 
    /* Error fixed! */
    return MESSAGE_FIXED;
  }
  else
    /* No errors in original message */
    return MESSAGE_OK;
}

A more in-depth description of Hamming codes can be found on the following sites:

http://www2.rad.com/networks/1994/err_con/hamming.htm

5. Links

ePanorama has many links to error detection and correction articles:

http://www.epanorama.net/links/tele_datacom.html#ecc