This is going to be another one of my “selfish” posts – written primarily for me to refer back to in the future and not because I believe it will benefit anyone other than me. The idea is one that I always took for granted but had a hard time proving to myself once I decided to try.
Theorem: Suppose we have an M bit unsigned binary integer with value A. Consider the first (least significant) N bits with value B. Then:

$$A \equiv B \pmod{2^N}$$
Put another way, arithmetic with unsigned binary integers of a fixed length N is always performed modulo $2^N$.
This is actually true regardless of the radix, but we’ll focus solely on binary numbers since the underlying reason for this post is to study binary representations of integers in digital systems. If our system represents integers using N bits, and a mathematical operation causes an overflow with a result needing M bits, then that answer is congruent to the N bit answer (with overflow discarded) modulo $2^N$.
A simple example may help to illustrate this. Let’s add two 4 bit integers: 7 and 13. A 4 bit number can encode $2^4$, or 16, integers, 0 through 15. So we won’t have any problem representing the two addends, but we know that the sum, 20, exceeds what we can represent with 4 bits.

$$\begin{array}{r} 0111 \\ +\ 1101 \\ \hline 10100 \end{array}$$

We see that if we could extend our answer to 5 bits, we’d get the correct sum of 20 ($10100_2$). Because we are limiting ourselves to 4 bits, we discard the overflow bit produced by the carry out of the most significant (left-most) bits, and are left with $0100_2$, a value of 4.
But to my point, 20 is congruent to 4 modulo 16 ($20 \equiv 4 \pmod{16}$), and, again, in general, arithmetic operations on unsigned binary integers of fixed length N are always performed modulo $2^N$. Now we’ll take a look at why.
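The 4 bit example is easy to reproduce in Python. This is only a sketch: Python integers have arbitrary precision, so I simulate the fixed width with a mask, and the names `WIDTH`, `MASK` and `truncate` are my own inventions, not part of any library.

```python
WIDTH = 4                    # N: number of bits we keep (chosen to match the example)
MASK = (1 << WIDTH) - 1      # 0b1111, selects the N least significant bits

def truncate(value: int) -> int:
    """Discard every bit above the lowest WIDTH bits (hypothetical helper)."""
    return value & MASK

full_sum = 7 + 13            # 20, which needs 5 bits
kept = truncate(full_sum)    # what a 4 bit register would be left holding

# Show the 5 bit sum next to the 4 bit result, then compare with the modulo.
print(format(full_sum, "05b"), "->", format(kept, "04b"))
print(kept == full_sum % (1 << WIDTH))
```

The same comparison works for any other operation that overflows, for example `truncate(13 * 11)` against `(13 * 11) % 16`.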
Suppose we have a binary integer with length M and value A. The first (least significant) N bits have a value B. We want to prove:

$$A \equiv B \pmod{2^N}$$
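Let’s start by defining the difference between A and B as C. So, $A - C = B$, or equivalently, $B + C = A$. So,

$$A \bmod 2^N = (B + C) \bmod 2^N$$

The modular addition rule tells us that:

$$(B + C) \bmod 2^N = \left((B \bmod 2^N) + (C \bmod 2^N)\right) \bmod 2^N$$

Substituting $A$ for $B + C$ on the left-hand side gives us:

$$A \bmod 2^N = \left((B \bmod 2^N) + (C \bmod 2^N)\right) \bmod 2^N$$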
Because B is the value of the first N bits, B must be less than $2^N$, and therefore $B \bmod 2^N = B$.
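Since C is the value of bits N+1 through M (numbering the bits from 1 at the least significant end), we can define it as the following summation:

$$C = \sum_{i=N+1}^{M} j_i \, 2^{i-1}$$

where $j_i$ is the value of the $i$th bit, either 0 or 1.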
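To make the split concrete, here is a small Python sketch that pulls B and C out of one example value. The value `0b110110101`, the widths `M = 9` and `N = 4`, and the variable names are arbitrary choices of mine for illustration; bits are numbered from 1 at the least significant end, so bit i carries a weight of 2 to the power i - 1.

```python
M, N = 9, 4                  # example widths (arbitrary)
A = 0b110110101              # an example M bit value

# B: value of bits 1..N, taken with a mask
B = A & ((1 << N) - 1)

# C: value of bits N+1..M, built term by term from the individual bits
C = sum(((A >> (i - 1)) & 1) * 2 ** (i - 1) for i in range(N + 1, M + 1))

print(B, C, B + C == A)
```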
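To determine the value of $C \bmod 2^N$ we need to divide that summation by $2^N$ and calculate the remainder.

$$\frac{C}{2^N} = \sum_{i=N+1}^{M} j_i \, 2^{i-1-N}$$

Every exponent $i - 1 - N$ is at least 0 because $i \geq N + 1$, so each term is a whole number. Since $\sum_{i=N+1}^{M} j_i \, 2^{i-1-N}$ is therefore a whole number, the remainder is 0, and $C \bmod 2^N = 0$. Substituting B and C back in to our original equation gives us:

$$A \bmod 2^N = (B + 0) \bmod 2^N = B \bmod 2^N$$

Which is the definition of congruence. So, restating the equation as a congruence relation:

$$A \equiv B \pmod{2^N}$$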
And with that, our proof is complete.
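As a sanity check, and not a substitute for the proof, the claim can be brute-forced for small widths. The sketch below assumes an upper bound of 10 bits (`MAX_BITS`, an arbitrary choice to keep the loop small) and uses a helper name, `check_theorem`, that I made up.

```python
MAX_BITS = 10  # arbitrary bound, keeps the loop small

def check_theorem(max_bits: int) -> bool:
    """Return True if A mod 2^N equals the low N bits of A in every case tried."""
    for m in range(1, max_bits + 1):
        for a in range(1 << m):              # every M bit value A
            for n in range(1, m + 1):        # every N up to M
                b = a & ((1 << n) - 1)       # B: the low N bits
                c = a - b                    # C: whatever the high bits contribute
                # C must be a multiple of 2^N, and A mod 2^N must equal B
                if c % (1 << n) != 0 or a % (1 << n) != b:
                    return False
    return True

print(check_theorem(MAX_BITS))
```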