You also said "If you take 24 consecutive samples and divide by 6, you'll practically get an average of six 11 bit samples multiplied by two (or 4 averaged 10 bit results added together, which increases the resolution by 1 bit)." First, the 24x O/S satisfies the minimum O/S requirement of 4^2 = 16 samples for 2 extra bits. The /6 removes the superfluous bits. Secondly, "the average of six 11 bit samples multiplied by two", done in the manner of 24x O/S then /6, also satisfies the minimum O/S requirement of 4x per extra bit, i.e. the 6x oversampling of the 11-bit value allows for the 12 bits of information. The "multiplied by two" moves the extra bit out of the fractional part of the number before it is truncated to an integer.

I said: "average of six 12 bit samples (using 10 bit ADC) you'll need to take 96 samples, add them, and then divide by 24".
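
To check the numbers being argued over, here is a rough sketch in Python (used only for the arithmetic, not as firmware). It assumes a 10-bit ADC with a full-scale reading of 1023, and `decimate` is a made-up helper name, not from any library.

```python
ADC_BITS = 10                   # assumed 10-bit ADC, as in the discussion
ADC_MAX = (1 << ADC_BITS) - 1   # full-scale reading: 1023


def decimate(samples, divisor):
    """Sum the raw samples, then integer-divide to drop the superfluous bits."""
    return sum(samples) // divisor


# Worst case: every sample sits at full scale.
full_scale_24 = decimate([ADC_MAX] * 24, 6)    # 24 samples, then /6
full_scale_96 = decimate([ADC_MAX] * 96, 24)   # 96 samples, then /24

print(full_scale_24, full_scale_24.bit_length())  # 4092 12
print(full_scale_96, full_scale_96.bit_length())  # 4092 12
```

Both recipes leave a net gain of 4 (24/6 = 96/24 = 4), so the result lands on a 12-bit scale either way; what differs is how many raw samples go into each output value.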
Perhaps the paper is misleading. Shifting right by 2 bits is exactly equivalent to integer division by 4 (for unsigned values); the author doesn't seem to know the difference between dividing and dividing.
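
The shift/divide equivalence is easy to confirm by brute force. This is a minimal sketch in Python, assuming the accumulator holds the sum of 16 samples from a 10-bit ADC (so at most 16 * 1023); the name `MAX_SUM` is only for illustration.

```python
# Largest possible sum of 16 samples from an assumed 10-bit ADC.
MAX_SUM = 16 * 1023

# Collect every accumulator value where ">> 2" and "// 4" disagree.
mismatches = [s for s in range(MAX_SUM + 1) if (s >> 2) != (s // 4)]

print(len(mismatches))  # 0
```

This covers non-negative sums only, which is what an ADC accumulator holds. In C the same holds for unsigned types; for negative signed values, division truncates toward zero while the result of a right shift is implementation-defined, so the two can differ there.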