Er's TH algorithm can be made faster
What slows down the Er's TH algorithm on some platforms is the use of the remainder operator %. Er uses % for cyclic rotations of three integers 1,2,3 right or left. The left cyclic rotation replaces j=1,2,3 by l=3,1,2, and the right cyclic rotation j=1,2,3 by r=2,3,1. The original Er's code uses the following expressions:
l = (j-1)%3 r = (j+1)%3These can be replaced by bit operations which are faster (much faster for some compilers):
l = ~j>>(j&1)&3 r = ++j>>(j>>2<<1)One can come up with other expressions giving identical results, such as:
r = ++j&3|j>>2 r = j+1&3|j-1>>1 l = (~j>>1&3)-(~j&1) l = j-1|(j+6&6)>>1But these are mostly more complex (and thus slower). Can you come up with something simpler?
Using these replacements, one gets the following modified Er's code:
Here are CPU times (in seconds) for n=29 for the "bare bone" versions of
these TH codes (with commented out lines containing printf function
exactly as in the files contained in the above zip file) for four different platforms:
Algorithm DEC Alpha
UNIX 300 MHz
Cygwin on NT, Pentium II MMX 266 MHz; gcc DLL 1.1.6 B20.1 B20.1/M.Khan Er.c 73.7 45.0 38.8 39.8 ErB.c 42.5 30.7 30.6 31.4 algo2.c 39.6 35.6 40.4 40.1 algo3.c 37.2 40.5 41.0 40.8 algo4.c 30.0 33.9 37.4 37.5 hBits00.c 5.4 4.16 4.14 6.18 hBits0.c 1.3 4.13 4.13 6.19 hBits.c 26.8 18.9 18.6 22.3 hBitsVis.c 85.9 62.6 73.9 73.0
DEC: UNIX OSF1 V 4.0; DEC C compiler (cc -O).
Cygwin (all on the same MS Windows NT 4.0 computer; gcc -s -O):
- DLL 1.1.6 is the most latest release;
- B20.1 is just over a year old;
- in the last column, Cygwin B20.1's gcc was replaced by
Mumit Khan's egcs-1.1.1 gcc compiler.
This seems to be more about testing the performance of C compilers than how fast the above TH algorithms are!