
The shortest and "mysterious" TH algorithm



M. Kolar
December 5, 2000; updated March 3, 2001
All the information
about a move in the minimumnumberofmoves TH solution is hidden in
the binary representation of the number of that move!
From this representation one can determine the starting
and the destination pegs (poles), which disk is being moved, and
the configuration of all three pegs after the move.
Starting and destination pegs
Subject: Tower of Hanoi
Date: Sat, 18 Nov 2000 14:07:10 0500
From: Glenn <rhoads@paul.rutgers.edu>

I noticed your Tower of Hanoi page ...
I think the following solution is about
the shortest and fastest solution around.
It is based on treating the move # as
a bit string and uses bit operations to extract
the information in a surprisingly
elegant and mysterious manner:
/* Start post is post 0. */
/* If # posts even, then final post is post 1, else is post 2 */
#include <stdio.h>
#include <stdlib.h>
void main(void)
{
int n, x;
printf( "How many disks? " );
scanf( "%d", &n );
puts( "\n\n" );
for (x=1; x < (1 << n); x++)
printf( "move from pole %i to pole %i.\n",
(x&x1)%3, ((xx1)+1)%3 );
}


Date: Wed, 22 Nov 2000 00:14:17 0500
From: Glenn <rhoads@paul.rutgers.edu>

Unfortunately, I don't know the author.
I found it on the web quite some time ago.
The original version counted the moves down and the
starting post varied depending
on whether the number of disks was even or odd.
I figured out how it worked and
noticed that by counting the moves up,
you could keep the starting post fixed and
then the destination post would vary 
to me it seemed more aesthetic that way..

Glenn C. Rhoads
Speed
This algorithm is definitely fast (about 5.5 to 9 times faster than my
Algorithm 4 for the platforms mentioned below).
For the purpose of comparing CPU times of the "bare" algorithms,
I have removed the printout of moves as before (otherwise the CPU
time is about the same for all algorithms with the same printf statements):
void main(void)
{
int n, x, to, fr;
...
for (x=1; x < (1 << n); x++) {
fr=(x&x1)%3;
to=((xx1)+1)%3;
/* printf( "move from pole %i to pole %i.\n", to, fr ); */
}
}
hBits00.c


The calculated peg numbers had to be assigned to something
(fr and to), otherwise the two expressions
would not be performed at all for some compilers (with O flag).
The remainder operator (%) is very slow for some compilers.
i%3 can be replaced by (i+i/3)&3 which gives the same result:
void main(void)
{
int n, x, i, to, fr;
...
for (x=1; x < (1 << n); x++) {
i=x&x1; fr=(i+i/3)&3;
i=(xx1)+1; to=(i+i/3)&3;
/* printf( "move from pole %i to pole %i.\n", to, fr ); */
}
}
hBits0.c


This resulted in four times faster code for DEC C compiler (UNIX OSF1 V 4),
but no noticeable speedup for the gcc (in Cygwin on Windows NT).
See some more discussion in the comments of the file hBits.c.
Number of the disk being moved
To some extent the above code if fast because it provides less information about each
move than other algorithms. What can be added easily is the number of the
disk being moved. It is by 1 larger than the number of the rightmost
consecutive "0" bits of x. This can be proved using recursion.
In the final code below, the number of the moved disk is calculated by
an additional for loop, and is placed in variable j:
main(int argc, char *argv[])
{
int n, x, fr, to;
unsigned int i, j;
n = atoi(argv[1]);
for(x=1; x < (1 << n); x++) {
i=x&x1; fr=(i+i/3)&3;
i=(xx1)+1; to=(i+i/3)&3;
for(i=x, j=1; ; i>>=1, j++) { if(i&1) break; }
/* printf( "move disc %i from %i to %i\n", j, fr, to ); */
}
}
hBits.c


The inclusion of this additional loop slows down
the code considerably. (Dec. 12: This additional inner loop can
also be written in a more compact way.)
And at the same time, this code still does not provide the full configuration
of disks on all three pegs after each move.
See some more discussion in the comments of the file hBits.c.
We can get more: the distribution of disks among the pegs
The above code gives information about the position of one disk (the
one that was moved) after each move. From the bits of the same move
number x, one can also get the information
about the complete configuration of disks on all three pegs after each
move. That is, without having to do all the moves from the beginning?
This has been known for at least 19 years.
The code to extract this configuration from x is rather complicated. If one is interested in the whole
minimum solution, the following "hybrid" approach gives faster code:
use three more integers (p[0], p[1] and p[2] below) to store in their
bits the current disk configuration on each peg. It is easier to
store the disks "upsidedown"  the rightmost bit used to indicate
the presence of the uppermost disk, etc. The presence of a disk on
a peg corresponds to 1, absence to 0.
Using the information obtained from x, one has to move
a bit between two pegs in each move, and we have to do all
the moves from the beginning to get the current configuration:
#include <stdio.h>
#include <stdlib.h>
#define PR (void)printf(
main(int argc, char *argv[])
{
int n, x, fr, to;
unsigned int i, j, p[3];
n = atoi(argv[1]);
p[0]=1; p[0]<<=n; p[0];
p[1]=p[2]=0;
/* Print out the initial configuration: */
PR"\n");
for(i=n;i>0;i) PR" %d",i);
PR"\n\n\n");
for(x=1; x < (1 << n); x++) {
i=x&x1; fr=(i+i/3)&3;
i=(xx1)+1; to=(i+i/3)&3;
for(i=x, j=1; ; i>>=1, j<<=1) { if(i&1) break; }
/* Move the bit: */
p[fr] &= ~j;
p[to] = j;
/* Print out the configuration after move x: */
PR"\n"); j=1; j<<=n;
for(i=n;i>0;i) { j>>=1; if(j&p[0]) PR" %d",i); }
PR"\n"); j=1; j<<=n;
for(i=n;i>0;i) { j>>=1; if(j&p[1]) PR" %d",i); }
PR"\n"); j=1; j<<=n;
for(i=n;i>0;i) { j>>=1; if(j&p[2]) PR" %d",i); }
PR"\n");
}
}
hBitsVis.c


Now j is not the number of the disk being moved, it is a
power of 2 containing a single bit representing the disk being moved.
Just adding one more bit shift in the inner for loop (with the whole
output block removed) slows down the code
3.2 to 4 times! If one also performs all the operations needed to
generate the printout, and removes only the calls to printf (PR),
there is a further 3.5fold increase in CPU time (from 85.9 to 298.8
sec for the DEC machine  not shown in the above table).
Download the code
C code for the above bitbased algorithms is in files hBits*.c
(with the printout commented out) in this zip archive
(hBits00.c and hBits0.c can easily be made by editing
hBits.c).