Take any natural number n. If n is even, divide it by 2 to get n / 2, if n is odd multiply it by 3 and add 1 to obtain 3n + 1. Repeat the process indefinitely. The conjecture is that no matter what number you start with, you will always eventually reach 1.
The Euler problem to be resolved is phrased as follows:
The following iterative sequence is defined for the set of positive integers:
n -> n/2 (n is even)
n -> 3n + 1 (n is odd)
Using the rule above and starting with 13, we generate the following sequence:
13 -> 40 -> 20 -> 10 -> 5 -> 16 -> 8 -> 4 -> 2 -> 1
It can be seen that this sequence (starting at 13 and finishing at 1) contains 10 terms. Although it has not been proved yet (Collatz Problem), it is thought that all starting numbers finish at 1.
Which starting number, under one million, produces the longest chain?
As with many other Project Euler problems, we are requested to generate a sequence (in this case a sequence of "Collatz") and to single out one item matching a certain criteria. In this particular case, we need to create Collatz sequences for all numbers under 1 million and find out the longest one.
The basic algorithm has the following shape:
Using this recursive function, we could easily generate Collatz sequences for arbitrarily large numbers. But the performance would soon degrade when we start building sequence for numbers bigger than about 10 thousands.
In the process of generating all the sequences from 1 to 1 million, however, it’s easy to see that for each sequence that we build, there are a lot of subsequences that are identical, which are being calculated over and over.
Consider for example the two sequences having 5 and 24 as starting numbers respectively:
5 -> 16 -> 8 -> 4 -> 2 -> 1
24 -> 12 -> 6 -> 3 -> 10 -> 5 -> 16 -> 8 -> 4 -> 2 -> 1
As you can see, after a few iterations the sequence starting from 24 converges to the sequence starting at 5, which we have already calculated! (see this image from Wikipedia for a pictorial representation of this behavior)
As you would have guessed by now, the key to solve this problem is precisely caching the subsequences that are being calculated, so that the partial results can be reused for subsequent iterations.
So, with these considerations in mind, we can have a look at the final solution:
As you can see, the basic pattern of the algorithm is still the same. However, we have added a cache to store partial results which dramatically improve the performances (this version resolve the problem on my laptop in about 7 seconds).
The last function
max_sequence_lengths is the main loop generating all the sequences,
calculating the lengths and picking the sequence with the highest length.
Caching partial results is a common theme in functional programming, and is called memoization. The example provided above is not one of the cleanest examples of memoization. It turns out that combining memoization and tail recursion is a tricky problem.