Learning Haskell Through Google Code Jam

Introduction

Over the last year or so, I’ve been playing around with functional programming. As the first few lines of the Wikipedia page suggest, functional programming is all about expressing a computation or algorithm as the composition of functions rather than using a state that changes over time. From what I’ve understood so far, functional programming is based on lambda calculus which is an alternative but equivalent formulation of the famous Turing Machine that most modern computers are based on.

Once you start reading up on functional programming, the language that is most often recommended is Haskell. I got interested in functional programming and Haskell in the first place because I was attracted by its more mathematical appearance. A lot of the code that I looked at seemed to express ideas much more clearly and (in my opinion) more beautifully. However, Learning Haskell was not easy at all. As someone who’s been programming using imperative languages all my life, I found it unusually difficult to think about solving problems using the style that Haskell (and functional programming) imposed. Near the beginning of my journey to learn the language, there were quite a lot of stops and restarts and many instances where I questioned whether it was even useful to learn the language. I found it so difficult to express ideas that were, in my head simple to express in a language like Python. I also found writing programs that required I/O quite difficult. However, the aforementioned clarity and beauty that I found in the way functional programming expressed ideas stuck with me and I kept coming back to it. after nearly a year, things fell into place and I started writing code that was actually useful (i.e. interacted with the world through I/O). I feel like I’m finally at a point where I can program well enough in the language that I can solve problems without struggling too much with I/O. So, when I got the email notifying me about this year’s Google Code Jam I thought I’d participate and try to use Haskell as much as possible!

Google Code Jam

I’ve known about Google Code Jam for quite a while. I’ve even tried participating once before near the beginning of my undergraduate degree. Being a novice at the time, I did not even make it past the qualifying round. Since then I’ve been programming on a regular basis for nearly six years and it’s paid off. This time, I made it past the qualifying round quite comfortably. The main challenge before me was my use of a programming language that I hadn’t fully mastered yet. The added challlenge did make solving the problems a lot more fun however. Along the way, I was able to take advantage of some functional programming patterns and I thought that it might be valuable to write an article about how these patterns can be useful.

Problem 1 - Foregone Solution

To read the full problem description go here.

Summary of Problem Statement

The input for each testcase is a number N which when expressed in base 10 may or may not contain the digit 4. The goal is to split up the number N into 2 numbers A and B so that A + B = N and neither A nor B contain the digit 4.

Analysis

A brute force approach to the solution would search through all possible sums of two integers between 0 and N/2. Of course, this solution would scale as $O(N^2)$ and would not work with the larger test cases. Another thing to think about is that the hidden test cases can contain numbers between 1 and $10^{100}$. That number is beyond the range of usual integer data types. The problem can be solved quite easily by making the simple observation that you can split up all the digits that are 4 into 2 and 2 (or 3 and 1) and easily generate two numbers without even adding the two numbers to check if they sum up to N. For instance, if N is 9454 you can split it up into 9252 and 202. To do this you do not even need to convert the number into an integer data type. You can work with the string representation directly.

Solution

I will read each test case number as a string and create two new numbers A and B. A will have all the digits 4 replaced with 2 and the remaining digits the same. B Will have all the digits 4 replaced with 2 and all the remaining digits replaced with 0. This pattern of replacing each member of an array with another one using information from only a single array element can be implemented using the higher order function map. So, I defined two functions to calculate A and B.

getA :: [Char] -> [Char]
getA n = map (\x -> if x == '4' then '2' else '0') n

getB :: [Char] -> [Char]
getB n = map (\x -> if x == '4' then '2' else x) n

The rest of the solution is just I/O. You can see my full solution here.

Problem 2: You Can Go Your Own Way

To read the full problem description go here.

Summary of the Problem Statement

The goal is to get from the top right corner of an NxN grid to the bottom right corner using only South (Down) and East (Right) moves with the additional constraint that any segment of the path taken must not coincide (but can intersect at a single point) with another path that someone called Lydia took. The input is this path which is represented as a string containing uppercase “S” and “E” characters only.

Analysis

This problem looks more difficult to solve than it actually is. Once you know the trick, the code is quite simple. Here are some observations about the problem which helped me arrive at the solution.

  • Observation 1: Number of S moves and E moves must be exactly equal to N-1 if you want to reach the bottom right corner. You can see this by thinking of each move as a vector. This implies that it might be possible to find the solution by just transforming the input string as it is read.
  • Observation 2: If I have a function that only stays the same or decreases along the X-axis and intersects the point (0, 0), I can mirror the function along the line y=-x and the only places where the path intersects will be crossings. That means I can solve this by just doing the opposite of whatever Lydia does at each step.

This means that I can take Lydia’s path and reverse the action she took at each step like some dark mirror and I have my solution. Interestingly, this problem can also be solved using the “map” higher order function pattern in Haskell. The meat of my solution was implemented in two functions. One is the character (move) mapping function:

flipMove :: Char -> Char
flipMove c
    | c == 'S' = 'E'
    | c == 'E' = 'S'
    | otherwise = c

And another is the function that transforms the input string into the output string.

solveCase :: String -> String
solveCase lydiasMoves = map flipMove lydiasMoves

As before, the rest of the code is just IO. My full solution is here.

Problem 3: Cryptopangrams

This problem was a bit harder than the others. I spent a lot of time thinking about this one and solved it quite close to the end of the round. Some of my initial attempts failed because I didn’t consider all the edge cases. Read the full problem statement here.

Summary of Problem Statement

A string is “encoded” using the following procedure:

  1. 26 prime numbers less than a number N are chosen.
  2. The numbers are sorted in increasing order and a character map is made from the numbers to the corresponding letter between A and Z.
  3. Each character in the sentence is replaced with the corresponding prime number (spaces are removed and ignored).
  4. The prime number at each position is replaced with the product of the number and the number corresponding to the next character in the sentence.

As an additional condition, the string is guaranteed to have all 26 characters occur at least once.

For each case, as input, the maximum number N and the encoded sentence is given. The output should be the decoded sentence (without spaces).

Analysis

Let each sentence that is encoded be made of L characters. The set of primes that represent each of these characters can be represented by the sequence of numbers: $\{P_1, P_2, … , P_L\}$. After the “encoding” procedure is carried out, the sequence will look like: $\{P_1\times P_2, P_2\times P_3, … , P_{L-1}\times P_{L}\}$. The brute-force approach to solve this problem would involve trying to find the prime factors of each number in the sequence. However, if you look at the sequence above you can see that each number in the sequence except for the first one shares a common factor with the number before it! This means that if I find the common factors of the first two numbers in the sequence then I can factorize the entire sequence quite easily using the following procedure:

  1. Find the common factor between the first two numbers in the sequence.
  2. Divide the first number by this common factor to get the first factor.
  3. Second factor is the common factor.
  4. For all subsequent numbers, the first factor is the common factor with the previous number and the second factor is the number divided by the common factor.

Solution

So first, I wrote a function to find the greatest common divisor between two numbers. This function implements Euclid’s Algorithm for find the greatest common divisor of two numbers:

commonFactor :: Integer -> Integer -> Integer
commonFactor a b
    | remainder == 1 = 1
    | remainder == 0 = if a > b then b else a
    | otherwise = if a > b then commonFactor b remainder else commonFactor a remainder 
    where remainder = if a > b then snd (quotRem a b) else snd (quotRem b a)

In an imperative programming language we would loop through each character in the sequence to find the factors after setting up the loop by finding the common factor between the first two numbers. In Haskell the loop pattern for solving this particular problem is captured quite beautifully using the higher order function scanl. The type signature for scanl is (b -> a -> b) -> b -> [a] -> [b]. This means that scanl takes a function that takes a type a and b and returns a b, a member of type b and a list of type a and returns a list of type b.

A scanl is a higher order function that is a variant of a general pattern in computing called a fold. To put it very simply, a fold takes a starting value (sometimes called a seed), a function and a list and applies the function repeatedly in a specific way. If I write down the fold imperatively in a language like Python, it would look like this (f is the function that is applied recursively):

seed = 0
for idx, elem in enumerate(lst):
    if idx == 0:
        result = f(seed, elem)
    else:
        result = f(result, elem)

A fold will run this loop and return the final value of the result in the computation. A scan is just a fold that returns a list of the intermediate values at each iteration in the loop. Haskell has two scan functions that work on lists: scanl and scanr. scanl starts the iteration from the left side of the list and scanr starts the iteration from the right side of the list.

With this in mind, I wrote a scanl based function to factorize the list of input numbers.

factorizeCiphertext :: [Integer] -> [Integer]
factorizeCiphertext [email protected](x1:x2:nums) = scanl (\n1 n2 -> n2 `div` n1) firstFactor lst
    where firstFactor = x1 `div` (commonFactor x1 x2)

In my first attempt at solving the problem, I thought that this was the full solution. But I kept getting runtime errors once I uploaded the answer to the website. After thinking about the problem for a little bit I realized that I’d missed out on some edge cases. There are two ways in which this solution can fail.

  1. If there are consecutive numbers that are identical at the start of the sequence. The common factor algorithm will return the number itself. Consecutive identical numbers can be caused by either repeated characters (“AAAAAA”) or by repeated alternating characters (“ABABABA”) in the plain text.
  2. If there are multiple but different consecutive numbers at the start of the sequence. For instance, if the cipher text starts like [9, 9, 9, 15, 15, 15, 35, 217].

Note that if repeated characters appear in the middle of the sequence, it does not matter since the procedure we use will divide this number by the previous factor anyway. To handle this edge case, we’ll need to extract any successive repeated numbers from the start of the sequence. So I wrote two functions to split the input list into two lists. One contains any repeated characters at the start of the list and the other is the rest of the list.

groupHead :: [Integer] -> [Integer]
groupHead (x:xs)
    | x == (head xs) = [x] ++ (takeWhile (==x) xs) ++ groupHead (dropWhile (==x) xs)
    | otherwise = []

groupTail :: [Integer] -> [Integer]
groupTail (x:xs)
    | x == (head xs) = groupTail (dropWhile (==x) xs)
    | otherwise = [x] ++ xs

For the rest of the list that does not have any initial repeating characters, the original factorizeCiphertext function can be used to factorize the list. But how do we factorize the first part of the list? After a bit of thought I realized that if we factorize the first number in the second part of the list, it must have a common factor with the last number in the first part of the list. So now I can use this as the first factor to scan the list from the right!

So the final function that accounts for the edge cases is:

factorizeCiphertext4 :: [Integer] -> [Integer]
factorizeCiphertext4 (n:nums)
    | isEdge = (scanr (\n1 n2 -> n1 `div` n2) (head factorTail) groupHeads) ++ (tail factorTail)
    | otherwise = factorTail
    where groupHeads = groupHead ([n] ++ nums)
          isEdge = if (length groupHeads) > 0 then True else False
          listToScan = groupTail ([n] ++ nums)
          firstFactor = (head listToScan) `div` (commonFactor (head listToScan) (head $ tail listToScan))
          factorTail = scanl (\f n -> n `div` f) firstFactor listToScan

You can view my full solution to the problem here.

Conclusion

Haskell is a difficult language to learn, especially if you’ve only been exposed to imperative languages all your life. Rather than thinking in terms of discrete steps, you have to force yourself to think in terms of recursion and higher order patterns in the problem. However, if you stick with it for that initial amount of time that it takes to develop some basic proficiency, the results can be very rewarding. You can write some beautiful code in this language. After learning Haskell, I’ve also found myself using a functional approach to solve programming problems in other languages too! For instance, I’ve started using Python’s iterools package much more often.

However, usefulness wasn’t the main factor that motivated me to learn Haskell. For me it was more about the fun of solving puzzles in clever and aesthetically pleasing ways. Haskell is one of the few languages that puts some fun back into solving programming problems. I think it brings in some of the beauty and rigor traditionally associated with pure mathematics into a programming language. If you’re someone who enjoys programming puzzles for their own sake and finds beauty in clever solutions to puzzles, I highly recommend giving Haskell a go.

Related

comments powered by Disqus