Functional Programming Competition 2018

part of the Formal Methods and Functional Programming (FMFP) course at ETH Zürich

Every week a special Haskell programming exercise is published here and in Code Expert. Each exercise is carefully selected by the Master of Competition (MC)—Dmitriy Traytel—to separate the wheat from the chaff. After 7 weeks, the intersection of the wheat with the FMFP students will receive delicious prizes in an in-lecture ceremony.

Participants submit their solutions via Code Expert. Submissions by mail are accepted only in two exceptional cases: (1) the participant is not enrolled at ETH and therefore cannot use Code Expert, and (2) the solution uses a Haskell library/pragma that is not available on Code Expert. In case (2), precise instructions on how to run the submission (including version numbers) are required for the solution to be evaluated.

Every week, MC's favorite submissions are discussed in a blog post. The submissions are evaluated with respect to different criteria (but the function's correctness is always a must have), which are announced along with the exercise. The best 10 solutions receive from 1 to 10 points (more is better). The points and the names of the participants are made public in a Top 10 of the week and aggregated in a semester ranking.

Important: If you don't want your name to appear on a publicly visible webpage (but still want to participate), write to the MC and suggest a pseudonym.

Finally, the MC employs a zero tolerance policy against plagiarism. Please compete fairly and individually. Submissions that have been developed jointly by two contestants have to be clearly marked as such. Violations of those rules may result in arbitrary point deductions.


Semester ranking (after 3 weeks)
  # name   points
1. Moritz Knüsel 🎩 27.0
Simon Yuan 📐 27.0
3. Matthias Brun 25.0
4. Mike Boss 22.0
5. Roger Baumgartner 17.0
Loris Reiff 🏆 17.0
7. Gabriel Giamarchi 14.0
8. Chris Buob 13.0
9. Tynan Richards 9.0
10. Börge Scheel 7.0
11. Antonio Russo 6.0
12. Noah Berner 5.0
Zur Shmaria 5.0
14. Giuseppe Arcuti 3.0
Lukas Baege 3.0
Andrin Bertschi 3.0
Miro Haller 🧟 3.0
Lukas Heimes 3.0
Timon Jenne 3.0
Nikola Kovacevic 3.0
Andreas Kuster 3.0
Sandar Felicity Lim 3.0
Manuel Nowack 3.0
Felix Sarnthein-Lotichius 3.0
Roberto Starc 3.0
Elwin Stephan 3.0
Valentin Stoppiello 3.0
Jan Wiegner 3.0
29. Jorel Elmiger 1.0

Task of week 1

Write a function oddSum that takes five Int arguments as input and computes the sum of the maximum, the minimum, and the median of the arguments.

For example:

oddSum 1 2 3 4 5 = 1 + 3 + 5 = 9
oddSum 5 4 1 3 2 = 1 + 3 + 5 = 9
oddSum 1 1 5 3 3 = 1 + 3 + 5 = 9

The shortest (in terms of number of tokens [1]) solution wins! Library imports and type signatures are excluded from the token count. As a point of reference: the MC has an optimized (but not too crazily optimized) 27-token solution.

[1] This competition continues a long-standing tradition of in-lecture functional programming competitions established at TU München. The definition of tokens is borrowed from there.

Results of week 1

Here are the best 26 (the top 10 ranks, expanded by ties) student performers in the first competition task.

Top 10 of week 1
  # name   #tokens   points
1. Matthias Brun 17 10
2. Loris Reiff 19 9
3. Simon Yuan 26 8
4. Roger Baumgartner 27 7
Moritz Knüsel 27 7
Börge Scheel 27 7
7. Gabriel Giamarchi 29 4
8. Giuseppe Arcuti 31 3
Lukas Baege 31 3
Noah Berner 31 3
Andrin Bertschi 31 3
Mike Boss 31 3
Chris Buob 31 3
Miro Haller 31 3
Lukas Heimes 31 3
Timon Jenne 31 3
Nikola Kovacevic 31 3
Andreas Kuster 31 3
Sandar Felicity Lim 31 3
Manuel Nowack 31 3
Tynan Richards 31 3
Felix Sarnthein-Lotichius 31 3
Roberto Starc 31 3
Elwin Stephan 31 3
Valentin Stoppiello 31 3
Jan Wiegner 31 3

The MC is happy about the 104 student participants in this first week of the competition (+ 1 tutor + 1 professor + the MC). Several participants submitted more than one solution without any comment (such as -- grade me), leaving the MC free to pick the worst one in terms of tokens. However, the MC was so happy about the fact that every participant submitted at least one correct solution (determined by the MC by a mixture of testing and manual inspection) that he decided to take the best submission in terms of tokens into account.

Before we start looking at the best solutions, let's see what one can do with what we've seen in the lecture so far. The following solution is the MC's attempt to use only the material from the lecture, and also one of the submissions by our number 3, Simon Yuan (and a few others, who wasted tokens by adding some needless parentheses or the like). With 58 tokens it is far from being Simon's or the MC's best solution and even far from the top 10.

oddSum a b c d e
 | a > b        = oddSum b a c d e
 | b > c        = oddSum a c b d e
 | c > d        = oddSum a b d c e
 | d > e        = oddSum a b c e d
 | otherwise    = a + c + e

To understand how this solution works, one has to realize that the guards are checked sequentially: i.e., the program will only check c>d after having established a≤b and b≤c. With this insight it is pretty obvious what this code does: the last trivial guard otherwise is only reached once the arguments a b c d e are in sorted order.
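
For illustration, here is a hand evaluation of the second example from above; each step swaps the first pair of out-of-order neighbours, exactly as dictated by the guards:

oddSum 5 4 1 3 2      -- a > b
  = oddSum 4 5 1 3 2  -- b > c
  = oddSum 4 1 5 3 2  -- a > b
  = oddSum 1 4 5 3 2  -- c > d
  = oddSum 1 4 3 5 2  -- b > c
  = oddSum 1 3 4 5 2  -- d > e
  = oddSum 1 3 4 2 5  -- c > d
  = oddSum 1 3 2 4 5  -- b > c
  = oddSum 1 2 3 4 5  -- otherwise
  = 1 + 3 + 5
  = 9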

Now we have a baseline to compete against. It is fairly obvious that sorting should be part of any successful solution. Can we sort the five elements in Haskell, without implementing our own sorting algorithm? Of course: we can just use the right library function. To use this function we must import the library Data.List, which contains many useful functions on the Haskell list type (more on this next week in the lecture; what you see below should be mostly intuitive usage of finite sequences of Int elements). But, hey, imports cost 0 tokens!

Once the list is sorted, we have to think about how to extract the minimal, maximal, and median element. Many solutions, including the following one by an anonymous professor, opted for index-based addressing, written list!!index in Haskell.

oddSum a b c d e = let xs = sort [a,b,c,d,e] in xs !! 0 + xs !! 2 + xs !! 4

This gives 34 tokens (or 33 when using where) and brings us closer to the top 10.

To get into the top 10, this time it was enough to access the first and last elements of the sorted list using library functions (head and last). Note that head xs is two tokens, while xs!!0 is three. Alternatively, but less efficiently, one can extract the minimum or the maximum of the (already sorted!) list. Alas, all 31-token solutions, including one submitted by a tutor who likes applicative functors (more on this much later in the lecture), are variations of the following two.

import Data.List

oddSum a b c d e = maximum xs + minimum xs + xs !! 2
  where xs = sort [a,b,c,d,e]

oddSum' a b c d e = head xs + last xs + xs !! 2
  where xs = sort [a,b,c,d,e]

Side remark 1: Some participants mistook tokens for characters, and tried to minimize the number of characters used in their program. Readability suffered from this quite a bit, e.g., oddSum a b c d e=xs!!0+xs!!2+xs!!4 where xs=sort[a,b,c,d,e]. Useful advice: leave the reader of your program some air to breathe. It is better to kill needless parentheses than spaces (also in terms of tokens).

All solutions further up in the ranking use techniques other than indexing to extract the three elements from the sorted list. Gabriel Giamarchi uses the following trick (omitting the Data.List import):

oddSum a b c d e = sum ((subsequences (sort [a,b,c,d,e]))!!21)

Before we talk about the semantics, let's see whether, syntactically, this solution does its best in terms of tokens. No! Prefix function application always binds stronger than any infix operator. That is, f a b c d !! g x y z is always parsed as (f a b c d) !! (g x y z). Have I already mentioned what to do with needless parentheses? Don't get me wrong: Gabriel did a great job by using subsequences in the first place. But those two parentheses around subsequences ... cost him three competition points, which in the end might decide between Lindt and Migros Budget.

So how does this solution work? The library function subsequences computes the list of subsequences (i.e., a list whose elements are lists themselves) of the input list, where a list xs is a subsequence of another list ys if one can obtain xs by removing zero or more elements from ys (but without reordering the elements). For example: subsequences [1,2,3] = [[],[1],[2],[1,2],[3],[1,3],[2,3],[1,2,3]]. Now, when applied to a list with five elements, the result of subsequences will contain the subsequence consisting of the first, middle, and last element in the 22nd position (i.e., accessible using the 0-based index 21). This holds regardless of the list elements' types or values and is exploited here. Finally, the library function sum sums up the elements of this subsequence, which is exactly what is asked for. BTW: you might ask: is computing subsequences the most efficient way to do this? The MC answers: certainly not, but who cares? It is all about tokens.
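
A quick GHCi check confirms the position claim on the second example from the task (nothing clever, just verifying the explanation above):

ghci> import Data.List (sort, subsequences)
ghci> subsequences (sort [5,4,1,3,2]) !! 21
[1,3,5]
ghci> sum (subsequences (sort [5,4,1,3,2]) !! 21)
9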

The best solutions of Moritz Knüsel and Simon Yuan are variants of this idea that are more economical with tokens. Here is, for example, Simon's solution, which came in third with 26 tokens:

oddSum a b c d e = sum $ subsequences (sort [a,b,c,d,e]) !! 21

Using the infix operator $ is a useful trick to save tokens (and sometimes also to improve readability). It is nothing else than function application (f $ x = f x); however, the low precedence of this infix operator helps to avoid parentheses: instead of f (g x) we can write f $ g x (or f $ g !! i instead of f (g !! i)).

Let's look at two other 27-token solutions. The first one was found by the two students Börge Scheel and Roger Baumgartner. The second is the MC's best solution.

oddSum a b c d e = sum $ map head $ chunksOf 2 $ sort [a, b, c, d, e]
oddSum a b c d e = sum $ scanr (-) 0 $ sort [a, b, c, d, e]

The first solution works by sorting and then using the library function chunksOf (from the Data.List.Split library), which splits a list into shorter ones (of length at most 2 here, as indicated by the passed argument). We have chunksOf 2 [1,2,3,4,5] = [[1,2],[3,4],[5]]. The next applied function is the important map function, which we will see again in the course. It applies its first function argument (yes, in Haskell functions can be passed around as arguments—map is called a higher-order function because it takes a function as an argument; more on higher-order functions in two weeks in the lecture) to every element of the list passed as its second argument. For example: map head [[1,2],[3,4],[5]] = [head [1,2], head [3,4], head [5]] = [1,3,5].

The second solution uses an even more complicated higher-order function than map: scanr. In our example, it works as follows: scanr (-) 0 [1,2,3,4,5] becomes

[1 - (2 - (3 - (4 - (5 - 0))))
,     2 - (3 - (4 - (5 - 0)))
,          3 - (4 - (5 - 0))
,               4 - (5 - 0)
,                    5 - 0
,                        0]

or written differently

[1 - 2 + 3 - 4 + 5 - 0
,    2 - 3 + 4 - 5 + 0
,        3 - 4 + 5 - 0
,            4 - 5 + 0
,                5 - 0
,                    0]

Reading this column-wise, we observe that after summing this list the elements in the even positions (2 and 4) cancel out, whereas the elements in the odd positions (1, 3, and 5) survive this telescoping.
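
The same in GHCi, with the concrete intermediate values (the alternating signs are exactly the telescoping just described):

ghci> scanr (-) 0 [1,2,3,4,5]
[3,-2,4,-1,5,0]
ghci> sum (scanr (-) 0 [1,2,3,4,5])
9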

Side remark 2: When writing this explanation, the MC noticed that it is possible to use a very similar function scanr1 (-) instead of scanr (-) 0 to save one token. But unfortunately, we are past the deadline now.

All these tricks are nice, but they are still about 10 tokens away from the two best solutions. Their authors came up with a way to avoid writing the list [a,b,c,d,e] or even the five arguments of oddSum. Here is Matthias Brun's winning entry with all the imports and a nice comment describing how the solution works:

import Data.List (sort, subsequences)
import Text.Printf (printf)
import Data.Composition ((.::.))

{- This solution works by computing the power set (subsequences) of the sorted
 - list containing the arguments. The 22nd element (i.e. index 21) of the power
 - set (as generated by 'subsequences') is always the one containing the
 - minimum, median and maximum (for a sorted list of 5 values), so we take its
 - sum. The 'read .::. printf "[%d,%d,%d,%d,%d]"' part simply reads in the
 - arguments and transforms them into a list in a way that uses few tokens.
 -}

oddSum = sum . (!! 21) . subsequences . sort . read .::. printf "[%d,%d,%d,%d,%d]"

The MC is amazed. This is pointfree functional programming at its best. (The pointfree style is often criticized for obfuscating the program. One particular problem is that it might be hard to see how many arguments a function takes. But it is the best way to save tokens. And in this case, the solution is fairly easy to read.) To the above explanation, the MC only wants to add some comments on the operator . and the funny .::., which for some reason makes the MC think of Space Invaders. Both denote function composition. We have:

(f . g) x1 = f (g x1)
(f .::. g) x1 x2 x3 x4 x5 = f (g x1 x2 x3 x4 x5)
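
For the curious: an operator with the behaviour of .::. can be defined with an ordinary lambda abstraction, as the following sketch shows (this is not the actual Data.Composition source, just an equivalent definition of the equation above):

(.::.) :: (y -> z) -> (a -> b -> c -> d -> e -> y) -> a -> b -> c -> d -> e -> z
f .::. g = \x1 x2 x3 x4 x5 -> f (g x1 x2 x3 x4 x5)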

Hence, the above solution is equivalent to:

oddSum a b c d e = sum ((!! 21) (subsequences (sort (read (printf "[%d,%d,%d,%d,%d]" a b c d e)))))

The sum ((!! 21) (subsequences (sort ...))) part looks familiar. We've seen the function read, which converts Strings to values of various types (or complains with a parse error), in the first exercise class. The type of printf is too complex to explain at this point; however, when used, the function behaves just like its homonymous C ancestor (i.e., the number of arguments it takes depends on the formatting string).

The second best solution by Loris Reiff is quite similar (also using printf and .::.), except that it uses a slightly less token-efficient way to extract the first, middle, and last argument from a five-element list.

The MC could not resist the temptation to look at some incorrect submissions: a glimpse into the abyss, in a sense. There, one meets participants who tried to sacrifice everything for tokens: even correctness. Some omitted the sorting or did it half-heartedly:

oddSum a b c d e = head xs + last xs + xs!!2 where xs = [a, b, c, d, e]
oddSum a b c d e = c + min a (min b (min c (min d e))) + max a (max b (max c (max d e)))

Others tried out mysterious ways:

oddSum a b c d e = l!!0 + l!!2 + l!!4 where l = [ x | x <- [1..], elem x [a,b,c,d,e]]
oddSum a b c d e = l!!0 + l!!2 + l!!4 where l = [ x | x <- [-1000..], elem x [a,b,c,d,e]]

Fortunately, all of them found the path of the righteous (wo)man in the end.

And in case you wonder: the longest correct solution has 302 tokens and is, well, a bit repetitive.

min2 a b c d e
 | minimum [a,b,c,d,e] == a  = minimum [b,c,d,e]
 | minimum [a,b,c,d,e] == b  = minimum [a,c,d,e]
 | minimum [a,b,c,d,e] == c  = minimum [a,b,d,e]
 | minimum [a,b,c,d,e] == d  = minimum [a,b,c,e]
 | minimum [a,b,c,d,e] == e  = minimum [a,b,c,d]

max2 a b c d e
 | maximum [a,b,c,d,e] == a  = maximum [b,c,d,e]
 | maximum [a,b,c,d,e] == b  = maximum [a,c,d,e]
 | maximum [a,b,c,d,e] == c  = maximum [a,b,d,e]
 | maximum [a,b,c,d,e] == d  = maximum [a,b,c,e]
 | maximum [a,b,c,d,e] == e  = maximum [a,b,c,d]

oddSum a b c d e = a + b + c + d + e - min2 a b c d e - max2 a b c d e

Wrapping up: what have we learned this week?

Task of week 2

Lists have still not been spotted in the lecture so far. So let's do some number theory instead.

The digital product of a number n is the product of its (base 10) digits. The computation of the digital product can be iterated until a single digit is obtained. The single digit is called the (multiplicative) digital root, whereas the number of iterations needed is called the (multiplicative) persistence.

For example:

999 has digital product 9 * 9 * 9 = 729
729 has digital product 7 * 2 * 9 = 126
126 has digital product 1 * 2 * 6 = 12
 12 has digital product 1 * 2     = 2

Thus the digital root of 999 is 2 and the persistence is 4 (as we had to compute four digital products).

Your task is to implement the function persistenceRoot :: Integer -> (Integer, Integer) that computes both the persistence (first component) and the digital root (second component). E.g., persistenceRoot 999 = (4,2). (For a change we want to use Haskell's Integer type of unbounded integer numbers instead of the machine integers Int.)

As last week, the MC is looking for the shortest solution token-wise. (Next week we will move to some other criterion.) His best attempt needs 30 tokens. Just beat it!

Results of week 2

Here is the overdue top 10 for week 2. This week 66 students have submitted a solution.

Top 10 of week 2
  # name   #tokens   points
1. Mike Boss 19 10
Matthias Brun 19 10
Chris Buob 19 10
Gabriel Giamarchi 19 10
Moritz Knüsel 🎩 19 10
Simon Yuan 📐 19 10
7. Tynan Richards 21 4
8. Roger Baumgartner 24 3
9. Loris Reiff 🏆 27 2
Antonio Russo 27 2

You may wonder about the emojis used above. The MC will explain. But first let us look at a solution that reflects the state of the lecture at the submission date. It was submitted by the still anonymous professor with the comment "I didn’t try to be clever." and counts 57 tokens.

persistenceRoot x = aux 0 x

aux n x
    | x < 10    = (n,x)
    | otherwise = aux (n+1) (dp x)

dp x
    | x < 10     = x
    | otherwise  = (dp (x `div` 10)) * (x `mod` 10)

Even without trying to be clever, some obvious further token optimizations are possible in the above solution. (E.g., what have we learned last week about needless parentheses?) Altogether this might save around 10–15 tokens, which would still be miles away from the top 10.

To get into the top 10, we should rather try to be clever. Being clever for the whole problem at once may cause some headaches, so let's look at a subproblem first: computing the digital product (i.e., what the above dp function computes). Using div and mod is a natural way to approach this subproblem. Another natural approach to get the digits of a number x is to print the number: show x. Well, almost. This only gives one a list of Chars, which we would want to convert to a list of Integers. Unfortunately, it seems that there is no function to do this conversion directly, i.e., we need to take the detour over Ints: map toInteger (map digitToInt (show x)). Finally, to compute the product of the digits we can conveniently use the product function. Making all of this pointfree, we obtain the digital product function with 9 tokens:

product . map toInteger . map digitToInt . show
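
In GHCi (digitToInt lives in Data.Char, another 0-token import):

ghci> import Data.Char (digitToInt)
ghci> (product . map toInteger . map digitToInt . show) 999
729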

This expression, or a slight variation of it, occurs in solutions submitted by Roger Baumgartner and Antonio Russo as well as in the MC's own 30-token solution. It is good, but not exactly the stuff dreams are made of. The stuff dreams are made of for this exercise could be found on Hackage in the Data.Digits module: the digits function gives one the digits of a number with respect to a given base. Hence, one digital product costs only 4 tokens, as noticed by everyone else from the top 10:

product . digits 10

But one digital product is not enough. We need to iterate it and count the number of iterations until we reach a single digit.

Until, hmm until... Here is the approach of the MC:

persistenceRoot =
  flip curry 0 $ until ((<10) . snd) $
    (+1) *** product . map read . map return . show

Or equivalently written in a bit saner and a bit less pointfree fashion:

persistenceRoot x =
  until ((<10) . snd) ((+1) *** product . map read . map return . show) (0, x)

The function until is another perfect match for this exercise: until test f x computes the first y = f (... f (f x) ...) for which test y is satisfied. The MC is a bit embarrassed to have wasted 5 tokens on the digital product and 3 tokens on the (+1) function, which exists in the Prelude under the name succ.
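
For illustration, plugging both improvements into the MC's scheme would give something like the following (a sketch only, not a graded submission; note that Data.Digits may not be importable in the online IDE, as Tynan's comment below mentions):

import Control.Arrow ((***))
import Data.Digits (digits)

persistenceRoot :: Integer -> (Integer, Integer)
persistenceRoot = flip curry 0 $ until ((< 10) . snd) $ succ *** product . digits 10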

Tynan Richards was more careful and used a bunch of funky functions from somewhat exotic modules. And he wrote a nice explanation that speaks for itself. Interestingly, Tynan's solution is actually recursive, i.e., he did not use a combinator (a la until) to iterate digital products.

-- The online IDE fails to import these packages
import Data.Digits (digits)
import Data.Tuple.Update (upd1)
import Data.Tuple.Extra (dupe, first)
import Data.Bool.Extras (bool)
import GHC.Integer (leInteger)

{- The pointfree style makes it hard to see exactly what this function does, so
- I'll briefly try to explain...
- 'bool' is a function that works like an if-then-else statement. It checks the
- value of 'leInteger 10', which is equivalent to '(<=) 10'. If that condition
- is false, then the input is smaller than 10 and we have the digital root, as
- well as persistence 0, so we return the value calculated by 'upd1 0 . dupe'.
- 'dupe x' gives the tuple (x, x), 'upd1 0 .' sets the first value of that
- tuple to 0, giving (0, x).
- If the input is greater than 10, we instead return the value calculated by
- 'first succ . persistenceRoot . product . digits 10', which recursively
- calculates the digital product ('persistenceRoot . product . digits 10'),
- then adds one to the first element of the tuple ('first succ') when it
- returns.
- The <*> operator applies the function's argument to each of the expressions
- of bool, which enables the complete lack of parentheses.
-}

persistenceRoot :: Integer -> (Integer, Integer)
persistenceRoot =
  bool . upd1 0 . dupe <*>
    first succ . persistenceRoot . product . digits 10 <*> leInteger 10

Let's finally look at the winning solutions. Five of the six were slight variations of the following one by Simon Yuan. Again, it is beautifully documented.

import Data.List
import Data.Bifunctor
import Data.Digits

--19 Token
persistenceRoot :: Integer -> (Integer, Integer)

{-
  The code explained:
  "product . digits 10" calculates the digital product
  (let's call this function dp).
        digits 10 n is evaluated to an Integral list, with the digits
        (in base 10) of n as its elements.
        product then takes the product of all the elements in that list.

  "iterate dp n" is evaluated to [n, dp n, dp (dp n), ...]
        This list contains the input n as well as all the intermediate steps.

  "break (<10)" splits the list into 2; it is a tuple of 2 lists.
      The breakpoint is at the first element for which (<10) is true.
      As soon as an element x is less than 10 (x is the first 1 digit number),
      x will be equal to dp x. Hence x is the digital root.

  "bimap genericLength head (a,b)" is evaluated to (length a, head b)
        The length of a is how many intermediate steps it took to reach a 1
        digit number (= persistence). The head of b (or any element of b) is
        the digital root.

  To save 2 tokens, make the function pointfree by omitting the input twice.
-}

persistenceRoot =
   bimap genericLength head . break (<10) . iterate (product . digits 10)

Above genericLength is used instead of length to obtain an Integer rather than an Int. The iterate function deserves special attention: yes iterate f x just iterates the function f as in [x, f x, f (f x), ...]. Remarkably, the iteration never stops, i.e., the output of iterate f x is unconditionally an infinite list (more on this in a few weeks in the course). The lazy evaluation strategy of Haskell saves us from waiting infinitely long for it to finish. In the following computation we only need to observe a finite prefix of the infinite list.
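
In GHCi, laziness makes it safe to poke at the infinite list, since only a finite prefix is ever demanded:

ghci> import Data.Digits (digits)
ghci> take 6 $ iterate (product . digits 10) 999
[999,729,126,12,2,2]
ghci> fst $ break (< 10) $ iterate (product . digits 10) 999
[999,729,126,12]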

The remaining 19-token solution by Gabriel Giamarchi is somewhat different:

import Data.Digits
import Data.Tuple

{- The idea is to avoid costly (in tokens) termination criteria in
- We use an easily seen property, namely that the digital persistence
  of x is smaller or equal to x (Except for zero !!).
- This allows to write the following function, where we zip from the right
  so that the minimum (lexicographical probably) selects the correct tuple.
- Luckily this also works for zero :-)
-}
persistenceRoot :: Integer -> (Integer, Integer)
persistenceRoot x =
  swap $ minimum $ iterate (product . digits 10) x `zip` enumFromTo 0 x
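
Seen in GHCi (same imports as above, plus Data.Tuple for swap; zip truncates the infinite iterate list after x + 1 elements):

ghci> take 6 $ iterate (product . digits 10) 999 `zip` enumFromTo 0 999
[(999,0),(729,1),(126,2),(12,3),(2,4),(2,5)]
ghci> swap $ minimum $ iterate (product . digits 10) 999 `zip` enumFromTo 0 999
(4,2)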

This one beautifully relies on a mathematical property of the persistence (to prove the above "easily seen" property, prove first that the digital product of a number is smaller than the number for numbers > 10) and the default lexicographic ordering (i.e., (a,b) ≤ (c,d) = a < c || a = c && b ≤ d) on pairs in Haskell. Moreover, it is not even pointfree, which usually hints that it is possible to beat it in terms of tokens. And indeed, here is the MC's rewrite of Gabriel's solution that saves 1 token:

import Data.Digits
import Data.Tuple
import Control.Arrow
import Control.Monad

persistenceRoot =
  swap <<< minimum <<<
    zip `liftM2` iterate (product . digits 10) `arr` enumFromTo 0

Here, <<< is another operator for composition . (with precedence 1, whereas . has 9) and arr is another operator for application $ (with precedence 9 as any function written in backquotes, whereas $ has 0). Finally, the general function liftM2, whose type cannot be explained at this point, behaves in this particular instance as liftM2 z f g x = z (f x) (g x).

Great, now to the emojis. They can be considered special awards for a distinguished achievement in a particular exercise. The precise meaning and value (e.g., in terms of Schoggi) of these awards is still unclear (even to the MC), but hey, now you have a nice emoji stuck to your name.

First of all, Simon Yuan receives the mathematician award 📐. One of his early submissions read ... . take 100000 . iterate ... and he wrote lengthy justifications of why this is safe:

If you are interested to read, I have also done some research
( you really dont have to read this ;) ):
It is conjectured that 11 is the max persistence for base 10.
277777788888899 is the first number with persistence 11,
and they have tested it up to 10^233.

I actually tried to proof that 100000 is way more than enough by proving an
upper limit directly (i.e. I tried to prove the conjecture (before I knew it
was an conjecture) but couldnt really get anywhere. Then I tried to come up
with a lower bound of the number of digits needed for it to have persistence
at least p.


[something about the number of atoms in the universe]


The MC had advised Simon not to use this program and not to rely on the size of the universe. Clearly, we want to write programs (especially in this competition) that work forever and under all circumstances. And, as we know, the universe is expanding. Incidentally, taking the first x elements of that list when computing the persistence of x (instead of a fixed 100000 elements) will always suffice—this was also exploited in Gabriel's solution.

The second award goes to Moritz Knüsel. He is a true black hat 🎩 as his 13-token solution demonstrates (comments: his, censorship: MC's).

import System.Process
import System.IO.Unsafe
import Text.Printf
{- If this isn't allowed, please disregard this submission -}
persistenceRoot :: Integer -> (Integer, Integer)

{- very slow, gets killed after completing the last test,
   also using ghci incurs an extra token,
   because it prints extra stuff unless told not to
persistenceRoot =
  read . unsafePerformIO . readCreateProcess (shell "ghci -v0") . printf "\
\import Data.List\nimport Control.Arrow\n\
\genericLength *** head $ span (>=10) $ iterate (product <<< fmap read <<< fmap return <<< show) %d\n"
-}
-- this goes faster, but kinda defeats the purpose of a
-- haskell programming exercise
persistenceRoot = read . unsafePerformIO . readProcess  "bc" [] . printf "\
\define digprod (x) {\n\
  \if (x==0) return 1\n\
  \return (x %% 10)*digprod(x/10)\n\
\define droot (x) {\n\
  \if (x >= 10) return droot(digprod(x))\n\
  \return x\n\
\define drootcnt (x) {\n\
  \if (x >= 10) return 1+drootcnt(digprod(x))\n\
  \return 0\n\
\define void printit(x) {\n\
  \print \"(\", drootcnt(x), \",\" ,droot(x),\")\n\"\

In principle, Moritz would have won this exercise, had it not been for the censored "back door" he used. To Moritz' defense: this was not explicitly prohibited in the rules (though it is, of course, covered by the unspoken rules of the competition), and Moritz was already regretting his usage of such an evil and unportable function, as his first comment indicates. Challenge for everybody: can you write a solution in such a style (i.e., using an external interpreter) without using the back door? For this to work, the interpreter would need to be pure (i.e., a side-effect-free calculator) and expressive enough to compute the digital root and persistence. If something like that is possible, the MC will have to redefine the notion of tokens to take the length of a literal string into account.

The last award goes to Loris Reiff. He receives the fairness award 🏆 not for a solution he submitted (his was rather unspectacular), but rather for a solution that he has not submitted:

Btw, today I was looking for solutions online for some inspiration for
next time. I found one on Rosetta Code which only uses 21 tokens. This
is obviously not a submission, since copying is no fun ;) [and I've
already submitted anyway] but it might be something for the blog (if no
one has submitted this solution or a better one).

21 m=(g*h).s(>9).i(p.d1)

-- slightly adapted from
import Data.List
import Data.Digits
import Control.Arrow
mpmdr :: Integer -> (Integer, Integer)
mpmdr = (genericLength *** head) . span (> 9) . iterate (product . digits 10)

The MC feels guilty. His task is to come up with exercises that are original enough not to be on Rosetta Code. It is surprising that even the type signature matches. Fortunately, the solution there is not token-optimal, yet it is uncomfortably close to the optimum. The MC promises to research such "related work" better next time.

Lessons of the week, apart from that the MC should do his related work research properly, are:

Task of week 3

Yay, finally some lists in the lecture!

Here is the task of week 3: Given a finite directed graph G and a subset of its vertices S, define a function that computes the minimal elements in S with respect to G. An element n is minimal in S with respect to G if there is no other element m in S such that there exists a path from m to n in G (note that only the start point m and end point n of such a path are required to be in S).

We represent directed graphs by their adjacency matrix using the type [(Int,[Int])]. For example, the entry (1,[2,3,4]) in such an adjacency matrix means that vertex 1 has outgoing edges to the vertices 2, 3, and 4. And the entry (2,[]) means that 2 has no outgoing edge (to the same effect we could have just omitted that entry). You may assume that for each vertex there is at most one entry in the adjacency matrix. Moreover, we are using lists [Int] to represent sets of vertices.

So we are looking for a function of type graphMin :: [(Int,[Int])] -> [Int] -> [Int] that performs the computation described above. The function may return the minimal elements in any order (and even with duplicates). For example:

import Data.List

sort $ nub $ graphMin [(0,[1,2]),(1,[3]),(2,[4])] [0,1,2,3,4]     = [0]
sort $ nub $ graphMin [(0,[1,2]),(1,[3]),(2,[4])] [1,2,3,4]       = [1,2]
sort $ nub $ graphMin [(0,[1,2]),(1,[3]),(2,[4])] [1,3,4]         = [1,4]
sort $ nub $ graphMin [(0,[1,2]),(1,[3]),(2,[4])] [0,3,4]         = [0]
sort $ nub $ graphMin [(0,[1,2]),(1,[3]),(2,[4])] [0,1,2,3,4,5,6] = [0,5,6]
sort $ nub $ graphMin []                          [0,1,2,3,4]     = [0,1,2,3,4]
sort $ nub $ graphMin [(0,[1,2]),(1,[0])]         [1,2,3]         = [1,3]
sort $ nub $ graphMin [(0,[1,2]),(1,[0])]         [1,0,3]         = [3]

The evaluation criterion is time efficiency. Your function will be tested on large graphs, small graphs, dense graphs, sparse graphs, short lists, and longer lists (but probably not too long lists). The solution of the MC is fairly unoptimized. It can handle lists of up to a few hundred elements for larger graphs (100000 edges) in a few seconds. There is some room for improvement.

Hint: :set +s in GHCi is useful to obtain some timing and memory measurements.

Results of week 3

This week there were only 26 student submissions (plus one by the MC). The MC is slightly disappointed. A decrease of 30% compared to the previous week is ok; a decrease of 60% compared to the previous week indicates that the exercise was simply too hard or not perceived as interesting. And this even though the idea behind this competition problem originated from a small "real-world" subproblem that the MC's alter ego has encountered in his recent (not yet published) research: compute the minimal elements of a set with respect to a preorder.

Additionally, the upcoming quiz and the associated preparations can presumably be blamed as well. (Although not for the 100% decrease in professorial participation. The professors forfeited in view of the so far absolutely dominant student performance.)

The MC hopes for at least a non-decrease in participation next week. Even if you have close to 0 points so far, it is still possible to get into the top 10 of the semester, although it will probably be hard to get into the top 5.

Anyway, enough rambling. Let's get down to business.

Top ≈10 of week 3
  # name      points
1. Moritz Knüsel 10
1+ε. Miro Haller ➖🧟
2. Mike Boss 9
Simon Yuan 9
4. Roger Baumgartner 7
5. Loris Reiff 6
6. Matthias Brun 5
Antonio Russo ➖ 5
Zur Shmaria 5
9. Noah Berner 2
Jorel Elmiger ➖ 2
Tynan Richards 2

How are these results coming together? Through a sequence of increasingly harder tests. We start with a soundness test. For that purpose the MC uses his favorite random testing library: QuickCheck. Here is a ghci script that the MC ran on each of the submissions, after briefly ensuring that the submissions won't attempt to erase his file system (please don't do this: it would cost you a zillion competition points, but also earn you a 🎩).

:m + Test.QuickCheck Data.List
let getAll xs x = nub $ concat [zs | (y,zs) <- xs, x == y]
let dom = map (abs . fst)
let union xs = map (\x -> (x, filter (/= x) $ map abs $ getAll xs x)) $ dom xs
let test sol = quickCheckWith stdArgs { maxSize = 10, maxSuccess = 3000 }
  (\xs ys0 ->
    let ys = nub $ map abs ys0 in
    let g = union xs in
    (sort $ nub $ graphMinSol g ys) == (sort $ nub $ sol g ys))
:set +s
test GraphMin.graphMin

The MC should explain the quickCheckWith invocation. QuickCheck will execute its function argument in this invocation on 3000 randomly generated integer lists ys0 and graphs xs of length at most 10 (also all randomly chosen integers in those lists are between -10 and 10). To ensure that the randomly generated graphs satisfy the promised property of having at most one adjacency matrix entry for each vertex, the MC used the union function defined above. For each of the 3000 tests, the output of the MC's unoptimized solution graphMinSol (shown below) is compared (modulo sorting and duplicates) to the output of the contestant's solution GraphMin.graphMin.

import Data.List

path g = go []
  where
    go vis x y
      | x == y = True
      | otherwise = case lookup x g of
        Nothing -> False
        Just zs -> let vis' = x : vis in or [go vis' z y | z <- zs \\ vis']

graphMinSol :: [(Int,[Int])] -> [Int] -> [Int]
graphMinSol g qs = let ps = nub qs in
  filter (\p -> all (\q -> p == q || not (path g q p)) ps) ps

Only 17 participants (including the MC) passed this test. If your name does not occur somewhere below, try running QuickCheck on your solution to see the actual bugs. Three of the remaining participants would have failed the test if the MC had not removed the negative numbers from the random inputs (via abs). Since nothing in the exercise stated that all numbers will be non-negative, these solutions could have been directly disqualified, in principle. In an unexpected bout of mercy, the MC decided to see where these contestants will end up and potentially penalize them with a point deduction later. As a reminder, let's put a "heavy minus" emoji next to their names: Antonio Russo ➖, Jorel Elmiger ➖, Miro Haller ➖.

The whole truth about Miro Haller's submission is even more shocking. He spent great effort on making his solution very efficient. Yet a minor bug (even on positive numbers) compromised his whole work. When the MC reported the bug to Miro after the submission deadline, he quickly came up with a fix that still failed on negative numbers, but seemed to be correct otherwise. It is clear that the MC cannot give points to such a late submission, but he decided to at least evaluate this submission's efficiency. In other words: here is Miro's well deserved zombie emoji 🧟.

Something similar happened also to Matthias Brun's efficient solution. However, he was suspecting a bug and took precautions. He submitted an alternative "naive" solution and asked the MC to grade that one. A wise decision to keep some points.

OK, next test. First, here are three particular (families of) graphs:

k i = map (\k -> (k, [1..k-1] ++ [k+1..i])) [1..i] -- komplete [*]
c i = map (\k -> (k, [(k + 1) `mod` i])) [0..i-1]  -- cycle
l i = map (\k -> (k, [k + 1])) [1..i]              -- line

[*] Wikipedia is not sure whether complete graphs are typically abbreviated with a "k" because of the German word komplett (which seems to be a false friend) or because of the mathematician Kuratowski.
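
For small instances the three families look as follows (note the dangling edge out of the last vertex of l i; it is harmless for the tests below):

k 3 = [(1,[2,3]),(2,[1,3]),(3,[1,2])]
c 3 = [(0,[1]),(1,[2]),(2,[0])]
l 3 = [(1,[2]),(2,[3]),(3,[4])]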

The MC then tested the following three graphs.

let i = 20 in (sort $ nub $ GraphMin.graphMin (k i) [1..i]) == []
let i = 300 in (sort $ nub $ GraphMin.graphMin (c i) [0..i-1]) == []
let i = 300 in (sort $ nub $ GraphMin.graphMin (l i) [1..i]) == [1]

And here are the results (in no particular order).

TIMEOUT        Dan Kluser
0.01 0.01 0.21 Loris Reiff
0.05 0.10 0.05 Roger Baumgartner
0.01 1.48 4.05 Sandro Panighetti
0.01 0.01 0.01 Moritz Knüsel
TIMEOUT        Patrick Ziegler
TIMEOUT        Silvan Niederer
0.01 0.01 0.01 Mike Boss
0.01 0.01 0.02 Zur Shmaria
0.02 4.35 4.32 Noah Berner
0.02 5.32 2.61 Tynan Richards
0.01 0.04 0.07 Antonio Russo ➖
0.02 5.32 1.92 Jorel Elmiger ➖
0.01 0.28 0.56 MC
0.02 0.34 0.14 Matthias Brun
0.01 0.01 0.01 Miro Haller 🧟
0.06 0.01 0.01 Simon Yuan

Three timeouts and a bit of fluctuation otherwise. Not really conclusive. But you already see three solutions that are equally fast on all graphs: those of Moritz Knüsel, Mike Boss, and our dead man walking. The timeouts are eliminated before the next round, of course.

The MC also played around with some permutations and subsequences of the above arguments. The test let i = 20 in (sort $ nub $ GraphMin.graphMin (k i) [i,i-2..0]) == [] revealed a bug in Sandro Panighetti's solution, which had managed to survive the QuickCheck tests. A good reminder that testing is good, but proving is better. The MC decided at this point that the remaining 11 contestants would all end up in the top 10. So the next tests are about the final placing.

Here is the next challenge: lines with many complete subgraphs along them.

:m + Data.List
let getAll xs x = nub $ concat [zs | (y,zs) <- xs, x == y]
let dom = map (abs . fst)
let union xs = map (\x -> (x, filter (/= x) $ map abs $ getAll xs x)) $ dom xs
let k i j = map (\k -> (k, [i..k-1] ++ [k+1..j])) [i..j]
let l i = map (\k -> (k, [k + 1])) [1..i]
:set +s
let i = 100 in (sort $ nub $ GraphMin.graphMin
  (union (l i ++ concatMap (\i -> k i (i + 9)) [0,10..i-10])) [0..i]) == []
let i = 100 in (sort $ nub $ GraphMin.graphMin
  (union (l i ++ concatMap (\i -> k i (i + 9)) [0,10..i-10])) [0,10..i]) == [0]
let i = 100 in (sort $ nub $ GraphMin.graphMin
  (union (l i ++ concatMap (\i -> k i (i + 9)) [0,10..i-10])) [0,20..i]) == [0]
let i = 300 in (sort $ nub $ GraphMin.graphMin
  (union (l i ++ concatMap (\i -> k i (i + 9)) [0,10..i-10])) [0..i]) == []
let i = 300 in (sort $ nub $ GraphMin.graphMin
  (union (l i ++ concatMap (\i -> k i (i + 9)) [0,10..i-10])) [0,10..i]) == [0]
let i = 300 in (sort $ nub $ GraphMin.graphMin
  (union (l i ++ concatMap (\i -> k i (i + 9)) [0,10..i-10])) [0,20..i]) == [0]
let i = 600 in (sort $ nub $ GraphMin.graphMin
  (union (l i ++ concatMap (\i -> k i (i + 9)) [0,10..i-10])) [0..i]) == []
let i = 600 in (sort $ nub $ GraphMin.graphMin
  (union (l i ++ concatMap (\i -> k i (i + 9)) [0,10..i-10])) [0,10..i]) == [0]
let i = 600 in (sort $ nub $ GraphMin.graphMin
  (union (l i ++ concatMap (\i -> k i (i + 9)) [0,10..i-10])) [0,20..i]) == [0]
let i = 900 in (sort $ nub $ GraphMin.graphMin
  (union (l i ++ concatMap (\i -> k i (i + 9)) [0,10..i-10])) [0..i]) == []
let i = 900 in (sort $ nub $ GraphMin.graphMin
  (union (l i ++ concatMap (\i -> k i (i + 9)) [0,10..i-10])) [0,10..i]) == [0]
let i = 900 in (sort $ nub $ GraphMin.graphMin
  (union (l i ++ concatMap (\i -> k i (i + 9)) [0,10..i-10])) [0,20..i]) == [0]

And the corresponding results:

0.02 0.02 0.01 0.07 0.14 0.09 0.19 0.73 0.50 0.57 2.35 1.46  Loris Reiff
0.04 0.01 0.02 0.20 0.12 0.07 0.87 0.27 0.30 1.98 0.71 0.66  Roger Baumgartner
0.02 0.01 0.01 0.04 0.07 0.06 0.22 0.11 0.14 0.32 0.26 0.38  Moritz Knüsel
0.06 0.02 0.02 0.11 0.12 0.15 0.42 0.43 0.21 0.50 0.75 0.52  Mike Boss
0.03 0.02 0.02 0.13 0.09 0.10 0.61 0.25 0.47 1.52 0.68 0.99  Zur Shmaria
0.27 0.04 0.03 6.14 0.67 0.37 49.61 TIMEOUT                  Noah Berner
0.26 0.05 0.03 5.79 0.64 0.37 42.66 TIMEOUT                  Tynan Richards
0.03 0.02 0.02 0.12 0.11 0.12 0.48 0.36 0.43 0.90 0.58 0.50  Antonio Russo ➖
0.37 0.02 0.02 9.44 0.14 0.09 TIMEOUT                        Jorel Elmiger ➖
TIMEOUT                                                      MC
0.07 0.02 0.01 0.79 0.15 0.11 5.47 0.78 0.52 17.09 2.27 1.35 Matthias Brun
0.06 0.01 0.02 0.06 0.06 0.05 0.23 0.23 0.22 0.48 0.48 0.33  Miro Haller 🧟
0.03 0.01 0.01 0.07 0.06 0.05 0.21 0.20 0.21 0.46 0.38 0.34  Simon Yuan

The MC is defeated. And with him three other contestants.

Next, we'll try something more intuitive and realistic. How about the prime numbers?

:m + Data.List Data.Hashable
let multiples k b = [k * i | i <- [1..b`div`k + 1]]
let g i = map (\k -> (k, multiples k i)) [1..i]
let g1000 = g 1000
let g5000 = g 5000
let g10000 = g 10000
:set +s
hash $ GraphMin.graphMin g1000 [2..length g1000]
hash $ GraphMin.graphMin g5000 [2..length g5000]
hash $ GraphMin.graphMin g10000 [2..length g10000]

Note how the hash function is used to avoid measuring the time needed to print the results. The minimal elements of the above graphs are exactly the prime numbers (up to the bound b). Perhaps this is the best algorithm for computing primes?
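
As a quick sanity check on a small instance (using the reference graphMinSol from above together with the let-bindings from this script), the minimal elements are indeed the primes:

sort $ nub $ graphMinSol (g 30) [2..30] = [2,3,5,7,11,13,17,19,23,29]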

0.23 5.25 22.53 Loris Reiff
0.09 0.32 1.09  Roger Baumgartner
0.04 0.11 0.24  Moritz Knüsel
0.04 0.15 0.48  Mike Boss
TIMEOUT         Zur Shmaria
TIMEOUT         Antonio Russo ➖
TIMEOUT         Matthias Brun
0.04 0.18 0.45  Miro Haller 🧟
0.08 0.42 1.87  Simon Yuan

Three students gone. Making the divisibility graph even bigger (e.g., let g100000 = g 100000), while keeping the list size (hash $ GraphMin.graphMin g100000 [2..10000]), also makes Loris Reiff's and Roger Baumgartner's solutions time out.

Time for a final showdown. Let's go Big Data. In an intermediate submission Moritz Knüsel claims to have tested his solutions on graphs with 20 million edges. The divisibility graph with 2 million vertices has roughly 31 million edges.

:m + Data.List Data.Hashable
let multiples k b = [k * i | i <- [1..b`div`k + 1]]
let g i = map (\k -> (k, multiples k i)) [1..i]
let gx = g 2000000
:set +s
hash $ GraphMin.graphMin gx [30000,29998..10000]

And here are the final results:

13.84    Moritz Knüsel
TIMEOUT  Mike Boss
21.79    Miro Haller 🧟
TIMEOUT  Simon Yuan

Pretty amazing, huh? Note that those huge graphs are processed without any parallelization on an ordinary laptop.

Finally, let's briefly sample a few of the best solutions. They are a bit long, and the MC will (as usual) rely on the comments provided by the contestants.

Mike Boss uses Haskell's Data.Graph library:

module GraphMin where

{- Grade this (for real now)
    - Longer first part is just the conversion to type Graph.
      Problems with non-existing vertices and renaming are fixed.
        - notInG: all endpoints of edges which are not in the graph
        - allOfG: all vertices from the graph
          (even ones with no outgoing edges)
        - sButNotG: vertices which are in the set but not in the graph =>
          are minimum
        - sAndG: vertices in the set which are also in the graph
        - convertSet: fixes renaming
    - Second part is the actual algorithm:
      Deletes all vertices reachable from the vertices in the set
      which means they aren't minimums.
-}

import Data.Graph
import Data.Maybe
import qualified Data.IntSet as Set

graphMin :: [(Int,[Int])] -> [Int] -> [Int]
minimumInGraph :: Graph -> Set.IntSet -> Set.IntSet -> [Int]

graphMin graph set = [x | (x, _, _) <- result] ++ (Set.toList sButNotG)
  where
    realSet = Set.fromList set
    edges = Set.fromList (concat $ map snd graph)
    vertices = Set.fromList (map fst graph)

    notInG = Set.difference edges vertices
    allOfG = Set.union vertices notInG
    sButNotG = realSet Set.\\ allOfG
    sAndG = realSet Set.\\ sButNotG
    correctedGraph = [(s, []) | s <- (Set.toList notInG)] ++ graph

    (builtGraph, v, k) = graphFromEdges
      (map (\(a, b) -> (a, a, b)) correctedGraph)
    convertSet = Set.map (fromJust . k) sAndG

    result = map v (minimumInGraph builtGraph convertSet convertSet)

minimumInGraph graph set result
  | Set.null set = Set.toList result
  | Set.null result = []
  | Set.size set == 1 = Set.toList (result Set.\\ reachFromS)
  | otherwise = minimumInGraph graph (Set.delete head set)
      (result Set.\\ reachFromS)
  where
    reachFromS = Set.delete head
      (Set.fromList (reachable graph (head :: Int)))
    head = Set.findMin set

Miro Haller's corrected solution has a nice complexity analysis, which the MC didn't attempt to verify. First Miro has to fix the negative number problems, anyway.

-- I use containers- (

module GraphMin where

import Prelude hiding (fromList)
import Data.IntMap as IntMap
import Data.Sequence as Seq

-- I use IntMap because they have (almost) constant access time I think.
-- It depends on the number of bits the integers use.
-- Also I rely on the fact that the nodes are numbered in a more or less
-- ascending sequence of natural numbers
graphMin :: [(Int,[Int])] -> [Int] -> [Int]
graphMin graph subset = findUnreachable subset []
    where
        -- O(n)
        graphNodes = [node | (node, xs) <- graph]

        -- O(n)
        maxNodeNr = maximum (graphNodes ++ subset)

        -- Visited array O(n) (I assume n = O(maxNodeNr))
        visited = IntMap.fromList [(node, False) | node <- [0..maxNodeNr]]

        -- Convert out nodes lists to sequences O(n^2)
        graphSeq = [(node, Seq.fromList outNodes) | (node, outNodes) <- graph]
        -- Out edges O(n)
        outEdges = IntMap.fromList graphSeq

        -- True if node is in subset
        -- O(n)
        isInSubset = IntMap.fromList [(node, True) | node <- subset]

        -- Array with for each node nodes from which it is reachable
        -- O(n)
        reachableFromNodesInit = IntMap.fromList [(node, []) | node <- [0..maxNodeNr]]

        -- DFS like algorithm, write from node to reachable nodes
        -- O(n^2 * log(n) )
        dfsIter [] reachableFromNodes = reachableFromNodes
        dfsIter (subsetNode:subsetRest) reachableFromNodes = dfsIter subsetRest newReachableFromNodes
            where
                -- O(1)
                connectedNodes = findWithDefault Seq.Empty subsetNode outEdges

                -- O(n)
                newReachableFromNodes = dfsFindReachable connectedNodes reachableFromNodes visited
                    where
                        dfsFindReachable Seq.Empty reachableFromNodes visitedLoc = reachableFromNodes
                        dfsFindReachable todoNodesSeq reachableFromNodes visitedLoc
                            -- Cases checking O(1)
                            | findWithDefault False nextNode isInSubset = dfsFindReachable nodesRest extendedReachableFromNodes visitedNew
                            | findWithDefault False nextNode visitedLoc = dfsFindReachable nodesRest reachableFromNodes visitedNew
                            | otherwise = dfsFindReachable extendedListRest reachableFromNodes visitedNew
                                where
                                    updateFn key x = True
                                    visitedNew = adjustWithKey updateFn nextNode visitedLoc

                                    -- Both O(1) because it is the first entry
                                    nextNode = Seq.index todoNodesSeq 0
                                    nodesRest = Seq.deleteAt 0 todoNodesSeq

                                    -- O(1)
                                    extendedReachableFromNodes = IntMap.adjust (subsetNode :) nextNode reachableFromNodes

                                    -- O( log( min{n1, n2} ) ) where n1,n2 are the lengths of the sequences
                                    extendedListRest = (findWithDefault Seq.Empty nextNode outEdges) >< nodesRest

        -- Comp. stands above
        reachableFromNodes = dfsIter subset reachableFromNodesInit

        -- Find all nodes that are at most reachable from itself (through a path through nodes
        -- which are not in the subset S)
        -- O(n^2)
        findUnreachable [] unreachableNodes = unreachableNodes
        findUnreachable (subsetNode:subsetRest) unreachableNodes
            | hasOtherNode (findWithDefault [] subsetNode reachableFromNodes) = findUnreachable subsetRest unreachableNodes
            | otherwise = findUnreachable subsetRest (subsetNode : unreachableNodes)
                where
                    -- Check if a node is reachable from another node
                    -- O(n)
                    hasOtherNode [] = False
                    hasOtherNode (otherNode:listRest) =
                        if subsetNode == otherNode
                            then hasOtherNode listRest
                            else True

Finally, Moritz Knüsel has apparently found the right balance between using lazy and eager data structures. His solution also nicely handles edge cases separately.

{-# LANGUAGE BangPatterns #-}
module GraphMin where
import Data.List
import qualified Data.IntSet
import qualified Data.IntMap.Lazy

-- final submission

type LG = [(Int,[Int])]

graphMin :: LG -> [Int] -> [Int]
graphMin g s = graphMin11 g s

-- Basic Idea:
-- from every node in S, we walk the graph and find all elements in S that are
-- reachable from that particular Node, except the node itself,
-- since a minimal element may still have paths coming back to itself
-- the union of all nodes discovered like this is then subtracted from S,
-- yielding only the nodes that are not reachable from any other
-- This would obviously be way too slow, so, on a walk starting at a node v,
-- we save all nodes we've seen, except the ones that are on a path from v to v
-- these nodes are saved in a map, together with v,
-- so we can avoid walking that path again later
-- This collection of nodes and the map with shortcuts is
-- then passed along for the search starting at the next node in S

-- because of stack overflow problems some strictness annotations
-- have been added, which seemed to solve them

-- shortcuts for all the datastructures and functions for accessing them
type Set2 = Data.IntSet.IntSet
ins2 = Data.IntSet.insert
empty2 = Data.IntSet.empty
toLst2 = Data.IntSet.toList
fromLst2 = Data.IntSet.fromList
del2 = Data.IntSet.delete
isIn2 = Data.IntSet.member
union2 = Data.IntSet.union
size2 = Data.IntSet.size
diff2 = Data.IntSet.difference

type G2 = Data.IntMap.Lazy.IntMap [Int]
lgToG2 = Data.IntMap.Lazy.fromList
getEdges2 g v = Data.IntMap.Lazy.findWithDefault [] v g
type GoesToMap2 = Data.IntMap.Lazy.IntMap Int
emptyGTM2 = Data.IntMap.Lazy.empty
memberGTM2 = Data.IntMap.Lazy.member
getGTM2 = Data.IntMap.Lazy.findWithDefault 0
unionGTM2 = Data.IntMap.Lazy.union
fromSetGTM2 = Data.IntMap.Lazy.fromSet
mergeSetGTM2 gtm set v = unionGTM2 gtm (fromSetGTM2 (const v) set)

-- recursively look at all nodes reachable from a given node, recording the path taken
fun11 :: GoesToMap2 -> -- contains nodes that, in earlier searches,
                       -- were found to loop back to where the search started
  Set2 -> -- contains all nodes that go back to the value this search started at,
          -- needed in case of loop back paths with cycles in them
  G2 ->   -- the graph we're looking at
  Set2 -> -- the set S
  Set2 -> -- the set of points we've already visited. This set may
          -- contain nodes that are in lgtm, since these will only be
          -- removed later, so it's important to check lgtm first
  Set2 -> -- the nodes on the path we've come down
  Int ->  -- the current node
  Int ->  -- the node we started at
  (Set2,Set2) -- returns the set of nodes seen and a set of nodes that lead back to sval.
              -- If the current node is not on such a path, this will be empty
fun11 gtm lgtm g s !seen pset v sval
-- we've come back around, so we need to return the nodes we took to get here
  | sval == v && not (vInSeen) = (seen,vpset)
-- we've come back around, but sval is already known not to be minimal
  | sval == v && vInSeen       = (seen,empty2)
-- the current node is on a path that leads to sval, so we need to save how we got here
  | mlGTMv                     = (seen, vpset)
-- we've already visited this node at some point
  | vInSeen                    = (seen,empty2)
-- the current node is in S, we can't go further
  | isIn2 v s                  = (vseen,empty2)
-- the current node leads to an element in S, but we can skip going down the entire path
-- since all other nodes on that path are not in S, we don't need to add all of them to seen
  | mGTMv                      = (ins2 gGTMv vseen,empty2)
  | otherwise                  =
-- recurse on v's neighbors
      descend11 gtm lgtm g (getEdges2 g v) s vseen vpset sval
  where vInSeen = isIn2 v seen
        vpset = ins2 v pset
        vseen = ins2 v seen
        mGTMv = memberGTM2 v gtm
        mlGTMv = isIn2 v lgtm
        gGTMv = getGTM2 v gtm

-- helper function for the recursion
-- since, in the function above, we check lgtm before seen,
-- we can avoid removing elements from seen here
descend11 :: GoesToMap2 -> Set2 -> G2 -> [Int] -> Set2 -> Set2 -> Set2 -> Int -> (Set2,Set2)
descend11 _ lgtm _ [] _ seen  _ _ = (seen,lgtm)
descend11 gtm lgtm g (e:es) s !seen pset sval =
  let (nseen,npset) = fun11 gtm lgtm g s (seen) pset e sval in
    descend11 gtm (union2 lgtm npset) g es s nseen pset sval

-- searches on all elements in S
-- if a search discovered paths that go back to its starting node, all nodes on such paths
-- are taken out of seen (if we changed some guards around in fun11, this would not be necessary
-- but some quick tests indicate it helps if s is large and g is dense) and added
-- to the map
gmhelp11 :: G2 -> [Int] -> Set2 -> Set2 -> GoesToMap2 -> Set2
gmhelp11 g [] s seen gtm = seen
gmhelp11 g (e:es) s !seen gtm =
  let (nseen,npset) = descend11 gtm empty2 g (getEdges2 g e) s seen empty2 e in
    gmhelp11 g es s (diff2 nseen npset) (mergeSetGTM2 gtm npset e)

-- transforms the graph and S into the datastructures used above, calls gmhelp11 and
-- subtracts the result from S
graphMin11 :: LG -> [Int] -> [Int]
-- get trivial cases out of the way right away
graphMin11 _ [] = []
graphMin11 _ [x] = [x]
graphMin11 [] s = s
graphMin11 g s = toLst2 $ diff2 s' (gmhelp11 (lgToG2 g) s s' empty2 emptyGTM2)
  where s' = fromLst2 s

Oh, and the heavy minuses translate into a one-point deduction (truncated at 0).

What have we learned?

Task of week 4

A string s is a subsequence of another string t if one can obtain s by removing characters from t. For example: The string "foo" is a subsequence of "fofo", "frohlocken", and "funktionale Programmierung is einfach toll". It is not a subsequence of "ofo", "FoO", "oooo", and "tofu".
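
To make the relation precise, here is a direct transcription of the definition into a helper function (call it isSubseq; recent versions of Data.List also provide isSubsequenceOf). This is only the specification, deliberately ignoring the efficiency aspect of the task:

isSubseq :: String -> String -> Bool
isSubseq [] _ = True
isSubseq _ [] = False
isSubseq (x:xs) (y:ys)
  | x == y    = isSubseq xs ys
  | otherwise = isSubseq (x:xs) ys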

A list of strings xs is subsumed by another list of strings ys if for each string s from xs, there exists a string t in ys such that s is a subsequence of t. Moreover, you may not use strings from ys more than once (as witnesses for the above "exists").

Write a function subsub :: [String] -> [String] -> Bool that checks whether its first argument is subsumed by its second argument.

All following examples should evaluate to True:

subsub ["a", "ab", "abc"] ["false", "bab", "cdb", "cabAdAbc", "true"]
subsub ["FM", "FP"] ["Ferrovia Mesolcinese", "Microsoft FrontPage"]
subsub ["FP", "FM"] ["Ferrovia Mesolcinese", "Microsoft FrontPage"]
subsub ["1+2=2"] ["12+12=24"]
not $ subsub ["FM", "FP"] ["FMFP"]
subsub ["FM", "FP"] ["FMFP", "FMFP"]
subsub ["FM", "FM"] ["FM", "Ferrovia Mesolcinese"]
not $ subsub ["FM", "FM"] ["FM"]

Evaluation criterion: efficiency. Your solution will be tested with short and long lists and short and long strings.

Results of week 4

Top 10 of week 4
  # name   ranking   points
1. ? 0
2. ? 0
3. ? 0
4. Your name could be here ? 0
5. ? 0
6. ? 0
7. ? 0
8. ? 0
9. ? 0
10. ? 0

Task of week 5

The MC loves regular expressions. But sometimes he manages to insert sputid tysop in his text that cause a later regex search to fail. In this task, your goal is to develop a regex matcher that is immune against scrambled letters.

We consider the following algebraic datatype (wait for the lecture on Tuesday, where those will be introduced) of regular expressions that should be familiar from a Theoretical Computer Science course:

data RE = Zero          -- Zero matches nothing
        | One           -- One matches only the word []
        | Atom Char     -- Atom c matches only the word [c]
        | Plus RE RE    -- Plus r s matches all words matched by r or s
        | Concat RE RE  -- Concat r s matches all words w = u ++ v
                        -- such that u is matched by r and v is matched by s
        | Star RE       -- Star r matches [] and all words w = u ++ v
                        -- such that u is matched by r and v by Star r

Hence, for a standard definition of match :: RE -> String -> Bool we would have:

match (Star (Atom 'a' `Concat` Atom 'b')) ""
match (Star (Atom 'a' `Concat` Atom 'b')) "ab"
match (Star (Atom 'a' `Concat` Atom 'b')) "abab"
match (Star (Atom 'a' `Concat` Atom 'b')) "ababababab"
not $ match (Star (Atom 'a' `Concat` Atom 'b')) "a"
not $ match (Star (Atom 'a' `Concat` Atom 'b')) "abba"
not $ match (Star (Atom 'a' `Concat` Atom 'b')) "baba"
not $ match (Star (Atom 'a' `Concat` Atom 'b')) "ababa"
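
For reference, one textbook way to obtain such a match is via Brzozowski derivatives; this sketch only pins down the standard semantics and is not part of the task (dyslexicMatch below is):

nullable :: RE -> Bool
nullable Zero         = False
nullable One          = True
nullable (Atom _)     = False
nullable (Plus r s)   = nullable r || nullable s
nullable (Concat r s) = nullable r && nullable s
nullable (Star _)     = True

deriv :: Char -> RE -> RE
deriv _ Zero          = Zero
deriv _ One           = Zero
deriv c (Atom d)      = if c == d then One else Zero
deriv c (Plus r s)    = Plus (deriv c r) (deriv c s)
deriv c (Concat r s)
  | nullable r        = Plus (Concat (deriv c r) s) (deriv c s)
  | otherwise         = Concat (deriv c r) s
deriv c (Star r)      = Concat (deriv c r) (Star r)

match :: RE -> String -> Bool
match r = nullable . foldl (flip deriv) r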

However, here we are not interested in the standard match, but rather in a dyslexicMatch :: RE -> String -> Bool function that accepts a string s exactly if there is a permutation t of s (of the same length, with the same multiplicities for duplicated characters) that would be accepted by the usual match function. For example, all the following should yield True:

dyslexicMatch (Star (Atom 'a' `Concat` Atom 'b')) ""
dyslexicMatch (Star (Atom 'a' `Concat` Atom 'b')) "ab"
dyslexicMatch (Star (Atom 'a' `Concat` Atom 'b')) "abab"
dyslexicMatch (Star (Atom 'a' `Concat` Atom 'b')) "ababababab"
not $ dyslexicMatch (Star (Atom 'a' `Concat` Atom 'b')) "a"
dyslexicMatch (Star (Atom 'a' `Concat` Atom 'b')) "abba"     -- consider "abab"
dyslexicMatch (Star (Atom 'a' `Concat` Atom 'b')) "baba"     -- consider "abab"
not $ dyslexicMatch (Star (Atom 'a' `Concat` Atom 'b')) "ababa"

Evaluation criterion: feficiency. Your solution will be tested with short and long strings and simple and deeply nested regular expressions.

Results of week 5

Top 10 of week 5
  # name   ranking   points
1. ? 0
2. ? 0
3. ? 0
4. or here ? 0
5. ? 0
6. ? 0
7. ? 0
8. ? 0
9. ? 0
10. ? 0