## Functional Programming Competition 2018

#### part of the Formal Methods and Functional Programming (FMFP) course at ETH Zürich

Every week a special Haskell programming exercise is published here and in Code Expert. Each exercise is carefully selected by the *Master of Competition* (*MC*)—Dmitriy Traytel—to separate the wheat from the chaff. After 7 weeks, the intersection of the wheat with the FMFP students will receive delicious prizes in a in-lecture ceremony.

Participants submit their solutions via Code Expert. Submissions by mail are accepted only in two exceptional cases: (1) the participant is not enrolled at ETH and therefore cannot use Code Expert, and (2) the solution uses a Haskell library/pragma that is not available on Code Expert. In case (2) precise instructions of how to run the submission (including version numbers) are a necessary requirement for the solution to be evaluated.

Every week, MC's favorite submissions are discussed in a blog post. The submissions are evaluated with respect to different criteria (but the function's correctness is always a must have), which are announced along with the exercise. The best 10 solutions receive from 1 to 10 points (more is better). The points and the names of the participants are made public in a *Top 10 of the week* and aggregated in a *semester ranking*.

**Important**: If you don't want your name to appear on a publicly visible
webpage (but still want to participate),
write to the MC and suggest a pseudonym.

Finally, the MC employs a zero tolerance policy against plagiarism. Please compete fairly and individually. Submissions that have been developed jointly by two contestants have to be clearly marked as such. Violations of those rules may result in arbitrary point deductions.

Outline:

- Semester ranking
- The Task and Results of Week 1
- The Task and Results of Week 2
- The Task and Results of Week 3
- The Task and Results of Week 4
- The Task and Results of Week 5

Semester ranking (after 3 weeks) | ||
---|---|---|

# | name | points |

1. | Moritz Knüsel 🎩 | 27.0 |

Simon Yuan 📐 | 27.0 | |

3. | Matthias Brun | 25.0 |

4. | Mike Boss | 22.0 |

5. | Roger Baumgartner | 17.0 |

Loris Reiff 🏆 | 17.0 | |

4. | Gabriel Giamarchi | 14.0 |

Chris Buob | 13.0 | |

9. | Tynan Richards | 9.0 |

10. | Börge Scheel | 7.0 |

11. | Antonio Russo | 6.0 |

12. | Noah Berner | 5.0 |

Zur Shmaria | 5.0 | |

14. | Giuseppe Arcuti | 3.0 |

Lukas Baege | 3.0 | |

Andrin Bertschi | 3.0 | |

Miro Haller 🧟 | 3.0 | |

Lukas Heimes | 3.0 | |

Timon Jenne | 3.0 | |

Nikola Kovacevic | 3.0 | |

Andreas Kuster | 3.0 | |

Sandar Felicity Lim | 3.0 | |

Manuel Nowack | 3.0 | |

Felix Sarnthein-Lotichius | 3.0 | |

Roberto Starc | 3.0 | |

Elwin Stephan | 3.0 | |

Valentin Stoppiello | 3.0 | |

Jan Wiegner | 3.0 | |

29. | Jorel Elmiger | 1.0 |

### Task of week 1

Write a function `oddSum`

that takes five `Int`

arguments as input and computes the sum of the maximum, the minimum, and the median of the arguments.

For example:

oddSum 1 2 3 4 5 = 1 + 3 + 5 = 9 oddSum 5 4 1 3 2 = 1 + 3 + 5 = 9 oddSum 1 1 5 3 3 = 1 + 3 + 5 = 9

The shortest (in terms of number of tokens^{1}) solution wins! Library imports and type signatures are excluded from the token count. As a measure of reference: the MC has an optimized (but not too crazily optimized) 27 token solution.

^{1. This competition continues a long-standing tradition of in-lecture functional programming competitions established at TU München. The definition of tokens is borrowed form there.}

### Results of week 1

Here are the best

Top 10 of week 1 | |||
---|---|---|---|

# | name | #tokens | points |

1. | Matthias Brun | 17 | 10 |

2. | Loris Reiff | 19 | 9 |

3. | Simon Yuan | 26 | 8 |

4. | Roger Baumgartner | 27 | 7 |

Moritz Knüsel | 27 | 7 | |

Börge Scheel | 27 | 7 | |

7. | Gabriel Giamarchi | 29 | 4 |

8. | Giuseppe Arcuti | 31 | 3 |

Lukas Baege | 31 | 3 | |

Noah Berner | 31 | 3 | |

Andrin Bertschi | 31 | 3 | |

Mike Boss | 31 | 3 | |

Chris Buob | 31 | 3 | |

Miro Haller | 31 | 3 | |

Lukas Heimes | 31 | 3 | |

Timon Jenne | 31 | 3 | |

Nikola Kovacevic | 31 | 3 | |

Andreas Kuster | 31 | 3 | |

Sandar Felicity Lim | 31 | 3 | |

Manuel Nowack | 31 | 3 | |

Tynan Richards | 31 | 3 | |

Felix Sarnthein-Lotichius | 31 | 3 | |

Roberto Starc | 31 | 3 | |

Elwin Stephan | 31 | 3 | |

Valentin Stoppiello | 31 | 3 | |

Jan Wiegner | 31 | 3 |

The MC is happy about 104 student participants in this first week of the competition (+ 1 tutor + 1 professor + the MC). Several participants have submitted more than one solution without any comments (such as `-- grade me`

), leaving the MC the choice to pick the worst one in terms of tokens. However, the MC was so happy about the fact that every participant has submitted at least one correct solution (determined by the MC by a mixture of testing and manual inspection), that he decided to take the best submission in terms of tokens into account.

Before we start looking at the best solutions, let's see what one can do with what we've seen in the lecture so far. The following solution is MC's attempt to only use the material from the lecture and also one of the submissions by our number 3, Simon Yuan (and a few others, who wasted tokens by adding some needless parentheses or the like). With 58 tokens it is far away from being Simon's or the MC's best solution and even far away from the top 10.

oddSum a b c d e | a > b = oddSum b a c d e | b > c = oddSum a c b d e | c > d = oddSum a b d c e | d > e = oddSum a b c e d | otherwise = a + c + e

To understand how this solution works, one has to realize that the guards are matched *sequentially*: i.e., the program will only check `c>d`

after having established `a≤b`

and `b≤c`

. With this insight it is pretty obvious what this code does: the last trivial guard `otherwise`

is only reached once `a b c d e`

is sorted.

Now we have a baseline to compete against. It is fairly obvious, that sorting should be part of any successful solution. Can we sort the five elements in Haskell, without implementing our own sorting algorithm? Of course: we can just use the right libary function. To use this function we must import
the library `Data.List`

, which contains many useful functions about the Haskell list type (more on this next week in the lecture; what you see below should be mostly intuitive usage of finite sequences of `Int`

elements). But, hey, imports cost 0 tokens!

Once the list is sorted, we have to think how to extract the minimal, maximal, and median element. Many solutions, including the following one by an anonymous professor, decided for an index-based addressing, written `list!!index`

in Haskell.

oddSum a b c d e = let xs = sort [a,b,c,d,e] in xs !! 0 + xs !! 2 + xs !! 4

This gives 34 tokens (or 33 when using `where`

) and brings us closer to the top 10.

To get into the top 10, this time it was enough to access the first and last elements of the sorted list using library functions (`head`

and `last`

). Note that `head xs`

are two tokens, while `xs!!0`

are three. Alternatively, but less efficiently, one can extract the `minimum`

or the `maximum`

of the (already sorted!) list. Alas, all 31-token solutions, including one submitted by a tutor who likes applicative functors (more on this much later in the lecture), are variations of the following two.

import Data.List oddSum a b c d e = maximum xs + minimum xs + xs !! 2 where lst = sort [a,b,c,d,e] oddSum' a b c d e = head xs + last xs + xs !! 2 where lst = sort [a,b,c,d,e]

**Side remark 1:** Some participants mistook tokens for characters, and tried to minimize the number of characters used in their program. Readability suffered from this quite a bit, e.g., `oddSum a b c d e=xs!!0+xs!!2+xs!!4 where xs=sort[a,b,c,d,e]`

. Useful advice: leave the reader of your program some air to breathe. It is better to kill needless parentheses than spaces (also in terms of tokens).

All solutions further up in the ranking use techniques other than indexing to extract the three elements from the sorted list. Gabriel Giamarchi uses the following trick (omitting the `Data.List`

import):

oddSum a b c d e = sum ((subsequences (sort [a,b,c,d,e]))!!21)

Before we talk about the semantics lets see if syntactically this solutions does its best in terms of tokens. No! Prefix function application **always** binds stronger than any infix operator. That is, `f a b c d !! g x y z`

is always parsed as `(f a b c d) !! (g x y z)`

.
Have I already mentioned what to do with needless parentheses? Don't get me wrong: Gabriel did a great job by using `subsequences`

in the first place. But those two parentheses around `subsequences ...`

cost him three competition points, which in the end might decide between Lindt and Migros Budget.

So how does this solution work? The library function `subsequences`

computes the list of subsequences (i.e., a list whose elements are lists themselves) of the input list, where a list `xs`

is a subsequence of another list `ys`

if one can obtain `xs`

by removing zero or more elements from `ys`

(but without reordering the elements). For example: `subsequences [1,2,3] = [[],[1],[2],[1,2],[3],[1,3],[2,3],[1,2,3]]`

. Now, when applied to a list with five elements, the result of `subsequences`

will contain the subsequence consisting from the first, middle, and last element in the 22nd position (i.e., accessible using the 0-based index 21). This holds regardless of the list elements' types or values and is exploited here. Finally, the library function `sum`

sums up the elements of this subsequence, which is exactly what is asked for. BTW: you might ask: is computing subsequences the most efficient way to do this? The MC answers: certainly not, but who cares? It is all about tokens.

The best solutions of Moritz Knüsel and Simon Yuan are variants of this idea, which are more economic about tokens. Here is for example Simon's solution which came in third with 26 tokens:

oddSum a b c d e = sum $ subsequences (sort [a,b,c,d,e]) !! 21

Using the infix operator `$`

is a useful trick to save tokens (and sometimes also to improve readability). It is nothing else than function application (`f $ x = f x`

), however, the low precedence of this infix operator helps to avoid parentheses: instead of `f (g x)`

we can write `f $ g x`

(or `f $ g !! i`

instead of `f (g !! i)`

).

Let's look at two other 27 tokens solutions. The first one was found by the two students Börge Scheel and Roger Baumgartner. The second is the MC's best solution.

oddSum a b c d e = sum $ map head $ chunksOf 2 $ sort [a, b, c, d, e] oddSum a b c d e = sum $ scanr (-) 0 $ sort [a, b, c, d, e]

The first solution works by sorting and then using the library function `chunksOf`

, which splits a list into shorter ones (of length at most 2 here as indicated by the passed argument). We have `chunksOf 2 [1,2,3,4,5] = [[1,2],[3,4],[5]]`

. The next applied function is the important `map`

function, which we will see again in the course. It applies its first function argument (yes, in Haskell functions can be passed around as arguments—map is called a higher-order function because it takes a function as an argument; more on higher-order functions in two weeks in the lecture) to every element of the list passed as its second argument. For example: `map head [[1,2],[3,4],[5]] = [head [1,2], head [3,4], head [5]] = [1,3,5]`

.

The second solution uses an even more complicated higher-order function than `map`

: `scanr`

. In our example, it works as follows: `scanr (-) 0 [1,2,3,4,5]`

becomes

[1 - (2 - (3 - (4 - (5 - 0))) , 2 - (3 - (4 - (5 - 0))) , 3 - (4 - (5 - 0)) , 4 - (5 - 0) , 5 - 0 , 0] =

or written differently

[1 - 2 + 3 - 4 + 5 - 0 , 2 - 3 + 4 - 5 + 0 , 3 - 4 + 5 - 0 , 4 - 5 + 0 , 5 - 0 , 0] =

Reading this column-wise, we observe that after `sum`

ming this list the elements in the even positions (2 and 4) cancel out, whereas the elements in the odd positions (1, 3, and 5) survive this telescoping.

**Side remark 2:** When writing this explanation, the MC noticed that it is possible to use a very similar function `scanr1 (-)`

instead of `scanr (-) 0`

to save one token. But unfortunately, we are past the deadline now.

All these tricks are nice, but they are still about 10 tokens away from the two best solutions. The latter came up with a way to avoid writing the list `[a,b,c,d,e]`

or even the five arguments to `oddSum`

. Here is Matthias Brun's winning entry with all the imports and a nice comment describing how the solution works:

import Data.List (sort, subsequences) import Text.Printf (printf) import Data.Composition ((.::.)) {- - This solution works by computing the power set (subsequences) of the sorted - list containing the arguments. The 22nd element (i.e. index 21) of the power - set (as generated by 'subsequences') is always the one containing the - minimum, median and maximum (for a sorted list of 5 values), so we take its - sum. The 'read .::. printf "[%d,%d,%d,%d,%d]"' part simply reads in the - arguments and transforms them into a list in a way that uses few tokens. -} oddSum = sum . (!! 21) . subsequences . sort . read .::. printf "[%d,%d,%d,%d,%d]"

The MC is amazed. This is pointfree functional programming at its best. (The pointfree style is often critisized for obfuscating the program. One particular problem is that it might be hard to see how many arguments a function takes. But it is the best way to save tokens. And in this, case the solution is fairly easy to read.) To the above explanation, the MC only want to add some comments on the operators `.`

and the funny `.::.`

, which for some reason makes the MC think of Space Invaders. Both denote function composition. We have:

(f . g) x1 = f (g x1) (f .::. g) x1 x2 x3 x4 x5 = f (g x1 x2 x3 x4 x5)

Hence, the above solution is equivalent to:

oddSum a b c d e = sum ((!! 21) (subsequences (sort (read (printf "[%d,%d,%d,%d,%d]" a b c d e)))))

The `sum ((!! 21) (subsequences (sort ...)))`

part looks familiar. We've seen the function `read`

, which converts `String`

s to values of different types (or complains with a parse error) in the first exercise class. The type of `printf`

is to complex to explain at this point; however when being used the function behaves just as its homonymous C ancestor (i.e., the number of arguments it takes depends on the formatting string).

The second best solution by Loris Reiff is quite similar (also using `printf`

and `.::.`

), except that it uses a slightly less token-efficient way to extract the first, middle, and last argument from a five-element list.

The MC could not resist the temptation to look at some incorrect submissions: a glimpse into the abyss in a sense. There, one meets participants who tried to sacrifice everything for tokens: even correctness. Some omitted the sorting or did it half-heartedly:

oddSum a b c d e = head xs + last xs + xs!!2 where xs = [a, b, c, d, e] oddSum a b c d e = c + min a (min b (min c (min d e))) + max a (max b (max c (max d e)))

Others tried out mysterious ways:

oddSum a b c d e = l!!0 + l!!2 + l!!4 where l = [ x | x <- [1..], elem x [a,b,c,d,e]] oddSum a b c d e = l!!0 + l!!2 + l!!4 where l = [ x | x <- [-1000..], elem x [a,b,c,d,e]]

Fortunately, all of them found the path of the righteous (wo)man in the end.

And in case you wonder: the longest correct solution has 302 tokens and is, well, a bit repetitive.

min2 a b c d e | minimum [a,b,c,d,e] == a = minimum [b,c,d,e] | minimum [a,b,c,d,e] == b = minimum [a,c,d,e] | minimum [a,b,c,d,e] == c = minimum [a,b,d,e] | minimum [a,b,c,d,e] == d = minimum [a,b,c,e] | minimum [a,b,c,d,e] == e = minimum [a,b,c,d] max2 a b c d e | maximum [a,b,c,d,e] == a = maximum [b,c,d,e] | maximum [a,b,c,d,e] == b = maximum [a,c,d,e] | maximum [a,b,c,d,e] == c = maximum [a,b,d,e] | maximum [a,b,c,d,e] == d = maximum [a,b,c,e] | maximum [a,b,c,d,e] == e = maximum [a,b,c,d] oddSum a b c d e = a + b + c + d + e - min2 a b c d e - max2 a b c d e

Wrapping up: what have we learned this week?

- Conciseness should not be mistaken for understandability or efficiency. But conciseness is somewhat easier to measure than efficiency, and a lot easier to measure than understandability. That is why it is the perfect criterion for the competition. (But we'll come to efficiency eventually.)
- Tokens are not characters. Kill needless parentheses. Keep useful spaces.
- Use lists to handle many elements uniformly.
- Sometimes it is helpful to structure your program in subtasks. Here: assemble a list, sort, and extract the result. Optimize each subtask separately.

### Task of week 2

Lists have still not been spotted in the lecture so far. So let's do some number theory instead.

The *digital product* of a number `n`

is the product of its
(base `10`

) digits. The computation of the digital product can be iterated until a single digit is obtained. The single digit is called the (*multiplicative*) *digital root*, whereas the number of iterations needed is called the (*multiplicative*) *persistence*.

For example:

999 has digital product 9 * 9 * 9 = 729 729 has digital product 7 * 2 * 9 = 126 126 has digital product 1 * 2 * 6 = 12 12 has digital product 1 * 2 = 2

Thus the digital root of `999`

is `2`

and the persistence is `4`

(as we had to compute four digital products).

Your task is to implement the function `persistenceRoot :: Integer -> (Integer, Integer)`

that computes both the persistence (first component) and the digital root (second component). E.g., `persistenceRoot 999 = (4,2)`

. (For a change we want to use Haskell's `Integer`

type of unbounded integer numbers instead of the machine integers `Int`

.)

As last week, the MC is looking for the shortest solution token-wise. (Next week we will move to some other criterion.) His best attempt needs 30 tokens. Just beat it!

### Results of week 2

Here is the overdue top 10 for week 2. This week 66 students have submitted a solution.

Top 10 of week 2 | |||
---|---|---|---|

# | name | #tokens | points |

1. | Mike Boss | 19 | 10 |

Matthias Brun | 19 | 10 | |

Chris Buob | 19 | 10 | |

Gabriel Giamarchi | 19 | 10 | |

Moritz Knüsel 🎩 | 19 | 10 | |

Simon Yuan 📐 | 19 | 10 | |

7. | Tynan Richards | 21 | 4 |

8. | Roger Baumgartner | 24 | 3 |

9. | Loris Reiff 🏆 | 27 | 2 |

Antonio Russo | 27 | 2 |

You may wonder about the emojis used above. The MC will explain. But first let us look at a solution that reflects the state of the lecture at the submission date. It was submitted by the still anonymous professor with the comment "I didn’t try to be clever." and counts 57 tokens.

persistenceRoot x = aux 0 x aux n x | x < 10 = (n,x) | otherwise = aux (n+1) (dp x) dp x | x < 10 = x | otherwise = (dp (x `div` 10)) * (x `mod` 10)

Without trying to be clever, there are some obvious further token-optimizations possible in the above solution. (E.g., what have we learned last week about needless parentheses?) Altogether this might save around 10–15 tokens, which would be still miles away from the top 10.

To get into the top 10, we should rather try to be clever. Being clever for the whole problem at once may cause some headaches, so lets
look at a subproblem first: computing the digital product (i.e., what the above `dp`

function computes). Using `div`

and `mod`

is a natural way to approach this subproblem. Another natural approach to get the digits of a number `x`

is
to print the number: `show x`

. Well, almost. This only gives one a list of `Char`

s, which we would want to convert to a
list of `Integer`

s. Unfortunately, it seems that there is no function to do this conversion directly, i.e., we need to take the
detour over `Int`

s: `map toInteger (map digitToInt (show x))`

. Finally, to compute the product of the digits we can
conveniently use the `product`

function. Making all of this pointfree, we obtain the digital product function with 9 tokens:

product . map toInteger . map digitToInt . show

This expression or a slight variation of it occurs in solutions submitted by Roger Baumgartner and Antonio Russo as well as in MC's own 30 tokens solution. It is good, but not exactly the stuff dreams are made from. The stuff dreams are made from for this exercise could be found on Hackage in the Data.Digits module: the `digits`

function gives one the digits of a number with respect to a given base. Hence, one digital product costs only 4 tokens as noticed by everyone else from the top 10:

product . digits 10

But one digital product is not enough. We need to iterate it and count the number of iterations until we reach a single digits.

Until, hmm `until`

... Here is the approach of the MC:

persistenceRoot = flip curry 0 $ until ((<10) . snd) $ (+1) *** product . map read . map return . show

Or equivalently written in a bit saner and a bit less pointfree fashion:

persistenceRoot x = until ((<10) . snd) ((+1) *** product . map read . map return . show) (0, x)

The function `until`

is another perfect match for this exercise: `until test f x`

computes the first `y = f (... f (f x) ...)`

for which `test y`

is satisfied. The MC is a bit embarrassed to have wasted 5 tokens on the digital product and 3 tokens on the `(+1)`

functions, which exists in the `Prelude`

under the name `succ`

.

Tynan Richards was more careful and used a bunch of funky functions from somewhat exotic modules. And he wrote a nice explanation that speaks for itself. Interestingly, Tynan's solution is actually recursive, i.e., he did **not** use a combinator (a la `until`

) to iterate digital products.

-- The online IDE fails to import these packages import Data.Digits (digits) import Data.Tuple.Update (upd1) import Data.Tuple.Extra (dupe, first) import Data.Bool.Extras (bool) import GHC.Integer (leInteger) {- - The pointfree style makes it hard to see exactly what this function does, so - I'll briefly try to explain... - 'bool' is a function that works like an if-then-else statement. It checks the - value of 'leInteger 10', which is equivalent to '(<=) 10'. If that condition - is false, then the input is smaller than 10 and we have the digital root, as - well as persistence 0, so we return the value calculated by 'upd1 0 . dupe'. - 'dupe x' gives the tuple (x, x), 'upd1 0 .' sets the first value of that - tuple to 0, giving (0, x). - If the input is greater than 10, we instead return the value calculated by - 'first succ . persistenceRoot . product . digits 10', which recursively - calculates the digital product ('persistenceRoot . product . digits 10'), - then adds one to the first element of the tuple ('first succ') when it - returns. - The <*> operator applies the function's argument to each of the expressions - of bool, which enables the complete lack of parentheses. -} persistenceRoot :: Integer -> (Integer, Integer) persistenceRoot = bool . upd1 0 . dupe <*> first succ . persistenceRoot . product . digits 10 <*> leInteger 10

Lets finally look at the winning solutions. Five of the six were slight variations of the following one by Simon Yuan. Again, it is beautifully documented.

import Data.List import Data.Bifunctor import Data.Digits --19 Token persistenceRoot :: Integer -> (Integer, Integer) {- The code explained: "product . digits 10" calculates the digital product (lets call this function dp). digits 10 n is evaluated to a Integral list, with the digits (in base 10) of n as its elements. product then takes to product of all the elements in that list. "iterate dp n" is evaluated to [n, dp n, dp (dp n), ...] This list contains the input n as well as all the intermediate steps. "break (<10)" splits the list into 2, it is a tuple of 2 lists. The breakpoint is at the first element for which (<10) is true. As soon as an element x is less than 10 (x is the first 1 digit number), x will be equal to dp x. Hence x is the digital root. "bimap genericLength head (a,b)" is evaluated to (length a, head b) The length of a is how many intermediate steps it took to reach a 1 digit number (= persistence). The head of b (or any element of b) is the digital root. To save 2 tokens, make the function pointfree by omitting the input twice. -} persistenceRoot = bimap genericLength head . break (<10) . iterate (product . digits 10)

Above `genericLength`

is used instead of `length`

to obtain an `Integer`

rather than an `Int`

. The `iterate`

function deserves special attention: yes `iterate f x`

just iterates the function `f`

as in `[x, f x, f (f x), ...]`

. Remarkably, the iteration never stops, i.e., the output of `iterate f x`

is unconditionally an infinite list (more on this in a few weeks in the course). The lazy evaluation strategy of Haskell saves us from waiting infinitely long for it to finish. In the following computation we only need to observe a finite prefix of the infinite list.

The remaining 19 token solution by Gabriel Giamarchi is somewhat different:

import Data.Digits import Data.Tuple {- - The idea is to avoid costly (in tokens) termination criteria in recursive-functions. - We use an easily seen property, namely that the digital persistence of x is smaller or equal to x (Except for zero !!). - This allows to write the following function, where we zip from the right so that the minimum (lexicographical probably) selects the correct tuple. - Luckily this also works for zero :-) -} persistenceRoot :: Integer -> (Integer, Integer) persistenceRoot x = swap $ minimum $ iterate (product . digits 10) x `zip` enumFromTo 0 x

This one beautifully relies on a mathematical property of the persistence (to prove the above "easily seen" property, prove first that the digital product of a number is smaller that the number for numbers > 10) and the default lexicographic ordering (i.e., `(a,b) ≤ (c,d) = a < c || a = c && b ≤ d`

) on pairs in Haskell. Moreover, it is not even pointfree, which usually hints that it is possible to beat it in terms of tokens. And indeed, here is MC's rewrite of Gabriel's solution that saves 1 token:

import Data.Digits import Data.Tuple import Control.Arrow import Control.Monad persistenceRoot = swap <<< minimum <<< zip `liftM2` iterate (product . digits 10) `arr` enumFromTo 0

Here, `<<<`

is another operator for composition `.`

(with precedence 1, whereas `.`

has 9) and `arr`

is another operator for application `$`

(with precedence 9 as any function written in backquotes, whereas `$`

has 0). Finally, the general function `liftM2`

, whose type cannot be explained at this point, behaves in this particular instance as `liftM2 z f g x = z (f x) (g x)`

.

Great, now to the emojis. They can be considered as special awards for a distinguished achievement in a particular exercise. The precise meaning and value (e.g., in terms of Schoggi) of these awards is unclear yet (even to the MC), but hey, now you have a nice emoji stuck to your name.

First of all, Simon Yuan receives the mathematician award 📐. One of his early submissions read `... . take 100000 . iterate ...`

and he wrote lengthy justifications why this is safe:

If you are interested to read, I have also done some research ( you really dont have to read this ;) ): It is conjectured that 11 is the max persistence for base 10. 277777788888899 is the first number with persistence 11, and they have tested it up to 10^233. I actually tried to proof that 100000 is way more than enough by proving an upper limit directly (i.e. I tried to prove the conjecture (before I knew it was an conjecture) but couldnt really get anywhere. Then I tried to come up with a lower bound of the number of digits needed for it to have persistence at least p. ... [something about the number of atoms in the universe] ...

The MC had advised Simon not to use this program and not to rely on the size of the universe. Clearly, we want to write programs (especially in this competition) that will work forever and under all circumstances. And as we know the universe is expanding. Incidentally, taking x first elements of that list when computing the persistence of x (instead of the fixed 100000 first elements) will always suffice—this was also exploited in Gabriel's solution.

The second award goes to Moritz Knüsel. He is a true black hat 🎩 as his 13 token solution demonstrates (comments: his, censorship: MC's).

import System.Process import System.IO.Unsafe import Text.Printf {- If this isn't allowed, please disregard this submission -} persistenceRoot :: Integer -> (Integer, Integer) {- very slow, gets killed after completing the last test, also using ghci incurs an extra token, because it prints extra stuff unless told not to persistenceRoot = read . unsafePerformIO . readCreateProcess (shell "ghci -v0") . printf "\ \import Data.List\nimport Control.Arrow\n\ \genericLength *** head $ span (>=10) $ iterate (product <<< fmap read <<< fmap return <<< show) %d\n" -} -- this goes faster, but kinda defeats the purpose of a -- haskell programming exercise persistenceRoot = read . unsafePerformIO . readProcess "bc" [] . printf "\ \define digprod (x) {\n\ \if (x==0) return 1\n\ \return (x %% 10)*digprod(x/10)\n\ \}\n\ \define droot (x) {\n\ \if (x >= 10) return droot(digprod(x))\n\ \return x\n\ \}\n\ \define drootcnt (x) {\n\ \if (x >= 10) return 1+drootcnt(digprod(x))\n\ \return 0\n\ \}\n\ \define void printit(x) {\n\ \print \"(\", drootcnt(x), \",\" ,droot(x),\")\n\"\ \}\n\ \printit(%d)\n"

In principle, Moritz would have won this exercise, if it was not for the censored "back door" he used. To Moritz' defense: this was not explicitly prohibited in the rules (but of course in the unspoken rules of the competition) and Moritz was already regretting his usage of such an evil and unportable function as his first comment indicates. *Challenge for everybody:* can you write a solution in such a style (i.e., using an external interpreter) without using the back door? For this to work, the interpreter would need to be pure (i.e., a side effect free calculator) and expressive enough to computer the digital root and persistence. If something like that is possible, the MC has to redefine the notion of tokens to take the length of a literal string into account.

The last award goes to Loris Reiff. He receives the fairness award 🏆 not for a solution he submitted (his was rather unspectacular), but rather for a solution that he has not submitted:

Btw, today I was looking for solutions online for some inspiration for next time. I found one on Rosetta Code which only uses 21 tokens. This is obviously not a submission, since copying is no fun ;) [and I've already submitted anyway] but it might be something for the blog (if no one has submitted this solution or a better one). 21 m=(g*h).s(>9).i(p.d1) -- slightly adapted from https://rosettacode.org/wiki/Digital_root/Multiplicative_digital_root#Haskell -- import Data.List import Data.Digits import Control.Arrow mpmdr :: Integer -> (Integer, Integer) mpmdr = (genericLength *** head) . span (> 9) . iterate (product . digits 10)

The MC feels guilty. His task is to come up with exercises that are original enough not to be on Rosetta code. It is surprising that even the type signature matches. Fortunately, the solution there is not token-optimal, yet it is uncomfortably close to the the optimum. The MC promises to research such "related work" better next time.

Lessons of the week, apart from that the MC should do his related work research properly, are:

- To get to the top 10 one typically has to try to be clever.
- Walk down the right back alley on Hackage and you can find anything.
- Include a comment describing your implementation to make MC's job easy and the MC happy. But be careful: the comment might end up on a public webpage.
- The definition of tokens should be reconsidered with regard to literal strings.

### Task of week 3

Yay, finally some lists in the lecture!

Here is the task of week 3: Given a finite directed graph G and a subset of its vertices S, define the function that computes the minimal elements in S with respect to G. An element n is *minimal* in S with respect to G if there is **no** other element m in S, such that there exists a path from m to n in G (note that only the start point m and end point n of such a path are required to be in S).

We represent directed graphs by their adjacency matrix using the type `[(Int,[Int])]`

.
For example, the entry `(1,[2,3,4])`

in such an adjacency matrix means that vertex `1`

has outgoing edges to the
vertices `2`

, `3`

, and `4`

. And the entry `(2,[])`

means that `2`

has no outgoing edge (to the same effect we could have just omitted that entry). You may assume that for each vertex there is at most one entry in the adjacency matrix. Moreover, we are using lists `[Int]`

to represent sets of verdices.

So we are looking for a function of type ` graphMin :: [(Int,[Int])] -> [Int] -> [Int]`

that performs the computation described above. The function may return the minimal elements in any order (and even with duplicates).
For example:

import Data.List sort $ nub $ graphMin [(0,[1,2]),(1,[3]),(2,[4])] [0,1,2,3,4] = [0] sort $ nub $ graphMin [(0,[1,2]),(1,[3]),(2,[4])] [1,2,3,4] = [1,2] sort $ nub $ graphMin [(0,[1,2]),(1,[3]),(2,[4])] [1,3,4] = [1,4] sort $ nub $ graphMin [(0,[1,2]),(1,[3]),(2,[4])] [0,3,4] = [0] sort $ nub $ graphMin [(0,[1,2]),(1,[3]),(2,[4])] [0,1,2,3,4,5,6] = [0,5,6] sort $ nub $ graphMin [] [0,1,2,3,4] = [0,1,2,3,4] sort $ nub $ graphMin [(0,[1,2]),(1,[0])] [1,2,3] = [1,3] sort $ nub $ graphMin [(0,[1,2]),(1,[0])] [1,0,3] = [3]

The evaluation criterion is time efficiency. Your function will be tested on large graphs, small graphs, dense graphs, sparse graphs, short lists, and longer lists (but probably not too long lists). The solution of the MC is fairly unoptimized. It can handle lists of length up to few hundreds elements for larger graphs (100000 edges) in a few seconds. There is some room for improvement.

**Hint:** `:set +s`

in GHCi is useful to obtain some timing and memory measurements.

### Results of week 3

This week there were only 26 student submissions (plus one by the MC). The MC is slightly disappointed. A decrease of 30% compared to the previous week is ok; a decrease of 60% compared to the previous week indicates that the exercise was simply too hard or not perceived as interesting. And this even though the idea behind this competition problem originated from a small "real-world" subproblem that the MC's alter ego has encountered in his recent (not yet published) research: compute the minimal elements of a set with respect to a preorder.

Additionally, the upcoming quiz and the associated preparations can be presumably blamed as well. (Although not for a 100% decrease in professorial participation. The professors forfeited to the so far absolutely dominant student performance.)

The MC hopes for at least a non-decrease in participation next week. Even if you have close to 0 points at this point, it is still possible to get into the top 10 of the semester, although it will probably be hard to get into the top 5.

Anyway, enough rambling. Let's get down to business.

Top ≈10 of week 3 | |||
---|---|---|---|

# | name | points | |

1. | Moritz Knüsel | 10 | |

1+ε. | Miro Haller | ➖🧟 | |

2. | Mike Boss | 9 | |

Simon Yuan | 9 | ||

4. | Roger Baumgartner | 7 | |

5. | Loris Reiff | 6 | |

6. | Matthias Brun | 5 | |

Antonio Russo | ➖5 | ||

Zur Schmaria | 5 | ||

9. | Noah Berner | 2 | |

Jorel Elmiger | ➖2 | ||

Tynan Richards | 2 |

How are these results coming together? Through a sequence of increasingly harder tests. We start with a soundness test. For that purpose the MC uses his favorite random testing library: QuickCheck. Here is a ghci script that the MC ran on each of the submissions, after briefly ensuring that the submissions won't attempt to erase his file system (please don't do this: it would cost you a zillion competition points, but also earn you a 🎩).

:m + Test.QuickCheck Data.List let getAll xs x = nub $ concat [zs | (y,zs) <- xs, x == y] let dom = map (abs . fst) let union xs = map (\x -> (x, filter (/= x) $ map abs $ getAll xs x)) $ dom xs let test sol = quickCheckWith stdArgs { maxSize = 10, maxSuccess = 3000 } (\xs ys0 -> let ys = nub $ map abs ys0 in let g = union xs in (sort $ nub $ graphMinSol g ys) == (sort $ nub $ sol g ys)) :set +s test GraphMin.graphMin

The MC should explain the quickCheckWith invokation. QuickCheck will execute its function argument in this invokation on 3000 randomly generated integer lists ys0 and graphs xs of length at most 10 (also all randomly chosen integers in those lists are between -10 and 10). To ensure that the randomly generated graphs satisfy the promised property of having at most one adjacency matrix entry for each vertex, the MC used the `union`

function defined above. For each of the 3000 tests, the output of MC's unoptimized solution `graphMinSol`

(shown below) is compared (modulo sorting and duplicates) to the output of the contestant's solution `GraphMin.graphMin`

.

import Data.List path g = go [] where go vis x y | x == y = True | otherwise = case lookup x g of Nothing -> False Just zs -> let vis' = x : vis in or [go vis' z y | z <- zs \\ vis'] graphMinSol :: [(Int,[Int])] -> [Int] -> [Int] graphMinSol g qs = let ps = nub qs in filter (\p -> all (\q -> p == q || not (path g q p)) ps) ps

Only 17 participants (including the MC) passed this test. If your name does not occur somewhere below, try running QuickCheck on your solution to see the actual bugs. Three of the remaining participants would have failed the test, if the MC had not removed the negative numbers from the random inputs (via `abs`

). Since nothing in the exercise stated that all numbers will be non-negative, these solution could be directly disqualified, in principle. In an unexpected bout of mercy, the MC decided to see where these contestants will end up and potentially penalize them with a point deduction later. As a reminder, let's put a "heavy minus" emoji next to their names:
Antonio Russo ➖, Jorel Elmiger ➖, Miro Haller ➖.

The whole truth about Miro Haller's submission is even more shocking. He spent great effort on making his solution very efficient. Yet a minor bug (even on positive numbers) compromised his whole work. When the MC reported the bug to Miro after the submission deadline, he quickly came up with a fix that still failed on negative numbers, but seemed to be correct otherwise. It is clear that the MC cannot give points to such a late submission, but he decided to at least evaluate this submission's efficiency. In other words: here is Miro's well deserved zombie emoji 🧟.

Something similar happened also to Matthias Brun's efficient solution. However, he was suspecting a bug and took precautions. He submitted an alternative "naive" solution and asked the MC to grade that one. A wise decision to keep some points.

OK, next test. First, here are three particular (families of) graphs:

k i = map (\k -> (k, [1..k-1] ++ [k+1..i])) [1..i] -- komplete [*] c i = map (\k -> (k, [(k + 1) `mod` i])) [0..i-1] -- cycle l i = map (\k -> (k, [k + 1])) [1..i] -- line

[*] Wikipedia is not sure whether complete graphs are typically abbreviated with a "k", because of the German word *komplett* (which seems to be a false friend) or the mathematician Kuratowski.

The MC then tested the following three graphs.

let i = 20 in (sort $ nub $ GraphMin.graphMin (k i) [1..i]) == [] let i = 300 in (sort $ nub $ GraphMin.graphMin (c i) [0..i-1]) == [] let i = 300 in (sort $ nub $ GraphMin.graphMin (l i) [1..i]) == [1]

And here are the results (in no particular order).

TIMEOUT Dan Kluser 0.01 0.01 0.21 Loris Reiff 0.05 0.10 0.05 Roger Baumgartner 0.01 1.48 4.05 Sandro Panighetti 0.01 0.01 0.01 Moritz Knüsel TIMEOUT Patrick Ziegler TIMEOUT Silvan Niederer 0.01 0.01 0.01 Mike Boss 0.01 0.01 0.02 Zur Shmaria 0.02 4.35 4.32 Noah Berner 0.02 5.32 2.61 Tynan Richards 0.01 0.04 0.07 Antonio Russo ➖ 0.02 5.32 1.92 Jorel Elmiger ➖ 0.01 0.28 0.56 MC 0.02 0.34 0.14 Matthias Brun 0.01 0.01 0.01 Miro Haller 🧟 0.06 0.01 0.01 Simon Yuan

Three timeouts and a bit fluctuation otherwise. Not really conclusive. But you already see three solutions that are equally fast on all graphs: Moritz Knüsel, Mike Boss, and our dead man walking. The timeouts are eliminated before the next round of course.

The MC also played around with some permutations and subsequences of the above arguments. The test `let i = 20 in (sort $ nub $ GraphMin.graphMin (k i) [i,i-2..0]) == []`

has revealed a bug in Sandro Panighetti's solution, which managed to survive the QuickCheck tests. A good reminder that testing is good, but proving is better. The MC decided at this point that the remaining 11 contestants will all end up in the top 10. So the next tests are about the final placing.

Here is the next challenge: lines with many complete subgraphs along them.

:m + Data.List let getAll xs x = nub $ concat [zs | (y,zs) <- xs, x == y] let dom = map (abs . fst) let union xs = map (\x -> (x, filter (/= x) $ map abs $ getAll xs x)) $ dom xs let k i j = map (\k -> (k, [i..k-1] ++ [k+1..j])) [i..j] let l i = map (\k -> (k, [k + 1])) [1..i] :set +s let i = 100 in (sort $ nub $ GraphMin.graphMin (union (l i ++ concatMap (\i -> k i (i + 9)) [0,10..i-10])) [0..i]) == [] let i = 100 in (sort $ nub $ GraphMin.graphMin (union (l i ++ concatMap (\i -> k i (i + 9)) [0,10..i-10])) [0,10..i]) == [0] let i = 100 in (sort $ nub $ GraphMin.graphMin (union (l i ++ concatMap (\i -> k i (i + 9)) [0,10..i-10])) [0,20..i]) == [0] let i = 300 in (sort $ nub $ GraphMin.graphMin (union (l i ++ concatMap (\i -> k i (i + 9)) [0,10..i-10])) [0..i]) == [] let i = 300 in (sort $ nub $ GraphMin.graphMin (union (l i ++ concatMap (\i -> k i (i + 9)) [0,10..i-10])) [0,10..i]) == [0] let i = 300 in (sort $ nub $ GraphMin.graphMin (union (l i ++ concatMap (\i -> k i (i + 9)) [0,10..i-10])) [0,20..i]) == [0] let i = 600 in (sort $ nub $ GraphMin.graphMin (union (l i ++ concatMap (\i -> k i (i + 9)) [0,10..i-10])) [0..i]) == [] let i = 600 in (sort $ nub $ GraphMin.graphMin (union (l i ++ concatMap (\i -> k i (i + 9)) [0,10..i-10])) [0,10..i]) == [0] let i = 600 in (sort $ nub $ GraphMin.graphMin (union (l i ++ concatMap (\i -> k i (i + 9)) [0,10..i-10])) [0,20..i]) == [0] let i = 900 in (sort $ nub $ GraphMin.graphMin (union (l i ++ concatMap (\i -> k i (i + 9)) [0,10..i-10])) [0..i]) == [] let i = 900 in (sort $ nub $ GraphMin.graphMin (union (l i ++ concatMap (\i -> k i (i + 9)) [0,10..i-10])) [0,10..i]) == [0] let i = 900 in (sort $ nub $ GraphMin.graphMin (union (l i ++ concatMap (\i -> k i (i + 9)) [0,10..i-10])) [0,20..i]) == [0]And the corresponding results:

0.02 0.02 0.01 0.07 0.14 0.09 0.19 0.73 0.50 0.57 2.35 1.46 Loris Reiff 0.04 0.01 0.02 0.20 0.12 0.07 0.87 0.27 0.30 1.98 0.71 0.66 Roger Baumgartner 0.02 0.01 0.01 0.04 0.07 0.06 0.22 0.11 0.14 0.32 0.26 0.38 Moritz Knüsel 0.06 0.02 0.02 0.11 0.12 0.15 0.42 0.43 0.21 0.50 0.75 0.52 Mike Boss 0.03 0.02 0.02 0.13 0.09 0.10 0.61 0.25 0.47 1.52 0.68 0.99 Zur Shmaria 0.27 0.04 0.03 6.14 0.67 0.37 49.61 TIMEOUT Noah Berner 0.26 0.05 0.03 5.79 0.64 0.37 42.66 TIMEOUT Tynan Richards 0.03 0.02 0.02 0.12 0.11 0.12 0.48 0.36 0.43 0.90 0.58 0.50 Antonio Russo ➖ 0.37 0.02 0.02 9.44 0.14 0.09 TIMEOUT Jorel Elmiger ➖ TIMEOUT MC 0.07 0.02 0.01 0.79 0.15 0.11 5.47 0.78 0.52 17.09 2.27 1.35 Matthias Brun 0.06 0.01 0.02 0.06 0.06 0.05 0.23 0.23 0.22 0.48 0.48 0.33 Miro Haller 🧟 0.03 0.01 0.01 0.07 0.06 0.05 0.21 0.20 0.21 0.46 0.38 0.34 Simon Yuan

The MC is defeated. And with him three other contestants.

Next, we'll try something more intuitive and realistic. How about the prime numbers?

:m + Data.List Data.Hashable let multiples k b = [k * i | i <- [1..b`div`k + 1]] let g i = map (\k -> (k, multiples k i)) [1..i] let g1000 = g 1000 let g5000 = g 5000 let g10000 = g 10000 :set +s hash $ GraphMin.graphMin g1000 [2..length g1000] hash $ GraphMin.graphMin g5000 [2..length g5000] hash $ GraphMin.graphMin g10000 [2..length g10000]

Note, how the `hash`

function is used to avoid measuring the time needed to print the results. The minimal elements of the above graphs are exactly the prime numbers (up to the bound b). Perhaps this is the best algorithm for computing primes?

0.23 5.25 22.53 Loris Reiff 0.09 0.32 1.09 Roger Baumgartner 0.04 0.11 0.24 Moritz Knüsel 0.04 0.15 0.48 Mike Boss TIMEOUT Zur Shmaria TIMEOUT Antonio Russo ➖ TIMEOUT Matthias Brun 0.04 0.18 0.45 Miro Haller 🧟 0.08 0.42 1.87 Simon Yuan

Three students gone. Making the divisibility graph even bigger (e.g., `let g100000 = g 100000`

), while keeping the list size (`hash $ GraphMin.graphMin g100000 [2..10000]`

) also makes Loris Reiff's and Roger Baumgartner's solution time out.

Time for a final showdown. Let's go Big Data. In an intermediate submission Moritz Knüsel claims to have tested his solutions on graphs with 20 millions edges. The divisibility graph with 2 millions vertices, has roughly 31 millions edges.

:m + Data.List Data.Hashable let multiples k b = [k * i | i <- [1..b`div`k + 1]] g i = map (\k -> (k, multiples k i)) [1..i] let gx = g 2000000 :set +s hash $ GraphMin.graphMin gx [30000,29998..10000]

And here the final results:

13.84 Moritz Knüsel TIMEOUT Mike Boss 21.79 Miro Haller 🧟 TIMEOUT Simon Yuan

Pretty amazing, huh? Note that those huge graphs are processed without any parallelization on a usual laptop.

Finally, lets briefly sample a few of the best solutions. They are a bit long and the MC will (as usual) rely on the comments provided by the contestants.

Mike Boss uses Haskell's Data.Graph library:

module GraphMin where {- Grade this (for real now) - Longer first part is just the conversion to type Graph. Problems with non-existing vertices and renaming are fixed. - notInG: all endpoints of edges which are not in the graph - allOfG: all vertices from the graph (even ones with no outgoing edges) - sButNotG: vertices which are in the set but not in the graph => are minimum - sAndG: vertices in the set which are also in the graph - convertSet: fixes renaming - Second part is the actual algorithm: Deletes all vertices reachable from the vertices in the set which means they aren't minimums. -} import Data.Graph import Data.Maybe import qualified Data.IntSet as Set graphMin :: [(Int,[Int])] -> [Int] -> [Int] minimumInGraph :: Graph -> Set.IntSet -> Set.IntSet -> [Int] graphMin graph set = [x | (x, _, _) <- result] ++ (Set.toList sButNotG) where realSet = Set.fromList set edges = Set.fromList (concat $ map snd graph) vertices = Set.fromList (map fst graph) notInG = Set.difference edges vertices allOfG = Set.union vertices notInG sButNotG = realSet Set.\\ allOfG sAndG = realSet Set.\\ sButNotG correctedGraph = [(s, []) | s <- (Set.toList notInG)] ++ graph (builtGraph, v, k) = graphFromEdges (map (\(a, b) -> (a, a, b)) correctedGraph) convertSet = Set.map (fromJust . k) sAndG result = map v (minimumInGraph builtGraph convertSet convertSet) minimumInGraph graph set result | Set.null set = Set.toList result | Set.null result = [] | Set.size set == 1 = Set.toList (result Set.\\ reachFromS) | otherwise = minimumInGraph graph (Set.delete head set) (result Set.\\ reachFromS) where reachFromS = Set.delete head (Set.fromList (reachable graph (head :: Int))) head = Set.findMin set

Miro Haller's corrected solution has a nice complexity analysis, which the MC didn't attempt to verify. First Miro has to fix the negative number problems, anyway.

-- I use containers-0.5.10.2 (https://hackage.haskell.org/package/containers-0.5.10.2) module GraphMin where import Prelude hiding (fromList) import Data.IntMap as IntMap import Data.Sequence as Seq -- I use IntMap because they have (almost) constant access time I think. -- It depends on the number of bits the integers use. -- Also I rely on the fact that the nodes are numbered in a more or less -- ascending sequence of natural numbers graphMin :: [(Int,[Int])] -> [Int] -> [Int] graphMin graph subset = findUnreachable subset [] where -- O(n) graphNodes = [node | (node, xs) <- graph] -- O(n) maxNodeNr = maximum (graphNodes ++ subset) -- Visited array O(n) (I assume n = O(maxNodeNr)) visited = IntMap.fromList [(node, False) | node <- [0..maxNodeNr]] -- Convert out nodes lists to sequences O(n^2) graphSeq = [(node, Seq.fromList outNodes) | (node, outNodes) <- graph] -- Out edges O(n) outEdges = IntMap.fromList graphSeq -- True if node is in subset -- O(n) isInSubset = IntMap.fromList [(node, True) | node <- subset] -- Array with for each node nodes from which it is reachable -- O(n) reachableFromNodesInit = IntMap.fromList [(node, []) | node <- [0..maxNodeNr]] -- DFS like algorith, write from node to reachable nodes -- O(n^2 * log(n) ) dfsIter [] reachableFromNodes = reachableFromNodes dfsIter (subsetNode:subsetRest) reachableFromNodes = dfsIter subsetRest newReachableFromNodes where -- O(1) connectedNodes = findWithDefault Seq.Empty subsetNode outEdges -- O(n) newReachableFromNodes = dfsFindReachable connectedNodes reachableFromNodes visited where dfsFindReachable Seq.Empty reachableFromNodes visitedLoc = reachableFromNodes dfsFindReachable todoNodesSeq reachableFromNodes visitedLoc -- Cases checking O(1) | findWithDefault False nextNode isInSubset = dfsFindReachable nodesRest extendedReachableFromNodes visitedNew | findWithDefault False nextNode visitedLoc = dfsFindReachable nodesRest reachableFromNodes visitedNew | otherwise = dfsFindReachable extendedListRest reachableFromNodes visitedNew where updateFn key x = True visitedNew = adjustWithKey updateFn nextNode visitedLoc -- Both O(1) because it is the first entry nextNode = Seq.index todoNodesSeq 0 nodesRest = Seq.deleteAt 0 todoNodesSeq -- O(1) extendedReachableFromNodes = IntMap.adjust (subsetNode :) nextNode reachableFromNodes -- O( log( min{n1, n2} ) ) where n1,n2 are the lengths of the sequences extendedListRest = (findWithDefault Seq.Empty nextNode outEdges) >< nodesRest -- Comp. stands above reachableFromNodes = dfsIter subset reachableFromNodesInit -- Find all nodes that are at most reachable from itself (through a path through nodes -- which are not in the subset S) -- O(n^2) findUnreachable [] unreachableNodes = unreachableNodes findUnreachable (subsetNode:subsetRest) unreachableNodes | hasOtherNode (findWithDefault [] subsetNode reachableFromNodes) = findUnreachable subsetRest unreachableNodes | otherwise = findUnreachable subsetRest (subsetNode : unreachableNodes) where -- Check if a node is reachable from another node -- O(n) hasOtherNode [] = False hasOtherNode (otherNode:listRest) = if subsetNode == otherNode then hasOtherNode listRest else True

Finally, Moritz Knüsel has apparently found the right balance between using lazy and eager data structures. And it nicely handles edge cases separately.

{-# LANGUAGE BangPatterns #-} module GraphMin where import Data.List import qualified Data.IntSet import qualified Data.IntMap.Lazy -- final submission type LG = [(Int,[Int])] graphMin :: LG -> [Int] -> [Int] graphMin g s = graphMin11 g s -- Basic Idea: -- from every node in S, we walk the graph and find all elements in S that are -- reachable from that particular Node, except the node itself, -- since a minimal element may still have paths coming back to itself -- the union of all nodes discovered like this is then subtracted from S, -- yielding only the nodes that are not reachable from any other -- -- This would obviously be way too slow, so, on a walk starting at a node v, -- we save all nodes we've seen, except the ones that are on a path from v to v -- these nodes are saved in a map, together with v, -- so we can avoid walking that path again later -- This collection of nodes and the map with shortcuts is -- then passed along for the search starting at the next node in S -- because of stack overflow problems some strictness annotations -- have been added, which seemed to solve them -- shortcuts for all the datastructures and functions for accessing them type Set2 = Data.IntSet.IntSet ins2 = Data.IntSet.insert empty2 = Data.IntSet.empty toLst2 = Data.IntSet.toList fromLst2 = Data.IntSet.fromList del2 = Data.IntSet.delete isIn2 = Data.IntSet.member union2 = Data.IntSet.union size2 = Data.IntSet.size diff2 = Data.IntSet.difference type G2 = Data.IntMap.Lazy.IntMap [Int] lgToG2 = Data.IntMap.Lazy.fromList getEdges2 g v = Data.IntMap.Lazy.findWithDefault [] v g --- type GoesToMap2 = Data.IntMap.Lazy.IntMap Int emptyGTM2 = Data.IntMap.Lazy.empty memberGTM2 = Data.IntMap.Lazy.member getGTM2 = Data.IntMap.Lazy.findWithDefault 0 unionGTM2 = Data.IntMap.Lazy.union fromSetGTM2 = Data.IntMap.Lazy.fromSet mergeSetGTM2 gtm set v = unionGTM2 gtm (fromSetGTM2 (const v) set) -- recursively look at all nodes reachable from a given node, recording the path taken fun11 :: GoesToMap2 -> -- contains nodes that, in earlier searches, -- were found to loop back to where the search started Set2 -> -- contains all nodes that go back to the value this search started at, -- needed in case of loop back paths with cycles in them G2 -> -- the graph we're looking at Set2 -> -- the set S Set2 -> -- the set of points we've already visited. This set may -- contain nodes that are in lgtm, since these will only be -- removed later, so it's important to check lgtm first Set2 -> -- the nodes on the path we've come down Int -> -- the current node Int -> -- the node we started at (Set2,Set2) -- returns the set of nodes seen and a set of nodes that lead back to sval. -- If the current node is not on such a path, this will be empty fun11 gtm lgtm g s !seen pset v sval -- we've come back around, so we need to return the nodes we took to get here | sval == v && not (vInSeen) = (seen,vpset) -- we've come back around, but sval is already known not to be minimal | sval == v && vInSeen = (seen,empty2) -- the current node is on a path that leads to sval, so we need to save how we got here | mlGTMv = (seen, vpset) -- we've already visited this node at some point | vInSeen = (seen,empty2) -- the current node is in S, we can't go further | isIn2 v s = (vseen,empty2) -- the current node leads to an element in S, but we can skip going down the entire path -- since all other nodes on that path are not in S, we don't need to add all of them to seen | mGTMv = (ins2 gGTMv vseen,empty2) | otherwise = -- recurse on v's neighbors descend11 gtm lgtm g (getEdges2 g v) s vseen vpset sval where vInSeen = isIn2 v seen vpset = ins2 v pset vseen = ins2 v seen mGTMv = memberGTM2 v gtm mlGTMv = isIn2 v lgtm gGTMv = getGTM2 v gtm -- helper function for the recursion -- since, in the function above, we check lgtm before seen, -- we can avoid removing elements from seen here descend11 :: GoesToMap2 -> Set2 -> G2 -> [Int] -> Set2 -> Set2 -> Set2 -> Int -> (Set2,Set2) descend11 _ lgtm _ [] _ seen _ _ = (seen,lgtm) descend11 gtm lgtm g (e:es) s !seen pset sval = let (nseen,npset) = fun11 gtm lgtm g s (seen) pset e sval in descend11 gtm (union2 lgtm npset) g es s nseen pset sval -- searches on all elements in S -- if a search discovered paths that go back to its starting node, all nodes on such paths -- are taken out of seen (if we changed some guards around in fun11, this would not be necessary -- but some quick tests indicate it helps if s is large and g is dense) and added -- to the map gmhelp11 :: G2 -> [Int] -> Set2 -> Set2 -> GoesToMap2 -> Set2 gmhelp11 g [] s seen gtm = seen gmhelp11 g (e:es) s !seen gtm = let (nseen,npset) = descend11 gtm empty2 g (getEdges2 g e) s seen empty2 e in gmhelp11 g es s (diff2 nseen npset) (mergeSetGTM2 gtm npset e) -- transforms the graph and S into the datastructures used above, calls gmhelp11 and -- subtracts the result from S graphMin11 :: LG -> [Int] -> [Int] -- get trivial cases out of the way right away graphMin11 _ [] = [] graphMin11 _ [x] = [x] graphMin11 [] s = s graphMin11 g s = toLst2 $ diff2 s' (gmhelp11 (lgToG2 g) s s' empty2 emptyGTM2) where s' = fromLst2 s

Oh, and the heavy minuses translate into a one point deduction (truncated at 0).

What have we learned?

- Quickly check your program using
`<insert bad pun about a great library>`

. Seriously: Soundness is indispensable. Random testing does not ensure soundness, but helps eliminate silly mistakes. - Almost all
~~correct~~not obviously wrong solutions got into the top 10. - Remember, remember, the negative numbers.
- Using the library is good. Outsmarting the library is better.
- When the MC evaluates efficiency, he makes random, subjective choices. Don't try to guess what he will evaluate, but just be efficient in all cases.

### Task of week 4

A string `s`

is a subsequence of another string `t`

, if one can obtain `s`

by removing characters from `t`

. For example: The string `"foo"`

is a subsequence of `"fofo"`

, `"frohlocken"`

, and `"funktionale Programmierung is einfach toll"`

. It is not a subsequence of `"ofo"`

, `"FoO"`

, `"oooo"`

, and `"tofu"`

.

A lists of strings `xs`

is subsumed by another list of strings `ys`

, if for each string `s`

from `xs`

, there exists a string `t`

in `ys`

, such that `s`

is a subsequence of `t`

. Moreover, you may not use strings from `ys`

more than once (as witnesses for the above "exists").

Write a function `subsub :: [String] -> [String] -> Bool`

that checks whether its first argument is subsumed by its second argument.

All following examples should evaluate to `True`

:

subsub ["a", "ab", "abc"] ["false", "bab", "cdb", "cabAdAbc", "true"] subsub ["FM", "FP"] ["Ferrovia Mesolcinese", "Microsoft FrontPage"] subsub ["FP", "FM"] ["Ferrovia Mesolcinese", "Microsoft FrontPage"] subsub ["1+2=2"] ["12+12=24"] not $ subsub ["FM", "FP"] ["FMFP"] subsub ["FM", "FP"] ["FMFP", "FMFP"] subsub ["FM", "FM"] ["FM", "Ferrovia Mesolcinese"] not $ subsub ["FM", "FM"] ["FM"]

Evaluation criterion: efficiency. Your solution will be tested with short and long lists and short and long strings.

### Results of week 4

Top 10 of week 4 | |||
---|---|---|---|

# | name | ranking | points |

1. | ? | 0 | |

2. | ? | 0 | |

3. | ? | 0 | |

4. | Your name could be here | ? | 0 |

5. | ? | 0 | |

6. | ? | 0 | |

7. | ? | 0 | |

8. | ? | 0 | |

9. | ? | 0 | |

10. | ? | 0 |

### Task of week 5

The MC loves regular expressions. But sometimes he manages to insert sputid tysop in his text that cause a later regex search to fail. In this task, your goal is to develop a regex matcher that is immune against scrambled letters.

We consider the following algebraic datatype (wait for the lecture on Tuesday, where those will be introduced) of regular expressions that should be familiar from a Theoretical Computer Science course:

data RE = Zero -- Zero matches nothing | One -- One matches only the word [] | Atom Char -- Atom c matches only the word [c] | Plus RE RE -- Plus r s matches all words matched by r or s | Concat RE RE -- Concat r s matches all words w = u ++ v -- such that u is matched by r and v is matched by s | Star RE -- Star r matches [] and all words w = u ++ v -- such that u is matched by r and v by Star r

Hence, for a standard definition of `match :: RE -> String -> Bool`

we would have:

match (Star (Atom 'a' `Concat` Atom 'b')) "" match (Star (Atom 'a' `Concat` Atom 'b')) "ab" match (Star (Atom 'a' `Concat` Atom 'b')) "abab" match (Star (Atom 'a' `Concat` Atom 'b')) "ababababab" not $ match (Star (Atom 'a' `Concat` Atom 'b')) "a" not $ match (Star (Atom 'a' `Concat` Atom 'b')) "abba" not $ match (Star (Atom 'a' `Concat` Atom 'b')) "baba" not $ match (Star (Atom 'a' `Concat` Atom 'b')) "ababa"

However, here we are not interested in the standard `match`

, but rather in a `dyslexicMatch :: RE -> String -> Bool`

function that accepts a string `s`

exactly if there is a permutation `t`

of `t`

(of the same length, with the same multiplicities for duplicated characters) that would be accepted by the usual `match`

function.
For example, all the following should yield `True`

:

dyslexicMatch (Star (Atom 'a' `Concat` Atom 'b')) "" dyslexicMatch (Star (Atom 'a' `Concat` Atom 'b')) "ab" dyslexicMatch (Star (Atom 'a' `Concat` Atom 'b')) "abab" dyslexicMatch (Star (Atom 'a' `Concat` Atom 'b')) "ababababab" not $ dyslexicMatch (Star (Atom 'a' `Concat` Atom 'b')) "a" dyslexicMatch (Star (Atom 'a' `Concat` Atom 'b')) "abba" -- consider "abab" dyslexicMatch (Star (Atom 'a' `Concat` Atom 'b')) "baba" -- consider "abab" not $ dyslexicMatch (Star (Atom 'a' `Concat` Atom 'b')) "ababa"

Evaluation criterion: feficiency. Your solution will be tested with short and long strings and simple and deeply nested regular expressions.

### Results of week 5

Top 10 of week 5 | |||
---|---|---|---|

# | name | ranking | points |

1. | ? | 0 | |

2. | ? | 0 | |

3. | ? | 0 | |

4. | or here | ? | 0 |

5. | ? | 0 | |

6. | ? | 0 | |

7. | ? | 0 | |

8. | ? | 0 | |

9. | ? | 0 | |

10. | ? | 0 |