Wettbewerb

Each exercise sheet includes a competition problem carefully selected by the Master of Competition, Herr Dr. MC "Hammer" Blanchette, by the Co-Master of Competition, Herr Co-MC Noschinski, or by the Contra-Master of Competition, Herr Contra-MC Hupel. It counts just like the other homework problems, but it is additionally judged as part of a semester-long competition. The judging criteria vary each time. The best 30 solutions receive 30, 29, ..., 1 points, respectively.

Every week, the Top 30 der Woche is published and the Top 30 des Semesters is updated. The MC's favorite solutions are put on the Internet. At the end of the semester, the five best students are honored with tasteful trophies. So there are no bonus points to be had here, but plenty of fun and honor (and trophies)!

Important: If you do not want your name to appear on the Internet, you can either stay out of the competition (quite simply, by omitting the {-WETT-}...{-TTEW-} tags) or write to the MC.

Top 30 des Semesters
  Platz  Wettbewerber(in)             Punkte
   1.    Maximilian Haslbeck         365.333
   2.    Simon Roßkopf               339.333
   3.    Daniel Stüwe                334.333
   4.    Stefan Peter Dirix          321.333
   5.    Hauke Brinkop               304.333
   6.    Michael Schreier            274.333
   7.    Atanas Mirchev              250.000
   8.    Maximilian Kirchmeier       232.500
   9.    Julian Biendarra            223.333
  10.    Florian Dreier              203.000
  11.    Alexander Weidinger         196.667
  12.    Nikita Basargin             181.333
  13.    Lyubomir Stoykov            181.000
  14.    Alexander Christian Hefele  150.000
  15.    Christian Ziegner           148.000
  16.    Martin Mihaylov             147.000
         Moritz Sichert              147.000
  18.    Ludwig Peuckert             142.000
  19.    Alexandra Fritzen           137.000
  20.    David Otter                 132.000
  21.    Lukas Michael Stumberg      127.000
  22.    Clemens Jonischkeit         112.000
  23.    Julia Kindelsberger          97.000
  24.    Johannes Ismair              87.000
  25.    Jens Wöhrle                  81.000
  26.    Kevin Yu                     78.000
         Thomas Pettinger             78.000
  28.    Thomas Zwickl                71.000
  29.    Hannes Bibel                 67.000
  30.    Roland Schmid                65.000

Die Ergebnisse der ersten Woche

Top 30 der Woche
  Platz Wettbewerber(in)   Token   Punkte
1. Clemens Jonischkeit 20 30
Lukas Michael Stumberg 20 30
Michael Schreier 20 30
Simon Roßkopf 20 30
5. Florian Dreier 25 26
Roland Schmid 25 26
Hauke Brinkop 25 26
Julien Schmidt 25 26
Maximilian Kirchmeier 25 26
10. David Otter 26 21
Stefan Peter Dirix 26 21
Atanas Mirchev 26 21
Dominik Gutermuth 26 21
Daniel Stüwe 26 21
Lars Wüstrich 26 21
Moritz Sichert 26 21
Christian Brosche 26 21
Michael Zellner 26 21
Fabian Weise 26 21
Daniel-Theodor Plop 26 21
Lyubomir Stoykov 26 21
Andrei Vlad 26 21
Jens Wöhrle 26 21
Lawrence Krug 26 21
Maximilian Haslbeck 26 21
Vladislav Stepa 26 21
Aleksander Umov 26 21
Stefan Kreisig 26 21
Michael Benedikt Schwarz 26 21
Martin Mihaylov 26 21

The Master of Competition is proud to report that over 200 contestants took part in the first week's challenge! Bravo!

First things first. The MC mentioned in the Aufgabentext that he had a 31-token solution. Here it is:

        f124 x y z = a + 2 * b + 4 * c
          where [a, b, c] = sort [x, y, z]

Decidedly, he didn't try very hard—indeed, he wouldn't have made it into the week's top 20, not even in the top 80! Asche auf sein Haupt! But to tell the truth, he didn't want to set the bar too high and demotivate the contestants. In particular, the coefficients 1, 2, 4 had been chosen because they exhibit nice mathematical properties, but the MC had made no attempt to use these properties.

How does this solution work? First, it builds the three-element list [x, y, z] containing the function's three arguments. Lists in Haskell are written using a very natural notation. The list is sorted in increasing order, and the result is matched against the pattern [a, b, c]; this is called pattern matching. Once this step is performed, a is the least element, b is the middle element, and c is the maximum element. Finally, the formula is computed. Notice how this solution is an almost direct Haskell rendering of the German prose of the Aufgabentext. Hence it is a good compromise between readability and conciseness.

Now let's move on to the shortest solution, with 20 tokens:

        f124 x y z = sum $ map maximum $ tail $ subsequences [x, y, z]

This solution is the work of a genius. It was submitted by four competitors, although the MC is not naive enough to believe that it was discovered independently by four geniuses. There's a saying that “great minds think alike,” but still. In any case, the solution is truly amazing. And all the functions involved are from the Haskell standard library.

It works as follows. First, the $ signs are nothing to be afraid of. They're just a cheap trick to get rid of some parenthesis pairs. At the cost of three more tokens, the above function could have been written as

        f124 x y z = sum (map maximum (tail (subsequences [x, y, z])))

The subsequences call computes the list of all subsequences of [x, y, z]. A subsequence of a list is a list that contains a subset of the elements of the original list in the same order. For [x, y, z] this means

        [[], [x], [y], [z], [x, y], [x, z], [y, z], [x, y, z]]

The tail call removes the first element, [], leaving a 7-element list. Then the map maximum computes, for each list in the list of lists, the maximum. Assuming x > z > y, after the maximum computation we would have

        [x, y, z, x, x, z, x]

Finally, the sum function computes

        x + y + z + x + x + z + x

How many x's do we have? Four. How many z's? Two. How many y's? One.
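For the skeptical, here's how the pipeline plays out in GHCi with x = 3, y = 1, z = 2. (GHC enumerates the subsequences in a slightly different order than shown above, but since everything gets summed up at the end, the order is irrelevant. The variable it denotes the previous result.)

        ghci> import Data.List
        ghci> tail (subsequences [3, 1, 2])
        [[3],[1],[3,1],[2],[3,2],[1,2],[3,1,2]]
        ghci> map maximum it
        [3,1,3,2,3,2,3]
        ghci> sum it
        17

And indeed, 1 + 2 * 2 + 4 * 3 = 17.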

Here's the explanation from one of the four competitors who submitted this entry (presumably the genius who thought of it in the first place):

        {-
        Bildet die Potenzmeng, entfernt die fordere leere liste,
        wendet auf die teillisten x,y,z,xy,xz,yz,xyz jewails maximum an
        => Der groesste Wert gewinnt 3 vergleiche, der mittlere einen,
          dazu kommt jedes element noch ein mal
        => 4 mal das groesste, 2 mal das mittlere, 1 mal das kleine ent. a + 2b + 4c
          dann noch zusammenzaehlen. done.

        -}

(Incidentally, the genius's German orthography leaves much to be desired. Potenzmeng? Jewails? Today is a bad day for the German language. And don't come and tell the MC it's the neue Rechtschreibung! ;))

Let's move on. The next best solutions had 25 tokens:

        f124h u v = u + v * 2
        f124 x y z = foldr1 f124h $ sort [x, y, z]

Again, this solution uses some material that hasn't been shown in class yet. But that's fine for the competition, where anything goes. What's the trick here? First, there's the sort, which we now understand. So let's conceptually replace sort [x, y, z] by [a, b, c] in our minds and get rid of that annoying $ sign. This gives

        f124h u v = u + v * 2
        f124 x y z = foldr1 f124h [a, b, c]

Next, we have to explain that weird foldr1 call. The function will be explained in a couple of weeks (and you will then understand why its name sounds like the German word for “torture”), but for the moment it is sufficient to understand that it simply applies an operation (here, f124h) repeatedly on a list. So the above is nothing else than

        f124h u v = u + v * 2
        f124 x y z = f124h a (f124h b c)

Expanding the definition of the helper function f124h, we get

        f124 x y z = a + 2 * (b + 2 * c)

which is correct. The solution is clearly inspired by the base-2 version of the Horner rule. [Here the MC would insert a link to an appropriate web site, e.g., Wikipedia, but he's working offline and doesn't know a relevant URL by heart. Please look it up for yourself.]
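Readers who want to convince themselves of foldr1's right-nesting behavior can run a quick experiment in GHCi (this is not from any submission):

        ghci> foldr1 (\u v -> u + 2 * v) [1, 2, 3]
        17

which is 1 + 2 * (2 + 2 * 3) = 1 + 2 * 2 + 4 * 3, i.e., exactly the Horner-style evaluation described above.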

Another version of this solution inlines the helper function using a so-called λ-abstraction (λ = lambda). It also needs 25 tokens:

        f124 x y z = (\u v -> u + 2 * v) `foldr1` sort [x, y, z]

The λ-abstraction is really nothing more than an anonymous function. It is introduced by a backslash (\), an ASCII-friendly approximation of λ, then come the arguments (u and v), then an arrow, and finally the body of the function. Mathematically inclined readers can think of it as being a perverse computer scientist notation for (u, v) ↦ u + 2v.

The other oddity is the `foldr1` token. A two-argument function f is normally invoked by passing the arguments after the function name, as in f x y, but the alternative syntax x `f` y is also allowed. Im Endeffekt, the backticks (`) transform a prefix operator into an infix operator. The infix operator has a fairly low precedence, so this can be used to save some parentheses. Once we name the anonymous function and get rid of the backtick syntax, we fall back on the other 25-token solution. Which goes to show that these tricks don't buy you much!
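By the way, the backtick syntax can be tried out in isolation with any two-argument function, e.g., the Prelude's div:

        ghci> div 10 3
        3
        ghci> 10 `div` 3
        3

Both calls mean exactly the same thing to Haskell; only the surface syntax differs.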

The 26-token solutions were also rather exciting. They came in two variants:

        f124 x y z = (max x y + max x z + max y z) * 2 + min x y `min` z
        f124 x y z = (x + y + z + x `max` y `max` z) * 2 - x `min` y `min` z

No new Haskell syntax needs to be introduced to understand these solutions. What's needed is some (non-lethal) dose of mathematics.

It is customary for the MC not only to present the top solutions, but also median and egregious ones. This is a median solution, at 34 tokens:

        f124 x y z | x > y = f124 y x z
                   | y > z = f124 x z y
                   | otherwise = x + 2 * y + 4 * z

The self-calls to f124 are a nice way to normalize the order of the arguments without having to appeal to library or helper functions. In fact, the MC had a very similar solution in his drawer. And to offer some contrast, here is the longest working solution, at 185 tokens:

        f124 x y z = (minimum [x,y,z]) + 2 * (mid x y z) + 4 * (maximum[x,y,z])
        mid x y z
                | maximum[x,y,z] == x && minimum[x,y,z] == y = z
                | maximum[x,y,z] == x && minimum[x,y,z] == z = y
                | maximum[x,y,z] == y && minimum[x,y,z] == x = z
                | maximum[x,y,z] == y && minimum[x,y,z] == z = x
                | maximum[x,y,z] == z && minimum[x,y,z] == x = y
                | maximum[x,y,z] == z && minimum[x,y,z] == y = x

There were 28 solutions with over 100 tokens, so this is not an isolated case. There's also no reason to be ashamed here, since we're at the very beginning of the course. We're here to learn, after all. But it's important to notice what makes this solution long: the needless case distinctions. Six cases are distinguished and handled separately, with a lot of copy-paste. All the solutions in the top 20 managed to avoid the case distinctions, either by calling sort or by relying on some tricks. The median solution distinguished three cases, two of which were reduced to the third case through recursive calls. One of the MC's favorite quotes from Donald Knuth goes as follows:

The ability to handle lots of cases is Computer Science's strength and weakness. We are good at dealing with such complexity, but we sometimes don't try for unity when there is unity.

Finally, here's the shortest incorrect solution (27 tokens), to remind us of the need to QuickCheck our programs:

        f124 x y z = sum [max x a + a | a <- [0, y, z, max y z]]

It's hard to explain how this function works, especially since it doesn't. It is such a pity, because the same contestant had actually come up with a correct solution as well:

        -- schönere Lösung, aber mehr Token:
        -- f124 x y z = sum (zipWith (*) (sort [x,y,z]) [1,2,4])

Here's another wrong (but plausible-looking) solution:

        f124 x y z = min (min x y) z + 2 * min (max x y) z + 4 * max (max x y) z

To conclude, the MC would like to mention a few solutions that were communicated to him privately by last year's winner of the competition, Lorenz Panny (who is apparently bored by his other lectures). Herr Panny has two 21-token solutions, both based on the read function, which translates a string into a Haskell value:

        f124 a b c = sum $ uncurry replicate =<< read "[1, 2, 4]" `zip` sort [a, b, c]
        f124 a b = sum <<< uncurry replicate <=< uncurry zip . fmap sort . read . printf "([1, 2, 4], [%d, %d, %d])" a b

He also has a 23-token solution:

        f124 a b c = sum $ uncurry replicate =<< 3 `delete` enumFrom 1 `zip` sort [a, b, c]

No attempts will be made to explain these. The second solution especially is mysterious even to the MC, because it uses advanced Haskell-specific constructs that won't be taught in class.

So what are the lessons to learn from this week's little experiment?

The first lesson is probably that Mathematics can be used to shorten programs. Nearly all of the shortest solutions exploited mathematical properties of the coefficients 1, 2, 4, namely, 2 × 1 = 2 and 2 × 2 = 4. Whether that's a good thing or not is another question. For the competition, it's obviously a good thing, because it saves tokens. But it does make the programs harder to understand and harder to maintain. If the specification changes and the program has to work for the coefficients, say, 8, 3, 299, the most optimized solutions have to be rethought from scratch.

The second lesson is that Libraries can be used to shorten programs. And using the library instead of reimplementing functionality is almost invariably a good thing, because

  • the library is virtually bug-free;
  • the library is well documented;
  • the library is likely to be optimized to be faster and/or use less memory than most user applications;
  • the library is part of the vocabulary of other Haskell programmers, so they can read your code and understand it quickly.

The third lesson is that Cheap tricks can be used to shorten programs. This is not a very important lesson for programming in the real world, because sometimes it's better to waste a few tokens and to write clearer code. The cheapest trick of all is surely the use of read to transform strings into Haskell values. The competition is an artificial setting. In noncompetitive contexts, use your common sense!
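For the curious, here's the read trick in a GHCi session. At the prompt, a type annotation is needed so that read knows which type of value to produce; inside f124, the numeric context supplies that information:

        ghci> read "[1, 2, 4]" :: [Integer]
        [1,2,4]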

There's a fourth lesson: If you find yourself copy-pasting code, you must be doing something wrong. Copy-paste occurs when you notice some similarity and exploit it. But there are almost always better ways of exploiting the similarity.

Despite the exceptions, there is a fairly strong correlation between program length and quality of code. The ratio above between the shortest and the longest solution is over 9:1. For large programs, this can make a real difference. It is not unknown in industry and academia for some people to achieve in 2,000 lines of code (LOC) what others need 20,000 LOC for. And in nearly all cases, the 2,000-LOC program is going to be much easier to understand, maintain, and extend. What this week's exercise didn't show, but is certainly true in practice, is that short code tends naturally to be more general and more widely applicable. In fact, there was a bit of this already in the top solutions: Most of them can easily be generalized to compute a + 2b + 4c + 8d + 16e + etc. or even a + 10b + 100c + 1000d + 10000e + etc., because the core algorithm works on lists and implements some variant of the Horner rule.
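To make that last point concrete, here's a hedged sketch of the generalization. (The name fGen and the base argument k are inventions for this discussion, not part of any submission.)

        import Data.List (sort)

        -- Horner-style evaluation of the sorted arguments in base k;
        -- for k = 2 and three arguments, this coincides with f124.
        fGen :: (Num a, Ord a) => a -> [a] -> a
        fGen k = foldr1 (\u v -> u + k * v) . sort

        -- fGen 2 [x, y, z] == a + 2*b + 4*c, where [a, b, c] = sort [x, y, z]
        -- fGen 10 [w, x, y, z] == a + 10*b + 100*c + 1000*d for the sorted values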

PS: Somebody who shall remain nameless submitted this “9-token” solution:

        f124naiv :: Integer -> Integer -> Integer -> Integer
        f124naiv x y z 
          |x<=y && y<=z = x + (2 * y) + (4 * z)
          |x<=z && z<=y = x + (2 * z) + (4 * y)
          |y<=x && x<=z = y + (2 * x) + (4 * z)
          |y<=z && z<=x = y + (2 * z) + (4 * x)
          |z<=x && x<=y = z + (2 * x) + (4 * y)
          |z<=y && y<=x = z + (2 * y) + (4 * x)

        f124 :: Integer -> Integer -> Integer -> Integer
        f124 x y z = f124naiv x y z

Shame on that person! Needless to say, the MC does not let himself be fooled by the wrong placement of the opening {-WETT-} tag.

Die Ergebnisse der zweiten Woche

First, an important announcement: In consultation with the Meta-Master (Prof. Nipkow), the MC has decided to extend the Top 20 to a Top 30 and retrospectively grant 10 additional points to the winners of the first week's contest.

Top 30 der Woche
  Platz Wettbewerber(in)   Token   Lösung (Kurzfassung)   Punkte
1. Maximilian Kirchmeier 22 par=lgirf(wa=e0(sr+r)) 30
2. Simon Roßkopf 23 par=a(=)rf(wac[0.r+sr]) 29
3. Hauke Brinkop 26 hk=k=rkpar=fh$wac[0.2*r+1] 28
Julian Biendarra 26 ix=x=rxpar=fi$wac[0.r*2+1] 28
5. Daniel Stüwe 27 par=f((=)ar)$w(na)<e0$2*r+1 26
6. Florian Dreier 28 fa=a=rapar=ff$c(wa)[0.r*2+1] 25
Clemens Jonischkeit 28 par=[x|i<e0$r+sr,x<wai,rx=x] 25
Michael Schreier 28 par=[x|y<e0$sr+r,x<way,x=rx] 25
Stefan Peter Dirix 28 par=fp$c$wam[0.r*2+1]px=x=rx 25
10. Maximilian Haslbeck 29 par=[a|a<c$wam[0.r*2+1],ra=a] 21
Florian Weinberger 29 par=[x|x<c(wa)[0.2*r+1],x=rx] 21
Hannes Bibel 29 par=s(cl)[w+a+rw|a<wa1,w<war] 21
13. Jonathan Kienzle 30 par=[w|n<[0.2*r+1],w<wan,w=rw] 17
Roland Schmid 30 par=[w|n<[0.2*r+1],w<wan,w=rw] 17
Dominik Dechamps 30 par=[w|r<[0.2*r+1],w<war,w=rw] 17
Alexander Weidinger 30 par=[w|x<[0.r*2+1],w<wax,w=rw] 17
Maximilian Weininger 30 par=[x|n<[0.2*r+1],x<wan,x=rx] 17
Sönke Erfkamp 30 par=[y|x<[0.r*2+1],y<wax,y=ry] 17
19. Tobias Gottwald 31 ix=x=rxpar=fi$c$zw(ra)[0.r*2+1] 12
Lars Wüstrich 31 par=[x|x<n$c(wa)[0.2*r+1],x=rx] 12
Martin Mihaylov 31 par=c[ifwan|n<[0.2*r+1]]iw=w=rw 12
Lawrence Krug 31 par=f(\x>rx=x)$n$c(wa)[0.2*r+1] 12
Bogdan Iacob 31 par=n[x|x<[0.2*r+1],x<wax,x=rx] 12
24. Gernot Brunner 32 par=[x|n<[0.(r*2+1)],x<wan,x=rx] 7
Benjamin Schagerl 32 par=f(\x>x=rx)$c[war|r<e0$r*2+1] 7
26. Zhechko Zhechev 33 par=[x|x<c[war|r<[0.2*r+1]],x=rx] 5
Konstantin Bauer 33 par=[y|y<c(\x>wax)[0.2*r+1],y=ry] 5
Daniel-Theodor Plop 33 par=c[f(\x>x=rx)$war|r<[0.2*r+1]] 5
Matthias Grimm 33 par=n["+x|z<[0.2*r+1],x<waz,x=rx] 5

This week's Top 30 has a bit more diversity in it than last week's. There are more equivalence classes, which suggests that Gruppenarbeit (which is durchaus erlaubt) was not quite as prominent as last week. But in essence, the top solutions are all based on the same idea, namely that the simplest way to generate all palindromes of radius n is to generate all words of length at most 2n + 1 and filter out the words w that do not satisfy the property w == reverse w.
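For readers without the exercise sheet at hand: wordsOfLength comes from the assignment itself. Here is a minimal sketch of one way it could be defined (the official definition may differ; the nub follows Herr Bergmaier's assumption, quoted further down, that the function guarantees duplicate-freeness):

        import Data.List (nub)

        -- All words of exactly length n over the given alphabet.
        wordsOfLength :: [Char] -> Integer -> [[Char]]
        wordsOfLength _ 0 = [""]
        wordsOfLength alphabet n =
          [c : w | c <- nub alphabet, w <- wordsOfLength alphabet (n - 1)]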

Here is the top solution, with 22 tokens:

        palindromesOfRadius alpha radius =
          liftM2 geq id reverse `filter` (wordsOfLength alpha =<< enumFromTo 0 (succ radius+radius))

This solution is abbreviated as par=lgirf(wa=e0(sr+r)) in the table above, by taking one letter for each token. The MC normally likes to explain solutions that use constructs that will only be seen later in class, but here the liftM2 construct is esoteric enough that he will refrain from doing so and focus on simpler solutions instead. The same comment applies to

        palindromesOfRadius alphabet radius =
          ap (==) reverse `filter` (wordsOfLength alphabet `concatMap` [0..radius + succ radius])

In short, if you find yourself importing any of the modules GHC.Exts, Control.Arrow, and Control.Monad, chances are that (1) you will win the competition and (2) the MC will call your solution cryptic.

Thankfully, the third place is occupied by a less cryptic entry:

        h k = k == reverse k
        palindromesOfRadius a r = filter h $ wordsOfLength a `concatMap` [0..2*r+1]

First, let's rewrite the right-hand side of the main function's definition to make it a bit more readable:

        filter h (concatMap (wordsOfLength a) [0 .. 2 * r + 1])

concatMap is a higher-order function, that is, a function that takes a function as argument. The function it takes as argument is wordsOfLength a. That's right: Even though wordsOfLength is supposed to take two arguments, it's OK to call it with a single argument. In that case it returns a function that takes a second argument, and that function is passed to concatMap.
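Partial application deserves a tiny standalone example (add and plusOne are made-up names for this illustration, not part of the exercise):

        add :: Int -> Int -> Int
        add x y = x + y

        plusOne :: Int -> Int
        plusOne = add 1    -- applying add to one argument yields a function

        -- plusOne 5 == 6, and map (add 10) [1, 2, 3] == [11, 12, 13]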

In general, concatMap f [x1, x2, ..., xn] (where f is a function that returns a list and [x1, x2, ..., xn] is an n-element list) returns the list f x1 ++ f x2 ++ ... ++ f xn. So in the above case, we get

        filter h (wordsOfLength a 0 ++ wordsOfLength a 1 ++ ... ++ wordsOfLength a (2 * r + 1))

(except that Haskell doesn't support the ... notation). Finally, the filter call takes a predicate (a function that returns Bool) and filters out all the elements that do not satisfy the predicate. Here, the predicate is h, and it is defined as h k = k == reverse k.

This use of filter and concatMap is anticipating material covered later in the lectures. An alternative that is just as powerful is to use list comprehensions, as in this 28-token solution:

        palindromesOfRadius alphabet radius =
          [x | i <- enumFromTo 0 $ radius + succ radius, x <- wordsOfLength alphabet i, reverse x == x]

What about the MC's solution? Well, even though the instructions were clear that token count is what matters, he couldn't resist writing an efficient implementation. That cost him 49 (!) tokens:

        palindromesOfRadius :: [Char] -> Integer -> [[Char]]
        palindromesOfRadius alphabet radius =
            (if radius == 0 then [] else palindromesOfRadius alphabet $ radius - 1) ++
            [w ++ mid ++ reverse w | mid <- "" : map (: "") (nub alphabet),
             w <- wordsOfLength alphabet radius]

Some of the MC's friends also participated in the competition on an informal basis. (Actually, the whole competition is informal, but these people are not eligible for the trophies.) First, here's a 27-token solution by Manuel Eberl, who was a tutor last year:

Puisque Lorenz Panny participe à la compétition maintenant, je pense que je pouvais le faire aussi. :) C'est ma solution avec 27 tokens:
        import Control.Monad
        import Data.List
        import Data.Function.Predicate

        palindromesOfRadius :: [Char] -> Integer -> [[Char]]
        palindromesOfRadius xs n =
          filter (reverse >>= equals id) $ [0..2*n+1] >>= sequence . flip genericReplicate xs
(Il y a une version plus courte (25 tokens) avec "replicateM" si on a "Int" au lieu de "Integer", mais on ne l'a pas) Je pense qu'il a besoin d'une version actuelle de GHC, parce qu'il utilise l'instance de "Monad" pour ((->) [a]). Ma version est 7.6.2, je sais que ça ne marche pas avec version 7.4.2.

Then the MC got this email from Lorenz Panny (the winner of last year's competition):

PS: Achja, und bevor ich's vergesse ;-)...: (19 Tokens, aber mit bösem GHC.Exts-Import für plusInteger)
        palindromesOfRadius as = filter (isPrefixOf <*> reverse) . wordsOfLength as
         <=< enumFromTo 0 <<< plusInteger <*> succ
(20 Tokens, dafür aber "sauber" (naja...) mit Standardbibliotheken (Data.List, Control.Monad, Control.Applicative, Data.Bits) -- man beachte "flip subtract <*> complement" für "λn. 2n+1")
        palindromesOfRadius as = filter (isPrefixOf <*> reverse) . wordsOfLength as
         <=< enumFromTo 0 <<< flip subtract <*> complement
(und noch eine "konstruktive" (d.h. nur Palindrome bauen, statt alle Nicht-Palindrome herauszufiltern) Lösung mit sehr viel Arrow- und Monad-Magie -- diese Lösung ist nur geringfügig auf Tokenzahl optimiert, da das ohnehin aussichtslos schien :-))
        palindromesOfRadius as = app <=< uncurry mplus . unzip
         . map (fmap return &&& fmap ((`map` nub as) . flip (:))
           <<< map . mplus &&& reverse) . wordsOfLength as <=< enumFromTo 0

And finally, Andreas Bergmaier, who ended up in fifth place last year because of some bad luck (he is truly number 2 in the MC's heart), felt that he had to give Herr Panny a run for his money:

Auch als Tutor tüftel ich gern wieder bei den Wettbewerbsaufgaben mit. Diese Woche bin ich bis auf 24 Token heruntergekommen, mit der ziemlich ineffizienten Lösung
        palindromesOfRadius alp r = ap (==) reverse `filter` ([0..2*r+1] >>= wordsOfLength alp)
(davon ausgehend, dass nub innerhalb von wordsOfLength verwendet wird, um die Duplikatfreiheit zu garantieren). Sie ist äquivalent zu der einfachen List-Comprehension-Lösung
        palindromesOfRadius alp r = [ p | n<-[0..2*r+1], p<-wordsOfLength alp n, reverse p == p]
Eine effizentere Lösung mit detaillierter Konstruktion der Palindrome habe ich nur in 38 Tokens geschafft:
        palindromesOfRadius alp = genericIndex $ iterate help enalp
         where
          help short = enalp ++ do -- prepend empty word and alphabet to
           p <- short               -- for each shorter (previous) solution
           c <- tail enalp           -- and everything from alphabet (but the empty word)
           return $ c++p++c           -- wrap it between the chars
          enalp = "" : group (nub alp) -- "extended nubbed alphabet", vom Typ [String]
Die do-Notation ist tatsächlich 1 Token kürzer als [c++p++c | p <- short, c <- tail enalp] :-)

To quote the Meta-Master: Diese Jungs haben zu viel Energie.

What are this week's lessons?

The first lesson is that Competition is addictive and may have a detrimental effect on your other interests. In the worst-case scenario, you may find yourself addicted for life, just like Herr Panny, Herr Eberl, and Herr Bergmaier, and periodically write emails to a tragicomical figure who calls himself MC Hammer, not to mention the harm caused by constantly refreshing wettbewerb.html, desperately waiting for the next fix.

The second lesson is that Haskell provides many different means to attack problems at an abstract level. Some of these are way beyond the scope of this course, and can lead to cryptic code, but they have nonetheless legitimate uses.

The third lesson is that Simplicity and efficiency are hard to achieve simultaneously. The top solutions had to give up on long examples like saippuakauppias (the Finnish soap merchant). Efficient solutions take about twice as many tokens, because wordsOfLength cannot be reused. In practice, we often have to choose between simplicity and efficiency, and we must use our wits to determine which is more important for a specific application. Even an algorithm guru like Donald Knuth uses inefficient procedures in some of his programs (see paragraph 22).

There's a really terrible language out there that calls itself Perl and whose main claim to fame is that in that language, TMTOWTDI. If the competition has shown anything, it must be that this is even more true of Haskell.

Finally, you can Write emails to the MC in any language supported by Google Translate. And you can use du or Du or Sie or tu or vous or whatever and he'll try to answer in the same style. (Knigge & Co. have been on a slippery slope since the day a journalist said du to the Swedish prime minister back in the 1970s. The barbarians are at the gate.)

Die Ergebnisse der dritten Woche

Here are the results for the third week. The top 30 is effectively a top 43, due to a slew of 14 competitors with nearly identical solutions occupying the last step in the stairway to heaven.

Top 30 der Woche
  Platz Wettbewerber(in)   Lösung (Kurzfassung)   Token   Punkte
1. Clemens Jonischkeit e=d.fd"dk=m(lzk") 17 30
Simon Roßkopf e=d.fd"dk=m(lzk") 17 30
3. Maximilian Kirchmeier e=m.fl.z"d=m.fl.fz" 19 28
4. Daniel Stüwe e=e"d=fe"ekk=c.m(fl$zkk) 24 27
Benjamin Schagerl ekc=m(lz"k)cdkc=m(lzk")c 24 27
6. Aleksander Umov ekc=m(k!)$c>fe"dkc=ekic!2 25 25
7. Alexander Weidinger ek=mhWha=gk$oa-9dkt=ekitg2 26 24
Hauke Brinkop ek=mhWha=gk$oa-9dkt=ekitg2 26 24
9. Maximilian Haslbeck ek=m$gk.s9.odk=m$c.s1.h.fek 27 22
10. Stefan Peter Dirix ek=m$gk.s9.odk=m$c.(+9).f.fek 29 21
Julian Biendarra ek=m$gk.s9.odk=m$c.(9+).h.fek 29 21
12. Michael Schreier ekc=dk['.]dcdkc=[c$9+h(eak)|a<c] 32 19
13. Moritz Sichert ek=m$\c>k!(oc-9)dk=m$\c>c$9+h(eck) 34 18
Florian Weinberger ek=m$\l>gk$ol-9dk=m$\l>c$9+(h$elk) 34 18
Atanas Mirchev fabw=c[lx$zab|x<w]ekc=f"kcdkc=fk"c 34 18
16. David Otter ek=m(\c>k!(oc-9))dk=m(\c>c$h(eck)+9) 36 15
Julien Schmidt ek=m(\x>k!(ox-9))dk=m(c.(+9).h.(ek)) 36 15
18. Konstantin Bauer ekc=m((!)k.s9.o)cdkc=[c$9+h(exk)|x<c] 37 13
Lyubomir Stoykov ekl=(!)km(s9mmol)dkl=mc[9+h(etk)|t<l] 37 12
20. Vladislav Stepa ek=_k(sk)dk=_(sk)k_kkt=[k!(h$exk)|x<t] 38 11
Marco Probst ekc=[gk$ocm9|c<c]dkc=[c$9+(h$eck)|c<c] 38 11
Valentin Zieglmeier ekc=[k!(oc-9)|c<c]dkc=[c$9+h(eck)|c<c] 38 11
Jens Wöhrle ekc=[k!(oc-9)|c<c]dkc=[c$h(eck)+9|c<c] 38 11
Barbara Moser ekc=[k!(oc-9)|c<c]dkc=[c(eck!0+9)|c<c] 38 11
Christian Ziegner ekc=[k!(ox-9)|x<c]dkc=[c$9+h(exk)|x<c] 38 11
Maximilian Pudelko ekc=[k!(ox-9)|x<c]dkc=[c$9+h(exk)|x<c] 38 11
Martin Hartmond ekc=[k!(ox-9)|x<c]dkc=[c$h(exk)+9|x<c] 38 11
Amar Saljic ekc=[k!s9(oa)|a<c]dkc=[c$9+h(eak)|a<c] 38 11
Thomas Pettinger ekk=[k!(ox-9)|x<k]dkc=[c$9+h(exk)|x<c] 38 11
30. Roland Schmid ekc=[k!(oa-o')|a<c]dkc=[c$o'+eak!0|a<c] 39 1
Thomas Engel ekc=[k!(oc-9)|c<c]dkc=[c$9+(h$eck)|c<c] 39 1
Axel Fischer ekc=[k!(oc-9)|c<c]dkc=[c(9+h(eak))|a<c] 39 1
Kwon-Jin Jensen ekc=[k!(oc-9)|c<c]dkc=[c(h(eck)+9)|c<c] 39 1
Maximilian Mumme ekc=[k!(ol-9)|l<c]dkc=[c$(h$elk)+9|l<c] 39 1
Adrian Thilo ekc=[k!(ol-9)|l<c]dkc=[c(h(elk)+9)|l<c] 39 1
Thorsten Fuchs ekc=[k!(ox-9)|x<c]dkc=[c$9+(exk)!0|x<c] 39 1
Alexander Christian Hefele ekc=[k!(ox-9)|x<c]dkc=[c$9+(h$exk)|x<c] 39 1
Ludwig Peuckert ekc=[k!(ox-9)|x<c]dkc=[c(h(exk)+9)|x<c] 39 1
Konrad Weiss ekc=[k!(ox-9)|x<c]dkc=[c(h(exk)+9)|x<c] 39 1
Bogdan Iacob ekc=[k!(ox-9)|x<c]dkc=mc[(h$exk)+9|x<c] 39 1
Mihai Dutescu ekc=[k!(s9$ox)|x<c]dkc=[c$h(eyk)+9|y<c] 39 1
Zhechko Zhechev ekc=m(\x>k!(ox-9))cdkc=[c$h(eyk)+9|y<c] 39 1
Lukas Holzner ekk=[k!(ox-9)|x<k]dkg=[c(h(exk)+9)|x<g] 39 1

In conjunction with global warming, nonlinear effects have often been mentioned in the media. Now, the MC is no climatologist, but whatever nonlinear effects affect the climate, they are certainly dwarfed by what we observe in the bottom half of the table: A one-token increase (from 38 to 39) results in a 91% drop (from 11 to 1) in the score! Clearly, every token counts. (The same can unfortunately not be said of votes.)

To resolve the ties, the MC is entitled to resort to the Nebenkriterium, namely the readability (“Lesbarkeit, wobei Formatierung und Namensgebung besonders gewichtet werden”). By now it should be clear that this criterion is just an empty threat. Nevertheless, please continue to keep this in mind when writing code. Several solutions this week came with detailed comments, which makes for nice bedtime reading.

All right, let's move to the top solution, with 17 tokens:

        --in kooperation mit Simon Rosskopf
        encode = decode . flip decode "abcdefghijklmnopqrstuvwxyz"

        -- by decoding the key with the alphabet you get the inverse key
        -- encode key text = decode (inverse key) text
        decode key = mapMaybe (`lookup` zip key "abcdefghijklmnopqrstuvwxyz")

        -- No black magic in use! Due to lookup returns Maybe, mapMaybe is used, but it works similar to a normal map

The first comment, “in kooperation mit Simon Rosskopf,” is a frank acknowledgment that shows intellectual integrity. MC likes! College may feel like an artificial setting, but sooner than you expect you will find yourself in a situation where forgotten acknowledgments will cause you serious trouble. Here it would be nice to have a quote from Donald Knuth (the MC's idol) to emphasize this point, but we are lucky enough to have the second best thing: A quote about Donald Knuth!

Full of youthful enthusiasm at being able to communicate improvements on a previously published algorithm (Don was a Junior then), he failed to mention his co-authors in the paper; Don did the writing but other students contributed illustrations and most of the ideas of the algorithm. At the time he had no notion there was academic prestige to be gained through publication, Don confessed. This is, he said, a common mistake among young authors who frequently overlook proper acknowledgements in their haste to get the news out.

(Source: Section 18 of Mathematical Writing.)

Let's close this parenthesis and move on to the solution itself. Let's first massage it a little bit. The original version uses a few cheap tricks, notably the function composition operator (.), to save tokens, but here's the essence:

        encode key cleartext = decode (decode key "abcdefghijklmnopqrstuvwxyz") cleartext
        decode key cryptotext = mapMaybe (\c -> lookup c (zip key "abcdefghijklmnopqrstuvwxyz")) cryptotext

As the comment in the code suggests, decoding the string abc...z with the key gives the inverse key, and decoding with the inverse key is the same as encoding with the original key. This works because (well-formed) keys are permutations and permutations are invertible.
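A toy version over a three-letter alphabet makes the trick tangible. (The names decode3 and encode3 are made up for this example; the real solution works with the full alphabet.)

        import Data.Maybe (mapMaybe)

        decode3, encode3 :: String -> String -> String
        decode3 key = mapMaybe (`lookup` zip key "abc")
        encode3 key = decode3 (decode3 key "abc")

        -- With the key "bca" (a -> b, b -> c, c -> a):
        --   decode3 "bca" "abc" == "cab"    -- the inverse key
        --   encode3 "bca" "ab"  == "bc"
        --   decode3 "bca" (encode3 "bca" "cab") == "cab"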

The next solution, at 19 tokens, has some of the same flavor, but no attempt is made to reuse decode to implement encode (or vice versa):

        encode = mapMaybe . flip lookup . zip "abcdefghijklmnopqrstuvwxyz"
        decode = mapMaybe . flip lookup . flip zip "abcdefghijklmnopqrstuvwxyz"

Let's jump directly to Herr(n) Umov in 6th place and his mathematically inspired solution:

        encode key cleartext = map (key !!) $ cleartext >>= flip elemIndices "abcdefghijklmnopqrstuvwxyz"
        decode key cryptotext = encode key `iterate` cryptotext !! 26771144399

What's going on? The encoding is a bit odd, but what's really interesting is the decoding. Decoding the text is the same as encoding it 26771144399 times! There were three solutions with the magic constant 26771144399 appearing in them. Herr Umov's solution had this explanation:

        {-
        The encoding algorithms repeats itself after n times, while n = length key.
        Because of that, if you want to decode a given cryptotext, you can just keep encoding the cryptotext n times and you will have the same cryptotext again.
        But if you take the result of the (n-1)th encoding, you will get the original cleartext.
        So a simple solution would be to write:
                decode key cryptotext = encode key `iterate` cryptotext !! (length key - 1)
        However, this took up too many tokens. Thus, I wrote "!! 26771144399", since 26771144400 is the smallest number which is dividable by all numbers from 1 to 26.
        The downside of this solution is that the evaluation takes up quite much time with current CPUs.
        -}

The explanation is a bit confused. In particular, the claim that “the encoding algorithms repeats itself after n times” does not quite hold, for reasons that will become clear below.

An alternative, more correct explanation, this time in German (mostly):

        {- Positive Einflussfaktoren auf die Lösung ->
         - * ( Bier & Kaffee ... just being honest ;) )
         - * Diskrete Strukturen
         - * Symmetrisches Verschlüsselungsverfahren
         -
         - Die Zahl '26771144399' entspricht dem kgV der Zahlen 1 bis 26, -1.
         - Warum kgV 26? 26 Buchstaben des Alphabets und
         -   somit potentielle Zyklen der Länge 1 bis maximal 26.
         - Warum -1? Durch Verwendung der encode Methode zum Entschlüsseln
         -   haben wir den Text bereits einmal verschlüsselt und
         -   würden somit beim entschlüsseln eine Stelle "zu weit entschlüsseln".
         -}

And yet another explanation–the promise of immortality on the MC's weblog seems to arouse the didactic impulse in the competitors!

        {-
         - Ich nehme an, die erste Frage, die sich auftut, wenn man diesen Code liest, ist "WTF sucht diese Zahl da?!?!"
         - Die Antwort dazu ist ein wenig kompliziert, vorallem, weil man es empirisch nicht zeigen kann (da Haskell
         - der Speicher ausgeht >.<).   [Anmerkung: Eigentlich hatte ich 26!-1, es ist also schon auf Effizienz optimiert xD]
         - Zuersteinmal: Was ist das für eine Zahl? Diese Zahl ist das Ergebnis des Ausdruckes
         - kgv `foldr1` [1..26] - 1
         - Also das kleinste gemeinsame Vielfache aller Zahlen bis 26 abzüglich der Konstante 1
         - Warum 26? Weil das die maximale Alphabetlänge ist und damit auch die maximale Länge der Zyklen in der Permutation
         - (also im Key!)
         - Warum KGV? Eine Permutation der Länge l lässt sich durch Zyklen ausdrücken, deren Längen summiert <= 26 sind
         - Für jeden Zyklus z der Länge l gilt: z^l = id und für 2 Zyklen z1 z2 der Längen l1 l2 (z1 . z2)^(l1*l2) = id
         - da jede Zykluslänge <= 26 ein Teiler des KGVs ist, gilt hier: (z1 . z2 ...)^(l1*l2*...*k) = id^k = id
         - wobei k eine natürliche Zahl ist.
         - Warum -1? Eigentlich ist das Ziel ja, die Abbildung "id" zu erzeugen, ausgehen vom unkodierten Wort. Da es
         - aber bereits einmal kodiert wurde, sprich jeder Zyklus 1 mal angewandt, muss eben dieser abgezogen werden,
         - damit diese Funktion eine echte Decodierung durchführt.
        -}

What about the Jungs who have too much Energie? No worries, they're doing fine.

Lorenz Panny (the eternal winner from last year) also discovered this solution–which just goes to show that new generations aren't necessarily smarter than old ones ;). He wrote a lengthy description on his web page.

Manuel Eberl (tutor emeritus) has an even less efficient solution. Instead of taking the “kgV” (whatever that acronym stands for) of 26, he simply took the factorial (= die Fakultät) of 26, also minus 1.
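Whatever kgV stands for, the magic constant itself is easy to double-check in GHCi, since the kgV of 1, ..., 26 is simply foldr1 lcm [1..26] in Haskell:

        ghci> foldr1 lcm [1..26]
        26771144400
        ghci> foldr1 lcm [1..26] - 1
        26771144399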

And Andreas Bergmaier (the 5th place from last year, but #2 in the MC's heart) has a 21-token solution:

        encode = codeHelp (=<<)
        decode = codeHelp ap
        codeHelp dir = mapMaybe . flip lookup . dir zip sort

He apologized thusly for not winning: Hab allerdings nur eine Dreiviertelstunde investiert, also erwarte nicht zu viel :-)

So what are the lessons to learn this week?

In the first week, we saw that Mathematics can be used to shorten programs. This was confirmed this week. The top solution exploited algebraic properties of permutations, and the five solutions built on kgV(1, ..., 26) − 1 or 26! − 1 were basically leveraging group-theoretic results.

But even without mathematics, short solutions were also possible for those who prefer to wear a computer scientist's hat. The third place is taken by a solution that requires no algebraico-group-theoretical mojo.

Perhaps the main lesson is: If you have to perform two (or more) similar tasks, see if there's a way to reuse one to implement the other. This is true in a variety of domains; for example, there exist frameworks that combine parsing and pretty-printing into a single grammar, potentially saving half the work, and half the bugs! The code that doesn't get written doesn't come back to haunt us later.

Die Ergebnisse der (vierten und) fünften Woche(n)

Finally, here are the results of the combined fourth/fifth weeks' contest!

Top 30 der Woche
  Platz Wettbewerber(in)   Punkte
1. Stefan Peter Dirix 60
2. Florian Dreier 58
3. Kevin Yu 56
4. Maximilian Haslbeck 56
5. Daniel Stüwe 52
6. Atanas Mirchev 50
Lyubomir Stoykov 50
Julian Biendarra 50
Lukas Michael Stumberg 50
Michael Schreier 50
Simon Roßkopf 50/2
12. Sebastian Peter Lehrbaum 38
Mihai Dutescu 38/2
14. Fabian Volz 34
Jens Wöhrle 34
Moritz Sichert 34
David Gaßmann 34/2
18. Christian Haba-Schneider 26
Nikita Basargin 26
Zhechko Zhechev 26
21. Christian Ziegner 20
David Otter 20
Johannes Ismair 20
Martin Helfrich 20
Martin Mihaylov 20
Ludwig Peuckert 20/2
27. Arved Baus 8
Florian Weinberger 8
Hauke Brinkop 8
Johannes Klicpera 8
Victor Seifert 8/2

Let us start with some practical matters. First, some competitors have a penalty of /2. This penalty is taken into consideration in the Top 30 des Semesters. The reason is given below. Second, following feedback from a competitor, the MC wanted to introduce a new scoring scheme, whereby solutions detected to be Gruppenarbeit count as one solution, so as not to penalize the solutions below them in the table; but this week the MC could not detect any duplicates, so it doesn't change anything one way or the other.

This somewhat longer problem attracted 43 nontrivial submissions (i.e., solutions with the {-WETT-}...{-TTEW-} tags that were not undefined) that passed the QuickCheck tests. In addition, there is one solution by Ex-Tutor Manuel Eberl and one by the MC. The lower number of competitors probably reflects the difficulty of the task.

To rank the solutions, the MC appealed to the following very nice program generously provided by Herr Eberl (and very slightly adapted by the MC):

        generateDistanceTable :: Int -> [(String, Integer, String)]
        generateDistanceTable n =
          [(show i, dist i j, show j) | i <- [0..n-1], j <- [0..n-1], i /= j]
          where g = mkStdGen 0
                rs = randomRs (-1000000000000, 1000000000000) g
                positions = V.unfoldrN n (\(x:y:rs) -> Just ((x,y),rs)) rs
                sqr x = x * x
                dist i j = case (positions V.! i, positions V.! j) of
                             ((x1,y1), (x2,y2)) -> floor (sqrt (sqr (x2-x1) + sqr (y2-y1)) :: Double)

The program randomly distributes n cities called 0, 1, etc., on a Cartesian plane and then computes the Euclidean distances. Normally the result should be a sane table, but due to the use of floating-point arithmetic there are no guarantees. (Using Manhattan distances could have solved this problem.)
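Out of curiosity, here's what the Manhattan variant of the dist function above might look like; only this one function changes, and since everything then stays in exact Integer arithmetic, the triangle inequality holds without any floating-point surprises:

                dist i j = case (positions V.! i, positions V.! j) of
                             ((x1,y1), (x2,y2)) -> abs (x2 - x1) + abs (y2 - y1)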

Six of the solutions returned False for the case where n = 10, so these were eliminated right away. Five of these six programs failed for the (perfectly sane) distance table

        [                ("A", 17, "B"), ("A", 17, "C"), ("A", 9, "D"),
         ("B", 17, "A"),                 ("B", 8, "C"),  ("B", 8, "D"),
         ("C", 17, "A"), ("C", 8, "B"),                  ("C", 9, "D"),
         ("D", 9, "A"),  ("D", 8, "B"),  ("D", 9, "C")                 ]

possibly because they used strict comparisons (< bzw. >) instead of nonstrict ones (<= bzw. >=). Indeed, the triangle inequality holds in this table with equality in one spot: dist(A, B) = 17 is exactly dist(A, D) + dist(D, B) = 9 + 8, so a check that demands a strictly shorter direct route wrongly rejects the table. If your name is not in the table and your solution is faster than the naive solution, you might want to check this example.

In addition, five solutions failed on the corner case []: four crashed, one returned False. Strictly speaking, they should be ruled out; but since it's a two-week contest with twice the amount of points, this seems a little harsh for a corner case. Hence, these solutions are ranked as usual, but their score is divided by 2.

This leaves us with 32+5+2 competitors (normal + buggy for [] + nonstudents).

The first rounds were evaluated with a timeout of 10 s.

Round 1: n = 10
Leaves 31+5+2 participants.

Round 2: n = 15
Leaves 26+5+2 participants.

Round 3: n = 20
Leaves 22+4+1 participants. (The MC's solution was hardly optimized at all, so this is where he “flies out.”)

Round 4: n = 25
Leaves 17+3+1 participants.

Round 5: n = 30
Leaves 14+3+1 participants.

Round 6: n = 50
Leaves 11+2+1 participants.

At this point the timeout is raised to 20 s.

Round 7: n = 100
Leaves 10+1+1 participants.

Round 8: n = 100, but with 100 repetitions of the table
Leaves 5+0+1 participants.

At this point the timeout is cranked up to 60 s, and the absolute times give the top 5 rankings

Round 9: n = 100

         8.03 s   Stefan Peter Dirix
        10.68 s   Florian Dreier
        12.59 s   Manuel Eberl
        13.41 s   Maximilian Haslbeck
        18.81 s   Kevin Yu
        42.40 s   Daniel Stüwe

Update: Looking at the results with disbelief, Manuel Eberl deduced that the MC had run the programs in ghci. Drastically different results are obtained using ghc -O3. In addition, Max Kirchmeier had sent an alternative IO-monad-based version by email, but the MC forgot to run it. So here are the results once again:

         0.19 s  Manuel Eberl
         0.28 s  Stefan Peter Dirix
         0.29 s  Florian Dreier
         0.49 s  Kevin Yu
         0.92 s  Maximilian Haslbeck
         1.81 s  Daniel Stüwe
        23.93 s  Maximilian Kirchmeier

And for good measure, with n = 300:

         1.32 s  Manuel Eberl
         3.83 s  Stefan Peter Dirix
         8.75 s  Florian Dreier
        11.72 s  Kevin Yu
        16.47 s  Maximilian Haslbeck
        17.63 s  Daniel Stüwe
            ∞ s  Maximilian Kirchmeier

For the competition, this means Herr Yu and Herr Haslbeck have exchanged positions. But the MC is not keen on subtracting points he has already given out, so he's going to give Herr Yu his additional 2 points without subtracting them from anybody else. End of update.

So what do the solutions look like? Let's have a look at Herr Dirix's:

        data Node = Node String Int (Map.Map String Integer) deriving (Eq, Ord)
        type NodeMap = Map.Map String Node

        isDistanceTableSane :: DistanceTable -> Bool
        isDistanceTableSane tab = checkTab tab /= Nothing
                where checkTab tab = do 
                        map <- buildMap tab Map.empty
                        checkCompleteness map
                        checkTriangles (Map.elems map)

        buildMap :: DistanceTable -> NodeMap -> Maybe NodeMap
        buildMap [] map = Just map
        buildMap (t:tab) map = let newMap = handleEdge t (Just map) in 
                               case newMap of Nothing -> Nothing
                                              Just newMap -> buildMap tab newMap

        addNode :: String -> NodeMap -> NodeMap
        addNode name map = Map.insert name (Node name 0 Map.empty) map

        handleEdge :: (String, Integer, String) -> Maybe NodeMap -> Maybe NodeMap
        handleEdge _ Nothing = Nothing
        handleEdge (nameA, distance, nameB) (Just map)
                | distance < 0 = Nothing
                | nameA == nameB = Nothing
                | otherwise = do
                        (nodeA, map1) <- getNode nameA map
                        (nodeB, map2) <- getNode nameB map1
                        addEdge nodeA nodeB distance map2

        getNode :: String -> NodeMap -> Maybe (Node, NodeMap)
        getNode name nodeMap = case node of Nothing -> Just (initNode, Map.insert name initNode nodeMap)
                                            Just node -> Just (node, nodeMap)
                where node = Map.lookup name nodeMap
                      initNode = (Node name 0 Map.empty)

        checkCompleteness :: NodeMap -> Maybe Bool
        checkCompleteness nodeMap
                | allLengthsOk = Just (True)
                | otherwise = Nothing
                where nodes = Map.elems nodeMap
                      lengthTarget = length nodes - 1
                      allLengthsOk = and $ map (\(Node name n distanceMap) -> n==lengthTarget) nodes

        addEdge :: Node -> Node -> Integer -> NodeMap -> Maybe NodeMap
        addEdge (Node nameA nA mapA) (Node nameB nB mapB) distance map = do
                updatedNodeA <- updateNode (Node nameA nA mapA) nameB distance
                updatedNodeB <- updateNode (Node nameB nB mapB) nameA distance
                Just (Map.adjust (\_->updatedNodeB) nameB $ Map.adjust (\_->updatedNodeA) nameA map)
        

        updateNode :: Node -> String -> Integer -> Maybe Node
        updateNode (Node name n map) nameB distance = 
                case oldDistance of Nothing -> Just (Node name (n+1) newMap)
                                    Just oldDistance -> if (oldDistance /= distance) then Nothing else Just (Node name n map)
                where (oldDistance, newMap) = Map.insertLookupWithKey (\key oldVal newVal -> newVal) nameB distance map

        checkTriangles :: [Node] -> Maybe Bool
        checkTriangles [] = Just (True)
        checkTriangles (node:nodes)
                | checkTrianglesForNode node nodes = checkTriangles nodes
                | otherwise = Nothing
        
        checkTrianglesForNode :: Node -> [Node] -> Bool        
        checkTrianglesForNode node nodes = and [checkTrianglesForNodes node nodeB nodeC | nodeB <- nodes, nodeC <- nodes, nodeB < nodeC]

        checkTrianglesForNodes :: Node -> Node -> Node -> Bool
        checkTrianglesForNodes (Node nameA nA mapA) (Node nameB nB mapB) (Node nameC nC mapC) = 
                (distAB <= distBC + distAC) && (distBC <= distAB + distAC) && (distAC <= distBC + distAB)
                where distAB = fromJust $ Map.lookup nameB mapA
                      distBC = fromJust $ Map.lookup nameC mapB
                      distAC = fromJust $ Map.lookup nameC mapA

The code adheres to very nice stylistic conventions, but it's still highly nontrivial to understand how it works. But to quote Frederick Brooks (1975),

Show me your flowcharts and conceal your tables, and I shall continue to be mystified.
Show me your tables, and I won't usually need your flowcharts; they'll be obvious.

Using modern terminology, we would probably phrase it more like this:

Show me your algorithms and conceal your data structures, and I shall continue to be mystified.
Show me your data structures, and I won't usually need your algorithms; they'll be obvious.

Here, the data structures are helpfully introduced at the very top:

        data Node = Node String Int (Map.Map String Integer) deriving (Eq, Ord)
        type NodeMap = Map.Map String Node

A Node represents a row (or column) in the distance table. The first argument is the source city, the second argument the number of cities with which the city is connected, and the third argument is a map from target cities to distances. Perhaps the name DistanceTableRow would have been more illuminating.

Then a NodeMap represents an entire distance table. A map is an associative data structure that stores key–value pairs and that allows us to perform quick lookups on the keys.
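For readers who haven't met Data.Map yet, here's the flavor of the operations the solution relies on (a toy example, not from the submission):

        import qualified Data.Map as Map

        distances :: Map.Map String Integer
        distances = Map.insert "Garching" 17 (Map.singleton "München" 9)

        -- Map.lookup "Garching" distances == Just 17
        -- Map.lookup "Augsburg" distances == Nothing
        -- Lookups and insertions take logarithmic time.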

Once we understand this much, the algorithms are indeed quite natural.

Here's the solution by Herr Dreier:

        isDistanceTableSane :: DistanceTable -> Bool
        isDistanceTableSane table = and [ dis>=0 && a/=b | ((a,b),dis)<-distList ] && vollstaendig && triangle_rule
                where
                        (froms,dists,tos) = unzip3 table
                        cities_list = nub $ froms++tos -- contains all city names of the table
                        cities_len = length cities_list
                        cities_map = fromList $ zip cities_list [1..cities_len] -- maps city name to index
                        createMapRek :: DistanceTable->Map (Int,Int) Integer->Map (Int,Int) Integer
                        createMapRek [] curMap = curMap
                        createMapRek ((from,dist,to):xs) curMap =
                                if notMember tup curMap then
                                        createMapRek xs $ Data.Map.insert tup dist curMap
                                else
                                        if (findWithDefault (-1) tup curMap) /= dist then
                                                Data.Map.insert tup (-1) curMap
                                        else
                                                createMapRek xs curMap
                                where
                                        a = cities_map ! from
                                        b = cities_map ! to
                                        tup = if a<b then (a,b) else (b,a)
                        distMap = createMapRek table Data.Map.empty
                        distList = toList $ distMap
                        distance (a,b)
                                        | a<b = findWithDefault (-1) (a,b) distMap
                                        | otherwise = findWithDefault (-1) (b,a) distMap
                        vollstaendig = length distList == cities_len*(cities_len-1) `div` 2
                        triangle_rule = and [distance (x,y)+distance (y,z)>=distance (x,z)|x<-[1..cities_len-1],y<-[x+1..cities_len],z<-[1..cities_len],y/=z,x/=z]

This solution isn't quite as nice to read, and one is puzzled by the Mischung aus Englisch and German, but it has one nice refinement, namely that cities are mapped to integer indices. This way, if the problem contains very very long and nearly identical city names, they need not be compared over and over. Indeed, had the MC thought of it, he would have added some tests that generate such ridiculously long names.

Finally, for the record, here's Herr Eberl's solution. It uses all sorts of evil tricks which won't be covered here or in class, including the so-called ST monad (for local mutable state), which explains why he won (virtually):

        import Data.Maybe
        import Control.Monad
        import Control.Monad.ST
        import qualified Data.Array.IArray as IA
        import qualified Data.Array.ST as STA
        import qualified Data.Array as A
        import qualified Data.Array.Unsafe as AU
        import qualified Data.HashTable.ST.Basic as HT

        -- A simple pairing function for unordered pairs {i,j}.
        pair :: Integral a => a -> a -> a -> a
        pair n i j = i' * n + j'
          where i' = max i j
                j' = min i j

        -- Monadic fold over a list that aborts if an intermediate result is "Nothing"
        foldMAbort :: Monad m => (a -> b -> m (Maybe a)) -> a -> [b] -> m (Maybe a)
        foldMAbort f σ [] = return (Just σ)
        foldMAbort f σ (x:xs) = f σ x >>= maybe (return Nothing) (\y -> foldMAbort f y xs)

        -- Collects the town names from the table and produces a HashTable that maps 
        -- names to numbers from 0 to n-1, where n is the number of towns. Returns 
        -- that HashTable and n.
        buildNameTable :: [(String, Integer, String)] -> ST s (Maybe (HT.HashTable s String Int, Int, Integer))
        buildNameTable distTable =
          do nameTable <- HT.new
             let ins town n = HT.lookup nameTable town >>= 
                                  maybe (HT.insert nameTable town n >> return (n, n + 1)) 
                                        (\i -> (return (i,n)))
             n <- foldMAbort (\(n, maxDist) (town1, d, town2) -> 
                    if d < 0 then
                      return Nothing    -- non-negativity violated
                    else do
                      (i, n') <- ins town1 n
                      (j, n'') <- ins town2 n'
                      let maxDist' = max d maxDist
                      return (if i == j then Nothing else Just (n'', maxDist')) -- check irreflexivity; keep the updated maximum
                  ) (0,0) distTable
             return (fmap (\(n,maxDist) -> (nameTable, n, maxDist)) n)

        -- Check whether the given packed distance matrix with the given size
        -- fulfils the triangle inequality
        checkTriangleInequality :: (IA.IArray a e, Integral e) => Int -> a Int e -> Bool
        checkTriangleInequality n distArray = null $
          do i <- [0..n-1]  -- pick "point of departure"
             k <- [0..i-1]  -- pick "destination" (w.l.o.g. assume i > k)
             let dik = dist i k
             j <- [0..n-1]  -- pick "detour"
             guard (i /= j && j /= k)  -- assume i ≠ j and j ≠ k (i.e. "proper" detour)
             guard (dist i j + dist j k < dik)  -- assume that the detour is "shorter"
             -- this produces an empty list iff the triangle property holds
          where dist i j = distArray IA.! pair n i j

        -- Generic helper function for isDistanceTableSane that can work on any 
        -- kind of mutable array – boxed and unboxed.
        genDistanceTableSane distTable nameTable n newArray =
          do let nEntries = n * (n - 1) `div` 2
             -- This array contains a distance matrix, indexed by a clever pair function
             -- Uninitialised distances are set to -1.
             distArray <- newArray (0, n * n) (-1)
     
             -- get a town's ID by its name
             let getIndex = liftM fromJust . HT.lookup nameTable
     
             -- inserts a distance entry into the array and keeps track of the number 
             -- of unoccupied entries in the matrix
             let insertDist nEmptyEntries (t1, d_, t2) =
                   do i <- getIndex t1
                      j <- getIndex t2
                      let d = fromInteger d_
                      let k = pair n i j
                      d' <- STA.readArray distArray k
                      STA.writeArray distArray k d
                      if d' < 0 then
                        return (Just (nEmptyEntries - 1))    -- decrease unoccupied
                      else if d == d' then
                        return (Just nEmptyEntries)    -- duplicate entry, consistent
                      else
                        return Nothing    -- duplicate entry, inconsistent
          
             if length distTable < nEntries then
               -- early abort for distance tables that are too small to be complete
               return False
             else do
               -- insert the entries into the matrix
               nEmptyEntries <- foldMAbort insertDist nEntries distTable
               -- check whether table is consistent and all entries were present
               case nEmptyEntries of
                 Nothing -> return False    -- inconsistent
                 Just 0 -> liftM (checkTriangleInequality n :: Integral e => A.Array Int e -> Bool)
                               (AU.unsafeFreeze distArray)
                 Just _ -> return False     -- not all entries occupied

        isDistanceTableSane :: [(String, Integer, String)] -> Bool
        isDistanceTableSane distTable = runST $
          do r <- buildNameTable distTable  -- build mapping from town to ID
             case r of
               Nothing -> return False
               Just (nameTable, n, maxDist) -> 
                 -- If this condition is true, the sum of two distances will be representable 
                 -- using a machine-length integer, therefore, we can use unboxed arrays
                 -- with Ints. Otherwise, fall back to boxed array with Integers.
                 if maxDist * 2 <= toInteger (maxBound :: Int) then
                   genDistanceTableSane distTable nameTable n
                       (STA.newArray :: (Int,Int) -> Int -> ST s (STA.STUArray s Int Int))          
                 else
                   genDistanceTableSane distTable nameTable n
                       (STA.newArray :: (Int,Int) -> Integer -> ST s (STA.STArray s Int Integer))

Notice how the program chooses an efficient Int-based implementation if the distances are small enough, falling back on “God's integers” otherwise.

This week, there's only one lesson: Choose your data structures with care.

Actually, there's a second lesson, for those who were rewarded a /2 “bonus”: Remember to test the corner cases! And keep in mind that there are no completeness guarantees associated with the QuickCheck suite run by the submission system.

Third lesson: There is no third lesson.
(Update:) Third lesson after all: Run ghc -O3 when benchmarking programs for speed.

Die Ergebnisse der sechsten Woche

Here are the results for the sixth week:

Top 30 der Woche
  Platz Wettbewerber(in) Tiefe   Punkte
1. Julian Biendarra 4374 30
2. Daniel Stüwe 2916 29
Simon Roßkopf 2916 29
Hauke Brinkop 2916 29
5. Maximilian Haslbeck 2187 26
6. Maximilian Kirchmeier 1296 25
7. Atanas Mirchev 1024 24
Lyubomir Stoykov 1024 24
9. Florian Dreier 972 22
10. Roland Schmid 768 21
11. Martin Mihaylov 576 20
12. Stefan Peter Dirix 129 20
13. Johannes Klicpera 22 19
14. Alexander Güssow 12 18
15. Christian Ziegner 6 17
16. Fabian Volz 5 16
17. Axel Fischer 4 15
18. Fabian Schöttl 3 14
19. Hannes Bibel 2 13
Johannes Ismair 2 13
Arved Baus 2 13
Frédéric Simonis 2 13
Julia Kindelsberger 2 13
24. David Otter 2187 8
25. Barbara Moser 2 7
Max-Philipp Schrader 2 7
Thomas Zwickl 2 7

This week, the MC is away to learn a language even more arcane than some of the submissions to the Wettbewerb. The friendly takeover by the Co-MC also brings a change to the rules: Usually, we want the job done, as fast and as cheaply as possible. This time, we were more explorative: How far can we get with a fixed budget?

In this round, 87 submissions participated. The Co-MC's mood got a bit somber as it turned out that 56 of these were not even trying. Only index1 and more than 30 tokens to spare? Where is the scholar's spirit, your will to boldly go where no one has gone before? "Vun nix kütt nix" ("nothing comes from nothing"), as they say in Cologne, so this leaves 31 programs.

As is customary, each candidate has to survive a few rounds of "Ene, mene, muh und raus bist du" (the German equivalent of "eeny, meeny, miny, moe") before assuming her or his rightful place in the weekly wall of fame. Early on, two hopeful aspirants got left behind, as the alert Guard of the Haskell Code (GHC) spotted some inaccuracies in their code. Remember, the GHC is willing to give your code as many chances as you need, but once you submit, its rejection is final! This is particularly annoying for the participants concerned, as in both cases the error seems to have been a simple misspelling of a function name.

It is said that looking into the eyes of a man (or woman, for that matter) gives you a glimpse of that person's character. Similarly, looking at the types of a program tells you a whole lot about what the program is about. But just as you can get lost in the deep eyes of your companion on a lovely summer's day, you can get lost in the types of a program in the faint light of an LCD, in particular if they are hundreds or thousands of characters long. Committed to being impartial, the Co-MC cannot afford to get lost, so he makes a Faustian pact with the occult and type-less forces of Unix to unearth "des Types Kern" (the type's core):

        printf ':t %s\n' "$fun" |                # ask for the type of the function
        ghci round5/$student.hs |                # ... inside the student's submission
        awk 'BEGIN { pr = 0 }; /^\*Main>/ { pr = 1 - pr; }; { if (pr == 1) print  }' |
                                                 # keep only ghci's answer between the prompts
        tr '\n' ' ' |                            # join the possibly wrapped type onto one line
        tr -s ' ' |                              # squeeze runs of blanks
        sed -e 's/^.*:: *//' |                   # drop everything up to the "::"
        tee round5/types/$student |              # save the raw type
        sed -e 's/^.*=> *//; s/\[.\]/\[a\]/; s/Integer/I/g; s/Int/I/g; s/t1/I/g; s/[cbt]/I/g' |
                                                 # drop the class context, unify the base types
        tr -d ' ' |                              # remove the remaining spaces
        tee round5/types-simp/$student |         # save the simplified type
        tr -s '[]' |                             # squeeze repeated brackets
        tee round5/types-norm/$student > /dev/null   # save the normalized type

By these incantations, a correct submission's type is normalized to [a]->I->([I],I). Unfortunately, one unhappy soul got lost in the size of his own type and submitted a function with the normalized type (a->I->(I,I))->[a]->I->([I],I). This leaves 28 candidates, and it breaks the Co-MC's heart to see that the Top 30 won't be a Top 30 this time.

Even if you correctly judged the type -- who knows what evil lurks in the hearts of (wo)men? Hence, the Co-MC tests each program on the empty list, and again, two climbers on the steep rock of fame lose their grip on this treacherous terrain. Apparently, nobody expects the Spanish Inqui^W^Wthe empty list. After this trivial test case, the Co-MC started doing more in-depth functional tests, but was stopped short already by the test

        indexN [ [..["a","b"]..], [..["c,d"]..] ] 0 == [ [..[0,1]..], [..[2,3]..] ]

where [..[ and ]..] stand for N-1 opening resp. closing brackets. His puny laptop ground to a halt as the GHC desperately tried to grasp the deeply nested types of the top contenders. Only switching to a bigger machine allowed him to test all the programs. Satisfied that all remaining programs passed this test, he set out to rank the survivors.

The ranking criterion is simple: just compare the nesting depth. And of course, there was the limit of 60 tokens. But seeing the field of competitors so sparse, the Co-MC decided to award points even to those who blew the budget, placing them at the end of this week's ranking. Again, a bit of black magic sets the contrast to Haskell's beautiful structure:

        # solutions within the 60-token budget, ranked by nesting depth
        for file in ../candidates-3/*@*; do
            user=${file##*/}
            normsize=$(stat -c '%s' ../types-norm/$user)
            simpsize=$(stat -c '%s' ../types-simp/$user)
            # every nesting level beyond the first contributes four bracket
            # characters to the simplified (unsqueezed) type
            depth=$((($simpsize - $normsize) / 4 + 1))
            tokens=$(grep $user ../TOKENS | cut -d' ' -f1)
            [ "$tokens" -le 60 ] && echo $depth $user
        done | sort -nr > rank

        # budget blowers are appended at the end of the ranking
        for file in ../candidates-3/*@*; do
            user=${file##*/}
            normsize=$(stat -c '%s' ../types-norm/$user)
            simpsize=$(stat -c '%s' ../types-simp/$user)
            depth=$((($simpsize - $normsize) / 4 + 1))
            tokens=$(grep $user ../TOKENS | cut -d' ' -f1)
            [ "$tokens" -gt 60 ] && echo $depth $user
        done | sort -nr >> rank

Let's have a look at this week's competitors: Almost all recognized that the function foldlMap from the exercises can be put to good use here:

        index1 as j = ([j .. (j+(length as)-1)], j+length as)
        index2 as j = foldlMap index1 as j

or

        index1 = foldlMap (\a i->(i,i+1)) 
        index2 = foldlMap index1
        index3 = foldlMap index2
        index4 = foldlMap index3
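
(foldlMap itself comes from the exercise sheet and is not reproduced in this post. So that the snippets can be tried out standalone, here is a plausible reconstruction of it, essentially a left-to-right map that also threads an accumulator:)

        foldlMap :: (a -> b -> (c, b)) -> [a] -> b -> ([c], b)
        foldlMap _ [] acc = ([], acc)
        foldlMap f (x : xs) acc =
          let (y, acc') = f x acc
              (ys, acc'') = foldlMap f xs acc'
          in (y : ys, acc'')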

This is the key insight for this exercise and also the one which is really useful outside of this competition. It makes one wonder why both solutions stopped while they still had a number of tokens available -- with the same scheme, one could achieve a nesting depth of 12. Leaving out the intermediate functions (the MC just asked for the deepest possible nesting, so the nestings in between are not necessary), one can get up to depth 22:

        index = foldlMap (foldlMap $ foldlMap $ foldlMap $ foldlMap $ foldlMap $ foldlMap
            $ foldlMap $ foldlMap $ foldlMap $ foldlMap $ foldlMap $ foldlMap $ foldlMap
            $ foldlMap $ foldlMap $ foldlMap $ foldlMap $ foldlMap $ foldlMap $ foldlMap
            $ foldlMap (\_ y -> (y,y+1)))

This looks like it exhausts foldlMap pretty well, doesn't it? But this is only linear growth; there is a way to grow exponentially, leading us to depth 1024.

        f a b = (b, succ b)
        foldlMap_4 = foldlMap . foldlMap . foldlMap . foldlMap
        foldlMap_16 = foldlMap_4 . foldlMap_4 . foldlMap_4 . foldlMap_4
        foldlMap_64 = foldlMap_16 . foldlMap_16 . foldlMap_16 . foldlMap_16
        foldlMap_256 = foldlMap_64 . foldlMap_64 . foldlMap_64 . foldlMap_64
        foldlMap_1024 = foldlMap_256 . foldlMap_256 . foldlMap_256 . foldlMap_256
        index1024 = foldlMap_1024 f

Quite neat, isn't it? Can we get better? Yes, but not by an order of magnitude. Hauke Brinkop gives credit to his partner and notes:

        {-
         - Zusammen mit Stüwe
         - Dass immer 3 calls pro zeile sind liegt daran, dass
         - a**b, 2ab + b <= 42 
         - bei a = 3, b = 6 ein lokales Maximum besitzt
         -}

So packing three calls into one definition is better than four, and this leads them and a third competitor up to a depth of 2916, securing them a nice second place. Careful tuning of the same idea brings us to this week's winner, Julian Biendarra. He manages a whopping 4374 levels of nesting:

        index3N = foldlMap . foldlMap . foldlMap
        index9N = index3N . index3N . index3N
        index27N = index9N . index9N . index9N
        index81N = index27N . index27N . index27N
        index243N = index81N . index81N . index81N
        index729N = index243N . index243N . index243N
        index2187N = index729N . index729N . index729N
        index4374 = index2187N . index2187N $ const $ id &&& (succ :: Int -> Int)

An interesting tidbit is the (&&&) operation. While it is much more general than that, it is used here to combine the two functions id and succ into the function \x -> (id x, succ x).
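
As a tiny illustration (not part of any submission):

        import Control.Arrow ((&&&))

        -- for plain functions, (f &&& g) x == (f x, g x)
        pairUp :: Int -> (Int, Int)
        pairUp = id &&& succ   -- pairUp 3 == (3, 4)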

So 4374 is the best we can do? While he employed the same scheme, Simon Roßkopf feels that this shouldn't be the end:

        I do not really like this solution because of all the helper functions basically
        contain the same code.

        I am pretty sure using template haskell or the trick described in this thread
        (http://stackoverflow.com/questions/5994051/is-there-a-function-to-flatten-a-nested-list-of-elements)
        it would be possible to construct some function to do the job for arbitrary deep
        nested lists. But as both methods are not standard haskell(and more importantly,
        I do not understand them) I decided not to use them.

And indeed, this feeling is right. Again, two of our informal participants, who have the advantage of age, discovered that it is possible not only to go over 9000, but even beyond, up to an arbitrarily deep nesting. Manuel Eberl apparently strives towards teaching, so he provided us with a nice write-up of his solution. Lorenz Panny, on the other hand, seems to like living on the edge, digging deep into the dark pits of unsafe and unportable behaviour. Do not try this at home or in the exam!

        idx :: Int -> [a] -> Int -> ([b], Int)
        idx 0 = unsafeCoerce const $ id &&& (succ :: Int -> Int)
        idx n = unsafeCoerce foldlMap $ idx $ pred n

        index1 :: [a] -> Int -> ([Int], Int)
        index1 = idx 1

        index2 :: [[a]] -> Int -> ([[Int]], Int)
        index2 = idx 2

        index539 :: [[...[[a]]...]] -> Int -> ([[...[[Int]]...]], Int)
        index539 = idx 539

So, what can we learn from this week?

The Master of Competition's Addendum to Week 6

こんにちは。(That's "hello" in Japanese.) This is the real MC speaking. Co-MC Noschinski was gracious enough to pass over the MC's official solution, which can be found in Sol_Exercise_6.hs and is reproduced below:

        foldlMap4 = foldlMap . foldlMap . foldlMap . foldlMap
        foldlMap16 = foldlMap4 . foldlMap4 . foldlMap4 . foldlMap4
        foldlMap48 = foldlMap16 . foldlMap16 . foldlMap16
        foldlMap144 = foldlMap48 . foldlMap48 . foldlMap48

        index1 xs j = (enumFromTo j $ pred j', j') where j' = j + length xs
        index432 = foldlMap144 $ foldlMap144 $ foldlMap144 index1

Note that index432 is a misnomer: This one goes to 433. That leaves the MC at place 11, or 13 if we count Herr Eberl and Herr Panny. Not so impressive!

The reason the MC is bringing this up is that there is an important lesson to learn from his (relative) failure. We computer scientists are successful because we are experts in the Noble Art of Reducing Big Problems to Small Problems: problems of size n to problems of size n − 1; problems of size 2 to problems of size 1; problems of size 1 to problems of size 0. The key to nearly all nontrivial programming is to handle the base case(s) explicitly and use recursion (in Haskell) or iteration (in Java) to perform the reduction.

Now, this reduction presupposes that we can correctly identify the base cases. The MC's mistake, as it were, was to consider the “n = 1” case as the base case, when it's much more convenient to look at “n = 0.” In other words, instead of starting with

        index1 :: [a] -> Int -> ([Int], Int)
        index1 xs j = (enumFromTo j $ pred j', j') where j' = j + length xs

it would have made much more sense to take

        index0 :: a -> Int -> (Int, Int)
        index0 x j = (j, succ j)

as the starting point—like most of the top competitors did. If index1 is desired (which is not the case for the Wettbewerb), it can easily be obtained as follows:

        index1 :: [a] -> Int -> ([Int], Int)
        index1 = foldlMap index0

This brings us to our fourth lesson: Choose your base case to be as small as possible; start the reduction at n = 0, not at n = 1.

This will save you tokens aplenty, potential bugs, and certain embarrassment.

Die Ergebnisse der siebten Woche

The results for week 7 are in!

Top 30 der Woche
  Platz Wettbewerber(in)   Punkte
1. Simon Roßkopf 30
2. Stefan Peter Dirix 29
3. Maximilian Haslbeck 28
4. Moritz Sichert 27
5. Lyubomir Stoykov 26
6. Hauke Brinkop 25
Daniel Stüwe 25
8. Florian Dreier 23
9. Atanas Mirchev 22
10. Maximilian Kirchmeier 21
11. Ludwig Peuckert 20
Christian Ziegner 20
13. David Otter 18
Kevin Falkenstein 18
Stefan Hörl 18
16. Alexander Christian Hefele 15
Alexander Weidinger 15
Alexandra Fritzen 15
Arved Baus 15
Benjamin Holzschuh 15
Adrian Thilo 15
Jens Wöhrle 15
Julian Biendarra 15
Konrad Weiss 15
Maximilian Strobel 15
Michael Schreier 15
Nikita Basargin 15
Thomas Zwickl 15
Valentin Zieglmeier 15
30. Fabian Kovacs 1

We see many of the usual suspects at the top. In particular, Herr Roßkopf is slowly climbing back to the top of the Top 30 des Semesters, after having lost half of his points in the fourth and fifth weeks due to a failure to handle the [] case correctly. (Trust the MC: He's unlikely to forget the [] case again anytime soon!)

This week, 50 solutions passed the official QuickCheck tests and had the {-WETT-}...{-TTEW-} tags. In addition, 6 meta-solutions are considered: one from Manuel Eberl (tutor emeritus), one from Andreas Bergmaier (the 5th place from last year, but #2 in the MC's heart), one from Maximilian Kirchmeier (a competitor this year who wanted to double his chances of winning), plus three solutions by the MC, called Naive, I, and II. The naive solution is as in Sol_Exercise_7.hs:

        quasiIdentical :: String -> String -> Bool
        quasiIdentical [] [] = True
        quasiIdentical (c : cs) (d : ds) = (c == d && quasiIdentical cs ds) || cs == ds
        quasiIdentical _ _ = False

        fixTypo :: [String] -> String -> String
        fixTypo vocabs word =
          if length quasis == 1 then head quasis else word
          where quasis = List.nub (List.filter (quasiIdentical word) vocabs)

        fixTypos :: [String] -> [String] -> [String]
        fixTypos = map . fixTypo

A good starting point for ranking the solutions is this snippet from the problem statement:

        Exercise_7> import Data.List
        Exercise_7> import Data.Hashable
        Exercise_7> let vocabs = take 10000 (permutations "abcdefghi")
        Exercise_7> let text = take 100000 (permutations "abcxefgih")
        Exercise_7> :set +s
        Exercise_7> hash (fixTypos vocabs text)
        3652546476494351287
        (3.65 secs, 1029321328 bytes)

A few competitors noticed that the hash value is wrong! Indeed, the MC changed the problem before publishing the Aufgabenblatt and neglected to update the hash. It has also been observed that the hash function behaves differently on different machines, perhaps due to different versions of Haskell. On the MC's MacBook Pro, the correct hash is 5025742898364772150. Also, the 3.65 s taken by this solution can be drastically reduced by compiling the sources using ghc -O3, which is done for the evaluation below.

All right, enough blah blah, let's get started! All engines go!

Round 1: The above program with 1000 vocables (instead of 10000), 10000 words of text (instead of 100000), and a generous 60 second timeout.

A bug was found in one solution (by somebody with the initials M.B.), and seven solutions timed out. This leaves 42 solutions plus 6 meta-solutions. It is quite frankly a mystery to the MC how these solutions could time out, given that the naive solution took less than 1 s. These competitors should take a second look at their brainchildren.

Round 2: 1000 vocables, 100000 words, 20 seconds.

We lost 12 competitors this time. This leaves us with 30+6 (meta)solutions. We have our Top 30 already!

Round 3: 10000 vocables, 10000 words, 20 seconds.

We decupled the number of vocables but reduced the text length accordingly. Most solutions exhibited similar run-time behavior, but one timed out. Its owner is rewarded with 1 point in the table. As the Germans say: Schade, schade, Marmelade! (Ah, the MC and his Schadenfreude!)

Round 4: 5000 vocables, 50000 words, 10 seconds.

At this point, 14 solutions “fly out” (through the window, presumably). Incredibly, we still have the MC's naive solution, which took 6.57 s. This leaves us with 15+6 (meta)solutions.

Round 5: To make the setting less artificial, the MC took the first 50000 words from /usr/share/dict/web2a on his machine, removing all non-a-z characters. As the text, he simply used Data.List.subsequences "abcdefghijkl", which gives 4096 words such as abel, beg, and chi (and plenty of nonsense like aefikl and bdghk). Timeout: 60 s.

Here Naive took 8.76 s. The MC decided that he had enough and kicked it out, along with the two solutions that were slower. Notice how this protocol reveals many highly unscientific behaviors on the MC's part. This reflects a fundamental difference between genuine scientific experiments and informal competitions for €5 trophies picked up at Nanu-Nana. By the same token, this pseudo-blog (pseudo, because there's no way to enter comments at the bottom of this page) is written in a style that is, to say the least, not very scientific. It's Dr. Jekyll and Mr. Hyde all over again. Ha ha ha ha ha HA HA HA HA HA!!

We are left with 10+5 (meta)solutions.

Round 6: Like round 5, but each vocable and each word w is made four times longer by taking w ++ w ++ w ++ w. This is enough to considerably slow down some solutions, while others are doing just fine:

         0.28 s   Andreas Bergmaier
         0.45 s   Simon Roßkopf
         0.51 s   Stefan Peter Dirix
         0.62 s   Master of Competition II
         0.85 s   Maximilian Haslbeck
        ------------------------------------------------
         6.22 s   Moritz Sichert
        13.40 s   Lyubomir Stoykov
        14.67 s   Hauke Brinkop
        14.69 s   Master of Competition I
        14.96 s   Daniel Stüwe
        17.22 s   Florian Dreier
        17.73 s   Atanas Mirchev
        18.09 s   Maximilian Kirchmeier I
        19.38 s   Manuel Eberl
        31.88 s   Maximilian Kirchmeier II

We got our Top 10. Here is the solution by our official winner, Herr Roßkopf:

        --Map version
        --Awfully slow(Around 10 secs on my laptop), not sure if it works,
        --but it exists and that is all required right now.
        --Update: Compiling with O3 seems to reduce the runtime to about 1s
        --

        --The hash tests produces different results for some reason.
        --I cannot find any mistake however... Let's give it a shot

        --I build a 'prefix-tree' from the vocabs list to avoid checking all
        --possibiilities of all words each time. It is certainly not optimal 
        --as I overlooked some cases while planning this structure and worked
        --and quickfixed to insert them...

        data VocabTree = Node Bool (M.Map Char (VocabTree))
          deriving (Show)

        emptyVT = Node False M.empty

        buildVocabTree :: [String] -> VocabTree
        buildVocabTree = foldl insertWord emptyVT

        insertWord :: VocabTree -> String -> VocabTree
        insertWord (Node _ as) [] = (Node True as)
        insertWord (Node end as) (w:ws) =
          case M.lookup w as of
            Nothing -> Node end $ M.insert w (insertWord emptyVT ws) as
            Just vt -> Node end $ M.insert w (insertWord vt ws) as

        fixTypos :: [String] -> [String] -> [String]
        fixTypos vocabs text = map (flip fixWord vocabTree) text
           where
              vocabTree = buildVocabTree vocabs

        fixWord :: String -> VocabTree -> String
        fixWord ws vt = if b == 1 then a else ws
          where
            (a, b) = goTillMistake ws vt

        goTillMistake :: String -> VocabTree -> (String, Int)
        goTillMistake [] (Node True  _) = ([], 1)
        goTillMistake [] (Node False _) = ([], 0)
        goTillMistake (w:ws) (Node _ as) = (a, b)
          where
            (a, b) = if numPossibilities == 1 then head possibilities else ([], numPossibilities)
            possibilities = filter (\x -> snd x > 0) $ M.elems $ M.mapWithKey (checkNextStep) as
            numPossibilities = sum $ map snd possibilities
            checkNextStep key val
              | key == w = (w:a, b)
              | otherwise = (key:a', b')
              where
                (a, b) = goTillMistake ws val
                (a', b') = goFromFirstMistake ws val

        goFromFirstMistake :: String -> VocabTree -> (String, Int)
        goFromFirstMistake [] (Node True _) = ([], 1)
        goFromFirstMistake [] (Node False _) = ([], 0)
        goFromFirstMistake (w:ws) (Node _ as) = 
          case M.lookup w as of
            Nothing -> ([], 0) 
            Just vt -> helper w vt
          where
            helper w vt = (w:a, b)
              where        
                (a, b) = goFromFirstMistake ws vt

        --Some quickChecks
        propFixTypos_lengthOk vocabs text = length text == length (fixTypos vocabs text)
        propFixTypos_identity vocabs = fixTypos vocabs vocabs == vocabs
        propFixTypos_wordLength vocabs text = and $ zipWith (\x y -> length x == length y) text $ fixTypos vocabs text

This solution is very similar to MC II. It builds a tree storing all the vocables. The root node indicates whether the empty word is part of the dictionary. Each node has up to 26 child nodes (subtrees), one for each letter of the alphabet. For example, the tree

        Node True {
          'a' |->
            Node True {
              'n' |-> Node True {},
              's' |-> Node True {}
            },
          'b' |->
            Node False {
              'e' |-> Node True {}
            }
        }

(using the meta-notation {k1 |-> v1, ..., kN |-> vN} to denote the map that associates key k1 to value v1 etc.) represents the dictionary [a, an, as, be]. The Boolean False for the node associated with b indicates that b (unlike a) is not part of the dictionary.
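
In code, the same example reads as follows, using the VocabTree type from the quoted solution (with M abbreviating Data.Map, as in the submission):

        example :: VocabTree
        example = Node True (M.fromList
          [ ('a', Node True  (M.fromList [ ('n', Node True M.empty)
                                         , ('s', Node True M.empty) ]))
          , ('b', Node False (M.fromList [ ('e', Node True M.empty) ])) ])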

The lookup function on this data structure is a bit special because it needs to take potential typos into account. This is reflected in Herr Roßkopf's code by the pair of functions goTillMistake and goFromFirstMistake. Let's be crazy and try to look up the word ae:

We can either take the a branch of the root node (and continue with goTillMistake) or accept that the a is a typo and take the b branch (and continue with goFromFirstMistake). We do both and see where this leads us. In the a branch, we proceed recursively and take the n and s branches. At this point, we've consumed the entire word and the flag is True in both leaf nodes, so an and as are part of the answer. In the b branch, we have already accounted for one typo, so we do a standard lookup of the remaining suffix (e), yielding be. Hence, conceptually, the result of the lookup is the set {an, as, be}.

Since this set is not a singleton set, the typo in ae won't be corrected.

Herr Bergmaier's, Herr Dirix's, and Herr Haslbeck's solutions are conceptually very similar. The MC II solution is a bit different in that it uses a 26-element vector instead of a map. The map makes more sense if the tree is sparsely populated, as is the case when building it with actual English words. In contrast, the vector tends to be superior for more exhaustive dictionaries, such as the permutation-based example from the problem statement.

The next group of solutions—those that tend to be about 10 times slower than the first group—generated “patterns” or “substitution masks” for all vocables and stored them in the dictionary. For example:

        type SubsMask = String

        {- produce substitution masks for the given string
           e.g. subsMasks "asdf" == ["\0sdf", "a\0df", "as\0f", "asd\0"] -}
        subsMasks :: String -> [SubsMask]
        subsMasks "" = []
        subsMasks s = map (\i -> let (a, _:b) = splitAt i s in a ++ ('\0':b)) [0 .. length s - 1]

        type SubsTable s = HashTable s SubsMask (Maybe String)

This group includes Herr Brinkop, Herr Dreier, Herr Kirchmeier (I and II), Herr Stüwe, and MC I. That Brinkop and Stüwe would land in the same group should not come as a surprise, since they form the so-called Team Nord, whatever that is meant to mean. (Reminder: Gruppenarbeit is durchaus erlaubt.)

These solutions are slower for longer words, because a word of length n gives rise to n patterns of length n, i.e. n² characters in total. This quadratic component becomes very expensive as we increase n, which is exactly what the MC did by taking w ++ w ++ w ++ w. The dictionary gets more entries and the number of lookups grows as we lengthen the words. In contrast, the solutions from the first group are almost as fast for the quadrupled words as for the original ones.

Here's a representative implementation from that group:

        type TypoHash = Map String (Set String)

        patternsOfWord :: String -> [String]
        patternsOfWord [] = []
        patternsOfWord (c : cs) = ('?' : cs) : List.map (c :) (patternsOfWord cs)

        insertPattern :: String -> TypoHash -> String -> TypoHash
        insertPattern word hash pattern =
          Map.insertWith Set.union pattern (Set.singleton word) hash

        insertWord :: TypoHash -> String -> TypoHash
        insertWord hash word =
          List.foldl (insertPattern word) hash (patternsOfWord word)

        findPattern :: TypoHash -> String -> Set String
        findPattern hash pattern = Map.findWithDefault Set.empty pattern hash

        fixTypo :: TypoHash -> String -> String
        fixTypo hash word =
          if numWords == 1 then head (Set.elems wordSet) else word
          where patterns = patternsOfWord word
                wordSets = List.map (findPattern hash) patterns
                wordSet = Set.unions wordSets
                numWords = Set.size wordSet

        fixTypos :: [String] -> [String] -> [String]
        fixTypos vocabs = List.map (fixTypo hash)
          where hash = List.foldl insertWord Map.empty vocabs

A slight variation, implemented by Herr Eberl, Herr Mirchev, Herr Sichert, and Herr Stoykov, is to represent a pattern as a pair (String, Integer). This has the advantage that we do not have to set aside a special character (e.g., \0 or ?). To quote Herr Eberl's code:

        {-
           Returns all lists that can be obtained from the given list by deleting
           exactly one character, and the position of the element that was deleted.
        -}
        deleteOne :: [a] -> [([a], Integer)]
        deleteOne xs = zip (deleteOne' [] xs) [0..]
          where deleteOne' _ [] = []
                deleteOne' xs (y:ys) = (xs ++ ys) : deleteOne' (xs++[y]) ys

        {-
           The key insight here is that two words w₁, w₂ are quasi-identical iff they 
           can be made the same by deleting the i-th character of w₁ and w₂ for some i.
           Using this, we can now simply keep a table mapping all correct words with 
           one character deleted (tagged with the position from which it was deleted)
           to the original word and we can find all correction 
           candidates for a misspelt word by
             1. deleting the i-th character (for any i)
             2. performing a lookup in our table
        -}

Herr Sichert's solution is faster than the others in that group. He uses a couple of further tricks. Since his code is very nicely documented, it is worth quoting it in full:

        fixTypos :: [String] -> [String] -> [String]
        fixTypos vocabs = map findVocab
            where
                -- vocabMap ist ein Baum mit der Länge der Wörter als key und
                -- den dazugehörigen vocabCharsList als values:
                -- 1 -> vocabCharsList für vocabs der Länge 1
                -- 2 -> vocabCharsList für vocabs der Länge 2
                -- ...
                vocabMap = M.fromAscList $
                    map ((\ vs -> (length $ head vs, vocabCharsList vs)) . sortedNub . sort) .
                    groupBy (\ a b -> length a == length b) .
                    sortBy (comparing length) $ vocabs

                -- vocabCharsList ist eine Liste mit allen vocabCharsMaps, wobei bei jeder
                -- vocab jeder Buchstabe jeweils einmal weggelassen wird. Dies wird als key
                -- verwendet. Der weggelassene Buchstabe wird als value verwendet.
                -- Beispiel vocabCharsList für vocabs der Länge 4 mit den vocabs "abcd" und "efgh":
                -- [ vocabCharsMap von [("bcd", 'a'), ("fgh", 'e')],
                --   vocabCharsMap von [("acd", 'b'), ("egh", 'f')],
                --   vocabCharsMap von [("abd", 'c'), ("efh", 'g')],
                --   vocabCharsMap von [("abc", 'd'), ("efg", 'h')] ]
                vocabCharsList vs =
                    map vocabCharsMap [map (\ v -> (deleteIndex i v, v !! i)) vs | i <- [0 .. length (head vs) - 1]]

                -- Erstellt ein Map aus den Werten, die von vocabCharsList übergeben wurden,
                -- um effizient darauf zugreifen zu können.
                -- Da Wörter, für die es mehrere mögliche Ersetzungen gibt, nicht ersetzt werden
                -- sollen, kann man hier direkt schon alle rausschmeißen, bei denen mehrere
                -- Lösungen möglich wären. Das erledigen groupBy und filter.
                vocabCharsMap vs = M.fromDistinctAscList $
                    concat .
                    filter (\ xs -> length xs == 1) .
                    groupBy (\ a b -> fst a == fst b) .
                    sortBy (comparing fst) $ vs

                -- Sucht für ein Wort w eine mögliche Ersetzung.
                -- Wenn kein Map aus vocabMap mit der richtigen Wortlänge gefunden wird,
                -- wird sofort das Wort selbst zurückgeliefert, ansonsten wird mit der
                -- "richtigen" vocabCharsList weiter gesucht.
                findVocab w
                    | isNothing mapOfLengths = w
                    | otherwise              = fromMaybe w $ lookupWord w $ fromJust mapOfLengths
                    where
                        mapOfLengths = M.lookup (length w) vocabMap
        
                -- Erhält ein Wort w und eine vocabCharsList ms und sucht darin eine gültige
                -- Ersetzung.
                -- suitableWords enthält zunächts alle vocabs, die als Ersetzung in Frage kommen.
                -- Dazu wird von w jeder Buchstabe jeweils einmal weggenommen und damit in den
                -- dazugehörigen Maps gesucht. Wenn dabei nichts rauskommt, als suitableWords
                -- leer ist, gibt es keine Ersetzung.
                -- Dann wird überprüft ob es mehrere Ersetzungen gibt. Wenn ja, soll laut
                -- Aufgabenstellung nicht ersetzt werden.
                -- Ansonsten wird die gefunden Ersetzung verwendet. 
                lookupWord w ms
                    | null suitableWords = Nothing
                    | length replacements /= 1 = Nothing
                    | otherwise = Just $ head replacements
                    where
                        suitableWords =
                            map (\ (i, v) -> replaceIndex i (fromJust v) w) $
                            filter (isJust . snd) [(i, M.lookup (deleteIndex i w) m) | (i, m) <- zip [0..] ms]
                        replacements = filter (\v -> isValidReplacement v w) suitableWords

                -- Prüft, ob eine vocab v eine gültige Korrektur für ein Wort w ist.
                -- Es wird davon ausgegangen, dass length v == length w 
                isValidReplacement v w = sum [1 | (vc, wc) <- zip v w, vc /= wc] == 1

        -- Wie nub aus Data.List, nur geht davon aus, dass die Liste sortiert ist.
        -- Hat den Vorteil, dass die Laufzeit (inklusive Sortieren) O(n log n) ist,
        -- anstatt O(n²), wie bei Data.List.nub.
        sortedNub :: Eq a => [a] -> [a]
        sortedNub [] = []
        sortedNub (x:[]) = x:[]
        sortedNub (x:y:xs)
            | x == y    = sortedNub (x:xs)
            | otherwise = x:sortedNub (y:xs)

        -- Entfernt aus einer Liste xs den i-ten Eintrag und gibt sie dann wieder zurück.
        -- z.B. deleteIndex 1 "abc" == "ac"
        deleteIndex :: Int -> [a] -> [a]
        deleteIndex i xs = take i xs ++ drop (i+1) xs

        -- Ersetzt den i-ten Eintrag von xs mit x.
        -- z.B. replaceIndex 1 'x' "abc" == "axc"
        replaceIndex :: Int -> a -> [a] -> [a]
        replaceIndex i x xs = take i xs ++ [x] ++ drop (i+1) xs

(Strangely enough, Herr Sichert didn't feel like using the German word Vokabel in his documentation.)

Wow, this was an exciting week! The MC was a bit disappointed to see that his naive solution outperformed so many other solutions, but he was glad to see that there were so many (apparently) bug-free programs.

A week without some lessons wouldn't be a week. So here they are:

The first lesson is that it is possible to achieve quite respectable performance without fancy optimizations. This was illustrated by the MC's naive solution, which would have landed at place 11, but more convincingly by the authors of the solutions revolving around patterns or substitution masks.

The second, complementary lesson is that we have to keep an eye on computational complexity. As the MC increased the length of the words, we saw the gap grow at an alarming rate between the solutions that were linear (O(n)) in the word length and those that were quadratic (O(n²)).

Die Ergebnisse der achten Woche

The results for week 8 are in!

Top 30 der Woche
  Platz Wettbewerber(in)   Punkte
1. Stefan Peter Dirix 30
2. Ludwig Peuckert 29
3. David Otter 28
4. Clemens Jonischkeit 27
5. Maximilian Kirchmeier 26
6. Daniel Stüwe 25
Hauke Brinkop 25
8. Atanas Mirchev 23
9. Florian Dreier 22
Simon Roßkopf 22
10. Martin Helfrich 21
11. Alexander Christian Hefele 20
12. Michael Benedikt Schwarz 19
13. Maximilian Haslbeck 18
14. Martin Hartmond 17
Lyubomir Stoykov 17
Johannes Klicpera 17
Lukas Michael Stumberg 17
Nikita Basargin 17
Stefan Hörl 17
20. Alexander Weidinger 11
Hannes Bibel 11
Christian Ziegner 11
Konstantin Bauer 11
Fabian Volz 11
Johannes Ismair 11
Michael Schreier 11

This week, we had 43 submissions with the official {-WETT-} tags. The MC is away in a multi-million city you have probably never heard of – busy conferencing – and the Co-MC took some days off – when he left on Wednesday, he was mumbling something about painting his apartment –, so yet another figure in this never-ending saga of the "Wettbewerb" appears: the Contra-MC. But fear not, for he shall be faithful to the ideals of the Great MC.

Of the 43 submissions, one did not typecheck, three produced wrong results for the examples given in the exercise sheet, and two failed at runtime with non-exhaustive pattern matches. The Contra-MC loves correctness and hates non-exhaustive pattern matches, hence these were removed in the zeroth round. Additionally, there were two submissions by Manuel E. M. Eberl, and the Contra-MC also considered the Musterlösung, leaving us at 37+3 solutions.

Let's first reproduce the Musterlösung here:

        matches :: [RegEx] -> String -> Bool
        matches [] s = null s
        matches (Any : _) [] = False
        matches (Any : rs) (_ : cs) = matches rs cs
        matches (One _ : rs) [] = False
        matches (One d : rs) (c : cs) = c == d && matches rs cs
        matches (OneIn _ : _) [] = False
        matches (OneIn [] : _) _ = False
        matches (OneIn ((l, u) : lus) : rs) (c : cs) =
          if l <= c && c <= u then matches rs cs else matches (OneIn lus : rs) (c : cs)
        matches (Concat r r' : rs) cs = matches (r : r' : rs) cs
        matches (Alt r r' : rs) cs = matches (r : rs) cs || matches (r' : rs) cs
        matches (Repeat _ : _) [] = False
        matches (Repeat r : rs) cs = matches (r : rs) cs || matches (r : Repeat r : rs) cs

        match :: RegEx -> String -> Bool
        match r = matches [r]

The basic idea is that the first argument in matches carries a list of still-to-be-matched regular expressions. Surprisingly, this performs pretty well.
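
(The RegEx type itself is not reproduced in this post. The following reconstruction is inferred from the patterns above, so that the snippet can be loaded standalone; it should match the exercise sheet:)

        data RegEx
          = Any                   -- matches any single character
          | One Char              -- matches exactly the given character
          | OneIn [(Char, Char)]  -- matches any character within one of the ranges
          | Concat RegEx RegEx    -- sequential composition
          | Alt RegEx RegEx       -- alternative
          | Repeat RegEx          -- one or more repetitions

For example, match (Concat (One 'a') (Repeat (One 'b'))) "abbb" == True.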

Alright, let's start the actual measurements. In round 1, the Contra-MC noticed that there were still some correctness issues left: nine more solutions produced wrong matches on a regular expression which was supposed to be a test for performance, and two threw runtime errors.

To be fair, one of the runtime errors was introduced by the Contra-MC himself, because one contestant wrote code like if x then y without an else branch, so he just added else undefined in the hope that This Will Never Be Executed™. The other offenders with runtime errors have no excuse, though. Their names and addresses have been given to the Supreme Court for Totality Enforcement where they will face justice. Did the Contra-MC already mention that he hates partiality?

This leaves us with 26+3 solutions, hence there will just be a Top 26 this week.

Anyway, what is this mysterious regular expression which is supposed to do the performance testing? Ideally, it should force the naive implementations to backtrack a lot on a suitably contrived input. The trick is to construct a regular expression with lots of alternatives, where the correct choice is only determined at the very end of the input. Hence, we obtain (a|aa|…|a^n)a^n, and test it against the words a^(n+1), a^(2n), a^(2n+1) and a^(2n)b, where the former two match, and the latter two don't. The Contra-MC did these measurements with increasing n, starting with 30 and going up to 300.
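
(Using the RegEx type reconstructed above, the adversarial expression can be generated mechanically; a small sketch with invented helper names:)

        -- a^k as a regular expression
        as :: Int -> RegEx
        as k = foldr1 Concat (replicate k (One 'a'))

        -- (a|aa|…|a^n)a^n
        evil :: Int -> RegEx
        evil n = Concat (foldr1 Alt (map as [1 .. n])) (as n)

For instance, match (evil 30) (replicate 31 'a') holds, whereas the 61-character a-word fails, in a naive implementation only after extensive backtracking.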

In the first performance round, programs were run with a time limit, and 14 submissions were ruled out. After that, the Contra-MC varied the associativity of the concatenation and alternative operators in the regular expression. Two more solutions were lost in this process, leaving us at 10+3 solutions. Note that the quick-and-dirty solution, written by the MC himself, is still doing well here.

Let's have a look at a ranking of the remaining contestants at that point:

       < 0.01 s   Ludwig Peuckert
       < 0.01 s   David Otter
       < 0.01 s   Daniel Stüwe
       < 0.01 s   Maximilian Kirchmeier
       < 0.01 s   Hauke Brinkop
       < 0.01 s   Clemens Jonischkeit
       < 0.01 s   Stefan Peter Dirix
       < 0.01 s   MC
         0.11 s   Atanas Mirchev
         0.14 s   Manuel Eberl II
         0.56 s   Florian Dreier
         4.03 s   Martin Helfrich
         4.37 s   Manuel Eberl I

All right. At this point it makes sense to peek into the implementations to see what different ideas are being used. Some of the sources are very disturbing, since the authors clearly exhibit severe symptoms of the Wettbewerbssyndrom:

        -- Dear you, who reads this, i can bearly sleep,
        -- the last weeks had been really tough. Because of my strong addiction to Haschkell
        -- i tried cold detoxification. My Hands are trembling, i can't take it any more.
        -- I need to get my fix. I AM BACK.

The Contra-MC isn't sure what Haschkell is supposed to be, but guesses that it must be a Swabian version of Haskell, with keywords like inschtanz and wenndannsonscht. In any case, should you also experience Haskell addiction: the Haskell Anonymous support group in Munich gathers monthly.

Anyway, the Contra-MC was able to group the submissions into four meta-solutions.

The implementations for the nondeterministic approach are pretty short, so let's have a look at one of them.

        match :: RegEx -> String -> Bool
        match regEx string = ([], True) `elem` (match' regEx string)

        match' :: RegEx -> String -> [(String, Bool)]
        match' Any (_:xs) = [(xs, True)]
        match' (One c) (x:xs) = [(xs, c == x)]
        match' (OneIn cs) (x:xs) = [(xs, any (\ (a, b) -> a <= x && x <= b) cs)]
        match' (Concat a b) (x:xs)
          | null bereinigt = []
          | otherwise = concatMap (match' b . fst) bereinigt
          where bereinigt = filter snd $ match' a (x:xs)
        match' (Alt a b) xs = filter snd $ match' a xs ++ match' b xs
        match' (Repeat r) (x:xs)
          | null rep = []
          | otherwise = rep ++ concatMap (match' (Repeat r) . fst) rep
          where rep = filter snd $ match' r (x:xs)
        match' _ _ = []

As we can see in the One case, it returns a singleton list which compares the current character x with the desired character c, and additionally returns the remainder of the input string. A call to match' hence produces a list of tuples (rem, matches), and we check whether this list contains a match with an empty remainder. This solution could nicely be adapted to a more liberal definition of "match" where a string matches iff a substring strictly matches.
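
Concretely, here are a few illustrative evaluations (checked against the code above; not taken from the submission):

        demo :: Bool
        demo = and
          [ match' (One 'a') "ab" == [("b", True)]
          , match' (Alt (One 'a') Any) "ab" == [("b", True), ("b", True)]
          , match (Concat (One 'a') (One 'b')) "ab"
          ]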

Here's another notable snippet from a contestant:

        --Ueberprueft, ob eine Zahl in einem Bereich liegt (0 = Unendlich)
        -- (checks whether a number lies within a range; 0 = infinity)
        isInBounds :: Int -> (Int,Int) -> Bool
        isInBounds x (min,max) = x >= min && (max == 0 || x <= max)

The Contra-MC experiences physical pain watching someone assign special meaning to a particular integer value. Remember Java? (If you don't: it's an obscure niche language from the 90s. Its creation has made a lot of people very angry and has been widely regarded as a bad move.) The compareTo method returns an int: the first 2147483648 values mean "less than", the next one means "equal", and the remaining 2147483647 values mean "greater than". Don't do that. Haskell has special data types for these and other purposes. Listen to the BZgA and practice safer programming.
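
What the Contra-MC is advocating, sketched with invented names (this is not the contestant's code): encode the absence of an upper bound in the type.

        -- make "no upper bound" a constructor instead of a magic value
        data UpperBound = Finite Int | Unbounded

        inBounds :: Int -> (Int, UpperBound) -> Bool
        inBounds x (lo, hi) = x >= lo && ok hi
          where ok (Finite b) = x <= b
                ok Unbounded  = True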

Before continuing with examining more source code, the Contra-MC decided to crank up the difficulty even more by using the regular expression (a|aa|…|a^n)a^n(a|aa|…|a^n).

Here's the ranking for n = 200:

         0.39 s   MC
         0.74 s   Stefan Peter Dirix
         1.17 s   Ludwig Peuckert
         1.67 s   Clemens Jonischkeit
         1.96 s   David Otter
         9.98 s   Maximilian Kirchmeier
      --------------------------------------
      > 10.00 s   Hauke Brinkop
      > 10.00 s   Daniel Stüwe

Unfortunately, our Team Fischkopf Nord has been kicked out of the leading group. Here are some excerpts from their code:

        -- RegEx ismorph NFA isomorph Parallele VM
        -- RegEx in VM-Code umwandeln
        -- VM-Code interpretieren!
        -- Der Trick dabei: Nicht jeden möglichen Ausführungspfad ausführen, 
        -- sondern nur die jeweiligen Positionen im Code merken 
        -- und ggf. zusammenführen.

        -- Anzahl der VM-'Threads' == Anzahl der Code-Zeilen
        -- Anzahl der Code-Zeilen  <  Anzahl Ausführungspfad im Code
        -- Also eig. Ken Thompsons Algorithmus ursprüngliche Implementierung.

        data VMInstruction = 
            IChar Char
          | IRange (Set.Set (Char , Char))
          | IMatch
          | IJmp Int            -- Springe an Adresse
          | ISplit Int Int      -- Teile Programmpfad in zwei Teile auf!
          | IAny
          deriving (Show)
         
        type Codeline = Int
        type Code = [(Codeline, VMInstruction)]
        type Instructions = Vec.Vector VMInstruction
        type ProgramCounter = Int

        convert :: RegEx -> Instructions
        convert regEx = Vec.fromList (map snd code) `Vec.snoc` IMatch
            where code = regExToVMInstruction regEx 0
            
        regExToVMInstruction :: RegEx -> Codeline -> Code
        regExToVMInstruction regExTop codeline = case regExTop of
            Any -> [(codeline, IAny)]
            One c -> [(codeline, IChar c)]
            OneIn inters ->
                [(codeline, IRange $ Set.fromDistinctAscList $ wellform inters)]
            Concat regEx regEx' -> let
                        code = regExToVMInstruction regEx codeline
                        codeline' = codeline + length code
                        code' = regExToVMInstruction regEx' codeline'
                    in code ++ code'
            Alt regEx regEx' -> let
                        newCodeline = succ codeline
                        code = regExToVMInstruction regEx newCodeline
                        codeline' = 2 + codeline + length code
                        code' = regExToVMInstruction regEx' codeline'
                        codeline'' = codeline' + length code'
                    in (codeline, ISplit newCodeline codeline') : code ++
                            ((pred codeline', IJmp codeline'') : code')
            Repeat regEx -> let
                        code = regExToVMInstruction regEx codeline
                        codeline' = codeline + length code
                    in  code ++ [(codeline', ISplit codeline (succ codeline'))]

The Contra-MC acknowledges the clever idea underlying this implementation, but cannot fathom why it performs so badly.

The other solution which took very long used the ST monad, which in turn was a key performance ingredient in M. Eberl's winning submission for the fifth week. Incidentally, one of his submissions for this week also used ST, but it didn't perform very well either. The Contra-MC concludes that not all advanced Haskell constructs are worth it.

This leaves us with 4+1 solutions standing. We're getting close to the final evaluation.

The Contra-MC was not satisfied with the regular expression he used (which had been contributed by the Co-MC Traytel). He decided to concoct a regular expression so sophisticated that advanced number theory is needed to explain its behaviour (left as an exercise to the reader). He humbly claims to have succeeded, and presents you this evil little monster: (a^53|a^59)+(b^3|b^5)+ (the b's are just there for added fun-ness). Note that all numbers occurring in it are prime. The Contra-MC ran the programs against the words a^1000 b^12 (no match) and a^1132 b^12 (match; for the curious: 1132 = 8 · 53 + 12 · 59, whereas 1000 has no such representation). The non-matching words are far more interesting here, because they (statistically) require a significantly higher amount of backtracking. All remaining contestants were able to produce the correct results in less than ten seconds.

Still not satisfied, he ran the programs again with around 1030 a's, which provoked stack space overflows or timeouts from all contestants. Especially the nondeterminism-based implementations were susceptible to space leaks. Not wanting to unduly discriminate against these solutions, he settled for a smaller number, judging by the execution time.

Finally, he arrived at this ranking:

         4.66 s   MC
         4.71 s   Stefan Peter Dirix
         6.32 s   Ludwig Peuckert
         8.64 s   David Otter
         9.18 s   Clemens Jonischkeit

And this concludes this week's evaluation. What did we learn? Firstly, compile with -W: it will tell you when you use non-exhaustive pattern matches. Secondly, ST is no magic smoke that makes all Haskell programs perform well. Lastly, make sure you don't run out of memory!

Update: One contestant raised a good point via mail. Indeed, the choice of regular expressions does influence the ranking. Regular expressions of the form (a|aa|…|a^n)a^n actually provoke worst-case performance of automata-based constructions. For completeness' sake, here are the results of running the coin problem regular expression on the solutions which timed out earlier:

       < 0.01 s   Michael Benedikt Schwarz
       < 0.01 s   Eberl I/II
         0.01 s   Daniel Stüwe
         0.01 s   Hauke Brinkop
         0.02 s   Florian Dreier
         0.02 s   Maximilian Kirchmeier
         0.10 s   Simon Roßkopf
         0.21 s   Lyubomir Stoykov
         0.34 s   Maximilian Haslbeck
         0.42 s   Alexander Weidinger

The Contra-MC acknowledges these solutions. They (conceptually) excel on this kind of regular expression.

Furthermore, the Contra-MC forgot to test one submission. It was not his fault, though – the contestant forgot to put the {-WETT-} tags into his file. The leader board has been updated accordingly and as always, no points have been deducted.

Die Ergebnisse der neunten Woche

The “Great” MC is back with a vengeance – and with the results of the ninth week:

Top 30 der Woche
  Platz Wettbewerber(in)   Punkte
1. Atanas Mirchev 30
2. Maximilian Kirchmeier 29
3. Maximilian Haslbeck 28
4. Hauke Brinkop 27
Daniel Stüwe 27
6. Alexander Weidinger 25
7. Michael Schreier 24
Simon Roßkopf 24
Christian Ziegner 24
9. Alexander Christian Hefele 22
Alexandra Fritzen 22
Anna Kropp 22
Hannes Bibel 22
David Otter 22
Frédéric Simonis 22
Natalie Reppekus 22
Mathias Staudigl 22
Ludwig Peuckert 22
Barbara Moser 22
Florian Dreier 22
Sebastian Zeitel 22
Johannes Ismair 22
Kevin Yu 22
Lukas Holzner 22
Marco Probst 22
Martin Mihaylov 22
Moritz Becher 22
Moritz Sichert 22
Nikita Basargin 22
Thomas Engel 22
Thomas Pettinger 22
Valentin Zieglmeier 22
Zhechko Zhechev 22

The criterion is efficiency. As in previous weeks, the MC is sticking to a round-based approach, ruling out at each round the solutions that fall below a threshold. We saw last week that this is dangerous, because the outcome depends very much on the choice of benchmarks; however, when proper care is taken in selecting the examples, it works well. For the regular expression problem of last week, it would have been better to conjure a diverse benchmark suite and measure the speed of each solution on each benchmark, then compute scores. That's because there are so many ways to implement regular expressions, with various trade-offs.

The above comments shouldn't be construed as a critique of the Contra-MC's work. Quite the opposite. The MC is extremely pleased with the work of his Co(ntra)-MCs. He even found himself rolling on the floor of his hotel room laughing while reading the comments about inschtanz and wenndannsonscht. He himself has a couple of friends from Schwoabia and remembers how, after a few cocktails, one of them repeatedly schtumbled trying to pronounce “Hauschschlüssel”. But even the Contra-MC, who clearly enjoys taking cheap shots at others, is not perfect: He has a slight tendency to lisp when he speaks English, which becomes very interesting when he talks about the Lisp programming language. And for all his humor about Team Nord/Fischkopf, he's from the East – an Ossi. ;) (Being a French Canadian like the MC is great, because it means you're at the top of the food chain. Or maybe not. [Disclaimer: The linked page has yellow text on a white background.])

It feels strange for the MC to read this “weblog” as opposed to writing it. For those who watch the Daily Show with Jon Stewart, the feeling is similar to when Stewart had to watch John Oliver host his own show. His comment is politically incorrect enough that the MC will have to take out his <non-PC> tags:

<non-PC>
I don't watch it all the time because it's too weird. It's like watching someone have sex with your wife's desk.
</non-PC>

All right, let's take the invitation offered by the closing </non-PC> tag to move on to more serious stuff. For the quasi-majority element problem, there are basically two classes of solutions: the quadratic ones (O(n²)) and the linear ones (O(n)). The quote from Boyer & Moore at the beginning of the Aufgabentext was a “subtle” hint at the existence of a linear solution to the standard majority element (Mehrheitselement) problem, from which it is not too hard to derive a solution to the quasi version of the problem.
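
For readers who don't know it, here is a minimal sketch of the Boyer–Moore majority vote (illustrative only, not any competitor's code): one linear pass produces a candidate, and a second linear pass verifies it. Adapting it to the quasi version, where “at least half” replaces “more than half”, takes an extra twist, as a competitor's comment quoted further below explains.

        import Data.List (foldl')

        -- one pass: keep a candidate and a counter; equal elements increment,
        -- different elements decrement, and at zero the next element takes over
        candidate :: Eq a => [a] -> Maybe a
        candidate = fmap fst . foldl' step Nothing
          where
            step Nothing x = Just (x, 1 :: Int)
            step (Just (c, k)) x
              | x == c    = Just (c, k + 1)
              | k == 1    = Nothing
              | otherwise = Just (c, k - 1)

        -- second pass: check that the candidate really occurs in more than
        -- half of the positions
        majority :: Eq a => [a] -> Maybe a
        majority xs = case candidate xs of
          Just c | 2 * length (filter (== c) xs) > length xs -> Just c
          _      -> Nothing

For instance, majority [1, 2, 1] == Just 1 and majority [1, 2] == Nothing; both passes are O(n).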

First, some stats: 78 solutions were submitted, among which 22 didn't pass the QuickCheck tests. This leaves 56 solutions, plus a solution by Manuel Eberl, plus a naive solution by the MC (in Sol_Exercise_9.hs), plus a very naive solution by the MC (like the naive one but without the nub), plus three real solutions by the MC (!). The various MC solutions help him keep the Überblick over the relative speed of the competitors' entries. The timeout is always 10 seconds.

Round 1: One hundred thousand times the same element:
Data.List.replicate 100000 1

10 solutions took over 10 seconds. The other solutions, including the very naive one, took less than 1 second. This leaves 46+6 entries.

Round 2: Six million times one element followed by six million times another element:
Data.List.replicate 6000000 1 ++ Data.List.replicate 6000000 2

This is enough to kick out 12+1 solutions, the +1 being the very naive one, as expected. This leaves 34+5.

Round 3: Like the above, but the two elements are interwoven:
Data.List.concat (replicate 2000000 [2, 1, 2, 2, 1, 1])

We lost two more competitors and are left with 32+5.

Round 4: 10000 elements repeated 5 times:
map (`mod` 10000) [(0::Int) .. 50000]

This is where we lose all the quadratic solutions. They just cannot cope with the high number of distinct elements. After this round, only 8+4 solutions survive.

Round 5: Twenty million distinct elements:
[1 .. 20000000]

There's quite some variation among the linear solutions. One solution even causes a stack overflow (denoted SOFL, “sitting on the floor laughing”):

         0.85 s   Maximilian Haslbeck
         1.01 s   Alexander Weidinger
         1.02 s   Maximilian Kirchmeier
         1.04 s   Master of Competition II
         1.26 s   Manuel Eberl
         1.37 s   Master of Competition III
         1.65 s   Maximilian Haslbeck
         1.80 s   Master of Competition I
         1.92 s   Daniel Stüwe
         2.07 s   Hauke Brinkop
         2.13 s   Atanas Mirchev
        ------------------------------------------------
         5.76 s   Simon Roßkopf
         SOFL     Michael Schreier

We had to draw the line somewhere, and the factor-2.7 gap between Herr Mirchev and Herr Roßkopf seemed like an appropriate place. For ten million instead of twenty million elements, the solution by Herr Schreier took over 5 seconds, so we can extrapolate that it would have been over the timeout had it not totally SOFL'd out beforehand.

Round 6: Twenty million times the same element:
Data.List.replicate 20000000 123456

This is reminiscent of round 1.

         0.51 s   Maximilian Kirchmeier
         0.69 s   Master of Competition III
         0.76 s   Atanas Mirchev
         0.86 s   Daniel Stüwe
         0.91 s   Hauke Brinkop
         1.00 s   Maximilian Haslbeck
         1.33 s   Manuel Eberl
         2.12 s   Master of Competition I
         2.57 s   Master of Competition II
        ------------------------------------------------
         3.02 s   Alexander Weidinger

Before we say farewell to the last competitor, let's quote the nice comment in his code:

        {-
        This code is mainly based on the „Majority Vote Algorithm“ by Boyer and Moore,
        with a little difference to match the „quasiMajority“ criterion.
        Instead of just using the algorithm, I had to differentiate between two types of lists.
                1. The input list has an odd number of elements:
                   In that case, if the majority algorithm finds a majority,
                   it's the only possible quasiMajority and we're done.
                2. The input list has an even number of elements:
                   In that case, the majority algorithm could find no quasiMajority
                   - of course, because it only works for majorities (> 50%)
                   and not quasi-majorities (>= 50%) -
                   or the algorithm finds a majority, but it's not the only majority,
                   because it's a quasiMajority.

        To solve the problem with 2., I came up with an (easy) solution,
        which I discussed at length with my father (thanks dad ;) ).
        The idea in that case is to split the first element from the list
        and use the majority algorithm on the shortened list,
        which now has an odd number of elements.
          So, no matter what the algorithm finds,
        we have to check whether the split-off element is also a quasiMajority of the list.
        Only if the majority algorithm finds the same element as the split-off element
        can we stop with just one majority element.
        -}

Let's all say “Bye bye” to Herr Weidinger and his dad!
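For those who would rather read code than prose, here is how the described approach might look (a hypothetical reconstruction from the comment, not Herr Weidinger's actual submission):

        import Data.List (foldl', nub)

        -- Boyer-Moore majority vote: if a strict majority (> 50%) exists,
        -- the returned candidate is guaranteed to be it.
        -- (Only ever called on non-empty lists below.)
        candidate :: Eq a => [a] -> a
        candidate (x : xs) = fst (foldl' step (x, 1 :: Int) xs)
          where step (_, 0) y             = (y, 1)
                step (c, k) y | y == c    = (c, k + 1)
                              | otherwise = (c, k - 1)

        quasiMajority :: Eq a => [a] -> [a]
        quasiMajority [] = []
        quasiMajority xs
          | odd (length xs) = verify [candidate xs]      -- at most one winner
          | otherwise       = verify (nub [head xs, candidate (tail xs)])
          where verify = filter (\x -> 2 * length (filter (== x) xs) >= length xs)

The final verification pass is what makes the two candidates harmless: a wrong candidate is simply filtered out.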

Round 7: Half of the elements are the same value, the other half is made of unique elements, and the two halves are interwoven:
Data.List.concatMap (\x -> [1234567890123456789 + x, 1234567890123456789]) [1 .. 5000000]

To make matters worse, the values chosen are big integers, to penalize the solutions that perform needlessly many comparisons on the element type. This gives us our top 5:

         1.45 s   Atanas Mirchev
         1.48 s   Master of Competition II
         1.58 s   Maximilian Kirchmeier
         1.65 s   Master of Competition III
         1.79 s   Manuel Eberl
         2.02 s   Maximilian Haslbeck
         2.15 s   Master of Competition I
         3.09 s   Hauke Brinkop
         3.89 s   Daniel Stüwe

Interestingly, the two members of Team Fischkopf once again submitted slightly diverging solutions, resulting in some timing differences. So we're counting them as two separate entities this week (Herr Fisch and Herr Kopf?). (It might seem like the MC is recycling the Contra-MC's humor, but it was actually his idea. Actually, it wasn't really his idea but his special lady friend's. For the record, she's from Konschtanz. Then the MC opened his big mouth, and the Contra-MC took advantage of his absence to steal the idea.)

Update: Herr Fisch wrote to complain about the separation of the Fisch and the Kopf, so the MC in his endless magnanimity has decided to give Kopf the same number of points as Fisch, even though the explanation of how the two solutions came to diverge was a bit “fishy”. In the interest of academic transparency, here's what Fisch wrote:

you mentioned in the text that Daniel and I supposedly have slightly different solutions. This is not correct, because I sent my solution to him after I pressed "save", and we just had a look: what I sent was the same code as he gab ab ( ;-) ). But I'm working directly on the server, and it seems the server was down at the time, so it didn't save the changes. So it would be fair if he got the same number of points as I did.

Actually, I'm running gedit on a remote file system (ssh/sftp), mounted with nautilus. So I'm editing in a local editor, and when I press "save", it sometimes needs to remount the remote file system. The server was down a few times; it e.g. froze my remote shell (and also those of other guys), and I couldn't reconnect (for about 5 minutes). I guess it was just bad timing. Anything else wouldn't be fair to Daniel, because he is aiming for the total win :D

(End of Update)

One competitor found it a pity that the signature of the function didn't have Ord a => as an additional type class constraint. Indeed, if the MC could travel back in time, he would change that, because then we'd have not only O(n) and O(n²) solutions but also O(n log n) ones. It should probably be possible to use unsafeCoerce and/or some other type system trick to determine whether the concrete instance of a supports Ord and to use appropriate code if it does, but the MC's Haskell mojo isn't quite up to the task!
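Just for the record, with an Ord constraint the O(n log n) route is essentially a two-liner (a sketch, not any competitor's submission):

        import Data.List (group, sort)

        -- Sort, group equal elements, and keep each group that accounts
        -- for at least half of the list.
        quasiMajorityElemsOrd :: Ord a => [a] -> [a]
        quasiMajorityElemsOrd xs = [head g | g <- group (sort xs), 2 * length g >= n]
          where n = length xs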

The MC would also like to formally apologize to Herr Eberl. For weeks, compiling his solutions with ghc -O3 might have produced the message

        The function `main' is not defined in module `Main'

which the MC failed to notice. Compilation failed as a result, and Herr Eberl's solutions were at an unfair disadvantage compared to the other solutions, in that they were interpreted instead of compiled. This having been said (genauer: written), Herr Eberl is not entirely blameless, since he provided the MC with a file that didn't have the module Exercise_9 where heading, which caused the error in the first place.

Herr Eberl's solution is probably the one that best combines conciseness and efficiency:

        quasiMajorityElems :: Eq a => [a] -> [a]
        quasiMajorityElems xs = [x | x <- candidates (1::Integer) xs, occs x * 2 >= n]
          where n = genericLength xs
                occs x = genericLength (filter (==x) xs)
                candidates 0 (x:xs) = candidates 1 xs
                candidates 1 [x,y] = if x == y then [x] else [x,y]
                candidates k (x:y:xs) = candidates (if x == y then k+1 else k-1) (x:xs)
                candidates _ xs = xs

He also went to the extra trouble of formally proving that his solution is correct. He did his proof using Isabelle/HOL, a theorem prover developed at Lehrstuhl 21. Here it is for the record:

        theory Wett9
        imports Main "~~/src/HOL/Library/Efficient_Nat"
        begin

        (* Number of occurrences of an element in a list *)

        definition occs :: "'a => 'a list => nat" where
        "occs x xs = length (filter (op = x) xs)"

        (* Definition of the (nonstrict) majority winners of a list *)

        definition isQuasiMajorityWinner :: "'a => 'a list => bool" where
        "isQuasiMajorityWinner x xs <-> (x ∈ set xs & occs x xs * 2 >= length xs)"

        definition quasiMajorityWinners :: "'a list => 'a set" where
        "quasiMajorityWinners xs = {x. isQuasiMajorityWinner x xs}"

        lemma isQuasiMajorityWinner_imp_in_list[dest]:
          "isQuasiMajorityWinner x xs ==> x ∈ set xs" 
              unfolding isQuasiMajorityWinner_def by simp

        (* Some lemmas about occs *)

        lemma occs_replicate_same[simp]: "occs x (replicate k x) = k"
          by (induction k, simp_all add: occs_def)

        lemma occs_replicate_different[simp]: "x ≠ y ==> occs x (replicate k y) = 0"
          by (induction k, simp_all add: occs_def)

        lemma occs_append[simp]: "occs x (xs @ ys) = occs x xs + occs x ys"
          by (induction xs, simp_all add: occs_def)

        lemma occs_Cons[simp]: 
          "occs y (x#xs) = (if x = y then occs y xs + 1 else occs y xs)"
          by (induction xs, simp_all add: occs_def)

        lemma occs_in_set[simp]:
          "x ∈ set xs ==> occs x xs > 0"
          by (induction xs, auto simp: occs_def)

        lemma occs_notin_set[simp]:
          "x ∉ set xs ==> occs x xs = 0"
          by (induction xs, auto simp: occs_def)

        (* Narrows the candidates for majority winners of a list down to <= 2 *)

        fun quasiMajorityCandidates :: "nat => 'a list => 'a list" where
        "quasiMajorityCandidates 0 (x#xs) = quasiMajorityCandidates 1 xs" |
        "quasiMajorityCandidates _ [] = []" |
        "quasiMajorityCandidates _ [x] = [x]" |
        "quasiMajorityCandidates (Suc 0) [x,y] = (if x = y then [x] else [x,y])" |
        "quasiMajorityCandidates k (x#y#xs) = 
             quasiMajorityCandidates (if x = y then k + 1 else k - 1) (x#xs)"

        (* Correctness and distinctness of majorityCandidates *)

        lemma quasiMajorityCandidates_correct:
          assumes "xs ≠ []"
          assumes "isQuasiMajorityWinner y (replicate k (hd xs) @ tl xs)" 
          shows "y ∈ set (quasiMajorityCandidates k xs)"
        using assms
        proof (induction k xs rule: quasiMajorityCandidates.induct)
          case (goal5 k' x1 x2 xs)
            show ?case
            proof (cases "x1 = x2")
              case False
                hence "isQuasiMajorityWinner y ((replicate (Suc k')) x1 @ xs)" using goal5
                  by (rule_tac ccontr, simp add: isQuasiMajorityWinner_def split: split_if_asm)
                with False and goal5(1) show ?thesis by simp
            qed (insert goal5(1,3), simp add: replicate_app_Cons_same)
        next
          case (goal6 k' x1 x2 x3 xs)
            show ?case
            proof (cases "x1 = x2")
              case False
                hence "isQuasiMajorityWinner y ((replicate k') x1 @ x3 # xs)" using goal6
                  by (rule_tac ccontr, simp add: isQuasiMajorityWinner_def split: split_if_asm)
                with False and goal6(1) show ?thesis by simp
            qed (insert goal6(1,3), simp add: replicate_app_Cons_same)
        qed (force simp: isQuasiMajorityWinner_def split: split_if_asm)+

        lemma quasiMajorityCandidates_distinct: "distinct (quasiMajorityCandidates k xs)"
        by (induction k xs rule: quasiMajorityCandidates.induct) simp_all

        (* For nicer generated code (i.e., not wrapped in the "Nat" type),
           define a version with int instead of nat *)

        fun quasiMajorityCandidates_int :: "int => 'a list => 'a list" where
        "quasiMajorityCandidates_int _ [] = []" |
        "quasiMajorityCandidates_int k [x] = (if k = 0 then [] else [x])" |
        "quasiMajorityCandidates_int k (x#y#xs) = 
          (if k = 0 then quasiMajorityCandidates_int 1 (y#xs)
          else (if k = 1 & xs = [] then (if x = y then [x] else [x,y])
                else quasiMajorityCandidates_int (if x = y then k + 1 else k - 1) (x#xs)))"

        lemma quasiMajorityCandidates_int_correct:
          "quasiMajorityCandidates_int (int k) xs = quasiMajorityCandidates k xs"
        by (induction k xs rule: quasiMajorityCandidates.induct) (case_tac xs, auto)

        (* Definition of the function that returns the list of majority winners *)

        definition quasiMajorityWinnersList where
        "quasiMajorityWinnersList xs == (let n = length xs in
            filter (λx. occs x xs * 2 >= n) (quasiMajorityCandidates_int (int 1) xs))"

        lemma quasiMajorityWinnersList_correct:
          "set (quasiMajorityWinnersList xs) = quasiMajorityWinners xs"
        proof (cases "xs = []")
          assume "xs = []"
          thus ?thesis unfolding quasiMajorityWinnersList_def 
              quasiMajorityWinners_def isQuasiMajorityWinner_def by simp
        next
          assume "xs ≠ []"
          show ?thesis unfolding quasiMajorityWinnersList_def quasiMajorityWinners_def
            using quasiMajorityCandidates_int_correct[of "Suc 0" xs]
            apply (intro equalityI subsetI)
            apply (rule ccontr, simp add: isQuasiMajorityWinner_def `xs ≠ []`)
            apply (auto simp: `xs ≠ []` isQuasiMajorityWinner_def 
                      intro!: quasiMajorityCandidates_correct)
            done
        qed

        lemma quasiMajorityWinnersList_distinct: 
          "distinct (quasiMajorityWinnersList xs)"
              unfolding quasiMajorityWinnersList_def Let_def
              by (subst quasiMajorityCandidates_int_correct, 
                  intro distinct_filter quasiMajorityCandidates_distinct)

        lemmas [code] = quasiMajorityWinnersList_correct[symmetric]

        (* Code export *)

        export_code quasiMajorityWinnersList in Haskell file "/home/manuel/haskell/wett/wett9"

        end

The MC is almost done babbling and rambling, but he would like to mention one news item he saw this week. No, it's not about the GroKo, although the news did reach him even behind the Chinese firewall (no Facebook though). Rather, it was an article called Exponential algorithm making Windows XP miserable could be fixed that drew the MC's attention. It turns out that Windows XP, a version of Windows that is still very popular in some parts of the world (like China), suffers from a truly debilitating bug. Here are three extracts from the article:

Each time you boot your Windows XP machine, it slows to a crawl. There's a built-in process, svchost.exe, chewing up the entire processor, sometimes for an hour or more at a time. Wait long enough after booting and the machine will eventually return to normalcy. But an hour can be a long time to wait.

Machines using Windows Update retrieve patch information from Microsoft's servers. That patch information contains information about each patch: what software it applies to (for example, systems that have been upgraded to Internet Explorer 7 or 8 don't need Internet Explorer 6 patches), what knowledge base article it relates to, and, critically, what historic patch or patches the current patch supersedes.

Unfortunately, the Windows Update client components used an algorithm with exponential scaling when processing these lists. Each additional superseded patch would double the time taken to process the list. With the operating system now very old, those lists have grown long, sometimes to 40 or more items. On a new machine, that processing appeared to be almost instantaneous. It is now very slow.

But the best part of the article is that Microsoft have, in their own words, “all the right (and smartest) people” working on it. One can only wonder what would happen if they had all the right people working on it, but the right people weren't smart, or if they had the wrong smart people (like Dr. Frankenstein), or, even worse (but more likely perhaps), if they had the wrong dumb people working on it. Hm, one really does wonder who actually is working on it (assuming somebody is)!

Pfui, this was a long week! The main lesson is perhaps to Pay attention; seemingly irrelevant information, like the Boyer & Moore quote about delegates knocking each other down with their placards, can be highly relevant for knocking down the competition. By the way, there's a quote in the Wettbewerbsaufgabe for Week 10 as well.

Update (Updated): Christian Ziegner had submitted a wrong solution that didn't pass the QuickCheck tests. He also had an alternative solution that was commented out, which he reinstated at 15:45 on 17 December. The MC tested this solution and found that it would have been kicked out in round 5. Since we had wrongly announced 18 December as the Abgabetermin (something only a handful of contestants seem to have noticed, thankfully), he got all his points after all.

Important message: We checked the submission directories and found no other discrepancy between the submissions on December 17 and 18. However, the results of the contest were already announced a couple of hours before the contest officially closed. In the unlikely event that you have been somehow affected by this mess-up, please contact the MC and he'll see what he can do.

Die Ergebnisse der zehnten Woche

The results are in!

Top 30 der Woche
  Platz Wettbewerber(in) Score   Punkte
1. Atanas Mirchev 0.145 30
Lyubomir Stoykov 0.145 30
3. Maximilian Haslbeck 0.152 28
4. Alexander Weidinger 0.198 27
Daniel Stüwe 0.198 27
Hauke Brinkop 0.198 27
7. Julia Kindelsberger 3.41 24
8. Nikita Basargin 3.73 23
9. Alexandra Fritzen 3.75 22
10. Alexander Güssow 3.98 21
Christian Ziegner 3.98 21
Johannes Ismair 3.98 21
Simon Roßkopf 3.98 21
Thomas Engel 3.98 21
Thomas Pettinger 3.98 21
Thomas Zwickl 3.98 21
17. Mihai Dutescu 6.64 14
18. Alexander Christian Hefele N/A 5
Stefan Peter Dirix N/A 5
Florian Dreier N/A 5
21. Maximilian Kirchmeier N/A 2.5

As usual, let us start with some stats. Three solutions had failed the official QuickCheck suite. For these, kein Pardon!

In addition, four solutions that had either succeeded on the QuickCheck server or that had timed out failed on additional tests. These get 5 points. (The MC can send the failing inputs to the concerned parties. Just send him an email.) One of the solutions even had the wrong signature, with Int instead of Integer. The MC generously repaired it, but it still failed the tests horribly. Giving 5 points to a solution that does not even compile would be a travesty, so this competitor gets 2.5 points. He himself probably won't be too surprised, since he wrote the following in his code (wie recht er hatte...):

        -- this is a grim day for democracy, I know... but congresses and snowboarding
        -- vacations (which I sacrificed part of for this... construct) ate all my time

The MC has no pity! He sacrificed vacation to prepare the Othello-Aufgabe and now he's sacrificing the Pub Quiz to write this pseudo-weblog entry.

Anyway, this leaves us with 17 real contestants. To rank them, the MC and the Contra-MC joined forces and created a database of 66 randomly generated inputs with anything between 1 and 3000 cities. For each input and each solution, they computed the variance (σ²) of the sizes of the Wahlkreise and divided it by the sum of the variances achieved by all the solutions on that input. Summing over all 66 inputs yields a numeric score (displayed in the third column of the Top 30 table above), from which the ranks were computed.
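Spelled out (in the MC's own reading of his procedure): if σ²(s, i) denotes the variance achieved by solution s on input i, then

        score(s) = Σ_{i=1}^{66} σ²(s, i) / Σ_{s'} σ²(s', i)

where the inner sum ranges over all solutions s', and lower scores yield better ranks.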

The solutions neatly aggregated into clusters of people who implemented the same algorithm. Team Balkan (score: 0.145), consisting of Herren Mirchev and Stoykov, has a nice 250-LOC (lines of code) solution. Unfortunately, it is not very well documented; their solution is all Greek (or Bulgarian) to the MC. Herr Stoykov even had a solution to spare:

        {-JUST IN CASE THE FIRST SOLUTION IS NOT BUG FREE, HERE IS ANOTHER ONE: 
        [211 lines are omitted here]
        -}

Update: Herr Stoykov has now explained the solution in an email to the MC:

Hi MC,

just not to leave a bad (balkan) impression - we don't tend to write SOO bad code!!!:D Our solution was a last minute ride (as always). The main idea was at every recursive step to find which alternative is more favorable - "vertical splitting" or "horizontal splitting" line. Of course, we didn't have the time to fully implement the solution, therefore we had to always split horizontally;P We barely managed to submit at ~15.20;-)

All best,

LS

(End of Update)

Then comes Herr Haslbeck (score: 0.152) and his very nicely documented solution:

        ---------------
        --  Well, this could have been better.
        --  Roughly, the function sorts all cities and corner points (corner
        --  points become cities with 0 inhabitants, which yields prettier
        --  polygons), then bundles up cities until a certain number of
        --  inhabitants is reached (in our case (population / number of parts)),
        --  and so builds up a list.
        --  The implementation often fails to find the optimal solution and is
        --  quite a patchwork, since I kept fiddling with it during the holidays.
        ---------------
        split :: District -> Integer -> [Polygon]
        split (District ps cs) n = bundleCities cityAndPoin aver n
          where cityAndPoin = sortCities $ cs ++ polygonToCities ps
                aver = div (population cs) n

        ---------------
        -- Processes the cities one by one and creates a polygon
        -- whenever a certain threshold is exceeded.
        -- If a polygon consists of only one or two
        -- points, it is still turned into a "competition polygon".
        -- If only three cities are left to process, it simply keeps
        -- chopping up the triangle until we have the desired number
        -- of polygons. (Well, none of this is very pretty.)
        ---------------
        bundleCities :: [City] -> Integer -> Integer -> [Polygon]
        bundleCities ((City _ p popu):cs) aver rest
          | rest == 1 = [convexHull $ p:(citiesToPoints cs)]
          | lenRestCities <= 3 = splitTriangleMult (p:(citiesToPoints cs)) rest
          | otherwise = partDist' : bundleCities restCs aver (rest - 1)
            where
             partDist = convexHull $ pointSet
             lenDist = genericLength partDist
             partDist' | lenDist == 1 = cheatOnePoint (head partDist) (head $ citiesToPoints restCs)
                       | lenDist == 2 = cheatTwoPoints partDist
                       | otherwise = partDist
             pointSet = (p:(citiesToPoints $ formPolygon cs aver popu))
             lenPS = genericLength $ pointSet
             lenRestCities = 1 + genericLength cs
             restCs = drop (lenPS - 1) cs
        bundleCities _ _ _ = []

        -- takes cities from the front of the list
        -- until the average is reached
        formPolygon :: [City] -> Integer -> Integer -> [City]
        formPolygon [_,_,_] _ _ = []
        formPolygon (c:cs) aver popu
          | newPopu < aver = c:(formPolygon cs aver newPopu)
          | abs (popu - aver) < abs (newPopu - aver) = []
          | otherwise = [c]
            where cityPopu (City _ _ i) = i
                  newPopu = popu + cityPopu c
        formPolygon _ _ _ = []

        -- turns one triangle into two triangles
        splitTriangle :: Polygon -> [Polygon]
        splitTriangle [p1,p2,p3] = [[p1,midpoint,p3],[p2,midpoint,p3]]
          where (x1,y1) = p1
                (x2,y2) = p2
                midpoint = ((x1+x2)/2,(y1+y2)/2)
        splitTriangle _ = []

        -- turns one triangle into n triangles
        splitTriangleMult :: Polygon -> Integer -> [Polygon]
        splitTriangleMult poly n | n == 1 = [poly]
                                 | otherwise = t : (splitTriangleMult t' (n-1))
          where (t:t':_) = splitTriangle poly

        -- turns points into cities with zero inhabitants
        polygonToCities :: Polygon -> [City]
        polygonToCities = map (\p -> City [] p 0)

        -- extracts the cities' coordinates
        citiesToPoints :: [City] -> [Point]
        citiesToPoints = map (\(City _ p _) -> p)

        -- sorts cities by their coordinates
        sortCities :: [City] -> [City]
        sortCities = sortBy compareCities
          where compareCities (City _ p1 _) (City _ p2 _) = compare p1 p2

        -- computes the total population of a list of cities
        population :: [City] -> Integer
        population = sum . map (\(City _ _ i) -> i)

        -- turns one point into a valid polygon (a line with 3 points)
        cheatOnePoint :: Point -> Point -> Polygon
        cheatOnePoint (x1,y1) (x2,y2) = [(x1,y1),quarterPoint,midPoint]
          where quarterPoint = ((3*x1+x2)/4,(3*y1+y2)/4)
                midPoint = ((x1+x2)/2,(y1+y2)/2)

        -- turns two points into a valid polygon (a line with 3 points)
        cheatTwoPoints :: [Point] -> [Point]
        cheatTwoPoints [(x1,y1),(x2,y2)] = [(x1,y1),midPoint,(x2,y2)]
          where midPoint = ((x1+x2)/2,(y1+y2)/2)
        cheatTwoPoints _ = []

        ----------------
        -- Convex hull algorithm
        -- stolen from here:
        -- https://en.wikibooks.org/wiki/Algorithm_Implementation/Geometry/Convex_hull/Monotone_chain#Haskell
        ---------------

        [omitted]

Then comes Team Fischkopf (score: 0.198), this time extended by a third member, Herr Weidinger (likely a Weißwurstfresser). This North–South collaboration seems to have implemented a cool algorithm, but again, in the absence of documentation, it's hard to see what they really did.

Finally, Frau Kindelsberger's solution (score: 3.41) is the first of many similar short solutions (which could have been made even shorter by removing the spurious first equations for getPointList and getStaedteList). The formatting also leaves much to be desired—to minimize the chances that the code on the right-hand side will be clipped by your browser, the MC was obliged to reduce the font size drastically:

        split :: District -> Integer -> [Polygon]
        split _ 0 = []
        split  district 1 = [getPointList district]
        split district n  = take pointsDivN point  : split (District (point \\ deletePoints) staedte) (n-1) where point = getPointList district
                                                                                                                  staedte = getStaedteList district      
                                                                                                                  pointsDivN  = if  div (genericLength point) (fromIntegral n :: Int) <= 3 then 3 else div (genericLength point) (fromIntegral n :: Int) + 1
                                                                                                                  deletePoints = tail (take (pointsDivN-1) point)                                                                                          
        getPointList :: District -> [Point]        
        getPointList (District [] []) = []
        getPointList (District point _) = point

        getStaedteList :: District -> [City]
        getStaedteList (District [] []) = []
        getStaedteList (District _ city) = city

The awesome Contra-MC wrote a program to visualize the results. Here's the output on one of the examples from the benchmark set (where larger circles correspond to larger cities):

[Four renderings of the same input: Team Balkan, Herr Haslbeck, Team Fischkopf–Weißwurst, and everybody else]

We see that Team Balkan and Team Fischkopf didn't hesitate to cut existing boundary segments in two. Both appear to like horizontal and vertical lines. The naive approach, in contrast, simply tries to connect existing vertices, with catastrophic results for the example shown above. The coolest of these solutions is undoubtedly Herr Haslbeck's, which even leaves empty territory as a pink no man's land. It is a pity that his solution didn't win on this particular benchmark suite.

The main lesson this week is probably that When you implement a cool algorithm, please document your code! Doing so increases your chances of seeing your solution on this weblog, but it also means that you will still be able to read it n years from now (for large values of n).

Die Ergebnisse der (elften und) zwölften Woche

Update: The results have been multiplied by 2, as promised in the Aufgabentext for a zweiwöchige Wettbewerbsaufgabe; the text below has been modified accordingly. The MC says: Asche auf sein Haupt!

Here they are at last, the results for week 12!

Top 30 der Woche
  Platz Wettbewerber(in)   Punkte
1. Maximilian Haslbeck 30 × 2
(2. Maximilian Kirchmeier (29/2) × 2)
2. Julian Biendarra 29 × 2
3. Stefan Peter Dirix 28 × 2
4. Alexandra Fritzen 27 × 2
5. Martin Mihaylov 26 × 2
   Georgi Dikov 26 × 2
7. Michael Schreier 24 × 2
   Simon Roßkopf 24 × 2
9. Alexander Christian Hefele 22 × 2
10. Ludwig Peuckert (21 − 2) × 2
11. Alexander Weidinger 20 × 2
    Hauke Brinkop 20 × 2
13. Julia Kindelsberger 18 × 2
14. Nikita Basargin 17 × 2
15. Atanas Mirchev 16 × 2
16. Lukas Michael Stumberg 15 × 2
17. Thomas Zwickl 14 × 2
18. Daniel Stüwe 13 × 2
19. Christian Ziegner 12 × 2
    Martin Wurzer 12 × 2
    Thomas Pettinger 12 × 2

First, some general comments:

The MC was quite pleased to see that most competitors wrote detailed descriptions of their programs. He extracted them and linked them. (Just click the name of a competitor in the table below.) He was tempted to subtract points (× 2) from those who didn't write any docs, since the instructions were abundantly clear about the need to do so, but he resisted the temptation, and now the temptation is gone forever.

In addition to the total score, the table also lists the total time taken by each program. It's refreshing to see that some of the programs are both fast and good (bzw. both slow and bad). The MC's program is pretty terrible, but that's because he spent most of his energy on this (pseudo-)weblog instead. The real participants don't have that excuse!

Solution | Total Score | Total Time (s)
Maximilian Haslbeck 2 | 3166 | 2937.593
Maximilian Haslbeck 1 | 2489 | 2573.549
Maximilian Kirchmeier (tardy) | 2227 | 58.912
Julian Biendarra 1 | 1739 | 103.315
Julian Biendarra 2 | 1735 | 141.089
Maximilian Haslbeck 3 | 1713 | 19740.533
Stefan Peter Dirix 1 | 1576 | 1419.802
Stefan Peter Dirix 3 | 1525 | 5481.855
Alexandra Fritzen 3 | 1205 | 185.314
Mihaylov & Dikov 4 | 1203 | 0.042
Stefan Peter Dirix 2 | 1182 | 168.998
Mihaylov & Dikov 3 | 1029 | 9.982
Mihaylov & Dikov 2 | 830 | 1.789
Michael Schreier 1 | 631 | 357.103
Simon Roßkopf 1 | 631 | 322.309
Alexander Christian Hefele | 307 | 146.607
Ludwig Peuckert | 249 | 2.851
Alexander Weidinger | 228 | 0.139
Hauke Brinkop | 228 | 0.059
Alexandra Fritzen 1 | 97 | 0.042
Michael Schreier 3 | −1 | 12.824
Julia Kindelsberger | −95 | 0.104
Nikita Basargin 3 | −931 | 313.838
Nikita Basargin 2 | −970 | 39.002
Atanas Mirchev 1 | −1069 | 6.976
Atanas Mirchev 3 | −1069 | 28.465
Nikita Basargin 1 | −1138 | 5.164
Master of Competition | −1180 | 0.032
Lukas Michael Stumberg | −1200 | 5.053
Thomas Zwickl | −1359 | 34.335
Atanas Mirchev 2 | −1451 | 10203.498
Daniel Stüwe | −1468 | 600.620
Alexandra Fritzen 2 | −1502 | 0.127
Mihaylov & Dikov 1 | −1652 | 2.412
Michael Schreier 2 | −3260 | 70530.723
Simon Roßkopf 2 | −3418 | 72110.760

Readers of this page are highly encouraged to click on the above links. Much of the humor you would normally expect from the MC is delegated to this week's participants. Some of them are really funny and/or witty. The MC's personal favorite is Ludwig Peuckert's entry, not so much for his witty comments as for his willingness to go his own way. Oh, and don't worry if the non-ASCII letters come up all screwed up; just use your imagination to fill the gaps. (Hint: Usually, the missing letter will be one of ä ö ü ß, apart from the entry that features elaborate, 18+ ASCII-art.)

The next table is the coolest one. It shows the outcome of every single game (except those involving Herr Kirchmeier). The entries on the left indicate black players; those at the top are the white players. Since the entries are ranked by score, one would expect the upper right triangle to be filled with b's and the lower left triangle with w's (see the legend).

Legend
 w: white won 
 b: black won 
 W: white won due to timeout 
 B: black won due to timeout 
 =: tie 
(The column headers, from left to right, are the same 35 solutions as the row labels below: Maximilian Haslbeck 2, Maximilian Haslbeck 1, Julian Biendarra 1, Julian Biendarra 2, Maximilian Haslbeck 3, Stefan Peter Dirix 1, Stefan Peter Dirix 3, Alexandra Fritzen 3, Mihaylov & Dikov 4, Stefan Peter Dirix 2, Mihaylov & Dikov 3, Mihaylov & Dikov 2, Michael Schreier 1, Simon Roßkopf 1, Alexander Christian Hefele, Ludwig Peuckert, Alexander Weidinger, Hauke Brinkop, Alexandra Fritzen 1, Michael Schreier 3, Julia Kindelsberger, Nikita Basargin 3, Nikita Basargin 2, Atanas Mirchev 1, Atanas Mirchev 3, Nikita Basargin 1, Master of Competition, Lukas Michael Stumberg, Thomas Zwickl, Atanas Mirchev 2, Daniel Stüwe, Alexandra Fritzen 2, Mihaylov & Dikov 1, Michael Schreier 2, Simon Roßkopf 2. The blank in each row marks the impossible game against oneself.)
Maximilian Haslbeck 2 bbbwbbbbbbbbbbbbbbbbbbbbbbbbbbbbBB
Maximilian Haslbeck 1w bbBbbbbbbbbbbbbbbbbbbbbbbbbbbbbBB
Julian Biendarra 1wb bwwwbbwwbbbbbbbbbbbbbbbbbbbbbbBB
Julian Biendarra 2www wwwwbwbbbbbbbbwwwbbbbbbbbBbbbBB
Maximilian Haslbeck 3bWWb bbbWbbbWWbbbbWbWbbbbbbbbbbbbBB
Stefan Peter Dirix 1bwb=w BbbbbbbbbbbbbbbbbbbbbbbBbbwBB
Stefan Peter Dirix 3wwbwww wbbbbbbbbbbbbbbbbbbbbbBbbbBB
Alexandra Fritzen 3wwbbwww bwb=wwbwwwbwbbbbbbbbbbbbbBB
Mihaylov & Dikov 4wwwwwwBw wwbwwbbbbbwbbbbbbbbbbbbbwB
Stefan Peter Dirix 2wwbbwbbbb bbbbbbbbbbbbbbbbbbbbbbbBB
Mihaylov & Dikov 3wwbbwwbbww wbbbw==bbbbbbbbbbbbbbbBB
Mihaylov & Dikov 2wwwwwwwwwww bbbwwwbbbbbbbbbbbbbbbBB
Michael Schreier 1wwwbwwwwwwbb bbbbbbbb==bbbbbbbbbbBB
Simon Roßkopf 1wwwbwwwwwwbbb bbbbbbb==bbbbbbbbbbBB
Alexander Christian Hefele   wwwwBwwwbwwbww wbbbwbbbbbbwbwbwwbBB
Ludwig Peuckert   wwwwwwBwbwbwbbb bbwbwbbbbbwbbBwwbBB
Alexander Weidinger   wbwwBwwwwwwwwwbb wbbbbbbbwbbwbbb=BB
Hauke Brinkop   wbwwBwwwwwwwwwbbw bbbbbbbwbbwbbb=BB
Alexandra Fritzen 1wwwwwwwbwwwwwwwwww wbbbbbbbbbbbbbBB
Michael Schreier 3wwwwwwwwwwwwwwwbwww wbbbbbbwbbbbbww
Julia Kindelsberger   wwwwwwwbwwwwwwwwwwbw bbbbbbbbbbbbww
Nikita Basargin 3wwwwwwwwwwwwwwwwwwwww wbbbwwwbbwbBB
Nikita Basargin 2wwwwwwwwwwwwbbwwwwwwww wwwwbwwb=bBB
Atanas Mirchev 1BwwwBwwww=wwwwwwwwwwwbb wbbbbwbbwBB
Atanas Mirchev 3BwwwBwwww=wwwwwwwwwwwbbw bbbbwbbwBB
Nikita Basargin 1wwwwwwwwwwwwwwbwwwwwwwwbb wwwbbbbBB
Master of Competition   wwwwBwwwwwwwwwbbbbwwwwwbbw wwbbwbBB
Lukas Michael Stumberg   wwwwBwwwwwbwwwwwwwwwwwwbbbb bbbbwBB
Thomas Zwickl   wwwwBwwwwwwwwwwwwwwwwwwbbwwb Bwwbww
Atanas Mirchev 2WwwwWwwww=Wwwwwwwwwwwbbwwbbbb bbwBB
Daniel Stüwe   wwwwwwwwwwwwwwwwwwwwwbbbbbbwwb bb=B
Alexandra Fritzen 2wwwwwwwwwwwwwwwwwwwwwwwwwwbbwwb bww
Mihaylov & Dikov 1wwwwwwwwwwwwwwwwwwwwwwwwwwwwwwww BB
Michael Schreier 2WWWWWWWWWWWWbbWWWWWWWWWWWWbWbWWbW W
Simon Roßkopf 2WWWWWWWWWWWWbbWWWWWWWWWWWWbWbWWbWW

So we have a clear winner! Congratulations to Herr Haslbeck! With a few exceptions (including losses against himself), he really knocked everybody's socks off! If this gentleman doesn't get a trophy at the end of the term, nobody will.

It would be nice to show the actual games, but the MC wasn't able to get the Java applet that's supposed to display them to work. As a sample of what you're missing, here's the log of a game in which the MC takes a real beating from Herr Haslbeck's third solution.

Looking at the solutions, it appears that nobody thought about “cheating,” und zwar in the following way: Submit a real solution A and two fake solutions B and C. The fake solutions play extremely well, except that if they detect that they are playing against A (e.g., by running A for the first 10 or so moves and checking that the moves coincide), they start playing extremely crappily, thereby raising A's score. This can be improved by recruiting, say, 20 students who normally don't submit solutions, offering one can of beer to each of them in exchange for their submitting further copies of the fake solution. Then just make sure that the solutions look different enough that the MC doesn't notice what's going on, and you're on your way to victory.

What are this week's lessons?

Update: Mister Tardy has this to say:

Well, would you look at that... it seems my caffeine-powered coding marathon did actually pay off. All the more frustrating that my tardiness cost me half the points my stubbornness earned me :(

But that's how it goes, I guess... still, it's nice to see my effort validated :)

Ergebnisse der dreizehnten Woche

The results for week 13 are in!

Top 30 der Woche
  Platz Wettbewerber(in)   Punkte
1. Stefan Peter Dirix 30
2. Simon Roßkopf 29
Michael Schreier 29
4. Maximilian Haslbeck 27
5. Daniel Stüwe 26
6. Moritz Sichert 25
7. Alexandra Fritzen 24
Julia Kindelsberger 24
9. Alexander Christian Hefele 22
David Krebs 22
Ludwig Peuckert 22
David Zhang 22
Hauke Brinkop 22
Nikita Basargin 22
15. Maximilian Kirchmeier 16
16. Alexander Weidinger 15

Unless one of the current top five competitors of the Semester gets disqualified somehow (e.g., for cheating in an egregious fashion), we already have a set of five winners. Which is good news not only for these participants (congrats!), but also for the MC, because he had grabbed the five trophies already and was dreading a fifth-place tie that would force him to take the U-Bahn to the PEP in Neuperlach and hope the Nanu-Nana still has some trophies left. (Technically, a tie is still possible if Michael Schreier gets 30 points and Herr Brinkop gets 0 points, but what are the odds?)

This week was a rather weird week to rank. The MC divided the solutions into classes and made sure that the higher classes received more points than the lower classes.

In the words of one of the competitors:

        {-
        Unfortunately I had not enough time to finish the Wettbewerbsaufgabe this time. :/
        But there is still a small propability of getting some points.
        (Because the other students seem to be quite lazy and won't even change "undefined" to "id")

        Well, at least the output is never longer than the input.
        -}

Entirely accurate! (Even the bit about the students being like their favorite programming language—or is this mistaking lethargy for strategy?) The MC was about to write something along those very lines. Great minds think alike! (And Eigenlob st...) But the small “propability” of getting some points materialized into a 100% probability of getting 22 out of 30 points! Not bad!

To make matters more complicated, a couple of occasionally empty-headed, maybe-just-not-working solutions threw exceptions like

        Maybe.fromJust: Nothing
        Data.ByteString.head: empty ByteString

or exhausted the stack space upon decompression. The MC decided to give these programs the benefit of the doubt—i.e., he assumed that the compressed data should be recoverable using a bug-free program (which exists in principle even though it has yet to be found). But there's a thin line between giving the benefit of the doubt and being utterly naïve, which the MC is not ready to cross; so whenever a decompression function failed, the expansion ratio was taken to be 200% (making the offenders easy to spot in the table below, and suitably peinlich for them), so that this serves as a lesson. As Vladimir Ilyich Ulyanov (and not Joe Stalin, as the MC first wrote) is supposed to maybe have said: Vertrauen ist gut, Kontrolle ist besser. (Update des Meta-Masters: Joe Stalin said “Vertrauen ist schlecht, Gulag ist besser, Hinrichtung ist am besten”.)
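(Both error messages stem from partial functions, by the way. A generic aside, not taken from any particular submission: total alternatives exist and cost nothing. For example:

        import qualified Data.ByteString as BS
        import Data.Word (Word8)

        -- Total replacement for the partial BS.head: returns Nothing on an
        -- empty ByteString instead of crashing.
        safeHead :: BS.ByteString -> Maybe Word8
        safeHead = fmap fst . BS.uncons

Similarly, Data.Maybe.fromMaybe with an explicit default beats Data.Maybe.fromJust.)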

All right, here come the detailed results in all their glory. The MC built an archive of 50 text files and 50 binary files. In each group, he put 10 tiny files (two-digit size), 10 small files (three-digit size), 10 medium files (four-digit size), 10 large files (five-digit size), and 10 huge files (six-digit size). He didn't dare to go into the millions of bytes, because Haskell performance is rather hard to predict and debug at that scale, and the goal was to focus on compression algorithms, not on Haskell performance tuning. Results are displayed as size-after : size-before ratios, in %. Lower is better.

File | Size (bytes) | Fritzen & Kindelsberger (GZip) | Sichert (gzip/bzip2/xz) | Dirix (LZW) | Schreier & Roßkopf (LZW) | Haslbeck (LZ77/Huffman) | Stüwe (“Arith”)
ccc.html | 30 | 167% | 103% | 103% | 103% | 153% | 117%
conclusion.tex | 35 | 146% | 103% | 97% | 103% | 157% | 114%
test.cpp | 44 | 145% | 102% | 109% | 102% | 184% | 111%
access_3box_innerleft.gif | 45 | 138% | 102% | 96% | 102% | 182% | 107%
border_top.gif | 50 | 128% | 102% | 90% | 102% | 162% | 104%
access_border_bottom.gif | 50 | 128% | 102% | 90% | 102% | 158% | 104%
access_border_right.gif | 52 | 127% | 102% | 94% | 102% | 171% | 104%
Test055.ML | 58 | 116% | 102% | 88% | 102% | 129% | 109%
README_en_GB_thes.txt | 63 | 122% | 102% | 98% | 102% | 143% | 108%
rule.gif | 65 | 125% | 102% | 82% | 102% | 143% | 103%
i16.gif | 72 | 121% | 101% | 94% | 101% | 194% | 101%
Test062.ML | 82 | 100% | 101% | 89% | 101% | 116% | 106%
border_bottom.gif | 82 | 121% | 101% | 93% | 101% | 174% | 106%
description-om.txt | 87 | 110% | 101% | 97% | 101% | 123% | 106%
rgb.png | 89 | 116% | 101% | 99% | 101% | 185% | 100%
ChatInputFill.png | 90 | 111% | 101% | 93% | 101% | 168% | 96%
EmbeddedManager_impl.cpp | 94 | 106% | 101% | 99% | 101% | 133% | 105%
ViewingPaneLeftInnerShadow.png | 96 | 111% | 101% | 95% | 101% | 172% | 93%
version.texi | 99 | 82% | 83% | 82% | 101% | 118% | 102%
trashed.html | 99 | 106% | 101% | 96% | 101% | 127% | 104%
version.cpp | 120 | 112% | 101% | 105% | 101% | 143% | 104%
ctl0.tex | 168 | 76% | 76% | 79% | 101% | 105% | 104%
Test036.ML | 285 | 56% | 57% | 68% | 100% | 89% | 101%
unlocked_bw_18.png | 349 | 106% | 100% | 108% | 100% | 160% | 98%
tictac.html | 421 | 61% | 61% | 72% | 100% | 81% | 100%
description-ta.txt | 557 | 48% | 44% | 51% | 93% | 66% | 101%
ROOT.ML | 651 | 60% | 60% | 70% | 100% | 78% | 98%
packages.html | 655 | 58% | 58% | 74% | 100% | 90% | 100%
QuestionMark.pdf | 676 | 55% | 55% | 67% | 100% | 78% | 97%
gdi.exe | 686 | 65% | 65% | 69% | 100% | 82% | 85%
Pfeil1.pdf | 686 | 65% | 65% | 74% | 100% | 93% | 98%
user.exe | 690 | 64% | 64% | 69% | 100% | 81% | 86%
krnl386.exe | 692 | 65% | 65% | 69% | 100% | 82% | 86%
Firefox3.png | 703 | 103% | 100% | 117% | 100% | 132% | 100%
mem_alloc.cpp | 713 | 46% | 46% | 67% | 100% | 77% | 100%
wallet-20.png | 735 | 103% | 100% | 117% | 100% | 131% | 101%
buttonec.pdf | 788 | 73% | 74% | 81% | 100% | 105% | 99%
Lambda.pdf | 802 | 103% | 100% | 120% | 100% | 129% | 101%
translations.tex | 852 | 53% | 53% | 68% | 100% | 68% | 100%
msg_44.txt | 895 | 54% | 54% | 79% | 100% | 83% | 100%
Calendar_Alert.pdf | 1167 | 62% | 62% | 73% | 100% | 92% | 98%
csharpexec-test.exe | 2560 | 22% | 22% | 23% | 40% | 34% | 34%
preface.tex | 3212 | 48% | 48% | 63% | 97% | 67% | 100%
write.exe | 3376 | 23% | 22% | 27% | 45% | 35% | 42%
contact_us.html | 3518 | 43% | 43% | 60% | 93% | 67% | 99%
esther-ch2.tex | 4978 | 50% | 47% | 61% | 91% | 70% | 98%
metatex.exe | 6144 | 44% | 42% | 55% | 81% | 64% | 80%
TODO.txt | 6645 | 48% | 48% | 58% | 85% | 61% | 99%
doc.txt | 6938 | 39% | 38% | 53% | 77% | 62% | 100%
backward.png | 7019 | 100% | 100% | 138% | 100% | 103% | 100%
clasimp.ML | 7182 | 34% | 34% | 52% | 76% | 65% | 99%
background.pdf | 7261 | 55% | 52% | 89% | 100% | 90% | 99%
multi_page_logins.html | 7475 | 38% | 38% | 53% | 78% | 64% | 99%
CAAudioFileRecorder.cpp | 7556 | 35% | 35% | 54% | 79% | 69% | 99%
DM_Attribute-Number.pdf | 8365 | 52% | 50% | 101% | 100% | 98% | 100%
AppleSamplePCI.cpp | 8937 | 30% | 30% | 50% | 72% | 65% | 98%
bnf_fp_def_sugar_tactics.ML | 8982 | 31% | 31% | 48% | 69% | 64% | 99%
Stop.pdf | 9286 | 24% | 23% | 38% | 56% | 63% | 95%
SmallTux.png | 9824 | 99% | 99% | 132% | 100% | 102% | 100%
creditcard-solo.png | 9899 | 100% | 100% | 139% | 100% | 102% | 100%
end_user_license_agreement.html | 10031 | 39% | 38% | 55% | 78% | 66% | 99%
Calendar_Publish.pdf | 18748 | 57% | 55% | 78% | 96% | 77% | 98%
HP_DeviceSettings.cpp | 26299 | 21% | 20% | 43% | 54% | 67% | 100%
HISTORY.txt | 35981 | 36% | 32% | 51% | 60% | 63% | 99%
groebner.ML | 37493 | 30% | 29% | 52% | 57% | 61% | 98%
fun-brain-2.png | 40042 | 100% | 100% | 140% | 100% | 101% | 100%
root.tex | 41082 | 33% | 31% | 54% | 58% | 61% | 100%
money_back.png | 42225 | 100% | 100% | 141% | 100% | 101% | 100%
welcome-keychain-icon.png | 44112 | 100% | 100% | 141% | 100% | 101% | 100%
clock.exe | 46488 | 17% | 14% | 44% | 43% | 46% | 72%
Reply_Pressed.pdf | 48569 | 24% | 22% | 56% | 49% | 61% | 94%
AUBase.cpp | 62705 | 21% | 20% | 45% | 46% | 66% | 100%
thm.ML | 64040 | 24% | 23% | 46% | 47% | 59% | 98%
idnchars.txt | 69650 | 22% | 14% | 45% | 38% | 63% | 93%
diagonal_2_light.pdf | 73031 | 48% | 45% | 88% | 85% | 75% | 99%
xpidl.exe | 84480 | 39% | 33% | 74% | 65% | 72% | 95%
dired-ref.pdf | 85646 | 95% | 95% | 137% | 100% | 100% | 100%
HOL.tex | 86843 | 32% | 29% | 58% | 53% | 63% | 99%
LispCall.html | 91732 | 11% | 9% | 42% | 35% | 68% | 99%
xpcshell.exe | 97280 | 40% | 35% | 73% | 65% | 74% | 96%
safari_import.png | 110941 | 100% | 100% | 141% | 100% | 101% | 100%
realconv.cpp | 112664 | 25% | 21% | 63% | 42% | 57% | 93%
welcome-browser-spiral.png | 120703 | 100% | 100% | 141% | 100% | 101% | 100%
THIRDPARTY.*JAVAFX.txt | 123324 | 26% | 23% | 52% | 47% | 62% | 99%
cmath_testcases.txt | 136295 | 28% | 22% | 45% | 41% | 56% | 98%
WhitePiece.png | 137941 | 100% | 100% | 141% | 100% | 101% | 100%
dictionary.html | 167415 | 15% | 12% | 38% | 32% | 60% | 100%
PG-adapting.texi | 171954 | 28% | 23% | 53% | 42% | 60% | 100%
acachetutorial.pdf | 193933 | 93% | 93% | 137% | 100% | 100% | 100%
PARSE_TREE.ML | 304787 | 21% | 200% | 42% | 32% | 50% | 93%
1Password.html | 313489 | 47% | 200% | 93% | 73% | 72% | 99%
xwindows.cpp | 316758 | 17% | 200% | 52% | 32% | 62% | 96%
texinfo.tex | 320197 | 30% | 200% | 57% | 44% | 62% | 99%
eground.exe | 391124 | 43% | 200% | 81% | 62% | 73% | 94%
polyimport.exe | 498080 | 37% | 200% | 89% | 58% | 69% | 93%
kew-dec2004.pdf | 619253 | 98% | 200% | 141% | 100% | 100% | 100%
mapping.pdf | 625717 | 39% | 200% | 83% | 59% | 70% | 98%
metis.ML | 664659 | 16% | 200% | 44% | 29% | 55% | 97%
TB_JunkTemplate.pdf | 800400 | 5% | 200% | 62% | 21% | 62% | 99%
SPASS.exe | 819429 | 41% | 200% | 79% | 58% | 72% | 98%
Average | | 66.30% | 79.19% | 79.73% | 81.71% | 93.74% | 97.88%

(It's frightening how much rubbish there is lying around on the MC's Mac. Especially worrying are all these Windows executable files, like SPASS.exe. Probably all viruses.)

The two “winning” solutions are those that use GZip or gzip etc. These don't really count, but they are shown in the table for reference. Then comes Herr Dirix's solution:

        {-Info
           Compression Algorithm: Lempel–Ziv–Welch
           http://en.wikipedia.org/wiki/Lempel%E2%80%93Ziv%E2%80%93Welch

           Additional info from:
           http://www.cs.duke.edu/csed/curious/compression/lzw.html

           Starts with 8-bit codes (to not expand small files) to max 12-bit codes

           This should work nicely on text files. For binary formats, a proper bit range should be set (for example 5-12), depending on the binary (e.g. gif)
           (This implementation is hardcoded to a range of [8-16] bits)
        -}

That's a nice choice! The MC remembers implementing a GIF decoder in C back in 1995, and GIF uses LZW. Ah, nostalgie ! quand tu nous tiens...
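For the uninitiated, the heart of LZW is tiny. Here is a bare-bones encoder (a didactic sketch assuming a one-byte Char alphabet and unbounded code width, i.e., none of Herr Dirix's 8-to-16-bit code-size management):

        import qualified Data.Map as Map

        -- LZW: greedily grow the longest prefix w known to the dictionary;
        -- when w ++ [x] is unknown, emit the code for w, add w ++ [x] to the
        -- dictionary, and restart from x.
        lzwEncode :: String -> [Int]
        lzwEncode []       = []
        lzwEncode (c : cs) = walk initDict [c] cs
          where
            initDict = Map.fromList [([x], fromEnum x) | x <- ['\0' .. '\255']]
            walk dict w [] = [dict Map.! w]
            walk dict w (x : xs)
              | Map.member (w ++ [x]) dict = walk dict (w ++ [x]) xs
              | otherwise = dict Map.! w
                  : walk (Map.insert (w ++ [x]) (Map.size dict) dict) [x] xs

A real implementation would of course work on ByteStrings and pack the codes into a variable number of bits, which is exactly where the engineering in the real submissions went.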

Then comes another LZW implementation, due to Herr Schreier and Herr Roßkopf. Strangely enough, the numbers are very different from those obtained by Herr Dirix, sometimes for the better, often for the worse:

        {-

        Version: -inf (Pre-alpha)

        Team:
        Simon Roßkopf
        Michael Schreier

        Translated the LWZ algorithm from here: http://www.cplusplus.com/articles/iL18T05o/ (Version 1)

        We planned to implement LWZ+Huffman(Similar to DEFLATE). But, no time. So only LZW
        No time, no comments sadly. Maybe next time.
        -}

        --Those check whether the LZW algorithm inflates the input
        --If yes, just use the input as output

(Perhaps the dubious performance comes from the anachronistic misspelling of LZW as LWZ. Anachronistic because LZW = LZ78 + Welch84.)

Next, Herr Haslbeck and his dual LZ77 + Huffman implementation:

        {--
         -- Wow, one could spend a lifetime optimising a compression algorithm.
         --
         -- What I have here is a version of Huffman coding, and for files
         -- between 1 kB and about 1 MB an LZ77 variant is additionally used
         -- (the thing is dead slow, which is probably due to my code and to
         -- Haskell not being ideal for this kind of thing).
         --
         -- Huffman coding:
         -- Okay, I build a Huffman tree using last week's functions and bring
         -- it into the so-called "canonical form" (see Wikipedia). That way it
         -- can be stored compactly (at most 256 bytes). The input is always
         -- compressed in 32 KB blocks, i.e., a separate tree and a
         -- corresponding code are created for each block. A compressed block
         -- starts with one bit recording how the tree was stored. Then follow
         -- 15 bits giving the length of the compressed block, then the tree,
         -- and then the Huffman code in ByteString form.
         --
         -- LZ77:
         -- I tried to implement something similar to LZ77.
         -- (So I have almost implemented deflate (huffman + lz77) :) )
         -- Very roughly, for each position we check whether the next 256
         -- characters already occur somewhere in the previous 4096 characters.
         -- If a string of length greater than 2 occurs there, a reference to
         -- the earlier string plus its length is stored instead of the actual
         -- characters. The whole thing is dead slow and is therefore only
         -- applied to files > 1 kB and < ca. 1 MB. The main thing is that I
         -- steal the 40 points from somebody.
         --
         -- Hopefully the whole thing works. It is a terrible mess. It ought to
         -- be commented better, but the time went into testing and tracking
         -- down ultra-subtle bugs.
         --
         -- It is 15:07. I will now test a little more, probably discover that
         -- there is a howler in it, and then doubt everything I believe in.
         -}

Amen.

Update: Herr Haslbeck writes:

the comment on my program is not quite correct. Yesterday at 15:20 I compressed a JPEG with my Huffman + LZ77, and the program delivered a wrong result. So my program does not compress with LZ77 after all; it implements pure Huffman coding.

On to the next challenge: programming art.

Und dann gibt es die Lösung von Herrn Stüwe. (Oops, I got a little bit too inspired by Herr Haslbeck's German, and the sentence is ausgerutscht like that. Oops, now I even forgot that I'm supposed to be the MC.) [Throat clearing sound]

        -- http://www.bodden.de/wordpress/wp-content/uploads/2007/11/ac.pdf
        -- http://www.cs.ox.ac.uk/people/jeremy.gibbons/publications/arith.pdf
        -- http://www.staff.science.uu.nl/~jeuri101/afp/afp4/arith-4.pdf
        -- http://www.cs.cmu.edu/~guyb/realworld/compression.pdf
        -- http://tcs.rwth-aachen.de/lehre/Komprimierung/SS2012/

Lots of links to prestigious colleges, but how come the ratio is only 97.88%? The MC's grandma can compress better than that!

Interestingly, Herr Stüwe (a.k.a. Herr Kopf) has apparently dissolved Team Nord. Last week we had the interesting case, which the MC neglected to comment on, that Herr Stüwe managed to “verschlimmbessern” his solution a few hours before the deadline; as a result, he got far fewer points than his “lazier” teammates (where “lazy” can mean “studying for the exams”, of course). For better or for worse, competitions don't reward effort; they reward results (and luck). One of the challenges in life is to achieve a high correlation between effort and results (and to maximize luck). This can be arbitrarily difficult slash highly nontrivial. It has been said of Einstein that he could have gone sailing for the last 30 years of his life (or served His Majesty's government and become a Sir, perhaps) instead of spending his time on fruitless research (relatively speaking).

What is the lesson this week? Hint: It's hidden in the previous paragraph!

Oh, the MC has got to mention one more thing. If you kept clicking on last week's entries looking for some 18+ ASCII-art, you're probably disappointed by now not to have found any. But one reader who took this task a bit too seriously for his own good found a true gem, a perle rare.

        This is the 1st version of the game tree that uses the minimax algorithm to find the best possible move. 
        It will take too long.  

And how long did the solution take to play 68 games? 0.127 seconds!! Which brings us to the second lesson of the week: You have the right to remain silent. Anything you write in a comment can and will be used against you. ;-)

Ergebnisse der vierzehnten Woche

The results for the final week are in!

Top 30 der Woche
  Platz Wettbewerber(in)   Punkte
1. Maximilian Haslbeck 29.333
2. Stefan Peter Dirix 24.333
3. Daniel Stüwe 23.333
   Hauke Brinkop 23.333
   Simon Roßkopf 23.333
   Michael Schreier 23.333
6. Alexander Weidinger 22.667
7. Nikita Basargin 22.333
8. Julian Biendarra 21.333
9. Alexander Christian Hefele 21.000
10. Martin Hartmond 18.000

Was ist denn ein Token?

The MC is a big fan of token counting as a rough measure of software quality. As the previous year's results have clearly shown, shorter answers tend to be more elegant and (usually) less likely to have subtle bugs.

The MC is making his token-counting program available to the rest of the world. The program builds on Niklas Broberg's Language.Haskell.Exts.Lexer module, which can be installed using Cabal:

        cabal update
        cabal install haskell-src-exts

There appears to be disagreement regarding the status of backquotes in different versions of Haskell (98 vs. 2010). The MC generously adopted the Haskell 98 definition, with `max` treated as one token rather than three.
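For the curious, the core of such a tokenizer is only a few lines. (This is a minimal sketch, not the MC's actual Tokenize.hs: it assumes a recent haskell-src-exts, counts `max` as three raw lexer tokens, and doesn't produce the abbreviated program shown below.)

        import Language.Haskell.Exts.Lexer (lexTokenStream)
        import Language.Haskell.Exts.Parser (ParseResult (..))

        -- Read a code snippet from stdin and print its token count.
        main :: IO ()
        main = do
          src <- getContents
          case lexTokenStream src of
            ParseOk toks      -> print (length toks)
            ParseFailed loc e -> putStrLn ("lex error at " ++ show loc ++ ": " ++ e)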

Here's how to compile and run the program on Linux or Mac:

        ghc Tokenize.hs -o tokenize
        ./tokenize

At this point, type in some Haskell code, e.g.,

        sum_max_sq x y z = x ^ 2 + y ^ 2 + z ^ 2 - x `min` y `min` z ^ 2

Make sure to type in an empty line afterwards. Press Ctrl+D to terminate. The program then outputs

        24 sxyz=x^2+y^2+z^2-xmymz^2

On Windows, you can run the program with

        .\tokenize

or

        tokenize

and terminate it with Ctrl+C (Windows 8) or Ctrl+Z followed by Enter (Windows 7, probably also for earlier versions).

The first field, “24”, gives the number of tokens. The second field is a short version of the program, where each token is abbreviated as one letter. This can be useful for debugging. The MC aggregates this information in his top-secret Master View, which looks something like this:

        18 sxyz=mxy^2+mxymz^2 xxx@tum.de
        18 sxyz=mxy^2+xmymz^2 xxx@tum.de
        18 sxyz=mxy^2+zmmxy^2 xxx@mytum.de
        20 sxyz=(mxy)^2+(myz)^2 xxx@mytum.de
        20 sxyz=mxy^2+m(mxy)z^2 xxx@tum.de
        20 sxyz=mxy^2+m(mxy)z^2 xxx@tum.de
        ...
        52 sxyz|x=z&y=z=x*x+y*y|x=y&z=y=x*x+z*z|y=x&z=x=y*y+z*z xxx@mytum.de
        52 sxyz|x=z&y=z=x*x+y*y|x=y&z=y=x*x+z*z|y=x&z=x=y*y+z*z xxx@mytum.de
        52 sxyz|x=z&y=z=x*x+y*y|x=y&z=y=x*x+z*z|y=x&z=x=y*y+z*z xxx@mytum.de
        52 sxyz|x=z&y=z=x^2+y^2|x=y&z=y=x^2+z^2|y=x&z=x=y^2+z^2 xxx@mytum.de
        ...
        252 sxyz|(x+y)>(x+z)&(x+y)>(y+z)=(x*x)+(y*y)|(x+z)>(x+y)&(x+z)>(y+z)=(x*x)+(z*z)|(y+z)>(y+x)&(y+z)>(x+z)=(y*y)+(z*z)|x=z&x>y=(x*x)+(z*z)|x=y&x>z=(x*x)+(y*y)|z=y&z>x=(y*y)+(z*z)|x=z&x<y=(x*x)+(y*y)|x=y&x<z=(x*x)+(z*z)|z=y&z<x=(x*x)+(z*z)|x=y&y=z=(x*x)+(y*y) xxx@mytum.de

The Master View allows the MC to compile the Top 30 der Woche, but also to quickly spot interesting solutions and duplicates. Last year, the Wettbewerbsaufgaben were also Prüfungsleistungen and hence plagiarism was a serious issue to deal with. This year there are no specific guidelines regarding team work.

For the competition, the above program is invoked only on the code snippet included between {-WETT-} and {-TTEW-}, with all type signatures stripped away.

Some competitors have tried to confuse tokenize by using German letters (ä etc.) in their source code. Setting

        export LANG=de_DE

before running the program solves this problem.