Wettbewerb
Every exercise sheet contains a competition problem carefully selected by the Master of Competition Senior, Herr Dr. MC „Hammer“ Blanchette. It counts just like the other homework, but is additionally judged as part of a semester-long competition. The judging criteria vary each time. The best 30 solutions receive 30, 29, ..., 1 points, respectively.
Every week, the week's Top 30 is published and the semester's Top 30 is updated. The MC Sr.'s favourite solutions are put on the web. At the end of the semester, the five best students are honoured with tasteful trophies. So there are no bonus points to be earned here, but plenty of fun and honour (and trophies)!
Important: If you do not want your name to appear on the web, you can either opt out of the competition (simply by omitting the {-WETT-}...{-TTEW-} tags) or write to the MC.
Outline:
- Top 30 of the Semester
- The Results of the First Week
- The Results of the Second Week
- The Results of the Third Week
- The Results of the Fourth Week
- The Results of the Fifth Week
- The Results of the Sixth Week
- The Results of the Seventh Week
- The Interim Results of the Eighth and Ninth Weeks
- The Results of the Eighth and Ninth Weeks
- The Results of the Tenth and Eleventh Weeks
- The Results of the Twelfth Week
- The Results of the Thirteenth Week
- The Results of the Fourteenth Week
- What Is a Token, Anyway?
Top 30 of the Semester

Place | Competitor | Points
---|---|---|
1. | David Widmann | 312 1/3 | |
2. | Martin Wauligmann | 297 2/3 | |
3. | Christoph Rudolf | 259 1/3 | |
4. | Chaoran Chen | 256 | |
5. | Fabio Madge Pimentel | 231 2/3 | |
6. | Lukas Tenbrink | 228 | |
7. | Daniel Schubert | 201 | |
8. | Maximilian Endraß | 175 | |
9. | Joshua Rombauer | 173 | |
10. | Sebastian Neubauer | 161 | |
11. | Julian Spahl | 155 | |
12. | Julius Imbery | 150 | |
13. | Jonas Heintzenberg | 148 | |
14. | Lukas Heinzmann | 139 | |
15. | Alexander Küchler | 129 | |
16. | Nathanael Schilling | 123 | |
17. | Christoph Griesbeck | 103 | |
18. | Rudolf Hattenkofer | 95 | |
19. | Dario Beckert | 94 | |
Philipp Jutzi | 94 | ||
21. | Michael Hesse | 93 | |
22. | Michael Kubitza | 82 | |
23. | Lukas Prantl | 79 | |
24. | Daniel Kowatsch | 77 | |
25. | Simon Rehwald | 70 | |
26. | Anna-Kathrin Kopetzki | 69 | |
Sebastian Weiß | 69 | ||
28. | Adrian Philipp | 68 | |
Alexander Book | 68 | ||
30. | Felix Thimm | 67 | |
Michael Eder | 67 |
The Results of the First Week
Top 30 of the Week

Place | Competitor | Tokens | Points
---|---|---|---|
1. | Dennis Fischer | 22 | 30 |
2. | Julius Imbery | 23 | 29 |
3. | Maximilian Reif | 24 | 28 |
Björn-Aljoscha Kullmann | 24 | 28 | |
Hans Kirchner | 24 | 28 | |
Maximilian Muth | 24 | 28 | |
Robert Hamsch | 24 | 28 | |
David Widmann | 24 | 28 | |
Simon Zettler | 24 | 28 | |
Jessica Blankenburg | 24 | 28 | |
Christian Feiler | 24 | 28 | |
Rudolf Hattenkofer | 24 | 28 | |
Sebastian Bruhn | 24 | 28 | |
Dario Beckert | 24 | 28 | |
Adrian Jervolino | 24 | 28 | |
Daniel Kowatsch | 24 | 28 | |
Sebastian Wieland | 24 | 28 | |
Patrick Langer | 24 | 28 | |
Kirill Haar | 24 | 28 | |
Daniel Doehring | 24 | 28 | |
Thomas Rößl | 24 | 28 | |
Joshua Rombauer | 24 | 28 | |
Alexander Küchler | 24 | 28 | |
Michael Hesse | 24 | 28 | |
Daniel Schubert | 24 | 28 | |
Simon Rehwald | 24 | 28 | |
Arsatius Regler | 24 | 28 | |
Martin Wauligmann | 24 | 28 | |
Thomas Günzel | 24 | 28 | |
Wolfgang Hobmaier | 24 | 28 | |
Julian Litti | 24 | 28 | |
Roland Würsching | 24 | 28 | |
Niklas Gschwendtner | 24 | 28 | |
Jonas Heintzenberg | 24 | 28 | |
Jakob Huber | 24 | 28 | |
Florian Hölzl | 24 | 28 | |
Nico Metz | 24 | 28 | |
Stefan Wolf | 24 | 28 | |
Matthäus Haselmaier | 24 | 28 | |
Theodor Cheslerean Boghiu | 24 | 28 | |
Maximilian Endraß | 24 | 28 | |
Christian Winter | 24 | 28 | |
Stefan Kreuzer | 24 | 28 | |
Daniel Schneider | 24 | 28 | |
Jan Herlyn | 24 | 28 | |
Julian Schießer | 24 | 28 | |
Markus Ansorge | 24 | 28 | |
Christoph Smutny | 24 | 28 | |
Markus Obrull | 24 | 28 | |
Anna-Kathrin Kopetzki | 24 | 28 | |
Adrian Schultz | 24 | 28 | |
David Frank | 24 | 28 | |
Florian Wiedner | 24 | 28 | |
Philipp Jutzi | 24 | 28 | |
Stefan Cimander | 24 | 28 | |
Alexander Ungar | 24 | 28 | |
Christoph Rudolf | 24 | 28 | |
Dennis Maier | 24 | 28 | |
Kilian Käslin | 24 | 28 | |
Frederic Naumann | 24 | 28 | |
Emanuele Lamanna | 24 | 28 | |
Nathanael Schilling | 24 | 28 | |
Noah Bergbauer | 24 | 28 | |
Lukas Heinzmann | 24 | 28 | |
Michael Eder | 24 | 28 | |
Michael Kubitza | 24 | 28 | |
Lukas Tenbrink | 24 | 28 |
The Master of Competition Senior was glad to see that approximately 260 students decided to participate in the competition, most of whom actually take the Info2 course this year. The number is approximate because it is not clear what to do with entries where the closing {-TTEW-} tag was missing or misspelled ({-TTEW -} is not OK) and other odd cases like that. Nor was it necessary to decide what to do in those cases, since the Top 30 solutions didn't suffer from such Schönheitsfehler. Here's a good hint, nonetheless:
When given instructions, follow them.
The chances are good that the MC Sr. uses advanced computer-based technology (Perl and Haskell) to deal with the large amounts of data. If your entry doesn't meet the expected format, you can expect it to fall through the cracks.
Let's start with the longest correct solution:
allDistinct a b c d e = not ((elem a [b,c,d,e]) || (elem b [a,c,d,e]) || (elem c [a,b,d,e]) || (elem d [a,b,c,e]) || (elem e [b,c,d,a]))
This entry has a few gratuitous weaknesses: First, why the needless parentheses? When competing for reducing the number of tokens, parentheses should be the first suspects you go to interrogate: What are you doing here, Left Parenthesis? And you, Right Parenthesis? Just hanging around? You certainly don't want your parentheses to go all lazy and just hang around; you want them to do some work or get the heck out of there. Second weakness: There's no point in checking both that a is not equal to e and that e is not equal to a. Equality and disequality are symmetric operations. But the solution is actually correct, which is more than can be said of many other solutions.
Here's a similar idea, at 54 tokens:
allDistinct a b c d e =
  if a `elem` [b, c, d, e] then False
  else if b `elem` [c, d, e] then False
  else if c `elem` [d, e] then False
  else if d == e then False
  else True
No gratuitous parentheses and exploitation of the inherent symmetry: good. Needless if-then-else expressions: not so good. Here's a general guideline:
If you find yourself writing True or False in one of the branches of an if, this means you're doing something too complicated.
The operators not, &&, ||, and == are your friends. Keep that in mind. Professor Nipkow's Forschungsgebiet is Logik, so if you want to impress him, you have to demonstrate that you can manipulate the logical operators. Imagine if math textbooks were rewritten so that “if m > 0, then n > 0” would become “if m > 0, then if n > 0 then true else false else true”. The MC Sr. is ready to bet that that textbook wouldn't be very popular with students! They wouldn't exactly rush to the bookstore to buy the second edition, would they? And yet—and this really messes with the MC Sr.'s brain—it is the students and not the instructors who insist on producing such material, year after year, Aufgabenblatt after Aufgabenblatt. God moves in mysterious ways (if at all).
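To make this concrete, here is a minimal before/after sketch (not taken from any submission; isPositive is a made-up name):

-- needlessly complicated: the if merely restates the Bool it already has
isPositive :: Integer -> Bool
isPositive n = if n > 0 then True else False

-- the same function with the redundant if removed
isPositive' :: Integer -> Bool
isPositive' n = n > 0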
All right, let's move on. The next solution, at 66 tokens, is a quite common one:
allDistinct a b c d e = (a/=b) && (a/=c) && (a/=d) && (a/=e) && (b/=c) && (b/=d) && (b/=e) && (c/=d) && (c/=e) && (d/=e)
Again, are those parentheses just hanging around? Go away, you silly things!
Here's a kind-of-cool solution at 56 tokens, which uses lists:
allDistinct a b c d e = and (zipWith (/=) [a, a, a, a, b, b, b, c, c, d] [b, c, d, e, c, d, e, d, e, e])
The canonical solution, at 46 tokens, was also the MC Sr.'s first iteration:
allDistinct a b c d e =
  a /= b && a /= c && a /= d && a /= e &&
            b /= c && b /= d && b /= e &&
                      c /= d && c /= e &&
                                d /= e
Notice how the nonstandard indentation helps visualize the problem. The MC Sr. had another 46-token solution in store, of which he is rather proud:
distinctLoop :: Integer -> Integer -> Integer -> Integer -> Integer -> Bool
distinctLoop a b c d e = a /= b && b /= c && c /= d && d /= e && e /= a

allDistinct a b c d e = distinctLoop a b c d e && distinctLoop a c e b d
The program is inspired by the Haus vom Nikolaus and the Seven Bridges of Königsberg (now Калининград): the two calls to distinctLoop trace the cycles a-b-c-d-e-a and a-c-e-b-d-a, which together visit each of the ten unordered pairs exactly once.
Let's jump to 24 tokens, which corresponds to the MC Sr.'s best solution as well as that found by many competitors populating the “top 30”:
allDistinct a b c d e = 5 == length (nub [a, b, c, d, e])
An odd 23-token variant, which exploits some funky operators:
allDistinct a b c d = ap (==) nub . (:a:b:c:return d)
The winner, Dennis Fischer, has a really crazy 22-token version, using printf!
allDistinct a b c d e = ap (==) nub . words $ printf "%d %d %d %d %d" a b c d e
A couple of alumni (Maximilian Kirchmeier and Daniel Stüwe) found a 21-token solution:
allDistinct a b c d e = ap geq nub [a, b, c, d, e]
Comment from Herr Stüwe:
I was a bit too lazy to comment on my solution, since I thought Moritz Sichert would adopt my version anyway: I simply optimised the solution he developed together with Andreas Bergmaier, because I still remembered the function geq from last year. (See the comment on Moritz Sichert's code.) I would therefore ask you to mention explicitly in the blog that I worked together with these two; otherwise I will have a guilty conscience.
Based on Herr Fischer's solution, Master Lars Noschinski derived a 12-token solution, given here with all the imports:
import Control.Monad (ap)
import Data.Composition
import Data.List (nub)
import Text.Printf (printf)

allDistinct :: Integer -> Integer -> Integer -> Integer -> Integer -> Bool
allDistinct = ap (==) nub ∘ words .::. printf "%d %d %d %d %d"
Comment from Herr Noschinski:
On functions, "ap f g x = f x (g x)" holds. Developed further from the solution of a student with the pseudonym "chaosfisch".
Dennis Fischer, a.k.a. Chaosfisch. Here's one name we're likely to see again.
One could go all the way down to 10 tokens by replacing (==) by geq.
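Presumably that 10-token version would look along these lines (a sketch, untested; geq being the generic equality from Data.Generics, which would have to be imported in addition to the modules above):

allDistinct = ap geq nub ∘ words .::. printf "%d %d %d %d %d"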
A couple of failed attempts at glory are worth mentioning here. This awesome 15-token solution is the shortest (and generally awesomest) wrong solution (apart from undefined ones, which are anything but awesome):
allDistinct a b c d e = length [a..e] == 5
Double you tee eff?!?!?!
It's also possible to use 22 tokens to get to a wrong result:
allDistinct a b c d e = a < b && b < c && c < d && d < e
So what have we learned this week?
- Lists are a great way to deal with information uniformly. Even if the problem isn't a priori about lists, it often pays off to use lists internally.
- There is a direct correlation between the length of the solution and the number of needless parentheses in it.
- Competitions are addictive (as witnessed by our alumni).
And of course:
- If you find yourself writing True or False in one of the branches of an if, this means you're doing something too complicated.
- When given instructions, follow them.
Closing remark: You might wonder why the MC Sr. calls himself the MC Sr. and not just The MC. That's because he's actually not the only MC. The next three weeks, other MC-like characters will make their appearance, while the MC Sr. will be on vacation. Then the MC will be back, only to vanish again to greener pastures at the end of the year.
The Results of the Second Week
Top 30 of the Week

Place | Competitor | Tokens | Points
---|---|---|---|
1. | Julius Imbery | 42 | 30 |
Daniel Schubert | 42 | 30 | |
David Widmann | 42 | 30 | |
4. | Maximilian Bandle | 43 | 27 |
Florian Gratzer | 43 | 27 | |
Maximilian Reif | 43 | 27 | |
Christoph Rudolf | 43 | 27 | |
Julian Spahl | 43 | 27 | |
Lukas Tenbrink | 43 | 27 | |
10. | Nathanael Schilling | 44 | 21 |
11. | Theodor Cheslerean Boghiu | 46 | 20 |
Martin Wauligmann | 46 | 20 | |
13. | Sebastian Neubauer | 49 | 18 |
Anna Sesselmann | 49 | 18 | |
15. | Maximilian Endraß | 50 | 16 |
Christopher Zentgraf | 50 | 16 | |
17. | Dennis Fischer | 51 | 14 |
18. | Yinshui Chang | 57 | 13 |
Lukas Heinzmann | 57 | 13 | |
Daniel Kowatsch | 57 | 13 | |
Lukas Prantl | 57 | 13 | |
Lukas Reitschuster | 57 | 13 | |
23. | Joshua Rombauer | 58 | 8 |
24. | Heiko Lengenfelder | 60 | 7 |
25. | Philipp Jutzi | 61 | 6 |
26. | Moritz Issig | 63 | 5 |
Thomas Rößl | 63 | 5 | |
28. | Florian Andres | 65 | 3 |
Markus Ansorge | 65 | 3 | |
Simon Rieß | 65 | 3 |
Another week, another exercise, another MC. As was mentioned last week, the old Master of Competition's time with us is almost up, but as you may know, the Master never dies – he simply regenerates into another body (in a process that involves lots of dramatic music and cheap 1970s special effects). And indeed, this week, the MC Sr has regenerated into a new body: younger, somewhat less fashionable, and using British English. However, apart from that, nothing will change in the Wettbewerb.
All right, allons-y! The MC Jr counted a total of 352 student submissions this week. Of these, 157 seem to be real attempts (i.e. not “undefined”). Additionally, the MC Jr received six unofficial submissions from tutors and alumni who apparently have too much time on their hands. As far as the number of tokens is concerned, this week's solutions seem much more diverse than last week's, ranging from minimalistic, concise 42-token solutions up to a colossal (and wrong) 496-token one. (Mind you, the smallest input for which it fails is apparently a list of length 2^64, but that does not make it any less incorrect.)
Speaking of incorrect solutions: with considerable disappointment, the MC Jr found that despite being warned not to do so by the Meta-Master Tobias Nipkow in the last lecture, two would-be Top 30 competitors attempted to use the logBase function, e.g. like this:
decodeIntPairs ns = [(s n,s $div n 2) | n<-ns,n>=0] where s i = sum [2^k*mod (div i $4^k) 2 | k<-0:[1..(truncate $logBase 2 $fromIntegral i/2)]]
The MC Jr would like to reiterate that using floating point arithmetic in a discrete problem such as this is almost always a very bad idea. For instance, the above fails to decode the encoding of (10^155, 10^155) due to numeric overflow. Since the counterexample is rather large, the automatic test system did not detect the errors; the MC Jr, however, did. Needless to say, the students who submitted these entries received 0 points and a stern look.
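For the record, the bound that logBase was (ab)used for can be computed with exact integer arithmetic, for example with a helper along these lines (ilog2 is a made-up name, not taken from any submission):

-- number of halvings until the argument drops to 1; stays within Integer
-- arithmetic, so it cannot overflow the way a detour through Double does
ilog2 :: Integer -> Integer
ilog2 n = if n <= 1 then 0 else 1 + ilog2 (n `div` 2)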
Another student submitted the following solution, which also passed all tests.
decodeIntPairs l = [(sum [shiftR n i .&. bit i | i <- enumFromTo 0 32], sum [shiftR n (i+1) .&. bit i | i <- enumFromTo 0 32]) | n <- l, n >= 0]
Note that it contains the ominous constant 32. Yes, that is 32 as in 32 bits. The MC Jr tested this code and found, to his complete and utter lack of surprise, that decoding the encoded pair (2^33,2^33) returned (0,0).
Now, on to the good bits: The MC Jr was pleased that despite considerable scrutiny on his part, all the other non-undefined entries with a competitive number of tokens seemed to be correct.
The following 65-token solution is very close to the Musterlösung, and around two thirds of the Top 30 solutions are merely variations of it:
decodeIntPairs a = map decodeIntPair [b | b <- a, b >= 0]

decodeIntPair a
  | a <= 0 = (0,0)
  | otherwise = (x*2 + mod a 2, y*2 + mod (div a 2) 2)
  where (x,y) = decodeIntPair(div a 4)
This solution is simple, understandable, and only uses material covered in the lecture.
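To see the bit interleaving at work, consider a small hand-evaluated example (illustrative only): 9 is 1001 in binary, its even-positioned bits spell the first component 1 and its odd-positioned bits spell the second component 2, so

decodeIntPairs [9, -1]   -- [(1, 2)]; the negative entry is dropped by the filter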
In contrast, for the next example, someone dove deep into the depths of Haskell's libraries and tried their luck with printf and the Data.Digits module, which provides functions for converting numbers into lists of digits and back again. It led them to the following 46-token solution and place 11:
decodeIntPairs = map ((\n -> (head n, last n)) . map (unDigits 2 . digits 10 . read . reverse) . transpose . chunksOf 2 . reverse . printf "0%b") . filter (>=0)
This solution is so cutting-edge that the MC Jr had to download and instal the very latest version of GHC to compile it. Nevertheless, it is possible to do better with less.
Another student took a very different approach:
{- Im wesentlicha demmer die decodeLeftInt Funktion pro Int zweimal anwenda, einmal auf
   das originale Int, und einmal auf ein geshiftes Int. Die decodeLeftInt Funktion isoliert
   jedes 2i-te bit (des hat den wert 4^i wenn gesetzt), teilt dann durch 2^i und summiert
   am die summe nemma dädat, aber log2(n) <= n, und tokens send teuer. -}
decodeIntPairs xs = [(decodeLeftInt x, decodeLeftInt $ div x 2) | x <- xs, x >= 0]
decodeLeftInt n = sum [n .&. 4^i `div` 2^i | i <- enumFromTo 0 n]
This solution, while reasonably easy to understand, is very inefficient – whilst the problem allows for an obvious linear-time solution, this implementation requires exponential time, since it iterates i all the way up to the value of n, which is exponential in the number of digits of n. The MC Jr acknowledges the clever idea that underlies this solution and the author's thoughtfulness in writing his comments in Schwäbisch for the benefit of his coworkers at the Stuttgarter Firma. And while the MC agrees that, as the author quite rightly states in his comments, tokens are expensive – so is time. Desch koscht ais a Geld! Remember: the Aufgabenblatt clearly states:
Solutions that are equivalent in terms of token count are compared with respect to their efficiency.
However, seeing as this is the only solution with this particular number of tokens, it lands in tenth place despite its inefficiency. In the latter half of the Wettbewerb, when efficiency will be the key criterion by which solutions are measured, things will be quite different.
This week's winners are David Widmann, Daniel Schubert, and Julius Imbery with reasonably efficient, very down-to-earth 42-token entries that choose simple arithmetic and clever programming over fancy libraries and dirty tricks:
David Widmann:
decodeIntPair 0 = (0,0)
decodeIntPair n = (mod n 2 + snd m * 2, fst m)
  where m = decodeIntPair $ div n 2

decodeIntPairs = map decodeIntPair . filter (>=0)
Daniel Schubert:
decodeIntPairs = map decode . filter (>=0)
decode e = (ti e, ti $ div e 2)
ti 0 = 0
ti e = ti (div e 4) * 2 + e .&. 1
Julius Imbery:
decodeIntPairs z = [(help a ,help$div a 2)|a <-z,a>=0 ]
help 0 = 0
help z = 2*help (div z 4)+mod z 2
David Widmann's solution is the MC Jr's personal favourite due to its well-chosen function names and general conciseness. It is also formatted quite nicely, and the MC Jr ponders whether to deduct points for poor formatting in the future.
Julius Imbery's code also deserves special mention seeing as it only uses material covered in the lecture so far (apart from the “$” operator, which is just a way to save parentheses). This shows that even without perusing Haskell's extensive libraries for hours, one can write a nice, competition-winning entry.
However, this is not to say that fancy libraries and dirty tricks are not useful for the Wettbewerb; tutor Andreas Bergmaier submitted the following 37-token solution using Arrows, which will not be covered in this lecture (and probably in no other lecture you will ever attend):
decodeIntPairs = map (decodeInt &&& decodeInt . flip shiftR 1) . filter (>= 0)

decodeInt 0 = 0
decodeInt n = 1 .&. n .|. decodeInt (n `shiftR` 2) `shiftL` 1
In the same spirit, alumnus and long-term Wettbewerb addict Lorenz Panny privately communicated to the MC Jr the following 36-token solution which uses Arrows, Applicatives and the module GHC.Integer from the very depths of GHC's innards for andInteger:
decodeInt n = sum $ andInteger <*> div n <$> (2^) <$> enumFromTo 0 n
decodeIntPairs = map (decodeInt &&& decodeInt . flip div 2) . filter (>=0)
If that was not complicated enough for you yet, the MC Jr was also not immune to the nerd-sniping qualities of the MC Sr's problem and concocted the following 27 tokens of complete gibberish:
decodeIntPairs = filter (>= 0) >>> map (bimap `join` unDigits 2 <<< unzip <<< map (read "[(0,0), (1,0), (0,1), (1,1)]" `genericIndex`) <<< digits 4)
The idea that lies hidden beneath all this is simple enough: convert the input number into base 4, convert each digit (from 0 to 3) to a pair of binary digits using a lookup table, concatenate the results to get two binary numbers, and convert these back to integers. The MC Jr expected some of the students to use lookup tables as well and wonders why, apparently, none of them did.
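For the curious, the lookup-table idea can also be spelled out in plain-vanilla Haskell without the combinator zoo; a sketch (decodePairs is a made-up name, not any contestant's code) might look like this:

-- read the input in base 4 and map each base-4 digit to one bit of each component
decodePairs :: [Integer] -> [(Integer, Integer)]
decodePairs = map decode . filter (>= 0)
  where
    decode 0 = (0, 0)
    decode n = (2 * x + a, 2 * y + b)
      where
        (a, b) = [(0, 0), (1, 0), (0, 1), (1, 1)] !! fromInteger (n `mod` 4)
        (x, y) = decode (n `div` 4)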
In conclusion, the lessons for this week are:
- Correctness does not stop at 32 bits.
- Using fixed-precision floating point arithmetic to solve an arbitrary-precision discrete problem will almost always lead to incorrect code.
- When the Meta-Master gives you advice, ignore it at your own peril.
- You can fool the automatic tests, but the MC (Jr and Sr) sees everything.
- Your current priorities for the Wettbewerb should be: 1. correctness, 2. brevity, 3. efficiency, 4. elegance and formatting. Keeping your programs sufficiently simple and concise (such as David Widmann's) can go a long way towards fulfilling all four of these while at the same time staying within the scope of the lecture.
- Our tutors have too much spare time; perhaps we ought to give them more work.
The Results of the Third Week
Top 30 of the Week

Place | Competitor | Tokens | Points
---|---|---|---|
1. | Lukas Tenbrink | 8 | 25 |
Martin Wauligmann | 8 | 25 | |
Alexander Küchler | 8 | 25 | |
Christoph Rudolf | 8 | 25 | |
Daniel Schubert | 8 | 25 | |
Lucas Stoffl | 8 | 25 | |
Maximilian Endraß | 8 | 25 | |
Simon Zettler | 8 | 25 | |
9. | Joshua Rombauer | 15 | 22 |
10. | Anna-Kathrin Kopetzki | 17 | 21 |
Thomas Rößl | 17 | 21 | |
12. | Lukas Heinzmann | 18 | 19 |
13. | Maximilian Muth | 20 | 18 |
David Widmann | 20 | 18 | |
Philipp Jutzi | 20 | 18 | |
16. | Manuel Burghard | 23 | 15 |
17. | Julius Imbery | 24 | 14 |
Nathanael Schilling | 24 | 14 | |
19. | Daniel Kowatsch | 25 | 12 |
20. | Kordian Bruck | 27 | 11 |
21. | Arsatius Regler | 28 | 10 |
Thuy Tran | 28 | 10 | |
23. | Dario Beckert | 29 | 8 |
24. | Theodor Cheslerean Boghiu | 30 | 7 |
25. | Jonas Heintzenberg | 32 | 6 |
Chaoran Chen | 32 | 6 | |
27. | Simon Rehwald | 36 | 4 |
28. | Hans Kirchner | 37 | 3 |
29. | Christian Winter | 43 | 2 |
Christoph Smutny | 43 | 2 | |
Rudolf Hattenkofer | 43 | 2 | |
Jakob Huber | 43 | 2 |
Before we start, a word of caution: a number of submissions this week looked remarkably similar, if not quasi-identical. The MC Jr would like to remind everyone again that Gruppenarbeit is not allowed and will result in the subtraction of one zillion points. The solutions in question were short enough that it may very well have been coincidence, but be advised that the MC always maintains a watchful eye on plagiarism. Do not test his lenience.
This week, the MC asked you to write the coalesce function, which helps him compute the overall sum of points for the Wettbewerb. This was no joke – the MC Jr indeed used this function to compute the very ranking you are reading right now. This problem was a lot easier than last week's; nevertheless, the MC Jr only received 85 submissions (plus 6 unofficial ones). The median solution was 53 tokens long and was basically a somewhat less elegant version of the Musterlösung, which has 50 tokens.
The longest solution this week (at 186 tokens) is considerably shorter than last week's 496-token monster. It also seems to be correct, but that is quite hard to verify:
coalesce :: [(String, Integer)] -> [(String, Integer)]
coalesce [] = []
coalesce xs = hi xs []

hi :: [(String, Integer)] -> [(String, Integer)] -> [(String, Integer)]
hi [] re = re
hi (x:xs) re
  | y && null re = if fst x == fst (head xs)
                     then hi (tail xs) [((fst x), ((snd x)+(snd ( head xs))))]
                     else hi xs [x]
  | y && fst x == fst r = hi xs ((init re)++[((fst r), ((snd r)+(snd x)))])
  | length re /= 0 && null xs && fst x == fst r = (init re)++[((fst r), ((snd r)+(snd x)))]
  | null xs = re ++ [x]
  | otherwise = hi xs (re ++ [x])
  where y = length xs /= 0
        r = last re
Now, on to the Top 30 solutions. Alas, there was a severe lack of diversity this week. All in all, there were roughly three different equivalence classes of solutions:
- Class 1: For a Fistful of Tokens
The solutions in the lowest part of the Top 30 were all essentially pruned versions of the Musterlösung, e.g. like this:
coalesce (t1:t2:ts) =
  if fst t1 == fst t2
    then coalesce $ (fst t1, snd t1 + snd t2) : ts
    else t1 : coalesce (t2:ts)
coalesce ts = ts
This solution is very straightforward, quite readable, and (almost) only uses material from the lecture.
- Class 2: For a Few Tokens Less
Considerably shorter versions are possible with some knowledge of the Data.List and Data.Function modules. The groupBy function takes care of the grouping of adjacent elements, thus solving a huge chunk of the problem, e.g. in this 30-token solution:
coalesce = map (\tuple -> (fst . head $ tuple, sum $ map snd tuple)) . groupBy ((==) `on` fst)
The largest part of this solution is concerned with transforming the result of groupBy into the actual result. This can be done using the unzip function (which was mentioned in the lecture) to transform the list and Arrows to transform the tuples. Applying the geq trick from last week can save another two tokens, bringing us down to 17 tokens, e.g. this much shorter and nicer solution by Anna-Kathrin Kopetzki:
coalesce = map (unzip >>> head *** sum) . groupBy (on geq fst)
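For readers unfamiliar with these combinators, here is what the intermediate values look like on a small made-up input:

groupBy ((==) `on` fst) [("a",1),("a",2),("b",3)]  ==  [[("a",1),("a",2)], [("b",3)]]
unzip [("a",1),("a",2)]                            ==  (["a","a"], [1,2])
(head *** sum) (["a","a"], [1,2])                  ==  ("a", 3)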
One student had a variant of Ms Kopetzki's solution in his comments, packaged in a very peculiar fashion, for which the MC Jr would like to award him the Günther Oettinger Award for Creative Use of the English Language:
{- I heard, you like gibberish code, so here I got a present for you ;) -} coalesce = -- **. .** -- -- \ / -- -- -\\----||--||----//------- -} {- ------------------------------------------ -} -- \\ || || // / -} {- LABEL on the Box: -} {- ############################ -} {- ------------------------------------------ -} {- -} {- -} {- Mind your brain when -} {- -} map {- -} {- Wettbewerbs-syndrom strikes back -} {- -}(head *** sum <<< unzip){- -} {- ------------------------------------------ -} {- -} . groupBy {- -} {- To risks and side effects read the hackage -} {- -} (on geq fst) {- -} {- library and ask your tutor or current MC -} {- -} {- -} {- (They know from their own) -} {- ############################ -} {- ------------------------------------------ -}
While the MC is very touched by the thoughtful gift, one should not take the contestant's comments too seriously. First of all, everyone knows that Haskell does not have any side effects and secondly, the MC would not call this code gibberish at all. Apart from the terrible geq trick, this is a very nice and concise solution, and it is, in fact, quite close to the MC Jr's personal solution using Bifunctors (which has 14 tokens):
coalesce xs = bimap head sum . unzip <$> on geq fst `groupBy` xs
A related solution that a handful of people found uses functions from Data.Map and the List monad to merge the elements. The shortest one (adapted from that by tutor Andreas Bergmaier) has 13 tokens:
coalesce = toList . fromListWith plusInteger <=< groupBy (geq `on` fst)
- Class 3: The Good, the Bad and the Unspecified
The MC Jr did not expect 8-token solutions this week, so the fact that eight people handed in one (yes, all eight of them were virtually the same) was quite a surprise to him. He is somewhat ambivalent about these solutions: on the one hand, the MC appreciates unexpected approaches, dirty hacks, and breaking all the rules (at least in the context of the Wettbewerb); on the other hand, the solution is quite a bit too dirty even for the MC Jr's taste:
coalesce = toAscList . fromAscListWith (+)
This solution undoubtedly works. However, as the name fromAscListWith implies, it expects the list it is given to be ascending. The behaviour of the function is unspecified when given an unsorted list, meaning that this solution relies on unspecified implementation details. The current implementation of fromAscListWith builds a binary search tree that violates the invariants for binary search trees when given an unsorted list. Due to the implementation of toAscList, this still works – but other functions from Data.Map return incorrect results.
Another issue the MC Jr has with this solution is that after someone claimed to have an 8-token solution on Piazza and gave a very vague description of how he obtained it, eight people (in total) found the same solution, but all of them failed to see that it could be reduced by another two tokens in a very obvious fashion by using plusInteger instead of (+). The MC Jr suspects that the competitors influenced each other a little too much with this communication and would therefore like to urge everyone (including himself) to wait for the submission deadline in the future before communicating their solutions to anyone – directly or indirectly. That said, the MC Jr is not without blame himself – he could have precluded this solution altogether by requiring coalesce to work on tuples of type (a, Integer) with an Eq constraint on a instead of (String, Integer).
For these reasons, the MC Jr has decided not to award the full 30 points to these 8-token solutions, but only 25 each. This still leaves a well-deserved 3-point gap between them and the 15-token solution in second place, but also reinforces the fact that the MC Jr is not amused by code relying on unspecified behaviour.
What have we learned this week?
- The creativity behind the submissions sometimes even takes the MC by surprise.
- Some solutions for the Wettbewerb use very dark magic, such as internal GHC modules, unsafeCoerce, and unspecified behaviour. The people who do this are either trained professionals or reckless dare-devils (quite possibly both) – do not try this at home.
- Even the tiniest bit of communication can lead to several people discovering the same far-fetched solution simultaneously, but it can also stifle innovation by sending every one of them down the same tracks and making them miss obvious improvements.
The Results of the Fourth Week
Top 30 of the Week

Place | Competitor | Tokens | Points
---|---|---|---|
1. | Nathanael Schilling | 21 | 30 |
2. | Maximilian Endraß | 22 | 29 |
3. | Philipp Jutzi | 27 | 28 |
David Widmann | 27 | 28 | |
5. | Lukas Tenbrink | 27 | 26 |
6. | Lukas Heinzmann | 31 | 25 |
7. | Manuel Burghard | 32 | 24 |
Michael Eder | 32 | 24 | |
9. | Julius Imbery | 38 | 22 |
10. | Arsatius Regler | 39 | 21 |
11. | Anna-Kathrin Kopetzki | 48 | 20 |
12. | Martin Wauligmann | 54 | 19 |
13. | Christian Winter | 55 | 18 |
14. | Julian Labeit (KIT) | 55 | 17 |
15. | Michael Hesse | 56 | 17 |
16. | Rudolf Hattenkofer | 57 | 16 |
17. | Chaoran Chen | 66 | 15 |
18. | Jonas Heintzenberg | 79 | 14 |
19. | Thomas Günzel | 79 | 13 |
20. | Daniel Schubert | 91 | 12 |
21. | Andreas Schaarschmidt (KIT) | 92 | 11 |
22. | Christoph Rudolf | 94 | 11 |
23. | Florian Weber (KIT) | 98 | 10 |
24. | Max Vogler (KIT) | 99 | 10 |
25. | Fabio Madge Pimentel | 103 | 10 |
26. | Joshua Rombauer | 118 | 9 |
27. | Maximilian Reif | 121 | 8 |
28. | Thomas Rahm | 124 | 7 |
29. | Alexander Book | 133 | 6 |
30. | Julian Spahl | 136 | 5 |
31. | Karsten Diekhoff (KIT) | 138 | 4 |
32. | Dario Beckert | 143 | 4 |
33. | Yinshui Chang | 151 | 3 |
34. | Markus Ongyerth | 159 | 2 |
35. | Kirill Haar | 162 | 1 |
This week, the MC Jr is pleased to be able to welcome the students of the Programming Paradigms lecture by Prof Snelting from Karlsruhe to the Wettbewerb. They can submit solutions until the end of our lecture and will be ranked among the TUM students in each week's Top 30, identified by a ‘(KIT)’ next to their names. They will, however, not appear in the Semester Top 30, seeing as they did not participate in the first three weeks. Also, for consistency, the Top 30 will be extended in such a way that it always contains at least 30 TUM students, and the points for the Semester ranking will be computed as if there were no KIT students.
The MC Jr received 84 student solutions from TUM and 8 from KIT. There were also 6 submissions from tutors and alumni from TUM and 4 from alumni and employees at KIT. A staggering 47 of the TUM solutions and 3 of the KIT solutions did not pass the MC Jr's inspection. Many solutions looped indefinitely for negative numbers in the input or crashed with exceptions and non-exhaustive patterns. A lot of solutions also timed out for inputs with long lists or large numbers, suggesting non-linear run time. This ubiquity of incorrect or slow (often both) solutions made the MC Jr's life quite hard, and he had to employ a wide variety of shell scripts to automate testing enough to get his Top 30.
The variety of solutions was significantly greater than last week. Apart from the already-mentioned ‘Crash Randomly’ and ‘Return Nonsense’, the MC Jr identified roughly six different approaches:
- The Pragmatist
Seeing as we already learned how to encode two integers into one, the obvious approach for this week is to apply this method recursively to encode all list elements and the length of the list itself. Some participants took this approach (plus handling of negative numbers), e.g. like this:
encodeInts [] = 0
encodeInts xs = encodePair (genericLength xs) $ foldr1 encodePair $ map encodeSign xs

encodePair 0 0 = 0
encodePair x y = mod x 2 + 2 * encodePair y (div x 2)

encodeSign x = abs x * 2 + div (signum x + 1) 2

decodeInts 0 = []
decodeInts x = decodeLoop (decodeLeft x) $ decodeLeft $ div x 2

decodeLoop 1 x = [decodeSign x]
decodeLoop n x = decodeSign (decodeLeft x) : decodeLoop (n - 1) (decodeLeft $ div x 2)

decodeLeft 0 = 0
decodeLeft x = mod x 2 + 2 * decodeLeft (div x 4)

decodeSign x = div x 2 * (mod x 2 * 2 - 1)
While the MC Jr always appreciates pragmatism, this approach required a lot of tokens and thus only barely made it into the Top 30. It does, however, have the advantage of producing reasonably small encodings, at least if the list elements are of roughly the same order of magnitude.
- All your base
One of the most popular approaches was to print the absolute value of the given numbers in some base, use additional symbols to encode the sign and the separator between the numbers, and interpret the result as a number in a different base. The obvious choice is to print the numbers in the octal system, i.e. digits from 0 to 7, then use the digits 8 and 9 as separators indicating positive and negative numbers, and interpret the result as a decimal number:
encodeInts = read . encodeHelper

encodeHelper (a:rest) = showOct (abs a) $ (if a > 0 then '9' else '8') : encodeHelper rest
encodeHelper _ = "0"

decodeInts = decodeBreak . show

decodeHelper (a, (b:rest)) = (if b == '9' then 1 else -1) * (fst $ head $ readOct $ '0':a) : decodeBreak rest
decodeHelper _ = []

decodeBreak = decodeHelper . break (`elem` "89")
The numbers generated by this approach are reasonably small: the encoding of k numbers with n decimal digits each requires only about 1.1 ⋅ k ⋅ n decimal digits.
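A small hand-evaluated example (illustrative only): encoding [10, -3] writes 10 as the octal string "12" followed by the positive marker '9', then 3 (the absolute value of -3) followed by the negative marker '8', and finally the terminating "0":

encodeInts [10, -3]   -- "12" ++ "9" ++ "3" ++ "8" ++ "0", read back as the integer 129380
decodeInts 129380     -- repeatedly splitting at the next '8' or '9' recovers [10, -3]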
- The Big Spender
Of course, one does not have to use n-ary encoding. Some students forwent all considerations of efficiency and simply encoded a number x as x occurrences of some fixed character (i.e. unary encoding), followed by some other character as a separator. This, of course, leads to exponential growth in output size. The most peculiar solution of this kind was the following:
encodeInts :: [Integer] -> Integer
encodeInts xs = foldl (\h x -> (1 + 2*h) * 2^(if x >= 0 then 2*x else 1 - 2*x)) 0 xs

log2 1 = 0
log2 i = 1 + log2 (i `div` 2)

decodeInts :: Integer -> [Integer]
decodeInts 0 = []
decodeInts i = decodeInts (i `div` 2 `div` 2^a) ++ [if odd a then -a `div` 2 else a `div` 2]
  where a = log2 $ i `xor` (i-1)
In this case, the list is encoded as a binary string where 1 acts as a separator and the number of 0s in between encodes a number from the input list. The log2 $ i `xor` (i-1) part is a very strange and complicated way of obtaining the position of the rightmost 1 bit and suggests that perhaps the author of this solution, like the MC Jr, is an avid reader of Hacker's Delight. The knowledge of low-level binary magic like this is declining rapidly these days, and the MC Jr is very glad to find that it is apparently still alive in some of his students. For the record, a = popCount (i `xor` (i-1)) - 1 or a = pred . popCount $ ap xor pred i would also have done the trick, and saved quite a few tokens, too.
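A quick worked example of that trick: for i = 12 (binary 1100), i - 1 = 11 (binary 1011), so i `xor` (i-1) = 0111 = 7, and log2 7 = 2, which is indeed the index of the lowest set bit of 12; likewise popCount 7 - 1 = 2, which is the observation behind the shorter variants just mentioned.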
- The Number Theorist
Any student who paid attention in Diskrete Strukturen will know that:
- There are countably infinitely many prime numbers
- Decomposition into prime numbers is unique
This, of course, means that it is possible to encode arbitrarily many natural numbers a0, a1, a2, … into a single natural number as 2^a0 ⋅ 3^a1 ⋅ 5^a2 ⋅ …. This plus some way to encode negative numbers makes for a nice algebraic solution, especially considering that functions for generating prime numbers and factoring integers are available in the library:
encodeInts = product . zipWith (^) primes . map (\x -> if x >= 0 then 2*x else 1 - 2*x)
decodeInts = map (\x -> if even x then div x 2 else pred x `div` negate 2) . map genericLength . group . primeFactors
Unfortunately, the numbers generated by this approach are naturally quite large, since the worst case of a singleton list will produce exponentially large encodings, similarly to the unary approach.
- Show, Don't Tell
The laziest students (and also the most successful ones this week) let Haskell do most of the work for them by using show or something similar to convert the list to a string and then convert the resulting string into an integer, e.g. like this nice solution by tutor Johannes Bader from KIT:
encodeInts = read . concatMap show . map ord . show
decodeInts = read . map chr . map read . chunksOf 2 . show
A more refined version of this is the following 22-token solution by Maximilian Endraß, which interprets the ASCII string representation of the input list as a number in base 94 (which is possible, because the ASCII codes of [],-0123456789 are all below 94):
encodeInts = fst . head . readInt 94 isPrint ord . show
decodeInts x = read $ showIntAtBase 94 chr x ""
This week's winner, Nathanael Schilling, used the hex package to achieve a 21-token solution. He prints the list as a string, converts the string to hexadecimal, and reads the hexadecimal string back into an integer:
encodeInts = fst . head . readHex . hex . show
decodeInts = read . join . unhex . printf "%x"
Incidentally, this is quite close to the MC Jr's personal 18-token solution:
encodeInts = read . mappend "0x" . hex . show
decodeInts = unhex . printf "%x" >=> read
- The Reckless Hacker
Some men just want to watch the world burn – and alumnus Philipp Meyer is certainly one of them. He reached deep into the abyss of unsafe and unportable functions and produced the following 18-token solution, which is basically Mr Endraß's solution on drugs:
encodeInts = unDigits 94 . map unsafeCoerce . show
decodeInts = read . map unsafeCoerce . digits 94
It is technically correct (after a fashion), but morally questionable, seeing as it uses the unsafeCoerce function to coerce Char to Integer and vice versa. This is a dirty trick that relies on a lot of compiler internals, and in fact, it fails horribly with the MC Jr's GHC 7.6.3, but with the latest version (GHC 7.8.3), it seems to work fine. Luckily, Philipp Meyer participates außer Konkurrenz, so the MC does not have to worry about whether to give him points for this (impressive, but horrific) hack.
However, Mr Meyer's solution seems positively sane when compared to one that came from an actual student:
encodeInts = unsafeCoerce . encode
decodeInts = decode . unsafeCoerce
Like a number of students, he uses encode and decode from the binary package to encode the input list as a ByteString. Unlike other students, who then converted this ByteString into an Integer in the usual fashion, this student decided that type safety is not for him and used the tried-and-tested C++ approach (cf. reinterpret_cast): ByteString, Integer – where's the difference? Everything is a string of bytes in the end, right? Well, yes. But not like this. While the QuickCheck test decodeInts (encodeInts xs) runs through happily without complaining, all hell breaks loose if you try to actually look at the ‘integer’ the encodeInts function returns.
To his credit, he was perfectly aware of the issue and was prudent enough to hand in an alternative 51-token solution for the eventuality that the MC does not find the unsafeCoerce solution to his liking. On the other hand, he also not-so-prudently put his {-WETT-} tags around unrelated, commented-out code by mistake, so the MC will not award any points.
If you thought that this is the end of the line, you are wrong: it appears that type safety is not taken very seriously in Karlsruhe either, as Denis Lohner, an employee at KIT, handed in the following ‘solution’:
encodeInts = unsafeCoerce
decodeInts = unsafeCoerce
As one would expect, this solution works about as well as the previous one. Again, the QuickCheck test runs through fine, but when the MC tried to inspect the actual value returned by encodeInts, he spent the following three hours fending off raptor attacks.
This week's lessons:
- Considering the fact that there were more incorrect solutions than correct ones, many competitors would have done well to employ some simple QuickCheck tests before submitting their solution (a minimal sketch follows after this list).
- Again: if you want to participate in the Wettbewerb, place your {-WETT-} tags correctly, i.e. {-WETT-}…{-TTEW-}, with an actual solution inbetween. Alternative solutions in the comments are fine, but empty tags or no tags will lead to no points. Also, for our friends from Karlsruhe: do not add additional spaces in the tags, e.g. {-WETT -}. The MC was lenient in this matter this time, but he will not be next week.
- UNIX hacker Henry Spencer allegedly said:
If you lie to the compiler, it will get its revenge.
Along these lines: if you use unsafeCoerce or something similar, you had better know bloody well what you're doing. If your code does not work on the MC's GHC installation, there is a good chance that you will get no points for it. That said, there was an interesting (and morally correct) use of unsafeCoerce in last year's Wettbewerb.
- The documentation for unsafeCoerce states that it is “highly unsafe”, but it neglects to mention that it can lead to raptor attacks. The MC filed a bug report and hopes that this fact will be mentioned in future releases.
- There were some good submissions from Karlsruhe, but only eight in total. Surely, our Badisch friends can do better than that.
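To make the first lesson concrete, such a test could be as small as the following sketch (assuming encodeInts and decodeInts are in scope; not part of any submission):

import Test.QuickCheck

-- round-trip property: decoding an encoded list must give back the original list
prop_roundTrip :: [Integer] -> Bool
prop_roundTrip xs = decodeInts (encodeInts xs) == xs

-- run in GHCi with: quickCheck prop_roundTrip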
The Results of the Fifth Week
The MC Sr. is back from his lengthy vacation. He does not want to reveal too much about his private life (that would be soo un-German and he's trying hard to fit in), except for the fact that he now wears a ring on his left ring finger and there is now a Mistress of Competition at his side.
Now to the results of the fifth week:
Top 30 of the Week

Place | Competitor | Badness | Run time | Points
---|---|---|---|---|
1. | Nathanael Schilling | 4 | 6.92 | 30 |
2. | Lukas Heinzmann | 4 | 7.21 | 29 |
3. | Michael Hesse | 4 | 7.49 | 28 |
4. | Martin Wauligmann | 4 | 8.37 | 27 |
5. | Julius Imbery | 4 | 8.49 | 26 |
6. | Julian Labeit (KIT) | 4 | T/O 2 | 25 |
Fabio Madge Pimentel | 4 | T/O 2 | 25 | |
8. | Florian Weber (KIT) | 4 | T/O 1 | 24 |
9. | Manuel Burghard | 6 | 5.17 | 24 |
Daniel Kowatsch | 6 | 5.17 | 24 | |
11. | Julian Spahl | 6 | 5.18 | 22 |
Rudolf Hattenkofer | 6 | 5.18 | 22 | |
13. | Björn-Aljoscha Kullmann | 6 | 5.20 | 20 |
David Widmann | 6 | 5.20 | 20 | |
15. | Lukas Tenbrink | 6 | 5.21 | 18 |
16. | Christopher Zentgraf | 6 | 5.21 | 17 |
Moritz Issig | 6 | 5.21 | 17 | |
18. | Jonas Heintzenberg | 6 | 5.23 | 15 |
19. | Chaoran Chen | 6 | 5.25 | 14 |
20. | Christian Winter | 6 | 5.26 | 13 |
21. | Adrian Philipp | 6 | 5.28 | 12 |
Sebastian Bruhn | 6 | 5.28 | 12 | |
23. | Theodor Cheslerean Boghiu | 6 | 5.31 | 10 |
24. | Daniel Schubert | 6 | 5.32 | 9 |
25. | Heiko Lengenfelder | 6 | 5.42 | 8 |
26. | Christoph Griesbeck | 6 | 5.68 | 7 |
27. | Dmitriy Zayko | 6 | 5.70 | 6 |
28. | Philipp Jutzi | 6 | 5.77 | 0 |
29. | Martin Sperr | 6 | 5.79 | 4 |
30. | Noah Bergbauer | 6 | 5.82 | 3 |
31. | Sebastian Neubauer | 6 | 5.88 | 2 |
32. | Karsten Diekhoff (KIT) | 6 | 5.89 | 1 |
33. | Michael Eder | 6 | 5.93 | 1 |
(Five points were subtracted from Herr Philipp Jutzi's score, because he fixed a performance bug 9 minutes past the deadline. Unfortunately, that leaves him at 0 points. The slower solution he had submitted on time would have landed him in 34th place, which is also worth 0 points. Tough luck!)
And the Top 4 of tutors and alumni:
Place | Competitor | Badness | Run time | Points
---|---|---|---|---|
1. | MC Jr. | 4 | 5.09 | 0 |
2. | Martin Hecker (KIT) | 4 | 6.22 | 0 |
3. | Martin Mohr (KIT) | 4 | T/O 2 | 0 |
4. | Daniel Stüwe | 6 | 5.31 | 0 |
The results were obtained by considering two criteria: badness and run time. A collection of 11 problems was used to determine the badness of the solutions:
map (\j -> (j, j)) [1..10]
map (\j -> (j, j)) [1..20]
map (\j -> (j, j)) [1..30]
map (\j -> (j, j)) [81..100]
(map (\j -> (j, j)) [1..15] ++ map (\j -> (j, j)) [1..15])
(map (\j -> (j, j)) [1..10] ++ map (\j -> (j, j)) [21..29])
map (\j -> (j, j)) (take 10 primes)
map (\j -> (j, j)) (take 20 (drop 20 primes))
map (\j -> (j, j)) (take 9 primes ++ [99, 4, 100, 101] ++ take 10 primes)
concatMap (\j -> Data.List.genericReplicate j (j, j)) (reverse (take 5 primes))
map (\j -> (j, j)) ([1] ++ replicate 15 16 ++ replicate 15 1)
where primes returns the infinite stream of prime numbers, computed using a naive algorithm:
primes = nextPrimes 2
  where nextPrimes n =
          (if null [() | j <- [2..n - 1], n `mod` j == 0] then [n] else []) ++ nextPrimes (n + 1)
(That the stream is infinite may seem to imply that the above programs won't terminate. However, thanks to Haskell's lazy evaluation strategy, computing finite segments of the primes function, by dropping and takeing elements from the front, can be done in finite time.)
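For instance, with the definition above,

take 5 primes   -- [2, 3, 5, 7, 11]; only the first five elements of the infinite stream are ever computed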
Notice that the longest of these inputs contains 33 elements, which happens to be the number of entries in our Top 30 (= 30 Munchies + 3 Kitties). Submissions that did not terminate within a reasonable time on any of these inputs were simply filtered out. It might be argued that efficiency is only the second criterion, but if the program is unbearably slow, it is impossible for the MC to evaluate the badness! At the limit, one can imagine a program that would take 15 billion years to terminate. Even if this program is correct in a theoretical sense, the MC pragmatically would put it onto the incorrect pile. This is his privilege, especially in the presence of a vast collection of solutions that do terminate in reasonable time.
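For reference, the badness of a split presumably boils down to the absolute difference of the two token sums, in the spirit of the solutions below (badnessOf is a made-up name, not the official test harness):

badnessOf :: ([(a, Integer)], [(a, Integer)]) -> Integer
badnessOf (xs, ys) = abs (sum (map snd xs) - sum (map snd ys))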
In general, the context given in the problem description is not just there to embellish it. It gives hints about the input values to expect. From your experience with the competition, you should know that solutions typically have between 10 and 500 tokens and that there are perhaps 100 or 200 students participating. Negative numbers of tokens don't make sense, so you can rely on the MC not being so nasty as to check this case.
To measure the efficiency, the following 11 inputs were tried:
map (\j -> (j, j)) [1..50]
map (\j -> (j, j)) [50..1]
map (\j -> (j, j)) [1..200]
map (\j -> (j, j)) [200..1]
map (\j -> (j, j)) (replicate 200 1000)
map (\j -> (j, j)) ([1] ++ replicate 25 26 ++ replicate 25 1)
map (\j -> (j, j)) ([1] ++ replicate 45 46 ++ replicate 45 1)
map (\j -> (j, j)) (take 500 primes)
map (\j -> (j, 10 * j)) (take 1000 primes)
map (\j -> (j, 100 * j)) (take 1000 primes)
map (\j -> (j, j)) (let ss = take 1000 primes in ss ++ ss ++ ss ++ ss ++ ss ++ ss ++ ss ++ ss ++ ss)
The cumulative time of all these examples (including the computation of the primes function) counted as secondary criterion. The memory aspect and the beauty of the code were entirely omitted from the evaluation; this was the MC Jr.'s idea, but the MC Sr. didn't want to try to measure memory, nor was he inclined to judge the code's elegance. Anyway, it's generally a good idea to use little memory and to write elegant code, irrespective of whether this helps you climb the Top 30 or not.
One submission timed out in the fifth test. A few timed out in the tenth test. Those are marked with T/O 1 and T/O 2, respectively, in the table above.
There were 130 solutions overall, including unofficial ones by tutors, alumni, and the MC Jr. Of the 130 entries, 9 were from the KIT (Karlsruhe Institute of Technology), whereas 121 were from the MIT (Munich Institute of Technology). A few correctness and basic performance tests ruled out a dozen solutions. The MC Sr. was stunned to see that some otherwise correct solutions were so suboptimal as to assign both elements of
[(1, 1), (1000, 1000)]
to the same MC! These were disqualified right away–the MC Sr. has no time to waste with entries that do not satisfy basic sanity conditions. As they say in Germany, Was der Bauer nicht kennt, frisst er nicht.
It's difficult to partition the solutions as neatly as the MC Jr. did in the last couple of weeks, without spending hours studying each solution. Instead, we will focus on the top ones. This is the #1 solution by Herr Schilling:
{-
 - This is an implementation of the "complete differencing" algorithm described by
 - Mertens in http://arxiv.org/ftp/cond-mat/papers/0310/0310317.pdf and introduced by
 - Korff in http://ijcai.org/Past%20Proceedings/IJCAI-95-VOL%201/pdf/035.pdf
 -
 - It is an anytime algorithm that gives progressively better results the longer it is allowed
 - to run. This is adjustable. In the worst case, it returns a partition found by the Karmakar
 - and Karp differencing method. In most cases it does better though, and given enough runtime
 - it always finds the optimal solution. Given that the partition problem is NP-Hard, we stop
 - looking for better solutions after a certain (non-exponential) number of tries.
 -
 - Because the greedy approach usually gives quite good results and is very fast,
 - that is run first, and only if it doesn't return an optimal partition do we actually
 - run the complete differencing algorithm.
 -
 - This means that this implementation returns partitions that have a lower "badness" at least as low as
 - that from the greedy and the differencing algorithm.
 -
 - The version of GHC that I tested this on was quite good at being lazy, meaning it stopped
 - searching for better partitions once an optimal partition had been found.
 - I haven't tested whether this behaviour works on GHC with version < 7.8.3
 -
 - It is assumed that token numbers are non-negative.
 -
 - Sorry that this solution is quite long.
 -}

{-Helper functions and data structures-}

--Represents two sets of integers. To avoid having to compute the sum each time, we save the sum
--of each set. The words "larger" and smaller refer to the fact that we call a Partition "balanced",
--if sumlarger is larger than or equal to sumsmaller
data Partition a = Partition { sumlarger::Integer
                             ,larger::[(a,Integer)]
                             ,sumsmaller::Integer
                             ,smaller::[(a,Integer)]
                             } deriving (Show)

--Partitions can be compared using badness
instance Eq (Partition a) where
  (==) a b = (badness' a == badness' b)

instance Ord (Partition a) where
  compare = compare `on` badness'

--Calculate the badness of a *balanced* partition
badness':: Partition a -> Integer
badness' x = sumlarger x - sumsmaller x

--Balance a partition so that it has a positive badness. Basically exchange the "smaller" set with the
--"bigger" one if the "smaller" set is actually bigger.
balance :: Partition a -> Partition a
balance x | badness' x < 0 = Partition (sumsmaller x) (smaller x) (sumlarger x) (larger x)
balance x = x

--Insert a balanced partition into a descending list of balanced partitions.
ins :: Partition a -> [Partition a] -> [Partition a]
ins y [] = [y]
ins y (x:xs) | y < x = x : ins y xs
ins y xs = y : xs

--Minimum that "short-circuits" if one side is 0 or 1. If a partitioning has badness of 0 or 1, then we
--know that we cannot find a a better partitioning, so we stop looking for one.
--The Integer argument is for the count variable of the findMinPartition function.
minBad :: (Partition a,Integer) -> (Partition a,Integer) -> (Partition a,Integer)
minBad x y | badness' (fst x) <= 1 = x
minBad x y | badness' (fst y) <= 1 = y
minBad x y | badness' (fst x) < badness' (fst y) = (fst x, snd y)
minBad x y = y

--The words sum and difference refer to the fact that with combineDifference, the resulting badness is the difference
--of the badness of the arguments, and with sumDifference it is the sum of the badnesses of the arguments.
--Both functions expect balanced partitions as argumens.
combineDifference:: Partition a ->Partition a-> Partition a
combineDifference x y = balance $ Partition (sumlarger x + sumsmaller y) (larger x ++ smaller y) (sumsmaller x + sumlarger y) (smaller x ++ larger y)

combineSum::Partition a -> Partition a -> Partition a
combineSum x y = Partition (sumlarger x + sumlarger y) (larger x ++ larger y) (sumsmaller x + sumsmaller y) (smaller x ++ smaller y)

{-The actual algorithm-}

findMinPartition :: Integer -> Integer -> [Partition a] -> (Partition a,Integer)
--Expects a list of "Paritions" sorted by badness (descending) as the third parameter.
--Basically we run findMinPartition recursively on two shorter lists -- and take the minimum of that.
--It gets slightly messy because we want to pass around a counter (called c) and stop summing once it
--reaches a certain value of max. I.e. max gives some indication of how many steps we want to take to try and get
--a better result than the Karmarkar-Karp differencing algorithm.
--Setting max to 0 gives the Karmarkar-Karp differencing algorithm for almost all cases.
--Call this with c = 0.
--Simplest base case is if we have only one partition.
findMinPartition c max [x] = (x,c)
--If we have less than 5 Elements, differencing is optimal, see
--http://arxiv.org/ftp/cond-mat/papers/0310/0310317.pdf
findMinPartition c max (x:y:xs) | length xs < 3 = findMinPartition c max $ ins (combineDifference x y) xs
--If we have a badness in one parameter that is larger than the combined badness of the other ones, differencing it with the sum of the
--other badnesses gives the best result.
findMinPartition c max (x:xs) | (badness' x) >= (sum $ map badness' xs) = (combineDifference x $ foldl combineSum (Partition 0 [] 0 []) xs,c)
--If the first number is 0, don't bother taking two branches as both branches will be the same.
findMinPartition c max (x:y:xs) | badness' x == 0 = findMinPartition c max $ (combineSum x y : xs)
--The main case. Get the minimum of the differenced and summed lists.
--We sum only if differencing didn't give an optimal result, and if our counter
--isn't bigger than max. The counter is only incremented for summing.
findMinPartition c max (x:y:xs) = minBad leftResult rightResult
  where leftResult = (findMinPartition c max $ ins (combineDifference x y) xs)
        nextcounter = succ (snd leftResult)
        rightResult = if (nextcounter < max) then findMinPartition nextcounter max ((combineSum x y):xs) else leftResult

--The naive greedy algorithm is a lot faster than findMinPartition, and often gives an optimal result. It is computationally
--also very cheap, so we run it first before attempting to run findMinPartition.
--Expects its input to be sorted in ascending order.
greedy::[Partition a] -> Partition a
greedy [x]= balance x
greedy (x:xs) = combineDifference res x
  where res = greedy $ xs

--Turns an Element (a,n) into a Partition n [(a,n)] 0 [] with which the algorithm can work.
--If we want the algorithm to work with negative numbers too, we would have to balance the partitions,
--but in the case of only non-negative numbers, the Partitions are balanced like this.
createPartitionList :: [(a,Integer)] -> [Partition a]
createPartitionList xs = map (\x -> Partition (snd x) [x] 0 []) xs

--Replace (genericLength xs + 2000) below for some other expression to limit how
--long the algorithm should run for. Note that the actual worst case complexity is then also
--scaled by at least length of xs, meaning that we have at least quadratic worst-case runtime overall (I think)
--for this choice of max.
splitSubmissions :: [(a, Integer)] -> ([(a, Integer)], [(a, Integer)])
splitSubmissions [] = ([],[])
splitSubmissions xs = (larger minimalResult,smaller minimalResult)
  where minimalResult = fst $ minBad (greedyResult,0) myResult
        greedyResult = greedy $ sort partitionList
        myResult = findMinPartition 0 (genericLength xs+ 2000) $ sortBy (flip compare) $ partitionList
        partitionList = createPartitionList xs
The #2 solution, by Herr Heinzmann, is somewhat shorter, even though it implements a different algorithm when given more than 10000 entries (which the MC didn't test, regrettably–but Info 2 is not a MOOC!):
--Fallunterscheidung nach Länge
splitSubmissions :: [(a, Integer)] -> ([(a, Integer)], [(a, Integer)])
splitSubmissions a = if length a >= 10^4 then splitSubmissions' a else splitSubmissions'' a

--Einfacher Algorithmus -> Aufsteigend sortieren und Greedy
splitSubmissions' :: [(a, Integer)] -> ([(a, Integer)], [(a, Integer)])
splitSubmissions' x = let (a,b,c,d) = ins (sortBy (flip (compare `on` snd)) x) ([],0,[],0) in (a,c)

--Zuordnung der Werte (Greedy)
ins :: [(a,Integer)] -> ([(a,Integer)],Integer,[(a,Integer)],Integer) -> ([(a,Integer)],Integer,[(a,Integer)],Integer)
ins [] x = x
ins (x:xs) (a,c,b,d) = if c < d then ins xs (x:a,c+(snd x) ,b,d) else ins xs (a,c,x:b,d + (snd x))

--Komplexerer Algorithmus, Berrechnung der Differenzen, danach Rückbau der Liste
splitSubmissions'' :: [(a, Integer)] -> ([(a, Integer)], [(a, Integer)])
splitSubmissions'' (a:[]) = ([a],[])
splitSubmissions'' a = build a (conc (reverse (split (sortBy (flip (compare `on` snd)) a))) ([],[])) ([],[])

-- a's den Integern zuordnen
build :: [(a,Integer)] -> ([Integer],[Integer]) -> ([(a,Integer)],[(a,Integer)]) -> ([(a,Integer)],[(a,Integer)])
build [] _ a = a
build ((a,b):xs) (c,d) (e,f) = if b `elem` c then build xs (delete b c,d) ((a,b):e,f) else build xs (c,delete b d) (e,(a,b):f)

-- Aufteilung der in split berechnenten Werte
conc :: [(Integer,Integer,Integer)] -> ([Integer],[Integer]) -> ([Integer],[Integer])
conc [] a = a
conc ((a,b,c):xs) ([],[]) = conc ((a,b,c):xs) ([a],[])
conc ((a,b,c):xs) (d,e) = conc xs (if a `elem` d then (b:(delete a d),c:e) else (c:d,b:(delete a e)))

-- Berechnung der minimalen Differnz und speichern der Werte zum Rückbau
split :: [(a,Integer)] -> [(Integer,Integer,Integer)]
split ((_,x):(_,xs):[]) = [(x - xs,x,xs)]
split ((a,x):(_,xs):xss) = (x -xs,x,xs):(split (sortIns (a,(x-xs)) xss))

-- Hilfsfunktion zum einfügen in eine sortierte Liste
sortIns :: (a,Integer) -> [(a,Integer)] -> [(a,Integer)]
sortIns a [] = [a]
sortIns (a,b) ((x,y):xs) = if b > y then (a,b):(x,y):xs else (x,y):(sortIns (a,b) xs)
The third place, by Herr Hesse, shows that it's possible to get good results without too much pain:
{-
Darstellung von Aufteilungen/Abgaben in splitSubmissions:
(badness, submissionsList1, submissionsList2) :: (Integer, [(a, Integer)], [(a, Integer)])
badness = badness / Anzahl an Tokens, die submissionsList1 größer ist als submissionsList2 (submissionsList1 ist immer größer)
submissionsList1 / submissionsList2 = Abgaben, die MC 1 bzw. MC 2 zugeordnet werden
-}
splitSubmissions :: [(a, Integer)] -> ([(a, Integer)], [(a, Integer)])
splitSubmissions [] = ([],[])
splitSubmissions submissions = (resultA,resultB)
  where (_, resultA, resultB) = splitSubmissionsHelp (sortBy sorter (map transformToInnerType submissions))
        -- wandelt eine Abgabe in das o.g. Datenformat um, indem jede Abgabe zunächst MC 1 zugewiesen wird
        transformToInnerType (name, submissionLength) = (submissionLength, [(name,submissionLength)], [])
        -- splitSubmissionsHelp :: [(Integer, [(a, Integer)], [(a, Integer)])] -> (Integer, [(a, Integer)], [(a, Integer)])
        -- fasst in jedem Schritt die beiden Aufteilungen mit der größten badness zu einer Aufteilung zusammen
        -- Annahme bei Aufruf von splitSubmissionsHelp: List der Aufteilungen ist bereits absteigend sortiert
        splitSubmissionsHelp [a] = a
        splitSubmissionsHelp (first:second:xs) = splitSubmissionsHelp (sortBy sorter (joinPartitions first second:xs))
        -- joinPartitions :: (Integer, [(a, Integer)], [(a, Integer)]) -> (Integer, [(a, Integer)], [(a, Integer)]) -> (Integer, [(a, Integer)], [(a, Integer)])
        -- fasst zwei Aufteilungen A, B zusammen, indem die größere Liste von A mit der kleineren von B und umgekehrt konkateniert wird
        -- Annahme bei Aufruf von joinPartitions: badness von A > badness von B => badness vom Ergebnis positiv
        joinPartitions (badnessA, a1, a2) (badnessB, b1, b2) = (badnessA - badnessB, a1++b2, a2++b1)
        -- vergleicht zwei Aufteilungen anhand ihrer badness (absteigende Sortierung)
        sorter (badnessA, _, _) (badnessB, _, _) = compare badnessB badnessA
Finally, the MC Jr. has an educational solution that compresses his knowledge on the topic. His solution is, according to the tests, both the least bad (badness: 4) and the fastest (5.09 s). He achieved this by using quite a few packages that require cabal install, though. It would be a pity not to show it:
type Submission a = (a, Integer) tokenSum :: [Submission a] -> Integer tokenSum = sum . map snd badness :: ([Submission a],[Submission a]) -> Integer badness (xs,ys) = abs (tokenSum xs - tokenSum ys) gcdList :: Integral a => [a] -> a gcdList = gcdList' 0 where gcdList' 1 _ = 1 gcdList' acc [] = acc gcdList' acc (x:xs) = gcdList' (gcd acc x) xs -- Pseudo-polynomial algorithm for the optimal solution. Takes O(m*n) time, where m is the length of the input list -- and n is its token sum. Basically, a witness-providing version of -- http://en.wikipedia.org/wiki/Partition_problem#Pseudo-polynomial_time_algorithm . -- Uses memoisation combinators instead of an explicit table. -- Optional threshold allows to use heuristic solution for very large instances to avoid excessively long computations. splitSubmissionsThreshold :: Maybe Integer -> [Submission a] -> ([Submission a], [Submission a]) splitSubmissionsThreshold threshold xs = case threshold of Just t -> if (n + upperB) * genericLength xs > t then upper else optimal Nothing -> optimal where -- If GCD of list elements is non-trivial, reduce list by GCD and unreduce the result. -- In practice, GCD will probably always be 1, but computing it is cheap when it's trivial and -- the reduction saves a huge amount of time when it's not. g = gcdList (map snd xs) xs' = if g == 1 then xs else map (second (`div` g)) xs unreduce = if g == 1 then id else join bimap (map (second (*g))) n = tokenSum xs' upper = splitSubmissionsHeuristic xs -- heuristic algorithm gives upper bound upperB = badness upper optimal = unreduce $ head $ catMaybes $ [findPartitionWithBadness b | b <- [0..upperB - 1]] ++ [Just upper] -- Try to find a partition with badness exactly b, i.e. ‘tokenSum ys + b == tokenSum zs’ -- Since ‘tokenSum ys + tokenSum zs == n’, this is equivalent to ‘2*tokenSum ys == n - b’ -- Hence, if n is odd, there are no such ys, zs and if n is even, we only need to find a subset of xs -- that sums to ‘(n - b) `div` 2’. findPartitionWithBadness b | odd (n - b) = Nothing | otherwise = findSubsetSum ((n - b) `div` 2) (0 :: Integer) xs' -- Find a subset ys of ‘drop i xs’ that sums exactly to a (and its complement zs) -- by putting each x in xs either in ys or zs. Memoisation w.r.t. a and i. findSubsetSum :: Integer -> Integer -> [Submission a] -> Maybe ([Submission a], [Submission a]) findSubsetSum = memoize2 findSubsetSum' findSubsetSum' a _ [] = if a == 0 then Just ([], []) else Nothing findSubsetSum' a i (x:xs) | a < 0 = Nothing | otherwise = fmap (first (x:)) (findSubsetSum (a - snd x) (i + 1) xs) <|> fmap (second (x:)) (findSubsetSum a (i + 1) xs) -- Karmarkar-Karp differencing heuristic: remove the largest two elements and add their difference back in. -- The idea behind this is that the heap contains partial splittings (ys,zs) sorted by their badness. -- We can merge such partial splittings (xs1,ys1) and (xs2,ys2) by combining (xs1++ys2, ys1++xs2) and get a new -- partial splitting whose badness is the difference of the individual badnesses. -- Note the use of Data.Sequence instead of lists. Lists have terrible performance for long inputs due to the many ++. splitSubmissionsHeuristic :: forall a. 
[Submission a] -> ([Submission a], [Submission a]) splitSubmissionsHeuristic xs = bimap toList toList (aux heap) where heap = H.fromList [(n, (S.singleton (x,n), S.empty)) | (x,n) <- xs] :: MaxPrioHeap Integer (Seq (Submission a), Seq (Submission a)) aux h = case H.splitAt 2 h of ([], _) -> (S.empty, S.empty) ([(_, (ys, zs))], _) -> (ys, zs) ([(m, (ys1, zs1)), (n, (ys2, zs2))], h) -> aux $ H.insert ((m - n), (ys1 >< zs2, zs1 >< ys2)) h -- The brute force approach: generate all combinations, take minimum. splitSubmissionsBrute :: [Submission a] -> ([Submission a],[Submission a]) splitSubmissionsBrute = fst . minimumBy (compare `on` abs . snd) . combs where combs [] = return (([] , []), 0) combs [x] = return (([x], []), snd x) combs (x:xs) = let cs = combs xs in [((x:ys,zs), n + snd x) | ((ys,zs),n) <- cs] ++ [((ys,x:zs), n - snd x) | ((ys,zs),n) <- cs] -- Uses splitSubmissionsOptimal with default threshold of 10^9. Also, uses brute force approach for small input length. splitSubmissions :: [Submission a] -> ([Submission a], [Submission a]) splitSubmissions xs = if genericLength xs < 20 then splitSubmissionsBrute xs else splitSubmissionsThreshold (Just (10^9)) xs
Additional comments (from MC Jr. to MC Sr.):
Here's my code for the submission splitting exercise. It's actually quite short and concise, despite the considerable amount of optimisation I did. I also commented it extensively.
If you call ‘splitSubmissionsThreshold Nothing xs’, you will get the optimal splitting of xs in O(length xs * tokenSum xs) time. This may be useful as a reference implementation to test the student submissions against, as it tells you (reasonably) quickly what the optimal achievable solution is.
My experience so far:
- For very long lists (> 1000) and numbers that are not ridiculously huge, the differencing heuristic almost always finds a solution with badness ≤ 1, which is optimal. For shorter lists, it often fails miserably.
- The optimal solution works well with long lists, but does not perform well with very large numbers, which is why I implemented the GCD thing and built in a threshold.
By the way, when testing, I would advise you not to put negative numbers in the input list. Since they represent numbers of tokens, the problem statement implies that the function will only be called on non-negative numbers. My code, for instance, will not be amused by negative numbers.
To the MC Sr., it looks as though this was a fairly common problem. Those competitors who had the reflex to turn to the literature (starting with, e.g., Wikipedia) had a sizable advantage over the others.
A week without lessons would be like a sundae with no cherry on top (or a wedding cake with no icing):
- The MC needs to run your program. As a competitor, you certainly don't want to write programs that overheat the MC's MacBook Pro, because if you overheat his MacBook Pro, this will make him grumpy, and you positively don't want him to be grumpy when he looks at your solution.
- The scientific literature is full of little gems (and big diamonds). If the problem sounds standard, chances are that the solution is so, too. As a general rule, do not reinvent the wheel.
- Also important: If you get inspired by the literature, remember to acknowledge this by a reference (as our top competitors have done). This will make the MC's job easier, but more importantly this will ensure that nobody accuses you of being a thief.
- Submit your solutions on time. The MC Sr. is sometimes slack about a few things, but with deadlines he is rather strict.
- Karlsruhe, wake up! Last time the MC checked (which was some years ago, he will admit as much), KIT was an Eliteuniversität. The quality of some of the solutions we get from you guys corroborates this hypothesis, but what about the quantity? Next week, the MC Sr. expects more submissions from you guys; otherwise, he will personally show up in your Vorlesung (distances are not an obstacle for him). As they say in Swabia, Jede Woch isch Kehrwoch.
- The MC Sr. will sometimes go out of his way to insert some (pseudo-)German Sprüche even in contexts where they obviously don't fit.
Die Ergebnisse der sechsten Woche
Although this week's task was posed by the MC Jr., it is ranked by his Sr. counterpart.
First, there were 91 solutions that were not undefined and that compiled. One solution imported Data.Ranges.Ranges. The MC could not make it work, unfortunately – but the MC would like to hear from that student: perhaps he or she (one can never be sure with Vietnamese names) could contact the MC to sort this out.
The MC Jr. made the MC Sr.'s job more tedious by allowing solutions that are commented out. That means the MC (Sr., not Jr.) had to go through all solutions manually and check if there was some commented out code that should be uncommented. In the end, there were exactly 0 (zero, Z-E-R-O) solutions that met this criterion. (OK, a few were commented out, but it was because they were broken, not because they imported exotic packages.)
To be fair, Unix has a nice utility, called grep, that helped automate this tedious task. Moreover, grep helped reveal some other comments, such as this one:
{- Joachim Breitner (KIT, Mitarbeiter) Dies ist ein einfacher Knotenfärb-Algorithmus, der stets den Knoten mit dem größten Grad (also, dem Zeitslot mit den meisten Überschneidungen) zuweist. Anscheinend heißt das auch „Welsh–Powell algorithm“. Es gibt kein Backtracking, das heißt der Algorithmus kann fehlschlagen, auch wenn es eine Lösung gibt. Dafür ist er hoffentlich einigermaßen schnell, dank Vektor als Datenstruktur. Die Adjazenzmatrix `adj` wird dabei nicht verändert; wenn Knoten zugewiesen werden wird ihr Grad auf -1 gesetzt. Über-den-Daumen-Gepeilte Komplexität: O(max(t·n, n·n)), wobei t die Zahl der Tutoren und n die Zahl der Slots ist. -}
What? Did the MC really read that “der Algorithmus kann fehlschlagen”? Sonst geht's gut, oder? ;-)
Programs that sometimes fail to do their job, but that do their job correctly when they actually do it, are said to be partially correct. Partial correctness is nice, because it means you can trust the results (e.g., the schedule that comes out). But what is even nicer is total correctness, whereby the program of interest always does its job (given enough resources, e.g., time and memory). In the example above, one could say that returning (False, []) instead of (True, ...) is an instance of partial correctness, just like entering a loop or throwing an exception. But strictly speaking, it isn't even that: the program did terminate with an answer, but the answer was wrong.
For the Wettbewerb, what we care about is of course total correctness! Otherwise, solutions like findMapping = undefined would be winning every week, and the whole contest would be a joke. (Which it isn't, right?)
So here's a suggestion for Herr Breitner: When you have an efficient but incomplete algorithm, you can always have it fall back on a naive, brute-force (and complete) algorithm when it fails. Now, the MC Sr. is not going to judge Herr Breitner too harshly, since he himself found no time to attack the problem; in his defense, he was traveling (again) last week.
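In code, the pattern the MC has in mind might look like this minimal sketch (‘fast’ and ‘complete’ are hypothetical stand-ins for an efficient-but-incomplete solver and a slow-but-complete one):

-- Hypothetical sketch of the fallback pattern: 'fast' may give up (Nothing),
-- 'complete' always produces an answer. The combination is as quick as 'fast'
-- whenever the heuristic succeeds, yet still totally correct overall.
withFallback :: (a -> Maybe b) -> (a -> b) -> a -> b
withFallback fast complete x = maybe (complete x) id (fast x)

Applied to the comment above: run the Welsh–Powell-style colouring first and, only when it fails, fall back on exhaustive backtracking.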
Another comment:
{- Diese Lösung ist darauf ausgerichtet, möglichst auch in Worst Case Anfragen - insbesondere bei denen ohne Lösung - in aushaltbarer Zeit zu beantworten. Die durchschnittliche Laufzeit bei einfachen Anfragen mag zwar nicht top sein, aber die paar Augenblicke, die man mehr warten müsste, werden, so hoffe ich, als weniger wichtig erachtet, als dass das Programm mit einer möglichst geringen Wahrscheinlichkeit gar nicht in absehbarer Zeit (Stunden, Tage, Jahre ...) terminiert. -}
Good principle! The next one looks like a good idea, but the MC Sr. doesn't quite get it:
{-
- Basically just try all combinations until something works.
- What makes it slightly more efficient is that some combinations are discarded.
- We initially sort the slots by beginning time, and then make use of the following property:
- Let A be a slot that starts first of all the slots. Let B be the (or a) earliest slot that can be matched with A.
- Then it is relatively easy to show that if A cannot be matched with B for a big enough matching, then if there is
- a big enough matching, there is a big enough matching in which B is matched with a slot that is after B.
- So once we know that A with B does not give a big enough matching, we mark B in a way
- that the algorithm will not try matching it with an earlier slot later on.
-}
What is a “big enough matching”? A definition and an example would have helped here.
Another noteworthy comment:
{- Die Tests funktionieren hier nicht: "[..] excessive memory usage." Die Lösung ist sehr ineffizient und evtl. schwer verständlich. Um dir die Korrektur zu erleichtern, habe ich den Gedankengang notiert: Zur Beschreibung: in findMapping werden erst alle möglichen, sich nicht überschneidende Tupel herausgesucht und in die liste myTimeSlots geworfen. Dann wird für jedes Tupel mit jeweils 2 TimeSlots geprüft rekursiv geprüft, ob es eine aus den übrigen Tutoren und den übrigen TimeSlots eine Lösung gibt. Es entsteht also ein Baum, deren Äste immer bei einer konkreten Kombination der TimeSlots Tupel TimeSlots verzweigen. Anschließend wird die Liste aus Verzweigungen und Ergebnisse der durch die Verzweigung erreichten Kinder aussortiert, sodass nur noch die gültigen Pfade übrigbleiben. head validList bewirkt, dass das erste korrekte Ergebnis entnommen wird. -}
Finally, the last comment could do with a bit of formatting:
{-a very rushed atempt due to schedule problems. my algorithm is trying to solve the problem by trying to put those numbers that need to be on the leftside of the (timeslot,Timeslot) tuple. it will try to match away the highest snd on the right side so that it wont have to be put on the left sidelater on if it fails though dfindMap will ignore that match (put match in another list and combine lists again in the next smMap call ) -}
And surely its author must know that English sentences must be capitalized, that “wont” is spelled with an apostrophe, and that there should be a space in “sidelater”. But after all, this was “a very rushed atempt” – so what was the MC Sr. expecting? This is not quite Oettinger-level material, but perhaps it qualifies for the Deutsche Bahn Consolation Prize for Suboptimal Use of the English Language?
Incidentally, the MC Sr.'s blog is kind of a rushed job, too. It would be easy for the MC to spend days studying the solutions in detail, but his financial backers, the Deutsche Forschungsgemeinschaft (DFG) (not to mention the DFG's backers), probably wouldn't be too pleased about that. On the other hand, it's quite obvious that the MC Jr., who has so far been working on a volunteer basis, takes his obligations very seriously. For example, the MC Sr. doesn't bother putting Deutsch words in italics, whereas his Jr. Doppelgänger does.
End of grepping. Now, soundness tests. QuickCheck and its cousin SmallCheck are nice tools, but they often fail to find bugs in the presence of complicated conditions on the input.
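To illustrate the point with a generic sketch (not the MC's actual test code): a property guarded by a restrictive precondition makes QuickCheck discard almost every randomly generated input, so the interesting cases behind the condition are hardly ever exercised.

import Data.List (sort)
import Test.QuickCheck

-- Only sorted, non-empty lists pass the precondition, and random lists are
-- almost never sorted, so QuickCheck mostly discards test cases instead of
-- genuinely testing the conclusion.
prop_headIsMinimum :: [Int] -> Property
prop_headIsMinimum xs =
  (not (null xs) && xs == sort xs) ==> head xs == minimum xs

Hand-picked soundness tests, like the ones that follow, do not suffer from this problem.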
First soundness test:
findMapping [] [] == (True, [])
This is the case where there are 0 tutors and 0 slots (with 0 >= 2 × 0). It's rather disappointing to see 8 solutions “fly out”, with (False, []). There are better ways of getting disqualified.
Second soundness test:
findMapping [] [(1, 2), (2, 3)] == (True, [])
Two solutions give (False, []), and one gives
*** Exception: round2_bug2/20409/Exercise_6.hs:(113,1)-(122,74): Non-exhaustive patterns in function findMapping
Third soundness test:
findMapping ["A"] [(1, 2), (1, 2)] in m == (False, [])
We lost 7, which either entered a loop (or were ridiculously slow, which is the same to the MC) or produced odd results such as
(True, [])
(True, [("A", (1, 2), (1, 2))])
Now at 73 solutions. At this rate, we'll reach a Top 30 without even looking at the actual criterion! (Update: How prophetic!)
This is the reason why the MC Sr., when he writes the Aufgabentext, normally gives a long list of trivial looking examples. The MC Jr. is of the opinion that the TUM is an Eliteuniversität and shouldn't spoon-feed the students. Fair enough.
Incidentally, the Mistress of C. pointed out to the MC Sr. that KIT has, regrettably, lost its Elitestatus since the MC visited that venerable institution in 2009. Schade. The MC's goal wasn't to pour salt in the wound or to rub it in (even though that wouldn't be inconsistent with his persona). But the many lights in Karlsruhe's intellectual chandelier, among them Prof. Snelting, are still shining as brightly as ever, so don't let this temporary setback worry you even for a second. Indeed, with all due respect to his alma mater, the MC himself went to a much more provincial university for his bachelor's, and now he's Tha MC! Take that, eh! He reckons that what's much more important than all that Elite stuff is how much energy and interest you put into your studies (including this very competition). No German Beamte will be able to take that away from you.
Fourth soundness test:
fst (findMapping ["A", "B"] [(1, 11), (2, 12), (11, 21), (11, 22)]) == False
Some odd results:
(True, [("A", (1, 11), (11, 22)), ("B", (1, 11), (11, 22))]) (True, [("A", (1, 11), (11, 21)), ("B", (2, 12), (11, 22))])
Fifth soundness test:
isMapping (findMapping ["A"] [(1, 11), (2, 10), (3, 9), (4, 8), (5, 7), (1, 2)])
(where isMapping just checks that the answer looks OK). 27 solutions failed to find a solution, even though at least three existed:
[("A", (1, 2), (3, 9))] [("A", (1, 2), (4, 8))] [("A", (1, 2), (5, 7))]
Among the failures is the one by Herr Breitner, which does no backtracking. Korrektheit ist was anderes. ;-)
Sixth soundness test:
isMapping (findMapping ["A", "B"] [(1, 11), (2, 10), (3, 9), (4, 8), (5, 7), (1, 2), (2, 5)])
We lost another 13! (That's thirteen, not 6227020800!) And that leaves us with 31, including the MC Jr.'s and probably some Kitties. It looks like we already have a Top 30, and we're not even done finding the bugs yet.
More soundness tests:
isMapping (findMapping ["A", "B", "C"] [(1, 11), (2, 10), (3, 9), (4, 8), (5, 7), (1, 2), (2, 5), (1, 2)]) fst (findMapping ["A", "B", "C", "D"] [(1, 11), (2, 10), (3, 9), (4, 8), (5, 7), (1, 2), (2, 5), (1, 2)]) == False isMapping (findMapping ["A", "B"] [(1, 10), (10, 11), (10, 20), (11, 12)]) isMapping (findMapping ["A", "B"] [(1, 4), (2, 5), (3, 6), (2, 3), (3, 10), (4, 5)]) isMapping (findMapping ["A", "B", "C"] [(20, 30), (20, 30), (1, 4), (3, 5), (4, 10), (6, 11)]) isMapping (findMapping ["A", "B", "C", "D"] [(20, 30), (20, 30), (20, 30), (20, 30), (1, 4), (3, 5), (4, 10), (6, 11)]) isMapping (findMapping ["A", "B", "C", "D", "E"] [(31, 32), (20, 30), (20, 30), (20, 30), (20, 30), (20, 30), (1, 4), (3, 5), (4, 10), (6, 11)]) isMapping (findMapping ["A", "B", "C", "D"] [(1, 2), (1, 2), (1, 20), (21, 30), (2, 3), (2, 3), (1, 2), (2, 3)])
We lost 3+1+8+1+2+1+1+2 more. (Notice how simpler tests catch more culprits.)
At this point, the MC noticed that the solution by Denis Lohner, Mitarbeiter am KIT, was silently failing due to the absence of the dynamic library libglpk.dylib on the test machine. So unfortunately, we won't know how well Herr Lohner is going to score.
This is also where MC-check gave up. There are 12 solutions left that must be evaluated for speed. The test
findMapping ["A", "B", "C", "D", "E", "F"] [(1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (1, 7), (2, 8), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7)]
caused Herr Bennet Breier (or rather, his solution) to time out (> 10 s). The test
["A", "B", "C", "D", "E", "F"] (replicate 1000 (0, 1) ++ replicate 1000 (0, 2) ++ replicate 6 (1, 2))
caused Herr Johann Frei and Herr Martin Wauligmann to time out. The test
["A", "B", "C", "D", "E", "F"] (replicate 10 (1, 2) ++ replicate 10 (0, 2) ++ replicate 5 (0, 1))
caused Herr Markus Ansorge, Herr Martin Sperr, Herr David Widmann, and Herr Alexander Ungar to time out. The test
["A", "B", "C", "D", "E", "F"] (replicate 50 (1, 2) ++ replicate 50 (0, 2) ++ replicate 5 (0, 1))
caused Herr Thomas Rahm, Herr Daniel Schubert, and Herr Julius Imbery to time out.
This leaves us with Herr or Frau Chaoran Chen as sole winner (sorry, Chinese names aren't the MC's forte any more than Vietnamese ones [*]), ex aequo with the MC Jr.! As for Herr Lohner's solution, we will never know, unless he runs it on his own machine and sends us the results. (Update: The MC Jr. managed to run the example on his machine, but he got a crash and core dump for the empty-list case. Not good.)
[*] Here the MC Sr. is compelled to inject a personal note. As one can imagine, he's used to receiving mail addressed to Frau B. But last week, when talking to the electricity company Energie SaarLorLux on the phone, he was shocked to hear “Vielen Dank für Ihren Anruf, Frau Dr. Blanchette” after a ten-minute conversation.
In conclusion, the Top 30 (!!) of the week is as follows:
Top 30 der Woche | |||
---|---|---|---|
Platz | Wettbewerber(in) | Punkte | |
1. | Chaoran Chen | 30 | |
2. | Thomas Rahm | 29 | |
Daniel Schubert | 29 | ||
Julius Imbery | 29 | ||
5. | Markus Ansorge | 26 | |
Martin Sperr | 26 | ||
David Widmann | 26 | ||
Alexander Ungar | 26 | ||
9. | Martin Wauligmann | 22 | |
Johann Frei | 22 | ||
11. | Bennet Breier | 20 |
A pure massacre.
The MC Jr would like to add a thing or two about this week's problem.
It seems that all (correct) submissions employed some sort of backtracking, which means that they are worst-case exponential. Indeed, some students even conjectured that this problem (like last week's) was NP-hard and therefore probably does not have a polynomial solution. However, there is a polynomial solution: if we arrange the time slots as nodes in a graph and connect precisely those time slots that do not overlap, the problem simply becomes that of finding the largest matching in that graph. This problem can be solved efficiently, e.g. by Edmonds's blossom algorithm, and the MC Jr implemented it here. That said, despite being worst-case exponential, Chaoran Chen's solution works extremely well on most practical inputs (much better than the MC Jr's) – but when the problems become ‘conflict-rich’ and backtracking kicks in, things look very different.
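For the curious, here is a minimal sketch of that reduction, with a naive exponential search standing in for the blossom algorithm; the overlap test is an assumption (slots that merely touch at an endpoint count as compatible), since the exact definition from the Aufgabenblatt is not repeated here.

import Data.Maybe (listToMaybe)

type Slot = (Integer, Integer)

-- Assumption: two slots can be given to the same tutor (i.e. are connected
-- in the graph) iff they do not overlap; touching endpoints are allowed.
compatible :: Slot -> Slot -> Bool
compatible (s1, e1) (s2, e2) = e1 <= s2 || e2 <= s1

-- Naive search for a matching of size k in the non-overlap graph: pick two
-- compatible slots, recurse on the rest. Worst-case exponential; avoiding
-- exactly this blow-up is the point of the blossom algorithm.
matchingOfSize :: Int -> [Slot] -> Maybe [(Slot, Slot)]
matchingOfSize 0 _ = Just []
matchingOfSize k slots = listToMaybe
  [ (x, y) : rest
  | (x, ys) <- picks slots
  , (y, zs) <- picks ys
  , compatible x y
  , Just rest <- [matchingOfSize (k - 1) zs]
  ]
  where
    picks [] = []
    picks (z : zs) = (z, zs) : [ (w, z : ws) | (w, ws) <- picks zs ]

A findMapping along these lines would ask for matchingOfSize (length tutors) slots and, on success, zip the tutors with the matched pairs.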
Herr Lohner took an interesting approach: he used an external library for integer linear programming, which the MC Jr explicitly encouraged in the problem statement, although ILP was not what he had in mind. The MC expected at least one person to use an external SAT solver (such as picosat) to solve the problem, but ILP works as well. Or it would have, if the ILP library in question hadn't crashed on empty lists.
The MC Jr certainly did not aim to create so brutal an assignment; he was under the impression that the brute
force approach was fairly easy to accomplish. It would appear that he was wrong.
Another student, who was as baffled by the poor turnout this week as the MCs were, asked on Piazza whether his solution, which does not have {-WETT-} tags, could be considered for the Wettbewerb after all. Unfortunately, MC Check found that his code was incorrect. For some reason, his solution contains the comment -- The code never bothered me anyway. The MC is not sure what this refers to; he is quite certainly bothered by incorrect code, especially seeing as said student went to such lengths to get it noticed by the MC. Perhaps he should have just let it go.
What are the lessons this week?
- Specifications are to be met, for the same reasons that deadlines are to be honored.
- Before focusing on the Wettbewerb's first criterion (e.g., efficiency), focus on the zeroth criterion: correctness! The slowest imaginable naive, brute-force solution would have gotten more points this week than the most sophisticated incorrect one (namely, 20 vs. 0). These are basically free points.
- Careful about 'em bugs!
- @KIT: This week, we received only 2 student solutions from your side. This is below our lowest expectations. Come on! Schaffe, schaffe, Häusle baue!
- The MC Jr.'s exercises are nastier than the MC Sr.'s.
- A bugfree program is worth ten buggy ones.
Die Ergebnisse der siebten Woche
(Warning: Before you read this week's blog entry, make sure to get all the updates for last week's, by grepping for “pdate”.)
This week, there were exactly 100 non-undefined solutions, all of which compiled (at least after running cabal install). One solution is by a KIT student, another by a KIT tutor; 97 were by Munich students and one by a Munich tutor.
In the first round, the MC Sr. tested the input [(0, 0), (2, 0), (3, 0)], which was basically the running example in the Aufgabenblatt; to his disappointment, 7 solutions were lost at this stage.
He also tested [(0, 0), (1, 1), (2, 0)], wondering what would happen in the presence of an ambiguity (the nearest shelter to (1, 1) is not uniquely defined). Since the Aufgabentext wasn't entirely clear about what to do in that case, solutions that returned all nearest shelters instead of an arbitrary one were tolerated. After all, in a real-life scenario, such ambiguity could be helpful to prevent overcrowding the nearest shelter. It could also lead the refugees to hesitate, thereby wasting precious seconds and giving Godzilla (or Gojira) a chance to crush them.
Round 2 consisted of the correctness test [(0, 0), (2, 0), (5, 0), (12, 0), (0, 6), (2, 6), (5, 6), (12, 6)], which disqualified one solution, leaving 92.
At this point, the MC Sr. couldn't find more bugs. Either (1) the competitors were more careful than last week or (2) the problem was easier. The MC would like to believe it's (1), but he fears it's (2). We'll see further down. When running efficiency tests, the MC might still run into some bugs. (Update: Prophetic words!)
Most of the speed tests will be based on the mkCircle function:
mkCircle res scale = map (\x -> (round (cos (x * 2 * pi / res) * scale), round (sin (x * 2 * pi / res) * scale))) [0..res - 1]
Round 3 tries mkCircle 1000 1000, i.e., 1000 points more or less along the path of a circle of radius 1000. “More or less” because the x- and y-coordinates are rounded to the nearest integers. Solutions that took more than 1 s were kicked out. We lost 8 solutions, plus 1 buggy one, leaving 83.
To detect buggy solutions, the MC simply computed the “cost” of each solution and compared them:
cost pqs = let pqs' = map head (Data.List.groupBy (\(x, _) (x', _) -> x == x') (Data.List.sort pqs)) in sum (map (\((x1, y1), (x2, y2)) -> abs(x2 - x1) + abs(y2 - y1)) pqs')
Outliers were regarded with suspicion. For example, the optimal cost for mkCircle 100 100 is 736, but the buggy solution gave 738. This test is not complete, but it's hard to imagine that a buggy solution would consistently yield correct costs.
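In sketch form, the outlier check amounts to something like this (the (name, cost) pairs are assumed to come from running every surviving submission on the same input and applying cost to each result):

-- Flag the submissions whose output cost is worse than the best cost seen
-- on this input; those get a closer manual look.
suspicious :: [(String, Integer)] -> [String]
suspicious [] = []
suspicious results = [name | (name, c) <- results, c > best]
  where best = minimum (map snd results)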
In round 4, the MC tried 10 concentric “circles”, Data.List.concatMap (\n -> mkCircle (100 * n) (100 * n)) [1..10], with a timeout of 10 s. This kicked out 37 solutions, leaving 46.
For round 5, the example becomes Data.List.concatMap (\n -> mkCircle n n) [1..80]. No timeouts this time, but one solution gave a result with a cost of 5749 instead of the expected 5747. The code seemed otherwise quite nice and is due to one of the Top 5 of WS 2012/2015, Herr Bergmaier; but the MC shows kein Pardon when it comes to Korrektheit.
Most programming languages have a rather imperfect notion of integers, by artificially squeezing them into 32 or 64 bits. Haskell is one of the few languages out there where the designers had the courage to bet on correctness, at the expense of efficiency. Haskell's Integers (unlike Ints) are true mathematical integers—God's integers as they are sometimes called (even by people who otherwise don't believe in God). When asked to write a function that takes God's integers as input, we should always strive to preserve this godliness.
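A quick GHCi session illustrates the difference (assuming a 64-bit GHC, where Int has 64 bits; the results below are what such a GHC prints):

(maxBound :: Int) + 1           -- Int silently wraps around: -9223372036854775808
toInteger (maxBound :: Int) + 1 -- Integer keeps going:         9223372036854775808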
This longish preamble was just setting the stage for round 6:
mkCircle res scale = map (\x -> (123456789123456789123456789 * round (cos (x * 2 * pi / res) * scale), 234567891234567891234567891 * round (sin (x * 2 * pi / res) * scale))) [0..res - 1]

let ps = Data.List.concatMap (\n -> mkCircle (100 * n) (10 * n)) [1]
This was enough to kick out 3 more solutions. One of them had some conversions to Int; the other two had hard-coded constants such as 9999 and 3000 that looked highly suspicious. Um ernst zu sein, the MC expected some solutions to use floating-point arithmetic; but it turns out that those few solutions that did use Double already failed some earlier tests.
Round 7, final round! Lots of concentric circles:
Data.List.concatMap (\n -> mkCircle n (100 * n)) [1..100]
This gives us our Top 30:
Top 30 der Woche | |||
---|---|---|---|
Platz | Wettbewerber(in) | Laufzeit (s) | Punkte |
1. | Dario Beckert | 0.29 | 30 |
2. | Julian Spahl | 0.53 | 29 |
3. | Christoph Rudolf | 0.54 | 28 |
4. | Rudolf Franz Hattenkofer | 4.09 | 27 |
5. | Dmitriy Zayko | 4.13 | 26 |
6. | Lukas Heinzmann | 4.18 | 25 |
7. | Alexander Küchler | 4.30 | 24 |
8. | Markus Obrull | 4.41 | 23 |
9. | Fabio Madge Pimentel | 4.43 | 22 |
10. | Borys Levkovskyi | 4.69 | 21 |
11. | Michael Hesse | 4.77 | 20 |
12. | Chaoran Chen | 4.99 | 19 |
13. | Felix Wielander | 5.14 | 18 |
14. | Sebastian Weiß | 5.18 | 17 |
15. | David Widmann | 5.19 | 16 |
16. | Jonas Heintzenberg | 5.19 | 15 |
17. | Philipp Jutzi | 5.21 | 14 |
18. | Marla Narazani | 5.30 | 13 |
19. | Johannes Stubenrauch | 5.41 | 12 |
20. | Michael Eder | 5.44 | 11 |
21. | Kirill Haar | 5.48 | 10 |
22. | Armin Hörtreiter | 5.76 | 9 |
23. | Baris Kayalibay | 5.97 | 8 |
24. | Sebastian Josef Neubauer | 6.55 | 7 |
25. | Sebastian Bruhn | 6.66 | 6 |
26. | Stefan Cimander | 7.03 | 5 |
27. | Leander Nachreiner | 7.56 | 4 |
28. | Simon Rieß | 7.77 | 3 |
29. | Lukas Prantl | 7.84 | 2 |
30. | Felix Thimm | 8.02 | 1 |
To distinguish between Herr Widmann and Herr Heintzenberg, both at 5.19 s, the MC appealed to the second criterion, the token count, where Widmann had 59:
distance (x,y) (x',y') = abs (x-x') + abs (y-y')

nearestNeighbors xs = map (\x -> const x &&& snd . minimumBy (compare `on` fst) . map (distance x &&& id) $ delete x xs) xs
whereas Heintzenberg had 63 (but a similar algorithm):
nearestNeighbors xs = map (nearest xs) xs

nearest :: [Point] -> Point -> (Point, Point)
nearest xs p@(x,y) = (p, snd $ minimumBy (compare `on` fst) $ map (\s@(a,b) -> (abs (a-x) + abs (b-y),s)) (delete p xs))
And what happened to our tutors? Herr Lohner (KIT) got kicked out at round 3 due to correctness. Herr Bergmaier got kicked out at round 5 due to correctness. Herr Brinkop forgot (?) his {-WETT-} and {-TTEW-} tags.
Let's look at the winners. The first place first, by Herr Beckert:
nearestNeighbors :: [Point] -> [(Point, Point)]
nearestNeighbors = procGrid [] . map (sortBy (compare `on` snd)) . groupBy ((==) `on` fst) . sortBy (compare `on` fst)

{-Process-all-the-things!-}
procGrid :: [[Point]] -> [[Point]] -> [(Point, Point)]
procGrid _ [] = []
procGrid lft (col : rgt) = map (colFst lft col rgt) col ++ procGrid (col : lft) rgt

procSide :: [[Point]] -> Point -> Maybe Point -> Maybe Point
procSide [] _ m = m
procSide (col : cols) p m
  | closerThan d p m = m
  | otherwise = procSide cols p (maybeMin p (procCol col p) m)
  where d = abs (fst p - fst (head col))

procCol :: [Point] -> Point -> Point
procCol [q] _ = q
procCol (q : qs) p
  | p `distTo` q < p `distTo` head qs = q
  | otherwise = procCol qs p

procOwn :: [Point] -> Point -> Maybe Point -> Maybe Point
procOwn [_] _ m = m
procOwn (q : qs) p m
  | p == q = if p == head qs then Just p else maybeMin p (head qs) m
  | otherwise = procOwn qs p (Just q)

{-Get-neighbor-or-crash-tryin'-}
colFst :: [[Point]] -> [Point] -> [[Point]] -> Point -> (Point, Point)
colFst lft col rgt p
  | closerThan 1 p h = (p, fromJust h)
  | otherwise = lftSnd lft rgt p h
  where h = procOwn col p Nothing

lftSnd :: [[Point]] -> [[Point]] -> Point -> Maybe Point -> (Point, Point)
lftSnd lft rgt p m
  | closerThan 1 p h = (p, fromJust h)
  | otherwise = rgtLst rgt p h
  where h = procSide lft p m

rgtLst :: [[Point]] -> Point -> Maybe Point -> (Point, Point)
rgtLst rgt p m = (p, fromJust (procSide rgt p m))

{-Utils-}
distTo :: Point -> Point -> Integer
distTo (px, py) (mx, my) = abs (px - mx) + abs (py - my)

closerThan :: Integer -> Point -> Maybe Point -> Bool
closerThan _ _ Nothing = False
closerThan i p (Just m) = p `distTo` m <= i

maybeMin :: Point -> Point -> Maybe Point -> Maybe Point
maybeMin _ h Nothing = Just h
maybeMin p h (Just m)
  | p `distTo` m <= p `distTo` h = Just m
  | otherwise = Just h
This looks like an impressive solution, but without documentation or references it is difficult to make much sense out of it. :-S Perhaps Herr Beckert could send the MC Sr. a description, which could then be inserted here? Vielen Dank im Voraus für Ihre Bemühungen!
The second place, by Herr Spahl, is hard to make sense of again because of the lack of documentation; but it seems very smart!
nearestNeighbors :: [Point] -> [(Point, Point)]
nearestNeighbors ps = findNearestAll ps (groupMatrix (sortBy (order)ps) 0 [] []) []

groupMatrix :: [Point] -> Integer -> [Integer]-> [(Integer,[Integer])] -> [(Integer,[Integer])]
groupMatrix [] i ls rs = if null ls then rs else (rs++[(i,ls)])
groupMatrix (p:ps) i ls rs
  |null ls = groupMatrix ps (fst p) ([snd p]) rs
  |fst p == i = groupMatrix ps (fst p) (ls++[snd p]) rs
  |otherwise = groupMatrix ps (fst p) [snd p] (rs++[(i,ls)])

findNearestAll :: [Point] -> [(Integer,[Integer])] -> [(Point, Point)]-> [(Point, Point)]
findNearestAll [] _ rs = rs
findNearestAll a@(p:ps) gs rs = findNearestAll ps gs ((matchx p [] gs) :rs)

matchx :: Point -> [(Integer,[Integer])] -> [(Integer,[Integer])] -> (Point,Point)
matchx (x,y) bs ( (a@(x2,yswy)): gs )
  |null gs && null ys = findNearest (x,y) (tail bs, []) (fst(head bs), (findNearInRow y q (head q)))
  |null gs = findNearest (x,y) (bs, []) (x2, (findNearInRow y (ys) (head ys)))
  |null ys = matchx (x,y) ( bs) gs
  |x2 >= x = findNearest (x,y) (bs, gs) (x2, (findNearInRow y (ys) (head ys)))
  |otherwise = matchx (x,y) ( a :bs) gs
  where ys = if x2 == x then delete y yswy else yswy -- rem point !emptylist
        q = snd (head bs)

findNearest :: Point -> ([(Integer,[Integer])], [(Integer,[Integer])])-> Point -> (Point, Point) -- start near
findNearest s ([], []) e = (s,e)
findNearest s (((x1,ys1):xs), []) e
  |distance s e >= abs (fst s -x1) = findNearest s (xs, []) (nearer s e (x1, findNearInRow (snd s) (tail ys1) (head ys1) ))
  |otherwise = (s,e)
findNearest s ([], ((x2,ys2):ys)) e
  |distance s e >= abs (fst s -x2) = findNearest s ([], ys) (nearer s e (x2, findNearInRow (snd s) (tail ys2) (head ys2) ))
  |otherwise = (s,e)
findNearest s (((x1,ys1):xs), ((x2,ys2):ys)) e
  |distance s e < abs (fst s -x1) = findNearest s ([], ys) n
  |distance s e < abs (fst s -x2) = findNearest s (xs, []) n
  |otherwise = findNearest s (xs, ys) n
  where pu = (x1, findNearInRow (snd s) (tail ys1) (head ys1) )
        pd = (x2, findNearInRow (snd s) (tail ys2) (head ys2) )
        nf = nearer s pu pd
        n = nearer s nf e

findNearInRow :: Integer -> [Integer] -> Integer -> Integer
findNearInRow y [] r = r
findNearInRow y a@(x:xs) r
  -- |null a = r doesnt work
  |abs (y-x)> abs (y-r) = r
  |otherwise = findNearInRow y xs x

order (x,a) (y,b)
  |x < y = LT
  |x > y = GT
  |a < b = LT
  |a > b = GT
  |otherwise = EQ

distance :: Point -> Point -> Integer
distance p q = abs(fst p - fst q) + abs(snd p - snd q)

nearer :: Point -> Point -> Point -> Point
nearer s p q
  |distance s p < distance s q = p
  |otherwise = q
The third place, by Herr Rudolf, is more to the MC's taste in terms of documentation (although he would have appreciated a margin wrap setting of 80 or at most 100 characters):
{- Version II using a k-d Tree (http://orion.math.iastate.edu/reu/2001/nearest_neighbor/p509-bentley.pdf),
   which (according to Wikipedia) is a standard approach to the 'nearest neighbor search' and solves the
   problem reasonably well for few dimensions. In the original paper (as linked above), the NNS is even
   described as one application for this structure. -}

--Regarding the structure of the tree, the slides of the lecture provide us with this definition (except for the Leaf), which fits perfectly:
data Tree a = Empty | Node a (Tree a) (Tree a) | Leaf a deriving (Eq, Show)

--Cycling through the axis X and Y
--This cycling leads to a huge amount of duplicate code, as there are two functions for building a tree by splitting elements and either the x or the y coordinate.
--The same situation occurs when traversing the tree, which leads to two almost identical search functions.
--I'm curious if there is a design pattern for cases like this?
--I thought about passing a function like fst and snd as parameter. but this doesn't solve all problems and pattern matching should be faster.

--Functions for building the k-d-tree
--If time left: improve the performance by creating a balanced tree (sorting and taking the center element as node)?
buildTreeX :: [Point] -> Tree Point
buildTreeX [] = Empty
buildTreeX [p] = Leaf p
buildTreeX ((x,y):points) = Node (x,y) leftSubTree rightSubTree
  where leftSubTree = buildTreeY (filter (\(a,b) -> a <= x) points)
        rightSubTree = buildTreeY (filter (\(a,b) -> a > x) points)

buildTreeY :: [Point] -> Tree Point
buildTreeY [] = Empty
buildTreeY [p] = Leaf p
buildTreeY ((x,y):points) = Node (x,y) leftSubTree rightSubTree
  where leftSubTree = buildTreeX (filter (\(a,b) -> b <= y) points)
        rightSubTree = buildTreeX (filter (\(a,b) -> b > y) points)

--The search function itself
--I handle multiple coordinates separately, as this doesn't fit too well with my search inside the tree
--Why would NYC build multiple of those Godzilla-Panic-Rooms in the same location anyway?
nearestNeighbors :: [Point] -> [(Point, Point)]
nearestNeighbors points = map mapNeighbor points
  where tree = buildTreeX points
        multiples = map head (filter (\x -> length x > 1) (group $ sort points)) --getting all multiples once
        mapNeighbor p = if elem p multiples
                        then (p, p) --for each, safety room, see if it has a neighbor with the same coordinate
                        else (p, nearestNeighborX p tree)

--Functions solving the problem for a single point:
--Again, I apologize for the duplicates.
nearestNeighborX :: Point -> Tree Point -> Point
nearestNeighborX point (Leaf node) = node
nearestNeighborX (px,py) (Node (nx,ny) lTree rTree)
  | lTree == Empty = bestInRight
  | rTree == Empty = bestInLeft
  | px <= nx = testOtherBranchX (px,py) (nx,ny) bestInLeft rTree
  | otherwise = testOtherBranchX (px,py) (nx,ny) bestInRight lTree
  where bestInLeft = getBest (px,py) (nx,ny) (nearestNeighborY (px,py) lTree)
        bestInRight = getBest (px,py) (nx,ny) (nearestNeighborY (px,py) rTree)
        testOtherBranchX point node best Empty = best
        testOtherBranchX (px,py) (nx,ny) (bestx, besty) nextTree =
          if (abs (px - nx)) < getDistance (px, py) (bestx, besty)
          then getBest (px,py) (bestx, besty) (nearestNeighborY (px,py) nextTree)
          else (bestx,besty)

nearestNeighborY :: Point -> Tree Point -> Point
nearestNeighborY point (Leaf node) = node
nearestNeighborY (px,py) (Node (nx,ny) lTree rTree)
  | lTree == Empty = bestInRight
  | rTree == Empty = bestInLeft
  | py <= ny = testOtherBranchY (px,py) (nx,ny) bestInLeft rTree
  | otherwise = testOtherBranchY (px,py) (nx,ny) bestInRight lTree
  where bestInLeft = getBest (px,py) (nx,ny) (nearestNeighborX (px,py) lTree)
        bestInRight = getBest (px,py) (nx,ny) (nearestNeighborX (px,py) rTree)
        testOtherBranchY point node best Empty = best
        testOtherBranchY (px,py) (nx,ny) (bestx, besty) nextTree =
          if (abs (py - ny)) < getDistance (px, py) (bestx, besty)
          then getBest (px,py) (bestx, besty) (nearestNeighborX (px,py) nextTree)
          else (bestx,besty)

--Calculating the 'Manhatten distance'
getDistance :: Point -> Point -> Integer
getDistance (x1,y1) (x2,y2) = abs (x2 - x1) + abs (y2 - y1)

--Get the better one of two possible neighbors
getBest :: Point -> Point -> Point -> Point
getBest p n1 n2
  | p == n1 = n2 --Ignore the point itself, however this requires multiple
  | p == n2 = n1 --safety rooms at the same location to be handled separately
  | getDistance p n1 <= getDistance p n2 = n1
  | otherwise = n2
As an answer to the comment
--Cycling through the axis X and Y
--This cycling leads to a huge amount of duplicate code, as there are two functions for building a tree by splitting elements and either the x or the y coordinate.
--The same situation occurs when traversing the tree, which leads to two almost identical search functions.
--I'm curious if there is a design pattern for cases like this?
--I thought about passing a function like fst and snd as parameter. but this doesn't solve all problems and pattern matching should be faster.
the MC would like to point out that you can't have your cake and eat it too. Passing functions as parameters makes the code more generic, and for this you might have to pay a performance penalty (or not).
<comment author="MC Jr">
The MC Jr disagrees. By passing an infinite list of fst and snd to the traversal functions and some other simplifications, the MC Jr was able to
cut the size of Mr Rudolf's code in half. Example:
both :: (a -> b) -> (a, a) -> (b, b)
both f (x, y) = (f x, f y)

-- call with: buildTree (cycle [fst, snd])
buildTree :: [Point] -> Tree Point
buildTree = buildTree' (cycle [fst, snd])
  where buildTree' _ [] = Empty
        buildTree' _ [p] = Leaf p
        buildTree' (f:fs) (p:points) = Node p l r
          where (l, r) = let x = f p in both (buildTree' fs) $ partition (f `is` (<= x)) points
While it is true that additional generality can cost performance, this is often negligible, especially considering the cost of code duplication (decreased maintainability, more prone to bugs). In this case, the MC Jr's version of Mr Rudolf's code was only about 10% slower than the original one; with compiler optimisations enabled, it is even consistently slightly faster than the original version.
Note: an even better solution would have been to encode the splitting dimension (x or y) directly in the type of the tree, e.g.:
data Dim = X | Y

data Tree = Node Dim Point Tree Tree | Leaf Point | Empty
Or, if you want to go above and beyond (the MC Jr, for one, does) and encode the fact that the splitting dimension always alternates directly into the type using Generalised Algebraic Datatypes (GADTs):
data Dim = X | Y

type family OtherDim (d :: Dim) :: Dim where
  OtherDim X = Y
  OtherDim Y = X

data Tree :: Dim -> * where
  Leaf :: Tree d
  Node :: Point -> Tree (OtherDim d) -> Tree (OtherDim d) -> Tree d
</comment>
The solution goes on with a naive version and some test data, including this nice pearl:
--In order to prevent last weeks tragedy, we test the function.
Amen! (But “last week's”, not “last weeks”.)
When compiling the entries, the MC noticed that three competitors used Haskell packages for k-d trees. What happened to them? The following comment, by Herr Schilling, is somewhat illuminating:
--Hier eine etwas effizientere Variante mit KdTrees. --Zwar muss für Data.Trees.KdTree die Distanzmetrik als Double angegeben werden, --Es wird aber angenommen, dass dies problemlos möglich ist, weil -- a) Es zur Problemstellung passt - Manhattan ist nicht so groß, es passen also nur so viele -- Schutzräume rein, es sollte also möglich sein diese so als Integer zu enkodieren, dass die Distanzmetrik -- zum quadrat sich problemlos in ein Double umwandeln lässt und nicht zu viel an Präzision verloren geht. -- Um realistisch zu sein gilt sowieso, dass wenn die Distanz zwischen zwei nearest-neighbor Schutzräumen so groß ist, -- dass Umwandlung in ein Double nicht mehr möglich ist, die Fliehenden soweiso keine Chance mehr haben. -- b) Es spät ist, und ich keine Lust habe KdTrees selbst zu implementieren. Aber (a) ist natürlich -- der motivierende Grund dafür, dass Trotzdem Data.Trees.KdTree benutzt wird.
But his solution got kicked out in round 3 already because it was so slow. Something seems to have gone terribly wrong with the use of the data structure. Furthermore, compromising on correctness is always risky, because the MC is likely to catch it. Indeed, the two other competitors who tried Haskell's KdTrees were kicked out by the correctness tests in rounds 2 and 3 already. The MC tends to follow the spirit of the Problemstellung when testing for performance, but for correctness he sticks to the letter of the law. Also, the MC hadn't specified any units: what if nanometers are used?
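The precision issue is easy to demonstrate in one line (assuming the standard 64-bit Double with its 53-bit mantissa):

round (fromInteger (2^53 + 1) :: Double) :: Integer -- yields 9007199254740992; the + 1 is silently lost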
What's a k-d tree? The MC doesn't want to repeat what's written on Wikipedia, but you might enjoy reading it more if you replace k by 2 and “hyperplane” by “plane”.
The idea (and much of the wording) for the exercise came from the scientific article “Functional Pearl: Nearest Shelters in Manhattan” by Shin-Cheng Mu and Ting-Wei Chen (Springer). The basic idea, which seems to underlie some of the solutions above, appears to be as follows: If you want to find the nearest Manhattan-distance point from (x, y), it is either going to be to the North-West, North-East, South-West, or South-East. So you can, for each point, find its nearest NW, NE, SW, and SE neighbor, and then select the nearest of those four. This means we can make four sweeps over the data and merge the results at the end. How to perform those sweeps efficiently? The MC would be dishonest if he claimed that he fully understands it; but the details are there on Wikipedia and in the above paper for those who are very curious.
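In code, that quadrant idea can be rendered naively as follows (a sketch only: no clever sweeps, hence still quadratic and nowhere near the winning solutions; the closed quadrants are a choice made here so that boundary points and duplicate shelters are not lost):

import Data.List (delete, minimumBy)
import Data.Ord (comparing)

type Point = (Integer, Integer)

manhattan :: Point -> Point -> Integer
manhattan (x1, y1) (x2, y2) = abs (x1 - x2) + abs (y1 - y2)

-- For each point, find the nearest candidate in each (closed, hence slightly
-- overlapping) quadrant and keep the closest of those winners. Assumes the
-- input contains at least two shelters.
nearestByQuadrants :: [Point] -> [(Point, Point)]
nearestByQuadrants ps = [ (p, closest p) | p <- ps ]
  where
    closest p@(x, y) =
      minimumBy (comparing (manhattan p))
        [ minimumBy (comparing (manhattan p)) candidates
        | inQuadrant <- [ \(a, b) -> a <= x && b >= y   -- NW
                        , \(a, b) -> a >= x && b >= y   -- NE
                        , \(a, b) -> a <= x && b <= y   -- SW
                        , \(a, b) -> a >= x && b <= y ] -- SE
        , let candidates = filter inQuadrant (delete p ps)
        , not (null candidates) ]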
One more thing: The solutions by Herr Levkovskyi and Herr Jutzi returned [((0, 0), (0, 0))] instead of [((0, 0), (0, 0)), ((0, 0), (0, 0))] for nearestNeighbors [(0, 0), (0, 0)]. This was not entirely clear in the Aufgabentext, but there was an example given that should have clarified it. There's no excuse for not trying the examples. Still, since it can be argued that examples are not really part of the Aufgabe, and only serve to complement it, the MC decided to be generous and let that pass. (Indeed, the example hard-coded a specific order for the results, but the text said that the order was unspecified, which shows that the examples should sometimes be taken with a grain of salt.)
What are the lessons this week?
- As usual, it is a good idea to turn to Wikipedia or other sources when solving a standard-sounding problem (such as what is the nearest neighbor to a point in the cartesian plane).
- But remember to give your sources!
- And if you invented the solution all by yourself, you can claim credit for it. This can be done modestly, e.g.: “This is the solution I came up with, inspired by X, Y, and Z. The solution seems natural enough; I would be surprised if nobody else had thought of it before.”
- When turning to the Haskell library, be careful about side-conditions (e.g. conversions to Double!)
- Haskell is a language in which absolute correctness is possible when dealing with integer values (Integer). This should not be sacrificed so thoughtlessly!
- If you do something unsound (incorrect), you have the choice between honesty and keeping quiet in the hope that the MC won't notice.
Die Zwischenauswertung der achten und neunten Woche
Hallo, this is the MC Jr speaking. The MC Jr would like to apologise for the delay, but he was rather busy this week, chasing paper deadlines with one hand and writing an IO mocking system for next week's homework exercises with the other. After all, we do not want any old student to run ugly side-effect-bearing IO actions on our test servers, so the MC Jr hacked together a system that encapsulates IO actions in pure Haskell code and effectively simulates a file system, a network, and a user at the console in order to test your homework submissions.
But now let us have a look at the Wettbewerb submissions so far. There were 196 submissions this week which contained {-WETT-} tags and compiled (plus one by Joachim Breitner from KIT and one by the MC Jr). Funnily enough, only six people (including the MC Jr himself) actually used the advanced interface. Even some of the best solutions simply used nextMove. Anyway, the MC Jr used some grep, sed, and bash magic to create a Haskell module that imports all 198 submissions. Some competitors made the MC Jr's job unnecessarily difficult by handing in solutions using ancient and obscure encodings such as ISO-8859-1 and Mac OS Classic-style line breaks, but that was easily rectified, too.
Then things got a bit hairy: because with the advanced strategy interface every strategy can use a state of an arbitrary type, you cannot simply put different strategies into a list (since they have different types, e.g. Strategy Int vs. Strategy Bool). This can be solved by using an existential type wrapper:
data MyStrategy = forall u. MyStrategy (Strategy u)
Trying to use those wrapped strategies led to the MC Jr's all-time favourite GHC error message: “My brain just exploded”.
Anyway, after a bit of fiddling about, the MC Jr found out how to work with those wrapped strategies:
let r = case (s1, s2) of (MyStrategy s1', MyStrategy s2') -> playGame g (s1', s2')
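For readers meeting existential types for the first time, here is a self-contained toy version of the same pattern (the Strategy type below is a stand-in invented for this sketch, not the one from the Aufgabenblatt):

{-# LANGUAGE ExistentialQuantification #-}

-- Toy stand-in: a strategy carries a hidden state of type u, an initial
-- state, and a step function producing a move (here just an Int).
data Strategy u = Strategy { initialState :: u, step :: u -> (Int, u) }

-- The existential wrapper hides u, so differently typed strategies fit into
-- a single list.
data MyStrategy = forall u. MyStrategy (Strategy u)

-- Unwrapping happens by pattern matching; inside the match, u is an abstract
-- type that can only be fed back into the strategy's own functions.
firstMove :: MyStrategy -> Int
firstMove (MyStrategy s) = fst (step s (initialState s))

strategies :: [MyStrategy]
strategies =
  [ MyStrategy (Strategy (0 :: Int) (\n -> (n, n + 1)))
  , MyStrategy (Strategy False (\b -> (if b then 1 else 0, not b))) ]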
In the end, the MC Jr required RankNTypes, ExistentialQuantification, and ImpredicativeTypes for his testing code. Okay, now we have a big list of all strategies and a way to let them play against one another, but there are still 198 of them. Let's decrease that number a bit, shall we?
Round 1
In the first round, every strategy plays 50 games against the random strategy, 25 as the red player and 25 as the blue player.
Any strategy that throws an exception or attempts to play an invalid move is disqualified immediately. 110 strategies were disqualified for throwing an exception (mostly undefined), 3 were disqualified for playing invalid moves (tut tut!), and another 4 did not terminate within a reasonable amount of time. Of these last four, three were very simple strategies that tried to return an arbitrary valid move and failed, looping infinitely. The fourth was a very sophisticated solution by Joachim Breitner, for which the MC Jr had high hopes after seeing an earlier version of it perform quite well. Mr Breitner, please fix your solution by Friday; it would be quite a pity to see it disqualified.
As it says in the Aufgabenstellung, the MC Jr will not wait for more than a few seconds every turn.
Round 2
The remaining 81 strategies play another 100 games against the random strategy. If a strategy does not win at least 60 of these
games, it is culled from the list as it is ostensibly not significantly better than just picking moves at random and thus not worth the MC Jr's precious CPU time. This eliminated another 67 strategies, plus one that played an invalid move. Most strategies either win about half of their games (so they are about as good as the random strategy) or over 90 % of their games (which makes them significantly better than the random strategy), but one strategy actually managed to win only 1 (!) out of its 100 games. This was done by filling columns from left to right as the red player and rows from left to right as the blue player, which, of course, results in a shockingly terrible strategy.
Round 3
At this point, a meagre 13 strategies remain. The MC Jr let each of these strategies play against every other strategy 200 times,
100 times in the combination red/blue and 100 times in the combination blue/red. Every time a strategy wins against another, it receives one point.
In the end, the strategies are ranked with respect to points, yielding the following ranking:
Vorläufiges Strategienranking | ||
---|---|---|
Platz | Wettbewerber(in) | Punkte |
1. | Fabio Madge Pimentel | 2343 |
2. | Christoph Rudolf | 2097 |
3. | MC Jr | 1822 |
4. | Martin Wauligmann | 1550 |
5. | Joshua Rombauer | 1326 |
6. | Chaoran Chen | 1184 |
7. | Lukas Prantl | 1116 |
8. | Sebastian Weiß | 965 |
9. | Christoph Griesbeck | 931 |
10. | Florian Wiedner | 887 |
11. | Sebastian Neubauer | 626 |
12. | Alexander Book | 572 |
13. | David Widmann | 181 |
Don't forget: these points are the points the strategy achieved; they are not Wettbewerb points. Those will be awarded after the deadline on Friday. This is merely a first unofficial glimpse of the current status of the submissions.
As a special treat, you can have a look at two particular games involving Mr Madge Pimentel's strategy defeating the two runners-up effortlessly.
Considering that he invested very little time in his own strategy, the MC Jr is a bit disappointed that he still landed in a respectable third place. Da ist noch Luft nach oben! The MC Jr hopes that the competitors will use the time until Friday to improve their solutions, so until then: Happy Hacking! And remember: there is no such thing as a draw in the Game of Stones.
Die Ergebnisse der achten und neunten Woche
If you're only here to look at some pretty pictures: look here
After some intense testing, the MC Jr has compiled the results for the eighth week. Apparently, some competitors took the MC Jr's call to improve their solutions before the final deadline to heart; third place, for instance, is occupied by Lukas Tenbrink, who was still absent from the preliminary ranking. Joachim Breitner fixed the stack overflow in his code and actually claims that the code came from fgl, Haskell's Functional Graph Library. Apparently, fgl used foldr instead of foldl in one place, leading to very bad performance – more on that in one of the next lectures. All in all, however, the results look very similar to those from one week ago:
Top 30 der Woche | |||
---|---|---|---|
Platz | Wettbewerber(in) | Spielpunkte | Wettbewerbspunkte |
1. | Fabio Madge Pimentel | 1597 | 60 |
2. | Joachim Breitner (KIT-Übungsleitung) | 1548 | n/a |
Christoph Rudolf | 1416 | 58 | |
3. | Lukas Tenbrink | 1237 | 56 |
4. | MC Jr | 1055 | n/a |
Chaoran Chen | 1027 | 54 | |
5. | Julian Spahl | 1020 | 52 |
6. | Maximilian Endraß | 1013 | 50 |
7. | David Widmann | 994 | 48 |
8. | Daniel Schubert | 946 | 46 |
9. | Martin Wauligmann | 935 | 44 |
10. | Joshua Rombauer | 517 | 42 |
11. | Sebastian Weiß | 457 | 40 |
12. | Lukas Prantl | 425 | 38 |
13. | Florian Wiedner | 401 | 36 |
14. | Christoph Griesbeck | 318 | 34 |
15. | Sebastian Neubauer | 219 | 32 |
16. | Alexander Book | 175 | 30 |
The mode of ranking was similar to the Zwischenauswertung: anything that throws exceptions or does not win against the random strategy at least 60% of the time is disqualified; the remaining strategies then play 50 games against each other as each colour. Note, in particular, that the MC Jr did not award any points to competitors whose strategies were no better than the random strategy, so we only have a Top 16 this week. This is because, when strategies essentially pick moves completely at random, whether or not a particular one lands in the Top 30 is more or less down to luck. Besides, it would mean a significantly higher number of strategies to test, and the MC Jr's time, unlike his fondness for writing moderately witty comments on other people's code, is limited.
The MC Jr wrote a Haskell program to generate this nice tournament table, which shows precisely how often each player won against every other player (an entry n means that the player in that row won n games out of 100 against the player in that column):
FMP | JB | CR | LT | MCJ | CC | JS | ME | DW | DS | MW | JR | SW | LP | FW | CG | SJN | AB | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
FMP | n/a | 50 | 100 | 100 | 99 | 98 | 50 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
JB | 50 | n/a | 100 | 100 | 100 | 98 | 50 | 100 | 100 | 100 | 50 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
CR | 0 | 0 | n/a | 100 | 91 | 93 | 50 | 100 | 90 | 100 | 100 | 100 | 92 | 100 | 100 | 100 | 100 | 100 |
LT | 0 | 0 | 0 | n/a | 82 | 64 | 50 | 50 | 92 | 100 | 100 | 100 | 99 | 100 | 100 | 100 | 100 | 100 |
MCJ | 1 | 0 | 9 | 18 | n/a | 52 | 59 | 50 | 54 | 89 | 47 | 90 | 94 | 100 | 97 | 100 | 95 | 100 |
CC | 2 | 2 | 7 | 36 | 48 | n/a | 50 | 42 | 32 | 98 | 15 | 100 | 95 | 100 | 100 | 100 | 100 | 100 |
JS | 50 | 50 | 50 | 50 | 41 | 50 | n/a | 50 | 50 | 50 | 50 | 100 | 79 | 50 | 50 | 50 | 100 | 100 |
ME | 0 | 0 | 0 | 50 | 50 | 58 | 50 | n/a | 4 | 50 | 82 | 100 | 69 | 100 | 100 | 100 | 100 | 100 |
DW | 0 | 0 | 10 | 8 | 46 | 68 | 50 | 96 | n/a | 14 | 86 | 89 | 69 | 100 | 97 | 100 | 61 | 100 |
DS | 0 | 0 | 0 | 0 | 11 | 2 | 50 | 50 | 86 | n/a | 47 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
MW | 0 | 50 | 0 | 0 | 53 | 85 | 50 | 18 | 14 | 53 | n/a | 50 | 62 | 100 | 100 | 100 | 100 | 100 |
JR | 0 | 0 | 0 | 0 | 10 | 0 | 0 | 0 | 11 | 0 | 50 | n/a | 46 | 50 | 50 | 100 | 100 | 100 |
SW | 0 | 0 | 8 | 1 | 6 | 5 | 21 | 31 | 31 | 0 | 38 | 54 | n/a | 25 | 55 | 32 | 75 | 75 |
LP | 0 | 0 | 0 | 0 | 0 | 0 | 50 | 0 | 0 | 0 | 0 | 50 | 75 | n/a | 50 | 50 | 100 | 50 |
FW | 0 | 0 | 0 | 0 | 3 | 0 | 50 | 0 | 3 | 0 | 0 | 50 | 45 | 50 | n/a | 50 | 100 | 50 |
CG | 0 | 0 | 0 | 0 | 0 | 0 | 50 | 0 | 0 | 0 | 0 | 0 | 68 | 50 | 50 | n/a | 50 | 50 |
SJN | 0 | 0 | 0 | 0 | 5 | 0 | 0 | 0 | 39 | 0 | 0 | 0 | 25 | 0 | 0 | 50 | n/a | 100 |
AB | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 25 | 50 | 50 | 50 | 0 | n/a |
If you prefer shiny pictures to dry numbers, take a look at the MC Jr's collection of games rendered as GIFs, but consider yourselves warned: each game is between 600 kB and 3.5 MB. They are, however, very pretty.
The many ‘50’ entries typically stem from the situation where two deterministic strategies play against each other and each strategy wins when it is the Red player and loses when it is the Blue player, as the Red player has a small advantage.
The MC Jr can now also finally give the game its proper name: it is called Hex and was invented by the Danish mathematician Piet Hein and (independently) the famous John Nash. By a standard strategy-stealing argument, it can be shown that the red player has a winning strategy on any rhomboidal board, but for the board size of 11×11, it is not yet known what this strategy is. Some good heuristics for Hex AIs exist, and some competitors implemented some of them to varying success. Let us go through the different approaches:
The last 5: most of the strategies in the last quarter of the Top 18 do very little analysis of the game field and simply pick the first valid move according to a predetermined pattern that is, for varying reasons, slightly better than picking moves at random. As can be seen from the table above, these strategies perform poorly against the more sophisticated strategies in the Top 10, but, mind you, even the last place was worth 30 points this time.
Sebastian Weiß's strategy is already slightly more interesting: in every move, it attempts to place a stone next to one of its other stones along the principal axis (i.e. left-right if red, top-bottom if blue). This strategy is good enough to win against the random strategy nearly every time, but it still does not take into account the opponent's moves at all.
With six lines of code and a solid tenth place, Joshua Rombauer wins the award for the best result with the smallest effort: his strategy, which the MC Jr calls the ‘Goldilocks strategy’, is but a tiny variation of the trivial strategy also realised in the Musterlösung (i.e. performing the first valid move), and that variation is in the order in which the possible positions are traversed: along the principal axis and preferably close to the middle.
Martin Wauligmann finally implemented some reaction to the opponent's play, and this leads to a 400-point distance between his ninth place and the tenth place behind him. If he has a winning move, he plays it. If his opponent has a winning move, he tries to block it. (The MC Jr will refer to this as the ‘winning move analysis’ from now on.) If neither is the case, he tries to place a stone along the top-left-to-bottom-right diagonal, which is an interesting choice, but does lead to a strange situation where it beats the much more sophisticated solution by Joachim Breitner in second place. (Wauligmann/Breitner)
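For illustration, here is a minimal sketch of such a winning move analysis (not Mr Wauligmann's actual code; the helpers freeFields and winsAt as well as the fallback heuristic are merely assumed to exist):

import Data.List (find)

data Player = Red | Blue deriving (Eq, Show)
type Pos    = (Int, Int)

-- Sketch only: the three function arguments (free fields, immediate-win test,
-- fallback heuristic) are assumptions, not part of any actual submission.
winningMoveAnalysis
  :: (board -> [Pos])                  -- all still-empty fields
  -> (Player -> board -> Pos -> Bool)  -- does placing a stone here win immediately?
  -> (board -> Pos)                    -- heuristic used when nothing is urgent
  -> Player -> board -> Pos
winningMoveAnalysis freeFields winsAt fallback me board =
  case find (winsAt me board) candidates of
    Just win -> win                              -- play our own winning move
    Nothing  -> case find (winsAt opponent board) candidates of
                  Just threat -> threat          -- otherwise block the opponent's
                  Nothing     -> fallback board  -- otherwise fall back to a heuristic
  where
    candidates = freeFields board
    opponent   = if me == Red then Blue else Red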
Daniel Schubert turned the sophistication knob all the way up to eleven with his solution. He wrote a neural network implementation and a simple AI and then trained the network to play against the AI (for 16 hours (!)). The MC Jr does not know the first thing about neural networks and admires Mr Schubert's enthusiasm and dedication, but since Martin Wauligmann's simple heuristic was nearly as good as the neural network, neural networks are probably not the best approach for this problem. The fact that it was only trained to play against one particular strategy may also have been a factor. He wrote in his comments: ‘As this is my first neural network etc. please don't expect too much from it. I'm pretty sure you will find a better setup for it that gives us way better results’
That may or may not be the case, but unfortunately, the MC Jr does not have the time to improve the competitor's solutions for them, especially if he does not understand them. Besides, it would very much go against the spirit of the Wettbewerb.
David Widmann implements the winning move analysis plus a very eager bridge-building algorithm. Bridges are very useful, since they can be used to safely connect stones; however, David Widmann's strategy places them on the board somewhat indiscriminately, limiting their effectiveness (Schubert/Widmann). It would have been better to place a small number of bridges along the principal axis and then connect them. Regardless, David Widmann's strategy performs remarkably well.
Similarly, Maximilian Endraß also implements a winning move analysis (function names canIwin and dontLetHimWin) plus something with bridges. It appears that he always first tries to set up a bridge and then close it, but like David Widmann, he does not orient his bridges along the principal axis, so he extends his paths somewhat aimlessly, but with good results.
Julian Spahl employs a strangely asymmetric strategy: as the Blue player, he simply takes the first valid move. As the Red player, he seems to just generally go down and right. Surprisingly, this very simplistic strategy works extremely well, winning against almost all of the other strategies almost every time as the Red player (Spahl/Madge Pimentel), but losing almost all the time as the Blue player. (Madge Pimentel/Spahl)
Chaoran Chen's solution is a bit of a mystery to the MC Jr. There seems to be some sort of bounded exploration of the game tree in there, but the MC Jr is not quite sure how it works exactly.
The MC Jr's solution employs the winning move analysis plus Dijkstra's algorithm to compute a shortest path from one side of the board to the other and places a stone on a random field of some shortest path. This works reasonably well, but could work a lot better if it tried to employ some more defensive tactics and/or bridge-building as well, like some of the other strategies did.
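To give a rough idea of the graph such a shortest-path strategy searches, here is a small sketch (names made up, not the actual code) of the hex neighbourhood and the step costs: stepping onto one's own stone is free, an empty field costs one stone, and enemy stones are impassable.

-- Sketch only: the field representation and helper names are assumptions.
data Field = Empty | Mine | Theirs deriving (Eq, Show)

-- The six neighbours of a cell on a rhomboidal hex board (axial coordinates).
neighbours :: (Int, Int) -> [(Int, Int)]
neighbours (x, y) =
  [ (x + dx, y + dy)
  | (dx, dy) <- [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)] ]

-- Cost of stepping onto a field; Nothing marks an impassable field, so a
-- shortest path effectively counts how many stones still have to be placed.
stepCost :: Field -> Maybe Int
stepCost Mine   = Just 0
stepCost Empty  = Just 1
stepCost Theirs = Nothing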
Lukas Tenbrink implemented a complex bridge-building algorithm that, similarly to the previous bridge-building algorithms, tries to set up many bridges along its principal axis and then close them. It appears to also ‘defend’ its bridges by closing them immediately when the opponent endangers them by putting a stone next to a bridge. (Tenbrink/MC Jr) The great weakness of this strategy is that it attempts to construct paths that would need to run through an already existing enemy path. (Tenbrink/Breitner)
Christoph Rudolf's strategy is very similar to Lukas Tenbrink's, but also implements the winning move analysis. Its greatest weakness seems to be that one can easily cause it to divert its bridge-building efforts from the principal axis, effectively ‘running its path into the wall’. (Rudolf/Madge Pimentel)
Joachim Breitner pursues the following interesting idea and seems to do well with it:
-- * Consider all shortest paths of opponent, measured in number of pieces
--   that he has to place to connect his two sides.
-- * Pick the field where _most_ of these paths pass through.
-- * If there are multiple, pick one where most of our own shortest paths
--   pass trough
-- * If there are still multiple choicse, pick one closest to the center.
--   (this is mostly relevant when starting the game)
Fabio Madge Pimentel also seems to use something with graph algorithms, shortest paths, and analysing the best paths for both players to follow and where they collide, but his code is too complex and too sparsely commented for the MC Jr to understand in more depth. Another curiosity is that, despite the Red player having an advantage in Hex, when Mr Madge Pimentel plays against Mr Breitner, each of them wins against the other as the Blue player. (Madge Pimentel/Breitner, Breitner/Madge Pimentel)
The top 2 solutions by Fabio Madge Pimentel and Joachim Breitner win against most other strategies almost all of the time, but intriguingly, as the Blue player, they lose against the 7th-ranked Julian Spahl every single time. (Spahl/Madge Pimentel, Spahl/Breitner)
To summarise:
- More complex and sophisticated solutions do not necessarily perform better than simple ones.
- If you know how to use them, libraries (such as fgl) can be a powerful tool, at least when they don't produce stack overflows.
- While the rules of Hex are arguably even easier than those of Othello, it seems to be quite a bit more difficult to write a reasonable strategy for it.
- Interestingly, performance wasn't an issue at all. All entries (except for the ones that looped indefinitely) finished their games within a second or so.
- The MC Jr seems to be prone to underestimating the difficulty of his Wettbewerbsaufgaben.
- It seems that none of the students from Karlsruhe want to play with us, but Joachim Breitner certainly makes up for what they lack in competitive spirit. Perhaps Prof Snelting's lecture, unlike our Info 2, is just so demanding that nobody at Karlsruhe has any time left to play silly games. ;)
- Writing exercises that are both suitable as Hausaufgaben and Wettbewerbsaufgaben is difficult. Some students actually asked the MC Jr, full of disbelief, whether it really was enough for the Hausaufgaben to return any valid move. Yes, it was, and perhaps that was a bit too simple – but the MC Jr did not see a good alternative.
Die Ergebnisse der zehnten und elften Woche
Great. The MC Sr. is on permanent leave from TUM to pursue other endeavours. Additionally, the MC Jr. is on temporary leave to attend some C++ conference or something, speaking about some storms or whatever. With no one left to write the Wettbewerbsblog, some other guy (hereafter referred to as MC) had to bite the bullet, which made him somewhat grumpy. Happy new year, slackers!
Top 30 der Woche | |||
---|---|---|---|
Platz | Wettbewerber(in) | Punkte | |
1. | Felix Thimm | 60 | |
2. | Christoph Griesbeck | 58 | |
3. | David Widmann | 56 | |
4. | Michael Kubitza | 54 | |
5. | Alexander Küchler | 52 | |
6. | Fabio Madge Pimentel | 50 | |
7. | Chaoran Chen | 48 | |
Sebastian Josef Neubauer | 48 | ||
Lukas Tenbrink | 48 | ||
Jonas Heintzenberg | 48 | ||
11. | Martin Wauligmann | 40 | |
12. | Simon Rehwald | 38 | |
13. | Joshua Rombauer | 36 | |
14. | Aser Abdelrahman | 34 | |
15. | Alexander Book | 32 | |
16. | Christoph Rudolf | 30 | |
17. | Lucas Maximilian Stoffl | 28 | |
18. | Adrian Philipp | 26 | |
19. | Dario Beckert | 24 |
The ranking criterion was primarily efficiency, and secondarily beauty of the code. Of course, only correct solutions have been included in the ranking. To increase confidence in the correctness, the MC not only reran the official QuickCheck tests, but also generated some larger example formulas and compared the results of the contestants with the enterprise-grade SAT solver sat4j. (The MC also tried running other programs submitted to the MaxSAT Competition, but they failed to compile, failed to run or generally failed to exist, which made him grumpy.)
In total, 24 programs have been submitted to the Wettbewerb. Two failed the official tests and three more failed the MC's extended tests. Sadly, Herr Lohner's submission failed the official tests for some corner cases (empty clauses); otherwise, it would've been very competitive. It uses linear programming once again and beats the heck out of all other submissions on almost all example formulas. (For some formulas, it's even faster than the enterprise-grade SAT solver.) The MC therefore awards Herrn Lohner a lobende Anerkennung.
After eliminating the five incorrect submissions, we're left with 19 (hopefully) correct submissions. Let's get ranking.
Benchmark 1: pairs of variables
The MC used the following Haskell code to generate formulas of the shape {{x, y}, {-x, y}, {x, -y}, {-x, -y}}.
bench1 k = concatMap mkGroup [1..k]
  where
    mkGroup i = group (2*i) (2*i+1)
    group i j = [[-i,j],[i,j],[-j,-i],[-j,i]]
The parameter k denotes the number of pairs of variables to generate. The result is always a formula with 2 · k variables and 4 · k clauses. For all assignments, there are 3 · k satisfied clauses. This formula is nice because it invites one particular optimization: the separation of independent subformulas.
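As an illustration of that technique (a sketch with made-up names, not code from any submission), one could split a formula into variable-disjoint components like this and then maximise each component separately, summing up the partial optima:

import Data.List (nub, partition)

type Literal = Int
type Clause  = [Literal]

-- All variables occurring in a list of clauses.
varsOf :: [Clause] -> [Int]
varsOf = nub . map abs . concat

-- Split a formula into groups of clauses such that no variable occurs in two
-- different groups; each group is an independent MaxSAT instance.
-- For instance, components (bench1 3) yields three groups of four clauses each.
components :: [Clause] -> [[Clause]]
components []       = []
components (c : cs) = grow [c] cs
  where
    grow done rest
      | null new  = done : components rest'
      | otherwise = grow (done ++ new) rest'
      where
        vs           = varsOf done
        (new, rest') = partition (any ((`elem` vs) . abs)) rest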
All but three contestants finished in under 0.5s for k = 8. The MC decided to eliminate those three at this point before continuing, securing them ranks 17 to 19. Funnily enough, one submission features the following, extremely optimistic, but also misguided claim:
-- Not bad...looks like a working solution...not sure if better than brute force, though...looks better in any case
-- It should be O(|vars|log(|vars|)) (due to sorting), if I see it correctly, while brute force would be O(2^|vars|)(?)
The answer is: No. The MC would like to point out that MaxSAT is an NP-hard problem, so a solution is likely exponential in the general case. Two years ago, the Meta-Master Tobias N. offered a Doktortitel for a polynomial solution to the SAT problem, but to no avail. Looks like he can't give one out this year either, you lazy bums.
The MC had to kill one program after it consumed 10 GB of memory, which made him grumpy. Looking at the source, the author apparently tried to be too clever. Look, the MC – of all people – knows that some of the functions in the standard library are inefficient (he's looking at you, nub), but if you don't even test your reimplementations against large inputs where they're leaking space like they're Edward effin Snowden, that'll only make the MC grumpy. You do not want the MC to be grumpy because of you.
Christoph Rudolf really excelled in this benchmark, even for large values of k. He also beat the enterprise-grade SAT solver by some orders of magnitude. Unfortunately, this did not carry over to any other benchmark.
Benchmark 2: all but one satisfied
Next, the MC wrote this piece of code to generate formulas whose optimal solution satisfies all but one clause:
bench2 k = map mkClause [1..k] ++ [finalClause]
  where
    mkClause i  = i : [-i+1 .. -1]
    finalClause = map negate [1..k]
For example, with k = 3, the resulting formula is {{1},{2,-1},{3,-2,-1},{-1,-2,-3}}. Such formulas were also used two years ago in the SAT competition.
To the MC's surprise, all 16 remaining contestants performed quite well (< 0.5s) in this benchmark for k = 15. He decided to crank up the difficulty – not just to eleven, but to k = 20. In that case, execution times ranged from 0.40s to 27.52s.
Benchmark 3: forged from research
Did the MC already mention the MaxSAT competition? Well, they have a website, and they offer their test data for download. The MC picked the smallest (around 60 variables) example from the crafted category (meaning: formulas designed in a way to exhibit worst-case runtime behaviour). The enterprise-grade SAT solver solved it instantaneously, whereas no contestant could solve it in less than 30s. The MC became grumpy and declared this benchmark a failure.
Benchmark 4: forged from randomness
Lastly, the MC let his computer come up with some formulas. They're not as pretty as artisanal, hand-crafted ones, but they'll do. Of course, all submissions were run on the exact same formulas.
All random formulas consist of n variables and k clauses, with each clause composed of two to seven literals. For the first two formulas with n = 10, all submissions were very fast (< 1s). The MC decided to make it more difficult and produced two more: (n, k) = (15, 200) and (n, k) = (17, 300). The enterprise-grade SAT solver took around 9s for the last one.
Benchmarking results
Here are the raw results from the above benchmarks. Every benchmark has two columns: the absolute running time of each submission in seconds and its relative running time compared to the longest-running submission.
Bench 1 | Bench 2 | Bench 4a | Bench 4b | |||||
---|---|---|---|---|---|---|---|---|
Michael Kubitza | 6,90 | 50% | 4,47 | 16% | 1,07 | 44% | 6,64 | 42% |
Martin Wauligmann | 11,02 | 80% | 6,72 | 24% | 1,49 | 61% | 9,78 | 62% |
Alexander Küchler | 7,56 | 55% | 4,47 | 16% | 1,13 | 47% | 7,25 | 46% |
Fabio Madge Pimentel | 8,36 | 61% | 0,40 | 1% | 0,87 | 36% | 10,78 | 68% |
Alexander Book | 12,87 | 94% | 10,83 | 39% | 1,88 | 77% | 12,19 | 77% |
Christoph Rudolf | 0,01 | 0% | 27,52 | 100% | 2,43 | 100% | 15,80 | 100% |
Sebastian Josef Neubauer | 7,84 | 57% | 5,18 | 19% | 1,26 | 52% | 7,87 | 50% |
Felix Thimm | 1,27 | 9% | 0,18 | 1% | 0,10 | 4% | 0,57 | 4% |
Chaoran Chen | 7,99 | 58% | 5,02 | 18% | 1,23 | 51% | 7,89 | 50% |
Lukas Tenbrink | 8,35 | 61% | 5,54 | 20% | 1,24 | 51% | 8,08 | 51% |
Joshua Rombauer | 11,76 | 86% | 7,32 | 27% | 1,67 | 69% | 10,89 | 69% |
David Widmann | 6,56 | 48% | 4,15 | 15% | 0,97 | 40% | 6,41 | 41% |
Christoph Griesbeck | 2,84 | 21% | 0,68 | 2% | 0,37 | 15% | 2,43 | 15% |
Aser Abdelrahman | 13,69 | 100% | 7,96 | 29% | 1,83 | 75% | 11,95 | 76% |
Jonas Heintzenberg | 8,27 | 60% | 5,29 | 19% | 1,27 | 52% | 8,33 | 53% |
Simon Rehwald | 11,77 | 86% | 6,98 | 25% | 1,52 | 63% | 10,13 | 64% |
sat4j | 0,53 | 10,72 | ||||||
Denis Lohner | 0,01 | 0,01 | 0,16 | 1,21 |
The MC assigned the ranks 1-16 by comparing the average of the relative running times. Some of the submissions are really close in that respect, so the secondary criterion kicks in: beauty of the code. The four submissions sharing rank 7 are equal in that respect: concise and readable, but no comments. Don't do that.
Finally, let's have a look at the winning entry by Felix Thimm. It scored best in all benchmarks but the first one (where it scored second-best):
vars :: [Clause] -> [Var]
vars clauses = nub [abs l | l <- (concat clauses)]

isSatisfied :: Clause -> Assignment -> Bool
isSatisfied (l:ls) ass
  | l < 0     = notElem (-l) ass || isSatisfied ls ass
  | otherwise = elem l ass || isSatisfied ls ass
isSatisfied _ _ = False

countSatisfied :: [Clause] -> Assignment -> Int
countSatisfied (c:cs) ass =
  if isSatisfied c ass then 1 + countSatisfied cs ass else countSatisfied cs ass
countSatisfied _ _ = 0

maxSat :: [Clause] -> (Assignment, Int)
maxSat []      = ([], 0)
maxSat clauses = (litsToAss a, b)
  where (a, b) = msh clauses (vars clauses)

litsToAss :: [Literal] -> Assignment
litsToAss (l:ls) = if l < 0 then litsToAss ls else (l : litsToAss ls)
litsToAss _      = []

msh :: [Clause] -> [Var] -> ([Literal], Int)
msh _ []   = ([], 0)
msh cs [v] =
  if (snd (removeSatisfied cs v)) > (snd (removeSatisfied cs (-v)))
    then ([v],    snd $ removeSatisfied cs v)
    else ([(-v)], snd $ removeSatisfied cs (-v))
msh cs (v:vs) =
  if (snd $ removeSatisfied cs v) + y2 > (snd $ removeSatisfied cs (-v)) + n2
    then ((v : y1),    (snd $ removeSatisfied cs v) + y2)
    else (((-v) : n1), (snd $ removeSatisfied cs (-v)) + n2)
  where
    (y1, y2) = msh (fst $ removeSatisfied cs v) vs
    (n1, n2) = msh (fst $ removeSatisfied cs (-v)) vs

removeSatisfied :: [Clause] -> Literal -> ([Clause], Int)
removeSatisfied [] _     = ([], 0)
removeSatisfied (c:cs) l = if elem l c then (a, b + 1) else ((c : a), b)
  where (a, b) = removeSatisfied cs l
No comments. I repeat: no comments. This makes the MC grumpy. What is even going on there? What does msh even mean? The MC literally can't even
So what about the second-best entry? Please let it have comments. Please. The MC is pleased to see that Christoph Griesbeck put some comments in there. Finally! The author made an interesting optimization for the assignment check, and gave it a funny name:
--faster than isSatisfied but expects the clause and the assignment sorted by abs
isSanta :: Clause -> Assignment -> Bool
isSanta [] _          = False
isSanta (x:xs) []     = (x < 0) || isSanta xs []
isSanta (x:xs) (y:ys)
  | abs x == abs y = h x y || isSanta xs ys
  | abs x > abs y  = isSanta (x:xs) ys
  | otherwise      = (x < 0) || isSanta xs (y:ys)
  where h l v = l + v /= 0
(the formatting is messed up because – get this – the author used tabs in his sources)
Alright, this concludes this week's blog. The most important lesson this week is: For the love of your favoured deity, put some comments in your code! Oh, and don't make false promises in them :-) The bonus lesson is that even with little effort, you can get good results and even beat enterprise-grade SAT solvers (if only in certain cases).
Die Ergebnisse der zwölften Woche
This week, the Wettbewerbsauswertung was in the hands of one of the MC Jr.'s seniors (not to be confused with the MC Sr. or with Junior Senior).
Update: Due to a regrettable mistake, the wrong student got assigned rank 8. (They didn't even participate in the competition.) Apparently, the MC Jr.'s senior made a mistake while juggling submission IDs. Sorry! The rankings have now been updated accordingly.
Top 30 der Woche | |||
---|---|---|---|
Platz | Wettbewerber(in) | Punkte | |
1. | Martin Wauligmann | 30 | |
2. | David Widmann | 29 | |
3. | Joshua Rombauer | 28 | |
4. | Maximilian Endraß | 27 | |
5. | Lukas Prantl | 26 | |
6. | Thomas Günzel | 25 | |
7. | Fabio Madge Pimentel | 24 | |
8. | Chaoran Chen | 23 | |
9. | Yinshui Chang | 22 | |
Jonas Heintzenberg | 22 | ||
11. | Julian Spahl | 20 |
At first glance, the number of submissions this week was on par with the week before. Twenty students, poised for the trial, all disquietude silenced by the reassuring green of the testing system. The MC Jr.'s Sr., however, was all but flabbergasted when a bit of scrutiny revealed a substantial omission: every one of these twenty submissions produced legitimate anagrams, but only half of them were composed carefully enough to always find an anagram if such a thing existed.
An eleventh submission also failed this trial, maybe because its author was overly meticulous: Julian Spahl deemed it unbefitting for an anagram to contain any of the fragments making up the original phrase. And he was right about that! But, to adapt the sayings of a most acclaimed fictional detective: „How often have I said to you that when you have eliminated the impossible, whatever remains, however dull, must be the result?“ Assuming that this choice was made deliberately, out of an overwhelming need for style, the MC Jr.'s Sr. mercifully awarded him the eleventh place.
As promised, scalability is of utmost importance to the MC Jr.'s Sr. And while every student ought to expect the unexpected, the MC Jr.'s Sr. felt the surviving submissions deserved a bit of rest after the startling loss of half their competitors. Therefore (and definitely not out of laziness) the MC Jr.'s Sr. decided to start with the examples from the Aufgabenblatt, giving an ample time limit of 120s and, with 8 Gibibytes, a considerable amount of memory to generate English anagrams for „Jasmin Blanchette“:
"e e n n a a t t i l s m h j b c" "a a b c e e h i j l m n n s t t" "anaesthetic lbj mn" "anaesthetic lbj mn" .KILLED. "anaesthetic lbj mn" "bc haj sentimental" "a a b c e e h i j l m n n s t t" "a a b c e e h i j l m n n s t t" .KILLED.
While Julian Spahl will be appalled by the lack of style of the one-letter sequence, and others may wonder why an English word list contains words like „lbj“ and „mn“, a third group will imagine the picture of a sad kitten, missing its cheezburger, bc haj sentimental. Despite this amusing imagery, the MC Jr.'s Sr. knows that this is not the time to wonder (about style), nor to listen to Fury in the Slaughterhouse, and swiftly demotes both candidates failing to deliver a result (Jonas Heintzenberg and Yinshui Chang) to the ninth place in the weekly rating.
Considering the second example, i.e. English anagrams of „Master of Competition“, does not help the MC Jr.'s Sr. to kick out any further candidates.
"e e n a t t t i i o o o r s m m p c f" "a c e e f i i m m n o o o p r s t t t" "commiseration poet ft" "commiseration poet ft" "commiseration poet ft" "af memo protectionist" "a c e e f i i m m n o o o p r s t t t" "a c e e f i i m m n o o o p r s t t t"
Looking at the result, it becomes clear that two sets of candidates share the same idea of what constitutes a good anagram, although the MC Jr.'s Sr. is quite unsure whether there exists anybody else who considers „a c e e f i i m m n o o o p r s t t t“ to be an exemplary member of the anagram society. This abundant use of single letters is a despicable way to generate anagrams. Therefore, the MC Jr.'s Sr. decided to force the participants to make an honest attempt. If he required anagrams for „Haskell ist eine wunderschoene funktionale Programmiersprace“, consisting only of German words of length 5 or more, how would the contenders hold up?
"tenne tenne daene seele arosa kairo irrig fromm klipp schuh schwul" .KILLED. "persoenlichkeitsentwicklung parfuemhaendler rosenheim arosa" "persoenlichkeitsentwicklung parfuemhaendler rosenheim arosa" "persoenlichkeitsentwicklung parfuemhaendler rosenheim arosa" "areal hafer sauer endomorphismen persoenlichkeitsentwicklung" "aechten aechten aegide aehre eprom fluor knirps linus worms links" "druckfaehig kleptomanisch wahltermin liane epson neuer roesser"
Who would have thought? With persuasion and just a little pressure, the results of most of the candidates improved considerably. For some reason, this brings the yearly discussion of whether or not to offer a Hausaufgabenbonus back to the MC Jr.'s Sr.'s mind. Chaoran Chen's submission drops out in this round, earning him the eighth place.
As a last round, the MC Jr.'s Sr. decided on something a tad self-referential: finding an anagram consisting of three words for the word „Dreiwortanagrammsuche“, as fast as possible. The word consists of 21 letters, so the three-word limit can be enforced by allowing no words of length five or less in the word list (four words of at least six letters would already require 24 letters).
0m00.390s: "aemter waschraum dornig"
0m00.462s: "armada erogen wurmstich"
0m01.156s: "daenisch geraum wortarm"
0m04.151s: "anrede wortarm schaumig"
0m13.439s: "amsterdam gaucho wirren"
0m48.074s: "amsterdam gaucho wirren"
1m33.111s: "amsterdam gaucho wirren"
These results (now ordered by time instead of by submission id) directly translate into the Top Seven of this week. It is refreshing to see that the winning submission is both short and well-structured:
anagram :: [String] -> String -> Maybe String
anagram ws p
  | null solutions = Nothing
  | otherwise      = Just $ space $ head solutions
  where
    solutions = anag words phrase
    words     = map lower ws
    phrase    = lower $ filter isAlpha p
    space     = intercalate " "
    lower     = map toLower

    anag _ []         = [[]]
    anag words phrase = [w : a | w <- pWords, a <- anag pWords $ phrase \\ w]
      where
        pWords  = filter isSub words
        isSub w = Map.isSubmapOfBy (<=) (toBag w) pBag
        toBag w = Map.fromListWith (+) $ zip w $ repeat 1
        pBag    = toBag phrase
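For illustration, a hypothetical GHCi session (with a made-up three-word list, not the official one):

ghci> anagram ["listen", "silent", "tinsel"] "Enlist!"
Just "listen"
ghci> anagram ["listen", "silent", "tinsel"] "Info zwei"
Nothing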
The lesson of this week is one which probably does not need to be repeated for the ten hopeful contenders who prematurely dropped out of this week's evaluation: just because the tests succeed, there is no guarantee that your solution is correct. Write your own tests for your code. And if that is not enough for you after this humiliation, prove your code correct! Ask us about Loom Isabelle.
Die Ergebnisse der dreizehnten Woche
Finally, the MC Jr has returned from his visit to India, and he brought with him not only a nasty cold, but also a renewed appreciation for the Straßenverkehrsordnung, the thousands of behördliche Lebensmittelkontrolleure, and similarly German things.
Anyway, onwards to this week's Wettbewerbsaufgabe, which Lars Hupel stole from the 23rd BWINF (Round 1, Exercise 5). The MC Jr was mildly shocked to find that only 6 (!) submissions passed the automatic tests of the Abgabesystem this week. While somewhat disappointing, this did make the evaluation a lot easier.
The MC Jr decided to use the multi-round elimination approach pioneered by the MC Sr in his evaluation of the SAT exercise two years ago: all the submissions are compiled with -O and thrown at some reasonably interesting problem instance with the hope that their running times differ enough to eliminate some of them, until some ranking is obtained.
Round 1
For the first round, we use a pathological example in which we add n rectangles in such a way that each of them intersects all previous ones, leading to a total of O(n^4) rectangles. (To be precise: OEIS A213840, but do not ask the MC Jr why.)
test :: Integer -> [Rectangle]
test n = foo n 1
  where
    foo 0 _ = []
    foo n h = ((-n, -h), (n, h)) : foo (n - 1) (h + 1)
We start off with n = 20, leading to the following results:
0.03 Adrian Philipp
0.05 Lars Hupel
0.09 Martin Wauligmann
0.09 Christoph Rudolf
0.15 Sebastian Neubauer
1.2 Chaoran Chen
-----------------------------------------
6.2 Fabio Madge Pimentel
The MC Jr completely arbitrarily placed the cutoff at 5 seconds, kicking Mr Pimentel out of the race. (but, mind you, even this last place this week was worth a remarkable 25 points)
Round 2
Seeing as the Info 2 team is often criticised for not being ‘real-world’ enough, the MC Jr decided to next test a property that is very important in real-life programs: how they deal with a large amount of rubbish (i.e. irrelevant data). The following produces 2n rectangles in such a way that every rectangle overlaps with precisely one other rectangle, leading to a total of 6n rectangles in the result.
foo :: Integer -> [Rectangle]
foo n = [((2*i, 0), (2*i+1, 2)) | i <- [0..n-1]] ++
        [((2*i, 1), (2*i+1, 3)) | i <- [0..n-1]]
The MC Jr chose n = 1000. Now the results look quite a bit different; two solutions that performed quite well on the pathological example now perform abysmally, namely Sebastian Neubauer's and Lars Hupel's:
0.5 Martin Wauligmann
1.2 Adrian Philipp
2.2 Christoph Rudolf
-----------------------------------------
14 Chaoran Chen
69 Lars Hupel
193 Sebastian Neubauer
Round 3
After this foray into the so-called ‘real world’, let us return to the realm of theory and contrived examples, where the MC Jr feels more at home. The next example is courtesy of Lars Hupel:
checkerboard :: Integer -> [Rectangle]
checkerboard n = concatMap item [0..n-1]
  where item k = [((0, k), (n, k + 1)), ((k, 0), (k + 1, n))]
This creates n long horizontal and n long vertical stripes, leading to a checkerboard pattern with a total of (n·(n+1)/2)^2 rectangles in the result. (OEIS A000537, this one is actually pretty easy)
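(For the n = 80 used below, that already amounts to (80·81/2)^2 = 3240^2 = 10 497 600 rectangles.)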
We choose n = 80 and get:
1 Adrian Philipp
8 Martin Wauligmann
21 Christoph Rudolf
Seeing as there is a distinct gap between each of the three remaining contestants, the MC Jr concluded that he had reached the end of his evaluation and corroborated this by running the submissions on the pathological example from round 1, but with 80 rectangles:
5 Adrian Philipp
21 Martin Wauligmann
68 Christoph Rudolf
Fair enough.
Top 30 der Woche | |||
---|---|---|---|
Platz | Wettbewerber(in) | Punkte | |
1. | Adrian Philipp | 30 | |
2. | Martin Wauligmann | 29 | |
3. | Christoph Rudolf | 28 | |
4. | Chaoran Chen | 27 | |
5. | Sebastian Neubauer | 26 | |
6. | Fabio Madge Pimentel | 25 |
So how do the solutions work? While most of them are almost completely incomprehensible to the MC Jr due to the lack of (helpful) comments, they generally seem to build a list of all the horizontal and vertical lines and then compute all intersections of these lines. Some solutions (the ones that got kicked out in the first two rounds), however, take a more brute-force approach and enumerate all points within the bounding rectangle of the input, resulting in very poor performance for sparse inputs such as the one in the second round.
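A very rough sketch of that line-based idea (names made up, not taken from any competitor's code): collect the x-coordinates of all vertical edges and the y-coordinates of all horizontal edges, then keep those intersection points that lie on the boundary of at least one input rectangle.

import Data.List (nub)

type Point     = (Integer, Integer)
type Rectangle = (Point, Point)  -- ((x1, y1), (x2, y2)) with x1 <= x2 and y1 <= y2

-- The x-coordinates of all vertical edges and the y-coordinates of all
-- horizontal edges occurring in the input.
verticalLines, horizontalLines :: [Rectangle] -> [Integer]
verticalLines   rs = nub [x | ((x1, _), (x2, _)) <- rs, x <- [x1, x2]]
horizontalLines rs = nub [y | ((_, y1), (_, y2)) <- rs, y <- [y1, y2]]

-- Intersection points of these lines that actually lie on the boundary of at
-- least one input rectangle; these are the candidate corners of the output.
corners :: [Rectangle] -> [Point]
corners rs =
  [ (x, y)
  | x <- verticalLines rs
  , y <- horizontalLines rs
  , any (onBoundary (x, y)) rs ]
  where
    onBoundary (x, y) ((x1, y1), (x2, y2)) =
         (x `elem` [x1, x2] && y1 <= y && y <= y2)
      || (y `elem` [y1, y2] && x1 <= x && x <= x2)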
Interestingly, it is again the case that the most complicated solutions were also the slowest ones (and that is not counting all the complicated ones that did not pass the automated tests due to subtle bugs). In particular, Fabio Madge Pimentel's solution performs some sort of elaborate clustering of rectangles to speed up computations on sparse inputs, but unfortunately, it appears that the remainder of his code has a huge performance bottleneck somewhere – the MC Jr was unable to find out where exactly.
Lessons for this week:
- The MC Jr's exercises may be difficult, but Lars Hupel's exercises are not child's play either. If the elder MC's exercises really are easier than the MC Jr's, it may very well be just a case of Altersmilde on his side.
- The exercises of the past few weeks have truly separated the men from the boys, the women from the girls, the large furry creatures from Alpha Centauri from the small furry creatures from Alpha Centauri, and everyone else who really knows their way around Haskell and has too much spare time on their hands from everyone else who does not.
- Writing a 100% correct solution may be hard, but even if it is as slow as hell, you can easily get 20 points or more for it when competition is as sparse as it is in these last few weeks. A lot can change yet in the leaderboard!
Die Ergebnisse der vierzehnten Woche
For those who missed the glorious Preisverleihung ceremony (for which there really is no excuse), the MC Jr shall present the 11 submissions, nay, works of art he received this week on this blog as well. And, truly, there are some very æsthetically pleasing, but also some very abstract and provocative pieces among them. The MC Jr shall employ his entire knowledge of the arts (i.e. next to none) to break down the essence of the more cryptic works of art for the laypeople.
It is always difficult to put a numeric value on art, but we did it nevertheless. The submissions were judged by the following highly-renowned panel of art critics:
- Meta-Master Prof. Tobias Nipkow, PhD
- MC Jr Manuel Eberl, MSc
- Grumpy MC Lars Hupel, MSc
Each critic assigns between 0 and 10 points to each submission for its æsthetic and artistic qualities. Additionally, the MC Jr also assigned between 0 and 10 points for the technique used (complexity and quality of the code, etc.). The sum of the artistic scores was multiplied by ⅔ and added to the technical score, yielding a total maximum of 30 points, as usual.
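For example, artistic scores of 7, 8 and 9 plus 6 technical points yield (7 + 8 + 9) · ⅔ + 6 = 16 + 6 = 22 points.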
The MC Jr was not entirely satisfied with the names of the contestants; he found that they simply did not sound artsy enough to seriously follow in the footsteps of the great van Gogh, so he took the liberty to fashion them into suitable noms de plume.
So, let us have a look at the submissions themselves.
11th place
The first work of art is a simple one by the artist Michiel van der Eeder.
This work clearly shows the author's fixation on the bat motif. For the anagram problem, van der Eeder submitted a function that always returns ‘Nananananana BATMAN’, and this time, too, his code is full of various Batman references.
The meaning of this work is subtle. In European culture, bats have a very negative connotation; they are considered impure animals in the Bible and are associated with demons and hell. In stark contrast to this, the bat in this picture is drawn with very clear and precise lines, perfectly symmetrical, and white, not black – almost angelic in appearance. Considering that other cultures have far more positive associations with bats (e.g. a symbol of luck and success in China), it is clear that the artist's intention was to break with the symbolism of European culture, to do away with Eurocentrism and open the viewer's mind to other cultures' ideas.
The technique used here is a clever (and lazy) one, namely the old gnuplot technique: gnuplot is invoked with the Batman curve as its input and generates the output picture. However, since the Batman curve was not invented by the artist and the implementation is quite simple, the critics' panel could not bring themselves to give out many points.
10th place
The next piece is a very simplistic one by Christoffel Grijsbeck.
This picture obviously draws from the post-war literary movement known as Trümmerliteratur. Just as the Trümmerliteratur authors rejected all but the most basic elements of language, Grijsbeck uses only the simplest of geometric objects: the straight line. He applies this approach even on the technical level, implementing his own rastering functions for lines instead of relying on fancy libraries to get the job done. While the critics found this purism quite admirable, they found this piece too inaccessible, too radical, too provocative and awarded it no points for artistic value. The MC Jr did, however, give it a respectable 4 technical points to reward the artist's effort.
9th place
Felix van Timmerman submitted the following more complex work:
This work is obviously based on the work of Bézier and de Casteljau. The first thing one notices in the picture is the duality of white and black, of inside and outside, of good and bad. This is the key to understanding this work: it is a very subtle criticism of the ‘black and white’ portrayal of issues in the media. The unusual choice of guiding points for the Bézier curves makes the curve bulge awkwardly in some places, emphasising the arbitrary nature of the definition of the borders between good and evil, as do the ‘twists’, where white becomes infinitely thin.
From a technical point of view, this work is, alas, not very interesting. The Bézier curves are plotted with Rasterific and the guiding points are chosen in a very simple (and strange) way. Due to the artistic simplicity of the picture, the critics decided to award 2 points in each category.
8th place
Of course, it was only a matter of time before someone did something with fractals. Sebastiaan van Wijs submitted the following piece based on the work by Wacław Sierpiński.
Note that the triangle is brightly-coloured and, in the second picture, highly deformed. The artist takes the ‘clean’ and mathematical original work and does with it whatever he pleases. This playfulness is somewhat reminiscent of the 1950s Pop Art movement.
Van Wijs also submitted another fractal-based picture that is more basic in its appearance.
The interpretation of this picture is a bit more difficult. The author calls it simply ‘Farn’, and it does resemble a plant. The key to understanding the message of this picture is that this fern, this plant, seems to grow infinitely, which is, of course, impossible in the natural world. On the other hand, it resembles a road (the Autobahn?) that extends into infinity. We have therefore concluded that the work is a criticism of wastefulness and consumerism (represented by the long road) and the infinite growth (as opposed to sustainability) it requires.
On a technical level, van Wijs employs an immense amount of linear algebra to render his points onto a mutable image provided by JuicyPixels. He claims that his transformations must always be affine, but unless the MC Jr misunderstood some fundamental part of the code, he suggests that the artist look up ‘affine’ again. Affine transformations always map lines to lines, and parallels to parallels, neither of which is the case in the second example. Anyway, the critics gave 3 points each for artistic value and the MC Jr rewarded the linear algebra and fractal rendering with 6 points.
7th and 6th place
Next, we have David van Wijdman and Martijn Wouligman with the obligatory Mandelbrot fractal.
With their intricate structure and colourfulness, Mandelbrot sets are always nice to look at, but they are a bit clichéd. Everybody has written a Mandelbrot rendering program at some point in their life – case in point: the third picture above is actually generated by a program the MC Jr wrote a year ago or so and which can also be found on his Github page. Still, they are nice to look at and not completely uninteresting on a technical level, so the artists received some 4s and 5s.
5th place
Next in the ranking is Fabio Madge van der Pijmentel:
The approach in this picture is similar to that of Grijsbeck (the one with the lines), but in a way even more radical: it rejects even lines and uses only small dots, reminiscent of the Neo-Impressionist movement known as Pointillism. The picture is, again, generated by pure mathematics: it arranges the natural numbers in a spiral and places a dot at every prime number, generating an infinite, impenetrable desert of points. However, even in this chaos, structure emerges: there are diagonal lines of points visible everywhere in the picture, each of which corresponds to a prime-rich quadratic polynomial sequence.
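For the curious, a little sketch of how such a spiral could be generated (not the artist's code; just the basic idea of walking the grid in a spiral and keeping the positions where a prime lands):

spiralPositions :: [(Int, Int)]
spiralPositions = scanl step (0, 0) directions
  where
    -- move right 1, up 1, left 2, down 2, right 3, up 3, ...
    directions = concat [ replicate len dir
                        | (len, dir) <- zip (concatMap (\k -> [k, k]) [1 ..])
                                            (cycle [(1, 0), (0, 1), (-1, 0), (0, -1)]) ]
    step (x, y) (dx, dy) = (x + dx, y + dy)

-- Naive primality test; good enough for a picture.
isPrime :: Int -> Bool
isPrime n = n > 1 && all (\d -> n `mod` d /= 0) [2 .. isqrt n]
  where isqrt = floor . sqrt . fromIntegral

-- The grid positions of the primes among the first `count` numbers.
primeDots :: Int -> [(Int, Int)]
primeDots count =
  [ pos | (n, pos) <- zip [1 ..] (take count spiralPositions), isPrime n ]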
The Ulam spiral may not be very æsthetically pleasing, but the critics have not previously seen a submission quite like it, so they honoured it with a 6/6/7 rating. Since it is, however, not very interesting from a technical standpoint, it only receives 3 points here.
4th place
Chaoraan van Chen created the following very dynamic and Wettbewerbs-centric piece:
The background shows a number of strings of Haskell function names, draped around the circle in arcs, creating the impression of a globe. Written across this ‘globe’ is, in large letters, a {-WETT-} tag. The obvious message is that the entire world revolves around the Wettbewerb – at least for this artist. The artist also apparently associates the Wettbewerb with an overwhelming flood of Haskell library functions, perhaps criticising the decision to allow arbitrary Hackage libraries in the submissions, which some competitors found somewhat unwise. The font choice (Comic Sans), on the other hand, implies that this work of art is not to be taken entirely seriously and is somewhat tongue-in-cheek.
From a technical point of view, it is worth noting that the picture is rather dynamic: the placement of the background text is random and changes with every programme invocation, as does its content. The function names are taken from Hackage documentation; the program downloads a number of Haskell documentation pages at runtime and parses the function names from the HTML code. The MC Jr rewarded this amusing idea with 6 technical points, and the critics gave out high marks for æsthetic value as well.
3rd place
We have reached the Top 3 with Daniël Schoebart, who submitted the following colourful picture:
The artist describes the picture as autostereoscopic, i.e. one should be able to see a three-dimensional picture in it without any additional tools (such as 3D glasses). He mentioned during the ceremony that one has to cross one's eyes, but the MC Jr has not been able to see anything in it, no matter how much he crossed his eyes.
Nevertheless, the picture is rather pretty, and the technique behind generating autostereoscopic pictures is also quite interesting. Apparently, the picture one is supposed to see is the Greek letter λ (or, as the Unicode standard amusingly calls it: GREEK SMALL LETTER LAMDA) encoded in the following code segment:
depthMap = [ [0, 0, 1, 0, 0, 0, 0, 0],
             [0, 1, 1, 1, 0, 0, 0, 0],
             [0, 1, 0, 1, 0, 0, 0, 0],
             [0, 0, 0, 1, 1, 0, 0, 0],
             [0, 0, 0, 1, 1, 0, 0, 0],
             [0, 0, 1, 1, 1, 1, 0, 0],
             [0, 0, 1, 0, 0, 1, 0, 0],
             [0, 1, 1, 0, 0, 1, 1, 0],
             [0, 1, 1, 0, 0, 1, 1, 0] ]
The MC Jr still does not see it. However, the take-away message here is perhaps that functional programming is everywhere, even if you cannot always see it. (Possibly because, unlike some other languages, it does not lead to two-hundred lines of stack traces on web sites all the time)
2nd place
On second place, we have Christoffel van Rudolf with two pictures:
The first picture is called ‘Some Kind of Mars Surface with Drop Shadows’. The title suggests that the artist himself wasn't quite sure what this is, and the psychedelic appearance clearly suggests that it was created under the influence of certain chemical substances. The technique used here is a simple one (if the MC Jr understood the code correctly): for each pixel, the left and top neighbour are combined and altered with some random value, giving rise to random structures with ‘shadows’ in the bottom-right direction.
The second picture is much more interesting. The MC Jr thinks that it may very well be the most beautiful picture of this competition, but unfortunately, Mr Rudolf did not hand it in officially because he thought it would take too long to render. He claimed that on his machine, rendering took 16 minutes and rightly concluded that that was too long. The MC Jr ran it on his machine nevertheless and found that it took less than a minute. The MC Jr is unsure what kind of setup Mr Rudolf uses, but maybe he should check for burnt-out thermionic tubes.
The technique of this picture is highly interesting and reminded the MC Jr of Arnold Schönberg's twelve-note composition technique. In this technique, all twelve notes of the chromatic scale must be used equally often, giving every note the same importance. Van Rudolf applied the same principle (albeit borrowed) to pictures: starting with an arbitrarily-coloured pixel at some initial position, all colours of the colour space are added in random order in such a way that the colour difference of every added pixel to its neighbours is minimal. This creates a vibrant explosion of colours around the starting point with an intriguingly hand-drawn look. The MC Jr considers printing this picture and putting it up in his office. Had Mr van Rudolf submitted this as his official picture, he would probably have gotten first place; this way, he must content himself with second place.
1st place
Sebastiaan Nieuwbouwer is this week's winner, but initially, the critics' panel almost disqualified him entirely, because they failed to understand his work. It initially appeared to them as a black rectangle (first picture below), but it then transpired that it was, in fact, an animated SVG.
The animation is a reference to an obscure work by a little-known Russian artist, Aleksei Pajitnov. It takes the form of a game in which geometric shapes must be arranged in a certain way. The animation displays such a game in which a simple ‘AI’ chooses where the pieces appear. You can download the SVG by clicking on the second picture above, or an OGV video if you do not have a player that supports animated SVGs.
The code contains a total of 240 lines and therefore receives the Manuel Eberl Award for Going Hopelessly over the Top with an Assignment.
All right, we have our ranking for this week. The following table shows the æsthetics points given by the Meta-Master (MM), the MC Jr, and the Grumpy MC (GrMC), followed by the technical points, also given by the MC Jr, and the weighted sum of all four, which constitutes the total sum of points.
Top 30 der 3. jährlichen Info 2-Vernissage | |||||||
---|---|---|---|---|---|---|---|
Platz | Wettbewerber(in) | Kunstwerk | MM | MC Jr | GrMC | Tech | Punkte |
1. | Sebastian Neubauer | Tetris | 10 | 10 | 10 | 8 | 28 |
2. | Christoph Rudolf | Mars/Farbexplosion | 9 | 9 | 8 | 7 | 24 1/3 |
3. | Daniel Schubert | Bunte Quadrate | 7 | 8 | 9 | 6 | 22 |
4. | Chaoran Chen | Schrift auf Globus | 8 | 7 | 6 | 6 | 20 |
5. | Fabio Madge Pimentel | Ulamspirale | 6 | 6 | 7 | 3 | 15 2/3 |
6. | Martin Wauligmann | Mandelbrot (bunt) | 4 | 4 | 5 | 5 | 13 2/3 |
7. | David Widmann | Mandelbrot (s/w) | 5 | 5 | 4 | 4 | 13 1/3 |
8. | Sebastian Weiß | Sierpinski | 3 | 3 | 3 | 6 | 12 |
9. | Felix Thimm | Bézier | 2 | 2 | 2 | 2 | 6 |
10. | Christoph Griesbeck | Linien | 0 | 0 | 0 | 4 | 4 |
11. | Michael Eder | Batman | 1 | 1 | 1 | 1 | 3 |
Was ist denn ein Token?
The MC Sr. is a big fan of token counting as a rough measure of software quality. As the previous years' results have clearly shown, shorter answers tend to be more elegant and (usually) less likely to have subtle bugs.
The MC Sr. is making his token-counting program available to the rest of the world. The program builds on Niklas Broberg's Language.Haskell.Exts.Lexer module, which can be installed using Cabal:
cabal update
cabal install haskell-src-exts
There appears to be disagreement regarding the status of backquotes in different versions of Haskell (98 vs. 2010). The MC Sr. generously adopted the Haskell 98 definition, with `max` treated as one token rather than three.
Here's how to compile and run the program on Linux or Mac:
ghc Tokenize.hs -o tokenize
./tokenize
At this point, type in some Haskell code, e.g.,
sum_max_sq x y z = x ^ 2 + y ^ 2 + z ^ 2 - x `min` y `min` z ^ 2
Make sure to type in an empty line afterwards. Press Ctrl+D to terminate. The program then outputs
24 sxyz=x^2+y^2+z^2-xmymz^2
On Windows, you can run the program with
.\tokenize
or
tokenize
and terminate it with Ctrl+C (Windows 8) or Ctrl+Z followed by Enter (Windows 7, probably also for earlier versions).
The first field, “24”, gives the number of tokens. The second field is a short version of the program, where each token is abbreviated as one letter. This can be useful for debugging. The MC Sr. aggregates this information in his top-secret Master View, which looks something like this:
18 sxyz=mxy^2+mxymz^2 xxx@tum.de
18 sxyz=mxy^2+xmymz^2 xxx@tum.de
18 sxyz=mxy^2+zmmxy^2 xxx@mytum.de
20 sxyz=(mxy)^2+(myz)^2 xxx@mytum.de
20 sxyz=mxy^2+m(mxy)z^2 xxx@tum.de
20 sxyz=mxy^2+m(mxy)z^2 xxx@tum.de
...
52 sxyz|x=z&y=z=x*x+y*y|x=y&z=y=x*x+z*z|y=x&z=x=y*y+z*z xxx@mytum.de
52 sxyz|x=z&y=z=x*x+y*y|x=y&z=y=x*x+z*z|y=x&z=x=y*y+z*z xxx@mytum.de
52 sxyz|x=z&y=z=x*x+y*y|x=y&z=y=x*x+z*z|y=x&z=x=y*y+z*z xxx@mytum.de
52 sxyz|x=z&y=z=x^2+y^2|x=y&z=y=x^2+z^2|y=x&z=x=y^2+z^2 xxx@mytum.de
...
252 sxyz|(x+y)>(x+z)&(x+y)>(y+z)=(x*x)+(y*y)|(x+z)>(x+y)&(x+z)>(y+z)=(x*x)+(z*z)|(y+z)>(y+x)&(y+z)>(x+z)=(y*y)+(z*z)|x=z&x>y=(x*x)+(z*z)|x=y&x>z=(x*x)+(y*y)|z=y&z>x=(y*y)+(z*z)|x=z&x<y=(x*x)+(y*y)|x=y&x<z=(x*x)+(z*z)|z=y&z<x=(x*x)+(z*z)|x=y&y=z=(x*x)+(y*y) xxx@mytum.de
The Master View allows the MC Sr. to compile the Top 30 der Woche, but also to quickly spot interesting solutions and potential plagiarism. Plagiarism isn't cool, and occurrences of it will be penalized by subtracting 1 000 000 000 (ONE ZILLION) points from the Semester total.
For the competition, the above program is invoked only on the code snippet included between {-WETT-} and {-TTEW-}, with all type signatures stripped away.
Some competitors have tried to confuse tokenize by using German letters (ä etc.) in their source code. Setting
export LANG=de_DE
before running the program solves this problem.