Testing with a verified oracle: using Symbolic Execution and Fuzzing

Chair for Logic and Verification

HiWi/Bachelor Thesis

Automatic testing is an important measure to ensure software quality. While testing can only uncover bugs, it can not ensure correctness on all possible scenarios because of combinatorial explosion – there are just too many sets of input to test! Also it is not at all clear what it means that a program is correct in the first place. Detecting obvious errors (e.g. buffer overflows and null pointer exceptions) is straightforward, but ensuring functional correctness is not – how do you decide that a program emits the correct output to the test input you provided?

Verification on the other hand is used to prove that some piece of software is correct on all possible inputs. The drawback is that software verification today still is too costly and laborious and does not yet scale to large software projects. Furthermore simple algorithms are simple to verify, but the more optimized software is the more complex the verification gets. Thus today verified software is typically less efficient than highly optimized unverified programs.

This project aims to bring together these two worlds by using already verified reference implementations as a test oracle for testing real-world software.

There are many algorithms already formally verified in the interactive theorem prover Isabelle [1] and it is possible to extract executable code from these formalizations. Some examples are sorting algorithms, Network-Flow algorithms [2], SAT-solvers [3] and a verified timed automaton model checker [4].

The starting point is the set of algorithms that are already formally verified in Isabelle, allowing us to construct a verified test oracle. Collecting real-world implementations for those algorithms and gathering available benchmark test-suites for them is the first step to take. Using these, one can already test the different implementations and compare results. If they differ, a potential bug has been found. Not only the emitted results but also run-time and other resource usage may be of interest, possibly uncovering run-time bugs.

A next step would be to generate new sets of test input and use them in the above methodology. Symbolic Execution [5] could be used on the gathered implementations to generate small sets of new interesting test cases, while Fuzzing [6] may be used in order to efficiently generate a larger set of inputs by random mutation.

One concrete example is the DBM Library [7] that is employed by the widely used timed automaton model checker UUPAAL [8]. There is already a verified DBM Library in Isabelle and “an extensive test suite” for UPAAL’s DBM Library, which in turn is implemented in C/C++.

Programming language:
C/C++, OCaml, Standard ML, Scala

Familiarity with testing and basic knowledge about verification; English

Prof. Tobias Nipkow, Raum MI 00.09.055, email {nipkow} AT [in.tum.de]

  1. https://isabelle.in.tum.de/
  2. Lammich, Sefidgar - Formalizing the Edmonds-Karp Algorithm.
  3. Fleury, et al. - A Verified SAT Solver with Watched Literals Using Imperative HOL.
  4. Wimmer, Lammich - Verified Model Checking of Timed Automata.
  5. King, J. C. Symbolic execution and program testing
  6. Sutton, M. et al. Fuzzing: Brute force vulnerability discovery
  7. http://people.cs.aau.dk/~adavid/UDBM/
  8. Cadar, C. et al. KLEE: Unassissted and automatic generation of high-coverage tests for complex systems programs
  9. http://lcamtuf.coredump.cx/afl/