GRAT – Efficient Formally Verified SAT Solver Certification Toolchain

GRAT is a SAT solver certificate checking tool chain, which is formally verified using the Isabelle/HOL theorem prover. If a certificate is verified, you have strong formal guarantees that the solution produced by the SAT solver was actually correct. GRAT is actually faster than the unverified state-of-the-art tool drat-trim, and significantly faster than the verified LRAT tool.

The tool chain consists of two programs: The unverified certificate generator gratgen converts a DRAT certificate (as output by many modern SAT solvers) to a GRAT certificate, which is then checked against the original formula by the verified gratchk tool.

Features

News

Getting started

Download the GRATgen and GRATchk (ML source only) tools and build them as indicated in their README files. The standard GRATchk package contains all Isabelle theories. However, to run the tools you only need the ML source only package.

To check a satisfiability certificate, run

gratchk sat formula.cnf certificate.sat

Here, the formula is a formula in standard DIMACS CNF format, and the certificate is a list of non-contradictory literals that satisfy the formula if assigned to true. The certificate can, for example, be obtained by processing the output of the SAT solver with

sed -re '/^v/!d;s/^v//'

To check an unsatisfiability certificate, run

gratgen formula.cnf drat-cert.drat -l grat-cert.gratl -o grat-cert.gratp [-b] [-j N]
gratchk unsat formula.cnf grat-cert.gratl grat-cert.gratp

Here, drat-cert is a certificate in the standard DRAT format, which is supported by many SAT solvers.(Use option -b for binary DRAT) Note that the DRAT format is downward compatible with DRUP, such that also DRUP certificates can be used. Moreover, if you specify the -j N option, gratgen will run in multi-threaded mode with N threads. For most certificates, this will result in a significant speedup. Good values for N are between 2 and 16, depending on the available hardware.

To check an unsatisfiability certificate in combined proof mode, i.e., lemmas and proofs are stored in the same file, run

gratgen formula.cnf drat-cert.drat -o grat-cert.grat [-j N]
gratchk unsat formula.cnf grat-cert.grat

Combined proof mode yields only a single certificate file, but the certificate checker is slightly slower and needs more memory.

Examples

For getting started quickly, we have compiled a collection of example formulas and certificates.

Benchmarks

We have benchmarked our checker (version 1.2) on a server board with an 22-core Intel XEON Broadwell 2.2GHz CPU and 128 GiB of RAM, and compared it to drat-trim (version: Nov 10, 2016 ,the current version as of May 2017) and incremental LRAT (obtained from the authors in May 2017).

For unsat, we used the problems from the 2016 SAT competition main track. As solvers, we used CryptoMiniSAT and silver medalist Riss6, yielding a total of 238 DRAT certificates. We used CryptoMiniSAT as it seems to be the only solver that generates a significant amount of RAT lemmas.

Using the standard timeout of 20.000 seconds, single-threaded GRAT could verify all certificates, while drat-trim and LRAT timed out on two, and segfaulted on a third one. To verify the remaining 235 certificates, GRAT required 44 hours, while drat-trim required 72 hours and LRAT required 93 hours. With eight threads, GRAT ran out of memory for one certificate. For the remaining 234 certificates, the wall-clock times sum up to only 21 hours.

For the certificates by CryptoMiniSAT, which make extensive use of RAT lemmas, the RAT run optimization shows its efficiency: Single-threaded GRAT needs 17h, while drat-trim need 42h and LRAT need 51h.

For the pure DRUP certificates produced by Riss6, only the sepWL optimization is effective, and GRAT needs 26h, while drat-trim need 30h and LRAT needs 42h.

The raw log files of the unsatisfiable benchmark runs are available as 2017-06-combined_logs.tgz.

We also evaluated the problems that gold medalist Maple solved in the 2017 SAT competition main track. Here, we used GRAT 1.3, the latest available repository version of drat-trim (December 2017), and the incremental LRAT that we obtained from the authors in May 2017 (According to the authors, there have been no updates to LRAT since then). The results are available as 2018-01-logs.tgz.

Memory consumption

The increased speed and formal correctness guarantees come at the price of higher memory consumption. Our tool chain needs roughly three times more memory than drat-trim in single-threaded mode, and memory consumption drastically increases in multi-threaded mode. However, memory consumption seems not to be a general problem. Out of the 347 benchmark problems, gratgen with 8 threads ran out of memory at 128 GiB for only a single certificate.

Satisfiability

Verifying satisfiability is almost straightforward and very efficient. For the 241 satisfiable problems in our benchmark set, the certificates have a size of 566MiB and gratchk verified them in roughly 100 seconds. We provide the raw csv data extracted from the logs: sat.csv Note that the sum of the times there is only about 50 seconds, which is due to small times being below the precision of our time measurement, and thus being rounded down to zero.

Documentation of the code

GRATgen comes with a doxygen generated documentation. For GRATchk, you may want to look at the proof document or the proof outline.

Formal Guarantees

We have formally proved the soundness of GRATchk in the Isabelle/HOL interactive theorem prover. The proof covers the actual implementation of GRATchk, and the semantics of the formula down to the integer array that represents it.

That is, if GRATchk claims that a formula is satisfiable/unsatisfiable, it is very likely that this is actually true.

Trusted Base

The trusted base of a piece of software describes what has to be trusted in order to believe that the produced result is correct. It includes, at least, the hardware, the operating system, and the compilers used to compile the software.

For a SAT solver, it includes the complete implementation of a SAT solver, usually a highly optimized C program. If one uses certificates, the sat solver is replaced by the certificate checker in the trusted base.

For formally verified software, the software itself is not included in the trusted base any more. It is replaced by the trusted base of the used theorem prover and by the correctness specification. This highly decreases the likelihood of bugs in the verified software:

We now list, in detail, the trusted code base of our verified checker, i.e., what pieces of software you have to trust in order to believe in correctness of our checker:

While this list may seem huge, it is worth to list some items that are missing as opposed to an unverified checker written in C:

License

GRAT comes under an OpenBSD style license.

Version History

Version 1.3.3 (Current)