Isabelle offers an export interface, over which theory contents are accessible for external usage and analysis. The entities that are defined in such a formal theory, together with their relationships, can be viewed as a dependency graph; in doing so, a range of existing methods for graph analysis and data mining can now be applied to this context.
One open question that we hope to answer is whether we can identify useful patterns in theorem dependencies, such as lemmas that are frequently used together in proofs. Another interesting question is whether we can find clusters the dependency graph, that could tell us how to potentially refactor the underlying sessions or theories. Your task is to address these questions by applying data mining techniques to the graph data, analyzing the structure and extracting interesting relationships.
Your body of research is the Isabelle Archive of Formal Proofs (AFP), which incorporates hundreds of entries from many authors, in a total of 2.6 million lines of code. Its theory graph spans ~2 million nodes and ~100 million edges.
Prerequisites: Basic knowledge about higher order logic (ideally Isabelle/HOL); English