September 27, 2024
A Centralized Database of All Math
A Mathlib developer at ISTA: Graduate student Martin Dvořák’s ‘side-project’
Mathematics is an umbrella term uniting diverse disciplines with different conventions. Also, accounting for the entirety of mathematics while proving theorems is nearly impossible, and the peer-review process is notoriously slow. Martin Dvořák, a graduate student in the Kolmogorov group at the Institute of Science and Technology Austria (ISTA) is a developer of Mathlib, digitizing concepts proven true across all mathematical disciplines. Find out how he does this ‘on the side’.
A 21st-century Library of Alexandria of digitized math is being built, one brick at a time, in a collective effort of a small community of meticulous mathematicians worldwide. They are driven by the vision of, one day, centralizing all humanity’s mathematical knowledge across disciplines. But beyond simply archiving mathematical concepts, this meticulous ant work is helping them revolutionize math by making it accessible to artificial intelligence (AI)-driven discovery.
This new Library of Alexandria of all math is a computer software called “Mathlib”. It centralizes mathematical concepts that are true across all mathematical disciplines, making them a basis of “truth” for all future entries. Born out of “Lean”, an efficient and uncluttered programming language for software development launched in 2013, Mathlib stands for ‘Lean mathematical library’. “Lean is the code language of logic, and Mathlib is the library built on top of this logic,” says graduate student Martin Dvořák, today the only Mathlib developer at the Institute of Science and Technology Austria (ISTA). Currently, he is the only active contributor in Austria.
“I’m passionate about helping digitize all math known to humanity to drive math’s progress, and I’m doing this as a ‘side-project’. This of course also helps me in my routine of proving theorems as a graduate student. I prove everything interactively in Lean, with tremendous help from Mathlib. The more lemmas—intermediate theorems—are incorporated into Mathlib, the less I have to prove theorems from scratch.”
‘Lean’ logic, from software code to mathematical proof
When Lean was launched, mathematicians quickly recognized that its power in checking the validity of software code was much like verifying a mathematical proof. This was an opportunity they could not miss. Their need for a new, well-planned, and carefully built mathematical library was far too real.
Computational proof assistants are not new. The first such program, “Automath”, was launched in the 1960s. Then followed “Mizar” in 1973 and “Coq” in 1989, two proof assistants that enjoyed wide use among mathematicians. But rather than looking to centralize mathematical knowledge in Coq, its users independently defined problems that only interested them, often using unique language. This led to Coq growing like an unplanned city, with the many smaller, independent libraries making the platform feel jumbled. Lean provided mathematicians with a much-needed fresh start, drawing from past challenges. “Lean allowed us to agree on an interpretation of human-written math collectively. With this tool, we can develop concepts that remain true under all circumstances and encode them unambiguously into a computer software, accessible to all,” says Dvořák. This is how the ‘Lean community’ of developers works on building Mathlib, one brick at a time.
A bullet-proof ‘pyramid of evidence’
Dvořák and his fellow Mathlib developers worldwide are building a solid basis of mathematics. Starting from specific problems in a branch of mathematics, they search for the most general and broadly applicable versions of mathematical concepts. Proving theorems in Lean is an interactive process between the developer and the computer, Dvořák explains. The developer gives more and more hints on verifying a concept, and the software establishes it based on the already verified elements. Mathlib developers also extract mathematical thinking patterns and encode them as ‘tactics’—logical building blocks—in Lean.
Thus, Mathlib works as a pyramid of evidence. Beyond making Mathlib logically bulletproof, the library is also highly agile. “Our job as developers is to make sure the pyramid has solid but flexible bases. When you add a new ‘brick’ into Mathlib, the PC can recompile the entire pyramid overnight, taking it all apart and rebuilding it brick by brick,” says Dvořák. This ensures everything is computer-verified for mathematical correctness. Thus, when a user—whether human or AI, such as AlphaProof by Google DeepMind—enters a problem together with instructions on how to prove it, Mathlib can quickly run the necessary checks and provide a computer-generated proof.
Investing in the future of math
Big mathematical results depend on hundreds of mathematical papers, but human error makes it impossible to ensure that the reasoning is correct throughout. “But if you base your proof on lemmas in Mathlib, you have the highest guarantee that everything you use has been computer-verified for absolute correctness,” says Dvořák.
Beyond making mathematical proofs bulletproof, Mathlib also democratizes mathematics. It boosts large-scale collaborations between mathematicians, overcoming important barriers of scientific ‘trust’ in the field. “Mathlib can tell you within five seconds if you’ve made a blatant mistake in your reasoning. This saves an incredible amount of time compared to waiting several months for a human review,” says Dvořák. On the other hand, it’s extremely satisfying when the computer displays ‘goals accomplished’. Then you know that no one can disprove what you’ve done anymore.”
Dvořák is visibly proud of his contribution to this colossal collaborative work, but he is very much aware that Mathlib, like mathematics, can never be ‘finished’. “We don’t know what math will look like in a few decades from now. In fact, we don’t even know the extent of humanity’s mathematical knowledge as we speak. But I’m convinced that our ant work will pay off in ways we can’t even imagine today.”