Fusion des Maths Sciences et Informatiques

Accueil > Projets > FXMessages X1, X4 ... > Développement du système MathNat pour la formalisation automatique des textes (...)

Développement du système MathNat pour la formalisation automatique des textes mathématiques par Humayoun Muhammad

jeudi 27 juin 2019, par ScientificWare

Humayoun Muhammad. Développement du système MathNat pour la formalisation automatique des textes mathématiques. Mathématiques générales [math.GM]. Université de Grenoble, 2012. Français. NNT : 2012GRENM001. tel-00680095

There is a wide gap between the language of mathematics and its formalized versions.
The term “language of mathematics” or “mathematical language” refers to prose that the mathematician uses in authoring textbooks and publications. It mainly consists of natural language, symbolic expressions and notations. It is flexible, structured and semantically well-understood by mathematicians.
However, it is very difficult to formalize it automatically. Some of the main reasons are : complex and rich linguistic features of natural language and its inherent ambiguity ; intermixing of natural language with symbolic mathematics causing problems which are unique of its kind, and therefore, posing more ambiguity ; and the possibility of containing reasoning gaps, which are hard to fill using the current state of art theorem provers (both automated and interactive).
One way to work around this problem is to abandon the use of the language of mathematics. Therefore in current state of art of theorem proving, mathematics is formalized manually in very precise, specific and well-defined logical systems. The languages supported by these systems impose strong restrictions. For instance, these languages have non-ambiguous syntax with a limited number of possible syntactic constructions.
This enterprise divides the world of mathematics in two groups. The first group consists of a vast majority of mathematicians whose rely on the language of mathematics only. In contrast, the second group consists of a minority of mathematicians. They use formal systems such as theorem provers (interactive ones mostly) in addition to the language of mathematics.
To bridge the gap between the language of mathematics and its formalized versions, we may ask the following gigantic question :
Can we build a program that understands the language of mathematics used by mathematicians and can we mechanically verify its correctness ?

This problem can naturally be divided in two sub-problems, both very hard :
1. Parsing mathematical texts (mainly proofs) and translating those parse trees to a formal language after resolving linguistic issues.
2. Validation of this formal version of mathematics.
The project MathNat (Mathematics in controlled Natural language) aims at being the first step towards solving this problem, focusing mainly on the first question.

For that, first, we develop a Controlled Language for Mathematics (CLM) which is a precisely defined subset of English with restricted grammar and lexicon. To make CLM natural and expressive, we support important linguistic features such as anaphoric pronouns and references, rephrasing of a sentence in multiple ways, the proper handling of distributive and collective readings and so on. The coverage of CLM at the moment is yet rather small and to be improved as the project keeps evolving in future.

Second, we develop MathAbs (Mathematical language). It is a prover independent formal language to represent the semantics of CLM texts preserving its logical and reasoning structure. MathAbs is designed as an intermediate language between CLM and the formal languages of theorem provers, allowing proof checking.

Third, we propose a system that can automatically translate CLM to MathAbs, giving a precise semantics to CLM. We consider that formalizing mathematics automatically in such a formal language that has a precise semantics is an important progress even if it can’t always be proof-checked.
This brings us to the second question for which we report a very limited work.
We only translate MathAbs to the first-order formulas. If we feed these formulas to the automated theorem provers (ATPs), then fundamentally the ATPs should be able to validate them sometimes. In other words, the resulting MathAbs document is not completely verifiable for the moment, but it represents an opportunity for the mathematician to write mathematical text (mainly proofs) without becoming expert of any theorem prover.

Keywords : Computational linguistics, Language technology, Controlled languages, The language of mathematics, Formalization, Formal systems, Validation, Proof checking.