
Exploring probabilistic modeling and the future of math

Robert H. P. Engels
Oct 23, 2023

Some days you get these chains of interesting events, one following on from the other.

This morning I read the “GPT can solve math[.. ]” paper (https://lnkd.in/dzd7K3sx), and then some responses to it (among others from Gary Marcus, posts on X, etc.). During and after TED AI I had many interesting discussions on the topic of probabilistic modelling vs. models of math as we know (knew?) it, and this paper sparked some thoughts (so: mission accomplished).

It occurs to me that we have a generation of PhD students building LLMs who have probably never really gotten interested in model thinking and mathematical proofs. I mean the thinking behind Einstein's relativity theory, the thinking behind Euler's graph theory: the type of thinking that leads (indeed) to a mathematical model that you can implement in a calculator (low footprint), that calculates correctly (100% trustworthy), and that, in addition, produces 100% correct results on input never seen before.

The question really condenses down to whether you believe in the abstraction capability of the algorithms currently used for training today's LLMs. Are attention layers at all able to build abstractions on their own (rather than regurgitating abstractions served to them ready-made by humans)? Optimism in the Valley is big: just add more data and the problem will go away.

But without changing the underlying attention layer design, this seems to be a fallacy. Learning to abstract really means building meta-levels on top of your information, condensing signals and their relations. That is something different from predicting chains of tokens. Such an abstraction layer can be seen as building a 3D puzzle, whereas current attention mechanisms seem single-layered. With a single layer, the most you can build is a 2D puzzle.
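To make that comparison concrete, here is a minimal sketch of a single scaled dot-product attention head (plain NumPy, purely illustrative, not any vendor's actual implementation): the output for each token is a weighted sum of the value vectors of the tokens in the input, i.e. a flat re-combination of representations that are already there, rather than an explicitly constructed meta-level on top of them.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def single_head_attention(X, Wq, Wk, Wv):
    """One attention head: every output row is a weighted average of the
    value vectors of all input tokens -- a re-mixing of existing
    representations, not an explicitly built abstraction layer."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # pairwise token-to-token relevance
    weights = softmax(scores, axis=-1)        # attention distribution per token
    return weights @ V                        # weighted sum of value vectors

# Toy example: 4 tokens with 8-dimensional embeddings (random, illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(single_head_attention(X, Wq, Wk, Wv).shape)  # (4, 8)
```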

With that picture in mind, you can observe that the current solutions from LLM suppliers extend the 2D puzzle: making it larger (adding more data from all over), or giving it higher resolution for a specific task (like the math paper mentioned above). But no sincere attempts seem to have been made yet to build a 3D picture, which would mean rocking the foundation of the foundation models and rebuilding the attention mechanism to cover for this deficit.

Until then, let's focus on getting functions to work reliably, off-load model-based tasks (math, engineering, logic, reasoning) to external capability agents, and stop pretending that 2D can become 3D without changing the foundation.
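As a toy illustration of what such off-loading could look like (hypothetical names, not a production agent framework), the sketch below routes arithmetic questions to a small deterministic evaluator and only falls back to a language model callable for everything else; the math answer is computed, not predicted.

```python
import ast
import operator

# Map AST operator nodes to exact arithmetic functions.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv,
        ast.Pow: operator.pow, ast.USub: operator.neg}

def calc(expr: str) -> float:
    """Evaluate a plain arithmetic expression exactly (no sampling involved)."""
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp):
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp):
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError("unsupported expression")
    return _eval(ast.parse(expr, mode="eval"))

def answer(question: str, llm=None) -> str:
    """Toy router: arithmetic goes to the calculator; anything else would go
    to the (hypothetical) LLM callable passed in as `llm`."""
    if question.strip() and all(c in "0123456789.+-*/() " for c in question):
        return str(calc(question))
    return llm(question) if llm else "delegate to LLM"

print(answer("12 * (7 + 5) / 3"))  # 48.0, computed, not predicted
```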

Meet the author

Robert Engels

Global CTIO and Head of Lab for AI Futures and Insights & Data
Robert is an innovation lead and a thought leader in several sectors and regions, and holds the position of Chief Technology Officer for Northern and Central Europe in our Insights & Data Global Business Line. Based in Norway, he is a known lecturer, public speaker, and panel moderator. Robert holds a PhD in artificial intelligence from the Technical University of Karlsruhe (KIT), Germany.