
Poor Calculations: Why Is AI So Bad At Math?

Welcome to this week’s Deep-Fried Dive with Fry Guy! In these long-form articles, Fry Guy conducts in-depth analyses of cutting-edge artificial intelligence (AI) developments and developers. Today, Fry Guy dives into why AI struggles at math. We hope you enjoy!

*Notice: We do not receive any monetary compensation from the people and projects we feature in the Sunday Deep-Fried Dives with Fry Guy. We explore these projects and developers solely to showcase interesting and cutting-edge AI developments and uses.*



Are you bad at math?

In elementary school, my teachers always told me I would need to know math. “You won’t always have a calculator in your pocket,” they would say. Now, I think they couldn’t have been more wrong. If you are anything like me, you turn to a calculator for almost everything, from leaving tips at restaurants to figuring out how much time you spent on a task. And now, we have much more than calculators—we have AI! But there is a weird catch: As more people begin to use AI models to help them with everyday tasks, they are discovering that AI is really bad at math …

In this article, I am going to explore why AI models are so bad at solving math problems. My hope is that by the end, you will have a better awareness of AI’s math struggles, why those struggles happen, and what you should do about it.

RUNNING SOME NUMBERS

Consider the following question: which number is larger, 9.11 or 9.9?

If you said 9.9, you would be correct (if I had a gold star, I would give you one)! It doesn’t seem like that difficult of a math problem. In fact, it seems quite obvious. Surely the most sophisticated AI models of our time would eat this problem for lunch, whipping out the right answer in a matter of seconds. Right? Well, not exactly. I asked this same question to ChatGPT-4o, one of OpenAI’s most advanced AI models, and it confidently told me that 9.11 is the larger number.

Surprised? Don’t be. Unfortunately, this is not a one-off mistake, and it is not unique to ChatGPT. Google’s Gemini, Meta AI, and more have been known to mess up math problems of all kinds, from advanced calculus to simple addition and subtraction.
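For contrast, any rule-based computation settles this instantly. Here is a minimal Python sketch (my own illustration, not taken from any particular model or product):

```python
# Comparing 9.9 and 9.11 as numbers is an exact, deterministic operation.
a, b = 9.9, 9.11

print(a > b)      # True -- 9.9 is the larger number
print(max(a, b))  # 9.9

# An LLM, by contrast, is predicting likely text rather than evaluating the
# numbers, which is how "9.11 is larger" can come out sounding perfectly confident.
```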

Now, let’s not recklessly jump to the conclusion that AI cannot do math. ChatGPT and other AI models often get math problems correct, even complex calculations. But the ability of AI to mess up the most basic math problems can, at times, cause us to hesitate and wonder what’s really going on behind these calculations.

AI’S BIG SECRET: IT CANNOT DO MATH

Experts in the technology field have repeatedly acknowledged the shortcomings of AI when it comes to mathematics. As MIT Technology Review’s Melissa Heikkilä stated, “Math is really, really hard for AI models.” Why is this the case? Well, there are many reasons, so let’s break them down one by one.

  1. AI predicts but doesn’t “understand.”

Large language models (LLMs) like ChatGPT are designed for pattern recognition and probability-based text generation rather than strict logical computation. These models are trained on vast amounts of human writing. When an LLM like ChatGPT answers a prompt, it predicts what a human would most likely say next. So when solving math problems, these models predict the most likely answer based on patterns in the training data rather than directly computing an exact result.

As University of Tennessee philosophers Kristina Gehrman and Hunter Kallay explain, “The trouble is that today’s AI systems can’t earn our trust by sharing the reasoning behind what they say, because there is no such reasoning. LLMs aren’t even remotely designed to reason. Instead, models are trained on vast amounts of human writing to detect, then predict or extend, complex patterns in language. When a user inputs a text prompt, the response is simply the algorithm’s projection of how the pattern will most likely continue. These outputs (increasingly) convincingly mimic what a knowledgeable human might say. But the underlying process has nothing whatsoever to do with whether the output is justified, let alone true.”

Because LLMs are merely predictive, their ability to execute algorithms is limited when it comes to mathematics. LLMs lack an internal, rule-based computation engine like a calculator. While these models can simulate certain algorithms, they are not optimized for rigorous, step-by-step logic unless specifically trained for it. And in the case of popular, general-purpose LLMs, that is not the priority.
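To make that distinction concrete, here is a toy contrast between pattern-based prediction and rule-based computation. This is entirely hypothetical (the “corpus statistics” are invented for illustration), and real LLMs are vastly more sophisticated, but the underlying difference is the same:

```python
import random

# Toy "corpus statistics": how often imaginary human text continues this prompt.
# These counts are invented purely for illustration.
continuations = {
    "Which is larger, 9.11 or 9.9? ": {"9.11": 60, "9.9": 40},
}

def predict_like_an_llm(prompt: str) -> str:
    """Pick a continuation in proportion to how often it appeared in 'training text'."""
    options = continuations[prompt]
    answers, weights = zip(*options.items())
    return random.choices(answers, weights=weights)[0]

def compute_like_a_calculator(a: float, b: float) -> str:
    """Apply an explicit rule: compare the two numbers directly."""
    return str(a) if a > b else str(b)

print(predict_like_an_llm("Which is larger, 9.11 or 9.9? "))  # often "9.11"
print(compute_like_a_calculator(9.11, 9.9))                   # always "9.9"
```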

  2. LLMs learn from humans … who can be bad at math.

As I have stated, LLMs are trained on vast amounts of human data, which includes mathematical language and calculations done by humans. However, the training data does not always make clear when a human solved a problem correctly and when they solved it incorrectly. Because much of the training data can include incorrect math and poor logic, when rendering a predictive output for a math prompt, the model will spit out what a human would “most likely” say, and in some cases that means reproducing human mathematical errors. This is especially true as the math becomes more complex, since humans tend to make more errors on harder problems.

Additionally, some AI models struggle with precision in large or complex calculations, especially with decimals or irrational numbers. This is often due to things like rounding errors or misapplying the order of operations, which, in larger problems, can compound into bigger mistakes. It happens because the model is predicting where a human might round a number or what tends to come after 2 + 2, even if there is a multiplication sign that should be prioritized according to PEMDAS (see the sketch below). And in complicated problems, if there is even one digit, relation, or step the AI does not have good training data on, it will most likely hallucinate, completely fabricating an answer. But again, can we blame these bots? After all, LLMs are not designed to be “correct”; they are designed to be like humans.
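Here is what those two failure modes, order of operations and compounding rounding, look like when worked out exactly in Python (the numbers are my own illustration; a calculator-style computation gets both right):

```python
# Order of operations: multiplication binds tighter than addition (PEMDAS).
print(2 + 2 * 3)    # 8  -- multiplication first
print((2 + 2) * 3)  # 12 -- what a strict left-to-right reading would give

# Compounding rounding: rounding every intermediate value drifts from the exact answer.
values = [1.25, 2.25, 3.25, 4.25]

exact_total = sum(values)                        # 11.0
round_each_term = sum(round(v) for v in values)  # 1 + 2 + 3 + 4 = 10

print(exact_total, round_each_term)  # 11.0 10
```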

  3. LLMs don’t intuit mathematical language.

Many mathematical problems are framed in the context of language. Consider the statement, “Is 9.11 larger than 9.9?” Of course, in terms of which number is greater, 9.9 is the easy choice. But 9.11 has three digits, whereas 9.9 has only two, so in some sense 9.11 appears “larger.” In everyday language, “larger” can be used in various contexts, so an LLM may interpret it one way in one instance and a different way in another. These subtle misunderstandings can make simple problems difficult for these models.

This is especially true in word problems. A prompt like, “If John has 5 apples and gives away 2, how many does he have left?” may confuse an AI model. It may not understand how the words “has” and “have” relate to the numbers, which could lead to different interpretations. If “having” can also mean “John had 5 apples earlier, so he still has those in his past experience,” then the question becomes ambiguous. Some LLMs might also interpret “how many does he have left?” as referring to possession in a broader sense, including scenarios where John no longer physically holds the apples but retains control over them. For instance, if John still considers the apples his but placed them somewhere, does he still “have” them? Additionally, models trained on large datasets may have encountered variations of word problems with implicit twists or trick questions, so the phrasing “how many does he have left?” could cause a model to second-guess itself and overcompensate, providing odd answers and explanations instead of a straightforward correct one. A rule-based approach, sketched below, never gets tangled in any of this.
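For comparison, here is a deliberately naive, rule-based sketch I wrote for illustration. It does not deliberate over what “have” means; it just extracts the quantities and subtracts, and it would break on any problem that doesn’t follow this exact “has X … gives away Y” pattern:

```python
import re

def solve_give_away_problem(text: str) -> int:
    """Naive rule-based solver for 'has X ... gives away Y' word problems."""
    start = int(re.search(r"has (\d+)", text).group(1))
    given = int(re.search(r"gives away (\d+)", text).group(1))
    return start - given

question = "If John has 5 apples and gives away 2, how many does he have left?"
print(solve_give_away_problem(question))  # 3
```

That rigidity is the trade-off: the rule-based version is exact but useless outside its narrow pattern, while the LLM is flexible but only ever guessing.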

  4. LLMs may be overfitted to familiar math problems.

Many LLMs are trained on textbooks, tutorials, and informal discussions online rather than pure mathematical proofs. Some models may be so heavily trained on these types of examples that they cannot adapt to new or more sophisticated problems. If an advanced problem looks similar to an example in its training data, but the LLM doesn’t fully “understand” the logic, it may (1) pick an answer based on contextually similar but incorrect problems or (2) output a response that sounds right linguistically but isn’t mathematically valid. In either case, an overfitted model will most likely get these types of complicated math problems wrong.
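Here is a toy caricature of that first failure mode, “answer by analogy to the closest memorized example.” It has nothing to do with how any real model is built; it just shows how matching on surface similarity produces confident, wrong answers when the numbers change:

```python
from difflib import SequenceMatcher

# Toy "training set" of memorized problems and their answers.
memorized = {
    "What is 12 * 12?": "144",
    "What is 15% of 200?": "30",
}

def answer_by_analogy(question: str) -> str:
    """Return the answer of the most similar memorized question -- right or wrong."""
    best = max(memorized, key=lambda q: SequenceMatcher(None, q, question).ratio())
    return memorized[best]

# New problems that *look* like memorized ones but have different numbers.
print(answer_by_analogy("What is 12 * 13?"))    # "144" -- should be 156
print(answer_by_analogy("What is 15% of 240?")) # "30"  -- should be 36
```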

WHAT CAN WE DO ABOUT THESE MATH PROBLEMS?

Now, the average person might not care that AI is bad at mathematics. Who cares if AI messes up a math problem from time to time, as long as it can write essays and produce good images? Well, the problem is that mathematics is pivotal to many fields, and more and more of those fields are relying on AI for their operations. If AI models cannot do mathematics consistently, the consequences could be devastating. Imagine the engineer who trusts a model for architectural calculations, or the chemist who relies on AI for a complex pharmaceutical formulation. Can these professionals rely on models that, at times, can’t do simple addition and subtraction?

Tech companies are actively working on advanced reasoning models that improve performance in mathematics by checking their work. Additionally, some developers are building systems specifically for mathematics and logic, such as Wolfram and DeepMind’s AlphaGeometry, which has solved Olympiad-level geometry problems at a level approaching that of top human competitors. But unless you are using these math-specific tools, you should probably stick to your calculator or start practicing your multiplication tables, at least for now.
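One flavor of “checking the work,” and I am not claiming this is how any named product does it, is to verify a model’s arithmetic against an exact computation before trusting it. Here is a rough sketch with a hypothetical stand-in function, ask_model:

```python
def ask_model(prompt: str) -> str:
    """Stand-in for a call to an LLM; here it just hard-codes a wrong answer."""
    return "9.11"

def verified_comparison(a: float, b: float) -> str:
    """Ask the model, then double-check its answer with exact arithmetic."""
    model_answer = ask_model(f"Which is larger, {a} or {b}?")
    exact_answer = str(a) if a > b else str(b)
    if model_answer != exact_answer:
        # The model's answer failed the deterministic check; fall back to the exact result.
        return exact_answer
    return model_answer

print(verified_comparison(9.11, 9.9))  # "9.9"
```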
