When AI doesn’t know the answer: Tackling unanswerable questions in machine reading
Have you ever asked a question and realized the answer just isn’t in the text? That’s a real problem—not just for humans but also for AI systems trained to read and understand documents. This article explores how AI, and specifically Machine Reading Comprehension (MRC) systems, struggle with “unanswerable questions”—ones that have no valid answer in the given context.
Traditionally, AI models assume every question has an answer in the passage they’re given. But real-life situations often don’t work that way. For example, if a paragraph talks about the video game “Zelda: Twilight Princess” and someone asks about “Zelda: Australia Twilight” (a game that doesn’t exist), the AI has to recognize that this question cannot be answered—and explain why.
To address this, researchers are using two types of strategies: one improves existing models by adding layers that help the system say “no answer,” and the other builds new models specifically designed to detect and explain unanswerable cases. These newer models can sometimes even give reasons, like noting that a key name or number in the question doesn’t appear in the text.
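The first strategy can be sketched in a few lines. The code below is an illustrative toy, not the article's actual models: a hypothetical extractive reader that assigns a score to every candidate answer span plus a special “no answer” score, and abstains whenever no span beats that score by a tunable margin.

```python
# Illustrative sketch (not from the article): an extractive reader scores
# candidate answer spans AND a special "no answer" option; the system
# abstains when the no-answer score wins.

def predict(span_scores, no_answer_score, threshold=0.0):
    """span_scores: dict mapping (start, end) token positions to scores.

    Returns the best span, or None to signal "unanswerable".
    """
    best_span = max(span_scores, key=span_scores.get)
    best_score = span_scores[best_span]
    # Abstain when the no-answer score (plus a tunable threshold)
    # beats the best span score.
    if no_answer_score + threshold >= best_score:
        return None
    return best_span

# The no-answer score wins here, so the model abstains:
print(predict({(3, 5): 1.2, (7, 9): 0.8}, no_answer_score=2.0))  # None
```

In real systems the threshold is usually tuned on held-out data to trade off answering too eagerly against abstaining too often.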
Several specialized datasets, like SQuAD 2.0, are used to train and evaluate these systems. These include tens of thousands of questions—some answerable and some not—crafted to test how well AI can tell the difference. Newer datasets go further by labeling why a question is unanswerable (e.g., swapped entities, negation, or missing info), helping models learn not just that a question is unanswerable, but why.
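As a concrete illustration, SQuAD 2.0 distributes its questions in a nested JSON layout where each question carries an `is_impossible` flag. The tiny hand-made record below mimics that shape (the context and questions are invented for illustration):

```python
# A minimal record in the SQuAD 2.0 shape: articles -> paragraphs -> qas,
# with an `is_impossible` flag on each question. Content is invented.
squad_like = {
    "data": [{
        "paragraphs": [{
            "context": "Twilight Princess was released in 2006.",
            "qas": [
                {"id": "q1",
                 "question": "When was Twilight Princess released?",
                 "is_impossible": False,
                 "answers": [{"text": "2006", "answer_start": 34}]},
                {"id": "q2",
                 "question": "When was Australia Twilight released?",
                 "is_impossible": True,
                 "answers": []},
            ],
        }],
    }],
}

def count_unanswerable(dataset):
    """Walk the nested layout and tally unanswerable questions."""
    total = unanswerable = 0
    for article in dataset["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                total += 1
                unanswerable += bool(qa.get("is_impossible", False))
    return total, unanswerable

print(count_unanswerable(squad_like))  # (2, 1)
```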
However, the way these questions are created can introduce bias. Some datasets contain “trick” questions that use patterns AI can learn to spot without true understanding. In contrast, real-world questions—like those from actual users—are often vaguer and harder to detect. This means AI that performs well in labs might still struggle in the real world.
To measure performance, researchers use metrics like Exact Match (does the predicted answer exactly match a reference answer after normalization?), F1 score (how much token overlap is there between the prediction and the correct answer?), and other measures of how similar the AI’s answer is to the expected one.
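These two metrics are simple enough to sketch directly. The functions below follow the usual SQuAD-style convention of normalizing answers (lowercasing, stripping punctuation and articles) before comparing; they are a minimal illustration, not the official scoring script:

```python
import re
import string
from collections import Counter

def normalize(text):
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, gold):
    """1.0 if normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))

def f1_score(prediction, gold):
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("The year 2006", "2006"))         # 0.0
print(round(f1_score("the year 2006", "2006"), 2))  # 0.67
```

The example shows why F1 is reported alongside Exact Match: a prediction with extra words scores zero on EM but still gets partial credit for overlapping tokens.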
The article concludes that while progress is being made, there's still much to learn before AI can confidently recognize when no answer exists—and explain its reasoning in a trustworthy way.
Noorian, Z. Exploring Unanswerability in Machine Reading Comprehension: Approaches, Benchmarks, and Open Challenges. Artificial Intelligence Review (accepted).