Don’t you get it?! Research examines how to help AI tools better detect sarcasm
Have you ever had a hard time deciding whether something is sarcastic? Context always matters when identifying sarcasm, but it’s far easier to judge in face-to-face conversations, where body language and tone of voice help make the intent clear.
In the digital age, however, more of our interactions are online and text-based. Without either body language or an audible voice in these exchanges, sarcasm is harder to spot.
Detecting sarcasm in text-based exchanges isn’t just hard for people – large language models (LLMs) also find it challenging. For human users of these artificial intelligence tools, a consistent failure to recognize a sarcastic reply runs the risk of exacerbating whatever issue provoked that sarcasm in the first place.
When sarcasm leads to misunderstanding
Research led by Toronto Metropolitan University computer science professor Andriy Miranskyy is studying how LLMs and other AI tools can be trained and refined to better detect sarcasm, with the goal of improving human-computer interaction. The idea came from a curiosity about how AI-driven customer service systems might be able to better respond to frustrated users.
“We wanted to see what happens if a customer service ticket stays open too long, for instance,” professor Miranskyy said. “Would someone get annoyed because of that and make sarcastic comments? It's probably not a bad idea to detect sarcasm in that situation and have some mitigation, because sarcasm typically crops up when we're not happy. Detecting that might help streamline and smooth the conversation.”
Improved sarcasm detection in AI also has applications beyond customer service, including monitoring social media and moderating online discussions.
Professor Miranskyy’s research team included former TMU master’s student Montgomery Gole, as well as former postdoctoral fellow Williams-Paul Nwadiugwu. The group based their study on a dataset of 1.3 million comments drawn from the social media site Reddit, where users conventionally mark posts intended as sarcasm by ending them with the tag /s.
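That /s convention is what makes a dataset of this size practical to label automatically. As a rough Python sketch of the idea (assuming the tag sits at the end of a comment; the study’s actual preprocessing may differ):

```python
import re

# Hypothetical labelling helper: treat a comment as sarcastic if it ends
# with Reddit's "/s" marker, then strip the marker so a model never sees
# the very label it is being trained to predict.
SARCASM_TAG = re.compile(r"\s*/s\s*$", re.IGNORECASE)

def label_comment(text: str) -> tuple[str, int]:
    """Return (cleaned_text, label), with label 1 meaning sarcastic."""
    if SARCASM_TAG.search(text):
        return SARCASM_TAG.sub("", text), 1
    return text, 0

comments = [
    "Oh great, another ticket that has been open for three weeks /s",
    "Thanks, the fix worked on the first try.",
]
print([label_comment(c) for c in comments])
# [('Oh great, another ticket that has been open for three weeks', 1),
#  ('Thanks, the fix worked on the first try.', 0)]
```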
The project used an existing benchmark established by a different team of researchers who studied the same trove of Reddit posts, using a panel of five people to assess whether each post was sarcastic. In doing so, they established a baseline for average human sarcasm detection on the dataset: an accuracy rate of 83 per cent.
In comparison, professor Miranskyy’s team found that an off-the-shelf, untuned version of the LLM behind ChatGPT identified approximately 70 per cent of the sarcastic comments among the Reddit posts.
Improving accuracy through fine-tuning
An LLM’s output and behaviour are governed by billions of internal settings called parameters (the choices that steer the training process itself are known as hyperparameters). By “freezing” millions of these parameters and tweaking others, the team was able to train different LLMs to identify sarcasm more accurately.
“We did what's called fine-tuning, where you give the model extra data and hope it will learn something,” professor Miranskyy said.
That fine-tuning involved teaching the models to identify common linguistic patterns of sarcastic language, such as contradictory or exaggerated statements.
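As a hedged illustration of what that kind of parameter-efficient fine-tuning can look like in practice, the sketch below uses the Hugging Face transformers, datasets and peft libraries to attach small trainable LoRA adapters to a frozen base model. The model name, adapter settings and two-example dataset are placeholders, not the study’s actual configuration:

```python
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Placeholder base model; the study worked with Llama and other LLMs.
model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name,
                                                           num_labels=2)

# LoRA freezes the base model's weights and trains only small added
# adapter matrices: the "freezing and tweaking" described above.
model = get_peft_model(model, LoraConfig(task_type="SEQ_CLS", r=8,
                                         lora_alpha=16, lora_dropout=0.1))

# Toy training data: label 1 = sarcastic, 0 = not sarcastic.
train_data = Dataset.from_dict({
    "text": ["Wow, amazing service, it only took a month /s",
             "The support team resolved my issue quickly."],
    "label": [1, 0],
}).map(lambda batch: tokenizer(batch["text"], truncation=True,
                               padding="max_length", max_length=64),
       batched=True)

Trainer(
    model=model,
    args=TrainingArguments(output_dir="sarcasm-ft", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=train_data,
).train()
```

Because only the small adapter matrices are updated, the base model’s original weights stay intact, which keeps this style of fine-tuning cheap relative to retraining billions of parameters.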
The best results came from a fine-tuned version of the Llama LLM developed by Meta, which effectively matched the human benchmark by accurately identifying sarcasm 83 per cent of the time. However, while the team’s fine-tuned LLMs got better at detecting sarcasm, that narrow specialization left them less effective at handling other kinds of requests.
“You can’t just use (the sarcasm detector) as a single customer service chatbot,” professor Miranskyy explained. “Perhaps the fine-tuned model can be a building block in creating some sort of generalized model that converses with clients, a separate stream feeding on the same text messages. It could say ‘Look out, big model, this is a sarcastic comment. When you're replying, be careful.’”
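In code, that two-stream arrangement might look something like the following Python sketch; the function names, the placeholder heuristic and the prompt wording are illustrations of the idea, not the team’s implementation:

```python
def detect_sarcasm(message: str) -> bool:
    """Stand-in for the fine-tuned sarcasm classifier; a real system
    would call the specialized model here instead."""
    return message.rstrip().endswith("/s")  # crude placeholder heuristic

def build_prompt(message: str) -> str:
    """Assemble the prompt the general-purpose chatbot sees, including
    a warning from the sarcasm detector when one fires."""
    prompt = f"Customer message: {message}\n"
    if detect_sarcasm(message):
        prompt += ("Note: this message appears sarcastic, so the customer "
                   "is likely frustrated. Reply with extra care.\n")
    return prompt + "Write a helpful reply:"

print(build_prompt("Great, my ticket has only been open for a month /s"))
```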
Read the paper, “Assessing how hyperparameters impact Large Language Models’ sarcasm detection performance,” on arXiv.