Researchers develop sentence rewriting technique to fool text classifiers

A recent paper coauthored by MIT researchers highlights the problem of sentence-level attacks against text classifiers, in which an attacker alters a sentence to trigger misclassification while keeping the sentence’s literal meaning unchanged.

Text classifiers are used in a range of applications, particularly document processing. Such systems allow companies to structure, normalize, and standardize business information like email, legal documents, webpages, and chat conversations. Attacks on these classifiers could be disastrous in industries like home lending, which increasingly relies on AI to process the hundreds of pages associated with mortgages.

The researchers’ framework, conditional BERT sampling (CBS), feeds sentences from an AI language model to RewritingSampler, an instance of CBS that rewrites the sentences specifically to attack classifiers. The researchers claim that, in their experiments, CBS and RewritingSampler achieve a higher attack success rate than existing word-level methods.

The researchers’ CBS framework and RewritingSampler start with a seed sentence, then iteratively sample and replace words in the sentence a given number of times. They use the sum of word embeddings (a type of word representation in which words with similar meanings have similar representations) to minimize the semantic differences between the original and rewritten sentences. OpenAI’s GPT-2 language model checks the grammatical quality of the rewritten sentences, allowing for controlled, flexible rewriting.
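The sample-and-replace loop described above can be illustrated with a toy sketch. This is not the researchers’ implementation: the two-dimensional word vectors, the candidate table (standing in for sampling words from a BERT-style model), and the names `rewrite`, `sentence_vector`, and `min_similarity` are all hypothetical. The GPT-2 grammar check and the step that steers the rewrite toward a misclassification are left out. Only the constraint that summed embeddings stay close to the original is shown.

```python
import math
import random

# Hypothetical 2-d word vectors; a real system would load pretrained embeddings.
VECTORS = {
    "the": (0.1, 0.1), "was": (0.1, 0.2),
    "film": (0.9, 0.2), "movie": (0.88, 0.25),
    "great": (0.3, 0.9), "excellent": (0.32, 0.88),
    "terrible": (0.3, -0.9),
}

# Hypothetical candidate table, standing in for words sampled from a language model.
CANDIDATES = {
    "film": ["movie"],
    "great": ["excellent", "terrible"],
}


def sentence_vector(words):
    """Sum the word vectors of a sentence; unknown words contribute nothing."""
    total = [0.0, 0.0]
    for word in words:
        x, y = VECTORS.get(word, (0.0, 0.0))
        total[0] += x
        total[1] += y
    return total


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rewrite(seed, steps=20, min_similarity=0.95, rng=None):
    """Iteratively replace words, rejecting any change that drifts in meaning."""
    rng = rng or random.Random(0)
    original = seed.lower().split()
    target = sentence_vector(original)
    words = list(original)
    for _ in range(steps):
        i = rng.randrange(len(words))
        options = CANDIDATES.get(original[i])
        if not options:
            continue  # no replacement candidates for this position
        trial = list(words)
        trial[i] = rng.choice(options + [original[i]])
        # Keep the replacement only if the summed embedding stays close.
        if cosine(sentence_vector(trial), target) >= min_similarity:
            words = trial
    return " ".join(words)


print(rewrite("the film was great"))
```

With these toy vectors, swapping “great” for “terrible” flips the summed embedding far enough that the replacement is always rejected, while near-synonyms such as “excellent” or “movie” pass the threshold.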

In experiments involving text classification datasets of news, movie reviews, Yelp reviews, and IMDB movie reviews, along with two natural language inference datasets, the researchers found that their approach “significantly” outperformed a baseline. For example, given the sentence “Turkey is put on track for EU membership,” which the target classifier labels “World,” the rewritten sentence “EU puts Turkey on track for full membership” yields the classification “Business.” Theoretically, if the method were used against a real-world classification system, a document labeled “New York loan applications for October” could be mislabeled “Not urgent” as opposed to “Timely,” delaying processing.
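The article does not say how attack success rate is computed. One common convention, assumed here, counts only sentences the classifier originally labels correctly and reports the fraction whose rewritten version receives a different label. The sketch below uses that convention with a hypothetical rule-based `toy_classifier`; the function names and the second sentence pair are illustrative and do not come from the paper.

```python
def attack_success_rate(classify, originals, rewrites, labels):
    """Fraction of correctly classified originals whose rewrite is misclassified."""
    attempted = 0
    succeeded = 0
    for original, rewritten, label in zip(originals, rewrites, labels):
        if classify(original) != label:
            continue  # already misclassified, so not counted as an attack target
        attempted += 1
        if classify(rewritten) != label:
            succeeded += 1
    return succeeded / attempted if attempted else 0.0


def toy_classifier(text):
    """Hypothetical stand-in classifier: the first word alone decides the label."""
    return "Business" if text.split()[0].lower() == "eu" else "World"


originals = [
    "Turkey is put on track for EU membership",
    "Turkey talks continue",  # illustrative sentence, not from the paper
]
rewrites = [
    "EU puts Turkey on track for full membership",
    "Talks with Turkey continue",  # illustrative rewrite, not from the paper
]
labels = ["World", "World"]

rate = attack_success_rate(toy_classifier, originals, rewrites, labels)
print(rate)
```

Here one of the two rewrites changes the toy classifier’s label, so the rate is 0.5.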

“Most adversarial attack methods that are designed to deceive a text classifier change the text classifier’s prediction by modifying a few words or characters. Few try to attack classifiers by rewriting a whole sentence, due to the difficulties inherent in sentence-level rephrasing as well as the problem of setting the criteria for legitimate rewriting,” the researchers wrote. “We solve the problems [with our framework].”

The work builds on TextFooler, a framework for synthesizing adversarial text examples designed by researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), the University of Hong Kong, and Singapore’s Agency for Science, Technology, and Research. Like the coauthors of this latest work, TextFooler’s creators note that while the system could be misused for attacks, it can also be used to test the robustness of models and improve their generalization via adversarial training.

“If [language models] are vulnerable to purposeful adversarial attacking, then the consequences may be disastrous,” Di Jin, MIT Ph.D. student and lead author on the TextFooler research paper, said in a previous statement. “These tools need to have effective defense approaches to protect themselves, and in order to make such a safe defense system, we need to first examine the adversarial methods.”
