Advances in machine learning, particularly in computer vision and facial recognition, have enabled systems deployed by governments and private corporations to erode expectations of privacy. Adversarial attacks, techniques that attempt to fool models with deceptive images, videos, and text, are one way to combat surveillance. But in a new paper, researchers from Microsoft, Harvard, Swarthmore, and Citizen Lab claim that most attacks proposed to date aren’t practical or “ready for the ‘real world.’”
Attack studies that lack sufficient experimentation can cause real harm. The researchers say they’ve been approached by advocates who wish to use adversarial techniques in high-risk contexts, where a failed technique could lead to imprisonment. “Increased testing of adversarial machine learning techniques, especially with groups from diverse backgrounds, will increase knowledge about the effectiveness of these techniques across populations,” the researchers wrote. “This could potentially lead to improved understanding and effectiveness of adversarial machine learning attacks.”
The researchers focused on studies detailing three kinds of computer vision attacks involving humans: (1) adversarial clothing and patches, (2) adversarial hats, and (3) adversarial eyewear. They paid particular attention to studies that received widespread press coverage, homing in on those that claimed to enable wearers to avoid detection or facial recognition.
The researchers found that in some of the papers they reviewed, the authors didn’t thoroughly test their attacks in real life or didn’t describe them in a way that allowed for either evaluation or replication. None of the studies discussed consent or formal approval by an institutional review board. And only some of the papers blurred participants’ images to preserve privacy.
More problematically, the researchers found that all of the papers they reviewed had “noticeably” small sample sizes, usually only one or two people. Authors often tested their attacks on themselves or on friends and colleagues. As a result, the characteristics of the test subjects that could inform the generalizability of attacks, including their gender, race, and age, were rarely reported.
“In some papers we reviewed, the authors did not robustly test the tech in real life, or if they did, did not describe it in a way that allowed for either evaluation or replication of their methods. [And] across many papers, there was no opportunity to observe whether there even could be variance across human test subjects,” the researchers wrote. “For some technologies, we could theorize a physical way in which an attack might interact with different skin colors, skin textures, etc. Likewise, in clothing testing, there was rarely variation in the person’s weight and height, or how the clothes were worn, meaning that it is unclear how widely generalizable the results are to other circumstances.”
To address the lack of test subject diversity, the researchers recommend that adversarial attack studies take into account factors like height, weight, body shape and size, and mobility aids. They also advocate testing in varied real-world environments with different perspectives and contextualizing the research in terms of the populations that might seek to use it, reflecting the fact that different populations will have different cultural contexts. Moreover, the researchers highlight the need to carefully document methods so that other researchers can replicate them, and they call on journals and conferences to require that authors clearly state the limitations of their trials.