Researchers’ AI system strips identifiable attributes like gender from speech recordings

In a study accepted to the 2020 International Conference on Machine Learning last week, researchers at the Chalmers University of Technology and the RISE Research Institutes of Sweden propose a privacy-preserving technique that learns to obfuscate attributes like gender in speech data. They use a model that's trained to filter sensitive information in recordings and then generate new, private information independent of the filtered one, ensuring sensitive information stays hidden without sacrificing realism and utility.

Maintaining privacy without dispensing with voice assistants altogether is a challenging task, given that state-of-the-art AI techniques have been used to infer attributes like intention, gender, emotional state, and identity from timbre, pitch, and speaker style. Recent reporting revealed that accidental voice assistant activations exposed workers to private conversations; the risk is such that law firms including Mishcon de Reya have advised staff to mute smart speakers when they discuss client matters at home. Google Assistant, Siri, Cortana, and other major voice recognition platforms allow the deletion of recorded data, but this requires some effort on users' parts, and in some cases substantial effort.

The researchers' solution employs a generative adversarial network (GAN) called PCMelGAN, a two-part AI model consisting of a generator that creates samples and a discriminator that attempts to differentiate between the generated samples and real-world samples. It maps speech recordings to mel spectrograms, or representations of the spectrum of frequencies of the audio signal as it varies over time, and passes them through a filter that removes sensitive information and a generator that adds synthetic information in its place. PCMelGAN then inverts the mel spectrogram output into audio in the form of a raw waveform.
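The mel spectrogram step can be illustrated with a minimal Python sketch. It is not the researchers' code: the frame size, hop length, number of mel bands, and the HTK-style mel formula are all assumptions chosen for a small demo, and the naive DFT stands in for the FFT a real pipeline would use.

```python
import cmath
import math


def hz_to_mel(hz):
    """Convert a frequency in Hz to mels (HTK-style formula, an assumption here)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Build triangular filters whose edges are evenly spaced on the mel scale."""
    n_bins = n_fft // 2 + 1
    top = hz_to_mel(sample_rate / 2.0)
    # n_mels + 2 edge points: each filter spans three consecutive edges.
    edges_hz = [mel_to_hz(top * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bank = []
    for m in range(n_mels):
        lo, mid, hi = edges_hz[m], edges_hz[m + 1], edges_hz[m + 2]
        row = []
        for k in range(n_bins):
            f = k * sample_rate / n_fft  # center frequency of DFT bin k
            rising = (f - lo) / (mid - lo)
            falling = (hi - f) / (hi - mid)
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank


def power_spectrum(frame):
    """Naive O(n^2) DFT power spectrum of one Hann-windowed frame."""
    n = len(frame)
    windowed = [
        x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
        for i, x in enumerate(frame)
    ]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(windowed)
        )
        spectrum.append(abs(acc) ** 2)
    return spectrum


def mel_spectrogram(signal, sample_rate, n_fft=256, hop=128, n_mels=20):
    """Return one list of mel-band energies per overlapping frame."""
    bank = mel_filterbank(n_mels, n_fft, sample_rate)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        power = power_spectrum(signal[start:start + n_fft])
        frames.append([sum(w * p for w, p in zip(row, power)) for row in bank])
    return frames


# Demo input: a synthetic 440 Hz tone, not real speech.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 440 * t / SAMPLE_RATE) for t in range(1024)]
spec = mel_spectrogram(tone, SAMPLE_RATE)
```

Each entry of `spec` is one time frame, so the result is a time-by-frequency grid in which low frequencies get finer resolution than high ones. That grid is the kind of representation the filter and generator would operate on before the output is inverted back into a waveform.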

In experiments, the researchers trained PCMelGAN on 10,000 samples from the open source AudioMNIST data set, which comprises 30,000 audio recordings of the digits zero through nine spoken in English. They measured privacy by determining whether a classifier could predict a speaker's original gender with better than 50% accuracy after five runs on the spectrograms and the raw audio.
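The privacy check can be sketched roughly as follows. This assumes a balanced binary label (which the 50% threshold implies) and simply averages an attacker classifier's accuracy over five runs; the function names and the labels in the example are hypothetical, not data from the study.

```python
from statistics import mean

# Chance level for a balanced binary attribute (assumption implied by the 50% threshold).
CHANCE_LEVEL = 0.5


def accuracy(predicted, actual):
    """Fraction of samples where the attacker's guess matches the true label."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("predicted and actual must be non-empty and equal in length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def privacy_report(runs):
    """Summarize attacker accuracy over several runs.

    runs is a list of (predicted_labels, actual_labels) pairs, one per run.
    """
    per_run = [accuracy(predicted, actual) for predicted, actual in runs]
    avg = mean(per_run)
    return {
        "per_run": per_run,
        "mean_accuracy": avg,
        "beats_chance": avg > CHANCE_LEVEL,
    }


# Hypothetical attacker predictions on obfuscated audio, five runs of four clips.
actual = ["f", "m", "f", "m"]
runs = [
    (["f", "f", "f", "f"], actual),
    (["m", "m", "m", "f"], actual),
    (["f", "m", "f", "f"], actual),
    (["m", "m", "m", "m"], actual),
    (["f", "f", "m", "m"], actual),
]
report = privacy_report(runs)
```

With these made-up labels the attacker averages exactly chance-level accuracy, so `beats_chance` is false, which is the outcome an obfuscation method aims for.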

Here’s a recording of somebody saying “four”:

And here's PCMelGAN's output:

Here’s somebody saying “six”:

And here's PCMelGAN's output:


According to the researchers, the results show PCMelGAN makes it empirically difficult for adversaries to, for example, infer the gender of the speaker, while retaining qualities including intonation and content. "The proposed method can successfully obfuscate sensitive attributes in speech data and generates realistic speech independent of the sensitive input attribute. Our results for censoring the gender attribute on the AudioMNIST dataset, demonstrate that the method can maintain a high level of utility," they wrote. "As more data is collected in various settings across organizations, companies, and countries, there has been an increase in the demand of user privacy."
