Researchers at Salesforce and the University of Virginia have proposed a new way to mitigate gender bias in word embeddings, the word representations used to train AI models to summarize, translate languages, and perform other prediction tasks. The team says correcting for certain regularities, like word frequency in large data sets, allows their method to “purify” the embeddings prior to inference, removing potentially gendered words.
Word embeddings capture semantic and syntactic meanings of words and their relationships with other words, which is why they’re commonly employed in natural language processing. But they’ve been criticized for inheriting gender bias, which ties the embedding of gender-neutral words to a certain gender. For example, while “brilliant” and “genius” are gender-neutral by definition, their embeddings are associated with “he,” while “homemaker” and “sewing” are more closely associated with “she.”
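One rough way to see this kind of association is to compare a word’s cosine similarity to “he” with its similarity to “she.” The sketch below is illustrative only: the three-dimensional vectors are invented toy values, not real embeddings, and `gender_lean` is a hypothetical helper name rather than a measure taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def gender_lean(word_vec, he_vec, she_vec):
    """Positive means closer to 'he'; negative means closer to 'she'."""
    return cosine(word_vec, he_vec) - cosine(word_vec, she_vec)


# Toy vectors invented for illustration (assumption, not real data).
toy = {
    "he": [1.0, 0.2, 0.0],
    "she": [-1.0, 0.2, 0.0],
    "genius": [0.4, 0.9, 0.1],
    "homemaker": [-0.5, 0.8, 0.2],
}

for word in ("genius", "homemaker"):
    lean = gender_lean(toy[word], toy["he"], toy["she"])
    print(word, "leans", "he" if lean > 0 else "she")
```

With real embeddings, the same comparison would be run over the model’s actual vectors for each word.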
Previous work has aimed to reduce gender bias by subtracting the component associated with gender from embeddings through a post-processing step. But while this alleviates gender bias in some settings, its effectiveness is limited because the gender bias can still be recovered post-debiasing.
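The subtraction step can be sketched as a vector projection. This is a minimal illustration under stated assumptions: the gender direction is taken to be the difference between the toy vectors for “he” and “she” (published methods may estimate it differently), and all vectors are invented values.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def remove_component(vec, direction):
    """Subtract from vec its projection onto direction."""
    scale = dot(vec, direction) / dot(direction, direction)
    return [x - scale * d for x, d in zip(vec, direction)]


# Toy vectors invented for illustration (assumption, not real data).
he = [1.0, 0.2, 0.0]
she = [-1.0, 0.2, 0.0]
genius = [0.4, 0.9, 0.1]

# Assumed gender direction: difference of one definitional pair.
gender_direction = [a - b for a, b in zip(he, she)]

debiased = remove_component(genius, gender_direction)
# After the projection, the vector has no component along the gender direction.
print(dot(debiased, gender_direction))
```

The projection only zeroes out one direction, which is consistent with the limitation noted above: gender information spread across other dimensions is left untouched.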

Above: A schematic illustrating Double-Hard Debias.
Image Credit: Salesforce
Salesforce’s proposed alternative, Double-Hard Debias, transforms the embedding space into an ostensibly genderless one. That is, it transforms word embeddings into a “subspace” that can be used to find the dimension that encodes frequency information distracting from the encoded genders. It then “projects away” the gender component along this dimension to obtain revised embeddings before executing another debiasing action.
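The two stages described above can be sketched as sequential projections. This is a simplified illustration, not the authors’ implementation: it assumes the frequency-related direction is already known (the hypothetical `frequency_direction` argument), whereas the method as described finds that dimension itself. The function name `two_step_debias` and all vectors are invented for the example.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def project_away(vec, direction):
    """Remove from vec its component along direction."""
    scale = dot(vec, direction) / dot(direction, direction)
    return [x - scale * d for x, d in zip(vec, direction)]


def two_step_debias(embeddings, frequency_direction, pair=("he", "she")):
    """Hypothetical two-stage sketch: frequency first, then gender."""
    # Stage 1: remove the (assumed known) frequency direction everywhere.
    purified = {w: project_away(v, frequency_direction)
                for w, v in embeddings.items()}
    # Stage 2: recompute the gender direction on the purified vectors
    # and remove it from every word except the definitional pair.
    male, female = pair
    gender_direction = [a - b
                        for a, b in zip(purified[male], purified[female])]
    return {w: (v if w in pair else project_away(v, gender_direction))
            for w, v in purified.items()}


# Toy vectors invented for illustration; the third axis stands in for frequency.
toy = {
    "he": [1.0, 0.2, 0.5],
    "she": [-1.0, 0.2, 0.3],
    "genius": [0.4, 0.9, 0.1],
}
result = two_step_debias(toy, frequency_direction=[0.0, 0.0, 1.0])
print(result["genius"])
```

Recomputing the gender direction after the first stage keeps it orthogonal to the frequency direction, so the second projection does not reintroduce the component removed by the first.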
To evaluate their approach, the researchers tested it against the WinoBias data set, which consists of pro-gender-stereotype and anti-gender-stereotype sentences. (For example, “The physician hired the secretary because he was overwhelmed with clients” versus “The physician hired the secretary because she was overwhelmed with clients.”) The gap between a system’s performance on the two sentence groups yields a “gender bias” score.
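A gap-based score of this kind can be sketched as the absolute difference between performance on the two sentence groups. The example is an assumption-laden illustration: it uses simple accuracy as the performance measure (the benchmark’s actual metric may differ), and the predictions are invented.

```python
def accuracy(predictions, gold):
    """Percentage of predictions that match the gold labels."""
    correct = sum(p == g for p, g in zip(predictions, gold))
    return 100.0 * correct / len(gold)


def bias_score(pro_preds, pro_gold, anti_preds, anti_gold):
    """Absolute performance gap between the two sentence groups."""
    return abs(accuracy(pro_preds, pro_gold) - accuracy(anti_preds, anti_gold))


# Invented toy data: which entity the system links each pronoun to.
pro_gold = ["physician", "physician", "secretary", "secretary"]
pro_preds = ["physician", "physician", "secretary", "secretary"]    # 4 of 4 correct
anti_gold = ["physician", "physician", "secretary", "secretary"]
anti_preds = ["physician", "secretary", "secretary", "secretary"]   # 3 of 4 correct

print(bias_score(pro_preds, pro_gold, anti_preds, anti_gold))  # 25.0
```

A score of zero would mean the system performs equally well whether or not a sentence matches the stereotype.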

Above: Benchmark results for Double-Hard Debias.
Image Credit: Salesforce
The researchers report that Double-Hard Debias reduced the bias score of embeddings obtained using the GloVe algorithm from 15 (on two types of sentences) to 7.7 while preserving the semantic information. They also claim that on a visualization (t-SNE projection) meant to model embeddings so that similar embeddings are clustered nearest each other and dissimilar ones are spread apart, Double-Hard Debias produced a more homogeneous mix of embeddings compared with other methods.

Above: A t-SNE projection of the embeddings.
Image Credit: Salesforce
It’s worth noting that some experts believe bias can’t be fully eliminated from word embeddings. In a recent meta-analysis from the Technical University of Munich, contributors claim there’s “no such thing” as naturally occurring neutral text because the semantic content of words is always bound up with the sociopolitical context of a society.
Nonetheless, the Salesforce and University of Virginia team believes its technique measurably reduces the gender bias present in embeddings.
“We found that simple changes in word frequency statistics can have an undesirable impact on the debiasing methods used to remove gender bias from word embeddings,” wrote the coauthors of the Double-Hard Debias paper. “[Our method] mitigates the negative effects that word frequency features can have on debiasing algorithms. We believe it is important to deliver fair and useful word embeddings, and we hope that this work inspires further research along this direction.”