Can face masks affect the accuracy of automatic speech recognition systems? That’s the question researchers at the Educational Testing Service (ETS), the nonprofit assessment organization headquartered in Princeton, New Jersey, sought to answer in a study published this week. Drawing on recordings from ETS’ English language proficiency test, for which exam-takers were required to wear face masks, they found that while differences between the recordings and no-mask baselines existed, they didn’t lead to “significant” variations in scores.
The pandemic has led to a dramatic increase in the use of face masks worldwide, with 65% of U.S. adults saying they wore a mask in stores during the month of May, according to the Pew Research Center. This has potential implications for the speech algorithms underpinning smart speakers, smart displays, mobile apps, and indeed automated language proficiency tests. Face coverings come in all sizes and thicknesses and can affect a wearer’s speech patterns, for example by distorting the sound of a person’s speech or by substantially attenuating it.
The researchers set out to determine whether masks might introduce bias in automated language proficiency tests, a salient question considering that regulations in some parts of the world make mask-wearing mandatory for test-takers. They collated a corpus of 1,188 responses from 597 people, collected during a language test in Hong Kong between February and March in which exam-takers were tasked with answering questions for between 45 seconds and 1 minute each.
For comparison, the researchers created a baseline of 1,200 spoken responses from 300 people who took the test in fall 2019, before mask-wearing rules were implemented.
To tease out the differences between the two data sets, the researchers extracted and compared 88 acoustic features, including measurements related to frequency, amplitude, and spectral characteristics. They also considered whether face masks had any effect on test-taker speech patterns, using speech recognition hypotheses and timestamps to compute features designed to capture whether those wearing masks paused more, spoke more slowly, or showed different patterns of disfluencies.
The coauthors report that four features they considered (the average duration of silences and the number of silences per word, as well as the duration of chunks between pauses) showed differences. Speakers wearing masks spoke at about the same articulation rate as those not wearing masks but paused slightly more often, and wearing masks reduced the length of chunks between two pauses by 0.6 words (or 0.2 seconds).
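As a rough illustration of how pause features like these can be derived from recognizer timestamps, here is a minimal Python sketch. It is not the researchers' code: the `(token, start, end)` word format, the function name `pause_features`, and the 0.15-second silence threshold are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical input format: (token, start_sec, end_sec) for each recognized word.
Word = Tuple[str, float, float]


def pause_features(words: List[Word], min_pause: float = 0.15) -> Dict[str, float]:
    """Compute simple pause statistics from time-aligned words.

    min_pause is an assumed threshold (in seconds): any gap between
    consecutive words at least this long is counted as a silence.
    """
    if not words:
        raise ValueError("need at least one word")

    silences: List[float] = []     # durations of gaps counted as silences
    chunk_words: List[int] = []    # words in each chunk between silences
    chunk_secs: List[float] = []   # duration of each chunk in seconds

    chunk_start = words[0][1]
    count = 1
    for prev, cur in zip(words, words[1:]):
        gap = cur[1] - prev[2]
        if gap >= min_pause:
            # Close the current chunk and start a new one after the silence.
            silences.append(gap)
            chunk_words.append(count)
            chunk_secs.append(prev[2] - chunk_start)
            chunk_start = cur[1]
            count = 1
        else:
            count += 1
    # Close the final chunk.
    chunk_words.append(count)
    chunk_secs.append(words[-1][2] - chunk_start)

    return {
        "mean_silence_sec": sum(silences) / len(silences) if silences else 0.0,
        "silences_per_word": len(silences) / len(words),
        "mean_chunk_words": sum(chunk_words) / len(chunk_words),
        "mean_chunk_sec": sum(chunk_secs) / len(chunk_secs),
    }


if __name__ == "__main__":
    example = [
        ("i", 0.0, 0.2), ("like", 0.2, 0.5),
        ("tea", 1.0, 1.4), ("a", 1.4, 1.5), ("lot", 1.5, 1.9),
    ]
    print(pause_features(example))
```

Comparing such statistics across a masked group and a baseline group would show whether one group pauses more often or speaks in shorter chunks, which is the kind of difference described above.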
These differences, however, didn’t appear to manifest in speech recognition performance metrics. After evaluating a sample of 55 transcribed responses (28 from the mask-wearing group and 27 from the baseline group), the researchers found the word error rate was lower for mask wearers compared with those not wearing masks (27.6% versus 20.7%). They also report that the mean test scores across both groups were virtually the same: 2.79 for the baseline and 2.80 for the mask-wearers.
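For readers unfamiliar with the metric, word error rate is conventionally the word-level edit distance between a human transcript and the recognizer's output, divided by the number of words in the human transcript. The Python sketch below shows that standard definition; the function name `word_error_rate` and the whitespace tokenization are assumptions for illustration, and the study's own scoring pipeline may normalize text differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length.

    Tokenization by whitespace is an assumption; real evaluations
    usually normalize case and punctuation first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = cur[j - 1] + 1   # extra word in hypothesis
            cur[j] = min(substitution, deletion, insertion)
        prev = cur
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

A lower value means the recognizer's transcript is closer to the human one, so comparing the rate between the two groups indicates whether masks made recognition harder.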
The researchers might have a conflict of interest given that the test they evaluated is ETS’ own, and their sample size is on the small side. But in support of their findings, they cite earlier, smaller studies showing masks have “no significant effect” on language proficiency scores assigned by raters or on the accuracy of “closed-set” speaker identification systems.
“Our classifier experiments showed that it is possible to predict with almost 80% accuracy whether a test-taker is wearing a mask or not … However, these differences in acoustics and speech patterns did not have a further effect on the performance of automated speech recognition or the automated scoring engine,” the researchers wrote. “The differences we observed for low-level acoustic features suggest that some types of technologies and applications may be more affected than others.”
The work follows a study from Duke Kunshan University, Wuhan University, Lenovo, and Sun Yat-sen University in Guangzhou describing a system that can detect with 78.8% accuracy whether a person is wearing a mask from the sound of their speech. Masked speech detection is a sub-challenge at the 11th annual Computational Paralinguistics Challenge (ComParE), which is scheduled to take place during the upcoming Interspeech 2020 conference in October.