Deepgram, a Y Combinator graduate building custom speech recognition models, today announced it has raised $12 million in series A financing. CEO and cofounder Scott Stephenson says the proceeds will bolster the development of Deepgram's platform, which helps enterprises process meeting, call, and presentation recordings. If all goes according to plan, and Deepgram's scale eventually matches that of its competitors, it could save organizations valuable time by spotlighting key takeaways.
“Consumer-facing technologies like Alexa and Siri have set the stage for speech recognition,” said Stephenson. “However … pre-built speech recognition can only get you so far, and throwing resources at the problem won’t solve the issue either. At Deepgram, we’ve created an entirely different solution using end-to-end deep learning, resulting in a faster, much more accurate and reliable solution that truly addresses the needs of enterprise companies.”
Deepgram leverages a backend speech stack that eschews hand-engineered pipelines in favor of heuristic, statistics-based, and fully end-to-end AI processing, with hybrid models trained on PCs equipped with powerful graphics processing units. Each custom model is trained from the ground up and can ingest data in formats ranging from phone calls and podcasts to recorded meetings and videos. Deepgram processes the speech, which is stored in what’s called a “deep representation index” that groups sounds by phonetics rather than by words. Customers can search for words by the way they sound, and even when those words are misspelled, Deepgram can find them.
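To illustrate the general idea behind phonetic indexing, here is a minimal sketch using the classic Soundex algorithm. This is not Deepgram's proprietary "deep representation index" (which operates on learned acoustic representations); it simply shows how mapping words to sound-based keys lets a misspelled query still match the stored term.

```python
# Illustrative sketch only: classic Soundex encoding, not Deepgram's method.
# Words that sound alike map to the same key, so a phonetic index can match
# a misspelled query to the correctly spelled entry.

SOUNDEX_CODES = {
    **dict.fromkeys("bfpv", "1"),
    **dict.fromkeys("cgjkqsxz", "2"),
    **dict.fromkeys("dt", "3"),
    "l": "4",
    **dict.fromkeys("mn", "5"),
    "r": "6",
}

def soundex(word: str) -> str:
    """Encode a word as a 4-character Soundex key (letter + 3 digits)."""
    word = word.lower()
    first = word[0]
    digits = []
    prev = SOUNDEX_CODES.get(first, "")
    for ch in word[1:]:
        code = SOUNDEX_CODES.get(ch, "")
        if code and code != prev:
            digits.append(code)
        if ch not in "hw":  # 'h' and 'w' do not break adjacency
            prev = code
    return (first.upper() + "".join(digits) + "000")[:4]

# Build a phonetic index: sound key -> words that produce it.
index = {}
for term in ["Stephenson", "Deepgram", "transcription"]:
    index.setdefault(soundex(term), []).append(term)

# A misspelled query still lands on the right entry.
print(index.get(soundex("Stevenson")))  # misspelling of "Stephenson"
```

Here both "Stephenson" and "Stevenson" encode to the same key (S315), so the lookup succeeds despite the spelling difference; a system indexing raw audio by learned phonetic features would apply the same principle at a much finer grain.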
Stephenson says Deepgram’s models automatically pick up things like microphone noise profiles, as well as background noise, audio encodings, transmission protocols, accents, valence (i.e., energy), sentiment, topics of conversation, rates of speech, product names, and languages. Moreover, he claims they can improve speech recognition accuracy by 30% compared with industry baselines while speeding up transcription by 200 times, all while handling thousands of simultaneous audio streams.
Soon, the models will become even more capable with the launch of two new features: real-time streaming and on-premises deployment. Real-time streaming will let customers analyze and transcribe speech as words are being spoken, while on-premises deployment will provide a private, deployable instance of Deepgram’s product for use cases involving confidential, regulated, or otherwise sensitive audio data.
Deepgram is far from the only player in a speech recognition market that’s expected to be worth $21.5 billion by 2024, according to Markets and Markets. Tech giants like Nuance, Cisco, Google, Microsoft, and Amazon offer real-time voice transcription and captioning services, as do startups like Otter. There’s also Verbit, which recently raised $31 million for its human-in-the-loop AI transcription tech; Oto, which last December snagged $5.3 million to improve speech recognition with intonation data; and Voicera, which has raked in over $20 million for AI that draws insights from meeting notes.
But according to Stephenson, Deepgram hasn’t had much trouble attracting customers. It currently has more than 30, including Genesys, Memrise, Poly, Sharpen, and Observe.ai.
Wing VC led Deepgram’s series A raise, which saw participation from SAP.io, Y Combinator, and Nvidia GPU Ventures and which brings the total raised to date to over $13 million. The San Francisco-based company was founded in 2015 by University of Michigan physics graduate Noah Shutty and Stephenson, a Ph.D. student who previously worked on the University of California, Davis’ Large Underground Xenon Detector (LUX/LZ), a large and sensitive dark matter detector, and who helped to develop the school’s Davis Xenon (DaX) dual-phase liquid xenon detector program.