Mixed-modality speech recognition and interaction using a wearable artificial throat

Qisheng Yang, Weiqiu Jin, Qihang Zhang, Yuhong Wei, Zhanfeng Guo, Xiaoshi Li, Yi Yang*, Qingquan Luo*, He Tian*, Tian-Ling Ren*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

125 Scopus citations

Abstract

Researchers have recently been pursuing technologies for universal speech recognition and interaction that work well with subtle sounds or in noisy environments. Multichannel acoustic sensors can improve sound-recognition accuracy but lead to bulky devices that cannot be worn. To solve this problem, we propose a graphene-based intelligent wearable artificial throat (AT) that is sensitive to human speech and vocalization-related motions. Its perception of the mixed modalities of acoustic signals and mechanical motions enables the AT to acquire signals with a low fundamental frequency while remaining noise resistant. The experimental results showed that the mixed-modality AT can detect basic speech elements (phonemes, tones and words) with an average accuracy of 99.05%. We further demonstrated its interactive applications in speech recognition and voice reproduction for the vocally disabled. Through an ensemble AI model, it recognized everyday words vaguely spoken by a patient with laryngectomy with an accuracy of over 90%. The recognized content was synthesized into speech and played back on the AT to restore the patient's capability for vocalization. Its feasible fabrication process, stable performance, noise resistance and integrated vocalization make the AT a promising tool for next-generation speech recognition and interaction systems.
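The abstract attributes the >90% word-recognition result to an ensemble AI model but does not specify its architecture. As a purely illustrative sketch (not the authors' implementation), one common way an ensemble combines the outputs of several base classifiers is majority voting over their per-utterance predictions; the model names and labels below are hypothetical:

```python
from collections import Counter

def majority_vote(predictions):
    """Combine word labels predicted by several base models
    for one utterance by taking the most common label."""
    counts = Counter(predictions)
    label, _ = counts.most_common(1)[0]
    return label

# Hypothetical outputs of three base models for one spoken word
base_model_outputs = ["water", "water", "thanks"]
print(majority_vote(base_model_outputs))  # prints "water"
```

In practice such ensembles often weight each model's vote by its confidence or validation accuracy rather than counting votes equally; the unweighted version above is only the simplest case.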

Original language: English
Pages (from-to): 169-180
Number of pages: 12
Journal: Nature Machine Intelligence
Volume: 5
Issue number: 2
DOIs
State: Published - Feb 2023
Externally published: Yes
