Proceedings of the 10th Convention of the
European Acoustics Association
Forum Acusticum 2023


Politecnico di Torino
Torino, Italy
September 11 - 15, 2023





Session: A15-06: Application of Machine Learning and Automated Methods to Speech and Voice Research - Part II
Date: Tuesday 12 September 2023
Time: 15:20 - 15:40
Title: Speech-dependent Modeling of Own Voice Transfer Characteristics for In-ear Microphones in Hearables
Author(s): M. Ohlenbusch, Fraunhofer IDMT, Oldenburg Branch HSA, Marie-Curie-Strasse 2, 26129 Oldenburg, Germany
C. Rollwage, Fraunhofer IDMT, Oldenburg Branch HSA, Marie-Curie-Strasse 2, 26129 Oldenburg, Germany
S. Doclo, Fraunhofer IDMT, Oldenburg Branch HSA, Marie-Curie-Strasse 2, 26129 Oldenburg, Germany
Pages: 1899-1902
DOI: https://www.doi.org/10.61782/fa.2023.1030
PDF: https://dael.euracoustics.org/confs/fa2023/data/articles/001030.pdf
Abstract

Many hearables contain an in-ear microphone, which can be used to capture the user's own voice in noisy environments. Since the in-ear microphone mostly records body-conducted speech due to ear canal occlusion, it suffers from band-limitation effects while only capturing a limited amount of external noise. To enhance the quality of the in-ear microphone signal using algorithms aiming at joint bandwidth extension, equalization, and noise reduction, it is desirable to have an accurate model of the own voice transfer characteristics between the entrance of the ear canal and the in-ear microphone. Such a model can be used, e.g., to simulate a large amount of in-ear recordings to train supervised learning-based algorithms. Since previous research on ear canal occlusion suggests that own voice transfer characteristics depend on speech content, in this contribution we propose a speech-dependent system identification model based on phoneme recognition. We assess the accuracy of simulating own voice speech with speech-dependent and speech-independent modeling and investigate how well both modeling approaches generalize to different talkers. Simulation results show that the proposed speech-dependent model is preferable to a speech-independent model for simulating in-ear recordings.
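To illustrate the general idea of speech-dependent transfer modeling (not the authors' actual method, whose details are in the full paper), the sketch below applies a different impulse response to each speech frame depending on its recognized phoneme, as opposed to a single speech-independent filter. The phoneme labels and impulse responses are entirely hypothetical placeholders; real responses would be estimated between the ear-canal entrance and the in-ear microphone.

```python
import numpy as np

# Hypothetical per-phoneme impulse responses, illustrating the
# band-limitation of body-conducted speech. In practice these would be
# identified from paired recordings, not hand-chosen as here.
phoneme_irs = {
    "a": np.array([0.5, 0.3, 0.1]),
    "s": np.array([0.2, 0.1, 0.05]),
}

def simulate_in_ear(frames, phoneme_labels, irs):
    """Speech-dependent simulation sketch: filter each frame with the
    impulse response associated with its recognized phoneme label,
    then concatenate the filtered frames into one in-ear-like signal."""
    out = []
    for frame, ph in zip(frames, phoneme_labels):
        ir = irs[ph]
        # Truncate the convolution to the frame length (ignoring
        # filter tails across frame boundaries for simplicity).
        out.append(np.convolve(frame, ir)[: len(frame)])
    return np.concatenate(out)
```

A speech-independent model would instead use one fixed impulse response for every frame; the abstract's finding is that the per-phoneme variant simulates in-ear recordings more accurately.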