Proceedings of the 10th Convention of the
European Acoustics Association
Forum Acusticum 2023


Politecnico di Torino
Torino, Italy
September 11 - 15, 2023

Session: A13-05: Machine learning for acoustic sensor array processing
Date: Tuesday 12 September 2023
Time: 17:20 - 17:40
Title: Impact of input preprocessing for HRTF elevation classification over multiple datasets
Author(s): J.A. De Rus, Universitat de València, Avinguda de la Universitat, 46100 Burjassot, Spain
J. Lopez-Ballester, Universitat de València, Avinguda de la Universitat, 46100 Burjassot, Spain
M. Montagud, Universitat de València, Avinguda de la Universitat, 46100 Burjassot, Spain
F.J. Ferri, Universitat de València, Avinguda de la Universitat, 46100 Burjassot, Spain
J.J. López, iTEAM, Universitat Politècnica de València, Avinguda de la Universitat, 46100 Burjassot, Spain
M. Cobos, Universitat de València, Avinguda de la Universitat, 46100 Burjassot, Spain
Pages: 2161-2168
DOI: https://www.doi.org/10.61782/fa.2023.0984
PDF: https://dael.euracoustics.org/confs/fa2023/data/articles/000984.pdf
Conference proceedings
Abstract

The localization of sound sources on the horizontal plane is mainly aided by perceived interaural level and time differences. However, identifying elevation cues in Head-Related Transfer Functions (HRTFs) remains challenging. Spectral cues play a key role in localizing sources in elevation and are highly individual, resulting from anatomical characteristics specific to each person, such as the shape of the pinnae, head, or torso. In a previous study, we proposed a simple 1D convolutional neural network (CNN) trained to classify HRTF signals into different elevation sectors, so that spectral elevation cues could be identified using explainability techniques. Although the model obtained promising results, it was only trained and validated on the CIPIC database. In this work, we focus on developing a model that can generalize across multiple HRTF datasets to achieve good classification performance across various subjects and measurements. Since each dataset is acquired under different conditions (e.g., source signal used, distance between emitters and receivers, spatial resolution, calibration), the preprocessing of the data may significantly impact the overall inter-dataset model performance. We explore different preprocessing techniques and evaluate their impact on the classification task to select meaningful standardization strategies for working with multiple HRTF datasets.
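
As an illustration of what a cross-dataset standardization step might look like, the following is a minimal sketch and not the preprocessing evaluated in the paper, which the abstract does not specify. It assumes that each dataset provides head-related impulse responses (HRIRs) at its own sampling rate and length, and it brings them to a common rate, a fixed length, and a gain-normalized log-magnitude spectrum. The function names, the 44100 Hz target rate, the 256-tap length, and the choice of linear-interpolation resampling are all hypothetical choices made for this example; only the Python standard library is used so that the sketch runs as is.

```python
import cmath
import math


def resample_linear(hrir, fs_in, fs_out):
    """Resample by linear interpolation (a crude stand-in for a proper resampler)."""
    if fs_in == fs_out:
        return list(hrir)
    n_out = int(round(len(hrir) * fs_out / fs_in))
    out = []
    for k in range(n_out):
        t = k * fs_in / fs_out          # position expressed in input samples
        i = int(math.floor(t))
        frac = t - i
        a = hrir[i] if i < len(hrir) else 0.0
        b = hrir[i + 1] if i + 1 < len(hrir) else 0.0
        out.append((1.0 - frac) * a + frac * b)
    return out


def fix_length(x, n):
    """Truncate or zero-pad so that every response has exactly n samples."""
    return list(x[:n]) + [0.0] * max(0, n - len(x))


def log_magnitude_db(x, eps=1e-12):
    """One-sided DFT magnitude in dB (direct O(n^2) DFT, fine for short HRIRs)."""
    n = len(x)
    spec = []
    for k in range(n // 2 + 1):
        s = sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
        spec.append(20.0 * math.log10(abs(s) + eps))
    return spec


def standardize(hrir, fs_in, fs_target=44100, n_taps=256):
    """Hypothetical pipeline: common rate, common length, gain-free dB spectrum."""
    x = resample_linear(hrir, fs_in, fs_target)
    x = fix_length(x, n_taps)
    spec = log_magnitude_db(x)
    mean = sum(spec) / len(spec)
    # Subtracting the mean dB level removes broadband gain differences,
    # e.g. those caused by different calibration between datasets.
    return [v - mean for v in spec]


if __name__ == "__main__":
    # Toy impulse response sampled at 48 kHz (made-up values, not real HRTF data).
    toy_hrir = [0.0, 1.0, 0.5, -0.25] + [0.0] * 44
    features = standardize(toy_hrir, fs_in=48000, n_taps=64)
    print(len(features))
```

With this kind of normalization, two measurements of the same response that differ only by a constant gain map to (numerically almost) the same feature vector, which is one plausible way to reduce calibration differences between datasets before training a classifier.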