Learning speaker representation with semi-supervised learning approach for speaker profiling

Shangeth Rajaa, Pham Van Tung, Chng Eng Siong

October 2021


Abstract

We address speaker profiling, the estimation of speaker characteristics such as age and height, by proposing a semi-supervised framework that leverages external corpora to improve performance with limited training data. The approach combines three components: supervised learning, unsupervised speaker representation learning, and consistency training for robustness. Evaluated on the TIMIT and NISP datasets with LibriSpeech as the external corpus, the method achieves competitive results for age estimation, including RMSE of 6.8 and 7.4 years for male and female speakers respectively and an MAE of 4.8.
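The abstract reports results as RMSE and MAE. As a reminder of how these two error metrics are computed, here is a minimal sketch in plain Python. The function names and the sample ages are illustrative assumptions, not code or data from the paper.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared difference."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def mae(y_true, y_pred):
    """Mean absolute error: mean of the absolute differences."""
    absolute_errors = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return sum(absolute_errors) / len(absolute_errors)


# Hypothetical ages in years, made up for illustration only.
true_ages = [25.0, 40.0, 31.0, 58.0]
predicted_ages = [28.0, 36.0, 31.0, 50.0]

print(f"RMSE: {rmse(true_ages, predicted_ages):.2f} years")
print(f"MAE:  {mae(true_ages, predicted_ages):.2f} years")
```

RMSE penalizes large individual errors more heavily than MAE, so on the same predictions RMSE is never smaller than MAE. This is consistent with the reported RMSE values being higher than the reported MAE.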

Type

Preprint

Publication

arXiv preprint arXiv:2110.13653