Skit-S2I: An Indian Accented Speech to Intent Dataset

Shangeth Rajaa, Swaraj Dalmia, Kumarmanas Nethil

December 2022

Abstract

Traditional conversational systems extract text from speech using automatic speech recognition (ASR), then predict intent from the transcriptions. We present Skit-S2I, the first publicly available Indian-accented spoken language understanding (SLU) dataset in the banking domain with a conversational tone. The end-to-end SLU approach predicts speaker intent directly from the speech signal, avoiding cascading errors from ASR and reducing latency. We test various baseline models and pretrained speech encoders, and find that self-supervised learning (SSL) representations perform slightly better than ASR-based representations for this classification task.
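As a rough illustration of the end-to-end idea described above, the sketch below classifies an utterance directly from frame-level speech features, with no transcription step. It is a minimal sketch under stated assumptions, not the paper's baseline: the frame features stand in for the output of some pretrained speech encoder, and the function names, weights, and intent labels (`check_balance`, `block_card`) are hypothetical placeholders chosen for the example.

```python
import math
from typing import List, Sequence, Tuple


def mean_pool(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average frame-level features over time into one utterance vector."""
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / len(frames) for d in range(dim)]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_intent(
    frames: Sequence[Sequence[float]],
    weights: Sequence[Sequence[float]],
    biases: Sequence[float],
    labels: Sequence[str],
) -> Tuple[str, List[float]]:
    """Pool encoder frames, apply a linear layer, and pick the top intent."""
    pooled = mean_pool(frames)
    logits = [
        sum(w * x for w, x in zip(row, pooled)) + b
        for row, b in zip(weights, biases)
    ]
    probs = softmax(logits)
    best = max(range(len(labels)), key=probs.__getitem__)
    return labels[best], probs


# Hypothetical inputs: two frames of 2-dimensional encoder features,
# an identity weight matrix, and two made-up intent labels.
frames = [[1.0, 0.0], [3.0, 2.0]]
weights = [[1.0, 0.0], [0.0, 1.0]]
biases = [0.0, 0.0]
labels = ["check_balance", "block_card"]

intent, probs = predict_intent(frames, weights, biases, labels)
print(intent, probs)
```

In a real system the linear layer would be trained on labeled utterances and the frames would come from an actual speech encoder; the point here is only that the intent is predicted from the speech representation itself, so there is no intermediate transcript for errors to propagate through.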

Type

Preprint

Publication

arXiv preprint arXiv:2212.13015