Deep Learning

DualTurn: Learning Turn-Taking from Dual-Channel Generative Speech Pretraining

DualTurn learns natural conversational turn-taking via generative pretraining on dual-channel audio, outperforming larger models on turn prediction tasks.

SpeechLLM: Multi-Modal LLM for Speech Understanding

SpeechLLM: A multimodal LLM combining speech encoders with TinyLlama for joint ASR, gender, age, accent, and emotion prediction from audio.

Improving spoken language identification with MAP-Mix

MAP-Mix: Training-dynamics-guided data augmentation for spoken language identification, achieving a ~2% F1 improvement over random mixup (ICASSP 2023).

Improving end-to-end spoken language understanding with prosodic attention and distillation

Prosodic attention and distillation techniques that improve end-to-end SLU, achieving up to an 8% gain in intent classification accuracy on the SLURP dataset.

Skit-S2I: An Indian Accented Speech to Intent Dataset

Skit-S2I: The first publicly available Indian-accented SLU dataset in the banking domain for end-to-end speech-to-intent research.

Learning speaker representation with semi-supervised learning approach for speaker profiling

Semi-supervised framework for speaker profiling (age and height estimation) that leverages external unlabelled speech data via consistency training.

RL based framework to generate optimal data quality remediation sequence for machine learning

Reinforcement-learning-based framework that generates an optimal sequence of data quality remediation steps for machine learning pipelines.

Facial Emotion Recognition PyTorch ONNX

Recognizing facial emotions with a deep learning model trained in PyTorch, converted via ONNX, and deployed with TF.js.

Pneumonia Diagnosis with Deep Learning

Web application for diagnosing pneumonia with a deep learning model trained on and backed by the PyTorch framework.

Convolutional Feature Extraction and Neural Arithmetic Logic Units for Stock Prediction

Stock prediction with a CNN and Neural Arithmetic Logic Units.