DualTurn learns natural conversational turn-taking via generative pretraining on dual-channel audio, outperforming larger models on turn prediction tasks.
SpeechLLM: A multimodal LLM combining speech encoders with TinyLlama for joint ASR, gender, age, accent, and emotion prediction from audio.
MAP-Mix (ICASSP 2023): Training-dynamics-guided data augmentation for spoken language identification, achieving a ~2% F1 improvement over random mixup.
Prosodic attention and distillation techniques to improve end-to-end SLU, achieving up to an 8% gain in intent classification accuracy on the SLURP dataset.
Skit-S2I: The first publicly available Indian-accented SLU dataset in the banking domain for end-to-end speech-to-intent research.
Semi-supervised framework for speaker profiling (age and height estimation) that leverages external unlabelled speech data via consistency training.
Reinforcement-learning-based framework that generates an optimal sequence of data-quality remediation steps for machine learning pipelines.
Facial emotion recognition with a deep learning model trained in PyTorch, converted via ONNX, and deployed with TF.js.
Web application for pneumonia diagnosis, backed by a deep learning model trained with the PyTorch framework.
Stock prediction with CNNs and Neural Arithmetic Logic Units (NALUs).