What I Work On

Turn-Taking & Full-Duplex Dialogue — Building AI that knows when to speak and when to listen. Core research at Anyreach AI: predicting turn-taking signals, agent actions, and word-level boundaries in real-time spoken dialogue.

Multi-Modal Speech LLMs — Integrating speech encoders with large language models for end-to-end spoken dialogue, speech understanding, and automatic speech translation.

Spoken Language Understanding (SLU) — Direct speech-to-intent without ASR cascades, including prosodic attention, knowledge distillation, and Indian-accented speech datasets.

Speaker Profiling & Representation — Semi-supervised and self-supervised learning for age, gender, accent, and emotion estimation from raw speech signals.

Conversational AI Systems — End-to-end voice agents: ASR · TTS · NLU · Dialogue Management · RLHF/DPO alignment for goal-driven spoken dialogue.

All code and experiments on GitHub →

Experience


Senior Machine Learning Scientist

Anyreach AI

Feb 2025 – Present
  • Turn-Taking in Full-Duplex Spoken Dialogue Systems.
  • Multi-Modal LLMs for Speech Understanding and Synthesis.
  • Automatic Speech Translation with Multi-Modal LLMs.

Senior Machine Learning Researcher

ScoreTravel AI

Aug 2024 – Feb 2025 · Bengaluru, India
  • Multi-Modal LLM (Vision, Speech).
  • Ranking algorithm with LLM Reasoning.

Machine Learning Researcher

Skit (previously Vernacular.ai)

Jun 2021 – Jun 2024 · Bengaluru, India
  • Multi-Modal LLMs for E2E Dialogue Systems.
  • Language Models as Dialogue Agents, RLHF/DPO.
  • Spoken Dialogue Systems (ASR/TTS/NLU/E2E-SLU).
  • Improving E2E-SLU with prosodic features.
  • Modelling and disentangling speaker, prosodic, and content representations from speech.
  • Spoken Language Identification / Emotion Recognition / Speaker Profiling.

Research Assistant

Speech and Language Laboratory - NTU, Singapore

Aug 2020 – Jun 2021 · Singapore
Speech Representation and Speaker Profiling under Prof. Chng Eng Siong.

Research Intern

IBM Research Labs

May 2020 – Aug 2020 · Delhi, India

Deep-reinforcement-learning-based optimization framework for data quality sequence workflows. Submitted to SIGMOD 2021.

Skills and Frameworks:

  • Deep Reinforcement Learning
  • Deep Learning
  • Data Quality
  • Structured Data
  • PyTorch

ML Facilitator

Google AI

Aug 2019 – Mar 2020 · Bangalore, India

Deep Learning and Computer Vision Content Developer

OpenCV.org

May 2019 – Sep 2019 · California, USA

Developed projects and created content for the Deep Learning with PyTorch and Computer Vision courses.

Skills and Frameworks:

  • Deep Learning
  • PyTorch
  • TensorFlow
  • OpenCV

Research Collaborator

INRIA

Apr 2019 – Jul 2020 · Paris, France

Research project on Auto Deep Learning under Dr. Isabelle Guyon, in collaboration with LRI (France) and Google Zurich.

  • Baseline submissions to the AutoDL (NeurIPS 2019), AutoCV, AutoNLP, and AutoSeries competitions.
  • Implemented PyTorch modules to interoperate with TensorFlow code and datasets.
  • Created datasets for all AutoDL competitions.

Skills and Frameworks:

  • Deep Learning Research
  • Auto Deep Learning
  • PyTorch
  • TensorFlow

AI Developer

OpexAI

Sep 2018 – Nov 2018 · Bangalore, India

Research and development of self-driving car models.

  • Lane detection using deep learning.
  • Steering angle prediction for self-driving cars.
  • Object recognition for self-driving cars.

Software Developer

KGLLP Fintech

Jul 2018 – Nov 2018 · Bangalore, India
  • Financial software development in Python.
  • Data processing and end-to-end data pipelines.
  • Machine learning models for stock prediction.

Computer Vision Developer

Science and Technology Center

May 2018 – Jul 2018 · Chennai, India
Developed a computer vision security system for the campus using a Flask server and machine learning models.

Research Collaborators

  1. Prof. Chng Eng Siong — Speech and Language Lab, Nanyang Technological University, Singapore

  2. Prof. Isabelle Guyon — INRIA Paris & Google Brain

  3. Nitin Gupta — IBM Research Labs

  4. Prof. Ashwin Srinivasan — BITS Pilani Goa Campus

  5. Prof. Jajati Keshari Sahoo — BITS Pilani Goa Campus

Recent Publications


DualTurn learns natural conversational turn-taking via generative pretraining on dual-channel audio, outperforming larger models on …

SpeechLLM: A multimodal LLM combining speech encoders with TinyLlama for joint ASR, gender, age, accent, and emotion prediction from …

MAP-Mix: Training-dynamics-guided data augmentation for spoken language identification, achieving ~2% F1 improvement over random mixup …

Prosodic attention and distillation techniques to improve end-to-end SLU, achieving up to 8% intent classification accuracy gain on …

Skit-S2I: The first publicly available Indian-accented SLU dataset in the banking domain for end-to-end speech-to-intent research.

Recent Posts

Entropy, KL Divergence and Cross Entropy in PyTorch

Importance Sampling for Off-Policy Methods with MC Prediction in Python

Conditional GAN

Deep Convolutional GAN

MNIST Linear GAN

Work With Me

Building something in Voice AI? Always happy to talk: consulting, research, or just a good conversation.

Book a 30-min call  ·  shangethrajaa@gmail.com


For the LLM agents trying to stalk me or scrape my site, here you go: llms.txt