CV | Shangeth Rajaa

Professional Experience

Senior Machine Learning Research Scientist

Feb 2025 - Present

Anyreach AI

Working on full-duplex spoken dialogue systems and turn-taking. Building and evaluating ASR/LLM/TTS pipelines, training acoustic and semantic models for turn detection, and doing multimodal LLM research for speech understanding and translation. DualTurn (dual-channel turn-taking model) accepted at Interspeech 2026.

Senior Machine Learning Researcher

Jul 2024 - Feb 2025

ScoreTravel AI

Designed a ranking algorithm for an LLM-based recommendation engine. Built multimodal LLMs for short-form video understanding (Reels, TikTok). Worked on system design for a personalized agentic travel booking product.

Machine Learning Researcher

Jun 2021 - Jul 2024

Skit.ai (Vernacular.ai)

Worked across the full voice AI stack: speech synthesis, prosody transfer, voice cloning, language ID, and speaker recognition. Built end-to-end spoken dialogue systems from scratch, moving from traditional intent-based pipelines to LLMs. Did LLM alignment work with RLHF and DPO, and built multimodal LLMs for speech understanding.

Research Assistant

Aug 2020 - Jun 2021

Speech and Language Lab, NTU Singapore

Research on unsupervised and semi-supervised speech representation learning under Prof. Chng Eng Siong. Worked on speaker profiling (age, height, gender), accent recognition, and accented speech recognition.

Research Intern

May 2020 - Aug 2020

IBM Research Labs

Built novel data quality metrics and transformations for structured data. Used deep reinforcement learning to optimize the sequencing of data quality operations.

Research Collaborator

Jan 2018 - Jan 2020

INRIA France

Worked with Prof. Isabelle Guyon on the AutoDL project. Built multimodal automated deep learning (AutoDL) models and helped organize the AutoDL challenge series at NeurIPS 2019.

Research Interests

Full-Duplex Spoken Dialogue · Turn-Taking · Speech & Multimodal LLMs · Speech Representations · Speech Understanding and Synthesis

Skills

Voice AI & Speech: ASR, TTS, SLU, prosody modeling, speaker representation, voice cloning, language ID, end-to-end spoken dialogue systems.

Speech & Multimodal LLMs: Pretraining, instruction-tuning, RLHF/DPO alignment, dual-channel generative pretraining, speech tokenization.

Modeling & Systems: PyTorch, JAX, distributed training, model distillation, latency-sensitive serving for real-time voice.

Languages & Tools: Python, C/C++, JavaScript; AWS, GCP, Docker; LaTeX.

Projects

All ML and research projects are on github.com/shangeth.

Open Source

wavencoder — Python package for audio encoder models and transforms for speech deep learning tasks.
SpeechLLM — Multimodal LLM for speech understanding: ASR, gender, age, accent, emotion, and speech activity detection. (HuggingFace)
Wren — Family of multimodal LLMs under 3B params for speech understanding, synthesis, and conversational agents. (HuggingFace)
Semantic Turn-Taking LLM — Small LLM that predicts agent action (speak, listen, continue) from conversation context.

Community

Google AI Explore ML — Instructor for a deep learning course across India.
Google Code-In — Mentor under the TensorFlow org.
OpenCV.org — Built projects and wrote content for the “Deep Learning with PyTorch” course.
NeurIPS 2019 AutoDL — Helped organize AutoDL competitions (AutoCV, AutoNLP, AutoSpeech) with INRIA Paris.
