My research lies at the intersection of computer vision and machine learning, with a particular focus on the controllability of generative models across modalities (image, video, 3D, 4D). I have worked broadly on visual grounding and audio reasoning in Multimodal Large Language Models (MLLMs), and on emotional 3D animation of virtual humans with applications in VR-based human–computer interaction (HCI). Previously, I explored Bayesian optimization for multi-agent systems, robotic systems, and computer-aided engineering (CAE).
Spectrum: Learning 3D Texture-Aware Representations for Parsing Diverse Human Clothing and Body Parts
@misc{chhatre2025learning3dtextureawarerepresentations,
title={Learning {3D} Texture-Aware Representations for Parsing Diverse Human Clothing and Body Parts},
author={Kiran Chhatre and Christopher Peters and Srikrishna Karanam},
year={2025},
eprint={2508.06032},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2508.06032},
}
Spectrum repurposes an Image-to-Texture diffusion model for improved alignment with body parts and clothing, enabling detailed human parsing that handles diverse clothing types and complex poses for any number of humans in the scene.
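A minimal sketch of this general recipe, not the paper's actual architecture: intermediate features from a frozen pretrained diffusion backbone feed a small trainable parsing head. Every module, dimension, and label count below is an illustrative assumption.

import torch
import torch.nn as nn

# Hypothetical sketch: a frozen "texture-aware" backbone serves as a feature
# extractor, and only a lightweight decoder is trained for human parsing. The
# backbone here is a stand-in; the real model would tap intermediate
# activations of a pretrained image-to-texture diffusion UNet.

class FrozenBackbone(nn.Module):
    """Stand-in for a pretrained diffusion feature extractor."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, feat_dim, 3, stride=4, padding=1), nn.GELU(),
        )
        for p in self.parameters():          # frozen: features only
            p.requires_grad = False

    def forward(self, x):
        return self.conv(x)                  # (B, C, H/4, W/4)

class ParsingHead(nn.Module):
    """Trainable per-pixel classifier over body-part / clothing labels."""
    def __init__(self, feat_dim=256, num_classes=20):
        super().__init__()
        self.classify = nn.Conv2d(feat_dim, num_classes, 1)

    def forward(self, feats, out_size):
        logits = self.classify(feats)
        return nn.functional.interpolate(logits, size=out_size, mode="bilinear")

backbone, head = FrozenBackbone(), ParsingHead()
image = torch.randn(1, 3, 256, 256)          # dummy RGB input
logits = head(backbone(image), out_size=(256, 256))
print(logits.shape)                          # (1, 20, 256, 256) per-pixel labels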
Synthetically Expressive: Evaluating gesture and voice for emotion and empathy in VR and 2D scenarios
@article{du2025synthetically,
title={Synthetically Expressive: Evaluating gesture and voice for emotion and empathy in VR and 2D scenarios},
author={Du, Haoyang and Chhatre, Kiran and Peters, Christopher and Keegan, Brian and McDonnell, Rachel and Ennis, Cathy},
journal={arXiv preprint arXiv:2506.23777},
year={2025}
}
This work evaluates gesture and voice synthesis for conveying emotion and empathy in both VR and 2D scenarios, providing insights into the effectiveness of synthetic emotional expressions across different interaction modalities.
Evaluation of Generative Models for Emotional 3D Animation Generation in VR (journal)
@article{chhatre2025evaluation,
title={Evaluation of Generative Models for Emotional 3D Animation Generation in VR},
author={Chhatre, Kiran and Guarese, Renan and Matviienko, Andrii and Peters, Christopher Edward},
journal={Frontiers in Computer Science},
volume={7},
pages={1598099},
year={2025},
publisher={Frontiers}
}
@inproceedings{chhatre2025evaluating,
title={Evaluating Speech and Video Models for Face-Body Congruence},
author={Chhatre, Kiran and Guarese, Renan and Matviienko, Andrii and Peters, Christopher},
booktitle={Companion Proceedings of the ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games},
pages={1--3},
year={2025}
}
This work evaluates emotional 3D animation generative models within an immersive Virtual Reality environment, emphasizing user-centric metrics including emotional arousal realism, naturalness, enjoyment, diversity, face-body congruence, and interaction quality in real-time human-agent interaction scenarios.
Audiopedia: Audio QA with Knowledge (oral)
Abhirama S. Penamakuri*, Kiran Chhatre*, Akshat Jain (* denotes equal contribution)
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2025
@inproceedings{penamakuri2025audiopedia,
author={Penamakuri, Abhirama Subramanyam and Chhatre, Kiran and Jain, Akshat},
booktitle={ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
title={Audiopedia: Audio QA with Knowledge},
year={2025},
pages={1--5},
doi={10.1109/ICASSP49660.2025.10889814}
}
Audiopedia introduces a novel, knowledge-intensive audio question answering task and proposes a framework to enhance audio language models by integrating external knowledge.
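A schematic sketch of such a knowledge-augmented loop, with stub components standing in for the paper's models (the tiny knowledge base, entity linker, and prompt format below are all assumptions):

# Hypothetical pipeline sketch for knowledge-intensive audio QA.
# All components are stubs standing in for real models.

KNOWLEDGE_BASE = {
    "dog": "Dogs bark; a domesticated canine, often kept as a pet.",
    "siren": "Sirens signal emergency vehicles such as ambulances.",
}

def link_audio_entities(audio_tags):
    """Map detected audio events to knowledge-base entities (audio entity linking)."""
    return [tag for tag in audio_tags if tag in KNOWLEDGE_BASE]

def retrieve_knowledge(entities):
    """Fetch external facts for each linked entity."""
    return [KNOWLEDGE_BASE[e] for e in entities]

def answer(question, audio_tags):
    """Condition a (stub) audio language model on retrieved knowledge."""
    entities = link_audio_entities(audio_tags)
    facts = retrieve_knowledge(entities)
    prompt = f"Facts: {' '.join(facts)}\nQuestion: {question}"
    return prompt  # a real system would pass this to an audio language model

print(answer("What usually follows this sound?", ["siren", "rain"]))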
AMUSE: Emotional Speech-driven 3D Body Animation via Disentangled Latent Diffusion
@InProceedings{Chhatre_2024_CVPR,
author = {Chhatre, Kiran and Daněček, Radek and Athanasiou, Nikos and Becherini, Giorgio and Peters, Christopher and Black, Michael J. and Bolkart, Timo},
title = {AMUSE: Emotional Speech-driven 3D Body Animation via Disentangled Latent Diffusion},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2024},
pages = {1942-1953},
url = {https://amuse.is.tue.mpg.de},
}
AMUSE generates realistic emotional 3D body gestures directly from a speech sequence. It provides user control over the generated emotion by combining the driving speech with a different emotional audio.
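The emotion control can be illustrated as latent swapping; the encoder and decoder below are toy stand-ins for AMUSE's trained audio encoders and latent diffusion generator, with all dimensions assumed:

import torch
import torch.nn as nn

# Illustrative stand-ins for AMUSE-style disentangled latents: speech is
# factored into content, emotion, and style, and the emotion latent of the
# driving audio is replaced by that of another clip before gesture decoding.

class SpeechEncoder(nn.Module):
    """Factor speech features into (content, emotion, style) latents."""
    def __init__(self, in_dim=80, z_dim=64):
        super().__init__()
        self.content = nn.Linear(in_dim, z_dim)
        self.emotion = nn.Linear(in_dim, z_dim)
        self.style = nn.Linear(in_dim, z_dim)

    def forward(self, speech):
        return self.content(speech), self.emotion(speech), self.style(speech)

class GestureDecoder(nn.Module):
    """Decode latents into a 3D pose sequence (stub for the latent diffusion)."""
    def __init__(self, z_dim=64, n_joints=55):
        super().__init__()
        self.out = nn.Linear(3 * z_dim, n_joints * 3)

    def forward(self, c, e, s):
        return self.out(torch.cat([c, e, s], dim=-1))

enc, dec = SpeechEncoder(), GestureDecoder()
driving = torch.randn(1, 120, 80)    # driving speech features (T frames)
emotional = torch.randn(1, 120, 80)  # a different, emotional audio clip

c, _, s = enc(driving)               # keep content + style of driving speech
_, e, _ = enc(emotional)             # borrow the emotion latent
poses = dec(c, e, s)                 # emotion-controlled gesture sequence
print(poses.shape)                   # (1, 120, 165)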
EMOTE: Emotional Speech-Driven Animation with Content-Emotion Disentanglement
@inproceedings{10.1145/3610548.3618183,
author = {Dan\v{e}\v{c}ek, Radek and Chhatre, Kiran and Tripathi, Shashank and Wen, Yandong and Black, Michael and Bolkart, Timo},
title = {Emotional Speech-Driven Animation with Content-Emotion Disentanglement},
year = {2023},
isbn = {9798400703157},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3610548.3618183},
doi = {10.1145/3610548.3618183},
booktitle = {SIGGRAPH Asia 2023 Conference Papers},
articleno = {41},
numpages = {13},
keywords = {Computer Graphics, Computer Vision, Deep learning, Facial Animation, Speech-driven Animation},
location = {Sydney, NSW, Australia},
series = {SA '23}
}
Given audio input and an emotion label, EMOTE generates an animated 3D head that has state-of-the-art lip synchronization while expressing the emotion. The method is trained from 2D video sequences using a novel video emotion loss and a mechanism to disentangle emotion from speech.
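A hedged sketch of the shape of such a training signal, with illustrative losses and weights rather than the paper's exact formulation (vertex counts and the mouth index range are placeholders):

import torch

# Sketch: lip-sync supervision on mouth geometry for speech content fidelity,
# plus a perceptual emotion loss comparing emotion features of rendered vs.
# ground-truth video (feature extractors are stubbed out here).

def lip_sync_loss(pred_verts, target_verts, mouth_idx):
    """Penalize lip-region geometry error."""
    return ((pred_verts[:, mouth_idx] - target_verts[:, mouth_idx]) ** 2).mean()

def video_emotion_loss(pred_emotion_feat, target_emotion_feat):
    """Match emotion features of rendered vs. ground-truth frames."""
    return 1.0 - torch.nn.functional.cosine_similarity(
        pred_emotion_feat, target_emotion_feat, dim=-1).mean()

pred = torch.randn(8, 5023, 3)        # predicted FLAME-like head vertices
target = torch.randn(8, 5023, 3)
mouth_idx = torch.arange(3000, 3100)  # placeholder mouth vertex indices
emo_pred, emo_gt = torch.randn(8, 128), torch.randn(8, 128)

total = lip_sync_loss(pred, target, mouth_idx) \
    + 0.5 * video_emotion_loss(emo_pred, emo_gt)  # weight is illustrative
print(total.item())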
BEAMBayesOpt: Parallel Bayesian Optimization of Agent-Based Transportation Simulation (special mention)
@inproceedings{chhatre2022parallel,
title={Parallel Bayesian Optimization of Agent-Based Transportation Simulation},
author={Chhatre, Kiran and Feygin, Sidney and Sheppard, Colin and Waraich, Rashid},
booktitle={International Conference on Machine Learning, Optimization, and Data Science},
pages={470--484},
year={2022},
organization={Springer}
}
BEAMBayesOpt introduces a parallel Bayesian optimization approach with early stopping that autonomously calibrates hyperparameters in BEAM’s large-scale multi-agent transportation simulations and enables efficient surrogate modeling of complex scenarios.
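A minimal parallel ask/tell loop with a simple no-improvement early stop, using scikit-optimize as a stand-in for the paper's setup; the toy objective below replaces an actual BEAM simulation run:

from skopt import Optimizer

# Toy stand-in for one BEAM calibration run: maps hyperparameters to an error.
def simulate(params):
    x, y = params
    return (x - 0.3) ** 2 + (y - 0.7) ** 2

opt = Optimizer(dimensions=[(0.0, 1.0), (0.0, 1.0)], base_estimator="GP")

best, patience, stall = float("inf"), 5, 0
for _ in range(20):
    batch = opt.ask(n_points=4)            # propose 4 configs in parallel
    losses = [simulate(p) for p in batch]  # evaluate (parallelizable)
    opt.tell(batch, losses)                # update the GP surrogate
    if min(losses) < best - 1e-6:
        best, stall = min(losses), 0
    else:
        stall += 1
    if stall >= patience:                  # simple early-stopping rule
        break

print("best calibration error:", best)

skopt's ask(n_points=...) uses a constant-liar strategy, so several configurations can be evaluated concurrently before the surrogate is updated.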
Spatio-temporal priors in 3D human motion
@article{deichler2021spatio,
title={Spatio-temporal priors in 3D human motion},
author={Deichler, Anna and Chhatre, Kiran and Peters, Christopher and Beskow, Jonas},
year={2021}
}
This workshop paper investigates spatio-temporal priors for 3D human motion synthesis, comparing graph convolutional networks and transformer architectures for capturing dynamic joint dependencies.
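For illustration, a minimal version of the transformer variant over flattened 3D joint sequences (sizes and the next-frame objective are assumptions, not the paper's exact setup):

import torch
import torch.nn as nn

# Minimal sketch of a transformer prior over motion: attend across time steps
# of flattened 3D joints, then predict a pose per frame.

n_joints, d_model = 25, 128
proj = nn.Linear(n_joints * 3, d_model)      # embed one pose per frame
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True),
    num_layers=2,
)
head = nn.Linear(d_model, n_joints * 3)      # predict a refined/next pose

motion = torch.randn(2, 60, n_joints * 3)    # (batch, frames, joints*3)
pred = head(encoder(proj(motion)))           # temporally contextualized poses
print(pred.shape)                            # (2, 60, 75)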
Rethinking Computer-Aided Architectural Design (CAAD) – From Generative Algorithms and Architectural Intelligence to Environmental Design and Ambient Intelligence
@inproceedings{stojanovski2021rethinking,
title={Rethinking computer-aided architectural design (CAAD)--from generative algorithms and architectural intelligence to environmental design and ambient intelligence},
author={Stojanovski, Todor and Zhang, Hui and Frid, Emma and Chhatre, Kiran and Peters, Christopher and Samuels, Ivor and Sanders, Paul and Partanen, Jenni and Lefosse, Deborah},
booktitle={International Conference on Computer-Aided Architectural Design Futures},
pages={62--83},
year={2021},
organization={Springer}
}
This paper reviews the evolution of CAAD—from generative algorithms and BIM to current AI developments—and argues that integrating AI-driven ambient intelligence into digital design tools can transform architectural and urban design for smarter, more sustainable cities.
Academic Services
Conference Reviewer: NeurIPS, ICLR, AAAI, ICCV, SIGGRAPH, SIGGRAPH Asia, ISMAR, CoG, IVA
Journal Reviewer: Pattern Recognition, IEEE Transactions on Affective Computing
Program Committee: CLIPE Workshop at Eurographics 2024