Shounak Das

Final-year Dual Degree (B.Tech + M.Tech) student in Electrical Engineering at IIT Bombay, with a minor in Artificial Intelligence, Machine Learning & Data Science.

I am currently working on Vision-Language Models (VLMs) at MeDAL Lab under the guidance of Prof. Amit Sethi. I have worked on cutting-edge industrial problems in ML, NLP, Generative AI, and CV at Intel, Swiggy, and IBM, and am currently interning at Fujitsu Research, working on LLMs. I am also proficient in data structures and algorithms (DSA) and programming, and skilled in Signal Processing and Communication Systems.

Skills:

Languages: C++, Python, C, MATLAB, HTML, CSS, JavaScript, VHDL, Assembly, Arduino
Software & Tools: Git, AWS, Docker, Tableau, MySQL, PostgreSQL, Intel Quartus, Xilinx Vivado, GNU Radio, SOLIDWORKS
Python Libraries: NumPy, Matplotlib, Pandas, SciPy, PyTorch, TensorFlow, PySpark, OpenSlide, Transformers, Gensim, spaCy, Diffusers, LangChain, ChromaDB, Weaviate, NLTK, Streamlit

I have a strong interest in NLP, Generative AI, Large Language Models (LLMs), Vision-Language Models (VLMs), and Computer Vision, and love working on projects in these areas.

Publications:

Published at ICML, MIDL, ISBI, ICPR, BIBE, and BioImaging (Best Student Paper Award)

Education:

Dual Degree (B.Tech + M.Tech) in Electrical Engineering with a minor in AI, ML & Data Science at IIT Bombay

Email / Google Scholar / GitHub / LinkedIn / X

News

[06/2025] Our paper FEDTAIL has been accepted to ICML 2025!
[05/2025] Our paper Whole Slide Image Domain Adaptation Tailored with Fisher Vector, Self Supervised Learning, and Novel Loss Function has been accepted to MIDL 2025!
[01/2025] Our paper Scalable Whole Slide Image Representation Using K-Means Clustering and Fisher Vector Aggregation has been accepted to ISBI 2025!
[09/2024] Our paper Clustered Patch Embeddings for Permutation-Invariant Classification of Whole Slide Images has been accepted to BIBE 2024!
[06/2024] Our paper IDAL: Improved Domain Adaptive Learning for Natural Images Dataset has been accepted to ICPR 2024!
[02/2024] Best Student Paper Award for our BioImaging 2024 paper on domain adaptation for histology images!
[12/2023] Our paper Domain-Adaptive Learning: Unsupervised Adaptation for Histology Images has been accepted to BioImaging 2024!

Work Experience

	AI Research Intern · Fujitsu Research · May 2025 - Present
	AI Solutions Engineering Intern · Intel Corporation · July 2024 - May 2025
		Built an end-to-end Optical Character Recognition (OCR) pipeline that accurately recognizes Hindi, Telugu, and Kannada text in images for the National Informatics Centre (NIC), Government of India, improving the digitization of regional-language text. Used WhisperX to label multilingual audio with accurate transcript alignment and automatic timestamp generation, streamlining annotation workflows. Optimized AI workloads on Intel Habana accelerators using advanced frameworks and quantization techniques.
	Data Science Intern · Swiggy · July 2024 - August 2024
		Developed a topic modeling framework using BERTopic and an Azure OpenAI LLM agent to identify and predict emerging trends in events and items, strengthening predictive analytics. Built a spell correction pipeline combining unigram and bigram probability models, fuzzy logic, and SymSpell. Generated a custom dataset of spelling errors in Instamart search queries using edit distance, decompounding, and phonetics; fine-tuned T5-small and trained a vanilla Transformer (4 encoder and 5 decoder layers) from scratch.
	LLM Intern · IBM · May 2024 - June 2024
		Optimized vector databases (ChromaDB and Weaviate) for Retrieval-Augmented Generation (RAG) with Large Language Models (LLMs), improving indexing efficiency and search accuracy. Integrated Traceloop and IBM Instana to provide observability and real-time performance analytics for vector database searches.
	Machine Learning Intern · GMAC Intelligence · December 2023 - January 2024
		Worked on the MLCommons AlgoPerf Training Algorithms benchmark, a global competition for neural network training algorithms. Optimized a novel training algorithm across 6 datasets: Criteo 1TB (click-through prediction), FastMRI (MRI reconstruction), ImageNet (image classification), LibriSpeech (speech recognition), OGBG (molecular property prediction), and WMT (machine translation).
	Generative AI Intern · MURVEN Design Solutions · December 2022 - April 2023
		Implemented generative models such as Deforum Stable Diffusion, VQGAN, and Variational Autoencoders (VAEs) for text-to-image, image-to-image, and image-to-animation tasks, and deployed a prompt-based API on AWS infrastructure. Explored flow-based models such as RealNVP to improve the diversity and quality of generated content.

Publications (Full list available at Google Scholar)

	FEDTAIL: Federated Long-Tailed Domain Generalization with Sharpness-Guided Gradient Matching
		42nd International Conference on Machine Learning · ICML 2025
		We present FedTAIL, a federated domain generalization framework designed to tackle domain shifts and long-tailed class distributions. By aligning gradients across objectives and dynamically reweighting underrepresented classes using sharpness-aware optimization, our method achieves state-of-the-art performance under label imbalance. FedTAIL enables scalable and robust generalization in both centralized and federated settings.
	Whole Slide Image Domain Adaptation Tailored with Fisher Vector, Self Supervised Learning, and Novel Loss Function
		8th Medical Imaging with Deep Learning · MIDL 2025
		We introduce a domain adaptation framework for Whole Slide Image (WSI) classification that combines self-supervised learning, clustering, and Fisher Vector encoding. By extracting MoCoV3-based patch features and aggregating them via Gaussian mixture models, our method forms robust slide-level representations. Adversarial training with a hybrid PLMMD-MCC loss enables effective domain alignment, achieving strong performance on cross-domain HER2 classification tasks, even under label noise.
	Scalable Whole Slide Image Representation Using K-Means Clustering and Fisher Vector Aggregation
		IEEE 22nd International Symposium on Biomedical Imaging · ISBI 2025
		We propose a scalable method for whole slide image (WSI) classification that combines patch-based deep feature extraction, clustering, and Fisher Vector encoding. By modeling clustered patch embeddings with Gaussian mixture models, our approach generates compact yet expressive slide-level representations. This enables robust and accurate WSI classification while efficiently capturing both local and global tissue structures.
	IDAL: Improved Domain Adaptive Learning for Natural Images Dataset
		27th International Conference on Pattern Recognition · ICPR 2024
		We propose a novel unsupervised domain adaptation (UDA) approach for natural images that combines ResNet with a feature pyramid network to capture both content and style features. A carefully designed loss function enhances alignment across domains with multi-modal distributions, improving robustness to scale, noise, and style shifts. Our method achieves superior performance on benchmarks such as Office-Home, Office-31, and VisDA-2017, while maintaining competitive results on DomainNet.
	Clustered Patch Embeddings for Permutation-Invariant Classification of Whole Slide Images
		IEEE 24th International Conference on Bioinformatics and Bioengineering · BIBE 2024
		We propose an efficient WSI analysis framework that leverages diverse encoders and a specialized classification model to produce robust, permutation-invariant slide representations. By distilling a gigapixel WSI into a single informative vector, our method significantly improves computational efficiency without sacrificing diagnostic accuracy. This scalable approach enables effective utilization of WSIs in digital pathology and medical research.
	Domain-Adaptive Learning: Unsupervised Adaptation for Histology Images
		11th International Conference on BioImaging · BioImaging 2024 · Best Student Paper Award
		We propose DAL, a novel approach to unsupervised domain adaptation designed for medical images such as H&E-stained histology and retinal fundus scans. By leveraging texture-specific features such as tissue structure and cell morphology, DAL improves domain alignment using a custom loss function that enhances both accuracy and training efficiency. Our method outperforms ViT and CNN-based baselines on FHIST and retina datasets, demonstrating strong generalization and robustness.
