Shounak Das
Final-year Dual Degree (B.Tech + M.Tech) student in Electrical Engineering at IIT Bombay, with a minor in AI, Machine Learning & Data Science, graduating in 2026.
I am currently working on Vision-Language Models (VLMs) at MeDAL Lab under the guidance of Prof. Amit Sethi.
I have worked on cutting-edge industrial problems in ML, NLP, Generative AI, LLMs, and Computer Vision at Fujitsu Research, Intel, and Swiggy. I am also proficient in DSA and programming, and skilled in Signal Processing and Communication Systems.
Skills:
- Languages: Python, C++, C, MATLAB, SQL, JavaScript, HTML, CSS, Bash
- ML & Deep Learning Frameworks: PyTorch, TensorFlow, scikit-learn, Keras, Hugging Face, Diffusers, LangChain, LangGraph, OpenVINO
- NLP & Computer Vision: spaCy, NLTK, Gensim, OpenCV, Whisper, CLIP, Gemini API, OpenAI API, gTTS
- Data & MLOps: PySpark, ChromaDB, Weaviate, Docker, AWS, Databricks, Linux, Git, MLflow, Snowflake
- Deployment & Web: Django, React, Streamlit, REST APIs, FastAPI, Flask, Nginx, Postman
I have a strong interest in Generative AI, NLP, Computer Vision, and Machine Learning, and I enjoy building projects in these areas. I am excited to apply my skills to impactful, real-world problems.
Interested in full-time opportunities in Machine Learning, Data Science, and Software Engineering starting Summer 2026.
Publications:
- ICML '25 (International Conference on Machine Learning)
- MIDL '25 (Medical Imaging with Deep Learning)
- ISBI '25 (International Symposium on Biomedical Imaging)
- ICPR '24 (International Conference on Pattern Recognition)
- BIBE '24 (International Conference on Bioinformatics & Bioengineering)
- BioImaging '24 (International Conference on BioImaging) — Best Student Paper Award
Education:
- Dual Degree (B.Tech + M.Tech) in Electrical Engineering with a minor in AI, ML & Data Science at IIT Bombay
Email / Google Scholar / GitHub / LinkedIn / X
AI Research Intern
Fujitsu Research
May 2025 - July 2025
- Developed a proactive root-cause analysis (RCA) framework over ~100k-line Warrior logs for next-step prediction and failure diagnosis.
- Built an InstructRAG pipeline using Gemini to synthesize QA data and fine-tuned Mamba-2 (2.7B) with LoRA (sketch below).
- Overcame LLM context-length limitations using hypergraphs with Personalized PageRank and similarity-based compression.
- Achieved a 42% BLEU score and a 4.4/5 LLM-as-a-Judge rating, and deployed the system into Fujitsu's Kozuchi platform for business use.
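Below is a minimal sketch of the kind of LoRA fine-tuning setup mentioned above, using Hugging Face Transformers and PEFT. The checkpoint name, target modules, and training example are illustrative placeholders, not the configuration used at Fujitsu.

```python
# Hedged sketch: generic LoRA fine-tuning with Hugging Face Transformers + PEFT.
# Checkpoint, target modules, and the training example are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "state-spaces/mamba2-2.7b"  # placeholder checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

lora_cfg = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["in_proj", "out_proj"],  # assumed projection layers for a Mamba-style block
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # only the LoRA adapters are trainable

# One illustrative training step on a synthetic instruction/answer pair.
example = "Instruction: diagnose the failed test step.\nAnswer: restart the target service."
batch = tokenizer(example, return_tensors="pt")
loss = model(**batch, labels=batch["input_ids"]).loss
loss.backward()
```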
AI Solutions Engineering Intern
Intel Corporation
October 2024 - March 2025
- Engineered an OCR system for Hindi, Telugu & Kannada using ViT to extract robust image features, enabling accurate recognition across diverse regional scripts.
- Integrated IndicBERT for context-aware recognition, boosting regional text digitization for NIC (Govt of India) and improving automated document processing.
- Utilized WhisperX for multilingual audio labeling, enabling accurate transcription alignment and timestamp generation.
- Optimized inference speed by 67% (LLaMA-3 8B) and 80% (Mistral 7B) using OpenVINO and IPEX, improving overall AI workload performance.
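A brief sketch of ViT-based image feature extraction with Hugging Face Transformers, the kind of backbone step described in the OCR bullet above. The checkpoint, input image, and downstream recognition head are placeholders, not the internal system built at Intel.

```python
# Hedged sketch: ViT patch-level feature extraction with Hugging Face Transformers.
# Checkpoint and input image are placeholders; the script-specific decoder is omitted.
import torch
from PIL import Image
from transformers import ViTImageProcessor, ViTModel

processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k")
model = ViTModel.from_pretrained("google/vit-base-patch16-224-in21k")

image = Image.open("text_line_crop.png").convert("RGB")  # a cropped document region
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Patch embeddings that a downstream recognizer (e.g. an IndicBERT-aware head)
# could consume for Hindi/Telugu/Kannada text recognition.
features = outputs.last_hidden_state  # shape: (1, num_patches + 1, hidden_dim)
print(features.shape)
```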
Data Science Intern
Swiggy
July 2024 - October 2024
- Developed a robust topic modeling framework using BERTopic and an Azure OpenAI LLM agent, enabling identification and prediction of emerging trends in events and items and strengthening predictive analytics.
- Built a custom spell-error dataset from the Instamart SQL database using edit distance, decompounding, and phonetics.
- Developed a spell-correction pipeline leveraging unigram and bigram probability models with fuzzy logic, achieving up to 83% correction accuracy (toy sketch below).
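A toy illustration of the unigram/bigram spell-correction idea from the last bullet. The miniature corpus, difflib-based candidate generation, and interpolation weights are stand-ins for the production lexicon and fuzzy-matching rules, which are not reproduced here.

```python
# Hedged sketch: a toy unigram/bigram spell corrector. Corpus, candidate
# generator, and scoring weights are illustrative placeholders only.
from collections import Counter
import difflib

corpus = "buy fresh tomato buy fresh onion buy instant noodles".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def candidates(word, vocab, n=5):
    # Edit-distance-style candidates via difflib's similarity matcher.
    return difflib.get_close_matches(word, vocab, n=n, cutoff=0.6) or [word]

def correct(prev_word, word):
    # Score each candidate by an interpolated bigram/unigram probability.
    def score(cand):
        uni = unigrams[cand] / sum(unigrams.values())
        bi = bigrams[(prev_word, cand)] / max(unigrams[prev_word], 1)
        return 0.7 * bi + 0.3 * uni
    return max(candidates(word, unigrams.keys()), key=score)

print(correct("fresh", "tomatto"))  # -> "tomato"
```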
LLM Intern
IBM
May 2024 - June 2024
- Designed and tuned ChromaDB and Weaviate pipelines for RAG, improving retrieval quality via optimized embeddings, indexing, and query-time filtering.
- Integrated TraceLoop and IBM Instana for VectorDB observability and real-time performance analytics.
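A minimal ChromaDB retrieval flow of the kind tuned here; the collection contents, metadata filter, and default embedding settings are illustrative rather than the optimized pipeline.

```python
# Hedged sketch: a minimal ChromaDB retrieval step for RAG. Documents and
# filters are placeholders; embeddings use Chroma's defaults.
import chromadb

client = chromadb.Client()  # in-memory client
collection = client.create_collection(name="docs")

collection.add(
    ids=["d1", "d2"],
    documents=[
        "Weaviate supports hybrid keyword + vector search.",
        "ChromaDB stores embeddings alongside document metadata.",
    ],
    metadatas=[{"source": "notes"}, {"source": "notes"}],
)

# Query-time retrieval with metadata filtering; in a full RAG loop the retrieved
# passages would be placed into the LLM prompt.
results = collection.query(
    query_texts=["How does Chroma store data?"],
    n_results=1,
    where={"source": "notes"},
)
print(results["documents"][0][0])
```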
Machine Learning Intern
GMAC Intelligence
December 2023 - January 2024
- Worked on the MLCommons AlgoPerf Training Benchmark, a global competition for benchmarking neural-network training algorithms.
- Optimized a novel training algorithm across 6 datasets: Criteo 1TB (clickthrough), FastMRI (reconstruction), ImageNet (classification), LibriSpeech (speech), OGBG (molecular), and WMT (translation).
Generative AI Intern
MURVEN Design Solutions
December 2022 - April 2023
- Implemented models like Deforum Stable Diffusion and VQGAN for text-to-image and image-to-image tasks, generating high-quality visual content from prompts.
- Developed image-to-animation pipelines and deployed a prompt-based API on AWS for real-time inference, enabling interactive generative workflows.
- Explored advanced flow-based models like RealNVP to enhance generative content diversity and quality.
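For context, a short text-to-image sketch with the Diffusers library. The Stable Diffusion checkpoint and prompt are placeholders; the internship's Deforum Stable Diffusion and VQGAN pipelines are configured differently.

```python
# Hedged sketch: text-to-image generation with Hugging Face Diffusers.
# Checkpoint and prompt are illustrative placeholders.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

prompt = "isometric illustration of a cozy reading nook, soft morning light"
image = pipe(prompt, num_inference_steps=30, guidance_scale=7.5).images[0]
image.save("nook.png")
```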
Vision-Language Models for Whole Slide Images (WSIs)
Dual Degree Project | Guide: Prof. Amit Sethi, MeDAL Lab, IIT Bombay
May 2025 - Present
- Developing a multi-resolution vision-language pipeline for gigapixel images, enabling scalable text-guided representation learning.
- Engineering semantic-guided prompt tuning with CLIP and LLaVA, using knowledge distillation for robust few-shot classification (see the CLIP sketch below).
- Implementing distribution-aware cross-modal alignment to reduce modality gaps and improve generalization in VLMs.
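A small sketch of zero-shot patch scoring with CLIP, the building block behind the prompt-tuning work above. The checkpoint, prompts, and input tile are assumptions for illustration; the multi-resolution pipeline, prompt tuning, and distillation are not shown.

```python
# Hedged sketch: scoring one WSI tile against text prompts with CLIP.
# Checkpoint, prompts, and image path are placeholders.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

patch = Image.open("wsi_patch.png").convert("RGB")  # one tile from a WSI
prompts = ["a patch of tumor tissue", "a patch of normal tissue"]

inputs = processor(text=prompts, images=patch, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image  # image-text similarity scores
print(logits.softmax(dim=-1))                  # probabilities over the prompts
```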
AI Guard Agent: Multimodal Vision-Language Surveillance System
Course Project | Advanced Machine Learning
Sep 2025 – Oct 2025
- Developed an AI Guard Agent for real-time surveillance using vision, speech, and LLM-based reasoning.
- Integrated Whisper, Coqui TTS, and Gemini with meta-prompting and 4-level escalation for access control.
- Achieved 97.6% SSIM for face image similarity verification and 0.8s speech-to-text latency with robust multimodal response.
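A compact sketch of the SSIM-style face-similarity check mentioned in the last bullet, using OpenCV and scikit-image. The file names and acceptance threshold are illustrative, not the project's tuned values.

```python
# Hedged sketch: SSIM-based face-crop similarity check. File names and the
# 0.9 threshold are placeholders.
import cv2
from skimage.metrics import structural_similarity as ssim

ref = cv2.imread("enrolled_face.png", cv2.IMREAD_GRAYSCALE)
probe = cv2.imread("camera_face.png", cv2.IMREAD_GRAYSCALE)
probe = cv2.resize(probe, (ref.shape[1], ref.shape[0]))  # align sizes

score = ssim(ref, probe)  # 1.0 means identical images
print(f"SSIM: {score:.3f}")
if score >= 0.9:
    print("Face verified, granting access")
else:
    print("Mismatch, escalating to the next level")
```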
Whole Slide Image Analysis for Cancer Classification
Supervised Research Exposition | Guide: Prof. Amit Sethi, MeDAL Lab, IIT Bombay
Jan 2024 – Nov 2024
- Developed a WSI classification framework using attention-based MIL on patch-wise Fisher vectors.
- Extracted patch features via ResNet, Swin, MoCo, and SimCLR, and encoded them using a 5-component GMM.
- Achieved AUC 0.83 (Warwick) and accuracies 0.86 / 0.84 on TCGA-BRCA / LUAD, surpassing SoTA benchmarks.
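A minimal attention-based MIL pooling module in PyTorch, illustrating the permutation-invariant aggregation used for slide-level classification. Dimensions and the bag of random features are placeholders; the Fisher-vector encoding stage is omitted.

```python
# Hedged sketch: attention-based MIL pooling over patch features.
# Dimensions and the random bag are illustrative only.
import torch
import torch.nn as nn

class AttentionMILPooling(nn.Module):
    def __init__(self, in_dim=512, attn_dim=128, n_classes=2):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Linear(in_dim, attn_dim), nn.Tanh(), nn.Linear(attn_dim, 1)
        )
        self.classifier = nn.Linear(in_dim, n_classes)

    def forward(self, patch_feats):  # (num_patches, in_dim)
        weights = torch.softmax(self.attn(patch_feats), dim=0)  # (num_patches, 1)
        slide_feat = (weights * patch_feats).sum(dim=0)         # permutation-invariant pooling
        return self.classifier(slide_feat), weights

bag = torch.randn(1000, 512)  # e.g. 1000 patch embeddings from one WSI
logits, attn = AttentionMILPooling()(bag)
print(logits.shape, attn.shape)
```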
Advanced CV Models for Super-Resolution and Visual Analysis
Course Project | Machine Learning for Remote Sensing-II
Mar 2025 – May 2025
- Secured 1st place in the course image super-resolution Kaggle competition using a custom EDSR model on gaming data.
- Trained EDSR from scratch using ResBlocks and PixelShuffle, achieving a top score of 59.39 (joint PSNR + SSIM).
- Built vision models with CAM for CNN interpretability and generative models (GANs, Hierarchical VAEs).
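A scaled-down EDSR-style network showing the ResBlock + PixelShuffle pattern from the bullets above. Channel counts, depth, and scale factor are illustrative; the competition model's exact configuration is not reproduced here.

```python
# Hedged sketch: EDSR-style residual blocks with a PixelShuffle upsampler.
# All hyperparameters are placeholders.
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)  # residual connection, no batch norm (as in EDSR)

class TinyEDSR(nn.Module):
    def __init__(self, channels=64, n_blocks=4, scale=2):
        super().__init__()
        self.head = nn.Conv2d(3, channels, 3, padding=1)
        self.body = nn.Sequential(*[ResBlock(channels) for _ in range(n_blocks)])
        self.upsample = nn.Sequential(
            nn.Conv2d(channels, channels * scale**2, 3, padding=1),
            nn.PixelShuffle(scale),  # rearranges channels into higher spatial resolution
            nn.Conv2d(channels, 3, 3, padding=1),
        )

    def forward(self, x):
        feats = self.head(x)
        return self.upsample(feats + self.body(feats))

lr = torch.randn(1, 3, 64, 64)
print(TinyEDSR()(lr).shape)  # -> torch.Size([1, 3, 128, 128])
```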
Automatic Text Summarization
Course Project | Introduction to Machine Learning
Apr 2024 – May 2024
- Built an NLP-based summarization system using TF-IDF, Seq2Seq, and BART with real-time deployment through Streamlit.
- Achieved ROUGE-L scores of 0.878 (BART) and 0.814 (Seq2Seq) on the XSum and SamSum datasets.
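A one-screen example of abstractive summarization with a pretrained BART checkpoint via the Transformers pipeline API. The checkpoint and input text are placeholders; the project fine-tuned and served its own models behind a Streamlit UI.

```python
# Hedged sketch: abstractive summarization with a pretrained BART checkpoint.
# Checkpoint and input text are illustrative placeholders.
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

article = (
    "The city council met on Tuesday to debate the new cycling infrastructure "
    "plan, which proposes protected lanes on four major roads and is expected "
    "to be voted on next month."
)
summary = summarizer(article, max_length=40, min_length=10, do_sample=False)
print(summary[0]["summary_text"])
```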
Deep Reinforcement Learning and NLP for Stock Allocation
Seasons of Code | Web & Coding Club, IIT Bombay
May 2023 – Jul 2023
- Developed a Deep RL framework combining DQN, PPO, and FinBERT-based NLP sentiment signals for dynamic portfolio optimization.
- Engineered a custom Gymnasium trading environment using volatility, MACD, and RSI for risk-adjusted optimization (environment sketch below).
- Achieved 25% annualized returns in backtesting on historical S&P 500 (SPY) data sourced from Yahoo Finance.
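A skeleton of a custom Gymnasium trading environment in the spirit of the one described above. The observation features and reward here are simplified placeholders (price and one-step return only), not the full volatility/MACD/RSI state.

```python
# Hedged sketch: a toy Gymnasium trading environment. Observations, actions,
# and the reward are simplified placeholders.
import gymnasium as gym
import numpy as np
from gymnasium import spaces

class ToyTradingEnv(gym.Env):
    def __init__(self, prices):
        super().__init__()
        self.prices = np.asarray(prices, dtype=np.float32)
        self.action_space = spaces.Discrete(3)  # 0 = hold, 1 = buy, 2 = sell
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(2,), dtype=np.float32)

    def _obs(self):
        ret = self.prices[self.t] / self.prices[self.t - 1] - 1.0
        return np.array([self.prices[self.t], ret], dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.t, self.position = 1, 0
        return self._obs(), {}

    def step(self, action):
        if action == 1:
            self.position = 1
        elif action == 2:
            self.position = 0
        self.t += 1
        # Reward: one-step return of the held position.
        reward = self.position * (self.prices[self.t] / self.prices[self.t - 1] - 1.0)
        terminated = self.t >= len(self.prices) - 1
        return self._obs(), float(reward), terminated, False, {}

env = ToyTradingEnv(prices=np.linspace(100, 110, 50))
obs, _ = env.reset()
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
```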
RFID-based Inventory Management System
Sensors & Firmware Product | Guide: Prof. Siddharth Tallur, IIT Bombay
Jan 2024 – Apr 2024
- Built a full-stack Inventory Management System using Django + React with secure authentication and multi-scanner integration.
- Designed and implemented REST APIs for real-time synchronization, concurrent data logging, and automated email alerts.
- Implemented a battery management module with an LCD UI using C and the Raspberry Pi SDK for firmware-level control.
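A minimal Django REST Framework endpoint of the kind used for scan logging. The model fields, serializer, and routing are illustrative, assume a configured Django project, and do not reproduce the actual schema.

```python
# Hedged sketch: a minimal DRF endpoint for RFID scan events; meant to live
# inside a Django app (models.py / serializers.py / views.py / urls.py).
from django.db import models
from rest_framework import serializers, viewsets, routers

class ScanEvent(models.Model):
    tag_id = models.CharField(max_length=64)
    scanner = models.CharField(max_length=32)
    timestamp = models.DateTimeField(auto_now_add=True)

class ScanEventSerializer(serializers.ModelSerializer):
    class Meta:
        model = ScanEvent
        fields = ["id", "tag_id", "scanner", "timestamp"]

class ScanEventViewSet(viewsets.ModelViewSet):
    queryset = ScanEvent.objects.all().order_by("-timestamp")
    serializer_class = ScanEventSerializer

# urls.py: POST /scans/ for scanners, GET /scans/ for the React dashboard.
router = routers.DefaultRouter()
router.register(r"scans", ScanEventViewSet)
urlpatterns = router.urls
```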
FEDTAIL: Federated Long-Tailed Domain Generalization with Sharpness-Guided Gradient Matching
42nd International Conference on Machine Learning · ICML 2025
Vancouver, Canada
We present FedTAIL, a federated domain generalization framework designed to tackle domain shifts and long-tailed class distributions. By aligning gradients across objectives and dynamically reweighting underrepresented classes using sharpness-aware optimization, our method achieves state-of-the-art performance under label imbalance. FedTAIL enables scalable and robust generalization in both centralized and federated settings.
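For readers unfamiliar with sharpness-aware optimization, here is a generic SAM-style two-step update in PyTorch. This is the standard formulation, not FedTAIL's sharpness-guided gradient-matching objective; the `rho` radius, tiny model, and batch are illustrative.

```python
# Hedged sketch: one generic sharpness-aware minimization (SAM) step.
# Not the paper's exact objective; rho and the example model are placeholders.
import torch

def sam_step(model, loss_fn, batch, optimizer, rho=0.05):
    x, y = batch
    optimizer.zero_grad()
    # 1) Ascent step: perturb weights toward higher loss within an L2 ball of radius rho.
    loss_fn(model(x), y).backward()
    grads = [p.grad.clone() for p in model.parameters()]
    grad_norm = torch.norm(torch.stack([g.norm() for g in grads]))
    eps = [rho * g / (grad_norm + 1e-12) for g in grads]
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            p.add_(e)
    # 2) Descent step: gradient at the perturbed point, then restore the weights.
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            p.sub_(e)
    optimizer.step()

# Illustrative usage with a tiny model.
model = torch.nn.Linear(4, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
batch = (torch.randn(8, 4), torch.randint(0, 2, (8,)))
sam_step(model, torch.nn.CrossEntropyLoss(), batch, opt)
```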
Whole Slide Image Domain Adaptation Tailored with Fisher Vector, Self Supervised Learning, and Novel Loss Function
8th Medical Imaging with Deep Learning · MIDL 2025
Salt Lake City, USA
We introduce a domain adaptation framework for Whole Slide Image (WSI) classification that combines self-supervised learning, clustering, and Fisher Vector encoding. By extracting MoCoV3-based patch features and aggregating them via Gaussian mixture models, our method forms robust slide-level representations. Adversarial training with a hybrid PLMMD-MCC loss enables effective domain alignment, achieving strong performance on cross-domain HER2 classification tasks, even under label noise.
Scalable Whole Slide Image Representation Using K-Means Clustering and Fisher Vector Aggregation
IEEE 22nd International Symposium on Biomedical Imaging · ISBI 2025
Texas, USA
We propose a scalable method for whole slide image (WSI) classification that combines patch-based deep feature extraction, clustering, and Fisher Vector encoding. By modeling clustered patch embeddings with Gaussian mixture models, our approach generates compact yet expressive slide-level representations. This enables robust and accurate WSI classification while efficiently capturing both local and global tissue structures.
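A simplified illustration of GMM-based Fisher-vector aggregation over patch embeddings (mean-gradient terms only), using scikit-learn. The synthetic features and component count are placeholders; the paper's full encoding and normalization steps are not reproduced.

```python
# Hedged sketch: GMM-based Fisher-vector-style aggregation of patch embeddings
# into a fixed-length slide descriptor (mean-gradient terms only).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
patch_feats = rng.normal(size=(1000, 64))  # e.g. 1000 patch embeddings from one slide

gmm = GaussianMixture(n_components=5, covariance_type="diag", random_state=0)
gmm.fit(patch_feats)

def fisher_vector(feats, gmm):
    q = gmm.predict_proba(feats)  # (N, K) soft assignments
    fv = []
    for k in range(gmm.n_components):
        diff = (feats - gmm.means_[k]) / np.sqrt(gmm.covariances_[k])
        # Gradient w.r.t. the k-th mean, normalized by the component weight.
        fv.append((q[:, k:k + 1] * diff).sum(axis=0) / (feats.shape[0] * np.sqrt(gmm.weights_[k])))
    return np.concatenate(fv)  # fixed-length slide representation

print(fisher_vector(patch_feats, gmm).shape)  # -> (5 * 64,)
```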
IDAL: Improved Domain Adaptive Learning for Natural Images Dataset
27th International Conference on Pattern Recognition · ICPR 2024
Kolkata, India
We propose a novel unsupervised domain adaptation (UDA) approach for natural images that combines ResNet with a feature pyramid network to capture both content and style features. A carefully designed loss function enhances alignment across domains with multi-modal distributions, improving robustness to scale, noise, and style shifts. Our method achieves superior performance on benchmarks like Office-Home, Office-31, and VisDA-2017, while maintaining competitive results on DomainNet.
Clustered Patch Embeddings for Permutation-Invariant Classification of Whole Slide Images
IEEE 24th International Conference on Bioinformatics and Bioengineering · BIBE 2024
Kragujevac, Serbia
We propose an efficient WSI analysis framework that leverages diverse encoders and a specialized classification model to produce robust, permutation-invariant slide representations. By distilling a gigapixel WSI into a single informative vector, our method significantly improves computational efficiency without sacrificing diagnostic accuracy. This scalable approach enables effective utilization of WSIs in digital pathology and medical research.
Domain-Adaptive Learning: Unsupervised Adaptation for Histology Images
11th International Conference on BioImaging · BioImaging 2024
Rome, Italy
Received the Best Student Paper Award
We propose a novel approach for unsupervised domain adaptation designed for medical images like H&E-stained histology and retinal fundus scans. By leveraging texture-specific features such as tissue structure and cell morphology, DAL improves domain alignment using a custom loss function that enhances both accuracy and training efficiency. Our method outperforms ViT and CNN-based baselines on FHIST and retina datasets, demonstrating strong generalization and robustness.
Website template borrowed from here.