Shounak Das
Final year Dual Degree (B.Tech + M.Tech) student in Electrical Engineering at IIT Bombay, with a minor in Artificial Intelligence, Machine Learning & Data Science.
I am currently working on Vision-Language Models (VLMs) at MeDAL Lab under the guidance of Prof. Amit Sethi, and interning at Fujitsu Research on LLMs. I have worked on cutting-edge industrial problems in ML, NLP, Generative AI, and CV at Intel, Swiggy, and IBM. I am also proficient in DSA and programming, with strong skills in Signal Processing and Communication Systems.
Skills:
- Languages: C++, Python, C, MATLAB, HTML, CSS, JavaScript, VHDL, Assembly, Arduino
- Software & Tools: Git, AWS, Docker, Tableau, MySQL, PostgreSQL, Intel Quartus, Xilinx Vivado, GNU Radio, SOLIDWORKS
- Python Libraries: NumPy, Matplotlib, Pandas, SciPy, PyTorch, TensorFlow, PySpark, OpenSlide, Transformers, Gensim, spaCy, Diffusers, LangChain, ChromaDB, Weaviate, NLTK, Streamlit
I have a strong interest in NLP, Generative AI, Large Language Models (LLMs), Vision-Language Models (VLMs), and Computer Vision, and love working on projects in these areas.
Publications:
- Published at ICML, MIDL, ISBI, ICPR, BIBE, and BioImaging (Best Student Paper Award)
Education:
- Dual Degree (B.Tech + M.Tech) in Electrical Engineering with a minor in AI, ML & Data Science at IIT Bombay
Email / Google Scholar / GitHub / LinkedIn / X
AI Research Intern
Fujitsu Research
May 2025 - Present
AI Solutions Engineering Intern
Intel Corporation
July 2024 - May 2025
- Created and pipelined an Optical Character Recognition (OCR) system for NIC (Govt. of India) that accurately recognizes Hindi, Telugu & Kannada text in images, improving regional-language text digitization.
- Used WhisperX for precise labeling of multilingual audio, achieving accurate transcription alignment and automated timestamp generation, which streamlined annotation workflows.
- Optimized AI workloads on Intel Habana accelerators using advanced frameworks and quantization techniques.
Data Science Intern
Swiggy
July 2024 - August 2024
- Developed a robust topic-modeling framework using BERTopic and an Azure OpenAI LLM agent, enabling identification and prediction of emerging trends in events and items and significantly enhancing predictive analytics.
- Developed a spell-correction pipeline using unigram and bigram probability models, fuzzy matching, and SymSpell.
- Generated a custom dataset of spelling errors on Instamart search queries using edit distance, decompounding & phonetics; fine-tuned T5-small and trained a vanilla transformer (4 encoder & 5 decoder layers) from scratch.
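The core of such a spell-correction pipeline can be sketched with a minimal noisy-channel corrector: generate edit-distance-1 candidates and rank them by unigram probability. The vocabulary, counts, and function names below are purely illustrative, not the production code.

```python
from collections import Counter

# Illustrative unigram counts; a real pipeline derives these from query logs.
UNIGRAMS = Counter({"banana": 120, "bread": 90, "butter": 80, "batter": 10})
TOTAL = sum(UNIGRAMS.values())
LETTERS = "abcdefghijklmnopqrstuvwxyz"

def edits1(word):
    """All strings one edit (delete/transpose/replace/insert) away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in LETTERS]
    inserts = [l + c + r for l, r in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)

def correct(word):
    """Return the in-vocabulary candidate with the highest unigram probability."""
    if word in UNIGRAMS:
        return word
    candidates = [w for w in edits1(word) if w in UNIGRAMS] or [word]
    return max(candidates, key=lambda w: UNIGRAMS[w] / TOTAL)
```

SymSpell speeds this idea up by precomputing delete-only candidates, and a bigram model would rescore candidates in context; both are omitted here for brevity.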
LLM Intern
IBM
May 2024 - June 2024
- Optimized vector databases (ChromaDB and Weaviate) for Retrieval-Augmented Generation (RAG) with Large Language Models (LLMs), improving indexing efficiency and search accuracy.
- Integrated Traceloop & IBM Instana for observability of vector-database searches and real-time performance analytics.
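The retrieval step at the heart of RAG reduces to nearest-neighbor search over embeddings. A toy sketch with hand-made 3-d vectors (a real store such as ChromaDB or Weaviate would hold model-generated embeddings and use approximate indexes):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def top_k(query_vec, index, k=2):
    """Return the k document ids whose vectors are most similar to the query."""
    ranked = sorted(index.items(), key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]

# Toy "embeddings"; ids and values are illustrative only.
index = {
    "doc_llm":  [0.9, 0.1, 0.0],
    "doc_db":   [0.1, 0.9, 0.1],
    "doc_misc": [0.2, 0.2, 0.9],
}
```

Indexing efficiency in practice comes from replacing this exhaustive scan with structures like HNSW graphs, which the databases above provide.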
Machine Learning Intern
GMAC Intelligence
December 2023 - January 2024
- Worked on the MLCommons AlgoPerf training benchmark, a global competition for ML training algorithms.
- Optimized a novel training algorithm across 6 workloads: Criteo 1TB (click-through prediction), fastMRI (MRI reconstruction), ImageNet (image classification), LibriSpeech (speech recognition), OGBG (molecular property prediction), and WMT (machine translation).
Generative AI Intern
MURVEN Design Solutions
December 2022 - April 2023
- Implemented generative AI models such as Deforum Stable Diffusion, VQGAN & Variational Autoencoders (VAEs) for text-to-image, image-to-image, and image-to-animation tasks, and deployed a prompt-based API on AWS infrastructure.
- Explored advanced flow-based models such as RealNVP to enhance content diversity and quality.
FEDTAIL: Federated Long-Tailed Domain Generalization with Sharpness-Guided Gradient Matching
42nd International Conference on Machine Learning · ICML 2025
We present FedTAIL, a federated domain generalization framework designed to tackle domain shifts and long-tailed class distributions. By aligning gradients across objectives and dynamically reweighting underrepresented classes using sharpness-aware optimization, our method achieves state-of-the-art performance under label imbalance. FedTAIL enables scalable and robust generalization in both centralized and federated settings.
Whole Slide Image Domain Adaptation Tailored with Fisher Vector, Self Supervised Learning, and Novel Loss Function
8th Medical Imaging with Deep Learning · MIDL 2025
We introduce a domain adaptation framework for Whole Slide Image (WSI) classification that combines self-supervised learning, clustering, and Fisher Vector encoding. By extracting MoCoV3-based patch features and aggregating them via Gaussian mixture models, our method forms robust slide-level representations. Adversarial training with a hybrid PLMMD-MCC loss enables effective domain alignment, achieving strong performance on cross-domain HER2 classification tasks, even under label noise.
Scalable Whole Slide Image Representation Using K-Means Clustering and Fisher Vector Aggregation
IEEE 22nd International Symposium on Biomedical Imaging · ISBI 2025
We propose a scalable method for whole slide image (WSI) classification that combines patch-based deep feature extraction, clustering, and Fisher Vector encoding. By modeling clustered patch embeddings with Gaussian mixture models, our approach generates compact yet expressive slide-level representations. This enables robust and accurate WSI classification while efficiently capturing both local and global tissue structures.
IDAL: Improved Domain Adaptive Learning for Natural Images Dataset
27th International Conference on Pattern Recognition · ICPR 2024
We propose a novel unsupervised domain adaptation (UDA) approach for natural images that combines ResNet with a feature pyramid network to capture both content and style features. A carefully designed loss function enhances alignment across domains with multi-modal distributions, improving robustness to scale, noise, and style shifts. Our method achieves superior performance on benchmarks like Office-Home, Office-31, and VisDA-2017, while maintaining competitive results on DomainNet.
Clustered Patch Embeddings for Permutation-Invariant Classification of Whole Slide Images
IEEE 24th International Conference on Bioinformatics and Bioengineering · BIBE 2024
We propose an efficient WSI analysis framework that leverages diverse encoders and a specialized classification model to produce robust, permutation-invariant slide representations. By distilling a gigapixel WSI into a single informative vector, our method significantly improves computational efficiency without sacrificing diagnostic accuracy. This scalable approach enables effective utilization of WSIs in digital pathology and medical research.
Domain-Adaptive Learning: Unsupervised Adaptation for Histology Images
11th International Conference on BioImaging · BioImaging 2024
Received the Best Student Paper Award
We propose a novel approach for unsupervised domain adaptation designed for medical images like H&E-stained histology and retinal fundus scans. By leveraging texture-specific features such as tissue structure and cell morphology, DAL improves domain alignment using a custom loss function that enhances both accuracy and training efficiency. Our method outperforms ViT and CNN-based baselines on FHIST and retina datasets, demonstrating strong generalization and robustness.
Website template borrowed from here.