Shounak Das

Final-year Dual Degree (B.Tech + M.Tech) student in Electrical Engineering at IIT Bombay, with a minor in Artificial Intelligence, Machine Learning & Data Science.

I currently work on Vision-Language Models (VLMs) at MeDAL Lab under the guidance of Prof. Amit Sethi. I have worked on industrial problems in ML, NLP, Generative AI, LLMs, and Computer Vision at Fujitsu Research, Intel, and Swiggy. I am also proficient in data structures and algorithms (DSA) and programming, and skilled in Signal Processing and Communication Systems.

Skills:

  • Languages: C++, Python, C, MATLAB, HTML, CSS, JavaScript, VHDL, Assembly, Arduino
  • Software: Git, AWS, Docker, Tableau, MySQL, PostgreSQL, Intel Quartus, Xilinx Vivado, GNU Radio, SOLIDWORKS
  • Python Libraries: NumPy, Matplotlib, Pandas, SciPy, PyTorch, TensorFlow, PySpark, OpenSlide, Transformers, Gensim, spaCy, Diffusers, LangChain, ChromaDB, Weaviate, NLTK, Streamlit

I have a strong interest in Generative AI, NLP, Computer Vision, and Machine Learning and enjoy working on projects in these areas. I’m excited to apply my skills to impactful projects and am currently looking for full-time roles in Machine Learning, Data Science, and Software Engineering starting in 2026.

Publications:

    • ICML '25 (International Conference on Machine Learning)
    • MIDL '25 (Medical Imaging with Deep Learning)
    • ISBI '25 (International Symposium on Biomedical Imaging)
    • ICPR '24 (International Conference on Pattern Recognition)
    • BIBE '24 (International Conference on Bioinformatics & Bioengineering)
    • BioImaging '24 (International Conference on BioImaging) — Best Student Paper Award

Education:

  • Dual Degree (B.Tech + M.Tech) in Electrical Engineering with a minor in AI, ML & Data Science at IIT Bombay

Email  /  Google Scholar  /  GitHub  /  LinkedIn  /  X


Work Experience
AI Research Intern
Fujitsu Research
May 2025 - July 2025
AI Solutions Engineering Intern
Intel Corporation
July 2024 - May 2025
  • Built an OCR pipeline for Hindi, Telugu & Kannada that uses a Vision Transformer (ViT) to extract robust image features, enabling accurate recognition across diverse regional scripts.
  • Integrated IndicBERT for context-aware recognition, boosting regional text digitization for NIC (Govt of India) and improving automated document processing.
  • Utilized WhisperX for multilingual audio labeling, enabling accurate transcription alignment and timestamp generation.
  • Improved inference speed by 67% for LLaMA-3 8B and 80% for Mistral 7B using OpenVINO and IPEX (Intel Extension for PyTorch).
Data Science Intern
Swiggy
July 2024 - August 2024
  • Developed a topic modeling framework using BERTopic and an Azure OpenAI LLM agent to identify and predict emerging trends in events and items.
  • Built a custom spell-error dataset from the Instamart SQL database using edit distance, decompounding, and phonetics.
  • Developed a spell correction pipeline leveraging unigram and bigram probability models with fuzzy logic, achieving up to 83% correction accuracy.
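
As a rough illustration of the idea behind the last two bullets, here is a minimal sketch of spell correction that combines edit distance with unigram and bigram probabilities. It is not the pipeline built at Swiggy: the toy corpus, the function names (`edit_distance`, `correct_word`, `correct_query`), the add-one smoothing, and the distance threshold are all assumptions made for this example, and the fuzzy-logic, decompounding, and phonetic components are left out.

```python
from collections import Counter

# Hypothetical toy corpus standing in for real product-search text.
CORPUS = [
    "fresh milk bread",
    "milk chocolate bar",
    "brown bread butter",
    "fresh milk butter",
]

# Count single words and adjacent word pairs.
unigrams = Counter()
bigrams = Counter()
for line in CORPUS:
    tokens = line.split()
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

TOTAL = sum(unigrams.values())
VOCAB_SIZE = len(unigrams)


def edit_distance(a, b):
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def score(prev_word, candidate):
    """Unigram probability, multiplied by an add-one smoothed bigram
    probability when the previous word is known."""
    unigram_p = unigrams[candidate] / TOTAL
    if prev_word is None or prev_word not in unigrams:
        return unigram_p
    bigram_p = (bigrams[(prev_word, candidate)] + 1) / (
        unigrams[prev_word] + VOCAB_SIZE
    )
    return unigram_p * bigram_p


def correct_word(word, prev_word=None, max_distance=2):
    """Return the closest in-vocabulary word, breaking ties by probability."""
    if word in unigrams:
        return word
    candidates = [w for w in unigrams if edit_distance(word, w) <= max_distance]
    if not candidates:
        return word  # nothing close enough; leave the token unchanged
    return max(
        candidates,
        key=lambda w: (-edit_distance(word, w), score(prev_word, w)),
    )


def correct_query(query):
    """Correct a query left to right, using each fixed word as context."""
    corrected = []
    prev = None
    for token in query.lower().split():
        fixed = correct_word(token, prev)
        corrected.append(fixed)
        prev = fixed
    return " ".join(corrected)
```

With this toy corpus, `correct_query("fresh milc bred")` picks the nearest vocabulary word for each misspelled token. A production system would estimate the counts from a much larger corpus and would likely weight edit distance and probability differently.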
LLM Intern
IBM
May 2024 - June 2024
  • Optimized vector databases (ChromaDB and Weaviate) for Retrieval-Augmented Generation (RAG) with Large Language Models (LLMs), improving indexing efficiency & search accuracy.
  • Integrated TraceLoop & IBM Instana for observability of VectorDB searches & real-time performance analytics.
Machine Learning Intern
GMAC Intelligence
December 2023 - January 2024
  • Worked on the MLCommons AlgoPerf Training Benchmark, a global competition focused on ML algorithms.
  • Optimized a novel training algorithm across 6 datasets: Criteo 1TB (clickthrough), FastMRI (reconstruction), ImageNet (classification), LibriSpeech (speech), OGBG (molecular), and WMT (translation).
Generative AI Intern
MURVEN Design Solutions
December 2022 - April 2023
  • Implemented models like Deforum Stable Diffusion and VQGAN for text-to-image and image-to-image tasks, generating high-quality visual content from prompts.
  • Developed image-to-animation pipelines and deployed a prompt-based API on AWS for real-time inference, enabling interactive generative workflows.
  • Explored advanced flow-based models like RealNVP to enhance generative content diversity and quality, improving model expressiveness and output variety.
Publications (Full list available at Google Scholar)
FedTAIL: Federated Long-Tailed Domain Generalization with Sharpness-Guided Gradient Matching
42nd International Conference on Machine Learning · ICML 2025
Vancouver, Canada

We present FedTAIL, a federated domain generalization framework designed to tackle domain shifts and long-tailed class distributions. By aligning gradients across objectives and dynamically reweighting underrepresented classes using sharpness-aware optimization, our method achieves state-of-the-art performance under label imbalance. FedTAIL enables scalable and robust generalization in both centralized and federated settings.

Whole Slide Image Domain Adaptation Tailored with Fisher Vector, Self Supervised Learning, and Novel Loss Function
8th Medical Imaging with Deep Learning · MIDL 2025
Salt Lake City, USA

We introduce a domain adaptation framework for Whole Slide Image (WSI) classification that combines self-supervised learning, clustering, and Fisher Vector encoding. By extracting MoCoV3-based patch features and aggregating them via Gaussian mixture models, our method forms robust slide-level representations. Adversarial training with a hybrid PLMMD-MCC loss enables effective domain alignment, achieving strong performance on cross-domain HER2 classification tasks, even under label noise.

Scalable Whole Slide Image Representation Using K-Means Clustering and Fisher Vector Aggregation
IEEE 22nd International Symposium on Biomedical Imaging · ISBI 2025
Texas, USA

We propose a scalable method for whole slide image (WSI) classification that combines patch-based deep feature extraction, clustering, and Fisher Vector encoding. By modeling clustered patch embeddings with Gaussian mixture models, our approach generates compact yet expressive slide-level representations. This enables robust and accurate WSI classification while efficiently capturing both local and global tissue structures.
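
For readers unfamiliar with Fisher Vector encoding, the sketch below shows the standard formulation for a diagonal-covariance Gaussian mixture model, with power and L2 normalization. It is an illustrative example only and not the paper's code: the function name `fisher_vector` is hypothetical, the GMM parameters are assumed to be already fitted (for instance by EM on patch embeddings), and the gradient with respect to the mixture weights is omitted.

```python
import numpy as np


def fisher_vector(X, weights, means, variances):
    """Encode a set of patch embeddings as one fixed-length vector.

    X:         (N, D) patch embeddings from one slide
    weights:   (K,)   GMM mixture weights (assumed already fitted)
    means:     (K, D) GMM component means
    variances: (K, D) diagonal covariances
    Returns a vector of length 2 * K * D.
    """
    N = X.shape[0]
    diff = X[:, None, :] - means[None, :, :]  # (N, K, D)

    # Log-density of every patch under every diagonal Gaussian component.
    log_prob = -0.5 * (
        np.sum(diff ** 2 / variances, axis=2)
        + np.sum(np.log(2.0 * np.pi * variances), axis=1)
    )  # (N, K)

    # Posterior responsibilities, computed stably in log space.
    log_post = np.log(weights) + log_prob
    log_post -= log_post.max(axis=1, keepdims=True)
    gamma = np.exp(log_post)
    gamma /= gamma.sum(axis=1, keepdims=True)  # (N, K)

    # Gradients with respect to the means and standard deviations.
    normed = diff / np.sqrt(variances)  # (N, K, D)
    g_mu = (gamma[:, :, None] * normed).sum(axis=0)
    g_mu /= N * np.sqrt(weights)[:, None]
    g_sigma = (gamma[:, :, None] * (normed ** 2 - 1.0)).sum(axis=0)
    g_sigma /= N * np.sqrt(2.0 * weights)[:, None]

    fv = np.concatenate([g_mu.ravel(), g_sigma.ravel()])

    # Power normalization followed by L2 normalization.
    fv = np.sign(fv) * np.sqrt(np.abs(fv))
    norm = np.linalg.norm(fv)
    return fv / norm if norm > 0 else fv
```

Because the encoding sums over patches, the result does not depend on patch order, and its length depends only on the number of components K and the embedding dimension D, not on the number of patches in the slide.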

IDAL: Improved Domain Adaptive Learning for Natural Images Dataset
27th International Conference on Pattern Recognition · ICPR 2024
Kolkata, India

We propose a novel unsupervised domain adaptation (UDA) approach for natural images that combines ResNet with a feature pyramid network to capture both content and style features. A carefully designed loss function enhances alignment across domains with multi-modal distributions, improving robustness to scale, noise, and style shifts. Our method achieves superior performance on benchmarks like Office-Home, Office-31, and VisDA-2017, while maintaining competitive results on DomainNet.

Clustered Patch Embeddings for Permutation-Invariant Classification of Whole Slide Images
IEEE 24th International Conference on Bioinformatics and Bioengineering · BIBE 2024
Kragujevac, Serbia

We propose an efficient WSI analysis framework that leverages diverse encoders and a specialized classification model to produce robust, permutation-invariant slide representations. By distilling a gigapixel WSI into a single informative vector, our method significantly improves computational efficiency without sacrificing diagnostic accuracy. This scalable approach enables effective utilization of WSIs in digital pathology and medical research.

Domain-Adaptive Learning: Unsupervised Adaptation for Histology Images
11th International Conference on BioImaging · BioImaging 2024
Rome, Italy
Received the Best Student Paper Award

We propose Domain-Adaptive Learning (DAL), a novel approach for unsupervised domain adaptation designed for medical images like H&E-stained histology and retinal fundus scans. By leveraging texture-specific features such as tissue structure and cell morphology, DAL improves domain alignment using a custom loss function that enhances both accuracy and training efficiency. Our method outperforms ViT and CNN-based baselines on FHIST and retina datasets, demonstrating strong generalization and robustness.

