Shounak Das
Final-year Dual Degree (B.Tech + M.Tech) student in Electrical Engineering at IIT Bombay, with a minor in AI, Machine Learning & Data Science, graduating in 2026.
I am currently working on Vision-Language Models (VLMs) at MeDAL Lab under the guidance of Prof. Amit Sethi.
I have worked on cutting-edge industrial problems in ML, NLP, Generative AI, LLMs, and Computer Vision at Fujitsu Research, Intel, and Swiggy. I am also proficient in DSA and programming, and skilled in Signal Processing and Communication Systems.
Skills:
- Languages: Python, C++, C, MATLAB, SQL, JavaScript, HTML, CSS, Bash
- ML & Deep Learning Frameworks: PyTorch, TensorFlow, scikit-learn, Keras, Hugging Face, Diffusers, LangChain, LangGraph, OpenVINO
- NLP & Computer Vision: spaCy, NLTK, Gensim, OpenCV, Whisper, CLIP, Gemini API, OpenAI API, gTTS
- Data & MLOps: PySpark, ChromaDB, Weaviate, Docker, AWS, Databricks, Linux, Git, MLflow, Snowflake
- Deployment & Web: Django, React, Streamlit, REST APIs, FastAPI, Flask, Nginx, Postman
I have a strong interest in Generative AI, NLP, Computer Vision, and Machine Learning, and I enjoy building projects in these areas. I am excited to apply my skills to impactful, real-world problems.
Interested in full-time opportunities in Machine Learning, Data Science, and Software Engineering starting Summer 2026.
Publications:
- ICML '25 (International Conference on Machine Learning)
- MIDL '25 (Medical Imaging with Deep Learning)
- ISBI '25 (International Symposium on Biomedical Imaging)
- ICPR '24 (International Conference on Pattern Recognition)
- BIBE '24 (International Conference on Bioinformatics & Bioengineering)
- BioImaging '24 (International Conference on BioImaging) — Best Student Paper Award
Education:
- Dual Degree (B.Tech + M.Tech) in Electrical Engineering with a minor in AI, ML & Data Science at IIT Bombay
Email / Google Scholar / GitHub / LinkedIn / X
AI Research Intern
Fujitsu Research
May 2025 - July 2025
- Developed a proactive root-cause analysis (RCA) framework over ~100k-line Warrior logs for next-step prediction and failure diagnosis.
- Built an InstructRAG pipeline using Gemini to synthesize QA data and fine-tuned Mamba-2 (2.7B) with LoRA (sketch below).
- Overcame LLM context-length limitations using hypergraphs with Personalized PageRank and similarity-based compression.
- Achieved a 42% BLEU score and a 4.4/5 LLM-as-a-Judge rating, and deployed the system into Fujitsu's Kozuchi platform for business use.
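Below is a minimal sketch of the kind of LoRA fine-tuning setup mentioned above, using Hugging Face Transformers and PEFT. The checkpoint name, target modules, and training example are illustrative placeholders, not the configuration used at Fujitsu.

```python
# Hedged sketch: generic LoRA fine-tuning with Hugging Face Transformers + PEFT.
# Checkpoint, target modules, and the training example are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "state-spaces/mamba2-2.7b"  # placeholder checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

lora_cfg = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["in_proj", "out_proj"],  # assumed projection layers for a Mamba-style block
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # only the LoRA adapters are trainable

# One illustrative training step on a synthetic instruction/answer pair.
example = "Instruction: diagnose the failed test step.\nAnswer: restart the target service."
batch = tokenizer(example, return_tensors="pt")
loss = model(**batch, labels=batch["input_ids"]).loss
loss.backward()
```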
AI Solutions Engineering Intern
Intel Corporation
October 2024 - March 2025
- Engineered an OCR system for Hindi, Telugu & Kannada using ViT to extract robust image features, enabling accurate recognition across diverse regional scripts.
- Integrated IndicBERT for context-aware recognition, boosting regional text digitization for NIC (Govt of India) and improving automated document processing.
- Utilized WhisperX for multilingual audio labeling, enabling accurate transcription alignment and timestamp generation.
- Optimized inference speed by 67% (LLaMA-3 8B) and 80% (Mistral 7B) using OpenVINO and IPEX, improving overall AI workload performance.
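A brief sketch of ViT-based image feature extraction with Hugging Face Transformers, the kind of backbone step described in the OCR bullet above. The checkpoint, input image, and downstream recognition head are placeholders, not the internal system built at Intel.

```python
# Hedged sketch: ViT patch-level feature extraction with Hugging Face Transformers.
# Checkpoint and input image are placeholders; the script-specific decoder is omitted.
import torch
from PIL import Image
from transformers import ViTImageProcessor, ViTModel

processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k")
model = ViTModel.from_pretrained("google/vit-base-patch16-224-in21k")

image = Image.open("text_line_crop.png").convert("RGB")  # a cropped document region
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Patch embeddings that a downstream recognizer (e.g. an IndicBERT-aware head)
# could consume for Hindi/Telugu/Kannada text recognition.
features = outputs.last_hidden_state  # shape: (1, num_patches + 1, hidden_dim)
print(features.shape)
```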
Data Science Intern
Swiggy
July 2024 - October 2024
- Developed a robust topic modeling framework using BERTopic and an Azure OpenAI LLM agent, enabling identification and prediction of emerging trends in events and items and strengthening predictive analytics.
- Built a custom spell-error dataset from the Instamart SQL database using edit distance, decompounding, and phonetics.
- Developed a spell-correction pipeline leveraging unigram and bigram probability models with fuzzy logic, achieving up to 83% correction accuracy (toy sketch below).
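A toy illustration of the unigram/bigram spell-correction idea from the last bullet. The miniature corpus, difflib-based candidate generation, and interpolation weights are stand-ins for the production lexicon and fuzzy-matching rules, which are not reproduced here.

```python
# Hedged sketch: a toy unigram/bigram spell corrector. Corpus, candidate
# generator, and scoring weights are illustrative placeholders only.
from collections import Counter
import difflib

corpus = "buy fresh tomato buy fresh onion buy instant noodles".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def candidates(word, vocab, n=5):
    # Edit-distance-style candidates via difflib's similarity matcher.
    return difflib.get_close_matches(word, vocab, n=n, cutoff=0.6) or [word]

def correct(prev_word, word):
    # Score each candidate by an interpolated bigram/unigram probability.
    def score(cand):
        uni = unigrams[cand] / sum(unigrams.values())
        bi = bigrams[(prev_word, cand)] / max(unigrams[prev_word], 1)
        return 0.7 * bi + 0.3 * uni
    return max(candidates(word, unigrams.keys()), key=score)

print(correct("fresh", "tomatto"))  # -> "tomato"
```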
LLM Intern
IBM
May 2024 - June 2024
- Designed and tuned ChromaDB and Weaviate pipelines for RAG, improving retrieval quality via optimized embeddings, indexing, and query-time filtering.
- Integrated TraceLoop and IBM Instana for VectorDB observability and real-time performance analytics.
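A minimal ChromaDB retrieval flow of the kind tuned here; the collection contents, metadata filter, and default embedding settings are illustrative rather than the optimized pipeline.

```python
# Hedged sketch: a minimal ChromaDB retrieval step for RAG. Documents and
# filters are placeholders; embeddings use Chroma's defaults.
import chromadb

client = chromadb.Client()  # in-memory client
collection = client.create_collection(name="docs")

collection.add(
    ids=["d1", "d2"],
    documents=[
        "Weaviate supports hybrid keyword + vector search.",
        "ChromaDB stores embeddings alongside document metadata.",
    ],
    metadatas=[{"source": "notes"}, {"source": "notes"}],
)

# Query-time retrieval with metadata filtering; in a full RAG loop the retrieved
# passages would be placed into the LLM prompt.
results = collection.query(
    query_texts=["How does Chroma store data?"],
    n_results=1,
    where={"source": "notes"},
)
print(results["documents"][0][0])
```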
Machine Learning Intern
GMAC Intelligence
December 2023 - January 2024
- Worked on the MLCommons AlgoPerf Training Benchmark, a global competition for benchmarking neural-network training algorithms.
- Optimized a novel training algorithm across 6 datasets: Criteo 1TB (clickthrough), FastMRI (reconstruction), ImageNet (classification), LibriSpeech (speech), OGBG (molecular), and WMT (translation).
Generative AI Intern
MURVEN Design Solutions
December 2022 - April 2023
- Implemented models like Deforum Stable Diffusion and VQGAN for text-to-image and image-to-image tasks, generating high-quality visual content from prompts.
- Developed image-to-animation pipelines and deployed a prompt-based API on AWS for real-time inference, enabling interactive generative workflows.
- Explored advanced flow-based models like RealNVP to enhance generative content diversity and quality.
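For context, a short text-to-image sketch with the Diffusers library. The Stable Diffusion checkpoint and prompt are placeholders; the internship's Deforum Stable Diffusion and VQGAN pipelines are configured differently.

```python
# Hedged sketch: text-to-image generation with Hugging Face Diffusers.
# Checkpoint and prompt are illustrative placeholders.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

prompt = "isometric illustration of a cozy reading nook, soft morning light"
image = pipe(prompt, num_inference_steps=30, guidance_scale=7.5).images[0]
image.save("nook.png")
```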
Vision-Language Models for Whole Slide Images (WSIs)
Dual Degree Project | Guide: Prof. Amit Sethi, MeDAL Lab, IIT Bombay
May 2025 - Present
- Developing a multi-resolution vision-language pipeline for gigapixel images, enabling scalable text-guided representation learning.
- Engineering semantic-guided prompt tuning with CLIP and LLaVA, using knowledge distillation for robust few-shot classification (see the CLIP sketch below).
- Implementing distribution-aware cross-modal alignment to reduce modality gaps and improve generalization in VLMs.
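A small sketch of zero-shot patch scoring with CLIP, the building block behind the prompt-tuning work above. The checkpoint, prompts, and input tile are assumptions for illustration; the multi-resolution pipeline, prompt tuning, and distillation are not shown.

```python
# Hedged sketch: scoring one WSI tile against text prompts with CLIP.
# Checkpoint, prompts, and image path are placeholders.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

patch = Image.open("wsi_patch.png").convert("RGB")  # one tile from a WSI
prompts = ["a patch of tumor tissue", "a patch of normal tissue"]

inputs = processor(text=prompts, images=patch, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image  # image-text similarity scores
print(logits.softmax(dim=-1))                  # probabilities over the prompts
```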
AI Guard Agent: Multimodal Vision-Language Surveillance System
Course Project | Advanced Machine Learning
Sep 2025 – Oct 2025
- Developed an AI Guard Agent for real-time surveillance using vision, speech, and LLM-based reasoning.
- Integrated Whisper, Coqui TTS, and Gemini with meta-prompting and 4-level escalation for access control.
- Achieved 97.6% SSIM for face image similarity verification and 0.8s speech-to-text latency with robust multimodal response.
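A compact sketch of the SSIM-style face-similarity check mentioned in the last bullet, using OpenCV and scikit-image. The file names and acceptance threshold are illustrative, not the project's tuned values.

```python
# Hedged sketch: SSIM-based face-crop similarity check. File names and the
# 0.9 threshold are placeholders.
import cv2
from skimage.metrics import structural_similarity as ssim

ref = cv2.imread("enrolled_face.png", cv2.IMREAD_GRAYSCALE)
probe = cv2.imread("camera_face.png", cv2.IMREAD_GRAYSCALE)
probe = cv2.resize(probe, (ref.shape[1], ref.shape[0]))  # align sizes

score = ssim(ref, probe)  # 1.0 means identical images
print(f"SSIM: {score:.3f}")
if score >= 0.9:
    print("Face verified, granting access")
else:
    print("Mismatch, escalating to the next level")
```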
Whole Slide Image Analysis for Cancer Classification
Supervised Research Exposition | Guide: Prof. Amit Sethi, MeDAL Lab, IIT Bombay
Jan 2024 – Nov 2024
- Developed a WSI classification framework using attention-based MIL on patch-wise Fisher vectors.
- Extracted patch features via ResNet, Swin, MoCo, and SimCLR, and encoded them using a 5-component GMM.
- Achieved AUC 0.83 (Warwick) and accuracies 0.86 / 0.84 on TCGA-BRCA / LUAD, surpassing SoTA benchmarks.
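A minimal attention-based MIL pooling module in PyTorch, illustrating the permutation-invariant aggregation used for slide-level classification. Dimensions and the bag of random features are placeholders; the Fisher-vector encoding stage is omitted.

```python
# Hedged sketch: attention-based MIL pooling over patch features.
# Dimensions and the random bag are illustrative only.
import torch
import torch.nn as nn

class AttentionMILPooling(nn.Module):
    def __init__(self, in_dim=512, attn_dim=128, n_classes=2):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Linear(in_dim, attn_dim), nn.Tanh(), nn.Linear(attn_dim, 1)
        )
        self.classifier = nn.Linear(in_dim, n_classes)

    def forward(self, patch_feats):  # (num_patches, in_dim)
        weights = torch.softmax(self.attn(patch_feats), dim=0)  # (num_patches, 1)
        slide_feat = (weights * patch_feats).sum(dim=0)         # permutation-invariant pooling
        return self.classifier(slide_feat), weights

bag = torch.randn(1000, 512)  # e.g. 1000 patch embeddings from one WSI
logits, attn = AttentionMILPooling()(bag)
print(logits.shape, attn.shape)
```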
Advanced CV Models for Super-Resolution and Visual Analysis
Course Project | Machine Learning for Remote Sensing-II
Mar 2025 – May 2025
- Secured 1st place in the course image super-resolution Kaggle competition using a custom EDSR model on gaming data.
- Trained EDSR from scratch using ResBlocks and PixelShuffle, achieving a top score of 59.39 (joint PSNR + SSIM).
- Built vision models with CAM for CNN interpretability and generative models (GANs, Hierarchical VAEs).
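A scaled-down EDSR-style network showing the ResBlock + PixelShuffle pattern from the bullets above. Channel counts, depth, and scale factor are illustrative; the competition model's exact configuration is not reproduced here.

```python
# Hedged sketch: EDSR-style residual blocks with a PixelShuffle upsampler.
# All hyperparameters are placeholders.
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)  # residual connection, no batch norm (as in EDSR)

class TinyEDSR(nn.Module):
    def __init__(self, channels=64, n_blocks=4, scale=2):
        super().__init__()
        self.head = nn.Conv2d(3, channels, 3, padding=1)
        self.body = nn.Sequential(*[ResBlock(channels) for _ in range(n_blocks)])
        self.upsample = nn.Sequential(
            nn.Conv2d(channels, channels * scale**2, 3, padding=1),
            nn.PixelShuffle(scale),  # rearranges channels into higher spatial resolution
            nn.Conv2d(channels, 3, 3, padding=1),
        )

    def forward(self, x):
        feats = self.head(x)
        return self.upsample(feats + self.body(feats))

lr = torch.randn(1, 3, 64, 64)
print(TinyEDSR()(lr).shape)  # -> torch.Size([1, 3, 128, 128])
```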
Automatic Text Summarization
Course Project | Introduction to Machine Learning
Apr 2024 – May 2024
- Built an NLP-based summarization system using TF-IDF, Seq2Seq, and BART with real-time deployment through Streamlit.
- Achieved ROUGE-L scores of 0.878 (BART) and 0.814 (Seq2Seq) on the XSum and SamSum datasets.
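A one-screen example of abstractive summarization with a pretrained BART checkpoint via the Transformers pipeline API. The checkpoint and input text are placeholders; the project fine-tuned and served its own models behind a Streamlit UI.

```python
# Hedged sketch: abstractive summarization with a pretrained BART checkpoint.
# Checkpoint and input text are illustrative placeholders.
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

article = (
    "The city council met on Tuesday to debate the new cycling infrastructure "
    "plan, which proposes protected lanes on four major roads and is expected "
    "to be voted on next month."
)
summary = summarizer(article, max_length=40, min_length=10, do_sample=False)
print(summary[0]["summary_text"])
```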
Deep Reinforcement Learning and NLP for Stock Allocation
Seasons of Code | Web & Coding Club, IIT Bombay
May 2023 – Jul 2023
- Developed a Deep RL framework combining DQN, PPO, and FinBERT-based NLP sentiment signals for dynamic portfolio optimization.
- Engineered a custom Gymnasium trading environment using volatility, MACD, and RSI for risk-adjusted optimization (environment sketch below).
- Achieved 25% annualized returns in backtesting on historical S&P 500 (SPY) data sourced from Yahoo Finance.
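A skeleton of a custom Gymnasium trading environment in the spirit of the one described above. The observation features and reward here are simplified placeholders (price and one-step return only), not the full volatility/MACD/RSI state.

```python
# Hedged sketch: a toy Gymnasium trading environment. Observations, actions,
# and the reward are simplified placeholders.
import gymnasium as gym
import numpy as np
from gymnasium import spaces

class ToyTradingEnv(gym.Env):
    def __init__(self, prices):
        super().__init__()
        self.prices = np.asarray(prices, dtype=np.float32)
        self.action_space = spaces.Discrete(3)  # 0 = hold, 1 = buy, 2 = sell
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(2,), dtype=np.float32)

    def _obs(self):
        ret = self.prices[self.t] / self.prices[self.t - 1] - 1.0
        return np.array([self.prices[self.t], ret], dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.t, self.position = 1, 0
        return self._obs(), {}

    def step(self, action):
        if action == 1:
            self.position = 1
        elif action == 2:
            self.position = 0
        self.t += 1
        # Reward: one-step return of the held position.
        reward = self.position * (self.prices[self.t] / self.prices[self.t - 1] - 1.0)
        terminated = self.t >= len(self.prices) - 1
        return self._obs(), float(reward), terminated, False, {}

env = ToyTradingEnv(prices=np.linspace(100, 110, 50))
obs, _ = env.reset()
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
```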
RFID-based Inventory Management System
Sensors & Firmware Product | Guide: Prof. Siddharth Tallur, IIT Bombay
Jan 2024 – Apr 2024
- Built a full-stack Inventory Management System using Django + React with secure authentication and multi-scanner integration.
- Designed and implemented REST APIs for real-time synchronization, concurrent data logging, and automated email alerts.
- Implemented a battery management module with an LCD UI using C and the Raspberry Pi SDK for firmware-level control.
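A minimal Django REST Framework endpoint of the kind used for scan logging. The model fields, serializer, and routing are illustrative, assume a configured Django project, and do not reproduce the actual schema.

```python
# Hedged sketch: a minimal DRF endpoint for RFID scan events; meant to live
# inside a Django app (models.py / serializers.py / views.py / urls.py).
from django.db import models
from rest_framework import serializers, viewsets, routers

class ScanEvent(models.Model):
    tag_id = models.CharField(max_length=64)
    scanner = models.CharField(max_length=32)
    timestamp = models.DateTimeField(auto_now_add=True)

class ScanEventSerializer(serializers.ModelSerializer):
    class Meta:
        model = ScanEvent
        fields = ["id", "tag_id", "scanner", "timestamp"]

class ScanEventViewSet(viewsets.ModelViewSet):
    queryset = ScanEvent.objects.all().order_by("-timestamp")
    serializer_class = ScanEventSerializer

# urls.py: POST /scans/ for scanners, GET /scans/ for the React dashboard.
router = routers.DefaultRouter()
router.register(r"scans", ScanEventViewSet)
urlpatterns = router.urls
```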
FEDTAIL: Federated Long-Tailed Domain Generalization with Sharpness-Guided Gradient Matching
42nd International Conference on Machine Learning · ICML 2025
Vancouver, Canada
We present FedTAIL, a federated domain generalization framework designed to tackle domain shifts and long-tailed class distributions. By aligning gradients across objectives and dynamically reweighting underrepresented classes using sharpness-aware optimization, our method achieves state-of-the-art performance under label imbalance. FedTAIL enables scalable and robust generalization in both centralized and federated settings.
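For readers unfamiliar with sharpness-aware optimization, here is a generic SAM-style two-step update in PyTorch. This is the standard formulation, not FedTAIL's sharpness-guided gradient-matching objective; the `rho` radius, tiny model, and batch are illustrative.

```python
# Hedged sketch: one generic sharpness-aware minimization (SAM) step.
# Not the paper's exact objective; rho and the example model are placeholders.
import torch

def sam_step(model, loss_fn, batch, optimizer, rho=0.05):
    x, y = batch
    optimizer.zero_grad()
    # 1) Ascent step: perturb weights toward higher loss within an L2 ball of radius rho.
    loss_fn(model(x), y).backward()
    grads = [p.grad.clone() for p in model.parameters()]
    grad_norm = torch.norm(torch.stack([g.norm() for g in grads]))
    eps = [rho * g / (grad_norm + 1e-12) for g in grads]
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            p.add_(e)
    # 2) Descent step: gradient at the perturbed point, then restore the weights.
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            p.sub_(e)
    optimizer.step()

# Illustrative usage with a tiny model.
model = torch.nn.Linear(4, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
batch = (torch.randn(8, 4), torch.randint(0, 2, (8,)))
sam_step(model, torch.nn.CrossEntropyLoss(), batch, opt)
```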
Whole Slide Image Domain Adaptation Tailored with Fisher Vector, Self Supervised Learning, and Novel Loss Function
8th Medical Imaging with Deep Learning · MIDL 2025
Salt Lake City, USA
We introduce a domain adaptation framework for Whole Slide Image (WSI) classification that combines self-supervised learning, clustering, and Fisher Vector encoding. By extracting MoCoV3-based patch features and aggregating them via Gaussian mixture models, our method forms robust slide-level representations. Adversarial training with a hybrid PLMMD-MCC loss enables effective domain alignment, achieving strong performance on cross-domain HER2 classification tasks, even under label noise.
Scalable Whole Slide Image Representation Using K-Means Clustering and Fisher Vector Aggregation
IEEE 22nd International Symposium on Biomedical Imaging · ISBI 2025
Texas, USA
We propose a scalable method for whole slide image (WSI) classification that combines patch-based deep feature extraction, clustering, and Fisher Vector encoding. By modeling clustered patch embeddings with Gaussian mixture models, our approach generates compact yet expressive slide-level representations. This enables robust and accurate WSI classification while efficiently capturing both local and global tissue structures.
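A simplified illustration of GMM-based Fisher-vector aggregation over patch embeddings (mean-gradient terms only), using scikit-learn. The synthetic features and component count are placeholders; the paper's full encoding and normalization steps are not reproduced.

```python
# Hedged sketch: GMM-based Fisher-vector-style aggregation of patch embeddings
# into a fixed-length slide descriptor (mean-gradient terms only).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
patch_feats = rng.normal(size=(1000, 64))  # e.g. 1000 patch embeddings from one slide

gmm = GaussianMixture(n_components=5, covariance_type="diag", random_state=0)
gmm.fit(patch_feats)

def fisher_vector(feats, gmm):
    q = gmm.predict_proba(feats)  # (N, K) soft assignments
    fv = []
    for k in range(gmm.n_components):
        diff = (feats - gmm.means_[k]) / np.sqrt(gmm.covariances_[k])
        # Gradient w.r.t. the k-th mean, normalized by the component weight.
        fv.append((q[:, k:k + 1] * diff).sum(axis=0) / (feats.shape[0] * np.sqrt(gmm.weights_[k])))
    return np.concatenate(fv)  # fixed-length slide representation

print(fisher_vector(patch_feats, gmm).shape)  # -> (5 * 64,)
```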
IDAL: Improved Domain Adaptive Learning for Natural Images Dataset
27th International Conference on Pattern Recognition · ICPR 2024
Kolkata, India
We propose a novel unsupervised domain adaptation (UDA) approach for natural images that combines ResNet with a feature pyramid network to capture both content and style features. A carefully designed loss function enhances alignment across domains with multi-modal distributions, improving robustness to scale, noise, and style shifts. Our method achieves superior performance on benchmarks like Office-Home, Office-31, and VisDA-2017, while maintaining competitive results on DomainNet.
Clustered Patch Embeddings for Permutation-Invariant Classification of Whole Slide Images
IEEE 24th International Conference on Bioinformatics and Bioengineering · BIBE 2024
Kragujevac, Serbia
We propose an efficient WSI analysis framework that leverages diverse encoders and a specialized classification model to produce robust, permutation-invariant slide representations. By distilling a gigapixel WSI into a single informative vector, our method significantly improves computational efficiency without sacrificing diagnostic accuracy. This scalable approach enables effective utilization of WSIs in digital pathology and medical research.
Domain-Adaptive Learning: Unsupervised Adaptation for Histology Images
11th International Conference on BioImaging · BioImaging 2024
Rome, Italy
Received the Best Student Paper Award
We propose a novel approach for unsupervised domain adaptation designed for medical images like H&E-stained histology and retinal fundus scans. By leveraging texture-specific features such as tissue structure and cell morphology, DAL improves domain alignment using a custom loss function that enhances both accuracy and training efficiency. Our method outperforms ViT and CNN-based baselines on FHIST and retina datasets, demonstrating strong generalization and robustness.
Website template borrowed from here.