Varun Kotte

Experience

What I have built in production, and the decisions behind it. Figures are relative to the systems I worked on; product scale belongs to the product, not to me.

Adobe — Staff Machine Learning Engineer

I own the serving and reliability layers of the internal LLM-serving and retrieval platform (ILUP) that backs AI and search across six Adobe products (Acrobat, Express, Lightroom, Photoshop, Stock, and Creative Cloud Home). Seven product teams build on it. Over the last few years I've shipped search and question answering into several of these products: RAG-backed help answering in Document Cloud, entity-aware query suggestions across search surfaces, and the named-entity work behind Lightroom semantic search. I also set the platform's reliability model and mentor engineers across the team.
6 products on the platform · 7 teams building on it · engineers mentored across regions

LLM-serving & retrieval platform (ILUP)

Problem
Several product teams were each rebuilding the same LLM-serving and retrieval stack, and experimental GenAI traffic was destabilizing the critical search path.
Decision
I designed a shared, pluggable platform, led its adoption, and turned prompt onboarding into a self-service path, so a team can go from nothing to serving in minutes instead of weeks. I isolated critical search traffic from experimental workloads behind per-experience latency budgets (~700 ms). For serving throughput, the main lever was continuous batching in the vLLM layer, with quantization and small-to-large model cascading to hold down cost and latency. I also consolidated five per-team LLM integrations into a single multi-model abstraction (supporting models such as Mistral and Gemma), so onboarding a new model became a configuration change rather than a rebuild, cutting integration time from weeks to days.
Result
Serving throughput rose ~5× per pod (~20 to ~100 requests/sec); Docker build time fell from ~40 to ~10 minutes and cluster bootstrap from ~30 to ~10–15 minutes; client teams self-onboard in minutes; and customer-impacting incidents fell year over year. The platform now backs AI across six Adobe products.
Tradeoff
Standardization cost teams some bespoke optimizations; the pluggable design let them keep the few that actually mattered.

AutoPrompt

Problem
Every team hand-tuned prompts from a blank page, which was slow and inconsistent across the platform.
Decision
I built AutoPrompt, which generates a working prompt baseline directly from a team's own evaluation data, so an engineer starts from something that already scores rather than from scratch.
Result
Baselines land at ~70% F1 on a team's own evaluation set, and prompt-engineering setup dropped from days to hours for any engineer on the platform.
Tradeoff
A generated baseline still needs task-specific tuning; the win is the starting point, not the finish line.

Question answering & search across products

Problem
The platform had to serve question answering and search for products beyond Lightroom, each with its own query shapes and iteration cadence.
Decision
I integrated the HelpX RAG engine into production for help-content question answering. For Document Cloud, I built underspecified-query resolution (dates, people) and a Jenkins-automated evaluation harness so the team could iterate weekly. I also shipped an entity-aware query-suggestions capability used across search surfaces.
Result
Document Cloud moved to weekly model-iteration cycles, and RAG-backed help answering and query suggestions shipped to production on the shared platform, the same infrastructure that later carried Lightroom.
Tradeoff
Serving several products from one platform meant resisting product-specific forks; I kept it pluggable so a team could specialize only where it genuinely needed to.

Lightroom semantic search

Problem
Lightroom's search under-served natural, intent-based queries ("photos of my dog at the beach") at production scale.
Decision
I owned the named-entity recognition and entity-resolution layer. Rather than attempt a heavier retrieval rewrite, I trained NER on production signals and used a configuration-driven resolver, so new entity types could be added without code changes. Choosing this lighter approach kept us inside the latency budget.
Result
At general availability, click-through on intent-based queries rose from ~1.8% to ~12.5% (roughly 7×), and the null-result rate fell from ~25.9% to ~12.2%.
Tradeoff
The approach covered fewer query shapes at launch; I expanded coverage through configuration over time rather than by retraining.

Leadership. I mentor engineers across regions to independently own platform components, grew the platform's contributor base from a handful of experts to several independent engineers, and have covered a teammate's extended absence without dropping delivery.

Earlier