Varun Kotte

Experience

What I have built in production, and the decisions behind it. Figures are relative to the systems I worked on; product scale belongs to the product, not to me.

Adobe — Staff Machine Learning Engineer

I own the serving and reliability layers of the internal LLM-serving and retrieval platform (ILUP) that backs AI and search across six Adobe products (Acrobat, Express, Lightroom, Photoshop, Stock, and Creative Cloud Home). Seven product teams build on it. Over the last few years I've shipped search and question answering into several of them — RAG-backed help answering in Document Cloud, entity-aware query suggestions across search surfaces, and the named-entity work behind Lightroom semantic search — and I set the platform's reliability model.
6 products on the platform · 7 teams building on it · 8+ engineers mentored to independent ownership

LLM-serving & retrieval platform (ILUP)

Problem
Several product teams were each rebuilding the same LLM-serving and retrieval stack, and experimental GenAI traffic was destabilizing the critical search path.
Decision
I designed a shared, pluggable platform and led its adoption, and turned prompt onboarding into a self-service path so a team can go from nothing to serving in minutes instead of weeks. I isolated critical search traffic from experimental workloads behind per-experience latency budgets (~700 ms). For serving throughput the main lever was continuous batching in the vLLM layer, with quantization and small-and-large model cascading to keep cost and latency in check. I also consolidated five per-team LLM integrations into a single multi-model abstraction (supporting models such as Mistral and Gemma), so onboarding a new model became a configuration change rather than a rebuild, cutting integration from weeks to days.
Result
Serving throughput rose ~5× per pod (~20 to ~100 requests/sec); Docker build time fell from ~40 to ~10 minutes and cluster bootstrap time from ~30 to ~10–15 minutes; client teams self-onboard in minutes; and customer-impacting incidents fell year over year. The platform now backs AI across six Adobe products.
Tradeoff
Standardizing meant some teams lost hand-tuned optimizations. I kept the platform pluggable exactly at the layers where a team's specialization was load-bearing and standardized everywhere else, so consolidation never regressed the paths that were already fast.
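For readers unfamiliar with small-and-large model cascading, the idea can be sketched as follows. This is an illustrative sketch only, not ILUP code: the `Cascade` class, the stub models, and the 0.8 confidence threshold are all hypothetical, and a real deployment would call served models rather than local functions.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Assumption: a "model" is any callable returning (answer, confidence in [0, 1]).
Model = Callable[[str], Tuple[str, float]]


@dataclass
class Cascade:
    """Try the cheap model first; escalate only when it is not confident."""
    small: Model
    large: Model
    min_confidence: float = 0.8  # hypothetical threshold, tuned per experience

    def answer(self, prompt: str) -> Tuple[str, str]:
        """Return (answer, tier), where tier is 'small' or 'large'."""
        text, confidence = self.small(prompt)
        if confidence >= self.min_confidence:
            return text, "small"
        # The small model was unsure, so pay for the large model.
        text, _ = self.large(prompt)
        return text, "large"


def small_stub(prompt: str) -> Tuple[str, float]:
    # Stand-in for a small model: confident only on short prompts.
    confidence = 0.9 if len(prompt.split()) <= 5 else 0.4
    return "short answer", confidence


def large_stub(prompt: str) -> Tuple[str, float]:
    # Stand-in for a large model.
    return "detailed answer", 0.95


cascade = Cascade(small=small_stub, large=large_stub)
easy = cascade.answer("reset my password")
hard = cascade.answer("compare export options for layered files across both apps")
print(easy, hard)
```

The design point is that most requests stop at the small tier, so the large model's cost and latency are paid only on the requests that need it.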

AutoPrompt

Problem
Every team hand-tuned prompts from a blank page, which was slow and inconsistent across the platform.
Decision
I built AutoPrompt, which generates a working prompt baseline directly from a team's own evaluation data, so an engineer starts from something that already scores rather than from scratch.
Result
Baselines land at ~70% F1 on a team's own evaluation set, and prompt-engineering setup dropped from days to hours for any engineer on the platform.
Tradeoff
A generated baseline still needs task-specific tuning, but because it is seeded from the team's own evaluation set its failure modes are theirs to fix rather than a generic template's.

Question answering & search across products

Problem
The platform had to serve question answering and search for products beyond Lightroom — each with its own query shapes and iteration cadence.
Decision
I integrated the HelpX RAG engine into production for help-content question answering, built underspecified-query resolution (dates, people) and a Jenkins-automated evaluation harness for Document Cloud so the team could iterate weekly, and shipped an entity-aware query-suggestions capability used across search surfaces.
Result
Document Cloud moved to weekly model-iteration cycles, and RAG-backed help answering and query suggestions shipped to production on the shared platform — the same infrastructure that later carried Lightroom.
Tradeoff
The risk was per-product query logic leaking into shared infrastructure. I pushed product-specific resolution to the edges and kept the evaluation harness shared, so Document Cloud could iterate weekly without imposing that cadence on every other product.

Lightroom semantic search

Problem
Lightroom's search under-served natural, intent-based queries ("photos of my dog at the beach") at production scale.
Decision
I owned the named-entity recognition and entity-resolution layer. Rather than a heavier retrieval rewrite, I trained NER on production signals and used a configuration-driven resolver so new entity types could be added without code changes, which kept us inside the latency budget.
Result
At general availability, click-through on intent-based queries rose from ~1.8% to ~12.5% (roughly 7×), and the null-result rate fell from ~25.9% to ~12.2%.
Tradeoff
I shipped narrower query coverage rather than delay on a full retrieval rewrite. The config-driven resolver turned each new entity type into a configuration addition rather than a model retrain, so coverage grew without re-opening the latency budget I had already won.
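To illustrate what "configuration-driven" means here, the sketch below resolves entities from a config table, so adding an entity type is a data change rather than a code change. It is an assumption-laden stand-in, not the production resolver: the real layer used trained NER, whereas this example substitutes simple regular expressions, and `ENTITY_CONFIG`, `resolve`, and the entity types are hypothetical names.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical config: entity type -> surface patterns that signal it.
ENTITY_CONFIG: Dict[str, List[str]] = {
    "subject": [r"\bdog\b", r"\bcat\b"],
    "location": [r"\bbeach\b", r"\bmountains?\b"],
}


def resolve(query: str, config: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Return (entity_type, matched_text) pairs found in the query."""
    found = []
    for entity_type, patterns in config.items():
        for pattern in patterns:
            for match in re.finditer(pattern, query, flags=re.IGNORECASE):
                found.append((entity_type, match.group(0).lower()))
    return found


base = resolve("photos of my dog at the beach", ENTITY_CONFIG)

# Adding an entity type touches only the config, not resolve().
extended_config = {**ENTITY_CONFIG, "season": [r"\bwinter\b", r"\bsummer\b"]}
extended = resolve("winter photos of my dog", extended_config)
print(base, extended)
```

Because `resolve` never names an entity type, coverage can grow by editing the table alone.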

Leadership. I mentor engineers across regions to own platform components end-to-end, and grew the contributor base from a handful of experts to a team that ships without me in the loop, turning a specialist-gated process into one that partner teams run themselves.

Earlier