6 products on the platform · 7 teams building on it · 8+ engineers mentored to independent ownership
Experience
What I have built in production, and the decisions behind it. Figures are relative to the systems I worked on; product scale belongs to the product, not to me.
Adobe — Staff Machine Learning Engineer
LLM-serving & retrieval platform (ILUP)
- Problem
- Several product teams were each rebuilding the same LLM-serving and retrieval stack, and experimental GenAI traffic was destabilizing the critical search path.
- Decision
- I designed a shared, pluggable platform, led its adoption, and made prompt onboarding self-service, so a team can go from nothing to serving in minutes instead of weeks. I isolated critical search traffic from experimental workloads behind per-experience latency budgets (~700 ms). For serving throughput, the main lever was continuous batching in the vLLM layer, with quantization and cascading between small and large models to hold down cost and latency. I also consolidated five per-team LLM integrations into a single multi-model abstraction (supporting models such as Mistral and Gemma), so onboarding a new model became a configuration change rather than a rebuild, cutting integration from weeks to days.
- Result
- Serving throughput rose ~5× per pod (~20 to ~100 requests/sec). Docker build time fell from ~40 to ~10 minutes, and cluster bootstrap time from ~30 to ~10–15 minutes. Client teams self-onboard in minutes, and customer-impacting incidents fell year over year. The platform now backs AI across six Adobe products.
- Tradeoff
- Standardizing meant some teams lost hand-tuned optimizations. I kept the platform pluggable exactly at the layers where a team's specialization was load-bearing and standardized everywhere else, so consolidation never regressed the paths that were already fast.
AutoPrompt
- Problem
- Every team hand-tuned prompts from a blank page, which was slow and inconsistent across the platform.
- Decision
- I built AutoPrompt, which generates a working prompt baseline directly from a team's own evaluation data, so an engineer starts from something that already scores rather than from scratch.
- Result
- Baselines land at ~70% F1 on a team's own evaluation set, and prompt-engineering setup dropped from days to hours for any engineer on the platform.
- Tradeoff
- A generated baseline still needs task-specific tuning, but because it is seeded from the team's own evaluation set, its failure modes are theirs to fix rather than a generic template's.
Question answering & search across products
- Problem
- The platform had to serve question answering and search for products beyond Lightroom — each with its own query shapes and iteration cadence.
- Decision
- I integrated the HelpX RAG engine into production for help-content question answering. For Document Cloud, I built underspecified-query resolution (dates, people) and a Jenkins-automated evaluation harness so the team could iterate weekly. I also shipped an entity-aware query-suggestions capability used across search surfaces.
- Result
- Document Cloud moved to weekly model-iteration cycles, and RAG-backed help answering and query suggestions shipped to production on the shared platform — the same infrastructure that later carried Lightroom.
- Tradeoff
- The risk was per-product query logic leaking into shared infrastructure. I pushed product-specific resolution to the edges and kept the evaluation harness shared, so Document Cloud could iterate weekly without imposing that cadence on every other product.
Lightroom semantic search
- Problem
- Lightroom's search under-served natural, intent-based queries ("photos of my dog at the beach") at production scale.
- Decision
- I owned the named-entity recognition and entity-resolution layer. Rather than a heavier retrieval rewrite, I trained NER on production signals and used a configuration-driven resolver so new entity types could be added without code changes, which kept us inside the latency budget.
- Result
- At general availability, the feature's click-through on intent-based queries rose from ~1.8% to ~12.5% (roughly 7×), and the null-result rate fell from ~25.9% to ~12.2%.
- Tradeoff
- I shipped narrower query coverage rather than delay for a full retrieval rewrite. With the config-driven resolver, adding an entity type was a configuration change rather than a model retrain, so coverage grew without reopening the latency budget I had already secured.
Earlier
- Cisco — engineering on WebEx, its video-collaboration platform.
- T-Mobile — platform engineering on a nationwide consumer app.