
Ad Auctions for LLMs via Retrieval Augmented Generation

Hajiaghayi, Lahaie, Lubin, Shin · 2024 · arXiv:2406.09459

RAG-based ad insertion into LLM responses. Advertisers bid on relevance to user queries. The auction selects which ad context to inject into the retrieval step, and the LLM generates a response that naturally incorporates the winning ad. This turns ad selection into a competition in embedding space.

The RAG ad pipeline

User sends a query. The system retrieves relevant documents AND ad candidates from an embedding index. An auction selects the winning ad. The ad context is injected into the LLM prompt alongside organic results. The LLM generates a unified response. The alternative architecture is a dedicated vector-space ad server that handles the auction before the retrieval step, keeping ad selection independent of the LLM.
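The pipeline above can be sketched as a single orchestration function. This is a minimal sketch, not the paper's implementation: the interfaces (`embed`, the `search` methods, `auction`, `llm`) are hypothetical placeholders for whatever embedding model, vector index, auction rule, and LLM the system actually uses.

```python
def answer_with_ad(query, embed, organic_index, ad_index, auction, llm):
    """Run the RAG ad pipeline for one query.

    Assumed (hypothetical) interfaces:
      embed(text) -> query vector
      index.search(vec, k) -> list of (doc, relevance_score)
      auction(candidates) -> winning ad text, or None if no ad clears
      llm(prompt) -> generated response string
    """
    q = embed(query)
    organic = organic_index.search(q, k=5)      # organic documents
    ad_candidates = ad_index.search(q, k=10)    # ad candidates by relevance
    winner = auction(ad_candidates)             # auction picks one ad

    context = [doc for doc, _score in organic]
    if winner is not None:
        context.append(winner)                  # inject winning ad context

    prompt = "Context:\n" + "\n".join(context) + f"\n\nQuery: {query}"
    return llm(prompt)
```

The key design choice this makes concrete: the auction runs between retrieval and generation, so the LLM sees the winning ad as just another piece of context.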

Diagram: RAG pipeline with auction-selected ad context (query → retrieve from embedding index → auction selects ad → ad injected into context → LLM generates response).

The design space

Key questions: How to price (VCG on the embedding scores? GSP?). How to ensure ad quality (a reserve on relevance?). How to prevent the LLM from distorting the ad? The paper proposes using the retrieval relevance score as the quality signal, combined with a second-price payment rule. This is the RAG approach to what the last ad layer calls the serving problem: where in the stack does the ad get inserted, and who controls the context window?
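A quality-weighted second-price rule of the kind described can be sketched as follows. This is an illustrative sketch, not the paper's exact mechanism: ads are ranked by bid × relevance (the retrieval score serving as the quality signal), and the winner pays the smallest bid that would still have won, i.e. the runner-up's score divided by the winner's relevance.

```python
def run_ad_auction(candidates, reserve=0.0):
    """Quality-weighted second-price auction over ad candidates.

    candidates: list of (ad_id, bid, relevance) tuples, where relevance
    is the retrieval score from the embedding index (assumed positive).
    reserve: minimum bid * relevance score required to serve any ad.
    Returns (winning ad_id, price) or None if no candidate clears reserve.
    """
    scored = [(bid * rel, ad_id, bid, rel)
              for ad_id, bid, rel in candidates
              if bid * rel >= reserve]
    if not scored:
        return None
    scored.sort(reverse=True)
    _win_score, win_id, _win_bid, win_rel = scored[0]
    # Runner-up's score (or the reserve) sets the price: the winner pays
    # just enough that bid * relevance still beats the next-best ad.
    runner_up = scored[1][0] if len(scored) > 1 else reserve
    price = runner_up / win_rel
    return win_id, price
```

Note the incentive logic carried over from sponsored search: because payment depends on the runner-up rather than the winner's own bid, a higher-relevance ad can win with a lower bid and pays less for the same position.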

Ready for the real thing? Read the paper.