Ad Auctions for LLMs via Retrieval Augmented Generation
Hajiaghayi, Lahaie, Lubin, Shin · 2024 · arXiv:2406.09459
RAG-based ad insertion into LLM responses. Advertisers bid on relevance to user queries. The auction selects which ad context to inject into the retrieval step. The LLM generates a response that naturally incorporates the winning ad. This is the direct competition to the embedding-space auction idea.
The RAG ad pipeline
User sends a query. The system retrieves relevant documents AND ad candidates from an embedding index. An auction selects the winning ad. The ad context is injected into the LLM prompt alongside organic results. The LLM generates a unified response. The alternative architecture is a
dedicated vector-space ad server that handles the auction before the retrieval step, keeping the ad selection independent of the LLM.
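The flow above can be sketched roughly as follows. The function names, the unit-norm embedding assumption, the pluggable `run_auction` hook, and the `[sponsored]` tag are all illustrative, not from the paper:

```python
import numpy as np

def cosine_sim(q, M):
    # Similarity of query vector q to each row of M (rows assumed unit-norm).
    return (M @ q) / np.linalg.norm(q)

def build_context(query_vec, doc_vecs, docs, ad_vecs, ads, run_auction, k=3):
    """Retrieve top-k organic docs, pick an ad via the supplied auction,
    and assemble the prompt context the LLM sees."""
    organic_idx = np.argsort(-cosine_sim(query_vec, doc_vecs))[:k]
    ad_scores = cosine_sim(query_vec, ad_vecs)
    winner, _price = run_auction(ad_scores)       # auction is a pluggable step
    context = [docs[i] for i in organic_idx]
    context.append(f"[sponsored] {ads[winner]}")  # ad injected alongside organic results
    return "\n".join(context)
```

Passing the auction in as a callable keeps ad selection separable from retrieval, which is exactly the seam the dedicated-ad-server alternative exploits.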
The design space
Key questions: How to price (VCG on the embedding scores? GSP?). How to ensure ad quality (a reserve on relevance?). How to prevent the LLM from distorting the ad? The paper proposes using the retrieval relevance score as the quality signal, combined with a second-price payment rule. This is the RAG approach to what
the last ad layer calls the serving problem: where in the stack does the ad get inserted, and who controls the context window?
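A minimal sketch of that pricing design, assuming a single ad slot, relevance as the quality weight, and a relevance reserve as the quality floor. The exact payment formula here (runner-up score divided by winner relevance) is an assumption in the GSP style, not quoted from the paper:

```python
def second_price_with_relevance(bids, rel, reserve=0.0):
    """Single-slot auction: rank ads by bid * relevance; the winner pays the
    smallest bid that would still have won, i.e. runner-up score / winner
    relevance. A reserve on relevance filters low-quality ads up front."""
    eligible = [i for i in range(len(bids)) if rel[i] >= reserve]
    if not eligible:
        return None, 0.0                      # no ad clears the quality floor
    scores = {i: bids[i] * rel[i] for i in eligible}
    winner = max(eligible, key=lambda i: scores[i])
    others = [scores[i] for i in eligible if i != winner]
    price = (max(others) / rel[winner]) if others else 0.0
    return winner, price
```

Note the incentive shape: a highly relevant ad can win with a lower bid, and raising your own bid never changes your payment, only whether you win.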
Neighbors
- june.kim/vector-space — the blog series this paper competes with
- Cryptography — crypto-12: TEE enclaves as an alternative architecture for trusted ad serving
- Aurenhammer 1987 — geometric allocation via power diagrams
- GSP 2007 — the pricing rule this paper adapts