Ad Auctions for LLMs via Retrieval Augmented Generation
Hajiaghayi, Lahaie, Lubin, Shin · 2024 · arXiv:2406.09459
RAG-based ad insertion into LLM responses. Advertisers bid on relevance to user queries. The auction selects which ad context to inject into the retrieval step. The LLM generates a response that naturally incorporates the winning ad. This is the direct competition to the embedding-space auction idea.
The RAG ad pipeline
User sends a query. The system retrieves relevant documents AND ad candidates from an embedding index. An auction selects the winning ad. The ad context is injected into the LLM prompt alongside organic results. The LLM generates a unified response. The alternative architecture is a
dedicated vector-space ad server that handles the auction before the retrieval step, keeping the ad selection independent of the LLM.
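The flow above can be sketched roughly as follows. The function names, the unit-norm embedding assumption, the pluggable `run_auction` hook, and the `[sponsored]` tag are all illustrative, not from the paper:

```python
import numpy as np

def cosine_sim(q, M):
    # Similarity of query vector q to each row of M (rows assumed unit-norm).
    return (M @ q) / np.linalg.norm(q)

def build_context(query_vec, doc_vecs, docs, ad_vecs, ads, run_auction, k=3):
    """Retrieve top-k organic docs, pick an ad via the supplied auction,
    and assemble the prompt context the LLM sees."""
    organic_idx = np.argsort(-cosine_sim(query_vec, doc_vecs))[:k]
    ad_scores = cosine_sim(query_vec, ad_vecs)
    winner, _price = run_auction(ad_scores)       # auction is a pluggable step
    context = [docs[i] for i in organic_idx]
    context.append(f"[sponsored] {ads[winner]}")  # ad injected alongside organic results
    return "\n".join(context)
```

Passing the auction in as a callable keeps ad selection separable from retrieval, which is exactly the seam the dedicated-ad-server alternative exploits.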
The design space
Key questions: How to price (VCG on the embedding scores? GSP?). How to ensure ad quality (a reserve on relevance?). How to prevent the LLM from distorting the ad? The paper proposes using the retrieval relevance score as the quality signal, combined with a second-price payment rule. This is the RAG approach to what
the last ad layer calls the serving problem: where in the stack does the ad get inserted, and who controls the context window?
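A minimal sketch of that pricing design, assuming a single ad slot, relevance as the quality weight, and a relevance reserve as the quality floor. The exact payment formula here (runner-up score divided by winner relevance) is an assumption in the GSP style, not quoted from the paper:

```python
def second_price_with_relevance(bids, rel, reserve=0.0):
    """Single-slot auction: rank ads by bid * relevance; the winner pays the
    smallest bid that would still have won, i.e. runner-up score / winner
    relevance. A reserve on relevance filters low-quality ads up front."""
    eligible = [i for i in range(len(bids)) if rel[i] >= reserve]
    if not eligible:
        return None, 0.0                      # no ad clears the quality floor
    scores = {i: bids[i] * rel[i] for i in eligible}
    winner = max(eligible, key=lambda i: scores[i])
    others = [scores[i] for i in eligible if i != winner]
    price = (max(others) / rel[winner]) if others else 0.0
    return winner, price
```

Note the incentive shape: a highly relevant ad can win with a lower bid, and raising your own bid never changes your payment, only whether you win.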
Neighbors
- june.kim/vector-space — the blog series this paper competes with
- Cryptography — crypto-12: TEE enclaves as an alternative architecture for trusted ad serving
- Aurenhammer 1987 — geometric allocation via power diagrams
- GSP 2007 — the pricing rule this paper adapts