The Price of Relevance

Every ad auction has to decide how much to weight relevance versus price. Weight relevance too little and users see irrelevant ads. Weight it too much and the platform loses revenue. In keyword auctions, this tradeoff is controlled by a single number — the squashing parameter s — and twenty years of research has gone into understanding it.

Embedding auctions have the same tradeoff, but it’s been hiding in the formula. The scoring rule score = log_b(price) - dist²/σ² ranks advertisers by a combination of price and proximity, where dist is the distance to the query, σ is the advertiser’s targeting radius, and b is the log base. That log base is the squashing parameter: s = ln(b). Change b from e to 10 and an advertiser needs 10× the price to overcome one unit of distance penalty instead of 2.72×. Change it to 50 and proximity is everything.

Lahaie and Pennock (2007) formalized the keyword version as score = bid × relevance^s and showed that tuning s matters more than setting reserve prices. The mapping s = ln(b) means every finding about s in keyword auctions transfers directly to embedding auctions. We swept it from s = 0 (pure rank-by-bid) to s = 3.91 (near-Voronoi) across 50 trials.

The Derivation

Why does s = ln(b)? Rewrite the score: log_b(price) = ln(price)/ln(b), so score = ln(price)/ln(b) - dist²/σ². Define quality as exp(-dist²/σ²). Then in log space the score is ln(price)/ln(b) + ln(quality), and ln(b) is the weight on quality relative to price — which is exactly what s does in the keyword formula bid × quality^s.
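The mapping can be checked numerically. A minimal sketch with hypothetical prices and distances (not the simulation's code): ranking by log_b(price) - dist²/σ² and ranking by the Lahaie-Pennock form bid × quality^s agree for any b > 1, because the second score is exp(ln(b) × the first).

```python
import math

def embedding_score(price, dist, sigma, b):
    # score = log_b(price) - dist^2 / sigma^2
    return math.log(price, b) - dist**2 / sigma**2

def keyword_score(price, dist, sigma, s):
    # Lahaie-Pennock form: bid * quality^s, with quality = exp(-dist^2/sigma^2)
    quality = math.exp(-dist**2 / sigma**2)
    return price * quality**s

# Hypothetical bidders: (price, dist) pairs. Any values work.
bidders = [(5.0, 0.3), (40.0, 0.9), (12.0, 0.6)]
sigma, b = 0.5, 10.0
s = math.log(b)  # the claimed mapping s = ln(b)

by_embedding = sorted(bidders, key=lambda a: -embedding_score(a[0], a[1], sigma, b))
by_keyword = sorted(bidders, key=lambda a: -keyword_score(a[0], a[1], sigma, s))
assert by_embedding == by_keyword  # keyword score = exp(ln(b) * embedding score)
```

Since exp is monotone, the two rules produce identical rankings, not just correlated ones.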

| Log base b | 1/ln(b) | Equivalent s | Interpretation |
|---|---|---|---|
| 1.0 | ∞ | 0.00 | Pure rank-by-bid (distance ignored) |
| 1.1 | 10.5 | 0.10 | Very weak distance weight |
| 1.5 | 2.47 | 0.41 | Moderate — between rank-by-bid and industry default |
| 2 | 1.44 | 0.69 | Price has edge over distance |
| e ≈ 2.72 | 1.00 | 1.00 | Rank-by-revenue (industry default) |
| 5 | 0.62 | 1.61 | Distance dominates |
| 10 | 0.43 | 2.30 | Strong territorial defense |
| 50 | 0.26 | 3.91 | Near-Voronoi |

At b = 1, distance is completely ignored — the auction ranks purely by price. Higher b means distance matters more. At b = 50, the auction approaches a Voronoi partition where the closest advertiser wins regardless of bid.
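The table's middle columns follow directly from b; a few lines of Python reproduce them:

```python
import math

# Reproduce the 1/ln(b) and s = ln(b) columns of the table above.
for b in (1.1, 1.5, 2.0, math.e, 5.0, 10.0, 50.0):
    print(f"b = {b:5.2f}  1/ln(b) = {1 / math.log(b):5.2f}  s = {math.log(b):4.2f}")
# b = 50 gives s = ln(50), roughly 3.91: the near-Voronoi end of the sweep
```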

What the Log Base Controls

Che (1993) showed that in multidimensional auctions, under-rewarding quality maximizes short-term revenue while over-rewarding it maximizes buyer surplus. For chatbot platforms, higher b is the natural direction — retention depends on answer quality, not ad revenue per query. The question is how much higher.

To overcome one unit of dist²/σ², an advertiser must multiply its price by a factor of b.

A climbing physical therapist at cosine 0.95 to a climbing query is hard to outbid at b = 10 — a generalist sports physical therapist at cosine 0.80 would need to pay 10× more per unit of distance disadvantage. At b = e, the generalist needs only 2.72×.
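The multiplier comes straight from the scoring rule: closing a gap of Δ units of dist²/σ² requires log_b(p₂) - log_b(p₁) = Δ, i.e. a price ratio of b^Δ. A short sketch:

```python
import math

def multiplier_to_overcome(delta, b):
    # log_b(p2) - log_b(p1) = delta  =>  p2 / p1 = b ** delta
    return b ** delta

# Price ratio needed per full unit of distance-penalty disadvantage:
for b in (math.e, 10, 50):
    print(f"b = {b:5.2f}: {multiplier_to_overcome(1.0, b):6.2f}x the price")
```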

The platform has three levers: b (continuous weight on relevance), τ (hard relevance gate — minimum cosine to enter the auction), and λ (drift penalty preventing position gaming). This experiment tests the first two.

Experiment Design

The simulation models a local services market — physical therapists, fitness coaches, wellness professionals. 25 advertisers across 5 clusters in 384-dimensional BGE-small-en-v1.5 embedding space. Each cluster has one generalist and four specialists.

Key assumptions:

- Scoring: log_b(price) - dist²/σ², implemented via sigma scaling (σ/√ln(b)) so the core auction library stays untouched.
- At b = 1, embeddings are removed entirely; the auction ranks by log(price) only.
- 300 rounds per trial, 50 trials per condition.
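The sigma-scaling trick can be sketched in a few lines (the function names here are illustrative, not the actual library interface). Multiplying log_b(price) - dist²/σ² by ln(b) > 0 preserves the ranking and turns the log base into a sigma rescaling:

```python
import math

def effective_sigma(sigma, b):
    # ln(price) - dist^2/sigma_eff^2 with sigma_eff = sigma/sqrt(ln(b))
    # ranks identically to log_b(price) - dist^2/sigma^2.
    # Requires b > 1; at b = 1 the simulation drops embeddings entirely.
    return sigma / math.sqrt(math.log(b))

def core_score(price, dist, sigma_eff):
    # Assumed core-library rule: natural log of price minus distance penalty.
    return math.log(price) - dist**2 / sigma_eff**2

price, dist, sigma, b = 12.0, 0.4, 0.5, 10.0
scaled = math.log(b) * (math.log(price, b) - dist**2 / sigma**2)
assert math.isclose(core_score(price, dist, effective_sigma(sigma, b)), scaled)
```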

Part A sweeps the log base: b ∈ {1.0, 1.1, 1.5, 2, e, 5, 10, 50} with τ=0.3, covering s ∈ [0, 3.91].

Part B sweeps the discovery threshold: τ ∈ {0.0, 0.2, 0.3, 0.5, 0.7} at b=50.

The simulation is open source.

Results

Part A: Log Base Sweep

Price-dominated regime (s < 1):

| Metric | b=1.0 | b=1.1 | b=1.5 | b=2 | b=e |
|---|---|---|---|---|---|
| Value efficiency | 0.657*** | 0.664** | 0.670 | 0.673 | 0.683 |
| Avg surplus/rnd/adv | 1.90*** | 1.96*** | 2.13*** | 2.27** | 2.42 |
| Specialist surplus | 2.38*** | 2.45*** | 2.66*** | 2.83** | 3.01 |
| Generalist surplus | -0.014*** | -0.012*** | 0.010 | 0.031 | 0.040 |
| Revenue/round | 90.0 | 90.8 | 90.6 | 89.9 | 89.6 |
| Winner cos_sim | — | 0.688*** | 0.691* | 0.693 | 0.696 |

Quality-dominated regime (s > 1):

| Metric | b=e | b=5 | b=10 | b=50 |
|---|---|---|---|---|
| Value efficiency | 0.683 | 0.692 | 0.700 | 0.737*** |
| Avg surplus/rnd/adv | 2.42 | 2.66** | 2.84*** | 3.36*** |
| Specialist surplus | 3.01 | 3.31** | 3.52*** | 4.15*** |
| Generalist surplus | 0.040 | 0.086 | 0.120** | 0.202*** |
| Revenue/round | 89.6 | 87.8* | 86.1*** | 82.9*** |
| Winner cos_sim | 0.696 | 0.700 | 0.703 | 0.712*** |

Significance vs b=e baseline (Welch’s t-test): * p<0.05, ** p<0.01, *** p<0.001. No-show rate is 0% and win diversity is flat across all conditions. Winner cos_sim for b=1.0 is undefined (embeddings removed for pure price ranking).

Higher b monotonically improves value efficiency, surplus, and winner relevance at the cost of publisher revenue.

Revenue is flat up to b = e (all within 1.3%, none significant), then drops: 2% at b = 5, 4% at b = 10, 7.5% at b = 50. Specialist surplus grows 75% from b = 1.0 to b = 50 (2.38 → 4.15). Generalists go negative at b ≤ 1.1 — they systematically overpay when distance doesn’t protect specialists’ territories. Win diversity is flat across the range.

The stakeholders want opposite things. Advertisers want lower b — it lets them win impressions outside their niche by outbidding closer competitors. Users want higher b — it means the ad they see is actually relevant to what they asked. The platform sits in the middle. At b = 5, it buys 10% more surplus and 1.3% better value efficiency for a 2% revenue cost — a good trade. At b = 10, double the quality gain for double the revenue cost. The curve is smooth enough that the platform can pick its point.

Part B: Discovery Threshold Sweep

| Metric | τ=0.0 | τ=0.2 | τ=0.3 | τ=0.5 | τ=0.7 |
|---|---|---|---|---|---|
| Value efficiency | 0.737 | 0.737 | 0.737 | 0.737 | 0.930 |
| Surplus | 3.36 | 3.36 | 3.36 | 3.36 | 1.87 |
| Revenue/round | 82.9 | 82.9 | 82.9 | 82.9 | 154.3 |
| Winner cos_sim | 0.712 | 0.712 | 0.712 | 0.712 | 0.797 |
| No-show rate | 0% | 0% | 0% | 0% | 22.1% |
| Win diversity | 0.289 | 0.289 | 0.289 | 0.289 | 0.492 |

τ=0.0 through τ=0.5 produce identical results (p=1.000). The value decay function max(0, MaxVal × (cos - threshold)) already acts as a soft relevance gate. Advertisers don’t bid on queries where their expected value falls below 5% of their maximum. The hard threshold has nothing to filter because the soft gate already did the work.
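A sketch of why the soft gate preempts the hard one. The 5% opt-out rule is stated above; the MaxVal of 10 and decay threshold of 0.3 are illustrative assumptions, not the simulation's exact parameters:

```python
def expected_value(cos, decay_threshold=0.3, max_val=10.0):
    # Value decay from the simulation: max(0, MaxVal * (cos - threshold))
    return max(0.0, max_val * (cos - decay_threshold))

def enters_auction(cos, min_fraction=0.05, max_val=10.0):
    # Advertisers skip queries worth less than 5% of their maximum value.
    return expected_value(cos, max_val=max_val) >= min_fraction * max_val

# Self-selection already excludes everything below cos = 0.35 here,
# so any hard gate tau <= 0.35 has nothing left to filter.
assert not enters_auction(0.30)
assert not enters_auction(0.34)
assert enters_auction(0.40)
```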

Only τ=0.7 has any effect, and it’s dramatic. Value efficiency jumps to 0.930. Winner cosine similarity climbs from 0.712 to 0.797. But 22.1% of queries get no ad at all — no advertiser clears the bar.

Revenue doubles at τ=0.7 despite fewer competitors. Fewer bidders should mean lower VCG payments, but the surviving bidders have much higher values for their winning queries (cosine 0.797 vs 0.712), so the individual rationality cap — payment ≤ value — binds at a higher level. The auction collects more per impression because each impression is worth more to whoever wins it.

The Tradeoff

b (continuous weight) is the primary lever; τ (hard gate) is redundant until extreme values. This matches Asker and Cantillon’s (2008) theoretical result that scoring auctions dominate mechanisms that combine price-only ranking with minimum quality thresholds. The continuous score subsumes the hard cutoff.

The discovery threshold matters only if the platform is willing to accept 22% no-shows. That’s a product decision, not a scoring decision. For most chatbot deployments, showing no ad is acceptable — it’s a conversation, not a search results page. If the best match is mediocre, say nothing. That argues for high τ as a quality floor. But the scoring formula with high b already achieves most of this effect without the binary cutoff.

In isolation, a revenue-maximizing platform would set b as low as possible. But platforms don’t operate in isolation. A chatbot that shows irrelevant ads loses users to one that shows relevant ads. Gomes (2014) formalized this: in two-sided markets, user participation elasticity pushes the platform’s optimal quality weight above the pure revenue-maximizer’s choice. Competition between platforms pushes b up — the same way competition between advertisers pushes bids up. The floor on b isn’t set by the platform’s own revenue curve, it’s set by the b of the next-best alternative.

Recommended ranges:

What’s Still Open

Should noisy relevance estimates push b lower? Lahaie and McAfee (2011) showed that when quality estimates are noisy, s < 1 can improve welfare by reducing weight on unreliable signals. Cosine similarity in embedding space is a noisy proxy for true relevance — an advertiser at cosine 0.85 isn’t reliably better than one at 0.82. Our simulation assumes perfect cosine-to-value mapping. With noisy estimates, the optimal b might be lower than what the clean tradeoff curve suggests.

Does the optimal b depend on market density? With 25 advertisers across 5 clusters, the tradeoff is clean. With 250 advertisers — or 5 — the shape of the curve might change. More competitors could shift the efficient frontier, making higher b less costly in revenue terms.

How does b interact with σ adaptation? In our simulation, σ values are fixed. In a live market, advertisers learn their optimal σ — and the optimal σ depends on b. At high b, a narrower σ is more valuable because proximity matters more. This creates a feedback loop: b shapes σ, which shapes clearing prices, which shifts the revenue-quality curve.

Is there a dynamic b that adjusts per query? A query with 12 qualified advertisers could use lower b (more price competition). A query with 2 could use higher b (protect the closer match). Variable b would complicate the mechanism’s transparency — the attested auction would need to commit to the adjustment rule, not just the scoring formula.

The log form gives diminishing returns on price. Going from $1 to $10 has the same effect as $10 to $100. No major platform uses explicit log(price) — they use multiplicative quality scores that achieve a similar compression. Whether the log form is an advantage (prevents runaway bidding) or a limitation (suppresses price signal) depends on market structure.
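The compression is easy to see: under any log base, equal price ratios produce equal score gains regardless of absolute spend.

```python
import math

b = 10.0
step_low = math.log(10, b) - math.log(1, b)     # $1 -> $10
step_high = math.log(100, b) - math.log(10, b)  # $10 -> $100
assert math.isclose(step_low, step_high)  # identical score gain for both jumps
```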

References

Asker, J. and Cantillon, E. (2008). Properties of scoring auctions. RAND Journal of Economics.
Che, Y.-K. (1993). Design competition through multidimensional auctions. RAND Journal of Economics.
Gomes, R. (2014). Optimal auction design in two-sided markets. RAND Journal of Economics.
Lahaie, S. and McAfee, R.P. (2011). Efficient ranking in sponsored search. WINE 2011.
Lahaie, S. and Pennock, D.M. (2007). Revenue analysis of a family of ranking rules for keyword auctions. EC 2007.

Written with Claude Opus 4.6 via Claude Code. I directed the argument and designed the experiments; Claude built the simulation, researched prior art, and drafted prose.

Part of the Vector Space series. june@june.kim