
Scoring Formula — Beta-Binomial Posterior

The substantive score is a Beta-Binomial posterior over all evidence (VOIs) for an entity-capability pair. This page documents the math conceptually, from prior specification through credible-interval extraction.


Why Beta-Binomial

The substantive question is binary: given a randomly drawn task in capability class C, would this entity perform it competently? The natural model for an unknown success probability is a Beta distribution on [0, 1]. Each piece of evidence contributes pseudo-observations (successes and failures) that shift this distribution.

Beta-Binomial conjugacy means the posterior after updating on evidence is also a Beta distribution, so no approximation is required. Three properties make this model a better fit than a noisy-OR aggregation for Bukti's use case:

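  1. Negative evidence is natively expressible. A contradicting VOI adds to the failure count, and a revoked credential adds a larger penalty. A noisy-OR aggregation is monotonic in evidence (adding a VOI can never lower the score), so it cannot express this.
  2. Credible intervals follow directly from the posterior Beta's quantile function (inverse CDF), which standard numerical libraries provide; no sampling is needed.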
  3. The posterior (α, β) parameters can be stored and updated incrementally as new VOIs arrive.
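The conjugate update behind properties 2 and 3 can be sketched in a few lines. This is a minimal illustration, assuming pseudo-counts arrive as (successes, failures) pairs that may be fractional; the class name `BetaPosterior` and the prior values are hypothetical, not part of Bukti. Because the update only adds to α and β, applying evidence one VOI at a time or in a single batch yields the same parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaPosterior:
    """Hypothetical container for Beta(alpha, beta) parameters."""
    alpha: float
    beta: float

    def update(self, successes: float, failures: float) -> "BetaPosterior":
        # Conjugacy: Beta(alpha, beta) updated with s successes and f failures
        # is Beta(alpha + s, beta + f). Fractional pseudo-counts are allowed.
        if successes < 0 or failures < 0:
            raise ValueError("pseudo-counts must be non-negative")
        return BetaPosterior(self.alpha + successes, self.beta + failures)

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)


prior = BetaPosterior(1.0, 1.0)  # placeholder prior, not Bukti's configured value

# Incremental: apply two pieces of evidence one after another.
step_by_step = prior.update(0.8, 0.2).update(0.5, 0.5)
# Batch: apply their summed pseudo-counts at once.
batch = prior.update(0.8 + 0.5, 0.2 + 0.5)
print(step_by_step, batch)
```

Storing only (α, β) per entity-capability pair is enough to resume updating later, since the posterior carries all the information the additive update needs.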

Step 1 — Prior

The prior is Beta(α₀, β₀), configured per capability cluster. Most clusters use a weakly informative prior encoding "capability is unproven absent evidence." Foundational-quantitative capabilities use a slightly success-skewed prior reflecting higher population base rates (most professionals have functional mathematical skills).

Prior parameters are tunable, with values held in private config until calibration data justifies their public release. They are theory-informed defaults today, not fitted to outcome data.
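To make the two prior shapes concrete, the sketch below uses invented placeholder numbers; the real values are held in private config, and the cluster names are hypothetical. It only shows how α₀ and β₀ translate into a prior mean, α₀ / (α₀ + β₀), and a prior strength, α₀ + β₀, which is the number of pseudo-observations the prior is worth.

```python
# Hypothetical cluster names and prior parameters, for illustration only.
PRIORS = {
    "default": (1.0, 2.0),                    # weakly informative, leans "unproven"
    "foundational_quantitative": (2.0, 1.5),  # slightly success-skewed
}


def prior_summary(alpha0: float, beta0: float) -> tuple[float, float]:
    """Return (prior mean, prior strength) for Beta(alpha0, beta0)."""
    mean = alpha0 / (alpha0 + beta0)
    strength = alpha0 + beta0  # pseudo-observations the prior is "worth"
    return mean, strength


for cluster, (a0, b0) in PRIORS.items():
    mean, strength = prior_summary(a0, b0)
    print(f"{cluster}: mean={mean:.3f}, strength={strength:.1f}")
```

A small strength keeps the prior weak: a few VOIs of ordinary weight are enough to move the posterior away from it.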


Step 2 — Per-VOI pseudo-counts

For each VOI i of evidence type t, in capability cluster c, observed age_i months ago:

$$a_i = w_{t,c} \times \mathrm{decay}_c(\mathrm{age}_i) \times \mathrm{extraction\_confidence}_i$$

$$b_i = w_{t,c} \times \mathrm{decay}_c(\mathrm{age}_i) \times (1 - \mathrm{extraction\_confidence}_i)$$

Where:

  • w_{t,c} — the operational validity weight for evidence type t in cluster c. The categories and the direction of weights are described in evidence-weights.md; specific values are tunable parameters held in private config.
  • decay_c(age) = 0.5 ^ (age_months / half_life_c) — exponential half-life decay. No floor. See decay-and-half-lives.md.
  • extraction_confidence — a [0, 1] valence reflecting how success-like vs. failure-like this VOI is.
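The per-VOI formulas above translate directly into code. This is a minimal sketch: the function names are hypothetical, and the weight, age, half-life, and confidence in the example are placeholder values, since the real weights and half-lives are private config.

```python
def decay(age_months: float, half_life_months: float) -> float:
    """Exponential half-life decay with no floor: 0.5 ** (age / half_life)."""
    return 0.5 ** (age_months / half_life_months)


def voi_pseudo_counts(
    weight: float,
    age_months: float,
    half_life_months: float,
    extraction_confidence: float,
) -> tuple[float, float]:
    """Return (a_i, b_i) for one VOI.

    weight stands in for w_{t,c}; its real values are private config.
    """
    if not 0.0 <= extraction_confidence <= 1.0:
        raise ValueError("extraction_confidence must lie in [0, 1]")
    mass = weight * decay(age_months, half_life_months)
    a = mass * extraction_confidence          # pseudo-successes
    b = mass * (1.0 - extraction_confidence)  # pseudo-failures
    return a, b


# Hypothetical VOI: weight 2.0, observed 12 months ago, 24-month half-life,
# extraction confidence 0.9.
a_i, b_i = voi_pseudo_counts(2.0, 12.0, 24.0, 0.9)
print(a_i, b_i)
```

Note that a_i + b_i always equals the decayed weight w_{t,c} × decay_c(age_i): extraction confidence only splits a VOI's mass between success and failure, while weight and age determine how much mass there is.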

Step 3 — Cohort aggregation

VOIs that share an error mode are grouped into cohorts before their pseudo-counts are summed. Within each cohort, the contribution is a max-plus-bonus aggregation: the strongest single VOI provides the floor, and additional in-cohort VOIs add a sub-linear bonus rather than independent pseudo-observations.

This rewards genuine cross-context evidence (each independent cohort contributes fully) while punishing same-context farming (many similar submissions in one short window collapse to a single cohort contributing one max-value plus a small bonus, not many independent contributions).

See cohort-independence.md for cohort grouping rules.
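The exact bonus function is not specified on this page, so the sketch below substitutes one plausible sub-linear form as an assumption. VOIs in a cohort are ranked by mass (a_i + b_i), the strongest contributes in full, and the k-th runner-up is scaled by a hypothetical `BONUS_RATIO ** k`. The cohort ids and input values are also illustrative.

```python
from collections import defaultdict

BONUS_RATIO = 0.3  # hypothetical; controls how fast in-cohort bonuses shrink


def cohort_contribution(vois: list[tuple[float, float]]) -> tuple[float, float]:
    """Max-plus-bonus aggregation of (a_i, b_i) pairs sharing one error mode."""
    ranked = sorted(vois, key=lambda ab: ab[0] + ab[1], reverse=True)
    a_total = b_total = 0.0
    for rank, (a, b) in enumerate(ranked):
        scale = BONUS_RATIO ** rank  # rank 0 (the strongest VOI) gets 1.0
        a_total += scale * a
        b_total += scale * b
    return a_total, b_total


def aggregate(tagged_vois: list[tuple[str, float, float]]) -> tuple[float, float]:
    """Sum cohort contributions; each tuple is (cohort_id, a_i, b_i)."""
    cohorts: dict[str, list[tuple[float, float]]] = defaultdict(list)
    for cohort_id, a, b in tagged_vois:
        cohorts[cohort_id].append((a, b))
    sums = [cohort_contribution(v) for v in cohorts.values()]
    return sum(a for a, _ in sums), sum(b for _, b in sums)


farmed = [("same-window", 0.9, 0.1)] * 10  # ten similar submissions, one cohort
independent = [(f"ctx-{k}", 0.9, 0.1) for k in range(10)]  # ten separate cohorts
print(aggregate(farmed), aggregate(independent))
```

With a geometric bonus, a cohort's total can never exceed its strongest VOI divided by (1 - BONUS_RATIO), however many submissions it contains, whereas each independent cohort adds its full value.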


Step 4 — Contradiction penalty

For each VOI flagged as contradicted, an additional penalty is added to the β accumulator. The penalty is multiplicatively larger for revocations (the issuer themselves asserting the claim is invalid) than for ordinary factual or temporal contradictions. The exact multipliers are tunable parameters held in private config.

This is intentional: a revoked credential is the strongest negative signal available, so its penalty must outweigh a positive VOI of equal weight.
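A sketch of the penalty step under stated assumptions. The penalty for a contradicted VOI is taken to be its decayed mass, w_{t,c} × decay_c(age), scaled by a multiplier, and both the enum and the multiplier values below are hypothetical stand-ins for the private config. The sketch is only meant to show the ordering this step requires: a revocation penalty exceeds an ordinary contradiction penalty and exceeds the largest a_i a positive VOI of the same mass can contribute.

```python
from enum import Enum


class Contradiction(Enum):
    FACTUAL = "factual"
    TEMPORAL = "temporal"
    REVOCATION = "revocation"


# Hypothetical multipliers; the real values are private config.
PENALTY_MULTIPLIER = {
    Contradiction.FACTUAL: 1.0,
    Contradiction.TEMPORAL: 1.0,
    Contradiction.REVOCATION: 3.0,
}


def contradiction_penalty(voi_mass: float, kind: Contradiction) -> float:
    """Extra pseudo-failures added to beta for one contradicted VOI.

    voi_mass is assumed to be w_{t,c} * decay_c(age) for that VOI.
    """
    return voi_mass * PENALTY_MULTIPLIER[kind]


mass = 1.5
revoked = contradiction_penalty(mass, Contradiction.REVOCATION)
factual = contradiction_penalty(mass, Contradiction.FACTUAL)
print(revoked, factual)
```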


Step 5 — Posterior parameters

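$$\alpha = \alpha_0 + \sum_i a_i \qquad \beta = \beta_0 + \sum_i b_i + \sum_j \mathrm{penalty}_j$$

where the sums over a_i and b_i use the cohort-aggregated contributions from Step 3, and j runs over the contradicted VOIs from Step 4.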
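The posterior assembly reduces to a few additions. A minimal sketch, assuming the per-cohort (a, b) sums and the contradiction penalties have already been computed; the function name and the example inputs are hypothetical.

```python
def posterior_params(
    alpha0: float,
    beta0: float,
    cohort_sums: list[tuple[float, float]],
    penalties: list[float],
) -> tuple[float, float]:
    """Return (alpha, beta) from the prior, per-cohort (a, b) sums, and penalties."""
    alpha = alpha0 + sum(a for a, _ in cohort_sums)
    # Penalties only ever touch beta, so contradictions can only lower the mean.
    beta = beta0 + sum(b for _, b in cohort_sums) + sum(penalties)
    return alpha, beta


# Hypothetical inputs: two cohorts and one revocation penalty.
alpha, beta = posterior_params(1.0, 2.0, [(1.25, 0.25), (0.5, 0.5)], [3.0])
print(alpha, beta)
```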


Step 6 — Reported quantities

The reported substantive grade includes a posterior median, 80% and 95% credible intervals, and an effective sample size n_effective: the total pseudo-observation mass that evidence adds above the prior, i.e. n_effective = (α + β) - (α₀ + β₀). An entity with only a prior has n_effective = 0.

A calibrated flag on the response indicates whether these quantities have been validated against observed outcomes. See calibration-status.md.
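The median, credible intervals, and n_effective described in this step can all be computed from (α, β) and the prior alone. The sketch below uses only the Python standard library: it evaluates the Beta CDF (the regularized incomplete beta function) with its standard continued-fraction expansion, computed by the modified Lentz method, and inverts it by bisection. Two assumptions are worth flagging: the intervals are taken to be equal-tailed (this page does not say which kind Bukti reports), and the function names are illustrative. In practice a library routine such as `scipy.stats.beta.ppf` would normally replace the hand-rolled inverse.

```python
import math


def _beta_cf(a: float, b: float, x: float, max_iter: int = 300, eps: float = 1e-14) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((a - 1.0 + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step.
        aa = -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def beta_cdf(x: float, a: float, b: float) -> float:
    """Regularized incomplete beta I_x(a, b), i.e. the Beta(a, b) CDF."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use whichever expansion converges quickly on this side of the distribution.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def beta_ppf(q: float, a: float, b: float, tol: float = 1e-12) -> float:
    """Inverse CDF by bisection; the CDF is monotone on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if beta_cdf(mid, a, b) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def report(alpha: float, beta: float, alpha0: float, beta0: float) -> dict:
    """Posterior median, equal-tailed 80%/95% intervals, and n_effective."""
    return {
        "median": beta_ppf(0.5, alpha, beta),
        "ci80": (beta_ppf(0.10, alpha, beta), beta_ppf(0.90, alpha, beta)),
        "ci95": (beta_ppf(0.025, alpha, beta), beta_ppf(0.975, alpha, beta)),
        # Pseudo-observation mass added by evidence, above the prior.
        "n_effective": (alpha + beta) - (alpha0 + beta0),
    }


print(report(2.75, 5.75, 1.0, 2.0))
```

Bisection is slower than Newton's method but cannot diverge, which suits a function evaluated only a handful of times per report.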