Evidence Type Weights

This page documents the categories of evidence and the direction of their weights in Bukti's substantive scoring. The specific numeric weights are tunable parameters held in private configuration; they will be publicly disclosed once calibration data from the first pilot cohort is available.


What these weights represent

In the Beta-Binomial formula, w_{t,c} is the operational validity of evidence type t in capability cluster c. It answers: "how many effective task-observations does one VOI of this type substitute for?" Higher weight means the evidence is a stronger signal of genuine capability. These weights are grounded in the predictive-validity literature for employment outcomes, primarily Sackett, Zhang, Berry & Lievens (2022) in Personnel Psychology (corrected operational validity estimates) and the 2024 follow-up by Lievens et al.
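As a minimal sketch of this interpretation (the update rule and all numeric values below are illustrative assumptions, not Bukti's private weights), w_{t,c} can be read as a pseudo-count multiplier in a Beta posterior:

```python
# Sketch: weighted Beta-Binomial update. All numbers are illustrative
# placeholders, NOT Bukti's private calibrated weights.

def weighted_beta_update(alpha, beta, successes, failures, weight):
    """Treat evidence of type t in cluster c as `weight` effective
    task-observations (pseudo-counts) added to the Beta posterior."""
    return alpha + weight * successes, beta + weight * failures

# One verified task outcome (hypothetical weight 1.0) vs. one
# self-report (hypothetical weight 0.1), from a flat Beta(1, 1) prior:
a, b = weighted_beta_update(1.0, 1.0, successes=1, failures=0, weight=1.0)
print(a / (a + b))   # posterior mean moves substantially

a, b = weighted_beta_update(1.0, 1.0, successes=1, failures=0, weight=0.1)
print(a / (a + b))   # posterior mean barely moves
```

The same positive observation shifts belief far more when it arrives as a high-weight evidence type, which is exactly the "effective task-observations" reading above.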

The weights are theory-informed defaults today, not calibrated to Bukti's specific outcome data. When pilot data yields (predicted, observed) outcome pairs, the weights will be recalibrated and disclosed.


Evidence type categories — relative ranking

Categories are listed strongest-first. Within each category, weight scales with provenance strength (signed vs. unsigned, issuer rigor, attestor identity grade).

  1. task_outcome — Verified, measurable outcomes from real deployments. Strongest single category. Maps to the work-sample literature (Sackett 2022 Table 1).
  2. behavioral_artifact — Direct demonstration of capability through code, prototypes, or other behavioral output. Strength scales with cryptographic signing (Sigstore / GPG-signed > unsigned).
  3. credential_badge — Issuer-attested credentials. Weight contingent on issuer rigor and Open Badges 3.0 signature verification.
  4. publication_artifact — Peer-reviewed publication. Domain-dependent; particularly strong for research and pedagogy clusters.
  5. peer_attestation — Structured 360-feedback peer ratings outweigh unstructured peer endorsements (Sackett 2022 / Lievens 2024). Weight scales with attestor identity grade.
  6. contribution_artifact — Behavioral artifact with weaker provenance than signed commits.
  7. indirect_attestation — Mentions of an entity by third parties. Deliberately low weight: mentions are weak signal and easily fabricated.
  8. self_reported / self_authored — Capped low so that no volume of self-reports can reach the Attested tier alone.
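The public ordering above can be captured as a simple rank structure. This is a sketch only: the list encodes relative position, and no numeric weights appear because those remain private config. The helper name is hypothetical.

```python
# Public category ordering, strongest first. Ranks are positional only;
# the actual numeric weights are held in private config.
EVIDENCE_CATEGORIES = [
    "task_outcome",           # verified, measurable deployment outcomes
    "behavioral_artifact",    # signed > unsigned behavioral output
    "credential_badge",       # contingent on issuer rigor + OB 3.0 signature
    "publication_artifact",   # domain-dependent (research, pedagogy)
    "peer_attestation",       # structured 360 > unstructured endorsement
    "contribution_artifact",  # weaker provenance than signed commits
    "indirect_attestation",   # deliberately low: easily fabricated
    "self_reported",          # capped; cannot alone reach the Attested tier
]

def stronger_than(a: str, b: str) -> bool:
    """True if category `a` outranks category `b` (lower index = stronger)."""
    return EVIDENCE_CATEGORIES.index(a) < EVIDENCE_CATEGORIES.index(b)
```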

Cluster-specific overrides

Some clusters carry higher default weights for specific evidence types. The literature finds that validity coefficients have high standard deviations across contexts (Sackett 2022 SD column): for software engineering, verified task outcomes (deployed, measurable systems) are the strongest possible signal, justifying a higher ceiling. For educational research, peer-reviewed publication is a stronger-than-default signal.

The category structure is public; the specific cluster-level numeric overrides are tunable.
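A hedged sketch of how such overrides might resolve, assuming a lookup that prefers a (cluster, type) entry over the type default. The table shape mirrors the public structure; every number below is a hypothetical placeholder, not a disclosed weight.

```python
# Sketch: cluster-specific override lookup. Structure is public;
# all numeric values here are hypothetical placeholders.
DEFAULT_WEIGHTS = {
    "task_outcome": 1.0,          # placeholder
    "publication_artifact": 0.4,  # placeholder
}

CLUSTER_OVERRIDES = {
    # Verified deployed systems: strongest signal for software engineering.
    ("software_engineering", "task_outcome"): 1.2,
    # Peer-reviewed publication: stronger than default for this cluster.
    ("educational_research", "publication_artifact"): 0.8,
}

def weight_for(evidence_type: str, cluster: str) -> float:
    """Cluster-level override wins; otherwise fall back to the type default."""
    return CLUSTER_OVERRIDES.get(
        (cluster, evidence_type), DEFAULT_WEIGHTS[evidence_type]
    )
```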


What "calibrated" means here

The system reports calibrated: false until weights are validated. Calibration requires:

  1. A set of (entity, capability, predicted, observed) tuples from a pilot cohort.
  2. Brier-score computation per evidence type, per cluster.
  3. Isotonic regression or Platt scaling to correct systematic miscalibration.
  4. A reliability diagram showing predicted probability vs. empirical frequency in each decile.
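The four steps above can be sketched on synthetic data. This is an illustration of the procedure, not the production pipeline: the (predicted, observed) pairs are made up, and the isotonic-regression step uses scikit-learn's implementation as a stand-in for whichever recalibration method is chosen.

```python
# Sketch of calibration checks on pilot (predicted, observed) pairs.
# The data below is synthetic, standing in for real pilot-cohort tuples.
import numpy as np
from sklearn.isotonic import IsotonicRegression

predicted = np.array([0.9, 0.8, 0.7, 0.4, 0.3, 0.2])
observed  = np.array([1,   1,   0,   1,   0,   0])

# Step 2: Brier score = mean squared error of the probabilistic forecasts.
brier = np.mean((predicted - observed) ** 2)

# Step 3: isotonic regression learns a monotone correction map
# from predicted probability to empirical frequency.
iso = IsotonicRegression(out_of_bounds="clip")
recalibrated = iso.fit_transform(predicted, observed)

# Step 4: reliability-diagram data, binned by decile of predicted prob.
bins = np.clip((predicted * 10).astype(int), 0, 9)
for d in np.unique(bins):
    mask = bins == d
    print(d, predicted[mask].mean(), observed[mask].mean())
```

A well-calibrated evidence type would show predicted means tracking observed frequencies across deciles; systematic gaps are what steps 3 and 4 are designed to expose and correct.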

See calibration-status.md.