Capability Taxonomy Structure

The capability taxonomy is Bukti's operational ontology: the set of concepts that evidence (VOIs) is indexed against, that search queries decompose into, and that MCP tool responses return. This page documents the node schema, the layered architecture, and the governance rules.


Why no single public taxonomy works

Bukti needs three properties from its taxonomy:

  1. Stable, public capability IDs that survive Bukti's lifetime and are resolvable by downstream agents and regulators.
  2. Hierarchical structure so VOIs about "PyTorch" roll up to "deep learning" which rolls up to "applied ML."
  3. Crosswalks so a credential issued in ESCO terms (EU Open Badges), a job posting in O*NET terms (US recruiter), and a self-asserted skill in Lightcast terms (LinkedIn import) all resolve to the same Bukti capability node.

No single taxonomy provides all three. O*NET is too coarse for VOI-level granularity. ESCO is multilingual and hierarchical but misses fast-moving technology skills. Lightcast captures emerging skills but has opaque governance and US-market bias. Bukti therefore uses a layered ontology anchored in O*NET and ESCO, following the same pattern Schema.org used (federated extensions over a public core).


Four-layer architecture

Layer 1 — Open anchors (immutable references)

O*NET Content Model elements (v28.3) and ESCO concept URIs (v1.2) are canonical external IDs. They appear on Bukti capability nodes as onet_refs[] and esco_refs[]. These references are stable: O*NET 28.x and ESCO 1.x are maintained by their respective standards bodies and change slowly. Both are held as local snapshots rather than live API calls, so scoring is deterministic and does not depend on third-party availability.

Layer 2 — Bukti capability nodes (the operational unit)

Each Bukti capability is a graph node with the full schema below. Layer 2 is where VOIs are indexed, where scoring math resolves, and where search results are keyed. Bukti nodes are finer-grained than O*NET and roughly match ESCO depth, but extend further into emerging skills (MCP server development, RAG pipelines, agent orchestration) that neither upstream taxonomy covers.

Layer 3 — Granularity extensions

Emerging capabilities that have no close match in O*NET or ESCO are added as Bukti-only nodes, annotated with the closest upstream ancestor for lineage purposes. A capability with no ESCO or O*NET crosswalk of its own still has a parent Bukti node in the DAG, and that parent carries upstream crosswalk pointers.
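
The lineage rule above can be sketched as an upward walk through the DAG. This is a minimal illustration, not Bukti's actual resolver: the node IDs, the placeholder reference strings, and the function name inherited_crosswalks are all hypothetical, and it assumes "closest" means the nearest level of ancestors that carries any crosswalk.

```python
# Hypothetical nodes; IDs and reference strings are placeholders, not real
# O*NET element IDs or ESCO URIs.
NODES = {
    "mcp-server-development": {
        "parents": ["api-development"], "onet_refs": [], "esco_refs": [],
    },
    "api-development": {
        "parents": ["software-development"],
        "onet_refs": [], "esco_refs": ["esco:placeholder-api"],
    },
    "software-development": {
        "parents": [], "onet_refs": ["onet:placeholder-sw"], "esco_refs": [],
    },
}


def inherited_crosswalks(node_id, nodes):
    """Walk upward one level at a time and return the crosswalks found on
    the nearest level (the node itself counts as level 0) that has any."""
    frontier = [node_id]
    seen = set()
    while frontier:
        onet, esco = [], []
        for nid in frontier:
            onet.extend(nodes[nid]["onet_refs"])
            esco.extend(nodes[nid]["esco_refs"])
        if onet or esco:
            return {"onet_refs": sorted(set(onet)), "esco_refs": sorted(set(esco))}
        seen.update(frontier)
        # Collect the next level of parents, skipping nodes already visited.
        next_frontier = []
        for nid in frontier:
            for parent in nodes[nid]["parents"]:
                if parent not in seen and parent not in next_frontier:
                    next_frontier.append(parent)
        frontier = next_frontier
    return {"onet_refs": [], "esco_refs": []}


lineage = inherited_crosswalks("mcp-server-development", NODES)
```

Here the Bukti-only node has no crosswalk of its own, so the lookup stops at its parent and returns that parent's ESCO pointer.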

Layer 4 — Versioning

Every capability node carries since_version and optionally deprecated_in_version. Changes to parent or cluster relationships require a major version bump (they change scoring math). Adding new capabilities requires a minor version bump.
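
The two bump rules can be expressed as a small classifier. The sketch below is illustrative only: it assumes taxonomy states are dictionaries keyed by capability ID and that versions are "MAJOR.MINOR" strings, neither of which this page specifies, and the names required_bump and bump are hypothetical.

```python
def required_bump(old_nodes, new_nodes):
    """Classify the change between two taxonomy states (dicts: id -> node)."""
    for node_id, old in old_nodes.items():
        new = new_nodes.get(node_id)
        if new is None:
            # Nodes are deprecated rather than removed, so this sketch
            # ignores IDs that are missing from the new state.
            continue
        # Parent or cluster changes alter scoring math: major bump.
        if set(old["parent_capability"]) != set(new["parent_capability"]):
            return "major"
        if old["cluster"] != new["cluster"]:
            return "major"
    # New capability IDs with no structural change: minor bump.
    if set(new_nodes) - set(old_nodes):
        return "minor"
    return "none"


def bump(version, kind):
    """Apply a bump to an assumed 'MAJOR.MINOR' version string."""
    major, minor = (int(part) for part in version.split("."))
    if kind == "major":
        return f"{major + 1}.0"
    if kind == "minor":
        return f"{major}.{minor + 1}"
    return version


# Hypothetical sample states.
before = {
    "pytorch": {"cluster": "ml-frameworks",
                "parent_capability": ["deep-learning-framework"]},
}
added = {
    **before,
    "rag-pipelines": {"cluster": "ml-systems",
                      "parent_capability": ["applied-ml"]},
}
reparented = {
    "pytorch": {"cluster": "ml-frameworks",
                "parent_capability": ["deep-learning-framework", "python-library"]},
}
```

Adding a node while leaving existing nodes untouched classifies as a minor bump; giving an existing node a second parent classifies as major.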


Capability node schema

Field                  Type      Notes
id                     string    Bukti namespace; stable; never reused after deprecation
label                  string    Human-readable short name
description            string    One-paragraph capability description
cluster                string    Coarse grouping ID; drives weight and half-life lookups
parent_capability      string[]  Multiple parents allowed (DAG, not tree); supports ESCO-style multiple inheritance
onet_refs              string[]  O*NET 28.3 element IDs (e.g., 2.B.3.e)
esco_refs              string[]  ESCO 1.2 concept URIs
lightcast_refs         string[]  Lightcast Open Skills IDs (optional enrichment; not part of open core)
wikidata_qid           string    Wikidata Q-identifier for entity reconciliation, when applicable
since_version          string    Taxonomy version when this node was added
deprecated_in_version  string    Taxonomy version when this node was deprecated; null if active
permanence_class       enum      foundational, conceptual, methodology, tooling, framework; drives decay overrides
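
As a rough illustration, the schema maps onto a typed record like the one below. The field names and enum values come from the table; the class names, the defaults, the ID format, and every sample value are assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class PermanenceClass(Enum):
    FOUNDATIONAL = "foundational"
    CONCEPTUAL = "conceptual"
    METHODOLOGY = "methodology"
    TOOLING = "tooling"
    FRAMEWORK = "framework"


@dataclass(frozen=True)
class CapabilityNode:
    id: str
    label: str
    description: str
    cluster: str
    since_version: str
    permanence_class: PermanenceClass
    parent_capability: tuple[str, ...] = ()   # multiple parents allowed (DAG)
    onet_refs: tuple[str, ...] = ()
    esco_refs: tuple[str, ...] = ()
    lightcast_refs: tuple[str, ...] = ()      # optional enrichment
    wikidata_qid: Optional[str] = None
    deprecated_in_version: Optional[str] = None  # None while the node is active

    @property
    def is_active(self) -> bool:
        return self.deprecated_in_version is None


# Illustrative instance; the ID format, cluster name, and version are made up.
pytorch = CapabilityNode(
    id="bukti:pytorch",
    label="PyTorch",
    description="Building and training neural networks with PyTorch.",
    cluster="ml-frameworks",
    since_version="1.0",
    permanence_class=PermanenceClass.FRAMEWORK,
    parent_capability=("bukti:deep-learning-framework", "bukti:python-library"),
)
```

Freezing the dataclass is a design choice for the sketch: it mirrors the rule that a node's ID is stable and never reused.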

DAG vs. tree

ESCO allows multiple inheritance: "PyTorch" is both a "deep learning framework" and a "Python library". Bukti therefore uses a DAG in which a node may have several parents. The operational cost, one extra graph traversal per parent in cohort detection, is small. Forcing a tree instead would mean duplicating capability nodes, which leaves scoring permanently ambiguous about which copy a VOI belongs to.
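
A minimal sketch of roll-up over a multi-parent graph follows. The parent map and the ancestors helper are hypothetical; the point is only that a breadth-first walk with a visited set handles several parents without duplicating nodes.

```python
from collections import deque

# Hypothetical parent links: child ID -> list of parent IDs.
PARENTS = {
    "pytorch": ["deep-learning-framework", "python-library"],
    "deep-learning-framework": ["deep-learning"],
    "python-library": ["python"],
    "deep-learning": ["applied-ml"],
    "python": [],
    "applied-ml": [],
}


def ancestors(node_id, parents):
    """Return every ancestor of node_id, visiting each node at most once."""
    seen = set()
    queue = deque(parents.get(node_id, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already reached through another parent
        seen.add(current)
        queue.extend(parents.get(current, []))
    return seen


rollup = ancestors("pytorch", PARENTS)
```

A VOI indexed against "pytorch" rolls up along both branches, reaching "applied-ml" through one parent and "python" through the other.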


Snapshot discipline

O*NET 28.3 and ESCO 1.2 are stored as local snapshots rather than live API references. This means:

  • Scoring is deterministic regardless of upstream changes.
  • A new upstream release requires an explicit migration: review changed concepts, update crosswalks, bump the Bukti taxonomy version.
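
The review step of such a migration might start from a diff between the old and new snapshots. The sketch below assumes each snapshot is a dictionary from concept ID to a content fingerprint; that representation, the placeholder IDs, and both function names are illustrative assumptions.

```python
def changed_concepts(old_snapshot, new_snapshot):
    """Concept IDs that were removed or whose content changed upstream."""
    removed = set(old_snapshot) - set(new_snapshot)
    modified = {
        concept_id
        for concept_id in set(old_snapshot) & set(new_snapshot)
        if old_snapshot[concept_id] != new_snapshot[concept_id]
    }
    return removed | modified


def nodes_needing_review(nodes, changed, ref_field):
    """Bukti node IDs whose crosswalks point at a changed upstream concept."""
    return sorted(
        node_id
        for node_id, node in nodes.items()
        if changed & set(node[ref_field])
    )


# Hypothetical snapshots and nodes; the IDs are placeholders.
old_esco = {"esco:a": "hash1", "esco:b": "hash2", "esco:c": "hash3"}
new_esco = {"esco:a": "hash1", "esco:b": "hash2-changed", "esco:d": "hash4"}
nodes = {
    "cap-1": {"esco_refs": ["esco:a"]},
    "cap-2": {"esco_refs": ["esco:b"]},
    "cap-3": {"esco_refs": ["esco:c", "esco:a"]},
}

changed = changed_concepts(old_esco, new_esco)
to_review = nodes_needing_review(nodes, changed, "esco_refs")
```

Concepts that are new upstream ("esco:d" here) touch no existing crosswalk, so they do not appear in the review list; whether to add crosswalks to them is a separate decision.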

Growth-tier nodes

When the embedding similarity between an extracted capability and the nearest seed node is below the auto-map threshold, the system creates a growth node. Growth nodes do not have O*NET or ESCO crosswalks initially, but they carry a parent_capability link to the nearest confirmed ancestor. High-frequency growth nodes can be promoted to full Bukti capability nodes through a deliberate review.
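
The threshold decision can be sketched as follows. Several things here are assumptions: the similarity measure (cosine), the threshold value of 0.80, the toy two-dimensional embeddings, and the simplification that the nearest seed node stands in for the nearest confirmed ancestor.

```python
import math

AUTO_MAP_THRESHOLD = 0.80  # hypothetical value; the real threshold is not stated


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def map_or_grow(label, embedding, seeds, threshold=AUTO_MAP_THRESHOLD):
    """Map to the nearest seed node, or create a growth node beneath it."""
    nearest_id, best = max(
        ((seed_id, cosine(embedding, vec)) for seed_id, vec in seeds.items()),
        key=lambda pair: pair[1],
    )
    if best >= threshold:
        return {"action": "map", "capability_id": nearest_id}
    # Below the threshold: a growth node with no upstream crosswalks yet,
    # linked to the nearest seed as its parent (a simplification).
    return {
        "action": "grow",
        "label": label,
        "parent_capability": [nearest_id],
        "onet_refs": [],
        "esco_refs": [],
    }


# Toy seed embeddings for illustration only.
SEEDS = {"deep-learning": [1.0, 0.0], "databases": [0.0, 1.0]}

mapped = map_or_grow("neural network training", [0.9, 0.1], SEEDS)
grown = map_or_grow("vector database tuning", [0.6, 0.5], SEEDS)
```

The first extraction sits close enough to a seed to be auto-mapped; the second falls below the threshold and becomes a growth node that can later be reviewed for promotion.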