Capability Taxonomy Structure
The capability taxonomy is Bukti's operational ontology: the set of concepts that evidence (VOIs) is indexed against, that search queries decompose into, and that MCP tool responses return. This page documents the node schema, the layered architecture, and the governance rules.
Why no single public taxonomy works
Bukti needs three properties from its taxonomy:
- Stable, public capability IDs that survive Bukti's lifetime and are resolvable by downstream agents and regulators.
- Hierarchical structure so VOIs about "PyTorch" roll up to "deep learning," which in turn rolls up to "applied ML."
- Crosswalks so a credential issued in ESCO terms (EU Open Badges), a job posting in O*NET terms (US recruiter), and a self-asserted skill in Lightcast terms (LinkedIn import) all resolve to the same Bukti capability node.
No single taxonomy provides all three. O*NET is too coarse for VOI-level granularity. ESCO is multilingual and hierarchical but misses fast-moving technology skills. Lightcast captures emerging skills but has opaque governance and US-market bias. Bukti therefore uses a layered ontology anchored in O*NET and ESCO, following the same pattern Schema.org used (federated extensions over a public core).
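The crosswalk requirement can be pictured as a reverse index from external identifiers to Bukti capability IDs. The following is a minimal sketch, not Bukti's implementation: the function names and every identifier value are hypothetical placeholders, and only the field names `onet_refs`, `esco_refs`, and `lightcast_refs` come from the node schema on this page.

```python
from collections import defaultdict

# Placeholder nodes. The IDs and reference values are illustrative only and
# are not real Bukti, O*NET, ESCO, or Lightcast identifiers.
NODES = [
    {
        "id": "bukti:cap/deep-learning",
        "onet_refs": ["onet:EXAMPLE-1"],
        "esco_refs": ["esco:EXAMPLE-A"],
        "lightcast_refs": ["lightcast:EXAMPLE-X"],
    },
    {
        "id": "bukti:cap/sql",
        "onet_refs": ["onet:EXAMPLE-2"],
        "esco_refs": ["esco:EXAMPLE-B"],
        "lightcast_refs": [],
    },
]

REF_FIELDS = ("onet_refs", "esco_refs", "lightcast_refs")


def build_crosswalk_index(nodes):
    """Map (field, external_id) pairs to the set of Bukti node IDs citing them."""
    index = defaultdict(set)
    for node in nodes:
        for field in REF_FIELDS:
            for ref in node.get(field, []):
                index[(field, ref)].add(node["id"])
    return index


def resolve(index, field, ref):
    """Return the Bukti capability IDs that an external identifier resolves to."""
    return sorted(index.get((field, ref), set()))


index = build_crosswalk_index(NODES)
print(resolve(index, "esco_refs", "esco:EXAMPLE-A"))
print(resolve(index, "lightcast_refs", "lightcast:EXAMPLE-X"))
```

Under these assumptions, a credential expressed in ESCO terms and a self-asserted skill expressed in Lightcast terms land on the same node ID as long as both references appear on that node.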
Four-layer architecture
Layer 1: Open anchors (immutable references)
O*NET Content Model elements (v28.3) and ESCO concept URIs (v1.2) are canonical external IDs. They appear on Bukti capability nodes as onet_refs[] and esco_refs[]. These references are stable: O*NET 28.x and ESCO 1.x are maintained by their respective standards bodies and change slowly. Both are held as local snapshots rather than live API calls, so scoring is deterministic and does not depend on third-party availability.
Layer 2: Bukti capability nodes (the operational unit)
Each Bukti capability is a graph node with the full schema below. Layer 2 is where VOIs are indexed, where scoring math resolves, and where search results are keyed. Bukti nodes are finer-grained than O*NET and roughly match ESCO depth, but extend further into emerging skills (MCP server development, RAG pipelines, agent orchestration) that neither upstream taxonomy covers.
Layer 3: Granularity extensions
Emerging capabilities that do not have a close match in O*NET or ESCO are added as Bukti-only nodes, annotated with the closest upstream ancestor for lineage purposes. A capability with no ESCO or O*NET crosswalk still has a parent Bukti node in the DAG, which in turn has upstream crosswalk pointers.
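The lineage rule above suggests that upstream references for a Bukti-only node can be recovered by walking up its parents. The sketch below assumes a simple in-memory mapping and a breadth-first search. The node IDs, reference values, the `parents` key, and the helper name `nearest_upstream_refs` are all hypothetical.

```python
from collections import deque

# Placeholder graph: only "applied-ml" carries upstream crosswalks.
NODES = {
    "applied-ml": {
        "parents": [],
        "esco_refs": ["esco:EXAMPLE-ML"],
        "onet_refs": ["onet:EXAMPLE-ML"],
    },
    "llm-apps": {"parents": ["applied-ml"], "esco_refs": [], "onet_refs": []},
    "rag-pipelines": {"parents": ["llm-apps"], "esco_refs": [], "onet_refs": []},
}


def nearest_upstream_refs(nodes, start):
    """Walk parents breadth-first until a node with any crosswalk is found.

    Returns (node_id, esco_refs, onet_refs), or None if no ancestor has one.
    """
    queue = deque([start])
    seen = {start}
    while queue:
        current = queue.popleft()
        node = nodes[current]
        if node["esco_refs"] or node["onet_refs"]:
            return current, node["esco_refs"], node["onet_refs"]
        for parent in node["parents"]:
            if parent not in seen:  # a DAG can reach one ancestor by two paths
                seen.add(parent)
                queue.append(parent)
    return None


print(nearest_upstream_refs(NODES, "rag-pipelines"))
```

Breadth-first order is a design choice made here so that the closest ancestor with crosswalks wins; the source does not say how ties between equally distant ancestors are handled.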
Layer 4: Versioning
Every capability node carries since_version and optionally deprecated_in_version. Changes to parent or cluster relationships require a major version bump (they change scoring math). Adding new capabilities requires a minor version bump.
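The two bump rules can be checked mechanically by comparing two taxonomy revisions. This is a minimal sketch under stated assumptions: the function name `required_bump` and the dictionary layout are hypothetical, and the `"none"` result for other edits (for example, label changes) is an assumption, since the rules above cover only relationship changes and additions.

```python
def required_bump(old_nodes, new_nodes):
    """Classify the version bump implied by moving from old_nodes to new_nodes.

    Both arguments map node ID -> {"parent_capability": [...], "cluster": str}.
    Removed nodes are not considered here; deprecation is tracked separately
    through deprecated_in_version.
    """
    added = False
    for node_id, new in new_nodes.items():
        old = old_nodes.get(node_id)
        if old is None:
            added = True  # new capability: at least a minor bump
        elif (
            set(old["parent_capability"]) != set(new["parent_capability"])
            or old["cluster"] != new["cluster"]
        ):
            return "major"  # parent or cluster change alters scoring math
    return "minor" if added else "none"


old = {"pytorch": {"parent_capability": ["deep-learning"], "cluster": "ml"}}
new = {
    "pytorch": {"parent_capability": ["deep-learning"], "cluster": "ml"},
    "rag-pipelines": {"parent_capability": ["llm-apps"], "cluster": "ml"},
}
print(required_bump(old, new))
```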
Capability node schema
| Field | Type | Notes |
|---|---|---|
| id | string | Bukti namespace; stable; never reused after deprecation |
| label | string | Human-readable short name |
| description | string | One-paragraph capability description |
| cluster | string | Coarse grouping ID; drives weight and half-life lookups |
| parent_capability | string[] | Multiple parents allowed (DAG, not tree); supports ESCO-style multiple inheritance |
| onet_refs | string[] | O*NET 28.3 element IDs (e.g., 2.B.3.e) |
| esco_refs | string[] | ESCO 1.2 concept URIs |
| lightcast_refs | string[] | Lightcast Open Skills IDs (optional enrichment; not part of open core) |
| wikidata_qid | string | Wikidata Q-identifier for entity reconciliation, when applicable |
| since_version | string | Taxonomy version when this node was added |
| deprecated_in_version | string | Taxonomy version when this node was deprecated; null if active |
| permanence_class | enum | foundational, conceptual, methodology, tooling, framework; drives decay overrides |
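The schema fields can be mirrored in a typed record. The sketch below is one possible Python rendering, not Bukti's actual model: the class names, the choice of which fields have defaults, the `is_active` helper, and the sample values (including the `1.0.0` version string format) are assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class PermanenceClass(Enum):
    FOUNDATIONAL = "foundational"
    CONCEPTUAL = "conceptual"
    METHODOLOGY = "methodology"
    TOOLING = "tooling"
    FRAMEWORK = "framework"


@dataclass
class CapabilityNode:
    id: str                      # stable; never reused after deprecation
    label: str
    description: str
    cluster: str                 # drives weight and half-life lookups
    since_version: str
    permanence_class: PermanenceClass
    parent_capability: list[str] = field(default_factory=list)  # DAG parents
    onet_refs: list[str] = field(default_factory=list)
    esco_refs: list[str] = field(default_factory=list)
    lightcast_refs: list[str] = field(default_factory=list)     # optional enrichment
    wikidata_qid: Optional[str] = None
    deprecated_in_version: Optional[str] = None                 # None while active

    @property
    def is_active(self) -> bool:
        """Hypothetical helper: a node is active until it is deprecated."""
        return self.deprecated_in_version is None


node = CapabilityNode(
    id="bukti:cap/example",       # placeholder ID
    label="Example capability",
    description="Placeholder description.",
    cluster="example-cluster",
    since_version="1.0.0",        # version format is assumed
    permanence_class=PermanenceClass.TOOLING,
)
print(node.is_active)
```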
DAG vs. tree
ESCO allows multiple inheritance ("PyTorch" is both a "deep learning framework" and a "Python library"). Bukti uses a DAG structure with multiple parents permitted. The operational cost — one extra graph traversal per parent in cohort detection — is small. The alternative (forcing a tree) would create duplicated capability nodes with permanent scoring ambiguity.
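To make the roll-up concrete, the following sketch collects every ancestor of a node that has more than one parent. It assumes a plain mapping from node ID to parent IDs. The graph contents beyond the PyTorch example in the text, and the function name `ancestors`, are hypothetical.

```python
# Placeholder parent map. "pytorch" has two parents, as in the ESCO-style example.
PARENTS = {
    "pytorch": ["deep-learning-framework", "python-library"],
    "deep-learning-framework": ["deep-learning"],
    "python-library": ["python"],
    "deep-learning": ["applied-ml"],
    "python": [],
    "applied-ml": [],
}


def ancestors(parents, node_id):
    """Return all ancestors of node_id, visiting each node once."""
    found = set()
    stack = list(parents.get(node_id, []))
    while stack:
        current = stack.pop()
        if current in found:
            continue  # already reached through another parent path
        found.add(current)
        stack.extend(parents.get(current, []))
    return found


print(sorted(ancestors(PARENTS, "pytorch")))
```

Each additional parent adds one more branch to walk, which matches the per-parent traversal cost described above. The `found` set keeps shared ancestors from being visited twice.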
Snapshot discipline
O*NET 28.3 and ESCO 1.2 are stored as local snapshots rather than live API references. This means:
- Scoring is deterministic regardless of upstream changes.
- A new upstream release requires an explicit migration: review changed concepts, update crosswalks, bump the Bukti taxonomy version.
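The "review changed concepts" step of a migration could start from a diff of two local snapshots. This is a sketch only: it assumes each snapshot has already been loaded into a dictionary of concept ID to label, and the function names and sample data are hypothetical rather than part of any O*NET or ESCO tooling.

```python
def diff_snapshots(old, new):
    """Compare two snapshots given as {concept_id: label} dictionaries."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


def crosswalks_needing_review(nodes, diff, ref_field):
    """List Bukti node IDs whose crosswalks point at removed or changed concepts."""
    touched = set(diff["removed"]) | set(diff["changed"])
    return sorted(n["id"] for n in nodes if touched & set(n.get(ref_field, [])))


# Placeholder data; the concept IDs are not real ESCO URIs.
old_snapshot = {"c1": "Label one", "c2": "Label two", "c3": "Label three"}
new_snapshot = {"c1": "Label one", "c2": "Label two (revised)", "c4": "Label four"}
nodes = [
    {"id": "n1", "esco_refs": ["c1"]},
    {"id": "n2", "esco_refs": ["c2", "c3"]},
]

diff = diff_snapshots(old_snapshot, new_snapshot)
print(diff)
print(crosswalks_needing_review(nodes, diff, "esco_refs"))
```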
Growth-tier nodes
When the embedding similarity between an extracted capability and the nearest seed node is below the auto-map threshold, the system creates a growth node. Growth nodes do not have O*NET or ESCO crosswalks initially, but they carry a parent_capability link to the nearest confirmed ancestor. High-frequency growth nodes can be promoted to full Bukti capability nodes through a deliberate review.
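The map-or-grow decision can be sketched as a nearest-neighbor lookup with a cutoff. Several things here are assumptions and not documented behavior: cosine similarity as the metric, the `0.80` threshold value, the `growth:` ID prefix, and using the nearest seed node as the parent (the text says only that the link points to the nearest confirmed ancestor).

```python
import math

AUTO_MAP_THRESHOLD = 0.80  # assumed value; the real threshold is not given here


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def map_or_grow(label, embedding, seeds, threshold=AUTO_MAP_THRESHOLD):
    """Map an extracted capability to a seed node, or create a growth node."""
    nearest_id, best = max(
        ((seed_id, cosine(embedding, vec)) for seed_id, vec in seeds.items()),
        key=lambda pair: pair[1],
    )
    if best >= threshold:
        return {"action": "map", "capability_id": nearest_id, "similarity": best}
    growth_node = {
        "id": f"growth:{label}",            # hypothetical ID scheme
        "label": label,
        "parent_capability": [nearest_id],  # assumption: nearest seed as parent
        "onet_refs": [],                    # no crosswalks initially
        "esco_refs": [],
    }
    return {"action": "grow", "node": growth_node, "similarity": best}


# Toy two-dimensional embeddings, for illustration only.
seeds = {"deep-learning": [1.0, 0.0], "databases": [0.0, 1.0]}
print(map_or_grow("pytorch", [0.9, 0.1], seeds)["action"])
print(map_or_grow("vector-search", [0.6, 0.5], seeds)["action"])
```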
Related pages
- onet-crosswalks.md — O*NET 28.3 cluster anchors
- esco-crosswalks.md — ESCO 1.2 cluster anchors
- seed-capabilities.md — the seed set