BOM Suite | June 7, 2026 | 7 min read

What Is an AIBOM? The AI Bill of Materials, Explained

What an AI Bill of Materials is, what it inventories, how to generate one, and why the EU AI Act is about to make it mandatory.

O3 Security Team

Research & Engineering

Key takeaways

An AIBOM is a machine-readable inventory of every model, dataset, and component inside an AI system — an SBOM extended for AI.
Most AI risk lives where a normal SBOM is blind: the model weights, the training data, and the model's lineage.
Two standards cover it: CycloneDX ML-BOM (OWASP) for CI/CD, and the SPDX 3.0 AI Profile (Linux Foundation) for regulatory filings.
The OWASP AIBOM Generator builds a CycloneDX AIBOM from a Hugging Face model in seconds, and scores what's missing.
The EU AI Act is the forcing function: GPAI transparency obligations began in August 2025, and Annex IV is what auditors ask for.

Your team just shipped a feature built on a fine-tuned open-source model pulled from Hugging Face. Quick question: what was that model fine-tuned from? What data trained the base? What license governs the weights — and can you legally use it in a commercial product? For most teams shipping AI right now, the honest answer to all three is a shrug.

An AI Bill of Materials is what turns that shrug into a document. An AIBOM is a machine-readable inventory of every component that makes up an AI system: the models, the datasets, the frameworks, and the relationships between them. If a software bill of materials tells you which packages you ship, an AIBOM tells you what's actually inside the AI.

What is an AIBOM, exactly?

An AIBOM is a structured, machine-readable record of an AI system's makeup. It extends the familiar SBOM with the things that are unique to AI: model weights and their provenance, training and fine-tuning data, hyperparameters, the base model a fine-tune descends from, and the licenses attached to each. A tool produces it by reading model and dataset metadata, and every component it finds becomes an entry you can query, diff, and audit.
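To make "query, diff, and audit" concrete, here is a minimal sketch of diffing two AIBOM snapshots. It assumes a simplified, CycloneDX-like shape (a top-level `components` list keyed by `name`); the component names and versions are hypothetical, and a real document carries many more fields.

```python
from typing import Any


def index_components(bom: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Index a BOM's components by name so two snapshots can be compared."""
    return {c["name"]: c for c in bom.get("components", [])}


def diff_boms(old: dict[str, Any], new: dict[str, Any]) -> dict[str, list[str]]:
    """Report which components were added, removed, or changed."""
    old_idx, new_idx = index_components(old), index_components(new)
    return {
        "added": sorted(new_idx.keys() - old_idx.keys()),
        "removed": sorted(old_idx.keys() - new_idx.keys()),
        "changed": sorted(
            name
            for name in old_idx.keys() & new_idx.keys()
            if old_idx[name] != new_idx[name]
        ),
    }


# Hypothetical snapshots of one service before and after a model update.
release_1 = {"components": [
    {"type": "machine-learning-model", "name": "support-bot", "version": "1.0"},
    {"type": "data", "name": "tickets-2024", "version": "1"},
]}
release_2 = {"components": [
    {"type": "machine-learning-model", "name": "support-bot", "version": "1.1"},
    {"type": "data", "name": "tickets-2024", "version": "1"},
    {"type": "data", "name": "tickets-2025", "version": "1"},
]}

changes = diff_boms(release_1, release_2)
print(changes)
# {'added': ['tickets-2025'], 'removed': [], 'changed': ['support-bot']}
```

A diff like this is what lets a reviewer see that a release quietly picked up a new training dataset, which a package-level diff would not show.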

One important boundary: an AIBOM documents the structure, provenance, and relationships of an AI system — not the raw weights or the proprietary algorithm itself. It's the manifest, not the model. That's what makes it shareable with a customer, an auditor, or a procurement team without giving away your IP.

Key takeaway

An SBOM answers 'what code did we ship?' An AIBOM answers 'what model, trained on what data, descended from what — and are we allowed to use it?' Same idea, much harder questions.

Why a normal SBOM isn't enough

Here's the trap teams fall into: they run an SBOM on their AI service, see the Python packages and the container image, and assume they're covered. They're not. An SBOM is blind to exactly where AI risk concentrates.

The weights. A model is a binary blob of learned parameters. Where did it come from? Was it tampered with? A poisoned or backdoored model passes every dependency scan you own — because it isn't a dependency, it's data.
The training data. Bias, copyrighted material, PII, and licensing land-mines all live in the dataset, not the code. An SBOM never looks there.
The lineage. Modern models are fine-tunes of fine-tunes. If the base model three hops upstream has a non-commercial license or a known flaw, you inherit it — and an SBOM can't see the chain.
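The lineage problem can be sketched as a walk up the fine-tune chain. The example below assumes each model entry nests its parents under `pedigree.ancestors` (the shape CycloneDX uses for component ancestry) and, as a simplification, stores the license as a flat `license` string. The model names and the `NON_COMMERCIAL` set are hypothetical.

```python
from typing import Any, Iterator

# Assumption: license identifiers this team treats as non-commercial.
NON_COMMERCIAL = {"CC-BY-NC-4.0", "CC-BY-NC-SA-4.0"}


def walk_lineage(
    model: dict[str, Any], depth: int = 0
) -> Iterator[tuple[int, dict[str, Any]]]:
    """Yield the model and every ancestor together with its hop count."""
    yield depth, model
    for parent in model.get("pedigree", {}).get("ancestors", []):
        yield from walk_lineage(parent, depth + 1)


def inherited_restrictions(model: dict[str, Any]) -> list[tuple[int, str, str]]:
    """Return (hops, name, license) for each restricted or undeclared license."""
    findings = []
    for hops, entry in walk_lineage(model):
        license_id = entry.get("license")
        if license_id is None or license_id in NON_COMMERCIAL:
            findings.append((hops, entry["name"], license_id or "UNKNOWN"))
    return findings


# Hypothetical fine-tune of a fine-tune of a fine-tune.
shipped_model = {
    "name": "support-bot-ft", "license": "Apache-2.0",
    "pedigree": {"ancestors": [{
        "name": "chat-tune", "license": "Apache-2.0",
        "pedigree": {"ancestors": [{
            "name": "instruct-tune",  # license never declared upstream
            "pedigree": {"ancestors": [{
                "name": "base-7b", "license": "CC-BY-NC-4.0",
            }]},
        }]},
    }]},
}

print(inherited_restrictions(shipped_model))
# [(2, 'instruct-tune', 'UNKNOWN'), (3, 'base-7b', 'CC-BY-NC-4.0')]
```

The shipped model and its direct parent both look clean; the problems only appear two and three hops upstream, which is why the chain has to be recorded at all.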


What goes inside an AIBOM?

Guidance from CISA, NIST, OWASP, and the EU AI Act's Annex IV has converged on a similar picture. Think of it in six buckets:

Models — name, version, architecture, base-model lineage (what it was fine-tuned from), a weights identifier, license, and acceptable-use restrictions.
Datasets — source, collection method, preprocessing steps, train/validation/test splits, license, and known biases or limitations.
Code — the frameworks, libraries, dependencies, and container images around the model (this is the part a normal SBOM already covers).
Hardware — the training and inference requirements (GPUs, accelerators).
Data pipelines — the training, validation, retrieval, and orchestration steps that move data through the system.
Governance — approval history, change log, evaluation results, and compliance attestations.
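One way to make the "weights identifier" in the Models bucket concrete is a content hash. The sketch below assumes the AIBOM stores a SHA-256 digest of the weights file; the file name and contents are stand-ins, not a real model.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large weight files never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def weights_match(path: Path, recorded_sha256: str) -> bool:
    """Compare the file on disk with the digest recorded in the AIBOM."""
    return sha256_of_file(path) == recorded_sha256.lower()


with tempfile.TemporaryDirectory() as tmp:
    weights = Path(tmp) / "model.safetensors"  # hypothetical file name
    weights.write_bytes(b"stand-in for real weights")

    recorded = sha256_of_file(weights)  # the value an AIBOM would store
    ok_before = weights_match(weights, recorded)

    # Simulate the file changing after the AIBOM was generated.
    weights.write_bytes(b"stand-in for real weights, modified")
    ok_after = weights_match(weights, recorded)

print(ok_before, ok_after)
# True False
```

A digest only shows that the file has not changed since it was recorded. It says nothing about whether the weights were trustworthy at that point, which is what the lineage and dataset entries are for.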

Note

Notice how much of this is about data and provenance, not code. That's the whole point: AI's attack surface and its compliance surface both live upstream of the application.

AIBOM vs SBOM: how they relate

An AIBOM doesn't replace your SBOM — it wraps around it. The code and container parts of an AI system are still a normal SBOM; the AIBOM adds the model and data layers that the SBOM can't describe. Both are built on the same Bill of Materials standards, so they slot into the same tooling.
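As a rough illustration of the "wraps around" relationship, the sketch below splits one combined BOM into the layer a normal SBOM already describes and the layer an AIBOM adds. It assumes CycloneDX-style component `type` values (`library`, `container`, `machine-learning-model`, `data`); the component names are hypothetical.

```python
from typing import Any

# Component types that belong to the model and data layers.
AI_TYPES = {"machine-learning-model", "data"}


def split_layers(bom: dict[str, Any]) -> dict[str, list[str]]:
    """Partition component names into the SBOM layer and the AI layer."""
    layers: dict[str, list[str]] = {"sbom_layer": [], "ai_layer": []}
    for component in bom.get("components", []):
        key = "ai_layer" if component.get("type") in AI_TYPES else "sbom_layer"
        layers[key].append(component["name"])
    return layers


# Hypothetical inventory for a single AI service.
combined_bom = {"components": [
    {"type": "library", "name": "torch"},
    {"type": "container", "name": "inference-image"},
    {"type": "machine-learning-model", "name": "support-bot"},
    {"type": "data", "name": "tickets-2024"},
]}

layers = split_layers(combined_bom)
print(layers)
# {'sbom_layer': ['torch', 'inference-image'], 'ai_layer': ['support-bot', 'tickets-2024']}
```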

	SBOM	AIBOM
Question it answers	What components do I ship?	What model, data, and lineage power this AI?
Core entries	Packages, versions, licenses	Models, datasets, weights provenance, training data
Main risks it surfaces	Vulnerable / outdated dependencies	Model poisoning, data bias, license & copyright exposure
Standards	CycloneDX / SPDX	CycloneDX ML-BOM / SPDX 3.0 AI Profile

SBOM and AIBOM, side by side.

The two standards: CycloneDX ML-BOM and SPDX 3.0

You don't have to invent a format — the same two communities behind the SBOM standards have extended them for AI.

CycloneDX ML-BOM (OWASP)

CycloneDX added machine-learning support in version 1.5 (June 2023) and has refined it since. Its ML-BOM captures models, datasets, and configurations — architecture, training and inference setup, dataset references, performance metrics, and the bias and safety considerations attached to a model. Because it's the same CycloneDX framework (ECMA-424) as the SBOM, it drops into the CI/CD pipelines you already run.
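A minimal sketch of what such a document can look like, built as a Python dictionary and serialized to JSON. The field names follow our reading of the CycloneDX 1.5 JSON schema (`machine-learning-model` and `data` component types, a `modelCard` with `modelParameters`, `quantitativeAnalysis`, and `considerations`), so validate against the official schema before relying on it. All names and values are placeholders, and `datasets_for` is a hypothetical helper.

```python
import json
from typing import Any

ml_bom: dict[str, Any] = {
    "bomFormat": "CycloneDX",
    "specVersion": "1.5",
    "version": 1,
    "components": [
        {
            "type": "machine-learning-model",
            "bom-ref": "model-support-bot",
            "name": "support-bot",
            "version": "1.1",
            "licenses": [{"license": {"id": "Apache-2.0"}}],
            "modelCard": {
                "modelParameters": {
                    "task": "text-generation",
                    "architectureFamily": "transformer",
                    # Points at the dataset component below by bom-ref.
                    "datasets": [{"ref": "data-tickets-2024"}],
                },
                "quantitativeAnalysis": {
                    # Placeholder metric, not a measured result.
                    "performanceMetrics": [{"type": "accuracy", "value": "0.91"}],
                },
                "considerations": {
                    "technicalLimitations": ["English-only training data"],
                },
            },
        },
        {"type": "data", "bom-ref": "data-tickets-2024", "name": "tickets-2024"},
    ],
}


def datasets_for(bom: dict[str, Any], model_name: str) -> list[str]:
    """Resolve the dataset names a model's card references via bom-ref."""
    by_ref = {c["bom-ref"]: c for c in bom["components"] if "bom-ref" in c}
    for component in bom["components"]:
        if (component["type"] == "machine-learning-model"
                and component["name"] == model_name):
            params = component.get("modelCard", {}).get("modelParameters", {})
            return [by_ref[d["ref"]]["name"]
                    for d in params.get("datasets", [])
                    if d.get("ref") in by_ref]
    return []


print(json.dumps(ml_bom, indent=2))
print(datasets_for(ml_bom, "support-bot"))
```

Because the model and the dataset are separate components linked by reference, the same dataset entry can be shared by several models in one inventory.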

SPDX 3.0 AI Profile (Linux Foundation)

SPDX 3.0, released in April 2024, introduced an AI Profile and a Dataset Profile. The AI Profile describes a model's type, training methods, data handling, explainability, limitations, and even energy consumption; the Dataset Profile covers how data is processed, stored, and managed. SPDX is an ISO standard (ISO/IEC 5962:2021, which covers the earlier SPDX 2.2.1 specification), and that pedigree gives it extra weight in formal, regulator-facing documentation.

	CycloneDX ML-BOM	SPDX 3.0 AI Profile
Steward	OWASP	Linux Foundation
AI support since	v1.5 (June 2023)	SPDX 3.0 (April 2024)
Best for	Internal CI/CD generation, security automation	Vendor filings, regulatory & audit documentation
Edge	Compact, tooling-rich, same as your SBOM	ISO-standardized, formal compliance weight

CycloneDX ML-BOM vs SPDX 3.0 AI Profile.

The pragmatic move that's emerging: generate CycloneDX ML-BOM internally where tooling matters, and require SPDX 3.0 from vendors where regulatory weight matters. Translators like Protobom and BomCTL convert between the two, so picking one doesn't lock you out of the other.

How to generate an AIBOM

As with SBOMs, producing the file is the easy part — the discipline is in doing it on every model you ship and acting on what it shows.

Tool	Steward	What it does
OWASP AIBOM Generator	OWASP	Builds a CycloneDX AIBOM from a Hugging Face model's metadata, and scores which fields are missing
GUAC	OpenSSF	Aggregates supply-chain metadata across sources
Protobom / BomCTL	OpenSSF	CLI tooling that translates between SPDX and CycloneDX in CI/CD

Common AIBOM tooling (June 2026).

If your team builds on Hugging Face models, the OWASP AIBOM Generator — launched at RSAC 2025 — is the fastest way to see a real AIBOM. It reads the model's metadata, emits CycloneDX, and gives you a completeness score so you can see exactly which provenance fields the upstream model never documented. That gap, by the way, is often the most useful output: it tells you what to demand from the vendor.
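The idea behind a completeness score can be sketched in a few lines. This is not the OWASP AIBOM Generator's actual scoring method, which the article does not describe; the field list, the flat component shape, and the equal weighting are all assumptions made for illustration.

```python
from typing import Any

# Hypothetical set of provenance fields a team decides it requires.
REQUIRED_FIELDS = [
    "name", "version", "license",
    "base_model", "training_data", "weights_sha256",
]


def completeness(component: dict[str, Any]) -> tuple[float, list[str]]:
    """Return the fraction of required fields present and the missing ones."""
    missing = [field for field in REQUIRED_FIELDS if not component.get(field)]
    score = (len(REQUIRED_FIELDS) - len(missing)) / len(REQUIRED_FIELDS)
    return score, missing


# Hypothetical upstream model that documents only the basics.
upstream_model = {
    "name": "base-7b",
    "version": "2.0",
    "license": "Apache-2.0",
}

score, missing = completeness(upstream_model)
print(f"{score:.0%} complete, missing: {missing}")
# 50% complete, missing: ['base_model', 'training_data', 'weights_sha256']
```

The `missing` list is the useful part: it is the set of questions to send upstream.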

Generate at build time. Produce the AIBOM in your pipeline whenever you ship or update a model, so the inventory matches what's actually in production.
Capture lineage, not just the top model. Record the base model and the training datasets, not only the final fine-tune — that's where licensing and poisoning risk hides.
Score completeness and chase the gaps. Where the upstream model is silent on provenance, that's a question for procurement, not a field to leave blank.
Feed it into review. Match licenses against your policy, flag non-commercial or unknown-provenance weights, and keep the AIBOM as the evidence an auditor will ask for.
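The review step above can be sketched as a small pipeline gate. The allowlist, the flat component shape, and the rule that a model without a recorded hash counts as unknown provenance are assumptions for illustration, not a prescribed policy.

```python
from dataclasses import dataclass
from typing import Any

# Hypothetical policy: licenses this organization has approved.
ALLOWED_LICENSES = {"Apache-2.0", "MIT"}


@dataclass
class Finding:
    component: str
    reason: str


def review(components: list[dict[str, Any]]) -> list[Finding]:
    """Check every AIBOM component against the license and provenance policy."""
    findings = []
    for c in components:
        license_id = c.get("license")
        if license_id is None:
            findings.append(Finding(c["name"], "license not declared"))
        elif license_id not in ALLOWED_LICENSES:
            findings.append(Finding(c["name"], f"license {license_id} not in policy"))
        is_model = c.get("type") == "machine-learning-model"
        if is_model and not c.get("weights_sha256"):
            findings.append(Finding(c["name"], "unknown-provenance weights"))
    return findings


# Hypothetical components taken from a generated AIBOM.
aibom_components = [
    {"type": "machine-learning-model", "name": "support-bot",
     "license": "Apache-2.0", "weights_sha256": "ab" * 32},
    {"type": "machine-learning-model", "name": "base-7b",
     "license": "CC-BY-NC-4.0"},
    {"type": "data", "name": "tickets-2024"},
]

findings = review(aibom_components)
for f in findings:
    print(f"{f.component}: {f.reason}")

# A CI job would pass this value to sys.exit() to fail the build on findings.
exit_code = 1 if findings else 0
```

Keeping the findings alongside the AIBOM for each release gives you a dated record of what was checked and what was waived.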

Why now: the EU AI Act

AIBOMs moved from nice-to-have to procurement requirement fast, and the clearest reason is European law. The EU AI Act puts AI transparency on a legal clock.

Transparency and documentation obligations for general-purpose AI models began applying on 2 August 2025, with providers expected to maintain technical documentation, transparency, and risk mitigation. The Act's Article 50 transparency rules, which cover AI-generated and manipulated content, apply from 2 August 2026. And the technical documentation auditors will actually ask for is spelled out in the Act's Annex IV, written for high-risk AI systems, which maps closely to what an AIBOM contains: the model, its data, its training, and its limitations.

Key takeaway

An AIBOM is an efficient starting point for EU AI Act Annex IV documentation, because the inventory you build for security already covers much of what a regulator wants.

The standards bodies are converging in the same direction: CISA's minimum elements for an SBOM for AI, NIST's AI risk work, ISO/IEC 42001, OWASP, and the Linux Foundation have all landed on a similar field set. When the guidance agrees this much, the requirement tends to follow.

The bottom line

AI is assembled from models and data you didn't build and often can't see inside. An AIBOM is how you get that visibility — the inventory that makes model provenance, data licensing, and AI supply-chain risk something you can actually manage instead of hope about. The standards exist, the tools are free, and the regulatory clock is already running. Generate one for your next model and read what it tells you; the gaps it surfaces are the work.

Frequently asked questions

What is an AIBOM (AI Bill of Materials)?

An AIBOM is a machine-readable inventory of every component in an AI system — its models, datasets, frameworks, and the relationships between them. It extends the SBOM with AI-specific details like model weights provenance, training data, and lineage, so you can track security, licensing, and compliance risk across the AI you ship.

How is an AIBOM different from an SBOM?

An SBOM lists software components like packages and containers; an AIBOM adds the model and data layers an SBOM can't describe — model weights, training datasets, base-model lineage, and AI licenses. Most AI risk (poisoning, bias, copyright) lives in those layers, so an AIBOM wraps around your SBOM rather than replacing it.

What should an AIBOM include?

Six areas, per CISA and EU AI Act guidance: models (version, architecture, lineage, weights ID, license), datasets (source, preprocessing, splits, biases), code and dependencies, hardware requirements, data pipelines, and governance records. An AIBOM documents provenance and structure — not the raw weights or proprietary algorithms themselves.

What standards are used for an AIBOM?

Two main ones. CycloneDX ML-BOM, from OWASP, added machine-learning support in version 1.5 (June 2023) and suits CI/CD generation. SPDX 3.0, from the Linux Foundation (released April 2024), added an AI Profile and Dataset Profile and is ISO-standardized, giving it weight for regulatory filings. Tools translate between the two.

How do I generate an AIBOM?

Use a generator like the OWASP AIBOM Generator, which builds a CycloneDX AIBOM from a Hugging Face model's metadata and scores which fields are missing. Generate it in your CI/CD pipeline on every model release, capture the full lineage and training data, and feed the result into license and security review.

Is an AIBOM required by the EU AI Act?

The EU AI Act doesn't name 'AIBOM,' but it requires the documentation an AIBOM provides. General-purpose AI transparency obligations began applying in August 2025, and the technical documentation in Annex IV, which covers the model, its data, training, and limitations, maps closely to AIBOM contents. That overlap makes an AIBOM an efficient way to assemble the required documentation.