diff --git a/ML-BOM/en/0x20-Design-Model-Component-Metadata.md b/ML-BOM/en/0x20-Design-Model-Component-Metadata.md index f0889d68..d9253345 100644 --- a/ML-BOM/en/0x20-Design-Model-Component-Metadata.md +++ b/ML-BOM/en/0x20-Design-Model-Component-Metadata.md @@ -17,6 +17,7 @@ For convenience, here are links to the specific sections for each of those infor * [Describing models as components](#describing-models-as-components) * [Model repositories as components](#model-repositories-as-components) * [Model identifiers](#model-identifiers) + * [Providing model release notes](#providing-model-release-notes) * [Describing a model repository as a CycloneDX assembly](#describing-a-model-repository-as-a-cyclonedx-assembly) * [Declaring a model's pedigree](#declaring-a-models-pedigree) @@ -58,8 +59,18 @@ The CycloneDX JSON pseudocode below shows how an ML model would be declared as t "bom-ref": "pkg:huggingface/Qwen/Qwen-7B@ef3c5c9", "purl": "pkg:huggingface/Qwen/Qwen-7B@ef3c5c9c57b252f3149c1408daf4d649ec8b6c85", "version": "ef3c5c9c57b252f3149c1408daf4d649ec8b6c85", + "licenses": [ + { + "license": { + "name": "Tongyi Qianwen LICENSE AGREEMENT", + "text": { + "content": "By clicking to agree or by using or distributing any portion or element of the Tongyi Qianwen Materials, ..." + } + } + } + ], // ... - } + }, // ... } // ... @@ -69,6 +80,7 @@ The CycloneDX JSON pseudocode below shows how an ML model would be declared as t ###### Field discussion * **bom-ref** - Please note the `bom-ref` value includes the first seven characters of the larger hash value from the `purl` component identifier which is sufficient for local identification within the BOM itself. +* **license** - The `licenses` entry shown in the example is a "custom" license for which, in this case, we chose to provide the unencoded license text. It is preferable, when possible, to use an SPDX license identifier and supply it in the `id` field of the `license` object (e.g., `"license": { "id": "Apache-2.0" }`). 
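The SPDX alternative mentioned in the license discussion above might look like the sketch below, which follows this guide's JSON pseudocode conventions. The `FakeAI/CoderModel` component is fictional, and `Apache-2.0` is an assumed, illustrative identifier rather than the license of any real model.

```json
"component":
{
  "type": "machine-learning-model",
  // Hypothetical model reference, used for illustration only
  "bom-ref": "pkg:huggingface/FakeAI/CoderModel",
  // ...
  "licenses": [
    {
      "license": {
        // Assumed SPDX identifier; substitute the identifier that matches the model's actual license
        "id": "Apache-2.0"
      }
    }
  ]
}
```

Compared with the named, custom license in the example above, an SPDX `id` lets tooling recognize the license without parsing free-form license text.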
#### Model repositories as components @@ -166,7 +178,7 @@ If the model being described by an ML-BOM is instead hosted in a GitHub reposito Organizations that produce BOMs for hardware or software components they produce may have multiple domain-specific identifiers for the same component. In these cases, it is best practice to register (reserve) an official namespace for these domains with the [CycloneDX Property Taxonomy](), which is the authoritative source of official namespaces used in CycloneDX `properties`. -###### Example: +###### Example: domain-specific identifiers The following example shows how a registered name for a fictional company, ACME, which registered the namespace `acme`, could provide a property to identify one of its internal ML models. @@ -224,11 +236,47 @@ Each can be specifically identified in a CycloneDX component using a Package URL } ``` +##### Providing model release notes + +It is important to disclose information regarding a model's release. This is accomplished using the CycloneDX component's `releaseNotes` object and its fields. + +###### Example: release notes + +```json +{ + "$schema": "http://cyclonedx.org/schema/bom-1.7.schema.json", + // ... + "metadata": + { + "component": + { + "type": "machine-learning-model", + "bom-ref": "pkg:huggingface/Qwen/Qwen-7B@ef3c5c9", + // ... + "releaseNotes": [ + { + "type": "major", + "title": "Qwen 7B initial release", + "timestamp": "2023-08-03T15:30:00Z", + "notes": [ + { + "locale": "en-US", + "text": "United States (US), English release date." + } + // ... + ] + } + ] + }, + // ... + } +} +``` + ###### Field discussion * **type** - the type has the value `machine-learning-model` since the single file contains all the information (e.g., default configuration parameters, references to architectures and tokenizers, prompt template, etc.) needed to run the model in GGUF inference frameworks. 
- #### Describing a model repository as a CycloneDX assembly CycloneDX allows for declarations of software compositions (e.g., hardware products, software applications, packages, libraries, archives, etc.). @@ -387,7 +435,7 @@ It is important to capture any of these transformations in the model's lineage ( * **ancestors** - `ancestors` entries are themselves CycloneDX `component` objects. It should be noted that these models may have their own ML-BOMs, which can be located via their identifiers (e.g., `purl`) or via `externalReferences` for readers to follow. -##### Declaring known descendents +##### Declaring known descendants If, at the time an ML-BOM is created for a model, its downstream model variants (e.g., finetunings, quantizations, etc., derived from the model) are known, these can also be recorded within the `pedigree` object as `descendants` in a similar manner. diff --git a/ML-BOM/en/0x40-Design-Additional-Model-Information.md b/ML-BOM/en/0x40-Design-Additional-Model-Information.md index 3a1e0a70..a8b18aed 100644 --- a/ML-BOM/en/0x40-Design-Additional-Model-Information.md +++ b/ML-BOM/en/0x40-Design-Additional-Model-Information.md @@ -7,7 +7,9 @@ Currently, the v1.7 CycloneDX specification may not have specific objects or fie For convenience, here are links to the specific sections for some of these acknowledged informational areas: * [Using CycloneDX AI/ML properties](#using-cyclonedx-aiml-properties) + * [Declaring a model's modalities](#declaring-a-models-modalities) * [Annotating a model's supported languages](#annotating-a-models-supported-languages) + * [Providing a model's usage policy](#providing-a-models-usage-policy) * [Providing free-form tags for search](#providing-free-form-tags-for-search) * [Tokenizers and prompt templates](#tokenizers-and-prompt-templates) * [Including manufacturing information for the ML model](#including-manufacturing-information-for-the-ml-model) @@ -20,6 +22,44 @@ For convenience, here are links to the specific 
sections for some of these ackno This section includes discussion and examples of supported AI/ML-related metadata properties that can be used to classify models in their model card information. This method utilizes reserved [AI/ML property names](https://github.com/CycloneDX/cyclonedx-property-taxonomy/cdx/ai-ml.md) registered under the [CycloneDX Property Taxonomy](https://github.com/CycloneDX/cyclonedx-property-taxonomy). +## Declaring a model's modalities + +Models are trained to support processing and analysis of one or more types of input data for specific tasks or data modalities. + +* **Property name**: The CycloneDX reserved property taxonomy name to use to annotate a model with its supported modalities is: `cdx:ai-ml:model:modality` + +* **Property value**: The values for this property include: + + * `text` - Natural Language Processing (NLP) and specializations such as Natural Language Understanding (NLU) for tasks like translation, summarization, conversation, classification and sentiment analysis. + * `code` - Specialized text-based modality used for software engineering and logic. + * `instruct` - Specialized text-based modality fine-tuned for understanding and executing natural language directives (i.e., instruction following). + * `image` (vision) - Computer vision for object detection, generation, and classification as well as document processing. + * `video` - Video processing tasks to extract structured information, including object detection, action recognition, scene detection, and temporal understanding. + * `audio` - Audio processing tasks such as Automatic Speech Recognition (ASR), Speech-to-Text, music generation, and sound pattern recognition. + * `sensor` (telemetry) - Processes data from specialized sensors or hardware, such as LiDAR for autonomous vehicles or IoT sensor feeds. 
+ * `biometric` - Specialized sensor-based modality used for analyzing biological traits for tasks such as facial recognition, fingerprint scanning, or voice authentication. + * `genomic` (telemetry) - Processes high-dimensional data used in drug discovery and medical research. + * `_undefined:` - `` placeholder, used to provide an arbitrary model modality name. + +###### Example: Tagging a model with its modalities + +```json +"component": +{ + "type": "machine-learning-model", + "bom-ref": "pkg:huggingface/FakeAI/CoderModel", + // ..., + "properties": [ + { + "name": "cdx:ai-ml:model:modality:code" + }, + { + "name": "cdx:ai-ml:model:modality:instruct" + } + ] +} +``` + ## Annotating a model's supported languages Models can be trained in one or more languages (i.e., multilingual models). @@ -81,6 +121,28 @@ This section describes how to "tag" model components with non-standard keywords * **properties** - The tag values shown above might be used to search for models in a catalog that are compatible with the `pytorch` framework and (the Hugging Face) `transformers` library. The `text-to-speech` and `speech-to-speech` tags could identify the model with those input/output capabilities. +## Providing a model's usage policy + +Model usage policies can be provided using `externalReferences` associated with the model's component definition. + +###### Example: Providing a link to a model's usage policy + +```json +"component": { + "type": "machine-learning-model", + "bom-ref": "pkg:huggingface/Qwen/Qwen-7B@ef3c5c9", + // ..., + "externalReferences": [ + { + "url": "https://qwen.ai/usagepolicy", + "type": "documentation", + "comment": "Usage policy" + } + ], + // ... +} +``` + ## Tokenizers and prompt templates Tokenizers provide the preprocessing (encoding) and postprocessing (decoding) functions to convert input and output information to tokens that the associated ML model was trained on and used for inference. 
diff --git a/ML-BOM/en/0x91-Appendix-B_References.md b/ML-BOM/en/0x91-Appendix-B_References.md index 9046f164..87c986c8 100644 --- a/ML-BOM/en/0x91-Appendix-B_References.md +++ b/ML-BOM/en/0x91-Appendix-B_References.md @@ -27,7 +27,7 @@ This appendix includes references to resources, standards, technologies, and mod * [ECMA-428 Common Lifecycle Enumeration (CLE) specification](https://ecma-international.org/publications-and-standards/standards/ecma-428/) - The CLE provides a standardized format for communicating software component lifecycle events in a machine-readable format. * [European Union's Cyber Resilience Act (EU CRA)](https://www.european-cyber-resilience-act.com/) * [Cyber Resilience Act (CRA)](https://www.european-cyber-resilience-act.com/Cyber_Resilience_Act_Articles.html) - "The Final Text" -* [EU’s AI Act](https://artificialintelligenceact.eu/) ([text](https://eur-lex.europa.eu/eli/reg/2024/1689/oj/eng)) - The European Union's comprehensive legal framework for artificial intelligence, designed to ensure that AI systems used in the European Union are safe, ethical, and trustworthy. +* [EU AI Act](https://artificialintelligenceact.eu/) ([index](https://artificialintelligenceact.eu/ai-act-explorer/)) - The European Union's comprehensive legal framework for artificial intelligence, designed to ensure that AI systems used in the European Union are safe, ethical, and trustworthy. 
* [Article 53: Obligations for Providers of General-Purpose AI Models](https://artificialintelligenceact.eu/article/53/) * [Annex XI: Technical Documentation Referred to in Article 53(1), Point (a) – Technical Documentation for Providers of General-Purpose AI Models](https://artificialintelligenceact.eu/annex/11/) * [Explanatory Notice and Template for the Public Summary of Training Content for general-purpose AI models](https://digital-strategy.ec.europa.eu/en/library/explanatory-notice-and-template-public-summary-training-content-general-purpose-ai-models) diff --git a/ML-BOM/en/0x92-Appendix-EU-AI-Act-mappings.md b/ML-BOM/en/0x92-Appendix-EU-AI-Act-mappings.md new file mode 100644 index 00000000..3bbc5554 --- /dev/null +++ b/ML-BOM/en/0x92-Appendix-EU-AI-Act-mappings.md @@ -0,0 +1,161 @@ +# Appendix A: ML-BOM mappings to the European Union's Artificial Intelligence Act (EU AI Act) + +This appendix provides a mapping between the [EU’s AI Act](https://artificialintelligenceact.eu/) prose requirements, as well as the more prescriptive [Explanatory Notice and Template for the Public Summary of Training Content for general-purpose AI models](https://digital-strategy.ec.europa.eu/en/library/explanatory-notice-and-template-public-summary-training-content-general-purpose-ai-models), and how they are shown to be fulfilled using CycloneDX ML-BOM as documented in specific sections of this guide. 
+ +These mappings include: + +* [Article 53: Obligations for Providers of General-Purpose AI Models](#article-53-obligations-for-providers-of-general-purpose-ai-models) +* [ANNEX XI: Technical Documentation Referred to in Article 53](#annex-xi-mappings) +* [Annex: Template for the Public Summary of Training Content for General-Purpose AI models](#annex-template-for-the-public-summary-of-training-content-for-general-purpose-ai-models-required-by-article-53) + +--- + +### Summary of the EU AI Act + +The AI Act requires model providers to report extensive information on the models they produce to be used for risk assessment and compliance purposes. This act effectively endorses moving away from the current non-normative publication of model cards and research papers (or similar documentation) towards normative and standardized methods such as AI/ML Bills-of-Materials (AI/ML-BOMs). Specifically, AI/ML-BOMs are recognized as a key method for creating the technical documentation required by the EU AI Act (Article 11 and Annex IV). + +In order to fulfill the requirements of the act, providers must create and maintain up-to-date technical documentation, which includes providing a detailed description of the model’s capabilities, limitations, and intended use. + +Some of these model documentation requirements include: + +- General description, architecture, number of parameters and capabilities. +- Training data provenance, methodologies and scope. +- Evaluation results and performance benchmarks. +- Known limitations and intended use cases. +- Energy consumption and other environmental impacts. 
+ +### Summary of the Explanatory Notice and Template for the Public Summary of Training Content for general-purpose AI models + +On July 24, 2025, the European Commission released the mandatory Explanatory Notice and Template for the Public Summary of Training Content for general-purpose AI (GPAI) models, a key compliance step under [Article 53](https://artificialintelligenceact.eu/article/53/)(1)(d) of the EU AI Act. This template serves as a mandatory minimum baseline for all GPAI providers, including those using open-source licenses, to publicly disclose information about their training data. + +--- + +## EU AI Act & Explanatory template mappings + +This section provides mappings of the EU AI Act's written and templated requirements to sections of this guide that show how CycloneDX can accommodate these requirements. + +### Article 53: Obligations for Providers of General-Purpose AI Models + +This section contains mappings to guide sections along with commentary for the EU AI Act [Article 53: Obligations for Providers of General-Purpose AI Models](https://artificialintelligenceact.eu/article/53/), which is part of [Chapter V: General-Purpose AI Models](https://artificialintelligenceact.eu/chapter/5/). + +#### Article 53 mappings + +| Section | Text | Guide references & commentary | +| --- | --- | --- | +| 1. 
| Providers of general-purpose AI models shall: | N/A | +| 1.(a) | draw up and keep up-to-date the technical documentation of the model, including its training and testing process and the results of its evaluation, which shall contain, at a minimum, the information set out in [Annex XI](https://artificialintelligenceact.eu/annex/11/) for the purpose of providing it, upon request, to the AI Office and the national competent authorities; | See [Annex XI: mappings](#annex-xi-mappings) | +| 1.(b) | draw up, keep up-to-date and make available information and documentation to providers of AI systems who intend to integrate the general-purpose AI model into their AI systems. Without prejudice to the need to observe and protect intellectual property rights and confidential business information or trade secrets in accordance with Union and national law, the information and documentation shall: | This effectively describes the AI/ML BOM document in its entirety. | +| 1.(b).(i) | enable providers of AI systems to have a good understanding of the capabilities and limitations of the general-purpose AI model and to comply with their obligations pursuant to this Regulation; and | [Model design considerations](0x24-Design-Model-Card-Considerations.md#model-design-considerations) | +| 1.(b).(ii) |contain, at a minimum, the elements set out in [Annex XII](https://artificialintelligenceact.eu/annex/12/); | Annex XII: "Technical Documentation for Providers of General-Purpose AI Models to Downstream Providers that Integrate the Model into Their AI System"
**Note**: CycloneDX can fully describe an AI/ML model that is part of, or used by, an application or service via contextual or referential inclusion as components in a Software Bill-of-Materials (SBOM) or Software-as-a-Service Bill-of-Materials (SaaSBOM). | +| 1.(c) | put in place a policy to comply with Union law on copyright and related rights, and in particular to identify and comply with, including through state-of-the-art technologies, a reservation of rights expressed pursuant to Article 4(3) of [Directive (EU) 2019/790](https://eur-lex.europa.eu/eli/dir/2019/790/oj/eng); | **Intention**: *With Article 4 in Directive 2019/790 (CDSMD), the European Union legislator intended to both encourage innovation and to provide more legal certainty for text and data mining (TDM) activities.*
• See commentary: [Oxford: Journal of Intellectual Property Law & Practice - "The text and data mining opt-out in Article 4(3) CDSMD: Adequate veto right for rightholders or a suffocating blanket for European artificial intelligence innovations?"](https://academic.oup.com/jiplp/article/19/5/453/7614898)

CycloneDX enables various methods of conveying non-normative and legal information. Primarily, this is accomplished via `externalReferences`, component `properties`, as well as through explicit `licenses` objects and `copyright` fields. | +| 1.(d) | draw up and make publicly available a sufficiently detailed summary about the content used for training of the general-purpose AI model, according to a template provided by the AI Office. | The CycloneDX model card's parameter object allows for a description of the model's Training [Approach](0x22-Design-Model-Card-Parameters.md#approach).

**Note**: CycloneDX, as a Bill-of-Materials standard, accounts for each dataset as its own, fully described, component. See [Declaring Datasets](0x22-Design-Model-Card-Parameters.md#declaring-datasets). | +| 2. | The obligations set out in paragraph 1, points (a) and (b), shall not apply to providers of AI models that are released under a free and open-source licence that allows for the access, usage, modification, and distribution of the model, and whose parameters, including the weights, the information on the model architecture, and the information on model usage, are made publicly available. This exception shall not apply to general-purpose AI models with systemic risks. | N/A | +| 3. | Providers of general-purpose AI models shall cooperate as necessary with the Commission and the national competent authorities in the exercise of their competences and powers pursuant to this Regulation. | N/A | +| 4. | Providers of general-purpose AI models may rely on codes of practice within the meaning of [Article 56](https://artificialintelligenceact.eu/article/56/) to demonstrate compliance with the obligations set out in paragraph 1 of this Article, until a harmonised standard is published. Compliance with European harmonised standards grants providers the presumption of conformity to the extent that those standards cover those obligations. Providers of general-purpose AI models who do not adhere to an approved code of practice or do not comply with a European harmonised standard shall demonstrate alternative adequate means of compliance for assessment by the Commission. | In short, Article 56 references the further creation of new "Codes of Practice" that provide:
• a means to ensure that the information referred to in Article 53 is kept up to date;
• an adequate level of detail for the summary about the content used for training;
• identification of the type and nature of the systemic risks at Union level, including their sources, where appropriate;
• measures, procedures and modalities for the assessment and management of the systemic risks at Union level, including the documentation thereof, which shall be proportionate to the risks.

As these codes are developed, future revisions of this guide will provide updates to facilitate compliance using CycloneDX. | +| 5. | For the purpose of facilitating compliance with [Annex XI](https://artificialintelligenceact.eu/annex/11/), in particular points 2 (d) and (e) thereof, the Commission is empowered to adopt delegated acts in accordance with [Article 97](https://artificialintelligenceact.eu/article/97/) to detail measurement and calculation methodologies with a view to allowing for comparable and verifiable documentation. | In short, Article 97 states that *"The power to adopt delegated acts is conferred on the Commission subject to the conditions laid down in this Article"* and that such acts *"enter into force only if no objection has been expressed by either the European Parliament or the Council within a period of three months of notification"*.

As any additional "delegated acts" are developed, future revisions of this guide will provide updates to facilitate compliance using CycloneDX. | +| 6. | The Commission is empowered to adopt delegated acts in accordance with [Article 97](https://artificialintelligenceact.eu/article/97/)(2) to amend [Annexes XI](https://artificialintelligenceact.eu/annex/11) and [XII](https://artificialintelligenceact.eu/annex/12) in light of evolving technological developments. | *See commentary provided for paragraph 5.* | +| 7. | Any information or documentation obtained pursuant to this Article, including trade secrets, shall be treated in accordance with the confidentiality obligations set out in [Article 78](https://artificialintelligenceact.eu/article/78/). | In short, Article 78 provides assurances to providers that *"the intellectual property rights and confidential business information or trade secrets of a natural or legal person, including source code"* will remain confidential and protected by law when handled by regulators and officials.

The CycloneDX Bills-of-Materials (BOMs) format can be used to convey any information a provider chooses to encode subject to their legal discretion. | + +--- + +### ANNEX XI: Technical Documentation Referred to in Article 53(1), Point (a) – Technical Documentation for Providers of General-Purpose AI Models + +This section contains mappings for [ANNEX XI: Technical Documentation Referred to in Article 53](https://artificialintelligenceact.eu/annex/11/). + +#### Annex XI mappings + +| Section | Section text | Guide references | Relevant schema (v1.7) | +| --- | --- | --- | --- | +| 1 | Information to be provided by all providers of general-purpose AI models
The technical documentation referred to in [Article 53](https://artificialintelligenceact.eu/article/53) (1), point (a) shall contain at least the following information as appropriate to the size and risk profile of the model: | N/A | N/A | +| 1.1 | A general description of the general-purpose AI model including: | CycloneDX describes models as components:
• [Declaring ML models](0x20-Design-Model-Component-Metadata.md#declaring-ml-models) | • [metadata:component.](https://cyclonedx.org/docs/1.7/json/#metadata_component)
   ▪ [type](https://cyclonedx.org/docs/1.7/json/#metadata_tools_oneOf_i0_components_items_type): `"machine-learning-model"`
   ▪ [name](https://cyclonedx.org/docs/1.7/json/#metadata_tools_oneOf_i0_components_items_name)
   ▪ [version](https://cyclonedx.org/docs/1.7/json/#metadata_tools_oneOf_i0_components_items_version)
   ▪ [description](https://cyclonedx.org/docs/1.7/json/#metadata_tools_oneOf_i0_components_items_description)
   ▪ [supplier](https://cyclonedx.org/docs/1.7/json/#metadata_tools_oneOf_i0_components_items_supplier)
   ▪ [manufacturer](https://cyclonedx.org/docs/1.7/json/#metadata_tools_oneOf_i0_components_items_manufacturer)
   ▪ Component [publisher](https://cyclonedx.org/docs/1.7/json/#metadata_tools_oneOf_i0_components_items_publisher) | +| 1.1.(a) | the tasks that the model is intended to perform and the type and nature of AI systems in which it can be integrated; | Use cases and users:
• [Considerations: Users & use cases](0x24-Design-Model-Card-Considerations.md#users--use-cases) | • [metadata.component.modelCard.](https://cyclonedx.org/docs/1.7/json/#metadata_tools_oneOf_i0_components_items_modelCard)
   ▪ [considerations.](https://cyclonedx.org/docs/1.7/json/#metadata_tools_oneOf_i0_components_items_modelCard_considerations)
     ▪ [users](https://cyclonedx.org/docs/1.7/json/#metadata_tools_oneOf_i0_components_items_modelCard_considerations_users)
     ▪ [useCases](https://cyclonedx.org/docs/1.7/json/#metadata_tools_oneOf_i0_components_items_modelCard_considerations_useCases) | +| 1.1.(b) | the acceptable use policies applicable; | Use policies:
• [Providing a model's usage policy](0x40-Design-Additional-Model-Information.md#providing-a-models-usage-policy)
_- See example for the Qwen model._| Usage policies can be provided as a CycloneDX external reference.
•  [metadata.component.externalReferences](https://cyclonedx.org/docs/1.7/json/#metadata_tools_oneOf_i0_components_items_externalReferences)

**Note**: multiple references to published usage policies can be provided. | +| 1.1.(c) | the date of release and methods of distribution; | Release information:
• [Providing model release notes](0x20-Design-Model-Component-Metadata.md#providing-model-release-notes) | Release dates and distribution methods are provided using the `releaseNotes` fields in the model's component:
• [metadata.component.releaseNotes.](https://cyclonedx.org/docs/1.7/json/#metadata_tools_oneOf_i0_components_items_releaseNotes)
   ▪ [type]()
   ▪ [description]()
   ▪ [timestamp]()
   ▪ [notes]()
   ▪ etc.

**Note:** *Components support multiple release notes for the associated model/version.* | +| 1.1.(d) | the architecture and number of parameters; | Model architecture:
• [Architecture family](#architecture-family)
• [Model architecture](#model-architecture)
| • [metadata.component.modelCard.](https://cyclonedx.org/docs/1.7/json/#metadata_tools_oneOf_i0_components_items_modelCard)
   • [modelParameters.](https://cyclonedx.org/docs/1.7/json/#metadata_tools_oneOf_i0_components_items_modelCard_modelParameters)
       ▪ [architectureFamily](https://cyclonedx.org/docs/1.7/json/#metadata_tools_oneOf_i0_components_items_modelCard_modelParameters_architectureFamily)
       ▪ [modelArchitecture](https://cyclonedx.org/docs/1.7/json/#metadata_tools_oneOf_i0_components_items_modelCard_modelParameters_modelArchitecture) | +| 1.1.(e) | the modality (e.g. text, image) and format of inputs and outputs; | Modality/modalities:
• [Declaring a model's modalities](0x40-Design-Additional-Model-Information.md#declaring-a-models-modalities)

Inputs & Outputs
• [Inputs & Outputs](0x22-Design-Model-Card-Parameters.md#inputs--outputs) | Inputs & Outputs:
• [modelCard.modelParameters.](https://cyclonedx.org/docs/1.7/json/#metadata_tools_oneOf_i0_components_items_modelCard_modelParameters)
   ▪ [inputs](https://cyclonedx.org/docs/1.7/json/#metadata_tools_oneOf_i0_components_items_modelCard_modelParameters_inputs)
   ▪ [outputs](https://cyclonedx.org/docs/1.7/json/#metadata_tools_oneOf_i0_components_items_modelCard_modelParameters_outputs)
Modality/modalities:
• [metadata.component.properties](https://cyclonedx.org/docs/1.7/json/#metadata_properties) | +| 1.1.(f) | the licence. | Component license:
• [Describing models as components](0x20-Design-Model-Component-Metadata.md#describing-models-as-components)
   ▪ _See [Example: Declaring an ML model in an ML-BOM](0x20-Design-Model-Component-Metadata.md#example-declaring-an-ml-model-in-an-ml-bom) which uses the CycloneDX `license` object._ | CycloneDX provides multiple, robust options for recording license information:
• [metadata.licenses](https://cyclonedx.org/docs/1.7/json/#metadata_licenses) | +| 1.2 | A detailed description of the elements of the model referred to in point 1, and relevant information of the process for the development, including the following elements: | N/A | N/A | +| 1.2.(a) | the technical means (e.g. instructions of use, infrastructure, tools) required for the general-purpose AI model to be integrated in AI systems; | Detailing inference workflows, tasks, steps and resources to be used in testing and production:
• [Including manufacturing information for the ML model](0x40-Design-Additional-Model-Information.md#including-manufacturing-information-for-the-ml-model)
• [Declaring the runtime topology](0x40-Design-Additional-Model-Information.md#declaring-the-runtime-topology) | Detailed inference and testing workflows:
• [formulation.](https://cyclonedx.org/docs/1.7/json/#formulation)
   ▪ [workflows](https://cyclonedx.org/docs/1.7/json/#formulation_items_workflows)
   ▪ [tasks](https://cyclonedx.org/docs/1.7/json/#formulation_items_workflows)
   ▪ [steps](https://cyclonedx.org/docs/1.7/json/#formulation_items_workflows_items_tasks_items_steps)
   ▪ [runtimeTopology](https://cyclonedx.org/docs/1.7/json/#formulation_items_workflows_items_tasks_items_runtimeTopology)

**Note**: The CycloneDX `formulation` object can be used to convey `workflows` for such things as `inference` in the context of various target frameworks or their runtime topologies. | +| 1.2.(b) | the design specifications of the model and training process, including training methodologies and techniques, the key design choices including the rationale and assumptions made; what the model is designed to optimise for and the relevance of the different parameters, as applicable; | Considerations when designing the model:
• [Technical limitations](#technical-limitations)
• [Performance tradeoffs](#performance-tradeoffs)
• [Ethical considerations](#ethical-considerations)

Describing design, data preparation and training workflows along with detailed tasks, steps and resources:
• [Including manufacturing information for the ML model](0x40-Design-Additional-Model-Information.md#including-manufacturing-information-for-the-ml-model)
• [Declaring hardware and software training components](0x40-Design-Additional-Model-Information.md#declaring-hardware-and-software-training-components)
   ▪ [Providing training workflow details](0x40-Design-Additional-Model-Information.md#providing-training-workflow-details)
   ▪ [Declaring the runtime topology](0x40-Design-Additional-Model-Information.md#declaring-the-runtime-topology) | Considerations:
• [metadata.component.modelCard.](https://cyclonedx.org/docs/1.7/json/#metadata_tools_oneOf_i0_components_items_modelCard)
   ▪ [considerations.](https://cyclonedx.org/docs/1.7/json/#metadata_tools_oneOf_i0_components_items_modelCard_considerations)
     ▪ [technicalLimitations](https://cyclonedx.org/docs/1.7/json/#metadata_tools_oneOf_i0_components_items_modelCard_considerations_technicalLimitations)
     ▪ [performanceTradeoffs](https://cyclonedx.org/docs/1.7/json/#metadata_tools_oneOf_i0_components_items_modelCard_considerations_performanceTradeoffs)
     ▪ [ethicalConsiderations](https://cyclonedx.org/docs/1.7/json/#metadata_tools_oneOf_i0_components_items_modelCard_considerations_ethicalConsiderations)

Detailed workflows:
• [formulation.](https://cyclonedx.org/docs/1.7/json/#formulation)
   ▪ [workflows](https://cyclonedx.org/docs/1.7/json/#formulation_items_workflows)
   ▪ [tasks](https://cyclonedx.org/docs/1.7/json/#formulation_items_workflows_items_tasks)
   ▪ [steps](https://cyclonedx.org/docs/1.7/json/#formulation_items_workflows_items_tasks_items_steps)
   ▪ [runtimeTopology](https://cyclonedx.org/docs/1.7/json/#formulation_items_workflows_items_tasks_items_runtimeTopology)

**Note**: CycloneDX v2.0 will have extensible workflow `taskTypes` that will include an AI/ML taxonomy with values for such things as `training` or `fine-tuning`. | +| 1.2.(c) | information on the data used for training, testing and validation, where applicable, including the type and provenance of data and curation methodologies (e.g. cleaning, filtering, etc.), the number of data points, their scope and main characteristics; how the data was obtained and selected as well as all other measures to detect the unsuitability of data sources and methods to detect identifiable biases, where applicable; | Data or dataset declaration, provenance and pedigree:
• [Datasets](0x22-Design-Model-Card-Parameters.md#datasets)
   ▪ [Declaring datasets](0x22-Design-Model-Card-Parameters.md#declaring-datasets)
     ▪ [Datasets as data component references](0x22-Design-Model-Card-Parameters.md#datasets-as-data-component-references)
     ▪ [Datasets as in-line information](0x22-Design-Model-Card-Parameters.md#datasets-as-in-line-information) | Detailed data preparation workflows:
• [formulation.](https://cyclonedx.org/docs/1.7/json/#formulation)
   ▪ [workflows](https://cyclonedx.org/docs/1.7/json/#formulation_items_workflows)
   ▪ [tasks](https://cyclonedx.org/docs/1.7/json/#formulation_items_workflows_items_tasks)
   ▪ [steps](https://cyclonedx.org/docs/1.7/json/#formulation_items_workflows_items_tasks_items_steps)
   ▪ [runtimeTopology](https://cyclonedx.org/docs/1.7/json/#formulation_items_workflows_items_tasks_items_runtimeTopology) | +| 1.2.(d) | the computational resources used to train the model (e.g. number of floating point operations), training time, and other relevant details related to the training; | Detailing training workflows, tasks, steps and resources:
• [Including manufacturing information for the ML model](0x40-Design-Additional-Model-Information.md#including-manufacturing-information-for-the-ml-model)
   ▪ [Declaring hardware and software training components](0x40-Design-Additional-Model-Information.md#declaring-hardware-and-software-training-components)
   ▪ [Providing training workflow details](0x40-Design-Additional-Model-Information.md#providing-training-workflow-details)
   ▪ [Declaring the runtime topology](0x40-Design-Additional-Model-Information.md#declaring-the-runtime-topology) | Detailed training workflows:
• [formulation.](https://cyclonedx.org/docs/1.7/json/#formulation)
   ▪ [workflows](https://cyclonedx.org/docs/1.7/json/#formulation_items_workflows)
   ▪ [tasks](https://cyclonedx.org/docs/1.7/json/#formulation_items_workflows_items_tasks)
   ▪ [steps](https://cyclonedx.org/docs/1.7/json/#formulation_items_workflows_items_tasks_items_steps)
   ▪ [runtimeTopology](https://cyclonedx.org/docs/1.7/json/#formulation_items_workflows_items_tasks_items_runtimeTopology)

**Note**: CycloneDX v2.0 will have extensible workflow `taskTypes` that will include an AI/ML taxonomy with values for such things as `training` or `fine-tuning`. | +| 1.2.(e) | known or estimated energy consumption of the model. With regard to point (e), where the energy consumption of the model is unknown, the energy consumption may be based on information about computational resources used. | Per-activity energy consumptions, energy provider information, CO2 costs and CO2 cost offsets:
• [Energy Consumptions](0x24-Design-Model-Card-Considerations.md#energy-consumptions) | • [Environmental Considerations](https://cyclonedx.org/docs/1.7/json/#metadata_tools_oneOf_i0_components_items_modelCard_considerations_environmentalConsiderations)
   ▪ [Energy Consumptions](https://cyclonedx.org/docs/1.7/json/#metadata_tools_oneOf_i0_components_items_modelCard_considerations_environmentalConsiderations_energyConsumptions)
     ▪ [activity](https://cyclonedx.org/docs/1.7/json/#metadata_tools_oneOf_i0_components_items_modelCard_considerations_environmentalConsiderations_energyConsumptions_items_activity)
     ▪ [energyProviders](https://cyclonedx.org/docs/1.7/json/#metadata_tools_oneOf_i0_components_items_modelCard_considerations_environmentalConsiderations_energyConsumptions_items_energyProviders)
     ▪ [activityEnergyCost](https://cyclonedx.org/docs/1.7/json/#metadata_tools_oneOf_i0_components_items_modelCard_considerations_environmentalConsiderations_energyConsumptions_items_activityEnergyCost)
     ▪ [co2CostEquivalent](https://cyclonedx.org/docs/1.7/json/#metadata_tools_oneOf_i0_components_items_modelCard_considerations_environmentalConsiderations_energyConsumptions_items_co2CostEquivalent)
     ▪ [co2CostOffset](https://cyclonedx.org/docs/1.7/json/#metadata_tools_oneOf_i0_components_items_modelCard_considerations_environmentalConsiderations_energyConsumptions_items_co2CostOffset)

**Note**: _Energy consumptions can be reported on a per-activity basis (e.g., `data-collection`, `training`, `fine-tuning`, etc.) and can correspond to declared workflows._| +| 2 | Additional information to be provided by providers of general-purpose AI models with systemic risk | N/A | N/A | +| 2.1 | A detailed description of the evaluation strategies, including evaluation results, on the basis of available public evaluation protocols and tools or otherwise of other evaluation methodologies. Evaluation strategies shall include evaluation criteria, metrics and the methodology on the identification of limitations. | Describing and recording results for performance (evaluation) tests:
• [Model quantitative analysis](0x23-Design-Model-Card-Quantitative-Analysis.md#model-quantitative-analysis)
   ▪ [Performance Metrics](0x23-Design-Model-Card-Quantitative-Analysis.md#performance-metrics)
   ▪ [Graphics](0x23-Design-Model-Card-Quantitative-Analysis.md#graphics) | • [metadata.component.modelCard.](https://cyclonedx.org/docs/1.7/json/#metadata_tools_oneOf_i0_components_items_modelCard)
   ▪ [quantitativeAnalysis.](https://cyclonedx.org/docs/1.7/json/#metadata_tools_oneOf_i0_components_items_modelCard_quantitativeAnalysis)
     ▪ [performanceMetrics](https://cyclonedx.org/docs/1.7/json/#metadata_tools_oneOf_i0_components_items_modelCard_quantitativeAnalysis_performanceMetrics)
     ▪ [graphics](https://cyclonedx.org/docs/1.7/json/#metadata_tools_oneOf_i0_components_items_modelCard_quantitativeAnalysis_graphics)

**Note:** Evaluation processes can be encoded as [formulation](https://cyclonedx.org/docs/1.7/json/#formulation) [workflows](https://cyclonedx.org/docs/1.7/json/#formulation_items_workflows), along with their constituent `tasks`, `steps`, `tools`, `inputs`, `outputs` and more. | +| 2.2 | Where applicable, a detailed description of the measures put in place for the purpose of conducting internal and/or external adversarial testing (e.g. red teaming), model adaptations, including alignment and fine-tuning. | "Ethical considerations" and "Fairness assessments" can be documented as shown in these sections:
• [Fairness assessments](0x24-Design-Model-Card-Considerations.md#fairness-assessments) | • [metadata.component.modelCard.](https://cyclonedx.org/docs/1.7/json/#metadata_tools_oneOf_i0_components_items_modelCard)
   ▪ [considerations.](https://cyclonedx.org/docs/1.7/json/#metadata_tools_oneOf_i0_components_items_modelCard_considerations)
     • [fairnessAssessments](https://cyclonedx.org/docs/1.7/json/#metadata_tools_oneOf_i0_components_items_modelCard_considerations_fairnessAssessments) | +| 2.3 | Where applicable, a detailed description of the system architecture explaining how software components build or feed into each other and integrate into the overall processing. | The composition of model components, including data:
• [Declaring ML Models](#declaring-ml-models)
   • [Describing models as components](#describing-models-as-components)
   • [Model repositories as components](#model-repositories-as-components)
   • [Describing a model repository as a CycloneDX assembly](#describing-a-model-repository-as-a-cyclonedx-assembly) | Component and service relationships:
• [compositions.](https://cyclonedx.org/docs/1.7/json/#compositions) (nested relationships)
   • [assemblies](https://cyclonedx.org/docs/1.7/json/#compositions_items_assemblies)
   • [dependencies](https://cyclonedx.org/docs/1.7/json/#compositions_items_dependencies) (required, non-transitive relationships)

Hierarchical (nested) relationships (for assemblies):
• [metadata.component.components](https://cyclonedx.org/docs/1.7/json/#metadata_tools_oneOf_i0_components_items_components)
• [components.components](https://cyclonedx.org/docs/1.7/json/#metadata_tools_oneOf_i0_components_items_components)
• [services.services](https://cyclonedx.org/docs/1.7/json/#metadata_tools_oneOf_i0_services_items_services)

Direct dependencies:
• [dependencies](https://cyclonedx.org/docs/1.7/json/#dependencies)

Process and data flows:
• [formulation.](https://cyclonedx.org/docs/1.7/json/#formulation)
   ▪ [workflows](https://cyclonedx.org/docs/1.7/json/#formulation_items_workflows)
   ▪ [tasks](https://cyclonedx.org/docs/1.7/json/#formulation_items_workflows_items_tasks)
   ▪ [runtimeTopology](https://cyclonedx.org/docs/1.7/json/#formulation_items_workflows_items_runtimeTopology)

**Note**: _When models are incorporated into hardware and software systems, CycloneDX supports declaring full dependency relationships as well as detailing service and data processing workflows._ | + +--- + +### Annex: Template for the Public Summary of Training Content for General-Purpose AI models required by Article 53 + +This section provides mappings for the _"Explanatory Notice"_ and _"Template for the Public Summary of Training Content"_ which seek to address relevant legal text from [Article 53](https://artificialintelligenceact.eu/article/53/)(1)(d) of the AI Act: + +_Providers of general-purpose AI models shall […] draw up and make publicly available a sufficiently detailed summary about the content used for training of the general-purpose AI model, according to a template provided by the AI Office._ + +As well as [Recital 107](https://artificialintelligenceact.eu/recital/107/) of the AI Act: + +_In order to increase transparency on the data that is used in the pre-training and training of general-purpose AI models, including text and data protected by copyright law, it is adequate that providers of such models draw up and make publicly available a sufficiently detailed summary of the content used for training the general-purpose AI model._ + +#### Template mapping notes + +Subsections under Section 2, _"List of data sources"_, require similar information about data or datasets and their collection processes using different techniques and from different sources. Therefore, much of the _"Guide references"_ and _"CycloneDX Commentary"_ text will be similar across the following subsections: + +* 2.1, Publicly available datasets +* 2.2, Private non-publicly available datasets obtained from third parties +* 2.3, Data crawled and scraped from online sources +* 2.4, User data +* 2.5, Synthetic data +* 2.6, Other data sources + +#### Template mappings + +| Section | Text | Guide references | CycloneDX Commentary | +| --- | --- | --- | --- | +| 1. 
| General information | See [Annex XI, Section 1.1](#annex-xi-mappings),
• [Declaring ML models](0x20-Design-Model-Component-Metadata.md#declaring-ml-models) | The majority of this information would be provided within the CycloneDX [metadata.component](https://cyclonedx.org/docs/1.7/json/#metadata_component) for the model. | +| 1.1 | Provider identification | See [Annex XI, Section 1.1](#annex-xi-mappings),
• [Declaring ML models](0x20-Design-Model-Component-Metadata.md#declaring-ml-models) | Manufacturer, supplier and publisher information can be provided within the model's metadata:
• [manufacturer](https://cyclonedx.org/docs/1.7/json/#metadata_tools_oneOf_i0_components_items_manufacturer) - _The organization that built or created the model._
• [supplier](https://cyclonedx.org/docs/1.7/json/#metadata_tools_oneOf_i0_components_items_supplier) - _The organization that supplied the model for use_
• [publisher](https://cyclonedx.org/docs/1.7/json/#metadata_tools_oneOf_i0_components_items_publisher) - _The organization that published the model_ | +| 1.1.(i) | Provider name and contact details | See template mapping section 1.1 (above) | Both the [manufacturer](https://cyclonedx.org/docs/1.7/json/#metadata_tools_oneOf_i0_components_items_manufacturer) and [supplier](https://cyclonedx.org/docs/1.7/json/#metadata_tools_oneOf_i0_components_items_supplier) information includes:
• name, address, URL and an array of detailed [contact](https://cyclonedx.org/docs/1.7/json/#metadata_tools_oneOf_i0_components_items_supplier_contact) entries, which accommodates multiple points of contact.

Component [publisher](https://cyclonedx.org/docs/1.7/json/#metadata_tools_oneOf_i0_components_items_publisher) information supports a textual description. | +| 1.1.(ii) | Authorised representative name and contact details | See template mapping section 1.1.(i) (above) | Each [contact](https://cyclonedx.org/docs/1.7/json/#metadata_tools_oneOf_i0_components_items_supplier_contact) entry includes:
• name, email address and phone | +| 1.2 | Model identification | A discussion of model identifiers with examples:
• [Model identifiers](0x20-Design-Model-Component-Metadata.md#model-identifiers)| The model component information includes support for the [Package-URL (PURL) Specification](https://github.com/package-url/purl-spec), which provides a syntax for identifying a model from various source repositories:
• [metadata.component.](https://cyclonedx.org/docs/1.7/json/#metadata_component)
   ▪ [purl](https://cyclonedx.org/docs/1.7/json/#metadata_tools_oneOf_i0_components_items_purl)

**Note**: _In addition, CycloneDX provides a means to supply identifiers from registered proprietary or other publication sources via the [CycloneDX property taxonomy](https://github.com/CycloneDX/cyclonedx-property-taxonomy)._ | +| 1.2.(i) | Versioned model name(s) | A model's name, identifiers and version are considered unique attributes of the model component. These are provided in the [metadata.component](https://cyclonedx.org/docs/1.7/json/#metadata_component) field as shown here:
• [Describing models as components](0x20-Design-Model-Component-Metadata.md#describing-models-as-components)
• [Model identifiers](0x20-Design-Model-Component-Metadata.md#model-identifiers)

If a model is derived from another model, that relationship would be described by pedigree:
• [Declaring a model's pedigree](0x20-Design-Model-Component-Metadata.md#declaring-a-models-pedigree) | Model name, version and identifiers:
• [metadata.component](https://cyclonedx.org/docs/1.7/json/#metadata_component)

Model pedigree:
• [component.pedigree](https://cyclonedx.org/docs/1.7/json/#metadata_tools_oneOf_i0_components_items_pedigree)

**Note**: _An AI/ML Bill-of-Materials is intended to represent a single identifiable model. Each separate version published, or supplied from a different source, should be represented by its own unique AI/ML-BOM, which can capture its pedigree, including any changes from the original model, by referencing the original's AI/ML BOM via pedigree fields._ | +| 1.2.(ii) | Model dependencies | A model's dependencies need to include both the datasets used for training and/or fine-tuning and any tokenizers, templates and configurations. How to account for these resources is shown, respectively, in the following sections:
• [Model repositories as components](0x20-Design-Model-Component-Metadata.md#model-repositories-as-components)
• [Declaring datasets](0x22-Design-Model-Card-Parameters.md#declaring-datasets)
   ▪ [Datasets as data component references](0x22-Design-Model-Card-Parameters.md#datasets-as-data-component-references) | Model component and service compositions (e.g., datasets, tensor data, tokenizers, configurations, etc.):
• [compositions.](https://cyclonedx.org/docs/1.7/json/#compositions) (nested relationships)
   • [assemblies](https://cyclonedx.org/docs/1.7/json/#compositions_items_assemblies)
   • [dependencies](https://cyclonedx.org/docs/1.7/json/#compositions_items_dependencies) (required, non-transitive relationships)

Hierarchical (nested) relationships (for assemblies):
• [metadata.component.components](https://cyclonedx.org/docs/1.7/json/#metadata_tools_oneOf_i0_components_items_components)
• [components.components](https://cyclonedx.org/docs/1.7/json/#metadata_tools_oneOf_i0_components_items_components)
• [services.services](https://cyclonedx.org/docs/1.7/json/#metadata_tools_oneOf_i0_services_items_services)

Direct dependencies:
• [dependencies](https://cyclonedx.org/docs/1.7/json/#dependencies)

Explicit declaration of model datasets used (using CycloneDX data component references):
• [modelCard.modelParameters.datasets](https://cyclonedx.org/docs/1.7/json/#metadata_tools_oneOf_i0_components_items_modelCard_modelParameters_datasets)

Process and data flows:
• [formulation.](https://cyclonedx.org/docs/1.7/json/#formulation)
   ▪ [workflows](https://cyclonedx.org/docs/1.7/json/#formulation_items_workflows)
   ▪ [tasks](https://cyclonedx.org/docs/1.7/json/#formulation_items_workflows_items_tasks)
   ▪ [runtimeTopology](https://cyclonedx.org/docs/1.7/json/#formulation_items_workflows_items_runtimeTopology) | +| 1.2.(iii) | Date of placement of the model on the Union market: | This information would be provided in the model's release notes:
• [Providing model release notes](0x20-Design-Model-Component-Metadata.md#providing-model-release-notes) | Release notes:
• [component.releaseNotes](https://cyclonedx.org/docs/1.7/json/#metadata_tools_oneOf_i0_components_items_releaseNotes)
| +| 1.3 | Modalities, overall training data size and other characteristics *(general information about the overall training data after pre-processing and before the training of the model)* | N/A | N/A | +| 1.3.(i) | Modality *(e.g., text, image, audio, video, other)* | Model modalities:
• [Declaring a model's modalities](0x40-Design-Additional-Model-Information.md#declaring-a-models-modalities)

**Note**: *Multi-model models should include modality information for each sub-model.* | Modalities:
• [metadata.component.properties](https://cyclonedx.org/docs/1.7/json/#metadata_properties)

**Note**: _Utilizes property values defined in the [CycloneDX Property Taxonomy for AI/ML](https://github.com/CycloneDX/cyclonedx-property-taxonomy/blob/main/cdx/ai-ml.md)_ | +| 1.3.(ii) | Training data size | The CycloneDX component can be used to describe a training dataset with any level of detail required. The following section describes the general method for declaring public and private datasets:
• [Declaring datasets](0x22-Design-Model-Card-Parameters.md#declaring-datasets)

Additionally, other types of information about each component dataset can be provided via various fields such as:
• pedigree
• external references to documentation
• properties _(customized for tagging information to domain-specific requirements such as data sizes)_. | Dataset component(s):
• [component](https://cyclonedx.org/docs/1.7/json/#components)
   ▪ [pedigree](https://cyclonedx.org/docs/1.7/json/#metadata_tools_oneOf_i0_components_items_pedigree)
   ▪ [externalReferences](https://cyclonedx.org/docs/1.7/json/#metadata_tools_oneOf_i0_components_items_externalReferences)
   ▪ [properties](https://cyclonedx.org/docs/1.7/json/#metadata_tools_oneOf_i0_components_items_properties)

**Note**: _Ideally, each dataset would have its own independent Bill-of-Materials that fully describes the details of its design, dependencies (i.e., data sources) and manufacturing, and that could be referenced by the AI/ML BOM._ | +| 1.3.(iii) | Types of content | A discrete description of the types of content used to train a model would be provided as CycloneDX data components.
• [Declaring datasets](0x22-Design-Model-Card-Parameters.md#declaring-datasets)

Additional content information can be provided via external documentation and referenced in the model's component declaration.
• [Providing links to papers & articles](0x22-Design-Model-Card-Parameters.md#providing-links-to-papers--articles) | Dataset components, their descriptions and external references to documentation:
• [component.](https://cyclonedx.org/docs/1.7/json/#components)
   ▪ [type](https://cyclonedx.org/docs/1.7/json/#metadata_tools_oneOf_i0_components_items_type): `"data"`
   ▪ [description](https://cyclonedx.org/docs/1.7/json/#metadata_tools_oneOf_i0_components_items_description)
   ▪ etc.

Model component's external references:
• [metadata.component.externalReferences](https://cyclonedx.org/docs/1.7/json/#metadata_tools_oneOf_i0_components_items_externalReferences) | +| 2 | List of data sources *(information about specific sources of data used to train the general-purpose AI model)* | N/A | N/A | +| 2.1 | Publicly available datasets | Each _public_ dataset used to train a model would be provided as a CycloneDX data component.
• [Declaring datasets](0x22-Design-Model-Card-Parameters.md#declaring-datasets)
   ▪ [Datasets as in-line information](0x22-Design-Model-Card-Parameters.md#datasets-as-in-line-information)
   ▪ [Datasets as data component references](0x22-Design-Model-Card-Parameters.md#datasets-as-data-component-references) | Dataset component(s):
• [component](https://cyclonedx.org/docs/1.7/json/#components)
   ▪ [type](https://cyclonedx.org/docs/1.7/json/#metadata_tools_oneOf_i0_components_items_type): `"data"`
   ▪ [name](https://cyclonedx.org/docs/1.7/json/#metadata_tools_oneOf_i0_components_items_name)
   ▪ [description](https://cyclonedx.org/docs/1.7/json/#metadata_tools_oneOf_i0_components_items_description)
   ▪ [pedigree](https://cyclonedx.org/docs/1.7/json/#metadata_tools_oneOf_i0_components_items_pedigree)
   ▪ [externalReferences](https://cyclonedx.org/docs/1.7/json/#metadata_tools_oneOf_i0_components_items_externalReferences)
   ▪ [properties](https://cyclonedx.org/docs/1.7/json/#metadata_tools_oneOf_i0_components_items_properties)
   ▪ etc.

**Note**: _Ideally, each public dataset would have its own independent Bill-of-Materials that fully describes the details of its design, dependencies (i.e., data sources) and manufacturing, and that could be referenced by the AI/ML BOM._ | +| 2.2 | Private non-publicly available datasets obtained from third parties | Private dataset information would be provided similarly to public datasets.

See references and commentary for _public_ data.
• See _"Guide references" and "CycloneDX Commentary"_ in [Annex: Template for the Public Summary of Training Content](#annex-template-for-the-public-summary-of-training-content-for-general-purpose-ai-models-required-by-article-53), Section 2.1 _"Publicly available datasets"_ (above) | _See referenced section._ | +| 2.2.1 | Datasets commercially licensed by rightsholders or their representatives | Commercial dataset information would be provided similarly to public datasets.

See references and commentary for _public_ data.
• See "Guide references" and "CycloneDX Commentary" in [Annex: Template for the Public Summary of Training Content](#annex-template-for-the-public-summary-of-training-content-for-general-purpose-ai-models-required-by-article-53), Section 2.1 _"Publicly available datasets"_ (above) | _See referenced section._ | +| 2.2.1.(i) | concluded transactional commercial licensing agreement (modalities covered by license) | License information would be provided in the CycloneDX data component:
• [Describing models as components](0x20-Design-Model-Component-Metadata.md#describing-models-as-components)
   ▪ See: _[Example: Declaring an ML model in an ML-BOM](0x20-Design-Model-Component-Metadata.md#example-declaring-an-ml-model-in-an-ml-bom)_ which uses the CycloneDX `license` object. | CycloneDX provides multiple, robust options for recording license information:
• [metadata.licenses](https://cyclonedx.org/docs/1.7/json/#metadata_licenses)

**Note**: *Modality-specific licensing may be addressed in future CycloneDX versions.* | +| 2.2.2 | Private datasets obtained from other third parties | Third-party, private dataset information would be provided similarly to public datasets.

See references and commentary for _public_ data.
• See "Guide references" and "CycloneDX Commentary" in [Annex: Template for the Public Summary of Training Content](#annex-template-for-the-public-summary-of-training-content-for-general-purpose-ai-models-required-by-article-53), Section 2.1 _"Publicly available datasets"_ (above) | _See referenced section._ | +| 2.2.2.(i) | Specify the modality(ies) of the content covered by the datasets concerned. | Model data component modalities are declared in the same way as for the model component itself:
• [Declaring a model's modalities](0x40-Design-Additional-Model-Information.md#declaring-a-models-modalities) | Data component modalities as properties:
• [component.properties](https://cyclonedx.org/docs/1.7/json/#metadata_tools_oneOf_i0_components_items_properties)

**Note**: _Utilizes property values defined in the [CycloneDX Property Taxonomy for AI/ML](https://github.com/CycloneDX/cyclonedx-property-taxonomy/blob/main/cdx/ai-ml.md)_ | +| 2.2.2.(ii) | If publicly known, list private datasets obtained from other third parties | Publicly known, third-party, private dataset information would be provided similarly to public datasets.

See references and commentary for _public_ data.
• See "Guide references" and "CycloneDX Commentary" in [Annex: Template for the Public Summary of Training Content](#annex-template-for-the-public-summary-of-training-content-for-general-purpose-ai-models-required-by-article-53), Section 2.1 _"Publicly available datasets"_ (above) | _See referenced section._ | +| 2.2.2.(iii) | General description of non-publicly known private datasets obtained from third parties | Non-publicly known, third-party, private dataset information would be provided similarly to public datasets.

See references and commentary for _public_ data.
• See "Guide references" and "CycloneDX Commentary" in [Annex: Template for the Public Summary of Training Content](#annex-template-for-the-public-summary-of-training-content-for-general-purpose-ai-models-required-by-article-53), Section 2.1 _"Publicly available datasets"_ (above) | _See referenced section._ | +| 2.2.2.(iv) | Additional comments *(optional)*

_e.g., the period of data collection, size of the datasets and further details_ | CycloneDX Bills-of-Materials (e.g., an AI/ML BOM) support `annotations` that allow for comments (made by people, organizations, or tools) about any object with a `bom-ref` such as `components`, `services` or the BOM itself. | Comments using annotations:
• [annotations](https://cyclonedx.org/docs/1.7/json/#annotations)
   ▪ [subjects](https://cyclonedx.org/docs/1.7/json/#annotations_items_subjects) - _list of references (e.g., components, services, etc.) the annotation applies to._
   ▪ [annotator](https://cyclonedx.org/docs/1.7/json/#annotations_items_annotator) - _The organization, person, component, or service which created the textual content of the annotation._
   ▪ [timestamp](https://cyclonedx.org/docs/1.7/json/#annotations_items_timestamp)
   ▪ [text](https://cyclonedx.org/docs/1.7/json/#annotations_items_text) - _The textual content of the annotation._
   ▪ [signature](https://cyclonedx.org/docs/1.7/json/#annotations_items_signature) _(optional)_ - _digital signature of the signer._

**Note**: _Each annotation can optionally have its own unique `bom-ref` which allows reference from other annotations._ | +| 2.3 | Data crawled and scraped from online sources *(excluding publicly available datasets already compiled by third parties and made available on platforms such as common crawl that are covered under Section 2.1)*

_The following subsections only apply if "crawlers were used for data collection"._ | Although this guide does not provide specific examples for describing data collection workflows using _data crawling_ techniques, the _training_ workflow example can be used to extrapolate how this may be done using CycloneDX. | Any data collection processes would be described using CycloneDX workflows:
• [formulation.](https://cyclonedx.org/docs/1.7/json/#formulation)
   ▪ [workflows](https://cyclonedx.org/docs/1.7/json/#formulation_items_workflows)
   ▪ [tasks](https://cyclonedx.org/docs/1.7/json/#formulation_items_workflows_items_tasks)
   ▪ [runtimeTopology](https://cyclonedx.org/docs/1.7/json/#formulation_items_workflows_items_runtimeTopology)

**Note**: _As a best practice, the data collection processes that produced the named dataset would be captured in a Manufacturing Bill-of-Materials (MBOM), which could be referenced by the AI/ML BOM of any model that used that dataset for training or fine-tuning._ | +| 2.3.(i) | specify crawler name(s)/identifier(s) | The _data crawler_ would be declared as a CycloneDX component with its name and identifiers provided as described for a model component.

See applicable references in the following table sections (above):
• 1.2, "Model identification"
• 1.2.(i), "Versioned model name(s)" | N/A | +| 2.3.(ii) | Purposes of the crawler(s) | The crawler software would be declared as a CycloneDX `component` (or `service`) which can include its `description`, documentation via `externalReferences`, `properties` and `annotations`.

See section 2.1, "Publicly available datasets" (above) for how to include component information and section 2.2.2.(iv), "Additional comments" for `annotations`. | N/A | +| 2.3.(iii) | General description of crawler behaviour | See referenced methods for annotations in Section 2.3.(ii) (above) | N/A | +| 2.3.(iv) | Period of data collection | See referenced methods in Section 2.2.2.(iv) (above) | N/A | +| 2.3.(v) | Comprehensive description of the type of content and online sources crawled | Each content source the crawler software targeted would be declared as a CycloneDX `component` with the ability to describe the content names, identifiers and location.

See section 2.1, "Publicly available datasets" (above) for how to include component information. | N/A | +| 2.3.(vi) | Type of modality covered | Each content source can include modality `properties` as referenced in Section 2.2.2.(i) (above). | N/A | +| 2.3.(vii) | Summary of the most relevant domain names crawled | The BOM for the software crawler should have a detailed listing of all crawled sources represented as CycloneDX components (or services). This comprehensive information could be provided using the crawler's BOM itself, or that BOM could be used to produce a summary view. | N/A | +| 2.3.(viii) | Additional comments *(optional)*

_e.g., domain names, URLs and the sources of individual works_ | See Section 2.2.2.(iv) (above) | N/A | +| 2.4 | User data *(information about user data collected by all services and products of the provider, including through mail services, social media platforms, content platforms)*

_The following subsections only apply if user information sources were used._ | Although this guide does not provide specific examples for describing data collection workflows that target _user data_, the _training_ workflow example can be used to extrapolate how this may be done using CycloneDX. | Any data collection processes would be described using CycloneDX workflows:
• [formulation.](https://cyclonedx.org/docs/1.7/json/#formulation)
   ▪ [workflows](https://cyclonedx.org/docs/1.7/json/#formulation_items_workflows)
   ▪ [tasks](https://cyclonedx.org/docs/1.7/json/#formulation_items_workflows_items_tasks)
   ▪ [runtimeTopology](https://cyclonedx.org/docs/1.7/json/#formulation_items_workflows_items_runtimeTopology)

**Note**: _As a best practice, the data collection processes that produced the named dataset would be captured in a Manufacturing Bill of Materials (MBOM), which could then be referenced by the AI/ML BOM of any model that used that dataset for training or fine-tuning._ |
+| 2.4.(i) | provide a general description of the provider’s services or products that were used to collect the user data | N/A | N/A |
+| 2.4.(ii) | Additional comments *(optional)* | N/A | N/A |
+| 2.5 | Synthetic data

_The following subsections only apply if synthetic information sources were used._ | N/A | N/A |
+| 2.5.(i) | modality of the synthetic data | N/A | N/A |
+| 2.5.(ii) | specify the general-purpose AI model(s) used to generate the synthetic data if available on the market | N/A | N/A |
+| 2.5.(iii) | Information about other AI models, including provider’s own AI model(s) not available on the market, used to generate synthetic data to train the model | N/A | N/A |
+| 2.5.(iv) | Additional comments *(optional)* | N/A | N/A |
+| 2.6 | Other sources of data

_The following subsections only apply if other information sources were used._ | N/A | N/A |
+| 2.6.(i) | provide a narrative description of these data sources and the data | N/A | N/A |
+| 2.6.(ii) | Additional comments *(optional)* | N/A | N/A |
+| 3 | Data processing aspects The following subsections only apply if synthetic information sources were used. | N/A | N/A |
+| 3.1 | Respect of reservation of rights from text and data mining exception or limitation | *(measures implemented by the provider to identify and comply with the reservation of rights from the text and data mining (TDM) exception or limitation expressed pursuant to Article 4(3))* | N/A |
+| 3.1.(i) | Additional comments *(optional)* | N/A | N/A |
+| 3.2 | Removal of illegal content | *measures taken to avoid or remove illegal content under Union law from the training data (such as blacklists, keywords, and model-based classifiers), without requiring disclosure of specific details about the provider’s internal business practices or trade secrets* | |
+| 3.3 | Other information *(optional)* | *Other relevant information about data processing* | |
diff --git a/ML-BOM/en/1.7_schema_example_v1.json b/ML-BOM/en/0x93_Appendix-C_Complete_Example.md
similarity index 95%
rename from ML-BOM/en/1.7_schema_example_v1.json
rename to ML-BOM/en/0x93_Appendix-C_Complete_Example.md
index 8cc0ac65..a0b32f22 100644
--- a/ML-BOM/en/1.7_schema_example_v1.json
+++ b/ML-BOM/en/0x93_Appendix-C_Complete_Example.md
@@ -1,3 +1,13 @@
+# Appendix C: Complete example
+
+This appendix includes a complete AI/ML BOM example that combines most of the isolated examples for the Qwen model shown throughout this guide.
+
+#### Example: Qwen-7B AI/ML BOM
+
+
+> **Note**: For brevity, the `formulation` entry for the model's training describes only the top-level `workflow` topology (i.e., the run-time "stack") and omits the `tasks` and `steps` that could also be detailed.
+
+```json
 {
   "$schema": "http://cyclonedx.org/schema/bom-1.7.schema.json",
   "bomFormat": "CycloneDX",
@@ -21,6 +31,30 @@
       "name": "Qwen/Qwen-7B",
       "version": "ef3c5c9c57b252f3149c1408daf4d649ec8b6c85",
       "description": "Qwen-7B is a Transformer-based large language model, which is pretrained on a large volume of data, including web texts, books, codes, etc.",
+      "licenses": [
+        {
+          "license": {
+            "name": "Tongyi Qianwen LICENSE AGREEMENT",
+            "text": {
+              "content": "By clicking to agree or by using or distributing any portion or element of the Tongyi Qianwen Materials, ..."
+            }
+          }
+        }
+      ],
+      "releaseNotes": [
+        {
+          "type": "major",
+          "title": "Qwen 7B initial release",
+          "timestamp": "2023-08-03T15:30:00Z",
+          "notes": [
+            {
+              "locale": "en-US",
+              "text": "United States (US), English release date."
+            }
+            // ...
+          ]
+        }
+      ],
       "externalReferences": [
         {
           "type": "vcs",
@@ -436,4 +470,5 @@
       ]
     }
   ]
-}
\ No newline at end of file
+}
+```
\ No newline at end of file