WIP: EU AI Act mapping to AI/ML BOM sections and examples #45
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -17,6 +17,7 @@ For convenience, here are links to the specific sections for each of those infor | |||||
| * [Describing models as components](#describing-models-as-components) | ||||||
| * [Model repositories as components](#model-repositories-as-components) | ||||||
| * [Model identifiers](#model-identifiers) | ||||||
| * [Providing model release notes](#providing-model-release-notes) | ||||||
| * [Describing a model repository as a CycloneDX assembly](#describing-a-model-repository-as-a-cyclonedx-assembly) | ||||||
| * [Declaring a model's pedigree](#declaring-a-models-pedigree) | ||||||
|
|
||||||
|
|
@@ -58,8 +59,18 @@ The CycloneDX JSON pseudocode below shows how an ML model would be declared as t | |||||
| "bom-ref": "pkg:huggingface/Qwen/Qwen-7B@ef3c5c9", | ||||||
| "purl": "pkg:huggingface/Qwen/Qwen-7B@ef3c5c9c57b252f3149c1408daf4d649ec8b6c85", | ||||||
| "version": "ef3c5c9c57b252f3149c1408daf4d649ec8b6c85", | ||||||
| "licenses": [ | ||||||
| { | ||||||
| "license": { | ||||||
| "name": "Tongyi Qianwen LICENSE AGREEMENT", | ||||||
| "text": { | ||||||
| "content": "By clicking to agree or by using or distributing any portion or element of the Tongyi Qianwen Materials, ..." | ||||||
| } | ||||||
| } | ||||||
| } | ||||||
| ] | ||||||
| // ... | ||||||
| } | ||||||
| }, | ||||||
| // ... | ||||||
| } | ||||||
| // ... | ||||||
|
|
@@ -69,6 +80,7 @@ The CycloneDX JSON pseudocode below shows how an ML model would be declared as t | |||||
| ###### Field discussion | ||||||
|
|
||||||
| * **bom-ref** - Please note that the `bom-ref` value includes only the first seven characters of the full hash value from the `purl` component identifier, which is sufficient for local identification within the BOM itself. | ||||||
| * **license** - The `licenses` entry shown in the example is a "custom" license for which, in this case, we chose to provide the unencoded license text. When possible, it is preferable to use an SPDX license identifier and supply it in the `id` field of the `license` (e.g., `"license": { "id": "Apache-2.0" }`). | ||||||
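The two conventions in the field discussion above can be sketched in Python. This is only an illustration under stated assumptions: the helper names `short_bom_ref` and `license_entry` are hypothetical, and the `purl` is assumed to end in `@<full commit hash>` with no qualifiers or subpath, as in the example.

```python
from typing import Optional


def short_bom_ref(purl: str, length: int = 7) -> str:
    """Shorten the version (hash) part of a purl for use as a local bom-ref.

    Assumption: the purl ends in '@<version>' with no '?qualifiers' or '#subpath'.
    """
    base, sep, version = purl.rpartition("@")
    if not sep or not version:
        raise ValueError(f"purl has no version part: {purl!r}")
    return f"{base}@{version[:length]}"


def license_entry(
    spdx_id: Optional[str] = None,
    name: Optional[str] = None,
    text: Optional[str] = None,
) -> dict:
    """Build one entry of a component's `licenses` array.

    Prefers an SPDX identifier; falls back to a custom name plus optional text.
    """
    if spdx_id:
        return {"license": {"id": spdx_id}}
    if not name:
        raise ValueError("a custom license needs at least a name")
    entry = {"name": name}
    if text:
        # Unencoded license text, as in the example above.
        entry["text"] = {"content": text}
    return {"license": entry}
```

For the example component, `short_bom_ref` applied to the full `purl` yields the `bom-ref` shown above, and `license_entry(spdx_id="Apache-2.0")` yields the preferred SPDX form.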
|
|
||||||
| #### Model repositories as components | ||||||
|
|
||||||
|
|
@@ -166,7 +178,7 @@ If the model being described by an ML-BOM is instead hosted in a GitHub reposito | |||||
|
|
||||||
| Organizations that produce BOMs for hardware or software components they produce may have multiple domain-specific identifiers for the same component. In these cases, it is best practice to register (reserve) an official namespace for these domains with the [CycloneDX Property Taxonomy](), which is the authoritative source of official namespaces used in CycloneDX `properties`. | ||||||
|
|
||||||
| ###### Example: | ||||||
| ###### Example: domain-specific identifiers | ||||||
|
|
||||||
| The following example shows how a registered name for a fictional company, ACME, which registered the namespace `acme`, could provide a property to identify one of its internal ML models. | ||||||
|
|
||||||
|
|
@@ -224,11 +236,47 @@ Each can be specifically identified in a CycloneDX component using a Package URL | |||||
| } | ||||||
| ``` | ||||||
|
|
||||||
| ##### Providing model release notes | ||||||
|
|
||||||
| It is important to disclose information regarding a model's release. This is accomplished by utilizing the CycloneDX component's `releaseNotes` object and its fields. | ||||||
|
|
||||||
| ###### Example: release notes | ||||||
|
|
||||||
| ```json | ||||||
| { | ||||||
| "$schema": "http://cyclonedx.org/schema/bom-1.7.schema.json", | ||||||
| // ... | ||||||
| "metadata": | ||||||
| { | ||||||
| "component": | ||||||
| { | ||||||
| "type": "machine-learning-model", | ||||||
| "bom-ref": "pkg:huggingface/Qwen/Qwen-7B@ef3c5c9", | ||||||
| // ... | ||||||
| "releaseNotes": { | ||||||
| "type": "major", | ||||||
| "title": "Qwen 7B initial release", | ||||||
| "timestamp": "2023-08-03T15:30:00Z", | ||||||
| "notes": [ | ||||||
| { | ||||||
| "locale": "en-US", | ||||||
| "text": { | ||||||
| "content": "United States (US), English release date." | ||||||
| } | ||||||
| } | ||||||
| // ... | ||||||
| ] | ||||||
| } | ||||||
| }, | ||||||
| // ... | ||||||
| } | ||||||
| } | ||||||
| ``` | ||||||
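A quick shape check can catch the most common mistakes when writing release notes by hand. The sketch below assumes the CycloneDX JSON schema's layout for `releaseNotes`: a single object with a required `type`, and an optional `notes` array whose entries carry `text` as an object with a `content` field. The function name `release_notes_problems` is hypothetical, and this is not a substitute for validating against the schema itself.

```python
def release_notes_problems(component: dict) -> list:
    """Return human-readable problems found in component['releaseNotes']."""
    release_notes = component.get("releaseNotes")
    if release_notes is None:
        return []  # release notes are optional
    if not isinstance(release_notes, dict):
        kind = type(release_notes).__name__
        return [f"releaseNotes should be a single object, not {kind}"]

    problems = []
    if "type" not in release_notes:
        problems.append("releaseNotes.type is missing")

    notes = release_notes.get("notes", [])
    if not isinstance(notes, list):
        problems.append("releaseNotes.notes should be an array")
        return problems

    for index, note in enumerate(notes):
        text = note.get("text") if isinstance(note, dict) else None
        if not (isinstance(text, dict) and "content" in text):
            problems.append(
                f"notes[{index}].text should be an object with a 'content' field"
            )
    return problems
```

An empty list means none of these particular problems were found; it does not guarantee the BOM is schema-valid.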
|
|
||||||
| ###### Field discussion | ||||||
|
|
||||||
| * **type** - the type has the value `machine-learning-model` since the single file contains all the information (e.g., default configuration parameters, references to architectures and tokenizers, prompt template, etc.) needed to run the model in GGUF inference frameworks. | ||||||
|
|
||||||
|
|
||||||
| #### Describing a model repository as a CycloneDX assembly | ||||||
|
|
||||||
| CycloneDX allows for declarations of software compositions (e.g., hardware products, software applications, packages, libraries, archives, etc.). | ||||||
|
|
@@ -387,7 +435,7 @@ It is important to capture any of these transformations in the model's lineage ( | |||||
|
|
||||||
| * **ancestors** - `ancestors` entries are themselves CycloneDX `component` objects. It should be noted that these models may have their own ML-BOMs, which can be located via their identifiers (e.g., `purl`) or via `externalReferences` for readers to follow. | ||||||
|
|
||||||
| ##### Declaring known descendents | ||||||
| ##### Declaring known descendants | ||||||
|
|
||||||
| If, at the time an ML-BOM is created for a model, its downstream model variants (e.g., finetunings, quantizations, etc., derived from the model) are known, these can also be recorded within the `pedigree` object as `descendants` in a similar manner. | ||||||
|
|
||||||
|
|
||||||
| Original file line number | Diff line number | Diff line change | ||||||
|---|---|---|---|---|---|---|---|---|
|
|
@@ -7,7 +7,9 @@ Currently, the v1.7 CycloneDX specification may not have specific objects or fie | |||||||
| For convenience, here are links to the specific sections for some of these acknowledged informational areas: | ||||||||
|
|
||||||||
| * [Using CycloneDX AI/ML properties](#using-cyclonedx-aiml-properties) | ||||||||
| * [Declaring a model's modalities](#declaring-a-models-modalities) | ||||||||
| * [Annotating a model's supported languages](#annotating-a-models-supported-languages) | ||||||||
| * [Providing a model's usage policy](#providing-a-models-usage-policy) | ||||||||
| * [Providing free-form tags for search](#providing-free-form-tags-for-search) | ||||||||
|
Review comment: Below, the sections are in this order.
||||||||
| * [Tokenizers and prompt templates](#tokenizers-and-prompt-templates) | ||||||||
| * [Including manufacturing information for the ML model](#including-manufacturing-information-for-the-ml-model) | ||||||||
|
|
@@ -20,6 +22,44 @@ For convenience, here are links to the specific sections for some of these ackno | |||||||
| This section includes discussion and examples of supported AI/ML-related metadata properties that can be used to classify models in their model card information. This method utilizes reserved [AI/ML property names](https://github.com/CycloneDX/cyclonedx-property-taxonomy/cdx/ai-ml.md) registered under the [CycloneDX Property Taxonomy](https://github.com/CycloneDX/cyclonedx-property-taxonomy). | ||||||||
|
|
||||||||
|
|
||||||||
| ## Declaring a model's modalities | ||||||||
|
Review comment: Should this be an inner section?
||||||||
|
|
||||||||
| Models are trained to support processing and analysis of one or more types of input data for specific tasks or data modalities. | ||||||||
|
|
||||||||
|
|
||||||||
| * **Property name**: The CycloneDX reserved property taxonomy name to use to annotate a model with its supported modalities is: `cdx:ai-ml:model:modality` | ||||||||
|
|
||||||||
| * **Property value**: The values for this property include: | ||||||||
|
|
||||||||
| * `text` - Natural Language Processing (NLP) and specializations such as Natural Language Understanding (NLU) for tasks like translation, summarization, conversation, classification and sentiment analysis. | ||||||||
| * `code` - Specialized text-based modality used for software engineering and logic. | ||||||||
| * `instruct` - Specialized text-based modality fine-tuned for understanding and executing natural language directives (i.e., instruction following). | ||||||||
| * `image` (vision) - Computer vision for object detection, generation, and classification as well as document processing. | ||||||||
| * `video` - Video processing tasks to extract structured information, including object detection, action recognition, scene detection, and temporal understanding. | ||||||||
| * `audio` - Audio processing tasks such as Automatic Speech Recognition (ASR), Speech-to-Text, music generation, and sound pattern recognition. | ||||||||
| * `sensor` (telemetry) - Processes data from specialized sensors or hardware, such as LiDAR for autonomous vehicles or IoT sensor feeds. | ||||||||
| * `biometric` - Specialized sensor-based modality used for analyzing biological traits for tasks such as facial recognition, fingerprint scanning, or voice authentication. | ||||||||
| * `genomic` (telemetry) - Processes high-dimensional data used in drug discovery and medical research. | ||||||||
| * `_undefined:<NAME>` - `<NAME>` placeholder, used to provide an arbitrary model modality name. | ||||||||
|
|
||||||||
| ###### Example: Tagging a model with its modalities | ||||||||
|
|
||||||||
| ```json | ||||||||
| "component": | ||||||||
| { | ||||||||
| "type": "machine-learning-model", | ||||||||
| "bom-ref": "pkg:huggingface/FakeAI/CoderModel", | ||||||||
| // ..., | ||||||||
| "properties": [ | ||||||||
| { | ||||||||
| "name": "cdx:ai-ml:model:modality", | ||||||||
| "value": "code" | ||||||||
| }, | ||||||||
| { | ||||||||
| "name": "cdx:ai-ml:model:modality", | ||||||||
| "value": "instruct" | ||||||||
| } | ||||||||
| ] | ||||||||
| } | ||||||||
| ``` | ||||||||
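The property name and value list described above could be enforced when generating BOMs. The following Python sketch assumes the name/value form stated in the bullets (one `cdx:ai-ml:model:modality` property per modality, with the modality as the `value`); the helper name `modality_property` and the constants are hypothetical.

```python
PROPERTY_NAME = "cdx:ai-ml:model:modality"

# Values listed in the section above.
KNOWN_MODALITIES = {
    "text", "code", "instruct", "image", "video",
    "audio", "sensor", "biometric", "genomic",
}

UNDEFINED_PREFIX = "_undefined:"


def modality_property(modality: str) -> dict:
    """Build one CycloneDX property declaring a model modality."""
    if modality in KNOWN_MODALITIES:
        return {"name": PROPERTY_NAME, "value": modality}
    # Arbitrary names are allowed only in the `_undefined:<NAME>` form.
    if modality.startswith(UNDEFINED_PREFIX) and len(modality) > len(UNDEFINED_PREFIX):
        return {"name": PROPERTY_NAME, "value": modality}
    raise ValueError(f"unknown modality: {modality!r}")


# One property per supported modality, ready for a component's `properties` array.
properties = [modality_property(m) for m in ("code", "instruct")]
```

Rejecting unknown values early keeps misspelled modalities out of a catalog, while the `_undefined:<NAME>` form remains available for modalities not yet listed.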
|
|
||||||||
| ## Annotating a model's supported languages | ||||||||
|
Review comment: Should this be an inner section?
||||||||
|
|
||||||||
| Models can be trained in one or more languages (i.e., multilingual models). | ||||||||
|
|
@@ -81,6 +121,28 @@ This section describes how to "tag" model components with non-standard keywords | |||||||
| * **properties** - The tag values shown above might be used to search for models in a catalog that are compatible with the `pytorch` framework and (the Hugging Face) `transformers` library. The `text-to-speech` and `speech-to-speech` tags could identify the model with those input/output capabilities. | ||||||||
|
|
||||||||
|
|
||||||||
| ## Providing a model's usage policy | ||||||||
|
Review comment: Should this be an inner section?
||||||||
|
|
||||||||
| Model usage policies can be provided using `externalReferences` associated with the model's component definition. | ||||||||
|
|
||||||||
| ###### Example: Providing a link to a model's usage policy | ||||||||
|
|
||||||||
| ```json | ||||||||
| "component": { | ||||||||
| "type": "machine-learning-model", | ||||||||
| "bom-ref": "pkg:huggingface/Qwen/Qwen-7B@ef3c5c9", | ||||||||
| // ..., | ||||||||
| "externalReferences": [ | ||||||||
| { | ||||||||
| "url": "https://qwen.ai/usagepolicy", | ||||||||
| "type": "documentation", | ||||||||
| "comment": "Usage policy" | ||||||||
| } | ||||||||
| ], | ||||||||
| // ... | ||||||||
| } | ||||||||
| ``` | ||||||||
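A consumer of the BOM might look up the policy link as sketched below. This is an assumption-laden illustration: `externalReferences` has no dedicated usage-policy type in the example above, so the hypothetical helper `usage_policy_urls` matches on the `documentation` type plus the free-text `comment`, which is only a convention.

```python
def usage_policy_urls(component: dict) -> list:
    """Return URLs of external references that look like usage policies."""
    urls = []
    for reference in component.get("externalReferences", []):
        comment = reference.get("comment", "")
        # Convention assumed here: type "documentation" and a comment
        # mentioning "usage policy" (case-insensitive).
        if reference.get("type") == "documentation" and "usage policy" in comment.lower():
            urls.append(reference["url"])
    return urls


component = {
    "type": "machine-learning-model",
    "bom-ref": "pkg:huggingface/Qwen/Qwen-7B@ef3c5c9",
    "externalReferences": [
        {
            "url": "https://qwen.ai/usagepolicy",
            "type": "documentation",
            "comment": "Usage policy",
        }
    ],
}
print(usage_policy_urls(component))
```

Because the match relies on comment text, producers and consumers would need to agree on the wording for this lookup to be reliable.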
|
|
||||||||
| ## Tokenizers and prompt templates | ||||||||
|
|
||||||||
| Tokenizers provide the preprocessing (encoding) and postprocessing (decoding) functions that convert input and output information to and from the tokens that the associated ML model was trained on and uses for inference. | ||||||||
|
|
||||||||
Review comment: `releaseNotes` is not an array but an object?