Training and Data Mining Assertion
The C2PA technical specification allows actors in a workflow to make cryptographically signed assertions about the produced C2PA asset.
The training and data mining assertion enables a human actor to provide a C2PA Manifest Consumer information about whether an asset with C2PA metadata may be used as part of a data mining or AI/ML training workflow.
Version 1.1 Draft 11 November 2024 · Version history
The Creator Assertions Working Group expects to release an update to this specification late in 2024. This page is the working draft of that update. Until that time, implementers should refer to the 1.0 version of this specification.
Maintainers:
License
This specification is subject to the Community Specification License 1.0.
Additional information about this specification’s scope and governance can be found at the project’s GitHub repository (creator-assertions/training-and-data-mining-assertion). The Community Specification License documents at the root of that repository are the authoritative governance documents for this specification.
Contributing
This section is non-normative.
This specification is an active working draft. If you wish to contribute to its development, you are invited to:
Foreword
Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. No party shall be held responsible for identifying any or all such patent rights.
Any trade name used in this document is information given for the convenience of users and does not constitute an endorsement.
This document was prepared by the Creator Assertions Working Group.
Known patent licensing exclusions are available in the specification’s notices.md file.
Any feedback or questions on this document should be directed to the specifications repository (GitHub: creator-assertions/training-and-data-mining-assertion).
THESE MATERIALS ARE PROVIDED “AS IS.” The Contributors and Licensees expressly disclaim any warranties (express, implied, or otherwise), including implied warranties of merchantability, non-infringement, fitness for a particular purpose, or title, related to the materials. The entire risk as to implementing or otherwise using the materials is assumed by the implementer and user. IN NO EVENT WILL THE CONTRIBUTORS OR LICENSEES BE LIABLE TO ANY OTHER PARTY FOR LOST PROFITS OR ANY FORM OF INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES OF ANY CHARACTER FROM ANY CAUSES OF ACTION OF ANY KIND WITH RESPECT TO THIS DELIVERABLE OR ITS GOVERNING AGREEMENT, WHETHER BASED ON BREACH OF CONTRACT, TORT (INCLUDING NEGLIGENCE), OR OTHERWISE, AND WHETHER OR NOT THE OTHER MEMBER HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
1. Introduction
This section is non-normative.
1.1. Scope
For purposes of the Community Specification License, the scope.md document at the root of this project’s GitHub repository is the governing document of this specification’s scope.
3. Assertion definition
3.1. Overview
This assertion enables a human actor to provide a Manifest Consumer information about whether the asset may be used as part of a data mining or AI/ML training workflow. This is expressed in the assertion through a map of one or more training-mining-entries. Each entry describes whether its use is allowed, notAllowed, or constrained.
There are four pre-defined entries:
cawg.data_mining
-
Can any text or data content be extracted from the asset for purposes of determining “patterns, trends, and correlations.”
This would include images containing text, where the text could be extracted via OCR.
cawg.ai_inference
-
Can the asset be used as input to a trained AI/ML model for the purposes of inferring a result.
cawg.ai_generative_training
-
Can the asset be used as training data to an AI/ML model that could generate assets.
cawg.ai_training
-
Can the asset be used as data to train non-generative AI/ML models, such as those used for classification, object detection, etc.
In addition to the pre-defined entries, a claim generator may also add their own custom keys, provided that they conform to the same syntax for custom labels as defined in Section 6.2, “Labels,” of the C2PA Technical Specification. Labels beginning with the prefix cawg. are reserved for use in future versions of this specification and MUST NOT be assigned by any other claim generator.
The value of constrained implies that permission is not unconditionally granted for this usage. Consumers of this content who wish to use it in this way may wish to contact the actor who is the rights holder, author, or signer to get more information or obtain permission. In the absence of additional information, constrained shall be treated as equivalent to notAllowed. More details on the constraints may be provided in the constraint_info text field.
Possible contents of constraint_info include a well-known description of a license (e.g., Creative Commons), a URL to a policy file, or free text.
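The handling of constrained described above can be sketched in Python. This is a minimal illustration, not part of the specification: the function name is_use_permitted and the has_additional_permission flag are hypothetical, and returning None for a missing entry is an assumption, since this section does not say how an absent entry should be interpreted.

```python
from typing import Optional

# Values this specification defines for the "use" field.
ALLOWED = "allowed"
NOT_ALLOWED = "notAllowed"
CONSTRAINED = "constrained"


def is_use_permitted(entries: dict, key: str,
                     has_additional_permission: bool = False) -> Optional[bool]:
    """Resolve one entry of the "entries" map to True, False, or None.

    `has_additional_permission` (hypothetical) stands for whatever
    out-of-band information or permission the consumer has obtained.
    """
    entry = entries.get(key)
    if entry is None:
        # Assumption: the assertion makes no statement about this use.
        return None
    use = entry.get("use")
    if use == ALLOWED:
        return True
    if use == CONSTRAINED:
        # In the absence of additional information, constrained is
        # treated as equivalent to notAllowed.
        return has_additional_permission
    # notAllowed, or a value this sketch does not recognize
    # (assumption: treat unrecognized values conservatively).
    return False
```

A consumer would call this once per intended use, for example is_use_permitted(entries, "cawg.data_mining"), and fall back to its own policy when the result is None.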
A training and data mining assertion SHALL have a label of cawg.training-mining.
Notice to implementers of previous (C2PA 1.x) definition of this assertion
Implementers who are transitioning from the earlier definition of this assertion should pay special attention to label names. The training and data mining assertion as defined in version 1.4 of the C2PA technical specification used labels with the c2pa. prefix. This specification is not a product of the C2PA itself, so it cannot use that prefix.
3.2. Schema and example
The CDDL Definition for this type is:
; Assertion for specifying whether the associated asset and its data
; may be used for training an AI/ML model or mined for its data (or both).
; Possible values
$training-mining-choice /= "allowed"
$training-mining-choice /= "notAllowed"
$training-mining-choice /= "constrained"
; Description of the data structure
training-mining-map = {
  "entries" : $training-mining-entries-map,
  ? "metadata" : $assertion-metadata-map ; additional information about the assertion
}
training-mining-entries-map = {
  ? "cawg.data_mining" : $training-mining-map-entry,
  ? "cawg.ai_inference" : $training-mining-map-entry,
  ? "cawg.ai_training" : $training-mining-map-entry,
  ? "cawg.ai_generative_training" : $training-mining-map-entry,
  * tstr => any ; allow for any other custom use case
}
training-mining-map-entry = {
  "use": $training-mining-choice,
  ? "constraint_info": tstr .size (1..max-tstr-length) ; information about the use of `constrained`
}
A previous version of this specification inadvertently omitted the entries map from the CDDL definition. It SHOULD be used as documented in this version.
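As an illustration of what the schema requires, the following Python sketch checks a decoded assertion (represented as nested dicts) against the structure above. It is not a full CDDL validator. The function names are hypothetical, the metadata map is not checked, and max_tstr_length is left as a caller-supplied parameter because max-tstr-length is not defined in this section. Since $training-mining-choice is an extensible choice, the sketch accepts only the three values defined here.

```python
PREDEFINED_KEYS = (
    "cawg.data_mining",
    "cawg.ai_inference",
    "cawg.ai_training",
    "cawg.ai_generative_training",
)
DEFINED_CHOICES = ("allowed", "notAllowed", "constrained")


def _check_entry(key, entry, max_tstr_length):
    """Check one pre-defined entry against training-mining-map-entry."""
    if not isinstance(entry, dict):
        return [f"{key}: entry must be a map"]
    errors = []
    if entry.get("use") not in DEFINED_CHOICES:
        errors.append(f"{key}: 'use' must be one of {DEFINED_CHOICES}")
    if "constraint_info" in entry:
        info = entry["constraint_info"]
        if not isinstance(info, str):
            errors.append(f"{key}: 'constraint_info' must be a text string")
        else:
            # CDDL .size on a text string counts bytes of its UTF-8 encoding.
            size = len(info.encode("utf-8"))
            too_long = max_tstr_length is not None and size > max_tstr_length
            if size < 1 or too_long:
                errors.append(f"{key}: 'constraint_info' has invalid size {size}")
    for extra in set(entry) - {"use", "constraint_info"}:
        errors.append(f"{key}: unexpected field {extra!r}")
    return errors


def validate_training_mining(assertion, max_tstr_length=None):
    """Return a list of problems found; an empty list means none were found."""
    if not isinstance(assertion, dict):
        return ["assertion must be a map"]
    entries = assertion.get("entries")
    if not isinstance(entries, dict):
        return ["'entries' must be present and must be a map"]
    errors = []
    for key, entry in entries.items():
        if key in PREDEFINED_KEYS:
            errors.extend(_check_entry(key, entry, max_tstr_length))
        elif not isinstance(key, str):
            errors.append(f"{key!r}: entry keys must be text strings")
        elif key.startswith("cawg."):
            errors.append(f"{key}: the 'cawg.' prefix is reserved")
        # Other custom keys may carry any value (`* tstr => any`).
    return errors
```

Returning a list of messages rather than raising on the first problem lets a Manifest Consumer report every issue in one pass.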
An example in CBOR Diagnostic Format (.cbordiag) is shown below:
{
  "entries": {
    "cawg.ai_training": {
      "use": "allowed"
    },
    "cawg.ai_generative_training": {
      "use": "notAllowed"
    },
    "cawg.data_mining": {
      "use": "constrained",
      "constraint_info": "may only be mined on days whose names end in 'y'"
    }
  }
}
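To show how diagnostic notation of this kind corresponds to encoded bytes, here is a minimal Python sketch of a CBOR encoder restricted to the two data types the example uses: text strings (major type 3) and maps (major type 5), following the general CBOR encoding rules in RFC 8949. The helper names are hypothetical. A real implementation would normally rely on an existing CBOR library, and this sketch does not apply any deterministic key ordering.

```python
import struct


def _head(major: int, length: int) -> bytes:
    """Encode a CBOR initial byte plus its length argument."""
    if length < 24:
        return bytes([(major << 5) | length])
    if length < 0x100:
        return bytes([(major << 5) | 24, length])
    if length < 0x10000:
        return bytes([(major << 5) | 25]) + struct.pack(">H", length)
    if length < 0x100000000:
        return bytes([(major << 5) | 26]) + struct.pack(">I", length)
    return bytes([(major << 5) | 27]) + struct.pack(">Q", length)


def encode(value) -> bytes:
    """Encode nested dicts of text strings as CBOR (sketch only)."""
    if isinstance(value, str):
        data = value.encode("utf-8")
        return _head(3, len(data)) + data  # major type 3: text string
    if isinstance(value, dict):
        out = _head(5, len(value))  # major type 5: map, length = number of pairs
        for key, item in value.items():
            out += encode(key) + encode(item)
        return out
    raise TypeError(f"unsupported type in this sketch: {type(value).__name__}")
```

For instance, encode({"use": "allowed"}).hex() yields a16375736567616c6c6f776564: a1 opens a one-pair map, 63 introduces the three-byte string "use", and 67 introduces the seven-byte string "allowed".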
Appendix A: Version history
This section is non-normative.
17 July 2024
-
Started 1.1 pre-draft version of this specification, derived from the 1.0 version of this specification. (Version history)
22 July 2024
-
Promoted from pre-draft to draft status.
11 November 2024
-
Fix inconsistency between schema and example in Section 3.2, “Schema and example.”