THE SPDX WIKI IS NO LONGER ACTIVE. ALL CONTENT HAS BEEN MOVED TO https://github.com/spdx

Technical Team/Proposals/Rough proposal for provenance, hierarchy and aggregation, and supply chain friendliness in SPDX 2.0

From SPDX Wiki

Revision as of 17:30, 6 December 2011

A desire has been expressed for SPDX to be capable of expressing:

  1. Hierarchy (package A contains packages B, C, etc.)
  2. Authentication (we can know precisely who said what, and when, about a package)
  3. How software flows through a supply chain (upstream to packager, through several intermediate vendors to consumer)

A rough example of this idea is shown in the diagram below, which shows how the coreutils package might be represented:

<img src="http://spdx.org/system/files/spdxdoodle_0.jpg" alt="" width="586" height="372" />

The simple story behind this diagram is this:

  1. The upstream maintainer of coreutils provides an SPDX file which
    1. Provides information for the copyrighted entity that is the package as a whole
    2. Provides embedded information for the copyrighted entity that is each file in the package (same format, just embedded and clearly lower in the hierarchy)
    3. Provides a coreutils.spdx.sig file with the signature for the coreutils.spdx file (so we can authenticate it)
  2. This coreutils.spdx file is in the coreutils.tar.gz for the upstream
  3. The rpm (or deb) packager creates a coreutils.spdx (distinct from the one for the upstream) in the rpm file which:
    1. Provides information for the copyrighted entity that is the rpm (or deb) package as a whole
    2. Provides embedded information for the copyrighted entity that is each file (such as patch files) contained in the rpm (or deb) package
    3. For the coreutils.tar.gz file (also contained in the rpm or deb package), provides its SPDX information by *referencing* the coreutils.spdx in the coreutils.tar.gz file.
    4. Optionally provides an Annotation section to 'annotate' some of the information provided by the coreutils upstream.
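
The story above can be pictured as two small trees, one per SPDX file, joined by a reference. What follows is only an illustrative sketch: the dictionary keys (about, author, contains, ref), the file names inside the packages and the walk helper are all made up for this example and are not part of the proposal.

```python
# Hypothetical in-memory picture of the upstream coreutils.spdx.
upstream_doc = {
    "about": "coreutils.tar.gz",
    "author": "upstream maintainer",
    "contains": [
        {"about": "src/ls.c", "contains": []},
        {"about": "src/cp.c", "contains": []},
    ],
}

# Hypothetical picture of the packager's coreutils.spdx inside the rpm.
packager_doc = {
    "about": "coreutils.rpm",
    "author": "rpm packager",
    "contains": [
        {"about": "fix-build.patch", "contains": []},
        # Referenced rather than copied, so the signed upstream file
        # is left untouched.
        {"about": "coreutils.tar.gz", "ref": "coreutils.tar.gz!coreutils.spdx"},
    ],
}

# Maps a reference to the document it points at (assumed already fetched).
documents = {"coreutils.tar.gz!coreutils.spdx": upstream_doc}


def walk(element, documents, author, depth=0):
    """Yield (depth, about, author) for each element, following references."""
    if "ref" in element:
        target = documents[element["ref"]]
        # Statements in the referenced file belong to its own author.
        yield from walk(target, documents, target["author"], depth)
        return
    yield depth, element["about"], author
    for child in element.get("contains", []):
        yield from walk(child, documents, author, depth + 1)


rows = list(walk(packager_doc, documents, packager_doc["author"]))
for depth, about, author in rows:
    print("  " * depth + f"{about} (described by {author})")
```

Each row keeps track of who made the statement, which is the point of referencing the upstream file instead of copying its contents.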

 

Diagram for a concrete proposal (very, very rough) for the structure. Note: the notes in the diagram that say 'Concrete' or 'Referential' just indicate an 'or' in the document structure:

 

<img src="http://spdx.org/system/files/spdxdoodle2_1.jpg" alt="" />

Description of diagram

  • Top: Simple top level place to start
  • SPDXFile: File containing SPDX data
  • SPDXElement: The containing element for SPDX data for a given copyrightable work contained in the SPDXFile.  It's SPDXElements all the way down.
  • Specifier: Not really a node, more a grouping of nodes to indicate those fields which specify the 'thing' the SPDX Element is about
  • LicenseData: Not really a node, more a grouping of nodes to indicate those fields which specify what we know about the 'thing' this SPDX Element is about
  • SPDXElements: zero or more additional 'contained' SPDX Elements referring to contained things (like files, contained tarballs, etc.).
  • Annotations: zero or more annotations giving additional information about the contained SPDXElements (to handle the case where a contained SPDX Element is a reference to another SPDX file that is signed and which we therefore can't change directly). Note: this needs more thought.
  • Name: Equivalent to SPDX 1.0 Formal Name
  • Version: Equivalent to SPDX 1.0 Package Version Information
  • Supplier: Equivalent to SPDX 1.0 Package Supplier
  • Summary: Equivalent to SPDX 1.0 Package Summary Description
  • Description: Equivalent to SPDX 1.0 Package Detailed Description
  • URI (in SPDXElement->Specifier): URI of the copyrightable thing being referenced. It may point to a file, an archive, a package, etc.
  • URICheckSum (in SPDXElement->Specifier): Checksum for the thing the URI points to
  • CopyrightText: Equivalent to SPDX 1.0 CopyrightText
  • LicenseText: Full text of license if LicenseShortForm isn't available
  • LicenseShortForm: License short form, used in lieu of the license text if available
  • SPDXFileURI: If the SPDX Element does not contain its own concrete license data but references an external SPDX File, the URI of that SPDXFile
  • SPDXFileSigURI: If the SPDX Element references an external SPDXFile, the URI of the sig file for that SPDX file
  • ACL: I hate the name ACL, but basically it's a way of specifying that you are including or excluding some of the copyrightable bits that are covered by the referenced SPDX File.
  • Exclude (in ACL): Used to specify parts of the stuff referenced by the external SPDX file that you are not bringing in. For example, if I am using all of a package except foo.c and bar.c.
  • ExcludeAll (in ACL): Used to indicate that *none* of the referenced copyrightable items from the SPDX file are used except those explicitly included.
  • Include (in ACL): Used after an ExcludeAll to indicate we are only using the specifically included files, say if we are just using foobar.c.
  • SPDXFileSig: Separate file containing the signature for the octets of the SPDXFile 
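
One possible reading of the structure described above, written as Python dataclasses. This is a rough sketch and not a schema: the class and field names are adapted from the list, the rule that an element is either concrete or referential is one interpretation of the 'or' in the diagram, placing the ACL under the reference is an assumption, and annotations are shown as plain strings only because their shape is still undecided.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Specifier:
    """Fields that identify the 'thing' an element is about."""
    name: str
    uri: str
    uri_checksum: Optional[str] = None
    version: Optional[str] = None
    supplier: Optional[str] = None


@dataclass
class LicenseData:
    """Concrete license information stored directly in the element."""
    copyright_text: str
    license_short_form: Optional[str] = None
    license_text: Optional[str] = None  # full text when no short form exists


@dataclass
class ACL:
    """Which items covered by a referenced SPDX file are actually used."""
    exclude_all: bool = False
    exclude: List[str] = field(default_factory=list)
    include: List[str] = field(default_factory=list)

    def selects(self, path: str) -> bool:
        # After ExcludeAll, only explicitly included items are in use.
        if self.exclude_all:
            return path in self.include
        # Otherwise everything is in use except the excluded items.
        return path not in self.exclude


@dataclass
class Reference:
    """Pointer to an external, possibly signed, SPDX file."""
    spdx_file_uri: str
    spdx_file_sig_uri: Optional[str] = None
    acl: ACL = field(default_factory=ACL)


@dataclass
class SPDXElement:
    specifier: Specifier
    license_data: Optional[LicenseData] = None  # 'Concrete' branch
    reference: Optional[Reference] = None       # 'Referential' branch
    elements: List["SPDXElement"] = field(default_factory=list)
    annotations: List[str] = field(default_factory=list)

    def __post_init__(self):
        # Assumed rule: exactly one of the two branches is present.
        if (self.license_data is None) == (self.reference is None):
            raise ValueError("need exactly one of license_data or reference")


# Using all of a package except foo.c and bar.c.
most = ACL(exclude=["foo.c", "bar.c"])
# Using only foobar.c from a package.
only_one = ACL(exclude_all=True, include=["foobar.c"])
```

With these definitions, most.selects("foo.c") is false and only_one.selects("foobar.c") is true, matching the Exclude and ExcludeAll/Include cases in the list.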

Stealing Java's archive URI syntax:

In the Java world, a URI syntax of the form <archivename>!<filename> is commonly used to indicate a particular file within an archive, for example foo.tar.gz!bar.c.  I suggest we use this (as there are no other widespread alternatives).
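
A minimal sketch of how such a URI might be taken apart and resolved. Java's own jar URLs are written jar:<url>!/<entry>; the code below follows the shorter <archivename>!<filename> form used in this proposal. The function names are hypothetical, and reading from a tar archive held in memory is just an assumption to keep the example self-contained.

```python
import io
import tarfile


def split_archive_uri(uri):
    """Split 'archive!member' into (archive, member).

    The member is None when the URI contains no '!'.
    """
    archive, sep, member = uri.partition("!")
    return (archive, member) if sep else (archive, None)


def read_member(archive_bytes, member):
    """Return the bytes of one member of an in-memory tar archive."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:*") as tar:
        extracted = tar.extractfile(member)  # raises KeyError if missing
        if extracted is None:
            raise ValueError(f"{member} is not a regular file")
        return extracted.read()


archive, member = split_archive_uri("foo.tar.gz!bar.c")
```

Only the first '!' is treated as the separator here, so a nested reference such as outer.rpm!inner.tar.gz!file.c would need the member part to be split again.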

Archiving multiple SPDX files:

It may be desirable to archive multiple SPDX files together so that we have full resolvability for a given package.  In that case, they should be rolled into a specific archive format (say foo.spdx.zip) containing the various SPDX files, their sig files, and an index file.  The index file should map URIs (as specified in the SPDX files) to filenames in the archive (so we can resolve them).

 

Example:

file://coreutils.tar.gz!coreutils.spdx: file://upstream/coreutils.spdx

file://coreutils.tar.gz!coreutils.spdx.sig: file://upstream/coreutils.spdx.sig

Note: We also have to solve the problem of how to distinguish the case where two SPDX files contain the same URI (say, relative to their archives) but need to be resolved to separate files in the rollup, for example if both upstream and the rpm (or deb) packager used file://coreutils.spdx
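
The index lookup could be sketched as follows, assuming one 'URI: filename' pair per line with a colon and a space as the separator. That line format, and the name parse_index, are assumptions for illustration; the proposal does not fix an index syntax.

```python
def parse_index(text):
    """Parse 'URI: filename' lines into a dict, rejecting duplicate URIs."""
    index = {}
    for number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        # Split on the first colon followed by a space, so that the
        # colon in 'file://' is not mistaken for the separator.
        uri, sep, filename = line.partition(": ")
        if not sep:
            raise ValueError(f"line {number}: expected 'URI: filename'")
        if uri in index:
            raise ValueError(f"line {number}: duplicate URI {uri!r}")
        index[uri] = filename.strip()
    return index


EXAMPLE_INDEX = """\
file://coreutils.tar.gz!coreutils.spdx: file://upstream/coreutils.spdx
file://coreutils.tar.gz!coreutils.spdx.sig: file://upstream/coreutils.spdx.sig
"""

index = parse_index(EXAMPLE_INDEX)
```

Rejecting duplicates does not solve the collision problem in the note above; it only makes the collision visible instead of letting one entry silently replace another.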