Software Package Data eXchange (SPDX™)

Specification

Version: DRAFT

Disclaimer

The following is a draft of the Software Package Data eXchange (SPDX) Specification. This draft is provided as a public courtesy to illustrate the progress of the project. Please be advised that, in addition to the licensing terms recited below, no reliance should be made regarding this draft, including any reliance that this draft represents any particular standard for any particular purpose. When the official SPDX specification is released, it will be clearly labeled "official", and its purpose and recommended use will be clearly stated at that time.

Copyright © 2010 Linux Foundation and its Contributors. All other rights are expressly reserved.

With thanks to Andrew Back, Bill Schineller, Bruno Cornec, Ciaran Farrell, Daniel German, Debra McGlade, Eran Strod, Eric Thomas, Esteban Rockett, Gary O'Neall, Guillaume Rousseau, Jack Manbeck, Jeff Luszcz, John Ellis, Kate Stewart, Kim Weins, Mark Gisi, Marshall Clow, Martin Michlmayr, Martin von Willebrand, Michael J. Herzog, Michel Ruffin, Phil Robb, Philip Odence, Philip Koltun, Scott K Peterson, Shane Coughlan, Stuart Hughes, Tom Callaway, and Thomas F. Incorvia for their contributions and assistance.

Licensed under the Creative Commons Attribution License 3.0 (reproduced in Appendix III herein).

1. Rationale

1.1 Charter

To create a set of data exchange standards that enable companies and organizations to share license and component information (metadata) for software packages and related content with the aim of facilitating license and other policy compliance.

1.2 Definition

The Software Package Data eXchange (SPDX™) specification is a standard format for communicating the components, licenses, and copyrights associated with a software package. An SPDX file is associated with a particular software package and contains information about that package in the SPDX format.

1.3. Why is a common format for data exchange needed?

Companies and organizations (collectively "Organizations") are widely using and reusing open source and other software packages. Compliance with the associated licenses requires a set of analysis activities and due diligence that each Organization performs independently, including a manual and/or automated scan of software and identification of associated licenses, followed by manual verification. Software development teams across the globe use the same open source packages, but little infrastructure exists to facilitate collaboration on the analysis or to share the results of these analysis activities. As a result, many groups are performing the same work, leading to duplicated effort and redundant information. The SPDX working group seeks to create a data exchange format so that information about software packages and related content may be collected and shared in a common format, with the goal of saving time and improving data accuracy.

1.4. What does this specification cover?

1.4.1. Identification Information

Metadata to associate analysis results with a specific package. This includes a unique identifier that identifies a specific instance of a package and permits correlation to the corresponding SPDX file.

1.4.2. Overview Information

Facts that are common properties of the entire package.

1.4.3 File Specific Information

Facts (e.g., copyrights, licenses) that are specific to each file included in the package.

1.4.4 Common Licenses

A list of common licenses likely to be encountered and a standardized naming convention for referring to these licenses within an SPDX document. This naming convention will also be the basis for extending this set of common licenses over time.

1.4.5. Evolution Hooks

A set of mechanisms that permit extending the specification in a structured manner under specific future versions of the specification.

1.5 What is not covered in the specification?

Information that cannot be derived from an inspection (whether manual or using automated tools) of the package to be analyzed.

How the data stored in an SPDX file is used by the recipient.

Identification of any patent(s) which may or may not relate to the package.

Legal interpretation of the licenses or any compliance actions that might need to be taken.

1.6 Format Requirements:

2 SPDXDoc

An SPDXDoc is a collection of license and copyright information regarding one or more packages and the files contained in those packages.

Properties

2.1 SPDXVersion

Purpose

Provide a reference number that can be used to understand how to parse and interpret the rest of the file. It will enable future changes to the specification and support backward compatibility. The version number consists of a Major and a Minor version indicator. The Major field will be incremented when one or more sections are created, modified or deleted. The Minor field will be incremented when only the fields within a section are modified or when standard recognized licenses are added to the appendix.

Intent

Here, parties exchanging Identification Information in accordance with SPDX specification need to provide 100% transparency as to which SPDX specification such Identification Information is conforming to.

Domain

SPDXDoc

Range

The value of this property is fixed. The value of the SPDXVersion property for all datasets encoded in this version must be: SPDX-1.0

2.1.1 RDF/XML serialization

Example

      <SPDXVersion>SPDX-1.0</SPDXVersion>
  

2.1.2 Tag-Value serialization

The SPDX version is serialized as a single tag with a string value. As noted above "SPDX-1.0" is the only allowed value at this time.


      versiondecl = "SPDXVersion:" *WSP spdxversion
      spdxversion = "SPDX-1.0"
  
Example

      SPDXVersion: SPDX-1.0
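
As an illustration of the grammar above, the following is a minimal sketch of a checker for the version declaration. The function name parse_spdx_version is hypothetical and not part of this specification; the sketch assumes the usual ABNF convention that quoted literals match case-insensitively.

```python
import re

# Mirrors the ABNF above: "SPDXVersion:" *WSP "SPDX-1.0".
# ABNF string literals are case-insensitive, hence re.IGNORECASE.
VERSION_DECL = re.compile(r"SPDXVersion:[ \t]*(SPDX-1\.0)", re.IGNORECASE)


def parse_spdx_version(line: str) -> str:
    """Return the declared version, or raise ValueError for anything else."""
    match = VERSION_DECL.fullmatch(line.rstrip("\r\n"))
    if match is None:
        raise ValueError(f"not a valid SPDXVersion declaration: {line!r}")
    return match.group(1)
```

A reader could call this on the first line of a tag-value file and refuse to continue when it raises, since "SPDX-1.0" is the only value allowed at this time.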
  

2.2 CreatedBy

Purpose

Identify how the metadata in the SPDX file was generated. If it was generated manually, it should indicate who did the analysis. If the information in the file was generated with a software tool, the file should indicate an identifier and version for that tool.

Intent

Here, the generation method will assist the reader of the Identification Information in self determining the general reliability/accuracy of the Identification Information.

Value Format

The value of CreatedBy may indicate a person, tool, or organization.


      company    = "Company:" *WSP name *WSP ["(" email ")"]
      tool       = "Tool:" *WSP name
      person     = "Person:" *WSP name *WSP ["(" email ")"]
      name       = *VUNICODECHAR
      email      = 1*(ALPHA / DIGIT / "." / "-" / "_") "@" domainpart 1*("." domainpart)
      domainpart = 1*(ALPHA / DIGIT / "-" / "_") 
  

For example, "Person: John Doe (john.doe@example.com)" or "Company: My Company, Inc" (depending on which is accountable) or "Tool: spdx-gen 1.0".

Domain

SPDXDoc

2.2.1 RDF/XML serialization

Example

      <CreatedBy>Person: John Doe (john.doe@example.com)</CreatedBy>
  

2.2.2 Tag-Value serialization

CreatedBy is serialized as a single tag with a string value. If there are multiple participants in the analysis, each participant should be indicated with a separate entry.


      createdby = "CreatedBy:" 1*(*WSP (company / tool / person))
  
Example

      CreatedBy: Person:  John Doe (john.doe@example.com)
      CreatedBy: Company: My Company, Inc.
      CreatedBy: Tool:    spdx-gen 1.0
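
The CreatedBy grammar can be approximated with a regular expression. The sketch below is illustrative only: Creator and parse_created_by are hypothetical names, VUNICODECHAR is not defined in this draft and is assumed here to mean any character, and the sketch handles one participant per line (as in the example) rather than several on one line. It is also slightly looser than the grammar in that it would accept a parenthesized email after a Tool name.

```python
import re
from typing import NamedTuple, Optional

# email and domainpart follow the ABNF rules given under Value Format.
EMAIL = r"[A-Za-z0-9._-]+@[A-Za-z0-9_-]+(?:\.[A-Za-z0-9_-]+)+"
CREATED_BY = re.compile(
    r"CreatedBy:[ \t]*(?P<kind>Company|Tool|Person):[ \t]*"
    r"(?P<name>.*?)[ \t]*(?:\((?P<email>" + EMAIL + r")\))?[ \t]*"
)


class Creator(NamedTuple):
    kind: str              # "Company", "Tool" or "Person"
    name: str
    email: Optional[str]   # None when no "(email)" suffix is present


def parse_created_by(line: str) -> Creator:
    """Split one CreatedBy line into its kind, name and optional email."""
    match = CREATED_BY.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"not a valid CreatedBy line: {line!r}")
    return Creator(match["kind"], match["name"], match["email"])
```

Applied to the three example lines above, this yields a Person with an email address, and a Company and a Tool without one.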
  

2.3 Created

Purpose

Identify when the analysis was done and the SPDX file was originally created. This is to be specified as a combined date and time in UTC, in the format defined by the ISO 8601 standard.

Intent

Here, the Time Stamp can serve as a verification as to whether the analysis needs to be updated.

Domain

SPDXDoc

Range

xsd:dateTime

Refines

dc:created

2.3.1 RDF/XML serialization

Example

      <Created>2010-11-02T11:02:03-06:00</Created>
  

2.3.2 Tag-Value serialization

The value of Created should be represented in UTC for the tag-value format. Numbers should be zero-padded to fit the expected field width.

Tag-Value syntax


      created   = "Created:" *WSP timestamp
      timestamp = year "-" month "-" day "T" hour ":" minute ":" second "Z"
      year      = 4*4DIGIT
      month     = 2*2DIGIT
      day       = 2*2DIGIT
      hour      = 2*2DIGIT
      minute    = 2*2DIGIT
      second    = 2*2DIGIT
  
Example

      Created: 2010-11-02T17:02:03Z
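
To show how a local time maps onto the UTC tag-value form, here is a small sketch using the Python standard library. The helper names format_created and parse_created are hypothetical; the shape check enforces the fixed field widths from the grammar above, which strptime alone does not.

```python
import re
from datetime import datetime, timedelta, timezone

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"
# Fixed-width fields, as required by the timestamp grammar above.
TIMESTAMP_SHAPE = re.compile(
    r"[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z"
)


def format_created(moment: datetime) -> str:
    """Render a timezone-aware datetime as a Created line in UTC."""
    if moment.tzinfo is None:
        raise ValueError("a timezone-aware datetime is required")
    return "Created: " + moment.astimezone(timezone.utc).strftime(TIMESTAMP_FORMAT)


def parse_created(line: str) -> datetime:
    """Parse a Created line back into a UTC datetime."""
    tag, _, value = line.partition(":")
    value = value.strip()
    if tag != "Created" or TIMESTAMP_SHAPE.fullmatch(value) is None:
        raise ValueError(f"not a valid Created line: {line!r}")
    return datetime.strptime(value, TIMESTAMP_FORMAT).replace(tzinfo=timezone.utc)


# 11:02:03 at six hours behind UTC is 17:02:03 UTC.
local = datetime(2010, 11, 2, 11, 2, 3, tzinfo=timezone(timedelta(hours=-6)))
print(format_created(local))  # Created: 2010-11-02T17:02:03Z
```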
  

2.4 ReviewedBy

Purpose

List of the people who have reviewed the SPDX file and the date of that review. Note that there is no requirement for a particular reviewer to add their name to the file; however, it may be important for participants in the software supply chain to validate whether upstream providers have reviewed the SPDX file. This can be considered as an equivalent to “signed off” or “reviewed by”. Additional reviewers can be added after the original version of the SPDX file is created, by appending to the original file.

Intent

Here, as time progresses certain reviewers will begin to gain credibility as reliable. This field intends to make such information transparent.

Domain

SPDXDoc

2.4.1 RDF/XML serialization

Example

      <ReviewedBy>Person: John Doe (john.doe@example.com) 2010-11-02T17:02:03Z</ReviewedBy>
  

2.4.2 Tag-Value serialization

ReviewedBy uses the same person format as CreatedBy but adds a timestamp, like the value of Created, to indicate when the review occurred.


      reviewedby = "ReviewedBy:" *WSP person *WSP timestamp
  
Example

      ReviewedBy: Person: John Doe (john.doe@example.com) 2010-11-02T17:02:03Z
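
A ReviewedBy line combines a person with a timestamp, so a parser has to separate the two. The following is a rough sketch; parse_reviewed_by is a hypothetical name, and treating the "Person:" prefix as optional is an assumption made so that the sketch accepts reviewer values written with or without it.

```python
import re
from datetime import datetime, timezone
from typing import Optional, Tuple

REVIEWED_BY = re.compile(
    r"ReviewedBy:[ \t]*(?:Person:[ \t]*)?(?P<name>.*?)[ \t]*"
    r"(?:\((?P<email>[^()\s]+@[^()\s]+)\))?[ \t]*"
    r"(?P<when>[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z)"
)


def parse_reviewed_by(line: str) -> Tuple[str, Optional[str], datetime]:
    """Return (name, email or None, review time in UTC)."""
    match = REVIEWED_BY.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"not a valid ReviewedBy line: {line!r}")
    when = datetime.strptime(match["when"], "%Y-%m-%dT%H:%M:%SZ")
    return match["name"], match["email"], when.replace(tzinfo=timezone.utc)
```

Because reviews may be appended over time, a consumer would typically apply this to every ReviewedBy line in the file and keep the results as a list.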
  

3 Package

A package is an artifact of a software development effort. It is generally a snapshot of the source code at a particular point in time or a mechanism to be used to install the software. A package can contain sub-packages, but the overview is a reference to the entire contents of the package listed.

Properties

3.1 DeclaredName

Purpose

Full name of the package as given by the originator, with version information if available.

Intent

Here, the formal name of each package is an important conventional technical identifier to be maintained for each package.

Domain

Package

Range

The value is an xsd:string that contains the full name, including version information if known, of the package.

3.1.1 RDF/XML serialization

Example

      <DeclaredName>glibc 2.11.1</DeclaredName>
  

3.1.2 Tag-Value serialization

Example

      DeclaredName: glibc 2.11.1
  

3.2 MachineName

Purpose

File name of package instance.

Intent

Here, the actual filename of the compressed file (containing the package) is a significant technical element that needs to be carried with each package's Identification Information.

Domain

Package

Range

The value is an xsd:string containing the machine-generated file name of the package, which typically includes the version and the packaging and compression methods used.

3.2.1 RDF/XML serialization

Example

      <MachineName>glibc-2.11.1.tar.gz</MachineName>
  

3.2.2 Tag-Value serialization

Example

      MachineName: glibc-2.11.1.tar.gz
  

3.3 SHA1

Purpose

This field provides an independently reproducible mechanism that permits unique identification of a specific package that correlates to the data in this SPDX file. This identifier enables a recipient to determine if any file in the original package has been changed. The SHA1 unique identifier mechanism will be used.

Intent

Here, by providing a unique identifier of each package, confusion over which version/modification of a specific package the Identification Information references should be eliminated.

Domain

Package

Range

The value is an xsd:string that is the hexadecimal encoded SHA1 hash of the package file. The length of this value is 40 characters.

3.3.1 RDF/XML serialization

Example

      <SHA1>d6a770ba38583ed4bb4525bd96e50461655d2758</SHA1>
  

3.3.2 Tag-Value serialization

Example

      SHA1: d6a770ba38583ed4bb4525bd96e50461655d2758
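
Because the identifier must be independently reproducible, any recipient should be able to recompute it from the package file. A minimal sketch with Python's hashlib follows; sha1_hex is a hypothetical helper name, and the in-memory bytes stand in for a real package file.

```python
import hashlib
import io
from typing import BinaryIO


def sha1_hex(stream: BinaryIO, chunk_size: int = 65536) -> str:
    """Return the 40-character hexadecimal SHA1 digest of a binary stream."""
    digest = hashlib.sha1()
    # Read in chunks so that large package archives need not fit in memory.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


# In-memory stand-in; a real caller would pass the package file opened in
# binary mode, for example open("glibc-2.11.1.tar.gz", "rb").
print(sha1_hex(io.BytesIO(b"abc")))  # a9993e364706816aba3e25717850c26c9cd0d89d
```

Comparing the computed digest with the SHA1 value in the SPDX file tells the recipient whether the package they hold is the one that was analyzed.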
  

3.4 DownloadURL

Purpose

This field identifies the download Uniform Resource Locator (URL) for this package at the time that the SPDX file was initially created and the analysis was done. If there is no public URL, then it is explicitly marked as “unknown”.

Intent

Here, where to download the exact package being referenced is a critical verification and tracking datum.

Domain

Package

Range

The value is an xsd:anyURI. This URI should be the publicly reachable location from which the package was acquired.

3.4.1 RDF/XML serialization

Example

      <DownloadURL>http://ftp.gnu.org/gnu/glibc/</DownloadURL>
  

3.4.2 Tag-Value serialization

Example

      DownloadURL: http://ftp.gnu.org/gnu/glibc/
  

3.5 SourceInfo

Purpose

This is a free form text field that contains additional comments about the origin of the package. For instance, this field might include comments indicating whether the package has been pulled from a source code management system or has been repackaged.

Intent

Here, by providing a freeform field, reviewers can provide any additional information to describe any anomalies, or discoveries, in the determination of the origin of the package.

Domain

Package

Range

The value is an xsd:string.

3.5.1 RDF/XML serialization

Example

      <SourceInfo>This package was acquired from the standard location for glibc.</SourceInfo>
  

3.5.2 Tag-Value serialization

Example

      SourceInfo: This package was acquired from the standard location for glibc. 
  

3.6 DeclaredLicense

Purpose

This field lists the licenses that have been declared by the creators of the package. The licenses are listed using a standard short form name. See Appendix I for standardized license short forms. If a Declared License is not one of the standardized license short forms, this field must contain a reference to the full license text included in this SPDX file. If no license is declared, the “NotSpecified” short form in Appendix I should be used. If it is unclear if there is a license reference or not, use the “Unknown” short form from Appendix I. If more than one license is formally declared for the package, then each should be listed. If the creators of the package offer the recipient a choice of licenses, then each of the choices will be declared as a “disjunctive” license.

Intent

This is simply the license identified in text in the actual package source code files (typically in the header of each package file.) This field may have multiple declared licenses, if multiple licenses are declared at the package level.

Domain

Package

Range

The value is a LicensingInfo.

3.6.1 RDF/XML serialization

Example

      <DeclaredLicense rdf:resource="http://spdx.org/licenses/LGPL-2.0"/>
  

3.6.2 Tag-Value serialization

<short form identifier in Appendix I> | "FullLicense"-N

Example

      DeclaredLicense: LGPL-2.0
      DeclaredLicense: FullLicense-1
  

3.7 DetectedLicense

Purpose

This field lists all licenses found in the package by scanning the files, manually or using automated tools. The licenses should use the standard short form names. See Appendix I for standardized license short forms. If a Detected License is not one of the standardized license short forms, this field must contain a reference to the full license text included in this SPDX file. If no licenses are detected, then the “NotSpecified” short form in Appendix I should be used. If it is unclear if there is a license reference or not, use the “Unknown” short form from Appendix I.

Intent

Here, we intend to capture all licenses within the package detected by any scanning tools used by the package reviewer.

Domain

Package

Range

The value is a LicensingInfo.

3.7.1 RDF/XML serialization

Example

      <DetectedLicense rdf:resource="http://spdx.org/licenses/LGPL-2.0"/>
  

3.7.2 Tag-Value serialization

<short form identifier in Appendix I> | "FullLicense"-N

Example

      DetectedLicense: LGPL-2.0
      DetectedLicense: FullLicense-1
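
The two kinds of license value, a standardized short form or a "FullLicense"-N reference, can be told apart mechanically. The sketch below is illustrative: classify_license is a hypothetical name, and KNOWN_SHORT_FORMS is a placeholder holding only the identifiers mentioned in this section, since Appendix I is not reproduced here.

```python
import re

# A local reference to license text carried in this SPDX file.
LOCAL_LICENSE = re.compile(r"FullLicense-[0-9]+")

# Placeholder subset (assumption); a real tool would load the full list of
# short forms from Appendix I.
KNOWN_SHORT_FORMS = {"LGPL-2.0", "NotSpecified", "Unknown"}


def classify_license(value: str) -> str:
    """Return "local" for FullLicense-N references, "standard" for short forms."""
    if LOCAL_LICENSE.fullmatch(value):
        return "local"
    if value in KNOWN_SHORT_FORMS:
        return "standard"
    raise ValueError(f"unrecognized license identifier: {value!r}")
```

Rejecting unrecognized values, rather than passing them through, keeps misspelled short forms from silently entering an SPDX file.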
  

3.8 DeclaredCopyright

Purpose

Identify the author(s) and licensor(s) of the package, as well as any dates present. This will be a free form text field extracted from the package information files.

Intent

Here, by identifying the actual author(s), some ambiguities, e.g., under which license the author(s) were intending to license the package, may be resolvable by knowing who to contact for clarity.

Domain

Package

Range

The value is an xsd:string.

3.8.1 RDF/XML serialization

Example

      <DeclaredCopyright>Copyright 2008-2010 John Smith</DeclaredCopyright>
  

3.8.2 Tag-Value serialization

Example

      DeclaredCopyright: Copyright 2008-2010 John Smith
  

3.9 ShortDesc

Purpose

This field is a short description of the package.

Intent

Here, the intent is to allow a reader/reviewer of this field to quickly understand the function/use of the package, at a high level, without having to parse the source code of the actual package.

Domain

Package

Range

The value is an xsd:string.

3.9.1 RDF/XML serialization

Example

      <ShortDesc>gnu c library</ShortDesc>
  

3.9.2 Tag-Value serialization

Example

      ShortDesc: gnu c library 
  

3.10 Description

Purpose

This field is a more detailed description of the package. It may be extracted from the package itself.

Intent

Here, the intent is to provide technical readers/reviewers with a detailed technical explanation of the functionality, anticipated use, and anticipated implementation of the package. This field may also include a description of improvements over prior versions of the package, where applicable.

Domain

Package

Range

The value is an xsd:string.

3.10.1 RDF/XML serialization

Example

      <Description>…</Description>
  

3.10.2 Tag-Value serialization

Example

      Description: …
  

4 License

Subclass of LicensingInfo

This section is used for any detected or declared licenses that are NOT one of the standard licenses. One instance should be created for every unique license detected in the package that does not match one of the standard license short forms from Appendix I. Each license instance should have the following fields.

Properties

4.1 LicenseID

Purpose

Provide a unique identifier for the package and file sections to refer to non-standard license text detected in the package and reproduced here to aid analysis.

Intent

Here, we seek to identify in whole or in part portions of the package which are licensed under unfamiliar and/or uncommon licenses.

Domain

License

Range

The value is an xsd:string.

4.1.1 RDF/XML serialization

Example

      <LicenseID>FullLicense-1</LicenseID>
  

4.1.2 Tag-Value serialization

Example

      LicenseID: FullLicense-1
  

4.2 LicenseText

Purpose

Provide a copy of the actual text of the license extracted from the package that is associated with the License ID to aid in analysis.

Intent

Here, the actual license text included in the package serves as confirmation that the license, and version, named in the header files of the package is correct.

Domain

License

Range

The value is an xsd:string.

4.2.1 RDF/XML serialization

Example

      <LicenseText>...</LicenseText>
  

4.2.2 Tag-Value serialization

Example

      LicenseText: ...
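
Since every "FullLicense"-N reference used elsewhere in the file is expected to have a matching License entry, a consistency check is straightforward. The sketch below assumes, hypothetically, that the references have been collected into a list and the License entries into a dictionary keyed by LicenseID; unresolved_license_ids is not a name defined by this specification.

```python
from typing import Dict, Iterable, List

LOCAL_PREFIX = "FullLicense-"


def unresolved_license_ids(
    references: Iterable[str], license_texts: Dict[str, str]
) -> List[str]:
    """Return local license references that have no License entry."""
    # Standard short forms need no LicenseText, so only local ones are checked.
    local_refs = {ref for ref in references if ref.startswith(LOCAL_PREFIX)}
    return sorted(local_refs - set(license_texts))
```

An empty result means every non-standard license referenced by a package or file has its text reproduced in the SPDX file.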
  

5 File

A file is a series of related octets that will be placed in a single file when the package is installed or unpacked. One File resource should be created for each file in a Package.

Properties

5.1 Name

Purpose

Identify path to file that corresponds to this information.

Intent

Here, any confusion over where a file needs to hierarchically be placed for proper functionality is mitigated.

Domain

File

Range

The value is a relative file URL. The package should be treated as the root directory. This means all file Names start with a "/" character.

5.1.1 RDF/XML serialization

Example

      <Name>/var/foo.c</Name>
  

5.1.2 Tag-Value serialization


      filename = "name:" *WSP "/" *VUNICODECHAR
  
Example

      name: /bar/foo.c
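
Treating the package as the root directory means a path inside an unpacked package has to be rewritten relative to that root. One possible way to do this is sketched below; spdx_file_name and the directory name used in the usage note are hypothetical.

```python
from pathlib import PurePosixPath


def spdx_file_name(package_root: str, file_path: str) -> str:
    """Return file_path relative to package_root, prefixed with "/"."""
    # relative_to raises ValueError when file_path is not under package_root.
    relative = PurePosixPath(file_path).relative_to(PurePosixPath(package_root))
    return "/" + relative.as_posix()
```

For instance, a file unpacked at "glibc-2.11.1/bar/foo.c" under the root "glibc-2.11.1" would be named "/bar/foo.c".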
  

5.2 Type

Purpose

This field identifies common types of files where there may be different treatment of copyright and license information: source, binary, machine generated, etc.

Intent

Here, this field is basically the "best available" format field, from a developer perspective.

Domain

File

Range

The value is one of the following strings: "source", "binary", "archive", or "other". Values of this property are case insensitive.

5.2.1 RDF/XML serialization

Example

      <Type>source</Type>
  

5.2.2 Tag-Value serialization


      filetype = "type:" *WSP ( "source" / "binary" / "archive" / "other" )
  
Example

      type: source
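
Because the values are case insensitive, a consumer will usually want to normalize them before comparing. A minimal sketch, with the hypothetical helper name normalize_file_type and the value list taken from the grammar above:

```python
FILE_TYPES = ("source", "binary", "archive", "other")


def normalize_file_type(value: str) -> str:
    """Lower-case a Type value and check it against the allowed strings."""
    lowered = value.strip().lower()
    if lowered not in FILE_TYPES:
        raise ValueError(f"unknown file type: {value!r}")
    return lowered
```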
  

5.3 License

Purpose

This field contains the license governing the file if it is known. It will either be explicit from the file header or other information found in the file’s source code or the default from the package. If no license information is found, it should be denoted as “NotSpecified”. If no license information can be determined, the license is denoted as “Unknown”. The licenses should use the standard short form names. See Appendix I for standardized license short forms. If a Detected License is not one of the standardized license short forms, this field must contain a reference to the full license text included in this SPDX file in section 4. If more than one license is detected in the file, then each should be listed. If any of the detected licenses offer the recipient a choice of licenses, then each of the choices will be declared as a “disjunctive” license.

Intent

Here, the intent is to have a uniform method to refer to each license with specificity to eliminate any license confusion. For example, the 3 clause BSD would have a different license identifier than the 4 clause BSD.

Domain

File

Range

The value is a LicensingInfo.

5.3.1 RDF/XML serialization

Example

      <License rdf:resource="http://spdx.org/licenses/LGPL-2.0"/>
  

5.3.2 Tag-Value serialization

<short form identifier in Appendix I> | "FullLicense"-N


      filelicense    = "License:" *WSP ( anyURI / spdxlicenseid / locallicenseid )
      spdxlicenseid  = *VCHAR
      locallicenseid = "FullLicense-" 1*DIGIT
  
Example

      License: LGPL-2.0
      License: FullLicense-1
  

5.4 Copyright

Purpose

This field identifies the copyright holders and associated dates of their copyright that are in this specific file if known. Note: Copyright holder identifier may have developer names, companies, email addresses, and may be specified in international character sets. This will be a freeform text field extracted from the package information files.

Intent

Here, similar to identifying the actual author(s) (above), by identifying the copyright holder(s), the copyright holder(s) may be contacted if licensing issues exist with the package, or to request distribution under another license more compatible with a

Domain

File

Range

The value is an xsd:string.

5.4.1 RDF/XML serialization

Example

        <Copyright>Copyright 2008-2010 John Smith</Copyright>
    

5.4.2 Tag-Value serialization


        filecopyright    = "Copyright:" *WSP *VUNICODECHAR
    
Example

        Copyright: Copyright 2008-2010 John Smith
    

5.5 SHA1

Purpose

Provide a unique identifier to match analysis information on specific files between packages.

Intent

Here, by providing a unique identifier of each file, confusion over which version/modification of a specific file the Identification Information references should be eliminated.

Domain

File

Range

The value is an xsd:string that is the hexadecimal encoded SHA1 hash of the file. The length of this value is 40 characters.

5.5.1 RDF/XML serialization

Example

      <SHA1>d6a770ba38583ed4bb4525bd96e50461655d2758</SHA1>
  

5.5.2 Tag-Value serialization


      sha1    = "SHA1:" *WSP 40HEXDIG
  
Example

      SHA1: d6a770ba38583ed4bb4525bd96e50461655d2758
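
The SHA1 tag is easy to validate because its value has a fixed length and alphabet. A short sketch follows; parse_sha1_line is a hypothetical name, and returning the digest in lower case is a choice made here to simplify comparisons, not a requirement of this specification.

```python
import re

# "SHA1:" followed by optional whitespace and exactly 40 hexadecimal digits.
SHA1_LINE = re.compile(r"SHA1:[ \t]*([0-9A-Fa-f]{40})")


def parse_sha1_line(line: str) -> str:
    """Return the digest from a SHA1 line, lower-cased for comparison."""
    match = SHA1_LINE.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"not a valid SHA1 line: {line!r}")
    return match.group(1).lower()
```

Digests normalized this way can serve as dictionary keys for matching analysis results on identical files across packages.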
  

6 LicensingInfo

Subclass of dc:LicenseDocument

A LicensingInfo resource represents a License or set of Licenses under which a File or Package may be copied.

7 DisjunctiveLicenseSet

Subclass of LicensingInfo

A DisjunctiveLicenseSet is a set of Licenses, or LicensingInfo resources, any one of which constitutes the complete terms and conditions for copying a File or Package. The copier/user of a File or Package may choose, at their discretion, which License in a disjunctive set applies.

Properties

7.1 Licenses

Purpose

This is a list of members of this set.

Intent

To provide a list of the members of this disjunctive set.

Domain

DisjunctiveLicenseSet

Range

The value is an rdf:List of LicensingInfo resources.

7.1.1 RDF/XML serialization

Example

<DisjunctiveLicenseSet>
  <Licenses rdf:parseType="Collection">
    <rdf:Description rdf:about="license:GPL" />
    <rdf:Description rdf:about="license:BSD" />
  </Licenses>
</DisjunctiveLicenseSet>
  

7.1.2 Tag-Value serialization


disjunctivelicenseset = "DisjunctiveLicenseSet(" *WSP [setmembership] *WSP ")"
setmembership         = licensinginfoid *(*WSP "," *WSP licensinginfoid)
licensinginfoid       = 1*(ALPHA / DIGIT / "-" / "_" / ".")
  
Example

DisjunctiveLicenseSet(GPLv2, BSD)
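
The set syntax above is regular, so a single expression is enough to extract the members. The following sketch is one way to do it; parse_disjunctive_set is a hypothetical name, and the empty set is accepted because setmembership is optional in the grammar.

```python
import re
from typing import List

# licensinginfoid = 1*(ALPHA / DIGIT / "-" / "_" / ".")
ID = r"[A-Za-z0-9._-]+"
DISJUNCTIVE_SET = re.compile(
    rf"DisjunctiveLicenseSet\([ \t]*(?P<members>{ID}(?:[ \t]*,[ \t]*{ID})*)?[ \t]*\)"
)


def parse_disjunctive_set(text: str) -> List[str]:
    """Return the member identifiers of a DisjunctiveLicenseSet, in order."""
    match = DISJUNCTIVE_SET.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a valid DisjunctiveLicenseSet: {text!r}")
    members = match["members"]
    if members is None:
        return []
    return [member.strip() for member in members.split(",")]
```

For the example above, this returns the two identifiers GPLv2 and BSD.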
  

8 ConjunctiveLicenseSet

Subclass of LicensingInfo

A ConjunctiveLicenseSet is a set of Licenses, or LicensingInfo resources, all of which together constitute the complete terms and conditions for copying a File or Package. The copier/user of a File or Package must comply with every License in a conjunctive set.

Properties

8.1 Licenses

Purpose

This is a list of members of this set.

Intent

To provide a list of the members of this conjunctive set.

Domain

ConjunctiveLicenseSet

Range

The value is an rdf:List of LicensingInfo resources.

8.1.1 RDF/XML serialization

Example

<ConjunctiveLicenseSet>
  <Licenses rdf:parseType="Collection">
    <rdf:Description rdf:about="license:GPL" />
    <rdf:Description rdf:about="license:BSD" />
  </Licenses>
</ConjunctiveLicenseSet>
  

8.1.2 Tag-Value serialization


conjunctivelicenseset = "ConjunctiveLicenseSet(" *WSP [setmembership] *WSP ")"
setmembership         = licensinginfoid *(*WSP "," *WSP licensinginfoid)
licensinginfoid       = 1*(ALPHA / DIGIT / "-" / "_" / ".")
  
Example

ConjunctiveLicenseSet(GPLv2, BSD)
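
The practical difference between the two set types can be illustrated with a small check. This sketch rests on an assumption about the intended semantics: a disjunctive set is satisfied by any one member, while a conjunctive set requires every member. The function name is_acceptable and the idea of an "accepted" set of identifiers are hypothetical, introduced only for illustration.

```python
from typing import Iterable, Set


def is_acceptable(kind: str, members: Iterable[str], accepted: Set[str]) -> bool:
    """Check a license set against the identifiers a recipient accepts."""
    members = list(members)
    if kind == "DisjunctiveLicenseSet":
        # The recipient may choose, so one acceptable member is enough.
        return any(member in accepted for member in members)
    if kind == "ConjunctiveLicenseSet":
        # All members apply, so every one of them must be acceptable.
        return all(member in accepted for member in members)
    raise ValueError(f"unknown license set type: {kind!r}")
```

With the members GPLv2 and BSD and a recipient that accepts only BSD, the disjunctive set passes and the conjunctive set does not. Note that an empty conjunctive set passes trivially in this sketch, while an empty disjunctive set never does.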