The following is a draft of the Software Package Data eXchange (SPDX) Specification. This draft is provided as a public courtesy to illustrate the progress of the project. Please be advised that in addition to the licensing terms recited below, no reliance should be made regarding this draft, including any reliance that this draft represents any particular standard for any particular purpose. When the official SPDX specification is released, it will be clearly labeled "official", and its purpose and recommended use will be clearly stated at that time.
Copyright © 2010 Linux Foundation and its Contributors. All other rights are expressly reserved.
With thanks to Andrew Back, Bill Schineller, Bruno Cornec, Ciaran Farrell, Daniel German, Debra McGlade, Eran Strod, Eric Thomas, Esteban Rockett, Gary O'Neall, Guillaume Rousseau, Jack Manbeck, Jeff Luszcz, John Ellis, Kate Stewart, Kim Weins, Mark Gisi, Marshall Clow, Martin Michlmayr, Martin von Willebrand, Michael J. Herzog, Michel Ruffin, Phil Robb, Philip Odence, Philip Koltun, Scott K Peterson, Shane Coughlan, Stuart Hughes, Tom Callaway, and Thomas F. Incorvia for their contributions and assistance.
Licensed under the Creative Commons Attribution License 3.0 (reproduced in Appendix III herein).
To create a set of data exchange standards that enable companies and organizations to share license and component information (metadata) for software packages and related content with the aim of facilitating license and other policy compliance.
The Software Package Data Exchange (SPDX™) specification is a standard format for communicating the components, licenses and copyrights associated with a software package. An SPDX file is associated with a particular software package and contains information about that package in the SPDX format.
Companies and organizations (collectively "Organizations") are widely using and reusing open source and other software packages. Compliance with the associated licenses requires a set of analysis activities and due diligence that each Organization performs independently, including a manual and/or automated scan of software and identification of associated licenses, followed by manual verification. Software development teams across the globe use the same open source packages, but little infrastructure exists to facilitate collaboration on the analysis or to share the results of these analysis activities. As a result, many groups are performing the same work, leading to duplicated efforts and redundant information being created. The SPDX working group seeks to create a data exchange format so that information about software packages and related content may be collected and shared in a common format, with the goal of saving time and improving data accuracy.
Metadata to associate analysis results with a specific package. This includes a unique identifier that identifies a specific instance of a package and permits correlation to the corresponding SPDX file.
Facts that are common properties of the entire package.
Facts (e.g. copyrights, licenses) that are specific to each file included in the package.
A list of common licenses likely to be encountered and a standardized naming convention for referring to these licenses within an SPDX document. This naming convention will also be the basis for extending this set of common licenses over time.
A set of mechanisms that permit extending the specification in a structured manner under specific future versions of the specification.
Information that cannot be derived from an inspection (whether manual or using automated tools) of the package to be analyzed.
How the data stored in an SPDX file is used by the recipient.
Identification of any patent(s) which may or may not relate to the package.
Legal interpretation of the licenses or any compliance actions that might need to be taken.
Must be in a human readable form.
Must be in a syntax that a software tool can read and write.
Must be suitable to be checked for syntactic correctness independent of how it was generated (human or tool).
The SPDX file character set must support UTF-8 encoding.
Must permit automated specification syntax validation.
Resource Description Framework (RDF) will be used to represent certain semantic information.
An SpdxDoc is a collection of license and copyright information regarding one or more packages and the files contained in those packages.
SPDXVersion
Cardinality: Mandatory, one (1).
CreatedBy
Cardinality: Mandatory, one (1) or more.
Created
Cardinality: Mandatory, one (1).
ReviewedBy
Cardinality: Optional, zero (0) or more.
Provide a reference number that can be used to understand how to parse and interpret the rest of the file. It will enable future changes to the specification while supporting backward compatibility. The version number consists of a Major and a Minor version indicator. The Major field will be incremented when one or more sections are created, modified or deleted. The Minor field will be incremented when either only fields within a section are modified or standard recognized licenses are added to the appendix.
Here, parties exchanging Identification Information in accordance with the SPDX specification need to provide 100% transparency as to which version of the SPDX specification such Identification Information conforms to.
The value of this property is fixed. The value of the SPDXVersion property for all datasets encoded in this version must be:
SPDX-1.0
<SPDXVersion>SPDX-1.0</SPDXVersion>
The SPDX version is serialized as a single tag with a string value. As noted above "SPDX-1.0" is the only allowed value at this time.
versiondecl = "SPDXVersion:" *WSP spdxversion
spdxversion = "SPDX-1.0"
SPDXVersion: SPDX-1.0
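As a minimal sketch of how a reader might apply the versiondecl rule above, the following Python function checks a tag-value line and returns the Major and Minor numbers. The function name parse_spdx_version is hypothetical, and capturing the two numbers separately is an assumption made for illustration; the grammar itself only permits the literal "SPDX-1.0".

```python
import re

# Mirrors the grammar: versiondecl = "SPDXVersion:" *WSP spdxversion
# WSP is a space or a horizontal tab. Splitting the value into two
# numbers is an illustrative assumption, not part of the grammar.
VERSION_DECL = re.compile(r"SPDXVersion:[ \t]*SPDX-(\d+)\.(\d+)")


def parse_spdx_version(line):
    """Return (major, minor) for a supported SPDXVersion line, else raise ValueError."""
    match = VERSION_DECL.fullmatch(line)
    if match is None:
        raise ValueError(f"not an SPDXVersion declaration: {line!r}")
    version = (int(match.group(1)), int(match.group(2)))
    if version != (1, 0):
        # The draft fixes the value to SPDX-1.0 for this version.
        raise ValueError(f"unsupported SPDX version: SPDX-{version[0]}.{version[1]}")
    return version
```

A tool would typically call this on the first line of a file and refuse to continue if it raises.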
Identify how the metadata in the SPDX file was generated. If it was generated manually, it should indicate who did the analysis. If the information in the file was generated with a software tool, the file should indicate an identifier and version for that tool.
Here, the generation method will assist the reader of the Identification Information in self determining the general reliability/accuracy of the Identification Information.
The value of CreatedBy may indicate a person, tool or organization.
company = "Company:" *WSP name *WSP ["(" email ")"]
tool = "Tool:" *WSP name
person = "Person:" *WSP name *WSP ["(" email ")"]
name = *VUNICODECHAR
email = 1*(ALPHA / DIGIT / "." / "-" / "_") "@" domainpart 1*("." domainpart)
domainpart = 1*(ALPHA / DIGIT / "-" / "_")
For example, "Person: John Doe (john.doe@example.com)" or "Company: My Company, Inc" (depending on which is accountable) or "Tool: spdx-gen 1.0".
<CreatedBy>Person: John Doe (john.doe@example.com)</CreatedBy>
CreatedBy is serialized as a single tag with a string value. If there are multiple participants in the analysis, each participant should be indicated with a separate entry.
createdby = "CreatedBy:" 1*(*WSP (company / tool / person))
CreatedBy: Person: John Doe (john.doe@example.com)
CreatedBy: Company: My Company, Inc.
CreatedBy: Tool: spdx-gen 1.0
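The createdby grammar above can be approximated with a regular expression. The sketch below is illustrative only: the function name parse_created_by and the returned dictionary keys are hypothetical, it handles one participant per line (as in the examples), and it is slightly more lenient than the grammar in that it would also accept an email after "Tool:".

```python
import re

# One participant per line: "CreatedBy:" *WSP (company / tool / person)
# The email pattern follows the email and domainpart rules of the grammar.
CREATED_BY = re.compile(
    r"CreatedBy:[ \t]*"
    r"(?P<kind>Person|Company|Tool):[ \t]*"
    r"(?P<name>.*?)[ \t]*"
    r"(?:\((?P<email>[A-Za-z0-9._-]+@[A-Za-z0-9_-]+(?:\.[A-Za-z0-9_-]+)+)\))?"
)


def parse_created_by(line):
    """Split a CreatedBy line into its kind, name and optional email."""
    match = CREATED_BY.fullmatch(line)
    if match is None:
        raise ValueError(f"not a CreatedBy entry: {line!r}")
    return {
        "kind": match.group("kind"),
        "name": match.group("name"),
        "email": match.group("email"),  # None when no "(email)" part is present
    }
```

For the first example line above, this yields the kind "Person", the name "John Doe" and the email "john.doe@example.com".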
Identify when the analysis was done and the SPDX file was originally created. This is to be specified as a combined date and time in UTC format, as specified in the ISO 8601 standard.
Here, the Time Stamp can serve as a verification as to whether the analysis needs to be updated.
<Created>2010-11-02T11:02:03-0600</Created>
The value of Created should be represented in UTC for the tag-value format. Numbers should be zero-padded to fit the expected field width.
created = "Created:" *WSP timestamp
timestamp = year "-" month "-" day "T" hour ":" minute ":" second "Z"
year = 4*4DIGIT
month = 2*2DIGIT
day = 2*2DIGIT
hour = 2*2DIGIT
minute = 2*2DIGIT
second = 2*2DIGIT
Created: 2010-11-02T17:02:03Z
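The relationship between a local time with an offset and the zero-padded UTC form required by the timestamp rule can be sketched in Python as follows. The function names to_created_value and parse_created_value are hypothetical; the explicit regular expression is there because datetime.strptime alone would also accept fields that are not zero-padded.

```python
import re
from datetime import datetime, timedelta, timezone

# timestamp = year "-" month "-" day "T" hour ":" minute ":" second "Z"
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z")


def to_created_value(moment):
    """Format a timezone-aware datetime as the UTC timestamp used by Created."""
    if moment.tzinfo is None:
        raise ValueError("a timezone-aware datetime is required")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def parse_created_value(text):
    """Parse a 'YYYY-MM-DDThh:mm:ssZ' timestamp into an aware UTC datetime."""
    if TIMESTAMP.fullmatch(text) is None:
        raise ValueError(f"not a zero-padded UTC timestamp: {text!r}")
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


# The value 2010-11-02T11:02:03-0600 expressed in UTC:
local = datetime(2010, 11, 2, 11, 2, 3, tzinfo=timezone(timedelta(hours=-6)))
print(to_created_value(local))  # 2010-11-02T17:02:03Z
```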
List of the people who have reviewed the SPDX file and the date of that review. Note that there is no requirement for a particular reviewer to add their name to the file, however it may be important for participants in the software supply chain to validate whether upstream providers have reviewed the SPDX file. This can be considered as an equivalent to “signed off” or “reviewed by”. Additional reviewers can be added after the original version of the SPDX file is created and can occur as an append to the original file.
Here, as time progresses certain reviewers will begin to gain credibility as reliable. This field intends to make such information transparent.
<ReviewedBy>John Doe (john.doe@example.com) 2010-11-02T17:02:03Z</ReviewedBy>
ReviewedBy uses the same person format as CreatedBy but adds a timestamp, like the value of Created, to indicate when the review occurred.
reviewedby = "ReviewedBy:" person *WSP timestamp
ReviewedBy: John Doe (john.doe@example.com) 2010-11-02T17:02:03Z
A package is an artifact of a software development effort. It is generally a snapshot of the source code at a particular point in time or a mechanism to be used to install the software. A package can contain sub-packages, but the overview is a reference to the entire contents of the package listed.
DeclaredName
Cardinality: Mandatory, one (1).
MachineName
Cardinality: Mandatory, one (1).
SHA1
Cardinality: Mandatory, one (1).
DownloadURL
Cardinality: Optional, one (1).
SourceInfo
Cardinality: Optional, one (1).
DeclaredLicense
Cardinality: Optional, zero (0) or more.
DetectedLicense
Cardinality: Optional, zero (0) or more.
DeclaredCopyright
Cardinality: Optional, zero (0) or more.
ShortDesc
Cardinality: Optional, one (1).
Description
Cardinality: Optional, one (1).
Full name of the package as given by the originator, with version information if available.
Here, the formal name of each package is an important conventional technical identifier to be maintained for each package.
The value is an xsd:string that contains the full name, including version information if known, of the package.
<DeclaredName>glibc 2.11.1</DeclaredName>
DeclaredName: glibc 2.11.1
File name of package instance.
Here, the actual filename of the compressed file (containing the package) is a significant technical element that needs to be carried with each package's Identification Information.
The value is an xsd:string that is the machine-generated file name. It typically includes the version and the packaging and compression methods used.
<MachineName>glibc-2.11.1.tar.gz</MachineName>
MachineName: glibc-2.11.1.tar.gz
This field provides an independently reproducible mechanism that permits unique identification of a specific package that correlates to the data in this SPDX file. This identifier enables a recipient to determine if any file in the original package has been changed. The SHA1 unique identifier mechanism will be used.
Here, by providing a unique identifier of each package, confusion over which version/modification of a specific package the Identification Information references should be eliminated.
The value is an xsd:string that is the hexadecimal encoded SHA1 hash of the package file. The length of this value is 40 characters.
<SHA1>d6a770ba38583ed4bb4525bd96e50461655d2758</SHA1>
SHA1: d6a770ba38583ed4bb4525bd96e50461655d2758
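A recipient can reproduce the package identifier independently by hashing the package file. The following is a minimal sketch using Python's standard hashlib module; the function name package_sha1 and the chunk size are illustrative choices, not part of the specification.

```python
import hashlib


def package_sha1(path, chunk_size=65536):
    """Return the 40-character hexadecimal SHA1 digest of the file at path."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # Read in chunks so that large package archives need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Comparing the returned string with the SHA1 value in the SPDX file indicates whether the package at hand is the one that was analyzed.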
This field identifies the download Uniform Resource Locator (URL) for this package at the time that the SPDX file was initially created and the analysis was done. If there is no public URL, then it is explicitly marked as “unknown”.
Here, where to download the exact package being referenced is a critical verification and tracking datum.
The value is an xsd:anyURI. This URI should be the publicly reachable location from which the package was acquired.
<DownloadURL>http://ftp.gnu.org/gnu/glibc/</DownloadURL>
URL: http://ftp.gnu.org/gnu/glibc/
This is a free form text field that contains additional comments about the origin of the package. For instance, this field might include comments indicating whether the package has been pulled from a source code management system or has been repackaged.
Here, by providing a freeform field, reviewers can provide any additional information to describe any anomalies, or discoveries, in the determination of the origin of the package.
The value is an xsd:string.
<SourceInfo>This package was acquired from the standard location for glibc.</SourceInfo>
SourceInfo: This package was acquired from the standard location for glibc.
This field lists the licenses that have been declared by the creators of the package. The licenses are listed using a standard short form name. See Appendix I for standardized license short forms. If a Declared License is not one of the standardized license short forms, this field must contain a reference to the full license text included in this SPDX file. If no license is declared, the “NotSpecified” short form in Appendix I should be used. If it is unclear if there is a license reference or not, use the “Unknown” short form from Appendix I. If more than one license is formally declared for the package, then each should be listed. If the creators of the package offer the recipient a choice of licenses, then each of the choices will be declared as a “disjunctive” license.
This is simply the license identified in text in the actual package source code files (typically in the header of each package file.) This field may have multiple declared licenses, if multiple licenses are declared at the package level.
The value is a LicensingInfo.
<DeclaredLicense rdf:resource="http://spdx.org/licenses/LGPL-2.0"/>
<short form identifier in Appendix I> | "FullLicense"-N
DeclaredLicense: LGPL-2.0
DeclaredLicense: FullLicense-1
This field lists all licenses found in the package by scanning the files, manually or using automated tools. The licenses should use the standard short form names. See Appendix I for standardized license short forms. If a Detected License is not one of the standardized license short forms, this field must contain a reference to the full license text included in this SPDX file. If no licenses are detected, then the “NotSpecified” short form in Appendix I should be used. If it is unclear if there is a license reference or not, use the “Unknown” short form from Appendix I.
Here, we intend to capture all licenses within the package detected by any scanning tools used by the package reviewer.
The value is a LicensingInfo.
<DetectedLicense rdf:resource="http://spdx.org/licenses/LGPL-2.0"/>
<short form identifier in Appendix I> | "FullLicense"-N
DetectedLicense: LGPL-2.0
DetectedLicense: FullLicense-1
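The three kinds of license value described above (a standardized short form, one of the special short forms, or a "FullLicense"-N reference) can be told apart mechanically. The sketch below is illustrative: the function name classify_license and the category labels are hypothetical, and STANDARD_SHORT_FORMS is a placeholder holding only the identifier used in the examples, since Appendix I is not reproduced here.

```python
import re

# Placeholder: a real tool would load the full list from Appendix I.
STANDARD_SHORT_FORMS = {"LGPL-2.0"}
# Short forms the text above names for "no license" and "unclear" cases.
SPECIAL_SHORT_FORMS = {"NotSpecified", "Unknown"}
# Reference to full license text carried in the SPDX file itself.
LOCAL_LICENSE = re.compile(r"FullLicense-\d+")


def classify_license(value):
    """Return 'special', 'standard' or 'local' for a license field value."""
    if value in SPECIAL_SHORT_FORMS:
        return "special"
    if value in STANDARD_SHORT_FORMS:
        return "standard"
    if LOCAL_LICENSE.fullmatch(value):
        return "local"
    raise ValueError(f"unrecognized license identifier: {value!r}")
```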
Identify the author(s) and licensor(s) of the package, as well as any dates present. This will be a free form text field extracted from the package information files.
Here, by identifying the actual author(s), some ambiguities, e.g., under which license the author(s) were intending to license the package, may be resolvable by knowing who to contact for clarity.
The value is an xsd:string.
<DeclaredCopyright>Copyright 2008-2010 John Smith</DeclaredCopyright>
DeclaredCopyright: Copyright 2008-2010 John Smith
This field is a short description of the package.
Here, the intent is to allow a reader/reviewer of this field to quickly understand the function/use of the package, at a high level, without having to parse the source code of the actual package.
The value is an xsd:string.
<ShortDesc>gnu c library</ShortDesc>
ShortDesc: gnu c library
This field is a more detailed description of the package. It may be extracted from the package itself.
Here, the intent is to provide technical readers/reviewers with a detailed technical explanation of the functionality, anticipated use, and anticipated implementation of the package. This field may also include a description of improvements over prior versions of the package, where applicable.
The value is an xsd:string.
<Description>…</Description>
Description: …
Subclass of LicensingInfo
This section is used for any detected or declared licenses that are NOT one of the standard licenses. One instance should be created for every unique license detected in the package that does not match one of the standard license short forms from Appendix I. Each license instance should have the following fields.
LicenseID
Cardinality: Mandatory, one (1).
LicenseText
Cardinality: Mandatory, one (1).
Provide a unique identifier that the package and file sections can use to refer to non-standard license text detected in the package and reproduced here to aid analysis.
Here, we seek to identify in whole or in part portions of the package which are licensed under unfamiliar and/or uncommon licenses.
The value is an xsd:string.
<LicenseID>FullLicense-1</LicenseID>
LicenseID: FullLicense-1
Provide a copy of the actual text of the license extracted from the package that is associated with the License ID to aid in analysis.
Here, the actual license text included in the package serves as confirmation that the license, and version, named in the header files of the package is correct.
The value is an xsd:string.
<LicenseText>...</LicenseText>
LicenseText: ...
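Because package and file sections refer to non-standard licenses by LicenseID, a consumer may want to confirm that every such reference has a matching LicenseID and LicenseText entry. The following is a minimal sketch under stated assumptions: the function name missing_license_texts is hypothetical, and the defined licenses are assumed to have been collected into a dictionary that maps each LicenseID to its LicenseText.

```python
def missing_license_texts(referenced_ids, defined_licenses):
    """Return the FullLicense-N identifiers that are referenced but not defined.

    referenced_ids: license values gathered from package and file sections.
    defined_licenses: mapping of LicenseID to LicenseText (an assumed structure).
    """
    return sorted(
        license_id
        for license_id in set(referenced_ids)
        # Only local references need a definition in this SPDX file.
        if license_id.startswith("FullLicense-") and license_id not in defined_licenses
    )
```

An empty result means every local license reference can be resolved within the same SPDX file.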
A file is a series of related octets that will be placed in a single file when the package is installed or unpacked. One File resource should be created for each file in a Package.
Identify path to file that corresponds to this information.
Here, any confusion over where a file needs to hierarchically be placed for proper functionality is mitigated.
The value is a relative file URL. The package should be treated as the root directory. This means all file names start with a "/" character.
<Name>/var/foo.c</Name>
filename = "name:" *WSP "/" *VUNICODECHAR
name: /bar/foo.c
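Treating the package as the root directory means that the name value is the file's path relative to wherever the package was unpacked, prefixed with "/". A minimal sketch using Python's pathlib follows; the function name spdx_file_name and the unpack directory in the usage line are hypothetical.

```python
from pathlib import PurePosixPath


def spdx_file_name(package_root, file_path):
    """Return file_path expressed relative to package_root, with a leading '/'.

    Raises ValueError if file_path does not lie under package_root.
    """
    relative = PurePosixPath(file_path).relative_to(PurePosixPath(package_root))
    return "/" + relative.as_posix()


# Hypothetical unpack location for the glibc example package:
print(spdx_file_name("/tmp/glibc-2.11.1", "/tmp/glibc-2.11.1/bar/foo.c"))  # /bar/foo.c
```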
This field identifies common types of files where there may be different treatment of copyright and license information: source, binary, machine generated, etc.
Here, this field is basically the "best available" format field, from a developer perspective.
The value is one of the following strings. Values of this property are case insensitive.
<Type>source</Type>
filetype = "type:" *WSP ( "source" / "binary" / "archive" / "other" )
type: source
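Since the values are case insensitive, a tool will usually normalize them before comparison. The following is a small sketch; the function name normalize_file_type is hypothetical, and stripping surrounding whitespace is an added convenience rather than something the grammar requires.

```python
# The four values permitted by the filetype rule above.
FILE_TYPES = ("source", "binary", "archive", "other")


def normalize_file_type(value):
    """Return the lower-case form of a file type, or raise ValueError if it is not allowed."""
    lowered = value.strip().lower()
    if lowered not in FILE_TYPES:
        raise ValueError(f"unknown file type: {value!r}")
    return lowered
```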
This field contains the license governing the file if it is known. It will either be explicit from the file header or other information found in the file’s source code or the default from the package. If no license information is found it should be denoted as “NotSpecified”. If no license information can be determined, the license is denoted as “Unknown”. The licenses should use the standard short form names. See Appendix I for standardized license short forms. If a Detected License is not one of the standardized license short forms, this field must contain a reference to the full license text included in this SPDX file in section 4. If more than one license is detected in the file, then each should be listed. If any of the detected licenses offer the recipient a choice of licenses, then each of the choices will be declared as a “disjunctive” license.
Here, the intent is to have a uniform method to refer to each license with specificity to eliminate any license confusion. For example, the 3 clause BSD would have a different license identifier than the 4 clause BSD.
The value is a LicensingInfo.
<License rdf:resource="http://spdx.org/licenses/LGPL-2.0"/>
<short form identifier in Appendix I> | "FullLicense"-N
filelicense = "License:" *WSP ( anyURI / spdxlicenseid / locallicenseid )
spdxlicenseid = *VCHAR
locallicenseid = "FullLicense-" 1*DIGIT
License: LGPL-2.0
License: FullLicense-1
This field identifies the copyright holders and associated dates of their copyright that are in this specific file if known. Note: Copyright holder identifier may have developer names, companies, email addresses, and may be specified in international character sets. This will be a freeform text field extracted from the package information files.
Here, similar to identifying the actual author(s) (above), by identifying the copyright holder(s), the copyright holder(s) may be contacted if licensing issues exist with the package, or to request distribution under another license more compatible with a
The value is an xsd:string.
<Copyright>Copyright 2008-2010 John Smith</Copyright>
filecopyright = "Copyright:" *WSP *VUNICODECHAR
Copyright: Copyright 2008-2010 John Smith
Provide a unique identifier to match analysis information on specific files between packages.
Here, by providing a unique identifier of each file, confusion over which version/modification of a specific file the Identification Information references should be eliminated.
The value is an xsd:string that is the hexadecimal encoded SHA1 hash of the file. The length of this value is 40 characters.
<SHA1>d6a770ba38583ed4bb4525bd96e50461655d2758</SHA1>
sha1 = "SHA1:" *WSP 40*40HEXDIG
SHA1: d6a770ba38583ed4bb4525bd96e50461655d2758
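The sha1 rule above can be checked with a short regular expression. This is a sketch only: the function name parse_sha1_line is hypothetical, and returning the digest in lower case is a normalization choice made here so that values can be compared directly, not a requirement of the draft.

```python
import re

# "SHA1:" followed by optional spaces or tabs and exactly 40 hexadecimal digits.
SHA1_LINE = re.compile(r"SHA1:[ \t]*([0-9A-Fa-f]{40})")


def parse_sha1_line(line):
    """Return the lower-cased 40-character digest from a SHA1 tag-value line."""
    match = SHA1_LINE.fullmatch(line)
    if match is None:
        raise ValueError(f"not a valid SHA1 line: {line!r}")
    return match.group(1).lower()
```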
Subclass of dc:LicenseDocument
A LicensingInfo resource represents a License or set of Licenses under which a File or Package may be copied.
Subclass of LicensingInfo
A LicenseOrLaterVersion is a disjunctive set of Licenses. Its members are the specified BaseLicense, as well as any future versions of that License.
BaseLicense
Cardinality: Mandatory, one (1).
This is the base license of this set. The full set of members is calculated to include the BaseLicense plus any other licenses which are later versions of this license.
To provide a set of licenses whose members can change over time as new versions of existing licenses are released.
The value is a License.
<LicenseOrLaterVersion>
<BaseLicense rdf:about="License:GPLv2"/>
</LicenseOrLaterVersion>
licenseorlaterversion = "LicenseOrLaterVersion(" *WSP licensinginfoid *WSP ")"
LicenseOrLaterVersion(GPLv2)
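Extracting the base license from the tag-value form can be sketched as follows. The licensinginfoid rule is not defined in this section, so the sketch assumes, purely for illustration, that an identifier contains no whitespace or parentheses; the function name parse_license_or_later is likewise hypothetical.

```python
import re

# "LicenseOrLaterVersion(" *WSP licensinginfoid *WSP ")"
# Assumption: the identifier has no whitespace or parentheses.
OR_LATER = re.compile(r"LicenseOrLaterVersion\([ \t]*([^\s()]+)[ \t]*\)")


def parse_license_or_later(text):
    """Return the base license identifier of a LicenseOrLaterVersion expression."""
    match = OR_LATER.fullmatch(text)
    if match is None:
        raise ValueError(f"not a LicenseOrLaterVersion expression: {text!r}")
    return match.group(1)
```

Determining which licenses count as later versions of the base license is a separate step that this sketch does not attempt.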