THE SPDX WIKI IS NO LONGER ACTIVE. ALL CONTENT HAS BEEN MOVED TO https://github.com/spdx

Technical Team/SPDX Specification Versions

For anyone wanting to add comments/questions/etc. directly in the document, so they get tracked without having to do a lot of version reference, please put your comments on a new line and use the following syntax:

(yyyymmdd initials comments)  

for example:

(20100407 KS does this make sense?)

 

 

SPDX Specification Version: DRAFT 20100505

(20100211 KS The intent from the discussions so far is that this document is licensed under the CC-BY (allow derivatives) - where should we add license text? ) RESOLUTION: CC license has text; put at the start of doc. Create/publish delimiter formatting - Rockett

(20100422 CALL  List of authors for CC license will be folks on thread and are registered users of the Wiki. Martin will need to update as new folks register. Martin- Are you OK with this? Also, please send initial list to Rockett.)

(20100210 MVW Package specific v. Project Specific - One aspect that may need some consideration is whether some part of the data is project specific (i.e. specific to many packages of the same project). However, the current standard seems to have – with first view - only package specific data.)  RESOLUTION: Need more info from MVW

(20100310 JM General – Is the Package Facts file licensed (the use of a Formal Copyright Holder in 3.2.7 seems to imply that)? If so, do we want to say it should be under the same license as the specification?  I like the idea of a permissive license or possibly even public domain. However, we could allow people to license the file or not license it according to their project or tastes. My only concern with this is the inevitable are these licenses compatible mess if I try to take 10 or 20 of these files and roll them up into (or even take info from them) a nice neat re-distributable document. I would suggest at a minimum, if someone can license this content that we have a block to capture it.) RESOLUTION: Add boilerplate disclaimer with regard to copyright of instance of facts file. Creative C header file including authorship assignment. Plan is to preserve as sidecar, but it may make sense to create a version of SPDX for inclusion in package (issue is checksum...should it be optional?). - Rockett  Set up call with CC folks with regards to boilerplate - Bradley

(20100310 JM General – Do we want a way for people to extend the format of this file and if so in a controlled way? Do we want people to add new fields in any section they wish? What if someone takes a package then modifies it and re-distributes it? Would they add or remove from the package facts? Would it be worthwhile to capture that delta in its own section? I noticed that we want the document to be signed so maybe we don’t envision it being modified in this way?) (20100407 KS first entry in file is version field for specification itself - 2.2.1 is meant to handle this,  is something else needed? ) RESOLUTION:  Agreed: Extensibility is desirable. Propose mechanism to mailing list - Philippe, Michael, and Gary. GARY AND PHILIPPE HAVE BEEN GOING BACK AND FORTH. USING RDF VOCABULARIES. (SIMPLE XML-BASED FORMAT FOR DEFINING VOCABULARY. www.w3.org/rdf) CLOSE TO READY FOR SHARING PROPOSAL. 

Provide ITU/IETF examples to the list- John. Jack will create an appendix on use of spec including aggregation and history.

(20100415 FacetoFace General- Should we extend spec to describe streams and usage of streams? Only at file level?) RESOLUTION:  Assess whether extensibility proposal accommodates. Jeff L IN PROCESS

Signed off:

(Approved for release by active participants in this specification effort, as indicated by name and email id)

1. Rationale

1.1. Charter

Create a set of data exchange standards to enable companies and organizations to share license and component information (metadata) for software packages and related content with the aim of facilitating license and other policy compliance.

1.2. Why is a common format for data exchange needed?

Companies and organizations (collectively “Organizations”) are widely using and reusing open source and other software packages. Compliance with the associated licenses requires a set of due diligence activities that each Organization performs independently: a manual and/or automated scan of software and identification of associated licenses, followed by manual verification. Software development teams across the globe use the same open source packages, but they have not yet set up a way to collaborate on license discovery, so many groups perform the same work and duplicate each other's effort. This working group seeks to create a data exchange format so that information about software packages and related content may be collected and shared in a common format, with the goal of saving time and improving data accuracy.

1.3. What does this specification cover?

1.3.1. Identification Information: Metadata to associate analysis results with a specific package. This includes a unique identifier to permit correlation of a specific instance of this data with a specific package.

1.3.2. Overview Information: Facts that are common properties for the entire package.

1.3.3. File Specific Information: Facts that are specific to each file (copyrights, licenses) that are included in the package.

1.3.4. Common Licenses: standardized way of referring to the common licenses likely to be encountered.

(20100415 Face to Face - Add proposed text regarding extensibility.) - MichaelH

1.4. What is not covered in the written standard?

1.4.1. Information that cannot be derived from a visual inspection of the package to be analyzed. 

1.4.2. How the data stored in this file format is used. 

1.4.3. Any identification of any patent(s) which may or may not relate to the package.

1.5. Format Requirements:

1.5.1. Needs to be in human readable form.

1.5.2. Needs to be a syntax that tools can read and write.

1.5.3. Needs to be suitable to be checked for syntactic correctness independent of how it was generated (human or tool).

1.5.4. Character set to be used in the SPDX file shall support UTF-8 encoding. (Philippe to provide example of potentially problematic issue with filepath.)

1.5.5. Discussion: XML vs. simple text to represent fields. The extent to which the file must be human understandable without a tool still needs to be discussed. Deferred to mailing list. Philippe to write up DOAP alternative.

2. Identification Information

2.1. One instance per package instance.

2.2. Fields:

2.2.1. SPDX Specification Version Number

2.2.1.1. Purpose: Identify the version of the SPDX specification to use when parsing the rest of the file. This will permit future changes to the specification while retaining backwards compatibility.

(20100415 F2F Generate language for meaning of major/minor versions - Jack)

2.2.1.2. Intent: Here, parties exchanging Identification Information in accordance with SPDX need to provide 100% transparency as to which version of the SPDX specification such Identification Information conforms to.

2.2.1.3. Tag: "SPDXversion"

2.2.1.4. Data Format:  SPDX-N.N  

where: N is [0-9].

2.2.1.5. Example: SPDXversion: SPDX-1.0
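
Illustration (not part of the specification): a minimal Python sketch of how a tool might read the SPDXversion field in the draft format of 2.2.1.4. The function name `parse_spdx_version` is an assumption made for this example, and the sketch assumes each N is exactly one digit, as the draft currently states.

```python
import re

# Mirrors the draft format "SPDX-N.N" where N is a single digit [0-9].
_SPDX_VERSION = re.compile(r"^SPDXversion:\s*SPDX-([0-9])\.([0-9])$")


def parse_spdx_version(line):
    """Return (major, minor) as integers, or raise ValueError.

    Hypothetical helper for illustration only.
    """
    match = _SPDX_VERSION.match(line.strip())
    if match is None:
        raise ValueError(f"not a valid SPDXversion line: {line!r}")
    return int(match.group(1)), int(match.group(2))
```

A reader would typically call this on the first field of the file and choose parsing rules based on the returned pair.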

2.2.2. Unique Identifier

2.2.2.1. Purpose: Provide an agreed, independently reproducible mechanism that uniquely identifies the specific package this data describes. It must make it possible to verify whether any file in the original package has been changed.

(20100422 - Push issue to mail list. Michael and Philippe to propose solution incorporating such concerns as crypto export, internationalization and line endings. )

2.2.2.2. Intent: Here, by providing a unique identifier for each package, confusion over which version/modification of a specific package the Identification Information references should be eliminated.

2.2.2.3. Tag: "UniqueID"

2.2.2.4. Data Format: ?

2.2.2.5. Example: UniqueID: ?

 

2.2.3. Generation Method

2.2.3.1. Purpose: identify how this information was generated. If manual – who, if tool – identifier and version.

2.2.3.2. Intent: Here, the generation method will assist the reader of the Identification Information in self determining the general reliability/accuracy of the Identification Information.

2.2.3.3. Tag: "CreatedBy"

2.2.3.4. Data Format: "Person: person name" or "Company: company" (depending on which is accountable) or "Tool: tool identifier - version".

2.2.3.5. Examples: CreatedBy: Person: Kim Weins
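
Illustration (not part of the specification): a minimal Python sketch of splitting a CreatedBy value into its kind and detail, following the three options in 2.2.3.4. The function name `parse_created_by`, the dictionary keys, and the tool name `ExampleScanner` used in the usage note are assumptions made for this example. Multiple contributors are still under discussion and are not handled.

```python
_CREATOR_KINDS = ("Person", "Company", "Tool")


def parse_created_by(line):
    """Parse 'CreatedBy: <kind>: <detail>' into a dictionary.

    Hypothetical helper for illustration only.
    """
    tag, sep, value = line.partition(":")
    if not sep or tag.strip() != "CreatedBy":
        raise ValueError(f"not a CreatedBy line: {line!r}")
    kind, sep, detail = value.partition(":")
    kind, detail = kind.strip(), detail.strip()
    if not sep or kind not in _CREATOR_KINDS or not detail:
        raise ValueError(f"unrecognized CreatedBy value: {value!r}")
    result = {"kind": kind, "name": detail}
    if kind == "Tool":
        # Draft format for tools is "tool identifier - version".
        name, dash, version = detail.rpartition(" - ")
        if dash:
            result["name"] = name.strip()
            result["version"] = version.strip()
    return result
```

For example, `parse_created_by("CreatedBy: Tool: ExampleScanner - 1.2")` separates the tool identifier from its version, while the Person and Company forms carry only a name.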

(20100415 RESOLUTION: Push to mailing list. Kim to propose language. Concept: Include who worked on it and what tools were used.  20100505: Kate proposed CreatedBy Tag & some data format options above,  need to discuss,  as well as syntax still for supporting multiple contributors (people, tools, etc.), some variation on and/or.  Or should we just consider that person who created file is listed, and drop company/tool/etc. )

(20100505 -KS do we want to include email information? )

2.2.4. Creation Time Stamp

2.2.4.1. Purpose: Identify when the analysis was done. This is to be specified as a combined date and time in UTC, as specified in the ISO 8601 standard.

2.2.4.2. Intent: Here, the Time Stamp can help verify whether the analysis needs to be updated. For example, a court holding may require a different reading of a particular license identification after a certain fixed date.

2.2.4.3. Tag: "Created"

2.2.4.4. Data Format: YYYY-MM-DDThh:mm:ssZ

where: YYYY is year, MM is month with leading zero, DD is day with leading zero, T is delimiter for time, hh is hours with leading zero in 24 hour time, mm is minutes with leading zero, ss is seconds with leading zero, and Z is universal time indicator.

2.2.4.5. Example: Created: 2010-01-29T18:30:22Z
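
Illustration (not part of the specification): a minimal Python sketch of reading and writing the Created field in the YYYY-MM-DDThh:mm:ssZ form of 2.2.4.4, using only the standard library. The function names `parse_created` and `format_created` are assumptions made for this example.

```python
import re
from datetime import datetime, timezone

# Enforce leading zeros explicitly; strptime alone would accept "2010-1-29".
_CREATED = re.compile(r"^Created:\s*(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z)$")
_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parse_created(line):
    """Return a timezone-aware UTC datetime, or raise ValueError."""
    match = _CREATED.match(line.strip())
    if match is None:
        raise ValueError(f"not a valid Created line: {line!r}")
    return datetime.strptime(match.group(1), _FORMAT).replace(tzinfo=timezone.utc)


def format_created(moment):
    """Render a timezone-aware datetime as a Created line in UTC."""
    return "Created: " + moment.astimezone(timezone.utc).strftime(_FORMAT)
```

Converting to UTC before formatting keeps the trailing Z truthful even when the caller supplies a datetime in another time zone.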

 

2.2.5. Review

2.2.5.1. Purpose: reviewers of tool result, or other reviewer of original – equivalent to “signed off” or “reviewed by”.

(20100310 JM 2.2.5 – This one makes me a little nervous. If someone puts something there what does it mean? Have they verified all the information is factual? Independent Audit implies to me that someone other than the Package creator or even the project (?) has looked at this and said the information is <?>. )

2.2.5.2. Intent: Here, as time progresses, certain reviewers will begin to gain credibility as reliable. This field intends to make such information transparent.

2.2.5.3. Tag: "ReviewedBy"

2.2.5.4. Data Format: "Person: person name"

2.2.5.5. Example: ReviewedBy: Person: Bradley Kuhn

(20100415 F2F RESOLUTION: Bradley K to rework section including such issues as defining "reviewed by," date, partial review or caveats, etc. Make optional field and include multiple, down the supply chain reviewed bys.)

3. Common Overview Information

(20100210 MVW License Applied by Project - At Validos, we record also the license applied by the project from the projects website and then (store that information as a pdf-printout and) compare that information with the package information. Sometimes the package doesn’t have any information,(For conflicts, we use a set of “approved conclusions”.) Consider adding a section where there is the url of the page of the project’s statement on its license and even add a separate pdf-printout to the metadata info?) RESOLUTION: No longer applicable unless MVW says otherwise.

3.1. One instance per package instance

3.2. Fields:

3.2.1. Formal Name

3.2.1.1. Purpose: Full name of package as given by originator with version information included if available.

3.2.1.2. Intent: Here, the formal name of each package is an important conventional technical identifier to be maintained for each package.

3.2.1.3. Tag: "DeclaredName"

3.2.1.4. Data Format: <text string of full name> <version info if available>

3.2.1.5. Example: DeclaredName: glibc 2.11.1

3.2.2. Specific Package File Name

3.2.2.1. Purpose: File name of package instance.

3.2.2.2. Intent: Here, the actual filename of the compressed file (containing the package) is a significant technical element that needs to be carried with each package's Identification Information.

3.2.2.3. Tag: "FileName"

3.2.2.4. Data Format: identifier

where identifier is the machine-generated file name, which typically includes the version and the packaging and compression methods used.

3.2.2.5. Examples: FileName: glibc-2.11.1.tar.gz

3.2.3. Download URL

3.2.3.1. Purpose: Identify the exact URL from which the original version of this package can be downloaded (at time of analysis).

3.2.3.2. Intent: Here, where to download the exact package being referenced is a critical verification and tracking datum.

3.2.3.3. Tag: "URL"

3.2.3.4. Data Format: URL or "unknown"

3.2.3.5. Example: URL: http://ftp.gnu.org/gnu/glibc/
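
Illustration (not part of the specification): a minimal Python sketch of accepting either a URL or the literal "unknown" for this field. The function name `parse_download_url` is an assumption made for this example, and the check that a URL has both a scheme and a host is a choice made here, not something the draft requires.

```python
from urllib.parse import urlsplit


def parse_download_url(line):
    """Return the URL string, or None when the value is "unknown".

    Hypothetical helper for illustration only.
    """
    tag, sep, value = line.partition(":")
    if not sep or tag.strip() != "URL":
        raise ValueError(f"not a URL line: {line!r}")
    value = value.strip()
    if value == "unknown":
        return None
    parts = urlsplit(value)
    # Assumed rule: require a scheme (e.g. http) and a network location.
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"neither a URL nor 'unknown': {value!r}")
    return value
```

Mapping "unknown" to `None` keeps the two cases distinct for callers without needing a second field.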

(20100414  F2F Debra M- Begin section on Wiki cataloging various SPDX doc use cases. IN PROCESS)

(20100414 F2F Philippe, David M, Bradley taking a stab at developing tool that utilizes format.)

(20100506 BillS- Concern raised that standard XML checkers would have a hard time validating a field with URL or string="unknown".)

3.2.4. Source Additional Information

3.2.4.1. Purpose: Freeform source commentary to add clarity to origins. For instance, whether it has been pulled from SCM or has been repackaged.

3.2.4.2. Intent:

(20100505 KS - Rockett can you fill this in? )

3.2.4.3. Tag: "SourceInfo"

3.2.4.4. Data Format: <string without line separator>

3.2.4.5. Example: SourceInfo: use glibc-2_11-branch from git://sourceware.org/git/glibc.git.

3.2.5. Declared License for Package

3.2.5.1. Purpose: Use a standard way of referring to a license and its version. See Appendix I for standardized license short forms. If more than one license is in effect, list the license the package defaults to and indicate that an alternate license is present.

3.2.5.2. Intent: This is simply the license identified in text in the actual package source code files (typically in the header of each package file.)  This field may have multiple declared licenses, if multiple licenses are recited in the source code files of the package.

(20100505 KS Rockett - should the wording be "This field may have multiple declared licenses, if multiple licenses are declared at the package level.")

3.2.5.3. Tag: "DeclaredLicense"

3.2.5.4. Data Format: <short form identifier> | "FullLicense"-N

3.2.5.5. Example: DeclaredLicense: GPL2.0

(20100415 F2F Kim will revise 3.2.5 section and Appendix I based on comments received on short form terminology that merges best from Debian and Fedora)

3.2.6. License(s) Present

3.2.6.1. Purpose: list of all licenses found in files in package by scanning

3.2.6.2. Intent: Here, we intend to capture additional licenses under which the package is licensed.  The license(s) for this field are licenses which are not visibly identifiable in the actual source code, but rather identified by other means, e.g., scanning tools, by the reviewer.

(20100415 F2F Rockett - Update intent.)

3.2.6.3. Tag: "DetectedLicense"

3.2.6.4. Data Format: ((<short form identifier> | "FullLicense"-N)",")* 

where the value is either a single license identifier or a comma separated list of identifiers. Each identifier can be a short form identifier from Appendix I or a reference to the section of Full License texts, which are included in numerical order of detection (each unique text denoted by N).

3.2.6.5. Example: DetectedLicense: GPL2.0, FullLicense-1, FullLicense-2, GPL2.0+
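
Illustration (not part of the specification): a minimal Python sketch of splitting a DetectedLicense value into short form identifiers and FullLicense-N references. The function name `parse_detected_licenses` and the labels "short_form" and "full_text" are assumptions made for this example. Short forms are not checked against Appendix I here.

```python
import re

_FULL_LICENSE = re.compile(r"^FullLicense-([0-9]+)$")


def parse_detected_licenses(line):
    """Return a list of ("short_form", id) or ("full_text", N) pairs.

    Hypothetical helper for illustration only.
    """
    tag, sep, value = line.partition(":")
    if not sep or tag.strip() != "DetectedLicense":
        raise ValueError(f"not a DetectedLicense line: {line!r}")
    entries = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            raise ValueError(f"empty license identifier in: {line!r}")
        match = _FULL_LICENSE.match(item)
        if match:
            entries.append(("full_text", int(match.group(1))))
        else:
            entries.append(("short_form", item))
    return entries
```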

(20100415 F2F Kim- Propose optional field for identifying packages from which files come.)

(20100506 Bill- consider adding count per license)

3.2.7. Declared Copyright Holder of Package

3.2.7.1. Purpose: identify the author and licensor of package itself. ? Permit international extended characters in character string or restrict ?

3.2.7.2. Intent: Here, by identifying the actual author(s), some ambiguities, e.g., under which license the author(s) were intending to license the package, may be resolvable by knowing who to contact for clarity.

3.2.7.3. Tag: "DeclaredCopyright"

3.2.7.4. Data Format: ?

(20100415 F2F Kate- Update. Needs to accommodate multiple copyright holders and dates. Include None as possible.) 

3.2.7.5. Example: ?

3.2.8. Declared Copyright Date of Package

3.2.8.1. Purpose: Identify the date this package was created. Individual files inside package may have different copyright dates.

3.2.8.2. Intent: Here, we can now begin to track when copyright protection expires, for example, and the package falls into the public domain.

3.2.8.3. Tag: "CopyrightDate"

3.2.8.4. Data Format: YYYY

where YYYY is the year.

3.2.8.5. Example: CopyrightDate: 2010

3.2.9. Pithy Description Field

(20100415 - F2F Jeff- Propose to mailing list.)


4. Index of Non Standard License(s) Detected

4.1 One instance for every unique license detected in package that does not match one of the standard license short forms from Appendix I.

4.2 Fields:

4.2.1 Identifier

4.2.1.1. Purpose: Provide a unique identifier for the packages and files sections to refer to non standard license text detected in the package and reproduced here to aid analysis.

4.2.1.2. Intent:

(20100505 KS - Rockett can you fill this in? )

4.2.1.3. Tag: "LicenseID"

4.2.1.4. Data Format: "License-"N where N is a unique ascending numeric value.

4.2.1.5. Example:  LicenseID: License-1

4.2.2 Extracted Text

 

4.2.2.1. Purpose: Provide a copy of the actual text of the license extracted from the package to aid in analysis.

4.2.2.2. Intent:

(20100505 KS - Rockett can you fill this in? )

4.2.2.3. Tag: "LicenseText"

4.2.2.4. Data Format: <multiline text> "***EOLT***" 

4.2.2.5. Example:  LicenseText: <multiline text> ***EOLT***
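
Illustration (not part of the specification): a minimal Python sketch of pulling every LicenseText body out of a document by scanning for the ***EOLT*** end marker. The function name `extract_license_texts` is an assumption made for this example. The sketch would be confused by a license text that itself contains "LicenseText:" or the marker, which the draft does not yet address.

```python
END_MARKER = "***EOLT***"
TAG = "LicenseText:"


def extract_license_texts(document):
    """Return the text between each LicenseText tag and its end marker.

    Hypothetical helper for illustration only.
    """
    texts = []
    position = 0
    while True:
        start = document.find(TAG, position)
        if start == -1:
            break
        body_start = start + len(TAG)
        end = document.find(END_MARKER, body_start)
        if end == -1:
            raise ValueError("LicenseText without closing ***EOLT***")
        # Surrounding whitespace is trimmed; interior line breaks are kept.
        texts.append(document[body_start:end].strip())
        position = end + len(END_MARKER)
    return texts
```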

 

 

5. File Specific Information

(20100210 MVW File Level Data - One instance per file seems overwhelming if these instances are separate files. On the other hand, if they are within one file, it should be ok.  - The practical result of one instance should be as short as possible.  - There will be repetition (same copyright holder, same years, same license for many files); how about a standardized method of combining this info: e.g. only a list of path+file for all files falling under the same license and then separations under the license for copyright holders and then lastly for differences in years. This will avoid repetition. - As an option, the standard could standardize the license headers (or part of them) in the files themselves. This has the benefit of not creating another database (the database would be the source files), and can easily use the same version control systems as for the rest of the source code. Projects would be more likely to accept this: a standard for adding the license information in the beginning of the file could help them in practice and not just create another work step. Separate package meta-data would then be required only for files containing no license headers. Of course, this is not an option for existing packages that need to be used. However, once the file package is analysed and there is separate meta-data, that information could be then dropped (if decided by the project) into the source files themselves. Removes all repetition and can be machine read. )

(20100310 JM 4.0- Would files that do not have licensing information be present in this block? I would think so and the relevant fields would be blank. We may want to have an explicit statement versus leaving blank.   Interested in others thoughts.)

(20100310 JM 4.0 – Do we need an exceptions field to capture exceptions that are written in to a license? Likely difficult to farm from existing code per my comment in 4.2.2 but seems useful. )

(20100415 F2F Should we allow inferred or best guesses for file licenses with an indication that it was not clear?)

5.1. One entry for every file in package instance

(20100415 F2F Consider aggregation.)

5.2. Fields:

5.2.1. Full File Name

5.2.1.1. Purpose: Identify the path to the file that corresponds to this summary information.

5.2.1.2. Intent:  Here, any confusion over where a file needs to hierarchically be placed for proper functionality is mitigated.

5.2.1.3. Tag: "FileName"

5.2.1.4. Format: [directory/]filename.suffix

5.2.1.5. Example: FileName: /bar/foo.c

(20100415 F2F Consider short file name as well. 20100505 KS - any one remember why we wouldn't want to include the directory information if it exists?)

5.2.2. File Type (optional)

5.2.2.1. Purpose: Identify common types of files where there may be different treatment of copyright and license information: source, binary, machine generated, etc.

5.2.2.2. Intent: Here, this field is basically the "best available" format field, from a developer perspective.

5.2.2.3. Tag: "FileType"

5.2.2.4. Data Format: "source" | "binary" | "other"

5.2.2.5. Example: FileType: binary

(20100310 JM 4.2.2 – I like the field but I’m struggling with whether it will be difficult to automate the generation of this information if that’s there and whether to be concerned about that. Specifically I am wondering about auto generated files that come from tools. Here is my thought process. I can see where a project could farm everything in 4.0 from existing source except for possibly this field. If so that means they have to either answer this manually for every file (think of the Linux kernel)  or try and adopt (as an example) a keyword approach and add it to files.)

(20100415 F2F - Kate- Change to optional field. Limit types to binary, source, other.  20100505 KS - Done, see above )

(20100415 F2F - Discuss on mail list how to deal with archives, consider in scope / out of scope.  20100505 - KS - assign Kate as owner )

5.2.3. License(s)

5.2.3.1. Purpose: License governing file if known. This will either be explicit in file, or be expected to default to package license. Use a standard way of referring to license and its version. See Appendix I for standardized license short forms. If more than one in effect, list all licenses.

5.2.3.2. Intent: Here, the intent is to have a uniform method to refer to each license with specificity to eliminate any license confusion.  For example, the 3 clause BSD would have a different license identifier than the 4 clause BSD. 

5.2.3.3. Tag: "FileLicense"

5.2.3.4. Data Format: ( [identifier,]* [identifier])|"unknown"

where identifier is either one of the short forms from Appendix I or a LicenseID from section 4. If no license is detected, the value should be "unknown".

5.2.3.5. Example: FileLicense: GPL-2.0,License-5
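
Illustration (not part of the specification): a minimal Python sketch of reading a FileLicense value and looking up any LicenseID it mentions in the section 4 index of extracted license texts. The function name `resolve_file_licenses` and the shape of the `extracted_texts` mapping (LicenseID to text) are assumptions made for this example.

```python
def resolve_file_licenses(line, extracted_texts):
    """Return (identifier, text_or_None) pairs; [] when the value is "unknown".

    `extracted_texts` maps a section 4 LicenseID such as "License-5" to its
    extracted text. Hypothetical helper for illustration only.
    """
    tag, sep, value = line.partition(":")
    if not sep or tag.strip() != "FileLicense":
        raise ValueError(f"not a FileLicense line: {line!r}")
    value = value.strip()
    if value == "unknown":
        return []
    resolved = []
    for identifier in value.split(","):
        identifier = identifier.strip()
        if not identifier:
            raise ValueError(f"empty license identifier in: {line!r}")
        # Short forms from Appendix I have no extracted text, so map to None.
        resolved.append((identifier, extracted_texts.get(identifier)))
    return resolved
```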

(20100210 MVW 4.2.3 A package may contain sub-packages, which may have their own “main license”. As an “approved conclusion” we default a file with no license information to the closest package level license (not necessarily the license of the package under inspection, but the license of a sub-package), unless there is contrary information. The distinction of package and sub-packages is relevant here.)

(20100415 F2F  Resolve to follow what we do for package level.- Kim will own updating once defined.)

5.2.4. Copyright(s)

5.2.4.1. Purpose: identify the copyright holders and associated dates of their copyright that are in this specific file if known. Note: Copyright holder identifier may have developer names, companies, email addresses, so we’ll probably need a generic string mechanism (including international characters). Since there may be multiple per file, need a way of having separators between them.

5.2.4.2. Intent: Here, similar to identifying the actual author(s) (above), by identifying the copyright holder(s), the copyright holder(s) may be contacted if licensing issues exist with the package, or to request distribution under another license more compatible with a given implementation, for example.

5.2.4.3. Tag: "FileCopyright"

5.2.4.4. Data Format: ("copyright holder":<date(s)> ";")*

where date(s) are of form (YYYY["-"|","])* YYYY

5.2.4.5. Example: FileCopyright: "Linus Torvalds":"1996-2010";
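
Illustration (not part of the specification): a minimal Python sketch of splitting a FileCopyright value into (holder, dates) pairs. The function name `parse_file_copyright` is an assumption made for this example. Because the draft format shows the dates unquoted while the example quotes them, the sketch accepts both, and it accepts straight or curly quotation marks. Holder names containing a semicolon or a double quote are not handled.

```python
import re

# "holder" : dates, where dates are 4-digit years joined by "-" or ",".
_ENTRY = re.compile(r'^"([^"]+)"\s*:\s*"?([0-9]{4}(?:[-,][0-9]{4})*)"?$')


def parse_file_copyright(line):
    """Return a list of (holder, dates) string pairs.

    Hypothetical helper for illustration only.
    """
    tag, sep, value = line.partition(":")
    if not sep or tag.strip() != "FileCopyright":
        raise ValueError(f"not a FileCopyright line: {line!r}")
    # Normalize curly quotes so either style parses the same way.
    value = value.replace("\u201c", '"').replace("\u201d", '"')
    entries = []
    for chunk in value.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue  # text after the final ";" separator is empty
        match = _ENTRY.match(chunk)
        if match is None:
            raise ValueError(f"unrecognized copyright entry: {chunk!r}")
        entries.append((match.group(1), match.group(2)))
    return entries
```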

(20100415 F2F - Use this format at package level.  Mail list to discuss do we want "none" vs. "none found." Ex Disclaim copyrights. Aggregation of redundant copyrights.  20100505 KS - Need owner clarified )

5.2.5. Package(s) (optional)

(20100415 F2F - Kim to implement same as at package level.)

5.2.6 Hash on File (optional)

(20100415 F2F - Jeff L to make proposal.)

 

6. Definitions

1. Package: ...

2. Date range: [YYYY,]*[YYYY-]YYYY syntax for multiple ranges needed.
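
Illustration (not part of the specification): a minimal Python sketch that validates a value against the draft date range syntax [YYYY,]*[YYYY-]YYYY and expands it into individual years. The function name `expand_date_range` is an assumption made for this example. As the definition notes, syntax for multiple ranges is still needed, so only one trailing range is recognized.

```python
import re

# Group 1: leading "YYYY," items; group 2: optional range start; group 3: final year.
_DATE_RANGE = re.compile(r"^((?:[0-9]{4},)*)(?:([0-9]{4})-)?([0-9]{4})$")


def expand_date_range(text):
    """Return the list of years covered by a draft date range string.

    Hypothetical helper for illustration only.
    """
    match = _DATE_RANGE.match(text.strip())
    if match is None:
        raise ValueError(f"not a valid date range: {text!r}")
    singles = [int(year) for year in match.group(1).split(",") if year]
    last = int(match.group(3))
    first = int(match.group(2)) if match.group(2) else last
    if first > last:
        raise ValueError(f"range runs backwards: {text!r}")
    return singles + list(range(first, last + 1))
```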

(20100415 F2F Legal question: Do we want to capitalize terms that are defined here as in a legal agreement.)

(20100415 F2F All- Add terms here that seem ambiguous above.)

(20100505 KS reshuffle sections to put this earlier in the specification?)


Appendix I. Standard License Short Forms

https://fossbazaar.org/wiki/standard-license-short-forms

(20100415 F2F Consider moving this section to SPDX website. 20100505 KS this will be reference information in spec with its own syntax - Kate to own first draft  20100506 KS - new link created - formatting in progress)

 

Appendix II.  Examples

(20100505 KS - Kate - create links to child pages with examples)