THE SPDX WIKI IS NO LONGER ACTIVE. ALL CONTENT HAS BEEN MOVED TO https://github.com/spdx
Technical Team/SPDX Specification Versions
For anyone wanting to add comments/questions/etc. directly in the document, so they get tracked without having to do a lot of version reference, please put your comments on a new line and use the following syntax:
(yyyymmdd initials comments)
for example:
(20100407 KS does this make sense?)
SPDX Specification Version: DRAFT 20100407
(20100211 KS The intent from the discussions so far is that this document is licensed under the CC-BY (allow derivatives) - where should we add license text? ) RESOLUTION: CC license has text; put at the start of doc. Create/publish delimiter formatting - Rockett
(20100210 MVW Package specific v. Project Specific - One aspect that may need some consideration is whether some part of the data is project specific (i.e. specific to many packages of the same project). However, the current standard seems to have – with first view - only package specific data.) RESOLUTION: Need more info from MVW
(20100310 JM General – Is the Package Facts file licensed (the use of a Formal Copyright Holder in 3.2.7 seems to imply that)? If so, do we want to say it should be under the same license as the specification? I like the idea of a permissive license or possibly even public domain. However, we could allow people to license the file or not license it according to their project or tastes. My only concern with this is the inevitable are these licenses compatible mess if I try to take 10 or 20 of these files and roll them up into (or even take info from them) a nice neat re-distributable document. I would suggest at a minimum, if someone can license this content that we have a block to capture it.) RESOLUTION: Add boilerplate disclaimer with regard to copyright of instance of facts file. Creative C header file including authorship assignment. Plan is to preserve as sidecar, but it may make sense to create a version of SPDX for inclusion in package (issue is checksum...should it be optional?). - Rockett
(20100310 JM General – In the final version should have an examples section that shows a completed Package Facts document or maybe examples per section or both? Perhaps we can use some of use cases that we are working on for this.) RESOLUTION: Agreed. Kate will create spot for full example. Jeff L and Jack M will create and own full example and snippet examples.
(20100310 JM General – Do we want a way for people to extend the format of this file and if so in a controlled way? Do we want people to add new fields in any section they wish? What if someone takes a package then modifies it and re-distributes it? Would they add or remove from the package facts? Would it be worthwhile to capture that delta in its own section? I noticed that we want the document to be signed so maybe we don’t envision it being modified in this way?) (20100407 KS first entry in file is version field for specification itself - 2.2.1 is meant to handle this, is something else needed? ) RESOLUTION: Agreed: Extensibility is desirable. Propose mechanism to mailing list - Philippe, Michael, and Gary. Provide ITU/IETF examples to the list - John. Agreed: Facts file only applies to original package. New aggregated package requires newly created facts file and authorship (of SPDX) changes. Verify that this intent is in current spec - Jack M
(20100415 FacetoFace General- Should we extend spec to describe streams and usage of streams? Only at file level?) RESOLUTION: Assess whether extensibility proposal accommodates. Jeff L
(20100310 JM General – Are all these fields required or are some optional?) RESOLUTION: Some are optional. All are required unless noted as optional.
(20100310 JM General – How do we represent different Packages in a single distribution of something? I can see where some projects are very singular and would have just an application or a library. What happens, as an example, if that project offers an application, kernel patch, library, etc in one download? Would we use one package facts to cover them all, one per piece, etc?) RESOLUTION: There is a hierarchy and people can aggregate what they want.
Signed off:
(Approved for release by active participants in this specification effort, as indicated by name and email id)
Contents
1. Rationale
1.1. Charter
Create a set of data exchange standards to enable companies and organizations to share license and component information (metadata) for software packages and related content with the aim of facilitating license and other policy compliance.
1.2. Why is a common format for data exchange needed?
Companies and organizations (collectively “Organizations”) are widely using and reusing open source and other software packages. Compliance with the associated licenses requires a set of due diligence activities that each Organization performs independently: a manual and/or automated scan of software and identification of associated licenses, followed by manual verification. Software development teams across the globe use the same open source packages, but they have not yet set up a way to collaborate on license discovery; many groups perform the same work, which leads to duplicated effort. This working group seeks to create a data exchange format so that information about software packages and related content may be collected and shared in a common format, with the goal of saving time and improving data accuracy.
(20100210 MVW 1.2 Should probably reflect the idea behind package specific and use case specific data)(20100407 KS - use case specific data was removed - so not sure if this still needs to be addressed?) RESOLUTION: No longer applicable.
1.3. What does this specification cover?
1.3.1. Identification Information: Metadata to associate analysis results with a specific package. This includes a unique identifier to permit correlation of a specific instance of this data with a specific package.
1.3.2. Overview Information: Facts that are common properties for the entire package.
1.3.3. File Specific Information: Facts that are specific to each file (copyrights, licenses) that are included in the package.
1.3.4. Common Licenses: standardized way of referring to the common licenses likely to be encountered.
(20100415 Face to Face - Add proposed text regarding extensibility.)
1.4. What is not covered in the written standard?
1.4.1. Information that cannot be derived from a visual inspection of the package to be analyzed.
1.4.2. How the data stored in this file format is used.
1.4.3. Any identification of any patent(s) which may or may not relate to the package.
1.5. Format Requirements:
1.5.1. Needs to be in human readable form.
1.5.2. Needs to be a syntax that tools can read and write.
1.5.3. Needs to be suitable to be checked for syntactic correctness independent of how it was generated (human or tool).
1.5.4. Character set to be used in the SPDX file shall support UTF-8 encoding. (Philippe to provide example of potentially problematic issue with filepath.)
1.5.5. Discussion: XML vs. simple text to represent fields. The extent to which the file must be understandable by a human without a tool still needs to be discussed. Deferred to mailing list. Philippe to write up DOAP alternative.
2. Identification Information
2.1. One instance per package instance.
2.2. Fields:
2.2.1. SPDX Specification Version Number
2.2.1.1. Purpose: version of SPDX specification to use to parse the rest of the file. This will permit future changes to the specification, and retain backwards compatibility.
2.2.1.2. Format: SPDX Version: SPDX-N.N
2.2.1.3. Example: SPDX Version: SPDX-1.0
2.2.1.4. Intent: Here, parties exchanging Identification Information in accordance with SPDX need to provide 100% transparency as to which SPDX specification such Identification Information is conforming to.
(20100415 F2F Generate language for meaning of major/minor versions - Jack)
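As an illustration of how a tool might read the version field in 2.2.1.2, here is a minimal Python sketch. The function name parse_spdx_version is hypothetical, and the assumption that each N is one or more decimal digits is ours; the draft only gives the shape SPDX-N.N and has not yet defined what major and minor versions mean.

```python
import re

# Pattern for the draft header line "SPDX Version: SPDX-N.N" (section 2.2.1.2).
# Assumption: each N is one or more decimal digits; the draft does not say so.
_VERSION_RE = re.compile(r"^SPDX Version: SPDX-(\d+)\.(\d+)$")


def parse_spdx_version(line):
    """Return (major, minor) as ints, or raise ValueError if the line is malformed."""
    match = _VERSION_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not an SPDX Version line: {line!r}")
    return int(match.group(1)), int(match.group(2))
```

A tool could call parse_spdx_version on the first line of a file and refuse to continue if it raises, which would also serve the "marker at the top of the file" purpose raised in the comments.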
2.2.2. Unique Identifier
2.2.2.1. Purpose: Need an independently reproducible mechanism that is agreed will permit unique identification of a specific package with this data. It must be able to determine if any file in the original package has been changed in a verifiable way.
RESOLUTION: Acceptable.
Push issue to mail list. Michael and Philippe to propose solution incorporating such concerns as crypto export, internationalization and line endings.
(20100210 MVW 2.2.2 MD5 or PGP are also quite widely used for security, allowing e.g. to check that the downloaded package corresponds to the one distributed by the projects. There could be a signal, if the signum has been checked from file source too. If the file source did not provide a signum, it can be generated. Probably needs to allow variance for different signums (or more background knowledge, if a certain method is to be promoted.) RESOLUTION: Addressed above with Action item.
(20100310 JM 2.2.2.1 – Should we add MD5? That seems to be very common as a signature as well. If we allow multiple signature types would there be a preferred one? We should also have a URL to where the keys are posted so we can check against them. I would add a field here for that. That said, problems sometimes don’t appear right away and some considerable amount of time may have elapsed. In that case, whoever posted the checksums to validate against may have taken them down for whatever reason (i.e. it’s an older version, project folded, etc). Do we want the keys to still be around in this situation? If so, we need to comprehend that.) RESOLUTION: Addressed above with Action item
2.2.2.2. Format: UniqueID: ?
2.2.2.3. Example: Unique ID: ?
2.2.2.4. Intent: Here, by providing a unique identifier of each package, confusion over which version/modification of a specific package the Identification Information references should be eliminated.
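The identifier mechanism is still open (the format above is "?"), so the following is only a sketch of one possible approach: a digest over the bytes of the package file, which changes if any file inside the archive changes. The choice of SHA-256, the function name package_digest, and the chunk size are all assumptions for illustration, not something this draft has agreed on. It does not address the line-ending, internationalization, or crypto-export concerns noted above.

```python
import hashlib


def package_digest(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of the file at `path`.

    Assumption: the identifier is a plain digest of the archive bytes.
    The draft has not selected an algorithm; "sha256" is only a default here.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read in fixed-size chunks so large packages need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Because the digest is computed only from the file contents, two parties holding the same package can reproduce it independently, which is the property 2.2.2.1 asks for.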
2.2.3. Generation Method
2.2.3.1. Purpose: identify how this information was generated. If manual – who, if tool – identifier and version.
2.2.3.2. Format: Created by: ”person name” or company (depending on which is accountable) | Tool: ”tool id - version”
2.2.3.3. Examples: ?
2.2.3.4. Intent: Here, the generation method will assist the reader of the Identification Information in self determining the general reliability/accuracy of the Identification Information.
RESOLUTION: Push to mailing list. Kim to propose language. Concept: Include who worked on it and what tools were used.
2.2.4. Creation Time Stamp
2.2.4.1. Purpose: Identify when the analysis was done.
2.2.4.2. Format: Created: YYYYMMDD-HH:MM:SS
(20100310 JM 2.2.4 – Should we add a time zone or say it’s based on GMT? Alternatively we could adopt a date/time format from an RFC but I’m okay with this one.) RESOLUTION: Use standard ISO 8601. Kate- Edit above to match.
2.2.4.3. Example: Created: 20100129-18:30:22
2.2.4.4. Intent: Here, the Time Stamp can serve as a verification as to whether the analysis needs to be updated. For example, changes in the software industry may require a different reading of a particular license identification, post a certain fixed date, due to a court holding.
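Since the resolution above is to move to ISO 8601, a short Python sketch of converting the current draft format may be useful. The function name created_line_to_iso8601 is hypothetical, and treating the time as UTC is an assumption, because the draft format carries no time zone.

```python
from datetime import datetime, timezone

# strptime pattern for the draft format YYYYMMDD-HH:MM:SS (section 2.2.4.2).
DRAFT_FORMAT = "%Y%m%d-%H:%M:%S"


def created_line_to_iso8601(line):
    """Convert a draft "Created: ..." line to an ISO 8601 string.

    Assumption: the time is UTC, since the draft format has no zone.
    """
    prefix = "Created: "
    if not line.startswith(prefix):
        raise ValueError(f"not a Created line: {line!r}")
    parsed = datetime.strptime(line[len(prefix):].strip(), DRAFT_FORMAT)
    return parsed.replace(tzinfo=timezone.utc).isoformat()
```

Applied to the example in 2.2.4.3, this yields 2010-01-29T18:30:22+00:00.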
2.2.5. Review
2.2.5.1. Purpose: reviewers of tool result, or other reviewer of original – equivalent to “signed off” or “reviewed by”.
(20100310 JM 2.2.5 – This one makes me a little nervous. If someone puts something there what does it mean? Have they verified all the information is factual? Independent Audit implies to me that someone other than the Package creator or even the project (?) has looked at this and said the information is <?>. )
2.2.5.2. Format: Reviewed by: “person name”
2.2.5.3. Example: ?
2.2.5.4. Intent: Here, as time progresses, certain reviewers will begin to gain credibility as reliable. This field intends to make such information transparent.
RESOLUTION: Bradley K to rework section including such issues as defining "reviewed by," date, partial review or caveats, etc. Make optional field and include multiple, down the supply chain reviewed bys.
Bradley K set up call for Rockett with Creative Commons to discuss warranty, reliance, etc. disclaimer.
(20100310 JM 2.0 – Would it be useful to have a marker or token at the top of the facts file that shows it uses the Package Facts format? This may make life easier for tools which are likely going to have to process different file formats. It would be the first thing that must appear in the file.) RESOLVED with SPDX Version field
3. Common Overview Information
(20100210 MVW License Applied by Project - At Validos, we record also the license applied by the project from the projects website and then (store that information as a pdf-printout and) compare that information with the package information. Sometimes the package doesn’t have any information,(For conflicts, we use a set of “approved conclusions”.) Consider adding a section where there is the url of the page of the project’s statement on its license and even add a separate pdf-printout to the metadata info?) RESOLUTION: No longer applicable unless MVW says otherwise.
3.1. One instance per package instance
3.2. Fields:
3.2.1. Formal Name
3.2.1.1. Purpose: Full name given by originator with version information.
3.2.1.2. Format: ?
3.2.1.3. Example: ?
3.2.1.4. Intent: Here, the formal name of each package is an important conventional technical identifier to be maintained for each package.
(20100415 F2F Should we separate version from name?) RESOLUTION: Kate to propose.
3.2.2. Specific Package File Name
3.2.2.1. Purpose: File name of package instance.
3.2.2.2. Format: identifier.suffix
3.2.2.3. Examples: foo.tar, foo.rpm, ?
3.2.2.4. Intent: Here, the actual filename of the compressed file (containing the package) is a significant technical element that needs to be carried with each package's Identification Information.
3.2.3. Download URL
3.2.3.1. Purpose: identify the exact download URL where the original version of this package resides (at time of analysis).
3.2.3.2. Format: download URL or "Unknown"
3.2.3.3. Example: ?
(20100310 JM 3.2.3 – We should allow usage of git, svn, etc values) RESOLUTION: Add field for freeform source commentary contemplating pulling from SCM or modifying downloaded package. - Kate
3.2.3.4. Intent: Here, where to download the exact package being referenced is a critical verification and tracking datum.
Debra M- Begin section on Wiki cataloging various SPDX doc use cases.
Philippe, David M, Bradley taking a stab at developing tool that utilizes format.
3.2.4. Declared License for Package
3.2.4.1. Purpose: use a standard way of referring to a license and its version. See Section 5.0 for standardized license short forms. If more than one is in effect, list the license the package defaults to and indicate that an alternate license is present.
3.2.4.2. Format: identifier | other
3.2.4.3. Example: ? (something like GPL2.0)
(20100210 MVW 3.2.4: At Validos, we call this the “main license”, i.e. the license the project itself is applying. (The term is not the best, since most of the code can be under some other license.) However, the license the project is using is a sort of top-level license (compliance and license compatibility review can often be needed between sub-packages too, so this is also more a terminology question). Perhaps this item should state the “License Applied by Project” and then indicate if other licenses are present too ) (20100407 KS MVW's comments were against original term of "Formal", which has been under discussion. I think they've been addressed, but replicating here for completeness. Currently suggesting use of "Declared" terminology so updated field to have this. )
3.2.4.4. Intent: This is simply the license identified in text in the actual package source code files (typically in the header of each package file.) This field may have multiple declared licenses, if multiple licenses are recited in the source code files of the package.
Kim will revise 3.2.4 section based on comments received.
3.2.5. License(s) Present
3.2.5.1. Purpose: list of all licenses found in files in package by scanning
3.2.5.2. Format: identifier (see section 5) | other; one per line.
3.2.5.3. Example: GPLv3.0
3.2.5.4. Intent: Here, we intend to capture additional licenses under which the package is licensed. The license(s) for this field are licenses which are not visibly identifiable in the actual source code, but rather identified by other means, e.g., scanning tools, by the reviewer.
Rockett- Update intent.
Kate- Create a new section called License Text with the notion being that this would augment the list of short names indexed from the "Other." Look for cases in which it would make sense to include text of std licenses.
Kim- Propose optional field for identifying packages from which files come.
3.2.6. –removed-
3.2.7. Declared Copyright Holder of Package
3.2.7.1. Purpose: identify the author and licensor of package itself. ? Permit international extended characters in character string or restrict ?
3.2.7.2. Format: ?
Kate- Update. Needs to accommodate multiple copyright holders and dates. Include None as possible.
3.2.7.3. Example: ?
3.2.7.4. Intent: Here, by identifying the actual author(s), some ambiguities, e.g., under which license the author(s) were intending to license the package, may be resolvable by knowing who to contact for clarity.
3.2.8. Formal Copyright Date of Package
3.2.8.1. Purpose: Identify the date this package was created. Individual files inside package may have different copyright dates.
3.2.8.2. Format: YYYY
3.2.8.3. Example: 2010
3.2.8.4. Intent: Here, we can now begin to track when copyright protection expires, for example, and the package falls into the public domain.
3.2.9. Pithy Description Field
Jeff- Propose to mailing list.
4. File Specific Information
(20100210 MVW File Level Data - One instance per file seems overwhelming if these instances are separate files. On the other hand, if they are within one file, it should be ok. - The practical result of one instance should be as short as possible. - There will be repetition (same copyright holder, same years, same license for many files); how about a standardized method of combining this info: e.g. only a list of path+file for all files falling under the same license and then separations under the license for copyright holders and then lastly for differences in years. This will avoid repetition. - As an option, the standard could standardize the license headers (or part of them) in the files themselves. This has the benefit of not creating another database (the database would be the source files), and can easily use the same version control systems as for the rest of the source code. Projects would be more likely to accept this: a standard for adding the license information in the beginning of the file could help them in practice and not just create another work step. Separate package meta-data would then be required only for files containing no license headers. Of course, this is not an option for existing packages that need to be used. However, once the file package is analysed and there is separate meta-data, that information could be then dropped (if decided by the project) into the source files themselves. Removes all repetition and can be machine read. )
(20100310 JM 4.0- Would files that do not have licensing information be present in this block? I would think so and the relevant fields would be blank. We may want to have an explicit statement versus leaving blank. Interested in others thoughts.)
(20100310 JM 4.0 – Do we need an exceptions field to capture exceptions that are written in to a license? Likely difficult to farm from existing code per my comment in 4.2.2 but seems useful. )
(20100415 F2F Should we allow inferred or best guesses for file licenses with an indication that it was not clear?)
4.1. One entry for every file in package instance
(20100415 F2F Consider aggregation.)
4.2. Fields:
4.2.1. Full File Name
4.2.1.1. Purpose: identify the path to the file that corresponds to this summary information.
4.2.1.2. Format: [directory/]filename.suffix
4.2.1.3. Example: bar/foo.c
4.2.1.4. Intent: Here, any confusion over where a file needs to hierarchically be placed for proper functionality is mitigated.
(20100415 F2F Consider short file name as well.)
4.2.2. File Type
4.2.2.1. Purpose: Identify common types of files where there may be different treatment of copyright and license information: source, binary, machine generated, ??
4.2.2.2. Format: ?
4.2.2.3. Example: ?
4.2.2.4. Intent: Here, this field is basically the "best available" format field, from a developer perspective.
(20100310 JM 4.2.2 – I like the field but I’m struggling with whether it will be difficult to automate the generation of this information if that’s there and whether to be concerned about that. Specifically I am wondering about auto generated files that come from tools. Here is my thought process. I can see where a project could farm everything in 4.0 from existing source except for possibly this field. If so that means they have to either answer this manually for every file (think of the Linux kernel) or try and adopt (as an example) a keyword approach and add it to files.)
Kate- Change to optional field. Limit types to binary, source, other.
Discuss on mail list how to deal with archives.
4.2.3. License(s)
4.2.3.1. Purpose: License governing file if known. This will either be explicit in file, or be expected to default to package license. Use a standard way of referring to license and its version. See Section 5.0 for standardized license references. If more than one in effect, list all licenses.
4.2.3.2. Format: ? [identifier,]* [identifier | “string“]
4.2.3.3. Example: GPL2.0,BSD,”xyz license type”
4.2.3.4. Intent: Here, the intent is to have a uniform method to refer to each license with specificity to eliminate any license confusion. For example, the 3 clause BSD would have a different license identifier than the 4 clause BSD.
(20100210 MVW 4.2.3 A package may contain sub-packages, which may have their own “main license”. As an “approved conclusion” we default a file with no license information to the closest package level license (not necessarily the license of the package under inspection, but the license of a sub-package), unless there is contrary information. The distinction of package and sub-packages is relevant here.)
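To make the License(s) format in 4.2.3.2 concrete, here is a rough Python sketch of splitting a value such as the example in 4.2.3.3. The function name parse_license_field and the (kind, text) result shape are hypothetical. It assumes a comma inside a quoted string is part of the string, and it accepts both plain and typographic quotes because the draft example uses the latter; neither point is settled by the draft.

```python
# Quote characters accepted around free-text license names. The draft example
# uses typographic quotes, so both those and plain ASCII quotes are tolerated.
_QUOTES = '"\u201c\u201d'


def parse_license_field(value):
    """Split a License(s) value into (kind, text) pairs.

    kind is "identifier" for a bare token and "string" for a quoted one.
    """
    entries = []
    current = []
    in_quotes = False
    for char in value:
        if char in _QUOTES:
            in_quotes = not in_quotes
            current.append(char)
        elif char == "," and not in_quotes:
            # A comma outside quotes ends the current entry.
            entries.append("".join(current))
            current = []
        else:
            current.append(char)
    entries.append("".join(current))

    result = []
    for entry in entries:
        entry = entry.strip()
        if not entry:
            continue
        if len(entry) >= 2 and entry[0] in _QUOTES and entry[-1] in _QUOTES:
            result.append(("string", entry[1:-1]))
        else:
            result.append(("identifier", entry))
    return result
```

Bare tokens could then be looked up in the Section 5 table, while quoted strings would be carried through as free text.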
4.2.4. Copyright(s)
4.2.4.1. Purpose: identify the copyright holders and associated dates of their copyright that are in this specific file if known. Note: Copyright holder identifier may have developer names, companies, email addresses, so we’ll probably need a generic string mechanism (including international characters). Since there may be multiple per file, need a way of having separators between them.
4.2.4.2. Format: [ “copyright holder”:”date(s)”]*
4.2.4.3. Example: “Linus Torvalds”:”1996-2010”
4.2.4.4. Intent: Here, similar to identifying the actual author(s) (above), by identifying the copyright holder(s), the copyright holder(s) may be contacted if licensing issues exist with the package, or to request distribution under another license more compatible with a given implementation, for example.
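The Copyright(s) format in 4.2.4.2 can be read with a small regular expression. The Python sketch below is illustrative only: the function name parse_copyright_field is hypothetical, the separator between entries is assumed to be arbitrary text outside the quoted pairs (the draft has not chosen one), and holder names are assumed not to contain quote characters.

```python
import re

# Character classes for quote marks; the draft example uses typographic
# quotes, so plain ASCII quotes and typographic ones are both accepted.
_Q = '["\u201c\u201d]'
_NOT_Q = '[^"\u201c\u201d]'

# One entry: "copyright holder":"date(s)"
_ENTRY_RE = re.compile(f"{_Q}({_NOT_Q}*){_Q}\\s*:\\s*{_Q}({_NOT_Q}*){_Q}")


def parse_copyright_field(value):
    """Return a list of (holder, dates) string pairs found in `value`."""
    return [(m.group(1), m.group(2)) for m in _ENTRY_RE.finditer(value)]
```

Holder names are kept as Unicode strings, so international characters pass through unchanged.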
4.2.5. ?
5. Standard License Identifiers
5.1. Rationale for choosing which licenses receive standardized identifiers. Focus on standardizing the most commonly used rather than all. Align with any other standardization efforts underway here that will meet the need.
5.2. Table of standard licenses and their identifiers
Identifier | Full name | Official Source Text
GPL2.0 | GNU General Public License (GPL) Ver. 2, June 1991 |
GPL3.0 | ... |
6. Definitions
1. Package: ...
2. Date range: [YYYY,]*[YYYY-]YYYY syntax for multiple ranges needed.
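The date range syntax in definition 2 can be checked and expanded mechanically. The following Python sketch follows the grammar exactly as written, so only the last element may be a range; the function name expand_date_range is hypothetical, and rejecting a range whose start is later than its end is an assumption.

```python
import re

# [YYYY,]*[YYYY-]YYYY : zero or more single years, then an optional range start,
# then a final year.
_DATE_RANGE_RE = re.compile(r"^((?:\d{4},)*)(?:(\d{4})-)?(\d{4})$")


def expand_date_range(value):
    """Return the list of years covered by a date range string."""
    match = _DATE_RANGE_RE.match(value.strip())
    if match is None:
        raise ValueError(f"not a valid date range: {value!r}")
    singles = [int(year) for year in match.group(1).split(",") if year]
    end = int(match.group(3))
    start = int(match.group(2)) if match.group(2) else end
    if start > end:
        raise ValueError(f"range start after end: {value!r}")
    return singles + list(range(start, end + 1))
```

A value with two ranges, such as 1996-1998,2001-2003, is rejected by this sketch, which illustrates the gap noted in the definition.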