THE SPDX WIKI IS NO LONGER ACTIVE. ALL CONTENT HAS BEEN MOVED TO https://github.com/spdx
Difference between revisions of "Technical Team/SPDX Specification Versions"
(Import Jack Manbeck's comments - 2010/03/10 package-facts mail list.)
(clean up accidentally imported formatting - make style consistent.)
Revision as of 03:12, 8 April 2010
For anyone wanting to add comments/questions/etc. directly in the document, so they get tracked without having to do a lot of version reference, please put your comments on a new line and use the following syntax:
(yyyymmdd initials comments)
for example:
(20100407 KS does this make sense?)
Version: DRAFT 20100407
(20100310 JM General - Can we add a date or version to the document? We should probably add a revision table when it goes live but in my view not necessary to have right now) (20100407 KS done)
(20100310 JM General – I would add a standard disclaimer (i.e. we tried to get this right but there may be mistakes) to the Package facts that travel with it. Perhaps in the Identification Information section. See my next comment on License since it may solve that.)
(20100310 JM General – Is the Package Facts file licensed (the use of a Formal Copyright Holder in 3.2.7 seems to imply that)? If so, do we want to say it should be under the same license as the specification? I like the idea of a permissive license or possibly even public domain. However, we could allow people to license the file or not license it according to their project or tastes. My only concern with this is the inevitable “are these licenses compatible?” mess if I try to take 10 or 20 of these files and roll them up into (or even take info from them) a nice neat re-distributable document. I would suggest at a minimum, if someone can license this content, that we have a block to capture it.)
(20100310 JM General – Should the final version have an examples section that shows a completed Package Facts document, or examples per section, or both? Perhaps we can use some of the use cases that we are working on for this.)
(20100310 JM General – Do we want a way for people to extend the format of this file and if so in a controlled way? Do we want people to add new fields in any section they wish? What if someone takes a package then modifies it and re-distributes it? Would they add or remove from the package facts? Would it be worthwhile to capture that delta in its own section? I noticed that we want the document to be signed so maybe we don’t envision it being modified in this way?) (20100407 KS first entry in file is version field for specification itself - 2.2.1 is meant to handle this, is something else needed?)
(20100310 JM General – Are all these fields required or are some optional?)
(20100310 JM General – How do we represent different Packages in a single distribution of something? I can see where some projects are very singular and would have just an application or a library. What happens, as an example, if that project offers an application, kernel patch, library, etc in one download? Would we use one package facts to cover them all, one per piece, etc?)
Signed off:
(Approved for use by active participants in this specification effort, as indicated by name and email id)
(20100310 JM General – I would like to see the definition of Signed Off mean “approved for release” vs. “approved for use” since I think that’s what we mean and use could possibly be construed to mean something else. )
1. Rationale
1.1. Charter
Create a set of data exchange standards to enable companies and organizations to share license and component information (metadata) for software packages and related content with the aim of facilitating license and other policy compliance.
1.2. Why is a common format for data exchange needed?
Companies and organizations (collectively “Organizations”) widely use and reuse open source and other software packages. Compliance with the associated licenses requires a set of due diligence activities that each Organization performs independently: a manual and/or automated scan of the software and identification of the associated licenses, followed by manual verification. Software development teams across the globe use the same open source packages, but they have not yet set up a way to collaborate on license discovery, so many groups perform the same work, duplicating effort. This working group seeks to create a data exchange format so that information about software packages and related content may be collected and shared in a common format, with the goal of saving time and improving data accuracy.
1.3. What does this specification cover?
1.3.1. Identification Information: Metadata to associate analysis results with a specific package. This includes a unique identifier to permit correlation of a specific instance of this data with a specific package.
1.3.2. Overview Information: Facts that are common properties for the entire package.
1.3.3. File Specific Information: Facts that are specific to each file (copyrights, licenses) that are included in the package.
1.3.4. Common Licenses: standardized way of referring to the common licenses likely to be encountered.
1.3.5. ?
1.4. What is not covered?
1.4.1. Information that cannot be derived from a visual inspection of the package to be analyzed.
1.4.2. How the data stored in this file format is used. After we agree on what should be specified, discussions on how it can be used, who will generate it, how it will be published, audited, etc., will happen outside the scope of this document.
1.4.3. ?
1.5. Format Requirements:
1.5.1. Needs to be in a syntax that humans can read and write.
1.5.2. Needs to be a syntax that tools can read and write.
1.5.3. Needs to be suitable to be checked for syntactic correctness independent of how it was generated (human or tool).
1.5.4. ? Character set to be used to support international naming. (follow Debian precedent?)
1.5.5. ? Actual specification of fields – below is illustrative rather than agreed on.
1.5.6. ? Discussion: XML vs. simple text to represent fields. The extent to which the file must be human-understandable without a tool still needs to be discussed.
2. Identification Information
2.1. One instance per package
2.2. Fields:
2.2.1. Version Number for the instance of the SPDX specification.
2.2.1.1. Purpose: version of SPDX specification to use to parse the rest of the file. This will permit future changes to the specification, and retain backwards compatibility.
2.2.1.2. Format: Version: N.N
2.2.1.3. Example: Version: 1.0
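A minimal sketch of how a tool might read this field, assuming the line has the shape "Version: N.N" given in 2.2.1.2. The function names and the compatibility rule in can_parse are illustrative assumptions; the draft does not define how versions are compared.

```python
import re

# Assumed line shape, taken from the draft format "Version: N.N".
VERSION_LINE = re.compile(r"^Version:\s*(\d+)\.(\d+)$")


def parse_version(line):
    """Return (major, minor) for a 'Version: N.N' line, or None if malformed."""
    match = VERSION_LINE.match(line.strip())
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def can_parse(file_version, supported=(1, 0)):
    """Hypothetical rule: same major number, minor not newer than ours."""
    return file_version[0] == supported[0] and file_version[1] <= supported[1]
```

A reader would call parse_version on the first field of the file and stop early if can_parse rejects the result, which is one way to keep the backwards compatibility that 2.2.1.1 asks for.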
2.2.2. Unique Identifier
2.2.2.1. Purpose: Need an independently reproducible mechanism that is agreed will permit unique identification of a specific package with this data. It must be able to determine if any file in the original package has been changed. Options under consideration: SHA256, ?
(20100310 JM 2.2.2.1 – Should we add MD5? That seems to be very common as a signature as well. If we allow multiple signature types would there be a preferred one? We should also have a URL to where the keys are posted so we can check against them. I would add a field here for that. That said, problems sometimes don’t appear right away and some considerable amount of time may have elapsed. In that case, whoever posted the checksums to validate against may have taken them down for whatever reason (i.e. it’s an older version, project folded, etc). Do we want the keys to still be around in this situation? If so, we need to comprehend that.)
2.2.2.2. Format: UniqueID: ?
2.2.2.3. Example: ?
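As an illustration only, the SHA256 option under consideration could be computed over the package archive with Python's standard hashlib module. The "UniqueID: SHA256:" rendering below is a hypothetical placeholder, since 2.2.2.2 leaves the format undecided, and hashing the archive as a single file is an assumption about what "package" means here.

```python
import hashlib


def package_sha256(path, chunk_size=65536):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def unique_id_line(path):
    """Hypothetical rendering of the still-undecided 'UniqueID:' field."""
    return "UniqueID: SHA256:" + package_sha256(path)
```

Because any change to a file inside the archive changes the archive bytes, two parties who hash the same download independently obtain the same value, and a mismatch signals that the package differs from the one analyzed.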
2.2.3. Generation Method
2.2.3.1. Purpose: identify how this information was generated. If manual – who, if tool – identifier and version.
2.2.3.2. Format: Manual: “person name” | Tool: “tool id - version”
2.2.3.3. Examples: ?
2.2.4. Creation Time Stamp
2.2.4.1. Purpose: Identify when the analysis was done.
2.2.4.2. Format: Created: YYYYMMDD-HH:MM:SS
(20100310 JM 2.2.4 – Should we add a time zone or say it’s based on GMT? Alternatively we could adopt a date/time format from an RFC but I’m okay with this one.)
2.2.4.3. Example: Created: 20100129-18:30:22
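The format above maps directly onto a strptime pattern. The sketch below, with hypothetical function names, parses and writes the field using Python's datetime module. It returns a naive datetime because the time zone question raised in the comment above is still open.

```python
from datetime import datetime

# Mirrors YYYYMMDD-HH:MM:SS from 2.2.4.2.
CREATED_FORMAT = "%Y%m%d-%H:%M:%S"


def parse_created(line):
    """Parse a 'Created: YYYYMMDD-HH:MM:SS' line into a naive datetime."""
    prefix = "Created:"
    if not line.startswith(prefix):
        raise ValueError("not a Created line: %r" % line)
    return datetime.strptime(line[len(prefix):].strip(), CREATED_FORMAT)


def format_created(moment):
    """Render a datetime back into the 'Created:' field."""
    return "Created: " + moment.strftime(CREATED_FORMAT)
```

For the example in 2.2.4.3, parse_created("Created: 20100129-18:30:22") yields 29 January 2010 at 18:30:22, and format_created turns that value back into the same line.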
2.2.5. Independent Review/Audit
2.2.5.1. Purpose: reviewers of tool result, or other reviewer of original – equivalent to “signed off” or “reviewed by”.
(20100310 JM 2.2.5 – This one makes me a little nervous. If someone puts something there what does it mean? Have they verified all the information is factual? Independent Audit implies to me that someone other than the Package creator or even the project (?) has looked at this and said the information is <?>. )
2.2.5.2. Format: Reviewed by: “person name”
2.2.5.3. Example: ?
2.2.6. ??
(20100310 JM 2.0 – Would it be useful to have a marker or token at the top of the facts file that shows it uses the Package Facts format? This may make life easier for tools which are likely going to have to process different file formats. It would be the first thing that must appear in the file.)
3. Common Overview Information
3.1. One instance per package
3.2. Fields:
3.2.1. Formal Name
3.2.1.1. Purpose: Full name given by originator with version information. ? Permit international extended characters in character string or restrict ?
3.2.1.2. Format: ?
3.2.1.3. Example: ?
3.2.2. Specific Package Identifier
3.2.2.1. Purpose: Machine name of package.
3.2.2.2. Format: identifier.suffix
3.2.2.3. Examples: foo.tar, foo.rpm, ?
3.2.3. Official Source Location
3.2.3.1. Purpose: identify where the original version of this package resides (at time of analysis).
3.2.3.2. Format: download URL
3.2.3.3. Example: ?
(20100310 JM 3.2.3 – We should allow usage of git, svn, etc values)
3.2.4. Declared License for Package
3.2.4.1. Purpose: use a standard way of referring to the license and its version. See Section 5.0 for standardized license short forms. If more than one is in effect, list the license the package defaults to and indicate that an alternate license is present.
3.2.4.2. Format: identifier | other
3.2.4.3. Example: ? (something like GPL2.0)
3.2.5. License(s) Present
3.2.5.1. Purpose: list of all licenses found in files in package by scanning
3.2.5.2. Format: identifier
3.2.5.3. Example: Y
3.2.6. –removed-
3.2.7. Formal Copyright Holder of Package
3.2.7.1. Purpose: identify the author and licensor of package. ? Permit international extended characters in character string or restrict ?
3.2.7.2. Format: ?
3.2.7.3. Example: ?
3.2.8. Formal Copyright Date of Package
3.2.8.1. Purpose: Identify the date this package was created. Individual files inside package may have different copyright dates.
3.2.8.2. Format: YYYY
3.2.8.3. Example: 2010
3.2.9. ???
4. File Specific Information
(20100310 JM 4.0 – Would files that do not have licensing information be present in this block? I would think so and the relevant fields would be blank. We may want to have an explicit statement versus leaving blank. Interested in others’ thoughts.)
(20100310 JM 4.0 – Do we need an exceptions field to capture exceptions that are written into a license? Likely difficult to farm from existing code per my comment in 4.2.2 but seems useful.)
4.1. One instance for every file in package
4.2. Fields:
4.2.1. File Name
4.2.1.1. Purpose: identify path to file that corresponds to this summary information.
4.2.1.2. Format: [directory/]filename.suffix
4.2.1.3. Example: bar/foo.c
4.2.2. File Type
4.2.2.1. Purpose: Identify common types of files where there may be different treatment of copyright and license information: source, binary, machine generated, ??
4.2.2.2. Format: ?
4.2.2.3. Example: ?
(20100310 JM 4.2.2 – I like the field but I’m struggling with whether it will be difficult to automate the generation of this information if that’s there and whether to be concerned about that. Specifically I am wondering about auto generated files that come from tools. Here is my thought process. I can see where a project could farm everything in 4.0 from existing source except for possibly this field. If so that means they have to either answer this manually for every file (think of the Linux kernel) or try and adopt (as an example) a keyword approach and add it to files.)
4.2.3. License(s)
4.2.3.1. Purpose: License governing file if known. This will either be explicit in file, or be expected to default to package license. Use a standard way of referring to license and its version. See Section 5.0 for standardized license references. If more than one in effect, list all licenses.
4.2.3.2. Format: ? [identifier,]* [identifier | “string”]
4.2.3.3. Example: GPL2.0,BSD,“xyz license type”
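One possible reading of this format, sketched in Python: a comma-separated list whose items are either bare identifiers or quoted free-form strings. The function name is hypothetical. Accepting both straight and curly quotes, and allowing a quoted string in any position rather than only last, are assumptions that make the parser slightly more permissive than the draft format.

```python
import re

# One list item: a quoted free-form string or a bare identifier,
# followed by a comma or the end of the value.
ITEM = re.compile(r'\s*(?:["“”]([^"“”]*)["“”]|([^,"“”\s]+))\s*(?:,|$)')


def parse_licenses(value):
    """Split a License(s) value into (kind, text) pairs."""
    items = []
    pos = 0
    while pos < len(value):
        match = ITEM.match(value, pos)
        if match is None:
            raise ValueError("malformed license list at offset %d" % pos)
        if match.group(1) is not None:
            items.append(("string", match.group(1)))
        else:
            items.append(("identifier", match.group(2)))
        pos = match.end()
    return items
```

Keeping the kind alongside the text lets a consumer look up identifiers in the Section 5 table while passing quoted strings through untouched.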
4.2.4. Copyright(s)
4.2.4.1. Purpose: identify the copyright holders and associated dates of their copyright that are in this specific file if known. Note: Copyright holder identifier may have developer names, companies, email addresses, so we’ll probably need a generic string mechanism (including international characters). Since there may be multiple per file, need a way of having separators between them.
4.2.4.2. Format: [“copyright holder”:“date(s)”]*
4.2.4.3. Example: “Linus Torvalds”:“1996-2010”
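A rough sketch of extracting holder and date pairs from this field, with a hypothetical function name. Since the draft has not yet chosen a separator between multiple entries, the code simply finds every quoted pair joined by a colon and ignores whatever sits between pairs. Both straight and curly quotes are accepted, which is an assumption.

```python
import re

QUOTE = '["“”]'
# A quoted holder, a colon, then a quoted date expression.
PAIR = re.compile(
    QUOTE + r'([^"“”]*)' + QUOTE + r'\s*:\s*' + QUOTE + r'([^"“”]*)' + QUOTE
)


def parse_copyrights(value):
    """Return a list of (holder, dates) tuples found in a Copyright(s) value."""
    return [(m.group(1), m.group(2)) for m in PAIR.finditer(value)]
```

The holder is kept as an opaque string, in line with the note in 4.2.4.1 that it may contain names, companies, or email addresses.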
4.2.5. ?
5. Standard License Identifiers
5.1. Rationale for choosing which licenses get standardized identifiers: focus on standardizing the most commonly used licenses rather than all of them. Align with any other standardization efforts underway that will meet the need.
5.2. Table of standard licenses and their identifiers
Identifier | Full name | Official Source Text
GPL2.0 | GNU General Public License (GPL) Ver. 2, June 1991 |
GPL3.0 | ... |
6. Definitions
6.1. Package: ...
6.2. Date range: [YYYY,]*[YYYY-]YYYY syntax for multiple ranges needed.
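The date range syntax as currently written can be checked with a regular expression. The sketch below, using hypothetical names, validates a value and expands it into individual years. It follows the draft literally, so a range is accepted only as the final item; a value such as 1996-1998,2001 is rejected, which illustrates why a syntax for multiple ranges is still needed.

```python
import re

# Literal translation of [YYYY,]*[YYYY-]YYYY.
DATE_RANGE = re.compile(r"^(?:\d{4},)*(?:\d{4}-)?\d{4}$")


def expand_years(text):
    """Return the sorted years covered by a value in the draft syntax."""
    if DATE_RANGE.match(text) is None:
        raise ValueError("not in [YYYY,]*[YYYY-]YYYY form: %r" % text)
    years = set()
    for part in text.split(","):
        if "-" in part:
            first, last = (int(year) for year in part.split("-"))
            if first > last:
                raise ValueError("range runs backwards: %r" % part)
            years.update(range(first, last + 1))
        else:
            years.add(int(part))
    return sorted(years)
```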