Talk:General Meeting/License Field Discussion

From SPDX Wiki
Revision as of 15:36, 10 April 2013 by MartinMichlmayr

An auditor's view

From reading many of the comments posted, it seems that some consumers of SPDX would not use the concluded license because they trust only their own analysis based on the file-level information, some would trust only the package author (the declared license), and some would find the concluded license useful.

From an auditor’s point of view, having a concluded license allows for the inclusion of licensing information discovered during the auditing process. One relatively common scenario is a package licensed under a non-copyleft license including source code from a copyleft package. The inclusion of the copyleft code would cause the concluded license to be the copyleft license.

Since the SPDX standard makes it clear that the concluded license is based on analysis (and not stated by the author), the consumer of the package can choose to ignore the analysis and go with either the declared license or their own analysis based on file-level information. Alternatively, the consumer can validate the conclusion. Note that judgment is applied in the process, and in many cases the conclusion depends on how the source files are used (e.g. files used as tools for building the source may be under a copyleft license but will not affect the concluded license, since these files are not distributed or deployed with the resulting binaries). In my opinion, the results should therefore always be validated.

-- Submitted by Gary O'Neall on Mon, 2011-06-20 17:19.

All License Info From Files

I think it's important to include section 4.8 in this discussion as well. I've added the text from the June 5th version of the spec below.

 4.8 All License Information from Files
 
 4.8.1 Purpose: This field is to contain a list of all licenses found in the package by scanning the files, manually or using automated tools. The options to populate this list are limited to:
 
 (a) the SPDX License List short form identifier if a detected license is on the SPDX License List;
 
 (b) a reference to the license, denoted by LicenseRef-#, if the detected license is not on the SPDX License List;
 
 (c) "NONE" if no license information is detected in any of the files; or
 
 (d) “NOASSERTION” if the reviewer has not examined the contents of the actual files.
 
 Note: The relationship between licenses (conjunctive, disjunctive) is not specified in this field -- it is simply a listing of all licenses found.
 
 4.8.2 Intent: Here, we intend to capture all license information actually seen within the package detected in the actual files.
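
As an illustration only, here is one way a tool might populate this field from per-file scan results, following options (a) through (d). The license-list subset, the function name all_license_info_from_files, the file paths, "Acme Custom License", and the sequential numbering of LicenseRef-# references are all hypothetical assumptions for the example, not part of the spec text quoted above. Consistent with the note in 4.8.1, the output is a flat list with no conjunctive or disjunctive relationship.

```python
# Hypothetical subset of the SPDX License List, for illustration only.
SPDX_LICENSE_LIST = {"MIT", "BSD-3-Clause", "GPL-2.0", "LGPL-2.1", "Zlib"}


def all_license_info_from_files(file_findings, examined=True):
    """Sketch of populating "All License Information from Files" (4.8).

    file_findings maps each file path to the license names detected in it.
    Returns (entries, refs), where refs maps each LicenseRef-# to the
    detected license it stands for.
    """
    if not examined:
        # (d) the reviewer has not examined the contents of the files
        return ["NOASSERTION"], {}

    # Collect every distinct license seen, in first-seen order.
    found = []
    for licenses in file_findings.values():
        for name in licenses:
            if name not in found:
                found.append(name)

    if not found:
        # (c) no license information detected in any of the files
        return ["NONE"], {}

    entries, refs = [], {}
    for name in found:
        if name in SPDX_LICENSE_LIST:
            # (a) short-form identifier from the SPDX License List
            entries.append(name)
        else:
            # (b) reference for a license not on the list (numbering assumed)
            ref = f"LicenseRef-{len(refs) + 1}"
            refs[ref] = name
            entries.append(ref)
    return entries, refs


# Hypothetical scan: a BSD package with a GPL build script and one unlisted license.
findings = {
    "src/main.c": ["BSD-3-Clause"],
    "build/gen.py": ["GPL-2.0"],
    "src/util.c": ["BSD-3-Clause", "Acme Custom License"],
}
entries, refs = all_license_info_from_files(findings)
print(entries)  # ['BSD-3-Clause', 'GPL-2.0', 'LicenseRef-1']
print(refs)     # {'LicenseRef-1': 'Acme Custom License'}
```

The GPL-2.0 build script shows up in the list even though, as in the auditor's example above, such files may not affect the concluded license.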

-- Submitted by Anonymous on Thu, 2011-06-16 15:26.

Attributed licensing

Perhaps rather than having a lot of different types of licensing properties, we have just one which is repeatable and allows the value to be attributed. This would allow an SPDX file to express something like "The original authors think the licensing is MIT, but supplier A thinks it is MIT and BSD, but audit company B thinks it is MIT, BSD and GPL".

This would look a bit like this in the RDF:

   <http://example.org/some-package.tar.gz>
     :license [
       :declaredBy :copying_text;
       :licensing [
         :member license:MIT;
         a :ConjunctiveLicenseSet];
       a :LicensingInfo];
   
     :license [
       :declaredBy "Supplier A";
       :licensing [
         :member license:MIT, license:BSD;
         a :ConjunctiveLicenseSet];
       a :LicensingInfo];
     
     :license [
       :declaredBy "Openlogic, Inc";
       :licensing [
         :member license:MIT, license:BSD, license:GPL;
         a :ConjunctiveLicenseSet];
       a :LicensingInfo].

(I am sure we can come up with better names and a tag-value syntax for this if we decide to proceed.)

This would allow licensing information from various people/organizations to be included in an SPDX file. It would also make it explicit exactly who was doing the declaring of any particular licensing.
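
To make the proposal concrete without committing to a syntax, here is a rough Python model of the same three attributed declarations. The class and function names are hypothetical, the license names are the informal ones used in the RDF above rather than SPDX identifiers, and each set is treated as conjunctive, as in that example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LicensingInfo:
    """One attributed licensing statement (hypothetical model of :LicensingInfo)."""
    declared_by: str      # who is doing the declaring
    licenses: frozenset   # conjunctive set: all of these apply


# The three declarations from the RDF example above.
package_licensing = [
    LicensingInfo("copying_text", frozenset({"MIT"})),  # stands in for :copying_text
    LicensingInfo("Supplier A", frozenset({"MIT", "BSD"})),
    LicensingInfo("Openlogic, Inc", frozenset({"MIT", "BSD", "GPL"})),
]


def licensing_by(infos, declarer):
    """Return the license set asserted by one declarer, or None if absent."""
    for info in infos:
        if info.declared_by == declarer:
            return info.licenses
    return None


def additions_over(infos, baseline):
    """For every other declarer, list the licenses it adds beyond the baseline."""
    base = licensing_by(infos, baseline) or frozenset()
    return {
        info.declared_by: sorted(info.licenses - base)
        for info in infos
        if info.declared_by != baseline
    }


print(additions_over(package_licensing, "copying_text"))
# {'Supplier A': ['BSD'], 'Openlogic, Inc': ['BSD', 'GPL']}
```

A consumer could then pick the declarer it trusts (for instance via licensing_by) instead of the format fixing in advance whose view counts.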

-- Submitted by pezra on Wed, 2011-06-15 16:41.

Declared License

I believe that both of the fields are useful.

However, I do not have a clear understanding of what is intended to be included in Declared License. In particular, see my three examples below.

I have heard the Declared License field characterized as package-level license information. But the Purpose says that it is "by the authors of the package". Is determination of who added particular license information relevant to whether it is included in the Declared License? That determination might be useful. But requiring that judgment departs from the goal of focusing on objective information. A definition that encompasses all package-level license information would be more objective; but in some cases it would convey different information than a definition tied to the originator of the information.

Which is Declared License intended to be?

  • limited to information "by the authors of the package"?
  • any package-level information?

If the former, then is "authors of the package" intended to mean the primary authors of the software? Or is it intended to also include package maintainers?

The following are three examples of package-level license information. Are any of these intended to be captured in Declared License?

(A) One common piece of package-level information is a COPYING file found in the root of the source tree. Such files are often placed there by the primary authors of the software in that tree. However, lacking information from a version control system, one rarely has any way of knowing who put that file in the package. Suppose that the source files from which the software is built all have a BSD notice, but there are some GPL-licensed files that are used in the build process and the root of the source tree includes a copy of the GPL. Although we do not know for sure, it would seem very unlikely that the COPYING file represents the primary author's declaration of the license for the package.

(B) Another common piece of package-level information is the license field in the header of an RPM. That information is often NOT added by the primary author of the software.

(C) Debian packages include a copyright file (debian/copyright) in which the Debian package maintainer has collected information about the licenses that apply to the package.

-- Submitted by Scott K. Peterson on Tue, 2011-06-14 21:59.

The distinction is useful

The distinction is most useful and important when the scope is a package. The two fields should contrast what the author/copyright holder says about the package with what can be found in the package. (4.9.2 says "the license identified in text in the actual package source code files"; this makes me believe package scope is applicable.)

Reading 4.9.2, I am surprised it is defined based on all source code files. For any bigger package, this is going to be many licenses, and the better the scanning tool, the more licenses get added here. To me, the name "declared license" implies the license that the author would use when talking about his work, where simplicity is paramount. Example: ask the FSF about gcc-4.5 and they would respond with "GPL-3.0+". This is what I'd call the declared license. Going into the files with a scanner, I get a collection like this: "GPL-2.0+; GPL-2.0+(with linking); X11-MIT; LGPL-2.1+; LGPL-2.1+(with linking); BSD-3-Clause; GFDL-1.2; GFDL-1.1; Public-Domain; Zlib-License; EPL-1.0; BSD-4-Clause(UCB)". Is this the declared license?

If so, we'd need a third field to capture the simplified concept that "gcc-4.5 is said to be under GPL-3.0+". The usefulness of this inaccuracy is its simplicity. But then, three fields "for basically the same thing" may already be too many, and thus become confusing again.

When the scope is a single file and the file has a proper license header, then that header is it. If a file is missing a proper license header but there are several COPYRIGHT files in parent or nearby directories, then the reviewer adds value by discerning which of them effectively governs that particular file.

4.7 is by definition opinionated. It should always be clear who the respective reviewer is. Would the schema allow this field multiple times, in case we want to document differing conclusions from multiple reviewers?

-- Submitted by jnweiger on Fri, 2011-06-10 19:18.

As far as I can tell, we have to determine what is useful and practical.
I do not feel that distributing a file with multiple opinions about how a package is licensed has practical value. The more opinions you collect, the more confusing things will get. Much of the opinion forming should be an internal company process and not necessarily recorded and redistributed.
The ultimate objective I think we want to get to is to answer the question "What do I need to do to be compliant with the licenses for the code that I am getting in this Package"?
Part of the challenge in addressing this question begins with being very clear about what the term Package really represents. Does everyone agree that it has the colloquial meaning associated with something like an RPM? A functional unit of code that may depend on functionality of other packages? If so, let's be clear about this scope and document it. Right now, stating "A package can contain sub-packages, but the overview is a reference to the entire contents of the package listed." creates a scenario where Concluded and Declared may never make sense.
Then let's look at what the developer/author of a Package has in mind (intent) from a licensing perspective. Did they intend to create a single licensed work or to create ambiguity and confusion? If the latter is accidental (probably), then perhaps applying simplified SPDX can help fix this.
If one asserts that a package is GPLv2, for example, and the package also contains BSD code, does it matter that we look at the BSD code, knowing that GPLv2 trumps BSD? By complying with GPLv2 you are likely complying with BSD anyway.
Ultimately full compliance is not delivered from a decision on a concluded or declared license type if there is no clear License to apply. It is attained by being able to roll up the license constraints presented in the license files that comprise the package. Since these files are already captured in the SPDX file, with their own License conclusions, the value of having more than one opinion about the License that the package is associated with is a little lost on me.
I would try to make packages less complicated and try to get to a point where one license applies.
Not knowing how prevalent such complex packages are, I also wonder whether this complexity has been created to address a corner case that, by the 80-20 rule, is less than 20% of possible SPDX uses.
-- Submitted by stcroppe on Fri, 2011-06-10 22:44.

License Field

Is the distinction clear between the two fields?

Yes

Are both useful?

Not really. If an Author of a Package (Person or Entity) states that the Package is licensed under GPLv2, do we really want to second-guess their assertion, or do we want to make them responsible for that assertion? I feel strongly that the latter is the case.

The only purpose I can see for having these fields is if the SPDX file is not created by the Author(s) of the code it documents but by someone else. If this is the case, then I think we will have a hard time wrestling a conclusive statement about the license to the ground, because we are second-guessing the Author(s). No recipient of the SPDX file will be able to trust these fields, and each will likely want to do its own analysis. In this case, not having this information at all is possibly better.

I would propose a single field such as "Package Author Declared" and then everyone can go back to the Package Author if something in the Package is wrong and it needs to be changed. If a Package is distributed through a Supply Chain and it is changed en route then a new Package is created and the new Author should declare the License associated with it. In this case the original (root) Open Source Author should still be maintained for verification that the current license is not wildly different from the license that was initially intended.

Can you provide examples of where they are useful?

I would like to see the examples here. They can help define the use cases and help with simplification; however, I would like to see us avoid using such examples as justification for SPDX complexity.

Do you have an example that raises issues?

Any example where there is no alignment on how to interpret the License governing a Package means that Compliance with the license will be problematic. I believe that the objective of SPDX is to provide information that simplifies compliance and doesn't make it complicated. I would like to see a focus on simplicity and a reduction in ambiguity.

If it is hard to assert what License a Package is governed by then perhaps the Package is too complex and should be decomposed into several smaller SPDX files that can be more definitive.

An example would be a Linux Distribution (I know this isn't necessarily a good example but serves my purpose). Technically a Linux Distribution could be documented by SPDX today. The problem is that a Linux Distribution is too noisy with License types. An SPDX file would be perfectly capable of documenting every single file and related license + Artifact etc. and also a rolled up summary of all unique licenses in the Package but making any statement about the whole work would be impossible. Another problem is that by treating a Linux Distribution as a blob you lose context for making license and compliance decisions.

A better approach is to open the Linux Distribution Package and go through each sub-package documenting each with its own SPDX file because things are now more manageable and it is easier to Declare a license. In a lot of cases a definitive Declaration is likely to be possible. In cases where it isn't perhaps another level of decomposition is necessary. This is where having the Author take responsibility for the SPDX file helps because a Linux Distribution would in most cases be a collection of Official SPDX files assembled from their points of Origin. I know there are exceptions.

I'll stop here to see what other commentary is added.

-- Submitted by stcroppe on Fri, 2011-06-10 02:08.

I would like to add the aspect of a supplier - OEM relationship to this discussion: The supplier manufactures a product for the OEM, which the OEM will distribute. For this reason the OEM needs clear instructions from the supplier on relevant licensing restrictions. In most cases the supplier will be liable to the OEM for the product and correct information on distribution requirements.
For this reason, I would like to see the possibility in SPDX to include the "supplier's warranted declaration" of the relevant FOSS license (this could be the use of the field "Concluded License"). Obviously, in most cases this should be the same as the Declared License. However, some packages are available under a dual-licensing scheme, and the way the supplier incorporated the FOSS component could already exclude one of the two licensing choices. So the relevant licenses before integration into a product could be different from the applicable licenses after integration. Only the integrator knows if additional restrictions apply.
Of course, I am assuming here that SPDX can also be used to describe the FOSS contents within a binary software or physical product.
-- Submitted by sz on Thu, 2011-06-16 06:57.
From the point of view of an OEM that receives a product from a supplier and needs to be able to distribute this product, it is essential to get clear instructions from the supplier about which obligations apply. For this reason, I need a license field that reflects the unambiguous interpretation from the supplier, for which the supplier will be liable to me. Also, in cases of dual licensing, I need to know which license was chosen by the supplier. Thus, while in the original software I may have the choice of two licenses, after integration into a product this may no longer be the case.
-- Submitted by Anonymous on Tue, 2011-06-14 08:18.