THE SPDX WIKI IS NO LONGER ACTIVE. ALL CONTENT HAS BEEN MOVED TO https://github.com/spdx

Difference between revisions of "CommunityBridge/CommunityBridge ProjectIdeas"

From SPDX Wiki
(Created page with "<br /> <span style="font-size:150%">'''Welcome to the 2020 SPDX Google Summer of Code Project Page'''</span> See the [https://rtgdk.github.io/spdx-gsoc-proposal.html proposa...")
 
Line 1: Line 1:
 
<br />
 
<br />
  
<span style="font-size:150%">'''Welcome to the 2020 SPDX Google Summer of Code Project Page'''</span>
+
<span style="font-size:150%">'''Welcome to the 2020 SPDX Community Bridge Project Ideas Page'''</span>
  
See the [https://rtgdk.github.io/spdx-gsoc-proposal.html proposal template] if you are interested in submitting a Google Summer of Code proposal.
+
Proposals can use the [https://rtgdk.github.io/spdx-gsoc-proposal.html proposal template] as a guide for community bridge projects.
  
 
Should you have questions please do not hesitate to contact one of the mentors directly.
 
Line 54: Line 54:
 
  list individuals who are willing to mentor and provide information about the project proposal.  
 
  
(The projects from last year can be found on the [https://summerofcode.withgoogle.com/organizations/4532099550281728/#5727887162867712 2019 Google Summer of Code projects page for SPDX] ).
+
==SPDX Workgroup Tooling Projects==
 +
 
 +
=== Replace Java Functions in the SPDX Online Tools with Python Library Functions ===
 +
add overview of project here
 +
====Skills Needed====
 +
what skills should the student have to do the coding exercises
 +
====Background Information====
 +
context for the project and references to be studied
 +
====Available Mentors====
 +
list individuals who are willing to mentor and provide information about the project proposal.  
 +
 
 +
=== Improve the Python Libraries ===
 +
add overview of project here
 +
====Skills Needed====
 +
what skills should the student have to do the coding exercises
 +
====Background Information====
 +
context for the project and references to be studied
 +
====Available Mentors====
 +
list individuals who are willing to mentor and provide information about the project proposal.
 +
 
 +
=== Improve the Deployment Infrastructure for the SPDX Online Tools ===
 +
add overview of project here
 +
====Skills Needed====
 +
what skills should the student have to do the coding exercises
 +
====Background Information====
 +
context for the project and references to be studied
 +
====Available Mentors====
 +
list individuals who are willing to mentor and provide information about the project proposal.  
  
==SPDX Workgroup Tooling Projects==
 
 
These projects are aimed at contributing to the SPDX tools to help reduce the effort to create SPDX documents and increase their accuracy.
 
  
Line 81: Line 107:
 
====Available Mentors====
 
 
[mailto:rohit.lodhartg@gmail.com Rohit Lodha] [mailto:gary@sourceauditor.com Gary O'Neall]
 
 
=== Generate Java SPDX Model Classes from XML XSD file ===
 
In SPDX 3.0, we will be generating an XML XSD schema to define the model.  This project idea is to use the XSD schema to generate a set of Java classes which represent the complete SPDX model.  The generated classes would be used as part of a re-designed Java tool for SPDX.
 
 
====Skills Needed====
 
* Java programming skills
 
* XML/XSD skills
 
* Skills in code generation practices
 
* Ability to work with the community in integrating results with other projects
 
 
====Background Information====
 
* A proposed XSD for SPDX can be found on [https://github.com/mil-oss/spdx-xsd github].  Note: This is a very early proposal and would likely change significantly.
 
* Current Java tools can be found on [https://github.com/spdx/tools SPDX Tools github page]
 
* A rewrite of the Java tools is in progress.  The in-progress work can be found at the [https://github.com/goneall/Spdx-Java-Library Spdx-Java-Library] github page.
 
 
====Available Mentors====
 
[mailto:rohit.lodhartg@gmail.com Rohit Lodha] [mailto:gary@sourceauditor.com Gary O'Neall]
 
 
=== Validate License Cross-References ===
 
Enhance the SPDX LicenseListPublisher to validate the cross-reference (seeAlso) URLs for each license.  One check would confirm that the link is still live.  This check needs reasonably good performance (e.g. a long timeout would not work).  Another check would identify the license text at the linked URL and compare it with the text of the license itself to make sure they match.  If either check fails, a validity attribute should be added to the license output files (e.g. the license JSON files).
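The link check described above could look roughly like the following. This is a minimal sketch in Python rather than the Java used by the LicenseListPublisher, and it covers only the liveness check, not the text comparison. The function names, the <code>crossRefChecks</code> key and the <code>isValid</code> flag are assumptions made for illustration; only the <code>seeAlso</code> field name comes from the description above.

```python
import urllib.request


def check_link(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if the URL answers with an HTTP status below 400.

    `opener` is injectable so the check can be exercised without network
    access; by default it is urllib.request.urlopen.
    """
    try:
        request = urllib.request.Request(url, method="HEAD")
        with opener(request, timeout=timeout) as response:
            return response.status < 400
    except (OSError, ValueError):
        # OSError covers URLError, HTTPError and timeouts;
        # ValueError covers malformed URLs.
        return False


def annotate_cross_refs(license_record, timeout=5.0, opener=urllib.request.urlopen):
    """Return a copy of the record with a hypothetical 'isValid' flag per URL."""
    checked = [
        {"url": url, "isValid": check_link(url, timeout, opener)}
        for url in license_record.get("seeAlso", [])
    ]
    return {**license_record, "crossRefChecks": checked}
```

A short timeout keeps a run over the whole license list bounded; checking the URLs concurrently would be a natural next step.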
 
 
====Skills Needed====
 
* Java programming skills
 
* XML/XSD skills
 
* HTML parsing skills
 
* Ability to work with the community in integrating results with other projects
 
 
====Background Information====
 
The [https://spdx.org/licenses/ SPDX license list] is generated from a [https://github.com/spdx/license-list-XML git repository of XML files].  One of the fields maintained in the XML is the crossRef, a URL cross reference for the license which may be valid or may be a "dead link".  The [https://github.com/spdx/LicenseListPublisher LicenseListPublisher] is the tool that generates the web pages and the output formats.  The output formats can be found in the [https://github.com/spdx/license-list-data SPDX license list data] git repository.  [https://github.com/spdx/LicenseListPublisher/issues/60#issuecomment-570511697 Issue #60] for the LicenseListPublisher describes a request to include a validity attribute.
 
 
Over the summer, we may be adding the XML format to the supported output data formats in the license list data repo.
 
 
====Available Mentors====
 
[mailto:gary@sourceauditor.com Gary O'Neall] [mailto:swinslow@linuxfoundation.org Steve Winslow]
 
 
=== Improve SPDX Golang tooling ===
 
The goal of this GSoC project would be to add support in the [https://github.com/spdx/tools-golang SPDX Golang tools] for SPDX documents in versions of the SPDX spec other than 2.1, including the upcoming 2.2 spec release which will add JSON, YAML and XML to the supported formats. Other work may include improving the validation and data model used by the Golang tools.
 
 
====Skills Needed====
 
* Go programming skills
 
* Experience with JSON and YAML (XML a plus)
 
* Ability to interpret and implement the SPDX specification and related community documentation
 
* Ability to work with the community in integrating results with other projects
 
* Willingness to learn about open source licensing and related technical matters
 
 
====Background Information====
 
The [https://github.com/spdx/tools-golang SPDX Golang tools] were initially designed to work with SPDX documents in tag-value format, for version 2.1 of [https://spdx.org/specifications the SPDX specification]. They do not currently support earlier versions of the SPDX specification. In addition to new data fields, [https://github.com/spdx/spdx-spec/milestone/2 the upcoming 2.2 spec release] will add support for SPDX documents in JSON, YAML and XML formats. The Golang tools should (at a minimum) be updated to enable reading and writing in JSON and YAML.
 
 
The SPDX Golang tools currently do some validation when reading and parsing SPDX documents, but they do not currently do much validation of the content itself. For example, license fields are represented as strings, but the tools do not currently check to confirm that e.g. the license identifiers are valid SPDX license expressions. Additional validation support to improve this would be beneficial.
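The kind of content validation mentioned above can be sketched as a small recursive-descent check of license expressions. The sketch is in Python rather than Go, and it is not how the Golang tools work today. It assumes a tiny hard-coded set of identifiers (a real tool would load the full SPDX license list) and a simplified grammar that ignores operator precedence, since it only answers "valid or not".

```python
import re

# Tiny illustrative subset; a real tool would load the full SPDX license list.
KNOWN_LICENSES = {"MIT", "Apache-2.0", "GPL-2.0-only", "BSD-3-Clause"}
KNOWN_EXCEPTIONS = {"Classpath-exception-2.0"}

TOKEN = re.compile(r"\s*(\(|\)|[A-Za-z0-9.\-+]+)")


def tokenize(expression):
    """Split an expression into parentheses and identifier/operator tokens."""
    tokens, pos = [], 0
    expression = expression.strip()
    while pos < len(expression):
        match = TOKEN.match(expression, pos)
        if not match:
            raise ValueError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def is_valid_expression(expression):
    """Check: expr := term ((AND|OR) term)*, term := '(' expr ')' | id ['+'] [WITH exc]."""
    try:
        tokens = tokenize(expression)
    except ValueError:
        return False
    pos = 0

    def parse_expr():
        nonlocal pos
        if not parse_term():
            return False
        while pos < len(tokens) and tokens[pos] in ("AND", "OR"):
            pos += 1
            if not parse_term():
                return False
        return True

    def parse_term():
        nonlocal pos
        if pos >= len(tokens):
            return False
        if tokens[pos] == "(":
            pos += 1
            if not parse_expr() or pos >= len(tokens) or tokens[pos] != ")":
                return False
            pos += 1
            return True
        token = tokens[pos]
        license_id = token[:-1] if token.endswith("+") else token
        if license_id not in KNOWN_LICENSES:
            return False
        pos += 1
        if pos < len(tokens) and tokens[pos] == "WITH":
            pos += 1
            if pos >= len(tokens) or tokens[pos] not in KNOWN_EXCEPTIONS:
                return False
            pos += 1
        return True

    return parse_expr() and pos == len(tokens)
```

For example, <code>is_valid_expression("MIT OR Apache-2.0")</code> is accepted, while <code>is_valid_expression("MIT AND")</code> is rejected because the operator has no right-hand term.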
 
 
Additionally, the [https://github.com/spdx/tools-golang/tree/master/spdx data model used internally by the SPDX tools] to represent SPDX content is different in some ways from the data model used by other tools (e.g. [https://github.com/spdx/tools/tree/master/src/org/spdx/rdfparser/model Java], [https://github.com/spdx/tools-python/tree/master/spdx Python]). One goal for this project might be to evaluate the choices made by those other tools, and consider whether the Golang tools' data model should change to align with those.
 
 
====Available Mentors====
 
[mailto:swinslow@linuxfoundation.org Steve Winslow]
 
 
=== GoLang Parallel RDF Parser ===
 
Implement a high performance RDF parser for GoLang to support the SPDX RDF/XML format parsing and production.
 
 
====Skills Needed====
 
* GoLang programming skills
 
* Knowledge of RDF
 
* Knowledge of parsing algorithms
 
 
====Background Information====
 
As of now, very few known GoLang libraries support RDF parsing and are licensed for use in an open-source project. One of them is [https://godoc.org/github.com/knakk/rdf knakk/rdf2go]. This library uses the RDF serialization format. All the current projects claiming to parse RDF files parse them linearly (line by line), which is repetitive and time-consuming. This project will aim to split RDF files into appropriate blocks and parse the blocks concurrently to reduce the total time required to parse the entire document. The parser must conform to the SPDX 2.x specification and have proper tests for each procedure.
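The chunk-and-parse idea can be sketched as follows. This is an illustration of the structure only, under several assumptions: it is written in Python rather than Go, it parses a simplified line-oriented "subject predicate object ." syntax rather than RDF/XML (where finding safe chunk boundaries is the hard part), and Python threads will not give the CPU speed-up that goroutines could. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def parse_triple(line):
    """Parse one simplified 'subject predicate object .' statement."""
    body = line.strip()
    if not body.endswith("."):
        raise ValueError(f"statement does not end with '.': {line!r}")
    subject, predicate, obj = body[:-1].strip().split(" ", 2)
    return subject, predicate, obj


def parse_chunk(lines):
    """Parse a block of lines, skipping blank lines and '#' comments."""
    return [
        parse_triple(line)
        for line in lines
        if line.strip() and not line.lstrip().startswith("#")
    ]


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def parse_parallel(text, chunk_size=1000, workers=4):
    """Split the input into blocks and parse the blocks concurrently."""
    lines = text.splitlines()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves chunk order, so the merged result matches a linear parse.
        results = pool.map(parse_chunk, chunked(lines, chunk_size))
        return [triple for chunk in results for triple in chunk]
```

Because the chunks are merged in their original order, the output can be tested directly against a plain linear parse of the same input.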
 
 
This project could be included as part of a larger project which includes other fixes to the GoLang tools (see the Improve SPDX Golang tooling project idea above).
 
====Available Mentors====
 
[mailto:rohit.lodhartg@gmail.com Rohit Lodha] [mailto:gary@sourceauditor.com Gary O'Neall]
 
 
=== SPDX Plugins for Package Managers ===
 
Create a native plug-in or extension to a well-known package manager to generate valid SPDX documents based on the information provided in the build metadata files.  Examples of package managers include Node Package Manager (NPM), Gradle, Rust Cargo, Ruby Gems, Python pip, and Cocoa Pods.  A plugin for Maven has already been developed and can be used as an example.
 
 
The plugin should generate a valid SPDX document with minimal configuration required by the user.
 
====Skills Needed====
 
* Programming language skills will depend on the package manager (e.g. Ruby for Ruby Gems, JavaScript for NPM)
 
* Knowledge of package managers and build processes
 
* Skills in parsing and pattern matching
 
* Ability to work with the community in integrating results with other projects
 
====Background Information====
 
SPDX produces a standard Bill of Materials for software containing package and license information.  Package managers collect and manage much of the data needed to produce an SPDX document.  Automatically generating SPDX documents in package managers will greatly increase the efficiency and adoption of SPDX.
 
 
The [https://github.com/spdx/spdx-maven-plugin Maven Plugin] is a prototype plugin developed for the Maven build system.  It can be used as an example for creating plugins for other build environments.
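To make the idea concrete, the core of such a plugin could be sketched as a function that turns build metadata into a minimal SPDX tag-value document. This is an illustrative sketch, not taken from the Maven plugin: the metadata keys (<code>name</code>, <code>version</code>, <code>license</code>), the tool name and the namespace base are assumptions, and the output has not been validated against the specification. A real plugin would also need a unique document namespace per document.

```python
from datetime import datetime, timezone


def spdx_from_metadata(metadata, namespace_base="https://example.org/spdxdocs", created=None):
    """Render a minimal SPDX tag-value document from package metadata."""
    name = metadata["name"]
    version = metadata.get("version")
    declared = metadata.get("license") or "NOASSERTION"
    created = created or datetime.now(timezone.utc)
    label = f"{name}-{version}" if version else name

    lines = [
        "SPDXVersion: SPDX-2.2",
        "DataLicense: CC0-1.0",
        "SPDXID: SPDXRef-DOCUMENT",
        f"DocumentName: {label}",
        f"DocumentNamespace: {namespace_base}/{label}",  # placeholder base URL
        "Creator: Tool: example-spdx-plugin",  # hypothetical tool name
        f"Created: {created.strftime('%Y-%m-%dT%H:%M:%SZ')}",
        "",
        f"PackageName: {name}",
        "SPDXID: SPDXRef-Package",
    ]
    if version:
        lines.append(f"PackageVersion: {version}")
    lines += [
        "PackageDownloadLocation: NOASSERTION",
        "FilesAnalyzed: false",
        "PackageLicenseConcluded: NOASSERTION",
        f"PackageLicenseDeclared: {declared}",
        "PackageCopyrightText: NOASSERTION",
    ]
    return "\n".join(lines) + "\n"
```

Fields the package manager cannot supply are emitted as <code>NOASSERTION</code>, which keeps the configuration required from the user to a minimum.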
 
====Available Mentors====
 
[mailto:gary@sourceauditor.com Gary O'Neall] [mailto:kstewart@linuxfoundation.org Kate Stewart]
 
 
==SPDX Specification Projects==
 
The following projects contribute directly to the creation or validation of the SPDX 2.1 specification.
 
 
=== SPDX Specification Views for legal counsels and developers ===
 
The proposal is to see whether it is possible to reduce large SPDX documents into smaller subset SPDX documents that provide specific reduced "views" of the larger data.
 
====Skills Needed====
 
* Understanding of the compliance needs of legal counsels and developers so that we can reduce friction in adopting SPDX
 
====Background Information====
 
SPDX documents commonly contain hundreds, if not thousands, of entries, making it hard for a human to make manual corrections or draw conclusions. No scanner can provide 100% complete data, so human corrections are usually needed. The aim of this proposal is twofold:
 
1. Provide developers with a "code view" of the tool-generated SPDX document, close to the code they work on, so that they can make corrections to the SPDX data. For instance, they could amend SPDX package tag values or model package dependencies not detected by the scanner used.
 
2. Provide legal counsels with a "package and limited file view" to enable legal conclusions
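One possible reading of the two views can be sketched as filters over an already-parsed document. Everything here is an assumption for illustration: the document is a simplified dictionary with <code>packages</code> and <code>files</code> lists, the "code view" is selected by path prefix, and the "limited file view" keeps only files that still lack a concluded license.

```python
def code_view(document, path_prefix):
    """Keep only file entries under path_prefix (a hypothetical 'code view')."""
    files = [
        entry for entry in document.get("files", [])
        if entry["fileName"].startswith(path_prefix)
    ]
    return {"name": document.get("name"), "files": files}


def legal_view(document):
    """Keep package license fields plus files that still need a conclusion."""
    packages = [
        {key: pkg.get(key) for key in ("name", "licenseDeclared", "licenseConcluded")}
        for pkg in document.get("packages", [])
    ]
    unresolved = [
        entry for entry in document.get("files", [])
        if entry.get("licenseConcluded", "NOASSERTION") == "NOASSERTION"
    ]
    return {"name": document.get("name"), "packages": packages, "files": unresolved}
```

Corrections made in a view would still have to be merged back into the full document, which this sketch does not address.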
 
====Available Mentors====
 
[mailto:swinslow@linuxfoundation.org Steve Winslow]
 
[mailto:thomas.steenbergen@here.com Thomas Steenbergen]
 
 
=== ClearlyDefined exporting and importing SPDX documents  ===
 
The goal of this GSoC project would be to add support in the [https://github.com/clearlydefined ClearlyDefined project] to export curated data into SPDX 2.2 documents.  Once that is accomplished, the next step would be to import SPDX documents into the curated database.
 
 
====Skills Needed====
 
* Experience with JSON and YAML (XML a plus)
 
* Ability to interpret and implement the SPDX specification and related ClearlyDefined community documentation
 
* Ability to work with the community in integrating results with other projects
 
* Willingness to learn about open source licensing and related technical matters
 
 
====Background Information====
 
To export a ClearlyDefined workspace as an SPDX document:

* The user navigates to https://clearlydefined.io/workspace.

* The user adds one or more components to the workspace through any of the existing means.

* The user clicks Share and then clicks SPDX (with a choice of the output formats supported by SPDX 2.2).

This would export an SPDX document containing all of the components that were in the workspace.
 
Note:  If there is mandatory information required by SPDX that ClearlyDefined does not have we will need to determine how to accommodate that.
 
 
To populate a workspace from an SPDX document:

* The user navigates to https://clearlydefined.io/workspace.

* The user drags an SPDX document into the workspace, and all of the components in the SPDX document are added to the workspace.
 
 
There are some discrepancies between the content in ClearlyDefined and that in SPDX documents, so work would be needed with both communities to figure out what to do if license information in the SPDX document disagrees with what ClearlyDefined has, and how to handle pending curations.
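A first step toward handling such disagreements is simply detecting them on import. The sketch below assumes both sides have already been reduced to hypothetical <code>{component name: license expression}</code> mappings; it does not use the ClearlyDefined API, and it compares expressions as plain strings, so equivalent expressions written differently would be reported as conflicts.

```python
def find_license_conflicts(spdx_packages, clearlydefined_definitions):
    """Return components whose declared license differs between the two sources.

    Both arguments are hypothetical {component_name: license_expression} mappings.
    """
    conflicts = []
    for name, spdx_license in sorted(spdx_packages.items()):
        cd_license = clearlydefined_definitions.get(name)
        if cd_license is None or spdx_license == "NOASSERTION":
            continue  # nothing to compare against
        if spdx_license != cd_license:
            conflicts.append(
                {"component": name, "spdx": spdx_license, "clearlydefined": cd_license}
            )
    return conflicts
```

How each reported conflict is then resolved (prefer one source, or open a curation) is the policy question the two communities would need to settle.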
 
 
====Available Mentors====
 
[mailto:kstewart@linuxfoundation.org Kate Stewart]
 
[mailto:IAMWILLBAR@github.com William Bartholomew]
 

Revision as of 17:04, 4 June 2020


Welcome to the 2020 SPDX Community Bridge Project Ideas Page

Proposals can use the proposal template as a guide for community bridge projects.

Should you have questions please do not hesitate to contact one of the mentors directly.



What is SPDX?

First and foremost we are a community dedicated to solving the issues and problems around Open Source licensing and compliance. The SPDX work group (part of the Linux Foundation) consists of individuals, community members, and representatives from companies, foundations and organizations who use or are considering using the SPDX standard. The work group operates much like a meritocratic, consensus-based community project; that is, anyone with an interest in the project can join the community, contribute to the specification, and participate in the decision-making process. We come from many different backgrounds including open source developers, lawyers, consultants and business professionals, many of whom have been involved with license compliance and identification for years.

As part of this effort we have developed a set of collateral that can be used:

Why choose an SPDX Project?

Contributing to one of the SPDX projects below will provide a valuable contribution to developers and/or users of open source software. We believe you will find the projects both technically challenging and rewarding. In essence, we believe you will be able to look back one day and say, "I was part of that effort."


Getting Involved

Beyond working with your mentor(s), we highly encourage students who select one of these projects to get involved with the SPDX community via our technical working group. Interaction with the technical team happens primarily via its mailing list and on gitter (see resources). There is also a weekly call you could join.

Resources

Proposed 2020 Projects

Mentors: please fill out the following template for any projects you wish to propose.

=== Project Name ===
add overview of project here
====Skills Needed====
what skills should the student have to do the coding exercises
====Background Information====
context for the project and references to be studied
====Available Mentors====
list individuals who are willing to mentor and provide information about the project proposal. 

SPDX Workgroup Tooling Projects

=== Replace Java Functions in the SPDX Online Tools with Python Library Functions ===
add overview of project here
====Skills Needed====
what skills should the student have to do the coding exercises
====Background Information====
context for the project and references to be studied
====Available Mentors====
list individuals who are willing to mentor and provide information about the project proposal. 
=== Improve the Python Libraries ===
add overview of project here
====Skills Needed====
what skills should the student have to do the coding exercises
====Background Information====
context for the project and references to be studied
====Available Mentors====
list individuals who are willing to mentor and provide information about the project proposal. 
=== Improve the Deployment Infrastructure for the SPDX Online Tools ===
add overview of project here
====Skills Needed====
what skills should the student have to do the coding exercises
====Background Information====
context for the project and references to be studied
====Available Mentors====
list individuals who are willing to mentor and provide information about the project proposal. 

These projects are aimed at contributing to the SPDX tools to help reduce the effort to create SPDX documents and increase their accuracy.

Implement SPDX License Matching in Python

Implement as much of the SPDX License Matching Guidelines as practical in Python. This could replace the current Java implementation for the Check License SPDX Online license checking tool.

Following is a list of suggested features:

  • Provide an interface which will check text against a license template using the license matching guidelines
  • Provide an interface which will check text and return all matching SPDX listed license IDs
  • Provide an interface which takes 2 license texts as input and returns a boolean indicating if the 2 licenses match per the license matching guidelines
  • When there is not a match, provide a return value making it possible to describe where and why the license does not match
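The suggested interfaces could take roughly the following shape. This is a minimal sketch with hypothetical function names, and its normalization (lowercasing and collapsing whitespace) covers only a small part of the license matching guidelines. Instead of a bare boolean, the comparison returns a pair so that a mismatch can be described.

```python
from itertools import zip_longest


def normalize(text):
    """Lowercase and split on whitespace (only part of the guidelines)."""
    return text.lower().split()


def compare_license_texts(text_a, text_b):
    """Return (True, None) on a match, else (False, first difference)."""
    tokens_a, tokens_b = normalize(text_a), normalize(text_b)
    for index, (a, b) in enumerate(zip_longest(tokens_a, tokens_b)):
        if a != b:
            # None on either side means that text ended early.
            return False, f"token {index}: {a!r} != {b!r}"
    return True, None


def matching_license_ids(text, license_texts):
    """Return the IDs of all reference texts that match the given text."""
    return sorted(
        license_id
        for license_id, reference in license_texts.items()
        if compare_license_texts(text, reference)[0]
    )
```

Here <code>license_texts</code> is an assumed <code>{license ID: reference text}</code> mapping; matching against templates rather than plain texts is a separate step.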

Background Information

  • See the SPDX License Matching Guidelines for a description of the guidelines
  • A technical description of the templates and license matching can be found in Appendix II of the SPDX specification
  • A Java implementation can be found in Github SPDX Tools LicenseCompareHelper.java
  • It's harder than you may think: the template language is a challenge to implement. Performance can be a challenge when matching a single text against hundreds of potential licenses. Reporting back where the mismatch occurs can also be a challenge.
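To give a feel for the template problem, the sketch below translates a template into a regular expression. It assumes the <code><<var;...;match="...">></code> and <code><<beginOptional>></code> / <code><<endOptional>></code> rule syntax, handles only those rules, and is deliberately loose: tokens are joined with optional whitespace, so it would also accept words run together. It is not the approach taken by the Java implementation, and the function names are hypothetical.

```python
import re

RULE = re.compile(r"<<(.*?)>>", re.DOTALL)


def template_to_regex(template):
    """Translate a license template into a compiled regular expression."""
    pieces, pos = [], 0
    for rule in RULE.finditer(template):
        # Literal text before the rule: escape each word separately.
        pieces.extend(re.escape(word) for word in template[pos:rule.start()].split())
        body = rule.group(1)
        if body == "beginOptional":
            pieces.append("(?:")
        elif body == "endOptional":
            pieces.append(")?")
        elif body.startswith("var"):
            match = re.search(r'match="(.*?)"(?:;|$)', body, re.DOTALL)
            pieces.append(f"(?:{match.group(1) if match else '.+'})")
        else:
            raise ValueError(f"unsupported rule: {body!r}")
        pos = rule.end()
    pieces.extend(re.escape(word) for word in template[pos:].split())
    # Loose on purpose: any amount of whitespace (including none) between tokens.
    return re.compile(r"\s*".join(pieces), re.IGNORECASE)


def matches_template(text, template):
    """Return True if the whole text matches the template."""
    return template_to_regex(template).fullmatch(text.strip()) is not None
```

Compiling one regular expression per template and reusing it across texts is one way to approach the performance concern, although backtracking on long texts can still be costly.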

Skills Needed

  • Development skills in Python
  • Skills in parsing and pattern matching
  • Ability to work with the community in integrating results with other projects

Available Mentors

Rohit Lodha Gary O'Neall