THE SPDX WIKI IS NO LONGER ACTIVE. ALL CONTENT HAS BEEN MOVED TO https://github.com/spdx

Difference between revisions of "GSOC/GSOC ProjectIdeas"

From SPDX Wiki
(SPDX Specification in PDF and HTML)
(Available Mentors)
(36 intermediate revisions by 4 users not shown)
Line 1: Line 1:
 
<br />
  
<span style="font-size:150%">'''Welcome to the 2019 SPDX Google Summer of Code Project Page'''</span>
+
<span style="font-size:150%">'''Welcome to the 2021 SPDX Google Summer of Code Project Page'''</span>
  
 
See the [https://rtgdk.github.io/spdx-gsoc-proposal.html proposal template] if you are interested in submitting a Google Summer of Code proposal.
 
Line 20: Line 20:
  
 
* [https://spdx.org/using-spdx License List and Short Identifiers]
 
* [https://spdx.org/using-spdx SPDX Specification for generating SPDX Documents in either RDF or Tag/Value format]
+
* [https://spdx.org/using-spdx SPDX Specification for generating SPDX Documents in multiple formats]
 
* [https://spdx.org/tools A set of basic tools for working with SPDX Documents]
 
 
* [https://spdx.org/using-spdx License Identifiers in source]
 
Line 32: Line 32:
 
= Getting Involved =
 
  
Beyond working with your mentor(s) we highly encourage students who select one of these projects to get involved with the SPDX community via our technical working group. Interaction with the technical team is primarily done via its mailing list (see resources). There is however a weekly call you could join as well. All of the daily work for the Tech team is done on this wiki.
+
Beyond working with your mentor(s), we highly encourage students who select one of these projects to get involved with the SPDX community via our technical working group. Interaction with the technical team is primarily done via its mailing list and on Gitter (see resources). There is, however, also a weekly call you could join.
 
+
  
 
== Resources ==
 
Line 42: Line 41:
 
* [https://lists.spdx.org/mailman/listinfo/spdx-tech SPDX tech mailing list]
 
  
 +
=Proposed 2021 Projects=
  
=SPDX Workgroup Tooling Projects=
+
Mentors:  please fill out the following template for any projects you wish to propose.
  
These projects are aimed at contributing to the SPDX tools to help reduce the effort to create SPDX and increase the accuracy of the SPDX documents.
+
=== Project Name ===
==Registry for License List Namespaces==
+
add overview of project here
Build automation for a GitHub repository to support registration of license namespaces to support the following workflow:
+
====Skills Needed====
* User submits a request for a new license namespace
+
what skills should the student have to do the coding exercises
** The namespace can be a dns-style request or a free-format namespace (e.g. .org.spdx.ad-hoc-licenses or this-is-my-licenses)
+
====Background Information====
* User includes optional information for a URL with additional information for the namespace
+
context for the project and references to be studied
* User includes optional information on the submitter name and email
+
====Available Mentors====
* User agrees that the information will be publicly shared per the terms of the Linux Foundation privacy policy
+
list individuals who are willing to mentor and provide information about the project proposal.
* A check is made that the namespace is not already in use
+
* A pull request is created for the new namespace
+
* A committer to the namespace repository accepts the pull request
+
* When accepted, the namespace is published to a known website
+
* REST based API's are available to query the namespace repository
+
  
The automation can be built as a function of the SPDX online tools or as an independent tool or as extensions to GitHub.
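The validation and duplicate-check steps of the namespace workflow could look roughly like the Python sketch below. It is only an illustration under stated assumptions: the regular expressions for "dns-style" and "free-format" namespaces, the case-insensitive uniqueness rule, and the `NamespaceRegistry` class are all hypothetical, and the in-memory dictionary stands in for the GitHub repository and pull request steps.

```python
import re

# Hypothetical validation rules, assumed for illustration only:
# a DNS-style namespace is two or more dot-separated labels with an optional
# leading dot (".org.spdx.ad-hoc-licenses"); a free-format namespace is a single
# label of letters, digits and hyphens ("this-is-my-licenses").
_LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
DNS_STYLE = re.compile(rf"\.?{_LABEL}(?:\.{_LABEL})+")
FREE_FORMAT = re.compile(_LABEL)


class NamespaceRegistry:
    """In-memory stand-in for the GitHub-backed namespace repository."""

    def __init__(self):
        self._entries = {}

    def submit(self, namespace, url=None, submitter=None, email=None,
               accepted_privacy_policy=False):
        """Validate a request and record it; the real workflow would open a PR."""
        if not accepted_privacy_policy:
            raise ValueError("the submitter must accept the privacy policy")
        if not (DNS_STYLE.fullmatch(namespace) or FREE_FORMAT.fullmatch(namespace)):
            raise ValueError(f"invalid namespace: {namespace!r}")
        key = namespace.lower()  # assumption: namespaces are case-insensitive
        if key in self._entries:
            raise ValueError(f"namespace already in use: {namespace!r}")
        entry = {"namespace": namespace, "url": url,
                 "submitter": submitter, "email": email}
        self._entries[key] = entry
        return entry

    def lookup(self, namespace):
        """What a REST query endpoint might return; None if not registered."""
        return self._entries.get(namespace.lower())
```

In a real implementation the `submit` step would create the pull request and `lookup` would read from the published repository, but the checks themselves stay the same.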
+
(The projects from 2019 can be found on the [https://summerofcode.withgoogle.com/organizations/4532099550281728/#5727887162867712 2019 Google Summer of Code projects page for SPDX] ).
 +
 
 +
==SPDX Workgroup Tooling Projects==
 +
These projects are aimed at contributing to the SPDX tools to help reduce the effort to create SPDX documents and increase the accuracy of them.
 +
 
 +
=== Migrate SPDX Online Tools to Django 3 ===
 +
Migrate the SPDX online tools to later versions of Django and upgrade dependencies (such as Django Social Auth) to later versions to support better and more secure authentication to GitHub. In addition to migrating to Django 3, additional issues can be taken on to create a full GSoC project (see the list below).
  
 
====Skills Needed====
 
* Development skills in the Python or JavaScript language
+
* Experience with Python3 programming
* Understanding of GitHub API's
+
* Familiar with Django framework
* Ability to work with the user community in refining requirements
+
* UI development
+
* REST API development
+
  
 
====Background Information====
 
SPDX provides a license list for commonly used open source licenses - the [https://spdx.org/licenses SPDX License List].  SPDX also supports defining licenses within the SPDX document using a LicenseRef syntax defined in [https://spdx.org/spdx-specification-21-web-version#h.1v1yuxt section 6 of the SPDX specification]. In the next release of SPDX, we plan to introduce a mechanism for other organizations or individuals to maintain lists of licenses outside of the SPDX license list, but allow those licenses to be valid without requiring the text to be in the SPDX document itself.  This enhancement has been documented in the [https://github.com/spdx/spdx-spec/issues SPDX specification issues list]. This project automates the registration and management of the namespaces.
+
The SPDX Online Tools are currently being migrated to Python 3. Several libraries, including Django, can now be upgraded to more supportable versions. There are some known issues with the current version of Django Social Auth which would be resolved by upgrading. There may be additional libraries which can be upgraded. There is also an opportunity to improve the structure and unit tests for the online tools if time allows. See the [https://github.com/spdx/spdx-online-tools/tree/rtgdk_python3 Python3 Branch] of the SPDX online tools for the current state of the Python 3 migration.
 +
 
 +
Additional tasks and issues can be included in a project (in priority order):
 +
* Migrate project to Python 3 and Django 3 [https://github.com/spdx/spdx-online-tools/issues/58 issue 58] and [https://github.com/spdx/spdx-online-tools/issues/24 issue 24]
 +
* [https://github.com/spdx/spdx-online-tools/issues/287 Merge app and api code]
 +
* Fix and improve tests [https://github.com/spdx/spdx-online-tools/issues/201 issue 201], [https://github.com/spdx/spdx-online-tools/issues/282 issue 282], [https://github.com/spdx/spdx-online-tools/issues/202 issue 202] + others
 +
* [https://github.com/spdx/spdx-online-tools/issues/299 Add a submit via mail functionality to license submittal]
 +
* [https://github.com/spdx/spdx-online-tools/issues/218 Store license diff screenshot in database instead of uploading to github]
 +
* [https://github.com/spdx/spdx-online-tools/issues/157 Improve error message when error received from the Java code]
 +
* [https://github.com/spdx/spdx-online-tools/issues/186 Add a License Diff section to the SPDX Online Tool]
 +
* [https://github.com/spdx/spdx-online-tools/issues/91 API documentation tool]
 +
* API for license namespace
 +
* [https://github.com/spdx/spdx-online-tools/issues/204 Move to Github Apps and improvement for production]
  
 
====Available Mentors====
 
[mailto:gary@sourceauditor.com Gary O'Neall]
+
[mailto:rohit.lodhartg@gmail.com Rohit Lodha] [mailto:anshuldutt21@gmail.com Anshul Dutt Sharma] [mailto:gary@sourceauditor.com Gary O'Neall]
  
==Develop a Distributed License Repository Application==
+
=== Migrate Python Tools to Python 3 ===
Develop an application which accepts license text and adds it to a repository of licenses stored in a GitHub repository.  The interface the application would be a REST API.  There would be optional storage of a license in a GitHub repository.  The application would support the following use cases:
+
Migrate the [https://github.com/spdx/tools-python SPDX Python Tools] to Python 3, including all unit testing. In addition, other known issues raised on GitHub may be tackled.
* See if a license text has already been registered or if license text is already on the [https://spdx.org/licenses SPDX License List].  If so, the ID would be returned.
+
* Add externally referenced SPDX documents containing license definitions.  When added, each license would be checked to see if it already exists.  A list of license ID aliases would be maintained for all licenses which point to the same license text.
+
* Add a license text to a known git repository.  Input would be license text, license name, proposed license ID, optional license namespace, optional comment, optional creator.  This would be stored as an SPDX document which defines the license.  There would be one document per license.  When added, the license would be checked to see if it already exists.  A list of license ID aliases would be maintained for all licenses which point to the same license text.
+
* Promote a license to the license list - this would call the REST API's for the online tool to add a license to the SPDX license list.
+
* Remove a license reference.  This would also update the license aliases.
+
* [Optional] Provide metrics on use for licenses to help the SPDX legal team propose licenses which should be on the SPDX license list.
+
  
 
====Skills Needed====
 
* Development skills in the Python, Java or JavaScript language
+
* Experience with Python3 programming
* Understanding of Github API's
+
* Knowledge of parsing algorithms
* Ability to work with the user community in refining requirements
+
* REST API development
+
 
+
====Available Mentors====
+
[mailto:gary@sourceauditor.com Gary O'Neall]
+
  
 
====Background Information====
 
SPDX provides a license list for commonly used open source licenses - the [https://spdx.org/licenses SPDX License List].  SPDX also supports defining licenses within the SPDX document using a LicenseRef syntax defined in [https://spdx.org/spdx-specification-21-web-version#h.1v1yuxt section 6 of the SPDX specification]. In the next release of SPDX, we plan to introduce a mechanism for other organizations or individuals to maintain lists of licenses outside of the SPDX license list, but allow those licenses to be valid without requiring the text to be in the SPDX document itself. This proposal is to provide a method of comparing, tracking and storing licenses which are not currently proposed to be on the SPDX license list.  It would likely be used by a front end UI and by other external tools.
+
The Python tools are command-line tools and a library that implement reading and writing of SPDX files in different formats, as well as converting and validating SPDX files.
 +
The current implementation uses Python 2, which is no longer supported.
 +
In addition to migration, some additional tasks may be taken on to improve the supportability of the library. In particular, the code could be restructured to separate out the different serialization formats (see [https://github.com/spdx/tools-python/issues/147 issue 147]).
 +
 
 +
====Available Mentors====
 +
[mailto:santiago@nyu.edu Santiago Torres Arias] [mailto:alexios.zavras@intel.com Alexios Zavras] [mailto:anshuldutt21@gmail.com Anshul Dutt Sharma]
  
==Enhanced Workflow for Online License Request==
+
=== RDF Writer for Golang ===
Update the SPDX Online Tools license submit feature to support the following workflow:
+
The gordf library supports writing RDF triples to an RDF file. This project would create an interface that takes an SPDX document and generates RDF triples from it, which gordf would then consume to generate an RDF/XML file.
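The "SPDX document to triples" step can be illustrated with a short sketch. It is written in Python only for consistency with the other examples on this page; the actual project would be in Go and hand the triples to gordf. Only four document-level fields are mapped, the input dict shape is an assumption, and the output is N-Triples rather than RDF/XML to keep the serializer small. The `spdx:` terms and the `#SPDXRef-DOCUMENT` subject URI follow the SPDX 2.x RDF conventions.

```python
SPDX = "http://spdx.org/rdf/terms#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def document_triples(doc):
    """Map a few document-level fields to (subject, predicate, object) triples.

    Objects are tagged ("uri", value) or ("literal", value)."""
    subject = doc["documentNamespace"] + "#SPDXRef-DOCUMENT"
    return [
        (subject, RDF_TYPE, ("uri", SPDX + "SpdxDocument")),
        (subject, SPDX + "specVersion", ("literal", doc["spdxVersion"])),
        (subject, SPDX + "name", ("literal", doc["name"])),
        (subject, SPDX + "dataLicense",
         ("uri", "http://spdx.org/licenses/" + doc["dataLicense"])),
    ]


def to_ntriples(triples):
    """Serialize triples as N-Triples, one statement per line."""
    lines = []
    for subject, predicate, (kind, value) in triples:
        if kind == "uri":
            obj = f"<{value}>"
        else:
            # Escape backslashes first, then quotes and line breaks.
            escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
                       .replace("\n", "\\n").replace("\r", "\\r"))
            obj = f'"{escaped}"'
        lines.append(f"<{subject}> <{predicate}> {obj} .")
    return "\n".join(lines)
```

Keeping triple generation separate from serialization mirrors the split described above: the SPDX-specific mapping lives in one place, and the writer (gordf in the Go version) only needs generic triples.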
* License submit can be initiated directly from the UI or through an external application (e.g. the [https://github.com/spdx/spdx-license-diff SPDX License Diff browser plugin]) using a documented API
+
* License text is compared to the currently approved license list
+
** If matched, the SPDX ID is returned and the user is informed that the license already exists
+
* License is compared to all submitted yet not approved licenses
+
** If matched, the user is informed the license is already submitted and is provided a link to the [https://github.com/spdx/license-list-XML/issues License List XML issue]
+
* License is compared to all submitted and rejected licenses
+
** If a match is found, the user is provided a link to the [https://github.com/spdx/license-list-XML/issues License List XML issue]
+
* License is compared to the existing license list using an algorithm which finds close matches (SPDX License Diff is an example)
+
** If an existing license is close, a diff view will show the word differences
+
** The user is presented with a choice of adding an issue for the nearly matching license stating that the license should match
+
*** If the user chooses to add the issue, the license text will be added to the issue requesting a change to the license XML to allow the match
+
*** We could also implement suggested XML markup (e.g. alt or optional text) to make the licenses match - NOTE: This may be a technically challenging feature to implement
+
* If the user wants to submit a new license request, the information is captured and processed by the SPDX legal team through [https://github.com/spdx/license-list-XML GitHub]
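The order of checks in this workflow (approved, then pending, then rejected, then near matches) can be sketched as below. This is a simplified illustration, not the SPDX matching algorithm: `normalize` only lowercases and collapses whitespace (the SPDX matching guidelines are far richer), the 0.9 similarity threshold is an arbitrary assumption, and Python's `difflib.SequenceMatcher` over word lists stands in for a real near-match algorithm such as the one in SPDX License Diff.

```python
import difflib
import re


def normalize(text):
    """Assumed, simplified normalization: lowercase and collapse whitespace.
    The SPDX matching guidelines define a much richer set of rules."""
    return re.sub(r"\s+", " ", text).strip().lower()


def classify_submission(text, approved, pending, rejected, threshold=0.9):
    """Each corpus maps a key (SPDX ID or issue URL) to license text.

    Returns (status, key, similarity)."""
    submitted = normalize(text)
    # Exact matches against approved, pending and rejected licenses, in order.
    for status, corpus in (("approved", approved), ("pending", pending),
                           ("rejected", rejected)):
        for key, candidate in corpus.items():
            if normalize(candidate) == submitted:
                return status, key, 1.0
    # Near-match search against the approved list, compared word by word.
    best_key, best_ratio = None, 0.0
    for key, candidate in approved.items():
        ratio = difflib.SequenceMatcher(
            None, normalize(candidate).split(), submitted.split()).ratio()
        if ratio > best_ratio:
            best_key, best_ratio = key, ratio
    if best_ratio >= threshold:
        return "near-match", best_key, best_ratio
    return "new", None, best_ratio
```

A "near-match" result is where the UI would show the word-level diff and offer to open an issue against the nearly matching license.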
+
  
 
====Skills Needed====
 
* Development skills in the Python language
+
* Knowledge of RDF
* Experience with parser development
+
* Skills in XML parsing
* Experience proposing spec changes
+
* Knowledge and experience in Golang
* Understanding of Github API's
+
* Experience in XML parsing
+
  
 
====Background Information====
 
The SPDX legal team uses an online request process for new license requests. This feature was implemented by a GSoC student in 2018.  Extending the functionality to check for duplicate requests and checking for near matches would greatly improve the efficiency of the license request and approval process.
+
RDF/XML is one of the supported formats for SPDX documents.  Creating an RDFWriter would create a generally useful facility for Golang and provide a more modular structure for the SPDX Golang tools.
  
This project would require significant interaction with the users of the tool (the SPDX legal team) and would have some interesting technical challenges in storing and matching text. The optional feature of suggesting XML markup for near matches could involve sophisticated matching techniques to find the appropriate text to include as optional or alternate.
+
See the [https://github.com/spdx/tools-golang SPDX Golang tools repo] and the [https://github.com/spdx/gordf gordf library] for more details on the current implementations.
  
The current SPDX license list request process is documented in the [https://github.com/spdx/license-list-XML/blob/master/CONTRIBUTING.md License List XML contributing page].
+
====Available Mentors====
 +
[mailto:bhatnagarrishabh4@gmail.com Rishabh Bhatnagar]
  
 +
=== YAML Support for Golang libraries ===
 +
YAML is one of the supported formats for SPDX.  This project is to add support for reading and writing YAML to the Golang libraries.
  
====Available Mentors====
+
====Skills Needed====
[mailto:gary@sourceauditor.com Gary O'Neall]
+
* Knowledge of YAML
 +
* Skills in YAML parsing
 +
* Knowledge and experience in Golang
  
==Update Parser Libraries for Golang ==
 
A new Golang library has recently been added to the SPDX tools, at [https://github.com/spdx/tools-golang]. This tool updated the SPDX Golang libraries to the SPDX 2.1 specification. This tool has several opportunities for improvement and adding features.
 
====Skills Needed====
 
* Development skills in the Golang language
 
* Experience with parser development
 
* Understanding of RDF and XML
 
 
====Background Information====
 
SPDX currently provides libraries supporting the reading and writing of SPDX documents. A recent new tool has been added for parsing, generating and working with SPDX documents in Golang [https://github.com/spdx/tools-golang]. Opportunities for improving and adding features to this tool include the following:
+
See the [https://github.com/spdx/tools-golang SPDX Golang tools repo] and the [https://github.com/spdx/gordf gordf library] for more details on the current implementations.
* adding support for the official RDF format
+
* experimenting with support for other formats, such as JSON, YAML and XML
+
* enabling support for parsing and generation of documents under pre-2.1 versions of the SPDX spec
+
  
 
====Available Mentors====
 
[mailto:swinslow@linuxfoundation.org Steve Winslow]
+
[mailto:bhatnagarrishabh4@gmail.com Rishabh Bhatnagar] [mailto:swinslow@linuxfoundation.org Steve Winslow]
[mailto:gary@sourceauditor.com Gary O'Neall]
+
  
==Additional Format Support for the Python Libraries==
+
=== JSON Support for Golang libraries ===
Add the ability to read and write XML, JSON, and YAML formats of the SPDX documents.
+
JSON is one of the supported formats for SPDX.  This project is to add support for reading and writing JSON to the Golang libraries.
  
 
====Skills Needed====
 
* Development skills in the Python language
+
* Knowledge of JSON
* Experience with parser development
+
* Skills in JSON parsing
* Understanding of XML, JSON and YAML
+
* Knowledge and experience in Golang
  
 
====Background Information====
 
SPDX 2.1 specification supports reading and writing RDF/XML and a tag/value format for SPDX documents. Version 2.2 of the specification will add support for XML, JSON and YAML.  The Python libraries currently support reading and writing the RDF/XML and tag/value.  This project would extend the parsing and file generation capabilities of the python libraries to include XML, JSON and YAML format. 
+
See the [https://github.com/spdx/tools-golang SPDX Golang tools repo] and the [https://github.com/spdx/gordf gordf library] for more details on the current implementations.
 
+
The current python libraries are in the [[https://github.com/spdx/tools-python SPDX python tools git repository]]
+
  
 
====Available Mentors====
 
[mailto:tetechris20@gmail.com Krys Nuvadga], [mailto:gary@sourceauditor.com Gary O'Neall]
+
[mailto:bhatnagarrishabh4@gmail.com Rishabh Bhatnagar] [mailto:swinslow@linuxfoundation.org Steve Winslow]
 +
 
 +
==SPDX Specification Projects==
 +
The following projects contribute directly to the creation or validation of the SPDX specification.
 +
 
 +
=== Generate a JSON Representation of the Specification from Structured Markdown ===
 +
Convert a consistently structured Markdown file into a JSON structure following a well defined schema.  Changes to an existing Markdown file should update the JSON files.  The Markdown will have a well defined structure to allow for translation of the text in Markdown to the properties of the JSON file.  The conversion will also validate that the Markdown follows the required specification.  The conversion would be run as part of a GitHub Action for the SPDX specification.
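Because the Markdown template and JSON schema are still work in progress, the following Python sketch assumes a hypothetical structure: each property is a `##` or `###` heading with a section number, followed by `**Key:** value` lines and free-text description. It shows the two jobs the project combines: translating structured Markdown into JSON, and validating that the Markdown follows the required structure (here, the assumed rule that every property states a cardinality).

```python
import json
import re

# Hypothetical structure, assumed for illustration (the real template is still
# being defined): "## 7.1 Package name" headings, then "**Key:** value" lines
# and free-text description lines.
HEADING = re.compile(r"#{2,3}\s+(?P<number>\d+(?:\.\d+)*)\s+(?P<title>.+?)\s*")
FIELD = re.compile(r"\*\*(?P<key>[^*]+):\*\*\s*(?P<value>.+?)\s*")
REQUIRED_FIELDS = ("cardinality",)  # assumed validation rule


def markdown_to_json(markdown):
    """Return (json_text, errors) for a structured Markdown specification."""
    properties, current = [], None
    for line in markdown.splitlines():
        heading = HEADING.fullmatch(line)
        if heading:
            current = {"number": heading["number"], "title": heading["title"],
                       "fields": {}, "description": []}
            properties.append(current)
        elif current is not None and line.strip():
            field = FIELD.fullmatch(line)
            if field:
                current["fields"][field["key"].strip().lower()] = field["value"]
            else:
                current["description"].append(line.strip())
    errors = []
    for prop in properties:
        prop["description"] = " ".join(prop["description"])
        for required in REQUIRED_FIELDS:
            if required not in prop["fields"]:
                errors.append(f"{prop['number']} {prop['title']}: missing {required}")
    return json.dumps({"properties": properties}, indent=2), errors
```

In a GitHub Action, a non-empty `errors` list would fail the check, while the JSON output would be committed or published alongside the Markdown.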
  
== Port SPDX license expression library to Ruby, JavaScript and Java==
 
The [https://github.com/nexB/license-expression/ license-expression library] provides comprehensive support for license expressions using a boolean engine for Python.
 
The goal of this project is to port and/or package this library for JavaScript, Ruby and Java, considering either code conversion tools, alternative Python implementations (e.g. Jython) or calling Python from another language to bring the same features to these other languages.
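To give a sense of what any port has to reproduce, here is a minimal, self-contained Python sketch of an SPDX license expression parser with a small boolean check. It is not the license-expression library's API or implementation; it only handles `AND`, `OR`, `WITH` and parentheses, uses the SPDX precedence (`WITH` binds tighter than `AND`, which binds tighter than `OR`), and accepts uppercase operators only. The `is_satisfied` helper, which asks whether an expression can be met using only a given set of licenses, is a hypothetical example of what a boolean engine makes possible.

```python
import re

TOKEN = re.compile(r"\s*(\(|\)|[A-Za-z0-9.+:-]+)")
OPERATORS = {"AND", "OR", "WITH", "(", ")"}


def tokenize(expression):
    tokens, pos, expression = [], 0, expression.rstrip()
    while pos < len(expression):
        match = TOKEN.match(expression, pos)
        if not match:
            raise ValueError(f"unexpected text at {pos}: {expression[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    """Recursive descent: or := and (OR and)*, and := with (AND with)*."""

    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        token = self.peek()
        if token is None or (expected and token != expected):
            raise ValueError(f"expected {expected or 'a token'}, got {token!r}")
        self.pos += 1
        return token

    def identifier(self):
        token = self.take()
        if token in OPERATORS:
            raise ValueError(f"expected a license ID, got {token!r}")
        return token

    def parse(self):
        tree = self.parse_or()
        if self.peek() is not None:
            raise ValueError(f"unexpected token {self.peek()!r}")
        return tree

    def parse_or(self):
        node = self.parse_and()
        while self.peek() == "OR":
            self.take()
            node = ("OR", node, self.parse_and())
        return node

    def parse_and(self):
        node = self.parse_with()
        while self.peek() == "AND":
            self.take()
            node = ("AND", node, self.parse_with())
        return node

    def parse_with(self):
        if self.peek() == "(":
            self.take("(")
            node = self.parse_or()
            self.take(")")
            return node
        node = self.identifier()
        if self.peek() == "WITH":
            self.take()
            node = ("WITH", node, self.identifier())
        return node


def is_satisfied(tree, accepted):
    """True if the expression can be met using only licenses in `accepted`."""
    if isinstance(tree, str):
        return tree in accepted
    op, left, right = tree
    if op == "WITH":
        return f"{left} WITH {right}" in accepted
    if op == "AND":
        return is_satisfied(left, accepted) and is_satisfied(right, accepted)
    return is_satisfied(left, accepted) or is_satisfied(right, accepted)
```

Because the grammar is small, porting a parser like this by hand to Java, JavaScript or Ruby is straightforward; the harder part of the library to port is its richer boolean simplification and license-key handling.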
 
 
====Skills Needed====
 
* Development skills in Python, Java, Ruby, JavaScript.
+
* Skills in writing parsing algorithms (e.g. working with [https://en.wikipedia.org/wiki/Abstract_syntax_tree Abstract Syntax Tree])
 +
* Knowledge and experience in the programming language chosen for the project (e.g. Java, JavaScript, Python)
 +
* Knowledge of Markdown and JSON syntax
  
 
====Background Information====
 
See https://github.com/spdx/tools-python/issues/10 and https://github.com/nexB/license-expression/
+
The SPDX tech team works very collaboratively on the specification updates using markdown pages in GitHub as the primary documentation for the specification. RDF/OWL is used as the primary technical specification for the object model including relationships, cardinality, class structure, and other restrictions.  There is a lot of overlap between the information in the Markdown and the information in the OWL document.  To improve the quality and productivity of the specification work, the SPDX technical team has decided to add tooling for verification of the Markdown and conversion of any common information to the OWL document.  The conversion will be in 2 stages:
====Available Mentors====
+
* Convert from Markdown to an intermediate JSON format
[mailto:pombredanne@nexb.com Philippe Ombredanne]
+
* Convert from the JSON format to RDF/OWL
  
=SPDX Specification Projects=
+
The specific schema for the JSON format is under development and planned to be available before the start of GSoC.
The following projects contribute directly to the creation or validation of the SPDX 2.1 specification.
+
  
== SPDX Specification in PDF and HTML ==
+
Below are some additional resources for this project:
 +
* [https://github.github.com/gfm/ GitHub flavored Markdown]
 +
* [https://json-schema.org/ JSON Schema] information site and its [https://tools.ietf.org/html/draft-bhutton-json-schema-00 current draft]
 +
* [https://docs.google.com/document/d/13PojpaFPdoKZ9Gyh_DEY-Rp7lldyMbSiGE3vCRQhR9M/edit# Results of process discussion]
 +
* [https://docs.google.com/document/d/1EGoAmKxPfmmlF3XV6fXwNmsCiFKLH83Bhh8_xrmGhko Analysis of RDF/OWL fields relative to Markdown] (work in progress)
 +
* [https://docs.google.com/document/d/1LN5CepVVOu38w4pXeLpw_3BDNn2CKAWUyTVdlh8C2WM/edit?usp=sharing Template for Spec Markdown] (work in progress)
 +
* [https://docs.google.com/document/d/1J6P3q6wcP0c1xquTIfMfIBJCa8S1xtkhDs6dD1hSf4Y/edit?usp=sharing Example spec JSON file] (work in progress)
 +
* [https://docs.google.com/document/d/1OF_EqfU4tLPF-oGheEZ1aCzsnGIJHyRptzQ_U7AA-wY/edit?usp=sharing Draft JSON schema] (work in progress)
  
We need to generate both HTML and PDF versions of the SPDX Specification from [https://www.markdownguide.org/ Markdown (MD)]. The default is English but going forward we envision other language translations as well so they would need to be accounted for in the overall structure and approach.
+
====Available Mentors====
 
+
[mailto:gary@sourceauditor.com Gary O'Neall] [mailto:alexios.zavras@intel.com Alexios Zavras]
What we need going forward is:
+
 
+
* The ability to generate both an HTML and PDF document for every released version of the specification. This would be a final copy that should not be re-rendered.
+
* The ability to display "real time" draft versions of the specification in HTML. This means changes to the specification via commits to GIT for that version should be shown. More than one draft version should be able to be displayed in this fashion; for example, we could be rendering draft version 3.1 and draft version 4.0. We do not need this in PDF.
+
* For the PDF version we will need a cover page, page numbers, TOC, header and footer. Internal document references should work (see the background). The ideal solution will be to come up with something that works for both the HTML and PDF versions unchanged, but some automated post processing for the PDF is okay and changing how the links are done in the MD is viable as well.
+
* The default is English but going forward we envision other language translations as well for the HTML and PDF so they would need to be accounted for in the overall structure and approach.
+
* The specifications are currently released on branches today. The structure of the GIT may need some changes to accommodate language versions and the PDF.
+
  
Other approaches than what we are using in the background can be suggested and used if they meet the requirements above, particularly for internal document links.
+
=== Generate RDF/OWL from JSON specification format ===
 +
Convert a set of JSON files into a Web Ontology Language (OWL) XML document.  The JSON files will map to the elements and attributes of the RDF/OWL XML file.  The JSON schema will be defined prior to the project start and will be consistent with the "Generate a JSON Representation of the Specification from Structured Markdown" project described above.
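Since the JSON schema is not yet defined, the Python sketch below assumes a hypothetical input shape (`classes` with `name`, `description` and optional `superClass`, plus `properties` with `name`, `domain` and `description`) and emits `owl:Class` and `owl:DatatypeProperty` elements with the standard library's `xml.etree.ElementTree`. Treating every property as a datatype property is a simplification; a real converter would choose between object and datatype properties and add cardinality restrictions.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
SPDX = "http://spdx.org/rdf/terms#"

# Use the conventional rdf/rdfs/owl prefixes when serializing.
for prefix, uri in (("rdf", RDF), ("rdfs", RDFS), ("owl", OWL)):
    ET.register_namespace(prefix, uri)


def spec_to_owl(spec):
    """Build an RDF/OWL XML string from a dict shaped like (hypothetically)
    {"classes": [{"name", "description", "superClass"?}],
     "properties": [{"name", "domain", "description"}]}."""
    root = ET.Element(f"{{{RDF}}}RDF")
    for cls in spec.get("classes", []):
        element = ET.SubElement(root, f"{{{OWL}}}Class",
                                {f"{{{RDF}}}about": SPDX + cls["name"]})
        ET.SubElement(element, f"{{{RDFS}}}comment").text = cls.get("description", "")
        if "superClass" in cls:
            ET.SubElement(element, f"{{{RDFS}}}subClassOf",
                          {f"{{{RDF}}}resource": SPDX + cls["superClass"]})
    for prop in spec.get("properties", []):
        # Simplification: every property is emitted as a datatype property.
        element = ET.SubElement(root, f"{{{OWL}}}DatatypeProperty",
                                {f"{{{RDF}}}about": SPDX + prop["name"]})
        ET.SubElement(element, f"{{{RDFS}}}domain",
                      {f"{{{RDF}}}resource": SPDX + prop["domain"]})
        ET.SubElement(element, f"{{{RDFS}}}comment").text = prop.get("description", "")
    return ET.tostring(root, encoding="unicode")
```

Comparing the generated output against the current RDF/OWL document for the SPDX spec would be a natural regression test for such a converter.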
  
 
====Skills Needed====
 
* Understanding of documentation tooling: Markdown, HTML, mkdocs, etc.
+
* Knowledge and experience in the programming language chosen for the project (e.g. Java, JavaScript, Python)
* Familiarity with GIT and github API's
+
* Knowledge of RDF/OWL/XML formats
 +
* Knowledge of JSON parsers
  
 
====Background Information====
 
The [https://spdx.org/specifications 2.1 SPDX specification] has been moved to Markdown at https://github.com/spdx/spdx-spec
+
The SPDX tech team works very collaboratively on the specification updates using markdown pages in GitHub as the primary documentation for the specification. RDF/OWL is used as the primary technical specification for the object model including relationships, cardinality, class structure, and other restrictions. There is a lot of overlap between the information in the Markdown and the information in the OWL document.  To improve the quality and productivity of the specification work, the SPDX technical team has decided to add tooling for verification of the Markdown and conversion of any common information to the OWL document.  The conversion will be in 2 stages:
and now generates an HTML version at: https://spdx.github.io/spdx-spec
+
* Convert from Markdown to an intermediate JSON format
 +
* Convert from the JSON format to RDF/OWL
  
If you look at the SPDX GitHub today it shows the 2.1 SPDX specification nicely as HTML. This is done using a Travis VM, GitHub Pages and markdown. Mkdocs is used to generate the HTML pages. You can look at the scripts in the GIT. What we need going forward is a way to generate both a PDF version and HTML and  for each draft and release version of the specification.
+
The specific schema for the JSON format is under development and planned to be available before the start of GSoC.
  
For the 2.1.1 specification we are currently using [https://pandoc.org/ pandoc] and [https://wkhtmltopdf.org/ wkhtmltopdf] to generate a PDF specification. This is currently being done offline. In doing this we need to combine the MD documents, as each chapter in the specification is a separate MD document. That would work except that there are internal links that go between the MD documents; e.g. xxxxxx/chapter4.md. These links are broken on conversion to the PDF, so a solution for both HTML and PDF needs to be found.
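One possible pre-processing step before handing the combined Markdown to pandoc is to rewrite cross-chapter links into in-document anchors. The Python sketch below is one hypothetical way to do it, not the approach the SPDX team uses: it turns `chapter4.md#anchor` into `#anchor` and a bare `chapter4.md` into `#chapter4`, inserting a raw HTML `<a id>` at the start of each chapter as the target. It assumes heading anchors are unique across all chapters and match the IDs the renderer generates, and it leaves absolute `http(s)` links untouched.

```python
import re

# Relative links to another chapter file, e.g. "(chapter4.md)" or
# "(../chapters/chapter4.md#package-name)"; absolute http(s) links are skipped.
CROSS_CHAPTER_LINK = re.compile(
    r"\]\((?!https?://)(?P<path>[^)\s#]+)\.md(?:#(?P<anchor>[^)\s]+))?\)")


def _rewrite(match):
    if match["anchor"]:
        # Assumes heading anchors are unique across the combined document.
        return f"](#{match['anchor']})"
    chapter_id = match["path"].rsplit("/", 1)[-1]
    return f"](#{chapter_id})"


def combine_chapters(chapters):
    """chapters: ordered (filename, markdown) pairs. Returns a single Markdown
    document in which cross-chapter links point at in-document anchors."""
    parts = []
    for filename, text in chapters:
        name = filename.rsplit("/", 1)[-1]
        chapter_id = name[:-3] if name.endswith(".md") else name
        # Raw HTML anchor so that links to a whole chapter have a target.
        parts.append(f'<a id="{chapter_id}"></a>\n\n'
                     + CROSS_CHAPTER_LINK.sub(_rewrite, text))
    return "\n\n".join(parts)
```

The per-chapter files stay unchanged for the mkdocs HTML build; only the combined copy used for the PDF is rewritten.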
+
Below are some additional resources for this project:
 
+
* [https://www.w3.org/TR/2004/NOTE-owl-parsing-20040121/ RDF OWL parsing notes from the W3C]
A potential approach other than pandoc and wkhtmltopdf to consider for the PDF is at https://github.com/tombensve/MarkdownDoc.
+
* [https://json-schema.org/ JSON Schema] information site and its [https://tools.ietf.org/html/draft-bhutton-json-schema-00 current draft]
 +
* [https://docs.google.com/document/d/13PojpaFPdoKZ9Gyh_DEY-Rp7lldyMbSiGE3vCRQhR9M/edit# Results of process discussion]
 +
* [https://docs.google.com/document/d/1EGoAmKxPfmmlF3XV6fXwNmsCiFKLH83Bhh8_xrmGhko Analysis of RDF/OWL fields relative to Markdown] (work in progress)
 +
* [https://github.com/spdx/spdx-spec/blob/development/v2.2.1/ontology/spdx-ontology.owl.xml Current RDF/OWL document for SPDX spec]
 +
* [https://docs.google.com/document/d/1LN5CepVVOu38w4pXeLpw_3BDNn2CKAWUyTVdlh8C2WM/edit?usp=sharing Template for Spec Markdown] (work in progress)
 +
* [https://docs.google.com/document/d/1J6P3q6wcP0c1xquTIfMfIBJCa8S1xtkhDs6dD1hSf4Y/edit?usp=sharing Example spec JSON file] (work in progress)
 +
* [https://docs.google.com/document/d/1OF_EqfU4tLPF-oGheEZ1aCzsnGIJHyRptzQ_U7AA-wY/edit?usp=sharing Draft JSON schema] (work in progress)
  
 
====Available Mentors====
 
[mailto:j-manbeck2@ti.com Jack Manbeck]
+
[mailto:gary@sourceauditor.com Gary O'Neall] [mailto:alexios.zavras@intel.com Alexios Zavras]
[mailto:kstewart@linuxfoundation.org Kate Stewart]
+
[mailto:thomas.steenbergen@here.com Thomas Steenbergen]
+
  
== SPDX Specification Views for legal counsels and developers ==
+
=== SPDX Specification Views for legal counsels and developers ===
 
The proposal is to see if it is possible to reduce large SPDX documents to a smaller subset SPDX document that provides a specific, reduced "view" of the larger data.
 
 
====Skills Needed====
 
Line 220: Line 213:
 
[mailto:thomas.steenbergen@here.com Thomas Steenbergen]
 
  
== SPDX Document Generator for projects using SPDXIDs ==
+
=== ClearlyDefined exporting and importing SPDX documents  ===
As more projects start to use SPDXIDs at the file level it becomes much simpler to generate SPDX docs for them from a python script.   
+
The goal of this GSoC project would be to add support in the [https://github.com/clearlydefined ClearlyDefined project] for exporting curated data into SPDX 2.2 documents. Once that is accomplished, being able to import SPDX documents into the curated database would be the next step.
====Skills Needed ====
+
 
* Ability to program in python
+
====Skills Needed====
====Background Information ====
+
* Experience with JSON and YAML (XML a plus)
Forward-thinking open source projects are adopting SPDXIDs in source files (initially U-Boot, but now much wider use such as Zephyr, the Linux Kernel, etc.)
+
* Ability to interpret and implement the SPDX specification and related ClearlyDefined community documentation
With these easy to find "SPDX-License-Identifier:" strings,  generating an SPDX document for a project is a matter of iterating over
+
* Ability to work with the community in integrating results with other projects
the files in a project and extracting the information from these SPDXIDs and calculating checksums.  
+
* Willingness to learn about open source licensing and related technical matters
Creating an open source tool to do this will aid these projects in generating accurate SBOM information at release time.
+
 
This tool should be implemented as a command line, so it can be incorporated into builds, and options can be added.  
+
====Background Information====
Goal is that projects that use SPDX identifiers can automatically generate a SPDX document as a Software Bill of Materials
+
Export a ClearlyDefined workspace as an SPDX document:
(SBOM) on demand (build, release, etc.).
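The file-level part of the SPDX Document Generator idea (iterate over the files in a project, extract the `SPDX-License-Identifier:` strings, calculate checksums) can be sketched in Python as follows. This produces only the tag/value file entries, not a complete SPDX document (no creation info, package, or copyright fields), and the function names are hypothetical. As a simplification the whole identifier expression is written to `LicenseInfoInFile`; a full tool would split compound expressions into individual license IDs.

```python
import hashlib
import os
import re

# "SPDX-License-Identifier: <expression>" in any comment style; a trailing
# "*/" or "-->" comment terminator is dropped.
SPDX_ID = re.compile(
    r"SPDX-License-Identifier:[ \t]*(\S.*?)[ \t]*(?:\*/|-->)?[ \t]*\r?$",
    re.MULTILINE)


def scan_tree(root):
    """Walk `root` (skipping hidden directories) and collect per-file data."""
    files = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = sorted(d for d in dirnames if not d.startswith("."))
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            with open(path, "rb") as handle:
                data = handle.read()
            match = SPDX_ID.search(data.decode("utf-8", errors="replace"))
            files.append({
                "fileName": "./" + os.path.relpath(path, root).replace(os.sep, "/"),
                "checksumSHA1": hashlib.sha1(data).hexdigest(),
                "licenseInfoInFile": match.group(1) if match else "NOASSERTION",
            })
    return files


def to_tag_value(files):
    """Render the collected data as SPDX tag/value file entries."""
    lines = []
    for index, entry in enumerate(files, start=1):
        lines += [f"FileName: {entry['fileName']}",
                  f"SPDXID: SPDXRef-File-{index}",
                  f"FileChecksum: SHA1: {entry['checksumSHA1']}",
                  f"LicenseInfoInFile: {entry['licenseInfoInFile']}",
                  ""]
    return "\n".join(lines)
```

Wrapping `scan_tree` and `to_tag_value` in a small command-line entry point would let the tool run as a build or release step, as the project description suggests.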
+
* The user navigates to https://clearlydefined.io/workspace.
 +
* Add one or more components to the workspace through any of the existing means,   
 +
* then click Share, and then slick SPDX (choice of 2.2 supported output formats).
 +
which would result in an SPDX document is exported containing all of the components that were in the workspace.
 +
Note:  If there is mandatory information required by SPDX that ClearlyDefined does not have we will need to determine how to accommodate that.
 +
 
 +
To populate a workspace from a SPDX document:
 +
* user to navigate to https://clearlydefined.io/workspace
 +
* drag a SPDX document into the workspace and then all of the components in the SPDX document are added to the workspace.
 +
 
 +
There are some discrepancies between the content in ClearlyDefined and that SPDX documents, so work would be needed with both communities to figure out: what to do if license information in the SPDX disagrees with what ClearlyDefined has and how to handle pending curations?
 +
 
 
====Available Mentors====
 
====Available Mentors====
 
[mailto:kstewart@linuxfoundation.org Kate Stewart]
 
[mailto:kstewart@linuxfoundation.org Kate Stewart]
[mailto:saiudayshankar@gmail.com Uday Shankar]
+
[mailto:IAMWILLBAR@github.com William Bartholomew]

Revision as of 16:41, 28 March 2021


Welcome to the 2021 SPDX Google Summer of Code Project Page

See the proposal template if you are interested in submitting a Google Summer of Code proposal.

Should you have questions please do not hesitate to contact one of the mentors directly.



What is SPDX?

First and foremost, we are a community dedicated to solving the issues and problems around open source licensing and compliance. The SPDX work group (part of the Linux Foundation) consists of individuals, community members, and representatives from companies, foundations, and organizations who use or are considering using the SPDX standard. The work group operates much like a meritocratic, consensus-based community project; that is, anyone with an interest in the project can join the community, contribute to the specification, and participate in the decision-making process. We come from many different backgrounds, including open source developers, lawyers, consultants, and business professionals, many of whom have been involved with license compliance and identification for years.

As part of this effort we have developed a set of collateral that can be used:

Why choose an SPDX Project?

Contributing to one of the SPDX projects below is a valuable contribution to developers and/or users of open source software. We believe you will find the projects both technically challenging and rewarding. In essence, we believe you will be able to look back one day and say, "I was part of that effort."


Getting Involved

Beyond working with your mentor(s), we highly encourage students who select one of these projects to get involved with the SPDX community via our technical working group. Interaction with the technical team is primarily done via its mailing list and on Gitter (see Resources). There is also a weekly call you can join.

Resources

Proposed 2021 Projects

Mentors: please fill out the following template for any projects you wish to propose.

=== Project Name ===
add overview of project here
====Skills Needed====
what skills should the student have to do the coding exercises
====Background Information====
context for the project and references to be studied
====Available Mentors====
list individuals who are willing to mentor and provide information about the project proposal. 

(The projects from 2019 can be found on the 2019 Google Summer of Code projects page for SPDX.)

SPDX Workgroup Tooling Projects

These projects are aimed at contributing to the SPDX tools to help reduce the effort to create SPDX documents and increase the accuracy of them.

Migrate SPDX Online Tools to Django 3

Migrate the SPDX online tools to later versions of Django and upgrade dependencies (such as Django Social Auth) to later versions to support better, more secure authentication with GitHub. In addition to migrating to Django 3, additional issues can be taken on to create a full GSoC project (see the list below).

Skills Needed

  • Experience with Python3 programming
  • Familiarity with the Django framework

Background Information

The SPDX Online Tools are currently being migrated to Python 3. Several libraries, including Django, can now be upgraded to more supportable versions. There are some known issues with the current version of Django Social Auth that would be resolved by upgrading, and there may be other libraries that can be upgraded as well. If time allows, there is also an opportunity to improve the structure and unit tests of the online tools. See the Python3 branch of the SPDX online tools for the current state of the Python 3 migration.

Additional tasks and issues can be included in a project (in priority order):

Available Mentors

Rohit Lodha, Anshul Dutt Sharma, Gary O'Neall

Migrate Python Tools to Python 3

Migrate the SPDX Python Tools, including all unit tests, to Python 3. In addition, known issues raised on GitHub may be tackled.

Skills Needed

  • Experience with Python3 programming
  • Knowledge of parsing algorithms

Background Information

The Python tools are command-line tools and a library that implement reading and writing of SPDX files in different formats, as well as converting and validating SPDX files. The current implementation uses Python 2, which is no longer supported. In addition to migration, some additional tasks may be taken on to improve the supportability of the library. In particular, restructuring the code to separate out the different serialization formats (see issue 147).
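To illustrate what separating the serialization formats might look like, here is a minimal Python 3 sketch of a writer registry in which each format sits behind a common function signature. The `Document` class, the `register` decorator, and the writer functions are hypothetical names invented for this example, not the existing tools' API; only the tag/value tags and JSON property names for these four fields are taken from SPDX 2.2.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Document:
    """Hypothetical, minimal stand-in for an SPDX document model."""
    spdx_version: str
    data_license: str
    spdx_id: str
    name: str


Writer = Callable[[Document], str]

# Registry mapping a format name to the function that serializes a Document.
WRITERS: Dict[str, Writer] = {}


def register(fmt: str) -> Callable[[Writer], Writer]:
    """Decorator that records a writer function under the format name `fmt`."""
    def wrap(func: Writer) -> Writer:
        WRITERS[fmt] = func
        return func
    return wrap


@register("tag")
def write_tag_value(doc: Document) -> str:
    # Tag/value output: one "Tag: value" pair per line.
    return "\n".join([
        f"SPDXVersion: {doc.spdx_version}",
        f"DataLicense: {doc.data_license}",
        f"SPDXID: {doc.spdx_id}",
        f"DocumentName: {doc.name}",
    ])


@register("json")
def write_json(doc: Document) -> str:
    # JSON output using the SPDX 2.2 JSON property names.
    return json.dumps({
        "spdxVersion": doc.spdx_version,
        "dataLicense": doc.data_license,
        "SPDXID": doc.spdx_id,
        "name": doc.name,
    }, indent=2)


def write(doc: Document, fmt: str) -> str:
    """Serialize `doc` with the writer registered for `fmt`."""
    try:
        writer = WRITERS[fmt]
    except KeyError:
        raise ValueError(f"unsupported format: {fmt}") from None
    return writer(doc)
```

With this layout, adding a format means adding one decorated function without touching the existing writers.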

Available Mentors

Santiago Torres Arias, Alexios Zavras, Anshul Dutt Sharma

RDF Writer for Golang

The gordf library supports writing RDF triples to an RDF file. This project is to create an interface that takes an SPDX document and generates RDF triples from it, which gordf will then consume to generate an RDF/XML file.

Skills Needed

  • Knowledge of RDF
  • Skills in XML parsing
  • Knowledge and experience in Golang

Background Information

RDF/XML is one of the supported formats for SPDX documents. Creating an RDFWriter would create a generally useful facility for Golang and provide a more modular structure for the SPDX Golang tools.

See the SPDX Golang tools repo and the gordf library for more details on the current implementations.
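The core of such a writer is mapping SPDX fields onto triples. The project itself targets Go and gordf; the Python sketch below only illustrates that mapping for a few document-level fields and does not reflect gordf's API. The function names are hypothetical, the input is assumed to be a dict using SPDX 2.2 JSON property names, and the output is N-Triples text rather than RDF/XML to keep it short. The namespaces are those used by SPDX 2.x RDF and the SPDX License List.

```python
from typing import Dict, List, Tuple

# Namespaces used by the SPDX 2.x RDF vocabulary and the SPDX License List.
SPDX = "http://spdx.org/rdf/terms#"
LICENSES = "http://spdx.org/licenses/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

Triple = Tuple[str, str, str]


def iri(value: str) -> str:
    """Wrap an IRI in angle brackets, as N-Triples expects."""
    return f"<{value}>"


def literal(value: str) -> str:
    """Quote a plain literal, escaping backslashes and double quotes.

    Newlines and other control characters are not handled in this sketch.
    """
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def document_triples(doc: Dict[str, str]) -> List[Triple]:
    """Map a few document-level fields of an SPDX document to RDF triples."""
    subject = iri(f"{doc['documentNamespace']}#{doc['SPDXID']}")
    return [
        (subject, iri(RDF_TYPE), iri(SPDX + "SpdxDocument")),
        (subject, iri(SPDX + "specVersion"), literal(doc["spdxVersion"])),
        (subject, iri(SPDX + "name"), literal(doc["name"])),
        (subject, iri(SPDX + "dataLicense"), iri(LICENSES + doc["dataLicense"])),
    ]


def to_ntriples(triples: List[Triple]) -> str:
    """Render triples as N-Triples lines, each terminated by ' .'."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)
```

Keeping triple generation separate from serialization is what would let a library such as gordf handle the RDF/XML output.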

Available Mentors

Rishabh Bhatnagar

YAML Support for Golang libraries

YAML is one of the supported formats for SPDX. This project is to add support for reading and writing YAML to the Golang libraries.

Skills Needed

  • Knowledge of YAML
  • Skills in YAML parsing
  • Knowledge and experience in Golang

Background Information

See the SPDX Golang tools repo and the gordf library for more details on the current implementations.

Available Mentors

Rishabh Bhatnagar, Steve Winslow

JSON Support for Golang libraries

JSON is one of the supported formats for SPDX. This project is to add support for reading and writing JSON to the Golang libraries.

Skills Needed

  • Knowledge of JSON
  • Skills in JSON parsing
  • Knowledge and experience in Golang

Background Information

See the SPDX Golang tools repo and the gordf library for more details on the current implementations.
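As a rough illustration of reading and writing, the Python sketch below parses an SPDX 2.2 JSON document into typed objects, checks the mandatory document-level fields, and writes it back. The class and function names are hypothetical, and only document-level fields are handled. In Go, the same mapping would typically be expressed with struct field tags such as `json:"spdxVersion"` and the standard `encoding/json` package.

```python
import json
from dataclasses import dataclass
from typing import List

# Mandatory document-level properties in SPDX 2.2 JSON.
REQUIRED = ("spdxVersion", "dataLicense", "SPDXID", "name",
            "documentNamespace", "creationInfo")


@dataclass
class CreationInfo:
    created: str
    creators: List[str]


@dataclass
class SpdxDocument:
    spdx_version: str
    data_license: str
    spdx_id: str
    name: str
    namespace: str
    creation_info: CreationInfo


def read_json(text: str) -> SpdxDocument:
    """Parse SPDX 2.2 JSON text, failing if a mandatory field is missing."""
    raw = json.loads(text)
    missing = [key for key in REQUIRED if key not in raw]
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")
    info = raw["creationInfo"]
    return SpdxDocument(
        spdx_version=raw["spdxVersion"],
        data_license=raw["dataLicense"],
        spdx_id=raw["SPDXID"],
        name=raw["name"],
        namespace=raw["documentNamespace"],
        creation_info=CreationInfo(info["created"], list(info["creators"])),
    )


def write_json(doc: SpdxDocument) -> str:
    """Serialize the document-level fields back to SPDX 2.2 JSON."""
    return json.dumps({
        "spdxVersion": doc.spdx_version,
        "dataLicense": doc.data_license,
        "SPDXID": doc.spdx_id,
        "name": doc.name,
        "documentNamespace": doc.namespace,
        "creationInfo": {
            "created": doc.creation_info.created,
            "creators": doc.creation_info.creators,
        },
    }, indent=2)
```

A useful test for such a library is the round trip: reading a document and writing it back should produce equivalent JSON.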

Available Mentors

Rishabh Bhatnagar, Steve Winslow

SPDX Specification Projects

The following projects contribute directly to the creation or validation of the SPDX specification.

Generate a JSON Representation of the Specification from Structured Markdown

Convert a consistently structured Markdown file into a JSON structure following a well-defined schema. Changes to an existing Markdown file should update the JSON files. The Markdown will have a well-defined structure to allow translation of the Markdown text into the properties of the JSON file. The conversion will also validate that the Markdown follows the required specification. The conversion would be run as part of a GitHub Actions workflow for the SPDX specification.

Skills Needed

  • Skills in writing parsing algorithms (e.g., working with an abstract syntax tree)
  • Knowledge and experience in the programming language chosen for the project (e.g. Java, JavaScript, Python)
  • Knowledge of Markdown and JSON syntax

Background Information

The SPDX tech team works very collaboratively on specification updates, using Markdown pages in GitHub as the primary documentation for the specification. RDF/OWL is used as the primary technical specification for the object model, including relationships, cardinality, class structure, and other restrictions. There is a lot of overlap between the information in the Markdown and the information in the OWL document. To improve the quality and productivity of the specification work, the SPDX technical team has decided to add tooling for verifying the Markdown and converting any common information to the OWL document. The conversion will be in two stages:

  • Convert from Markdown to an intermediate JSON format
  • Convert from the JSON format to RDF/OWL

The specific schema for the JSON format is under development and planned to be available before the start of GSoC.
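Since the Markdown structure and JSON schema were still being defined, the following Python sketch assumes a hypothetical layout (a `# Class:` heading, summary text, and a `## Properties` list of `- name: type [min..max]` entries) purely to show the parse-and-validate idea. Any line that does not fit the assumed structure is reported with its line number, which is how the conversion could double as validation.

```python
import re
from typing import Dict, List

# Hypothetical Markdown layout assumed by this sketch:
#   # Class: Package
#   One or more lines of summary text.
#   ## Properties
#   - name: string [1..1]
#   - checksum: Checksum [0..*]
PROPERTY = re.compile(r"^- (\w+): (\w+) \[(\d+)\.\.(\d+|\*)\]$")


def markdown_to_json(markdown: str) -> Dict:
    """Convert one class description to a JSON-ready dict, validating as it goes."""
    result: Dict = {"class": None, "summary": "", "properties": []}
    summary: List[str] = []
    section = None
    for lineno, raw in enumerate(markdown.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        if line.startswith("# Class: "):
            result["class"] = line[len("# Class: "):]
            section = "summary"
        elif line == "## Properties":
            section = "properties"
        elif section == "summary":
            summary.append(line)
        elif section == "properties":
            match = PROPERTY.match(line)
            if match is None:
                raise ValueError(f"line {lineno}: malformed property: {line!r}")
            name, type_name, low, high = match.groups()
            result["properties"].append({
                "name": name,
                "type": type_name,
                "minCount": int(low),
                "maxCount": None if high == "*" else int(high),  # None = unbounded
            })
        else:
            raise ValueError(f"line {lineno}: content before '# Class:' heading")
    if result["class"] is None:
        raise ValueError("missing '# Class:' heading")
    result["summary"] = " ".join(summary)
    return result
```

The returned dict can be written with the standard `json` module; in a GitHub Actions workflow, a raised `ValueError` would fail the check and point at the offending Markdown line.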

Below are some additional resources for this project:

Available Mentors

Gary O'Neall, Alexios Zavras

Generate RDF/OWL from the JSON specification format

Convert a set of JSON files into a Web Ontology Language (OWL) XML document. The JSON files will map to the elements and attributes of the RDF/OWL XML file. The JSON schema will be defined prior to the project start and will be consistent with the "Generate a JSON Representation of the Specification from Structured Markdown" project described above.

Skills Needed

  • Knowledge and experience in the programming language chosen for the project (e.g. Java, JavaScript, Python)
  • Knowledge of RDF/OWL/XML formats
  • Knowledge of JSON parsers

Background Information

The SPDX tech team works very collaboratively on specification updates, using Markdown pages in GitHub as the primary documentation for the specification. RDF/OWL is used as the primary technical specification for the object model, including relationships, cardinality, class structure, and other restrictions. There is a lot of overlap between the information in the Markdown and the information in the OWL document. To improve the quality and productivity of the specification work, the SPDX technical team has decided to add tooling for verifying the Markdown and converting any common information to the OWL document. The conversion will be in two stages:

  • Convert from Markdown to an intermediate JSON format
  • Convert from the JSON format to RDF/OWL

The specific schema for the JSON format is under development and planned to be available before the start of GSoC.
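As a sketch of the second stage, the Python code below uses the standard library's `xml.etree.ElementTree` to turn one class description into RDF/XML with `owl:Class`, `owl:DatatypeProperty`, and cardinality restrictions. The input layout (`class`, `summary`, and `properties` entries with `name`, `type`, `minCount`, `maxCount`) is a hypothetical stand-in for the real JSON schema. Every property is treated as a datatype property with an XSD range, and the generated terms are placed in the `http://spdx.org/rdf/terms#` namespace for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# W3C namespaces, plus the SPDX terms namespace used here as an assumed base.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
XSD = "http://www.w3.org/2001/XMLSchema#"
BASE = "http://spdx.org/rdf/terms#"

# Use the conventional prefixes when serializing.
for prefix, uri in (("rdf", RDF), ("rdfs", RDFS), ("owl", OWL)):
    ET.register_namespace(prefix, uri)


def q(ns: str, local: str) -> str:
    """Build an ElementTree qualified name, e.g. '{namespace}local'."""
    return f"{{{ns}}}{local}"


def add_cardinality(cls: ET.Element, prop_iri: str, kind: str, value: int) -> None:
    """Attach an owl:Restriction (min or max cardinality) to a class."""
    sub = ET.SubElement(cls, q(RDFS, "subClassOf"))
    restriction = ET.SubElement(sub, q(OWL, "Restriction"))
    ET.SubElement(restriction, q(OWL, "onProperty"), {q(RDF, "resource"): prop_iri})
    card = ET.SubElement(restriction, q(OWL, kind),
                         {q(RDF, "datatype"): XSD + "nonNegativeInteger"})
    card.text = str(value)


def class_to_owl(spec: Dict) -> str:
    """Convert one class description (hypothetical JSON layout) to RDF/XML."""
    root = ET.Element(q(RDF, "RDF"))
    class_iri = BASE + spec["class"]
    cls = ET.SubElement(root, q(OWL, "Class"), {q(RDF, "about"): class_iri})
    ET.SubElement(cls, q(RDFS, "comment")).text = spec["summary"]
    for prop in spec["properties"]:
        prop_iri = BASE + prop["name"]
        # Declare the property with the class as domain and an XSD datatype as range.
        decl = ET.SubElement(root, q(OWL, "DatatypeProperty"), {q(RDF, "about"): prop_iri})
        ET.SubElement(decl, q(RDFS, "domain"), {q(RDF, "resource"): class_iri})
        ET.SubElement(decl, q(RDFS, "range"), {q(RDF, "resource"): XSD + prop["type"]})
        # Express cardinality as restrictions on the class.
        if prop["minCount"] > 0:
            add_cardinality(cls, prop_iri, "minCardinality", prop["minCount"])
        if prop["maxCount"] is not None:
            add_cardinality(cls, prop_iri, "maxCardinality", prop["maxCount"])
    return ET.tostring(root, encoding="unicode")
```

Object-valued properties (for example, a checksum that is itself a class) would need `owl:ObjectProperty` and a class range instead, which this sketch leaves out.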

Below are some additional resources for this project:

Available Mentors

Gary O'Neall, Alexios Zavras

SPDX Specification Views for legal counsels and developers

The proposal is to investigate whether large SPDX documents can be reduced to smaller subset SPDX documents, each providing a specific, reduced "view" of the larger data.

Skills Needed

  • Understanding of compliance needs of legal counsels and developers so we can remove friction to adopt SPDX

Background Information

SPDX documents commonly contain hundreds, if not thousands, of entries, making it hard for a human to make manual corrections or draw conclusions. Because no scanner can provide 100% complete data, human corrections are usually needed. The aim of this proposal is twofold: 1. Give developers a "code view" of a tool-generated SPDX document, close to the code they work on, so they can make corrections to the SPDX data; for instance, amending SPDX package tag values or modeling package dependencies not detected by the scanner used. 2. Provide legal counsels with a "package and limited file view" to enable legal conclusions.
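One way to picture a reduced view is as a filter over the JSON form of a document. The Python sketch below, using the hypothetical function `package_view`, keeps document-level fields and packages, drops file entries except an explicit allow-list, and prunes `hasFiles` entries and relationships that would point at removed elements. The reduction rules are assumptions made for illustration, not an agreed design.

```python
from typing import Dict, Iterable


def package_view(doc: Dict, keep_files: Iterable[str] = ()) -> Dict:
    """Return a reduced copy of an SPDX 2.2 JSON document (hypothetical rules).

    Document-level fields and packages are kept, files are dropped unless their
    SPDXID is in `keep_files`, and references to dropped files are pruned.
    """
    wanted = set(keep_files)
    files = [f for f in doc.get("files", []) if f["SPDXID"] in wanted]
    file_ids = {f["SPDXID"] for f in files}

    packages = []
    for pkg in doc.get("packages", []):
        pkg = dict(pkg)  # shallow copy so the input document is not modified
        if "hasFiles" in pkg:
            pkg["hasFiles"] = [fid for fid in pkg["hasFiles"] if fid in file_ids]
        packages.append(pkg)

    # Elements that survive the reduction; relationships must stay within them.
    kept = {doc["SPDXID"]} | {p["SPDXID"] for p in packages} | file_ids
    relationships = [
        r for r in doc.get("relationships", [])
        if r["spdxElementId"] in kept and r["relatedSpdxElement"] in kept
    ]

    view = {k: v for k, v in doc.items()
            if k not in ("packages", "files", "relationships")}
    view["packages"] = packages
    if files:
        view["files"] = files
    view["relationships"] = relationships
    return view
```

Because the result is still an SPDX-shaped document, corrections made in a view could in principle be merged back by SPDXID.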

Available Mentors

Steve Winslow, Thomas Steenbergen

ClearlyDefined exporting and importing SPDX documents

The goal of this GSoC project would be to add support in the ClearlyDefined project to export curated data into SPDX 2.2 documents. Once that is accomplished, being able to import SPDX documents into the curated database would be the next step.

Skills Needed

  • Experience with JSON and YAML (XML a plus)
  • Ability to interpret and implement the SPDX specification and related ClearlyDefined community documentation
  • Ability to work with the community in integrating results with other projects
  • Willingness to learn about open source licensing and related technical matters

Background Information

Export a ClearlyDefined workspace as an SPDX document:

  • Navigate to https://clearlydefined.io/workspace
  • Add one or more components to the workspace through any of the existing means.
  • Click Share, then click SPDX (with a choice of the supported SPDX 2.2 output formats).

This would export an SPDX document containing all of the components that were in the workspace. Note: If SPDX requires mandatory information that ClearlyDefined does not have, we will need to determine how to accommodate that.
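A minimal Python sketch of the export mapping follows. It assumes each workspace component is a dict with a ClearlyDefined coordinates string (`type/provider/namespace/name/revision`, with `-` meaning no namespace) and an optional declared license; that input shape, the document name, and the creator string are hypothetical. Fields that SPDX 2.2 requires for a package but that this input lacks are filled with `NOASSERTION`, which is one possible answer to the note above, not a decided one.

```python
from datetime import datetime, timezone
from typing import Dict, List

NOASSERTION = "NOASSERTION"


def package_from_component(component: Dict, index: int) -> Dict:
    """Map one workspace component (assumed shape) to an SPDX 2.2 JSON package."""
    # ClearlyDefined coordinates: type/provider/namespace/name/revision ("-" = no namespace).
    _type, _provider, namespace, name, revision = component["coordinates"].split("/")
    return {
        "SPDXID": f"SPDXRef-Package-{index}",
        "name": name if namespace == "-" else f"{namespace}/{name}",
        "versionInfo": revision,
        # Values the assumed input does not carry are marked NOASSERTION.
        "downloadLocation": NOASSERTION,
        "filesAnalyzed": False,
        "licenseConcluded": NOASSERTION,
        "licenseDeclared": component.get("declared") or NOASSERTION,
        "copyrightText": NOASSERTION,
    }


def export_workspace(components: List[Dict], namespace: str) -> Dict:
    """Build an SPDX 2.2 JSON document describing every workspace component."""
    packages = [package_from_component(c, i) for i, c in enumerate(components, start=1)]
    return {
        "spdxVersion": "SPDX-2.2",
        "dataLicense": "CC0-1.0",
        "SPDXID": "SPDXRef-DOCUMENT",
        "name": "clearlydefined-workspace",  # hypothetical document name
        "documentNamespace": namespace,
        "creationInfo": {
            "created": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
            "creators": ["Tool: clearlydefined-export-sketch"],  # hypothetical tool name
        },
        "packages": packages,
        "relationships": [
            {"spdxElementId": "SPDXRef-DOCUMENT",
             "relatedSpdxElement": p["SPDXID"],
             "relationshipType": "DESCRIBES"}
            for p in packages
        ],
    }
```

Passing the result to `json.dumps(..., indent=2)` would give the JSON output; tag/value or other SPDX 2.2 formats would each need their own writer.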

To populate a workspace from an SPDX document:

  • Navigate to https://clearlydefined.io/workspace
  • Drag an SPDX document into the workspace; all of the components in the SPDX document are then added to the workspace.

There are some discrepancies between the content in ClearlyDefined and the content of SPDX documents, so work would be needed with both communities to decide what to do when license information in an SPDX document disagrees with what ClearlyDefined has, and how to handle pending curations.
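To make the license question concrete, the Python sketch below lists components from an SPDX 2.2 JSON document and flags packages whose concluded license differs from a curated value. It keys curated data by (name, version), a hypothetical simplification, since SPDX packages do not always carry the type and provider that ClearlyDefined coordinates need. License expressions are compared as plain strings, so equivalent expressions written differently (for example, with operands in another order) would be reported as conflicts.

```python
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, Optional[str]]


def components_from_spdx(doc: Dict) -> List[Dict]:
    """List name, version and concluded license for each package in SPDX 2.2 JSON."""
    return [
        {"name": p["name"],
         "version": p.get("versionInfo"),
         "license": p.get("licenseConcluded", "NOASSERTION")}
        for p in doc.get("packages", [])
    ]


def license_conflicts(doc: Dict, curated: Dict[Key, str]) -> List[Dict]:
    """Report packages whose SPDX concluded license differs from the curated one.

    Packages without a curated entry, or whose SPDX value is NOASSERTION, are skipped.
    Expressions are compared as plain strings (no normalization).
    """
    conflicts = []
    for comp in components_from_spdx(doc):
        curated_license = curated.get((comp["name"], comp["version"]))
        if curated_license is None or comp["license"] == "NOASSERTION":
            continue
        if comp["license"] != curated_license:
            conflicts.append({**comp, "curated": curated_license})
    return conflicts
```

A list like this could feed whatever review process the two communities agree on, rather than silently overwriting either side.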

Available Mentors

Kate Stewart, William Bartholomew