THE SPDX WIKI IS NO LONGER ACTIVE. ALL CONTENT HAS BEEN MOVED TO https://github.com/spdx

GSOC/GSOC ProjectIdeas

Latest revision as of 17:36, 31 March 2022


Welcome to the 2022 SPDX Google Summer of Code Project Page

See the proposal template if you are interested in submitting a Google Summer of Code proposal.

Should you have questions, please do not hesitate to contact one of the mentors directly.



What is SPDX ?

First and foremost, we are a community dedicated to solving the issues and problems around Open Source licensing and compliance. The SPDX work group (part of the Linux Foundation) consists of individuals, community members, and representatives from companies, foundations and organizations who use or are considering using the SPDX standard. The work group operates much like a meritocratic, consensus-based community project; that is, anyone with an interest in the project can join the community, contribute to the specification, and participate in the decision-making process. We come from many different backgrounds, including open source developers, lawyers, consultants and business professionals, many of whom have been involved with license compliance and identification for years.

As part of this effort we have developed a set of collateral that can be used:

Why choose an SPDX Project?

Contributing to one of the SPDX projects below will provide a valuable contribution to developers and/or users of open source software. We believe you will find the projects both technically challenging and rewarding. In essence, we believe you will be able to look back one day and say, "I was part of that effort."


Getting Involved

Beyond working with your mentor(s), we strongly encourage students who select one of these projects to get involved with the SPDX community via our technical working group. Interaction with the technical team happens primarily on its mailing list and on Gitter (see Resources). There is also a weekly call you can join.

Resources

Ideas for 2022 Projects

SBOM Conformance Checker

The goal of this project is to create a simple tool that checks whether an SBOM (in SPDX format) conforms to NTIA's minimum elements guidance.

Description

The SPDX Specification defines a number of fields (elements) that may appear in an SBOM (Software Bill of Materials). Not all of them are mandatory, however, so SBOMs in SPDX format can vary greatly.

While researching which attributes must be present in an SBOM, NTIA published guidance on the minimum elements that must appear in it: https://www.ntia.doc.gov/files/ntia/publications/sbom_minimum_elements_report.pdf

It would, therefore, be useful to have a tool that can determine whether an SBOM stored in SPDX format fulfills all such minimum obligations.

The tool should make use of the existing libraries for reading SPDX documents.
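As a rough illustration, the core check could look like the Python sketch below. It assumes the SPDX 2.x JSON serialization, already parsed into a dict. It also uses an illustrative mapping of the NTIA minimum data fields onto SPDX fields; for example, Supplier Name maps to a package's supplier and Timestamp maps to creationInfo.created. The function name check_ntia_minimum and the mapping table are hypothetical. A real tool would read documents through the existing SPDX libraries instead of raw JSON, and would check dependency relationships more carefully than "at least one relationship exists".

```python
from typing import Any

# Illustrative (assumed) mapping from NTIA minimum data fields to SPDX 2.x JSON package keys.
PACKAGE_FIELDS = {
    "Supplier Name": "supplier",
    "Component Name": "name",
    "Version of the Component": "versionInfo",
    "Other Unique Identifiers": "SPDXID",
}


def check_ntia_minimum(doc: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems: list[str] = []

    creation = doc.get("creationInfo", {})
    if not creation.get("creators"):  # Author of SBOM Data
        problems.append("document: missing creationInfo.creators (Author of SBOM Data)")
    if not creation.get("created"):  # Timestamp
        problems.append("document: missing creationInfo.created (Timestamp)")

    packages = doc.get("packages", [])
    if not packages:
        problems.append("document: no packages listed")
    for pkg in packages:
        label = pkg.get("SPDXID", pkg.get("name", "<unnamed package>"))
        for ntia_name, key in PACKAGE_FIELDS.items():
            value = pkg.get(key)
            # NOASSERTION means the creator made no claim, so treat it as missing here.
            if not value or value == "NOASSERTION":
                problems.append(f"{label}: missing {key} ({ntia_name})")

    if not doc.get("relationships"):  # Dependency Relationship (crude check)
        problems.append("document: no relationships (Dependency Relationship)")
    return problems
```

Consider a document whose only package has "supplier": "NOASSERTION" and is otherwise complete. For that input the sketch reports a single problem: the missing Supplier Name.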

Technologies

Python

Duration

This will be a short (175 hours) project. It might be extended to a long (350 hours) project if integration with the existing SPDX handling tools (e.g., the Validation tool) is also implemented.

Mentors

Dick Brooks, Kate Stewart

Private license management system

A web-based system for managing license texts; similar to the SPDX License List but oriented towards other private collections of licenses.

Description

The goal of the project would be to create a simple web application for people to upload license texts and automatically create a license repository. The initial rough "functional specifications" describe it as mainly an input form, where the information is entered. There will be some automatic processing (e.g., canonicalization, duplicate avoidance, etc.), a review/approval (and naming) step, and then publishing in a specified format.

Note that the specification is not yet finalized regarding namespace naming, the way licenses are published, etc. If the SPDX project has settled these definitions by the time the project starts, this project will implement the decisions taken.
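The automatic processing step could start simply. Normalize each submitted text into a canonical form, then use a hash of that form to detect duplicates. The canonicalization rules in this sketch are assumptions chosen for illustration: Unicode NFKC, lowercasing, unifying quote and dash characters, and collapsing whitespace. They are much cruder than the SPDX License List matching guidelines. canonicalize and LicenseRepository are hypothetical names, and the store is in memory only.

```python
import hashlib
import re
import unicodedata


def canonicalize(text: str) -> str:
    """Hypothetical canonical form used only for duplicate detection."""
    text = unicodedata.normalize("NFKC", text).lower()
    text = re.sub(r"[\"'`\u2018\u2019\u201c\u201d]", "'", text)  # unify quote characters
    text = re.sub(r"[\u2010-\u2015]", "-", text)  # unify hyphen and dash characters
    text = re.sub(r"\s+", " ", text)  # collapse whitespace, including line breaks
    return text.strip()


class LicenseRepository:
    """Toy in-memory store keyed by the SHA-256 digest of the canonical text."""

    def __init__(self) -> None:
        self._by_digest: dict[str, str] = {}  # digest -> license name

    def submit(self, name: str, text: str) -> tuple[bool, str]:
        """Return (added, name); for a duplicate, return the name already stored."""
        digest = hashlib.sha256(canonicalize(text).encode("utf-8")).hexdigest()
        existing = self._by_digest.get(digest)
        if existing is not None:
            return False, existing
        self._by_digest[digest] = name
        return True, name
```

Suppose a user resubmits a stored text with different capitalization or line breaks. The sketch returns the name of the stored license instead of creating a duplicate. The review/approval and naming step would then decide what to do with the submission.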

Technologies

Python (any framework) for the back-end; JavaScript (any framework) for the minimal front-end.

Duration

This can be either a short (175 hours) project, implementing only the basic functionality; or a long (350 hours) one, implementing more functionality and automation.

Mentors

Alexios Zavras; more TBD

SBOM combiner

The project will result in a simple command-line tool that can “combine” information from a number of SBOMs into a comprehensive SBOM that includes all the information from the input SBOMs. A concrete use case is generating an SBOM for a software delivery that is composed of a number of components, each of which has its own correct SBOM.

Description

The primary purpose of this tool would be to stitch together smaller component-level SPDX documents and amalgamate them into one top-level SPDX document representing a "sum of parts" piece of software. In an initial implementation, the caller would have to provide the component-level SBOMs until the tool is mature enough to reliably fetch SPDX documents referenced by ExternalDocumentRef.
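Below is a minimal sketch of the combining step. It assumes the component SBOMs are SPDX 2.2 JSON documents already loaded as dicts. Each component's packages and files are copied into a new document, with their SPDX IDs prefixed per source document to avoid collisions. Whatever each component document DESCRIBES is then linked to a new top-level package through a CONTAINS relationship. The function combine, the prefix scheme and the root package ID are hypothetical. The sketch does not handle external document references, other relationships that mention SPDXRef-DOCUMENT, or merging fields such as licenses and checksums.

```python
import copy
from datetime import datetime, timezone
from typing import Any

Doc = dict[str, Any]
ROOT_ID = "SPDXRef-Package-combined"  # hypothetical ID of the "sum of parts" package


def _prefix(spdx_id: str, index: int) -> str:
    """Make a local ID unique per source document: SPDXRef-foo -> SPDXRef-doc0-foo."""
    if not spdx_id.startswith("SPDXRef-"):
        return spdx_id  # NONE, NOASSERTION and DocumentRef-... IDs are left untouched
    return f"SPDXRef-doc{index}-{spdx_id[len('SPDXRef-'):]}"


def combine(name: str, namespace: str, components: list[Doc]) -> Doc:
    combined: Doc = {
        "spdxVersion": "SPDX-2.2",
        "dataLicense": "CC0-1.0",
        "SPDXID": "SPDXRef-DOCUMENT",
        "name": name,
        "documentNamespace": namespace,
        "creationInfo": {
            "created": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
            "creators": ["Tool: sbom-combiner-sketch"],
        },
        "packages": [{"SPDXID": ROOT_ID, "name": name, "downloadLocation": "NOASSERTION"}],
        "files": [],
        "relationships": [{"spdxElementId": "SPDXRef-DOCUMENT",
                           "relationshipType": "DESCRIBES",
                           "relatedSpdxElement": ROOT_ID}],
    }
    for i, doc in enumerate(components):
        # Copy packages and files, renaming their IDs (and file references) per document.
        for key in ("packages", "files"):
            for element in doc.get(key, []):
                element = copy.deepcopy(element)
                element["SPDXID"] = _prefix(element["SPDXID"], i)
                if "hasFiles" in element:
                    element["hasFiles"] = [_prefix(f, i) for f in element["hasFiles"]]
                combined[key].append(element)

        # Collect what the component document described; keep other relationships, renamed.
        described = set(doc.get("documentDescribes", []))
        for rel in doc.get("relationships", []):
            if rel["spdxElementId"] == "SPDXRef-DOCUMENT" and rel["relationshipType"] == "DESCRIBES":
                described.add(rel["relatedSpdxElement"])
            else:
                combined["relationships"].append({
                    **rel,
                    "spdxElementId": _prefix(rel["spdxElementId"], i),
                    "relatedSpdxElement": _prefix(rel["relatedSpdxElement"], i),
                })
        # The described elements become parts of the top-level package.
        for spdx_id in sorted(described):
            combined["relationships"].append({"spdxElementId": ROOT_ID,
                                              "relationshipType": "CONTAINS",
                                              "relatedSpdxElement": _prefix(spdx_id, i)})
    return combined
```

The per-document prefix is the simplest way to guarantee unique IDs. A real tool might prefer to keep the original IDs and rewrite them only when they actually collide.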

Technologies

Python (preferably); or Go.

Duration

This will be a short (175 hours) project.

Mentors

Rose Judge; others TBD

Update of Java SPDX libraries to handle latest spec

Description

The SPDX Project maintains a library, written in Java, for working with SPDX data. Development of the library does not always keep pace with development of the specification. Since the specification has evolved and a newer version is expected to be published right before the project's timeframe, it would be useful to make the standard Java libraries capable of handling the latest spec.

The project will involve gaining a deep understanding of the existing libraries and extending them to handle the latest additions to the specification (up to the published version).

Technologies

Java; see https://github.com/spdx/Spdx-Java-Library

Duration

This will be a short (175 hours) project.

Mentors

TBD

Update of Go SPDX libraries to handle latest spec

Description

The SPDX Project maintains a library, written in Go, for working with SPDX data. Development of the library does not always keep pace with development of the specification. Since the specification has evolved and a newer version is expected to be published right before the project's timeframe, it would be useful to make the standard Go libraries capable of handling the latest spec.

The project will involve gaining a deep understanding of the existing libraries and extending them to handle the latest additions to the specification (up to the published version).

Technologies

Go; see https://github.com/spdx/tools-golang

Duration

This will be a short (175 hours) project.

Mentors

TBD

SPDX Golang RDF Saver

Description

SPDX already has a Golang library to save RDF triples into a file/string using the gordf project: https://github.com/spdx/gordf

The aim of this GSoC project would be to write an adapter in the SPDX Golang Tools (the tools-golang repository at https://github.com/spdx/tools-golang) that would take an SPDX Document struct (see https://github.com/spdx/tools-golang/blob/main/spdx/document.go) as an input, and serialize it and its child elements into RDF triples to be consumed by the aforementioned gordf rdf-writer.
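This sketch shows the shape of the mapping, not the Go implementation. It turns a heavily simplified document structure into (subject, predicate, object) triples using terms from the SPDX RDF namespace (http://spdx.org/rdf/terms#). The Document and Package classes and their fields are hypothetical stand-ins for the tools-golang structs, and only a handful of properties are covered. Every node is a plain string here. A real writer must distinguish IRIs, blank nodes and literals before handing triples to gordf.

```python
from dataclasses import dataclass, field
from itertools import count

SPDX = "http://spdx.org/rdf/terms#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

Triple = tuple[str, str, str]


@dataclass
class Package:  # hypothetical, heavily simplified stand-in for a Go package struct
    spdx_id: str
    name: str
    version: str = ""


@dataclass
class Document:  # hypothetical, heavily simplified stand-in for the Go document struct
    namespace: str
    name: str
    spec_version: str = "SPDX-2.2"
    packages: list[Package] = field(default_factory=list)
    describes: list[str] = field(default_factory=list)  # SPDX IDs described by the document


def to_triples(doc: Document) -> list[Triple]:
    def uri(spdx_id: str) -> str:
        # Elements are identified by the document namespace plus their SPDX ID.
        return f"{doc.namespace}#{spdx_id}"

    blank = (f"_:b{i}" for i in count())  # fresh blank-node labels for Relationship objects
    doc_uri = uri("SPDXRef-DOCUMENT")
    triples: list[Triple] = [
        (doc_uri, RDF_TYPE, SPDX + "SpdxDocument"),
        (doc_uri, SPDX + "specVersion", doc.spec_version),
        (doc_uri, SPDX + "name", doc.name),
    ]
    for pkg in doc.packages:
        subject = uri(pkg.spdx_id)
        triples.append((subject, RDF_TYPE, SPDX + "Package"))
        triples.append((subject, SPDX + "name", pkg.name))
        if pkg.version:
            triples.append((subject, SPDX + "versionInfo", pkg.version))
    for target in doc.describes:
        rel = next(blank)
        triples += [
            (doc_uri, SPDX + "relationship", rel),
            (rel, RDF_TYPE, SPDX + "Relationship"),
            (rel, SPDX + "relationshipType", SPDX + "relationshipType_describes"),
            (rel, SPDX + "relatedSpdxElement", uri(target)),
        ]
    return triples
```

Keeping the struct-to-triples step separate from writing the file matches the split the project describes: the adapter produces triples, and gordf serializes them.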

Technologies

Golang; RDF

Duration

This will be a short (175 hours) project. If the project requires fewer than 175 hours, the remaining time can be spent on additional improvements to the Golang tools.

Mentors

Rishabh Bhatnagar; Steve Winslow as secondary / backup


Update of Python SPDX libraries to handle latest spec

Description

The SPDX Project maintains a library, written in Python, for working with SPDX data. Development of the library does not always keep pace with development of the specification. Since the specification has evolved and a newer version is expected to be published right before the project's timeframe, it would be useful to make the standard Python libraries capable of handling the latest spec.

The project will involve gaining a deep understanding of the existing libraries and extending them to handle the latest additions to the specification (up to the published version).
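One small, concrete piece of this work is making the library recognize which specification version a document declares in its spdxVersion field. It should then fail clearly on versions it cannot handle yet, instead of silently misreading them. The sketch below assumes a hypothetical library that supports versions 2.1 and 2.2. The function names and the SUPPORTED set are illustrative, not part of tools-python.

```python
import re

# Hypothetical set of (major, minor) versions this library build can handle.
SUPPORTED = {(2, 1), (2, 2)}


def parse_spdx_version(value: str) -> tuple[int, int]:
    """Parse the spdxVersion field, e.g. 'SPDX-2.2' -> (2, 2)."""
    match = re.fullmatch(r"SPDX-(\d+)\.(\d+)", value.strip())
    if match is None:
        raise ValueError(f"not an SPDX version string: {value!r}")
    return int(match.group(1)), int(match.group(2))


def check_supported(doc: dict) -> tuple[int, int]:
    """Return the declared version, or raise if this library cannot handle it yet."""
    version = parse_spdx_version(doc.get("spdxVersion", ""))
    if version not in SUPPORTED:
        raise NotImplementedError(f"SPDX {version[0]}.{version[1]} is not handled yet")
    return version
```

Extending the library to a new specification version would then mean adding that version to the supported set only after its new fields and relationships are actually parsed and written.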

Technologies

Python; see https://github.com/spdx/tools-python

Duration

This will be a short (175 hours) project.

Mentors

TBD


More to come...

Mentors: please fill out the following template for any projects you wish to propose.

=== Project Name ===
add overview of project here
====Skills Needed====
what skills should the student have to do the coding exercises
====Duration====
whether this is a short or a long project
====Background Information====
context for the project and references to be studied
====Available Mentors====
list individuals who are willing to mentor and provide information about the project proposal.

Historical info

GSOC/PastProjectIdeas