Commit 0c6f3656 authored by Enrique Garcia

Merge branch 'add_codemeta_and_utils' into 'master'

Add codemeta.json and utils

See merge request escape2020/wp3/escape_metadata_template!1
parents 1a60dc0c 1138b26c
# Metadata version - do not change
metadata-version: 0.2
# Mandatory entries
title: escape metadata template
authors:
- Thomas Vuillaume
- Enrique Garcia
contact:
- name: Thomas Vuillaume
- email: thomas.vuillaume@lapp.in2p3.fr
license: MIT
url: https://gitlab.in2p3.fr/escape2020/escape/escape_metadata_template
description: A machine-readable information template for the ESCAPE repository projects
# Optional entries
doi: null
keywords:
- EOSC
type: source
grant: 824064
language: python
hardware:
- machine: local
- CPU: null
- RAM: 100MB
- drive:
    - type: HDD
    - volume: 2MB
- GPU: null
dependencies:
- python>=3.6
- pip
os:
- 'win-64'
- 'linux'
- 'osx-64'
compiler:
- gcc>=4.7
multi-thread: false
container:
- docker>=2.0
# ESCAPE metadata template
A machine-readable metadata template for ESCAPE software.
ESCAPE will be following the **CodeMeta** schema context to describe metadata.
Download and use the latest version of this template to upload your software in the ESCAPE repository.
Create and incorporate a `codemeta.json` file to your project before uploading it to the ESCAPE repository.
There are mandatory and optional fields.
Please find below the description of the fields.
Comments are welcome. Open an issue here or [contact](mailto:vuillaume@lapp.in2p3.fr;garcia@lapp.in2p3.fr) the authors.
Comments are welcome. Open an issue here or email the authors.
## Quickstart
1. Go to the [CodeMeta generator](https://codemeta.github.io/codemeta-generator/). Create a `codemeta.json` file based on your library/repository.
- Check in the same web application that the generated file (or your own file) is valid.
- For the moment, please restrict the list of keywords to the ones that we propose (see below).
2. Include the `codemeta.json` file in the root directory of your project.
3. To automate the upload to the [ESCAPE repository](https://zenodo.org/communities/escape2020) through the GitLab-CI pipelines:
- Include the `.zenodoci` library in the root directory of your project.
- Configure the pipeline (quickstart and tutorials [here](https://escape2020.pages.in2p3.fr/wp3/ossr-pages/page/repository/publish_in_repository/)).
-----------------
-----------------
## Create a Zenodo metadata file from the CodeMeta schema
## Mandatory
The Zenodo repository does not accept CodeMeta metadata files yet. In the meantime, this library provides a simple tool
to create a native Zenodo metadata file (`.zenodo.json`) from a `codemeta.json` file. To do so:
- title: project title
- authors: list of authors
- contact:
- name: could be a person or an entity (e.g. ESCAPE or CTA Observatory)
- email
- description: short description
- keywords: list of keywords to categorise the project. A pre-defined list of keywords is given below.
- license: the open-source project license (e.g. MIT)
1. Include a `codemeta.json` file in the root directory of your project.
2. Run the following command:
````bash
$ python codemeta_utils/codemeta_to_zenodo_json.py
````
3. In case of doubts or problems, please [contact us](mailto:vuillaume@lapp.in2p3.fr;garcia@lapp.in2p3.fr).
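For orientation, the key mapping performed by the script can be pictured with a condensed sketch. The function name `sketch_codemeta_to_zenodo` is hypothetical and covers only a handful of keys; the complete logic (authors, contributors, version, release notes) lives in `codemeta_utils/codemeta_to_zenodo_json.py`.

```python
import json


def sketch_codemeta_to_zenodo(codemeta):
    """Condensed illustration of the CodeMeta -> Zenodo key mapping (not the full script)."""
    zenodo = {"upload_type": "software", "access_right": "open"}
    if "name" in codemeta:
        zenodo["title"] = codemeta["name"]
    if "description" in codemeta:
        zenodo["description"] = codemeta["description"]
    if "keywords" in codemeta:
        keywords = codemeta["keywords"]
        # Zenodo expects a list; CodeMeta also allows a single string.
        zenodo["keywords"] = keywords if isinstance(keywords, list) else [keywords]
    if "license" in codemeta:
        # Keep only the last URL segment: "https://spdx.org/licenses/MIT" -> "MIT"
        zenodo["license"] = codemeta["license"].split("/")[-1]
    # Entries added to every ESCAPE upload.
    zenodo["communities"] = [{"identifier": "escape2020"}]
    zenodo["grants"] = [{"id": "10.13039/501100000780::824064"}]
    # Zenodo expects the fields wrapped in a top-level "metadata" key.
    return {"metadata": zenodo}


example = {
    "@type": "SoftwareSourceCode",
    "name": "escape metadata template",
    "keywords": "EOSC",
    "license": "https://spdx.org/licenses/MIT",
}
print(json.dumps(sketch_codemeta_to_zenodo(example), indent=4, sort_keys=True))
```

The printed structure has the same shape as the `.zenodo.json` file written by the script: a single `metadata` object holding the Zenodo keys.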
## Optional
## Metadata schema templates
- Digital Object Identifier (doi)
- publication date (if already published elsewhere, leave empty otherwise)
- type of publication: source, compiled, container
- grant/funding
- contributors
- references
- language: programming language
- dependencies: external dependencies (including matching versions)
- os: operating system
- compiler: compilation environment
- hardware requirements
- general use case (HPC, server, local desktop)
- CPU, RAM, HDD/SSD requirements
- GPU requirements
- multi-threading: true or false
- container: dependency (including matching version)
Inside the `codemeta_utils` directory you will find two template files with **all the terms of the corresponding metadata schema context**
for both the CodeMeta metadata file and the Zenodo metadata file.
Feel free to create and incorporate the metadata files starting from these templates. However, please note that
the final filenames **MUST** be either `codemeta.json` or `.zenodo.json` (note the leading `.`). If you do not fill in a key field, remove it from the file.
In case of doubt, please also check:
- The [CodeMeta terms description](https://codemeta.github.io/terms/) or,
- the [`metadata representation`](https://developers.zenodo.org/#representation) allowed for the `.zenodo.json` metadata file.
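Removing unfilled keys by hand is error-prone with templates this long. A minimal sketch of one way to do it is shown here; the helper name `drop_unfilled` is hypothetical and is not part of `codemeta_utils`. It only removes empty strings, `None`, and empty lists or dictionaries, so placeholders such as `"YYYY-MM-DD"` or entries that contain only an `"@type"` still have to be edited or removed manually.

```python
EMPTY_VALUES = ("", None, [], {})


def drop_unfilled(value):
    """Recursively remove empty strings, None, and empty lists/dicts."""
    if isinstance(value, dict):
        cleaned = {key: drop_unfilled(item) for key, item in value.items()}
        return {key: item for key, item in cleaned.items() if item not in EMPTY_VALUES}
    if isinstance(value, list):
        cleaned = [drop_unfilled(item) for item in value]
        return [item for item in cleaned if item not in EMPTY_VALUES]
    return value


template = {
    "name": "my project",
    "citation": "",
    "keywords": [],
    "contributors": [{}],
    "isAccessibleForFree": False,
}
print(drop_unfilled(template))
```

Boolean `False` values are kept on purpose, since `"isAccessibleForFree": false` is a filled field rather than an empty one.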
### Extending the CodeMeta Context schema
If you find that the CodeMeta context does not describe your project in enough depth, you can extend the metadata context
and combine it with all the terms available in [https://schema.org](https://schema.org/docs/full.html).
For this purpose, and following the [CodeMeta developer guide](https://codemeta.github.io/developer-guide/):
1. Modify the `"@context"` key of the `codemeta.json` as follows:
"@context": ["https://raw.githubusercontent.com/codemeta/codemeta/2.0-rc/codemeta.jsonld", "http://schema.org/"]
2. Include the desired terms / properties following the `schema.org` context.
3. Contact us for a likely implementation into the OSSR environment :-)
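The first two steps can also be applied programmatically. The sketch below is only an illustration: the helper `extend_context` is hypothetical, and the `inLanguage` property was picked from `schema.org` purely as an example of an extra term.

```python
CODEMETA_CONTEXT = "https://raw.githubusercontent.com/codemeta/codemeta/2.0-rc/codemeta.jsonld"
SCHEMA_ORG_CONTEXT = "http://schema.org/"


def extend_context(codemeta, extra_terms):
    """Return a copy of `codemeta` with the combined context and the extra terms added."""
    extended = dict(codemeta)  # shallow copy, the input dictionary is left untouched
    extended["@context"] = [CODEMETA_CONTEXT, SCHEMA_ORG_CONTEXT]
    extended.update(extra_terms)
    return extended


record = {
    "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
    "@type": "SoftwareSourceCode",
    "name": "escape metadata template",
}
extended = extend_context(record, {"inLanguage": "en"})
print(extended["@context"])
```

The resulting dictionary can then be written back to `codemeta.json` with `json.dump`.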
## Automate the metadata schema in the OSSR environment
The `ZenodoCI` project contains a copy of the code in this library.
This means that if you have already configured the GitLab-CI pipeline together with the Zenodo repository, the CI
pipeline will take care of creating a `.zenodo.json` file automatically and adding it to the new upload/new
version on Zenodo.
## Keywords list
Please restrict the list of keywords within the `codemeta.json` file to the following.
- CTA
- LSST
- LOFAR
......@@ -62,4 +91,4 @@ Comments are welcome. Open an issue here or email the authors.
- ESO
- Astronomy
- Astroparticle physics
- Particle physics
- Particle physics
\ No newline at end of file
{
"@context": "https://doi.org/10.5063/schema/codemeta-2.0",
"@type": "SoftwareSourceCode",
"name": "escape metadata template",
"description": "A machine-readable metadata template for the ESCAPE repository projects based in the CodeMeta Project and schema; ",
"keywords": "EOSC",
"license": "https://spdx.org/licenses/MIT",
"softwareVersion": "v1.0",
"developmentStatus": "active",
"codeRepository": "https://gitlab.in2p3.fr/escape2020/wp3/escape_metadata_template",
"downloadUrl": "https://gitlab.in2p3.fr/escape2020/wp3/escape_metadata_template/-/releases",
"dateCreated": "2020-03-31",
"datePublished": "2020-03-31",
"dateModified": "2020-11-23",
"isAccessibleForFree": true,
"isPartOf": [
"https://gitlab.in2p3.fr/escape2020",
"https://gitlab.in2p3.fr/escape2020/wp3",
"https://projectescape.eu/"
],
"contIntegration": "https://gitlab.in2p3.fr/escape2020/wp3/escape_metadata_template/-/pipelines",
"issueTracker": "https://gitlab.in2p3.fr/escape2020/wp3/escape_metadata_template/-/issues",
"readme": "https://gitlab.in2p3.fr/escape2020/wp3/escape_metadata_template/-/blob/master/README.md",
"operatingSystem": [
"GNU",
"macOS",
"windows"
],
"programmingLanguage": [{}],
"softwareRequirements": [ ],
"author": [
{
"@type": "Person",
"@id": "https://orcid.org/0000-0002-5686-2078",
"givenName": "Thomas",
"familyName": "Vuillaume",
"email": "vuillaume@lapp.in2p3.fr",
"affiliation": {
"@type": "Organization",
"name": "LAPP, CNRS"
}
},
{
"@type": "Person",
"@id": "https://orcid.org/0000-0003-2224-4594",
"givenName": "Enrique",
"familyName": "Garcia",
"email": "garcia@lapp.in2p3.fr",
"affiliation": {
"@type": "Organization",
"name": "LAPP, CNRS"
}
}
],
"funder":[
{
"@type": "Organization",
"@id": "https://doi.org/10.13039/501100000780",
"name": "European Commission"
},
{
"@type": "Organization",
"name": "ESCAPE: European Science Cluster of Astronomy & Particle physics ESFRI research infrastructures",
"funder": {
"@type": "Organization",
"@id": "https://doi.org/10.13039/501100000780",
"name": "European Commission"
}
}
],
"funding": "824064"
}
\ No newline at end of file
from .codemeta_to_zenodo_json import *
{
"@context": "https://doi.org/10.5063/schema/codemeta-2.0",
"@type": "SoftwareSourceCode",
"applicationCategory": "",
"applicationSubCategory": "",
"author": [
{
"@type": "Person",
"@id": "EXAMPLE https://orcid.org/0000-0000-0000-0000",
"givenName": "Name",
"familyName": "Surename",
"email": "email@organization.org",
"affiliation": {
"@type": "Organization",
"name": "name_of_organization"
}
}
],
"buildInstructions": "",
"citation": "",
"codeRepository": "",
"contIntegration": "",
"contributor": [
{
"@type": "Person"
}
],
"copyrightHolder": "",
"copyrightYear": "",
"creator": "",
"dateCreated": "YYYY-MM-DD",
"dateModified": "YYYY-MM-DD",
"datePublished": "YYYY-MM-DD",
"description": "",
"developmentStatus": "",
"downloadUrl": "",
"editor": "",
"embargoDate": "",
"encoding": "",
"fileFormat": "",
"fileSize": "",
"funder": [
{
"@type": "Organization",
"@id": "https://doi.org/10.13039/501100000780",
"name": "European Commission"
},
{
"@type": "Organization",
"name": "ESCAPE: European Science Cluster of Astronomy & Particle physics ESFRI research infrastructures",
"funder": {
"@type": "Organization",
"@id": "https://doi.org/10.13039/501100000780",
"name": "European Commission"
}
}
],
"funding": "824064",
"hasPart": "",
"identifier": "",
"installUrl": "",
"isAccessibleForFree": "",
"isPartOf": "",
"issueTracker": "",
"keywords": "",
"license": "",
"maintainer": {
"@type": "Person"
},
"memoryRequirements": "",
"name": "",
"operatingSystem": "",
"permissions": "",
"position": "",
"processorRequirements": "",
"producer": "",
"programmingLanguage": [
{
"@type": "EXAMPLE ComputerLanguage",
"name": "EXAMPLE Python",
"url": "EXAMPLE https://www.python.org/"
}
],
"provider": "",
"publisher": "",
"readme": "",
"referencePublication": "",
"relatedLink": "",
"releaseNotes": "",
"runtimePlatform": "",
"sameAs": "",
"softwareHelp": "",
"softwareRequirements": [
{
"@type": "EXAMPLE SoftwareApplication",
"identifier": "EXAMPLE numpy",
"name": "EXAMPLE numpy",
"softwareVersion": "EXAMPLE 1.18"
}
],
"softwareVersion": "",
"sponsor": "",
"storageRequirements": "",
"supportingData": "",
"targetProduct": ""
}
\ No newline at end of file
{
"title": "For details check https://developers.zenodo.org/#representation",
"upload_type": "software",
"access_right": "open",
"publication_date": "YYYY-MM-DD",
"communities": [{"identifier": "escape2020"}],
"grants": [{"id": "10.13039/501100000780::824064"}],
"creators": [
{"name": "Name and Surname",
"affiliation": "Institute, Center",
"orcid": "0000-0000-0000-0000"}
],
"description": "",
"license": "",
"doi": "",
"prereserve_doi": "",
"keywords": [],
"notes": "",
"related_identifiers": "",
"contributors": [{}],
"references": [],
"version": "",
"language": "",
"journal_title": "",
"journal_volume": "",
"journal_issue": "",
"journal_pages": "",
"conference_title": "",
"conference_acronym": "",
"conference_dates": "",
"conference_place": "",
"conference_url": "",
"conference_session": "",
"conference_session_part": "",
"imprint_publisher": "",
"imprint_isbn": "",
"imprint_place": "",
"partof_title": "",
"partof_pages": "",
"thesis_supervisors": "",
"thesis_university": "",
"subjects": "",
"locations": [{}],
"dates": [{}],
"method": ""
}
\ No newline at end of file
# -*- coding: utf-8 -*-
#
# Enrique Garcia. Nov 2020.
# email: garcia 'at' lapp.in2p3.fr
import os
import sys
import json
from pathlib import Path
from distutils.util import strtobool
def parse_person_schema_property(person_property, contributor_field):
    """
    Parse the Person Schema property correctly

    Parameters:
    --------
    person_property: dict
        dictionary codemeta key with a list or a single Person property item.
    contributor_field : str
        contributor type {'editor', 'producer', 'sponsor'} or publisher, although the last one can only happen if
        `upload_type` is publication (NOT SUPPORTED - contact E. Garcia by email).

    Returns:
    --------
    zenodo_person: dict
        dictionary with the correct zenodo syntax for all {author, contributor, maintainer}.
    """
    zenodo_person = {}
    special_contributor_cases = ['editor', 'producer', 'publisher', 'provider', 'sponsor']

    name = person_property['familyName']
    if 'givenName' in person_property:
        name += f', {person_property["givenName"]}'
    zenodo_person['name'] = name

    if "@id" in person_property:
        if 'orcid.org/' in person_property["@id"]:  # "https://orcid.org/0000-0002-5686-2078" format not accepted
            zenodo_person['orcid'] = person_property["@id"].split('orcid.org/')[-1]
        else:
            zenodo_person['orcid'] = person_property["@id"]

    if "affiliation" in person_property:
        zenodo_person['affiliation'] = person_property['affiliation']['name']

    # Parse correctly the contributors
    if contributor_field in special_contributor_cases:
        # Compare strings by value; `is` tests object identity and is unreliable for str
        if contributor_field in ('provider', 'publisher'):
            zenodo_person['type'] = 'Other'
        else:
            try:
                zenodo_person['type'] = person_property["type"]
            except KeyError:
                zenodo_person['type'] = contributor_field

    return zenodo_person
def add_author_metadata(zenodo_file, codemt_file, field):
    """
    Aux function to parse correctly all the authors, contributors and maintainers that can be found at the
    codemeta.json file

    zenodo_file: dict
        metadata dictionary with the zenodo syntax
    codemt_file: list or dict
        metadata dictionary key field with the codemeta syntax
    field: str
        codemeta key field specifying creator {author, contributor, maintainer, creator}, or
        contributors {editor, sponsor, producer, project manager...}
    """
    full_contacts = {}
    creators_fields = ['author', 'creator', 'maintainer', 'contributor']
    contributors_fields = ['editor', 'producer', 'publisher', 'provider', 'sponsor']

    # First create the full contact agenda by field
    if isinstance(codemt_file[field], list):
        for person_property in codemt_file[field]:
            zenodo_person = parse_person_schema_property(person_property, field)
            # 'name' is the only key that MUST be contained in a person_property at least
            full_contacts[zenodo_person['name']] = zenodo_person
    else:
        zenodo_person = parse_person_schema_property(codemt_file[field], field)
        full_contacts[zenodo_person['name']] = zenodo_person

    # Then save each person by field and avoid duplicates
    if field in creators_fields:
        # Authors, creators, maintainers and contributors share the same zenodo key
        zenodo_key = 'creators'
    elif field in contributors_fields:
        zenodo_key = 'contributors'
    else:
        return
    if not full_contacts:
        return

    # setdefault creates the list once, so the first person is no longer skipped
    people = zenodo_file.setdefault(zenodo_key, [])
    known_names = {entry['name'] for entry in people}
    for name, person in full_contacts.items():
        if name not in known_names:  # avoid duplicates, compared by name
            people.append(person)
            known_names.add(name)
def find_matching_metadata(codemeta_json):
    """
    Please note that the following fields are ASSUMED. If they are not correct, change them, or contact us otherwise.
        "access_right": "open"
        "language": "eng"

    param codemeta_json: dict
        already parsed dictionary containing the metadata of the codemeta.json file

    Returns:
    --------
    metadata_zenodo : dict
        dictionary containing the metadata information found at the codemeta.json file but written using the Zenodo
        syntax.
    """
    person_fields = ['author', 'creator', 'maintainer', 'contributor', 'editor', 'producer', 'publisher',
                     'provider', 'sponsor']

    metadata_zenodo = {'language': 'eng',
                       'access_right': 'open'}

    if codemeta_json.get("@type") == "SoftwareSourceCode":
        metadata_zenodo['upload_type'] = 'software'
    else:
        metadata_zenodo['upload_type'] = ''
        print("\nCould not identify the type of schema in the `codemeta.json` file.\n"
              "Thus the 'upload_type' within the `.zenodo.json` file was left EMPTY.\n"
              "Please fill it in yourself - otherwise zenodo will NOT be able to publish your entry.\n")

    if 'name' in codemeta_json:
        metadata_zenodo['title'] = codemeta_json['name']

    if 'description' in codemeta_json:
        metadata_zenodo['description'] = codemeta_json['description']

    # 'version' takes precedence when both keys exist; nothing is set (and no KeyError raised) when neither does
    if 'version' in codemeta_json:
        metadata_zenodo['version'] = codemeta_json['version']
    elif 'softwareVersion' in codemeta_json:
        metadata_zenodo['version'] = codemeta_json['softwareVersion']

    if 'keywords' in codemeta_json:
        if isinstance(codemeta_json['keywords'], list):
            metadata_zenodo['keywords'] = codemeta_json['keywords']
        else:
            metadata_zenodo['keywords'] = [codemeta_json['keywords']]

    if 'license' in codemeta_json:
        metadata_zenodo['license'] = codemeta_json['license'].split('/')[-1]  # TODO to be improved

    if 'releaseNotes' in codemeta_json:
        metadata_zenodo['notes'] = "Release Notes: " + codemeta_json['releaseNotes']

    if 'citation' in codemeta_json:
        metadata_zenodo['references'] = codemeta_json['citation']

    if 'datePublished' in codemeta_json:
        metadata_zenodo['publication_date'] = codemeta_json['datePublished']

    for person in person_fields:
        if person in codemeta_json:
            add_author_metadata(metadata_zenodo, codemeta_json, field=person)

    return metadata_zenodo
def add_compulsory_escape_metadata(json_file):
    """
    Add compulsory information to the .zenodo.json file:
     * zenodo community : ESCAPE2020
     * ESCAPE grant ID (zenodo syntax)

    param json_file: dict
        dictionary containing the .zenodo.json metadata information
    """
    json_file["communities"] = [{"identifier": "escape2020"}]
    json_file["grants"] = [{"id": "10.13039/501100000780::824064"}]
def parse_codemeta_and_write_zenodo_metadata_file(codemeta_filename, zenodo_outname):
    """
    Reads the codemeta.json file and creates a new `.zenodo.json` file. This file will contain the SAME information
    as the codemeta.json file but *** WITH THE ZENODO SYNTAX. ***

    codemeta_filename: str or Path
        path to the codemeta.json file
    zenodo_outname: str or Path
        path and name of the zenodo metadata json file
        NOT TO BE CHANGED. The file must be named `.zenodo.json` and be stored in the root directory of the library.
    """
    with open(codemeta_filename) as infile:
        codemeta_json = json.load(infile)

    metadata_zenodo = find_matching_metadata(codemeta_json)
    add_compulsory_escape_metadata(metadata_zenodo)

    # Correct format for Zenodo
    data = {'metadata': metadata_zenodo}

    with open(zenodo_outname, 'w') as outfile:
        json.dump(data, outfile, indent=4, sort_keys=True)
def query_yes_no(question, default="yes"):
"""
Ask a yes/no question via raw_input() and return their answer.
:param question: str
question to the user