" - thought for large and continues queries of a repository.\n",
" - No token needed to harvest and fetch entries.\n",
" + $-$ Metadata representation of files is provided by the data provider.\n",
" - Using the REST API;\n",
" + $+$ Access to the full entry/record/community information.\n",
" + $-$ An [access token](https://zenodo.org/account/settings/applications/) is needed to communicate with the REST API.\n",
" + $-$ Harvest not optimised for large searches."
]
},
...
...
@@ -127,7 +125,8 @@
"id": "f9ad3584",
"metadata": {},
"source": [
" No token is needed to fetch metadata files provided by Zenodo (the provider). However please note that the **metadata schema representation of the records is chosen by the provider !** \n",
" No token is needed to fetch metadata files provided by Zenodo (the provider). \n",
" However please note that the **metadata schema representation of the records is chosen by the provider !** \n",
" \n",
"Zenodo supports the following schema representations:\n",
" - `DataCite` (various version),\n",
...
...
@@ -164,34 +163,42 @@
"import requests"
]
},
{
"cell_type": "markdown",
"id": "87f186f9",
"metadata": {},
"source": [
"We would need to specify some arguments to reduce the search"
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "076bfee8",
"execution_count": 3,
"id": "5ee3a192",
"metadata": {},
"outputs": [],
"source": [
"token = ''"
"parameters = {'communities': 'escape2020',\n",
" 'size':100}"
]
},
{
"cell_type": "markdown",
"id": "87f186f9",
"id": "e268aef2",
"metadata": {},
"source": [
"We would need to specify some arguments to reduce the search"
"**NOTE** No token is needed to fetch/communicate with the REST API. \n",
"However, you would need to [create one](https://zenodo.org/account/settings/applications/) if you would like to write or publish through the API."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "5ee3a192",
"id": "ddccf248",
"metadata": {},
"outputs": [],
"source": [
"parameters = {'access_token': token,\n",
" 'communities': 'escape2020',\n",
" 'size':100}"
"token = ''"
]
},
{
...
...
%% Cell type:markdown id:b0fcf1bf tags:
<h1><center><font size="36"> How to harvest metadata from Zenodo </font></center></h1>
---------------------
#### Notebook outline
- Zenodo OAI-PMH protocol
- Zenodo REST API
- Explore the REST API answer (payload) with the `requests` library
- Using `eossr` library
- Using `PyZenodo3` library
- Pros and cons of both methods
---------------------
%% Cell type:markdown id:2529eacc tags:
## TL;DR: Pros and cons of each method
- Using OAI-PMH for harvesting:
+ $+$ More efficient harvest:
- faster,
- designed for large and continuous queries of a repository.
- No token needed to harvest and fetch entries.
+ $-$ Metadata representation of files is provided by the data provider.
- Using the REST API;
+ $+$ Access to the full entry/record/community information.
+ $-$ An [access token](https://zenodo.org/account/settings/applications/) is needed to write or publish through the REST API (reading records does not require one).
+ $-$ Harvest not optimised for large searches.
%% Cell type:markdown id:2193adc5 tags:
## OAI-PMH protocol
%% Cell type:markdown id:9bb7e516 tags:
#### - First, have a look at this [tutorial on the protocol](https://indico.cern.ch/event/5710/sessions/108048/attachments/988151/1405129/Simeon_tutorial.pdf).
%% Cell type:markdown id:1bcf7733 tags:
The [OAI-PMH protocol](https://www.openarchives.org/pmh/) uses a base URL plus a special syntax ('verbs') to query a data provider and retrieve the metadata representation(s) of its records.
In the case of Zenodo, the base URL is https://zenodo.org/oai2d.
For example:
- to retrieve all the entries (`verb=ListRecords`)
- belonging to escape2020 community (`set=user-escape2020`)
- in the OAI DataCite metadata representation (`metadataPrefix=oai_datacite`)
No token is needed to fetch metadata files provided by Zenodo (the provider).
However, please note that the **metadata schema representation of the records is chosen by the provider!**
Zenodo supports the following schema representations:
- `DataCite` (various versions),
- `Dublin Core`,
- `MARC21`,
- However, it **does not provide** metadata under the `codemeta.json` schema.
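
As an illustration of the query described above, the following minimal sketch assembles the request URL from the base URL, the verb and its arguments. The helper names `build_oai_url` and `list_record_identifiers` are hypothetical, and the standard library (`urllib`) is used here instead of `requests` so that the sketch has no third-party dependency. The XML namespace is the one defined by the OAI-PMH 2.0 specification.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

OAI_BASE_URL = "https://zenodo.org/oai2d"  # base URL quoted in the text above
OAI_NAMESPACE = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def build_oai_url(verb, **arguments):
    """Assemble an OAI-PMH request URL (hypothetical helper)."""
    query = urlencode({"verb": verb, **arguments})
    return f"{OAI_BASE_URL}?{query}"


def list_record_identifiers(url, timeout=30):
    """Fetch one page of an OAI-PMH answer and return its record identifiers.

    Hypothetical helper: it needs network access and does not follow
    resumption tokens, so it only covers the first page of results.
    """
    with urlopen(url, timeout=timeout) as response:
        root = ET.fromstring(response.read())
    return [
        element.text
        for element in root.findall(".//oai:header/oai:identifier", OAI_NAMESPACE)
    ]


# Build the URL for the three-part example listed above.
url = build_oai_url(
    "ListRecords",
    set="user-escape2020",
    metadataPrefix="oai_datacite",
)
print(url)
```

Calling `list_record_identifiers(url)` would then send the request. No token is involved at any point.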
%% Cell type:markdown id:c6a47567 tags:
# Query Zenodo's records through its REST API
%% Cell type:code id:26424a79 tags:
``` python
# pip install requests
```
%% Cell type:code id:e7a84906 tags:
``` python
import requests
```
%% Cell type:code id:076bfee8 tags:
``` python
token=''
```
%% Cell type:markdown id:87f186f9 tags:
We need to specify some query parameters to narrow the search.
%% Cell type:code id:5ee3a192 tags:
``` python
parameters = {'communities': 'escape2020',
              'size': 100}
```
%% Cell type:markdown id:e268aef2 tags:
**NOTE** No token is needed to fetch/communicate with the REST API.
However, you would need to [create one](https://zenodo.org/account/settings/applications/) if you would like to write or publish through the API.
%% Cell type:code id:ddccf248 tags:
``` python
token=''
```
%% Cell type:markdown id:4cd8011b tags:
## Example with the `requests` lib - How to recover all ESCAPE2020 community records?
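
A minimal sketch of such a query is given here, under a few assumptions: the records endpoint `https://zenodo.org/api/records` is not quoted in this notebook, the helper names are hypothetical, and the standard library is swapped in for `requests` to keep the sketch dependency-free (with `requests`, the equivalent call would be `requests.get(RECORDS_ENDPOINT, params=parameters).json()`). The sample payload is made up for illustration and only mimics the nesting of the JSON answer.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: Zenodo's records search endpoint (not quoted in the notebook).
RECORDS_ENDPOINT = "https://zenodo.org/api/records"


def build_query_url(parameters):
    """Encode the search parameters into the request URL (hypothetical helper)."""
    return f"{RECORDS_ENDPOINT}?{urlencode(parameters)}"


def fetch_payload(parameters, timeout=30):
    """Send the query and decode the JSON answer; requires network access."""
    with urlopen(build_query_url(parameters), timeout=timeout) as response:
        return json.load(response)


def list_titles(payload):
    """Return the title of every record found in a search payload."""
    return [hit["metadata"]["title"] for hit in payload["hits"]["hits"]]


parameters = {"communities": "escape2020", "size": 100}
query_url = build_query_url(parameters)
print(query_url)

# Hypothetical payload, reduced to the keys used by list_titles.
sample_payload = {
    "hits": {
        "hits": [{"metadata": {"title": "Example record"}}],
        "total": 1,
    }
}
titles = list_titles(sample_payload)
print(titles)
```

Because no token is needed for reading, `fetch_payload(parameters)` can be called as is. The `size` parameter caps the number of records returned per page.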
{'affiliation': 'CSC-IT Center for Science', 'name': 'Liinamaa, Iiris'},
{'affiliation': 'CSC-IT Center for Science', 'name': 'Märkälä, Anu'},
{'affiliation': 'Athena Research Center',
'name': 'Marinos-Kouris, Christos'},
{'affiliation': 'GO FAIR Foundation',
'name': 'Meerman, Bert',
'orcid': '0000-0002-0071-2660'},
{'affiliation': 'TU Wien',
'name': 'Saurugger, Bernd',
'orcid': '0000-0001-5730-3983'},
{'affiliation': 'Trust-IT Services',
'name': 'Smith, Zachary',
'orcid': '0000-0002-9984-008X'}],
'description': '<p>The EOSC Symposium 2021 provided a key engagement opportunity for the EOSC community after the European Open Science Cloud finally entered its highly-anticipated implementation phase in 2021. Delivered online to just under 1,000 EOSC stakeholders from over 63 different countries, this was not only the largest EOSC Symposium yet, but it was also an essential opportunity for convergence and alignment on principles and priorities.</p>\n\n<p>The EOSC Association will play an important role in this phase. With already over 210 member and observer organisations from across Europe, the Association represents a single voice for the advocacy and representation of the broader EOSC Stakeholder community in Europe, promoting alignment of EU research policy and priorities.</p>\n\n<p>The Association will continuously develop the EOSC Strategic Research and Innovation Agenda (SRIA) which will influence future EOSC activities at institutional, national and EU level (including the EOSC-related work programmes in Horizon Europe). This living document will adapt to the changing EOSC ecosystem and the needs of EOSC stakeholders. The Association is setting up a series of Advisory Groups (AG) with Task Forces (TF) to engage with the EOSC community around priority areas, namely:</p>\n\n<ul>\n\t<li>Implementation of EOSC</li>\n\t<li>Metadata and Data Quality</li>\n\t<li>Research Careers and Curricula</li>\n\t<li>Sustaining EOSC</li>\n\t<li>Technical Challenges on EOSC</li>\n</ul>\n\n<p>The Symposium was the first opportunity for the Association to present the draft charters of the Task Forces. A key objective of the event was also for the Association to understand what work has been carried out, is in progress, or is planned on the topics of the AGs and TFs. A call for contributions ran throughout May 2021, with a total of 137 applications received. 
Through presentations, lightning talks, and panels, over 70 community members were able to highlight key findings and recommendations for the AGs and TFs to take into consideration for their work.</p>',
'description': '<p>In this release the major features added are:</p>\n<ul>\n<li><p>an exponential cutoff power-law for the electron spectra;</p>\n</li>\n<li><p>the possibility to compute the gamma-gamma opacity for misaligned sources (<code>viewing angle != 0</code>) for the following targets: point source behind the jet, BLR and the DT.</p>\n</li>\n</ul>',
All these methods are implemented in the [Zenodo client](https://gitlab.in2p3.fr/escape2020/wp3/eossr/-/blob/master/eossr/api/zenodo.py) (a REST API handler) of the [eossr library](https://gitlab.in2p3.fr/escape2020/wp3/eossr).
The library also automates the project's uploads from GitLab to Zenodo (using GitLab CI and the REST API handler).
'creators': [{'affiliation': "Institut de Física d'Altes Energies (IFAE), The Barcelona Institute of Science and Technology, Campus UAB, 08193 Bellaterra (Barcelona), Spain",
'name': 'Cosimo Nigro',
'orcid': '0000-0001-8375-1907'},
{'affiliation': 'University of Lodz, Faculty of Physics and Applied Informatics, Department of Astrophysics, 90-236 Lodz, Poland',
'name': 'Julian Sitarek',
'orcid': '0000-0002-1659-5374'},
{'affiliation': 'University of Lodz, Faculty of Physics and Applied Informatics, Department of Astrophysics, 90-236 Lodz, Poland',
'name': 'Paweł Gliwny',
'orcid': '0000-0002-4183-391X'},
{'affiliation': "Laboratoire d'Annecy de Physique des Particules, Univ. Grenoble Alpes, Univ. Savoie Mont Blanc, CNRS, LAPP, 74000 Annecy, France",
'name': 'David Sanchez'},
{'affiliation': 'Minnesota State University Moorhead, Moorhead, Minnesota, US',
'name': 'Matthew Craig',
'orcid': '0000-0002-4183-391X'}],
'description': "This repository contains the scripts to generate the figures included in the paper 'agnpy: an open-source python package modelling the radiative processes of jetted active galactic nuclei'.",
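
The `creators` entries printed above are lists of dictionaries with a `name`, an `affiliation` and an optional `orcid`. As a rough sketch of how such metadata could be post-processed, the hypothetical helper `format_creators` below builds one display line per creator and tolerates a missing ORCID. The two input entries are copied from the output above.

```python
def format_creators(creators):
    """Return one 'name (ORCID: ...)' line per creator (hypothetical helper)."""
    lines = []
    for creator in creators:
        orcid = creator.get("orcid")  # the key is absent for some creators
        suffix = f" (ORCID: {orcid})" if orcid else ""
        lines.append(f"{creator['name']}{suffix}")
    return lines


creators = [
    {
        "affiliation": "University of Lodz, Faculty of Physics and Applied "
                       "Informatics, Department of Astrophysics, 90-236 Lodz, Poland",
        "name": "Julian Sitarek",
        "orcid": "0000-0002-1659-5374",
    },
    {
        "affiliation": "Laboratoire d'Annecy de Physique des Particules, "
                       "Univ. Grenoble Alpes, Univ. Savoie Mont Blanc, CNRS, LAPP, "
                       "74000 Annecy, France",
        "name": "David Sanchez",
    },
]

formatted = format_creators(creators)
for line in formatted:
    print(line)
```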