Commit 5668b350 authored by LE GAC Renaud

Update the API documentation.

parent 1f65e815
invenio_tools.recordpubli.RecordPubli.year
==========================================
.. currentmodule:: invenio_tools.recordpubli
.. automethod:: RecordPubli.year
\ No newline at end of file
invenio_tools.recordthesis.RecordThesis.these_town
==================================================
.. currentmodule:: invenio_tools.recordthesis
.. automethod:: RecordThesis.these_town
\ No newline at end of file
......@@ -2,14 +2,19 @@
harvest_tools
-------------
The *harvest_tools* package contains all classes to harvester the invenio
The *harvest_tools* package contains all classes to harvest the invenio
store and to load the publication in the database.
The base class is :class:`.Automaton`.
All the other classes inherited from it.
Inherited classes specialise the work for :class:`.Articles`, :class:`.Notes`,
:class:`.Preprints`, :class:`.Proceedings`, :class:`.Reports`,
:class:`.Talks` and :class:`.Thesis`.
All the specialised classes inherit from it:
* :class:`.Articles`
* :class:`.Notes`
* :class:`.Preprints`
* :class:`.Proceedings`
* :class:`.Reports`
* :class:`.Talks`
* :class:`.Thesis`
The automaton is instantiated by the factory :func:`.build_harvester_tool`,
for a given category of publication.
......
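The factory described above can be pictured with a minimal sketch. Only the names ``Automaton``, ``Articles``, ``Talks`` and ``build_harvester_tool`` come from the documentation; the constructor arguments, the registry and the category strings are assumptions made for illustration, since the real signatures are not shown in this diff.

```python
class Automaton(object):
    """Stand-in for the base class; the real one does the harvesting."""

    def __init__(self, category):
        self.category = category


class Articles(Automaton):
    """Stand-in for the specialised class handling articles."""


class Talks(Automaton):
    """Stand-in for the specialised class handling talks."""


# Hypothetical registry mapping a publication category to its class.
_REGISTRY = {"articles": Articles, "talks": Talks}


def build_harvester_tool(category):
    """Return the automaton matching the category (assumed signature)."""
    try:
        cls = _REGISTRY[category]
    except KeyError:
        raise ValueError("unknown category: %s" % category)
    return cls(category)


tool = build_harvester_tool("talks")
```

A registry keeps the factory free of ``if``/``elif`` chains, so adding a category only means adding one entry.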
......@@ -8,7 +8,6 @@ store and to retrieve the publications according to user search criteria.
The main classes are:
* :class:`.InvenioStore` to search publications in the store,
* :class:`.Marc12` to instantiate the record associated to a publication.
* :class:`.RecordPubli`, :class:`.RecordConf`, :class:`.RecordThesis`
Constants
......@@ -20,14 +19,20 @@ Constants
~base.ARXIV
~base.ARXIV_PDF
~base.MSG_INV_CONF
~base.MSG_INV_CONF_KEY
~base.MSG_NO_CONF
~base.MSG_NO_CONF_ID_KEY
~base.MSG_NO_COUNTRY
~base.MSG_NO_PUBLISHER
~base.MSG_NO_THESIS
~base.MSG_WELL_FORMED_COLLABORATION
~base.OAI
~base.OAI_URL
~base.REG_ARXIV_NUMBER
~base.REG_AUTHOR
~base.REG_DATE
~base.REG_CONF
~base.REG_OAI
~base.REG_YEAR
~base.THESIS_DIR
......@@ -53,8 +58,6 @@ Classes
:toctree: generated/
~inveniostore.InvenioStore
~iterrecord.IterRecord
~marc12.Marc12
~record.Record
~recordconf.RecordConf
~recordinst.RecordInst
......
......@@ -491,7 +491,7 @@ class Automaton(object):
return 0
def process_collection(self, collection):
""""Retrieve JSON objects from the invenio store and for the given
"""Retrieve JSON objects from the invenio store and for the given
collection. Corresponding records are inserted in the database.
Args:
......@@ -500,8 +500,8 @@ class Automaton(object):
Note:
* Designed to never stop, even when exceptions are raised
* Have a look to the collection_logs and logs in order to
understand what happen.
* Have a look at the attributes ``collection_logs`` and ``logs``
in order to understand what happened.
"""
if self.dbg:
......@@ -605,15 +605,16 @@ class Automaton(object):
"""Process the publication identified by its record identifier:
* get the publication data from the store using its identifier
* instantiate the record (RecordPubli, REcordConf, RecordThesis)
* instantiate the record: ``RecordPubli``, ``RecordConf``
or ``RecordThesis``
* process OAI data
* check the record
* insert new record in the database
Note:
* Designed to never stop, even when exceptions are raised
* Have a look to the collection_logs and logs in order to
understand what happen.
* Have a look at the attributes ``collection_logs`` and ``logs`` in
order to understand what happened.
Args:
rec_id (int):
......@@ -647,8 +648,8 @@ class Automaton(object):
Note:
* Designed to never stop, even when exceptions are raised
* Have a look to the collection_logs and logs in order to
understand what happen.
* Have a look at the attributes ``collection_logs`` and ``logs``
in order to understand what happened.
Args:
host (unicode):
......@@ -683,7 +684,7 @@ class Automaton(object):
dict:
* ``collection_logs`` list of :class:`MsgCollection`
* ``controller`` unicode
* ``logs`` list of :class:Msg
* ``logs`` list of :class:`Msg`
* ``selector`` :class:`plugin_dbui.Selector`
"""
......
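The "never stop, inspect the logs afterwards" behaviour noted in the ``Automaton`` docstrings can be sketched as follows. This is an illustration under assumptions, not the code of ``Automaton``: the ``Msg`` class below is a simplified stand-in, and ``process_all`` and ``fake_process`` are hypothetical helpers.

```python
class Msg(object):
    """Hypothetical log entry; the real Msg class is not shown here."""

    def __init__(self, rec_id, text):
        self.rec_id = rec_id
        self.text = text


def process_all(rec_ids, process_one):
    """Process every record, logging failures instead of stopping."""
    logs = []
    for rec_id in rec_ids:
        try:
            process_one(rec_id)
        except Exception as exc:
            # Keep going: the failure is recorded for later inspection.
            logs.append(Msg(rec_id, str(exc)))
    return logs


def fake_process(rec_id):
    """Toy processor that rejects records with an odd identifier."""
    if rec_id % 2:
        raise ValueError("record %d is not well formed" % rec_id)


logs = process_all([1, 2, 3], fake_process)
```

The caller reads ``logs`` once the loop is over, which mirrors the advice to look at ``collection_logs`` and ``logs`` to understand what happened.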
......@@ -329,6 +329,7 @@ class CheckAndFix(object):
def authors(self, record):
"""Check that:
* author fields are defined.
* first author is not like ATLAS Collaboration
......@@ -359,10 +360,8 @@ class CheckAndFix(object):
Raises:
CheckException:
* the collaboration is unknown
(neither collaboration nor synonym)
* the collaboration is unknown (neither collaboration nor synonym)
* more than one synonym found.
"""
if self.dbg:
print "\t\tCheck collaboration"
......@@ -406,18 +405,6 @@ class CheckAndFix(object):
* the country is unknown (neither country nor synonym)
* more than one synonym found.
"""
"""Check conference country.
Have a look to the synonyms when the country does not exist.
Args:
record (RecordConf):
record describing a talk or a proceeding.
Raises:
CheckException:
the country is not defined nor entered as a synonym.
"""
if self.dbg:
print "\t\tCheck country"
......@@ -455,7 +442,7 @@ class CheckAndFix(object):
raise CheckException(*e.args)
def conference_date(self, record):
"""Check conference date and format it properly.
"""Check that the conference date exists and is well formatted.
Args:
record (RecordConf):
......@@ -562,8 +549,8 @@ class CheckAndFix(object):
fmt (str):
define the format for author names.
Possible values are "First, Last", "F. Last", "Last",
"Last, First" and "Last F."
Possible values are ``First, Last``, ``F. Last``, ``Last``,
``Last, First`` and ``Last F.``
"""
if self.dbg:
......@@ -778,7 +765,7 @@ class CheckAndFix(object):
sort authors by family name when true otherwise use the
order of authors at the creation of the record
Return
Returns:
str:
* the found affiliation
* an empty string when the rescue list is used.
......@@ -900,7 +887,8 @@ class CheckAndFix(object):
record describing a publication.
Raises:
CheckException:
CheckException::
* the publisher is unknown (neither abbreviation nor synonym)
* more than one synonym found.
......@@ -939,7 +927,8 @@ class CheckAndFix(object):
record describing a publication.
Raises:
CheckException:
CheckException::
* the date is not well formed
* more than one date are found.
......
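The five author name formats listed in the ``fmt`` argument above can be illustrated with a small sketch. The function ``format_author`` is hypothetical and is not the method of ``CheckAndFix``; it assumes a single first name and ignores compound or hyphenated names.

```python
def format_author(first_name, last_name, fmt="Last, First"):
    """Render one author name in one of the documented formats."""
    # Initial followed by a dot, or nothing when the first name is empty.
    initial = "%s." % first_name[0] if first_name else ""

    formats = {
        "First, Last": "%s, %s" % (first_name, last_name),
        "F. Last": "%s %s" % (initial, last_name),
        "Last": last_name,
        "Last, First": "%s, %s" % (last_name, first_name),
        "Last F.": "%s %s" % (last_name, initial),
    }

    if fmt not in formats:
        raise ValueError("unknown format: %s" % fmt)

    return formats[fmt]
```

For example, ``format_author("Albert", "Einstein", "Last F.")`` gives ``Einstein A.``.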
......@@ -50,7 +50,8 @@ def load_record(host, record_id):
either RecordPubli, RecordInst, RecordConf or RecordThesis.
Raises:
CdsException:
CdsException::
* the server returns an HTTP error.
* no JSON object could be decoded.
......
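The second failure mode of ``load_record``, a payload that is not valid JSON, can be sketched without any network access. The ``CdsException`` class below is a stand-in for the one named in the docstring, and ``decode_record`` is a hypothetical helper, not part of the package.

```python
import json


class CdsException(Exception):
    """Stand-in for the exception named in the docstring."""


def decode_record(payload):
    """Decode the body returned by the store into a dictionary."""
    try:
        return json.loads(payload)
    except ValueError:
        # json raises ValueError (or a subclass) on malformed input.
        raise CdsException("no JSON object could be decoded")
```

Wrapping the low-level error in one exception type lets callers handle HTTP and decoding failures in a single ``except`` clause.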
......@@ -114,11 +114,11 @@ def is_thesis(recjson):
"""True when the record describes a thesis.
Args:
record (Record): MARC12 record associated to a publication
or to and institute.
recjson (dict):
record associated to a publication or to an institute.
Return:
bool: ``True`` when the MARC record describes a thesis.
bool: ``True`` when the record describes a thesis.
"""
# THESIS in collection
......
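The comment ``# THESIS in collection`` suggests a test on the ``collection`` field. A minimal sketch follows, assuming that ``collection`` holds one dictionary or a list of dictionaries with a ``primary`` subfield; that layout is an assumption and the function is named ``is_thesis_sketch`` to avoid suggesting it is the real implementation.

```python
def is_thesis_sketch(recjson):
    """True when one of the collections is tagged as a thesis."""
    collections = recjson.get("collection", [])

    # A single collection may be stored as a dict rather than a list.
    if isinstance(collections, dict):
        collections = [collections]

    for entry in collections:
        primary = entry.get("primary") or ""
        if "THESIS" in primary.upper():
            return True

    return False
```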
......@@ -371,7 +371,8 @@ class InvenioStore(object):
* The list is empty when the request failed on the server.
Raises:
CdsException:
CdsException::
* keyword argument is invalid;
* the server returns an HTTP error;
* JSON object can't be decoded
......@@ -430,10 +431,11 @@ class InvenioStore(object):
Returns:
dict:
the record data (MarcJSON).
the record data (recjson).
Raises:
CdsException:
CdsException::
* the server returns an HTTP error.
* no JSON object could be decoded.
......
......@@ -16,53 +16,95 @@ class Record(dict):
record[field] = [dict1(subfield1=..., subfield2=...),
dict2(subfield1=..., subfield2=...), ...]
for an article, typical field ares (cds 1951625, ins 1319638)::
For an article, typical fields are (cds 1951625, ins 1319638, *etc.*):
+-----------------------------+-----------------------------+
| field (cds) | field (inspirehep) |
+-----------------------------+-----------------------------+
+=============================+=============================+
| | FIXME_OAI |
+-----------------------------+-----------------------------+
| abstract | abstract |
+-----------------------------+-----------------------------+
| accelerator_experiment | accelerator_experiment |
+-----------------------------+-----------------------------+
| agency_code | |
+-----------------------------+-----------------------------+
| authors | authors |
+-----------------------------+-----------------------------+
| base | |
+-----------------------------+-----------------------------+
| collection | collection |
+-----------------------------+-----------------------------+
| comment | comment |
+-----------------------------+-----------------------------+
| copyright_status | |
+-----------------------------+-----------------------------+
| corporate_name | corporate_name |
+-----------------------------+-----------------------------+
| creation_date | creation_date |
+-----------------------------+-----------------------------+
| doi | doi |
+-----------------------------+-----------------------------+
| email_message | |
+-----------------------------+-----------------------------+
| filenames | filenames |
+-----------------------------+-----------------------------+
| files | files |
+-----------------------------+-----------------------------+
| filetypes | filetypes |
+-----------------------------+-----------------------------+
| imprint | imprint |
+-----------------------------+-----------------------------+
| keywords | keywords |
+-----------------------------+-----------------------------+
| language | |
+-----------------------------+-----------------------------+
| license | license |
+-----------------------------+-----------------------------+
| number_of_authors | number_of_authors |
+-----------------------------+-----------------------------+
| number_of_citations | number_of_citations |
+-----------------------------+-----------------------------+
| number_of_comments | number_of_comments |
+-----------------------------+-----------------------------+
| number_of_reviews | number_of_reviews |
+-----------------------------+-----------------------------+
| oai | |
+-----------------------------+-----------------------------+
| other_report_number | |
+-----------------------------+-----------------------------+
| persistent_identifiers_keys | persistent_identifiers_keys |
+-----------------------------+-----------------------------+
| physical_description | physical_description |
+-----------------------------+-----------------------------+
| prepublication | prepublication |
+-----------------------------+-----------------------------+
| primary_report_number | primary_report_number |
+-----------------------------+-----------------------------+
| publication_info | publication_info |
+-----------------------------+-----------------------------+
| recid | recid |
+-----------------------------+-----------------------------+
| | reference |
+-----------------------------+-----------------------------+
| report_number | |
+-----------------------------+-----------------------------+
| | source_of_acquisition |
+-----------------------------+-----------------------------+
| status_week | |
+-----------------------------+-----------------------------+
| subject | subject |
+-----------------------------+-----------------------------+
| system_control_number | system_control_number |
+-----------------------------+-----------------------------+
| thesaurus_terms | thesaurus_terms |
+-----------------------------+-----------------------------+
| title | title |
+-----------------------------+-----------------------------+
| | title_additional |
+-----------------------------+-----------------------------+
| url | |
+-----------------------------+-----------------------------+
| version_id | version_id |
+-----------------------------+-----------------------------+
......@@ -207,7 +249,8 @@ class Record(dict):
"""The Open Archive Initiative identifier URL(s).
Returns:
str: the primary and secondary URLs are separated by a comma.
unicode:
the primary and secondary URLs are separated by a comma.
The pattern of the URL is ``http://host/record/id`` or
an empty string when it is not defined or when the OAI is
not well formed.
......
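The URL pattern ``http://host/record/id`` mentioned in the docstring above can be derived from an OAI identifier as in the sketch below. The identifier layout ``oai:host:id`` and the regular expression are assumptions; the package has its own ``REG_OAI`` constant, whose definition is not shown in this diff.

```python
import re

# Hypothetical pattern: "oai:<host>:<numeric id>".
REG_OAI_SKETCH = re.compile(r"^oai:([a-zA-Z0-9.\-]+):(\d+)$")


def oai_to_url(oai):
    """Return the record URL, or an empty string when the OAI is invalid."""
    match = REG_OAI_SKETCH.match(oai or "")
    if match is None:
        return ""
    return "http://%s/record/%s" % (match.group(1), match.group(2))
```

Returning an empty string rather than raising matches the documented behaviour for an OAI that is missing or not well formed.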
......@@ -8,19 +8,20 @@ from .recordpubli import RecordPubli
class RecordConf(RecordPubli):
"""The record describing a conference talk or a proceeding.
Additional field describing the conference data are::
Additional fields describing the conference data are:
+----------------+-----------------------------------------------+
| field | subfield |
+----------------+-----------------------------------------------+
+================+===============================================+
| meeting_name | closing_date, coference_code, country, date, |
| | location, opening_date, year |
+----------------+-----------------------------------------------+
One field is added by limbra:
+----------------+-----------------------------------------------+
| field (limbra) | subfield |
+----------------+-----------------------------------------------+
+================+===============================================+
| meeting_note | recid, url |
+----------------+-----------------------------------------------+
......@@ -77,7 +78,7 @@ class RecordConf(RecordPubli):
Returns:
unicode:
- empty string when not defined
empty string when not defined
"""
# algorithm depends on the store
......
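Since a record maps each field to a list of subfield dictionaries, reading a conference subfield with an empty-string fallback can be sketched as below. ``get_subfield`` is a hypothetical helper and the sample values are invented for illustration; only the field and subfield names come from the table above.

```python
def get_subfield(record, field, subfield):
    """Return the subfield value, or an empty string when not defined."""
    value = record.get(field)

    # Fields are documented as lists of dicts; use the first entry.
    if isinstance(value, list):
        value = value[0] if value else None

    if not isinstance(value, dict):
        return u""

    return value.get(subfield, u"")


# Invented sample data using the documented field names.
record = {"meeting_name": [{"country": "France", "year": "2015"}]}
country = get_subfield(record, "meeting_name", "country")
```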
......@@ -11,35 +11,52 @@ MSG_INVALID_RECORD = "Invalid record, it is not describing an institute"
class RecordInst(Record):
"""The record describing an institute.
Fields are::
Fields are:
+-----------------------------+----------------------------------+
| field (inspirehep) | subfield |
+-----------------------------+----------------------------------+
+=============================+==================================+
| FIXME_OAI | id, set |
+-----------------------------+----------------------------------+
| administrative_history | |
+-----------------------------+----------------------------------+
| authority_institution | institution |
+-----------------------------+----------------------------------+
| cataloguer_info | creation_date, modification_date |
+-----------------------------+----------------------------------+
| collection | primary, secondary |
+-----------------------------+----------------------------------+
| corporate_name | name |
+-----------------------------+----------------------------------+
| creation_date | |
+-----------------------------+----------------------------------+
| files | |
+-----------------------------+----------------------------------+
| filetypes | |
+-----------------------------+----------------------------------+
| number_of_citations | |
+-----------------------------+----------------------------------+
| number_of_comments | |
+-----------------------------+----------------------------------+
| number_of_reviews | |
+-----------------------------+----------------------------------+
| persistent_identifiers_keys | |
+-----------------------------+----------------------------------+
| recid | |
+-----------------------------+----------------------------------+
| source_of_description | note |
+-----------------------------+----------------------------------+
| system_control_number | institute, value |
+-----------------------------+----------------------------------+
| url | |
+-----------------------------+----------------------------------+
| version_id | |
+-----------------------------+----------------------------------+
One field is added by limbra:
+-----------------------------+----------------------------------+
| field (limbra) | subfield |
+-----------------------------+----------------------------------+
+=============================+==================================+
| corporate_note | identifier, futur_identifier, |
| | name |
+-----------------------------+----------------------------------+
......
......@@ -69,62 +69,104 @@ def to_str(x):
class RecordPubli(Record):
"""The record describes an article, preprint, proceeding, report or talk.
The main ``field`` and ``subfield`` are::
+---------------------------------+----------------------------------+
| field | subfield |
+---------------------------------+----------------------------------+
| FIXME_OAI (inspire) | id |
| abstract | |
| accelerator_experiment | |
| agency_code (cds) | |
| authors | INSPIRE_number, affiliation, |
| | control_number, first_name, |
| | full_name, last_name, |
| | relator_name (phd director) |
| base (cds) | |
| collection | |
| comment | |
| copyright_status (cds) | |
| corporate_name | collaboration |
| creation_date | |
| doi | |
| email_message (cds) | |
| filenames | |
| files | comment, description, eformat, |
| | full_name, full_path, magic, |
| | name, path, size, status, |
| | subformat, superformat, type, |
| | url, version |
| filetypes | |
| imprint | |
| keywords | |
| language (cds) | |
| license | |
| number_of_authors | |
| number_of_citations | |
| number_of_comments | |
| number_of_reviews | |
| oai (cds) | value |
| other_report_number (cds) | |
| persistent_identifiers_keys | |
| physical_description | |
| prepublication | date, publisher_name, place |
| primary_report_number | |
| publication_info | pagination, title, volume, year |
| recid | none |
| reference (inspire) | |
| report_number (cds) | internal, report_number |
| source_of_acquisition (inspire) | |
| status_week (cds) | |
| subject | |
| system_control_number | institute, value or canceled |
| thesaurus_terms | |
| title | title |
| title_additional (inspire) | |
| url (cds) | description, url |
| version_id | |
+---------------------------------+----------------------------------+
The main ``field`` and ``subfield`` are:
+---------------------------------+----------------------------------+
| field | subfield |
+=================================+==================================+
| FIXME_OAI (inspire) | id |
+---------------------------------+----------------------------------+
| abstract | |
+---------------------------------+----------------------------------+
| accelerator_experiment | |
+---------------------------------+----------------------------------+
| agency_code (cds) | |
+---------------------------------+----------------------------------+
| authors | INSPIRE_number, affiliation, |
| | control_number, first_name, |
| | full_name, last_name, |
| | relator_name (phd director) |
+---------------------------------+----------------------------------+
| base (cds) | |
+---------------------------------+----------------------------------+
| collection | |
+---------------------------------+----------------------------------+
| comment | |
+---------------------------------+----------------------------------+
| copyright_status (cds) | |
+---------------------------------+----------------------------------+
| corporate_name | collaboration |
+---------------------------------+----------------------------------+
| creation_date | |
+---------------------------------+----------------------------------+
| doi | |
+---------------------------------+----------------------------------+
| email_message (cds) | |
+---------------------------------+----------------------------------+
| filenames | |
+---------------------------------+----------------------------------+
| files | comment, description, eformat, |
| | full_name, full_path, magic, |
| | name, path, size, status, |
| | subformat, superformat, type, |
| | url, version |
+---------------------------------+----------------------------------+
| filetypes | |
+---------------------------------+----------------------------------+
| imprint | |
+---------------------------------+----------------------------------+
| keywords | |
+---------------------------------+----------------------------------+
| language (cds) | |
+---------------------------------+----------------------------------+
| license | |
+---------------------------------+----------------------------------+
| number_of_authors | |
+---------------------------------+----------------------------------+
| number_of_citations | |
+---------------------------------+----------------------------------+
| number_of_comments | |
+---------------------------------+----------------------------------+
| number_of_reviews | |
+---------------------------------+----------------------------------+
| oai (cds) | value |
+---------------------------------+----------------------------------+
| other_report_number (cds) | |
+---------------------------------+----------------------------------+
| persistent_identifiers_keys | |
+---------------------------------+----------------------------------+
| physical_description | |
+---------------------------------+----------------------------------+
| prepublication | date, publisher_name, place |
+---------------------------------+----------------------------------+
| primary_report_number | |
+---------------------------------+----------------------------------+
| publication_info | pagination, title, volume, year |
+---------------------------------+----------------------------------+
| recid | none |
+---------------------------------+----------------------------------+
| reference (inspire) | |
+---------------------------------+----------------------------------+
| report_number (cds) | internal, report_number |
+---------------------------------+----------------------------------+
| source_of_acquisition (inspire) | |
+---------------------------------+----------------------------------+
| status_week (cds) | |
+---------------------------------+----------------------------------+
| subject | |
+---------------------------------+----------------------------------+
| system_control_number | institute, value or canceled |
+---------------------------------+----------------------------------+
| thesaurus_terms | |
+---------------------------------+----------------------------------+
| title | title |
+---------------------------------+----------------------------------+
| title_additional (inspire) | |
+---------------------------------+----------------------------------+
| url (cds) | description, url |
+---------------------------------+----------------------------------+
| version_id | |
+---------------------------------+----------------------------------+
"""
def __init__(self, *args):
......