limbra issues: https://gitlab.in2p3.fr/limbra/limbra/-/issues (2015-11-10T17:18:57+01:00)

Issue #42: Review the error log generated in production 0.8.14
https://gitlab.in2p3.fr/limbra/limbra/-/issues/42 (2015-11-10T17:18:57+01:00, LE GAC Renaud)

* To improve the robustness of the code.
* Review the error log generated in production by version 0.8.14.
* Fix as many errors as possible.
* This might help fix the crashes of the web2py server that happen from time to time. The origin of these crashes is not understood.
* Must be done before release 0.9.0.

Issue #12: Use pandas and the matplotlib library for the graph
https://gitlab.in2p3.fr/limbra/limbra/-/issues/12 (2015-11-06T15:09:35+01:00, LE GAC Renaud)

* A first attempt at creating graphs was introduced in version 0.8.8.1.
* It relies on the *chart package* of the Ext JS library (http://docs.sencha.com/extjs/4.2.1/#!/api)
* The quality of the graphs is poor with regard to our needs.
* The pandas (http://pandas.pydata.org/pandas-docs/stable/) and matplotlib (http://matplotlib.org/) libraries were developed to perform data analysis in Python. They can handle large amounts of data, perform high-level statistical analysis and produce high-quality plots.
* They are available on the server and are used in the `track_events` application.
* Use **pandas** and **matplotlib** instead of the **chart package**.
* Take the code developed in `track_events` as an example. Could the `track_events` part be encapsulated in a web2py plugin?
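
A minimal sketch of a pandas + matplotlib graph, assuming the publications are available as `(year, category)` pairs. The function `publications_per_year_png` is hypothetical and is not taken from `track_events`. It counts publications per year and category with pandas, then renders a stacked bar chart to PNG bytes using matplotlib's non-interactive Agg backend, which suits a server without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # render off-screen, as on a web server
import matplotlib.pyplot as plt
import pandas as pd


def publications_per_year_png(records):
    """Return a PNG stacked bar chart of publication counts per year and category.

    `records` is a hypothetical input: an iterable of (year, category) pairs.
    """
    df = pd.DataFrame(list(records), columns=["year", "category"])
    # one row per year, one column per category, missing combinations set to 0
    counts = df.groupby(["year", "category"]).size().unstack(fill_value=0)

    fig, ax = plt.subplots(figsize=(6, 4))
    counts.plot(kind="bar", stacked=True, ax=ax)
    ax.set_xlabel("year")
    ax.set_ylabel("publications")
    fig.tight_layout()

    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    plt.close(fig)  # free the figure: the server process is long-lived
    return buf.getvalue()
```

A web2py controller could return these bytes with an `image/png` content type in place of the Ext JS chart.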

Issue #40: Unlock the publication field authors_role
https://gitlab.in2p3.fr/limbra/limbra/-/issues/40 (2015-10-29T10:08:30+01:00, LE GAC Renaud)

* Unlock the field `authors_roles`.
* Fill the table `authors_roles` with predefined values.
* Add this item to the publications grid filter.
* Add this item to the list / metric selectors.

Issue #35: Publisher abbreviations have to follow the ISO 4 standard
https://gitlab.in2p3.fr/limbra/limbra/-/issues/35 (2015-10-28T18:06:31+01:00, LE GAC Renaud)

* ISO 4 [abbreviation](http://www.issn.org/services/online-services/access-to-the-ltwa/)
* List of journal abbreviations:
  * [WSL](http://www.wsl.ch/dienstleistungen/publikationen/office/abk_EN)
  * [CASSI](http://cassi.cas.org/search.jsp)
  * [INSPIREHEP](http://inspirehep.net/collection/Journals)
  * Web of Science: the official definition is given in the [InCites Journal Citation Reports](https://jcr.incites.thomsonreuters.com/JCRJournalHomeAction.action?year=&edition=&journal=)

Issue #36: Use list widget for synonym field
https://gitlab.in2p3.fr/limbra/limbra/-/issues/36 (2015-10-19T13:38:25+02:00, LE GAC Renaud)

* Solve the case *ANTARES* versus *ANTARES, TANAMI*.
* Treat ``collaboration`` like ``country`` or ``publishers``: do not enter a new value. The user has to decide what to do, even if the collaboration is well formed. This solves issues like *ATLAS Collaboration* versus *ATLAS Collaborations*.
* In MySQL the field type is `LONGTEXT` (instead of `TEXT`) for `list:string` and `json`.
* ...

Issue #10: Origin field contains a list of values
https://gitlab.in2p3.fr/limbra/limbra/-/issues/10 (2015-10-09T10:49:58+02:00, LE GAC Renaud)

* The same publication can be found in both the `cds.cern.ch` and the `inspirehep.cern.ch` stores:
  - http://cds.cern.ch/record/1951625
  - http://inspirehep.net/record/1319638
* In that case the database field origin has *two* values.
* The harvester mechanism was designed to work with a unique value for the origin field. It has to be modified to work with a list of values.
* Several lists are encoded in the database as a string in which values are separated by commas. The same technique can be used here.
* To be developed once issue #9 is working.
* The list of values can be generated as soon as the record is found.
* CDS:
  - The origin field is at the MARC key `0248 a`.
  - The MARC keys `35 a` and `35 9` contain the origin field value in INSPIREHEP.
* INSPIREHEP:
  - The origin field is at the MARC key `909C0 o`.
  - The MARC keys `35 a` and `35 9` contain the origin field value in CDS.
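
The comma-separated encoding of the origin list could look like the sketch below. It assumes origin values are record URLs that never contain a comma, and it uses `", "` as separator; the helper names are hypothetical. `has_origin` shows how a stored list could be checked against a URL found in a newly harvested record.

```python
def decode_origin(value):
    """Split the comma-separated origin field into a list of URLs."""
    return [url.strip() for url in (value or "").split(",") if url.strip()]


def encode_origin(urls):
    """Join origin URLs into one comma-separated string, keeping order and dropping duplicates."""
    unique = []
    for url in urls:
        url = url.strip()
        if url and url not in unique:
            unique.append(url)
    return ", ".join(unique)


def has_origin(stored_value, url):
    """True if `url` is one of the origins stored in the database field."""
    return url.strip() in decode_origin(stored_value)
```

For the example above, the field would hold `http://cds.cern.ch/record/1951625, http://inspirehep.net/record/1319638`.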

Issue #8: Modify the controller harvesters/run to scan several stores
https://gitlab.in2p3.fr/limbra/limbra/-/issues/8 (2015-10-09T10:49:58+02:00, LE GAC Renaud)

* The parameters of the action ``launch harvester`` are the year (period), the project and the publication category.
* In most cases, these parameters correspond to a single harvester defined in the database table ``harvesters``.
* However, articles (proceedings, talks, ...) can be searched for in both the ``cds.cern.ch`` and the ``inspirehep.net`` stores. In that case, the action parameters correspond to **two** harvesters.
* In the controller harvester/run, the first one is always selected; see line 272: ``row = selector.select(db.harvesters).first()``.
* This restriction has to be removed.

Issue #32: Add the synonym field in the publishers table
https://gitlab.in2p3.fr/limbra/limbra/-/issues/32 (2015-10-07T11:30:04+02:00, LE GAC Renaud)

* The `synonym` field is a `string` containing values separated by commas.
* If a publisher does not exist, look at the synonyms. If nothing matches, reject the record. Later on, the user can add a new publisher or synonym to the database and catch the faulty record.
* Rename the field `publisher`.

Issue #31: Add the synonym field in the country table
https://gitlab.in2p3.fr/limbra/limbra/-/issues/31 (2015-10-07T11:30:04+02:00, LE GAC Renaud)

* The `synonym` field is a `string` containing values separated by commas.
* If a country is not found, look at the synonyms. If nothing matches, reject the record. Later on, the user can add a new country or synonym and catch the faulty record.

Issue #5: Remove obsolete field harvesters.ratio
https://gitlab.in2p3.fr/limbra/limbra/-/issues/5 (2015-10-07T11:30:04+02:00, LE GAC Renaud)

* The field ``harvesters.ratio`` was used when upgrading a conference talk to a conference proceeding. It was designed to tolerate small differences between the title of the talk and the title of the proceeding.
* Since version 0.8.9, talks are no longer upgraded to proceedings. Therefore, this field is obsolete.
* It has to be removed from:
  - the database model
  - the existing databases
  - the user guide

Issue #7: Adapt build_version to the gitlab branch model
https://gitlab.in2p3.fr/limbra/limbra/-/issues/7 (2015-10-03T16:55:38+02:00, LE GAC Renaud)

* The current branch model relies on the branches master, develop, feature, hotfix and release. It was chosen before the migration to GitLab.
* The branch model can be simplified using the GitLab features **issue tracking** and **merge requests**.
* It is based on two stable branches, **master** and **production**.
* Each code modification (bug fix, improvement, ...) starts with an issue.
* For each issue, a feature branch is created. Its name starts with the issue number.
* When the code for the issue is finished, the branch is rebased on the master branch and then merged into master via a *merge request*. The merge request description has to contain the issue number (fixes #14, closes #67, etc.). The issue is closed and the branch deleted when the merge request is accepted.
* When the master branch reaches a point corresponding to a release, it is merged into the production branch via a *merge request*.
* A *hot fix* starts with an issue. It is prepared in a dedicated branch. Once ready, the dedicated branch is merged into master via a *merge request* (conflicts may be solved at that time). The hot fix branch is merged into the production branch once the hot fix works in the master branch. Then the hot fix branch is deleted.
* More details in http://doc.gitlab.com/ee/workflow/gitlab_flow.html
* The script ``build_version.py`` contains options to create the feature, hot fix and release branches. These options are obsolete with the GitLab branch model and have to be removed.
* We might have to keep an option for creating a new release; to be clarified.

Issue #17: Review the logic of the harvester
https://gitlab.in2p3.fr/limbra/limbra/-/issues/17 (2015-09-25T10:45:18+02:00, LE GAC Renaud)

* In the current implementation, we have to deal with the following problem:
  *Records with an ill-defined collaboration field are rejected by the harvester. They are corrected by hand and added to the database using the "edit and insert" action. At the next harvest, these records should not appear in the list of harvested publications in error, since they are already in the database, but they do.*
* In addition, when a record is inserted in the database, it can be rejected by the database engine. This error case is neither detected nor counted.
* Implementing this requires deep modifications of the code. It can only take place once issue #9 is closed and #11 is well advanced.

Issue #16: The harvesters must reject records as soon as possible using their id and the database field origin
https://gitlab.in2p3.fr/limbra/limbra/-/issues/16 (2015-09-21T19:02:09+02:00, LE GAC Renaud)

Issue #30: Get conference data in the Marc12 decoding not in CheckAndFix
https://gitlab.in2p3.fr/limbra/limbra/-/issues/30 (2015-09-20T15:59:43+02:00, LE GAC Renaud)

* For a talk or a proceeding, the record is retrieved and decoded by the `Marc12` class.
* Later on the `CheckAndFix` class adds the conference data.

It would be better to have a complete record as soon as possible, so that all methods are available once a `Record` is ready. The class `CheckAndFix` would then only correct non-conformities.

* [ ] add the method `_get_conference_data` to the `Marc12` class
* [ ] execute the method in `Marc12.__call__`
* [ ] simplify the method `CheckAndFix.conference`
* [ ] modify the `tests/harvester` section (conference records can be checked in the record section).

Issue #24: Better strategy to find the institute identifier
https://gitlab.in2p3.fr/limbra/limbra/-/issues/24 (2015-09-17T19:36:08+02:00, LE GAC Renaud)

* The institute identifier is defined in the application preferences, *reg_institute*.
* It has to match the value used in the invenio store.

A better strategy would be:

* Store the record identifier associated with the institute in the INSPIREHEP database.
* Extract from it the identifier used in INSPIREHEP or CDS and use it in the harvesters.
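
A minimal sketch of the extraction step, assuming the MARCXML of the institute record (e.g. record 902989) has already been fetched from INSPIREHEP; fetching is not shown. It collects the subfields `110u` and `110t` and joins them into one regular expression, `110u|110t`, as described in the decoding notes. The function names are hypothetical, and namespaces are ignored so that both plain and `MARC21/slim` namespaced records are accepted.

```python
import re
import xml.etree.ElementTree as ET


def _local_name(tag):
    """Drop the XML namespace: '{ns}datafield' -> 'datafield'."""
    return tag.rsplit("}", 1)[-1]


def institute_identifiers(marcxml):
    """Values of the subfields 110u and 110t, in document order, without duplicates."""
    values = []
    for field in ET.fromstring(marcxml).iter():
        # `field.tag` is the XML element name, `field.get("tag")` the MARC tag
        if _local_name(field.tag) != "datafield" or field.get("tag") != "110":
            continue
        for sub in field:
            if _local_name(sub.tag) == "subfield" and sub.get("code") in ("u", "t"):
                text = (sub.text or "").strip()
                if text and text not in values:
                    values.append(text)
    return values


def build_reg_institute(identifiers):
    """Compile a regular expression matching any identifier (110u|110t)."""
    if not identifiers:
        # an empty pattern would match every record
        raise ValueError("no institute identifier found in fields 110u / 110t")
    return re.compile("|".join(re.escape(value) for value in identifiers))
```

Escaping each value keeps characters such as `.` or `(` in institute names from being read as regular-expression syntax.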

More on decoding:

* The address of the record is http://inspirehep.net/record/902989
* The institute identifier is defined in the fields `110u` and `110t`. The field `110u` holds the institute id used up to now, while the field `110t` holds the future one.
* Create a regular expression `110u|110t` and store it in the local variable `reg_institute`.
* The name of the preference can be `inspirehep_institute_id`. It contains the record id, *e.g.* `902989`.
* Remove the preference `reg_institute`, which becomes a local variable. It should be built when the first harvester runs.
* A dedicated class, `Institute`, might have to be created.
* Do not forget to update the documentation.

Issue #29: Break the modules invenio_tools and harvest_tools
https://gitlab.in2p3.fr/limbra/limbra/-/issues/29 (2015-09-11T15:06:50+02:00, LE GAC Renaud)

* The module `invenio_tools` is too big. It contains 4 classes and 5 exceptions, and is about 2680 lines long.
* The module `harvest_tools` is also too long. It contains 10 classes and 6 functions, and is about 1970 lines long.
* Transform these modules into packages with the same name. Each package contains several sub-modules, one per class, plus one base sub-module for regular expressions, functions used in several places, ...
* In addition, the error messages are translated (`gluon.current.T`) when the module `invenio_tools` is initialised. Remove this translation; it can be done later on, in the error handler.
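
Deferring the translation could look like the sketch below, written for Python 3. The exception keeps the untranslated message template and its parameters, and only the error handler calls `T`. `ToolException`, `MSG_NO_RECORD`, `get_record` and `report` are hypothetical names; `T` stands for any callable that maps a message to its translation, such as `gluon.current.T` inside a request.

```python
MSG_NO_RECORD = "No record found for identifier %s"  # plain string: no T() at import time


class ToolException(Exception):
    """Exception carrying an untranslated message template and its parameters."""

    def __init__(self, template, *params):
        super().__init__(template % params if params else template)
        self.template = template
        self.params = params


def get_record(store, rec_id):
    """Hypothetical lookup raising an untranslated error."""
    if rec_id not in store:
        raise ToolException(MSG_NO_RECORD, rec_id)
    return store[rec_id]


def report(exc, T):
    """Translate the template first, then insert the parameters."""
    message = T(exc.template)
    return message % exc.params if exc.params else message
```

Translating the template rather than the formatted string keeps the lookup key identical to the text registered in the language files.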

Issue #9: Develop unit tests for the class Record
https://gitlab.in2p3.fr/limbra/limbra/-/issues/9 (2015-09-03T18:44:31+02:00, LE GAC Renaud)

* The harvester is the critical part of this application.
* Procedures have to be developed to make the harvesters robust and to ensure that they work before a new version is released.
* The first step is to develop unit tests for the class ``Record``.
* Use the Python package nose (https://nose.readthedocs.org/en/latest/).
* Create a `test` directory in the `modules` directory.
* Create a file `test_record_article.py`.
* Retrieve a well-known record from a store:
```python
# retrieve a well-known article from the CDS store and decode its MARCXML
from invenio_tools import InvenioStore, Marc12

host = 'cds.cern.ch'
record_id = 1951625

store = InvenioStore(host)
xml = store.get_record(record_id)
record = Marc12(xml)[0]  # first Record decoded from the XML
```
* For each method of the class `Record` that makes sense for the *article* category, write a test function:
```python
def test_collaboration():
    # `record` is the module-level Record retrieved above
    assert record.collaboration() == "LHCb Collaboration"
```
* Develop test files for the other categories: Proceeding, Talk, Report, ...

Issue #23: Improve the interface
https://gitlab.in2p3.fr/limbra/limbra/-/issues/23 (2015-07-15T12:27:57+02:00, LE GAC Renaud)

* [x] Add a node *application*
* [x] Migrate the leaf *properties* to the application node and rename it *préférences*
* [x] Migrate the *CAS* leaves to the *applications* node
* [x] Destroy the *CAS* and *configure application* nodes
* [x] Migrate the leaves *edit and insert* and *insert MARCXML* into the *wizard* node
* [x] For the harvester table and the wizard, use the label *automaton* or *robot* instead of *category* for the field *controller*
* [x] Update the user documentation

Issue #3: Export list using BibTex and CSV format
https://gitlab.in2p3.fr/limbra/limbra/-/issues/3 (2015-05-18T18:25:54+02:00, LE GAC Renaud)

* Some users have asked to export the list in the BibTeX and CSV formats.
* The BibTeX format is described in Leslie Lamport's book *LaTeX: A Document Preparation System* or on the web, *e.g.* http://en.wikipedia.org/wiki/BibTeX.
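
Both exports could start from the sketch below, which uses only the standard library. It assumes each publication is a dictionary whose keys (`author`, `title`, `journal`, ...) are hypothetical and would have to be mapped from the `publications` table; `to_csv` and `to_bibtex` are illustrative names. Only `@article` entries are produced, and LaTeX special characters such as `&` or `%` are not escaped.

```python
import csv
import io

BIBTEX_FIELDS = ("author", "title", "journal", "volume", "pages", "year")


def to_csv(publications, columns):
    """Serialise publication dicts to CSV text with a header row; extra keys are ignored."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(publications)
    return buf.getvalue()


def to_bibtex(publication, key):
    """Format one publication dict as a BibTeX @article entry, skipping empty fields."""
    lines = ["@article{%s," % key]
    for name in BIBTEX_FIELDS:
        value = publication.get(name)
        if value:
            lines.append("  %s = {%s}," % (name, value))
    lines.append("}")
    return "\n".join(lines)
```

Proceedings, talks and reports would need other entry types (e.g. `@inproceedings`, `@techreport`) with their own field lists.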
Assignee: LE GAC Renaud

Issue #2: Show list of feature or release identifier to pick from in build_version.py
https://gitlab.in2p3.fr/limbra/limbra/-/issues/2 (2015-04-24T19:02:43+02:00, MEESSEN Christophe)

When closing a feature or a release with build_version.py, it currently shows the list of all branches.
* It would be more convenient if it shows a list of identifiers only.
* If there is only one feature or release branch, its identifier should be picked as the default value.

Assignee: MEESSEN Christophe
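
The requested behaviour could be sketched as below, assuming branches are named `feature/<identifier>` and `release/<identifier>` in the git-flow style; this naming is an assumption about what `build_version.py` creates. `branch_identifiers` and `pick_identifier` are hypothetical helpers, and `ask` defaults to Python 3's `input` so the prompt can be replaced in tests.

```python
def branch_identifiers(branches, kind):
    """Identifiers of the branches named '<kind>/<identifier>', e.g. 'feature/foo' -> 'foo'."""
    prefix = kind + "/"
    ids = []
    for branch in branches:
        name = branch.strip().lstrip("* ")  # drop the current-branch marker of `git branch`
        if name.startswith(prefix):
            ids.append(name[len(prefix):])
    return sorted(ids)


def pick_identifier(branches, kind, ask=input):
    """Show only the identifiers; propose the single one as the default value."""
    ids = branch_identifiers(branches, kind)
    if not ids:
        raise ValueError("no %s branch found" % kind)
    default = ids[0] if len(ids) == 1 else ""
    hint = " [%s]" % default if default else ""
    answer = ask("%s identifiers: %s\nidentifier%s? " % (kind, ", ".join(ids), hint)).strip()
    answer = answer or default  # empty answer accepts the default
    if answer not in ids:
        raise ValueError("unknown %s identifier: %r" % (kind, answer))
    return answer
```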