limbra issues
https://gitlab.in2p3.fr/groups/limbra/-/issues
2015-09-17T19:36:08+02:00

https://gitlab.in2p3.fr/limbra/limbra/-/issues/24
Better strategy to find the institute identifier
2015-09-17T19:36:08+02:00
LE GAC Renaud

* The institute identifier is defined in the application preferences, *reg_institute*
* It has to match the value used in the invenio store.

A better strategy would be:

* Store the record identifier associated with the institute in the INSPIREHEP database
* Extract from it the identifier used in INSPIREHEP or CDS and use it in the harvesters.

More on decoding:

* The address of the record is http://inspirehep.net/record/902989
* The institute identifier is defined in the fields `110u` and `110t`. The field `110u` holds the institute id used up to now, while the field `110t` holds the future one.
* Create a regular expression `110u|110t` and store it in the local variable `reg_institute`
* The name of the preference can be `inspirehep_institute_id`. It contains the record id, *e.g.* `902989`
* Remove the preference `reg_institute`, which becomes a local variable. It should be constructed when the first harvester runs.
* A dedicated class, `Institute` might have to be created.
* Do not forget to modify the documentation

https://gitlab.in2p3.fr/limbra/limbra/-/issues/23
Improve the interface
2015-07-15T12:27:57+02:00
LE GAC Renaud

* [x] Add a node *application*
* [x] Migrate the leaf *properties* into the application node and rename it *préférences*
* [x] Migrate the *CAS* leaves to the *applications* node
* [x] Destroy the *CAS* and *configure application* nodes
* [x] Migrate the leaves *edit and insert*, *insert MARCXML* into the *wizard* node
* [x] For the harvester table and wizard, use the label *automaton* or *robot* instead of *category* for the field *controller*
* [x] Update the user documentation

https://gitlab.in2p3.fr/limbra/limbra/-/issues/22
Protect harvesters data
2015-07-15T15:04:09+02:00
LE GAC Renaud

* The table harvesters contains the configuration of the harvester automatons. It links one automaton to the publication category, *e.g.* the automaton *article* with the publication category *ACL*
* It is possible to link one automaton to more than one category, *e.g.* ACL and ACLN. This is a problem because the automaton cannot tell the categories apart from the information provided by the invenio store.

To be done:

* [ ] Add a protection, using a callback mechanism, to avoid this case.

https://gitlab.in2p3.fr/limbra/limbra/-/issues/21
Protect data of the application table
2015-07-14T17:44:09+02:00
LE GAC Renaud

* The application table contains a list of configuration parameters with their definitions and values.
* The user can modify the value of each configuration parameter as expected.
* The user can also delete a configuration parameter, modify its name, etc. This is a problem and caused some mistakes during the workshop in June 2015.

To be done:

* [ ] A user cannot add / delete a configuration parameter
* [ ] A user cannot modify the name of a configuration parameter
* [ ] The possible values of each configuration parameter have to be validated
* [ ] It would be better to use a grid configuration object to manipulate this table. This approach might solve all the problems.
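
The protections listed above could look like the following minimal sketch in plain Python 3. The class `ConfigTable`, its methods and the parameter `default_format` are hypothetical names chosen for illustration; they are not part of the application, and a real implementation would have to hook into the database layer.

```python
class ConfigTable:
    """Fixed set of parameters: only values can change, and only if valid."""

    def __init__(self, definitions):
        # definitions maps a parameter name to (initial value, validator).
        self._validators = {name: check for name, (_, check) in definitions.items()}
        self._values = {name: value for name, (value, _) in definitions.items()}

    def get(self, name):
        return self._values[name]

    def set_value(self, name, value):
        # Unknown names are refused, so parameters cannot be added or renamed.
        if name not in self._values:
            raise KeyError("unknown configuration parameter: %s" % name)
        # Each new value goes through the validator of its parameter.
        if not self._validators[name](value):
            raise ValueError("invalid value for %s: %r" % (name, value))
        self._values[name] = value


# Hypothetical parameters, for illustration only.
config = ConfigTable({
    "inspirehep_institute_id": ("902989", lambda v: isinstance(v, str) and v.isdigit()),
    "default_format": ("html", lambda v: v in ("html", "pdf", "tex")),
})
config.set_value("default_format", "pdf")
```

No method deletes or renames a parameter, so the only operation left to the user is changing a value, which is always validated.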
https://gitlab.in2p3.fr/limbra/limbra/-/issues/20
Errors while executing harvest_all
2015-06-26T16:53:07+02:00
MEESSEN Christophe

While executing `harvest_all` using `web2py -d`, the following stack trace was printed on the console.
The Selector options were: 2014 -> 2015, no team, no project, mode:save in database, format:html
Bug present in production 0.8.10.

    Traceback (most recent call last):
      File "applications/track_publications/modules/harvest_tools.py", line 926, in process_url
        self.decode_xml(xml)
      File "applications/track_publications/modules/harvest_tools.py", line 973, in decode_xml
        self.load_db(record)
      File "applications/track_publications/modules/harvest_tools.py", line 1442, in load_db
        publication_url=record.paper_url(),
      File "applications/track_publications/modules/invenio_tools.py", line 2210, in paper_url
        elif 'y' not in el and el['u'].endswith(pdf):
    AttributeError: 'list' object has no attribute 'endswith'
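
The traceback shows `el['u']` arriving as a list rather than a string, which presumably happens when the subfield `u` is repeated in one MARC field. A minimal sketch of a tolerant check follows, written for Python 3. The helper names `as_list` and `find_pdf_url` and the sample data are hypothetical; the actual body of `paper_url` is not shown in this issue.

```python
def as_list(value):
    """Return value as a list, wrapping a single string."""
    if isinstance(value, (list, tuple)):
        return list(value)
    return [value]


def find_pdf_url(elements, pdf=".pdf"):
    """Return the first URL ending with pdf among elements without a 'y' key."""
    for el in elements:
        if "y" in el or "u" not in el:
            continue
        # The subfield may hold one URL or several of them.
        for url in as_list(el["u"]):
            if url.endswith(pdf):
                return url
    return None


# Made-up data: the second element has a repeated 'u' subfield.
elements = [
    {"u": "http://example.org/abstract", "y": "Abstract"},
    {"u": ["http://example.org/paper.ps", "http://example.org/paper.pdf"]},
]
pdf_url = find_pdf_url(elements)
```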
LE GAC Renaud

https://gitlab.in2p3.fr/limbra/limbra/-/issues/19
Errors while executing harvest_all
2015-09-25T11:06:01+02:00
MEESSEN Christophe

While executing `harvest_all` using `web2py -d`, the following stack trace was printed on the console.
The Selector options were: 2014 -> 2015, no team, no project, mode:save in database, format:html
Bug present in production 0.8.10.

    Traceback (most recent call last):
      File "applications/track_publications/modules/harvest_tools.py", line 926, in process_url
        self.decode_xml(xml)
      File "applications/track_publications/modules/harvest_tools.py", line 973, in decode_xml
        self.load_db(record)
      File "applications/track_publications/modules/harvest_tools.py", line 1209, in load_db
        year=year)
      File "applications/track_publications/modules/harvest_tools.py", line 1108, in check_by_fields
        year=year)
      File "applications/track_publications/modules/plugin_dbui/helper.py", line 388, in get_id
        row = table(query)
      File "/home/meessen/mywap/web2py/gluon/dal.py", line 9124, in __call__
        limitby=(0, 1), for_update=for_update, orderby=orderby, orderby_on_limitby=False).first()
      File "/home/meessen/mywap/web2py/gluon/dal.py", line 10749, in select
        return adapter.select(self.query, fields, attributes)
      File "/home/meessen/mywap/web2py/gluon/dal.py", line 1868, in select
        sql = self._select(query, fields, attributes)
      File "/home/meessen/mywap/web2py/gluon/dal.py", line 1767, in _select
        sql_w = ' WHERE ' + self.expand(query) if query else ''
      File "/home/meessen/mywap/web2py/gluon/dal.py", line 1544, in expand
        out = op(first, second, **optional_args)
      File "/home/meessen/mywap/web2py/gluon/dal.py", line 1382, in AND
        return '(%s AND %s)' % (self.expand(first), self.expand(second))
      File "/home/meessen/mywap/web2py/gluon/dal.py", line 1544, in expand
        out = op(first, second, **optional_args)
      File "/home/meessen/mywap/web2py/gluon/dal.py", line 1382, in AND
        return '(%s AND %s)' % (self.expand(first), self.expand(second))
      File "/home/meessen/mywap/web2py/gluon/dal.py", line 1544, in expand
        out = op(first, second, **optional_args)
      File "/home/meessen/mywap/web2py/gluon/dal.py", line 1440, in EQ
        self.expand(second, first.type))
      File "/home/meessen/mywap/web2py/gluon/dal.py", line 1555, in expand
        return str(self.represent(expression, field_type))
      File "/home/meessen/mywap/web2py/gluon/dal.py", line 2005, in represent
        return str(long(obj))
    ValueError: invalid literal for long() with base 10: '|2014|2014|'
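
The failing value `'|2014|2014|'` looks like a list serialized with `|` separators that reaches a query expecting a single integer year. This reading is an assumption based on the traceback alone. A minimal Python 3 sketch of a normalization step, using a hypothetical helper `single_year`, could be:

```python
def single_year(value):
    """Return an int year from an int, a digit string or a '|'-separated string.

    Raises ValueError when the value holds no year or several distinct years.
    """
    if isinstance(value, int):
        return value
    # Drop the empty parts produced by leading and trailing separators.
    years = {int(part) for part in str(value).split("|") if part.strip()}
    if len(years) != 1:
        raise ValueError("expected exactly one year, got %r" % (value,))
    return years.pop()


year = single_year("|2014|2014|")
```

Whether the right fix is to normalize the value as above or to stop the list from being built upstream cannot be decided from the traceback.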
LE GAC Renaud

https://gitlab.in2p3.fr/limbra/limbra/-/issues/18
Inhibit harvesters when the application property reg_institute is not defined
2015-07-15T15:04:09+02:00
LE GAC Renaud

* In the first tests performed by the first batch of documentalists, some harvesters were run while the `reg_institute` property was not defined.
* In that case the harvester works but the list of authors belonging to the institute is wrong.
* **A protection has to be added to refuse to run the harvester when reg_institute is not defined**.

https://gitlab.in2p3.fr/limbra/limbra/-/issues/17
Review the logic of the harvester
2015-09-25T10:45:18+02:00
LE GAC Renaud

* In the current implementation, we have to deal with the following problem:
*Records with a badly defined collaboration field are rejected by the harvester. They are corrected by hand and added to the database using the "edit and insert" action. At the next harvest, these records should not appear in the list of publications harvested with errors, because they are already in the database, but this is not the case.*
* In addition, when a record is inserted into the database, it can be rejected by the database engine. This kind of error is neither detected nor counted.
* Implementing these changes requires a deep modification of the code. It can only take place once issue #9 is closed and #11 is well advanced.

https://gitlab.in2p3.fr/limbra/limbra/-/issues/16
The harvesters must reject as soon as possible record using their id and the database field origin
2015-09-21T19:02:09+02:00
LE GAC Renaud

https://gitlab.in2p3.fr/limbra/limbra/-/issues/15
Use `import datetime` instead of `from datetime import datetime`
2015-05-21T17:21:46+02:00
MEESSEN Christophe

According to this [web2py mailing list discussion](https://groups.google.com/d/msg/web2py/kCBXXqdC3Yo/eddPAepsq9MJ), use of `from datetime import datetime` conflicts with use of `import datetime` and calling `datetime.datetime.now()`. The problem shows up when using the scheduler.
To avoid this conflict, `from datetime import datetime` must be replaced by `import datetime`, and calls to `datetime.XXX` must be replaced with calls to `datetime.datetime.XXX`.
The problem occurs when the imported name is the same as the name of the module. It is a Python problem.
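
The name clash can be reproduced with the standard library alone. The sketch below is written for Python 3; the function `demo_conflict` is a hypothetical helper that confines the conflicting import to a local scope so that the effect can be observed safely.

```python
import datetime

# Recommended form: the class is reached through the module name.
now = datetime.datetime.now()
yesterday = now - datetime.timedelta(days=1)


def demo_conflict():
    """Return the error message produced by the conflicting import."""
    # Inside this function, the name `datetime` now refers to the class.
    from datetime import datetime
    try:
        datetime.datetime.now()
    except AttributeError as exc:
        return str(exc)
    return None


message = demo_conflict()
```

After `from datetime import datetime`, the name `datetime` designates the class, which has no attribute `datetime`, so any code still written as `datetime.datetime.now()` fails.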
MEESSEN Christophe

https://gitlab.in2p3.fr/limbra/limbra/-/issues/14
Use of undefined variable in ProductionTools class
2015-05-22T10:13:28+02:00
MEESSEN Christophe

**This bug is in release 0.8.9 !!**
After a `git co production`, in modules/harvester_tools.py at line 484, the member variable y2 is used but has never been set.
It is the year value to use when `start_year` and `end_year` are not defined.
LE GAC Renaud

https://gitlab.in2p3.fr/limbra/limbra/-/issues/12
Use pandas and the matplotlib library for the graph
2015-11-06T15:09:35+01:00
LE GAC Renaud

* A first attempt to create graphs was introduced in version 0.8.8.1.
* It relies on the *chart package* of the Ext JS library (http://docs.sencha.com/extjs/4.2.1/#!/api)
* The quality of the graphs is poor with respect to what we are doing.
* The pandas (http://pandas.pydata.org/pandas-docs/stable/) and the matplotlib (http://matplotlib.org/) libraries have been developed to perform data analysis in Python. They can manipulate a large amount of data, perform high-level statistical analysis and produce high-quality plots.
* They are available on the server and used in the `track_events` application.
* Use **pandas** and **matplotlib** instead of the **chart package**.
* Take the code developed in `track_events` as an example. It might be possible to encapsulate the `track_events` part into a web2py plugin.
https://gitlab.in2p3.fr/limbra/limbra/-/issues/11
Develop unit test for the module invenio_tools and for the class PublicationTool.
2017-06-18T12:16:06+02:00
LE GAC Renaud

* Making the harvesters robust is the priority.
* The first step is described in #9.
* The second step is to develop unit tests for the main classes of the `invenio_tools` module: `CheckAndFix`, `InvenioStore` and `Marc12`.
* The last step is to develop unit tests for the class `PublicationTool`.

https://gitlab.in2p3.fr/limbra/limbra/-/issues/10
Origin field contains a list of values
2015-10-09T10:49:58+02:00
LE GAC Renaud

* The same publication can be found in the `cds.cern.ch` and in the `inspirehep.net` stores:
- http://cds.cern.ch/record/1951625
- http://inspirehep.net/record/1319638
* In that case the database field origin has *two* values.
* The harvester mechanism was designed to work with a unique value for the origin field. It has to be modified to work with a list of values.
* Several lists are encoded in the database as a string in which values are separated by commas. The same technique can be used in this case.
* To be developed once issue #9 is done.
* The list of values can be generated as soon as the record is found.
* CDS:
- The origin field is at the MARC key `0248 a`
- The MARC keys `35 a` and `35 9` contain the origin field value in INSPIREHEP.
* INSPIREHEP:
- The origin field is at the MARC key `909C0 o`
- The MARC keys `35 a` and `35 9` contain the origin field value in CDS.
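
The comma-separated encoding of the origin field can be sketched as follows in Python 3. The helper names `decode_origin`, `encode_origin` and `add_origin` are hypothetical. The sketch also assumes that the origin values are the record URLs listed above and that a value never contains a comma; neither assumption is confirmed by this issue.

```python
def decode_origin(field):
    """Split the stored string into a list of origin values."""
    return [v.strip() for v in (field or "").split(",") if v.strip()]


def encode_origin(values):
    """Join origin values into the string stored in the database."""
    return ", ".join(values)


def add_origin(field, value):
    """Return the stored string with value appended unless already present."""
    values = decode_origin(field)
    if value not in values:
        values.append(value)
    return encode_origin(values)


origin = add_origin("http://cds.cern.ch/record/1951625",
                    "http://inspirehep.net/record/1319638")
```

With this encoding, a harvester would reject a record when its identifier appears in `decode_origin(field)` rather than when it equals the whole field.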
https://gitlab.in2p3.fr/limbra/limbra/-/issues/9
Develop unit tests for the class Record
2015-09-03T18:44:31+02:00
LE GAC Renaud

* The harvester is the critical part of this application.
* Procedures have to be developed to make the harvesters robust and to ensure that they are working before releasing a new version.
* The first step is to develop unit tests for the class ``Record``
* Use the Python package nose (https://nose.readthedocs.org/en/latest/)
* Create a `test` directory in the `modules` directory.
* Create a file `test_record_article.py`
* Retrieve a well-known record from a store:
```
from invenio_tools import InvenioStore, Marc12
host = 'cds.cern.ch'
record_id = 1951625
store = InvenioStore(host)
xml = store.get_record(record_id)
record = Marc12(xml)[0]
```
* For each method of the class `Record` that makes sense for the *article* category, develop a test function:
```
def test_collaboration():
    assert record.collaboration() == "LHCb Collaboration"
```
* Develop test files for the other categories: Proceeding, Talk, Report, ...

https://gitlab.in2p3.fr/limbra/limbra/-/issues/8
Modify the controller harvesters/run to scan several store
2015-10-09T10:49:58+02:00
LE GAC Renaud

* The parameters of the action ``launch harvester`` are the year (period), the project and the publication category.
* In most cases, these parameters correspond to a single harvester defined in the database table ``harvesters``.
* However, the articles (proceedings, talks, ...) can be looked for in the ``cds.cern.ch`` and in the ``inspirehep.net`` stores. In that case, the action parameters correspond to **two** harvesters.
* In the controller harvesters/run, the first one is always selected, see line 272 ``row = selector.select(db.harvesters).first()``.
* This restriction has to be removed.

https://gitlab.in2p3.fr/limbra/limbra/-/issues/7
Adapt build_version to the gitlab branch model
2015-10-03T16:55:38+02:00
LE GAC Renaud

* The current branch model relies on the branches: master, develop, feature, hotfix and release. It was chosen before the migration to GitLab.
* The branch model can be simplified using the GitLab functionalities **issue tracking** and **merge request**.
* It is based on two stable branches, **master** and **production**.
* Each code modification (bug fix, improvement, ...) starts with an issue.
* For each issue, a feature branch is created. Its name starts with the issue number.
* When the code for the issue is finished, the branch is rebased on the master branch and then pushed into the master branch via a *merge request*. The merge request description has to contain the issue number (fixes #14, closes #67, etc.). The issue has to be closed and the branch has to be deleted when the merge request is accepted.
* When the master branch reaches a point corresponding to a release, it is pushed into the production branch via a *merge request*.
* A *hot fix* starts with an issue. It is prepared in a dedicated branch. Once ready, the dedicated branch is pushed to the master branch via a *merge request* (conflicts might have to be solved at that time). The hot fix branch is pushed to the production branch when the hot fix is working in the master branch. Then the hot fix branch is deleted.
* More details in http://doc.gitlab.com/ee/workflow/gitlab_flow.html
* The script ``build_version.py`` contains options to create the feature, hot fix and release branches. These options are obsolete with the GitLab branch model and have to be removed.
* We might have to keep an option for creating a new release; to be clarified.

https://gitlab.in2p3.fr/limbra/limbra/-/issues/5
Remove obsolete field harvesters.ratio
2015-10-07T11:30:04+02:00
LE GAC Renaud

* The field ``harvesters.ratio`` was used when upgrading a conference talk to a conference proceeding. It was designed to control small differences between the title of the talk and the title of the proceeding.
* Since version 0.8.9, talks are no longer upgraded to proceedings. Therefore this field is obsolete.
* It has to be removed from:
- the database model
- the existing databases
- the user guide
https://gitlab.in2p3.fr/limbra/limbra/-/issues/4
Fail to export list in PDF or TeX using Chrome
2015-12-14T16:34:12+01:00
LE GAC Renaud

When using the Chrome browser, list generation works when the output format is HTML. However, it fails when the output format is either PDF or TeX.
This bug is confirmed and can be reproduced in the test environment.

https://gitlab.in2p3.fr/limbra/limbra/-/issues/3
Export list using BibTex and CSV format
2015-05-18T18:25:54+02:00
LE GAC Renaud

* Some users request to export the list in BibTex and CSV formats.
* A reference describing the BibTex format can be found in Leslie Lamport's book "LaTeX: A Document Preparation System" or on the web, *e.g.* http://en.wikipedia.org/wiki/BibTeX.
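
Both export formats can be produced with the Python standard library. The sketch below is written for Python 3. The function names `to_bibtex` and `to_csv` and the sample publication are hypothetical, and the BibTex formatter is deliberately naive: it does not escape braces or special characters in the field values.

```python
import csv
import io


def to_bibtex(key, fields, entry_type="article"):
    """Format one publication as a BibTex entry."""
    lines = ["@%s{%s," % (entry_type, key)]
    lines += ["  %s = {%s}," % (name, value) for name, value in fields.items()]
    lines.append("}")
    return "\n".join(lines)


def to_csv(rows, columns):
    """Serialize a list of dicts as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up publication, for illustration only.
publication = {"author": "Doe, J.", "title": "An example title", "year": "2015"}
bibtex_text = to_bibtex("doe2015", publication)
csv_text = to_csv([publication], ["author", "title", "year"])
```

The `csv` module quotes any value that contains the delimiter, so an author written as `Doe, J.` still ends up in a single column.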
LE GAC Renaud