limbra issues: https://gitlab.in2p3.fr/limbra/limbra/-/issues (updated 2018-04-27T15:02:17+02:00)

**#80 Improve filter on authors for publications**
https://gitlab.in2p3.fr/limbra/limbra/-/issues/80 (LE GAC Renaud, updated 2018-04-27T15:02:17+02:00)
* In the `publications` table, the grid filter can be applied to the `author` field.
* Currently the operator is `contains`.
* Replace it with the `like` operator, so that a search on two authors can be executed, *e.g.* `Dupont%Durant`.
* For the time being, the search is based on an exact match, which is painful for accented letters (é, è, ...). Modify the search pattern so that `Hélène` is found when the user requests `helene`.

**#78 Add the number of citations**
https://gitlab.in2p3.fr/limbra/limbra/-/issues/78 (LE GAC Renaud, updated 2020-03-20T17:30:27+01:00)
* It is now possible to collect the *number of citations*:
  - http://inspirehep.net/record/1471725?of=recjson&ot=number_of_citations
  - https://inspirehep.net/search?recid=1762038&ot=number_of_citations&of=recjson
* Add a database table, `citations`, containing the number of citations per publication as a function of time:
  - date,
  - publications.id,
  - number_of_citations
* Add a web2py cron job updating the number of citations once a week (http://web2py.com/books/default/chapter/29/04/the-core#Cron).
* Build a PDF report containing plots (article):
  - `number of publications` as a function of the `number of citations`
  - `box plot` of the number of citations: overall, per scientific domain, per team, or per project
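The statistics behind the planned report can be prototyped with the standard library before wiring them to the database. The sketch below makes a few assumptions. Each row of the proposed `citations` table is taken to be a `(date, publication_id, number_of_citations)` tuple. The team of each publication comes from a hypothetical mapping. The helper names are illustrative, not the application's code. The sketch keeps the latest weekly snapshot per publication and builds the histogram behind the `number of publications` vs `number of citations` plot. It also computes the five-number summary that a box plot per team would display.

```python
import datetime
import statistics
from collections import Counter, defaultdict


def latest_counts(rows):
    """Keep the most recent number of citations for each publication.

    rows: iterable of (date, publication_id, number_of_citations) tuples,
    one snapshot per publication and per week (hypothetical representation
    of the proposed `citations` table).
    """
    latest = {}
    for day, pub_id, n in rows:
        if pub_id not in latest or day > latest[pub_id][0]:
            latest[pub_id] = (day, n)
    return {pub_id: n for pub_id, (_, n) in latest.items()}


def publications_per_citation_count(counts):
    """Histogram: number of publications for each number of citations."""
    return dict(sorted(Counter(counts.values()).items()))


def box_plot_stats(counts, group_of):
    """Five-number summary (min, Q1, median, Q3, max) per group.

    group_of: hypothetical mapping publication_id -> team (or project, ...).
    """
    groups = defaultdict(list)
    for pub_id, n in counts.items():
        groups[group_of[pub_id]].append(n)
    stats = {}
    for name, values in groups.items():
        values.sort()
        if len(values) >= 2:
            # "inclusive" interpolates between the observed values.
            q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
        else:
            q1 = median = q3 = values[0]
        stats[name] = (values[0], q1, median, q3, values[-1])
    return stats


# Example with made-up snapshots (not real data).
rows = [
    (datetime.date(2020, 3, 1), 1, 10),
    (datetime.date(2020, 3, 8), 1, 12),
    (datetime.date(2020, 3, 8), 2, 3),
    (datetime.date(2020, 3, 8), 3, 12),
]
counts = latest_counts(rows)
```

In the application, the same grouping would more likely be done with `pandas` (see #12). The stdlib version only makes the computation explicit.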

**#77 Migrate from MARC to JSON format**
https://gitlab.in2p3.fr/limbra/limbra/-/issues/77 (LE GAC Renaud, updated 2020-02-12T16:48:07+01:00)
* The stores `inspirehep.net` and `cds.cern.ch` can return a JSON string instead of an XML one.
* For example: http://inspirehep.net/record/1471725?of=recjson
* More on the JSON API: https://inspirehep.net/info/hep/api?ln=fr
* Map JSON / MARC: https://github.com/inspirehep/invenio/blob/prod/modules/bibfield/etc/atlantis.cfg
* Comparison between JSON and MARC in [CodesJsonMarc.pdf](/uploads/644fd6af927f05d54064baba7fb5b7ab/CodesJsonMarc.pdf) and [Json_fields.pdf](/uploads/6d30afc3ff85bbc7389dcf5d81159f01/Json_fields.pdf)

**#76 Optimize python code**
https://gitlab.in2p3.fr/limbra/limbra/-/issues/76 (LE GAC Renaud, updated 2018-04-27T15:02:17+02:00)
Improve speed and memory footprint. Systematic use of:
* decorator `@staticmethod`
* PyDAL iterator `db(query).iterselect()`
* Use `pandas.DataFrame`
* itertools: `chain`, `imap`, `ifilter`, `izip`, ... (in Python 3, `imap`, `ifilter` and `izip` are replaced by the lazy built-ins `map`, `filter` and `zip`)

**#86 Use pandas.DataFrame in record for author and their affiliation**
https://gitlab.in2p3.fr/limbra/limbra/-/issues/86 (LE GAC Renaud, updated 2018-04-27T15:02:17+02:00)
* Class `invenio_tools.RecordPubli`
* Store the authors and their affiliation in a DataFrame with two columns `raw_author` and `affiliation`
* Columns like `format_author`, `first_name`, `last_name` can be added when required.
* Vectorize the functions `format_author_fr` and `familly_name_fr`.
* This should speed up the author / affiliation processing and simplify the code.

**#75 Add a combobox for the automate filter of the harvester table**
https://gitlab.in2p3.fr/limbra/limbra/-/issues/75 (LE GAC Renaud, updated 2018-04-27T15:02:17+02:00)
* Filter of the Harvester table
* The automaton filter exists, but it is a text field.
* Replace it with a ComboBox containing the possible values. The ComboBox can be reset.

**#54 Review the label for the table controller**
https://gitlab.in2p3.fr/limbra/limbra/-/issues/54 (LE GAC Renaud, updated 2018-04-27T15:02:17+02:00)
* The `controllers` table associates an automaton with a publication category.
* For historical reasons, the names of the table and of the field are `controller(s)`.
* In version 0.8.14, the term `controller` was replaced by `automate`.
* This change has to be propagated to the table and to the related actions.

**#83 Speed up graph**
https://gitlab.in2p3.fr/limbra/limbra/-/issues/83 (LE GAC Renaud, updated 2018-04-27T15:02:17+02:00)
* Delegate the heavy processing to `DataFrame` instead of the database and Python functions.

**#53 Add language in the application preference**
https://gitlab.in2p3.fr/limbra/limbra/-/issues/53 (LE GAC Renaud, updated 2018-04-27T15:02:17+02:00)
* Allow the user to choose the language from the UI.
* Possible values are FR and UK.

**#82 Speed up the model**
https://gitlab.in2p3.fr/limbra/limbra/-/issues/82 (LE GAC Renaud, updated 2018-04-27T15:02:17+02:00)
* Follow the `tev/plugin_event` implementation.
* Create the module `auth` with the function `configure_auth`
* Create modules `model_core`, `model_report`, `model_selector` with the classes `Core`, `Report` and `Selector`
* Create modules `ui_core`, `ui_report`, `ui_selector` and `ui_viewport` with the classes `CoreUi`, `ReportUi`, `SelectorUi` and `ViewportUi`
* Reduce the model to one short file `main.py`
The main steps are:
- [x] models
- [x] lazyT
- [x] use `current.db`, `current.virtdb`, `current.auth`
- [x] replace `'` by `"`
- [x] run `pylint`

**#49 Create sanity check wizard**
https://gitlab.in2p3.fr/limbra/limbra/-/issues/49 (LE GAC Renaud, updated 2018-04-27T15:02:17+02:00)
* A wizard similar to `Check And Validate`.
* Its aim is to check the configuration / setup of the database:
1. The relation between `team` and `project`
2. The configuration of the harvester
3. The relation between automaton and category
4. ...
* When a problem is reported, first ask the user to run the sanity check.
* As soon as a new problem is solved, add more tests to the sanity check.
* When there is a problem with the PDF creation, require the user to run the `check and validate` wizard, then to export the `LaTeX` file, ...

**#81 Migrate to plugin_dbui 0.9.8.1**
https://gitlab.in2p3.fr/limbra/limbra/-/issues/81 (LE GAC Renaud, updated 2018-04-27T15:02:17+02:00)
* The release 0.9.8.1 is not backward compatible.

**#44 Use combobox with multiple selection**
https://gitlab.in2p3.fr/limbra/limbra/-/issues/44 (LE GAC Renaud, updated 2018-04-27T15:02:17+02:00)
* A ComboBox can be configured to allow *multiple selection* (via multiSelector).
* Use this feature in *metrics* and *graphs* when selecting publication categories.
* MultiSelector is available in ExtJS 6 and can be used in different ways.
* http://examples.sencha.com/extjs/6.0.1/examples/classic/multiselect/multiselect-demo.html
* http://examples.sencha.com/extjs/6.0.1/examples/kitchensink/#multi-selector

**#39 Harvest private collections**
https://gitlab.in2p3.fr/limbra/limbra/-/issues/39 (LE GAC Renaud, updated 2018-04-27T15:02:17+02:00)
* Private collections are private notes for Atlas, LHCb, ...
* Add a flag to identify *private collection* in the harvester configuration.
* Request a login and password when a harvester runs on a private collection.
* Exclude the scan of *private collections* when running all harvesters.
* Remove the *insert MARC XML* wizard.
Road to explore (I):
* The Linux command `cern-get-sso-cookie` might help.
* The packages can be found at http://linuxsoft.cern.ch/cern/centos/7/cern/x86_64/Packages/
* There is a Python wrapper: https://github.com/sashabaranov/cernsso
* It might be necessary to register the application at CERN via https://sso-management.web.cern.ch/
* It might be a good idea to understand the difference between `requests` and `urllib`.
* It might be a good idea to start with a small Python script.
Road to explore (II):
* The solution may be found in https://media.readthedocs.org/pdf/flask-sso/latest/flask-sso.pdf, since `flask` is not so different from `web2py`.
* ...

**#72 Add author condition in the selector for metrics and graphs**
https://gitlab.in2p3.fr/limbra/limbra/-/issues/72 (LE GAC Renaud, updated 2018-04-27T15:02:17+02:00)
* Same set of conditions for lists, metrics and graphs.
* Add the authors field.
* The comparison operator is `contains`.
* ...

**#6 Automatize the harvesters**
https://gitlab.in2p3.fr/limbra/limbra/-/issues/6 (LE GAC Renaud, updated 2018-04-27T15:02:17+02:00)
Currently, each group runs its harvesters manually. This development will run the harvesters for each group periodically.
* The periodicity is once a week.
* The logs will be stored in the database and kept for one month.
* The logs can be viewed using the current *harvester views*.
* The automated process can be switched off.
* Each harvester can be activated or deactivated in the automated process.
* This development would rely on the web2py task scheduler.
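Keeping the logs for one month could be done by purging the old entries each time the task runs. Here is a minimal sketch under three assumptions. Each log entry is a dictionary with a hypothetical `date` key holding a `datetime.datetime`. *One month* is approximated as 30 days. The function name is illustrative. In the application, this would be a delete on the log table rather than a list filter.

```python
import datetime

# Assumption: "one month" is approximated as 30 days.
RETENTION = datetime.timedelta(days=30)


def purge_old_logs(logs, now=None, retention=RETENTION):
    """Return only the log entries younger than `retention`.

    logs: list of dicts with a hypothetical "date" key (datetime.datetime).
    now:  reference time, defaults to the current local time.
    """
    now = now or datetime.datetime.now()
    cutoff = now - retention
    return [log for log in logs if log["date"] >= cutoff]
```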
### Roadmap
* [x] Refactor harvester
* [x] Add automated harvester application parameter
* [x] Setup Scheduler with a skeleton automated harvesting task function
* Phase 1: Create a scheduler task for automated harvesting
  * [x] If the global automated harvester parameter is not *yes* or *true*, return from the task
  * [x] Iterate on all harvester group entries
  * [x] If harvest is False, continue
  * [x] Harvest group using process_url
  * [x] Convert logs and collection_logs to JSON
  * [x] Use the logging system for debug information
  * [x] Add an application parameter to define the execution scheduling
  * [x] Queue or dequeue the automatic harvesting task according to the application parameter values
  * [x] Requeue the automatic harvesting task with the new start time if the scheduling is modified
* Phase 2: Create DB tables
  * [x] Create a table to hold automatic harvesting logs
  * [x] Write the JSON logs and info into the table
  * [x] Erase logs older than one month
  * [x] Update the DB schema graphic
* Phase 3: Create views for the logs
  * [x] Create a Selector for the harvesting logs display
  * [x] Create a Controller function for the harvesting logs
  * [x] Add a menu command to display the harvesting logs
  * [x] Get the logs from the database
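The core of Phase 1 can be summarised as a single task function. This is a minimal sketch, not the application's code. The parameter value, the harvester entries (dicts with hypothetical `id` and `harvest` keys) and the `harvest` callable are all assumptions; the callable stands in for `process_url`. The logs are serialised to JSON, as in the roadmap. In web2py, such a function would then be registered with the task scheduler.

```python
import json
import logging

logger = logging.getLogger("automatic_harvesting")

# Values of the global parameter that enable the automated harvesting.
TRUE_VALUES = {"yes", "true"}


def automatic_harvesting(parameter, harvesters, harvest):
    """Run the enabled harvesters and return their logs as a JSON string.

    parameter:  value of the global automated harvester parameter.
    harvesters: iterable of dicts with hypothetical keys "id" and "harvest".
    harvest:    callable taking a harvester entry and returning a list of
                log messages (stands in for process_url).
    """
    # Leave immediately when the global switch is not "yes" or "true".
    if str(parameter).strip().lower() not in TRUE_VALUES:
        logger.debug("automatic harvesting disabled")
        return None

    logs = []
    for entry in harvesters:
        # Per-harvester switch: skip the deactivated harvesters.
        if not entry.get("harvest", False):
            continue
        logger.debug("harvesting %s", entry["id"])
        logs.append({"harvester": entry["id"], "logs": harvest(entry)})

    return json.dumps(logs)
```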
### Conclusions
From that prototype, we identified all the pieces required to run the harvesters periodically:
* task scheduler
* scheduler tables
* task modules
* an additional controller to manipulate the task and to give access to the logs
It also appears that we have to simplify the interface exposed to the user.
A possible evolution is to create a separate application, SCAN, connected to the task scheduler:
* Give access to the scheduler tables
* Contain the logic to authorize the running of the harvesters for a given track_publications_xxx database
* Contain the logic to balance the load between the different track_publications_xxx applications
For each track_publication application, the user will have access to:
* a switch to allow or not the periodic scan
* a switch for each harvester
* an action to consult the logs. It will give access to the date and the harvester log for each team. The layout is a grid where rows are grouped per team. Each row contains the date and a hyperlink pointing to the harvester log.

**#70 Change the logic for status OK**
https://gitlab.in2p3.fr/limbra/limbra/-/issues/70 (LE GAC Renaud, updated 2018-04-27T15:02:17+02:00)
* Currently, records marked `OK` can be neither deleted nor modified.
* Relax this rule, following the GitLab approach.
* To delete / modify a record marked `OK`, a popup window appears asking the user to confirm the operation by typing a predefined value which depends on the record, *e.g.* the beginning of the title or the name of the first author.
* ...

**#69 Change the javascript namespace from Trp to Limbra**
https://gitlab.in2p3.fr/limbra/limbra/-/issues/69 (LE GAC Renaud, updated 2018-04-27T15:02:17+02:00)
* See `static/limbra/src/wizard/Harvester.js`
* This will improve the readability of the documentation.
* ...

**#65 Run jsduck, sencha or sphinx via a docker container**
https://gitlab.in2p3.fr/limbra/limbra/-/issues/65 (LE GAC Renaud, updated 2018-04-27T15:02:17+02:00)
* In the development setup, `build_version.py` runs the commands `jsduck`, `sencha` or `sphinx` installed in the local file system. They are painful to install. Moreover, they are available in the docker image `web2py-degj`.
* It would be possible to run them via a docker container: `docker run --rm web2py-degj:2.9.11 sencha [options] ....`
* Add an option `image` or `docker image` to select the relevant image
* The same mechanism can be applied to the `run` command using web2py.

Assignee: LE GAC Renaud

**#93 fix bugs**
https://gitlab.in2p3.fr/limbra/limbra/-/issues/93 (LE GAC Renaud, updated 2019-10-31T10:11:03+01:00)
* [x] The role `user` is not working any more.
* [x] The affiliation key `Ecole Polytechnique` selects authors from both LLR (`Ecole Polytechnique`) and EPFL (`Ecole Polytechnique, Lausanne`).
* [x] Internal server error https://inspirehep.net/record/1713704 (article notice error)
* [x] Internal server error https://inspirehep.net/record/1692891 (proceeding missing the 909CO field)
* [x] Internal server error https://inspirehep.net/record/1610670 (proceeding missing the 909CO field)
* [x] Add a *Go* button in the selector of the publication tables.
* [x] Allow the user to modify the authors role.

**#94 migrate to python 3.7**
https://gitlab.in2p3.fr/limbra/limbra/-/issues/94 (LE GAC Renaud, updated 2019-12-05T18:23:20+01:00)
* Requires web2py 2.18.3 or above.
* Requires plugin_dbui 0.9.9.1-py37 or above.

**#24 Better strategy to find the institute identifier**
https://gitlab.in2p3.fr/limbra/limbra/-/issues/24 (LE GAC Renaud, updated 2015-09-17T19:36:08+02:00)
* The institute identifier is defined in the application preferences, *reg_institute*.
* It has to match the value used in the Invenio store.
A better strategy would be:
* Store the identifier of the institute record in the INSPIREHEP database.
* Extract from it the identifier used in INSPIREHEP or CDS and use it in the harvesters.
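Once the institute record has been retrieved, the `reg_institute` pattern could be built from its fields `110u` and `110t`. The decoding notes of this issue describe those fields. This is a minimal sketch, not the application's code. It assumes the record has already been decoded into a dictionary mapping MARC field codes to strings, and the function name is hypothetical. The values are escaped because institute names may contain characters such as `.` or `(` that are special in regular expressions.

```python
import re


def build_reg_institute(record):
    """Build the regular expression `110u|110t` from an institute record.

    record: hypothetical dict mapping MARC field codes to values,
            e.g. {"110u": "...", "110t": "..."}; missing fields are skipped.
    """
    ids = [record[key] for key in ("110u", "110t") if record.get(key)]
    if not ids:
        raise ValueError("no institute identifier in fields 110u / 110t")
    # Match either identifier literally.
    return "|".join(re.escape(value) for value in ids)
```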
More on decoding:
* The address of the record is http://inspirehep.net/record/902989
* The definition of the institute identifier is in the fields `110u` and `110t`. The field `110u` is the institute id used up to now, while the field `110t` is the future one.
* Create a regular expression `110u|110t` and store it in the local variable `reg_institute`.
* The name of the preference can be `inspirehep_institute_id`. It contains the record id, *e.g.* `902989`.
* Remove the preference `reg_institute`, which becomes a local variable. It should be constructed when the first harvester runs.
* A dedicated class, `Institute`, might have to be created.
* Do not forget to modify the documentation.

**#16 The harvesters must reject as soon as possible record using their id and the database field origin**
https://gitlab.in2p3.fr/limbra/limbra/-/issues/16 (LE GAC Renaud, updated 2015-09-21T19:02:09+02:00)

**#63 Change the name of the project to LIMBRA**
https://gitlab.in2p3.fr/limbra/limbra/-/issues/63 (LE GAC Renaud, updated 2016-06-08T18:35:36+02:00)
*LIstes et Métriques BibliogRaphiques Automatisées* (Automated Bibliographic Lists and Metrics)

**#9 Develop unit tests for the class Record**
https://gitlab.in2p3.fr/limbra/limbra/-/issues/9 (LE GAC Renaud, updated 2015-09-03T18:44:31+02:00)
* The harvester is the critical part of this application.
* Procedures have to be developed to make it robust and to ensure that it works before releasing a new version.
* The first step is to develop unit tests for the class ``Record``.
* Use the Python package nose (https://nose.readthedocs.org/en/latest/).
* Create a `test` directory in the `modules` directory.
* Create a file `test_record_article.py`.
* Retrieve a well-known record from a store:
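```python
# Module-level fixture of test_record_article.py: the record is retrieved
# once and shared by all the test functions of the file.
from invenio_tools import InvenioStore, Marc12

host = "cds.cern.ch"
record_id = 1951625

store = InvenioStore(host)
xml = store.get_record(record_id)  # MARC XML describing the record
record = Marc12(xml)[0]            # first decoded record
```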
* For each method of the class `Record` that makes sense for the *article* category, develop a test function:
```python
def test_collaboration():
    # `record` is the module-level record retrieved above
    assert record.collaboration() == "LHCb Collaboration"
```
* Develop the test files for the other categories: Proceeding, Talk, Report, ...

**#37 Create a wizard to configure harvesters**
https://gitlab.in2p3.fr/limbra/limbra/-/issues/37 (LE GAC Renaud, updated 2015-12-09T18:06:38+01:00)
* Ease the configuration of harvesters for non-experts.
* A wizard with 4 tabs, launched by the action `harvesters > Add`.
* TAB 1 → select the *team* and the *project*.
* TAB 2 → select the store (`cds.cern.ch` or `inspirehep.net`).
* TAB 3 → select the collection by choosing the *experiment* and the *automaton*:
  * The underlying processing should generate the search criteria, which depend on the store.
  * Could we extract the list of experiments from inspirehep?
  * Could we extract the list of collections (*e.g.* LHCb Papers) from cds?
  * Could we standardize the search criteria between the two stores?
  * One can have an expert view in which the user enters the criteria by hand.
* TAB 4 → select the publication category. The possible values depend on the automaton.

**#2 Show list of feature or release identifier to pick from in build_version.py**
https://gitlab.in2p3.fr/limbra/limbra/-/issues/2 (MEESSEN Christophe, updated 2015-04-24T19:02:43+02:00)
When closing a feature or a release with build_version.py, it currently shows the list of all branches.
* It would be more convenient if it showed a list of identifiers only.
* If there is only one feature or release branch, it should pick its identifier as the default value.

Assignee: MEESSEN Christophe

**#3 Export list using BibTex and CSV format**
https://gitlab.in2p3.fr/limbra/limbra/-/issues/3 (LE GAC Renaud, updated 2015-05-18T18:25:54+02:00)
* Some users have requested exporting the list in BibTeX and CSV formats.
* The details of the BibTeX format can be found in Leslie Lamport's book *LaTeX: A Document Preparation System* or on the web, *e.g.* http://en.wikipedia.org/wiki/BibTeX.
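Both exports can be sketched with the standard library. This is a minimal sketch, not the application's code. It assumes each publication is a dictionary with hypothetical keys (`key`, `title`, `authors`, `year`, `journal`); the real export would read its fields from the `publications` table. Each BibTeX entry has the form `@article{key, field = {value}, ...}`, with authors joined by `and`. The CSV is written with the `csv` module, so commas inside titles are quoted correctly.

```python
import csv
import io

FIELDS = ["key", "title", "authors", "year", "journal"]  # hypothetical columns


def to_bibtex(publi):
    """Format one publication as a BibTeX @article entry."""
    lines = ["@article{%s," % publi["key"]]
    lines.append("  author = {%s}," % " and ".join(publi["authors"]))
    lines.append("  title = {%s}," % publi["title"])
    lines.append("  journal = {%s}," % publi["journal"])
    lines.append("  year = {%s}" % publi["year"])
    lines.append("}")
    return "\n".join(lines)


def to_csv(publis):
    """Export publications as CSV text, one row per publication."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    for publi in publis:
        # Flatten the author list into a single cell.
        writer.writerow(dict(publi, authors=", ".join(publi["authors"])))
    return buffer.getvalue()
```

Other categories (proceedings, talks, ...) would map to other BibTeX entry types, such as `@inproceedings`.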

Assignee: LE GAC Renaud

**#5 Remove obsolete field harvesters.ratio**
https://gitlab.in2p3.fr/limbra/limbra/-/issues/5 (LE GAC Renaud, updated 2015-10-07T11:30:04+02:00)
* The field ``harvesters.ratio`` was used when upgrading a conference talk to a conference proceeding. It was designed to handle small differences between the title of the talk and the title of the proceeding.
* Since version 0.8.9, talks are no longer upgraded to proceedings. Therefore this field is obsolete.
* It has to be removed from:
- the database model
- the existing databases
- the user guide

**#7 Adapt build_version to the gitlab branch model**
https://gitlab.in2p3.fr/limbra/limbra/-/issues/7 (LE GAC Renaud, updated 2015-10-03T16:55:38+02:00)
* The current branch model relies on the branches master, develop, feature, hotfix and release. It was chosen before the migration to GitLab.
* The branch model can be simplified using the GitLab functionalities **issue tracking** and **merge request**.
* It is based on two stable branches, **master** and **production**.
* Each code modification (bug fix, improvement, ...) starts with an issue.
* For each issue, a feature branch is created. Its name starts with the issue number.
* When the code for the issue is finished, the branch is rebased onto the master branch and then merged into it via a *merge request*. The merge request description has to contain the issue number (fixes #14, closes #67, etc.). The issue has to be closed and the branch deleted when the merge request is accepted.
* When the master branch reaches a point corresponding to a release, it is merged into the production branch via a *merge request*.
* A *hot fix* starts with an issue. It is prepared in a dedicated branch. Once ready, the dedicated branch is merged into master via a *merge request* (conflicts might be solved at that time). The hot fix branch is merged into the production branch once the hot fix works in the master branch. Then the hot fix branch is deleted.
* More details in http://doc.gitlab.com/ee/workflow/gitlab_flow.html
* The script ``build_version.py`` contains options to create the feature, hot fix and release branches. These options are obsolete with the GitLab branch model and have to be removed.
* We might have to keep an option for creating a new release? To be clarified.

**#8 Modify the controller harvesters/run to scan several store**
https://gitlab.in2p3.fr/limbra/limbra/-/issues/8 (LE GAC Renaud, updated 2015-10-09T10:49:58+02:00)
* The parameters of the action ``launch harvester`` are the year (period), the project and the publication category.
* In most cases, these parameters correspond to a single harvester defined in the database table ``harvesters``.
* However, the articles (proceedings, talks, ...) can be looked for in both the ``cds.cern.ch`` and the ``inspirehep.net`` stores. In that case, the action parameters correspond to **two** harvesters.
* In the controller harvester/run, the first one is always selected, see line 272: ``row = selector.select(db.harvesters).first()``.
* This restriction has to be removed.

**#10 Origin field contains a list of values**
https://gitlab.in2p3.fr/limbra/limbra/-/issues/10 (LE GAC Renaud, updated 2015-10-09T10:49:58+02:00)
* The same publication can be found in the `cds.cern.ch` and in the `inspirehep.net` stores:
- http://cds.cern.ch/record/1951625
- http://inspirehep.net/record/1319638
* In that case, the database field `origin` has *two* values.
* The harvester mechanism was designed to work with a unique value for the origin field. It has to be modified to work with a list of values.
* Several lists are encoded in the database as a string in which values are separated by commas. The same technique can be used in this case.
* To be developed once issue #9 is working.
* The list of values can be generated as soon as the record is found.
* CDS:
  - The origin field is at the MARC key `0248 a`.
  - The MARC keys `35 a` and `35 9` contain the origin field value in INSPIREHEP.
* INSPIREHEP:
  - The origin field is at the MARC key `909C0 o`.
  - The MARC keys `35 a` and `35 9` contain the origin field value in CDS.
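The comma-separated encoding of the `origin` field can be sketched as two helpers. A third helper rejects a record as soon as one of its origins is already known, which is the aim of issue #16. This is a minimal sketch under stated assumptions. The helper names are hypothetical, and origins are assumed to be URLs, which contain no comma, so the comma can safely serve as separator.

```python
def encode_origin(urls):
    """Join the origin URLs into the string stored in the `origin` field.

    Blank values and duplicates are dropped; the order is kept.
    Assumes the URLs contain no comma, since it is the separator.
    """
    unique = list(dict.fromkeys(url.strip() for url in urls if url.strip()))
    return ",".join(unique)


def decode_origin(value):
    """Split an `origin` field value back into a list of URLs."""
    return [url for url in (value or "").split(",") if url]


def is_known(record_origins, stored_values):
    """True if one of the record origins is already in the database.

    stored_values: iterable of `origin` field values (encoded strings).
    """
    known = set()
    for value in stored_values:
        known.update(decode_origin(value))
    return any(url in known for url in record_origins)
```

With a few thousand publications, building the set of known origins once per harvesting run keeps each lookup cheap.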

**#11 Develop unit test for the module invenio_tools and for the class PublicationTool.**
https://gitlab.in2p3.fr/limbra/limbra/-/issues/11 (LE GAC Renaud, updated 2017-06-18T12:16:06+02:00)
* Making the harvesters robust is the priority.
* The first step is described in #9.
* The second step is to develop unit tests for the main classes of the `invenio_tools` module: `CheckAndFix`, `InvenioStore` and `Marc12`.
* The last step is to develop unit tests for the class `PublicationTool`.

**#12 Use pandas and the matplotlib library for the graph**
https://gitlab.in2p3.fr/limbra/limbra/-/issues/12 (LE GAC Renaud, updated 2015-11-06T15:09:35+01:00)
* An attempt to create graphs was introduced in version 0.8.8.1.
* It relies on the *chart package* of the Ext JS library (http://docs.sencha.com/extjs/4.2.1/#!/api)
* The quality of the graphs is poor with respect to what we are doing.
* The pandas (http://pandas.pydata.org/pandas-docs/stable/) and matplotlib (http://matplotlib.org/) libraries have been developed to perform data analysis in Python. They can manipulate large amounts of data, perform high-level statistical analysis and produce high-quality plots.
* They are available on the server and used in the `track_events` applications.
* Use **pandas** and **matplotlib** instead of the **chart package**.
* Take the code developed in `track_events` as an example. Might it be possible to encapsulate the track_events part into a web2py plugin?

**#17 Review the logic of the harvester**
https://gitlab.in2p3.fr/limbra/limbra/-/issues/17 (LE GAC Renaud, updated 2015-09-25T10:45:18+02:00)
* In the current implementation, we have to deal with the following problem:
*Records with an ill-defined collaboration field are rejected by the harvester. They are corrected by hand and added to the database using the "edit and insert" action. At the next harvest, these records should not appear in the list of harvested publications in error, since they are already in the database, but they do.*
* In addition, when a record is inserted in the database, it can be rejected by the database engine. This error case is neither detected nor counted.
* Implementing these changes requires a deep modification of the code. It can only take place once issue #9 is closed and #11 is rather well advanced.

**#23 Improve the interface**
https://gitlab.in2p3.fr/limbra/limbra/-/issues/23 (LE GAC Renaud, updated 2015-07-15T12:27:57+02:00)
* [x] Add a node *application*
* [x] Migrate the leaf *properties* into the application node and rename it *préférences*
* [x] Migrate the *CAS* leaves to the *applications* node
* [x] Destroy the *CAS* and *configure application* nodes
* [x] Migrate the leaves *edit and insert*, *insert MARCXML* into the *wizard* node
* [x] For the harvester table and the wizard, use the label *automaton* or *robot* instead of *category* for the field *controller*
* [x] Update the user documentation
https://gitlab.in2p3.fr/limbra/limbra/-/issues/30
Get conference data in the Marc12 decoding not in CheckAndFix
2015-09-20T15:59:43+02:00 LE GAC Renaud
* For a talk or a proceeding, the record is retrieved and decoded by the `Marc12` class.
* Later on, the `CheckAndFix` class adds the conference data.
It would be better to have a complete record as early as possible, so that all methods are available as soon as a `Record` is ready. The class `CheckAndFix` would then only correct non-conformities.
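A minimal Python sketch of the proposed flow, assuming records can be handled as plain dictionaries. The `decode` and `fetch_conference` callables and the `is_conference` / `conference_key` keys are hypothetical stand-ins, not the actual `Marc12` internals; the point is only that `__call__` runs `_get_conference_data` before returning, so `CheckAndFix` receives an already complete record.

```python
class Marc12:
    """Sketch: decode a record and attach its conference data immediately.

    ``decode`` and ``fetch_conference`` are hypothetical callables standing
    in for the real MARC12 parsing and the conference-record retrieval.
    """

    def __init__(self, decode, fetch_conference):
        self._decode = decode
        self._fetch_conference = fetch_conference

    def __call__(self, raw):
        record = self._decode(raw)
        # Talks and proceedings are completed here, during decoding,
        # so the record is whole before CheckAndFix ever sees it.
        if record.get("is_conference"):
            self._get_conference_data(record)
        return record

    def _get_conference_data(self, record):
        # None means "conference not found"; reporting that
        # non-conformity is left to CheckAndFix.
        record["conference"] = self._fetch_conference(record["conference_key"])
        return record
```

With this split, `CheckAndFix.conference` only has to check the `conference` entry rather than retrieve it.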
* [ ] add the method `_get_conference_data` to the `Marc12` class
* [ ] execute the method in `Marc12.__call__`
* [ ] simplify the method `CheckAndFix.conference`
* [ ] modify the `tests/harvester` section (conference records can be checked in the record section).
https://gitlab.in2p3.fr/limbra/limbra/-/issues/31
Add the synonym field in the country table
2015-10-07T11:30:04+02:00 LE GAC Renaud
* The `synonym` field is a `string` containing values separated by a comma.
* If a country is not found, look at the synonyms. If nothing matches, reject the record. Later on, the user can add a new country or synonym and catch the faulty record.
https://gitlab.in2p3.fr/limbra/limbra/-/issues/64
test existence of jsduck and sencha using which
2016-02-18T12:49:33+01:00 LE GAC Renaud
* `build_version.py`
* Test the existence of the commands `jsduck` and `sencha` using `which` instead of a fixed path.
* This is required to run on the Docker image for the server.
https://gitlab.in2p3.fr/limbra/limbra/-/issues/29
Break the modules invenio_tools and harvest_tools
2015-09-11T15:06:50+02:00 LE GAC Renaud
* The `invenio_tools` module is too big. It contains 4 classes and 5 exceptions, and its length is about 2680 lines.
* The `harvest_tools` module is also too long. It contains 10 classes and 6 functions, and its length is about 1970 lines.
* Transform these modules into packages with the same names. Each package contains several sub-modules, one per class. Add one base sub-module for regular expressions, functions used in several places,...
* In addition, the translation (`gluon.current.T`) of the error messages is performed when the `invenio_tools` module is initialised. Remove this translation; it can be done later on, in the error handler.
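The deferred translation could look like the sketch below. `ToolException` and `handle_error` are hypothetical names: the exception keeps the untranslated message template and its parameters, and only the handler calls the translator (`gluon.current.T` inside web2py, any string-returning callable elsewhere).

```python
class ToolException(Exception):
    """Hypothetical error keeping an *untranslated* message template."""

    def __init__(self, msg, **params):
        super().__init__(msg)
        self.msg = msg          # e.g. "Unknown country: %(name)s"
        self.params = params    # values substituted after translation


def handle_error(exc, translate):
    """Translate and format the message only when the error is reported.

    ``translate`` is ``gluon.current.T`` inside web2py; any callable that
    maps a message template to a (translated) template works here.
    """
    return str(translate(exc.msg)) % exc.params
```

Since nothing is translated at import time in this sketch, the sub-modules of the future packages can be imported without touching `gluon.current`.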
https://gitlab.in2p3.fr/limbra/limbra/-/issues/40
Unlock the publication field authors_role
2015-10-29T10:08:30+01:00 LE GAC Renaud
* Unlock the field `authors_roles`.
* Fill the table `authors_roles` with predefined values.
* Add these items to the publications grid filter.
* Add this item to the list / metric selectors.
https://gitlab.in2p3.fr/limbra/limbra/-/issues/36
Use list widget for synonym field
2015-10-19T13:38:25+02:00 LE GAC Renaud
* Solve the case `ANTARES` and `ANTARES, TANAMI`.
* Treat `collaboration` like `country` or `publishers`. Do not enter a new value. The user has to decide what to do, even if the collaboration is well formed. This solves issues like *ATLAS Collaboration* versus *ATLAS Collaborations*.
* In MySQL, the field type is `LONGTEXT` (instead of `TEXT`) for `list:string` and `json`.
* ...
https://gitlab.in2p3.fr/limbra/limbra/-/issues/57
Add a command in build_version to install a plugin
2016-02-23T17:32:04+01:00 LE GAC Renaud
* Remove the existing plugin
* Install a fresh version
* The plugin repository can be chosen by the user (by default `../plugin_dbui_build`)
* Proposal:
```
./build_version plugin dbui extjs mathjax
./build_version plugin --release 0.7.3dev dbui
./build_version plugin --git ../plugin_dbui_build extjs
./build_version plugin --tar myfile.tar.gz dbui extjs mathjax
```
https://gitlab.in2p3.fr/limbra/limbra/-/issues/62
Migrate to web2py 2.13.4
2016-06-08T18:35:36+02:00 LE GAC Renaud
* The current version `0.9.5.2` is not running.
* Since the release `2.10.1`, web2py uses the external module `pyDAL` instead of `gluon.dal`.
* As a result, the statement `from gluon.dal import smart_query` fails.
* Remove `from ...` and use `DAL.smart_query` in a controller, or another syntax in a module.
* Check all `from gluon.dal import ...` statements.
https://gitlab.in2p3.fr/limbra/limbra/-/issues/32
Add the synonym field in the publishers table
2015-10-07T11:30:04+02:00 LE GAC Renaud
* The `synonym` field is a `string` containing values separated by a comma.
* If a publisher does not exist, look at the synonyms. If nothing matches, reject the record. Later on, the user can add a new publisher or synonym in the database and catch the faulty record.
* Rename the field `publisher`.
https://gitlab.in2p3.fr/limbra/limbra/-/issues/25
Export list in word
2015-12-15T09:41:48+01:00 LE GAC Renaud
* Popular request, mainly related to AERES reports.
* Not so easy to implement, since the converter has to deal with equations.
* Known converters are not perfect.
* ...
* A possible solution would be:
1. Export the list in RTF.
2. Do not render the equations.
3. Avoid depending on an external converter by using the Python module `pyrtf`, which is available in web2py.
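As an illustration of steps 1 and 2, here is a dependency-free sketch that writes the RTF by hand with the Python standard library instead of `pyrtf` (a deliberate swap, to avoid guessing the `pyrtf` API). Equations are not rendered: their LaTeX source is kept as plain text. `rtf_escape` and `publications_to_rtf` are hypothetical names; accented characters such as `é` are emitted with the RTF `\uN?` control word.

```python
def rtf_escape(text):
    """Escape a string for an RTF body (ASCII-only output)."""
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)          # RTF special characters
        elif ch == "\n":
            out.append("\\line ")
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # \uN takes a signed 16-bit value, so encode as UTF-16 code
            # units; "?" is the fallback for readers without Unicode.
            units = ch.encode("utf-16-le")
            for i in range(0, len(units), 2):
                code = int.from_bytes(units[i:i + 2], "little", signed=True)
                out.append("\\u%d?" % code)
    return "".join(out)


def publications_to_rtf(publications):
    """Return an RTF document with one paragraph per publication string."""
    header = "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Times New Roman;}}\n"
    body = "".join("\\pard %s\\par\n" % rtf_escape(p) for p in publications)
    return header + body + "}"
```

For example, `publications_to_rtf(["Hélène Dupont et al., $B^0$ mixing"])` returns a string that could be served with the `application/rtf` content type and opened in Word.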
https://gitlab.in2p3.fr/limbra/limbra/-/issues/35
Publisher abbreviations have to follow the ISO 4 standard
2015-10-28T18:06:31+01:00 LE GAC Renaud
* ISO-4 [abbreviation](http://www.issn.org/services/online-services/access-to-the-ltwa/)
* List of journal abbreviations:
* [WSL](http://www.wsl.ch/dienstleistungen/publikationen/office/abk_EN)
* [CASSI](http://cassi.cas.org/search.jsp)
* [INSPIREHEP](http://inspirehep.net/collection/Journals)
* WebOfScience
* Official definition from `Web of Science`: [InCites Journal Citation Reports](https://jcr.incites.thomsonreuters.com/JCRJournalHomeAction.action?year=&edition=&journal=)
https://gitlab.in2p3.fr/limbra/limbra/-/issues/42
Review the error log generated in production 0.8.14
2015-11-10T17:18:57+01:00 LE GAC Renaud
* To improve the robustness of the code.
* Review the error logs generated in production by version 0.8.14.
* Fix as many errors as possible.
* This might help to fix the crashes of the web2py server that happen from time to time. The origin of these crashes is not understood.
* Must be done before release 0.9.0.
https://gitlab.in2p3.fr/limbra/limbra/-/issues/96
Migrate to new inspirehep API
2021-04-30T14:56:00+02:00 LE GAC Renaud
* Since March 2020, a new JSON API has been available to search and retrieve publications from `inspirehep.net`.
* The old JSON API is available from `old.inspirehep.net`. It is also used by `cds.cern.ch`.
* `Inspirehep` will close the old API in the coming months.
* As a consequence, LIMBRA will have to deal with two stores (cds and inspirehep) working with different APIs.
* Documentation for the new API: https://github.com/inspirehep/rest-api-doc
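A minimal sketch of URL construction for the new API, based on the `literature` endpoints described in the documentation above. The helper names are hypothetical, and the `size` and `page` parameters should be checked against that documentation before use. Keeping URL construction separate from the HTTP call would let each store (CDS with the old API, INSPIRE-HEP with the new one) provide its own builder.

```python
from urllib.parse import urlencode

# Base URL of the new INSPIRE-HEP REST API.
INSPIRE_API = "https://inspirehep.net/api"


def literature_search_url(query, size=25, page=1):
    """URL searching the literature collection with an INSPIRE query."""
    params = {"q": query, "size": size, "page": page}
    return "%s/literature?%s" % (INSPIRE_API, urlencode(params))


def literature_record_url(recid):
    """URL returning a single literature record, as JSON, by record id."""
    return "%s/literature/%d" % (INSPIRE_API, int(recid))
```

The JSON behind `literature_record_url(1762038)` can then be fetched with `urllib.request.urlopen` and parsed with `json.load`; the record fields are expected under the `metadata` key.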