Automatize the harvesters

Currently, each group runs its harvesters manually. This development will run the harvesters for each group periodically.

Refactor harvester
Add automated harvester application parameter
Setup Scheduler with a skeleton automated harvesting task function
Phase1: Create a scheduler task for automated harvesting
- If global automated harvester parameter is not yes or true return from task
- Iterate on all harvester group entry
  - If harvest is False continue
  - Harvest group using process_url
  - Convert logs and collection_logs to json
- Use logging system for debug information
- Add an application parameter to define the execution scheduling
- Queue or dequeue automatic harvesting task according to application parameter values
- Requeue the automatic harvesting task with the new start time if the scheduling is modified
Phase 2: Create DB tables
- Create a table to hold automatic harvesting logs
- Write json logs and info into the table
- Erase logs older than one month
- Update the DB schema graphic
Phase 3: Create view for the logs
- Create Selector for harvesting logs display
- Create Controller function for harvesting logs
- Add menu command to display harvesting logs
- Get logs from the database

From that prototype, we identified all pieces required to run periodically the harvesters:

It also appears that we have to simplify the interface exposes to the user.

A possible evolution is to create a separate application, SCAN, connected to the task scheduler:

Give access to the schedule tables
Contain the logic to authorize the running of the harvester for a given track_publications_xxx database
Contain the logic to balance the load between the different track_publications_xxx applications

For each track_publication application, the user will have access to:

a switch to allow or not the periodic scan
a switch for each harvester
an action to consult log. It will give access to the date and the harvester log for each team. The layout is a grid where row are grouped per team. Each row contains the date and an hyper-link pointing to the harvester log.

Assignee

Assign to

Time tracking