limbra, issue #6

Created May 12, 2015 by LE GAC Renaud (@legac, Owner). 20 of 20 tasks completed.

Automatize the harvesters

Currently, each group runs its harvesters manually. This development will run the harvesters for each group periodically.

  • The harvesters run once a week.
  • The logs will be stored in the database and kept for one month.
  • The logs can be viewed using the current harvester views.
  • The automated process can be switched off.
  • Each harvester can be activated or deactivated in the automated process.
  • This development will rely on the web2py task scheduler.
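
As a rough illustration, the requirements above could map onto the arguments of the web2py scheduler's `queue_task` call as sketched below. The helper names `build_queue_arguments` and `active_harvesters` and the `timeout` value are assumptions made for this example, not part of the application. The sketch only builds the arguments, so it runs without web2py.

```python
from datetime import datetime

WEEK_IN_SECONDS = 7 * 24 * 3600


def build_queue_arguments(automation_enabled, start_time):
    """Return keyword arguments for scheduler.queue_task, or None when
    the automated process is switched off (hypothetical helper)."""
    if not automation_enabled:
        return None
    return {
        "start_time": start_time,
        "period": WEEK_IN_SECONDS,  # seconds between two runs
        "repeats": 0,               # web2py scheduler: 0 repeats indefinitely
        "timeout": 3600,            # assumed upper bound for one run, in seconds
    }


def active_harvesters(switches):
    """Keep only the harvesters whose switch is on (hypothetical helper)."""
    return [name for name, enabled in switches.items() if enabled]


arguments = build_queue_arguments(True, datetime(2015, 5, 18, 2, 0))
```

Inside web2py, the resulting dictionary would be passed as `scheduler.queue_task(task_function, **arguments)`, where `task_function` stands for whatever task the application defines.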

Roadmap

  • Refactor harvester
  • Add automated harvester application parameter
  • Set up the scheduler with a skeleton automated harvesting task function
  • Phase 1: Create a scheduler task for automated harvesting
    • If the global automated harvester parameter is not yes or true, return from the task
    • Iterate over all harvester group entries
      • If harvest is False, continue
      • Harvest group using process_url
      • Convert logs and collection_logs to JSON
    • Use the logging system for debug information
    • Add an application parameter to define the execution scheduling
    • Queue or dequeue the automatic harvesting task according to the application parameter values
    • Requeue the automatic harvesting task with the new start time if the scheduling is modified
  • Phase 2: Create DB tables
    • Create a table to hold automatic harvesting logs
    • Write JSON logs and info into the table
    • Erase logs older than one month
    • Update the DB schema graphic
  • Phase 3: Create view for the logs
    • Create Selector for harvesting logs display
    • Create Controller function for harvesting logs
    • Add menu command to display harvesting logs
    • Get logs from the database
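
The task body of Phase 1 and the log handling of Phase 2 can be sketched as follows. This is a minimal sketch under several assumptions: `sqlite3` stands in for the web2py database layer, `process_url` is replaced by a stub with a guessed signature and return value, the `harvest_logs` table layout is invented for the example, and "one month" is approximated as 30 days.

```python
import json
import sqlite3
from datetime import datetime, timedelta

RETENTION = timedelta(days=30)  # "one month", approximated as 30 days


def process_url(group):
    """Hypothetical stand-in for the real harvesting call."""
    return {"logs": ["harvested %s" % group["name"]], "collection_logs": []}


def create_log_table(db):
    """Create the (assumed) table holding the automatic harvesting logs."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS harvest_logs ("
        "id INTEGER PRIMARY KEY, group_name TEXT, run_at TEXT, logs TEXT)"
    )


def automated_harvest(db, automation_enabled, groups, now):
    """Harvest every active group once; return the number of groups harvested."""
    if not automation_enabled:
        return 0  # global switch is off: leave the task immediately
    harvested = 0
    for group in groups:
        if not group["harvest"]:
            continue  # this group is deactivated
        result = process_url(group)
        db.execute(
            "INSERT INTO harvest_logs (group_name, run_at, logs) VALUES (?, ?, ?)",
            (group["name"], now.isoformat(timespec="seconds"), json.dumps(result)),
        )
        harvested += 1
    # Erase logs older than the retention window. ISO timestamps written in
    # the same format compare correctly as strings.
    cutoff = (now - RETENTION).isoformat(timespec="seconds")
    db.execute("DELETE FROM harvest_logs WHERE run_at < ?", (cutoff,))
    db.commit()
    return harvested
```

Passing `now` as an argument rather than reading the clock inside the task keeps the retention rule easy to check with fixed dates.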

Conclusions

From that prototype, we identified all the pieces required to run the harvesters periodically:

  • task scheduler
  • scheduler tables
  • task modules
  • additional controller to manipulate the task and to give access to the log

It also appears that we have to simplify the interface exposed to the user.

A possible evolution is to create a separate application, SCAN, connected to the task scheduler:

  • Give access to the scheduler tables
  • Contain the logic to authorize the running of the harvester for a given track_publications_xxx database
  • Contain the logic to balance the load between the different track_publications_xxx applications

For each track_publication application, the user will have access to:

  • a switch to allow or not the periodic scan
  • a switch for each harvester
  • an action to consult the logs. It will give access to the date and the harvester log for each team. The layout is a grid in which rows are grouped by team. Each row contains the date and a hyperlink pointing to the harvester log.
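
The grid described in the last item could be prepared along these lines. The row fields (`id`, `team`, `date`) and the `/harvester_log/<id>` link pattern are assumptions chosen for the illustration. The real view would use whatever the log table and the controller provide.

```python
def group_logs_by_team(rows):
    """Group log rows per team; each entry holds the date and a link to the log."""
    grid = {}
    # Sort by team, then by date, so teams and their rows appear in a stable order.
    for row in sorted(rows, key=lambda r: (r["team"], r["date"])):
        link = "/harvester_log/%d" % row["id"]  # hypothetical URL pattern
        grid.setdefault(row["team"], []).append((row["date"], link))
    return grid


rows = [
    {"id": 1, "team": "team_b", "date": "2015-05-11"},
    {"id": 2, "team": "team_a", "date": "2015-05-18"},
    {"id": 3, "team": "team_a", "date": "2015-05-11"},
]
grid = group_logs_by_team(rows)
```

Each key of `grid` is one group of rows in the layout, and each tuple supplies the date cell and the target of the hyperlink.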