Design of the SWARM for performance analysis
Current situation
The implementation of stereoscopic analysis in HipeRTA will require the SWARM to send data to, and receive data from, multiple R0DL1 and Stereo daemons. The main issue is that this new architecture will put additional pressure on the network, so it is essential to design a resilient architecture capable of handling network errors. The objective is therefore to design and implement a performance monitoring system that analyzes latencies and gives feedback through a user-friendly interface.
Development plan
To keep development efficient, the work should be split into a well-structured and coherent workflow. The next steps to implement the SWARM properly are:
- Design and implement the stereo daemons.
- Design and implement a performance monitoring system:
  - Define which data are required for the performance analysis.
  - Propose and design a user-friendly interface for the performance monitoring of the SWARM.
  - Implement a simple test procedure to validate the architecture.
- Evaluate the stereo daemons using the performance manager.
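As a starting point for the "which data are required" step, the sketch below shows one possible shape for a latency sample and a per-link summary. All names (`LatencySample`, `summarize`, the daemon labels) are hypothetical, and it assumes that send and receive timestamps are taken on comparable clocks. It is an illustration, not the HipeRTA data model.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class LatencySample:
    """One message exchanged between two daemons (hypothetical record)."""
    source: str         # sending daemon, e.g. "r0dl1_0" (assumed naming)
    target: str         # receiving daemon, e.g. "stereo_0" (assumed naming)
    sent_at: float      # send timestamp, in seconds
    received_at: float  # receive timestamp, in seconds

    @property
    def latency(self) -> float:
        """Transfer latency in seconds."""
        return self.received_at - self.sent_at


def summarize(samples):
    """Group samples by (source, target) link and compute basic statistics."""
    by_link = {}
    for sample in samples:
        by_link.setdefault((sample.source, sample.target), []).append(sample.latency)
    return {
        link: {"count": len(values), "mean": mean(values), "max": max(values)}
        for link, values in by_link.items()
    }


samples = [
    LatencySample("r0dl1_0", "stereo_0", 0.0, 0.25),
    LatencySample("r0dl1_0", "stereo_0", 1.0, 1.75),
    LatencySample("r0dl1_1", "stereo_0", 0.0, 0.5),
]
stats = summarize(samples)
print(stats[("r0dl1_0", "stereo_0")])
```

A summary keyed by link keeps the record independent of the chosen frontend. The same dictionary could also serve as the input of a simple test procedure.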
Ideas to be explored
User Interface
For the user interface, a tree representation or something equivalent could be a good solution. It could offer the following features:
- Colorized branches that indicate the status of communication between daemons (from green, which is ok, to red, which indicates an error).
- Branches of varying sizes to indicate the frequency of communication between two daemons.
- Possibility to select a branch or a daemon to see its latencies in more detail.
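The first two features can be sketched as two small mapping functions, one from latency to color and one from message rate to branch width. The thresholds (`warn_s`, `error_s`), the pixel range and the logarithmic scale are assumptions chosen for illustration. The real values would have to come from the performance requirements.

```python
import math


def status_color(latency_s, warn_s=0.5, error_s=2.0):
    """Return an RGB hex color going from green (ok) to red (error)."""
    # Position of the latency between the two thresholds, clamped to [0, 1].
    t = (latency_s - warn_s) / (error_s - warn_s)
    t = min(1.0, max(0.0, t))
    red = round(255 * t)
    green = round(255 * (1.0 - t))
    return f"#{red:02x}{green:02x}00"


def branch_width(messages_per_s, min_px=1.0, max_px=10.0, max_rate=1000.0):
    """Map a message rate to a line width, using a logarithmic scale."""
    if messages_per_s <= 0:
        return min_px
    scale = math.log10(1.0 + messages_per_s) / math.log10(1.0 + max_rate)
    return min_px + (max_px - min_px) * min(1.0, scale)


print(status_color(0.1), status_color(5.0))
print(branch_width(0.0), branch_width(1000.0))
```

A logarithmic scale is used here so that links with very different message rates remain distinguishable. A linear scale would work equally well if the rates turn out to be similar.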
This interface requires work on the frontend application, which has technical requirements of its own, such as the capacity to handle a large amount of data, because we must be able to monitor the SWARM for up to 100 telescopes.
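One possible way to keep the data volume bounded is to store only a fixed-size window of recent latencies per link. The sketch below illustrates this with a hypothetical `LinkWindow` class. The window size is an assumption, and other strategies (downsampling, aggregation on the daemon side) may prove more suitable.

```python
from collections import defaultdict, deque


class LinkWindow:
    """Keep only the most recent latencies of each link (hypothetical helper)."""

    def __init__(self, max_samples=1000):
        # A deque with maxlen drops its oldest entry automatically when full.
        self._latencies = defaultdict(lambda: deque(maxlen=max_samples))

    def add(self, source, target, latency_s):
        self._latencies[(source, target)].append(latency_s)

    def size(self, source, target):
        return len(self._latencies.get((source, target), ()))

    def worst(self, source, target):
        """Largest latency still in the window, or None if the link is unknown."""
        window = self._latencies.get((source, target))
        return max(window) if window else None


window = LinkWindow(max_samples=3)
for latency in (0.9, 0.1, 0.2, 0.3):
    window.add("r0dl1_0", "stereo_0", latency)
print(window.size("r0dl1_0", "stereo_0"), window.worst("r0dl1_0", "stereo_0"))
```

With this approach, memory use grows with the number of links rather than with the running time of the monitoring.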
For the frontend, several technologies could be considered:
- Technologies similar to Nvidia Morpheus, which is currently open source (it is not certain that its frontend is open source too).
- Grafana: although it is not specifically designed for tree display, it could be used for graphs or custom visualizations that represent relationships between entities or systems (such as daemons or services) in a hierarchical format. Grafana plugins are another option, such as Treemap, which displays elements as blocks of variable size and color, or Flowcharting.
- Libraries such as NetworkX (in Python), which can build graphs and data networks representing hierarchical relationships or connections between entities (daemons in our case). These graphs can then be displayed with Streamlit or another graphics technology.
For both backend and frontend, we could use CheckMK, which provides agents to gather metrics and a server to display them.
This should be discussed once the SWARM stereo daemons have been implemented and have undergone basic testing.
