Commit 2bf70409 authored by Maude Le Jeune
data storage + web application + db reconstruction
parent e2038324
WARNING: Pipelet is currently under active development and highly
unstable. There is a good chance that it becomes incompatible from one
commit to the next.
Pipelet is a free framework allowing for creation, manipulation,
execution and browsing of scientific data processing pipelines. It
provides:
+ easy chaining of interdependent elementary tasks,
+ web access to data products,
+ branch handling,
pipeweb start
4. You should be able to browse the result on the web page
http://localhost:8080
*** Getting a new pipe framework
To get a new pipe framework, with sample main and segment scripts:
pipeutils -c pipename
** Writing Pipes
*** Pipeline architecture
If several multiplex directives are found, the last one is retained.
- @multiplex gather : The input set contains one tuple of all the outputs.
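The difference between the default cross-product combination of parent
outputs and the gather directive can be sketched in plain Python. This
is an illustrative assumption about the combination logic, not
Pipelet's actual implementation:

```python
from itertools import product

# Outputs of two hypothetical parent segments
parent_a = [1, 2]
parent_b = ['x', 'y']

# Default behaviour (cross product): one task per combination of parent outputs
cross = list(product(parent_a, parent_b))

# @multiplex gather: a single task receiving one tuple of all the outputs
gather = [tuple(parent_a + parent_b)]
```

With the inputs above, `cross` yields four tasks while `gather` yields
a single one.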
*** Depend directive
*** Orphan segments
TODO TBD
It is possible to store the actual data elsewhere, but you will lose
the benefit of automated versioning, which proves to be quite convenient.
The storage is organized as follows:
- all pipeline instances are stored below a root which corresponds to
the prefix parameter of the Pipeline object.
/prefix/
- all segment meta data are stored below a root whose name corresponds
to a unique match of the segment code.
/prefix/seg_segname_YFLJ65/
- A segment's meta data are:
- a copy of the segment python script
- a copy of all segment hook scripts
- a parameter file (.args) which contains the segment parameter values
- a meta data file (.meta) which contains some extra meta data
- all segment instance data and meta data are stored in a specific
subdirectory whose name corresponds to a string representation of its input
/prefix/seg_segname_YFLJ65/data/1/
- if there is a single segment instance, then data are stored in
/prefix/seg_segname_YFLJ65/data/
- If a segment has at least one parent, its root will be located below
that of one of its parents:
/prefix/seg_segname_YFLJ65/seg_segname2_PLMBH9/
- etc...
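The directory scheme above can be sketched as follows. The hash suffix
and the helper names are illustrative assumptions; Pipelet computes
the real suffix from the segment code:

```python
import hashlib
import os

def seg_dir(prefix, segname, code):
    # Illustrative: a suffix derived from the segment code, so that a
    # change in the code yields a new storage root.
    suffix = hashlib.md5(code.encode()).hexdigest()[:6].upper()
    return os.path.join(prefix, "seg_%s_%s" % (segname, suffix))

def task_dir(segdir, input_repr=None):
    # A single segment instance uses data/ directly; several instances
    # get a subdirectory named after a representation of their input.
    if input_repr is None:
        return os.path.join(segdir, "data")
    return os.path.join(segdir, "data", input_repr)

root = seg_dir("/prefix", "segname", "print('hello')")
```

Editing the segment code changes the suffix, so the old results remain
untouched on disk, which is the automated versioning mentioned above.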
*** The segment environment
The segment code is executed in a specific environment that provides:
1. access to the segment input and output
- seg_input: this variable is a dictionary containing the input of the segment
- get_input(): return the input of the segment
- seg_output: this variable has to be set to a list containing the
output of the segment
2. Functionalities to use the automated hierarchical data storage system.
- get_data_fn(basename): complete the filename with the path to the working directory.
- glob_seg(regexp, seg): return the list of filenames matching regexp from segment seg
- get_tmp_fn(): return a temporary filename.
3. Functionalities to use the automated parameter handling
- var_key: list of parameter names of the segment
- var_tag: list of parameter names which will be made visible from the web interface
- load_param(seg, var_names)
4. Various convenient functionalities
- save_products(filename, var_names='*'): use pickle to save a
part of a given namespace.
- load_products(filename, var_names): update the namespace by
unpickling the requested objects from the file.
- logged_subprocess(lst_args): execute a subprocess and log its output.
- log is a standard logging.Logger object that can be used to log the processing
5. Hooking support
Pipelet enables you to write reusable generic segments by providing a
hooking system via the hook function.
hook(hookname, globals()): execute the Python script 'seg_segname_hookname.py' and update the namespace.
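A minimal segment script using this environment might look as follows.
The stand-ins for seg_input and get_data_fn are illustrative
assumptions based on the descriptions above, not Pipelet's actual
implementation; the variables inside the script body are hypothetical:

```python
import os
import pickle
import tempfile

# Minimal stand-ins for the segment environment (illustrative assumptions)
_workdir = tempfile.mkdtemp()
seg_input = {0: 3}  # dictionary containing the input of the segment

def get_data_fn(basename):
    """Complete the filename with the path to the working directory."""
    return os.path.join(_workdir, basename)

# --- body of a hypothetical segment script ---
value = list(seg_input.values())[0]
result = value ** 2

# store a product under the automated hierarchical storage
with open(get_data_fn("result.pkl"), "wb") as f:
    pickle.dump(result, f)

seg_output = [result]  # passed as input to the downstream segments
```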
** Running Pipes
*** The interactive mode
This mode has been designed to ease debugging. If P is an instance of
the pipeline object, the syntax reads:
from pipelet.launchers import launch_interactive
w, t = launch_interactive(P)
In this mode, each task will be computed in a sequential way.
Do not hesitate to invoke the Python debugger from IPython: %pdb
*** The process mode
In this mode, one can run simultaneous tasks (if the pipe scheme
allows it).
The number of subprocesses is set by the N parameter:
from pipelet.launchers import launch_process
launch_process(P, N)
*** The batch mode
In this mode, one can submit batch jobs to execute the tasks.
The number of jobs is set by the N parameter:
from pipelet.launchers import launch_pbs
launch_pbs(P, N , address=(os.environ['HOST'],50000))
** Browsing Pipes
*** The pipelet webserver and ACL
The pipelet webserver allows the browsing of multiple pipelines.
Each pipeline has to be registered using:
pipeweb track <shortname> sqlfile
As the pipeline browsing implies a disk parsing, some basic security
has to be set up as well. All users have to be registered with a
specific access level (1 for read-only access, and 2 for write access).
pipeutils -a <username> -l 2 sqlfile
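The access check behind this registration can be sketched with
sqlite3. The table name and columns below are hypothetical
assumptions for illustration; Pipelet's actual schema may differ:

```python
import sqlite3

# Illustrative only: we assume a hypothetical 'users' table mapping a
# username to an access level (1 = read-only, 2 = write), as set by
# 'pipeutils -a <username> -l 2 sqlfile'.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (username TEXT PRIMARY KEY, level INTEGER)")
db.execute("INSERT INTO users VALUES ('alice', 2)")

def can_write(db, username):
    # Parameterized query: look up the user's access level
    row = db.execute("SELECT level FROM users WHERE username=?",
                     (username,)).fetchone()
    return row is not None and row[0] >= 2
```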
*** The pipelet webserver
Start the web server using:
pipeweb start
Then the web application will be available on the web page http://localhost:8080
*** The web application
In order to ease the comparison of different processing runs, the web
interface displays various views of the pipeline data:
**** The index page
The index page displays a tree view of all pipeline instances. Each
segment may be expanded or collapsed via the +/- buttons.
The parameters used in each segment are summarized and displayed with
the date of execution and the number of related tasks ordered by
status.
A checkbox allows operations to be performed on multiple segments:
- deletion: to clean unwanted data
- tag: to tag remarkable data
The filter panel allows the segment instances to be displayed with
respect to two criteria:
- tag
- date of execution
**** The code page
Each segment name is a link to its code page. From this page the user
can view the code of all Python scripts which have been applied to the data.
The tree view is reduced to the current segment and its related
parents.
The root path corresponding to the data storage is also displayed.
**** The product page
The number of related tasks, ordered by status, is a link to the product
pages, where the data can be directly displayed (for images or text
files) or downloaded.
From this page it is also possible to delete a specific product and
its dependencies.
**** The log page
The log page can be accessed via the log button of the filter panel.
Logs are ordered by date.
* Advanced usage
** Database reconstruction
In case of an unfortunate loss of the pipeline SQL database, it is
possible to reconstruct it from the disk:
import pipelet
pipelet.utils.rebuild_db_from_disk(prefix, sqlfile)
All information will be retrieved, but with new identifiers.
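Such a reconstruction necessarily works by scanning the storage tree
described in the data storage section. A simplified sketch of that
scan (the directory pattern and collected fields are assumptions based
on the layout above, not the actual rebuild_db_from_disk code) could be:

```python
import os
import re

def scan_segments(prefix):
    """Illustrative sketch: walk the storage tree and collect the segment
    directories from which the database records could be rebuilt."""
    seg_re = re.compile(r"^seg_(?P<name>.+)_[A-Z0-9]{6}$")
    found = []
    for dirpath, dirnames, _ in os.walk(prefix):
        for d in sorted(dirnames):
            m = seg_re.match(d)
            if m:
                found.append((m.group("name"), os.path.join(dirpath, d)))
    return found
```

Each matched directory then carries its own .args and .meta files,
which is why the information survives the database, only the numeric
identifiers being assigned anew.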
** The hooking system
** Writing custom environments
** Using custom dependency schemes
Pipeweb uses the CherryPy web framework server and can be run behind an
Apache web server, which brings essentially two advantages:
- https support.
- faster static files serving.
* The pipelet actors
** The Repository object
** The Pipeline object
** The Task object
** The Scheduler object
** The Tracker object
** The Worker object