Pipelet
Commit c1da684c authored Sep 02, 2010 by Marc Betoule
Ze readme
parent
a03eb935
Changes
1
Showing 1 changed file with 116 additions and 17 deletions
README.org
...
...
@@ -104,6 +104,18 @@ of "a" will be fed as input to "b". In the given example, the node
there is no relation between "b" and "c", so which of the two will be
executed first is not defined.
*** The Pipeline object
In practice, creating a Pipeline object requires 3 arguments:
Pipeline(pipedot, codedir=, prefix=)
- pipedot is the string description of the pipeline
- codedir is the path of the code of the segments
- prefix is the path of the data repository
*** Dependencies between segments
The modification of the code of one segment will trigger its
recalculation and the recalculation of all the segments which
depend on it.
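The propagation rule above can be sketched as a graph traversal. This is only an illustration, not Pipelet's actual implementation: the `children` mapping, the `segments_to_recompute` helper and the example graph (a feeds b and c, which both feed d) are hypothetical.

```python
from collections import deque


def segments_to_recompute(children, modified):
    """Return the modified segment plus every segment downstream of it.

    `children` maps a segment name to the segments that consume its output
    (a hypothetical representation, chosen for this sketch).
    """
    stale = {modified}
    to_visit = deque([modified])
    while to_visit:
        seg = to_visit.popleft()
        for child in children.get(seg, ()):
            if child not in stale:
                stale.add(child)
                to_visit.append(child)
    return stale


# Assumed example graph: a -> b -> d and a -> c -> d
children = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}
print(sorted(segments_to_recompute(children, "b")))
```

Editing the code of "b" marks "b" and "d" as stale, while "a" and "c" keep their previous results.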
...
...
@@ -125,38 +137,125 @@ strings he receives separated by a space, the final output set of
segment "d" will be: [('Lancelot the Brave'), ('Lancelot the Pure'),
('Galahad the Brave'), ('Galahad the Pure')].
*** Multiplex directive
This default behavior can be altered by specifying a @multiplex
directive in the comments of the segment code. If several multiplex
directives are found, the last one is retained.
- @multiplex cross_prod : activate the default behaviour
- @multiplex zip : similar to the Python zip function. The input set is
a list of tuples, where each tuple contains the i-th element from
each of the parents' sorted output lists. If the lists have different
sizes, the shortest is used.
- @multiplex union : the input set contains all the outputs.
- @multiplex gather : the input set contains one tuple of all the outputs.
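The four modes listed above, and the rule that the last directive wins, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Pipelet's code: `find_multiplex` and `build_input_set` are hypothetical helpers, the regular expression is a guess at how a directive might be read from a comment, and each parent output is assumed to be a plain list.

```python
import re
from itertools import chain, product

# Assumption: a directive sits in a '#' comment, e.g. "# @multiplex zip".
DIRECTIVE = re.compile(r"#.*@multiplex\s+(\w+)")


def find_multiplex(code, default="cross_prod"):
    """Return the last @multiplex mode found in `code`, or the default."""
    found = DIRECTIVE.findall(code)
    return found[-1] if found else default


def build_input_set(parent_outputs, mode="cross_prod"):
    """Combine the output lists of the parent segments into an input set."""
    if mode == "cross_prod":
        # Every combination of one element per parent (default behaviour).
        return list(product(*parent_outputs))
    if mode == "zip":
        # i-th elements of the sorted outputs; the shortest list sets the length.
        return list(zip(*(sorted(out) for out in parent_outputs)))
    if mode == "union":
        # All outputs, one input per element.
        return list(chain.from_iterable(parent_outputs))
    if mode == "gather":
        # A single tuple holding all outputs.
        return [tuple(chain.from_iterable(parent_outputs))]
    raise ValueError("unknown multiplex mode: %r" % mode)


knights = ["Lancelot", "Galahad"]
titles = ["the Brave", "the Pure"]
for mode in ("cross_prod", "zip", "union", "gather"):
    print(mode, build_input_set([knights, titles], mode))
```

With `cross_prod`, the two parents yield four inputs, one per (knight, title) pair, matching the default behaviour described earlier.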
*** Orphan segments
If a segment code has to be applied to several data items, the pipe engine
creates as many subtasks as there are items in the dataset. This behaviour
is specified by setting the output variable of the upstream segment to a
list. There will then be one task per element of the list, and each
task will receive one list element as its input.
TODO TBD
Depend directive
*** Hierarchical data storage
This system provides versioning of your data and easy access through
the web interface. It is also used to keep track of the code, of the
execution logs, and of various meta-data of the processing. Of course,
you remain able to bypass the hierarchical storage and store your
actual data elsewhere, but you will lose the benefit of automated
versioning, which proves to be quite convenient.
The storage is organized as follows:
all data are stored below a root
*** The segment environment
The segment code is executed in a specific environment that provides:
1. access to the segment input and output
- seg_input: this variable is a dictionary containing the input of the segment
- get_input():
- seg_output: this variable has to be set to a list containing the
2. Functionalities to use the automated hierarchical data storage system.
- get_data_fn(basename): complete the filename with the path to the working directory.
- glob_seg():
- get_tmp_fn(): return a temporary filename.
3. Various convenient functionalities
- load_param(seg, var_names)
- save_products(filename='', var_names='*'): use pickle to save a
part of a given namespace.
- load_products(filename, var_names): update the namespace by
unpickling the requested objects from the file.
- logged_subprocess(lst_args): execute a subprocess and log its output.
- log is a standard logging.Logger object that can be used to log the processing
4. Hooking support
Pipelet enables you to write reusable generic
segments by providing a hooking system via the hook function.
hook (hookname, globals()): execute Python script ‘seg_segname_hookname.py’ and update the namespace.
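A hook of this kind can be sketched as "find the script, execute it in the caller's namespace". The version below is an illustration only, not Pipelet's implementation: the extra `segname` and `codedir` arguments are hypothetical (the real environment presumably knows both already), as are the "fit" segment and "preproc" hook used in the demonstration.

```python
import os
import tempfile


def hook(hookname, namespace, segname, codedir):
    """Execute seg_<segname>_<hookname>.py and update `namespace` in place."""
    path = os.path.join(codedir, "seg_%s_%s.py" % (segname, hookname))
    with open(path) as f:
        source = f.read()
    # Names defined by the hook script end up in `namespace`.
    exec(compile(source, path, "exec"), namespace)


# Demonstration with a throwaway code directory and a hypothetical hook.
codedir = tempfile.mkdtemp()
with open(os.path.join(codedir, "seg_fit_preproc.py"), "w") as f:
    f.write("scale = 2 * base\n")

namespace = {"base": 21}
hook("preproc", namespace, segname="fit", codedir=codedir)
print(namespace["scale"])
```

Because the script runs in the namespace it is given, a generic segment can delegate a step to a hook script and then read the variables that script defined.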
If a segment code needs several outputs to run, the output variable of the upstream segments has to be set to None.
Default segment environment
Some useful functionalities are available from the segment script environment.
Filename tools:
fullname = get_data_fn (shortname) : complete the filename with the path to the working directory.
fullname = get_tmp_fn (): return a temporary filename
lst_file = glob_seg (regexp, seg): return the list of filenames matching regexp from segment seg
Parameter tools
input: the output value from the upstream segment.
output : the input value of the downstream segment.
load_param (seg, globals(), lst_par) : update the namespace with parameters of segment seg
save_products (filename, globals(), lst_par): use pickle to save a part of a given namespace.
load_products (filename, globals(), lst_par): update the namespace by unpickling the requested objects from the file.
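The two pickle helpers can be sketched as follows. This is a minimal illustration that mirrors the signatures listed above; the on-disk layout (a single pickled dictionary) is an assumption, and the real functions may store things differently.

```python
import os
import pickle
import tempfile


def save_products(filename, namespace, var_names):
    """Pickle the selected variables of `namespace` into `filename`."""
    selected = {name: namespace[name] for name in var_names}
    with open(filename, "wb") as f:
        pickle.dump(selected, f)


def load_products(filename, namespace, var_names):
    """Update `namespace` with the requested variables read from `filename`."""
    with open(filename, "rb") as f:
        stored = pickle.load(f)
    namespace.update({name: stored[name] for name in var_names})


# Demonstration with plain dictionaries standing in for globals().
filename = os.path.join(tempfile.mkdtemp(), "products.pkl")
producer = {"flux": [1.0, 2.5], "n_iter": 3, "scratch": "not saved"}
save_products(filename, producer, ["flux", "n_iter"])

consumer = {}
load_products(filename, consumer, ["flux"])
print(consumer)
```

Only the names passed in the list are written or restored, so temporary variables of a segment never leak into the saved products.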
Code dependency tools
logged_subprocess (lst_args) : execute a subprocess and log its output.
hook (hookname, globals()): execute Python script ‘seg_segname_hookname.py’ and update the namespace.
Loading another environment
*** Depend directive
** Running Pipes
*** The interactive mode
This mode has been designed to ease debugging. If P is an instance of the Pipeline object, the syntax reads:
from pipelet.launchers import launch_interactive
w, t = launch_interactive(P)
w.run()
In this mode, tasks are computed sequentially.
Do not hesitate to invoke the Python debugger from IPython: %pdb
*** The process mode
*** The batch mode
** Browsing Pipes
*** The pipelet webserver
*** The web application
- The various views (index, pipeline, segment, tasks)
-
*** ACL
* Advanced usage
** Database reconstruction
** The hooking system
** Writing custom environments
** Using custom dependency schemes
** Launching pipeweb behind apache
Pipeweb uses the CherryPy web framework server and can be run behind an
Apache web server, which brings essentially two advantages:
- https support.
- faster serving of static files.