Pipelet: commit d0967eb1
Authored Feb 11, 2011 by Maude Le Jeune

    new name for glob_seg. README updated.

Parent: db310f0f
Showing 6 changed files with 54 additions and 23 deletions:

README.org               +19 -9
TODO.org                 +18 -6
pipelet/environment.py    +4 -4
test/multiplex/a.py       +2 -1
test/multiplex/b.py       +3 -3
test/multiplex/c.py       +8 -0

README.org
@@ -293,6 +293,13 @@ or
  id = seg_input.values()[0]
  #+end_src
+ In this scheme, it is important to uniquely identify the child tasks
+ of the orphan segment by setting a dedicated output.
+ #+begin_src python
+ seg_output = id
+ #+end_src
  See section [[*The%20segment%20environment][The segment environment]] for more details.
  *** Hierarchical data storage
@@ -319,7 +326,8 @@ The storage is organized as follows:
  - a meta data file (.meta) which contains some extra meta data
  - all segment instances data and meta data are stored in a specific subdirectory
    which name corresponds to a string representation of its input
-   =/prefix/segname_YFLJ65/data/1/=
+   prefix by its identifier number
+   =/prefix/segname_YFLJ65/data/1_a/=
  - if there is a single segment instance, then data are stored in
    =/prefix/segname_YFLJ65/data/=
  - If a segment has at least one parent, its root will be located below
@@ -350,14 +358,16 @@ The segment code is executed in a specific environment that provides:
  - =seg_output=: this variable has to be a list.
  2. Functionalities to use the automated hierarchical data storage system.
- - =get_data_fn(basename)=: complete the filename with the path to the working directory.
- - =glob_seg(seg, regexp)=: Return the list of filename matching the pattern y in the
-   data directory of parent tasks from the parent segment x.
- - =glob_seg_all(seg, regexp)=: Return the list of filename matching
-   y in the working directory of segment x independantly of
-   whether the file comes from a task related to the current
-   task. glob_seg_all is provided to reproduce the behaviour of
-   old glob_seg for backward compatibility. Its usage should be limited as it:
+ - =get_data_fn(basename)=: complete the filename with the path to
+   the working directory.
+ - =glob_parent(regexp, segs)=: Return the list of filename matching
+   the pattern y in the data directory of direct parent tasks. It
+   is possible to search only in a specific segment list segs.
+ - =glob_seg(seg, regexp)=: Return the list of filename matching the
+   pattern y in the data directory of parent segment x (all task
+   directories are searched, independantly of whether the file
+   comes from a task related to the current task). Its usage
+   should be limited as it:
    - potentially breaks the dependancy scheme.
    - may hurt performances as all task directories of the segment
      x will be searched.
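
Note: to make the difference between the two globbing functions described in this hunk concrete, here is a minimal, self-contained sketch. It is not Pipelet's implementation: `glob_tasks`, `data_dir`, and the task directory names are hypothetical stand-ins, and the only idea borrowed from the commit is the `glob(path.join(data_dir, task, pattern))` call visible in pipelet/environment.py. Restricting the search to a given list of task directories mimics `glob_parent`; searching every task directory mimics `glob_seg`.

```python
import os
import tempfile
from glob import glob


def glob_tasks(data_dir, pattern, task_dirs=None):
    """Return files matching `pattern` inside task subdirectories of `data_dir`.

    With `task_dirs` given, only those subdirectories are searched (the
    glob_parent-like case). With `task_dirs=None`, every subdirectory is
    searched (the glob_seg-like case). Hypothetical helper, not Pipelet API.
    """
    if task_dirs is None:
        task_dirs = sorted(
            d for d in os.listdir(data_dir)
            if os.path.isdir(os.path.join(data_dir, d))
        )
    res = []
    for t in task_dirs:
        res += sorted(glob(os.path.join(data_dir, t, pattern)))
    return res


# Build a toy layout: <tmp>/data/{1,2,3}/file<N>.dat
root = tempfile.mkdtemp()
data_dir = os.path.join(root, "data")
for task in ("1", "2", "3"):
    os.makedirs(os.path.join(data_dir, task))
    open(os.path.join(data_dir, task, "file%s.dat" % task), "w").close()

# Only the directories of the (assumed) parent tasks 1 and 2 are searched.
from_parents = glob_tasks(data_dir, "file*.dat", task_dirs=["1", "2"])
# Every task directory of the segment is searched.
from_segment = glob_tasks(data_dir, "file*.dat")

print([os.path.basename(f) for f in from_parents])
print([os.path.basename(f) for f in from_segment])
```

The unrestricted call touches every task directory, which is where the performance warning in the README text comes from.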
TODO.org
@@ -2,10 +2,21 @@
  I see at least before three projects to complete before making the first release:
  * The task_id project is not closed:
  This is release critical.
  - [ ] There is some dark zone (sideeffects):
- - [ ] how segment without seg_output are treated (no task is stored, what happened when we delete these kind of segs ...)
- - [ ] Any problem with Orphan tasks ?
- - [ ] problem for parents giving same str_input outside the special case of groud_by
+ - [ ] how segment without seg_output are treated (no task is
+   stored, what happened when we delete these kind of segs
+   ...).
+ - [ ] Any problem with Orphan tasks ? -> Main issue here :
+   task are identified by parent list. For orphan task the
+   current solution is to use product instead but there is two
+   exceptions here : group_by and ouput constant (None for
+   example). In those two cases: if seg_input changes from
+   main, no recomputation. Is it possible to store parent of
+   orphan task (phantom) ?
+ - [X] problem for parents giving same str_input outside the
+   special case of groud_by -> compute as many task as parents
+   even if str_input is the same.
  - [ ] Does the tracking on disk allow to reconstruct the database
  - [ ] I modified the task format to allow for easy glob_seg. It may have break other things (At least the database reconstruction)
  - [X] Is the treatment of redundant tasks resulting from group_by OK
@@ -15,14 +26,15 @@ I see at least before three projects to complete before making the first release
  release critical (this is the case of the glob_seg project I think),
  API changes for which compatibility could be maintained by
  publishing some kind of OldEnvironment classes are not.
- - [ ] I started a project (glob_seg/glob_seg_all separation) to make
+ - [X] I started a project (glob_seg/glob_seg_all separation) to make
    glob_seg easier to use and more efficient when large number of
    task are present (look only in the requested directory). This is not finalize:
    - It allows only to glob the direct parent. Going further causes a
      big "depth" problem in the current state.
    - It is probably buggy with side effects (see what happened on
-     segment without seg_output.
-   - Its calling signature should be rethought
+     segment without seg_output) -> no bug here, but search give no
+     result. I think this is not an issue (Maude)
+   - [ ] Its calling signature should be rethought
    - The name should be changed in glob_parents and glob_seg
  - [ ] Are we satisfied with seg_input, is this convenient, should we
    provide extra function to ease the retrieval of inputs.
pipelet/environment.py
@@ -182,10 +182,10 @@ class Environment(EnvironmentBase):
         self.logger.info("hooking %s" % hook_name)
         return self._hook(hook_name, glo)
-    def glob_seg(self, y, segs=None):
+    def glob_parent(self, y, segs=None):
         """ Globbing limited to the fatherhood
-        For unlimited globbing see glob_seg_all.
+        For unlimited globbing see glob_seg.
         Parameters
         ----------
@@ -212,11 +212,11 @@ class Environment(EnvironmentBase):
             res += glob(path.join(self._worker.pipe.get_data_dir(segx), t, y))
         return res
-    def glob_seg_all(self, x, y):
+    def glob_seg(self, x, y):
         """ Return the list of filename matching y in the working
         directory of segment x.
-        Usage of glob_seg_all should be limited:
+        Usage of glob_seg should be limited:
         - potentially breaks the dependancy scheme
         - May hurt performances as all task directories of the segment x will be searched
test/multiplex/a.py
print seg_input
import os
os.system("touch %s" % get_data_fn("file%d.dat" % seg_input.values()[0]))
seg_output = [seg_input.values()[0]]
test/multiplex/b.py
#seg_output = ['aa', 'bb']
import os
print seg_input
seg_output = [(seg_input.values()[0], "pp")]
os.system("touch %s" % get_data_fn("bbbb%d.dat" % seg_input.values()[0]))
#seg_output = [(seg_input.values()[0], "pp")]
test/multiplex/c.py
#multiplex cross_prod group_by "0"
print seg_input
import os
f = glob_seg("a", "file*.dat")
for file in f:
    os.system("cp %s %s" % (file, get_data_fn("fromglobseg_%s" % os.path.basename(file))))
g = glob_parent("b*.dat")
for file in g:
    os.system("cp %s %s" % (file, get_data_fn("fromglobparent_%s" % os.path.basename(file))))
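
Note: the copy loops in test/multiplex/c.py shell out to `cp` through `os.system` without checking the return status, and the command string would break on file names containing spaces. As a hedged alternative, the sketch below performs the same kind of prefixed copy with `shutil.copy`. `copy_with_prefix` and `dest_dir` are hypothetical names; `dest_dir` stands in for whatever directory `get_data_fn` resolves to, which is not reproduced here.

```python
import os
import shutil
import tempfile


def copy_with_prefix(files, dest_dir, prefix):
    """Copy each file into dest_dir as <prefix><basename>; return the new paths.

    Hypothetical helper: dest_dir plays the role of the directory that
    get_data_fn would point to in a real segment.
    """
    copied = []
    for src in files:
        dst = os.path.join(dest_dir, prefix + os.path.basename(src))
        shutil.copy(src, dst)  # raises an exception if the copy fails
        copied.append(dst)
    return copied


# Toy input files, one of them with a space in its name.
src_dir = tempfile.mkdtemp()
dest_dir = tempfile.mkdtemp()
sources = []
for name in ("file1.dat", "file 2.dat"):
    p = os.path.join(src_dir, name)
    with open(p, "w") as fh:
        fh.write(name)
    sources.append(p)

copied = copy_with_prefix(sources, dest_dir, "fromglobseg_")
print(sorted(os.path.basename(p) for p in copied))
```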