Problem
The solutions to #283, #473, and #480 ultimately relied on hacking config$graph to make it keep track of file inputs, file outputs, and trigger-specific dependencies. This is clumsy not only stylistically, but also functionally.
Here and here are places where only the trigger dependencies need to be loaded, but because drake does not retain enough of the original code analysis information, it loads all the target's non-file dependencies.
What little code analysis information drake does retain is copied into the target-level metadata (drake_meta()). At the time, I thought the metadata would be sufficient, but we still need attributes from config$graph for functions like dependencies(). The metadata should store stuff like hashes and be as light as possible. It should not have to store dependency information.
Why not use the data structure from #440 for this? Because the igraph and its constituent upstream/downstream connections are still extremely useful for scheduling. And even if #440 is the right approach, the proposal below will get us several steps closer.
Proposal
I think it's time we relied on igraph vertex attributes to store all the code analysis dependency information in its full complexity. Since it is difficult to make igraphs store objects with length greater than 1, I am thinking we should define an internal R6 class to keep the dependencies of each target and import, and store objects of this class in a "deps" igraph vertex attribute. For the sake of performance, this R6 class should be defined with class = FALSE and cloneable = FALSE. The members should be:
name: the name of the target or import.
type: "target" for targets and "import" for imports.
globals: from code_dependencies()
namespaced: from code_dependencies()
loadd: from code_dependencies()
readd: from code_dependencies()
file_in: from code_dependencies()
knitr_in: from code_dependencies()
file_out: from code_dependencies()
condition_trigger: all dependencies of trigger(condition = stuff)
change_trigger: all dependencies of trigger(change = stuff)
There should also be a few methods for convenience.
objects() - list the non-file dependencies of the command: globals, namespaced, loadd, readd.
files() - file_in, knitr_in, file_out
list() - output a named list with the member data for functions like deps_code() and deps_target().
Yes, this approach will increase the memory requirements of the graph, but the code will rest on more solid footing.
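To make the proposal concrete, here is a minimal sketch of what such a class could look like. It assumes the R6 package is installed; the generator name `deps_class` and the class name "drake_deps" are placeholders invented for illustration, not existing drake internals. Only the member and method names come from the lists above.

```r
library(R6)

# Hypothetical generator: `deps_class` and "drake_deps" are assumed names.
deps_class <- R6Class(
  classname = "drake_deps",
  class = FALSE,      # do not attach an S3 class attribute to each object
  cloneable = FALSE,  # do not add a clone() method to each object
  public = list(
    name = NULL,
    type = NULL,
    globals = character(0),
    namespaced = character(0),
    loadd = character(0),
    readd = character(0),
    file_in = character(0),
    knitr_in = character(0),
    file_out = character(0),
    condition_trigger = character(0),
    change_trigger = character(0),
    initialize = function(name, type, ...) {
      self$name <- name
      self$type <- type
      deps <- base::list(...)
      # Assigning to a name that is not a declared member raises an error
      # because R6 locks the object by default.
      for (field in names(deps)) {
        self[[field]] <- deps[[field]]
      }
    },
    # Non-file dependencies of the command.
    objects = function() {
      unique(c(self$globals, self$namespaced, self$loadd, self$readd))
    },
    # File dependencies of the command.
    files = function() {
      unique(c(self$file_in, self$knitr_in, self$file_out))
    },
    # Named list of all member data.
    list = function() {
      mget(
        c(
          "name", "type", "globals", "namespaced", "loadd", "readd",
          "file_in", "knitr_in", "file_out",
          "condition_trigger", "change_trigger"
        ),
        envir = self
      )
    }
  )
)

d <- deps_class$new(
  name = "model",
  type = "target",
  globals = c("fit", "data"),
  file_in = "in.csv",
  file_out = "out.rds"
)
d$objects()
d$files()
```

Because each object is an environment, igraph can treat it as a single value in a "deps" vertex attribute.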
On second thought, perhaps S3 would be faster than R6 in this case. To minimize the footprint and make each dependency object have length 1, each object could be a new.env(parent = emptyenv(), hash = TRUE). Those methods above could just be S3 methods, and we could add an additional S3 get() method that enforces inherits = FALSE. Ref: richfitz/storr#81 (comment).
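A rough base-R sketch of the S3 alternative might look like the following. All names here (`new_deps()`, `deps_get()`, `deps_objects()`, `deps_files()`, and the class "drake_deps") are hypothetical. The accessor is a new generic rather than a method on base `get()`, since base `get()` is not an S3 generic.

```r
# Hypothetical constructor: one environment per target or import.
new_deps <- function(name, type, ...) {
  env <- new.env(parent = emptyenv(), hash = TRUE)
  fields <- list(name = name, type = type, ...)
  for (field in names(fields)) {
    assign(field, fields[[field]], envir = env)
  }
  class(env) <- "drake_deps"
  env
}

# Accessor that never searches enclosing environments.
deps_get <- function(x, field) UseMethod("deps_get")
deps_get.drake_deps <- function(x, field) {
  get0(field, envir = x, inherits = FALSE, ifnotfound = character(0))
}

# Non-file dependencies of the command.
deps_objects <- function(x) UseMethod("deps_objects")
deps_objects.drake_deps <- function(x) {
  fields <- c("globals", "namespaced", "loadd", "readd")
  unique(as.character(unlist(lapply(fields, function(f) deps_get(x, f)))))
}

# File dependencies of the command.
deps_files <- function(x) UseMethod("deps_files")
deps_files.drake_deps <- function(x) {
  fields <- c("file_in", "knitr_in", "file_out")
  unique(as.character(unlist(lapply(fields, function(f) deps_get(x, f)))))
}

d <- new_deps("report", "target", globals = "fit", knitr_in = "report.Rmd")
deps_objects(d)
deps_files(d)
```

Fields that were never set come back as `character(0)`, so callers do not need to check for missing members.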
Refactoring in the refactor_graph branch seems to be going well. I feel much better about the cleaned-up code in the new build_drake_graph(), and I am confident that this decorated igraph is the right data structure for drake. I do not think we need to bother with any OOP infrastructure since looking up igraph attributes is already easy.
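For illustration, a decorated igraph without any OOP layer could be sketched as below. This is not the actual output of build_drake_graph(): the vertex names, the "deps" attribute layout, and the example dependencies are assumptions. Only standard igraph functions are used.

```r
library(igraph)

# Toy workflow: raw_data feeds both model and report.
edges <- data.frame(
  from = c("raw_data", "raw_data"),
  to = c("model", "report"),
  stringsAsFactors = FALSE
)
vertices <- data.frame(
  name = c("raw_data", "model", "report"),
  stringsAsFactors = FALSE
)
graph <- graph_from_data_frame(edges, directed = TRUE, vertices = vertices)

# One plain named list of dependency information per vertex (assumed layout).
graph <- set_vertex_attr(
  graph,
  name = "deps",
  value = list(
    list(file_in = "data.csv"),
    list(globals = c("fit_model", "raw_data")),
    list(knitr_in = "report.Rmd", loadd = "raw_data")
  )
)

# Look up the dependencies of a single target by name.
model_deps <- vertex_attr(graph, "deps", index = "model")[[1]]
model_deps$globals

# The graph structure still serves scheduling: list upstream targets.
upstream <- as_ids(neighbors(graph, "model", mode = "in"))
upstream
```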
My goal is to finish debugging and testing by the end of this upcoming week, but I do not know how long it will take.