Embrace igraph vertex attributes for storing code analysis information #483

wlandau · 2018-07-28T04:27:34Z

Problem

The solutions to #283, #473, and #480 ultimately relied on hacking config$graph to make it keep track of file inputs, file outputs, and trigger-specific dependencies. This is clumsy not only stylistically, but also functionally.

Here and here are places where only the trigger dependencies need to be loaded, but because drake does not retain enough of the original code analysis information, it loads all the target's non-file dependencies.
What little code analysis information drake does retain is copied into the target-level metadata (drake_meta()). At the time, I thought the metadata would be sufficient, but we still need attributes from config$graph for functions like dependencies(). The metadata should store stuff like hashes and be as light as possible. It should not have to store dependency information.

Why not use the data structure from #440 for this? Because the igraph and its constituent upstream/downstream connections are still extremely useful for scheduling. And even if #440 is the right approach, the proposal below will get us several steps closer.

Proposal

I think it's time we relied on igraph vertex attributes to store all the code analysis dependency information in its full complexity. Since it is difficult to make igraphs store objects with length greater than 1, I am thinking we should define an internal R6 class to keep the dependencies of each target and import and store objects of this class in a "deps" igraph vertex attribute. For the sake of performance, this R6 class should be defined with class = FALSE and cloneable = FALSE. The members should be

name: the name of the target or import.
type: "target" for targets and "import" for imports.
globals: from code_dependencies()
namespaced: from code_dependencies()
loadd: from code_dependencies()
readd: from code_dependencies()
file_in: from code_dependencies()
knitr_in: from code_dependencies()
file_out: from code_dependencies()
condition_trigger: all dependencies of trigger(condition = stuff)
change_trigger: all dependencies of trigger(change = stuff)

There should also be a couple methods for convenience.

objects() - list the non-file dependencies of the command: globals, namespaced, loadd, readd.
files() - file_in, knitr_in, file_out
list() - output a named list with the member data for functions like deps_code() and deps_target().

Yes, this approach will increase the memory requirements of the graph, but the code will rest on more solid footing.
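For illustration only, the class described above might look roughly like the sketch below. It assumes the R6 package is installed. The class name DrakeDeps, the constructor signature, and the empty-vector defaults are hypothetical choices, not something this issue specifies.

```r
library(R6)

# "DrakeDeps" is a hypothetical name; the proposal does not name the class.
DrakeDeps <- R6Class(
  "DrakeDeps",
  class = FALSE,     # do not attach an S3 class attribute to each object
  cloneable = FALSE, # do not generate a clone() method
  public = list(
    name = NULL,
    type = NULL,
    globals = character(0),
    namespaced = character(0),
    loadd = character(0),
    readd = character(0),
    file_in = character(0),
    knitr_in = character(0),
    file_out = character(0),
    condition_trigger = character(0),
    change_trigger = character(0),
    initialize = function(name, type = c("target", "import"), ...) {
      self$name <- name
      self$type <- match.arg(type)
      fields <- base::list(...)
      # Only the members declared above can be set here, because R6 locks
      # the object environment against new bindings by default.
      for (field in names(fields)) {
        self[[field]] <- fields[[field]]
      }
    },
    # Non-file dependencies of the command.
    objects = function() {
      unique(c(self$globals, self$namespaced, self$loadd, self$readd))
    },
    # File dependencies and outputs.
    files = function() {
      unique(c(self$file_in, self$knitr_in, self$file_out))
    },
    # Named list of all member data.
    list = function() {
      fields <- c(
        "name", "type", "globals", "namespaced", "loadd", "readd",
        "file_in", "knitr_in", "file_out",
        "condition_trigger", "change_trigger"
      )
      mget(fields, envir = self)
    }
  )
)

# Usage with made-up dependency names:
d <- DrakeDeps$new(
  "model", "target",
  globals = c("f", "data"),
  file_in = "raw.csv",
  file_out = "fit.rds"
)
d$objects()
d$files()
```

Each object is a single environment, so it can sit in one slot of a list-valued vertex attribute regardless of how many dependencies it holds.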


wlandau · 2018-07-28T04:38:36Z

On second thought, perhaps S3 would be faster than R6 in this case. To minimize the footprint and make each dependency object have length 1, each object could be a new.env(parent = emptyenv(), hash = TRUE). Those methods above could just be S3 methods, and we could add an additional S3 get() method that enforces inherits = FALSE. Ref: richfitz/storr#81 (comment).
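A rough base-R sketch of that S3 alternative follows. Everything named here is an assumption made for illustration: the class name drake_deps, the constructor new_deps(), and the generics deps_get(), deps_objects(), and deps_files(). The generics carry a deps_ prefix so that they do not mask base::get() and base::objects().

```r
# Hypothetical constructor: one hashed environment per target or import,
# with the empty environment as parent so nothing leaks in from base.
new_deps <- function(name, type = c("target", "import"), ...) {
  env <- new.env(parent = emptyenv(), hash = TRUE)
  fields <- c(list(name = name, type = match.arg(type)), list(...))
  list2env(fields, envir = env)
  class(env) <- "drake_deps"
  env
}

# Hypothetical getter that never searches enclosing environments.
deps_get <- function(x, field) UseMethod("deps_get")
deps_get.drake_deps <- function(x, field) {
  if (exists(field, envir = x, inherits = FALSE)) {
    get(field, envir = x, inherits = FALSE)
  } else {
    character(0)
  }
}

# Non-file dependencies of the command.
deps_objects <- function(x) UseMethod("deps_objects")
deps_objects.drake_deps <- function(x) {
  fields <- c("globals", "namespaced", "loadd", "readd")
  values <- lapply(fields, function(field) deps_get(x, field))
  unique(unlist(values, use.names = FALSE))
}

# File dependencies and outputs.
deps_files <- function(x) UseMethod("deps_files")
deps_files.drake_deps <- function(x) {
  fields <- c("file_in", "knitr_in", "file_out")
  values <- lapply(fields, function(field) deps_get(x, field))
  unique(unlist(values, use.names = FALSE))
}

# Usage with made-up dependency names:
d <- new_deps(
  "report", "target",
  globals = "summarize",
  loadd = "fit",
  knitr_in = "report.Rmd"
)
deps_objects(d)
deps_files(d)
deps_get(d, "print") # base::print is not found: no inheritance
```

With emptyenv() as the parent, inherits = FALSE is partly a safeguard. It keeps lookups strictly local even if the parent environment is later changed.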

wlandau · 2018-07-28T12:11:36Z

And code_dependencies() does not necessarily need to return an object of this type. It just needs to make it into the graph.

wlandau · 2018-07-28T22:18:33Z

Refactoring in the refactor_graph branch seems to be going well. I feel much better about the cleaned-up code in the new build_drake_graph(), and I am confident that this decorated igraph is the right data structure for drake. I do not think we need to bother with any OOP infrastructure since looking up igraph attributes is already easy.

My goal is to finish debugging and testing by the end of this upcoming week, but I do not know how long it will take.
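As a hedged illustration of a decorated igraph without any OOP layer, the sketch below stores one plain list of dependencies per vertex in a "deps" attribute. It assumes the igraph package is installed. The toy graph, the field names, and the values are invented for the example, and this is not necessarily how the refactor_graph branch lays out its attributes.

```r
library(igraph)

# Toy dependency graph: import f -> target a -> target b.
edges <- data.frame(
  from = c("f", "a"),
  to = c("a", "b"),
  stringsAsFactors = FALSE
)
graph <- graph_from_data_frame(edges, directed = TRUE)

# Hypothetical code analysis results, one plain list per vertex.
deps <- list(
  f = list(type = "import", globals = character(0), file_in = character(0)),
  a = list(type = "target", globals = "f", file_in = "data.csv"),
  b = list(type = "target", globals = c("a", "f"), file_in = character(0))
)

# Store the lists in a list-valued vertex attribute, matched by vertex name.
graph <- set_vertex_attr(graph, name = "deps", value = deps[V(graph)$name])

# Look the information up again without any class machinery.
all_deps <- vertex_attr(graph, "deps")
names(all_deps) <- V(graph)$name
all_deps[["b"]]$globals

# The same graph still provides the ordering needed for scheduling.
as_ids(topo_sort(graph))
```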

wlandau · 2018-07-29T06:49:27Z

Fixed via #484.

wlandau added the labels difficulty: advanced, topic: performance, status: priority, and topic: style on Jul 28, 2018

wlandau self-assigned this Jul 28, 2018

wlandau mentioned this issue Jul 29, 2018

Refactor graph #484

Merged

7 tasks

wlandau closed this as completed Jul 29, 2018

wlandau mentioned this issue Jul 30, 2018

Add custom file_out() files back to dependency graph visuals #486

Closed

wlandau mentioned this issue Nov 3, 2018

A new internal fast-lookup data structure for target information #440

Closed
