Embrace igraph vertex attributes for storing code analysis information #483

Closed
wlandau opened this issue Jul 28, 2018 · 4 comments

wlandau commented Jul 28, 2018

Problem

The solutions to #283, #473, and #480 ultimately relied on hacking config$graph to make it keep track of file inputs, file outputs, and trigger-specific dependencies. This is clumsy not only stylistically, but also functionally.

  • In two places in the code base, only the trigger dependencies need to be loaded, but because drake does not retain enough of the original code analysis information, it loads all of the target's non-file dependencies.
  • What little code analysis information drake does retain is copied into the target-level metadata (drake_meta()). At the time, I thought the metadata would be sufficient, but we still need attributes from config$graph for functions like dependencies(). The metadata should store stuff like hashes and be as light as possible. It should not have to store dependency information.

Why not use the data structure from #440 for this? Because the igraph and its constituent upstream/downstream connections are still extremely useful for scheduling. And even if #440 is the right approach, the proposal below will get us several steps closer.

Proposal

I think it's time we relied on igraph vertex attributes to store all the code analysis dependency information in its full complexity. Since it is difficult to make igraphs store objects with length greater than 1, I am thinking we should define an internal R6 class to keep the dependencies of each target and import and store objects of this class in a "deps" igraph vertex attribute. For the sake of performance, this R6 class should be defined with class = FALSE and cloneable = FALSE. The members should be

  • name: the name of the target or import.
  • type: "target" for targets and "import" for imports.
  • globals: from code_dependencies()
  • namespaced: from code_dependencies()
  • loadd: from code_dependencies()
  • readd: from code_dependencies()
  • file_in: from code_dependencies()
  • knitr_in: from code_dependencies()
  • file_out: from code_dependencies()
  • condition_trigger: all dependencies of trigger(condition = stuff)
  • change_trigger: all dependencies of trigger(change = stuff)

There should also be a couple methods for convenience.

  • objects() - list the non-file dependencies of the command: globals, namespaced, loadd, readd.
  • files() - file_in, knitr_in, file_out
  • list() - output a named list with the member data for functions like deps_code() and deps_target().

Yes, this approach will increase the memory requirements of the graph, but the code will rest on more solid footing.

wlandau commented Jul 28, 2018

On second thought, perhaps S3 would be faster than R6 in this case. To minimize the footprint and make each dependency object have length 1, each object could be a new.env(parent = emptyenv(), hash = TRUE). The methods above could just be S3 methods, and we could add an additional S3 get() method that enforces inherits = FALSE. Ref: richfitz/storr#81 (comment).
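A minimal base-R sketch of this idea, for illustration only. The class name `drake_deps` and the function names `new_deps()`, `dep_field()`, `object_deps()`, and `file_deps()` are hypothetical and are not part of drake; only the field names come from the proposal earlier in this thread.

```r
# Field names taken from the proposal in this thread.
dep_fields <- c(
  "globals", "namespaced", "loadd", "readd",
  "file_in", "knitr_in", "file_out",
  "condition_trigger", "change_trigger"
)

# Hypothetical constructor: one environment per target or import.
new_deps <- function(name, type, ...) {
  supplied <- list(...)
  deps <- new.env(parent = emptyenv(), hash = TRUE)
  deps$name <- name
  deps$type <- type
  for (field in dep_fields) {
    value <- supplied[[field]]
    # Fields that were not supplied default to an empty character vector.
    deps[[field]] <- if (is.null(value)) character(0) else value
  }
  class(deps) <- "drake_deps" # hypothetical class name
  deps
}

# Lookup that never searches enclosing environments.
dep_field <- function(x, field) UseMethod("dep_field")
dep_field.drake_deps <- function(x, field) {
  get(field, envir = x, inherits = FALSE)
}

# Helper: concatenate several fields into one character vector.
collect_fields <- function(x, fields) {
  as.character(unlist(lapply(fields, function(field) dep_field(x, field))))
}

# Non-file dependencies of the command.
object_deps <- function(x) UseMethod("object_deps")
object_deps.drake_deps <- function(x) {
  collect_fields(x, c("globals", "namespaced", "loadd", "readd"))
}

# File dependencies of the command.
file_deps <- function(x) UseMethod("file_deps")
file_deps.drake_deps <- function(x) {
  collect_fields(x, c("file_in", "knitr_in", "file_out"))
}

# Made-up example target.
deps <- new_deps(
  name = "report", type = "target",
  globals = c("fit_model", "dataset"),
  file_in = "data.csv",
  file_out = "report.html"
)
object_deps(deps)
file_deps(deps)
```

Because an environment always has length 1, an object like this fits into a single slot of a list-valued vertex attribute regardless of how many dependencies it holds.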

wlandau commented Jul 28, 2018

And code_dependencies() does not necessarily need to return an object of this type. It just needs to make it into the graph.

wlandau commented Jul 28, 2018

Refactoring in the refactor_graph branch seems to be going well. I feel much better about the cleaned-up code in the new build_drake_graph(), and I am confident that this decorated igraph is the right data structure for drake. I do not think we need to bother with any OOP infrastructure since looking up igraph attributes is already easy.
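To illustrate what a decorated igraph could look like, here is a rough sketch. It assumes the igraph package is installed. The toy workflow, the attribute name "deps", and the contents of each entry are made up for this example and do not reproduce what build_drake_graph() actually stores.

```r
library(igraph)

# Toy workflow: the target "report" depends on the target "dataset".
edges <- data.frame(from = "dataset", to = "report", stringsAsFactors = FALSE)
graph <- graph_from_data_frame(edges, directed = TRUE)

# Illustrative dependency records, one per vertex, keyed by vertex name.
deps <- list(
  dataset = list(globals = character(0), file_in = "data.csv"),
  report = list(globals = c("dataset", "fit_model"), file_out = "report.html")
)

# A list-valued vertex attribute has one element per vertex, so each
# element can hold a structure of arbitrary length.
graph <- set_vertex_attr(
  graph,
  name = "deps",
  value = unname(deps[V(graph)$name])
)

# Look up the dependency record of a single target by name.
report_deps <- vertex_attr(graph, name = "deps", index = "report")[[1]]
report_deps$globals

# The edges remain available for scheduling.
upstream <- neighbors(graph, "report", mode = "in")$name
upstream
```

The attribute lookup is a one-liner, which is in line with the point above that extra OOP infrastructure may not be needed.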

My goal is to finish debugging and testing by the end of this upcoming week, but I do not know how long it will take.

wlandau commented Jul 29, 2018

Fixed via #484.
