Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Symbols loaded after make() #253

Closed
krlmlr opened this issue Feb 14, 2018 · 21 comments
Closed

Symbols loaded after make() #253

krlmlr opened this issue Feb 14, 2018 · 21 comments

Comments

@krlmlr
Copy link
Collaborator

krlmlr commented Feb 14, 2018

When running make() for a target, it is available in the global environment only if it is built during this run of make() (with the default config). Can we implement a more consistent way of handling built targets?

library(drake)

plan <- drake_plan(
  target1 = 5
)

make(plan)
#> cache /tmp/RtmpMxWAKq/.drake
#> connect 1 import: plan
#> connect 1 target: target1
#> check 1 item: target1
#> target target1
target1
#> [1] 5

make(plan)
#> cache /tmp/RtmpMxWAKq/.drake
#> Unloading targets from environment:
#>   target1
#> connect 1 import: plan
#> connect 1 target: target1
#> check 1 item: target1
#> All targets are already up to date.
target1
#> Error in eval(expr, envir, enclos): object 'target1' not found

Created on 2018-02-14 by the reprex package (v0.2.0).

@wlandau
Copy link
Member

wlandau commented Feb 14, 2018

For now, users can get around this by creating a custom envir for make(). Going forward, we can definitely address this. By default, drake operates in the user's workspace. Early in development, I tried creating a special isolated execution environment for drake, but I could not solve the lexical scoping problems (nested imported functions were not seeing each other). We might be able to try again with something like expose_imports(). A little extra overhead, but it could be worth it. In fact, this would mean we no longer have to test everything separately for the global and custom environments, so it would cut the scaled-up testing workflow in half. cc @dapperjapper.

@violetcereza
Copy link
Contributor

I think this specific issue could be solved by just loadding any targets in make(plan, targets) as a convenience to the user when all targets are already up to date and make was run in a way that would modify the global environment (parallelism != Makefile). Alternatively, drake could do the "Unloading targets from environment" thing to clean up after all make runs. I can see a downside to both options, because they require a little more overhead in pursuit of consistency. But I'm fine with that.

I'm not sure about the question of environments. I like way future deals with environments and scope, because it meshes with the normal use of environments and scope in R really naturally, by definition and design. The nature of drake (targets are stored on a flat key-value level, I can't see anyone using #228) precludes some of the complexity found in future, but I think the question of how drake deals with environments is very existentially central. (if that makes sense??)

@wlandau
Copy link
Member

wlandau commented Feb 14, 2018

I think this specific issue could be solved by just loadding any targets in make(plan, targets) as a convenience to the user when all targets are already up to date and make was run in a way that would modify the global environment (parallelism != Makefile).

I'm not sure I agree. Some of these targets get really big, and automatically loading things could get really expensive. Sometimes people want to run make() just to check for outdated targets.

Alternatively, drake could do the "Unloading targets from environment" thing to clean up after all make runs.

Unfortunately, that needs to happen first because sometimes targets conflict with imports, especially when users load targets from the cache to interact with the results.

I'm not sure about the question of environments. I like way future deals with environments and scope, because it meshes with the normal use of environments and scope in R really naturally, by definition and design. The nature of drake (targets are stored on a flat key-value level, I can't see anyone using #228) precludes some of the complexity found in future, but I think the question of how drake deals with environments is very existentially central. (if that makes sense??)

Flatness is a good way to put it. It's a source source of a lot of struggle for drake, and thinking that way is keeping me from wrapping my head around #228 (both in picturing what users might want and trying to guess if it could work).

@wlandau wlandau mentioned this issue Feb 14, 2018
@violetcereza
Copy link
Contributor

Unfortunately, that needs to happen first because sometimes targets conflict with imports, especially when users load targets from the cache to interact with the results.

Sorry, I meant both before and after all make runs!

@wlandau
Copy link
Member

wlandau commented Feb 24, 2018

I thought about this one more, and I think our best options are the following (ranked according to my preference):

  1. Unload all the targets at the end of make(). Consistent, but possibly inconvenient if you want to loadd() things all over again. Then again, this approach would help get things ready for a possible next make().
  2. Leave any in-memory targets alone at the end of make() (current behavior). Possibly convenient, but inconsistent. And the next make() has more prep work to do.
  3. Load the final targets at the end of make(), including ones that were skipped. It would be difficult to figure out which targets to load, and loading for the sake of loading may not be worth the time for large targets.

Just say the word and I will implement the first one. I may need more convincing on the other two.

@krlmlr
Copy link
Collaborator Author

krlmlr commented Feb 24, 2018

Consistency is king, but it looks like the first option might break existing user code. Maybe a variant of option 3 combined with lazy loading (=active bindings)? It doesn't matter if we load more than necessary as long as we're consistent in what we load.

Anyway, this feels sufficiently low prio to me at this point, labeled accordingly.

@wlandau
Copy link
Member

wlandau commented Feb 24, 2018

Maybe active bindings for all the targets in the cache with no revdeps?

@krlmlr
Copy link
Collaborator Author

krlmlr commented Feb 28, 2018

I'm really not sure what's best at this point.

@wlandau
Copy link
Member

wlandau commented Mar 2, 2018

At the very least, this is documented in the caution.Rmd vignette.

What about a console message that lists the newly-loaded targets?

@krlmlr
Copy link
Collaborator Author

krlmlr commented Mar 2, 2018

I'm leaning towards loading always with active bindings. Do the active bindings work with caching, so that the second access is faster?

@wlandau
Copy link
Member

wlandau commented Mar 2, 2018

You mean in-memory caching, right? I just made that happen in 2fc5932.

Creating active bindings for all non-file targets in the cache should cover it. But awkwardly, the bindings would need to be destroyed/unloaded before the next make() in the same session so we don't confuse targets with imports.

@krlmlr
Copy link
Collaborator Author

krlmlr commented Mar 2, 2018

But unloading bindings is cheap, and reloading them too if they are in the in-memory cache?

@wlandau
Copy link
Member

wlandau commented Mar 2, 2018

I think so.

@wlandau
Copy link
Member

wlandau commented Mar 2, 2018

At this point, my only concern is the read-only nature of active bindings. Is there a way to relax that with bindr?

> make(drake_plan(x = 1))
> loadd(x, lazy = "bind")
> x <- 2
Error: Binding is read-only.

@wlandau
Copy link
Member

wlandau commented Mar 2, 2018

Edit: just saw r-lib/R6#99. Not sure how users will feel about that. What about lazy loading with promises instead?

@wlandau
Copy link
Member

wlandau commented Mar 2, 2018

Except that delayedAssign() fails if we clean the cache and make() again. The second make() tries to fulfill a promise based on a cache that may not even exist anymore.

> x <- drake_plan(a = 1)
> make(x)
> clean(destroy = TRUE)
> make(x)
Error: key 'a' ('objects') not found 

So promises with delayedAssign() don't seem to be an option either.

@krlmlr
Copy link
Collaborator Author

krlmlr commented Mar 2, 2018

The theory of functional programming, and also much of drake's philosophy, suggests that these variables should be read-only. But this turns the problem into something we may wish to do for a major release. (Also, I don't see how to fully support assignment with bindr. The variable would remain an active binding, and we'd have to store the value the user gave us somewhere. Yikes.)

@wlandau
Copy link
Member

wlandau commented Mar 6, 2018

If we implement #296, no targets will be automatically loaded anymore, which will give us consistency here.

@wlandau
Copy link
Member

wlandau commented Mar 8, 2018

Closing in favor of #296. I do not think it will be a major hit to back compatibility. Will reopen if we cannot implement #296.

@wlandau
Copy link
Member

wlandau commented May 2, 2018

FYI: as a side effect of #369, the symbols loaded after make() are likely to go away for mclapply and parLapply parallelism when jobs > 1. In this case, I think it's worth the sudden change. I have a preference for also removing these symbols when jobs is 1.

@krlmlr
Copy link
Collaborator Author

krlmlr commented May 2, 2018

Good news. Agree it's good to be consistent regardless of the jobs value.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants