Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider building targets in a separate environment. #296

Closed
wlandau opened this issue Mar 5, 2018 · 33 comments
Closed

Consider building targets in a separate environment. #296

wlandau opened this issue Mar 5, 2018 · 33 comments

Comments

@wlandau
Copy link
Member

wlandau commented Mar 5, 2018

Maybe make() should execute with a new.envir(parent = globalenv()) rather than globalenv() by default. I did not have the issues figured out early in development, but this might be possible using the @dapperjapper's idea that made expose_imports() possible.

@wlandau
Copy link
Member Author

wlandau commented Mar 5, 2018

Could be an alternative to #253.

@krlmlr
Copy link
Collaborator

krlmlr commented Mar 7, 2018

Maybe it's worthwhile to make this a pkgconfig-controlled option too.

@wlandau
Copy link
Member Author

wlandau commented Mar 7, 2018

Possibly. Right now, though, I have a preference for a clean switch. With the environment<- trick, I think we can do it.

@rkrug
Copy link
Contributor

rkrug commented Mar 7, 2018

I agree with @wlandau -- a clean switch to an own environment would be the cleanest option.

@wlandau-lilly
Copy link
Collaborator

It just occurred to me: we now need to distinguish between the envir arguments to make() and drake_config() and the environment that will eventually reside in config$envir. Not a big deal, but worth noting.

@wlandau wlandau added this to the CRAN release 5.1.0 milestone Mar 9, 2018
wlandau-lilly added a commit that referenced this issue Mar 9, 2018
Expose some of the code used for expose_imports(). Will use it to create a special execution environment.
wlandau-lilly added a commit that referenced this issue Mar 9, 2018
@krlmlr
Copy link
Collaborator

krlmlr commented Mar 10, 2018

This may break user scripts if they access targets after running make(). I think we need to clearly communicate that this is a potentially breaking change.

@rkrug
Copy link
Contributor

rkrug commented Mar 10, 2018 via email

@wlandau
Copy link
Member Author

wlandau commented Mar 10, 2018

@krlmlr: so this should probably not be part of drake 5.1.0. Removing it from the milestone.

@rkrug: we could pursue something like that, but the main concern is changes to the default behavior that break compatibility. It gets complicated (see #253).

@wlandau wlandau removed this from the CRAN release 5.1.0 milestone Mar 10, 2018
wlandau pushed a commit that referenced this issue Mar 18, 2018
- Make sure the "future"/globalenv() testing scenario actually uses the global environment.
- Repair a testing scenario: "future" backend with globalenv(). Need to pass globals explicitly in a named list rather than let `future` just guess at the values.
- Use a clumsy workaround to make sure globals passed to the "future" backend do not conflict with anything `drake` needs. When #296 is solved, the point will be moot.
- Set future.globals = FALSE in `future_lapply()` to reduce overhead. This is possible since the "future_lapply" backend assumes file system access and relies completely on the cache.
@wlandau-lilly wlandau-lilly changed the title Consider executing inside a new.envir(parent = globalenv()) rather than globalenv() by default Consider not modifying the user's R workspace/environment. Mar 20, 2018
@wlandau-lilly
Copy link
Collaborator

Re: #330 (comment) (cc @rkrug): a separate R session for make() is worth considering. reprex uses callr::r_safe(), which is now just callr::r(). (Alternatively, we could just use callr::r_bg() to free up the current session.) It strikes me as a safer and more thorough solution to the problem #296 is trying to solve. But I do have concerns:

  1. It adds annoying overhead.
  2. Ultimately, we still get imports from the user's R session, so @rkrug's code from Creating plan in Rmd file - knit() same chunk names #330 (comment) would behave the same way.

@krlmlr, what do you think?

@rkrug
Copy link
Contributor

rkrug commented Mar 20, 2018

That sounds like a plan, but why not use callr::r_vanilla() ? This would be even more reproducible as nothing is set by anybody than drake?

Overhead: One could add an argument to disable this behaviour and ude the existing session instead?

#330 : Why does the call to compile a target have any information if it was called from within a knitr document? If the compilation is taking place in a brand new R session, why should there be a problem?

@wlandau-lilly
Copy link
Collaborator

wlandau-lilly commented Mar 20, 2018

why not use callr::r_vanilla() ?

Sure.

Overhead: One could add an argument to disable this behaviour and use the existing session instead?

Maybe. I am not yet sure if this is worth complicating the interface.

#330 : Why does the call to compile a target have any information if it was called from within a knitr document? If the compilation is taking place in a brand new R session, why should there be a problem?

I was just referring to one of your later comments in #330, not the main issue itself. Your code in this comment did not call knitr specifically.

@wlandau
Copy link
Member Author

wlandau commented Mar 20, 2018

I suppose withr::local_envvar() may be worth a mention too. Maybe it can be a low-overhead alternative to r_vanilla().

@wlandau
Copy link
Member Author

wlandau commented Mar 20, 2018

Just realized that the original bit about environments in #296 should be solved as planned before this session business. This way, it will be easier to pass "globals" to callr. Will open a separate issue.

@krlmlr
Copy link
Collaborator

krlmlr commented Mar 21, 2018

I like the idea of running make() in a separate session, but I'm not convinced drake should be handling this.

We could implement a two-step process:

  1. Create plan, communicate it to drake(). The plan and all imports are recorded and stored.
  2. Run make().

These two steps should work when run from separate sessions. Then, the user can just do callr::r(drake::make()) when the setup is finished.

I was planning to implement something like this for redrake, not sure if and when though.

@rkrug
Copy link
Contributor

rkrug commented Mar 22, 2018

@krlmlr I disagree - make should provide reproducibility as far as possible. Other tools handle R package versions management very well, so this is not the business of drake. Also there is docker, which introduces the OS and R version issue.

But within one session, make should be consistent and always produce the same results (which, by the way, should also include the same documentation of the make file, i.e. the graphs - see #335).

To put the burden of the rather complex issue of environments and session management onto the shoulders, if it could be more easily be solved by a simple argument (for the user), is doing drake a disservice as there are many surprises waiting (see.g #335 for me).

If drake want's to achieve true reproducability on a project level, the use of a clean R session (--vanilla) would be essential. This would make calling drake from a bash makefile possible and consistent with running drake::make() in an "old and dirty" R session.

@wlandau
Copy link
Member Author

wlandau commented Mar 22, 2018

I deliberately designed drake to be as R-focused as possible. That means it depends on your R workspace and session rather than on a bunch of files. Reproducibility principles and best practices for files, interactive sessions, and workspaces are already well-known and well-established. It is just a matter of applying them to make() just like you would to any other piece of R code. I think that what you bring up is a useful niche for redrake.

@AlexAxthelm
Copy link
Collaborator

Could we take the RStudio approach?

If the user opts to use the "knit" button in the UI, it's done in a new session. This doesn't prevent them from using rmarkdown::render() in the console, and running with their "dirty" session.

This might mean that drake needs an RStudio addin/extension? Anyone who isn't on RStudio wouldn't benefit, but they probably are okay with knowing that they are responsible for managing their own sessions.

@rkrug
Copy link
Contributor

rkrug commented Mar 26, 2018

This would be a first step, but I still think that one should be easily be able to run make() in it's own session. And if the session is a clone of the current session, no additional files would be needed to set the environment up.

@wlandau
Copy link
Member Author

wlandau commented Mar 27, 2018

For this and #253, I am leaning towards just removing targets from the environment after make(). Given that drake is designed to be immersed in the master session, I think this is the most consistent behavior. Thoughts?

@rkrug
Copy link
Contributor

rkrug commented Mar 28, 2018

I have the feeling that this is doctoring (no offense intended) and there will be other effects (side effects from the compilation of the targets, e.g. the dreaded <<-) which can cause problems. I still think it would be the safest way of running the make process in it's own R session which is discarded afterwards. This can be a copy of the calling session (callr::r_vanilla()), but via an option also be a vanilla session (callr::r_vanilla()). But it could also be, if the overhang is the problem, executed in the current session (with all side effects).

@AlexAxthelm
Copy link
Collaborator

I'll say that from a reproduciblity/minimizing side effect standpoint, I'm on team vanilla, but I have enough situations where I want drake to import something that is already in the environment, that I feel that it should be user accessible option with a r_vanilla() default. Possibly also with an option to loadd the most recent targets automatically, to preserve current behavior (although I'm not sure this is behavior worth preserving).

@rkrug
Copy link
Contributor

rkrug commented Mar 28, 2018

So the suggestion would be that

  1. make() runs in it's own session, where two arguments / options are available:
  2. session which can take the values
    • vanilla (run in vanilla session),
    • as_current (run in session which is configured as the current session)
    • current (run in current session, i.e. the current behaviour),
    • ?
  3. return_targets which, if TRUE should loadd the targets (as in the current behaviour) and if FALSE does not loadd any targets. But I am here with @AlexAxthelm - this is probably not worth preserving.

Is this correct?

@AlexAxthelm
Copy link
Collaborator

I'll agree with that suggestion, with the exception that I think the arguments for session should be actual callr::r_*() functions (drake could re-export them, or at least a few), rather than re-interpreting and renaming.

That way, if callr expands its options, they are covered, since the API to pass to r_*() appears to be the same. This also give the user flexiblity if they want to do something along the lines of

make(my_plan, session = r_copycat) # run against current session for debugging
make(my_plan, session = r_vanilla) # run against clean session, for production

@rkrug
Copy link
Contributor

rkrug commented Mar 28, 2018

Sounds perfect to me. This opens as well the option to define own session functions.

@wlandau
Copy link
Member Author

wlandau commented Mar 28, 2018

In that case, I think we can close this and reopen #333. I will have new comments there.

@wlandau
Copy link
Member Author

wlandau commented Aug 10, 2018

I want to revisit the issue of the environment in which targets are built. Using the global environment is not reproducible, but in the past, my attempts to create a special execution environment have lead to lexical scoping issues for functions. @dapperjapper's environment<- hack (he called it that first) was a potential solution, but I did not trust it enough. Now, with tidy evaluation, we could simply eval(enexpr()) any functions we pull from the global environment.

@wlandau wlandau reopened this Aug 10, 2018
@wlandau wlandau changed the title Consider not modifying the user's R workspace/environment. Consider building targets in a separate environment. Aug 10, 2018
@wlandau
Copy link
Member Author

wlandau commented Aug 10, 2018

rlang::enexpr() seemed like a candidate approach for taking the code for a function and assigning it in the new environment.

reassign_object <- function(name, old, new){
  object <- get(name, envir = old, inherits = FALSE)
  if (rlang::is_closure(object)){
    expr <- as.call(c(quote(`<-`), as.symbol(name), rlang::enexpr(object)))
    eval(expr, envir = new)
  } else {
    assign(x = name, value = object, envir = new, inherits = FALSE)
  }
  invisible()
}

Unfortunately, closures still scope the same way after reassignment.

old <- new.env(parent = globalenv())
old$y <- 1
eval(
  quote(
    f <- function(x){
      x + y
    }
  ),
  envir = old
)
new <- new.env(parent = globalenv())

reassign_object("f", old, new)
new$f(0) # Should be 1.
#> [1] 1
new$y <- 2
new$f(0) # Should be 2.
#> [1] 1

@wlandau
Copy link
Member Author

wlandau commented Aug 10, 2018

Looking at Hadley's chapter on evaluation, new_function() looks like a great way to extract the good parts of a function without the baggage of the closure.

reassign_object <- function(name, old, new){
  object <- get(name, envir = old, inherits = FALSE)
  if (rlang::is_closure(object)){
    object <- rlang::new_function(
      args = formals(object),
      body = body(object),
      env = new
    )
  }
  assign(x = name, value = object, envir = new, inherits = FALSE)
  invisible()
}

But that has a problem too: functions produced with Vectorize() really do need their closures.

old <- new.env(parent = globalenv())
old$y <- 1
eval(
  quote(
    f <- Vectorize(function(x){
      x + y
    }, "x")
  ),
  envir = old
)
new <- new.env(parent = globalenv())
reassign_object("f", old, new)
new$f(0)
#> Error in names %in% vectorize.args: object 'vectorize.args' not found

Similarly perverse edge cases may arise in the custom code of drake's more aggressive users.

@wlandau
Copy link
Member Author

wlandau commented Aug 10, 2018

So that leaves us with environment<- all over again, and at this point, I am still not willing to use it as a foundation for drake's internals. But at least my experimentation is in the thread for reference.

@wlandau
Copy link
Member Author

wlandau commented Dec 15, 2018

@wlandau
Copy link
Member Author

wlandau commented Dec 15, 2018

I am revisiting this issue in response to a recent suggestion in the comments: create a special new environment and lock it. That way, impure functions error out. We could maintain a separate enclos argument to eval() to contain targets that get built over the course of make().

@wlandau wlandau reopened this Dec 15, 2018
@wlandau
Copy link
Member Author

wlandau commented Dec 15, 2018

Yet again, I experimented a version of expose_envir() that used the environment<- trick only on functions, and functions from Vectorize() lost their closures. We should probably just recommend make(session = callr::r_vanilla())`.

@wlandau
Copy link
Member Author

wlandau commented Dec 15, 2018

See #619 and the prework #620.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants