
Integration with futile.logger #268

Open
adrfantini opened this issue Dec 7, 2018 · 8 comments
@adrfantini

Hi and thanks for the awesome package.
I usually use futile.logger to log to the console, but unfortunately when using parallel futures, messages are relayed only at the end of the job. Is there a way to make the logging call flush directly to the console?

I opened a similar issue (zatonovo/futile.logger#84) on futile.logger's github page, and the author suggested to raise the matter here.
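
For the record, here is a minimal sketch of the behavior I mean (this assumes the future package is installed alongside futile.logger; it is an illustration, not a test case):

```r
# Sketch: with a parallel backend, log output produced on the worker is
# not shown while the future runs -- it is relayed only once the future
# is resolved and its value is collected.
library(future)
library(futile.logger)

plan(multisession)  # parallel backend; workers are separate R sessions

f <- future({
  flog.info("logging from the worker")  # not displayed immediately
  42L
})

value(f)  # any relayed output appears only here, at resolution
```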

@HenrikBengtsson
Collaborator

HenrikBengtsson commented Dec 7, 2018

Is there a way to make the logging call flush directly to the console?

In general, no, but it might be possible to add support for this for certain future backends (e.g. future.callr).

For more background, see the discussion in Issue #172 ("DESIGN: Future API - Minimal/Core/Essential API and Extended/Optional API"), which has pointers to other, closely related feature requests.

@adrfantini
Author

Thanks for the info. @muxspace, do you find anything in #172 that could help?

@muxspace

@adrfantini if the hooks were implemented, maybe that would work? Let's jump back to the zatonovo/futile.logger#84 issue in futile.logger, as I have a few questions for you.

@HenrikBengtsson
Collaborator

Thanks for the interest. Just for clarification: there are basically two kinds/levels of logging and progress frameworks people are asking for. It's useful to distinguish the two when discussing feature requests, because their implementations are quite different:

  1. The "easy" one is the one that is solely orchestrated from the local, main R session where we can imagine hook functions being signaled/called when a future is created, launched, queried ("are you resolved yet"?), and results are pulled back from the worker. I can also imagine global hook functions that are called for all futures and hook functions that are specific to a particular future (i.e. there might be a need for two different hook function APIs).

  2. The other one is much harder: users/developers want "live" updates from the worker, which requires a much more complicated, asynchronous orchestration framework. Getting this to work consistently across OSes and R environments (R terminal, Rgui, RStudio, ...) also needs to be considered.
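
To make (1) concrete, here is a rough, runnable sketch of what such local orchestration could look like today. Note that with_hooks() is my own throwaway helper for illustration, not an existing or proposed future API (this assumes the future package):

```r
# Everything below runs in the main R session, which is what makes this
# kind of hooking "easy": we control the points where the future is
# created/launched and where its value is collected.
library(future)
plan(multisession)

# Hypothetical helper (NOT part of the future package): wrap a function
# in user-supplied lifecycle callbacks.
with_hooks <- function(fn, on_launch, on_value) {
  on_launch()
  f <- future(fn())      # fn is exported to and evaluated on the worker
  v <- value(f)          # blocks until the worker is done
  on_value(v)
  v
}

y <- with_hooks(
  function() { Sys.sleep(1); 42L },
  on_launch = function() message("future launched"),
  on_value  = function(v) message("future resolved with value: ", v)
)
```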

@muxspace

I think @adrfantini just wants to have the log messages emitted from futile.logger emitted when executed. They appear to not be flushed until the future is fully resolved. That's my understanding.

@HenrikBengtsson
Collaborator

So, then that's the second and much harder kind of output relaying. Below are two minimal examples that illustrate the problems involved. These examples correspond to the future backends plan(multicore) and plan(multisession).

Example 1: Using forked processes (Unix & macOS only) displays the output from child processes immediately when running R in a terminal, e.g.

> y <- parallel::mclapply(1:2, FUN = print)
[1] 1    <= stdout from the child process
[1] 2    <= stdout from the child process
> str(y)
List of 2
 $ : int 1
 $ : int 2
>

yet you cannot capture it from the main R process;

> out <- capture.output(y <- parallel::mclapply(1:2, FUN = print))
> str(out)
 chr(0)
> str(y)
List of 2
 $ : int 1
 $ : int 2
>

and if called from, say, the RStudio Console, the output is completely lost:

> y <- parallel::mclapply(1:2, FUN = print)
> str(y)
List of 2
 $ : int 1
 $ : int 2
>

Example 2: Using a PSOCK cluster (all OSes), we get:

> cl <- parallel::makeCluster(1L)
> y <- parallel::parLapply(cl, X = 1:2, fun = print)
> str(y)
List of 2
 $ : int 1
 $ : int 2
>

regardless of whether R runs in a terminal or, say, RStudio. If running in the terminal (also the RStudio Console on Linux), we can do:

> cl <- parallel::makeCluster(1L, outfile = "")
starting worker pid=9000 on localhost:11851 at 20:05:19.612  <= worker startup message also goes to the terminal

> y <- parallel::parLapply(cl, X = 1:2, fun = print)
[1] 1    <= "terminal" stdout from the child process
[1] 2    <= "terminal" stdout from the child process
> str(y)
List of 2
 $ : int 1
 $ : int 2
>

Also, that stdout is going straight to the terminal and never reaches R, e.g.

> out <- capture.output(y <- parallel::parLapply(cl, X = 1:2, fun = print))
[1] 1
[1] 2
> str(out)
chr(0)
>

But if using RStudio Console on Windows, I think the above output is completely lost.

I don't see how it's possible to relay the output from print() in a "live" fashion in a consistent manner. This is why I say it may be done for some parallel backends and R setups, but not for all.

@adrfantini
Author

As far as my tests go (Linux only), live logging does not work in any case when logging to the console, but it does work when logging to a file, which is good and, I would say, covers many use cases.
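
In case it helps others, this is roughly what my file-based setup looks like (assuming the future.apply package; note that all workers append to the same file here, which has worked for me but is not guaranteed to interleave records cleanly):

```r
# Each worker writes log records to the file as it processes its chunk,
# so `tail -f run.log` shows progress while the futures are running.
library(future.apply)
library(futile.logger)

plan(multisession)

y <- future_lapply(1:4, function(i) {
  flog.appender(appender.file("run.log"))  # configure inside the worker
  flog.info("processing element %d", i)
  Sys.sleep(0.5)
  i^2
})
```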

@s-fleck

s-fleck commented Jan 5, 2019

Not a solution, but if you are on Linux you can just keep a terminal open with tail -f logfile.log. That's pretty close to real-time logging, though not to the R console, I guess :/
