BENCHMARKING: Record timing and memory stats for the various steps in futures #59
Comments
For memory profiling, we have [...]

See also [...]
"Note to self": This will most likely require an update to the Future API. More precisely, a backend needs to return not only the value but also other information. Currently, backends return the value (or errors) "as is". In order to return other information, this needs to be updated, e.g. UPDATE 2018-12-27: This part was resolved in future 1.8.0 (2018-04-08) where FutureResult was introduced. |
Hi Henrik - Curious if you have done any benchmarking on what the memory usage is when multiple R sessions are generated (i.e., multisession) using a future? Do the new R sessions get new RAM allocations in Windows, or are they constrained by the initial R session’s RAM allocation? I have used tcltk::tclTaskSchedule in the past to do similar non-blocking parallelization and ran into this RAM issue. Just curious if you’ve run across this at all? Any insight you might have would be greatly appreciated. Your package looks like it might be better than going the task-scheduler route. Thx, Seth
Hi. When using multisession, a new R session is launched in the background (basically as if you'd started another R session manually). If you launch a vanilla R session, you can use the Windows Task Manager to see how much memory it consumes. That'll be your lower-bound memory consumption per background worker.

Then, if your parallel code uses functions in packages, those packages will be loaded/attached as well. That'll consume additional memory (look at Task Manager) - again, this is per worker. On top of that, you'll find that "input data" (arguments and global variables) that the future expression needs will be exported to the workers, which adds to the memory usage.

Because multisession workers live across multiple futures, that is, they don't shut down immediately after a future is resolved, any packages loaded will stay loaded in those workers. However, input data and other created objects will be erased and garbage collected as soon as each worker is done with a future - that helps keep memory use down.

As a rule of thumb, there is no magic parallelization method/framework in R that is more memory efficient than others. I often assume they all use roughly the same amount. Making sure to [...]

Also, some people argue that forked processes (used by multicore futures, mclapply(), ... - so not Windows) may consume less memory because of the "shared memory" property of process forking. However, it has been shown/mentioned several times that R's garbage collector can really mess this up - if the garbage collector starts running in one of the forked child processes, or in the master process, then that originally shared memory can no longer be shared, and the operating system starts copying memory blocks into each child process. Since the garbage collector runs whenever it wants to, there is no simple way to avoid this.

Also, just in case your [...]

Hope this helps.
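To make the "separate process, separate RAM" point above concrete, here is a minimal sketch. The gc() totals are only a proxy for R-level allocations inside each worker, not the process's total OS memory - Task Manager shows the full per-process footprint:

```r
library(future)
plan(multisession, workers = 2)

## Each multisession worker is a plain background R process with its own
## address space, so allocations in a worker do not count against the
## launching session's RAM.
fs <- lapply(1:2, function(i) {
  future({
    x <- rnorm(1e7)  # ~80 MB allocated inside this worker only
    list(pid = Sys.getpid(),
         used_mb = sum(gc()[, 2]))  # column 2 of gc() = "used (Mb)"
  })
})
res <- lapply(fs, value)
sapply(res, `[[`, "pid")  # distinct PIDs => distinct processes and RAM
```

Comparing those PIDs against a vanilla R session in Task Manager, as suggested above, remains the most direct way to see the per-worker footprint.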
Record timing and memory stats for the various steps in futures, e.g. creation, identification of global variables, exporting of globals, launching of the future, time for the future to complete, collection of values/exceptions, etc. This should be done (optionally?) for all types of futures.
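Until such instrumentation exists inside the package, only a crude external breakdown is possible. A sketch using plain wall-clock timing, which can separate just two phases - creation (including identifying and exporting globals and launching) versus resolution plus value collection:

```r
library(future)
plan(multisession)

t0 <- proc.time()
f  <- future({ Sys.sleep(1); 42 })   # create: find globals, export, launch
t1 <- proc.time()
v  <- value(f)                       # block until resolved, collect value
t2 <- proc.time()

c(create_and_launch   = (t1 - t0)[["elapsed"]],
  resolve_and_collect = (t2 - t1)[["elapsed"]])
```

A finer per-step breakdown (globals identification vs. export vs. launch) would need hooks inside the package itself, which is what this issue asks for.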