Cloud computing guide #21
Comments
The biggest potential difference I see to this (relative to Metaflow's approach to AWS) is that the …
If …
@MilesMcBain, your team uses AWS, right? What's your preferred way to interact with it?
I read up more on AWS ParallelCluster, AWS Batch, and Metaflow's HPC, and …
Hey @wlandau, so far we have preferred to call the AWS CLI directly. This is mainly due to a combination of very simple workflows and uncertainty about the stability of the AWS-R ecosystem.
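For context, a minimal sketch of what calling the AWS CLI directly from R can look like. The bucket name and file path below are hypothetical, and this assumes the `aws` executable is installed and configured on the machine:

```r
# Hypothetical example: copy a local results file to S3 by shelling out
# to the AWS CLI instead of using an R wrapper package.
bucket <- "my-example-bucket"     # hypothetical bucket name
local_file <- "results/model.qs"  # hypothetical local artifact

output <- system2(
  "aws",
  args = c("s3", "cp", local_file, sprintf("s3://%s/%s", bucket, local_file)),
  stdout = TRUE,
  stderr = TRUE
)
print(output)
```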
That's helpful, I know …
Reopening. I plan to write about ropensci/targets#176 at least. I will keep my eye on R + cloud packages. Looks like …
Just wrote about S3 integration in the new cloud chapter. Will reopen after ropensci/targets#152. |
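For readers following along, here is a rough sketch of what S3-backed target storage looks like in a `_targets.R` file, based on the current `targets` documentation. The bucket and prefix are placeholders, and the exact arguments may differ across `targets` versions:

```r
# _targets.R (sketch): store target output in an S3 bucket instead of
# the local _targets/ data store. Bucket and prefix are placeholders.
library(targets)

tar_option_set(
  resources = tar_resources(
    aws = tar_resources_aws(bucket = "my-example-bucket", prefix = "targets-data")
  ),
  repository = "aws",
  format = "qs"
)

list(
  tar_target(data, data.frame(x = rnorm(100))),
  tar_target(model, lm(x ~ 1, data = data))
)
```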
Prework
Proposal
There is a community misconception that `targets` (and `drake`) do not have HPC capabilities beyond parallel computing over the cores of a single local machine. On the contrary, both tools support distributed computing on clusters (guides here and here), and the workers do not necessarily need access to the file system of the master process. (In fact, I designed `targets` with an efficient dynamic branching model to go beyond the inherent limitations of map-reduce-like scheduling algorithms and conserve computing resources.) However, I do realize that data scientists from smaller institutions do not always have access to clusters, and an increasing number of folks use AWS. AWS ParallelCluster could be a way to deploy pipelines to the cloud without any need to modify `targets` itself. If it works, we should probably write a tutorial, either in the existing HPC chapter or in a chapter of its own.

ropensci/tarchetypes#8 could be an alternative way to deploy to AWS. The advantage of ropensci/tarchetypes#8 is that we should also get the data versioning capabilities of Metaflow for free, and Metaflow may take care of a lot of the AWS setup. However, each new `tar_metaflow()` would require its own local R worker in order to avoid blocking the master process, which is not ideal.
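To illustrate the point about distributed computing on clusters, a minimal sketch of sending `targets` workers to a SLURM cluster through `clustermq`. The scheduler, template file name, and worker count are assumptions; ParallelCluster or another scheduler would need its own template:

```r
# _targets.R (sketch): configure clustermq so workers run as SLURM jobs
# while the master process stays on the local machine.
# The scheduler and template file name are assumptions about the cluster.
library(targets)

options(
  clustermq.scheduler = "slurm",
  clustermq.template = "slurm_clustermq.tmpl"  # hypothetical template file
)

list(
  tar_target(data, data.frame(x = rnorm(1e6))),
  tar_target(summary, mean(data$x))
)
```

Then, from an interactive session, `tar_make_clustermq(workers = 2)` launches two persistent workers on the cluster and dispatches targets to them as their dependencies become available.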