Beyond traditional HPC: containers and cloud computing #102

Closed

wlandau opened this issue Aug 24, 2018 · 8 comments
@wlandau
Contributor

wlandau commented Aug 24, 2018

Can clustermq use workers on AWS, Digital Ocean, arbitrary remote Docker containers, etc.? It seems straightforward, for example, to use the ssh scheduler to deploy to workers on the same AWS instance. But what about a single pool of workers spread over multiple instances?

I was at an R conference last week, and there seems to be uncertainty and debate about the long-term future of traditional HPC systems. cc @dpastoor

@mschubert
Owner

Yes, that's definitely on the list.

In principle, you should already be able to use anything that you can connect to via SSH and that has multicore set up. However, I have never tested anything like that.

For multiple remote machines, this will require some changes in how clustermq works. These will likely happen, but not in the near future.
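For reference, the single-host setup described above would look roughly like the following. This is a minimal sketch, assuming passwordless SSH access to the remote machine, clustermq installed on both ends, and a hypothetical host name; a pool spread over multiple SSH hosts is not covered by this.

```r
## Sketch only: host name "user@aws-instance" is hypothetical.
## Local ~/.Rprofile: tell clustermq to send work over SSH.
options(
  clustermq.scheduler = "ssh",
  clustermq.ssh.host  = "user@aws-instance"
)

## Remote ~/.Rprofile: fan work out over that machine's cores.
options(clustermq.scheduler = "multicore")

## Back on the local machine:
library(clustermq)
fx <- function(x) x * 2
Q(fx, x = 1:100, n_jobs = 4)
```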

@chapmandu2

Have you looked at Docker and Kubernetes for doing parallel processing in the cloud? A Kubernetes cluster is a lot easier to set up on AWS or Azure than a conventional cluster would be, plus you get scaling thrown in. Interestingly enough, RStudio Server Pro has just added this feature. I'm looking at makeClusterFunctions in batchtools and makeClusterPSOCK in future, but I think Kubernetes might be better. Thanks for the great packages.
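For comparison, the future-based route mentioned above can already reach several plain SSH hosts. A rough sketch, assuming the (hypothetical) host names are reachable over passwordless SSH and have R installed:

```r
library(future)

## Host names are hypothetical placeholders.
cl <- makeClusterPSOCK(c("node1.example.com", "node2.example.com"))
plan(cluster, workers = cl)

## Each future is dispatched to one of the remote workers.
f <- future(Sys.info()[["nodename"]])
value(f)

parallel::stopCluster(cl)
```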

@pat-s

pat-s commented Jan 31, 2019

While we are currently building up an HPC system, we have several standalone machines. It would be great if we could use the SSH connector to distribute jobs across all of these machines.

This would work perfectly together with drake and the jobs argument to make(), which could be used to distribute the parallel jobs across as many SSH machines as are available (see the sketch below).
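On the drake side this would presumably look like the existing clustermq backend, just with more than one host behind it. A sketch under that assumption; today the "ssh" scheduler targets a single host, and the host name here is hypothetical:

```r
library(drake)

## Hypothetical single SSH host; a multi-host pool is what this issue asks for.
options(
  clustermq.scheduler = "ssh",
  clustermq.ssh.host  = "user@server1"
)

plan <- drake_plan(
  data  = rnorm(1e6),
  stats = summary(data)
)

## jobs controls how many parallel clustermq workers drake launches.
make(plan, parallelism = "clustermq", jobs = 4)
```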

@mschubert
Owner

Thank you for the hints re Kubernetes, @chapmandu2.

@pat-s Is there a reason why you don't set up a scheduler on your HPC? That would not only support clustermq as it is, but also many other tools that interface with schedulers (which you may want down the line).

@pat-s

pat-s commented Feb 1, 2019

As I said, we're already building an HPC system with Warewulf and Slurm. Until then, we have several standalone servers that are used for production and cannot be turned off until there is a production-ready replacement 🙂 Our main goal is to combine all of them, but until then, the multiple-SSH approach would be a nice thing to have.

@mschubert
Owner

Sorry, I dropped a word (the "while") when reading your comment again; you did already say that.

I'm afraid I won't have multiple SSH hosts set up in the next couple of weeks.

@wlandau
Contributor Author

wlandau commented Sep 6, 2020

What about AWS Batch? Metaflow uses it.

@wlandau
Contributor Author

wlandau commented Sep 6, 2020

Looks like paws::batch() creates an object with a submit_job() method, though I am not sure how to return the job's data.
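For the record, a hedged sketch of those paws calls (queue and job definition names are hypothetical). AWS Batch itself only reports job status; the container would typically have to write its results to S3 or similar, to be fetched separately:

```r
library(paws)

batch <- paws::batch()

## Hypothetical queue and job definition names.
resp <- batch$submit_job(
  jobName       = "clustermq-test",
  jobQueue      = "my-job-queue",
  jobDefinition = "my-job-definition"
)

## Poll the job's status; the actual output would have to come from e.g. S3.
batch$describe_jobs(jobs = list(resp$jobId))
```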

Repository owner locked and limited conversation to collaborators Mar 29, 2021

This issue was moved to a discussion.

You can continue the conversation there.
