Beyond traditional HPC: containers and cloud computing #102

Closed

wlandau opened this issue Aug 24, 2018 · 8 comments
@wlandau
Contributor

wlandau commented Aug 24, 2018

Can clustermq use workers on AWS, Digital Ocean, arbitrary remote Docker containers, etc.? It seems straightforward, for example, to use the ssh scheduler to deploy to workers on the same AWS instance. But what about a single pool of workers spread over multiple instances?

I was at an R conference last week, and there seems to be uncertainty and debate about the long-term future of traditional HPC systems. cc @dpastoor

@mschubert
Owner

Yes, that's definitely on the list.

In principle, you should already be able to use anything that you can connect to via SSH and that has multicore set up. However, I have never tested anything like that.

For multiple remote machines, this will require some changes in how clustermq works. These will likely happen, but not in the near future.
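For reference, the single-host setup described above would look roughly like the following. This is a minimal sketch, assuming passwordless SSH access to the remote machine, clustermq installed on both ends, and a hypothetical host name; a pool spread over multiple SSH hosts is not covered by this.

```r
## Sketch only: host name "user@aws-instance" is hypothetical.
## Local ~/.Rprofile: tell clustermq to send work over SSH.
options(
  clustermq.scheduler = "ssh",
  clustermq.ssh.host  = "user@aws-instance"
)

## Remote ~/.Rprofile: fan work out over that machine's cores.
options(clustermq.scheduler = "multicore")

## Back on the local machine:
library(clustermq)
fx <- function(x) x * 2
Q(fx, x = 1:100, n_jobs = 4)
```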

@chapmandu2

Have you looked at Docker and Kubernetes for doing parallel processing in the cloud? A Kubernetes cluster is a lot easier to set up on AWS or Azure than a conventional cluster would be, plus you get scaling thrown in. Interestingly enough, RStudio Server Pro has just added this feature. I'm looking at makeClusterFunctions in batchtools and makeClusterPSOCK in future, but I think Kubernetes might be better. Thanks for the great packages.
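For comparison, the future-based route mentioned above can already reach several plain SSH hosts. A rough sketch, assuming the (hypothetical) host names are reachable over passwordless SSH and have R installed:

```r
library(future)

## Host names are hypothetical placeholders.
cl <- makeClusterPSOCK(c("node1.example.com", "node2.example.com"))
plan(cluster, workers = cl)

## Each future is dispatched to one of the remote workers.
f <- future(Sys.info()[["nodename"]])
value(f)

parallel::stopCluster(cl)
```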

@pat-s

pat-s commented Jan 31, 2019

While we are currently building up an HPC system, we have several standalone machines. It would be great if we could use the SSH connector to distribute jobs across all of these machines.

This would work perfectly together with drake and the jobs argument to make(), which could be used to distribute the parallel jobs across as many SSH machines as are available (see the sketch below).
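On the drake side this would presumably look like the existing clustermq backend, just with more than one host behind it. A sketch under that assumption; today the "ssh" scheduler targets a single host, and the host name here is hypothetical:

```r
library(drake)

## Hypothetical single SSH host; a multi-host pool is what this issue asks for.
options(
  clustermq.scheduler = "ssh",
  clustermq.ssh.host  = "user@server1"
)

plan <- drake_plan(
  data  = rnorm(1e6),
  stats = summary(data)
)

## jobs controls how many parallel clustermq workers drake launches.
make(plan, parallelism = "clustermq", jobs = 4)
```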

@mschubert
Owner

Thank you for the hints re Kubernetes, @chapmandu2.

@pat-s Is there a reason why you don't set up a scheduler on your HPC? That would not only support clustermq as it is, but also many other tools that interface with schedulers (which you may want down the line).

@pat-s

pat-s commented Feb 1, 2019

As I said, we're already building an HPC system with Warewulf and Slurm. Until then, we have several standalone servers that are used for production and cannot be turned off until there is a production-ready replacement 🙂 Our main goal is to combine all of them, but until then, the multiple-SSH approach would be a nice thing to have.

@mschubert
Owner

Sorry, I dropped a word (the "while") when reading your comment again; you did already say that.

I'm afraid I won't have multiple SSH hosts set up in the next couple of weeks.

@wlandau
Contributor Author

wlandau commented Sep 6, 2020

What about AWS Batch? Metaflow uses it.

@wlandau
Contributor Author

wlandau commented Sep 6, 2020

Looks like paws::batch() creates an object with a submit_job() method, though I am not sure how to return the job's data.
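For the record, a hedged sketch of those paws calls (queue and job definition names are hypothetical). AWS Batch itself only reports job status; the container would typically have to write its results to S3 or similar, to be fetched separately:

```r
library(paws)

batch <- paws::batch()

## Hypothetical queue and job definition names.
resp <- batch$submit_job(
  jobName       = "clustermq-test",
  jobQueue      = "my-job-queue",
  jobDefinition = "my-job-definition"
)

## Poll the job's status; the actual output would have to come from e.g. S3.
batch$describe_jobs(jobs = list(resp$jobId))
```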

Repository owner locked and limited conversation to collaborators Mar 29, 2021

This issue was moved to a discussion.

You can continue the conversation there.
