Troubleshooting Makefile parallelism for SLURM #117
make(my_plan, parallelism = "Makefile", args = c("--touch", "--silent"))
I should put that one in the parallelism vignette. Thanks for another spot-on idea. Unfortunately, the For the purposes of this thread, did you only want a better way to troubleshoot, or do you also want a |
Sorry, to clarify, I'm trying to troubleshoot the error. I've added the arguments and my example ran, but where is the Makefile? The .makefile folder is empty after I run this. |
After a passable night's sleep, I think I know what the problem is. GNU Make thinks
But I think you need a special
And then you call I have not tried this, but an alternative might be the regular
with make(
plan = my_plan,
targets = c("data1", "data2"), # `primer` is built too
parallelism = "Makefile",
jobs = 2,
prepend = c(
"#!/bin/bash",
"#SBATCH -J testing",
"#SBATCH -A landcare00063",
"#SBATCH --time=1:00:00",
"#SBATCH --cpus-per-task=1",
"#SBATCH --begin=now",
"#SBATCH --mem=1G",
"#SBATCH -C sb",
"SHELL=./shell.sh"
),
recipe_command = "srun Rscript -e 'R_RECIPE'"
)
...and you really don't see a |
By the way, if you get it working, I have colleagues from grad school who would really benefit. It would be a great help if you share your solution, maybe here in the parallelism vignette, maybe in an example like Makefile-cluster. |
Alright, we're progressing! Found the makefile; thanks! I tried creating a
#!/bin/bash
#SBATCH -J testing
#SBATCH -A landcare00063
#SBATCH --time=1:00:00
#SBATCH --cpus-per-task=1
#SBATCH --begin=now
#SBATCH --mem=1G
#SBATCH -C sb
shift
echo "module load R; $*" | srun
And I get this:
make(
plan = my_plan,
targets = c("data1", "data2"), # `primer` is built too
parallelism = "Makefile",
jobs = 2,
recipe_command = "srun Rscript -e 'R_RECIPE'",
prepend="SHELL=./shell.sh"
)
check 3 items: print, rnorm, Sys.sleep
import print
import rnorm
import Sys.sleep
check 1 item: simulate
import simulate
srun Rscript -e 'drake::mk(target = "primer1", cache_path = "<wd>/.drake")'
srun Rscript -e 'drake::mk(target = "primer2", cache_path = "<wd>/.drake")'
srun: fatal: No command given to execute.
srun: fatal: No command given to execute.
make: *** [<wd>/.drake/ts/3c356dca4040e3c4] Error 1
make: *** Waiting for unfinished jobs....
make: *** [<wd>/.drake/ts/b3a79b8e12e4bcd5] Error 1 |
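One possible reading of the `No command given to execute` error (not confirmed in this thread): `srun` takes the program to run from its command-line arguments and does not read a command from standard input, so piping the `echo` output into a bare `srun` leaves it with nothing to execute. Separately, `#SBATCH` lines are only parsed by `sbatch`; when Make runs `shell.sh` directly, they are plain comments. Below is a minimal sketch of a wrapper that passes the recipe as arguments instead. It is untested on a real cluster, and `run_recipe` and the `SRUN` override are hypothetical names used only for illustration.

```shell
#!/bin/bash
# Hypothetical rewrite of shell.sh; not tested on a real SLURM cluster.
# GNU Make invokes the wrapper as: ./shell.sh -c '<recipe text>'

# SRUN is an illustrative override hook so the sketch can be dry-run
# without a scheduler, e.g. SRUN=echo.
SRUN="${SRUN:-srun}"

run_recipe() {
  shift  # drop the leading -c that Make passes
  # Hand the recipe to srun as arguments instead of piping it to stdin.
  $SRUN bash -c "module load R; $*"
}

# In the real shell.sh the last line would be:
# run_recipe "$@"
```

With a wrapper like this, the recipe itself probably should not start with `srun` a second time, so `recipe_command` could be a plain `Rscript -e 'R_RECIPE'`. That is an assumption as well.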
Maybe FWIW, this approach dates back to this blog post. My colleagues and I were using that approach in grad school, and it was super convenient at the time. But then they told me it had apparently stopped working, and by then I had graduated and could no longer access the cluster. |
No shell file required, but might not work: |
With `recipe_command = "srun bash -c Rscript -e 'R_RECIPE'"`, I get the same error as above. With This might help: https://mussolblog.wordpress.com/2013/07/17/setting-up-a-testing-slurm-cluster/ |
That's unfortunate. If Do you still have a Thank you for sending the Vagrant example. Unfortunately, copying over my
With the trouble I'm having installing job schedulers, maybe learning |
I can't run It might help to make sure we're on the same page for I usually do this. I write a
Then submit the job using:
The sbatch command reads the configuration commands and submits the srun(s) to the scheduler. |
Can
with make(
your_plan,
parallelism = "Makefile",
jobs = 8,
recipe_command = "sbatch testing.sl 'R_RECIPE'"
) |
On second thought, rather than deal with shell scripts with arguments, it may be better to go back to your earlier attempt with |
I tried your second most recent suggestion and it successfully submits jobs. However, they all failed with the following error:
All 5 jobs got submitted at once as well, so the solution didn't seem to obey the dependency rules. I'm not sure I understand your most recent suggestion. |
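A likely reason all 5 jobs start at once with `sbatch` (an interpretation, not something confirmed in this thread): `sbatch` returns as soon as the job is queued, so Make treats the recipe as finished and moves on to downstream targets, whereas `srun` blocks until the command completes. SLURM's `sbatch` has a `--wait` flag that keeps it from exiting until the job terminates and then exits with the job's exit code. Below is a minimal sketch, assuming the installed SLURM version supports `--wait`. `submit_and_wait` and the `SBATCH` override are hypothetical names, and `testing.sl` is the batch script mentioned above.

```shell
#!/bin/bash
# SBATCH is an illustrative override hook (e.g. SBATCH=echo for a dry run).
SBATCH="${SBATCH:-sbatch}"

submit_and_wait() {
  # --wait: do not return until the submitted job has terminated;
  # sbatch then exits with the job's exit code, so Make can see failures.
  $SBATCH --wait testing.sl "$@"
}
```

In drake this would correspond to something like `recipe_command = "sbatch --wait testing.sl 'R_RECIPE'"`, which has not been tried here.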
If all 5 jobs got submitted at once, that makes me think we should always be using So maybe this?
# in R
make(
your_plan,
parallelism = "Makefile",
jobs = 8,
recipe_command = "srun testing.sl 'R_RECIPE'"
)
The later suggestion probably won't work anyway. |
It doesn't seem to register the account when running with srun.
Should it not include multiple srun commands within the .sl file and run with sbatch at the terminal? |
Then maybe SLURM doesn't see the
# in R
make(
your_plan,
parallelism = "Makefile",
jobs = 8,
recipe_command = "sbatch testing.sl 'R_RECIPE'"
) |
The above runs, but again it submits all 5 jobs at once. I tried a bunch of permutations of |
I really need all the help I can get to get SLURM working on Ubuntu 16.04. |
As I mentioned in #115, I got SLURM to run on a Debian VM. (I followed this guide, substituting in my own user name instead of
library(drake)
load_basic_example()
make(
my_plan,
parallelism = "Makefile",
jobs = 2,
prepend = c(
"SHELL=srun",
".SHELLFLAGS=-N1 -n1 bash -c"
)
)
I am simultaneously stoked that something this simple actually worked and bothered that I cannot reproduce everyone's errors. I thought it might be because I listed myself in |
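Why these two `prepend` lines can be enough: GNU Make runs each recipe line as `$(SHELL) $(.SHELLFLAGS) <recipe>`, with the recipe passed as a single argument. The sketch below only prints the resulting argument list, so it can be inspected without a cluster. `MAKE_SHELL`, `MAKE_SHELLFLAGS`, and `compose_recipe_argv` are illustrative names, not part of Make or drake.

```shell
#!/bin/bash
# Values copied from the prepend lines above.
MAKE_SHELL="srun"
MAKE_SHELLFLAGS="-N1 -n1 bash -c"

compose_recipe_argv() {
  recipe="$1"
  # MAKE_SHELLFLAGS is left unquoted on purpose so it splits on
  # whitespace; the recipe text stays one argument.
  # Prints one bracketed argument per line.
  printf '[%s]\n' "$MAKE_SHELL" $MAKE_SHELLFLAGS "$recipe"
}
```

Calling `compose_recipe_argv "Rscript -e 'R_RECIPE'"` lists the arguments one per line, which shows that each target effectively runs as `srun -N1 -n1 bash -c '<recipe>'`.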
How would one add the SBATCH configuration in the above? |
Command line argument to
make(
my_plan,
parallelism = "Makefile",
jobs = 2,
prepend = c(
"SHELL=srun",
".SHELLFLAGS=-N1 -n1 bash -c"
)
)
|
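One way to carry the `#SBATCH` settings over is to strip the `#SBATCH` prefix from each directive and splice the result into `.SHELLFLAGS` ahead of `bash -c`. This sketch assumes `srun` accepts the same options on its command line, which holds for many options such as `-A`, `--time`, and `--mem` but is worth checking per option. `sbatch_lines_to_flags` and `shellflags_line` are hypothetical helpers, and nothing here has been run against a real scheduler.

```shell
#!/bin/bash
sbatch_lines_to_flags() {
  # Reads a batch script on stdin and prints the options from its
  # #SBATCH lines, joined by single spaces.
  sed -n 's/^#SBATCH[[:space:]]*//p' | paste -sd ' ' -
}

shellflags_line() {
  # Reads a batch script on stdin and prints a Makefile line that
  # could be passed to drake via the prepend argument.
  printf '.SHELLFLAGS=-N1 -n1 %s bash -c\n' "$(sbatch_lines_to_flags)"
}
```

The printed line can then be supplied in `prepend` alongside `SHELL=srun`, in place of the plain `.SHELLFLAGS=-N1 -n1 bash -c` entry.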
@kendonB Please see the response on Stack Overflow. |
Not surprisingly, SLURM arrays are not an option with this approach. The new rslurm package would cover this as a separate special backend. Given the other bottlenecks from |
@kendonB, from what you learned solving #115, do you think #117 could be solved the same way? Is it even worth the time now that you have #115? If you no longer need #117 to work, please let me know. Makefiles with |
I had that |
As described here: #115
I am trying to get Makefile parallelism working using SLURM.
First one I get the error
Makefile:9: *** missing separator. Stop.
I can't seem to find the Makefile itself to see what it's actually producing. Is there a way to produce the Makefile only, without running it?