
Refactor the jobs argument. #251

Closed
wlandau opened this issue Feb 10, 2018 · 10 comments


wlandau commented Feb 10, 2018

Reasons:

  1. "Jobs" was a poor choice of words to begin with. I actually meant "workers". Now that drake relies so much on future, their vocabularies should agree.
  2. Targets and imports are processed differently. Imports should always be processed locally, while targets may be deployed to a totally different computing environment. The parallelism needs are different in each case. Makefile parallelism can assign different numbers of workers, but all of drake should be capable of this, and it should all be cleaner.

wlandau commented Feb 10, 2018

We may just go with a workers argument if #227 becomes the only parallel backend. But a switch this dramatic would take a lot of time to transition safely.


wlandau commented Feb 11, 2018

The max_useful_jobs() function would need to be deprecated and renamed, as would most of the functions that take jobs. So I hesitate to jump into this one too quickly.


wlandau commented Feb 14, 2018

If the manual scheduler in #227 replaces the non-Makefile backends, there may not actually be anything to do here. But I predict that the staged parallelism mclapply backend will continue to be the best way to process imports because of its minimal overhead. (We might look into mcfork() with the manual scheduler eventually though.)

@wlandau wlandau changed the title Introduce distinct workers_imports and workers_targets arguments to replace jobs Set a different number of remote vs local workers for manual scheduling. Feb 23, 2018

wlandau commented Feb 23, 2018

What about just one single workers argument? make(..., workers = 8) would set an overarching max of 8 workers at a time, and make(..., workers = c(future = 100, callr = 4)) would use 100 remote workers and 4 local workers. We don't want to overload our local resources just because we want to submit 100 jobs to the cluster. This is not a problem for the old "future_lapply" and "Makefile" backends, but it is already a problem for the experimental new "future" backend and the upcoming "callr" backend (#227). Two major tasks:

  1. Bite the bullet and deprecate jobs in favor of workers.
  2. Implement the optional two-part workers argument for the existing parallel computing functionality.
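A rough sketch of how such a two-part workers argument could be normalized (the helper name, the backend names as vector slots, and the default of 1 for missing slots are hypothetical illustrations, not drake's actual API):

```r
# Hypothetical helper: normalize `workers` into one value per backend.
# An unnamed number caps each backend at that value; a named vector
# sets backends individually, with missing slots conservatively at 1.
normalize_workers <- function(workers) {
  backends <- c("future", "callr")
  if (is.null(names(workers))) {
    # Unnamed scalar: the same overarching max for each backend.
    return(stats::setNames(rep(workers[[1]], 2), backends))
  }
  out <- stats::setNames(rep(1, 2), backends)
  out[names(workers)] <- workers
  out
}

normalize_workers(8)                            # future = 8, callr = 8
normalize_workers(c(future = 100, callr = 4))   # future = 100, callr = 4
normalize_workers(c(future = 16))               # future = 16, callr = 1
```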

@wlandau wlandau changed the title Set a different number of remote vs local workers for manual scheduling. Refactor and rename the jobs argument. Feb 23, 2018

krlmlr commented Feb 23, 2018

Are you still going to support workers = c(imports = 16), perhaps defaulting to the number of available cores?


wlandau commented Feb 23, 2018

That's interesting. So if we name just one of the slots, the other should fall back to some default? For workers = c(future = 16), I was thinking imports should just be 1 if missing. I don't want to accidentally max out somebody's shared login node. On the other hand, I thought an unnamed workers = 16 would use a max of 16 for each of future and callr, or else an overarching max of 16 for both kinds of worker combined.

A note on names: I would actually rather make the distinction between local vs remote workers rather than targets vs imports. With the upcoming parallelism column in workflow plans, some targets may use callr and others may use future (with imports exclusively using callr).


wlandau commented Feb 25, 2018

Looking back at the Make man page, I remember now why I originally picked "jobs" rather than "workers".

$ man make

...
-j [jobs], --jobs[=jobs]
     Specifies the number of jobs (commands) to run simultaneously...

So "jobs" was not an entirely terrible choice of words. In any case, I am biased toward keeping it because it would be extremely painful to deprecate.

If/when we run multiple jobs per worker, we could make the behavior backward compatible: workers could be the max parallel workers, jobs could be the max concurrent commands, and we could let drake do the load balancing.

# 1 worker with 1 job (default)
make(...)

# 2 jobs with 2 workers to match (1 job per worker, back compatible)
make(..., jobs = 2)

# same
make(..., workers = 2)

# 4 concurrent jobs spread over 2 parallel workers 
make(..., workers = 2, jobs = 4) 

# 16 concurrent jobs spread over 4 parallel workers,
# plus 2 multicore jobs on the master process for imports, etc.
make(..., workers = 4, jobs = c(master = 2, workers = 16))

# 16 concurrent jobs spread over 16 parallel workers (one job per worker)
# 2 multicore jobs on the master process for imports, etc.
make(..., jobs = c(master = 2, workers = 16))

Thoughts? I could start implementing jobs = c(master = x, workers = y) fairly soon, and we won't have to create a workers argument until we are ready to have multiple jobs per worker.
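The load balancing described above could amount to something like this (a hypothetical illustration, not drake code): spread the concurrent commands as evenly as possible over the parallel workers.

```r
# Hypothetical sketch of the load balancing described above:
# spread `jobs` concurrent commands over `workers` parallel workers,
# giving each worker an (almost) equal share.
jobs_per_worker <- function(workers, jobs) {
  base <- jobs %/% workers   # every worker gets at least this many
  extra <- jobs %% workers   # the remainder goes to the first workers
  base + (seq_len(workers) <= extra)
}

jobs_per_worker(workers = 2, jobs = 4)   # 2 2
jobs_per_worker(workers = 4, jobs = 16)  # 4 4 4 4
jobs_per_worker(workers = 3, jobs = 4)   # 2 1 1
```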

@wlandau wlandau changed the title Refactor and rename the jobs argument. Refactor the jobs argument. Feb 25, 2018

wlandau commented Feb 27, 2018

Are you still going to support workers = c(imports = 16), perhaps defaulting to the number of available cores?

I have been thinking about this, and I decided I agree after all. jobs = c(imports = x, targets = y) gains us a lot of ground, and it is much easier to implement than what I was thinking of before.
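As an illustration of those semantics (the helper function and the default of 1 for a missing slot are hypothetical, not the actual implementation): a named jobs vector would give imports and targets separate levels of parallelism, while a bare number keeps the old single-number behavior.

```r
# Hypothetical illustration of jobs = c(imports = x, targets = y):
# a named vector sets each stage separately; a bare number applies
# to both stages (backward compatible with the old `jobs` argument).
jobs_for_stage <- function(jobs, stage = c("imports", "targets")) {
  stage <- match.arg(stage)
  if (is.null(names(jobs))) {
    return(jobs[[1]])  # old behavior: one number for everything
  }
  if (stage %in% names(jobs)) jobs[[stage]] else 1  # conservative default
}

jobs_for_stage(c(imports = 4, targets = 8), "targets")  # 8
jobs_for_stage(2, "imports")                            # 2
```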


wlandau commented Feb 27, 2018

Fixed via #278. May revisit when it is time to consider multiple jobs per worker.

@wlandau wlandau closed this as completed Feb 27, 2018

wlandau commented May 11, 2018

Update: because of #369 and its documentation, I am more comfortable leaving the name "jobs" alone. Yes, we actually have persistent workers in the lapply-like parallel backends, but the workers are transient (essentially per-target jobs) for the "future" and "Makefile" backends. So "jobs" can stay a catch-all for "jobs or workers".
