-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve ShellTask #681
Improve ShellTask #681
Conversation
src/prefect/tasks/shell.py
Outdated
- command (string, optional): shell command to be executed; can also be | ||
provided post-initialization by calling this task instance | ||
- env (dict, optional): dictionary of environment variables to use for | ||
the subprocess; can also be provided at runtime | ||
- helper_fns (List[str], optional): a list of strings, each of which |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the example in this test, I think this could get a more generic name -- like "pre-run" or something, since it looks like it just runs whatever code blocks are put here as a way of "pre-seeding" the terminal prior to the main command.
This raises the question -- why not just have users prepend whatever they'd put here to the command
itself? It seems like a string concatenation task '\n'.join(helper_fns + [command])
would do it? I'm not trying to force users to do more, I'm just wondering if there's an argument for exposing a specific hook for pre-executing code vs just expecting users to provide fully-fledged shell scripts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, I considered that. The reason I did it this way is because I'm starting to really like our API for creating what I call "task templates" which are initialized tasks with various default settings that can be inherited from / overwritten via calling the task in a Flow context.
So to be more concrete with this particular task in mind, let's say I have a little helper function I use for updating config settings on my hadoop cluster / spark cluster, and throughout my flow I intend to update settings prior to various jobs. Using this kwarg I can:
config_script = ShellTask(helper_fns=["presumably_nontrivial_function_definition"],
env=dict("various-env-settings"))
with Flow():
home_dir = config_script(command="spark_config update $(pwd)")
some_spark_job = sparktask(upstream_tasks=[home_dir])
update_port = config_script(command="spark_config port-update 4200")
another_spark_job = another_sparktask(upstream_tasks=[update_port])
etc. It seemed like a cleaner separation between "preparing my environment" from "run a specific command / script" IMO. Open to pushback.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, cool. I see the pattern, I wasn't thinking as much about the "configuration" aspect of it where you create the task template early and reuse it. 👍
So second question: helper_fns
feels a little specific, as this is really any code you want to run before your code runs (could include a cd
or any prep work, really). What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, that makes sense. What would be a good name in that case? pre_run
, and just accept a string instead of a list?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pre_run
or pre_execute
or even before_execute
? /shrug
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to helper_script
I am ...heartened... that my name is attached to dependent PRs submitted on a weekend ;) |
Approved with an open (optional) question about |
src/prefect/tasks/shell.py
Outdated
shell: str = "bash", | ||
**kwargs: Any | ||
): | ||
self.command = command | ||
self.env = env | ||
self.helper_fns = helper_fns or [] | ||
self.helper_script = helper_script or "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably doesn't matter, but you check if self.helper_script:
below before writing this value so I think it might be safe to leave this as None
(or do this or ""
there)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
Reduce brittleness of crash capturing test
Thanks for contributing to Prefect!
Please describe your work and make sure your PR:
CHANGELOG.md
(if appropriate)docs/outline.toml
for API reference docs (if appropriate)What does this PR change?
This PR removes the
cd
kwarg onShellTask
(this seems way too brittle when executing on remote systems) and adds ahelper_fns
kwarg for pre-populating the shell with helper functions that the commands can utilize. It also refactors the execution of commands to use temporary files, so it's much more robust to multiline commands / full scripts even.Why is this PR important?
This PR improves the usability of the
ShellTask
.Pulling a weekend Jeremiah here, this PR depends on #680