Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve ShellTask #681

Merged
merged 10 commits into from
Feb 25, 2019
Merged

Improve ShellTask #681

merged 10 commits into from
Feb 25, 2019

Conversation

cicdw
Copy link
Member

@cicdw cicdw commented Feb 24, 2019

Thanks for contributing to Prefect!

Please describe your work and make sure your PR:

  • adds new tests (if appropriate)
  • updates CHANGELOG.md (if appropriate)
  • updates docstrings for any new functions or function arguments, including docs/outline.toml for API reference docs (if appropriate)

What does this PR change?

This PR removes the cd kwarg on ShellTask (this seems way too brittle when executing on remote systems) and adds a helper_fns kwarg for pre-populating the shell with helper functions that the commands can utilize. It also refactors the execution of commands to use temporary files, so it's much more robust to multiline commands / full scripts even.

Why is this PR important?

This PR improves the usability of the ShellTask.

Pulling a weekend Jeremiah here, this PR depends on #680

- command (string, optional): shell command to be executed; can also be
provided post-initialization by calling this task instance
- env (dict, optional): dictionary of environment variables to use for
the subprocess; can also be provided at runtime
- helper_fns (List[str], optional): a list of strings, each of which
Copy link
Member

@jlowin jlowin Feb 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the example in this test, I think this could get a more generic name -- like "pre-run" or something, since it looks like it just runs whatever code blocks are put here as a way of "pre-seeding" the terminal prior to the main command.

This raises the question -- why not just have users prepend whatever they'd put here to the command itself? It seems like a string concatenation task '\n'.join(helper_fns + [command]) would do it? I'm not trying to force users to do more, I'm just wondering if there's an argument for exposing a specific hook for pre-executing code vs just expecting users to provide fully-fledged shell scripts.

Copy link
Member Author

@cicdw cicdw Feb 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I considered that. The reason I did it this way is because I'm starting to really like our API for creating what I call "task templates" which are initialized tasks with various default settings that can be inherited from / overwritten via calling the task in a Flow context.

So to be more concrete with this particular task in mind, let's say I have a little helper function I use for updating config settings on my hadoop cluster / spark cluster, and throughout my flow I intend to update settings prior to various jobs. Using this kwarg I can:

config_script = ShellTask(helper_fns=["presumably_nontrivial_function_definition"],
                                env=dict("various-env-settings"))

with Flow():
    home_dir = config_script(command="spark_config update $(pwd)")
    some_spark_job = sparktask(upstream_tasks=[home_dir])
    update_port = config_script(command="spark_config port-update 4200")
    another_spark_job = another_sparktask(upstream_tasks=[update_port])

etc. It seemed like a cleaner separation between "preparing my environment" from "run a specific command / script" IMO. Open to pushback.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, cool. I see the pattern, I wasn't thinking as much about the "configuration" aspect of it where you create the task template early and reuse it. 👍

So second question: helper_fns feels a little specific, as this is really any code you want to run before your code runs (could include a cd or any prep work, really). What do you think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, that makes sense. What would be a good name in that case? pre_run, and just accept a string instead of a list?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pre_run or pre_execute or even before_execute? /shrug

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to helper_script

@jlowin
Copy link
Member

jlowin commented Feb 24, 2019

I am ...heartened... that my name is attached to dependent PRs submitted on a weekend ;)

jlowin
jlowin previously approved these changes Feb 24, 2019
@jlowin
Copy link
Member

jlowin commented Feb 24, 2019

Approved with an open (optional) question about helper_fns name!

shell: str = "bash",
**kwargs: Any
):
self.command = command
self.env = env
self.helper_fns = helper_fns or []
self.helper_script = helper_script or ""
Copy link
Member

@jlowin jlowin Feb 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably doesn't matter, but you check if self.helper_script: below before writing this value so I think it might be safe to leave this as None (or do this or "" there)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

@cicdw cicdw merged commit 420935c into master Feb 25, 2019
@cicdw cicdw deleted the improve-shell branch February 25, 2019 16:47
cicdw added a commit that referenced this pull request Dec 9, 2021
Reduce brittleness of crash capturing test
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants