Improve ShellTask #681

cicdw · 2019-02-24T02:55:10Z

Thanks for contributing to Prefect!

Please describe your work and make sure your PR:

adds new tests (if appropriate)
updates CHANGELOG.md (if appropriate)
updates docstrings for any new functions or function arguments, including docs/outline.toml for API reference docs (if appropriate)

What does this PR change?

This PR removes the cd kwarg on ShellTask (it seems far too brittle when executing on remote systems) and adds a helper_fns kwarg for pre-populating the shell with helper functions that the commands can use. It also refactors command execution to go through temporary files, which makes it much more robust to multiline commands and even full scripts.
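For readers who want a picture of the temporary-file approach, here is a minimal sketch. It is not the code from this PR: the function name run_script is hypothetical, the helper_script parameter borrows the name the kwarg is given later in this thread, and merging env into the inherited environment is an assumption based on the "only _update_ environment" commit title.

```python
import os
import subprocess
import tempfile
from typing import Dict, Optional


def run_script(
    command: str,
    helper_script: str = "",
    env: Optional[Dict[str, str]] = None,
    shell: str = "bash",
) -> str:
    """Hypothetical helper: run `command` from a temporary script file."""
    # Assumption: update the inherited environment instead of replacing it.
    current_env = os.environ.copy()
    current_env.update(env or {})

    with tempfile.NamedTemporaryFile(mode="w", suffix=".sh") as tmp:
        # Helper code goes first so the command can call anything it defines.
        if helper_script:
            tmp.write(helper_script + "\n")
        tmp.write(command + "\n")
        tmp.flush()  # make sure the shell sees the complete script

        # Passing a file path (not a command string) avoids quoting issues
        # with multiline commands.
        result = subprocess.run(
            [shell, tmp.name],
            env=current_env,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            check=True,
        )
    return result.stdout.decode().strip()
```

Because the shell reads a file, a multiline command such as "x=1\ny=2\necho $((x + y))" needs no special escaping.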

Why is this PR important?

This PR improves the usability of the ShellTask.

Pulling a weekend Jeremiah here: this PR depends on #680.

src/prefect/core/flow.py

jlowin · 2019-02-24T19:11:11Z

src/prefect/tasks/shell.py

        - command (string, optional): shell command to be executed; can also be
            provided post-initialization by calling this task instance
+        - env (dict, optional): dictionary of environment variables to use for
+            the subprocess; can also be provided at runtime
+        - helper_fns (List[str], optional): a list of strings, each of which


Based on the example in this test, I think this could get a more generic name -- like "pre-run" or something, since it looks like it just runs whatever code blocks are put here as a way of "pre-seeding" the terminal prior to the main command.

This raises the question -- why not just have users prepend whatever they'd put here to the command itself? It seems like a string concatenation task '\n'.join(helper_fns + [command]) would do it? I'm not trying to force users to do more, I'm just wondering if there's an argument for exposing a specific hook for pre-executing code vs just expecting users to provide fully-fledged shell scripts.

Yea, I considered that. The reason I did it this way is that I'm starting to really like our API for creating what I call "task templates", which are initialized tasks with various default settings that can be inherited from / overwritten by calling the task in a Flow context.

So to be more concrete with this particular task in mind, let's say I have a little helper function I use for updating config settings on my hadoop cluster / spark cluster, and throughout my flow I intend to update settings prior to various jobs. Using this kwarg I can:

    config_script = ShellTask(helper_fns=["presumably_nontrivial_function_definition"], env=dict("various-env-settings"))

    with Flow():
        home_dir = config_script(command="spark_config update $(pwd)")
        some_spark_job = sparktask(upstream_tasks=[home_dir])
        update_port = config_script(command="spark_config port-update 4200")
        another_spark_job = another_sparktask(upstream_tasks=[update_port])

etc. It seemed like a cleaner separation between "preparing my environment" and "running a specific command / script" IMO. Open to pushback.

Oh, cool. I see the pattern, I wasn't thinking as much about the "configuration" aspect of it where you create the task template early and reuse it. 👍

So second question: helper_fns feels a little specific, as this is really any code you want to run before your code runs (could include a cd or any prep work, really). What do you think?

Yea, that makes sense. What would be a good name in that case? pre_run, and just accept a string instead of a list?

pre_run or pre_execute or even before_execute? /shrug

Updated to helper_script
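To make the "task template" pattern from this thread concrete without depending on Prefect, here is a rough sketch. ShellTemplate is a hypothetical stand-in, not ShellTask itself; the spark_config shell function and the SPARK_HOME value are made up for illustration; and letting call-time env values override the template's defaults is an assumption about how the merging might work.

```python
from typing import Dict, Optional


class ShellTemplate:
    """Hypothetical stand-in for a task template; not Prefect's ShellTask."""

    def __init__(
        self,
        command: Optional[str] = None,
        helper_script: Optional[str] = None,
        env: Optional[Dict[str, str]] = None,
    ) -> None:
        # Defaults shared by every call made through this template.
        self.command = command
        self.helper_script = helper_script
        self.env = env

    def __call__(
        self,
        command: Optional[str] = None,
        env: Optional[Dict[str, str]] = None,
    ) -> Dict[str, object]:
        # Call-time values take precedence over the template defaults.
        command = command or self.command
        if command is None:
            raise ValueError("a command is required at init or call time")
        merged_env = {**(self.env or {}), **(env or {})}
        # The helper script, if any, is placed ahead of the command.
        script = "\n".join(filter(None, [self.helper_script, command]))
        return {"script": script, "env": merged_env}


# One template, reused with different commands.
config_script = ShellTemplate(
    helper_script='spark_config() { echo "configuring: $*"; }',  # made-up helper
    env={"SPARK_HOME": "/opt/spark"},  # made-up setting
)
first = config_script(command="spark_config update $(pwd)")
second = config_script(command="spark_config port-update 4200")
```

The environment preparation is written once on the template, and each call supplies only the command that differs.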

jlowin · 2019-02-24T19:14:01Z

I am ...heartened... that my name is attached to dependent PRs submitted on a weekend ;)

jlowin · 2019-02-24T22:28:05Z

Approved with an open (optional) question about helper_fns name!

jlowin · 2019-02-25T15:32:24Z

src/prefect/tasks/shell.py

        shell: str = "bash",
        **kwargs: Any
    ):
        self.command = command
        self.env = env
-        self.helper_fns = helper_fns or []
+        self.helper_script = helper_script or ""


Probably doesn't matter, but you check if self.helper_script: below before writing this value, so I think it might be safe to leave this as None (or do the or "" there instead).
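The point about the truthiness check can be illustrated with a small sketch. The function script_lines is hypothetical and only mirrors the shape of the guard described in the comment; it is not the code under review.

```python
from typing import List, Optional


def script_lines(command: str, helper_script: Optional[str] = None) -> List[str]:
    """Hypothetical helper: collect the lines that would be written out."""
    lines = []
    # The guard is false for both None and "", so no `or ""` default is needed.
    if helper_script:
        lines.append(helper_script)
    lines.append(command)
    return lines
```

With a guard like this, storing None and storing "" lead to the same script contents.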

Reduce brittleness of crash capturing test

cicdw added 5 commits February 23, 2019 17:57

Update behavior of shell task to only _update_ environment

130a167

Minor tweaks to ShellTask

ad13a16

Refactor ShellTask to be more robust to multiline commands

3c27ddc

Add helper_fns argument to shelltask

2a46eaa

Update Changelog

0d96ef8

cicdw added the depends on other PRs label Feb 24, 2019

cicdw requested review from jlowin and joshmeek as code owners February 24, 2019 02:55

Have Flow on_schedule return None and warn if no more scheduled runs

c6dd32f

jlowin requested changes Feb 24, 2019

cicdw added 2 commits February 24, 2019 14:18

Resolve merge conflicts w/ master

89ae8fc

Remove cloud reference from docstring

6aaa3f7

jlowin previously approved these changes Feb 24, 2019

cicdw removed the depends on other PRs label Feb 24, 2019

Rename helper_fns helper_script to make clear its general usage

fbacbee

cicdw dismissed jlowin’s stale review via fbacbee February 25, 2019 00:45

jlowin reviewed Feb 25, 2019

Update helper_script attr

e53a3b4

jlowin approved these changes Feb 25, 2019

cicdw merged commit 420935c into master Feb 25, 2019

cicdw deleted the improve-shell branch February 25, 2019 16:47

cicdw added a commit that referenced this pull request Dec 9, 2021

Merge pull request #681 from PrefectHQ/brittle-test

038c290

Reduce brittleness of crash capturing test
