Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[hailctl] fix hailctl batch submit and restore tests #14186

Closed
wants to merge 20 commits into from

Conversation

danking
Copy link
Contributor

@danking danking commented Jan 19, 2024

Fixes: #14247

Issues resolved herein:

  1. build.yaml tests must not use exit 0 as it exits the test early.
  2. Always prefer orjson to json.
  3. Add --wait which waits for the submitted batch to complete and exits success only when the batch is success.
  4. Whenever working with paths, we must use the realpath which resolves symlinks. In particular, on Mac OS X, /tmp is a symlink to /private/tmp and Python's APIs are inconsistent on whether they return a realpath or a path with symlinks. [1]
  5. If the destination looks like a directory (e.g. "bar:/foo/", "bar:/"), the tests all suggest we should copy into not to. We now check for a trailing slash and copy into.
  6. ln -s src dst means different things depending on whether dst is an extant folder or not. In this PR, I prefer to always be fully explicit so I never rely on ln detecting the destination is a directory and acting differently. Put differently: file_input_to_src_dest now never returns a file source and a destination folder.
  7. We need to create the real_absolute_cwd() on the job before we cd into it.
  8. test_dir_outside_curdir suggests that --file foo/:/ is meant to copy the contents of foo into the root. This cannot be implemented with our symlink strategy (you can't replace the root with a symlink), so I changed the interpretation: a trailing slash on the source is meaningless. If the destination ends in a slash, we "copy into", otherwise we "copy to".
  9. Add examples of --files usage.

[1]:

In [1]: import tempfile
   ...: tempfile.TemporaryDirectory()
Out[1]: <TemporaryDirectory '/var/folders/x1/601098gx0v11qjx2l_7qfw2c0000gq/T/tmp_pmj3lr9'>

In [2]: import os
   ...: os.getcwd()
Out[2]: '/private/tmp'

In [3]: !ls -al /tmp
lrwxr-xr-x@ 1 root  wheel  11 Aug  2 05:44 /tmp -> private/tmp

else:
dest = os.getcwd()
dest = os.path.join(real_absolute_cwd(), os.path.basename(src))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jigold commented at #14183:

Can you explain this change?

The semantics of ln -s src dst depends on whether dst is a directory or not. I found it easier to reason about file_input_to_src_dest when I had the post-condition: for each triplet, the destination is the desired location of a symlink to the source.

Before this change the post-condition was "the destination is the desired location of a symlink to the source or a folder into which we want a symlink placed". The latter situation feels like a footgun. The presence or absence of a trailing slash (which realname removes) affects the semantics. Moreover basename /foo/ prints foo whereas os.path.basename('/foo/') returns ''.

With these changes, we still use the trailing slash to determine if we want copy-to or copy-into, but once that's resolved, we're always dealing with fully specified source-destination file-pairs.

@daniel-goldstein
Copy link
Contributor

@danking
Copy link
Contributor Author

danking commented Feb 1, 2024

I coincidentally discovered the double-slash issue while working on #14176. I'll revisit this PR when that merges.

Dan King added 17 commits February 7, 2024 12:31
Issues resolved herein:

1. build.yaml tests must not use `exit 0` as it exits the test early.
2. Always prefer `orjson` to `json`.
3. Add `--wait` which waits for the submitted batch to complete and exits success only when the batch is success.
4. Whenever working with paths, we must use the `realpath` which resolves symlinks. In particular,
   on Mac OS X, `/tmp` is a symlink to `/private/tmp` and Python's APIs are inconsistent on whether
   they return a realpath or a path with symlinks. [1]
5. If the destination looks like a directory (e.g. "bar:/foo/", "bar:/"), the tests all suggest we
   should copy *into* not *to*. We now check for a trailing slash and copy *into*.
6. `ln -s src dst` means different things depending on whether dst is an extant folder or not. In
   this PR, I prefer to always be fully explicit so I never rely on `ln` detecting the destination
   is a directory and acting differently. Put differently: `file_input_to_src_dest` now never
   returns a file source and a destination folder.
7. We need to create the `real_absolute_cwd()` on the job before we `cd` into it.
8. `test_dir_outside_curdir` suggests that `--file foo/:/` is meant to copy the contents of foo into
   the root. This cannot be implemented with our symlink strategy (you can't replace the root with a
   symlink), so I changed the interpretation: a trailing slash on the source is meaningless. If the
   destination ends in a slash, we "copy into", otherwise we "copy to".
9. Add examples of --files usage.

[1]:

```ipython3
In [1]: import tempfile
   ...: tempfile.TemporaryDirectory()
Out[1]: <TemporaryDirectory '/var/folders/x1/601098gx0v11qjx2l_7qfw2c0000gq/T/tmp_pmj3lr9'>

In [2]: import os
   ...: os.getcwd()
Out[2]: '/private/tmp'

In [3]: !ls -al /tmp
lrwxr-xr-x@ 1 root  wheel  11 Aug  2 05:44 /tmp -> private/tmp
```
…re missing config.ini file, user friendly error messages when script file or --files files do not exists
@danking danking force-pushed the hailctl-batch-submit-fixes branch from 21a6b8a to ec6c8c5 Compare February 7, 2024 17:32
jigold
jigold previously requested changes Feb 7, 2024
write_hello(f'{dir}/hello2.txt')

dir_basename = os.path.basename(dir)
write_script(dir, f'/{dir_basename}/hello1.txt')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain why the behavior of needing to use basename is different than in test_dir_mount_in_child_dir_to_child_dir?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think maybe there's still a bug here. The semantics of / that hailctl currently uses doesn't work well with symlinks.

In test_dir_mount_in_child_dir_to_child_dir:

--files child/:/child/

The trailing slash on the source, child/, indicates we want to work with the contents of child. The trailing slash on the destination, /child/, indicates we want the destination to be inside /child/. This means hello1.txt and hello2.txt are placed inside the folder /child/. To be clear: I deduced these rules from the code, I'm not married to these being the rules.

In contrast, in test_dir_outside_curdir:

--files {dir}/:/

This should have the same semantics as above but we cannot symlink to / (the OS already exists there) so the only option is to put it in /the-basename. I suspect the code just breaks currently. We should maybe prohibit x/:/.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should prohibit it if we're doing something special.

@jigold jigold dismissed their stale review February 12, 2024 20:19

see comment

@jigold
Copy link
Contributor

jigold commented Feb 23, 2024

Closing in favor of #14351

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[query] Add pgamma, qgamma functions
3 participants