
enhance PyTorch easyblock to print individual failed tests #2983

Merged

Flamefire (Contributor) commented Aug 10, 2023

(created using `eb --new-pr`)

This enhances the output in the log file by printing not only the test suites containing failed tests but also the individual failed tests. This should make it easier to determine whether only a single test in a test suite fails for everyone (or most people), so that we can disable just that test.
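A minimal sketch of the per-test grouping described above. It assumes, hypothetically, that each test suite in the log starts with a line `Running <suite> ...` and that failures use the standard `unittest` headers `FAIL: <test> (<module>.<Class>)` or `ERROR: ...`. The function name `failed_tests_by_suite` and both patterns are illustrative only and are not the easyblock's actual code.

```python
import re
from collections import defaultdict

# Hypothetical log format (an assumption, not taken from the easyblock):
#   "Running <suite> ..." marks the start of a test suite, and unittest
#   reports failures as "FAIL: <test> (<module>.<Class>)" or "ERROR: ...".
SUITE_START = re.compile(r"^Running (?P<suite>\w+) \.\.\.")
FAILED_TEST = re.compile(r"^(?P<kind>FAIL|ERROR): (?P<test>\w+) \((?P<where>[\w.]+)\)")


def failed_tests_by_suite(log_text):
    """Map each test suite to the individual tests that failed in it."""
    failures = defaultdict(list)
    suite = None  # no suite header seen yet
    for line in log_text.splitlines():
        start = SUITE_START.match(line)
        if start:
            suite = start.group("suite")
            continue
        failed = FAILED_TEST.match(line)
        # Failures seen before any suite header cannot be attributed and are skipped.
        if failed and suite is not None:
            failures[suite].append("%s (%s)" % (failed.group("test"), failed.group("where")))
    return dict(failures)


if __name__ == "__main__":
    log = """Running test_autograd ... [2023-08-10 09:53:35]
FAIL: test_grad (__main__.TestAutograd)
Running test_nn ... [2023-08-10 10:12:01]
ERROR: test_conv (__main__.TestNN)
FAIL: test_pool (__main__.TestNN)
"""
    for suite_name, tests in failed_tests_by_suite(log).items():
        print("%s: %d failed test(s)" % (suite_name, len(tests)))
        for test in tests:
            print("    " + test)
```

Failures that appear before any suite header are dropped in this sketch; when sharded or parallel runs interleave their output, that is exactly where a per-suite approach like this can miss tests.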

Edit: As identified in easybuilders/easybuild-easyconfigs#18421 (comment), there are more issues. Possible solutions:

  1. Patch `run_test.py` to not shard at all (and maybe disable the exit-on-first-failure).
  2. Export `BUILD_ENVIRONMENT=slow-gradcheck` as a hack to disable the parallelization (https://github.com/pytorch/pytorch/blob/v1.13.1/test/run_test.py#L721). However, this might get forgotten when the code changes, and that code was only introduced in 1.13 and seems to have changed already in PyTorch 2.
  3. Make matching the test suite name optional for this pattern; this seems to work at least for this example.
  4. Look into parsing the XML report (enabled via `--save-xml`), which might be the best option but requires quite some work.
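Option 3 could look roughly like the following. In this hypothetical pattern, the `<module>.` prefix that would name the suite is an optional regex group, so a failure line is still recorded (under a placeholder suite) when the suite name is missing or is just `__main__`. The pattern and the `parse_failure` helper are assumptions for illustration, not the easyblock's actual pattern.

```python
import re

# Hypothetical pattern (an assumption): the "<module>." prefix naming the suite
# is optional, so a failure is still recorded when the suite cannot be
# determined, e.g. because sharded/parallel runs interleave their output.
# Python 3.11+ unittest also repeats the test name inside the parentheses,
# which the trailing optional "(?:\.(?P=test))?" group accepts.
FAILED_TEST = re.compile(
    r"^(?P<kind>FAIL|ERROR): (?P<test>\w+) "
    r"\((?:(?P<suite>\w+)\.)?(?P<cls>\w+)(?:\.(?P=test))?\)"
)


def parse_failure(line, unknown="<unknown suite>"):
    """Return (suite, 'Class.test') for a failure line, or None otherwise."""
    match = FAILED_TEST.match(line)
    if match is None:
        return None
    suite = match.group("suite")
    # Tests run as scripts report "__main__", which says nothing about the suite.
    if suite in (None, "__main__"):
        suite = unknown
    return suite, "%s.%s" % (match.group("cls"), match.group("test"))
```

For example, `parse_failure("FAIL: test_conv (test_nn.TestNN)")` yields `("test_nn", "TestNN.test_conv")`, while `parse_failure("ERROR: test_x (TestFoo)")` falls back to the placeholder suite instead of dropping the failure.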

Edit: As for 4.:
It needs 2 Python packages as build dependencies (i.e. `builddependencies`), `lxml` and `unittest-xml-reporting`, plus a patch for PyTorch to propagate `--save-xml`.
With that in place, the run produces folders named after the tests, each with one or more XML files containing e.g. `<testsuites><testsuite name="pytest" errors="0" failures="0" skipped="127" tests="476" time="408.891" timestamp="2023-08-16T13:49:53.750990" hostname="taurusi8002"><testcase classname="TestJitCPU" name="test_jit_alias_remapping_abs_cpu_float32" time="0.063" file="test_ops_jit.py" />`, which does look helpful.
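For option 4, reading such reports could be sketched as follows. This assumes JUnit-style XML in which a failed `<testcase>` carries a `<failure>` or `<error>` child element (the snippet above only shows a passing test), and it reads the files with the standard library's `xml.etree.ElementTree` rather than `lxml`. The helper names `failed_testcases` and `failed_tests_in_reports` are hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def failed_testcases(root):
    """Return (file, classname, name) for each failed or errored <testcase>.

    Assumes JUnit-style XML: a failing <testcase> has a <failure> or <error>
    child element; passing and skipped test cases are ignored.
    """
    failed = []
    for case in root.iter("testcase"):
        if case.find("failure") is not None or case.find("error") is not None:
            failed.append((case.get("file"), case.get("classname"), case.get("name")))
    return failed


def failed_tests_in_reports(report_dir):
    """Collect failed test cases from all *.xml files below report_dir."""
    results = []
    for xml_file in sorted(Path(report_dir).rglob("*.xml")):
        root = ET.parse(xml_file).getroot()
        results.extend(failed_testcases(root))
    return results
```

Because `root.iter("testcase")` also visits the root element and every nested level, the same code handles files whose root is `<testsuites>` as well as a single `<testsuite>`.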

boegel changed the title from "Print individual failed PyTorch tests" to "enhance PyTorch easyblock to print individual failed tests" on Aug 15, 2023
boegel added this to the "next release (4.8.1?)" milestone on Aug 15, 2023
Flamefire (Contributor, Author)

@branfosj This change has now been tested multiple times and improves the output. I added the possible further enhancements found in that other build to the opening post.
How should we proceed?

branfosj (Member) left a comment

One minor change and then this is good to go. It has been well tested with various PyTorch PRs.

branfosj enabled auto-merge on August 24, 2023 16:00
branfosj merged commit 6623c0e into easybuilders:develop on Aug 24, 2023
Flamefire deleted the 20230810095335_new_pr_pytorch branch on August 24, 2023 16:50