Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate tests to pytest + add GA #127

Merged
merged 51 commits into from
Jul 31, 2024
Merged

Migrate tests to pytest + add GA #127

merged 51 commits into from
Jul 31, 2024

Conversation

eitanturok
Copy link
Contributor

@eitanturok eitanturok commented Jul 29, 2024

In this PR, I

  1. Migrated 400+ tests from absl-py to pytest
  2. Add pyproject.toml to handle us importing torch in setup.py
  3. In pyproject.toml add settings/configurations for pytest and coverage.
  4. Add packages to the setup.py dev requirements for testing
  5. In setup.py, use up to date versions of grouped_gemm and stk packages that properly handle build isolation (@tgale96 merged my PRs into those libraries.)
  6. Update .gitignore, based on Composer's .gitignore
  7. Update Makefile to be used with testing, based on Composer's Makefile
  8. Add GA for testing on 1 and 8 GPUs on every push using ci-testing repo, based on Composer's GA

All the tests currently pass. And you can see the Github Action working properly in my fork here.

Notes:

  1. I kept the original absl-py test files for all the tests that I migrated to pytest. I will delete these in a separate PR, once I've finished migrating all tests to pytest. This makes it so I can make sure all the absl-py and the pytest tests pass at the same time.
  2. There are certain tests that run on multiple GPUs or are launched using a bash script. I have not yet migrated these to pytest but will do so in a subsequent PR.

@eitanturok eitanturok requested review from dakinggg and mihir-db July 29, 2024 16:10
@eitanturok eitanturok changed the title Add Github Action for Tests Migrate tests to pytest + add GA Jul 29, 2024
@eitanturok eitanturok requested a review from b-chu July 29, 2024 16:54
Copy link
Contributor

@mvpatel2000 mvpatel2000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me as initial PR.

@tgale96 any other tests you want to include / would be helpful?

@eitanturok
Copy link
Contributor Author

Updates:

  1. Thanks for the comments. I addressed all of them.
  2. I've migrated 450+ tests from absl-py to pytest, including low level tests for individual ops and higher-level tests that ensure MoEs are implemented properly across various sparsity, memory-optimization, and dropless settings.
  3. The Github Action for testing should work. For reference, the Github Actions were working in my fork here.
  4. parallelism_test.py required 8 GPUs. I've updated it to only use 2 GPUs. We now only run testing jobs on 1 and 2 GPUs.
  5. I've since added a conftest.py file and various autouse fixtures to make it easier to test with distributed inference and add bells and whistles.

To do:
3. Do we want to cut a new tag for stk and grouped_gemm for this release?
2. Should I delete the old test files that were written in absl-py (and remove the absl-py dpeendency)?

@mvpatel2000
Copy link
Contributor

  1. Do we want to cut a new tag for stk and grouped_gemm for this release?

Let's do in follow-on PR

    1. Should I delete the old test files that were written in absl-py (and remove the absl-py dpeendency)?

Yes I think so.

@eitanturok eitanturok requested a review from b-chu July 30, 2024 16:12
Copy link
Collaborator

@b-chu b-chu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eitan, you shouldn't resolve conversations unless you address the comment or answer the question

Have you tested setup.py? Are all tests supposed to be GPU tests?

@eitanturok
Copy link
Contributor Author

eitanturok commented Jul 30, 2024

Eitan, you shouldn't resolve conversations unless you address the comment or answer the question

Have you tested setup.py? Are all tests supposed to be GPU tests?

Sorry, I shouldn't have resolved those conversations.

In particular, I've now addressed your comment about using v0.1.0 of ci-testing. I fixed some bugs and we are now using v0.1.0 for ci-testing and have now resolved that comment. Wanted to make sure that is alright.

I do not have explicit tests for the setup.py. But when we install MegaBlocks to run all of our tests, we implicitly need the setup.py to work correctly. Are there any specific tests you recommend adding for the setup.py?

And yes, all tests are currently GPU tests. We are mainly testing two things: ops and layers. Ops are wrappers around low level kernels; these kernels only work on GPUs. Layers are different MoE configurations where we need to perform inference; again, this is done on GPUs because it is inference. There is not much more in MegaBlocks beyond this and so there are not many non-GPU tests to run. Happy to add non-GPU tests if you have any in mind.

Also, this is just meant to be a first PR. I'll add more tests for more complete coverage in future PRs.

@eitanturok eitanturok requested review from b-chu and mvpatel2000 July 30, 2024 18:15
Copy link
Collaborator

@b-chu b-chu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

@b-chu
Copy link
Collaborator

b-chu commented Jul 31, 2024

Have you manually run setup and GPU tests?

@eitanturok
Copy link
Contributor Author

Have you manually run setup and GPU tests?

Yes.

Copy link
Collaborator

@b-chu b-chu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!

@eitanturok eitanturok merged commit 1ae7703 into databricks:main Jul 31, 2024
@eitanturok eitanturok deleted the eitan-tests branch July 31, 2024 15:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants