Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add test for filesystem plugins #1221

Merged
merged 4 commits into from
May 3, 2021
Merged

add test for filesystem plugins #1221

merged 4 commits into from
May 3, 2021

Conversation

vnghia
Copy link
Contributor

@vnghia vnghia commented Dec 9, 2020

I can not merge this PR because of #1222 .

I would like to share with you how I think we should test the plugins. The idea is inspired by modular_filesystem_test. I used parameterized tests for control the plugins and environements variable. Please let me know what you think !

Part of #1183

@vnghia vnghia marked this pull request as draft December 9, 2020 10:07
@vnghia vnghia changed the title add test for filesystem plugins [Do not merge] add test for filesystem plugins Dec 9, 2020
@yongtang
Copy link
Member

yongtang commented Dec 9, 2020

@vnvo2409 parameterized pytest looks good! This can help standardize the file system implementations.

@vnghia vnghia mentioned this pull request Dec 12, 2020
@vnghia
Copy link
Contributor Author

vnghia commented Dec 14, 2020

@yongtang
Could you please take a look ? I wrote it with a super flexibility. However, I am not sure if I am overdoing it or not. Thank you !
( This test requires a new feature from pytest==6.2.0)

@vnghia
Copy link
Contributor Author

vnghia commented Dec 14, 2020

One more thing. I find it hard to test the plugins with environment variables. Because most filesystems like s3 will only check for the envs when initialization.

I am thinking of introduce a new env S3_FORCE_REINITIALIZATION ( we will warn that it is for testing only). Everytime this env is present, everything will be reinitialized by GetTransferManager, GetExecutor and GetS3Client and so on for other filesystem. Another way is reload(tensorflow). WDYT ?

@vnghia vnghia mentioned this pull request Dec 15, 2020
@vnghia
Copy link
Contributor Author

vnghia commented Dec 18, 2020

@yongtang

Do you have any thought about this approach ? Thank you !

@yongtang
Copy link
Member

@vnvo2409 I think one concern is that when you change the configuration/initialization, the change applies process wise - meaning all current ongoing pytests (within the same process) will be impacted. Unless we can force pytest to run in a sequence order then there might have unintended effect. One test belonging to one group may get into the way of another test belonging to another group (with different intended settings)?

Copy link
Contributor Author

@vnghia vnghia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yongtang

I am trying to elaborate what I am understand. In summary, there are 2 layers of isolation ( explained below, you could see that I don't use any os.environ for setting env ). So I think these tests won't clash with each other. Please tell me if there are something unclear.

@vnghia
Copy link
Contributor Author

vnghia commented Dec 31, 2020

@yongtang. Since #1222 is done. We could test the plugins with TF nightly, could you take a look at it ? If it is ok, I will expand for all operations ( not only reading/writing )

@yongtang
Copy link
Member

yongtang commented Jan 4, 2021

@vnvo2409 Sounds good to me 👍. Let me know when the PR draft is ready

@vnghia vnghia force-pushed the fs-test branch 3 times, most recently from f56309a to ba5b62d Compare February 13, 2021 15:04
@vnghia
Copy link
Contributor Author

vnghia commented Feb 14, 2021

@yongtang Sorry for putting this PR on hold for too long. Could you take a look please ? I will add some more tests ( mostly directory-related ) in later PRs. Thank you !

In addition, I've already exposed env through tensorflow/tensorflow#46302. Could we update gcs right now or we have to wait for tensorflow 2.5rc0 ?

@vnghia vnghia marked this pull request as ready for review February 14, 2021 19:04
@vnghia vnghia changed the title [Do not merge] add test for filesystem plugins add test for filesystem plugins Feb 14, 2021
@yongtang
Copy link
Member

@vnvo2409 The PR looks good! There are a couple of test failures on Linux, can you take a look (or disable it temporarily for now)?

@yongtang
Copy link
Member

@vnvo2409 For gcs the plan is to enable modular file system (and use TF_ENABLE_LEGACY_FILESYSTEM=1 to switch on/off) before the 2.5 branch and 2.5.0rc0 is out. Though I think we want to send out the email to community (tf developer google group) first so that people is prepared. /cc @mihaimaruseac

@vnghia
Copy link
Contributor Author

vnghia commented Feb 16, 2021

@yongtang with gcs, I means updating the gcs_env

uint64_t GCSNowSeconds(void) {
// TODO: Either implement NowSeconds here, or have TensorFlow API exposed
std::abort();
}

Nothing is related to the release plan I think. But we need the env header right now is exposed in tf-nightly only so I am not sure if we have to wait for tf-2.5rc0

@yongtang
Copy link
Member

@vnvo2409 tensorflow-io is already using tf-nightly now (see https://github.com/tensorflow/io/blob/master/tensorflow_io/core/python/ops/version_ops.py), which was added as part of PR #1309

@vnghia vnghia force-pushed the fs-test branch 3 times, most recently from ea0cc58 to 0db0cc7 Compare March 5, 2021 16:29
@vnghia
Copy link
Contributor Author

vnghia commented Mar 5, 2021

@yongtang I am so sorry that I will have to put it on hold for much longer. It has some weird errors and I don't have a linux machine right now. I will probably try it again when tf 2.5 is out and aws and google-cloud-cpp are removed from TensorFlow.

@vnghia
Copy link
Contributor Author

vnghia commented Mar 5, 2021

@yongtang It seems that the errors are from the upstream. I would like to ask if it is possible to switch to use google-cloud-cpp bigtable and pubsub client instead of implementing our own client with our own proto client ? I could give it a try if you think it's ok.

@yongtang
Copy link
Member

yongtang commented Mar 5, 2021

I would like to ask if it is possible to switch to use google-cloud-cpp bigtable and pubsub client instead of implementing our own client with our own proto client ? I could give it a try if you think it's ok.

@vnvo2409 That will be great and preferred 👍

@vnghia
Copy link
Contributor Author

vnghia commented May 3, 2021

@yongtang This PR is ready to merge. Turn out azurite was too old and it broke the tests.

The philosophy of the test is still the same. But the filesystem are added manually instead of using parameterized tests which allows greater control and less generated tests. We can also write test for a specific filesystem and reuse the resources created in *_fs.

Still, testing with different envs right now aren't possible as the envs are read only once when the filesystem are loaded for the first time. It will be addressed along with #1342 and #1354.

Copy link
Member

@yongtang yongtang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. A couple of minor cosmetic comments though those can be updated in follow up PRs.

@yongtang yongtang merged commit e16df8d into tensorflow:master May 3, 2021
vnghia added a commit to vnghia/io that referenced this pull request May 3, 2021
* add initial tests for s3

* add test for az

* bump `azurite` to `3.12.0`

* add `test_filesystem` to `api`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants