super_serial: automate saving and restoring tfrecords #1918
…ith metadata in a header file.
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). Please visit https://cla.developers.google.com/ to sign. Once you've signed (or fixed any issues), please reply here. |
Is this related to tensorflow/tensorflow#38483? |
Yes, this is a solution to that problem. Specifically it automates the process in this comment: tensorflow/tensorflow#38483 (comment) starting from the same tutorial code. The "feature description" steps are handled internally using the header file to determine, store, and retrieve the feature descriptions automatically. |
My impression is that this would be a better fit for TensorFlow IO. /cc @yongtang @jsimsa |
Agree. Thanks for the PR @markemus but this seems to scope better in tf/io. You may want to open an issue there to see if they'd like to add it to their repository! |
Will do, thanks for the quick feedback everyone. |
I am actually working on a PR that will provide support for |
Good to hear! This functionality really should have been built into the TFRecord API to begin with. They're a nightmare to work with directly atm. |
@markemus The origin of this is quite old; see tensorflow/tensorflow#16926. |
Two years later there's still no good way to save and restore TFRecords in Tensorflow. This PR was accepted and has been included in tf-io for over a year now (and also improved, with tests and additional support added). Any chance you folks have changed your mind about adding it to tensorflow proper? I have used this in multiple real projects and so have some of my coworkers. It's easy and it works well, and the header is stored separately so old reading approaches will still work. I'm also happy to add more features if required. |
This module saves Datasets as TFRecords files alongside a .header file containing the metadata for reconstructing the Dataset. Users only need to call save(tfrecordpath, headerpath) and load(tfrecordpath, headerpath). It really is that easy.
Currently, using TFRecords in TensorFlow has a steep learning curve: it can be difficult to write code for complex datasets, and it requires you to keep the code needed to read the TFRecord back into memory. Super_serial eliminates these headaches entirely.
Includes a test to demonstrate how it works.
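To make the header idea concrete, here is a minimal sketch of the general pattern: infer a feature description from the data, store it in a separate header file, and use it to parse the records back. It is not super_serial's implementation and does not use TensorFlow or the TFRecord format. The names save_with_header, load_with_header, and infer_header, the JSON header layout, and the fixed-width binary encoding are all assumptions made for illustration.

```python
import json
import struct

# Hypothetical mapping from a dtype name stored in the header to a struct code.
_STRUCT_CODES = {"int64": "q", "float64": "d"}


def infer_header(records):
    """Build a feature description (dtype and length per feature) from the first record."""
    first = records[0]
    header = {}
    for name, values in first.items():
        dtype = "int64" if all(isinstance(v, int) for v in values) else "float64"
        header[name] = {"dtype": dtype, "length": len(values)}
    return header


def _record_format(header):
    """Turn the header into a struct format string, with features in sorted name order."""
    parts = []
    for name in sorted(header):
        spec = header[name]
        parts.append("%d%s" % (spec["length"], _STRUCT_CODES[spec["dtype"]]))
    return "<" + "".join(parts)


def save_with_header(records, data_path, header_path):
    """Write records to a binary file and their description to a JSON header."""
    header = infer_header(records)
    fmt = _record_format(header)
    with open(header_path, "w") as f:
        json.dump(header, f)
    with open(data_path, "wb") as f:
        for record in records:
            flat = [v for name in sorted(header) for v in record[name]]
            f.write(struct.pack(fmt, *flat))


def load_with_header(data_path, header_path):
    """Read the header first, then use it to parse every record in the data file."""
    with open(header_path) as f:
        header = json.load(f)
    fmt = _record_format(header)
    size = struct.calcsize(fmt)
    records = []
    with open(data_path, "rb") as f:
        while True:
            chunk = f.read(size)
            if not chunk or len(chunk) < size:
                break
            flat = list(struct.unpack(fmt, chunk))
            record, pos = {}, 0
            for name in sorted(header):
                n = header[name]["length"]
                record[name] = flat[pos:pos + n]
                pos += n
            records.append(record)
    return records
```

The reader never has to restate the feature names, dtypes, or shapes: everything needed to parse the data lives in the header. Because the header is a separate file, the data file itself is unchanged, which matches the point made above that older reading approaches keep working.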