Refactor datamodule #121
Conversation
Force-pushed from a2b5f63 to 6fa5d08. Commit messages (truncated):

- …st_split; prepare_data_split defaults to value of train_split
- …ule to be consistent; use DatasetDict
I'm not that experienced with pytorch-lightning yet; how would it work with that? E.g., if I want to take 5% or 3 examples from my train data for training and for validation each?
And yes, I also don't like the …
One way to do this is described here: you can limit the number of batches used by providing either an int (number of batches) or a float (fraction of batches).
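For reference, a minimal sketch of the mechanism that comment points to, using standard `pytorch_lightning.Trainer` arguments; the model and datamodule in the commented call are placeholders, not anything from this repo:

```python
import pytorch_lightning as pl

# A float is interpreted as a fraction of batches, an int as an absolute number of batches.
trainer = pl.Trainer(
    max_epochs=3,
    limit_train_batches=0.05,  # use only 5% of the training batches per epoch
    limit_val_batches=3,       # use only 3 validation batches
)

# `model` and `datamodule` stand for a LightningModule / LightningDataModule:
# trainer.fit(model, datamodule=datamodule)
```

Note that this limits batches, not individual examples, so selecting exactly 3 examples would still have to happen on the data side.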
So we have just a parameter …
This will be done by the PIE Dataset / DatasetDict in the future, as it will provide the same interface as the HF implementation (see here).
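For context, a small sketch of the huggingface `datasets` interface being referred to; the toy data is made up and nothing below is from the PIE implementation:

```python
from datasets import Dataset

# The HF interface referred to above: splits live in a DatasetDict keyed by split name,
# and helpers such as train_test_split derive new splits from an existing one.
train = Dataset.from_dict(
    {"text": [f"doc {i}" for i in range(100)], "label": [i % 2 for i in range(100)]}
)

# Carve a small validation set out of the train split (5% here, chosen arbitrarily);
# the result is again a DatasetDict with "train" and "test" entries.
splits = train.train_test_split(test_size=0.05, seed=42)
small_train, small_val = splits["train"], splits["test"]
```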
OK, this really depends on the level at which the PIE Dataset / DatasetDict lives. Is it about …
@ChristophAlt, can you have a look again?
This is in preparation for ArneBinder/pytorch-ie-hydra-template-1#1.

Relevant changes:

- use the parameters `train_split`, `val_split`, and `test_split`; `prepare_split` defaults to the value of `train_split`
- `taskmodule.prepare` is only called if `stage == "fit" or stage is None`
- some `__init__` arguments of `pytorch_lightning.DataModule` are deprecated; pass remaining keyword arguments passed to `__init__` to the `DataLoader`s
- restructure the `setup` method a bit
- add `PIEDatasetDict = Dict[Union[str, Split], List[Document]]`, in analogy to huggingface `DatasetDict`, to the `data.datasets` package
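Taken together, a rough sketch of what a datamodule along these lines could look like. Only the parameter names, the `stage` check, and the `PIEDatasetDict` alias come from the list above; the class name `DocumentDataModule`, the taskmodule `encode()` method, and the `Document` stand-in are illustrative assumptions, not the actual pytorch-ie implementation:

```python
from typing import Dict, List, Optional, Union

from datasets import Split
from pytorch_lightning import LightningDataModule
from torch.utils.data import DataLoader


class Document:
    """Stand-in for the pytorch-ie Document type."""


# the type alias from the description, in analogy to huggingface DatasetDict
PIEDatasetDict = Dict[Union[str, Split], List[Document]]


class DocumentDataModule(LightningDataModule):  # illustrative name
    def __init__(
        self,
        taskmodule,                       # assumed to provide prepare() and encode()
        dataset: PIEDatasetDict,
        train_split: Optional[str] = "train",
        val_split: Optional[str] = "validation",
        test_split: Optional[str] = "test",
        prepare_split: Optional[str] = None,
        **dataloader_kwargs,              # remaining keyword arguments go to the DataLoaders
    ):
        super().__init__()
        self.taskmodule = taskmodule
        self.dataset = dataset
        self.train_split = train_split
        self.val_split = val_split
        self.test_split = test_split
        # prepare_split defaults to the value of train_split
        self.prepare_split = prepare_split if prepare_split is not None else train_split
        self.dataloader_kwargs = dataloader_kwargs
        self._encoded: Dict[str, list] = {}

    def setup(self, stage: Optional[str] = None):
        # taskmodule.prepare is only called when fitting
        if stage == "fit" or stage is None:
            self.taskmodule.prepare(self.dataset[self.prepare_split])
        for split in (self.train_split, self.val_split, self.test_split):
            if split is not None and split in self.dataset:
                self._encoded[split] = self.taskmodule.encode(self.dataset[split])

    def train_dataloader(self):
        return DataLoader(self._encoded[self.train_split], **self.dataloader_kwargs)

    def val_dataloader(self):
        return DataLoader(self._encoded[self.val_split], **self.dataloader_kwargs)

    def test_dataloader(self):
        return DataLoader(self._encoded[self.test_split], **self.dataloader_kwargs)
```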