implement serialize_brat() and add test for that and load_brat() #31

Merged
ChristophAlt merged 14 commits into main from feature/test_load_brat
Jan 30, 2022

@ArneBinder ArneBinder commented Jan 12, 2022

  • Minor refactor of load_brat: the semantics are unchanged (the loop was extracted into its own function and variables were renamed).
  • Minor improvement to hf_datasets/brat.py: if data_dir contains subfolders, these are used as split names (unless subdirectory_mapping is already specified).
  • Re-introduces the ability to create train/test splits directly in load_brat when the parameter train_test_split: Dict[str, Any] is given and the loaded data has only one split. The parameter train_test_split is passed to loaded_split.train_test_split as kwargs.
  • Add serialize_brat: serialize a corpus of documents (i.e. Dict[str, List[Document]]) to strings or write it into folder(s).
  • Add tests:
    • load_brat
    • serialize_brat: in memory
    • serialize_brat: write into folder
    • serialize_brat: full cycle, i.e. load -> serialize -> load
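
For readers unfamiliar with the format, the "full cycle" test idea can be sketched on a single document. This is a minimal illustration of the BRAT standoff format only, not the implementation from this PR: the names Span, Relation, serialize_ann and parse_ann are hypothetical, and the sketch assumes contiguous spans and binary relations.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Span:  # hypothetical stand-in for a text-bound annotation
    id: str  # e.g. "T1"
    label: str
    start: int
    end: int


@dataclass(frozen=True)
class Relation:  # hypothetical stand-in for a binary relation
    id: str  # e.g. "R1"
    label: str
    head: str  # id of the head span
    tail: str  # id of the tail span


def serialize_ann(text: str, spans: List[Span], relations: List[Relation]) -> str:
    """Render annotations as BRAT standoff lines (the content of an .ann file)."""
    lines = [f"{s.id}\t{s.label} {s.start} {s.end}\t{text[s.start:s.end]}" for s in spans]
    lines += [f"{r.id}\t{r.label} Arg1:{r.head} Arg2:{r.tail}" for r in relations]
    return "\n".join(lines) + "\n"


def parse_ann(ann: str) -> Tuple[List[Span], List[Relation]]:
    """Parse standoff lines back into spans and relations (contiguous spans only)."""
    spans: List[Span] = []
    relations: List[Relation] = []
    for line in ann.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        ann_id, body = fields[0], fields[1]
        if ann_id.startswith("T"):
            label, start, end = body.split(" ")
            spans.append(Span(ann_id, label, int(start), int(end)))
        elif ann_id.startswith("R"):
            label, arg1, arg2 = body.split(" ")
            head = arg1.split(":", 1)[1]
            tail = arg2.split(":", 1)[1]
            relations.append(Relation(ann_id, label, head, tail))
    return spans, relations


text = "Alice works at Acme."
spans = [Span("T1", "Person", 0, 5), Span("T2", "Organization", 15, 19)]
relations = [Relation("R1", "Works_for", "T1", "T2")]

# Full cycle: serialize, parse, and compare with the original annotations.
ann = serialize_ann(text, spans, relations)
assert parse_ann(ann) == (spans, relations)
```

The round-trip assertion at the end is the property the "full cycle" test checks at corpus level: serializing loaded data and loading it again should not change the annotations.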

@ArneBinder ArneBinder changed the title from "add test for load_brat()" to "implement serialize_brat() and add test for that and load_brat()" on Jan 16, 2022
@ArneBinder ArneBinder force-pushed the feature/test_load_brat branch 4 times, most recently from 2e3ce76 to 4cde51b on January 17, 2022 01:57
@ArneBinder ArneBinder force-pushed the feature/test_load_brat branch from 4cde51b to c9ef5f5 on January 17, 2022 10:21
@ArneBinder

@ChristophAlt I just rebased onto main. Could you have a look at this when you have a bit of time?

…tations and documents when they are not already specified; load_brat: strip annotation type marker (T or R) from ids
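
The id handling named in this commit message could look roughly like the following. It is only a sketch under assumptions: strip_type_marker is a hypothetical helper, and whether the actual change checks that the remainder is numeric is not stated in the PR.

```python
def strip_type_marker(ann_id: str) -> str:
    """Drop a leading 'T' or 'R' from a BRAT annotation id, e.g. 'T12' -> '12'.

    Ids that do not match the pattern are returned unchanged (an assumption
    made for this sketch, not confirmed by the PR).
    """
    if ann_id[:1] in ("T", "R") and ann_id[1:].isdigit():
        return ann_id[1:]
    return ann_id


print(strip_type_marker("T12"))  # 12
print(strip_type_marker("R3"))   # 3
```

Note that after stripping, "T1" and "R1" both become "1", so spans and relations have to be kept in separate collections for the ids to stay unambiguous.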
@ChristophAlt ChristophAlt merged commit 860afbb into main Jan 30, 2022
@ChristophAlt ChristophAlt deleted the feature/test_load_brat branch March 4, 2022 17:51