
hitachi-nlp/FLD-task


FLD Task

Utility modules for FLD, such as a corpus loader, a corpus serializer, and metric calculators.

See the entry-point repository for an overview of the whole FLD project.

Release Branches (READ CAREFULLY to determine which branch suits you)

There are currently three release branches:

  • NeurIPS_2024 branch (2024-12)
  • NLP_2024_KOBE_BEEF branch (2024-01-24)
  • ICML_2023 branch (2023-08-22)

Please read the instructions in the other FLD repositories CAREFULLY to determine which branch you need.

Installation

pip install -e .
python -c "import nltk; nltk.download('punkt')"

Making Prompt-Output Pairs from FLD Corpora

Once the raw FLD corpora have been generated by FLD-generator, prepare prompt-output pairs for LLM training as follows:

python ./scripts/serialize.py  \
    --train {train_jsonl_path}  \
    --valid {valid_jsonl_path}  \
    --test {test_jsonl_path}  \
    --output-dir {output_dir}

This command outputs the examples with two added fields, prompt_serial and proof_serial, which hold the prompt given to the LLM and the output it should produce, respectively.
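The serialized files can then be read by a training script. The following is a minimal sketch, not part of this repository, that streams (prompt, output) pairs from one serialized file. It assumes the output is JSON Lines (one example per line, like the input .jsonl files) and that each example carries the prompt_serial and proof_serial fields described above; the function name iter_prompt_output_pairs is hypothetical.

```python
import json
from pathlib import Path
from typing import Iterator, Tuple


def iter_prompt_output_pairs(jsonl_path: str) -> Iterator[Tuple[str, str]]:
    """Yield (prompt, output) pairs from a serialized FLD JSONL file.

    Hypothetical helper: assumes one JSON object per line, each with the
    `prompt_serial` and `proof_serial` fields added by scripts/serialize.py.
    """
    with Path(jsonl_path).open(encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between examples
            example = json.loads(line)
            try:
                prompt = example["prompt_serial"]  # input given to the LLM
                output = example["proof_serial"]   # target the LLM should produce
            except KeyError as e:
                # Report which line lacks a field instead of failing silently.
                raise ValueError(f"line {line_no}: missing field {e}") from None
            yield prompt, output
```

For example, `for prompt, output in iter_prompt_output_pairs(path): ...` iterates over one split without loading the whole file into memory, and any other fields in each example are simply ignored.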

(Additional) Pushing to Hugging Face Hub

python ./scripts/push_to_hub.py  \
    --train {serialized_train_jsonl_path}  \
    --valid {serialized_valid_jsonl_path}  \
    --test {serialized_test_jsonl_path}  \
    --repo-id {your_name/dataset_name}  \
    --config-name default
