Add transliteration with beam-search #48

Draft
wants to merge 18 commits into
base: main


@jerinphilip jerinphilip commented Jan 22, 2024

Work in progress

The eventual goal is to implement beam search for transliteration so that it generates multiple candidates (unlike translation, which uses greedy decoding after forced overfitting).
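To make the greedy vs. beam-search distinction concrete, here is a minimal sketch on a toy model. It is not the code in this PR: `beam_search`, `toy_step`, the `EOS` marker and the probability table are all hypothetical names and values chosen for illustration.

```python
import math
from typing import Callable, Dict, List, Tuple

EOS = "</s>"  # hypothetical end-of-sequence marker

Hypothesis = Tuple[Tuple[str, ...], float]  # (tokens, summed log-probability)


def beam_search(
    step: Callable[[Tuple[str, ...]], Dict[str, float]],
    beam_size: int = 3,
    max_len: int = 10,
) -> List[Hypothesis]:
    """Return up to beam_size finished hypotheses, best first."""
    beams: List[Hypothesis] = [((), 0.0)]
    finished: List[Hypothesis] = []
    for _ in range(max_len):
        # Expand every live prefix with every possible next token.
        candidates = [
            (prefix + (token,), score + logp)
            for prefix, score in beams
            for token, logp in step(prefix).items()
        ]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        # Keep only the beam_size best expansions.
        for hyp, score in candidates[:beam_size]:
            if hyp[-1] == EOS:
                finished.append((hyp[:-1], score))
            else:
                beams.append((hyp, score))
        if not beams:
            break
    finished.sort(key=lambda c: c[1], reverse=True)
    return finished[:beam_size]


def toy_step(prefix: Tuple[str, ...]) -> Dict[str, float]:
    """Toy next-token model (made-up numbers): log-probs given a prefix."""
    table = {
        (): {"a": 0.6, "b": 0.4},
        ("a",): {"x": 0.55, "y": 0.45},
        ("b",): {"x": 0.9, "y": 0.1},
    }
    probs = table.get(prefix, {EOS: 1.0})
    return {tok: math.log(p) for tok, p in probs.items()}


# beam_size=1 is greedy decoding: it commits to "a" and ends at "ax" (p=0.33).
# beam_size=3 keeps "b" alive and ranks "bx" (p=0.36) first, then "ax", "ay".
for tokens, score in beam_search(toy_step, beam_size=3):
    print("".join(tokens), score)
```

With a beam of 1 the search collapses to greedy decoding and returns a single output; a wider beam both finds the higher-scoring sequence and yields the ranked alternatives an IME could offer as suggestions.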

seq2seq is overkill for transliteration, at least for an expert user of a deterministic (rather than statistical) IME. The hope of this effort is that NNs, as sufficiently powerful function approximators, can make life easier for a non-expert user. The following benefits come to mind:

  1. No need to switch case or use symbols (~); simply type in all lowercase letters to get the most likely associated outputs.
  2. Use beam search to generate the multiple most likely targets for a given source variation.
  3. Robustness to typing errors and noise. Some masked-character training should allow the network to guess the most suitable character/subword from context.
  4. Long-context selection. GBoard (WFSTs?) fails on really long agglutinated sequences; seq2seq with transformers appears to do better in a cursory try-out (this claim still has to be validated).
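Points 1 and 3 could be approached by corrupting the source side of the training pairs. The following is a rough sketch only, assuming lowercasing stands in for a non-expert user's input and a single reserved symbol marks masked characters; `noisy_source`, `MASK` and `mask_prob` are hypothetical names, not part of this PR.

```python
import random
from typing import Optional

MASK = "_"  # hypothetical mask symbol; assumed never to occur in real input


def noisy_source(
    text: str,
    mask_prob: float = 0.15,
    rng: Optional[random.Random] = None,
) -> str:
    """Lowercase the source and replace some letters with MASK."""
    if rng is None:
        rng = random.Random()
    out = []
    for ch in text.lower():
        # Only letters are masked; symbols such as "~" pass through unchanged.
        if ch.isalpha() and rng.random() < mask_prob:
            out.append(MASK)
        else:
            out.append(ch)
    return "".join(out)


# The target side is left untouched, so the model has to recover the right
# output from a lowercased, partially masked source.
print(noisy_source("NaaL", mask_prob=0.25, rng=random.Random(0)))
```

Passing a seeded `random.Random` keeps the corruption reproducible across runs, which helps when comparing training configurations.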

The model trained for a first exploration already provides good enough variation among candidates. For the input `naal`, the top five candidates are:

naal
0 ||| നാൽ ||| F0= -0.247635 ||| -0.247635
0 ||| നാൾ ||| F0= -2.30577 ||| -2.30577
0 ||| നാല് ||| F0= -2.57854 ||| -2.57854
0 ||| നാള് ||| F0= -4.42439 ||| -4.42439
0 ||| നാല ||| F0= -4.5098 ||| -4.5098
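For consuming this output programmatically, a small parser might look like the sketch below. It assumes each candidate line has fields separated by `|||`, with the sentence id first, the candidate text second, and the total score last, and that the score is a log-probability; `Candidate`, `parse_nbest` and `relative_weights` are hypothetical helpers.

```python
import math
from typing import Iterable, List, NamedTuple


class Candidate(NamedTuple):
    sentence_id: int
    text: str
    score: float  # assumed to be a log-probability (higher is better)


def parse_nbest(lines: Iterable[str]) -> List[Candidate]:
    """Parse 'id ||| text ||| features ||| score' lines, skipping others."""
    candidates = []
    for line in lines:
        if "|||" not in line:
            continue  # e.g. the echoed input line
        fields = [field.strip() for field in line.split("|||")]
        candidates.append(Candidate(int(fields[0]), fields[1], float(fields[-1])))
    return candidates


def relative_weights(candidates: List[Candidate]) -> List[float]:
    """Softmax over the listed scores; weights are relative to this list only."""
    best = max(c.score for c in candidates)
    exps = [math.exp(c.score - best) for c in candidates]
    total = sum(exps)
    return [e / total for e in exps]


nbest = """naal
0 ||| നാൽ ||| F0= -0.247635 ||| -0.247635
0 ||| നാൾ ||| F0= -2.30577 ||| -2.30577
0 ||| നാല് ||| F0= -2.57854 ||| -2.57854
0 ||| നാള് ||| F0= -4.42439 ||| -4.42439
0 ||| നാല ||| F0= -4.5098 ||| -4.5098
""".splitlines()

cands = parse_nbest(nbest)
for cand, weight in zip(cands, relative_weights(cands)):
    print(cand.text, cand.score, weight)
```

The weights are renormalized over the five listed candidates, so they only indicate how the candidates rank against each other, not calibrated probabilities.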

@jerinphilip changed the title from "Add transliteration with beam-search (multiple candidates)" to "Add transliteration with beam-search" on Jan 22, 2024