Any examples of multiple keywords and an uploaded .wav file #117

Open
JustinGOSSES opened this issue Feb 7, 2018 · 5 comments

@JustinGOSSES

I would like to use this for keyword tagging on a wav file that the user would upload. I would like to do multiple keywords and have the keywords be supplied by the user at JavaScript runtime.

I already have a working Python version of pocketsphinx that does this. I want to implement the same functionality in JavaScript.

I've gone through the issues and, although several people had similar questions, none provided examples of working code. @miguelmota provided a working version that was close to what I want in terms of multiple keywords, but his keywords are defined before converting to JavaScript. There was also some discussion of the keyword issue in 2016 suggesting that adding keywords at runtime on the JavaScript end might not be possible.

Any examples of these features (1. working on a loaded wav file 2. Keywords provided by the user at JavaScript runtime) would be appreciated.

Thanks to everyone who has worked on this project!

@syl22-00
Owner

syl22-00 commented Feb 8, 2018

As you are talking about uploading a wav file, I assume you want to run pocketsphinx on the server side, in which case it does not make much sense to run pocketsphinx.js. You'd rather run a natively compiled version, possibly wrapped in a JavaScript interface if you want to talk to node.js.

As for having multiple keywords or phrases, it only works by providing a file that contains one keyword/phrase per line. The argument to point to that file is -kws. With pocketsphinx.js, the file must be pre-loaded using the lazy load method described in the README, or you can compile the file into pocketsphinx.js. Both options are explained in the README.
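
As a rough illustration of the "one keyword/phrase per line" layout, the text of such a file could be generated from user input along these lines. This is only a sketch: `buildKeywordList` is a hypothetical helper, not part of pocketsphinx.js, and it only produces the text. Getting that text to pocketsphinx.js as a file for -kws is a separate step.

```javascript
// Hypothetical helper (not part of pocketsphinx.js): turn user-supplied
// phrases into keyword-list text with one keyword or phrase per line.
function buildKeywordList(phrases) {
  const lines = phrases
    .map((p) => p.trim().replace(/\s+/g, " ")) // collapse stray whitespace
    .filter((p) => p.length > 0);              // drop empty entries
  // End with a newline so the last entry is a complete line.
  return lines.length > 0 ? lines.join("\n") + "\n" : "";
}

// Example: phrases typed into a form field by the user.
const keywordText = buildKeywordList(["hello world", "  ", " good   morning "]);
console.log(keywordText);
```

Every word used in a keyword or phrase presumably also has to be present in the dictionary the recognizer is loaded with.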

@JustinGOSSES
Author

JustinGOSSES commented Feb 26, 2018

Actually, I really do want to enable a user to process a wav file entirely via front-end JavaScript.

I already have a working version of server-side (Python & command-line) pocketsphinx. The advantage of getting it to run entirely via front-end JavaScript instead would be that the audio data wouldn't have to move from where it already is, on someone else's computer.

This would enable me to avoid meetings to review security and a bunch of emails back and forth about permissions. If the data doesn't move, there are fewer hoops to jump through. Additionally, I wouldn't have to worry about maintaining server code and a server environment or running the processing for other people on my own machine. I could just publish an internal GitHub Pages site and leave it there for end-users to use as a user-supplied-keywords+video -> keyword tagging -> data visualization service.

@JustinGOSSES
Author

I have looked at the README and got a small multiple keyword tagging example sorta working based on the instructions there, but performance is poor relative to server-side code. "Two" is being found very often even though it isn't in the dict or keywords file. Additionally, it sorta seems like a hard refresh or a restart of the browser (Chrome) isn't picking up certain code changes? I'm not sure if that is an Emscripten-related thing or not. I'm new to Emscripten. I was hoping someone had an example along the lines of what I wanted (multiple keyword tagging with keywords supplied at runtime via UI) to speed debugging. In any case, thanks for your work on this project.

@syl22-00
Owner

If you do not need to upload the recording on a server, then pocketsphinx.js is for sure a good solution.

There is no reason why performance, in terms of recognition rate, should be different in the browser compared to a natively compiled version. Of course, this assumes you are using the same acoustic model for both tests, and the same init parameters (pocketsphinx.js displays them in the JavaScript console at init time), so you might want to check that. If you have an example of inconsistent performance, you can send it along.

I don't think setting multiple keywords at runtime would work as the only way to set multiple key words or phrases is via a file, not via the API. But there might be a way to dynamically create something that looks like a file to pocketsphinx.js.
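
One way this "looks like a file" idea might be sketched is to write the keyword text into an in-memory file system before initializing the recognizer. The sketch below assumes an object that behaves like Emscripten's `FS` (specifically its `writeFile(path, data)` call). Whether pocketsphinx.js exposes such an object to page code is not established in this thread. `writeKeywordFile`, `makeFakeFs` and the path `/kws.txt` are hypothetical names used only for illustration.

```javascript
// Hypothetical helper: write user-supplied keywords, one per line, to a path
// in a virtual file system so that a -kws argument could point at it.
// ASSUMPTION: `fs` behaves like Emscripten's FS object (writeFile(path, data)).
function writeKeywordFile(fs, path, phrases) {
  const text = phrases
    .map((p) => p.trim())
    .filter((p) => p.length > 0)
    .join("\n") + "\n";
  fs.writeFile(path, text);
  return path;
}

// Stand-in for the virtual file system, so the helper can be tried
// outside the browser. It is not a real Emscripten FS.
function makeFakeFs() {
  const files = new Map();
  return {
    writeFile: (path, data) => { files.set(path, data); },
    readFile: (path) => files.get(path),
  };
}

const fakeFs = makeFakeFs();
const kwsPath = writeKeywordFile(fakeFs, "/kws.txt", ["one", " two three "]);
console.log(kwsPath, JSON.stringify(fakeFs.readFile(kwsPath)));
```

If something like this works, the path returned here is what the -kws argument would need to reference at initialization.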

Otherwise, you might be able to get something working well using grammars, which can be dynamically set via the API.
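
For the grammar route, a user-supplied word list could be turned into a small looping grammar at runtime, roughly as below. The field names (`numStates`, `start`, `end`, `transitions` with `from`, `to`, `word`) reflect my recollection of the grammar format in the pocketsphinx.js README and should be treated as an assumption to verify there. `buildLoopGrammar` is a hypothetical helper.

```javascript
// Hypothetical helper: build a single-state looping grammar that accepts any
// sequence of the given words. Multi-word phrases are split into single words.
// ASSUMPTION: the object shape matches the grammar format in the README.
function buildLoopGrammar(phrases) {
  const words = phrases
    .flatMap((p) => p.trim().split(/\s+/))
    .filter((w) => w.length > 0);
  const unique = [...new Set(words)]; // avoid duplicate transitions
  return {
    numStates: 1,
    start: 0,
    end: 0,
    transitions: unique.map((word) => ({ from: 0, to: 0, word })),
  };
}

const grammar = buildLoopGrammar(["ONE", "TWO", "ONE", "GOOD MORNING"]);
console.log(JSON.stringify(grammar, null, 2));
```

A looping grammar like this constrains recognition to the listed words rather than spotting them inside arbitrary speech, so results will likely differ from true keyword spotting.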

JavaScript generated by emscripten should be cached by the browser the same way as other JavaScript files. For wasm files, I don't know. At least a hard refresh should work.

Feel free to share your code, we could then integrate it directly into pocketsphinx.js.

@JustinGOSSES
Author

This is a small demo that has both grammar and keyword options based on the demos in the webapp folder. Getting a multiple keyword spotting demo to work on GitHub Pages required moving some files from the master branch to the gh-pages branch that weren't previously there. I'm noting it because it caused me a little confusion at first about why things didn't work.

It does keyword spotting for multiple keywords based on words in the dict.txt and kws.dict files. Currently, it uses static versions of each file, but the plan is to have users create those files via a preceding webpage, save them to the local computer, and load them as a pre-step to running the main pocketsphinx.js webpage.

I'll look into the init parameters next. I might have to put this project on the shelf soon, but I'll hope to get back to it.

To help users convert their video files into audio prepped for pocketsphinx.js entirely in the browser, I'll be using this tool, which is an Emscripten conversion of the command line tool FFmpeg.
