Intent to participate [First lines of novels] #75
Marking this one complete! Big thanks to everyone who contributed to the dataset. Writeup and highlights here: http://aiweirdness.com/post/168051907512/the-first-line-of-a-novel-by-an-improved-neural

I ended up using syll-rnn (LSTM mode) for the generation, which ran for about 16 hours on my MacBook. Syll-rnn seems to handle larger datasets better than char-rnn, while also supporting a larger vocabulary than word-rnn. Here's the framework I used: https://github.com/learningtitans/torch-rnn/blob/valle-syllables/doc/flags.md#preprocessing

Sequence length was 40 syllables (based roughly on the number of syllables in "It is a truth universally acknowledged that a single man in possession of a good fortune must be in want of a wife.").

140,000 words of output available here. Unfortunately, due to a prank in the input data that I didn’t catch until after I trained the neural network, 37,000 of them are the word “sand”.

Crowdsourced dataset available here: https://github.com/janelleshane/novel-first-lines-dataset
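To sanity-check the 40-syllable sequence length, here is a rough syllable count for the Austen sentence. This is a minimal sketch using a common vowel-group heuristic (count runs of vowels, then drop a silent final "e" or "-ed"); it is an assumption for illustration only and is not the syllabification the torch-rnn `valle-syllables` branch performs. The helper names `estimate_syllables` and `sentence_syllables` are hypothetical.

```python
import re

VOWEL_GROUP = re.compile(r"[aeiouy]+")


def estimate_syllables(word: str) -> int:
    """Rough English syllable count (heuristic, not a real syllabifier)."""
    w = re.sub(r"[^a-z]", "", word.lower())
    if not w:
        return 0
    count = len(VOWEL_GROUP.findall(w))
    # Silent final "e" ("wife", "fortune"), but keep consonant + "le" ("single").
    if w.endswith("e") and not w.endswith("le") and count > 1:
        count -= 1
    # Silent "-ed" unless it follows t or d ("acknowledged" vs. "wanted").
    if w.endswith("ed") and len(w) > 2 and w[-3] not in "td" and count > 1:
        count -= 1
    return max(count, 1)


def sentence_syllables(sentence: str) -> int:
    """Sum the heuristic counts over every word in the sentence."""
    return sum(estimate_syllables(w) for w in re.findall(r"[A-Za-z']+", sentence))


AUSTEN = ("It is a truth universally acknowledged that a single man in possession "
          "of a good fortune must be in want of a wife.")
print(sentence_syllables(AUSTEN))  # 33 with this heuristic
```

The heuristic gives about 33 syllables for the sentence, so a 40-syllable window covers it with some headroom.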
(We're using issues as a sort of forum, so I'll re-open this to make it easier to find.) Good stuff!
I think the eternal sand is quite appropriate for NaNoGenMo!
Thanks for clearing that up! And for adding the completed tag! Yes, eternal sand. People have been making Star Wars jokes at me all day.
A tiny dataset produced mixed results in my first attempt to generate the first sentence of a novel: http://aiweirdness.com/post/167049313837/a-neural-network-tries-writing-the-first-sentence
Highlights:
Lowlights:
The really big repositories I've found (Project Gutenberg, for example) are formatted inconsistently enough that they're difficult to scrape.
So now I'm crowdsourcing a larger dataset: https://docs.google.com/forms/d/e/1FAIpQLScod8P-kcLX98u6gT0rX6-20GwkDo_glz-okVVkrhr6KgQONQ/viewform. This has been posted for about 36 hours and already has 3,532 submissions (not all unique). People are welcome to contribute through this form, or let me know if you have a smarter way to contribute a dataset.
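Since the form submissions are "not all unique", and a "sand" prank later slipped into the input, a light cleaning pass before training could help with both. Below is a minimal sketch, assuming the form responses have been exported as a plain list of strings; the normalization rules, the helper names (`normalize`, `dedupe`, `overrepresented_words`), and the 20% threshold are all hypothetical choices for illustration, not what was actually used for this dataset.

```python
import re
from collections import Counter


def normalize(line: str) -> str:
    """Lowercase, straighten curly quotes, and collapse whitespace."""
    line = line.replace("\u2019", "'").replace("\u201c", '"').replace("\u201d", '"')
    return re.sub(r"\s+", " ", line.strip().lower())


def dedupe(submissions: list[str]) -> list[str]:
    """Keep the first occurrence of each normalized submission, in order."""
    seen: set[str] = set()
    unique: list[str] = []
    for s in submissions:
        key = normalize(s)
        if key and key not in seen:
            seen.add(key)
            unique.append(s.strip())
    return unique


def overrepresented_words(lines: list[str], threshold: float = 0.2) -> list[tuple[str, float]]:
    """Return (word, share) pairs whose share of all tokens exceeds `threshold`."""
    counts = Counter(w for line in lines for w in re.findall(r"[a-z']+", normalize(line)))
    total = sum(counts.values())
    if total == 0:
        return []
    return [(w, c / total) for w, c in counts.most_common() if c / total > threshold]


submissions = [
    "It was a dark and stormy night.",
    "it was a dark and  stormy night.",  # duplicate after normalization
    "Call me Ishmael.",
    "sand sand sand sand sand sand",     # a prank-style entry
]
clean = dedupe(submissions)
print(len(clean))                    # 3
print(overrepresented_words(clean))  # [('sand', 0.375)]
```

On a real corpus, very common words such as "the" can approach a low threshold, so the cutoff (or a stop-word list) would need tuning; the point is simply to surface any single token that dominates the input before training.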
At the end of the month, I'll try again with a hopefully much larger dataset, and post the results and dataset afterwards, as well as a link to whatever open-source package I end up using. It won't produce a full novel in the traditional sense, but I'll declare a moral victory if a human announces their admiration of one of the neural network's lines.