Intent to participate [First lines of novels] #75

janelleshane · 2017-11-04T01:39:27Z

A tiny dataset produced mixed results in my first attempt to generate the first sentence of a novel: http://aiweirdness.com/post/167049313837/a-neural-network-tries-writing-the-first-sentence

Highlights:

There was a man and he had seventy first sight.
It is a truth universally acknowledged, that a single man in possession of a good fortune must be in want of my life, fire of my loins.

Lowlights:

Stop! I caused the Narguuse man who was new on Alabama, the screaming constipated eggs.
I am an angry grass, the symposium square, proved fatal to the throbbing, the howling wind tire…

The really big repositories I've found (Project Gutenberg, for example) are formatted inconsistently enough that they're difficult to scrape.

So now I'm crowdsourcing a larger dataset: https://docs.google.com/forms/d/e/1FAIpQLScod8P-kcLX98u6gT0rX6-20GwkDo_glz-okVVkrhr6KgQONQ/viewform. This has been posted for about 36 hours and already has 3532 submissions (not all unique). People are welcome to contribute through this form - or let me know if you have a smarter way to contribute a dataset.
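Since the form submissions are not all unique, the list needs deduplicating before training. A minimal sketch of one way to do that, assuming the submissions are available as a plain list of strings; the function names and the normalization rules (lowercasing, collapsing whitespace, straightening curly quotes) are illustrative assumptions, not the process actually used here:

```python
import re

def normalize(line: str) -> str:
    """Build a comparison key: straighten curly quotes, collapse whitespace, lowercase."""
    line = line.replace("\u2019", "'").replace("\u201c", '"').replace("\u201d", '"')
    return re.sub(r"\s+", " ", line).strip().lower()

def unique_lines(submissions):
    """Keep the first occurrence of each distinct submission, preserving order."""
    seen = set()
    kept = []
    for line in submissions:
        key = normalize(line)
        if key and key not in seen:  # skip blanks and repeats
            seen.add(key)
            kept.append(line.strip())
    return kept
```

This only catches submissions that differ in case, spacing, or quote style; variant spellings or truncated versions of the same opening line would still count as distinct.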

At the end of the month, I'll try again with a hopefully much larger dataset, and post the results and dataset afterwards, as well as a link to whatever open-source package I end up using. It won't produce a full novel in the traditional sense, but I'll declare a moral victory if a human announces their admiration of one of the neural network's lines.

janelleshane · 2017-11-30T19:50:49Z

Marking this one complete! Big thanks to everyone who contributed to the dataset.

Writeup and highlights here: http://aiweirdness.com/post/168051907512/the-first-line-of-a-novel-by-an-improved-neural

I ended up using a syll-rnn (LSTM mode) to do the generation, which ran for about 16 hours on my MacBook. Syll-rnn seems to be better at larger datasets than char-rnn, yet can handle a larger vocabulary than word-rnn. Here's the framework I used:

https://github.com/learningtitans/torch-rnn/blob/valle-syllables/doc/flags.md#preprocessing

Sequence length was 40 syllables (based roughly on the number of syllables in "It is a truth universally acknowledged that a single man in possession of a good fortune must be in want of a wife.").
LSTM size was 512, with 3 layers (based on what would fit on my computer; I'm running a 1064-size LSTM now, but it's taking a long time and it's not clear that the results will be any better).
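To get a rough feel for how a sequence length in syllables relates to a sentence, here is a small sketch that estimates syllables by counting vowel groups. This heuristic is an assumption made purely for illustration: it is not the syllable splitter that the linked torch-rnn branch uses, and it miscounts silent vowels (for example the final "e" in "wife").

```python
import re

# The opening line quoted above, used as the yardstick for sequence length.
SENTENCE = ("It is a truth universally acknowledged that a single man in possession "
            "of a good fortune must be in want of a wife.")

def estimate_syllables(text: str) -> int:
    """Rough syllable estimate: one syllable per run of vowels, at least one per word."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(max(1, len(re.findall(r"[aeiouy]+", w))) for w in words)

if __name__ == "__main__":
    print(estimate_syllables(SENTENCE))
```

An estimate in the same ballpark as the chosen sequence length suggests that one training sequence spans about one full opening sentence of this size.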

140,000 words of output available here. Unfortunately, due to a prank in the input data that I didn’t catch till after I trained the neural network, 37,000 of them are the word “sand”.

https://github.com/janelleshane/novel-first-lines-dataset/blob/master/output_checkpoint10000_temp0p6.txt
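A quick word-frequency check on generated text (or on the input data) makes this kind of runaway repetition easy to spot. The sketch below is hypothetical: `word_share` is an illustrative helper, not part of any package mentioned in this thread, and it expects the text to be already loaded into a string.

```python
import re
from collections import Counter

def word_share(text: str, top: int = 5):
    """Return (word, count, fraction of all words) for the most frequent words."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    counts = Counter(words)
    return [(word, count, count / total) for word, count in counts.most_common(top)]

if __name__ == "__main__":
    sample = "the eternal sand sand sand"
    for word, count, share in word_share(sample):
        print(f"{word}: {count} ({share:.0%})")
```

By the figures given above, 37,000 of 140,000 words is roughly a quarter of the output, so a single word with a share that large would stand out immediately in such a listing.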

Crowdsourced dataset available here: https://github.com/janelleshane/novel-first-lines-dataset

hugovk · 2017-11-30T20:17:17Z

(We're using issues as a sort of forum, so I'll re-open this to make it easier to find.)

Good stuff!

> Unfortunately, due to a prank in the input data that I didn’t catch till after I trained the neural network, 37,000 of them are the word “sand”.

I think the eternal sand is quite appropriate for NaNoGenMo!

As a way at the ground, and the cat could have been in the town and a shock and the type on the back of the pilsage and belched and the color of the great little person who was still and the imface of the decoction of the heat between the box against the three interesting seament and the eternal sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand ...

janelleshane · 2017-11-30T20:35:34Z

Thanks for clearing that up! And for adding the completed tag!

Yes, eternal sand. People have been making Star Wars jokes at me all day.

janelleshane closed this as completed Nov 30, 2017

hugovk added the "completed" label (For completed novels!) Nov 30, 2017

hugovk reopened this Nov 30, 2017

hugovk added the "preview" label (There is an excerpt somewhere in the thread!) Nov 30, 2017

hugovk mentioned this issue Dec 11, 2017: Press and other coverage #2 (Open)

cpressey mentioned this issue Oct 18, 2018: Language survey 2017 #135 (Open)
