Swann's Way Through The Night Land #91

Open
VincentToups opened this issue Nov 12, 2014 · 15 comments

Comments

@VincentToups

My project, Swann's Way Through The Night Land, generates a novel by using Word2vec to construct vectors for all sentences in two public domain novels (The Night Land, by William Hope Hodgson, and Swann's Way, by Marcel Proust) and then replacing every sentence in the first with its closest match from the second.
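For readers who want to see the shape of that pipeline, here is a minimal Python sketch. It is not the project's code (which is Emacs Lisp driving Word2vec). The tiny `TOY_VECTORS` table, the helper names, and the use of cosine similarity as the "closest match" measure are all assumptions made for illustration. Sentence vectors are built by summing word vectors, as described later in the thread.

```python
import math

# Hypothetical 2-d "word vectors"; real word2vec vectors have hundreds of dimensions.
TOY_VECTORS = {
    "night": (1.0, 0.0),
    "dark": (0.9, 0.1),
    "land": (0.8, 0.3),
    "tea": (0.0, 1.0),
    "cake": (0.1, 0.9),
    "garden": (0.2, 0.8),
}


def sentence_vector(sentence):
    """Sum the vectors of the known words in a sentence; unknown words are skipped."""
    total = [0.0, 0.0]
    for word in sentence.lower().split():
        vec = TOY_VECTORS.get(word.strip(".,;!?"))
        if vec is not None:
            total[0] += vec[0]
            total[1] += vec[1]
    return total


def cosine(a, b):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def closest_sentence(target, candidates):
    """Return the candidate whose summed vector is most similar to the target's."""
    target_vec = sentence_vector(target)
    return max(candidates, key=lambda s: cosine(target_vec, sentence_vector(s)))


def rewrite(first_novel, second_novel):
    """Replace each sentence of the first text with its closest match from the second."""
    return [closest_sentence(s, second_novel) for s in first_novel]
```

Calling `rewrite(["The night land.", "Tea cake."], ["Dark night.", "Garden tea."])` pairs each sentence with the candidate that points in the most similar direction in the toy vector space.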

@VincentToups
Author

Read the generated novel here

@cpressey

Interesting. I was thinking of trying something similar using py-editdist.

I tried reading The Night Land a year ago, but only managed to get halfway through, as you will perceive. (Haven't read Swann's Way though.)

It's definitely weird trying to "see through" to the original story while trying to read this!

@VincentToups
Author

I've been thinking of a variety of ways to build edit distance metrics on top of sentences. It turns out these things are pretty similar to my dissertation work, which was on, in part, embedding neural spike trains in appropriate metric spaces for automatic clustering.

At present I produce the vector for a particular sentence by summing up the vectors for individual words, but this doesn't really capture the fact that sentences are often as much about how one gets to the meaning as they are about the meaning itself. An edit distance metric on the word2vec vectors would preserve information about both the meaning of the sentence and the path dependence. Not sure that it would improve the results here dramatically, though.
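The order-sensitivity point can be made concrete with a small Python sketch. This is one possible reading of "an edit distance metric on the word2vec vectors", not something the project implements. The toy vectors, the fixed `gap_cost` for insertions and deletions, and the Euclidean substitution cost are all assumptions.

```python
import math

# Hypothetical 2-d word vectors, for illustration only.
TOY_VECTORS = {"dog": (1.0, 0.0), "bites": (0.0, 1.0), "man": (0.7, 0.7)}


def euclid(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def summed(words):
    """Order-insensitive sentence vector: the sum of the word vectors."""
    return tuple(sum(TOY_VECTORS[w][k] for w in words) for k in range(2))


def vector_edit_distance(words_a, words_b, gap_cost=1.0):
    """Levenshtein-style distance whose substitution cost is the distance
    between the two words' vectors (assumed cost model)."""
    prev = [j * gap_cost for j in range(len(words_b) + 1)]
    for i, wa in enumerate(words_a, start=1):
        curr = [i * gap_cost]
        for j, wb in enumerate(words_b, start=1):
            substitute = prev[j - 1] + euclid(TOY_VECTORS[wa], TOY_VECTORS[wb])
            delete = prev[j] + gap_cost
            insert = curr[j - 1] + gap_cost
            curr.append(min(substitute, delete, insert))
        prev = curr
    return prev[-1]
```

With these definitions, "dog bites man" and "man bites dog" have the same summed vector, so the summing approach cannot tell them apart, while `vector_edit_distance` between the two word sequences is greater than zero.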

I am super busy this month, sadly, so I probably won't have time to really fiddle with this stuff.

@VincentToups
Author

Oh yeah, cpressey, try reading "The Night Land, A Story Retold" which is by James Stoddard. It is basically a rewrite of the original book to render it slightly more readable.

I also enthusiastically recommend the stories available here, some of which are absolutely great examples of science fiction, and all are set in The Night Land's setting.

Proust doesn't really need any recommendation, of course.

@cpressey

@VincentToups Excellent, thanks for the links.

@enkiv2

enkiv2 commented Nov 12, 2014

This one is really quite neat. It reads like a fairly oblique human-written
book. Are you doing edit distance based on the English translation of Du
côté de chez Swann, or the original French?


@MichaelPaulukonis

emacs lisp ?!!?!?!

(setq impressed t)

@VincentToups
Author

Word2vec did all the heavy lifting, but I still resent (good-naturedly)
the implication that Emacs Lisp isn't a real enough programming language.
Emacs has built-in support for efficient access and editing of large files
and passable tokenizers for words and sentences.

I started the project in Clojure, in fact, but changed when I realized how
much easier it would be in Emacs Lisp.

@VincentToups
Author

And @enkiv2, I'm just using the English version of Swann's Way from Project Gutenberg. Check out the resources directory in the repo. The text is in there.

@christiaanw

I couldn't directly see which novel you were retelling through the sentences of the other, so I Ctrl-F-ed for madeleine.

Could also have done that for Combray, or Swann, of course.

And it really gets me, because I'm trying to make sense of the gap in thought between these long sentences.

In a sense it reminds me of Say Anything, which also uses sentence similarity metrics, but to pick the next sentence in a partly user-generated story.

@VincentToups
Author

@christiaanw I think this approach definitely leaves things to be desired. I might try using the King James Bible, which is closer stylistically to The Night Land, and might produce more interesting results.

Novels are such complexly interdependent things that it's hard for any generative approach to capture that level of correlation while simultaneously evidencing a superficial "arc" of story and character development. I would guess that simulation would be a much more fruitful approach to novel generation. It is also more consistent with the idea of the novel as a "fictive dream," in which as much is revealed as is hidden.

I expect that a very simple set of simulation rules would be able to generate some basic interesting stories, but stretching that out to the length of a novel would probably stress the complexity of any reasonably sized simulation.

@cpressey

@VincentToups Encouraged by your results, I decided to go ahead with a similar replacement approach, except at the word level instead of the sentence level, and using the (much simpler) Levenshtein edit distance metric. Just to see what it would be like.

Replacing the words of "The Masque of the Red Death" with words from "Don Quixote" resulted in -- what else? -- "The Basque of the Red Death".
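A word-level replacement of this kind can be sketched in a few lines of Python. This is an illustrative reconstruction, not the code actually used, and the vocabulary passed in below is a hypothetical handful of words standing in for the full word list of "Don Quixote".

```python
def levenshtein(a, b):
    """Classic Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if the characters match
            ))
        prev = curr
    return prev[-1]


def closest_word(word, vocabulary):
    """Return the vocabulary word with the smallest edit distance (first wins ties)."""
    return min(vocabulary, key=lambda candidate: levenshtein(word, candidate))


def replace_words(text, vocabulary):
    """Replace every whitespace-separated word with its closest vocabulary word."""
    return " ".join(closest_word(w, vocabulary) for w in text.lower().split())
```

Because "masque" and "basque" differ by a single substitution, `closest_word("masque", ["basque", "knight", "windmill"])` picks "basque", which is how a title like the one above can arise.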

@VincentToups
Author

Woah! This is kind of amazing!

@MichaelPaulukonis

> I still resent (good naturedly) the implication that Emacs Lisp isn't a real enough programming language.

No implication was intended. I do all of my non-.NET work in Emacs.

While I've played with Emacs Lisp over the years, I've never gotten to the point where I could write anything serious with it. Much to my regret.

Now, JavaScript - that's something I can handle. AND I can use this knowledge at work. (:::sigh::: I've never worked anywhere that has another Emacs user.)

I'm curious to see if the new Guile Emacs will change the playing field for Emacs or Guile.

And then there is Elnode, which can even be run on Heroku. I've thought about setting up a Markov page with dissociated-press as the back end, since that's such a .... weird implementation. Once I saw that Jamie Zawinski referred to the source code as obscure and impossible to understand [paraphrase, I can't find the source again], I knew I never had a chance of understanding it myself.

@VincentToups
Author

Emacs Lisp is a Lisp, which means that for projects where you want something comfortable and easy to use, it's great. I'd be happy if Guile Emacs improved the performance of my Emacs Lisp code, but I don't really care to use Scheme in Emacs - Emacs Lisp is good enough and I have an enormous amount of code written in it already.

Re NaNoGenMo, I regenerated the novel with a vector set generated from the two books, which seems to have slightly improved the results. I also generated a comical output with a "source novel" containing just a few negative and a few positive sentences:

I

MIRDATH THE BEAUTIFUL

I felt good.

It was horrible.

It was great.

Then it was good.

She felt good.

He felt good.

Then it was horrible.

Then it was great.

It was good.

I felt bad.

Then it was bad.

I felt good.

It was horrible.

It was great.

Then it was good.

She felt good.

He felt good.

Sadly, it is still pretty much random. I have thought a lot about this but I don't think there is an easy solution.
