
Scaling up to 1000 genomes #15

Open
nikete opened this issue Apr 15, 2016 · 2 comments

Comments

@nikete
Collaborator

nikete commented Apr 15, 2016

We have two options here: cluster-style allreduce or Hogwild. We need back-of-the-envelope calculations to decide which of the two is best. Assuming the current machine uses a spinning disk, figure out how much faster Hogwild on an SSD would be for data about 250 times the training size of one robot set.
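A back-of-the-envelope estimate can be sketched in Python. It assumes training is I/O bound (each pass streams the whole dataset from disk), one allreduce per pass, and illustrative numbers: 150 MB/s for the spinning disk, 500 MB/s for the SSD, 100 MB/s network, a 2^24-weight model, 8 nodes, and a hypothetical 1 GB per robot set. All of these are placeholders to be replaced with measurements.

```python
import math

def io_pass_seconds(data_gb: float, read_mb_per_s: float, machines: int = 1) -> float:
    """Seconds for one I/O-bound pass when each machine streams a 1/machines shard."""
    shard_mb = data_gb * 1000 / machines
    return shard_mb / read_mb_per_s

def tree_allreduce_seconds(n_weights: int, machines: int, net_mb_per_s: float,
                           bytes_per_weight: int = 4) -> float:
    """Rough, unpipelined cost of one tree allreduce: the weight vector goes up
    and back down a binary tree of depth ceil(log2(machines))."""
    if machines <= 1:
        return 0.0
    depth = math.ceil(math.log2(machines))
    vector_mb = n_weights * bytes_per_weight / 1e6
    return 2 * depth * vector_mb / net_mb_per_s

# All numbers below are assumptions, not measurements.
ROBOT_SET_GB = 1.0            # hypothetical size of one robot set
DATA_GB = 250 * ROBOT_SET_GB  # the ~250x scale-up from the issue
HDD_MB_S, SSD_MB_S, NET_MB_S = 150.0, 500.0, 100.0
N_WEIGHTS, MACHINES = 2**24, 8

hdd_single = io_pass_seconds(DATA_GB, HDD_MB_S)
# Hogwild threads share one disk, so an I/O-bound pass is capped by SSD throughput.
ssd_hogwild = io_pass_seconds(DATA_GB, SSD_MB_S)
# Cluster: each machine reads its shard from a local HDD, then one allreduce per pass.
cluster = (io_pass_seconds(DATA_GB, HDD_MB_S, MACHINES)
           + tree_allreduce_seconds(N_WEIGHTS, MACHINES, NET_MB_S))

for name, secs in [("HDD, 1 machine", hdd_single),
                   ("SSD, Hogwild", ssd_hogwild),
                   (f"HDD, {MACHINES}-node allreduce", cluster)]:
    print(f"{name:28s} {secs / 60:6.1f} min per pass")
print(f"SSD Hogwild speedup over HDD: {hdd_single / ssd_hogwild:.2f}x")
```

Under these assumptions the SSD gain for Hogwild is just the disk-throughput ratio (about 3.3x), since extra threads cannot read faster than the one disk; sharding across machines instead scales read bandwidth with the node count, at the price of the allreduce term.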

@nikete
Collaborator Author

nikete commented May 5, 2016

On a very practical level, Hogwild has only been a pain to use. Given that the data only grows by 100X because of the lower depth, it seems we can learn a simple enough model in spanning-tree cluster mode, and Hogwild is not needed.
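Spanning-tree cluster mode combines the nodes' models with an allreduce over a tree. The following in-process Python simulation sketches that communication pattern under stated assumptions: a binary tree laid out as a heap, one reduce-then-broadcast round, and the reduced sum divided by the node count to average the weights (one common choice). It is not the cluster tool's actual implementation, and the node weights are made up.

```python
from typing import List

def tree_allreduce_average(local_weights: List[List[float]]) -> List[List[float]]:
    """Simulate spanning-tree allreduce over nodes laid out as a binary heap
    (node i's parent is (i - 1) // 2). Every node ends with the averaged weights."""
    n = len(local_weights)
    partial = [list(w) for w in local_weights]
    # Reduce phase: children have larger indices, so walking from the last node
    # to node 1 guarantees each subtree sum is complete before it is sent up.
    for i in range(n - 1, 0, -1):
        parent = (i - 1) // 2
        partial[parent] = [a + b for a, b in zip(partial[parent], partial[i])]
    # The root now holds the global sum; turn it into an average.
    averaged = [x / n for x in partial[0]]
    # Broadcast phase: each node copies the result from its parent.
    result: List[List[float]] = [averaged]
    for i in range(1, n):
        result.append(list(result[(i - 1) // 2]))
    return result

# Three hypothetical nodes, each with weights trained on its own shard.
nodes = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(tree_allreduce_average(nodes))  # [[3.0, 4.0], [3.0, 4.0], [3.0, 4.0]]
```

Each node only talks to its parent and children, so the number of sequential communication steps grows with the tree depth rather than with the node count.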

On a learning-theoretic note, how to incorporate the data from the 1000 Genomes samples remains an important open issue. The easiest thing is to do initial passes of learning on them and then adjust the weights with a few last passes on the 50X data. This seems unlikely to lead to much improvement, because the differing feature levels across the two representations are likely to wash out any learning that could be transferred. Even with clipping of the number of candidate alignments, we do not at the moment have a good normalization strategy to go from depth 7 to depth 50.
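The "easiest thing" above (initial passes on the low-depth data, then a few last passes on the 50X data) amounts to warm-starting one model and fine-tuning it. A minimal sketch with plain logistic-regression SGD follows; the two features, the toy examples, the learning rates, and the pass counts are all hypothetical, chosen only to show how a read-count feature lives on a different scale at each depth.

```python
import math
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], int]

def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def sgd_passes(w: List[float], data: List[Example], passes: int, lr: float) -> List[float]:
    """Logistic-regression SGD starting from w; returns new weights (w is not modified)."""
    w = list(w)
    for _ in range(passes):
        for x, y in data:
            g = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
            for j, xj in enumerate(x):
                w[j] -= lr * g * xj
    return w

# Hypothetical features: [bias, reads supporting the candidate variant].
# Read counts scale with depth, so similar sites look very different at depth 7 and 50X.
low_depth = [([1.0, 4.0], 1), ([1.0, 3.0], 1), ([1.0, 1.0], 0), ([1.0, 0.0], 0)]
high_depth = [([1.0, 25.0], 1), ([1.0, 22.0], 1), ([1.0, 3.0], 0), ([1.0, 1.0], 0)]

w_pre = sgd_passes([0.0, 0.0], low_depth, passes=20, lr=0.1)  # initial passes, low depth
w_ft = sgd_passes(w_pre, high_depth, passes=5, lr=0.01)       # a few last passes, 50X
print("pretrained:", w_pre, "fine-tuned:", w_ft)
```

Note that nothing here normalizes the count feature between depths; the fine-tuning passes are left to absorb the scale change, which is exactly the weakness the comment points out.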

@nikete
Collaborator Author

nikete commented May 5, 2016

A more principled approach is to calibrate using the data from the individual that we have in both the 1000G sample and the 50X-depth truth-set sample. The simplest thing that could work is the single-feature approach for the non-structured case described in http://web.stanford.edu/~kuleshov/papers/nips2015.pdf
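Calibration on the shared individual can be sketched with a simple single-feature recalibrator: bin the low-depth model's confidence score and replace it with the fraction of calls in that bin that the 50X truth set confirms. Histogram binning is a stand-in chosen here for brevity; the recalibration method in the linked paper may differ, and the scores, labels, and 10-bin choice are hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple

def fit_binning_calibrator(scores: Sequence[float], labels: Sequence[int],
                           n_bins: int = 10) -> Tuple[List[float], List[float]]:
    """Histogram binning on a single feature (a confidence score in [0, 1]).
    Returns interior bin edges and the empirical positive rate per bin."""
    edges = [i / n_bins for i in range(1, n_bins)]
    hits = [0] * n_bins
    counts = [0] * n_bins
    for s, y in zip(scores, labels):
        b = bisect_right(edges, s)
        counts[b] += 1
        hits[b] += y
    # Empty bins fall back to the bin midpoint, i.e. roughly trust the raw score.
    rates = [hits[b] / counts[b] if counts[b] else (b + 0.5) / n_bins
             for b in range(n_bins)]
    return edges, rates

def calibrate(score: float, edges: List[float], rates: List[float]) -> float:
    """Map a raw score to the calibrated probability of its bin."""
    return rates[bisect_right(edges, score)]

# Hypothetical data for the shared individual: scores from low-depth (1000G) calls,
# labels saying whether the 50X truth set confirms each call.
scores = [0.9, 0.85, 0.95, 0.2, 0.15, 0.92]
labels = [1, 0, 1, 0, 0, 1]
edges, rates = fit_binning_calibrator(scores, labels)
print(calibrate(0.93, edges, rates), calibrate(0.55, edges, rates))
```

The calibrator would be fit only on the overlapping individual and then applied to scores from the rest of the 1000G samples, on the assumption that their score distribution behaves similarly.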
