-
Notifications
You must be signed in to change notification settings - Fork 233
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Automatic Tree Generation Part 4 #1486
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1486 +/- ##
==========================================
- Coverage 41.85% 41.53% -0.33%
==========================================
Files 177 176 -1
Lines 29490 29833 +343
Branches 6097 6191 +94
==========================================
+ Hits 12344 12390 +46
- Misses 16266 16534 +268
- Partials 880 909 +29
Continue to review full report at Codecov.
|
f2ee821
to
9df5781
Compare
8bc4271
to
583acfc
Compare
Okay, this is ready for review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've finished an initial review. I plan on reviewing in more detail once you've removed the whitespace changes from the commits.
rmgpy/kinetics/arrhenius.pyx
Outdated
while boo: | ||
boo = False | ||
try: | ||
params = curve_fit(kfcn,xdata,ydata,sigma=sigmas,p0=[1.0,1.0,w0/10.0],xtol=xtol,ftol=ftol) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This section seems to be indented with two spaces. Also, could you add spaces after the commas and possibly add a line break in the call to curve_fit if it ends up too long after adding spaces.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
rmgpy/kinetics/uncertainties.pyx
Outdated
def __get__(self): | ||
return self._mu | ||
def __set__(self, value): | ||
self._mu = value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you're not doing anything to the value, why do you need to make it a property, rather than a normal attribute?
rmgpy/kinetics/uncertainties.pxd
Outdated
|
||
cdef public double _Tref | ||
cdef public double _mu | ||
cdef public double _sigma |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sigma
doesn't seem to exist in the .pyx
file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
# DEALINGS IN THE SOFTWARE. # | ||
# # | ||
############################################################################### | ||
import numpy as np |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it might be more efficient to cimport numpy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't seem to work here, the import works, but the imported module doesn't have basic things like sqrt or pi.
rmgpy/molecule/graph.pyx
Outdated
@@ -392,6 +392,19 @@ cdef class Graph(object): | |||
vertex.edges = edges | |||
|
|||
return new | |||
|
|||
cpdef _merge(self, Graph other): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The point of underscored methods is that they are not intended to be used outside of the module. Naming it this way also does not help convey the difference compared to the main merge
and split
methods at all.
I think the ideal approach would be to add an argument to merge
and split
indicating whether or not to copy the graph. If that would be too difficult to implement, then the method names should be more reflective of the difference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC, these methods were to get around the fact that subgraph isomorphism did not previously work on disconnected graphs. Since Richard fixed that, do you still need these methods?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't seem to be used anymore, so I've removed it.
rmgpy/molecule/group.py
Outdated
if atom.label == label: return atom | ||
raise ValueError('No atom in the functional group has the label "{0}".'.format(label)) | ||
if atom.label == label: | ||
alist.append(atom) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could use a list comprehension: alist = [atom for atom in self.verticies if atom.label == label]
Also, you should update the docstring to indicate that this will always return a list. It would be nice to also rename the method to getLabeledAtoms
, but it's not necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
rmgpy/molecule/molecule.py
Outdated
if atom.label == label: return atom | ||
raise ValueError('No atom in the molecule has the label "{0}".'.format(label)) | ||
if atom.label == label: | ||
alist.append(atom) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As with the Group method, this could also be a list comprehension.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
rmgpy/molecule/molecule.py
Outdated
# Do the isomorphism comparison | ||
result = Graph.findSubgraphIsomorphisms(self, other, initialMap, saveOrder=saveOrder) | ||
return result | ||
return Graph.findSubgraphIsomorphisms(self,group,initialMap,saveOrder=saveOrder) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You removed spaces after commas 🙁
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
rmgpy/rmg/main.py
Outdated
@@ -386,13 +386,15 @@ def loadDatabase(self): | |||
solvent=self.solvent | |||
|
|||
if self.kineticsEstimator == 'rate rules': | |||
autoGeneratedTrees = ['r_recombination'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you make autoGenerated
an attribute of KineticsFamily (and saved in the groups file) to avoid hard-coding?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
87ef826
to
4970bef
Compare
skip map possibilities that map multiple graph atoms to the same subgraph atom
note this does not protect against merging two graphs with atoms that are the same reference
and adapt associated test
using this option makes getReactionMatches more robust
don't check accessibility as all nodes are guaranteed to be accessible in a generated tree and R_Recombination has issues with sample molecule generation
skip tree generation related lines when reading chemkin file add generate tree statement indicators
In some cases the symmetry of a group can cause what in most cases stays a regularization dimension to stop being a regularization dimension This can lead to nodes that are splitable looking like they aren't splitable in terms of regularization dimensions This commit clears the regularization dimensions in this case causing it to recompute them and be able to split the node
…e generation Breaks reactions into batches based on a modified stratified sampling scheme Effectively: The top and bottom outlierFraction of all reactions are always included in the first batch The remaining reactions are ordered by the rate coefficients at T The list of reactions is then split into stratumNum similarly sized intervals batches sample equally from each interval, but randomly within each interval until they reach maxBatchSize reactions A list of lists of reactions containing the batches is returned
…s from the tree before reoptimizing with an additional batch Remove nodes that have less than maxRxnToReoptNode reactions that match and clear the regularization dimensions of their parent This is used to remove smaller easier to optimize and more likely to change nodes before adding a new batch in cascade model generation
When the number of reactions is greater than maxBatchSize tree generation switches to the faster Cascade algorithm maxBatchSize is the maximum number of reactions in a batch of the cascade algorithm outlierFraction is the fraction of reactions that are fastest and slowest that are forced to be included in the first batch of the cascade algorithm stratumNum is the number of strata in the stratified sampling scheme used to construct the Cascade algorithm batches maxRxnsToReoptNode is the maximum number of matching reactions at which the Cascade algorithm will prune the node and reoptimize it in the next batch
During node generation the nodes matching the most reactions are created first this means that a naive parallelization will hand all of the most computationally expensive fits to the same process this commit shuffles them randomly so hard fits are spread evenly between processors and then reorders them after processing
One issue with the Cascade algorithm is that information about the regularization dimensions of nodes that match lots of reactions is prohibitively expensive to compute this commit tests each "guessed" regularization dimension after it is applied to check whether it actually is a regularization dimension, if it isn't that particular regularization is reversed and ignored
Ok, I made the whitespace changes and removed the travis and deploy commits. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will merge after tests pass.
This adds parallelization of tree generation, integration with RMG kinetics estimation and on-the-fly uncertainty calculation.