Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automatic Tree Generation Part 4 #1486

Merged
merged 64 commits into from
Aug 7, 2019
Merged

Automatic Tree Generation Part 4 #1486

merged 64 commits into from
Aug 7, 2019

Conversation

mjohnson541
Copy link
Contributor

This adds parallelization of tree generation, integration with RMG kinetics estimation and on-the-fly uncertainty calculation.

@codecov
Copy link

codecov bot commented Jan 20, 2019

Codecov Report

Merging #1486 into master will decrease coverage by 0.32%.
The diff coverage is 31.84%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1486      +/-   ##
==========================================
- Coverage   41.85%   41.53%   -0.33%     
==========================================
  Files         177      176       -1     
  Lines       29490    29833     +343     
  Branches     6097     6191      +94     
==========================================
+ Hits        12344    12390      +46     
- Misses      16266    16534     +268     
- Partials      880      909      +29
Impacted Files Coverage Δ
...atabase/kinetics/families/R_Recombination/rules.py 100% <ø> (ø) ⬆️
...abase/kinetics/families/intra_H_migration/rules.py 100% <ø> (ø) ⬆️
rmgpy/data/kinetics/library.py 43.2% <ø> (-0.25%) ⬇️
rmgpy/molecule/molecule.py 0% <0%> (ø) ⬆️
rmgpy/molecule/group.py 0% <0%> (ø) ⬆️
rmgpy/data/kinetics/database.py 49.37% <0%> (ø) ⬆️
rmgpy/molecule/element.py 0% <0%> (ø) ⬆️
rmgpy/reaction.py 0% <0%> (ø) ⬆️
rmgpy/rmg/main.py 23.07% <0%> (-0.04%) ⬇️
rmgpy/data/kinetics/common.py 69.19% <100%> (-1.79%) ⬇️
... and 9 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f0131fe...acbbd8e. Read the comment docs.

@mjohnson541
Copy link
Contributor Author

This now removes hard coding for R_Recombination and uses a generated tree for R_Recombination:
index

@mjohnson541
Copy link
Contributor Author

Okay, this is ready for review.

Copy link
Contributor

@mliu49 mliu49 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've finished an initial review. I plan on reviewing in more detail once you've removed the whitespace changes from the commits.

while boo:
boo = False
try:
params = curve_fit(kfcn,xdata,ydata,sigma=sigmas,p0=[1.0,1.0,w0/10.0],xtol=xtol,ftol=ftol)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section seems to be indented with two spaces. Also, could you add spaces after the commas and possibly add a line break in the call to curve_fit if it ends up too long after adding spaces.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

def __get__(self):
return self._mu
def __set__(self, value):
self._mu = value
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're not doing anything to the value, why do you need to make it a property, rather than a normal attribute?


cdef public double _Tref
cdef public double _mu
cdef public double _sigma
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sigma doesn't seem to exist in the .pyx file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

# DEALINGS IN THE SOFTWARE. #
# #
###############################################################################
import numpy as np
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be more efficient to cimport numpy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't seem to work here, the import works, but the imported module doesn't have basic things like sqrt or pi.

@@ -392,6 +392,19 @@ cdef class Graph(object):
vertex.edges = edges

return new

cpdef _merge(self, Graph other):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point of underscored methods is that they are not intended to be used outside of the module. Naming it this way also does not help convey the difference compared to the main merge and split methods at all.

I think the ideal approach would be to add an argument to merge and split indicating whether or not to copy the graph. If that would be too difficult to implement, then the method names should be more reflective of the difference.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC, these methods were to get around the fact that subgraph isomorphism did not previously work on disconnected graphs. Since Richard fixed that, do you still need these methods?

Copy link
Contributor Author

@mjohnson541 mjohnson541 Jul 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't seem to be used anymore, so I've removed it.

if atom.label == label: return atom
raise ValueError('No atom in the functional group has the label "{0}".'.format(label))
if atom.label == label:
alist.append(atom)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could use a list comprehension: alist = [atom for atom in self.verticies if atom.label == label]

Also, you should update the docstring to indicate that this will always return a list. It would be nice to also rename the method to getLabeledAtoms, but it's not necessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

if atom.label == label: return atom
raise ValueError('No atom in the molecule has the label "{0}".'.format(label))
if atom.label == label:
alist.append(atom)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As with the Group method, this could also be a list comprehension.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

# Do the isomorphism comparison
result = Graph.findSubgraphIsomorphisms(self, other, initialMap, saveOrder=saveOrder)
return result
return Graph.findSubgraphIsomorphisms(self,group,initialMap,saveOrder=saveOrder)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You removed spaces after commas 🙁

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -386,13 +386,15 @@ def loadDatabase(self):
solvent=self.solvent

if self.kineticsEstimator == 'rate rules':
autoGeneratedTrees = ['r_recombination']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you make autoGenerated an attribute of KineticsFamily (and saved in the groups file) to avoid hard-coding?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@mjohnson541 mjohnson541 force-pushed the atg4 branch 4 times, most recently from 87ef826 to 4970bef Compare July 24, 2019 15:14
@mjohnson541 mjohnson541 added the Before Py3 Should be merged before Python 3 transition label Aug 1, 2019
skip map possibilities that map multiple graph atoms to the same subgraph atom
note this does not protect against merging two graphs with atoms that are the same reference
and adapt associated test
using this option makes getReactionMatches more robust
don't check accessibility as all nodes are guaranteed to be accessible in a generated tree
and R_Recombination has issues with sample molecule generation
skip tree generation related lines when reading chemkin file

add generate tree statement indicators
In some cases the symmetry of a group can cause what in most cases stays a regularization dimension to stop being a regularization dimension

This can lead to nodes that are splitable looking like they aren't splitable in terms of regularization dimensions

This commit clears the regularization dimensions in this case causing it to recompute them and be able to split the node
…e generation

Breaks reactions into batches based on a modified stratified sampling scheme
Effectively:
The top and bottom outlierFraction of all reactions are always included in the first batch
The remaining reactions are ordered by the rate coefficients at T
The list of reactions is then split into stratumNum similarly sized intervals
batches sample equally from each interval, but randomly within each interval
until they reach maxBatchSize reactions
A list of lists of reactions containing the batches is returned
…s from the tree before reoptimizing with an additional batch

Remove nodes that have less than maxRxnToReoptNode reactions that match
and clear the regularization dimensions of their parent
This is used to remove smaller easier to optimize and more likely to change nodes
before adding a new batch in cascade model generation
When the number of reactions is greater than maxBatchSize tree generation switches to the faster Cascade algorithm
maxBatchSize is the maximum number of reactions in a batch of the cascade algorithm
outlierFraction is the fraction of reactions that are fastest and slowest that are forced to be included in the first batch of
the cascade algorithm
stratumNum is the number of strata in the stratified sampling scheme used to construct the Cascade algorithm batches
maxRxnsToReoptNode is the maximum number of matching reactions at which the Cascade algorithm will prune the node and
reoptimize it in the next batch
During node generation the nodes matching the most reactions are created first
this means that a naive parallelization will hand all of the most computationally expensive fits to the same process
this commit shuffles them randomly so hard fits are spread evenly between processors and then reorders them after processing
One issue with the Cascade algorithm is that information
about the regularization dimensions of nodes that match lots of reactions
is prohibitively expensive to compute

this commit tests each "guessed" regularization dimension after it is applied to check
whether it actually is a regularization dimension, if it isn't that particular regularization is reversed and ignored
@mjohnson541
Copy link
Contributor Author

Ok, I made the whitespace changes and removed the travis and deploy commits.

Copy link
Contributor

@mliu49 mliu49 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will merge after tests pass.

@mliu49 mliu49 merged commit 5f95c4d into master Aug 7, 2019
@mliu49 mliu49 deleted the atg4 branch August 7, 2019 03:47
@mliu49 mliu49 mentioned this pull request Dec 16, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Before Py3 Should be merged before Python 3 transition Complexity: Medium Status: Ready for Review PR is complete and ready to be reviewed Topic: Kinetics Twin RMG-database PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants