Automatic Tree Generation Part 4 #1486

mjohnson541 · 2018-10-12T18:37:33Z

This adds parallelization of tree generation, integration with RMG kinetics estimation and on-the-fly uncertainty calculation.

codecov · 2019-01-20T22:08:21Z

Codecov Report

Merging #1486 into master will decrease coverage by 0.32%.
The diff coverage is 31.84%.

@@            Coverage Diff             @@
##           master    #1486      +/-   ##
==========================================
- Coverage   41.85%   41.53%   -0.33%     
==========================================
  Files         177      176       -1     
  Lines       29490    29833     +343     
  Branches     6097     6191      +94     
==========================================
+ Hits        12344    12390      +46     
- Misses      16266    16534     +268     
- Partials      880      909      +29

Impacted Files	Coverage Δ
...atabase/kinetics/families/R_Recombination/rules.py	`100% <ø> (ø)`	⬆️
...abase/kinetics/families/intra_H_migration/rules.py	`100% <ø> (ø)`	⬆️
rmgpy/data/kinetics/library.py	`43.2% <ø> (-0.25%)`	⬇️
rmgpy/molecule/molecule.py	`0% <0%> (ø)`	⬆️
rmgpy/molecule/group.py	`0% <0%> (ø)`	⬆️
rmgpy/data/kinetics/database.py	`49.37% <0%> (ø)`	⬆️
rmgpy/molecule/element.py	`0% <0%> (ø)`	⬆️
rmgpy/reaction.py	`0% <0%> (ø)`	⬆️
rmgpy/rmg/main.py	`23.07% <0%> (-0.04%)`	⬇️
rmgpy/data/kinetics/common.py	`69.19% <100%> (-1.79%)`	⬇️
... and 9 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update f0131fe...acbbd8e.

mjohnson541 · 2019-06-03T18:26:43Z

This now removes the hard-coding for R_Recombination and uses a generated tree for that family.

mjohnson541 · 2019-07-12T21:23:50Z

Okay, this is ready for review.

mliu49

I've finished an initial review. I plan on reviewing in more detail once you've removed the whitespace changes from the commits.

mliu49 · 2019-05-23T15:23:40Z

rmgpy/kinetics/arrhenius.pyx

+        while boo:
+          boo = False
+          try:
+            params = curve_fit(kfcn,xdata,ydata,sigma=sigmas,p0=[1.0,1.0,w0/10.0],xtol=xtol,ftol=ftol)


This section seems to be indented with two spaces. Also, could you add spaces after the commas, and possibly add a line break in the call to curve_fit if it ends up too long after adding spaces?

mliu49 · 2019-05-23T15:28:10Z

rmgpy/kinetics/uncertainties.pyx

+      def __get__(self):
+          return self._mu
+      def __set__(self, value):
+          self._mu = value


If you're not doing anything to the value, why do you need to make it a property, rather than a normal attribute?

mliu49 · 2019-05-23T15:30:13Z

rmgpy/kinetics/uncertainties.pxd

+
+    cdef public double _Tref
+    cdef public double _mu
+    cdef public double _sigma


sigma doesn't seem to exist in the .pyx file.

mliu49 · 2019-05-23T15:32:17Z

rmgpy/kinetics/uncertainties.pyx

+# DEALINGS IN THE SOFTWARE.                                                   #
+#                                                                             #
+###############################################################################
+import numpy as np


I think it might be more efficient to cimport numpy.

That doesn't seem to work here. The import works, but the imported module doesn't have basic things like sqrt or pi.

mliu49 · 2019-05-23T15:38:11Z

rmgpy/molecule/graph.pyx

@@ -392,6 +392,19 @@ cdef class Graph(object):
            vertex.edges = edges

        return new
+
+    cpdef _merge(self, Graph other):


The point of underscored methods is that they are not intended to be used outside of the module. Naming it this way also does not help convey the difference compared to the main merge and split methods at all.

I think the ideal approach would be to add an argument to merge and split indicating whether or not to copy the graph. If that would be too difficult to implement, then the method names should be more reflective of the difference.

IIRC, these methods were to get around the fact that subgraph isomorphism did not previously work on disconnected graphs. Since Richard fixed that, do you still need these methods?

Doesn't seem to be used anymore, so I've removed it.

mliu49 · 2019-07-16T15:42:10Z

rmgpy/molecule/group.py

-            if atom.label == label: return atom
-        raise ValueError('No atom in the functional group has the label "{0}".'.format(label))
+            if atom.label == label:
+                alist.append(atom)


You could use a list comprehension: alist = [atom for atom in self.vertices if atom.label == label]

Also, you should update the docstring to indicate that this will always return a list. It would be nice to also rename the method to getLabeledAtoms, but it's not necessary.
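The suggestion above can be sketched as follows. This is only an illustration: `Atom` and `Group` are simplified stand-ins rather than the rmgpy classes, `get_labeled_atoms` is a hypothetical name following the proposed rename, and raising `ValueError` when nothing matches is an assumption, since the diff excerpt does not show what happens in that case.

```python
class Atom:
    """Stand-in for an atom that only carries a label (not the rmgpy class)."""

    def __init__(self, label=''):
        self.label = label


class Group:
    """Stand-in for a group whose atoms are stored in ``vertices``."""

    def __init__(self, vertices):
        self.vertices = vertices

    def get_labeled_atoms(self, label):
        """
        Return a list of all atoms in the group carrying `label`.

        The return value is always a list, even for a single match.
        Raising ValueError for no match is an assumption of this sketch.
        """
        alist = [atom for atom in self.vertices if atom.label == label]
        if not alist:
            raise ValueError(
                'No atom in the functional group has the label "{0}".'.format(label))
        return alist


# Two atoms share the label '*1', so both are returned.
group = Group([Atom('*1'), Atom('*2'), Atom('*1'), Atom()])
matches = group.get_labeled_atoms('*1')
```

Callers that previously expected a single atom would need to index the returned list or iterate over it.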

mliu49 · 2019-07-16T15:52:19Z

rmgpy/molecule/molecule.py

-            if atom.label == label: return atom
-        raise ValueError('No atom in the molecule has the label "{0}".'.format(label))
+            if atom.label == label:
+                alist.append(atom)


As with the Group method, this could also be a list comprehension.

mliu49 · 2019-07-16T15:55:58Z

rmgpy/molecule/molecule.py

-        # Do the isomorphism comparison
-        result = Graph.findSubgraphIsomorphisms(self, other, initialMap, saveOrder=saveOrder)
-        return result
+        return Graph.findSubgraphIsomorphisms(self,group,initialMap,saveOrder=saveOrder)


You removed spaces after commas 🙁

mliu49 · 2019-07-16T15:58:53Z

rmgpy/rmg/main.py

@@ -386,13 +386,15 @@ def loadDatabase(self):
            solvent=self.solvent

        if self.kineticsEstimator == 'rate rules':
+            autoGeneratedTrees = ['r_recombination']


Could you make autoGenerated an attribute of KineticsFamily (and saved in the groups file) to avoid hard-coding?
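A rough sketch of the idea, under stated assumptions: the class below is a stand-in and not the real `KineticsFamily`, the attribute name `auto_generated` and the helper `split_by_tree_type` are hypothetical, and how the flag would be stored in the groups file is not shown.

```python
class KineticsFamily:
    """Stand-in for a kinetics family; only the fields needed for this sketch."""

    def __init__(self, label, auto_generated=False):
        self.label = label
        # Hypothetical flag that would be read from the family's groups file
        self.auto_generated = auto_generated


def split_by_tree_type(families):
    """Return (hand_built, generated) family labels based on the flag."""
    hand_built = [f.label for f in families if not f.auto_generated]
    generated = [f.label for f in families if f.auto_generated]
    return hand_built, generated


families = [
    KineticsFamily('intra_H_migration'),
    KineticsFamily('R_Recombination', auto_generated=True),
]
hand_built, generated = split_by_tree_type(families)
```

With the flag stored on the family, the loading code no longer needs a hard-coded list of family names.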

In some cases the symmetry of a group can cause what usually stays a regularization dimension to stop being one. This can make nodes that are splittable look as if they aren't splittable in terms of regularization dimensions. This commit clears the regularization dimensions in this case, so that they are recomputed and the node can be split.

…e generation

Breaks reactions into batches based on a modified stratified sampling scheme. Effectively: the top and bottom outlierFraction of all reactions are always included in the first batch. The remaining reactions are ordered by their rate coefficients at T. The list of reactions is then split into stratumNum similarly sized intervals. Batches sample equally from each interval, but randomly within each interval, until they reach maxBatchSize reactions. A list of lists of reactions containing the batches is returned.
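The batching scheme described in this commit message can be sketched roughly as below. This is not the code from the PR: the function name, the `rate_at_t` callable, the `seed` argument, and the round-robin draw used to "sample equally" from each stratum are all assumptions made for illustration.

```python
import random


def split_into_batches(rxns, rate_at_t, outlier_fraction, stratum_num,
                       max_batch_size, seed=0):
    """Split rxns into batches by a stratified sampling scheme (sketch)."""
    if not 0 <= outlier_fraction < 0.5 or stratum_num < 1:
        raise ValueError('need 0 <= outlier_fraction < 0.5 and stratum_num >= 1')
    rng = random.Random(seed)
    ordered = sorted(rxns, key=rate_at_t)

    # The slowest and fastest reactions always go into the first batch.
    n_out = int(outlier_fraction * len(ordered))
    if n_out > 0:
        outliers = ordered[:n_out] + ordered[-n_out:]
        middle = ordered[n_out:-n_out]
    else:
        outliers, middle = [], ordered

    # Split the remaining, rate-ordered reactions into similarly sized strata.
    size, extra = divmod(len(middle), stratum_num)
    strata, start = [], 0
    for i in range(stratum_num):
        end = start + size + (1 if i < extra else 0)
        stratum = middle[start:end]
        rng.shuffle(stratum)  # random order within each stratum
        strata.append(stratum)
        start = end

    # Draw one reaction per stratum in turn; start a new batch when full.
    batches, batch = [], list(outliers)
    while any(strata):
        for stratum in strata:
            if stratum:
                if len(batch) >= max_batch_size:
                    batches.append(batch)
                    batch = []
                batch.append(stratum.pop())
    if batch:
        batches.append(batch)
    return batches


# 40 toy "reactions" whose rate is just their own value.
batches = split_into_batches(list(range(40)), float, 0.25, 4, 24)
```

In this toy call the 20 outliers fill most of the first batch, and the remaining slots are filled with one reaction from each of the four strata.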

…s from the tree before reoptimizing with an additional batch

Remove nodes that have fewer than maxRxnsToReoptNode matching reactions and clear the regularization dimensions of their parent. This is used to remove smaller, easier-to-optimize, and more-likely-to-change nodes before adding a new batch in cascade model generation.
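A minimal sketch of this pruning step, assuming a toy tree representation. The `Node` class, its `n_rxns` and `reg_dims` fields, and the `prune_tree` function below are hypothetical and are not the data structures used in the PR.

```python
class Node:
    """Toy tree node: a name, a count of matching reactions, and children."""

    def __init__(self, name, n_rxns, parent=None):
        self.name = name
        self.n_rxns = n_rxns
        self.parent = parent
        self.children = []
        self.reg_dims = ['placeholder']  # stands in for regularization info
        if parent is not None:
            parent.children.append(self)


def prune_tree(root, max_rxns_to_reopt_node):
    """
    Detach every node matching fewer reactions than the threshold, together
    with its subtree, and clear the regularization dimensions of its parent.
    Returns the names of the detached subtree roots.
    """
    removed = []
    stack = [root]
    while stack:
        node = stack.pop()
        keep = []
        for child in node.children:
            if child.n_rxns < max_rxns_to_reopt_node:
                removed.append(child.name)
                child.parent = None
            else:
                keep.append(child)
        if len(keep) != len(node.children):
            node.reg_dims = []  # parent must recompute these later
            node.children = keep
        stack.extend(keep)
    return removed


root = Node('Root', 1000)
a = Node('A', 500, root)
a1 = Node('A1', 50, a)
a2 = Node('A2', 450, a)
b = Node('B', 80, root)
b1 = Node('B1', 30, b)
removed = prune_tree(root, 100)
```

Because a child can only match a subset of its parent's reactions, detaching a small node together with its subtree is consistent with the threshold.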

When the number of reactions is greater than maxBatchSize, tree generation switches to the faster Cascade algorithm. maxBatchSize is the maximum number of reactions in a batch of the Cascade algorithm. outlierFraction is the fraction of the fastest and slowest reactions that are forced to be included in the first batch of the Cascade algorithm. stratumNum is the number of strata in the stratified sampling scheme used to construct the Cascade algorithm batches. maxRxnsToReoptNode is the maximum number of matching reactions at which the Cascade algorithm will prune the node and reoptimize it in the next batch.

During node generation the nodes matching the most reactions are created first. This means that a naive parallelization will hand all of the most computationally expensive fits to the same process. This commit shuffles them randomly so hard fits are spread evenly between processors, and then reorders them after processing.
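The shuffle-then-restore pattern described here can be sketched as follows. The helper name `shuffled_map` and its `mapper` and `seed` parameters are assumptions for illustration. The built-in `map` is used as the default so the example runs anywhere; with a process pool, the pool's own `map` method would be passed in instead.

```python
import random


def shuffled_map(func, items, mapper=map, seed=0):
    """
    Apply func to items in a random order, then return the results in the
    original order. Shuffling spreads expensive items across the chunks
    that a parallel mapper hands to each worker.
    """
    order = list(range(len(items)))
    random.Random(seed).shuffle(order)

    # Process the items in shuffled order.
    shuffled_items = [items[i] for i in order]
    shuffled_results = list(mapper(func, shuffled_items))

    # Put each result back at the position of its original item.
    results = [None] * len(items)
    for position, result in zip(order, shuffled_results):
        results[position] = result
    return results


def square(x):
    return x * x


results = shuffled_map(square, [5, 3, 8, 1])
```

The returned list lines up with the input list regardless of the order in which the items were processed.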

One issue with the Cascade algorithm is that information about the regularization dimensions of nodes that match lots of reactions is prohibitively expensive to compute. This commit tests each "guessed" regularization dimension after it is applied to check whether it actually is a regularization dimension. If it isn't, that particular regularization is reversed and ignored.

mjohnson541 · 2019-08-07T02:54:52Z

Ok, I made the whitespace changes and removed the travis and deploy commits.

mliu49

Will merge after tests pass.

mjohnson541 force-pushed the atg4 branch from 12bc1d9 to 089eacc Compare January 20, 2019 22:08

mjohnson541 force-pushed the atg4 branch from 089eacc to 81e9f74 Compare January 20, 2019 23:31

mjohnson541 mentioned this pull request Jan 31, 2019

Automatic Tree Generation Part 3 #1461

Merged

mjohnson541 force-pushed the atg4 branch from daeb948 to 1173ac9 Compare April 11, 2019 20:04

mjohnson541 force-pushed the atg4 branch from 1173ac9 to e1799e6 Compare May 1, 2019 18:56

mjohnson541 force-pushed the atg4 branch 2 times, most recently from f2ee821 to 9df5781 Compare May 20, 2019 19:51

mjohnson541 requested a review from mliu49 May 20, 2019 19:53

mjohnson541 added the Complexity: Medium, Status: Ready for Review, and Topic: Kinetics labels May 22, 2019

mjohnson541 force-pushed the atg4 branch from 9df5781 to 2a95bca Compare May 23, 2019 17:13

mjohnson541 force-pushed the atg4 branch from 239cc42 to 46ec9b2 Compare June 3, 2019 18:24

mjohnson541 added the Twin RMG-database PR label Jun 3, 2019

mjohnson541 force-pushed the atg4 branch from 46ec9b2 to 17ed74d Compare June 3, 2019 18:35

mjohnson541 self-assigned this Jun 3, 2019

mjohnson541 force-pushed the atg4 branch 4 times, most recently from 8bc4271 to 583acfc Compare June 8, 2019 13:40

mjohnson541 force-pushed the atg4 branch from 277e816 to 91f0c10 Compare July 12, 2019 20:05

mliu49 reviewed Jul 16, 2019

mjohnson541 force-pushed the atg4 branch 4 times, most recently from 87ef826 to 4970bef Compare July 24, 2019 15:14

mjohnson541 added the Before Py3 label (should be merged before the Python 3 transition) Aug 1, 2019

mjohnson541 added 23 commits August 6, 2019 22:53

fix bugs with subgraph isomorphism involving duplicate labels

688dd26

skip map possibilities that map multiple graph atoms to the same subgraph atom

fix bug related to merging a graph with itself

840d796

note this does not protect against merging two graphs with atoms that are the same reference

add repr method for LibraryReaction

1b1a644

and adapt associated test

adapt isEntryMatch to deal with duplicate labels

ab8bbef

adapt family test to getLabeledAtom changes

dafb9bf

modify testing families to work with new recombination and repr methods

18782a6

adapt recipe application to handle duplicate labels

9ae40cf

adapt tree cleaning to handle duplicate labels

99a7cd5

add a resonance option to isEntryMatch

bc8f177

using this option makes getReactionMatches more robust

remove hardcoding of R_Recombination

78c6a53

adapt database tests to new trees

dfe50b1

don't check accessibility as all nodes are guaranteed to be accessible in a generated tree and R_Recombination has issues with sample molecule generation

ensure generated tree comments appended during loading

09206a3

skip tree generation related lines when reading chemkin file; add generate tree statement indicators

improve generated tree rule comments

814beff

don't add rules from training for generated trees

d342ef0

skip node generation if the root node is empty

1ad8e3c

use consistent list of reactions in familyTest

f95a363

privatize appropriate tree generation methods

acbbd8e

mjohnson541 force-pushed the atg4 branch from cee367a to acbbd8e Compare August 7, 2019 02:54

mliu49 approved these changes Aug 7, 2019

mliu49 merged commit 5f95c4d into master Aug 7, 2019

mliu49 deleted the atg4 branch August 7, 2019 03:47

mliu49 mentioned this pull request Nov 26, 2019

RMG v3.0.0 Release Planning #1830

Closed

mliu49 mentioned this pull request Dec 16, 2019

RMG-Py v3.0.0 Release #1852

Merged
