-
Notifications
You must be signed in to change notification settings - Fork 78
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Checks for correct topology parameterization #155
Conversation
`Forcefield.apply` will now check to make sure parameters are found for all angles, proper dihedrals, and improper dihedrals. By default, if parameters are not found for all angles and proper dihedrals, the script will exit with an error. A warning will be raised if parameters are not found for all impropers. However, the behavior (error/warning) for each of these topology objects can be toggled by the user. Unit tests have been added to test this behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great, thanks! Just a few code simplification suggestions.
Also just to double check, missing a bond will already yield an error?
foyer/forcefield.py
Outdated
"system impropers: {}, Parameterized impropers: {}" | ||
"".format(len(data.impropers), len(structure.impropers))) | ||
if assert_improper_params: | ||
raise Exception(msg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you factor these 3 checks out into a consistent "error or warn" function? Lots of repeated code here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call, should help to condense this
foyer/forcefield.py
Outdated
@@ -589,7 +633,7 @@ def createSystem(self, topology, atomtype=True, use_residue_map=True, | |||
# Execute scripts found in the XML files. | |||
for script in self._scripts: | |||
exec(script, locals()) | |||
return sys | |||
return sys, data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Is there any other way to reach this data? Perhaps setting it as an attribute? This function is called createSystem
so it's somewhat unintuitive that it would return other things as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something did feel a little off when I took this approach... data
is only available from within createSystem
, but yeah I could just throw a line like self._system_data = data
or something like that before the return.
foyer/tests/test_forcefield.py
Outdated
with pytest.raises(Exception): | ||
ethane = oplsaa_dihedral_typo.apply(ethane) | ||
with pytest.warns(UserWarning): | ||
ethane = oplsaa_dihedral_typo.apply(ethane, assert_dihedral_params=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Could you factor out this test as well and just @parameterize
the function with the forcefield file name and the assert flag?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, that should make the tests a bit cleaner
Will work on getting those changes made. And yes, unparameterized/missing bonds already yield an error, although the message itself is not particularly helpful (something about an empty array or similar, so not mentioning the missing bonds explicitly), but occurs in OpenMM-land and would be difficult for us to catch since that's where the parameterization happens. |
Also, it looks like there are several molecules in the TraPPE and OPLS tests where parameters must not be defined for all relevant angles or dihedrals as some of these tests are failing after hitting the checks I've put in. So this is something we'll need to look into... |
This is exactly the problem that I encountered when using foyer with an xml I built from the ground up: some dihedrals, including at least one crucial dihedral, weren't getting assigned due to nothing more than my typos. Nothing in my workflow was flagging it and it went unnoticed for several months and many, many GFlops of simulations. |
foyer/forcefield.py
Outdated
@@ -337,32 +353,24 @@ def apply(self, topology, references_file=None, assert_angle_params=True, | |||
have parameters assigned. OpenMM will generate an error if bond parameters | |||
are not assigned. | |||
''' | |||
data = self._SystemData |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
foyer/forcefield.py
Outdated
data.atoms = list(topology.atoms()) | ||
for atom in data.atoms: | ||
data.excludeAtomWith.append([]) | ||
self._SystemData.atoms = list(topology.atoms()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you limit this to just adding a single line here which sets data = self._SystemData()
in place of the existing data = app.ForceField._SystemData()
?
I need to look into the ParmEd source a bit deeper, but it looks like even when improper dihedrals are present |
Discovered a problem with saving topological data from |
It turns out that several of atomtypes we have SMARTS strings defined for in the OPLS force field XML did not have the
Many of these molecules are functionalized alkanes, where parameters are missing for an angle or dihedral that involves the α, β, and γ carbons. In these cases the parameters simply do not exist in the OPLS forcefield shipped with GROMACS (which we converted to this forcefield XML). Many of these missing dihedrals are C-CT-CT-CT dihedrals where the CT-CT-CT-CT dihedral parameters would probably be sufficient (as OPLS uses in many other cases), but we would leave it up to the user to make this decision and add these parameters themselves, keeping the XML shipped with Foyer the same as what is contained in GROMACS. I'm not sure there's really much else we can do. @justinGilmer and myself are working on a website for Foyer that will contain more comprehensive documentation than is available on Github, and we plan to fully document these limitations. |
I've removed secondary alcohols from our list of molecules successfully parameterized using our TraPPE forcefield XML, as these feature a dihedral functional form that we do not currently support. |
This sounds like it's worth understanding properly. Is it a specific type of improper that causes the problem? I can help debug if needed |
with pytest.raises(Exception): | ||
ethane = oplsaa_with_typo.apply(ethane) | ||
with pytest.warns(UserWarning): | ||
ethane = oplsaa_with_typo.apply(ethane, **kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Here's a ZIP file with a minimal example that reproduces the strange storage behavior for impropers which includes a dummy forcefield file, a PDB file for an ethene molecule (which should have two impropers), and a script that takes the PDB and atomtypes it. Here's the atomtyping script:
And here's output from inspection of
So the two impropers are correctly assigned, but they are stored within |
Okay so here is what I've found from my digging so far: The relevant function in the ParmEd source is load_topology, where an OpenMM
What I gather from this is that there is no way from just the data in a ParmEd |
Does parmed hash dihedral types based on their contents? That would be an easy way to avoid double counting. More generally, I can't think of a scenario where a forcefield would define the exact same dihedral in two different ways. I would expect that to be an incorrectly constructed FF file. Ideally we would of course not allow this to silently slip by so a double counting check would be helpful but perhaps not absolutely necessary. |
Right, I no longer think double counting will be a problem. Before looking into the source I was thinking perhaps it was possible for an improper dihedral to exist within both |
👍 While it's perfectly possible, I don't think I've seen a forcefield implement impropers using this functional form. |
…o topology-warn
Currently if parameters are not found for certain topology objects (angles, proper dihedrals, and improper dihedrals) Foyer will continue through without notifying the user. This can lead to serious problems as the incorrect parameterization may not be caught until after production runs have been performed on systems parameterized by Foyer. In many cases, parameters are not found due to typos in the user's force field XML file, although there are cases (e.g. coarse-grained systems) where dihedral forces may not be desired for atoms separated by three bonds. I've implemented a fix here that provides either a warning or error to the user if the systems is found to be incorrectly parameterized. By default, an error is raised if angles and proper dihedrals are not parameterized correctly and a warning is raised if impropers are not parameterized correctly. This behavior can be overridden by the user.