Question about fingerprints of stereoisomers #72

Open
BJWiley233 opened this issue Jun 15, 2023 · 1 comment

@BJWiley233

Hi,

I have a question about calculating e3fp.fingerprint.metrics.tanimoto coefficients for stereoisomers. I generated 50 conformers for each of the following molecules, with hydrogens added. Some entries in the matrix returned by the call to tanimoto are 1.0. Why would that be, given that the two molecules are not the same?

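import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem
from e3fp.fingerprint.db import FingerprintDatabase
from e3fp.fingerprint.fprint import Fingerprint
from e3fp.fingerprint.generate import fprints_dict_from_mol
from e3fp.fingerprint.metrics import tanimoto

smi='C1C[C@@H]2CNCCN2C1'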
mol = Chem.MolFromSmiles(smi, sanitize=True)
molh = Chem.AddHs(mol)
molh.SetProp('_Name', smi)
cmol = AllChem.EmbedMultipleConfs(molh, numConfs=50)
d=fprints_dict_from_mol(molh, bits=2048, first=50, stereo=True)[5]
db=FingerprintDatabase(fp_type=Fingerprint, name="TestDB", level=5)
db.add_fingerprints(d)

smi2='C1C[C@H]2CNCCN2C1'
mol2 = Chem.MolFromSmiles(smi2, sanitize=True)
molh2 = Chem.AddHs(mol2)
molh2.SetProp('_Name', smi2)
cmol2 = AllChem.EmbedMultipleConfs(molh2, numConfs=50)
d2=fprints_dict_from_mol(molh2, bits=2048, first=50, stereo=True)[5]
db2=FingerprintDatabase(fp_type=Fingerprint, name="TestDB", level=5)
db2.add_fingerprints(d2)
tan=tanimoto(db, db2)
np.where(tan==1)
>>> (array([ 0,  0, 13, 13, 14, 14, 16, 32, 32, 34]),
 array([30, 32, 30, 32, 30, 32, 40, 30, 32, 40]))
@sethaxen
Collaborator

First, the fingerprints are generated using a large number of bits, where a bit is 'on' if the 3D environment of an atom hashes to that bit (for details, see the paper). To reduce the size of the fingerprint, it is repeatedly halved and the two halves are ORed together until it reaches the target size of 2048 bits, which makes bit collisions more common, though still not extremely frequent. The Tanimoto coefficient measures the similarity of the bits in this folded representation.

So it's not impossible for two molecules to end up with the same fingerprint even if their structures are different. For a more detailed answer, I suggest visually comparing the two conformers from which the two matching fingerprints were generated.
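
To illustrate the folding step described above, here is a minimal NumPy sketch. It is not E3FP's implementation: the helper names `fold` and `tanimoto_bits`, the 2**16-bit unfolded size, and the hand-picked bit indices are all assumptions chosen for illustration. It shows how two fingerprints that differ before folding can become identical afterwards, giving a Tanimoto coefficient of 1.0.

```python
import numpy as np


def fold(on_indices, full_bits, target_bits):
    """Fold a fingerprint by repeatedly halving it and ORing the two halves.

    Hypothetical helper for illustration; sizes must be powers of two.
    """
    fp = np.zeros(full_bits, dtype=bool)
    fp[list(on_indices)] = True
    while fp.size > target_bits:
        half = fp.size // 2
        fp = fp[:half] | fp[half:]
    return fp


def tanimoto_bits(a, b):
    """Tanimoto coefficient of two boolean arrays of equal length."""
    either = np.count_nonzero(a | b)
    both = np.count_nonzero(a & b)
    return both / either if either else 1.0


FULL_BITS = 2**16    # toy unfolded size (assumption, kept small on purpose)
TARGET_BITS = 2048   # folded size used in the question

# Two "molecules" whose on-bits differ in one position before folding.
# 4100 and 4100 + 3 * 2048 are distinct indices that share the same
# remainder modulo 2048, so they collide once folded.
bits_a = {5, 300, 4100}
bits_b = {5, 300, 4100 + 3 * 2048}

unfolded_a = fold(bits_a, FULL_BITS, FULL_BITS)  # no folding applied
unfolded_b = fold(bits_b, FULL_BITS, FULL_BITS)
folded_a = fold(bits_a, FULL_BITS, TARGET_BITS)
folded_b = fold(bits_b, FULL_BITS, TARGET_BITS)

print(tanimoto_bits(unfolded_a, unfolded_b))  # 0.5: two shared bits out of four
print(tanimoto_bits(folded_a, folded_b))      # 1.0: the differing bits collide
```

In this toy case the collision is engineered by hand. With real fingerprints it depends on which hashed 3D environments happen to share a folded position, which is why inspecting the matching conformer pairs is the more informative check.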
