Indexing error makes many fingerprint kernel implementations incorrect #74

AustinT · 2025-01-12T15:01:18Z

In a variety of different fingerprint kernels, x1 and x2 are compared with something like

x1[-1] <something> x2[-1]

which only compares the last element in the kernel matrix, and does not do a pairwise comparison. This causes many kernels to not implement the function claimed in their docstring. Specifically, I think the following kernels are wrong:

Braun-Blanquet: the max(x1, x2) in the denominator is not computed correctly here:

gauche/gauche/kernels/fingerprint_kernels/braun_blanquet_kernel.py

Line 37 in 3380401

denom = torch.max(x1_norm[-1], x2_norm[-1])
Faith: number of matching 0s (d) is only computed with the last element:

gauche/gauche/kernels/fingerprint_kernels/faith_kernel.py

Line 35 in 3380401

d = torch.sum((x1[-1] == 0) & (x2[-1] == 0), dim=-1, keepdims=True)
Intersection kernel: same problem as Faith
Rand: same problem as Faith
Rogers-Tanimoto: same problem as Faith

Here is a minimal example showing the problem:

import torch
from gauche.kernels.fingerprint_kernels.rand_kernel import batch_rand_sim

t = torch.as_tensor([[1, 1, 0., 0.], [1., 0., 1., 0.]])
print(batch_rand_sim(t, t)[0, 1])
print(batch_rand_sim(t[:1], t[1:]))

Output:

tensor(0.7500, dtype=torch.float64)
tensor([[0.5000]], dtype=torch.float64)

What this shows: if I compute a whole kernel matrix and take element [0, -1], it gives a different result than directly computing element [0, -1]. This is because d is computed just using the last element in each batch, rather than computed separately for each pair.

I think this was not caught in the tests because they just tested 1xD matrices. I suspect that a lot of the "batch" similarities also don't work for batches of matrices.

If you can't fix this promptly, I suggest taking these methods out of Gauche and re-releasing. You can add corrected methods back in later.

The text was updated successfully, but these errors were encountered:

AustinT · 2025-01-12T15:01:36Z

I think this was also introduced in #60

Ryan-Rhys · 2025-01-16T17:56:30Z

Thanks for the super detailed example! Will try to get a fix in within the next week! Will add you as a PR reviewer if that's OK?

AustinT · 2025-01-19T17:08:24Z

Thanks for the super detailed example! Will try to get a fix in within the next week! Will add you as a PR reviewer if that's OK?

Yes, happy to be a reviewer 🙂

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Indexing error makes many fingerprint kernel implementations incorrect #74

Indexing error makes many fingerprint kernel implementations incorrect #74

AustinT commented Jan 12, 2025

AustinT commented Jan 12, 2025

Ryan-Rhys commented Jan 16, 2025

AustinT commented Jan 19, 2025

Indexing error makes many fingerprint kernel implementations incorrect #74

Indexing error makes many fingerprint kernel implementations incorrect #74

Comments

AustinT commented Jan 12, 2025

AustinT commented Jan 12, 2025

Ryan-Rhys commented Jan 16, 2025

AustinT commented Jan 19, 2025