Matching keys lengths and new metrics #358
Conversation
Thanks Ido!
A few comments and questions inline.
fuse/eval/metrics/metrics_common.py
Outdated
@@ -160,7 +160,10 @@ def collect(self, batch: Dict) -> None:
        batch_to_collect = {}

        for name, key in self._keys_to_collect.items():
            value = batch[key]
            try:
Was it just for debugging?
Or would you like to print a more informative message and then reraise the exception?
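For reference, a minimal standalone sketch of that pattern (the helper name `collect_values` and the keys are hypothetical, not the PR's actual code): print which key failed, then re-raise so the original traceback is preserved.

```python
from typing import Dict


def collect_values(batch: Dict, keys_to_collect: Dict[str, str]) -> Dict:
    """Hypothetical helper mirroring the collect() loop: copy requested keys
    out of the batch dict, failing loudly on a missing key."""
    batch_to_collect = {}
    for name, key in keys_to_collect.items():
        try:
            value = batch[key]
        except KeyError:
            # Print an informative message, then re-raise the original exception
            print(f"collect: key '{key}' (for metric arg '{name}') is missing from the batch dict")
            raise
        batch_to_collect[name] = value
    return batch_to_collect


if __name__ == "__main__":
    batch = {"model.logits": [0.1, 0.9], "data.label": 1}
    print(collect_values(batch, {"pred": "model.logits", "target": "data.label"}))
```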
def mcc_wrapper(
    self,
    pred: Optional[str] = None,
Is it a string or a list of numpy arrays?
Are you expecting class predictions or scores after softmax?
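As an illustration of the class-predictions case, a minimal sketch built on sklearn's `matthews_corrcoef` (the function name and argument names here are hypothetical, not the PR's API):

```python
from typing import List, Union

import numpy as np
from sklearn.metrics import matthews_corrcoef


def mcc_from_class_predictions(
    cls_pred: Union[List, np.ndarray], target: Union[List, np.ndarray]
) -> float:
    """Hypothetical standalone version of the wrapper: expects class
    predictions (not softmax scores), one per sample."""
    return float(matthews_corrcoef(np.asarray(target), np.asarray(cls_pred)))


if __name__ == "__main__":
    print(mcc_from_class_predictions(cls_pred=[0, 1, 1, 0], target=[0, 1, 0, 0]))
```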
fuse/data/utils/collates.py
Outdated
@@ -247,3 +255,22 @@ def crop_padding(
        cropped_sequences = [ids[:min_length] for ids in input_ids_list]
        batched_sequences = torch.stack(cropped_sequences, dim=0)
        return batched_sequences

    @staticmethod
    def match_length_to_target_key(
Cropping?
What happens when the target_key is shorter than keys_to_match?
This will raise an error; maybe I'll change the name to crop_length_to_target_key?
The motivation is the case where the encoder input and the labels should have the same length and crop padding was applied to the encoder input, so we want to match the labels to it.
I assume you mean decoder input?
I asked Michal to solve it in a different way.
(Labels and decoder input should always have the same length.)
Sorry, I do mean encoder; this is for the case of an encoder-only model.
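To make the behaviour discussed here concrete, a minimal sketch of such a cropping helper, assuming batched tensors shaped `[batch, seq_len, ...]` (the function name and dict keys below are hypothetical, not the PR's code):

```python
from typing import Dict, Sequence

import torch


def crop_length_to_target_key(
    batch_dict: Dict, target_key: str, keys_to_match: Sequence[str]
) -> None:
    """Hypothetical post-collate handler: crop each key in keys_to_match to the
    sequence length of batch_dict[target_key]. Raises if a key is shorter than
    the target, since there is nothing to crop in that case."""
    target_len = batch_dict[target_key].shape[1]
    for key in keys_to_match:
        cur_len = batch_dict[key].shape[1]
        if cur_len < target_len:
            raise ValueError(
                f"'{key}' (len {cur_len}) is shorter than target '{target_key}' (len {target_len})"
            )
        batch_dict[key] = batch_dict[key][:, :target_len]


if __name__ == "__main__":
    batch = {
        "data.input_ids": torch.zeros(4, 10, dtype=torch.long),  # already crop-padded
        "data.labels": torch.zeros(4, 12, dtype=torch.long),     # still padded to 12
    }
    crop_length_to_target_key(batch, target_key="data.input_ids", keys_to_match=["data.labels"])
    print(batch["data.labels"].shape)  # torch.Size([4, 10])
```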
@@ -43,6 +43,7 @@ def __init__(
        keep_keys: Sequence[str] = tuple(),
        raise_error_key_missing: bool = True,
        special_handlers_keys: Optional[Dict[str, Callable]] = None,
        post_collate_special_handlers_keys: Optional[List[Callable]] = None,
Why not use special_handlers_keys instead?
Can you give an example of a use case?
A use case is match_length_to_target_key from below: we can't use special handlers because each one handles a single element of the batch dict, and here we need to match one key to another after the special handler has been applied.
Why is it a list and not a single callable?
To allow multiple use cases, e.g. aligning one set of keys to key A and a different set of keys to key B.
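A rough sketch of that idea, with hypothetical handler names, showing why a list of post-collate callables is convenient: each callable sees the whole collated batch dict, so different key sets can be aligned independently.

```python
from typing import Callable, Dict, List


def apply_post_collate_handlers(batch_dict: Dict, handlers: List[Callable]) -> Dict:
    """Apply each handler to the already-collated batch dict, in order."""
    for handler in handlers:
        handler(batch_dict)
    return batch_dict


def align_labels_to_input(batch_dict: Dict) -> None:
    # hypothetical handler: crop labels to the encoder input length
    seq_len = len(batch_dict["data.input_ids"][0])
    batch_dict["data.labels"] = [x[:seq_len] for x in batch_dict["data.labels"]]


def align_mask_to_other_input(batch_dict: Dict) -> None:
    # hypothetical handler: a second, independent alignment to a different key
    seq_len = len(batch_dict["data.other_input"][0])
    batch_dict["data.other_mask"] = [x[:seq_len] for x in batch_dict["data.other_mask"]]


if __name__ == "__main__":
    batch = {
        "data.input_ids": [[1, 2, 3]],
        "data.labels": [[1, 2, 3, 0, 0]],
        "data.other_input": [[4, 5]],
        "data.other_mask": [[1, 1, 0, 0]],
    }
    apply_post_collate_handlers(batch, [align_labels_to_input, align_mask_to_other_input])
    print(batch["data.labels"], batch["data.other_mask"])  # [[1, 2, 3]] [[1, 1]]
```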
def balanced_acc_wrapper(
    self,
    pred: Union[List, np.ndarray],
It's still not clear whether you expect softmax scores here (which we call pred) or class predictions (which we call cls_pred).
Oh, I understand. I expect class predictions; I used the same names as in MetricAccuracy. Do you want me to change it to cls_pred?
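For comparison, a minimal standalone sketch using sklearn's `balanced_accuracy_score`, assuming `pred` holds class predictions as discussed (the function and argument names are hypothetical, not the PR's API):

```python
from typing import List, Union

import numpy as np
from sklearn.metrics import balanced_accuracy_score


def balanced_acc_from_class_predictions(
    pred: Union[List, np.ndarray], target: Union[List, np.ndarray]
) -> float:
    """Hypothetical standalone version: 'pred' holds class predictions
    (same naming convention as MetricAccuracy), not softmax scores."""
    return float(balanced_accuracy_score(np.asarray(target), np.asarray(pred)))


if __name__ == "__main__":
    print(balanced_acc_from_class_predictions(pred=[0, 0, 1, 1], target=[0, 1, 1, 1]))
```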
Thanks Ido!
I still have a few comments.
In order not to block you, I'm OK with merging and potentially making some modifications in a new PR.
LGTM!
Thanks!
Added 2 new metrics: Matthews correlation coefficient and balanced accuracy.
Added new functionality to the default collate that allows matching the length of some keys in the batch dict to other keys. Example: for an encoder model, matching the length of the labels to the length of the input after crop padding is applied.