Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DomainAwareDataset str repr edge case handling #49

Merged
merged 6 commits into from
Jan 17, 2024

Conversation

YanisLalou
Copy link
Collaborator

Issue #12

  • Handling the str output of a DomainAwareDataset if the number of domin_names_ is too large
  • Print the number of datapoints in the dataset

…in_names_ is too large + Print the number of datapoints in the dataset
@YanisLalou
Copy link
Collaborator Author

The goal is to have a similar string representation as torchvision for datasets:
Dataset CIFAR10 Number of datapoints: 50000 Root location: datasets/CIFAR10 Split: Train

However rn in the DomainAwareDataset object we don't have the Dataset name + Root location


def __repr__(self) -> str:
return self.__str__()
head = self.__str__()
body = [f"Number of datapoints: {sum(len(tup[0]) for tup in self.domains_)}"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you shoudl also print teh numer of domains and call it total sizes (giving both total n and d)

output = "\n".join([head] + body)
return output

def get_domain_representation(self, max_domains=5, max_length=50):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest renaming the function to _get_domain_representation to emphasize its role as an internal function. This change will help clarify that it should not be considered part of the class's public API.

Copy link
Collaborator

@rflamary rflamary left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems OK but you need to do some testing (add at leats one test) with different types of datasets and check that nothing breaks

Copy link

codecov bot commented Jan 12, 2024

Codecov Report

Attention: 1 lines in your changes are missing coverage. Please review.

Comparison is base (1a1de71) 83.98% compared to head (865aac2) 84.16%.

Files Patch % Lines
skada/datasets/_base.py 93.33% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #49      +/-   ##
==========================================
+ Coverage   83.98%   84.16%   +0.18%     
==========================================
  Files          35       35              
  Lines        2135     2160      +25     
==========================================
+ Hits         1793     1818      +25     
  Misses        342      342              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@YanisLalou YanisLalou added the enhancement New feature or request label Jan 12, 2024
@YanisLalou YanisLalou requested a review from kachayev January 12, 2024 13:38
@YanisLalou YanisLalou self-assigned this Jan 12, 2024
@kachayev
Copy link
Collaborator

@YanisLalou Flake8 is not happy here as well.

@rflamary rflamary merged commit 803ba26 into scikit-adaptation:main Jan 17, 2024
4 checks passed
@YanisLalou YanisLalou deleted the dataset_repr_branch branch January 18, 2024 09:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants