Can create too many distinct numeric values #153

Open
yoid2000 opened this issue Jan 29, 2025 · 0 comments

@yoid2000
Contributor

With continuous numeric data that nevertheless has only a "medium" number of distinct values, SynDiffix can end up creating substantially more distinct values than the original data contains.

This happens because there are enough distinct original values that many of them have relatively low counts (say 10 to 20), and when the column is combined with another column, some of those values get suppressed. During microdata assignment, random values that are not among the original values are then assigned, so more distinct values are created.
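To illustrate the effect, here is a simplified Python sketch. It assumes, hypothetically, that rows whose exact values were suppressed are filled in by drawing uniformly at random from a coarser bucket's numeric range; the function `assign_uniform` and the example data are made up for illustration and are not SynDiffix's actual code.

```python
import random


def assign_uniform(low: float, high: float, count: int, rng: random.Random) -> list[float]:
    """Hypothetical naive assignment: draw `count` values uniformly at
    random between `low` and `high`, ignoring which values actually occur."""
    return [rng.uniform(low, high) for _ in range(count)]


# Hypothetical original column: 50 distinct values, each occurring 15 times.
original = [float(v) for v in range(50) for _ in range(15)]

rng = random.Random(0)
# Assume the per-value detail was suppressed, so all rows are filled in
# from the coarse bucket [0, 50).
synthetic = assign_uniform(0.0, 50.0, len(original), rng)

print(len(set(original)))   # 50
print(len(set(synthetic)))  # far more than 50: random floats rarely repeat
```

Even though the synthetic values stay inside the original range, almost every one of them is a new distinct value.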

We need a mechanism so that, when the original data values are not suppressed, microdata is assigned only from those original values.
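One possible shape of such a mechanism, as a hypothetical Python sketch rather than SynDiffix's implementation: given an ascending list of the original values that survived suppression, draw microdata for a bucket `[low, high)` only from those values, and fall back to uniform random values only when none of them fall inside the range. The name `assign_from_originals`, its parameters, and how the unsuppressed values are obtained are all assumptions made for illustration.

```python
import bisect
import random


def assign_from_originals(
    low: float,
    high: float,
    count: int,
    sorted_values: list[float],
    rng: random.Random,
) -> list[float]:
    """Hypothetical microdata assignment for a bucket covering [low, high).

    sorted_values: ascending list of distinct original values that were
    not suppressed (an assumption; the source does not say how these are kept).
    """
    # Locate the original values that fall inside [low, high).
    start = bisect.bisect_left(sorted_values, low)
    end = bisect.bisect_left(sorted_values, high)
    candidates = sorted_values[start:end]
    if candidates:
        # Reuse original values only, so no new distinct values are created.
        return [rng.choice(candidates) for _ in range(count)]
    # No known original value in range: fall back to uniform random values.
    return [rng.uniform(low, high) for _ in range(count)]


# Hypothetical unsuppressed originals: the integers 0..49 as floats.
known = [float(v) for v in range(50)]
rng = random.Random(0)

in_range = assign_from_originals(10.0, 20.0, 100, known, rng)
print(sorted(set(in_range)))  # only values drawn from 10.0..19.0

fallback = assign_from_originals(50.5, 51.0, 3, known, rng)
print(fallback)  # three random values between 50.5 and 51.0
```

Sampling uniformly among the candidates ignores how often each original value occurred; weighting the choice by the original counts would be an alternative design.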
