rename matchtype dont use to dont_use #328

Merged
merged 1 commit into from
Jun 14, 2022
2 changes: 1 addition & 1 deletion client/src/main/java/zingg/client/MatchType.java
@@ -52,7 +52,7 @@ public enum MatchType implements Serializable {
NULL_OR_BLANK("NULL_OR_BLANK"),
ONLY_ALPHABETS_EXACT("ONLY_ALPHABETS_EXACT"),
ONLY_ALPHABETS_FUZZY("ONLY_ALPHABETS_FUZZY"),
- DONT_USE("DONT USE");
+ DONT_USE("DONT_USE");

private String value;
private static Map<String, MatchType> types;
2 changes: 1 addition & 1 deletion core/src/test/resources/testFebrl/config.json
@@ -2,7 +2,7 @@
"fieldDefinition":[
{
"fieldName" : "id",
- "matchType" : "dont use",
+ "matchType" : "dont_use",
"fields" : "fname",
"dataType": "\"string\""
},
2 changes: 1 addition & 1 deletion core/src/test/resources/testPeekModel/config.json
@@ -2,7 +2,7 @@
"fieldDefinition":[
{
"fieldName" : "id",
- "matchType" : "dont use",
+ "matchType" : "dont_use",
"fields" : "fname",
"dataType": "\"string\""
},
6 changes: 3 additions & 3 deletions docs/setup/configuration.md
@@ -71,13 +71,13 @@ Broad matches with typos, abbreviations and other variations.
#### EXACT
Less tolerant with variations, but would still match inexact strings to some degree. Preferable for country codes, pin codes and other categorical variables where you expect less variations

- #### "DONT USE"
+ #### "DONT_USE"
Name says it :-) Appears in the output but no computation is done on these. Helpful for fields like ids which are required in the output.


````json
"fieldDefinition" : [ {
- "matchType" : "DONT USE",
+ "matchType" : "DONT_USE",
"fieldName" : "id",
"fields" : "id"
},
@@ -98,7 +98,7 @@ Number of Spark partitions over which the input data is distributed. Keep it equ
Fraction of the data to be used for training the models. Adjust it between 0.0001 and 0.1 to keep the sample size small enough so that it finds enough edge cases fast. If the size is bigger, the findTrainingData job will spend more time combing through samples. If the size is too small, Zingg may not find the right edge cases.

### showConcise
- When this flag is set to true, during [Label](./training/label.md) and [updateLabel](../updatingLabels.md), only those fields are displayed on console which help build the model. In other words, fields that have matchType as "DONT USE", are not displayed to the user. Default is false.
+ When this flag is set to true, during [Label](./training/label.md) and [updateLabel](../updatingLabels.md), only those fields are displayed on console which help build the model. In other words, fields that have matchType as "DONT_USE", are not displayed to the user. Default is false.

### collectMetrics
Application captures a few measurements for runtime metrics such as *no. of data records, no. of features, running phase* and a few more.
2 changes: 1 addition & 1 deletion examples/febrl120k/config.json
@@ -2,7 +2,7 @@
"fieldDefinition":[
{
"fieldName" : "id",
- "matchType" : "dont use",
+ "matchType" : "dont_use",
"fields" : "id",
"dataType": "\"string\""
},
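
Reviewer note: the configs in this PR write the value in lowercase (`"dont_use"`) while the enum stores `"DONT_USE"`, so the lookup behind `MatchType` presumably ignores case. That lookup is not part of this diff, so the following is only a minimal sketch of how such a resolution could work, not Zingg's actual code. `MatchTypeSketch`, `TYPES`, `fromValue`, and `MatchTypeLookupDemo` are hypothetical names, and returning `null` for an unknown value is an assumption.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for zingg.client.MatchType. Only three constants
// visible in the diff are reproduced; the real enum has more.
enum MatchTypeSketch {
    ONLY_ALPHABETS_EXACT("ONLY_ALPHABETS_EXACT"),
    ONLY_ALPHABETS_FUZZY("ONLY_ALPHABETS_FUZZY"),
    DONT_USE("DONT_USE");

    private final String value;

    // Upper-cased value -> constant. The real class also keeps a static map
    // ("types"), but how it is filled is not shown in the diff.
    private static final Map<String, MatchTypeSketch> TYPES = new HashMap<>();

    static {
        for (MatchTypeSketch t : values()) {
            TYPES.put(t.value.toUpperCase(Locale.ROOT), t);
        }
    }

    MatchTypeSketch(String value) {
        this.value = value;
    }

    public String value() {
        return value;
    }

    // Assumed behaviour: trim, ignore case, return null when nothing matches.
    public static MatchTypeSketch fromValue(String raw) {
        if (raw == null) {
            return null;
        }
        return TYPES.get(raw.trim().toUpperCase(Locale.ROOT));
    }
}

public class MatchTypeLookupDemo {
    public static void main(String[] args) {
        // New spelling, as written in the updated config.json files.
        System.out.println(MatchTypeSketch.fromValue("dont_use"));  // prints DONT_USE
        // New spelling, as written in docs/setup/configuration.md.
        System.out.println(MatchTypeSketch.fromValue("DONT_USE"));  // prints DONT_USE
        // Old spelling with a space no longer matches in this sketch.
        System.out.println(MatchTypeSketch.fromValue("dont use"));  // prints null
    }
}
```

Under these assumptions, any config still using the old `"dont use"` spelling would stop resolving after the rename, which is why every config and doc reference is updated in the same commit.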