-
Notifications
You must be signed in to change notification settings - Fork 124
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Exception handling in PipeUtil::read() #229
Conversation
@@ -41,12 +41,12 @@ public void execute() throws ZinggClientException { | |||
|
|||
public void processRecordsCli(Dataset<Row> lines) throws ZinggClientException { | |||
LOG.info("Processing Records for CLI updateLabelling"); | |||
getMarkedRecordsStat(lines); | |||
printMarkedRecordsStat(); | |||
if (lines == null || lines.count() == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to have a clean flow without returns - please put in if lines != null as discussed earlier
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed Return.
Earlier thought that following code block was already very big. It should not be cluttered further.
@@ -75,11 +75,11 @@ protected void getMarkedRecordsStat(Dataset<Row> markedRecords) { | |||
|
|||
public void processRecordsCli(Dataset<Row> lines) throws ZinggClientException { | |||
LOG.info("Processing Records for CLI Labelling"); | |||
printMarkedRecordsStat(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see earlier comment about returns
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed return statement.
@@ -51,7 +51,7 @@ public Dataset<Row> getUnmarkedRecords() throws ZinggClientException { | |||
unmarkedRecords = PipeUtil.read(spark, false, false, PipeUtil.getTrainingDataUnmarkedPipe(args)); | |||
try { | |||
markedRecords = PipeUtil.read(spark, false, false, PipeUtil.getTrainingDataMarkedPipe(args)); | |||
} catch (Exception e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dont we still need to catch the other expcetions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
read() throws only ZinggClientException.
Earlier, why didnot read() mandate to ctch Exception everywhere it is called. Is there any difference in Exception and ZinggClientException?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes ZCE extends from throwable
@@ -41,11 +41,12 @@ public Matcher() { | |||
setZinggOptions(ZinggOptions.MATCH); | |||
} | |||
|
|||
protected Dataset<Row> getTestData() { | |||
return PipeUtil.read(spark, true, args.getNumPartitions(), true, args.getData()); | |||
protected Dataset<Row> getTestData() throws ZinggClientException{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To satisfy the requirement of catching or throwing the ZinggClientException. All the intermediate functions are mandating to add this.
In this regard, are types Exception and ZinggClientException different?
input = reader.load(); | ||
} | ||
if (addSource) { | ||
input = input.withColumn(ColName.SOURCE_COL, functions.lit(p.getName())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we throw an exception here, how will we handle cases where we dont want exceptions - eg findtrainingData
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We aren't throwing any new exception here. Catching one and throwing another.
So if an exception is already thrown (that is indeed done in spark.read() in this case), it is handled somewhere (first catch block). So with this change, there will not be any impact in program behavior. Just things made explicit to have the exception handled by the caller. the specific type of the exception. Else, we are loosing the error message in the process.
Still, we have to see which one to use 'Exception' or 'ZinggClientException' in the flow.
We are already handling Exception in top level classes. Why are we loosing ex.getMessage(). I note that intermediate function calls do not h ave explicit exception specification.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, makes sense. but the message here has to be that we could not read the and also wrap the actual exception thrown
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Used wrapped exception.
getMarkedRecordsStat(lines); | ||
printMarkedRecordsStat(); | ||
if (lines == null || lines.count() == 0) { | ||
LOG.info("There is no marked record for updating. Please run findTrainingData/label jobs to generate training data."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to still print this out in else, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. made changes in Labeller and UpdateLabeller.
Pipe's props is initialized with blank hashmap. makes getProps() never return null
Here is a replaced Fraft PR to handle PipeUtil::Read() exception handling.
Please provide your inputs on the approach.
Note: Though major flows have been tested. Still more testing required.