Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[JENKINS-40088] Be more robust when exporting collections #89

Merged
merged 2 commits into from
Jan 18, 2017

Conversation

jglick
Copy link
Member

@jglick jglick commented Dec 6, 2016

We were already catching NotExportableException, a common problem, but not other miscellaneous exceptions.

JENKINS-40083 should not have caused the whole API call to fail. The test output before the fix:

java.io.IOException: Failed to write generic
	at org.kohsuke.stapler.export.Property.safeGetValue(Property.java:151)
	at org.kohsuke.stapler.export.Property.writeTo(Property.java:126)
	at org.kohsuke.stapler.export.Model.writeNestedObjectTo(Model.java:227)
	at org.kohsuke.stapler.export.Property.writeValue(Property.java:279)
	at org.kohsuke.stapler.export.Property.writeValue(Property.java:222)
	at org.kohsuke.stapler.export.Property.writeValue(Property.java:168)
	at org.kohsuke.stapler.export.Property.writeTo(Property.java:139)
	at org.kohsuke.stapler.export.Model.writeNestedObjectTo(Model.java:227)
	at org.kohsuke.stapler.export.Model.writeTo(Model.java:198)
	at org.kohsuke.stapler.export.Model.writeTo(Model.java:218)
	at org.kohsuke.stapler.export.Model.writeTo(Model.java:182)
	at org.kohsuke.stapler.export.JSONDataWriterTest.serialize(JSONDataWriterTest.java:19)
	at org.kohsuke.stapler.export.JSONDataWriterTest.exceptionHandling(JSONDataWriterTest.java:67)
	at …
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.kohsuke.stapler.export.MethodProperty.getValue(MethodProperty.java:66)
	at org.kohsuke.stapler.export.Property.safeGetValue(Property.java:145)
	... 36 more
Caused by: java.lang.RuntimeException: oops
	at org.kohsuke.stapler.export.JSONDataWriterTest$Broken.generic(JSONDataWriterTest.java:70)
	... 42 more

Now it logs the same, but skips the broken element.

@reviewbybees

@ghost
Copy link

ghost commented Dec 6, 2016

This pull request originates from a CloudBees employee. At CloudBees, we require that all pull requests be reviewed by other CloudBees employees before we seek to have the change accepted. If you want to learn more about our process please see this explanation.

buffer.finished();
} catch (IOException x) {
if (x.getCause() instanceof InvocationTargetException) {
LOGGER.log(Level.WARNING, "skipping export of " + value, x);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like extreme log spam

Copy link
Member

@daniel-beck daniel-beck Dec 6, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the kind of warning about something that's broken but the admin can do nothing about. As with past occurrences, this should be less than INFO unless in debug mode.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not following your complaint. We routinely log unexpected exceptions at WARNING, such as here to take just one random example. The admin can uninstall the responsible plugin, or file a bug, or patch it…

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Several logged exceptions per API request seems a bit much to me, especially given how common the problem of unexportable data seems to me. I'd expect the frequency to be about as annoying as what #48 fixed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I doubt it. #48 downgraded a message printed after upgrading some components against others which were still using what used to be a slightly unusual but otherwise supported construction.

The case fixed here is limited to genuine plugin bugs, generally ones which are trivial to diagnose and assign when encountered; and either trivial to fix completely, or at least ameliorate by catching in the @Exported method.

And again we already have tons of code in Jenkins core which catches a broad range of exceptions from extensions or other plugin code, logs it, and continues, including code paths which might be encountered at a high frequency depending on usage patterns. Logging these things at FINE will just mean that the bugs are never fixed. If there is a persistent issue with logs filling up with hundreds of more or less identical stack traces due to buggy plugins, that is best handled by the logging system, for example by printing a message saying that a given Logger has produced too much noise and it is going to be suppressed for the rest of the session.

@jglick
Copy link
Member Author

jglick commented Dec 6, 2016

Build failure on one server is just a failure in the Jenkinsfile; CC @abayer (apparently incompatible change).

@i386
Copy link
Contributor

i386 commented Dec 7, 2016

@jglick if getting the property value throws an exception we probably don't want the object it belongs to at all. For example, a missing property in an actions rest format might introduce unexpected behaviour.

Take this action:

@ExportedBean
class MyAction {
  @Exported
  public void exploding() { throw new RuntimeException("oops"); }
  
  @Exported
  public String name() { return "Bob"; }
}

Would serialise as:

{
  actions: [
      {
          "_class": "SomeOtherAction",
          "property": "value"
      },
      {
          "_class": "MyAction",
          "name": "Bob" 
       }
  ]
}

Blue Ocean plugins see a a valid action data for MyAction by virtue of it being included in the list of actions. If the Action isn't present, then extensions inspecting it will not be invoked.

It would be better if we received a response that excluded the action entirely like:

{
  actions: [
      {
          "_class": "SomeOtherAction"
          "property": "value"
      }
  ]
}

In addition, we want to discriminate between a Action serialization failure and other types of model failure. It's fine for an Action not to serialise (they are optional) but if a Blue Ocean model object failed to serialize we want to translate that into a 500 error on the rest response. This is why we opted for a the ExportInterceptor described in the Google doc - we can check the type of model the property belongs to, throw NotExportedException to exclude that model from the serialized response or return the value if the property can be determined.

@michaelneale
Copy link
Member

BTW PR to fix the Jenkinsfile: #90 (side issue)

@jglick
Copy link
Member Author

jglick commented Dec 7, 2016

It would be better if we received a response that excluded the action entirely

That is exactly what this PR does. Look at the test.

@jglick
Copy link
Member Author

jglick commented Dec 7, 2016

if a Blue Ocean model object failed to serialize

Then you have bigger problems. You will get a clear error in the log to fix, and the UI will get the rest of the information.

Anyway note that this patch only excludes elements of collections (including array elements, or map keys & values). A single reference-typed field of some larger object which cannot be exported will cause the larger object to not be exported either, so if something required via a reference chain of non-collection fields all the way to the root throws an exception, you will still get a failed response.

@rsandell
Copy link
Member

rsandell commented Dec 7, 2016

🐝

@michaelneale
Copy link
Member

It sure if the scope of this should be all collections all the time though? Seems the equivalent of sweeping everything under the rug

@michaelneale
Copy link
Member

🐛 as this means in plenty of places you may have an NPE when showing a list, and a user sees a partial set of data - not knowing something is missing. The scope of this protection needs to be narrowed.

@i386
Copy link
Contributor

i386 commented Dec 7, 2016

@jglick I've written some test cases for this branch in a fork at https://github.com/i386/stapler/blob/9d507bd01887188309ff5ce73b88412e599c2611/core/src/test/java/org/kohsuke/stapler/export/RobustnessTest.java#L93

The behavior this PR does not cover is demonstrated in the test testCollectionOfPipelinesShouldFailIfASingleOneWasThrowingExceptionsButNotWithinAnAction

There are three jobs on my server. One of them will fail in our Pipeline model. We treat those failures very differently - if something is crashing here something is very wrong with Blue Ocean and we'd like to handle that error. If there were actions that failed to serialize related to those Pipeline models then they are excluded.

🐛 With this PR, if I make a request like /blue/rest/organizations/jenkins/pipelines/ to return a list of jobs, I will receive 2 jobs. What I want to see is a 500 error.

@i386
Copy link
Contributor

i386 commented Dec 7, 2016

🐛 this PR also does not cover the use case described in the document where we want to direct users to file bugs in the right place. Because we are serializing Jenkins model as a tree and plugins contribute to this model, we effectively have a dependency on every Jenkins plugin. Asking the engineers to triage 100s of plugin crashes entering from the REST API and move to the right JIRA components isn't scalable for the Blue Ocean team.

If we had a ErrorListener that allowed us to get access to the Class of model serialised and the Throwable we'd be able to do this reporting on our side easily.

@i386
Copy link
Contributor

i386 commented Dec 7, 2016

Hmm, talking this over with @michaelneale perhaps we should be returning the serialization error somehow. Perhaps we don't return any properties on the model apart from _class and a new property called _error?

That would allow us to display a message for crashed plugins in the UI if they require that actions data.

@jglick
Copy link
Member Author

jglick commented Dec 8, 2016

It sure if the scope of this should be all collections all the time though? Seems the equivalent of sweeping everything under the rug

That is what exporting already did for missing @ExportedBean annotations. This change merely extends the existing protection to other exceptions.

a user sees a partial set of data - not knowing something is missing

It is the responsibility of an administrator to check for errors in the log file and decide how to proceed.

if something is crashing here something is very wrong with Blue Ocean and we'd like to handle that error. If there were actions that failed to serialize

Broken is broken—fix the plugin with the bug. That plugin might happen to have been written specifically “for” Blue Ocean, or it might only be incidentally participating, or somewhere in between. For Blue Ocean to be the default UI for Jenkins, with broad-ranging integrations, much of the current prototype code must be properly factored out into the logical domain plugins such as junit, and the distinction becomes meaningless.

we effectively have a dependency on every Jenkins plugin

That is the case anyway, until Jenkins 17.0 is redesigned as a federation of microservices.

Asking the engineers to triage 100s of plugin crashes entering from the REST API and move to the right JIRA components isn't scalable for the Blue Ocean team.

Triaging poorly filed incoming bug reports is hard work. What makes the Blue Ocean team special in this regard? If you want to pay attention to bug reports at all, you need to spend some time filtering through noise. Reassigning a bug report based on a plugin name clearly visible at the top of a stack trace is the easiest case you will encounter.

(BTW the NetBeans project has some automated tools to collect stack traces automatically from user installations, upload them with contextual information, search for duplicates using fuzzy matches, create collated lists of reported errors, and ultimately automatically file bug reports for commonly encountered errors, after trying to guess a component based on the packages seen in the stack trace. Perhaps Jenkins needs a similar system.)

Anyway so far I have heard of one such crash, so it seems you are agonizing over a hypothetical problem.

perhaps we should be returning the serialization error somehow. Perhaps we don't return any properties on the model apart from _class and a new property called _error?

That would potentially break existing clients not expecting such a format. If there is actually a client which could use such information meaningfully, a different format clearly distinct from actual objects could be emitted upon request (for example via a query string parameter). But anyway it is undesirable to have special-case error reporting in certain clients. Runtime errors caught in core/plugin code should be reported and collected in a uniform manner for any Jenkins functionality.

@daniel-beck
Copy link
Member

Maybe a new annotation, or parameter to @Exported could control how Stapler handles a failure to serialize that field? Ignore by default, do… something else otherwise? Then any 'critical' fields can be annotated as such.

@i386
Copy link
Contributor

i386 commented Dec 8, 2016

@jglick in Jenkins today, if an Action crashes the stack trace is reported on the UI and the user can see that there was an error. With your solution, the user wouldn't see anything and we are devolving away from how Jenkins does error reporting. So that fails your own assertion here that we should be reporting errors (making a case for _error or something like it). At least in Jenkins, the whole UI won't become invisible if one Action of 100s installed via plugins throws an exception.

Reassigning a bug report based on a plugin name clearly visible at the top of a stack trace is the easiest case you will encounter.

This kind of failure is completely ambiguous right now and you'd have to be a Jenkins Developer to figure this out. Take JENKINS-40083 as an example. The last three lines being the actual issue. Is it Jenkins, Stapler, Blue Ocean or Analysis Core? You'd have to be a Jenkins developer to figure this one out.

If Blue Ocean touches hundreds of plugins through its REST API we absolutely, as a hard requirement, need to notify people of which component failed. Blue Ocean's aim is to build a better user experience for Jenkins and that means we have to handle failure better too. Both of these is what makes Blue Ocean different.

Example

java.io.IOException: Failed to write lastFinishedBuild
	at org.kohsuke.stapler.export.Property.safeGetValue(Property.java:151)
	at org.kohsuke.stapler.export.Property.writeTo(Property.java:126)
	at org.kohsuke.stapler.export.Model.writeNestedObjectTo(Model.java:227)
	at org.kohsuke.stapler.export.Model.writeNestedObjectTo(Model.java:223)

snip 10 or so lines

	at org.kohsuke.stapler.ResponseImpl.serveExposedBean(ResponseImpl.java:273)
	at hudson.model.Api.doJson(Api.java:208)
	at io.jenkins.blueocean.rest.pageable.PagedResponse$Processor$1.generateResponse(PagedResponse.java:63)
	at org.kohsuke.stapler.HttpResponseRenderer$Default.handleHttpResponse(HttpResponseRenderer.java:124)
	at org.kohsuke.stapler.HttpResponseRenderer$Default.generateResponse(HttpResponseRenderer.java:69)

snip 20+ lines

Caused by: java.lang.ClassCastException: org.jenkinsci.plugins.workflow.job.WorkflowRun cannot be cast to hudson.model.AbstractBuild
	at hudson.plugins.analysis.core.AbstractProjectAction.getLastFinishedBuild(AbstractProjectAction.java:486)
	... 99 more

That is the case anyway, until Jenkins 17.0 is redesigned as a federation of microservices.

What I meant was that we are loosely coupled to those plugins via REST. Microservices wouldn't help here either.

Anyway so far I have heard of one such crash, so it seems you are agonizing over a hypothetical problem.

I've seen quite a few of these problems reported (up to three now and I don't know how many unreported - likely a lot given the variations in plugins). I am worried about how many I am going to see when 1.0 ships. If I can easily remove this class of problems with some small changes to Stapler that drops the risk of misattribution down quite a bit - then its no longer a concern. For example, yesterday a user of Blue Ocean updated to the latest plugins and received the broken scm-api release - this was attributed to Blue Ocean. It happens, the fear and risk is real. I am interested in these errors but I do not want to take the blame for them 100%. I do not want to spend time uselessly triaging tickets if there is a better way.

I am not investing any of the CloudBees Blue Ocean team time building an exception reporting system - great idea, we should have one, but Blue Ocean has other fish to fry.

That would potentially break existing clients not expecting such a format. If there is actually a client which could use such information meaningfully

We would use _class to report a reason why the extension failed to load in the Blue Ocean UI.

To sum it up - perhaps Blue Ocean should just fork Stapler's export package, include it in our REST API module (have it respect ExportedBean and Exported annotation behaviour, etc) and call it a day. I'd like to spend less time debating about the requirements of our plugin and more about how we can solve them and move on.

@i386
Copy link
Contributor

i386 commented Dec 8, 2016

@daniel-beck that could work nicely.

@jglick
Copy link
Member Author

jglick commented Dec 12, 2016

Maybe a new annotation, or parameter to @Exported could control how Stapler handles a failure to serialize that field?

Along the lines of XStream2.addCriticalField perhaps. In that case, XStream serialization in Jenkins is even more lax than Stapler exporting—any field of an object which cannot be safely deserialized is simply left null (after the error is reported to admins). That needed to be amended in certain cases for security reasons, but there are other cases where we would like to treat a reference-valued field as mandatory.

With your solution, the user wouldn't see anything and we are devolving away from how Jenkins does error reporting.

How so? Most errors in random code in Jenkins are caught and logged without any special display to the user.

At least in Jenkins, the whole UI won't become invisible if one Action of 100s installed via plugins throws an exception.

Right…and that is exactly what this patch extends to Stapler exporting. I am not following your argument here.

you'd have to be a Jenkins Developer to figure this out

That is true of any bug report! There is a runtime exception being thrown from a class defined in Analysis Core, so unless there is a particular reason to argue otherwise, it is a bug in that plugin.

The last three lines being the actual issue.

It is true that the order of nested exceptions printed by default in Java is misleading. I would be happy to resume work on jenkinsci/jenkins#1485 to fix that.

what makes Blue Ocean different

We should rather improve error handling and reporting in Jenkins generally. Only a small proportion of errors are going to happen to have been thrown in the context of an HTTP request thread servicing the B.O. frontend.

I've seen quite a few of these problems reported (up to three now

So what are the other ones?

yesterday a user of Blue Ocean updated to the latest plugins and received the broken scm-api release - this was attributed to Blue Ocean. It happens

Yes, it happens to all features in Jenkins: bug reports without minimal analysis of root cause are filed in inappropriate places. Short of turning off public issue creation, the only solution is to have a dedicated team of triagers, so that developers of individual features are not bothered by spurious reports.

We would use _class to report a reason why the extension failed to load

This is the part I fear would be incompatible; we would need to pick a different format which is more clearly distinguished from normal elements.

perhaps Blue Ocean should just fork

No. We need to fix problems once and for all at a single source to avoid wasted effort and duplication.

@jglick
Copy link
Member Author

jglick commented Dec 15, 2016

BTW the CI failure:

WorkflowScript: 1: The 'postBuild' section has been renamed as of version 0.6. Use 'post' for all post-build actions. @ line 1, column 1.
   pipeline {
   ^

1 error

@michaelneale
Copy link
Member

michaelneale commented Jan 10, 2017

This still doesn't solve the issue, for example:
https://issues.jenkins-ci.org/browse/JENKINS-39647

What happens now:

  • They get an error, things don't load and see stack trace (in ticket)
  • API returns 500

What would happen with this PR in this state:

  • They don't see an error in UI, they now get a screen with no data loaded (ie API returns as if everything is ok). Error is logged only on server.
  • API returns as if everything is ok
  • Possibly worse, they get some rows of data, but don't realise until later points in time some is missing.

Is this satisfactory? it doesn't really solve the issue that sparked this other than the REST equivalent of catching all exceptions and logging and moving on. Sometimes that is ok, but not clear to me it is here?

In an app without arbitrary plugins changing behavior, this would be a clear dichotomy between returning an error always or continuing on after logging.

However with plugins this is more subtle. There are some things to defend against and some to report to the user.

@jglick jglick changed the title [JENKINS-40083] Be more robust when exporting collections [JENKINS-40088] Be more robust when exporting collections Jan 10, 2017
@jglick
Copy link
Member Author

jglick commented Jan 10, 2017

Retroactively linking this to JENKINS-40088 which seems like the most appropriate general tracking issue.

@jglick
Copy link
Member Author

jglick commented Jan 10, 2017

What would happen with this PR in this state: They don't see an error in UI, they now get a screen with no data loaded

Well, we do not know for sure unless someone knows how to reproduce JENKINS-39647. I think what you would see is that pretty much everything is fine—just the data from the broken plugin is omitted (perhaps the UI was not even using that bit of data anyway). It is hard to be sure since this stack trace

java.io.IOException: Failed to write defaultParameterValue
	at org.kohsuke.stapler.export.Property.safeGetValue(Property.java:151)
	at org.kohsuke.stapler.export.Property.writeTo(Property.java:126)
	at org.kohsuke.stapler.export.Model.writeNestedObjectTo(Model.java:227)
	at org.kohsuke.stapler.export.Model.writeNestedObjectTo(Model.java:223)
	at org.kohsuke.stapler.export.Model.writeNestedObjectTo(Model.java:223)
	at org.kohsuke.stapler.export.Property.writeValue(Property.java:279)
	at org.kohsuke.stapler.export.Property.writeValue(Property.java:222)
	at org.kohsuke.stapler.export.Property.writeValue(Property.java:168)
	at org.kohsuke.stapler.export.Property.writeTo(Property.java:139)
	at org.kohsuke.stapler.export.Model.writeNestedObjectTo(Model.java:227)
	at org.kohsuke.stapler.export.Property.writeTo(Property.java:135)
	at org.kohsuke.stapler.export.Model.writeNestedObjectTo(Model.java:227)
	at org.kohsuke.stapler.export.Model.writeNestedObjectTo(Model.java:223)
	at org.kohsuke.stapler.export.Property.writeValue(Property.java:279)
	at org.kohsuke.stapler.export.Property.writeValue(Property.java:222)
	at org.kohsuke.stapler.export.Property.writeValue(Property.java:168)
	at org.kohsuke.stapler.export.Property.writeTo(Property.java:139)
	at org.kohsuke.stapler.export.Model.writeNestedObjectTo(Model.java:227)
	at org.kohsuke.stapler.export.Model.writeNestedObjectTo(Model.java:223)
	at org.kohsuke.stapler.export.Model.writeTo(Model.java:198)
	at org.kohsuke.stapler.ResponseImpl.writeOne(ResponseImpl.java:285)
	at org.kohsuke.stapler.ResponseImpl.serveExposedBean(ResponseImpl.java:273)
	at hudson.model.Api.doJson(Api.java:211)

does not indicate which path in the REST tree corresponds to the problem. Presumably somewhere in your request (you are using ?tree or equivalent, right?) you are asking for default parameters of a job.

Note that the jenkins.err.log from that issue shows the same exception being thrown from a different web request, not an API export:

… hudson.ExpressionFactory2$JexlExpression evaluate
WARNING: Caught exception evaluating: it.defaultParameterValue in /job/…/build. Reason: java.lang.reflect.InvocationTargetException
java.lang.reflect.InvocationTargetException
	at …
Caused by: java.lang.IllegalArgumentException: Illegal choice: 
	at jp.ikedam.jenkins.plugins.extensible_choice_parameter.ExtensibleChoiceParameterDefinition.createValueCommon(ExtensibleChoiceParameterDefinition.java:397)
	at jp.ikedam.jenkins.plugins.extensible_choice_parameter.ExtensibleChoiceParameterDefinition.createValue(ExtensibleChoiceParameterDefinition.java:430)
	at jp.ikedam.jenkins.plugins.extensible_choice_parameter.ExtensibleChoiceParameterDefinition.getDefaultParameterValue(ExtensibleChoiceParameterDefinition.java:450)
	... 156 more

meaning that with or without this PR, the job in question would be unusable without my jenkinsci/extensible-choice-parameter-plugin#27.

@jglick
Copy link
Member Author

jglick commented Jan 10, 2017

parameter to @Exported [to] control how Stapler handles a failure to serialize that field

That would be a natural enhancement to this PR. Blue Ocean could take advantage of it in a few months, once the Jenkins baseline is incremented to a version including the Stapler version with this fix. In the meantime you would get the default behavior for all fields: failure to write a single-valued attribute is treated as fatal; failure to export an element of a collection (or value of a map) is treated as nonfatal. With the parameter, you could fine-tune that behavior for particular fields:

@Exported(critical=false)
public Stuff getThing() {
    // export error on that Stuff will not block exporting of the rest of this object
    return thing;
}
@Exported(critical=true) {
public List<Stuff> getThingies() {
    // export error on any individual Stuff will cause this whole object to also fail
    return thingies;
}

Currently a failure in the getter itself (as opposed to writing of its return value) will always cause export of the whole object to fail IIUC; presumably writing critical=false should also suppress that.

@vivek
Copy link
Contributor

vivek commented Jan 10, 2017

As mentioned elsewhere in this PR, this PR fix would cause unintended issue for blueocean, where instead of HTTP 500, it will be returning 200. 🐛

@jglick
Copy link
Member Author

jglick commented Jan 10, 2017

instead of HTTP 500, it will be returning 200

Right…meaning that instead of the UI being totally broken, with a cryptic HTTP error message in the browser debug log if you happen to look for it, you will get a working UI, with some (usually irrelevant) fragment of data omitted, and a clearly reported error in the Jenkins log. That seems like an unconditional improvement to me.

If there is some client which really needs to retrieve exception texts (and again this seems like a really bad idea to me, for reasons I have explained but am tired of repeating), then it would suffice to recognize a query parameter in the request which would supersede the boolean critical annotation attribute, suppress server-side logging of errors, and replace all erroneous object fields with something like

<error>java.lang.IllegalStateException: oops
	at …
</error>

or

{"error":"java.lang.IllegalStateException: oops\n\tat …\n"}

@vivek
Copy link
Contributor

vivek commented Jan 10, 2017

Right…meaning that instead of the UI being totally broken, with a cryptic HTTP error message in the browser debug log if you happen to look for it, you will get a working UI, with some (usually irrelevant) fragment of data omitted

This is unusual for serialization framework to make unilateral decision on behalf of clients. It should let client decide whether to treat it as error vs success.

@jglick
Copy link
Member Author

jglick commented Jan 10, 2017

I am not sure what the “usual” serialization framework in this context is—Stapler exporting has always had a fixed error handling behavior—but at any rate, what do you think of the proposed pair of features? Would allow both model classes and clients to exert some control over behavior.

@vivek
Copy link
Contributor

vivek commented Jan 10, 2017

To give an example of 'usual' serialization framework: Jackson JSON libraries let you define your own custom serializers, where you control serialization of objects. http://wiki.fasterxml.com/JacksonHowToCustomSerializers.

I liked James proposal of ExportInterceptor on dev list. This is the kind of solution we are looking at where we control certain serialization behavior. https://docs.google.com/document/d/1AalzfYZ0HDfwtFMUNaw_okWMLaaQ0S5Q1o6L4taevL4/edit#heading=h.vr11cprbtd0

@michaelneale
Copy link
Member

michaelneale commented Jan 10, 2017

@jglick

Right…meaning that instead of the UI being totally broken, with a cryptic HTTP error message in the browser debug log if you happen to look for it, you will get a working UI, with some (usually irrelevant) fragment of data omitted, and a clearly reported error in the Jenkins log. That seems like an unconditional improvement to me.

If that was the case, that is ok, but I believe in this case, what is ommitted isn't irrelevant, it is a whole pipeline. Its a row of data missing without failure. What is irrelevant is stuff to do with the action, not item in the collection for that pipeline.

If it only omits the action json then it would be fine, but I don't think that is what this does if I hear people correctly.

and yes, we are well out of the bounds of "usual" behavior here no matter what.

@michaelneale
Copy link
Member

michaelneale commented Jan 11, 2017

@jglick some questions:

  1. with current state of this PR, if you have a collection that contains a nested collection, for example in blue ocean there is a list of pipelines, each pipeline has a list of actions:
    if one of the actions fails to serialise with the "non critical" behavior, will the whole pipeline item be excluded from the collection? or will the pipeline still be serialized and only the action item excluded from the nested collection?
    If it is the latter, that is good, that is what we need here to solve the main issue.

  2. getting a stapler change into jenkins is already a bunch of work, why not include the critical enhancement as part of this PR anyway, as it will take some time to make use of it? EDIT: the default behavior should be for exported things to be not critical of course.

Thoughts?

@jglick
Copy link
Member Author

jglick commented Jan 11, 2017

if one of the actions fails to serialise with the "non critical" behavior, will the whole pipeline item be excluded from the collection? or will the pipeline still be serialized and only the action item excluded from the nested collection?

Yes, that is what should happen, at least based on the simplified test case I included here. Of course you could verify the behavior in the context of one of the reported bugs affecting Blue Ocean.

why not include the critical enhancement as part of this PR anyway

Could certainly do that, if there is consensus it would actually be used and so the effort is justified. If the plan is to develop a custom serialization hook system anyway, then it would be redundant and confusing to define APIs that are not going to be used.

the default behavior should be for exported things to be not critical

I think that could be considered an incompatible change. Clients may be expecting that when they receive an exported structure, all defined non-nullable fields are present. (Whereas merely omitting an element from a collection is unlikely to violate any client expectations.) So my proposal was to keep the default behavior as critical for single-value-typed fields; the definer of the model could override that for particular fields (after verifying that it is OK for any existing clients to begin to see that field missing), or a client could request to receive explicit error objects instead.

@vivek
Copy link
Contributor

vivek commented Jan 11, 2017

Yes, that is what should happen, at least based on the simplified test case I included here. Of course you could verify the behavior in the context of one of the reported bugs affecting Blue Ocean.

This is great. I am going to try it with blueocean. I guess it needs building stapler first with this PR then build Jenkins with this version of stapler. My test going to do is:

  • Call BO /pipelines API with broken action, we should get all data minus the broken action.
  • Call BO /runs API with broken action, we should get all data minus the broken action.
  • I will also call pipeline and run detail API to validate broken action is skipped.

@michaelneale
Copy link
Member

@jglick

Yes, that is what should happen, at least based on the simplified test case I included here. Of course you could verify the behavior in the context of one of the reported bugs affecting Blue Ocean.

I wasn't sure which one you were replying to - the second sentence? (ie it will just exclude stuff from the nested collection?).

My comment on the default behavior of critical - I was just referring to collections (like it is in this PR) - not all fields, agree (sorry for confusion).

@jglick
Copy link
Member Author

jglick commented Jan 11, 2017

the second sentence? (ie it will just exclude stuff from the nested collection?)

Yes, sorry for being unclear.

@vivek
Copy link
Contributor

vivek commented Jan 11, 2017

Ok so I tested this PR with blueocean and it does the right thing, that is failing actions are skipped. However as discussed earlier this also has side effect that if for some reason there is error serializing a pipeline or any other object that blueocean owns, that object will be skipped and unless user reports issue we will never know things broke. So this brings us to using annotation or better interceptor for serialization of exported beans or something equivalent.

In any case, this is good improvement where failing actions won't result in to WSOD. 👍

My test case is in blueocean-plugin's branch validate-stapler-pr89, jenkinsci/blueocean-plugin@5b7383f.

@jglick
Copy link
Member Author

jglick commented Jan 12, 2017

Good news, so it seems like there are three options for this PR:

  1. Amend this PR with something along the lines of my proposal (critical attribute on @Exported, query parameter to request errors instead), and wait to merge until all that is approved.
  2. Amend this PR to include a procedural (rather than declarative) custom serialization API, again waiting to merge until that is done.
  3. Merge as is and try to get into trunk as quickly as possible to prevent WSOD, and do further refinements like those above as a follow-up PR (author TBD) to ensure that critical mistakes in Blue Ocean itself are visible in the UI and not just logs.

@vivek / @michaelneale your preferences?

@michaelneale
Copy link
Member

michaelneale commented Jan 12, 2017

Yes that is good news.

I think #3 is good for expediency, as it gets people out of trouble when ready as long as they happen to get on a new stapler.

#1 would be theoretically ideal (if it wasn't a lot of work to get it approved) as it would behave the same as #3 but have the ability for things to opt in to "critical" for collections?

So basically - #3 with #1 follow up in separate PR I think we are ok with, now that vivek has verified it.

EDIT: @vivek thinks option 3, and wants to look at 2 separately I think.

@vivek
Copy link
Contributor

vivek commented Jan 12, 2017

@jglick @michaelneale #3 with follow up with definitely #2. #1 is nice to have too, but for blueocean its intrusive as we need to decide whats critical and whats not and change them. Something good for future interfaces where we do not need customization that #2 provides.

So basically #3 with #2 and #1 done in different PR with #2 as higher priority than #1.

@jglick
Copy link
Member Author

jglick commented Jan 12, 2017

Works for me, so: @reviewbybees please have at it

@jglick
Copy link
Member Author

jglick commented Jan 17, 2017

Did not make it into 2.41 because I am still awaiting a review.

@jglick jglick merged commit afea375 into jenkinsci:master Jan 18, 2017
@jglick jglick deleted the robustness-JENKINS-40083 branch January 18, 2017 20:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants