-
Notifications
You must be signed in to change notification settings - Fork 105
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[JENKINS-40088] Be more robust when exporting collections #89
Conversation
This pull request originates from a CloudBees employee. At CloudBees, we require that all pull requests be reviewed by other CloudBees employees before we seek to have the change accepted. If you want to learn more about our process please see this explanation. |
buffer.finished(); | ||
} catch (IOException x) { | ||
if (x.getCause() instanceof InvocationTargetException) { | ||
LOGGER.log(Level.WARNING, "skipping export of " + value, x); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like extreme log spam
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the kind of warning about something that's broken but the admin can do nothing about. As with past occurrences, this should be less than INFO unless in debug mode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not following your complaint. We routinely log unexpected exceptions at WARNING
, such as here to take just one random example. The admin can uninstall the responsible plugin, or file a bug, or patch it…
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Several logged exceptions per API request seems a bit much to me, especially given how common the problem of unexportable data seems to me. I'd expect the frequency to be about as annoying as what #48 fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I doubt it. #48 downgraded a message printed after upgrading some components against others which were still using what used to be a slightly unusual but otherwise supported construction.
The case fixed here is limited to genuine plugin bugs, generally ones which are trivial to diagnose and assign when encountered; and either trivial to fix completely, or at least ameliorate by catching in the @Exported
method.
And again we already have tons of code in Jenkins core which catches a broad range of exceptions from extensions or other plugin code, logs it, and continues, including code paths which might be encountered at a high frequency depending on usage patterns. Logging these things at FINE
will just mean that the bugs are never fixed. If there is a persistent issue with logs filling up with hundreds of more or less identical stack traces due to buggy plugins, that is best handled by the logging system, for example by printing a message saying that a given Logger
has produced too much noise and it is going to be suppressed for the rest of the session.
Build failure on one server is just a failure in the |
@jglick if getting the property value throws an exception we probably don't want the object it belongs to at all. For example, a missing property in an actions rest format might introduce unexpected behaviour. Take this action:
Would serialise as:
Blue Ocean plugins see a a valid action data for It would be better if we received a response that excluded the action entirely like:
In addition, we want to discriminate between a Action serialization failure and other types of model failure. It's fine for an Action not to serialise (they are optional) but if a Blue Ocean model object failed to serialize we want to translate that into a 500 error on the rest response. This is why we opted for a the |
BTW PR to fix the Jenkinsfile: #90 (side issue) |
That is exactly what this PR does. Look at the test. |
Then you have bigger problems. You will get a clear error in the log to fix, and the UI will get the rest of the information. Anyway note that this patch only excludes elements of collections (including array elements, or map keys & values). A single reference-typed field of some larger object which cannot be exported will cause the larger object to not be exported either, so if something required via a reference chain of non-collection fields all the way to the root throws an exception, you will still get a failed response. |
🐝 |
It sure if the scope of this should be all collections all the time though? Seems the equivalent of sweeping everything under the rug |
🐛 as this means in plenty of places you may have an NPE when showing a list, and a user sees a partial set of data - not knowing something is missing. The scope of this protection needs to be narrowed. |
@jglick I've written some test cases for this branch in a fork at https://github.com/i386/stapler/blob/9d507bd01887188309ff5ce73b88412e599c2611/core/src/test/java/org/kohsuke/stapler/export/RobustnessTest.java#L93 The behavior this PR does not cover is demonstrated in the test There are three jobs on my server. One of them will fail in our 🐛 With this PR, if I make a request like |
🐛 this PR also does not cover the use case described in the document where we want to direct users to file bugs in the right place. Because we are serializing Jenkins model as a tree and plugins contribute to this model, we effectively have a dependency on every Jenkins plugin. Asking the engineers to triage 100s of plugin crashes entering from the REST API and move to the right JIRA components isn't scalable for the Blue Ocean team. If we had a |
Hmm, talking this over with @michaelneale perhaps we should be returning the serialization error somehow. Perhaps we don't return any properties on the model apart from That would allow us to display a message for crashed plugins in the UI if they require that actions data. |
That is what exporting already did for missing
It is the responsibility of an administrator to check for errors in the log file and decide how to proceed.
Broken is broken—fix the plugin with the bug. That plugin might happen to have been written specifically “for” Blue Ocean, or it might only be incidentally participating, or somewhere in between. For Blue Ocean to be the default UI for Jenkins, with broad-ranging integrations, much of the current prototype code must be properly factored out into the logical domain plugins such as
That is the case anyway, until Jenkins 17.0 is redesigned as a federation of microservices.
Triaging poorly filed incoming bug reports is hard work. What makes the Blue Ocean team special in this regard? If you want to pay attention to bug reports at all, you need to spend some time filtering through noise. Reassigning a bug report based on a plugin name clearly visible at the top of a stack trace is the easiest case you will encounter. (BTW the NetBeans project has some automated tools to collect stack traces automatically from user installations, upload them with contextual information, search for duplicates using fuzzy matches, create collated lists of reported errors, and ultimately automatically file bug reports for commonly encountered errors, after trying to guess a component based on the packages seen in the stack trace. Perhaps Jenkins needs a similar system.) Anyway so far I have heard of one such crash, so it seems you are agonizing over a hypothetical problem.
That would potentially break existing clients not expecting such a format. If there is actually a client which could use such information meaningfully, a different format clearly distinct from actual objects could be emitted upon request (for example via a query string parameter). But anyway it is undesirable to have special-case error reporting in certain clients. Runtime errors caught in core/plugin code should be reported and collected in a uniform manner for any Jenkins functionality. |
Maybe a new annotation, or parameter to |
@jglick in Jenkins today, if an Action crashes the stack trace is reported on the UI and the user can see that there was an error. With your solution, the user wouldn't see anything and we are devolving away from how Jenkins does error reporting. So that fails your own assertion here that we should be reporting errors (making a case for
This kind of failure is completely ambiguous right now and you'd have to be a Jenkins Developer to figure this out. Take JENKINS-40083 as an example. The last three lines being the actual issue. Is it Jenkins, Stapler, Blue Ocean or Analysis Core? You'd have to be a Jenkins developer to figure this one out. If Blue Ocean touches hundreds of plugins through its REST API we absolutely, as a hard requirement, need to notify people of which component failed. Blue Ocean's aim is to build a better user experience for Jenkins and that means we have to handle failure better too. Both of these is what makes Blue Ocean different. Example
snip 10 or so lines
snip 20+ lines
What I meant was that we are loosely coupled to those plugins via REST. Microservices wouldn't help here either.
I've seen quite a few of these problems reported (up to three now and I don't know how many unreported - likely a lot given the variations in plugins). I am worried about how many I am going to see when 1.0 ships. If I can easily remove this class of problems with some small changes to Stapler that drops the risk of misattribution down quite a bit - then its no longer a concern. For example, yesterday a user of Blue Ocean updated to the latest plugins and received the broken I am not investing any of the CloudBees Blue Ocean team time building an exception reporting system - great idea, we should have one, but Blue Ocean has other fish to fry.
We would use To sum it up - perhaps Blue Ocean should just fork Stapler's export package, include it in our REST API module (have it respect |
@daniel-beck that could work nicely. |
Along the lines of
How so? Most errors in random code in Jenkins are caught and logged without any special display to the user.
Right…and that is exactly what this patch extends to Stapler exporting. I am not following your argument here.
That is true of any bug report! There is a runtime exception being thrown from a class defined in Analysis Core, so unless there is a particular reason to argue otherwise, it is a bug in that plugin.
It is true that the order of nested exceptions printed by default in Java is misleading. I would be happy to resume work on jenkinsci/jenkins#1485 to fix that.
We should rather improve error handling and reporting in Jenkins generally. Only a small proportion of errors are going to happen to have been thrown in the context of an HTTP request thread servicing the B.O. frontend.
So what are the other ones?
Yes, it happens to all features in Jenkins: bug reports without minimal analysis of root cause are filed in inappropriate places. Short of turning off public issue creation, the only solution is to have a dedicated team of triagers, so that developers of individual features are not bothered by spurious reports.
This is the part I fear would be incompatible; we would need to pick a different format which is more clearly distinguished from normal elements.
No. We need to fix problems once and for all at a single source to avoid wasted effort and duplication. |
BTW the CI failure:
|
This still doesn't solve the issue, for example: What happens now:
What would happen with this PR in this state:
Is this satisfactory? it doesn't really solve the issue that sparked this other than the REST equivalent of catching all exceptions and logging and moving on. Sometimes that is ok, but not clear to me it is here? In an app without arbitrary plugins changing behavior, this would be a clear dichotomy between returning an error always or continuing on after logging. However with plugins this is more subtle. There are some things to defend against and some to report to the user. |
Retroactively linking this to JENKINS-40088 which seems like the most appropriate general tracking issue. |
Well, we do not know for sure unless someone knows how to reproduce JENKINS-39647. I think what you would see is that pretty much everything is fine—just the data from the broken plugin is omitted (perhaps the UI was not even using that bit of data anyway). It is hard to be sure since this stack trace
does not indicate which path in the REST tree corresponds to the problem. Presumably somewhere in your request (you are using Note that the
meaning that with or without this PR, the job in question would be unusable without my jenkinsci/extensible-choice-parameter-plugin#27. |
That would be a natural enhancement to this PR. Blue Ocean could take advantage of it in a few months, once the Jenkins baseline is incremented to a version including the Stapler version with this fix. In the meantime you would get the default behavior for all fields: failure to write a single-valued attribute is treated as fatal; failure to export an element of a collection (or value of a map) is treated as nonfatal. With the parameter, you could fine-tune that behavior for particular fields: @Exported(critical=false)
public Stuff getThing() {
// export error on that Stuff will not block exporting of the rest of this object
return thing;
}
@Exported(critical=true) {
public List<Stuff> getThingies() {
// export error on any individual Stuff will cause this whole object to also fail
return thingies;
} Currently a failure in the getter itself (as opposed to writing of its return value) will always cause export of the whole object to fail IIUC; presumably writing |
As mentioned elsewhere in this PR, this PR fix would cause unintended issue for blueocean, where instead of HTTP 500, it will be returning 200. 🐛 |
Right…meaning that instead of the UI being totally broken, with a cryptic HTTP error message in the browser debug log if you happen to look for it, you will get a working UI, with some (usually irrelevant) fragment of data omitted, and a clearly reported error in the Jenkins log. That seems like an unconditional improvement to me. If there is some client which really needs to retrieve exception texts (and again this seems like a really bad idea to me, for reasons I have explained but am tired of repeating), then it would suffice to recognize a query parameter in the request which would supersede the <error>java.lang.IllegalStateException: oops
at …
</error> or {"error":"java.lang.IllegalStateException: oops\n\tat …\n"} |
This is unusual for serialization framework to make unilateral decision on behalf of clients. It should let client decide whether to treat it as error vs success. |
I am not sure what the “usual” serialization framework in this context is—Stapler exporting has always had a fixed error handling behavior—but at any rate, what do you think of the proposed pair of features? Would allow both model classes and clients to exert some control over behavior. |
To give an example of 'usual' serialization framework: Jackson JSON libraries let you define your own custom serializers, where you control serialization of objects. http://wiki.fasterxml.com/JacksonHowToCustomSerializers. I liked James proposal of ExportInterceptor on dev list. This is the kind of solution we are looking at where we control certain serialization behavior. https://docs.google.com/document/d/1AalzfYZ0HDfwtFMUNaw_okWMLaaQ0S5Q1o6L4taevL4/edit#heading=h.vr11cprbtd0 |
If that was the case, that is ok, but I believe in this case, what is ommitted isn't irrelevant, it is a whole pipeline. Its a row of data missing without failure. What is irrelevant is stuff to do with the action, not item in the collection for that pipeline. If it only omits the action json then it would be fine, but I don't think that is what this does if I hear people correctly. and yes, we are well out of the bounds of "usual" behavior here no matter what. |
@jglick some questions:
Thoughts? |
Yes, that is what should happen, at least based on the simplified test case I included here. Of course you could verify the behavior in the context of one of the reported bugs affecting Blue Ocean.
Could certainly do that, if there is consensus it would actually be used and so the effort is justified. If the plan is to develop a custom serialization hook system anyway, then it would be redundant and confusing to define APIs that are not going to be used.
I think that could be considered an incompatible change. Clients may be expecting that when they receive an exported structure, all defined non-nullable fields are present. (Whereas merely omitting an element from a collection is unlikely to violate any client expectations.) So my proposal was to keep the default behavior as critical for single-value-typed fields; the definer of the model could override that for particular fields (after verifying that it is OK for any existing clients to begin to see that field missing), or a client could request to receive explicit error objects instead. |
This is great. I am going to try it with blueocean. I guess it needs building stapler first with this PR then build Jenkins with this version of stapler. My test going to do is:
|
I wasn't sure which one you were replying to - the second sentence? (ie it will just exclude stuff from the nested collection?). My comment on the default behavior of critical - I was just referring to collections (like it is in this PR) - not all fields, agree (sorry for confusion). |
Yes, sorry for being unclear. |
Ok so I tested this PR with blueocean and it does the right thing, that is failing actions are skipped. However as discussed earlier this also has side effect that if for some reason there is error serializing a pipeline or any other object that blueocean owns, that object will be skipped and unless user reports issue we will never know things broke. So this brings us to using annotation or better interceptor for serialization of exported beans or something equivalent. In any case, this is good improvement where failing actions won't result in to WSOD. 👍 My test case is in blueocean-plugin's branch validate-stapler-pr89, jenkinsci/blueocean-plugin@5b7383f. |
Good news, so it seems like there are three options for this PR:
@vivek / @michaelneale your preferences? |
Yes that is good news. I think #3 is good for expediency, as it gets people out of trouble when ready as long as they happen to get on a new stapler. #1 would be theoretically ideal (if it wasn't a lot of work to get it approved) as it would behave the same as #3 but have the ability for things to opt in to "critical" for collections? So basically - #3 with #1 follow up in separate PR I think we are ok with, now that vivek has verified it. EDIT: @vivek thinks option 3, and wants to look at 2 separately I think. |
@jglick @michaelneale #3 with follow up with definitely #2. #1 is nice to have too, but for blueocean its intrusive as we need to decide whats critical and whats not and change them. Something good for future interfaces where we do not need customization that #2 provides. So basically #3 with #2 and #1 done in different PR with #2 as higher priority than #1. |
Works for me, so: @reviewbybees please have at it |
Did not make it into 2.41 because I am still awaiting a review. |
We were already catching
NotExportableException
, a common problem, but not other miscellaneous exceptions.JENKINS-40083 should not have caused the whole API call to fail. The test output before the fix:
Now it logs the same, but skips the broken element.
@reviewbybees