various run build -F/--follow fixes around timeout, lack of pods, and data races #56
Conversation
force-pushed from b368c5a to 044ab73
@adambkaplan this is at a point where I think this fix for @SaschaSchwarze0 is ready for review. It is late-ish Friday, and I'm off Monday, Oct 11, so I won't get to any review comments that come in Monday Oct 11 until Tuesday. FWIW, if any review comments come in over the weekend I'll have some cycles to spend on them. Thanks.
force-pushed from 044ab73 to d7eebe5
Manual e2e test ... I'll see about adding an e2e test using our new bats-based framework.
add new pod watcher timeout callback; print message on context/request timeout
simplify --follow init; remove follow watch lock; eliminate races
force-pushed from d7eebe5 to 829acf5
force-pushed from 51b6c75 to 85c5ebe
Meets the minimum e2e bar. Added a note on the TODO / follow-up item, unless a reviewer has a quick/easy solution that I feel is suitable for this PR.
func (r *RunCommand) Log(msg string) {
	// concurrent fmt.Fprintf(r.ioStreams.Out...) calls need locking to avoid data races, as we 'write' to the stream
	r.logLock.Lock()
	defer r.logLock.Unlock()
	fmt.Fprintf(r.ioStreams.Out, msg)
}
Have you considered using a channel for this? The Log function could write to the channel, and another goroutine could read from it and write the messages to standard out. I don't "think" that the sync.Mutex here will cause any blocking issues, but the channel might be even less likely to. Just a thought I had while looking through this.
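For reference, the channel approach could look roughly like this. This is a self-contained sketch only, not code from this PR; chanLogger, newChanLogger, and runDemo are hypothetical names. A single goroutine owns the output stream, so the write path needs no mutex:

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// chanLogger routes messages through a channel to one drain goroutine,
// so concurrent callers never write the output stream directly.
type chanLogger struct {
	msgs chan string
	done chan struct{}
	out  strings.Builder
}

func newChanLogger() *chanLogger {
	l := &chanLogger{msgs: make(chan string, 16), done: make(chan struct{})}
	go func() {
		// single reader: all writes to the stream are serialized here
		for m := range l.msgs {
			l.out.WriteString(m)
		}
		close(l.done)
	}()
	return l
}

func (l *chanLogger) Log(msg string) { l.msgs <- msg }

// Close stops accepting messages and waits for the drain goroutine to finish.
func (l *chanLogger) Close() { close(l.msgs); <-l.done }

// runDemo logs from ten goroutines concurrently and returns how many
// lines made it to the output.
func runDemo() int {
	l := newChanLogger()
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			l.Log("line\n")
		}()
	}
	wg.Wait()
	l.Close()
	return strings.Count(l.out.String(), "line")
}

func main() {
	fmt.Println(runDemo()) // all ten lines arrive, with no lock on the Log path
}
```

The trade-off versus the mutex is an explicit shutdown step (Close) to guarantee the channel is drained before the process exits.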
Yeah, I'm positive the mutex here is not a problem in the runtime code, because this method is the only one that uses it; it simply prevents the concurrent writes that race detection flagged. Concurrent writes in the runtime code were the only thing we needed to be wary of here.
Also, I say runtime code because I do leverage this in the unit tests in one (and only one) spot, to verify the contents of the out buffer.
But still, given all these qualifiers, and especially as there is no read/write interaction we are coordinating in the non-test code, I think this simpler solution is better.
Good consideration to sort out, though - thanks.
Will try it out later with my scenario.
pkg/shp/cmd/build/run.go
Outdated
}
if giveUp {
	r.Log(msg)
	r.Log(fmt.Sprintf("exting 'ship build run --follow' for BuildRun %q", br.Name))
Suggested change:
- r.Log(fmt.Sprintf("exting 'ship build run --follow' for BuildRun %q", br.Name))
+ r.Log(fmt.Sprintf("exiting 'shp build run --follow' for BuildRun %q", br.Name))
pkg/shp/cmd/build/run.go
Outdated
if err != nil {
	r.Log(fmt.Sprintf("error accessing BuildRun %q: %s", r.buildRunName, err.Error()))
} else {
	c := br.Status.GetCondition(buildv1alpha1.Succeeded)
	giveUp := false
	msg := ""
	switch {
	case c != nil && c.Status == corev1.ConditionTrue:
		giveUp = true
		msg = fmt.Sprintf("BuildRun '%s' has been marked as successful.\n", br.Name)
	case c != nil && c.Status == corev1.ConditionFalse:
		giveUp = true
		msg = fmt.Sprintf("BuildRun '%s' has been marked as failed.\n", br.Name)
	case br.IsCanceled():
		giveUp = true
		msg = fmt.Sprintf("BuildRun '%s' has been canceled.\n", br.Name)
	case br.DeletionTimestamp != nil:
		giveUp = true
		msg = fmt.Sprintf("BuildRun '%s' has been deleted.\n", br.Name)
	case !br.HasStarted():
		r.Log(fmt.Sprintf("BuildRun '%s' has been marked as failed.\n", br.Name))
	}
	if giveUp {
		r.Log(msg)
		r.Log(fmt.Sprintf("exting 'ship build run --follow' for BuildRun %q", br.Name))
		r.stop()
	}
}
Suggested change - from:

if err != nil {
	r.Log(fmt.Sprintf("error accessing BuildRun %q: %s", r.buildRunName, err.Error()))
} else {
	c := br.Status.GetCondition(buildv1alpha1.Succeeded)
	giveUp := false
	msg := ""
	switch {
	case c != nil && c.Status == corev1.ConditionTrue:
		giveUp = true
		msg = fmt.Sprintf("BuildRun '%s' has been marked as successful.\n", br.Name)
	case c != nil && c.Status == corev1.ConditionFalse:
		giveUp = true
		msg = fmt.Sprintf("BuildRun '%s' has been marked as failed.\n", br.Name)
	case br.IsCanceled():
		giveUp = true
		msg = fmt.Sprintf("BuildRun '%s' has been canceled.\n", br.Name)
	case br.DeletionTimestamp != nil:
		giveUp = true
		msg = fmt.Sprintf("BuildRun '%s' has been deleted.\n", br.Name)
	case !br.HasStarted():
		r.Log(fmt.Sprintf("BuildRun '%s' has been marked as failed.\n", br.Name))
	}
	if giveUp {
		r.Log(msg)
		r.Log(fmt.Sprintf("exting 'ship build run --follow' for BuildRun %q", br.Name))
		r.stop()
	}
}

to:

if err != nil {
	r.Log(fmt.Sprintf("error accessing BuildRun %q: %s", r.buildRunName, err.Error()))
	return
}
c := br.Status.GetCondition(buildv1alpha1.Succeeded)
switch {
case c != nil && c.Status == corev1.ConditionTrue:
	r.Log(fmt.Sprintf("BuildRun '%s' has been marked as successful.\n", br.Name))
	r.stop()
case c != nil && c.Status == corev1.ConditionFalse:
	r.Log(fmt.Sprintf("BuildRun '%s' has been marked as failed.\n", br.Name))
	r.stop()
case br.IsCanceled():
	r.Log(fmt.Sprintf("BuildRun '%s' has been canceled.\n", br.Name))
	r.stop()
case br.DeletionTimestamp != nil:
	r.Log(fmt.Sprintf("BuildRun '%s' has been deleted.\n", br.Name))
	r.stop()
case !br.HasStarted():
	r.Log(fmt.Sprintf("BuildRun '%s' has been marked as failed.\n", br.Name))
}
I think this is cleaner and makes it easier to tell exactly what is going on, instead of having to track to the end of the switch statement to see what happens.
I agree. Maybe in one of my iterations prior to push I had something after the else block, but there's no telling :-)
pkg/shp/cmd/build/run_test.go
Outdated
@@ -1,21 +1,21 @@
package build

import (
	"runtime"
	"bytes"
	"github.com/shipwright-io/cli/pkg/shp/reactor"
this is in the wrong group
add lock to avoid data race on Out stream used for logging
force-pushed from 85c5ebe to 71ba9a0
/approve

Updates based on comments from @SaschaSchwarze0 and @coreydaley pushed, included in the existing commits. Ideally, we minimally wait for feedback from @SaschaSchwarze0 per #56 (review) that it addresses his scenario before potentially tagging for merge, so I'll put a /hold on it now in case someone wants to lgtm the code changes; we won't release the hold until @SaschaSchwarze0 says it worked for him, or that he won't be able to get to it.

To be honest, I don't have a ready-made "build is valid, but the build run create will result in a Pod never getting created" scenario, but if I get out from under today, I'll spend some time trying to devise one.
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: gabemontero. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
@gabemontero were you able to resolve the issues with the builds not succeeding?
You mean the new e2e's? No. And depending on the timing of the remaining items, I'm inclined to leave the e2e as is (since it can still verify the log is being followed, which is the key) and to open an issue to clean that up when someone has time to look into it.
/lgtm
opened #57 for ^^
Bump - @SaschaSchwarze0, do you think you'll be able to try this PR in your env in the next few days? Thanks.
/unhold
Changes
Fixes #53
Use the --request-timeout and --context options that shp inherits from k8s to apply request or context deadline timeouts / aborts on shp build run --follow
Submitter Checklist
See the contributor guide
for details on coding conventions, github and prow interactions, and the code review process.
/kind bug
Release Notes
@adambkaplan
@coreydaley
@SaschaSchwarze0
@otaviof