various run build -F/--follow fixes around timeout, lack of pods, and data races #56
Conversation
force-pushed from b368c5a to 044ab73
@adambkaplan this is at a point where I think this fix for @SaschaSchwarze0 is ready for review. It is late-ish Friday, and I'm off Monday, Oct 11, so I won't get to any review comments that come in Monday Oct 11 until Tuesday. FWIW, if any review comments come in over the weekend I'll have some cycles to spend on them. Thanks.
force-pushed from 044ab73 to d7eebe5
Manual e2e test ... I'll see about adding an e2e test using our new bats-based framework.
add new pod watcher timeout callback; print message on context/request timeout
simplify --follow init; remove follow watch lock; eliminate races
force-pushed from d7eebe5 to 829acf5
force-pushed from 51b6c75 to 85c5ebe
Meets the minimum e2e bar. Added a note on the TODO / follow-up item, unless a reviewer has a quick/easy solution that I feel is suitable for this PR.
func (r *RunCommand) Log(msg string) {
	// concurrent fmt.Fprintf(r.ioStreams.Out...) calls need locking to avoid data races, as we 'write' to the stream
	r.logLock.Lock()
	defer r.logLock.Unlock()
	fmt.Fprintf(r.ioStreams.Out, msg)
}
Have you considered using a channel for this? The Log function could write to the channel, and another goroutine could read from it and write the messages to standard out. I don't "think" that the sync.Mutex here will cause any blocking issues, but the channel might be even less likely to. Just a thought I had while looking through this.
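For reference, the channel approach could look roughly like this. This is a self-contained sketch only, not code from this PR; chanLogger, newChanLogger, and runDemo are hypothetical names. A single goroutine owns the output stream, so the write path needs no mutex:

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// chanLogger routes messages through a channel to one drain goroutine,
// so concurrent callers never write the output stream directly.
type chanLogger struct {
	msgs chan string
	done chan struct{}
	out  strings.Builder
}

func newChanLogger() *chanLogger {
	l := &chanLogger{msgs: make(chan string, 16), done: make(chan struct{})}
	go func() {
		// single reader: all writes to the stream are serialized here
		for m := range l.msgs {
			l.out.WriteString(m)
		}
		close(l.done)
	}()
	return l
}

func (l *chanLogger) Log(msg string) { l.msgs <- msg }

// Close stops accepting messages and waits for the drain goroutine to finish.
func (l *chanLogger) Close() { close(l.msgs); <-l.done }

// runDemo logs from ten goroutines concurrently and returns how many
// lines made it to the output.
func runDemo() int {
	l := newChanLogger()
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			l.Log("line\n")
		}()
	}
	wg.Wait()
	l.Close()
	return strings.Count(l.out.String(), "line")
}

func main() {
	fmt.Println(runDemo()) // all ten lines arrive, with no lock on the Log path
}
```

The trade-off versus the mutex is an explicit shutdown step (Close) to guarantee the channel is drained before the process exits.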
Yeah, I'm positive the mutex here is not a problem in the runtime code, because this method is the only one that uses it; it simply prevents the concurrent writes that race detection flagged. Concurrent writes in the runtime code were the only thing we needed to be wary of here.
Also, I say runtime code because I do leverage this in the unit tests in one (and only one) spot, to verify the contents of the out buffer.
But still, given all these qualifiers, and especially as there is no read/write interaction we are coordinating in the non-test code, I think this simpler solution is better.
Good consideration to sort out, though - thanks.
Will try it out later with my scenario.
pkg/shp/cmd/build/run.go
Outdated
}
if giveUp {
	r.Log(msg)
	r.Log(fmt.Sprintf("exting 'ship build run --follow' for BuildRun %q", br.Name))
Suggested change:
- r.Log(fmt.Sprintf("exting 'ship build run --follow' for BuildRun %q", br.Name))
+ r.Log(fmt.Sprintf("exiting 'shp build run --follow' for BuildRun %q", br.Name))
pkg/shp/cmd/build/run.go
Outdated
if err != nil {
	r.Log(fmt.Sprintf("error accessing BuildRun %q: %s", r.buildRunName, err.Error()))
} else {
	c := br.Status.GetCondition(buildv1alpha1.Succeeded)
	giveUp := false
	msg := ""
	switch {
	case c != nil && c.Status == corev1.ConditionTrue:
		giveUp = true
		msg = fmt.Sprintf("BuildRun '%s' has been marked as successful.\n", br.Name)
	case c != nil && c.Status == corev1.ConditionFalse:
		giveUp = true
		msg = fmt.Sprintf("BuildRun '%s' has been marked as failed.\n", br.Name)
	case br.IsCanceled():
		giveUp = true
		msg = fmt.Sprintf("BuildRun '%s' has been canceled.\n", br.Name)
	case br.DeletionTimestamp != nil:
		giveUp = true
		msg = fmt.Sprintf("BuildRun '%s' has been deleted.\n", br.Name)
	case !br.HasStarted():
		r.Log(fmt.Sprintf("BuildRun '%s' has been marked as failed.\n", br.Name))
	}
	if giveUp {
		r.Log(msg)
		r.Log(fmt.Sprintf("exting 'ship build run --follow' for BuildRun %q", br.Name))
		r.stop()
	}
}
Suggested change - from:

if err != nil {
	r.Log(fmt.Sprintf("error accessing BuildRun %q: %s", r.buildRunName, err.Error()))
} else {
	c := br.Status.GetCondition(buildv1alpha1.Succeeded)
	giveUp := false
	msg := ""
	switch {
	case c != nil && c.Status == corev1.ConditionTrue:
		giveUp = true
		msg = fmt.Sprintf("BuildRun '%s' has been marked as successful.\n", br.Name)
	case c != nil && c.Status == corev1.ConditionFalse:
		giveUp = true
		msg = fmt.Sprintf("BuildRun '%s' has been marked as failed.\n", br.Name)
	case br.IsCanceled():
		giveUp = true
		msg = fmt.Sprintf("BuildRun '%s' has been canceled.\n", br.Name)
	case br.DeletionTimestamp != nil:
		giveUp = true
		msg = fmt.Sprintf("BuildRun '%s' has been deleted.\n", br.Name)
	case !br.HasStarted():
		r.Log(fmt.Sprintf("BuildRun '%s' has been marked as failed.\n", br.Name))
	}
	if giveUp {
		r.Log(msg)
		r.Log(fmt.Sprintf("exting 'ship build run --follow' for BuildRun %q", br.Name))
		r.stop()
	}
}

to:

if err != nil {
	r.Log(fmt.Sprintf("error accessing BuildRun %q: %s", r.buildRunName, err.Error()))
	return
}
c := br.Status.GetCondition(buildv1alpha1.Succeeded)
switch {
case c != nil && c.Status == corev1.ConditionTrue:
	r.Log(fmt.Sprintf("BuildRun '%s' has been marked as successful.\n", br.Name))
	r.stop()
case c != nil && c.Status == corev1.ConditionFalse:
	r.Log(fmt.Sprintf("BuildRun '%s' has been marked as failed.\n", br.Name))
	r.stop()
case br.IsCanceled():
	r.Log(fmt.Sprintf("BuildRun '%s' has been canceled.\n", br.Name))
	r.stop()
case br.DeletionTimestamp != nil:
	r.Log(fmt.Sprintf("BuildRun '%s' has been deleted.\n", br.Name))
	r.stop()
case !br.HasStarted():
	r.Log(fmt.Sprintf("BuildRun '%s' has been marked as failed.\n", br.Name))
}
I think this is cleaner and makes it easier to tell exactly what is going on, instead of having to track to the end of the switch statement to see what happens.
I agree. Maybe in one of my iterations prior to push I had something after the else block, but there's no telling :-)
pkg/shp/cmd/build/run_test.go
Outdated
@@ -1,21 +1,21 @@
package build

import (
	"runtime"
	"bytes"
	"github.com/shipwright-io/cli/pkg/shp/reactor"
this is in the wrong group
add lock to avoid data race on Out stream used for logging
force-pushed from 85c5ebe to 71ba9a0
/approve

Updates based on comments from @SaschaSchwarze0 and @coreydaley pushed, included in the existing commits. Ideally, we minimally wait for feedback from @SaschaSchwarze0 per #56 (review) that it addresses his scenario before potentially tagging for merge, so I'll put a /hold on it now in case someone wants to lgtm the code changes; we won't release the hold until @SaschaSchwarze0 says it worked for him, or that he won't be able to get to it.

To be honest, I don't have a ready-made "build is valid, but the build run create will result in a Pod never getting created" scenario, but if I get out from under today, I'll spend some time trying to devise one.
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: gabemontero. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
@gabemontero were you able to resolve the issues with the builds not succeeding?
You mean the new e2e's? No. And depending on the timing of the remaining items, I'm inclined to leave the e2e as is (since it can still verify the log is being followed, which is the key) and to open an issue to clean that up when someone has time to look into it.
/lgtm
opened #57 for ^^
Bump - @SaschaSchwarze0, do you think you'll be able to try this PR in your env in the next few days? Thanks.
/unhold
Changes
Fixes #53
Use the --request-timeout and --context options that shp inherits from k8s to apply request or context deadline timeouts / aborts on shp build run --follow
Submitter Checklist
See the contributor guide
for details on coding conventions, github and prow interactions, and the code review process.
/kind bug
Release Notes
@adambkaplan
@coreydaley
@SaschaSchwarze0
@otaviof