Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

if a side container in a taskrun failes, entire taskrun should fail #729

Closed
chmouel opened this issue Apr 4, 2019 · 10 comments
Closed

if a side container in a taskrun failes, entire taskrun should fail #729

chmouel opened this issue Apr 4, 2019 · 10 comments
Labels
design This task is about creating and discussing a design kind/documentation Categorizes issue or PR as related to documentation. kind/question Issues or PRs that are questions around the project or a particular feature

Comments

@chmouel
Copy link
Member

chmouel commented Apr 4, 2019

Reusing the title from #682 since they are the same but this time we are using side container instead of init ones

Expected Behavior

When you have one step that fails it should be marked as fail and not continue to the next step

Actual Behavior

Continue to the next steps and show as Completed/Succeeded

Steps to Reproduce the Problem

Sample task

apiVersion: tekton.dev/v1alpha1
kind: Task
metadata:
  name: echo-hello-world
spec:
  steps:
    - name: echo
      image: ubuntu
      command:
        - echo
      args:
        - "hello world"
    - name: error
      image: ubuntu
      command:
        - exit
    - name: final
      image: ubuntu
      command:
        - echo
      args:
        - "hello moto"

First step should success, second shoudl error and third should work as seen from the containerStatuses :

- containerID: cri-o://e8f02dae0c84276cc1fb15b282c8b5241042525c13576fca4353239484e92436
    image: docker.io/library/ubuntu:latest
    imageID: docker.io/library/ubuntu@sha256:f2557f94cac1cc4509d0483cb6e302da841ecd6f82eb2e91dc7ba6cfd0c580ab
    lastState: {}
    name: build-step-echo
    ready: false
    restartCount: 0
    state:
      terminated:
        containerID: cri-o://e8f02dae0c84276cc1fb15b282c8b5241042525c13576fca4353239484e92436
        exitCode: 0
        finishedAt: 2019-04-04T22:06:00Z
        reason: Completed
        startedAt: 2019-04-04T22:06:00Z
  - containerID: cri-o://9358cbe2aeeecbe571c102d6489fb3a366c9d6198217d022d5b7a2647ad62998
    image: docker.io/library/ubuntu:latest
    imageID: docker.io/library/ubuntu@sha256:f2557f94cac1cc4509d0483cb6e302da841ecd6f82eb2e91dc7ba6cfd0c580ab
    lastState: {}
    name: build-step-error
    ready: false
    restartCount: 0
    state:
      terminated:
        containerID: cri-o://9358cbe2aeeecbe571c102d6489fb3a366c9d6198217d022d5b7a2647ad62998
        exitCode: 1
        finishedAt: 2019-04-04T22:06:01Z
        reason: Error
        startedAt: 2019-04-04T22:06:01Z
  - containerID: cri-o://f400bcd955d115462011c9cafe786d5b3431f4b0fc37d440ce04e6edca4f9b52
    image: docker.io/library/ubuntu:latest
    imageID: docker.io/library/ubuntu@sha256:f2557f94cac1cc4509d0483cb6e302da841ecd6f82eb2e91dc7ba6cfd0c580ab
    lastState: {}
    name: build-step-final
    ready: false
    restartCount: 0
    state:
      terminated:
        containerID: cri-o://f400bcd955d115462011c9cafe786d5b3431f4b0fc37d440ce04e6edca4f9b52
        exitCode: 0
        finishedAt: 2019-04-04T22:06:02Z
        reason: Completed
        startedAt: 2019-04-04T22:06:02Z

but third step should have not been started since the second one has failed and should show as completed/success :

% k get pod
NAME                                           READY   STATUS      RESTARTS   AGE
echo-hello-world-task-run-pod-cfd7fd           0/4     Completed   0          3m30s
% k get taskrun
NAME                        TYPE        STATUS   STARTTIME   COMPLETIONTIME
echo-hello-world-task-run   Succeeded   False    4m38s       4m24s

Additional Info

@vdemeester
Copy link
Member

@chmouel So… it's a bit trickier. As we don't use initContainers, all containers in a Pod starts in parallel… So the flow is the following:

  • all the container are starting, initial step command being "transformed" using entrypoint …
  • if it's the first step, it starts right away, and write a file when finished (if failed, it's gonna prepend .err to that file)
  • if it's not the first step, it's gonna wait for a file (well actually either the "success file" or the "error file") to be written (by the previous step) before actually running the real step command.
    • If the previous step was error, it's doing a no-op (aka exit 0)

This can be a bit confusing as it will exit 0 and thus makes us think it actually executed. It kinda has been discuss here quikcly 😅

@chmouel
Copy link
Member Author

chmouel commented Apr 5, 2019

I see thanks for taking the time to describes how it works, I guess then it's a UI problem, to check if the containers have failed or not.

Is it a pattern that we are going to keep ? if it is then perhaps we can document it ? (I would not mind doing it)

@vdemeester vdemeester added design This task is about creating and discussing a design kind/question Issues or PRs that are questions around the project or a particular feature labels Apr 5, 2019
@vdemeester
Copy link
Member

I see thanks for taking the time to describes how it works, I guess then it's a UI problem, to check if the containers have failed or not.

Is it a pattern that we are going to keep ? if it is then perhaps we can document it ? (I would not mind doing it)

Right… Good question 😅. I'm not a huge fan of " it's a UI problem, to check if the containers have failed or not." as it means each UI implementation will have to have this knowledge (and if it changed or evolve, well. we break them). I think that:

  1. We should document it (for sure !)
  2. We may want to think of a better way to show that those steps are skip due to previous errors. In the PR we discussed maybe using a special exit code for that.

@chmouel
Copy link
Member Author

chmouel commented Apr 5, 2019

Perhaps as a first step, we can maybe fix that the status is not set as "Succeeded" when there is a failure ? For example :

status:
  conditions:
  - lastTransitionTime: 2019-04-04T22:06:04Z
    message: 'build step "build-step-error" exited with code 1 (image: "docker.io/library/ubuntu@sha256:f2557f94cac1cc4509d0483cb6e302da841ecd6f82eb2e91dc7ba6cfd0c580ab");
      for logs run: kubectl -n tekton-pipelines logs echo-hello-world-task-run-pod-cfd7fd
      -c build-step-error'
    status: "False"
    type: Succeeded

the type here get sets as failed after all the task has finished ?

that would make explicit that there is an issue for UI to then go digg into the issue ?

@vdemeester
Copy link
Member

@chmouel as discussed on Slack, this conditions are "standard", and it is kinda explicit for the consumer that this is a failure : condition of type Succeeded is false — hence it's a failure. Also this is what knative/serving is looking for and it would be a huge API changes just to rename it 😓

@chmouel
Copy link
Member Author

chmouel commented Apr 5, 2019

fair enough, so let's try to document this, then. (there is a nice github tag for this?)

@vdemeester vdemeester added the kind/documentation Categorizes issue or PR as related to documentation. label Apr 5, 2019
@chmouel chmouel closed this as completed Apr 11, 2019
@bobcatfish
Copy link
Collaborator

I just want to double check @chmouel @vdemeester do we want to keep this open so we update the docs? or did that already happen?

@bobcatfish bobcatfish reopened this Apr 16, 2019
@vdemeester
Copy link
Member

@bobcatfish we may want to keep it opened until docs are clearer about it, indeed

@vdemeester
Copy link
Member

I think we can "close" this 👼 @imjasonh proposal on API spec should help with making it more clear but it's "standard" conditions 👼

/close

@tekton-robot
Copy link
Collaborator

@vdemeester: Closing this issue.

In response to this:

I think we can "close" this 👼 @imjasonh proposal on API spec should help with making it more clear but it's "standard" conditions 👼

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
design This task is about creating and discussing a design kind/documentation Categorizes issue or PR as related to documentation. kind/question Issues or PRs that are questions around the project or a particular feature
Projects
None yet
Development

No branches or pull requests

4 participants