Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make sure application mode clusters aren't stuck in status updating #710

Merged
merged 2 commits into from
Aug 17, 2023

Conversation

live-wire
Copy link
Contributor

Fixes #709

  • I am not sure if there is any benefit from setting the status to Restarting, so removed it. It shouldn't be updater's responsibility to change desired state.
  • In cases of Deploy failed, the status kept looping between: Restarting > Deploying > Failed > Restarting. I believe the reconciler should look at a Failed job and decide to respawn a new one.

Copy link
Contributor

@regadas regadas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m back next week but iirc restart state is actually useful i some stages. I’m just sensing that we might not be catching some state that is causing the state to be stuck. I can help out with this next week.

@regadas
Copy link
Contributor

regadas commented Aug 2, 2023

@live-wire do you have a specific job where this is happening? or is it happening for every job?

@live-wire
Copy link
Contributor Author

@regadas Seems to happen on every job (with or without high availability flink properties). To reproduce it, just delete the job manager pod.

@regadas regadas merged commit bf6cc0e into master Aug 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Job Manager is not brought back up
2 participants