[Umbrella issue] How do we monitor k8s-infra? #2588
For this milestone, I would like to focus on fleshing out the methods and practices we should use to monitor k8s-infra. /area infra
/priority important-longterm
If Prometheus is the tool we pick, I'm happy to jump in and help; I have a decent amount of experience with it.
Related to: - Ref: kubernetes#2588 Bootstrap a new subrepo that will host the Terraform resources consuming the GCP monitoring API. Signed-off-by: Arnaud Meukam <[email protected]>
Related to: - Ref: kubernetes#2588 Bootstrap a new subrepo that will host the Terraform resources consuming the GCP monitoring API. I also bumped the Terraform provider for this subrepo and will update the other declarations of the provider in a follow-up PR. Signed-off-by: Arnaud Meukam <[email protected]>
Related: - Part of: kubernetes#2588 - Fixes: kubernetes#2942 - followup of: kubernetes#2898 Ensure service account tf-monitoring-deployer can be used in build cluster prow-build-trusted Signed-off-by: Arnaud Meukam <[email protected]>
/milestone v1.24
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
/remove-lifecycle stale
/milestone clear
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
/remove-lifecycle stale
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
/remove-lifecycle stale
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten
/remove-lifecycle rotten
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
/remove-lifecycle stale
/lifecycle frozen
There is an effort to deploy a unified monitoring stack; see #7377. /close
@ameukam: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
We initially had this conversation in #401.
Also kubernetes/test-infra#23317 (comment):
Some questions from @thockin:
Cluster monitoring
a) What should we use?
GKE workload metrics: https://cloud.google.com/stackdriver/docs/solutions/gke/managing-metrics#workload-metrics
Managed Service for Prometheus: https://cloud.google.com/stackdriver/docs/managed-prometheus
b) How do we set it up with GitOps?
- #1376
- #1624
c) What exactly are we concerned about (signals)?
d) How are alerts delivered to a group of people?
e) How do we manage that group?
f) Do we need an on-call rotation?
App monitoring
a) Same tool as cluster monitoring?
b) What is the minimum expectation for an app to be deployed into community space?
c) How do we manage groups of alerts for each app (ggroups?)
d) How do we manage on-call for each app?
GCP quotas monitoring
How do we monitor them?
More questions can be added.
/milestone v1.23
/area infra