
Cluster Autoscaler Monitoring Non-Tagged Nodes and Generating Errors - Failed to check cloud provider has instance for ip-*: node is not present in aws: could not find instance #7839

Open
ratkokorlevski-rldatix opened this issue Feb 14, 2025 · 1 comment
Labels: area/cluster-autoscaler, kind/bug

Comments

@ratkokorlevski-rldatix

[BUG] Cluster Autoscaler Monitoring Non-Tagged Nodes and Generating Errors

Helm Chart Version: 9.45.0

Cloud Provider: AWS

Cluster Autoscaler Image: cluster-autoscaler:v1.32.0

Issue Description:

The Cluster Autoscaler is configured with AWS auto-discovery based on specific tags. However, it is also monitoring nodes that do not have the required discovery tags. These nodes are self-managed and should be ignored by the autoscaler. As a result, we are receiving numerous errors in the logs, which are triggering false alerts.

Values File Configuration:

installCRDs: true

tolerations:
  - key: "CriticalAddonsOnly"
    operator: "Exists"
    effect: "NoSchedule"

nodeSelector:
  kubernetes.io/os: linux

namespace: "kube-system"
cloudProvider: aws
replicaCount: 1

extraObjects:
  - apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: volumeattachments-access
    rules:
    - apiGroups: ["storage.k8s.io"]
      resources: ["volumeattachments"]
      verbs: ["get", "list", "watch"]
  - apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: cluster-autoscaler-volumeattachments-binding
    subjects:
    - kind: ServiceAccount
      name: cluster-autoscaler-sa
      namespace: kube-system
    roleRef:
      kind: ClusterRole
      name: volumeattachments-access
      apiGroup: rbac.authorization.k8s.io

awsRegion: {{metadata.annotations.aws_region}}
autoDiscovery:
  clusterName: {{metadata.annotations.aws_cluster_name}}
rbac:
  serviceAccount:
    name: cluster-autoscaler-sa
    annotations:
      eks.amazonaws.com/role-arn: "{{metadata.annotations.cluster_autoscaler_sa_role_arn}}"
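
For context, my assumption is that with these values the chart renders its default auto-discovery argument roughly as sketched below (based on the chart's documented default autoDiscovery.tags, not copied from our manifests; <cluster-name> stands for the rendered autoDiscovery.clusterName):

containers:
  - name: cluster-autoscaler
    args:
      - --cloud-provider=aws
      # Assumed flag rendered from the chart's default autoDiscovery.tags:
      # only Auto Scaling groups carrying both tags are discovered as node groups.
      - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/<cluster-name>

If that is accurate, instances outside these discovered groups are unknown to the AWS cloud provider, which appears to be why the self-managed nodes show up as "could not find instance" in the logs below.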

Error Logs:

W0214 14:09:15.647336       1 clusterstate.go:1084] Failed to check cloud provider has instance for ip-10-215-4-59.eu-west-2.compute.internal: node is not present in aws: could not find instance {aws:///eu-west-2a/i-068492674ff8b32a6 i-068492674ff8b32a6}
W0214 14:09:15.647340       1 clusterstate.go:1084] Failed to check cloud provider has instance for ip-10-215-9-69.eu-west-2.compute.internal: node is not present in aws: could not find instance {aws:///eu-west-2b/i-0dfe837fcb45cc00d i-0dfe837fcb45cc00d}
W0214 14:09:15.647347       1 clusterstate.go:1084] Failed to check cloud provider has instance for ip-10-215-12-15.eu-west-2.compute.internal: node is not present in aws: could not find instance {aws:///eu-west-2c/i-071de7e9bd5a797f2 i-071de7e9bd5a797f2}
W0214 14:09:15.647352       1 clusterstate.go:1084] Failed to check cloud provider has instance for ip-10-215-13-23.eu-west-2.compute.internal: node is not present in aws: could not find instance {aws:///eu-west-2c/i-0468b443f93d9182d i-0468b443f93d9182d}
W0214 14:09:15.647361       1 clusterstate.go:1084] Failed to check cloud provider has instance for ip-10-215-13-13.eu-west-2.compute.internal: node is not present in aws: could not find instance {aws:///eu-west-2c/i-03bc772bb2f78c1bf i-03bc772bb2f78c1bf}
W0214 14:09:15.647366       1 clusterstate.go:1084] Failed to check cloud provider has instance for ip-10-215-6-185.eu-west-2.compute.internal: node is not present in aws: could not find instance {aws:///eu-west-2a/i-0ffdfcd664721fb68 i-0ffdfcd664721fb68}

These instances are self-managed and are not tagged with the auto-discovery tags.

The Cluster Autoscaler works correctly within the Auto Scaling group that is tagged with the auto-discovery tags.

Expected Behavior:

  • The Cluster Autoscaler should monitor only the nodes with the appropriate AWS auto-discovery tags.
  • Nodes without the auto-discovery tags should be completely ignored.

Actual Behavior:

  • The autoscaler monitors self-managed nodes without the auto-discovery tags.
  • It generates repetitive errors, impacting our alerting and log clarity.

Steps to Reproduce:

  1. Deploy the Cluster Autoscaler with the above configuration.
  2. Add unmanaged nodes without the discovery tags.
  3. Observe the logs.

Suggested Solution:

  • Ensure the Cluster Autoscaler only considers nodes with the defined discovery tags (see the sketch below).
  • Provide a configuration option to explicitly ignore nodes without the tags.
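
To make the "defined discovery tags" concrete, my understanding is that the chart also lets them be pinned explicitly via autoDiscovery.tags. A minimal sketch, assuming the chart's documented default tag names (<cluster-name> again standing for the rendered clusterName):

autoDiscovery:
  clusterName: {{metadata.annotations.aws_cluster_name}}
  # Assumed example: pinning the chart's default discovery tags explicitly.
  # Only ASGs carrying both tags are treated as autoscaled node groups.
  tags:
    - k8s.io/cluster-autoscaler/enabled
    - k8s.io/cluster-autoscaler/<cluster-name>

This does not silence the warnings for the self-managed nodes; it only makes the discovery criteria explicit.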

Additional Context:

  • AWS region: eu-west-2
  • Self-managed nodes have no kubernetes.io/cluster/<cluster-name> tags.

Sensitive Data:

  • AWS account IDs and instance IDs have been redacted.

Thank you for your support in resolving this issue.

@ratkokorlevski-rldatix added the kind/bug label on Feb 14, 2025
@adrianmoisey
Member

/area cluster-autoscaler
