
aws_eks: Cluster creation with AlbControllerOptions is running into error in China Region #28420

Closed
YikaiHu opened this issue Dec 19, 2023 · 3 comments
Labels
@aws-cdk/aws-eks (Related to Amazon Elastic Kubernetes Service)
bug (This issue is a bug.)
closed-for-staleness (This issue was automatically closed because it hadn't received any attention in a while.)
effort/medium (Medium work item – several days of effort)
p2
response-requested (Waiting on additional info and feedback. Will move to "closing-soon" in 7 days.)

Comments

@YikaiHu

YikaiHu commented Dec 19, 2023

Describe the bug

Same as #22005, but in the China region.

We found that it may be caused by the image failing to be pulled in the cn-north-1 region:

Failed to pull image "602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-load-balancer-controller:v2.4.1": rpc error: code = Unknown desc = failed to pull and unpack image "602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-load-balancer-controller:v2.4.1": failed to resolve reference "602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-load-balancer-controller:v2.4.1": pulling from host 602401143452.dkr.ecr.us-west-2.amazonaws.com failed with status code [manifests v2.4.1]: 401 Unauthorized

Expected Behavior

Creation of the custom resource Custom::AWSCDK-EKS-HelmChart to be successful.

Current Behavior

Custom::AWSCDK-EKS-HelmChart is running into the error "Received response status [FAILED] from custom resource. Message returned: Error: b'Error: UPGRADE FAILED: another operation (install/upgrade/rollback) is in progress'"

Reproduction Steps

k logs aws-load-balancer-controller-75c785bc8c-72zpg -n kube-system

Error from server (BadRequest): container "aws-load-balancer-controller" in pod "aws-load-balancer-controller-75c785bc8c-72zpg" is waiting to start: trying and failing to pull image

kubectl describe pod aws-load-balancer-controller-75c785bc8c-72zpg -n kube-system

Name: aws-load-balancer-controller-75c785bc8c-72zpg
Namespace: kube-system
Priority: 2000000000
Priority Class Name: system-cluster-critical
Service Account: aws-load-balancer-controller
Node: ip-10-0-3-136.cn-north-1.compute.internal/10.0.3.136
Start Time: Mon, 17 Jul 2023 16:30:59 +0800
Labels: app.kubernetes.io/instance=aws-load-balancer-controller
app.kubernetes.io/name=aws-load-balancer-controller
pod-template-hash=75c785bc8c
Annotations: kubernetes.io/psp: eks.privileged
prometheus.io/port: 8080
prometheus.io/scrape: true
Status: Pending
IP: 10.0.3.160
IPs:
IP: 10.0.3.160
Controlled By: ReplicaSet/aws-load-balancer-controller-75c785bc8c
Containers:
aws-load-balancer-controller:
Container ID:
Image: 602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-load-balancer-controller:v2.4.1
Image ID:
Ports: 9443/TCP, 8080/TCP
Host Ports: 0/TCP, 0/TCP
Command:
/controller
Args:
--cluster-name=Workshop-Cluster
--ingress-class=alb
--aws-region=cn-north-1
--aws-vpc-id=vpc-0e4a9201452c76b0e
State: Waiting
Reason: ErrImagePull
Ready: False
Restart Count: 0
Liveness: http-get http://:61779/healthz delay=30s timeout=10s period=10s #success=1 #failure=2
Environment:
AWS_STS_REGIONAL_ENDPOINTS: regional
AWS_DEFAULT_REGION: cn-north-1
AWS_REGION: cn-north-1
AWS_ROLE_ARN: arn:aws-cn:iam::743271379588:role/clo-workshop-07-CLWorkshopEC2AndEKSeksClusterStack-1XO6CGEC91JGY
AWS_WEB_IDENTITY_TOKEN_FILE: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
Mounts:
/tmp/k8s-webhook-server/serving-certs from cert (ro)
/var/run/secrets/eks.amazonaws.com/serviceaccount from aws-iam-token (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jct6t (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
aws-iam-token:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 86400
cert:
Type: Secret (a volume populated by a Secret)
SecretName: aws-load-balancer-tls
Optional: false
kube-api-access-jct6t:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional:
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors:
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
Normal Scheduled 16m default-scheduler Successfully assigned kube-system/aws-load-balancer-controller-75c785bc8c-72zpg to ip-10-0-3-136.cn-north-1.compute.internal
Normal Pulling 14m (x4 over 16m) kubelet Pulling image "602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-load-balancer-controller:v2.4.1"
Warning Failed 14m (x4 over 16m) kubelet Failed to pull image "602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-load-balancer-controller:v2.4.1": rpc error: code = Unknown desc = failed to pull and unpack image "602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-load-balancer-controller:v2.4.1": failed to resolve reference "602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-load-balancer-controller:v2.4.1": pulling from host 602401143452.dkr.ecr.us-west-2.amazonaws.com failed with status code [manifests v2.4.1]: 401 Unauthorized
Warning Failed 14m (x4 over 16m) kubelet Error: ErrImagePull
Warning Failed 14m (x6 over 16m) kubelet Error: ImagePullBackOff
Normal BackOff 87s (x62 over 16m) kubelet Back-off pulling image "602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-load-balancer-controller:v2.4.1"

Possible Solution

No response

Additional Information/Context

No response

CDK CLI Version

2.40.0

Framework Version

No response

Node.js Version

16.17.0

OS

macos 12.5.1

Language

TypeScript

Language Version

No response

Other information

013241004608.dkr.ecr.us-gov-west-1.amazonaws.com/amazon/aws-load-balancer-controller:v2.4.1
151742754352.dkr.ecr.us-gov-east-1.amazonaws.com/amazon/aws-load-balancer-controller:v2.4.1
558608220178.dkr.ecr.me-south-1.amazonaws.com/amazon/aws-load-balancer-controller:v2.4.1
590381155156.dkr.ecr.eu-south-1.amazonaws.com/amazon/aws-load-balancer-controller:v2.4.1
602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/amazon/aws-load-balancer-controller:v2.4.1
602401143452.dkr.ecr.ap-northeast-2.amazonaws.com/amazon/aws-load-balancer-controller:v2.4.1
602401143452.dkr.ecr.ap-northeast-3.amazonaws.com/amazon/aws-load-balancer-controller:v2.4.1
602401143452.dkr.ecr.ap-south-1.amazonaws.com/amazon/aws-load-balancer-controller:v2.4.1
602401143452.dkr.ecr.ap-southeast-1.amazonaws.com/amazon/aws-load-balancer-controller:v2.4.1
602401143452.dkr.ecr.ap-southeast-2.amazonaws.com/amazon/aws-load-balancer-controller:v2.4.1
602401143452.dkr.ecr.ca-central-1.amazonaws.com/amazon/aws-load-balancer-controller:v2.4.1
602401143452.dkr.ecr.eu-central-1.amazonaws.com/amazon/aws-load-balancer-controller:v2.4.1
602401143452.dkr.ecr.eu-north-1.amazonaws.com/amazon/aws-load-balancer-controller:v2.4.1
602401143452.dkr.ecr.eu-west-1.amazonaws.com/amazon/aws-load-balancer-controller:v2.4.1
602401143452.dkr.ecr.eu-west-2.amazonaws.com/amazon/aws-load-balancer-controller:v2.4.1
602401143452.dkr.ecr.eu-west-3.amazonaws.com/amazon/aws-load-balancer-controller:v2.4.1
602401143452.dkr.ecr.sa-east-1.amazonaws.com/amazon/aws-load-balancer-controller:v2.4.1
602401143452.dkr.ecr.us-east-1.amazonaws.com/amazon/aws-load-balancer-controller:v2.4.1
602401143452.dkr.ecr.us-east-2.amazonaws.com/amazon/aws-load-balancer-controller:v2.4.1
602401143452.dkr.ecr.us-west-1.amazonaws.com/amazon/aws-load-balancer-controller:v2.4.1
602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-load-balancer-controller:v2.4.1
800184023465.dkr.ecr.ap-east-1.amazonaws.com/amazon/aws-load-balancer-controller:v2.4.1
877085696533.dkr.ecr.af-south-1.amazonaws.com/amazon/aws-load-balancer-controller:v2.4.1
918309763551.dkr.ecr.cn-north-1.amazonaws.com.cn/amazon/aws-load-balancer-controller:v2.4.1
961992271922.dkr.ecr.cn-northwest-1.amazonaws.com.cn/amazon/aws-load-balancer-controller:v2.4.1
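The per-region repository list above can be captured in a small helper. The following is a hypothetical sketch in TypeScript — `albControllerRepository` and `ALB_ECR_ACCOUNTS` are not part of any library; the account IDs are copied from the list above, and regions not in the special-case map fall back to the default `602401143452` account:

```typescript
// Hypothetical mapping of regions that use a non-default ECR account for
// the aws-load-balancer-controller image, taken from the list above.
const ALB_ECR_ACCOUNTS: Record<string, string> = {
  'cn-north-1': '918309763551',
  'cn-northwest-1': '961992271922',
  'us-gov-west-1': '013241004608',
  'us-gov-east-1': '151742754352',
  'me-south-1': '558608220178',
  'eu-south-1': '590381155156',
  'ap-east-1': '800184023465',
  'af-south-1': '877085696533',
};

// Build the regional image URI; China-partition registries use the
// .amazonaws.com.cn domain suffix.
function albControllerRepository(region: string, version = 'v2.4.1'): string {
  const account = ALB_ECR_ACCOUNTS[region] ?? '602401143452';
  const suffix = region.startsWith('cn-') ? '.amazonaws.com.cn' : '.amazonaws.com';
  return `${account}.dkr.ecr.${region}${suffix}/amazon/aws-load-balancer-controller:${version}`;
}
```

For example, `albControllerRepository('cn-north-1')` reproduces the cn-north-1 entry in the list above.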

I found a solution in kubernetes-sigs/aws-load-balancer-controller#1694: you can manually replace the ECR image URL in the CloudFormation template.

https://github.com/kubernetes-sigs/aws-load-balancer-controller/releases?page=2

@YikaiHu YikaiHu added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Dec 19, 2023
@github-actions github-actions bot added the @aws-cdk/aws-eks Related to Amazon Elastic Kubernetes Service label Dec 19, 2023
@pahud
Contributor

pahud commented Dec 19, 2023

Thank you for your feedback. For China regions, I guess we could replace the image with the solution you provided as a workaround. Thank you.

@pahud pahud added p2 effort/medium Medium work item – several days of effort and removed needs-triage This issue or PR still needs to be triaged. labels Dec 19, 2023
@pahud
Contributor

pahud commented Dec 19, 2023

@YikaiHu Looks like you can specify the optional image URI in here?

readonly repository?: string;
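A sketch of what pahud's suggestion could look like, assuming the CDK `AlbControllerOptions.repository` property accepts a regional ECR repository URI; the account ID and host below come from the cn-north-1 entry in the list above, and the cluster settings are illustrative, not a verified fix:

```typescript
import { Stack } from 'aws-cdk-lib';
import * as eks from 'aws-cdk-lib/aws-eks';
import { Construct } from 'constructs';

class EksAlbStack extends Stack {
  constructor(scope: Construct, id: string) {
    super(scope, id);
    new eks.Cluster(this, 'Cluster', {
      version: eks.KubernetesVersion.V1_27,
      albController: {
        version: eks.AlbControllerVersion.V2_4_1,
        // Point the controller at the China-partition ECR repository instead
        // of the default us-west-2 one, which returns 401 from cn-north-1.
        repository: '918309763551.dkr.ecr.cn-north-1.amazonaws.com.cn/amazon/aws-load-balancer-controller',
      },
    });
  }
}
```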

@pahud pahud added the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Dec 19, 2023

This issue has not received a response in a while. If you want to keep this issue open, please leave a comment below and auto-close will be canceled.

@github-actions github-actions bot added closing-soon This issue will automatically close in 4 days unless further comments are made. closed-for-staleness This issue was automatically closed because it hadn't received any attention in a while. and removed closing-soon This issue will automatically close in 4 days unless further comments are made. labels Dec 22, 2023