Unable to pull EKS add-on images or SSM with FIPS bottlerocket images in an isolated VPC #4372

Open
JaseKoonce opened this issue Jan 23, 2025 · 6 comments
Labels
status/needs-triage Pending triage or re-evaluation type/bug Something isn't working

Comments

@JaseKoonce

Image I'm using:
bottlerocket-aws-k8s-1.31-fips-x86_64-v1.31.0-24c0647f

What I expected to happen:
The worker nodes should be able to pull the necessary EKS add-on images, and I should be able to connect to the worker nodes through SSM.

What actually happened:
The pods are unable to pull the necessary images and I am unable to SSM into the worker nodes.

How to reproduce the problem:
Deploy worker nodes using any FIPS-enabled Bottlerocket image into an isolated VPC, then attempt to deploy Amazon EKS add-ons or to connect to the worker nodes through SSM.

@JaseKoonce JaseKoonce added status/needs-triage Pending triage or re-evaluation type/bug Something isn't working labels Jan 23, 2025
@arnaldo2792
Contributor

Hey @JaseKoonce, thanks for letting us know about your problem. Could you please share a few details of your setup?

  • If possible, could you please share the instance system logs? You can retrieve them with the AWS CLI with:
aws ec2 get-console-output --instance-id <INSTANCE_ID> --output text --latest
  • Which region are you using?
  • Regarding your "isolated VPC", how are you isolating it? Are you following what's described in this guide, or using a different configuration?
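
Console output tends to be long, so it can help to filter it down to error-level lines before posting. A minimal Python sketch, assuming the AWS CLI is installed and configured; the helper names `console_output_command`, `error_lines`, and `fetch_error_lines` are made up for illustration:

```python
import subprocess
from typing import List


def console_output_command(instance_id: str) -> List[str]:
    """Build the AWS CLI call from the comment above as an argument list."""
    return [
        "aws", "ec2", "get-console-output",
        "--instance-id", instance_id,
        "--output", "text",
        "--latest",
    ]


def error_lines(console_text: str) -> List[str]:
    """Keep only the lines that carry a level=error field."""
    return [line for line in console_text.splitlines() if "level=error" in line]


def fetch_error_lines(instance_id: str) -> List[str]:
    """Run the CLI (needs credentials and network access) and filter its output."""
    result = subprocess.run(
        console_output_command(instance_id),
        capture_output=True, text=True, check=True,
    )
    return error_lines(result.stdout)
```

Only `fetch_error_lines` touches the network; the other two helpers can be tried on a saved copy of the console output.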

@JaseKoonce
Author

Command output (limited to what I believe to be relevant):

[  OK  ] Reached target Multi-User System.
[  785.504514] host-containers@control[2646]: time="2025-01-23T15:36:33Z" level=error msg="retries exhausted: failed to resolve reference \"ecr.aws/arn:aws:ecr-fips:us-gov-west-1:<removed>:repository/bottlerocket-control:v0.7.20\": RequestError: send request failed\ncaused by: Post \"https://ecr-fips.us-gov-west-1.amazonaws.com/\": dial tcp 15.200.151.187:443: i/o timeout" ref="ecr.aws/arn:aws:ecr-fips:us-gov-west-1:<removed>:repository/bottlerocket-control:v0.7.20"

Please let me know if you need additional output.

Region: us-gov-west-1

The VPC has no egress other than VPC endpoints. We have the following endpoints deployed:

ecr.dkr
ecr-api
ssm
ssm-messages
ec2
ec2messages
sts

@arnaldo2792
Contributor

Thanks! We are looking into this!

@arnaldo2792
Contributor

Hello @JaseKoonce, could you please open a support case with AWS and ask to be routed to the Bottlerocket maintainers? The problem is not with Bottlerocket but rather with an AWS service.

@JaseKoonce
Author

JaseKoonce commented Jan 24, 2025

Thank you @arnaldo2792, we already have one open and went ahead and added this info. Thank you for your time and help.

@arnaldo2792
Contributor

Sorry for the late reply; this was a bit of a ride! I figured out why Bottlerocket FIPS AMIs fail when used in private subnets.

ECR doesn't expose its ecr-fips endpoints over PrivateLink (there is a request here). Because Bottlerocket FIPS AMIs are configured with use_fips_endpoint=true, every call that pulls container images from ECR attempts to use the ecr-fips endpoint, even when the image URL uses the normal ecr endpoint. In a VPC with no other egress, those calls time out:

msg="Failed while calling BatchGetImage" error="RequestError: send request failed\ncaused by: Post \"https://ecr-fips.us-west-2.amazonaws.com/\": dial tcp 18.246.116.42:443: i/o timeout" ref="ecr.aws/arn:aws:ecr:us-west-2:328549459982:repository/bottlerocket-control:v0.7.20"
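
The mismatch in that log line (a FIPS host dialed for a non-FIPS image ARN) can be spotted mechanically. The following is a rough Python sketch, not part of any Bottlerocket tooling; the regular expressions and the `PullFailure` and `parse_pull_failure` names are assumptions based only on the shape of the two log lines quoted in this thread:

```python
import re
from typing import NamedTuple, Optional

# Host inside `Post \"https://host/\"`, with or without the escaping backslash.
_HOST_RE = re.compile(r'Post \\?"https://([^/"\\]+)')
# Service field of the ARN embedded in the `ref="..."` value.
_ARN_RE = re.compile(r'ref="[^"]*arn:[a-z-]+:([a-z0-9-]+):')

SAMPLE_LINE = (
    r'msg="Failed while calling BatchGetImage" '
    r'error="RequestError: send request failed\ncaused by: '
    r'Post \"https://ecr-fips.us-west-2.amazonaws.com/\": '
    r'dial tcp 18.246.116.42:443: i/o timeout" '
    r'ref="ecr.aws/arn:aws:ecr:us-west-2:328549459982:repository/bottlerocket-control:v0.7.20"'
)


class PullFailure(NamedTuple):
    host: str          # endpoint the client actually dialed
    arn_service: str   # service named in the image ARN
    fips_forced: bool  # True when a FIPS host was dialed for a non-FIPS ARN


def parse_pull_failure(log_line: str) -> Optional[PullFailure]:
    """Extract the dialed host and ARN service from one log line, if present."""
    host = _HOST_RE.search(log_line)
    arn = _ARN_RE.search(log_line)
    if not host or not arn:
        return None
    dialed, service = host.group(1), arn.group(1)
    dialed_fips = dialed.split(".")[0].endswith("-fips")
    return PullFailure(dialed, service, dialed_fips and not service.endswith("-fips"))


if __name__ == "__main__":
    print(parse_pull_failure(SAMPLE_LINE))
```

For the line above, `fips_forced` comes out `True`: the ARN names `ecr`, but the request went to `ecr-fips`.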

Setting use_fips_endpoint=false solved the problem. After I changed the value, I was able to connect to my instance through SSM, because the control container image was pulled through the ECR PrivateLink endpoint.
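
The comment does not say how the value was changed. One plausible route, offered here as an assumption rather than the maintainer's method, is Bottlerocket's `settings.aws.config` setting, which takes the base64-encoded content of an AWS config file. The Python sketch below renders such a user data fragment. The `[default]` profile and the helper names are illustrative, and whether this overrides the FIPS variant's default should be checked against the Bottlerocket settings documentation:

```python
import base64


def aws_config_text(use_fips_endpoint: bool) -> str:
    """Render a minimal AWS shared config file with the FIPS toggle."""
    value = "true" if use_fips_endpoint else "false"
    return f"[default]\nuse_fips_endpoint={value}\n"


def bottlerocket_user_data(use_fips_endpoint: bool) -> str:
    """Render a TOML user data fragment carrying the base64-encoded config."""
    encoded = base64.b64encode(aws_config_text(use_fips_endpoint).encode()).decode()
    return f'[settings.aws]\nconfig = "{encoded}"\n'


if __name__ == "__main__":
    print(bottlerocket_user_data(use_fips_endpoint=False))
```

With the toggle off, AWS API calls from the host go to the non-FIPS endpoints, so this trades FIPS endpoint usage for reachability over PrivateLink.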
