[Flaking-Test] VPA - pkg/admission-controller/certs_test.go #7831

Open
omerap12 opened this issue Feb 13, 2025 · 11 comments
Labels
area/vertical-pod-autoscaler help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/flake Categorizes issue or PR as related to a flaky test. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@omerap12
Member

The recent PR #7454 introduced a feature that allows the VPA admission controller to reload the caBundle certificate and patch the webhook. Since its merge, we've observed a flaky test that was added in the same PR.

The test TestUnchangedCAReloader intermittently fails with the following error:

E0212 15:12:52.541801   26749 certs.go:88] "Failed to reload certificate" err="open /tmp/TestKeypairReloader3869426187/001/cert.crt: no such file or directory"
--- FAIL: TestUnchangedCAReloader (1.73s)
    certs_test.go:384: 
        	Error Trace:	/home/runner/work/autoscaler/autoscaler/go/src/k8s.io/autoscaler/vertical-pod-autoscaler/pkg/admission-controller/certs_test.go:384
        	Error:      	Should be false
        	Test:       	TestUnchangedCAReloader
        	Messages:   	expected patch to not be called
    certs_test.go:393: 
        	Error Trace:	/home/runner/work/autoscaler/autoscaler/go/src/k8s.io/autoscaler/vertical-pod-autoscaler/pkg/admission-controller/certs_test.go:393
        	Error:      	Not equal: 
        	            	expected: "LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0t..." (base64-encoded PEM certificate, truncated here)
        	            	actual  : ""
        	            	
        	            	Diff:
        	            	--- Expected
        	            	+++ Actual
        	            	@@ -1 +1 @@
        	            	-LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0t... (base64-encoded PEM certificate, truncated here)
        	            	+
        	Test:       	TestUnchangedCAReloader
        	Messages:   	expected CA to not change
E0212 15:12:57.508017   26749 config.go:151] "Unable to parse webhook labels" err="labels 'foo,bar' are invalid, the format should be: 'key1:value1,key2:value2'"
FAIL

A failure from a recent run can be seen in this GitHub Actions run: https://github.com/kubernetes/autoscaler/actions/runs/13288454108/job/37102891947

The log shows two symptoms: the certificate reload fails intermittently, and the webhook patch is invoked when the test expects it not to be. Further investigation is needed to determine the root cause and stabilize the test.

/area vertical-pod-autoscaler
/kind flake

@k8s-ci-robot k8s-ci-robot added area/vertical-pod-autoscaler kind/flake Categorizes issue or PR as related to a flaky test. labels Feb 13, 2025
@omerap12
Member Author

cc @adrianmoisey

@omerap12
Member Author

/triage accepted

@k8s-ci-robot k8s-ci-robot added the triage/accepted Indicates an issue or PR is ready to be actively worked on. label Feb 13, 2025
@adrianmoisey
Member

> E0212 15:12:52.541801   26749 certs.go:88] "Failed to reload certificate" err="open /tmp/TestKeypairReloader3869426187/001/cert.crt: no such file or directory"

This seems strange. I looked into the code that was writing the cert, and it seems fine to me. I also checked the filesystem on GitHub Actions at /tmp, and it's a regular ext4 filesystem.

I tried to reproduce this locally, but couldn't get it to fail.

@omerap12
Member Author

@adrianmoisey could it be something related to the runner's OS? We're using ubuntu-latest. Have you tried reproducing it locally with this architecture? If not, I can set up a VM and test it there.

@adrianmoisey
Member

> @adrianmoisey could it be something related to the runner's OS? We're using ubuntu-latest. Have you tried reproducing it locally with this architecture? If not, I can set up a VM and test it there.

I highly doubt it, since it's a VM running a stable OS and a stable filesystem.

One option is to run the test in a loop on GitHub Actions. That could help reproduce it.

@omerap12
Member Author

> > @adrianmoisey could it be something related to the runner's OS? We're using ubuntu-latest. Have you tried reproducing it locally with this architecture? If not, I can set up a VM and test it there.
>
> I highly doubt it, since it's a VM running a stable OS and a stable filesystem.
>
> One option is to run the test in a loop on GitHub Actions. That could help reproduce it.

Good idea

@adrianmoisey
Member

Another similar flake: https://github.com/kubernetes/autoscaler/actions/runs/13314589637/job/37185375732?pr=7834

E0213 18:35:01.939167   25032 certs.go:88] "Failed to reload certificate" err="tls: private key does not match public key"
--- FAIL: TestUnchangedCAReloader (1.65s)
    certs_test.go:384: 
        	Error Trace:	/home/runner/work/autoscaler/autoscaler/go/src/k8s.io/autoscaler/vertical-pod-autoscaler/pkg/admission-controller/certs_test.go:384
        	Error:      	Should be false
        	Test:       	TestUnchangedCAReloader
        	Messages:   	expected patch to not be called
E0213 18:35:12.790110   25032 config.go:151] "Unable to parse webhook labels" err="labels 'foo,bar' are invalid, the format should be: 'key1:value1,key2:value2'"
FAIL
FAIL	k8s.io/autoscaler/vertical-pod-autoscaler/pkg/admission-controller	23.299s

@omerap12
Member Author

omerap12 commented Feb 17, 2025

I'll try to dig into this in the future, but any help will be appreciated.
/help

@k8s-ci-robot
Contributor

@omerap12:
This request has been marked as needing help from a contributor.

Guidelines

Please ensure that the issue body includes answers to the following questions:

  • Why are we solving this issue?
  • To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
  • Does this issue have zero to low barrier of entry?
  • How can the assignee reach out to you for help?

For more details on the requirements of such an issue, please see here and ensure that they are met.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.

In response to this:

> I'll try to dig into this in the future, but any help will be appreciated.
> /help

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Feb 17, 2025
@adrianmoisey
Member

This happened again: https://github.com/kubernetes/autoscaler/actions/runs/13444469376/job/37566506181?pr=7813

Similar flake to the previous one:

E0220 20:58:31.252767   24830 certs.go:88] "Failed to reload certificate" err="tls: failed to find any PEM data in certificate input"
--- FAIL: TestUnchangedCAReloader (3.49s)
    certs_test.go:384: 
        	Error Trace:	/home/runner/work/autoscaler/autoscaler/go/src/k8s.io/autoscaler/vertical-pod-autoscaler/pkg/admission-controller/certs_test.go:384
        	Error:      	Should be false
        	Test:       	TestUnchangedCAReloader
        	Messages:   	expected patch to not be called
E0220 20:58:42.779159   24830 config.go:151] "Unable to parse webhook labels" err="labels 'foo,bar' are invalid, the format should be: 'key1:value1,key2:value2'"
FAIL

@adrianmoisey
Member

/assign

I'm intrigued now.

3 participants