CFE Failover - moving Alias IP ranges #26

Open
leeanntsab opened this issue Dec 12, 2024 · 1 comment

Comments

@leeanntsab

leeanntsab commented Dec 12, 2024

Describe the bug

This is not necessarily a bug but more of an issue the customer is experiencing in GCP.

The customer has deployed two BIG-IP instances in GCP using a GDM template. CFE is deployed using tags/labels to identify the resources for failover. The customer recently experienced a network outage on GCP, which triggered CFE to fail over from peer A to peer B, but the failover did not complete. In the GCP console, the customer observed that the alias IP ranges were removed successfully from the failed instance but were never added to the new active instance. In the GCP logs, the customer saw invalid fingerprint errors during the failed failover event. Per Google support, this means that there are multiple requests to the resource and/or the resource is being used by another process. In addition, we observed this error in restnoded: "IP x.x.x.x is already being used by another resource."
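To illustrate what Google support described, here is a toy model of a fingerprint-guarded update. It is only a sketch: `FakeNetworkInterface`, `StaleFingerprintError`, and the address ranges are hypothetical names made up for this example, and nothing here calls the real GCP API or reflects how CFE is implemented. It shows how a second writer can invalidate the fingerprint that a first writer read earlier.

```python
import hashlib


class StaleFingerprintError(Exception):
    """Raised when an update carries an out-of-date fingerprint (hypothetical)."""


class FakeNetworkInterface:
    """Toy stand-in for a resource guarded by a fingerprint (not the GCP API)."""

    def __init__(self, alias_ranges):
        self.alias_ranges = list(alias_ranges)

    @property
    def fingerprint(self):
        # Derive the fingerprint from the current contents.
        data = ",".join(self.alias_ranges).encode()
        return hashlib.sha256(data).hexdigest()[:12]

    def update(self, alias_ranges, fingerprint):
        # Reject the write if the caller read an older version of the resource.
        if fingerprint != self.fingerprint:
            raise StaleFingerprintError("invalid fingerprint")
        self.alias_ranges = list(alias_ranges)


nic = FakeNetworkInterface(["10.0.1.0/28"])
seen = nic.fingerprint               # writer A reads the fingerprint
nic.update([], nic.fingerprint)      # writer B updates the resource first
try:
    nic.update(["10.0.2.0/28"], seen)  # writer A now holds a stale fingerprint
    outcome = "applied"
except StaleFingerprintError:
    # Re-read the current fingerprint and try again.
    nic.update(["10.0.2.0/28"], nic.fingerprint)
    outcome = "retried"
```

In this model the first attempt by writer A is rejected, and only the attempt made with a freshly read fingerprint succeeds.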

Expected behavior

The expected behavior is that the CFE process moves the alias IP ranges from the failed instance to the new active instance by removing the entries from the failed instance and then adding them to the new active instance.
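The expected sequence can be sketched with an in-memory model. This is only an illustration of the remove-then-add order: `move_alias_ranges`, the instance names, and the address ranges are hypothetical and are not taken from the CFE source.

```python
def move_alias_ranges(instances, failed, active):
    """Move every alias IP range from `failed` to `active` in a dict model."""
    ranges = list(instances[failed])
    # Step 1: remove the entries from the failed instance.
    instances[failed] = []
    # Step 2: add the entries to the new active instance, skipping duplicates.
    current = instances[active]
    instances[active] = current + [r for r in ranges if r not in current]
    return ranges


# Hypothetical instance names and ranges, for illustration only.
instances = {"bigip-a": ["10.0.1.0/28", "10.0.2.0/28"], "bigip-b": []}
moved = move_alias_ranges(instances, "bigip-a", "bigip-b")
```

In the failure the customer reported, step 1 completes but step 2 never takes effect, which leaves both instances without the alias ranges.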

Current behavior

This step of the CFE process does not complete. The customer observes the removal of the alias IP entries, but they are never added to the new active instance, so traffic never fails over to the peer.

Possible solution

Troubleshooting with the customer and Matt (internal) indicated that the issue is with a timing mechanism inside GCP that governs when the alias IP can move. To allow for this extra time, the customer suggests adding a sleep or delay timer of about 30 seconds to the CFE script before it adds the alias IPs to the new peer.
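A minimal sketch of the suggested delay, assuming the add step can be wrapped in a callable. `add_after_delay`, `AddressInUseError`, and `flaky_add` are hypothetical names for this example and are not part of CFE. The optional `retries` parameter goes beyond what the customer proposed and is off by default.

```python
import time


class AddressInUseError(Exception):
    """Hypothetical error: the range is still attached to another resource."""


def add_after_delay(add_ranges, delay_seconds=30, retries=0, sleep=time.sleep):
    """Wait `delay_seconds`, then call `add_ranges`; optionally retry."""
    attempt = 0
    while True:
        # Give the platform time to release the ranges before adding them.
        sleep(delay_seconds)
        try:
            return add_ranges()
        except AddressInUseError:
            if attempt >= retries:
                raise
            attempt += 1


# Demo with a fake sleep so the example does not actually wait 30 seconds.
waits = []
calls = {"n": 0}


def flaky_add():
    # Fails on the first call, succeeds on the second.
    calls["n"] += 1
    if calls["n"] == 1:
        raise AddressInUseError("IP x.x.x.x is already being used by another resource")
    return "added"


result = add_after_delay(flaky_add, delay_seconds=30, retries=1, sleep=waits.append)
```

A fixed sleep is the simplest form of the suggestion. The retry variant trades a longer worst-case failover time for tolerance of a release that takes longer than the single delay.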

Steps to reproduce

You can reproduce this by configuring two BIG-IP instances in active/standby with CFE in GCP, using any of the GDM templates. Assign one or more alias IP ranges to the instances, then force the current active instance to standby. You should observe the behavior described above.

@mikeshimkus
Contributor

@leeanntsab Please open a case with F5 if this is still an issue (if you haven't already).
