time sync woes, CockroachDB cluster failure following add/expunge testing #7534

davepacheco opened this issue Feb 13, 2025 · 14 comments

@davepacheco (Collaborator)

I wanted to record here in GitHub the investigation of several related issues in the dublin test environment today. For folks within Oxide, this information originally came from an internal document, but I intend to include all the relevant bits here.

Context:

  • this is a Racklette environment with four sleds in cubbies 14, 15, 16, and 17.
  • @leftwo was doing many rounds of sled add/expunge testing, expunging and then re-adding sleds 15 and 17 in a loop. Sleds 14 and 16 have never been expunged.
  • Problems showed up after at least 7 loops and several hours.

Initial symptoms: after a few hours, the CockroachDB cluster was found offline:

Feb 12 14:07:07.794   sled_id=$(omdb db sleds | grep BRM23230010 | awk '{print $7}')
Feb 12 14:07:07.794 WARN Failed to make connection, backend: [fd00:1122:3344:101::3]:32221, error: could not connect to server: Connection refused                                                                             
        Is the server running on host "fd00:1122:3344:101::3" and accepting                                    
        TCP/IP connections on port 32221?

CockroachDB on sled 14 is offline, with the SMF service in maintenance:

svc:/oxide/cockroachdb:default (CockroachDB)
  Zone: oxz_cockroachdb_46a676ec-b910-44d2-8a68-e3058b33e74e
 State: maintenance since Wed Feb 12 09:23:19 2025
Reason: Restarting too quickly.
   See: http://illumos.org/msg/SMF-8000-L5
   See: /pool/ext/5026d687-4899-4bd8-a059-d92bf3dcad5e/crypt/zone/oxz_cockroachdb_46a676ec-b910-44d2-8a68-e3058b33e74e/root/var/svc/log/oxide-cockroachdb:default.log
Impact: This service is not running.

It took some digging to find the real problem from the logs, but ultimately it was in /data/logs/cockroach.oxzcockroachdb46a676ec-b910-44d2-8a68-e3058b33e74e.root.2025-02-12T09_23_18Z.021171.log:

...
F250212 09:23:18.853798 705 1@server/server.go:248 ⋮ [n1] 40  clock synchronization error: this node is more than 500ms away from at least half of the known nodes (1 of 2 are within the offset)
...
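For context, CockroachDB enforces a maximum tolerated clock offset (500ms by default): a node that measures itself more than that far from at least half of the other known nodes terminates itself, which is exactly what this fatal line reports. Here's a rough sketch of that rule in Rust; this is illustrative only, not CockroachDB's actual implementation, and the sample values are made up:

// Rough sketch of the self-check described by the fatal log line above.
// `offsets_ms` holds this node's measured clock offsets to each known peer.
fn should_self_terminate(offsets_ms: &[f64], max_offset_ms: f64) -> bool {
    if offsets_ms.is_empty() {
        return false;
    }
    let outside = offsets_ms.iter().filter(|o| o.abs() > max_offset_ms).count();
    // "more than 500ms away from at least half of the known nodes"
    2 * outside >= offsets_ms.len()
}

fn main() {
    // A node seeing all of its peers ~385ms away stays up...
    assert!(!should_self_terminate(&[385.0, 386.0, 385.5, 385.7], 500.0));
    // ...but once at least half of them are past 500ms, it shuts down.
    assert!(should_self_terminate(&[512.0, 509.0, 511.0, 120.0], 500.0));
    println!("offset check behaves as described");
}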

Here's a summary of what we believe happened. I'll put more details in separate comments (for others who helped with the investigation, please add your notes too!):

  1. The system initially set up successfully.
    • The two boundary NTP zones were deployed on sleds 14 and 15.
    • 5 CockroachDB nodes were deployed on sled 14 (1 node), sled 16 (2 nodes), and sled 15 and 17 (2 nodes between them).
  2. Shortly after setup, concurrently with the first sled being expunged (sled 17), the boundary NTP server on sled 14 lost external connectivity. This is not yet understood.
  3. Several more sled add/expunge cycles completed, apparently successfully: sled 15, then sled 17 again, then sled 15 again, then sled 17 again, etc. Each time a sled was expunged, it was re-added. As part of these operations, the second boundary NTP zone (the one not on sled 14) wound up bouncing between sleds 15 and 17: when the sled hosting this zone was expunged, a replacement boundary NTP zone got placed onto the sled that had just been re-added. Note that when this happened, that sled previously had an internal NTP zone, so time was already sync'd with one of the two boundary NTP zones.
  4. Although everything appeared to be working, in at least the last of these loops the newly-deployed boundary NTP zone also had no external network connectivity, just like the one on sled 14. It appears that in step 3, when a sled whose internal NTP zone had already sync'd time had that zone replaced with a boundary NTP zone, the system clock immediately began drifting at about 47 ppm (about 0.047ms per second).
  5. In the specific case we looked into:
    • Sled 14 continued to host one boundary NTP server. Its clock appears to have remained in sync through all of this.
    • Sled 17 had the other boundary NTP server. Its clock was drifting somewhat quickly from real time.
    • For reasons not yet understood, sled 16's internal NTP server was tracking sled 17's boundary NTP rather than sled 14's.
    • Recall that for the CockroachDB cluster: one node was on sled 14, tracking sled 14's time (which was tracking the correct time); the remaining four nodes were on sled 16 and 17, both tracking sled 17's time (which was drifting).
  6. After about 2h15m, the drift exceeded Cockroach's tolerances. CockroachDB on sled 14 was the outlier (if ironically the only correct one). It shut itself down, resulting in the initial symptoms reported above.
  7. At this point, the cluster was still healthy, but down to only 4 of the usual 5 nodes. 3 nodes are required for the cluster to be available, even for reads.
  8. After a few more hours, expungement testing resumed by clean-slate'ing sled 17. This took out two CockroachDB nodes, reducing the cluster to 2 working nodes. 2 < 3, so the cluster ground to a halt.
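As a quick aside on steps 7 and 8, here is the quorum arithmetic as a minimal Rust sketch, assuming the relevant ranges are replicated across all five CockroachDB nodes (which is what the 3-of-5 requirement above implies):

// Majority quorum needed for a Raft-replicated range to serve requests.
fn quorum(replicas: usize) -> usize {
    replicas / 2 + 1
}

fn main() {
    let needed = quorum(5); // 3
    println!("a 5-node cluster needs {needed} live nodes");
    println!("4 live nodes: available = {}", 4 >= needed); // true  (step 7)
    println!("2 live nodes: available = {}", 2 >= needed); // false (step 8)
}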

It was around this time we started debugging.

  • I believe at this point, the clocks were pretty much sync'd up again. I'm guessing this happened because when sled 17 was removed, sled 16 had to start using sled 14 for its upstream NTP, so 3 of the 3 possible CockroachDB nodes were tracking the same clock.
  • We used svcadm clear to bring up the CockroachDB node on sled 14 and it came up fine.

There remain a few open questions:

  • Why did the boundary NTP zones lose external network connectivity, starting when one unrelated sled was expunged?
  • Why did sled 16 track sled 17's clock, even though sled 17 was reporting itself as stratum 11 (known not to be sync'd with upstream) while sled 14 was reporting stratum 4?

There are a few issues we probably want to file here:

  • Whatever's causing the connectivity to be lost
  • It seems like this configuration of boundary NTP servers makes possible a split-brain situation that can put the CockroachDB cluster at risk. Might we want to configure things differently so that if one boundary NTP zone can't sync with upstream, but internally everything is working, the rack is still guaranteed to have a consistent time?
@davepacheco (Collaborator Author)

Collecting sled identities

Sled expungements:

root@[fd00:1122:3344:103::3]:32221/omicron> select id,serial_number,time_modified from sled WHERE sled_policy = 'expunged' ORDER BY time_modified;
               	id              	| serial_number |     	time_modified
---------------------------------------+---------------+--------------------------------
  6d5906e3-6d45-4dd4-836c-b3d3876186c4 | BRM23230010   | 2025-02-11 22:36:34.459588+00
  8fa59c49-b8f9-4ebc-915a-6d4ad47a2c5f | BRM27230037   | 2025-02-11 23:51:48.032935+00
  68ca4aa3-c1ce-4b51-963a-b3bdc2c7c858 | BRM23230010   | 2025-02-12 01:47:44.908351+00
  17f710e3-37b6-44c0-86ee-a979967071d8 | BRM27230037   | 2025-02-12 03:47:35.455577+00
  e6bfacbc-ac31-475b-865a-8e0a55aeecde | BRM23230010   | 2025-02-12 05:29:09.924015+00
  b36d653a-c4c1-45f7-8a03-79be17781917 | BRM27230037   | 2025-02-12 06:47:23.609849+00
(6 rows)

Cubby numbers:

root@oxz_switch0:~# pilot host list
CUBBY IP                    	SERIAL  	IMAGE
14	fe80::aa40:25ff:fe04:351  BRM42220026 ci e5d74e5/e083ffd 2025-02-11 18:42
15	fe80::aa40:25ff:fe04:614  BRM27230037 ci e5d74e5/e083ffd 2025-02-11 18:42
16	fe80::aa40:25ff:fe04:6d3  BRM23230018 ci e5d74e5/e083ffd 2025-02-11 18:42

(BRM23230010 is currently powered off and presumed to be sled 17)

root@[fd00:1122:3344:103::3]:32221/omicron> select s.id,s.serial_number,s.time_modified,to_hex(sa.subnet_octet),s.ip from sled s LEFT JOIN sled_underlay_subnet_allocation s
a ON (s.id = sa.sled_id) WHERE s.sled_policy = 'expunged' ORDER BY time_modified;
               	id              	| serial_number |     	time_modified     	| to_hex |      	ip
---------------------------------------+---------------+-------------------------------+--------+------------------------
  6d5906e3-6d45-4dd4-836c-b3d3876186c4 | BRM23230010   | 2025-02-11 22:36:34.459588+00 | NULL   | fd00:1122:3344:104::1
  8fa59c49-b8f9-4ebc-915a-6d4ad47a2c5f | BRM27230037   | 2025-02-11 23:51:48.032935+00 | NULL   | fd00:1122:3344:102::1
  68ca4aa3-c1ce-4b51-963a-b3bdc2c7c858 | BRM23230010   | 2025-02-12 01:47:44.908351+00 | 21 	| fd00:1122:3344:121::1
  17f710e3-37b6-44c0-86ee-a979967071d8 | BRM27230037   | 2025-02-12 03:47:35.455577+00 | 22 	| fd00:1122:3344:122::1
  e6bfacbc-ac31-475b-865a-8e0a55aeecde | BRM23230010   | 2025-02-12 05:29:09.924015+00 | 23 	| fd00:1122:3344:123::1
  b36d653a-c4c1-45f7-8a03-79be17781917 | BRM27230037   | 2025-02-12 06:47:23.609849+00 | 24 	| fd00:1122:3344:124::1
(6 rows)

Live sleds:

root@oxz_switch0:~# omdb db sleds
note: using database URL postgresql://root@[fd00:1122:3344:101::3]:32221/omicron?sslmode=disable
note: database schema version matches expected (124.0.0)
note: listing all commissioned sleds (use -F to filter, e.g. -F in-service)
 SERIAL   	IP                         	ROLE  	POLICY  	STATE   ID                              	 
 BRM27230037  [fd00:1122:3344:126::1]:12345  -     	in service  active  2ea49c14-47a1-46c8-abcc-67128965d58e
 BRM23230018  [fd00:1122:3344:103::1]:12345  scrimlet  in service  active  3d3e5b66-4b0e-49a1-b694-f62e96f57fd1
 BRM23230010  [fd00:1122:3344:125::1]:12345  -     	in service  active  9f15aad3-1a92-4c68-bc06-87159fab5764
 BRM42220026  [fd00:1122:3344:101::1]:12345  scrimlet  in service  active  e755cbaf-e092-4208-b202-85a56f5a318b

CockroachDB node identities:

root@oxz_cockroachdb_46a676ec:~# cockroach node status --insecure --host [fd00:1122:3344:103::3]:32221
  id |        	address         	|      	sql_address       	|      	build      	|     	started_at     	|     	updated_at     	| locality | is_available | is_live
-----+--------------------------------+--------------------------------+-------------------------+----------------------------+----------------------------+----------+--------------+----------
   1 | [fd00:1122:3344:101::3]:32221  | [fd00:1122:3344:101::3]:32221  | v22.1.22-29-g865aff1595 | 2025-02-12 18:06:49.245464 | 2025-02-12 20:03:49.733492 |      	| true     	| true
   2 | [fd00:1122:3344:103::3]:32221  | [fd00:1122:3344:103::3]:32221  | v22.1.22-29-g865aff1595 | 2025-02-11 21:54:19.44737  | 2025-02-12 20:03:51.030739 |      	| true     	| true
   6 | [fd00:1122:3344:103::21]:32221 | [fd00:1122:3344:103::21]:32221 | v22.1.22-29-g865aff1595 | 2025-02-11 22:36:12.93359  | 2025-02-12 20:03:48.872761 |      	| true     	| true
  16 | NULL                       	| NULL                       	| NULL                	| NULL                   	| 2025-02-12 18:08:07.539978 | NULL 	| false    	| false
  17 | NULL                       	| NULL                       	| NULL                	| NULL                   	| 2025-02-12 18:08:07.53998  | NULL 	| false    	| false
(5 rows)
root@[fd00:1122:3344:103::3]:32221/omicron> select m.crdb_node_id,bpz.sled_id,s.serial_number from cockroachdb_zone_id_to_node_id m, bp_omicron_zone bpz, sled s WHERE crdb_node_id IN ('16', '17') AND m.omicron_zone_id = bpz.id AND bpz.sled_id = s.id AND bpz.blueprint_id = 'e79910a5-85e3-419a-a550-5c29e3c91807';
  crdb_node_id |           	sled_id            	| serial_number
---------------+--------------------------------------+----------------
  17       	| 9f15aad3-1a92-4c68-bc06-87159fab5764 | BRM23230010
  16       	| 9f15aad3-1a92-4c68-bc06-87159fab5764 | BRM23230010
(2 rows)

History of sled 15 in particular:

root@[fd00:1122:3344:103::3]:32221/omicron> select id,time_created,sled_policy from sled where serial_number = 'BRM27230037' ORDER BY time_created;
               	id              	|     	time_created      	| sled_policy
---------------------------------------+-------------------------------+--------------
  8fa59c49-b8f9-4ebc-915a-6d4ad47a2c5f | 2025-02-11 21:55:23.851457+00 | expunged
  17f710e3-37b6-44c0-86ee-a979967071d8 | 2025-02-12 01:25:48.829228+00 | expunged
  b36d653a-c4c1-45f7-8a03-79be17781917 | 2025-02-12 05:01:28.855347+00 | expunged
  2ea49c14-47a1-46c8-abcc-67128965d58e | 2025-02-12 06:54:00.373544+00 | in_service
(4 rows)

History of sled 17 in particular:

root@[fd00:1122:3344:103::3]:32221/omicron> select id,time_created,sled_policy from sled where serial_number = 'BRM23230010' ORDER BY time_created;
               	id              	|     	time_created      	| sled_policy
---------------------------------------+-------------------------------+--------------
  6d5906e3-6d45-4dd4-836c-b3d3876186c4 | 2025-02-11 21:55:23.993512+00 | expunged
  68ca4aa3-c1ce-4b51-963a-b3bdc2c7c858 | 2025-02-11 23:37:11.082565+00 | expunged
  e6bfacbc-ac31-475b-865a-8e0a55aeecde | 2025-02-12 03:22:00.291124+00 | expunged
  9f15aad3-1a92-4c68-bc06-87159fab5764 | 2025-02-12 06:27:17.320269+00 | in_service
(4 rows)

@davepacheco (Collaborator Author)

Blueprint history

root@[fd00:1122:3344:103::3]:32221/omicron> select t.version,t.time_made_target,t.blueprint_id,t.enabled,bp.comment from bp_target t, blueprint bp WHERE t.blueprint_id = bp.id ORDER BY version;
  version |   	time_made_target    	|         	blueprint_id         	| enabled |                                                                      	comment
----------+-------------------------------+--------------------------------------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------
    	1 | 2025-02-11 21:55:27.784198+00 | 28c2018d-1876-40de-9eba-5f50cea8ad25 |  false  | initial blueprint from rack setup
    	2 | 2025-02-11 22:35:43.546722+00 | 74d9e382-02cf-4fd8-9861-104c06e50cc7 |  true   | sled 6d5906e3-6d45-4dd4-836c-b3d3876186c4 expunged (expunged 10 disks, 47 datasets, 15 zones)
    	3 | 2025-02-11 22:48:58.715662+00 | 74d9e382-02cf-4fd8-9861-104c06e50cc7 |  false  | sled 6d5906e3-6d45-4dd4-836c-b3d3876186c4 expunged (expunged 10 disks, 47 datasets, 15 zones)
    	4 | 2025-02-11 23:38:46.914794+00 | 843bae02-1e70-4bdd-a08b-cac81fa363de |  true   | sled 68ca4aa3-c1ce-4b51-963a-b3bdc2c7c858: added 10 disks, updated 0, removed 0 disks, sled 68ca4aa3-c1ce-4b51-963a-b3bdc2c7c858: added zone: boundary_ntp
    	5 | 2025-02-11 23:41:40.114065+00 | 5d2870c4-c2cc-46a6-bfca-c205824aeca1 |  true   | sled 68ca4aa3-c1ce-4b51-963a-b3bdc2c7c858: added zone: crucible
    	6 | 2025-02-11 23:50:00.08244+00  | 4c6c66a2-9326-4038-84f1-14fcd831c27f |  true   | sled 8fa59c49-b8f9-4ebc-915a-6d4ad47a2c5f expunged (expunged 10 disks, 51 datasets, 17 zones)
    	7 | 2025-02-12 00:09:35.448679+00 | 4c6c66a2-9326-4038-84f1-14fcd831c27f |  false  | sled 8fa59c49-b8f9-4ebc-915a-6d4ad47a2c5f expunged (expunged 10 disks, 51 datasets, 17 zones)
    	8 | 2025-02-12 01:29:24.262657+00 | 644dd4b3-9a85-435c-9f28-8b86c07aca28 |  true   | sled 17f710e3-37b6-44c0-86ee-a979967071d8: added 10 disks, updated 0, removed 0 disks, sled 17f710e3-37b6-44c0-86ee-a979967071d8: added zone: boundary_ntp
    	9 | 2025-02-12 01:34:04.76048+00  | a994a01f-42c6-4849-a89f-cb380fccabba |  true   | sled 17f710e3-37b6-44c0-86ee-a979967071d8: added zone: crucible
   	10 | 2025-02-12 01:34:56.424285+00 | a994a01f-42c6-4849-a89f-cb380fccabba |  false  | sled 17f710e3-37b6-44c0-86ee-a979967071d8: added zone: crucible
   	11 | 2025-02-12 01:45:11.140121+00 | b1d59267-20d3-4e0e-b7bb-5e7ea129d56f |  true   | sled 68ca4aa3-c1ce-4b51-963a-b3bdc2c7c858 expunged (expunged 10 disks, 51 datasets, 17 zones)
   	12 | 2025-02-12 01:48:19.627434+00 | b1d59267-20d3-4e0e-b7bb-5e7ea129d56f |  false  | sled 68ca4aa3-c1ce-4b51-963a-b3bdc2c7c858 expunged (expunged 10 disks, 51 datasets, 17 zones)
   	13 | 2025-02-12 03:27:25.265384+00 | dbbd05b6-bbef-4d9f-91bc-f040df927dfd |  true   | sled e6bfacbc-ac31-475b-865a-8e0a55aeecde: added 10 disks, updated 0, removed 0 disks, sled e6bfacbc-ac31-475b-865a-8e0a55aeecde: added zone: boundary_ntp
   	14 | 2025-02-12 03:34:31.480614+00 | 28b1579b-d15f-4c7b-a74e-8599f651b91b |  true   | sled e6bfacbc-ac31-475b-865a-8e0a55aeecde: added zone: crucible
   	15 | 2025-02-12 03:36:44.82581+00  | 28b1579b-d15f-4c7b-a74e-8599f651b91b |  false  | sled e6bfacbc-ac31-475b-865a-8e0a55aeecde: added zone: crucible
   	16 | 2025-02-12 03:45:46.69233+00  | 2c2d7acd-13c8-4ac6-89b9-f3e4bcaa69fb |  true   | sled 17f710e3-37b6-44c0-86ee-a979967071d8 expunged (expunged 10 disks, 51 datasets, 17 zones)
   	17 | 2025-02-12 03:48:49.098792+00 | 2c2d7acd-13c8-4ac6-89b9-f3e4bcaa69fb |  false  | sled 17f710e3-37b6-44c0-86ee-a979967071d8 expunged (expunged 10 disks, 51 datasets, 17 zones)
   	18 | 2025-02-12 05:07:04.210577+00 | ed6605a7-8691-4e7a-b2f9-deda1b47e650 |  true   | sled b36d653a-c4c1-45f7-8a03-79be17781917: added 10 disks, updated 0, removed 0 disks, sled b36d653a-c4c1-45f7-8a03-79be17781917: added zone: boundary_ntp
   	19 | 2025-02-12 05:16:23.888674+00 | e06f88ff-0e4f-4e4d-8b40-8280ec9cf65a |  true   | sled b36d653a-c4c1-45f7-8a03-79be17781917: added zone: crucible
   	20 | 2025-02-12 05:27:10.496012+00 | 3a210753-90a0-45c7-a3d9-42e14f25ce25 |  true   | sled e6bfacbc-ac31-475b-865a-8e0a55aeecde expunged (expunged 10 disks, 51 datasets, 17 zones)
   	21 | 2025-02-12 05:29:42.639837+00 | 3a210753-90a0-45c7-a3d9-42e14f25ce25 |  false  | sled e6bfacbc-ac31-475b-865a-8e0a55aeecde expunged (expunged 10 disks, 51 datasets, 17 zones)
   	22 | 2025-02-12 06:27:54.342782+00 | 34b873c0-578b-41b6-a4fd-e2e32f0fbacc |  true   | sled 9f15aad3-1a92-4c68-bc06-87159fab5764: added 10 disks, updated 0, removed 0 disks, sled 9f15aad3-1a92-4c68-bc06-87159fab5764: added zone: boundary_ntp
   	23 | 2025-02-12 06:37:02.503389+00 | 895b6657-f0a9-4674-8cac-1dae04142c93 |  true   | sled 9f15aad3-1a92-4c68-bc06-87159fab5764: added zone: crucible
   	24 | 2025-02-12 06:40:44.628681+00 | 895b6657-f0a9-4674-8cac-1dae04142c93 |  false  | sled 9f15aad3-1a92-4c68-bc06-87159fab5764: added zone: crucible
   	25 | 2025-02-12 06:45:38.027007+00 | e79910a5-85e3-419a-a550-5c29e3c91807 |  true   | sled b36d653a-c4c1-45f7-8a03-79be17781917 expunged (expunged 10 disks, 51 datasets, 17 zones)
   	26 | 2025-02-12 06:48:41.065114+00 | e79910a5-85e3-419a-a550-5c29e3c91807 |  false  | sled b36d653a-c4c1-45f7-8a03-79be17781917 expunged (expunged 10 disks, 51 datasets, 17 zones)
   	27 | 2025-02-12 06:54:34.192361+00 | b3426f7f-37d0-45b8-8d87-c542032a3c6c |  true   | sled 2ea49c14-47a1-46c8-abcc-67128965d58e: added 10 disks, updated 0, removed 0 disks, sled 2ea49c14-47a1-46c8-abcc-67128965d58e: added zone: boundary_ntp
   	28 | 2025-02-12 06:58:36.758584+00 | 570d7eef-3039-48c3-a4a5-62faa84f29b5 |  true   | sled 2ea49c14-47a1-46c8-abcc-67128965d58e: added zone: crucible
   	29 | 2025-02-12 07:09:37.527586+00 | 143803f2-05d3-42e9-b867-d8bd8834b42a |  true   |
   	30 | 2025-02-12 07:12:12.135239+00 | 143803f2-05d3-42e9-b867-d8bd8834b42a |  false  |
(30 rows)

@davepacheco (Collaborator Author)

History of boundary NTP zones

Used this to see when boundary zones changed:

$ omdb nexus blueprints list | grep 2025 | cut -c 7-42 | while read bp; do echo $bp; omdb nexus blueprints show $bp | grep boundary; echo; echo; done

Results (assembled by hand from the above):

Initial: fd00:1122:3344:102::10, fd00:1122:3344:101::10 (sled 15, 14)
Around 2025-02-11 23:50:00: Blueprint 4c6c66a2-9326-4038-84f1-14fcd831c27f: fd00:1122:3344:101::10, fd00:1122:3344:121::2c (sled 14, 17)
Around 2025-02-12 01:45:11: Blueprint b1d59267-20d3-4e0e-b7bb-5e7ea129d56f: fd00:1122:3344:101::10, fd00:1122:3344:122::2c (sled 14, 15)
Around 2025-02-12 03:45:46: Blueprint 2c2d7acd-13c8-4ac6-89b9-f3e4bcaa69fb: fd00:1122:3344:123::2c, fd00:1122:3344:101::10 (sled 14, 17)
Around 2025-02-12 05:29:42: Blueprint 3a210753-90a0-45c7-a3d9-42e14f25ce25: fd00:1122:3344:124::2c, fd00:1122:3344:101::10 (sled 14, 15)
Around 2025-02-12 06:45:38: Blueprint e79910a5-85e3-419a-a550-5c29e3c91807: fd00:1122:3344:125::2c, fd00:1122:3344:101::10 (sled 14, 17)

Sled 16 (internal) NTP tracking log

This is a sled that never hosted a boundary NTP zone.

===================================================================================================================================
   Date (UTC) Time 	IP Address   St   Freq ppm   Skew ppm 	Offset L Co  Offset sd Rem. corr. Root delay Root disp. Max. error
===================================================================================================================================
2025-02-12 06:13:32 fd00:1122:3344:124::2c  4	-56.701  	0.185  3.354e-05 N  2  4.345e-04 -8.753e-05  2.636e-02  8.120e-03  2.134e-02
2025-02-12 06:14:37 fd00:1122:3344:124::2c  4	-56.440  	0.199  2.339e-05 N  2  3.942e-04 -9.856e-05  2.636e-02  8.109e-03  2.150e-02
2025-02-12 06:15:42 fd00:1122:3344:124::2c  4	-56.344  	0.122  6.942e-06 N  2  3.511e-04 -9.673e-05  2.636e-02  8.107e-03  2.149e-02
2025-02-12 06:16:47 fd00:1122:3344:124::2c  4	-56.525  	0.136  2.591e-06 N  2  3.074e-04 -8.329e-05  2.636e-02  8.109e-03  2.146e-02
2025-02-12 06:17:51 fd00:1122:3344:124::2c  4	-56.531  	0.062  1.104e-05 N  2  2.664e-04 -7.010e-05  2.636e-02  8.104e-03  2.145e-02
2025-02-12 06:18:55 fd00:1122:3344:124::2c  4	-56.509  	0.061  9.569e-06 N  2  2.277e-04 -6.461e-05  2.636e-02  8.103e-03  2.143e-02
2025-02-12 06:20:00 fd00:1122:3344:124::2c  4	-56.435  	0.071  1.025e-05 N  2  1.873e-04 -5.796e-05  2.636e-02  8.098e-03  2.143e-02
2025-02-12 06:21:04 fd00:1122:3344:124::2c  4	-56.376  	0.068  8.400e-06 N  2  1.420e-04 -5.179e-05  2.636e-02  8.104e-03  2.142e-02
2025-02-12 06:22:08 fd00:1122:3344:124::2c  4	-56.321  	0.066  3.084e-06 N  2  9.479e-05 -4.317e-05  2.636e-02  8.106e-03  2.141e-02
2025-02-12 06:23:13 fd00:1122:3344:124::2c  4	-56.394  	0.070  2.875e-06 N  2  4.874e-05 -3.108e-05  2.636e-02  8.111e-03  2.140e-02
2025-02-12 06:25:22 fd00:1122:3344:124::2c  4	-56.520  	0.081  1.292e-05 N  2  4.573e-05 -2.437e-06  2.636e-02  8.122e-03  2.146e-02
2025-02-12 06:26:27 fd00:1122:3344:124::2c  4	-56.623  	0.062  2.081e-05 N  2  9.621e-05 -1.362e-05  2.636e-02  8.124e-03  2.139e-02
2025-02-12 06:27:32 fd00:1122:3344:124::2c  4	-56.490  	0.080  3.830e-05 N  2  1.683e-04 -2.632e-05  2.642e-02  1.035e-02  2.141e-02
2025-02-12 06:28:37 fd00:1122:3344:124::2c  4	-56.928  	0.135  1.840e-05 N  2  2.251e-04 -4.799e-05  2.642e-02  1.038e-02  2.369e-02
2025-02-12 06:29:41 fd00:1122:3344:124::2c  4	-56.821  	0.042  4.214e-05 N  2  2.749e-04 -5.332e-05  2.642e-02  1.036e-02  2.373e-02
2025-02-12 06:30:46 fd00:1122:3344:124::2c  4	-56.800  	0.037  3.756e-05 N  2  3.264e-04 -7.348e-05  2.642e-02  1.036e-02  2.373e-02
2025-02-12 06:31:50 fd00:1122:3344:124::2c  4	-57.034  	0.111  2.590e-05 N  2  3.716e-04 -8.600e-05  2.642e-02  1.040e-02  2.375e-02
2025-02-12 06:32:54 fd00:1122:3344:124::2c  4	-57.139  	0.038  3.290e-05 N  2  4.216e-04 -8.927e-05  2.640e-02  1.166e-02  2.380e-02
2025-02-12 06:33:59 fd00:1122:3344:124::2c  4	-57.141  	0.022  5.349e-05 N  2  4.730e-04 -9.847e-05  2.640e-02  1.181e-02  2.505e-02
2025-02-12 06:35:04 fd00:1122:3344:124::2c  4	-57.129  	0.027  5.728e-05 N  2  5.251e-04 -1.194e-04  2.640e-02  1.175e-02  2.523e-02
2025-02-12 06:36:09 fd00:1122:3344:124::2c  4	-57.118  	0.025  4.357e-05 N  2  5.666e-04 -1.373e-04  2.640e-02  1.173e-02  2.519e-02
2025-02-12 06:37:13 fd00:1122:3344:124::2c  4	-57.112  	0.023  4.634e-05 N  2  6.108e-04 -1.425e-04  2.640e-02  1.172e-02  2.517e-02
2025-02-12 06:38:18 fd00:1122:3344:124::2c  4	-57.137  	0.033  4.063e-05 N  2  6.670e-04 -1.494e-04  2.640e-02  1.235e-02  2.518e-02
2025-02-12 06:39:22 fd00:1122:3344:124::2c  4	-57.136  	0.022  4.865e-05 N  2  7.128e-04 -1.545e-04  2.640e-02  1.232e-02  2.580e-02
2025-02-12 06:40:27 fd00:1122:3344:124::2c  4	-57.134  	0.021  4.323e-05 N  2  7.541e-04 -1.641e-04  2.640e-02  1.232e-02  2.579e-02
2025-02-12 06:41:32 fd00:1122:3344:124::2c  4	-57.120  	0.069  3.885e-05 N  2  8.051e-04 -1.693e-04  2.640e-02  1.231e-02  2.579e-02
2025-02-12 06:42:36 fd00:1122:3344:124::2c  4	-57.121  	0.083  1.819e-05 N  2  8.525e-04 -1.725e-04  2.640e-02  1.328e-02  2.580e-02
2025-02-12 06:43:40 fd00:1122:3344:124::2c  4	-57.073  	0.113  4.168e-05 N  2  8.968e-04 -1.624e-04  2.640e-02  1.325e-02  2.674e-02
2025-02-12 06:55:34 fd00:1122:3344:125::2c  4	-34.186 	21.514  1.791e-02 N  1  6.182e-06  4.779e-09  2.618e-02  9.855e-02  2.746e-02
2025-02-12 06:55:39 fd00:1122:3344:125::2c  4	-14.496 	29.843  1.244e-03 N  1  3.606e-04 -5.387e-03  2.601e-02  1.445e-01  1.297e-01
2025-02-12 06:57:43 fd00:1122:3344:101::10  4	-57.156  	1.665 -2.569e-02 N  1  3.061e-05  3.973e-09  2.642e-02  3.620e-02  1.687e-01
2025-02-12 06:58:48 fd00:1122:3344:125::2c 11	-56.534  	7.079  2.744e-02 N  1  4.241e-04 -2.571e-05  1.724e-04  3.528e-03  7.527e-02
===================================================================================================================================
   Date (UTC) Time 	IP Address   St   Freq ppm   Skew ppm 	Offset L Co  Offset sd Rem. corr. Root delay Root disp. Max. error
===================================================================================================================================
2025-02-12 06:58:51 fd00:1122:3344:125::2c 11	-47.809 	30.075  3.004e-03 N  1  6.398e-04 -1.899e-02  1.324e-04  7.118e-04  3.123e-02
2025-02-12 06:59:55 fd00:1122:3344:125::2c 11	-12.373 	32.946  1.700e-03 N  1  5.638e-04  1.333e-08  1.568e-04  5.772e-04  2.759e-02
2025-02-12 07:00:59 fd00:1122:3344:125::2c 11	-10.018 	14.265 -3.011e-04 N  1  3.924e-04  4.203e-08  1.044e-04  4.049e-04  4.835e-03
2025-02-12 07:02:04 fd00:1122:3344:125::2c 11	-10.334 	11.153 -1.269e-04 N  1  4.083e-04  1.451e-04  1.866e-04  4.184e-04  1.760e-03
2025-02-12 07:03:09 fd00:1122:3344:125::2c 11	-11.490 	18.801 -6.980e-04 N  1  6.429e-04  1.523e-04  1.703e-04  6.461e-04  1.572e-03
2025-02-12 07:04:14 fd00:1122:3344:125::2c 11	-15.488 	18.057 -3.302e-04 N  1  3.273e-04  9.401e-05  1.215e-04  3.396e-04  3.114e-03
2025-02-12 07:05:19 fd00:1122:3344:125::2c 11	-14.157  	0.438  2.821e-04 N  1  9.608e-06  5.785e-05  1.357e-04  1.110e-05  2.124e-03
2025-02-12 07:06:23 fd00:1122:3344:125::2c 11	-14.159  	0.271 -1.832e-05 N  1  8.380e-06  4.860e-08  1.161e-04  8.976e-06  3.954e-04
2025-02-12 07:07:28 0.0.0.0      	0	-14.159  	0.271  0.000e+00 ?  0  0.000e+00  6.083e-11  1.161e-04  9.189e-05  1.683e-04
2025-02-12 16:53:39 fd00:1122:3344:101::10  4	-55.159  	9.023 -1.561e+00 N  1  1.528e-05  4.350e-08  2.644e-02  7.708e-02  4.487e-02
2025-02-12 17:02:19 fd00:1122:3344:101::10  4	-57.880  	2.550 -2.481e-03 N  1  3.090e-04  1.556e-01  2.646e-02  8.028e-02  1.658e+00
2025-02-12 17:03:24 fd00:1122:3344:101::10  4	-57.572  	2.968  9.488e-04 N  1  3.225e-04 -4.314e-04  2.643e-02  8.037e-02  2.518e-01
2025-02-12 17:04:28 fd00:1122:3344:101::10  4	-57.486  	3.588  1.032e-04 N  1  2.831e-04  6.084e-08  2.645e-02  8.041e-02  9.523e-02
2025-02-12 17:05:33 fd00:1122:3344:101::10  4	-57.435  	4.834  7.324e-05 N  1  2.608e-04 -7.798e-05  2.644e-02  8.046e-02  9.404e-02
2025-02-12 17:06:38 fd00:1122:3344:101::10  4	-57.177  	7.306  1.546e-04 N  1  1.957e-04 -9.243e-05  2.641e-02  8.048e-02  9.422e-02
2025-02-12 17:07:43 fd00:1122:3344:101::10  4	-56.568  	9.600  9.455e-05 N  1  1.744e-04 -3.894e-05  2.642e-02  8.054e-02  9.450e-02
2025-02-12 17:08:48 fd00:1122:3344:101::10  4	-57.173  	0.348 -1.042e-04 N  1  6.823e-06 -6.565e-05  2.643e-02  8.016e-02  9.459e-02
2025-02-12 17:09:52 fd00:1122:3344:101::10  4	-57.151  	0.300  6.933e-06 N  1  9.200e-06 -1.760e-09  2.643e-02  8.017e-02  9.351e-02
2025-02-12 17:10:57 fd00:1122:3344:101::10  4	-57.081  	0.171  1.431e-05 N  1  4.889e-06 -3.446e-06  2.645e-02  8.018e-02  9.347e-02
2025-02-12 17:12:02 fd00:1122:3344:101::10  4	-57.066  	0.114  4.261e-06 N  1  4.875e-06  1.549e-10  2.645e-02  8.018e-02  9.350e-02
2025-02-12 17:13:07 fd00:1122:3344:101::10  4	-57.068  	0.083 -1.237e-06 N  1  4.829e-06 -1.785e-06  2.645e-02  8.017e-02  9.348e-02
2025-02-12 17:14:11 fd00:1122:3344:101::10  4	-57.068  	0.059  2.658e-07 N  1  4.246e-06 -5.067e-07  2.645e-02  8.016e-02  9.347e-02
2025-02-12 17:15:16 fd00:1122:3344:101::10  4	-57.081  	0.060 -7.889e-06 N  1  4.589e-06 -6.774e-07  2.645e-02  8.016e-02  9.346e-02

Sled 14 (boundary) NTP tracking info

This is the sled that lost external network connectivity but does not appear to have drifted much from real time.

This is the log after restoring external connectivity by hand using opteadm add-router-entry -p opte0 0.0.0.0/0 ig system:

===================================================================================================================================
   Date (UTC) Time 	IP Address   St   Freq ppm   Skew ppm 	Offset L Co  Offset sd Rem. corr. Root delay Root disp. Max. error
===================================================================================================================================
2025-02-11 22:33:27 172.20.0.5   	3	-59.821  	0.041 -4.842e-08 N  1  2.441e-06 -2.002e-07  2.627e-02  5.980e-04  1.374e-02
2025-02-11 22:33:35 172.20.0.5   	3	-59.820  	0.040  8.767e-07 N  1  2.405e-06 -1.455e-07  2.627e-02  5.973e-04  1.374e-02
2025-02-11 22:33:43 172.20.0.5   	3	-59.824  	0.040 -2.585e-06 N  1  2.329e-06 -7.336e-07  2.627e-02  5.969e-04  1.374e-02
2025-02-11 22:33:51 172.20.0.5   	3	-59.831  	0.042 -2.614e-06 N  1  2.426e-06  8.666e-07  2.627e-02  5.976e-04  1.374e-02
2025-02-11 22:33:59 172.20.0.5   	3	-59.832  	0.038  7.054e-08 N  1  2.323e-06  1.557e-07  2.627e-02  5.970e-04  1.374e-02
2025-02-11 22:34:07 172.20.0.5   	3	-59.834  	0.037  3.995e-07 N  1  2.357e-06  8.318e-08  2.627e-02  5.974e-04  1.374e-02
2025-02-11 22:34:23 172.20.0.5   	3	-59.833  	0.036  1.320e-07 N  1  2.419e-06 -2.590e-07  2.627e-02  5.982e-04  1.375e-02
2025-02-11 22:34:31 172.20.0.5   	3	-59.833  	0.034  6.525e-08 N  1  2.352e-06 -3.700e-07  2.627e-02  5.973e-04  1.374e-02
2025-02-11 22:34:39 172.20.0.5   	3	-59.833  	0.033  1.194e-07 N  1  2.335e-06 -3.813e-07  2.627e-02  5.969e-04  1.374e-02
2025-02-11 22:34:47 172.20.0.5   	3	-59.835  	0.034 -3.365e-07 N  1  2.387e-06 -4.283e-07  2.627e-02  5.973e-04  1.374e-02
2025-02-11 22:34:55 172.20.0.5   	3	-59.838  	0.035 -1.316e-06 N  1  2.429e-06 -8.943e-08  2.627e-02  5.972e-04  1.374e-02
2025-02-11 22:35:04 172.20.0.5   	3	-59.843  	0.037 -1.522e-06 N  1  2.564e-06  8.134e-07  2.627e-02  5.986e-04  1.374e-02
2025-02-11 22:35:12 172.20.0.5   	3	-59.844  	0.034  9.739e-08 N  1  2.506e-06  9.319e-07  2.627e-02  5.981e-04  1.374e-02
2025-02-11 22:35:20 172.20.0.5   	3	-59.846  	0.033 -3.888e-07 N  1  2.489e-06  6.509e-07  2.627e-02  5.976e-04  1.374e-02
2025-02-11 22:35:28 172.20.0.5   	3	-59.847  	0.032 -4.886e-07 N  1  2.481e-06  7.505e-07  2.627e-02  5.983e-04  1.374e-02
2025-02-11 22:35:36 172.20.0.5   	3	-59.849  	0.031 -1.104e-06 N  1  2.405e-06  8.192e-07  2.627e-02  5.976e-04  1.374e-02
2025-02-11 22:35:44 172.20.0.5   	3	-59.850  	0.030 -2.263e-07 N  1  2.358e-06  9.002e-07  2.627e-02  5.972e-04  1.374e-02
2025-02-11 22:35:52 172.20.0.5   	3	-59.851  	0.028 -1.439e-07 N  1  2.323e-06  7.702e-07  2.627e-02  5.970e-04  1.374e-02
2025-02-11 22:36:00 172.20.0.5   	3	-59.851  	0.027  2.905e-08 N  1  2.293e-06  6.755e-07  2.627e-02  5.967e-04  1.374e-02
2025-02-11 22:36:08 172.20.0.5   	3	-59.852  	0.029 -1.103e-06 N  1  2.363e-06  5.253e-07  2.627e-02  5.976e-04  1.374e-02
2025-02-11 22:36:16 172.20.0.5   	3	-59.855  	0.030 -1.539e-06 N  1  2.402e-06  8.678e-07  2.625e-02  6.128e-04  1.374e-02
2025-02-11 22:36:24 172.20.0.5   	3	-59.859  	0.031 -1.547e-06 N  1  2.372e-06  8.135e-07  2.626e-02  6.126e-04  1.375e-02
2025-02-11 22:36:33 172.20.0.5   	3	-59.862  	0.030 -4.528e-07 N  1  2.372e-06  7.818e-07  2.628e-02  6.288e-04  1.375e-02
2025-02-11 22:36:41 172.20.0.5   	3	-59.868  	0.069 -1.756e-05 N  1  7.113e-06  8.090e-07  2.631e-02  1.150e-04  1.378e-02
2025-02-12 22:45:44 172.20.0.5   	3	-60.157  	0.005 -2.515e-02 N  1  2.599e-05  5.519e-08  2.609e-02  3.012e-04  1.179e-01
2025-02-12 22:45:45 172.20.0.5   	3	-60.158  	0.002 -6.231e-05 N  1  2.488e-05  2.496e-02  2.607e-02  2.997e-04  3.850e-02
2025-02-12 22:45:46 172.20.0.5   	3	-60.158  	0.003 -3.011e-04 N  1  1.470e-04  2.231e-02  2.607e-02  4.482e-04  3.836e-02
2025-02-12 22:45:47 172.20.0.5   	3	-60.160  	0.007 -1.676e-04 N  1  1.684e-04  1.985e-02  2.607e-02  4.694e-04  3.609e-02
2025-02-12 22:45:48 172.20.0.5   	3	-60.164  	0.012 -2.575e-04 N  1  1.474e-04  1.729e-02  2.608e-02  4.228e-04  3.352e-02
2025-02-12 22:45:49 172.20.0.5   	3	-60.167  	0.012 -8.249e-05 N  1  1.331e-04  1.482e-02  2.608e-02  4.084e-04  3.101e-02
2025-02-12 22:45:50 172.20.0.5   	3	-60.168  	0.010 -4.943e-05 N  1  1.171e-04  1.216e-02  2.607e-02  3.922e-04  2.836e-02
2025-02-12 22:45:51 172.20.0.5   	3	-60.168  	0.009 -2.810e-05 N  1  1.030e-04  9.495e-03  2.607e-02  3.780e-04  2.564e-02
===================================================================================================================================
   Date (UTC) Time 	IP Address   St   Freq ppm   Skew ppm 	Offset L Co  Offset sd Rem. corr. Root delay Root disp. Max. error
===================================================================================================================================
2025-02-12 22:45:52 172.20.0.5   	3	-60.168  	0.008 -1.803e-05 N  1  9.359e-05  6.790e-03  2.607e-02  3.692e-04  2.294e-02
2025-02-12 22:45:53 172.20.0.5   	3	-60.169  	0.007 -2.244e-05 N  1  8.038e-05  4.080e-03  2.607e-02  3.553e-04  2.021e-02
2025-02-12 22:45:54 172.20.0.5   	3	-60.169  	0.011 -7.753e-05 N  1  1.322e-05  1.345e-03  2.610e-02  2.919e-04  1.749e-02
2025-02-12 22:45:55 172.20.0.5   	3	-60.169  	0.018  2.836e-04 N  1  6.914e-05 -2.646e-06  2.609e-02  3.778e-04  1.477e-02
2025-02-12 22:45:56 172.20.0.5   	3	-60.169  	0.029  2.251e-04 N  1  1.213e-04  3.693e-07  2.615e-02  4.984e-04  1.379e-02
2025-02-12 22:45:57 172.20.0.5   	3	-60.169  	0.064  2.890e-04 N  1  8.433e-05  2.496e-07  2.605e-02  4.457e-04  1.394e-02
2025-02-12 22:45:58 172.20.0.5   	3	-60.169  	0.128  1.322e-04 N  1  1.005e-04  1.303e-07  2.612e-02  5.404e-04  1.396e-02
2025-02-12 22:45:59 172.20.0.5   	3	-60.169  	0.184 -5.618e-05 N  1  1.182e-04 -1.811e-05  2.607e-02  5.061e-04  1.392e-02
2025-02-12 22:46:00 172.20.0.5   	3	-60.169  	0.222 -6.166e-05 N  1  1.074e-04  2.979e-05  2.611e-02  4.076e-04  1.369e-02
2025-02-12 22:46:01 172.20.0.5   	3	-60.171  	0.435 -1.024e-04 N  1  1.255e-05  3.925e-05  2.608e-02  3.172e-04  1.360e-02
2025-02-12 22:46:02 172.20.0.5   	3	-60.198  	1.186 -9.758e-06 N  1  7.773e-06 -9.678e-08  2.608e-02  3.171e-04  1.352e-02
2025-02-12 22:46:03 172.20.0.5   	3	-60.230  	1.919 -9.603e-07 N  1  1.350e-05  1.632e-06  2.611e-02  3.144e-04  1.339e-02
2025-02-12 22:46:04 172.20.0.5   	3	-60.263  	2.617  6.269e-06 N  1  1.932e-05  2.260e-06  2.613e-02  3.097e-04  1.339e-02
2025-02-12 22:46:05 172.20.0.5   	3	-60.152  	3.941  1.792e-05 N  1  9.052e-06 -3.463e-06  2.609e-02  3.064e-04  1.339e-02
2025-02-12 22:46:06 172.20.0.5   	3	-58.643  	8.819  2.166e-05 N  1  6.164e-06  1.782e-09  2.605e-02  3.079e-04  1.338e-02
2025-02-12 22:46:07 172.20.0.5   	3	-56.436 	12.613 -4.083e-06 N  1  7.460e-06  1.861e-09  2.607e-02  2.984e-04  1.337e-02
2025-02-12 22:46:08 172.20.0.5   	3	-56.101  	8.164 -4.386e-06 N  1  5.743e-06  2.603e-06  2.602e-02  2.971e-04  1.336e-02
2025-02-12 22:46:09 172.20.0.5   	3	-56.836  	7.493 -5.528e-06 N  1  6.295e-06  1.374e-06  2.602e-02  3.053e-04  1.333e-02
2025-02-12 22:46:10 172.20.0.5   	3	-57.807  	7.586 -7.185e-06 N  1  7.739e-06  1.805e-06  2.602e-02  3.129e-04  1.333e-02
2025-02-12 22:46:11 172.20.0.5   	3	-58.173  	6.035 -1.204e-06 N  1  7.068e-06  2.091e-06  2.602e-02  3.181e-04  1.334e-02
2025-02-12 22:46:12 172.20.0.5   	3	-58.754  	5.413 -4.722e-06 N  1  7.054e-06  2.266e-06  2.602e-02  3.237e-04  1.334e-02
2025-02-12 22:46:13 172.20.0.5   	3	-59.439  	5.909 -6.987e-06 N  1  9.202e-06  2.428e-06  2.602e-02  3.337e-04  1.335e-02
2025-02-12 22:46:14 172.20.0.5   	3	-59.781  	4.390 -2.014e-06 N  1  7.011e-06  2.921e-06  2.602e-02  3.152e-04  1.336e-02
2025-02-12 22:46:15 172.20.0.5   	3	-60.298  	4.319 -5.964e-06 N  1  7.673e-06  2.638e-06  2.602e-02  3.214e-04  1.333e-02
2025-02-12 22:46:16 172.20.0.5   	3	-60.505  	3.862 -1.409e-06 N  1  7.818e-06  2.104e-06  2.602e-02  3.227e-04  1.334e-02
2025-02-12 22:46:17 172.20.0.5   	3	-60.654  	3.489 -1.772e-06 N  1  7.845e-06  2.468e-06  2.602e-02  3.241e-04  1.334e-02
2025-02-12 22:46:18 172.20.0.5   	3	-60.625  	3.046  9.120e-07 N  1  7.482e-06  2.707e-06  2.602e-02  3.271e-04  1.334e-02
2025-02-12 22:46:21 172.20.0.5   	3	-60.770  	2.793 -2.630e-06 N  1  7.896e-06  1.223e-06  2.602e-02  3.332e-04  1.335e-02
2025-02-12 22:46:23 172.20.0.5   	3	-60.903  	2.497 -2.135e-06 N  1  8.063e-06  2.597e-06  2.602e-02  3.359e-04  1.335e-02
2025-02-12 22:46:25 172.20.0.5   	3	-60.981  	2.026 -9.157e-07 N  1  7.186e-06  2.876e-06  2.604e-02  3.130e-04  1.336e-02
2025-02-12 22:46:27 172.20.0.5   	3	-60.648  	1.954  7.832e-06 N  1  7.212e-06  2.462e-06  2.604e-02  3.162e-04  1.334e-02
2025-02-12 22:46:29 172.20.0.5   	3	-60.630  	1.640 -1.080e-06 N  1  7.177e-06 -2.695e-06  2.604e-02  3.191e-04  1.335e-02
===================================================================================================================================
   Date (UTC) Time 	IP Address   St   Freq ppm   Skew ppm 	Offset L Co  Offset sd Rem. corr. Root delay Root disp. Max. error
===================================================================================================================================
2025-02-12 22:46:33 172.20.0.5   	3	-60.588  	1.455  9.849e-07 N  1  7.635e-06 -1.135e-06  2.602e-02  3.341e-04  1.335e-02
2025-02-12 22:46:37 172.20.0.5   	3	-60.198  	1.434  1.239e-05 N  1  7.584e-06 -1.722e-06  2.603e-02  3.306e-04  1.336e-02
2025-02-12 22:46:41 172.20.0.5   	3	-60.065  	1.094  2.360e-06 N  1  6.953e-06  9.339e-10  2.602e-02  3.315e-04  1.337e-02
2025-02-12 22:46:45 172.20.0.5   	3	-60.032  	0.950  6.496e-07 N  1  7.198e-06 -1.831e-06  2.602e-02  3.323e-04  1.335e-02
2025-02-12 22:46:53 172.20.0.5   	3	-59.990  	0.806  1.854e-06 N  1  7.433e-06 -1.322e-06  2.602e-02  3.319e-04  1.336e-02
2025-02-12 22:47:01 172.20.0.5   	3	-59.920  	0.807  2.408e-06 N  1  7.801e-06 -2.283e-06  2.602e-02  3.397e-04  1.336e-02
2025-02-12 22:47:09 172.20.0.5   	3	-60.002  	0.762 -5.689e-06 N  1  8.541e-06 -2.807e-06  2.602e-02  3.434e-04  1.337e-02

Based on this, it looks like the route was probably added around 2025-02-12T22:45Z. Sled 14's first data point after reconnecting shows an offset of -2.515e-02, or -25ms. This is not nearly enough to explain the CockroachDB observations about the time delta.
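As an aside, these tracking-log excerpts can be summarized programmatically. Below is a minimal Rust sketch that assumes only the whitespace-separated column layout shown in the headers above (date, time, IP address, stratum, frequency in ppm, skew in ppm, offset, ...); the path argument is hypothetical and would point at a copy of the log:

use std::{env, fs};

// One row of the chrony tracking log excerpted above, keeping only the
// columns used in this investigation.
#[derive(Debug)]
struct TrackingSample {
    timestamp: String, // "YYYY-MM-DD HH:MM:SS" (UTC)
    source: String,    // IP address of the currently selected source
    stratum: u32,
    freq_ppm: f64,
    offset: f64, // Offset column as printed (later annotations treat it as seconds)
}

fn parse_line(line: &str) -> Option<TrackingSample> {
    let f: Vec<&str> = line.split_whitespace().collect();
    // Skip the "====" separators and the repeated header rows.
    if f.len() < 7 || !f[0].starts_with("20") {
        return None;
    }
    Some(TrackingSample {
        timestamp: format!("{} {}", f[0], f[1]),
        source: f[2].to_string(),
        stratum: f[3].parse().ok()?,
        freq_ppm: f[4].parse().ok()?,
        offset: f[6].parse().ok()?,
    })
}

fn main() -> std::io::Result<()> {
    // Usage: tracking-summary <path-to-tracking-log>
    let path = env::args().nth(1).expect("expected a path to a tracking log");
    for s in fs::read_to_string(path)?.lines().filter_map(parse_line) {
        println!(
            "{} source={} stratum={} freq={}ppm offset={}",
            s.timestamp, s.source, s.stratum, s.freq_ppm, s.offset
        );
    }
    Ok(())
}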

@davepacheco (Collaborator Author)

Clock divergence investigation

We got on this path by starting with the CockroachDB cluster failure, bringing the one node back online, then getting access to the CockroachDB web console, and looking at its metrics. It was pretty quickly clear that the cluster had a time sync problem:

[Image: CockroachDB console metrics showing the clock offset problem]

and it resulted in kicking one of the live nodes out of the cluster:

[Image: CockroachDB console showing a live node kicked out of the cluster]

Here are some more precise data points:

[Images: CockroachDB clock offset readings at several points in time]

This tells a clear story:

  • Prior to 06:45: time is very well sync'd up across the cluster (<1ms difference everywhere)
  • Starting at 06:45, node 1's clock diverges from the other four. (There's a brief period that's more split-brain but it quickly becomes node 1 vs. the field.)
  • Around 09:15, node 1's clock has diverged too far from the others. It shuts itself down and the cluster is down to four working nodes.

More precisely:

CockroachDB shows:
06:45Z: range is 0.91ms = 0.00091s
06:50Z: range is (sled 14: -3.79ms; sled 16: -.26ms, sled 17: +6ms; range = 10.59ms)
06:55Z: range is (sled 14: -12.8ms; sled 16: 1.7ms, 1.49ms; sled 17: +10.7ms, +10.9ms; range = 23.7ms)
07:00Z: range is (sled 14: -24.35ms; sled 16: 12.54ms, 12.47ms, sled 17: 12.05ms, 12.09ms; range = 36.89ms)
After this point, sleds 16 and 17 are pretty consistent with each other until peak range at 09:15Z
09:15Z: range is (sled 14: -256.85ms; sled 16: 128.39ms, 128.42ms; sled 17: 128.55ms, 128.57ms; range = 385.42ms)

So the divergence started after 06:45Z and before 06:50Z.

Here are the nodes, based on data in an earlier comment:

Node 1: subnet 101: sled 14
Node 2: subnet 103: sled 16
Node 6: subnet 103: sled 16
Node 16: sled 17 (now clean-slated, no data about this available)
Node 17: sled 17 (now clean-slated, no data about this available)

Recall that sled 14 is a boundary NTP zone for the duration, but it lost upstream connectivity many hours earlier.
We can’t look at sled 17 because it's been clean-slated in preparation for expungement. So let’s look at the tracking log on sled 16 (the full log is pasted in an earlier comment). Interestingly, there are no events on sled 16’s tracking log in that interval. However, the expungement of sled 15 probably happened in this interval.

Here's an annotated excerpt from the sled 16 tracking log:

06:41:32 fd00:1122:3344:124::2c reporting stratum 4 offset 0.00003885 (seconds?)
06:42:36 fd00:1122:3344:124::2c reporting stratum 4 offset 0.00001819
06:43:40 fd00:1122:3344:124::2c reporting stratum 4 offset 0.00004168 (this is sled 15)
06:55:34 fd00:1122:3344:125::2c reporting stratum 4 offset 0.01791 (this is sled 17; sled 15 has just been expunged?)
06:55:39 fd00:1122:3344:125::2c reporting stratum 4 offset 0.001244 (sled 17)
06:57:43 fd00:1122:3344:101::10 reporting stratum 4 offset -0.024446 (sled 14)
06:58:48 fd00:1122:3344:125::2c reporting stratum 11 offset 0.02744 (sled 17)
06:58:51 fd00:1122:3344:125::2c reporting stratum 11 offset 0.003004 (sled 17)

It’s pretty consistent after that point up to:

07:06:23 fd00:1122:3344:125::2c reporting stratum 11 offset -0.00001832 (continuation of above)
07:07:28 0.0.0.0 offset 0 – not clear what this means
16:53:39 fd00:1122:3344:101::10 reporting stratum 4 offset -1.561 seconds

Putting all this together:

06:43Z: sled 16 is using sled 15 as its NTP upstream, and clocks are aligned (including sled 14’s)
Sled 15 gets expunged
06:55Z: sled 16 is now using sled 17, with a brief look at sled 14 around 06:57:43

Here's a summary of the observations up to this point:

  1. at 06:43Z: boundary NTP servers were sleds 14 and 15 (according to the blueprints, anyway)
  2. up to 06:43Z, sled 16's NTP was pointed at sled 15 (at least, its tracking file was reporting data from sled 15's NTP server)
  3. up to about 06:45Z, sleds 14, 16, and 17 were all in sync (within 1ms), per CockroachDB's metric
  4. around 06:45Z: sled 15 is expunged. The boundary NTP servers become sleds 14 and 17.
  5. by 06:55: sled 16 is now tracking sled 17 as its NTP upstream (not sled 15 any more) (according to its tracking file)
  6. by 07:00Z: sled 14's clock was about 37ms from the others (sled 16 and sled 17)
  7. by 09:00Z: sled 14's clock was about 385ms from the others, and this delta had increased about linearly since 07:00

More simply:

  1. According to CockroachDB, sled 16's clock was <1ms apart from sled 14's around 06:45Z. At this time, it was pointed at sled 15.
  2. According to CockroachDB, sled 16's clock was 380ms apart from sled 14's around 09:00Z. At this time, it was pointed at sled 17.
  3. According to CockroachDB, sled 16’s clock was in sync with sled 17’s from 06:45Z to 09:00Z (within a few ms).
  4. Sled 14 was disconnected from its upstream this whole time. Before it became disconnected, its offset was about 0.00001756 seconds. After restoring connectivity, it was 0.02515 seconds. I assume that while disconnected, its clock would have drifted against real-time in a fairly linear way, or at least that it didn't drift, say, 300ms and then somehow get back to 25ms.

(1) implies that the boundary NTP instances on sleds 14 and 15 were pretty in sync at 06:45Z, even though at least one was disconnected from upstream.
(4) strongly suggests sled 14 remained within 25ms of real time for this whole period.
That plus (3) plus (2) implies that the boundary NTP zone deployed on sled 17 began drifting from both sled 14 and real time, accumulating about 380ms of drift in about 2h15m, or about 0.047ms per second, for an error of about 47ppm.
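As a back-of-envelope check on that rate (a sketch using only the figures quoted above):

fn main() {
    let drift_ms = 380.0_f64;      // observed divergence from sled 14
    let elapsed_s = 2.25 * 3600.0; // roughly 2h15m, in seconds
    let rate_ms_per_s = drift_ms / elapsed_s; // ~0.047 ms of drift per second
    let ppm = rate_ms_per_s * 1000.0;         // 1 us/s of drift = 1 ppm
    println!("{rate_ms_per_s:.3} ms/s = {ppm:.0} ppm");
}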

Why might this happen? Recall that sled 16 (the sled tracking sled 17) reports sled 17's boundary NTP as stratum 11, which means sled 17 never considered itself sync'd with upstream. That could explain much of the behavior here. But there remain two questions:

  1. Why did sled 16 start tracking sled 17 instead of sled 14, when sled 14 was reporting stratum 4 while sled 17 was reporting stratum 11?
  2. How did sled 17, newly added, have a clock that was so close to in sync at first, despite being a boundary NTP system that had never sync'd with its upstream?
    • Answer, I think: sled 17 was added in blueprint 34b873c0-578b-41b6-a4fd-e2e32f0fbacc (as sled 9f15aad3-1a92-4c68-bc06-87159fab5764) with an internal NTP zone, thus sync'ing with upstream. In blueprint e79910a5-85e3-419a-a550-5c29e3c91807 (a few blueprints later), that zone was promoted to boundary NTP.

So here's a hypothesis:

  • When sled 17 was most recently set up, it was able to sync its clock as an internal NTP node. At this point, chrony has established some parameters related to the local clock drift to keep things roughly in sync.
  • When its boundary NTP zone was deployed, being a brand new zone, maybe chrony had none of the previously-computed parameters available to it? Instead, maybe it applied some defaults, which were not right, and caused the clock to start drifting. This would normally be corrected by NTP but it (presumably) was disconnected from its upstream like sled 14.

I'm still not sure of the answer to question 1 above.

@davepacheco (Collaborator Author)

Timeline of expungement and CockroachDB cluster status

(hand-assembled from raw data elsewhere in this issue)

2025-02-11T21:49Z sled 14 boundary NTP: first non-1986 time on sled 14 (source: chrony tracking log)
2025-02-11T22:36Z sled 14 boundary NTP: last time it talked to upstream (source: chrony tracking log)
	It lost connectivity.  This is still being investigated.

2025-02-11T22:36:34Z sled BRM23230010 as 6d5906e3-6d45-4dd4-836c-b3d3876186c4 became decommissioned – presumed sled 17 (source: sled table time_modified for decommissioned sled)
	(based on initial system target blueprint (28c2018d-1876-40de-9eba-5f50cea8ad25), it had: 2x CockroachDB + internal NTP) 

2025-02-11 23:51:48Z sled BRM27230037 as 8fa59c49-b8f9-4ebc-915a-6d4ad47a2c5f (sled 15) became decommissioned (source: sled table time_modified for decommissioned sled)
           (based on blueprint 5d2870c4-c2cc-46a6-bfca-c205824aeca1, appears to have had boundary NTP fd00:1122:3344:102::10 + CockroachDB + internal DNS :2)

2025-02-12T01:47:44.908 sled BRM23230010 as 68ca4aa3-c1ce-4b51-963a-b3bdc2c7c858 became decommissioned (sled 17)
	(based on blueprint a994a01f-42c6-4849-a89f-cb380fccabba, appears to have had: boundary NTP (fd00:1122:3344:121::2c), expunged internal NTP)

2025-02-12T03:47:35.455 sled BRM27230037 as 17f710e3-37b6-44c0-86ee-a979967071d8 became decommissioned (sled 15)
	(based on blueprint 28b1579b-d15f-4c7b-a74e-8599f651b91b, appears to have had: boundary NTP (fd00:1122:3344:122::2c))

2025-02-12T05:29:09.924 sled BRM23230010 as e6bfacbc-ac31-475b-865a-8e0a55aeecde became decommissioned (sled 17)
    (based on blueprint e06f88ff-0e4f-4e4d-8b40-8280ec9cf65a, appears to have had: boundary NTP (fd00:1122:3344:123::2c))

2025-02-12T06:45Z CockroachDB starts seeing clocks diverge wildly on n1 (sled 14) (source: CockroachDB console metrics)

2025-02-12T06:47:23.609Z sled BRM27230037 as b36d653a-c4c1-45f7-8a03-79be17781917 became decommissioned (sled 15)
    (based on blueprint 895b6657-f0a9-4674-8cac-1dae04142c93, appears to have had: boundary NTP (fd00:1122:3344:124::2c))

2025-02-12T06:54Z sled 15 (BRM27230037) re-added (source: time_created of current record for sled BRM27230037 in database, above)

2025-02-12T09:15Z CockroachDB node 1 (on sled 14) crashes -- clock too far off (source: CockroachDB metrics)
              CockroachDB went from 5 -> 4 live nodes

2025-02-12T14:46Z Alan powers off sled 17 in preparation for expungement (see log above)
	CockroachDB goes from 4 -> 2 live nodes and is down

About 18:06Z: we clear CockroachDB on sled 14. At this point, the remaining nodes (on sled 16) must have switched to tracking sled 14, so clocks are back in sync, it comes up, and we're debugging.

@jgallagher (Contributor)

Investigating the NTP zone's lack of connectivity

2. Shortly after setup, concurrently with the first sled being expunged (sled 17), the boundary NTP server on sled 14 lost external connectivity. This is not yet understood.

At the time we started debugging, this boundary NTP zone's lack of connectivity was due to a missing OPTE route. The router layer for the NTP zone's OPTE port looked like this:

# opteadm dump-layer -p opte0 router                                                                                                                      
Port opte0 - Layer router                                                                                                                               
======================================================================                                                                                  
Inbound Flows                                                                                                                                           
----------------------------------------------------------------------                                                                                  
PROTO  SRC IP  SPORT  DST IP  DPORT  HITS  ACTION                                                                                                       
                                                                                                                                                        
Outbound Flows                                                                                                                                          
----------------------------------------------------------------------                                                                                  
PROTO  SRC IP  SPORT  DST IP  DPORT  HITS  ACTION                                                                                                       
                                                                                                                                                        
Inbound Rules                                                                                                                                           
----------------------------------------------------------------------                                                                                  
ID   PRI  HITS  PREDICATES  ACTION                                                                                                                      
DEF  --   509   --          "allow"                                                                                                                     
                                                                                                                                                        
Outbound Rules                                                                                                                                          
----------------------------------------------------------------------                                                                                  
ID   PRI  HITS   PREDICATES                           ACTION                                                                                            
7    27   0      inner.ip.dst=172.30.2.0/24           "Meta: Target = Subnet: 172.30.2.0/24"                                                            
6    27   10     inner.ip.dst=172.30.3.0/24           "Meta: Target = Subnet: 172.30.3.0/24"                                                            
5    27   0      inner.ip.dst=172.30.1.0/24           "Meta: Target = Subnet: 172.30.1.0/24"                                                            
8    139  0      inner.ip6.dst=fd77:e9d2:9cd9:3::/64  "Meta: Target = Subnet: fd77:e9d2:9cd9:3::/64"                                                    
3    139  0      inner.ip6.dst=fd77:e9d2:9cd9:1::/64  "Meta: Target = Subnet: fd77:e9d2:9cd9:1::/64"                                                    
2    139  0      inner.ip6.dst=fd77:e9d2:9cd9:2::/64  "Meta: Target = Subnet: fd77:e9d2:9cd9:2::/64"                                                    
1    267  0      inner.ip6.dst=::/0                   "Meta: Target = IG(None)"                                                                         
DEF  --   15029  --                                   "deny"  

Note the very high hit count on the outbound deny rule. Comparing with other similar zones, @bnaecker noticed that we should have an outbound default route:

9    75   79     inner.ip.dst=0.0.0.0/0               "Meta: Target = IG(None)"                                                                         

We manually added this rule via opteadm add-router-entry -p opte0 0.0.0.0/0 ig system, and the zone immediately regained upstream connectivity.


We haven't yet been able to reproduce this. Two theories from walking through the relevant code paths:

  1. On the sled-agent side, we update our in-memory version with the new incoming routes, then later delete entries, then later add entries. We could update our version, succeed in deleting some entry(ies), then fail to add entries; I think we wouldn't retry adding until we got a new version from nexus. However, this would result in a 500 from sled-agent on the PUT /vpc-routes call, which AFAICT never happened on this sled.
  2. On the Nexus side, if vpc_resolve_router_rules returned Ok(set_of_routes) with a set that was missing the 0.0.0.0 route for some reason, we'd push that to the sled agent. That's a pretty complicated datastore method; I don't see an obvious way it could miss the default route.

One possibility for 2: there are a handful of Result::ok() calls inside vpc_resolve_router_rules that discard any errors and treat them as a None result. In a unit test, calling this method to ask for routes for the builtin services VPC router returns this set:

ResolvedVpcRoute { dest: V4(Ipv4Net { addr: 0.0.0.0, width: 0 }), target: InternetGateway(Some(001de000-074c-4000-8000-000000000002)) }
ResolvedVpcRoute { dest: V4(Ipv4Net { addr: 172.30.1.0, width: 24 }), target: VpcSubnet(V4(Ipv4Net { addr: 172.30.1.0, width: 24 })) }
ResolvedVpcRoute { dest: V4(Ipv4Net { addr: 172.30.2.0, width: 24 }), target: VpcSubnet(V4(Ipv4Net { addr: 172.30.2.0, width: 24 })) }
ResolvedVpcRoute { dest: V4(Ipv4Net { addr: 172.30.3.0, width: 24 }), target: VpcSubnet(V4(Ipv4Net { addr: 172.30.3.0, width: 24 })) }
ResolvedVpcRoute { dest: V6(Ipv6Net { addr: ::, width: 0 }), target: InternetGateway(Some(001de000-074c-4000-8000-000000000002)) }
ResolvedVpcRoute { dest: V6(Ipv6Net { addr: fd77:e9d2:9cd9:1::, width: 64 }), target: VpcSubnet(V6(Ipv6Net { addr: fd77:e9d2:9cd9:1::, width: 64 })) }
ResolvedVpcRoute { dest: V6(Ipv6Net { addr: fd77:e9d2:9cd9:2::, width: 64 }), target: VpcSubnet(V6(Ipv6Net { addr: fd77:e9d2:9cd9:2::, width: 64 })) }
ResolvedVpcRoute { dest: V6(Ipv6Net { addr: fd77:e9d2:9cd9:3::, width: 64 }), target: VpcSubnet(V6(Ipv6Net { addr: fd77:e9d2:9cd9:3::, width: 64 })) }

If I hack up the .ok() call on the query that looks up internet gateways, I get back a set that's missing the default routes:

ResolvedVpcRoute { dest: V4(Ipv4Net { addr: 172.30.1.0, width: 24 }), target: VpcSubnet(V4(Ipv4Net { addr: 172.30.1.0, width: 24 })) }
ResolvedVpcRoute { dest: V4(Ipv4Net { addr: 172.30.2.0, width: 24 }), target: VpcSubnet(V4(Ipv4Net { addr: 172.30.2.0, width: 24 })) }
ResolvedVpcRoute { dest: V4(Ipv4Net { addr: 172.30.3.0, width: 24 }), target: VpcSubnet(V4(Ipv4Net { addr: 172.30.3.0, width: 24 })) }
ResolvedVpcRoute { dest: V6(Ipv6Net { addr: fd77:e9d2:9cd9:1::, width: 64 }), target: VpcSubnet(V6(Ipv6Net { addr: fd77:e9d2:9cd9:1::, width: 64 })) }
ResolvedVpcRoute { dest: V6(Ipv6Net { addr: fd77:e9d2:9cd9:2::, width: 64 }), target: VpcSubnet(V6(Ipv6Net { addr: fd77:e9d2:9cd9:2::, width: 64 })) }
ResolvedVpcRoute { dest: V6(Ipv6Net { addr: fd77:e9d2:9cd9:3::, width: 64 }), target: VpcSubnet(V6(Ipv6Net { addr: fd77:e9d2:9cd9:3::, width: 64 })) }

I have neither evidence this actually happened on dublin nor a hypothesis for why this query failed, but if it did, I think it would explain the situation we saw.
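
As a standalone illustration of that failure mode (a minimal sketch with made-up names, not the actual vpc_resolve_router_rules code): when a fallible lookup's error is swallowed with .ok(), a transient database failure becomes indistinguishable from "no internet gateway", and the resolved set silently drops the default route:

use std::collections::BTreeSet;

#[derive(PartialEq, Eq, PartialOrd, Ord)]
enum Target {
    InternetGateway(&'static str),
    VpcSubnet(&'static str),
}

// Stand-in for the datastore query; `db_ok` simulates a transient DB failure.
fn lookup_internet_gateway(db_ok: bool) -> Result<&'static str, &'static str> {
    if db_ok { Ok("001de000-074c-4000-8000-000000000002") } else { Err("transient DB error") }
}

fn resolve_routes(db_ok: bool) -> BTreeSet<(&'static str, Target)> {
    let mut routes = BTreeSet::new();
    routes.insert(("172.30.1.0/24", Target::VpcSubnet("172.30.1.0/24")));
    // `.ok()` discards the error, so a failed lookup just means "no gateway"
    // and the default route never makes it into the set.
    if let Some(igw) = lookup_internet_gateway(db_ok).ok() {
        routes.insert(("0.0.0.0/0", Target::InternetGateway(igw)));
    }
    routes
}

fn main() {
    assert!(resolve_routes(true).iter().any(|(d, _)| *d == "0.0.0.0/0"));
    // With the lookup failing, the set is silently missing 0.0.0.0/0.
    assert!(!resolve_routes(false).iter().any(|(d, _)| *d == "0.0.0.0/0"));
}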

@davepacheco
Copy link
Collaborator Author

I have also saved the output of omdb reconfigurator export into /staff/core/omicron-7534. I'm mostly interested in this for writing better tooling to deal with the stuff I did by hand above.

FelixMcFelix added a commit that referenced this issue Feb 13, 2025
This stems from part of #7534. In understanding that, we have a
hypothetical failure mode where any lookup which fails due to a database
failure (rather than a name lookup) could lead to gaps in routes, which
would require a router bump to retry.

This PR changes resolution of all items such that any unexpected DB failures
(transient or otherwise) *will* cause the router's resolution to bail. It
will take some time before it's recomputed, but we know it will be done.
@jgallagher
Copy link
Contributor

After finishing the expungement of sled 17 yesterday, today we discovered the boundary NTP zone that had been placed on sled 15 was in the same state (i.e., missing the default IPv4 route in opte):

BRM27230037 # opteadm dump-layer -p opte0 router
Port opte0 - Layer router
======================================================================
Inbound Flows
----------------------------------------------------------------------
PROTO  SRC IP  SPORT  DST IP  DPORT  HITS  ACTION

Outbound Flows
----------------------------------------------------------------------
PROTO  SRC IP  SPORT  DST IP  DPORT  HITS  ACTION

Inbound Rules
----------------------------------------------------------------------
ID   PRI  HITS  PREDICATES  ACTION
DEF  --   34    --          "allow"

Outbound Rules
----------------------------------------------------------------------
ID   PRI  HITS  PREDICATES                           ACTION
8    27   0     inner.ip.dst=172.30.1.0/24           "Meta: Target = Subnet: 172.30.1.0/24"
7    27   2     inner.ip.dst=172.30.3.0/24           "Meta: Target = Subnet: 172.30.3.0/24"
3    27   0     inner.ip.dst=172.30.2.0/24           "Meta: Target = Subnet: 172.30.2.0/24"
6    139  0     inner.ip6.dst=fd77:e9d2:9cd9:1::/64  "Meta: Target = Subnet: fd77:e9d2:9cd9:1::/64"
4    139  0     inner.ip6.dst=fd77:e9d2:9cd9:3::/64  "Meta: Target = Subnet: fd77:e9d2:9cd9:3::/64"
2    139  0     inner.ip6.dst=fd77:e9d2:9cd9:2::/64  "Meta: Target = Subnet: fd77:e9d2:9cd9:2::/64"
5    267  0     inner.ip6.dst=::/0                   "Meta: Target = IG(None)"
DEF  --   9943  --                                   "deny"

We confirmed that sled-agent claimed to have the most recent version of the system route (10):

BRM27230037 # curl http://[fd00:1122:3344:126::1]:12345/vpc-routes  | jq
[
  {
    "id": {
      "vni": 100,
      "kind": {
        "type": "system"
      }
    },
    "version": {
      "router_id": "001de000-074c-4000-8000-000000000001",
      "version": 10
    }
  },
  {
    "id": {
      "vni": 100,
      "kind": {
        "type": "custom",
        "subnet": "172.30.3.0/24"
      }
    },
    "version": null
  },
  {
    "id": {
      "vni": 100,
      "kind": {
        "type": "custom",
        "subnet": "172.30.2.0/24"
      }
    },
    "version": null
  }
]

We then started capturing sled-agent debug!-level logs via dtrace:

dtrace -x strsize=8k -q -n 'slog$1:::debug{printf("%s\n", json(copyinstr(arg0),"ok"));}' $(pgrep -nx sled-agent)

and manually sent a PUT /vpc-routes request instructing sled-agent to delete and clear its version for the system router:

BRM27230037 # cat kill-routes.json
[
  {
    "id": {
      "vni": 100,
      "kind": {
        "type": "system"
      }
    },
    "version": null,
    "routes": []
  }
]
BRM27230037 # curl -H "Content-Type: application/json" -X PUT --data @kill-routes.json http://[fd00:1122:3344:126::1]:12345/vpc-routes

sled-agent logged that its in-memory set contained 8 routes (now all queued for deletion), including 0.0.0.0/0:

vpc router ensure to_delete: {
    ResolvedVpcRoute {
        dest: V4(
            Ipv4Net {
                addr: 0.0.0.0,
                width: 0,
            },
        ),
        target: InternetGateway(
            Some(
                001de000-074c-4000-8000-000000000002,
            ),
        ),
    },
    ResolvedVpcRoute {
        dest: V6(
            Ipv6Net {
                addr: fd77:e9d2:9cd9:1::,
                width: 64,
            },
        ),
        target: VpcSubnet(
            V6(
                Ipv6Net {
                    addr: fd77:e9d2:9cd9:1::,
                    width: 64,
                },
            ),
        ),
    },
    ResolvedVpcRoute {
        dest: V6(
            Ipv6Net {
                addr: fd77:e9d2:9cd9:3::,
                width: 64,
            },
        ),
        target: VpcSubnet(
            V6(
                Ipv6Net {
                    addr: fd77:e9d2:9cd9:3::,
                    width: 64,
                },
            ),
        ),
    },
    ResolvedVpcRoute {
        dest: V4(
            Ipv4Net {
                addr: 172.30.3.0,
                width: 24,
            },
        ),
        target: VpcSubnet(
            V4(
                Ipv4Net {
                    addr: 172.30.3.0,
                    width: 24,
                },
            ),
        ),
    },
    ResolvedVpcRoute {
        dest: V6(
            Ipv6Net {
                addr: ::,
                width: 0,
            },
        ),
        target: InternetGateway(
            Some(
                001de000-074c-4000-8000-000000000002,
            ),
        ),
    },
    ResolvedVpcRoute {
        dest: V6(
            Ipv6Net {
                addr: fd77:e9d2:9cd9:2::,
                width: 64,
            },
        ),
        target: VpcSubnet(
            V6(
                Ipv6Net {
                    addr: fd77:e9d2:9cd9:2::,
                    width: 64,
                },
            ),
        ),
    },
    ResolvedVpcRoute {
        dest: V4(
            Ipv4Net {
                addr: 172.30.1.0,
                width: 24,
            },
        ),
        target: VpcSubnet(
            V4(
                Ipv4Net {
                    addr: 172.30.1.0,
                    width: 24,
                },
            ),
        ),
    },
    ResolvedVpcRoute {
        dest: V4(
            Ipv4Net {
                addr: 172.30.2.0,
                width: 24,
            },
        ),
        target: VpcSubnet(
            V4(
                Ipv4Net {
                    addr: 172.30.2.0,
                    width: 24,
                },
            ),
        ),
    },
}

The next request from Nexus that re-sent version 10 also had 8 routes, including 0.0.0.0/0:

new routes: [
    ResolvedVpcRouteSet {
        id: RouterId {
            vni: Vni(
                100,
            ),
            kind: System,
        },
        version: Some(
            RouterVersion {
                router_id: 001de000-074c-4000-8000-000000000001,
                version: 10,
            },
        ),
        routes: {
            ResolvedVpcRoute {
                dest: V6(
                    Ipv6Net {
                        addr: fd77:e9d2:9cd9:3::,
                        width: 64,
                    },
                ),
                target: VpcSubnet(
                    V6(
                        Ipv6Net {
                            addr: fd77:e9d2:9cd9:3::,
                            width: 64,
                        },
                    ),
                ),
            },
            ResolvedVpcRoute {
                dest: V6(
                    Ipv6Net {
                        addr: fd77:e9d2:9cd9:2::,
                        width: 64,
                    },
                ),
                target: VpcSubnet(
                    V6(
                        Ipv6Net {
                            addr: fd77:e9d2:9cd9:2::,
                            width: 64,
                        },
                    ),
                ),
            },
            ResolvedVpcRoute {
                dest: V4(
                    Ipv4Net {
                        addr: 172.30.3.0,
                        width: 24,
                    },
                ),
                target: VpcSubnet(
                    V4(
                        Ipv4Net {
                            addr: 172.30.3.0,
                            width: 24,
                        },
                    ),
                ),
            },
            ResolvedVpcRoute {
                dest: V6(
                    Ipv6Net {
                        addr: fd77:e9d2:9cd9:1::,
                        width: 64,
                    },
                ),
                target: VpcSubnet(
                    V6(
                        Ipv6Net {
                            addr: fd77:e9d2:9cd9:1::,
                            width: 64,
                        },
                    ),
                ),
            },
            ResolvedVpcRoute {
                dest: V4(
                    Ipv4Net {
                        addr: 172.30.2.0,
                        width: 24,
                    },
                ),
                target: VpcSubnet(
                    V4(
                        Ipv4Net {
                            addr: 172.30.2.0,
                            width: 24,
                        },
                    ),
                ),
            },
            ResolvedVpcRoute {
                dest: V4(
                    Ipv4Net {
                        addr: 0.0.0.0,
                        width: 0,
                    },
                ),
                target: InternetGateway(
                    Some(
                        001de000-074c-4000-8000-000000000002,
                    ),
                ),
            },
            ResolvedVpcRoute {
                dest: V6(
                    Ipv6Net {
                        addr: ::,
                        width: 0,
                    },
                ),
                target: InternetGateway(
                    Some(
                        001de000-074c-4000-8000-000000000002,
                    ),
                ),
            },
            ResolvedVpcRoute {
                dest: V4(
                    Ipv4Net {
                        addr: 172.30.1.0,
                        width: 24,
                    },
                ),
                target: VpcSubnet(
                    V4(
                        Ipv4Net {
                            addr: 172.30.1.0,
                            width: 24,
                        },
                    ),
                ),
            },
        },
    },
    ResolvedVpcRouteSet {
        id: RouterId {
            vni: Vni(
                100,
            ),
            kind: Custom(
                V4(
                    Ipv4Net {
                        addr: 172.30.3.0,
                        width: 24,
                    },
                ),
            ),
        },
        version: None,
        routes: {},
    },
    ResolvedVpcRouteSet {
        id: RouterId {
            vni: Vni(
                100,
            ),
            kind: Custom(
                V4(
                    Ipv4Net {
                        addr: 172.30.2.0,
                        width: 24,
                    },
                ),
            ),
        },
        version: None,
        routes: {},
    },
]

and after sled-agent processed that request, opteadm reported the expected set of routes and the NTP zone had connectivity:

BRM27230037 # opteadm dump-layer -p opte0 router
Port opte0 - Layer router
======================================================================
Inbound Flows
----------------------------------------------------------------------
PROTO  SRC IP  SPORT  DST IP  DPORT  HITS  ACTION

Outbound Flows
----------------------------------------------------------------------
PROTO  SRC IP  SPORT  DST IP  DPORT  HITS  ACTION

Inbound Rules
----------------------------------------------------------------------
ID   PRI  HITS  PREDICATES  ACTION
DEF  --   124   --          "allow"

Outbound Rules
----------------------------------------------------------------------
ID   PRI  HITS   PREDICATES                           ACTION
16   27   0      inner.ip.dst=172.30.3.0/24           "Meta: Target = Subnet: 172.30.3.0/24"
13   27   0      inner.ip.dst=172.30.1.0/24           "Meta: Target = Subnet: 172.30.1.0/24"
11   27   0      inner.ip.dst=172.30.2.0/24           "Meta: Target = Subnet: 172.30.2.0/24"
10   75   90     inner.ip.dst=0.0.0.0/0               "Meta: Target = IG(None)"
14   139  0      inner.ip6.dst=fd77:e9d2:9cd9:2::/64  "Meta: Target = Subnet: fd77:e9d2:9cd9:2::/64"
12   139  0      inner.ip6.dst=fd77:e9d2:9cd9:1::/64  "Meta: Target = Subnet: fd77:e9d2:9cd9:1::/64"
9    139  0      inner.ip6.dst=fd77:e9d2:9cd9:3::/64  "Meta: Target = Subnet: fd77:e9d2:9cd9:3::/64"
15   267  0      inner.ip6.dst=::/0                   "Meta: Target = IG(None)"
DEF  --   10243  --                                   "deny"

BRM27230037 # zlogin $(zoneadm list | grep ntp)
[Connected to zone 'oxz_ntp_ec3f76ac-894a-4cfa-b0da-b136b452a52b' pts/7]
Last login: Thu Feb 13 16:24:48 on pts/3
The illumos Project     helios-2.0.23216        February 2025
root@oxz_ntp_ec3f76ac:~# ping 1.1.1.1
1.1.1.1 is alive

This seems to imply that the problem is on the sled-agent side: it knew there were supposed to be 8 routes (which it presumably got from Nexus), but only 7 were present in OPTE. Manually clearing its route set and version allowed it to process the existing set from Nexus again, which restored things to working order.
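
For what it's worth, here's a rough sketch (invented names, not the actual sled-agent code) of why re-sending the same version is a no-op while clearing the version forces a full re-sync: the delete/add diff is computed against the agent's in-memory view, so once that view and OPTE diverge, only a version change (or a reset like the kill-routes.json request above) causes another push into OPTE:

use std::collections::BTreeSet;

#[derive(Default)]
struct RouteSetState {
    version: Option<u64>,
    known: BTreeSet<&'static str>, // what the agent believes is programmed
}

fn ensure(
    state: &mut RouteSetState,
    version: Option<u64>,
    desired: BTreeSet<&'static str>,
    opte: &mut BTreeSet<&'static str>,
) {
    // Pushes that don't advance the version are ignored (the steady state).
    if version.is_some() && version <= state.version {
        return;
    }
    for gone in state.known.difference(&desired) {
        opte.remove(gone);
    }
    for new in desired.difference(&state.known) {
        // In the real system this is a dataplane update that can fail.
        opte.insert(*new);
    }
    state.known = desired;
    state.version = version;
}

fn main() {
    let desired: BTreeSet<_> = ["0.0.0.0/0", "172.30.1.0/24"].into_iter().collect();
    let (mut state, mut opte) = (RouteSetState::default(), BTreeSet::new());
    ensure(&mut state, Some(10), desired.clone(), &mut opte);
    opte.remove("0.0.0.0/0"); // simulate the observed divergence
    ensure(&mut state, Some(10), desired.clone(), &mut opte); // same version: no-op
    assert!(!opte.contains("0.0.0.0/0"));
    ensure(&mut state, None, BTreeSet::new(), &mut opte); // the kill-routes.json reset
    ensure(&mut state, Some(10), desired, &mut opte); // Nexus's re-push restores it
    assert!(opte.contains("0.0.0.0/0"));
}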

@citrus-it
Copy link
Contributor

Why might this happen? Recall that the sled tracking sled 17 is reporting sled 17's boundary NTP as stratum 11, which means sled 17 never thinks it sync’d up with upstream. That could explain much of the behavior here.

This actually means something different from what it did in early releases.
A boundary NTP server contains the following configuration line:

# Enable local reference mode, which keeps us operating as an NTP server that
# appears synchronised even if there are currently no active upstreams. When
# in this mode, we report as stratum 10 to clients. The `distance' parameter
# controls when we will decide to abandon the upstreams and switch to the local
# reference. By setting `activate`, we prevent the server from ever activating
# its local reference until it has synchronised with upstream at least once and
# the root distance has dropped below the provided threshold. This prevents
# a boundary server in a cold booted rack from authoritatively advertising a
# time from the 1980s prior to gaining external connectivity.
#
# distance: Distance from root above which we use the local reference, opting
#           to ignore the upstream.
# activate: Distance from root below which we must fall once to ever consider
#           the local reference.
#
local stratum 10 distance 0.4 activate 0.5

If a boundary server is advertising with stratum 10 it therefore means two things:

  • It is not currently synchronised with an upstream;
  • It, however, /was/ once successfully synchronised with an upstream with a short distance to the root.

If it was never synchronised, then it will never advertise an authoritative time; this is deliberate, to avoid potential race conditions around sled boot when the NTP zone lacks upstream connectivity.
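
In other words (a rough sketch of the decision described in the config comment above, with the thresholds named after the directive's parameters rather than chrony's internals):

// Local reference (stratum 10) is served only if we have ever synced with a
// root distance below `activate`, and the current root distance has since
// climbed above `distance`.
fn serves_local_stratum_10(ever_synced_within_activate: bool, root_distance: f64) -> bool {
    const DISTANCE: f64 = 0.4;
    ever_synced_within_activate && root_distance > DISTANCE
}

fn main() {
    // Never synced: never authoritative, regardless of distance.
    assert!(!serves_local_stratum_10(false, 10.0));
    // Synced once, but root distance still ~0.026: keeps serving stratum 2/3.
    assert!(!serves_local_stratum_10(true, 0.026));
}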

In my experience the clocks across a fleet of gimlets running free are very consistent, but we should perhaps look at the units in Dublin.

@citrus-it
Copy link
Contributor

Looking at sled 17 in dublin which appears to be in the same state again, and knowing what actions were performed, I suspect that the following sequence of events has occurred:

  • The sled was up and running with an internal NTP zone, and therefore had synchronised time;
  • The internal NTP zone was expunged;
  • The clock drifted slightly in this window - not unexpected;
  • A boundary NTP zone was created;
  • That new zone came online, synchronised with an upstream server and started slewing to bring the clock back into line;
  • The tracking.log has more detail, but the initial applied skew was 660.647 ppm and dropped quickly each second; 26 seconds after synchronisation started it had reduced to 1.203 ppm;
  • At this point, upstream connectivity failed (that's a separate omicron/nexus bug which is being investigated);
  • The server steadfastly remains advertising itself as stratum 2, and the applied clock skew is still 1.203 ppm;
  • That has resulted in it advertising (very authoritatively) a time which is increasingly wrong.

The configuration states that the server should revert to the local reference at stratum 10 once the root distance increases above 0.4. That distance is calculated as $delay / 2 + dispersion$.

Here is the current state - some 89 minutes after connectivity was lost:

Reference ID    : AC140005 (172.20.0.5)
Stratum         : 3
Ref time (UTC)  : Thu Feb 13 18:33:28 2025
System time     : 0.000000059 seconds slow of NTP time
Last offset     : +0.000001809 seconds
RMS offset      : 0.000014176 seconds
Frequency       : 47.053 ppm slow
Residual freq   : +0.018 ppm
Skew            : 1.203 ppm
Root delay      : 0.025872087 seconds
Root dispersion : 0.013005064 seconds
Update interval : 4.1 seconds
Leap status     : Normal

Which means that $distance = 0.025872087 / 2 * 0.013005064 = 0.000168$ - it certainly has not risen to 0.4

I've placed the chrony logs, current state snapshot and core files from both chronyd processes in this zone at /staff/core/omicron-7534/chrony.

@jordanhendricks
Copy link
Contributor

Which means that distance = 0.025872087 / 2 ∗ 0.013005064 = 0.000168 - it certainly has not risen to 0.4

Minor typo, but that should be a +, no? So it is:

0.025872087 / 2 + 0.013005064 = 0.0259411075

which is still not close to 0.4, but just double-checking my understanding
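
For reference, the corrected calculation as a tiny sketch (root delay and dispersion taken from the chronyc tracking output above):

fn root_distance(root_delay: f64, root_dispersion: f64) -> f64 {
    root_delay / 2.0 + root_dispersion
}

fn main() {
    let d = root_distance(0.025872087, 0.013005064);
    println!("root distance = {d:.10}"); // 0.0259411075
    assert!(d < 0.4); // nowhere near the fallback threshold
}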

@jordanhendricks
Copy link
Contributor

jordanhendricks commented Feb 14, 2025

I spent some time today looking into understanding why the boundary ntp server on sled 17 was behaving as it was. Andy's comment here (#7534 (comment)) has a good summary of what he thinks happened.

A couple of things that threw me off initially coming up to speed on this particular issue:

  • the evidence we have of what sled 17's boundary ntp zone was doing at the time this issue was initially investigated is only from other sleds, because sled 17 was clean slated. So the data in /staff/core/omicron-7534/chrony is from a reproduction of the issue, and the timestamps are different from the ones from the initial investigation.
  • as such, there is a boundary ntp zone on sled 17 as of the time I was looking at it, and as of this writing, because a new one was created. It also doesn't, as of right now, have external connectivity, as the issue causing the loss of external connectivity was reproduced as part of investigating "Creating a new service OPTE port causes loss of connectivity for preexisting service OPTE ports" (#7541)
  • chronyc activity will report that the source is online even though it hasn't had connectivity for some time. This is not unexpected, since AIUI online/offline is a flag one can set. (chronyc -n sources shows that the source hasn't been reached for hours in the LastRx column, for example).
  • Another (likely incorrect) idea I had was that perhaps the boundary ntp server had attempted to step the clock 3 times on startup (since we configure it as such with makestep) and didn't have connectivity during that time, leaving the clock wrong indefinitely. Andy also pointed out this wasn't likely because the clock drift was likely under 0.1 seconds when the new boundary server started up.

As pointed out in Andy's comment, the distance at which the boundary ntp server is configured to fall back to stratum 10 is 0.4. The distance has increased since his comment, but not by a lot: the root delay is still the same, but the dispersion is increasing:

root@oxz_ntp_8a93967c:~# chronyc -n tracking
Reference ID    : AC140005 (172.20.0.5)
Stratum         : 3
Ref time (UTC)  : Thu Feb 13 18:33:28 2025
System time     : 0.000000101 seconds slow of NTP time
Last offset     : +0.000001809 seconds
RMS offset      : 0.000014176 seconds
Frequency       : 47.053 ppm slow
Residual freq   : +0.018 ppm
Skew            : 1.203 ppm
Root delay      : 0.025872087 seconds
Root dispersion : 0.044839203 seconds
Update interval : 4.1 seconds
Leap status     : Normal

That's a distance of ~0.0577, up from ~0.0259 several hours ago. Assuming the distance increases linearly (which would make sense if only the dispersion has changed in the past several hours), and if my back of the envelope math is correct, that would mean about 12 hours until we hit the threshold distance for the boundary server to raise its stratum value.

For comparison, here is the other current (working) boundary ntp tracking info from sled 14, which we can compute has a distance of ~0.0136.

root@oxz_ntp_97dbfd9a:~# chronyc -n tracking
Reference ID    : AC140005 (172.20.0.5)
Stratum         : 3
Ref time (UTC)  : Fri Feb 14 00:07:53 2025
System time     : 0.000002403 seconds slow of NTP time
Last offset     : +0.000001394 seconds
RMS offset      : 0.000021891 seconds
Frequency       : 60.341 ppm slow
Residual freq   : +0.003 ppm
Skew            : 0.842 ppm
Root delay      : 0.026359992 seconds
Root dispersion : 0.000425011 seconds
Update interval : 8.1 seconds
Leap status     : Normal

This leaves us with at least two questions to explore:

  • We know we can reproduce this issue (boundary zone losing connectivity with some amount of clock skew, then drifting forever while still advertising itself as stratum 2) using the other (opte-related) bug above. It seems plausible that we could reproduce it in a more controlled environment by deploying our ntp configuration in a zone on a different system, then artificially inducing a loss of network connectivity with something like ipd. Is this issue reproducible in such an environment? If so, should we consider changing the distance at which to fall back to stratum 10 (or assess other config options)? And if not, is there something distinct about sled 17's circumstances that is causing the root dispersion to increase at the rate it is (seemingly not very fast)?
  • Another thing to explore is how chrony handles calculating the root delay and root dispersion values when it doesn't have connectivity with the source.

@morlandi7 morlandi7 added this to the 13 milestone Feb 14, 2025
@citrus-it
Copy link
Contributor

citrus-it commented Feb 15, 2025

I spun up an OmniOS VM on colo to get some more data on how chrony behaves when it is:

  • configured with a single upstream server;
  • with that upstream server specified by IP address;
    and then what happens if:
  • connectivity to that upstream server is severed.

Here is the starting stable state:

root@chrony:~# chronyc tracking
Reference ID    : D8EF2304 (time2.google.com)
Stratum         : 2
Ref time (UTC)  : Sat Feb 15 17:12:02 2025
System time     : 0.000010400 seconds slow of NTP time
Last offset     : -0.000019197 seconds
RMS offset      : 0.000161933 seconds
Frequency       : 25.781 ppm fast
Residual freq   : -0.007 ppm
Skew            : 0.926 ppm
Root delay      : 0.014468231 seconds
Root dispersion : 0.000747685 seconds
Update interval : 64.6 seconds
Leap status     : Normal

root@chrony:~# chronyc sources
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
^* time2.google.com              1   7   377   193    -92us[ -112us] +/- 7723us

Running a simple monitoring script to track root delay, dispersion and distance.

#!/bin/ksh

while :; do
        out=`chronyc -c tracking`
        IFS=, set -- $out
        delay=${11}
        disp=${12}
        ((distance = delay / 2 + disp))
        printf "%s %.5f %.5f -- %.5f\n" `date` delay disp distance
        sleep 10
done

We're off to a good start, and then I remove connectivity by adding a blackhole route:

15 February 2025 at 17:20:22 UTC 0.01446 0.00092 -- 0.00816
15 February 2025 at 17:20:32 UTC 0.01446 0.00094 -- 0.00818
15 February 2025 at 17:20:42 UTC 0.01446 0.00083 -- 0.00806
15 February 2025 at 17:20:52 UTC 0.01446 0.00084 -- 0.00808
15 February 2025 at 17:21:02 UTC 0.01446 0.00086 -- 0.00810
15 February 2025 at 17:21:12 UTC 0.01446 0.00088 -- 0.00812
15 February 2025 at 17:21:22 UTC 0.01446 0.00090 -- 0.00813
route add -host -blackhole 216.239.35.4 127.0.0.1
add host 216.239.35.4: gateway 127.0.0.1
root@chrony:~#
15 February 2025 at 17:21:32 UTC 0.01446 0.00092 -- 0.00815
15 February 2025 at 17:21:42 UTC 0.01446 0.00094 -- 0.00817
15 February 2025 at 17:21:52 UTC 0.01446 0.00095 -- 0.00819
15 February 2025 at 17:22:02 UTC 0.01446 0.00097 -- 0.00821
15 February 2025 at 17:22:12 UTC 0.01446 0.00099 -- 0.00822
15 February 2025 at 17:22:22 UTC 0.01446 0.00101 -- 0.00824

Some time later...

15 February 2025 at 19:27:22 UTC 0.01446 0.01470 -- 0.02193
15 February 2025 at 19:27:32 UTC 0.01446 0.01472 -- 0.02195

This is consistent with what we saw: the root distance is not increasing quickly, and chrony is not going to fall back to its local source any time soon.

At this point we've been down for over two hours.

omnios@chrony:~$ chronyc sources -v

  .-- Source mode  '^' = server, '=' = peer, '#' = local clock.
 / .- Source state '*' = current best, '+' = combined, '-' = not combined,
| /             'x' = may be in error, '~' = too variable, '?' = unusable.
||                                                 .- xxxx [ yyyy ] +/- zzzz
||      Reachability register (octal) -.           |  xxxx = adjusted offset,
||      Log2(Polling interval) --.      |          |  yyyy = measured offset,
||                                \     |          |  zzzz = estimated error.
||                                 |    |           \
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
^? time2.google.com              1  10     0  130m   +856us[ +934us] +/- 8720us

@citrus-it
Copy link
Contributor

Eventually that test system does fall back to stratum 10:

root@chrony:~# chronyc tracking
Reference ID    : 7F7F0101 ()
Stratum         : 10
Ref time (UTC)  : Tue Feb 18 18:15:18 2025
System time     : 0.000000601 seconds fast of NTP time
Last offset     : +0.000077837 seconds
RMS offset      : 0.000142738 seconds
Frequency       : 25.940 ppm fast
Residual freq   : +0.000 ppm
Skew            : 0.000 ppm
Root delay      : 0.000000000 seconds
Root dispersion : 0.000000000 seconds
Update interval : 129.7 seconds
Leap status     : Normal

but this is after 60 hours.

18 February 2025 at 04:58:46 UTC 0.01446 0.39267 -- 0.39991
18 February 2025 at 04:58:56 UTC 0.01446 0.39269 -- 0.39993
18 February 2025 at 04:59:06 UTC 0.01446 0.39271 -- 0.39994
18 February 2025 at 04:59:16 UTC 0.01446 0.39273 -- 0.39996
18 February 2025 at 04:59:26 UTC 0.01446 0.39275 -- 0.39998
18 February 2025 at 04:59:36 UTC 0.01446 0.39276 -- 0.40000
18 February 2025 at 04:59:46 UTC 0.00000 0.00000 -- 0.00000
18 February 2025 at 04:59:56 UTC 0.00000 0.00000 -- 0.00000
18 February 2025 at 05:00:06 UTC 0.00000 0.00000 -- 0.00000
18 February 2025 at 05:00:16 UTC 0.00000 0.00000 -- 0.00000
18 February 2025 at 05:00:26 UTC 0.00000 0.00000 -- 0.00000

This is obviously not terribly quick or desirable, especially if the other boundary time server is also serving stratum 2 but has an active upstream.

So, how do we improve things here?

  1. We should consider (as has already been suggested) having the boundary servers reference each other as well as the configured upstream external to the rack;
  2. We probably want the root dispersion to increase faster than the current value of $1\mu s/s$. The value used by most other NTP server implementations, and the default described in RFC 5905, is $15\mu s/s$, which amounts to about 1.3s per day. We'd have dropped to stratum 10 after 4 hours instead of 60 -- probably still not fast enough for our needs;
  3. We may want to introduce some sort of fast fail parameter to chrony's local configuration statement, or some other configurable values that set limits on how long a now absent upstream will be considered viable.

FelixMcFelix added a commit that referenced this issue Feb 20, 2025
This stems from attempting to understand #7534/#7541. In reviewing
that, we have a hypothetical failure mode where any lookup which fails
due to a database failure (rather than a name lookup) could lead to gaps
in routes, which would require a router bump to retry.

This PR changes resolution of all items such that any unexpected DB
failures (transient or otherwise) *will* cause the router's resolution
to bail. It will take some time before it's recomputed, but we know it
will be done.
hawkw pushed a commit that referenced this issue Feb 21, 2025
This stems from attempting to understand #7534/#7541. In reviewing
that, we have a hypothetical failure mode where any lookup which fails
due to a database failure (rather than a name lookup) could lead to gaps
in routes, which would require a router bump to retry.

This PR changes resolution of all items such that any unexpected DB
failures (transient or otherwise) *will* cause the router's resolution
to bail. It will take some time before it's recomputed, but we know it
will be done.