feature-request: trivial coordinator heartbeat router #58

Open
maxgruber19 opened this issue Feb 3, 2025 · 1 comment
maxgruber19 commented Feb 3, 2025

I'd like to have a very simple router that just sends an HTTP GET to the Trino coordinator to learn its state.

We currently use a custom PythonScript router that requests https://trino-coordinator-default/v1/info and checks that the "starting" field is false. This gives us a very basic "queuing procedure", but clients die after a couple of seconds when they don't get feedback from the lb, because it is stuck in its routing loop. I'll attach a basic example below. Of course, this setup limits routing to a single cluster instead of routing across multiple clusters dynamically.

The behavior I'd like to propose is that trino-lb sends back "QUEUED_IN_TRINO_LB" for as long as it is waiting for the coordinator to be alive again. Unfortunately I have no clue about Rust, so I don't feel ready to propose code myself.

If something like this already exists, I'd be very curious to know.

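import time
from typing import Optional

import requests

COORDINATOR_INFO_URL = "https://trino-coordinator-default.mesh-platform-core.svc.cluster.local:8443/v1/info"
CA_BUNDLE = "/etc/secret-provisioner-tls/ca.crt"


def isCoordinatorReady() -> bool:
  # Ready means: /v1/info answers 200 and reports "starting": false.
  try:
    # The timeout keeps a hanging coordinator from blocking the check forever.
    response = requests.get(COORDINATOR_INFO_URL, verify=CA_BUNDLE, timeout=5)
    return response.status_code == 200 and not response.json()["starting"]
  except (requests.RequestException, ValueError, KeyError):
    # Unreachable coordinator or malformed response: treat as not ready.
    return False


def targetClusterGroup(query: str, headers: dict[str, str]) -> Optional[str]:
  # Blocks the routing call until the coordinator is up.
  # This is the loop that makes clients time out.
  while not isCoordinatorReady():
    time.sleep(10)
  return "my-single-cluster"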

maxgruber19 commented

I thought about this again and came up with a simpler solution: make routingFallback optional. When a PythonScript router returns None or throws an exception, the lb could treat all clusters as non-routable and fall back to the "QUEUED_IN_TRINO_LB" state, similar to how an empty collection of Trino clusters is handled. Maybe that makes the changes much easier?
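
To illustrate, here is a minimal sketch of what the router script could look like if the lb behaved that way. It assumes the proposed behavior (a None result leads to "QUEUED_IN_TRINO_LB"), which does not exist today as far as I know. The script probes the coordinator once and returns immediately instead of sleeping in a loop. The `probe` parameter and the `infoSaysReady` helper are hypothetical names added only to make the sketch testable, the URL is the shortened one from the description, and the CA bundle handling from the script above is left out for brevity. It uses urllib from the standard library instead of requests.

```python
import json
import urllib.request
from typing import Callable, Optional

# Assumption: the shortened coordinator URL from the issue description.
INFO_URL = "https://trino-coordinator-default/v1/info"


def infoSaysReady(status: int, body: bytes) -> bool:
  # Ready means: HTTP 200 and a JSON body with "starting": false.
  if status != 200:
    return False
  try:
    return not json.loads(body)["starting"]
  except (ValueError, KeyError, TypeError):
    # Body is not JSON, or has no "starting" field.
    return False


def isCoordinatorReady(url: str = INFO_URL, timeout: float = 2.0) -> bool:
  # Single probe with a short timeout; any error counts as "not ready".
  try:
    with urllib.request.urlopen(url, timeout=timeout) as response:
      return infoSaysReady(response.status, response.read())
  except Exception:
    return False


def targetClusterGroup(
  query: str,
  headers: dict[str, str],
  probe: Callable[[], bool] = isCoordinatorReady,  # hypothetical hook for testing
) -> Optional[str]:
  # Never blocks: returning None is meant to signal "nothing routable right now",
  # which under the proposal would make the lb answer QUEUED_IN_TRINO_LB.
  return "my-single-cluster" if probe() else None
```

With this shape the client would get an answer from the lb right away and keep polling, instead of timing out while the script sleeps.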
