-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding query_id attr #67
Conversation
To identify different queries with similar query_id or initial_query_id via system.query_log table you can have an ability to set query_id by yourself using query_id param in Airflow operators you using. This commit allows set query_id param as whatever you want, f.e. query_id = DAG_ID + TASK_ID + EXECUTION_DATE and then simply identify a query to build lineage. This can be helpful in situations when you manage equally named tables and the only way to identify their queries is to use a modfied query_id.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @1ng4lipt and thank you for your contribution 🙏
-
Which
clickhouse-driver
version hasquery_id
been supported starting from? We need to update our requirements constraints accordingly. -
What happens if two queries in ClickHouse has the same query ID? Will it result in an exception or just
system.query_log
will have two entries with the same IDs? -
To make it more comprehensive, could you please also add support to
ClickHouseOperator
?
airflow-clickhouse-plugin/airflow_clickhouse_plugin/operators/clickhouse_operator.py
Line 35 in 368f69a
return hook.run(self._sql, self._parameters)
Current version ofClickHouseOperator
is outdated compared toClickHouseHook
: could you please addwith_column_types
,types_check
, andquery_id
toClickHouseOperator.__init__
and propagate them tohook.run
call inClickHouseOperator.execute
? BTW are there any otherclickhouse-driver
'sexecute
arguments which the current implementation misses? Let's add them too maybe?
If you are unable to do it within this PR, please let me know: I will create an issue to backlog this enhancement and we may do it as a part of another PR later.
- Please also add tests:
class ClickHouseOperatorTestCase(unittest.TestCase):
Hi @1ng4lipt, |
Hi @ne1r0n thank you for a suggestion: definitely looks useable for ClickHouse users. Speaking about our plugin, its sole purpose is to provide a wrapper for I would suggest generalize this PR to include all the existing |
Hi @bryzgaloff, @ne1r0n and thank you for a suggestions and participation in reviewing my PR @bryzgaloff answers on your questions from here
|
clickhouse_conn_id: str = default_conn_name, | ||
parameters: Optional[Dict[str, Any]] = None, | ||
database: Optional[str] = None, | ||
with_column_types: Optional[bool] = False, | ||
types_check: Optional[bool] = False, | ||
query_id: Optional[str] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@1ng4lipt my idea for this PR was to add all the parameters supported by execute
method to all three: hook, operator, and sensor. The execute parameters are available here: https://clickhouse-driver.readthedocs.io/en/latest/api.html#clickhouse_driver.Client.execute — could you please add them all and propagate through all the classes and the tests?
Could you please also check which earliest version of clickhouse-driver
supports all the current execute
method arguments?
If you are busy to add the requested, please let me know: it's ok to accept the PR at the current stage. I will just create an issue to backlog adding other the parameters. Your contribution is already valuable, thanks! 🤝
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bryzgaloff I see.. Could you please create an issue for all of this so I can modify those files later?
Hi @1ng4lipt thank you for an update, I plan to review and merge this week. I am busy working on v1.0.0 (major release) of the plugin but to not delay |
Hi @bryzgaloff could you please clarify a date you plan to merge my PR? I badly need this functionality on my prod env |
Hi @1ng4lipt I have released https://github.com/bryzgaloff/airflow-clickhouse-plugin/releases/tag/v1.0.0 which introduces You are in the contributors list now 😉 |
To identify different queries with similar query_id or initial_query_id via system.query_log table you can have an ability to set query_id by yourself using query_id param in Airflow operators you using.
It is possible by clickhouse-driver execute() method - https://clickhouse-driver.readthedocs.io/en/latest/api.html?highlight=query_id#clickhouse_driver.Client.execute_iter
This commit allows set query_id param as whatever you want, f.e. query_id = DAG_ID + TASK_ID + EXECUTION_DATE and then simply identify a query to build lineage. This can be helpful in situations when you manage equally named tables and the only way to identify their queries is to use a modfied query_id.