Suggesting transition to new approach of acquiring column name #168

dongwook-chan · 2023-09-17T11:52:49Z

I'm a contributor to python-mysql-replication and you might remember me from the issue. I would like your opinion on eliminating table_map in python-mysql-replication.

Background:
The current approach in python-mysql-replication for gathering the column schema is to run a SELECT against information_schema.columns. pg_chameleon appears to rely on the column names specifically:

pg_chameleon/pg_chameleon/lib/mysql_lib.py

Lines 1428 to 1441 in 3e431ec

    for column_name in event_after:
        try:
            column_type=column_map[column_name]
        except KeyError:
            self.logger.debug("Detected inconsistent structure for the table  %s. The replay may fail. " % (table_name))
            column_type = 'text'
        if column_type in self.hexify and event_after[column_name]:
            event_after[column_name]=binascii.hexlify(event_after[column_name]).decode()
        elif column_type in self.hexify and isinstance(event_after[column_name], bytes):
            event_after[column_name] = ''
        elif column_type == 'json':
            event_after[column_name] = self.__decode_dic_keys(event_after[column_name])
        elif column_type in self.spatial_datatypes and event_after[column_name]:
            event_after[column_name] = self.__get_text_spatial(event_after[column_name])

I have used python-mysql-replication against a MySQL server that handles 2,500+ qps. When there is a replication gap, the SELECT returns the column schema as of the time the SELECT runs, not as of the time the event was generated. As a result, the library can report wrong column names, names in the wrong order, or dummy column names that do not exist in MySQL.
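To make the failure mode concrete, here is a minimal toy sketch in plain Python. It does not use python-mysql-replication at all; the `name_columns` helper, the `UNKNOWN_COL` placeholder, and the example schemas are all hypothetical and exist only to show how positional row values get mislabeled when names come from a schema read later than the event.

```python
# Toy model of the race: row events carry values by position only,
# and names are looked up separately from whatever schema is read later.

def name_columns(values, column_names):
    """Pair positional row values with column names, padding missing names."""
    named = {}
    for index, value in enumerate(values):
        if index < len(column_names):
            name = column_names[index]
        else:
            # Hypothetical placeholder for a position the schema no longer covers.
            name = f"UNKNOWN_COL{index}"
        named[name] = value
    return named

# Schema at the moment the event was written to the binlog.
schema_at_event_time = ["id", "email", "nickname"]
# Schema when the lagging reader runs its SELECT: "email" was dropped meanwhile.
schema_at_select_time = ["id", "nickname"]

row_values = [7, "a@example.com", "alice"]

correct = name_columns(row_values, schema_at_event_time)
stale = name_columns(row_values, schema_at_select_time)
```

With the stale schema, the email value ends up under `nickname` and the real nickname under a dummy name, which is the kind of mismatch described above.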

Concern:
Given that pg_chameleon depends on python-mysql-replication, I wanted to get your input on potential disruptions. The old approach (SELECTing from information_schema) may also have caused issues or limitations for pg_chameleon users.

Proposed Solutions:

Drop support for the old approach and parse optional_metadata which holds column names. This could lead to a cleaner codebase but might introduce breaking changes for those who rely on the old behavior.
julien-duponchelle/python-mysql-replication#477
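As a rough illustration of what reading column names from optional_metadata involves, the sketch below walks a type/length/value byte string and extracts the names. It is not the code from the linked PR. The layout (a one-byte field type, a length, then length-prefixed names) and the type code 4 for column names reflect my understanding of MySQL's TABLE_MAP optional metadata and should be treated as assumptions; the sketch also handles only one-byte lengths (values below 251), and `encode_names` is a hypothetical helper used purely to build test input.

```python
COLUMN_NAME_FIELD = 4  # assumed type code for the column-name field


def read_short_length(buf, pos):
    """Read a one-byte length; longer packed-integer forms are not handled."""
    length = buf[pos]
    if length >= 251:
        raise ValueError("multi-byte lengths are outside this sketch")
    return length, pos + 1


def parse_column_names(optional_metadata):
    """Return the column names found in a type/length/value metadata blob."""
    pos = 0
    while pos < len(optional_metadata):
        field_type = optional_metadata[pos]
        field_len, pos = read_short_length(optional_metadata, pos + 1)
        field_end = pos + field_len
        if field_type == COLUMN_NAME_FIELD:
            names = []
            while pos < field_end:
                name_len, pos = read_short_length(optional_metadata, pos)
                names.append(optional_metadata[pos:pos + name_len].decode("utf-8"))
                pos += name_len
            return names
        pos = field_end  # skip fields this sketch does not interpret
    return []


def encode_names(names):
    """Hypothetical helper: build a column-name field for the example."""
    body = b""
    for name in names:
        raw = name.encode("utf-8")
        body += bytes([len(raw)]) + raw
    return bytes([COLUMN_NAME_FIELD, len(body)]) + body


# An unrelated one-byte field (type 1) followed by the column-name field.
blob = bytes([1, 1, 0x80]) + encode_names(["id", "email"])
parsed = parse_column_names(blob)
```

Because the names travel inside the same event stream as the rows, they reflect the schema at event time, which is the point of the proposal.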

I would like to know what you think about this change. I will assist you with anything I can.


the4thdoctor · 2025-01-07T09:46:43Z

Hi, sorry, life has kicked hard in the past years and only now am I getting back to the project.
I spotted the warning in the new replication library.
However, pg_chameleon seems to be working as usual.
As the variable BINLOG_ROW_METADATA is present only on newer MySQL/MariaDB versions, would it be possible to turn off the warning based on the database version?
Ta!
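
A version-based gate along these lines could be sketched as follows. This is only an illustration, not existing code in either project: `supports_binlog_row_metadata` is a hypothetical helper, and the minimum versions (MySQL 8.0.1, MariaDB 10.5.0) are assumptions that should be checked against the server documentation before use.

```python
import re

# Assumed minimum versions exposing binlog_row_metadata; verify before relying on them.
MIN_MYSQL = (8, 0, 1)
MIN_MARIADB = (10, 5, 0)


def supports_binlog_row_metadata(version_string):
    """Guess from a server version string whether binlog_row_metadata exists."""
    is_mariadb = "mariadb" in version_string.lower()
    if is_mariadb:
        # Some MariaDB servers report e.g. "5.5.5-10.6.4-MariaDB";
        # take the version that sits directly before the "-MariaDB" suffix.
        pattern = r"(\d+)\.(\d+)\.(\d+)-mariadb"
    else:
        pattern = r"(\d+)\.(\d+)\.(\d+)"
    match = re.search(pattern, version_string, re.IGNORECASE)
    if match is None:
        return False  # unknown format: keep the warning enabled
    version = tuple(int(part) for part in match.groups())
    minimum = MIN_MARIADB if is_mariadb else MIN_MYSQL
    return version >= minimum
```

The warning would then be emitted only when the function returns True. Probing the server with `SHOW GLOBAL VARIABLES LIKE 'binlog_row_metadata'` would avoid hard-coding version numbers altogether.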
