
Issue with Artifact Detection in R-Peak Detection using NeuroKit2 #1064

Open · sreeshmaraj opened this issue Jan 23, 2025 · 8 comments

@sreeshmaraj

Hi everyone,

I am using NeuroKit2 to process raw ECG signals and calculate HRV metrics. I have a dataset of over 1000 signals. Below is a sample of the code I am using for the analysis:

import os
import pandas as pd
import neurokit2 as nk

# ... inside a loop over ECG files and signal bins (file_path, file_name, bin_name,
# idx and the output folders are defined earlier in the script) ...

        try:
            data = pd.read_csv(file_path)
            ecg_cleaned, _ = nk.ecg_process(data, sampling_rate=500, method='Neurokit')
            ecg_cleaned_signal = ecg_cleaned['ECG_Clean']
            rpeaks = nk.ecg_findpeaks(ecg_cleaned_signal, sampling_rate=500, method='neurokit')
            fixed_rpeaks = nk.signal_fixpeaks(rpeaks, sampling_rate=500, method='kubios')

            # Check the structure of fixed_rpeaks
            if isinstance(fixed_rpeaks, dict):
                rpeaks_indices = fixed_rpeaks.get('ECG_R_Peaks', [])
            elif isinstance(fixed_rpeaks, tuple):
                rpeaks_indices = fixed_rpeaks[1]  # method='kubios' returns (artifacts_info, corrected_peaks)
            else:
                raise ValueError(f"Unexpected structure for fixed_rpeaks: {type(fixed_rpeaks)}")

            # Calculate time-domain HRV metrics
            hrv_time = nk.hrv_time(rpeaks_indices, sampling_rate=500)
            time_output_path = os.path.join(time_folder, f"{bin_name}_time_metrics.txt")
            hrv_time.to_csv(time_output_path, sep='\t', index_label='Metric')
            print(f"Time metrics for bin {idx + 1} saved to {time_output_path}")

            # Calculate frequency-domain HRV metrics
            hrv_frequency = nk.hrv_frequency(rpeaks_indices, sampling_rate=500, normalize=True, interpolation_rate=4, psd_method='fft', show=False)
            frequency_output_path = os.path.join(frequency_folder, f"{bin_name}_frequency_metrics.txt")
            hrv_frequency.to_csv(frequency_output_path, sep='\t', index_label='Metric')
            print(f"Frequency metrics for bin {idx + 1} saved to {frequency_output_path}")

            # Calculate nonlinear HRV metrics with additional error handling
            try:
                hrv_nonlinear = nk.hrv_nonlinear(rpeaks_indices, sampling_rate=500)
                nonlinear_output_path = os.path.join(nonlinear_folder, f"{bin_name}_nonlinear_metrics.txt")
                hrv_nonlinear.to_csv(nonlinear_output_path, sep='\t', index_label='Metric')
                print(f"Nonlinear metrics for bin {idx + 1} saved to {nonlinear_output_path}")
            except Exception as e:
                print(f"Error calculating nonlinear metrics for bin {idx + 1} of {file_name}: {e}")
                continue

        except Exception as e:
            print(f"Error processing bin {idx + 1} of {file_name}: {e}")
            continue

While visualizing the ECG signals and detected R-peaks, I noticed that in some signals high-amplitude artifacts are being incorrectly identified as R-peaks. These artifacts could negatively impact the HRV calculations.

Here is an example visualization where the artifact is being detected as an R-peak:

[Image: ECG segment where a high-amplitude artifact is detected as an R-peak]

Is there any way to clean or filter these artifacts more effectively during R-peak detection in NeuroKit2? For example:

- Can I apply additional preprocessing steps to filter out high-amplitude artifacts?
- Are there parameters or alternative methods in nk.ecg_findpeaks or nk.ecg_process to improve artifact rejection?
I want to ensure the detected R-peaks are accurate before calculating HRV metrics.

Any guidance, suggestions, or best practices would be greatly appreciated!

Thank you in advance! 😊


@DerAndereJohannes (Collaborator)

Hi! It looks like your signal has some very short bursts of salt-and-pepper noise. ecg_process only applies a high-pass filter to remove the drift; aside from powerline-noise removal, it does nothing to the higher frequencies, since they are needed for the signal itself. Frequency-domain filtering could be an option; however, it may eat into the data that you actually want to keep.
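
As a rough sketch of that frequency-domain option (assuming a 40 Hz Butterworth low-pass on the simulated signal from the example below; the cutoff and order are illustrative choices, not recommendations), it could look like this:

import neurokit2 as nk

# Simulated ECG with a brief high-amplitude spike, as in the example further down
ecg = nk.ecg_simulate(duration=15, heart_rate=60, sampling_rate=500)
ecg[6000:6010] = 8

# Low-pass the raw signal before ecg_process. Note that a filter like this tends
# to smear rather than remove a short spike, and an aggressive cutoff can also
# blunt the QRS complexes you actually want to keep.
ecg_filtered = nk.signal_filter(ecg, sampling_rate=500, highcut=40,
                                method="butterworth", order=5)

signals, info = nk.ecg_process(ecg_filtered, sampling_rate=500)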

I can't tell from the image itself, but if the noise is very short (and especially narrower than your R-peaks in terms of samples), then you could try a median filter before applying the ECG processing. Here is an example:

[Image: cleaned ECG signal without vs. with median smoothing]

# Imports
import neurokit2 as nk
import matplotlib.pyplot as plt

# Simulate ECG
ecg = nk.ecg_simulate(duration=15, heart_rate=60, sampling_rate=500)

# Simulate salt/pepper peaks
ecg[6000:6010] = 8
ecg[3200:3210] = 8

# Apply median filtering
ecg2 = nk.signal_smooth(ecg, kernel='median', size=25)

# ECG Process
signals, info = nk.ecg_process(ecg, sampling_rate=500)
signals2, info2 = nk.ecg_process(ecg2, sampling_rate=500)

# Visualize
fig, axs = plt.subplots(1, 2)
axs[0].set_title("Cleaned ECG Signal WITHOUT median smoothing")
axs[0].plot(signals['ECG_Clean'])
axs[0].set_ylim(-2, 8)
axs[1].set_title("Cleaned ECG Signal WITH median smoothing")
axs[1].plot(signals2['ECG_Clean'])
axs[1].set_ylim(-2, 8)
plt.tight_layout()
plt.show()

Let me know if this helped.

@sreeshmaraj (Author) commented Jan 23, 2025 via email

@DerAndereJohannes (Collaborator)

Hi sreeshmaraj,

Unfortunate to hear that the signal_smooth function does not seem to work properly. Is this from a data or a function standpoint? That is, does the smoothing remove too much of your actual data, or does the resulting signal still contain the noise?

I'll be honest and say that I have not done many HRV-based analyses before. What I do know is that the most important thing for HRV is that the R-peaks are correctly classified, so if non-cardiac peaks are being detected as R-peaks, then this is indeed a problem.

I personally do not see how standardizing the ECG signal can be beneficial, as it will just rescale the signal in a different way. The spurious peaks will still remain.
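
As a minimal illustration of that point (reusing the simulated signal and the artificial spike from the example above, and assuming nk.standardize is the kind of standardization you have in mind), z-scoring rescales everything uniformly, so the spike stays the most prominent excursion:

import numpy as np
import neurokit2 as nk

# Simulated ECG with one artificial spike, as in the median-filter example above
ecg = nk.ecg_simulate(duration=15, heart_rate=60, sampling_rate=500)
ecg[3200:3210] = 8

# Standardization (z-scoring) rescales the whole signal with a single mean and
# standard deviation, so relative amplitudes are preserved.
ecg_std = np.asarray(nk.standardize(ecg))

# The spike is still the largest excursion after standardization
print(int(np.argmax(np.abs(ecg))) == int(np.argmax(np.abs(ecg_std))))  # True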

@sreeshmaraj (Author) commented Jan 23, 2025 via email

@DerAndereJohannes (Collaborator)

Ah okay, there is not much I can do without seeing your data itself. However, do try playing around with the size argument like I showed here: ecg2 = nk.signal_smooth(ecg, kernel='median', size=25). Increase size until the spikes go away, and check whether your actual ECG signal still looks the same in terms of shape.

Later, you can check whether the spikes are of similar length across recordings and settle on a single size value for the entire dataset.
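
As a rough sketch of that tuning step (on the simulated signal from the earlier example, standing in for your data; the candidate sizes are arbitrary), you could compare a few kernel sizes side by side:

import neurokit2 as nk
import matplotlib.pyplot as plt

# Simulated ECG with a short spike, as in the earlier example
ecg = nk.ecg_simulate(duration=15, heart_rate=60, sampling_rate=500)
ecg[6000:6010] = 8

# Smooth with several candidate kernel sizes and inspect the results visually:
# pick the smallest size that removes the spikes without deforming the QRS shape.
sizes = [15, 25, 51, 101]
fig, axs = plt.subplots(len(sizes), 1, figsize=(10, 8), sharex=True)
for ax, size in zip(axs, sizes):
    smoothed = nk.signal_smooth(ecg, kernel="median", size=size)
    ax.plot(smoothed)
    ax.set_title(f"Median filter, size={size}")
plt.tight_layout()
plt.show()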

@sreeshmaraj (Author) commented Jan 24, 2025 via email

@DerAndereJohannes (Collaborator)

Using a median filter is a very common non-linear technique to smooth the signal, and it is a valid pre-processing step. Just be as transparent as possible about your technique and make as few changes as possible, so you do not deviate too much from the original signal.

Your noise might take up more samples than I assumed, so you may need to be more targeted with your corrections. The good thing is that your noise is obvious in amplitude, so we can use this information to get better results.

Instead of a median filter, you could perhaps look at morphological filters and focus them on the sections that have high amplitudes. Here is an example that masks the outliers and then applies erosion and dilation:

[Image: original signal with salt-and-pepper noise, signal after erosion, and signal after erosion and dilation, with detected R-peaks marked]

import numpy as np
import neurokit2 as nk
import matplotlib.pyplot as plt
from scipy.ndimage import grey_erosion, grey_dilation

# Simulate ECG
ecg = nk.ecg_simulate(duration=15, heart_rate=60, sampling_rate=500)

# Simulate salt/pepper peaks
ecg[6000:6010] = 8
ecg[3200:3210] = 8
ecg[1100:1110] = -8

# Constants
size = 12
threshold = 4
padding = 5

# Create a mask
mask = np.abs(ecg) > threshold
indices = np.where(mask)[0]
chunks = np.split(indices, np.where(np.diff(indices) != 1)[0] + 1)

eroded_signal = ecg.copy()
dilated_signal = ecg.copy()

# Apply morphological filters
for chunk in chunks:
    bot = np.maximum(np.min(chunk) - padding, 0)
    top = np.minimum(np.max(chunk) + padding, len(mask))

    eroded_signal[bot+padding:top-padding+1] = grey_erosion(ecg[bot:top], size=size)[padding:-padding+1]
    dilated_signal[bot+padding:top-padding+1] = grey_dilation(eroded_signal[bot:top], size=size)[padding:-padding+1]

signals_full, infos_f = nk.ecg_process(dilated_signal, sampling_rate=500)
signals_ero, infos_e = nk.ecg_process(eroded_signal, sampling_rate=500)
signals_unchanged, infos_u = nk.ecg_process(ecg, sampling_rate=500)

# Visualize
fig, axs = plt.subplots(3, 1, figsize=(14, 7))
axs[0].plot(ecg, label='Original Signal with Noise')
axs[0].scatter(infos_u['ECG_R_Peaks'], [ecg[x] for x in infos_u['ECG_R_Peaks']], c='r')
axs[0].set_title("Original Signal with Salt and Pepper Noise")

axs[1].plot(eroded_signal, label='Eroded Signal', color='orange')
axs[1].scatter(infos_e['ECG_R_Peaks'], [eroded_signal[x] for x in infos_e['ECG_R_Peaks']], c='r')
axs[1].set_title("Signal after Erosion")

axs[2].plot(dilated_signal, label='Dilated Signal', color='green')
axs[2].scatter(infos_f['ECG_R_Peaks'], [dilated_signal[x] for x in infos_f['ECG_R_Peaks']], c='r')
axs[2].set_title("Signal after Erosion AND Dilation")

plt.tight_layout()
plt.show()

Make sure to also adjust size and threshold to better match your data. Also note that the corrected noise sections are not perfect (do zoom in), but the result should be good enough to get the R-peaks right.
