Non-monotonic dataset #81
Yes, the purple lines are exactly the type of fit I would like to get. In general, we can expect only one sharp kink, and the purpose is to detect the direction of the lines before and after it. I saw that the same kind of data works just fine as long as the points define a function. For example, if you rotate the points in the picture so that there is only one y for a given x position, then the fit works fine. Although it would be possible to apply a rotation to the data before doing the pwlf fit, I wonder whether it is possible to do something smarter within pwlf itself. Does that make sense?
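The rotation workaround mentioned above can be sketched with numpy alone. The helpers `rotate` and `best_rotation` are hypothetical (they are not part of pwlf), and the angle search is just one simple heuristic I am assuming here: try a grid of angles and keep the one where the rotated x values are strictly increasing with the largest minimum step. You would then fit pwlf in the rotated frame and rotate the fitted points back. The toy data is made up for illustration.

```python
import numpy as np


def rotate(x, y, theta):
    """Rotate the points (x, y) counter-clockwise by theta radians about the origin."""
    c, s = np.cos(theta), np.sin(theta)
    return c * x - s * y, s * x + c * y


def best_rotation(x, y, n_angles=360):
    """Hypothetical helper: search a grid of angles for the rotation that makes the
    rotated x strictly increasing, preferring the largest minimum step.

    Returns the angle in radians, or None if no angle on the grid works
    (for example, if the path doubles back on itself in every direction).
    """
    best_theta, best_step = None, 0.0
    for theta in np.linspace(-np.pi, np.pi, n_angles, endpoint=False):
        xr, _ = rotate(x, y, theta)
        min_step = np.diff(xr).min()
        if min_step > best_step:  # strictly positive steps => strictly increasing
            best_theta, best_step = theta, min_step
    return best_theta


# Toy ">"-shaped path: x goes out and comes back, so y is not a function of x.
x = np.array([0.0, 0.5, 1.0, 0.5, 0.0])
y = np.array([0.0, 0.5, 1.0, 1.5, 2.0])

theta = best_rotation(x, y)
xr, yr = rotate(x, y, theta)     # fit e.g. pwlf.PiecewiseLinFit(xr, yr) in this frame
xb, yb = rotate(xr, yr, -theta)  # rotating by -theta maps points back to the original frame
```

For this toy path the search lands on a quarter turn clockwise, after which the old y axis becomes the new x axis and the data is a proper function again. The heuristic only helps when some straight direction exists along which the path never doubles back.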
Well... Instead of doing a rotation, you could break x and y into separate one-dimensional predictions. Let's say that both x and y are functions of the arc-length distance from start to end. Then you can fit a 1D pwlf to x and a 1D pwlf to y. In the end, you'll get something like this. It's just a different way of thinking about the problem; I'm not sure how much help it will be. I can post the code that generated this if you are interested. It is one 3-segment model fit to x and another 3-segment model fit to y.
Could you please share the code? It looks very good. When you combine the 2 fits back together, how can you keep track of:
```python
import numpy as np
import matplotlib.pyplot as plt
import pwlf

x = np.array([0.42486339, 0.55889496, 0.70537341, 0.79477838, 0.91180935,
              1.        , 0.95309654, 0.85868245, 0.77580449, 0.68867638,
              0.60731633, 0.523983  , 0.47161506, 0.39268367, 0.3167881 ,
              0.21584699, 0.11399514, 0.        ])
y = np.array([0.        , 0.12927029, 0.25908718, 0.34517628, 0.48018584,
              0.64744466, 0.828915  , 0.83684067, 0.83602077, 0.84312654,
              0.81279038, 0.81661656, 0.80595791, 0.81361028, 0.79967204,
              0.84531293, 0.9150041 , 1.        ])

# find the cumulative arc length at each point (distance along the path from the start)
data = np.vstack((x, y)).T
segment_lengths = np.linalg.norm(np.diff(data, axis=0), axis=1)
ArcLengths = np.zeros_like(x)
ArcLengths[1:] = np.cumsum(segment_lengths)

# one 1D model per coordinate: x(arc length) and y(arc length)
my_pwlf_x = pwlf.PiecewiseLinFit(ArcLengths, x)
my_pwlf_y = pwlf.PiecewiseLinFit(ArcLengths, y)
res_x = my_pwlf_x.fit(3)  # breakpoints in arc length, including both ends
res_y = my_pwlf_y.fit(3)

# generate predictions along the arc length
new_arc_lengths = np.linspace(ArcLengths.min(), ArcLengths.max(), 100)
xhat = my_pwlf_x.predict(new_arc_lengths)
yhat = my_pwlf_y.predict(new_arc_lengths)

# find the new breakpoints: each three-segment fit has two interior breakpoints,
# so map all four of them back to (x, y) using both models
arc_length_breaks = np.array([res_x[1], res_x[2], res_y[1], res_y[2]])
arc_length_breaks_x = my_pwlf_x.predict(arc_length_breaks)
arc_length_breaks_y = my_pwlf_y.predict(arc_length_breaks)

plt.figure()
plt.plot(x, y, 'o-', label='Original data')
plt.plot(xhat, yhat, '-', label='pwlf fit')
plt.plot(arc_length_breaks_x, arc_length_breaks_y, 'o', label='breakpoints')
plt.legend()

plt.figure()
plt.plot(ArcLengths, x, label='X')
plt.plot(ArcLengths, y, label='Y')
plt.plot(new_arc_lengths, xhat, label='Xhat')
plt.plot(new_arc_lengths, yhat, label='Yhat')
plt.xlabel('Arc Length')
plt.legend()
plt.show()
```
They will be in separate objects, one for the x predictions, and one for the y predictions.
In this code, I show how you can figure out where the new breakpoints are. It's a bit more complicated, though: if x has three segments and y has three segments, then you can end up with up to 5 new arc-length segments (I think, could be wrong).

I must say, I think this fitting method is very cool! It seems like it would extend to higher dimensions.
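To make the segment count concrete: within any interval between consecutive breakpoints of either fit, both x(s) and y(s) are linear in the arc length s, so the combined (x, y) path is a straight segment there. Merging the two breakpoint arrays therefore gives the segments of the combined fit. The helper `merge_breakpoints` below is a hypothetical sketch; it assumes both inputs include the two end points, as the `res_x`/`res_y` arrays returned by `fit(3)` in the code above do. The breakpoint values are made up.

```python
import numpy as np


def merge_breakpoints(breaks_x, breaks_y, tol=1e-8):
    """Hypothetical helper: combine the arc-length breakpoints of the x(s) and y(s)
    fits into one sorted array, dropping values that coincide within tol
    (such as the shared end points)."""
    merged = np.sort(np.concatenate([breaks_x, breaks_y]))
    keep = np.concatenate([[True], np.diff(merged) > tol])
    return merged[keep]


# Two 3-segment fits with four distinct interior breakpoints -> 5 combined segments.
breaks_x = np.array([0.0, 0.6, 1.2, 2.0])
breaks_y = np.array([0.0, 0.9, 1.5, 2.0])
merged = merge_breakpoints(breaks_x, breaks_y)

# If one interior breakpoint is shared, the combined fit has only 4 segments.
shared = merge_breakpoints(breaks_x, np.array([0.0, 1.2, 1.5, 2.0]))
```

This is why "5" is an upper bound for two 3-segment fits: every coincidence between an x breakpoint and a y breakpoint removes one combined segment.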
That's great! I would have to think more about the coefficients of the piecewise components. Since the fits are done on x and y vs. arc length, it is not immediately clear how to combine them into a single set of coefficients in the original data representation. I guess it may be possible to estimate them from the breakpoints, but it would be nicer (and probably more stable) to combine the coefficients directly.
Which coefficients do you need directly? The arc-length approach is definitely a more complicated way to model this. As for fitting a single model, you could fit x as a function of y, and y as a function of x. If both x and y are normalized, then the fit that gives the lower sum of squared residuals (
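The x-vs-y / y-vs-x comparison suggested above can be sketched as follows, assuming the lower-SSR orientation is the one to keep. To stay self-contained, this sketch swaps in a numpy-only stand-in for pwlf: a continuous two-segment fit using a hinge basis `[1, t, max(t - c, 0)]`, solved by least squares for each breakpoint candidate `c` on a fixed grid. This is similar in spirit to a pwlf fit but is not pwlf's optimizer. The helper names and the toy data are hypothetical.

```python
import numpy as np


def two_segment_ssr(t, v, n_grid=49):
    """Fit v ~ b0 + b1*t + b2*max(t - c, 0) (continuous, 2 segments) by least squares
    for each candidate breakpoint c on a grid, and return the smallest SSR found."""
    best = np.inf
    for c in np.linspace(t.min(), t.max(), n_grid + 2)[1:-1]:  # interior candidates only
        A = np.column_stack([np.ones_like(t), t, np.maximum(t - c, 0.0)])
        coef, *_ = np.linalg.lstsq(A, v, rcond=None)
        best = min(best, float(np.sum((A @ coef - v) ** 2)))
    return best


def normalize(v):
    """Scale v to [0, 1] so the two SSRs are comparable (assumes v is not constant)."""
    return (v - v.min()) / (v.max() - v.min())


def pick_orientation(x, y):
    """Return ('y(x)', ssr) or ('x(y)', ssr), whichever has the lower SSR on normalized data."""
    xn, yn = normalize(x), normalize(y)
    ssr_yx = two_segment_ssr(xn, yn)  # y as a function of x
    ssr_xy = two_segment_ssr(yn, xn)  # x as a function of y
    return ("y(x)", ssr_yx) if ssr_yx <= ssr_xy else ("x(y)", ssr_xy)


# Toy ">"-shaped path: y is not a function of x, but x is a function of y.
x = np.array([0.0, 0.5, 1.0, 0.5, 0.0])
y = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
label, ssr = pick_orientation(x, y)
```

Note the limitation: this only helps when at least one orientation is a function. For a path that doubles back in both x and y, neither fit is good, which is where the arc-length approach is more useful.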
pwlf doesn't work for my case:
@loveis98 Can you sketch on that plot what kind of fit you'd expect for that data set? Does your data follow some order of occurrence? That is, does x[0], y[0] occur before x[1], y[1]? This figure is what your code spat out, and looking at the data, I can't quite see what kind of fit you'd expect.
Do you have any recommendations in case the dataset is not monotonically increasing, as in the simple example below? It is clear that there are 2 linear segments, but I suspect that the non-monotonic behavior in x is creating the issue...