Non-monotonic dataset #81

tt7533 · 2021-01-04T19:12:33Z

Do you have any recommendations in case the dataset is not monotonically increasing as in the simple example below? It is clear that there are 2 linear segments but I suspect that the non-montonic behavior in x is creating the issue...

x = np.array([0.42486339, 0.55889496, 0.70537341, 0.79477838, 0.91180935, 1.        , 0.95309654, 0.85868245, 0.77580449, 0.68867638, 0.60731633, 0.523983  , 0.47161506, 0.39268367, 0.3167881 , 0.21584699, 0.11399514, 0.        ])

y = np.array([0.        , 0.12927029, 0.25908718, 0.34517628, 0.48018584, 0.64744466, 0.828915  , 0.83684067, 0.83602077, 0.84312654, 0.81279038, 0.81661656, 0.80595791, 0.81361028, 0.79967204, 0.84531293, 0.9150041 , 1.        ])

my_pwlf = pwlf.PiecewiseLinFit(x, y)

res = my_pwlf.fit(2)

xHat = np.linspace(min(x), max(x), num=10000)
yHat = my_pwlf.predict(xHat)

plt.figure()
plt.plot(x, y, 'o')
plt.plot(xHat, yHat, '-')
plt.show()

The text was updated successfully, but these errors were encountered:

cjekel · 2021-01-04T20:51:05Z

If you are trying to fit a model that looks like:

It's not really going to be possible with pwlf. (I'm not sure if this is the type of fit you mean by monotonic)

There are some tricks you can do with arc lengths, but the setup and normalization can be tricky. One case is you can have one model, that predicts x and y as a function of the cumulative arc length. It may or may not work, and normalization will be very important. I can perhaps try to work on an example that does such.

tt7533 · 2021-01-06T13:39:32Z

Yes, the purple lines are exactly the type of fit that I would like to get. In general, we can expect only 1 sharp kink and the purpose is to detect the direction of the lines before and after.

I saw that the same kind of data works just fine as long as the points define a function. For example, if you rotate the points in the picture so that there is only 1 y for a given x position, then the fit will work fine.

Although it would be possible the apply a rotation to the data before doing the pmlf fit, I wonder if it is not possible to do something smarter within pmlf itself. Does it make sense?

cjekel · 2021-01-07T21:52:18Z

Well.. Instead of doing a rotation, you could break x and y into separate 1 dimensional predictions. Let's say that both x and y are functions of the arc length distance from start to end. Then you can fit a 1D pwlf to x, and a 1D pwlf to y. In the end, you'll get something like this.

It's just a different way of thinking about the problem, I'm not sure how much help it will be. I can post the code that generated this if you are interested. It is one 3 line segment model to x, and another 3 segment model to y.

tt7533 · 2021-01-08T07:25:57Z

Could you please share the code? It looks very good.

When you combine the 2 fits back together, how can you keep track of:

the linear coefficients for all the intermediate pieces
where the breakpoints are

cjekel · 2021-01-08T16:40:35Z

import numpy as np
import matplotlib.pyplot as plt
import pwlf
from scipy.spatial.distance import cdist

x = np.array([0.42486339, 0.55889496, 0.70537341, 0.79477838, 0.91180935,
              1.        , 0.95309654, 0.85868245, 0.77580449, 0.68867638,
              0.60731633, 0.523983  , 0.47161506, 0.39268367, 0.3167881 ,
              0.21584699, 0.11399514, 0.        ])

y = np.array([0.        , 0.12927029, 0.25908718, 0.34517628, 0.48018584,
              0.64744466, 0.828915  , 0.83684067, 0.83602077, 0.84312654,
              0.81279038, 0.81661656, 0.80595791, 0.81361028, 0.79967204,
              0.84531293, 0.9150041 , 1.        ])

# find the cumlative arc lengths
data = np.vstack((x, y)).T
n = x.size
a = data[0:n-1, :]
b = data[1:n, :]
d = cdist(a, b)
arcLengths = np.diagonal(d)
ArcLengths = np.zeros_like(x)
ArcLengths[1:] = np.cumsum(arcLengths)

# model for each x and y
my_pwlf_x = pwlf.PiecewiseLinFit(ArcLengths, x)
my_pwlf_y = pwlf.PiecewiseLinFit(ArcLengths, y)
res_x = my_pwlf_x.fit(3)
res_y = my_pwlf_y.fit(3)

# generate predictions
new_arc_lengths = np.linspace(ArcLengths.min(), ArcLengths.max(), 100)
xhat = my_pwlf_x.predict(new_arc_lengths)
yhat = my_pwlf_y.predict(new_arc_lengths)

# find new breakpoints... there is two from each fit of three segments
arc_length_breaks = np.array([res_x[1], res_x[2], res_y[1], res_y[2]])
arc_length_breaks_x = my_pwlf_x.predict(arc_length_breaks)
arc_length_breaks_y = my_pwlf_y.predict(arc_length_breaks)

plt.figure()
plt.plot(x, y, 'o-', label='Origingal data')
plt.plot(xhat, yhat, '-', label="pwlf fit")
plt.plot(arc_length_breaks_x, arc_length_breaks_y, 'o', label='breakpoints')
plt.legend()

plt.figure()
plt.plot(ArcLengths, x, label='X')
plt.plot(ArcLengths, y, label='Y')
plt.plot(new_arc_lengths, xhat, label='Xhat')
plt.plot(new_arc_lengths, yhat, label='Yhat')
plt.xlabel('Arc Length')
plt.show()

When you combine the 2 fits back together, how can you keep track of:
the linear coefficients for all the intermediate pieces

They will be in separate objects, one for the x predictions, and one for the y predictions.

where the breakpoints are

In this code, I show how you can figure where the new breakpoints are. It's a bit more complicated going in. If x has three segments, and y has three segments, then you can end up with 5 new arc length segments (I think, could be wrong)

I must say I think this fitting method is very cool! It seems this would extend to higher dimensions.

tt7533 · 2021-01-08T18:10:19Z

That's great!

I would have to think more about the coefficients of the piecewise components. Since the fits are done on x/y vs. arclength, it is not immediately clear how to combine them together into a single set of coefficients in the original data representation. I guess it may be possible to estimate them from the breakpoints but it would be nicer (and probably more stable) to just combine the coefficients directly.

cjekel · 2021-01-08T18:46:13Z

Which coefficients do you need directly?

The arc length stuff is definitely a more complicated modeling approach.

As far as doing one model, you could do fits to x as a function of y, and y as a function of x. If both x and y are normalized, then the fit that gives the lower sum of squared residuals (.ssr) is the correct rotation to perform the model. This would only work if one of the rotations ended up being a function (as you pointed out above).

loveis98 · 2021-03-27T22:15:58Z

pwlf doesn't work for my case:

import numpy as np
import matplotlib.pyplot as plt
import pwlf

# your data
x = np.array([269.3048489 , 246.9444148 , 222.3136088 , 227.1488533 ,
       125.1934066 , 215.8092188 , 269.9395692 , 294.6073562 ,
       161.9504059 , 196.4256259 , 245.8174206 , 234.8829358 ,
       239.6989843 , 159.7639371 ,  81.80337907, 140.0753158 ,
        69.00680592,  76.18182163, 133.0220711 ,  95.4940555 ,
        89.97575454, 221.3455973 , 208.2065385 ,  97.75456468,
       220.2748249 , 200.3221571 ,  65.71646475,  78.42181787,
        47.07407336, 146.9804957 ,  98.75615348, 102.0246169 ,
       222.8051893 ,  63.80375236,  74.59112032, 219.1024445 ,
       118.9913702 , 221.2607285 , 135.4354503 ,  64.24703533,
        93.81510511, 105.0106551 ,  43.88495726,  54.49415032,
       100.9858707 , 215.3200696 , 160.9971602 ,  88.5518527 ,
        63.08653883,  95.84675806, 131.8640344 ,  58.82563146,
       219.9422989 ,  93.4774779 ,  98.02485276, 388.5100308 ,
       128.4080906 , 129.6742844 ,   0.        , 222.9426348 ,
        56.08195898,  70.81238332,  56.71379998, 248.7115561 ,
       222.2552054 , 200.081967  , 203.6329238 , 185.244671  ,
       236.0417996 , 222.2993177 ,  64.82775168,  89.53602526,
        96.89067067,  54.2605423 ,  76.72183015, 139.7620421 ,
        65.3025266 ])

y = np.array([  0.        ,  75.49108545,  61.51509185,  70.18320143,
       364.7402165 , 109.568719  ,  41.08417955,  40.83761634,
       207.3589071 ,  96.64997118,  32.1567632 ,  34.67529359,
        31.70877236, 114.413495  , 293.9381061 , 380.7518244 ,
       326.403998  , 323.0019917 , 400.0171516 , 337.6460195 ,
       348.2055969 , 440.1882369 , 431.5473839 , 355.7636629 ,
       433.1805758 , 417.0001996 , 229.2712733 , 240.2519802 ,
       290.1957078 , 393.7448968 , 343.0169598 , 334.7721919 ,
       432.505317  , 326.9564546 , 229.254949  , 421.8708399 ,
       197.019641  , 442.411859  , 370.5243872 , 257.5241186 ,
       339.6452082 , 318.8519392 , 256.6074448 , 302.322697  ,
       324.8686243 , 434.7269326 , 371.0977992 , 349.2665393 ,
       293.5102796 , 341.2584752 , 377.7476104 , 296.7567181 ,
       430.9114486 , 311.1336747 , 319.5093775 , 121.1779074 ,
       333.5332954 , 385.9592333 , 269.3048489 , 130.2690918 ,
       280.9712403 , 300.2219403 , 268.5752292 ,  58.5945506 ,
       132.1067479 , 193.9951334 , 170.2146256 , 132.3320591 ,
        90.50234207,  61.72976525, 319.2954143 , 283.9691051 ,
       301.5498668 , 306.1537013 , 249.3121043 , 164.37592   ,
       224.1410448 ])

# initialize piecewise linear fit with your x and y data
my_pwlf = pwlf.PiecewiseLinFit(x, y, lapack_driver='gelsd')

# fit the data for four line segments
res = my_pwlf.fit(2)

# predict for the determined points
xHat = np.linspace(min(x), max(x), num=10000)
yHat = my_pwlf.predict(xHat)

# Get the slopes
my_slopes = my_pwlf.slopes

# Get my model parameters
beta = my_pwlf.beta

# calculate the standard errors associated with each beta parameter
se = my_pwlf.standard_errors()

# calcualte the R^2 value
Rsquared = my_pwlf.r_squared()

# calculate the piecewise R^2 value
R2values = np.zeros(my_pwlf.n_segments)
for i in range(my_pwlf.n_segments):
    # segregate the data based on break point locations
    xmin = my_pwlf.fit_breaks[i]
    xmax = my_pwlf.fit_breaks[i+1]
    xtemp = my_pwlf.x_data
    ytemp = my_pwlf.y_data
    indtemp = np.where(xtemp >= xmin)
    xtemp = my_pwlf.x_data[indtemp]
    ytemp = my_pwlf.y_data[indtemp]
    indtemp = np.where(xtemp <= xmax)
    xtemp = xtemp[indtemp]
    ytemp = ytemp[indtemp]

    # predict for the new data
    yhattemp = my_pwlf.predict(xtemp)

    # calcualte ssr
    e = yhattemp - ytemp
    ssr = np.dot(e, e)

    # calculate sst
    ybar = np.ones(ytemp.size) * np.mean(ytemp)
    ydiff = ytemp - ybar
    sst = np.dot(ydiff, ydiff)

    R2values[i] = 1.0 - (ssr/sst)

# plot the results
plt.figure()
plt.plot(x, y, 'o')
plt.plot(xHat, yHat, '-')
plt.show()

cjekel · 2021-03-31T00:36:48Z

@loveis98 Can you sketch on that plot what kind of fit you'd expect to that data set? Is your data following some order of occurrences? ie, does x[0] y[0] occur before x[1] y[1]

This figure is what your code spit out, and looking at the data, I can't quite see what kind of fit you'd expect.

tt7533 closed this as completed Jan 8, 2021

cjekel reopened this Mar 31, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Non-monotonic dataset #81

Non-monotonic dataset #81

tt7533 commented Jan 4, 2021

cjekel commented Jan 4, 2021

tt7533 commented Jan 6, 2021

cjekel commented Jan 7, 2021 •

edited

Loading

tt7533 commented Jan 8, 2021 •

edited

Loading

cjekel commented Jan 8, 2021

tt7533 commented Jan 8, 2021

cjekel commented Jan 8, 2021

loveis98 commented Mar 27, 2021

cjekel commented Mar 31, 2021 •

edited

Loading

Non-monotonic dataset #81

Non-monotonic dataset #81

Comments

tt7533 commented Jan 4, 2021

cjekel commented Jan 4, 2021

tt7533 commented Jan 6, 2021

cjekel commented Jan 7, 2021 • edited Loading

tt7533 commented Jan 8, 2021 • edited Loading

cjekel commented Jan 8, 2021

tt7533 commented Jan 8, 2021

cjekel commented Jan 8, 2021

loveis98 commented Mar 27, 2021

cjekel commented Mar 31, 2021 • edited Loading

cjekel commented Jan 7, 2021 •

edited

Loading

tt7533 commented Jan 8, 2021 •

edited

Loading

cjekel commented Mar 31, 2021 •

edited

Loading