Add smoothing spline baseline#299
Conversation
I think there is a little tricky thing here regarding the testing. Because the distance should be used as-is (with baseline) and the force should be baseline corrected, you end up with an implicit equation in the trap position: `(trap_position - trap2_ref) = tether_length + 2 * bead_radius + 2 * (wlc_force + baseline(trap_position)) / stiffness` Note how trap position appears on the left and right thanks to the baseline.
There was a problem hiding this comment.
So, I've had a first look at this and come to the same conclusion as you, that the smoothing factor based optimization doesn't really work that well unless you also downsample. This suggests that we should provide a good default for this. Otherwise, looking pretty good already!
For tests, I think adding a test which tests unique_sorted and one that tests the baseline (both code paths, with fixed and non-fixed smoothing factor) should suffice.
Adding a package involves adding it to setup.py. For conda, it might be a bit trickier since I don't see this package on conda-forge. I'd have to look into it.
To muddy the waters a bit more, another class of algorithms that's probably worth exploring for this purpose is LO(W)ESS regression.
| x = trap_position.data | ||
| u, c = np.unique(x, return_counts=True) | ||
| m = np.isin(x, [u[c < 2]]) | ||
| ind = np.argsort(x[m]) | ||
|
|
||
| return x[m][ind], force.data[m][ind] |
There was a problem hiding this comment.
Could consider simplifying this to:
def unique_sorted(trap_position, force):
"""Sort and remove duplicates trap_position data to prepare for fit smoothing spline.
Parameters
----------
trap_position : array_like
Trap mirror position
force : array_like
Force data
"""
sorted_position, idx, count = np.unique(trap_position, return_index=True, return_counts=True)
return sorted_position[count < 2], force[idx][count < 2]Saves you a search and a sort (note that the result from np.unique is already sorted).
Given that you return the raw numpy arrays rather than slices, I would also consider just taking raw numpy arrays as input and extracting the data where its called (rather than passing a slice).
| Trap mirror position data | ||
| force : lumicks.pylake.Slice | ||
| Force data | ||
| smoothing_factor : float |
There was a problem hiding this comment.
I wonder whether it would make sense to merge smoothing_factor and smoothing_factors.
If the input is just a value or has only one element, you use it as fixed value and otherwise you perform the optimization. You can use np.atleast_1d to upconvert a value to a numpy array so that you can always use the len operation on it. The default can then just be a default list that generally works.
| ) | ||
| model = csaps(x_sorted, y_sorted, smooth=smoothing_factor) | ||
|
|
||
| return cls(model, trap_position, force) |
There was a problem hiding this comment.
One issue with calling the constructor like this is that you don't actually put the data that you used for fitting in now (you fitted to x_sorted and y_sorted but store the original raw data).
| mse_test_vals = np.zeros(len(smoothing_factors)) | ||
| x_sorted, y_sorted = unique_sorted(trap_position, force) |
There was a problem hiding this comment.
One thing I find a bit surprising is that you choose to pass trap position and force to this function, despite already having x_sorted and y_sorted. Is there a specific reason you prefer this?
| force, | ||
| smoothing_factors, | ||
| n_repeats, | ||
| plot_smoothingfactor_mse, |
There was a problem hiding this comment.
Considering that the function has smoothing_factor in the name, I reckon plot_mse would be sufficient.
I added functions to fit a baseline with a smoothing spline.
The smoothing factor is optimized using k-fold cross validation.
Therefore I added a function to plot the mse on the test data vs the smoothing factors.
Let me know what you think.