
TidyBigfoot22
Iteratively Optimize SMA Smoothing?
What would be an efficient approach to iterating over simple moving average (SMA) window lengths on a modest dataset (<10,000 elements)?
I'm trying to remove vertical tangents and extreme peaks from my dataset while retaining as much resolution as possible. My plan is to use SciPy's Simpson's-rule integration to compare the area under the original noisy curve with the area under the SMA-smoothed curve; this works well for my process because of the inherent properties of the data. I'm using pandas to calculate the SMA. I'd like to vary the window length (a fixed integer) iteratively until the error is minimized, where error = (area under original curve - area under SMA curve)**2.0.
Unfortunately, pandas' rolling() does not accept an array of windows. To make sure I've hit an acceptable target, I plan to compute the error at each candidate window and select the window with the smallest error. What would be a code-efficient way to do this comparison iteratively?
This is an example of what I have currently.
import numpy as np
import pandas as pd
import scipy as sci
import scipy.integrate   # makes sci.integrate.simpson available

noise_data_x = [1, 1.1, 1, 1.2, 1.3, 1.4, 1.5, 1.6, ..., 100]   # abridged
noise_data_y = [2.1, 3.4, 3.2, 4.7, ..., 2.1, 5.7]               # abridged

# SMA with a fixed window of 4
SMA_data_y = pd.DataFrame(noise_data_y).rolling(window=4).mean()

# Drop the leading NaNs produced by the rolling window, keeping x and y aligned
SMA_data_y_array = []
SMA_data_x_array = []
for i in range(len(SMA_data_y)):
    if not np.isnan(SMA_data_y.iloc[i, 0]):
        SMA_data_x_array.append(noise_data_x[i])
        SMA_data_y_array.append(SMA_data_y.iloc[i, 0])

# Compare area under the smoothed curve to area under the original curve
data_cleaned = sci.integrate.simpson(SMA_data_y_array, x=SMA_data_x_array)
print(data_cleaned)
data_original = sci.integrate.simpson(noise_data_y, x=noise_data_x)
error = (data_cleaned - data_original) ** 2.0
This code works for a one-off window, but how would you go about evaluating windows over range(2, 200) with this kind of error minimization? Surely there's a better way than duplicating the array hundreds of times. I've looked at passing an array of arrays built with np.tile() through a for loop, but haven't had any success. Thoughts?
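For reference, this is a rough, untested sketch of the brute-force loop I'm trying to avoid (or at least improve on): it just recomputes the SMA for every candidate window and keeps the window with the smallest squared area error. It assumes noise_data_x and noise_data_y are the full arrays from above.

import numpy as np
import pandas as pd
from scipy import integrate

# Area under the original noisy curve, computed once
area_original = integrate.simpson(noise_data_y, x=noise_data_x)

best_window = None
best_error = np.inf
for window in range(2, 200):
    # Rolling mean for this candidate window; the first window-1 values are NaN
    sma = pd.Series(noise_data_y).rolling(window=window).mean()
    mask = sma.notna().to_numpy()
    # Integrate only over the points where the SMA is defined
    area_sma = integrate.simpson(sma.to_numpy()[mask],
                                 x=np.asarray(noise_data_x)[mask])
    error = (area_sma - area_original) ** 2.0
    if error < best_error:
        best_error = error
        best_window = window

print(best_window, best_error)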
python
iteration
filtering
smoothing