Given two time series of equal length, sampled at the same frequency, classify the pair as follows:
- They are similar if the time series patterns are similar and the values are within an acceptable deviation range.
- They are dissimilar if the patterns are different, or the values are outside the acceptable deviation range. The model infers the acceptable deviation range from the training data.
We generate a synthetic dataset inspired by the UCI Synthetic Control dataset, which is commonly used in the academic community to validate time series models. The dataset contains 500 time series falling into 5 pattern classes, with each class containing 100 time series. The pattern classes are listed below:
1. Normal – Range is bound with no clear uptrend or downtrend
Generated by y(t) = m + rs
2. Increasing – Clear uptrend
Generated by y(t) = m + rs + gt
3. Decreasing – Clear downtrend
Generated by y(t) = m + rs - gt
4. Upward shift – Normal trend shifts upward and resumes a normal trend
Generated by y(t) = m + rs + kx
5. Downward shift – Normal trend shifts downward and resumes a normal trend
Generated by y(t) = m + rs - kx
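The generating equations above can be sketched in a few lines of Python. This is only an illustration, not the generator used for the experiments: the symbol meanings and parameter values (m = 30, s = 2, r drawn from [-3, 3], g from [0.2, 0.5], x from [7.5, 20], k switching from 0 to 1 at a random point in the middle third, series length 60) are assumptions borrowed from the UCI Synthetic Control description, and the function names are hypothetical.

```python
import numpy as np

PATTERNS = ("normal", "increasing", "decreasing", "upward_shift", "downward_shift")

def generate_series(pattern, n=60, m=30.0, s=2.0, rng=None):
    """Return one synthetic series of length n for the given pattern class."""
    rng = np.random.default_rng() if rng is None else rng
    t = np.arange(n)
    r = rng.uniform(-3.0, 3.0, size=n)        # per-sample noise factor (assumed range)
    y = m + r * s                             # Normal: y(t) = m + rs
    if pattern in ("increasing", "decreasing"):
        g = rng.uniform(0.2, 0.5)             # trend gradient (assumed range)
        y += g * t if pattern == "increasing" else -g * t
    elif pattern in ("upward_shift", "downward_shift"):
        x = rng.uniform(7.5, 20.0)            # shift magnitude (assumed range)
        t_shift = rng.integers(n // 3, 2 * n // 3)  # shift position (assumed)
        k = (t >= t_shift).astype(float)      # k = 0 before the shift, 1 after
        y += k * x if pattern == "upward_shift" else -k * x
    elif pattern != "normal":
        raise ValueError(f"unknown pattern: {pattern}")
    return y

def generate_dataset(per_class=100, n=60, seed=0):
    """Return {pattern: array of shape (per_class, n)} for all 5 classes."""
    rng = np.random.default_rng(seed)
    return {
        p: np.stack([generate_series(p, n=n, rng=rng) for _ in range(per_class)])
        for p in PATTERNS
    }

dataset = generate_dataset()
```

With the defaults, `dataset` holds 5 classes of 100 series each, matching the 500 examples described above.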
For each time series in a given pattern class, we test it against a control group randomly chosen from the same pattern class with the expectation that they will be labeled as similar. Any result labeled as dissimilar is counted as an error.
We also compare each time series from a pattern class to a control group chosen from each of the other pattern classes, with the expectation that they will be labeled as dissimilar. Any result labeled as similar is counted as an error.
We repeat this process 10 times to reduce bias. Each pattern class contains 100 time series, and each series is compared with 10 control groups from each of the 5 pattern classes, amounting to 100 * 10 * 5 = 5000 comparisons per pattern class. We count the errors among these 5000 comparisons and report the percentage accuracy of comparisons among the different pattern classes in the table below.
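The evaluation loop described above can be sketched as follows. This is a minimal illustration under stated assumptions: `is_similar` is a placeholder decision rule standing in for the real model, the control group size of 10 and the exclusion of a series from its own control group are guesses, and the toy data is generated only so that the sketch runs end to end.

```python
import numpy as np

def is_similar(series, control):
    # Placeholder rule (NOT the model from this article): call the series
    # similar when its mean is close to the control group's mean.
    return abs(series.mean() - control.mean()) < 5.0

def evaluate(dataset, classify, rounds=10, group_size=10, seed=0):
    """Return (accuracy, comparisons).

    accuracy[a][b] is the percentage of correct labels when series from
    class a are compared with control groups drawn from class b.
    comparisons[a] is the number of comparisons made for class a.
    """
    rng = np.random.default_rng(seed)
    accuracy, comparisons = {}, {}
    for a, series_a in dataset.items():
        accuracy[a], comparisons[a] = {}, 0
        for b, series_b in dataset.items():
            correct = 0
            for i, series in enumerate(series_a):
                # Never place a series in its own control group (assumption).
                pool = [j for j in range(len(series_b)) if a != b or j != i]
                for _ in range(rounds):
                    idx = rng.choice(pool, size=group_size, replace=False)
                    predicted = classify(series, series_b[idx])
                    # Same class should be "similar", different class "dissimilar".
                    correct += int(predicted == (a == b))
                    comparisons[a] += 1
            accuracy[a][b] = 100.0 * correct / (len(series_a) * rounds)
    return accuracy, comparisons

# Toy data: 5 well-separated classes of 100 short series each.
data_rng = np.random.default_rng(1)
toy = {
    f"class_{c}": 10.0 * c + data_rng.normal(0.0, 0.5, size=(100, 20))
    for c in range(5)
}
accuracy, comparisons = evaluate(toy, is_similar)
print(comparisons["class_0"])  # 100 series * 10 rounds * 5 classes = 5000
```

The diagonal entries `accuracy[a][a]` correspond to same-class comparisons and the off-diagonal entries to cross-class comparisons, which is how the accuracy table is laid out.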
- The values in green are the result from comparing time series within the same pattern class.
- The values in orange are the result from comparing time series in dissimilar pattern classes.
The results show that the model performs exceedingly well when the patterns are dissimilar. This performance is crucial because we want to catch strong pattern changes with 100% accuracy. These results indicate a very low false-negative rate for the model.
Within the same pattern class, accuracy is 100% for every class except the Normal class, which still reaches a high 96%.
The dataset was generated synthetically, much like the UCI Synthetic Control dataset. The results showcase the high accuracy we can achieve using the SAX HMM machine learning model for time series canary analysis.
Reference: Synthetic Control Chart Time Series by Dr Robert Alcock.