Sensors and data streams are proliferating as the Internet of Things vision becomes a reality. However, making use of the data from these sensors is not easy. In particular, identifying anomalies in streaming data is surprisingly difficult. Most techniques are a form of thresholding, i.e., predetermined limits that are set in advance and trigger a notification when crossed. Thresholds have some glaring weaknesses: they often flag a problem after it has happened rather than before, and they do not adapt to new states, so false positives can crowd out the important signal.
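To make the threshold weaknesses concrete, here is a minimal sketch of a fixed-limit detector. The readings, the limit of 35, and the function name `threshold_alerts` are all made up for illustration; they do not come from NAB or its dataset.

```python
from typing import List


def threshold_alerts(values: List[float], upper: float) -> List[int]:
    """Return the indices whose value exceeds a fixed, predetermined limit."""
    return [i for i, v in enumerate(values) if v > upper]


# Hypothetical readings: a stable baseline near 20, a gradual drift upward,
# then a permanent shift to a new normal level near 40.
readings = [20, 21, 19, 20, 24, 28, 33, 41, 40, 39, 41, 40]

alerts = threshold_alerts(readings, upper=35)
print(alerts)  # [7, 8, 9, 10, 11]
```

The drift at indices 4 to 6 goes unnoticed until the limit is finally crossed, so the problem is reported only after it has happened. Once the stream settles at its new level, every later reading raises an alert, because the fixed limit never adapts.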
There are different methods of anomaly detection in streaming data, but how do you measure their effectiveness? NAB is the first benchmark designed for time-series data that gives credit for finding anomalies earlier and for adjusting to changed patterns.
NAB contains a dataset with real-world, labeled data files across multiple domains. We’ve accumulated this valuable data from years of working with customers to address their anomaly problems.
We have developed a unique scoring function that rewards early detection, penalizes late or false results, and gives credit for on-line learning.
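The idea of rewarding early detection and penalizing late or false results can be illustrated with a toy scorer. This is a simplified sketch under stated assumptions, not NAB's actual scoring function: the linear weighting, the penalty values, and the name `toy_score` are hypothetical, and the sketch does not model credit for on-line learning.

```python
from typing import List, Tuple


def toy_score(detections: List[int],
              windows: List[Tuple[int, int]],
              fp_penalty: float = 0.5,
              fn_penalty: float = 1.0) -> float:
    """Score detections against labeled anomaly windows (start, end).

    Assumed rules (illustrative only):
    - the first detection inside a window earns credit that falls linearly
      from 1.0 at the window start to 0.0 at the window end;
    - further detections inside the same window are ignored;
    - a detection outside every window costs fp_penalty;
    - a window with no detection costs fn_penalty.
    """
    score = 0.0
    matched = set()
    for t in sorted(detections):
        hit = next((w for w in windows if w[0] <= t <= w[1]), None)
        if hit is None:
            score -= fp_penalty              # false positive
        elif hit not in matched:
            start, end = hit
            width = max(end - start, 1)      # avoid division by zero
            score += 1.0 - (t - start) / width
            matched.add(hit)
    score -= fn_penalty * (len(windows) - len(matched))  # missed anomalies
    return score


windows = [(10, 20)]
early = toy_score([11], windows)        # detected near the window start
late = toy_score([19], windows)         # detected near the window end
false_alarm = toy_score([5], windows)   # fired outside the window
```

With these assumed rules, `early` scores higher than `late`, and `false_alarm` is negative because it pays both the false-positive penalty and the penalty for missing the real anomaly.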
NAB is a modular, open source code base. Numenta will be working to build a community around NAB to add data files and test additional algorithms.
This paper introduces both an HTM-based anomaly detection technique and the Numenta Anomaly Benchmark (NAB). The paper also contains an analysis of the performance of ten algorithms (including HTM) on NAB.
Subutai Ahmad, VP Research, presents NAB and discusses the need for evaluating real-time anomaly detection algorithms. This presentation was delivered at MLConf (Machine Learning Conference) in San Francisco in 2015.
Why did we create this benchmark? Why is anomaly detection so hard in streaming data? This paper answers those questions and highlights how business managers can use NAB to ensure they’re getting valuable insights as early as possible.
This peer-reviewed paper was accepted to the IEEE Conference on Machine Learning and Applications, held December 9-11, 2015, in Miami. It contains technical details on NAB, including the mathematical explanation of the scoring system.
This open source library contains all data files, algorithms and documentation. Use this repository to try NAB for yourself. Test your own techniques against the published algorithms and share your results.
We’ve made it easy for you to try NAB. Visit the repository to test your own techniques and share your results. Use NAB to select the best algorithm for your specific application.
We are committed to adding more real-world data files to our benchmark dataset. Do you have streaming data files with known anomalies? Contact us at firstname.lastname@example.org to see if we can incorporate your data into a future version of NAB.