A question we get all the time from machine learning fans is: "How does
Numenta’s Hierarchical Temporal Memory (HTM)
compare to traditional machine learning algorithms?" There are many ways to
answer this question. In this blog entry, I will focus on one specific
difference, perhaps the most fundamental one.
First, a bit of background: there is a well-known result in machine learning, the
"No Free Lunch Theorem," which states that, averaged over all possible problems,
no algorithm is inherently better than any other. What distinguishes one algorithm from the next are the
inherent assumptions and how well those assumptions fit the problem domain. For
example, if you are predicting data that lies on a straight line, nothing is
going to beat linear fitting. If the data lies on a circle, it’s hard to imagine
a worse technique.
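To make the line-versus-circle point concrete, here is a minimal sketch in plain Python. The helper names `fit_line` and `r_squared` and the sample data are made up for illustration. It fits an ordinary least-squares line to points on a straight line and to points on a circle, then compares how much of the variance each fit explains.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def r_squared(xs, ys):
    """Fraction of the variance in ys explained by the fitted line."""
    slope, intercept = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

# Illustrative data on a straight line: y = 2x + 1
line_x = [float(i) for i in range(20)]
line_y = [2.0 * x + 1.0 for x in line_x]

# Illustrative data on the unit circle, sampled at evenly spaced angles
angles = [2.0 * math.pi * k / 20 for k in range(20)]
circle_x = [math.cos(a) for a in angles]
circle_y = [math.sin(a) for a in angles]

line_r2 = r_squared(line_x, line_y)
circle_r2 = r_squared(circle_x, circle_y)
print(f"R^2 on the line:   {line_r2:.3f}")
print(f"R^2 on the circle: {circle_r2:.3f}")
```

On the line, the fit explains essentially all of the variance. On the circle it explains essentially none, because the best-fitting line is flat and does no better than predicting the mean. Same algorithm, same code; only the fit between its assumption and the data changed.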
By far the most common assumption made in machine learning is the "i.i.d."
assumption. In statistics, i.i.d. stands for
independent and identically distributed,
which states that every input record comes from the same probability
distribution and is statistically independent of previous and future records.
This is a very useful assumption: it makes the math easier, underlies the
classical Central Limit Theorem, allows you to derive accuracy bounds, etc. Just about
every popular technique, such as regression, support vector machines, neural
networks, Bayesian networks, random forests, and decision trees, relies on this
assumption.
Unfortunately, when you think about the real world of streaming data, this
assumption is just plain wrong. Your weekly revenue numbers are not i.i.d. Last
week’s numbers are a better predictor than the numbers from 13 weeks ago.
Yesterday’s weather is correlated with today’s. The web log for a customer
navigating an e-commerce website is likely to follow specific sequences. Your
GPS coordinates from 5 minutes ago are an excellent predictor of your current
location. The list is endless. Streaming temporal patterns are the very
antithesis of i.i.d.
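One rough way to see this in numbers is to measure autocorrelation, that is, how strongly a value is correlated with the value some number of steps earlier. The sketch below is an illustration under stated assumptions: the "weekly revenue" series is synthetic (each week is 90% of the previous week's value plus random noise, a setup chosen purely for demonstration), and the `autocorrelation` helper is hand-written rather than taken from any library.

```python
import random

def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a positive lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((v - mean) ** 2 for v in series)
    cov = sum((series[t] - mean) * (series[t - lag] - mean)
              for t in range(lag, n))
    return cov / var

rng = random.Random(0)  # fixed seed so repeated runs give the same series

# Synthetic "weekly revenue" (an assumption, not real data):
# each week carries over 90% of last week's value plus fresh noise.
revenue = [0.0]
for _ in range(1999):
    revenue.append(0.9 * revenue[-1] + rng.gauss(0.0, 1.0))

# A genuinely i.i.d. stream for comparison: independent Gaussian draws.
iid = [rng.gauss(0.0, 1.0) for _ in range(2000)]

rev_lag1 = autocorrelation(revenue, 1)
rev_lag13 = autocorrelation(revenue, 13)
iid_lag1 = autocorrelation(iid, 1)

print(f"revenue, 1 week back:   {rev_lag1:.2f}")
print(f"revenue, 13 weeks back: {rev_lag13:.2f}")
print(f"i.i.d. stream, 1 back:  {iid_lag1:.2f}")
```

With this setup you should see last week's value correlate strongly with this week's, the value from 13 weeks ago correlate much more weakly, and the i.i.d. stream show a correlation near zero. An algorithm that assumes i.i.d. inputs is effectively treating the first series as if it looked like the last one.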
HTM is an inherently temporal learning algorithm and doesn’t care about
i.i.d. It greedily constructs sequences and does not assume independence. If you
saw a particular revenue pattern the last two weeks, it assumes you are more
likely to see it this week. If you haven’t seen a pattern for several years, it
will likely forget it. Also, HTM assumes that the underlying distribution can
change. This is what makes it online or adaptive. If your revenue jumps because
you just added an important customer, it will adapt. It inherently assumes your
data stream contains sequences and is constantly changing. We didn’t invent
this – the core ideas are inherent in the neocortex of the brain and lend
themselves well to streaming data analytics.
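To illustrate those three properties (learning sequences, forgetting stale patterns, adapting to change), here is a toy sketch. To be clear, this is not HTM and not Numenta's code: it is a simple first-order transition counter with exponential decay, and the class name `DecayingSequenceModel` and the `decay` parameter are hypothetical names invented for this example.

```python
from collections import defaultdict

class DecayingSequenceModel:
    """Toy first-order sequence learner with forgetting (NOT HTM)."""

    def __init__(self, decay=0.9):
        self.decay = decay  # hypothetical forgetting rate applied per update
        self.counts = defaultdict(lambda: defaultdict(float))
        self.previous = None

    def observe(self, symbol):
        """Update transition weights online with one new symbol."""
        if self.previous is not None:
            row = self.counts[self.previous]
            for key in row:
                row[key] *= self.decay  # older evidence fades
            row[symbol] += 1.0          # the newest transition is reinforced
        self.previous = symbol

    def predict(self, symbol):
        """Most strongly weighted successor of `symbol`, or None if unseen."""
        row = self.counts.get(symbol)
        if not row:
            return None
        return max(row, key=row.get)

model = DecayingSequenceModel(decay=0.9)

# Phase 1: "A" is always followed by "B".
for symbol in "ABCABCABCABC":
    model.observe(symbol)
before = model.predict("A")

# Phase 2: the stream changes, and "A" is now followed by "D".
for symbol in "ADCADCADCADCADCADCADCADC":
    model.observe(symbol)
after = model.predict("A")

print(before, after)
```

The model never assumes that records are independent: its whole state is "what tends to follow what." Because old transitions decay, the prediction for "A" switches from "B" to "D" once the new pattern has been seen a few times. Real HTM is far richer than this (it learns high-order sequences over sparse distributed representations), but the online, forgetful flavor is the same idea.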
HTM is not the only technique to break the i.i.d. assumption. Other algorithms,
such as Hidden Markov Models and many time series algorithms, also relax that
assumption. So, how is HTM different from HMMs? Good question. I guess we’ll
just have to tackle that in another blog entry…stay tuned!