Q. How did we create the sparse networks?
A. The sparse networks are based on our HTM algorithms, which model several key computational principles of the neocortex. Our sparse networks were designed as an extension of the HTM Spatial Pooler, a neurally inspired learning algorithm for creating sparse representations from noisy data streams. To learn more about the HTM Spatial Pooler, see this paper and this chapter from our digital book, Biological and Machine Intelligence.
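The core mechanism — projecting an input through fixed random connections and keeping only the few percent of most-activated units — can be sketched as follows. This is a simplified illustration of the idea, not Numenta's actual Spatial Pooler, which additionally learns and adapts its connections online:

```python
import random

def sparse_encode(dense_input, connections, sparsity=0.02):
    """Compute each output unit's overlap with the input, then keep
    only the top-k most activated units as a sparse binary code."""
    overlaps = [sum(dense_input[j] for j in synapses) for synapses in connections]
    k = max(1, int(sparsity * len(connections)))            # number of winners
    winners = sorted(range(len(overlaps)), key=overlaps.__getitem__)[-k:]
    return set(winners)                                     # active unit indices

random.seed(0)
n_in, n_out = 100, 2048
# each output unit connects to a random subset of the input bits
connections = [random.sample(range(n_in), 30) for _ in range(n_out)]
x = [random.random() for _ in range(n_in)]
sdr = sparse_encode(x, connections)
print(len(sdr), n_out)  # 40 2048 — about 2% of the units are active
```

The resulting representation is sparse by construction: regardless of the input, only the top 2% of units fire.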
Q. What is the HTM algorithm?
A. The HTM algorithm is based on the well-understood principles and core building blocks of the Thousand Brains Theory. In particular, it focuses on three main properties: sequence learning, continual learning, and sparse distributed representations. To learn more about HTM, click here.
Q. How are HTMs different from RNNs (Recurrent Neural Networks)?
- HTMs have a more complex neuron model;
- HTMs do not employ back-propagation; instead they use a simple (and local) unsupervised Hebbian learning rule;
- HTMs are based on very sparse activations and neuron connectivity;
- HTMs are based on binary units and weights. On one hand, this means that HTMs can be implemented in an incredibly efficient way. On the other hand, in the case of high-dimensional input patterns, they may struggle to solve the credit-assignment problem if stacked in a multi-layer fashion, as multi-layer LSTMs or convolutional LSTMs are.
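To see why binary units and weights are so efficient, note that comparing two binary SDRs reduces to bitwise operations. A toy illustration (not the actual HTM implementation) using Python integers as bitsets:

```python
def overlap(sdr_a: int, sdr_b: int) -> int:
    """Match score between two binary SDRs stored as integer bitsets:
    a single bitwise AND followed by a population count."""
    return bin(sdr_a & sdr_b).count("1")

a = (1 << 3) | (1 << 17) | (1 << 42)   # SDR with bits 3, 17, 42 active
b = (1 << 17) | (1 << 42) | (1 << 99)  # shares bits 17 and 42 with a
print(overlap(a, b))  # 2
```

No floating-point multiplications are needed anywhere: a hardware AND plus a popcount instruction per machine word covers the whole comparison.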
Q. Why are HTM algorithms well-suited to machine learning models?
A. The HTM algorithm supports, by design, several properties every learning algorithm should possess, including:
- Sequence learning
Being able to model temporally correlated patterns is a key property of intelligence, as it gives both biological and artificial systems the essential ability to predict the future. It answers the basic question “what will happen next?” based on what has been seen before. Every machine learning algorithm should be able to provide valuable predictions not just from static spatial information but also by grounding them in time.
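As a baseline for what sequence learning means, here is a minimal first-order predictor that answers “what will happen next?” from observed transitions. It is an illustration for contrast only — HTM itself uses sparse distributed states, not lookup tables:

```python
from collections import defaultdict

class FirstOrderPredictor:
    """Counts transitions between consecutive elements and predicts
    the most frequently observed successor of the current element."""
    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))

    def learn(self, sequence):
        for prev, nxt in zip(sequence, sequence[1:]):
            self.counts[prev][nxt] += 1

    def predict(self, current):
        successors = self.counts[current]
        return max(successors, key=successors.get) if successors else None

p = FirstOrderPredictor()
p.learn(list("ABCABCABD"))
print(p.predict("A"))  # 'B' — A was most often followed by B
```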
- High-order predictions
Real-world sequences contain contextual dependencies that span multiple time steps, hence the ability to make high-order predictions becomes fundamental. The term “order” refers to Markov order, specifically the minimum number of previous time steps the algorithm needs to consider in order to make accurate predictions. An ideal algorithm should learn the order automatically and efficiently.
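To see why order matters, consider two sequences that share a subsequence: "ABCD" and "XBCY" both contain "BC", so any model of order below 3 cannot disambiguate what follows "C". A small counting sketch (a hypothetical illustration, not HTM's mechanism):

```python
from collections import defaultdict

def train(counts, sequence, order):
    """Count, for each length-`order` context, which element follows it."""
    for i in range(len(sequence) - order):
        context = tuple(sequence[i:i + order])
        counts[context][sequence[i + order]] += 1

first, second, third = (defaultdict(lambda: defaultdict(int)) for _ in range(3))
for seq in [list("ABCD"), list("XBCY")] * 5:
    train(first, seq, order=1)
    train(second, seq, order=2)
    train(third, seq, order=3)

print(dict(first[("C",)]))           # {'D': 5, 'Y': 5} — ambiguous
print(dict(second[("B", "C")]))      # {'D': 5, 'Y': 5} — still ambiguous
print(dict(third[("A", "B", "C")]))  # {'D': 5} — order 3 resolves it
```

A fixed-order model must commit to an order in advance; the ideal algorithm discovers that three steps of context are needed here, and only here, on its own.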
- Multiple simultaneous predictions
For a given temporal context, there could be multiple possible future outcomes. With real-world data, it is often insufficient to only consider the single best prediction when information is ambiguous. A good sequence learning algorithm should be able to make multiple predictions simultaneously and evaluate the likelihood of each prediction online. This requires the algorithm to output a distribution of possible future outcomes.
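A sketch of what outputting a distribution, rather than a single best guess, looks like (an illustrative counting model, not HTM's representation, which encodes the alternatives as a union of sparse predictions):

```python
from collections import Counter

def predictive_distribution(history, sequences):
    """Collect every observed continuation of `history` and return a
    normalized distribution over possible next elements."""
    nxt = Counter()
    n = len(history)
    for seq in sequences:
        for i in range(len(seq) - n):
            if seq[i:i + n] == history:
                nxt[seq[i + n]] += 1
    total = sum(nxt.values())
    return {sym: c / total for sym, c in nxt.items()}

observed = [list("ABX"), list("ABX"), list("ABX"), list("ABY")]
print(predictive_distribution(list("AB"), observed))
# {'X': 0.75, 'Y': 0.25} — both outcomes kept, each with a likelihood
```

Keeping both outcomes lets a downstream consumer flag the truly surprising event (anything other than X or Y) instead of treating the legitimate minority outcome Y as an anomaly.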
- Continual Learning
Continuous data streams often have changing statistics. As a result, the algorithm needs to continuously learn from the data streams and rapidly adapt to changes. This property is important for processing continuous real-time perceptual streams, but has not been well studied in machine learning, especially without storing and reprocessing previously encountered data.
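The idea of tracking drifting statistics without storing or reprocessing old data can be illustrated with an exponentially weighted estimate (a generic sketch of the principle, not HTM-specific):

```python
def update(estimate, observation, rate=0.1):
    """Exponentially weighted update: recent observations dominate,
    so the estimate tracks a stream whose statistics drift."""
    return (1 - rate) * estimate + rate * observation

est = 0.0
for x in [1.0] * 50:      # stream initially centered at 1.0
    est = update(est, x)
for x in [5.0] * 50:      # the statistics change abruptly
    est = update(est, x)
print(round(est, 2))  # 4.98 — adapted to the new regime, no replay needed
```

Each observation is folded in once and discarded; adaptation speed is governed entirely by the decay rate.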
- Online learning
For real-time data streams, it is much more valuable if the algorithm can predict and learn new patterns on-the-fly, without the need to store entire sequences or batch several sequences together as normally happens when training gradient-based recurrent neural networks. The ideal sequence learning algorithm should be able to learn from one pattern at a time to improve efficiency and response time as the natural stream unfolds.
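An online learner processes each element exactly once, predicting before it sees the answer and updating immediately afterward. A toy sketch of that predict-then-learn protocol (illustrative only):

```python
from collections import defaultdict

counts = defaultdict(lambda: defaultdict(int))
prev = None
correct = 0
stream = list("ABABABAB")  # elements arrive one at a time

for sym in stream:
    if prev is not None:
        # predict before seeing the answer, using only past observations
        succ = counts[prev]
        guess = max(succ, key=succ.get) if succ else None
        correct += (guess == sym)
        counts[prev][sym] += 1   # then learn from this single transition
    prev = sym

print(correct)  # 5 — of 7 predictions, only the first two were wrong
```

Nothing is buffered and no pass over the data is repeated: the model locks onto the A/B alternation while the stream is still arriving.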
- Noise robustness and fault tolerance
Real-world sequence learning deals with noisy data sources, where sensor noise, data transmission errors, and inherent device limitations frequently result in inaccurate or missing data. A good sequence learning algorithm should exhibit robustness to noise in its inputs.
- No hyperparameter tuning
Learning in the cortex is extremely robust for a wide range of problems. In contrast, most machine-learning algorithms require optimizing a set of hyperparameters for each task. It typically involves searching through a manually specified subset of the hyperparameter space, guided by performance metrics on a cross-validation dataset. Hyperparameter tuning presents a major challenge for applications that require a high degree of automation, like data stream mining. An ideal algorithm should have acceptable performance on a wide range of problems without any task-specific hyperparameter tuning.