A discrete probabilistic memory model for discovering dependencies in time
Many domains of machine learning involve discovering dependencies
and structure over time. In the most complex domains, long-term
temporal dependencies are present. Neural network models such as
LSTM have been developed to deal with long-term dependencies, but the
continuous nature of neural networks is not well suited to
discrete symbol processing tasks. Further, the mathematical underpinnings
of neural networks are unclear, and gradient descent learning of recurrent
neural networks seems particularly susceptible to local optima. We introduce
a novel architecture for discovering dependencies in time. The architecture
is formed by combining two variants of a hidden Markov model (HMM)--the
factorial HMM and the input-output HMM--and adding a further strong
constraint that requires the model to behave as a latch-and-store memory
(the same constraint exploited in LSTM). This model, called an
MIOFHMM, can learn structure that other variants of the HMM cannot,
and can generalize better than LSTM on test sequences whose
statistical properties (different lengths, different types of noise)
differ from those of the training sequences. However, the MIOFHMM
is slower to train and is more susceptible to local optima than LSTM.
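
To make the latch-and-store constraint concrete, the following is a minimal sketch of one discrete memory cell whose transition distribution depends on the current input, as in an input-output HMM. It is an illustration only, not the MIOFHMM as defined in the paper: the abstract does not give the model's parameterization, so the function names (latch_transition, run_memory), the explicit store/hold gate, and the uniform noise term are all assumptions made for this example.

```python
def latch_transition(prev_state, input_symbol, store, n_symbols, noise=0.0):
    """Return P(next_state | prev_state, input) for one memory cell.

    Hypothetical latch-and-store rule (an assumption, not the paper's
    definition): when `store` is True the cell latches onto the input
    symbol; otherwise it holds its previous state. `noise` is the total
    probability mass spread uniformly over the other symbols.
    """
    if n_symbols < 2:
        raise ValueError("need at least two symbols")
    target = input_symbol if store else prev_state
    off_target = noise / (n_symbols - 1)
    return [1.0 - noise if s == target else off_target
            for s in range(n_symbols)]


def run_memory(inputs, gates, n_symbols, init_state=0):
    """Follow the most probable cell state through a sequence (noise = 0)."""
    state = init_state
    trace = []
    for symbol, store in zip(inputs, gates):
        dist = latch_transition(state, symbol, store, n_symbols)
        # With zero noise the distribution is one-hot, so argmax is exact.
        state = max(range(n_symbols), key=dist.__getitem__)
        trace.append(state)
    return trace


if __name__ == "__main__":
    # Store symbol 2, hold it for two steps, then overwrite it with 1.
    print(run_memory([2, 1, 0, 1], [True, False, False, True], n_symbols=3))
```

In this sketch the cell carries a stored symbol unchanged across any number of hold steps, which is the property that lets a latch-style memory bridge long gaps between related events. A factorial arrangement would run several such cells in parallel, each with its own state.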