
State-denoised recurrent neural nets

The human brain can be described as a collection of processing pathways---some perceptual, some motor, and some performing abstract cognitive operations. General human intelligence arises because these pathways can be flexibly configured and interconnected in accordance with task demands. A popular computational perspective on human consciousness (Dehaene, Lau, & Kouider, 2017) treats consciousness as a type of blackboard enabling communication among pathways. As a step toward AI systems with human flexibility, this project proposes a method to ensure that modular pathways can communicate with one another by requiring that they speak a common language, i.e., the output from one pathway is expressed in a form that other pathways have previously learned to process.

Consider two pathways in cascade, call them A and B. When A produces an output that is similar to inputs in B's past training history, B is likely to produce the desired behavior. But when A's output is noisy, that noise propagates to B with potentially ill consequences. We explore the idea that information flow can be enhanced by introducing a regularizer that guides A's output to be interpretable by B.
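One way to picture such a regularizer is a penalty on A's output that grows with its distance from inputs B has already learned to process. A minimal sketch, assuming B's training history is summarized by a small set of prototype inputs (the project's actual regularizer is not specified here and may differ):

```python
import numpy as np

def interpretability_penalty(a_output, b_prototypes):
    """Squared distance from A's output to the nearest input B was trained on.

    a_output: 1-D array, the output of pathway A.
    b_prototypes: 2-D array, one row per input familiar to pathway B.
    """
    dists = np.sum((b_prototypes - a_output) ** 2, axis=1)
    return dists.min()

# Hypothetical prototypes standing in for B's training history.
b_prototypes = np.array([[0.0, 1.0],
                         [1.0, 0.0]])

def total_loss(task_loss, a_output, lam=0.1):
    """Task objective plus the interpretability regularizer, weighted by lam."""
    return task_loss + lam * interpretability_penalty(a_output, b_prototypes)
```

During training, minimizing `total_loss` trades off task performance against keeping A's outputs near B's familiar input region; the weight `lam` controls that trade-off.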

Although A and B might be separately trained modules, they could equally well be the lower and upper halves of a deep net, or time steps t and t+1 of an unfolded recurrent net used to recognize or generate sequences. In our initial experiments, we have focused on the latter, noting that a poorly trained RNN will be susceptible to noise, either in the input or the hidden state, because this noise can amplify over the sequence. To suppress noise, we introduce attractor dynamics that operate between steps of the sequence to regularize the hidden state. The attractor dynamics are trained on a task-orthogonal objective that iteratively denoises hidden states, analogous to a denoising autoencoder (Vincent, Larochelle, Bengio, & Manzagol, 2008).
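The intuition behind denoising via attractor dynamics can be illustrated with a classic Hopfield-style network, which is only a stand-in here: the SDRNN learns its attractor weights from hidden states via the denoising objective, whereas this sketch hard-codes the stored patterns and uses binary states for simplicity.

```python
import numpy as np

def store_patterns(patterns):
    """Build a symmetric weight matrix whose attractors include the patterns."""
    P = np.array(patterns, dtype=float)
    W = P.T @ P / P.shape[1]
    np.fill_diagonal(W, 0.0)  # no self-connections
    return W

def denoise(h, W, n_iters=10):
    """Iterate attractor dynamics to pull a noisy state toward a stored pattern."""
    h = np.sign(h)
    for _ in range(n_iters):
        h = np.sign(W @ h)
    return h

# Two orthogonal +/-1 patterns play the role of 'familiar' hidden states.
patterns = [[1, -1, 1, -1, 1, -1, 1, -1],
            [1, 1, 1, 1, -1, -1, -1, -1]]
W = store_patterns(patterns)

noisy = np.array([1, -1, 1, -1, 1, -1, 1, 1.0])  # first pattern, last unit flipped
clean = denoise(noisy, W)  # recovers the first stored pattern
```

In the SDRNN, an analogous (but continuous and learned) attractor step is interposed between time steps so that small perturbations of the hidden state are absorbed before they can amplify over the sequence.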

In initial experiments on four sequence classification tasks, we have shown that this state-denoised recurrent net (SDRNN), which projects the hidden state to 'familiar' attractors, obtains superior out-of-sample performance over a vanilla RNN (either tanh- or GRU-based) and over the same architecture trained without the denoising objective.


Denis Kazakov (Computer Science, Boulder)