An autoencoder, autoassociator or Diabolo network is an artificial neural network used for learning efficient codings. The aim of an auto-encoder is to learn a compressed, distributed representation (encoding) for a set of data, typically for the purpose of dimensionality reduction. Auto-encoders use three or more layers (see the sketch after the following list):
- An input layer. For example, in a face recognition task, the neurons in the input layer could map to pixels in the photograph.
- A number of (usually smaller) hidden layers, which will form the encoding.
- An output layer, where each neuron has the same meaning as in the input layer.
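A minimal sketch of this three-layer structure, written in Python with NumPy, is shown below. The layer sizes, the sigmoid activation and the random input vector are illustrative assumptions made for the example, not part of any particular autoencoder; training is not shown here.

```python
import numpy as np

rng = np.random.default_rng(0)

n_input = 64    # e.g. an 8x8 patch of pixels
n_hidden = 16   # smaller "bottleneck" layer that forms the encoding

# Randomly initialised weights and biases (no training in this sketch).
W_enc = rng.normal(scale=0.1, size=(n_hidden, n_input))
b_enc = np.zeros(n_hidden)
W_dec = rng.normal(scale=0.1, size=(n_input, n_hidden))
b_dec = np.zeros(n_input)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def encode(x):
    # Input layer -> hidden layer: the compressed representation.
    return sigmoid(W_enc @ x + b_enc)

def decode(h):
    # Hidden layer -> output layer: same dimensionality as the input.
    return sigmoid(W_dec @ h + b_dec)

x = rng.random(n_input)                 # a stand-in "pixel" vector
reconstruction = decode(encode(x))
print(x.shape, encode(x).shape, reconstruction.shape)   # (64,) (16,) (64,)
```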
If linear neurons are used, or only a single sigmoid hidden layer, then the optimal solution to an auto-encoder is strongly related to principal component analysis (PCA). When the hidden layers are larger than the input layer, an autoencoder can potentially learn the identity function and become useless; however, experimental results have shown that such autoencoders might still learn useful features.
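The connection to PCA can be illustrated with a small numerical sketch: a purely linear autoencoder trained by gradient descent on synthetic data should end up spanning approximately the same subspace as the top principal components. The data, sizes, learning rate and iteration count below are arbitrary choices made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data with a dominant 2-dimensional structure embedded in 10 dimensions.
n, d, k = 500, 10, 2
latent = rng.normal(size=(n, k))
mixing = rng.normal(size=(k, d)) / np.sqrt(d)
X = latent @ mixing + 0.05 * rng.normal(size=(n, d))
X -= X.mean(axis=0)                       # centre the data, as PCA assumes

# PCA subspace: the top-k right singular vectors of the centred data.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
pca_basis = Vt[:k]                        # shape (k, d)

# Linear autoencoder: encode with W1, decode with W2, no nonlinearities.
W1 = 0.1 * rng.normal(size=(d, k))
W2 = 0.1 * rng.normal(size=(k, d))
lr = 0.1
for _ in range(3000):
    H = X @ W1                            # codes
    R = H @ W2 - X                        # reconstruction residual
    gW2 = 2 * H.T @ R / n                 # gradients of the mean squared error
    gW1 = 2 * X.T @ R @ W2.T / n
    W1 -= lr * gW1
    W2 -= lr * gW2

# At the optimum the decoder's rows span (approximately) the PCA subspace, so
# every principal angle between the two subspaces should be close to zero.
Q_pca, _ = np.linalg.qr(pca_basis.T)
Q_ae, _ = np.linalg.qr(W2.T)
cosines = np.linalg.svd(Q_pca.T @ Q_ae, compute_uv=False)
print(cosines)                            # values near 1.0 indicate the same subspace
```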
Auto-encoders can also be used to learn overcomplete feature representations of data.
An auto-encoder is often trained using one of the many variants of backpropagation (conjugate gradient method, steepest descent, etc.). Though often reasonably effective, there are fundamental problems with using backpropagation to train networks with many hidden layers. By the time the errors have been propagated back to the first few layers, they are minuscule and largely ineffectual, so the network almost always learns to reconstruct the average of all the training data. Though more advanced backpropagation methods (such as the conjugate gradient method) help with this to some degree, learning remains very slow and the solutions poor. The problem can be remedied by using initial weights that approximate the final solution; the process of finding these initial weights is often called pretraining.
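The following sketch shows plain backpropagation (steepest descent) applied to a single-hidden-layer sigmoid autoencoder. The data set, layer sizes and learning rate are invented for the example; much deeper networks trained this way would suffer from the slow learning described above.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Invented data set: noisy copies of a handful of binary prototype patterns.
n_in, n_hid = 20, 5
prototypes = (rng.random((4, n_in)) < 0.5).astype(float)
X = prototypes[rng.integers(0, 4, size=200)]
X = np.abs(X - (rng.random(X.shape) < 0.05))        # flip about 5% of the bits

W1 = 0.1 * rng.normal(size=(n_in, n_hid))
b1 = np.zeros(n_hid)
W2 = 0.1 * rng.normal(size=(n_hid, n_in))
b2 = np.zeros(n_in)
lr = 0.5

for step in range(2000):
    # Forward pass.
    H = sigmoid(X @ W1 + b1)                        # encoding
    Y = sigmoid(H @ W2 + b2)                        # reconstruction
    # Backward pass: gradients of the mean squared reconstruction error.
    dY = (Y - X) * Y * (1 - Y) / len(X)
    dH = (dY @ W2.T) * H * (1 - H)
    W2 -= lr * H.T @ dY
    b2 -= lr * dY.sum(axis=0)
    W1 -= lr * X.T @ dH
    b1 -= lr * dH.sum(axis=0)

# Reconstruction error after training; it should be far below its initial value.
print(np.mean((sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2) - X) ** 2))
```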
A pretraining technique developed by Geoffrey Hinton for training many-layered "deep" auto-encoders involves treating each neighboring pair of layers as a restricted Boltzmann machine, so that pretraining approximates a good solution, and then using backpropagation to fine-tune the result.
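A rough sketch of this greedy, layer-wise pretraining scheme is given below. It trains a small stack of restricted Boltzmann machines with one-step contrastive divergence; the layer sizes, learning rate and synthetic binary data are assumptions made for the example, and the final backpropagation fine-tuning stage is only indicated in a comment.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_rbm(data, n_hidden, lr=0.1, epochs=50):
    """Train one restricted Boltzmann machine with 1-step contrastive divergence."""
    n_visible = data.shape[1]
    W = 0.01 * rng.normal(size=(n_visible, n_hidden))
    b_v = np.zeros(n_visible)
    b_h = np.zeros(n_hidden)
    n = len(data)
    for _ in range(epochs):
        v0 = data
        p_h0 = sigmoid(v0 @ W + b_h)
        h0 = (rng.random(p_h0.shape) < p_h0).astype(float)    # sample hidden units
        p_v1 = sigmoid(h0 @ W.T + b_v)                        # "reconstruction"
        p_h1 = sigmoid(p_v1 @ W + b_h)
        W += lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / n           # positive minus negative phase
        b_v += lr * (v0 - p_v1).mean(axis=0)
        b_h += lr * (p_h0 - p_h1).mean(axis=0)
    return W, b_v, b_h

# Greedy layer-wise pretraining: each pair of neighboring layers is treated as
# an RBM, and each RBM is trained on the hidden activities of the one below it.
layer_sizes = [64, 32, 8]                                     # illustrative sizes
data = (rng.random((200, layer_sizes[0])) < 0.3).astype(float)  # fake binary data
weights = []
inputs = data
for n_hid in layer_sizes[1:]:
    W, b_v, b_h = train_rbm(inputs, n_hid)
    weights.append((W, b_h))
    inputs = sigmoid(inputs @ W + b_h)    # feed activities to the next RBM

# `weights` now initialises the encoder half of a deep autoencoder; the decoder
# is initialised with the transposed weights, and the whole network is then
# fine-tuned with backpropagation (not shown here).
```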
- Bengio, Y. (2009). "Learning Deep Architectures for AI". Foundations and Trends in Machine Learning 2. doi:10.1561/2200000006.
- Liou, C.-Y.; Huang, J.-C.; Yang, W.-C. (2008). "Modeling word perception using the Elman network". Neurocomputing 71: 3150–3157. doi:10.1016/j.neucom.2008.04.030.
- Bourlard, H.; Kamp, Y. (1988). "Auto-association by multilayer perceptrons and singular value decomposition". Biological Cybernetics 59 (4–5): 291–294. doi:10.1007/BF00332918. PMID 3196773.
- Hinton, G. E.; Salakhutdinov, R. R. (2006). "Reducing the Dimensionality of Data with Neural Networks". Science 313 (5786): 504–507. doi:10.1126/science.1127647.