Introduction
As part of my academic research endeavours, I'm undertaking to train myself to analyse research papers with a more methodical and critical eye.
The particular paper reviewed in this article is "A Fast Learning Algorithm for Deep Belief Nets" by Hinton et al. and is part of a larger survey entitled "A survey of deep neural network architectures and their applications" by Liu, W. et al.
The approach I've used to structure my review process is outlined in Research Review Process.
Research question
How can the performance of neural networks with many hidden layers be improved?
Research aim
This is descriptive and explanatory research.
The research aims to describe how a new learning algorithm (the greedy layer-by-layer algorithm) solves the problem of 'explaining away' that occurs in deep belief nets (specifically stacked RBMs with many hidden layers), and how, when coupled with a fine-tuning algorithm (the up-down algorithm), it vastly improves the performance of this type of deep neural network.
Type of research
This is primarily quantitative/empirical research.
This research describes a new pre-training algorithm, explains how and why it works, models it using mathematical concepts, and then quantitatively tests its design and implementation on a neural network to measure the algorithm's effectiveness at improving the network's performance (learning/inference).
For example, experiments applying the new learning process (algorithm) to a multi-layer neural network on the standard MNIST dataset recorded a quantitative error rate of 1.25%. Similarly, other experiments, e.g. using an SVM, were conducted to establish comparative error rates on the same dataset.
Mode of enquiry
This is scientific research based on a systematic research procedure and empirical testing. It focuses on specifying and applying well-defined algorithms to a neural network and on working with a common, unvarying dataset, i.e. MNIST. Furthermore, it follows a repeatable design in which experimentation produces observable outcomes that indicate the effectiveness of the algorithm while other variables, i.e. the underlying dataset and neural network, remain constant.
For example, the learning and fine-tuning algorithms (greedy layer-by-layer and up-down, respectively) are by definition repeatable, the MNIST dataset is unvarying, and the neural network is a fixed configuration/design. The work also uses an SVM to compare results with those of the neural network. The learning process is likewise described as repeatable, e.g. the application of the Gibbs sampling process that underlies the fine-tuning algorithm described in the paper. These are all hallmarks of a well-defined, repeatable scientific approach.
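To make the Gibbs sampling process concrete, here is a minimal sketch of one alternating (block) Gibbs step in a toy RBM. The sigmoid parameterisation and the weight/bias names (W, b, c) are the standard RBM conventions, not taken from the paper's own code; the network sizes and random data are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_step(v, W, b, c, rng):
    """One alternating Gibbs step: sample hidden given visible, then visible given hidden."""
    p_h = sigmoid(v @ W + c)                 # P(h=1 | v)
    h = (rng.random(p_h.shape) < p_h) * 1.0  # binary hidden sample
    p_v = sigmoid(h @ W.T + b)               # P(v=1 | h)
    v_new = (rng.random(p_v.shape) < p_v) * 1.0
    return v_new, h

# Tiny random RBM: 6 visible units, 4 hidden units.
W = rng.normal(0, 0.1, size=(6, 4))
b = np.zeros(6)
c = np.zeros(4)
v = (rng.random(6) < 0.5) * 1.0

for _ in range(10):  # run the Markov chain for a few steps
    v, h = gibbs_step(v, W, b, c, rng)
```

Iterating this step forms the Markov chain whose equilibrium distribution the RBM represents; the paper's fine-tuning procedure relies on exactly this kind of alternating sampling.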
Methodology
The research is primarily empirical in nature, using experimental outcomes to inform observation. It takes a measurable, design-science-based approach with a well-defined neural network configuration and algorithms that define and evaluate a repeatable learning process, i.e. it designs, develops and evaluates the performance of the research model.
For example, the theory of how 'explaining away' reduces performance in densely connected networks is explained and modelled mathematically before the algorithms are described; the algorithms are then applied and experimentally tested on a neural network acting on the test data.
The main research objectives are:
- Show how using 'complementary priors' removes 'explaining away', with a theoretical explanation
- Derive a new unsupervised learning algorithm (greedy layer-by-layer) that uses complementary priors
- Describe a hybrid neural network that uses an associative memory and 3 hidden layers
- Use the unsupervised learning algorithm to pre-train a neural network to test/prove the algorithm's effectiveness (fast and accurate)
- Show how to determine what the model has learnt by using the learnt weights to generate an image
Research Methods
- Mathematical modeling
- Algorithm design
- Algorithm implementation (greedy-layer-by-layer)
- Classification experiments using neural networks
The primary research method is using experiments to measure/evaluate the performance of the neural network when it is pre-trained using the greedy-layer-by-layer algorithm.
The problem of explaining away is modelled mathematically and a new algorithm is developed to prevent it. The research then uses experiments that take an existing handwritten digit dataset (MNIST) as input to a neural network pre-trained using the algorithm defined in this paper, and the results are evaluated, specifically how well the data is mapped to the digit labels. Other experimental procedures compare how the pre-trained model performs against models that do not use the algorithm (most notably an SVM).
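The pre-training procedure described above can be sketched as follows: train one RBM on the data with contrastive divergence (CD-1), then use its hidden activations as the "data" for the next RBM, and so on up the stack. This is a hedged illustration only; the layer sizes, learning rate, epoch count, and random binary data are invented, not the paper's 500-500-2000 setup.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, epochs=5, lr=0.1, rng=rng):
    """Train one RBM with CD-1; return its weights and the hidden-unit probabilities."""
    n_visible = data.shape[1]
    W = rng.normal(0, 0.01, size=(n_visible, n_hidden))
    b = np.zeros(n_visible)   # visible biases
    c = np.zeros(n_hidden)    # hidden biases
    for _ in range(epochs):
        v0 = data
        p_h0 = sigmoid(v0 @ W + c)                   # positive phase
        h0 = (rng.random(p_h0.shape) < p_h0) * 1.0
        p_v1 = sigmoid(h0 @ W.T + b)                 # one-step reconstruction
        p_h1 = sigmoid(p_v1 @ W + c)                 # negative phase
        n = data.shape[0]
        W += lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / n  # CD-1 weight update
        b += lr * (v0 - p_v1).mean(axis=0)
        c += lr * (p_h0 - p_h1).mean(axis=0)
    return W, sigmoid(data @ W + c)

# Greedy stacking: each layer trains on the hidden activations of the one below.
data = (rng.random((100, 20)) < 0.5) * 1.0
layer_sizes = [12, 8]
activations, weights = data, []
for n_hidden in layer_sizes:
    W, activations = train_rbm(activations, n_hidden)
    weights.append(W)
```

The greedy aspect is that each layer is trained in isolation and then frozen; in the paper, the whole stack is subsequently fine-tuned with the up-down algorithm, which this sketch omits.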
Research techniques
- Contrastive divergence algorithm
- Gibbs sampling (a Markov chain Monte Carlo method)
- Greedy-layer-by-layer algorithm (complementary priors)
- DBN (Deep Belief Network)
- Various learning algorithms (Backpropagation, SVMs, squared error and online updates, LeNet5 CNN, cross entropy, etc.)
- Generation of an image using a learnt model
The main technique is experimentation using a neural network with 3 hidden layers, applying the developed pre-training algorithm to test how well it removes 'explaining away' and thereby improves the network's inference performance. The MNIST dataset is used as training data for the network.
The algorithm that is developed is applied to a DNN model (a DBN), which is then tested experimentally to measure the resulting performance. This results in an error rate of 1.25%, compared with the closest rival, an SVM, at 1.4%.
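For clarity, an error rate like the 1.25% (vs 1.4% for the SVM baseline) quoted above is simply the fraction of test examples whose predicted digit differs from the true label. The predictions and labels below are made-up toy arrays, not the paper's results.

```python
import numpy as np

def error_rate(predicted, actual):
    """Fraction of examples whose predicted label differs from the true label."""
    predicted = np.asarray(predicted)
    actual = np.asarray(actual)
    return float(np.mean(predicted != actual))

preds  = np.array([3, 1, 4, 1, 5, 9, 2, 6])
labels = np.array([3, 1, 4, 1, 5, 9, 2, 7])  # one mistake out of eight
print(error_rate(preds, labels))  # → 0.125
```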
Data
MNIST dataset: grey-scale 28x28 images of handwritten digits (60,000 training and 10,000 test examples), used to train and evaluate the neural network (pre-trained with the new algorithm)
The input dataset for the model testing is the MNIST dataset of handwritten digits, a well-known repository of 2D images used by researchers for digit classification. The output of the model is numerical data that indicates/predicts which digit class the input data belongs to.
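As a hedged illustration of that output: the network produces one numerical score per digit class (0-9), and the predicted label is the class with the highest score. The score vector below is invented; only the argmax read-out is the generic mechanism.

```python
import numpy as np

# Invented class scores for the 10 digit classes 0-9; class 3 is most active.
scores = np.array([0.01, 0.02, 0.05, 0.70, 0.04, 0.03, 0.05, 0.04, 0.03, 0.03])
predicted_digit = int(np.argmax(scores))  # index of the most active class unit
print(predicted_digit)  # → 3
```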
Information
The MNIST data is processed using the neural network, resulting in output from the neural network (model).
The results from the neural network show that using a pre-training algorithm that configures/trains each layer using complementary priors improves the performance of DBNs, i.e. it reduces the error in predicted vs actual outputs.
Knowledge
The phenomenon of 'explaining away' that occurs in DNNs (of stacked RBMs) restricts their performance.
Using complementary priors to configure/train each layer to establish initial weights removes 'explaining away' and results in a better-performing neural network.
Correlation vs Causation
Variables:
- Learning approach (with or without greedy layer-by-layer algorithm)
- MNIST dataset
- Neural network configuration
The research's approach (which uses the greedy layer-by-layer pre-training algorithm) is compared against various other learning algorithms on the same dataset (MNIST) throughout, so only the approach to learning changes. Each model's error is then evaluated to see which approach produces the lowest value. The neural network configuration is unchanged.
Literature review
Referenced papers
See reference chronology here
This paper was published in 2006.
Citations
IEEE reports 3888 citations while ACM reports 3280 citations.
- IEEE: https://ieeexplore.ieee.org/document/6796673
- ACM: https://dl.acm.org/doi/10.1162/neco.2006.18.7.1527
This suggests that this is a very popular piece of research.
Reasoning method (deduction)
The research works from a theoretical description of the 'explaining away' phenomenon, models it mathematically and from this basis, derives a learning process that incorporates the design of a new algorithm and applies it to a neural network. The neural network is used to test the pre-training algorithm and to see if it indeed improves the learning/mapping process as proposed. This is a deductive process:
See this paper's deduction process here
Subjectivity/Objectivity
Construct Validity
- No obvious flaws
Internal Validity
Research Correctness
Objectivity
- The same constant data is used as was used by others (MNIST)
- The research uses an objective measure of performance (inference error); on this dataset, the error rate is better with this model than with previous models
Subjectivity/Specificity
- No obvious subjectivity
Research technique
Objectivity:
Algorithms used in this research are repeatable and inherently automatable. This means all parts of the process, i.e. the data, model and algorithm, are non-varying in nature and can therefore be replicated/verified by third parties.
The comparison of alternate models' performance on the MNIST dataset is suitable for evaluating how the pre-trained model compares to models that do not use pre-training. The research techniques fit the requirements of this research.
Subjectivity/Specificity:
- The research techniques are only demonstrated on 2D image data (pixels)
- Only a 3-layer neural network is used
Research techniques vs research question
Objectivity:
Varying the application of the learning algorithm while keeping other parameters constant, i.e. the common data (MNIST) and the design of the neural network, makes it simple to evaluate the effect of varying a single variable, leaving the algorithm as the only independent variable.
This supports the research question, as the neural network's performance (error rate) on the classification task directly indicates whether it worked better than other models that did not use the pre-training algorithm.
Subjectivity/Specificity:
- The performance is only measured using data that reflects 2D character images. Larger images or more complex images are not assessed.
- Only a specific configuration/design of the DBN (3 hidden layers) is used to remove the effects of 'explaining away'
Conclusion vs methods
Objectivity:
Using experimental results based on empirical testing, observation, and comparison supports the conclusion that the research's specific approach is better than the other approaches that were tested.
Subjectivity/Specificity:
- Experiments were based on only 2D image data (pixels) so the conclusions can only be representative of character-based image data
External Validity
Subjectivity/Specificity:
There is no evidence presented that this approach has or will generalise well to wider applications (beyond showing 2D character inference optimisation by eliminating 'explaining away')
Data Validity
Data subjectivity (specificity/narrowness)
Objectivity:
Image data for a neural network classification task is appropriate for evaluating the learning of a neural network for the classification of this data against known classification labels.
The research data is also a well-known dataset that is often used for testing classification performance in models, and so it is appropriate for this type of research.
Subjectivity/Specificity:
- Only the MNIST dataset is used, so the data used to represent the solution presented in this research is limited.
- This limits the research's outcomes and approaches to dealing with small geometric character recognition.
Data vs Research Question
Objectivity:
The same dataset (MNIST) is used throughout while the learning algorithm is varied, so the data acts as a controlled constant; any change in classification error can be attributed to the algorithm rather than the data.
This supports the research question, as classification performance on this common dataset directly indicates whether the pre-trained network outperforms models that did not use the pre-training algorithm.
Subjectivity/Specificity:
- Only 2D character digits, pixel information is used to show how the techniques in the research improve inference performance.
Summary of general risks to validity
- The paper is very technical.
- It relies on an understanding of many different ideas and processes that draw deeply on existing knowledge.
- Inexperienced researchers may find it difficult to validate construct and internal validity without being well acquainted with the theory, algorithms and approaches discussed.
Credibility concerns
Objectivity:
There are gaps in the referenced papers; however, as this paper tests a new algorithm using experimentation and comparison with other approaches, the literature is less influential. In this respect, the literature is relatively objective.
Subjectivity:
- The research is 19 years old (as of 2025), and the techniques described could be outdated or superseded by subsequent research, possibly making this work deprecated.
Relevance, Contribution, Originality and Novelty
Implications & Contributions
A key aspect is that Hinton et al. identified and understood exactly what the problem of explaining away is, and so were able to create an algorithm to circumvent it.
The improvements to the performance/learning of neural networks resulting from this new algorithm apply to all DBNs, and therefore have wide applicability across all domains that use DBNs. The results of the paper are very generalisable.
Another particularly interesting aspect is that the paper shows a way to determine what the model learnt by generating an image based on the learnt weights to 'see' what and how it learned the dataset.
Opinion
This research takes an objectivist approach, which aims to uncover hidden truths (such as the benefit of using complementary priors and pre-training), favouring empirical testing, a systematic approach and an investigation into cause and effect.