Why does unsupervised pre-training help deep learning?

The authors of the paper are pretty sure that unsupervised pre-training works as a regularizer: it starts the supervised training off in a good spot in parameter space, rather than somehow improving the optimization path.
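To make the "good starting spot" idea concrete, here is a minimal sketch in NumPy. It is not the setup from the paper, just an illustration under simple assumptions: a single tied-weight autoencoder layer is trained on unlabeled inputs, and its encoder weights are then copied in as the initial hidden layer of a classifier. The function names (`pretrain_autoencoder`, `init_from_pretraining`), the layer sizes, and the learning rate are all hypothetical choices made for this example.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def reconstruction_error(X, W, b, c):
    """Mean squared error of a tied-weight autoencoder on X."""
    H = sigmoid(X @ W + b)   # encode
    R = H @ W.T + c          # decode with the transposed encoder weights
    return float(np.mean((R - X) ** 2))


def pretrain_autoencoder(X, n_hidden, steps=300, lr=0.5, seed=0):
    """Unsupervised stage: plain gradient descent on reconstruction error.

    Hypothetical helper for illustration; no labels are used here.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(0.0, 0.1, size=(d, n_hidden))
    b = np.zeros(n_hidden)
    c = np.zeros(d)
    for _ in range(steps):
        H = sigmoid(X @ W + b)
        R = H @ W.T + c
        dR = 2.0 * (R - X) / (n * d)    # gradient of the mean squared error
        dZ = (dR @ W) * H * (1.0 - H)   # backprop through the sigmoid
        gW = X.T @ dZ + dR.T @ H        # encoder plus decoder contributions
        W -= lr * gW
        b -= lr * dZ.sum(axis=0)
        c -= lr * dR.sum(axis=0)
    return W, b, c


def init_from_pretraining(W_pre, b_pre, n_classes, seed=0):
    """Supervised stage starts here: the hidden layer is copied from the
    pre-trained encoder, and only the output layer is initialized randomly."""
    rng = np.random.default_rng(seed)
    n_hidden = W_pre.shape[1]
    return {
        "W1": W_pre.copy(),
        "b1": b_pre.copy(),
        "W2": rng.normal(0.0, 0.1, size=(n_hidden, n_classes)),
        "b2": np.zeros(n_classes),
    }


# Toy unlabeled data (an assumption for the demo, not real data).
X_unlabeled = np.random.default_rng(1).normal(size=(200, 8)) + 1.0
W, b, c = pretrain_autoencoder(X_unlabeled, n_hidden=4)
params = init_from_pretraining(W, b, n_classes=3)
```

Supervised fine-tuning would then run ordinary gradient descent on a labeled loss starting from `params`. The only thing pre-training changes in this sketch is where that descent begins, which is the sense in which it acts like a regularizer rather than a better optimizer.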