This model was superseded by this one.
They were careful with how they used residual connections so that the model stayed highly parallelizable: each LSTM layer was placed on its own GPU. They also designed the quantization so that training could run with full floating-point computations (subject to a couple of restrictions), and the trained models were then converted to quantized versions for inference.
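A minimal sketch of the layer-per-GPU idea (not the authors' code), assuming PyTorch and at least as many GPUs as layers: each LSTM layer lives on its own device, activations are moved between devices, and a residual connection adds the layer's input to its output.

```python
import torch
import torch.nn as nn

class ResidualStackedLSTM(nn.Module):
    """Stacked LSTM with residual connections, one layer per device."""

    def __init__(self, hidden_size: int, num_layers: int):
        super().__init__()
        self.layers = nn.ModuleList()
        self.devices = []
        for i in range(num_layers):
            # Place layer i on cuda:i if that GPU exists, otherwise fall back to CPU.
            device = (torch.device(f"cuda:{i}")
                      if torch.cuda.device_count() > i else torch.device("cpu"))
            self.layers.append(
                nn.LSTM(hidden_size, hidden_size, batch_first=True).to(device))
            self.devices.append(device)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer, device in zip(self.layers, self.devices):
            x = x.to(device)      # ship activations to this layer's GPU
            out, _ = layer(x)     # run the LSTM layer on its own device
            x = x + out           # residual connection around the layer
        return x

# Usage: batch of 4 sequences, length 10, hidden size 8, 2 stacked layers.
model = ResidualStackedLSTM(hidden_size=8, num_layers=2)
y = model(torch.randn(4, 10, 8))
print(y.shape)  # torch.Size([4, 10, 8])
```

Because the input to each layer is simply added to its output, the layers can be pipelined across GPUs without any cross-device reductions inside a layer.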
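For the quantization side, a loose stand-in (not the paper's specific scheme or restrictions): PyTorch's dynamic quantization shows the same train-in-float, convert-afterwards workflow on an LSTM.

```python
import torch
import torch.nn as nn
import torch.ao.quantization as tq

# Ordinary float model; train it with standard floating-point optimization.
float_model = nn.LSTM(input_size=8, hidden_size=8, num_layers=2, batch_first=True)
# ... floating-point training would happen here ...

# Convert the trained weights to int8; inference now uses quantized matmuls (CPU).
quantized = tq.quantize_dynamic(float_model, {nn.LSTM}, dtype=torch.qint8)
out, _ = quantized(torch.randn(4, 10, 8))
print(out.shape)  # torch.Size([4, 10, 8])
```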