Transection - Transformers for English to Chinese Translation

This post shows how to train a sequence-to-sequence Transformer model for English-to-Chinese translation, nicely abbreviated to Transection. We adopt the BART architecture (Lewis et al., 2019) for this model and train it in two ways: from scratch, and by fine-tuning BART’s pre-trained base checkpoint that is available in 🤗 transformers. The training set consists of around 5M English-Chinese sequence pairs, and the test set has around 40k pairs. Later in this post, the two approaches are compared on the test set using sacrebleu. In addition, a popular pre-trained model in this domain, Helsinki-NLP/opus-mt-en-zh from the 🤗 Hugging Face model hub, is used as a baseline for both. Beyond sacrebleu scores, the models are also compared on generalisation, model size, and training cost.
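To give a feel for what a BLEU-style score measures on Chinese output, here is a minimal sketch of a character-level corpus BLEU in plain Python. It is not sacrebleu's implementation. The function names `char_ngrams` and `corpus_bleu` are made up for illustration, and the sketch assumes one reference per hypothesis, treats every non-whitespace character as a token, and applies no smoothing, so its numbers should not be expected to match sacrebleu's.

```python
import math
from collections import Counter


def char_ngrams(text, n):
    """Count character n-grams in `text`, ignoring whitespace (illustrative helper)."""
    chars = [c for c in text if not c.isspace()]
    return Counter(tuple(chars[i:i + n]) for i in range(len(chars) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus-level BLEU on a 0-100 scale, one reference per hypothesis."""
    matched = [0] * max_n  # clipped n-gram matches, summed over the corpus
    total = [0] * max_n    # n-grams produced by the system, summed over the corpus
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += sum(1 for c in hyp if not c.isspace())
        ref_len += sum(1 for c in ref if not c.isspace())
        for n in range(1, max_n + 1):
            hyp_counts = char_ngrams(hyp, n)
            ref_counts = char_ngrams(ref, n)
            # Counter intersection clips each n-gram to its count in the reference.
            matched[n - 1] += sum((hyp_counts & ref_counts).values())
            total[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matched) == 0:
        return 0.0  # assumption of this sketch: no smoothing, so any zero precision gives 0
    # Geometric mean of the n-gram precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    # Brevity penalty: penalise output that is shorter than the reference.
    brevity_penalty = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


# Toy example with made-up sentences, not taken from the actual test set.
score = corpus_bleu(["我喜欢机器学习"], ["我喜欢机器翻译"])
print(f"character-level BLEU: {score:.2f}")
```

A perfect match scores 100 and an output that shares no 4-gram with the reference scores 0. For real evaluation, the sacrebleu package remains the right tool, since it standardises tokenisation and reporting across systems.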

TL;DR: For quick access to the outcomes of this project without reading through this post, see the code repository and the demo site. If you want to know more about what is behind the model and how it is trained, read on through the rest of the blog.

More resources.

Disclaimer: The introduction to Autocoder is by no means a claim that it generates code with a good sense of reasoning. Rather, this blog’s purpose is to introduce the state-of-the-art generative model GPT-2 and present an example of how it is used for code completion.

Lastly, thanks to JiaZheng, who kindly provided the computation resources that made this blog possible.

If you have any thoughts about the blog that you’d like to share with me, send me an email, or you are welcome to talk with me on Twitter.

Written on September 30, 2020