Transection - Transformers for English to Chinese Translation
This post presents how to train a sequence-to-sequence Transformer model for English-to-Chinese translation, nicely abbreviated to Transection. We adopt BART's architecture (Lewis et al., 2019) for this model and train it in two ways. The first is to train it from scratch, and the second is to fine-tune it from BART's pre-trained base checkpoint available in 🤗transformers. The training data consist of around 5M English-Chinese sequence pairs, along with a test set of around 40k pairs. Later in this blog, the performance of the two ways on the test set, measured by sacrebleu, is compared. In addition, a popular pre-trained model in this domain, Helsinki-NLP/opus-mt-en-zh from 🤗Huggingface's model hub, is used as a baseline for both. Beyond sacrebleu, the models are also compared in terms of generalisation, model size, and training cost.
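To make the two training setups and the baseline comparison concrete, here is a minimal sketch in Python using 🤗transformers and sacrebleu. The config values, the `facebook/bart-base` checkpoint name for the pre-trained base, and the toy sentence pair are assumptions for illustration only; the actual training scripts live in the project's code repository linked below.

```python
import sacrebleu
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    BartConfig,
    BartForConditionalGeneration,
)

# Way 1: train from scratch -- build a BART model with randomly
# initialised weights. The layer/vocab sizes here are assumed; they
# should match whatever tokenizer the project actually uses.
scratch_config = BartConfig(
    vocab_size=50265,
    encoder_layers=6,
    decoder_layers=6,
)
model_scratch = BartForConditionalGeneration(scratch_config)

# Way 2: fine-tune from a pre-trained BART base checkpoint
# (facebook/bart-base is assumed to be the checkpoint referred to).
model_finetune = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

# Baseline: the popular Helsinki-NLP/opus-mt-en-zh model from the hub.
baseline_name = "Helsinki-NLP/opus-mt-en-zh"
baseline_tok = AutoTokenizer.from_pretrained(baseline_name)
baseline_model = AutoModelForSeq2SeqLM.from_pretrained(baseline_name)

# Translate a toy English sentence with the baseline and score it with
# sacrebleu, mirroring how the ~40k-pair test set would be evaluated.
inputs = baseline_tok(["Machine translation is fun."], return_tensors="pt")
outputs = baseline_model.generate(**inputs, max_length=64)
hypotheses = baseline_tok.batch_decode(outputs, skip_special_tokens=True)

references = [["机器翻译很有趣。"]]  # hypothetical reference translation
score = sacrebleu.corpus_bleu(hypotheses, references, tokenize="zh").score
print(f"BLEU: {score:.2f}")
```

The same `corpus_bleu` call can score the from-scratch and fine-tuned Transection models, so all three systems are compared on an identical footing.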
TL;DR: For quick access to the outcomes of this project without reading through this post, here are the code repository and a demo site. If you want to know more about what is behind them and how the model is trained, read on; the blog is organised as follows.
Lastly, thanks to JiaZheng, who kindly provided the computation resources that made this blog possible.
If you have any thoughts about the blog you'd like to share, send me an email at wangcongcongcc@gmail.com, or feel free to reach out to me on Twitter.