Is ChatGPT A Good Translator? A Preliminary Study

Important disclaimer: the following content is AI-generated, please make sure to fact check the presented information by reading the full paper.

January 20, 2023 · 366 words · Wenxiang Jiao, Wenxuan Wang, Jen-tse Huang, Xing Wang, Zhaopeng Tu

Is ChatGPT A Good Translator? A Preliminary Study

Table of Contents

Link to paper
Abstract
Paper Content
- Introduction
- Conclusion

Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

ChatGPT is evaluated for machine translation
Candidate prompts generally work well
ChatGPT performs competitively with commercial translation products on high-resource European languages
ChatGPT lags behind significantly on low-resource or distant languages
ChatGPT does not perform as well as commercial systems on biomedical abstracts or Reddit comments

Paper Content

Introduction

ChatGPT is an intelligent chatting machine
It is trained to follow instructions and provide detailed responses
It can answer followup questions, admit mistakes, challenge incorrect premises, and reject inappropriate requests
It can do various natural language processing tasks, including question answering, storytelling, logic reasoning, code debugging, and machine translation

Evaluation setting

Compared 3 commercial translation products
Evaluated on Flores-101, WMT19 Biomedical Translation Task, WMT20 Robustness Task
Sampled 50 sentences from each set for evaluation
Used BLEU score, ChrF++, and TER as metrics

Translation prompts

ChatGPT was asked to provide ten concise prompts or templates for machine translation
Three candidate prompts were summarized from the results, with an extra added to one of them
The three candidate prompts were compared on a Chinese-to-English translation task, with TP3 performing the best in terms of all three metrics

Multilingual translation

Four languages are evaluated: German, English, Romanian, and Chinese
12 directions of translation are tested
German-English translation is considered a high-resource task
Romanian-English translation is considered a low-resource task
ChatGPT performs competitively for German-English translation
ChatGPT lags behind for Romanian-English translation
Translating between different language families is harder than within the same language family

Translation robustness

ChatGPT was evaluated on WMT19 Bio and WMT20 Rob2 and Rob3 test sets.
WMT19 Bio test set contains Medline abstracts, WMT20 Rob2 contains comments from reddit.com, and WMT20 Rob3 contains a crowdsourced speech recognition corpus.
ChatGPT does not perform as well as Google Translate or DeepL Translate on WMT19 Bio and WMT2 Rob2 test sets.

Conclusion

Studied ChatGPT for machine translation
ChatGPT performs competitively on high-resource European languages, but not low-resource or distant languages
ChatGPT not as good as commercial systems on biomedical abstracts or Reddit comments, but good for spoken language
Future work includes investigating impact of historical context and iterative refinement of translation