D 2023

MUNI-NLP Systems for Low-resource Indic Machine Translation

SIGNORONI, Edoardo and Pavel RYCHLÝ

Basic information

Original name

MUNI-NLP Systems for Low-resource Indic Machine Translation

Authors

SIGNORONI, Edoardo and Pavel RYCHLÝ

Edition

Singapore, Proceedings of the Eighth Conference on Machine Translation, p. 959-966, 8 pp. 2023

Publisher

Association for Computational Linguistics

Other information

Language

English

Type of outcome

Proceedings paper

Country of publisher

United States of America

Confidentiality degree

is not subject to a state or trade secret

Publication form

electronic version available online

References:

Marked to be transferred to RIV

Yes

RIV identification code

RIV/00216224:14330/23:00138713

Organization

Fakulta informatiky – Repository

ISBN

979-8-89176-041-7

EID Scopus

Keywords in English

low-resource; machine translation; NLP

Links

LM2023062, research and development project.
Changed: 31/7/2025 00:50, RNDr. Daniel Jakubík

Abstract

In the original language

The WMT 2023 Shared Task on Low-Resource Indic Language Translation featured translation to and from Assamese, Khasi, Manipuri, and Mizo on one side and English on the other. We submitted supervised neural machine translation systems for each pair and direction and experimented with different configurations and settings for both preprocessing and training. Although most of them did not reach competitive performance, our experiments uncovered some interesting points for further investigation, namely the relation between dataset and model size, and the impact of the training framework. Moreover, the results of some of our preliminary experiments on the use of word embedding initialization, backtranslation, and model depth were in contrast with previous work. The final results also show some disagreement among the automated metrics employed in the evaluation.
