Session 1: AI, IoT P1-19_Text data augmentation for Korean_DANG THANH VU (revised)
Page information
Views: 104 | Posted: 22-05-10 17:33
Body
Data augmentation (DA) is a widely used technique for reducing overfitting and increasing the robustness of machine learning models by enhancing the quality and quantity of the training dataset. It is essential in vision tasks but is rarely applied to text datasets, where it is less straightforward. In particular, only a few studies address data augmentation for non-English languages such as Korean, which is spoken by a comparatively small population. This study addresses that gap by evaluating several common data augmentation methods on Korean corpora with pre-trained language models. Specifically, we evaluate two text data augmentation approaches: text transformation and back translation. We compare these augmentations across Korean corpora on four downstream tasks: semantic textual similarity (STS), natural language inference (NLI), question duplication verification (QDV), and sentiment classification (STC). Compared with training without augmentation, text data augmentation yields performance gains of 2.24%, 2.19%, 0.66%, and 0.08% on the STS, NLI, QDV, and STC tasks, respectively.
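As an illustration of the text transformation approach, the following is a minimal sketch and not the authors' implementation. The abstract does not say which transformations were used, so random word swap and random word deletion are assumed here as common examples. The function names (`random_swap`, `random_deletion`, `augment`), the parameter values, and the sample sentence are all hypothetical, and splitting on whitespace is a simplification for Korean, where a morphological analyzer would usually be preferable.

```python
import random


def random_swap(tokens, n_swaps, rng):
    """Return a copy of tokens with n_swaps random pairs exchanged."""
    tokens = list(tokens)
    if len(tokens) < 2:
        return tokens
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens


def random_deletion(tokens, p, rng):
    """Drop each token with probability p, always keeping at least one."""
    tokens = list(tokens)
    if not tokens:
        return tokens
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]


def augment(sentence, n_aug=4, p_delete=0.1, n_swaps=1, seed=0):
    """Generate n_aug transformed variants of a whitespace-tokenized sentence.

    Assumption: whitespace tokens (eojeol) stand in for words; the
    transformations alternate between swap and deletion.
    """
    rng = random.Random(seed)
    tokens = sentence.split()
    variants = []
    for k in range(n_aug):
        if k % 2 == 0:
            new_tokens = random_swap(tokens, n_swaps, rng)
        else:
            new_tokens = random_deletion(tokens, p_delete, rng)
        variants.append(" ".join(new_tokens))
    return variants


if __name__ == "__main__":
    # Hypothetical example sentence, not taken from the paper's corpora.
    for variant in augment("이 영화는 정말 재미있고 감동적이었다"):
        print(variant)
```

Each variant would keep the label of its source sentence when added to the training set. Back translation, the second approach named in the abstract, additionally needs a translation model or service and is not sketched here.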
Attachment
- P1-19_Text data augmentation for Korean.pdf (220.5K), 8 downloads | DATE : 2022-05-31 10:33:09