Session 1: AI, IoT P1-19_Text data augmentation for Korean_DANG THANH VU
Posted: 22-05-10 17:33
Data augmentation (DA) is a widely used technique for reducing overfitting and improving the robustness of machine learning models by increasing the quality and quantity of the training data. Data augmentation is essential in vision tasks but is rarely applied to text datasets, because augmenting text is less straightforward. In particular, few studies have examined data augmentation for non-English languages such as Korean, which is spoken by a relatively small population. This study addresses that gap by applying several common data augmentation methods to Korean corpora with pre-trained language models. Specifically, we evaluate two text data augmentation approaches: text transformation and back translation. We compare these augmentations on Korean corpora across four downstream tasks: semantic textual similarity (STS), natural language inference (NLI), question duplication verification (QDV), and sentiment classification (STC). Compared with training without augmentation, applying text data augmentation improves performance by 2.24%, 2.19%, 0.66%, and 0.08% on the STS, NLI, QDV, and STC tasks, respectively.
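The abstract does not say which operations "text transformation" covers, so the following is only an illustrative sketch, not the authors' method. It assumes EDA-style (Easy Data Augmentation) token operations, namely random swap and random deletion, applied to whitespace-separated Korean units. It also includes a back-translation wrapper whose translator functions are hypothetical placeholders for a real machine translation model. The function names `random_swap`, `random_deletion`, `back_translate` and `augment` are illustrative and do not come from the paper.

```python
import random
from typing import Callable, List


def random_swap(tokens: List[str], n_swaps: int, rng: random.Random) -> List[str]:
    """Swap two randomly chosen token positions, repeated n_swaps times."""
    out = list(tokens)
    if len(out) < 2:
        return out  # nothing to swap
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_deletion(tokens: List[str], p: float, rng: random.Random) -> List[str]:
    """Drop each token with probability p, but never return an empty sentence."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    # If every token was dropped, keep one random token so the sample stays usable.
    return kept if kept else [rng.choice(tokens)]


def back_translate(text: str,
                   to_pivot: Callable[[str], str],
                   from_pivot: Callable[[str], str]) -> str:
    """Translate text into a pivot language and back to obtain a paraphrase."""
    return from_pivot(to_pivot(text))


def augment(sentence: str, n_aug: int, seed: int = 0) -> List[str]:
    """Return n_aug transformed copies of a sentence.

    Assumption: tokens are whitespace-separated units (eojeol); a real Korean
    pipeline might use a morphological analyzer instead.
    Even-indexed copies use random_swap, odd-indexed copies use random_deletion.
    """
    rng = random.Random(seed)  # seeded for reproducible augmentation
    tokens = sentence.split()
    variants = []
    for k in range(n_aug):
        if k % 2 == 0:
            new_tokens = random_swap(tokens, n_swaps=1, rng=rng)
        else:
            new_tokens = random_deletion(tokens, p=0.1, rng=rng)
        variants.append(" ".join(new_tokens))
    return variants


if __name__ == "__main__":
    sentence = "이 영화 정말 재미있어요"  # "This movie is really fun"
    for variant in augment(sentence, n_aug=4, seed=42):
        print(variant)

    # Hypothetical stand-ins for a Korean->English->Korean MT model;
    # they return fixed strings only to show the data flow.
    def ko_to_en(text: str) -> str:
        return "This movie is really fun"

    def en_to_ko(text: str) -> str:
        return "이 영화는 정말 재미있다"

    print(back_translate(sentence, ko_to_en, en_to_ko))
```

Passing the translators in as plain callables keeps the back-translation step independent of any particular MT system. A seeded `random.Random` instance makes the augmented training set reproducible across runs.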
Attachments
- P1-19_Text data augmentation for Korean.pdf (220.5K) | DATE: 2022-05-31 10:33:09