2022 Engineering Symposium

Student Poster Presentations

Poster Submission Guidelines

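  • Submission deadline: Tuesday, May 31, 2022, by 18:00
    File format: 90 cm wide × 120 cm tall (any layout is acceptable, but the file must be converted to PDF)
    File name: e.g., P1-01 OOO (name)
    How to upload: Click your paper title > enter the edit password (the number you entered when submitting the abstract) and click 'Confirm' > click 'Choose File' at the bottom and upload the paper abstract (PDF file) > enter the anti-spam number > click 'Finish'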

Session 1: AI, IoT P1-19_Text data augmentation for Korean_DANG THANH VU

Posted: 22-05-10 17:33

Body

Data augmentation (DA) is a widely used technique that reduces overfitting and increases the robustness of machine learning models by improving the quality and quantity of the training data. It is essential in vision tasks but is rarely applied to text datasets, where it is less straightforward. In particular, few studies address data augmentation for non-English languages such as Korean, which is spoken by a comparatively small population. This study fills the gap by evaluating several common data augmentation methods on Korean corpora with pre-trained language models. Specifically, we evaluate two text data augmentation approaches: text transformation and back translation. We compare these augmentations on Korean corpora across four downstream tasks: semantic textual similarity (STS), natural language inference (NLI), question duplication verification (QDV), and sentiment classification (STC). Compared with training without augmentation, text data augmentation yields performance gains of 2.24%, 2.19%, 0.66%, and 0.08% on the STS, NLI, QDV, and STC tasks, respectively.
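The abstract does not say which text transformations or which translation system were used, so the following is only a minimal sketch of the two augmentation families it names. It assumes whitespace tokenization and two generic transformations (random swap and random deletion); the function names, the parameters `n_swaps` and `p`, and the `to_pivot` / `from_pivot` translator callables are all hypothetical placeholders, not the authors' method.

```python
import random
from typing import Callable, List


def random_swap(tokens: List[str], n_swaps: int, rng: random.Random) -> List[str]:
    """Return a copy of tokens with n_swaps random pairs of positions exchanged."""
    out = list(tokens)
    if len(out) < 2:
        return out  # nothing to swap
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_delete(tokens: List[str], p: float, rng: random.Random) -> List[str]:
    """Drop each token with probability p, always keeping at least one token."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]


def back_translate(
    text: str,
    to_pivot: Callable[[str], str],
    from_pivot: Callable[[str], str],
) -> str:
    """Translate text into a pivot language and back again.

    to_pivot and from_pivot are hypothetical stand-ins for whatever
    translation model or service is available (assumption).
    """
    return from_pivot(to_pivot(text))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the augmentation is reproducible
    tokens = "나는 오늘 학교에 갔다".split()  # whitespace units, an assumption
    print(" ".join(random_swap(tokens, n_swaps=1, rng=rng)))
    print(" ".join(random_delete(tokens, p=0.2, rng=rng)))
```

In a setup like this, each augmented sentence would be added to the training set with the label of its source sentence. Because swapping or deleting words can change the meaning of a Korean sentence, it seems reasonable to keep `n_swaps` and `p` small for pair tasks such as STS and NLI.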
