P1-19_Text data augmentation for Korean > Student Poster Presentations

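Poster Submission Guidelines

Submission deadline: May 31, 2022 (Tue), by 18:00

File specifications: 90 cm wide x 120 cm high (any layout is acceptable, but the file must be converted to PDF)

File name: e.g., P1-01 OOO (name)

Upload procedure: click the title of your paper > enter the edit password (the number you entered when submitting the abstract) and click 'Confirm' > click 'Choose File' at the bottom and upload the abstract (PDF file) > enter the anti-spam number > click 'Finish'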

Session 1: AI, IoT P1-19_Text data augmentation for Korean_DANG THANH VU

Page information

Views: 111. Posted: 22-05-10 17:33

Abstract

Data augmentation (DA) is a general technique for reducing overfitting and increasing the robustness of machine learning models by enhancing the quality and quantity of the training dataset. It is essential in vision tasks but is rarely applied to text datasets, where it is less straightforward. In particular, only a few studies address data augmentation for non-English languages such as Korean, which is spoken by a comparatively small population. This study addresses that gap by applying several common data augmentation methods to Korean corpora with pre-trained language models. In short, we evaluate the performance of two text data augmentation approaches: text transformation and back translation. We compare these augmentations on Korean corpora across four downstream tasks: semantic textual similarity (STS), natural language inference (NLI), question duplication verification (QDV), and sentiment classification (STC). Compared with training without augmentation, applying text data augmentation yields performance gains of 2.24%, 2.19%, 0.66%, and 0.08% on the STS, NLI, QDV, and STC tasks, respectively.
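The abstract does not say which text transformations were used. As a rough illustration only, the sketch below shows two common token-level transformations (random swap and random deletion) applied to a whitespace-tokenized sentence. The function names, the parameters (`n_aug`, `p_delete`, `n_swaps`, `seed`), and the choice of whitespace tokenization are all assumptions made for this example, not details taken from the poster; a real Korean pipeline would more likely operate on morphemes.

```python
import random


def random_swap(tokens, n_swaps, rng):
    """Swap two randomly chosen token positions, n_swaps times."""
    tokens = list(tokens)
    if len(tokens) < 2:
        return tokens
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(tokens)), 2)  # two distinct positions
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens


def random_delete(tokens, p, rng):
    """Drop each token with probability p, keeping at least one token."""
    tokens = list(tokens)
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]


def augment(sentence, n_aug=4, p_delete=0.1, n_swaps=1, seed=0):
    """Return n_aug augmented variants, alternating swap and deletion.

    Hypothetical helper: whitespace tokenization is an assumption here.
    """
    rng = random.Random(seed)
    tokens = sentence.split()
    variants = []
    for k in range(n_aug):
        if k % 2 == 0:
            new_tokens = random_swap(tokens, n_swaps, rng)
        else:
            new_tokens = random_delete(tokens, p_delete, rng)
        variants.append(" ".join(new_tokens))
    return variants


if __name__ == "__main__":
    for variant in augment("나는 오늘 학교에 갔다"):
        print(variant)
```

Each variant would be added to the training set with the label of its source sentence. Back translation, the second approach named in the abstract, needs a translation model and is not sketched here.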

Attachments

P1-19_Text data augmentation for Korean.pdf (220.5K) 8 downloads | DATE : 2022-05-31 10:33:09

2022 Engineering Symposium
