About Me
Hello, I am Sugyeong Eo!
I am a Ph.D. student in Computer Science and Engineering at Korea University. I am a member of the NLP & AI Lab (Advisor: Prof. Heuiseok Lim). I am the founder and CSO of KU-NMT Group. Feel free to contact me!
Research Interests
Natural Language Processing, Neural Machine Translation, Quality Estimation, Question-Answer pair Generation (Question Generation), Curriculum Learning, Hallucination.
Education
2020.09 - Present: Graduate, Major in Computer Science and Engineering at Korea University
2016.02 - 2020.08: Undergraduate, Received B.A. degree, Major in Linguistics and Cognitive Science (1st), Language and Technology (2nd) at Hankuk University of Foreign Studies (HUFS)
Academic Services
Program committee: NAACL 2022 - Industry Track
Program committee: ACL 2023
Program committee: EMNLP 2023
Publications
Top Conference (Main)
- KNOTICED: A Dataset for Critical Error Detection in English-Korean Machine Translation
Sugyeong Eo, Jungwoo Lim, Chanjun Park, Dahyun Jung, Seonmin Koo, Hyeonseok Moon, Jaehyung Seo, Heuiseok Lim
LREC-COLING 2024
- Generative Interpretation: Toward Human-Like Evaluation for Educational Question-Answer Pair Generation
Hyeonseok Moon, Jaewook Lee, Sugyeong Eo, Chanjun Park, Jaehyung Seo, Heuiseok Lim
EACL 2024
- Hyper-BTS Dataset: Scalability and Enhanced Analysis of Back TranScription (BTS) for ASR Post-Processing
Chanjun Park, Jaehyung Seo, Seolhwa Lee, Junyoung Son, Hyeonseok Moon, Sugyeong Eo, Chanhee Lee, Heuiseok Lim
EACL 2024
- KEBAP: Korean Error Explainable Benchmark Dataset for ASR and Post-processing
Seonmin Koo, Chanjun Park, Jinsung Kim, Jaehyung Seo, Sugyeong Eo, Hyeonseok Moon, Heuiseok Lim
EMNLP 2023
- CHEF in the Language Kitchen: A Generative Data Augmentation Leveraging Korean Morpheme Ingredients
Jaehyung Seo, Hyeonseok Moon, Jaewook Lee, Sugyeong Eo, Chanjun Park, Heuiseok Lim
EMNLP 2023
- Informative Evidence-guided Prompt-based Fine-tuning for English-Korean Critical Error Detection
DaHyun Jung, Sugyeong Eo, Chanjun Park, Hyeonseok Moon, Jaehyung Seo, Heuiseok Lim
IJCNLP-AACL 2023
- Towards Diverse and Effective Question-Answer Pair Generation from Children Storybooks
Sugyeong Eo, Hyeonseok Moon, Jinsung Kim, Yuna Hur, Jeongwook Kim, Songeun Lee, Changwoo Chun, Sungsoo Park, Heuiseok Lim
ACL 2023 - Findings
- PEEP-Talk: A Situational Dialogue-based Chatbot for English Education
Seungjun Lee, Yoonna Jang, Chanjun Park, Jungseob Lee, Jaehyung Seo, Hyeonseok Moon, Sugyeong Eo, Seounghoon Lee, Bernardo Yahya, Heuiseok Lim
ACL 2023 - Demo
- KU X Upstage’s submission for the WMT22 Quality Estimation: Critical Error Detection Shared Task
Sugyeong Eo, Chanjun Park, Hyeonseok Moon, Jaehyung Seo, Heuiseok Lim
WMT 2022
- QUAK: A Synthetic Quality Estimation Dataset for Korean-English Neural Machine Translation
Sugyeong Eo, Chanjun Park, Hyeonseok Moon, Jaehyung Seo, Gyeongmin Kim, Jungseob Lee, Heuiseok Lim
COLING 2022
- A Dog Is Passing Over The Jet? A Text-Generation Dataset for Korean Commonsense Reasoning and Evaluation
Jaehyung Seo, Seounghoon Lee, Chanjun Park, Yoonna Jang, Hyeonseok Moon, Sugyeong Eo, Seonmin Koo, Heuiseok Lim
NAACL 2022 - Findings
- Priming Ancient Korean Neural Machine Translation
Chanjun Park, Seolhwa Lee, Hyeonseok Moon, Sugyeong Eo, Jaehyung Seo, Heuiseok Lim
LREC 2022
- Empirical Analysis of Synthetic Data Generation Using Noising Strategies for Automatic Post-editing
Hyeonseok Moon, Chanjun Park, Seolhwa Lee, Jaehyung Seo, Jeongsub Lee, Sugyeong Eo, Heuiseok Lim
LREC 2022
- Should we find another model?: Improving Neural Machine Translation Performance with ONE-Piece Tokenization Method without Model Modification
Chanjun Park, Sugyeong Eo, Hyeonseok Moon, Heuiseok Lim
NAACL-HLT 2021 Industry Track (Poster/Oral presentation)
Top Conference (Workshop)
- Toward Practical Automatic Speech Recognition and Post-Processing: a Call for Explainable Error Benchmark Guideline
Seonmin Koo, Chanjun Park, Jinsung Kim, Jaehyung Seo, Sugyeong Eo, Hyeonseok Moon, Heuiseok Lim
ICML 2023 - DataPerf workshop
- Synthetic Alone: Exploring the Dark Side of Synthetic Data for Grammatical Error Correction
Chanjun Park, Seonmin Koo, Seolhwa Lee, Jaehyung Seo, Sugyeong Eo, Hyeonseok Moon, Heuiseok Lim
ICML 2023 - DataPerf workshop
- Focus on FoCus: Is FoCus focused on Context, Knowledge and Persona?
SeungYoon Lee, Jungseob Lee, Chanjun Park, Sugyeong Eo, Hyeonseok Moon, Jaehyung Seo, Jeongbae Park, Heuiseok Lim
COLING 2022 - The 1st Workshop on Customized Chat Grounding Persona and Knowledge
- A Self-Supervised Automatic Post-Editing Data Generation Tool
Hyeonseok Moon, Chanjun Park, Sugyeong Eo, Jaehyung Seo, Seungjun Lee, Heuiseok Lim
ICML 2022 - DataPerf workshop
- How should human translation coexist with NMT? Efficient tool for building high quality parallel corpus
Chanjun Park, Seolhwa Lee, Hyeonseok Moon, Sugyeong Eo, Jaehyung Seo, Heuiseok Lim
NeurIPS 2021 - Data-centric AI (DCAI) workshop
- A New Tool for Efficiently Generating Quality Estimation Datasets
Sugyeong Eo, Chanjun Park, Jaehyung Seo, Hyeonseok Moon, Heuiseok Lim
NeurIPS 2021 - Data-centric AI (DCAI) workshop
- Automatic Knowledge Augmentation for Generative Commonsense Reasoning
Jaehyung Seo, Chanjun Park, Sugyeong Eo, Hyeonseok Moon, Heuiseok Lim
NeurIPS 2021 - Data-centric AI (DCAI) workshop
- BTS: Back TranScription for Speech-to-Text Post-Processor using Text-to-Speech-to-Text
Chanjun Park, Jaehyung Seo, Seolhwa Lee, Chanhee Lee, Hyeonseok Moon, Sugyeong Eo, Heuiseok Lim
ACL 2021 - WAT (Workshop on Asian Translation) 2021 Workshop
- Dealing with the Paradox of Quality Estimation
Sugyeong Eo, Chanjun Park, Jaehyung Seo, Hyeonseok Moon, Heuiseok Lim
MT Summit 2021 - LoResMT
International Journal (SCI/SCIE)
- Uncovering the Risks and Drawbacks Associated with the Use of Synthetic Data for Grammatical Error Correction
Seonmin Koo, Chanjun Park, Seolhwa Lee, Jaehyung Seo, Sugyeong Eo, Hyeonseok Moon, Heuiseok Lim
IEEE Access, 2023
- Doubts on the reliability of parallel corpus filtering
Hyeonseok Moon, Chanjun Park, Seonmin Koo, Jungseob Lee, Seungjun Lee, Jaehyung Seo, Sugyeong Eo, Yoonna Jang, Hyunjoong Kim, Hyoung-gyu Lee, Heuiseok Lim
ESWA, 2023
- A Survey on Evaluation Metrics for Machine Translation
Seungjun Lee, Jungseob Lee, Hyeonseok Moon, Chanjun Park, Jaehyung Seo, Sugyeong Eo, Seonmin Koo, Heuiseok Lim
Mathematics, 2023
- Enhancing Machine Translation Quality Estimation via Fine-grained Error Analysis and Large Language Model
Dahyun Jung, Chanjun Park, Sugyeong Eo, Heuiseok Lim
Mathematics, 2023
- Plain Template Insertion: Korean-Prompt-based Engineering for Few-shot Learners
Jaehyung Seo, Hyeonseok Moon, Chanhee Lee, Sugyeong Eo, Chanjun Park, Jihoon Kim, Changwoo Chun, Heuiseok Lim
IEEE Access, 2022
- PU-GEN: Enhancing Generative Commonsense Reasoning for Language Models with Human-Centered Knowledge
Jaehyung Seo, Dongsuk Oh, Sugyeong Eo, Chanjun Park, Kisu Yang, Hyeonseok Moon, Kinam Park, Heuiseok Lim
Knowledge-Based Systems, 2022
- BERTOEIC: Solving TOEIC Problems Using Simple and Efficient Data Augmentation Techniques with Pretrained Transformer Encoders
Jeongwoo Lee, Hyeonseok Moon, Chanjun Park, Jaehyung Seo, Sugyeong Eo, Heuiseok Lim
Applied Sciences, 2022
- Return on Advertising Spend Prediction with Task Decomposition based LSTM Model
Hyeonseok Moon, Taemin Lee, Jaehyung Seo, Chanjun Park, Sugyeong Eo, Imatitikua D. AIyanyo, Jeongbae Park, Aram So, Kyoungwha Ok, Kinam Park
Mathematics, 2022
- Word-level Quality Estimation for Korean-English Neural Machine Translation
Sugyeong Eo, Chanjun Park, Hyeonseok Moon, Jaehyung Seo, Heuiseok Lim
IEEE Access, 2022
- Dense-to-Question and Sparse-to-Answer: Hybrid Retriever System for Industrial Frequently Asked Questions
Jaehyung Seo, Taemin Lee, Hyeonseok Moon, Chanjun Park, Sugyeong Eo, Imatitikua D AIyanyo, Kinam Park, Aram So, Sungmin Ahn, Jeongbae Park
Mathematics, 2022
- Mimicking Infants’ Bilingual Language Acquisition for Domain Specialized Neural Machine Translation
Chanjun Park, Woo-Young Go, Sugyeong Eo, Hyeonseok Moon, Seolhwa Lee, Heuiseok Lim
IEEE Access, 2022
- An Automatic Post Editing with Efficient and Simple Data Generation Method
Hyeonseok Moon, Chanjun Park, Jaehyung Seo, Sugyeong Eo, Heuiseok Lim
IEEE Access, 2022
- Empirical Analysis of Korean Public AI Hub Parallel Corpora and in-depth Analysis using LIWC
Chanjun Park, Midan Shim, Sugyeong Eo, Seolhwa Lee, Jaehyung Seo, Hyeonseok Moon, Heuiseok Lim
arXiv, 2021
- An Empirical Study on Automatic Post Editing for Neural Machine Translation
Hyeonseok Moon, Chanjun Park, Sugyeong Eo, Jaehyung Seo, Heuiseok Lim
IEEE Access, 2021
- Comparative Analysis of Current Approaches to Quality Estimation for Neural Machine Translation
Sugyeong Eo, Chanjun Park, Hyeonseok Moon, Jaehyung Seo, Heuiseok Lim
Applied Sciences, 2021
Domestic Conference & Journal
Conference: 13, Journal: 10
Domestic Patents
1. DIVERSE AND EFFECTIVE QUESTION-ANSWER PAIR GENERATION SYSTEMS FOR EDUCATION
HeuiSeok Lim, Sugyeong Eo, Hyeonseok Moon, Jinsung Kim, Yuna Hur, Jeongwook Kim
Patent application filed (10-2023-0024355)
2. DEVICE AND METHOD FOR GENERATING OF TRAINING DATA FOR QUALITY ESTIMATION IN MACHINE TRANSLATION
HeuiSeok Lim, Sugyeong Eo, Chanjun Park, Hyeonseok Moon
10-2023-0071825, 10-2593-447
3. DEVICE AND METHOD FOR GENERATING TRAINING DATA FOR AUTOMATIC POST EDITING
HeuiSeok Lim, Hyeonseok Moon, Chanjun Park, Sugyeong Eo
Patent application filed (10-2021-0118924)
Book Chapters
Natural Language Processing Bible
HeuiSeok Lim, Korea University NLP&AI Lab
Human Science
Honors & Awards
- Received Korea University Best Paper Award 2023
- Received Naver Ph.D. Fellowship 2022
- 1st place in Quality Estimation Shared Task 2022 - Sentence-level “Critical Error Detection”, WMT 2022 (EMNLP 2022)
- Best Paper Award, The 34th Annual Conference on Human & Cognitive Language Technology (HCLT2022)
▶️ Paper: KoCED: 윤리 및 사회적 문제를 초래하는 기계번역 오류 탐지를 위한 학습 데이터셋 (KoCED: English-Korean Critical Error Detection Dataset)
- Best Paper Award, The 33rd Annual Conference on Human & Cognitive Language Technology (HCLT2021) - NLP Application 2 Section
▶️ Paper: KommonGen: 한국어 생성 모델의 상식 추론 평가 데이터셋 (KommonGen: A Dataset for Korean Generative Commonsense Reasoning Evaluation)
- Ranked 4th on the CommonGen 1.1 Leaderboard (Nov. 2022 Ranked 7th, CommonGen 1.1)
Invited Talk
- Basic practice of natural language processing for everyone
Place: Hankuk University of Foreign Studies (2022.07)