About Me

Hello, I am Sugyeong Eo!
I am a Ph.D. student in Computer Science and Engineering at Korea University. I am a member of the NLP & AI Lab (Advisor: Prof. Heuiseok Lim). I am the founder and CSO of KU-NMT Group. Feel free to contact me!

Research Interests

Natural Language Processing, Neural Machine Translation, Quality Estimation, Question-Answer pair Generation (Question Generation), Curriculum Learning, Hallucination.

Education

2020.09 - Present: Graduate, major in Computer Science and Engineering at Korea University
2016.02 - 2020.08: Undergraduate, received B.A. degree, majors in Linguistics and Cognitive Science (1st) and Language and Technology (2nd) at Hankuk University of Foreign Studies (HUFS)

Academic Services

Program committee: NAACL 2022 - Industry Track
Program committee: ACL 2023
Program committee: EMNLP 2023

Publications

Top Conference (Main)

  1. KNOTICED: A Dataset for Critical Error Detection in English-Korean Machine Translation
    Sugyeong Eo, Jungwoo Lim, Chanjun Park, Dahyun Jung, Seonmin Koo, Hyeonseok Moon, Jaehyung Seo, Heuiseok Lim
    LREC-COLING 2024

  2. Generative Interpretation: Toward Human-Like Evaluation for Educational Question-Answer Pair Generation
    Hyeonseok Moon, Jaewook Lee, Sugyeong Eo, Chanjun Park, Jaehyung Seo, Heuiseok Lim
    EACL 2024

  3. Hyper-BTS Dataset: Scalability and Enhanced Analysis of Back TranScription (BTS) for ASR Post-Processing
    Chanjun Park, Jaehyung Seo, Seolhwa Lee, Junyoung Son, Hyeonseok Moon, Sugyeong Eo, Chanhee Lee, Heuiseok Lim
    EACL 2024

  4. KEBAP: Korean Error Explainable Benchmark Dataset for ASR and Post-processing
    Seonmin Koo, Chanjun Park, Jinsung Kim, Jaehyung Seo, Sugyeong Eo, Hyeonseok Moon, Heuiseok Lim
    EMNLP 2023

  5. CHEF in the Language Kitchen: A Generative Data Augmentation Leveraging Korean Morpheme Ingredients
    Jaehyung Seo, Hyeonseok Moon, Jaewook Lee, Sugyeong Eo, Chanjun Park, Heuiseok Lim
    EMNLP 2023

  6. Informative Evidence-guided Prompt-based Fine-tuning for English-Korean Critical Error Detection
    DaHyun Jung, Sugyeong Eo, Chanjun Park, Hyeonseok Moon, Jaehyung Seo, Heuiseok Lim
    IJCNLP-AACL 2023

  7. Towards Diverse and Effective Question-Answer Pair Generation from Children Storybooks
    Sugyeong Eo, Hyeonseok Moon, Jinsung Kim, Yuna Hur, Jeongwook Kim, Songeun Lee, Changwoo Chun, Sungsoo Park, Heuiseok Lim
    ACL 2023 - Findings

  8. PEEP-Talk: A Situational Dialogue-based Chatbot for English Education
    Seungjun Lee, Yoonna Jang, Chanjun Park, Jungseob Lee, Jaehyung Seo, Hyeonseok Moon, Sugyeong Eo, Seounghoon Lee, Bernardo Yahya, Heuiseok Lim
    ACL 2023 - Demo

  9. KU X Upstage’s submission for the WMT22 Quality Estimation: Critical Error Detection Shared Task
    Sugyeong Eo, Chanjun Park, Hyeonseok Moon, Jaehyung Seo, Heuiseok Lim
    WMT 2022

  10. QUAK: A Synthetic Quality Estimation Dataset for Korean-English Neural Machine Translation
    Sugyeong Eo, Chanjun Park, Hyeonseok Moon, Jaehyung Seo, Gyeongmin Kim, Jungseob Lee, Heuiseok Lim
    COLING 2022

  11. A Dog Is Passing Over The Jet? A Text-Generation Dataset for Korean Commonsense Reasoning and Evaluation
    Jaehyung Seo, Seounghoon Lee, Chanjun Park, Yoonna Jang, Hyeonseok Moon, Sugyeong Eo, Seonmin Koo, Heuiseok Lim
    NAACL 2022 - Findings

  12. Priming Ancient Korean Neural Machine Translation
    Chanjun Park, Seolhwa Lee, Hyeonseok Moon, Sugyeong Eo, Jaehyung Seo, Heuiseok Lim
    LREC 2022

  13. Empirical Analysis of Synthetic Data Generation Using Noising Strategies for Automatic Post-editing
    Hyeonseok Moon, Chanjun Park, Seolhwa Lee, Jaehyung Seo, Jeongsub Lee, Sugyeong Eo, Heuiseok Lim
    LREC 2022

  14. Should we find another model?: Improving Neural Machine Translation Performance with ONE-Piece Tokenization Method without Model Modification
    Chanjun Park, Sugyeong Eo, Hyeonseok Moon, Heuiseok Lim
    NAACL-HLT 2021 - Industry Track (Poster/Oral presentation)

Top Conference (Workshop)

  1. Toward Practical Automatic Speech Recognition and Post-Processing: a Call for Explainable Error Benchmark Guideline
    Seonmin Koo, Chanjun Park, Jinsung Kim, Jaehyung Seo, Sugyeong Eo, Hyeonseok Moon, Heuiseok Lim
    ICML 2023 - DataPerf workshop

  2. Synthetic Alone: Exploring the Dark Side of Synthetic Data for Grammatical Error Correction
    Chanjun Park, Seonmin Koo, Seolhwa Lee, Jaehyung Seo, Sugyeong Eo, Hyeonseok Moon, Heuiseok Lim
    ICML 2023 - DataPerf workshop

  3. Focus on FoCus: Is FoCus focused on Context, Knowledge and Persona?
    SeungYoon Lee, Jungseob Lee, Chanjun Park, Sugyeong Eo, Hyeonseok Moon, Jaehyung Seo, Jeongbae Park, Heuiseok Lim
    COLING 2022 - The 1st Workshop on Customized Chat Grounding Persona and Knowledge

  4. A Self-Supervised Automatic Post-Editing Data Generation Tool
    Hyeonseok Moon, Chanjun Park, Sugyeong Eo, Jaehyung Seo, Seungjun Lee, Heuiseok Lim
    ICML 2022 - DataPerf workshop

  5. How should human translation coexist with NMT? Efficient tool for building high quality parallel corpus
    Chanjun Park, Seolhwa Lee, Hyeonseok Moon, Sugyeong Eo, Jaehyung Seo, Heuiseok Lim
    NeurIPS 2021 - Data-centric AI (DCAI) workshop

  6. A New Tool for Efficiently Generating Quality Estimation Datasets
    Sugyeong Eo, Chanjun Park, Jaehyung Seo, Hyeonseok Moon, Heuiseok Lim
    NeurIPS 2021 - Data-centric AI (DCAI) workshop

  7. Automatic Knowledge Augmentation for Generative Commonsense Reasoning
    Jaehyung Seo, Chanjun Park, Sugyeong Eo, Hyeonseok Moon, Heuiseok Lim
    NeurIPS 2021 - Data-centric AI (DCAI) workshop

  8. BTS: Back TranScription for Speech-to-Text Post-Processor using Text-to-Speech-to-Text
    Chanjun Park, Jaehyung Seo, Seolhwa Lee, Chanhee Lee, Hyeonseok Moon, Sugyeong Eo, Heuiseok Lim
    ACL 2021 - WAT 2021 (Workshop on Asian Translation)

  9. Dealing with the Paradox of Quality Estimation
    Sugyeong Eo, Chanjun Park, Jaehyung Seo, Hyeonseok Moon, Heuiseok Lim
    MT Summit 2021 - LoResMT

International Journal (SCI/SCIE)

  1. Uncovering the Risks and Drawbacks Associated with the Use of Synthetic Data for Grammatical Error Correction
    Seonmin Koo, Chanjun Park, Seolhwa Lee, Jaehyung Seo, Sugyeong Eo, Hyeonseok Moon, Heuiseok Lim
    IEEE Access, 2023

  2. Doubts on the reliability of parallel corpus filtering
    Hyeonseok Moon, Chanjun Park, Seonmin Koo, Jungseob Lee, Seungjun Lee, Jaehyung Seo, Sugyeong Eo, Yoonna Jang, Hyunjoong Kim, Hyoung-gyu Lee, Heuiseok Lim
    ESWA, 2023

  3. A Survey on Evaluation Metrics for Machine Translation
    Seungjun Lee, Jungseob Lee, Hyeonseok Moon, Chanjun Park, Jaehyung Seo, Sugyeong Eo, Seonmin Koo, Heuiseok Lim
    Mathematics, 2023

  4. Enhancing Machine Translation Quality Estimation via Fine-grained Error Analysis and Large Language Model
    Dahyun Jung, Chanjun Park, Sugyeong Eo, Heuiseok Lim
    Mathematics, 2023

  5. Plain Template Insertion: Korean-Prompt-based Engineering for Few-shot Learners
    Jaehyung Seo, Hyeonseok Moon, Chanhee Lee, Sugyeong Eo, Chanjun Park, Jihoon Kim, Changwoo Chun, Heuiseok Lim
    IEEE Access, 2022

  6. PU-GEN: Enhancing Generative Commonsense Reasoning for Language Models with Human-Centered Knowledge
    Jaehyung Seo, Dongsuk Oh, Sugyeong Eo, Chanjun Park, Kisu Yang, Hyeonseok Moon, Kinam Park, Heuiseok Lim
    Knowledge-Based Systems, 2022

  7. BERTOEIC: Solving TOEIC Problems Using Simple and Efficient Data Augmentation Techniques with Pretrained Transformer Encoders
    Jeongwoo Lee, Hyeonseok Moon, Chanjun Park, Jaehyung Seo, Sugyeong Eo, Heuiseok Lim
    Applied Sciences, 2022

  8. Return on Advertising Spend Prediction with Task Decomposition based LSTM Model
    Hyeonseok Moon, Taemin Lee, Jaehyung Seo, Chanjun Park, Sugyeong Eo, Imatitikua D. Aiyanyo, Jeongbae Park, Aram So, Kyoungwha Ok, Kinam Park
    Mathematics, 2022

  9. Word-level Quality Estimation for Korean-English Neural Machine Translation
    Sugyeong Eo, Chanjun Park, Hyeonseok Moon, Jaehyung Seo, Heuiseok Lim
    IEEE Access, 2022

  10. Dense-to-Question and Sparse-to-Answer: Hybrid Retriever System for Industrial Frequently Asked Questions
    Jaehyung Seo, Taemin Lee, Hyeonseok Moon, Chanjun Park, Sugyeong Eo, Imatitikua D. Aiyanyo, Kinam Park, Aram So, Sungmin Ahn, Jeongbae Park
    Mathematics, 2022

  11. Mimicking Infants’ Bilingual Language Acquisition for Domain Specialized Neural Machine Translation
    Chanjun Park, Woo-Young Go, Sugyeong Eo, Hyeonseok Moon, Seolhwa Lee, Heuiseok Lim
    IEEE Access, 2022

  12. An Automatic Post Editing with Efficient and Simple Data Generation Method
    Hyeonseok Moon, Chanjun Park, Jaehyung Seo, Sugyeong Eo, Heuiseok Lim
    IEEE Access, 2022

  13. Empirical Analysis of Korean Public AI Hub Parallel Corpora and in-depth Analysis using LIWC
    Chanjun Park, Midan Shim, Sugyeong Eo, Seolhwa Lee, Jaehyung Seo, Hyeonseok Moon, Heuiseok Lim
    arXiv, 2021

  14. An Empirical Study on Automatic Post Editing for Neural Machine Translation
    Hyeonseok Moon, Chanjun Park, Sugyeong Eo, Jaehyung Seo, Heuiseok Lim
    IEEE Access, 2021

  15. Comparative Analysis of Current Approaches to Quality Estimation for Neural Machine Translation
    Sugyeong Eo, Chanjun Park, Hyeonseok Moon, Jaehyung Seo, Heuiseok Lim
    Applied Sciences, 2021

Domestic Conference & Journal

Conference: 13, Journal: 10

Domestic Patents

1. DIVERSE AND EFFECTIVE QUESTION-ANSWER PAIR GENERATION SYSTEMS FOR EDUCATION
HeuiSeok Lim, Sugyeong Eo, Hyeonseok Moon, Jinsung Kim, Yuna Hur, Jeongwook Kim
Applied for a patent (10-2023-0024355)

2. DEVICE AND METHOD FOR GENERATING OF TRAINING DATA FOR QUALITY ESTIMATION IN MACHINE TRANSLATION
HeuiSeok Lim, Sugyeong Eo, Chanjun Park, Hyeonseok Moon
10-2023-0071825, 10-2593-447

3. DEVICE AND METHOD FOR GENERATING TRAINING DATA FOR AUTOMATIC POST EDITING
HeuiSeok Lim, Hyeonseok Moon, Chanjun Park, Sugyeong Eo
Applied for a patent (10-2021-0118924)

Book Chapters

Natural Language Processing Bible
HeuiSeok Lim, Korea University NLP&AI Lab
Human Science

Honors & Awards

  • Received Korea University Best Paper Award 2023
  • Received Naver Ph.D. Fellowship 2022
  • 1st place in Quality Estimation Shared Task 2022 - Sentence-level “Critical Error Detection”, WMT 2022 (EMNLP 2022)
  • Best Paper Award, The 34th Annual Conference on Human & Cognitive Language Technology (HCLT2022)
    ▶️ Paper: KoCED: 윤리 및 사회적 문제를 초래하는 기계번역 오류 탐지를 위한 학습 데이터셋 (KoCED: English-Korean Critical Error Detection Dataset)
  • Best Paper Award, The 33rd Annual Conference on Human & Cognitive Language Technology (HCLT2021) - NLP Application 2 Section
    ▶️ Paper: KommonGen: 한국어 생성 모델의 상식 추론 평가 데이터셋 (KommonGen: A Dataset for Korean Generative Commonsense Reasoning Evaluation)
  • Ranked 4th on the CommonGen 1.1 Leaderboard (ranked 7th as of Nov. 2022)

Invited Talk

  • Basic practice of natural language processing for everyone
    PLACE: Hankuk University of Foreign Studies (2022.07)