Deep Learning

Figure 3: End-to-end speech scoring model architecture from the Interspeech 2023 paper

Prompt-Aware Speech Scoring System for Second Language Learners

Developed a prompt-aware automatic speech scoring system at RIIID that addresses the cold-start problem for new items by using BERT/CLIP prompt embeddings. Published at Interspeech 2023 and deployed in the SANTA Say TOEIC Speaking app.

Figure 1: RCL contrastive learning modules from the SURCL paper

Knowledge Tracing with Contrastive Learning

Proposed the ACCL and RCL contrastive learning methods at RIIID, achieving state-of-the-art results on student modeling across six benchmarks covering dropout prediction and knowledge tracing. Deployed to the Santa TOEIC platform.
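For readers unfamiliar with contrastive learning, the sketch below shows a generic InfoNCE-style loss in plain Python. It is not the ACCL or RCL objective, whose details are not given here. The function names, the temperature value, and the toy vectors are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss over a batch.

    positives[i] is the positive view for anchors[i]; every other row of
    `positives` serves as a negative for that anchor.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        # Cross-entropy with the matching row as the correct class.
        total += log_denominator - logits[i]
    return total / len(anchors)


# Toy batch (hypothetical): each anchor is close to its own positive.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce(anchors, [[0.9, 0.1], [0.1, 0.9]])
# Same vectors with the pairing swapped, so each anchor's "positive" is far away.
shuffled = info_nce(anchors, [[0.1, 0.9], [0.9, 0.1]])
```

The loss is small when each anchor is most similar to its own positive (`aligned`) and large when the pairing is wrong (`shuffled`). This is the signal a contrastive encoder is trained on.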

ICASSP 2019 Duration-Controllable TTS: attention alignment and PDC model architecture

Emotional Text-to-Speech and Voice Conversion Systems

Led development of duration-controllable TTS and emotional voice conversion at Humelo, producing two ICASSP publications (2019, oral presentation, first author; 2020). Won the Minister of Science and ICT Special Award at K-Startup 2018.

CBRNN architecture with transfer learning for polyphonic sound event detection (ICASSP 2019)

Polyphonic Sound Event Detection with Transfer Learning

Developed a convolutional bidirectional LSTM with synthetic data-based transfer learning for polyphonic sound event detection at Humelo, achieving a +28.4% F1 improvement. Published at ICASSP 2019 as corresponding author.
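As a rough illustration of how an F1 score can be computed when several sound events overlap in time, the sketch below thresholds per-frame, per-class probabilities and micro-averages over all (frame, class) cells. The summary above does not say which F1 variant was reported, so this is only one plausible reading. The 0.5 threshold, the function name, and the toy matrices are assumptions.

```python
def frame_f1(reference, predicted):
    """Micro-averaged F1 over all (frame, class) cells of two 0/1 matrices."""
    tp = fp = fn = 0
    for ref_frame, pred_frame in zip(reference, predicted):
        for ref, pred in zip(ref_frame, pred_frame):
            if ref and pred:
                tp += 1  # event active and detected
            elif pred:
                fp += 1  # event detected but not active
            elif ref:
                fn += 1  # event active but missed
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Hypothetical model outputs: rows are frames, columns are event classes.
probabilities = [
    [0.9, 0.6, 0.2],
    [0.1, 0.8, 0.7],
    [0.3, 0.2, 0.4],
]
# Polyphonic decoding: each class is thresholded independently, so several
# events can be active in the same frame.
predicted = [[1 if p >= 0.5 else 0 for p in frame] for frame in probabilities]

reference = [
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 1],
]
score = frame_f1(reference, predicted)  # 3 hits, 1 false alarm, 1 miss -> 0.75
```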

Speech Emotion Recognition & Classification System

Built a multi-class speech emotion recognition system at Humelo using SpeechCNN and CRNN architectures with MFCC/Mel-spectrogram features, integrated into the Emotional TTS pipeline.

Overall system architecture for EEG signal classification with Deep RL (IEEE SMC 2018)

Attentional Control for Time-Series Data (Master's Thesis)

Master’s thesis at KAIST on attentional control for time-series classification and synthesis, solving the memory-based vs. memoryless trade-off for EEG signals. Oral presentation at IEEE SMC 2018.