Overview
At Humelo (휴멜로), I managed and developed a speech emotion classification system for the Emotional TTS pipeline. The system performed feature extraction and multi-class emotion detection from audio signals.
Key Achievements
- Built feature extraction and classifier for multi-class emotion detection from audio signals
- Integrated the emotion recognition module into the Emotional TTS pipeline
graph LR
A[Audio Input] --> B[MFCC + Mel\nExtraction]
B --> C[SpeechCNN / CRNN]
C --> D[Emotion\nClassifier]
D --> E[Multi-class\nOutput]
Technical Approach
Feature Extraction
Implemented feature extraction using MFCC and Mel-spectrogram analysis from audio signals.
Classifier Architecture
Developed and evaluated two architectures:
- SpeechCNN: A convolutional neural network for audio-based emotion classification
- CRNN: A convolutional recurrent neural network combining CNN with recurrent layers
Multi-Class Emotion Detection
The system classified audio segments into multiple emotion categories.
Tech Stack
- Deep Learning Framework: TensorFlow
- Architectures: SpeechCNN, CRNN
- Audio Features: MFCC, Mel-spectrogram analysis
- Domain: Speech emotion recognition, multi-class classification
Period
April 2018 - February 2019 | Humelo (휴멜로)
