Overview

At Humelo (휴멜로), I managed and developed a speech emotion classification system for the Emotional TTS pipeline. The system performed feature extraction and multi-class emotion detection from audio signals.

Key Achievements

  • Built feature extraction and classifier for multi-class emotion detection from audio signals
  • Integrated the emotion recognition module into the Emotional TTS pipeline

```mermaid
graph LR
    A[Audio Input] --> B[MFCC + Mel\nExtraction]
    B --> C[SpeechCNN / CRNN]
    C --> D[Emotion\nClassifier]
    D --> E[Multi-class\nOutput]
```

Technical Approach

Feature Extraction

Extracted MFCC and Mel-spectrogram features from raw audio signals, which served as input to the emotion classifiers.
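As a concrete illustration, the standard MFCC pipeline (framing, windowing, power spectrum, Mel filterbank, log, DCT) can be sketched in NumPy. The frame size, hop length, and coefficient counts below are illustrative defaults, not the project's production settings:

```python
import numpy as np

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters evenly spaced on the Mel scale."""
    hz_to_mel = lambda f: 2595 * np.log10(1 + f / 700)
    mel_to_hz = lambda m: 700 * (10 ** (m / 2595) - 1)
    mels = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            fb[i - 1, k] = (k - l) / max(c - l, 1)   # rising slope
        for k in range(c, r):
            fb[i - 1, k] = (r - k) / max(r - c, 1)   # falling slope
    return fb

def mfcc(signal, sr=16000, n_fft=512, hop=256, n_mels=40, n_mfcc=13):
    """Compute MFCCs from a mono waveform (parameters are illustrative)."""
    # Frame the signal and apply a Hann window
    frames = np.lib.stride_tricks.sliding_window_view(signal, n_fft)[::hop]
    spec = np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=1)) ** 2
    # Mel-spectrogram, then log compression
    log_mel = np.log(spec @ mel_filterbank(n_mels, n_fft, sr).T + 1e-10)
    # DCT-II decorrelates the log-Mel bands into cepstral coefficients
    n = np.arange(n_mels)
    dct = np.cos(np.pi / n_mels * (n[:, None] + 0.5) * np.arange(n_mfcc)[None, :])
    return log_mel @ dct

# One second of a 440 Hz tone at 16 kHz -> (frames, n_mfcc) feature matrix
sig = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
feats = mfcc(sig)
print(feats.shape)
```

The intermediate `log_mel` matrix is the Mel-spectrogram feature mentioned above; the final DCT step yields the MFCCs.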

Classifier Architecture

Developed and evaluated two architectures:

  • SpeechCNN: A convolutional neural network for audio-based emotion classification
  • CRNN: A convolutional recurrent neural network combining convolutional feature extraction with recurrent layers to model temporal dynamics
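The CRNN idea — convolutional layers over the time-frequency map feeding a recurrent layer that summarizes temporal dynamics — can be sketched as a minimal NumPy forward pass. All layer sizes and random weights here are illustrative placeholders, not the trained TensorFlow model:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(x, kernels):
    """Valid 2-D convolution + ReLU: x (T, F), kernels (K, kh, kw) -> (T', F', K)."""
    K, kh, kw = kernels.shape
    T, F = x.shape
    out = np.empty((T - kh + 1, F - kw + 1, K))
    for t in range(T - kh + 1):
        for f in range(F - kw + 1):
            patch = x[t:t + kh, f:f + kw]
            out[t, f] = np.tensordot(kernels, patch, axes=([1, 2], [0, 1]))
    return np.maximum(out, 0)

def rnn(seq, Wx, Wh):
    """Simple tanh RNN over the time axis; returns the final hidden state."""
    h = np.zeros(Wh.shape[0])
    for x_t in seq:
        h = np.tanh(x_t @ Wx + h @ Wh)
    return h

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical sizes: 61 frames x 40 Mel bands, 4 conv kernels,
# 16 hidden units, 4 emotion classes.
mel = rng.standard_normal((61, 40))
feat = conv2d(mel, rng.standard_normal((4, 3, 3)) * 0.1)  # (59, 38, 4)
seq = feat.reshape(feat.shape[0], -1)                     # per-frame feature vectors
h = rnn(seq,
        rng.standard_normal((seq.shape[1], 16)) * 0.05,
        rng.standard_normal((16, 16)) * 0.05)
probs = softmax(h @ rng.standard_normal((16, 4)))         # class probabilities
print(probs.shape, probs.sum())
```

A pure SpeechCNN variant would replace the recurrent step with pooling over the time axis; the recurrent layer is what lets the CRNN model how emotion cues evolve across frames.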

Multi-Class Emotion Detection

The system classified audio segments into multiple emotion categories.
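A minimal sketch of the segment-level decision step, assuming per-frame logits averaged through a softmax (the emotion label set and logit values are hypothetical, for illustration only):

```python
import numpy as np

EMOTIONS = ["neutral", "happy", "sad", "angry"]  # hypothetical label set

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Per-frame logits for one segment (3 frames x 4 classes, illustrative values)
frame_logits = np.array([[0.1, 1.8, -0.2, 0.0],
                         [0.3, 2.2, -0.5, 0.1],
                         [0.0, 1.5,  0.2, 0.4]])
segment_probs = softmax(frame_logits).mean(axis=0)  # average over frames
label = EMOTIONS[int(np.argmax(segment_probs))]
print(label)  # "happy"
```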

Tech Stack

  • Deep Learning Framework: TensorFlow
  • Architectures: SpeechCNN, CRNN
  • Audio Features: MFCC, Mel-spectrogram analysis
  • Domain: Speech emotion recognition, multi-class classification

Period

April 2018 - February 2019 | Humelo (휴멜로)