Overview

At RIIID (뤼이드), built an MLflow-based model registry and dataset pipelines on Airflow, Athena, and BigQuery; together they served 4+ products.

Key Achievements

  • Built a model registry with MLflow
  • Built dataset pipelines with Airflow, Athena, and BigQuery
  • Infrastructure served 4+ products, including SANTA TOEIC, IVYGlobal SAT, CASA GRANDE, and INICIE

Architecture

graph TD
    subgraph Products
        P1[SANTA TOEIC] ~~~ P2[IVYGlobal SAT] ~~~ P3[CASA GRANDE] ~~~ P4[INICIE]
    end
    subgraph Serving
        S1[BentoML] ~~~ S2[Model Registry]
    end
    subgraph Training
        T1[MLflow] ~~~ T2[Multi-GPU DDP]
    end
    subgraph Pipeline
        L1[Airflow DAGs] ~~~ L2[Dataset Pipeline]
    end
    subgraph Storage
        D1[Athena] ~~~ D2[BigQuery] ~~~ D3[S3]
    end

    Products --> Serving --> Training --> Pipeline --> Storage

Technical Approach

  • Model Registry: Built on MLflow for model versioning and experiment tracking
  • Dataset Pipelines: Orchestrated with Apache Airflow, with AWS Athena and GCP BigQuery handling data processing
  • Containerization: Pipeline components containerized with Docker
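
The approach above, one pipeline definition reused across several products, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `ProductConfig` class, the table names, the `dt` column, the pairing of products with warehouses, and the `<product>-<task>` naming scheme are hypothetical and do not come from the actual RIIID codebase. It only shows how per-product configuration could drive both the dataset query and the name under which a model is registered.

```python
from dataclasses import dataclass

# Hypothetical names throughout: ProductConfig, the table names, the "dt"
# column, and the "<product>-<task>" naming scheme are illustrative only.


@dataclass(frozen=True)
class ProductConfig:
    name: str          # product identifier
    warehouse: str     # "athena" or "bigquery"
    source_table: str  # dotted table path in that warehouse


def quote_table(cfg: ProductConfig) -> str:
    """Quote the table path in the dialect of the product's warehouse."""
    if cfg.warehouse == "athena":
        # Athena quotes each identifier separately with double quotes.
        return ".".join(f'"{part}"' for part in cfg.source_table.split("."))
    if cfg.warehouse == "bigquery":
        # BigQuery standard SQL wraps the full path in backticks.
        return f"`{cfg.source_table}`"
    raise ValueError(f"unsupported warehouse: {cfg.warehouse}")


def dataset_query(cfg: ProductConfig, start: str, end: str) -> str:
    """Build the extraction query for one product over [start, end]."""
    # Both dialects accept DATE 'YYYY-MM-DD' literals. A real pipeline
    # should validate or parameterize the dates rather than format them in.
    return (
        f"SELECT * FROM {quote_table(cfg)} "
        f"WHERE dt BETWEEN DATE '{start}' AND DATE '{end}'"
    )


def build_plan(products, task: str, start: str, end: str):
    """Produce one work item per product from the shared definition."""
    return [
        {
            "product": cfg.name,
            "query": dataset_query(cfg, start, end),
            "model_name": f"{cfg.name}-{task}",
        }
        for cfg in products
    ]


# Example configuration; the warehouse assignments are arbitrary.
PRODUCTS = [
    ProductConfig("santa-toeic", "athena", "logs.santa_toeic_events"),
    ProductConfig("ivyglobal-sat", "bigquery", "analytics.sat.events"),
]

if __name__ == "__main__":
    for item in build_plan(PRODUCTS, "demo-task", "2021-01-01", "2021-01-31"):
        print(item["model_name"], "->", item["query"])
```

In a setup like this, each plan entry could map to one Airflow task per product, and `model_name` could serve as the registered model name in MLflow, so adding a product means adding a config entry rather than a new pipeline.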

Tech Stack

  • Model Management: MLflow
  • Orchestration: Apache Airflow
  • Data Processing: AWS Athena, GCP BigQuery
  • Containerization: Docker
  • Products Served: SANTA TOEIC, IVYGlobal SAT, CASA GRANDE, INICIE

Period

January 2021 - September 2021 | RIIID (뤼이드)

Talks

  • PyCon KR 2021: “하나의 코드 베이스, 파이프라인으로 여러 도메인에 AI 모델들을 배포할 수 있을까” (Can We Deploy AI Models Across Multiple Domains with a Single Codebase and Pipeline?)