
Deployment Strategy

🐳 Docker & K8s 🐦 Canary Deployment ⚖️ Autoscaling

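This section containerizes the Agent service and deploys it on Kubernetes. Canary and blue-green strategies provide zero-downtime updates, and a HorizontalPodAutoscaler handles autoscaling.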

Production Docker Image

Dockerfile: multi-stage build

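# ─── Stage 1: build dependencies ─────────────────────
FROM python:3.12-slim AS builder
# Build the virtualenv at the path it will have at runtime:
# console scripts such as uvicorn embed the venv's absolute path.
WORKDIR /app

RUN pip install --no-cache-dir uv
COPY pyproject.toml uv.lock ./
# Install locked dependencies only; the project source is copied in stage 2.
RUN uv sync --frozen --no-dev --no-install-project

# ─── Stage 2: production image ───────────────────────
FROM python:3.12-slim AS production
WORKDIR /app

# Non-root user (security)
RUN groupadd -r agent && useradd -r -g agent agent

# Copy only the installed dependencies from the builder stage
COPY --from=builder /app/.venv /app/.venv
ENV PATH="/app/.venv/bin:$PATH"

COPY --chown=agent:agent . .
USER agent

# Health check (python:3.12-slim ships without curl, so use the standard library)
HEALTHCHECK --interval=30s --timeout=10s --start-period=15s --retries=3 \
  CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health', timeout=5)" || exit 1

EXPOSE 8000

CMD ["uvicorn", "main:app", \
     "--host", "0.0.0.0", \
     "--port", "8000", \
     "--workers", "4", \
     "--timeout-keep-alive", "120"]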

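The HEALTHCHECK above, and the Kubernetes probes later in this section, assume the service answers GET /health. The source does not show that endpoint, so the following is only a minimal sketch: a dependency-free ASGI callable named `app` in a hypothetical `main.py`, matching the `main:app` target passed to uvicorn. The response body and the `probe` helper are illustrative assumptions, not part of the original service.

```python
import asyncio
import json


async def app(scope, receive, send):
    """Minimal ASGI application that answers GET /health (illustrative only)."""
    if scope["type"] != "http":
        return  # ignore lifespan and websocket scopes in this sketch

    if scope["method"] == "GET" and scope["path"] == "/health":
        status, payload = 200, {"status": "ok"}
    else:
        status, payload = 404, {"detail": "not found"}

    body = json.dumps(payload).encode()
    await send({
        "type": "http.response.start",
        "status": status,
        "headers": [
            (b"content-type", b"application/json"),
            (b"content-length", str(len(body)).encode()),
        ],
    })
    await send({"type": "http.response.body", "body": body})


async def probe(path: str) -> tuple[int, bytes]:
    """Call the app in-process, roughly as a health probe would over HTTP."""
    messages = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        messages.append(message)

    scope = {"type": "http", "method": "GET", "path": path, "headers": []}
    await app(scope, receive, send)
    return messages[0]["status"], messages[1]["body"]


if __name__ == "__main__":
    print(asyncio.run(probe("/health")))
```

A real service would more likely register this route in its web framework. The point of the sketch is only that the endpoint should be cheap and should not depend on a call to the LLM provider.
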
Kubernetes Deployment Manifest

agent-deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: research-agent
  labels:
    app: research-agent
    version: v2
spec:
  replicas: 3
  selector:
    matchLabels:
      app: research-agent
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # at most 1 Pod above the desired replica count
      maxUnavailable: 0  # never drop below the desired replica count
  template:
    metadata:
      labels:
        app: research-agent
        version: v2
    spec:
      containers:
        - name: agent
          image: myrepo/research-agent:2.1.0
          ports:
            - containerPort: 8000
          env:
            - name: ANTHROPIC_API_KEY
              valueFrom:
                secretKeyRef:
                  name: api-secrets
                  key: anthropic-api-key
            - name: LANGCHAIN_API_KEY
              valueFrom:
                secretKeyRef:
                  name: api-secrets
                  key: langsmith-api-key
          resources:
            requests:
              cpu: "500m"
              memory: "512Mi"
            limits:
              cpu: "2"
              memory: "2Gi"
          readinessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 10
            periodSeconds: 5
          livenessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 30
            periodSeconds: 10
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: research-agent-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: research-agent
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
    - type: Pods
      pods:
        metric:
          name: agent_active_tasks  # custom Prometheus metric (needs a metrics adapter such as prometheus-adapter)
        target:
          type: AverageValue
          averageValue: "10"
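
To see how the two metrics above interact, the sketch below reproduces the scaling rule documented for the HorizontalPodAutoscaler: each metric proposes ceil(currentReplicas × currentValue / targetValue), the largest proposal wins, and the result is clamped to minReplicas and maxReplicas. It is a simplified model that ignores stabilization windows, unready Pods, and missing metrics. The function name and the sample numbers are made up for illustration.

```python
import math


def desired_replicas(current, metrics, min_replicas=3, max_replicas=20, tolerance=0.1):
    """Approximate the HPA decision for a list of (current_value, target_value) pairs.

    `tolerance` mirrors the controller's default of 0.1: ratios within
    10% of the target do not trigger a scaling change.
    """
    proposals = []
    for current_value, target_value in metrics:
        ratio = current_value / target_value
        if abs(ratio - 1.0) <= tolerance:
            proposals.append(current)  # close enough to target: keep as is
        else:
            proposals.append(math.ceil(current * ratio))
    # The largest proposal wins, then the result is clamped to the HPA bounds.
    return max(min_replicas, min(max_replicas, max(proposals)))


if __name__ == "__main__":
    # 3 Pods at 90% average CPU (target 60) and 25 active tasks per Pod (target 10).
    print(desired_replicas(3, [(90, 60), (25, 10)]))
```

In this example CPU alone would ask for 5 Pods, but the task metric asks for 8, so the busier signal decides. That is why pairing CPU with a queue-like metric can help for agents that spend most of their time waiting on LLM responses.
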

Canary Deployment with Argo Rollouts

canary-rollout.yaml: gradual traffic shifting

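apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: research-agent-rollout
spec:
  replicas: 10
  strategy:
    canary:
      # Without a trafficRouting block, weights are approximated by Pod counts,
      # so 5% of 10 replicas rounds up to 1 canary Pod.
      steps:
        - setWeight: 5      # 5% of traffic → new version
        - pause: {duration: 10m}   # observe for 10 minutes
        - analysis:         # automated metric check
            templates:
              - templateName: success-rate
        - setWeight: 30     # expand to 30%
        - pause: {duration: 30m}
        - setWeight: 60
        - pause: {duration: 30m}
        - setWeight: 100    # full cutover
  selector:
    matchLabels:
      app: research-agent
  template:
    metadata:
      labels:
        app: research-agent
    spec:
      containers:
        - name: agent
          image: myrepo/research-agent:2.2.0
---
# successCondition and failureLimit are metric fields of an AnalysisTemplate,
# not of the Rollout, so they live in the template the step refers to.
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  metrics:
    - name: success-rate
      interval: 1m
      count: 5            # finite count so the inline analysis step can complete
      successCondition: "result[0] >= 0.95"  # 95% success rate
      failureLimit: 2
      provider:
        prometheus:
          # Assumed Prometheus address and metric names; adjust to your environment.
          address: http://prometheus.monitoring.svc:9090
          query: |
            sum(rate(http_requests_total{app="research-agent",status!~"5.."}[5m]))
            /
            sum(rate(http_requests_total{app="research-agent"}[5m]))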
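
The arithmetic behind the rollout above can be sketched in a few lines. The first function assumes no traffic router is configured, in which case Argo Rollouts approximates each weight by rounding the canary Pod count up. The second mimics how the `result[0] >= 0.95` condition and `failureLimit: 2` combine: the analysis fails only once the number of failed measurements exceeds the limit. Both function names and the sample success rates are hypothetical.

```python
import math


def canary_pods(total_replicas: int, weight_percent: int) -> int:
    """Canary Pod count for a weight when weights are approximated by replicas."""
    return math.ceil(total_replicas * weight_percent / 100)


def analysis_passes(success_rates, threshold=0.95, failure_limit=2) -> bool:
    """Return False as soon as failed measurements exceed failure_limit."""
    failures = 0
    for rate in success_rates:
        if rate < threshold:
            failures += 1
            if failures > failure_limit:
                return False  # the rollout would abort and roll back
    return True


if __name__ == "__main__":
    for weight in (5, 30, 60, 100):
        print(weight, canary_pods(10, weight))
    print(analysis_passes([0.99, 0.94, 0.97, 0.93, 0.98]))
```

With 10 replicas the first step already places one Pod, roughly 10% of capacity, on the new version. If a true 5% split matters, a traffic router (an ingress controller or service mesh) is needed.
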

Deployment Strategy Comparison

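Strategy	Downtime	Risk	Rollback speed	Best suited for
Rolling Update	None	Low	Medium	Routine updates
Blue-Green	None	Very low	Immediate	Large changes, DB migrations
Canary	None	Lowest	Fast	AI model updates, new features
Recreate	Yes	High	Slow	Development environments only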

💡 Special Considerations for AI Agent Deployments

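State persistence: if the LangGraph checkpointer uses an external database (Postgres), conversation history survives Pod replacement.
Model version rollback: managing the model name through an environment variable (MODEL_NAME=claude-opus-4-7) lets you switch models without rebuilding the image. Changing the variable still rolls the Pods, but no code deployment is needed.
Warm-up time: the Agent's first request can be slow because of model loading, so set the readinessProbe initialDelaySeconds generously (a startupProbe is another option for slow starts).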
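
The considerations above can be illustrated with a small configuration sketch. It reads the model name from MODEL_NAME as described, and reads the checkpointer database URL from CHECKPOINT_DB_URL, a variable name assumed here for illustration. The `Readiness` class is likewise hypothetical: it shows one way /health could report 503 until warm-up finishes, so that the readinessProbe keeps traffic away from a cold Pod.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentSettings:
    model_name: str
    checkpoint_db_url: str


def load_settings(env=None) -> AgentSettings:
    """Build settings from environment variables (names partly assumed)."""
    env = os.environ if env is None else env
    return AgentSettings(
        # Switching models only requires changing this variable.
        model_name=env.get("MODEL_NAME", "claude-opus-4-7"),
        # Hypothetical variable holding the external Postgres URL.
        checkpoint_db_url=env["CHECKPOINT_DB_URL"],
    )


class Readiness:
    """Tracks whether warm-up has completed."""

    def __init__(self) -> None:
        self._warm = False

    def mark_warm(self) -> None:
        self._warm = True

    def health_status(self) -> int:
        # The kubelet treats HTTP 200 to 399 as success; 503 marks the Pod not ready.
        return 200 if self._warm else 503


if __name__ == "__main__":
    settings = load_settings({"CHECKPOINT_DB_URL": "postgresql://db/agent"})
    ready = Readiness()
    print(settings.model_name, ready.health_status())
    ready.mark_warm()
    print(ready.health_status())
```

If the same endpoint backs the livenessProbe, returning 503 during a long warm-up could cause restarts, so a separate liveness path may be preferable.
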
