Kubernetes Monitoring: Building a Prometheus + Grafana Stack | 소프트모아

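Overview

Monitoring is a requirement, not an option, when you operate a Kubernetes cluster. Prometheus and Grafana are the de facto standard for cloud-native monitoring. Prometheus is a graduated project of the CNCF (Cloud Native Computing Foundation), and Grafana is an open-source visualization platform that is commonly paired with it. Prometheus collects and stores metrics, and Grafana visualizes them in real-time dashboards.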

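Core Concepts

This section covers the building blocks of the Prometheus + Grafana stack.

Prometheus: A pull-based metrics collection system that stores samples in its built-in time-series database
PromQL: The Prometheus query language, used to select and aggregate metrics
Grafana: A dashboard platform that visualizes metrics from many kinds of data sources
Alertmanager: Receives alerts fired by Prometheus alerting rules and routes notifications to Slack, PagerDuty, and similar channels
ServiceMonitor: A Kubernetes CRD (provided by the Prometheus Operator) that declaratively defines Prometheus scrape targets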

Practical Examples

Install kube-prometheus-stack with Helm.

# Add the Helm chart repository
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# Create the namespace
kubectl create namespace monitoring

# Install kube-prometheus-stack (Prometheus + Grafana + Alertmanager)
# Replace mySecurePassword with your own value and keep real passwords out of shell history and Git
helm install kube-prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --set grafana.adminPassword=mySecurePassword \
  --set prometheus.prometheusSpec.retention=30d \
  --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=50Gi

# Verify the installation
kubectl get pods -n monitoring

# Access Grafana via port forwarding, then open http://localhost:3000
kubectl port-forward svc/kube-prometheus-grafana 3000:80 -n monitoring

# Access the Prometheus UI at http://localhost:9090
# (Service names are derived from the release name; confirm with: kubectl get svc -n monitoring)
kubectl port-forward svc/kube-prometheus-kube-prome-prometheus 9090:9090 -n monitoring
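
The --set flags in the install command can also be kept in a values file, which is easier to review and version. The following is a minimal sketch that mirrors the three settings above; the file name values.yaml is an assumption, and the password is a placeholder to replace.

```yaml
# values.yaml (hypothetical file name)
grafana:
  adminPassword: mySecurePassword   # placeholder; replace before use
prometheus:
  prometheusSpec:
    retention: 30d                  # keep metrics for 30 days
    storageSpec:
      volumeClaimTemplate:
        spec:
          resources:
            requests:
              storage: 50Gi         # size of the Prometheus data volume
```

To use it, pass -f values.yaml to helm install in place of the three --set flags.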

The following ServiceMonitor configures scraping of application metrics, and the PrometheusRule after it defines alert rules.

# service-monitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app-monitor
  namespace: monitoring
  labels:
    release: kube-prometheus  # must match the Helm release name
spec:
  namespaceSelector:
    matchNames:
      - default
  selector:
    matchLabels:
      app: my-app
  endpoints:
    - port: metrics
      interval: 15s
      path: /metrics

---
# Alert rule definitions
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: my-app-alerts
  namespace: monitoring
  labels:
    release: kube-prometheus
spec:
  groups:
    - name: my-app
      rules:
        - alert: HighErrorRate
          expr: |
            sum(rate(http_requests_total{status=~"5.."}[5m]))
            / sum(rate(http_requests_total[5m])) > 0.05
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "High error rate detected"
            description: "Error rate is above 5% for 5 minutes"

        - alert: PodCrashLooping
          expr: rate(kube_pod_container_status_restarts_total[15m]) > 0
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: "Pod is crash looping"
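
A ServiceMonitor selects Services, not Pods, so the ServiceMonitor above only produces scrape targets if a matching Service exists. The following is a sketch of such a Service; the name my-app, the Pod label, and port 8080 are assumptions for illustration and should be adapted to the actual application.

```yaml
# service.yaml (hypothetical example)
apiVersion: v1
kind: Service
metadata:
  name: my-app
  namespace: default      # must be listed in the ServiceMonitor's namespaceSelector
  labels:
    app: my-app           # matched by the ServiceMonitor's spec.selector.matchLabels
spec:
  selector:
    app: my-app           # assumption: the application Pods carry this label
  ports:
    - name: metrics       # the ServiceMonitor endpoint refers to this port by name
      port: 8080          # assumption: the app serves /metrics on port 8080
      targetPort: 8080
```

If the target does not appear on the Prometheus Targets page, the usual causes are a mismatched port name or a missing label on the Service.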

Commonly used PromQL query examples.

# CPU utilization (per node)
100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# Memory utilization (per node)
(1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100

# CPU usage per Pod (container!="" skips the pod-level cgroup series to avoid double counting)
sum(rate(container_cpu_usage_seconds_total{namespace="default", container!=""}[5m])) by (pod)

# HTTP request latency (95th percentile, aggregated across all series)
histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))

# Network receive traffic per namespace
sum(rate(container_network_receive_bytes_total[5m])) by (namespace)
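
A query that dashboards evaluate often, such as the per-Pod CPU query, can be precomputed with a recording rule so that Prometheus stores the result as a new series. The following is a sketch only; the rule name, group name, and interval are assumptions, and the release label assumes the Helm release name used earlier.

```yaml
# recording-rules.yaml (hypothetical example)
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: my-app-recording-rules
  namespace: monitoring
  labels:
    release: kube-prometheus   # must match the Helm release name
spec:
  groups:
    - name: my-app-recording
      interval: 1m             # how often the rule is evaluated
      rules:
        # Naming follows the level:metric:operations convention
        - record: namespace_pod:container_cpu_usage_seconds:rate5m
          expr: |
            sum by (namespace, pod) (
              rate(container_cpu_usage_seconds_total{container!=""}[5m])
            )
```

A dashboard can then query namespace_pod:container_cpu_usage_seconds:rate5m directly instead of re-aggregating the raw series on every refresh.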

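Tips

Retention policy: Set the metric retention period and storage size to match your needs. 30 days is a common choice; for long-term retention, use Thanos or Cortex.
Dashboard management: Export Grafana dashboards as JSON and keep them in Git. Importing community dashboards from grafana.com/dashboards and customizing them can save time.
Preventing alert fatigue: Too many alerts lead to alerts being ignored. Separate Critical, Warning, and Info levels, and route only Critical to immediate paging (PagerDuty) while sending Warning to a Slack channel.
Cardinality management: As the number of label combinations grows, Prometheus memory usage rises sharply. Remove unnecessary labels, and pre-aggregate high-cardinality metrics with recording rules.
SLO-based monitoring: Prioritize SLI/SLO-based monitoring over raw infrastructure metrics. Build dashboards around the indicators that directly affect user experience (response time, error rate, availability).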
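
The severity-based routing described in the alert fatigue tip can be sketched as Helm values for the Alertmanager configuration. This is an illustration under assumptions, not a drop-in configuration: the file name, receiver names, Slack channel, and both credentials are placeholders, and sending unmatched alerts (such as Info) to a receiver that notifies no one is one possible choice.

```yaml
# alertmanager-values.yaml (hypothetical file name)
alertmanager:
  config:
    route:
      receiver: "null"                    # default: unmatched alerts are not notified
      group_by: ["alertname", "namespace"]
      routes:
        - matchers:
            - severity="critical"
          receiver: pagerduty-critical    # page immediately
        - matchers:
            - severity="warning"
          receiver: slack-warnings        # post to a Slack channel
    receivers:
      - name: "null"                      # receiver with no integrations
      - name: pagerduty-critical
        pagerduty_configs:
          - routing_key: REPLACE_WITH_PAGERDUTY_INTEGRATION_KEY
      - name: slack-warnings
        slack_configs:
          - api_url: https://hooks.slack.com/services/REPLACE/WITH/WEBHOOK
            channel: "#alerts"
```

The severity values match the labels set in the PrometheusRule example (critical and warning), so those alerts would follow these routes.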

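Conclusion

The Prometheus + Grafana stack is the standard for Kubernetes monitoring, and the kube-prometheus-stack Helm chart makes it straightforward to set up. Build an integrated environment that covers metric collection, visualization, and alerting, and use PromQL queries to understand the state of your systems in real time. Monitoring is a key tool for preventing incidents, not only for responding to them.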
