Kubernetes 1.31 New Features and Cluster Operations Tips | 소프트모아


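Overview

Kubernetes 1.31 was released in August 2024, adding several important features and promoting others to stable. This post covers the main new features in 1.31 and practical tips you can apply when operating a real cluster, with a focus on best practices proven in production environments.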

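Core Concepts

Kubernetes is a container orchestration platform that automates application deployment, scaling, and management through a declarative API. The 1.31 release focuses on performance improvements, stronger security, and easier operations.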

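Key New Features in 1.31

1. AppArmor Support (Stable)

AppArmor support has graduated to stable. Profiles are set through the appArmorProfile field in securityContext, which lets you harden Pods without relying on the older beta annotations.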

apiVersion: v1
kind: Pod
metadata:
  name: secure-app
spec:
  containers:
  - name: app
    image: nginx:1.25
    securityContext:
      appArmorProfile:
        type: RuntimeDefault  # use the container runtime's default profile
  # or a custom profile that is already loaded on the node
  - name: custom-app
    image: myapp:latest
    securityContext:
      appArmorProfile:
        type: Localhost
        localhostProfile: my-custom-profile

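2. PersistentVolume Last Phase Transition Time (Stable)

Each PV records the time of its last phase change in status.lastPhaseTransitionTime, which makes storage problems quicker to diagnose. This field reached stable in 1.31 after being beta in earlier releases.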

# Check when a PV last changed phase
kubectl get pv pv-001 -o yaml | grep lastPhaseTransitionTime
# lastPhaseTransitionTime: "2025-02-11T10:30:00Z"

# List PVs that have been in the Released phase for more than 24 hours (cleanup candidates)
kubectl get pv -o json | jq -r '.items[] | select(.status.phase=="Released") | select(.status.lastPhaseTransitionTime < (now - 86400 | strftime("%Y-%m-%dT%H:%M:%SZ"))) | .metadata.name'
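
The jq filter above works because UTC timestamps in the fixed-width YYYY-MM-DDTHH:MM:SSZ form sort lexicographically in the same order as they sort in time. The following is a minimal bash sketch of that comparison, assuming every timestamp uses exactly that form; the helper names is_older_than and filter_stale are hypothetical and not part of kubectl or jq.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if timestamp $1 is strictly earlier than cutoff $2.
# Both arguments are assumed to be UTC timestamps in the fixed-width
# YYYY-MM-DDTHH:MM:SSZ form, so plain string comparison matches time order.
is_older_than() {
  local ts="$1" cutoff="$2"
  [[ "$ts" < "$cutoff" ]]
}

# Hypothetical helper: reads "NAME TIMESTAMP" pairs on stdin and prints the
# names whose timestamp is older than the cutoff passed as $1.
filter_stale() {
  local cutoff="$1" name ts
  while read -r name ts; do
    if is_older_than "$ts" "$cutoff"; then
      printf '%s\n' "$name"
    fi
  done
}
```

One way to use it is to have jq print the name and lastPhaseTransitionTime of each Released PV on one line and pipe that into filter_stale with a cutoff of your choice. Review the resulting list before deleting anything.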

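3. CRI Stats Provider (Alpha)

Behind the PodAndContainerStatsFromCRI feature gate, the kubelet collects container and Pod stats directly from the container runtime over the CRI instead of through cAdvisor, which can reduce collection overhead. The gate is off by default, and the container runtime must also support CRI stats.

# Enable the feature gate through kubelet extra args
# (/etc/default/kubelet is the Debian/Ubuntu kubeadm path; RPM-based systems use /etc/sysconfig/kubelet)
echo 'KUBELET_EXTRA_ARGS="--feature-gates=PodAndContainerStatsFromCRI=true"' | sudo tee -a /etc/default/kubelet
sudo systemctl restart kubelet

# Check metrics (requires metrics-server)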
kubectl top nodes
kubectl top pods -A

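Practical Operations Tips

1. Set ResourceQuota and LimitRange Appropriately

Cap resources per namespace to keep the cluster stable. A ResourceQuota limits the namespace total, while a LimitRange sets per-container bounds and defaults.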

apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: production
spec:
  hard:
    requests.cpu: "100"
    requests.memory: 200Gi
    limits.cpu: "200"
    limits.memory: 400Gi
    persistentvolumeclaims: "10"
    pods: "50"
---
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: production
spec:
  limits:
  - max:
      cpu: "4"
      memory: 8Gi
    min:
      cpu: 100m
      memory: 128Mi
    default:
      cpu: "1"
      memory: 1Gi
    defaultRequest:
      cpu: 500m
      memory: 512Mi
    type: Container
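
The API server rejects a LimitRange whose values are out of order, so it can help to check the ordering min <= defaultRequest <= default <= max before applying. The following is a minimal bash sketch of that check for CPU values; to_millicores and limitrange_cpu_ok are hypothetical helper names, and the sketch handles only whole cores and the "m" suffix, not the full Kubernetes quantity syntax.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a CPU quantity such as "4" or "500m" to millicores.
# Only whole cores and the "m" suffix are handled in this sketch.
to_millicores() {
  local q="$1"
  if [[ "$q" == *m ]]; then
    printf '%d\n' "${q%m}"
  else
    printf '%d\n' "$(( q * 1000 ))"
  fi
}

# Hypothetical helper: succeeds when min <= defaultRequest <= default <= max.
# Arguments: min defaultRequest default max
limitrange_cpu_ok() {
  local min req def max
  min="$(to_millicores "$1")"
  req="$(to_millicores "$2")"
  def="$(to_millicores "$3")"
  max="$(to_millicores "$4")"
  (( min <= req && req <= def && def <= max ))
}
```

With the values from the manifest above, limitrange_cpu_ok 100m 500m 1 4 succeeds. After applying, kubectl describe quota compute-quota -n production shows current usage against the quota.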

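2. Configure a Pod Disruption Budget (PDB)

A PDB preserves availability during voluntary disruptions such as node maintenance or upgrades by limiting how many matching Pods can be evicted at once.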

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
spec:
  minAvailable: 2  # keep at least 2 Pods available during voluntary disruptions
  selector:
    matchLabels:
      app: api-server
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: worker-pdb
spec:
  maxUnavailable: 1  # at most 1 Pod may be disrupted at a time
  selector:
    matchLabels:
      app: worker
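
How many evictions a budget permits depends on how many matching Pods are currently healthy; kubectl get pdb reports this in the ALLOWED DISRUPTIONS column. The following bash sketch reproduces the arithmetic for integer (non-percentage) budgets only. The function names are hypothetical, and the real controller also accounts for percentages and Pod readiness.

```shell
#!/usr/bin/env bash
# Hypothetical helper for minAvailable budgets.
# Arguments: healthy_pods min_available
# Allowed disruptions = healthy - minAvailable, never below 0.
pdb_allowed_min_available() {
  local healthy="$1" min_available="$2"
  local allowed=$(( healthy - min_available ))
  if (( allowed < 0 )); then
    allowed=0
  fi
  printf '%d\n' "$allowed"
}

# Hypothetical helper for maxUnavailable budgets.
# Arguments: expected_pods healthy_pods max_unavailable
# The budget requires (expected - maxUnavailable) healthy Pods; anything above
# that number may be disrupted.
pdb_allowed_max_unavailable() {
  local expected="$1" healthy="$2" max_unavailable="$3"
  local desired_healthy=$(( expected - max_unavailable ))
  local allowed=$(( healthy - desired_healthy ))
  if (( allowed < 0 )); then
    allowed=0
  fi
  printf '%d\n' "$allowed"
}
```

Note the edge case this exposes: if the api-server Deployment runs only 2 replicas, minAvailable: 2 allows 0 disruptions, and a node drain will block until replicas are added or the budget is relaxed.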

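3. Advanced HPA (Horizontal Pod Autoscaler) Configuration

Configure autoscaling on custom metrics in addition to CPU and memory. A Pods-type metric such as http_requests_per_second is only available if a metrics adapter (for example, Prometheus Adapter) serves it through the custom metrics API.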

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 3
  maxReplicas: 50
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "1000"
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # use the highest recommendation from the last 5 minutes before scaling down
      policies:
      - type: Percent
        value: 50  # remove at most 50% of current replicas per period
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
      - type: Percent
        value: 100  # allow up to doubling per period for fast scale-up
        periodSeconds: 30
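
The HPA documentation gives the core calculation as desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), computed per metric, with the largest result used. The following bash sketch shows that arithmetic with integers. The function names hpa_desired_replicas and max_of are hypothetical, and the sketch deliberately ignores the controller's tolerance band, the behavior policies, and the minReplicas/maxReplicas clamp.

```shell
#!/usr/bin/env bash
# Hypothetical helper: ceil(current_replicas * current_metric / target_metric)
# using integer arithmetic. Arguments: current_replicas current_metric target_metric
hpa_desired_replicas() {
  local current="$1" metric="$2" target="$3"
  printf '%d\n' "$(( (current * metric + target - 1) / target ))"
}

# Hypothetical helper: print the largest of the given integers. With several
# metrics configured, the largest per-metric proposal is the one applied.
max_of() {
  local best="$1" v
  shift
  for v in "$@"; do
    if (( v > best )); then
      best="$v"
    fi
  done
  printf '%d\n' "$best"
}
```

For example, hpa_desired_replicas 4 90 70 prints 6: four replicas at 90% CPU against the 70% target above call for ceil(5.14) = 6 replicas.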

4. Node Selection and Affinity Strategy

Place workloads on suitable nodes to optimize performance and cost.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-workload
spec:
  replicas: 2
  selector:
    matchLabels:
      app: ml-training
  template:
    metadata:
      labels:
        app: ml-training
    spec:
      nodeSelector:
        gpu: "true"  # schedule only onto nodes labeled gpu=true
      affinity:
        # Spread replicas across different nodes
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: ml-training
              topologyKey: kubernetes.io/hostname
        # Prefer a specific zone
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 80
            preference:
              matchExpressions:
              - key: topology.kubernetes.io/zone
                operator: In
                values:
                - us-west-2a
      tolerations:
      - key: gpu
        operator: Equal
        value: "true"
        effect: NoSchedule
      containers:
      - name: trainer
        image: ml-training:v2
        resources:
          limits:
            nvidia.com/gpu: 1
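
The Deployment above only schedules if some node carries the gpu=true label, and its toleration only matters if GPU nodes are tainted with gpu=true:NoSchedule. The following bash sketch prints the two kubectl commands that would prepare a node that way, so they can be reviewed before being run. The function name gpu_node_setup_cmds and the node name used in the example are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (not execute) the commands that label and taint
# a node so that it matches the nodeSelector and toleration in the manifest.
# Argument: node name
gpu_node_setup_cmds() {
  local node="$1"
  printf 'kubectl label nodes %s gpu=true\n' "$node"
  printf 'kubectl taint nodes %s gpu=true:NoSchedule\n' "$node"
}
```

Calling gpu_node_setup_cmds gpu-node-1 prints one kubectl label command and one kubectl taint command for that node. Printing rather than executing keeps the step reviewable on clusters where the GPU nodes are already labeled by a device plugin or a cloud provider.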

5. Automating Cluster Health Checks

#!/bin/bash
# k8s-health-check.sh

echo "=== Kubernetes Cluster Health Check ==="
echo ""

# Node status
echo "[Node Status]"
kubectl get nodes -o wide
echo ""

# Pods that are neither Running nor Succeeded
echo "[Unhealthy Pods]"
kubectl get pods -A --field-selector=status.phase!=Running,status.phase!=Succeeded
echo ""

# PVCs stuck in Pending
echo "[Pending PVCs]"
kubectl get pvc -A | grep Pending
echo ""

# Top 5 Pods by resource usage (requires metrics-server; head -6 keeps the header row)
echo "[Top 5 CPU Pods]"
kubectl top pods -A --sort-by=cpu | head -6
echo ""
echo "[Top 5 Memory Pods]"
kubectl top pods -A --sort-by=memory | head -6
echo ""

# Certificate expiration (kubeadm clusters only)
if command -v kubeadm &> /dev/null; then
    echo "[Certificate Expiration]"
    sudo kubeadm certs check-expiration
fi
echo ""

# etcd health (assumes a kubeadm static Pod; replace etcd-master-node with your etcd Pod name)
echo "[etcd Health]"
kubectl -n kube-system exec etcd-master-node -- etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  endpoint health 2>/dev/null || echo "etcd check skipped"

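Additional Tips

Namespace separation: Split namespaces by environment (dev/staging/prod) and by team, and control access with RBAC.
Adopt GitOps: Managing deployments from Git with ArgoCD or Flux improves traceability and reproducibility.
Monitoring is essential: Install Prometheus and Grafana, and use kube-state-metrics to expose the object state that your dashboards visualize.
Centralize logs: Collect all Pod logs in one place with the EFK stack (Elasticsearch, Fluentd, Kibana) or Loki.
Backup strategy: Back up cluster resources and PVs on a schedule with Velero, which captures Kubernetes objects through the API server rather than snapshotting etcd itself, and run restore tests periodically.
Security scanning: Use Trivy to scan images for vulnerabilities and Falco to detect runtime threats.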

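Conclusion

Kubernetes 1.31 improves both stability and security. Features such as stable AppArmor support, PV phase transition timestamps, and CRI-based stats collection make cluster operations easier. In production, be sure to configure ResourceQuota, PDB, and HPA, and automate regular health checks and backups. Kubernetes is powerful but complex, so the safe path is to test thoroughly on a small cluster and then expand gradually.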
