Skip to content

Kubernetes 高级特性面试题

Kubernetes 高级特性涵盖自动扩缩容、有状态应用管理、安全控制等企业级应用的核心能力。

🔥 自动扩缩容面试题

1. 水平 Pod 自动扩缩容 (HPA)

问题:如何配置基于多指标的 HPA?与 VPA 的区别和适用场景是什么?

参考答案

yaml
# 基于多指标的 HPA 配置
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: multi-metric-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 3
  maxReplicas: 20
  
  # 扩容策略
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
      - type: Pods
        value: 4
        periodSeconds: 60
      selectPolicy: Max
    
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60
      selectPolicy: Min
  
  metrics:
  # CPU 指标
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  
  # 内存指标
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  
  # 自定义指标 (Prometheus)
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "1k"
  
  # 外部指标 (SQS 队列长度)
  - type: External
    external:
      metric:
        name: sqs_messages_visible
        selector:
          matchLabels:
            queue: "task-queue"
      target:
        type: AverageValue
        averageValue: "30"

---
# 垂直 Pod 自动扩缩容 (VPA)
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  updatePolicy:
    updateMode: "Auto"  # Off, Initial, Auto
  resourcePolicy:
    containerPolicies:
    - containerName: web
      maxAllowed:
        cpu: 2
        memory: 4Gi
      minAllowed:
        cpu: 100m
        memory: 128Mi
      controlledResources: ["cpu", "memory"]

HPA vs VPA 对比

  • HPA:水平扩展,增加 Pod 数量,适合无状态应用
  • VPA:垂直扩展,调整资源配置,适合有状态应用
  • 组合使用:通常 HPA 处理短期负载,VPA 处理长期资源需求

2. 集群自动扩缩容 (CA)

问题:如何配置集群自动扩缩容?节点池管理策略是什么?

参考答案

yaml
# 集群自动扩缩容配置
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      serviceAccountName: cluster-autoscaler
      containers:
      - name: cluster-autoscaler
        image: k8s.gcr.io/autoscaling/cluster-autoscaler:v1.21.0
        command:
        - ./cluster-autoscaler
        - --v=4
        - --stderrthreshold=info
        - --cloud-provider=aws
        - --skip-nodes-with-local-storage=false
        - --expander=least-waste
        - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster
        - --balance-similar-node-groups
        - --scale-down-enabled=true
        - --scale-down-delay-after-add=10m
        - --scale-down-unneeded-time=10m
        - --scale-down-utilization-threshold=0.5

---
# 节点池配置 (AWS EKS)
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: production-cluster
  region: us-west-2

nodeGroups:
- name: general-purpose
  instanceType: m5.large
  minSize: 3
  maxSize: 10
  desiredCapacity: 5
  tags:
    k8s.io/cluster-autoscaler/enabled: "true"
    k8s.io/cluster-autoscaler/production-cluster: "owned"

- name: compute-optimized
  instanceType: c5.xlarge
  minSize: 0
  maxSize: 20
  desiredCapacity: 0
  taints:
  - key: workload-type
    value: compute-intensive
    effect: NoSchedule

💡 有状态应用管理

3. StatefulSet 和有状态应用

问题:StatefulSet 与 Deployment 的区别?如何部署 MySQL 主从集群?

参考答案

yaml
# MySQL 主从 StatefulSet
apiVersion: v1
kind: Service
metadata:
  name: mysql
  labels:
    app: mysql
spec:
  ports:
  - port: 3306
    name: mysql
  clusterIP: None  # Headless Service
  selector:
    app: mysql

---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: mysql
  replicas: 3
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      initContainers:
      - name: init-mysql
        image: mysql:8.0
        command:
        - bash
        - "-c"
        - |
          set -ex
          # 根据序号生成服务器ID
          [[ $(hostname) =~ -([0-9]+)$ ]] || exit 1
          ordinal=${BASH_REMATCH[1]}
          echo [mysqld] > /mnt/conf.d/server-id.cnf
          echo server-id=$((100 + $ordinal)) >> /mnt/conf.d/server-id.cnf
          
          # 配置主从复制
          if [[ $ordinal -eq 0 ]]; then
            cp /mnt/config-map/master.cnf /mnt/conf.d/
          else
            cp /mnt/config-map/slave.cnf /mnt/conf.d/
          fi
        volumeMounts:
        - name: conf
          mountPath: /mnt/conf.d
        - name: config-map
          mountPath: /mnt/config-map

      containers:
      - name: mysql
        image: mysql:8.0
        env:
        - name: MYSQL_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-secret
              key: password
        ports:
        - name: mysql
          containerPort: 3306
        volumeMounts:
        - name: data
          mountPath: /var/lib/mysql
          subPath: mysql
        - name: conf
          mountPath: /etc/mysql/conf.d
        resources:
          requests:
            cpu: 500m
            memory: 1Gi
          limits:
            cpu: 1
            memory: 2Gi
        livenessProbe:
          exec:
            command: ["mysqladmin", "ping"]
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 5
        readinessProbe:
          exec:
            command: ["mysql", "-h", "127.0.0.1", "-e", "SELECT 1"]
          initialDelaySeconds: 5
          periodSeconds: 2
          timeoutSeconds: 1

      volumes:
      - name: conf
        emptyDir: {}
      - name: config-map
        configMap:
          name: mysql

  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: fast-ssd
      resources:
        requests:
          storage: 20Gi

---
# MySQL 配置
apiVersion: v1
kind: ConfigMap
metadata:
  name: mysql
data:
  master.cnf: |
    [mysqld]
    log-bin=mysql-bin
    binlog-format=ROW
  slave.cnf: |
    [mysqld]
    super-read-only
    log-slave-updates
    read-only

4. DaemonSet 和节点级服务

问题:DaemonSet 的使用场景?如何部署 Fluentd 日志收集器?

参考答案

yaml
# Fluentd DaemonSet 配置
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: kube-system
  labels:
    k8s-app: fluentd-logging
spec:
  selector:
    matchLabels:
      name: fluentd
  template:
    metadata:
      labels:
        name: fluentd
    spec:
      tolerations:
      # 允许在 Master 节点运行
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      
      serviceAccountName: fluentd
      containers:
      - name: fluentd
        image: fluent/fluentd-kubernetes-daemonset:v1.14-debian-elasticsearch7
        env:
        - name: FLUENT_ELASTICSEARCH_HOST
          value: "elasticsearch.logging.svc.cluster.local"
        - name: FLUENT_ELASTICSEARCH_PORT
          value: "9200"
        - name: FLUENT_ELASTICSEARCH_SCHEME
          value: "http"
        - name: FLUENT_UID
          value: "0"
        
        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 200Mi
        
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
        - name: fluentd-config
          mountPath: /fluentd/etc
      
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
      - name: fluentd-config
        configMap:
          name: fluentd-config

      # 节点选择器
      nodeSelector:
        kubernetes.io/os: linux
      
      # 优先级
      priorityClassName: system-node-critical

🔐 安全和权限管理

5. RBAC 权限控制系统

问题:如何设计完整的 RBAC 权限体系?ServiceAccount 的作用是什么?

参考答案

yaml
# 命名空间
apiVersion: v1
kind: Namespace
metadata:
  name: development

---
# ServiceAccount
apiVersion: v1
kind: ServiceAccount
metadata:
  name: dev-user
  namespace: development

---
# Role 定义
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: development
  name: pod-manager
rules:
# Pod 管理权限
- apiGroups: [""]
  resources: ["pods", "pods/log", "pods/status"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
# Service 管理权限
- apiGroups: [""]
  resources: ["services"]
  verbs: ["get", "list", "watch", "create", "update", "patch"]
# ConfigMap 管理权限
- apiGroups: [""]
  resources: ["configmaps", "secrets"]
  verbs: ["get", "list", "watch", "create", "update"]
# Deployment 管理权限
- apiGroups: ["apps"]
  resources: ["deployments", "replicasets"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]

---
# ClusterRole (跨命名空间权限)
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: node-reader
rules:
- apiGroups: [""]
  resources: ["nodes", "nodes/status"]
  verbs: ["get", "list", "watch"]
- apiGroups: ["metrics.k8s.io"]
  resources: ["nodes", "pods"]
  verbs: ["get", "list"]

---
# RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: dev-pod-manager
  namespace: development
subjects:
- kind: ServiceAccount
  name: dev-user
  namespace: development
- kind: User
  name: jane.doe@company.com
  apiGroup: rbac.authorization.k8s.io
- kind: Group
  name: dev-team
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-manager
  apiGroup: rbac.authorization.k8s.io

---
# ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: node-readers
subjects:
- kind: ServiceAccount
  name: dev-user
  namespace: development
roleRef:
  kind: ClusterRole
  name: node-reader
  apiGroup: rbac.authorization.k8s.io

---
# Pod 使用 ServiceAccount
apiVersion: v1
kind: Pod
metadata:
  name: app-pod
  namespace: development
spec:
  serviceAccountName: dev-user
  containers:
  - name: app
    image: myapp:latest
    # 应用可以使用挂载的 token 访问 API
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-token
      readOnly: true

6. Pod Security Standards

问题:如何实施 Pod 安全策略?新的 Pod Security Standards 与旧的 PSP 有什么区别?

参考答案

yaml
# 命名空间级别的 Pod Security Standards
apiVersion: v1
kind: Namespace
metadata:
  name: secure-namespace
  labels:
    # 强制执行 restricted 安全标准
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: v1.24
    
    # 审计模式
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/audit-version: v1.24
    
    # 警告模式
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/warn-version: v1.24

---
# 符合 Restricted 标准的 Pod
apiVersion: v1
kind: Pod
metadata:
  name: secure-pod
  namespace: secure-namespace
spec:
  # 必须使用非 root 用户
  securityContext:
    runAsNonRoot: true
    runAsUser: 1001
    runAsGroup: 1001
    fsGroup: 1001
    seccompProfile:
      type: RuntimeDefault
  
  containers:
  - name: app
    image: nginx:1.21-alpine
    
    # 容器安全上下文
    securityContext:
      runAsNonRoot: true
      runAsUser: 1001
      runAsGroup: 1001
      readOnlyRootFilesystem: true
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      seccompProfile:
        type: RuntimeDefault
    
    # 资源限制是必需的
    resources:
      requests:
        cpu: 100m
        memory: 128Mi
      limits:
        cpu: 500m
        memory: 512Mi
    
    # 只读根文件系统需要临时存储
    volumeMounts:
    - name: tmp-volume
      mountPath: /tmp
    - name: cache-volume
      mountPath: /var/cache/nginx
    - name: run-volume
      mountPath: /var/run
  
  volumes:
  - name: tmp-volume
    emptyDir: {}
  - name: cache-volume
    emptyDir: {}
  - name: run-volume
    emptyDir: {}

---
# NetworkPolicy 网络安全
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all-ingress
  namespace: secure-namespace
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
  
  # 允许特定入站流量
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          name: frontend
    - podSelector:
        matchLabels:
          app: api-gateway
    ports:
    - protocol: TCP
      port: 8080
  
  # 允许特定出站流量
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          name: database
    ports:
    - protocol: TCP
      port: 3306
  # 允许 DNS 查询
  - to: []
    ports:
    - protocol: UDP
      port: 53

🚀 高级调度和节点管理

7. 高级调度策略

问题:Pod 亲和性、反亲和性和污点容忍的使用场景?如何实现多可用区部署?

参考答案

yaml
# 多可用区高可用部署
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 6
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      # Pod 反亲和性 - 确保不在同一节点
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - web
            topologyKey: "kubernetes.io/hostname"
          
          # 首选不在同一可用区
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAntiAffinity:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - web
              topologyKey: "topology.kubernetes.io/zone"
        
        # 节点亲和性 - 确保在适当的节点上
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: node-type
                operator: In
                values:
                - compute
              - key: topology.kubernetes.io/zone
                operator: In
                values:
                - us-west-2a
                - us-west-2b
                - us-west-2c
          
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 50
            preference:
              matchExpressions:
              - key: instance-type
                operator: In
                values:
                - m5.large
                - m5.xlarge

      # 容忍污点
      tolerations:
      - key: "dedicated"
        operator: "Equal"
        value: "web-tier"
        effect: "NoSchedule"
      - key: "spot-instance"
        operator: "Exists"
        effect: "NoSchedule"
      
      # 拓扑分布约束
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: web
      - maxSkew: 2
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels:
            app: web
      
      containers:
      - name: web
        image: nginx:1.21
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 500m
            memory: 512Mi

---
# 节点污点管理
apiVersion: v1
kind: Node
metadata:
  name: compute-node-1
  labels:
    node-type: compute
    instance-type: m5.large
    topology.kubernetes.io/zone: us-west-2a
spec:
  taints:
  - key: dedicated
    value: web-tier
    effect: NoSchedule
  - key: gpu-workload
    effect: NoExecute

这些高级特性面试题涵盖了 Kubernetes 在企业环境中的关键功能,展示了对复杂容器编排场景的深入理解。

正在精进