How to Configure Kubernetes Resource Limits

马哥Linux运维 2026-05-12 470


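Background and Use Cases

In Kubernetes, resource requests and limits drive Pod scheduling and runtime enforcement, and they are key to keeping a cluster stable. The most common pitfall for newcomers is getting them wrong: values that are too large waste resources and skew scheduling, while values that are too small cause OOMKills or CPU throttling and hurt service availability.

This article focuses on the two core settings for CPU and memory, Request and Limit. It explains what they do in Kubernetes scheduling, QoS, and the OOMKill mechanism, and how to choose sensible values for real workloads.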

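Use cases:

You are deploying a new application to a Kubernetes cluster and are unsure how to set resource limits

An application is frequently OOMKilled or CPU-throttled and you need to find the root cause

You need to improve cluster resource utilization and cut waste

You want to understand how the Kubernetes QoS class affects a Pod's eviction and OOMKill priority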

Core Concepts: Request vs Limit

In Kubernetes, every container can set a Request and a Limit for CPU and for memory:

apiVersion: v1
kind: Pod
metadata:
  name: app-demo  # example name; metadata.name is required for a valid Pod
spec:
  containers:
  - name: app
    image: nginx:1.24
    resources:
      requests:
        memory: "256Mi"
        cpu: "250m"
      limits:
        memory: "512Mi"
        cpu: "500m"

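Request: the amount of resources requested

A Request is the amount of resources reserved for a container. When kube-scheduler decides which Node a Pod goes to, it uses the Request values, not actual usage. It checks whether the sum of Requests already assigned to the Node plus the new Pod's Request stays within the Node's allocatable resources (capacity minus system and kubelet reservations).

Example:

Node allocatable memory: 4GiB
Sum of memory requests of Pods already on the Node: 3GiB
New Pod's memory request: 512Mi

Scheduling check: 3GiB + 512Mi = 3.5GiB <= 4GiB  -> can be scheduled

Important: a Request is a scheduling reservation, not a cap on usage. If a container uses more than its Request, it keeps running; actual usage on the Node simply exceeds what was requested. (The Request also feeds into the OOM score and eviction ranking discussed later.)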
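The scheduling check above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the scheduler's actual code; the helper names `parse_memory` and `fits` are made up for this example, and only the binary suffixes used in this article are handled.

```python
# Hypothetical helpers illustrating the request-based fit check.
UNITS = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}

def parse_memory(quantity: str) -> int:
    """Convert a quantity such as '512Mi' or '4Gi' to bytes."""
    for suffix, factor in UNITS.items():
        if quantity.endswith(suffix):
            return int(float(quantity[:-len(suffix)]) * factor)
    return int(quantity)  # no suffix: plain bytes

def fits(allocatable: str, existing_requests: list, new_request: str) -> bool:
    """True if existing requests plus the new request stay within allocatable."""
    total = sum(parse_memory(r) for r in existing_requests) + parse_memory(new_request)
    return total <= parse_memory(allocatable)

print(fits("4Gi", ["3Gi"], "512Mi"))            # 3.5Gi <= 4Gi
print(fits("4Gi", ["3Gi", "768Mi"], "512Mi"))   # 4.25Gi > 4Gi
```

Note that actual usage never appears in the calculation: only the declared Requests count.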

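Limit: the upper bound on resources

A Limit is the maximum amount of a resource a container may use. When a container tries to go beyond its Limit:

Memory Limit: the container is OOMKilled (Out of Memory Kill) and the offending process is forcibly terminated

CPU Limit: the container cannot get CPU time beyond the Limit, and its CPU usage is throttled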

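How Request and Limit relate

Request <= Limit (they may be equal)

CPU:    may differ (commonly Request = Limit, or Limit is a multiple of Request)
Memory: Request = Limit is strongly recommended (Request < Limit makes the Pod Burstable and easier to OOMKill under node memory pressure, see below)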

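CPU Request and Limit

Kubernetes CPU units

CPU is measured in millicores (thousandths of a core), written with the suffix m.

1 CPU = 1000m
500m = 0.5 CPU
250m = 0.25 CPU
100m = 0.1 CPU

Decimal values such as 0.5 or 1 are also accepted and are equivalent to the m form (0.5 is the same as 500m).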
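As a small illustration of the two notations, the following sketch normalizes a CPU quantity to millicores. The function name `parse_cpu` is hypothetical, and it handles only the plain and `m` forms shown above.

```python
def parse_cpu(quantity: str) -> int:
    """Return millicores for values such as '250m', '0.5', or '2'."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)

for q in ("250m", "0.5", "1", "0.1"):
    print(q, "->", parse_cpu(q), "m")
```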

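CPU is a compressible resource

CPU is a "compressible resource". When CPU on a Node is contended, the kernel gives each container less CPU time instead of killing anything. The container slows down but keeps running.

CPU limits, however, bring CPU throttling: once a container has used up its CPU Limit within a scheduling period, the Linux CFS scheduler stops running it until the next period, even if the Node has idle CPU. This lowers the container's performance.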

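CPU throttling in detail

CPU throttling is an easily overlooked performance problem. Suppose a container has a CPU Limit of 500m (0.5 CPU):

The Linux CFS (Completely Fair Scheduler) enforces CPU quotas over a period that defaults to 100ms

A Limit of 500m means the container gets at most 50ms of CPU time in each 100ms period

Work that needs more than 50ms of CPU is paused until the next period

In other words, even though the container nominally has 500m of CPU, throttling can add response latency. The quota is shared by all threads in the container, so four busy threads use up 50ms of quota in about 12.5ms of wall-clock time. For latency-sensitive applications (such as Java services), CPU throttling can degrade performance badly.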
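To make the latency effect concrete, here is a rough model of the quota arithmetic. It is a simplification under stated assumptions: a single busy thread, work starting at the beginning of a period with a full quota, and no other contention. The function names are invented for this example.

```python
def cfs_quota_us(cpu_limit_millicores: int, period_us: int = 100_000) -> int:
    """CPU time (microseconds) a container may use per CFS period."""
    return cpu_limit_millicores * period_us // 1000

def wall_time_ms(cpu_needed_ms: int, quota_ms: int, period_ms: int = 100) -> int:
    """Wall-clock time for one thread to obtain cpu_needed_ms of CPU time."""
    if cpu_needed_ms <= 0:
        return 0
    # Periods whose quota is fully consumed before the final, partial one.
    full_periods = (cpu_needed_ms + quota_ms - 1) // quota_ms - 1
    remaining = cpu_needed_ms - full_periods * quota_ms
    return full_periods * period_ms + remaining

print(cfs_quota_us(500))       # quota per 100ms period for a 500m limit
print(wall_time_ms(120, 50))   # a request needing 120ms of CPU under that quota
```

Under these assumptions a request that needs 120ms of CPU takes 220ms of wall-clock time: it runs 50ms, waits 50ms, runs 50ms, waits 50ms, then runs the final 20ms.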

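Ways to reduce CPU throttling:

Raise the CPU Limit where the Node has spare capacity

Use Burstable QoS (Request < Limit) so the container has more CPU available during bursts

Adjust the kubelet CFS settings (--cpu-cfs-quota to enable or disable quota enforcement, --cpu-cfs-quota-period to change the period); this requires changing the kubelet configuration

# Show the CPU-related flags of the running kubelet (prints nothing if they are set only in the kubelet config file)
ps aux | grep kubelet | grep cpu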

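Memory Request and Limit

Is memory a compressible resource?

No. Memory is an incompressible resource. Once a container holds memory, Kubernetes cannot reclaim it by slowing the container down the way it can with CPU. If a container tries to use more memory than its Limit, an OOMKill is triggered and a process in the container is forcibly killed.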

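How OOMKill works

When a Linux system runs out of memory, the OOM Killer picks a process to kill based on each process's oom_score, which is derived from its memory usage and its oom_score_adj. In Kubernetes, the kubelet sets oom_score_adj for every container according to the Pod's QoS class:

QoS class	oom_score_adj
Guaranteed	-997
Burstable	min(max(2, 1000 - (1000 * memoryRequest / nodeMemory)), 999)
BestEffort	1000

The higher the value, the more likely the process is to be OOMKilled. Processes in Guaranteed Pods get the lowest oom_score_adj (-997) and are the hardest to kill; BestEffort Pods get the highest (1000) and are the most exposed.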
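The table can be turned into a small calculator. This sketch implements the formula exactly as written in the table above, not the kubelet source; the function name and argument names are chosen for this example.

```python
GI = 1024**3
MI = 1024**2

def oom_score_adj(qos: str, memory_request: int = 0, node_memory: int = 1) -> int:
    """oom_score_adj per QoS class, following the table's formula (bytes in, int out)."""
    if qos == "Guaranteed":
        return -997
    if qos == "BestEffort":
        return 1000
    # Burstable: a larger request relative to node memory gives a lower value.
    adj = 1000 - (1000 * memory_request) // node_memory
    return min(max(2, adj), 999)

print(oom_score_adj("Guaranteed"))
print(oom_score_adj("Burstable", 2 * GI, 4 * GI))
print(oom_score_adj("Burstable", 512 * MI, 4 * GI))
print(oom_score_adj("BestEffort"))
```

On a 4Gi Node, a Burstable container requesting 2Gi gets 500 and one requesting 512Mi gets 875: among Burstable Pods, a smaller Request means a higher oom_score_adj.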

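Why Request = Limit matters for memory

Setting the memory Request and Limit to the same value is strongly recommended, for the following reasons.

When Request < Limit, the Pod is in the Burstable QoS class. If the Node then runs short of memory, the OOM Killer weighs two things:

Actual memory usage

The oom_score_adj computed from the Request (not the Limit): the smaller the Request relative to Node memory, the higher the value

This leads to a counterintuitive result: a Burstable Pod that is using little memory, even less than its Request, can still be chosen ahead of a Guaranteed Pod, because its oom_score_adj is positive (2 to 999) while the Guaranteed Pod's is -997.

For example: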

# Pod A
resources:
  requests:
    memory: "2Gi"
  limits:
    memory: "4Gi"  # Request < Limit: Burstable

# Pod B
resources:
  requests:
    memory: "512Mi"
  limits:
    memory: "512Mi"  # Request = Limit: Guaranteed, provided CPU also has Request = Limit

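Suppose both Pods actually use only 512Mi. When the Node is short of memory, Pod A (Burstable) has the higher oom_score and is more likely to be OOMKilled. On a Node with 4Gi of memory, for instance, its oom_score_adj is 1000 - 1000 * 2Gi / 4Gi = 500, against -997 for Pod B, even though Pod A is using far less than its 2Gi Request.

So for memory it is best to set Request = Limit. Together with Request = Limit for CPU in every container, this puts the Pod in the Guaranteed QoS class, where oom_score_adj is fixed at -997 and the Pod is the least likely to be OOMKilled.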

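QoS Classes in Detail

Kubernetes automatically assigns each Pod a QoS (Quality of Service) class. The class affects OOM kills and node-pressure eviction; it does not influence the scheduler, whose ordering is governed by Pod priority (PriorityClass).

QoS class	Condition	OOM kill priority	Eviction under node pressure
Guaranteed	Every container sets CPU and memory with Request = Limit	Lowest (hardest to kill)	Last
Burstable	Not Guaranteed, but at least one container sets a CPU or memory Request or Limit	Medium	Middle (depends on usage relative to Request)
BestEffort	No container sets any Request or Limit	Highest (most exposed)	First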
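The classification rules in the table can be sketched as a function. This is a simplified model, not the Kubernetes implementation: containers are plain dicts, only `cpu` and `memory` are considered, and quantities are compared as literal strings (so "1" and "1000m" would count as different here). It follows the API behavior that a missing Request defaults to the Limit.

```python
def qos_class(containers: list) -> str:
    """Classify a Pod from its containers' 'requests' and 'limits' dicts."""
    any_set = False
    guaranteed = True
    for c in containers:
        requests = c.get("requests", {})
        limits = c.get("limits", {})
        if requests or limits:
            any_set = True
        for resource in ("cpu", "memory"):
            if resource not in limits:
                guaranteed = False          # a limit is required for Guaranteed
            elif requests.get(resource, limits[resource]) != limits[resource]:
                guaranteed = False          # request must equal limit
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"

equal = {"requests": {"cpu": "500m", "memory": "768Mi"},
         "limits": {"cpu": "500m", "memory": "768Mi"}}
mixed = {"requests": {"cpu": "500m", "memory": "768Mi"},
         "limits": {"cpu": "1000m", "memory": "768Mi"}}

print(qos_class([equal]))   # Guaranteed
print(qos_class([mixed]))   # Burstable: CPU request differs from CPU limit
print(qos_class([{}]))      # BestEffort
```

The second case is worth noting: matching memory values alone do not make a Pod Guaranteed.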

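What Guaranteed means in practice

A Guaranteed Pod gets no special treatment from the scheduler, but it is the hardest to kill when the Node runs out of memory. That does not mean a Guaranteed Pod can never be evicted. Under node pressure the kubelet still evicts Pods once its eviction thresholds are crossed, ranking them first by whether usage exceeds Requests and then by Pod priority. Guaranteed Pods, whose usage cannot exceed their Requests, are among the last to go.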

How to check a Pod's QoS class

kubectl get pod <pod-name> -o jsonpath='{.status.qosClass}'
kubectl describe pod <pod-name> | grep -E "QoS|Limits|Requests|cpu:|memory:"

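OOMKill example across QoS classes

A concrete scenario:

The Node has 4Gi of memory, and the memory requests of the Pods on it add up to 3.8Gi

Pod A (Guaranteed):
  requests: memory=1Gi, limits: memory=1Gi
  actual usage: 800Mi

Pod B (Burstable):
  requests: memory=512Mi, limits: memory=2Gi
  actual usage: 1.5Gi

Pod C (BestEffort):
  no resources set
  actual usage: 200Mi

When the Node really runs out of memory (total actual usage approaches 4Gi),
Pod A is the safest. Pod C has the highest oom_score_adj (1000), but the score also counts actual usage,
so Pod B, which uses three times its Request, is an equally likely or even likelier victim.
Note also that if Pod B ever exceeds its own 2Gi Limit, it is OOMKilled regardless of the Node's state.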

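Scheduling in Detail

How the scheduler uses Requests

When scheduling a Pod, the scheduler picks a Node in these steps:

Filtering: go through all Nodes and keep those that can satisfy the Resource Requests of all the Pod's containers (Limits are ignored)

Scoring: score the Nodes that passed filtering and pick the highest score

Binding: bind the Pod to the chosen Node

In other words, Limits do not affect scheduling; they only affect runtime behavior.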

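Resource Overcommit

Because Request < Limit is common practice (Burstable QoS in particular), the sum of Limits on a Node often exceeds the Node's actual capacity, even though the sum of Requests never exceeds what is allocatable. This is called resource overcommit.

For example:

Node: 4 CPU cores

Pod A: requests.cpu=1, limits.cpu=2
Pod B: requests.cpu=1, limits.cpu=2
Pod C: requests.cpu=1, limits.cpu=2

Allocated requests.cpu = 3 (no more than 4, so the scheduler accepts it)
But if all three Pods run flat out (2 CPUs each), they need 6 CPUs

Overcommit is not a problem in itself. The problem comes when the overcommitted Pods all need their Limits at once and the Node is overloaded: CPU is contended and every Pod gets less CPU time than it wants, and under memory pressure OOMKills occur.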
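The overcommit arithmetic from this example can be checked with a short sketch. The data structure and function name are invented for illustration; values are whole CPU cores.

```python
# (request, limit) in CPU cores for each Pod in the example
pods = {"A": (1, 2), "B": (1, 2), "C": (1, 2)}

def overcommit(node_cpu: int, pods: dict) -> tuple:
    """Return (sum of requests, sum of limits, limits-to-capacity ratio)."""
    total_requests = sum(req for req, _ in pods.values())
    total_limits = sum(lim for _, lim in pods.values())
    return total_requests, total_limits, total_limits / node_cpu

requests, limits, ratio = overcommit(4, pods)
print(f"requests={requests} limits={limits} limit/capacity={ratio:.2f}")
```

A ratio above 1.0 means the Node cannot serve every Pod at its Limit simultaneously; here it is 1.5.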

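Configuration Suggestions for Common Scenarios

Scenario 1: Web servers (Nginx, Apache)

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-app
spec:
  replicas: 3
  selector:                 # required for apps/v1 Deployments
    matchLabels:
      app: nginx-app
  template:
    metadata:
      labels:
        app: nginx-app
    spec:
      containers:
      - name: nginx
        image: nginx:1.24
        ports:
        - containerPort: 80
        resources:
          requests:
            memory: "64Mi"    # Nginx itself uses very little memory
            cpu: "50m"        # mostly just proxying
          limits:
            memory: "128Mi"   # some headroom
            cpu: "200m"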

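Scenario 2: Java applications (Spring Boot, Tomcat)

Java applications need more memory at startup (class loading, JIT compilation). Memory is fairly stable in steady state but can rise at peak.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: java-app
spec:
  replicas: 2
  selector:                 # required for apps/v1 Deployments
    matchLabels:
      app: java-app
  template:
    metadata:
      labels:
        app: java-app
    spec:
      containers:
      - name: java-app
        image: my-java-app:1.0.0
        env:
        - name: JAVA_OPTS
          value: "-Xmx512m -Xms256m"  # set the JVM heap explicitly, sized against the k8s limit
        resources:
          # memory Request = Limit; the Pod is still Burstable because the CPU Limit exceeds the CPU Request
          requests:
            memory: "768Mi"    # covers JVM heap + overhead (class metadata, native memory, etc.)
            cpu: "500m"
          limits:
            memory: "768Mi"
            cpu: "1000m"        # the CPU Limit may be higher than the CPU Request

JVM and container memory: the JVM's -Xmx should be lower than the container's memory limit. A common guideline is to set -Xmx to 75-80% of the container limit, leaving room for non-heap memory such as native memory, direct buffers, and mmap. The example above is a little more conservative: 512m is about 67% of 768Mi. Note that JAVA_OPTS only takes effect if the image's start script passes it to the java command.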
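The 75-80% guideline is simple arithmetic; a tiny sketch makes the numbers explicit. The function name and the default fraction are assumptions for illustration, and the right fraction depends on how much non-heap memory the application actually uses.

```python
def suggest_xmx_mib(container_limit_mib: int, heap_fraction: float = 0.75) -> int:
    """Heap size in MiB as a fraction of the container memory limit (rounded down)."""
    return int(container_limit_mib * heap_fraction)

limit = 768
print(f"-Xmx{suggest_xmx_mib(limit)}m")        # 75% of a 768Mi limit
print(f"-Xmx{suggest_xmx_mib(limit, 0.8)}m")   # 80% of a 768Mi limit
```

For a 768Mi limit this gives a heap between 576m and 614m, with the remainder left for non-heap memory.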

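Scenario 3: Go applications

The Go runtime manages its own memory (GC) and works well with container memory limits, but a few points need attention:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: go-app
spec:
  replicas: 2
  selector:                 # required for apps/v1 Deployments
    matchLabels:
      app: go-app
  template:
    metadata:
      labels:
        app: go-app
    spec:
      containers:
      - name: go-app
        image: my-go-app:1.0.0
        resources:
          requests:
            memory: "128Mi"
            cpu: "100m"
          limits:
            # With the default GOGC=100, GC runs when the heap has grown 100% since the last collection;
            # the runtime does not know the container limit unless GOMEMLIMIT is set.
            # Give Go applications a large enough memory limit to avoid frequent GC.
            memory: "512Mi"
            cpu: "500m"
        env:
        # Soft memory limit for the Go runtime; keep it below the container limit
        - name: GOMEMLIMIT
          value: "400MiB"  # GOMEMLIMIT is available in Go 1.19+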

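Scenario 4: Databases (MySQL, PostgreSQL)

Databases usually need stable, dedicated resources; Guaranteed QoS is strongly recommended:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: mysql
  replicas: 1
  selector:                 # required for apps/v1 StatefulSets
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - name: mysql
        image: mysql:8.0
        env:
        - name: MYSQL_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-secret
              key: root-password
        resources:
          requests:
            memory: "2Gi"
            cpu: "1000m"
          limits:
            # For databases, Request = Limit for both memory and CPU (Guaranteed QoS)
            memory: "2Gi"
            cpu: "1000m"
        volumeMounts:
        - name: data
          mountPath: /var/lib/mysql
  volumeClaimTemplates:     # backs the "data" volume mounted above
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi     # example size only; adjust to your data volume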

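Scenario 5: Redis

Redis is memory-intensive and very sensitive to the memory Limit:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis
spec:
  serviceName: redis        # required for StatefulSets
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
      - name: redis
        image: redis:7-alpine
        command: ["redis-server", "--maxmemory", "1400mb", "--maxmemory-policy", "allkeys-lru"]
        resources:
          requests:
            memory: "1536Mi"  # sized against redis --maxmemory
            cpu: "500m"
          limits:
            # maxmemory is set to roughly 90-95% of the container limit, leaving room for Redis's own overhead
            memory: "1536Mi"
            cpu: "1000m"

Redis and the container memory limit: maxmemory only bounds the data set, and Redis also needs memory for client and replication buffers, fragmentation, and copy-on-write during persistence forks. If the container limit is at or below maxmemory, Redis can be OOMKilled before its eviction policy kicks in; if the limit is far above maxmemory, the extra memory is reserved but never used. Keep the two aligned, with maxmemory slightly below the limit.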

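Monitoring and Troubleshooting Commands

Viewing Pod resource usage

# Resource usage of all Pods (requires metrics-server)
kubectl top pods --all-namespaces

# Resource usage of a specific Pod
kubectl top pod <pod-name> -n <namespace>

# Resource usage of every container in a specific Pod
kubectl top pod <pod-name> -n <namespace> --containers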

Viewing Node resource allocation

# Capacity and allocated resources of a Node
kubectl describe node <node-name> | grep -A 10 "Allocated resources"

# Resource overview of all Nodes
kubectl describe nodes | grep -A 5 "Resource"

# More detailed view of Node resources
kubectl get nodes -o json | jq '.items[] | {name: .metadata.name, allocatable: .status.allocatable, capacity: .status.capacity}'

Checking Pod QoS classes

kubectl get pods -o custom-columns='NAME:.metadata.name,QOS:.status.qosClass,CPU_REQ:.spec.containers[0].resources.requests.cpu,MEM_REQ:.spec.containers[0].resources.requests.memory'

Checking for OOMKill events

# OOMKill information for a Pod
kubectl describe pod <pod-name> -n <namespace> | grep -E "Last State|Exit Code|OOMKilled"

# Events related to the Pod
kubectl get events -n <namespace> --field-selector involvedObject.name=<pod-name> | grep -E "OOM|Kill"

# OOM Killer log on the Node (requires logging in to the Node)
ssh <node> "dmesg | grep -i 'killed process'"

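Troubleshooting CPU throttling

# CPU throttling statistics for a Pod (run on the Node; cgroup v1 with the cgroupfs driver)
cat /sys/fs/cgroup/cpu/kubepods/burstable/<pod-cgroup>/cpu.stat

# CPU throttling for one container (cgroup v1 with the systemd driver)
cat /sys/fs/cgroup/cpu/kubepods.slice/<container-cgroup-path>/cpu.stat

# Key fields: nr_throttled (number of throttled periods), throttled_time (total throttled time in ns)
# On cgroup v2 the file lives under /sys/fs/cgroup/kubepods.slice/... and the time field is throttled_usec (microseconds)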
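A raw counter is hard to judge on its own, so it often helps to express throttling as the share of periods in which the container was throttled. The sketch below parses the text of a cgroup v1 cpu.stat file; the sample numbers are made up for illustration, and the function names are not part of any tool.

```python
def parse_cpu_stat(text: str) -> dict:
    """Parse 'key value' lines from a cpu.stat file into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats

def throttle_ratio(stats: dict) -> float:
    """Fraction of CFS periods in which the cgroup was throttled."""
    periods = stats.get("nr_periods", 0)
    return stats.get("nr_throttled", 0) / periods if periods else 0.0

# Illustrative sample, not real measurements
sample = "nr_periods 1000\nnr_throttled 250\nthrottled_time 12500000000\n"
stats = parse_cpu_stat(sample)
print(f"throttled in {throttle_ratio(stats):.0%} of periods")
```

On a real Node, the text would come from reading the cpu.stat path shown above. Because the counters are cumulative, comparing two readings taken some time apart gives a better picture than a single snapshot.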

Common Problems and Fixes

Problem 1: Pod stuck in Pending

Symptom: kubectl get pods shows the Pod in Pending indefinitely

Cause: no Node has enough unreserved resources to satisfy the Pod's Requests

Diagnosis:

kubectl describe pod <pod-name> | grep -A 10 "Events:"
# Typically shows "Insufficient memory" or "Insufficient cpu"

# Check Node resources
kubectl top nodes
kubectl describe nodes | grep -A 5 "Allocated resources"

Fix: lower the Pod's Requests, add Nodes, or remove some low-priority Pods.

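Problem 2: Pod keeps getting OOMKilled

Symptom: the Pod is killed some time after it starts, and kubectl get pods shows OOMKilled

Diagnosis:

kubectl describe pod <pod-name> | grep -A 3 "Last State"
# If Last State is Terminated with Reason: OOMKilled, the container was killed by the Linux OOM Killer

# Check actual memory usage
kubectl top pod <pod-name>

# Check the Pod's resources configuration
kubectl get pod <pod-name> -o jsonpath='{.spec.containers[0].resources}'

Fix:

Raise the memory limit (if this is normal growth in load)

Check the application for memory leaks

For Java applications, make sure the JVM -Xmx fits within the container limit

Consider raising the Request as well (keeping the Request = Limit recommendation in mind)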

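Problem 3: Application is slow, but CPU and memory are both under their limits

Likely cause: CPU throttling

Diagnosis:

# On the Node, read the Pod's CPU throttling counters
cat /sys/fs/cgroup/cpu/kubepods/burstable/<pod-cgroup>/cpu.stat

# Look at nr_throttled and throttled_time
# A large and growing throttled_time means the CPU is being heavily throttled

Fix: raise the CPU limit. Setting CPU Request = Limit (Guaranteed QoS) makes the CPU share predictable, but it does not remove throttling if the limit itself is too low.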

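Problem 4: OOMKilled while memory usage is well below the Request

Cause: the Node itself ran out of memory and the kernel OOM Killer picked a victim. With Request < Limit the Pod is Burstable, so its oom_score_adj is positive and it can be selected even though it is within its Request

Fix: set Request equal to Limit for memory (and CPU) so the Pod becomes Guaranteed, and check whether the Node's memory is overcommitted.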

Problem 5: How resources are calculated for multi-container Pods

A Pod can have several containers (the sidecar pattern, for example):

spec:
  containers:
  - name: main-app
    resources:
      requests:
        memory: "512Mi"
        cpu: "250m"
      limits:
        memory: "1Gi"
        cpu: "500m"
  - name: sidecar
    resources:
      requests:
        memory: "64Mi"
        cpu: "50m"
      limits:
        memory: "128Mi"
        cpu: "100m"

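Pod total Request = sum of all container Requests (memory 512Mi + 64Mi = 576Mi, CPU 250m + 50m = 300m)

Pod total Limit = sum of all container Limits (memory 1Gi + 128Mi = 1152Mi = 1.125Gi, CPU 500m + 100m = 600m)

Scheduling uses the total Request; the OOMKill check against a Limit is made per container.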
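The totals can be verified with a short sketch that sums the values from the manifest above. It is a simplified model with invented helper names: it handles only the quantity suffixes used here and ignores init containers and Pod overhead, which Kubernetes also takes into account.

```python
MEM_UNITS = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}

def mem_bytes(quantity: str) -> int:
    for suffix, factor in MEM_UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[:-2]) * factor
    return int(quantity)

def cpu_millis(quantity: str) -> int:
    return int(quantity[:-1]) if quantity.endswith("m") else round(float(quantity) * 1000)

containers = [
    {"requests": {"memory": "512Mi", "cpu": "250m"}, "limits": {"memory": "1Gi", "cpu": "500m"}},
    {"requests": {"memory": "64Mi", "cpu": "50m"}, "limits": {"memory": "128Mi", "cpu": "100m"}},
]

def pod_total(containers: list, kind: str) -> tuple:
    """Sum 'requests' or 'limits' across containers; returns (MiB, millicores)."""
    memory = sum(mem_bytes(c[kind]["memory"]) for c in containers)
    cpu = sum(cpu_millis(c[kind]["cpu"]) for c in containers)
    return memory // 1024**2, cpu

print("requests:", pod_total(containers, "requests"))
print("limits:  ", pod_total(containers, "limits"))
```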

LimitRange: Default Resource Limits for a Namespace

A LimitRange can be set at the Namespace level so that containers without resource settings automatically receive defaults:

apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: my-app
spec:
  limits:
  # container-level defaults and bounds
  - type: Container
    # default limits
    default:
      cpu: "100m"
      memory: "128Mi"
    # default requests
    defaultRequest:
      cpu: "100m"
      memory: "128Mi"
    # minimum allowed value
    min:
      cpu: "10m"
      memory: "16Mi"
    # maximum allowed value
    max:
      cpu: "4"
      memory: "8Gi"
    # maximum allowed ratio of limit to request (keeps the limit from far exceeding the request)
    maxLimitRequestRatio:
      cpu: 10
      memory: 10

With this in place, containers deployed to the my-app namespace without resources automatically get a Request and Limit of 100m CPU and 128Mi memory.
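The defaulting behavior can be modeled roughly as follows. This is a sketch of the outcome as I understand it, not the admission plugin's code: an explicitly set Limit first becomes the Request if no Request is given, and the LimitRange then fills whatever is still missing. The function name and dict layout are assumptions, and validation against min, max, and maxLimitRequestRatio is left out.

```python
def apply_defaults(container: dict, default: dict, default_request: dict) -> dict:
    """Return the container's effective requests and limits after defaulting."""
    limits = dict(container.get("limits", {}))
    requests = dict(container.get("requests", {}))
    for resource, value in limits.items():          # request defaults to an explicit limit
        requests.setdefault(resource, value)
    for resource, value in default.items():         # LimitRange 'default' fills limits
        limits.setdefault(resource, value)
    for resource, value in default_request.items(): # 'defaultRequest' fills requests
        requests.setdefault(resource, value)
    return {"requests": requests, "limits": limits}

default = {"cpu": "100m", "memory": "128Mi"}
default_request = {"cpu": "100m", "memory": "128Mi"}

print(apply_defaults({}, default, default_request))
print(apply_defaults({"limits": {"memory": "1Gi"}}, default, default_request))
```

In the second call the container keeps its own 1Gi memory Limit, its memory Request follows that Limit, and only the CPU values come from the LimitRange.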

ResourceQuota: Capping a Namespace's Total Resources

While a LimitRange constrains individual containers, a ResourceQuota caps the total resources of a whole Namespace:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: production-quota
  namespace: production
spec:
  hard:
    # sum of requests across the namespace
    requests.cpu: "10"
    requests.memory: "20Gi"
    # sum of limits across the namespace
    limits.cpu: "20"
    limits.memory: "40Gi"
    # maximum number of Pods
    pods: "50"

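Once the Requests or Limits in the namespace add up to the quota, new Pods that would exceed it are rejected. With a quota on requests or limits in place, Pods that do not set those values are rejected as well, unless a LimitRange supplies defaults.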

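Risks to Keep in Mind

Always set resources: a Pod with no requests and limits is BestEffort, and is first in line for eviction and OOMKill when the Node is under pressure.

Do not oversize memory: giving an application that needs only 256Mi a 4Gi memory Request (as Request = Limit implies) makes the scheduler treat that much of the Node as taken while the memory sits unused, causing uneven scheduling and waste. An oversized Limit with a small Request does not affect scheduling, but it deepens overcommit.

A CPU limit that is too low causes throttling: for latency-sensitive applications, do not set the CPU limit too tight. Start with Request = Limit and adjust based on monitoring data.

Be careful with Request < Limit: for memory it makes the Pod Burstable and raises its OOMKill priority. For CPU, Request < Limit is fine (the Burstable case), but watch the effect of CPU throttling on latency-sensitive applications.

Production advice: use kubectl top pods and monitoring tools (Prometheus + Grafana) to establish a usage baseline, review actual Pod usage regularly, and adjust Request/Limit to match.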

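Summary

Key points of Kubernetes resource limit configuration:

Setting	CPU	Memory
Request purpose	Basis for scheduling	Basis for scheduling
Limit purpose	Cap on CPU time (throttled beyond it)	Cap on memory (OOMKilled beyond it)
Recommendation	Request = Limit or Request < Limit (compressible)	Request = Limit strongly recommended
QoS effect	Request = Limit (for memory too) → Guaranteed	Request = Limit (for CPU too) → Guaranteed

QoS class and OOMKill priority:

Guaranteed (hardest to kill) → Burstable → BestEffort (most exposed)

Order for troubleshooting resource problems:

1. Check actual usage with kubectl top pod
2. Check whether the Pod's requests/limits are reasonable
3. Check the Pod's QoS class (Guaranteed/Burstable/BestEffort)
4. Log in to the Node and check cgroup statistics, or dmesg for OOM events
5. Adjust requests/limits or scale out

There is no one-size-fits-all value for resource limits. They need continuous tuning based on the workload's profile (compute-intensive, memory-intensive, or IO-intensive) and on real monitoring data, which is the final arbiter when optimizing them.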
