Why Use Loki
Loki is a lightweight log collection and analysis application. It uses promtail as the agent to collect log content and ship it to Loki for storage; the logs are then displayed and queried by adding Loki as a data source in Grafana. For persistent storage Loki supports five backends: azure, gcs, s3, swift, and local, of which s3 and local are the most commonly used. It also supports many log collection clients; the most popular ones, such as Logstash and Fluent Bit, are on the officially supported list.
Its advantages include:
Multiple supported clients, such as Promtail, Fluent Bit, Fluentd, Vector, Logstash, and the Grafana Agent
The preferred agent, Promtail, can extract logs from many sources, including local log files, systemd, Windows event logs, Docker logging drivers, and more
No required log format: JSON, XML, CSV, logfmt, and unstructured text are all accepted
Logs are queried with the same syntax used for querying metrics
Log lines can be dynamically filtered and transformed at query time
Metrics can easily be computed from the logs you need (see the LogQL sketch after this list)
Minimal indexing at ingest time means you can slice and dice the logs dynamically at query time to answer new questions as they arise
Cloud-native support, scraping data the Prometheus way
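To make the last few points concrete, here is a minimal LogQL sketch; the app and namespace label names are assumptions and should be adapted to your own labels:
Filter: keep only the loki app's log lines that contain "error"
{app="loki", namespace="kube-public"} |= "error"
Metric from logs: per-second rate of the matching lines over the last 5 minutes
rate({app="loki", namespace="kube-public"} |= "error" [5m])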
(Figure: a brief comparison of the common log collection components.)
How Loki Works
As the diagram in the original post shows, Loki parses logs around an index. The index consists of the timestamp plus a subset of the pod's labels (other labels include filename, containers, and so on); the rest is the log content. A concrete query looks like this:
{app="loki", namespace="kube-public"} is the index (a label selector).
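The same selector can also be sent to Loki's HTTP query API. A minimal sketch, assuming Loki is reachable on 127.0.0.1:3100 (the address is an assumption):
$ curl -G -s "http://127.0.0.1:3100/loki/api/v1/query_range" \
    --data-urlencode 'query={app="loki",namespace="kube-public"}' \
    --data-urlencode 'limit=10'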
Log Collection Architecture
In practice, the official recommendation is to use promtail as the agent, deployed as a DaemonSet on the Kubernetes worker nodes to collect logs. The other log collection tools mentioned above can also be used; their configuration is attached at the end of this article.
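For reference only, a minimal promtail configuration sketch for scraping Kubernetes pod logs; the push URL, label mappings, and paths are assumptions, and the DaemonSet manifest itself is omitted here:
server:
  http_listen_port: 9080
positions:
  filename: /run/promtail/positions.yaml
clients:
  ## push target; assumes the loki Service created later in this article
  - url: http://loki:3100/loki/api/v1/push
scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      ## keep the namespace and pod name as labels
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
      ## point promtail at the container log files on the node
      - source_labels: [__meta_kubernetes_pod_uid, __meta_kubernetes_pod_container_name]
        separator: /
        target_label: __path__
        replacement: /var/log/pods/*$1/*.log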
Loki Deployment Modes
all (read-write mode)
After the service starts, all data queries and data writes are handled by this single node (see the diagram in the original post).
read/write (separated read and write mode)
When running in the separated read/write mode, the query-frontend forwards query traffic to the read nodes. The read nodes keep the querier, ruler, and query-frontend; the write nodes keep the distributor and ingester. A sketch of how the two kinds of nodes are started follows.
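A minimal sketch, assuming a shared configuration file path (the path is an assumption):
## write node: runs the distributor and ingester
$ loki -config.file=/etc/loki/loki.yaml -target=write
## read node: runs the query-frontend, querier and ruler
$ loki -config.file=/etc/loki/loki.yaml -target=read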
Running in Microservices Mode
In microservices mode, processes are started as different roles via different configuration parameters, and each process runs only the services of its target role, as sketched below.
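For example, each process can be started with its own target (a sketch; the shared configuration file is an assumption):
$ loki -config.file=/etc/loki/loki.yaml -target=distributor
$ loki -config.file=/etc/loki/loki.yaml -target=ingester
$ loki -config.file=/etc/loki/loki.yaml -target=querier
$ loki -config.file=/etc/loki/loki.yaml -target=query-frontend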
Putting It to Work: Server-Side Deployment
After all of the introduction to Loki and its working modes above, you must be curious about how it is deployed: how to deploy it, where to deploy it, and how to use it afterwards. Before deploying, you need a Kubernetes cluster ready. With that in place, read on patiently…
AllInOne Deployment Mode
Kubernetes deployment
The program downloaded from GitHub does not come with a configuration file, so one must be prepared in advance. A complete allInOne configuration file is provided here, with some of the settings tuned. The configuration file is quite long:
auth_enabled: false
target: all
ballast_bytes: 20480
server:
  grpc_listen_port: 9095
  http_listen_port: 3100
  graceful_shutdown_timeout: 20s
  grpc_listen_address: "0.0.0.0"
  grpc_listen_network: "tcp"
  grpc_server_max_concurrent_streams: 100
  grpc_server_max_recv_msg_size: 4194304
  grpc_server_max_send_msg_size: 4194304
  http_server_idle_timeout: 2m
  http_listen_address: "0.0.0.0"
  http_listen_network: "tcp"
  http_server_read_timeout: 30s
  http_server_write_timeout: 20s
  log_source_ips_enabled: true
  ## If http_path_prefix is changed, the same prefix must be added when pushing logs
  ## http_path_prefix: "/"
  register_instrumentation: true
  log_format: json
  log_level: info
distributor:
  ring:
    heartbeat_timeout: 3s
    kvstore:
      prefix: collectors/
      store: memberlist
      ## A consul cluster must be created in advance
      ## consul:
      ##   http_client_timeout: 20s
      ##   consistent_reads: true
      ##   host: 127.0.0.1:8500
      ##   watch_burst_size: 2
      ##   watch_rate_limit: 2
querier:
  engine:
    max_look_back_period: 20s
    timeout: 3m0s
  extra_query_delay: 100ms
  max_concurrent: 10
  multi_tenant_queries_enabled: true
  query_ingester_only: false
  query_ingesters_within: 3h0m0s
  query_store_only: false
  query_timeout: 5m0s
  tail_max_duration: 1h0s
query_scheduler:
  max_outstanding_requests_per_tenant: 2048
  grpc_client_config:
    max_recv_msg_size: 104857600
    max_send_msg_size: 16777216
    grpc_compression: gzip
    rate_limit: 0
    rate_limit_burst: 0
    backoff_on_ratelimits: false
    backoff_config:
      min_period: 50ms
      max_period: 15s
      max_retries: 5
  use_scheduler_ring: true
  scheduler_ring:
    kvstore:
      store: memberlist
      prefix: "collectors/"
    heartbeat_period: 30s
    heartbeat_timeout: 1m0s
    ## Defaults to the name of the first network interface
    ## instance_interface_names:
    ## instance_addr: 127.0.0.1
    ## Defaults to server.grpc_listen_port
    instance_port: 9095
frontend:
  max_outstanding_per_tenant: 4096
  querier_forget_delay: 1h0s
  compress_responses: true
  log_queries_longer_than: 2m0s
  max_body_size: 104857600
  query_stats_enabled: true
  scheduler_dns_lookup_period: 10s
  scheduler_worker_concurrency: 15
query_range:
  align_queries_with_step: true
  cache_results: true
  parallelise_shardable_queries: true
  max_retries: 3
  results_cache:
    cache:
      enable_fifocache: false
      default_validity: 30s
      background:
        writeback_buffer: 10000
      redis:
        endpoint: 127.0.0.1:6379
        timeout: 1s
        expiration: 0s
        db: 9
        pool_size: 128
        password: 1521Qyx6^
        tls_enabled: false
        tls_insecure_skip_verify: true
        idle_timeout: 10s
        max_connection_age: 8h
ruler:
  enable_api: true
  enable_sharding: true
  alertmanager_refresh_interval: 1m
  disable_rule_group_label: false
  evaluation_interval: 1m0s
  flush_period: 3m0s
  for_grace_period: 20m0s
  for_outage_tolerance: 1h0s
  notification_queue_capacity: 10000
  notification_timeout: 4s
  poll_interval: 10m0s
  query_stats_enabled: true
  remote_write:
    config_refresh_period: 10s
    enabled: false
  resend_delay: 2m0s
  rule_path: /rulers
  search_pending_for: 5m0s
  storage:
    local:
      directory: /data/loki/rulers
    type: configdb
  sharding_strategy: default
  wal_cleaner:
    period: 240h
    min_age: 12h0m0s
  wal:
    dir: /data/loki/ruler_wal
    max_age: 4h0m0s
    min_age: 5m0s
    truncate_frequency: 1h0m0s
  ring:
    kvstore:
      store: memberlist
      prefix: "collectors/"
    heartbeat_period: 5s
    heartbeat_timeout: 1m0s
    ## instance_addr: "127.0.0.1"
    ## instance_id: "miyamoto.en0"
    ## instance_interface_names: ["en0","lo0"]
    instance_port: 9500
    num_tokens: 100
ingester_client:
  pool_config:
    health_check_ingesters: false
    client_cleanup_period: 10s
    remote_timeout: 3s
  remote_timeout: 5s
ingester:
  autoforget_unhealthy: true
  chunk_encoding: gzip
  chunk_target_size: 1572864
  max_transfer_retries: 0
  sync_min_utilization: 3.5
  sync_period: 20s
  flush_check_period: 30s
  flush_op_timeout: 10m0s
  chunk_retain_period: 1m30s
  chunk_block_size: 262144
  chunk_idle_period: 1h0s
  max_returned_stream_errors: 20
  concurrent_flushes: 3
  index_shards: 32
  max_chunk_age: 2h0m0s
  query_store_max_look_back_period: 3h30m30s
  wal:
    enabled: true
    dir: /data/loki/wal
    flush_on_shutdown: true
    checkpoint_duration: 15m
    replay_memory_ceiling: 2GB
  lifecycler:
    ring:
      kvstore:
        store: memberlist
        prefix: "collectors/"
      heartbeat_timeout: 30s
      replication_factor: 1
    num_tokens: 128
    heartbeat_period: 5s
    join_after: 5s
    observe_period: 1m0s
    ## interface_names: ["en0","lo0"]
    final_sleep: 10s
    min_ready_duration: 15s
storage_config:
  boltdb:
    directory: /data/loki/boltdb
  boltdb_shipper:
    active_index_directory: /data/loki/active_index
    build_per_tenant_index: true
    cache_location: /data/loki/cache
    cache_ttl: 48h
    resync_interval: 5m
    query_ready_num_days: 5
    index_gateway_client:
      grpc_client_config:
  filesystem:
    directory: /data/loki/chunks
chunk_store_config:
  chunk_cache_config:
    enable_fifocache: true
    default_validity: 30s
    background:
      writeback_buffer: 10000
    redis:
      endpoint: 192.168.3.56:6379
      timeout: 1s
      expiration: 0s
      db: 8
      pool_size: 128
      password: 1521Qyx6^
      tls_enabled: false
      tls_insecure_skip_verify: true
      idle_timeout: 10s
      max_connection_age: 8h
    fifocache:
      ttl: 1h
      validity: 30m0s
      max_size_items: 2000
      max_size_bytes: 500MB
  write_dedupe_cache_config:
    enable_fifocache: true
    default_validity: 30s
    background:
      writeback_buffer: 10000
    redis:
      endpoint: 127.0.0.1:6379
      timeout: 1s
      expiration: 0s
      db: 7
      pool_size: 128
      password: 1521Qyx6^
      tls_enabled: false
      tls_insecure_skip_verify: true
      idle_timeout: 10s
      max_connection_age: 8h
    fifocache:
      ttl: 1h
      validity: 30m0s
      max_size_items: 2000
      max_size_bytes: 500MB
  cache_lookups_older_than: 10s
## Compacts the fragmented index
compactor:
  shared_store: filesystem
  shared_store_key_prefix: index/
  working_directory: /data/loki/compactor
  compaction_interval: 10m0s
  retention_enabled: true
  retention_delete_delay: 2h0m0s
  retention_delete_worker_count: 150
  delete_request_cancel_period: 24h0m0s
  max_compaction_parallelism: 2
  ## compactor_ring:
frontend_worker:
  match_max_concurrent: true
  parallelism: 10
  dns_lookup_duration: 5s
## runtime_config: nothing is configured here
## runtime_config:
common:
  storage:
    filesystem:
      chunks_directory: /data/loki/chunks
      rules_directory: /data/loki/rulers
  replication_factor: 3
  persist_tokens: false
  ## instance_interface_names: ["en0","eth0","ens33"]
analytics:
  reporting_enabled: false
limits_config:
  ingestion_rate_strategy: global
  ingestion_rate_mb: 100
  ingestion_burst_size_mb: 18
  max_label_name_length: 2096
  max_label_value_length: 2048
  max_label_names_per_series: 60
  enforce_metric_name: true
  max_entries_limit_per_query: 5000
  reject_old_samples: true
  reject_old_samples_max_age: 168h
  creation_grace_period: 20m0s
  max_global_streams_per_user: 5000
  unordered_writes: true
  max_chunks_per_query: 200000
  max_query_length: 721h
  max_query_parallelism: 64
  max_query_series: 700
  cardinality_limit: 100000
  max_streams_matchers_per_query: 1000
  max_concurrent_tail_requests: 10
  ruler_evaluation_delay_duration: 3s
  ruler_max_rules_per_rule_group: 0
  ruler_max_rule_groups_per_tenant: 0
  retention_period: 700h
  per_tenant_override_period: 20s
  max_cache_freshness_per_query: 2m0s
  max_queriers_per_tenant: 0
  per_stream_rate_limit: 6MB
  per_stream_rate_limit_burst: 50MB
  max_query_lookback: 0
  ruler_remote_write_disabled: false
  min_sharding_lookback: 0s
  split_queries_by_interval: 10m0s
  max_line_size: 30mb
  max_line_size_truncate: false
  max_streams_per_user: 0
## The memberlist block configures the gossip protocol used for discovery and connections
## between the distributors, ingesters and queriers. The configuration is shared by all
## three components to ensure a single shared ring. Once at least one join_members entry
## is defined, a memberlist-type kvstore is configured automatically for the distributor,
## ingester and ruler rings.
memberlist:
  randomize_node_name: true
  stream_timeout: 5s
  retransmit_factor: 4
  join_members:
    - 'loki-memberlist'
  abort_if_cluster_join_fails: true
  advertise_addr: 0.0.0.0
  advertise_port: 7946
  bind_addr: ["0.0.0.0"]
  bind_port: 7946
  compression_enabled: true
  dead_node_reclaim_time: 30s
  gossip_interval: 100ms
  gossip_nodes: 3
  gossip_to_dead_nodes_time: 3
  ## join:
  leave_timeout: 15s
  left_ingesters_timeout: 3m0s
  max_join_backoff: 1m0s
  max_join_retries: 5
  message_history_buffer_bytes: 4096
  min_join_backoff: 2s
  ## node_name: miyamoto
  packet_dial_timeout: 5s
  packet_write_timeout: 5s
  pull_push_interval: 100ms
  rejoin_interval: 10s
  tls_enabled: false
  tls_insecure_skip_verify: true
schema_config:
  configs:
    - from: "2020-10-24"
      index:
        period: 24h
        prefix: index_
      object_store: filesystem
      schema: v11
      store: boltdb-shipper
      chunks:
        period: 168h
      row_shards: 32
table_manager:
  retention_deletes_enabled: false
  retention_period: 0s
  throughput_updates_disabled: false
  poll_interval: 3m0s
  creation_grace_period: 20m
  index_tables_provisioning:
    provisioned_write_throughput: 1000
    provisioned_read_throughput: 500
    inactive_write_throughput: 4
    inactive_read_throughput: 300
    inactive_write_scale_lastn: 50
    enable_inactive_throughput_on_demand_mode: true
    enable_ondemand_throughput_mode: true
    inactive_read_scale_lastn: 10
    write_scale:
      enabled: true
      target: 80
      ## role_arn:
      out_cooldown: 1800
      min_capacity: 3000
      max_capacity: 6000
      in_cooldown: 1800
    inactive_write_scale:
      enabled: true
      target: 80
      out_cooldown: 1800
      min_capacity: 3000
      max_capacity: 6000
      in_cooldown: 1800
    read_scale:
      enabled: true
      target: 80
      out_cooldown: 1800
      min_capacity: 3000
      max_capacity: 6000
      in_cooldown: 1800
    inactive_read_scale:
      enabled: true
      target: 80
      out_cooldown: 1800
      min_capacity: 3000
      max_capacity: 6000
      in_cooldown: 1800
  chunk_tables_provisioning:
    enable_inactive_throughput_on_demand_mode: true
    enable_ondemand_throughput_mode: true
    provisioned_write_throughput: 1000
    provisioned_read_throughput: 300
    inactive_write_throughput: 1
    inactive_write_scale_lastn: 50
    inactive_read_throughput: 300
    inactive_read_scale_lastn: 10
    write_scale:
      enabled: true
      target: 80
      out_cooldown: 1800
      min_capacity: 3000
      max_capacity: 6000
      in_cooldown: 1800
    inactive_write_scale:
      enabled: true
      target: 80
      out_cooldown: 1800
      min_capacity: 3000
      max_capacity: 6000
      in_cooldown: 1800
    read_scale:
      enabled: true
      target: 80
      out_cooldown: 1800
      min_capacity: 3000
      max_capacity: 6000
      in_cooldown: 1800
    inactive_read_scale:
      enabled: true
      target: 80
      out_cooldown: 1800
      min_capacity: 3000
      max_capacity: 6000
      in_cooldown: 1800
tracing:
  enabled: true
Note:
ingester.lifecycler.ring.replication_factor is 1 in the single-instance case
ingester.lifecycler.min_ready_duration is 15s, so after startup the instance waits 15 seconds by default before its state turns ready
memberlist.node_name can be left unset; it defaults to the current host name
memberlist.join_members is a list; with multiple instances, the hostname/IP address of each node must be added. In Kubernetes it can be set to a Service bound to the StatefulSet.
query_range.results_cache.cache.enable_fifocache is recommended to be set to false; I have set it to true here
instance_interface_names is a list that defaults to ["en0","eth0"]; set the appropriate NIC names as needed, although usually no special setting is required
Create the ConfigMap
Note: write the content above to a file named loki-all.yaml and load it into the Kubernetes cluster as a ConfigMap. It can be created with the following command:
$ kubectl create configmap loki-all --from-file=./loki-all.yaml
You can then confirm that the ConfigMap has been created, as shown below.
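A minimal check, assuming the ConfigMap was created in the default namespace:
$ kubectl get configmap loki-all
$ kubectl describe configmap loki-all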
Create Persistent Storage
In Kubernetes our data needs to be persisted. The logs Loki collects are vital to the business, so they must survive container restarts, which calls for a PV and a PVC. The backing storage can be any of the roughly 20 supported types such as nfs, glusterfs, hostPath, azureDisk, or cephfs; since no such environment is available here, hostPath is used.
The following configuration files are also quite long:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: loki
  namespace: default
spec:
  hostPath:
    path: /glusterfs/loki
    type: DirectoryOrCreate
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteMany
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: loki
  namespace: default
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  volumeName: loki
Create the Application
With the Kubernetes StatefulSet manifest prepared, the application can be created directly in the cluster.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app: loki
  name: loki
  namespace: default
spec:
  podManagementPolicy: OrderedReady
  replicas: 1
  ## serviceName is required for a StatefulSet; it refers to the loki Service defined below
  serviceName: loki
  selector:
    matchLabels:
      app: loki
  template:
    metadata:
      annotations:
        prometheus.io/port: http-metrics
        prometheus.io/scrape: "true"
      labels:
        app: loki
    spec:
      containers:
        - args:
            - -config.file=/etc/loki/loki-all.yaml
          image: grafana/loki:2.5.0
          imagePullPolicy: IfNotPresent
          livenessProbe:
            failureThreshold: 3
            httpGet:
              path: /ready
              port: http-metrics
              scheme: HTTP
            initialDelaySeconds: 45
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 1
          name: loki
          ports:
            - containerPort: 3100
              name: http-metrics
              protocol: TCP
            - containerPort: 9095
              name: grpc
              protocol: TCP
            - containerPort: 7946
              name: memberlist-port
              protocol: TCP
          readinessProbe:
            failureThreshold: 3
            httpGet:
              path: /ready
              port: http-metrics
              scheme: HTTP
            initialDelaySeconds: 45
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 1
          resources:
            requests:
              cpu: 500m
              memory: 500Mi
            limits:
              cpu: 500m
              memory: 500Mi
          securityContext:
            readOnlyRootFilesystem: true
          volumeMounts:
            - mountPath: /etc/loki
              name: config
            - mountPath: /data
              name: storage
      restartPolicy: Always
      securityContext:
        fsGroup: 10001
        runAsGroup: 10001
        runAsNonRoot: true
        runAsUser: 10001
      ## the loki ServiceAccount must already exist in this namespace
      serviceAccount: loki
      serviceAccountName: loki
      volumes:
        - emptyDir: {}
          name: tmp
        - name: config
          configMap:
            ## must match the name of the ConfigMap created above
            name: loki-all
        - persistentVolumeClaim:
            claimName: loki
          name: storage
---
kind: Service
apiVersion: v1
metadata:
  name: loki-memberlist
  namespace: default
spec:
  ports:
    - name: loki-memberlist
      protocol: TCP
      port: 7946
      targetPort: 7946
  selector:
    ## must match the pod labels set in the StatefulSet template
    app: loki
---
kind: Service
apiVersion: v1
metadata:
  name: loki
  namespace: default
spec:
  ports:
    - name: loki
      protocol: TCP
      port: 3100
      targetPort: 3100
  selector:
    app: loki
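The manifests above can then be applied and the pod checked. A minimal sketch, assuming they were saved as loki-pv.yaml and loki-statefulset.yaml (the file names are assumptions):
$ kubectl apply -f loki-pv.yaml
$ kubectl apply -f loki-statefulset.yaml
## the loki-0 pod should eventually reach the Running status
$ kubectl get pods -l app=loki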
Verify the Deployment
Once the pod shows the Running status above, you can check through the API whether the distributor is working properly; only when it shows Active will the log streams be distributed to the ingesters normally.
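A minimal sketch of that check, assuming the loki Service is reachable on 127.0.0.1:3100 (for example via kubectl port-forward svc/loki 3100:3100):
## overall readiness of the instance
$ curl http://127.0.0.1:3100/ready
## ring status page; each ingester should be listed in the ACTIVE state
$ curl http://127.0.0.1:3100/ring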
Due to space limits, other deployment modes are not covered here.
Link: https://juejin.cn/post/7150469420605767717