1. EFK vs LPG

Architecture and components:

  • Loki (LPG): an open-source, horizontally scalable log aggregation stack consisting of Promtail, Loki, and Grafana.
  • EFK: an integrated solution consisting of Elasticsearch, Fluentd, and Kibana.

Storage and querying:

  • Loki: stores logs as streams, writing the data into compressible chunk files, which achieves a high compression ratio.
  • EFK: uses Elasticsearch as the centralized log storage and indexing engine.

Scalability and resource consumption:

  • Loki: scales horizontally very well and can handle log data at large scale.
  • EFK: Elasticsearch is a highly scalable distributed storage system, but it is demanding on hardware resources, especially when storing large volumes of log data.

Configuration and deployment complexity:

  • Loki: relatively simple to configure and deploy. With Promtail collecting logs and Grafana providing query and visualization, you can get up and running quickly.
  • EFK: comparatively more complex. You need to configure Fluentd's input, filter, and output plugins, as well as the cluster settings for Elasticsearch and Kibana.

2. LPG Overview

Grafana Loki docs: https://grafana.com/docs/loki/latest/

loki-stack chart on GitHub: https://github.com/grafana/helm-charts/tree/main/charts/loki-stack

2.1 Loki Architecture

  1. Promtail (agent): Loki's default client; collects logs and ships them to Loki.
  2. Distributor: the entry point of Loki; receives log data from clients and distributes it across the ingester nodes.
  3. Ingester: receives and persists the log data handed over by the distributor. It writes the data to local storage and sends index-related metadata to the index component.
  4. Index: manages and maintains Loki's index data structures.
  5. Chunks: the physical storage form of log data in Loki.
  6. Querier: the component used to query log data stored in Loki.
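To make the write path concrete, a log line can be pushed by hand to the distributor's HTTP endpoint (`/loki/api/v1/push`). This is only a sketch: the `app="demo"` label and the in-cluster service name `loki:3100` are assumptions matching the deployment below.

```shell
# Build a push-API payload by hand: each stream carries a label set plus
# [timestamp, line] pairs; the timestamp is nanoseconds since epoch, as a string.
TS=$(date +%s%N)
PAYLOAD='{"streams":[{"stream":{"app":"demo"},"values":[["'"$TS"'","hello from curl"]]}]}'
echo "$PAYLOAD"
# To actually send it (requires the loki service to be reachable):
# curl -s -X POST http://loki:3100/loki/api/v1/push -H 'Content-Type: application/json' -d "$PAYLOAD"
```

This is exactly what Promtail does in bulk: batch lines per label set and POST them to the distributor.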

(Figure 22: Day09 Observability - ELK & Loki)

2.2 Log Collection

The Promtail client collects log data and ships it to Loki, which indexes it and stores it in the persistent storage backend.

Users can filter and retrieve specific log records with the LogQL query language, and visualize and analyze them through the Grafana integration.
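Besides Grafana, Loki's HTTP API can be queried directly. The sketch below only builds the request URL, percent-encoding the LogQL expression with python3's stdlib; the service address `loki:3100` is an assumption matching the deployment in the next section.

```shell
# LogQL must be percent-encoded when passed as the "query" parameter
# of /loki/api/v1/query_range; python3's urllib handles the encoding.
QUERY='{namespace="default"} |= "error"'
ENCODED=$(python3 -c 'import sys, urllib.parse; print(urllib.parse.quote(sys.argv[1], safe=""))' "$QUERY")
echo "curl -s 'http://loki:3100/loki/api/v1/query_range?query=${ENCODED}&limit=20'"
```

Running the printed curl command against a live Loki returns the matching log lines as JSON.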

(Figure 23: Day09 Observability - ELK & Loki)

3. Deployment and Configuration

3.1 Chart Configuration

Add the Loki chart repository:

[root@master01 9]# helm repo add grafana https://grafana.github.io/helm-charts
[root@master01 9]# helm repo update

Fetch the loki-stack chart and unpack it:

[root@master01 9]# helm search repo loki
[root@master01 9]# helm pull grafana/loki-stack --untar --version 2.9.10

Edit values.yaml as needed:

# The changes to make
[root@master01 9]# cd loki-stack/
[root@master01 loki-stack]# vim values.yaml 
test_pod:
  enabled: true
  image: bats/bats:1.8.2
  pullPolicy: IfNotPresent
...

loki:
  enabled: true
  persistence:
    enabled: true
    storageClassName: nfs-storage
    accessModes:
      - ReadWriteOnce
    size: 30Gi
  isDefault: true
  url: http://{{(include "loki.serviceName" .)}}:{{ .Values.loki.service.port }}
...

promtail:
  enabled: true
  config:
    logLevel: info
    serverPort: 3101
    clients:
      - url: http://{{ .Release.Name }}:3100/loki/api/v1/push
    limits_config:
      ingestion_rate_strategy: local
      ingestion_rate_mb: 15
      ingestion_burst_size_mb: 20  
...

grafana:
  enabled: true
  persistence:
    enabled: true
    storageClassName: nfs-storage
    accessModes:
      - ReadWriteOnce
    size: 10Gi

# The specific edits
## Line 3: change the image
  image: registry.cn-hangzhou.aliyuncs.com/abroad_images/bats:1.8.2
## Add below line 7
  persistence:
    enabled: true
    storageClassName: nfs-storage
    accessModes:
      - ReadWriteOnce
    size: 30Gi
## Add below line 37
    limits_config:
      ingestion_rate_strategy: local
      ingestion_rate_mb: 15
      ingestion_burst_size_mb: 20 
## Line 47: change
  enabled: true    
## Add the following below line 47
    storageClassName: nfs-storage
    accessModes:
      - ReadWriteOnce
    size: 10Gi
---
# Edit the promtail subchart values
[root@master01 loki-stack]# vim charts/promtail/values.yaml 
## Line 50: change
 50   registry: registry.cn-hangzhou.aliyuncs.com
## Line 52: change
 52   repository: abroad_images/promtail 
## Line 54: change
 54   tag: 2.7.4
---
# Edit the grafana subchart values
[root@master01 loki-stack]# vim charts/grafana/values.yaml
## Line 78: change
  78   repository: registry.cn-hangzhou.aliyuncs.com/abroad_images/grafana
## Line 80: change
  80   tag: "8.3.5"
---
# Edit the loki subchart values
[root@master01 loki-stack]# vim charts/loki/values.yaml
## Line 2: change
  2   repository: registry.cn-hangzhou.aliyuncs.com/abroad_images/loki

The full configuration files after the edits above:

[root@master01 loki-stack]# egrep -v "#|^$" charts/loki/values.yaml  
image:
  repository: registry.cn-hangzhou.aliyuncs.com/abroad_images/loki 
  tag: 2.6.1
  pullPolicy: IfNotPresent
ingress:
  enabled: false
  annotations: {}
  hosts:
    - host: chart-example.local
      paths: []
  tls: []
affinity: {}
annotations: {}
tracing:
  jaegerAgentHost:
config:
  auth_enabled: false
  memberlist:
    join_members:
      - '{{ include "loki.fullname" . }}-memberlist'
  ingester:
    chunk_idle_period: 3m
    chunk_block_size: 262144
    chunk_retain_period: 1m
    max_transfer_retries: 0
    wal:
      dir: /data/loki/wal
    lifecycler:
      ring:
        replication_factor: 1
  limits_config:
    enforce_metric_name: false
    reject_old_samples: true
    reject_old_samples_max_age: 168h
    max_entries_limit_per_query: 5000
  schema_config:
    configs:
    - from: 2020-10-24
      store: boltdb-shipper
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 24h
  server:
    http_listen_port: 3100
    grpc_listen_port: 9095
  storage_config:
    boltdb_shipper:
      active_index_directory: /data/loki/boltdb-shipper-active
      cache_location: /data/loki/boltdb-shipper-cache
      shared_store: filesystem
    filesystem:
      directory: /data/loki/chunks
  chunk_store_config:
    max_look_back_period: 0s
  table_manager:
    retention_deletes_enabled: false
    retention_period: 0s
  compactor:
    working_directory: /data/loki/boltdb-shipper-compactor
    shared_store: filesystem
extraArgs: {}
extraEnvFrom: []
livenessProbe:
  httpGet:
    path: /ready
    port: http-metrics
  initialDelaySeconds: 45
networkPolicy:
  enabled: false
client: {}
nodeSelector: {}
persistence:
  enabled: false
  accessModes:
  - ReadWriteOnce
  size: 10Gi
  labels: {}
  annotations: {}
podLabels: {}
podAnnotations:
  prometheus.io/scrape: "true"
  prometheus.io/port: "http-metrics"
podManagementPolicy: OrderedReady
rbac:
  create: true
  pspEnabled: true
readinessProbe:
  httpGet:
    path: /ready
    port: http-metrics
  initialDelaySeconds: 45
replicas: 1
resources: {}
securityContext:
  fsGroup: 10001
  runAsGroup: 10001
  runAsNonRoot: true
  runAsUser: 10001
containerSecurityContext:
  readOnlyRootFilesystem: true
service:
  type: ClusterIP
  nodePort:
  port: 3100
  annotations: {}
  labels: {}
  targetPort: http-metrics
serviceAccount:
  create: true
  name:
  annotations: {}
  automountServiceAccountToken: true
terminationGracePeriodSeconds: 4800
tolerations: []
topologySpreadConstraints:
  enabled: false
podDisruptionBudget: {}
updateStrategy:
  type: RollingUpdate
serviceMonitor:
  enabled: false
  interval: ""
  additionalLabels: {}
  annotations: {}
  scheme: null
  tlsConfig: {}
  prometheusRule:
    enabled: false
    additionalLabels: {}
    rules: []
initContainers: []
extraContainers: []
extraVolumes: []
extraVolumeMounts: []
extraPorts: []
env: []
alerting_groups: []
useExistingAlertingGroup:
  enabled: false
  configmapName: ""

# Full grafana configuration
[root@master01 loki-stack]# egrep -v "#|^$" charts/grafana/values.yaml 
rbac:
  create: true
  pspEnabled: true
  pspUseAppArmor: true
  namespaced: false
  extraRoleRules: []
  extraClusterRoleRules: []
serviceAccount:
  create: true
  name:
  nameTest:
  labels: {}
  autoMount: true
replicas: 1
headlessService: false
autoscaling:
  enabled: false
podDisruptionBudget: {}
deploymentStrategy:
  type: RollingUpdate
readinessProbe:
  httpGet:
    path: /api/health
    port: 3000
livenessProbe:
  httpGet:
    path: /api/health
    port: 3000
  initialDelaySeconds: 60
  timeoutSeconds: 30
  failureThreshold: 10
image:
  repository: registry.cn-hangzhou.aliyuncs.com/abroad_images/grafana 
  tag: "8.3.5"
  sha: ""
  pullPolicy: IfNotPresent
testFramework:
  enabled: true
  image: "bats/bats"
  tag: "v1.4.1"
  imagePullPolicy: IfNotPresent
  securityContext: {}
securityContext:
  runAsUser: 472
  runAsGroup: 472
  fsGroup: 472
containerSecurityContext:
  {}
createConfigmap: true
extraConfigmapMounts: []
extraEmptyDirMounts: []
extraLabels: {}
downloadDashboardsImage:
  repository: curlimages/curl
  tag: 7.85.0
  sha: ""
  pullPolicy: IfNotPresent
downloadDashboards:
  env: {}
  envFromSecret: ""
  resources: {}
  securityContext: {}
podPortName: grafana
service:
  enabled: true
  type: ClusterIP
  port: 80
  targetPort: 3000
  annotations: {}
  labels: {}
  portName: service
  appProtocol: ""
serviceMonitor:
  enabled: false
  path: /metrics
  labels: {}
  interval: 1m
  scheme: http
  tlsConfig: {}
  scrapeTimeout: 30s
  relabelings: []
extraExposePorts: []
hostAliases: []
ingress:
  enabled: false
  annotations: {}
  labels: {}
  path: /
  pathType: Prefix
  hosts:
    - chart-example.local
  extraPaths: []
  tls: []
resources: {}
nodeSelector: {}
tolerations: []
affinity: {}
topologySpreadConstraints: []
extraInitContainers: []
extraContainers: ""
extraContainerVolumes: []
persistence:
  type: pvc
  enabled: false
  accessModes:
    - ReadWriteOnce
  size: 10Gi
  finalizers:
    - kubernetes.io/pvc-protection
  extraPvcLabels: {}
  inMemory:
    enabled: false
initChownData:
  enabled: true
  image:
    repository: busybox
    tag: "1.31.1"
    sha: ""
    pullPolicy: IfNotPresent
  resources: {}
  securityContext:
    runAsNonRoot: false
    runAsUser: 0
adminUser: admin
admin:
  existingSecret: ""
  userKey: admin-user
  passwordKey: admin-password
env: {}
envValueFrom: {}
envFromSecret: ""
envRenderSecret: {}
envFromSecrets: []
envFromConfigMaps: []
enableServiceLinks: true
extraSecretMounts: []
extraVolumeMounts: []
lifecycleHooks: {}
plugins: []
datasources: {}
alerting: {}
notifiers: {}
dashboardProviders: {}
dashboards: {}
dashboardsConfigMaps: {}
grafana.ini:
  paths:
    data: /var/lib/grafana/
    logs: /var/log/grafana
    plugins: /var/lib/grafana/plugins
    provisioning: /etc/grafana/provisioning
  analytics:
    check_for_updates: true
  log:
    mode: console
  grafana_net:
    url: https://grafana.net
  server:
    domain: "{{ if (and .Values.ingress.enabled .Values.ingress.hosts) }}{{ .Values.ingress.hosts | first }}{{ else }}''{{ end }}"
ldap:
  enabled: false
  existingSecret: ""
  config: ""
smtp:
  existingSecret: ""
  userKey: "user"
  passwordKey: "password"
sidecar:
  image:
    repository: quay.io/kiwigrid/k8s-sidecar
    tag: 1.19.2
    sha: ""
  imagePullPolicy: IfNotPresent
  resources: {}
  securityContext: {}
  enableUniqueFilenames: false
  readinessProbe: {}
  livenessProbe: {}
  alerts:
    enabled: false
    env: {}
    label: grafana_alert
    labelValue: ""
    searchNamespace: null
    watchMethod: WATCH
    resource: both
    reloadURL: "http://localhost:3000/api/admin/provisioning/alerting/reload"
    script: null
    skipReload: false
    sizeLimit: {}
  dashboards:
    enabled: false
    env: {}
    SCProvider: true
    label: grafana_dashboard
    labelValue: ""
    folder: /tmp/dashboards
    defaultFolderName: null
    searchNamespace: null
    watchMethod: WATCH
    resource: both
    folderAnnotation: null
    script: null
    provider:
      name: sidecarProvider
      orgid: 1
      folder: ''
      type: file
      disableDelete: false
      allowUiUpdates: false
      foldersFromFilesStructure: false
    extraMounts: []
    sizeLimit: {}
  datasources:
    enabled: false
    env: {}
    label: grafana_datasource
    labelValue: ""
    searchNamespace: null
    watchMethod: WATCH
    resource: both
    reloadURL: "http://localhost:3000/api/admin/provisioning/datasources/reload"
    script: null
    skipReload: false
    initDatasources: false
    sizeLimit: {}
  plugins:
    enabled: false
    env: {}
    label: grafana_plugin
    labelValue: ""
    searchNamespace: null
    watchMethod: WATCH
    resource: both
    reloadURL: "http://localhost:3000/api/admin/provisioning/plugins/reload"
    script: null
    skipReload: false
    initPlugins: false
    sizeLimit: {}
  notifiers:
    enabled: false
    env: {}
    label: grafana_notifier
    labelValue: ""
    searchNamespace: null
    watchMethod: WATCH
    resource: both
    reloadURL: "http://localhost:3000/api/admin/provisioning/notifications/reload"
    script: null
    skipReload: false
    initNotifiers: false
    sizeLimit: {}
namespaceOverride: ""
revisionHistoryLimit: 10
imageRenderer:
  deploymentStrategy: {}
  enabled: false
  replicas: 1
  image:
    repository: grafana/grafana-image-renderer
    tag: latest
    sha: ""
    pullPolicy: Always
  env:
    HTTP_HOST: "0.0.0.0"
  serviceAccountName: ""
  securityContext: {}
  containerSecurityContext:
    capabilities:
      drop: ['ALL']
    allowPrivilegeEscalation: false
    readOnlyRootFilesystem: true
  hostAliases: []
  priorityClassName: ''
  service:
    enabled: true
    portName: 'http'
    port: 8081
    targetPort: 8081
    appProtocol: ""
  grafanaProtocol: http
  grafanaSubPath: ""
  podPortName: http
  revisionHistoryLimit: 10
  networkPolicy:
    limitIngress: true
    limitEgress: false
  resources: {}
  nodeSelector: {}
  tolerations: []
  affinity: {}
networkPolicy:
  enabled: false
  ingress: true
  allowExternal: true
  explicitNamespacesSelector: {}
  egress:
    enabled: false
    ports: []
enableKubeBackwardCompatibility: false
useStatefulSet: false
extraObjects: []

# Full promtail configuration
[root@master01 loki-stack]# egrep -v "#|^$" charts/promtail/values.yaml 
nameOverride: null
fullnameOverride: null
daemonset:
  enabled: true
deployment:
  enabled: false
  replicaCount: 1
  autoscaling:
    enabled: false
    minReplicas: 1
    maxReplicas: 10
    targetCPUUtilizationPercentage: 80
    targetMemoryUtilizationPercentage:
secret:
  labels: {}
  annotations: {}
configmap:
  enabled: false
initContainer: []
image:
  registry: registry.cn-hangzhou.aliyuncs.com 
  repository: abroad_images/promtail
  tag: 2.7.4
  pullPolicy: IfNotPresent
imagePullSecrets: []
annotations: {}
updateStrategy: {}
podLabels: {}
podAnnotations: {}
priorityClassName: null
livenessProbe: {}
readinessProbe:
  failureThreshold: 5
  httpGet:
    path: "{{ printf `%s/ready` .Values.httpPathPrefix }}"
    port: http-metrics
  initialDelaySeconds: 10
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 1
resources: {}
podSecurityContext:
  runAsUser: 0
  runAsGroup: 0
containerSecurityContext:
  readOnlyRootFilesystem: true
  capabilities:
    drop:
      - ALL
  allowPrivilegeEscalation: false
rbac:
  create: true
  pspEnabled: false
namespace: null
serviceAccount:
  create: true
  name: null
  imagePullSecrets: []
  annotations: {}
nodeSelector: {}
affinity: {}
tolerations:
  - key: node-role.kubernetes.io/master
    operator: Exists
    effect: NoSchedule
  - key: node-role.kubernetes.io/control-plane
    operator: Exists
    effect: NoSchedule
defaultVolumes:
  - name: run
    hostPath:
      path: /run/promtail
  - name: containers
    hostPath:
      path: /var/lib/docker/containers
  - name: pods
    hostPath:
      path: /var/log/pods
defaultVolumeMounts:
  - name: run
    mountPath: /run/promtail
  - name: containers
    mountPath: /var/lib/docker/containers
    readOnly: true
  - name: pods
    mountPath: /var/log/pods
    readOnly: true
extraVolumes: []
extraVolumeMounts: []
extraArgs: []
extraEnv: []
extraEnvFrom: []
enableServiceLinks: true
serviceMonitor:
  enabled: false
  namespace: null
  namespaceSelector: {}
  annotations: {}
  labels: {}
  interval: null
  scrapeTimeout: null
  relabelings: []
  metricRelabelings: []
  targetLabels: []
  scheme: http
  tlsConfig: null
  prometheusRule:
    enabled: false
    additionalLabels: {}
    rules: []
extraContainers: {}
extraPorts: {}
podSecurityPolicy:
  privileged: true
  allowPrivilegeEscalation: true
  volumes:
    - 'secret'
    - 'hostPath'
    - 'downwardAPI'
  hostNetwork: false
  hostIPC: false
  hostPID: false
  runAsUser:
    rule: 'RunAsAny'
  seLinux:
    rule: 'RunAsAny'
  supplementalGroups:
    rule: 'RunAsAny'
  fsGroup:
    rule: 'RunAsAny'
  readOnlyRootFilesystem: true
  requiredDropCapabilities:
    - ALL
config:
  logLevel: info
  serverPort: 3101
  clients:
    - url: http://loki-gateway/loki/api/v1/push
  snippets:
    pipelineStages:
      - cri: {}
    common:
      - action: replace
        source_labels:
          - __meta_kubernetes_pod_node_name
        target_label: node_name
      - action: replace
        source_labels:
          - __meta_kubernetes_namespace
        target_label: namespace
      - action: replace
        replacement: $1
        separator: /
        source_labels:
          - namespace
          - app
        target_label: job
      - action: replace
        source_labels:
          - __meta_kubernetes_pod_name
        target_label: pod
      - action: replace
        source_labels:
          - __meta_kubernetes_pod_container_name
        target_label: container
      - action: replace
        replacement: /var/log/pods/*$1/*.log
        separator: /
        source_labels:
          - __meta_kubernetes_pod_uid
          - __meta_kubernetes_pod_container_name
        target_label: __path__
      - action: replace
        replacement: /var/log/pods/*$1/*.log
        regex: true/(.*)
        separator: /
        source_labels:
          - __meta_kubernetes_pod_annotationpresent_kubernetes_io_config_hash
          - __meta_kubernetes_pod_annotation_kubernetes_io_config_hash
          - __meta_kubernetes_pod_container_name
        target_label: __path__
    addScrapeJobLabel: false
    extraLimitsConfig: ""
    extraServerConfigs: ""
    extraScrapeConfigs: ""
    extraRelabelConfigs: []
    scrapeConfigs: |
      - job_name: kubernetes-pods
        pipeline_stages:
          {{- toYaml .Values.config.snippets.pipelineStages | nindent 4 }}
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          - source_labels:
              - __meta_kubernetes_pod_controller_name
            regex: ([0-9a-z-.]+?)(-[0-9a-f]{8,10})?
            action: replace
            target_label: __tmp_controller_name
          - source_labels:
              - __meta_kubernetes_pod_label_app_kubernetes_io_name
              - __meta_kubernetes_pod_label_app
              - __tmp_controller_name
              - __meta_kubernetes_pod_name
            regex: ^;*([^;]+)(;.*)?$
            action: replace
            target_label: app
          - source_labels:
              - __meta_kubernetes_pod_label_app_kubernetes_io_instance
              - __meta_kubernetes_pod_label_release
            regex: ^;*([^;]+)(;.*)?$
            action: replace
            target_label: instance
          - source_labels:
              - __meta_kubernetes_pod_label_app_kubernetes_io_component
              - __meta_kubernetes_pod_label_component
            regex: ^;*([^;]+)(;.*)?$
            action: replace
            target_label: component
          {{- if .Values.config.snippets.addScrapeJobLabel }}
          - replacement: kubernetes-pods
            target_label: scrape_job
          {{- end }}
          {{- toYaml .Values.config.snippets.common | nindent 4 }}
          {{- with .Values.config.snippets.extraRelabelConfigs }}
          {{- toYaml . | nindent 4 }}
          {{- end }}
  file: |
    server:
      log_level: {{ .Values.config.logLevel }}
      http_listen_port: {{ .Values.config.serverPort }}
      {{- with .Values.httpPathPrefix }}
      http_path_prefix: {{ . }}
      {{- end }}
      {{- tpl .Values.config.snippets.extraServerConfigs . | nindent 2 }}
    clients:
      {{- tpl (toYaml .Values.config.clients) . | nindent 2 }}
    positions:
      filename: /run/promtail/positions.yaml
    scrape_configs:
      {{- tpl .Values.config.snippets.scrapeConfigs . | nindent 2 }}
      {{- tpl .Values.config.snippets.extraScrapeConfigs . | nindent 2 }}
    limits_config:
      {{- tpl .Values.config.snippets.extraLimitsConfig . | nindent 2 }}
networkPolicy:
  enabled: false
  metrics:
    podSelector: {}
    namespaceSelector: {}
    cidrs: []
  k8sApi:
    port: 8443
    cidrs: []
httpPathPrefix: ""
sidecar:
  configReloader:
    enabled: false
    image:
      registry: docker.io
      repository: jimmidyson/configmap-reload
      tag: v0.8.0
      pullPolicy: IfNotPresent
    extraArgs: []
    extraEnv: []
    extraEnvFrom: []
    containerSecurityContext:
      readOnlyRootFilesystem: true
      capabilities:
        drop:
          - ALL
      allowPrivilegeEscalation: false
    readinessProbe: {}
    livenessProbe: {}
    resources: {}
    config:
      serverPort: 9533
    serviceMonitor:
      enabled: true
extraObjects: []

# Full top-level values.yaml
[root@master01 loki-stack]# egrep -v "#|^$" values.yaml 
test_pod:
  enabled: true
  image: registry.cn-hangzhou.aliyuncs.com/abroad_images/bats:1.8.2 
  pullPolicy: IfNotPresent
loki:
  enabled: true
  persistence:
    enabled: true
    storageClassName: nfs-storage
    accessModes:
      - ReadWriteOnce
    size: 30Gi
  isDefault: true
  url: http://{{(include "loki.serviceName" .)}}:{{ .Values.loki.service.port }}
  readinessProbe:
    httpGet:
      path: /ready
      port: http-metrics
    initialDelaySeconds: 45
  livenessProbe:
    httpGet:
      path: /ready
      port: http-metrics
    initialDelaySeconds: 45
  datasource:
    jsonData: "{}"
    uid: ""
promtail:
  enabled: true
  config:
    logLevel: info
    serverPort: 3101
    clients:
      - url: http://{{ .Release.Name }}:3100/loki/api/v1/push
    limits_config:
      ingestion_rate_strategy: local
      ingestion_rate_mb: 15
      ingestion_burst_size_mb: 20 
fluent-bit:
  enabled: false
grafana:
  enabled: true
  storageClassName: nfs-storage
  accessModes:
    - ReadWriteOnce
  size: 10Gi
  sidecar:
    datasources:
      label: ""
      labelValue: ""
      enabled: true
      maxLines: 1000
  image:
    tag: 8.3.5
prometheus:
  enabled: false
  isDefault: false
  url: http://{{ include "prometheus.fullname" .}}:{{ .Values.prometheus.server.service.servicePort }}{{ .Values.prometheus.server.prefixURL }}
  datasource:
    jsonData: "{}"
filebeat:
  enabled: false
  filebeatConfig:
    filebeat.yml: |
      filebeat.inputs:
      - type: container
        paths:
          - /var/log/containers/*.log
        processors:
        - add_kubernetes_metadata:
            host: ${NODE_NAME}
            matchers:
            - logs_path:
                logs_path: "/var/log/containers/"
      output.logstash:
        hosts: ["logstash-loki:5044"]
logstash:
  enabled: false
  image: grafana/logstash-output-loki
  imageTag: 1.0.1
  filters:
    main: |-
      filter {
        if [kubernetes] {
          mutate {
            add_field => {
              "container_name" => "%{[kubernetes][container][name]}"
              "namespace" => "%{[kubernetes][namespace]}"
              "pod" => "%{[kubernetes][pod][name]}"
            }
            replace => { "host" => "%{[kubernetes][node][name]}"}
          }
        }
        mutate {
          remove_field => ["tags"]
        }
      }
  outputs:
    main: |-
      output {
        loki {
          url => "http://loki:3100/loki/api/v1/push"
        }
      }
proxy:
  http_proxy: ""
  https_proxy: ""
  no_proxy: ""

3.2 Deploy and Verify

[root@master01 loki-stack]# kubectl create ns logging
[root@master01 loki-stack]# helm upgrade --install loki -n logging -f values.yaml . 

# To uninstall later, run:
$ helm uninstall loki -n logging 

Verify the deployment:

# Check the pods
[root@master01 loki-stack]# kubectl get pods -n logging |grep loki
loki-0                          1/1     Running   0             11m
loki-grafana-8667dc7b46-cnh8d   2/2     Running   0             11m
loki-promtail-4qpgf             1/1     Running   0             11m
loki-promtail-8d25j             1/1     Running   0             11m
loki-promtail-l9msz             1/1     Running   0             11m
loki-promtail-s67t8             1/1     Running   0             11m
loki-promtail-t728x             1/1     Running   0             11m

# Check the services
[root@master01 loki-stack]# kubectl -n logging get svc |grep loki
loki                            ClusterIP   192.168.9.209     <none>        3100/TCP                     11m
loki-grafana                    ClusterIP   192.168.252.13    <none>        80/TCP                       11m
loki-headless                   ClusterIP   None              <none>        3100/TCP                     11m
loki-memberlist                 ClusterIP   None              <none>        7946/TCP                     11m

Get the Grafana admin password:

[root@master01 loki-stack]# kubectl get secret --namespace logging loki-grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo

# The password
21hubL5ZXNVG6ZPvfigKeWV9FBfYGxYAEseT1YZy
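The jsonpath output is base64-encoded, which is why the command above pipes through `base64 --decode`. The round trip can be seen with any string ("s3cr3t-pass" below is just a stand-in value, not a real password):

```shell
# Kubernetes stores Secret values base64-encoded; retrieving them
# always needs a decode step.
encoded=$(printf '%s' 's3cr3t-pass' | base64)
echo "$encoded"                            # the form stored in the Secret
printf '%s' "$encoded" | base64 -d; echo   # the usable password
```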

Create an Ingress:

[root@master01 loki-stack]# vim grafana-ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  namespace: logging
  name: grafana-ingress
spec:
  ingressClassName: nginx
  rules:
  - host: grafana-logging.zhang-qing.com
    http:
      paths:
        - pathType: Prefix
          backend:
            service:
              name: loki-grafana
              port:
                number: 80
          path: /

# Apply it ("kaf" here is a shell alias for "kubectl apply -f")
[root@master01 loki-stack]# kaf grafana-ingress.yaml

Test it:

[root@master01 loki-stack]# curl grafana-logging.zhang-qing.com -i
HTTP/1.1 302 Found
Date: Wed, 16 Apr 2025 00:31:35 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 29
Connection: keep-alive
Cache-Control: no-cache
Expires: -1
Location: /login
Pragma: no-cache
Set-Cookie: redirect_to=%2F; Path=/; HttpOnly; SameSite=Lax
X-Content-Type-Options: nosniff
X-Frame-Options: deny
X-Xss-Protection: 1; mode=block

<a href="/login">Found</a>.

Log in to Grafana with the username admin and the password obtained above.

Because the Helm chart has already configured the Loki data source in Grafana, log data is available right away.

Click the Explore menu on the left to start filtering Loki's log data:

(Figure 24: Day09 Observability - ELK & Loki)

(Figure 25: Day09 Observability - ELK & Loki)

(Figure 26: Day09 Observability - ELK & Loki)

The Promtail installed by Helm ships with a default configuration already tuned for Kubernetes; you can inspect it:

# Install jq
[root@master01 loki-stack]# yum install -y epel-release
[root@master01 loki-stack]# yum install -y jq

# Inspect the rendered configuration
[root@master01 loki-stack]# kubectl get secret loki-promtail -n logging -o json | jq -r '.data."promtail.yaml"' | base64 -d
server:
  log_level: info
  http_listen_port: 3101

clients:
  - url: http://loki:3100/loki/api/v1/push

positions:
  filename: /run/promtail/positions.yaml

scrape_configs:
  # See also https://github.com/grafana/loki/blob/master/production/ksonnet/promtail/scrape_config.libsonnet for reference
  - job_name: kubernetes-pods
    pipeline_stages:
      - cri: {}
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels:
          - __meta_kubernetes_pod_controller_name
        regex: ([0-9a-z-.]+?)(-[0-9a-f]{8,10})?
        action: replace
        target_label: __tmp_controller_name
      - source_labels:
          - __meta_kubernetes_pod_label_app_kubernetes_io_name
          - __meta_kubernetes_pod_label_app
          - __tmp_controller_name
          - __meta_kubernetes_pod_name
        regex: ^;*([^;]+)(;.*)?$
        action: replace
        target_label: app
      - source_labels:
          - __meta_kubernetes_pod_label_app_kubernetes_io_instance
          - __meta_kubernetes_pod_label_release
        regex: ^;*([^;]+)(;.*)?$
        action: replace
        target_label: instance
      - source_labels:
          - __meta_kubernetes_pod_label_app_kubernetes_io_component
          - __meta_kubernetes_pod_label_component
        regex: ^;*([^;]+)(;.*)?$
        action: replace
        target_label: component
      - action: replace
        source_labels:
        - __meta_kubernetes_pod_node_name
        target_label: node_name
      - action: replace
        source_labels:
        - __meta_kubernetes_namespace
        target_label: namespace
      - action: replace
        replacement: $1
        separator: /
        source_labels:
        - namespace
        - app
        target_label: job
      - action: replace
        source_labels:
        - __meta_kubernetes_pod_name
        target_label: pod
      - action: replace
        source_labels:
        - __meta_kubernetes_pod_container_name
        target_label: container
      - action: replace
        replacement: /var/log/pods/*$1/*.log
        separator: /
        source_labels:
        - __meta_kubernetes_pod_uid
        - __meta_kubernetes_pod_container_name
        target_label: __path__
      - action: replace
        regex: true/(.*)
        replacement: /var/log/pods/*$1/*.log
        separator: /
        source_labels:
        - __meta_kubernetes_pod_annotationpresent_kubernetes_io_config_hash
        - __meta_kubernetes_pod_annotation_kubernetes_io_config_hash
        - __meta_kubernetes_pod_container_name
        target_label: __path__

limits_config:

4. Loki Query Examples

4.1 Log Stream Selectors

For the label part of a query expression, wrap it in curly braces {} and select labels using key-value syntax. Separate multiple label expressions with commas:

= exactly equal.
!= not equal.
=~ matches the regex.
!~ does not match the regex.

# Find logs by job/app labels
{app="ingress-nginx"}
{job="devops/metallb"}
{namespace="default",app="podstdr2"}
{namespace="default",app="counterlog"}
{app=~"kube-state-metrics|prometheus|zookeeper"}

4.2 Filtering with Line Filters

After writing a log stream selector, you can narrow the results further with a search expression:

|= line contains the string.
!= line does not contain the string.
|~ line matches the regex.
!~ line does not match the regex.

Regular expressions use RE2 syntax. Matches are case-sensitive by default; prefix the regex with (?i) to make it case-insensitive.

1. Exact search: logs from the zookeeper container in the logging namespace that contain the keyword INFO
{namespace="logging",container="zookeeper"} |= "INFO"

2. Regex search (the backslashes must be doubled inside LogQL double-quoted strings)
{job="huohua/svc-huohua-batch"} |~ "(duration|latency)\\s*(=|is|of)\\s*[\\d.]+"

3. Exclusion
{job="mysql"} |= "error" != "timeout"

5. Common Issues

5.1 Issue (1)

Promtail complains that it cannot stat log files under /var/log/pods and therefore cannot tail them:

level=error ts=2023-07-17T03:22:11.682802445Z caller=filetarget.go:307 msg="failed to tail file, stat failed" error="stat /var/log/pods/kube-system_kube-apiserver-master3_a8daf137c2a2ea7ef925aaef1e82ac16/kube-apiserver/13.log: no such file or directory" filename=/var/log/pods/kube-system_kube-apiserver-master3_a8daf137c2a2ea7ef925aaef1e82ac16/kube-apiserver/13.log
level=error ts=2023-07-17T03:22:11.682823944Z caller=filetarget.go:307 msg="failed to tail file, stat failed" error="stat /var/log/pods/kube-system_kube-scheduler-master3_bdef86673f60f833d12eb8a3ad337fac/kube-scheduler/1.log: no such file or directory" filename=/var/log/pods/kube-system_kube-scheduler-master3_bdef86673f60f833d12eb8a3ad337fac/kube-scheduler/1.log

First, exec into the promtail container and check whether the file exists in that directory; cat it to see whether it has any log content.

By default, the promtail installation mounts the host directories /var/log/pods and /var/lib/docker/containers into the promtail container as volumes.

If both Docker and Kubernetes were installed with default settings, there should be no problem reading the logs.

{
    "name": "docker",
    "hostPath": {
        "path": "/var/lib/docker/containers",
        "type": ""
    }
},
    {
    "name": "pods",
    "hostPath": {
        "path": "/var/log/pods",
        "type": ""
    }
}

In our real production environment, however, Docker's data directory lives on a disk mounted at /data, so the default volumes configuration has to be changed.

The fix:

$ vim values.yaml
promtail:
  enabled: true
  extraVolumes:
    - name: docker
      hostPath:
        path: /data/docker/containers
  extraVolumeMounts:
    - name: docker
      mountPath: /data/docker/containers
      readOnly: true
  config:
    logLevel: info
    serverPort: 3101
    clients:
      - url: http://{{ .Release.Name }}:3100/loki/api/v1/push

Both volumes and volumeMounts have to be changed, because the log files under /var/log/pods are actually symlinks pointing at the log files under docker's containers directory.

If you only change volumes, promtail can find the log file inside the container, but opening it yields nothing, because the file is just a symlink whose target is not mounted.

[root@node1 log]# ll /var/log/pods/monitoring_promtail-bs5cs_5bc5bc90-bac9-480d-b291-4caadeff2236/promtail/
total 4
lrwxrwxrwx 1 root root 162 Dec 17 14:04 0.log -> /data/docker/containers/db45d5118e9508817e1a2efa3c9da68cfe969a2b0a3ed42619ff61a29cc64e5f/db45d5118e9508817e1a2efa3c9da68cfe969a2b0a3ed42619ff61a29cc64e5f-json.log
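The effect of mounting only the symlink directory can be reproduced locally (a sketch using a temp directory in place of the real docker paths):

```shell
# A link under pods/ pointing into containers/ is only readable when the
# target is also present -- which is why both host paths must be mounted.
tmp=$(mktemp -d)
mkdir -p "$tmp/containers" "$tmp/pods"
echo "log line" > "$tmp/containers/abc-json.log"
ln -s "$tmp/containers/abc-json.log" "$tmp/pods/0.log"
cat "$tmp/pods/0.log"                      # target present: prints "log line"
rm "$tmp/containers/abc-json.log"          # simulate the unmounted target
cat "$tmp/pods/0.log" 2>/dev/null || echo "dangling symlink"
```

Inside the pod the target file still exists on the host; it is simply invisible to the container unless the containers directory is mounted too.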

5.2 Issue (2)

Loki's log ingestion reports 429 errors:

level=warn ts=2023-07-17T03:42:34.456086325Z caller=client.go:369 component=client host=loki:3100 msg="error sending batch, will retry" status=429 error="server returned HTTP status 429 Too Many Requests (429): Ingestion rate limit exceeded for user fake (limit: 4194304 bytes/sec) while attempting to ingest '5381' lines totaling '1048504' bytes, reduce log volume or contact your Loki administrator to see if the limit can be increased"

level=warn ts=2023-07-17T03:42:35.144739805Z caller=client.go:369 component=client host=loki:3100 msg="error sending batch, will retry" status=429 error="server returned HTTP status 429 Too Many Requests (429): Ingestion rate limit exceeded for user fake (limit: 4194304 bytes/sec) while attempting to ingest '5381' lines totaling '1048504' bytes, reduce log volume or contact your Loki administrator to see if the limit can be increased"

Logs are being shipped faster than Loki's ingestion limit allows, hence the 429 errors. To raise the limit, adjust the configuration:

promtail:
  enabled: true
  extraVolumes:
    - name: docker
      hostPath:
        path: /data/docker/containers
  extraVolumeMounts:
    - name: docker
      mountPath: /data/docker/containers
      readOnly: true
  config:
    logLevel: info
    serverPort: 3101
    clients:
      - url: http://{{ .Release.Name }}:3100/loki/api/v1/push
    limits_config:
      # "local" enforces the rate limit per distributor, not cluster-wide
      ingestion_rate_strategy: local
      # per-tenant ingestion rate limit, in MB per second
      ingestion_rate_mb: 15
      # per-tenant allowed ingestion burst size, in MB
      ingestion_burst_size_mb: 20
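The limit quoted in the 429 message is in bytes per second; a quick check shows it equals Loki's default of 4 MB/s, which the values above raise to 15 MB/s with a 20 MB burst:

```shell
# 4194304 bytes/sec from the error message is exactly 4 MiB/sec,
# i.e. the default ingestion_rate_mb of 4
echo $((4194304 / 1024 / 1024))   # prints 4
```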