Monitoring Step 2: Log Ingestion with Fluent Bit

With my last post completed, I now have a reasonably operational platform that can provide insight, alerting and dashboarding over events generated from my Kubernetes cluster or other sources.

That’s a great start, but I need to get the logs to this platform in a digestible form. Thankfully Kubernetes is a well-known platform and there are well-defined tools available for the job.

I’ve chosen to leverage Fluent Bit – considered a next-generation implementation of Fluentd that is quite light in its use of resources – and with my platform every bit of CPU and RAM counts.

Installing Fluent Bit

Fluent Bit implementation diagram.

A DaemonSet draws from ConfigMap and Secret values and creates a Fluent Bit pod on each worker node, outputting logs to Opensearch.

Some Configuration Required

Fluent Bit offers a Helm chart for install which will require some modification as I’m going to need to inject some secrets and configs to the platform. At a minimum I need:

  1. To add the root CA certificate, not the key. If you recall from my last post, nothing should hold the private key for your root (or intermediate) CA. The certificate itself is public, so in this case I’m going to inject it as a ConfigMap.
  2. A username and password for a user configured with enough access to Opensearch to:
    • Create indexes
    • Update indexes using the bulk endpoints
  3. Some customisation to the configuration files of the Fluent Bit service. I want to configure the outputs to ship to Opensearch using both the above credentials and settings to ensure the logs are in the right shape and format.
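As a sketch of what that second requirement could look like, a role along these lines could be attached to the Fluent Bit user via the Opensearch security plugin. The role name and index patterns are my own placeholders, and the exact permission strings may vary by Opensearch version:

```yaml
# roles.yml sketch - a minimally-privileged role for the Fluent Bit user.
# Role name and index patterns are placeholders; adjust to your naming.
fluentbit_writer:
  cluster_permissions:
    - cluster_monitor
  index_permissions:
    - index_patterns:
        - "logstash-*"
        - "node-*"
      allowed_actions:
        - create_index   # Fluent Bit creates a new index each day
        - write          # covers bulk document indexing
```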

Planning for Each Configuration Item

  • The platform will exist in a dedicated namespace.
  • The pure config items will go in a modified values file for the helm chart.
  • The secrets will be delivered via Doppler, leveraging a new Doppler project and the Doppler Kubernetes Operator. I’m also going to include the Opensearch host and port in the secret config. This is semi-sensitive data at best, but centralising the storage of this information is helpful at this stage. The Doppler Operator will keep these in sync and should notify pods/deployments when these values have changed.

Config Samples

fluentbit-doppler.yaml – this file assumes a pre-configured Doppler secrets operator and that we have also pre-configured a service token for the Doppler project. In this case I am expecting a secret named “doppler-token-fluentbit” within namespace doppler-operator-system connected to my Doppler project.

yaml
apiVersion: secrets.doppler.com/v1alpha1
kind: DopplerSecret
metadata:
  name: doppler-fluentbit # DopplerSecret Name
  namespace: doppler-operator-system
spec:
  tokenSecret: # Kubernetes service token secret (namespace defaults to doppler-operator-system)
    name: doppler-token-fluentbit
  managedSecret: # Kubernetes managed secret (will be created if does not exist)
    name: doppler-fluentbit
    namespace: fluentbit # Should match the namespace of deployments that will use the secret
    type: Opaque # Optional: defaults to `Opaque` so can be left out unless changed
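For completeness, the referenced service token secret can be created ahead of time. A minimal sketch, assuming the `serviceToken` key expected by the Doppler operator and a placeholder token value:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: doppler-token-fluentbit
  namespace: doppler-operator-system
type: Opaque
stringData:
  serviceToken: dp.st.dev.xxxx # placeholder - substitute your Doppler service token
```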

tlsconfigs.yaml – when setting this up I had originally read the certificate in as a raw file, but found that key-value pairs were friendlier to mount into the cluster. With this approach the ConfigMap or Secret is mounted as a folder, with each key becoming a file containing its value. As a future exercise I may add this to my secrets file and mount it from there, but for now this works well.

yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: tlsconfigs
data:
  # Configuration values can be set as key-value properties
  myca.cert: |
    -----BEGIN CERTIFICATE-----
    ... some base64 certificate data.
    -----END CERTIFICATE-----
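Rather than hand-pasting the PEM into YAML, the same ConfigMap can be generated from the certificate file itself. A sketch, assuming the certificate lives at ./myca.cert:

```shell
# Render the tlsconfigs ConfigMap manifest from the certificate file.
# --dry-run=client generates the YAML locally without touching the cluster.
kubectl create configmap tlsconfigs \
  --from-file=myca.cert=./myca.cert \
  --namespace fluentbit \
  --dry-run=client -o yaml > tlsconfigs.yaml
```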

fluentbit-values.yaml – this file contains the customisations I’ve made to the helm chart for my implementation. The majority of this file is taken from the Fluent Bit values.yaml defaults hosted on GitHub.

Key changes:

  • Modified the Output targets to Opensearch, and configured the platform to read the user, password, host and port from the Doppler-managed secret via environment variables.
  • Modified the log level to warn. You would generally set this to more verbose until the platform is stable.
  • Mounted the root certificate in an appropriate location, and modified the Output configs to be aware of this and to use TLS and ensure they validate against this root cert when connecting.
  • I’ve also tweaked the output buffer size after encountering a warning with the default configuration – this may require revisiting as there appears to be some ongoing discussion as to where the root cause of this particular warning lies.
yaml
kind: DaemonSet
replicaCount: 1
image:
  repository: cr.fluentbit.io/fluent/fluent-bit
  tag: 2.2.2
  digest:
  pullPolicy: Always
testFramework:
  enabled: true
  namespace:
  image:
    repository: busybox
    pullPolicy: Always
    tag: latest
    digest:
imagePullSecrets: []
nameOverride: ""
fullnameOverride: ""
serviceAccount:
  create: true
  annotations: {}
  name:
rbac:
  create: true
  nodeAccess: false
  eventsAccess: false
podSecurityPolicy:
  create: false
  annotations: {}
openShift:
  enabled: false
  securityContextConstraints:
    create: true
    name: ""
    annotations: {}
    existingName: ""
podSecurityContext: {}
hostNetwork: false
dnsPolicy: ClusterFirst
dnsConfig: {}
hostAliases: []
securityContext: {}
service:
  type: ClusterIP
  port: 2020
  loadBalancerClass:
  loadBalancerSourceRanges: []
  labels: {}
  annotations: {}
serviceMonitor:
  enabled: false
  additionalEndpoints: []
prometheusRule:
  enabled: false
dashboards:
  enabled: false
  labelKey: grafana_dashboard
  labelValue: 1
  annotations: {}
  namespace: ""
lifecycle: {}
livenessProbe:
  httpGet:
    path: /
    port: http
readinessProbe:
  httpGet:
    path: /api/v1/health
    port: http
resources: {}
ingress:
  enabled: false
  ingressClassName: ""
  annotations: {}
  hosts: []
  extraHosts: []
  tls: []
autoscaling:
  vpa:
    enabled: false
    annotations: {}
    controlledResources: []
    maxAllowed: {}
    minAllowed: {}
    updatePolicy:
      updateMode: Auto
  enabled: false
  minReplicas: 1
  maxReplicas: 3
  targetCPUUtilizationPercentage: 75
  customRules: []
  behavior: {}
podDisruptionBudget:
  enabled: false
  annotations: {}
  maxUnavailable: "30%"
nodeSelector: {}
tolerations: []
affinity: {}
labels: {}
annotations: {}
podAnnotations: {}
podLabels: {}
minReadySeconds:
terminationGracePeriodSeconds:
priorityClassName: ""
env:
  - name: FLUENTBIT_PWD
    valueFrom:
      secretKeyRef:
        name: doppler-fluentbit
        key: FLUENTBIT_PWD
  - name: FLUENTBIT_USER
    valueFrom:
      secretKeyRef:
        name: doppler-fluentbit
        key: FLUENTBIT_USER
  - name: OPENSEARCH_HOST
    valueFrom:
      secretKeyRef:
        name: doppler-fluentbit
        key: OPENSEARCH_HOST
  - name: OPENSEARCH_PORT
    valueFrom:
      secretKeyRef:
        name: doppler-fluentbit
        key: OPENSEARCH_PORT
envWithTpl: []
envFrom: []
extraContainers: []
flush: 1
metricsPort: 2020
extraPorts: []
extraVolumes:
  - name: tlsconfigs
    configMap:
      name: tlsconfigs
extraVolumeMounts:
  - name: tlsconfigs
    mountPath: /etc/ssl/myca
    readOnly: true
updateStrategy: {}
existingConfigMap: ""
networkPolicy:
  enabled: false
luaScripts: {}
config:
  service: |
    [SERVICE]
        Daemon Off
        Flush {{ .Values.flush }}
        Log_Level {{ .Values.logLevel }}
        Parsers_File /fluent-bit/etc/parsers.conf
        Parsers_File /fluent-bit/etc/conf/custom_parsers.conf
        HTTP_Server On
        HTTP_Listen 0.0.0.0
        HTTP_Port {{ .Values.metricsPort }}
        Health_Check On
 
  inputs: |
    [INPUT]
        Name tail
        Path /var/log/containers/*.log
        multiline.parser docker, cri
        Tag kube.*
        Mem_Buf_Limit 5MB
        Skip_Long_Lines On
    [INPUT]
        Name systemd
        Tag host.*
        Systemd_Filter _SYSTEMD_UNIT=kubelet.service
        Read_From_Tail On
  filters: |
    [FILTER]
        Name kubernetes
        Match kube.*
        Merge_Log On
        Keep_Log Off
        K8S-Logging.Parser On
        K8S-Logging.Exclude On
  outputs: |
    [OUTPUT]
        Name opensearch
        Match kube.*
        Host ${OPENSEARCH_HOST}
        Port ${OPENSEARCH_PORT}
        Suppress_Type_Name On
        Replace_Dots    On
        HTTP_User ${FLUENTBIT_USER}
        HTTP_Passwd ${FLUENTBIT_PWD}
        tls.ca_file /etc/ssl/myca/myca.cert
        tls On
        tls.verify On
        Buffer_Size 256KB
        Logstash_Format On
        Retry_Limit False
        Trace_Error       On
    [OUTPUT]
        Name opensearch
        Match host.*
        Host ${OPENSEARCH_HOST}
        Port ${OPENSEARCH_PORT}
        Suppress_Type_Name On
        Replace_Dots    On
        HTTP_User ${FLUENTBIT_USER}
        HTTP_Passwd ${FLUENTBIT_PWD}
        tls.ca_file /etc/ssl/myca/myca.cert
        tls On
        tls.verify On
        Buffer_Size 256KB
        Logstash_Format On
        Logstash_Prefix node
        Retry_Limit False
        Trace_Error       On
  upstream: {}
  customParsers: |
    [PARSER]
        Name docker_no_time
        Format json
        Time_Keep Off
        Time_Key time
        Time_Format %Y-%m-%dT%H:%M:%S.%L
  extraFiles: {}
volumeMounts:
  - name: config
    mountPath: /fluent-bit/etc/conf
daemonSetVolumes:
  - name: varlog
    hostPath:
      path: /var/log
  - name: varlibdockercontainers
    hostPath:
      path: /var/lib/docker/containers
  - name: etcmachineid
    hostPath:
      path: /etc/machine-id
      type: File
daemonSetVolumeMounts:
  - name: varlog
    mountPath: /var/log
  - name: varlibdockercontainers
    mountPath: /var/lib/docker/containers
    readOnly: true
  - name: etcmachineid
    mountPath: /etc/machine-id
    readOnly: true
command:
  - /fluent-bit/bin/fluent-bit
args:
  - --workdir=/fluent-bit/etc
  - --config=/fluent-bit/etc/conf/fluent-bit.conf
initContainers: []
logLevel: warn
hotReload:
  enabled: false
  image:
    repository: ghcr.io/jimmidyson/configmap-reload
    tag: v0.11.1
    digest:
    pullPolicy: IfNotPresent
  resources: {}

Deploying and Validating the Installation

From this point, deploying to the cluster is relatively straightforward. Assuming Helm is already up and running and I have added the appropriate Helm chart repository as described in the Fluent Bit documentation, the following commands should set everything up:

bash
kubectl create namespace fluentbit
kubectl apply --namespace fluentbit -f tlsconfigs.yaml
kubectl apply --namespace fluentbit -f fluentbit-doppler.yaml
helm upgrade --install fluent-bit fluent/fluent-bit --namespace fluentbit -f fluentbit-values.yaml

The above helm command is safe to re-run, so if any changes are required, I can adjust fluentbit-values.yaml and re-apply that line.

Once up and running I can check on the daemonset and pods with the following commands:

bash
kubectl get daemonset --namespace fluentbit
kubectl get pods --namespace fluentbit

Or inspect an individual pod for any warnings or errors:

bash
kubectl logs --namespace fluentbit <pod-name>

Of course the true test is whether these logs actually reach Opensearch. What I should see is the creation of two indexes per day:

  • logstash-YYYY.MM.DD
  • node-YYYY.MM.DD
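These can also be checked from the command line. A sketch using curl, assuming the same credentials, host details and root CA certificate as above are available locally:

```shell
# List the indexes created by Fluent Bit; expects the root CA at ./myca.cert
# and the same environment variables used by the DaemonSet.
curl --cacert ./myca.cert \
  -u "$FLUENTBIT_USER:$FLUENTBIT_PWD" \
  "https://$OPENSEARCH_HOST:$OPENSEARCH_PORT/_cat/indices/logstash-*,node-*?v"
```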

Thankfully that appears to be working as required:

A screenshot of opensearch-dashboards indexes containing both node and logstash indexes for 5 days

The excess of logs on Jan 20th and Jan 21st is the result of leaving the Fluent Bit log level in debug mode – which certainly assisted in a stress test of the platform itself. With the indexes now active, I have been able to create a few very basic dashboards for early insight into the platform.

Two simple opensearch dashboards filtering on cluster logs for the last 15 minutes, by namespace.

What the above dashboards suggest to me at an initial glance is that both of the namespaces generating logs require a deeper look. If there’s nothing awry then perhaps the verbosity of the logs could be reduced.

A Few Quick Thoughts on Security

Is this Secure?

This is secure enough for my purposes of internal development use.

I would not consider this anywhere near secure enough to expose to the internet or to leverage against a production cluster.

While not a complete list, some additional steps that should be considered for a more critical or openly accessible environment would be:

  • Encryption of data at rest:
    • On your worker nodes – as there’s no point hardening the remainder of the journey if the origin is unencrypted.
    • In Fluent Bit if you are using filesystem buffering.
    • More importantly in Opensearch for longer term storage.
  • Additional logic in fluent-bit and/or opensearch to warn on or reject data that could be considered in breach of your organisational expectations on log data. Some examples would be GDPR personal data, PII, or payment information that should not be retained/stored.
  • Consider using mutual TLS authentication instead of usernames and passwords for authentication to opensearch.
  • Limit who can access the Opensearch endpoint with additional security layers. Some options:
    • VPN tunneling
    • An API gateway with its own pre-authentication requirements.
  • Change the setup so Fluent Bit is not responsible for creating new indexes, or set up a middleware implementation of Fluent Bit in a more secured location that handles index creation, limiting what a malicious actor could do if they obtained the Fluent Bit credentials leveraged by the cluster.
  • Within the Kubernetes cluster, introduce egress controls that only allow outbound connections from fluent-bit to your Opensearch endpoint (generally a good idea for any outbound traffic).
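As a sketch of that last point, an egress NetworkPolicy along these lines would restrict Fluent Bit’s outbound traffic, assuming your CNI enforces NetworkPolicy. The CIDR and port are placeholders for your Opensearch endpoint:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: fluentbit-egress
  namespace: fluentbit
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: fluent-bit # label applied by the helm chart
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: 192.0.2.10/32 # placeholder: your Opensearch host
      ports:
        - protocol: TCP
          port: 9200 # placeholder: your Opensearch port
    # DNS resolution must remain allowed if the output uses a hostname
    - ports:
        - protocol: UDP
          port: 53
```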

What is Next?

There is plenty of work ahead here but with the logs ingesting I can now move on to more detailed monitoring, analytics and alerting. I hope to be able to build out much more detailed insight than my simple dashboard above provides, and also introduce some alerting to some key applications present on the cluster.