Secure Swarm Secrets with OpenBao

Executive Summary

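Achieving automatic secret rotation in Docker Swarm has historically been difficult because native Swarm Secrets are immutable: rotating one means creating a new secret and updating the service, which restarts its tasks. Furthermore, strict security standards such as PCI-DSS restrict how credentials may be stored. For example, Requirement 8.6.2 (PCI-DSS v4.0) forbids hard-coding application and system account passwords in scripts or configuration files, and Requirement 3 governs stored cardholder data.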

This guide details the “Bundled Process Sidecar” architecture. This pattern uses OpenBao (the open-source fork of HashiCorp Vault) to inject rotating credentials directly into a secure RAM-disk (tmpfs) at runtime.

Key Benefits

  1. Automatic Rotation: Database credentials rotate without restarting the application container.
  2. PCI-DSS Compliance: Secrets exist only in volatile memory (tmpfs). They are never written to the host's disk (provided swap is disabled on the node) or included in Docker image layers.
  3. Swarm Compatibility: Overcomes Swarm’s lack of “Pods” by bundling the Agent and App into a single atomic scheduling unit.

1. Architecture: The Bundled Process Pattern

In Kubernetes, you would run a “Sidecar” container in the same Pod. Docker Swarm does not have Pods; if you deploy two separate containers, they may land on different servers.

Separate containers also cannot share a tmpfs mount, even on the same node. To guarantee co-location and a shared in-memory secrets directory, we bundle the OpenBao Agent binary inside the application container.

graph TD
    subgraph "Docker Swarm Node"
        subgraph "Container (Bundled)"
            A[Entrypoint Script] -->|Starts & Monitors| B(OpenBao Agent)
            A -->|Starts & Monitors| C(Application)
            B -->|Writes Secret| D[/"tmpfs (RAM Disk)"/]
            C -->|Reads Secret| D
        end
    end
    B -.->|Auth & Fetch| E((OpenBao Server))

2. Compliance: Why tmpfs Satisfies PCI-DSS

Your auditor may flag “storing credentials in a file” as a violation. You must clarify the difference between Data at Rest and Data in Use.

  • The Config: We use a Docker tmpfs mount. This maps the directory /app/secrets to a block of system RAM.
  • The Compliance Argument:
    • Volatile: If power is cut or the container stops, the data is gone. Because it is never written to the hard drive, there is nothing to recover from it, provided the node has swap disabled (the kernel can swap tmpfs pages to disk under memory pressure).
    • Requirement 3.4 (3.5.1 in PCI-DSS v4.0): This requirement covers rendering PAN unreadable wherever it is stored. Since tmpfs is memory, it falls under “Data in Use,” similar to how the variable sits in your application’s memory heap.
    • No Artifacts: The secret is not in the Docker Image, not in the overlay2 file system, and not in backups.
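Configuration alone is weak evidence for an auditor. To back the compliance argument, the application (or someone running a one-off command in the container) can confirm that /app/secrets really is a tmpfs mount. What follows is a minimal Python sketch. It assumes a Linux container where /proc/mounts is available, and the function names are hypothetical:

```python
from pathlib import Path
from typing import Optional


def mount_fstype(target: str, mounts_text: str) -> Optional[str]:
    """Return the filesystem type mounted exactly at `target`, or None.

    Each /proc/mounts line reads: device mountpoint fstype options dump pass.
    The last matching line wins, because a later mount hides an earlier one.
    """
    fstype = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[1] == target:
            fstype = fields[2]
    return fstype


def is_tmpfs(target: str = "/app/secrets", mounts_file: str = "/proc/mounts") -> bool:
    """True if `target` is itself a tmpfs mount point inside this container."""
    return mount_fstype(target, Path(mounts_file).read_text()) == "tmpfs"
```

For example, the app could call `is_tmpfs()` at startup and refuse to run if it returns False. Note that /proc/mounts escapes spaces in mount points (as `\040`). This sketch does not decode them, which is fine for /app/secrets.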

3. Implementation Guide

Step 1: The OpenBao Agent Configuration (agent-config.hcl)

This file tells the agent how to authenticate and where to write the secret.

pid_file = "/var/run/bao-agent.pid"

auto_auth {
  # AppRole is the usual auth method for plain Swarm; Kubernetes auth is only
  # an option on platforms that also run Kubernetes (e.g. Mirantis).
  method "approle" {
    config = {
      role_id_file_path = "/etc/bao/role_id"
      secret_id_file_path = "/etc/bao/secret_id"
      remove_secret_id_file_after_reading = false # Swarm mounts the file read-only, so the agent cannot delete it
    }
  }

  sink "file" {
    config = {
      path = "/app/secrets/.bao-token" # Keep the agent token on tmpfs too; /tmp is on the container's disk-backed writable layer
    }
  }
}

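template {
  # The critical PCI-DSS path: this MUST be on the tmpfs volume
  destination = "/app/secrets/db_password"

  # Fetch data from OpenBao and format it as a simple string.
  # Note: database/creds/* issues a new username AND password per lease, so if
  # your app also needs the username, render it too (ideally in the same file,
  # so both values come from one lease).
  contents = "{{ with secret \"database/creds/my-app-role\" }}{{ .Data.password }}{{ end }}"

  # Owner-only permissions on the rendered file
  perms = "0600"

  # Optional: command to run when the secret rotates (e.g., reload the app)
  # command = "pkill -HUP -f 'python app.py'"
}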

Step 2: The Fail-Fast Entrypoint (entrypoint.sh)

This script is the container's entrypoint and acts as a minimal process supervisor. If the OpenBao Agent crashes, the script exits immediately, stopping the container so Swarm can restart it.

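#!/bin/bash
# Supervises two background processes: the OpenBao Agent and the app.
# As PID 1, bash does not act on SIGTERM unless it traps it, so stop signals
# from Swarm (docker stop, rolling updates) are trapped and forwarded below.

BAO_PID=""
APP_PID=""

shutdown() {
  echo "Stop signal received. Stopping application and OpenBao Agent."
  [ -n "$APP_PID" ] && kill -TERM "$APP_PID" 2>/dev/null
  [ -n "$BAO_PID" ] && kill -TERM "$BAO_PID" 2>/dev/null
  wait
  exit 0
}
trap shutdown TERM INT

# 1. Start OpenBao Agent in the background
# We assume the 'bao' binary is in the PATH
bao agent -config=/etc/bao/agent-config.hcl > /var/log/bao-agent.log 2>&1 &
BAO_PID=$!

# 2. Wait for the secret to be rendered (critical for startup race conditions)
echo "Waiting for secrets to be rendered into /app/secrets/..."
TIMEOUT=30
while [ ! -s /app/secrets/db_password ]; do
  if ! kill -0 "$BAO_PID" 2>/dev/null; then
    echo "CRITICAL: OpenBao Agent died while starting up! Agent log follows:"
    cat /var/log/bao-agent.log
    exit 1
  fi
  sleep 1
  TIMEOUT=$((TIMEOUT - 1))
  if [ "$TIMEOUT" -le 0 ]; then
    echo "CRITICAL: Timed out waiting for OpenBao to render secrets. Agent log follows:"
    cat /var/log/bao-agent.log
    kill -TERM "$BAO_PID" 2>/dev/null
    exit 1
  fi
done
echo "Secrets authenticated and rendered successfully."

# 3. Start the main application
# Replace this with your actual start command
python app.py &
APP_PID=$!

# 4. Monitoring loop
# If either process dies, exit so Swarm restarts the task
while true; do
  if ! kill -0 "$BAO_PID" 2>/dev/null; then
    echo "CRITICAL: OpenBao Agent crashed. Shutting down container."
    kill -TERM "$APP_PID" 2>/dev/null
    exit 1
  fi

  if ! kill -0 "$APP_PID" 2>/dev/null; then
    wait "$APP_PID"   # collect the application's exit status
    APP_STATUS=$?
    echo "Application exited with status $APP_STATUS. Shutting down OpenBao Agent."
    kill -TERM "$BAO_PID" 2>/dev/null
    exit "$APP_STATUS"
  fi

  sleep 2
done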

Step 3: The Unified Health Check (healthcheck.sh)

An image has only one effective HEALTHCHECK instruction (if several are listed, only the last one applies), so this script checks both components.

#!/bin/bash

# 1. Check if OpenBao Agent is running (-x: exact process-name match)
pgrep -x bao > /dev/null || exit 1

# 2. Check if the secret file exists and is not empty
if [ ! -s /app/secrets/db_password ]; then
  exit 1
fi

# 3. Check if the App is responsive (replace port/path as needed)
curl -fsS --max-time 5 http://localhost:8080/health > /dev/null || exit 1

exit 0
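The last step of the health check assumes the application answers `GET /health` on port 8080. If your app does not already expose such a route, the following Python sketch shows one using only the standard library. It also reports unhealthy when the rendered secret is missing or empty. The function names and the 503-on-missing-secret behavior are assumptions for illustration, not part of OpenBao or Docker:

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from pathlib import Path
from typing import Tuple

# Hypothetical path; matches the template destination in agent-config.hcl
SECRET_FILE = Path("/app/secrets/db_password")


def health_status(secret_file: Path) -> Tuple[int, bytes]:
    """200 if the rendered secret exists and is non-empty, else 503."""
    try:
        if secret_file.stat().st_size > 0:
            return 200, b"ok\n"
    except OSError:
        pass  # missing or unreadable secret counts as unhealthy
    return 503, b"secret unavailable\n"


def make_handler(secret_file: Path):
    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            if self.path != "/health":
                self.send_error(404)
                return
            code, body = health_status(secret_file)
            self.send_response(code)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format: str, *args) -> None:
            pass  # keep the periodic health probes out of the application log

    return HealthHandler


def start_health_server(port: int = 8080, secret_file: Path = SECRET_FILE) -> ThreadingHTTPServer:
    """Serve /health on 127.0.0.1:port from a daemon thread and return the server."""
    server = ThreadingHTTPServer(("127.0.0.1", port), make_handler(secret_file))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

In a real service you would more likely add this route to your existing web framework. A standalone server on 127.0.0.1:8080 would conflict with an app that already listens on that port.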

Step 4: The Dockerfile

We use a multi-stage build to copy the bao binary from the official OpenBao image into your application image.

# Stage 1: Get the OpenBao binary (pin a specific version tag instead of :latest for reproducible builds)
FROM openbao/openbao:latest AS bao-source

# Stage 2: Your Application
FROM python:3.9-slim

# Install dependencies for healthcheck and process management
RUN apt-get update && apt-get install -y curl procps && rm -rf /var/lib/apt/lists/*

WORKDIR /app

# COPY the 'bao' binary. Note: The binary is usually at /bin/bao or /usr/local/bin/bao
COPY --from=bao-source /bin/bao /usr/local/bin/bao

# Copy Configs
COPY agent-config.hcl /etc/bao/agent-config.hcl
COPY entrypoint.sh /usr/local/bin/entrypoint.sh
COPY healthcheck.sh /usr/local/bin/healthcheck.sh

# Set permissions
RUN chmod +x /usr/local/bin/bao \
    && chmod +x /usr/local/bin/entrypoint.sh \
    && chmod +x /usr/local/bin/healthcheck.sh \
    && mkdir -p /etc/bao

# Copy Application Code
COPY . .

# Define the Healthcheck
HEALTHCHECK --interval=30s --timeout=10s --retries=3 \
  CMD /usr/local/bin/healthcheck.sh

ENTRYPOINT ["/usr/local/bin/entrypoint.sh"]

Step 5: The Docker Compose (Swarm Stack)

This is where you define the tmpfs volume to ensure compliance.

version: '3.8'

services:
  webapp:
    image: my-registry/my-bundled-app:latest
    deploy:
      replicas: 3
      restart_policy:
        condition: any
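    environment:
      # Address of your OpenBao server (use https:// once TLS is configured)
      BAO_ADDR: "http://openbao.internal:8200"
      VAULT_ADDR: "http://openbao.internal:8200" # Vault-era name, kept for tooling that still reads it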
    volumes:
      # PCI Compliance: Map /app/secrets to RAM (tmpfs)
      - type: tmpfs
        target: /app/secrets
        tmpfs:
          size: 20m  # Limit size to prevent memory exhaustion DoS
          mode: 0700 # Strict permissions (Owner only)
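    configs:
      # Inject the AppRole credentials the agent reads (read-only) to authenticate.
      # Swarm configs are not encrypted at rest and are not mounted on tmpfs; for the
      # secret_id, a Swarm secret is the safer choice (update secret_id_file_path to match).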
      - source: bao_role_id
        target: /etc/bao/role_id
      - source: bao_secret_id
        target: /etc/bao/secret_id

configs:
  bao_role_id:
    external: true
  bao_secret_id:
    external: true

4. Operational Best Practices

Handling Rotation

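When the database credentials rotate (e.g., on a 1-hour lease):

  1. The OpenBao Agent (inside the container) renews the lease while it can. Once the lease can no longer be renewed (for example, at its maximum TTL), the agent requests new credentials from the OpenBao Server.
  2. The agent rewrites the file /app/secrets/db_password in the tmpfs volume.
  3. The old credentials are revoked when their lease expires, so the application must pick up the new value in time.
  4. Application Response:
    • Option A (Hot Reload): Your app watches the file and re-reads it.
    • Option B (Signal): Uncomment the command line in the template block of agent-config.hcl so the agent sends SIGHUP to your app after each render, forcing a reload.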
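Both options can be sketched in Python, the language of the example app. `RotatingSecret` is a hypothetical helper, not an OpenBao API.

- `get()` implements Option A. It stats the file on each call, which is a polling-on-access variant rather than an inotify watch, and re-reads the file when the inode, modification time or size changes. This also catches an agent that replaces the file instead of editing it in place.
- `request_reload()` supports Option B. The SIGHUP handler only sets a flag, and the actual read happens on the next `get()`, so no locks or I/O run inside the signal handler.

```python
import os
import signal
import threading
from pathlib import Path
from typing import Optional, Tuple


class RotatingSecret:
    """Hypothetical helper that serves a secret file's contents across rotations."""

    def __init__(self, path: str) -> None:
        self._path = Path(path)
        self._lock = threading.Lock()  # guards the cache against concurrent get() calls
        self._stamp: Optional[Tuple[int, int, int]] = None
        self._value = ""
        self._force_reload = False  # set by the SIGHUP handler (Option B)

    def request_reload(self) -> None:
        """Force a re-read on the next get(). Only sets a flag: safe in a signal handler."""
        self._force_reload = True

    def get(self) -> str:
        """Return the current secret, re-reading the file only when it changed (Option A)."""
        with self._lock:
            st = os.stat(self._path)
            stamp = (st.st_ino, st.st_mtime_ns, st.st_size)
            if self._force_reload or stamp != self._stamp:
                # Clear the flag before reading so a signal arriving mid-read is not lost.
                self._force_reload = False
                self._value = self._path.read_text(encoding="utf-8").rstrip("\n")
                self._stamp = stamp
            return self._value


def install_sighup_reload(secret: RotatingSecret) -> None:
    """Option B: on SIGHUP (e.g. from the template's pkill -HUP command), schedule a reload.

    signal.signal() must be called from the main thread.
    """
    signal.signal(signal.SIGHUP, lambda signum, frame: secret.request_reload())
```

Typical use:

1. Create one `RotatingSecret("/app/secrets/db_password")`.
2. If you use Option B, call `install_sighup_reload()` from the main thread at startup.
3. Call `get()` whenever you open a new database connection.

Connections opened with the old credentials may stop working once the old lease is revoked, so a connection pool should also be recycled after a change.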

Troubleshooting

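If a container is stuck in a restart loop:

  1. Check docker service logs <service_name>. The entrypoint.sh script prints “CRITICAL” errors to stdout. The agent's own log (/var/log/bao-agent.log inside the container) is only dumped there when the agent fails during startup.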
  2. Ensure the tmpfs size is adequate (though 20MB is plenty for text secrets).
  3. Verify network connectivity from the container to the OpenBao server address.
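For step 3, calling OpenBao's unauthenticated sys/health endpoint from inside the container (e.g. via docker exec) separates network problems from authentication problems. The Python sketch below uses only the standard library. The status-code meanings in the docstring follow the Vault-compatible sys/health API, and the function name is hypothetical:

```python
import json
import os
import urllib.error
import urllib.request
from typing import Optional


def check_openbao(addr: Optional[str] = None, timeout: float = 5.0) -> str:
    """Probe GET <addr>/v1/sys/health and summarise the result.

    Falls back to BAO_ADDR, then VAULT_ADDR, from the environment. sys/health
    returns non-2xx codes for some reachable states (e.g. 429 for a standby,
    503 when sealed), so any HTTP response at all proves the network path works.
    """
    addr = addr or os.environ.get("BAO_ADDR") or os.environ.get("VAULT_ADDR", "")
    if not addr:
        return "UNCONFIGURED: set BAO_ADDR or VAULT_ADDR"
    url = addr.rstrip("/") + "/v1/sys/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status, body = resp.status, resp.read()
    except urllib.error.HTTPError as err:  # the server answered with a non-2xx code
        status, body = err.code, err.read()
    except OSError as err:  # DNS failure, connection refused, timeout, TLS error
        return f"UNREACHABLE: {url}: {err}"
    try:
        info = json.loads(body)
    except ValueError:
        info = None
    if not isinstance(info, dict):
        info = {}
    return f"HTTP {status}: sealed={info.get('sealed')} standby={info.get('standby')}"
```

An "UNREACHABLE" result points at DNS, overlay networking or firewall rules. Any "HTTP …" result means the network path is fine and the problem is more likely in AppRole authentication or policies.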

Security Hardening

  • AppRole: Ensure the secret_id used for initial authentication is short-lived or wrapped.
  • Memory Limit: Always set a size limit on the tmpfs volume to prevent a compromised container from filling the host RAM and causing a node crash.
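To confirm from inside a running container that the size limit actually took effect, you can compare the mount's capacity against the limit. The Python sketch below uses `os.statvfs`. It assumes the `20m` in the stack file corresponds to 20 MiB; because the check uses `<=`, it also passes if the limit is interpreted as 20,000,000 bytes. The function names are hypothetical:

```python
import os


def mount_capacity_bytes(path: str = "/app/secrets") -> int:
    """Total size in bytes of the filesystem holding `path`, via statvfs(2)."""
    st = os.statvfs(path)
    return st.f_blocks * st.f_frsize


def within_limit(path: str = "/app/secrets", limit_bytes: int = 20 * 1024 * 1024) -> bool:
    """True if the mount is no larger than the intended tmpfs limit (20 MiB by default).

    If no size limit was applied, statvfs typically reports a much larger
    capacity and this check fails.
    """
    return mount_capacity_bytes(path) <= limit_bytes
```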

Discover more from HTTP Expert
