Skip to content

Bottlerocket OS Administration

A comprehensive guide to managing Bottlerocket OS instances with AWS Systems Manager Session Manager, containerd, and essential administration tasks.

Quick Start

Prerequisites

  • AWS CLI configured with appropriate IAM permissions
  • Session Manager plugin installed
  • Instance with SSMServiceRole and AmazonSSMManagedInstanceCore policy
  • Bottlerocket instance with control container enabled

Essential Setup Commands

For instructions on installing the Session Manager plugin on your operating system, please refer to the official AWS documentation: Install the Session Manager plugin for the AWS CLI.

After installation, you can verify it and connect to an instance.

# Verify installation
session-manager-plugin

# Connect to instance
aws ssm start-session --target $INSTANCE_ID --region $REGION

Core Concepts

Container Architecture

  • Control Container: Provides SSM access and API management (enabled by default)
  • Admin Container: Privileged access for debugging (disabled by default)
  • Host Containers: Out-of-band access to host OS with configurable privileges
  • Workload Containers: Application containers managed by orchestrators

Security Model

  • dm-verity: Read-only root filesystem with cryptographic integrity checking
  • SELinux: Enforcing mode with process labels (container_t, control_t, super_t)
  • Immutable OS: Updates via complete image replacement, not package managers
  • Minimal Attack Surface: No SSH, shell, or package manager by default

AWS SSM Session Manager

Basic Connection

# Connect to instance via SSM
aws ssm start-session --target i-1234567890abcdef0 --region us-west-2

# Connect with specific profile
aws ssm start-session --target $INSTANCE_ID --region $REGION --profile myprofile

# List managed instances
aws ssm describe-instance-information --region us-west-2

Setting Up SSM Access

1. Create IAM Service Role

# Create trust policy
cat <<EOF > ssmservice-trust.json
{
    "Version": "2012-10-17",
    "Statement": {
        "Effect": "Allow",
        "Principal": {
            "Service": "ssm.amazonaws.com"
        },
        "Action": "sts:AssumeRole"
    }
}
EOF

# Create the role
aws iam create-role \
    --role-name SSMServiceRole \
    --assume-role-policy-document file://ssmservice-trust.json

# Attach SSM policy
aws iam attach-role-policy \
    --role-name SSMServiceRole \
    --policy-arn arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore

2. Configure Control Container

# Create SSM activation
export SSM_ACTIVATION="$(aws ssm create-activation \
  --iam-role SSMServiceRole \
  --registration-limit 100 \
  --region us-west-2 \
  --output json)"

# Extract activation details
SSM_ACTIVATION_ID="$(jq -r '.ActivationId' <<< ${SSM_ACTIVATION})"
SSM_ACTIVATION_CODE="$(jq -r '.ActivationCode' <<< ${SSM_ACTIVATION})"

# Create user data for control container
CONTROL_USER_DATA="$(echo '{"ssm":{"activation-id":"'${SSM_ACTIVATION_ID}'","activation-code":"'${SSM_ACTIVATION_CODE}'","region":"us-west-2"}}' | base64 -w0)"

# Add to user-data.toml
cat <<EOF >>user-data.toml
[settings.host-containers.control]
enabled = true
user-data = "${CONTROL_USER_DATA}"
# Note: Replace vX.Y.Z with the latest version from the ECR Public Gallery
# https://gallery.ecr.aws/bottlerocket/bottlerocket-control
source = "public.ecr.aws/bottlerocket/bottlerocket-control:vX.Y.Z"
EOF

3. Enable Admin Container (if needed)

Note: The container versions below are examples. Always check the Amazon ECR Public Gallery for the latest version tags for the bottlerocket-control and bottlerocket-admin containers before use.

# Via user data (during launch)
PUBKEY_FILE="${HOME}/.ssh/id_rsa.pub"
PUBKEY=$(< "${PUBKEY_FILE}")
ADMIN_USER_DATA="$(echo '{"ssh": {"authorized-keys": ["'${PUBKEY}'"]}}' | base64 -w 0)"

cat <<EOF >>user-data.toml
[settings.host-containers.admin]
enabled = true
superpowered = true
user-data = "${ADMIN_USER_DATA}"
# Note: Replace vX.Y.Z with the latest version from the ECR Public Gallery
# https://gallery.ecr.aws/bottlerocket/bottlerocket-admin
source = "public.ecr.aws/bottlerocket/bottlerocket-admin:vX.Y.Z"
EOF

Session Management

# Start interactive session
aws ssm start-session --target $INSTANCE_ID

# Run single command
aws ssm send-command \
    --instance-ids $INSTANCE_ID \
    --document-name "AWS-RunShellScript" \
    --parameters 'commands=["uptime"]'

# Get command output
aws ssm get-command-invocation \
    --command-id $COMMAND_ID \
    --instance-id $INSTANCE_ID

Container Management with ctr

Container Runtime Access

# Access the host containerd socket
sudo ctr --address /run/host-containerd/host-containerd.sock

# Common namespaces
ctr namespaces list
# k8s.io (Kubernetes)
# moby (Docker/ECS)
# default

Image Management

# List images
ctr --namespace k8s.io images list
ctr --namespace moby images list

# Pull image
ctr --namespace k8s.io images pull docker.io/nginx:latest

# Remove image
ctr --namespace k8s.io images remove docker.io/nginx:latest

# Import image from tar
ctr --namespace k8s.io images import image.tar

# Export image to tar
ctr --namespace k8s.io images export image.tar docker.io/nginx:latest

# Tag image
ctr --namespace k8s.io images tag docker.io/nginx:latest my-registry/nginx:v1

Container Operations

# List containers
ctr --namespace k8s.io containers list
ctr --namespace moby containers list

# List running tasks
ctr --namespace k8s.io tasks list

# View container info
ctr --namespace k8s.io containers info $CONTAINER_ID

# Execute command in running container
ctr --namespace k8s.io tasks exec --exec-id $EXEC_ID $CONTAINER_ID sh

# Kill task
ctr --namespace k8s.io tasks kill $CONTAINER_ID

# Remove container
ctr --namespace k8s.io containers remove $CONTAINER_ID

Advanced Operations

# Create container (without starting)
ctr --namespace k8s.io containers create docker.io/nginx:latest nginx-test

# Start container as task
ctr --namespace k8s.io tasks start nginx-test

# Attach to running container
ctr --namespace k8s.io tasks attach nginx-test

# Get container metrics
ctr --namespace k8s.io tasks metrics $CONTAINER_ID

# Create snapshot
ctr --namespace k8s.io snapshots prepare snapshot-name $CONTAINER_ID

# List snapshots
ctr --namespace k8s.io snapshots list

Administration Tasks

Host Container Management

From Control Container

# Enable admin container
enable-admin-container

# Enter admin container
enter-admin-container

# Check admin container status
apiclient get host-containers.admin

From Admin Container

# Access privileged host namespace
sudo sheltie

# Run single command with elevated privileges
sudo sheltie -c "command_here"

# Common privileged operations
sudo sheltie -c "systemctl status kubelet"
sudo sheltie -c "journalctl -u kubelet --no-pager"
sudo sheltie -c "crictl ps"

API Client Operations

# Set configuration
apiclient set host-containers.admin.enabled=true
apiclient set settings.kubernetes.cluster-name=my-cluster

# Get configuration
apiclient get settings
apiclient get host-containers

# Execute in container
apiclient exec admin bash
apiclient exec control enable-admin-container

# Apply settings from file
apiclient apply --from-file settings.json

# Generate configuration
apiclient generate-settings > current-settings.json

System Information

# OS version and build info
apiclient get os
cat /etc/os-release

# Kernel information
uname -a
apiclient get kernel

# Container runtime info
ctr version
apiclient get container-runtime

# Network configuration
apiclient get network
ip addr show

# Storage information
df -h
lsblk
apiclient get storage

Service Management

# From admin container with sheltie
sudo sheltie -c "systemctl status containerd"
sudo sheltie -c "systemctl status kubelet"
sudo sheltie -c "systemctl status ecs"
sudo sheltie -c "systemctl restart containerd"

# View service logs
sudo sheltie -c "journalctl -u containerd --no-pager"
sudo sheltie -c "journalctl -u kubelet -f"
sudo sheltie -c "journalctl --boot --no-pager"

Troubleshooting

Log Collection

# Generate comprehensive log archive
sudo sheltie
logdog

# Fetch logs via SSH
ssh -i YOUR_KEY_FILE ec2-user@YOUR_HOST \
    "cat /.bottlerocket/support/bottlerocket-logs.tar.gz" > bottlerocket-logs.tar.gz

# Fetch logs via kubectl (for Kubernetes)
kubectl get --raw \
    "/api/v1/nodes/NODE_NAME/proxy/logs/support/bottlerocket-logs.tar.gz" > bottlerocket-logs.tar.gz

Common Issues and Solutions

1. Container Runtime Issues

# Check containerd status
sudo sheltie -c "systemctl status containerd"

# Restart containerd
sudo sheltie -c "systemctl restart containerd"

# Check containerd configuration
sudo sheltie -c "cat /etc/containerd/config.toml"

# View containerd logs
sudo sheltie -c "journalctl -u containerd --no-pager -n 100"

2. Kubernetes Node Issues

# Check kubelet status
sudo sheltie -c "systemctl status kubelet"

# View kubelet configuration
apiclient get settings.kubernetes

# Check node readiness
kubectl get nodes
kubectl describe node $NODE_NAME

# Restart kubelet
sudo sheltie -c "systemctl restart kubelet"

3. Network Connectivity

# Check network interfaces
ip addr show
ip route show

# Test DNS resolution
nslookup kubernetes.default.svc.cluster.local

# Check iptables rules (from admin container)
sudo sheltie -c "iptables -L"
sudo sheltie -c "iptables -t nat -L"

# Test container network
ctr --namespace k8s.io tasks exec --exec-id test nginx-test ping 8.8.8.8

4. Storage Issues

# Check disk usage
df -h
lsblk

# Check mount points
mount | grep bottlerocket

# EBS volume attachment (AWS)
aws ec2 describe-volumes --volume-ids $VOLUME_ID

# Container storage usage
ctr --namespace k8s.io images list -q | xargs ctr --namespace k8s.io images usage

5. SSM Connection Problems

# Check SSM agent status
sudo sheltie -c "systemctl status amazon-ssm-agent"

# Restart SSM agent
sudo sheltie -c "systemctl restart amazon-ssm-agent"

# Verify instance registration
aws ssm describe-instance-information --filters "Key=InstanceIds,Values=$INSTANCE_ID"

# Check IAM permissions
aws sts get-caller-identity
aws iam list-attached-role-policies --role-name $ROLE_NAME

Performance Monitoring

# CPU and memory usage
top
htop (if available)
apiclient get metrics

# Container resource usage
ctr --namespace k8s.io tasks metrics
crictl stats (from admin container)

# System resource limits
apiclient get settings.oci-defaults

# Network statistics
ss -tuln
netstat -i

# I/O statistics
iostat
iotop (if available)

Security Best Practices

Container Security

# Disable privileged containers (ECS)
apiclient set settings.ecs.allow-privileged-containers=false

# Configure SELinux labels for containers
apiclient set settings.oci-defaults.capabilities=[]
apiclient set settings.container-runtime.selinux-label="system_u:system_r:container_t:s0"

Host Container Hardening

# Disable admin container when not needed
apiclient set host-containers.admin.enabled=false

# Disable control container (careful - may lose remote access)
apiclient set host-containers.control.enabled=false

# Limit superpowered containers
# Avoid: superpowered = true unless absolutely necessary

Network Security

# Configure proxy settings if required
apiclient set settings.network.proxy.http-proxy="http://proxy:8080"
apiclient set settings.network.proxy.https-proxy="https://proxy:8080"
apiclient set settings.network.proxy.no-proxy="localhost,127.0.0.1,.internal"

# Review iptables rules
sudo sheltie -c "iptables -L -n"

Access Control with Pod Security Admission

Pod Security Policies (PSPs) were deprecated in Kubernetes v1.21 and removed entirely in v1.25. Security for pods is now enforced using Pod Security Admission (PSA), which applies security standards at the namespace level. You can enforce privileged, baseline, or restricted policies by labeling your namespaces.

# Example: Label a namespace to enforce the baseline policy
kubectl label --overwrite ns YOUR_NAMESPACE pod-security.kubernetes.io/enforce=baseline

# Example: Label a namespace to warn on violations of the restricted policy
kubectl label --overwrite ns YOUR_NAMESPACE pod-security.kubernetes.io/warn=restricted

Updates and Maintenance

OS Updates

# Check current version
apiclient get os.version-id
apiclient get os.build-id

# Configure automatic updates
# Note: Replace aws-k8s-1.XX with your desired and supported Kubernetes version.
apiclient set settings.updates.metadata-base-url="https://updates.bottlerocket.aws/2020-07-07/aws-k8s-1.XX/x86_64/"
apiclient set settings.updates.targets-base-url="https://updates.bottlerocket.aws/2020-07-07/aws-k8s-1.XX/x86_64/"

# Check for available updates
apiclient check-update

# Apply updates
apiclient update apply

# Rollback if needed
apiclient update rollback

# Reboot to new image
apiclient reboot

Container Image Management

# Clean up unused images
ctr --namespace k8s.io images prune
ctr --namespace moby images prune

# Check image sizes
ctr --namespace k8s.io images list -q | xargs -I {} ctr --namespace k8s.io images usage {}

# Update container runtime settings
apiclient set settings.container-runtime.max-container-log-line-size="16384"
apiclient set settings.container-runtime.max-concurrent-downloads="6"

AMI Management

Finding Bottlerocket AMIs

Note: The Kubernetes version 1.27 used below is an example. You should find the latest supported version for your use case.

# List all available Bottlerocket variants to find the latest K8s version
aws ssm get-parameters-by-path \
    --path "/aws/service/bottlerocket" \
    --query "Parameters[].Name" --output text | sort | uniq

# Get latest AMI ID from SSM for a specific K8s version
aws ssm get-parameter \
    --region us-west-2 \
    --name "/aws/service/bottlerocket/aws-k8s-1.27/x86_64/latest/image_id" \
    --query Parameter.Value --output text

# Get AMI details for a specific K8s version
aws ssm get-parameters --region us-west-2 \
    --names "/aws/service/bottlerocket/aws-k8s-1.27/x86_64/latest/image_id" \
            "/aws/service/bottlerocket/aws-k8s-1.27/x86_64/latest/image_version" \
    --output json | jq -r '.Parameters | .[] | "\(.Name): \(.Value)"'

# List all available variants (alternative)
aws ssm get-parameters-by-path \
    --path "/aws/service/bottlerocket" \
    --recursive --region us-west-2 | jq -r '.Parameters[].Name' | sort

Instance Launch

# Launch with user data
aws ec2 run-instances \
    --image-id $AMI_ID \
    --count 1 \
    --instance-type m5.large \
    --key-name my-key \
    --security-group-ids sg-12345678 \
    --subnet-id subnet-12345678 \
    --iam-instance-profile Name=BottlerocketInstanceProfile \
    --user-data file://user-data.toml

Quick Reference

Essential Commands

Task Command
Connect via SSM aws ssm start-session --target $INSTANCE_ID
Enable admin container enable-admin-container (from control)
Access admin container enter-admin-container (from control)
Privileged shell sudo sheltie (from admin)
List containers ctr --namespace k8s.io containers list
Generate logs logdog (from admin with sheltie)
Get OS info apiclient get os
Set configuration apiclient set key=value
Apply updates apiclient update apply

Container Namespaces

Namespace Purpose
k8s.io Kubernetes containers and images
moby Docker/ECS containers and images
default Default containerd namespace

Important File Locations

Path Description
/run/host-containerd/host-containerd.sock Host containerd socket
/run/containerd/containerd.sock Container containerd socket
/etc/containerd/config.toml Containerd configuration
/.bottlerocket/support/ Log and debug files
/etc/os-release OS version information

SELinux Labels

Label Usage Privileges
container_t Regular containers Limited
control_t Privileged containers API socket access
super_t Superpowered containers Full host access

Production Recommendations

  1. Security
  2. Keep admin container disabled unless actively debugging
  3. Use least-privilege IAM roles
  4. Implement Pod Security Admission standards
  5. Regular security updates

  6. Monitoring

  7. Set up CloudWatch for system metrics
  8. Monitor container resource usage
  9. Alert on update failures
  10. Track SSM session activity

  11. Maintenance

  12. Schedule regular OS updates
  13. Clean up unused container images
  14. Monitor disk usage
  15. Backup important configurations

  16. Access Control

  17. Limit SSH key distribution
  18. Use SSM for remote access
  19. Implement proper IAM policies
  20. Regular access reviews

  21. Troubleshooting Preparedness

  22. Document common procedures
  23. Maintain debug tooling access
  24. Regular log collection tests
  25. Incident response procedures