Kubernetes instancer
Deploy the rCTF Kubernetes instancer on GKE with the bundled Terraform modules and operator.
The Kubernetes instancer is the scalable backend for per-team challenge instances. An operator runs inside the cluster, turning a ChallengeInstance custom resource into a namespace, network policies, deployments, services, and Traefik routes.
The rCTF API talks only to the Kubernetes API server, through the instancer/k8s-instancer provider. The operator handles every other moving piece.
Warning (Hostile workloads)
Challenge images run untrusted code. The defaults assume strict isolation, but new variables, wider RBAC, or any other “sensitive” config changes can quickly break that assumption.
Architecture#
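A working deployment has three cooperating components: the rCTF API with the instancer/k8s-instancer provider, the operator running inside the cluster, and Traefik as the ingress router.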
A participant request flows through these in order:
The API receives PUT /api/v2/integrations/challs/:id/instance, validates the challenge config with the provider schema, then creates a cluster-scoped ChallengeInstance resource in the rctf-instancer.osec.io/v1 API group. The CR carries the challenge ID, team ID, expiry, pod specs, and expose entries.
The operator watches ChallengeInstance events and runs its reconciliation loop. It adds the rctf.osec.io/finalizer finalizer, then creates a namespace, network policies, deployments, services, and Traefik IngressRoute or IngressRouteTCP resources for each expose entry.
Each expose entry gets a hostname of the form <hostPrefix>-<uid>.<instancer-host>. Traefik matches the hostname against the generated IngressRoute and forwards traffic to the per-instance service.
When time.Now() passes spec.expiresAt, the controller deletes the ChallengeInstance. The deletion timestamp triggers the finalizer, which deletes the namespace and removes the finalizer once the namespace is gone. Manual deletion through the rCTF API follows the same path.
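The expiry step above boils down to a timestamp comparison. The following is a minimal Python sketch of that check, not the controller's actual code. It assumes spec.expiresAt is serialized as an RFC 3339 UTC timestamp (the usual Kubernetes convention), and the function names are hypothetical.

```python
from datetime import datetime, timezone


def parse_expires_at(value: str) -> datetime:
    """Parse an RFC 3339 timestamp such as "2026-02-01T12:30:00Z".

    Assumption: spec.expiresAt uses this format; the CRD is the authority.
    """
    # Older Python versions reject a trailing "Z", so normalize it first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def seconds_until_expiry(expires_at: str, now: datetime) -> float:
    """Positive while the instance is live, negative once it has expired."""
    return (parse_expires_at(expires_at) - now).total_seconds()


def should_delete(expires_at: str, now: datetime) -> bool:
    """True once the current time has passed the expiry timestamp."""
    return seconds_until_expiry(expires_at, now) < 0


now = datetime(2026, 2, 1, 12, 0, 0, tzinfo=timezone.utc)
print(should_delete("2026-02-01T11:59:59Z", now))          # True
print(seconds_until_expiry("2026-02-01T12:30:00Z", now))   # 1800.0
```

A reconciler could reuse the positive value as a requeue delay so that it wakes up again at the expiry time. That is one possible design, not a statement about how the operator schedules its work.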
Namespaces are deterministic and named inst-<challenge-id>-<team-id> so the controller can find them across restarts. Every child resource inherits owner references from the ChallengeInstance, so cluster-level garbage collection acts as a safety net behind the explicit finalizer.
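Because the namespace name is derived purely from the two IDs, it can be recomputed anywhere without cluster state. A small Python sketch of that derivation follows. The helper names are hypothetical, and the validity check reflects the general Kubernetes rule that namespace names must be RFC 1123 DNS labels; whether the controller validates or truncates long IDs is not covered here.

```python
import re

# RFC 1123 DNS label: lowercase alphanumerics and "-", starting and ending
# with an alphanumeric. Kubernetes applies this rule to namespace names.
_DNS_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")


def namespace_name(challenge_id: str, team_id: str) -> str:
    """Build the deterministic per-instance namespace name."""
    return f"inst-{challenge_id}-{team_id}"


def is_valid_namespace(name: str) -> bool:
    """Check the 63-character DNS label rule for namespace names."""
    return len(name) <= 63 and _DNS_LABEL.fullmatch(name) is not None


name = namespace_name("mirror-temple", "42")
print(name, is_valid_namespace(name))  # inst-mirror-temple-42 True
```

The length check is worth keeping in mind when choosing challenge IDs: the inst- prefix and the team ID both count toward the 63-character limit.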
Prerequisites#
The Terraform example assumes GKE plus Cloudflare for DNS and ACME. GCP Cloud DNS works as a drop-in alternative.
The instancer’s public hostname is <instancer_subdomain>.<instancer_zone> (or just <instancer_zone> when no subdomain is set). All per-instance hostnames live under a wildcard one level below.
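The hostname rules can be sketched in a few lines of Python. This is an illustration under stated assumptions, not code from the Terraform module: the function names are hypothetical, the uid value is treated as an opaque string, and the per-instance form <hostPrefix>-<uid>.<instancer-host> is the one described under Architecture.

```python
def instancer_host(zone: str, subdomain: str = "") -> str:
    """Public instancer hostname: <subdomain>.<zone>, or <zone> alone."""
    return f"{subdomain}.{zone}" if subdomain else zone


def wildcard_name(host: str) -> str:
    """Wildcard DNS/TLS name that sits one level below the instancer host."""
    return f"*.{host}"


def instance_hostname(host_prefix: str, uid: str, host: str) -> str:
    """Per-instance hostname of the form <hostPrefix>-<uid>.<instancer-host>."""
    return f"{host_prefix}-{uid}.{host}"


def covered_by_wildcard(hostname: str, host: str) -> bool:
    """A wildcard matches exactly one extra label in front of the host."""
    suffix = "." + host
    if not hostname.endswith(suffix):
        return False
    label = hostname[: -len(suffix)]
    return bool(label) and "." not in label


host = instancer_host("ctf.example.com", "instances")
name = instance_hostname("mirror-temple", "ab12cd", host)  # uid is made up
print(wildcard_name(host))              # *.instances.ctf.example.com
print(covered_by_wildcard(name, host))  # True
```

Since host_prefix and the uid are joined with a hyphen rather than a dot, every instance hostname stays exactly one label below the instancer host, which is why a single wildcard certificate is enough.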
Controller image#
The operator image is published at ghcr.io/otter-sec/rctf-new/k8s-controller, and the matching install.yaml ships in the repo at apps/k8s-controller/dist/install.yaml. The Terraform k8s module reads that file directly and substitutes the configured hostname into the INSTANCER_HOST placeholder, so there’s nothing to build or push before running terraform apply.
Terraform variables#
The example terraform.tfvars lives in deploy/terraform/instancer/example/:
deploy/
  terraform/
    instancer/
      example/
        - main.tf                   Providers, GKE module wiring
        - dns.tf                    Cloudflare or GCP Cloud DNS record
        - tls.tf                    ACME wildcard certificate and Traefik TLSStore
        - rctf-instancer.tf         k8s module call and rCTF ServiceAccount
        - variables.tf              Input variables
        - terraform.tfvars.example  Example values
      modules/
        - gke/                      GKE cluster and Artifact Registry
        - k8s/                      Traefik, error pages, controller installer
Copy terraform.tfvars.example to terraform.tfvars and fill in the values for your deployment.
A minimal Cloudflare-backed file looks like this:
    cloudflare_api_token      = "<cloudflare-api-token>"
    letsencrypt_email_address = "ops@example.com"
    instancer_zone            = "ctf.example.com"
    instancer_subdomain       = "instances"
    ctf_name                  = "Example CTF"

    gcp_project_id               = "example-ctf"
    gcp_region                   = "us-central1"
    gcp_zone                     = "us-central1-a"
    gcp_instancer_cluster_name   = "rctf-instancer"
    gcp_instancer_machine_type   = "e2-standard-4"
    gcp_instancer_min_node_count = 1
    gcp_instancer_max_node_count = 8

To use GCP Cloud DNS instead of Cloudflare, comment out the Cloudflare blocks in dns.tf and tls.tf, uncomment the google_dns_record_set and gcloud ACME blocks, and set gcp_dns_managed_zone_name.
Deployment#
    cd deploy/terraform/instancer/example
    cp terraform.tfvars.example terraform.tfvars
    $EDITOR terraform.tfvars
    terraform init
    terraform apply

Terraform provisions GKE, the node pool, Artifact Registry, the Cloudflare or Cloud DNS record, the ACME wildcard certificate, Traefik, the error-pages deployment, and the rctf service account, and applies the bundled apps/k8s-controller/dist/install.yaml (pointing at the prebuilt ghcr.io/otter-sec/rctf-new/k8s-controller image). The first apply typically takes 10 to 15 minutes. ACME validation alone can add a few minutes if DNS propagation is slow.
    gcloud container clusters get-credentials rctf-instancer --project example-ctf --location us-central1
    kubectl get pods -n rctf-instancer-controller-system

The controller pod should be Running. Traefik comes up in the traefik namespace, with the wildcard certificate stored in the instancer-wildcard-tls Kubernetes Secret.
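Three Terraform outputs map directly to provider options:
- rctf_instancer_api_url maps to apiUrl
- rctf_instancer_auth_token maps to authToken
- rctf_instancer_ca_certificate maps to caCertificate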
Render them into rCTF’s rctf.d/:
    terraform output -raw rctf_instancer_api_url
    terraform output -raw rctf_instancer_auth_token
    terraform output -raw rctf_instancer_ca_certificate

    instancerProvider:
      name: instancer/k8s-instancer
      options:
        apiUrl: https://203.0.113.10
        authToken: <rctf_instancer_auth_token>
        caCertificate: |
          -----BEGIN CERTIFICATE-----
          ...
          -----END CERTIFICATE-----

caCertificate is required even when the API server certificate is already trusted by the host.
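Copying the three values by hand is error-prone, mostly because the certificate has to be indented correctly under the YAML block scalar. The Python sketch below shows one way to render the snippet. It assumes the rctf_instancer_ca_certificate output is PEM text, and the helper names are hypothetical. The strings are emitted double-quoted, which is valid YAML and avoids surprises with special characters.

```python
import json
import subprocess
import textwrap


def terraform_output(name: str, cwd: str = ".") -> str:
    """Read one output with `terraform output -raw <name>`."""
    result = subprocess.run(
        ["terraform", "output", "-raw", name],
        cwd=cwd, check=True, capture_output=True, text=True,
    )
    return result.stdout


def render_provider_config(api_url: str, auth_token: str, ca_certificate: str) -> str:
    """Render the instancerProvider block as YAML text."""
    # Assumption: ca_certificate is PEM text, indented here to sit under "|".
    pem = textwrap.indent(ca_certificate.strip(), " " * 6)
    return (
        "instancerProvider:\n"
        "  name: instancer/k8s-instancer\n"
        "  options:\n"
        f"    apiUrl: {json.dumps(api_url)}\n"
        f"    authToken: {json.dumps(auth_token)}\n"
        "    caCertificate: |\n"
        f"{pem}\n"
    )


def render_from_terraform(cwd: str = ".") -> str:
    """Pull the three outputs from the Terraform state in `cwd` and render them."""
    return render_provider_config(
        terraform_output("rctf_instancer_api_url", cwd),
        terraform_output("rctf_instancer_auth_token", cwd),
        terraform_output("rctf_instancer_ca_certificate", cwd),
    )
```

Calling render_from_terraform("deploy/terraform/instancer/example") after a successful apply returns text that can be written to a file under rctf.d/. The file name is up to you; this page does not prescribe one. The result contains the service account token, so treat it as a secret.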
Create an instanced challenge that uses the instancer/k8s-instancer provider and start it as a participant. The controller should create the inst-<challenge-id>-<team-id> namespace, and Traefik should serve the <hostPrefix>-<uid>.<instancer-host> hostname over HTTPS.
What Terraform provisions#
The example layers the GKE module, the k8s module, and the example-level resources in rctf-instancer.tf, dns.tf, and tls.tf:
Traefik is configured with three ports:
The wildcard certificate is issued by Terraform outside the cluster instead of through cert-manager, so DNS provider credentials never have to live inside the cluster. The blast radius of a cluster compromise stays limited to whatever certificates Terraform has already issued.
Network policies#
The controller creates three NetworkPolicy resources in every instance namespace:
The exposed label is applied automatically based on whether a pod is named by any expose[] entry. The egress label comes from the per-pod egress: true flag in instancerConfig. Challenges that shouldn’t reach the internet leave it false.
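The labeling rule can be pictured as a pure function of the pod entry and the expose list. The Python sketch below is illustrative only. It uses the snake_case field names from the Konata example on this page and assumes that an expose entry's container_name identifies the pod either by the pod's own name or by one of its container names. The example uses app for all three, so it cannot disambiguate. The actual label keys are not shown here, so the function returns plain booleans.

```python
def pod_flags(pod: dict, expose: list) -> dict:
    """Decide whether a pod would count as exposed and/or egress-enabled."""
    # Names that an expose entry could refer to (assumption, see lead-in).
    names = {pod.get("name")}
    names.update(c.get("name") for c in pod.get("spec", {}).get("containers", []))
    names.discard(None)

    exposed = any(entry.get("container_name") in names for entry in expose)
    # Egress is opt-in: anything other than an explicit true stays sealed off.
    egress = pod.get("egress", False) is True
    return {"exposed": exposed, "egress": egress}


expose = [{"kind": "https", "host_prefix": "mirror-temple",
           "container_name": "app", "container_port": 8080}]
app = {"name": "app", "egress": True, "spec": {"containers": [{"name": "app"}]}}
db = {"name": "db", "spec": {"containers": [{"name": "postgres"}]}}

print(pod_flags(app, expose))  # {'exposed': True, 'egress': True}
print(pod_flags(db, expose))   # {'exposed': False, 'egress': False}
```

In this picture a backing pod such as the hypothetical db above gets neither label, so it is reachable only under whatever the default policy in the namespace allows.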
Note (Cluster network plugin)
Network policies only enforce isolation when the cluster’s CNI supports them. GKE’s default dataplane enforces them. On a bare-metal cluster, make sure the chosen CNI honors NetworkPolicy.
Per-pod safety checklist#
Unlike Docker, Kubernetes’ PodSpec doesn’t have first-class fields for every resource cap, and the controller deploys what you give it verbatim. Set these on every pod you ship through instancerConfig.config.pods[]:
Warning (File descriptor / nofile limits)
Kubernetes’ PodSpec has no first-class ulimits field. The CRI passes container ulimits through containerd’s default_ulimits, which on GKE COS_CONTAINERD nodes isn’t exposed via the kubelet config. Changing it takes a custom node startup script or DaemonSet that rewrites /etc/containerd/config.toml.
If your challenge is sensitive to FD exhaustion, the practical workaround is to set the limit in the entrypoint:
    #!/bin/sh
    ulimit -n 1024
    exec /your/challenge "$@"

This is per-image, not platform-enforced, so it’s only as strong as the image. Don’t rely on it for hostile-input boundaries that absolutely must not break. Reach for a per-connection sandbox (nsjail) instead.
RBAC and the rCTF service account#
The example creates a single ServiceAccount named rctf in kube-system and a matching ClusterRole granting only what the API needs:
    rule:
      api_groups: ['rctf-instancer.osec.io']
      resources: ['challengeinstances']
      verbs: ['create', 'get', 'delete', 'patch']

The rCTF API never reads or writes any other resource type. The kubernetes_secret_v1.rctf_token resource issues a kubernetes.io/service-account-token so the token doesn’t rotate. The value comes back through the rctf_instancer_auth_token Terraform output.
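To see how narrow that rule is, it helps to evaluate a few requests against it. The Python sketch below mimics how a single Kubernetes RBAC rule is matched: the API group, resource, and verb must each appear in the rule's lists, with "*" as a wildcard. It is a simplified model for illustration, not the API server's authorizer, and the function names are hypothetical.

```python
# The rule from the example ClusterRole, copied as data.
RCTF_RULE = {
    "api_groups": ["rctf-instancer.osec.io"],
    "resources": ["challengeinstances"],
    "verbs": ["create", "get", "delete", "patch"],
}


def _matches(allowed: list, value: str) -> bool:
    """A list matches when it contains the value or the "*" wildcard."""
    return "*" in allowed or value in allowed


def rule_allows(rule: dict, api_group: str, resource: str, verb: str) -> bool:
    """True when all three request attributes are covered by the rule."""
    return (
        _matches(rule["api_groups"], api_group)
        and _matches(rule["resources"], resource)
        and _matches(rule["verbs"], verb)
    )


group = "rctf-instancer.osec.io"
print(rule_allows(RCTF_RULE, group, "challengeinstances", "create"))  # True
print(rule_allows(RCTF_RULE, group, "challengeinstances", "list"))    # False
print(rule_allows(RCTF_RULE, "", "namespaces", "get"))                # False
```

The empty string is the core API group, where namespaces, pods, and secrets live. Under this rule alone, a leaked rCTF token cannot list instances or touch anything the operator creates.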
The controller itself runs with its own RBAC from apps/k8s-controller/config/. It needs broad permissions on namespaces, deployments, services, network policies, and Traefik IngressRoute, IngressRouteTCP, and Middleware resources so it can reconcile per-instance objects. The CRD lives in apps/k8s-controller/config/crd/bases/ and is generated by make manifests.
Example challenge config (Konata)#
A complete instanced challenge as it would live in a Konata deployment repo. This is the web/mirror-temple config from the DiceCTF Quals 2026 challenge repository. Konata builds and pushes the image, then forwards instancer_config straight to rCTF, which hands it to the k8s-instancer provider.
    challenges:
      - category: web
        name: mirror-temple
        author: arcblroth
        description: |
          stare long enough at the void and the void stares back
        attachments:
          files:
            - 'Dockerfile'
            - 'chall/src/'
        flags:
          rctf:
            file: flag.txt
        instancer_config:
          challenge_integration_id: '{{ challenge.name }}'
          timeout_milliseconds: 1800000
          extendable: true
          expose:
            - kind: https
              host_prefix: '{{ challenge.name }}'
              container_name: app
              container_port: 8080
          config:
            pods:
              - name: app
                egress: true
                ports:
                  - protocol: TCP
                    name: http-service
                    port: 8080
                spec:
                  restartPolicy: Always
                  terminationGracePeriodSeconds: 0
                  automountServiceAccountToken: false
                  containers:
                    - name: app
                      image: '{{ images[challenge.name] }}'
                      ports:
                        - containerPort: 8080
                      resources:
                        requests:
                          cpu: '500m'
                          memory: '500Mi'
                        limits:
                          cpu: '3'
                          memory: '2Gi'
                      readinessProbe:
                        tcpSocket:
                          port: 8080
                        initialDelaySeconds: 5
                        periodSeconds: 3
                      securityContext:
                        runAsNonRoot: true
                        readOnlyRootFilesystem: true
                        allowPrivilegeEscalation: false
                        capabilities:
                          drop:
                            - ALL

    deployment:
      images:
        - build_context: .
          name: '{{ challenges[0].name }}'
          tag: latest
          registry_name: instancer-challenges
          platform: linux/amd64

Things worth pointing at in this example:
- egress: true on the pod opts it into the egress NetworkPolicy so the challenge can reach the public internet. Drop it for challenges that should be sealed off.
- Resource requests and limits are mandatory in practice. The controller schedules the pod normally, so an unset limit lets a single instance starve the node. Size them to the per-team load you expect at peak.
- readinessProbe keeps Traefik from routing to the pod before the app is up. Without it, the first request after creation often 502s while the container is still booting.
- securityContext locks the container down (read-only root FS, dropped capabilities, no root). The k8s-instancer namespace is already isolated by the per-namespace NetworkPolicy set, but a tight pod-level context is the second layer.
- {{ images[challenge.name] }} resolves to the fully-qualified registry path Konata pushed to (registries.instancer-challenges + the image name + tag).
- flags.rctf.file: flag.txt lets the flag live in a sibling file Konata reads at sync time, so the challenge directory stays self-contained.
For the rest of the Konata schema, see Konata.
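The hardening settings in the example above are easy to forget on the next challenge, so it can help to check them mechanically before syncing. The Python sketch below is a hypothetical lint over one entry of config.pods[], already parsed into a dict. It mirrors only the settings visible in the mirror-temple example and is not a complete policy or part of rCTF or Konata.

```python
def lint_pod(pod: dict) -> list:
    """Return human-readable findings for one config.pods[] entry."""
    findings = []
    spec = pod.get("spec", {})

    if spec.get("automountServiceAccountToken") is not False:
        findings.append("spec.automountServiceAccountToken should be false")

    for container in spec.get("containers", []):
        name = container.get("name", "<unnamed>")

        # Unset limits let one instance starve the node.
        limits = container.get("resources", {}).get("limits", {})
        for key in ("cpu", "memory"):
            if key not in limits:
                findings.append(f"{name}: resources.limits.{key} is not set")

        if "readinessProbe" not in container:
            findings.append(f"{name}: readinessProbe is missing")

        # Only the container-level securityContext is inspected here.
        sc = container.get("securityContext", {})
        if sc.get("runAsNonRoot") is not True:
            findings.append(f"{name}: runAsNonRoot should be true")
        if sc.get("readOnlyRootFilesystem") is not True:
            findings.append(f"{name}: readOnlyRootFilesystem should be true")
        if sc.get("allowPrivilegeEscalation") is not False:
            findings.append(f"{name}: allowPrivilegeEscalation should be false")
        if "ALL" not in sc.get("capabilities", {}).get("drop", []):
            findings.append(f"{name}: capabilities.drop should include ALL")

    return findings


bare = {"name": "app", "spec": {"containers": [{"name": "app", "image": "example"}]}}
for finding in lint_pod(bare):
    print(finding)
```

A pod that matches the example config produces an empty list. Kubernetes also accepts some of these fields at the pod level, which this sketch deliberately ignores to stay short.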
Local development with Kind#
For controller iteration, the README in apps/k8s-controller/ uses Kind. Routing inside the cluster needs cloud-provider-kind so that LoadBalancer services get an external IP.
    go install sigs.k8s.io/cloud-provider-kind@latest

Install Kind itself from its quick-start guide.

    cd apps/k8s-controller
    kind create cluster --name rctf --config kind-config.yaml

The bundled kind-config.yaml spins up one control plane and one worker node.
Leave this running in a separate session for the duration of development:
    cloud-provider-kind

Point the Terraform example at the local Kind context. In deploy/terraform/instancer/example/main.tf switch from the GCP-backed kubernetes, helm, and kubectl providers to the commented-out kind-rctf blocks, then apply:

    cd deploy/terraform/instancer/example
    terraform apply

Back at the repository root, install the CRDs and run the controller:

    cd apps/k8s-controller
    make install
    make run ARGS="-instancer-host instancer.test"

make install applies the CRDs from config/crd, and make run runs the controller against the current kubectl context with -instancer-host setting the hostname suffix.

    kubectl apply -f config/sample/rctf-instancer_v1_challengeinstance.yaml

The controller logs the reconciliation flow, and the sample’s namespace, service, and IngressRoute should appear. Tear the cluster down with kind delete cluster --name rctf when you’re done.
Troubleshooting#
The controller exposes Kubernetes events through standard kubectl describe output. Pair kubectl describe challengeinstance <name> with the controller logs (kubectl logs -n rctf-instancer-controller-system -l control-plane=controller-manager) to track down any reconciliation failure.