Setting Up Traefik as a Kubernetes Gateway API Provider with Tailscale (Headscale) Sidecar and cert-manager

A comprehensive guide to deploying Traefik v3.7 as a Kubernetes Gateway API provider with a Headscale-connected Tailscale sidecar for secure ingress and automatic TLS certificate provisioning with cert-manager.

A few weeks ago, I decided it was finally time to migrate my Kubernetes ingress setup from classic Ingress resources (backed by nginx-ingress) to the Gateway API. My cluster is a multi-node k3s setup spread across Oracle Cloud and a few Raspberry Pis at home. Some services are exposed on a public IP, but most are internal and reached over Tailscale, connected to my self-hosted Headscale server.

I chose Traefik as the Gateway API provider because of its native support for the Gateway API without requiring any CRD-proxying or extra controllers.

This post covers what I learned, the options I considered, the pitfalls I hit, and the final working setup.

Why Not the Tailscale Operator?

If you are on Tailscale SaaS, the official Tailscale Kubernetes Operator is the simplest way to expose cluster workloads to your tailnet. Annotate a Service with tailscale.com/expose: "true" (the value must be a quoted string), and it just works.

However, the operator has a limitation: it authenticates using OAuth credentials against login.tailscale.com and uses the Tailscale API to manage devices. There is no option to point it at a custom control server. If you run Headscale — and many self-hosters do — the operator simply does not work. The Headscale project tracked this request and closed it as wontfix.

This leaves the sidecar pattern. Run a Tailscale container alongside your proxy, and have Tailscale handle the connectivity while your proxy handles the traffic. This works with any control server — SaaS or self-hosted.

Gateway API vs Ingress

The Kubernetes Ingress resource has been the standard for years, but it has well-known limitations:

  • Ingress is a single-resource abstraction. There is no standard way to express TCP/UDP routing, TLS configuration at the listener level, or cross-namespace route references without vendor annotations.
  • Each controller re-implements these features through custom CRDs (IngressRoute, Middleware, etc.), locking you into that controller.
  • The Ingress spec was designed around a single “one ingress per host” model, which does not scale well for complex routing.

The Gateway API solves this by splitting responsibilities across multiple resource types:

  • GatewayClass — defines a class of load balancers (like StorageClass for storage)
  • Gateway — represents the load balancer instance with listener configuration (ports, TLS, hostnames)
  • HTTPRoute, TCPRoute, TLSRoute, GRPCRoute — route resources that attach to Gateways

The separation means platform operators can define GatewayClasses, and application teams can create their own routes without needing to touch the Gateway configuration.
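
To make the split concrete, here is a minimal GatewayClass sketch for this setup. The name internal-gateway matches the gatewayClassName used by the Gateway manifests in this post. The controllerName is the value I understand Traefik's Gateway provider to claim by default, so treat it as an assumption and check it against your Traefik version.

```yaml
# GatewayClass is cluster-scoped, so it has no namespace.
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: internal-gateway
spec:
  # Assumed default controller name for Traefik's kubernetesGateway provider.
  controllerName: traefik.io/gateway-controller
```

Once the controller picks the class up, its Accepted condition should move from Unknown to True.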

The Architecture

The final setup looks like this:

Client (tailnet)
    │
    ▼
Tailscale sidecar (kernel mode, iptables DNAT)
    │
    ▼
Traefik (TLS termination via cert-manager cert)
    │
    ▼
Gateway API HTTPRoute → backend Service → Pods

Everything runs in a single pod with two containers: tailscale and traefik. Tailscale handles the encrypted tunnel to the tailnet. In kernel mode (more on this later), it sets up iptables rules that DNAT traffic from the Tailscale interface directly to Traefik, preserving the original client IP.

The CRD Problem That Blocked Everything

When I first deployed Traefik with --providers.kubernetesGateway=true, the GatewayClass stayed stuck at Pending:

"message": "Waiting for controller",
"reason": "Pending",
"status": "Unknown",
"type": "Accepted"

The Traefik logs were filling up with:

E reflector.go: ... Failed to watch *v1.TLSRoute: the server could not find the requested resource

The root cause: I had installed the Gateway API CRDs months ago (v1.0 or so), and newer versions added resources like TLSRoute and BackendTLSPolicy. Traefik v3.7 expects the v1.5.1 standard channel, which includes these as standard resources.

Traefik’s WatchAll method calls WaitForCacheSync for all informers at startup. When TLSRoute is missing, WaitForCacheSync blocks forever. The event loop never starts, and the GatewayClass is never processed. It does not matter that you are not using TLSRoutes — Traefik tries to watch them anyway, and the missing CRD halts the entire provider.

The fix was straightforward: install the Gateway API v1.5.1 standard CRDs.

kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.5.1/standard-install.yaml

If you manage your cluster with ArgoCD, commit the raw CRD YAML to your repo and create an ArgoCD Application for it. CRDs are cluster-scoped, so the ArgoCD app needs to omit the destination namespace and the CreateNamespace sync option.
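
As a rough sketch of that ArgoCD setup, an Application for the CRDs could look like the following. The repository URL, path, and Application name are hypothetical placeholders, and the argocd namespace and default project assume a stock ArgoCD install. ServerSideApply is an optional sync option that I have found useful for large CRDs, not something the Gateway API requires.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: gateway-api-crds          # hypothetical name
  namespace: argocd               # assumes ArgoCD runs in the default namespace
spec:
  project: default
  source:
    repoURL: https://git.example.com/you/cluster-config.git  # hypothetical repo
    targetRevision: main
    path: crds/gateway-api        # hypothetical directory holding standard-install.yaml
  destination:
    # Only the cluster is set; CRDs are cluster-scoped, so no namespace here.
    server: https://kubernetes.default.svc
  syncPolicy:
    automated:
      prune: false                # avoid deleting CRDs (and their resources) by accident
      selfHeal: true
    syncOptions:
      - ServerSideApply=true      # optional; sidesteps annotation size limits on big CRDs
```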

RBAC for Gateway API

Traefik needs permission to watch Gateway API resources and update their status. The ClusterRole needs every Gateway API resource type including the newer ones:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: traefik-gateway
rules:
  - apiGroups: [""]
    resources: ["configmaps", "nodes", "services", "secrets", "namespaces", "pods"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["discovery.k8s.io"]
    resources: ["endpointslices"]
    verbs: ["list", "watch"]
  - apiGroups: ["gateway.networking.k8s.io"]
    resources:
      - backendtlspolicies
      - gatewayclasses
      - gateways
      - grpcroutes
      - httproutes
      - referencegrants
      - tcproutes
      - tlsroutes
    verbs: ["get", "list", "watch"]
  - apiGroups: ["gateway.networking.k8s.io"]
    resources:
      - backendtlspolicies/status
      - gatewayclasses/status
      - gateways/status
      - grpcroutes/status
      - httproutes/status
      - tcproutes/status
      - tlsroutes/status
    verbs: ["update"]

The status subresource permissions are essential — without them, Traefik cannot mark Gateway resources as Accepted or Programmed.
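
The ClusterRole only takes effect once it is bound to the identity Traefik runs as. A minimal sketch, assuming the internal-gateway ServiceAccount and namespace used by the Deployment later in this post:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: internal-gateway
  namespace: internal-gateway
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: traefik-gateway
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: traefik-gateway           # the ClusterRole defined above
subjects:
  - kind: ServiceAccount
    name: internal-gateway
    namespace: internal-gateway
```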

Tailscale Sidecar: Userspace vs Kernel Mode

The Tailscale container image (ghcr.io/tailscale/tailscale) uses a binary called containerboot as its entrypoint. It reads environment variables, starts tailscaled, and runs tailscale up to authenticate.

Default: Userspace Mode

By default, containerboot runs tailscaled with --tun=userspace-networking. In this mode, Tailscale uses a userspace network stack (gVisor’s netstack). Connections work, but there is a catch: Tailscale proxies traffic by making a new TCP connection to 127.0.0.1, so every request appears to come from localhost.

127.0.0.1 - - [03/Jun/2026:17:07:08 +0000] "GET / HTTP/2.0" 200 778

If you do not need client IPs in your logs, userspace mode is fine. It requires no special container capabilities.

Better: Kernel Mode

To get real client IPs, you need kernel mode. The containerboot binary respects the TS_USERSPACE environment variable. When set to "false", it passes --tun=tailscale0 to tailscaled, which creates a kernel tun device and uses iptables DNAT rules to forward traffic to the local application. The WireGuard data path still runs in userspace (wireguard-go), but the tun interface allows proper iptables handling that preserves the original source IP.

The container also needs extra privileges: kernel mode requires the NET_ADMIN capability and access to /dev/net/tun. Setting privileged: true is the simplest way to grant both.

env:
  - name: TS_USERSPACE
    value: "false"
# ...
securityContext:
  privileged: true

After switching, the logs show the actual client IP from the tailnet:

100.64.0.3 - - [03/Jun/2026:17:27:13 +0000] "GET / HTTP/2.0" 200 714

A Note About TS_EXTRA_ARGS

The containerboot binary processes two sets of extra arguments:

  • TS_EXTRA_ARGS — passed to tailscale up. Use this for --login-server, --accept-routes, --accept-dns.
  • TS_TAILSCALED_EXTRA_ARGS — passed to tailscaled. Use this for --tun, --socks5-server.

The --tun flag is controlled by TS_USERSPACE, not by TS_EXTRA_ARGS. Adding --tun=tailscale0 to TS_EXTRA_ARGS does nothing — it gets passed to tailscale up, which ignores it. This tripped me up for a while.
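
Put together, the three variables split up roughly like this. The --accept-dns=false and --verbose=1 values are only illustrative choices of mine, not settings from my deployment.

```yaml
env:
  # Selects kernel mode; containerboot derives the --tun flag from this.
  - name: TS_USERSPACE
    value: "false"
  # Flags for `tailscale up` (the CLI that logs the node in).
  - name: TS_EXTRA_ARGS
    value: "--login-server=https://headscale.yourdomain.com --accept-dns=false"
  # Flags for the `tailscaled` daemon itself.
  - name: TS_TAILSCALED_EXTRA_ARGS
    value: "--verbose=1"
```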

Connecting to Headscale

Point the sidecar at your Headscale server via the extra args:

env:
  - name: TS_EXTRA_ARGS
    value: "--login-server https://headscale.yourdomain.com"

The auth key comes from a Kubernetes Secret. I store mine in Infisical and sync it via the external-secrets operator.
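
If you do not use external-secrets, a hand-written Secret works just as well. This sketch uses the Secret name and key that the Deployment later in this post reads (internal-gateway and TS_AUTHKEY); the value is a placeholder for a pre-auth key generated on your Headscale server.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: internal-gateway
  namespace: internal-gateway
type: Opaque
stringData:
  # Placeholder: paste a pre-auth key issued by your Headscale server.
  TS_AUTHKEY: "<headscale-preauth-key>"
```

Avoid committing the real key to Git; that is the reason I sync it from a secrets manager instead.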

The sidecar also needs a Role to persist its state to a Kubernetes Secret (so state survives pod restarts):

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: internal-gateway-ts-state
rules:
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["create"]
  - apiGroups: [""]
    resourceNames: ["internal-gateway-ts-state"]
    resources: ["secrets"]
    verbs: ["get", "update", "patch"]

Set TS_KUBE_SECRET=internal-gateway-ts-state on the container to enable this.
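
The Role needs a RoleBinding to the pod's ServiceAccount before containerboot can use it. A minimal sketch, assuming the Role is applied in the internal-gateway namespace and the pod runs as the internal-gateway ServiceAccount:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: internal-gateway-ts-state
  namespace: internal-gateway     # must be the namespace the Role lives in
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: internal-gateway-ts-state
subjects:
  - kind: ServiceAccount
    name: internal-gateway
    namespace: internal-gateway
```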

TLS with cert-manager

For TLS certificates, I use cert-manager with Let’s Encrypt. cert-manager has built-in support for the Gateway API: when you annotate a Gateway with cert-manager.io/cluster-issuer, it automatically provisions a Certificate for the referenced TLS Secret.

This requires the --enable-gateway-api flag on cert-manager. Without it, the annotation is silently ignored and the TLS Secret is never created.

If you deploy cert-manager via Helm or Rancher HelmChart, add it to the extra args:

extraArgs:
  - --enable-gateway-api
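
On k3s, those values go into the valuesContent of a HelmChart resource. The sketch below assumes the upstream Jetstack chart repository and a cert-manager target namespace; all other chart values (including CRD installation) are left out for brevity.

```yaml
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: cert-manager
  namespace: kube-system          # where the k3s Helm controller watches by default
spec:
  repo: https://charts.jetstack.io
  chart: cert-manager
  targetNamespace: cert-manager   # assumed install namespace
  createNamespace: true
  valuesContent: |-
    # Other values omitted; only the Gateway API flag is shown.
    extraArgs:
      - --enable-gateway-api
```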

Then annotate your Gateway:

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: whoami
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-production
spec:
  gatewayClassName: internal-gateway
  listeners:
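    - name: https
      # Placeholder hostname; cert-manager skips listeners without one
      # and uses it as the DNS name on the certificate.
      hostname: whoami.yourdomain.com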
      protocol: HTTPS
      port: 443
      tls:
        mode: Terminate
        certificateRefs:
          - name: whoami-tls

cert-manager watches Gateway resources, sees the annotation, creates a Certificate resource, and eventually the Secret appears. The Gateway listener transitions from InvalidCertificateRef to ResolvedRefs.
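
The letsencrypt-production ClusterIssuer referenced by the annotation is not shown in this post. Here is one possible shape, assuming a DNS-01 solver: hostnames that are only reachable over the tailnet cannot be validated with HTTP-01, because Let's Encrypt cannot reach them. The Cloudflare solver, the email address, and the Secret names are illustrative assumptions; substitute your own DNS provider.

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-production
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: you@yourdomain.com                   # placeholder contact address
    privateKeySecretRef:
      name: letsencrypt-production-account-key  # hypothetical; stores the ACME account key
    solvers:
      - dns01:
          cloudflare:                           # assumed provider, purely illustrative
            apiTokenSecretRef:
              name: cloudflare-api-token        # hypothetical Secret in cert-manager's namespace
              key: api-token
```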

Options Considered

Before settling on Traefik + Tailscale sidecar, I looked at a few alternatives:

nginx-ingress + Tailscale sidecar

NGINX supports the Gateway API through NGINX Gateway Fabric, a separate controller from the nginx-ingress controllers I had been running. I already had extensive nginx-ingress experience, but that configuration is annotation-heavy, and I wanted something cleaner for the Gateway API.

Cilium Gateway API

Cilium has a built-in Gateway API implementation using eBPF. It is fast and elegant, but it requires Cilium as the CNI. My cluster runs Flannel (k3s default), and I did not want to swap CNIs. Cilium’s Gateway API also does not support all route types yet.

Tailscale Operator + Something Else (if I were on SaaS)

If I were on Tailscale SaaS, I would have used the operator and could pick any Gateway API provider behind it. The operator handles the networking side; the provider handles the routing. This would be the cleanest split. But since I run Headscale, the operator is not an option.

Traefik

Traefik has had native Gateway API support since v3.0. For this setup it does not need its own custom CRDs: no IngressRoute, no Middleware. Everything goes through standard Gateway API resources. The configuration is minimal: a few CLI flags, and it discovers all Gateway resources in the cluster.

I also like that Traefik handles both Layer 7 routing (HTTPRoute) and TLS termination in one process. I was already using it for another project and had no complaints.

The Full Deployment

The deployment has two containers sharing a pod network namespace:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: internal-gateway
  namespace: internal-gateway
spec:
  replicas: 1
  selector:
    matchLabels:
      app: internal-gateway
  template:
    metadata:
      labels:
        app: internal-gateway
    spec:
      serviceAccountName: internal-gateway
      containers:
        - name: tailscale
          image: ghcr.io/tailscale/tailscale:v1.96.5
          env:
            - name: TS_HOSTNAME
              value: "internal-gateway"
            - name: TS_KUBE_SECRET
              value: "internal-gateway-ts-state"
            - name: TS_AUTH_ONCE
              value: "true"
            - name: TS_USERSPACE
              value: "false"
            - name: TS_AUTHKEY
              valueFrom:
                secretKeyRef:
                  name: internal-gateway
                  key: TS_AUTHKEY
            - name: TS_EXTRA_ARGS
              value: "--login-server https://headscale.yourdomain.com"
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: POD_UID
              valueFrom:
                fieldRef:
                  fieldPath: metadata.uid
          securityContext:
            privileged: true
        - name: traefik
          image: traefik:v3.7.1
          args:
            - --providers.kubernetesGateway=true
            - --entrypoints.web.address=:80
            - --entrypoints.websecure.address=:443
            - --entrypoints.web.http.redirections.entryPoint.to=websecure
            - --entrypoints.web.http.redirections.entryPoint.scheme=https
            - --accesslog=true
            - --log.level=INFO
          ports:
            - name: web
              containerPort: 80
            - name: websecure
              containerPort: 443
          readinessProbe:
            tcpSocket:
              port: 443

Sample Application

Here is a complete test app with a namespace, deployment, service, gateway, and HTTPRoute:

apiVersion: v1
kind: Namespace
metadata:
  name: test
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: whoami
  namespace: test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: whoami
  template:
    metadata:
      labels:
        app: whoami
    spec:
      containers:
        - name: whoami
          image: traefik/whoami:v1.10.1
---
apiVersion: v1
kind: Service
metadata:
  name: whoami
  namespace: test
spec:
  selector:
    app: whoami
  ports:
    - port: 80
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: whoami
  namespace: test
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-production
spec:
  gatewayClassName: internal-gateway
  listeners:
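    - name: https
      # Placeholder hostname; cert-manager skips listeners without one
      # and uses it as the DNS name on the certificate.
      hostname: whoami.yourdomain.com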
      protocol: HTTPS
      port: 443
      tls:
        mode: Terminate
        certificateRefs:
          - name: whoami-tls
      allowedRoutes:
        namespaces:
          from: Same
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: whoami
  namespace: test
spec:
  parentRefs:
    - name: whoami
  rules:
    - backendRefs:
        - name: whoami
          port: 80
This post is licensed under CC BY 4.0 by the author.