Setting Up Traefik as a Kubernetes Gateway API Provider with Tailscale (Headscale) Sidecar and cert-manager
A comprehensive guide to deploying Traefik v3.7 as a Kubernetes Gateway API provider with a Headscale-connected Tailscale sidecar for secure ingress and automatic TLS certificate provisioning with cert-manager.
A few weeks ago, I decided it was finally time to migrate my Kubernetes ingress setup from classic Ingress resources (backed by nginx-ingress) to the Gateway API. My cluster is a multi-node k3s setup spread across Oracle Cloud and a few Raspberry Pis at home. Some services are exposed on a public IP, but most are internal and reached over Tailscale, connected to my self-hosted Headscale server.
I chose Traefik as the Gateway API provider because of its native support for the Gateway API without requiring any CRD-proxying or extra controllers.
This post covers what I learned, the options I considered, the pitfalls I hit, and the final working setup.
Why Not the Tailscale Operator?
If you are on Tailscale SaaS, the official Tailscale Kubernetes Operator is the simplest way to expose cluster workloads to your tailnet. Annotate a Service with tailscale.com/expose: true, and it just works.
However, the operator has a limitation: it authenticates using OAuth credentials against login.tailscale.com and uses the Tailscale API to manage devices. There is no option to point it at a custom control server. If you run Headscale — and many self-hosters do — the operator simply does not work. The Headscale project tracked this request and closed it as wontfix.
This leaves the sidecar pattern. Run a Tailscale container alongside your proxy, and have Tailscale handle the connectivity while your proxy handles the traffic. This works with any control server — SaaS or self-hosted.
Gateway API vs Ingress
The Kubernetes Ingress resource has been the standard for years, but it has well-known limitations:
- Ingress is a single-resource abstraction. There is no standard way to express TCP/UDP routing, TLS configuration at the listener level, or cross-namespace route references without vendor annotations.
- Each controller re-implements these features through custom CRDs (IngressRoute, Middleware, etc.), locking you into that controller.
- The Ingress spec was designed around a single “one ingress per host” model, which does not scale well for complex routing.
The Gateway API solves this by splitting responsibilities across multiple resource types:
- GatewayClass — defines a class of load balancers (like StorageClass for storage)
- Gateway — represents the load balancer instance with listener configuration (ports, TLS, hostnames)
- HTTPRoute, TCPRoute, TLSRoute, GRPCRoute — route resources that attach to Gateways
The separation means platform operators can define GatewayClasses, and application teams can create their own routes without needing to touch the Gateway configuration.
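As a concrete instance of that split, here is a minimal sketch of the GatewayClass that the Gateways later in this post refer to by name. The name internal-gateway comes from the manifests below; the controllerName value is the one Traefik's Gateway API provider documents, so check it against the Traefik version you run.

```yaml
# Minimal GatewayClass sketch. Cluster-scoped, so no namespace.
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: internal-gateway          # referenced by gatewayClassName in each Gateway
spec:
  # Controller identifier Traefik claims; verify against your Traefik docs.
  controllerName: traefik.io/gateway-controller
```

Once the controller accepts the class, every Gateway that sets gatewayClassName: internal-gateway is handled by that Traefik instance.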
The Architecture
The final setup looks like this:
Client (tailnet)
│
▼
Tailscale sidecar (kernel mode, iptables DNAT)
│
▼
Traefik (TLS termination via cert-manager cert)
│
▼
Gateway API HTTPRoute → backend Service → Pods
Everything runs in a single pod with two containers: tailscale and traefik. Tailscale handles the encrypted tunnel to the tailnet. In kernel mode (more on this later), it sets up iptables rules that DNAT traffic from the Tailscale interface directly to Traefik, preserving the original client IP.
The CRD Problem That Blocked Everything
When I first deployed Traefik with --providers.kubernetesGateway=true, the GatewayClass stayed stuck at Pending:
"message": "Waiting for controller",
"reason": "Pending",
"status": "Unknown",
"type": "Accepted"
The Traefik logs were filling up with:
E reflector.go: ... Failed to watch *v1.TLSRoute: the server could not find the requested resource
The root cause: I had installed the Gateway API CRDs months ago (v1.0 or so), and newer versions added resources like TLSRoute and BackendTLSPolicy. Traefik v3.7 expects the v1.5.1 standard channel, which includes these as standard resources.
Traefik’s WatchAll method calls WaitForCacheSync for all informers at startup. When TLSRoute is missing, WaitForCacheSync blocks forever. The event loop never starts, and the GatewayClass is never processed. It does not matter that you are not using TLSRoutes — Traefik tries to watch them anyway, and the missing CRD halts the entire provider.
The fix was straightforward: install the Gateway API v1.5.1 standard CRDs.
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.5.1/standard-install.yaml
If you manage your cluster with ArgoCD, commit the raw CRD YAML to your repo and create an ArgoCD Application for it. CRDs are cluster-scoped, so the ArgoCD app needs to omit the destination namespace and the CreateNamespace sync option.
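A rough sketch of such an Application, under some assumptions: Argo CD runs in the argocd namespace, and the repository URL and path are hypothetical placeholders for wherever you commit standard-install.yaml. The ServerSideApply option is not something the setup above depends on; it is included because large CRDs can run into the annotation size limit with client-side apply.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: gateway-api-crds
  namespace: argocd                      # assumption: default Argo CD namespace
spec:
  project: default
  source:
    repoURL: https://git.example.com/you/cluster-config.git   # hypothetical repo
    targetRevision: main
    path: crds/gateway-api               # hypothetical dir holding standard-install.yaml
  destination:
    server: https://kubernetes.default.svc
    # no namespace here: CRDs are cluster-scoped
  syncPolicy:
    automated:
      prune: false                       # avoid deleting CRDs (and their objects) by accident
    syncOptions:
      - ServerSideApply=true             # optional; helps with large CRD manifests
```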
RBAC for Gateway API
Traefik needs permission to watch Gateway API resources and update their status. The ClusterRole needs every Gateway API resource type including the newer ones:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: traefik-gateway
rules:
  - apiGroups: [""]
    resources: ["configmaps", "nodes", "services", "secrets", "namespaces", "pods"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["discovery.k8s.io"]
    resources: ["endpointslices"]
    verbs: ["list", "watch"]
  - apiGroups: ["gateway.networking.k8s.io"]
    resources:
      - backendtlspolicies
      - gatewayclasses
      - gateways
      - grpcroutes
      - httproutes
      - referencegrants
      - tcproutes
      - tlsroutes
    verbs: ["get", "list", "watch"]
  - apiGroups: ["gateway.networking.k8s.io"]
    resources:
      - backendtlspolicies/status
      - gatewayclasses/status
      - gateways/status
      - grpcroutes/status
      - httproutes/status
      - tcproutes/status
      - tlsroutes/status
    verbs: ["update"]
The status subresource permissions are essential — without them, Traefik cannot mark Gateway resources as Accepted or Programmed.
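The ClusterRole only takes effect once it is bound to the identity Traefik runs as. A minimal sketch, assuming the ServiceAccount name and namespace used by the Deployment later in this post (internal-gateway in both cases); the binding name is my own choice.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: internal-gateway
  namespace: internal-gateway
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: traefik-gateway                  # hypothetical name, mirrors the ClusterRole
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: traefik-gateway
subjects:
  - kind: ServiceAccount
    name: internal-gateway
    namespace: internal-gateway
```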
Tailscale Sidecar: Userspace vs Kernel Mode
The Tailscale container image (ghcr.io/tailscale/tailscale) uses a binary called containerboot as its entrypoint. It reads environment variables, starts tailscaled, and runs tailscale up to authenticate.
Default: Userspace Mode
By default, containerboot runs tailscaled with --tun=userspace-networking. In this mode, Tailscale uses a userspace network stack (gVisor’s netstack). Connections work, but there is a catch: Tailscale proxies traffic by making a new TCP connection to 127.0.0.1, so every request appears to come from localhost.
127.0.0.1 - - [03/Jun/2026:17:07:08 +0000] "GET / HTTP/2.0" 200 778
If you do not need client IPs in your logs, userspace mode is fine. It requires no special container capabilities.
Better: Kernel Mode
To get real client IPs, you need kernel mode. The containerboot binary respects the TS_USERSPACE environment variable. When set to "false", it passes --tun=tailscale0 to tailscaled, which creates a kernel tun device and uses iptables DNAT rules to forward traffic to the local application. The WireGuard data path still runs in userspace (wireguard-go), but the tun interface allows proper iptables handling that preserves the original source IP.
You also need privileged: true on the container — kernel mode requires NET_ADMIN capability and access to /dev/net/tun.
env:
  - name: TS_USERSPACE
    value: "false"
# ...
securityContext:
  privileged: true
After switching, the logs show the actual client IP from the tailnet:
100.64.0.3 - - [03/Jun/2026:17:27:13 +0000] "GET / HTTP/2.0" 200 714
A Note About TS_EXTRA_ARGS
The containerboot binary processes two sets of extra arguments:
- TS_EXTRA_ARGS: passed to tailscale up. Use this for --login-server, --accept-routes, --accept-dns.
- TS_TAILSCALED_EXTRA_ARGS: passed to tailscaled. Use this for --tun, --socks5-server.
The --tun flag is controlled by TS_USERSPACE, not by TS_EXTRA_ARGS. Adding --tun=tailscale0 to TS_EXTRA_ARGS does nothing — it gets passed to tailscale up, which ignores it. This tripped me up for a while.
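To make the split concrete, here is an illustrative env fragment that puts each flag in the variable that actually reaches the right binary. The --accept-routes flag and the SOCKS5 listen address are examples only, not part of the final setup, and a SOCKS5 proxy is normally only of interest in userspace mode.

```yaml
env:
  # Goes to `tailscale up`: client-side login and preference flags.
  - name: TS_EXTRA_ARGS
    value: "--login-server https://headscale.yourdomain.com --accept-routes"
  # Goes to the `tailscaled` daemon: example value for illustration only.
  - name: TS_TAILSCALED_EXTRA_ARGS
    value: "--socks5-server=localhost:1055"
  # Chooses userspace vs kernel tun; do not set --tun through TS_EXTRA_ARGS.
  - name: TS_USERSPACE
    value: "false"
```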
Connecting to Headscale
Point the sidecar at your Headscale server via the extra args:
env:
  - name: TS_EXTRA_ARGS
    value: "--login-server https://headscale.yourdomain.com"
The auth key comes from a Kubernetes Secret. I store mine in Infisical and sync it via the external-secrets operator.
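If you do not use a secrets operator, the end result is just an ordinary Secret. A minimal sketch of the shape the Deployment in this post expects (Secret internal-gateway, key TS_AUTHKEY); the value is a placeholder for a pre-auth key generated on your Headscale server, and a real key should not be committed to Git in plain text.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: internal-gateway
  namespace: internal-gateway
type: Opaque
stringData:
  # Placeholder: paste a Headscale pre-auth key here, or let your
  # secrets tooling generate this Secret instead.
  TS_AUTHKEY: "<headscale-pre-auth-key>"
```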
The sidecar also needs a Role to persist its state to a Kubernetes Secret (so state survives pod restarts):
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: internal-gateway-ts-state
rules:
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["create"]
  - apiGroups: [""]
    resourceNames: ["internal-gateway-ts-state"]
    resources: ["secrets"]
    verbs: ["get", "update", "patch"]
Set TS_KUBE_SECRET=internal-gateway-ts-state on the container to enable this.
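The Role needs a RoleBinding to the pod's ServiceAccount before the sidecar can write its state. A minimal sketch, assuming the Role lives in the internal-gateway namespace and the pod runs as the internal-gateway ServiceAccount, as in the Deployment later in this post.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: internal-gateway-ts-state
  namespace: internal-gateway            # assumption: same namespace as the Role
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: internal-gateway-ts-state
subjects:
  - kind: ServiceAccount
    name: internal-gateway
    namespace: internal-gateway
```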
TLS with cert-manager
For TLS certificates, I use cert-manager with Let’s Encrypt. cert-manager has built-in support for the Gateway API: when you annotate a Gateway with cert-manager.io/cluster-issuer, it automatically provisions a Certificate for the referenced TLS Secret.
This requires the --enable-gateway-api flag on cert-manager. Without it, the annotation is silently ignored and the TLS Secret is never created.
If you deploy cert-manager via Helm or Rancher HelmChart, add it to the extra args:
extraArgs:
  - --enable-gateway-api
Then annotate your Gateway:
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: whoami
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-production
spec:
  gatewayClassName: internal-gateway
  listeners:
    - name: https
      protocol: HTTPS
      port: 443
      tls:
        mode: Terminate
        certificateRefs:
          - name: whoami-tls
cert-manager watches Gateway resources, sees the annotation, creates a Certificate resource, and eventually the Secret appears. The Gateway listener transitions from InvalidCertificateRef to ResolvedRefs.
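The annotation points at a ClusterIssuer named letsencrypt-production, which is not shown in this post. Here is one possible sketch of it. The DNS-01 solver and the Cloudflare provider are assumptions on my part: a hostname that is only reachable on the tailnet cannot pass an HTTP-01 challenge, because Let's Encrypt has to reach it from the public internet, so DNS-01 is the likely fit. The email, Secret names, and key are hypothetical placeholders.

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-production
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: you@yourdomain.com                      # placeholder
    privateKeySecretRef:
      name: letsencrypt-production-account-key     # hypothetical Secret name
    solvers:
      - dns01:
          cloudflare:                              # assumption: DNS hosted at Cloudflare
            apiTokenSecretRef:
              name: cloudflare-api-token           # hypothetical Secret in cert-manager's namespace
              key: api-token
```

Swap the solver for whichever DNS provider hosts your zone; the rest of the issuer stays the same.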
Options Considered
Before settling on Traefik + Tailscale sidecar, I looked at a few alternatives:
nginx-ingress + Tailscale sidecar
NGINX supports the Gateway API through NGINX Gateway Fabric, which is a separate controller from the Ingress-based nginxinc/kubernetes-ingress project. It supports v1 Gateway API resources. I already had extensive nginx-ingress experience, but that experience is with annotation-heavy controller configuration, and I wanted something cleaner for the Gateway API.
Cilium Gateway API
Cilium has a built-in Gateway API implementation using eBPF. It is fast and elegant, but it requires Cilium as the CNI. My cluster runs Flannel (k3s default), and I did not want to swap CNIs. Cilium’s Gateway API also does not support all route types yet.
Tailscale Operator + Something Else (if I were on SaaS)
If I were on Tailscale SaaS, I would have used the operator and could pick any Gateway API provider behind it. The operator handles the networking side; the provider handles the routing. This would be the cleanest split. But since I run Headscale, the operator is not an option.
Traefik
Traefik has had native Gateway API support since v3.0. It does not need its custom CRDs for this: no IngressRoute, no Middleware. Everything goes through standard Gateway API resources. The configuration is minimal: a few CLI flags, and it discovers all Gateway resources in the cluster.
I also like that Traefik handles both Layer 7 routing (HTTPRoute) and TLS termination in one process. I was already using it for another project and had no complaints.
The Full Deployment
The deployment has two containers sharing a pod network namespace:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: internal-gateway
  namespace: internal-gateway
spec:
  replicas: 1
  selector:
    matchLabels:
      app: internal-gateway
  template:
    metadata:
      labels:
        app: internal-gateway
    spec:
      serviceAccountName: internal-gateway
      containers:
        - name: tailscale
          image: ghcr.io/tailscale/tailscale:v1.96.5
          env:
            - name: TS_HOSTNAME
              value: "internal-gateway"
            - name: TS_KUBE_SECRET
              value: "internal-gateway-ts-state"
            - name: TS_AUTH_ONCE
              value: "true"
            - name: TS_USERSPACE
              value: "false"
            - name: TS_AUTHKEY
              valueFrom:
                secretKeyRef:
                  name: internal-gateway
                  key: TS_AUTHKEY
            - name: TS_EXTRA_ARGS
              value: "--login-server https://headscale.yourdomain.com"
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: POD_UID
              valueFrom:
                fieldRef:
                  fieldPath: metadata.uid
          securityContext:
            privileged: true
        - name: traefik
          image: traefik:v3.7.1
          args:
            - --providers.kubernetesGateway=true
            - --entrypoints.web.address=:80
            - --entrypoints.websecure.address=:443
            - --entrypoints.web.http.redirections.entryPoint.to=websecure
            - --entrypoints.web.http.redirections.entryPoint.scheme=https
            - --accesslog=true
            - --log.level=INFO
          ports:
            - name: web
              containerPort: 80
            - name: websecure
              containerPort: 443
          readinessProbe:
            tcpSocket:
              port: 443
Sample Application
Here is a complete test app with a namespace, deployment, service, gateway, and HTTPRoute:
apiVersion: v1
kind: Namespace
metadata:
  name: test
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: whoami
  namespace: test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: whoami
  template:
    metadata:
      labels:
        app: whoami
    spec:
      containers:
        - name: whoami
          image: traefik/whoami:v1.10.1
---
apiVersion: v1
kind: Service
metadata:
  name: whoami
  namespace: test
spec:
  selector:
    app: whoami
  ports:
    - port: 80
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: whoami
  namespace: test
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-production
spec:
  gatewayClassName: internal-gateway
  listeners:
    - name: https
      protocol: HTTPS
      port: 443
      tls:
        mode: Terminate
        certificateRefs:
          - name: whoami-tls
      allowedRoutes:
        namespaces:
          from: Same
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: whoami
  namespace: test
spec:
  parentRefs:
    - name: whoami
  rules:
    - backendRefs:
        - name: whoami
          port: 80
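One thing worth checking if the TLS Secret never shows up: as far as I can tell from cert-manager's documentation for annotated Gateways, it skips listeners that do not set a hostname, since it needs a DNS name to put on the certificate. The variant below is a sketch of the same Gateway and HTTPRoute pinned to a hostname. The name whoami.yourdomain.com is a placeholder, not a value from the setup above.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: whoami
  namespace: test
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-production
spec:
  gatewayClassName: internal-gateway
  listeners:
    - name: https
      hostname: whoami.yourdomain.com    # placeholder; becomes the certificate's DNS name
      protocol: HTTPS
      port: 443
      tls:
        mode: Terminate
        certificateRefs:
          - name: whoami-tls
      allowedRoutes:
        namespaces:
          from: Same
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: whoami
  namespace: test
spec:
  parentRefs:
    - name: whoami
  hostnames:
    - whoami.yourdomain.com              # must match the listener hostname above
  rules:
    - backendRefs:
        - name: whoami
          port: 80
```

The hostname also needs a DNS record that resolves to the sidecar's tailnet IP, otherwise clients on the tailnet will not reach Traefik under that name.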