
sni-router: host-net HAProxy to preserve real client IPs #522

Open: dolonet wants to merge 3 commits into master from sni-router-host-mode-real-ips

dolonet (Collaborator) commented May 18, 2026

Follow-up to discussion in #498 — bam80 reported that real client IPs never made it through contrib/sni-router despite all four PROXY-protocol pieces being wired up correctly.

Root cause

When the HAProxy container is on a bridge network and :443 / :80 are published via ports:, the source IP of every inbound connection is rewritten to the bridge gateway before HAProxy sees it:

  • Docker (default): docker-proxy accepts on the host and re-opens the connection from the bridge gateway.
  • Docker with userland-proxy: false: kernel DNAT should preserve the source, but on Docker 29 / Fedora the MASQUERADE rewrite (moby/moby#48854) intermittently drops or rewrites traffic.
  • Podman rootless: slirp4netns / pasta userspace forwarder, no equivalent flag.

In every case HAProxy stamps the gateway address (e.g. 172.x.x.1) into the PROXY v2 header, so mtg and Caddy faithfully log the wrong IP. The fix has to lift HAProxy out of the rewrite path — no amount of backend-side configuration can recover what HAProxy never received.
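To see why no backend-side setting can help, it is enough to look at what a PROXY v2 header contains. The sketch below (Python, standard library only) builds the fixed 12-byte signature, the version/command and family bytes, and the address block for a TCP connection. The source field is simply whatever address HAProxy's accepted socket reported, so if docker-proxy or a userspace forwarder has already replaced it with the bridge gateway, that gateway is what mtg and Caddy receive. The function name and the example addresses (172.17.0.1 as a bridge gateway, 203.0.113.10 as the host) are illustrative assumptions, not HAProxy's code.

```python
import ipaddress
import struct

# Fixed 12-byte signature that opens every PROXY protocol v2 header.
PROXY_V2_SIGNATURE = b"\r\n\r\n\x00\r\nQUIT\n"


def build_proxy_v2_header(src_ip: str, src_port: int, dst_ip: str, dst_port: int) -> bytes:
    """Build a PROXY v2 header for a TCP connection (command PROXY, no TLVs).

    src_ip is whatever the proxy saw as the peer address; the header cannot
    carry anything better than that.
    """
    src = ipaddress.ip_address(src_ip)
    dst = ipaddress.ip_address(dst_ip)
    if src.version != dst.version:
        raise ValueError("source and destination must use the same address family")
    ver_cmd = 0x21  # high nibble: version 2, low nibble: command PROXY
    fam = 0x11 if src.version == 4 else 0x21  # TCP over IPv4 / TCP over IPv6
    # Address block: src addr, dst addr, src port, dst port (network byte order).
    addresses = src.packed + dst.packed + struct.pack("!HH", src_port, dst_port)
    return PROXY_V2_SIGNATURE + struct.pack("!BBH", ver_cmd, fam, len(addresses)) + addresses
```

Calling build_proxy_v2_header("172.17.0.1", 51000, "203.0.113.10", 443) yields a 28-byte header whose source field is the gateway; the same call from a host-networked HAProxy would carry the real client address instead.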

Change

  • HAProxy → network_mode: host. Binds :443/:80 in the host netns directly. No NAT, no userspace forwarder, no source rewrite. Real client IPs (v4 and v6) propagate end-to-end via PROXY v2.
  • mtg and Caddy stay on the compose bridge, published on 127.0.0.1 only (127.0.0.1:3128:3128, 127.0.0.1:8080:80, 127.0.0.1:8443:8443). HAProxy reaches them via host loopback.
  • HAProxy frontends gain explicit IPv6 binds (bind :443,[::]:443 / bind :80,[::]:80). bind *:443 is IPv4-only; the old example accepted IPv6 only on hosts where dual-stack quirks happened to cover for it.
  • Caddy allow-list gains 127.0.0.1/32 to cover the new loopback hop from HAProxy. The RFC1918 ranges stay for the fronting path (mtg → Caddy on the compose bridge).
  • README gains a short subsection explaining the host-mode choice and its trade-offs.
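The allow-list decides whose PROXY header Caddy will believe. A minimal Python sketch of that check, assuming the list is exactly what this PR describes (127.0.0.1/32 plus the three RFC1918 ranges; the actual Caddyfile is not shown here), looks like this. The constant and function names are hypothetical.

```python
import ipaddress

# Assumed allow-list mirroring the PR description: the loopback hop from the
# host-networked HAProxy plus the RFC1918 ranges used on the compose bridge.
TRUSTED_PROXY_PEERS = [
    ipaddress.ip_network(net)
    for net in ("127.0.0.1/32", "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def peer_may_send_proxy_header(peer_ip: str) -> bool:
    """Return True if a PROXY header from this TCP peer should be trusted."""
    addr = ipaddress.ip_address(peer_ip)
    # Membership across address families (v6 address, v4 network) is simply False.
    return any(addr in net for net in TRUSTED_PROXY_PEERS)
```

Here 127.0.0.2 is rejected, which reflects the narrowing from 127.0.0.0/8 to 127.0.0.1/32 in the review-fixup commit: HAProxy is the only expected loopback peer.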

mtg-config.toml is intentionally unchanged — mtg and Caddy are still on the compose bridge, so fronting can keep host = "web" and resolve over compose-network DNS.

Alternatives considered

  • {"userland-proxy": false} in /etc/docker/daemon.json. Smaller change (host config, no compose edits) but: (a) requires host-level daemon config that the contrib example can't ship, (b) flaky on Docker 29 (MASQUERADE rewrite), (c) doesn't help Podman rootless at all.
  • All three services in network_mode: host. Cleaner network model but requires changing mtg-config.toml (fronting host and listen address) and loses compose-network isolation. Bigger change for no functional gain over the current layout.

Trade-offs / platform notes

  • HAProxy owns the host's :443 and :80. Don't run anything else on those ports on the same host.
  • Linux host only. On Docker Desktop (macOS/Windows), network_mode: host binds inside the Linux VM, so external clients can't reach the proxy. Out of scope for this contrib example, which is server-deployment-oriented anyway.
  • With Docker userns-remap, the in-container "root" loses the privilege to bind ports below 1024. The README documents the workaround.
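Because HAProxy now binds the host's :80 and :443 directly, a quick preflight on the target host can catch a conflicting listener before docker compose up -d fails. This is a hypothetical helper, not part of the contrib example. It has to run with enough privilege to bind ports below 1024, and it treats any error other than EADDRINUSE (for example EACCES) as a real failure rather than "port taken".

```python
import errno
import socket


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Try to bind host:port once; return False if another socket holds it."""
    family = socket.AF_INET6 if ":" in host else socket.AF_INET
    with socket.socket(family, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return False
            raise  # e.g. EACCES when binding <1024 without privilege
    return True
```

For example, as root on the server: for port in (80, 443): print(port, port_is_free(port)). The probe only reflects the moment it runs; it does not reserve the port.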

Test status

End-to-end validated by @bam80 on Fedora + Docker 29 — both the original revision (real client IPv4/IPv6 visible in mtg and Caddy logs) and the current revision (DOMAIN=localhost run after the review-fixups, see thread).

Branch rebased on master after the 2026-05-20 batch — picks up #514 (image bump to :master, hard dep so the stack actually exposes proxy-protocol-listener end-to-end) and #525 (mtg-config.toml rendered from tracked .example). A fresh checkout now brings up a working stack without manual patching.

Closes #498.

dolonet changed the title from "sni-router: switch HAProxy to host networking for real client IPs" to "sni-router: host-net HAProxy to preserve real client IPs" on May 18, 2026
dolonet (Collaborator, Author) commented May 18, 2026

@bam80 friendly ping — could you re-run this on your Fedora + Docker 29 setup before it leaves draft? The shape changed slightly from the version you tested:

  • backends use 127.0.0.1:... instead of [::1]:... (PROXY v2 still carries the real v6 client IP regardless of the loopback transport, so v6 publishing isn't needed)
  • HAProxy frontends explicitly bind 0.0.0.0:443 + bind [::]:443 v6only (and same for :80)
  • Caddy allow includes 127.0.0.1/32 for the new loopback hop
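The first bullet's claim, that the real v6 client IP survives even though the HAProxy-to-backend hop is IPv4 loopback, follows from where the address lives: inside the PROXY v2 address block, not in the TCP socket's own peer address. A simplified sketch of what a receiving backend reads (hypothetical function, not mtg's or Caddy's actual parser; it ignores TLVs and the UDP/UNIX families):

```python
import ipaddress
import struct

# Fixed 12-byte signature at the start of every PROXY protocol v2 header.
PROXY_V2_SIGNATURE = b"\r\n\r\n\x00\r\nQUIT\n"


def parse_proxy_v2_source(data: bytes) -> tuple[str, int]:
    """Return (client_ip, client_port) from the start of a PROXY v2 TCP stream.

    Handles only the PROXY command with TCP over IPv4 (0x11) or TCP over
    IPv6 (0x21); any TLVs after the address block are ignored.
    """
    if data[:12] != PROXY_V2_SIGNATURE:
        raise ValueError("not a PROXY v2 header")
    ver_cmd, fam, length = struct.unpack("!BBH", data[12:16])
    if ver_cmd != 0x21:  # version 2, command PROXY
        raise ValueError(f"unsupported version/command byte 0x{ver_cmd:02x}")
    if fam == 0x11:
        ip_len = 4   # src and dst addresses are 4 bytes each
    elif fam == 0x21:
        ip_len = 16  # src and dst addresses are 16 bytes each
    else:
        raise ValueError(f"unsupported family byte 0x{fam:02x}")
    body = data[16:16 + length]
    if len(body) < 2 * ip_len + 4:
        raise ValueError("truncated address block")
    src_ip = ipaddress.ip_address(body[:ip_len])
    # Ports follow both addresses: src port first, then dst port.
    (src_port,) = struct.unpack("!H", body[2 * ip_len:2 * ip_len + 2])
    return str(src_ip), src_port
```

A header with family byte 0x21 (TCP over IPv6) parses the same way whether it arrived over 127.0.0.1 or [::1].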

Concretely, what I'd like to confirm:

  1. docker compose up -d from a fresh checkout brings everything up cleanly.
  2. An IPv4 client hitting :443 shows up with its real address in the mtg log and, when probing the domain, in the Caddy access log.
  3. Same for an IPv6 client (this is the bit I most want validated since I changed the publishing layout vs. your tested version).

If anything breaks I'll iterate. Thanks for the patience on the round-trip.

Review thread on contrib/sni-router/haproxy.cfg (outdated), comment on lines +27 to +31:
# host's net.ipv6.bindv6only sysctl. `v6only` on the v6 bind prevents it
# from also accepting v4-mapped connections, which would otherwise
# conflict with the explicit v4 bind on the same port.
bind 0.0.0.0:80
bind [::]:80 v6only

bam80 (Contributor) commented May 18, 2026:

Suggested change (replacing the five lines above):
# host's net.ipv6.bindv6only sysctl.
bind :80,[::]:80

*:, 0.0.0.0: and : are equivalent, per the docs.
My own patch variant (which is pretty much the same) doesn't use v6only, and I still haven't noticed any conflicts (with net.ipv6.bindv6only = 0). I'm not sure v6only is even allowed in the one-line notation.

dolonet (Collaborator, Author) replied:

You're right that *:, 0.0.0.0: and : are equivalent, and re: v6only — I checked the actual behavior: with SO_REUSEADDR (HAProxy's default) and bindv6only=0, the v6 bind succeeds alongside the v4 bind, and the kernel routes v4 packets to the more-specific AF_INET socket. So both forms produce identical runtime behavior on Linux. My earlier comment overstated the v6only/sysctl interaction — it's not load-bearing, it's self-documentation.

That makes the choice purely stylistic:

  • Two binds + v6only: spells out why two binds coexist for someone reading the cfg without having to reason about SO_REUSEADDR semantics.
  • One-liner: shorter; the comment doesn't have to explain v6only because it's not there.

I have a mild preference for the explicit form for a contrib/ example, but you're the one actually running sni-router and closer to the audience copying this config — if you'd rather have the one-liner, I'll switch. Either way I'm fine.

On v6only in comma syntax: HAProxy docs say bind options apply to all sockets on the line, so bind :80,[::]:80 v6only would set IPV6_V6ONLY on the v4 socket too — no-op there, but cosmetically odd. If we go one-liner, I'd drop v6only entirely, as your suggestion does.
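A simplified model of how a comma-separated bind line fans out into sockets makes the v6only point concrete: every trailing option is attached to each address on the line. The helper below is hypothetical and only mirrors the documented behavior discussed in this thread (an empty or * host meaning all IPv4 addresses, [::] meaning all IPv6 addresses). It is not HAProxy's parser and ignores port ranges, address prefixes such as ipv4@, and UNIX sockets.

```python
def expand_bind_line(line: str) -> list[tuple[str, int, list[str]]]:
    """Expand 'bind ADDR[,ADDR...] [options]' into one entry per listening socket."""
    keyword, addr_list, *options = line.split()
    if keyword != "bind":
        raise ValueError("expected a bind line")
    sockets = []
    for spec in addr_list.split(","):
        host, _, port = spec.rpartition(":")  # split on the last colon only
        if host in ("", "*"):
            host = "0.0.0.0"  # empty or '*' means all IPv4 addresses
        host = host.strip("[]")  # '[::]' -> '::'
        # Every option on the line applies to every socket it creates.
        sockets.append((host, int(port), list(options)))
    return sockets
```

Here expand_bind_line("bind :80,[::]:80 v6only") returns two entries that both carry v6only, including the 0.0.0.0 socket, where the option does nothing.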

Either way, the gate I'd still like to clear before un-drafting is an actual compose up -d with this layout — v4 and v6 client landing in mtg + Caddy logs with real addresses. The bind nit is a quick swap after; that e2e run is the bit I can't reproduce from my side.

bam80 (Contributor) replied:

Why I'd prefer the one-liner: it makes adding new ports easier and looks better, e.g.:

    bind :80,[::]:80
    bind :8080,[::]:8080

I'm using the multi-port configuration myself and keep all the ports on one line, but someone else might prefer to just add a new line per port. I don't have a hard preference, though.

I'll test it tomorrow, thanks.

dolonet (Collaborator, Author) replied:

Done in 2a63578 — switched both :80 and :443 blocks to bind :PORT,[::]:PORT, dropped v6only, trimmed the comment to one sentence (nothing about v6only to explain anymore). Multi-port-scaling point taken; future ports can just add another comma-separated line.

bam80 (Contributor) replied:

Re: "I'll test it tomorrow, thanks."

This test really wore me out (#525 (comment)), but it seems to work. Thanks.

By the way, I still don't understand what the problem was with testing it yourself.
I couldn't test in the normal mode anyway (port 80 isn't reachable from outside), so I had to test with DOMAIN=localhost, but that should be enough: both IP versions show up correctly.

bam80 (Contributor) added:

And the missing #514 added fuel to the fire; I banged my head against the wall over that one too.

dolonet (Collaborator, Author) replied:

I reread my excuse about "I can't reproduce it on my side", and you're right: it doesn't hold up. The real reason is that you were the original tester with an already-verified environment, and in my head that turned into "it's cheaper to ask again than to spin up a clean machine". But this is exactly the case where "cheaper" means "offloading it onto someone else". DOMAIN=localhost on any dev VPS is what I should have done myself before asking for a third pass. Lesson noted.

dolonet (Collaborator, Author) added:

Yes, and that wasn't an adjacent problem but a hard dependency: nineseconds/mtg:2 has no proxy-protocol-listener, so without #514/#480 the stack genuinely can't work end-to-end, hence your manual patch during the test. I should have either chained #514 explicitly in the description or included the image bump here. #514 is now in master, so after the rebase the next tester gets a working stack with no manual fiddling.

Review thread on contrib/sni-router/haproxy.cfg (outdated), comment on lines +38 to +44:
bind *:443
bind 0.0.0.0:443
bind [::]:443 v6only

bam80 (Contributor) commented:

Same as above.

dolonet (Collaborator, Author) replied:

Switched here too in 2a63578.

Review thread on contrib/sni-router/haproxy.cfg (outdated), comment on lines +26 to +27:
# Explicit v4 + v6 binds so IPv6 clients are accepted regardless of the
# host's net.ipv6.bindv6only sysctl. `v6only` on the v6 bind prevents it

bam80 (Contributor) commented:

Note: we could also just use bind [::]:80 v4v6 without separate v4 and v6 binds, but then IPv4 clients would show up in the logs as IPv4-mapped addresses like ::ffff:1.2.3.4.

dolonet (Collaborator, Author) replied:

Right — that ::ffff:1.2.3.4 noise is exactly why I went with explicit dual binds rather than v4v6. Sticking with bind :PORT,[::]:PORT so v4 stays v4 in PROXY-v2 and downstream logs.
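For comparison, this is the kind of cleanup a downstream log consumer would need if a single bind [::]:80 v4v6 socket were used instead: IPv4 clients arrive as IPv4-mapped IPv6 addresses and have to be collapsed back. A minimal sketch with Python's ipaddress module; the helper name is hypothetical, and the explicit dual binds chosen here make it unnecessary.

```python
import ipaddress


def normalize_client_ip(raw: str) -> str:
    """Collapse an IPv4-mapped IPv6 address (::ffff:a.b.c.d) to plain IPv4."""
    addr = ipaddress.ip_address(raw)
    if isinstance(addr, ipaddress.IPv6Address) and addr.ipv4_mapped is not None:
        return str(addr.ipv4_mapped)
    return str(addr)  # genuine IPv6 and plain IPv4 pass through unchanged
```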

dolonet added a commit that referenced this pull request May 19, 2026
Switch to one-line `bind :80,[::]:80` and `bind :443,[::]:443` per
review feedback in #522.  The v6only flag was self-documentation, not
load-bearing: with SO_REUSEADDR (HAProxy's default) and bindv6only=0
the kernel routes v4 packets to the more-specific AF_INET socket
regardless.  Comment trimmed to match — the v6only paragraph is gone
because v6only itself is gone.

The shorter form also scales more cleanly when adding ports later,
e.g. `bind :8080,[::]:8080` on a new line.

dolonet added 3 commits May 20, 2026 13:15
Bridge ingress (Docker's docker-proxy userland forwarder, Podman's
slirp4netns/pasta) rewrites the source IP of inbound connections on a
published port to the bridge gateway address.  HAProxy then stamps that
gateway address into the PROXY v2 header it forwards to mtg and Caddy,
so neither backend ever sees a real client IP.

Move HAProxy into the host netns (network_mode: host) so it binds
:443/:80 directly with no NAT in the path.  mtg and Caddy stay on the
compose bridge and are published on 127.0.0.1 only; HAProxy reaches
them via host loopback and PROXY v2 carries the real client IP (v4 or
v6) end-to-end.

Also accept IPv6 clients explicitly on the HAProxy frontends — `bind
*:443` is IPv4-only and missed v6 clients on hosts where the previous
example happened to "work" only because of dual-stack quirks.

Add 127.0.0.0/8 to Caddy's PROXY allow-list to cover the new loopback
hop from HAProxy.  README gains a short subsection explaining the
host-mode choice and its trade-off (HAProxy occupies host :443/:80).

Diagnosed and tested by @bam80 on Fedora + Docker 29.  Fixes #498.

…rrow Caddy allow)

- Caddy allow: 127.0.0.0/8 → 127.0.0.1/32 (only loopback peer is HAProxy).
- haproxy.cfg: rewrite v6only comment to describe what it actually does
  (suppresses v4-mapped accept, preventing conflict with the v4 bind),
  not the symptom.
- docker-compose.yml: trim the 8-line haproxy comment to 3 lines and
  defer the rationale to README.  Add one-line note explaining why web
  uses host port 8080 (HAProxy owns :80).
- README: condense the "Why network_mode: host" subsection.  Spell out
  trade-offs as a list: own-the-host-ports, Linux-only (Docker Desktop
  doesn't make this layout reachable), userns-remap incompatibility.
  Note that mtg-config.toml stays as-is because mtg/web remain on the
  compose bridge.

Switch to one-line `bind :80,[::]:80` and `bind :443,[::]:443` per
review feedback in #522.  The v6only flag was self-documentation, not
load-bearing: with SO_REUSEADDR (HAProxy's default) and bindv6only=0
the kernel routes v4 packets to the more-specific AF_INET socket
regardless.  Comment trimmed to match — the v6only paragraph is gone
because v6only itself is gone.

The shorter form also scales more cleanly when adding ports later,
e.g. `bind :8080,[::]:8080` on a new line.

dolonet force-pushed the sni-router-host-mode-real-ips branch from 2a63578 to a7febc2 on May 20, 2026 13:15
dolonet (Collaborator, Author) commented May 20, 2026

Rebased on master — picks up #514 (image bump) and #525 (config rendering), both of which were friction sources during @bam80's test. Body updated, draft lifted. Ready for review.

dolonet marked this pull request as ready for review on May 20, 2026 13:17

Labels: none yet
Projects: none yet
Development: successfully merging this pull request may close the issue "Can't see real client IPs passed with PROXY protocol v2".
2 participants