9 changes: 7 additions & 2 deletions kong-mcp/README.md
@@ -86,7 +86,7 @@ One plugin instance per MCP resource. See `kong.yml` for full examples.
| `issuer` | ✅ | AuthGate base URL. Must equal the token's `iss` claim byte-for-byte. |
| `gateway_origin` | ✅ | Externally reachable Kong origin, e.g. `https://gw.example.com`. Used to build the PRM URL. |
| `resource_path` | ✅ | This resource's path, e.g. `/mcp/gitea`. |
| `jwks_uri` | | AuthGate JWKS endpoint (RS256). Accepted algs are always pinned to the RS family. |
| `jwks_uri` | | AuthGate JWKS endpoint (RS256). Accepted algs are always pinned to the RS family. Leave empty to **auto-discover** it from the issuer's AS metadata (RFC 8414 `/.well-known/oauth-authorization-server`, falling back to OIDC discovery; cached 1h, the metadata's `issuer` must match). Set it explicitly when Kong reaches AuthGate on a different host than clients do — e.g. `host.docker.internal` in the compose demos. |
| `required_scopes` | | All listed scopes must be present in the token's `scope`, else `403 insufficient_scope`. |
| `audience` | | Expected `aud` for **token validation only**. Defaults to `gateway_origin + resource_path`. The PRM `resource` always stays the canonical URL (RFC 9728 §3.3), so set this only when AuthGate emits a fixed non-URL `aud`. |
| `require_audience` | | Enforce `aud` only when `true`. **All shipped configs enable it** (the schema default is `false` only because go-pdk booleans default to false). AuthGate emits a per-resource `aud` via RFC 8707: the client sends `resource=<gateway_origin + resource_path>` on the token request, and that URL must be on the client's `allowed_resources` allowlist. The expected value is an exact, scheme/slash-sensitive match — a token minted without the matching `aud` gets `401`. Set `false` only temporarily while debugging token issuance (see the replay warning below). |
@@ -96,7 +96,7 @@ Only tokens with `type=access` are accepted; AuthGate refresh tokens (same key,
`iss`, `aud`, and `scope`, differing only by `type` and a longer `exp`) are
rejected with `401 invalid_token`.

> go-pdk schemas can't mark fields required, so the four required fields are
> go-pdk schemas can't mark fields required, so the three required fields are
> validated on the first request instead — a missing one fails every request
> with `500 server_error` and a critical log line, not a silent misbehavior.

@@ -200,6 +200,11 @@ Before this works end-to-end, confirm three things on AuthGate (decode a real

## Operational notes

- **Auto-discovery adds the metadata endpoint to the availability chain.** With
`jwks_uri` empty, the first token (and one refresh per hour) also depends on
the issuer's AS metadata endpoint. A cold-cache discovery failure is answered
with `503` and retried on the next request, while a failed hourly refresh keeps
serving the last discovered `jwks_uri`.
- **JWKS endpoint must be highly available.** If the **initial** fetch fails,
token requests get `503 temporarily_unavailable` (not `401`, so clients don't
re-run OAuth) and it is retried on the next request — a failed initial fetch is
8 changes: 6 additions & 2 deletions kong-mcp/README.zh-TW.md
@@ -83,7 +83,7 @@ JWKS 的抓取、記憶體快取、背景輪替、未知 `kid` 的限流補抓
| `issuer` | ✅ | AuthGate base URL,必須與 token 的 `iss` claim 逐字元相符。 |
| `gateway_origin` | ✅ | 對外可達的 Kong origin,例如 `https://gw.example.com`,用來組出 PRM URL。 |
| `resource_path` | ✅ | 此資源的路徑,例如 `/mcp/gitea`。 |
| `jwks_uri` | | AuthGate JWKS endpoint(RS256)。接受的演算法固定鎖在 RS 家族。 |
| `jwks_uri` | | AuthGate JWKS endpoint(RS256)。接受的演算法固定鎖在 RS 家族。留空則改由 issuer 的 AS metadata **自動發現**(RFC 8414 `/.well-known/oauth-authorization-server`,失敗時退回 OIDC discovery;快取 1 小時,metadata 的 `issuer` 必須與設定值相符)。當 Kong 連 AuthGate 的位址與 client 不同時(例如 compose 範例裡的 `host.docker.internal`)才需要手動指定。 |
| `required_scopes` | | token 的 `scope` 必須包含全部所列項目,否則 `403 insufficient_scope`。 |
| `audience` | | **只影響 token 的 `aud` 驗證**,預設為 `gateway_origin + resource_path`。PRM 的 `resource` 永遠維持 canonical URL(RFC 9728 §3.3),只有在 AuthGate 發固定的非 URL `aud` 時才需要設。 |
| `require_audience` | | 設 `true` 才強制檢查 `aud`。**所有隨附設定檔都已開啟**(schema 預設為 `false` 只是因為 go-pdk 的布林零值)。AuthGate 透過 RFC 8707 發出 per-resource `aud`:client 在 token 請求帶 `resource=<gateway_origin + resource_path>`,且該 URL 必須在 client 的 `allowed_resources` 白名單內。比對值是逐字元、區分 scheme/斜線的精確比對——沒綁定相符 `aud` 的 token 一律 `401`。只有在除錯 token 簽發時才暫時設回 `false`(見下方重放警告)。 |
@@ -92,7 +92,7 @@ JWKS 的抓取、記憶體快取、背景輪替、未知 `kid` 的限流補抓
只接受 `type=access` 的 token;AuthGate 的 refresh token(金鑰、`iss`、`aud`、
`scope` 都相同,只有 `type` 與較長的 `exp` 不同)會被回 `401 invalid_token` 拒絕。

> go-pdk 產生的 schema 無法標記必填欄位,所以四個必填欄位改在第一個請求時驗證——
> go-pdk 產生的 schema 無法標記必填欄位,所以三個必填欄位改在第一個請求時驗證——
> 缺欄位時所有請求都會回 `500 server_error` 並寫一行 critical log,而不是默默地
> 行為異常。

@@ -187,6 +187,10 @@ token 能通過驗證之前,請先改 `kong.yml` 讓 `issuer` / `gateway_origi

## 維運注意事項

- **自動發現會把 metadata endpoint 也納入可用性鏈。** `jwks_uri` 留空時,第一顆
token(以及每小時一次的更新)還會多依賴 issuer 的 AS metadata endpoint;冷快取
下發現失敗回 `503`、下一個請求重試,而每小時更新失敗則沿用上次發現的
`jwks_uri`,不影響線上流量。
- **JWKS endpoint 要高可用。** **初次**抓取失敗時,帶 token 的請求會回
`503 temporarily_unavailable`(不是 `401`,client 不會誤以為要重跑 OAuth),
下一個請求會重試——初次抓失敗的結果不會被快取。抓取等待上限 10 秒,且在 per-URI
9 changes: 8 additions & 1 deletion kong-mcp/kong.yml
@@ -27,6 +27,10 @@ services:
issuer: https://auth.example.com
gateway_origin: https://gw.example.com
resource_path: /mcp/gitea
# Optional: omit to auto-discover from the issuer's AS metadata
# (RFC 8414). Spelled out here because the demo issuer is a
# placeholder; keep it explicit whenever Kong reaches AuthGate on a
# different host than clients do (see kong.local.yml).
jwks_uri: https://auth.example.com/.well-known/jwks.json
required_scopes:
- mcp:gitea
@@ -53,7 +57,10 @@ services:
issuer: https://auth.example.com
gateway_origin: https://gw.example.com
resource_path: /mcp/sentry
jwks_uri: https://auth.example.com/.well-known/jwks.json
# jwks_uri omitted on purpose: it is auto-discovered from the
# issuer's AS metadata (RFC 8414). See the gitea service above for
# the explicit form, and use that form whenever Kong reaches
# AuthGate on a different host than clients do.
required_scopes:
- mcp:sentry
require_audience: true
208 changes: 198 additions & 10 deletions kong-mcp/main.go
@@ -32,7 +32,9 @@ import (
"encoding/json"
"errors"
"fmt"
"io"
"log/slog"
"net/http"
"net/url"
"os"
"slices"
@@ -49,7 +51,7 @@ import (
)

var (
Version = "0.3.0"
Version = "0.4.0"
Priority = 1000
)

@@ -60,6 +62,9 @@ const (
// jwksHTTPTimeout caps every JWKS fetch and unknown-kid refetch wait; the
// library default of one minute would let a slow AuthGate stall requests.
jwksHTTPTimeout = 10 * time.Second
// metadataTTL bounds how long a discovered jwks_uri is trusted before the
// AS metadata is re-fetched (matches Kong's metadata_cache_ttl of 3600s).
metadataTTL = time.Hour
)

// rsMethods pins accepted algorithms to the RS family: never accept HS* when
@@ -73,7 +78,7 @@ type Config struct {
ResourcePath string `json:"resource_path"` // e.g. /mcp/gitea
Audience string `json:"audience"` // expected aud; default GatewayOrigin+ResourcePath
RequiredScopes []string `json:"required_scopes"` // all must be present
JWKSURI string `json:"jwks_uri"` // AuthGate JWKS endpoint (RS256)
JWKSURI string `json:"jwks_uri"` // AuthGate JWKS endpoint (RS256); empty => discover via RFC 8414 from Issuer
RequireAudience bool `json:"require_audience"` // false until AuthGate emits per-resource aud
LeewaySeconds int `json:"leeway_seconds"` // clock-skew tolerance for exp/nbf

@@ -99,7 +104,6 @@ func (conf *Config) setup() error {
{"issuer", conf.Issuer},
{"gateway_origin", conf.GatewayOrigin},
{"resource_path", conf.ResourcePath},
{"jwks_uri", conf.JWKSURI},
} {
if f.value == "" {
missing = append(missing, f.name)
@@ -128,19 +132,29 @@ func (conf *Config) setup() error {
invalid = append(invalid, `gateway_origin must not end with "/"`)
}
// issuer/gateway_origin/jwks_uri are concatenated into URLs (PRM URL,
// audience) and fetched (JWKS); a relative or schemeless value would
// otherwise surface only at traffic time as an opaque per-request 503
// (jwks_uri) or a silent universal 401 (issuer).
// audience) and fetched (JWKS, AS metadata); a relative or schemeless
// value would otherwise surface only at traffic time as an opaque
// per-request 503 (jwks_uri) or a silent universal 401 (issuer).
for _, u := range []struct{ name, value string }{
{"issuer", conf.Issuer},
{"gateway_origin", conf.GatewayOrigin},
{"jwks_uri", conf.JWKSURI},
{"jwks_uri", conf.JWKSURI}, // optional: empty means RFC 8414 discovery
} {
parsed, err := url.Parse(u.value)
if err != nil || !parsed.IsAbs() || (parsed.Scheme != "http" && parsed.Scheme != "https") {
// only jwks_uri can be empty here — missing required fields
// already returned above
if u.value != "" && !isAbsHTTPURL(u.value) {
invalid = append(invalid, u.name+` must be an absolute http(s) URL`)
}
}
// RFC 8414 §2 forbids query/fragment in an issuer identifier, and
// discovery builds the well-known URLs from scheme://host+path only —
// a query would be dropped silently and the metadata issuer check
// could then never match. Reject loudly instead. (A trailing slash is
// NOT rejected: some ASes — e.g. Auth0 — legitimately use one, and it
// works as long as the token's iss and the metadata issuer carry it too.)
if parsed, err := url.Parse(conf.Issuer); err == nil && (parsed.RawQuery != "" || parsed.Fragment != "") {
invalid = append(invalid, "issuer must not contain a query or fragment (RFC 8414)")
}
if conf.LeewaySeconds < 0 {
invalid = append(invalid, "leeway_seconds must not be negative")
}
@@ -176,6 +190,13 @@ func (conf *Config) audience() string {
return conf.GatewayOrigin + conf.ResourcePath
}

// isAbsHTTPURL reports whether s parses as an absolute http(s) URL — the one
// shape rule shared by setup()'s config checks and the discovered jwks_uri.
func isAbsHTTPURL(s string) bool {
	parsed, err := url.Parse(s)
	return err == nil && parsed.IsAbs() && (parsed.Scheme == "http" || parsed.Scheme == "https")
}
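
A small standalone sketch of the shape rule above, to show which inputs it accepts. The function body is copied from this hunk; the sample URLs are illustrative assumptions, not values taken from the PR.

```go
package main

import (
	"fmt"
	"net/url"
)

// isAbsHTTPURL is copied verbatim from the hunk above.
func isAbsHTTPURL(s string) bool {
	parsed, err := url.Parse(s)
	return err == nil && parsed.IsAbs() && (parsed.Scheme == "http" || parsed.Scheme == "https")
}

func main() {
	// Sample inputs are made up for illustration.
	for _, s := range []string{
		"https://auth.example.com/.well-known/jwks.json", // absolute https
		"auth.example.com/jwks.json",                     // schemeless
		"/.well-known/jwks.json",                         // relative path
		"ftp://auth.example.com/jwks.json",               // absolute, wrong scheme
	} {
		fmt.Printf("%-50s accepted=%v\n", s, isAbsHTTPURL(s))
	}
}
```

Only the first input should pass; the others are the kinds of values that `setup()` rejects for a hand-configured `jwks_uri` and that `fetchJWKSURI` rejects for a discovered one.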

// JWKS cache: the plugin server is long-lived, so one self-refreshing keyfunc
// per JWKS URI is shared across the whole process. Construction performs a
// synchronous initial HTTP fetch (up to jwksHTTPTimeout); it runs under a
@@ -198,6 +219,13 @@ var errJWKSUnavailable = errors.New("JWKS unavailable")
// first fetch is returned as an error — not cached as an empty key set that
// would 401 every token until the next refresh window — so the next request
// simply retries.
//
// Entries are never evicted. With discovery this means an issuer that MOVES
// its advertised jwks_uri strands the old URI's keyfunc here: one goroutine
// plus one HTTP fetch (and, once the old URL dies, one error log) per hour,
// per orphan, until restart. Cancelling the old context on change would break
// in-flight verifications still holding that keyfunc, so the leak is accepted
// — it is bounded by how often an AS relocates its JWKS, which is rare.
func getJWKS(uri string) (keyfunc.Keyfunc, error) {
jwksMu.RLock()
k, ok := jwksCache[uri]
@@ -267,8 +295,168 @@ func getJWKS(uri string) (keyfunc.Keyfunc, error) {
return k, nil
}

// AS metadata discovery (RFC 8414): when jwks_uri is not configured, it is
// looked up from the issuer's authorization-server metadata instead — the same
// document MCP clients read in step ③→④. Mirrors the JWKS cache: per-issuer
// construction lock, failures never cached, and a discovery failure is an
// infrastructure 503, not a token 401. One caveat the explicit jwks_uri config
// exists for: discovery fetches FROM THE GATEWAY, so the issuer (and the
// jwks_uri its metadata advertises) must be reachable from inside Kong — in
// the docker-compose demos that means host.docker.internal, which is why those
// configs keep setting jwks_uri by hand.
var (
	metaMu     sync.RWMutex               // guards metaCache reads/writes
	metaCache  = map[string]metaEntry{}   // discovered jwks_uri, keyed by issuer
	metaInitMu sync.Mutex                 // guards metaInit
	metaInit   = map[string]*sync.Mutex{} // per-issuer discovery lock

	metadataHTTPClient = &http.Client{Timeout: jwksHTTPTimeout}
)

type metaEntry struct {
	jwksURI string
	expires time.Time
}

// metadataURLs returns the discovery documents to try for issuer, in order:
// RFC 8414 (well-known inserted between host and path) first, then OIDC
// discovery (well-known appended) — AuthGate serves both, other ASes at least
// one.
func metadataURLs(issuer string) []string {
	u, err := url.Parse(issuer)
	if err != nil { // setup() already validated; defensive
		return []string{strings.TrimSuffix(issuer, "/") + "/.well-known/oauth-authorization-server"}
	}
	origin := u.Scheme + "://" + u.Host
	path := strings.TrimSuffix(u.Path, "/")
	return []string{
		origin + "/.well-known/oauth-authorization-server" + path,
		origin + path + "/.well-known/openid-configuration",
	}
}
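
To make the two URL shapes concrete, here is a runnable sketch that calls `metadataURLs` (copied from this hunk) for an issuer with and without a path component. The issuer values are illustrative assumptions.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// metadataURLs is copied verbatim from the hunk above.
func metadataURLs(issuer string) []string {
	u, err := url.Parse(issuer)
	if err != nil { // setup() already validated; defensive
		return []string{strings.TrimSuffix(issuer, "/") + "/.well-known/oauth-authorization-server"}
	}
	origin := u.Scheme + "://" + u.Host
	path := strings.TrimSuffix(u.Path, "/")
	return []string{
		origin + "/.well-known/oauth-authorization-server" + path,
		origin + path + "/.well-known/openid-configuration",
	}
}

func main() {
	// Hypothetical issuers: one at the origin root, one under a tenant path.
	for _, issuer := range []string{
		"https://auth.example.com",
		"https://auth.example.com/tenant1",
	} {
		fmt.Println(issuer)
		for _, u := range metadataURLs(issuer) {
			fmt.Println("  ", u)
		}
	}
}
```

For the path-bearing issuer, the RFC 8414 candidate places `/.well-known/oauth-authorization-server` before `/tenant1`, while the OIDC candidate appends `/.well-known/openid-configuration` after it. For a root issuer both candidates sit directly under the origin.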

// fetchJWKSURI fetches the issuer's AS metadata and returns its jwks_uri.
// The document's issuer must equal the configured one (RFC 8414 §3.3) — a
// mismatched document could otherwise point verification at attacker keys —
// and the advertised jwks_uri must be an absolute http(s) URL, the same shape
// rule setup() applies to a hand-configured one.
func fetchJWKSURI(issuer string) (string, error) {
	// every attempt's error is kept and joined: the RFC 8414 attempt usually
	// carries the diagnostic one (e.g. an issuer mismatch), and a trailing
	// OIDC-fallback 404 must not mask it
	var errs []error
	for _, mdURL := range metadataURLs(issuer) {
		resp, err := metadataHTTPClient.Get(mdURL)
		if err != nil {
			errs = append(errs, err)
			continue
		}
		// read one byte past the cap so an oversized document fails loudly
		// instead of being truncated into a confusing JSON parse error
		body, err := io.ReadAll(io.LimitReader(resp.Body, 1<<20+1))
		_ = resp.Body.Close()
		if err != nil {
			errs = append(errs, fmt.Errorf("%s: %w", mdURL, err))
			continue
		}
		if len(body) > 1<<20 {
			errs = append(errs, fmt.Errorf("%s: metadata document exceeds 1 MiB", mdURL))
			continue
		}
		if resp.StatusCode != http.StatusOK {
			errs = append(errs, fmt.Errorf("%s: HTTP %d", mdURL, resp.StatusCode))
			continue
		}
		var meta struct {
			Issuer  string `json:"issuer"`
			JWKSURI string `json:"jwks_uri"`
		}
		if err := json.Unmarshal(body, &meta); err != nil {
			errs = append(errs, fmt.Errorf("%s: %w", mdURL, err))
			continue
		}
		if meta.Issuer != issuer {
			errs = append(errs, fmt.Errorf("%s: metadata issuer %q does not match configured issuer %q", mdURL, meta.Issuer, issuer))
			continue
		}
		if !isAbsHTTPURL(meta.JWKSURI) {
			errs = append(errs, fmt.Errorf("%s: metadata jwks_uri %q is not an absolute http(s) URL", mdURL, meta.JWKSURI))
			continue
		}
		return meta.JWKSURI, nil
	}
	return "", errors.Join(errs...)
}
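
The two document-level checks in `fetchJWKSURI` (issuer equality, then `jwks_uri` shape) can be exercised without any network. The sketch below pulls them into a hypothetical helper, `checkMetadata`, which is not part of this PR; the JSON documents and host names are made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

// isAbsHTTPURL is copied from the PR.
func isAbsHTTPURL(s string) bool {
	parsed, err := url.Parse(s)
	return err == nil && parsed.IsAbs() && (parsed.Scheme == "http" || parsed.Scheme == "https")
}

// checkMetadata is a hypothetical helper (an assumption for this sketch):
// it applies the same issuer-match and jwks_uri-shape checks that
// fetchJWKSURI applies to a fetched AS metadata document.
func checkMetadata(body []byte, issuer string) (string, error) {
	var meta struct {
		Issuer  string `json:"issuer"`
		JWKSURI string `json:"jwks_uri"`
	}
	if err := json.Unmarshal(body, &meta); err != nil {
		return "", err
	}
	if meta.Issuer != issuer {
		return "", fmt.Errorf("metadata issuer %q does not match configured issuer %q", meta.Issuer, issuer)
	}
	if !isAbsHTTPURL(meta.JWKSURI) {
		return "", fmt.Errorf("metadata jwks_uri %q is not an absolute http(s) URL", meta.JWKSURI)
	}
	return meta.JWKSURI, nil
}

func main() {
	const issuer = "https://auth.example.com"
	docs := map[string]string{
		"matching":   `{"issuer":"https://auth.example.com","jwks_uri":"https://auth.example.com/.well-known/jwks.json"}`,
		"mismatched": `{"issuer":"https://evil.example.net","jwks_uri":"https://evil.example.net/jwks.json"}`,
		"relative":   `{"issuer":"https://auth.example.com","jwks_uri":"/.well-known/jwks.json"}`,
	}
	for _, name := range []string{"matching", "mismatched", "relative"} {
		uri, err := checkMetadata([]byte(docs[name]), issuer)
		fmt.Printf("%s: uri=%q err=%v\n", name, uri, err)
	}
}
```

The comparison is plain string equality, so a document whose `issuer` differs only by a trailing slash would also be rejected, which is consistent with the README's "byte-for-byte" wording for `issuer`.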

// discoverJWKSURI returns the issuer's jwks_uri, re-fetching the AS metadata
// at most once per metadataTTL. A failed refresh keeps serving the previously
// discovered value (traffic should not break because a metadata fetch blipped
// — key freshness is keyfunc's job, not this lookup's); only a cold cache with
// no fallback surfaces the error, which Access answers with 503.
func discoverJWKSURI(issuer string) (string, error) {
	metaMu.RLock()
	e, ok := metaCache[issuer]
	metaMu.RUnlock()
	if ok && time.Now().Before(e.expires) {
		return e.jwksURI, nil
	}

	// serialize discovery per issuer; same pattern as getJWKS
	metaInitMu.Lock()
	initMu, found := metaInit[issuer]
	if !found {
		initMu = &sync.Mutex{}
		metaInit[issuer] = initMu
	}
	metaInitMu.Unlock()

	initMu.Lock()
	defer initMu.Unlock()

	// another caller may have refreshed it while we waited for initMu
	metaMu.RLock()
	e, ok = metaCache[issuer]
	metaMu.RUnlock()
	if ok && time.Now().Before(e.expires) {
		return e.jwksURI, nil
	}

	uri, err := fetchJWKSURI(issuer)
	if err != nil {
		if ok { // stale entry: extend it rather than failing live traffic
			slog.Error("AS metadata refresh failed; keeping cached jwks_uri", "issuer", issuer, "error", err)
			uri = e.jwksURI
		} else {
			return "", err
		}
	}
	metaMu.Lock()
	metaCache[issuer] = metaEntry{jwksURI: uri, expires: time.Now().Add(metadataTTL)}
	metaMu.Unlock()
	return uri, nil
}

// resolveJWKSURI returns the JWKS endpoint to verify against: the configured
// jwks_uri, or — when it is left empty — the one discovered from the issuer's
// AS metadata. A discovery failure is an infrastructure error (503), same as
// a failed JWKS fetch.
func (conf *Config) resolveJWKSURI() (string, error) {
	if conf.JWKSURI != "" {
		return conf.JWKSURI, nil
	}
	uri, err := discoverJWKSURI(conf.Issuer)
	if err != nil {
		return "", fmt.Errorf("AS metadata discovery: %w", err)
	}
	return uri, nil
}

func (conf *Config) keyFunc(token *jwt.Token) (any, error) {
kf, err := getJWKS(conf.JWKSURI)
uri, err := conf.resolveJWKSURI()
if err != nil {
return nil, fmt.Errorf("%w: %v", errJWKSUnavailable, err)
}
kf, err := getJWKS(uri)
if err != nil {
return nil, fmt.Errorf("%w: %v", errJWKSUnavailable, err)
}