Redact git credentials at remaining URL log/serialization sites (fixes #3751)#3792
Conversation
Follow-up to #2154/#3689. Closes the remaining credential leak vectors in the git/workspace stack so URL-credential redaction is consistent. - Add `redact_url_credentials_in_text()` to sdk/utils/redact.py: a substring-capable redactor that strips `https?://user:token@` credentials embedded anywhere in a larger string. The existing `redact_url_credentials()` is anchored and only redacts when the entire string is a URL, so it misses credentials inside messages like git stderr. - RepoSource.validate_url: redact the URL before logging the http->https normalization warning (was logging user-supplied creds verbatim). - run_git_command: redact stderr before logging it and before storing it on GitCommandError. - Consolidate repo.py `_mask_url`/`_mask_token` onto the central helpers (`redact_url_credentials` / `redact_url_credentials_in_text`) so there is a single source of truth; the old `_mask_token` only string-replaced the exact token and missed URL-encoded/partial credentials. Tests cover the new helper (embedded-URL strings, multiple URLs, URL-encoded creds, path-`@` non-matches) and each leak site. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Python API breakage checks — ✅ PASSEDResult: ✅ PASSED |
REST API breakage checks (OpenAPI) — ✅ PASSEDResult: ✅ PASSED |
Coverage Report •
|
||||||||||||||||||||||||||||||||||||||||
|
✅ Review complete. This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here. |
all-hands-bot
left a comment
There was a problem hiding this comment.
🟢 Taste Rating: Good taste - The fix is small, centralized, and covers the remaining real leak sites without changing public behavior.
I reviewed the changed production paths and the new regression coverage. The new substring redactor handles embedded https?://user:token@host URLs in git stderr/log text, while the existing whole-URL helper remains available for URL arguments. The repo clone and git command call sites now redact before logging or persisting error text, and the tests exercise the important leak vectors.
Validation run: uv run pytest tests/sdk/utils/test_redact.py tests/sdk/git/test_url_redaction.py tests/workspace/test_cloud_workspace_repos.py -q → 118 passed.
[RISK ASSESSMENT]
- [Overall PR]
⚠️ Risk Assessment: 🟢 LOW
Narrow defense-in-depth logging/redaction change with no dependency changes and no API surface changes. The main compatibility-sensitive behavior is preserving stored repo URLs and command execution while only sanitizing log/error surfaces, which the tests cover.
VERDICT:
✅ Worth merging: Core logic is sound; merge after the remaining standard CI checks finish green.
KEY INSIGHT:
Centralizing substring URL credential redaction eliminates the remaining special-case leak paths without broadening clone or git-command behavior.
This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation
all-hands-bot
left a comment
There was a problem hiding this comment.
✅ QA Report: PASS
The PR achieves its redaction goal: I reproduced credential leaks on the base commit and verified the same SDK/git/workspace operations redact credentials on the PR commit.
Does this PR achieve its stated goal?
Yes. The stated goal was to redact credentials at the remaining URL log/exception sites in the git/workspace stack. Running an SDK consumer probe showed the base commit leaked SUPERSECRET, oauth2, and a partially echoed token in RepoSource warnings, GitCommandError.stderr/logs, and clone_repos failure logs; the PR commit redacted those same paths to http(s)://****@... while preserving expected behavior such as HTTP→HTTPS normalization and clone failure reporting.
| Phase | Result |
|---|---|
| Environment Setup | ✅ make build completed successfully and installed the uv dev environment. |
| CI Status | ⏳ Observed via gh pr checks: 23 success, 7 in progress, 3 skipped; I did not rerun tests. |
| Functional Verification | ✅ Before/after SDK probe reproduced base leaks and confirmed PR redaction. |
Functional Verification
Test 1: SDK substring redaction helper for embedded URL text
Step 1 — Reproduce / establish baseline (without the fix):
Checked out base 89fd946d6dc0046bb6902ccc1415723063d5ea95 and ran OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_redaction_probe.py.
Relevant output:
## anchored helper on embedded URL: LEAK: SUPERSECRET, oauth2
fatal: unable to access 'https://oauth2:SUPERSECRET@github.com/owner/repo.git/': 403
## substring helper availability: MISSING
ImportError
This confirms the old public helper only handled whole URL strings and did not provide substring redaction for git-style stderr text.
Step 2 — Apply the PR's changes:
Checked out PR head 34744d3fe8800265662421827bee945f9328a644.
Step 3 — Re-run with the fix in place:
Ran the same command.
Relevant output:
## anchored helper on embedded URL: LEAK: SUPERSECRET, oauth2
fatal: unable to access 'https://oauth2:SUPERSECRET@github.com/owner/repo.git/': 403
## substring helper on embedded URL: REDACTED
fatal: unable to access 'https://****@github.com/owner/repo.git/': 403
This shows the new substring-capable helper redacts embedded credentials while the anchored helper remains unchanged as described.
Test 2: RepoSource HTTP normalization warning no longer leaks credentials
Step 1 — Reproduce / establish baseline (without the fix):
On the base commit, the probe constructed RepoSource(url="http://oauth2:SUPERSECRET@github.com/owner/repo.git") as an SDK user would.
Relevant output:
## RepoSource normalization warning: LEAK: SUPERSECRET, oauth2
stored_https=True
WARNING:Converting HTTP URL to HTTPS for security: http://oauth2:SUPERSECRET@github.com/owner/repo.git
This confirms the bug: normalization worked, but the warning logged both username and token.
Step 2 — Apply the PR's changes:
Checked out PR head 34744d3fe8800265662421827bee945f9328a644.
Step 3 — Re-run with the fix in place:
Ran the same probe.
Relevant output:
## RepoSource normalization warning: REDACTED
stored_https=True
WARNING:Converting HTTP URL to HTTPS for security: http://****@github.com/owner/repo.git
This confirms the warning is now safe while preserving the HTTPS normalization behavior.
Test 3: run_git_command exception/log stderr redaction
Step 1 — Reproduce / establish baseline (without the fix):
On the base commit, the probe called run_git_command(...) against a real temporary executable that exited non-zero and wrote a git-style credential-bearing URL to stderr.
Relevant output:
## run_git_command failing stderr: LEAK: SUPERSECRET, oauth2
exception.stderr=fatal: Authentication failed for 'https://oauth2:SUPERSECRET@github.com/owner/repo.git/'
logs=ERROR:Git command failed: /tmp/tmpujnh6b7h/fake-git clone 'https://****@github.com/owner/repo.git' /tmp/tmpujnh6b7h/repo. Exit code: 128. Stderr: fatal: Authentication failed for 'https://oauth2:SUPERSECRET@github.com/owner/repo.git/'
This shows the command argument was already redacted, but stderr still leaked through both the exception payload and error log.
Step 2 — Apply the PR's changes:
Checked out PR head 34744d3fe8800265662421827bee945f9328a644.
Step 3 — Re-run with the fix in place:
Ran the same probe.
Relevant output:
## run_git_command failing stderr: REDACTED
exception.stderr=fatal: Authentication failed for 'https://****@github.com/owner/repo.git/'
logs=ERROR:Git command failed: /tmp/tmpdh06pqcc/fake-git clone 'https://****@github.com/owner/repo.git' /tmp/tmpdh06pqcc/repo. Exit code: 128. Stderr: fatal: Authentication failed for 'https://****@github.com/owner/repo.git/'
This confirms the remaining stderr leak path is fixed.
Test 4: clone_repos failure logging handles partially echoed credentials
Step 1 — Reproduce / establish baseline (without the fix):
On the base commit, the probe called clone_repos([RepoSource(url="owner/repo", provider="github")], ...) with a token fetcher returning tok@secret, exercising the clone failure path with real git behavior.
Relevant output:
## clone_repos failing stderr: LEAK: secret
success_count=0, failed_repos=['owner/repo']
logs=INFO:[clone] Cloning 1 repository(ies)...
INFO:[clone] Cloning owner/repo (github) -> repo/
WARNING:[clone] Failed: Cloning into '/tmp/tmpvmdawx3d/repos/repo'...
fatal: unable to access 'https://secret@github.com/owner/repo.git/': URL rejected: Bad hostname
This confirms the old exact token replacement missed a partially echoed credential component.
Step 2 — Apply the PR's changes:
Checked out PR head 34744d3fe8800265662421827bee945f9328a644.
Step 3 — Re-run with the fix in place:
Ran the same probe.
Relevant output:
## clone_repos failing stderr: REDACTED
success_count=0, failed_repos=['owner/repo']
logs=INFO:[clone] Cloning 1 repository(ies)...
INFO:[clone] Cloning owner/repo (github) -> repo/
WARNING:[clone] Failed: Cloning into '/tmp/tmp76vrerj1/repos/repo'...
fatal: unable to access 'https://****@github.com/owner/repo.git/': URL rejected: Bad hostname
This confirms the broader URL redaction also covers partially echoed credential strings in clone failures.
Issues Found
None.
This QA review was created by an AI agent (OpenHands) on behalf of the user.
HUMAN:
Redact git credentials at remaining URL log/serialization sites
AGENT:
Verified locally with
uv run pytest tests/sdk/utils/test_redact.py tests/sdk/git/test_url_redaction.py tests/workspace/test_cloud_workspace_repos.py→ 118 passed, and the broader
tests/sdk/git tests/sdk/plugin tests/sdk/utils tests/workspace→ 615 passed. Reproduced the original leaks before the fix andconfirmed each is now redacted (see How to Test).
Why
Follow-up to #2154 and #3689. Those redacted credentials embedded in
https://user:token@hostURLs at most logging sites and (in #3689) onGitCommandError.command. A few related leak vectors remained where auser-supplied credential could still be written to logs or persisted on an
exception. This closes them so credential redaction is consistent across the
git/workspace stack (defense-in-depth; lower severity than #3689).
Summary
redact_url_credentials_in_text()tosdk/utils/redact.py— asubstring-capable redactor (
re.subover embeddedhttps?://user:token@host).The existing
redact_url_credentials()is anchored (^...$), so it onlyredacts when the whole string is a URL and misses credentials embedded inside
a larger string such as git
stderr.RepoSource.validate_url(repo.py): redact the URL before logging thehttp://→https://normalization warning (previously logged any embeddedcredential verbatim).
run_git_command(git/utils.py): redactstderrbefore logging it andbefore storing it on
GitCommandError.repo.py_mask_url/_mask_tokenonto the central helpers sothere is a single source of truth.
_mask_tokenonly string-replaced the exacttoken value and failed for URL-encoded / partially-echoed credentials.
Issue Number
Closes #3751
How to Test
Run the tests:
Manual reproduction of the embedded-URL limitation that the new helper fixes:
Site #1 (warning no longer leaks the credential):
Video/Screenshots
N/A — logging/redaction change, covered by unit tests.
Type
Notes
redact_url_credentials()(anchored) is intentionally kept for whole-URLarguments (e.g. each
gitclone arg); the new substring helper is used wherevera credential may appear inside a larger string.
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
eclipse-temurin:17-jdknikolaik/python-nodejs:python3.13-nodejs22-slimgolang:1.21-bookwormPull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64 docker pull ghcr.io/openhands/agent-server:34744d3-pythonRun
All tags pushed for this build
About Multi-Architecture Support
34744d3-python) is a multi-arch manifest supporting both amd64 and arm6434744d3-python-amd64) are also available if needed