Skip to content

Restructure threat-model sources by controller#32

Open
ppkarwasz wants to merge 1 commit into
mainfrom
threat-model-sources-by-controller
Open

Restructure threat-model sources by controller#32
ppkarwasz wants to merge 1 commit into
mainfrom
threat-model-sources-by-controller

Conversation

@ppkarwasz

Copy link
Copy Markdown
Member

Restructures the "Data sources" section of the common threat model into a Sources section organised by who controls each input, rather than a flat trusted/untrusted split.

The three categories are defined by their controller:

  • Configuration (operator-controlled, trusted): environment variables, configuration properties, and configuration files, with the existing deployer responsibilities.
  • Structural identifiers and control (developer-controlled, trusted): logger names, levels, markers, the identifiers and field names of a structured log message (for example the RFC 5424 MSGID and SD-ID fields), and the format string of a parameterized log statement.
  • Content (user-controlled, untrusted): log messages, the string representation of log parameters, and thread context values.

Why

Triaging recent reports showed that reporters have a difficulty into classifying sources correctly, such as the structural identifiers of a structured message. Organising sources by controller makes the trust classification explicit and gives a clean basis for deciding which reports are in scope.

It also lets the model state the remediation rule that follows from the classification: trusted inputs may be rejected (fail-fast, for example by throwing), while user-controlled content must be accepted and never rejected, since rejecting it would turn a malicious value into a denial of service.

Thread context keys

Thread context values are classified as untrusted content. The trust level of thread context keys is deferred to discussion apache/logging-log4j2#4132 and marked in the document as a known open gap, rather than silently choosing a side.

Dependent changes

  • The adversary-capabilities section now references the new content subsection, drops thread context keys from the in-scope channels pending the discussion above, and marks an adversary who controls developer-controlled structural or control inputs as out of scope.
  • Internal cross-references updated to the new subsection anchors; the parent #threat-common-sources anchor is retained.

Out of scope for this PR

Sink-side concepts (structured versus unstructured layouts, active versus passive sinks, and downstream-destination trust) are intentionally left for a follow-up PR.

Replace the "Data sources" section's trusted/untrusted split with a "Sources" section organised by who controls each input: configuration (operator-controlled), structural identifiers and control (developer-controlled), and content (user-controlled). This makes the trust classification explicit and gives a clean basis for triaging which reports are in scope.

Thread context values are classified as untrusted content. The trust level of thread context keys is deferred to discussion apache/logging-log4j2#4132 and marked as a known open gap.

The adversary-capabilities section now references the new content subsection, drops thread context keys from the in-scope channels pending that discussion, and marks an adversary who controls developer-controlled structural or control inputs as out of scope.

Sink concepts (structured vs unstructured layouts, active vs passive sinks, downstream-destination trust) are intentionally left for a follow-up PR.

Assisted-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR restructures the "Data sources" section of the common threat model document into a "Sources" section organized by who controls each input (operator/developer/user), making the trust classification explicit. It also aligns the adversary-capabilities section with the new categories and defers the classification of thread context keys to an external discussion, marking it as a known open gap.

Changes:

  • Replace the flat trusted/untrusted split with three controller-based subsections: Configuration (operator), Structural identifiers and control (developer), and Content (user).
  • Add an explicit remediation rule (trusted inputs may be rejected; user-controlled content must not be rejected) and move the deserialization/object-stringification guidance under Content.
  • Update adversary-capability cross-references to the new anchors, drop thread context keys from in-scope channels, and mark developer-controlled structural/control inputs as out of scope.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@FreeAndNil FreeAndNil left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should put the string conversion topic in a separate note to move it a little bit away from the bullet points before about what we don't trust?

* The logging frameworks do not trust neither the keys nor the values in the thread context.
* They **do not** trust the **values** stored in the thread context.

The frameworks **trust** that the objects passed to a log statement can be safely converted to strings.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The frameworks **trust** that the objects passed to a log statement can be safely converted to strings.
[NOTE]
====
Although the frameworks accept arbitrary content, they **trust** that the objects passed to a log statement can be safely converted to strings.

Maybe we should put this in a separate note to move it a little bit away from the bullet points before about what we don't trust?


The frameworks **trust** that the objects passed to a log statement can be safely converted to strings.
They **should not** be used to log deserialized data from untrusted sources; see https://owasp.org/www-community/vulnerabilities/Deserialization_of_untrusted_data[the related OWASP guide].

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
====

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants