Skip to content

feat: CDK Comprehensive Validation RFC#899

Open
kaizencc wants to merge 16 commits intomainfrom
conroy/validation
Open

feat: CDK Comprehensive Validation RFC#899
kaizencc wants to merge 16 commits intomainfrom
conroy/validation

Conversation

@kaizencc
Copy link
Copy Markdown
Contributor

@kaizencc kaizencc commented Apr 1, 2026

This is a request for comments about CDK Comprehensive Validation. See #897 for
additional details.

APIs are signed off by @rix0rrr .


By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache-2.0 license

Comment on lines +7 to +9
We will shift left failures that occur during AWS CloudFormation deployment.
We will validate as much as possible offline during cdk synth
and supplement with validation that occurs at the beginning of cdk deploy.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This reads strange. Is it tenets? Is it a blog post intro?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

supposed to be 1-2 sentences to describe the feature, i have edited it to be that as the rest of the information is duplicated elsewhere.

and supplement with validation that occurs at the beginning of cdk deploy.

Today, validation happens when your CDK app is built at synth time
and when your CFN ChangeSet is created at deploy time (by default).
Copy link
Copy Markdown
Contributor

@mrgrain mrgrain Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Not just for change sets. and what does by default mean?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i assume the default changeset deploy mode (instead of --direct). but agree its not clear.
also did you confirm that direct mode (createstack etc) does not have validation?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my understanding of our validation posture today for our default setup is that we validate

  • at compile time during synth
  • when we create change set during deploy

there are other validations like annotations and policy-as-code validations but these are supply your own. we have some hand-written annotations, sure, and they get executed post synth as well.

i will be updating the RFC to include the holistic picture of tools customers use to validate their CDK code and how Comprehensive Validation fits in.

Copy link
Copy Markdown
Contributor Author

@kaizencc kaizencc Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i assume the default changeset deploy mode (instead of --direct). but agree its not clear.
also did you confirm that direct mode (createstack etc) does not have validation?

CFN Early Validation happens when change set is created so --direct does not validate that way as it does not create a change set.

offline rule checks, construct library errors, and CloudFormation changeset validation —
into a single invocation.

##### Three Layers of Defense
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's awesome!

* **CFN Early Validation (existing)** — During `cdk deploy`, CloudFormation validates your changeset
before execution, catching issues like resources that already exist.

##### Offline Validation
Copy link
Copy Markdown
Contributor

@mrgrain mrgrain Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really dislike that name lol. Is this really different than policy as code?
Maybe Local Validation? Super Early Validation?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leaving it as "offline" and "online" until we have a better naming system. i don't share your distaste but am happy to workshop. i feel strongly that post-synthesis validation and pre-deploy validation should have monikers that relate to each other, the way "offline" and "online" do.

and I think CFN Early Validation is a misnomer in the world of CDK since it does not happen "early" at all. so i would like to steer away from "Early" and rename our pre-deploy validation something different, even if it just a wrapper around CFN Early Validation.

Comment on lines +108 to +110
If you are already using the existing [Policy Validation](https://docs.aws.amazon.com/cdk/v2/guide/policy-validation-synthesis.html) plugin system
to integrate external tools like CloudFormation Guard,
those plugins will continue to work alongside the new built-in engine.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought: This doesn't feel enough. We should be thinking more on how these tools integrate and relate to each other. Why do we have two different things doing the same stuff?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree, this is probably the most important thing this rfc needs to clarify (currently missing). Right now it reads as "don't worry, your existing stuff still works" — but that sidesteps the real question: why are we building a second validation system that overlaps with the first? how do we explain the relationship between the existing CDK Policy Validation and this new Comprehensive Validation? Is Policy Validation the "bring your own engine" extensibility point and Comprehensive Validation the "works out of the box" default? Whats our position going forward, maintaining both?

user should be at least clear about which to use when.

(at Resources/MyLambdaFunction)

// Annotation Errors
[error] [blocker] MyOwnError: Bucket versioning is not enabled
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not crazy about the naming, but good direction

```

Rules with the `.rego` file extension will be automatically loaded into the validation for
that CDK stack or app.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from where?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from the file or directory you've supplied in the sources property

Comment on lines +471 to +477
#### Validations:

* [Existing] [cdk synth] Construct Library Validation — handwritten errors that occur during synthesis
* [New] [cdk synth] Offline Validation — synthesized CloudFormation Template
is evaluated against both base and custom rules
* [Existing] [cdk deploy] CFN Early Validation — CFN changesets are validated
* [Existing] [cdk deploy] CFN Deploy-Time Errors — Errors that occur during CFN deployment
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is missing so many other things

Comment on lines +525 to +531
3. Use the existing CDK Policy Validation plugin system: rejected because it requires
external tool installation (e.g. CloudFormation Guard CLI, OPA binary),
does not resolve CloudFormation intrinsic functions,
does not ship with a default rule set,
and adds friction for teams that just want validation to work out of the box.
The existing plugin system will continue to work alongside the new built-in engine
for teams that need integration with specific external tools.
Copy link
Copy Markdown
Contributor

@mrgrain mrgrain Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't really a reason, because it's all things we can fix:

  • we can bring our own engine (instead of building it inside the Toolkit), in fact later on you call out a wasm based engine.
  • If we can resolve intrinsics in the Toolkit, we can bring it to that engine
  • we can ship a default ruleset
  • we can make the default rules work out of the bo

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the points/goals of the current policy validation was that you would bring your own engine, so if we start supplying our engine then that is substantially different already. Which also means we can't really do anything to preprocess the data (because your engine already expects a certain format, most likely a CFN template), and we can't ship a default ruleset into your engine.

We could do "our engine is just another validation plugin, one that happens to be enabled by default", but that runs into implementation location concerns I would say (part of the toolkit or part of the library?)

Also I'm not sure if we are "just another plugin" we would want you to be able to disable it, because we probably have L1 validations in our engine that are going to replace handwritten L2 validations, and we would want those to always take effect.


Having said that, I do agree that it's worth doing a slightly deeper dive into the current API/system and seeing how they synergize: are their APIs that carry over, can apply to both, would not apply to both, etc.

Copy link
Copy Markdown
Contributor

@rix0rrr rix0rrr Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, we could consider moving the current Policy-as-Code mechanism to the toolkit as well.

Momo made a good point that we should have a look at the original Policy-as-Code RFC and see if there's considerations or motivations in there that we could learn from while designing this.

Maybe it's Fine(tm) as a new default dependency of the construct library. What are your thoughts @kaizencc ?

Copy link
Copy Markdown
Contributor

@rix0rrr rix0rrr Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we put the engine into the toolkit, then how do the [default] rules make it to the engine? They should probably be version-locked to the library, or separately versioned by the user. We wouldn't want a CLI update to all of a sudden tell a user "error error you need to add suppression to your source code". (On the other hand, if we tie the rules to the library version, a library update would do the same, and it would still be considered breaking. So is that desirable? Do we need an additional method of versioning the rules?)

How big are they? Can we put them into every cloud assembly without blowing up the size there? Or can we put a version reference into the cloud assembly if they are large?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(To a first approximation I guess the rules will be ~15-20MB, judging by the size of a dump of the Uluru schema)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amortizing the engine startup cost in a potential future LSP/cdk watch might be an argument one way or the other.

Today, validation happens when your CDK app is built at synth time
and when your CFN ChangeSet is created at deploy time (by default).
We will add a offline validation hook to shift left failures
that previously surface during AWS CloudFormation Deployment;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the offline validation already happened in cfn deployment today? in which operation?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

errors surface during CFN Deployment (a subsection of cdk deploy, i guess). offline validation catches those errors earlier in the process (during cdk synth, which always proceeds deploy). i feel like this sentence is clear but if others disagree i can workshop it

It ships with a comprehensive default rule set that checks for common misconfigurations
including invalid property values, deprecated runtimes, overly permissive IAM policies,
missing encryption, and cross-resource dependency issues.
The engine is compiled to WebAssembly and adds under 1 second to synthesis time.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not super comfortable of committing this 1s, we can specify it after ur research

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

turned this to 'Y' until we have more information


CDK Comprehensive Validation is available today.
Upgrade to the latest AWS CDK CLI and run `cdk synth` —
offline validation runs automatically.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can offline validation be disabled during cdk synth?

Comment on lines +108 to +110
If you are already using the existing [Policy Validation](https://docs.aws.amazon.com/cdk/v2/guide/policy-validation-synthesis.html) plugin system
to integrate external tools like CloudFormation Guard,
those plugins will continue to work alongside the new built-in engine.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree, this is probably the most important thing this rfc needs to clarify (currently missing). Right now it reads as "don't worry, your existing stuff still works" — but that sidesteps the real question: why are we building a second validation system that overlaps with the first? how do we explain the relationship between the existing CDK Policy Validation and this new Comprehensive Validation? Is Policy Validation the "bring your own engine" extensibility point and Comprehensive Validation the "works out of the box" default? Whats our position going forward, maintaining both?

user should be at least clear about which to use when.

```

```bash
cdk synth --custom-rules ./my-local-rules
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had envisioned custom rules as being defined in the CDK app.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean custom rules written in TS or rego files referenced in the CDK App? I explored the latter in the 'alternatives considered' section here. The tl;dr: is that suppressions have a construct scope Validations.of(construct).acknowledge() but I didn't think it made sense for custom rules to have the same scope: Validations.of(construct).addRules() because what would it mean to add a rule onto a specific CDK construct?

reading this, i'm now thinking it can make sense and we just need to glue a filter to that rule to limit it to the specific resourceId via construct metatdata when we provide it to the engine. is that what you were envisioning?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean custom rules written in TS or rego files referenced in the CDK App?

Either or both are fine. We can imagine how we would have constructs that synthesize to rego/guard rules. But let's start with the simple thing, and do the complex thing later.

This will be deprecated in favor of a unified `Validations.of()` API that covers
suppression for all types of CDK errors and warnings.

The main `Validations` method will be `acknowledge`,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need a layer of indirection for this plugin, specially. We should make it configurable like any other plugin:

const app = new App({
  policyValidation: [
    // By default, this plugin is enabled with default configuration.
    // This call creates a new instance with custom configuration.
    OfflineValidationPlugin.configure({
	    customRules: [...],
	    exemptions: [...],
	    // ...whatever else 
    })
  ],
});

If we really need to restrict validation to a given sub-tree, this Validation API should be applicable to any plugin (including third party ones). Something like:

Annotations.of(construct).acknowledge({
	pluginName: 'cfn-offline-validation', 
	ruleIds: [...]
});

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is, we are the ones constructing the OfflineValidationPlugin because its not meant to be turned off. That's why the API proposed focuses on exposing custom rules and suppressions as a separate API that we pipe into the plugin. To do what you're proposing, we'd need a way of differentiating when a user supplies a "custom" OfflineValidationPlugin the override the default and how do we verify that this "custom" plugin does what its supposed to? The plugin owns the validate command which does all the heavy lifting, so we'd need to make sure that method isn't tampered with. All that is to say, allowing for custom plugins this way is a can of worms.

Agree that the validation API should be generic enough to support other validation plugins, but I don't see a use case for that yet. At any rate, i'll leave pluginName? as an option on acknowledge

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's why I'm suggesting a static factory method. This would be the only way to instantiate this plugin. And once it's called, we can mutate some private state to indicate to the framework that we are in custom mode rather than default mode. We are still in control because all the user can do is provide some pre-defined configuration parameters, but the implementation is our own. And it will be always on: either in custom or default mode.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see and its a good idea that I hadn't considered. However --

I think with this we are conflicting different layers of abstractions here. if the config layer (i.e. custom rules + suppressions) lives on the OfflineValidationPlugin, then they cannot be used to encompass Annotation suppressions and the customer must know what kind of validation issue they are receiving when it shows up. It feels much cleaner to have a unified API to handle both these types of suppressions.

Copy link
Copy Markdown

@go-to-k go-to-k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like this feature! I've left a few comments on things I'm curious about.

we recommend publishing them in your package manager of choice.
Rego files are plain text — no compilation or runtime is involved.

They can be imported via your CDK App via a new `Validations.of()` API:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the traditional CDK, helper classes within core modules followed a naming convention of Xxxs / IXxx, as shown below:

  • Aspects / IAspect
  • PropertyInjectors / IPropertyInjector
  • Mixins / IMixin

However, the Validations mentioned here are not classes that handle IValidation, are they? (IValidation is an interface used with the addValidation method.)

In other words, this does not follow the existing naming convention. I thought it might be a bit confusing, but is this acceptable?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch!

Comment on lines +83 to +86
* **Offline Validation (NEW)** — Immediately after synthesis, the new built-in validation engine evaluates
your CloudFormation template against hundreds of rules. Unlike external tools, this engine resolves
CloudFormation intrinsic functions natively, so it can catch issues that currently available CloudFormation
analysis tools miss.
Copy link
Copy Markdown

@go-to-k go-to-k Apr 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What specific types of issues can be caught by resolving CloudFormation intrinsic functions natively?

Also, since this has an "offline" prefix, it shouldn't perform any network communication with SDK, so does that mean it doesn't validate the information returned by the actual resource such as !Ref and !GetAtt?

Comment on lines +283 to +285
Offline Validation will be built natively into the CDK Toolkit. The
`synth` method (and other methods that synthesize) will automatically validate.
A new `validate` method will be available that mirrors the `cdk validate` CLI command.
Copy link
Copy Markdown

@go-to-k go-to-k Apr 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is included by default, is it possible that a CDK application that was previously successful could start failing after upgrading the CDK CLI version including this feature? Or does it mean that by default no rules are executed, or perhaps only warnings are issued?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants