Conversation
| We will shift left failures that occur during AWS CloudFormation deployment. | ||
| We will validate as much as possible offline during cdk synth | ||
| and supplement with validation that occurs at the beginning of cdk deploy. |
There was a problem hiding this comment.
nit: This reads strange. Is it tenets? Is it a blog post intro?
There was a problem hiding this comment.
supposed to be 1-2 sentences to describe the feature, i have edited it to be that as the rest of the information is duplicated elsewhere.
| and supplement with validation that occurs at the beginning of cdk deploy. | ||
|
|
||
| Today, validation happens when your CDK app is built at synth time | ||
| and when your CFN ChangeSet is created at deploy time (by default). |
There was a problem hiding this comment.
nit: Not just for change sets. and what does by default mean?
There was a problem hiding this comment.
i assume the default changeset deploy mode (instead of --direct). but agree its not clear.
also did you confirm that direct mode (createstack etc) does not have validation?
There was a problem hiding this comment.
my understanding of our validation posture today for our default setup is that we validate
- at compile time during synth
- when we create change set during deploy
there are other validations like annotations and policy-as-code validations but these are supply your own. we have some hand-written annotations, sure, and they get executed post synth as well.
i will be updating the RFC to include the holistic picture of tools customers use to validate their CDK code and how Comprehensive Validation fits in.
There was a problem hiding this comment.
i assume the default changeset deploy mode (instead of --direct). but agree its not clear.
also did you confirm that direct mode (createstack etc) does not have validation?
CFN Early Validation happens when change set is created so --direct does not validate that way as it does not create a change set.
| offline rule checks, construct library errors, and CloudFormation changeset validation — | ||
| into a single invocation. | ||
|
|
||
| ##### Three Layers of Defense |
| * **CFN Early Validation (existing)** — During `cdk deploy`, CloudFormation validates your changeset | ||
| before execution, catching issues like resources that already exist. | ||
|
|
||
| ##### Offline Validation |
There was a problem hiding this comment.
I really dislike that name lol. Is this really different than policy as code?
Maybe Local Validation? Super Early Validation?
There was a problem hiding this comment.
leaving it as "offline" and "online" until we have a better naming system. i don't share your distaste but am happy to workshop. i feel strongly that post-synthesis validation and pre-deploy validation should have monikers that relate to each other, the way "offline" and "online" do.
and I think CFN Early Validation is a misnomer in the world of CDK since it does not happen "early" at all. so i would like to steer away from "Early" and rename our pre-deploy validation something different, even if it just a wrapper around CFN Early Validation.
| If you are already using the existing [Policy Validation](https://docs.aws.amazon.com/cdk/v2/guide/policy-validation-synthesis.html) plugin system | ||
| to integrate external tools like CloudFormation Guard, | ||
| those plugins will continue to work alongside the new built-in engine. |
There was a problem hiding this comment.
thought: This doesn't feel enough. We should be thinking more on how these tools integrate and relate to each other. Why do we have two different things doing the same stuff?
There was a problem hiding this comment.
agree, this is probably the most important thing this rfc needs to clarify (currently missing). Right now it reads as "don't worry, your existing stuff still works" — but that sidesteps the real question: why are we building a second validation system that overlaps with the first? how do we explain the relationship between the existing CDK Policy Validation and this new Comprehensive Validation? Is Policy Validation the "bring your own engine" extensibility point and Comprehensive Validation the "works out of the box" default? Whats our position going forward, maintaining both?
user should be at least clear about which to use when.
| (at Resources/MyLambdaFunction) | ||
|
|
||
| // Annotation Errors | ||
| [error] [blocker] MyOwnError: Bucket versioning is not enabled |
There was a problem hiding this comment.
not crazy about the naming, but good direction
| ``` | ||
|
|
||
| Rules with the `.rego` file extension will be automatically loaded into the validation for | ||
| that CDK stack or app. |
There was a problem hiding this comment.
from the file or directory you've supplied in the sources property
| #### Validations: | ||
|
|
||
| * [Existing] [cdk synth] Construct Library Validation — handwritten errors that occur during synthesis | ||
| * [New] [cdk synth] Offline Validation — synthesized CloudFormation Template | ||
| is evaluated against both base and custom rules | ||
| * [Existing] [cdk deploy] CFN Early Validation — CFN changesets are validated | ||
| * [Existing] [cdk deploy] CFN Deploy-Time Errors — Errors that occur during CFN deployment |
There was a problem hiding this comment.
this is missing so many other things
| 3. Use the existing CDK Policy Validation plugin system: rejected because it requires | ||
| external tool installation (e.g. CloudFormation Guard CLI, OPA binary), | ||
| does not resolve CloudFormation intrinsic functions, | ||
| does not ship with a default rule set, | ||
| and adds friction for teams that just want validation to work out of the box. | ||
| The existing plugin system will continue to work alongside the new built-in engine | ||
| for teams that need integration with specific external tools. |
There was a problem hiding this comment.
This isn't really a reason, because it's all things we can fix:
- we can bring our own engine (instead of building it inside the Toolkit), in fact later on you call out a wasm based engine.
- If we can resolve intrinsics in the Toolkit, we can bring it to that engine
- we can ship a default ruleset
- we can make the default rules work out of the bo
There was a problem hiding this comment.
One of the points/goals of the current policy validation was that you would bring your own engine, so if we start supplying our engine then that is substantially different already. Which also means we can't really do anything to preprocess the data (because your engine already expects a certain format, most likely a CFN template), and we can't ship a default ruleset into your engine.
We could do "our engine is just another validation plugin, one that happens to be enabled by default", but that runs into implementation location concerns I would say (part of the toolkit or part of the library?)
Also I'm not sure if we are "just another plugin" we would want you to be able to disable it, because we probably have L1 validations in our engine that are going to replace handwritten L2 validations, and we would want those to always take effect.
Having said that, I do agree that it's worth doing a slightly deeper dive into the current API/system and seeing how they synergize: are their APIs that carry over, can apply to both, would not apply to both, etc.
There was a problem hiding this comment.
Also, we could consider moving the current Policy-as-Code mechanism to the toolkit as well.
Momo made a good point that we should have a look at the original Policy-as-Code RFC and see if there's considerations or motivations in there that we could learn from while designing this.
Maybe it's Fine(tm) as a new default dependency of the construct library. What are your thoughts @kaizencc ?
There was a problem hiding this comment.
If we put the engine into the toolkit, then how do the [default] rules make it to the engine? They should probably be version-locked to the library, or separately versioned by the user. We wouldn't want a CLI update to all of a sudden tell a user "error error you need to add suppression to your source code". (On the other hand, if we tie the rules to the library version, a library update would do the same, and it would still be considered breaking. So is that desirable? Do we need an additional method of versioning the rules?)
How big are they? Can we put them into every cloud assembly without blowing up the size there? Or can we put a version reference into the cloud assembly if they are large?
There was a problem hiding this comment.
(To a first approximation I guess the rules will be ~15-20MB, judging by the size of a dump of the Uluru schema)
There was a problem hiding this comment.
Amortizing the engine startup cost in a potential future LSP/cdk watch might be an argument one way or the other.
| Today, validation happens when your CDK app is built at synth time | ||
| and when your CFN ChangeSet is created at deploy time (by default). | ||
| We will add a offline validation hook to shift left failures | ||
| that previously surface during AWS CloudFormation Deployment; |
There was a problem hiding this comment.
the offline validation already happened in cfn deployment today? in which operation?
There was a problem hiding this comment.
errors surface during CFN Deployment (a subsection of cdk deploy, i guess). offline validation catches those errors earlier in the process (during cdk synth, which always proceeds deploy). i feel like this sentence is clear but if others disagree i can workshop it
| It ships with a comprehensive default rule set that checks for common misconfigurations | ||
| including invalid property values, deprecated runtimes, overly permissive IAM policies, | ||
| missing encryption, and cross-resource dependency issues. | ||
| The engine is compiled to WebAssembly and adds under 1 second to synthesis time. |
There was a problem hiding this comment.
not super comfortable of committing this 1s, we can specify it after ur research
There was a problem hiding this comment.
turned this to 'Y' until we have more information
|
|
||
| CDK Comprehensive Validation is available today. | ||
| Upgrade to the latest AWS CDK CLI and run `cdk synth` — | ||
| offline validation runs automatically. |
There was a problem hiding this comment.
can offline validation be disabled during cdk synth?
| If you are already using the existing [Policy Validation](https://docs.aws.amazon.com/cdk/v2/guide/policy-validation-synthesis.html) plugin system | ||
| to integrate external tools like CloudFormation Guard, | ||
| those plugins will continue to work alongside the new built-in engine. |
There was a problem hiding this comment.
agree, this is probably the most important thing this rfc needs to clarify (currently missing). Right now it reads as "don't worry, your existing stuff still works" — but that sidesteps the real question: why are we building a second validation system that overlaps with the first? how do we explain the relationship between the existing CDK Policy Validation and this new Comprehensive Validation? Is Policy Validation the "bring your own engine" extensibility point and Comprehensive Validation the "works out of the box" default? Whats our position going forward, maintaining both?
user should be at least clear about which to use when.
| ``` | ||
|
|
||
| ```bash | ||
| cdk synth --custom-rules ./my-local-rules |
There was a problem hiding this comment.
I had envisioned custom rules as being defined in the CDK app.
There was a problem hiding this comment.
Do you mean custom rules written in TS or rego files referenced in the CDK App? I explored the latter in the 'alternatives considered' section here. The tl;dr: is that suppressions have a construct scope Validations.of(construct).acknowledge() but I didn't think it made sense for custom rules to have the same scope: Validations.of(construct).addRules() because what would it mean to add a rule onto a specific CDK construct?
reading this, i'm now thinking it can make sense and we just need to glue a filter to that rule to limit it to the specific resourceId via construct metatdata when we provide it to the engine. is that what you were envisioning?
There was a problem hiding this comment.
Do you mean custom rules written in TS or rego files referenced in the CDK App?
Either or both are fine. We can imagine how we would have constructs that synthesize to rego/guard rules. But let's start with the simple thing, and do the complex thing later.
| This will be deprecated in favor of a unified `Validations.of()` API that covers | ||
| suppression for all types of CDK errors and warnings. | ||
|
|
||
| The main `Validations` method will be `acknowledge`, |
There was a problem hiding this comment.
I don't think we need a layer of indirection for this plugin, specially. We should make it configurable like any other plugin:
const app = new App({
policyValidation: [
// By default, this plugin is enabled with default configuration.
// This call creates a new instance with custom configuration.
OfflineValidationPlugin.configure({
customRules: [...],
exemptions: [...],
// ...whatever else
})
],
});If we really need to restrict validation to a given sub-tree, this Validation API should be applicable to any plugin (including third party ones). Something like:
Annotations.of(construct).acknowledge({
pluginName: 'cfn-offline-validation',
ruleIds: [...]
});There was a problem hiding this comment.
The problem is, we are the ones constructing the OfflineValidationPlugin because its not meant to be turned off. That's why the API proposed focuses on exposing custom rules and suppressions as a separate API that we pipe into the plugin. To do what you're proposing, we'd need a way of differentiating when a user supplies a "custom" OfflineValidationPlugin the override the default and how do we verify that this "custom" plugin does what its supposed to? The plugin owns the validate command which does all the heavy lifting, so we'd need to make sure that method isn't tampered with. All that is to say, allowing for custom plugins this way is a can of worms.
Agree that the validation API should be generic enough to support other validation plugins, but I don't see a use case for that yet. At any rate, i'll leave pluginName? as an option on acknowledge
There was a problem hiding this comment.
That's why I'm suggesting a static factory method. This would be the only way to instantiate this plugin. And once it's called, we can mutate some private state to indicate to the framework that we are in custom mode rather than default mode. We are still in control because all the user can do is provide some pre-defined configuration parameters, but the implementation is our own. And it will be always on: either in custom or default mode.
There was a problem hiding this comment.
I see and its a good idea that I hadn't considered. However --
I think with this we are conflicting different layers of abstractions here. if the config layer (i.e. custom rules + suppressions) lives on the OfflineValidationPlugin, then they cannot be used to encompass Annotation suppressions and the customer must know what kind of validation issue they are receiving when it shows up. It feels much cleaner to have a unified API to handle both these types of suppressions.
go-to-k
left a comment
There was a problem hiding this comment.
I really like this feature! I've left a few comments on things I'm curious about.
| we recommend publishing them in your package manager of choice. | ||
| Rego files are plain text — no compilation or runtime is involved. | ||
|
|
||
| They can be imported via your CDK App via a new `Validations.of()` API: |
There was a problem hiding this comment.
In the traditional CDK, helper classes within core modules followed a naming convention of Xxxs / IXxx, as shown below:
- Aspects / IAspect
- PropertyInjectors / IPropertyInjector
- Mixins / IMixin
However, the Validations mentioned here are not classes that handle IValidation, are they? (IValidation is an interface used with the addValidation method.)
In other words, this does not follow the existing naming convention. I thought it might be a bit confusing, but is this acceptable?
| * **Offline Validation (NEW)** — Immediately after synthesis, the new built-in validation engine evaluates | ||
| your CloudFormation template against hundreds of rules. Unlike external tools, this engine resolves | ||
| CloudFormation intrinsic functions natively, so it can catch issues that currently available CloudFormation | ||
| analysis tools miss. |
There was a problem hiding this comment.
What specific types of issues can be caught by resolving CloudFormation intrinsic functions natively?
Also, since this has an "offline" prefix, it shouldn't perform any network communication with SDK, so does that mean it doesn't validate the information returned by the actual resource such as !Ref and !GetAtt?
| Offline Validation will be built natively into the CDK Toolkit. The | ||
| `synth` method (and other methods that synthesize) will automatically validate. | ||
| A new `validate` method will be available that mirrors the `cdk validate` CLI command. |
There was a problem hiding this comment.
If this is included by default, is it possible that a CDK application that was previously successful could start failing after upgrading the CDK CLI version including this feature? Or does it mean that by default no rules are executed, or perhaps only warnings are issued?
This is a request for comments about CDK Comprehensive Validation. See #897 for
additional details.
APIs are signed off by @rix0rrr .
By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache-2.0 license