sh-cloud-software/cloudwatch-metrics-insights-query-alarm-cdk-example

CloudWatch Metrics Insights Query CDK Example

This CDK example demonstrates how to use CloudWatch Metrics Insights queries to create a single alarm that monitors multiple Lambda functions based on resource tags. This approach solves a common problem when working with large CloudFormation stacks: hitting resource limits when creating individual alarms for many resources.

The Problem

In large-scale AWS applications with many Lambda functions, it's common to want standard monitoring for metrics like Errors, Throttles, Duration, etc. The traditional approach is to create 2-3 alarms per Lambda function:

// Traditional approach - creates separate alarms for each function
const function1 = new lambda.Function(this, 'Function1', {...});
const function2 = new lambda.Function(this, 'Function2', {...});
const function3 = new lambda.Function(this, 'Function3', {...});

// This quickly adds up to many CloudFormation resources!
new cloudwatch.Alarm(this, 'Function1ErrorAlarm', {...});
new cloudwatch.Alarm(this, 'Function1ThrottleAlarm', {...});
new cloudwatch.Alarm(this, 'Function2ErrorAlarm', {...});
new cloudwatch.Alarm(this, 'Function2ThrottleAlarm', {...});
new cloudwatch.Alarm(this, 'Function3ErrorAlarm', {...});
new cloudwatch.Alarm(this, 'Function3ThrottleAlarm', {...});

The Issue: CloudFormation stacks have a 500 resource limit. With 100 Lambda functions and 3 alarms each, you've already consumed 400 resources (100 functions plus 300 alarms), not even counting the IAM role and policy resources.
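
The arithmetic can be sketched as follows. This is only an illustration: the helper names are hypothetical, and the second function assumes one shared, query-based alarm per monitored metric, which is the approach this README goes on to describe.

```typescript
// CloudFormation's per-stack resource limit, as stated above.
const STACK_RESOURCE_LIMIT = 500;

// Resources used when every function gets its own set of alarms.
function perFunctionAlarmResources(functionCount: number, alarmsPerFunction: number): number {
  return functionCount + functionCount * alarmsPerFunction;
}

// Resources used when one shared alarm per metric covers all functions
// (assumption: one query-based alarm per monitored metric).
function sharedAlarmResources(functionCount: number, monitoredMetrics: number): number {
  return functionCount + monitoredMetrics;
}

const traditional = perFunctionAlarmResources(100, 3); // 100 + 300 = 400
const shared = sharedAlarmResources(100, 3); // 100 + 3 = 103

// Remaining room in the stack for everything else (IAM roles, policies, ...).
const headroomTraditional = STACK_RESOURCE_LIMIT - traditional; // 100
const headroomShared = STACK_RESOURCE_LIMIT - shared; // 397
```

IAM roles, policies, and other supporting resources are left out of both counts, so real stacks reach the limit sooner than these numbers suggest.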

Typical Alternative: Separate Automation Function

Deploy a Lambda function that listens to CloudFormation stack events and automatically creates and deletes CloudWatch alarms.

Drawbacks:

  • Requires additional Lambda function and event handling logic
  • Alarms are created outside of CloudFormation (not infrastructure-as-code)
  • Complex state management and synchronization
  • Harder to audit and version control
  • Additional maintenance burden

The Better Solution: Metrics Insights Queries

CloudWatch Metrics Insights queries allow you to create a single alarm that monitors multiple resources using SQL-like queries with tag-based filtering. This feature was announced in September 2025.

Benefits:

  • Single alarm monitors multiple resources - drastically reduces CloudFormation resource count
  • Tag-based filtering - easily group resources by team, environment, criticality, etc.
  • Simple and maintainable - no additional automation logic needed
  • Flexible querying - SQL-like syntax with aggregations and grouping

What This Example Does

This CDK stack demonstrates the Metrics Insights approach:

  1. Creates a Lambda function using the NodejsFunction construct (automatically bundles TypeScript)
  2. Tags the function with errorMetric=high to indicate it should be monitored
  3. Creates a CloudWatch alarm using a Metrics Insights query that:
    • Selects the sum of Errors metric from all Lambda functions
    • Filters to only functions tagged with errorMetric=high
    • Groups results by the CloudFormation logical ID
    • Orders by error count (descending)

The Metrics Insights Query

Example of CloudWatch Metrics Insights Query

SELECT SUM(Errors) FROM "AWS/Lambda" WHERE tag."errorMetric" = 'high' GROUP BY tag."aws:cloudformation:logical-id" ORDER BY SUM() DESC

This single query monitors errors across all Lambda functions with the errorMetric=high tag, regardless of how many you have. It returns one time series per value of aws:cloudformation:logical-id, ordered by the sum of errors, descending.
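
If you want the same query shape for other metrics or tags, a small helper can assemble the string. The following is a minimal sketch: `TagQueryOptions` and `buildTagQuery` are hypothetical names, not part of CloudWatch or the CDK, and the helper rejects quote characters rather than guessing at an escaping rule.

```typescript
// Hypothetical options type; the field names are illustrative only.
interface TagQueryOptions {
  namespace: string;
  metricName: string;
  statistic: 'SUM' | 'AVG' | 'MIN' | 'MAX' | 'COUNT';
  tagKey: string;
  tagValue: string;
  groupByTag: string;
}

// Builds a Metrics Insights query with the same clauses as the query above.
function buildTagQuery(o: TagQueryOptions): string {
  // Refuse values containing quotes instead of inventing an escaping scheme.
  for (const value of [o.namespace, o.tagKey, o.tagValue, o.groupByTag]) {
    if (/['"]/.test(value)) {
      throw new Error(`Quote characters are not supported in: ${value}`);
    }
  }
  return [
    `SELECT ${o.statistic}(${o.metricName})`,
    `FROM "${o.namespace}"`,
    `WHERE tag."${o.tagKey}" = '${o.tagValue}'`,
    `GROUP BY tag."${o.groupByTag}"`,
    `ORDER BY ${o.statistic}() DESC`,
  ].join(' ');
}

// Reproduces the query from this section as a plain string.
const errorQuery = buildTagQuery({
  namespace: 'AWS/Lambda',
  metricName: 'Errors',
  statistic: 'SUM',
  tagKey: 'errorMetric',
  tagValue: 'high',
  groupByTag: 'aws:cloudformation:logical-id',
});
```

Changing `metricName` to `Throttles`, for example, would give the equivalent query for throttling, assuming the same tag is used to select the functions.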

The Resulting Alarm

Example of a CloudWatch alarm with contributors, using a Metrics Insights query as its source

If you attach an action to the alarm, for example one that publishes to an SNS topic with an email subscription, you'll receive a notification whenever the alarm triggers, including the contributor attributes that triggered it. The message can look like this:

{
  "AlarmContributorAttributes": {"tag.aws:cloudformation:logical-id":"SampleFunction123ABC456"},
  // ...
}

If you include more fields in the GROUP BY clause, you'll get more contributor attributes. They allow you to easily identify which function caused the alarm.
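
A consumer of these notifications could pull the contributor attributes out of the message like this. It is a minimal sketch: `contributorAttributes` is a hypothetical helper, and only the `AlarmContributorAttributes` field from the example above is assumed to exist; the rest of the message is treated as unknown.

```typescript
// Extracts the contributor attributes from an alarm notification message.
// Returns an empty object when the field is missing or has an unexpected shape.
function contributorAttributes(message: string): Record<string, string> {
  const parsed: unknown = JSON.parse(message);
  if (typeof parsed !== 'object' || parsed === null || Array.isArray(parsed)) {
    return {};
  }
  const attrs = (parsed as Record<string, unknown>)['AlarmContributorAttributes'];
  if (typeof attrs !== 'object' || attrs === null || Array.isArray(attrs)) {
    return {};
  }
  const result: Record<string, string> = {};
  for (const key of Object.keys(attrs)) {
    const value = (attrs as Record<string, unknown>)[key];
    // Keep only string-valued attributes.
    if (typeof value === 'string') {
      result[key] = value;
    }
  }
  return result;
}

// Sample message containing only the field shown in the example above.
const sampleMessage =
  '{"AlarmContributorAttributes":{"tag.aws:cloudformation:logical-id":"SampleFunction123ABC456"}}';
const failingLogicalId = contributorAttributes(sampleMessage)['tag.aws:cloudformation:logical-id'];
```

Here `failingLogicalId` holds the logical ID of the function that contributed to the alarm, which you can then map back to the construct in your CDK code.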

Getting Started

⚠️ Before deploying this example, you need to enable resource tags on telemetry data in your AWS CloudWatch settings.

Prerequisites

  • Node.js 18+ installed
  • AWS CLI configured with appropriate credentials
  • AWS CDK CLI installed (npm install -g aws-cdk)

Commands

# Install dependencies
npm install

# Deploy to AWS
npx cdk deploy

After deployment, you'll see outputs including:

  • LambdaFunctionArn - ARN of the deployed Lambda function
  • LambdaFunctionName - Name of the Lambda function
  • AlarmName - Name of the CloudWatch alarm

Testing the Alarm

💡 Ensure you have enabled resource tags on telemetry data in your AWS CloudWatch settings. Also, it may take a few moments until the resource tags are available in CloudWatch. Your metric will not show any results until then.

To test that the alarm works, you can invoke the Lambda function with an event that triggers an error:

# Get the function name from CDK outputs
FUNCTION_NAME=$(aws cloudformation describe-stacks \
  --stack-name CloudwatchMetricsInsightsQueryCdkExampleStack \
  --query 'Stacks[0].Outputs[?OutputKey==`LambdaFunctionName`].OutputValue' \
  --output text)

# Invoke the function with an event that triggers an error
# (on AWS CLI v2, add --cli-binary-format raw-in-base64-out so the raw JSON payload is accepted)
aws lambda invoke \
  --function-name "$FUNCTION_NAME" \
  --payload '{"foo": true}' \
  response.json

# Invoke it a few more times to reach the alarm threshold
for i in 1 2 3; do
  aws lambda invoke --function-name "$FUNCTION_NAME" --payload '{"foo": true}' response.json
done

After a few minutes, check the CloudWatch alarm:

aws cloudwatch describe-alarms --alarm-names lambda-error-metric-alarm
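
For the test invocations to count toward the Errors metric, the handler has to fail for that payload. The repository's actual handler is not shown in this README, so the following is only a hypothetical sketch of a handler that throws when the event carries `foo: true`:

```typescript
// Hypothetical handler, not the repository's real code.
// An uncaught error thrown here is reported by Lambda as a function error.
async function handler(event: { foo?: boolean }): Promise<{ ok: boolean }> {
  if (event.foo === true) {
    throw new Error('Simulated failure: event.foo was true');
  }
  return { ok: true };
}
```

In a real Lambda source file the function would be exported (for example `export const handler = ...`) so the runtime can find it.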

Use Cases

This pattern is particularly useful when you want to:

  • Monitor standard metrics (Errors, Throttles) across many Lambda functions
  • Organize monitoring by teams, environments, or criticality levels
  • Stay within CloudFormation resource limits in large stacks
  • Maintain infrastructure-as-code for all monitoring configurations
  • Simplify alarm management as your application scales

Useful Commands

  • npm run build - Compile TypeScript to JavaScript
  • npm run watch - Watch for changes and compile
  • npm run test - Run the Jest unit tests
  • npx cdk deploy - Deploy this stack to your default AWS account/region
  • npx cdk diff - Compare deployed stack with current state
  • npx cdk synth - Emit the synthesized CloudFormation template
  • npx cdk destroy - Remove the stack from your AWS account
