How to Monitor AWS Resources Using CloudWatch
Learn how to effectively monitor your AWS resources using CloudWatch metrics, alarms, and dashboards.
Amazon CloudWatch is a monitoring and observability service that provides near real-time visibility into AWS resources and the applications running on them. It collects and tracks metrics, collects and monitors log files, evaluates alarms, and can react automatically to changes in your resources through alarm actions and EventBridge rules.
Understanding CloudWatch Fundamentals
Metrics and Dimensions
Every CloudWatch metric data point is made up of the following components:
CloudWatch Metrics Components
| Component | Description | Example |
|---|---|---|
| Namespace | Container that groups related metrics | AWS/EC2 |
| Metric Name | What is being measured | CPUUtilization |
| Dimensions | Name/value pairs that identify the metric | InstanceId=i-1234567890abcdef0 |
| Value | The measured value | 75.5 |
| Timestamp | When the measurement occurred | 2024-02-29T12:00:00Z |
| Unit | Unit of measurement | Percent |
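To make the table concrete, the following Node.js sketch models a data point using the same field names as a PutMetricData MetricDatum. The helper metricIdentity is hypothetical and exists only to show that namespace, metric name, and dimensions together identify a metric, so changing a dimension value produces a different metric.

```javascript
// Sketch of the data model in the table above. Field names mirror the
// PutMetricData MetricDatum shape; metricIdentity is a hypothetical helper.

// A metric is identified by its namespace, its name, and its dimensions.
function metricIdentity(namespace, datum) {
  const dims = datum.Dimensions.map((d) => `${d.Name}=${d.Value}`).join(',');
  return `${namespace}|${datum.MetricName}|${dims}`;
}

const cpuA = {
  MetricName: 'CPUUtilization',
  Dimensions: [{ Name: 'InstanceId', Value: 'i-1234567890abcdef0' }],
  Timestamp: new Date('2024-02-29T12:00:00Z'),
  Value: 75.5,
  Unit: 'Percent',
};

// Same metric name, different dimension value: a separate metric in CloudWatch.
const cpuB = { ...cpuA, Dimensions: [{ Name: 'InstanceId', Value: 'i-0abcdef1234567890' }] };

console.log(metricIdentity('AWS/EC2', cpuA)); // AWS/EC2|CPUUtilization|InstanceId=i-1234567890abcdef0
console.log(metricIdentity('AWS/EC2', cpuA) === metricIdentity('AWS/EC2', cpuB)); // false
```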
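Example CloudFormation template for a Lambda function that publishes a custom metric:
```yaml
Resources:
  CustomMetricFunction:
    Type: 'AWS::Lambda::Function'
    Properties:
      Handler: index.handler
      # LambdaExecutionRole is defined elsewhere and must allow cloudwatch:PutMetricData
      Role: !GetAtt LambdaExecutionRole.Arn
      Runtime: nodejs18.x
      Code:
        ZipFile: |
          // The nodejs18.x runtime bundles AWS SDK for JavaScript v3, not v2's 'aws-sdk'
          const { CloudWatchClient, PutMetricDataCommand } = require('@aws-sdk/client-cloudwatch');
          const cloudwatch = new CloudWatchClient({});

          exports.handler = async () => {
            // Publish one data point to the custom namespace
            await cloudwatch.send(new PutMetricDataCommand({
              Namespace: 'CustomMetrics',
              MetricData: [{
                MetricName: 'BusinessTransactions',
                Value: 100,
                Unit: 'Count',
                Dimensions: [{ Name: 'ServiceName', Value: 'OrderProcessing' }]
              }]
            }));
          };
```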
Logs and Log Groups
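Log events are written to log streams, and related streams are grouped into a log group, where retention is configured. A metric filter attached to a log group turns matching log events into metric data points.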
Example log group configuration:
```yaml
Resources:
  ApplicationLogGroup:
    Type: 'AWS::Logs::LogGroup'
    Properties:
      LogGroupName: /aws/application/production
      RetentionInDays: 30
  LogMetricFilter:
    Type: 'AWS::Logs::MetricFilter'
    Properties:
      LogGroupName: !Ref ApplicationLogGroup
      # Space-delimited pattern. Without a condition on a field, every
      # four-field log event would be counted as an error.
      FilterPattern: '[timestamp, requestId, level = "ERROR", message]'
      MetricTransformations:
        - MetricName: ErrorCount
          MetricNamespace: CustomMetrics
          MetricValue: '1'
```
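A metric filter publishes a value for each log event that matches its pattern. The Node.js sketch below imitates that behavior locally for a four-field, space-delimited format whose third field is a log level. The field layout and sample lines are assumptions, and the sketch simply splits on single spaces and lets the last field take the rest of the line; CloudWatch's own parser also handles quoted and bracketed fields, which this sketch does not.

```javascript
// Local simulation of a space-delimited filter such as
//   [timestamp, requestId, level = "ERROR", message]
// Assumption: single-space separators; the last named field absorbs the rest.
function parseFields(line, names) {
  const parts = line.split(' ');
  const fields = {};
  names.forEach((name, i) => {
    fields[name] = i === names.length - 1 ? parts.slice(i).join(' ') : parts[i];
  });
  return fields;
}

// Value the filter would publish for one event: 1 on a match (MetricValue '1'),
// otherwise null, meaning no data point is emitted for that event.
function errorCountValue(line) {
  const f = parseFields(line, ['timestamp', 'requestId', 'level', 'message']);
  return f.level === 'ERROR' ? 1 : null;
}

const events = [
  '2024-02-29T12:00:00Z req-1 INFO order created',
  '2024-02-29T12:00:01Z req-2 ERROR payment declined',
  '2024-02-29T12:00:02Z req-3 ERROR inventory timeout',
];

// Sum of published values, i.e. what a Sum statistic over this batch would show.
const errorCount = events
  .map(errorCountValue)
  .filter((v) => v !== null)
  .reduce((sum, v) => sum + v, 0);

console.log(errorCount); // 2
```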
Setting Up Basic Monitoring
Resource Monitoring
Standard metrics by resource type:
| Resource | Key Metrics | Default Period |
|---|---|---|
| EC2 | CPU, Network, Disk | 5 minutes (1 minute with detailed monitoring) |
| RDS | CPU, Freeable Memory, IOPS | 1 minute |
| Lambda | Invocations, Errors, Duration | 1 minute |
| ECS | CPU, Memory (network metrics require Container Insights) | 1 minute |
| ELB | Requests, Latency, Errors | 1 minute |
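EC2 does not publish memory or file-system usage on its own; collecting them requires the CloudWatch agent. Example monitoring configuration that installs and starts the agent: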
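```yaml
Parameters:
  LatestAmiId:
    # Public SSM parameter that resolves to the latest Amazon Linux 2023 AMI
    Type: 'AWS::SSM::Parameter::Value<AWS::EC2::Image::Id>'
    Default: /aws/service/ami-amazon-linux-latest/al2023-ami-kernel-default-x86_64
Resources:
  MonitoringRole:
    Type: 'AWS::IAM::Role'
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: ec2.amazonaws.com
            Action: sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy
  # EC2 instances receive an IAM role through an instance profile
  MonitoringInstanceProfile:
    Type: 'AWS::IAM::InstanceProfile'
    Properties:
      Roles:
        - !Ref MonitoringRole
  EC2Instance:
    Type: 'AWS::EC2::Instance'
    Properties:
      ImageId: !Ref LatestAmiId
      InstanceType: t3.micro
      IamInstanceProfile: !Ref MonitoringInstanceProfile
      UserData:
        Fn::Base64: !Sub |
          #!/bin/bash
          yum install -y amazon-cloudwatch-agent
          # CloudWatchAgentServerPolicy only allows reading SSM parameters
          # whose names start with AmazonCloudWatch-
          /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -s -c ssm:AmazonCloudWatch-Config
```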
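The agent reads its settings from a JSON configuration. The Node.js sketch below builds a minimal one that collects memory usage and root-volume disk usage, the metrics EC2 does not report by itself. Storing it in an SSM parameter named AmazonCloudWatch-Config is an assumption; whatever name you choose, the CloudWatchAgentServerPolicy managed policy only lets the agent read parameters whose names begin with AmazonCloudWatch-.

```javascript
// Minimal CloudWatch agent configuration (a sketch, not a complete config).
function buildAgentConfig(intervalSeconds = 60) {
  return {
    agent: { metrics_collection_interval: intervalSeconds },
    metrics: {
      namespace: 'CWAgent', // the agent's default namespace
      // Tag every agent metric with the instance ID; the agent fills in the token.
      append_dimensions: { InstanceId: '${aws:InstanceId}' },
      metrics_collected: {
        mem: { measurement: ['mem_used_percent'] },
        disk: { measurement: ['used_percent'], resources: ['/'] },
      },
    },
  };
}

// Print the JSON to paste into the SSM parameter value.
console.log(JSON.stringify(buildAgentConfig(), null, 2));
```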
Alarm Configuration
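An alarm is always in one of three states: OK, ALARM, or INSUFFICIENT_DATA. It moves to ALARM when the metric breaches the threshold for the configured number of evaluation periods, and the actions attached to a state, such as notifying an SNS topic, run when the alarm transitions into that state.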
Example alarm configuration:
```yaml
Resources:
  HighCPUAlarm:
    Type: 'AWS::CloudWatch::Alarm'
    Properties:
      AlarmName: HighCPUUtilization
      AlarmDescription: CPU utilization is too high
      MetricName: CPUUtilization
      Namespace: AWS/EC2
      Dimensions:
        - Name: InstanceId
          Value: !Ref EC2Instance
      Statistic: Average
      Period: 300            # seconds: one datapoint per 5 minutes
      EvaluationPeriods: 2   # two consecutive breaching periods trigger ALARM
      Threshold: 80
      ComparisonOperator: GreaterThanThreshold
      AlarmActions:
        - !Ref AlertSNSTopic # SNS topic defined elsewhere in the template
```
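The evaluation rule behind this alarm can be illustrated with a small Node.js function. It is a simplified model: it looks only at the most recent EvaluationPeriods datapoints and ignores DatapointsToAlarm and missing-data handling (TreatMissingData), both of which change real alarm behavior.

```javascript
// Simplified model of the HighCPUUtilization alarm: one Average datapoint per
// 300-second period; ALARM when the last 2 datapoints are all above 80.
function evaluateAlarm(averages, { threshold = 80, evaluationPeriods = 2 } = {}) {
  // Simplification: too few datapoints is reported as INSUFFICIENT_DATA.
  if (averages.length < evaluationPeriods) return 'INSUFFICIENT_DATA';
  const recent = averages.slice(-evaluationPeriods);
  // GreaterThanThreshold is strict, so a value of exactly 80 does not breach.
  return recent.every((v) => v > threshold) ? 'ALARM' : 'OK';
}

console.log(evaluateAlarm([40]));         // INSUFFICIENT_DATA
console.log(evaluateAlarm([70, 85, 90])); // ALARM
console.log(evaluateAlarm([85, 90, 80])); // OK
```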
Advanced Monitoring Techniques
Custom Metrics and Dashboards
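A dashboard is defined by a JSON body that lists its widgets. A metric widget graphs one or more metrics using a chosen period and statistic.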
Example dashboard configuration:
```yaml
Resources:
  MonitoringDashboard:
    Type: 'AWS::CloudWatch::Dashboard'
    Properties:
      DashboardName: ApplicationMonitoring
      DashboardBody: !Sub |
        {
          "widgets": [
            {
              "type": "metric",
              "properties": {
                "metrics": [
                  ["AWS/EC2", "CPUUtilization", "InstanceId", "${EC2Instance}"],
                  [".", "NetworkIn", ".", "."],
                  [".", "NetworkOut", ".", "."]
                ],
                "period": 300,
                "stat": "Average",
                "region": "${AWS::Region}",
                "title": "EC2 Performance"
              }
            }
          ]
        }
```
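The same dashboard body can also be generated in code, for example before calling the PutDashboard API. This Node.js sketch builds the JSON string with an explicit widget position and size (x, y, width, height); the instance ID and region in the usage line are placeholders.

```javascript
// Builds a dashboard body equivalent to the template above.
// In a metrics array, "." repeats the value at the same position in the
// previous row, so rows 2 and 3 reuse the namespace and the InstanceId dimension.
function buildDashboardBody(instanceId, region) {
  return JSON.stringify({
    widgets: [
      {
        type: 'metric',
        x: 0, y: 0, width: 12, height: 6, // grid position and size
        properties: {
          metrics: [
            ['AWS/EC2', 'CPUUtilization', 'InstanceId', instanceId],
            ['.', 'NetworkIn', '.', '.'],
            ['.', 'NetworkOut', '.', '.'],
          ],
          period: 300,
          stat: 'Average',
          region,
          title: 'EC2 Performance',
        },
      },
    ],
  });
}

// Placeholder instance ID and region.
const body = buildDashboardBody('i-1234567890abcdef0', 'us-east-1');
console.log(body);
```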
Log Analytics and Insights
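CloudWatch Logs Insights queries one or more log groups interactively: you select fields, filter events, and aggregate the results with commands such as fields, filter, and stats.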
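As an example, the Logs Insights query held in the query string below counts error events in five-minute buckets. The accompanying Node.js function is only a local illustration of the aggregation that stats count(*) by bin(5m) performs, conceptually assigning each matching event to the five-minute window that contains its timestamp; the sample events are made up.

```javascript
// Logs Insights query: count ERROR events per 5-minute bin.
const query = `fields @timestamp, @message
| filter @message like /ERROR/
| stats count(*) as errorCount by bin(5m)`;

// Local illustration of the same aggregation over in-memory events.
function countByBin(events, binMinutes = 5) {
  const binMs = binMinutes * 60 * 1000;
  const counts = new Map();
  for (const { timestamp, message } of events) {
    if (!/ERROR/.test(message)) continue; // same test as the filter clause
    // Round the timestamp down to the start of its bin.
    const start = Math.floor(Date.parse(timestamp) / binMs) * binMs;
    const key = new Date(start).toISOString();
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

const sample = [
  { timestamp: '2024-02-29T12:01:00Z', message: 'ERROR payment declined' },
  { timestamp: '2024-02-29T12:04:59Z', message: 'ERROR inventory timeout' },
  { timestamp: '2024-02-29T12:05:00Z', message: 'ERROR payment declined' },
  { timestamp: '2024-02-29T12:06:00Z', message: 'INFO order created' },
];

console.log(query);
console.log(countByBin(sample));
```

With the sample data, the 12:00 bin holds two errors and the 12:05 bin holds one; the INFO event is filtered out.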
Performance Monitoring
Application Performance
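Application-level signals such as request latency and business transactions can be published as custom metrics next to the infrastructure metrics AWS provides, so alarms and dashboards can combine both views.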
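One way to publish application metrics such as request latency without separate API calls is the CloudWatch embedded metric format (EMF): a structured JSON log line that CloudWatch Logs converts into metrics. The Node.js sketch below builds such a record; the namespace MyApp, the metric Latency, and the dimension ServiceName are assumptions for illustration.

```javascript
// Builds an embedded metric format (EMF) record. When written to CloudWatch Logs,
// "Latency" becomes a metric in the MyApp namespace with a ServiceName dimension.
function emfRecord(serviceName, latencyMs, now = Date.now()) {
  return {
    _aws: {
      Timestamp: now, // milliseconds since the epoch
      CloudWatchMetrics: [
        {
          Namespace: 'MyApp',
          Dimensions: [['ServiceName']], // one dimension set
          Metrics: [{ Name: 'Latency', Unit: 'Milliseconds' }],
        },
      ],
    },
    // Top-level members supply the dimension value and the metric value.
    ServiceName: serviceName,
    Latency: latencyMs,
  };
}

// In a Lambda function, writing the record to stdout sends it to CloudWatch Logs.
console.log(JSON.stringify(emfRecord('OrderProcessing', 142)));
```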
Resource Utilization
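Auto Scaling group metrics are sent to CloudWatch only when group metrics collection is enabled, which the MetricsCollection property does: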
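```yaml
Resources:
  AppAutoScalingGroup:
    Type: 'AWS::AutoScaling::AutoScalingGroup'
    Properties:
      MinSize: '1'
      MaxSize: '3'
      # AppLaunchTemplate and the SubnetIds parameter are assumed to be defined elsewhere
      LaunchTemplate:
        LaunchTemplateId: !Ref AppLaunchTemplate
        Version: !GetAtt AppLaunchTemplate.LatestVersionNumber
      VPCZoneIdentifier: !Ref SubnetIds
      MetricsCollection:
        - Granularity: 1Minute
          Metrics:
            - GroupMinSize
            - GroupMaxSize
            - GroupDesiredCapacity
            - GroupInServiceInstances
            - GroupPendingInstances
            - GroupTerminatingInstances
            - GroupTotalInstances
```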
Security Monitoring
Security Events and Compliance
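CloudTrail records API activity in the account. When a trail delivers its events to a CloudWatch Logs log group, metric filters and alarms can flag security-relevant activity such as denied API calls, and EventBridge rules can react to individual events such as console sign-ins.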
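A widely used example is a metric filter on the CloudTrail log group that matches unauthorized API calls. The Node.js sketch below reproduces the logic of that filter pattern locally so the matching rule is easy to see; the sample records are simplified, and in practice the filter and an alarm do this work inside CloudWatch.

```javascript
// Mirrors the logic of the CloudTrail metric filter pattern
//   { ($.errorCode = "*UnauthorizedOperation") || ($.errorCode = "AccessDenied*") }
function isUnauthorizedCall(record) {
  const code = record.errorCode || ''; // successful calls have no errorCode
  return code.endsWith('UnauthorizedOperation') || code.startsWith('AccessDenied');
}

const records = [
  { eventName: 'RunInstances', errorCode: 'Client.UnauthorizedOperation' },
  { eventName: 'GetObject', errorCode: 'AccessDenied' },
  { eventName: 'DescribeInstances' },
];

console.log(records.filter(isUnauthorizedCall).map((r) => r.eventName));
// [ 'RunInstances', 'GetObject' ]
```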
Access and Authentication
Authentication monitoring setup:
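```yaml
Resources:
  LoginMonitoringFunction:
    Type: 'AWS::Lambda::Function'
    Properties:
      Handler: index.handler
      Role: !GetAtt LambdaExecutionRole.Arn
      Runtime: nodejs18.x
      Code:
        ZipFile: |
          exports.handler = async (event) => {
            if (event.detail.eventName === 'ConsoleLogin') {
              // Process login event
              console.log('Console login detected:', event);
            }
          };
  LoginEventRule:
    Type: 'AWS::Events::Rule'
    Properties:
      EventPattern:
        source:
          - aws.signin
        detail-type:
          - AWS Console Sign In via CloudTrail
      Targets:
        - Arn: !GetAtt LoginMonitoringFunction.Arn
          Id: ProcessLoginEvents
  # Grants EventBridge permission to invoke the function; without it the target fails
  LoginMonitoringPermission:
    Type: 'AWS::Lambda::Permission'
    Properties:
      FunctionName: !Ref LoginMonitoringFunction
      Action: lambda:InvokeFunction
      Principal: events.amazonaws.com
      SourceArn: !GetAtt LoginEventRule.Arn
```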
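The handler above only logs the event. A sketch of what the "Process login event" step might check follows: it reads fields that CloudTrail records for ConsoleLogin events (responseElements.ConsoleLogin, additionalEventData.MFAUsed, userIdentity.type). Which conditions deserve an alert is a policy choice, and assessConsoleLogin is a hypothetical helper.

```javascript
// Returns the reasons a console sign-in might deserve attention.
function assessConsoleLogin(event) {
  const d = event.detail || {};
  const findings = [];
  if (d.responseElements?.ConsoleLogin === 'Failure') findings.push('failed login');
  if (d.additionalEventData?.MFAUsed === 'No') findings.push('no MFA');
  if (d.userIdentity?.type === 'Root') findings.push('root user');
  return findings;
}

// Simplified sample event (documentation IP range, not real data).
const sampleEvent = {
  source: 'aws.signin',
  'detail-type': 'AWS Console Sign In via CloudTrail',
  detail: {
    eventName: 'ConsoleLogin',
    userIdentity: { type: 'Root' },
    sourceIPAddress: '203.0.113.10',
    responseElements: { ConsoleLogin: 'Success' },
    additionalEventData: { MFAUsed: 'No' },
  },
};

console.log(assessConsoleLogin(sampleEvent)); // [ 'no MFA', 'root user' ]
```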
Cost Optimization
Cost Monitoring
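CloudWatch publishes the account's estimated charges as the EstimatedCharges metric in the AWS/Billing namespace, so spend can be graphed on a dashboard and alarmed on like any other metric.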
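A billing alarm is a common starting point. The Node.js sketch below builds the input for PutMetricAlarmCommand from @aws-sdk/client-cloudwatch without calling AWS. Billing metrics are published only in us-east-1 and only after billing alerts are enabled in the account's billing preferences. The alarm name, the 100 USD threshold, and the SNS topic ARN are placeholders.

```javascript
// Input for a billing alarm on AWS/Billing EstimatedCharges.
function billingAlarmParams(thresholdUsd, topicArn) {
  return {
    AlarmName: `EstimatedCharges-over-${thresholdUsd}-USD`,
    Namespace: 'AWS/Billing',
    MetricName: 'EstimatedCharges',
    Dimensions: [{ Name: 'Currency', Value: 'USD' }],
    Statistic: 'Maximum',
    Period: 21600, // 6 hours; estimated charges are updated several times a day
    EvaluationPeriods: 1,
    Threshold: thresholdUsd,
    ComparisonOperator: 'GreaterThanThreshold',
    AlarmActions: [topicArn],
  };
}

// To create the alarm with the SDK (not run here):
//   new CloudWatchClient({ region: 'us-east-1' }).send(new PutMetricAlarmCommand(params))
const params = billingAlarmParams(100, 'arn:aws:sns:us-east-1:123456789012:billing-alerts');
console.log(params.AlarmName); // EstimatedCharges-over-100-USD
```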
Resource Optimization
Example resource cleanup function:
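```yaml
Resources:
  CleanupFunction:
    Type: 'AWS::Lambda::Function'
    Properties:
      Handler: index.handler
      # LambdaExecutionRole must allow ec2:DescribeInstances plus any cleanup actions you add
      Role: !GetAtt LambdaExecutionRole.Arn
      Runtime: nodejs18.x
      Code:
        ZipFile: |
          // The nodejs18.x runtime bundles AWS SDK for JavaScript v3, not v2's 'aws-sdk'
          const { EC2Client, DescribeInstancesCommand } = require('@aws-sdk/client-ec2');
          const ec2 = new EC2Client({});

          exports.handler = async () => {
            // List running instances as cleanup candidates
            // (results are paginated; this reads only the first page)
            const { Reservations = [] } = await ec2.send(new DescribeInstancesCommand({
              Filters: [{ Name: 'instance-state-name', Values: ['running'] }]
            }));
            const instances = Reservations.flatMap((r) => r.Instances);
            console.log(`Found ${instances.length} running instances`);
            // Add cleanup logic here: check utilization before stopping or terminating anything
          };
```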
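The cleanup logic is left open above. One possible approach, sketched in Node.js below, flags instances whose daily maximum CPUUtilization stayed under a threshold for two weeks. The 5% threshold, the 14-day window, and the input shape (which would be assembled from DescribeInstances and GetMetricData results) are assumptions, and the function only returns candidates for review rather than terminating anything.

```javascript
// dailyMaxCpuById: { instanceId: [daily Maximum CPUUtilization, ...] }
function findIdleInstances(dailyMaxCpuById, { thresholdPercent = 5, minDays = 14 } = {}) {
  return Object.entries(dailyMaxCpuById)
    .filter(([, series]) => series.length >= minDays) // require enough history
    .filter(([, series]) => series.every((v) => v < thresholdPercent))
    .map(([instanceId]) => instanceId);
}

// Hypothetical instance IDs and CPU data.
const cpu = {
  'i-aaaaaaaaaaaaaaaaa': Array(14).fill(2.1),         // idle for two weeks
  'i-bbbbbbbbbbbbbbbbb': [...Array(13).fill(1), 63],  // one busy day
  'i-ccccccccccccccccc': Array(3).fill(0.5),          // too little history
};

console.log(findIdleInstances(cpu)); // [ 'i-aaaaaaaaaaaaaaaaa' ]
```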