How to Monitor AWS Resources Using CloudWatch

Learn how to effectively monitor your AWS resources using CloudWatch metrics, alarms, and dashboards.

January 6, 2024
DevHub Team

Amazon CloudWatch is a comprehensive monitoring and observability service that provides real-time monitoring of AWS resources and applications. It collects and tracks metrics, monitors log files, sets alarms, and automatically reacts to changes in your AWS resources.

    flowchart TB
        A[AWS Resources] --> B[CloudWatch]
        B --> C[Metrics]
        B --> D[Logs]
        B --> E[Alarms]
        C --> F[Dashboards]
        D --> F
        E --> G[Actions]
        G --> H[SNS]
        G --> I[Auto Scaling]
        G --> J[Lambda]
        style B fill:#f96,stroke:#333,stroke-width:2px
        style F fill:#bbf,stroke:#333,stroke-width:2px

Understanding CloudWatch Fundamentals

Metrics and Dimensions

CloudWatch metrics structure:

CloudWatch Metrics Components

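| Component   | Description                | Example                        |
|-------------|----------------------------|--------------------------------|
| Namespace   | Container for metrics      | AWS/EC2                        |
| Metric Name | What is being measured     | CPUUtilization                 |
| Dimensions  | Metric identity attributes | InstanceId=i-1234567890abcdef0 |
| Value       | The measured value         | 75.5                           |
| Timestamp   | When measurement occurred  | 2024-02-29T12:00:00Z           |
| Unit        | Unit of measurement        | Percent                        |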
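
The way these components combine into a metric's identity can be illustrated with a small sketch. CloudWatch treats each unique combination of namespace, metric name, and dimensions as a separate metric. The `metricKey` helper below is hypothetical (it is not part of any AWS SDK) and only models that rule in plain JavaScript.

```javascript
// Hypothetical helper: builds a canonical identity string for a metric.
// Dimensions are sorted by name so that their order does not affect the key.
function metricKey({ Namespace, MetricName, Dimensions = [] }) {
  const dims = [...Dimensions]
    .sort((a, b) => (a.Name < b.Name ? -1 : a.Name > b.Name ? 1 : 0))
    .map((d) => `${d.Name}=${d.Value}`)
    .join(',');
  return `${Namespace}/${MetricName}{${dims}}`;
}

const perInstance = metricKey({
  Namespace: 'AWS/EC2',
  MetricName: 'CPUUtilization',
  Dimensions: [{ Name: 'InstanceId', Value: 'i-1234567890abcdef0' }],
});
const noDimensions = metricKey({ Namespace: 'AWS/EC2', MetricName: 'CPUUtilization' });

console.log(perInstance);
// A different dimension set identifies a different metric, so this prints false.
console.log(perInstance === noDimensions);
```

In practice this means data published with a `ServiceName` dimension cannot be retrieved by querying the same metric name without that dimension.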

Example CloudFormation template for custom metrics:

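    Resources:
      CustomMetricFunction:
        Type: 'AWS::Lambda::Function'
        Properties:
          Handler: index.handler
          # LambdaExecutionRole is assumed to be defined elsewhere in the template
          # and must allow cloudwatch:PutMetricData.
          Role: !GetAtt LambdaExecutionRole.Arn
          Code:
            ZipFile: |
              // The nodejs18.x runtime bundles AWS SDK for JavaScript v3, not v2 ('aws-sdk').
              const { CloudWatchClient, PutMetricDataCommand } = require('@aws-sdk/client-cloudwatch');
              const cloudwatch = new CloudWatchClient({});

              exports.handler = async () => {
                // Publish one data point for a custom business metric.
                await cloudwatch.send(new PutMetricDataCommand({
                  Namespace: 'CustomMetrics',
                  MetricData: [{
                    MetricName: 'BusinessTransactions',
                    Value: 100,
                    Unit: 'Count',
                    Dimensions: [{ Name: 'ServiceName', Value: 'OrderProcessing' }]
                  }]
                }));
              };
          Runtime: nodejs18.x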

Logs and Log Groups

Log data flow architecture:

    graph LR
        A[Applications] -->|Log Events| B[CloudWatch Logs]
        C[Lambda] -->|Log Events| B
        D[EC2] -->|Log Events| B
        B -->|Filter| E[Metric Filters]
        B -->|Stream| F[Kinesis]
        B -->|Archive| G[S3]
        E -->|Alert| H[Alarms]
        style B fill:#f96,stroke:#333,stroke-width:2px

Example log group configuration:

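    Resources:
      ApplicationLogGroup:
        Type: 'AWS::Logs::LogGroup'
        Properties:
          LogGroupName: /aws/application/production
          RetentionInDays: 30
      LogMetricFilter:
        Type: 'AWS::Logs::MetricFilter'
        Properties:
          LogGroupName: !Ref ApplicationLogGroup
          # This space-delimited pattern names the fields but places no condition
          # on them, so every event that matches the layout adds 1 to ErrorCount.
          # Add a condition on errorCode to count only error events.
          FilterPattern: '[timestamp, requestId, errorCode, message]'
          MetricTransformations:
            - MetricName: ErrorCount
              MetricNamespace: CustomMetrics
              MetricValue: '1'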

Setting Up Basic Monitoring

Resource Monitoring

Standard metrics by resource type:

Resource Monitoring

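| Resource | Key Metrics                   | Default Frequency                            |
|----------|-------------------------------|----------------------------------------------|
| EC2      | CPU, Network, Disk            | 5 minutes (1 minute with detailed monitoring) |
| RDS      | CPU, Memory, IOPS             | 1 minute                                     |
| Lambda   | Invocations, Errors, Duration | 1 minute                                     |
| ECS      | CPU, Memory, Network          | 1 minute                                     |
| ELB      | Requests, Latency, Errors     | 1 minute                                     |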

Example monitoring configuration:

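    Resources:
      MonitoringRole:
        Type: 'AWS::IAM::Role'
        Properties:
          AssumeRolePolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Principal:
                  Service: ec2.amazonaws.com
                Action: sts:AssumeRole
          ManagedPolicyArns:
            - arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy
      # Instance profile that attaches MonitoringRole to the instance.
      MonitoringInstanceProfile:
        Type: 'AWS::IAM::InstanceProfile'
        Properties:
          Roles:
            - !Ref MonitoringRole
      EC2Instance:
        Type: 'AWS::EC2::Instance'
        Properties:
          # An ImageId (or a launch template) is also required; it is not specified here.
          InstanceType: t3.micro
          IamInstanceProfile: !Ref MonitoringInstanceProfile
          UserData:
            Fn::Base64: !Sub |
              #!/bin/bash
              yum install -y amazon-cloudwatch-agent
              # Fetch the agent configuration from SSM Parameter Store and start the agent.
              # The instance role must be allowed to read this parameter.
              /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl \
                -a fetch-config -m ec2 -s -c ssm:/AmazonCloudWatch/Config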

Alarm Configuration

Alarm states and transitions:

    stateDiagram-v2
        [*] --> INSUFFICIENT_DATA
        INSUFFICIENT_DATA --> OK: Data Resumed
        INSUFFICIENT_DATA --> ALARM: Threshold Breached
        OK --> ALARM: Threshold Breached
        ALARM --> OK: Threshold Restored
        OK --> INSUFFICIENT_DATA: Missing Data
        ALARM --> INSUFFICIENT_DATA: Missing Data

Example alarm configuration:

    Resources:
      HighCPUAlarm:
        Type: 'AWS::CloudWatch::Alarm'
        Properties:
          AlarmName: HighCPUUtilization
          AlarmDescription: CPU utilization is too high
          MetricName: CPUUtilization
          Namespace: AWS/EC2
          Dimensions:
            - Name: InstanceId
              Value: !Ref EC2Instance
          Statistic: Average
          Period: 300             # length of one evaluation period, in seconds
          EvaluationPeriods: 2    # two consecutive breaching periods (10 minutes) trigger the alarm
          Threshold: 80           # percent
          ComparisonOperator: GreaterThanThreshold
          AlarmActions:
            - !Ref AlertSNSTopic  # SNS topic assumed to be defined elsewhere in the template
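
To make the interaction of `Period`, `EvaluationPeriods`, `Threshold`, and `ComparisonOperator` concrete, here is a minimal sketch of the evaluation logic for the alarm above. It is a simplified model, not CloudWatch's actual algorithm: the `evaluateAlarm` function is hypothetical, it treats any missing period as `INSUFFICIENT_DATA`, and it ignores the `DatapointsToAlarm` and `TreatMissingData` settings.

```javascript
// Simplified model of a GreaterThanThreshold alarm.
// `datapoints` holds one aggregated value per period, oldest first;
// `null` marks a period for which no data arrived.
function evaluateAlarm(datapoints, { threshold, evaluationPeriods }) {
  const recent = datapoints.slice(-evaluationPeriods);
  if (recent.length < evaluationPeriods || recent.some((v) => v === null)) {
    return 'INSUFFICIENT_DATA';
  }
  // Every one of the most recent periods must breach the threshold.
  return recent.every((v) => v > threshold) ? 'ALARM' : 'OK';
}

// Mirrors Threshold: 80 and EvaluationPeriods: 2 from the template.
const config = { threshold: 80, evaluationPeriods: 2 };

console.log(evaluateAlarm([70, 85], config));     // only one breaching period
console.log(evaluateAlarm([70, 85, 92], config)); // two consecutive breaching periods
console.log(evaluateAlarm([85, 80], config));     // 80 is not greater than 80
console.log(evaluateAlarm([85, null], config));   // latest period has no data
```

Under this model a single five-minute spike does not trigger the alarm, which is the usual reason for setting `EvaluationPeriods` above 1.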

Advanced Monitoring Techniques

Custom Metrics and Dashboards

Dashboard layout example:

    graph TB
        subgraph Dashboard
            A[CPU Metrics] --> E[Overview]
            B[Memory Metrics] --> E
            C[Network Metrics] --> E
            D[Custom Metrics] --> E
        end
        subgraph Widgets
            E --> F[Line Graphs]
            E --> G[Counters]
            E --> H[Heat Maps]
        end

Example dashboard configuration:

    Resources:
      MonitoringDashboard:
        Type: 'AWS::CloudWatch::Dashboard'
        Properties:
          DashboardName: ApplicationMonitoring
          DashboardBody: !Sub |
            {
              "widgets": [
                {
                  "type": "metric",
                  "properties": {
                    "metrics": [
                      ["AWS/EC2", "CPUUtilization", "InstanceId", "${EC2Instance}"],
                      [".", "NetworkIn", ".", "."],
                      [".", "NetworkOut", ".", "."]
                    ],
                    "period": 300,
                    "stat": "Average",
                    "region": "${AWS::Region}",
                    "title": "EC2 Performance"
                  }
                }
              ]
            }
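
When a dashboard needs one widget per instance, the body can also be generated in code instead of being written by hand. The sketch below assumes a hypothetical `buildDashboardBody` helper and placeholder instance IDs. It produces the same widget structure as the template above and adds explicit grid positions, returning a JSON string suitable for a `DashboardBody` value.

```javascript
// Hypothetical helper: one metric widget per instance, stacked vertically.
function buildDashboardBody(instanceIds, region) {
  const widgets = instanceIds.map((id, index) => ({
    type: 'metric',
    x: 0,
    y: index * 6, // place each widget below the previous one
    width: 12,
    height: 6,
    properties: {
      metrics: [
        ['AWS/EC2', 'CPUUtilization', 'InstanceId', id],
        ['.', 'NetworkIn', '.', '.'],  // "." repeats the value from the row above
        ['.', 'NetworkOut', '.', '.'],
      ],
      period: 300,
      stat: 'Average',
      region,
      title: `EC2 Performance (${id})`,
    },
  }));
  return JSON.stringify({ widgets });
}

// Placeholder instance IDs, for illustration only.
const body = buildDashboardBody(['i-0aaa', 'i-0bbb'], 'us-east-1');
console.log(JSON.parse(body).widgets.length);
```

Generating the body this way keeps widget layout consistent as instances are added or removed.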

Log Analytics and Insights

Log analysis workflow:

    sequenceDiagram
        participant App
        participant CloudWatch
        participant Insights
        participant Lambda
        App->>CloudWatch: Send logs
        CloudWatch->>Insights: Query logs
        Insights->>Lambda: Process results
        Lambda->>CloudWatch: Create metrics
        CloudWatch->>CloudWatch: Trigger alarms
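
As an illustration of the "Query logs" step, the sketch below builds the input for a CloudWatch Logs Insights query that counts error lines in five-minute bins. The `buildErrorQuery` helper is hypothetical, and the `/ERROR/` match is an assumption about the log format. The returned object uses the parameter names of the CloudWatch Logs `StartQuery` operation (`logGroupName`, `startTime`, `endTime`, `queryString`), with times in epoch seconds.

```javascript
// Hypothetical helper: builds StartQuery input for the last `minutesBack` minutes.
function buildErrorQuery(logGroupName, minutesBack, nowMs = Date.now()) {
  const endTime = Math.floor(nowMs / 1000); // StartQuery expects epoch seconds
  return {
    logGroupName,
    startTime: endTime - minutesBack * 60,
    endTime,
    // Logs Insights commands are chained with the pipe character.
    queryString: [
      'fields @timestamp, @message',
      'filter @message like /ERROR/',
      'stats count(*) as errorCount by bin(5m)',
    ].join(' | '),
  };
}

const query = buildErrorQuery('/aws/application/production', 60);
console.log(query.queryString);
```

A Lambda function could pass this object to `StartQuery`, poll `GetQueryResults` until the query completes, and publish the counts as a custom metric.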

Performance Monitoring

Application Performance

Application monitoring architecture:

    graph TB
        subgraph Application Layer
            A[Web Servers]
            B[App Servers]
            C[Databases]
        end
        subgraph Monitoring Layer
            D[CloudWatch]
            E[X-Ray]
            F[ServiceLens]
        end
        A --> D
        B --> D
        C --> D
        D --> E
        E --> F

Resource Utilization

Resource metrics configuration:

    Resources:
      MetricsCollection:
        Type: 'AWS::AutoScaling::AutoScalingGroup'
        Properties:
          # Required group properties (MinSize, MaxSize, launch template, subnets)
          # are omitted here to focus on the metrics settings.
          MetricsCollection:
            - Granularity: 1Minute
              Metrics:
                - GroupMinSize
                - GroupMaxSize
                - GroupDesiredCapacity
                - GroupInServiceInstances
                - GroupPendingInstances
                - GroupTerminatingInstances
                - GroupTotalInstances

Security Monitoring

Security Events and Compliance

Security monitoring workflow:

    graph LR
        A[CloudTrail] -->|Logs| B[CloudWatch]
        C[GuardDuty] -->|Findings| B
        D[Security Hub] -->|Alerts| B
        B -->|Analyze| E[Security Team]
        B -->|AutoRemediate| F[Lambda]

Access and Authentication

Authentication monitoring setup:

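    Resources:
      LoginMonitoringFunction:
        Type: 'AWS::Lambda::Function'
        Properties:
          Handler: index.handler
          # LambdaExecutionRole is assumed to be defined elsewhere in the template.
          Role: !GetAtt LambdaExecutionRole.Arn
          Code:
            ZipFile: |
              exports.handler = async (event) => {
                if (event.detail.eventName === 'ConsoleLogin') {
                  // Process login event
                  console.log('Console login detected:', JSON.stringify(event));
                }
              };
          Runtime: nodejs18.x
      LoginEventRule:
        Type: 'AWS::Events::Rule'
        Properties:
          EventPattern:
            source:
              - aws.signin
            detail-type:
              - AWS Console Sign In via CloudTrail
          Targets:
            - Arn: !GetAtt LoginMonitoringFunction.Arn
              Id: ProcessLoginEvents
      # Lets EventBridge invoke the function; without it the rule cannot deliver events.
      LoginEventPermission:
        Type: 'AWS::Lambda::Permission'
        Properties:
          FunctionName: !Ref LoginMonitoringFunction
          Action: lambda:InvokeFunction
          Principal: events.amazonaws.com
          SourceArn: !GetAtt LoginEventRule.Arn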
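
The "Process login event" step could be filled in along the following lines. This is a sketch under assumptions: the `classifyLogin` function and its category names are hypothetical. The fields it reads (`responseElements.ConsoleLogin` and `additionalEventData.MFAUsed`) come from the CloudTrail record for console sign-in events. Federated sign-ins may report `MFAUsed` as `No` even when the identity provider enforced MFA, so treat that category as a prompt for review rather than a verdict.

```javascript
// Hypothetical classifier for events delivered by the sign-in rule.
function classifyLogin(event) {
  const detail = event.detail ?? {};
  if (detail.eventName !== 'ConsoleLogin') return 'ignored';
  if (detail.responseElements?.ConsoleLogin === 'Failure') return 'failed-login';
  if (detail.additionalEventData?.MFAUsed !== 'Yes') return 'login-without-mfa';
  return 'ok';
}

// Minimal sample event, trimmed to the fields the classifier reads.
const sample = {
  detail: {
    eventName: 'ConsoleLogin',
    responseElements: { ConsoleLogin: 'Success' },
    additionalEventData: { MFAUsed: 'No' },
  },
};
console.log(classifyLogin(sample));
```

A handler could publish a custom metric per category and alarm on the failed-login count.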

Cost Optimization

Cost Monitoring

Cost monitoring dashboard:

    graph TB
        A[Cost Explorer API] --> B[CloudWatch]
        C[Budget API] --> B
        B --> D[Cost Dashboard]
        D --> E[Daily Costs]
        D --> F[Monthly Trends]
        D --> G[Service Breakdown]
        D --> H[Budget Alerts]
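
One simple way to get a budget-style alert inside CloudWatch itself is an alarm on the `EstimatedCharges` metric in the `AWS/Billing` namespace. Note that this is a different mechanism from the Cost Explorer and Budgets APIs shown in the diagram. The sketch below only builds the input for the CloudWatch `PutMetricAlarm` operation. The `billingAlarmParams` helper, the alarm name, and the topic ARN are hypothetical. Billing metrics are published in `us-east-1` and require billing alerts to be enabled on the account.

```javascript
// Hypothetical helper: PutMetricAlarm input for a spending threshold in USD.
function billingAlarmParams(thresholdUsd, snsTopicArn) {
  return {
    AlarmName: `EstimatedCharges-over-${thresholdUsd}-USD`,
    Namespace: 'AWS/Billing',
    MetricName: 'EstimatedCharges',
    Dimensions: [{ Name: 'Currency', Value: 'USD' }],
    Statistic: 'Maximum',
    Period: 21600, // 6 hours; billing data is published only a few times per day
    EvaluationPeriods: 1,
    Threshold: thresholdUsd,
    ComparisonOperator: 'GreaterThanThreshold',
    AlarmActions: [snsTopicArn],
  };
}

// Placeholder topic ARN, for illustration only.
const params = billingAlarmParams(100, 'arn:aws:sns:us-east-1:123456789012:billing-alerts');
console.log(params.AlarmName);
```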

Resource Optimization

Example resource cleanup function:

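    Resources:
      CleanupFunction:
        Type: 'AWS::Lambda::Function'
        Properties:
          Handler: index.handler
          # LambdaExecutionRole is assumed to be defined elsewhere in the template
          # and must allow ec2:DescribeInstances.
          Role: !GetAtt LambdaExecutionRole.Arn
          Code:
            ZipFile: |
              // The nodejs18.x runtime bundles AWS SDK for JavaScript v3, not v2 ('aws-sdk').
              const { EC2Client, DescribeInstancesCommand } = require('@aws-sdk/client-ec2');
              const ec2 = new EC2Client({});

              exports.handler = async () => {
                // List running instances as candidates for cleanup.
                const instances = await ec2.send(new DescribeInstancesCommand({
                  Filters: [{ Name: 'instance-state-name', Values: ['running'] }]
                }));
                // Add cleanup logic here
              };
          Runtime: nodejs18.x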
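
The "cleanup logic" placeholder depends on how "unused" is defined. One possible definition, sketched below, flags running instances whose average CPU utilization falls below a threshold and that do not carry an opt-out tag. The `findIdleInstances` function, the `KeepAlive` tag, and the 5% default are all hypothetical. The input follows the `Reservations[].Instances[]` shape of a `DescribeInstances` response, and the CPU averages are assumed to have been fetched from CloudWatch separately.

```javascript
// Hypothetical selection step: returns IDs of instances that look idle.
function findIdleInstances(describeInstancesResult, avgCpuById, cpuThreshold = 5) {
  return (describeInstancesResult.Reservations ?? [])
    .flatMap((reservation) => reservation.Instances ?? [])
    // Skip instances explicitly tagged to be kept.
    .filter((instance) => !(instance.Tags ?? []).some((tag) => tag.Key === 'KeepAlive'))
    // Instances without CPU data are never flagged.
    .filter((instance) => (avgCpuById[instance.InstanceId] ?? Infinity) < cpuThreshold)
    .map((instance) => instance.InstanceId);
}

// Sample data, for illustration only.
const response = {
  Reservations: [
    { Instances: [{ InstanceId: 'i-1' }, { InstanceId: 'i-2' }] },
    { Instances: [{ InstanceId: 'i-3', Tags: [{ Key: 'KeepAlive', Value: 'true' }] }, { InstanceId: 'i-4' }] },
  ],
};
const idle = findIdleInstances(response, { 'i-1': 1.2, 'i-2': 50, 'i-3': 2 });
console.log(idle);
```

Reviewing the resulting list (or stopping instances rather than terminating them) before acting is a safer first step than terminating automatically.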