Azure Monitoring and Observability: A Complete Implementation Guide

Effective monitoring and observability in Azure are crucial for maintaining reliable and performant cloud applications. This guide covers implementation details for Azure Monitor, Log Analytics, and Application Insights.

Monitoring Components Overview

Key monitoring services in Azure:

Service	Purpose	Key Features
Azure Monitor	Platform metrics	Real-time monitoring, alerts
Log Analytics	Log management	Query, analysis, retention
Application Insights	Application monitoring	APM, user analytics
Network Watcher	Network monitoring	Connectivity, performance

Azure Monitor Implementation

Metric Collection

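# Send all platform metrics to the Log Analytics workspace.
# "AllMetrics" is the metric category; CPU, Memory, Network, and Disk are
# individual metrics, not valid categories.
# Older Az.Monitor versions expose this as Set-AzDiagnosticSetting -MetricCategory.
# The setting name below is illustrative.
$metricSettings = New-AzDiagnosticSettingMetricSettingsObject `
    -Category "AllMetrics" `
    -Enabled $true

New-AzDiagnosticSetting `
    -Name "vm-metrics-to-workspace" `
    -ResourceId $vm.Id `
    -WorkspaceId $workspaceId `
    -Metric $metricSettings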

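# Create metric alert (current-generation rule; classic alert rules are retired)
$criteria = New-AzMetricAlertRuleV2Criteria `
    -MetricName "Percentage CPU" `
    -TimeAggregation Average `
    -Operator GreaterThan `
    -Threshold 90

# Fire when average CPU over a 5-minute window exceeds 90%, evaluated every minute
Add-AzMetricAlertRuleV2 `
    -Name "HighCPU" `
    -ResourceGroupName "MonitoringRG" `
    -TargetResourceId $vm.Id `
    -Condition $criteria `
    -WindowSize 00:05:00 `
    -Frequency 00:01:00 `
    -Severity 2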

Alert Configuration

{
    "name": "HighMemoryAlert",
    "type": "Microsoft.Insights/metricAlerts",
    "apiVersion": "2018-03-01",
    "location": "global",
    "properties": {
        "description": "Alert when average available memory drops below 1 GiB",
        "severity": 2,
        "enabled": true,
        "scopes": ["[parameters('resourceId')]"],
        "evaluationFrequency": "PT1M",
        "windowSize": "PT5M",
        "criteria": {
            "odata.type": "Microsoft.Azure.Monitor.SingleResourceMultipleMetricCriteria",
            "allOf": [
                {
                    "name": "HighMemory",
                    "criterionType": "StaticThresholdCriterion",
                    "metricName": "Available Memory Bytes",
                    "operator": "LessThan",
                    "threshold": 1073741824,
                    "timeAggregation": "Average"
                }
            ]
        },
        "actions": [
            {
                "actionGroupId": "[parameters('actionGroupId')]"
            }
        ]
    }
}
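To make the interaction between windowSize, timeAggregation, operator, and threshold concrete, here is a minimal Python sketch of how a static threshold criterion could be evaluated over one window of samples. It is an illustration only, not how Azure Monitor is implemented: the StaticCriterion class, the criterion_fires function, and the sample values are all hypothetical, and treating an empty window as "no alert" is an assumption.

```python
import operator as op
from dataclasses import dataclass
from statistics import mean

# Comparison operators, keyed by the names used in the rule's "operator" field.
OPERATORS = {
    "GreaterThan": op.gt,
    "GreaterThanOrEqual": op.ge,
    "LessThan": op.lt,
    "LessThanOrEqual": op.le,
}

# Aggregations, keyed by the names used in the rule's "timeAggregation" field.
AGGREGATIONS = {
    "Average": mean,
    "Minimum": min,
    "Maximum": max,
    "Total": sum,
}


@dataclass
class StaticCriterion:
    """Hypothetical in-memory mirror of one entry in the "allOf" list."""
    metric_name: str
    operator: str
    threshold: float
    time_aggregation: str


def criterion_fires(criterion, window_samples):
    """Return True when the aggregated window value breaches the threshold."""
    if not window_samples:
        return False  # Assumption: no data means no alert in this sketch.
    aggregated = AGGREGATIONS[criterion.time_aggregation](window_samples)
    return OPERATORS[criterion.operator](aggregated, criterion.threshold)


GIB = 1024 ** 3  # 1073741824 bytes, the threshold used in the template.

high_memory = StaticCriterion("Available Memory Bytes", "LessThan", GIB, "Average")

# Five one-minute samples covering a PT5M window (made-up values).
healthy = [3.0 * GIB, 2.5 * GIB, 2.8 * GIB, 2.9 * GIB, 3.1 * GIB]
starved = [1.2 * GIB, 0.9 * GIB, 0.8 * GIB, 0.7 * GIB, 0.6 * GIB]

print(criterion_fires(high_memory, healthy))  # False
print(criterion_fires(high_memory, starved))  # True
```

Note that the first sample in the starved window is above 1 GiB, yet the rule still fires because the comparison is made against the window average, not against each sample.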

Log Analytics Implementation

Workspace Configuration

# Create Log Analytics workspace
New-AzOperationalInsightsWorkspace `
    -ResourceGroupName "MonitoringRG" `
    -Name "MyWorkspace" `
    -Location "East US" `
    -Sku "PerGB2018"

# Configure data retention
Set-AzOperationalInsightsWorkspace `
    -ResourceGroupName "MonitoringRG" `
    -Name "MyWorkspace" `
    -RetentionInDays 90

Query Examples

// CPU usage by VM
Perf
| where ObjectName == "Processor" and CounterName == "% Processor Time"
| summarize AvgCPU = avg(CounterValue) by Computer, bin(TimeGenerated, 1h)
| render timechart

// Failed authentication attempts
SecurityEvent
| where EventID == 4625
| summarize FailedAttempts = count() by TargetAccount, bin(TimeGenerated, 1h)
| order by FailedAttempts desc

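// Storage account write volume per day
// Write operations are logged under names such as PutBlob, so filter on the
// StorageWrite category rather than searching OperationName for "Write".
StorageBlobLogs
| where Category == "StorageWrite"
| summarize DataWritten = sum(RequestBodySize) by AccountName, bin(TimeGenerated, 1d)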
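The summarize ... by ..., bin(TimeGenerated, 1h) pattern used in these queries groups rows into fixed time buckets before aggregating. The following Python sketch reproduces that grouping on a handful of in-memory rows so the bucketing is easy to see. The bin_time and avg_by_computer helpers and the sample rows are hypothetical; only the column names are taken from the Perf query above.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from statistics import mean

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bin_time(ts, size):
    """Round a timestamp down to the start of its bucket, like KQL bin()."""
    buckets = (ts - EPOCH) // size  # whole buckets since the epoch
    return EPOCH + buckets * size


def avg_by_computer(rows, size=timedelta(hours=1)):
    """Average CounterValue per (Computer, time bucket)."""
    groups = defaultdict(list)
    for row in rows:
        key = (row["Computer"], bin_time(row["TimeGenerated"], size))
        groups[key].append(row["CounterValue"])
    return {key: mean(values) for key, values in groups.items()}


def at(hour, minute):
    """Build a UTC timestamp on an arbitrary sample day."""
    return datetime(2024, 1, 1, hour, minute, tzinfo=timezone.utc)


# Made-up samples standing in for rows of the Perf table.
rows = [
    {"Computer": "vm1", "TimeGenerated": at(10, 5), "CounterValue": 20.0},
    {"Computer": "vm1", "TimeGenerated": at(10, 35), "CounterValue": 40.0},
    {"Computer": "vm1", "TimeGenerated": at(11, 10), "CounterValue": 90.0},
    {"Computer": "vm2", "TimeGenerated": at(10, 50), "CounterValue": 10.0},
]

for (computer, bucket), value in sorted(avg_by_computer(rows).items()):
    print(computer, bucket.strftime("%H:%M"), value)
# vm1 10:00 30.0
# vm1 11:00 90.0
# vm2 10:00 10.0
```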

Application Insights Integration

SDK Implementation

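using Microsoft.ApplicationInsights.AspNetCore;
using Microsoft.ApplicationInsights.DependencyCollector;

public void ConfigureServices(IServiceCollection services)
{
    // Set the connection string through options; the string overload of
    // AddApplicationInsightsTelemetry expects an instrumentation key instead.
    services.AddApplicationInsightsTelemetry(options =>
    {
        options.ConnectionString = Configuration["APPLICATIONINSIGHTS_CONNECTION_STRING"];
    });

    // Record the SQL command text on dependency calls (it may contain sensitive data).
    services.ConfigureTelemetryModule<DependencyTrackingTelemetryModule>((module, o) =>
    {
        module.EnableSqlCommandTextInstrumentation = true;
    });

    // Report unhandled exceptions raised while processing requests.
    services.ConfigureTelemetryModule<RequestTrackingTelemetryModule>((module, o) =>
    {
        module.CollectionOptions.TrackExceptions = true;
    });
}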

Custom Telemetry

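using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;
using Microsoft.ApplicationInsights;
using Microsoft.AspNetCore.Mvc;

public class OrderController : Controller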
{
    private readonly TelemetryClient _telemetry;
    
    public OrderController(TelemetryClient telemetry)
    {
        _telemetry = telemetry;
    }
    
    public async Task<IActionResult> ProcessOrder(Order order)
    {
        var stopwatch = Stopwatch.StartNew();
        
        try
        {
            // Process order (await the actual order-processing call here)
            _telemetry.TrackEvent("OrderProcessed", new Dictionary<string, string>
            {
                { "OrderId", order.Id },
                { "Amount", order.Amount.ToString() }
            });
            
            return Ok();
        }
        catch (Exception ex)
        {
            _telemetry.TrackException(ex);
            throw;
        }
        finally
        {
            stopwatch.Stop();
            _telemetry.TrackMetric("OrderProcessingTime", stopwatch.ElapsedMilliseconds);
        }
    }
}
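The controller relies on try/catch/finally so that the timing metric is emitted whether the order succeeds or fails, while the event is emitted only on success. The language-neutral sketch below shows that control flow in Python with a stub client. FakeTelemetry, process_order, and the handlers are hypothetical names used for illustration and are not part of any Application Insights SDK.

```python
import time


class FakeTelemetry:
    """Hypothetical stand-in for a telemetry client; it only records calls."""

    def __init__(self):
        self.events = []
        self.exceptions = []
        self.metrics = []

    def track_event(self, name, properties):
        self.events.append((name, properties))

    def track_exception(self, exc):
        self.exceptions.append(exc)

    def track_metric(self, name, value):
        self.metrics.append((name, value))


def process_order(telemetry, order, handler):
    """Run handler(order) and emit the same telemetry as the controller above."""
    start = time.perf_counter()
    try:
        handler(order)
        telemetry.track_event(
            "OrderProcessed",
            {"OrderId": str(order["id"]), "Amount": str(order["amount"])},
        )
        return "ok"
    except Exception as exc:
        telemetry.track_exception(exc)
        raise  # Re-raise so the caller still sees the failure.
    finally:
        # Runs on both paths, so every attempt gets a duration metric.
        elapsed_ms = (time.perf_counter() - start) * 1000
        telemetry.track_metric("OrderProcessingTime", elapsed_ms)


def failing_handler(order):
    raise ValueError("payment declined")


telemetry = FakeTelemetry()
order = {"id": "A-100", "amount": 42.5}

process_order(telemetry, order, lambda o: None)  # success path
try:
    process_order(telemetry, order, failing_handler)  # failure path
except ValueError:
    pass

print(len(telemetry.events), len(telemetry.exceptions), len(telemetry.metrics))
# 1 1 2
```

Two attempts produce two duration metrics but only one event and one exception, which is the behaviour to expect from the C# version as well.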

Performance Monitoring

Performance Metrics

Metric	Threshold	Action
CPU Usage	90%	Scale out
Memory Usage	85%	Investigate leaks
Response Time	2 seconds	Optimize code
Error Rate	1%	Debug issues
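
Keeping this table as data makes it easy to review and to test. The sketch below encodes the four rows and returns the actions for any metric above its threshold. The Rule class, the metric key names, and the choice of a strict "greater than" comparison are assumptions made for illustration; the table itself does not say whether the threshold value is inclusive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    metric: str       # hypothetical key name for the sampled metric
    threshold: float
    unit: str
    action: str


# One rule per row of the Performance Metrics table.
RULES = [
    Rule("cpu_percent", 90.0, "%", "Scale out"),
    Rule("memory_percent", 85.0, "%", "Investigate leaks"),
    Rule("response_time_seconds", 2.0, "s", "Optimize code"),
    Rule("error_rate_percent", 1.0, "%", "Debug issues"),
]


def recommended_actions(sample):
    """Return the action for every metric in the sample above its threshold.

    Assumption: a value must strictly exceed the threshold to trigger,
    and metrics missing from the sample are ignored.
    """
    return [
        rule.action
        for rule in RULES
        if rule.metric in sample and sample[rule.metric] > rule.threshold
    ]


sample = {
    "cpu_percent": 95.0,
    "memory_percent": 60.0,
    "response_time_seconds": 2.4,
    "error_rate_percent": 0.2,
}

print(recommended_actions(sample))  # ['Scale out', 'Optimize code']
```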

Performance Dashboard

{
    "name": "PerformanceDashboard",
    "type": "Microsoft.Portal/dashboards",
    "properties": {
        "lenses": {
            "0": {
                "order": 0,
                "parts": {
                    "0": {
                        "position": {
                            "x": 0,
                            "y": 0,
                            "colSpan": 6,
                            "rowSpan": 4
                        },
                        "metadata": {
                            "inputs": [
                                {
                                    "name": "resourceId",
                                    "value": "[parameters('appServiceId')]"
                                },
                                {
                                    "name": "timeframe",
                                    "value": "Last24Hours"
                                }
                            ],
                            "type": "Extension/Microsoft_Azure_Monitoring/PartType/MetricsChartPart"
                        }
                    }
                }
            }
        }
    }
}
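
Writing the position block by hand for every tile becomes error-prone as a dashboard grows. The sketch below generates the "parts" positions for a simple grid, reusing the colSpan of 6 and rowSpan of 4 from the tile above. The layout_parts helper and the two-tiles-per-row default are hypothetical, and each generated part would still need its own metadata section before deployment.

```python
import json


def layout_parts(tile_count, col_span=6, row_span=4, tiles_per_row=2):
    """Return a "parts" mapping with grid positions for tile_count tiles.

    Assumption: tiles are equally sized and fill rows left to right.
    """
    parts = {}
    for index in range(tile_count):
        row, col = divmod(index, tiles_per_row)
        parts[str(index)] = {
            "position": {
                "x": col * col_span,
                "y": row * row_span,
                "colSpan": col_span,
                "rowSpan": row_span,
            }
        }
    return parts


parts = layout_parts(3)
print(json.dumps(parts["2"]["position"]))
# {"x": 0, "y": 4, "colSpan": 6, "rowSpan": 4}
```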

Alerting Strategy

Alert Configuration Matrix

Severity	Response Time	Notification Channel
Critical	5 minutes	SMS, Email
Error	10 minutes	Teams, Email
Warning	15 minutes	Email
Information	1 hour	Email digest
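
A routing matrix like this can be expressed as a lookup table so that notification code and documentation stay in step. The following sketch is one possible shape, not a prescribed design: the ROUTING dictionary mirrors the table, while route_alert and the returned field names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Severity -> (target response time, notification channels), from the matrix.
ROUTING = {
    "Critical": (timedelta(minutes=5), ["SMS", "Email"]),
    "Error": (timedelta(minutes=10), ["Teams", "Email"]),
    "Warning": (timedelta(minutes=15), ["Email"]),
    "Information": (timedelta(hours=1), ["Email digest"]),
}


def route_alert(severity, fired_at):
    """Return the channels to notify and the deadline for a first response."""
    if severity not in ROUTING:
        raise ValueError(f"Unknown severity: {severity}")
    response_time, channels = ROUTING[severity]
    return {"channels": channels, "respond_by": fired_at + response_time}


fired_at = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
routed = route_alert("Critical", fired_at)

print(routed["channels"])                       # ['SMS', 'Email']
print(routed["respond_by"].strftime("%H:%M"))   # 12:05
```

Rejecting unknown severities explicitly avoids silently dropping an alert when a new level is introduced without a matching route.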

Best Practices Summary

Monitoring Strategy
- Define monitoring objectives
- Set retention periods deliberately
- Configure appropriate alerts
- Review monitoring coverage regularly

Data Collection
- Collect relevant metrics
- Apply sampling where volume is high
- Configure custom metrics
- Optimize data volume

Alert Management
- Define severity levels
- Tune thresholds to avoid noise
- Implement alert routing
- Review alerts regularly

Performance Monitoring
- Monitor key metrics
- Set up baselines
- Configure auto-scaling
- Review performance regularly
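
As an illustration of "Set up baselines", the sketch below derives a threshold from recent history as the mean plus a multiple of the standard deviation. This is a deliberately simple statistical baseline chosen for the example; it is not the algorithm behind Azure Monitor's dynamic thresholds. The function names, the multiplier k, and the sample history are all assumptions.

```python
from statistics import mean, stdev


def baseline_threshold(history, k=3.0):
    """Return mean + k * sample standard deviation of the history."""
    if len(history) < 2:
        raise ValueError("Need at least two samples to compute a baseline")
    return mean(history) + k * stdev(history)


def is_anomalous(value, history, k=3.0):
    """Return True when value is above the baseline threshold."""
    return value > baseline_threshold(history, k)


# Made-up CPU percentages from a quiet period.
history = [40.0, 42.0, 38.0, 41.0, 39.0, 40.0, 43.0, 37.0]

print(baseline_threshold(history))   # 46.0
print(is_anomalous(45.0, history))   # False
print(is_anomalous(50.0, history))   # True
```

The history here has a mean of 40 and a sample standard deviation of 2, so the threshold lands at 46. A lower k catches smaller deviations at the cost of more false alarms.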

Troubleshooting Guide

Common monitoring issues and solutions:

Data Collection Issues
- Check agent status
- Verify connectivity
- Review collection rules
- Check quotas

Alert Problems
- Verify alert conditions
- Check notification channels
- Review alert history
- Test alert rules

Performance Issues
- Analyze metrics
- Review logs
- Check dependencies
- Monitor resources
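
For the "Check agent status" step, a common approach is to look for machines that have stopped reporting. The sketch below flags agents whose last heartbeat is older than a cutoff. It works on an in-memory mapping, so how the timestamps are obtained is left open; the stale_agents helper, the 15-minute cutoff, and the machine names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def stale_agents(last_heartbeat, now, max_age=timedelta(minutes=15)):
    """Return the sorted names of agents not heard from within max_age."""
    return sorted(
        name for name, seen in last_heartbeat.items() if now - seen > max_age
    )


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)

# Made-up last-seen timestamps per machine.
last_heartbeat = {
    "vm-a": now - timedelta(minutes=2),
    "vm-b": now - timedelta(minutes=30),
    "vm-c": now - timedelta(minutes=16),
}

print(stale_agents(last_heartbeat, now))  # ['vm-b', 'vm-c']
```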

Next Steps

After implementing monitoring:

- Set up dashboards
- Configure automated responses
- Implement log analytics
- Train the operations team
- Schedule regular monitoring reviews

Remember to regularly review and update your monitoring implementation to maintain optimal visibility and control.

