AWS EC2-GPU Cost Optimization Scans (Beta)
Overview
GPU resources are among the most expensive in cloud infrastructure - yet GPU waste is largely invisible in billing data. AWS embeds GPU cost into EC2 instance pricing with no dedicated cost dimension, making underutilization hard to detect without usage metrics.
Finout ingests GPU metrics from CloudWatch and runs scans in CostGuard to surface idle and underutilized GPU-backed EC2 instances that can be shut down or rightsized.
Enabling GPU metric collection via CloudWatch Agent will incur standard CloudWatch custom metric charges on your AWS account. See AWS CloudWatch pricing for details.
Setup
GPU usage metrics are not emitted by AWS CloudWatch by default. The CloudWatch Agent must be installed and configured on each target EC2 instance before Finout can ingest GPU metrics.
Finout does not install agents or modify instance configurations. Once GPU metrics are present in your CloudWatch environment, Finout ingests them automatically; no additional configuration is required inside Finout.
This part covers what you need to configure in your AWS environment before GPU cost optimization scans can run in Finout.
Prerequisites
Before you begin, confirm the following:
Your EC2 instances use NVIDIA GPUs (P or G instance families for rightsizing scans; any GPU-backed instance for idle scans)
You have IAM permissions to create and modify roles and policies
You have access to AWS Systems Manager
Step 1: Install the NVIDIA Driver
Install the appropriate NVIDIA driver for every relevant EC2 instance.
Follow the official AWS documentation.
Step 2: Install and Configure the SSM Agent
SSM (Systems Manager) lets you deploy and manage the CloudWatch Agent across instances without direct SSH access.
Follow the official AWS documentation to install and configure the SSM Agent.
Step 3: Attach IAM Role Policies
Each target EC2 instance must have an IAM role with both of the following managed policies attached:
| Policy | Purpose |
| --- | --- |
| AmazonSSMManagedInstanceCore | Allows the instance to communicate with SSM |
| CloudWatchAgentServerPolicy | Allows the instance to publish metrics to CloudWatch |
Both policies are required.
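As a quick sanity check, the requirement can be expressed as a small helper that reports which of the two managed policies a role still lacks. This is a minimal sketch: `missing_policies` is a hypothetical helper name, and the list of attached policy names is assumed to come from wherever you inspect the role (for example, the `PolicyName` values returned by the IAM `list_attached_role_policies` call).

```python
# The two managed policies named in this step.
REQUIRED_POLICIES = frozenset({
    "AmazonSSMManagedInstanceCore",
    "CloudWatchAgentServerPolicy",
})


def missing_policies(attached_policy_names):
    """Return the required managed policies that are not attached, sorted by name."""
    return sorted(REQUIRED_POLICIES - set(attached_policy_names))


if __name__ == "__main__":
    # Hypothetical role that has SSM access but cannot publish metrics yet.
    attached = ["AmazonSSMManagedInstanceCore", "AmazonS3ReadOnlyAccess"]
    print(missing_policies(attached))
```

An empty result means the role carries both policies.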
Step 4: Install and Configure the CloudWatch Agent
4a: Install the CloudWatch Agent via SSM
Use AWS Systems Manager Run Command to install the CloudWatch Agent on your instances.
4b: Create the Agent Configuration File
The agent requires a JSON configuration file that defines which metrics to collect. The configuration must include all GPU metrics required by Finout. Store it in AWS Systems Manager Parameter Store so it can be deployed to instances via SSM.
Metrics are collected every 60 seconds, published under the CWAgent namespace, and dimensioned by InstanceId.
Note: The append_dimensions field with InstanceId is required. Metrics published without InstanceId cannot be linked to EC2 resources in Finout and will not be usable for scans.
For full configuration reference, see the official AWS documentation.
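The following sketch builds a configuration of the shape described above and prints it as JSON. It is an illustration, not the official Finout configuration: the `nvidia_gpu` section and the measurement names are assumptions based on the CloudWatch Agent's NVIDIA GPU support and on the metric names listed under Ingested Metrics, and `build_agent_config` is a hypothetical helper. Verify the field names against the AWS configuration reference before deploying.

```python
import json

# Measurement names assumed to map to the nvidia_smi_* metrics Finout ingests.
GPU_MEASUREMENTS = [
    "utilization_gpu",
    "utilization_memory",
    "memory_used",
    "memory_total",
    "memory_free",
    "power_draw",
    "temperature_gpu",
    "clocks_current_graphics",
    "clocks_current_sm",
]


def build_agent_config(interval_seconds=60):
    """Build a CloudWatch Agent config dict that collects GPU metrics per instance."""
    return {
        "agent": {"metrics_collection_interval": interval_seconds},
        "metrics": {
            "namespace": "CWAgent",
            # Required: without InstanceId the metrics cannot be linked to EC2 resources.
            "append_dimensions": {"InstanceId": "${aws:InstanceId}"},
            "metrics_collected": {
                "nvidia_gpu": {
                    "measurement": GPU_MEASUREMENTS,
                    "metrics_collection_interval": interval_seconds,
                }
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_agent_config(), indent=2))
```

The printed JSON is the document you would store as the Parameter Store value.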
Step 5: Verify Metrics Are Being Collected
After deploying the agent configuration, confirm that GPU metrics are flowing into CloudWatch.
Open the CloudWatch console in the AWS Management Console.
Navigate to Metrics > All metrics and look for the CWAgent namespace.
Within CWAgent, you should see metrics such as nvidia_smi_utilization_gpu, nvidia_smi_temperature_gpu, and nvidia_smi_memory_used, dimensioned by InstanceId.
If no metrics appear within a few minutes, check the CloudWatch Agent logs at:
/opt/aws/amazon-cloudwatch-agent/logs/amazon-cloudwatch-agent.log
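The same check can be scripted. The sketch below inspects a response shaped like the output of the CloudWatch `list_metrics` API (for example, `boto3.client("cloudwatch").list_metrics(Namespace="CWAgent")`) and reports, per instance, which expected GPU metrics are still missing. The helper name `missing_metrics_by_instance` and the choice of expected metrics are assumptions made for illustration.

```python
# A subset of the metrics listed under Ingested Metrics, used as a smoke test.
EXPECTED_METRICS = frozenset({
    "nvidia_smi_utilization_gpu",
    "nvidia_smi_memory_used",
    "nvidia_smi_power_draw",
})


def missing_metrics_by_instance(list_metrics_response, expected=EXPECTED_METRICS):
    """Map each InstanceId seen in CWAgent to the expected metrics it has not reported."""
    seen = {}
    for metric in list_metrics_response.get("Metrics", []):
        if metric.get("Namespace") != "CWAgent":
            continue
        dims = {d["Name"]: d["Value"] for d in metric.get("Dimensions", [])}
        instance_id = dims.get("InstanceId")
        if instance_id is None:
            # Metrics without InstanceId cannot be linked to an EC2 resource.
            continue
        seen.setdefault(instance_id, set()).add(metric["MetricName"])
    return {iid: sorted(set(expected) - names) for iid, names in seen.items()}


if __name__ == "__main__":
    # Hand-written sample response; instance ID is made up.
    sample = {
        "Metrics": [
            {
                "Namespace": "CWAgent",
                "MetricName": "nvidia_smi_utilization_gpu",
                "Dimensions": [
                    {"Name": "InstanceId", "Value": "i-0123456789abcdef0"},
                    {"Name": "index", "Value": "0"},
                ],
            }
        ]
    }
    print(missing_metrics_by_instance(sample))
```

An instance that does not appear in the result at all has published no InstanceId-dimensioned metrics, which usually points back to the agent log above.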
Ingested Metrics
Once setup is complete, Finout ingests the following GPU metrics from CloudWatch. All metrics are collected per physical GPU and preserved at instanceId + index granularity - this is critical for detecting partial waste on multi-GPU instances.
| Metric | CloudWatch metric name | Description |
| --- | --- | --- |
| GPU Utilization | nvidia_smi_utilization_gpu | % of time GPU cores are active - core waste signal |
| GPU Memory Used | nvidia_smi_memory_used | Absolute GPU memory consumption |
| GPU Memory Total | nvidia_smi_memory_total | Total GPU memory - reference/metadata |
| GPU Memory Free | nvidia_smi_memory_free | Free GPU memory |
| GPU Memory Utilization | nvidia_smi_utilization_memory | % of memory bandwidth in use |
| CPU Utilization | CPUUtilization | High CPU + low GPU → misconfigured workload |
| GPU Power Draw | nvidia_smi_power_draw | Confirms powered on but idle |
| GPU Temperature | nvidia_smi_temperature_gpu | Sustained low temperature reinforces inactivity |
| Graphics Clock | nvidia_smi_clocks_current_graphics | GPU clock speed - suppresses false positives on bursty workloads |
| SM Clock | nvidia_smi_clocks_current_sm | Streaming multiprocessor clock - same use as above |
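To see why per-GPU granularity matters, consider a four-GPU instance where two GPUs are busy and two are idle. The sketch below is illustrative only: the sample values are made up, `idle_gpu_indices` is a hypothetical helper, and the 10% cutoff is borrowed from the GPU Utilization threshold used by the scans.

```python
from collections import defaultdict
from statistics import mean


def idle_gpu_indices(samples, threshold=10.0):
    """Return sorted (instance_id, gpu_index) pairs whose average utilization is below threshold.

    samples: iterable of (instance_id, gpu_index, utilization_percent) tuples.
    """
    by_gpu = defaultdict(list)
    for instance_id, gpu_index, utilization in samples:
        by_gpu[(instance_id, gpu_index)].append(utilization)
    return sorted(key for key, values in by_gpu.items() if mean(values) < threshold)


if __name__ == "__main__":
    # Made-up datapoints for one instance with four GPUs.
    samples = [
        ("i-0abc", 0, 90.0),
        ("i-0abc", 1, 85.0),
        ("i-0abc", 2, 2.0),
        ("i-0abc", 3, 1.0),
    ]
    # Averaged across the instance, utilization looks moderate.
    print(mean(u for _, _, u in samples))
    # Kept per index, two of the four GPUs turn out to be idle.
    print(idle_gpu_indices(samples))
```

Aggregating to the instance level would hide GPUs 2 and 3 behind the busy ones; keeping the index dimension exposes them.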
EC2/GPU Cost Optimization Scans
This section covers how the GPU cost optimization scans work in CostGuard after the required metrics are ingested from CloudWatch.
Note: After the agent configuration is deployed, at least 7 days of GPU metrics data from CloudWatch must accumulate in Finout before the first CostGuard scan results appear.
Idle GPU Scan
Goal
Identify GPU-backed EC2 instances where both the GPU and CPU are consistently inactive, and recommend shutdown. Because GPU cost is embedded in EC2 instance pricing, a fully idle GPU instance represents 100% wasted spend.
| Property | Value |
| --- | --- |
| Scan name | EC2 - GPU Idle |
| Sampling source | Amazon CloudWatch (CWAgent + EC2 metrics) |
| Timeframe | 7 days back |
| Calculation interval | 24 hours |
| Cost type | Net Amortized |
| Potential savings | Full EC2 instance cost |
Logic
All GPUs on the instance must meet every idle threshold below on every day in the 7-day lookback period. If any single GPU index is active on any single day, the instance is not flagged.
Scan Thresholds
| Metric | Threshold | Notes |
| --- | --- | --- |
| CPU Utilization | max < 5% | Required - aligned with existing EC2 idle scan |
| GPU Utilization per index | avg < 10% | - |
| GPU Memory Utilization per index | avg < 20% | - |
| NetworkIn | avg < 30 MB | Supporting - aligned with existing EC2 idle scan |
| NetworkOut | avg < 30 MB | Supporting - aligned with existing EC2 idle scan |
| GPU Power Draw per index | avg < 15% | Supporting - confirms the GPU is powered on but unused |
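The logic and thresholds above can be sketched as follows. This is an illustration of the described rule, not Finout's implementation: the `GpuDay` and `InstanceDay` types and their field names are hypothetical, each record is assumed to hold one day's aggregates (the 24-hour calculation interval), and every listed threshold, including the supporting ones, is treated as mandatory.

```python
from dataclasses import dataclass, field


@dataclass
class GpuDay:
    util_avg: float       # GPU Utilization, daily average (%)
    mem_util_avg: float   # GPU Memory Utilization, daily average (%)
    power_avg_pct: float  # GPU Power Draw, daily average (%)


@dataclass
class InstanceDay:
    cpu_max: float             # CPU Utilization, daily maximum (%)
    network_in_avg_mb: float   # NetworkIn, daily average (MB)
    network_out_avg_mb: float  # NetworkOut, daily average (MB)
    gpus: dict = field(default_factory=dict)  # GPU index -> GpuDay


def day_is_idle(day):
    """True when the instance-level and every per-GPU threshold are met for one day."""
    instance_idle = (
        day.cpu_max < 5
        and day.network_in_avg_mb < 30
        and day.network_out_avg_mb < 30
    )
    # An instance with no GPU data is never treated as idle.
    gpus_idle = bool(day.gpus) and all(
        g.util_avg < 10 and g.mem_util_avg < 20 and g.power_avg_pct < 15
        for g in day.gpus.values()
    )
    return instance_idle and gpus_idle


def is_idle_gpu_instance(days, lookback_days=7):
    """Flag the instance only if every day in the lookback window is idle."""
    if len(days) < lookback_days:
        return False  # not enough history yet
    return all(day_is_idle(d) for d in days[-lookback_days:])
```

A single active GPU index on a single day is enough to make `is_idle_gpu_instance` return `False`, which mirrors the rule that the instance is then not flagged.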

GPU Rightsizing Scans
Goal
Identify GPU-backed EC2 instances where the GPU is idle but the workload is actively running. The recommendation is to migrate to an equivalent non-GPU instance, eliminating the GPU premium without disrupting the workload.
Notes:
Applies exclusively to P and G EC2 instance families.
Potential savings reflect the cost delta between the current instance and the recommended instance, not the full instance cost.
How the Target Family Is Determined
| Workload profile | Target family | Rationale |
| --- | --- | --- |
| High CPU, low memory | C family | Compute-optimized: high CPU throughput without GPU overhead |
| High CPU, high memory | R family | Memory-optimized: suited for in-memory analytics and large dataset processing |
| Moderate CPU, moderate memory | M family | General purpose: balanced workload with no GPU requirement |
Scan Thresholds
CPU and memory signals determine which scan fires and which target family is recommended.
| Scan name | Condition | Target | Recommendation |
| --- | --- | --- | --- |
| EC2 - GPU Rightsize to CPU Instance | GPU Utilization < 10% AND Max CPU Utilization > 50% AND Memory Used < 40% | C family - c6i instances | Migrate to compute-optimized C-family. Workload is CPU-bound with low memory and no GPU activity. |
| EC2 - GPU Rightsize to Memory Instance | GPU Utilization < 10% AND Max CPU Utilization > 50% AND Memory Used > 60% | R family - r6i instances | Migrate to memory-optimized R-family. Workload needs CPU and high memory but no GPU. |
| EC2 - GPU Rightsize to General Purpose Instance | GPU Utilization < 10% AND Max CPU Utilization between 20% and 50% AND Memory Used between 40% and 60% | M family - m6i instances | Migrate to general purpose M-family. Balanced CPU/memory with no GPU activity. |
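The three conditions can be read as a simple decision function. The sketch below is an interpretation, not Finout's implementation: `classify_rightsizing` is a hypothetical helper, inputs are assumed to be percentages already aggregated over the scan window, and the 20-50% and 40-60% ranges are assumed to be inclusive. Combinations that match none of the three conditions return `None`.

```python
def classify_rightsizing(gpu_util, max_cpu, mem_used):
    """Return the name of the rightsizing scan that fires, or None if none applies.

    gpu_util: GPU Utilization (%)
    max_cpu:  Max CPU Utilization (%)
    mem_used: Memory Used (%)
    """
    if gpu_util >= 10:
        return None  # GPU is in use, so no rightsizing scan applies
    if max_cpu > 50 and mem_used < 40:
        return "EC2 - GPU Rightsize to CPU Instance"  # C family
    if max_cpu > 50 and mem_used > 60:
        return "EC2 - GPU Rightsize to Memory Instance"  # R family
    if 20 <= max_cpu <= 50 and 40 <= mem_used <= 60:
        return "EC2 - GPU Rightsize to General Purpose Instance"  # M family
    return None


if __name__ == "__main__":
    # Idle GPU, CPU-bound workload with low memory use.
    print(classify_rightsizing(gpu_util=3, max_cpu=80, mem_used=25))
    # Idle GPU, high CPU, mid-range memory: matches none of the listed conditions.
    print(classify_rightsizing(gpu_util=3, max_cpu=80, mem_used=50))
```

The second call shows that the listed thresholds do not cover every combination of CPU and memory use; such instances are left unclassified in this sketch.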

