AWS EC2-GPU Cost Optimization Scans (Beta)

Overview

GPU resources are among the most expensive in cloud infrastructure, yet GPU waste is largely invisible in billing data. AWS embeds GPU cost into EC2 instance pricing with no dedicated cost dimension, so underutilization is hard to detect without usage metrics.

Finout ingests GPU metrics from CloudWatch and runs CostGuard scans to surface idle or underutilized GPU-backed EC2 instances that can be shut down or rightsized.

Setup

AWS does not publish GPU usage metrics to CloudWatch by default. The CloudWatch Agent must be installed and configured on each target EC2 instance before Finout can ingest GPU metrics.

Finout does not install agents or modify instance configurations. Once GPU metrics are present in your CloudWatch environment, Finout ingests them automatically; no additional configuration is required inside Finout.

This section covers what you need to configure in your AWS environment before GPU cost optimization scans can run in Finout.

Prerequisites

Before you begin, confirm the following:

  • Your EC2 instances use NVIDIA GPUs (P or G instance families for rightsizing scans; any GPU-backed instance for idle scans)

  • You have IAM permissions to create and modify roles and policies

  • You have access to AWS Systems Manager


Step 1: Install the NVIDIA Driver

Install the appropriate NVIDIA driver on every relevant EC2 instance.

Follow the official AWS documentation.


Step 2: Install and Configure the SSM Agent

SSM (Systems Manager) lets you deploy and manage the CloudWatch Agent across instances without direct SSH access.

Follow the official AWS documentation to install and configure the SSM Agent.


Step 3: Attach IAM Role Policies

Each target EC2 instance must have an IAM role with both of the following AWS managed policies attached:

  • AmazonSSMManagedInstanceCore: allows the instance to communicate with SSM

  • CloudWatchAgentServerPolicy: allows the instance to publish metrics to CloudWatch

Both policies are required.
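A minimal sketch of attaching both managed policies with the AWS SDK for Python (boto3), assuming the instance role already exists. The helper name and role name are hypothetical; the IAM client is passed in so the function can be exercised without AWS credentials.

```python
"""Attach the two AWS managed policies the CloudWatch Agent setup needs to an instance role."""

# ARNs of the AWS managed policies named in Step 3.
REQUIRED_POLICY_ARNS = (
    "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore",
    "arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy",
)


def attach_agent_policies(iam_client, role_name):
    """Attach both required policies to `role_name` and return the ARNs attached.

    `iam_client` is expected to behave like boto3.client("iam").
    """
    attached = []
    for arn in REQUIRED_POLICY_ARNS:
        iam_client.attach_role_policy(RoleName=role_name, PolicyArn=arn)
        attached.append(arn)
    return attached
```

With credentials configured, a call might look like `attach_agent_policies(boto3.client("iam"), "my-gpu-instance-role")`. The role still has to be associated with each instance through an instance profile.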


Step 4: Install and Configure the CloudWatch Agent

4a: Install the CloudWatch Agent via SSM

Use AWS Systems Manager Run Command to install the CloudWatch Agent on your instances.
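A sketch of the install step with boto3, assuming the instances are already managed by SSM (Step 2) and carry the IAM role from Step 3. AWS-ConfigureAWSPackage is the AWS-owned Run Command document for installing Distributor packages, and AmazonCloudWatchAgent is the agent's package name; the helper names are hypothetical, and the SSM client is passed in rather than created inside the function.

```python
"""Build and send an SSM Run Command that installs the CloudWatch Agent."""


def install_agent_command(instance_ids):
    """Return keyword arguments for ssm_client.send_command(...).

    SSM document parameters are always passed as lists of strings.
    """
    if not instance_ids:
        raise ValueError("at least one instance ID is required")
    return {
        "InstanceIds": list(instance_ids),
        "DocumentName": "AWS-ConfigureAWSPackage",
        "Parameters": {
            "action": ["Install"],
            "name": ["AmazonCloudWatchAgent"],
        },
    }


def install_agent(ssm_client, instance_ids):
    """Send the install command and return its CommandId for later status checks."""
    response = ssm_client.send_command(**install_agent_command(instance_ids))
    return response["Command"]["CommandId"]
```

For example, `install_agent(boto3.client("ssm"), ["i-0123456789abcdef0"])` returns a command ID you can track in the Run Command console; the instance ID shown is a placeholder.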

4b: Create the Agent Configuration File

The agent requires a JSON configuration file that defines which metrics to collect. The configuration must collect all GPU metrics required by Finout (see Ingested Metrics). Store it in AWS Systems Manager Parameter Store so that SSM can deploy it to your instances.

  • Configure the agent to collect metrics every 60 seconds, publish them under the CWAgent namespace, and dimension them by InstanceId.

For full configuration reference, see the official AWS documentation.
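A minimal sketch of a configuration that collects the GPU measurements behind the metrics listed under Ingested Metrics, plus a boto3-style helper that stores it in Parameter Store and applies it with the AWS-owned AmazonCloudWatch-ManageAgent SSM document. The helper names and the parameter name are hypothetical. Using a name that starts with AmazonCloudWatch- matters, because CloudWatchAgentServerPolicy only grants read access to parameters with that prefix. Check the measurement list against the official agent reference before deploying.

```python
"""Build the CloudWatch Agent GPU configuration and deploy it from Parameter Store."""
import json

# nvidia_gpu measurements matching the metrics Finout ingests; the agent
# publishes each one with an "nvidia_smi_" prefix (e.g. nvidia_smi_utilization_gpu).
GPU_MEASUREMENTS = [
    "utilization_gpu",
    "utilization_memory",
    "memory_used",
    "memory_total",
    "memory_free",
    "power_draw",
    "temperature_gpu",
    "clocks_current_graphics",
    "clocks_current_sm",
]


def build_agent_config(interval_seconds=60):
    """Return the agent configuration as a JSON string."""
    config = {
        "agent": {"metrics_collection_interval": interval_seconds},
        "metrics": {
            "namespace": "CWAgent",
            # Adds the EC2 instance ID as a dimension on every collected metric.
            "append_dimensions": {"InstanceId": "${aws:InstanceId}"},
            "metrics_collected": {
                "nvidia_gpu": {
                    "measurement": GPU_MEASUREMENTS,
                    "metrics_collection_interval": interval_seconds,
                }
            },
        },
    }
    return json.dumps(config, indent=2)


def deploy_config(ssm_client, parameter_name, instance_ids):
    """Store the config in Parameter Store, then tell the agent to load it and restart."""
    ssm_client.put_parameter(
        Name=parameter_name, Value=build_agent_config(), Type="String", Overwrite=True
    )
    response = ssm_client.send_command(
        InstanceIds=list(instance_ids),
        DocumentName="AmazonCloudWatch-ManageAgent",
        Parameters={
            "action": ["configure"],
            "mode": ["ec2"],
            "optionalConfigurationSource": ["ssm"],
            "optionalConfigurationLocation": [parameter_name],
            "optionalRestart": ["yes"],
        },
    )
    return response["Command"]["CommandId"]
```

CPUUtilization is not part of this file: it is a standard EC2 metric that CloudWatch collects without the agent.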


Step 5: Verify Metrics Are Being Collected

After deploying the agent configuration, confirm that GPU metrics are flowing into CloudWatch.

  1. Open the CloudWatch console in the AWS Management Console.

  2. Navigate to Metrics > All metrics and look for the CWAgent namespace.

  3. Within CWAgent, you should see metrics such as nvidia_smi_utilization_gpu, nvidia_smi_temperature_gpu, and nvidia_smi_memory_used, dimensioned by InstanceId and GPU index.

  4. If no metrics appear within a few minutes, check the CloudWatch Agent logs at: /opt/aws/amazon-cloudwatch-agent/logs/amazon-cloudwatch-agent.log
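The console check above can also be scripted. This sketch pages through CloudWatch list_metrics for nvidia_smi_utilization_gpu in the CWAgent namespace and reports which instances and GPU indexes are publishing. The helper names are hypothetical; the client is assumed to be a boto3 CloudWatch client, and the "index" dimension name assumes the agent's default GPU dimensions.

```python
"""Check which instances and GPU indexes are publishing nvidia_smi_utilization_gpu."""
from collections import defaultdict


def gpus_by_instance(metrics):
    """Group list_metrics entries into {InstanceId: sorted list of GPU indexes}.

    Each entry is shaped like an item of list_metrics()["Metrics"]:
    {"MetricName": ..., "Dimensions": [{"Name": ..., "Value": ...}, ...]}.
    """
    found = defaultdict(set)
    for metric in metrics:
        dims = {d["Name"]: d["Value"] for d in metric.get("Dimensions", [])}
        if "InstanceId" in dims and "index" in dims:
            found[dims["InstanceId"]].add(dims["index"])
    # GPU indexes are numeric strings, so sort them numerically.
    return {iid: sorted(idx, key=int) for iid, idx in found.items()}


def list_gpu_metrics(cloudwatch_client):
    """Yield every GPU utilization metric in the CWAgent namespace, across all pages."""
    paginator = cloudwatch_client.get_paginator("list_metrics")
    pages = paginator.paginate(Namespace="CWAgent", MetricName="nvidia_smi_utilization_gpu")
    for page in pages:
        yield from page["Metrics"]
```

`gpus_by_instance(list_gpu_metrics(boto3.client("cloudwatch")))` should list every target instance with one entry per physical GPU; an instance that is missing points to an agent or IAM problem on that host.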


Ingested Metrics

Once setup is complete, Finout ingests the following metrics from CloudWatch. All GPU metrics are collected per physical GPU and preserved at InstanceId + GPU index granularity. This is critical for detecting partial waste on multi-GPU instances, where one busy GPU can mask several idle ones in an instance-level average.

  • GPU Utilization (nvidia_smi_utilization_gpu): % of time GPU cores are active; the core waste signal

  • GPU Memory Used (nvidia_smi_memory_used): absolute GPU memory consumption

  • GPU Memory Total (nvidia_smi_memory_total): total GPU memory; used as reference metadata

  • GPU Memory Free (nvidia_smi_memory_free): free GPU memory

  • GPU Memory Utilization (nvidia_smi_utilization_memory): % of memory bandwidth in use

  • CPU Utilization (CPUUtilization, the standard per-instance EC2 metric in the AWS/EC2 namespace): high CPU combined with low GPU indicates a misconfigured workload

  • GPU Power Draw (nvidia_smi_power_draw): confirms the GPU is powered on but idle

  • GPU Temperature (nvidia_smi_temperature_gpu): sustained low temperature reinforces inactivity

  • Graphics Clock (nvidia_smi_clocks_current_graphics): GPU clock speed; suppresses false positives on bursty workloads

  • SM Clock (nvidia_smi_clocks_current_sm): streaming multiprocessor clock; same use as the graphics clock
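A small worked example of the partial-waste case described above, using made-up utilization samples for a hypothetical 4-GPU instance. It contrasts the instance-wide average with a per-index check that uses the 10% GPU utilization threshold from the idle scan; the helper name is hypothetical.

```python
"""Why per-index granularity matters: an instance average can hide idle GPUs."""
from statistics import mean


def idle_gpu_indexes(util_by_index, threshold=10.0):
    """Return GPU indexes whose average utilization is below `threshold` percent."""
    return sorted(i for i, samples in util_by_index.items() if mean(samples) < threshold)


# Hypothetical 4-GPU instance: GPU 0 is busy, GPUs 1-3 sit almost idle.
util = {
    0: [92.0, 88.0, 95.0],
    1: [1.0, 0.0, 2.0],
    2: [0.0, 0.0, 0.0],
    3: [3.0, 1.0, 2.0],
}

# Averaging across GPUs first blends the busy GPU with the idle ones.
instance_avg = mean(mean(samples) for samples in util.values())
```

Here `instance_avg` is about 23.7%, well above a 10% threshold, while `idle_gpu_indexes(util)` returns `[1, 2, 3]`: three of the four GPUs are wasted, which only the per-index view reveals.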


EC2/GPU Cost Optimization Scans

This section covers how the GPU cost optimization scans work in CostGuard after the required metrics are ingested from CloudWatch.

Note: The first CostGuard scan results appear only after at least 7 days of GPU metrics have accumulated in Finout, counted from when the CloudWatch Agent configuration is updated.

Idle GPU Scan

Goal

Identify GPU-backed EC2 instances where both the GPU and CPU are consistently inactive, and recommend shutdown. Because GPU cost is embedded in EC2 instance pricing, a fully idle GPU instance represents 100% wasted spend.

  • Scan name: EC2 - GPU Idle

  • Sampling source: Amazon CloudWatch (CWAgent + EC2 metrics)

  • Timeframe: 7 days back

  • Calculation interval: 24 hours

  • Cost type: Net Amortized

  • Potential savings: Full EC2 instance cost

Logic

All GPUs on the instance must meet every idle threshold below on every day in the 7-day lookback period. If any single GPU index is active on any single day, the instance is not flagged.

Scan Thresholds

  • CPU Utilization: max < 5% (required; aligned with the existing EC2 idle scan)

  • GPU Utilization per index: avg < 10%

  • GPU Memory Utilization per index: avg < 20%

  • NetworkIn: avg < 30 MB (supporting; aligned with the existing EC2 idle scan)

  • NetworkOut: avg < 30 MB (supporting; aligned with the existing EC2 idle scan)

  • GPU Power Draw per index: avg < 15% (supporting; confirms the GPU is powered on but unused)
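A sketch of the idle logic described above: every threshold must hold for every GPU index on every day of the 7-day lookback. It assumes the daily aggregates (max CPU, average network, per-index GPU averages) have already been computed; the DailyStats shape and function names are hypothetical. The source does not say what the 15% power draw threshold is measured against, so this sketch assumes a percentage of each GPU's power limit, and it treats every threshold, including the supporting ones, as mandatory.

```python
"""Sketch of the EC2 - GPU Idle decision for one instance over a 7-day lookback."""
from __future__ import annotations

from dataclasses import dataclass

# Thresholds from the Scan Thresholds list (strict "less than" comparisons).
CPU_MAX_PCT = 5.0
GPU_UTIL_AVG_PCT = 10.0
GPU_MEM_UTIL_AVG_PCT = 20.0
NETWORK_AVG_MB = 30.0
GPU_POWER_AVG_PCT = 15.0  # assumption: % of each GPU's power limit
LOOKBACK_DAYS = 7


@dataclass
class DailyStats:
    """Hypothetical per-day aggregates for one instance."""
    cpu_max: float                       # daily max of CPUUtilization, %
    network_in_avg_mb: float             # daily average NetworkIn, MB
    network_out_avg_mb: float            # daily average NetworkOut, MB
    gpu_util_avg: dict[str, float]       # {GPU index: avg utilization %}
    gpu_mem_util_avg: dict[str, float]   # {GPU index: avg memory utilization %}
    gpu_power_avg_pct: dict[str, float]  # {GPU index: avg power draw %}


def day_is_idle(day: DailyStats) -> bool:
    """True only if the instance and every GPU index meet all thresholds for the day."""
    if day.cpu_max >= CPU_MAX_PCT:
        return False
    if day.network_in_avg_mb >= NETWORK_AVG_MB or day.network_out_avg_mb >= NETWORK_AVG_MB:
        return False
    if not day.gpu_util_avg:
        return False  # no GPU data, so idleness cannot be confirmed
    missing = float("inf")  # a missing per-index value counts as "not idle"
    return all(
        day.gpu_util_avg[i] < GPU_UTIL_AVG_PCT
        and day.gpu_mem_util_avg.get(i, missing) < GPU_MEM_UTIL_AVG_PCT
        and day.gpu_power_avg_pct.get(i, missing) < GPU_POWER_AVG_PCT
        for i in day.gpu_util_avg
    )


def is_idle_gpu_instance(days: list[DailyStats]) -> bool:
    """Flag the instance only if every day in the lookback window is idle."""
    if len(days) < LOOKBACK_DAYS:
        return False  # fewer than 7 days of data: no result yet
    return all(day_is_idle(d) for d in days[-LOOKBACK_DAYS:])
```

A single active GPU index on a single day is enough to make `is_idle_gpu_instance` return False, matching the rule that one active index blocks the recommendation.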


GPU Rightsizing Scans

Goal

Identify GPU-backed EC2 instances where the GPU is idle but the workload is actively running. The recommendation is to migrate to an equivalent non-GPU instance, eliminating the GPU premium without disrupting the workload.

Notes:

  • Applies exclusively to P and G EC2 instance families.

  • Potential savings reflect the cost difference between the current instance and the recommended instance, not the full instance cost.

How the Target Family Is Determined

  • High CPU, low memory: C family. Compute-optimized; high CPU throughput without GPU overhead

  • High CPU, high memory: R family. Memory-optimized; suited for in-memory analytics and large dataset processing

  • Moderate CPU, moderate memory: M family. General purpose; balanced workload with no GPU requirement

Scan Thresholds

CPU and memory signals determine which scan fires and which target family is recommended.

EC2 - GPU Rightsize to CPU Instance

  • Conditions: GPU Utilization < 10% AND max CPU Utilization > 50% AND Memory Used < 40%

  • Target: C family (c6i instances)

  • Recommendation: Migrate to compute-optimized C-family. The workload is CPU-bound with low memory and no GPU activity.

EC2 - GPU Rightsize to Memory Instance

  • Conditions: GPU Utilization < 10% AND max CPU Utilization > 50% AND Memory Used > 60%

  • Target: R family (r6i instances)

  • Recommendation: Migrate to memory-optimized R-family. The workload needs CPU and high memory but no GPU.

EC2 - GPU Rightsize to General Purpose Instance

  • Conditions: GPU Utilization < 10% AND max CPU Utilization 20–50% AND Memory Used 40–60%

  • Target: M family (m6i instances)

  • Recommendation: Migrate to general purpose M-family. The workload has balanced CPU and memory use with no GPU activity.
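A sketch of how the three rightsizing conditions map an instance to a scan and a target family. It assumes `gpu_util` is the average GPU utilization over the lookback, `cpu_max` is the maximum CPU utilization, and `mem_used_pct` is instance memory utilization as a percentage; the source does not name the metric behind "Memory Used", and the function names are hypothetical. The P/G check by instance-type prefix is a simplification, and profiles that fall between the documented bands return None because the source does not describe them.

```python
"""Sketch of the GPU rightsizing decision for one P- or G-family instance."""
from __future__ import annotations

GPU_UTIL_PCT = 10.0


def rightsizing_recommendation(instance_type: str, gpu_util: float,
                               cpu_max: float, mem_used_pct: float) -> tuple[str, str] | None:
    """Return (scan name, target family) or None when no rightsizing scan fires."""
    family = instance_type.split(".", 1)[0].lower()
    # Simplification: treat any family name starting with "p" or "g" as P/G.
    if not family.startswith(("p", "g")):
        return None
    if gpu_util >= GPU_UTIL_PCT:
        return None  # the GPU is in use; keep the GPU instance
    if cpu_max > 50.0 and mem_used_pct < 40.0:
        return ("EC2 - GPU Rightsize to CPU Instance", "c6i")
    if cpu_max > 50.0 and mem_used_pct > 60.0:
        return ("EC2 - GPU Rightsize to Memory Instance", "r6i")
    if 20.0 <= cpu_max <= 50.0 and 40.0 <= mem_used_pct <= 60.0:
        return ("EC2 - GPU Rightsize to General Purpose Instance", "m6i")
    return None  # profile falls outside the documented bands


def potential_savings(current_cost: float, recommended_cost: float) -> float:
    """Savings are the cost delta between instances, not the full instance cost."""
    return max(current_cost - recommended_cost, 0.0)
```

Note that a busy CPU with 40–60% memory matches none of the three scans as documented, so the sketch returns None rather than guessing a family.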
