# AWS EC2-GPU Cost Optimization Scans (Beta)

## Overview

GPU resources are among the most expensive in cloud infrastructure - yet GPU waste is largely invisible in billing data. AWS embeds GPU cost into EC2 instance pricing with no dedicated cost dimension, making underutilization hard to detect without usage metrics.

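Finout ingests GPU metrics from CloudWatch and runs CostGuard scans that surface GPU-backed EC2 instances that are fully idle (candidates for shutdown) or whose GPUs go unused while the workload keeps running (candidates for rightsizing to a non-GPU instance).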

{% hint style="warning" %}
Enabling GPU metric collection via CloudWatch Agent will incur standard CloudWatch custom metric charges on your AWS account. See [AWS CloudWatch pricing](https://aws.amazon.com/cloudwatch/pricing/) for details.
{% endhint %}

## Setup

AWS does not publish GPU usage metrics to CloudWatch by default. The CloudWatch Agent must be installed and configured on each target EC2 instance before Finout can ingest GPU metrics.

Finout does not install agents or modify instance configurations. Once GPU metrics are present in your CloudWatch environment, Finout ingests them automatically; no additional configuration is required in Finout.

**This part covers what you need to configure in your AWS environment before GPU cost optimization scans can run in Finout.**

### Prerequisites

Before you begin, confirm the following:

* Your EC2 instances use NVIDIA GPUs (P or G instance families for rightsizing scans; any GPU-backed instance for idle scans)
* You have IAM permissions to create and modify roles and policies
* You have access to AWS Systems Manager

***

### Step 1: Install the NVIDIA Driver

Install the appropriate NVIDIA driver on each target EC2 instance.

Follow the [official AWS documentation](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/install-nvidia-driver.html).

***

### Step 2: Install and Configure the SSM Agent

SSM (Systems Manager) lets you deploy and manage the CloudWatch Agent across instances without direct SSH access.

Follow the [official AWS documentation](https://docs.aws.amazon.com/systems-manager/latest/userguide/ssm-agent.html) to install and configure the SSM Agent.

***

### Step 3: Attach IAM Role Policies

Each target EC2 instance must have an IAM role with both of the following managed policies attached:

| Managed policy               | Purpose                                              |
| ---------------------------- | ---------------------------------------------------- |
| AmazonSSMManagedInstanceCore | Allows the instance to communicate with SSM          |
| CloudWatchAgentServerPolicy  | Allows the instance to publish metrics to CloudWatch |

Both policies are required.
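
If you manage roles with scripts rather than in the IAM console, the check-and-attach step can be sketched in Python as follows. This is a minimal sketch, assuming a boto3 IAM client (`boto3.client("iam")`) is passed in; the helper names are hypothetical. Passing the client in, instead of creating it inside the functions, keeps the logic testable without AWS credentials.

```python
# Hypothetical helpers for Step 3; `iam_client` is expected to be boto3.client("iam").
REQUIRED_POLICIES = ("AmazonSSMManagedInstanceCore", "CloudWatchAgentServerPolicy")


def managed_policy_arn(name: str) -> str:
    # AWS managed policies use the "aws" account alias in their ARN.
    return f"arn:aws:iam::aws:policy/{name}"


def missing_policies(iam_client, role_name: str) -> list:
    """Return the required policy names that are not attached to role_name."""
    # Reads a single page only, which is fine for roles with few attached policies.
    resp = iam_client.list_attached_role_policies(RoleName=role_name)
    attached = {p["PolicyName"] for p in resp["AttachedPolicies"]}
    return [name for name in REQUIRED_POLICIES if name not in attached]


def attach_required_policies(iam_client, role_name: str) -> list:
    """Attach each missing required policy and return the names that were attached."""
    to_attach = missing_policies(iam_client, role_name)
    for name in to_attach:
        iam_client.attach_role_policy(RoleName=role_name, PolicyArn=managed_policy_arn(name))
    return to_attach
```

For example, `attach_required_policies(boto3.client("iam"), "gpu-instance-role")`, where the role name is hypothetical. The role still has to be associated with each instance through an instance profile.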

***

### Step 4: Install and Configure the CloudWatch Agent

#### 4a: Install the CloudWatch Agent via SSM

Use [AWS Systems Manager Run Command](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/installing-cloudwatch-agent-ssm.html) to install the CloudWatch Agent on your instances.
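
The Run Command install can also be scripted. This is a minimal sketch, assuming a boto3 SSM client and the AWS-owned `AWS-ConfigureAWSPackage` document with the package name `AmazonCloudWatchAgent`; check the AWS page linked above for the current document parameters. The function name is hypothetical. Because `send_command` accepts at most 50 instance IDs per call, the IDs are sent in batches.

```python
# Hypothetical helper for Step 4a; `ssm_client` is expected to be boto3.client("ssm").
MAX_IDS_PER_COMMAND = 50  # send_command accepts at most 50 instance IDs per call


def install_cloudwatch_agent(ssm_client, instance_ids: list) -> list:
    """Install the CloudWatch Agent via Run Command; return one command ID per batch."""
    command_ids = []
    for start in range(0, len(instance_ids), MAX_IDS_PER_COMMAND):
        batch = instance_ids[start:start + MAX_IDS_PER_COMMAND]
        resp = ssm_client.send_command(
            InstanceIds=batch,
            DocumentName="AWS-ConfigureAWSPackage",  # AWS-owned package installer document
            Parameters={"action": ["Install"], "name": ["AmazonCloudWatchAgent"]},
        )
        command_ids.append(resp["Command"]["CommandId"])
    return command_ids
```

The returned command IDs can be used to look up per-instance results in the Run Command console.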

#### 4b: Create the Agent Configuration File

The agent requires a JSON configuration file that defines which metrics to collect. Store the configuration below in AWS Systems Manager Parameter Store so that it can be deployed to your instances via SSM. It collects all the GPU metrics Finout requires.

* Metrics are collected every 60 seconds, published under the CWAgent namespace, and dimensioned by InstanceId.

{% hint style="warning" %}
**Note**: The `append_dimensions` field with `InstanceId` is **required**. Metrics published without `InstanceId` cannot be linked to EC2 resources in Finout and cannot be used by scans.
{% endhint %}

```json
{
  "metrics": {
    "namespace": "CWAgent",
    "append_dimensions": {
      "InstanceId": "${aws:InstanceId}"
    },
    "metrics_collected": {
      "nvidia_gpu": {
        "measurement": [
          "utilization_gpu",
          "temperature_gpu",
          "power_draw",
          "utilization_memory",
          "fan_speed",
          "memory_total",
          "memory_used",
          "memory_free",
          "pcie_link_gen_current",
          "pcie_link_width_current",
          "encoder_stats_session_count",
          "encoder_stats_average_fps",
          "encoder_stats_average_latency",
          "clocks_current_graphics",
          "clocks_current_sm",
          "clocks_current_memory",
          "clocks_current_video"
        ],
        "metrics_collection_interval": 60
      }
    },
    "force_flush_interval": 60
  }
}
```
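
Because a missing `InstanceId` dimension silently makes metrics unusable, it can be worth checking the configuration before storing it. The following is a hypothetical pre-flight check in Python, not a Finout or AWS tool. It assumes the measurements behind the GPU rows of the Ingested Metrics table are the ones that matter; the agent publishes each measurement with an `nvidia_smi_` prefix (for example, `utilization_gpu` becomes `nvidia_smi_utilization_gpu`).

```python
import json

# Measurements behind the GPU rows of the Ingested Metrics table.
# The agent publishes each one with an "nvidia_smi_" prefix.
REQUIRED_GPU_MEASUREMENTS = {
    "utilization_gpu", "utilization_memory", "memory_total", "memory_used",
    "memory_free", "power_draw", "temperature_gpu",
    "clocks_current_graphics", "clocks_current_sm",
}


def config_problems(config_text: str) -> list:
    """Return human-readable problems found in an agent config; an empty list means OK."""
    metrics = json.loads(config_text).get("metrics", {})
    problems = []
    if metrics.get("namespace") != "CWAgent":
        problems.append("metrics.namespace should be 'CWAgent'")
    if metrics.get("append_dimensions", {}).get("InstanceId") != "${aws:InstanceId}":
        problems.append("metrics.append_dimensions.InstanceId must be '${aws:InstanceId}'")
    gpu = metrics.get("metrics_collected", {}).get("nvidia_gpu", {})
    missing = REQUIRED_GPU_MEASUREMENTS - set(gpu.get("measurement", []))
    if missing:
        problems.append("nvidia_gpu.measurement is missing: " + ", ".join(sorted(missing)))
    return problems
```

For example, `config_problems(open("gpu-agent-config.json").read())` (hypothetical file name) returns an empty list for the configuration above.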

For the full configuration reference, see the [official AWS documentation](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/create-cloudwatch-agent-configuration-file.html).
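
Storing the configuration in Parameter Store and pushing it to instances can be scripted as well. This is a minimal sketch, assuming a boto3 SSM client and the AWS-owned `AmazonCloudWatch-ManageAgent` Run Command document; the parameter name and function name are hypothetical. The `AmazonCloudWatch-` prefix matters because `CloudWatchAgentServerPolicy` only grants the agent read access to Parameter Store parameters whose names start with it.

```python
# Hypothetical parameter name; the "AmazonCloudWatch-" prefix is what
# CloudWatchAgentServerPolicy lets the agent read from Parameter Store.
PARAMETER_NAME = "AmazonCloudWatch-gpu-metrics"


def store_and_apply_config(ssm_client, config_text: str, instance_ids: list,
                           parameter_name: str = PARAMETER_NAME) -> str:
    """Save the agent config to Parameter Store, then tell the agents to load it.

    `ssm_client` is expected to be boto3.client("ssm"). send_command accepts at
    most 50 instance IDs per call, so larger fleets need batching.
    """
    if not parameter_name.startswith("AmazonCloudWatch-"):
        raise ValueError("parameter name should start with 'AmazonCloudWatch-'")
    ssm_client.put_parameter(Name=parameter_name, Value=config_text,
                             Type="String", Overwrite=True)
    resp = ssm_client.send_command(
        InstanceIds=instance_ids,
        DocumentName="AmazonCloudWatch-ManageAgent",
        Parameters={
            "action": ["configure"],
            "mode": ["ec2"],
            "optionalConfigurationSource": ["ssm"],
            "optionalConfigurationLocation": [parameter_name],
            "optionalRestart": ["yes"],  # restart so the new config takes effect
        },
    )
    return resp["Command"]["CommandId"]
```

For example, `store_and_apply_config(boto3.client("ssm"), open("gpu-agent-config.json").read(), ["i-0123456789abcdef0"])`, where the file name and instance ID are placeholders.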

***

### Step 5: Verify Metrics Are Being Collected

After deploying the agent configuration, confirm that GPU metrics are flowing into CloudWatch.

1. Open the CloudWatch console in the AWS Management Console.<br>

   <img src="/files/CnpqA6utlDWB9rULyKRJ" alt="" height="240" width="624">

2. Navigate to Metrics > All metrics and look for the CWAgent namespace.<br>

   <img src="/files/y3KN86cjpcIcsfFhAbQx" alt="" height="393" width="569">

3. Within CWAgent, you should see metrics such as `nvidia_smi_utilization_gpu`, `nvidia_smi_temperature_gpu`, and `nvidia_smi_memory_used`, dimensioned by InstanceId.

   <img src="/files/h5FDSjeH2VmUmD8Xwb5z" alt="" height="210" width="657">

4. If no metrics appear within a few minutes, check the CloudWatch Agent logs at: `/opt/aws/amazon-cloudwatch-agent/logs/amazon-cloudwatch-agent.log`
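
The console check can also be done programmatically. This is a minimal sketch, assuming you page through `ListMetrics` results with boto3, for example `boto3.client("cloudwatch").get_paginator("list_metrics").paginate(Namespace="CWAgent")`; the helper names are hypothetical. It reports which expected instances have no `nvidia_smi_utilization_gpu` metric carrying an `InstanceId` dimension.

```python
def instances_reporting(list_metrics_pages, metric_name="nvidia_smi_utilization_gpu"):
    """Return the InstanceIds that publish metric_name in the CWAgent namespace."""
    found = set()
    for page in list_metrics_pages:
        for metric in page.get("Metrics", []):
            if metric.get("Namespace") != "CWAgent" or metric.get("MetricName") != metric_name:
                continue
            dims = {d["Name"]: d["Value"] for d in metric.get("Dimensions", [])}
            # Metrics without InstanceId cannot be tied to an instance (see Step 4b).
            if "InstanceId" in dims:
                found.add(dims["InstanceId"])
    return found


def instances_missing_gpu_metrics(expected_instance_ids, list_metrics_pages):
    """Return the expected instances that report no GPU utilization metric, sorted."""
    return sorted(set(expected_instance_ids) - instances_reporting(list_metrics_pages))
```

Any instance this returns is a candidate for the agent-log check in item 4.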

***

### Ingested Metrics

Once setup is complete, Finout ingests the following metrics from CloudWatch. GPU metrics are collected per physical GPU and preserved at `InstanceId` + GPU `index` granularity, which is critical for detecting partial waste on multi-GPU instances. `CPUUtilization` is the standard EC2 instance metric.

<table data-header-hidden><thead><tr><th width="196.89453125">Metric</th><th>Metric Name</th><th>Description</th></tr></thead><tbody><tr><td>GPU Utilization</td><td><code>nvidia_smi_utilization_gpu</code></td><td>% of time GPU cores are active - core waste signal</td></tr><tr><td>GPU Memory Used</td><td><code>nvidia_smi_memory_used</code></td><td>Absolute GPU memory consumption</td></tr><tr><td>GPU Memory Total</td><td><code>nvidia_smi_memory_total</code></td><td>Total GPU memory - reference/metadata</td></tr><tr><td>GPU Memory Free</td><td><code>nvidia_smi_memory_free</code></td><td>Free GPU memory</td></tr><tr><td>GPU Memory Utilization</td><td><code>nvidia_smi_utilization_memory</code></td><td>% of memory bandwidth in use</td></tr><tr><td>CPU Utilization</td><td><code>CPUUtilization</code></td><td>High CPU + low GPU → misconfigured workload</td></tr><tr><td>GPU Power Draw</td><td><code>nvidia_smi_power_draw</code></td><td>Confirms powered on but idle</td></tr><tr><td>GPU Temperature</td><td><code>nvidia_smi_temperature_gpu</code></td><td>Sustained low temperature reinforces inactivity</td></tr><tr><td>Graphics Clock</td><td><code>nvidia_smi_clocks_current_graphics</code></td><td>GPU clock speed - suppresses false positives on bursty workloads</td></tr><tr><td>SM Clock</td><td><code>nvidia_smi_clocks_current_sm</code></td><td>Streaming multiprocessor clock - same use as above</td></tr></tbody></table>
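
To see why per-index granularity matters, consider a hypothetical 4-GPU instance where one GPU is busy and three are nearly idle. Averaged across the instance, utilization looks healthy; per index, most of the GPU capacity is unused. The figures are made up for illustration, and the 10% cutoff is borrowed from the Idle GPU Scan thresholds.

```python
from statistics import mean

# Hypothetical daily-average GPU utilization (%) for one 4-GPU instance,
# keyed by GPU index.
util_by_index = {"0": 92.0, "1": 3.0, "2": 2.0, "3": 1.0}
IDLE_GPU_UTIL_PCT = 10.0  # from the Idle GPU Scan thresholds


def idle_indexes(util, threshold=IDLE_GPU_UTIL_PCT):
    """Return the GPU indexes whose utilization is below the idle threshold."""
    return sorted(i for i, u in util.items() if u < threshold)


instance_avg = mean(util_by_index.values())
print(f"instance average: {instance_avg}%")                # instance average: 24.5%
print(f"idle GPU indexes: {idle_indexes(util_by_index)}")  # idle GPU indexes: ['1', '2', '3']
```

An instance-level average of 24.5% would hide the fact that three of the four GPUs sit below 10%.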

***

## EC2/GPU Cost Optimization Scans

This section covers how the GPU cost optimization scans work in [CostGuard](https://docs.finout.io/user-guide/optimize/costguard) after the required metrics are ingested from CloudWatch.

{% hint style="info" %}
**Note**: After you update the agent configuration, at least 7 days of GPU metrics must accumulate in Finout before the first CostGuard scan results appear.
{% endhint %}

### Idle GPU Scan

#### Goal

Identify GPU-backed EC2 instances where both the GPU and CPU are consistently inactive, and recommend shutdown. Because GPU cost is embedded in EC2 instance pricing, a fully idle GPU instance represents 100% wasted spend.

| Property             | Value                                     |
| -------------------- | ----------------------------------------- |
| Scan name            | EC2 - GPU Idle                            |
| Sampling source      | Amazon CloudWatch (CWAgent + EC2 metrics) |
| Timeframe            | 7 days back                               |
| Calculation interval | 24 hours                                  |
| Cost type            | Net Amortized                             |
| Potential savings    | Full EC2 instance cost                    |

#### Logic

Every threshold below must be met on every day of the 7-day lookback period: the instance-level thresholds (CPU and network) by the instance, and the per-index thresholds by every GPU on it. If any single GPU index is active on any single day, the instance is not flagged.

#### Scan Thresholds

<table data-header-hidden><thead><tr><th>Metric</th><th width="185.80078125">Condition</th><th>Notes</th></tr></thead><tbody><tr><td>CPU Utilization</td><td>max &#x3C; 5%</td><td>Required - aligned with existing EC2 idle scan</td></tr><tr><td>GPU Utilization per index</td><td>avg &#x3C; 10%</td><td>-</td></tr><tr><td>GPU Memory Utilization per index</td><td>avg &#x3C; 20%</td><td>-</td></tr><tr><td>NetworkIn</td><td>avg &#x3C; 30 MB</td><td>Supporting - aligned with existing EC2 idle scan</td></tr><tr><td>NetworkOut</td><td>avg &#x3C; 30 MB</td><td>Supporting - aligned with existing EC2 idle scan</td></tr><tr><td>GPU Power Draw per index</td><td>avg &#x3C; 15%</td><td>Supporting - confirms the GPU is powered on but unused</td></tr></tbody></table>
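
The Logic rule and the thresholds above can be expressed as a small predicate. This is a sketch of the rules as written on this page, not Finout's implementation: the `DailyInstanceStats` field names are hypothetical, each field is assumed to hold one day's aggregate (max or avg, matching the Condition column), and power draw is treated as a percentage exactly as the table states, without assuming what it is a percentage of.

```python
from dataclasses import dataclass

LOOKBACK_DAYS = 7


@dataclass
class DailyInstanceStats:
    """One day of aggregates for one instance (hypothetical field names)."""
    cpu_max_pct: float
    network_in_avg_mb: float
    network_out_avg_mb: float
    gpu_util_avg_pct: dict       # GPU index -> avg GPU utilization %
    gpu_mem_util_avg_pct: dict   # GPU index -> avg GPU memory utilization %
    gpu_power_avg_pct: dict      # GPU index -> avg power draw, as % (reference not specified)


def day_is_idle(d: DailyInstanceStats) -> bool:
    # Instance-level thresholds: CPU and network.
    instance_idle = (d.cpu_max_pct < 5
                     and d.network_in_avg_mb < 30
                     and d.network_out_avg_mb < 30)
    # Per-index thresholds: every GPU must pass all three.
    gpus_idle = bool(d.gpu_util_avg_pct) and all(
        d.gpu_util_avg_pct[i] < 10
        and d.gpu_mem_util_avg_pct[i] < 20
        and d.gpu_power_avg_pct[i] < 15
        for i in d.gpu_util_avg_pct
    )
    return instance_idle and gpus_idle


def is_idle_gpu_instance(days: list) -> bool:
    """Flag the instance only if every day of the lookback window is idle."""
    if len(days) < LOOKBACK_DAYS:  # assumption: too little data means no flag
        return False
    return all(day_is_idle(d) for d in days[-LOOKBACK_DAYS:])
```

A single active GPU on a single day makes `day_is_idle` return `False` for that day, so the instance is not flagged, matching the Logic rule.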

<img src="/files/y0v9jFd65010Tp8WFeDJ" alt="" height="242" width="671">

***

### GPU Rightsizing Scans

#### Goal

Identify GPU-backed EC2 instances where the GPU is idle but the workload is actively running. The recommendation is to migrate to an equivalent non-GPU instance, eliminating the GPU premium without disrupting the workload.

{% hint style="info" %}
**Notes:**

* Applies exclusively to P and G EC2 instance families.
* Potential savings reflect the cost delta between the current instance and the recommended instance — not the full instance cost.
  {% endhint %}

#### How the Target Family Is Determined

<table data-header-hidden><thead><tr><th>Workload Signal</th><th width="179.5078125">Recommended Family</th><th>Rationale</th></tr></thead><tbody><tr><td>High CPU, low memory</td><td>C family</td><td>Compute-optimized — high CPU throughput without GPU overhead</td></tr><tr><td>High CPU, high memory</td><td>R family</td><td>Memory-optimized — suited for in-memory analytics and large dataset processing</td></tr><tr><td>Moderate CPU, moderate memory</td><td>M family</td><td>General purpose — balanced workload with no GPU requirement</td></tr></tbody></table>

#### Scan Thresholds

CPU and memory signals determine which scan fires and which target family is recommended.

<table data-header-hidden><thead><tr><th>Scan Name</th><th>Thresholds</th><th width="152.4609375">Recommended Family</th><th>Recommendation</th></tr></thead><tbody><tr><td>EC2 - GPU Rightsize to CPU Instance</td><td><p>GPU Utilization &#x3C; 10%</p><p>AND</p><p>Max CPU Utilization > 50%</p><p>AND</p><p>Memory Used &#x3C; 40%</p></td><td>C family (c6i instances)</td><td><p>Migrate to a compute-optimized C-family instance.</p><p>The workload is CPU-bound with low memory use and no GPU activity.</p></td></tr><tr><td>EC2 - GPU Rightsize to Memory Instance</td><td><p>GPU Utilization &#x3C; 10%</p><p>AND</p><p>Max CPU Utilization > 50%</p><p>AND</p><p>Memory Used > 60%</p></td><td>R family (r6i instances)</td><td><p>Migrate to a memory-optimized R-family instance.</p><p>The workload needs CPU and high memory but no GPU.</p></td></tr><tr><td>EC2 - GPU Rightsize to General Purpose Instance</td><td><p>GPU Utilization &#x3C; 10%</p><p>AND</p><p>Max CPU Utilization 20–50%</p><p>AND</p><p>Memory Used 40–60%</p></td><td>M family (m6i instances)</td><td><p>Migrate to a general-purpose M-family instance.</p><p>The workload has balanced CPU and memory use and no GPU activity.</p></td></tr></tbody></table>
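
The threshold table can be read as a simple decision function. This is a sketch under stated assumptions, not Finout's implementation: `gpu_util_pct` is one aggregated GPU utilization figure for the instance, the 20–50% and 40–60% ranges are treated as inclusive (the table does not say), and the P/G check uses the first letter of the instance type. Inputs that fall between the ranges (for example, CPU above 50% with memory at 50%) match no scan and return `None`.

```python
GPU_IDLE_PCT = 10.0


def rightsizing_scan(gpu_util_pct, max_cpu_pct, memory_used_pct, instance_type):
    """Return (scan name, target family) or None when no rightsizing scan applies."""
    family = instance_type.split(".", 1)[0]  # e.g. "g4dn" from "g4dn.xlarge"
    # Rightsizing applies only to P and G instance families.
    if not family or family[0] not in ("p", "g"):
        return None
    if gpu_util_pct >= GPU_IDLE_PCT:
        return None  # the GPU is in use, so keep the GPU instance
    if max_cpu_pct > 50 and memory_used_pct < 40:
        return ("EC2 - GPU Rightsize to CPU Instance", "c6i")
    if max_cpu_pct > 50 and memory_used_pct > 60:
        return ("EC2 - GPU Rightsize to Memory Instance", "r6i")
    if 20 <= max_cpu_pct <= 50 and 40 <= memory_used_pct <= 60:
        return ("EC2 - GPU Rightsize to General Purpose Instance", "m6i")
    return None
```

For example, `rightsizing_scan(3, 80, 25, "g4dn.xlarge")` returns `("EC2 - GPU Rightsize to CPU Instance", "c6i")`.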

<img src="/files/hDRNgLhHNYo0CHMvb1x2" alt="" height="265" width="624">

<img src="/files/xGbEjI9WPakCDmWdvYvj" alt="" height="308" width="526">

