Data Explorer

Data Explorer Feature

The Data Explorer feature is an insight-driven tool for multi-dimensional report generation, designed to give users maximum flexibility. By creating reports that aggregate data across various fields, users can tailor their reports to the exact cost measurements and dimensions they need.

Data Explorer's explorative capabilities allow complex data to be aggregated by selected cost measurements and dimensions on a daily, weekly, or monthly basis. This level of granularity gives users deeper insight and a more accurate understanding of their costs, empowering them to make more informed decisions.

Use cases

The Data Explorer feature can be utilized in various scenarios, including but not limited to:

  • Assess Resource IDs, allowing for an in-depth look at various aspects such as usage types and product families, enhanced by the ability to filter results according to specific services.

  • Track and analyze the cost-related aspects of sub-services, offering insights into the utilization of resources like S3 storage and NAT gateway, measured in hours and gigabytes.

  • Identify untagged data, providing essential information to guide the tagging process, which is crucial for improved data management and categorization within an organization.

Measurements vs. dimensions

Measurements: Numerical values or quantitative data that represent a particular aspect of the analyzed data, such as cost and usage values.

Dimensions: Categories or classifications that organize and group related measurements, such as service type, region, or tags. Dimensions provide context for the measurements, enable deeper analysis, and determine the level of detail in the view.

Creating a New Data Explorer

  1. Navigate to Data Explorer in the navigation bar on the left side of the console.

  2. Click New Data Explorer.
    The Create new explorer side tab appears.

  3. Name: Provide a name for your report.

  4. Description (Optional): Include a description for your report.

  5. Time Frame: Define the relevant dates for your report data.

  6. Filters (Optional): Filter the new Data Explorer using the MegaBill components, such as services, cost centers, or Virtual Tags.

  7. Time Column (Optional): Choose a time column from day, week, or month.

  8. Measurements (Optional): Define measurements:

    • Specify aggregation activities for the report, choosing from Sum, Average, Minimum, or Maximum.

    • Select the cost type. See Cost Type definitions.

    • To add multiple measurements, click ‘+’.

  9. Dimensions (Optional): Add dimensions to a new Data Explorer by using the MegaBill components, such as services, cost centers, or Virtual Tags.

  10. Dimensional Analysis (Optional): Count the number of unique values within a specific dimension or count the total number of entries or occurrences within a dimension, including duplicates. For a full explanation, see the dimension analysis section.

  11. Predefined queries: Select a prebuilt, standardized query for quick access to common data sets or insights without crafting a complex query from scratch. For a full explanation, see the predefined queries section.

  12. Columns management (Optional): Reorder or rename selected fields as desired. Drag measurements up or down by clicking on the dots beside each measurement.

  13. Order by: Set the order of the report based on selected measurements.

  14. Click Save to generate the report.
    Once saved, the report will be displayed with all the chosen data.


Dimension Analysis

In Dimension Analysis, two key concepts are used to gain insights into your data:

  • Count Distinct Dimensions: Counts the number of unique values within a specific dimension.
    Use Case #1: Analyzing user tags - Counting distinct dimensions shows how many unique tags exist across all resources. This helps you understand the diversity and categorization within that dimension.
    Use Case #2: Determining the distinct number of resources for each service per day - for example, identify the values of each user tag, the number of instance types for EC2/RDS, or the number of regions per account.

  • Count Dimensions: Counts the total number of entries or occurrences within a dimension, including duplicates.
    Note: This feature is supported for every dimension in Data Explorer and is available for all accounts.

    To add dimensional analysis:

    1. Select one of the following:

      • Count Distinct Dimensions - Count only the values that are unique, based on the dimension selected above.

      • Count Dimensions - Count all the selected values, including duplicates.
        Note: If you selected the same dimension as in the report, you will only receive one return per value.

    2. Click Select Dimension.
      The Dimension window appears.

    3. Mark the dimensions that you want to be used for the analysis and click Apply Group By.

    4. Click to add another dimensional analysis.
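To make the distinction between the two counting modes concrete, here is a minimal Python sketch. It is illustrative only: the list of team tags is hypothetical, not Finout data.

```python
# Hypothetical team tags attached to six resources (duplicates included).
team_tags = ["team-a", "team-b", "team-a", "team-c", "team-a", "team-b"]

# Count Distinct Dimensions: unique values only.
count_distinct = len(set(team_tags))

# Count Dimensions: every occurrence, duplicates included.
count_total = len(team_tags)

print(count_distinct)  # 3 unique teams
print(count_total)     # 6 tagged entries
```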

Finout Predefined Queries

Predefined queries are prebuilt, standardized queries designed to streamline data retrieval and analysis. Created to address common data needs or reporting requirements, they allow for quick access to specific data sets or insights without the need to craft complex queries from scratch.

Resource Normalized Runtime


The Resource Normalized Runtime metric measures the total operational time of resources within a specified timeframe, normalized against the total hours in that period. It is calculated by dividing the total running hours of all resources by the total number of hours in the timeframe.

Note: When grouping by date, each timeframe is set to 24 hours.

Formula:

  • Resource Normalized Runtime = Total Running Hours / Total Hours in Timeframe

  • Numerator (Total Running Hours): The cumulative sum of hours all queried resources were active during the specified timeframe.

  • Denominator (Total Hours in Timeframe): The total number of hours within the specified timeframe.
    Note: When grouping by date, this is set to 24 hours.
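As a sketch, the formula above can be expressed in Python. The per-resource hours below are hypothetical, and the function name is ours, not part of the product:

```python
def resource_normalized_runtime(running_hours, timeframe_hours=24):
    """Sum per-resource running hours and normalize by the timeframe."""
    return sum(running_hours) / timeframe_hours

# Five hypothetical resources on one day: 20+18+15+14+12 = 79 running hours.
value = resource_normalized_runtime([20, 18, 15, 14, 12])
print(round(value, 2))  # 3.29
```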

Example Calculations:

Scenario 1: 5 Resources Over 3 Days

| Date | Total Running Hours | Total Hours in Timeframe | Resource Normalized Runtime |
|------|---------------------|--------------------------|-----------------------------|
| 2024-07-28 | 79 | 24 | 3.29 |
| 2024-07-29 | 85 | 24 | 3.54 |
| 2024-07-30 | 74 | 24 | 3.08 |

2024-07-28:

Total Running Hours: 79 hours (cumulative for all 5 resources)

Total Hours in Timeframe: 24 hours

Resource Normalized Runtime: 79 / 24 = 3.29

Scenario 2: 3 Resources Over 2 Days with Resource IDs

| Date | Resource ID | Total Running Hours | Total Hours in Timeframe | Resource Normalized Runtime |
|------|-------------|---------------------|--------------------------|-----------------------------|
| 2024-07-28 | i-abc123 | 30 | 24 | 1.25 |
| 2024-07-28 | i-def456 | 40 | 24 | 1.67 |
| 2024-07-29 | i-ghi789 | 35 | 24 | 1.46 |

i-abc123:

Total Running Hours: 30 hours

Total Hours in Timeframe: 24 hours

Resource Normalized Runtime: 30 / 24 = 1.25

Scenario 3: 4 Resources Over 1 Day

| Date | Total Running Hours | Total Hours in Timeframe | Resource Normalized Runtime |
|------|---------------------|--------------------------|-----------------------------|
| 2024-08-01 | 24 | 24 | 1.00 |

2024-08-01:

Total Running Hours: 24 hours (cumulative for all resources running the entire day)

Total Hours in Timeframe: 24 hours

Resource Normalized Runtime: 24 / 24 = 1.00

Scenario 4: Aggregated Resources Over Multiple Days Without Date Column

| Resource ID | Total Running Hours | Total Hours in Timeframe | Resource Normalized Runtime |
|-------------|---------------------|--------------------------|-----------------------------|
| i-abc123 | 120 | 240 | 0.50 |
| i-def456 | 200 | 240 | 0.83 |
| i-ghi789 | 160 | 240 | 0.67 |
| i-jkl012 | 80 | 240 | 0.33 |

i-abc123:

Total Running Hours: 120 hours

Total Hours in Timeframe: 240 hours (10 days x 24 hours/day)

Resource Normalized Runtime: 120 / 240 = 0.50

Scenario 5: Aggregated Total Running Hours Without Date or Resource ID

| Total Running Hours | Total Hours in Timeframe | Resource Normalized Runtime |
|---------------------|--------------------------|-----------------------------|
| 2100 | 600 | 3.50 |
| 2400 | 600 | 4.00 |
| 3000 | 600 | 5.00 |
| 1800 | 600 | 3.00 |

Entry 1:

Total Running Hours: 2100 hours

Total Hours in Timeframe: 600 hours

Resource Normalized Runtime: 2100 / 600 = 3.50

Resource Normalized Runtime vCPU

Resource Normalized Runtime vCPU: This metric calculates the total normalized vCPU runtime used by resources over a specified timeframe. It is determined by multiplying the total running hours of each resource by its vCPU count and dividing by the total number of hours in the timeframe.
Note: When grouping by date, each timeframe is set to 24 hours.

Formula:

  • Resource Normalized Runtime vCPU = Σ (Running Hours x vCPU) / Total Hours in Timeframe

  • Numerator: The cumulative sum of the product of running hours and vCPU count for each resource.

  • Denominator (Total Hours in Timeframe): The total number of hours within the specified timeframe. When grouping by date, this is set to 24 hours.
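The vCPU-weighted formula above can be sketched in Python as follows. The (running hours, vCPU) pairs are hypothetical, and the function name is ours, not part of the product:

```python
def normalized_runtime_vcpu(resources, timeframe_hours=24):
    """resources: iterable of (running_hours, vcpu) pairs.

    Returns the sum of hours x vCPU, normalized by the timeframe.
    """
    return sum(hours * vcpu for hours, vcpu in resources) / timeframe_hours

# One hypothetical resource with 5 vCPUs running a full 24-hour day:
print(round(normalized_runtime_vcpu([(24, 5)]), 2))  # (5 x 24) / 24 = 5.0
```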

Example Calculations:

Scenario 1: Multiple Resources with vCPUs Grouped by Day

| Date | Resource ID | vCPU | Running Hours | Normalized Runtime vCPU |
|------|-------------|------|---------------|-------------------------|
| 2024-07-28 | i-12345678 | 5 | 24 | 5.00 |
| 2024-07-28 | i-12345677 | 6 | 12 | 3.00 |
| 2024-07-28 | i-12345676 | 3 | 16 | 2.00 |
| 2024-07-28 | i-12345675 | 2 | 17 | 1.42 |
| 2024-07-28 | i-12345674 | 16 | 20 | 13.33 |

i-12345678:

Calculation: (5 x 24) / 24 = 5.00

The resource consumed 5.00 vCPU days.

Scenario 2: Mixed vCPU Resources Over 2 Days

| Date | Resource ID | vCPU | Running Hours | Normalized Runtime vCPU |
|------|-------------|------|---------------|-------------------------|
| 2024-07-28 | i-98765432 | 8 | 18 | 6.00 |
| 2024-07-28 | i-87654321 | 4 | 10 | 1.67 |
| 2024-07-29 | i-76543210 | 2 | 5 | 0.42 |

i-98765432:

Calculation: (8 x 18) / 24 = 6.00

The resource consumed 6.00 vCPU days.

Scenario 3: Aggregated vCPU Resources Over Multiple Days

| Date | Resource ID | vCPU | Running Hours | Normalized Runtime vCPU |
|------|-------------|------|---------------|-------------------------|
| 2024-07-30 | i-55555555 | 10 | 40 | 16.67 |
| 2024-07-30 | i-44444444 | 6 | 20 | 5.00 |
| 2024-07-31 | i-33333333 | 3 | 15 | 1.88 |

i-55555555:

Calculation: (10 x 40) / 24 = 16.67

The resource consumed 16.67 vCPU days.

Scenario 4: vCPU Resources Without Date or Resource ID

| vCPU | Running Hours | Total Hours in Timeframe | Normalized Runtime vCPU |
|------|---------------|--------------------------|-------------------------|
| 8 | 200 | 600 | 2.67 |
| 4 | 150 | 600 | 1.00 |
| 6 | 240 | 600 | 2.40 |

Entry 1:

Calculation: (8 x 200) / 600 = 2.67

The resources consumed 2.67 vCPU days in the timeframe.

EBS Running Hours

This metric calculates the total running hours of EBS resources within a specified timeframe and tracks the cumulative hours that they have been active.

Formula:

  • EBS Running Hours = Sum of the running hours of all EBS resources in the selected timeframe.

  • Total Running Hours: The cumulative sum of hours that individual EBS resources were active within the selected timeframe.

  • Per resource: The total hours for each resource are summed based on the timeframe and aggregation type.

  • For grouped resources: When multiple resources are grouped, their running hours are summed to give the total running hours.
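Since this metric is a plain sum of running hours, a per-day grouping can be sketched in a few lines of Python. The records below are hypothetical:

```python
from collections import defaultdict

def ebs_running_hours_by_day(records):
    """records: iterable of (date, resource_id, hours); returns per-day sums."""
    totals = defaultdict(int)
    for date, _resource_id, hours in records:
        totals[date] += hours
    return dict(totals)

# Hypothetical single volume over two days (22 + 18 = 40 hours in total).
records = [("2024-08-01", "vol-1", 22), ("2024-08-02", "vol-1", 18)]
print(ebs_running_hours_by_day(records))  # {'2024-08-01': 22, '2024-08-02': 18}
```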

Example Calculations

Scenario 1: Single Resource Over 2 Days, Group by Day

| Date | Total Running Hours |
|------|---------------------|
| 2024-08-01 | 22 |
| 2024-08-02 | 18 |

For 2 days:

  • Total Running Hours for August 1: 22 hours

  • Total Running Hours for August 2: 18 hours

The total running hours over these two days for a single resource is 40 hours.

Scenario 2: 3 Resources Over 4 Days, Group by Day

| Date | Total Running Hours |
|------|---------------------|
| 2024-09-05 | 65 |
| 2024-09-06 | 72 |
| 2024-09-07 | 80 |
| 2024-09-08 | 78 |

For 4 days with 3 resources:

  • Total Running Hours for September 5 for all 3 resources: 65 hours

  • Total Running Hours for September 6 for all 3 resources: 72 hours

  • Total Running Hours for September 7 for all 3 resources: 80 hours

  • Total Running Hours for September 8 for all 3 resources: 78 hours

The total running hours over these four days is 295.

Scenario 3: 5 Resources Over 1 Month (August)

| Date Range | Total Running Hours |
|------------|---------------------|
| 2024-08-01 to 2024-08-31 | 3,000 |

The total running hours for August for all 5 resources is 3,000.

S3 Number of Objects

The S3 Number of Objects metric calculates the average number of objects stored in your S3 buckets from the last 24 hours of a selected timeframe. This metric is derived from CloudWatch data and helps monitor object count trends in your S3 buckets, providing insights into storage usage.

Note: The resource ID will be automatically added to your Data Explorer report when selecting this predefined query.

Scenario 1: Number of Objects in S3 Buckets, Broken Down by Virtual Tag Teams

You want to create a report of the number of S3 objects in a bucket over April.

Note: The S3 Number of Objects metric calculates the average number of objects stored in your S3 buckets from the last 24 hours of the selected timeframe. For example, in the table below, Bucket 001, Team A, had an average of 1,500,000 objects on April 30th.

| Resource ID | Team | Number of S3 Objects |
|-------------|------|----------------------|
| Bucket 001 | Team A | 1,500,000 |
| Bucket 001 | Team B | 800,000 |
| Bucket 002 | Team A | 1,100,000 |
| Bucket 003 | Team C | 500,000 |
| Bucket 003 | Team B | 900,000 |
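The averaging behavior described above (taking the last 24 hours of the selected timeframe) can be sketched as follows. The hourly sample data is hypothetical; in the product, the samples come from CloudWatch:

```python
def s3_number_of_objects(hourly_counts):
    """Average the object-count samples from the last 24 hours of a timeframe.

    hourly_counts: per-hour object counts, ordered oldest to newest.
    """
    last_day = hourly_counts[-24:]        # only the final 24 hours matter
    return sum(last_day) / len(last_day)  # average object count

# 48 hourly samples; only the final 24 (all 1,500,000) determine the result.
samples = [1_400_000] * 24 + [1_500_000] * 24
print(int(s3_number_of_objects(samples)))  # 1500000
```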

Scenario 2: Number of Objects in S3 Buckets in Q1 2024, Broken Down by Month

You want to create a report for Q1 with object counts in monthly intervals to monitor the growth of data in your S3 buckets over time.

| Resource ID | Month | Number of S3 Objects |
|-------------|-------|----------------------|
| Bucket 001 | January | 1,200,000 |
| Bucket 002 | January | 900,000 |
| Bucket 001 | February | 1,500,000 |
| Bucket 003 | February | 600,000 |
| Bucket 002 | March | 1,000,000 |
| Bucket 003 | March | 800,000 |

Scenario 3: Number of Objects in S3 Buckets, Broken Down by Virtual Tag Teams and by Week

You want to create a report for the previous month with weekly intervals, showing the number of objects in each S3 bucket, broken down by virtual tag teams.

| Resource ID | Team | Week | Number of S3 Objects |
|-------------|------|------|----------------------|
| Bucket 10 | Team A | Week 1 | 11,000 |
| Bucket 10 | Team A | Week 2 | 8,800 |
| Bucket 10 | Team B | Week 1 | 3,000 |
| Bucket 20 | Team A | Week 3 | 21,800 |
| Bucket 20 | Team B | Week 2 | 16,700 |

FAQ:

Question: What is the difference between the object count from CloudWatch and the object count from the AWS CUR?

Answer: The CUR alone does not provide object-level or detailed bucket insights without integrating other AWS tools such as S3 Storage Lens (with advanced metrics) or S3 Inventory. Without these, the CUR will only show aggregated S3 storage usage and costs.

Question: How can I get object count data per S3 bucket in the AWS CUR?

Answer: To obtain object count data per S3 bucket in the CUR, you need additional services:

- S3 Storage Lens (with advanced metrics): Required for detailed usage insights.

- S3 Inventory: Provides detailed reports at the object level but does not integrate directly into the CUR.

Working with Data Explorer

  • Edit, Duplicate, and Delete a Data Explorer.

  • Customize column display.
