Data Explorer
Data Explorer is an analysis tool for building multi-dimensional cost reports. You can aggregate data across many fields and tailor each report to the exact cost measurements and dimensions you need.
What sets Data Explorer apart is its exploratory approach: you can aggregate complex data by selected cost measurements and dimensions at a daily, weekly, or monthly granularity. This granularity gives you deeper insight into your costs and a more accurate understanding of them, so you can make better-informed decisions.
Use cases
Data Explorer can be used in many scenarios, including:
- Analyzing resource IDs in depth, including their usage types and product families, with the option to filter results to specific services.
- Tracking and analyzing the costs of sub-services, such as S3 storage and NAT gateways, measured in hours and gigabytes.
- Identifying untagged data to guide the tagging process, which is crucial for managing and categorizing data across an organization.
Measurements vs. dimensions
Measurements: Numerical (quantitative) values that represent a particular aspect of the analyzed data, such as cost (of a selected cost type) and usage.
Dimensions: Categories or classifications that organize and group related measurements, such as service type, region, or any tag. Dimensions give the measurements context, enable deeper analysis, and determine the level of detail in the view.
To create a new Data Explorer:
1. Navigate to Data Explorer in the navigation bar on the left side of the console.
2. Click New Data Explorer. The Create new explorer side tab appears.
3. Configure the report:
- Name: Provide a name for your report.
- Description (Optional): Add a description for your report.
- Time Frame: Define the date range for your report data.
- Filters (Optional): Filter the new Data Explorer by MegaBill components, such as services, cost centers, or Virtual Tags.
- Time Column (Optional): Choose a time granularity: day, week, or month.
- Measurements (Optional): Define one or more measurements:
  - Select an aggregation function: Sum, Average, Minimum, or Maximum.
  - Select the cost type. See Cost Type definitions.
  - To add more measurements, click '+'.
- Dimensions (Optional): Group the report by MegaBill components, such as services, cost centers, or Virtual Tags.
- Dimensional Analysis (Optional): Count the number of unique values within a specific dimension, or count the total number of entries or occurrences within a dimension, including duplicates. For a full explanation, see the dimension analysis section.
- Predefined queries: Add a prebuilt metric, such as Resource Normalized Runtime, which measures how long resources were running within the specified timeframe, normalized by the total hours in that timeframe. For a full explanation, see the predefined queries section.
- Columns management (Optional): Reorder or rename the selected fields. To reorder, drag a field up or down using the dots beside it.
- Order by: Sort the report by the selected measurements.
4. Click Save to generate the report. Once saved, the report displays all the chosen data.
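To make the effect of the Time Column, Dimensions, and Measurements settings concrete, here is a minimal Python sketch of the kind of aggregation a report performs. It is an illustration only, not how Data Explorer is implemented: the sample rows, the `explore` function, and the choice of Monday-starting weeks are all hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical cost line items: (usage date, service dimension, cost in USD).
ROWS = [
    (date(2024, 7, 1), "EC2", 12.0),
    (date(2024, 7, 2), "EC2", 8.0),
    (date(2024, 7, 2), "S3", 3.5),
    (date(2024, 7, 9), "EC2", 10.0),
]

# The four aggregation functions offered for a measurement.
AGGREGATIONS = {"Sum": sum, "Average": mean, "Minimum": min, "Maximum": max}


def week_start(d: date) -> date:
    """Bucket a date into its week, represented by that week's Monday (assumption)."""
    return date.fromordinal(d.toordinal() - d.weekday())


def explore(rows, aggregation="Sum"):
    """Group cost by (week, service) and apply one aggregation to each group."""
    groups = defaultdict(list)
    for day, service, cost in rows:
        groups[(week_start(day), service)].append(cost)
    agg = AGGREGATIONS[aggregation]
    return {key: agg(costs) for key, costs in sorted(groups.items())}
```

With a weekly time column and the service dimension, the two EC2 rows from July 1 and July 2 fall into the same weekly group, so `explore(ROWS, "Sum")` reports 20.0 for that group, while `explore(ROWS, "Average")` reports 10.0.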
Dimension Analysis uses two key concepts to provide insights into your data:
- Count Distinct Dimensions: Counts the number of unique values within a specific dimension.
  - Use case 1: Analyzing user tags. Counting distinct values shows how many unique tags exist across all resources, which helps you understand the diversity and categorization within that dimension.
  - Use case 2: Determining the distinct number of resources for each service per day. In the same way, you can identify the distinct values of each user tag, count the instance types used for EC2/RDS, or count the regions per account.
- Count Dimensions: Counts the total number of entries or occurrences within a dimension, including duplicates.
Note: This feature is supported for every dimension in Data Explorer and is available for all accounts.
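The difference between the two counts can be sketched in Python. The rows and the `dimension_analysis` helper below are hypothetical; the sketch groups by date and service (as in use case 2) and counts the resource ID dimension both ways.

```python
from collections import defaultdict

# Hypothetical rows: (date, service, resource ID). The same resource can appear
# more than once per day, for example once per usage type.
ROWS = [
    ("2024-07-28", "EC2", "i-abc123"),
    ("2024-07-28", "EC2", "i-abc123"),
    ("2024-07-28", "EC2", "i-def456"),
    ("2024-07-28", "S3", "bucket-001"),
    ("2024-07-29", "EC2", "i-abc123"),
]


def dimension_analysis(rows):
    """Return {(date, service): (count, count_distinct)} for the resource ID dimension."""
    ids_per_group = defaultdict(list)
    for day, service, resource_id in rows:
        ids_per_group[(day, service)].append(resource_id)
    # Count keeps duplicates; count distinct collapses them with a set.
    return {key: (len(ids), len(set(ids))) for key, ids in ids_per_group.items()}
```

For EC2 on 2024-07-28, the count is 3 and the distinct count is 2, because i-abc123 appears twice.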
To add dimensional analysis:
1. Select one of the following:
  - Count Distinct Dimensions: Counts the values that are unique from one another, based on the selection above.
  - Count: Counts all the selected values.
Note: If you select the same dimension that is already used in the report, you receive only one result per value.
2. Click Select Dimension. The Dimension window appears.
3. Mark the dimensions to use for the analysis and click Apply Group By.
Predefined queries are prebuilt, standardized queries designed to streamline data retrieval and analysis. Created to address common data needs or reporting requirements, they allow for quick access to specific data sets or insights without the need to craft complex queries from scratch.
The Resource Normalized Runtime metric measures the total operational time of resources within a specified timeframe, normalized against the total hours in that period. It is calculated by dividing the total running hours of all resources by the total number of hours in the timeframe.
Note: When grouping by date, each timeframe is set to 24 hours.
Formula:
Resource Normalized Runtime = Total Running Hours / Total Hours in Timeframe
Numerator (Total Running Hours): The cumulative sum of hours all queried resources were active during the specified timeframe.
Denominator (Total Hours in Timeframe): The total number of hours within the specified timeframe. When grouping by date, this is set to 24 hours.
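A minimal Python sketch of the formula, assuming each input record is a hypothetical `(date, resource_id, running_hours)` tuple and that rows are grouped by date only, so each denominator is 24 hours. The function names are illustrative, not part of the product.

```python
from collections import defaultdict

HOURS_PER_DAY = 24  # denominator used when grouping by date


def resource_normalized_runtime(total_running_hours: float, timeframe_hours: float) -> float:
    """Total running hours of the queried resources / total hours in the timeframe."""
    if timeframe_hours <= 0:
        raise ValueError("timeframe_hours must be positive")
    return total_running_hours / timeframe_hours


def rnr_by_date(records):
    """Sum running hours per date across resources, then normalize by 24 hours.

    records: iterable of (date, resource_id, running_hours) tuples.
    Results are rounded to two decimals, matching the example tables.
    """
    totals = defaultdict(float)
    for day, _resource_id, hours in records:
        totals[day] += hours
    return {
        day: round(resource_normalized_runtime(hours, HOURS_PER_DAY), 2)
        for day, hours in sorted(totals.items())
    }
```

If the Scenario 2 records were grouped by date alone, 2024-07-28 would show (30 + 40) / 24, about 2.92, because the hours of both resources are summed before dividing.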
Example Calculations:
Scenario 1: 5 Resources Over 3 Days
Date | Total Running Hours | Total Hours in Timeframe | Resource Normalized Runtime
2024-07-28 | 79 | 24 | 3.29
2024-07-29 | 85 | 24 | 3.54
2024-07-30 | 74 | 24 | 3.08
2024-07-28:
Total Running Hours: 79 hours (cumulative for all 5 resources)
Total Hours in Timeframe: 24 hours
Resource Normalized Runtime: 79 / 24 = 3.29
Scenario 2: 3 Resources Over 2 Days with Resource IDs
Date | Resource ID | Total Running Hours | Total Hours in Timeframe | Resource Normalized Runtime
2024-07-28 | i-abc123 | 30 | 24 | 1.25
2024-07-28 | i-def456 | 40 | 24 | 1.67
2024-07-29 | i-ghi789 | 35 | 24 | 1.46
i-abc123:
Total Running Hours: 30 hours
Total Hours in Timeframe: 24 hours
Resource Normalized Runtime: 30 / 24 = 1.25
Scenario 3: 4 Resources Over 1 Day
Date | Total Running Hours | Total Hours in Timeframe | Resource Normalized Runtime
2024-08-01 | 24 | 24 | 1.00
2024-08-01:
Total Running Hours: 24 hours (cumulative across all 4 resources, equivalent to one resource running for the entire day)
Total Hours in Timeframe: 24 hours
Resource Normalized Runtime: 24 / 24 = 1.00
Scenario 4: Aggregated Resources Over Multiple Days Without Date Column
Resource ID | Total Running Hours | Total Hours in Timeframe | Resource Normalized Runtime
i-abc123 | 120 | 240 | 0.50
i-def456 | 200 | 240 | 0.83
i-ghi789 | 160 | 240 | 0.67
i-jkl012 | 80 | 240 | 0.33
i-abc123:
Total Running Hours: 120 hours
Total Hours in Timeframe: 240 hours (10 days x 24 hours/day)
Resource Normalized Runtime: 120 / 240 = 0.50
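Without a date column, the denominator is the length of the whole timeframe. The sketch below assumes an inclusive date range with 24 hours per day; the `timeframe_hours` and `rnr_per_resource` helpers are hypothetical, and the specific 10-day range is an assumption because the scenario does not state its dates.

```python
from datetime import date


def timeframe_hours(start: date, end: date) -> int:
    """Hours in an inclusive date range, assuming 24 hours per day."""
    if end < start:
        raise ValueError("end must not be before start")
    return ((end - start).days + 1) * 24


def rnr_per_resource(running_hours: dict, start: date, end: date) -> dict:
    """Normalize each resource's total running hours by the whole timeframe."""
    hours = timeframe_hours(start, end)
    return {rid: round(h / hours, 2) for rid, h in running_hours.items()}


# Running hours from Scenario 4; the 10-day date range is hypothetical.
RUNNING_HOURS = {"i-abc123": 120, "i-def456": 200, "i-ghi789": 160, "i-jkl012": 80}
result = rnr_per_resource(RUNNING_HOURS, date(2024, 7, 1), date(2024, 7, 10))
```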
Scenario 5: Aggregated Total Running Hours Without Date or Resource ID
Total Running Hours | Total Hours in Timeframe | Resource Normalized Runtime
2100 | 600 | 3.50
2400 | 600 | 4.00
3000 | 600 | 5.00
1800 | 600 | 3.00
Entry 1:
Total Running Hours: 2100 hours
Total Hours in Timeframe: 600 hours
Resource Normalized Runtime: 2100 / 600 = 3.50
Resource Normalized Runtime vCPU: This metric calculates the total normalized vCPU runtime used by resources over a specified timeframe. It is determined by multiplying the total running hours of each resource by its vCPU count and dividing by the total number of hours in the timeframe.
Note: When grouping by date, each time frame is set to 24 hours.
Formula:
Resource Normalized Runtime vCPU = Σ (Running Hours x vCPU) / Total Hours in Timeframe
Numerator: The cumulative sum of the product of running hours and vCPU count for each resource.
Denominator (Total Hours in Timeframe): The total number of hours within the specified timeframe. When grouping by date, this is set to 24 hours.
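A minimal Python sketch of this formula, assuming each resource is given as a hypothetical `(vcpu_count, running_hours)` pair. The function name is illustrative; the default denominator of 24 hours reflects grouping by date.

```python
def normalized_runtime_vcpu(resources, timeframe_hours: float = 24) -> float:
    """Sum of (running hours x vCPU) across resources, divided by hours in the timeframe.

    resources: iterable of (vcpu_count, running_hours) pairs.
    timeframe_hours: 24 when grouping by date, otherwise the full timeframe length.
    """
    if timeframe_hours <= 0:
        raise ValueError("timeframe_hours must be positive")
    vcpu_hours = sum(vcpu * hours for vcpu, hours in resources)
    return vcpu_hours / timeframe_hours
```

Passing all five resources from Scenario 1 at once, `normalized_runtime_vcpu([(5, 24), (6, 12), (3, 16), (2, 17), (16, 20)])` gives 594 / 24 = 24.75, the sum of the per-resource rows.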
Example Calculations:
Scenario 1: Multiple Resources with vCPUs Grouped by Day
Date | Resource ID | vCPU | Running Hours | Normalized Runtime vCPU
2024-07-28 | i-12345678 | 5 | 24 | 5.00
2024-07-28 | i-12345677 | 6 | 12 | 3.00
2024-07-28 | i-12345676 | 3 | 16 | 2.00
2024-07-28 | i-12345675 | 2 | 17 | 1.42
2024-07-28 | i-12345674 | 16 | 20 | 13.33
i-12345678:
Calculation: (5 x 24) / 24 = 5.00
The resource consumed 5.00 vCPU days.
Scenario 2: Mixed vCPU Resources Over 2 Days
Date | Resource ID | vCPU | Running Hours | Normalized Runtime vCPU
2024-07-28 | i-98765432 | 8 | 18 | 6.00
2024-07-28 | i-87654321 | 4 | 10 | 1.67
2024-07-29 | i-76543210 | 2 | 5 | 0.42
i-98765432:
Calculation: (8 x 18) / 24 = 6.00
The resource consumed 6.00 vCPU days.
Scenario 3: Aggregated vCPU Resources Over Multiple Days
Date | Resource ID | vCPU | Running Hours | Normalized Runtime vCPU
2024-07-30 | i-55555555 | 10 | 40 | 16.67
2024-07-30 | i-44444444 | 6 | 20 | 5.00
2024-07-31 | i-33333333 | 3 | 15 | 1.88
i-55555555:
Calculation: (10 x 40) / 24 = 16.67
The resource consumed 16.67 vCPU days.
Scenario 4: vCPU Resources Without Date or Resource ID
vCPU | Running Hours | Total Hours in Timeframe | Normalized Runtime vCPU
8 | 200 | 600 | 2.67
4 | 150 | 600 | 1.00
6 | 240 | 600 | 2.40
Entry 1:
Calculation: (8 x 200) / 600 = 2.67
On average, the resources used the equivalent of 2.67 vCPUs running continuously throughout the 600-hour timeframe.
The EBS Running Hours metric calculates the total running hours of EBS resources within a specified timeframe, tracking the cumulative hours they were active.
Formula:
EBS Running Hours = Sum of running hours for EBS resources in the selected timeframe
- Total Running Hours: The cumulative sum of hours that individual EBS resources were active within the selected timeframe.
- Per resource: Each resource's hours are summed according to the timeframe and aggregation type.
- Grouped resources: When multiple resources are grouped, their running hours are summed to give the total running hours.
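The summation rules above can be sketched in Python. The record layout `(date, volume_id, running_hours)`, the `group_by` options, and the volume ID in the usage note are hypothetical, chosen only to show per-date totals versus a single grand total.

```python
from collections import defaultdict


def ebs_running_hours(records, group_by=None):
    """Sum running hours of EBS volumes.

    records: iterable of (date, volume_id, running_hours) tuples.
    group_by: "date", "resource", or None for a single grand total.
    """
    totals = defaultdict(float)
    for day, volume_id, hours in records:
        if group_by == "date":
            key = day
        elif group_by == "resource":
            key = volume_id
        else:
            key = "total"
        totals[key] += hours
    return dict(totals)
```

For Scenario 1 with a single hypothetical volume `vol-example`, grouping by date returns 22 and 18 hours, and the grand total is 40 hours.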
Example Calculations:
Scenario 1: Single Resource Over 2 Days, Group by Day
Date | Total Running Hours
2024-08-01 | 22
2024-08-02 | 18
For 2 days:
Total Running Hours for August 1: 22 hours
Total Running Hours for August 2: 18 hours
The total running hours over these two days for a single resource is 40 hours.
Scenario 2: 3 Resources Over 4 Days, Group by Day
Date | Total Running Hours
2024-09-05 | 65
2024-09-06 | 72
2024-09-07 | 80
2024-09-08 | 78
For 4 days with 3 resources:
Total Running Hours for September 5 for all 3 resources: 65 hours
Total Running Hours for September 6 for all 3 resources: 72 hours
Total Running Hours for September 7 for all 3 resources: 80 hours
Total Running Hours for September 8 for all 3 resources: 78 hours
The total running hours over these four days is 295 hours.
Scenario 3: 5 Resources Over 1 Month (August)
Date Range | Total Running Hours
2024-08-01 to 2024-08-31 | 3,000
The total running hours for August for all 5 resources is 3000 hours.
The S3 Number of Objects metric calculates the average number of objects stored in your S3 buckets from the last 24 hours of a selected timeframe. This metric is derived from CloudWatch data and helps monitor object count trends in your S3 buckets, providing insights into storage usage.
Note: The resource ID will be automatically added to your Data Explorer report when selecting this predefined query.
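A minimal Python sketch of "average over the last 24 hours of the timeframe", assuming hypothetical `(timestamp, object_count)` datapoints such as values exported from CloudWatch, and treating the timeframe end as exclusive. The function name and sample values are illustrative only.

```python
from datetime import datetime, timedelta
from statistics import mean


def s3_number_of_objects(datapoints, timeframe_end: datetime) -> float:
    """Average object count over the last 24 hours of the timeframe.

    datapoints: iterable of (timestamp, object_count) pairs.
    timeframe_end: exclusive end of the selected timeframe.
    """
    window_start = timeframe_end - timedelta(hours=24)
    window = [count for ts, count in datapoints if window_start <= ts < timeframe_end]
    if not window:
        raise ValueError("no datapoints in the last 24 hours of the timeframe")
    return mean(window)
```

For an April timeframe (ending at the start of May 1), only datapoints from April 30 are averaged; earlier datapoints are ignored.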
Scenario 1: Number of Objects in S3 Buckets, Broken Down by Virtual Tag Teams
You want to create a report of the number of S3 objects in your buckets for April.
Note: Because the metric averages the last 24 hours of the selected timeframe, the values below reflect April 30. For example, Bucket 001 (Team A) had an average of 1,500,000 objects on April 30.
Resource ID | Team | Number of S3 Objects
Bucket 001 | Team A | 1,500,000
Bucket 001 | Team B | 800,000
Bucket 002 | Team A | 1,100,000
Bucket 003 | Team C | 500,000
Bucket 003 | Team B | 900,000
Scenario 2: Number of Objects in S3 Buckets in Q1 2024, Broken Down by Month
You want to create a report for Q1 with object counts in monthly intervals to monitor the growth of data in your S3 buckets over time.
Resource ID | Month | Number of S3 Objects
Bucket 001 | January | 1,200,000
Bucket 002 | January | 900,000
Bucket 001 | February | 1,500,000
Bucket 003 | February | 600,000
Bucket 002 | March | 1,000,000
Bucket 003 | March | 800,000
Scenario 3: Number of Objects in S3 Buckets, Broken Down by Virtual Tag Teams and by Week
You want to create a report for the previous month with weekly intervals, showing the number of objects in each S3 bucket, broken down by virtual tag teams.
Resource ID | Team | Week | Number of S3 Objects
Bucket 10 | Team A | Week 1 | 11,000
Bucket 10 | Team A | Week 2 | 8,800
Bucket 10 | Team B | Week 1 | 3,000
Bucket 20 | Team A | Week 3 | 21,800
Bucket 20 | Team B | Week 2 | 16,700
Question: What is the difference between the object count from CloudWatch and the object count from the AWS CUR?
Answer: The S3 Number of Objects metric is derived from CloudWatch data. The CUR alone does not provide object-level or detailed bucket insights unless you integrate other AWS tools, such as S3 Storage Lens (with advanced metrics) or S3 Inventory. Without them, the CUR shows only aggregated S3 storage usage and costs.
Question: How can I get object count data per S3 bucket in the AWS CUR?
Answer: To obtain object count data per S3 bucket in the CUR, you need additional services:
- S3 Storage Lens with advanced metrics: Required for detailed usage insights.
- S3 Inventory: Provides detailed object-level reports, but does not integrate directly into the CUR.