Cost Archiver

Overview

Finout offers a straightforward solution for managing long-term cost data storage by allowing you to archive data in an AWS S3 bucket you own. This setup consolidates your company’s cloud spending data from various cost centers into a structured and easily accessible format.

The Archiver feature writes data to S3 in a partitioned structure, enabling faster access with reduced processing overhead. Data is stored in Parquet format, which improves processing efficiency. Additionally, a manifest file is updated with each archiver run, providing visibility into data export status. This design supports a scalable system capable of handling large data volumes with minimal delays while maintaining alignment with Finout’s data delivery SLAs.

Note: The Archiver provides all fields, enrichments, costs, virtual tags, and values, except for virtual tag reallocation.

Cost Archiver Workflow:

Archiver files structure

Cost data will be updated once a day in the following structure:

Archiver/<archiver_name>/v2/year=<year>/month=<month>/day=<day>/*.snappy.parquet
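As a minimal sketch of how a consumer might locate one day of data, the helper below builds the key prefix for a given date from the layout above. The function name, the example archiver name, and the zero-padding of month and day are assumptions; check the keys in your own bucket before relying on the padding.

```python
from datetime import date


def partition_prefix(archiver_name: str, day: date, zero_pad: bool = True) -> str:
    """Build the S3 key prefix for one daily partition.

    zero_pad is an assumption: set it to False if your bucket uses
    month=3/day=7 rather than month=03/day=07.
    """
    month = f"{day.month:02d}" if zero_pad else str(day.month)
    day_of_month = f"{day.day:02d}" if zero_pad else str(day.day)
    return (
        f"Archiver/{archiver_name}/v2/"
        f"year={day.year}/month={month}/day={day_of_month}/"
    )


# "my-archiver" is a hypothetical archiver name.
print(partition_prefix("my-archiver", date(2024, 3, 7)))
```

The resulting prefix can be passed to whatever S3 listing tool you already use to enumerate the `*.snappy.parquet` files for that day.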

Cost data structure

| Field name | Type | Description |
| --- | --- | --- |
| time | Date (YYYY-MM-DDT00:00:00.000Z) | |
| service | String | The type of service the cost is related to, for example, “EC2” (in AWS) or “Compute Engine” (in GCP). |
| region | String | The region the cost is associated with. |
| cost_center_type | String | AWS / GCP / K8s / Snowflake |
| costs | dict<string, double> | All the following costs: amortized_cost, net_amortized_cost, blended_cost, unblended_cost, net_unblended_cost, uncovered_cost, reservation_cost, savings_plan_cost, spot_cost |
| metadata | dict<string, string> | All metadata relevant to this cost line. |
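To illustrate how the fields listed above fit together, here is a small sketch that totals one cost type per service. The rows are made-up sample data shaped like the cost data structure, and the `account_id` metadata key and the `total_by_service` helper are hypothetical. In practice the rows would come from the Parquet files rather than from a literal list.

```python
from collections import defaultdict

# Made-up sample rows; real rows are read from the archived Parquet files.
rows = [
    {"time": "2024-03-07T00:00:00.000Z", "service": "EC2", "region": "us-east-1",
     "cost_center_type": "AWS",
     "costs": {"amortized_cost": 12.5, "unblended_cost": 14.0},
     "metadata": {"account_id": "123456789012"}},  # hypothetical metadata key
    {"time": "2024-03-07T00:00:00.000Z", "service": "EC2", "region": "eu-west-1",
     "cost_center_type": "AWS",
     "costs": {"amortized_cost": 2.5, "unblended_cost": 3.0},
     "metadata": {}},
    {"time": "2024-03-07T00:00:00.000Z", "service": "Compute Engine",
     "region": "us-central1", "cost_center_type": "GCP",
     "costs": {"amortized_cost": 4.0},
     "metadata": {}},
]


def total_by_service(rows, cost_type="amortized_cost"):
    """Sum one entry of the costs map per service."""
    totals = defaultdict(float)
    for row in rows:
        # costs maps a cost name to a double; treat a missing or null entry as 0.
        totals[row["service"]] += row["costs"].get(cost_type) or 0.0
    return dict(totals)


print(total_by_service(rows))
```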

Manifest file structure

The manifest file provides transparency and lets clients monitor data export consistency. It is maintained under destination-archiver/version=2/manifest.json and documents each archiver run, detailing the date and scope of the data exported, the run ID, and the export timestamp.

{
  "latest_export_time": "<latest export time>",
  "exports": [
    {
      "archiver_id": "<archiver_id>",
      "run_id": "<external_exporter_run_id>",
      "export_time": "<Current time in UTC in ISO format>",
      "expored_partitions": ["<year=yyyy>/month=<mm>/day=<dd>"]
    },
    {
      "archiver_id": "<archiver_id>",
      "run_id": "<external_exporter_run_id>",
      "export_time": "<Current time in UTC in ISO format>",
      "expored_partitions": ["<year=yyyy>/month=<mm>/day=<dd>"]
    },
    ....
  ]
}
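A consumer that syncs archived data elsewhere could use the manifest to decide which partitions to reload. The sketch below is one possible approach, not Finout's own tooling: it collects every partition listed by runs that finished after a checkpoint. The sample manifest values and the helper names are hypothetical, the partition strings are assumed to look like `year=2024/month=03/day=07`, and the key is spelled `expored_partitions` exactly as it appears in the structure above.

```python
import json
from datetime import datetime, timezone

# Made-up manifest content following the structure shown above.
manifest_text = """
{
  "latest_export_time": "2024-03-08T02:00:00+00:00",
  "exports": [
    {"archiver_id": "a1", "run_id": "run-1",
     "export_time": "2024-03-07T02:00:00+00:00",
     "expored_partitions": ["year=2024/month=03/day=06"]},
    {"archiver_id": "a1", "run_id": "run-2",
     "export_time": "2024-03-08T02:00:00+00:00",
     "expored_partitions": ["year=2024/month=03/day=07",
                            "year=2024/month=02/day=29"]}
  ]
}
"""


def parse_time(value: str) -> datetime:
    # Older Python versions reject a trailing "Z", so normalise it to an offset.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def partitions_exported_since(manifest: dict, since: datetime) -> list:
    """Return the sorted, de-duplicated partitions written after `since`."""
    changed = set()
    for export in manifest.get("exports", []):
        if parse_time(export["export_time"]) > since:
            changed.update(export.get("expored_partitions", []))
    return sorted(changed)


checkpoint = datetime(2024, 3, 7, 12, 0, tzinfo=timezone.utc)
changed = partitions_exported_since(json.loads(manifest_text), checkpoint)
print(changed)
```

Because later runs can rewrite older days, reloading by partition rather than by calendar date keeps retroactive adjustments in sync.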

Setting Up Cost Archiver

Set up Cost Archiver in Finout to efficiently store and manage your organization’s cost data in an AWS S3 bucket, enabling structured, scalable, and easily accessible long-term storage.

1. Create a New Bucket

To output cost data, Finout requires write and delete permissions to an S3 bucket. To do so, create a new bucket or a new folder within an existing S3 bucket used for your Archiver.

Note: Save your bucket name for step 3.

2. Obtain an External ID from Finout

Get the External ID that you need in order to grant Finout access to your Archiver bucket.

  1. In Finout, navigate to Settings > Cost Centers > Add cost center. The Connect Accounts page appears.

  2. Click Connect Now.

  3. Copy the External ID and save it for step 3, where you grant Finout access to your Archiver bucket.

3. Grant Finout Access to your Archiver Bucket

Once the Archiver bucket is created, grant Finout access to the bucket by creating an IAM role. This can be done manually or by using CloudFormation.

Note: It is recommended to grant access through CloudFormation.

To grant access using CloudFormation:

  1. Log in to your AWS account and create a CloudFormation stack.

  2. Enter the required stack parameters, including the “external-id” (obtained in step 2) and the name of the bucket created for your Archiver (step 1).

  3. Click Next and then Submit. You are brought to the Stack details page.

  4. Click Outputs and copy the IAM role ARN value to add in Finout (step 4).

To grant access manually:

  1. Sign in to your AWS account, navigate to IAM, and start creating a new cross-account role (a role for another AWS account).

  2. In the account ID, enter: 277411487094.

  3. In the External ID field, paste the “external-id” obtained in step 2.

  4. Click Next. The Review step appears.

  5. Add a Role name: FinoutArchiverRole and then configure the role. A new role is created.

  6. Go to your new role in Summary.

  7. Copy and save the Role ARN.

  8. Click on Add permissions and choose Create inline policy. Choose JSON format and paste the following JSON:

      {
          "Version": "2012-10-17",
          "Statement": [
              {
                  "Effect": "Allow",
                  "Action": [
                      "s3:GetObject",
                      "s3:PutObject",
                      "s3:PutObjectAcl",
                      "s3:ListMultipartUploadParts",
                      "s3:ListBucket",
                      "s3:DeleteObject"
                  ],
                  "Resource": [
                      "arn:aws:s3:::finout-data-archiver/*",
                      "arn:aws:s3:::finout-data-archiver"
                  ]
              }
          ]
      }
    

  9. Click Next until the Review step, and name it finout-access-policy.

  10. Click Create policy. Your IAM role is finalized and created.
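The manual steps above amount to two JSON documents: a trust policy that lets the Finout account assume the role with your External ID, and the inline permissions policy for your bucket. The sketch below generates both so the bucket name is not hand-edited. The trust policy shape is the standard one for a cross-account role with an external ID and is an assumption about what the console produces here; the function names and example values are hypothetical.

```python
import json

FINOUT_ACCOUNT_ID = "277411487094"  # account ID given in the manual steps


def trust_policy(external_id: str) -> dict:
    """Trust policy for a cross-account role guarded by an external ID (assumed shape)."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{FINOUT_ACCOUNT_ID}:root"},
            "Action": "sts:AssumeRole",
            "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
        }],
    }


def bucket_policy(bucket_name: str) -> dict:
    """Inline policy from the manual steps, with your own bucket name substituted."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:PutObject",
                "s3:PutObjectAcl",
                "s3:ListMultipartUploadParts",
                "s3:ListBucket",
                "s3:DeleteObject",
            ],
            "Resource": [
                f"arn:aws:s3:::{bucket_name}/*",
                f"arn:aws:s3:::{bucket_name}",
            ],
        }],
    }


# "my-archiver-bucket" and "example-external-id" are placeholder values.
print(json.dumps(bucket_policy("my-archiver-bucket"), indent=4))
print(json.dumps(trust_policy("example-external-id"), indent=4))
```

The printed permissions policy can be pasted into the JSON editor in the Create inline policy step.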

4. Adding AWS Bucket Details to Finout

After creating your bucket in AWS and granting Finout access, send the following bucket details to Finout support.

  1. Role ARN - The Amazon Resource Name (ARN) of the IAM role created in step 3.

  2. Bucket Name - The name of the bucket to which Finout will write archiver data.

  3. S3 Path Prefix - The folder in the bucket in which the archiver files are written.
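Before sending these details, a quick format check can catch copy-and-paste mistakes. This is an optional sketch, not a Finout requirement: the `check_details` helper is hypothetical, and the patterns only reflect the general shape of IAM role ARNs and S3 bucket names, not every AWS naming rule.

```python
import re

# General shape of an IAM role ARN: arn:aws:iam::<12-digit account>:role/<name>
ROLE_ARN_RE = re.compile(r"^arn:aws:iam::\d{12}:role/[\w+=,.@/-]+$")
# Simplified S3 bucket name rule: 3-63 lowercase letters, digits, dots, hyphens.
BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


def check_details(role_arn: str, bucket_name: str) -> list:
    """Return a list of human-readable problems; an empty list means both look plausible."""
    problems = []
    if not ROLE_ARN_RE.match(role_arn):
        problems.append("Role ARN does not look like arn:aws:iam::<account>:role/<name>")
    if not BUCKET_RE.match(bucket_name):
        problems.append("Bucket name is not a plausible S3 bucket name")
    return problems


# Placeholder values for illustration only.
print(check_details("arn:aws:iam::123456789012:role/FinoutArchiverRole",
                    "my-archiver-bucket"))
```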

FAQs

How does Finout ensure data accuracy and reliability with the manifest file?

Finout ensures data accuracy and reliability through automated systems that monitor and reprocess any failed exports, eliminating the need for manual intervention. While the manifest file enhances transparency by providing visibility into data export processes, Finout’s robust validation mechanisms ensure consistent and accurate data without requiring clients to check the manifest manually.

How does Cost Archiver handle data updates over time?

Cost Archiver is designed to handle retrospective data updates, which are common with sources like AWS CUR that may reflect changes for several months. It ensures that past data is updated to include late adjustments, with the manifest file clearly indicating which dates have been modified. Finout’s automation ensures data completeness and reliability throughout this process.

What mechanisms are in place for maintaining data consistency with partitioned files?

Cost Archiver performs automatic checks to identify and update partitioned files whenever source data changes. Finout updates relevant partitions and logs these updates in the manifest file. This approach guarantees consistency and integrity across the partitioned data structure.
