Ennistymon Cascades, River Inagh, Co. Clare — Mark Lawson / Unsplash
Cloud bills don’t arrive organised by team. They arrive organised by resource — thousands of line items covering EC2 instances, S3 buckets, managed databases, load balancers, and dozens of other service types — with no inherent connection to the business concepts your organisation actually uses to reason about ownership and cost.
The gap between “which resources ran this month” and “which teams own those resources and what did their infrastructure cost” is the core problem that tag-driven FinOps exists to solve. Resource tags are the bridge: metadata attached to cloud resources that maps them onto your organisational structure.
This post builds the foundation of a platform that crosses that bridge reliably. By the end, you will have a cost fact store that can answer with confidence: how much did each team spend, broken down by service category, for any date range you care about? Everything else in this series builds on that answer — and the final section covers what you need to do to run the whole thing reliably in production.
This is Post 1 of a five-post series. Each post is self-contained, but the data store built here is the input to all subsequent posts. Reading the series in order is recommended.
Before FOCUS, building a multi-cloud FinOps platform meant writing a separate ingestion adapter for each cloud provider. AWS Cost and Usage Reports arrived as wide Parquet files with lineItem/-prefixed columns. Azure Cost Management produced CSVs with tags encoded as a JSON blob in a single column. GCP exported nested structs into BigQuery requiring an UNNEST query. Each had its own type conventions, date formats, and tag field layout.
The FinOps Foundation’s FOCUS specification — FinOps Open Cost and Usage Specification, ratified at version 1.3 in December 2025 — defines a single common schema that all major cloud providers now emit. AWS, Azure, GCP, OCI, Alibaba, Tencent, Databricks, and a growing list of SaaS providers all support FOCUS exports. The three separate ingestors collapse into one.
| Column | Type | Purpose |
|---|---|---|
| ChargePeriodStart | datetime | Start of the charge period — replaces all per-cloud date fields |
| ProviderName | string | Standardised: “Amazon Web Services”, “Microsoft Azure”, “Google Cloud” |
| ServiceName | string | Normalised service name across providers |
| ServiceCategory | string | High-level: Compute, Storage, Database, Networking, AI and Machine Learning |
| ResourceId | string | Cloud-native resource identifier (ARN, resource URI, etc.) |
| BilledCost | decimal | Invoice cost — what you owe the provider |
| EffectiveCost | decimal | Amortised cost inclusive of Reserved Instance and Savings Plan benefit |
| Tags | key/value | All resource tags — department, team, environment live here |
| x_DatasetCompletionStatus | string | FOCUS 1.3: “Complete” or “Incomplete” — safe to process? |
Use EffectiveCost for internal chargeback and forecasting — it reflects what a team truly consumed including the amortised benefit of any Reserved Instances or Savings Plans. Use BilledCost for invoice reconciliation with finance. The platform stores both; downstream posts choose the appropriate basis per budget configuration.
FOCUS defines column names and types — not what your teams put in their tags. The tag keys department, team, and environment are your organisation’s responsibility to define, enforce, and keep consistent. Tag key normalisation logic — tolerating Department, dept, and department as equivalent — remains in the platform. This is covered in the tag extractor below.
Each major cloud requires a one-time configuration to enable FOCUS-format billing exports. Run these steps before deploying any ingestion code.
In the Billing and Cost Management console, navigate to Data Exports and create a new export. Select the FOCUS export type, daily granularity, and Parquet format. Configure delivery to an S3 bucket in your management account. AWS allows up to two free FOCUS exports per account — run a FOCUS export alongside your existing CUR during migration without additional cost.
In Cost Management, navigate to Exports and create a Daily Export. Select FOCUS schema under the export type options. Configure delivery to a storage account blob container. Azure’s export typically lands within a few hours of midnight UTC.
Enable the FOCUS billing export in the Cloud Billing console under Billing export → FOCUS export. Data lands in a BigQuery dataset. Schedule a daily Cloud Scheduler job to export the previous day’s rows to Cloud Storage as Parquet for ingestion by the universal loader.
Enable FOCUS exports immediately, even if you are not yet ready to ingest them. Most providers retain 12+ months of FOCUS export history from the moment you enable the feature. Starting the export now means you will have historical data available for backfill when you deploy the loader, without any gaps.
With FOCUS exports enabled on all clouds, the entire ingestion layer becomes a single Python file. No per-cloud adapters. No field name remapping. No type translation specific to any one provider.
import pandas as pd
from pathlib import Path
from .store import upsert_cost_records
from .tags import extract_attribution_tags
# FOCUS columns this platform uses — others are ignored
FOCUS_COLS = [
'ChargePeriodStart', 'ChargePeriodEnd',
'ProviderName', 'ServiceName', 'ServiceCategory',
'ResourceId', 'BilledCost', 'EffectiveCost', 'BillingCurrency',
'Tags',
'CommitmentDiscountId', 'CommitmentDiscountStatus',
'x_DatasetCompletionStatus', 'x_DatasetLastUpdated',
]
def load_focus_export(path: str | Path) -> int:
"""Load a FOCUS export from any provider. Returns records written."""
df = pd.read_parquet(path)
# FOCUS 1.3 — respect completeness flag before processing
if 'x_DatasetCompletionStatus' in df.columns:
status = df['x_DatasetCompletionStatus'].dropna().iloc[0] \
if not df['x_DatasetCompletionStatus'].empty else 'Complete'
if status != 'Complete':
raise IncompleteExportError(
f"Export marked {status!r} — retry after provider finalises data"
)
available = [c for c in FOCUS_COLS if c in df.columns]
df = df[available].copy()
# Normalise types
df['ChargePeriodStart'] = pd.to_datetime(df['ChargePeriodStart'], utc=True)
df['BilledCost'] = df['BilledCost'].astype(float)
df['EffectiveCost'] = df.get('EffectiveCost', df['BilledCost']).astype(float)
# Extract attribution dimensions from Tags
attribution = df['Tags'].apply(extract_attribution_tags)
df = pd.concat([df, pd.DataFrame(attribution.tolist(), index=df.index)], axis=1)
return upsert_cost_records(df)
class IncompleteExportError(Exception):
"""Raised when a FOCUS 1.3 export is not yet finalised.""" Tag key normalisation lives in its own module — tolerating the naming inconsistencies that accumulate in any real organisation over time.
# Tag key aliases: first match wins
TAG_ALIASES: dict[str, list[str]] = {
'department': ['department', 'Department', 'dept', 'Dept', 'business-unit'],
'team': ['team', 'Team', 'squad', 'owner'],
'environment': ['environment', 'env', 'Environment', 'stage'],
'cost_center': ['cost-center', 'CostCenter', 'costcenter', 'cc'],
}
def extract_attribution_tags(tags: dict | None) -> dict[str, str | None]:
if not tags:
return {k: None for k in TAG_ALIASES}
result = {}
for dimension, aliases in TAG_ALIASES.items():
value = None
for alias in aliases:
if v := tags.get(alias):
value = v.strip().lower()
break
result[dimension] = value
return result The fact table is partitioned by charge date, with column names kept close to the FOCUS spec to simplify validation and onboarding of new team members already familiar with FOCUS terminology.
CREATE TABLE cloud_cost_facts (
id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
ingested_at timestamptz DEFAULT now(),
-- FOCUS mandatory columns
"ChargePeriodStart" timestamptz NOT NULL,
"ChargePeriodEnd" timestamptz NOT NULL,
"ProviderName" varchar NOT NULL,
"ServiceName" varchar,
"ServiceCategory" varchar,
"ResourceId" varchar,
"BilledCost" numeric(14,6) NOT NULL,
"EffectiveCost" numeric(14,6),
"BillingCurrency" char(3) NOT NULL DEFAULT 'USD',
"CommitmentDiscountId" varchar,
"CommitmentDiscountStatus" varchar,
"Tags" jsonb,
-- Attribution dimensions (derived from Tags)
department varchar,
team varchar,
environment varchar,
cost_center varchar,
-- Computed partition key
charge_date date GENERATED ALWAYS AS
("ChargePeriodStart"::date) STORED
) PARTITION BY RANGE (charge_date);
-- Monthly partitions (automate with pg_partman in production)
CREATE TABLE cloud_cost_facts_y2026m05
PARTITION OF cloud_cost_facts
FOR VALUES FROM ('2026-05-01') TO ('2026-06-01');
-- Primary access pattern indexes
CREATE INDEX ON cloud_cost_facts (department, charge_date);
CREATE INDEX ON cloud_cost_facts (team, charge_date);
CREATE INDEX ON cloud_cost_facts ("ResourceId", charge_date);
CREATE INDEX ON cloud_cost_facts ("ProviderName", charge_date); Rows where department and team are NULL are stored as-is — never dropped. The daily rollup replaces NULL with the sentinel value '__untagged__' so untagged spend is always visible as a first-class cost centre. Tracking untagged spend percentage over time is a key tagging hygiene metric.
The materialised rollup pre-aggregates the fact table by team, department, and service category for each day. This is the primary query surface for forecasting, dashboards, and anomaly detection in later posts — the fact table is for drill-down and auditing.
CREATE MATERIALIZED VIEW daily_cost_by_team AS
SELECT
charge_date,
"ProviderName",
COALESCE(department, '__untagged__') AS department,
COALESCE(team, '__untagged__') AS team,
COALESCE(environment, '__unknown__') AS environment,
"ServiceCategory",
SUM("BilledCost") AS billed_cost_usd,
SUM(COALESCE("EffectiveCost", "BilledCost")) AS effective_cost_usd,
COUNT(DISTINCT "ResourceId") AS resource_count,
COUNT(*) FILTER (WHERE department IS NULL) AS untagged_rows,
COUNT(*) FILTER
(WHERE "CommitmentDiscountId" IS NOT NULL) AS committed_rows
FROM cloud_cost_facts
GROUP BY 1,2,3,4,5,6
WITH DATA;
CREATE UNIQUE INDEX ON daily_cost_by_team
(charge_date, "ProviderName", department, team, environment, "ServiceCategory");
-- Schedule nightly: REFRESH MATERIALIZED VIEW CONCURRENTLY daily_cost_by_team; The rollup is the single source of truth for every time-series query in this series. When the fact table gains new columns in later posts, the rollup is extended — not replaced.
The nightly pipeline is straightforward: export lands in cloud storage, a trigger fires the loader, the loader validates completeness and writes to the fact table, the rollup refreshes. Failures at any step should not cause silent data gaps — use a queue-backed retry mechanism rather than a simple cron.
import logging
from .focus_loader import load_focus_export, IncompleteExportError
from .db import refresh_rollup
from .storage import list_new_exports, mark_processed
log = logging.getLogger('finops.ingest')
def run_nightly_ingest() -> None:
exports = list_new_exports() # poll S3/blob/GCS for unprocessed files
total = 0
for export in exports:
try:
n = load_focus_export(export.path)
mark_processed(export)
total += n
log.info(f"Loaded {n} records from {export.provider} export")
except IncompleteExportError as e:
# Leave unprocessed — retry queue will pick it up in 2 hours
log.warning(f"Export not ready: {e}")
except Exception as e:
log.error(f"Failed to load {export.path}: {e}", exc_info=True)
raise # escalate — don't silently skip
log.info(f"Ingestion complete: {total} records. Refreshing rollup...")
refresh_rollup()
log.info("Rollup refreshed. Nightly pipeline complete.") With the ingestion layer complete, the platform can reliably answer these questions for any date range:
-- Total effective cost by team, current month
SELECT team, SUM(effective_cost_usd) AS month_cost
FROM daily_cost_by_team
WHERE charge_date >= date_trunc('month', current_date)
GROUP BY team
ORDER BY month_cost DESC;
-- Untagged spend percentage by provider
SELECT
"ProviderName",
ROUND(
100.0 * SUM(effective_cost_usd) FILTER (WHERE team = '__untagged__')
/ NULLIF(SUM(effective_cost_usd), 0),
2) AS untagged_pct
FROM daily_cost_by_team
WHERE charge_date >= current_date - 30
GROUP BY 1
ORDER BY untagged_pct DESC;
-- Cost by ServiceCategory across all clouds
SELECT
"ServiceCategory",
SUM(effective_cost_usd) AS total_cost
FROM daily_cost_by_team
WHERE charge_date >= date_trunc('month', current_date)
AND team != '__untagged__'
GROUP BY 1 ORDER BY total_cost DESC; Getting the ingestion pipeline working in a development environment is relatively straightforward. Getting it to the point where it runs reliably every night, produces correct data, and stays healthy as your organisation scales is where the real engineering work lives.
No amount of platform engineering compensates for poor tagging coverage. If 40% of cloud spend is untagged when the platform launches, attribution reports are misleading from day one. Establish and enforce tagging policy before deploying any ingestion pipeline.
Define the required tag set. At minimum: department, team, environment, cost-center. Document canonical values — the list of valid department names, environment strings, and cost centre codes. Publish this as a policy document in your internal wiki.
Enforce at resource creation. AWS: AWS Config rules with auto-remediation or Service Control Policies that block resource creation without required tags. Azure: Azure Policy deny effects on resource creation. GCP: Organisation Policies with label constraints. Enforcement is more effective than remediation.
Establish baseline tagging coverage. Before launch, query current untagged spend percentage from your cloud billing consoles. Set a target — typically “below 10% untagged within 90 days” — and track it weekly. The platform makes this metric automatic once ingestion is running.
Assign untagged remediation ownership. Every team that owns cloud resources owns their tagging debt. Assign a FinOps champion per department whose quarterly goal includes reducing untagged spend percentage. The platform’s untagged spend reports are their primary tool.
Enable FOCUS exports on every cloud account immediately, even if ingestion won’t begin for weeks. Most providers retain 12+ months of FOCUS export history from the moment the feature is enabled.
# Enable FOCUS export via AWS CLI
# Run once per management account (covers all org member accounts)
aws bcm-data-exports create-export \
--export '{
"Name": "finops-focus-daily",
"DataQuery": {
"QueryStatement": "SELECT * FROM FOCUS_1_0",
"TableConfigurations": {
"FOCUS_1_0": {
"TIME_GRANULARITY": "DAILY",
"INCLUDE_RESOURCES": "TRUE"
}
}
},
"DestinationConfigurations": {
"S3Destination": {
"S3Bucket": "your-finops-billing-bucket",
"S3Prefix": "focus-exports/aws/",
"S3Region": "us-east-1",
"S3OutputConfigurations": {
"OutputType": "PARQUET",
"Format": "PARQUET",
"Compression": "GZIP",
"Overwrite": "OVERWRITE_REPORT"
}
}
},
"RefreshCadence": {"Frequency": "SYNCHRONOUS"}
}' If you have existing CUR-based tooling, run the FOCUS export alongside it. The two exports share the same billing data — there is no double-counting. Migrate dashboards and reports to the FOCUS-based platform incrementally, and decommission the old pipeline once you have 30 days of validated overlap.
The nightly pipeline has four sequential stages. Each depends on the previous completing successfully — a failed ingestion should not trigger a rollup refresh that would incorporate incomplete data.
| Time (UTC) | Stage | Action | Retry on failure |
|---|---|---|---|
| 05:00 | Export check | Poll cloud storage for new FOCUS exports. Check x_DatasetCompletionStatus = Complete. | Every 30min until 08:00 |
| 06:00 | Ingestion | Run focus_loader.py for each available export. Write to cloud_cost_facts. | 3× with backoff |
| 07:00 | Rollup refresh | REFRESH MATERIALIZED VIEW CONCURRENTLY daily_cost_by_team and attribution views. | Alert on failure — do not retry blindly |
| 08:00 | Evaluation | Run budget evaluator + anomaly detector. Publish alerts to SNS. | Non-destructive — safe to retry |
| 08:30 | Correlation sweep | Run attribution correlation engine on untagged resources. Write suggestions. | Skip if rollup failed |
Every stage must emit a success or failure signal to your orchestration layer (Airflow, Step Functions, Cloud Composer). A silent skip — where a stage fails but the pipeline continues — is the most dangerous failure mode. It produces a rollup refresh from incomplete data, which then drives incorrect budget evaluations that may not be detected for days.
The default thresholds — 80% forecast for budget alerts, 2.5σ for anomaly detection — are starting points, not permanent settings. Calibrate them based on the first two to four weeks of real alerts.
Budget alert thresholds. If teams are receiving daily WARNING alerts for weeks without being able to act on them, the threshold is too low. Either raise the percentage, increase the re-alert suppression window, or reduce alert frequency to weekly for teams with stable, predictable spend. If CRITICAL alerts are arriving the same day a period closes with an overage, the threshold is too high — teams need more lead time.
Anomaly detection sensitivity. Track the false positive rate for the first month: count alerts where the team confirmed there was no actual problem. A false positive rate above 20% indicates the sigma threshold is too low. Raise it from 2.5 to 3.0 and observe. For resources with inherently volatile spend — development environments, spot instances, data pipeline jobs — consider per-resource threshold overrides rather than a global setting.
-- False positive rate by team (last 30 days)
SELECT
team,
COUNT(*) AS total_anomaly_alerts,
COUNT(*) FILTER (WHERE resolved_at - detected_at < interval '4 hours')
AS likely_false_positives,
ROUND(100.0 * COUNT(*) FILTER (WHERE resolved_at - detected_at < interval '4 hours')
/ NULLIF(COUNT(*), 0), 1) AS fp_pct
FROM anomaly_history
WHERE detected_at >= now() - interval '30 days'
GROUP BY team
ORDER BY fp_pct DESC; Monthly partition management. The cloud_cost_facts table is partitioned by month. Add new partitions at least one month ahead — ideally via pg_partman which automates this. A missing partition causes inserts to fail silently in some PostgreSQL configurations.
Quarterly tag alias review. The tag alias list in ingestion/tags.py accumulates drift as teams adopt new naming conventions. Review it quarterly — check the __untagged__ spend trend and run a query against raw Tags JSON to identify tag keys that appear frequently but aren’t being extracted.
-- Find tag keys that appear frequently but aren't mapped to attribution dimensions
SELECT
key,
COUNT(*) AS appearances,
SUM("BilledCost") AS associated_cost
FROM cloud_cost_facts,
jsonb_each_text("Tags") AS t(key, value)
WHERE department IS NULL
AND charge_date >= current_date - 30
AND key NOT IN ('Name', 'aws:createdBy', 'aws:cloudformation:stack-name')
GROUP BY key
HAVING COUNT(*) > 100
ORDER BY associated_cost DESC
LIMIT 20; Budget review cadence. Budgets become stale as organisations change. A team that doubled headcount six months ago should not be operating against a budget set before that growth. Include a quarterly budget review as a standing agenda item in engineering leadership meetings. The platform makes this easy — the historical spend trend for each team is one query away.
Untagged spend percentage by provider. Pipeline success rate (ingestion + rollup). Alert false positive rate by team. Sparse group percentage (from cardinality filter). Attribution suggestion acceptance rate. These five metrics tell you whether the platform is healthy and improving over time.
The foundation is now in place. Post 2 builds the forecasting and budget alert layer on top of this data store, giving teams early warning before periods close over budget.
If your team is building a FinOps platform and wants to talk through architecture decisions, tag governance, or cloud cost strategy — get in touch.