Kymeca
FinOps Engineering

FinOps Engineering: Foundation, FOCUS Ingestion, and Running It in Production

Mark 11 min read Part 1 of 5 — FinOps Engineering
A striking red and white building surrounded by strong foundations, symbolizing structural reliability.

Ennistymon Cascades, River Inagh, Co. Clare — Mark Lawson / Unsplash

The Attribution Problem

Cloud bills don’t arrive organised by team. They arrive organised by resource — thousands of line items covering EC2 instances, S3 buckets, managed databases, load balancers, and dozens of other service types — with no inherent connection to the business concepts your organisation actually uses to reason about ownership and cost.

The gap between “which resources ran this month” and “which teams own those resources and what did their infrastructure cost” is the core problem that tag-driven FinOps exists to solve. Resource tags are the bridge: metadata attached to cloud resources that maps them onto your organisational structure.

This post builds the foundation of a platform that crosses that bridge reliably. By the end, you will have a cost fact store that can answer with confidence: how much did each team spend, broken down by service category, for any date range you care about? Everything else in this series builds on that answer — and the final section covers what you need to do to run the whole thing reliably in production.

Series Foundation

This is Post 1 of a five-post series. Each post is self-contained, but the data store built here is the input to all subsequent posts. Reading the series in order is recommended.

Why FOCUS — and What It Gives You

Before FOCUS, building a multi-cloud FinOps platform meant writing a separate ingestion adapter for each cloud provider. AWS Cost and Usage Reports arrived as wide Parquet files with lineItem/-prefixed columns. Azure Cost Management produced CSVs with tags encoded as a JSON blob in a single column. GCP exported nested structs into BigQuery requiring an UNNEST query. Each had its own type conventions, date formats, and tag field layout.

The FinOps Foundation’s FOCUS specification — FinOps Open Cost and Usage Specification, ratified at version 1.3 in December 2025 — defines a single common schema that all major cloud providers now emit. AWS, Azure, GCP, OCI, Alibaba, Tencent, Databricks, and a growing list of SaaS providers all support FOCUS exports. The three separate ingestors collapse into one.

Key FOCUS columns for attribution

ColumnTypePurpose
ChargePeriodStartdatetimeStart of the charge period — replaces all per-cloud date fields
ProviderNamestringStandardised: “Amazon Web Services”, “Microsoft Azure”, “Google Cloud”
ServiceNamestringNormalised service name across providers
ServiceCategorystringHigh-level: Compute, Storage, Database, Networking, AI and Machine Learning
ResourceIdstringCloud-native resource identifier (ARN, resource URI, etc.)
BilledCostdecimalInvoice cost — what you owe the provider
EffectiveCostdecimalAmortised cost inclusive of Reserved Instance and Savings Plan benefit
Tagskey/valueAll resource tags — department, team, environment live here
x_DatasetCompletionStatusstringFOCUS 1.3: “Complete” or “Incomplete” — safe to process?
EffectiveCost vs BilledCost

Use EffectiveCost for internal chargeback and forecasting — it reflects what a team truly consumed including the amortised benefit of any Reserved Instances or Savings Plans. Use BilledCost for invoice reconciliation with finance. The platform stores both; downstream posts choose the appropriate basis per budget configuration.

What FOCUS does not standardise

FOCUS defines column names and types — not what your teams put in their tags. The tag keys department, team, and environment are your organisation’s responsibility to define, enforce, and keep consistent. Tag key normalisation logic — tolerating Department, dept, and department as equivalent — remains in the platform. This is covered in the tag extractor below.

Enabling FOCUS Exports

Each major cloud requires a one-time configuration to enable FOCUS-format billing exports. Run these steps before deploying any ingestion code.

AWS

In the Billing and Cost Management console, navigate to Data Exports and create a new export. Select the FOCUS export type, daily granularity, and Parquet format. Configure delivery to an S3 bucket in your management account. AWS allows up to two free FOCUS exports per account — run a FOCUS export alongside your existing CUR during migration without additional cost.

Azure

In Cost Management, navigate to Exports and create a Daily Export. Select FOCUS schema under the export type options. Configure delivery to a storage account blob container. Azure’s export typically lands within a few hours of midnight UTC.

GCP

Enable the FOCUS billing export in the Cloud Billing console under Billing export → FOCUS export. Data lands in a BigQuery dataset. Schedule a daily Cloud Scheduler job to export the previous day’s rows to Cloud Storage as Parquet for ingestion by the universal loader.

Migration Strategy

Enable FOCUS exports immediately, even if you are not yet ready to ingest them. Most providers retain 12+ months of FOCUS export history from the moment you enable the feature. Starting the export now means you will have historical data available for backfill when you deploy the loader, without any gaps.

The Universal FOCUS Loader

With FOCUS exports enabled on all clouds, the entire ingestion layer becomes a single Python file. No per-cloud adapters. No field name remapping. No type translation specific to any one provider.

Python ingestion/focus_loader.py
import pandas as pd
from pathlib  import Path
from .store   import upsert_cost_records
from .tags    import extract_attribution_tags

# FOCUS columns this platform uses — others are ignored
FOCUS_COLS = [
    'ChargePeriodStart', 'ChargePeriodEnd',
    'ProviderName', 'ServiceName', 'ServiceCategory',
    'ResourceId', 'BilledCost', 'EffectiveCost', 'BillingCurrency',
    'Tags',
    'CommitmentDiscountId', 'CommitmentDiscountStatus',
    'x_DatasetCompletionStatus', 'x_DatasetLastUpdated',
]


def load_focus_export(path: str | Path) -> int:
    """Load a FOCUS export from any provider. Returns records written."""
    df = pd.read_parquet(path)

    # FOCUS 1.3 — respect completeness flag before processing
    if 'x_DatasetCompletionStatus' in df.columns:
        status = df['x_DatasetCompletionStatus'].dropna().iloc[0] \
                 if not df['x_DatasetCompletionStatus'].empty else 'Complete'
        if status != 'Complete':
            raise IncompleteExportError(
                f"Export marked {status!r} — retry after provider finalises data"
            )

    available = [c for c in FOCUS_COLS if c in df.columns]
    df = df[available].copy()

    # Normalise types
    df['ChargePeriodStart'] = pd.to_datetime(df['ChargePeriodStart'], utc=True)
    df['BilledCost']        = df['BilledCost'].astype(float)
    df['EffectiveCost']     = df.get('EffectiveCost', df['BilledCost']).astype(float)

    # Extract attribution dimensions from Tags
    attribution = df['Tags'].apply(extract_attribution_tags)
    df = pd.concat([df, pd.DataFrame(attribution.tolist(), index=df.index)], axis=1)

    return upsert_cost_records(df)


class IncompleteExportError(Exception):
    """Raised when a FOCUS 1.3 export is not yet finalised."""

Tag extraction with alias tolerance

Tag key normalisation lives in its own module — tolerating the naming inconsistencies that accumulate in any real organisation over time.

Python ingestion/tags.py
# Tag key aliases: first match wins
TAG_ALIASES: dict[str, list[str]] = {
    'department':  ['department', 'Department', 'dept', 'Dept', 'business-unit'],
    'team':        ['team', 'Team', 'squad', 'owner'],
    'environment': ['environment', 'env', 'Environment', 'stage'],
    'cost_center': ['cost-center', 'CostCenter', 'costcenter', 'cc'],
}


def extract_attribution_tags(tags: dict | None) -> dict[str, str | None]:
    if not tags:
        return {k: None for k in TAG_ALIASES}
    result = {}
    for dimension, aliases in TAG_ALIASES.items():
        value = None
        for alias in aliases:
            if v := tags.get(alias):
                value = v.strip().lower()
                break
        result[dimension] = value
    return result

The Cost Fact Table

The fact table is partitioned by charge date, with column names kept close to the FOCUS spec to simplify validation and onboarding of new team members already familiar with FOCUS terminology.

SQL schema/cloud_cost_facts.sql
CREATE TABLE cloud_cost_facts (
    id                         uuid         PRIMARY KEY DEFAULT gen_random_uuid(),
    ingested_at                timestamptz  DEFAULT now(),

    -- FOCUS mandatory columns
    "ChargePeriodStart"         timestamptz  NOT NULL,
    "ChargePeriodEnd"           timestamptz  NOT NULL,
    "ProviderName"              varchar      NOT NULL,
    "ServiceName"               varchar,
    "ServiceCategory"           varchar,
    "ResourceId"                varchar,
    "BilledCost"                numeric(14,6) NOT NULL,
    "EffectiveCost"             numeric(14,6),
    "BillingCurrency"           char(3)      NOT NULL DEFAULT 'USD',
    "CommitmentDiscountId"      varchar,
    "CommitmentDiscountStatus"  varchar,
    "Tags"                      jsonb,

    -- Attribution dimensions (derived from Tags)
    department                  varchar,
    team                        varchar,
    environment                 varchar,
    cost_center                 varchar,

    -- Computed partition key
    charge_date                 date GENERATED ALWAYS AS
                                    ("ChargePeriodStart"::date) STORED
) PARTITION BY RANGE (charge_date);

-- Monthly partitions (automate with pg_partman in production)
CREATE TABLE cloud_cost_facts_y2026m05
    PARTITION OF cloud_cost_facts
    FOR VALUES FROM ('2026-05-01') TO ('2026-06-01');

-- Primary access pattern indexes
CREATE INDEX ON cloud_cost_facts (department,  charge_date);
CREATE INDEX ON cloud_cost_facts (team,         charge_date);
CREATE INDEX ON cloud_cost_facts ("ResourceId", charge_date);
CREATE INDEX ON cloud_cost_facts ("ProviderName", charge_date);
Untagged Resources

Rows where department and team are NULL are stored as-is — never dropped. The daily rollup replaces NULL with the sentinel value '__untagged__' so untagged spend is always visible as a first-class cost centre. Tracking untagged spend percentage over time is a key tagging hygiene metric.

The Daily Cost Rollup

The materialised rollup pre-aggregates the fact table by team, department, and service category for each day. This is the primary query surface for forecasting, dashboards, and anomaly detection in later posts — the fact table is for drill-down and auditing.

SQL schema/daily_cost_rollup.sql
CREATE MATERIALIZED VIEW daily_cost_by_team AS
SELECT
    charge_date,
    "ProviderName",
    COALESCE(department,  '__untagged__') AS department,
    COALESCE(team,        '__untagged__') AS team,
    COALESCE(environment, '__unknown__')  AS environment,
    "ServiceCategory",

    SUM("BilledCost")                                AS billed_cost_usd,
    SUM(COALESCE("EffectiveCost", "BilledCost"))      AS effective_cost_usd,
    COUNT(DISTINCT "ResourceId")                     AS resource_count,
    COUNT(*) FILTER (WHERE department IS NULL)       AS untagged_rows,
    COUNT(*) FILTER
        (WHERE "CommitmentDiscountId" IS NOT NULL)    AS committed_rows

FROM  cloud_cost_facts
GROUP BY 1,2,3,4,5,6
WITH DATA;

CREATE UNIQUE INDEX ON daily_cost_by_team
    (charge_date, "ProviderName", department, team, environment, "ServiceCategory");

-- Schedule nightly: REFRESH MATERIALIZED VIEW CONCURRENTLY daily_cost_by_team;

The rollup is the single source of truth for every time-series query in this series. When the fact table gains new columns in later posts, the rollup is extended — not replaced.

Pipeline Orchestration

The nightly pipeline is straightforward: export lands in cloud storage, a trigger fires the loader, the loader validates completeness and writes to the fact table, the rollup refreshes. Failures at any step should not cause silent data gaps — use a queue-backed retry mechanism rather than a simple cron.

Python pipeline/nightly_ingest.py
import logging
from .focus_loader import load_focus_export, IncompleteExportError
from .db            import refresh_rollup
from .storage       import list_new_exports, mark_processed

log = logging.getLogger('finops.ingest')


def run_nightly_ingest() -> None:
    exports = list_new_exports()  # poll S3/blob/GCS for unprocessed files
    total = 0

    for export in exports:
        try:
            n = load_focus_export(export.path)
            mark_processed(export)
            total += n
            log.info(f"Loaded {n} records from {export.provider} export")
        except IncompleteExportError as e:
            # Leave unprocessed — retry queue will pick it up in 2 hours
            log.warning(f"Export not ready: {e}")
        except Exception as e:
            log.error(f"Failed to load {export.path}: {e}", exc_info=True)
            raise  # escalate — don't silently skip

    log.info(f"Ingestion complete: {total} records. Refreshing rollup...")
    refresh_rollup()
    log.info("Rollup refreshed. Nightly pipeline complete.")

What You Can Answer Now

With the ingestion layer complete, the platform can reliably answer these questions for any date range:

SQL Example queries against the daily rollup
-- Total effective cost by team, current month
SELECT team, SUM(effective_cost_usd) AS month_cost
FROM   daily_cost_by_team
WHERE  charge_date >= date_trunc('month', current_date)
GROUP BY team
ORDER BY month_cost DESC;

-- Untagged spend percentage by provider
SELECT
    "ProviderName",
    ROUND(
        100.0 * SUM(effective_cost_usd) FILTER (WHERE team = '__untagged__')
               / NULLIF(SUM(effective_cost_usd), 0),
    2) AS untagged_pct
FROM  daily_cost_by_team
WHERE charge_date >= current_date - 30
GROUP BY 1
ORDER BY untagged_pct DESC;

-- Cost by ServiceCategory across all clouds
SELECT
    "ServiceCategory",
    SUM(effective_cost_usd) AS total_cost
FROM  daily_cost_by_team
WHERE charge_date >= date_trunc('month', current_date)
  AND  team != '__untagged__'
GROUP BY 1 ORDER BY total_cost DESC;

Running It in Production

Getting the ingestion pipeline working in a development environment is relatively straightforward. Getting it to the point where it runs reliably every night, produces correct data, and stays healthy as your organisation scales is where the real engineering work lives.

Before You Deploy: Tagging Policy

No amount of platform engineering compensates for poor tagging coverage. If 40% of cloud spend is untagged when the platform launches, attribution reports are misleading from day one. Establish and enforce tagging policy before deploying any ingestion pipeline.

Define the required tag set. At minimum: department, team, environment, cost-center. Document canonical values — the list of valid department names, environment strings, and cost centre codes. Publish this as a policy document in your internal wiki.

Enforce at resource creation. AWS: AWS Config rules with auto-remediation or Service Control Policies that block resource creation without required tags. Azure: Azure Policy deny effects on resource creation. GCP: Organisation Policies with label constraints. Enforcement is more effective than remediation.

Establish baseline tagging coverage. Before launch, query current untagged spend percentage from your cloud billing consoles. Set a target — typically “below 10% untagged within 90 days” — and track it weekly. The platform makes this metric automatic once ingestion is running.

Assign untagged remediation ownership. Every team that owns cloud resources owns their tagging debt. Assign a FinOps champion per department whose quarterly goal includes reducing untagged spend percentage. The platform’s untagged spend reports are their primary tool.

Enabling FOCUS Exports — CLI Migration

Enable FOCUS exports on every cloud account immediately, even if ingestion won’t begin for weeks. Most providers retain 12+ months of FOCUS export history from the moment the feature is enabled.

Bash setup/enable_aws_focus_export.sh
# Enable FOCUS export via AWS CLI
# Run once per management account (covers all org member accounts)
aws bcm-data-exports create-export \
  --export '{
    "Name": "finops-focus-daily",
    "DataQuery": {
      "QueryStatement": "SELECT * FROM FOCUS_1_0",
      "TableConfigurations": {
        "FOCUS_1_0": {
          "TIME_GRANULARITY": "DAILY",
          "INCLUDE_RESOURCES": "TRUE"
        }
      }
    },
    "DestinationConfigurations": {
      "S3Destination": {
        "S3Bucket": "your-finops-billing-bucket",
        "S3Prefix": "focus-exports/aws/",
        "S3Region": "us-east-1",
        "S3OutputConfigurations": {
          "OutputType": "PARQUET",
          "Format": "PARQUET",
          "Compression": "GZIP",
          "Overwrite": "OVERWRITE_REPORT"
        }
      }
    },
    "RefreshCadence": {"Frequency": "SYNCHRONOUS"}
  }'
Run in parallel during migration

If you have existing CUR-based tooling, run the FOCUS export alongside it. The two exports share the same billing data — there is no double-counting. Migrate dashboards and reports to the FOCUS-based platform incrementally, and decommission the old pipeline once you have 30 days of validated overlap.

Pipeline Schedule

The nightly pipeline has four sequential stages. Each depends on the previous completing successfully — a failed ingestion should not trigger a rollup refresh that would incorporate incomplete data.

Time (UTC)StageActionRetry on failure
05:00Export checkPoll cloud storage for new FOCUS exports. Check x_DatasetCompletionStatus = Complete.Every 30min until 08:00
06:00IngestionRun focus_loader.py for each available export. Write to cloud_cost_facts.3× with backoff
07:00Rollup refreshREFRESH MATERIALIZED VIEW CONCURRENTLY daily_cost_by_team and attribution views.Alert on failure — do not retry blindly
08:00EvaluationRun budget evaluator + anomaly detector. Publish alerts to SNS.Non-destructive — safe to retry
08:30Correlation sweepRun attribution correlation engine on untagged resources. Write suggestions.Skip if rollup failed
Never silently skip

Every stage must emit a success or failure signal to your orchestration layer (Airflow, Step Functions, Cloud Composer). A silent skip — where a stage fails but the pipeline continues — is the most dangerous failure mode. It produces a rollup refresh from incomplete data, which then drives incorrect budget evaluations that may not be detected for days.

Alert Threshold Calibration

The default thresholds — 80% forecast for budget alerts, 2.5σ for anomaly detection — are starting points, not permanent settings. Calibrate them based on the first two to four weeks of real alerts.

Budget alert thresholds. If teams are receiving daily WARNING alerts for weeks without being able to act on them, the threshold is too low. Either raise the percentage, increase the re-alert suppression window, or reduce alert frequency to weekly for teams with stable, predictable spend. If CRITICAL alerts are arriving the same day a period closes with an overage, the threshold is too high — teams need more lead time.

Anomaly detection sensitivity. Track the false positive rate for the first month: count alerts where the team confirmed there was no actual problem. A false positive rate above 20% indicates the sigma threshold is too low. Raise it from 2.5 to 3.0 and observe. For resources with inherently volatile spend — development environments, spot instances, data pipeline jobs — consider per-resource threshold overrides rather than a global setting.

SQL ops/alert_false_positive_rate.sql
-- False positive rate by team (last 30 days)
SELECT
    team,
    COUNT(*) AS total_anomaly_alerts,
    COUNT(*) FILTER (WHERE resolved_at - detected_at < interval '4 hours')
             AS likely_false_positives,
    ROUND(100.0 * COUNT(*) FILTER (WHERE resolved_at - detected_at < interval '4 hours')
                 / NULLIF(COUNT(*), 0), 1) AS fp_pct
FROM  anomaly_history
WHERE detected_at >= now() - interval '30 days'
GROUP BY team
ORDER BY fp_pct DESC;

Ongoing Maintenance Practices

Monthly partition management. The cloud_cost_facts table is partitioned by month. Add new partitions at least one month ahead — ideally via pg_partman which automates this. A missing partition causes inserts to fail silently in some PostgreSQL configurations.

Quarterly tag alias review. The tag alias list in ingestion/tags.py accumulates drift as teams adopt new naming conventions. Review it quarterly — check the __untagged__ spend trend and run a query against raw Tags JSON to identify tag keys that appear frequently but aren’t being extracted.

SQL ops/unrecognised_tag_keys.sql
-- Find tag keys that appear frequently but aren't mapped to attribution dimensions
SELECT
    key,
    COUNT(*) AS appearances,
    SUM("BilledCost") AS associated_cost
FROM  cloud_cost_facts,
      jsonb_each_text("Tags") AS t(key, value)
WHERE department IS NULL
  AND  charge_date >= current_date - 30
  AND  key NOT IN ('Name', 'aws:createdBy', 'aws:cloudformation:stack-name')
GROUP BY key
HAVING  COUNT(*) > 100
ORDER BY associated_cost DESC
LIMIT 20;

Budget review cadence. Budgets become stale as organisations change. A team that doubled headcount six months ago should not be operating against a budget set before that growth. Include a quarterly budget review as a standing agenda item in engineering leadership meetings. The platform makes this easy — the historical spend trend for each team is one query away.

Health metrics to track weekly

Untagged spend percentage by provider. Pipeline success rate (ingestion + rollup). Alert false positive rate by team. Sparse group percentage (from cardinality filter). Attribution suggestion acceptance rate. These five metrics tell you whether the platform is healthy and improving over time.

The foundation is now in place. Post 2 builds the forecasting and budget alert layer on top of this data store, giving teams early warning before periods close over budget.

Build this for your organisation

If your team is building a FinOps platform and wants to talk through architecture decisions, tag governance, or cloud cost strategy — get in touch.