Kymeca
FinOps Engineering

FinOps Engineering: Utilisation, Rightsizing, and Multi-Dimensional Cost

Mark 12 min read Part 4 of 5 — FinOps Engineering
The winding road up from the pier in Drumquin, symbolizing finding the perfect path and optimal utilisation.

Drumquin Pier, Dingle — Patrick / Unsplash

Billing Tells You What You Paid. Utilisation Tells You What You Got.

Every part of this series so far has worked with billing data — what the cloud charged you for running each resource. Billing data is complete, structured, and now standardised by FOCUS. But it is blind to one critical dimension: how much of what you paid for was actually used.

A resource that costs $500/month and runs at 8% average CPU utilisation tells a different story from one that costs $500/month at 95% CPU. The first is a rightsizing candidate — it is paying for capacity it doesn’t need. The second is working efficiently and may even be under-provisioned. Without utilisation data, the billing report treats both identically.

Cost is the bill. Utilisation is the receipt. The gap between them is where the savings hide.

No FOCUS equivalent for utilisation

FOCUS solved the billing normalisation problem. There is currently no equivalent open standard for utilisation metrics. Each cloud provider exposes monitoring data differently — CloudWatch, Azure Monitor, and GCP Cloud Monitoring all have different APIs, different metric names, different granularity options, and different retention policies. This post requires per-cloud adapter code that cannot be collapsed into a single universal loader the way FOCUS billing ingestion was.

What Utilisation Metrics to Collect

The metric set depends on the resource type. For a first implementation, focus on compute — it represents the largest share of spend for most organisations and has the clearest utilisation signal.

Resource TypeKey MetricSourceRightsizing signal
EC2 / VM / GCEcpu_utilisation_pctCloudWatch / Azure Monitor / GCPAverage < 20% over 14 days → candidate
EC2 / VM / GCEmemory_utilisation_pctCloudWatch agent / VM InsightsAverage < 20% → candidate (must install agent)
RDS / SQL DB / Cloud SQLdb_connections_avgCloudWatch / Azure MonitorMax connections < 10% of provisioned → candidate
RDS / SQL DB / Cloud SQLstorage_used_pctCloudWatch / Azure MonitorUsed < 30% of provisioned storage
EBS / Managed Diskiops_utilisation_pctCloudWatch / Azure MonitorPeak IOPS < 20% of provisioned
Load Balanceractive_connections_avgCloudWatch / Azure MonitorSustained near-zero → likely orphaned

The Utilisation Ingestion Pipeline

Unlike FOCUS billing exports which are delivered to cloud storage daily, utilisation metrics are fetched via API. Each cloud has its own API surface — the adapters below normalise these into a common schema.

Python utilisation/aws_cloudwatch.py
import boto3
from datetime  import datetime, timedelta, timezone
from .schema   import UtilisationRecord

cw = boto3.client('cloudwatch')
ec2 = boto3.client('ec2')

METRICS = [
    ('AWS/EC2', 'CPUUtilization',    'cpu_utilisation_pct'),
    ('CWAgent', 'mem_used_percent',   'memory_utilisation_pct'),
]


def fetch_ec2_utilisation(target_date: datetime.date) -> list[UtilisationRecord]:
    start = datetime(*target_date.timetuple()[:3], tzinfo=timezone.utc)
    end   = start + timedelta(days=1)

    instances = ec2.describe_instances(
        Filters=[{'Name': 'instance-state-name', 'Values': ['running']}]
    )
    instance_ids = [
        i['InstanceId']
        for r in instances['Reservations']
        for i in r['Instances']
    ]

    records = []
    for namespace, metric_name, field_name in METRICS:
        for instance_id in instance_ids:
            resp = cw.get_metric_statistics(
                Namespace  = namespace,
                MetricName = metric_name,
                Dimensions = [{'Name': 'InstanceId', 'Value': instance_id}],
                StartTime  = start, EndTime = end,
                Period     = 86400,   # 1 day in seconds
                Statistics = ['Average', 'Maximum'],
            )
            if not resp['Datapoints']:
                continue
            dp = resp['Datapoints'][0]
            records.append(UtilisationRecord(
                resource_id   = instance_id,
                provider      = 'aws',
                metric_name   = field_name,
                metric_date   = target_date,
                average_value = dp['Average'],
                max_value     = dp['Maximum'],
            ))
    return records
SQL schema/resource_utilisation.sql
CREATE TABLE resource_utilisation (
    id            uuid     PRIMARY KEY DEFAULT gen_random_uuid(),
    resource_id   varchar  NOT NULL,
    provider      varchar  NOT NULL,
    metric_name   varchar  NOT NULL,   -- 'cpu_utilisation_pct', 'memory_utilisation_pct' etc
    metric_date   date     NOT NULL,
    average_value numeric(8,4),
    max_value     numeric(8,4),
    p95_value     numeric(8,4),
    fetched_at    timestamptz DEFAULT now(),
    UNIQUE (resource_id, provider, metric_name, metric_date)
) PARTITION BY RANGE (metric_date);

CREATE INDEX ON resource_utilisation (resource_id, metric_date);
CREATE INDEX ON resource_utilisation (metric_name, metric_date);

Joining Billing and Utilisation

With both tables populated, we can join them by resource ID and date to produce a combined cost/utilisation view. This is the analytical foundation for the rightsizing engine.

SQL schema/resource_cost_utilisation.sql
CREATE MATERIALIZED VIEW resource_cost_utilisation AS
SELECT
    f."ResourceId"                                    AS resource_id,
    f."ProviderName"                                   AS provider,
    f."ServiceName",
    f."ServiceCategory",
    f.department, f.team, f.environment,
    f.charge_date,

    -- Billing dimensions
    SUM(f."BilledCost")                               AS billed_cost_usd,
    SUM(COALESCE(f."EffectiveCost", f."BilledCost"))  AS effective_cost_usd,

    -- Utilisation dimensions (null if no monitoring data)
    MAX(cpu.average_value)  AS cpu_avg_pct,
    MAX(cpu.max_value)     AS cpu_max_pct,
    MAX(mem.average_value)  AS mem_avg_pct,
    MAX(mem.max_value)     AS mem_max_pct

FROM  cloud_cost_facts f
LEFT JOIN resource_utilisation cpu
       ON  cpu.resource_id = f."ResourceId"
      AND  cpu.metric_date = f.charge_date
      AND  cpu.metric_name = 'cpu_utilisation_pct'
LEFT JOIN resource_utilisation mem
       ON  mem.resource_id = f."ResourceId"
      AND  mem.metric_date = f.charge_date
      AND  mem.metric_name = 'memory_utilisation_pct'
GROUP BY 1,2,3,4,5,6,7,8
WITH DATA;

The Four-Quadrant Model

With cost and utilisation joined per resource, every resource can be positioned in a two-dimensional space. The quadrant it occupies determines the appropriate optimisation action.

QuadrantDescriptionAction
High Cost · High UtilisationWorking as intended. Cost justified by demand.Consider commitment discounts (Reserved Instances, Savings Plans) to reduce effective cost at current utilisation.
Low Cost · High UtilisationEfficient but at risk. May be under-provisioned.Monitor for performance degradation. Not a rightsizing candidate — possibly a previous optimisation success.
High Cost · Low UtilisationPrimary rightsizing targets. Oversized instance types, unused Reserved Instances, or dev environments sized for production workloads.Downsize to recommended SKU.
Low Cost · Low UtilisationOrphan candidates. Low absolute cost but many such resources aggregate to material waste.Evaluate for termination — especially resources with zero utilisation for 7+ days.
SQL analytics/quadrant_classification.sql
-- Classify resources by 14-day rolling cost and utilisation averages
SELECT
    resource_id, provider, "ServiceName", department, team,
    AVG(effective_cost_usd) AS avg_daily_cost,
    AVG(cpu_avg_pct)        AS avg_cpu_pct,
    CASE
        WHEN AVG(effective_cost_usd) > cost_p50
         AND AVG(cpu_avg_pct) > util_p50   THEN 'high_cost_high_util'
        WHEN AVG(effective_cost_usd) > cost_p50
         AND AVG(cpu_avg_pct) <= util_p50   THEN 'high_cost_low_util'
        WHEN AVG(effective_cost_usd) <= cost_p50
         AND AVG(cpu_avg_pct) > util_p50   THEN 'low_cost_high_util'
        ELSE 'low_cost_low_util'
    END AS quadrant
FROM  resource_cost_utilisation,
      -- Dynamic percentile thresholds from rolling window
      (SELECT
          PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY effective_cost_usd) AS cost_p50,
          PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY cpu_avg_pct)        AS util_p50
       FROM resource_cost_utilisation
       WHERE charge_date >= current_date - 14
         AND cpu_avg_pct IS NOT NULL) thresholds
WHERE charge_date >= current_date - 14
  AND  cpu_avg_pct IS NOT NULL
GROUP BY 1,2,3,4,5, cost_p50, util_p50
ORDER BY avg_daily_cost DESC;

The Rightsizing Engine

The utilisation layer classifies resources into quadrants. The rightsizing engine converts that classification into a concrete recommendation: a specific smaller SKU, a monthly saving in dollars, and enough supporting context for an engineer to act on the suggestion without additional research.

A recommendation that says “this instance is oversized” is interesting. A recommendation that says “downsize i-0a4b2c8d from m6i.4xlarge to m6i.xlarge — your 14-day peak CPU was 18%, peak memory was 24%, estimated monthly saving is $312” is actionable. The difference is a SKU catalogue and a quantification step.

The job of a rightsizing engine is not to identify waste. It is to make acting on waste cheaper than ignoring it.

The SKU Catalogue

Each cloud provider offers a hierarchy of instance sizes. The SKU catalogue is a reference table mapping instance types to their vCPU, memory, and hourly on-demand price. AWS publishes this via the Pricing API; Azure and GCP have equivalent endpoints. Refresh the catalogue monthly — prices and new instance types change frequently.

SQL schema/instance_skus.sql
CREATE TABLE instance_skus (
    provider        varchar      NOT NULL,
    instance_type   varchar      NOT NULL,
    family          varchar,     -- 'm6i', 'r6i', 'c6i' etc
    vcpu            integer,
    memory_gb       numeric(8,2),
    price_usd_hour  numeric(10,6),
    region          varchar,
    os              varchar      DEFAULT 'Linux',
    refreshed_at    timestamptz  DEFAULT now(),
    PRIMARY KEY (provider, instance_type, region)
);

-- Lookup function: given a family and target vCPU, return the smallest fitting SKU
CREATE OR REPLACE FUNCTION next_smaller_sku(
    p_provider      varchar,
    p_instance_type varchar,
    p_region        varchar,
    p_headroom_pct  numeric DEFAULT 30   -- keep 30% headroom above peak utilisation
) RETURNS TABLE(instance_type varchar, vcpu integer, memory_gb numeric, price_usd_hour numeric) AS $$
WITH current_sku AS (
    SELECT family, vcpu, memory_gb, price_usd_hour
    FROM  instance_skus
    WHERE provider = p_provider AND instance_type = p_instance_type AND region = p_region
)
SELECT s.instance_type, s.vcpu, s.memory_gb, s.price_usd_hour
FROM  instance_skus s, current_sku c
WHERE s.provider = p_provider
  AND  s.family   = c.family
  AND  s.region   = p_region
  AND  s.vcpu     < c.vcpu
ORDER BY s.vcpu DESC
LIMIT 1;
$$ LANGUAGE sql;

Generating Recommendations

The engine iterates over resources in the high-cost/low-utilisation quadrant, computes the recommended downsize target using peak utilisation (not average — always size against the peak), and quantifies the monthly saving as the cost delta between current and recommended instance types.

Python rightsizing/engine.py
from dataclasses import dataclass
from datetime  import date
from .db       import fetch_high_cost_low_util, fetch_sku_recommendation, upsert_recommendation

@dataclass
class RightsizingRecommendation:
    resource_id:         str
    provider:            str
    region:              str
    current_instance:    str
    current_cost_month:  float
    recommended_instance:str
    recommended_cost_month:float
    monthly_saving_usd:  float
    cpu_avg_pct:         float
    cpu_peak_pct:        float
    mem_peak_pct:        float | None
    window_days:         int
    confidence:          str
    department:          str | None
    team:                str | None


def generate_recommendations(
    window_days:     int   = 14,
    min_saving_usd:  float = 50.0,   # ignore sub-$50/mo savings
    headroom_pct:    float = 30.0,   # reserve 30% above peak
) -> list[RightsizingRecommendation]:

    candidates = fetch_high_cost_low_util(window_days=window_days)
    recommendations = []

    for resource in candidates:
        # Use peak (not average) CPU for sizing — never downsize below peak needs
        effective_util = max(resource.cpu_peak_pct, resource.mem_peak_pct or 0)
        required_capacity = effective_util * (1 + headroom_pct / 100)

        rec_sku = fetch_sku_recommendation(
            provider       = resource.provider,
            instance_type  = resource.instance_type,
            region         = resource.region,
            required_pct   = required_capacity,
            headroom_pct   = headroom_pct,
        )
        if not rec_sku:
            continue  # already smallest in family

        current_cost_month  = resource.avg_daily_cost * 30
        recommended_cost_month = rec_sku.price_usd_hour * 24 * 30
        saving = current_cost_month - recommended_cost_month

        if saving < min_saving_usd:
            continue

        confidence = (
            'high'   if window_days >= 14 and resource.cpu_peak_pct < 30
            else 'medium' if resource.cpu_peak_pct < 50
            else 'low'
        )

        rec = RightsizingRecommendation(
            resource_id            = resource.resource_id,
            provider               = resource.provider,
            region                 = resource.region,
            current_instance       = resource.instance_type,
            current_cost_month     = current_cost_month,
            recommended_instance   = rec_sku.instance_type,
            recommended_cost_month = recommended_cost_month,
            monthly_saving_usd     = saving,
            cpu_avg_pct            = resource.cpu_avg_pct,
            cpu_peak_pct           = resource.cpu_peak_pct,
            mem_peak_pct           = resource.mem_peak_pct,
            window_days            = window_days,
            confidence             = confidence,
            department             = resource.department,
            team                   = resource.team,
        )
        recommendations.append(rec)
        upsert_recommendation(rec)

    return sorted(recommendations, key=lambda r: r.monthly_saving_usd, reverse=True)
Rightsizing is never zero-risk

Always include caveats in recommendation delivery. Peak CPU over 14 days may not capture seasonal load spikes — a service that handles Black Friday traffic at 10× normal load will appear safe to downsize in January. Require explicit acknowledgement from the resource owner before any resize is actioned. The recommendation engine surfaces the opportunity; the human decides whether to take it.

Feeding Recommendations Back to the Alert Layer

Rightsizing recommendations are stored in a rightsizing_recommendations table and surfaced in the budget alert payload for teams that are over budget. Rather than a generic “your forecast is 112% of budget” alert, the message now includes: “your three highest-potential rightsizing recommendations would reduce spend by $706/month — here are the resources.” This transforms budget alerts from information into a concrete action list.

Structural Attribution via Clustering

The utilisation data also provides a second, independent signal for attribution that complements the temporal correlation from Post 3. Where temporal correlation asks “do these resources’ costs move in lockstep over time?”, structural clustering asks “do these resources occupy the same position in cost/utilisation space?”

This addresses resources that are new, or whose cost time series is too short or too flat for temporal correlation to be meaningful. Resources that cluster tightly together in cost/utilisation space likely belong to the same workload.

Python attribution/structural_clustering.py
import numpy  as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler
from .db import fetch_resource_cost_util_vectors, fetch_tagged_group_centroids


def structural_attribution_suggestions(
    window_days: int  = 14,
    eps:         float = 0.5,   # DBSCAN neighbourhood radius
    min_samples: int  = 3,
) -> list[dict]:

    # Fetch [resource_id, avg_cost, avg_cpu, avg_mem] vectors
    vectors     = fetch_resource_cost_util_vectors(window_days=window_days)
    untagged    = [v for v in vectors if v.team is None]
    tagged      = [v for v in vectors if v.team is not None]

    if not untagged or not tagged:
        return []

    # Normalise all vectors to unit scale (cost and util are different magnitudes)
    scaler      = StandardScaler()
    all_X       = np.array([[v.avg_cost, v.avg_cpu, v.avg_mem or 0] for v in vectors])
    all_X_norm  = scaler.fit_transform(all_X)

    tagged_X    = all_X_norm[:len(tagged)]
    untagged_X  = all_X_norm[len(tagged):]

    # Cluster the full space — DBSCAN handles irregular shapes better than k-means
    labels      = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(all_X_norm)

    suggestions = []
    for i, (resource, label) in enumerate(zip(untagged, labels[len(tagged):])):
        if label == -1:
            continue  # DBSCAN noise point — no cluster membership

        # Find which tagged resources share this cluster label
        cluster_tagged = [
            tagged[j] for j, l in enumerate(labels[:len(tagged)]) if l == label
        ]
        if not cluster_tagged:
            continue

        # If cluster is dominated by one team, suggest that attribution
        teams = [t.team for t in cluster_tagged]
        top_team = max(set(teams), key=teams.count)
        dominance = teams.count(top_team) / len(teams)

        if dominance >= 0.7:   # 70%+ of cluster is one team → suggest it
            top_tagged = next(t for t in cluster_tagged if t.team == top_team)
            suggestions.append({
                'resource_id':    resource.resource_id,
                'suggested_dept': top_tagged.department,
                'suggested_team': top_team,
                'method':         'structural_cluster',
                'cluster_label':  int(label),
                'dominance':      dominance,
                'cluster_size':   len(cluster_tagged),
            })

    return suggestions
Temporal vs structural — complementary signals

Use both attribution methods and combine their outputs in the suggestion review UI. A resource suggested by both temporal correlation (Post 3) and structural clustering (this section) is a high-confidence candidate for the suggested team. A resource suggested by only one method deserves more scrutiny before confirmation. The source field in attribution_overrides can store 'temporal_correlation', 'structural_cluster', or 'both' to track provenance.

Beyond CPU: The Limits of the Capacity Model

The CPU utilisation framework is grounded in a specific billing model: you provision capacity (an instance size), and efficiency is measured as the fraction of that capacity actually consumed. This maps naturally to EC2, VM-series, and most compute resources.

But a growing portion of cloud spend doesn’t follow this model at all. AI inference APIs charge per token, not per hour. Data transfer charges accumulate per gigabyte moved, not per instance running. Managed databases charge for operations as well as storage. SaaS APIs charge per API call or per seat. None of these have a “provisioned capacity” to compare consumption against.

The question for these service types is not “what percentage of provisioned capacity did you use?” It is “what did each unit of business output cost, and how does that compare to what it should cost?” That reframing requires a generalised unit cost model.

CPU efficiency tells you how well you used what you provisioned. Unit cost tells you how efficiently you’re converting cloud spend into business output.

The Unit Cost Registry

The generalised model is built around a registry that maps resource types to their relevant cost dimension and the business event that represents one unit of output. The registry is configuration, not code — it can be extended to new service types without modifying the cost calculation engine.

SQL schema/unit_cost_registry.sql
CREATE TABLE unit_cost_registry (
    id               uuid     PRIMARY KEY DEFAULT gen_random_uuid(),
    service_category varchar  NOT NULL,   -- 'AI and Machine Learning', 'Networking' etc
    service_name     varchar,              -- 'Amazon Bedrock', 'Azure OpenAI' etc — NULL = all
    cost_dimension   varchar  NOT NULL,   -- 'tokens', 'gb_transferred', 'api_calls', 'operations'
    output_unit      varchar  NOT NULL,   -- 'request', 'document_processed', 'image_generated'
    metric_source    varchar  NOT NULL,   -- 'cloudwatch', 'billing_tags', 'app_telemetry'
    metric_name      varchar  NOT NULL,   -- metric key in resource_utilisation or app event store
    notes            text
);

-- Seed with known service types
INSERT INTO unit_cost_registry (service_category, service_name, cost_dimension, output_unit, metric_source, metric_name) VALUES
  ('AI and Machine Learning', 'Amazon Bedrock',        'tokens',        'request',             'billing_tags',   'input_tokens,output_tokens'),
  ('AI and Machine Learning', 'Azure OpenAI',          'tokens',        'request',             'billing_tags',   'input_tokens,output_tokens'),
  ('Networking',             NULL,                    'gb_transferred', 'gb',                  'cloudwatch',     'NetworkOut'),
  ('Storage',                'Amazon S3',             'operations',    '1000_requests',       'billing_tags',   'request_count'),
  ('Database',               'Amazon DynamoDB',       'operations',    'read_write_unit',     'cloudwatch',     'ConsumedWriteCapacityUnits');

AI Inference Cost Modelling

AI inference is now a material line item in cloud bills for any organisation using LLM-based features. The cost structure is unusual: input tokens and output tokens are priced differently (output tokens cost 3–5× more than input tokens for most frontier models), and the model selection — which model you use for a given task — is often the single biggest lever on unit cost.

FOCUS 1.2+ and virtual currency columns

FOCUS 1.2 introduced x_ServiceUsageType and usage quantity columns that, for AI services, may expose token counts directly in the billing export. Check your provider’s FOCUS documentation — if token counts are present in the export, you can skip the CloudWatch metric lookup for AI services entirely. AWS Bedrock populates these columns for supported models.

Python ai_cost/model_efficiency.py
from dataclasses import dataclass
from .db import fetch_model_usage_daily, fetch_model_pricing

@dataclass
class ModelEfficiencyReport:
    model_id:            str
    service_name:        str
    department:          str
    team:                str
    period_days:         int
    total_requests:      int
    total_input_tokens:  int
    total_output_tokens: int
    total_cost_usd:      float
    cost_per_request:    float
    cost_per_1k_tokens:  float
    input_output_ratio:  float    # avg output tokens per input token
    cheaper_model:       str | None  # model suggestion if task quality allows
    potential_saving_usd:float


def analyse_model_efficiency(
    window_days: int = 30,
) -> list[ModelEfficiencyReport]:

    usage   = fetch_model_usage_daily(window_days=window_days)
    pricing = fetch_model_pricing()   # loaded from model_pricing table (refreshed monthly)
    reports = []

    for (model_id, dept, team), days in usage.items():
        total_input    = sum(d.input_tokens  for d in days)
        total_output   = sum(d.output_tokens for d in days)
        total_requests = sum(d.request_count  for d in days)
        total_cost     = sum(d.cost_usd       for d in days)

        p              = pricing[model_id]
        io_ratio       = total_output / max(total_input, 1)

        # Find cheaper model in same capability tier
        cheaper_model  = None
        potential_save = 0.0
        for alt_id, alt_p in pricing.items():
            if alt_id == model_id: continue
            if alt_p.capability_tier != p.capability_tier: continue
            alt_cost = (
                (total_input  / 1_000_000) * alt_p.input_price_per_million  +
                (total_output / 1_000_000) * alt_p.output_price_per_million
            )
            saving = total_cost - alt_cost
            if saving > potential_save:
                potential_save = saving
                cheaper_model  = alt_id

        reports.append(ModelEfficiencyReport(
            model_id            = model_id,
            service_name        = p.service_name,
            department          = dept,
            team                = team,
            period_days         = window_days,
            total_requests      = total_requests,
            total_input_tokens  = total_input,
            total_output_tokens = total_output,
            total_cost_usd      = total_cost,
            cost_per_request    = total_cost / max(total_requests, 1),
            cost_per_1k_tokens  = total_cost / max((total_input + total_output) / 1000, 1),
            input_output_ratio  = io_ratio,
            cheaper_model       = cheaper_model,
            potential_saving_usd= potential_save,
        ))

    return sorted(reports, key=lambda r: r.total_cost_usd, reverse=True)

Key AI cost efficiency signals

Network Transfer Cost Attribution

Network transfer is one of the most consistently misunderstood cost categories. The billing entry appears as a single line — “Data Transfer Out” — with a total cost and a total GB figure. But which services generated that transfer? Which teams are responsible? Without additional attribution, all network costs land in __untagged__.

FOCUS 1.3 introduced shared cost allocation columns that provide a standardised way to split shared costs across beneficiaries. For network costs specifically, this enables the cloud provider’s flow-level attribution data to be represented in the billing export where available. Where it isn’t available, the platform can approximate attribution using flow metrics from CloudWatch or Azure Network Watcher.

SQL schema/network_flow_attribution.sql
-- Store flow-level data transfer metrics per source resource
CREATE TABLE network_flow_daily (
    resource_id       varchar     NOT NULL,  -- source of outbound transfer
    provider          varchar     NOT NULL,
    destination_type  varchar,                -- 'internet', 'cross_region', 'cross_az', 'same_az'
    destination_region varchar,
    flow_date         date        NOT NULL,
    gb_transferred    numeric(12,4) NOT NULL,
    estimated_cost    numeric(10,4),          -- derived from per-GB rate for this transfer type
    UNIQUE(resource_id, provider, destination_type, destination_region, flow_date)
);

-- Attribution join: what fraction of total network cost does each resource represent?
CREATE VIEW network_cost_by_team AS
SELECT
    f.department, f.team,
    n.destination_type,
    n.flow_date,
    SUM(n.gb_transferred)    AS total_gb,
    SUM(n.estimated_cost)    AS estimated_cost_usd,
    COUNT(DISTINCT n.resource_id) AS source_resources
FROM  network_flow_daily n
JOIN  cloud_cost_facts f
       ON  f."ResourceId" = n.resource_id
      AND  f.charge_date   = n.flow_date
GROUP BY 1,2,3,4;
Cross-AZ transfer — the hidden cost

Cross-AZ data transfer is typically priced at $0.01/GB in both directions — seemingly small, but services making frequent cross-AZ calls at high throughput accumulate significant charges. The most common cause is a service configured to read from a replica in a different AZ than the application tier, or a microservices architecture with no affinity routing. Network flow attribution makes this visible; the fix is usually an architecture change.

Storage Multi-Dimensional Cost

S3, Azure Blob Storage, and GCS are priced across at least four independent dimensions simultaneously: capacity stored (GB/month), requests (GET, PUT, LIST operations), data retrieval (for infrequent access tiers), and lifecycle transitions. A resource showing low capacity utilisation may still be generating high costs from request volumes or retrieval fees.

DimensionSignalWhat to look for
CapacityGB stored × daysObjects older than retention requirement should be in Glacier or deleted
RequestGET/PUT/LIST operationsHigh LIST counts often indicate a misconfigured polling application
RetrievalGB retrieved from IA/GlacierRetrieval fees indicate objects in cold storage being accessed frequently — wrong storage class
TransferGB transferred outCheck for unintentional public exposure — public buckets serving large files bypass CDN and incur full egress rates

The Generalised Unit Cost View

With the unit cost registry defined, a single view can compute per-unit costs across all registered service types, joining billing data with the appropriate utilisation or usage metric for each service category.

SQL analytics/unit_cost_by_team.sql
-- Generalised unit cost aggregation across all registered service types
SELECT
    f.department,
    f.team,
    f."ServiceCategory",
    f."ServiceName",
    r.cost_dimension,
    r.output_unit,
    f.charge_date,

    SUM(f.effective_cost_usd)               AS total_cost_usd,
    SUM(u.average_value)                     AS total_units,

    -- Cost per output unit
    ROUND(
        SUM(f.effective_cost_usd) /
        NULLIF(SUM(u.average_value), 0),
    6) AS cost_per_unit

FROM  daily_cost_by_team f
JOIN  unit_cost_registry r
       ON  r.service_category = f."ServiceCategory"
      AND  (r.service_name = f."ServiceName" OR r.service_name IS NULL)
LEFT JOIN resource_utilisation u
       ON  u.metric_name = r.metric_name
      AND  u.metric_date = f.charge_date
WHERE f.charge_date >= current_date - 30
GROUP BY 1,2,3,4,5,6,7
HAVING  SUM(u.average_value) > 0
ORDER BY total_cost_usd DESC;
Unit cost trends are more useful than point values

The absolute cost per request for a given service type is interesting. The trend of cost per request over time is actionable. A rising unit cost trend — cost increasing faster than request volume — indicates a degrading efficiency pattern: model selection drift upward, caching disabled, prompt bloat, unnecessary retries, or architecture regressions. Configure anomaly detection on unit cost trends using the same rolling statistical model from Post 3, applied to cost_per_unit as the metric.

What This Adds to the Platform

The generalised unit cost model extends the platform’s analytical reach from capacity-based compute — where CPU utilisation was the efficiency signal — to the full spectrum of cloud service pricing models. AI inference, network transfer, storage operations, and managed service consumption are all now visible through the same lens: cost per unit of business output.

The anomaly history from Post 3, now enriched with unit cost anomalies from this post, feeds directly into Post 5. There, deployment correlation joins these timestamps against git history and CI/CD events to identify the specific code change that caused each cost spike — and to propose a fix back to the codebase as a draft pull request.

Building an optimisation programme?

Rightsizing recommendations, AI cost analysis, and network attribution are only as useful as the processes that act on them. If you’re working on building a FinOps practice and want to discuss governance, tooling, or team structure — get in touch.