Drumquin Pier, Dingle — Patrick / Unsplash
Every part of this series so far has worked with billing data — what the cloud charged you for running each resource. Billing data is complete, structured, and now standardised by FOCUS. But it is blind to one critical dimension: how much of what you paid for was actually used.
A resource that costs $500/month and runs at 8% average CPU utilisation tells a different story from one that costs $500/month at 95% CPU. The first is a rightsizing candidate — it is paying for capacity it doesn’t need. The second is working efficiently and may even be under-provisioned. Without utilisation data, the billing report treats both identically.
Cost is the bill. Utilisation is the receipt. The gap between them is where the savings hide.
FOCUS solved the billing normalisation problem. There is currently no equivalent open standard for utilisation metrics. Each cloud provider exposes monitoring data differently — CloudWatch, Azure Monitor, and GCP Cloud Monitoring all have different APIs, different metric names, different granularity options, and different retention policies. This post requires per-cloud adapter code that cannot be collapsed into a single universal loader the way FOCUS billing ingestion was.
The metric set depends on the resource type. For a first implementation, focus on compute — it represents the largest share of spend for most organisations and has the clearest utilisation signal.
| Resource Type | Key Metric | Source | Rightsizing signal |
|---|---|---|---|
| EC2 / VM / GCE | cpu_utilisation_pct | CloudWatch / Azure Monitor / GCP | Average < 20% over 14 days → candidate |
| EC2 / VM / GCE | memory_utilisation_pct | CloudWatch agent / VM Insights | Average < 20% → candidate (must install agent) |
| RDS / SQL DB / Cloud SQL | db_connections_avg | CloudWatch / Azure Monitor | Max connections < 10% of provisioned → candidate |
| RDS / SQL DB / Cloud SQL | storage_used_pct | CloudWatch / Azure Monitor | Used < 30% of provisioned storage |
| EBS / Managed Disk | iops_utilisation_pct | CloudWatch / Azure Monitor | Peak IOPS < 20% of provisioned |
| Load Balancer | active_connections_avg | CloudWatch / Azure Monitor | Sustained near-zero → likely orphaned |
Unlike FOCUS billing exports which are delivered to cloud storage daily, utilisation metrics are fetched via API. Each cloud has its own API surface — the adapters below normalise these into a common schema.
import boto3
from datetime import datetime, timedelta, timezone
from .schema import UtilisationRecord
cw = boto3.client('cloudwatch')
ec2 = boto3.client('ec2')
METRICS = [
('AWS/EC2', 'CPUUtilization', 'cpu_utilisation_pct'),
('CWAgent', 'mem_used_percent', 'memory_utilisation_pct'),
]
def fetch_ec2_utilisation(target_date: datetime.date) -> list[UtilisationRecord]:
start = datetime(*target_date.timetuple()[:3], tzinfo=timezone.utc)
end = start + timedelta(days=1)
instances = ec2.describe_instances(
Filters=[{'Name': 'instance-state-name', 'Values': ['running']}]
)
instance_ids = [
i['InstanceId']
for r in instances['Reservations']
for i in r['Instances']
]
records = []
for namespace, metric_name, field_name in METRICS:
for instance_id in instance_ids:
resp = cw.get_metric_statistics(
Namespace = namespace,
MetricName = metric_name,
Dimensions = [{'Name': 'InstanceId', 'Value': instance_id}],
StartTime = start, EndTime = end,
Period = 86400, # 1 day in seconds
Statistics = ['Average', 'Maximum'],
)
if not resp['Datapoints']:
continue
dp = resp['Datapoints'][0]
records.append(UtilisationRecord(
resource_id = instance_id,
provider = 'aws',
metric_name = field_name,
metric_date = target_date,
average_value = dp['Average'],
max_value = dp['Maximum'],
))
return records CREATE TABLE resource_utilisation (
id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
resource_id varchar NOT NULL,
provider varchar NOT NULL,
metric_name varchar NOT NULL, -- 'cpu_utilisation_pct', 'memory_utilisation_pct' etc
metric_date date NOT NULL,
average_value numeric(8,4),
max_value numeric(8,4),
p95_value numeric(8,4),
fetched_at timestamptz DEFAULT now(),
UNIQUE (resource_id, provider, metric_name, metric_date)
) PARTITION BY RANGE (metric_date);
CREATE INDEX ON resource_utilisation (resource_id, metric_date);
CREATE INDEX ON resource_utilisation (metric_name, metric_date); With both tables populated, we can join them by resource ID and date to produce a combined cost/utilisation view. This is the analytical foundation for the rightsizing engine.
CREATE MATERIALIZED VIEW resource_cost_utilisation AS
SELECT
f."ResourceId" AS resource_id,
f."ProviderName" AS provider,
f."ServiceName",
f."ServiceCategory",
f.department, f.team, f.environment,
f.charge_date,
-- Billing dimensions
SUM(f."BilledCost") AS billed_cost_usd,
SUM(COALESCE(f."EffectiveCost", f."BilledCost")) AS effective_cost_usd,
-- Utilisation dimensions (null if no monitoring data)
MAX(cpu.average_value) AS cpu_avg_pct,
MAX(cpu.max_value) AS cpu_max_pct,
MAX(mem.average_value) AS mem_avg_pct,
MAX(mem.max_value) AS mem_max_pct
FROM cloud_cost_facts f
LEFT JOIN resource_utilisation cpu
ON cpu.resource_id = f."ResourceId"
AND cpu.metric_date = f.charge_date
AND cpu.metric_name = 'cpu_utilisation_pct'
LEFT JOIN resource_utilisation mem
ON mem.resource_id = f."ResourceId"
AND mem.metric_date = f.charge_date
AND mem.metric_name = 'memory_utilisation_pct'
GROUP BY 1,2,3,4,5,6,7,8
WITH DATA; With cost and utilisation joined per resource, every resource can be positioned in a two-dimensional space. The quadrant it occupies determines the appropriate optimisation action.
| Quadrant | Description | Action |
|---|---|---|
| High Cost · High Utilisation | Working as intended. Cost justified by demand. | Consider commitment discounts (Reserved Instances, Savings Plans) to reduce effective cost at current utilisation. |
| Low Cost · High Utilisation | Efficient but at risk. May be under-provisioned. | Monitor for performance degradation. Not a rightsizing candidate — possibly a previous optimisation success. |
| High Cost · Low Utilisation | Primary rightsizing targets. Oversized instance types, unused Reserved Instances, or dev environments sized for production workloads. | Downsize to recommended SKU. |
| Low Cost · Low Utilisation | Orphan candidates. Low absolute cost but many such resources aggregate to material waste. | Evaluate for termination — especially resources with zero utilisation for 7+ days. |
-- Classify resources by 14-day rolling cost and utilisation averages
SELECT
resource_id, provider, "ServiceName", department, team,
AVG(effective_cost_usd) AS avg_daily_cost,
AVG(cpu_avg_pct) AS avg_cpu_pct,
CASE
WHEN AVG(effective_cost_usd) > cost_p50
AND AVG(cpu_avg_pct) > util_p50 THEN 'high_cost_high_util'
WHEN AVG(effective_cost_usd) > cost_p50
AND AVG(cpu_avg_pct) <= util_p50 THEN 'high_cost_low_util'
WHEN AVG(effective_cost_usd) <= cost_p50
AND AVG(cpu_avg_pct) > util_p50 THEN 'low_cost_high_util'
ELSE 'low_cost_low_util'
END AS quadrant
FROM resource_cost_utilisation,
-- Dynamic percentile thresholds from rolling window
(SELECT
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY effective_cost_usd) AS cost_p50,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY cpu_avg_pct) AS util_p50
FROM resource_cost_utilisation
WHERE charge_date >= current_date - 14
AND cpu_avg_pct IS NOT NULL) thresholds
WHERE charge_date >= current_date - 14
AND cpu_avg_pct IS NOT NULL
GROUP BY 1,2,3,4,5, cost_p50, util_p50
ORDER BY avg_daily_cost DESC; The utilisation layer classifies resources into quadrants. The rightsizing engine converts that classification into a concrete recommendation: a specific smaller SKU, a monthly saving in dollars, and enough supporting context for an engineer to act on the suggestion without additional research.
A recommendation that says “this instance is oversized” is interesting. A recommendation that says “downsize i-0a4b2c8d from m6i.4xlarge to m6i.xlarge — your 14-day peak CPU was 18%, peak memory was 24%, estimated monthly saving is $312” is actionable. The difference is a SKU catalogue and a quantification step.
The job of a rightsizing engine is not to identify waste. It is to make acting on waste cheaper than ignoring it.
Each cloud provider offers a hierarchy of instance sizes. The SKU catalogue is a reference table mapping instance types to their vCPU, memory, and hourly on-demand price. AWS publishes this via the Pricing API; Azure and GCP have equivalent endpoints. Refresh the catalogue monthly — prices and new instance types change frequently.
CREATE TABLE instance_skus (
provider varchar NOT NULL,
instance_type varchar NOT NULL,
family varchar, -- 'm6i', 'r6i', 'c6i' etc
vcpu integer,
memory_gb numeric(8,2),
price_usd_hour numeric(10,6),
region varchar,
os varchar DEFAULT 'Linux',
refreshed_at timestamptz DEFAULT now(),
PRIMARY KEY (provider, instance_type, region)
);
-- Lookup function: given a family and target vCPU, return the smallest fitting SKU
CREATE OR REPLACE FUNCTION next_smaller_sku(
p_provider varchar,
p_instance_type varchar,
p_region varchar,
p_headroom_pct numeric DEFAULT 30 -- keep 30% headroom above peak utilisation
) RETURNS TABLE(instance_type varchar, vcpu integer, memory_gb numeric, price_usd_hour numeric) AS $$
WITH current_sku AS (
SELECT family, vcpu, memory_gb, price_usd_hour
FROM instance_skus
WHERE provider = p_provider AND instance_type = p_instance_type AND region = p_region
)
SELECT s.instance_type, s.vcpu, s.memory_gb, s.price_usd_hour
FROM instance_skus s, current_sku c
WHERE s.provider = p_provider
AND s.family = c.family
AND s.region = p_region
AND s.vcpu < c.vcpu
ORDER BY s.vcpu DESC
LIMIT 1;
$$ LANGUAGE sql; The engine iterates over resources in the high-cost/low-utilisation quadrant, computes the recommended downsize target using peak utilisation (not average — always size against the peak), and quantifies the monthly saving as the cost delta between current and recommended instance types.
from dataclasses import dataclass
from datetime import date
from .db import fetch_high_cost_low_util, fetch_sku_recommendation, upsert_recommendation
@dataclass
class RightsizingRecommendation:
resource_id: str
provider: str
region: str
current_instance: str
current_cost_month: float
recommended_instance:str
recommended_cost_month:float
monthly_saving_usd: float
cpu_avg_pct: float
cpu_peak_pct: float
mem_peak_pct: float | None
window_days: int
confidence: str
department: str | None
team: str | None
def generate_recommendations(
window_days: int = 14,
min_saving_usd: float = 50.0, # ignore sub-$50/mo savings
headroom_pct: float = 30.0, # reserve 30% above peak
) -> list[RightsizingRecommendation]:
candidates = fetch_high_cost_low_util(window_days=window_days)
recommendations = []
for resource in candidates:
# Use peak (not average) CPU for sizing — never downsize below peak needs
effective_util = max(resource.cpu_peak_pct, resource.mem_peak_pct or 0)
required_capacity = effective_util * (1 + headroom_pct / 100)
rec_sku = fetch_sku_recommendation(
provider = resource.provider,
instance_type = resource.instance_type,
region = resource.region,
required_pct = required_capacity,
headroom_pct = headroom_pct,
)
if not rec_sku:
continue # already smallest in family
current_cost_month = resource.avg_daily_cost * 30
recommended_cost_month = rec_sku.price_usd_hour * 24 * 30
saving = current_cost_month - recommended_cost_month
if saving < min_saving_usd:
continue
confidence = (
'high' if window_days >= 14 and resource.cpu_peak_pct < 30
else 'medium' if resource.cpu_peak_pct < 50
else 'low'
)
rec = RightsizingRecommendation(
resource_id = resource.resource_id,
provider = resource.provider,
region = resource.region,
current_instance = resource.instance_type,
current_cost_month = current_cost_month,
recommended_instance = rec_sku.instance_type,
recommended_cost_month = recommended_cost_month,
monthly_saving_usd = saving,
cpu_avg_pct = resource.cpu_avg_pct,
cpu_peak_pct = resource.cpu_peak_pct,
mem_peak_pct = resource.mem_peak_pct,
window_days = window_days,
confidence = confidence,
department = resource.department,
team = resource.team,
)
recommendations.append(rec)
upsert_recommendation(rec)
return sorted(recommendations, key=lambda r: r.monthly_saving_usd, reverse=True) Always include caveats in recommendation delivery. Peak CPU over 14 days may not capture seasonal load spikes — a service that handles Black Friday traffic at 10× normal load will appear safe to downsize in January. Require explicit acknowledgement from the resource owner before any resize is actioned. The recommendation engine surfaces the opportunity; the human decides whether to take it.
Rightsizing recommendations are stored in a rightsizing_recommendations table and surfaced in the budget alert payload for teams that are over budget. Rather than a generic “your forecast is 112% of budget” alert, the message now includes: “your three highest-potential rightsizing recommendations would reduce spend by $706/month — here are the resources.” This transforms budget alerts from information into a concrete action list.
The utilisation data also provides a second, independent signal for attribution that complements the temporal correlation from Post 3. Where temporal correlation asks “do these resources’ costs move in lockstep over time?”, structural clustering asks “do these resources occupy the same position in cost/utilisation space?”
This addresses resources that are new, or whose cost time series is too short or too flat for temporal correlation to be meaningful. Resources that cluster tightly together in cost/utilisation space likely belong to the same workload.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler
from .db import fetch_resource_cost_util_vectors, fetch_tagged_group_centroids
def structural_attribution_suggestions(
window_days: int = 14,
eps: float = 0.5, # DBSCAN neighbourhood radius
min_samples: int = 3,
) -> list[dict]:
# Fetch [resource_id, avg_cost, avg_cpu, avg_mem] vectors
vectors = fetch_resource_cost_util_vectors(window_days=window_days)
untagged = [v for v in vectors if v.team is None]
tagged = [v for v in vectors if v.team is not None]
if not untagged or not tagged:
return []
# Normalise all vectors to unit scale (cost and util are different magnitudes)
scaler = StandardScaler()
all_X = np.array([[v.avg_cost, v.avg_cpu, v.avg_mem or 0] for v in vectors])
all_X_norm = scaler.fit_transform(all_X)
tagged_X = all_X_norm[:len(tagged)]
untagged_X = all_X_norm[len(tagged):]
# Cluster the full space — DBSCAN handles irregular shapes better than k-means
labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(all_X_norm)
suggestions = []
for i, (resource, label) in enumerate(zip(untagged, labels[len(tagged):])):
if label == -1:
continue # DBSCAN noise point — no cluster membership
# Find which tagged resources share this cluster label
cluster_tagged = [
tagged[j] for j, l in enumerate(labels[:len(tagged)]) if l == label
]
if not cluster_tagged:
continue
# If cluster is dominated by one team, suggest that attribution
teams = [t.team for t in cluster_tagged]
top_team = max(set(teams), key=teams.count)
dominance = teams.count(top_team) / len(teams)
if dominance >= 0.7: # 70%+ of cluster is one team → suggest it
top_tagged = next(t for t in cluster_tagged if t.team == top_team)
suggestions.append({
'resource_id': resource.resource_id,
'suggested_dept': top_tagged.department,
'suggested_team': top_team,
'method': 'structural_cluster',
'cluster_label': int(label),
'dominance': dominance,
'cluster_size': len(cluster_tagged),
})
return suggestions Use both attribution methods and combine their outputs in the suggestion review UI. A resource suggested by both temporal correlation (Post 3) and structural clustering (this section) is a high-confidence candidate for the suggested team. A resource suggested by only one method deserves more scrutiny before confirmation. The source field in attribution_overrides can store 'temporal_correlation', 'structural_cluster', or 'both' to track provenance.
The CPU utilisation framework is grounded in a specific billing model: you provision capacity (an instance size), and efficiency is measured as the fraction of that capacity actually consumed. This maps naturally to EC2, VM-series, and most compute resources.
But a growing portion of cloud spend doesn’t follow this model at all. AI inference APIs charge per token, not per hour. Data transfer charges accumulate per gigabyte moved, not per instance running. Managed databases charge for operations as well as storage. SaaS APIs charge per API call or per seat. None of these have a “provisioned capacity” to compare consumption against.
The question for these service types is not “what percentage of provisioned capacity did you use?” It is “what did each unit of business output cost, and how does that compare to what it should cost?” That reframing requires a generalised unit cost model.
CPU efficiency tells you how well you used what you provisioned. Unit cost tells you how efficiently you’re converting cloud spend into business output.
The generalised model is built around a registry that maps resource types to their relevant cost dimension and the business event that represents one unit of output. The registry is configuration, not code — it can be extended to new service types without modifying the cost calculation engine.
CREATE TABLE unit_cost_registry (
id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
service_category varchar NOT NULL, -- 'AI and Machine Learning', 'Networking' etc
service_name varchar, -- 'Amazon Bedrock', 'Azure OpenAI' etc — NULL = all
cost_dimension varchar NOT NULL, -- 'tokens', 'gb_transferred', 'api_calls', 'operations'
output_unit varchar NOT NULL, -- 'request', 'document_processed', 'image_generated'
metric_source varchar NOT NULL, -- 'cloudwatch', 'billing_tags', 'app_telemetry'
metric_name varchar NOT NULL, -- metric key in resource_utilisation or app event store
notes text
);
-- Seed with known service types
INSERT INTO unit_cost_registry (service_category, service_name, cost_dimension, output_unit, metric_source, metric_name) VALUES
('AI and Machine Learning', 'Amazon Bedrock', 'tokens', 'request', 'billing_tags', 'input_tokens,output_tokens'),
('AI and Machine Learning', 'Azure OpenAI', 'tokens', 'request', 'billing_tags', 'input_tokens,output_tokens'),
('Networking', NULL, 'gb_transferred', 'gb', 'cloudwatch', 'NetworkOut'),
('Storage', 'Amazon S3', 'operations', '1000_requests', 'billing_tags', 'request_count'),
('Database', 'Amazon DynamoDB', 'operations', 'read_write_unit', 'cloudwatch', 'ConsumedWriteCapacityUnits'); AI inference is now a material line item in cloud bills for any organisation using LLM-based features. The cost structure is unusual: input tokens and output tokens are priced differently (output tokens cost 3–5× more than input tokens for most frontier models), and the model selection — which model you use for a given task — is often the single biggest lever on unit cost.
FOCUS 1.2 introduced x_ServiceUsageType and usage quantity columns that, for AI services, may expose token counts directly in the billing export. Check your provider’s FOCUS documentation — if token counts are present in the export, you can skip the CloudWatch metric lookup for AI services entirely. AWS Bedrock populates these columns for supported models.
from dataclasses import dataclass
from .db import fetch_model_usage_daily, fetch_model_pricing
@dataclass
class ModelEfficiencyReport:
model_id: str
service_name: str
department: str
team: str
period_days: int
total_requests: int
total_input_tokens: int
total_output_tokens: int
total_cost_usd: float
cost_per_request: float
cost_per_1k_tokens: float
input_output_ratio: float # avg output tokens per input token
cheaper_model: str | None # model suggestion if task quality allows
potential_saving_usd:float
def analyse_model_efficiency(
window_days: int = 30,
) -> list[ModelEfficiencyReport]:
usage = fetch_model_usage_daily(window_days=window_days)
pricing = fetch_model_pricing() # loaded from model_pricing table (refreshed monthly)
reports = []
for (model_id, dept, team), days in usage.items():
total_input = sum(d.input_tokens for d in days)
total_output = sum(d.output_tokens for d in days)
total_requests = sum(d.request_count for d in days)
total_cost = sum(d.cost_usd for d in days)
p = pricing[model_id]
io_ratio = total_output / max(total_input, 1)
# Find cheaper model in same capability tier
cheaper_model = None
potential_save = 0.0
for alt_id, alt_p in pricing.items():
if alt_id == model_id: continue
if alt_p.capability_tier != p.capability_tier: continue
alt_cost = (
(total_input / 1_000_000) * alt_p.input_price_per_million +
(total_output / 1_000_000) * alt_p.output_price_per_million
)
saving = total_cost - alt_cost
if saving > potential_save:
potential_save = saving
cheaper_model = alt_id
reports.append(ModelEfficiencyReport(
model_id = model_id,
service_name = p.service_name,
department = dept,
team = team,
period_days = window_days,
total_requests = total_requests,
total_input_tokens = total_input,
total_output_tokens = total_output,
total_cost_usd = total_cost,
cost_per_request = total_cost / max(total_requests, 1),
cost_per_1k_tokens = total_cost / max((total_input + total_output) / 1000, 1),
input_output_ratio = io_ratio,
cheaper_model = cheaper_model,
potential_saving_usd= potential_save,
))
return sorted(reports, key=lambda r: r.total_cost_usd, reverse=True) Network transfer is one of the most consistently misunderstood cost categories. The billing entry appears as a single line — “Data Transfer Out” — with a total cost and a total GB figure. But which services generated that transfer? Which teams are responsible? Without additional attribution, all network costs land in __untagged__.
FOCUS 1.3 introduced shared cost allocation columns that provide a standardised way to split shared costs across beneficiaries. For network costs specifically, this enables the cloud provider’s flow-level attribution data to be represented in the billing export where available. Where it isn’t available, the platform can approximate attribution using flow metrics from CloudWatch or Azure Network Watcher.
-- Store flow-level data transfer metrics per source resource
CREATE TABLE network_flow_daily (
resource_id varchar NOT NULL, -- source of outbound transfer
provider varchar NOT NULL,
destination_type varchar, -- 'internet', 'cross_region', 'cross_az', 'same_az'
destination_region varchar,
flow_date date NOT NULL,
gb_transferred numeric(12,4) NOT NULL,
estimated_cost numeric(10,4), -- derived from per-GB rate for this transfer type
UNIQUE(resource_id, provider, destination_type, destination_region, flow_date)
);
-- Attribution join: what fraction of total network cost does each resource represent?
CREATE VIEW network_cost_by_team AS
SELECT
f.department, f.team,
n.destination_type,
n.flow_date,
SUM(n.gb_transferred) AS total_gb,
SUM(n.estimated_cost) AS estimated_cost_usd,
COUNT(DISTINCT n.resource_id) AS source_resources
FROM network_flow_daily n
JOIN cloud_cost_facts f
ON f."ResourceId" = n.resource_id
AND f.charge_date = n.flow_date
GROUP BY 1,2,3,4; Cross-AZ data transfer is typically priced at $0.01/GB in both directions — seemingly small, but services making frequent cross-AZ calls at high throughput accumulate significant charges. The most common cause is a service configured to read from a replica in a different AZ than the application tier, or a microservices architecture with no affinity routing. Network flow attribution makes this visible; the fix is usually an architecture change.
S3, Azure Blob Storage, and GCS are priced across at least four independent dimensions simultaneously: capacity stored (GB/month), requests (GET, PUT, LIST operations), data retrieval (for infrequent access tiers), and lifecycle transitions. A resource showing low capacity utilisation may still be generating high costs from request volumes or retrieval fees.
| Dimension | Signal | What to look for |
|---|---|---|
| Capacity | GB stored × days | Objects older than retention requirement should be in Glacier or deleted |
| Request | GET/PUT/LIST operations | High LIST counts often indicate a misconfigured polling application |
| Retrieval | GB retrieved from IA/Glacier | Retrieval fees indicate objects in cold storage being accessed frequently — wrong storage class |
| Transfer | GB transferred out | Check for unintentional public exposure — public buckets serving large files bypass CDN and incur full egress rates |
With the unit cost registry defined, a single view can compute per-unit costs across all registered service types, joining billing data with the appropriate utilisation or usage metric for each service category.
-- Generalised unit cost aggregation across all registered service types
SELECT
f.department,
f.team,
f."ServiceCategory",
f."ServiceName",
r.cost_dimension,
r.output_unit,
f.charge_date,
SUM(f.effective_cost_usd) AS total_cost_usd,
SUM(u.average_value) AS total_units,
-- Cost per output unit
ROUND(
SUM(f.effective_cost_usd) /
NULLIF(SUM(u.average_value), 0),
6) AS cost_per_unit
FROM daily_cost_by_team f
JOIN unit_cost_registry r
ON r.service_category = f."ServiceCategory"
AND (r.service_name = f."ServiceName" OR r.service_name IS NULL)
LEFT JOIN resource_utilisation u
ON u.metric_name = r.metric_name
AND u.metric_date = f.charge_date
WHERE f.charge_date >= current_date - 30
GROUP BY 1,2,3,4,5,6,7
HAVING SUM(u.average_value) > 0
ORDER BY total_cost_usd DESC; The absolute cost per request for a given service type is interesting. The trend of cost per request over time is actionable. A rising unit cost trend — cost increasing faster than request volume — indicates a degrading efficiency pattern: model selection drift upward, caching disabled, prompt bloat, unnecessary retries, or architecture regressions. Configure anomaly detection on unit cost trends using the same rolling statistical model from Post 3, applied to cost_per_unit as the metric.
The generalised unit cost model extends the platform’s analytical reach from capacity-based compute — where CPU utilisation was the efficiency signal — to the full spectrum of cloud service pricing models. AI inference, network transfer, storage operations, and managed service consumption are all now visible through the same lens: cost per unit of business output.
The anomaly history from Post 3, now enriched with unit cost anomalies from this post, feeds directly into Post 5. There, deployment correlation joins these timestamps against git history and CI/CD events to identify the specific code change that caused each cost spike — and to propose a fix back to the codebase as a draft pull request.
Rightsizing recommendations, AI cost analysis, and network attribution are only as useful as the processes that act on them. If you’re working on building a FinOps practice and want to discuss governance, tooling, or team structure — get in touch.