Post 1 established that we can reliably answer “how much did each team spend?” For that answer to drive behaviour, it needs to be compared against a target. That is the budget — a pre-agreed upper bound on what a department or team should spend in a given period. The forecast engine projects current spend trends to the end of the period. The alert queue fires when the projection indicates the budget will be breached.
The goal is never to report overages after they happen. It is to give teams enough warning to act before the period closes.
Budgets are stored as period-bounded records with a configurable alert threshold and a cost basis field that determines whether the forecast runs against EffectiveCost or BilledCost.
CREATE TABLE budgets (
    id                  uuid PRIMARY KEY DEFAULT gen_random_uuid(),
    department          varchar NOT NULL,
    team                varchar,  -- NULL means applies to whole department
    period_start        date NOT NULL,
    period_end          date NOT NULL,
    budget_usd          numeric(12,2) NOT NULL,
    cost_basis          varchar NOT NULL DEFAULT 'effective',
    alert_threshold_pct numeric NOT NULL DEFAULT 80,
    created_by          varchar,
    created_at          timestamptz DEFAULT now(),
    CONSTRAINT valid_period CHECK (period_end > period_start),
    CONSTRAINT valid_basis  CHECK (cost_basis IN ('effective', 'billed'))
);

-- Track alert history for deduplication
CREATE TABLE budget_alerts_sent (
    id             uuid PRIMARY KEY DEFAULT gen_random_uuid(),
    budget_id      uuid REFERENCES budgets(id),
    sent_at        timestamptz DEFAULT now(),
    severity       varchar NOT NULL,
    forecast_pct   numeric NOT NULL,
    forecast_total numeric NOT NULL
);

Most teams should start with effective: it reflects true amortised consumption and is the right basis for engineering accountability. Set billed only when a budget must reconcile directly to a cloud invoice, typically for external reporting or contractual commitments.
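As a small illustration of how the cost_basis field might be consumed, the sketch below maps each allowed value to a rollup column and rejects anything else, mirroring the valid_basis constraint. The column names effective_cost_usd and billed_cost_usd are taken from the comment in the evaluator later in this post; cost_column itself is a hypothetical helper, not part of the platform's code.

```python
# Hypothetical helper: translate a budget's cost_basis into the rollup
# column a cost query would sum. Mirrors the valid_basis CHECK constraint.
COST_COLUMNS = {
    "effective": "effective_cost_usd",
    "billed": "billed_cost_usd",
}


def cost_column(cost_basis: str = "effective") -> str:
    """Return the rollup column for a cost basis, rejecting unknown values."""
    try:
        return COST_COLUMNS[cost_basis]
    except KeyError:
        raise ValueError(f"unknown cost_basis: {cost_basis!r}") from None
```

Looking the column up in a fixed mapping, rather than interpolating the raw field into SQL, keeps the set of queryable columns closed even if a bad value slips past the database constraint.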
The forecast answers one question: given spending so far in this budget period, where will we land by the period end if current trends continue? A weighted linear regression over a rolling window provides this projection. Recent days are weighted higher than older days, so the forecast responds to acceleration in spend without being dominated by a single spike.
from dataclasses import dataclass
from datetime import date

import numpy as np


@dataclass
class ForecastResult:
    department: str
    team: str
    period_start: date
    period_end: date
    cost_basis: str
    actual_to_date: float
    forecast_total: float
    daily_rate: float        # projected spend per remaining day
    budget_usd: float
    overage_forecast: float  # negative = under budget
    confidence: str          # 'high' | 'medium' | 'low'
    days_remaining: int


def forecast_period(
    daily_costs: list[tuple[date, float]],
    period_start: date,
    period_end: date,
    budget_usd: float,
    cost_basis: str = 'effective',
    window_days: int = 14,
    dept: str = '',
    team: str = '',
) -> ForecastResult:
    today = date.today()
    # Keep only days inside the period so far, oldest first, so the
    # trailing slice below really is the most recent window.
    period_costs = sorted((d, c) for d, c in daily_costs if period_start <= d <= today)
    actual_to_date = float(sum(c for _, c in period_costs))
    days_elapsed = max((today - period_start).days + 1, 1)
    days_remaining = max((period_end - today).days, 0)
    window = period_costs[-window_days:]

    if len(window) < 3:
        # Too early in period: use simple daily average
        daily_rate = actual_to_date / days_elapsed
        confidence = 'low'
    else:
        xs = np.arange(len(window), dtype=float)
        ys = np.array([c for _, c in window], dtype=float)
        # Linear ramp of weights: oldest day 0.4, newest day 1.0
        weights = np.linspace(0.4, 1.0, len(xs))
        coeffs = np.polyfit(xs, ys, deg=1, w=weights)
        # Fitted value on the most recent day, held flat for the rest of
        # the period and floored at zero
        daily_rate = max(float(np.poly1d(coeffs)(xs[-1])), 0.0)
        confidence = 'high' if len(window) >= 10 else 'medium'

    forecast_total = actual_to_date + daily_rate * days_remaining
    return ForecastResult(
        department=dept, team=team,
        period_start=period_start, period_end=period_end,
        cost_basis=cost_basis,
        actual_to_date=actual_to_date,
        forecast_total=forecast_total,
        daily_rate=daily_rate,
        budget_usd=budget_usd,
        overage_forecast=forecast_total - budget_usd,
        confidence=confidence,
        days_remaining=days_remaining,
    )

The weighted linear model handles steady-state workloads well. For teams with strong weekly seasonality, such as batch jobs that only run on weekends, it may over- or under-project. Post 3’s anomaly detection layer will flag deviations from expected patterns. For orgs with mature cost history, Facebook’s Prophet library handles trend plus seasonality automatically and is worth adopting when the simpler model proves insufficient.
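To see what the recency weighting in forecast_period buys, the sketch below fits the same straight line with and without the 0.4 to 1.0 weight ramp on a series that is flat for a week and then starts climbing. The cost series is made up for illustration, and fitted_latest_rate is a hypothetical helper that isolates just the fitting step.

```python
import numpy as np


def fitted_latest_rate(daily_costs, weighted=True):
    """Fit a straight line to daily costs and return its value on the latest day."""
    ys = np.asarray(daily_costs, dtype=float)
    xs = np.arange(len(ys), dtype=float)
    # Same ramp as forecast_period: oldest day 0.4, newest day 1.0.
    weights = np.linspace(0.4, 1.0, len(xs)) if weighted else None
    coeffs = np.polyfit(xs, ys, deg=1, w=weights)
    return float(np.polyval(coeffs, xs[-1]))


# Made-up series: flat at $100/day for a week, then climbing $20/day.
costs = [100.0] * 7 + [120.0, 140.0, 160.0, 180.0, 200.0, 220.0, 240.0]

weighted_rate = fitted_latest_rate(costs, weighted=True)
unweighted_rate = fitted_latest_rate(costs, weighted=False)
print(f"weighted: {weighted_rate:.2f}  unweighted: {unweighted_rate:.2f}")
```

On this series the weighted fit lands closer to the latest observed $240/day than the unweighted one, which is the "responds to acceleration" behaviour described earlier. Both still sit below the latest value, because a straight line over the whole window is held back by the flat first week.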
The evaluator runs nightly after the daily rollup refreshes. For every active budget it fetches historical costs, runs the forecast, and publishes an alert when the projected total breaches the configured threshold.
from datetime import date

from .forecasting import forecast_period
from .queue import publish_budget_alert
from .db import fetch_active_budgets, fetch_daily_costs, should_suppress


def evaluate_all_budgets() -> None:
    today = date.today()
    budgets = fetch_active_budgets(as_of=today)
    for budget in budgets:
        # numeric columns may arrive as Decimal depending on the driver;
        # convert once so the float arithmetic below cannot raise
        budget_usd = float(budget.budget_usd)
        costs = fetch_daily_costs(
            department=budget.department,
            team=budget.team,
            from_date=budget.period_start,
            to_date=today,
            cost_basis=budget.cost_basis,  # effective_cost_usd or billed_cost_usd
        )
        result = forecast_period(
            daily_costs=costs,
            period_start=budget.period_start,
            period_end=budget.period_end,
            budget_usd=budget_usd,
            cost_basis=budget.cost_basis,
            dept=budget.department,
            team=budget.team or '',
        )
        fcst_pct = (result.forecast_total / budget_usd) * 100
        if fcst_pct < budget.alert_threshold_pct:
            continue  # forecast is below the alert threshold: no alert
        severity = 'CRITICAL' if fcst_pct >= 100 else 'WARNING'
        # Suppress if we already sent this severity and forecast hasn't worsened ≥5%
        if should_suppress(budget.id, severity, fcst_pct):
            continue
        publish_budget_alert(budget=budget, result=result,
                             severity=severity, fcst_pct=fcst_pct)

Every alert is published to a message queue: AWS SNS, GCP Pub/Sub, or Azure Service Bus, depending on your primary cloud. The queue decouples the evaluation engine from delivery destinations. Downstream consumers handle routing to Slack, PagerDuty, email, or any webhook endpoint.
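Before anything reaches the queue, the evaluator defers to should_suppress, which lives in the .db module and is not shown in this post. A minimal sketch of the comparison its comment describes follows, written as a pure function so it can be tested without a database. The function name, the last_sent_pct argument (standing in for the forecast_pct of the most recent budget_alerts_sent row at the same severity), and the reading of "worsened ≥5%" as five percentage points are all assumptions.

```python
from typing import Optional

# Assumption: "worsened" means the forecast rose by at least this many
# percentage points since the last alert of the same severity.
WORSENED_BY_PCT_POINTS = 5.0


def should_suppress_given(last_sent_pct: Optional[float], fcst_pct: float) -> bool:
    """Decide whether a repeat alert should be held back.

    last_sent_pct is the forecast_pct recorded for the most recent alert of
    the same severity on this budget, or None if none has been sent yet.
    """
    if last_sent_pct is None:
        return False  # first alert at this severity: always send
    return fcst_pct - last_sent_pct < WORSENED_BY_PCT_POINTS
```

Because budget_alerts_sent also records sent_at, a time-based deduplication window could be layered on top of this check if repeat reminders are wanted.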
import json
from datetime import datetime, timezone

import boto3

from .db import record_alert_sent, fetch_top_services

sns = boto3.client('sns')
TOPIC = "arn:aws:sns:us-east-1:123456789:finops-budget-alerts"


def publish_budget_alert(budget, result, severity, fcst_pct) -> None:
    top_services = fetch_top_services(
        department=budget.department,
        team=budget.team,
        period_start=budget.period_start,
        limit=5,
        cost_basis=budget.cost_basis,
    )
    payload = {
        "event_type": "BUDGET_FORECAST_OVERAGE",
        "severity": severity,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "department": budget.department,
        "team": budget.team,
        "cost_basis": budget.cost_basis,
        "period_start": budget.period_start.isoformat(),
        "period_end": budget.period_end.isoformat(),
        # float() guards against Decimal values, which json.dumps rejects
        "budget_usd": float(budget.budget_usd),
        "actual_to_date": result.actual_to_date,
        "forecast_total": result.forecast_total,
        "overage_usd": result.overage_forecast,
        "forecast_pct": fcst_pct,
        "daily_rate_usd": result.daily_rate,
        "days_remaining": result.days_remaining,
        "confidence": result.confidence,
        "top_services": [
            {"ServiceName": s.name, "ServiceCategory": s.category,
             "cost_usd": float(s.cost)}
            for s in top_services
        ],
    }
    sns.publish(
        TopicArn=TOPIC,
        Message=json.dumps(payload),
        Subject=(
            f"[{severity}] {budget.department}/{budget.team or 'all'} "
            f"forecast {fcst_pct:.0f}% of budget"
        ),
        # Message attributes let subscribers filter by severity or department
        MessageAttributes={
            'severity': {'DataType': 'String', 'StringValue': severity},
            'department': {'DataType': 'String', 'StringValue': budget.department},
        },
    )
    record_alert_sent(budget.id, severity, fcst_pct, result.forecast_total)

The nightly evaluation sequence is:
[Figure: nightly evaluation sequence. Caption: MessageAttributes enable per-department topic filter subscriptions.]

An alert that says “you are 85% through your budget” produces no action. An alert that says “you are 85% through your budget, forecast to land at 112%, current burn rate $340/day, top driver is Amazon EC2 Compute at $210/day, and here are the 3 largest instances” gives the receiving team something concrete to investigate. The top_services array in the payload is what enables this. Post 3 will enrich this further with specific anomalous resources.
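To make that concrete, here is a sketch of how a downstream consumer might turn the payload into the kind of message described above. The render_alert function, its wording, and the example values are hypothetical; only the payload field names come from publish_budget_alert.

```python
def render_alert(payload: dict) -> str:
    """Format a BUDGET_FORECAST_OVERAGE payload as a short plain-text message."""
    scope = f"{payload['department']}/{payload['team'] or 'all'}"
    pct_used = payload["actual_to_date"] / payload["budget_usd"] * 100
    lines = [
        f"[{payload['severity']}] {scope}: {pct_used:.0f}% of budget used, "
        f"forecast to land at {payload['forecast_pct']:.0f}%",
        f"Burn rate ${payload['daily_rate_usd']:,.0f}/day, "
        f"{payload['days_remaining']} days remaining "
        f"({payload['confidence']} confidence)",
    ]
    if payload["top_services"]:
        lines.append("Top services this period:")
        for svc in payload["top_services"]:
            lines.append(f"  {svc['ServiceName']}: ${svc['cost_usd']:,.0f}")
    return "\n".join(lines)


# Made-up payload values for illustration only.
example = {
    "severity": "CRITICAL",
    "department": "platform",
    "team": None,
    "budget_usd": 10000.0,
    "actual_to_date": 8500.0,
    "forecast_pct": 112.2,
    "daily_rate_usd": 340.0,
    "days_remaining": 8,
    "confidence": "high",
    "top_services": [
        {"ServiceName": "Amazon EC2", "ServiceCategory": "Compute", "cost_usd": 4200.0},
    ],
}
print(render_alert(example))
```

Keeping the rendering in the consumer rather than the publisher means each destination (Slack, PagerDuty, email) can format the same payload differently without touching the evaluation engine.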
With the forecasting layer complete, the platform now actively watches budgets and notifies teams before periods close over budget. The combination of the FOCUS-aligned rollup from Post 1 and the forecast engine here means every alert carries both the historical trend and the projected outcome — not just a point-in-time snapshot.
Post 3 comes back upstream to the attribution layer and addresses the two most common sources of noise in any tag-driven system: singleton groupings that don’t represent real teams, and untagged resources whose ownership can be inferred from cost pattern similarity. These improvements also directly feed the anomaly detection alert payload — when an anomalous resource is untagged, the correlation engine runs immediately to suggest ownership.
Budget thresholds, deduplication windows, and routing logic are all organisation-specific. If you’re building this and want a sounding board — get in touch.