Changelog
What we ship, every week.
BlackLake is AI Control Infrastructure and Analytics. Here’s what’s landed recently — capture paths, policies, approvals, cost control, receipts, baselines, anomalies, exports, and the occasional UX correction. Newest first.
Heads up — npm package rename.
The five old artifacts (@blacklake-systems/surface-cli, @blacklake-systems/surface-sdk, @blacklake-systems/depth-cli, @blacklake-systems/depth-sdk, blx) have collapsed into a single blacklake npm package. Old entries below still reference the old names — that’s history, not advice. See the migration doc for search-and-replace examples.
- Feature
Stripe billing, plan limits, and value-moment nudges
Self-serve checkout, customer portal, and signed webhook delivery wire BlackLake into Stripe end-to-end. Free tier: 10,000 actions / month with a hard 402 PLAN_LIMIT_REACHED before the call burns quota; Team: 1,000,000 fair-use soft warn; Enterprise: contractual. /v1/billing exposes plan, usage block (used/limit/remaining/period_key), and trial_end so the console renders a coloured quota meter, an 80% nudge, a 100% notice, and the trial countdown without computing percentages against Infinity. Tier-aware audit retention (30d / 1y / 7y) shipped alongside.
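The quota meter logic above can be sketched roughly as follows. Only the usage field names and the 80% / 100% thresholds come from this entry; the `UsageBlock` type, the `null` encoding for plans without a fixed limit, and the `meterState` helper are assumptions made for illustration, not the console's actual code.

```typescript
// Hypothetical shape of the usage block from /v1/billing.
// Field names follow the changelog; the nullable limit is an assumption.
interface UsageBlock {
  used: number;
  limit: number | null; // null = no fixed limit (assumed encoding)
  remaining: number | null;
  period_key: string;
}

type MeterState = "ok" | "nudge" | "limit_reached" | "unmetered";

// Derive the meter state without ever computing a ratio against Infinity.
function meterState(usage: UsageBlock): MeterState {
  if (usage.limit === null || usage.limit <= 0) return "unmetered";
  const ratio = usage.used / usage.limit;
  if (ratio >= 1) return "limit_reached"; // 100% notice
  if (ratio >= 0.8) return "nudge"; // 80% nudge
  return "ok";
}
```

Keeping the "no limit" case as an explicit state means the UI never has to render a percentage for contractual plans.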
- Improvement
Adoption work — onboarding, mobile approvals, lifecycle emails, manifests
Mobile-first approval card at /m/approvals/:id (PWA-friendly, 56px tap targets, decision_channel=mobile). Lifecycle email cron (24h first MCP, 72h first govern, 7d first approval, 100-action milestone) — each (org, kind) fires at most once. Bulk JSON-manifest apply at POST /v1/manifest/apply covers agents / tools / policies / bindings / webhooks with dry-run + idempotent re-apply. Demo→prod promotion at POST /v1/demo/promote (in-place) and /v1/demo/promote-to-workspace (cross-workspace, admin-on-target). Trial countdown phase-gated copy (>7d quiet → 3-7d value framing → ≤2d urgent).
- Improvement
Foundation — typed errors, content-negotiated envelopes, contract tests
SDK exposes BusinessRuleError (422), ConflictError (409), PaymentRequiredError (402) subclasses; HTTP status semantics audited across 9 routes (APPROVAL_EXPIRED, LAST_ADMIN, NO_FAILED_DELIVERIES, etc — all 400→422). Content-negotiated singular response envelope: SDK sends Accept-Envelope: v2 and peels { data: T } automatically; pre-FND-17 servers and not-yet-migrated routes still work. SDK ↔ API contract tests grew to 21 cases covering Agents, Tools, Policies, Approvals (request → approve → reject), Webhooks, Cost, Audit, MCP, Budgets. OpenAPI now exports 12 entity schemas as named components (Agent, Tool, Policy, Evaluation, Approval, Webhook, Budget, Organisation, etc) so console types are real shapes instead of {[key: string]: unknown}.
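A minimal sketch of how status-to-error mapping and envelope peeling could fit together. The three subclass names and their status codes come from this entry; the `ApiError` base class, the constructor shape, and the `errorForStatus` / `peelEnvelope` helpers are hypothetical, and the real SDK may detect envelopes differently (for example from a response header).

```typescript
// Base class and constructor shape are assumptions made for this sketch.
class ApiError extends Error {
  readonly status: number;
  readonly code: string;
  constructor(status: number, code: string) {
    super(`${status} ${code}`);
    this.status = status;
    this.code = code;
    this.name = new.target.name;
  }
}
class PaymentRequiredError extends ApiError {} // 402
class ConflictError extends ApiError {} // 409
class BusinessRuleError extends ApiError {} // 422

// Map an HTTP status to the most specific error subclass.
function errorForStatus(status: number, code: string): ApiError {
  switch (status) {
    case 402: return new PaymentRequiredError(status, code);
    case 409: return new ConflictError(status, code);
    case 422: return new BusinessRuleError(status, code);
    default: return new ApiError(status, code);
  }
}

// Peel a { data: T } envelope; pass bare (legacy) bodies through unchanged.
// Limitation of this heuristic: a legacy body with its own top-level
// "data" field would be peeled by mistake.
function peelEnvelope<T>(body: unknown): T {
  if (typeof body === "object" && body !== null && "data" in body) {
    return (body as { data: T }).data;
  }
  return body as T;
}
```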
- Fix
Production divergences fixed — 6 routes
/v1/audit/export 500 → root cause: action_results.agent_id is a phantom column; demo-isolation guard now walks evaluation_id → policy_evaluations.agent_id. POST /v1/tools 500 on duplicate name → now returns 409 TOOL_NAME_CONFLICT. webhook.test deliveries now persist signed_timestamp_ms (delivery was always signed in transit; only the audit row was missing the field). FND-19 idempotency middleware is mounted globally on /v1/*: every mutating endpoint replays cached responses on an Idempotency-Key match and returns 422 when a key is reused with a different body. denial_reason now persists on the evaluation row (was returned in the govern() response but lost on the next list call). surface-sdk republished 0.3.0 → 0.3.1 with corrected enums (CapturePath, ActorSource, BudgetScope, WebhookEvent) and missing methods (insights.coverageTrend / explain, cost.orphans, system.me / quota, webhooks.health / deadLetter, mcp.upstreams.*).
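The idempotency behaviour described in this entry (replay on key match, 422 on key reuse with a different body) can be illustrated with an in-memory sketch. This is not the FND-19 middleware itself: the `IdempotencyCache` class, the raw-body comparison, and the `IDEMPOTENCY_KEY_REUSED` error code are hypothetical names chosen for the example.

```typescript
interface StoredResponse {
  status: number;
  body: string;
}

class IdempotencyCache {
  // key -> the request body first seen with that key, plus its response.
  // A real implementation would likely store a hash and expire entries.
  private readonly seen = new Map<string, { requestBody: string; response: StoredResponse }>();

  handle(key: string, requestBody: string, run: () => StoredResponse): StoredResponse {
    const prior = this.seen.get(key);
    if (prior !== undefined) {
      if (prior.requestBody !== requestBody) {
        // Same key, different payload: refuse instead of replaying.
        return { status: 422, body: JSON.stringify({ error: "IDEMPOTENCY_KEY_REUSED" }) };
      }
      return prior.response; // replay the cached response, handler not re-run
    }
    const response = run();
    this.seen.set(key, { requestBody, response });
    return response;
  }
}
```

The point of the replay path is that a retried POST with the same key and body has no second side effect.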
- Feature
Unified npm package — `blacklake`
Five npm artifacts collapse into one: @blacklake-systems/surface-cli, @blacklake-systems/surface-sdk, @blacklake-systems/depth-cli, @blacklake-systems/depth-sdk, and the standalone blx shell wrapper now ship as `blacklake` on npm. One install gives you the SDK (`import { govern, workflow, step } from 'blacklake'`), the CLI (`npx blacklake init / login / serve / run / policy simulate`), and the blx alias (`blx git push`). Old artifacts remain published for two minor versions as deprecated thin re-exports — see /docs/migration-from-old-packages for sed-style search/replace examples.
- Improvement
Category claim — AI Control Infrastructure and Analytics
BlackLake is now positioned as AI Control Infrastructure and Analytics — the control plane that decides what runs (allow, deny, approve, cap pre-spend) and the ledger that proves what did (cost attribution, baselines, anomalies, signed receipts, audit-ready exports). Engineering is where consequential AI work is already happening, but the platform message scales to security, finance, compliance, and ops as the workforce shifts. The April 30 'Tighter positioning' move (narrowing the headline to engineers) is reversed by this entry; that note remains in the changelog as history. Hero, footer, Product page, About, docs, and meta descriptions all updated. /why-ai-control lands the canonical definition.
- Feature
Cost capture + governance — Anthropic, OpenAI, Bedrock, Vertex, Foundry, Gemini, Ollama
Every observed LLM call now writes to a versioned cost_records table with input/output/cache/thinking decomposition and a pricing-snapshot stamp. The /proxy/* paths capture automatically; SDK callers attribute via POST /v1/cost/record (Bedrock/Vertex/Foundry/Gemini under your own auth). Cost rolls up onto the evaluation's cost_summary; the receipt's v2 decision_token cryptographically binds (evaluation_id, decision, cost_summary). Pricing config is JSON, versioned, per-provider — historical totals stay stable when prices change in future releases.
- Feature
Budgets — soft alerts + hard deny at govern() time
First-class budget primitive scoped to workspace, AI Actor, tool, or user with per-task / day / week / month windows and IANA timezones. Hard limits deny at govern() before the spend; soft limits fire webhooks at 50% / 80% / 100%. Reuses the existing webhook channel — Slack, email, PagerDuty wire up unchanged. Budget alerts dedupe per (budget_id, period, threshold) so a busy hour doesn't fire the same notification thirty times.
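The dedupe rule above, one alert per (budget_id, period, threshold), might look like this in miniature. The `BudgetAlerter` class and its in-memory `Set` are assumptions for illustration; the thresholds are the 50% / 80% / 100% values from this entry.

```typescript
const THRESHOLDS = [50, 80, 100];

class BudgetAlerter {
  // Keys already alerted on, as "budget_id:period:threshold".
  private readonly fired = new Set<string>();

  // Returns the thresholds crossed for the first time in this period.
  check(budgetId: string, period: string, spentUsd: number, limitUsd: number): number[] {
    const pct = (spentUsd / limitUsd) * 100;
    const newlyCrossed: number[] = [];
    for (const threshold of THRESHOLDS) {
      const key = `${budgetId}:${period}:${threshold}`;
      if (pct >= threshold && !this.fired.has(key)) {
        this.fired.add(key);
        newlyCrossed.push(threshold);
      }
    }
    return newlyCrossed;
  }
}
```

Because the period is part of the key, the same budget alerts again when a new window starts.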
- Feature
Cost-aware policy DSL (mode: enforce | monitor)
Policies can now carry cost_conditions: { all: [{ signal, op, value }] }. Signals: agent.spend_today_usd, agent.spend_per_task_usd, agent.cumulative_spend_session_usd, tool.estimated_cost_usd, tool.input_tokens, tool.model, workspace.spend_today_usd, user.spend_today_usd. Operators: gt/gte/lt/lte/eq/ne/in. Cost-aware policies default to mode='monitor' so operators can observe before they restrict. Selector-only policies behave exactly as before.
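To make the condition shape concrete, here is a small evaluator sketch for `cost_conditions: { all: [...] }`. The operators and signal names come from this entry; the type names, the `evaluateCostConditions` helper, and the choice to treat a missing signal as a non-match are assumptions, not the engine's documented behaviour.

```typescript
type Op = "gt" | "gte" | "lt" | "lte" | "eq" | "ne" | "in";
type Scalar = number | string;

interface Condition {
  signal: string;
  op: Op;
  value: Scalar | Scalar[];
}
interface CostConditions {
  all: Condition[];
}

function compare(op: Op, actual: Scalar, expected: Scalar | Scalar[]): boolean {
  if (Array.isArray(expected)) {
    return op === "in" && expected.some((v) => v === actual);
  }
  if (op === "eq") return actual === expected;
  if (op === "ne") return actual !== expected;
  // Ordering operators only make sense for numbers.
  if (typeof actual !== "number" || typeof expected !== "number") return false;
  if (op === "gt") return actual > expected;
  if (op === "gte") return actual >= expected;
  if (op === "lt") return actual < expected;
  if (op === "lte") return actual <= expected;
  return false; // "in" with a non-array value
}

// Every condition under `all` must hold; a missing signal never matches (assumed).
function evaluateCostConditions(
  conditions: CostConditions,
  signals: Record<string, Scalar | undefined>,
): boolean {
  return conditions.all.every((c) => {
    const actual = signals[c.signal];
    return actual !== undefined && compare(c.op, actual, c.value);
  });
}
```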
- Feature
Pre-call cost estimation
POST /v1/cost/estimate computes a cost breakdown from { provider, model, input_tokens, output_ceiling_tokens } against the current pricing snapshot. Pass it to /v1/govern as `estimate: {...}` and cost-aware policies + budgets evaluate against the projected spend before the LLM call leaves the network. Reconciles against the actual cost the proxy or SDK records when the call returns.
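The arithmetic behind a pre-call estimate is simple enough to sketch. The request fields match the ones listed above; the `PricingSnapshot` layout, the per-million-token rates, and the `provider/model` lookup key are assumptions for illustration and say nothing about the real pricing config format.

```typescript
interface ModelPrice {
  input_usd_per_mtok: number; // USD per million input tokens (assumed unit)
  output_usd_per_mtok: number; // USD per million output tokens (assumed unit)
}
interface PricingSnapshot {
  version: string;
  models: Record<string, ModelPrice>; // keyed by "provider/model" (assumed)
}
interface EstimateInput {
  provider: string;
  model: string;
  input_tokens: number;
  output_ceiling_tokens: number;
}
interface Estimate {
  pricing_version: string;
  input_usd: number;
  output_usd: number;
  total_usd: number;
}

// Worst-case estimate: assumes the call uses its full output ceiling.
function estimateCost(req: EstimateInput, snapshot: PricingSnapshot): Estimate {
  const price = snapshot.models[`${req.provider}/${req.model}`];
  if (price === undefined) {
    throw new Error(`no price for ${req.provider}/${req.model}`);
  }
  const input_usd = (req.input_tokens / 1e6) * price.input_usd_per_mtok;
  const output_usd = (req.output_ceiling_tokens / 1e6) * price.output_usd_per_mtok;
  return { pricing_version: snapshot.version, input_usd, output_usd, total_usd: input_usd + output_usd };
}
```

Stamping the estimate with the snapshot version is what lets it be reconciled later against the recorded cost.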
- Feature
Usage page rebuild — Anomalies, Observations, Drift, Health snapshot
/usage rebuilt with totals, daily sparkline, and breakdowns by provider, model, capture path, AI Actor, tool, user, environment. New /budgets, /anomalies, /observations pages join the Insights nav. Anomalies surface receipts that materially deviate from baselines (high_retry, token_spike, cache_miss_heavy, long_tail, extended_thinking on simple input, idle_context). Observations is the workspace-facts feed; Health snapshot is the weekly digest; Drift compares this week to 30–37d ago with hypothesis prompts.
- Feature
Per-(AI Actor, tool) baselines + receipt-detail context
Rolling baselines per (AI Actor, tool) with median/p95/p99 input + output tokens, median/p95 cost, retry_rate, error_rate, and per-model breakdown. Computed via POST /v1/insights/baselines/recompute (default 30d window). The evaluation detail page shows how each call compares to its baseline — 'Input tokens: this call is 312% of the median (1820)'. The fact is surfaced; the conclusion is the operator's.
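The baseline comparison string quoted above can be reproduced with a few lines. The `median` and `compareToMedian` helpers are hypothetical; the rounding to a whole percent is an assumption that happens to match the example in this entry.

```typescript
// Median of a non-empty list of samples.
function median(values: number[]): number {
  if (values.length === 0) throw new Error("median of empty list");
  const sorted = values.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// State the fact relative to the baseline and leave the conclusion to the reader.
function compareToMedian(label: string, observed: number, baselineMedian: number): string {
  const pct = Math.round((observed / baselineMedian) * 100);
  return `${label}: this call is ${pct}% of the median (${baselineMedian})`;
}
```

For example, a call with 5678 input tokens against a median of 1820 yields "Input tokens: this call is 312% of the median (1820)".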
- Feature
Model-choice + counterfactual model substitution
GET /v1/insights/model-choice returns per-(AI Actor, tool) actuals across the models in use: counts, costs, approval rate. GET /v1/insights/model-substitution?from=...&to=... runs a counterfactual: what would costs be if from_model calls had been routed to to_model? Equivalence (approval-rate match, downstream tool usage) is left to the user; we surface the comparison data, not a recommendation.
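A counterfactual of this kind boils down to re-pricing recorded token counts at another model's rates. The sketch below assumes a simplified `CostRecord` shape and per-million-token rates; both are hypothetical and ignore cache and thinking tokens, which the real cost decomposition tracks separately.

```typescript
interface CostRecord {
  model: string;
  input_tokens: number;
  output_tokens: number;
  cost_usd: number; // what the call actually cost
}
interface Rate {
  input_usd_per_mtok: number;
  output_usd_per_mtok: number;
}

// Re-price every call made on `fromModel` at the target model's rates.
function substituteModel(records: CostRecord[], fromModel: string, toRate: Rate) {
  let calls = 0;
  let actual_usd = 0;
  let counterfactual_usd = 0;
  for (const r of records) {
    if (r.model !== fromModel) continue;
    calls += 1;
    actual_usd += r.cost_usd;
    counterfactual_usd +=
      (r.input_tokens / 1e6) * toRate.input_usd_per_mtok +
      (r.output_tokens / 1e6) * toRate.output_usd_per_mtok;
  }
  return { calls, actual_usd, counterfactual_usd, delta_usd: counterfactual_usd - actual_usd };
}
```

The result is a cost comparison only; it assumes the target model would have consumed the same token counts, which is itself a simplification.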
- Feature
Signed cost exports — CSV + NDJSON
GET /v1/cost/export?format=csv|ndjson&period=30d streams cost-attributed evaluations for finance, procurement, and audit. Same authentication as the API; the workspace HMAC chain on receipts means rows are independently verifiable.
- Feature
Policy simulator — estimated dollar savings
/v1/policies/simulate now also reports cost_impact: { flips_to_blocked, evaluations_with_cost_matched, estimated_savings_usd }. For permissive→deny transitions across the window, sums the cost_records attached to the affected evaluations. Conservative when some flipped evaluations have no attached cost yet — `note` calls that out.
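A rough sketch of how a `cost_impact` block could be derived from replayed evaluations. The three output field names and `note` come from this entry; the `Replayed` input shape, the decision values, and treating only allow→deny as a "permissive→deny" flip are assumptions.

```typescript
type Decision = "allow" | "deny" | "require_approval";

interface Replayed {
  evaluation_id: string;
  before: Decision; // decision originally recorded
  after: Decision; // decision under the draft policy
  cost_usd: number | null; // null when no cost record is attached yet
}
interface CostImpact {
  flips_to_blocked: number;
  evaluations_with_cost_matched: number;
  estimated_savings_usd: number;
  note?: string;
}

function costImpact(replayed: Replayed[]): CostImpact {
  const flipped = replayed.filter((e) => e.before === "allow" && e.after === "deny");
  let matched = 0;
  let savings = 0;
  for (const e of flipped) {
    if (e.cost_usd !== null) {
      matched += 1;
      savings += e.cost_usd;
    }
  }
  const impact: CostImpact = {
    flips_to_blocked: flipped.length,
    evaluations_with_cost_matched: matched,
    estimated_savings_usd: savings,
  };
  if (matched < flipped.length) {
    // Conservative: flipped evaluations without a cost contribute nothing.
    impact.note = `${flipped.length - matched} flipped evaluation(s) had no attached cost`;
  }
  return impact;
}
```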
- Feature
/compare/cost-tracking — observability vs governance
New comparison page distinguishing BlackLake's cost-governance from observability-only cost tracking (Datadog, LangSmith, Helicone). Lead: 'Most tools tell you what AI cost. BlackLake lets you control what it can spend.' Concrete feature table — per-call attribution, signed receipts, cost-aware policies, budgets that deny before the spend, counterfactual substitution, signed exports.
- Improvement
/verify page demonstrates cost binding
Verify-receipt example response now includes the v2 token, receipt_version, and a full cost_summary. Hero copy upgraded: 'AI can hallucinate compliance, and it can hallucinate cost. Receipts prove both.' Hits the same trust point in marketing as the actual API response.
- Improvement
Three-pillar copy refresh
Capture / Govern / Prove pillars on the homepage now reflect the cost layer: per-call cost attribution as a Capture column; budgets + cost-aware policies as a Govern column; v2 signed receipts that bind cost as a Prove column. Same architectural shape, more depth on each pillar.
- Feature
Receipt schema versioning + GET /v1/enums + structured validation hints
policy_evaluations.receipt_version explicitly tracks v1 (no cost) vs v2 (cost-bound) receipts so future revisions don't have to migrate unversioned rows. The audit trail tells you which version was current when each receipt was written. (Enum catalog + validation details landed earlier this session — surfaced here for the cost-governance launch.)
- Improvement
DX hygiene — Dashboard count labels, GitHub events, webhook samples
Dashboard AI Actor stat is now labelled 'Active AI Actors' with a sub-label clarifying that archived rows aren't counted (Coverage uses the same exclusion now). Webhooks page has an inline 'What does a delivery look like?' sample with the canonical envelope shape. GitHub integrations page lists the events that flow in (push, pull_request, issues, workflow_run, etc.). /verify-receipt → /verify already redirects via next.config; documented for completeness.
- Feature
Guided OAuth setup with provider profiles
Setting up OAuth on an MCP upstream is now a single guided panel. GET /v1/mcp/upstreams/oauth/providers returns reusable profiles (Google Cloud, Microsoft Graph, AWS Cognito, Slack, GitHub, Atlassian, Linear, Cloudflare, Sentry, Notion, Custom) with prefilled endpoints, default scopes, developer-console URLs, and step-by-step setup hints. The console picks a profile, runs the new POST /v1/mcp/upstreams/oauth/inspect to classify the upstream (auto_dcr_available / manual_required / endpoints_missing), shows a structured findings checklist, and either runs RFC 7591 dynamic registration or surfaces a manual form pre-populated from the profile.
- Feature
Upstream readiness — one field instead of three
Each upstream row carries a derived readiness: { state, label, hint } where state is one of reachable / auth_configuration_required / user_connection_required / ready / error / unknown. Console badges and SDK consumers should prefer this over piecing auth_type, oauth_state, and user_connected together by hand. Lower-level fields stay for debugging.
- Feature
POST /v1/govern/simulate
Distinct from /v1/policies/simulate (replay draft policies against historical evaluations). The new endpoint asks 'what would happen now?' for a single hypothetical agent + tool + action. Returns the would-be decision, whether the agent and tool resolve, whether a binding exists, the matched policy, and concrete next_steps to make the answer change. Does not record an evaluation, create an approval, or fire webhooks. Exposed in the console as the new /playground page.
- Feature
Approval URLs everywhere
/v1/govern responses now include evaluation_url and (when an approval is required) approval_url — stable console deep links so SDK callers and chat clients don't piece them together. The MCP cloud proxy returns the same approval_url under _meta.blacklake on timeout, the approval.created/approved/rejected webhook payloads carry both URLs, and the approval detail page has a Copy link button. The approval row also shows 'Waiting for first approver' / 'Waiting for second approver' in two-person flows so reviewers see at a glance what's actually pending.
- Feature
Smoke test workflow
POST /v1/test/run creates an isolated bl-smoke-agent / bl-smoke-tool / bl-smoke-policy chain, runs govern() end-to-end, and returns the decision + receipt. Generated resources are tagged owner='__bl_smoke_test__' and surfaced as 'smoke test' badges on the agents page. POST /v1/test/archive-test-resources cleans them up in one call. The console exposes both as the Smoke test tab on /playground.
- Improvement
Structured validation errors with did-you-mean hints
Every VALIDATION_ERROR now carries a details payload: { field, expected, received, hint, issues[] }. Bad enum values get a Levenshtein-best-match suggestion ('Did you mean require_approval?'); unknown keys are listed by name; type mismatches name the expected and received types. SDK consumers can read details.expected directly instead of regexing the message string.
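The did-you-mean hint can be illustrated with a standard Levenshtein distance and a nearest-candidate search. The `didYouMean` helper, the cutoff of 3 edits, and the candidate list in the usage note are assumptions for the example, not the API's actual tuning.

```typescript
// Classic edit distance, computed row by row.
function levenshtein(a: string, b: string): number {
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[b.length];
}

// Closest allowed value within maxDistance edits, or undefined if none is close.
function didYouMean(received: string, allowed: string[], maxDistance = 3): string | undefined {
  let best: string | undefined;
  let bestDistance = maxDistance + 1;
  for (const candidate of allowed) {
    const d = levenshtein(received, candidate);
    if (d < bestDistance) {
      best = candidate;
      bestDistance = d;
    }
  }
  return best;
}
```

With a hypothetical allowed list of `["allow", "deny", "require_approval"]`, the input `require_aproval` is one edit away from `require_approval`, so that is the suggestion.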
- Improvement
GET /v1/enums + canonical enum table in the docs
Live catalog of every enumerable string the API expects (agent.environment, agent.risk_classification, policy.outcome, approval.status, mcp_upstream.readiness, evaluation.result_status, etc.) — at the endpoint and as a table in the API reference. SDK generators and console form components can pull the canonical list from one place instead of duplicating literals.
- Improvement
MCP timeout guidance in the API reference
The MCP data-path proxy holds approval-gated tool calls for up to 110 s, staying under Cloudflare's edge cutoff. The docs now spell out recommended client timeouts (≥120 s; 180 s is safer for human-in-the-loop), what _meta.blacklake.status='timeout' means, and how to surface the approval_url to the user. Defaults of 60 s in many MCP clients were too short.
- Feature
Manual OAuth client registration for non-RFC-7591 upstreams
POST /v1/mcp/upstreams/:id/oauth/configure now accepts a caller-provided client_id (and optional client_secret + endpoint overrides). Unlocks Google Cloud, Microsoft Graph, AWS, Slack, and any other OAuth provider that doesn't expose dynamic client registration. Auto mode is unchanged for upstreams that do support 7591 (Atlassian, Linear, Cloudflare, Sentry, Notion). Console exposes both modes via dedicated buttons on the MCP upstream row.
- Feature
POST /v1/organisation/reset
Wipe operational data (agents, tools, bindings, policies, evaluations, approvals, MCP upstreams, webhooks, audit ingest, cost logs) without deleting the workspace. Preserves users, API keys, memberships, sessions, push subscriptions, and the GitHub installation handle — existing keys keep working. Admin-only, name-confirmation safeguard, 3/hour rate limit per IP and per key. Built so testers can rerun cloud onboarding from a clean slate.
- Fix
Push notifications now actually fire
web-push is CJS; under our ESM compile target await import('web-push') returned a Module Namespace whose exports nest under .default — so webPushModule.setVapidDetails was undefined and every approval push silently failed in production. Fixed with a default-unwrap. Added an un-mocked smoke test so the regression can't return.
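The default-unwrap pattern mentioned above looks roughly like this. The `unwrapDefault` helper is a hypothetical name; `setVapidDetails` is the real web-push function the entry refers to, shown here only in a comment so the sketch stays free of dependencies.

```typescript
// Return the CommonJS exports object whether the loader nested it under
// `.default` or exposed it directly on the namespace.
function unwrapDefault<T>(ns: unknown): T {
  const maybe = ns as { default?: T };
  return maybe.default !== undefined ? maybe.default : (ns as T);
}

// Intended usage (not executed here, requires the web-push package):
//   const webPush = unwrapDefault<typeof import("web-push")>(await import("web-push"));
//   webPush.setVapidDetails(subject, publicKey, privateKey);
```

The helper works for both module shapes, so it keeps working if the compile target or the package's module format changes later.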
- Feature
Resend the email-verification link
POST /v1/auth/resend-verification re-issues a fresh token (anti-enumeration: always 200, rate-limited 5/hour/IP). The console shows a banner on every authenticated page until the user verifies — a one-click button replaces the lost signup email. Unverified users get no approval pushes or emails, so this was a real recovery gap.
- Fix
MCP cloud proxy: SSE streaming + initialize forwarded + auto-register
Three layered failures unblocked: forwardJsonRpc now streams SSE incrementally and tolerates malformed data: frames (the silent ~150ms Cloud Run 502); initialize is forwarded to the upstream and the returned Mcp-Session-Id is propagated back to the client (was synthesised locally); first tools/call against a never-listed upstream auto-runs tools/list, registers agent + tools + bindings + a default policy, then retries govern.
- Improvement
Mobile responsive pass + ID hover-to-copy
Dashboard 5-col stat grid collapses to 2-up on phones; recent-evaluations table scrolls horizontally with a 640px floor; top bar hides the long status text on narrow viewports; page headers stack the action button below the heading; detail-page metadata rows go flex-col on small screens; agent_/tool_/eval_ IDs are hover-to-copy via title attributes across Dashboard, Approvals, and Evaluations.
- Fix
Coverage now matches the Dashboard
/v1/insights/coverage was counting soft-deleted AI Actors and tools while /v1/agents and /v1/tools hide them by default — so an operator saw 10 AI Actors on the Dashboard and 16 on Coverage for the same workspace. Aligned both: archived records are excluded from totals, active_recent, stale, never_seen, and the by-source bucket. New test guards the contract.
- Improvement
Channel-aware decision_reason on magic-link approvals
The hardcoded 'Approved via email link' text was wrong for half the deliveries — push notifications open the same URL. POST /v1/decide now accepts an optional channel parameter; email links pass channel=email, push deep-links pass channel=push, and the resulting decision_reason reads honestly: 'Approved via push notification' / 'via email link' / 'via console' / 'via API'. Free-text reason still wins when supplied.
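The precedence described above (free-text reason first, then a channel-specific phrase) can be sketched as follows. The channel phrases are taken from this entry; the `Channel` type, the required `channel` argument, and the "Rejected ..." wording are assumptions for illustration.

```typescript
type Channel = "email" | "push" | "console" | "api";

const CHANNEL_PHRASE: Record<Channel, string> = {
  email: "via email link",
  push: "via push notification",
  console: "via console",
  api: "via API",
};

// Build the decision_reason: a supplied free-text reason always wins.
function decisionReason(
  decision: "approved" | "rejected",
  channel: Channel,
  freeText?: string,
): string {
  if (freeText !== undefined && freeText.trim() !== "") return freeText;
  const verb = decision === "approved" ? "Approved" : "Rejected";
  return `${verb} ${CHANNEL_PHRASE[channel]}`;
}
```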
- Improvement
/v1/version returns the build SHA
Used to return 'unknown' because the deploy script wasn't injecting a build identifier. Now resolves git rev-parse --short HEAD (suffixed -dirty when the working tree has uncommitted changes) and passes BUILD_SHA as a Cloud Run env var. Useful for support and changelog correlation.
- Improvement
Pricing tier maturity labels
Reframed Production tier into 'Available today' (audit-log export, org-scoped keys, capture-path attribution), 'Built with you' (SSO/SCIM, dedicated capacity, named support, SLA, self-hosted), and 'Roadmap' (push-based SIEM streaming, data residency). This stops the marketing surface from claiming features that don't ship yet, the same trust failure we have a zero-tolerance policy on.
- Improvement
Strict body schemas on /v1/mcp/upstreams and /v1/decide
Unknown fields used to be silently dropped. Passing transport: 'http' or auth_mode: 'static_headers' (guessed from another product's API) returned a 201 Created with the fields ignored. Now both schemas use .strict() and return 400 VALIDATION_ERROR with the offending key called out.
- Improvement
Tighter positioning
Narrowed the headline to engineers working with AI — the people actually shipping production code with Claude, Cursor, and AI Actors in the loop. Refreshed the homepage hero, capabilities, and signup form to match.
- Feature
Cloud audit forwarder for GCP
A Cloud Function ships GCP audit-log events into BlackLake's external_events table. Reconciliation matches them against the evaluation log; the unmatched set surfaces production mutations that bypassed every governed path. Verified end-to-end on the function's own deployment.
- Feature
Two-person approval and break-glass override
High-risk policies can now require two approvers. Break-glass overrides for incidents are recorded with the override reason, the operator, and a separate audit row — visible in the Approvals timeline.
- Feature
Approver-role RBAC
Policies can specify approver_roles[]; only org members holding a matching role can approve. Roles are managed on the Members page with a single PATCH call and reflect immediately in the approval flow.
- Feature
Coverage and risk dashboards
/coverage shows which actors are governed and which capture path each one enters from. /risk surfaces the top denied actors, high-risk tools, approval rejection rate, and recent denied actions — the page an operator opens before standup.
- Feature
Policy simulation
Replay weeks of historical evaluations against a draft policy. Decision counts shift in front of you and sample changed evaluations are listed for spot-check. Ship gates with confidence, not guesswork.
- Feature
Audit export as NDJSON
Export evaluations, approvals, and result receipts as newline-delimited JSON. Stream into BigQuery, a SIEM, or a customer-assurance evidence pack: the ledger is portable.
- Feature
GitHub App installation flow
Install the BlackLake GitHub App on an org or repo and webhooks land on POST /integrations/github/webhook. HMAC-verified, organisation-scoped, ready for downstream policy work.
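For readers unfamiliar with webhook HMAC verification, this is a minimal sketch of the general technique using Node's crypto module. GitHub signs deliveries with HMAC-SHA256 and sends the result in the X-Hub-Signature-256 header as `sha256=<hex>`; the `verifyGithubSignature` helper is a hypothetical name, and this is not BlackLake's actual handler.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Verify a GitHub webhook signature over the raw request body.
// `signatureHeader` is the X-Hub-Signature-256 value, e.g. "sha256=ab12...".
function verifyGithubSignature(secret: string, rawBody: string, signatureHeader: string): boolean {
  const expected = "sha256=" + createHmac("sha256", secret).update(rawBody, "utf8").digest("hex");
  const a = Buffer.from(expected, "utf8");
  const b = Buffer.from(signatureHeader, "utf8");
  // timingSafeEqual throws on length mismatch, so check lengths first.
  return a.length === b.length && timingSafeEqual(a, b);
}
```

The check must run over the raw bytes of the body as received; re-serialising parsed JSON can change the bytes and break the signature.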
- Feature
blx shell wrapper
blx git push, blx terraform apply, blx gcloud run deploy. Classifies the command, governs it, runs it, records the exit code. The same audit ledger as MCP and SDK calls.
- Feature
surface-cli local launcher
npx @blacklake-systems/surface-cli starts the API, dashboard, and SQLite database with one command. No Docker, no external dependencies — the same product as cloud, on your machine.
Want it sooner?
Tell us what's missing or what's in your way. We read every reply.