Core Concepts

This document explains how BlackLake works and what each part of the system does.


One-line glossary#

  • BlackLake is AI Control Infrastructure and Analytics — the system of record for every consequential AI action a company takes. It starts where AI is already doing operational work: Claude, Cursor, MCP clients, agents you've written, shell sessions, cloud automation, durable workflows, and the CI that ships their work.
  • Capture / Govern / Cost / Prove — the four verbs BlackLake resolves to. Agents, tools, policies, approvals, budgets, evaluations, receipts, the MCP proxy, the GitHub Action, the blx shell wrapper, SDK calls, and cloud audit ingest all flow through these.
  • Durable workflows — long-running AI work expressed as checkpointed TypeScript steps under the same control layer. Use them when workflow steps need checkpointing, resume, approval, cost visibility, and receipts. Docs at Durable workflows — Getting started.
  • Agent — a named AI actor: a Claude Code or Codex session, a Cursor agent-mode run, an MCP server you've connected, a backend worker calling LLMs, a Depth workflow, a governed CI step, or anything else making tool calls on a user's behalf. Internally we call this an agent (the schema noun); in product copy you'll see AI actor — same record, different audience. Every govern() call is attributed to one.
  • Tool — a named capability the agent can invoke (a function, an API, a file write).
  • Binding — an explicit allowlist entry: this agent is allowed to use this tool. Without a binding, every govern() call for that pair returns deny with reason "Tool is not bound to agent". Bindings exist so that accidentally granting a tool to the wrong agent takes a deliberate step, not an implicit one.
  • Policy — a rule with an agent selector, a tool selector, a priority, and an outcome. The first policy whose selectors match wins.
  • Evaluation — the immutable record of a single govern() call and its outcome. The audit trail.
  • Approval — a pending human decision created when a policy with outcome approval_required matches.
  • Decision — the result of govern(): one of allow, deny, approval_required, or default_deny. See the four-decisions section below.

Session-actor convention#

A single agent can stand in for many concrete sessions: every Claude Code window, every CI run, every backend worker shares the same agent record. To keep the audit trail useful when several humans share a workspace, callers can pass a structured session block on context whenever they call bl.govern():

await bl.govern({
  agent: 'james-codex-local',           // suggested naming pattern: <user>-<tool>-<env>
  tool: 'gcloud.run.deploy',
  action: { ... },
  context: {
    session: {
      machine: 'macbook-pro-2',
      user: 'james',
      tool_client: 'codex',
      tool_client_version: '0.7.2',
      repo: 'github.com/blacklake-systems/control-plane',
      branch: 'main',
      cwd: '/Users/james/dev/control-plane',
      tags: ['oncall'],
      // CI runners — auto-name actors as `ci-<workflow>-<job>`:
      ci: { run_url: '...', workflow: 'release', job: 'deploy-prod' },
    },
  },
});

Surface stores the block on policy_evaluations.request_context and renders it as a structured panel on the evaluation detail page and on the mobile approval (/decide) page — so an approver away from their terminal sees james-codex-local on macbook-pro-2/main instead of needing to derive context from a tool name.

Naming convention: <user>-<tool>-<env> for human sessions (james-codex-local, maya-claude-code-prod) and ci-<workflow>-<job> for CI runners (ci-release-deploy-prod). Every field on session is optional; pass what you have.
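
The two naming patterns can be produced mechanically. The sketch below is illustrative only: humanActorName and ciActorName are hypothetical helpers, not part of the BlackLake SDK, and they assume the name parts are already lowercase and hyphen-safe.

```typescript
// Hypothetical helpers (not part of the BlackLake SDK) that build actor
// names following the documented conventions.

// <user>-<tool>-<env>, e.g. james-codex-local
function humanActorName(user: string, tool: string, env: string): string {
  return [user, tool, env].join('-');
}

// ci-<workflow>-<job>, e.g. ci-release-deploy-prod
function ciActorName(workflow: string, job: string): string {
  return ['ci', workflow, job].join('-');
}
```

The result would be passed as the agent field of a bl.govern() call.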


How BlackLake Works#

BlackLake is AI Control Infrastructure and Analytics. It sits between the AI tools doing real work (Claude, Cursor, MCP clients, CI bots, shell commands, backend agents, cloud automation, and durable workflows) and the systems they touch. Every time one of those tools wants to perform a consequential action, it asks BlackLake first. BlackLake checks the request against your policies and budgets, returns a decision (allow, deny, approval_required, or default_deny), and records an immutable evaluation with a signed receipt. The agent or caller, not BlackLake, is responsible for acting on the decision.

Cloud and local are deployment modes, not separate products. Cloud gives teams a shared console, shared policies, approvals, budgets, audit exports, and production governance at console.blacklake.systems. Local gives individuals a fast private first run with SQLite through npx blacklake serve. The control model is the same: policy, approval, cost, receipt. The MCP proxy works in both modes: point your AI tool at https://api.blacklake.systems/mcp/u/<upstream-name> (cloud) or http://localhost:3100/mcp/u/<upstream-name> (local), authenticate with a user-scoped API key, and every tool call is evaluated, recorded as a signed receipt, and forwarded to the upstream with the right outbound auth — static credentials for legacy MCPs, or per-user OAuth tokens for OAuth-protected ones (Atlassian, Linear, Sentry, Notion, Slack).


Local Mode#

Running npx blacklake serve starts BlackLake locally. On first run it creates ~/.blacklake/ and initialises a SQLite database there. Two servers start:

  • API at http://localhost:3100 — handles governance, agents, tools, policies, approvals, and the MCP proxy
  • Console at http://localhost:3200 — the dashboard UI

When running locally, everything is stored locally and no data leaves your machine. The local CLI is fully functional on its own; the cloud deployment uses the same control model but runs at console.blacklake.systems with persistence, multi-machine access, and team features.

The local database holds the full evaluation history. It persists between restarts. When you want that data available across machines or shared with a team, see Cloud Mode.


MCP Proxy#

MCP (Model Context Protocol) is the open standard that AI tools use to connect to external services like file systems, APIs, and databases. The MCP proxy is the fastest way to add oversight to your existing AI setup. It works with any MCP-compatible client — Cursor, Windsurf, Claude Code, Claude Desktop, and others.

The proxy runs in both modes:

  • Cloud: point your MCP client at https://api.blacklake.systems/mcp/u/<upstream-name> with a user-scoped API key (bl_usr_…). The cloud-side proxy looks up the upstream's outbound auth, runs governance, and forwards the JSON-RPC call.
  • Local (npx blacklake serve): the same path is exposed at http://localhost:3100/mcp/u/<upstream-name> with the local org's auth. Configure each upstream once in the console (or ~/.blacklake/mcp-config.json) and the proxy serves them.

The first time a tool is called through a given upstream, BlackLake auto-registers the upstream as an agent (mcp:<name>) and each capability as a tool. They appear on the dashboard immediately, ready to be referenced in policies.

Your MCP client does not need to know anything changed — it still speaks standard MCP. Only the URL changes.
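
Because only the URL changes, the proxy endpoint for an upstream can be derived from the deployment mode and the upstream name. A minimal sketch, assuming a hypothetical helper named mcpProxyUrl (not part of the BlackLake SDK) and the base URLs quoted in this document:

```typescript
type DeploymentMode = 'cloud' | 'local';

// Base URLs as quoted in this document.
const MCP_BASE_URLS: Record<DeploymentMode, string> = {
  cloud: 'https://api.blacklake.systems',
  local: 'http://localhost:3100',
};

// Hypothetical helper: builds the MCP proxy URL for a named upstream.
function mcpProxyUrl(mode: DeploymentMode, upstreamName: string): string {
  return `${MCP_BASE_URLS[mode]}/mcp/u/${encodeURIComponent(upstreamName)}`;
}
```

The returned URL is what you would paste into the MCP client's server configuration; authentication is still supplied separately.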

Upstream auth modes#

Each upstream has an auth_type that drives how BlackLake authenticates the outbound call:

| auth_type | Outbound auth | Set up by |
| --- | --- | --- |
| static_headers (default) | Decrypt and attach the headers configured at registration. Used for legacy MCPs that take an API key in Authorization or a custom header. | POST /v1/mcp/upstreams with { headers: { Authorization: 'Bearer …' } }. |
| oauth2 | Look up the calling user's stored access token, refresh if expired, attach Authorization: Bearer <token>. Used for OAuth-protected MCPs (Atlassian, Linear, Sentry, Notion, Slack). | Run POST /v1/mcp/upstreams/:id/oauth/configure once per upstream (BlackLake does RFC 8414 / RFC 7591 metadata discovery + dynamic client registration). Then each user clicks Connect in the console; the authorization-code + PKCE flow runs against the upstream and tokens are stored encrypted, scoped to that user. |

The console upstreams page shows a status badge per row — Static credentials / OAuth pending / OAuth error / Auth required / Connected — so it's obvious when a user needs to reconnect.

The MCP Servers page at /mcp shows every proxied upstream: its status, the number of tools discovered, the last-seen timestamp, and any startup error.


Agents#

An agent represents an AI agent or automated system in your setup. BlackLake tracks governance decisions per agent.

Agents are created in two ways:

  • SDK-registered — your code calls bl.agents.create() or POST /v1/agents to register the agent explicitly, setting its name, environment, and risk classification.
  • MCP-auto-registered — when a new MCP server connects through the proxy and makes its first tool call, BlackLake creates an agent record automatically using the server name.

Auto-registered agents have environment: development and risk_classification: low by default. You can update these in the console or via the API.

| Field | Type | Description |
| --- | --- | --- |
| name | string | Human-readable identifier. Used in bl.govern() calls and policy selectors. |
| environment | development / staging / production | Deployment environment. Policies can match on this. |
| risk_classification | low / medium / high / critical | Risk level. Policies can match on this. |
| status | active / suspended / disabled | Suspended and disabled agents are denied immediately, before any policy is evaluated. |
| approval_mode | auto_approve / require_approval / block | Metadata only. Stored with the agent for use by external workflows. The BlackLake policy engine does not read this field: setting it to require_approval does not cause approvals; write an approval_required policy for that. |
| owner | string or null | Free-text identifier for the human or team responsible (email, team name, ticket). Used for audit attribution. |
| source | manual / mcp / sdk / ci | Where the agent was first registered from. Set automatically by the registration path. |
| last_seen_at | timestamp or null | Most recent govern() call referencing this agent. Drives the coverage dashboard's stale-actor detection. |

Tools#

A tool represents a capability an agent can invoke — a function, an API call, a file write, anything with a side effect.

Tools are created in two ways:

  • SDK-registered — your code calls bl.tools.create() or POST /v1/tools.
  • MCP-discovered — when the MCP proxy intercepts a call to a tool it has not seen before, it creates a tool record automatically.

| Field | Type | Description |
| --- | --- | --- |
| name | string | Human-readable identifier. Used in bl.govern() calls and policy selectors. |
| risk_classification | low / medium / high / critical | Risk level of the capability. Policies can match on this. |
| owner | string or null | Free-text identifier for the human or team responsible. Used for audit attribution. |
| source | manual / mcp / sdk / ci | Where the tool was first registered from. |
| last_seen_at | timestamp or null | Most recent govern() call referencing this tool. |

For SDK-registered agents, a binding must exist between the agent and the tool before governance will proceed. The proxy creates bindings automatically as it discovers new tools.


Agent-Tool Bindings#

A binding is an explicit allowlist entry. For SDK-integrated agents, an agent can only be governed against tools it is bound to. If no binding exists, Surface returns deny without evaluating any policies.

The MCP proxy creates and manages bindings automatically for MCP-registered agents. When you use the SDK, call bl.agents.bindTool(agentId, toolId) after creating both records.
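
The checks that run before any policy is evaluated can be pictured as a small gate. This is a sketch of the documented behaviour, not the engine's code: preCheck and its types are hypothetical, the relative order of the status and binding checks is an assumption, and the reason text for a non-active agent is invented for illustration. Only the "Tool is not bound to agent" reason comes from this document.

```typescript
type ActorStatus = 'active' | 'suspended' | 'disabled';

interface PreCheckInput {
  agentStatus: ActorStatus;
  boundToolNames: readonly string[]; // tools this agent has bindings for
  toolName: string;
}

type PreCheckResult =
  | { proceed: true }
  | { proceed: false; decision: 'deny'; reason: string };

// Hypothetical gate that runs before policy evaluation.
function preCheck(input: PreCheckInput): PreCheckResult {
  // Suspended and disabled agents are denied before any policy is evaluated.
  if (input.agentStatus !== 'active') {
    // Reason text here is illustrative, not the real message.
    return { proceed: false, decision: 'deny', reason: `Agent is ${input.agentStatus}` };
  }
  // Without a binding, the pair is denied without evaluating policies.
  if (!input.boundToolNames.includes(input.toolName)) {
    return { proceed: false, decision: 'deny', reason: 'Tool is not bound to agent' };
  }
  return { proceed: true };
}
```

Both early exits yield deny, not default_deny, which matches the decision table in the Policies section.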


Policies#

A policy is a rule that maps a set of agents and tools to an outcome. Policies are the primary mechanism for expressing what your agents are and are not allowed to do.

| Field | Type | Description |
| --- | --- | --- |
| name | string | Descriptive label. |
| priority | integer | Evaluation order. Lower numbers are evaluated first. |
| agent_selector | object | Key-value pairs matched against the agent record. An empty object ({}) matches all agents. |
| tool_selector | object | Key-value pairs matched against the tool record. An empty object ({}) matches all tools. |
| outcome | allow / deny / approval_required | The decision returned when this policy matches. |
| enabled | boolean | Disabled policies are excluded from evaluation entirely. |

First match wins. Policies are evaluated in ascending priority order. The first policy whose selectors both match is applied. Evaluation stops there.

Priority is unique within an org. The API rejects a second policy at the same priority with 409 POLICY_PRIORITY_CONFLICT — two policies at the same priority would have undefined evaluation order.
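
First-match-wins evaluation is easy to model in a few lines. The sketch below is an in-memory illustration under stated assumptions, not BlackLake's engine: PolicyRule, selectorMatches, and evaluatePolicies are hypothetical names, records are flattened to string fields, and selector matching is plain equality on each key.

```typescript
type PolicyOutcome = 'allow' | 'deny' | 'approval_required';
type Selector = Record<string, string>;

interface PolicyRule {
  name: string;
  priority: number; // lower numbers are evaluated first
  agent_selector: Selector;
  tool_selector: Selector;
  outcome: PolicyOutcome;
  enabled: boolean;
}

// A selector matches when every key-value pair equals the record's value.
// An empty selector has no pairs to fail, so it matches everything.
function selectorMatches(selector: Selector, record: Record<string, string>): boolean {
  return Object.keys(selector).every((key) => record[key] === selector[key]);
}

function evaluatePolicies(
  policies: readonly PolicyRule[],
  agent: Record<string, string>,
  tool: Record<string, string>,
): { decision: PolicyOutcome | 'default_deny'; policy: PolicyRule | null } {
  // Disabled policies are excluded; the rest run in ascending priority order.
  const ordered = policies
    .filter((p) => p.enabled)
    .sort((a, b) => a.priority - b.priority);
  for (const policy of ordered) {
    if (selectorMatches(policy.agent_selector, agent) && selectorMatches(policy.tool_selector, tool)) {
      return { decision: policy.outcome, policy }; // first match wins
    }
  }
  return { decision: 'default_deny', policy: null }; // nothing matched
}
```

Because priorities are unique within an org, the sort never has to break a tie, which is what keeps the evaluation order well defined.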

The four decisions — and why deny and default_deny are different#

Every govern() call returns exactly one decision. They mean different things and demand different responses:

| Decision | When it's returned | Treat it as |
| --- | --- | --- |
| allow | A policy matched with outcome allow. | Proceed. |
| deny | Either a policy matched with outcome deny, or the agent is suspended/disabled, or the tool is not bound to the agent. | Explicit block. Do not retry; the rule is intentional. |
| approval_required | A policy matched with outcome approval_required, and an approval has been created. | Wait for the approval (SDK: bl.approvals.wait(id)) or handle asynchronously via webhook. |
| default_deny | No policy matched at all. The fail-safe. | Misconfiguration, not policy. Write a policy whose selectors match this (agent, tool) pair. Don't treat default_deny as "the policy engine said no"; it means "nothing said yes." |

The practical consequence: a switch on decision should handle all four cases explicitly. If the default branch silently absorbs default_deny along with deny, you'll be blind to missing-policy bugs.
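
One way to enforce that in TypeScript is an exhaustive switch with a never check, so the compiler rejects any unhandled decision. The sketch below is illustrative: GovernDecision, NextStep, and nextStep are hypothetical names, and the step labels are placeholders for whatever your caller actually does.

```typescript
type GovernDecision = 'allow' | 'deny' | 'approval_required' | 'default_deny';
type NextStep = 'proceed' | 'stop' | 'wait_for_approval' | 'fix_policy_config';

function nextStep(decision: GovernDecision): NextStep {
  switch (decision) {
    case 'allow':
      return 'proceed';
    case 'deny':
      return 'stop'; // explicit block: do not retry
    case 'approval_required':
      return 'wait_for_approval';
    case 'default_deny':
      return 'fix_policy_config'; // nothing matched: a missing-policy signal
    default: {
      // Compile-time exhaustiveness check: this assignment stops type-checking
      // if a new decision value is added without a matching case.
      const unreachable: never = decision;
      throw new Error(`Unhandled decision: ${String(unreachable)}`);
    }
  }
}
```

Keeping default_deny on its own branch means missing-policy cases can be logged or alerted on separately from intentional denials.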

MCP policies. Policies for MCP-auto-registered agents work exactly the same as policies for SDK-registered agents. The mcp-config.json policy field is a shorthand that creates an initial policy when the server is first registered. You can edit or delete it in the console afterwards.

For a full guide to writing policies, see the Policy Guide.


Evaluations#

Every governance decision is recorded as an evaluation. Evaluations are immutable audit records. They are never deleted.

| Field | Description |
| --- | --- |
| agent_id | The agent that was governed. |
| tool_id | The tool that was requested. |
| policy_id | The policy that matched, or null if none matched or the request was denied before policy evaluation. |
| policy_snapshot | The matched policy's name, priority, selectors, outcome, and timestamps as they were when the decision was made. This stays on the evaluation even if the policy changes later. |
| outcome | The final decision: allow, deny, approval_required, or default_deny. |
| action_payload | The action object passed to bl.govern(), if provided. For MCP calls, this is the tool input. |
| request_context | The context object passed to bl.govern(), if provided. |
| evaluated_at | UTC timestamp of the decision. |

Execution systems can also attach action-result evidence to an evaluation after the governed action runs. Result records capture status (succeeded, failed, skipped, or unknown), external IDs or URLs, duration, exit code, error text, an output digest, and integration metadata. This lets a receipt show both the decision BlackLake made and what the downstream system reports happened next.

MCP tool calls appear in the Evaluations page in real time. You can also query evaluations via GET /v1/evaluations or bl.evaluations.list().


Approvals#

An approval is created when a governance decision of approval_required is returned. It represents a pending human decision about whether a specific tool invocation should proceed.

Approvals are created only when a matched policy has outcome: 'approval_required'. All other outcomes do not create approvals.

For MCP calls: when a policy with outcome approval_required matches, the proxy holds the tool call open and waits for the approval to resolve before responding to the client. The tool either runs or is rejected based on the human decision; the agent sees a normal MCP response either way.

A magic link is a one-shot URL that lets an authorised reviewer decide an approval without first signing in. BlackLake mints a per-(reviewer, approval) token on creation, embeds it in the approval-needed email, and reuses the same URL as the deep-link target of mobile push notifications. The token is single-use, expires when the approval does, and is revoked as soon as the approval is decided through any other path. Tapping the link opens a confirmation page (/decide?token=...) that calls POST /v1/decide. It writes the same record and the same decisions ledger entry, with decision_reason set to Approved via email link / Approved via push notification / etc., so the audit trail reflects the channel.

Lifecycle#

pending  →  approved
         →  rejected
         →  expired

| Status | Meaning |
| --- | --- |
| pending | Awaiting a human decision. |
| approved | The tool invocation may proceed. |
| rejected | The tool invocation should not proceed. |
| expired | No decision was made within 24 hours. Treated as a rejection. |

Expiry is checked on read, not on a background schedule. An approval past its expires_at is reported as expired.
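
Check-on-read means the stored status can still say pending while the reported status is expired. A minimal sketch of that resolution step, assuming a hypothetical StoredApproval shape with only the two fields needed here:

```typescript
type ApprovalStatus = 'pending' | 'approved' | 'rejected' | 'expired';

interface StoredApproval {
  status: ApprovalStatus; // as persisted
  expires_at: Date;
}

// Resolve the status to report at read time. Only pending approvals can
// expire; decided approvals keep their stored status.
function reportedStatus(approval: StoredApproval, now: Date): ApprovalStatus {
  if (approval.status === 'pending' && now.getTime() > approval.expires_at.getTime()) {
    return 'expired';
  }
  return approval.status;
}
```

No background job is needed with this design, at the cost that nothing happens at the moment of expiry itself; the change only becomes visible on the next read.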

Two-person approval#

A policy with requires_two_person: true produces approvals that need two distinct approvers before they close. The first POST /v1/approvals/:id/approve records the decision but leaves the approval pending; a second approve from a different decided_by identity transitions it to approved. A single reject still terminates immediately. Each individual decision is appended to a decisions array on the approval so the audit trail captures who approved and when, not just the final outcome.

The same identity attempting to approve twice is rejected with 409 DUPLICATE_APPROVER.
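
The two-person rule behaves like a small state machine. The sketch below models it in memory under stated assumptions: TwoPersonApproval, DecisionEntry, and applyDecision are hypothetical names, errors are thrown as plain Error objects rather than HTTP 409 responses, and the "already closed" message is invented for illustration.

```typescript
interface DecisionEntry {
  decided_by: string;
  decision: 'approve' | 'reject';
}

interface TwoPersonApproval {
  status: 'pending' | 'approved' | 'rejected';
  requires_two_person: boolean;
  decisions: DecisionEntry[]; // every individual decision, in order
}

// Returns a new approval with the decision appended and the status updated.
function applyDecision(approval: TwoPersonApproval, entry: DecisionEntry): TwoPersonApproval {
  if (approval.status !== 'pending') {
    throw new Error('Approval is already closed'); // illustrative message
  }
  const alreadyApproved = approval.decisions.some(
    (d) => d.decision === 'approve' && d.decided_by === entry.decided_by,
  );
  if (entry.decision === 'approve' && alreadyApproved) {
    throw new Error('DUPLICATE_APPROVER');
  }
  const decisions = [...approval.decisions, entry];
  if (entry.decision === 'reject') {
    return { ...approval, decisions, status: 'rejected' }; // one reject terminates
  }
  const approvals = decisions.filter((d) => d.decision === 'approve').length;
  const needed = approval.requires_two_person ? 2 : 1;
  return { ...approval, decisions, status: approvals >= needed ? 'approved' : 'pending' };
}
```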

Approver roles#

A policy can also pin which roles are allowed to decide. Set approver_roles: ['security'] on a policy and only members whose org_memberships.roles array contains security can approve or reject (matched by overlap — any one role is enough). API-key callers and break-glass overrides bypass the check, on the principle that workspace-scoped service identities and audit-flagged emergencies shouldn't be gated by org RBAC.

A user without a matching role gets 403 ROLE_REQUIRED. Manage member roles via PATCH /v1/organisation/members/:user_id/roles or the Members page in the console.
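
The overlap rule amounts to a set-intersection test. A minimal sketch, assuming a hypothetical canDecide helper; the treatment of an empty approver_roles list as "no restriction" is an assumption, and the API-key and break-glass bypasses are not modelled:

```typescript
// Returns true when the member holds at least one of the policy's roles.
function canDecide(memberRoles: readonly string[], approverRoles: readonly string[]): boolean {
  // Assumption: a policy with no approver_roles places no role restriction.
  if (approverRoles.length === 0) return true;
  return approverRoles.some((role) => memberRoles.includes(role));
}
```

A false result here corresponds to the 403 ROLE_REQUIRED response.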

Break-glass#

POST /v1/approvals/:id/break-glass is an emergency override. It force-approves regardless of two-person requirements and sets break_glass: true on the approval as a permanent audit marker. The reason field is required to be at least 40 characters — the deliberate friction is the point.

If a workspace finds itself reaching for break-glass routinely, the underlying policy probably needs adjusting; the override is for genuine incidents.


Webhooks#

Webhooks deliver event notifications to a URL you register. BlackLake sends an HTTP POST to your endpoint with a signed JSON payload whenever any of the events below fires.

| Event | Fired when |
| --- | --- |
| approval.created | A governance call returns approval_required and an approval record is created. |
| approval.approved | POST /v1/approvals/:id/approve is called successfully. |
| approval.rejected | POST /v1/approvals/:id/reject is called successfully. |
| evaluation.created | Any /v1/govern call lands a row on the audit ledger. |
| evaluation.denied | A governance call decides deny or default_deny. |
| evaluation.approval_required | A governance call decides approval_required. |
| budget.threshold_crossed | A budget crosses the 50% / 80% / 100% soft thresholds. |
| budget.limit_exceeded | A budget exceeds the hard cap. |
| cost.recorded | A cost record lands and rolls into the budget accounting. |
| upstream.unhealthy | An MCP upstream's health-ping streak flips to unhealthy. |
| upstream.recovered | An MCP upstream recovers from an unhealthy run. |

Delivery is fire-and-forget with a 5-second timeout per attempt; no automatic retry. Failed deliveries can be replayed via POST /v1/webhooks/:id/deliveries/:deliveryId/resend (one) or POST /v1/webhooks/:id/deliveries/resend-failed (bulk). Inspect delivery history via GET /v1/webhooks/:id/deliveries.

Every outgoing request is signed with HMAC-SHA256. See the API Reference for the signature format and a verification example.


Coverage#

Coverage answers a question that a board, an auditor, or an oncall engineer asks at different volumes: what does BlackLake actually govern, and how does it get there? Each agent and tool has a source field set at registration time — one of manual (created via console or API key), mcp (auto-registered by the MCP proxy), sdk (created by code calling bl.agents.create() or bl.tools.create()), ci (created or used by a governed CI step), shell (auto-registered when blx <command> runs), cloud_audit (synthesised by the audit-ingest reconciler from GCP / AWS / Azure events), existing_workflow_engine (n8n / Zapier / Airflow integrations), or depth (durable workflow steps). Every govern() call updates the agent's and tool's last_seen_at timestamp.

The Coverage page in the console aggregates this. It shows total registered actors split by source, how many were active in the last 30 days, how many are stale or never-seen, and what proportion of recent governed evaluations came in through each capture path. The same data is exposed at GET /v1/insights/coverage for dashboards and reporting.
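
The active / stale / never-seen split follows directly from last_seen_at. The sketch below is illustrative: classifyActivity and the label strings are hypothetical, and treating exactly 30 days as still active is an assumption.

```typescript
type ActorActivity = 'active' | 'stale' | 'never_seen';

const ACTIVE_WINDOW_DAYS = 30;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Classify an agent or tool by how recently a govern() call referenced it.
function classifyActivity(lastSeenAt: Date | null, now: Date): ActorActivity {
  if (lastSeenAt === null) return 'never_seen';
  const ageDays = (now.getTime() - lastSeenAt.getTime()) / MS_PER_DAY;
  return ageDays <= ACTIVE_WINDOW_DAYS ? 'active' : 'stale';
}
```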

Coverage has two halves. What BlackLake governs is the inventory side — the registered actors and the evaluations they generated. What slipped past is the audit-ingestion side — POST /v1/audit/ingest accepts events from GCP Cloud Audit Logs, AWS CloudTrail, GitHub audit events, and other sources, then reconciles them against the BlackLake evaluation log. Events that match a known evaluation get linked; events with no match are listed at GET /v1/audit/uncovered. That uncovered set is where production mutations that bypassed BlackLake become visible.


blx — governed shell wrapper#

blx is a CLI alias inside the blacklake package that gates a shell command on a BlackLake decision. It classifies the command, calls govern() with an actor named after the user and host, runs the command if the decision allows, and posts result evidence after.

blacklake blx git push origin main
blacklake blx gcloud run deploy control-plane-api --region europe-west2
blacklake blx terraform apply

The classifier covers the highest-blast-radius commands: git writes (push, tag, reset), GitHub PR/release/workflow actions, gcloud run deploy, mutating kubectl verbs, terraform apply / destroy, psql -c and bq query, and AWS subcommands like ecs, iam, lambda, s3, rds. Any command that doesn't match a class passes through unchanged — blx ls, blx npm test, blx git status all run with no governance call. That makes it safe to use as a shell prefix or alias without paying a cost on every invocation.
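
The classify-or-pass-through behaviour can be sketched as a function that returns a tool name for governed commands and null otherwise. This is a deliberately partial illustration, not the real classifier: classifyCommand is a hypothetical name, and only the two class names that appear in this document (git.write and gcloud.run.deploy) are covered.

```typescript
// Returns the tool name to govern against, or null for pass-through.
// Covers only two of the documented command classes.
function classifyCommand(argv: readonly string[]): string | null {
  const [bin, sub, verb] = argv;
  if (bin === 'git' && sub !== undefined && ['push', 'tag', 'reset'].includes(sub)) {
    return 'git.write';
  }
  if (bin === 'gcloud' && sub === 'run' && verb === 'deploy') {
    return 'gcloud.run.deploy';
  }
  return null; // unmatched commands run with no governance call
}
```

A wrapper built on this would call govern() only when the result is non-null, which is why unmatched commands add no overhead.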

Approval-required decisions block until a human decides via the console, email magic-link, or mobile push. The wrapper polls for up to 15 minutes, then aborts.

BLACKLAKE_API_KEY is required (or a saved key in ~/.blacklake/config.json). The agent name defaults to cli:<user>@<host> and is auto-registered with source: 'ci'. Tools are auto-registered using the classification name (git.write, gcloud.run.deploy, etc.). Bind agents to tools and write policies as you would for any other actor.


Governed CI#

BlackLake ships a GitHub Action that gates a workflow step behind a policy decision and attaches an audit-grade receipt to the run. The action calls bl.govern() with the agent and tool you nominate, plus auto-collected GitHub context (repository, sha, run URL, actor). The decision controls what happens next:

  • allow — the step runs and the action exits successfully.
  • approval_required — the action polls the approval until a human decides via the console, email magic-link, or mobile push. Approve and the workflow continues; reject and the step fails.
  • deny / default_deny — the step fails immediately with the policy reason.

After the gated step finishes — success or failure — a post-run hook records the job outcome as action result evidence on the same evaluation. POST /v1/decisions/verify then returns both the policy snapshot at the time of decision and the actual deploy outcome.

- uses: blacklake-systems/control-plane/packages/govern-action@main
  with:
    api-key: ${{ secrets.BLACKLAKE_API_KEY }}
    agent: github-actions
    tool: gcloud.run.deploy
    action-payload: '{"environment":"production","service":"control-plane-api"}'

- name: Deploy to Cloud Run
  run: ./deploy.sh

CI is the highest-leverage non-MCP control point: every production change already passes through it, the action is one step to add, and the receipt links the policy verdict to the GitHub run URL — so an auditor can trace a Cloud Run revision back to the commit, the approver, and the policy that allowed the deploy.


API Proxy#

The API proxy routes your LLM API calls through BlackLake so usage is tracked centrally.

| Proxy path | Forwards to |
| --- | --- |
| /proxy/anthropic/* | api.anthropic.com |
| /proxy/openai/* | api.openai.com |
| /proxy/ollama/* | localhost:11434 |

Point your AI SDK's baseURL at the local proxy. BlackLake forwards the request using your own API key (passed in the Authorization header as normal) and records the model, token counts, and estimated cost from the response. No request or response body is stored — only the usage metadata.

The API proxy does not require authentication against BlackLake. It uses whatever credentials you pass to the upstream API.


Cost Governance#

BlackLake captures token + dollar cost on every observed LLM call and binds that cost to the governance receipt. Cost is not a separate dashboard bolted on — it's part of the audit ledger.

Capture points. The /proxy/anthropic, /proxy/openai, and /proxy/ollama paths capture cost automatically — token counts come from the provider's response, run through the pricing config, and write to cost_records plus the legacy api_call_logs. For direct calls (Bedrock, Vertex, Foundry, Gemini, custom OpenAI-compatible servers) the SDK attributes via bl.cost.record({ evaluation_id, provider, model, input_tokens, output_tokens }). Same canonical destination either way.

Cost decomposition. Every record carries an input_usd / output_usd / cache_read_usd / cache_write_usd / thinking_usd breakdown. Anthropic's prompt-cache hits, OpenAI's cached input, and extended-thinking tokens (where the provider bills them separately) all surface as line items.

Pricing config. A versioned JSON snapshot lives at apps/api/src/lib/pricing/config.ts. Each receipt stamps the version it was costed under (e.g. 2026-05). When pricing changes in a future release, historical totals stay stable, because each call is costed with the snapshot that was current when it ran.
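
To make the snapshot idea concrete, here is a sketch of costing one call against a versioned price table. Everything in it is an assumption for illustration: PricingSnapshot, costOfCall, the per-million-token rate layout, and the model name and prices used in any example are made up, and the cache and thinking components are omitted for brevity.

```typescript
interface PricingSnapshot {
  version: string; // e.g. '2026-05'
  // USD per million tokens, keyed by model (hypothetical layout).
  rates: Record<string, { input: number; output: number }>;
}

interface CostBreakdown {
  pricing_version: string; // stamped so the figure stays reproducible
  input_usd: number;
  output_usd: number;
  total_usd: number;
}

function costOfCall(
  snapshot: PricingSnapshot,
  model: string,
  inputTokens: number,
  outputTokens: number,
): CostBreakdown {
  const rate = snapshot.rates[model];
  if (rate === undefined) {
    throw new Error(`No pricing for model ${model} in snapshot ${snapshot.version}`);
  }
  const input_usd = (inputTokens / 1e6) * rate.input;
  const output_usd = (outputTokens / 1e6) * rate.output;
  return { pricing_version: snapshot.version, input_usd, output_usd, total_usd: input_usd + output_usd };
}
```

Because the breakdown carries the snapshot version, recomputing an old call under a newer snapshot produces a separate figure instead of silently rewriting the old one.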

Cost-aware policies#

The policy DSL extends with a cost_conditions block:

{
  "name": "block-very-expensive",
  "priority": 5,
  "agent_selector": { "environment": "production" },
  "outcome": "deny",
  "mode": "monitor",
  "cost_conditions": {
    "all": [
      { "signal": "tool.estimated_cost_usd", "op": "gt", "value": 1.00 }
    ]
  }
}

Signals: agent.spend_today_usd, agent.spend_per_task_usd, agent.cumulative_spend_session_usd, tool.estimated_cost_usd, tool.input_tokens, tool.model, workspace.spend_today_usd, user.spend_today_usd. Operators: gt / gte / lt / lte / eq / ne / in.

mode is enforce (apply outcome) or monitor (record the would-be decision; never deny). Cost-aware policies default to monitor so operators can observe before they restrict.
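
The interaction between cost_conditions and mode can be sketched as a small evaluator. This is an illustration of the documented semantics, not the policy engine: the type and function names are hypothetical, a missing signal is assumed never to match, and ordering operators are assumed to apply to numbers only.

```typescript
type CostOp = 'gt' | 'gte' | 'lt' | 'lte' | 'eq' | 'ne' | 'in';
type SignalValue = number | string;

interface CostCondition {
  signal: string;
  op: CostOp;
  value: SignalValue | SignalValue[];
}

interface CostPolicy {
  outcome: 'allow' | 'deny' | 'approval_required';
  mode: 'enforce' | 'monitor';
  cost_conditions: { all: CostCondition[] };
}

function conditionHolds(cond: CostCondition, signals: Record<string, SignalValue>): boolean {
  const actual = signals[cond.signal];
  if (actual === undefined) return false; // assumption: missing signals never match
  const expected = cond.value;
  if (Array.isArray(expected)) {
    return cond.op === 'in' && expected.includes(actual);
  }
  if (cond.op === 'eq') return actual === expected;
  if (cond.op === 'ne') return actual !== expected;
  // Ordering operators: assumed to be numeric only.
  if (typeof actual !== 'number' || typeof expected !== 'number') return false;
  switch (cond.op) {
    case 'gt': return actual > expected;
    case 'gte': return actual >= expected;
    case 'lt': return actual < expected;
    case 'lte': return actual <= expected;
    default: return false; // 'in' with a scalar value
  }
}

interface CostPolicyResult {
  matched: boolean;
  wouldBe: CostPolicy['outcome'] | null; // recorded in both modes
  enforced: CostPolicy['outcome'] | null; // stays null in monitor mode
}

function applyCostPolicy(policy: CostPolicy, signals: Record<string, SignalValue>): CostPolicyResult {
  const matched = policy.cost_conditions.all.every((c) => conditionHolds(c, signals));
  if (!matched) return { matched, wouldBe: null, enforced: null };
  return {
    matched,
    wouldBe: policy.outcome,
    enforced: policy.mode === 'enforce' ? policy.outcome : null,
  };
}
```

With the block-very-expensive policy from this section and an estimated cost of 2.50 USD, the result in monitor mode records deny as the would-be outcome while enforcing nothing.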

Pre-call cost estimation#

POST /v1/cost/estimate computes a breakdown from { provider, model, input_tokens, output_ceiling_tokens } against the current pricing snapshot. Pass it to bl.govern({ estimate: ... }) and cost-aware policies and budgets evaluate against the projected spend before the LLM call leaves the network. The estimate is reconciled against the actual cost once a cost record lands.


Budgets#

Budgets are a first-class primitive: soft and hard USD limits scoped to a workspace, an agent, a tool, a user, or a Depth lineage identifier (workflow / run / step), evaluated over per-task / day / week / month windows in any IANA timezone.

Hard limits deny at govern() time — before the LLM call runs. This is the difference between cost governance and cost observability: an alert is too late when the spend already happened.

Soft limits fire webhooks at 50% / 80% / 100% (budget.threshold_crossed) and again at hard 100% (budget.limit_exceeded). The dispatcher dedupes per (budget_id, period_key, threshold) so a busy hour doesn't fire the same notification thirty times.

{
  "name": "Production Opus / month",
  "scope_type": "agent",
  "scope_id": "agent_…",
  "period": "month",
  "timezone": "Europe/London",
  "soft_limit_usd": 800,
  "hard_limit_usd": 1000
}
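
The per-(budget_id, period_key, threshold) deduplication described in this section can be sketched with an in-memory set. This is illustrative only: ThresholdDispatcher is a hypothetical class, percentages are assumed to be measured against the soft limit, and a real dispatcher would persist the fired keys.

```typescript
const SOFT_THRESHOLDS = [50, 80, 100] as const;

class ThresholdDispatcher {
  // Keys of the form budgetId:periodKey:threshold that have already fired.
  private readonly fired = new Set<string>();

  // Returns the thresholds that should fire now, skipping any that already
  // fired for this budget in this period.
  crossed(budgetId: string, periodKey: string, spentUsd: number, softLimitUsd: number): number[] {
    const percent = (spentUsd / softLimitUsd) * 100;
    const toFire: number[] = [];
    for (const threshold of SOFT_THRESHOLDS) {
      const key = `${budgetId}:${periodKey}:${threshold}`;
      if (percent >= threshold && !this.fired.has(key)) {
        this.fired.add(key);
        toFire.push(threshold);
      }
    }
    return toFire;
  }
}
```

Including the period key in the dedupe key is what lets the same thresholds fire again when a new day, week, or month starts.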

Race window: budget reads are not synchronously locked against concurrent writes. Two concurrent calls that each see $99 spent against a $100 limit can both pass before the first cost record lands; if each costs $1, the workspace then settles at $101 and the third call is denied. For exact accounting, set the soft limit at your effective cap.
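
The arithmetic of that race can be replayed deterministically. The sketch below is a toy model, not BlackLake code: the names are hypothetical, each call is assumed to cost exactly $1, and the budget check is assumed to be "spend seen is below the hard limit".

```typescript
// Deterministic replay of the race: both checks read the recorded spend
// before either call's cost record has been written.
const HARD_LIMIT_USD = 100;
let recordedSpendUsd = 99;

const passesBudget = (spendSeen: number): boolean => spendSeen < HARD_LIMIT_USD;

// Two concurrent calls both read 99 and pass.
const firstPasses = passesBudget(recordedSpendUsd);
const secondPasses = passesBudget(recordedSpendUsd);

// Each call then runs, costs $1, and its cost record lands.
if (firstPasses) recordedSpendUsd += 1;
if (secondPasses) recordedSpendUsd += 1;

// A third call now sees 101 and is denied.
const thirdPasses = passesBudget(recordedSpendUsd);
```

The overshoot is bounded by the number of calls in flight at the moment the limit is reached, multiplied by their individual cost.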


Receipts: v1 and v2#

A decision_token accompanies every governance result. The token is an HMAC binding (evaluation_id, decision) so an LLM cannot fabricate a denial that never happened.

  • v1 (bldt_v1:…) — pre-cost. Signs (evaluation_id, decision). Receipts written before any cost was captured stay v1.
  • v2 (bldt_v2:…) — cost-bound. Signs (evaluation_id, decision, cost_summary) so the dollar figure is cryptographically tied to the decision. Surfaces in the response of GET /v1/cost/by-evaluation/:id once a cost record has been attached. Verify with the same POST /v1/decisions/verify endpoint — it accepts both versions and tells you which it validated.

policy_evaluations.receipt_version is 1 for cost-less receipts and 2 once the first cost_records row rolls up onto the evaluation. Future revs will be additive.
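
A client that stores decision tokens can tell the two versions apart from the prefix alone. A minimal sketch, assuming a hypothetical receiptVersionOf helper; it only inspects the prefix and does not verify the HMAC, which remains the job of POST /v1/decisions/verify.

```typescript
type ReceiptVersion = 1 | 2;

// Reads the version from the token prefix. Does NOT validate the signature.
function receiptVersionOf(decisionToken: string): ReceiptVersion | null {
  if (decisionToken.startsWith('bldt_v1:')) return 1;
  if (decisionToken.startsWith('bldt_v2:')) return 2;
  return null; // unknown or malformed token
}
```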


Baselines & Anomalies#

For every active (agent, tool) pair, BlackLake computes a rolling baseline: median / p95 / p99 input + output tokens, median / p95 cost, median / p95 latency, retry rate, error rate, and a per-model breakdown with approval rate. Trigger via POST /v1/insights/baselines/recompute (default 30-day window).

The evaluation detail page surfaces a comparison to the baseline alongside each receipt — "Input tokens: this call is 312% of the median (1820)". The fact is surfaced; the conclusion is the operator's.
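
The percentage in that comparison is simply the observed value divided by the baseline median. A sketch under stated assumptions: median and percentOfMedian are hypothetical helpers, the median uses the usual midpoint rule for even-sized samples, and rounding to a whole percent is an assumption about presentation.

```typescript
// Median of a non-empty sample (average of the two middle values when the
// sample size is even).
function median(values: readonly number[]): number {
  if (values.length === 0) throw new Error('median of empty sample');
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid]! : (sorted[mid - 1]! + sorted[mid]!) / 2;
}

// Observed value as a whole-number percentage of the baseline median.
function percentOfMedian(value: number, baselineMedian: number): number {
  return Math.round((value / baselineMedian) * 100);
}
```

For instance, a call with 5678 input tokens against a median of 1820 comes out at 312%, the figure used in the example message.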

Anomalies are receipts that materially deviate from baseline. Six classes ship today, all dismissible per pattern:

  • high_retry — retry count above the threshold for this pair
  • token_spike — input or output at p99 of the recent baseline
  • cache_miss_heavy — cache miss rate over the workspace threshold on a call that should hit the cache
  • long_tail — output tokens or duration > 2× p95
  • extended_thinking — large thinking-token spend on a small input
  • idle_context — session resumed after long idle (cache_creation tokens get billed again)

Thresholds are env-overridable via BL_ANOMALY_* vars; a settings table for per-workspace tuning is on the roadmap. Trigger detection via POST /v1/insights/anomalies/recompute.


Observations#

Observations are a workspace-facts feed: anomalies, drift, cost decomposition entries, policy-author hints, coverage gaps. Every observation is one fact about the workspace; the operator decides whether it matters. The feed surfaces in the console under Observations.

Three companion endpoints:

  • GET /v1/insights/health-snapshot — weekly digest (cost, top agents, anomaly count, decision breakdown). Designed to be the artefact a workspace owner shares with their team or finance lead.
  • GET /v1/insights/drift — workspace cost change vs prior weeks; surfaces dollar delta and a hypothesis list (model switch, expanded toolset, heavier workload).
  • GET /v1/insights/model-choice and /model-substitution — comparison data across models actually used by an (agent, tool) pair, plus a counterfactual: "if these Opus calls had been Sonnet, what would the cost have been?" Equivalence (approval-rate match, downstream tool usage) is left to the user.

Usage Tracking#

The /usage page is the cost governance view. It is backed by the same cost_records table that drives budgets, decomposition, and the v2 receipts.

Breakdowns: by provider, model, capture path (proxy / sdk / mcp / ci / manual), agent, tool, user, environment. Daily sparkline. CSV + NDJSON export for finance / procurement / auditors. Drill from any cell into the underlying receipts.
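Each breakdown is a group-and-sum over cost records along one dimension. The sketch below illustrates that idea and a basic CSV rendering. The CostRecord field names and the CSV column names are assumptions, not the actual cost_records schema or export format.

```typescript
// Hypothetical record shape covering the dimensions listed above.
interface CostRecord {
  provider: string;
  model: string;
  capturePath: "proxy" | "sdk" | "mcp" | "ci" | "manual";
  agent: string;
  tool: string;
  user: string;
  environment: string;
  costUsd: number;
}

type Dimension = Exclude<keyof CostRecord, "costUsd">;

// Sums cost per distinct value of the chosen dimension.
function breakdown(records: CostRecord[], by: Dimension): { [key: string]: number } {
  const totals: { [key: string]: number } = {};
  for (const record of records) {
    const key: string = record[by];
    totals[key] = (totals[key] || 0) + record.costUsd;
  }
  return totals;
}

// Minimal CSV rendering; assumes keys contain no commas, quotes, or newlines.
function toCsv(totals: { [key: string]: number }, by: Dimension): string {
  const rows = Object.keys(totals).map((key) => key + "," + totals[key].toFixed(4));
  return [by + ",cost_usd"].concat(rows).join("\n");
}
```

A production export would need proper CSV quoting for values that contain delimiters.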

The legacy GET /v1/usage endpoint (api_call_logs only) is preserved for backward compatibility, but the canonical surface is GET /v1/cost.


Cloud Mode#

The cloud product at console.blacklake.systems adds:

  • Persistence across machines — your agents, tools, policies, and evaluation history are stored centrally
  • Team access — multiple users can view and manage the console
  • Mobile approval UI — approve or reject pending tool calls from your phone
  • Long-term retention — evaluation history is not bounded by local disk space
  • OAuth-protected MCP upstreams — cloud handles RFC 7591 dynamic client registration and per-user OAuth tokens for upstreams like Atlassian, Linear, Sentry, Notion, and Slack. See the MCP Proxy section above for the auth-mode model.

When you create a cloud account, you get an API key. Point the SDK at https://api.blacklake.systems and pass that key. The cloud API is identical to the local API — the same endpoints, same data model, same policy engine.

The local CLI can sync to the cloud: run npx blacklake serve --cloud with your API key set and local data is pushed up on startup.


API Keys#

API keys come in two scopes — pick based on what's calling the API:

Scope | Prefix | Carries user identity? | Use for
--- | --- | --- | ---
User-scoped | bl_usr_… | Yes: userId flows through the audit ledger | MCP clients (Claude Code, Cursor, Claude Desktop), personal CLI usage, anything where audit attribution matters. Required to call OAuth-protected MCP upstreams.
Org-scoped | bl_… | No: workspace-level service identity | CI runners, automation scripts, service accounts. Cannot connect OAuth upstreams (no user to bind tokens to).

Mint a user-scoped key from Settings → API keys → Generate key → User-scoped in the console (or POST /v1/api-keys/user). Org-scoped keys are still available for service accounts via POST /v1/api-keys. Both kinds of key are listed and revoked through the same endpoints.

The plaintext key is shown exactly once at creation. Revoking is immediate; the key cannot be un-revoked.
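Because the two scopes are distinguishable by prefix, client-side tooling can tell them apart before making a request. The sketch below is one way to do that. The function names are hypothetical and the example keys are made up. The only facts taken from this page are the bl_usr_ and bl_ prefixes and the rule that OAuth upstreams need a user-scoped key.

```typescript
type KeyScope = "user" | "org";

// The longer prefix is tested first, since every bl_usr_ key also starts with bl_.
function keyScope(apiKey: string): KeyScope | null {
  if (/^bl_usr_/.test(apiKey)) return "user";
  if (/^bl_/.test(apiKey)) return "org";
  return null; // not recognisable as a BlackLake key by prefix
}

// OAuth-protected MCP upstreams need a user to bind tokens to.
function canConnectOAuthUpstream(apiKey: string): boolean {
  return keyScope(apiKey) === "user";
}
```

A prefix check is only a convenience. The API remains the authority on whether a key is valid or revoked.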


Governance Flow#

Agent wants to use Tool
         |
         v
  Is agent status active?
  (not suspended or disabled)
         |
    YES  |  NO --> deny immediately
         |
         v
  Is there a binding between
  this agent and this tool?
  (SDK agents only; MCP agents
  are auto-bound)
         |
    YES  |  NO --> deny (no binding)
         |
         v
  Load enabled policies,
  ordered by priority ASC
         |
         v
  For each policy:
  Does agent_selector match?  AND
  Does tool_selector match?
         |
  First  |  No match for any policy
  match  |
    |    +---> default_deny
    v
  Apply outcome:
  allow / deny / approval_required
         |
         v
  Record evaluation (always)
         |
         v
  Return decision to caller

The caller — your agent runtime or the MCP proxy — is responsible for acting on the decision. Surface records the decision but does not itself execute or block the tool.
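The flow above can be sketched as a single function. This is an illustrative model, not the actual policy engine. The type names, the evaluateSketch function, and the selector semantics ("*" as a wildcard, otherwise exact match) are assumptions. The ordering, the binding rule, and the default_deny fallback follow the diagram.

```typescript
type Outcome = "allow" | "deny" | "approval_required";
type Decision = Outcome | "default_deny";

interface Agent { id: string; status: "active" | "suspended" | "disabled"; kind: "sdk" | "mcp"; }
interface Binding { agentId: string; tool: string; }
interface Policy { agentSelector: string; toolSelector: string; priority: number; outcome: Outcome; enabled: boolean; }
interface Evaluation { agentId: string; tool: string; decision: Decision; reason: string; }

// Assumed selector semantics: "*" matches anything, otherwise exact match.
function selectorMatches(selector: string, value: string): boolean {
  return selector === "*" || selector === value;
}

function evaluateSketch(
  agent: Agent, tool: string, bindings: Binding[], policies: Policy[], ledger: Evaluation[]
): Evaluation {
  // Every return path goes through record(), so an evaluation is always written.
  const record = (decision: Decision, reason: string): Evaluation => {
    const evaluation: Evaluation = { agentId: agent.id, tool, decision, reason };
    ledger.push(evaluation);
    return evaluation;
  };

  // Step 1: inactive agents are denied immediately.
  if (agent.status !== "active") return record("deny", "Agent is " + agent.status);

  // Step 2: SDK agents need an explicit binding; MCP agents are auto-bound.
  const bound = agent.kind === "mcp" ||
    bindings.some((b) => b.agentId === agent.id && b.tool === tool);
  if (!bound) return record("deny", "Tool is not bound to agent");

  // Step 3: enabled policies in ascending priority order; the first match wins.
  const ordered = policies.filter((p) => p.enabled).sort((a, b) => a.priority - b.priority);
  for (const policy of ordered) {
    if (selectorMatches(policy.agentSelector, agent.id) && selectorMatches(policy.toolSelector, tool)) {
      return record(policy.outcome, "Matched policy with priority " + policy.priority);
    }
  }

  // Step 4: nothing matched.
  return record("default_deny", "No policy matched");
}
```

The function only returns a decision and appends to the ledger. As in the real flow, acting on that decision is left to the caller.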


Workspace lifecycle#

Deleting a workspace is reversible for 30 days. POST /v1/organisation/delete soft-deletes the workspace: API keys stop working for normal requests immediately, but the data is kept and the workspace can be recovered with POST /v1/organisation/restore (see the API reference). The original API key remains valid for restore requests only during this window, so keep it somewhere safe if you might want the workspace back.

After 30 days the workspace and all its data are permanently removed.
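For tooling that tracks deleted workspaces, the restore window reduces to a date calculation. The sketch below assumes the window is exactly 30 × 24 hours from the deletion timestamp and that the deadline itself is exclusive. Both are assumptions, as are the function names. The server decides the real cutoff.

```typescript
const RESTORE_WINDOW_DAYS = 30;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Assumed cutoff: deletion time plus 30 full days.
function restoreDeadline(deletedAt: Date): Date {
  return new Date(deletedAt.getTime() + RESTORE_WINDOW_DAYS * MS_PER_DAY);
}

// True while a restore request could still succeed under the assumption above.
function canRestore(deletedAt: Date, now: Date): boolean {
  return now.getTime() < restoreDeadline(deletedAt).getTime();
}
```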


Security and Local Mode#

In local mode, BlackLake runs on localhost over plain HTTP. This is safe for local development because the API and dashboard are only accessible from your machine.

Do not expose ports 3100 or 3200 to your network. If you need remote access to your governance data, use the cloud console at console.blacklake.systems.

Webhook signing secrets and API keys are stored in the local SQLite database at ~/.blacklake/blacklake.db. This file is not encrypted. For production use with sensitive data, connect to the cloud console.