LLM Operations

AI Audit uses LLMs at two points in the pipeline: the accessibility scanner calls Claude Haiku for WCAG analysis, and the AI analyzer calls Claude Sonnet for prioritisation, scoring, and recommended fixes. The same Krafter.Audit.LLMClient module backs both call sites.

Provider routing

Each call attempts Anthropic first, then falls back to OpenAI on failure:

| Provider | Default model | Used by |
| --- | --- | --- |
| Anthropic (primary) | claude-haiku-4-5-20251001 | A11yScanWorker (accessibility) |
| Anthropic (primary) | claude-sonnet-4-5-20251022 | AiAnalyzerWorker (analysis, scoring) |
| OpenAI (fallback) | gpt-4o-mini | Both, when Anthropic fails |

The fallback path triggers on:

  • HTTP 429 from Anthropic — retried once after a 1-second backoff before failing over.
  • Any other non-2xx Anthropic response, a transport error, or an unexpected response shape.
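
The routing rules above can be sketched as follows. This is an illustrative Python sketch, not the Elixir code in Krafter.Audit.LLMClient: ProviderError, complete_with_fallback, and the two injected callables are hypothetical names, and the sketch assumes that only a 429 is retried while every other Anthropic failure goes straight to OpenAI.

```python
import time


class ProviderError(Exception):
    """Hypothetical error type: status is the HTTP status, or None for
    transport errors and unexpected response shapes."""

    def __init__(self, message, status=None):
        super().__init__(message)
        self.status = status


def complete_with_fallback(prompt, call_anthropic, call_openai, sleep=time.sleep):
    """Try Anthropic first, then fall back to OpenAI.

    call_anthropic and call_openai are caller-supplied functions (assumed
    here) that return the completion text or raise ProviderError.
    Returns a (provider_name, text) tuple.
    """
    for attempt in range(2):
        try:
            return "anthropic", call_anthropic(prompt)
        except ProviderError as err:
            # Only a 429 on the first attempt earns a single retry,
            # after a 1-second backoff.
            if err.status == 429 and attempt == 0:
                sleep(1.0)
                continue
            # Any other failure, or a second 429, fails over.
            break
    return "openai", call_openai(prompt)
```

Passing sleep as a parameter keeps the backoff testable without real delays. An error raised by the OpenAI call propagates to the caller in this sketch.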

If neither provider has a key configured, the LLM call returns {:error, :not_configured}. The accessibility scanner records this as a failure of that scanner; the AI analyzer logs it and completes the scan without AI scoring (findings are still saved with their default priority, score: 0).

API keys

LLM keys are stored centrally in the ai_provider_settings table, encrypted at rest via Krafter.Secrets, and loaded on every call by Krafter.AI.provider_api_key/1. There is currently no per-team BYOK (bring-your-own-key) — every team uses the platform-managed key.

The LLMClient.configured?/0 helper returns true when at least one provider (Anthropic or OpenAI) has an enabled key. Quota enforcement and gating in the audit context check this flag before charging quota.

Quotas

Three audit endpoints invoke Krafter.Audit.Helpers.ensure_ai_quota/2 before doing work that consumes the LLM:

| Endpoint | Quantity charged |
| --- | --- |
| POST /audit/scans/run | 1 |
| POST /audit/verifications/run | 1 |
| POST /audit/reports/export | 1 |

The quota is enforced only when an LLM provider is configured. If no provider key exists, calls succeed without consuming quota, which is useful for self-hosted deployments without an LLM key.
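
A minimal sketch of that gating logic, in Python for illustration only. The function name charge_ai_quota, its parameters, and the tuple return shape are assumptions; the real check lives in Krafter.Audit.Helpers.ensure_ai_quota/2, whose internals are not documented here.

```python
def charge_ai_quota(used, limit, quantity=1, llm_configured=True):
    """Return ("ok", new_used) or ("error", "quota_exceeded").

    used:   actions already consumed this month (hypothetical counter)
    limit:  the team's monthly cap
    """
    if not llm_configured:
        # No provider key: the call proceeds and nothing is charged.
        return ("ok", used)
    if used + quantity > limit:
        # The call would exceed the cap, so it is rejected before any work.
        return ("error", "quota_exceeded")
    return ("ok", used + quantity)
```

For example, a team at 4 of 5 actions can run one more scan, after which the next call is rejected until the counter resets.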

When the call would exceed the team's monthly cap, the controller returns 429 Too Many Requests with the standard envelope:

```json
{
  "data": null,
  "meta": { "request_id": "..." },
  "error": { "code": "quota_exceeded" }
}
```
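
On the client side, the quota error can be told apart from other 429 responses by inspecting the envelope. The helper below is a hedged Python sketch; is_quota_exceeded is a hypothetical name, and it relies only on the status code and the error.code field shown in the envelope.

```python
import json


def is_quota_exceeded(status, body):
    """Return True when an HTTP response is the quota_exceeded envelope.

    status: HTTP status code of the response
    body:   raw response body as a string
    """
    if status != 429:
        return False
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        # Not JSON at all, so it is not the standard envelope.
        return False
    error = payload.get("error") if isinstance(payload, dict) else None
    return isinstance(error, dict) and error.get("code") == "quota_exceeded"
```

A client that sees True here should stop retrying, since the cap does not lift until the monthly reset or a quota override.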

Monthly limits are plan-dependent (see Krafter.Billing.Plans — the :ai_audit_actions_monthly key). Teams can also have ai_audit_quota_override set on their team record to lift or replace the plan default. Reset is on the calendar-month boundary in UTC.
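
The limit resolution and reset boundary can be sketched as below. This is an illustrative Python sketch under two assumptions: that a non-null ai_audit_quota_override simply replaces the plan default, and that the reset instant is midnight UTC on the first day of the next month. Both function names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def effective_monthly_limit(plan_limit: int, override: Optional[int]) -> int:
    """Use the team's override when set, otherwise the plan default."""
    return plan_limit if override is None else override


def next_quota_reset(now: datetime) -> datetime:
    """Return the first instant of the next calendar month in UTC.

    now must be a timezone-aware datetime; it is converted to UTC first
    so the month boundary does not depend on the caller's local zone.
    """
    now_utc = now.astimezone(timezone.utc)
    if now_utc.month == 12:
        return datetime(now_utc.year + 1, 1, 1, tzinfo=timezone.utc)
    return datetime(now_utc.year, now_utc.month + 1, 1, tzinfo=timezone.utc)
```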

Token usage

Each LLM call returns {:ok, text, %{input_tokens: n, output_tokens: n}}. The numbers are logged but not yet surfaced in API responses or the billing dashboard — per-scan token cost reporting is tracked for a future release.

If you need to estimate cost, the dominant calls are:

  • Accessibility scan: one Haiku call per page, with HTML truncated to 40 KB of input.
  • AI analysis: one Sonnet call per scan, with input scaling linearly in the number of findings produced by the four scanners.
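
As a rough aid for such estimates, the sketch below counts calls per scan and bounds the HTML sent to Haiku. It is written in Python for illustration; the names are hypothetical, 40 KB is assumed to mean 40 × 1024 bytes of UTF-8, and no token or price figures are implied because the conversion depends on the model's tokenizer and current provider pricing.

```python
MAX_HTML_BYTES = 40 * 1024  # assumption: "40 KB" means 40 * 1024 bytes


def truncate_html(html, limit=MAX_HTML_BYTES):
    """Cut html to at most limit bytes of UTF-8 without splitting a character."""
    data = html.encode("utf-8")
    if len(data) <= limit:
        return html
    # errors="ignore" drops a trailing partial multi-byte sequence.
    return data[:limit].decode("utf-8", errors="ignore")


def estimate_scan_calls(pages):
    """Return call counts and the upper bound on Haiku HTML input for one scan."""
    return {
        "haiku_calls": pages,   # one accessibility call per page
        "sonnet_calls": 1,      # one analysis call per scan
        "max_haiku_html_bytes": pages * MAX_HTML_BYTES,
    }
```

The Sonnet input is not bounded here because it grows with the number of findings rather than with page size.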

JSON parsing

Both call sites use LLMClient.complete_json/3, which parses the model output as JSON. The parser:

  1. Tries Jason.decode/1 on the trimmed response first.
  2. Falls back to extracting from a fenced code block (```json ... ``` or ``` ... ```) when the model wraps its answer.
  3. Returns {:error, :json_parse_failed} if neither succeeds — the calling worker logs and skips scoring without aborting the scan.

This makes Anthropic and OpenAI responses interchangeable: OpenAI more often wraps its output in fences, while Anthropic typically returns bare JSON.
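
The three parsing steps can be sketched as follows. This is a Python illustration of the same strategy, not the Elixir implementation of LLMClient.complete_json/3; parse_model_json and the tuple return shape are assumptions that mirror the documented outcomes.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built this way to keep the example readable
FENCED_BLOCK = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def parse_model_json(text):
    """Return ("ok", value) or ("error", "json_parse_failed")."""
    trimmed = text.strip()

    # Step 1: try the trimmed response as bare JSON.
    try:
        return ("ok", json.loads(trimmed))
    except json.JSONDecodeError:
        pass

    # Step 2: look for the first fenced block, with or without a json tag.
    match = FENCED_BLOCK.search(trimmed)
    if match:
        try:
            return ("ok", json.loads(match.group(1).strip()))
        except json.JSONDecodeError:
            pass

    # Step 3: neither attempt produced valid JSON.
    return ("error", "json_parse_failed")
```

Trying the bare decode first keeps the common case cheap and avoids misreading a JSON string value that happens to contain backticks.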

Operational notes

  • Timeouts: receive_timeout: 60_000 (60 s) on both providers. Longer-running calls fail and may fall back to OpenAI.
  • Max tokens: default 4096 output tokens per call; can be overridden per call via the :max_tokens option.
  • No streaming: responses are awaited in full. The AI analyzer is the longest single call (up to ~30 s for a finding-heavy scan).
  • Logging: failures log the provider error body. Successful calls log only at info level.
