Scan pipeline
The audit pipeline has five distinct stages, handled by Oban workers on the :audit queue. The first three belong to the scan itself; verification and report export run on demand. Understanding the pipeline helps you reason about latency, partial failures, and what to expect when polling a scan run.
Stage 1 — Kickoff
POST /audit/scans/run validates the URL (HTTP/HTTPS only, SSRF-protected), creates an audit_scan_runs row with status: "queued", and enqueues Krafter.Workers.ScanRunWorker. The endpoint returns 202 Accepted immediately with the scan ID.
This stage is throttled by the :strict_rate_limit plug — 30 requests per 60-second window per team. See Scans → Rate limit.
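The kickoff validation (HTTP/HTTPS only, SSRF-protected) can be approximated on the client before a scan is submitted, which avoids spending rate-limit budget on requests that are likely to be rejected. The following is a minimal Python sketch, not the server's logic: the helper name is_scannable_url is hypothetical, and the set of rejected hosts is an assumption about what SSRF protection typically covers. It only inspects IP literals and localhost names and does not resolve DNS.

```python
import ipaddress
from urllib.parse import urlsplit


def is_scannable_url(url: str) -> bool:
    """Pre-flight check: HTTP/HTTPS scheme and no obviously internal host.

    Hypothetical helper; the server's actual SSRF rules may be stricter.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    host = parts.hostname
    if not host:
        return False
    if host == "localhost" or host.endswith(".localhost"):
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal, so it is a hostname. DNS resolution is out of scope.
        return True
    # Reject loopback, private, link-local, and other non-public literals.
    return ip.is_global
```

A hostname that resolves to an internal address would pass this check, so the server-side validation remains the authority.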
Stage 2 — Parallel scanners
ScanRunWorker flips the scan to status: "running", seeds the scanner_status map with one entry per scanner, and uses Oban.insert_all/1 to fan out to four scanner workers in parallel:
| Scanner | Worker | What it does |
|---|---|---|
| performance | Krafter.Audit.Scanners.PerformanceScanWorker | Calls Google PageSpeed Insights (mobile strategy), checks Core Web Vitals (LCP, FCP, CLS, TBT, etc.) |
| security | Krafter.Audit.Scanners.SecurityScanWorker | Fetches the URL and inspects HTTP response headers (HSTS, CSP, X-Content-Type-Options, X-Frame-Options, Referrer-Policy, Permissions-Policy) |
| seo | Krafter.Audit.Scanners.SeoScanWorker | Parses HTML with Floki and checks title, meta description, headings, canonical, Open Graph, and <img> alt attributes |
| accessibility | Krafter.Audit.Scanners.A11yScanWorker | Sends truncated HTML (40 KB cap) to Claude Haiku for WCAG 2.1 AA analysis |
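
To make the security row concrete, here is a rough Python sketch of a presence check over the six response headers named in the table. The function name missing_security_headers is hypothetical, and checking presence only is an assumption: the actual scanner is an Elixir worker and may also grade header values.

```python
# Header names taken from the security scanner description above.
EXPECTED_HEADERS = (
    "Strict-Transport-Security",  # HSTS
    "Content-Security-Policy",    # CSP
    "X-Content-Type-Options",
    "X-Frame-Options",
    "Referrer-Policy",
    "Permissions-Policy",
)


def missing_security_headers(response_headers: dict[str, str]) -> list[str]:
    """Return the expected headers absent from a response (case-insensitive)."""
    present = {name.lower() for name in response_headers}
    return [h for h in EXPECTED_HEADERS if h.lower() not in present]
```
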
Each scanner updates its own slot in scanner_status (pending → running → done/failed) and broadcasts progress over PubSub on the topic audit:scan:<scan_run_id>. Findings are upserted via ScanHelpers.upsert_finding_from_scanner/2 with ON CONFLICT (team_id, project_id, domain, title) so re-runs replace prior detections.
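The effect of the ON CONFLICT (team_id, project_id, domain, title) target can be illustrated with an in-memory model. This Python sketch is only an analogy for the database upsert: the FindingStore class, the severity field, and the sample values are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FindingStore:
    """In-memory stand-in for the findings table (illustration only)."""

    rows: dict[tuple, dict] = field(default_factory=dict)

    def upsert(self, finding: dict) -> None:
        # Same conflict target as the ON CONFLICT clause described above.
        key = (
            finding["team_id"],
            finding["project_id"],
            finding["domain"],
            finding["title"],
        )
        # A re-run that produces the same key overwrites the earlier detection.
        self.rows[key] = finding


store = FindingStore()
base = {"team_id": "t1", "project_id": "p1", "domain": "seo", "title": "Missing title"}
store.upsert({**base, "severity": "low"})   # first scan
store.upsert({**base, "severity": "high"})  # re-run replaces the row
```

After both calls the store holds a single row for that key, carrying the values from the second scan.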
Stage 3 — AI analysis
The last scanner to finish (ScanHelpers.all_scanners_finished?/1) enqueues Krafter.Audit.Scanners.AiAnalyzerWorker. This worker is unique: [keys: [:scan_run_id], period: 300], so concurrent finishes won't double-fire it.
The analyzer loads every finding produced by the scan, builds a Claude Sonnet prompt (claude-sonnet-4-5-20251022), and asks for an executive summary plus per-finding scores. Real responses populate:
- finding.impact_score, effort_score, risk_score, priority_score (each 0..100)
- finding.recommended_fix ({summary, steps[]})
- scan_run.summary.health_score (0..100) and scan_run.summary.summary (executive paragraph)
If no LLM is configured (no Anthropic or OpenAI key in the database) or the scan produced zero findings, this stage is skipped and the scan still completes successfully, just without AI scoring. See the LLM Operations page for fallback and quota details.
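Because AI scoring is optional, a client reading findings should not assume the score fields are present. The Python sketch below restates the skip condition and adds a defensive range check; both function names are hypothetical, and representing a finding as a plain dict is an assumption about how a client might hold the API response.

```python
SCORE_FIELDS = ("impact_score", "effort_score", "risk_score", "priority_score")


def should_run_ai_analysis(llm_configured: bool, finding_count: int) -> bool:
    """The stage is skipped when no LLM is configured or there are no findings."""
    return llm_configured and finding_count > 0


def invalid_scores(finding: dict) -> list[str]:
    """Names of score fields that are missing or fall outside 0..100."""
    bad = []
    for name in SCORE_FIELDS:
        value = finding.get(name)
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or not 0 <= value <= 100:
            bad.append(name)
    return bad
```

For a finding from a scan where the stage was skipped, invalid_scores returns all four field names, which a client can treat as "not scored" rather than as an error.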
Stage 4 — Verification (on demand)
Verifications run separately, not as part of a scan. POST /audit/verifications/run with a task_id enqueues Krafter.Workers.VerificationWorker, which:
- Resolves the task's linked finding and its affected_assets.
- Re-fetches each URL and evaluates the verification's checks array. Built-in checks include the five security headers (security_header_csp|hsts|xfo|xcto|referrer); unknown check IDs are recorded as failed.
- Marks the verification result: "passed" only if every check passed; otherwise result: "failed".
- On a full pass, transactionally flips the linked task to done and the linked finding to resolved. Reopening (POST .../verifications/:id/reopen) reverses all three and opens a new regression with trigger: "verification_reopened".
See Remediation → Verifications for the API surface.
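The pass/fail rule for a verification can be sketched as follows. This is an illustrative Python model, not the worker's code: the check IDs are expanded from the shorthand above on the assumption that each shares the security_header_ prefix, the run_verification name and the evaluate callback are hypothetical, and treating an empty checks array as a pass is an assumption rather than documented behaviour.

```python
from typing import Callable

# Expanded from security_header_csp|hsts|xfo|xcto|referrer (assumed naming).
KNOWN_CHECKS = {
    "security_header_csp",
    "security_header_hsts",
    "security_header_xfo",
    "security_header_xcto",
    "security_header_referrer",
}


def run_verification(check_ids: list[str], evaluate: Callable[[str], bool]) -> dict:
    """Evaluate each check; the result is 'passed' only if every check passed."""
    statuses = []
    for check_id in check_ids:
        if check_id not in KNOWN_CHECKS:
            status = "failed"  # unknown check IDs are recorded as failed
        else:
            status = "passed" if evaluate(check_id) else "failed"
        statuses.append({"id": check_id, "status": status})
    # Note: all() over an empty list is True, so no checks means "passed" here.
    passed = all(s["status"] == "passed" for s in statuses)
    return {"result": "passed" if passed else "failed", "checks": statuses}
```

One consequence of the rule is that a single misspelled check ID fails the whole verification even when every real check succeeds.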
Stage 5 — Report export (on demand)
POST /audit/reports/export creates an audit_reports row with status: "queued" and enqueues Krafter.Workers.ReportExportWorker. The worker:
- Loads every finding for the project (limit: 10_000, ordered by priority_score desc, inserted_at asc).
- Builds format-specific content: JSON, CSV, or HTML (the pdf format currently emits HTML, not a binary PDF).
- Uploads the artefact to Garage S3 at audit/<team_id>/<project_id>/<report_id>.<ext>.
- Updates the report row to status: "exported" and stamps snapshot.export_url.
If the upload fails, the worker sets status: "failed" and leaves snapshot.export_url: nil.
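The ordering and object-key layout used by the export can be reproduced in a short Python sketch, which may help when reconciling an exported file with API results. Both function names are hypothetical. The sketch assumes every finding carries a numeric priority_score and a sortable inserted_at value (ISO 8601 strings here); how unscored findings are ordered is not covered by this page.

```python
def order_findings(findings: list[dict], limit: int = 10_000) -> list[dict]:
    """Sort by priority_score descending, then inserted_at ascending, and cap."""
    ordered = sorted(
        findings,
        key=lambda f: (-f["priority_score"], f["inserted_at"]),
    )
    return ordered[:limit]


def report_object_key(team_id: str, project_id: str, report_id: str, ext: str) -> str:
    """Build the key audit/<team_id>/<project_id>/<report_id>.<ext>."""
    return f"audit/{team_id}/{project_id}/{report_id}.{ext}"
```

Ties on priority_score fall back to insertion time, so older findings appear first within the same priority.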
End-to-end latency
| Stage | Typical wall time |
|---|---|
| Kickoff | <100 ms (HTTP response) |
| Parallel scanners | 5–60 s (PageSpeed call dominates) |
| AI analysis | 5–30 s (depends on finding count) |
| Verification | 1–10 s per task |
| Report export | 1–5 s + S3 upload |
A typical scan with no AI bottleneck completes in well under one minute. Verification and report export run on demand, so their time is added only when you trigger them. Track progress via the PubSub topic audit:scan:<scan_run_id> or by polling GET /audit/scans/:id.
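A polling loop for GET /audit/scans/:id might look like the Python sketch below. It is written against an injected fetch_scan callable so that it stays independent of any HTTP library; that callable, the poll_scan name, and the assumption that the response is a JSON object with a status key are all hypothetical. Only "queued" and "running" are named on this page, so the loop treats any other status as terminal instead of guessing the terminal status names.

```python
import time
from typing import Callable


def poll_scan(
    fetch_scan: Callable[[str], dict],
    scan_id: str,
    *,
    interval_s: float = 2.0,
    timeout_s: float = 120.0,
    in_progress: tuple[str, ...] = ("queued", "running"),
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll until the scan leaves the in-progress statuses or the timeout hits."""
    deadline = clock() + timeout_s
    while True:
        scan = fetch_scan(scan_id)
        if scan.get("status") not in in_progress:
            return scan
        if clock() >= deadline:
            raise TimeoutError(f"scan {scan_id} still in progress after {timeout_s}s")
        sleep(interval_s)
```

The 120-second default leaves headroom over the upper bounds in the table (60 s for the scanners plus 30 s for AI analysis). A 2-second interval issues about 30 requests per minute per scan; whether polling requests count against a rate limit is not stated here, so a longer interval may be the safer choice.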
Failure modes
- Scanner failure: the failing scanner's scanner_status slot is set to failed. Other scanners keep running. The scan still finalises once all four have reached a terminal state, and the AI analyzer runs over whatever findings were produced.
- AI analyzer failure: the scan completes anyway; summary.health_score and summary.summary are omitted, but findings remain queryable.
- Verification failure: the verification row holds result: "failed" and the per-check status array; the linked task and finding are not transitioned.
- Report export failure: status: "failed" and the snapshot retains its original metrics.
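
Since a scan can complete with partial results, a client may want to surface which parts are missing. The Python sketch below assumes the scan run is available as a dict with scanner_status and summary keys shaped as described on this page; the summarize_scan helper and that response shape are assumptions, not a documented contract.

```python
def summarize_scan(scan: dict) -> dict:
    """Report which scanners failed and whether AI output is present."""
    status = scan.get("scanner_status") or {}
    summary = scan.get("summary") or {}
    return {
        "failed_scanners": sorted(
            name for name, state in status.items() if state == "failed"
        ),
        "has_ai_summary": "health_score" in summary and "summary" in summary,
    }
```
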
Next steps
- LLM Operations — Provider routing, models, quotas, and fallback behaviour
- Findings Guide — Status lifecycle for the findings the pipeline produces
- Remediation Guide — Tasks, verifications, and reports built on top of the pipeline