How it works

How StackProof analyzes your repos

A walkthrough of the scan pipeline for engineers who want to understand what runs when they connect a repository.

A StackProof scan runs through seven sequential phases: guard checks, repository fetch and file selection, dual-agent AI analysis, citation verification, reconciliation and scoring, career packet generation, and storage and cleanup. Source files are deleted once the scan finishes. A typical Pro scan takes 30 to 120 seconds, depending on repository size.

01

Guard checks

Before any analysis work begins, two guards run. A deduplication check (via Redis) blocks a second concurrent scan on the same repository if one is already in progress, returning the ID of the running scan so the client can poll for it. A quota check reads the user's monthly scan counter from the database and returns an error with a reset timestamp if the limit is exhausted.
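The deduplication guard behaves like a set-if-absent lock keyed by repository. The sketch below is an in-memory stand-in for that Redis lock, not StackProof's implementation; `ScanLocks`, `try_acquire`, `release`, and the `octo/repo` key are hypothetical names chosen for illustration.

```python
import threading
from typing import Dict, Tuple


class ScanLocks:
    """In-memory stand-in for a Redis set-if-absent lock (illustrative only)."""

    def __init__(self) -> None:
        self._running: Dict[str, str] = {}  # repo key -> ID of the running scan
        self._mutex = threading.Lock()

    def try_acquire(self, repo: str, scan_id: str) -> Tuple[bool, str]:
        """Return (True, scan_id) if the lock was taken, else (False, running_id)."""
        with self._mutex:
            running = self._running.get(repo)
            if running is not None:
                # A scan is already in progress: hand back its ID so the client can poll.
                return False, running
            self._running[repo] = scan_id
            return True, scan_id

    def release(self, repo: str) -> None:
        """Clear the lock once the scan has finished or failed."""
        with self._mutex:
            self._running.pop(repo, None)


locks = ScanLocks()
first = locks.try_acquire("octo/repo", "scan-1")   # lock taken
second = locks.try_acquire("octo/repo", "scan-2")  # blocked, gets "scan-1" back
locks.release("octo/repo")
third = locks.try_acquire("octo/repo", "scan-3")   # lock free again
```

With a real Redis instance, the same effect is usually achieved with `SET key value NX` plus an expiry, so that a crashed scan cannot hold the lock indefinitely.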

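Quota enforcement uses a conditional update (updateMany WHERE monthlyScansUsed < limit), a form of optimistic locking: the counter is only incremented while it is still below the limit, so concurrent requests cannot push it past the limit.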
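The guarded update can be sketched with SQLite from the Python standard library. This is an illustration of the pattern only: the `users` table, the `monthly_scans_used` column, and `try_consume_scan` are assumed names, and the production system may use a different database and client.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, monthly_scans_used INTEGER NOT NULL)"
)
conn.execute("INSERT INTO users (id, monthly_scans_used) VALUES (1, 0)")


def try_consume_scan(conn: sqlite3.Connection, user_id: int, limit: int) -> bool:
    """Increment the counter only while it is below the limit.

    The check and the increment happen in a single statement, so two
    concurrent requests cannot both pass the check on the same counter value.
    """
    cur = conn.execute(
        "UPDATE users SET monthly_scans_used = monthly_scans_used + 1 "
        "WHERE id = ? AND monthly_scans_used < ?",
        (user_id, limit),
    )
    conn.commit()
    # rowcount is 1 when the row matched the guard, 0 when the quota is exhausted.
    return cur.rowcount == 1


results = [try_consume_scan(conn, 1, limit=2) for _ in range(3)]
used = conn.execute(
    "SELECT monthly_scans_used FROM users WHERE id = 1"
).fetchone()[0]
```

The third call matches no row, so the caller can return the quota error without a separate read-then-write step.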

02

Repository fetch and file selection

The repository is fetched from GitHub using the access token associated with your account. Before downloading any file content, a tree check runs against the GitHub tree API to count files and estimate total size without consuming tokens.

Repositories within the synchronous size limit are scanned in real time. Larger repositories are queued for background processing; the client receives a queued status immediately and polls for results.
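The routing decision can be pictured as a comparison of the tree totals against a limit. The thresholds below are placeholders, since the actual synchronous limit is not stated here; `TreeEntry`, `choose_mode`, `SYNC_MAX_FILES`, and `SYNC_MAX_BYTES` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TreeEntry:
    path: str
    size: int  # bytes, as reported by the tree listing


# Placeholder thresholds (assumptions, not StackProof's real limits).
SYNC_MAX_FILES = 2_000
SYNC_MAX_BYTES = 20 * 1024 * 1024


def choose_mode(tree: List[TreeEntry]) -> str:
    """Return "sync" when the tree fits both limits, otherwise "queued"."""
    total_bytes = sum(entry.size for entry in tree)
    if len(tree) <= SYNC_MAX_FILES and total_bytes <= SYNC_MAX_BYTES:
        return "sync"
    return "queued"


small = [TreeEntry("src/app.ts", 4_000), TreeEntry("README.md", 1_200)]
large = [TreeEntry(f"src/f{i}.ts", 50_000) for i in range(3_000)]
```

Because the decision uses only tree metadata, it can be made before any file content is downloaded.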

File selection then ranks the tree by signal (source files over config, entry points and services over generated output) and selects the highest-signal files for analysis, currently up to 50 per scan across all tiers. Per-tier caps are planned.
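A ranking of this kind can be approximated with a small scoring heuristic. The weights, extensions, and directory names below are invented for illustration; only the cap of 50 files comes from the description above.

```python
from typing import List

MAX_FILES = 50  # per-scan cap described above

SOURCE_EXTENSIONS = (".ts", ".tsx", ".js", ".py", ".go", ".rs", ".java")
ENTRY_POINT_NAMES = {"main", "index", "app", "server"}


def signal_score(path: str) -> int:
    """Hypothetical heuristic: a higher score means more worth analyzing."""
    p = path.lower()
    name = p.rsplit("/", 1)[-1]
    score = 0
    if p.endswith(SOURCE_EXTENSIONS):
        score += 3  # source files rank above config
    if name.split(".")[0] in ENTRY_POINT_NAMES:
        score += 2  # entry points
    if "/services/" in f"/{p}":
        score += 2  # service modules
    if p.startswith(("dist/", "build/")) or "/generated/" in f"/{p}":
        score -= 10  # generated output sinks to the bottom
    return score


def select_files(paths: List[str], limit: int = MAX_FILES) -> List[str]:
    """Return the highest-signal paths, at most `limit` of them."""
    # sorted() is stable, so files with equal scores keep their tree order.
    return sorted(paths, key=signal_score, reverse=True)[:limit]


tree = [
    "package.json",
    "dist/bundle.js",
    "src/services/billing.ts",
    "src/index.ts",
    "src/util.ts",
]
top_three = select_files(tree, limit=3)
```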

Individual files larger than 100 KB (sync) or 500 KB (async) are skipped. The scan continues with remaining files and records a size-violation count in the metadata. Generated or minified files are excluded from analysis regardless of size.
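The per-file size rule amounts to a filter that keeps a count of what it skipped. In this sketch `filter_by_size` is a hypothetical helper, and 1 KB is assumed to mean 1024 bytes.

```python
from typing import Dict, List, Tuple

# Per-file limits described above, assuming 1 KB = 1024 bytes.
MAX_FILE_BYTES = {"sync": 100 * 1024, "async": 500 * 1024}


def filter_by_size(files: Dict[str, int], mode: str) -> Tuple[List[str], int]:
    """Return (kept paths, number of files skipped for exceeding the limit)."""
    limit = MAX_FILE_BYTES[mode]
    kept = [path for path, size in files.items() if size <= limit]
    return kept, len(files) - len(kept)


sizes = {"a.ts": 10_000, "b.ts": 200_000, "c.ts": 600_000}
sync_result = filter_by_size(sizes, "sync")
async_result = filter_by_size(sizes, "async")
```

The second element of the result corresponds to the size-violation count recorded in the scan metadata.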

03

Dual-agent AI analysis

On the Pro and Lifetime tiers, the selected files are sent to two AI agents running in parallel; Scout runs a single agent. Model assignment is fixed by tier; there is no dynamic routing.

| Tier | Agent A | Agent B | Mode |
| --- | --- | --- | --- |
| Scout | Gemini Flash | None | Single-agent, max 3 findings |
| Pro | Gemini 3.1 Pro | Gemini 3 Flash | Dual-agent consensus |
| Lifetime (BYOK) | User-supplied | User-supplied | Dual-agent (BYOK) |

Each agent produces a structured set of findings independently. Findings include a category (security, architecture, code quality), a severity level, a description, and a citation: a specific file path and line range from the repository snapshot.
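One way to model a finding is sketched below. The field names and the severity levels are assumptions (the levels are not listed here); the three categories are the ones named above, and line numbers are assumed to be 1-based.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    SECURITY = "security"
    ARCHITECTURE = "architecture"
    CODE_QUALITY = "code quality"


class Severity(Enum):
    # Hypothetical levels; the actual scale is not documented here.
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"


@dataclass(frozen=True)
class Citation:
    path: str
    start_line: int
    end_line: int

    def __post_init__(self) -> None:
        # Reject citations that cannot point at real lines (1-based, assumed).
        if not self.path or self.start_line < 1 or self.end_line < self.start_line:
            raise ValueError("citation needs a path and a valid line range")


@dataclass(frozen=True)
class Finding:
    category: Category
    severity: Severity
    description: str
    citation: Citation


example = Finding(
    category=Category.SECURITY,
    severity=Severity.HIGH,
    description="SQL query built by string concatenation",
    citation=Citation(path="src/db.py", start_line=18, end_line=22),
)
```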

The token budget per scan is fixed by tier, with Pro receiving several times the input budget of Scout.

If Agent B is unreachable or times out, the scan completes on Agent A's output alone and the report is flagged. If both agents fail, the scan fails entirely and does not count against your quota.
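The parallel run and its fallback can be sketched with a thread pool. `run_agents`, `ScanFailed`, and the simplified agent signature are hypothetical. The behavior when only Agent A fails is not described above; treating it symmetrically here is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional

# Simplified agent signature: file paths in, findings out.
Agent = Callable[[List[str]], List[str]]


class ScanFailed(Exception):
    """Both agents failed; the caller should not count the scan against quota."""


def run_agents(
    agent_a: Agent, agent_b: Agent, files: List[str], timeout_s: float = 60.0
) -> Dict[str, object]:
    pool = ThreadPoolExecutor(max_workers=2)
    # Both agents start immediately and run side by side.
    futures = {"a": pool.submit(agent_a, files), "b": pool.submit(agent_b, files)}
    outputs: Dict[str, Optional[List[str]]] = {}
    for name, future in futures.items():
        try:
            outputs[name] = future.result(timeout=timeout_s)
        except Exception:
            # Covers both agent errors and timeouts.
            outputs[name] = None
    # Do not block on a worker that is still running after a timeout.
    pool.shutdown(wait=False)

    if outputs["a"] is None and outputs["b"] is None:
        raise ScanFailed("both agents failed")
    degraded = any(value is None for value in outputs.values())
    return {"a": outputs["a"], "b": outputs["b"], "degraded": degraded}


def healthy_agent(files: List[str]) -> List[str]:
    return ["finding"]


def unreachable_agent(files: List[str]) -> List[str]:
    raise RuntimeError("agent unreachable")


partial = run_agents(healthy_agent, unreachable_agent, ["src/app.py"])
```

The `degraded` flag is what a report would use to show that it was produced from a single agent's output.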

04

Citation verification

Before any finding reaches the database, it goes through three verification steps designed to prevent AI hallucination from entering the report.

The Bailiff checks that each finding includes a citation (file path + line range) referencing the actual repository snapshot. Findings without a verifiable citation are rejected or downgraded. A fuzzy ±5-line tolerance handles minor model drift on exact line numbers.

A citation verifier confirms that cited file paths exist in the snapshot. A prose bailiff runs on the generated report text and removes any claim that cannot be traced back to a finding in the structured output.
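The citation checks (path existence and the ±5-line tolerance) might look like the following. How the real Bailiff matches a finding to source lines is not documented here; this sketch assumes each finding carries a quoted snippet and accepts the citation if that snippet appears within five lines of the cited range. `citation_matches` and its parameters are hypothetical.

```python
from typing import Dict, List

LINE_TOLERANCE = 5  # the ±5-line tolerance described above


def citation_matches(
    snapshot: Dict[str, List[str]],
    path: str,
    start: int,
    end: int,
    quoted: str,
    tolerance: int = LINE_TOLERANCE,
) -> bool:
    """Check that `quoted` appears near the cited 1-based line range."""
    lines = snapshot.get(path)
    if lines is None or start < 1 or end < start:
        return False  # unknown file or malformed range
    lo = max(1, start - tolerance)
    hi = min(len(lines), end + tolerance)
    window = lines[lo - 1 : hi]
    needle = quoted.strip()
    return bool(needle) and any(needle in line for line in window)


# A 40-line file whose line 20 contains the code the finding refers to.
source = [f"line {i}" for i in range(1, 41)]
source[19] = 'query = "SELECT * FROM users WHERE id = " + user_id'
snapshot = {"src/db.py": source}

near_miss = citation_matches(snapshot, "src/db.py", 17, 17, "user_id")
far_miss = citation_matches(snapshot, "src/db.py", 10, 10, "user_id")
bad_path = citation_matches(snapshot, "src/missing.py", 20, 20, "user_id")
```

A citation that is three lines off still verifies, while one that is ten lines off, or that points at a file outside the snapshot, does not.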

The scan report includes a verification rate metric (citations verified vs. total AI checks run) visible in the report UI.

05

Reconciliation and scoring

The reconciler service compares Agent A and Agent B's finding sets and produces a consensus output. Where agents agree, findings are confirmed. Where they disagree, the reconciler resolves conflicts according to severity and citation quality.
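A simplified reconciler is sketched below. The matching rule (same category, same file, overlapping line ranges), the severity scale, and the choice to keep the more severe of two agreeing findings are all assumptions made for illustration; the actual conflict rules are not spelled out here, and this sketch simply leaves single-agent findings unconfirmed.

```python
from dataclasses import dataclass, replace
from typing import List

SEVERITY_RANK = {"low": 1, "medium": 2, "high": 3, "critical": 4}  # hypothetical scale


@dataclass(frozen=True)
class Finding:
    category: str
    severity: str
    path: str
    start: int
    end: int
    confirmed: bool = False


def same_issue(a: Finding, b: Finding) -> bool:
    """Assumed agreement rule: same category and file, overlapping line ranges."""
    return (
        a.category == b.category
        and a.path == b.path
        and a.start <= b.end
        and b.start <= a.end
    )


def reconcile(a_findings: List[Finding], b_findings: List[Finding]) -> List[Finding]:
    consensus: List[Finding] = []
    unmatched_b = list(b_findings)
    for a in a_findings:
        match = next((b for b in unmatched_b if same_issue(a, b)), None)
        if match is None:
            consensus.append(a)  # only Agent A reported it: stays unconfirmed
            continue
        unmatched_b.remove(match)
        # Both agents agree: keep the more severe version and mark it confirmed.
        stronger = max(a, match, key=lambda f: SEVERITY_RANK[f.severity])
        consensus.append(replace(stronger, confirmed=True))
    consensus.extend(unmatched_b)  # only Agent B reported these
    return consensus


agent_a = [
    Finding("security", "high", "src/db.py", 10, 14),
    Finding("code quality", "low", "src/util.py", 1, 3),
]
agent_b = [
    Finding("security", "critical", "src/db.py", 12, 16),
    Finding("architecture", "medium", "src/app.py", 5, 9),
]
merged = reconcile(agent_a, agent_b)
```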

Scoring starts at 100 and applies deductions for verified findings based on their severity. The score is an integer from 0 to 100. If the same repository has been scanned before, the report includes a score delta and a direction indicator (up, down, or flat). The delta helps track whether a repository is improving over time.
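The scoring rule can be illustrated with a small function. The deduction weights and severity names are placeholders, since the real values are not published here; `score` and `score_delta` are hypothetical names.

```python
from typing import List, Optional, Tuple

# Placeholder deductions per verified finding (assumptions, not the real weights).
DEDUCTIONS = {"low": 2, "medium": 5, "high": 10, "critical": 20}


def score(severities: List[str]) -> int:
    """Start at 100, subtract per-finding deductions, clamp at 0."""
    total = sum(DEDUCTIONS[s] for s in severities)
    return max(0, 100 - total)


def score_delta(current: int, previous: Optional[int]) -> Optional[Tuple[int, str]]:
    """Return (delta, direction), or None when there is no earlier scan."""
    if previous is None:
        return None
    diff = current - previous
    direction = "up" if diff > 0 else "down" if diff < 0 else "flat"
    return diff, direction


current = score(["high", "medium", "low"])  # 100 - 10 - 5 - 2
```

The clamp keeps the result inside the 0 to 100 range even for repositories with many severe findings.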

06

Career packet generation

For Pro and Lifetime scans, a narrator agent (Gemini Flash) generates a prose career packet from the structured findings. The narrator's only input is the verified finding set, so it cannot introduce claims that are not grounded in a prior finding. The prose bailiff enforces this constraint.

The career packet includes a narrative summary of the scan results, a skill evidence section drawn from your repositories, and interview preparation material tailored to the findings. It is stored encrypted in your account and is available to export.

If the prose generation step fails, the structured report is returned without the prose section. The scan is not failed; it is marked degraded, and the structured findings remain fully available.
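The degrade-instead-of-fail behavior is a try/except around the narrator call. In this sketch `build_report`, the report keys, and the warning text are hypothetical.

```python
from typing import Callable, Dict, List


def build_report(findings: List[str], narrate: Callable[[List[str]], str]) -> Dict[str, object]:
    """Attach prose when the narrator succeeds; otherwise mark the report degraded."""
    report: Dict[str, object] = {
        "findings": findings,
        "prose": None,
        "status": "complete",
        "warnings": [],
    }
    try:
        report["prose"] = narrate(findings)
    except Exception:
        # The structured findings are still returned; only the prose is missing.
        report["status"] = "degraded"
        report["warnings"] = ["prose skipped"]
    return report


def working_narrator(findings: List[str]) -> str:
    return f"{len(findings)} verified finding(s) summarized."


def failing_narrator(findings: List[str]) -> str:
    raise TimeoutError("narrator timed out")


ok_report = build_report(["finding-1"], working_narrator)
degraded_report = build_report(["finding-1"], failing_narrator)
```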

07

Storage and cleanup

The completed report is written to the database encrypted (AES-256-GCM when user retention is enabled). The fetched repository files are deleted and the Redis deduplication lock is cleared. The monthly scan counter is incremented unless the scan fell within the 24-hour free re-scan window of an existing scan.

The scan result returned to the client includes the scan ID, analysis, scores, remaining quota, and any warnings (storage degraded, agents degraded, prose skipped).

Re-scans

A re-scan is a scoped re-verification of an existing scan's findings rather than a full repository analysis. File selection is restricted to paths flagged in the parent scan, so a re-scan uses roughly 20% of the token budget of a full scan. Re-scans are tracked with a separate quota counter (Pro: 16/month).

The scope restriction also prevents using re-scans as a backdoor to perform a full analysis at lower cost.
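The scope restriction is an intersection of the current tree with the paths flagged in the parent scan. `rescan_scope` is a hypothetical helper used only to illustrate the idea.

```python
from typing import Iterable, List


def rescan_scope(tree_paths: Iterable[str], parent_flagged: Iterable[str]) -> List[str]:
    """Keep only files that the parent scan flagged and that still exist."""
    flagged = set(parent_flagged)
    return [path for path in tree_paths if path in flagged]


current_tree = ["src/app.py", "src/db.py", "src/new_feature.py"]
flagged_in_parent = ["src/db.py", "src/deleted.py"]
scope = rescan_scope(current_tree, flagged_in_parent)
```

New files such as `src/new_feature.py` fall outside the scope, which is why a re-scan cannot stand in for a full analysis.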

Timing reference

| Phase | Typical duration |
| --- | --- |
| Guard checks | < 200 ms |
| GitHub repo fetch | 2 to 15 s (varies by size) |
| File selection | < 500 ms |
| AI agent analysis (per agent) | 10 to 60 s |
| Citation verification | 2 to 10 s |
| Prose report generation | 5 to 30 s |
| Database storage | < 500 ms |
| Total (typical Pro scan) | 30 to 120 s |
