
CodeQL integration

CodeQL is GitHub's static analysis engine, the same one that powers the code-scanning feature of GitHub Advanced Security. It supports Python, JavaScript/TypeScript, Go, Java, Kotlin, C/C++, C#, Ruby, and Swift. Under cilock, the SARIF report that codeql database analyze writes becomes a signed in-toto attestation linked to the source tree it analyzed.

CodeQL runs in two steps. First, codeql database create extracts a queryable database from your source. Then codeql database analyze runs query packs against that database and writes the findings as SARIF. Only the analyze step produces structured output, so that is the step to wrap with cilock. The database is usually built beforehand, outside cilock, but you can also create it in a separate cilock step.

Upstream: CodeQL · GitHub (Microsoft) · proprietary
Category: vulnerability-scan (primary)
Catalog source: catalog-only (detected; output captured via a format attestor)
Emits formats: sarif
Recommended trace: off (no syscall tracing needed)
Detected when:
  • preargv_prefix: codeql

Confirm cilock detects it:

cilock plan --format=json -- codeql [...]

The facts in this box are generated from the cilock binary's own catalog (cilock tools list). Do not hand-edit — run npm run gen:catalog.

Validated invocation

# Step 1 (outside cilock): build the CodeQL database for the language.
codeql database create codeql-db \
  --language=python \
  --source-root=src

# Step 2 (wrapped): cilock invokes codeql database analyze directly. The SARIF
# is captured as a v0.3 product Merkle leaf; the sarif attestor parses it.
cilock run --step codeql-scan \
  --signer-file-key-path key.pem \
  --outfile attestation.json \
  --attestations sarif,environment,git \
  --enable-archivista=false \
  -- codeql database analyze codeql-db \
  --format=sarif-latest \
  --output=codeql.sarif \
  codeql/python-queries:codeql-suites/python-security-and-quality.qls

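This is the exact command exercised in tool-codeql-sarif. For another language, set --language accordingly in step 1. Then replace python in both the pack name and the suite name. The available packs are go-queries, javascript-queries (which covers JavaScript and TypeScript), java-queries (which covers Java and Kotlin), cpp-queries, csharp-queries, ruby-queries, and swift-queries. Each has a matching <language>-security-and-quality.qls suite. The python-security-and-quality.qls suite covers both security and code-quality queries; switch to python-code-scanning.qls for the narrower GitHub-default set.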
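In a multi-language pipeline, you can script that substitution with a small helper that builds the pack:suite reference from a language name. This is a sketch. The function name codeql_suite_ref is hypothetical. The mapping assumes the standard codeql/<language>-queries packs listed above, with TypeScript sharing the JavaScript pack and Kotlin sharing the Java pack.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the pack:suite reference for a CodeQL language.
# Usage: codeql_suite_ref <language> [suite]
#   suite defaults to "security-and-quality"; pass "code-scanning" for the
#   narrower GitHub-default set.
codeql_suite_ref() {
  local lang="$1"
  local suite="${2:-security-and-quality}"
  local pack
  case "$lang" in
    python) pack=python ;;
    go) pack=go ;;
    javascript | typescript) pack=javascript ;; # one pack covers JS and TS
    java | kotlin) pack=java ;;                 # one pack covers Java and Kotlin
    c | cpp | "c++") pack=cpp ;;
    csharp | "c#") pack=csharp ;;
    ruby) pack=ruby ;;
    swift) pack=swift ;;
    *)
      echo "unsupported language: $lang" >&2
      return 1
      ;;
  esac
  printf 'codeql/%s-queries:codeql-suites/%s-%s.qls\n' "$pack" "$pack" "$suite"
}
```

For example, codeql_suite_ref typescript code-scanning prints codeql/javascript-queries:codeql-suites/javascript-code-scanning.qls.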

What gets captured

Predicate type | Contents
https://aflock.ai/attestations/environment/v0.1 | host OS, kernel, env vars (sensitive ones obfuscated)
https://aflock.ai/attestations/git/v0.1 | commit hash, branch, tags, dirty status, parents
https://aflock.ai/attestations/material/v0.3 | Merkle root over the source tree and CodeQL database before analyze runs
https://aflock.ai/attestations/command-run/v0.1 | literal codeql database analyze … argv, exit code, ptrace
https://aflock.ai/attestations/product/v0.3 | Merkle root over codeql.sarif as a real product file
https://aflock.ai/attestations/sarif/v0.1 | parsed SARIF (rules and results from CodeQL's query pack)

Why this shape

Antipattern | Correct shape (this example)
cilock run ... -- bash -c "codeql database analyze ... && cp codeql.sarif codeql-product.sarif" | cilock run ... -- codeql database analyze ... --output=codeql.sarif
command-run.cmd records ["bash","-c","codeql ... && cp ..."] | command-run.cmd records the literal codeql argv
The ptrace spy traces bash and cp, not codeql | The spy traces codeql's syscalls because cilock is its parent
The product is a copy of a file codeql wrote outside cilock | The product is the SARIF codeql wrote inside the wrapped step

codeql database analyze accepts --output=<file> natively, so no shell wrapper is needed. It also exits 0 when findings are present, so no soft-fail flag is needed either. This differs from gosec, hadolint, and checkov, which need -no-fail or -s.

Validate it locally

List the predicate types in the captured envelope:

jq -r '.payload' attestation.json | base64 -d | jq '.predicate.attestations | map(.type)'

Expected output:

[
"https://aflock.ai/attestations/environment/v0.1",
"https://aflock.ai/attestations/git/v0.1",
"https://aflock.ai/attestations/material/v0.3",
"https://aflock.ai/attestations/command-run/v0.1",
"https://aflock.ai/attestations/product/v0.3",
"https://aflock.ai/attestations/sarif/v0.1"
]
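You can turn that comparison into a CI check that fails when a predicate type is missing. The sketch below is minimal. It assumes the envelope layout used by the jq commands on this page: a base64 payload whose decoded JSON has .predicate.attestations[].type. The name check_predicates is hypothetical, and jq and base64 must be on the PATH.

```shell
#!/usr/bin/env bash
# Hypothetical CI check: fail if any expected predicate type is missing
# from the attestation envelope written by `cilock run --outfile`.
# Usage: check_predicates <attestation.json>
check_predicates() {
  local envelope="$1"
  local types want missing=0
  types=$(jq -r '.payload' "$envelope" | base64 -d | jq -r '.predicate.attestations[].type') || return 2

  for want in \
    https://aflock.ai/attestations/environment/v0.1 \
    https://aflock.ai/attestations/git/v0.1 \
    https://aflock.ai/attestations/material/v0.3 \
    https://aflock.ai/attestations/command-run/v0.1 \
    https://aflock.ai/attestations/product/v0.3 \
    https://aflock.ai/attestations/sarif/v0.1
  do
    # -x: whole-line match; -F: fixed string (the URLs contain dots)
    if ! grep -qxF "$want" <<<"$types"; then
      echo "missing predicate: $want" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

The function reports every missing type on stderr, not just the first, so a single CI run shows the whole gap.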

Confirm that command-run.cmd carries the literal codeql argv, which proves the cp antipattern is gone:

jq -r '.payload' attestation.json | base64 -d \
| jq '.predicate.attestations[] | select(.type=="https://aflock.ai/attestations/command-run/v0.1") | .attestation.cmd'
# ["codeql","database","analyze","codeql-db","--format=sarif-latest",
# "--output=codeql.sarif",
# "codeql/python-queries:codeql-suites/python-security-and-quality.qls"]

Count the findings in the captured SARIF:

jq '.runs[0].results | length' codeql.sarif
# 6 against the fixture in the examples repo (4 CWEs + 2 unused imports)
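For a quick local look before the policy gate runs, you can summarize the same SARIF per rule and check it against a threshold. This is a convenience sketch, not a substitute for gating in policy Rego over the captured attestation. The function name sarif_gate and its threshold argument are hypothetical, and jq must be installed. It counts results across all runs in the file rather than only runs[0].

```shell
#!/usr/bin/env bash
# Hypothetical local pre-check: summarize CodeQL findings per rule and fail
# when the total exceeds a threshold. Policy Rego remains the real gate.
# Usage: sarif_gate <file.sarif> [max_findings]
sarif_gate() {
  local sarif="$1"
  local max="${2:-0}"
  local total

  # Count results across every run in the file (CodeQL usually emits one).
  total=$(jq '[.runs[].results[]?] | length' "$sarif") || return 2

  # Per-rule breakdown, most frequent first.
  jq -r '.runs[].results[]? | .ruleId // "unknown"' "$sarif" | sort | uniq -c | sort -rn

  echo "total findings: $total (max allowed: $max)"
  [ "$total" -le "$max" ]
}
```

For example, sarif_gate codeql.sarif 6 passes against the fixture described above, and sarif_gate codeql.sarif fails on any finding.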

Notes

  • Query suite selection. python-security-and-quality.qls includes security queries (CWE-78/89/94/611/etc.) plus code-quality lints (unused imports, dead code). For the narrower set GitHub Advanced Security ships by default, use python-code-scanning.qls. Name the suite explicitly in CI so finding counts stay stable between runs.
  • Multi-language repos. Build one database per language (for example, codeql database create db-python --language=python, then codeql database create db-javascript --language=javascript). Wrap each analyze step in its own cilock step so the per-language attestations stay independent.
  • Database as a separate cilock step. Wrapping codeql database create under cilock as well gives you a command-run/v0.1 record of the extraction step plus a product/v0.3 digest over the database directory. This lets you verify that the analyze step ran against the database that create actually produced. Pair it with attestationsFrom in the release-gate Rego.
  • Exit codes. codeql database analyze always exits 0 when the query suite runs to completion. Gate findings in policy Rego over the captured SARIF, not on the tool exit code.
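The multi-language and separate-database notes above can be combined into a per-language loop. This sketch rests on several assumptions. The step names (codeql-db-<lang>, codeql-scan-<lang>), the output filenames, the environment,git attestor list for the create step, and the DRY_RUN switch are all hypothetical. The cilock flags are the ones from the validated invocation. The loop assumes each language name is also its pack name. With DRY_RUN=1, the script prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Hypothetical per-language loop: each language gets its own database, one
# cilock step for `database create`, and a separate cilock step for
# `database analyze`, so per-language attestations stay independent.

# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

scan_language() {
  local lang="$1"
  local db="db-$lang"

  # Optional: attest extraction as its own step (command-run plus a product
  # digest over the database directory). The attestor list is an assumption.
  run cilock run --step "codeql-db-$lang" \
    --signer-file-key-path key.pem \
    --outfile "attestation-db-$lang.json" \
    --attestations environment,git \
    --enable-archivista=false \
    -- codeql database create "$db" --language="$lang" --source-root=src || return 1

  # Wrapped analyze step, same shape as the validated invocation.
  run cilock run --step "codeql-scan-$lang" \
    --signer-file-key-path key.pem \
    --outfile "attestation-scan-$lang.json" \
    --attestations sarif,environment,git \
    --enable-archivista=false \
    -- codeql database analyze "$db" \
    --format=sarif-latest \
    --output="codeql-$lang.sarif" \
    "codeql/$lang-queries:codeql-suites/$lang-security-and-quality.qls"
}

# Assumes each language name is also its pack name (python, go, javascript, ...).
scan_all() {
  local lang
  for lang in "$@"; do
    scan_language "$lang" || return 1
  done
}
```

Run DRY_RUN=1 scan_all python javascript to review the four cilock invocations, then drop DRY_RUN to execute them.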

FAQ

Does cilock support CodeQL?

Yes. Wrap codeql database analyze ... --output=codeql.sarif with cilock run --attestations sarif,environment,git. The parsed SARIF is recorded in the signed attestation under the https://aflock.ai/attestations/sarif/v0.1 predicate. The literal codeql argv is captured in command-run/v0.1. The SARIF file itself is hashed into the product/v0.3 Merkle tree as a real product.

Do I need to wrap codeql database create too?

Only if your policy needs to verify the database matches the source it was built from. For most workflows, build the database outside cilock and wrap only the analyze step. The analyze step's material/v0.3 digest covers the database directory, so a downstream policy can still verify the analyze ran against a specific database state.

Which query pack should I use?

For security-focused gating, use codeql/<language>-queries:codeql-suites/<language>-security-and-quality.qls. To match what GitHub Advanced Security runs by default, use codeql/<language>-queries:codeql-suites/<language>-code-scanning.qls. Pre-download the pack with codeql pack download codeql/<language>-queries so CI doesn't depend on network access at analyze time.
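The pre-download can run once, before any wrapped analyze step. This is a sketch: the function name prefetch_packs and the DRY_RUN switch are hypothetical. It assumes codeql pack download accepts several pack references in one call, which its usage line indicates.

```shell
#!/usr/bin/env bash
# Hypothetical CI prefetch step: download the standard query packs once,
# before any wrapped analyze step, so analysis does not need the network.
# Usage: prefetch_packs <language>...   (e.g. python go javascript)
prefetch_packs() {
  local lang
  local -a packs=()
  for lang in "$@"; do
    packs+=("codeql/$lang-queries")
  done
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo codeql pack download "${packs[@]}"
  else
    codeql pack download "${packs[@]}"
  fi
}
```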

Can I use the same SARIF in GitHub's code-scanning UI?

Yes. cilock does not modify the file. The SARIF it captures is byte-identical to what codeql database analyze --output writes, and its digest is recorded in the signed attestation. Upload it to GitHub's code-scanning/sarifs API and you get the same UI as a non-cilock CodeQL run. The signed attestation stays available as a separate piece of evidence.
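You can run that upload from a shell step. The sketch below assumes the GitHub CLI (gh) and jq are installed, and that gh is authenticated with permission to write code-scanning results. The function names are hypothetical. The REST endpoint is POST /repos/{owner}/{repo}/code-scanning/sarifs. It expects commit_sha, ref, and the SARIF gzip-compressed and then base64-encoded.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the JSON body that GitHub's
# POST /repos/{owner}/{repo}/code-scanning/sarifs endpoint expects.
# The API wants the SARIF gzip-compressed, then base64-encoded.
# Usage: build_sarif_upload <file.sarif> <commit_sha> <ref> <out.json>
build_sarif_upload() {
  local sarif="$1" sha="$2" ref="$3" out="$4"
  # Stream the encoded SARIF into jq (-R raw, -s slurp) instead of passing
  # it as an argument, so large reports do not hit argv length limits.
  gzip -c "$sarif" | base64 | tr -d '\n' \
    | jq -Rs --arg commit_sha "$sha" --arg ref "$ref" \
        '{commit_sha: $commit_sha, ref: $ref, sarif: .}' > "$out"
}

# Upload with the GitHub CLI; gh fills in {owner}/{repo} from the current
# checkout's remote.
upload_sarif() {
  build_sarif_upload "$@" &&
    gh api --method POST "repos/{owner}/{repo}/code-scanning/sarifs" --input "$4"
}
```

A typical call is upload_sarif codeql.sarif "$(git rev-parse HEAD)" refs/heads/main sarif-upload.json. Upload the plain codeql.sarif, not attestation.json.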



This page is generated from the cilock tool catalog. Don't edit it here — the source is attestation/detection/docs/codeql.doc.md in aflock-ai/rookery. The same catalog powers cilock tools show codeql in the CLI, so a single edit keeps the terminal and the website in sync.
