Exploit feeds, scan ingest, and what runs on a schedule

Exploit-aware ranking is only as good as its data. This page explains the EPSS/KEV feeds, the daily refresh job, and how scans get into Perimeter, and states plainly which parts run live today and which are deferred to the hosted tier.

1 · What’s live vs deferred at a glance

Capability	Status	Notes
Findings model, de-noising, ranking, control mapping, evidence, reports	Live now	Runs entirely in your browser on seeded data.
The ingest pipeline (normalize → dedup → enrich → map controls)	Live now	Run it yourself in the Ingest tab against sample payloads.
EPSS/KEV enrichment	Live now (snapshot)	Point-in-time values embedded in the seed data.
The feed-refresh logic (build/merge/enrich/diff caches)	Live now (pure)	The merge math runs locally; only the network pull is gated.
Live EPSS/KEV network pull	Deferred / hosted	Isolated boundary, off by default; the free/local tier makes zero network calls.
Live scan engines (Nuclei / Trivy / OpenVAS)	Deferred / hosted	Run on the hosted ASM runner (external) and the Lookout agent (internal).
Scheduling / continuous re-discovery / drift alerts	Deferred / hosted	The job descriptor is already built; the shared scheduled runner owns the timer.
Server-side PDF + signed report storage	Deferred / hosted	Meanwhile, the client renders print-to-PDF from the same model.

2 · EPSS and CISA KEV feeds

Two public feeds power exploit-aware ranking:

Feed	Source	What it adds
EPSS	FIRST.org (api.first.org/data/v1/epss)	A daily-updated exploitation probability + percentile per CVE.
CISA KEV	CISA (known_exploited_vulnerabilities.json)	The catalog of confirmed-exploited CVEs, each with a remediation due date.

In the MVP these are point-in-time snapshots baked into the seed data, so ranking works offline. The merge engine that applies a refresh — build a cache, merge a newer one over it (newer wins), re-enrich findings, and diff what changed — is real and runs locally; only the actual network pull is held behind a boundary.
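To make the merge step concrete, here is a minimal TypeScript sketch of building a cache, merging a newer one over it field by field, and re-enriching findings. It assumes EPSS rows shaped like the data[] items of the FIRST API (cve, epss, percentile, date, with scores as strings) and KEV entries shaped like the catalog's vulnerabilities[] items (cveID, dueDate). The names buildCache, mergeCache, enrich, IntelEntry, and Finding are hypothetical and are not Perimeter's actual code.

```typescript
// Assumed input shapes (subset of fields): FIRST EPSS API data[] rows and
// CISA KEV vulnerabilities[] entries. All other names here are hypothetical.
interface EpssRow { cve: string; epss: string; percentile: string; date: string }
interface KevEntry { cveID: string; dueDate: string }

interface IntelEntry {
  epss?: number;       // exploitation probability, 0..1
  percentile?: number; // rank among all scored CVEs, 0..1
  epssDate?: string;   // day the EPSS score applies to
  kev: boolean;        // listed in the KEV catalog?
  kevDueDate?: string; // KEV remediation due date
}
type IntelCache = Map<string, IntelEntry>;

interface Finding { id: string; cve?: string; epss?: number; kev?: boolean }

// Build one cache snapshot from a pull of both feeds.
function buildCache(epss: EpssRow[], kev: KevEntry[]): IntelCache {
  const cache: IntelCache = new Map();
  for (const row of epss) {
    cache.set(row.cve, {
      epss: Number(row.epss),
      percentile: Number(row.percentile),
      epssDate: row.date,
      kev: false,
    });
  }
  for (const entry of kev) {
    const prev = cache.get(entry.cveID) ?? { kev: false };
    cache.set(entry.cveID, { ...prev, kev: true, kevDueDate: entry.dueDate });
  }
  return cache;
}

// Merge a newer snapshot over an older one: any field the newer snapshot
// sets wins; CVEs absent from the newer snapshot are carried over unchanged.
function mergeCache(older: IntelCache, newer: IntelCache): IntelCache {
  const merged: IntelCache = new Map(older);
  newer.forEach((entry, cve) => {
    const prev = merged.get(cve);
    merged.set(cve, prev ? { ...prev, ...entry } : entry);
  });
  return merged;
}

// Re-enrich findings from the merged cache; findings without a CVE pass through.
function enrich(findings: Finding[], cache: IntelCache): Finding[] {
  return findings.map((f) => {
    const intel = f.cve ? cache.get(f.cve) : undefined;
    return intel ? { ...f, epss: intel.epss, kev: intel.kev } : f;
  });
}
```

Because the merge is a pure function of two snapshots, it can run in the browser against the seeded snapshot exactly as it would against a freshly pulled one.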

Deferred / hosted: The live pull does not run in the browser. By design the local tier makes zero network calls, so the pull function refuses to run unless it is explicitly given a fetcher and a live: true opt-in, rather than silently faking data. In production, a scheduled job pulls FIRST EPSS and CISA KEV daily into shared cache tables.
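The opt-in gate described here can be sketched as a function that refuses before doing any I/O. This is a minimal sketch under assumptions: FeedSource, Fetcher, PullOptions, and pullFeeds are hypothetical names, and the caller-supplied fetcher (not this function) knows each feed's endpoint.

```typescript
// Hypothetical names throughout: FeedSource, Fetcher, PullOptions, pullFeeds.
type FeedSource = "epss" | "kev";
// The caller-supplied fetcher knows the endpoint for each source and does the I/O.
type Fetcher = (source: FeedSource) => Promise<unknown>;

interface PullOptions {
  live?: boolean;    // must be exactly true to allow network access
  fetcher?: Fetcher; // no default: nothing can reach the network by accident
}

// Refuses synchronously, before any I/O, unless both opt-ins are present.
// It never substitutes fake data when the pull is not allowed.
function pullFeeds(
  sources: FeedSource[],
  opts: PullOptions = {},
): Promise<Partial<Record<FeedSource, unknown>>> {
  const fetcher = opts.fetcher;
  if (opts.live !== true || !fetcher) {
    throw new Error("Live feed pull is disabled: pass { live: true, fetcher }.");
  }
  return Promise.all(sources.map((s) => fetcher(s))).then((bodies) => {
    const out: Partial<Record<FeedSource, unknown>> = {};
    sources.forEach((s, i) => {
      out[s] = bodies[i];
    });
    return out;
  });
}
```

Injecting the fetcher keeps the merge math testable offline and makes any network access an explicit decision by the caller.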

3 · The daily refresh job

Products don’t each run their own timer. A shared scheduled runner (Workers Cron) owns scheduling, and each product registers a job for the runner to call. Perimeter’s job is the daily exploit-intel refresh:

Field	Value	Why
kind	perimeter.feed_refresh	Product-namespaced job id.
cron	0 6 * * * (06:00 UTC daily)	Lands just after FIRST publishes the day’s EPSS.
payload	{ sources: ["epss","kev"] }	Secret-free — the FIRST/CISA endpoints are public, so no keys or targets are in the job.
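The descriptor in the table can be written down as a plain object. The sketch below assumes a hypothetical ScheduledJob type and adds a small helper, nextDailyRunUtc (also hypothetical), that works out when a fixed daily cron such as 0 6 * * * next fires in UTC. It deliberately handles only the minute-hour-daily form, not general cron syntax.

```typescript
// Hypothetical descriptor type for a job registered with the shared runner.
interface ScheduledJob {
  kind: string;                                // product-namespaced job id
  cron: string;                                // 5-field cron, read here as UTC
  payload: { sources: Array<"epss" | "kev"> }; // secret-free: public feeds only
}

const feedRefreshJob: ScheduledJob = {
  kind: "perimeter.feed_refresh",
  cron: "0 6 * * *",
  payload: { sources: ["epss", "kev"] },
};

// Next fire time strictly after `from` for a fixed daily "M H * * *" cron in UTC.
// Deliberately not a general cron parser.
function nextDailyRunUtc(cron: string, from: Date): Date {
  const m = /^(\d{1,2}) (\d{1,2}) \* \* \*$/.exec(cron);
  if (!m) throw new Error(`Only "M H * * *" crons are handled: ${cron}`);
  const minute = Number(m[1]);
  const hour = Number(m[2]);
  if (minute > 59 || hour > 23) throw new Error(`Out-of-range cron: ${cron}`);
  const next = new Date(
    Date.UTC(from.getUTCFullYear(), from.getUTCMonth(), from.getUTCDate(), hour, minute),
  );
  // Today's slot already passed (or is now): move to tomorrow; month ends roll over.
  if (next.getTime() <= from.getTime()) next.setUTCDate(next.getUTCDate() + 1);
  return next;
}
```

For example, from 2024-05-01T05:59Z the next run is 2024-05-01T06:00Z; from exactly 06:00Z it is 06:00Z the following day.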

When the runner fires the callback, the refresh job pulls the live feeds, merges them over the existing cache (newer wins), and reports which CVEs moved: an EPSS score change or a new KEV listing. Those changes drive “what changed this week” and new-KEV drift alerts.
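The “which CVEs moved” report can be sketched as a comparison of the cache before and after a refresh. This is a minimal TypeScript sketch with hypothetical names (IntelEntry, IntelChange, diffIntel); it treats any EPSS difference as a change, whereas a real implementation might apply a threshold, which this page does not specify.

```typescript
// Hypothetical types: only the fields the diff needs.
interface IntelEntry { epss?: number; kev: boolean }
type IntelCache = Map<string, IntelEntry>;

type IntelChange =
  | { cve: string; kind: "new_kev" }
  | { cve: string; kind: "epss_changed"; from?: number; to: number };

// Compare the cache before and after a refresh and report which CVEs moved.
function diffIntel(before: IntelCache, after: IntelCache): IntelChange[] {
  const changes: IntelChange[] = [];
  after.forEach((next, cve) => {
    const prev = before.get(cve);
    // New KEV listing: listed now, not listed (or unknown) before.
    if (next.kev && !prev?.kev) changes.push({ cve, kind: "new_kev" });
    // Any EPSS movement, including a CVE scored for the first time.
    if (next.epss !== undefined && next.epss !== prev?.epss) {
      changes.push({ cve, kind: "epss_changed", from: prev?.epss, to: next.epss });
    }
  });
  return changes;
}
```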

Deferred / hosted: To enable it, open the Sign in / Cloud tab and, with the Continuous hosted ASM entitlement unlocked, click Schedule daily EPSS/KEV refresh. Registration is idempotent: if the job is already registered for your tenant, it won’t be registered twice. Without the entitlement you’ll see “Continuous hosted scanning is a Pro feature.”
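Idempotent registration amounts to keying jobs by tenant and job kind and declining to add a second copy. The in-memory JobRegistry below is hypothetical and stands in for the shared runner, whose actual API this page does not describe; the entitlement check reuses the message quoted above.

```typescript
// Hypothetical stand-in for the shared scheduled runner's job table.
interface ScheduledJob { kind: string; cron: string; payload: unknown }
type RegisterResult = "registered" | "already_registered";

class JobRegistry {
  // One slot per (tenant, job kind): the key that makes registration idempotent.
  private readonly jobs = new Map<string, ScheduledJob>();

  register(tenant: string, job: ScheduledJob, entitled: boolean): RegisterResult {
    if (!entitled) throw new Error("Continuous hosted scanning is a Pro feature.");
    const key = `${tenant}:${job.kind}`;
    if (this.jobs.has(key)) return "already_registered"; // clicking twice is a no-op
    this.jobs.set(key, job);
    return "registered";
  }

  get size(): number {
    return this.jobs.size;
  }
}
```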

4 · How a scan gets in — the ingest shape

The real engines run outside the browser and POST raw results to a stable ingest contract. Keeping that contract fixed lets the live engines be built later without touching the pipeline. The wire shape (formalized in assets/data/ingest.schema.json) is:

Field	Meaning
_meta.scan	id, type (external/internal), engine (nuclei/trivy/openvas), scope, started_at, source (asm-runner / lookout-agent).
results[]	One raw result per check. Required: asset_identifier, severity.
asset_identifier	Resolved against your inventory; unknown assets are dropped (out of scope).
template_id / check_id	Nuclei template id, or OpenVAS OID / Trivy vulnerability id.
cve	Drives EPSS/KEV enrichment and the dedup key.
cvss / cvss_vector	Base score and vector for the technical report.
category	Maps to control IDs (vuln_management / patch_remediation / tls_crypto / exposed_service / attack_surface).
matched_at / extracted	Evidence — kept local and redacted before any AI call; never published raw.
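For a runner or agent author, the table translates into types plus a runtime check on untrusted input. The TypeScript below is a hand transcription for illustration only; assets/data/ingest.schema.json remains the authoritative contract, and the names ScanMeta, RawResult, IngestPayload, and validateIngest are hypothetical. It checks only what this page states is required: a results array whose rows carry asset_identifier and severity.

```typescript
// Hand-transcribed from the field table; the JSON schema is authoritative.
type Engine = "nuclei" | "trivy" | "openvas";
type Category =
  | "vuln_management" | "patch_remediation" | "tls_crypto"
  | "exposed_service" | "attack_surface";

interface ScanMeta {
  id: string;
  type: "external" | "internal";
  engine: Engine;
  scope: unknown; // shape not specified on this page
  started_at: string;
  source: "asm-runner" | "lookout-agent";
}

interface RawResult {
  asset_identifier: string; // required
  severity: string;         // required; allowed values not listed here
  template_id?: string;
  check_id?: string;
  cve?: string;
  cvss?: number;
  cvss_vector?: string;
  category?: Category;
  matched_at?: string;
  extracted?: unknown;
}

interface IngestPayload { _meta: { scan: ScanMeta }; results: RawResult[] }

// Returns human-readable errors; an empty array means the required fields are present.
function validateIngest(body: unknown): string[] {
  const results = (body as { results?: unknown } | null | undefined)?.results;
  if (!Array.isArray(results)) return ["results[] must be an array"];
  const errors: string[] = [];
  results.forEach((r: unknown, i: number) => {
    const row = (r ?? {}) as Partial<Record<"asset_identifier" | "severity", unknown>>;
    for (const field of ["asset_identifier", "severity"] as const) {
      const value = row[field];
      if (typeof value !== "string" || value === "") {
        errors.push(`results[${i}].${field} is required`);
      }
    }
  });
  return errors;
}
```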

The pipeline then runs normalize → dedup (one record per asset × check × engine) → enrich (EPSS/KEV) → score → corroborate (multi-engine) → de-noise → map controls, and merges the result into your live finding set. Existing workflow fields such as status and owner win; the new scan refreshes last-seen and evidence.
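The dedup and merge rules can be sketched as a keyed map: one record per asset × check × engine, where an existing record keeps its workflow fields and picks up the scan's last-seen time and evidence. The Finding shape, dedupKey, and mergeScan below are hypothetical simplifications; enrichment, scoring, corroboration, de-noising, and control mapping are left out.

```typescript
// Hypothetical, simplified finding; real findings carry many more fields.
interface Finding {
  asset: string;     // resolved asset_identifier
  check: string;     // template_id or check_id
  engine: string;    // nuclei / trivy / openvas
  status: string;    // workflow field, owned by people, not scans
  owner?: string;    // workflow field
  lastSeen: string;  // refreshed by every scan that sees it
  evidence?: string; // refreshed by every scan that sees it
}

// One record per asset × check × engine.
const dedupKey = (f: Finding): string => `${f.asset}|${f.check}|${f.engine}`;

function mergeScan(live: Finding[], incoming: Finding[]) {
  const byKey = new Map<string, Finding>();
  live.forEach((f) => byKey.set(dedupKey(f), f));
  let added = 0;
  let refreshed = 0; // also counts repeats of the same key within one scan
  for (const f of incoming) {
    const key = dedupKey(f);
    const existing = byKey.get(key);
    if (existing) {
      // Keep the existing record (status, owner, ...); take only last-seen and evidence from the scan.
      byKey.set(key, { ...existing, lastSeen: f.lastSeen, evidence: f.evidence ?? existing.evidence });
      refreshed++;
    } else {
      byKey.set(key, f);
      added++;
    }
  }
  return { findings: Array.from(byKey.values()), added, refreshed };
}
```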

5 · Try it: the Ingest scan tab

1. Open the Ingest scan tab.
2. Click Ingest Nuclei scan (external ASM) or Ingest Trivy scan (internal agent).
3. Read the summary: how many raw results came in, how many were in scope and normalized, how many were new vs refreshed (deduped), and how many were dropped (unknown asset) or blocked (unverified scope).
4. The new findings appear on the Dashboard and Findings tabs, already scored, enriched, de-noised, and control-mapped. This is exactly the path a live runner/agent POST takes.
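The dropped and blocked counts in the summary can be sketched as a scope filter that runs before dedup. The Inventory type, its scopeVerified flag, and scopeFilter are hypothetical; the split between dropped (asset not in inventory) and blocked (asset known but scope unverified) follows the summary's wording, while the new vs refreshed counts come later, from the dedup/merge step.

```typescript
// Hypothetical inventory: asset identifier → whether scan scope is verified.
type Inventory = Map<string, { scopeVerified: boolean }>;

interface ScopeSummary {
  received: number; // raw results in the POST
  inScope: number;  // passed on to normalize/dedup
  dropped: number;  // asset not in inventory (out of scope)
  blocked: number;  // asset known, but its scope is not verified
}

function scopeFilter<T extends { asset_identifier: string }>(
  results: T[],
  inventory: Inventory,
): { kept: T[]; summary: ScopeSummary } {
  const kept: T[] = [];
  let dropped = 0;
  let blocked = 0;
  for (const r of results) {
    const asset = inventory.get(r.asset_identifier);
    if (!asset) dropped++;
    else if (!asset.scopeVerified) blocked++;
    else kept.push(r);
  }
  return { kept, summary: { received: results.length, inScope: kept.length, dropped, blocked } };
}
```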

The Trivy sample deliberately includes an out-of-scope row so you can watch the pipeline drop it. This is the same scan-authorization guard described in Assets & scan authorization.

6 · The scan engines (deferred)

Deferred / hosted: The actual scanners are open-source and run outside the browser:

Where	Engines	Covers
External — hosted ASM runner	Nuclei + discovery (subfinder / httpx / naabu) + Trivy (public images)	Internet-facing CVEs, exposures, misconfig, subdomain takeover.
Internal — Lookout agent	OpenVAS/Greenbone + Trivy (OS/container/IaC/SBOM) + internal Nuclei	Authenticated internal scanning, outbound-only from your network.

Live progress is delivered through scheduled re-scans and polling rather than a persistent connection. Both engines post their results to the ingest contract described above.
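Because progress arrives by polling rather than over a persistent connection, a client-side wait loop is enough. This is a minimal sketch under assumptions: the ScanStatus shape, the getStatus callback, and pollScan are hypothetical, since this page does not describe the hosted runner's status API. The sleep function is injectable so the loop can be tested without real delays.

```typescript
// Hypothetical status shape; the runner's real status API is not documented here.
interface ScanStatus {
  state: "queued" | "running" | "done" | "failed";
  progress: number; // percent complete, 0..100
}

interface PollOptions {
  intervalMs: number;                    // wait between polls
  maxPolls: number;                      // give up after this many status checks
  sleep?: (ms: number) => Promise<void>; // injectable for tests
}

const realSleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Poll until the scan reaches a terminal state; no connection is held open between checks.
async function pollScan(getStatus: () => Promise<ScanStatus>, opts: PollOptions): Promise<ScanStatus> {
  const sleep = opts.sleep ?? realSleep;
  for (let i = 0; i < opts.maxPolls; i++) {
    const status = await getStatus();
    if (status.state === "done" || status.state === "failed") return status;
    if (i < opts.maxPolls - 1) await sleep(opts.intervalMs);
  }
  throw new Error(`scan still running after ${opts.maxPolls} polls`);
}
```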

Next: Cloud & entitlements to enable scheduling, or Troubleshooting: no live data.