Release notes

What changed in each ami release.

The authoritative, commit-level history lives on the releases page. This page summarises each version.

v0.1.0

The first release. ami re-fetches every URL in a seed and packs the results into WARC files and a columnar Parquet index.

ami crawl <seed> reads a seed (a list of URLs), re-fetches every one concurrently, and writes gzipped WARC files plus a captures.parquet index under --out.
Four seed formats. A text file (one URL per line, - for stdin), newline-delimited JSON with a url field, a Parquet file with a url column, or an XML sitemap, all driving the same fetch engine.
A fetch engine sized for one fast machine. Thousands of workers over sharded keep-alive transport pools, with per-host connection caps, a per-domain failure threshold, a fast/polite header profile, and a post-fetch digest comparison that records an unchanged response as a revisit.
Standard output. WARC 1.1 files that open in any WARC tool, and a Parquet index with a row per fetch (url, host, status, fetched_at, content_type, body_length, digest, unchanged, warc_file, warc_offset, warc_length, error, meta_json) that points back into them.
ami inspect summarises a capture index and samples its rows, with no Parquet tool required.
Sharded runs. --shard/--shards partition a seed deterministically across machines, so each process fetches a disjoint slice.
Packaged everywhere. Archives, .deb/.rpm/.apk, a multi-arch GHCR image, checksums, SBOMs, and a cosign signature.
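
To illustrate what deterministic sharding means in practice, here is a minimal Python sketch of one way a seed could be partitioned so that each process gets a disjoint slice. It is not ami's implementation: the helper names shard_of and seed_slice are hypothetical, and hashing the full URL with SHA-256 and using zero-based shard indices are assumptions made for the example. The notes above do not say which key or hash ami uses.

```python
import hashlib


def shard_of(url: str, shards: int) -> int:
    """Map a URL to a shard index in [0, shards) with a stable hash.

    Assumption: the full URL is the partition key. A real crawler might
    key on the host instead, so that one host stays on one machine.
    """
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    # The first 8 bytes give a 64-bit integer that is identical on every
    # machine and in every process, unlike Python's built-in hash().
    return int.from_bytes(digest[:8], "big") % shards


def seed_slice(urls, shard: int, shards: int):
    """Return the URLs from the seed that belong to the given shard."""
    if shards < 1 or not 0 <= shard < shards:
        raise ValueError("shard must satisfy 0 <= shard < shards")
    return [u for u in urls if shard_of(u, shards) == shard]


if __name__ == "__main__":
    seed = [f"https://example.com/page/{i}" for i in range(10)]
    for s in range(3):
        print(s, seed_slice(seed, s, 3))
```

Because the shard index depends only on the URL and the shard count, every process that reads the same seed computes the same assignment without coordinating with the others. The slices are disjoint and together cover the whole seed.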