Release notes
What changed in each ami release.
The authoritative, commit-level history lives on the releases page. This page summarises each version.
v0.1.0
The first release. ami re-fetches every URL in a seed and packs the results into WARC files and a columnar Parquet index.
- `ami crawl <seed>` reads a seed (a list of URLs), re-fetches every one concurrently, and writes gzipped WARC files plus a `captures.parquet` index under `--out`.
- Four seed formats. A text file (one URL per line, `-` for stdin), newline-delimited JSON with a `url` field, a Parquet file with a `url` column, or an XML sitemap, all driving the same fetch engine.
- A fetch engine sized for one fast machine. Thousands of workers over sharded keep-alive transport pools, with per-host connection caps and a per-domain failure threshold, a `fast`/`polite` header profile, and a post-fetch digest comparison that records an unchanged response as a revisit.
- Standard output. WARC 1.1 files that open in any WARC tool, and a Parquet index with a row per fetch (`url`, `host`, `status`, `fetched_at`, `content_type`, `body_length`, `digest`, `unchanged`, `warc_file`, `warc_offset`, `warc_length`, `error`, `meta_json`) that points back into them.
- `ami inspect` summarises a capture index and samples its rows with no Parquet tool installed.
- Sharded runs. `--shard`/`--shards` partition a seed deterministically across machines, so each process fetches a disjoint slice.
- Packaged everywhere. Archives, `.deb`/`.rpm`/`.apk`, a multi-arch GHCR image, checksums, SBOMs, and a cosign signature.
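To make the sharded-runs idea concrete, here is a minimal Python sketch of deterministic partitioning. It is an illustration only: these notes do not say how ami assigns URLs to shards, so the SHA-256-modulo scheme and the names `shard_of` and `slice_seed` are assumptions, not ami's implementation.

```python
import hashlib


def shard_of(url: str, shards: int) -> int:
    """Map a URL to a shard index in [0, shards) with a stable hash."""
    # Assumption: a SHA-256 prefix modulo the shard count. It is stable across
    # processes and machines, unlike Python's built-in hash(), which is salted
    # per process for strings.
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shards


def slice_seed(urls: list[str], shard: int, shards: int) -> list[str]:
    """Return the URLs that belong to one shard, preserving seed order."""
    return [u for u in urls if shard_of(u, shards) == shard]


# Hypothetical seed; the example.com URLs are placeholders.
seed = [f"https://example.com/page/{i}" for i in range(20)]
slices = [slice_seed(seed, s, 3) for s in range(3)]
```

Because the shard index depends only on the URL and the shard count, every machine computes the same assignment without coordination, and the slices are disjoint and together cover the whole seed.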
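The `warc_file`, `warc_offset`, and `warc_length` columns suggest how an index row can point back into a WARC file. The sketch below assumes the common convention that each record is its own gzip member and that the offset and length describe that compressed member in bytes; these notes do not confirm that this is how ami writes its files. `read_record` is a hypothetical helper, and the in-memory buffer stands in for a real WARC file.

```python
import gzip
import io


def read_record(warc, offset: int, length: int) -> bytes:
    """Read one gzip member at (offset, length) and return the raw record."""
    warc.seek(offset)
    member = warc.read(length)
    # Assumption: the slice is exactly one complete gzip member.
    return gzip.decompress(member)


# Build a tiny stand-in for a gzipped WARC: two records, one gzip member each.
records = [
    b"WARC/1.1\r\nWARC-Type: response\r\n\r\nfirst body",
    b"WARC/1.1\r\nWARC-Type: response\r\n\r\nsecond body",
]
members = [gzip.compress(r) for r in records]
warc_file = io.BytesIO(b"".join(members))

# The second record starts where the first member ends.
offset = len(members[0])
length = len(members[1])
second = read_record(warc_file, offset, length)
```

Under that assumption, a single seek and a bounded read are enough to recover any capture, with no need to scan the file from the start.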