# Benchmarks
The repository ships with a benchmark tool under `packages/typed-xlsx/benchmark/` that measures stream throughput, file size, and memory usage at different row counts and batch sizes.
## Benchmark setup
The benchmark generates synthetic data with a representative schema (8 columns: string ids, dates, numeric amounts, enum statuses, and a string array that triggers sub-row expansion). All measurements run on Node.js 22, macOS (M-series), NVMe SSD for file-backed spool.
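The schema described above can be sketched as a plain row generator. This is an illustrative reconstruction, not the actual benchmark code: the field names, value distributions, and the `generateRows` helper are all assumptions that merely match the column shapes listed (string ids, dates, numeric amounts, enum statuses, and a string array).

```typescript
// Hypothetical sketch of the benchmark's synthetic data: 8 columns matching
// the shapes described in the text. Names and distributions are illustrative.
type Status = "pending" | "active" | "closed" | "archived";

interface BenchmarkRow {
  id: string;        // string id
  createdAt: Date;   // date columns
  updatedAt: Date;
  amount: number;    // numeric amounts
  quantity: number;
  status: Status;    // enum status
  region: string;
  tags: string[];    // string array -> triggers sub-row expansion
}

const STATUSES: Status[] = ["pending", "active", "closed", "archived"];
const REGIONS = ["emea", "amer", "apac"];

function generateRows(count: number): BenchmarkRow[] {
  const rows: BenchmarkRow[] = [];
  for (let i = 0; i < count; i++) {
    rows.push({
      id: `row-${i.toString(36)}`,
      createdAt: new Date(Date.UTC(2024, 0, 1 + (i % 365))),
      updatedAt: new Date(Date.UTC(2024, 0, 2 + (i % 365))),
      amount: Math.round(((i * 13.37) % 10_000) * 100) / 100,
      quantity: (i % 50) + 1,
      status: STATUSES[i % STATUSES.length],
      region: REGIONS[i % REGIONS.length],
      tags: Array.from({ length: (i % 3) + 1 }, (_, k) => `tag-${k}`),
    });
  }
  return rows;
}
```

Deterministic generation (no `Math.random()`) keeps runs comparable, which matters when benchmarking string deduplication: the cardinality of the data directly affects shared-string-table size.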
## File size by row count
| Rows | Columns | Inline strings | Shared strings |
|---|---|---|---|
| 10,000 | 8 | 1.8 MB | 1.2 MB |
| 50,000 | 8 | 8.9 MB | 5.8 MB |
| 100,000 | 8 | 17.6 MB | 11.4 MB |
| 500,000 | 8 | 88 MB | 57 MB |
| 1,000,000 | 8 | 176 MB | 114 MB |
Shared strings compress better for data with many repeated values (enums, categories, region names): each unique string is stored once and cells reference it by index. Inline strings avoid building a shared string table in heap, but produce larger files because every occurrence is written out in full.
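A rough back-of-the-envelope model shows why the shared-string column wins for repetitive data. This is an illustrative estimate only (the 4-bytes-per-reference figure is an assumption, and real XLSX files add XML markup and ZIP compression on top):

```typescript
// Illustrative cost model, not library code: inline mode pays for every
// occurrence of a string; shared mode pays for each unique string once,
// plus a small per-cell index reference (assumed ~4 bytes here).
function estimateStringBytes(cells: string[]): { inline: number; shared: number } {
  const inline = cells.reduce((sum, s) => sum + s.length, 0);
  let table = 0;
  for (const s of new Set(cells)) table += s.length;
  const shared = table + cells.length * 4;
  return { inline, shared };
}

// 100k enum-like cells drawn from only 4 distinct values:
const statuses = ["pending", "active", "closed", "archived"];
const cells = Array.from({ length: 100_000 }, (_, i) => statuses[i % 4]);
const { inline, shared } = estimateStringBytes(cells);
// shared is well under inline here, because the table holds only 4 entries.
```

High-cardinality string columns (unique ids, free text) flip the trade-off: the table grows with every row, so shared strings save little while still costing heap.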
## Generation time (stream mode, file-backed spool)
| Rows | Batch size | Total time | Peak heap |
|---|---|---|---|
| 10,000 | 1,000 | ~0.3s | ~60 MB |
| 100,000 | 5,000 | ~2.1s | ~80 MB |
| 500,000 | 10,000 | ~10.5s | ~95 MB |
| 1,000,000 | 10,000 | ~21s | ~100 MB |
Peak heap stays nearly flat as row count increases. Most of the variation comes from batch size (larger batches spike momentarily during serialization) and the shared string table (grows with unique string cardinality).
## Generation time (buffered mode)
| Rows | Total time | Peak heap |
|---|---|---|
| 10,000 | ~0.5s | ~120 MB |
| 50,000 | ~2.4s | ~550 MB |
| 100,000 | ~5.1s | ~1.1 GB |
| 200,000 | ~11s | ~2.2 GB |
Buffered mode heap scales linearly with row count. Above ~50k rows, you risk hitting Node's default heap limit in constrained environments.
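Before choosing buffered mode for a large export, it can help to check the runtime's actual heap ceiling. The sketch below uses Node's real `v8.getHeapStatistics()` API; the 2 GB threshold is an illustrative cutoff derived from the table above (100k rows peaked near 1.1 GB), not a library recommendation:

```typescript
import { getHeapStatistics } from "node:v8";

// heap_size_limit is the V8 old-space ceiling for this process, in bytes.
const limitMb = getHeapStatistics().heap_size_limit / (1024 * 1024);

// Illustrative guard: the buffered-mode table above peaked at ~1.1 GB for
// 100k rows, so a sub-2 GB ceiling leaves little headroom at that scale.
if (limitMb < 2048) {
  console.warn(
    `Heap limit is ~${Math.round(limitMb)} MB; prefer stream mode for large exports.`
  );
}
```

The ceiling can be raised with `node --max-old-space-size=<MB>`, but stream mode avoids the problem entirely by keeping peak heap nearly flat.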
## Batch size effect (stream mode, 100k rows)
| Batch size | Total time | Peak heap |
|---|---|---|
| 500 | ~3.2s | ~65 MB |
| 1,000 | ~2.5s | ~70 MB |
| 5,000 | ~2.1s | ~78 MB |
| 10,000 | ~2.0s | ~90 MB |
| 50,000 | ~1.9s | ~280 MB |
Smaller batches increase spool write overhead. Larger batches spike heap during serialization. The sweet spot for most schemas is 1,000–10,000 rows per batch.
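The batching pattern these numbers measure can be sketched as a generic helper: rows are accumulated into fixed-size batches, and each batch is serialized and spooled as a unit. This is a stand-in sketch, not the library's internal implementation:

```typescript
// Split an iterable of rows into fixed-size batches. Each yielded batch is
// what gets serialized to XML and written to the spool in one step, which is
// why larger batches trade a momentary heap spike for fewer spool writes.
function* toBatches<T>(rows: Iterable<T>, batchSize: number): Generator<T[]> {
  let batch: T[] = [];
  for (const row of rows) {
    batch.push(row);
    if (batch.length === batchSize) {
      yield batch;
      batch = [];
    }
  }
  if (batch.length > 0) yield batch; // flush the final partial batch
}
```

At 100k rows with 5,000-row batches this yields 20 spool writes; at 500-row batches it yields 200, which is where the per-write overhead in the table starts to show.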
## `tempStorage`: file vs memory (100k rows, 5k-row batches)
| tempStorage | Total time | Peak heap |
|---|---|---|
| `"file"` | ~2.1s | ~78 MB |
| `"memory"` | ~1.4s | ~340 MB |
File-backed spool keeps heap low at the cost of disk I/O. Memory spool is faster but the serialized XML accumulates in heap. On NVMe SSDs the difference is modest; on network-attached storage it is larger.
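The two spool strategies can be illustrated with a minimal sketch. This is not the library's internals, only an assumption-laden model of the trade-off: the memory spool keeps every serialized chunk in an array, while the file spool appends chunks to a temp file so heap stays proportional to one chunk rather than the whole sheet.

```typescript
import { mkdtempSync, writeFileSync, appendFileSync, readFileSync, rmSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical spool interface for illustration.
interface Spool {
  write(chunk: string): void;
  readAll(): string;
  dispose(): void;
}

// "memory": serialized XML accumulates in heap; fast, but heap grows with rows.
function memorySpool(): Spool {
  const chunks: string[] = [];
  return {
    write: (c) => void chunks.push(c),
    readAll: () => chunks.join(""),
    dispose: () => void (chunks.length = 0),
  };
}

// "file": chunks are appended to a temp file; heap stays flat at the cost
// of disk I/O on every batch.
function fileSpool(): Spool {
  const dir = mkdtempSync(join(tmpdir(), "xlsx-spool-"));
  const path = join(dir, "sheet.xml");
  writeFileSync(path, "");
  return {
    write: (c) => appendFileSync(path, c),
    readAll: () => readFileSync(path, "utf8"),
    dispose: () => rmSync(dir, { recursive: true, force: true }),
  };
}
```

Both produce identical output; only where the bytes live between batches differs, which is exactly the heap gap the table above shows.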
## Running the benchmarks yourself
```sh
cd packages/typed-xlsx
bun run benchmark
```
The benchmark script accepts `--rows`, `--batchSize`, `--tempStorage`, and `--strings` flags. Results are printed to stdout and optionally written to `benchmark/results/`.