Install dependencies with:

```shell
pip install -r requirements.txt
```
```
project_root/
├── benchmarks/          # Benchmark strategies
├── data/                # TPC-H tables in Parquet format
├── engine/
│   ├── duckdb_engine.py
│   └── custom_engine.py
├── results/
│   ├── benchmark/       # Custom engine query results (CSV)
│   └── target/          # DuckDB query results (CSV)
├── init.py              # Data generation script
├── main.py              # Main entry point
└── summary.csv
```
Run the `init` command to generate the TPC-H tables in Parquet format and initialize the result directories:

```shell
python main.py init
```
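The directory-scaffolding part of `init` can be sketched as follows. This is a hypothetical `init_directories` helper, not the project's actual code; the real `init.py` additionally generates the TPC-H data itself.

```python
from pathlib import Path

# Scale factors the init step prepares data directories for.
SCALE_FACTORS = [0.5, 1, 2, 5]

def init_directories(root: Path) -> list[Path]:
    """Create the data/sf*/ and results/ layout the benchmark expects."""
    created = []
    for sf in SCALE_FACTORS:
        data_dir = root / "data" / f"sf{sf}"
        data_dir.mkdir(parents=True, exist_ok=True)
        created.append(data_dir)
    for sub in ("benchmark", "target"):
        results_dir = root / "results" / sub
        results_dir.mkdir(parents=True, exist_ok=True)
        created.append(results_dir)
    return created
```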
The following files will be created for each scale factor in [0.5, 1, 2, 5]:

```
data/sf{0.5, 1, 2, 5}/
├── customer.parquet
├── lineitem.parquet
├── nation.parquet
├── orders.parquet
├── part.parquet
├── partsupp.parquet
├── region.parquet
└── supplier.parquet
```
Run the `benchmark` command to time both engines:

```shell
python main.py benchmark
```

Options:
- `--out summary.csv`: Output file for the benchmark results (default: `summary.csv`)
- `--benchmark 5`: Number of timed repetitions per scale factor after one warm-up run (default: 5)
- `--strategy <strategy>`: Benchmark execution strategy: `interweave` (default), `duckdb_first`, or `custom_engine_first`
- `--enable_profiling`: Enable detailed profiling for the custom engine

This will:
- Write query results as CSV to `results/benchmark` and `results/target`
- Run one warm-up pass plus `--benchmark` timed runs per scale factor
- Write the timing results to `summary.csv`

To compare the two engines' output, run:

```shell
python main.py check
```
This will compare all matching CSV files in `results/benchmark` and `results/target` and print any mismatches.
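A minimal sketch of that comparison, assuming CSV files with identical names in both directories; `find_mismatches` is a hypothetical helper, not the project's actual implementation:

```python
import csv
from pathlib import Path

def find_mismatches(benchmark_dir: Path, target_dir: Path) -> list[str]:
    """Return names of CSV files that differ (or are missing) between the two dirs."""
    mismatches = []
    for bench_file in sorted(benchmark_dir.glob("*.csv")):
        target_file = target_dir / bench_file.name
        if not target_file.exists():
            mismatches.append(bench_file.name)
            continue
        with bench_file.open(newline="") as f1, target_file.open(newline="") as f2:
            # Compare parsed rows rather than raw bytes, so trailing
            # newline differences do not count as mismatches.
            if list(csv.reader(f1)) != list(csv.reader(f2)):
                mismatches.append(bench_file.name)
    return mismatches
```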
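For reference, the subcommands and flags described above could map onto an `argparse` CLI roughly like this. This is a sketch of one plausible layout under the stated defaults, not necessarily how `main.py` is actually written:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="main.py")
    sub = parser.add_subparsers(dest="command", required=True)

    sub.add_parser("init", help="Generate TPC-H Parquet data and result directories")

    bench = sub.add_parser("benchmark", help="Run the benchmark")
    bench.add_argument("--out", default="summary.csv",
                       help="Output file for the benchmark results")
    bench.add_argument("--benchmark", type=int, default=5,
                       help="Timed repetitions per scale factor after one warm-up run")
    bench.add_argument("--strategy", default="interweave",
                       choices=["interweave", "duckdb_first", "custom_engine_first"],
                       help="Benchmark execution strategy")
    bench.add_argument("--enable_profiling", action="store_true",
                       help="Enable detailed profiling for the custom engine")

    sub.add_parser("check", help="Compare benchmark and target results")
    return parser
```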