cs465-project

Specialised Query Executor for TPC-H Q18

Prerequisites

Install dependencies with:

pip install -r requirements.txt

Directory Structure

project_root/
├── benchmarks/            # Benchmark strategies
├── data/                  # TPC-H tables in Parquet format
├── engine/
│   ├── duckdb_engine.py
│   └── custom_engine.py
├── results/
│   ├── benchmark/         # Custom engine query results (CSV)
│   └── target/            # DuckDB query results (CSV)
├── init.py                # Data generation script
├── main.py                # Main entry point
└── summary.csv

1. Initialize Project (Generate TPC-H Data)

Run the init command to generate the TPC-H tables in Parquet format and initialize the result directories:

python main.py init

The following files are created for each of the scale factors 0.5, 1, 2, and 5:

data/sf{0.5, 1, 2, 5}/
├── customer.parquet
├── lineitem.parquet
├── nation.parquet
├── orders.parquet
├── part.parquet
├── partsupp.parquet
├── region.parquet
└── supplier.parquet
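
To confirm that init produced everything, the layout above can be enumerated and checked in a few lines. This is only a sketch: it assumes the directories are named sf0.5, sf1, sf2 and sf5 under data/, and the helper names expected_parquet_files and missing_parquet_files are hypothetical, not part of the project.

```python
from pathlib import Path

# Scale factors and table names as listed in this README.
# The "sf<factor>" directory naming is an assumption read off the layout above.
SCALE_FACTORS = ["0.5", "1", "2", "5"]
TABLES = [
    "customer", "lineitem", "nation", "orders",
    "part", "partsupp", "region", "supplier",
]


def expected_parquet_files(data_root="data"):
    """Return every Parquet path the init step is expected to create."""
    root = Path(data_root)
    return [
        root / f"sf{sf}" / f"{table}.parquet"
        for sf in SCALE_FACTORS
        for table in TABLES
    ]


def missing_parquet_files(data_root="data"):
    """Return the expected paths that do not exist on disk."""
    return [p for p in expected_parquet_files(data_root) if not p.is_file()]


if __name__ == "__main__":
    missing = missing_parquet_files()
    if missing:
        print("Missing files:")
        for path in missing:
            print(f"  {path}")
    else:
        print("All expected Parquet files are present.")
```

Run from project_root, it prints any table that is still missing for a given scale factor.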

2. Run the Benchmark

python main.py benchmark

Optional arguments:

This will:

3. Data Correctness Check

python main.py check

This compares every pair of same-named CSV files in results/benchmark and results/target and prints any mismatches.
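
The comparison can be pictured roughly as follows. This is a minimal sketch, not the project's actual check: it assumes files are paired by file name and compared row by row as exact strings, so differences in row order or float formatting would be reported as mismatches. The function names read_rows and compare_result_dirs are hypothetical.

```python
import csv
from pathlib import Path


def read_rows(path):
    """Read a CSV file into a list of rows, each row a list of strings."""
    with open(path, newline="") as f:
        return list(csv.reader(f))


def compare_result_dirs(benchmark_dir="results/benchmark",
                        target_dir="results/target"):
    """Return the names of CSV files present in both directories whose rows differ.

    Assumption: this is a strict, order-sensitive string comparison.
    A real check might sort rows or allow a numeric tolerance.
    """
    benchmark = {p.name: p for p in Path(benchmark_dir).glob("*.csv")}
    target = {p.name: p for p in Path(target_dir).glob("*.csv")}

    mismatches = []
    # Only files that exist on both sides are compared.
    for name in sorted(benchmark.keys() & target.keys()):
        if read_rows(benchmark[name]) != read_rows(target[name]):
            mismatches.append(name)
    return mismatches


if __name__ == "__main__":
    differing = compare_result_dirs()
    if differing:
        for name in differing:
            print(f"MISMATCH: {name}")
    else:
        print("All matching CSV files are identical.")
```

Files that exist in only one of the two directories are skipped here; whether the real check reports them is not stated in this README.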