To test your solution, run the following commands. Note that execution may take minutes per query, so you may want to limit the number of lines in queries-test.csv.
time python3 reorg.py /opt/lsde/dataset-sf100-csvs/
time python3 cruncher.py /opt/lsde/dataset-sf100-csvs/ queries-test.csv out.csv
Compare the results with the expected output using:
diff queries-test-output-sf100-all.csv out.csv
The submission system will perform the same operations (with different queries).
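One way to limit the number of lines in queries-test.csv is to copy only its first few lines into a separate file and pass that file to cruncher.py instead. The sketch below is a minimal helper for this; the function name write_first_lines and the output file name queries-small.csv are hypothetical and not part of the assignment. It treats every line alike, so a header line (if the file has one) counts toward the limit. Note also that diff against queries-test-output-sf100-all.csv will then report the results that are missing from out.csv.

```python
from itertools import islice


def write_first_lines(src_path, dst_path, n):
    """Copy the first n lines of src_path to dst_path and return the number copied.

    Hypothetical helper, not part of the assignment code.
    """
    count = 0
    # newline="" keeps the original line endings untouched on read and write.
    with open(src_path, "r", newline="") as src, \
            open(dst_path, "w", newline="") as dst:
        for line in islice(src, n):
            dst.write(line)
            count += 1
    return count


# Example (assumed file names):
# write_first_lines("queries-test.csv", "queries-small.csv", 5)
```

The shortened file can then be given to cruncher.py in place of queries-test.csv.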
There is a Jupyter notebook available (assignment-1c.ipynb) to run interactive experiments in your browser. To run it, issue:
jupyter notebook
Then open the resulting URL on 127.0.0.1 (http://127.0.0.1:8888?token=...) in your browser. This URL also works in the host operating system because of the port forwarding set up in the virtual machines.
reorg.py and cruncher.py files.
This assignment only uses the SF100 data set, which is stored in CSV format (before reorg).