Write an Apache Beam batch job in Python satisfying the following requirements
- Read the input from
gs://cloud-samples-data/bigquery/sample-transactions/transactions.csv - Find all transactions have a
transaction_amountgreater than20 - Exclude all transactions made before the year
2010 - Sum the total by
date - Save the output into
output/results.jsonl.gzand make sure all files in theoutput/directory is git ignored
If the output is in a CSV file, it would have the following format
date, total_amount
2011-01-01, 12345.00
...
Following up on the same Apache Beam batch job, also do the following
- Group all transform steps into a single
Composite Transform - Add a unit test to the Composite Transform using tooling / libraries provided by Apache Beam