This project models the public GitHub dataset on BigQuery as a star schema. The resulting warehouse provides a single source of truth for analysing programming-language market share and open-source license compliance.
- Market Insight: Identify which programming languages are gaining popularity.
- Legal Compliance: Track the percentage of repositories that have a valid license.
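
The two metrics above can be sketched in plain Python. This is a minimal illustration, not the project's implementation: the sample rows and the function names `language_share` and `licensed_percentage` are hypothetical, and it assumes one primary language per repository and that a missing license is stored as `None`. Tracking whether a language is gaining popularity would mean comparing this share across time periods.

```python
from collections import Counter

# Hypothetical sample rows; the real project reads its data from BigQuery.
repos = [
    {"repo_name": "a/one", "language": "Python", "license": "mit"},
    {"repo_name": "b/two", "language": "Python", "license": None},
    {"repo_name": "c/three", "language": "Go", "license": "apache-2.0"},
    {"repo_name": "d/four", "language": "Rust", "license": "mit"},
]


def language_share(rows):
    """Return each language's share of repositories, as a percentage."""
    counts = Counter(row["language"] for row in rows)
    total = sum(counts.values())
    return {lang: 100.0 * n / total for lang, n in counts.items()}


def licensed_percentage(rows):
    """Return the percentage of repositories that declare a license."""
    if not rows:
        return 0.0
    licensed = sum(1 for row in rows if row["license"])
    return 100.0 * licensed / len(rows)


print(language_share(repos))
print(licensed_percentage(repos))
```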
The Data Warehouse focuses on monitoring Market Intelligence and Legal Compliance business processes at GitHub.
- Monitoring Market Intelligence allows developers to identify which areas to focus their feature development on.
- Monitoring Legal Compliance allows GitHub to check whether open-source repositories carry a valid license.
- Source: BigQuery github_repos public dataset.
- Staging: Staging models select the relevant source tables and write them to a BigQuery dataset.
- Warehouse: Staged data is transformed into fact and dimension tables in the BigQuery data warehouse.
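
The staging-to-warehouse step can be illustrated with a small, self-contained sketch. It uses Python's built-in `sqlite3` module as a stand-in for BigQuery, so the SQL dialect differs from what the project runs. The table and column names (`stg_repos`, `dim_language`, `dim_license`, `fact_repository`) and the `'unlicensed'` placeholder are assumptions made for the example, not the project's actual models.

```python
import sqlite3

# In-memory SQLite database standing in for the BigQuery warehouse.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Staging: one cleaned row per repository (hypothetical layout).
    CREATE TABLE stg_repos (repo_name TEXT, language TEXT, license TEXT);
    INSERT INTO stg_repos VALUES
        ('a/one', 'Python', 'mit'),
        ('b/two', 'Python', NULL),
        ('c/three', 'Go', 'apache-2.0');

    -- Dimensions: one row per distinct value, each with a surrogate key.
    CREATE TABLE dim_language (
        language_key INTEGER PRIMARY KEY,
        language_name TEXT UNIQUE
    );
    INSERT INTO dim_language (language_name)
        SELECT DISTINCT language FROM stg_repos ORDER BY language;

    CREATE TABLE dim_license (
        license_key INTEGER PRIMARY KEY,
        license_name TEXT UNIQUE
    );
    INSERT INTO dim_license (license_name)
        SELECT DISTINCT COALESCE(license, 'unlicensed')
        FROM stg_repos ORDER BY 1;

    -- Fact: one row per repository, referencing both dimensions.
    CREATE TABLE fact_repository AS
        SELECT s.repo_name, l.language_key, c.license_key
        FROM stg_repos s
        JOIN dim_language l ON l.language_name = s.language
        JOIN dim_license c
            ON c.license_name = COALESCE(s.license, 'unlicensed');
""")

# Example analysis: repository count per language via the star schema.
rows = conn.execute("""
    SELECT l.language_name, COUNT(*) AS repo_count
    FROM fact_repository f
    JOIN dim_language l USING (language_key)
    GROUP BY l.language_name
    ORDER BY l.language_name
""").fetchall()
print(rows)
```

Keeping languages and licenses in their own dimension tables means the fact table stores only keys, so reports can group or filter by either attribute with a single join.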
- BigQuery (Data Warehouse)
- dbt-core / dbt-bigquery (Transformation)
- Looker Studio (Visualisation)
- A paginated report was created using Looker Studio connected to the BigQuery data warehouse.
- Advanced data visualisations were used to report on Market Intelligence and Legal Compliance.
- Link: https://lookerstudio.google.com/reporting/ed4e1d6b-6636-408d-b360-bd7348d02eae
- Conceptual, Logical and Physical data models were created using draw.io.