Skip to content

marcalehub/Data-Engineering

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

30 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Data Engineering ToolKit

Overview

This repository contains a comprehensive collection of scripts and tools for data engineering tasks, focusing on database interactions, data processing, and automation. The toolkit includes PowerShell scripts for system automation, Python modules for data manipulation, SQL queries for Snowflake operations, and supporting documentation.

IMPORTANT: Do not focus on the tool, instead focus on business logic and full understanding of your ERP/System.

TIPS: Download this repo to easily navigate through the files.

CAUTION: Please review security policy within your workspace before solution implementation.

PowerShell Scripts (powershell)

Collection of PowerShell scripts for various data engineering operations:

  • Database Operations:

    • access_addTable.ps1 - Creates tables in Access database
    • access_alterTableAddColumn.ps1 - Add tables column in Access database
    • access_alterTableRenameColumn.ps1 - Rename tables column in Access database
    • access_alterTableDropColumn.ps1 - Delete tables column in Access database
    • access_updateValue.ps1 - Update tables value in Access database
    • access_addTableFromSelect.ps1 - Creates tables from SELECT queries in Access
    • access_addValue.ps1 - Inserts values into Access tables
    • access_deleteTable.ps1 - Deletes tables from Access database
    • access_readValue.ps1 - Reads values from Access database
    • access_webScrapping.ps1 - Creates tables in Access database from webScrapping
    • odbc_readValue.ps1 - Reads data via ODBC connections
    • oracle_readValue.ps1 - Reads data from Oracle databases
    • access_clearFile.ps1 - Clear database Files
  • Execution Scripts:

    • call_ps1Files.ps1 - Executes other PowerShell scripts
    • execute_pythonScript.ps1 - Runs Python scripts asynchronously
    • execute_snowCLI.ps1 - Executes Snowflake CLI commands
    • execute_snowSQL.ps1 - Runs SnowSQL queries
  • Data Processing:

    • consolidateSqlQuery.ps1 - Consolidates SQL queries
    • excel_updateQuery.ps1 - Updates Excel files with query results
    • folder_updates.ps1 - Handles folder updates and file operations
    • save_excel.ps1 - Saves data to Excel files
  • Utilities:

    • key-pair-auth.ps1 - Handles key-pair authentication
    • odbc_driversHandle.ps1 - Manages ODBC drivers
    • outlook_api.ps1 - Interacts with Outlook API
    • outlook_legacy.ps1 - Legacy Outlook operations
    • pbi_restApi.ps1 - Power BI REST API interactions
    • linkedIn_api.ps1 - LinkedIn Business API interactions
    • readFile.ps1 - File reading operations
    • regexp.ps1 - Regular expression operations
    • shp_restApi.ps1 - SharePoint REST API operations
    • taskSchedule.ps1 - Task scheduling
    • datatype.ps1 - PowerShell Script DataType
    • transcript.ps1 - Logging and transcription
    • workingDates.ps1 - Date handling and working day calculations
    • job_create.ps1 - Create a Job
    • job_readVariable - Read Local Variable within a job
    • job_threadLike - Simulate thread to run multi jobs at the same time
    • job_schedule - Schedule Job to run
    • error_handle.ps1 - PowerShell Script Template with Error Handle Structure
    • condition.ps1 - PowerShell Script Template with Condition Structure
    • condition_elseif.ps1 - PowerShell Script Template with Multiples Conditions / elseif Structure
    • loop_while.ps1 - PowerShell Script Template with While Structure
    • loop_for.ps1 - PowerShell Script Template with For Structure
    • define_variables.ps1 - PowerShell Script Define Varibales
    • define_pipe.ps1 - PowerShell Script Define commandlets pipe
    • writeFile.ps1 - Write output to a file
    • array_hashtable.ps1 - Create arrays or hashtable
    • showProgress.ps1 - Visualization of PowerShell Script progress
  • Design Patterns:

    • desing_pattern.ps1 - PowerShell script template with standard structure
  • Application:

    • desktopApplication_createWindow.ps1 - PowerShell .Net FrameWork Desktop Application Creation
    • desktopApplication_addTable.ps1 - PowerShell .Net FrameWork Desktop Application Add Tables With Data
    • desktopApplication_addButton.ps1 - PowerShell .Net FrameWork Desktop Application Add Button
    • desktopApplication_commonWidget.ps1 - PowerShell .Net FrameWork Desktop Application Common Widget
    • desktopApplication_commonHandler.ps1 - PowerShell .Net FrameWork Desktop Application Common Windows Handler

Python Modules (python)

Python scripts for data processing and database interactions:

  • libraries.py - Common imports and dependencies
  • database_readValue.py - Functions for reading from various databases (ODBC, Oracle, SQLAlchemy, Snowflake)
  • save_excel.py - Excel file generation with formatting
  • column_validation.py - Data validation functions
  • date_handle.py - Date manipulation utilities
  • file_updateValidation.py - File update validation
  • null_handle.py - Null value handling
  • readFile.py - File reading utilities
  • receive_analystProcess.py - Data analysis processing
  • search.py - Search and filtering functions
  • streamingData.py - Data streaming operations
  • runTerminal.py - Run Terminal Command from Python
  • WebScrapping.py - Getting data from web pages

Notebook (python & notebook)

  • notes.ipynb - Jupyter notebook with basic examples

Snowflake SQL Scripts (snow sql/cli)

SQL scripts for Snowflake operations:

  • cte_dataReusage.sql - Common Table Expressions for data reuse
  • cte.sql - CTE examples
  • downstreamData.sql - Downstream data processing
  • loadFiles.sql - File loading operations
  • loadFilesSTP.sql - Stored procedure for file loading
  • regexp.sql - Regular expression operations in SQL
  • stageFileFormat.sql - Stage and file format creation
  • create_database.sql - Create Database
  • create_wh.sql - Create Warehouse
  • grant_access.sql - Grant access

Prerequisites

Software Requirements

  • Windows PowerShell 5.1 or higher
  • Python 3.x
  • Git (with SSH key setup)
  • Microsoft Access (for .accdb files)
  • Snowflake account and SnowSQL/SnowCLI
  • ODBC drivers for database connections
  • Oracle client (if using Oracle databases)

Python Dependencies

Key libraries used (install via pip):

  • snowflake-connector-python
  • sqlalchemy
  • pandas
  • numpy
  • openpyxl
  • dateutil
  • workdays
  • pyodbc (for ODBC connections)
  • cx_Oracle (for Oracle connections)

Authentication Setup

  • Configure SSH keys for Git and secure connections
  • Set up Snowflake key-pair authentication
  • Configure database connection credentials

Workflow Overview

The typical workflow follows this pattern:

  1. PowerShell handlers initialize variables and execute files
  2. Python scripts process data and create parameters
  3. PowerShell processes execute commands
  4. SnowSQL/SnowCLI interacts with Snowflake database
  5. Scripts complete execution

Common Workflow:

---
title: Project Workflow
config:
  flowchart:
    htmlLabels: false
---
    flowchart TB
    POWERSHELL["Powershell Handler"] -->
    BEGIN["Code Block Begin: Define Variable & Execute File for Python and SnowSQL/SnowCLI"]

    PYPROCESS["Code Block Process: Execute commands"]
    PYTHON@{ shape: procs, label: "Python: Create manual file parameters "}

    PSPROCESS["Code Block Process: Execute commands"]
    
    SNOWSQL("SnowSQL/SnowCLI")
    DATABASE[(Snowflake)]

    END["Code Block End: Script ends"]

    BEGIN --> PYEXE["Python Exe File Path"] --> PYPROCESS --> PYTHON --> END

    BEGIN --> PSEXE["PowerShell Exe File Path"] --> PSPROCESS --> SNOWSQL --> DATABASE --> END
Loading

Security Considerations

  • Always use key-pair authentication for secure communication
  • Follow security policies for database access
  • Protect private keys and sensitive information

Best Practices

  1. Follow the established design patterns in desing_pattern.ps1
  2. Test scripts thoroughly before committing

License

This project is proprietary. All rights reserved by Marcos De Vargas.

Version History

  • v1.0 - Initial release with comprehensive data engineering toolkit
  • Date: March 17, 2026

Contact Information

Personal Profile/Resume LinkedIn

Email Outlook/iCloud

Phone Dominican Republic

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors