FinTech Banking ETL Pipeline with PySpark, Delta Lake, and Medallion Architecture
Group: Capstone Project
Product Category: Cloud & Data Engineering
Sub Category: Apache Spark
About this Product
NovaPay ETL Pipeline is an advanced data engineering capstone project in which you build a production-grade ETL pipeline for NovaPay, a fictional digital bank. The pipeline processes 8.5M+ rows across 7 tables through a Bronze → Silver → Gold medallion architecture using PySpark and Delta Lake, with data quality gates and incremental processing.
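To make the layering concrete, here is a minimal pure-Python sketch of the Bronze → Silver → Gold flow. It is an illustration under stated assumptions, not the project's PySpark code: the column names (`txn_id`, `customer_id`, `txn_date`, `amount`), the `_source_format` tag, and all function names are hypothetical, and plain lists and dicts stand in for Delta tables. Bronze lands raw rows unchanged, Silver casts types and deduplicates on a key, and Gold aggregates a daily transaction summary.

```python
from collections import defaultdict
from datetime import date

def to_bronze(raw_rows, source_format):
    """Bronze: land raw rows unchanged, tagged with their source format (hypothetical tag)."""
    return [{**row, "_source_format": source_format} for row in raw_rows]

def to_silver(bronze_rows):
    """Silver: cast string fields to typed values and deduplicate on txn_id."""
    by_id = {}
    for row in bronze_rows:
        # A later duplicate of the same txn_id replaces the earlier one.
        by_id[row["txn_id"]] = {
            "txn_id": row["txn_id"],
            "customer_id": row["customer_id"],
            "txn_date": date.fromisoformat(row["txn_date"]),
            "amount": float(row["amount"]),
        }
    return list(by_id.values())

def to_gold_daily_summary(silver_rows):
    """Gold: per-day transaction count and total amount, ordered by date."""
    totals = defaultdict(lambda: {"txn_count": 0, "total_amount": 0.0})
    for row in silver_rows:
        day = totals[row["txn_date"]]
        day["txn_count"] += 1
        day["total_amount"] += row["amount"]
    return {d: totals[d] for d in sorted(totals)}

# Hypothetical CSV rows, all values still strings as they would arrive in Bronze.
raw_csv_rows = [
    {"txn_id": "t1", "customer_id": "c1", "txn_date": "2024-01-01", "amount": "100.50"},
    {"txn_id": "t1", "customer_id": "c1", "txn_date": "2024-01-01", "amount": "100.50"},  # duplicate
    {"txn_id": "t2", "customer_id": "c2", "txn_date": "2024-01-02", "amount": "40"},
]
summary = to_gold_daily_summary(to_silver(to_bronze(raw_csv_rows, "csv")))
```

In a PySpark version, these steps would plausibly map onto DataFrame operations such as `dropDuplicates` and `groupBy(...).agg(...)`, with each layer written to its own Delta table.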
With this project, you'll build a pipeline that can:
- Ingest banking data from 3 source formats — CSV, JSON, and Parquet — into Bronze Delta Lake tables with idempotent re-run support
- Clean, validate, and deduplicate 5M+ transactions — null handling, type casting, referential integrity checks, and Delta MERGE upserts
- Enforce data quality gates at every layer — halting on critical failures such as null customer IDs or a row-count drop of more than 20%
- Compute 4 Gold analytics tables — daily transaction summary, customer 360, branch performance, and product adoption metrics
- Run in full refresh and incremental modes via a single configurable spark-submit command
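The halting behaviour of the quality gates can be sketched in plain Python. This is a hedged illustration, not the project's framework: `QualityGateError`, the function names, and the `previous_count` baseline (assumed here to be the prior run's row count) are hypothetical, while the two critical checks and the 20% threshold come from the description above.

```python
class QualityGateError(Exception):
    """Raised when a critical check fails and the pipeline should halt."""

def check_null_customer_ids(rows):
    """Return how many rows have a missing or None customer_id."""
    return sum(1 for row in rows if row.get("customer_id") is None)

def check_row_count_drop(previous_count, current_count, max_drop=0.20):
    """Return True if the row count fell by more than max_drop relative to the baseline."""
    if previous_count <= 0:
        return False  # no baseline to compare against
    drop = (previous_count - current_count) / previous_count
    return drop > max_drop

def run_quality_gate(rows, previous_count, max_drop=0.20):
    """Run the critical checks and raise QualityGateError on the first failure."""
    nulls = check_null_customer_ids(rows)
    if nulls:
        raise QualityGateError(f"{nulls} row(s) have a null customer_id")
    if check_row_count_drop(previous_count, len(rows), max_drop):
        raise QualityGateError(
            f"row count dropped from {previous_count} to {len(rows)} "
            f"(more than {max_drop:.0%})"
        )
    return True
```

Raising an exception, rather than logging a warning, is what makes the gate halt: the calling job stops before writing the next layer. In a PySpark job the same counts could come from `df.filter(df.customer_id.isNull()).count()` and `df.count()`.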
This project teaches you:
- PySpark pipeline design across Bronze, Silver, and Gold medallion layers
- Delta Lake operations — MERGE upserts, partitioning, and schema enforcement
- Data quality framework — reusable null, duplicate, FK, and row count checks
- Incremental processing with watermark-based daily loads
- Modular architecture with pytest unit testing and YAML-driven parameters
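Watermark-based incremental loading combined with MERGE-style upserts can be sketched without Spark. In this minimal, hypothetical sketch, a dict keyed by `txn_id` stands in for a Delta table, and the watermark is the latest `txn_date` already loaded. Only rows newer than the watermark are processed; each one updates an existing key or inserts a new one. The function name, column names, and date-based watermark are assumptions, not the project's actual code.

```python
from datetime import date

def incremental_load(target, source_rows, watermark, key="txn_id", ts_col="txn_date"):
    """Upsert rows newer than the watermark into target; return the new watermark.

    target is a dict keyed by `key`, standing in for a Delta table.
    Rows whose ts_col value is <= watermark are treated as already loaded.
    """
    new_rows = [row for row in source_rows if row[ts_col] > watermark]
    for row in new_rows:
        # MERGE-style upsert: overwrite the row on a key match, insert otherwise.
        target[row[key]] = row
    # Advance the watermark to the newest timestamp seen; keep it if nothing was new.
    return max((row[ts_col] for row in new_rows), default=watermark)

table = {}
watermark = date(2023, 12, 31)

day1 = [{"txn_id": "t1", "txn_date": date(2024, 1, 1), "amount": 100.0}]
watermark = incremental_load(table, day1, watermark)

# The source still contains day 1; only the newer rows are picked up.
day2 = day1 + [
    {"txn_id": "t1", "txn_date": date(2024, 1, 2), "amount": 120.0},  # correction to t1
    {"txn_id": "t2", "txn_date": date(2024, 1, 2), "amount": 5.0},
]
watermark = incremental_load(table, day2, watermark)
```

Re-running with the same source and watermark changes nothing, and a full refresh corresponds to starting from an empty target with `date.min` as the watermark. One simplification to note: late-arriving rows dated on or before the watermark are skipped. With the delta-spark Python API, the upsert step would roughly be `DeltaTable.forPath(spark, path).alias("t").merge(updates.alias("s"), "t.txn_id = s.txn_id").whenMatchedUpdateAll().whenNotMatchedInsertAll().execute()`.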
It uses Python, PySpark, Delta Lake, Apache Spark, Parquet, CSV, JSON, and YAML config management.
Why this project matters:
ETL pipeline design is one of the most frequently tested skills in data engineering interviews. This project mirrors real take-home assignments (multi-format ingestion, quality gates, and Gold aggregates), so the work maps directly onto production tasks.
Similar Products
Product Performance Dataset
Topics: SQL, PostgreSQL, Retail Performance
Basic Professional Data Analysis
Topics: SQL, PostgreSQL, Data Quality Analysis
Restaurant Performance & Menu Optimization
Topics: SQL, PostgreSQL, Data Analytics