FinTech Banking ETL Pipeline with PySpark Delta Lake and Medallion Architecture

Group: Capstone Project | Product Category: Cloud & Data Engineering | Sub Category: Apache Spark

About this Product

NovaPay ETL Pipeline is an advanced data engineering capstone project in which you build a production-grade ETL pipeline for NovaPay, a fictional digital bank. The pipeline processes 8.5M+ rows across 7 tables through a Bronze → Silver → Gold medallion architecture using PySpark and Delta Lake, with data quality gates and incremental processing.

With this project, you'll build a pipeline that can:

  • Ingest banking data from 3 source formats — CSV, JSON, and Parquet — into Bronze Delta Lake tables with idempotent re-run support
  • Clean, validate, and deduplicate 5M+ transactions using null handling, type casting, referential integrity checks, and Delta MERGE upserts
  • Enforce data quality gates at every layer — halting on critical failures like null customer IDs or row count drops above 20%
  • Compute 4 Gold analytics tables — daily transaction summary, customer 360, branch performance, and product adoption metrics
  • Run in full refresh and incremental modes via a single configurable spark-submit command
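The quality-gate behavior described above can be sketched in plain Python. This is a minimal illustration, not the project's actual implementation: the names `QualityGateError`, `GateResult`, `check_null_customer_ids`, `check_row_count_drop`, and `enforce_gates` are hypothetical, and the sketch assumes the null count and row counts have already been computed (in a PySpark pipeline they would typically come from DataFrame counts).

```python
from dataclasses import dataclass


class QualityGateError(Exception):
    """Raised when a critical data quality check fails (hypothetical name)."""


@dataclass(frozen=True)
class GateResult:
    check: str
    passed: bool
    detail: str


def check_null_customer_ids(null_count: int) -> GateResult:
    """Critical check: no row may have a null customer ID."""
    return GateResult(
        check="null_customer_id",
        passed=null_count == 0,
        detail=f"{null_count} rows with null customer_id",
    )


def check_row_count_drop(previous_rows: int, current_rows: int,
                         max_drop: float = 0.20) -> GateResult:
    """Critical check: fail when the row count falls by more than max_drop."""
    if previous_rows <= 0:
        # No baseline to compare against, so the check cannot fail.
        return GateResult("row_count_drop", True, "no previous row count")
    drop = (previous_rows - current_rows) / previous_rows
    return GateResult(
        check="row_count_drop",
        passed=drop <= max_drop,
        detail=f"{previous_rows} -> {current_rows} rows ({drop:.1%} drop)",
    )


def enforce_gates(results: list[GateResult]) -> None:
    """Halt the pipeline by raising if any check failed."""
    failures = [r for r in results if not r.passed]
    if failures:
        summary = "; ".join(f"{r.check}: {r.detail}" for r in failures)
        raise QualityGateError(summary)
```

A layer would call `enforce_gates([...])` after writing its output and before the next layer starts, so that a raised `QualityGateError` stops the run. Under this sketch a drop of exactly 20% passes, since only drops above the threshold are treated as critical.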

This project teaches you:

  • PySpark pipeline design across Bronze, Silver, and Gold medallion layers
  • Delta Lake operations — MERGE upserts, partitioning, and schema enforcement
  • Data quality framework — reusable null, duplicate, FK, and row count checks
  • Incremental processing with watermark-based daily loads
  • Modular architecture with pytest unit testing and YAML-driven parameters
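Watermark-based incremental loading can be illustrated without Spark. The sketch below is an assumption about how such a daily load might work, not the project's code: `select_incremental`, `advance_watermark`, and the `txn_date` field are hypothetical names, and rows are plain dictionaries standing in for DataFrame records. A watermark of `None` is treated here as a full refresh.

```python
from datetime import date
from typing import Optional


def select_incremental(rows: list[dict],
                       watermark: Optional[date]) -> list[dict]:
    """Return only rows dated strictly after the watermark."""
    if watermark is None:
        # No watermark recorded yet: behave like a full refresh.
        return list(rows)
    return [row for row in rows if row["txn_date"] > watermark]


def advance_watermark(batch: list[dict],
                      watermark: Optional[date]) -> Optional[date]:
    """Move the watermark to the latest date seen in the processed batch."""
    if not batch:
        # Nothing processed, so the watermark stays where it was.
        return watermark
    latest = max(row["txn_date"] for row in batch)
    return latest if watermark is None else max(latest, watermark)


# Example with made-up data: the watermark says 2024-01-01 is already loaded.
rows = [
    {"txn_id": 1, "txn_date": date(2024, 1, 1)},
    {"txn_id": 2, "txn_date": date(2024, 1, 2)},
    {"txn_id": 3, "txn_date": date(2024, 1, 3)},
]
batch = select_incremental(rows, watermark=date(2024, 1, 1))
new_watermark = advance_watermark(batch, date(2024, 1, 1))
```

Because the comparison is strict, re-running with the same watermark selects nothing new, which keeps repeated runs from loading the same day twice. The trade-off in this sketch is that late-arriving rows dated on or before the watermark are skipped.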

It uses Python, PySpark, Delta Lake, Apache Spark, Parquet, CSV, JSON, and YAML config management.

Why this project matters: 

ETL pipeline design is one of the most commonly tested skills in data engineering interviews. This project mirrors real take-home assignments (multi-format ingestion, quality gates, and Gold aggregates) and maps directly to production work.

Topics: Data Engineering, ETL Pipeline Design, Medallion Architecture, Data Quality & Validation, Incremental Processing, Delta Lake & Lakehouse

Languages: English

Skills: Python, PySpark, Apache Spark, Delta Lake, Parquet, ETL, Medallion Architecture, Data Quality, pytest

Business Domain: FinTech

Level: Advanced
Price: $9.00 (57% off the regular $21.00)
