Synthetic Data Sources

Explore industry-grade synthetic datasets for ML, analytics, SQL practice, and backend development.

Zwiggy

Zwiggy is a full-scale synthetic dataset that simulates the operations of a modern food delivery platform, built for data professionals, engineers, and learners. It covers user profiles, restaurant menus, orders, deliveries, and payments, enabling hands-on practice with realistic scenarios in a privacy-compliant format.

Food Delivery, SQL, ETL
FREE
MyTube

MyTube is a synthetic dataset that simulates the ecosystem of a large-scale video streaming platform, offering a realistic environment for modeling user behavior, analyzing engagement, and experimenting with recommendation and monetization strategies.

Streaming, ML, Engagement
FREE
Musify

Musify is a synthetic dataset that replicates the core functions of a modern music streaming platform, including user activity, audio content, playlists, and recommendation dynamics. It is designed for data scientists, machine learning engineers, and developers.

Music, Recommendation, Filtering
FREE
Photogram

Photogram is a synthetic dataset simulating a modern image-sharing social media platform, designed for data science, software development, and analytics. It includes data on users, posts, likes, comments, follows, hashtags, and media metadata.

Social Media, Analytics, Content
FREE
Amazing

Amazing is a synthetic dataset simulating a large-scale e-commerce platform, covering everything from product listings and customer profiles to orders, payments, and post-purchase interactions. It provides a structured environment for developing recommendation engines, sales forecasting models, and fraud detection systems.

E-commerce, Fraud Detection, SQL
FREE
CarLelo

CarLelo is a synthetic dataset designed to simulate a car listing and sales platform, featuring data on vehicle listings, buyer interactions, and pricing trends. It supports price prediction, customer preference analysis, and backend testing.

Automotive, Pricing, ML
FREE
uSkill

uSkill is a synthetic dataset simulating an online learning platform, offering data on user behavior, course effectiveness, and platform performance. It includes records for users, courses, instructors, enrollments, quizzes, and feedback.

EdTech, Churn, ETL
FREE
MakeYourTrip

MakeYourTrip is a synthetic dataset simulating a travel booking platform, covering flight reservations, hotel bookings, customer preferences, and transactions. It supports modeling user behavior, predicting travel trends, and optimizing booking systems.

Travel, Booking, Forecasting
FREE
BOYO

BOYO is a synthetic dataset simulating a digital hospitality booking platform, covering the entire guest journey from property discovery to booking, check-in, and feedback. It is suitable for machine learning projects, backend testing, and data analysis.

Hospitality, Dynamic Pricing, ML
FREE
CashBharo

CashBharo is a synthetic dataset that replicates a cashback and affiliate marketing platform, capturing user interactions with offers, purchases, referrals, and campaigns. It supports machine learning projects for personalized recommendations, fraud detection, and redemption prediction.

Fintech, Fraud, Marketing
FREE
GOAT

GOAT is a synthetic dataset that mirrors the ecosystem of a direct-to-consumer electronics brand, covering categories like audio gear, wearables, and smart accessories. It is suitable for machine learning, development, and analytics work, including demand forecasting and A/B testing.

Electronics, D2C, A/B Testing
FREE
100mcg

100mcg is a synthetic dataset that simulates an online healthcare and pharmacy platform, including pharmaceutical product listings, user purchases, medicine interactions, and customer feedback. It is ideal for building predictive models and recommendation systems and for optimizing pricing.

Healthcare, Pharma, Prediction
FREE