Synthetic Data Sources
Explore industry-grade synthetic datasets for ML, analytics, SQL practice, and backend development.
Zwiggy is a synthetic, full-scale dataset that simulates the operations of a modern food delivery platform and is aimed at data professionals, engineers, and learners. It covers key components such as user profiles, restaurant menus, orders, deliveries, and payments, enabling hands-on practice with real-world scenarios in a privacy-compliant format.
MyTube is a synthetic dataset that simulates the ecosystem of a large-scale video streaming platform, offering a realistic environment for modeling user behavior, analyzing engagement, and experimenting with recommendation and monetization strategies.
Musify is a synthetic dataset that replicates the core functions of a modern music streaming platform, including user activity, audio content, playlists, and recommendation dynamics. It is designed for data scientists, machine learning engineers, and developers.
Photogram is a synthetic dataset simulating a modern image-sharing social media platform, designed for data science, software development, and analytics. It includes data on users, posts, likes, comments, follows, hashtags, and media metadata.
Amazing is a synthetic dataset simulating a large-scale e-commerce platform, covering everything from product listings and customer profiles to orders, payments, and post-purchase interactions. It provides a structured environment for developing recommendation engines, sales forecasting models, and fraud detection systems.
Carlelo is a synthetic dataset designed to simulate a car listing and sales platform, featuring data on vehicle listings, buyer interactions, and pricing trends. It supports price prediction, customer preference analysis, and backend testing.
uSkill is a synthetic dataset simulating an online learning platform, offering data on user behavior, course effectiveness, and platform performance. It includes records for users, courses, instructors, enrollments, quizzes, and feedback.
MakeYourTrip is a synthetic dataset simulating a travel booking platform, covering flight reservations, hotel bookings, customer preferences, and transactions. It supports modeling user behavior, predicting travel trends, and optimizing booking systems.
BOYO is a synthetic dataset simulating a digital hospitality booking platform, covering the entire guest journey from property discovery to booking, check-in, and feedback. It is suitable for machine learning projects, backend testing, and data analysis.
CashBharo is a synthetic dataset that replicates a cashback and affiliate marketing platform, capturing user interactions with offers, purchases, referrals, and campaigns. It supports machine learning projects for personalized recommendations, fraud detection, and redemption predictions.
GOAT is a synthetic dataset that mirrors the ecosystem of a direct-to-consumer electronics brand, covering categories like audio gear, wearables, and smart accessories. It is suitable for machine learning, development, and analytics work such as demand forecasting and A/B testing.
100mcg is a synthetic dataset that simulates an online healthcare and pharmacy platform, including pharmaceutical product listings, user purchases, medicine interactions, and customer feedback. It is well suited to predictive modeling, recommendation systems, and pricing optimization.