MyTube: Your Real-World Video Platform Data Playground
MyTube is a ready-to-use synthetic dataset that mimics the experience of a real video-sharing platform. It is ideal for content recommendations, viewer retention analysis, and feed personalization.
Overview
The dataset includes everything you'd expect from a video-sharing platform: users, video uploads, likes, dislikes, comments, subscriptions, watch history, and even ad interactions. Whether you're working on content recommendations, viewer retention, or feed personalization, MyTube gives you the data to simulate real-world scenarios and test your ideas confidently.
Perfect for students, developers, and data enthusiasts, this dataset is a great way to sharpen your skills in SQL, machine learning, and backend development. Use it to build dashboards, run A/B tests, or train models for video suggestions, sentiment analysis, and creator analytics, all in a safe and privacy-friendly environment.
Full Platform Lifecycle
Simulates a full video platform lifecycle, including uploads, views, likes, dislikes, comments, and subscriptions.
Built for Development & Testing
Enables development and testing of video recommendation systems, audience clustering, and creator performance analysis.
Backend Feature Testing
Suitable for backend feature testing such as watch history tracking, channel feed generation, and moderation workflows.
Rich Engagement Data
Includes data for user activity patterns, search queries, session duration, ad impressions, and engagement metrics.
Analytics & Research
Excellent for use in behavioral analytics, trend detection, A/B testing, and digital media research.
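As a rough illustration of the recommendation use case mentioned above, here is a minimal co-watch sketch in Python. The `(user_id, video_id)` row shape and the sample values are assumptions made for this example, not the dataset's actual columns, and co-watch counting is only one simple baseline among many possible approaches.

```python
from collections import Counter, defaultdict

# Hypothetical watch-history rows: (user_id, video_id). Real MyTube column
# names may differ; these values are made up for illustration.
WATCH_HISTORY = [
    (1, "v1"), (1, "v2"), (1, "v3"),
    (2, "v1"), (2, "v2"),
    (3, "v2"), (3, "v3"),
    (4, "v1"), (4, "v4"),
]


def build_cowatch_counts(rows):
    """Count how often each pair of videos was watched by the same user."""
    videos_by_user = defaultdict(set)
    for user_id, video_id in rows:
        videos_by_user[user_id].add(video_id)

    cowatch = defaultdict(Counter)
    for videos in videos_by_user.values():
        for a in videos:
            for b in videos:
                if a != b:
                    cowatch[a][b] += 1
    return cowatch


def recommend(video_id, cowatch, k=2):
    """Return up to k videos most often co-watched with video_id."""
    # Sort by descending count, then by video id so ties are deterministic.
    ranked = sorted(cowatch[video_id].items(), key=lambda item: (-item[1], item[0]))
    return [vid for vid, _ in ranked[:k]]


cowatch = build_cowatch_counts(WATCH_HISTORY)
print(recommend("v1", cowatch))  # ['v2', 'v3']
```

With real data, the same idea would likely start from the Video Views table and could be extended with recency weighting or likes as an extra signal.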
How it Works
AI-Generated & Fully Synthetic
The MyTube dataset is generated using advanced AI agents, creating a realistic yet entirely synthetic representation of video platform interactions with zero real-world or personally identifiable data.
Realistic Simulation with Privacy
It simulates user accounts, video uploads, view histories, comments, likes, and content engagement behaviors informed by industry-standard trends and public consumption patterns.
High-Quality & Safe for Use
Built using insights from public trends and industry data, the dataset delivers structured, high-quality data suitable for recommendation systems, UI testing, and data-driven performance experiments.
Dataset Schema
A comprehensive relational model of a modern video-sharing platform, designed for deep analysis and complex querying.
Users
Stores account information like login details, profile, and contact info.
Channels
Represents user-managed content hubs with channel name and ownership.
Channel Details
Contains extended info like banner images and social media links.
Videos
Stores video data including title, description, URLs, and view count.
Video Views
Tracks views, timestamps, and user behavior to analyze video engagement.
Likes / Dislikes
Records user feedback (likes and dislikes) for videos.
Comments
Manages video comments including text, authorship, and timestamps.
Categories
Organizes videos by content type (e.g., Education, Entertainment).
Playlists
Allows users to group videos into collections for easier viewing or curation.
Payments
Records user payments, such as those for subscriptions or channel features.
Subscriptions
Tracks user subscriptions to channels for personalized feeds and notifications.
Shorts
Manages short-form video content, ideal for quick consumption.
Stories
Supports temporary content (stories), providing engagement through short-lived videos.
Video Tags
Maintains tags for content labeling, aiding search and discovery.
Playlist Videos
Manages the relationship between playlists and the videos they contain.
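To make the relationships between a few of these tables concrete, the sketch below builds a small subset (Users, Channels, Videos, Video Views) in an in-memory SQLite database and counts views per channel. All column names, types, and sample rows here are assumptions for illustration; the actual MyTube schema may define them differently.

```python
import sqlite3

# Column names below are assumptions for illustration; the actual MyTube
# schema may name or type these fields differently.
SCHEMA = """
CREATE TABLE users (
    user_id  INTEGER PRIMARY KEY,
    username TEXT NOT NULL
);
CREATE TABLE channels (
    channel_id INTEGER PRIMARY KEY,
    owner_id   INTEGER NOT NULL REFERENCES users(user_id),
    name       TEXT NOT NULL
);
CREATE TABLE videos (
    video_id   INTEGER PRIMARY KEY,
    channel_id INTEGER NOT NULL REFERENCES channels(channel_id),
    title      TEXT NOT NULL
);
CREATE TABLE video_views (
    view_id   INTEGER PRIMARY KEY,
    video_id  INTEGER NOT NULL REFERENCES videos(video_id),
    user_id   INTEGER NOT NULL REFERENCES users(user_id),
    viewed_at TEXT NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Made-up sample rows, just enough to exercise the joins.
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ana"), (2, "ben")])
conn.executemany(
    "INSERT INTO channels VALUES (?, ?, ?)",
    [(10, 1, "Ana Cooks"), (11, 2, "Ben Builds")],
)
conn.executemany(
    "INSERT INTO videos VALUES (?, ?, ?)",
    [(100, 10, "Pasta"), (101, 10, "Soup"), (102, 11, "Shelf")],
)
conn.executemany(
    "INSERT INTO video_views VALUES (?, ?, ?, ?)",
    [
        (1, 100, 2, "2024-01-01T10:00:00"),
        (2, 100, 1, "2024-01-01T11:00:00"),
        (3, 101, 2, "2024-01-02T09:00:00"),
        (4, 102, 1, "2024-01-02T12:00:00"),
    ],
)

# Total views per channel, joining channels -> videos -> video_views.
VIEWS_PER_CHANNEL = """
SELECT c.name, COUNT(vv.view_id) AS views
FROM channels AS c
JOIN videos AS v ON v.channel_id = c.channel_id
LEFT JOIN video_views AS vv ON vv.video_id = v.video_id
GROUP BY c.channel_id, c.name
ORDER BY views DESC, c.name
"""
views_per_channel = conn.execute(VIEWS_PER_CHANNEL).fetchall()
print(views_per_channel)  # [('Ana Cooks', 3), ('Ben Builds', 1)]
```

The `LEFT JOIN` keeps videos that have no views yet, so a channel with unwatched uploads still appears with a count of zero.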
Available Formats
- CSV
- JSON
- Excel
- MySQL
- PostgreSQL
- SQL Server
- Snowflake
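For the file-based formats listed above, loading the data can be as simple as the following sketch, which parses the same hypothetical records from CSV and JSON text using only the Python standard library. The field names (`video_id`, `title`, `view_count`) and the inline sample data are assumptions standing in for real export files.

```python
import csv
import io
import json

# Stand-ins for exported files; the field names are hypothetical and the
# real MyTube exports may use different columns.
VIDEOS_CSV = """video_id,title,view_count
100,Pasta,1200
101,Soup,450
"""
VIDEOS_JSON = (
    '[{"video_id": 100, "title": "Pasta", "view_count": 1200},'
    ' {"video_id": 101, "title": "Soup", "view_count": 450}]'
)


def load_csv(text):
    """Parse CSV text into dicts, converting numeric fields from strings."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            "video_id": int(row["video_id"]),
            "title": row["title"],
            "view_count": int(row["view_count"]),
        })
    return rows


csv_rows = load_csv(VIDEOS_CSV)
json_rows = json.loads(VIDEOS_JSON)
print(csv_rows == json_rows)  # True
```

CSV values arrive as strings, so numeric columns need explicit conversion, while JSON preserves numeric types. That difference is worth keeping in mind when choosing a format for analysis.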