MyTube is a ready-to-use synthetic dataset that mimics the experience of a real video-sharing platform. It includes everything you'd expect—users, video uploads, likes, dislikes, comments, subscriptions, watch history, and even ad interactions. Whether you're working on content recommendations, viewer retention, or feed personalization, MyTube gives you the data to simulate real-world scenarios and test your ideas confidently.
Perfect for students, developers, and data enthusiasts, this dataset is a great way to sharpen your skills in SQL, machine learning, and backend development. Use it to build dashboards, run A/B tests, or train models for video suggestions, sentiment analysis, and creator analytics—all in a safe and privacy-friendly environment.
Highlights:
- Simulates a full video platform lifecycle, including uploads, views, likes, dislikes, comments, and subscriptions.
- Enables development and testing of video recommendation systems, audience clustering, and creator performance analysis.
- Suitable for backend feature testing such as watch history tracking, channel feed generation, and moderation workflows.
- Includes data for user activity patterns, search queries, session duration, ad impressions, and engagement metrics.
- Excellent for use in behavioral analytics, trend detection, A/B testing, and digital media research.
- Supports hands-on practice with SQL, NoSQL modeling, time-series analysis, and data warehousing for streaming platforms.
The MyTube dataset schema models a modern video-sharing platform, similar to YouTube. It covers content management, user interaction, monetization, subscriptions, and personalized features, which makes it well suited to developing multimedia applications, analytics, and media platform simulations.
Key tables in the dataset include:
- Users: Stores account information like login details, profile, and contact info. Central to all user actions.
- Channels: Represents user-managed content hubs, with details like channel name, account type, and ownership.
- Channel Details: Contains extended info like banner images, contact emails, and social media links.
- Videos: Stores video data, including title, description, URLs, view count, and duration, linked to the user.
- Video Views: Tracks views, timestamps, and user behavior to analyze video engagement.
- Video Likes/Dislikes: Records user feedback (likes and dislikes) for videos.
- Comments: Manages video comments, including text, authorship, and timestamps, tied to users and videos.
- Video Categories: Organizes videos by content type (e.g., Education, Entertainment).
- Video Tags: Maintains tags for content labeling, aiding search and discovery.
- Video Tag Mapping: Links videos to tags in a many-to-many relationship, so one video can carry several tags and one tag can apply to many videos.
- Playlists: Allows users to group videos into collections for easier viewing or curation.
- Playlist Videos: Manages the relationship between playlists and the videos they contain.
- Payments: Records user payments, such as subscriptions or channel features.
- Subscriptions: Tracks user subscriptions to channels for personalized feeds and notifications.
- Shorts: Manages short-form video content, ideal for quick consumption.
- Stories: Supports temporary content (stories), providing engagement through short-lived videos.
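As a rough illustration of how the Videos, Video Tags, and Video Tag Mapping tables could fit together, the sketch below builds a tiny in-memory SQLite database and looks up the tags for one video. The table names, column names (`video_id`, `tag_id`, `tag_name`, `title`), and sample rows are assumptions made for this example; the actual MyTube schema may name or structure them differently.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with hypothetical MyTube-style tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Assumed table and column names; the real dataset may differ.
        CREATE TABLE videos (video_id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE video_tags (tag_id INTEGER PRIMARY KEY, tag_name TEXT);
        CREATE TABLE video_tag_mapping (
            video_id INTEGER REFERENCES videos(video_id),
            tag_id   INTEGER REFERENCES video_tags(tag_id),
            PRIMARY KEY (video_id, tag_id)
        );
        -- Made-up sample rows, not taken from the dataset.
        INSERT INTO videos VALUES (1, 'Intro to SQL'), (2, 'Cat Compilation');
        INSERT INTO video_tags VALUES (10, 'education'), (11, 'sql'), (12, 'pets');
        INSERT INTO video_tag_mapping VALUES (1, 10), (1, 11), (2, 12);
        """
    )
    return conn


def tags_for_video(conn, video_id):
    """Return the tag names attached to a video, sorted alphabetically."""
    rows = conn.execute(
        """
        SELECT t.tag_name
        FROM video_tag_mapping AS m
        JOIN video_tags AS t ON t.tag_id = m.tag_id
        WHERE m.video_id = ?
        ORDER BY t.tag_name
        """,
        (video_id,),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = build_demo_db()
    print(tags_for_video(conn, 1))
```

The same join pattern would apply to Playlists and Playlist Videos, assuming that mapping table is laid out in a similar way.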
The MyTube dataset emulates a rich video-streaming platform environment using fully synthetic data generated through intelligent AI simulation agents. It includes modeled user accounts, video uploads, view histories, comments, likes, and content engagement behaviors. These behaviors are informed by industry-standard trends and public consumption patterns from video platforms like YouTube, yet the data remains entirely artificial and free from personal identifiers. Our ethical AI-driven simulation ensures the dataset feels authentic, providing meaningful context for developers working on video content platforms. Whether it’s used for recommendation system training, UI testing, or data-driven performance experiments, MyTube’s dataset balances realism with responsible data generation.
MyTube’s dataset offers rich media-centric data to simulate video consumption behaviors and social interactions. Built to accommodate data science, product testing, and backend services, it is available in popular file formats and can be loaded into common databases and cloud platforms for performance benchmarking and algorithm development.
- Available file formats: CSV, JSON, Excel
- Available databases: MySQL, PostgreSQL, SQL Server
- Cloud database access: Snowflake
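For the CSV format, a first exploration might look like the sketch below, which counts views per video using only the Python standard library. The column names (`view_id`, `video_id`, `user_id`, `viewed_at`) and the inline sample rows are hypothetical stand-ins for a Video Views export; check the actual file headers before adapting it.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for a Video Views CSV export.
SAMPLE_CSV = """view_id,video_id,user_id,viewed_at
1,101,7,2024-01-05T10:00:00
2,101,8,2024-01-05T10:05:00
3,102,7,2024-01-06T09:30:00
"""


def count_views_per_video(csv_file):
    """Count rows per video_id in a CSV file object with a header row."""
    reader = csv.DictReader(csv_file)
    return Counter(row["video_id"] for row in reader)


counts = count_views_per_video(io.StringIO(SAMPLE_CSV))
print(counts.most_common())
```

To run this against a real export, replace `io.StringIO(SAMPLE_CSV)` with a file opened via `open(path, newline="")`, where `path` points to the downloaded CSV.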