Musify is a realistic synthetic dataset that mirrors how users interact with a modern music streaming app. It captures listening sessions, playlists, likes, skips, and recommendations, so that data scientists, engineers, and analysts can experiment, test, and build better audio experiences. Because the data is synthetic, it offers a safe, structured playground for user preference modeling, recommendation systems, or audio analysis without privacy concerns.
The dataset covers users, songs, artists, albums, genres, and playback histories. It is well suited to building and testing collaborative filtering systems, ranking algorithms, and engagement tracking tools. For interview preparation, data pipeline work, or simulating app behavior, Musify lets you explore real-world scenarios with clean, comprehensive music platform data.
Highlights:
- Simulates a complete music streaming experience, including users, songs, albums, artists, and playback interactions.
- Excellent for building and testing recommendation systems, skip prediction models, and playlist generators.
- Supports use cases in user engagement analysis, retention modeling, and content-based filtering.
- Useful for developers working on playback logic, personalization features, or backend audio catalog design.
- Includes well-structured tables for audio metadata, listening sessions, playlist behaviors, and user feedback.
- Suitable for SQL practice, ETL development, time-series analytics, and A/B testing simulations.
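As a rough illustration of the recommendation use case above, the sketch below ranks songs by how often they are liked together with songs a user already likes (item co-occurrence, a simple form of item-based collaborative filtering). The function name `recommend`, the `(user_id, song_id)` pair layout, and the sample rows are assumptions made for this example; they are not taken from the dataset itself.

```python
from collections import Counter, defaultdict
from itertools import combinations


def recommend(likes, user_id, top_n=3):
    """Rank songs the user has not liked by co-occurrence with songs they have liked."""
    # Group liked songs per user.
    songs_by_user = defaultdict(set)
    for uid, sid in likes:
        songs_by_user[uid].add(sid)

    # Count how often each pair of songs is liked by the same user.
    co_counts = defaultdict(Counter)
    for songs in songs_by_user.values():
        for a, b in combinations(sorted(songs), 2):
            co_counts[a][b] += 1
            co_counts[b][a] += 1

    # Score each candidate by its summed co-occurrence with the user's liked songs.
    liked = songs_by_user.get(user_id, set())
    scores = Counter()
    for sid in liked:
        for other, count in co_counts[sid].items():
            if other not in liked:
                scores[other] += count

    # Sort by score (descending), then by song id for a deterministic order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [sid for sid, _ in ranked[:top_n]]


# Hypothetical (user_id, song_id) like pairs, invented for illustration.
sample_likes = [
    (1, 101), (1, 102),
    (2, 101), (2, 102), (2, 103),
    (3, 102), (3, 103),
    (4, 101), (4, 104),
]

print(recommend(sample_likes, user_id=1))  # prints [103, 104]
```

Co-occurrence counting ignores popularity bias and play counts; it is only a baseline to compare more careful models against.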
The Musify schema is designed for a modern music streaming platform, supporting essential features like content management, user personalization, and monetization. It connects artists, users, and content, making it ideal for both mobile and web applications.
Key tables in the dataset include:
- Users: Stores login, profile, and subscription info. All user actions (payments, playlists, likes) are linked here.
- Artists: Contains artist info, including bio and profile images. Each song and album is tied to an artist.
- Albums: Represents music albums with metadata like title, release date, and cover art.
- Songs: Stores track details, including title, duration, album, and artist info.
- Genres: Contains music genres (e.g., Pop, Jazz, Hip-Hop) to classify songs and aid discovery.
- Song Genres: Links songs to genres for flexible categorization.
- Playlists: Represents user-created song collections, with metadata like titles and descriptions.
- Playlist Songs: Connects songs to playlists, tracking which song is added to which playlist.
- User Playlist Follows: Tracks users following other users' playlists for a social music experience.
- Songs Liked: Stores songs liked by users for personalized recommendations.
- User Listening History: Logs user playback history for session analytics and music suggestions.
- Payments: Records payments made by users, mainly for subscriptions.
- Subscriptions: Tracks user subscription details, including type and dates, and works together with Payments to represent premium access.
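To make the relationships between these tables concrete, here is a minimal sketch of a subset of the schema using Python's built-in `sqlite3` module. All column names (`song_id`, `duration_sec`, `played_at`, and so on) and the sample rows are assumptions for illustration; the actual Musify files may use different names and types. It shows how the Song Genres junction table lets a song belong to several genres, and how listening history can be aggregated per genre.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical column names; the real dataset may differ.
conn.executescript("""
CREATE TABLE artists (artist_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE songs (
    song_id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    duration_sec INTEGER,
    artist_id INTEGER REFERENCES artists(artist_id)
);
CREATE TABLE genres (genre_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Junction table: one row per (song, genre) pair.
CREATE TABLE song_genres (
    song_id INTEGER REFERENCES songs(song_id),
    genre_id INTEGER REFERENCES genres(genre_id),
    PRIMARY KEY (song_id, genre_id)
);
CREATE TABLE user_listening_history (
    history_id INTEGER PRIMARY KEY,
    user_id INTEGER,
    song_id INTEGER REFERENCES songs(song_id),
    played_at TEXT
);
""")

# Invented sample rows.
conn.execute("INSERT INTO artists VALUES (1, 'Artist A')")
conn.executemany(
    "INSERT INTO songs VALUES (?, ?, ?, ?)",
    [(1, "Song One", 210, 1), (2, "Song Two", 185, 1)],
)
conn.executemany("INSERT INTO genres VALUES (?, ?)", [(1, "Pop"), (2, "Jazz")])
conn.executemany(
    "INSERT INTO song_genres VALUES (?, ?)", [(1, 1), (1, 2), (2, 1)]
)
conn.executemany(
    "INSERT INTO user_listening_history (user_id, song_id, played_at) VALUES (?, ?, ?)",
    [
        (10, 1, "2024-01-01T10:00:00"),
        (10, 1, "2024-01-01T10:05:00"),
        (11, 2, "2024-01-02T09:00:00"),
    ],
)

# Count plays per genre; a play of a multi-genre song counts once for each genre.
plays_per_genre = conn.execute("""
    SELECT g.name, COUNT(*) AS plays
    FROM user_listening_history AS h
    JOIN song_genres AS sg ON sg.song_id = h.song_id
    JOIN genres AS g ON g.genre_id = sg.genre_id
    GROUP BY g.name
    ORDER BY plays DESC, g.name
""").fetchall()

print(plays_per_genre)  # prints [('Pop', 3), ('Jazz', 2)]
```

The same pattern applies to Playlist Songs, which links playlists to songs in the same many-to-many way.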
The Musify dataset was developed using generative AI tools to simulate a digital music streaming platform. It includes users, artists, songs, albums, playlists, likes, listening history, and payment details, all artificially created based on known patterns from real music apps. Every interaction in the dataset, from song plays to playlist follows, is produced by AI agents that model real-world user behavior and content trends, and no actual user or copyrighted data is included. Because the interactions are simulated in a privacy-respecting environment, Musify can be used to test recommender systems, build analytics dashboards, or train machine learning models without exposing real user data.