GOAT is a synthetic dataset designed to mirror the operations of a direct-to-consumer (D2C) electronics brand, focusing on products like audio gear, wearables, and smart accessories. It contains detailed data on users, transactions, product listings, reviews, inventory, and support tickets, providing a rich environment for analyzing consumer behavior, marketing strategies, and post-purchase interactions. Whether you're working on demand forecasting, A/B testing, or campaign performance analysis, GOAT provides a realistic framework for exploring these use cases.
Ideal for data scientists, developers, and analysts, GOAT supports a variety of applications, from machine learning tasks like customer lifetime value modeling and product recommendation to backend testing of inventory systems and APIs. It also provides an excellent resource for learners to practice SQL, data transformations, and ETL workflows. With its comprehensive and structured data, GOAT allows you to dive into customer behavior, sales trends, and operational optimization in the consumer electronics space.
Highlights:
- Simulates a full-stack D2C electronics brand — covering product catalog, purchases, fulfillment, and customer feedback.
- Perfect for building models in customer segmentation, sales prediction, and return forecasting.
- Supports analysis of marketing funnels, influencer impact, and seasonal product trends.
- Includes transactional flows for cart creation, payments, shipping, warranty handling, and returns.
- Ideal for backend and systems testing: inventory sync, order status workflows, and logistics integration.
- Structured for real-world data projects in product analytics, user behavior tracking, and database design.
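As an illustration of the funnel analysis mentioned above, the following is a minimal sketch, assuming the activity log can be reduced to `(user_id, action)` pairs. The action labels in `FUNNEL_STEPS` are hypothetical and may not match the values stored in the dataset, and the sample events are made up for the example.

```python
from collections import defaultdict

# Hypothetical action labels; the dataset's actual values may differ.
FUNNEL_STEPS = ["view_product", "add_to_cart", "place_order", "make_payment"]


def funnel_counts(events, steps=FUNNEL_STEPS):
    """Count users who reached each step and every step before it.

    `events` is an iterable of (user_id, action) pairs. Event order is
    ignored in this sketch: a user counts for a step once all actions up
    to and including that step appear somewhere in their history.
    """
    actions_by_user = defaultdict(set)
    for user_id, action in events:
        actions_by_user[user_id].add(action)

    counts = []
    for i, step in enumerate(steps):
        required = set(steps[: i + 1])
        reached = sum(1 for actions in actions_by_user.values() if required <= actions)
        counts.append((step, reached))
    return counts


# Made-up events for illustration only.
sample_events = [
    (1, "view_product"), (1, "add_to_cart"),
    (2, "view_product"),
    (3, "view_product"), (3, "add_to_cart"), (3, "place_order"), (3, "make_payment"),
]
sample_funnel = funnel_counts(sample_events)
```

Dividing each count by the previous one gives a step-to-step conversion rate. A stricter version would also check event timestamps so that steps must occur in order.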
The GOAT schema models an e-commerce platform that sells audio products such as headphones, earphones, and speakers. It covers user management, product details, orders, payments, reviews, and user activities. It also includes discounts and notifications, which support analysis of promotions and customer engagement.
Key tables in the dataset include:
- Users: Stores user data like login details, contact info, and role (customer/admin), linked to orders, payments, and reviews.
- Products: Contains product details such as name, description, price, stock, and category (headphones, earphones, speakers).
- Product Images: Stores images for products (front, back, side views) linked to the respective products.
- Orders: Tracks customer orders, including total amount, status (pending, completed), and shipping and billing info.
- Order Items: Details individual items in an order, including product, quantity, unit price, and total price.
- Payments: Stores payment details for orders, including method, amount, status (pending, completed), and transaction reference.
- Reviews: Records customer reviews for products, including ratings (1-5 stars) and feedback, linked to users and products.
- Discounts: Defines discount codes with percentage and validity period for promotions.
- User Activities: Logs user actions like viewing products, adding items to cart, and making orders/payments.
- Notifications: Stores notifications for users (order updates, offers) with read/unread status for engagement.
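The relationships between these tables can be sketched with an in-memory SQLite database. This is an illustrative subset only: the table and column names below (`product_id`, `unit_price`, `status`, and so on) are assumptions derived from the descriptions above, not the dataset's actual DDL, and the rows are made up for the example.

```python
import sqlite3

# Hypothetical, simplified subset of the schema; real column names may differ.
SCHEMA = """
CREATE TABLE products (
    product_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    category   TEXT NOT NULL,
    price      REAL NOT NULL
);
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    user_id  INTEGER NOT NULL,
    status   TEXT NOT NULL
);
CREATE TABLE order_items (
    order_item_id INTEGER PRIMARY KEY,
    order_id      INTEGER NOT NULL REFERENCES orders(order_id),
    product_id    INTEGER NOT NULL REFERENCES products(product_id),
    quantity      INTEGER NOT NULL,
    unit_price    REAL NOT NULL
);
"""


def build_demo_db():
    """Create an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO products VALUES (?, ?, ?, ?)", [
        (1, "Demo Headphones", "headphones", 100.0),
        (2, "Demo Speaker", "speakers", 50.0),
    ])
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
        (1, 10, "completed"),
        (2, 11, "pending"),
    ])
    conn.executemany("INSERT INTO order_items VALUES (?, ?, ?, ?, ?)", [
        (1, 1, 1, 2, 100.0),  # completed order: 2 headphones
        (2, 1, 2, 1, 50.0),   # completed order: 1 speaker
        (3, 2, 2, 3, 50.0),   # pending order: excluded from revenue
    ])
    return conn


def revenue_by_category(conn):
    """Return (category, revenue) for completed orders, highest revenue first."""
    query = """
        SELECT p.category, SUM(oi.quantity * oi.unit_price) AS revenue
        FROM order_items AS oi
        JOIN orders   AS o ON o.order_id = oi.order_id
        JOIN products AS p ON p.product_id = oi.product_id
        WHERE o.status = 'completed'
        GROUP BY p.category
        ORDER BY revenue DESC
    """
    return conn.execute(query).fetchall()


demo_revenue = revenue_by_category(build_demo_db())
```

The same join pattern (order items to orders and products) should carry over to the MySQL, PostgreSQL, and SQL Server versions once the real column names are substituted.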
The GOAT dataset is a synthetic representation of an e-commerce platform focused on audio and tech accessories. It simulates a comprehensive buying experience: product listings, image galleries, order placements, user reviews, discount applications, and payment workflows. Using AI agents trained on digital commerce behaviors, the data captures natural interactions between users and a product catalog—without involving any real transactions or consumer data. This dataset is ideal for developers and analysts working on e-commerce platforms, especially those involving electronics and consumer goods.
GOAT mimics the dynamics of an online electronics store, including purchases, product reviews, and cart behaviors. It is distributed in the file and database formats listed below, which makes it suitable for A/B testing, consumer analysis, and backend service development in local or cloud environments.
- Available file formats: CSV, JSON, Excel
- Available databases: MySQL, PostgreSQL, SQL Server
- Cloud database access: Snowflake
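For the file-based formats, a minimal loading sketch using only the Python standard library might look like the following. It assumes the CSV export has a header row and the JSON export is a top-level array of objects; neither layout is confirmed here, and the column names and rows are made up for illustration.

```python
import csv
import io
import json


def read_csv_records(handle):
    """Read a CSV export into a list of dicts; every value stays a string."""
    return list(csv.DictReader(handle))


def read_json_records(handle):
    """Read a JSON export, assuming it is a top-level array of objects."""
    records = json.load(handle)
    if not isinstance(records, list):
        raise ValueError("expected a JSON array of records")
    return records


# In-memory stand-ins for exported files (made-up rows, not from the dataset).
csv_export = io.StringIO("product_id,name,price\n1,Demo Headphones,100.0\n")
json_export = io.StringIO('[{"product_id": 1, "name": "Demo Headphones", "price": 100.0}]')

csv_rows = read_csv_records(csv_export)
json_rows = read_json_records(json_export)

# CSV values arrive as strings while JSON keeps numeric types,
# so cast before comparing records across formats.
assert int(csv_rows[0]["product_id"]) == json_rows[0]["product_id"]
```

With real files, the same functions accept an open file handle, for example `open("products.csv", newline="", encoding="utf-8")`, where `products.csv` is a hypothetical file name.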