📂 Datasets

The Excellence of Our AI Models Springs From the Quality and Diversity of Our Carefully Selected Datasets

At CREATUS, the datasets we use to train our AI models are the cornerstone of our project. We source only high-quality data that is legally and ethically available for commercial use.

Open Source Datasets

To build our AI platform, we rely on a variety of open-source datasets:

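AI Text Generation: Our models are trained on the Gutenberg Corpus from NLTK, a small selection of texts from the Project Gutenberg archive, which NLTK's documentation describes as containing some 25,000 free electronic books.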
AI Image Generation & GAN Image Manipulation: We use the COCO and LSUN datasets, both known for their extensive image libraries, for image generation and manipulation.
AI Video Generation: The Kinetics dataset from Google's DeepMind aids us in training our AI for video generation.
AI Avatars Generation: For creating realistic AI avatars, we rely on the Icons8 Universal Multimedia Dataset and the CelebA dataset, each used under the terms of its license.
AI Voice Generation: Our voice generation AI has been honed with the help of the LibriSpeech dataset.
AI Music Generation: The Lakh MIDI dataset (LMD) contributes to our music generation AI's capabilities.

Purchased Datasets

In addition to open-source data, we procure datasets from commercial providers, including:

AI Text Generation & AI Voice Generation: We have acquired vast text and voice corpora from renowned data providers Lionbridge AI and Appen.
AI-powered Management Platform: We have improved our management platform using user interaction and behavior data from market research giants Nielsen and ComScore.
AI Image Generation & GAN Image Manipulation: High-quality image annotation services from Scale AI and diverse image datasets from Kaggle have been instrumental for our image generation and manipulation AI.
AI-powered Image Editor: Our 'before and after' image sets come from providers such as Alegion and iMerit.
AI Video Generation: Video data crucial for our AI video generation feature has been sourced from stock providers like Shutterstock and Pond5, and annotated video data providers like Scale AI and Samasource.
AI Music Generation: We have sourced innovative datasets from OpenAI for our AI music generation feature.
AI Avatars Generation: Specialized data for creating AI avatars has been custom-developed for us by Scale AI.
AI-powered Video Editor: To perfect our AI-powered video editor, we've obtained 'before and after' video datasets from custom providers.

Synthetic Data Platforms

To protect user privacy, we also use synthetic data platforms. These platforms generate artificial datasets whose statistical characteristics closely match those of real data, which lets us develop and test our AI models without exposing real user records. The key providers we work with are:

Hazy: Hazy provides fast and secure synthetic data generation with the aim of promoting privacy, facilitating compliance, and enabling safe data innovation.
Neuromation: Neuromation offers an array of synthetic data services, including data creation, annotation, and validation, making it easier to train and fine-tune AI models.
Mostly AI: Mostly AI generates synthetic data that retains the statistical properties of the original dataset, helping us optimize our models without exposing any personal information.
Gretel.ai: Gretel.ai provides synthetic data as a service, generating and transforming data that preserves privacy, making it an effective tool for AI model development and testing.
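
To illustrate the general idea of synthetic data that mirrors the statistics of real data, here is a minimal sketch. It is not how any of the providers above work internally: it simply fits an independent normal distribution to each numeric column and samples new rows, so it preserves per-column means and spreads but not correlations between columns. The records, field meanings, and function names are hypothetical.

```python
from statistics import NormalDist, mean

def fit_columns(rows):
    """Fit an independent normal distribution to each numeric column."""
    columns = list(zip(*rows))  # transpose rows into columns
    return [NormalDist.from_samples(col) for col in columns]

def synthesize(models, n, seed=0):
    """Draw n synthetic rows; each column is sampled from its fitted model."""
    columns = [m.samples(n, seed=seed + i) for i, m in enumerate(models)]
    return [tuple(row) for row in zip(*columns)]

# Hypothetical "real" records: (age in years, session length in minutes).
real_rows = [(23, 12.5), (35, 30.1), (41, 22.4), (29, 18.0), (52, 40.3), (38, 27.7)]

models = fit_columns(real_rows)
synthetic_rows = synthesize(models, n=1000)

# Compare the average age in the real and synthetic data.
print(mean(r[0] for r in real_rows), mean(r[0] for r in synthetic_rows))
```

The synthetic rows can be shared or used for testing in place of the originals, since no row is copied from the real data. Production-grade tools model joint structure and add formal privacy checks, which this sketch deliberately leaves out.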

By combining open-source, purchased, and synthetic data, we have created a robust, varied, and privacy-focused foundation for our AI technologies. As we continue to evolve, we remain committed to data ethics: respecting privacy, maintaining transparency, and operating within the bounds of commercial usage regulations. All our datasets, whether open source or purchased, are sourced with full respect for legality, privacy, and ethical considerations.

We place paramount importance on ethics in AI and on privacy. We strictly adhere to the terms and conditions of each dataset. When using open-source datasets, we always provide the necessary attribution, refrain from misrepresentation, and follow all the guidelines laid out by the dataset creators. When using purchased datasets, we ensure that the data is appropriately anonymized and that user privacy is safeguarded.
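
As one illustration of what an anonymization step can look like, the sketch below drops direct identifiers from a record and replaces the user ID with a keyed hash (HMAC-SHA256). This is an assumption about a possible approach, not a description of the actual CREATUS pipeline, and the field names and key are hypothetical. Strictly speaking this is pseudonymization, which is weaker than full anonymization because the remaining fields may still allow re-identification.

```python
import hashlib
import hmac

# Hypothetical field names; a real schema would differ.
DIRECT_IDENTIFIERS = {"name", "email"}

def pseudonymize(record, secret_key, id_field="user_id"):
    """Return a copy of record without direct identifiers and with a keyed-hash ID."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    digest = hmac.new(secret_key, str(record[id_field]).encode("utf-8"), hashlib.sha256)
    cleaned[id_field] = digest.hexdigest()
    return cleaned

# Hypothetical record and placeholder key, for illustration only.
record = {"user_id": 1042, "name": "Ada Example", "email": "ada@example.com", "clicks": 17}
safe = pseudonymize(record, secret_key=b"replace-with-a-real-secret")
print(sorted(safe))
```

Using a keyed hash rather than a plain hash means the same user maps to the same pseudonym across records, while someone without the key cannot recompute the mapping from a list of known IDs.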


Last updated 2 years ago
