Getting started

Machine learning models largely reflect the data used to train them, so you need high-quality training data to build the best models. Unfortunately, creating large, high-quality datasets is labor-intensive and time-consuming. SuperAnnotate simplifies and accelerates dataset creation and model development by letting you create, curate, and label data, fine-tune models, evaluate, red-team, and build trust, all within a single, integrated enterprise platform.

This guide will walk you through the basics of the platform to get you started on building better datasets for your models, be it for computer vision, text, or large language models. On this page, you'll learn more about how to import your data and any existing annotations into the platform. You'll also learn how to explore and manage your dataset at scale, and how to create automation pipelines to streamline your tasks across your systems.

For a deep dive into the details of how to set up data creation and model evaluation tasks for Large Language Models and other Generative AI applications, go to the LLMs and GenAI page.

To learn more about traditional ML tasks for image, video, and text data, go to the Data annotation page.

To set up your account, use our account management guides.

Import

There are a few ways to get data into SuperAnnotate. We recommend using one of our integrations to connect the platform to your data storage; that way, the data stays in your storage and is only rendered for platform users. The available methods are listed below:

  • Direct Upload - You can upload data directly from a local source.
  • Integrations - You can connect external storage for easy and secure access to your data.
  • Attach URLs - You can attach items by URL using a CSV file, which lets you specify exactly which parts of your dataset to add rather than importing everything at once.

Each upload method supports only certain project types, so check which methods are available for your project type before you start.
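For the Attach URLs method, the CSV is what determines which items end up in the project. The sketch below builds such a CSV from a filtered list of files. It assumes the CSV uses `url` and `name` columns and uses hypothetical bucket URLs; check the Attach URLs documentation for the exact columns your project type expects.

```python
import csv
import io


def build_attach_csv(items):
    """Write (name, url) pairs as CSV text.

    Assumption: the attach CSV uses `url` and `name` columns.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["url", "name"])
    writer.writeheader()
    for name, url in items:
        writer.writerow({"url": url, "name": name})
    return buffer.getvalue()


# Hypothetical storage URLs, used for illustration only.
all_items = [
    ("cat_001.jpg", "https://example-bucket.s3.amazonaws.com/images/cat_001.jpg"),
    ("cat_002.jpg", "https://example-bucket.s3.amazonaws.com/images/cat_002.jpg"),
    ("notes.txt", "https://example-bucket.s3.amazonaws.com/images/notes.txt"),
]

# Attach only the images, not everything in the bucket.
selected = [(name, url) for name, url in all_items if name.endswith(".jpg")]
csv_text = build_attach_csv(selected)
print(csv_text)
```

Filtering before writing the CSV is what lets you attach a precise subset of your storage instead of its full contents.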

Pre-annotations

SuperAnnotate is built to support dataset building throughout the entire model development cycle. In the platform, you can create annotations from scratch or import predictions from existing models, known as pre-annotations. Pre-annotations are a great way to enable active learning and model-assisted labeling. Once your model performs well enough, you can shift from manually annotating data to having people review and correct model predictions, which significantly speeds up annotation. Pre-annotations can also be combined with priority scores.
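One common way to turn model predictions into priority scores is least-confidence uncertainty sampling: items the model is least sure about are reviewed first. The sketch below is a generic illustration of that technique, not SuperAnnotate's scoring method. The input shape, the rule of scoring an item by its weakest prediction, and the convention that a higher score means "review sooner" are all assumptions; see the priority scores documentation for how scores are set in the platform.

```python
def least_confidence_priority(predictions):
    """Score items by model uncertainty (higher = review sooner).

    `predictions` maps item name -> list of per-instance confidences in [0, 1].
    Assumption: an item is as uncertain as its least confident instance.
    Items with no predictions get the maximum score of 1.0.
    """
    scores = {}
    for name, confidences in predictions.items():
        if not confidences:
            scores[name] = 1.0
        else:
            scores[name] = round(1.0 - min(confidences), 3)
    return scores


predictions = {
    "a.jpg": [0.95, 0.90],  # confident detections
    "b.jpg": [0.40, 0.99],  # one uncertain detection
    "c.jpg": [],            # model found nothing
}
scores = least_confidence_priority(predictions)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```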

Import annotations

When importing annotations into a project, you must include certain required fields, which depend on the project type and the instance types you're importing. See our import annotation formats for the full requirements.
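As an illustration of preparing pre-annotations for import, the sketch below converts bounding-box predictions into an annotation dictionary and checks it for missing required fields before upload. The structure used here (`metadata.name`, and `instances` with `type`, `className`, and `points` holding `x1`/`y1`/`x2`/`y2`) is a simplified assumption modeled on a vector-project format; the import annotation formats page is the authority on which fields are actually required for your project type.

```python
import json

# Assumption: required point keys per instance type in a simplified format.
REQUIRED_POINT_KEYS = {"bbox": ("x1", "y1", "x2", "y2")}


def to_annotation_json(item_name, detections):
    """Convert (class_name, (x1, y1, x2, y2)) detections into one annotation dict."""
    instances = []
    for class_name, (x1, y1, x2, y2) in detections:
        instances.append({
            "type": "bbox",
            "className": class_name,
            "points": {"x1": x1, "y1": y1, "x2": x2, "y2": y2},
        })
    return {"metadata": {"name": item_name}, "instances": instances}


def missing_fields(annotation):
    """Return human-readable descriptions of missing required fields."""
    problems = []
    if "name" not in annotation.get("metadata", {}):
        problems.append("metadata.name is missing")
    for i, inst in enumerate(annotation.get("instances", [])):
        for key in ("type", "className", "points"):
            if key not in inst:
                problems.append(f"instances[{i}].{key} is missing")
        for key in REQUIRED_POINT_KEYS.get(inst.get("type"), ()):
            if key not in inst.get("points", {}):
                problems.append(f"instances[{i}].points.{key} is missing")
    return problems


annotation = to_annotation_json("cat_001.jpg", [("cat", (10, 20, 110, 140))])
print(json.dumps(annotation, indent=2))
print(missing_fields(annotation))
```

Running the check locally catches malformed predictions before they reach the platform.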

Export annotations

The annotation JSON exported from your project follows the same structure as the import JSON. You can export an entire project or selected folders from it, and you'll receive the exported annotation JSON with all of the corresponding data. For finer control over what you export, use our Python SDK to retrieve annotations for specific items only.
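Because exported annotations share the import structure, they are easy to post-process. The sketch below counts instances per class, optionally restricted to specific items. It assumes the export has already been loaded into a list of per-item dictionaries with `metadata.name` and `instances[].className`; the sample data is made up for illustration.

```python
from collections import Counter


def class_counts(annotations, item_names=None):
    """Count instances per className, optionally only for the given item names."""
    counts = Counter()
    for ann in annotations:
        if item_names is not None and ann["metadata"]["name"] not in item_names:
            continue
        for inst in ann.get("instances", []):
            counts[inst.get("className", "<no class>")] += 1
    return counts


# Made-up sample standing in for loaded export data.
exported = [
    {"metadata": {"name": "a.jpg"}, "instances": [{"className": "cat"}, {"className": "dog"}]},
    {"metadata": {"name": "b.jpg"}, "instances": [{"className": "cat"}]},
]

print(class_counts(exported))             # all items
print(class_counts(exported, {"b.jpg"}))  # only b.jpg
```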

Explore

The platform has a tool called Explore for interacting with your dataset at scale. With it, you can manage the status and assignments of items and filter items in different ways for your quality assurance needs.

SuperAnnotate has a native query language built to give you a flexible and intuitive way of filtering the dataset.

Some of these filters can also be applied using embeddings computed for your items. Embeddings power the Similarity Search and Annotate Similar features, which let you find items either by providing a data sample or by searching with a general text description.
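To show the general idea behind embedding-based search, the sketch below ranks items by cosine similarity between a query embedding and each item's embedding. This illustrates the technique in general, not SuperAnnotate's implementation: the three-dimensional vectors are toy values, whereas real embeddings come from a model and have many more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query, embeddings, k=2):
    """Return the k (name, score) pairs most similar to the query, best first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in embeddings.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings for illustration only.
embeddings = {
    "cat_001.jpg": [0.9, 0.1, 0.0],
    "cat_002.jpg": [0.8, 0.2, 0.1],
    "car_001.jpg": [0.0, 0.1, 0.9],
}
query = [1.0, 0.0, 0.0]  # stands in for the embedding of a sample or a text description
print(most_similar(query, embeddings))
```

The query can come from an image or from a text description, provided both are embedded into the same vector space.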

There are a few different ways to view your dataset.

Subsets

Subsets let you group items in the Explore tab into separate curation sets.

Bulk actions

Actions available on the platform can be performed in bulk, on many items at once.

Orchestrate

Orchestrate is a tool for automating processes in your ML pipeline. You can use it to automate and streamline parts of your project, such as monitoring project creation and updates, to increase your day-to-day efficiency.

You build automations on the pipeline canvas by combining various components into a fully customized pipeline. You can also add custom actions that you build to suit your needs on a case-by-case basis.

Once an automation is set up, you can monitor it through graphs and tables that analyze your pipeline data.

Using Python SDK with SuperAnnotate

When setting up a complete machine learning and data pipeline, it is essential that all systems connect efficiently. For this reason, SuperAnnotate provides an extensive Python SDK that lets you interact with the platform programmatically. You can use it for high-level tasks such as managing projects, people, and data, monitoring individual annotations, leaving comments, and running quality control scripts.
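As an example of the kind of quality control script mentioned above, the sketch below flags items with no instances, instances whose class is not in an allowed set, and invalid bounding boxes. It works on plain annotation dictionaries, for instance ones you have retrieved through the SDK or loaded from an export, and does not call the SDK itself. The annotation structure and the specific checks are assumptions chosen for illustration.

```python
def qc_report(annotations, allowed_classes):
    """Return (item_name, issue) pairs for common annotation problems."""
    issues = []
    for ann in annotations:
        name = ann["metadata"]["name"]
        instances = ann.get("instances", [])
        if not instances:
            issues.append((name, "no instances"))
        for i, inst in enumerate(instances):
            cls = inst.get("className")
            if cls not in allowed_classes:
                issues.append((name, f"instance {i}: unknown class {cls!r}"))
            if inst.get("type") == "bbox":
                p = inst.get("points", {})
                x1, y1, x2, y2 = (p.get(k) for k in ("x1", "y1", "x2", "y2"))
                # A valid box needs all four corners and a positive width and height.
                if None in (x1, y1, x2, y2) or x1 >= x2 or y1 >= y2:
                    issues.append((name, f"instance {i}: invalid bbox {p}"))
    return issues


annotations = [
    {"metadata": {"name": "a.jpg"}, "instances": [
        {"type": "bbox", "className": "cat", "points": {"x1": 0, "y1": 0, "x2": 10, "y2": 10}}]},
    {"metadata": {"name": "b.jpg"}, "instances": []},
    {"metadata": {"name": "c.jpg"}, "instances": [
        {"type": "bbox", "className": "cta", "points": {"x1": 5, "y1": 5, "x2": 5, "y2": 9}}]},
]

for item, issue in qc_report(annotations, {"cat", "dog"}):
    print(item, issue)
```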