Databricks integration

You can use your Databricks integration to upload items into your LLMs and GenAI projects using SQL queries. Before you begin, make sure the Databricks SQL Connector is set up according to its documentation.
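As a rough sketch of how the pieces fit together: the three values the setup steps below collect (server hostname, HTTP path, access token) are the same ones the Databricks SQL Connector itself takes. The hostname, path, and token below are placeholders, not real credentials, and `connection_params` is an illustrative helper, not part of either product:

```python
def connection_params(server_hostname: str, http_path: str, access_token: str) -> dict:
    """Bundle the three values the integration setup collects (Steps 2-4)."""
    return {
        "server_hostname": server_hostname,
        "http_path": http_path,
        "access_token": access_token,
    }

# Placeholder values; the real ones come from your Databricks workspace.
params = connection_params(
    "dbc-a1b2c3d4-e5f6.cloud.databricks.com",  # Server Hostname (Step 2)
    "/sql/1.0/warehouses/abc123def456",        # HTTP path (Step 3)
    "dapi-XXXXXXXXXXXXXXXX",                   # Personal access token (Step 4)
)

# With the databricks-sql-connector package installed and a live workspace,
# the same parameters open a connection:
# from databricks import sql
# with sql.connect(**params) as conn:
#     with conn.cursor() as cur:
#         cur.execute("SELECT 1")
```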

Step 1: Begin integration setup

To set up the integration:

  1. Go to Integrations from your Organization tab or through Team Setup.
  2. Click + New Integration.
  3. Select Databricks.
  4. In the Integration name field, type in a unique name for your integration.
  5. Under Team, you can add the integration to one or multiple teams (optional). The integration will be available in the selected team(s) only.

🚧

User Role Permissions

  • Only Organization Owners can set up an integration from the Organization tab.
  • Team Admins can set up an integration for the team they’re a part of, from the Team Setup tab.

Step 2: Server Hostname

  1. You can enter either of two values into the Server Hostname field. Copy one of the values from the locations indicated below:

    • From Databricks > SQL Warehouses > <TARGET_WAREHOUSE> > Connection details, copy the SQL warehouse server hostname.
    • From Databricks > Compute > <TARGET_CLUSTER> > Configuration > Advanced options > JDBC/ODBC, copy the cluster server hostname.
  2. After copying the required value, paste it into the Server Hostname field.

Step 3: HTTP path

  1. There are two values that you can choose from to enter into the HTTP path field: one from your SQL warehouse, or one from your cluster. The values can be found in the same locations as shown in Step 2: Server Hostname.
  2. After copying the required value, paste it into the HTTP path field.

Step 4: Access token and finalize setup

In this step, you’ll need a personal access token from your Databricks workspace.

  1. In your Databricks workspace, click on your username in the top right corner.
  2. Select User Settings.
  3. Click on Developer.
  4. Next to Access tokens, click Manage.
  5. Click Generate new token. Once it’s created, copy it and save it somewhere safe, as you won’t be able to see it later.
  6. Next, go back to SuperAnnotate’s integration setup page.
  7. Paste the token into the Access token field.
  8. When you’ve completed all the steps, click Create.

❗️

Please note that your integration won’t work if the personal access token expires. The data you've uploaded from the integration will still be accessible in the project.

Validate Integration

To validate your Databricks integration:

  1. In Integrations, find your integration.
  2. Click the three dots.
  3. Select Check connection.

Edit team

If you need to make your integration available for more teams, or you want to revoke a team's access to it, you may edit the permissions accordingly.

To add an integration to, or remove it from, one or multiple teams:

  1. In Integrations, find your integration.
  2. Click the three dots.
  3. Select Edit team.
  4. Add one or multiple teams from the dropdown, or remove a team by clicking the X on their name. To add all teams, choose Select all. To remove all teams, click the X on the right side of the field.
  5. Click Save.

📘

If you remove a Databricks integration from the team, any data you’ve uploaded from it will remain accessible in your projects.

Delete integration

To delete an integration:

  1. In Integrations, find your integration.
  2. Click the three dots.
  3. Select Delete Integration.
  4. In the popup, click Delete.

📘

If you delete a Databricks integration, any data you’ve uploaded from it will remain accessible in your projects.

Add items with Databricks integration

You can add items into your LLMs and GenAI project by selecting the integration upon upload:

  1. In Data, click Add.
  2. Select Upload Items.
  3. Select the Databricks integration you want to upload from.
  4. Type in the SQL query to retrieve data rows from your Databricks database.
  5. Click Run to get the queried table column names.
  6. Under Item name, you must either select a query result from the list, or type in a name prefix:
    • Query result - select one or more column names from your query results; their values become the names of your uploaded items. If there are duplicate item names, only the first item with that name is uploaded.
    • Name prefix - type in a name prefix manually. A randomly generated, 10-character suffix is appended to it automatically.
  7. Select component IDs that exist in your project from the dropdown list, and map them to the corresponding column names from your queried table.
    • In this dropdown, you’ll only see component IDs of the Input, Select, or Media component types that haven't been excluded from export.
    • If the column name matches the component ID exactly (case-insensitive), then those IDs will be automatically selected.
  8. Once you’re done, click Upload.
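To illustrate steps 4-6 above: a query typically selects the columns you plan to map, and the name-prefix option appends a random 10-character suffix to each item name. The table, column names, and `name_with_prefix` helper below are hypothetical, and the platform's actual suffix generator may differ:

```python
import random
import string

# Hypothetical query for step 4: select the columns you plan to map.
QUERY = "SELECT prompt, response, rating FROM my_catalog.my_schema.chat_logs"

def name_with_prefix(prefix: str, length: int = 10) -> str:
    # Mimics the described naming scheme: the typed-in prefix plus a
    # randomly generated 10-character suffix.
    suffix = "".join(random.choices(string.ascii_lowercase + string.digits, k=length))
    return prefix + suffix
```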

After uploading, each row from the mapped columns becomes an item with the defined name, and the values of the mapped columns are uploaded to their corresponding component IDs.
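The case-insensitive auto-selection described in step 7 can be sketched as follows; the column and component names are hypothetical:

```python
def auto_map(columns, component_ids):
    # A column is auto-mapped to a component ID when the names match
    # exactly, ignoring case; unmatched columns stay unmapped.
    lookup = {cid.lower(): cid for cid in component_ids}
    return {col: lookup[col.lower()] for col in columns if col.lower() in lookup}

auto_map(["Prompt", "Rating", "extra_col"], ["prompt", "rating"])
# maps "Prompt" -> "prompt" and "Rating" -> "rating"; "extra_col" is skipped
```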

📘

Any Single- or Multi-select values that don’t exist in the project will be skipped during the upload.
In the case of Select components, the options in your table should be provided as a list of strings, as shown below:

  • ["Partially complete, needs review", "Incomplete"]
  • For Range sliders - [2,5]
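One quick way to sanity-check such cell values before upload is to parse them as JSON lists; this helper is illustrative only and not part of the platform:

```python
import json

def parse_select_value(raw):
    # Select options should be a JSON list of strings, and Range slider
    # values a JSON list of numbers (e.g. [2,5]). Returns the parsed
    # list, or None when the cell isn't a JSON list at all.
    try:
        value = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return None
    return value if isinstance(value, list) else None

parse_select_value('["Incomplete"]')  # -> ["Incomplete"]
parse_select_value('[2,5]')           # -> [2, 5]
parse_select_value('Incomplete')      # -> None (not a JSON list)
```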

🚧

You can only map each component ID to one column at a time. If you try to map a component ID that has already been mapped to another column, then the previous mapping will be removed.