When working with an external Cloud Storage connection (S3, GCS, Azure), keep the following in mind:
- Label Studio doesn’t import the data stored in the bucket or folder, but instead creates references to the objects. Therefore, you must have full access control on the data to be synced and shown on the labeling screen.
- Sync operations with external buckets or folders only go one way. A sync either creates tasks from objects in the folder (Source storage) or pushes annotations to the output folder (Target storage). If you change objects on the cloud side after a sync, the results in Label Studio are not guaranteed to stay consistent with the bucket.
- We recommend using a separate folder for each Label Studio project.
When I click Sync, I don't see my data in my project
Sometimes the sync process doesn’t start immediately, because syncing is handled by an internal job scheduler. If nothing happens after a period of time, follow the steps below.
First, check that you have specified the correct credentials.
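For an S3 source, one way to confirm the credentials outside Label Studio is to list a few objects with the same key pair. The following is a minimal sketch, assuming the third-party `boto3` package is installed; the helper names `list_sample_keys` and `make_s3_client` are hypothetical and are not part of Label Studio.

```python
from typing import Any, List, Optional


def list_sample_keys(client: Any, bucket: str, prefix: str = "", limit: int = 5) -> List[str]:
    """Return up to `limit` object keys under `prefix`.

    `client` is expected to behave like a boto3 S3 client. With wrong
    credentials or missing permissions, boto3 raises an exception here
    instead of returning keys.
    """
    response = client.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=limit)
    # "Contents" is absent from the response when nothing matches the prefix.
    return [item["Key"] for item in response.get("Contents", [])]


def make_s3_client(access_key_id: str, secret_access_key: str, region: Optional[str] = None) -> Any:
    """Build an S3 client from the same key pair entered in the storage settings."""
    import boto3  # third-party dependency, imported lazily (assumed installed)

    return boto3.client(
        "s3",
        aws_access_key_id=access_key_id,
        aws_secret_access_key=secret_access_key,
        region_name=region,
    )
```

Calling `list_sample_keys(make_s3_client(key_id, secret), "my-bucket", "my-project/")` with your own values (the bucket and prefix here are placeholders) should return a few keys. An exception points to a credentials or permissions problem, and an empty list suggests the bucket or prefix is wrong. Listing is only a first check: it does not prove that every permission Label Studio needs is granted.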
Then go to the cloud storage settings page and click Edit next to the cloud connection. From here, you can check the following:
- The File Filter Regex is set and correct. When no filters are specified, all found items are skipped. The filter should be a valid regular expression, not a wildcard (e.g. `.*` is valid, `*.` is not valid).
- Treat every bucket object as a source file should be toggled ON if you work with images, audio, text files, or any other binary content stored in the bucket. This instructs Label Studio to create URI endpoints, store them as the labeling task payload, and resolve them into presigned https URLs when opening the labeling screen. If you store JSON tasks in the Label Studio format in your bucket, turn this toggle OFF.
- Check for rq worker failures. An easy way to check rq workers is to complete an export operation. From the Data Manager, click Export, create a new snapshot, and download the JSON file. If you see an error, most likely your rq workers are having problems. Another way to check rq workers is to log in as a superuser and go to the `/django-rq` page. You should see a `workers` column. If the values are `0` or the column is empty, this can indicate a failure.
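To check a File Filter Regex before saving it, you can try it locally against a few object keys. The sketch below uses Python's standard `re` module and assumes the filter is matched from the start of each object key; that matching behavior is an assumption about Label Studio, and the helper names and sample keys are hypothetical.

```python
import re
from typing import Iterable, List


def is_valid_regex(pattern: str) -> bool:
    """Return True if `pattern` compiles as a regular expression."""
    try:
        re.compile(pattern)
    except re.error:
        # Wildcard-style patterns such as "*." fail to compile.
        return False
    return True


def matching_keys(pattern: str, keys: Iterable[str]) -> List[str]:
    """Return the keys that `pattern` matches from the start of the key."""
    compiled = re.compile(pattern)
    return [key for key in keys if compiled.match(key)]


# Hypothetical object keys, as they might appear under a bucket prefix.
sample_keys = ["images/cat.jpg", "images/dog.png", "images/notes.txt"]

print(is_valid_regex(".*"))  # a regular expression
print(is_valid_regex("*."))  # a wildcard, not a regular expression
print(matching_keys(r".*\.(jpe?g|png)$", sample_keys))
```

If `matching_keys` returns an empty list for keys you expect to import, the filter is the likely reason the sync produces no tasks.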
If none of these steps work, submit a ticket and include the time when you launched the Sync job.