If you are deploying Label Studio Enterprise or Label Studio Starter on-prem and are upgrading from a pre-2.34 release, you will need to run a script to backfill changes to annotator agreement.
What is agreement and how has it changed
When you have multiple annotators working on a task, agreement shows how much overlap there is between their submissions.
Label Studio 2.34 introduces a number of changes and enhancements, including:
- Consensus methodology for scoring
- The ability to configure agreement metrics for each control tag
- The ability to view per-control-tag agreement in the Data Manager
See the Label Studio 2.34 release notes for a complete overview of the changes.
What this script does
With Label Studio 2.34, agreement is backed by a new data model.
Because of this new model, existing annotation data must be reprocessed. This script will migrate all your old data to this new model.
Who needs to run this script
This script only needs to be run once, and is only needed for organizations that are:
- On-prem: Running Label Studio Enterprise or Label Studio Starter
- Upgrading: Upgrading from a pre-2.34 release. New organizations do not need to migrate.
What happens if you do not run this script
If you do not migrate, nothing in your deployment will break.
However, existing projects will show empty or zero per-control-tag agreement scores for all historical tasks and annotators.
Effects while running the script
- Zero Downtime: You do not need to pause labeling operations. The system remains fully accessible to all users.
- Background Resource Usage: The migration is designed to trigger asynchronous, parallel background jobs on a per-project basis. During the execution period, you may observe an increase in background processing load.
Agreement scores will start appearing project by project. You do not need to wait for the entire organization to finish before results are visible.
If a project job fails, you can retry without reprocessing completed projects. Projects that have already been backfilled are automatically skipped on subsequent runs.
How to migrate
You can trigger and monitor the migration programmatically without needing to use the Label Studio UI.
The migration operates on your entire organization simultaneously. See Trigger agreement backfill for organization in our API reference.
You must have Administrator or Owner permissions. You can approach this migration in several ways:
Option 1: Batched rollout (Recommended)
Use this to process a controlled number of projects per call. This is the safest approach for large organizations — it keeps the background job queue from filling up and avoids blocking other async work (storage syncs, ML backend calls, etc.).
Using the Python SDK
from label_studio_sdk import LabelStudio
ls = LabelStudio(
    base_url="https://your-label-studio-instance.com",
    api_key="your-api-key",
)
BATCH_SIZE = 10  # adjust based on your queue capacity and urgency
# Queue up to BATCH_SIZE unprocessed projects in this call
result = ls.dimensions.trigger_backfill(num_projects=BATCH_SIZE)
print(
    f"Queued: {result.jobs_queued}, "
    f"Skipped (already done): {result.projects_skipped}, "
    f"Remaining: {result.projects_remaining}"
)
if result.projects_remaining == 0:
    print("All projects queued or complete.")
Using requests
import requests
BASE_URL = "https://your-label-studio-instance.com"
HEADERS = {"Authorization": "Token your-api-key", "Content-Type": "application/json"}
BATCH_SIZE = 10  # adjust based on your queue capacity and urgency
# Queue up to BATCH_SIZE unprocessed projects in this call
response = requests.post(
    f"{BASE_URL}/api/dimensions/backfill/",
    headers=HEADERS,
    json={"num_projects": BATCH_SIZE},
)
data = response.json()
print(
    f"Queued: {data['jobs_queued']}, "
    f"Skipped (already done): {data['projects_skipped']}, "
    f"Remaining: {data['projects_remaining']}"
)
if data["projects_remaining"] == 0:
    print("All projects queued or complete.")
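The snippets above queue a single batch. To work through a whole organization, the same call can be repeated until `projects_remaining` reaches zero. The following is a minimal sketch rather than an official script: `run_batched_backfill`, `trigger`, `pause_seconds`, and `max_batches` are hypothetical names, and `trigger` is assumed to be a small wrapper of your own that makes the backfill call and returns the response body as a dict with the `jobs_queued` and `projects_remaining` fields shown above.

```python
import time
from typing import Callable, Dict


def run_batched_backfill(
    trigger: Callable[[int], Dict[str, int]],
    batch_size: int = 10,
    pause_seconds: float = 60.0,
    max_batches: int = 1000,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `trigger` until no projects remain; return the total jobs queued.

    `trigger` is a hypothetical wrapper (not part of the SDK) that takes a
    batch size and returns the backfill response as a dict.
    """
    total_queued = 0
    for _ in range(max_batches):
        data = trigger(batch_size)
        total_queued += data["jobs_queued"]
        if data["projects_remaining"] == 0:
            return total_queued
        # Give the background queue time to drain before the next batch.
        sleep(pause_seconds)
    raise RuntimeError("Stopped after max_batches with projects still remaining")
```

With the requests example, `trigger` could be a function that posts `{"num_projects": n}` to the backfill endpoint and returns `response.json()`. The pause length is a guess; choose one that suits your queue capacity.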
Option 2: Single project
Use this when you want to backfill one specific project, for example to validate the migration before rolling it out broadly.
Using the Python SDK
from label_studio_sdk import LabelStudio
ls = LabelStudio(
    base_url="https://your-label-studio-instance.com",
    api_key="your-api-key",
)
result = ls.dimensions.trigger_backfill(project_id=42)
print(result)
# jobs_queued=1, projects_skipped=0, projects_remaining=0, ...
Using requests
import requests
BASE_URL = "https://your-label-studio-instance.com"
HEADERS = {"Authorization": "Token your-api-key", "Content-Type": "application/json"}
response = requests.post(
    f"{BASE_URL}/api/dimensions/backfill/",
    headers=HEADERS,
    json={"project_id": 42},
)
print(response.json())
# {"jobs_queued": 1, "projects_skipped": 0, "projects_remaining": 0, ...}
Option 3: All projects at once
Warning: This cancels all in-flight backfill jobs and immediately enqueues every unprocessed project in your organization. On large instances this can flood the background job queue and delay other async operations (storage syncs, webhooks, ML backend predictions) for an extended period. Use batched mode unless you are intentionally prioritizing the backfill above all other background work.
Using the Python SDK
from label_studio_sdk import LabelStudio
ls = LabelStudio(
    base_url="https://your-label-studio-instance.com",
    api_key="your-api-key",
)
result = ls.dimensions.trigger_backfill(all_projects=True)
print(result)
# jobs_queued=150, projects_skipped=12, projects_remaining=0, ...
Using requests
import requests
BASE_URL = "https://your-label-studio-instance.com"
HEADERS = {"Authorization": "Token your-api-key", "Content-Type": "application/json"}
response = requests.post(
    f"{BASE_URL}/api/dimensions/backfill/",
    headers=HEADERS,
    json={"all_projects": True},
)
print(response.json())
# {"jobs_queued": 150, "projects_skipped": 12, "projects_remaining": 0, ...}
Migration time
Total duration will scale according to the number of tasks, number of annotations per task, and number of control tags in your projects.
Approximate time per project:
| Organization Size | Approximate Duration |
|---|---|
| Small (<10K entities) | < 1 minute |
| Medium (10K-100K entities) | 1-5 minutes |
| Large (100K-1M entities) | 5-10 minutes |
| Very Large (1M+ entities) | 10-60 minutes |
Monitor progress
Check overall organization status
Using the Python SDK
status = ls.dimensions.get_backfill_status()
print(status)
# org_status={"completed": 42, "pending": 8, "failed": 1, ...}
Using requests
response = requests.get(f"{BASE_URL}/api/dimensions/backfill/status/", headers=HEADERS)
print(response.json())
# {"org_status": {"completed": 42, "pending": 8, "failed": 1, ...}}
Check a specific project
Using the Python SDK
status = ls.dimensions.get_backfill_status(project_id=42)
print(status)
# job_id=17, status="COMPLETED", ...
Using requests
response = requests.get(
    f"{BASE_URL}/api/dimensions/backfill/status/",
    headers=HEADERS,
    params={"project_id": 42},
)
print(response.json())
# {"job_id": 17, "status": "COMPLETED", ...}
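When validating a single project, it can help to poll its status until the job finishes. The sketch below is one possible approach under stated assumptions: `wait_for_project_backfill` and `get_status` are hypothetical names, `get_status` is assumed to be your own wrapper that returns the per-project status response as a dict, and treating `COMPLETED` and `FAILED` as the only terminal states is an assumption based on the status values listed in this article.

```python
import time
from typing import Callable, Dict

# Assumption: these are the only states in which a job stops changing.
TERMINAL_STATUSES = {"COMPLETED", "FAILED"}


def wait_for_project_backfill(
    get_status: Callable[[], Dict],
    poll_seconds: float = 10.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll `get_status` until the job is COMPLETED or FAILED; return that status.

    `get_status` is a hypothetical wrapper that returns the status
    response for one project as a dict containing a "status" key.
    """
    waited = 0.0
    while True:
        status = get_status()["status"]
        if status in TERMINAL_STATUSES:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"Backfill still {status} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds
```

The timeout keeps the loop from running indefinitely if a job stays in a non-terminal state, for example after being cancelled.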
List all jobs with filtering
Using the Python SDK
jobs = ls.dimensions.list_backfills(status="FAILED")
# status options: PENDING, QUEUED, RUNNING, COMPLETED, FAILED
for job in jobs.results:
    print(job)
Using requests
response = requests.get(
    f"{BASE_URL}/api/dimensions/backfill/jobs/",
    headers=HEADERS,
    params={"status": "FAILED"},  # PENDING, QUEUED, RUNNING, COMPLETED, FAILED
)
print(response.json())
# {"count": 2, "results": [...]}
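For a quick overview, the job list can be tallied by status. This is a small illustrative helper, not part of the SDK: `summarize_jobs` is a hypothetical name, and it assumes each entry in `results` is a dict with a `status` key, as in the responses shown above. Whether an unfiltered request returns every job in one page is not confirmed here, so large organizations may need to handle pagination.

```python
from collections import Counter
from typing import Dict, Iterable


def summarize_jobs(jobs: Iterable[Dict]) -> Dict[str, int]:
    """Count backfill jobs by their "status" field."""
    return dict(Counter(job["status"] for job in jobs))
```

For example, calling the jobs endpoint without a `status` filter and passing `response.json()["results"]` to this helper would give a count per status.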
Cancel the migration
Cancel all jobs for the organization
Using the Python SDK
result = ls.dimensions.cancel_backfill()
print(result)
# cancelled_count=5, ...
Using requests
response = requests.delete(f"{BASE_URL}/api/dimensions/backfill/", headers=HEADERS)
print(response.json())
# {"cancelled_count": 5, "message": "Successfully cancelled 5 Agreement V2 backfill job(s)"}
Cancel jobs for a specific project
Using the Python SDK
result = ls.dimensions.cancel_backfill(project_id=42)
print(result)
Using requests
response = requests.delete(
    f"{BASE_URL}/api/dimensions/backfill/",
    headers=HEADERS,
    params={"project_id": 42},
)
print(response.json())