
Research · April 14, 2026

Data labelling vs data annotation in the UK

In UK RFPs you will often see data labelling and data annotation used as if they mean the same thing. In day-to-day work they usually do, but for contracts, compliance, and model quality the two terms are not interchangeable. The gap shows up in delivery: vague scopes produce vague boxes on video, inconsistent taxonomies in NLP, and audit trails that do not survive a UK GDPR review.

This note is for ML leads and procurement owners who need one shared vocabulary before they sign a statement of work.

What buyers usually mean

Data labelling tends to describe the output: a class name, a tag, a yes/no flag. It fits image classification, simple moderation, and cataloguing tasks where the unit of work is small and the rules are stable. Data annotation tends to describe the process and the structure around the label: guidelines, tooling, consensus, adjudication, and export format. It fits detection, segmentation, temporal spans in video, NER, and any task where context and edge cases matter.

In practice, a “labelling” SOW that asks for instance masks or track IDs across frames is an annotation programme with a misleading title. The reverse also happens: “annotation” contracts that only require single-tag classification are over-scoped on paper and under-scoped on QA.
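To make the mismatch concrete, here is a minimal Python sketch of a pre-signature check that compares an SOW's title with the outputs it actually requests. The output-type names, and the way they are split into labelling and annotation sets, are hypothetical; a real programme would take them from its own taxonomy.

```python
# Sketch: flag SOWs whose title names one kind of programme while the requested
# outputs imply the other. The output-type names and their split are hypothetical.

# Hypothetical: outputs that are a single tag or flag per item.
LABELLING_OUTPUTS = {"class", "tag", "binary_flag", "rank"}

# Hypothetical: outputs that need geometry, spans, relations, or tracks.
ANNOTATION_OUTPUTS = {
    "bounding_box", "polygon", "instance_mask", "keypoints",
    "track_id", "temporal_span", "ner_span", "relation",
}


def classify_programme(requested_outputs: set[str]) -> str:
    """Return 'annotation' if any structured output is requested, else 'labelling'."""
    unknown = requested_outputs - LABELLING_OUTPUTS - ANNOTATION_OUTPUTS
    if unknown:
        raise ValueError(f"unrecognised output types: {sorted(unknown)}")
    return "annotation" if requested_outputs & ANNOTATION_OUTPUTS else "labelling"


def title_mismatch(sow_title: str, requested_outputs: set[str]) -> bool:
    """True when the SOW title names one programme type but the outputs imply the other."""
    title = sow_title.lower()
    says_labelling = "labelling" in title or "labeling" in title
    says_annotation = "annotation" in title
    if says_labelling == says_annotation:
        return False  # title names both or neither, so there is nothing to compare
    claimed = "labelling" if says_labelling else "annotation"
    return claimed != classify_programme(requested_outputs)


if __name__ == "__main__":
    # A "labelling" SOW that asks for masks and track IDs across frames.
    print(title_mismatch("Video labelling services", {"class", "instance_mask", "track_id"}))  # True
    # An "annotation" SOW that only needs one tag per item.
    print(title_mismatch("Moderation annotation", {"binary_flag"}))  # True
```

The check deliberately treats any structured output as enough to make the whole programme annotation work, since one polygon requirement is usually what drives guidelines, tooling, and QA cost.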

Why the wording matters in the UK

Three local factors make precision worth the extra page in the contract.

Regulatory framing. Annotation often touches personal data: faces, voices, number plates, clinical imagery. UK GDPR expects you to document purpose, retention, subprocessors, and whether work happens in the UK or abroad. Calling everything “labelling” can hide PII-heavy work from legal review until delivery has already started.

Vendor market. The UK market mixes boutique specialists, global platforms with UK entities, and offshore teams reached through intermediaries. When the terms are used interchangeably, it is harder to compare vendors like-for-like on inter-annotator agreement (IAA) targets, turnaround, and where the pixels are actually viewed.

Downstream exports. Training pipelines expect specific schemas: COCO-style JSON, temporal action segments, conversation turns with speaker IDs. If the SOW says “labels” but engineering needs polygons and attributes, you pay twice: once for the wrong deliverable, once…
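The export problem is cheaper to catch at acceptance than in the training pipeline. Below is a minimal sketch, assuming the vendor delivers a COCO-style detection file (images, annotations, and categories; `bbox` as `[x, y, width, height]`; polygons as flat coordinate lists). The `attributes` field and the attribute names are hypothetical project extensions rather than part of the COCO format, and `check_export` is an illustrative helper, not any library's API.

```python
# Minimal acceptance check for a COCO-style detection export (a sketch, not a library API).
# Assumes the standard COCO layout; "attributes" is a hypothetical project-specific extension.


def check_export(export: dict, need_polygons: bool, required_attributes: set[str]) -> list[str]:
    """Return human-readable problems; an empty list means the export passes these checks."""
    problems: list[str] = []
    image_ids = {img["id"] for img in export.get("images", [])}
    category_ids = {cat["id"] for cat in export.get("categories", [])}

    for ann in export.get("annotations", []):
        aid = ann.get("id", "?")
        # Every annotation must point at an image and a category that exist in the file.
        if ann.get("image_id") not in image_ids:
            problems.append(f"annotation {aid}: unknown image_id")
        if ann.get("category_id") not in category_ids:
            problems.append(f"annotation {aid}: unknown category_id")
        # COCO boxes are [x, y, width, height].
        bbox = ann.get("bbox")
        if not (isinstance(bbox, list) and len(bbox) == 4):
            problems.append(f"annotation {aid}: bbox must be [x, y, width, height]")
        if need_polygons:
            # Polygon segmentation: a list of flat [x1, y1, x2, y2, ...] lists, each with >= 3 points.
            seg = ann.get("segmentation")
            ok = isinstance(seg, list) and len(seg) > 0 and all(
                isinstance(poly, list) and len(poly) >= 6 and len(poly) % 2 == 0
                for poly in seg
            )
            if not ok:
                problems.append(f"annotation {aid}: missing or malformed polygon")
        # Hypothetical extension: per-annotation attributes stored as a dict.
        missing = required_attributes - set(ann.get("attributes", {}))
        if missing:
            problems.append(f"annotation {aid}: missing attributes {sorted(missing)}")
    return problems


if __name__ == "__main__":
    # An image-level "labels only" delivery: a class per image, no geometry.
    labels_only = {
        "images": [{"id": 1, "file_name": "frame_0001.jpg"}],
        "categories": [{"id": 3, "name": "car"}],
        "annotations": [{"id": 10, "image_id": 1, "category_id": 3}],
    }
    for problem in check_export(labels_only, need_polygons=True, required_attributes={"occluded"}):
        print(problem)
```

This version only accepts polygon segmentation, so crowd regions encoded as RLE would be flagged; relax that branch if the SOW allows them.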

How to specify work so vendors align

Use the contract to pin down mechanics, not marketing language.

1. Task type: name the annotation modality (classification, bounding box, polygon, keypoints, transcription, preference ranking, etc.) and forbid substitutes without written change control.
2. Guidelines: attach a versioned instruction pack with positive and negative examples, occlusion rules, and class definitions, and reference that version in the SOW.
3. Quality: state an inter-annotator agreement target or an audit-sampling scheme (for example, double annotation on 10% of items, with adjudication when disagreement exceeds a threshold).
4. Throughput: quote throughput per *defined* unit (frame, clip, utterance, document), not per vague “asset,” and tie SLAs to guideline complexity.
5. Delivery: list the export format, manifest fields, and whether you need provenance (annotator tier, timestamp, tool version).

A one-page table in the SOW beats a glossary…
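The quality clause is easier to enforce when both sides can reproduce the audit sample. A minimal sketch, assuming bounding-box work with two annotators per audited item: items are ranked by a salted SHA-256 hash so the 10% double-annotation set can be recomputed by either party, and an item goes to adjudication when the two classes differ or when disagreement, defined here as 1 - IoU, exceeds a threshold. The salt, the 0.3 threshold, the disagreement definition, and the function names are assumptions chosen for illustration, not recommended values.

```python
# Sketch of a reproducible audit sample plus an adjudication rule for box work.
# The salt, the 0.3 threshold, and "disagreement = 1 - IoU" are illustrative assumptions.
import hashlib


def select_for_double_annotation(item_ids: list[str], percent: int = 10, salt: str = "batch-01") -> set[str]:
    """Rank items by a salted SHA-256 hash and return the first ceil(n * percent / 100)."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in 1..100")
    ranked = sorted(item_ids, key=lambda i: hashlib.sha256(f"{salt}:{i}".encode()).hexdigest())
    k = (len(ranked) * percent + 99) // 100  # integer ceiling, avoids float rounding
    return set(ranked[:k])


def iou(a: list[float], b: list[float]) -> float:
    """Intersection over union of two boxes given as [x, y, width, height]."""
    inter_w = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def items_for_adjudication(pairs: dict, max_disagreement: float = 0.3) -> list[str]:
    """Flag items whose two annotations differ in class or have 1 - IoU above the threshold.

    `pairs` maps item_id -> ((class_a, box_a), (class_b, box_b)).
    """
    flagged = []
    for item_id, ((cls_a, box_a), (cls_b, box_b)) in sorted(pairs.items()):
        if cls_a != cls_b or 1.0 - iou(box_a, box_b) > max_disagreement:
            flagged.append(item_id)
    return flagged


if __name__ == "__main__":
    items = [f"clip-{n:04d}" for n in range(200)]
    audit = select_for_double_annotation(items)
    # Illustrative double-annotated results (in practice, keyed by the items in `audit`).
    pairs = {
        "clip-0007": (("car", [10, 10, 50, 30]), ("car", [12, 11, 50, 30])),  # IoU ~0.87
        "clip-0042": (("car", [10, 10, 50, 30]), ("van", [10, 10, 50, 30])),  # class differs
    }
    print(len(audit), items_for_adjudication(pairs))  # 20 ['clip-0042']
```

Ranking by hash rather than drawing a fresh random sample means the buyer and the vendor can each recompute the audit set from the item IDs and the agreed salt, which makes the sampling clause auditable.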

A simple procurement checklist

Question · Labelling-heavy programme · Annotation-heavy programme
Typical output · Tags, classes, ranks · Geometry, spans, relations, tracks

Bottom line
