Analysis
The Analysis view lets you compute inter-annotator agreement (IAA), inspect annotation distributions, compare human and LLM outputs, and create gold standard columns.
Inter-Annotator Agreement
IAA metrics quantify how consistently annotators label the same documents. Polyphony computes agreement per variable for each pair or group of annotators in a workflow.
Supported Metrics
| Metric | Variable types | Notes |
|---|---|---|
| Cohen's Kappa | Categorical, Likert | Pairwise; accounts for chance agreement |
| Fleiss' Kappa | Categorical | Three or more annotators simultaneously |
| Krippendorff's Alpha | All types | Handles missing data; generalises across scales |
| Gwet's AC1/AC2 | Categorical, Ordinal | More stable than Kappa when agreement is very high or low |
| Percentage agreement | All types | Simple but does not correct for chance |
| Pearson / Spearman correlation | Integer, Float, Likert | Pearson measures linear agreement; Spearman measures rank agreement |
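To illustrate how chance correction works in the Kappa family, here is a minimal pure-Python sketch of pairwise Cohen's Kappa. This is an illustration of the formula, not Polyphony's internal implementation:

```python
from collections import Counter

def cohens_kappa(a, b):
    """Pairwise Cohen's Kappa for two annotators labelling the same documents.

    kappa = (p_observed - p_expected) / (1 - p_expected), where p_expected
    is the agreement two annotators would reach by chance given their own
    label frequencies. Undefined when p_expected == 1 (both always agree
    by construction); that edge case is not handled in this sketch.
    """
    assert len(a) == len(b), "annotators must label the same documents"
    n = len(a)
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    freq_a, freq_b = Counter(a), Counter(b)
    p_expected = sum(freq_a[label] * freq_b[label]
                     for label in set(a) | set(b)) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)
```

For example, two annotators who agree on 4 of 5 documents but share a skewed label distribution score below their raw 0.8 percentage agreement, because some of that agreement is attributed to chance.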
Reading the IAA Panel
Select a workflow and a variable to see the agreement matrix. Each cell shows the pairwise metric between two annotators. A summary row shows the overall metric across all annotators.
For chance-corrected metrics (Kappa, Alpha, AC1/AC2), values near 1.0 indicate high agreement, values near 0 indicate chance-level agreement, and negative values indicate systematic disagreement. Percentage agreement and the correlation metrics are not chance-corrected, so interpret them on their own scales.
Annotation Distributions
For categorical variables, the analysis panel shows the frequency of each label per annotator. This helps identify label imbalance or annotator bias before aggregating.
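The per-annotator frequency view amounts to counting labels column by column. A minimal sketch of that computation (annotator names and labels here are hypothetical examples):

```python
from collections import Counter

def label_distribution(annotations):
    """annotations: mapping of annotator -> list of labels over the same
    documents. Returns each annotator's label frequencies, which makes
    label imbalance or annotator bias easy to spot before aggregating."""
    return {annotator: Counter(labels)
            for annotator, labels in annotations.items()}

dist = label_distribution({
    "alice": ["pos", "pos", "neg"],
    "bob":   ["pos", "neg", "neg"],
})
```

Comparing the resulting counters side by side shows, for instance, that one annotator uses a label far more often than the others.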
Human vs. LLM Comparison
If the same variable has both human annotations and LLM outputs, the analysis view shows a comparison tab. Agreement metrics are computed between each human annotator and the LLM, and between all annotators combined and the LLM.
Use this to evaluate whether LLM annotations can substitute for human labels, or to identify documents where the model disagrees with human judgement.
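The two comparisons described above can be sketched as follows, using percentage agreement for brevity (the same structure applies to any pairwise metric; names and tie handling here are simplifying assumptions, not Polyphony's implementation):

```python
from collections import Counter

def percentage_agreement(a, b):
    """Fraction of documents on which two label sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def llm_agreement(human, llm):
    """human: mapping annotator -> labels; llm: labels for the same documents.

    Returns (per-annotator agreement with the LLM, agreement between the
    humans' combined majority label and the LLM). Ties in the majority go
    to the first-seen label in this sketch; a real pipeline would flag them."""
    per_annotator = {name: percentage_agreement(labels, llm)
                     for name, labels in human.items()}
    majority = [Counter(col).most_common(1)[0][0]
                for col in zip(*human.values())]
    return per_annotator, percentage_agreement(majority, llm)
```

Low per-annotator scores paired with a high combined score suggest noisy individual annotators rather than a systematically divergent model.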
Gold Standard Creation
A gold standard column aggregates multiple annotators' responses into a single reference label per document. Go to the Gold Standard tab, select a workflow and variable, and choose an aggregation strategy:
| Strategy | Description |
|---|---|
| Majority vote | The most common label wins; ties can be left blank or flagged |
| Mean | Average of numeric values (Integer, Float, Likert) |
| Median | Median of numeric values |
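The three strategies above reduce to a small per-document aggregation step. A stdlib-only sketch (the tie behaviour mirrors the "left blank" option in the table; this is not Polyphony's internal code):

```python
from collections import Counter
from statistics import mean, median

def aggregate(values, strategy):
    """Aggregate one document's annotator responses into a gold label."""
    if strategy == "majority":
        ranked = Counter(values).most_common()
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            return None  # tie -> left blank, available for adjudication
        return ranked[0][0]
    if strategy == "mean":
        return mean(values)    # numeric variables only
    if strategy == "median":
        return median(values)  # numeric variables only
    raise ValueError(f"unknown strategy: {strategy}")
```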
Confidence Threshold
For majority vote, you can require a minimum number of annotators to agree (e.g. at least 2 out of 3 must choose the same label). Documents that do not meet the threshold are left blank in the gold standard column.
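The threshold check can be sketched as a small variation on majority vote, assuming a `min_agree` parameter like the 2-of-3 example above:

```python
from collections import Counter

def majority_with_threshold(values, min_agree):
    """Return the majority label only if at least `min_agree` annotators
    chose it; otherwise None, leaving the gold column blank for that
    document. Ties at or above the threshold are resolved to the
    first-seen label in this sketch; adjudication handles them properly."""
    label, count = Counter(values).most_common(1)[0]
    return label if count >= min_agree else None
```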
Adjudication
Documents with ties or low confidence are listed in an adjudication panel. You can review them and manually select the correct label before creating the gold standard.
Output
The gold standard is stored as a new metadata column in the corpus under the name you choose. It appears in the corpus table, in exports, and can be used as a feature in downstream modelling.
Exporting Analysis Results
The full annotation matrix (annotator × document × variable) is available via the corpus Export button as an XLSX file. Each annotator's values appear in a separate column named `{variable} ({annotator})`.
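When post-processing the export, the column names can be split back into variable and annotator. A sketch assuming the naming convention stated above (reading the XLSX itself, e.g. with pandas, is left out):

```python
import re

# Matches the export convention "{variable} ({annotator})"; assumes the
# annotator name contains no parentheses.
COLUMN_RE = re.compile(r"^(?P<variable>.+) \((?P<annotator>[^()]+)\)$")

def parse_export_column(name):
    """Return (variable, annotator) for an exported annotation column,
    or None for columns that do not follow the convention (e.g. plain
    metadata columns)."""
    m = COLUMN_RE.match(name)
    return (m.group("variable"), m.group("annotator")) if m else None
```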