Analysis
The Analysis view lets you compute inter-annotator agreement (IAA), inspect annotation distributions, compare human and LLM outputs, and create gold standard columns.
Inter-Annotator Agreement
IAA metrics quantify how consistently annotators label the same documents. Polyphony computes agreement per variable for each pair or group of annotators in a workflow.
Supported Metrics
| Metric | Variable types | Notes |
|---|---|---|
| Cohen's Kappa | Categorical, Likert | Pairwise; accounts for chance agreement |
| Fleiss' Kappa | Categorical | Three or more annotators simultaneously |
| Krippendorff's Alpha | All types | Handles missing data; generalises across scales |
| Gwet's AC1/AC2 | Categorical, Ordinal | More stable than Kappa when agreement is very high or low |
| Percentage agreement | All types | Simple but does not correct for chance |
| Pearson / Spearman correlation | Integer, Float, Likert | Pearson measures linear agreement; Spearman measures rank agreement |
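To illustrate how chance correction works in the Kappa family, here is a minimal pure-Python sketch of pairwise Cohen's Kappa. This is an illustration of the formula, not Polyphony's internal implementation:

```python
from collections import Counter

def cohens_kappa(a, b):
    """Pairwise Cohen's Kappa for two annotators labelling the same documents.

    kappa = (p_observed - p_expected) / (1 - p_expected), where p_expected
    is the agreement two annotators would reach by chance given their own
    label frequencies. Undefined when p_expected == 1 (both always agree
    by construction); that edge case is not handled in this sketch.
    """
    assert len(a) == len(b), "annotators must label the same documents"
    n = len(a)
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    freq_a, freq_b = Counter(a), Counter(b)
    p_expected = sum(freq_a[label] * freq_b[label]
                     for label in set(a) | set(b)) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)
```

For example, two annotators who agree on 4 of 5 documents but share a skewed label distribution score below their raw 0.8 percentage agreement, because some of that agreement is attributed to chance.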
Reading the IAA Panel
Select a workflow and a variable to see the agreement matrix. Each cell shows the pairwise metric between two annotators. A summary row shows the overall metric across all annotators.
For chance-corrected metrics (Kappa, Alpha, AC1/AC2), values near 1.0 indicate high agreement, values near 0 indicate chance-level agreement, and negative values indicate systematic disagreement. Percentage agreement and the correlation metrics are not chance-corrected, so interpret them on their own scales.
Annotation Distributions
For categorical variables, the analysis panel shows the frequency of each label per annotator. This helps identify label imbalance or annotator bias before aggregating.
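The per-annotator frequency view amounts to counting labels column by column. A minimal sketch of that computation (annotator names and labels here are hypothetical examples):

```python
from collections import Counter

def label_distribution(annotations):
    """annotations: mapping of annotator -> list of labels over the same
    documents. Returns each annotator's label frequencies, which makes
    label imbalance or annotator bias easy to spot before aggregating."""
    return {annotator: Counter(labels)
            for annotator, labels in annotations.items()}

dist = label_distribution({
    "alice": ["pos", "pos", "neg"],
    "bob":   ["pos", "neg", "neg"],
})
```

Comparing the resulting counters side by side shows, for instance, that one annotator uses a label far more often than the others.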
Human vs. LLM Comparison
If the same variable has both human annotations and LLM outputs, the analysis view shows a comparison tab. Agreement metrics are computed between each human annotator and the LLM, and between all annotators combined and the LLM.
Use this to evaluate whether LLM annotations can substitute for human labels, or to identify documents where the model disagrees with human judgement.
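The two comparisons described above can be sketched as follows, using percentage agreement for brevity (the same structure applies to any pairwise metric; names and tie handling here are simplifying assumptions, not Polyphony's implementation):

```python
from collections import Counter

def percentage_agreement(a, b):
    """Fraction of documents on which two label sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def llm_agreement(human, llm):
    """human: mapping annotator -> labels; llm: labels for the same documents.

    Returns (per-annotator agreement with the LLM, agreement between the
    humans' combined majority label and the LLM). Ties in the majority go
    to the first-seen label in this sketch; a real pipeline would flag them."""
    per_annotator = {name: percentage_agreement(labels, llm)
                     for name, labels in human.items()}
    majority = [Counter(col).most_common(1)[0][0]
                for col in zip(*human.values())]
    return per_annotator, percentage_agreement(majority, llm)
```

Low per-annotator scores paired with a high combined score suggest noisy individual annotators rather than a systematically divergent model.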
Gold Standard Creation
A gold standard column aggregates multiple annotators' responses into a single reference label per document. Go to the Gold Standard tab, select a workflow and variable, and choose an aggregation strategy:
| Strategy | Description |
|---|---|
| Majority vote | The most common label wins; ties can be left blank or flagged |
| Mean | Average of numeric values (Integer, Float, Likert) |
| Median | Median of numeric values |
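The three strategies above reduce to a small per-document aggregation step. A stdlib-only sketch (the tie behaviour mirrors the "left blank" option in the table; this is not Polyphony's internal code):

```python
from collections import Counter
from statistics import mean, median

def aggregate(values, strategy):
    """Aggregate one document's annotator responses into a gold label."""
    if strategy == "majority":
        ranked = Counter(values).most_common()
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            return None  # tie -> left blank, available for adjudication
        return ranked[0][0]
    if strategy == "mean":
        return mean(values)    # numeric variables only
    if strategy == "median":
        return median(values)  # numeric variables only
    raise ValueError(f"unknown strategy: {strategy}")
```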
Confidence Threshold
For majority vote, you can require a minimum number of annotators to agree (e.g. at least 2 out of 3 must choose the same label). Documents that do not meet the threshold are left blank in the gold standard column.
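The threshold check can be sketched as a small variation on majority vote, assuming a `min_agree` parameter like the 2-of-3 example above:

```python
from collections import Counter

def majority_with_threshold(values, min_agree):
    """Return the majority label only if at least `min_agree` annotators
    chose it; otherwise None, leaving the gold column blank for that
    document. Ties at or above the threshold are resolved to the
    first-seen label in this sketch; adjudication handles them properly."""
    label, count = Counter(values).most_common(1)[0]
    return label if count >= min_agree else None
```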
Adjudication
Documents with ties or low confidence are listed in an adjudication panel. You can review them and manually select the correct label before creating the gold standard.
Output
The gold standard is stored as a new metadata column in the corpus under the name you choose. It appears in the corpus table, in exports, and can be used as a feature in downstream modelling.
Exporting Analysis Results
The full annotation matrix (annotator × document × variable) is available via the corpus Export button as an XLSX file. Each annotator's values appear in a separate column named `{variable} ({annotator})`.
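When post-processing the export, the column names can be split back into variable and annotator. A sketch assuming the naming convention stated above (reading the XLSX itself, e.g. with pandas, is left out):

```python
import re

# Matches the export convention "{variable} ({annotator})"; assumes the
# annotator name contains no parentheses.
COLUMN_RE = re.compile(r"^(?P<variable>.+) \((?P<annotator>[^()]+)\)$")

def parse_export_column(name):
    """Return (variable, annotator) for an exported annotation column,
    or None for columns that do not follow the convention (e.g. plain
    metadata columns)."""
    m = COLUMN_RE.match(name)
    return (m.group("variable"), m.group("annotator")) if m else None
```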