Use Case Examples

These end-to-end examples illustrate common ways researchers use Polyphony. Each example walks through the key steps from corpus upload to analysis.
1. Sentiment Classification with Human Annotators

Goal: Label a dataset of 1 000 product reviews as positive, neutral, or negative with two independent annotators, then compute agreement and create a gold standard.

Steps

  1. Upload corpus — Go to Corpus and upload a CSV with columns review_id, text, and product. The system creates a corpus with the uploaded columns.
  2. Define variable — Go to Variables and create a Single categorical variable named Sentiment with options positive, neutral, and negative.
  3. Build annotation form — Go to Annotation Builder, create a form, add a Document viewer block and an Input component block linked to the Sentiment variable.
  4. Invite annotators — Go to Annotator Management and invite two colleagues.
  5. Create workflow — Go to Human Workflow, create an Overlap workflow covering the whole corpus, assign both annotators, and activate it. Each annotator receives 1 000 tasks.
  6. Annotate — Annotators log in and work through their task queues.
  7. Analyse — Once both annotators are done, go to Analysis to view Cohen's Kappa and other inter-annotator agreement (IAA) metrics. Create a gold standard column using majority vote.
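Polyphony computes these metrics in the Analysis view, but the underlying arithmetic is small enough to sketch. A minimal stand-alone illustration in plain Python, not Polyphony's implementation; the annotator labels and the `None`-for-disagreement convention are made-up examples. Note that with only two annotators, majority vote reduces to keeping labels where both agree:

```python
from collections import Counter

def cohen_kappa(a, b):
    """Cohen's kappa for two annotators' parallel label lists."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n          # observed agreement
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[lab] * cb[lab] for lab in ca) / n ** 2  # chance agreement
    return (p_o - p_e) / (1 - p_e)

def gold(a, b):
    """Keep a label where both annotators agree; None marks
    disagreements that need manual adjudication."""
    return [x if x == y else None for x, y in zip(a, b)]

ann1 = ["positive", "neutral", "negative", "positive", "neutral"]
ann2 = ["positive", "negative", "negative", "positive", "positive"]
print(round(cohen_kappa(ann1, ann2), 3))  # 0.412
print(gold(ann1, ann2))
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is preferred over simple percent agreement for categorical labels.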

2. LLM-Assisted Annotation with Human Review

Goal: Use an LLM to pre-annotate 10 000 news articles for topic, then have a researcher review and correct a sample.

Steps

  1. Upload corpus — Upload a CSV with article_id and body columns.
  2. Create partitions — Split articles into review-sample (500 articles) and llm-only (9 500 articles) using the partitions column in the upload file, or via the Corpus view.
  3. Define variable — Create a Single categorical variable Topic with options politics, sports, technology, entertainment, other.
  4. Create LLM pipeline — Go to LLM Pipeline, create a pipeline for the corpus, add a prompt for the Topic variable, and run it. The LLM annotates all 10 000 articles.
  5. Human review — Create a Human Workflow scoped to the review-sample partition and assign the researcher. After reviewing, compute agreement between the human corrections and LLM outputs in Analysis.
  6. Export — Export the final corpus with LLM outputs as XLSX via the Corpus export button.
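The partitions column from step 2 can be prepared before upload. A sketch using only the standard library; the filenames, row contents, and random seed are hypothetical, while the column name `partitions` and the partition labels come from step 2:

```python
import csv
import random

random.seed(42)  # reproducible sampling

# Hypothetical in-memory corpus; in practice, read your real articles.
rows = [{"article_id": i, "body": f"Article {i} body text"} for i in range(10_000)]

# Randomly pick 500 articles for human review; the rest stay LLM-only.
review_ids = set(random.sample(range(len(rows)), 500))
for row in rows:
    row["partitions"] = "review-sample" if row["article_id"] in review_ids else "llm-only"

with open("articles_partitioned.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["article_id", "body", "partitions"])
    writer.writeheader()
    writer.writerows(rows)
```

A random sample keeps the review set representative of the whole corpus, which matters when the human-vs-LLM agreement score is used to judge LLM quality on the unreviewed 9 500 articles.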

3. Multi-Dimensional Annotation for NLP Training Data

Goal: Collect fine-grained annotations for 500 customer support messages across three dimensions (intent, urgency, and sentiment) and produce training data for a supervised model.

Steps

  1. Upload corpus — Upload a CSV with ticket_id and message columns.
  2. Define variables — Create three variables:
    • Single categorical Intent (billing, technical, shipping, other)
    • Single categorical Urgency (low, medium, high)
    • Likert Sentiment (1–5)
  3. Build annotation form — Create one form with a document viewer and three input components, one for each variable.
  4. Create three-annotator workflow — Go to Human Workflow and create an Overlap workflow with three annotators, so every message is annotated three times.
  5. Analyse — After annotation, go to Analysis to compute Fleiss' Kappa for each variable. Create gold standard columns for variables with sufficient agreement.
  6. Export training data — Export the corpus as XLSX; the gold standard columns appear alongside the original text and metadata.
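As in the earlier examples, the Analysis view handles step 5; for intuition, Fleiss' kappa and three-way majority voting can be sketched in plain Python. The ticket labels below are invented examples, and this is an illustration rather than Polyphony's implementation:

```python
from collections import Counter

def fleiss_kappa(items):
    """Fleiss' kappa for a list of items, each a list of labels
    from the same number of raters."""
    n_raters = len(items[0])
    counts = [Counter(item) for item in items]
    cats = sorted({lab for item in items for lab in item})
    # Mean per-item agreement across rater pairs
    p_bar = sum(
        (sum(c[cat] ** 2 for cat in cats) - n_raters) / (n_raters * (n_raters - 1))
        for c in counts
    ) / len(items)
    # Chance agreement from overall category proportions
    total = len(items) * n_raters
    p_e = sum((sum(c[cat] for c in counts) / total) ** 2 for cat in cats)
    return (p_bar - p_e) / (1 - p_e)

def majority_vote(item):
    """Strict majority label, or None when no label wins outright."""
    lab, count = Counter(item).most_common(1)[0]
    return lab if count * 2 > len(item) else None

tickets = [
    ["billing", "billing", "billing"],
    ["technical", "technical", "shipping"],
    ["billing", "billing", "technical"],
    ["shipping", "shipping", "shipping"],
]
print(round(fleiss_kappa(tickets), 3))      # 0.489
print([majority_vote(t) for t in tickets])
```

Fleiss' kappa generalises Cohen's kappa to more than two raters, and with three annotators a strict majority (2 of 3) always resolves unless all three disagree, which is why the three-annotator Overlap design is a common choice for training-data collection.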