LLM Pipeline

The LLM Pipeline view lets you configure prompts for model-based annotation and run them across your corpus at scale.

Overview

An LLM pipeline consists of one or more prompts. Each prompt:

  • Is tied to one variable (the dimension being annotated)
  • Uses a prompt template with a {{document}} placeholder that is replaced by the document text at inference time
  • Is run against a specific model using your configured API key

When a pipeline runs, the model's response for each document is stored as an LLM output — the machine equivalent of a human annotation task value.
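Conceptually, an LLM output mirrors a human annotation task value. A minimal sketch of such a record (field names are illustrative, not Polyphony's actual schema):

```python
from dataclasses import dataclass

@dataclass
class LLMOutput:
    """Machine-produced counterpart of a human annotation task value."""
    document_id: int
    variable: str   # the annotated dimension, e.g. "sentiment"
    model: str      # e.g. "gpt-4o"
    value: str      # the model's raw response for this document

out = LLMOutput(document_id=1, variable="sentiment",
                model="gpt-4o", value="positive")
```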

Creating a Pipeline

  1. Go to LLM Pipeline in the project sidebar.
  2. Click New pipeline and choose a name and scope (corpus or partition).
  3. Inside the pipeline, click Add prompt.
  4. Select the variable the prompt should annotate.
  5. Write a prompt template. Use {{document}} to insert the document text, and describe the expected output format clearly.
  6. Select the model to use (e.g. claude-3-5-sonnet, gpt-4o).
  7. Add your LLM API key in Settings if not already configured.
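Taken together, the steps above amount to a pipeline definition like the following (a hypothetical representation for illustration; the real configuration lives in the UI):

```python
# Hypothetical shape of a pipeline definition; key names are illustrative.
pipeline = {
    "name": "review-sentiment",
    "scope": "corpus",  # or "partition"
    "prompts": [
        {
            "variable": "sentiment",         # the variable this prompt annotates
            "template": "Classify the sentiment.\n\nReview: {{document}}",
            "model": "claude-3-5-sonnet",    # runs with your configured API key
        }
    ],
}
```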

Prompt Templates

A good prompt template specifies:

  • What the model should do (e.g. Classify the sentiment of the following review)
  • The output format (e.g. Reply with exactly one word: positive, neutral, or negative)
  • The document placeholder: {{document}}

Example:

Classify the sentiment of the following customer review.
Reply with exactly one word: positive, neutral, or negative.

Review: {{document}}
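At inference time the {{document}} placeholder is replaced with each document's text. A minimal sketch of that substitution (the function name is illustrative):

```python
def render_prompt(template: str, document: str) -> str:
    """Fill the {{document}} placeholder with the document text."""
    return template.replace("{{document}}", document)

template = (
    "Classify the sentiment of the following customer review.\n"
    "Reply with exactly one word: positive, neutral, or negative.\n\n"
    "Review: {{document}}"
)
prompt = render_prompt(template, "Great product, fast shipping!")
```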

Trying Out a Prompt

Before running a full batch, use the Tryout button to send a single document through the prompt and inspect the raw model response. This lets you refine the template without consuming quota for the entire corpus.
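A tryout boils down to rendering the template for one document and returning the raw response. A sketch of that flow, with `call_model` as a stand-in for the real provider request:

```python
def call_model(model: str, prompt: str) -> str:
    """Stand-in for a real provider call (e.g. Anthropic or OpenAI)."""
    return "positive"  # a real call would return the model's raw text

def tryout(template: str, document: str, model: str) -> str:
    """Send a single document through the prompt; return the raw response."""
    prompt = template.replace("{{document}}", document)
    return call_model(model, prompt)

response = tryout("Review: {{document}}", "Love it!", "claude-3-5-sonnet")
```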

Running a Pipeline

Click Run on a pipeline to process all documents in its scope. Two batch modes are available:

  • Sequential: documents are sent one by one; slower but predictable
  • Native batch: documents are sent in parallel using the provider's batch API; faster for large datasets

Progress is shown in real time, and a run can be interrupted and resumed.
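The difference between the two modes can be sketched as follows: the sequential variant loops over documents one at a time, while the parallel variant approximates a provider batch API with a thread pool (`call_model` is a stand-in for the real request):

```python
from concurrent.futures import ThreadPoolExecutor

def call_model(prompt: str) -> str:
    """Stand-in for a real provider request."""
    return prompt.upper()

def run_sequential(prompts: list[str]) -> list[str]:
    # One request at a time: slower, but ordering and errors are predictable.
    return [call_model(p) for p in prompts]

def run_parallel(prompts: list[str], workers: int = 8) -> list[str]:
    # Many requests in flight at once, as a provider batch API would allow.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(call_model, prompts))
```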

Viewing Results

LLM outputs appear in the corpus table as additional columns. They are also included in the Analysis view for comparison with human annotations.

Export the corpus to XLSX to get all LLM output columns alongside the document data.

Pipeline Status

A pipeline is considered complete when every document in its scope has a stored output for every prompt. Adding rows to the corpus reverts the pipeline status to in progress and triggers a warning before the upload is confirmed.
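The completeness rule can be stated as a check over (document, prompt) pairs. A sketch of that logic (a hypothetical helper, not Polyphony's internal code):

```python
def pipeline_complete(doc_ids, prompt_ids, outputs) -> bool:
    """True when every document has a stored output for every prompt.

    `outputs` is a set of (doc_id, prompt_id) pairs that already have
    a stored result; adding a new document leaves some pairs missing,
    so the pipeline drops back to "in progress".
    """
    return all((d, p) in outputs for d in doc_ids for p in prompt_ids)

outputs = {(1, "sentiment"), (2, "sentiment")}
```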

API Credentials

LLM API keys are managed per user in Settings. Polyphony supports Anthropic, OpenAI, and compatible providers. Keys are encrypted at rest.