Golden Example — Scorecard Design (Agent Process)
The Benchmark
Status: AWAITING FIRST DEPLOYMENT
The golden example for an AI-assisted scorecard workflow will be drawn from the first completed client deployment where AI tools were used in the design process and the output passed full QC. This file will be updated with the specific workflow, prompts, and output analysis once that deployment exists.
What the Golden Example Will Demonstrate
When the first qualifying deployment is complete, this file will document:
Workflow Benchmark
- Which steps in the scorecard design process the AI assisted with
- The specific prompts used and why they were structured that way
- The quality of AI-generated output versus what required human revision
- Where the AI accelerated the process and where it slowed it down
- The full prompt chain from input to deliverable
Output Quality Benchmark
- AI-generated behavior-based questions — quality, relevance, and what required editing
- AI-structured scorecard template — format accuracy, section completeness
- AI-generated debrief summary — what it captured, what it missed
- AI-generated candidate write-up — how it handled strengths, risks, and areas for exploration
What to Study
- How the practitioner reviewed and edited AI output before it became the deliverable
- Which AI outputs were usable as-is versus which required significant revision
- How the prompt was structured to produce role-specific output rather than generic competency language
- How context was provided to the AI (position profile, must-haves, organization context) to improve output quality
Interim Agent Workflow Specifications
Until the golden example exists, use these specifications for AI-assisted scorecard production:
Where AI Can Assist
Question generation from focus areas. Given a focus area description and the role context, AI can draft a set of behavior-based questions. The practitioner reviews, edits, and selects. AI generates options; the practitioner chooses.
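As a concrete illustration, a minimal Python sketch of that generation step. `call_llm` is a placeholder for whichever model client the practitioner uses, and the prompt wording is illustrative rather than a prescribed template:

```python
def call_llm(prompt: str) -> str:
    """Stand-in for the practitioner's actual model client; replace before use."""
    raise NotImplementedError

def build_question_prompt(focus_area: str, role_context: str, n_options: int = 8) -> str:
    """Assemble a generation prompt that asks for options, not final answers."""
    return (
        f"Role context:\n{role_context}\n\n"
        f"Focus area:\n{focus_area}\n\n"
        f"Generate {n_options} behavior-based interview questions for this focus area. "
        "Every question must ask about a specific past experience. "
        "Do not use generic competency language. Do not include hypothetical questions."
    )

def generate_question_options(focus_area: str, role_context: str) -> list[str]:
    """Return draft questions; the practitioner reviews, edits, and selects from these."""
    response = call_llm(build_question_prompt(focus_area, role_context))
    return [line.strip() for line in response.splitlines() if line.strip()]
```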
Scorecard template structuring. Given extraction notes and focus area assignments, AI can produce a formatted scorecard draft. The practitioner validates structure, content, and completeness.
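One plausible shape for that structured draft, sketched as Python dataclasses. The field names are illustrative, the example scale comment assumes a five-level anchored scale, and the completeness check stands in for the practitioner's validation pass:

```python
from dataclasses import dataclass, field

@dataclass
class FocusArea:
    name: str
    description: str
    questions: list[str]       # practitioner-approved behavior-based questions
    assigned_interviewer: str  # assignment remains practitioner judgment, not AI

@dataclass
class Scorecard:
    role: str
    scale_levels: dict[int, str]  # e.g. {1: "...", ..., 5: "..."} with behavioral anchors
    focus_areas: list[FocusArea] = field(default_factory=list)

    def is_complete(self) -> bool:
        """Check the practitioner runs before the draft becomes a deliverable."""
        return bool(self.focus_areas) and all(
            fa.questions and fa.assigned_interviewer for fa in self.focus_areas
        )
```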
Completed scorecard analysis. After interviewers submit scorecards, AI can aggregate scores, identify areas of agreement and disagreement, and produce a pre-debrief summary for the facilitator.
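A sketch of that aggregation step, assuming submitted scores arrive as a mapping from interviewer to per-focus-area integer scores. The one-point disagreement threshold is an illustrative starting value, not a calibrated one:

```python
from statistics import mean, pstdev

def pre_debrief_summary(scores: dict[str, dict[str, int]],
                        disagreement_threshold: float = 1.0) -> dict[str, dict]:
    """Per focus area: average score, spread, and a disagreement flag for the facilitator."""
    by_area: dict[str, list[int]] = {}
    for interviewer_scores in scores.values():
        for area, score in interviewer_scores.items():
            by_area.setdefault(area, []).append(score)
    return {
        area: {
            "mean": round(mean(vals), 2),
            "spread": round(pstdev(vals), 2),
            "disagreement": pstdev(vals) >= disagreement_threshold,
        }
        for area, vals in by_area.items()
    }
```

Areas flagged here are candidates for the facilitator to probe in the debrief; the summary informs the conversation, it does not decide it.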
Debrief transcript summary. After the debrief is recorded and transcribed, AI can produce a structured summary capturing key discussion points, areas of consensus, areas of disagreement, and the decision outcome.
Candidate write-up from interview transcript. After an interview is recorded and transcribed, AI can produce a structured write-up using the practitioner's template format.
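A sketch of the prompt for that write-up, assuming the practitioner's template uses strengths, risks, and areas-for-exploration sections; the section names here are placeholders for the actual template format:

```python
# Section headings are placeholders; substitute the practitioner's actual template.
WRITEUP_SECTIONS = ["Strengths", "Risks", "Areas for Exploration"]

def build_writeup_prompt(transcript: str, role_context: str) -> str:
    """Constrain the model to the template format and to transcript-grounded claims."""
    headings = "\n".join(f"## {s}" for s in WRITEUP_SECTIONS)
    return (
        f"Role context:\n{role_context}\n\n"
        f"Interview transcript:\n{transcript}\n\n"
        "Produce a candidate write-up using exactly these section headings:\n"
        f"{headings}\n\n"
        "Support every claim with a specific statement from the transcript. "
        "Do not speculate beyond what the candidate said."
    )
```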
Where AI Cannot Replace the Practitioner
Focus area selection. Determining what to evaluate requires understanding the organization, the role's context, the hiring authority's priorities, and the political dynamics of the search. AI does not have this context and cannot reliably determine what matters.
Scoring scale design. The choice of scale and the calibration of levels requires practitioner judgment about the evaluation's purpose and the interview team's sophistication.
Focus area assignments. Mapping focus areas to interviewers requires knowledge of each interviewer's expertise, role in the organization, and relationship to the search. AI does not have this context.
Debrief facilitation. The debrief is a live, facilitated discussion that requires reading the room, challenging vague assessments, and navigating interpersonal dynamics. AI produces the summary; the practitioner facilitates the conversation.
Final quality review. Every AI-generated output must be reviewed by the practitioner before it becomes a deliverable. AI produces drafts. The practitioner produces deliverables.
Prompt Design Principles
When using AI for scorecard production, follow these principles (a sketch applying the generation/evaluation split appears after the list):
- Provide full role context. Include the position profile, must-haves, organization description, and any relevant organizational dynamics. The more context, the more specific the output.
- Specify the output format. If you want behavior-based questions in STAR format, say so. If you want a scoring scale with 5 levels and behavioral anchors, specify the structure. AI performs better with format constraints than with open-ended requests.
- Request options, not answers. "Generate 8 behavior-based questions for the strategic vision focus area" is better than "Write the questions for the scorecard." The practitioner selects and edits from options.
- Separate generation from evaluation. Generate the content in one step. Evaluate and edit in a separate step. Do not try to generate and finalize simultaneously.
- Include negative constraints. "Do not use generic competency language. Do not include hypothetical questions. Every question must ask about a specific past experience." Constraints improve output quality significantly.
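One way to operationalize the generation/evaluation split, reusing the `call_llm` placeholder from the earlier sketch. The critique pass here is a model-assisted first filter; the practitioner still makes the final edits and selections:

```python
def call_llm(prompt: str) -> str:
    """Stand-in for the practitioner's actual model client; replace before use."""
    raise NotImplementedError

def generate_then_critique(generation_prompt: str) -> tuple[str, str]:
    """Two separate passes: generate options first, evaluate them second."""
    # Step 1: generation only -- request options, nothing final.
    draft = call_llm(generation_prompt)
    # Step 2: a separate evaluation pass with its own constraints.
    critique = call_llm(
        "Review the following draft interview questions. Flag any that are "
        "hypothetical, generic, or not anchored in a specific past experience. "
        "List problems only; do not rewrite the questions.\n\n" + draft
    )
    # The practitioner, not the model, makes the final edits and selections.
    return draft, critique
```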
What the Golden Example Does NOT Provide
Even after the golden example is established:
Prompts for your scorecard. The golden example's prompts will be tailored to a specific role, organization, and AI tool. Yours must be tailored to the current search context.
AI tool selection. The golden example documents which tool was used, but the tool choice depends on the practitioner's preference and the engagement's requirements.
Workflow sequence for your build. The golden example shows one path. Your workflow depends on which steps benefit from AI assistance for the current search and which don't.