Process Agent — AI-Assisted Scorecard Workflow
What This File Covers
This file defines where and how AI tools can be used in the scorecard design and evaluation process. It is a supplement to the consultant process — not a replacement. Every AI-generated output requires practitioner review before it becomes a deliverable.
The agent process has two phases:
- Design phase — AI assists in building the scorecard (generating questions, structuring templates, developing focus area descriptions)
- Evaluation phase — AI assists in processing completed scorecards (aggregating scores, summarizing debriefs, producing candidate write-ups)
Design Phase — Building the Scorecard
Generating Behavior-Based Questions
When to use AI: After focus areas have been defined and described by the practitioner. AI generates candidate questions for the practitioner to review and select from.
What to provide:
- The focus area name and description
- The role title and level
- The organization context (industry, size, mission)
- The must-have requirements this focus area evaluates
- The question framework the practitioner uses (STAR, behavioral, situational)
- Any negative constraints ("no hypothetical questions," "no questions that could elicit protected-class information")
Expected output: 6-10 candidate questions per focus area. The practitioner selects 3-5 and may edit them.
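The inputs above can be composed into a single generation prompt. A minimal sketch, assuming nothing about the practitioner's actual tooling — the function name, field labels, and default framework wording are illustrative, not kit standards:

```python
# Illustrative sketch: assemble the "what to provide" inputs into one
# generation prompt. Field names and wording are assumptions, not part
# of the kit; the practitioner's own prompt conventions govern.

def build_question_prompt(
    focus_area: str,
    description: str,
    role: str,
    org_context: str,
    must_haves: list,
    framework: str = "behavior-based (STAR)",
    constraints: list = None,
    n_questions: int = 8,  # within the expected 6-10 range
) -> str:
    must_have_lines = "\n".join(f"- {m}" for m in must_haves)
    constraint_lines = "\n".join(f"- {c}" for c in (constraints or []))
    return (
        f"Generate {n_questions} {framework} interview questions.\n"
        f"Focus area: {focus_area}\n"
        f"Description: {description}\n"
        f"Role: {role}\n"
        f"Organization: {org_context}\n"
        f"Must-have requirements this focus area evaluates:\n{must_have_lines}\n"
        f"Constraints:\n{constraint_lines}\n"
        "Ask only about past experience, never hypothetical intent."
    )
```

Whatever the exact wording, the negative constraints belong in the prompt itself; relying on the quality check alone to catch hypothetical framing wastes review cycles.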
Quality check: Every question must be reviewed by the practitioner for:
- Relevance to the focus area
- Appropriate difficulty for the role level
- Behavior-based structure (asks about past experience, not hypothetical intent)
- No protected-class exposure
- Alignment with the practitioner's voice and interview style
Common AI failure modes:
- Generating questions that sound sophisticated but evaluate the wrong competency
- Defaulting to hypothetical framing ("What would you do if...") when behavior-based was specified
- Producing questions that are too generic — applicable to any role, not specific to this one
- Including questions that inadvertently surface protected-class information (family status, age, disability, religion)
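The second failure mode — hypothetical framing slipping in despite instructions — lends itself to a mechanical screen before practitioner review. A rough sketch; the marker phrases are an assumption and catch only common patterns, so this supplements, never replaces, human review:

```python
# Rough screen for hypothetical framing in AI-generated questions.
# The phrase list is illustrative and incomplete; a pass here does not
# mean the question is behavior-based, only that no common marker fired.

HYPOTHETICAL_MARKERS = (
    "what would you do",
    "how would you handle",
    "imagine that",
    "suppose you",
    "if you were",
)

def flag_hypothetical(questions: list) -> list:
    """Return the subset of questions that appear hypothetically framed."""
    flagged = []
    for q in questions:
        lowered = q.lower()
        if any(marker in lowered for marker in HYPOTHETICAL_MARKERS):
            flagged.append(q)
    return flagged
```

For example, given "Tell me about a time you led a turnaround" and "What would you do if a major donor withdrew?", only the second is flagged. The other failure modes (wrong competency, genericness, protected-class exposure) require judgment and stay with the practitioner.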
Structuring the Scorecard Template
When to use AI: After all design decisions have been made (focus areas, assignments, scoring scale, questions). AI assembles the document.
What to provide:
- Complete focus area list with descriptions, assignments, and questions
- Scoring scale with level definitions
- Recommendation framework
- Submission instructions
- Header information (role, organization, version)
- The target format (Word, Google Doc, PDF, etc.)
Expected output: A formatted scorecard document ready for practitioner review.
Quality check: Verify all content is present, correctly assigned, and properly formatted. Run Gate 2 QC from 04-quality.md.
Developing Focus Area Descriptions
When to use AI: After focus areas have been named and the practitioner has confirmed what each one covers. AI drafts the "what good looks like" and "what risk looks like" descriptions.
What to provide:
- The focus area name
- The practitioner's verbal description from the extraction
- The role context and must-have requirements
- Examples of the depth and tone desired (from the golden example, once available)
Expected output: Draft descriptions for practitioner review.
Quality check: Descriptions must be specific to the role, not generic competency language. "Demonstrates strategic thinking" is not a description. "Articulates a coherent organizational strategy informed by market dynamics, financial constraints, and stakeholder priorities" is a description.
Evaluation Phase — Processing Completed Scorecards
Pre-Debrief Score Aggregation
When to use AI: After all interviewers have submitted scorecards and before the debrief. AI aggregates the scores and produces a facilitator summary.
What to provide:
- All completed scorecards for a single candidate
- The scoring scale definitions
- The recommendation framework
- Instructions on attribution (typically: aggregate without attributing scores to specific interviewers)
Expected output: A summary showing:
- Overall recommendation breakdown (e.g., 3 Strong Yes, 2 Yes, 1 No, 2 Strong No)
- Average scores per focus area across all interviewers
- Areas of strong agreement (most interviewers scored similarly)
- Areas of disagreement (significant score variance)
- Key themes from justification text (strengths cited by multiple interviewers, risks cited by multiple interviewers)
Quality check: The facilitator reviews the summary for accuracy. Verify that the aggregation doesn't flatten important nuance — two interviewers giving a 3 for different reasons is not the same as agreement.
Critical rule: The summary goes to the facilitator only. It is not shared with the interview team before the debrief. It is a facilitation tool, not a pre-debrief anchor.
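The aggregation step can be sketched mechanically. The data shapes and the disagreement threshold below are assumptions for illustration; the practitioner's actual scoring scale and recommendation framework govern, and scores are aggregated without attribution, per the instructions above:

```python
# Sketch of pre-debrief aggregation: recommendation breakdown, per-focus-area
# averages, and disagreement flags. Input shape and the standard-deviation
# threshold are illustrative assumptions, not a kit standard.
from collections import Counter
from statistics import mean, pstdev

def aggregate(scorecards: list, disagreement_sd: float = 1.0) -> dict:
    """scorecards: [{"recommendation": str, "scores": {focus_area: int}}, ...]
    Results carry no interviewer attribution."""
    recs = Counter(card["recommendation"] for card in scorecards)
    by_area = {}
    for card in scorecards:
        for area, score in card["scores"].items():
            by_area.setdefault(area, []).append(score)
    averages = {area: round(mean(s), 2) for area, s in by_area.items()}
    disagreement = [
        area for area, s in by_area.items()
        if len(s) > 1 and pstdev(s) >= disagreement_sd
    ]
    return {
        "recommendations": dict(recs),
        "averages": averages,
        "disagreement": disagreement,
    }
```

Note what this sketch cannot do: a focus area with a tight average and no disagreement flag may still hide interviewers scoring a 3 for different reasons. The variance flag locates numeric spread; only the facilitator's read of the justification text locates divergent reasoning.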
Debrief Transcript Summary
When to use AI: After the debrief is recorded and transcribed. AI produces a structured summary, either for distribution to decision makers who were not in the debrief or as a record of the discussion.
What to provide:
- The full debrief transcript
- The candidate's name and role
- The scorecard summary (from the aggregation step)
- The output format (what the summary needs to include)
Expected output: A structured summary containing:
- Candidate name and role
- Overall recommendation outcome
- Breakdown of recommendations (without attribution, unless the practitioner's methodology calls for it)
- Key strengths identified during discussion (with supporting evidence cited by interviewers)
- Key concerns identified during discussion (with supporting evidence)
- Areas recommended for further exploration in subsequent interviews
- Decision: advance, hold, or release
Quality check: The practitioner reviews the summary for:
- Accuracy — does it capture what was actually discussed, not what the AI inferred?
- Completeness — are key discussion points represented?
- Tone — is it neutral and factual, not editorialized?
- Attribution — does it follow the practitioner's rules on when names are attached to assessments?
Candidate Write-Up from Interview Transcript
When to use AI: After an interview is recorded and transcribed. AI produces a structured write-up for the client using the practitioner's template format.
What to provide:
- The full interview transcript
- The practitioner's write-up template (sections, tone, what to include)
- The position profile (so the write-up can connect candidate experience to role requirements)
- The practitioner's rubric for the interview
- Instructions on tone (particularly: the write-up is not selling the candidate)
Expected output: A structured write-up containing:
- Candidate summary (professional background, relevant experience)
- Strengths identified during the interview (with specific evidence)
- Areas for deeper exploration (not weaknesses — areas where more information is needed)
- Practitioner's recommendation (with rationale)
Quality check: The practitioner reviews for:
- Accuracy of facts (did the candidate actually say what the write-up claims?)
- Tone (is it neutral and evidence-based, not advocacy?)
- Completeness (are key discussion points from the interview represented?)
- Alignment with the practitioner's actual assessment (does the write-up match what the practitioner observed?)
Critical distinction: The write-up presents the candidate fairly. It does not sell them. The practitioner works on behalf of the client, not the candidate. This is a fundamental difference from executive search firms where the recruiter's compensation increases with the candidate's salary. The write-up should help the client make an informed decision — not persuade them to advance a candidate.
AI Tool Selection
This kit does not prescribe a specific AI tool. The practitioner uses whichever tool they are comfortable with, provided it produces output of acceptable quality. Common options include Claude, ChatGPT, and Gemini.
Tool selection considerations:
- Context window size (longer transcripts require larger context windows)
- Output quality for structured documents
- Ability to follow format constraints
- Privacy and confidentiality (candidate information is sensitive — ensure the tool's data handling policies are acceptable for the engagement)
What AI Does Not Do
Make evaluation decisions. AI can aggregate scores but cannot recommend whether a candidate should advance.
Replace the practitioner's judgment on focus areas. AI can generate focus area options, but determining what matters for a specific role in a specific organization requires human judgment informed by organizational context.
Facilitate the debrief. The debrief is a live conversation requiring interpersonal skill, the ability to read the room, and the judgment to challenge vague assessments. AI produces the inputs; the practitioner runs the discussion.
Guarantee defensibility. AI can help produce consistent documentation, but defensibility depends on the process being followed consistently by humans. A well-documented scorecard that wasn't used consistently is not defensible.
Serve as the final quality gate. Every AI output is a draft. The practitioner converts drafts into deliverables through review, editing, and professional judgment. Gate 2 QC is always performed by a human.