Quality — Scorecard QC Checklists
Gate 1: Binary — all items must pass before building starts. No scoring. No partial credit. A single failure stops the build.
Gate 2: Weighted — 100 points total. Pass threshold: 90/100. Run after every build and after every revision.
Both gates must pass before any scorecard is deployed to an interview team.
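The two-gate logic can be sketched in a few lines. This is an illustrative sketch, not part of the checklist itself; the function names are made up, and the 90-point threshold comes from Gate 2 above.

```python
# Gate 1 is all-or-nothing; Gate 2 is a weighted score against a threshold.
def gate1_pass(items: list[bool]) -> bool:
    return all(items)  # a single failure stops the build

def gate2_pass(score: int) -> bool:
    return score >= 90  # 100 points total, 90 to pass

def deployable(gate1_items: list[bool], gate2_score: int) -> bool:
    # Both gates must pass before deployment to an interview team.
    return gate1_pass(gate1_items) and gate2_pass(gate2_score)
```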
Gate 1 — Pre-Build (Gap Protocol)
Run this before opening the build skill. Any failure stops the build. Produce a gap report and resolve before proceeding.
Routing Check — Run This First
Is an extraction interview available for this practitioner's scorecard methodology?
- Yes — interview transcript or detailed session notes exist → proceed through Gate 1 with the transcript as primary source
- No — only a prior scorecard template or no source material exists → stop here. Do not run Gate 1 yet. Go to 06-consultant-methodology.md and conduct the extraction interview first.
If the advisor confirms that a template-only build is intentional (e.g., producing a working draft for practitioner review), document that decision, label the output clearly as a working draft, and proceed with the understanding that significant gaps will exist and the scorecard is not deployment-ready.
Upstream Inputs
- [ ] Position profile or job description is available for this role
- [ ] Must-have and nice-to-have requirements have been confirmed
- [ ] Organization's mission, vision, and values are documented or captured
- [ ] Interview team composition is known (at minimum, roles; ideally, names)
Extraction Coverage
- [ ] Extraction interview transcript or session notes are available
- [ ] Practitioner's scoring methodology has been captured (scale, definitions, justification requirement)
- [ ] Focus area determination approach has been documented
- [ ] Behavior-based question approach has been confirmed (framework, bank vs. custom, number per area)
- [ ] Interviewer preparation methodology has been captured
Focus Area Design
- [ ] Focus areas have been identified for this role
- [ ] Each focus area has a description of what "good" looks like
- [ ] Focus areas trace back to must-have requirements or confirmed competency domains
- [ ] No orphaned focus areas (every focus area maps to a role requirement)
- [ ] No orphaned requirements (every critical must-have is covered by at least one focus area)
- [ ] Focus areas have been assigned to specific interviewers (or roles, if names not yet confirmed)
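The two orphan checks above amount to a bidirectional trace between focus areas and must-haves. A minimal sketch, assuming both are kept as simple collections (all role and requirement names here are illustrative):

```python
# Must-have requirements from the position profile.
must_haves = {"donor relationship management", "budget ownership", "team leadership"}

# Each focus area maps to the must-have requirements it evaluates.
focus_area_map = {
    "Fundraising Strategy": {"donor relationship management"},
    "People Leadership": {"team leadership"},
    "Strategic Thinking": set(),  # traces to nothing -> orphaned focus area
}

# A requirement is orphaned if no focus area covers it;
# a focus area is orphaned if it traces to no requirement.
covered = set().union(*focus_area_map.values())
orphaned_requirements = must_haves - covered
orphaned_focus_areas = [fa for fa, reqs in focus_area_map.items() if not reqs]
```

Both lists must be empty before Gate 1 passes.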
Scoring
- [ ] Scoring scale is defined with descriptions for each level
- [ ] Overall recommendation scale is defined (Strong Yes / Yes / No / Strong No or equivalent)
- [ ] Written justification is required for every score
- [ ] Fact-based justification requirement is explicit (not opinions, not impressions)
Gap Report Status
- [ ] All gaps identified are listed in the gap report
- [ ] Every gap is either RESOLVED or has a documented resolution path with advisor sign-off
- [ ] No gap is marked RESOLVED by inference or assumption
Gate 2 — Post-Build (100 points, 90+ to pass)
Run the sections below in order after every build and after every revision.
Design Integrity (35 points)
| # | Check | Points |
|---|---|---|
| 1 | Every focus area traces to a must-have requirement or confirmed competency domain | 6 |
| 2 | Every critical must-have from the position profile is covered by at least one focus area | 6 |
| 3 | No focus area is duplicated across interviewers unless intentionally overlapped | 4 |
| 4 | Each focus area description defines what "good" looks like in observable terms | 5 |
| 5 | Each focus area contains 3-5 behavior-based questions (not hypothetical, not leading) | 5 |
| 6 | Questions within each focus area are distinct — no overlapping evaluation targets | 4 |
| 7 | Scoring scale levels are defined and distinguishable from each other | 5 |
Design integrity failures are blocking. A scorecard that evaluates the wrong things, misses critical requirements, or uses undefined scoring is not deployable regardless of total score.
Legal Defensibility (20 points)
| # | Check | Points |
|---|---|---|
| 8 | Same questions will be asked of every candidate for the same focus area | 5 |
| 9 | Scoring criteria are consistent across candidates — no per-candidate adjustments | 4 |
| 10 | Written justification is required (not optional) for every score | 4 |
| 11 | Justification prompt specifies fact-based evidence, not impressions or feelings | 4 |
| 12 | No questions that could elicit protected-class information (age, family status, religion, disability, national origin) | 3 |
Defensibility failures are blocking. A scorecard that enables inconsistent evaluation or includes legally problematic questions must be fixed before deployment.
Content Accuracy (15 points)
| # | Check | Points |
|---|---|---|
| 13 | Role title matches position profile exactly | 2 |
| 14 | Organization name matches reference data exactly | 2 |
| 15 | Interviewer names (if included) match reference data exactly | 3 |
| 16 | Must-have requirements referenced in focus areas match position profile language | 4 |
| 17 | No content drawn from the golden example as a source | 2 |
| 18 | No content from a prior client's scorecard carried into this build | 2 |
Usability (15 points)
| # | Check | Points |
|---|---|---|
| 19 | Scorecard can be completed within the interview time allocation (not too many questions per focus area) | 3 |
| 20 | Instructions are clear — interviewer knows what to do with each section without external explanation | 3 |
| 21 | Justification fields have enough space for meaningful responses (not a single-line text box) | 2 |
| 22 | Focus area descriptions are specific enough that two interviewers would evaluate the same things | 3 |
| 23 | Recommendation section is clearly separated from section-level scoring | 2 |
| 24 | Submission instructions are included (where to send, deadline, who receives it) | 2 |
Presentation Section (10 points — skip if no presentation in process)
| # | Check | Points |
|---|---|---|
| 25 | Presentation evaluation criteria are defined (not just "how they presented") | 3 |
| 26 | Time management is an explicit criterion (did they stay within limits?) | 2 |
| 27 | Q&A handling is an explicit criterion | 2 |
| 28 | Criteria are consistent for every candidate presenting | 3 |
If there is no presentation in the process, redistribute these 10 points: add 3 to Design Integrity, 3 to Legal Defensibility, 2 to Usability, and 2 to Content Accuracy.
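The redistribution rule can be expressed as a small weight table. A minimal sketch using the category names and point values from this checklist; the function name is illustrative:

```python
# Base category weights from the scoring summary (sum to 100).
BASE_WEIGHTS = {
    "Design Integrity": 35,
    "Legal Defensibility": 20,
    "Content Accuracy": 15,
    "Usability": 15,
    "Presentation Section": 10,
    "Debrief Readiness": 5,
}

def weights(has_presentation: bool) -> dict:
    w = dict(BASE_WEIGHTS)
    if not has_presentation:
        # Redistribute the presentation's 10 points as 3/3/2/2.
        w.pop("Presentation Section")
        w["Design Integrity"] += 3
        w["Legal Defensibility"] += 3
        w["Usability"] += 2
        w["Content Accuracy"] += 2
    return w
```

Either way, the weights still sum to 100, so the 90-point threshold is unchanged.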
Debrief Readiness (5 points)
| # | Check | Points |
|---|---|---|
| 29 | Scorecard structure allows facilitator to aggregate scores across interviewers | 2 |
| 30 | Recommendation scale is consistent across all interviewers' scorecards | 2 |
| 31 | Submission deadline and no-cross-visibility rule are documented in the scorecard or its instructions | 1 |
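The aggregation that check 29 requires can be sketched as follows. The scorecard shape and interviewer data are hypothetical; the point is that per-interviewer scores roll up into one view per focus area:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical submitted scorecards, one per interviewer.
scorecards = [
    {"interviewer": "A", "scores": {"Fundraising Strategy": 4}, "recommendation": "Yes"},
    {"interviewer": "B", "scores": {"Fundraising Strategy": 2}, "recommendation": "No"},
]

# Group scores by focus area across interviewers.
by_area = defaultdict(list)
for card in scorecards:
    for area, score in card["scores"].items():
        by_area[area].append(score)

# Average plus spread; a large spread flags the area for debrief discussion.
summary = {
    area: {"avg": mean(vals), "spread": max(vals) - min(vals)}
    for area, vals in by_area.items()
}
```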
Scoring Summary
| Category | Points |
|---|---|
| Design Integrity | 35 |
| Legal Defensibility | 20 |
| Content Accuracy | 15 |
| Usability | 15 |
| Presentation Section | 10 |
| Debrief Readiness | 5 |
| Total | 100 |
Pass threshold: 90/100
Blocking failures (must fix regardless of score):
- Any design integrity failure (focus areas don't trace to requirements, critical must-haves uncovered)
- Any defensibility failure (inconsistent evaluation criteria, legally problematic questions)
- Any invented content (focus areas from the golden example, content from another client's scorecard)
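The verdict logic combines the 90-point threshold with the blocking-failure override. A minimal sketch; the flag strings are illustrative and would come from the Design Integrity, Defensibility, and invented-content checks:

```python
def gate2_verdict(total_score: int, blocking_flags: list[str]) -> str:
    # Blocking failures override the score: must fix regardless of total.
    if blocking_flags:
        return "FAIL (blocking): " + "; ".join(blocking_flags)
    return "PASS" if total_score >= 90 else "FAIL (below 90)"
```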
After any revision: Return to Gate 2 and run all checks again. A fix may introduce a new issue.
Common Failure Modes
| Failure | What It Looks Like | Root Cause | Fix |
|---|---|---|---|
| Generic focus areas | Scorecard evaluates "Leadership" and "Communication" without role-specific definition | Focus areas pulled from a generic competency model, not from the position profile and extraction | Rebuild focus areas from must-haves and extraction; generic competency names are starting points, not finished focus areas |
| Orphaned requirements | Position profile lists "donor relationship management" as a must-have; no focus area evaluates it | Focus area design didn't systematically trace back to must-haves | Cross-reference every must-have against focus area list; add coverage for any gap |
| Hypothetical questions | "What would you do if you inherited a team with morale issues?" | Questions not structured as behavior-based; interviewer gets hypothetical answers, not evidence | Rewrite: "Tell me about a time you inherited a team with morale challenges. What did you find, what did you do, and what changed?" |
| Undefined scoring scale | Scale says "1-5" but never defines what each number means | Scoring methodology not captured during extraction; scale imported from a template without calibration | Define each level in behavioral terms; confirm with practitioner |
| Duplicated focus areas | Two interviewers both evaluating "strategic thinking" with different questions | Focus area assignments not mapped during alignment; interviewers chose their own areas | Map all focus areas to interviewers before deployment; identify intentional overlaps vs. accidental duplication |
| Opinion-based justification | Interviewer writes "Seemed like a great culture fit" with no supporting evidence | Justification prompt doesn't explicitly require fact-based evidence | Add explicit prompt: "Reference specific candidate statements, behaviors, or demonstrated competencies" |
| Prior scorecard recycled | Scorecard uses focus areas from a different role at a different organization | Template-only build without extraction; prior scorecard treated as content source | Rebuild from the current role's position profile and extraction; never treat a prior engagement's scorecard as a content source |
| Protected-class questions | "Do you have children?" appears in a work-life balance focus area | Question bank not reviewed for legal compliance | Remove immediately; review all questions for protected-class exposure; add legal review to QC |