Skill evals and setup hardening
Every skill now ships with eval definitions for repeatable quality measurement. The setup flow is hardened against edge cases in SSH URL detection and JSON escaping.
Added
- Eval definitions (evals.json) added to 15 skills, with pass/fail expectations and a baseline scorecard
- Anthropic skill-creator eval plugin integrated for automated trial runs during skill development
- Design system viewer upgraded to a flat shared-border grid layout with richer icon, token, and component visuals
Changed
- Improved the copy-review skill based on findings from the initial eval run
- Aligned eval expectations with actual script behavior for mixed-format parsing and deduplication
Fixed
- Hardened the setup flow for SSH URL auto-detection and JSON escaping edge cases
- Fixed prepare-pr to correctly skip merged and closed PRs when detecting existing PRs
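
For readers unfamiliar with the edge cases behind these fixes, the following is a minimal illustrative sketch, not the project's actual code. All function names and regular expressions are hypothetical, and the PR records are assumed to carry a `state` field with values such as `OPEN`, `MERGED`, and `CLOSED`.

```python
import json
import re

# Hypothetical patterns: the project's real detection rules are not shown here.
# scp-style remote, e.g. git@github.com:owner/repo.git
_SCP_STYLE = re.compile(r"^[\w.-]+@[\w.-]+:(?!/)\S+$")
# explicit scheme, e.g. ssh://git@github.com/owner/repo.git
_SSH_SCHEME = re.compile(r"^ssh://\S+$")


def is_ssh_remote(url: str) -> bool:
    """Return True if the remote URL looks like an SSH remote (either form)."""
    url = url.strip()
    return bool(_SCP_STYLE.match(url) or _SSH_SCHEME.match(url))


def to_json_string(value: str) -> str:
    """Escape a string for embedding in JSON.

    Delegating to json.dumps handles quotes, backslashes, and control
    characters, which hand-rolled string concatenation tends to miss.
    """
    return json.dumps(value)


def open_prs(prs: list[dict]) -> list[dict]:
    """Keep only PRs that are still open (assumed 'state' field)."""
    return [pr for pr in prs if str(pr.get("state", "")).upper() == "OPEN"]
```

With these definitions, `is_ssh_remote("https://github.com/owner/repo.git")` is false while both SSH forms are accepted, and a value passed through `to_json_string` can be read back unchanged with `json.loads`.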