ADR-001: Documentation Tracking and Staleness Detection¶
Status: Implemented Date: 2025-10-16 Requirement: CBIN-136 Decision Makers: Development Team
Context¶
CANARY tokens track implementation status through STATUS fields (STUB, IMPL, TESTED, BENCHED), but there was no mechanism to ensure that documentation stayed synchronized with code changes. Documentation drift is a common problem where:
- Documentation becomes outdated as code evolves
- No automated way to detect when docs need updating
- Manual verification is time-consuming and error-prone
- No visibility into documentation coverage across the project
We needed a system to automatically track documentation freshness and provide actionable metrics.
Decision¶
Implement a content-based documentation tracking system using SHA256 hashing with the following components:
1. Hash-Based Staleness Detection¶
Rationale: Content hashing provides deterministic, efficient staleness detection without timestamp dependencies.
Implementation: - SHA256 algorithm for cryptographic-grade collision resistance - CRLF→LF line ending normalization for cross-platform consistency - Hash abbreviated to 16 characters (64 bits) - sufficient for documentation tracking - Stored in DOC_HASH field in CANARY tokens
Performance: - Hash calculation: <0.01ms per KB - Scalable to large documentation sets - No network or database dependencies for hash calculation
2. Type-Prefixed Documentation Paths¶
Rationale: Support multiple documentation types per requirement with clear categorization.
Types Supported:
- user: - User-facing documentation (how-to guides, tutorials)
- api: - API reference documentation (function signatures, parameters)
- technical: - Technical design documentation (architecture, implementation)
- feature: - Feature specifications (requirements, acceptance criteria)
- architecture: - Architecture Decision Records (ADRs)
Format:
DOC=user:docs/user/auth.md,api:docs/api/auth-api.md
DOC_HASH=8f434346648f6b96,a1b2c3d4e5f6g7h8
Benefits: - Multiple docs per requirement - Clear documentation purpose - Type-specific reporting and metrics
3. Database Schema Extension¶
New Fields in Token Table:
doc_path VARCHAR, -- Comma-separated type-prefixed paths
doc_hash VARCHAR, -- Comma-separated abbreviated hashes
doc_type VARCHAR, -- Primary documentation type
doc_checked_at DATETIME, -- ISO 8601 timestamp of last verification
doc_status VARCHAR -- CURRENT, STALE, MISSING, UNHASHED
Migration Strategy: - Schema migration applied via golang-migrate/migrate - Backward compatible (new fields nullable) - Automatic migration on first run
4. CLI Command Interface¶
Commands:
- canary doc create - Create documentation from template
- canary doc update - Recalculate and update hashes
- canary doc status - Check staleness status
- canary doc report - Generate coverage and health metrics
Batch Operations:
- --all flag for bulk operations
- --stale-only flag for selective updates
- Efficient processing of multiple requirements
5. Scan Integration¶
Automatic Field Extraction:
- DOC, DOC_HASH, DOC_TYPE extracted during canary index
- No manual database updates required
- Fully automated workflow
Alternatives Considered¶
Alternative 1: Timestamp-Based Tracking¶
Approach: Compare file modification time with last-checked timestamp.
Rejected Because: - Timestamps unreliable (git clone resets mtime) - No content verification (false positives/negatives) - Timezone and clock drift issues - Cannot detect reverted changes
Alternative 2: Full Content Storage¶
Approach: Store entire documentation content in database for comparison.
Rejected Because: - Massive storage overhead - Complex diff logic required - Database bloat for large docs - Performance degradation
Alternative 3: Line Count Heuristic¶
Approach: Track number of lines as simple change detector.
Rejected Because: - Too coarse-grained (misses content changes) - False positives on whitespace/formatting changes - No verification of actual content changes - Unreliable for refactoring
Alternative 4: Git Integration¶
Approach: Use git commit history to track doc changes.
Rejected Because: - Requires git repository (limits portability) - Complex logic to map commits to docs - Doesn't work for uncommitted changes - Coupling to VCS system
Implementation Details¶
Hash Calculation Algorithm¶
func CalculateHash(filePath string) (string, error) {
content, err := os.ReadFile(filePath)
if err != nil {
return "", err
}
// Normalize line endings (CRLF → LF)
normalized := bytes.ReplaceAll(content, []byte("\r\n"), []byte("\n"))
// Calculate SHA256
hash := sha256.Sum256(normalized)
// Abbreviate to 16 characters
return fmt.Sprintf("%x", hash)[:16], nil
}
Key Properties: - Deterministic (same content = same hash) - Collision-resistant (SHA256 provides 128-bit security with 64-bit abbreviation) - Platform-independent (line ending normalization) - Fast (optimized for small-to-medium files)
Staleness Check Logic¶
func CheckStaleness(token *storage.Token) (string, error) {
// No documentation path → N/A
if token.DocPath == "" {
return "", nil
}
// No hash → UNHASHED
if token.DocHash == "" {
return "DOC_UNHASHED", nil
}
// File doesn't exist → MISSING
if _, err := os.Stat(token.DocPath); os.IsNotExist(err) {
return "DOC_MISSING", nil
}
// Recalculate current hash
currentHash, err := CalculateHash(token.DocPath)
if err != nil {
return "", err
}
// Compare hashes
if currentHash == token.DocHash {
return "DOC_CURRENT", nil
}
return "DOC_STALE", nil
}
Multiple Documentation Handling¶
func CheckMultipleDocumentation(token *storage.Token) (map[string]string, error) {
docPaths := strings.Split(token.DocPath, ",")
docHashes := strings.Split(token.DocHash, ",")
results := make(map[string]string)
for i, docPath := range docPaths {
// Strip type prefix (user:, api:, etc.)
docPath = strings.TrimSpace(docPath)
if strings.Contains(docPath, ":") {
parts := strings.SplitN(docPath, ":", 2)
if len(parts) == 2 {
docPath = parts[1]
}
}
// Check staleness for this doc
singleToken := &storage.Token{
DocPath: docPath,
DocHash: strings.TrimSpace(docHashes[i]),
}
status, err := CheckStaleness(singleToken)
if err != nil {
return nil, err
}
results[docPath] = status
}
return results, nil
}
Consequences¶
Positive¶
- Automated Detection: No manual checking required - staleness automatically detected
- Reliable: Content-based verification eliminates false positives/negatives
- Scalable: Hash calculation is fast enough for large codebases
- Portable: No dependency on git or other external tools
- Transparent: Clear status values (CURRENT, STALE, MISSING, UNHASHED)
- Metrics: Coverage and health reporting enables data-driven decisions
- Batch Operations: Efficient bulk updates for large projects
- Type Safety: Type prefixes enable category-specific workflows
Negative¶
- Hash Storage: Requires database field for each documentation path
- Migration Required: Existing projects need schema migration
- Manual Updates: Hashes must be updated after documentation edits
- False Positives: Whitespace-only changes trigger staleness (though rare due to normalization)
Neutral¶
- Workflow Change: Developers must run
canary doc updateafter editing docs - Learning Curve: New commands to learn (create, update, status, report)
- Template System: Documentation templates need to be maintained
Metrics¶
Performance Benchmarks¶
BenchmarkHashCalculation-8 50000 0.008 ms/op (1KB file)
BenchmarkHashCalculation-8 10000 0.080 ms/op (10KB file)
BenchmarkHashCalculation-8 1000 0.800 ms/op (100KB file)
Test Coverage¶
- 6 integration tests (100% passing)
- Core engine: TESTED + BENCHED
- CLI commands: TESTED
- Multi-doc handling: TESTED
- Batch operations: TESTED
Current Usage (as of 2025-10-16)¶
- Total tokens: 746
- Tokens with documentation: 9 (1.2%)
- Documentation coverage: 6/125 requirements (4.8%)
- Staleness: 8 missing, 4 unhashed
Future Enhancements¶
Potential Improvements¶
- AI-Assisted Documentation:
- Auto-generate documentation from code/specs
- Smart suggestions for documentation updates
-
Template personalization based on requirement type
-
Git Integration:
- Detect documentation changes in git commits
- Pre-commit hooks to verify doc currency
-
Automatic hash updates in git workflow
-
Enhanced Reporting:
- Historical trends (coverage over time)
- Documentation age metrics
-
Per-developer documentation stats
-
Template Enhancements:
- More template types (troubleshooting, migration guides)
- Template variables (auto-fill requirement details)
-
Template versioning and updates
-
Watch Mode:
- File system watcher for automatic hash updates
- Real-time staleness notifications
- Editor integration
References¶
- CBIN-136 Requirement Specification
- SHA256 Algorithm: FIPS 180-4
- golang-migrate Documentation: https://github.com/golang-migrate/migrate
- CANARY Constitutional Article VII: Documentation Currency
Revision History¶
- 2025-10-16: Initial ADR - System implemented and tested