Skip to content

ADR-001: Documentation Tracking and Staleness Detection

Status: Implemented Date: 2025-10-16 Requirement: CBIN-136 Decision Makers: Development Team

Context

CANARY tokens track implementation status through STATUS fields (STUB, IMPL, TESTED, BENCHED), but there was no mechanism to ensure that documentation stayed synchronized with code changes. Documentation drift is a common problem where:

  • Documentation becomes outdated as code evolves
  • No automated way to detect when docs need updating
  • Manual verification is time-consuming and error-prone
  • No visibility into documentation coverage across the project

We needed a system to automatically track documentation freshness and provide actionable metrics.

Decision

Implement a content-based documentation tracking system using SHA256 hashing with the following components:

1. Hash-Based Staleness Detection

Rationale: Content hashing provides deterministic, efficient staleness detection without timestamp dependencies.

Implementation: - SHA256 algorithm for cryptographic-grade collision resistance - CRLF→LF line ending normalization for cross-platform consistency - Hash abbreviated to 16 characters (64 bits) - sufficient for documentation tracking - Stored in DOC_HASH field in CANARY tokens

Performance: - Hash calculation: <0.01ms per KB - Scalable to large documentation sets - No network or database dependencies for hash calculation

2. Type-Prefixed Documentation Paths

Rationale: Support multiple documentation types per requirement with clear categorization.

Types Supported: - user: - User-facing documentation (how-to guides, tutorials) - api: - API reference documentation (function signatures, parameters) - technical: - Technical design documentation (architecture, implementation) - feature: - Feature specifications (requirements, acceptance criteria) - architecture: - Architecture Decision Records (ADRs)

Format:

DOC=user:docs/user/auth.md,api:docs/api/auth-api.md
DOC_HASH=8f434346648f6b96,a1b2c3d4e5f6g7h8

Benefits: - Multiple docs per requirement - Clear documentation purpose - Type-specific reporting and metrics

3. Database Schema Extension

New Fields in Token Table:

doc_path VARCHAR,        -- Comma-separated type-prefixed paths
doc_hash VARCHAR,        -- Comma-separated abbreviated hashes
doc_type VARCHAR,        -- Primary documentation type
doc_checked_at DATETIME, -- ISO 8601 timestamp of last verification
doc_status VARCHAR       -- CURRENT, STALE, MISSING, UNHASHED

Migration Strategy: - Schema migration applied via golang-migrate/migrate - Backward compatible (new fields nullable) - Automatic migration on first run

4. CLI Command Interface

Commands: - canary doc create - Create documentation from template - canary doc update - Recalculate and update hashes - canary doc status - Check staleness status - canary doc report - Generate coverage and health metrics

Batch Operations: - --all flag for bulk operations - --stale-only flag for selective updates - Efficient processing of multiple requirements

5. Scan Integration

Automatic Field Extraction: - DOC, DOC_HASH, DOC_TYPE extracted during canary index - No manual database updates required - Fully automated workflow

Alternatives Considered

Alternative 1: Timestamp-Based Tracking

Approach: Compare file modification time with last-checked timestamp.

Rejected Because: - Timestamps unreliable (git clone resets mtime) - No content verification (false positives/negatives) - Timezone and clock drift issues - Cannot detect reverted changes

Alternative 2: Full Content Storage

Approach: Store entire documentation content in database for comparison.

Rejected Because: - Massive storage overhead - Complex diff logic required - Database bloat for large docs - Performance degradation

Alternative 3: Line Count Heuristic

Approach: Track number of lines as simple change detector.

Rejected Because: - Too coarse-grained (misses content changes) - False positives on whitespace/formatting changes - No verification of actual content changes - Unreliable for refactoring

Alternative 4: Git Integration

Approach: Use git commit history to track doc changes.

Rejected Because: - Requires git repository (limits portability) - Complex logic to map commits to docs - Doesn't work for uncommitted changes - Coupling to VCS system

Implementation Details

Hash Calculation Algorithm

func CalculateHash(filePath string) (string, error) {
    content, err := os.ReadFile(filePath)
    if err != nil {
        return "", err
    }

    // Normalize line endings (CRLF → LF)
    normalized := bytes.ReplaceAll(content, []byte("\r\n"), []byte("\n"))

    // Calculate SHA256
    hash := sha256.Sum256(normalized)

    // Abbreviate to 16 characters
    return fmt.Sprintf("%x", hash)[:16], nil
}

Key Properties: - Deterministic (same content = same hash) - Collision-resistant (SHA256 provides 128-bit security with 64-bit abbreviation) - Platform-independent (line ending normalization) - Fast (optimized for small-to-medium files)

Staleness Check Logic

func CheckStaleness(token *storage.Token) (string, error) {
    // No documentation path → N/A
    if token.DocPath == "" {
        return "", nil
    }

    // No hash → UNHASHED
    if token.DocHash == "" {
        return "DOC_UNHASHED", nil
    }

    // File doesn't exist → MISSING
    if _, err := os.Stat(token.DocPath); os.IsNotExist(err) {
        return "DOC_MISSING", nil
    }

    // Recalculate current hash
    currentHash, err := CalculateHash(token.DocPath)
    if err != nil {
        return "", err
    }

    // Compare hashes
    if currentHash == token.DocHash {
        return "DOC_CURRENT", nil
    }

    return "DOC_STALE", nil
}

Multiple Documentation Handling

func CheckMultipleDocumentation(token *storage.Token) (map[string]string, error) {
    docPaths := strings.Split(token.DocPath, ",")
    docHashes := strings.Split(token.DocHash, ",")

    results := make(map[string]string)

    for i, docPath := range docPaths {
        // Strip type prefix (user:, api:, etc.)
        docPath = strings.TrimSpace(docPath)
        if strings.Contains(docPath, ":") {
            parts := strings.SplitN(docPath, ":", 2)
            if len(parts) == 2 {
                docPath = parts[1]
            }
        }

        // Check staleness for this doc
        singleToken := &storage.Token{
            DocPath: docPath,
            DocHash: strings.TrimSpace(docHashes[i]),
        }

        status, err := CheckStaleness(singleToken)
        if err != nil {
            return nil, err
        }

        results[docPath] = status
    }

    return results, nil
}

Consequences

Positive

  1. Automated Detection: No manual checking required - staleness automatically detected
  2. Reliable: Content-based verification eliminates false positives/negatives
  3. Scalable: Hash calculation is fast enough for large codebases
  4. Portable: No dependency on git or other external tools
  5. Transparent: Clear status values (CURRENT, STALE, MISSING, UNHASHED)
  6. Metrics: Coverage and health reporting enables data-driven decisions
  7. Batch Operations: Efficient bulk updates for large projects
  8. Type Safety: Type prefixes enable category-specific workflows

Negative

  1. Hash Storage: Requires database field for each documentation path
  2. Migration Required: Existing projects need schema migration
  3. Manual Updates: Hashes must be updated after documentation edits
  4. False Positives: Whitespace-only changes trigger staleness (though rare due to normalization)

Neutral

  1. Workflow Change: Developers must run canary doc update after editing docs
  2. Learning Curve: New commands to learn (create, update, status, report)
  3. Template System: Documentation templates need to be maintained

Metrics

Performance Benchmarks

BenchmarkHashCalculation-8    50000    0.008 ms/op    (1KB file)
BenchmarkHashCalculation-8    10000    0.080 ms/op    (10KB file)
BenchmarkHashCalculation-8     1000    0.800 ms/op    (100KB file)

Test Coverage

  • 6 integration tests (100% passing)
  • Core engine: TESTED + BENCHED
  • CLI commands: TESTED
  • Multi-doc handling: TESTED
  • Batch operations: TESTED

Current Usage (as of 2025-10-16)

  • Total tokens: 746
  • Tokens with documentation: 9 (1.2%)
  • Documentation coverage: 6/125 requirements (4.8%)
  • Staleness: 8 missing, 4 unhashed

Future Enhancements

Potential Improvements

  1. AI-Assisted Documentation:
  2. Auto-generate documentation from code/specs
  3. Smart suggestions for documentation updates
  4. Template personalization based on requirement type

  5. Git Integration:

  6. Detect documentation changes in git commits
  7. Pre-commit hooks to verify doc currency
  8. Automatic hash updates in git workflow

  9. Enhanced Reporting:

  10. Historical trends (coverage over time)
  11. Documentation age metrics
  12. Per-developer documentation stats

  13. Template Enhancements:

  14. More template types (troubleshooting, migration guides)
  15. Template variables (auto-fill requirement details)
  16. Template versioning and updates

  17. Watch Mode:

  18. File system watcher for automatic hash updates
  19. Real-time staleness notifications
  20. Editor integration

References

  • CBIN-136 Requirement Specification
  • SHA256 Algorithm: FIPS 180-4
  • golang-migrate Documentation: https://github.com/golang-migrate/migrate
  • CANARY Constitutional Article VII: Documentation Currency

Revision History

  • 2025-10-16: Initial ADR - System implemented and tested