Centralized Regex Pattern System - COMPLETE

Single Source of Truth Architecture

The CustomMarkdownProcessor serves as the centralization point for all regex configurations across the platform: every pattern lives in one library, RegexPatternLibrary, instead of being defined separately in each component.

Architecture Overview

RegexPatternLibrary.ts - Central Pattern Repository

typescript
// Single location for ALL regex patterns across the platform
export class RegexPatternLibrary {
  static readonly MARKDOWN_PATTERNS = { ... };     // Core markdown
  static readonly BUSINESS_PATTERNS = { ... };     // BI markup  
  static readonly VISUAL_PATTERNS = { ... };       // Charts & visuals
  static readonly CLEANING_PATTERNS = { ... };     // Content cleaning
  static readonly ARTIFACT_PATTERNS = { ... };     // AI content fixes
  static readonly VALIDATION_PATTERNS = { ... };   // Input validation
}
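
A minimal sketch of how such a library could be organized, assuming each entry carries the `pattern`, `description`, and `examples` fields used in the pattern objects later in this document. The `PatternDefinition` interface, the class name `RegexPatternLibrarySketch`, the two sample entries, and the body of `getAllPatterns()` are illustrative assumptions, not the actual implementation.

```typescript
// Hypothetical entry shape, inferred from the pattern objects shown later in this document.
interface PatternDefinition {
  pattern: RegExp;
  description: string;
  examples: string[];
}

type PatternCategory = Record<string, PatternDefinition>;

class RegexPatternLibrarySketch {
  // One illustrative entry per category; a real library would hold many more.
  static readonly BUSINESS_PATTERNS: PatternCategory = {
    confidence: {
      pattern: /\[confidence:\s*([^\]]+)\]/gi,
      description: "Confidence indicator: [confidence:high]",
      examples: ["[confidence:high]"],
    },
  };

  static readonly VISUAL_PATTERNS: PatternCategory = {
    chart: {
      pattern: /\[chart:\s*([^:\]]+):\s*([^\]]+)\]/gi,
      description: "Chart reference: [chart:id:title]",
      examples: ["[chart:revenue_trend:Revenue Growth Over Time]"],
    },
  };

  // Assumed accessor: groups the categories under short keys.
  static getAllPatterns() {
    return {
      business: RegexPatternLibrarySketch.BUSINESS_PATTERNS,
      visual: RegexPatternLibrarySketch.VISUAL_PATTERNS,
    };
  }
}

// Self-check: every documented example must match its own pattern.
// The probe copy drops the g flag so test() carries no state between calls.
const all = RegexPatternLibrarySketch.getAllPatterns();
const selfCheck = Object.values(all).every((category) =>
  Object.values(category).every((def) =>
    def.examples.every((ex) => new RegExp(def.pattern.source, "i").test(ex))
  )
);
```

Keeping the examples next to each pattern means the library can verify itself at startup or in a unit test.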

Pattern Categories & Examples

1. Business Intelligence Patterns

javascript
// Financial metrics with trends
[metric:revenue:$2.5M:15%↑]    // Revenue with upward trend
[kpi:satisfaction:4.8/5:stable] // KPI with status

// Recommendations with priorities
[recommendation:high:Implement new strategy]
[action:urgent:Review pricing by month end]

// Risk and confidence indicators
[risk:medium:Market volatility may impact Q4]
[confidence:high]

// Timeline and scheduling
[timeline:Q1:Launch|Q2:Expansion|Q3:Review]
[deadline:End of Q4]
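
As an illustration of how such tags can be turned into structured data, the sketch below parses `[metric:...]` tags. The `METRIC_TAG` pattern, the `MetricTag` interface, and `extractMetrics` are assumptions written for this example; the sketch treats the trend field as optional so that both three-field and four-field tags are accepted.

```typescript
// Hypothetical parsed form of a [metric:name:value:trend] tag.
interface MetricTag {
  name: string;
  value: string;
  trend?: string;
}

// Assumed pattern: name and value are required, the trend field is optional.
const METRIC_TAG = /\[metric:\s*([^:\]]+):\s*([^:\]]+)(?::\s*([^\]]+))?\]/gi;

function extractMetrics(text: string): MetricTag[] {
  const metrics: MetricTag[] = [];
  // Work on a fresh copy so the shared regex's lastIndex is never touched.
  const re = new RegExp(METRIC_TAG.source, METRIC_TAG.flags);
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    metrics.push({
      name: m[1].trim(),
      value: m[2].trim(),
      trend: m[3] ? m[3].trim() : undefined,
    });
  }
  return metrics;
}

const found = extractMetrics(
  "Q3: [metric:revenue:$2.5M:15%↑] and [metric:sales:$1.2M]"
);
```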

2. Visual & Chart Patterns

javascript
// Chart references that preserve embedding
[chart:revenue_trend:Revenue Growth Over Time]
[visual:org_chart:Team Structure Diagram]
[infographic:process_flow:Sales Process]
[presentation:quarterly_review:Q4 Review Deck]
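
A chart reference can be rendered with a replacer function that validates the id before emitting markup. This is a sketch under assumptions: the `CHART_TAG` pattern, the `chart` class, and the `data-chart-id` attribute are hypothetical, and the id check reuses the shape of `validChartReference` from the validation patterns.

```typescript
// Assumed pattern for [chart:id:title]; the id may not contain ':' or ']'.
const CHART_TAG = /\[chart:\s*([^:\]]+):\s*([^\]]+)\]/gi;
const VALID_CHART_ID = /^[a-zA-Z0-9_-]+$/;

// Escape the characters that would otherwise be read as HTML.
function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function renderCharts(text: string): string {
  return text.replace(CHART_TAG, (tag: string, id: string, title: string) => {
    const cleanId = id.trim();
    // Leave the tag untouched when the id is not a safe reference.
    if (!VALID_CHART_ID.test(cleanId)) return tag;
    return `<div class="chart" data-chart-id="${cleanId}">${escapeHtml(title.trim())}</div>`;
  });
}
```

Returning the original tag for an invalid id keeps the problem visible in the output instead of silently dropping content.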

3. Content Cleaning Patterns

javascript
// Removes problematic artifacts from AI-generated content
colorCodeInjection: /[a-fA-F0-9]{6};">/g,    // Fixes: 1f4788;">
malformedStyles: /style="[^"]*[a-fA-F0-9]{6}[^"]*"/g,
brokenAttributes: /\w+;">/g,                  // Fixes: color;">
trailingAsterisks: /\*\*\s*$/gm,             // Fixes: Bold text **
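
A minimal sketch of how a `cleanContent` function could apply these patterns in sequence. The function body is an assumption, not the library's actual code. The trailing-asterisk rule is applied only to lines with an odd number of `**` markers, an added guard so that a line ending in properly closed bold text is left alone.

```typescript
// Patterns copied from the list above; trailingAsterisks is used per line here,
// so it drops the g and m flags.
const CLEANING_PATTERNS = {
  colorCodeInjection: /[a-fA-F0-9]{6};">/g,
  malformedStyles: /style="[^"]*[a-fA-F0-9]{6}[^"]*"/g,
  brokenAttributes: /\w+;">/g,
  trailingAsterisks: /\*\*\s*$/,
};

function cleanContent(raw: string): string {
  const stripped = raw
    .replace(CLEANING_PATTERNS.malformedStyles, "")
    .replace(CLEANING_PATTERNS.colorCodeInjection, "")
    .replace(CLEANING_PATTERNS.brokenAttributes, "");

  return stripped
    .split("\n")
    .map((line) => {
      // An odd count of ** markers means one of them is unpaired.
      const markers = (line.match(/\*\*/g) || []).length;
      if (markers % 2 === 0) return line;
      return line.replace(CLEANING_PATTERNS.trailingAsterisks, "").trimEnd();
    })
    .join("\n");
}
```

These patterns are broad (for example, `brokenAttributes` matches any word followed by `;">`), so a sketch like this is best run on text that is expected to contain no legitimate inline HTML.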

4. AI Artifact Patterns (Fixes Common AI Issues)

javascript
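// Fixes malformed headers (the lazy capture keeps trailing spaces out of group 2;
// excluding * leaves legitimately bold headers such as "## **Title**" alone)
malformedHeaders: /^(#{1,6})\s*([^#*\n]+?)\s*\*\*\s*$/gm,
// Before: ## Section Title **
// After:  ## Section Title

// Fixes incomplete markdown: matches a line whose only asterisk opens an italic
// span, so closed spans (*text*), bold (**text**), and "* " bullets are skipped
incompleteMarkdown: /^([^*\n]*)\*([^*\s][^*\n]*)$/gm,
// Before: *incomplete italic
// After:  *incomplete italic*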
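
The before/after pairs above can be reproduced with two `replace` calls. The sketch below uses patterns modeled on `malformedHeaders` and `incompleteMarkdown`, written conservatively for this example (an assumption): the header capture is lazy and excludes `*`, and the italic rule only fires on a line whose single asterisk is followed by a non-space character.

```typescript
// Conservative variants written for this sketch.
const MALFORMED_HEADER = /^(#{1,6})\s*([^#*\n]+?)\s*\*\*\s*$/gm;
const UNCLOSED_ITALIC = /^([^*\n]*)\*([^*\s][^*\n]*)$/gm;

function fixArtifacts(markdown: string): string {
  return markdown
    // "## Section Title **" becomes "## Section Title"
    .replace(MALFORMED_HEADER, "$1 $2")
    // "*incomplete italic" becomes "*incomplete italic*"
    .replace(UNCLOSED_ITALIC, "$1*$2*");
}
```

Lines that mix bold text with an unclosed italic are deliberately skipped; a regex cannot reliably pair markers in that case.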

5. Validation Patterns

javascript
validImageUrl: /^(https?:\/\/|data:image\/)/i,
validChartReference: /^[a-zA-Z0-9_-]+$/,
validMetricValue: /^[\$€£¥]?[\d,]+(?:\.\d+)?[%]?$/,
validPriorityLevel: /^(critical|high|medium|low)$/i,
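
To show what these patterns accept and reject, the sketch below runs each one against a few sample values. The pattern bodies are copied from the list above; the `VALIDATION_PATTERNS` constant and the sample inputs are illustrative. Note that `validMetricValue` as written has no magnitude suffix, so an abbreviated value such as `$2.5M` does not pass.

```typescript
const VALIDATION_PATTERNS = {
  validImageUrl: /^(https?:\/\/|data:image\/)/i,
  validChartReference: /^[a-zA-Z0-9_-]+$/,
  validMetricValue: /^[\$€£¥]?[\d,]+(?:\.\d+)?[%]?$/,
  validPriorityLevel: /^(critical|high|medium|low)$/i,
};

// None of these patterns use the g flag, so test() is stateless and safe to reuse.
const checks = {
  httpsImage: VALIDATION_PATTERNS.validImageUrl.test("https://example.com/chart.png"), // true
  scriptUrl: VALIDATION_PATTERNS.validImageUrl.test("javascript:alert(1)"),            // false
  chartId: VALIDATION_PATTERNS.validChartReference.test("revenue_trend"),              // true
  chartIdWithSpace: VALIDATION_PATTERNS.validChartReference.test("revenue trend"),     // false
  plainAmount: VALIDATION_PATTERNS.validMetricValue.test("$2,500.50"),                 // true
  percentage: VALIDATION_PATTERNS.validMetricValue.test("15%"),                        // true
  abbreviated: VALIDATION_PATTERNS.validMetricValue.test("$2.5M"),                     // false
  priority: VALIDATION_PATTERNS.validPriorityLevel.test("High"),                       // true
  unknownPriority: VALIDATION_PATTERNS.validPriorityLevel.test("urgent"),              // false
};
```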

Usage Examples

In CustomMarkdownProcessor

typescript
// OLD: Multiple scattered patterns
private readonly businessPatterns = { ... };
private readonly corePatterns = { ... };

// NEW: Single centralized source
private readonly patterns = RegexPatternLibrary.getAllPatterns();

// Usage (replace() returns a new string, so assign the result)
html = html.replace(this.patterns.business.metric.pattern, '$1');
html = html.replace(this.patterns.visual.chart.pattern, '<div>Chart: $1</div>');

For Content Cleaning Across Platform

typescript
// Clean AI-generated content anywhere
const cleanedContent = RegexPatternLibrary.cleanContent(rawAIContent);

// Validate content quality
const validation = RegexPatternLibrary.validateContent(userInput);
if (!validation.isValid) {
  console.log('Issues found:', validation.issues);
}
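
One way `validateContent` could produce the `isValid` and `issues` fields used above is to pair each detector pattern with a message. This is a sketch under assumptions: the `ValidationResult` interface, the two checks, and their messages are hypothetical and stand in for whatever rules the real library applies.

```typescript
interface ValidationResult {
  isValid: boolean;
  issues: string[];
}

// Hypothetical checks. The detectors carry no g flag, so test() keeps no state.
const ISSUE_CHECKS: Array<{ detect: RegExp; message: string }> = [
  { detect: /[a-fA-F0-9]{6};">/, message: "Stray color-code fragment" },
  { detect: /\[(?:metric|kpi|chart):[^\]\n]*$/m, message: "Business tag is never closed" },
];

function validateContent(content: string): ValidationResult {
  const issues = ISSUE_CHECKS
    .filter((check) => check.detect.test(content))
    .map((check) => check.message);
  return { isValid: issues.length === 0, issues };
}

const clean = validateContent("Revenue [metric:sales:$1.2M] rose");
const dirty = validateContent('1f4788;">Revenue [metric:sales');
```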

For Testing Patterns

typescript
// Test any pattern against sample content
const result = RegexPatternLibrary.testPattern(
  /\[metric:\s*([^:\]]+):\s*([^:\]]+)\]/gi,
  "Revenue [metric:sales:$1.2M] increased significantly"
);
// Result: { matches: [...], count: 1, examples: ["[metric:sales:$1.2M]"] }
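
A possible implementation of `testPattern` that returns the result shape shown above. The function body and the `PatternTestResult` interface are assumptions; the sketch copies the regex before use so that the caller's pattern keeps its own `lastIndex`.

```typescript
interface PatternTestResult {
  matches: string[][]; // full match followed by capture groups, per match
  count: number;
  examples: string[];  // the full matched substrings
}

function testPattern(pattern: RegExp, sample: string): PatternTestResult {
  // Force the g flag on a copy so every match is visited.
  const flags = pattern.flags.indexOf("g") === -1 ? pattern.flags + "g" : pattern.flags;
  const re = new RegExp(pattern.source, flags);

  const matches: string[][] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(sample)) !== null) {
    matches.push(m.slice());
    if (m[0] === "") re.lastIndex++; // step past empty matches to avoid an endless loop
  }
  return { matches, count: matches.length, examples: matches.map((x) => x[0]) };
}

const result = testPattern(
  /\[confidence:\s*([^\]]+)\]/i,
  "Outlook [confidence:high] for Q4, [confidence:low] for Q1"
);
```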

Benefits of Centralized System

Single Source of Truth

  • All patterns defined in one location
  • No scattered regex across multiple files
  • Easy to maintain and update

Performance Optimized

  • Compiled patterns cached and reused
  • No duplicate pattern compilation
  • Faster processing across the platform
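
One caveat when caching compiled patterns: a RegExp created with the `g` flag stores a `lastIndex` between calls to `test()` and `exec()`, so a shared instance can give different answers for the same input. The sketch below demonstrates the effect and one way around it; `safeTest` is a hypothetical helper name.

```typescript
// Stands in for any cached pattern that carries the g flag.
const shared = /\[confidence:[^\]]+\]/g;

const first = shared.test("[confidence:high]");  // true, and lastIndex moves to 17
const second = shared.test("[confidence:high]"); // false: the search resumes at index 17

// Resetting lastIndex before each call makes the shared instance behave predictably.
function safeTest(pattern: RegExp, text: string): boolean {
  pattern.lastIndex = 0;
  return pattern.test(text);
}

const third = safeTest(shared, "[confidence:high]");  // true
const fourth = safeTest(shared, "[confidence:high]"); // true
```

`String.prototype.replace` resets `lastIndex` itself, so the issue only affects code that calls `test()` or `exec()` directly on a shared global pattern.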

Consistency Guaranteed

  • Same processing logic everywhere
  • Standardized pattern naming
  • Unified behavior across components

Easy Testing & Debugging

  • Built-in pattern testing utilities
  • Validation and cleanup functions
  • Clear pattern documentation with examples

Extensible Architecture

  • Easy to add new pattern categories
  • Simple to extend existing patterns
  • Modular pattern organization

Integration Points

Current Usage:

  1. CustomMarkdownProcessor - Core BI markup processing
  2. Content cleaning - AI artifact removal
  3. Validation - Input sanitization
  4. Chart processing - Visual content handling

Future Expansion Possibilities:

  1. Email processing - Extract and format business data from emails
  2. Report validation - Ensure consistent formatting across reports
  3. Data extraction - Pull metrics from unstructured text
  4. Content migration - Convert legacy formats to new markup
  5. Quality control - Automated content quality checks

Real-World Usage Examples

Content Processing Pipeline:

typescript
// 1. Clean raw AI content
let content = RegexPatternLibrary.cleanContent(rawAIOutput);

// 2. Validate content quality  
const validation = RegexPatternLibrary.validateContent(content);
if (!validation.isValid) {
  // Handle validation issues
  content = fixValidationIssues(content, validation.issues);
}

// 3. Process through custom markdown processor
const html = customProcessor.process(content);

// 4. Result: Clean, validated, formatted HTML for PDFs

Dynamic Pattern Updates:

typescript
// Add new business pattern dynamically
// (requires BUSINESS_PATTERNS to be typed as an extensible record; with an
// inferred object-literal type, TypeScript rejects the unknown key)
RegexPatternLibrary.BUSINESS_PATTERNS.newMetric = {
  pattern: /\[newmetric:\s*([^\]]+)\]/gi,
  description: 'New metric type',
  examples: ['[newmetric:engagement:85%]']
};

// Processors created after this point read the updated library
const processor = new CustomMarkdownProcessor();
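
Direct assignment works, but a small registration helper can catch mistakes early. The sketch below is an assumption about how that could look: `registerPattern`, `PatternDefinition`, and the `businessPatterns` record are hypothetical names. The helper refuses duplicate names and rejects a definition whose own examples do not match its pattern.

```typescript
interface PatternDefinition {
  pattern: RegExp;
  description: string;
  examples: string[];
}

// Stands in for a mutable pattern category such as BUSINESS_PATTERNS.
const businessPatterns: Record<string, PatternDefinition> = {};

function registerPattern(
  registry: Record<string, PatternDefinition>,
  name: string,
  def: PatternDefinition
): void {
  if (Object.prototype.hasOwnProperty.call(registry, name)) {
    throw new Error(`Pattern "${name}" is already registered`);
  }
  // Probe with a copy that has no g flag, so test() keeps no state.
  const probe = new RegExp(def.pattern.source, def.pattern.flags.replace("g", ""));
  const failing = def.examples.filter((example) => !probe.test(example));
  if (failing.length > 0) {
    throw new Error(`Examples do not match "${name}": ${failing.join(", ")}`);
  }
  registry[name] = def;
}

registerPattern(businessPatterns, "newMetric", {
  pattern: /\[newmetric:\s*([^\]]+)\]/gi,
  description: "New metric type",
  examples: ["[newmetric:engagement:85%]"],
});
```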

Impact on PDF Generation

Before: Scattered Patterns

  • Multiple regex definitions across files
  • Inconsistent processing
  • Hard to maintain and debug
  • Performance overhead from pattern recompilation

After: Centralized System

  • Single regex configuration location
  • Consistent business markup processing
  • Easy spacing and formatting control
  • Performance-optimized pattern caching
  • Comprehensive testing and validation
  • Clean, professional PDF output

Future Enhancements Made Easy

Want to add new business markup? Just update the pattern library:

typescript
// Add to RegexPatternLibrary.ts
forecast: {
  pattern: /\[forecast:\s*([^:]+):\s*([^\]]+)\]/gi,
  description: 'Financial forecasts: [forecast:Q4:$3.2M]',
  examples: ['[forecast:Q4:$3.2M]', '[forecast:2024:Growth expected]']
}

That's it! All processors automatically use the new pattern.

Spacing Control

The centralized system makes spacing adjustments trivial:

typescript
// In ProcessorOptions
paragraphSpacing: '16px',    // Space between paragraphs
sectionSpacing: '24px',      // Space between sections  
listItemSpacing: '8px',      // Space between list items
businessItemSpacing: '12px'  // Space around BI elements
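
As a sketch of how these options might reach the rendered output, the function below merges overrides with the defaults listed above and emits CSS rules. The `SpacingOptions` interface, the `spacingCss` function, and the selectors (including `.bi-item`) are assumptions for illustration; the actual processor may apply spacing differently.

```typescript
interface SpacingOptions {
  paragraphSpacing: string;
  sectionSpacing: string;
  listItemSpacing: string;
  businessItemSpacing: string;
}

// Default values taken from the options shown above.
const DEFAULT_SPACING: SpacingOptions = {
  paragraphSpacing: "16px",
  sectionSpacing: "24px",
  listItemSpacing: "8px",
  businessItemSpacing: "12px",
};

// Hypothetical selectors; only the overridden values differ from the defaults.
function spacingCss(overrides: Partial<SpacingOptions> = {}): string {
  const s: SpacingOptions = { ...DEFAULT_SPACING, ...overrides };
  return [
    `p { margin-bottom: ${s.paragraphSpacing}; }`,
    `section { margin-bottom: ${s.sectionSpacing}; }`,
    `li { margin-bottom: ${s.listItemSpacing}; }`,
    `.bi-item { margin: ${s.businessItemSpacing} 0; }`,
  ].join("\n");
}

const css = spacingCss({ paragraphSpacing: "20px" });
```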

Benefits Summary

The CustomMarkdownProcessor as a centralized regex hub provides:

  • Single point of control for all text processing
  • Performance gains from pattern optimization
  • Consistency across your entire platform
  • Easy maintenance and pattern updates
  • Professional output quality
  • Future-proof extensibility

This architecture lets new text-processing rules be added in one place while maintaining clean, consistent output across all business intelligence reports.

Strategic Intelligence Hub Documentation