Centralized Regex Pattern System - COMPLETE
Single Source of Truth Architecture
The CustomMarkdownProcessor acts as the central point for regex configuration across the platform: it draws every pattern from one shared library, RegexPatternLibrary, instead of defining its own.
Architecture Overview
RegexPatternLibrary.ts - Central Pattern Repository
typescript
// Single location for ALL regex patterns across the platform
export class RegexPatternLibrary {
  static readonly MARKDOWN_PATTERNS = { ... };   // Core markdown
  static readonly BUSINESS_PATTERNS = { ... };   // BI markup
  static readonly VISUAL_PATTERNS = { ... };     // Charts & visuals
  static readonly CLEANING_PATTERNS = { ... };   // Content cleaning
  static readonly ARTIFACT_PATTERNS = { ... };   // AI content fixes
  static readonly VALIDATION_PATTERNS = { ... }; // Input validation
}
Pattern Categories & Examples
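The later examples in this document give each pattern a `pattern`, a `description`, and a list of `examples`. As a rough sketch of how one category could be typed and grouped, assuming only those three fields: the `PatternEntry` interface, the `PatternCategory` type, and the `listPatternNames` helper are hypothetical names, not part of the original library.

```typescript
// Hypothetical shape of one library entry, inferred from the fields used
// elsewhere in this document (pattern, description, examples).
interface PatternEntry {
  pattern: RegExp;
  description: string;
  examples: string[];
}

// A category is a map from pattern name to entry (assumed layout).
type PatternCategory = { [entryName: string]: PatternEntry };

// Illustrative category with a single entry; the regex is the forecast
// pattern that appears later in this document.
const sampleBusinessPatterns: PatternCategory = {
  forecast: {
    pattern: /\[forecast:\s*([^:]+):\s*([^\]]+)\]/gi,
    description: 'Financial forecasts: [forecast:Q4:$3.2M]',
    examples: ['[forecast:Q4:$3.2M]'],
  },
};

// Hypothetical helper: lists the pattern names defined in a category.
function listPatternNames(category: PatternCategory): string[] {
  return Object.keys(category).sort();
}

const entryNames = listPatternNames(sampleBusinessPatterns);
```

Keeping the description and examples next to each regex means the documentation and the pattern travel together, which is what makes the testing utilities described later possible.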
1. Business Intelligence Patterns
javascript
// Financial metrics with trends
[metric:revenue:$2.5M:15%↑] // Revenue with upward trend
[kpi:satisfaction:4.8/5:stable] // KPI with status
// Recommendations with priorities
[recommendation:high:Implement new strategy]
[action:urgent:Review pricing by month end]
// Risk and confidence indicators
[risk:medium:Market volatility may impact Q4]
[confidence:high]
// Timeline and scheduling
[timeline:Q1:Launch|Q2:Expansion|Q3:Review]
[deadline:End of Q4]
2. Visual & Chart Patterns
javascript
// Chart references that preserve embedding
[chart:revenue_trend:Revenue Growth Over Time]
[visual:org_chart:Team Structure Diagram]
[infographic:process_flow:Sales Process]
[presentation:quarterly_review:Q4 Review Deck]
3. Content Cleaning Patterns
javascript
// Removes problematic artifacts from AI-generated content
colorCodeInjection: /[a-fA-F0-9]{6};">/g // Fixes: 1f4788;">
malformedStyles: /style="[^"]*[a-fA-F0-9]{6}[^"]*"/g
brokenAttributes: /\w+;">/g // Fixes: color;">
trailingAsterisks: /\*\*\s*$/gm // Fixes: Bold text **
4. AI Artifact Patterns (Fixes Common AI Issues)
javascript
// Fixes malformed headers
malformedHeaders: /^(#{1,6})\s*([^#\n]+)\s*\*\*\s*$/gm
// Before: ## Section Title **
// After: ## Section Title
// Fixes incomplete markdown: a line whose only asterisk opens an italic run that is never closed
incompleteMarkdown: /^([^*\n]*)\*(?=\S)([^*\n]+)$/gm
// Before: *incomplete italic
// After: *incomplete italic*
5. Validation Patterns
javascript
validImageUrl: /^(https?:\/\/|data:image\/)/i
validChartReference: /^[a-zA-Z0-9_-]+$/
validMetricValue: /^[\$€£¥]?[\d,]+(?:\.\d+)?[%]?$/
validPriorityLevel: /^(critical|high|medium|low)$/i
Usage Examples
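As a first usage sketch, the validation patterns listed above can be combined into a small field checker. The three regexes are copied from that list; the `checkMetricFields` function and its messages are hypothetical and only illustrate how the patterns behave.

```typescript
// Patterns copied from the validation list in this document.
const validChartReference = /^[a-zA-Z0-9_-]+$/;
const validMetricValue = /^[\$€£¥]?[\d,]+(?:\.\d+)?[%]?$/;
const validPriorityLevel = /^(critical|high|medium|low)$/i;

// Hypothetical helper: collects one message for every field that fails its pattern.
function checkMetricFields(chartId: string, value: string, priority: string): string[] {
  const problems: string[] = [];
  if (!validChartReference.test(chartId)) {
    problems.push('invalid chart reference: ' + chartId);
  }
  if (!validMetricValue.test(value)) {
    problems.push('invalid metric value: ' + value);
  }
  if (!validPriorityLevel.test(priority)) {
    problems.push('invalid priority: ' + priority);
  }
  return problems;
}

const okFields = checkMetricFields('revenue_trend', '$1,200.50', 'High');  // no problems
const badFields = checkMetricFields('revenue trend', '$2.5M', 'urgent');   // three problems
```

As written, `validMetricValue` rejects values with a magnitude suffix such as `$2.5M`, so metric values like the ones in the business markup examples would need a looser pattern or a normalization step before validation.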
In CustomMarkdownProcessor
typescript
// OLD: Multiple scattered patterns
private readonly businessPatterns = { ... };
private readonly corePatterns = { ... };
// NEW: Single centralized source
private readonly patterns = RegexPatternLibrary.getAllPatterns();
// Usage
// replace() returns a new string, so the result must be assigned
html = html.replace(this.patterns.business.metric.pattern, '$1');
html = html.replace(this.patterns.visual.chart.pattern, '<div>Chart: $1</div>');
For Content Cleaning Across Platform
typescript
// Clean AI-generated content anywhere
const cleanedContent = RegexPatternLibrary.cleanContent(rawAIContent);
// Validate content quality
const validation = RegexPatternLibrary.validateContent(userInput);
if (!validation.isValid) {
  console.log('Issues found:', validation.issues);
}
For Testing Patterns
typescript
// Test any pattern against sample content
const result = RegexPatternLibrary.testPattern(
  /\[metric:\s*([^:]+):\s*([^:]+)\]/gi,
  "Revenue [metric:sales:$1.2M] increased significantly"
);
// Result: { matches: [...], count: 1, examples: ["[metric:sales:$1.2M]"] }
Benefits of Centralized System
Single Source of Truth
- All patterns defined in one location
- No scattered regex across multiple files
- Easy to maintain and update
Performance Optimized
- Compiled patterns cached and reused
- No duplicate pattern compilation
- Less repeated pattern setup in components that share the library
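One caveat applies when a single compiled pattern is shared: in JavaScript, a `RegExp` with the `g` flag keeps a `lastIndex` cursor between `test()` and `exec()` calls, so a shared instance can give alternating results. The sketch below shows the effect and one way to guard against it. The chart pattern is an illustrative guess at the chart markup (the library's actual pattern is not shown in this document), and `freshCopy` is a hypothetical helper.

```typescript
// Illustrative global-flag pattern for the [chart:id:title] markup.
const sharedChartPattern = /\[chart:([^:\]]+):([^\]]+)\]/g;
const chartSample = '[chart:revenue_trend:Revenue Growth Over Time]';

// test() on a /g regex advances lastIndex, so the second call starts
// searching after the first match and finds nothing.
const firstCall = sharedChartPattern.test(chartSample);
const secondCall = sharedChartPattern.test(chartSample);
sharedChartPattern.lastIndex = 0; // leave the shared instance in a clean state

// Hypothetical guard: hand out a copy so callers never share lastIndex state.
// new RegExp(regex) copies the source and flags and starts with lastIndex 0.
function freshCopy(pattern: RegExp): RegExp {
  return new RegExp(pattern);
}

const copyFirst = freshCopy(sharedChartPattern).test(chartSample);
const copySecond = freshCopy(sharedChartPattern).test(chartSample);
```

Here `firstCall` is true and `secondCall` is false, while both copies match. `String.prototype.replace` always starts a global pattern from index 0, so replace-only usage like the processor examples in this document is not affected.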
Consistency Guaranteed
- Same processing logic everywhere
- Standardized pattern naming
- Unified behavior across components
Easy Testing & Debugging
- Built-in pattern testing utilities
- Validation and cleanup functions
- Clear pattern documentation with examples
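A testing utility of this kind could be sketched as follows. The result fields mirror the `{ matches, count, examples }` shape shown in the testing example earlier; the implementation itself, including the names `testPatternSketch` and `PatternTestResult`, is an assumption and not the library's actual code.

```typescript
interface PatternTestResult {
  matches: string[][]; // captured groups, one array per match
  count: number;
  examples: string[];  // full matched substrings
}

function testPatternSketch(pattern: RegExp, text: string): PatternTestResult {
  // Work on a global copy so exec() can iterate and the caller's
  // lastIndex is never touched.
  const flags = 'g' + (pattern.ignoreCase ? 'i' : '') + (pattern.multiline ? 'm' : '');
  const re = new RegExp(pattern.source, flags);
  const matches: string[][] = [];
  const examples: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    examples.push(m[0]);
    matches.push(m.slice(1));
    if (m[0].length === 0) {
      re.lastIndex++; // avoid an infinite loop on empty matches
    }
  }
  return { matches: matches, count: examples.length, examples: examples };
}

const metricTest = testPatternSketch(
  /\[metric:\s*([^:]+):\s*([^:]+)\]/gi,
  'Revenue [metric:sales:$1.2M] increased significantly'
);
```

For this input, `metricTest.count` is 1 and the captured groups are `sales` and `$1.2M`.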
Extensible Architecture
- Easy to add new pattern categories
- Simple to extend existing patterns
- Modular pattern organization
Integration Points
Current Usage:
- CustomMarkdownProcessor - Core BI markup processing
- Content cleaning - AI artifact removal
- Validation - Input sanitization
- Chart processing - Visual content handling
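As an illustration of the content-cleaning step, the patterns from the Content Cleaning and AI Artifact sections can be applied in sequence. This is a minimal sketch: `cleanContentSketch` and the order of the passes are assumptions, and the real `cleanContent` may behave differently.

```typescript
// Patterns copied from the cleaning and artifact sections of this document.
const colorCodeInjection = /[a-fA-F0-9]{6};">/g;
const trailingAsterisks = /\*\*\s*$/gm;
const malformedHeaders = /^(#{1,6})\s*([^#\n]+)\s*\*\*\s*$/gm;

function cleanContentSketch(raw: string): string {
  return raw
    // Rebuild headers first so "## Title **" becomes "## Title".
    .replace(malformedHeaders, (_match: string, hashes: string, title: string) => {
      return hashes + ' ' + title.trim();
    })
    // Drop injected colour fragments such as 1f4788;">
    .replace(colorCodeInjection, '')
    // Remove a dangling ** at the end of a line.
    .replace(trailingAsterisks, '');
}

const dirtyText = '## Section Title **\n1f4788;">Revenue grew 15%';
const cleanedText = cleanContentSketch(dirtyText);
```

With this input, `cleanedText` is the header `## Section Title` followed by the line `Revenue grew 15%`.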
Future Expansion Possibilities:
- Email processing - Extract and format business data from emails
- Report validation - Ensure consistent formatting across reports
- Data extraction - Pull metrics from unstructured text
- Content migration - Convert legacy formats to new markup
- Quality control - Automated content quality checks
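The data-extraction idea, for instance, could reuse the metric markup directly. The sketch below pulls metrics out of free text, treating the optional third field as a trend, as in the `[metric:revenue:$2.5M:15%↑]` example. The regex, the `ExtractedMetric` type, and the `extractMetrics` function are all hypothetical and only show one possible approach.

```typescript
interface ExtractedMetric {
  metricName: string;
  metricValue: string;
  trend: string | null; // third field, when present
}

function extractMetrics(text: string): ExtractedMetric[] {
  // [^:\]]+ keeps each field inside its own tag; the trend group is optional.
  const re = /\[metric:\s*([^:\]]+):\s*([^:\]]+)(?::\s*([^\]]+))?\]/gi;
  const found: ExtractedMetric[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    found.push({
      metricName: m[1].trim(),
      metricValue: m[2].trim(),
      trend: m[3] ? m[3].trim() : null,
    });
  }
  return found;
}

const extractedMetrics = extractMetrics(
  'Summary: [metric:revenue:$2.5M:15%↑] and [metric:sales:$1.2M] were reported.'
);
```

The regex is created inside the function, so each call starts with a fresh `lastIndex`.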
Real-World Usage Examples
Content Processing Pipeline:
typescript
// 1. Clean raw AI content
let content = RegexPatternLibrary.cleanContent(rawAIOutput);
// 2. Validate content quality
const validation = RegexPatternLibrary.validateContent(content);
if (!validation.isValid) {
  // Handle validation issues
  content = fixValidationIssues(content, validation.issues);
}
// 3. Process through custom markdown processor
const html = customProcessor.process(content);
// 4. Result: Clean, validated, formatted HTML for PDFs
Dynamic Pattern Updates:
typescript
// Add new business pattern dynamically
RegexPatternLibrary.BUSINESS_PATTERNS.newMetric = {
  pattern: /\[newmetric:\s*([^\]]+)\]/gi,
  description: 'New metric type',
  examples: ['[newmetric:engagement:85%]']
};
// Use immediately across all processors
const processor = new CustomMarkdownProcessor();
// Will automatically pick up new pattern
Impact on PDF Generation
Before: Scattered Patterns
- Multiple regex definitions across files
- Inconsistent processing
- Hard to maintain and debug
- Performance overhead from pattern recompilation
After: Centralized System
- Single regex configuration location
- Consistent business markup processing
- Easy spacing and formatting control
- Performance-optimized pattern caching
- Comprehensive testing and validation
- Clean, professional PDF output
Future Enhancements Made Easy
Want to add new business markup? Just update the pattern library:
typescript
// Add to RegexPatternLibrary.ts
forecast: {
  pattern: /\[forecast:\s*([^:]+):\s*([^\]]+)\]/gi,
  description: 'Financial forecasts: [forecast:Q4:$3.2M]',
  examples: ['[forecast:Q4:$3.2M]', '[forecast:2024:Growth expected]']
}
That's it! All processors automatically use the new pattern.
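To make the effect concrete, here is a small sketch that applies the `forecast` pattern above to a sentence. The HTML produced, the `forecast-item` class name, and the `renderForecasts` function are illustrative assumptions; the real processor's output is not specified in this document.

```typescript
// The forecast pattern from the example above.
const forecastPattern = /\[forecast:\s*([^:]+):\s*([^\]]+)\]/gi;

// Hypothetical renderer: wraps each forecast tag in a span.
function renderForecasts(text: string): string {
  // $1 is the period (for example Q4) and $2 is the forecast text.
  return text.replace(forecastPattern, '<span class="forecast-item">$1: $2</span>');
}

const renderedForecast = renderForecasts('Outlook: [forecast:Q4:$3.2M] for the year.');
```

In a real processor the captured text would normally be HTML-escaped before being inserted into the output.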
Spacing Control
The centralized system makes spacing adjustments trivial:
typescript
// In ProcessorOptions
paragraphSpacing: '16px', // Space between paragraphs
sectionSpacing: '24px', // Space between sections
listItemSpacing: '8px', // Space between list items
businessItemSpacing: '12px' // Space around BI elements
Benefits Summary
Using the CustomMarkdownProcessor together with the centralized RegexPatternLibrary provides:
- Single point of control for all text processing
- Performance gains from pattern optimization
- Consistency across your entire platform
- Easy maintenance and pattern updates
- Professional output quality
- Future-proof extensibility
This architecture keeps text processing extensible while maintaining clean, consistent output across all business intelligence reports.