AI Consulting Platform - Administrative Dashboard Guide

🎯 Overview

The AI Consulting Platform Administrative Dashboard gives system administrators real-time visibility into execution health, failure patterns, and performance trends, along with automated troubleshooting recommendations.

The dashboard is designed to pinpoint failure sources quickly during orchestrator execution, with a particular focus on external service dependencies (D1 Database, R2 Storage, AI Services).


📊 Dashboard API Endpoints

1. System Health Overview

Endpoint: GET /api/admin/dashboard/overview

Purpose: High-level system health monitoring and performance metrics

Key Features:

  • Health Score (0-100): Overall system performance rating
  • Success Rate: Percentage of successful executions over last 7 days
  • Performance Metrics: Average processing times and execution counts
  • Error Source Analysis: Categorized failures by service type
  • Real-time Alerts: Automated warnings for system issues
  • Hourly Performance Trends: 24-hour execution pattern analysis

Sample Response:

json
{
  "dashboard": {
    "health": {
      "score": 87,
      "status": "GOOD",
      "total_executions": 156,
      "success_rate": 87.18,
      "avg_processing_time_ms": 45230
    },
    "failures": {
      "by_source": [
        {"error_source": "D1_DATABASE", "error_count": 8},
        {"error_source": "AI_SERVICE", "error_count": 5},
        {"error_source": "R2_STORAGE", "error_count": 2}
      ]
    },
    "alerts": [
      {
        "level": "MEDIUM",
        "type": "DATABASE_ISSUES", 
        "message": "Database errors detected: 8 occurrences",
        "recommendation": "Check D1 database connectivity and schema integrity"
      }
    ]
  }
}
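
As a minimal sketch of how a client might read this response, the helper below totals the categorized errors and picks the most frequent source. The function name `summarizeFailureSources` is hypothetical; only the field names are taken from the sample response.

```javascript
// Hypothetical helper: summarizes the `failures.by_source` array of the overview response.
function summarizeFailureSources(overview) {
  const sources = overview?.dashboard?.failures?.by_source ?? [];
  const total = sources.reduce((sum, s) => sum + s.error_count, 0);
  // Sort a copy so the caller's array is not mutated.
  const [top] = [...sources].sort((a, b) => b.error_count - a.error_count);
  return { total, topSource: top ? top.error_source : null };
}

// Data copied from the sample response.
const sample = {
  dashboard: {
    failures: {
      by_source: [
        { error_source: 'D1_DATABASE', error_count: 8 },
        { error_source: 'AI_SERVICE', error_count: 5 },
        { error_source: 'R2_STORAGE', error_count: 2 },
      ],
    },
  },
};
console.log(summarizeFailureSources(sample));
```

A summary like this can drive a "top offender" badge in the health widget without re-reading the full alert list.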

2. Comprehensive Failure Analysis

Endpoint: GET /api/admin/dashboard/failures

Purpose: Detailed failure analysis and troubleshooting guidance

Query Parameters:

  • timeframe: 24h (default), 7d, 30d

Key Features:

  • Error Pattern Detection: Automatic categorization of failure types
  • Step Failure Analysis: Identification of problematic orchestrator steps
  • Troubleshooting Recommendations: Automated suggestions based on patterns
  • Affected Client Context: Company and industry impact analysis
  • Failure Timeline: Historical failure tracking

Error Pattern Categories:

  • DATABASE_CONSTRAINT: Schema/constraint violations
  • SCHEMA_MISSING: Missing database tables
  • D1_CONNECTION: Database connectivity issues
  • R2_CONNECTION: Storage access problems
  • AI_SERVICE: AI model generation failures
  • TIMEOUT: Execution timeouts
  • RATE_LIMIT: Service quota exceeded
  • OTHER: Uncategorized failures
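
The matching rules behind these categories are not documented in this guide. The sketch below shows one way such a classifier could work; the regular expressions and the `classifyError` name are assumptions for illustration, not the platform's actual logic.

```javascript
// Hypothetical rules: the real matching logic is not documented here.
// Order matters: the first matching rule wins.
const PATTERN_RULES = [
  ['DATABASE_CONSTRAINT', /constraint failed/i],
  ['SCHEMA_MISSING', /no such table/i],
  ['RATE_LIMIT', /rate limit|quota exceeded/i],
  ['TIMEOUT', /timed? ?out/i],
  ['D1_CONNECTION', /\bD1\b/i],
  ['R2_CONNECTION', /\bR2\b/i],
  ['AI_SERVICE', /\bAI\b|model|inference/i],
];

// Returns the first matching category name, or 'OTHER' when nothing matches.
function classifyError(message) {
  const match = PATTERN_RULES.find(([, regex]) => regex.test(message));
  return match ? match[0] : 'OTHER';
}
```

Placing the specific rules (constraint, missing table, rate limit) ahead of the broad service rules keeps a message such as "AI quota exceeded" in RATE_LIMIT rather than AI_SERVICE.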

Sample Response:

json
{
  "failure_analysis": {
    "error_patterns": [
      {
        "error_pattern": "DATABASE_CONSTRAINT",
        "occurrences": 12,
        "latest_occurrence": "2024-01-15T14:30:00Z",
        "sample_messages": "NOT NULL constraint failed: orchestrator_processing_log.processing_step"
      }
    ],
    "troubleshooting_recommendations": [
      {
        "issue": "Database Constraint Violations",
        "count": 12,
        "recommendation": "Run schema migration: migrations/add_workflow_logger_columns.sql",
        "priority": "HIGH"
      }
    ]
  }
}
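
When several recommendations come back, a client will usually want the most urgent first. The following sketch orders `troubleshooting_recommendations` by priority and then by count. The MEDIUM and LOW values are assumed, since only "HIGH" appears in the sample response, and `sortRecommendations` is a hypothetical name.

```javascript
// Assumed priority scale; only "HIGH" appears in the sample response.
const PRIORITY_RANK = { HIGH: 0, MEDIUM: 1, LOW: 2 };

function sortRecommendations(recommendations) {
  // Unknown priorities sort last.
  const rank = (r) => PRIORITY_RANK[r.priority] ?? Number.MAX_SAFE_INTEGER;
  // Sort a copy: highest priority first, then most frequent issue first.
  return [...recommendations].sort(
    (a, b) => rank(a) - rank(b) || b.count - a.count
  );
}
```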

3. Bulk Execution Monitoring

Endpoint: GET /api/admin/dashboard/executions

Purpose: Recent execution monitoring with bulk status overview

Query Parameters:

  • limit: Number of executions to return (default: 50, max: 100)
  • status: Filter by status (all, completed, failed, processing)

Key Features:

  • Recent Executions: Detailed list with progress tracking
  • Long-running Detection: Executions stuck processing >10 minutes
  • Progress Visualization: Step completion percentages
  • Client Context: Company names and industry identification
  • Performance Metrics: Duration analysis and trends
  • Health Status Indicators: Visual status classification

Sample Response:

json
{
  "executions": {
    "summary": {
      "total": 45,
      "completed": 38,
      "failed": 5,
      "processing": 2,
      "average_duration_ms": 42150,
      "long_running_count": 1
    },
    "recent": [
      {
        "tracking_id": "M4A4SSPV",
        "processing_status": "completed",
        "company_name": "TechCorp Inc",
        "industry_id": "technology",
        "progress_percentage": 100,
        "health_status": "HEALTHY",
        "duration_ms": 38420
      }
    ],
    "long_running": [
      {
        "tracking_id": "DJPN8WSX", 
        "running_seconds": 1240,
        "processing_status": "processing"
      }
    ]
  }
}
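
The query parameters for this endpoint can be assembled defensively on the client. The sketch below validates `status` and clamps `limit` to the documented range before building the URL. The helper name is hypothetical, and whether the server itself clamps or rejects out-of-range values is not stated in this guide.

```javascript
const VALID_STATUSES = ['all', 'completed', 'failed', 'processing'];

// Hypothetical helper: builds the request URL from the documented parameters.
function buildExecutionsUrl({ limit = 50, status = 'all' } = {}) {
  if (!VALID_STATUSES.includes(status)) {
    throw new RangeError(`Unknown status filter: ${status}`);
  }
  // Clamp to 1-100 (default 50, max 100 per the parameter list).
  const parsed = Number.parseInt(limit, 10);
  const safeLimit = Number.isNaN(parsed) ? 50 : Math.min(100, Math.max(1, parsed));
  const params = new URLSearchParams({ status, limit: String(safeLimit) });
  return `/api/admin/dashboard/executions?${params}`;
}
```

For example, `buildExecutionsUrl({ status: 'failed', limit: 250 })` produces a request for at most 100 failed executions.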

🚨 Alert System & Health Indicators

Health Score Calculation

  • 90-100: EXCELLENT - System operating optimally
  • 85-89: GOOD - Minor issues, monitoring recommended
  • 70-84: FAIR - Performance degradation detected
  • <70: POOR - Immediate attention required
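
The bands above translate directly into a lookup. This is a sketch of the band mapping only; how the score itself is computed is not covered in this guide, and `healthStatus` is a hypothetical name.

```javascript
// Maps a 0-100 health score to the status bands listed above.
function healthStatus(score) {
  if (!Number.isFinite(score) || score < 0 || score > 100) {
    throw new RangeError('score must be a number between 0 and 100');
  }
  if (score >= 90) return 'EXCELLENT';
  if (score >= 85) return 'GOOD';
  if (score >= 70) return 'FAIR';
  return 'POOR';
}
```

With the sample overview response, a score of 87 falls in the GOOD band, which matches the `status` field shown there.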

Automated Alert Types

High Priority Alerts

  • FAILURE_RATE: >20% failure rate in recent executions
  • AI_SERVICE_ISSUES: >10 AI service failures detected
  • SCHEMA_MISSING: Critical database schema errors

Medium Priority Alerts

  • DATABASE_ISSUES: >5 D1 database errors detected
  • STORAGE_ISSUES: >3 R2 storage errors detected
  • LONG_RUNNING: Executions stuck >10 minutes
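
To make the thresholds concrete, the sketch below evaluates them against a set of counters. The input field names (`failureRatePct`, `aiErrors`, and so on) and the rule that any schema error or any long-running execution raises an alert are assumptions; the numeric thresholds come from the lists above.

```javascript
// Thresholds copied from the lists above; the metrics shape is an assumption.
const ALERT_RULES = [
  { type: 'FAILURE_RATE', level: 'HIGH', test: (m) => m.failureRatePct > 20 },
  { type: 'AI_SERVICE_ISSUES', level: 'HIGH', test: (m) => m.aiErrors > 10 },
  { type: 'SCHEMA_MISSING', level: 'HIGH', test: (m) => m.schemaMissingErrors > 0 },
  { type: 'DATABASE_ISSUES', level: 'MEDIUM', test: (m) => m.d1Errors > 5 },
  { type: 'STORAGE_ISSUES', level: 'MEDIUM', test: (m) => m.r2Errors > 3 },
  { type: 'LONG_RUNNING', level: 'MEDIUM', test: (m) => m.longRunningCount > 0 },
];

// Returns the alerts whose rule matches, in the order listed above.
function deriveAlerts(metrics) {
  return ALERT_RULES
    .filter((rule) => rule.test(metrics))
    .map(({ type, level }) => ({ type, level }));
}
```

Keeping the rules in a table makes it easy to tune a threshold without touching the evaluation logic.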

Alert Response Actions

  1. Immediate: Check Cloudflare logs for error details
  2. Investigate: Review failure patterns in dashboard
  3. Resolve: Apply recommended troubleshooting steps
  4. Monitor: Track system health improvement

🔧 Troubleshooting Guide

Common Issues & Solutions

1. Database Constraint Violations

Symptoms:

  • Error: "NOT NULL constraint failed: orchestrator_processing_log.processing_step"
  • High DATABASE_CONSTRAINT error pattern count

Solution:

bash
# Run database migration
wrangler d1 execute STRATEGIC_INTELLIGENCE_DB --file=migrations/add_workflow_logger_columns.sql

Prevention:

  • Verify schema integrity before major deployments
  • Monitor constraint violation alerts

2. AI Service Rate Limiting

Symptoms:

  • Multiple AI_SERVICE failures
  • Error messages containing "rate limit" or "quota exceeded"

Solutions:

  • Implement exponential backoff in AI service calls
  • Monitor Cloudflare AI service quotas
  • Consider request batching optimization
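
A minimal sketch of the exponential backoff suggestion follows. It retries only on errors whose message mentions a rate limit or quota, doubling the wait each time up to a cap. The function names, the default delays, and the error-message check are assumptions; adapt them to the error shape your AI client actually throws.

```javascript
const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Delay before retry number `attempt` (0-based): baseMs * 2^attempt, capped at maxMs.
function backoffDelayMs(attempt, baseMs = 500, maxMs = 8000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Hypothetical check based on the symptom text above.
function isRateLimitError(err) {
  return /rate limit|quota exceeded/i.test(String(err && err.message));
}

// Runs `callAi` (any async function), retrying on rate-limit errors only.
async function withBackoff(callAi, { retries = 4, sleep = defaultSleep } = {}) {
  for (let attempt = 0; ; attempt += 1) {
    try {
      return await callAi();
    } catch (err) {
      // Give up when retries are exhausted or the error is not a rate limit.
      if (attempt >= retries || !isRateLimitError(err)) throw err;
      await sleep(backoffDelayMs(attempt));
    }
  }
}
```

In production it is common to add random jitter to each delay so that concurrent executions do not retry in lockstep.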

3. R2 Storage Access Issues

Symptoms:

  • R2_STORAGE error pattern increases
  • Storage processing steps failing consistently

Solutions:

  • Verify R2 bucket permissions and accessibility
  • Check CORS configuration for admin interface
  • Monitor R2 usage quotas

4. Long-Running Executions

Symptoms:

  • Executions stuck in 'processing' status >10 minutes
  • No recent WorkflowLogger updates

Investigation Steps:

  1. Check current step via status API
  2. Review WorkflowLogger for last successful step
  3. Identify bottleneck (D1, R2, or AI service)
  4. Consider manual intervention or restart
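
Steps 1 and 2 can be partly automated once the WorkflowLogger entries for an execution are in hand. The sketch below reports the last completed step and the first step that has not completed. It assumes each entry carries `step_number`, `step_name`, and `status`, as in the Logging Metadata example later in this guide; status values other than "completed" are assumptions.

```javascript
// Given WorkflowLogger entries for one execution, report where progress stopped.
function locateStall(entries) {
  const ordered = [...entries].sort((a, b) => a.step_number - b.step_number);
  const lastCompleted = ordered.filter((e) => e.status === 'completed').pop() ?? null;
  const firstIncomplete = ordered.find((e) => e.status !== 'completed') ?? null;
  return {
    lastCompletedStep: lastCompleted ? lastCompleted.step_name : null,
    stalledStep: firstIncomplete ? firstIncomplete.step_name : null,
  };
}
```

The stalled step name usually hints at the bottleneck: an AI step points at the AI service, a storage step at R2, and a profile step at D1.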

Database Schema Verification

Use the schema check endpoint to verify WorkflowLogger compatibility:

bash
# Replace <your-worker-host> with the hostname of your deployment
curl "https://<your-worker-host>/health?schema=check"

Expected result: isWorkflowLoggerCompatible: true


📈 Performance Optimization Insights

Key Performance Indicators (KPIs)

Execution Performance

  • Target Average Duration: <60 seconds for comprehensive analysis
  • Success Rate Target: >95% for production stability
  • Step Failure Rate: <2% per individual step
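
The first two targets can be checked against the `health` object of the overview response. This is a sketch under that assumption; the helper name is hypothetical, and the step failure rate is left out because the documented responses do not expose it.

```javascript
// Targets from the list above.
const TARGETS = { maxAvgDurationMs: 60000, minSuccessRatePct: 95 };

// `health` mirrors `dashboard.health` from the overview response.
function checkExecutionKpis(health) {
  return {
    durationOk: health.avg_processing_time_ms < TARGETS.maxAvgDurationMs,
    successRateOk: health.success_rate > TARGETS.minSuccessRatePct,
  };
}
```

Applied to the sample overview values (45230 ms average, 87.18% success), the duration target is met and the success-rate target is not.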

Service Reliability Metrics

  • D1 Database Uptime: >99.9%
  • R2 Storage Availability: >99.9%
  • AI Service Success Rate: >98%

Performance Optimization Strategies

1. Database Optimization

  • Index Management: Ensure proper indexing on tracking_id columns
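  • Connection Efficiency: D1 is reached through a Worker binding rather than a pooled connection, so monitor query round trips and batch related statements where possible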
  • Query Optimization: Review slow queries in processing logs

2. Storage Efficiency

  • Asset Size Monitoring: Track R2 storage usage patterns
  • Upload Optimization: Implement parallel asset processing
  • CDN Integration: Consider serving R2 assets through Cloudflare's CDN cache for asset delivery

3. AI Service Optimization

  • Model Selection: Optimize model choice for analysis depth
  • Prompt Engineering: Reduce token usage through prompt optimization
  • Batch Processing: Group similar analysis requests

📊 Dashboard Integration Recommendations

Frontend Integration

Create admin dashboard components that consume these APIs:

1. Health Overview Widget

javascript
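// Fetch system health (call this on a 30-second timer for live updates)
const healthResponse = await fetch('/api/admin/dashboard/overview');
const healthData = await healthResponse.json();
// Display health score, alerts, and key metrics from healthData.dashboard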

2. Failure Analysis Panel

javascript
// Fetch failures for a timeframe (24h, 7d, or 30d)
const failuresResponse = await fetch('/api/admin/dashboard/failures?timeframe=24h');
const failures = await failuresResponse.json();
// Show error patterns and troubleshooting recommendations

3. Execution Monitor Table

javascript
// Fetch recent executions (re-run on a timer for auto-refresh)
const executionsResponse = await fetch('/api/admin/dashboard/executions?status=all&limit=25');
const executions = await executionsResponse.json();
// Display with progress bars and health indicators
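
The snippets in this section each show a single request. One possible way to add the periodic refresh they mention is sketched below. `startPolling` and `loadOverview` are hypothetical names, and the 30-second default follows the health widget comment.

```javascript
// Calls `load` immediately, then again `intervalMs` after each attempt finishes.
// Returns a function that stops further polling.
function startPolling(load, onData, onError = console.error, intervalMs = 30000) {
  let stopped = false;
  let timer = null;

  async function tick() {
    try {
      const data = await load();
      if (!stopped) onData(data);
    } catch (err) {
      if (!stopped) onError(err);
    }
    // Schedule the next run only after this one has settled.
    if (!stopped) timer = setTimeout(tick, intervalMs);
  }

  tick();
  return function stop() {
    stopped = true;
    clearTimeout(timer);
  };
}

// Hypothetical loader for the overview endpoint.
async function loadOverview() {
  const res = await fetch('/api/admin/dashboard/overview');
  if (!res.ok) throw new Error(`Overview request failed: ${res.status}`);
  return res.json();
}
```

Chaining `setTimeout` instead of using `setInterval` avoids overlapping requests when a response takes longer than the interval. A widget would call `startPolling(loadOverview, render)` on mount, where `render` is its own display callback, and call the returned function on unmount.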

Monitoring & Alerting Integration

Cloudflare Analytics Integration

  • Set up custom metrics for dashboard KPIs
  • Configure alerts for critical thresholds
  • Track performance trends over time

Webhook Notifications

Consider implementing webhook notifications for:

  • Health score drops below 80
  • Critical alert generation
  • System-wide failure pattern detection
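
A sketch of the trigger logic for the first two conditions is shown below. It fires on the transition across 80 rather than on every reading below it, so a persistently low score does not produce repeated notifications. Treating HIGH-level alerts as "critical" is an assumption, as are the function and reason names. System-wide pattern detection is omitted because its criteria are not defined in this guide.

```javascript
const HEALTH_THRESHOLD = 80; // from the list above

// Returns the reasons (possibly none) for sending a notification.
function notificationReasons(previousScore, currentScore, alerts = []) {
  const reasons = [];
  // Fire only when the score crosses the threshold downward.
  if (previousScore >= HEALTH_THRESHOLD && currentScore < HEALTH_THRESHOLD) {
    reasons.push('HEALTH_SCORE_DROP');
  }
  // Assumption: "critical" corresponds to alerts with level HIGH.
  if (alerts.some((a) => a.level === 'HIGH')) {
    reasons.push('CRITICAL_ALERT');
  }
  return reasons;
}
```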

Data Export Capabilities

Future enhancement opportunities:

  • CSV export of failure analysis data
  • Performance report generation
  • Historical trend analysis

🔍 Advanced Administrative Features

WorkflowLogger Deep Dive

The WorkflowLogger integration provides step-by-step execution tracking:

Step Categories:

  • Profile Operations: Database queries for client data
  • AI Processing: Model inference and content generation
  • Storage Operations: R2 uploads and metadata storage
  • System Operations: Status updates and error handling

Logging Metadata:

json
{
  "step_name": "ai_financial_analysis",
  "step_number": 5,
  "status": "completed", 
  "duration_ms": 3240,
  "tokens_used": 289,
  "metadata": {
    "model": "@cf/meta/llama-3.1-8b-instruct",
    "content_length": 1847,
    "step_type": "ai_analysis"
  }
}
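
Entries in this shape are straightforward to aggregate. As an illustration, the sketch below totals duration and token usage per `metadata.step_type`; the helper name and the "unknown" fallback for entries without metadata are assumptions.

```javascript
// Totals step count, duration, and token usage per `metadata.step_type`.
function summarizeSteps(entries) {
  const totals = {};
  for (const entry of entries) {
    const type = entry.metadata?.step_type ?? 'unknown';
    if (!totals[type]) {
      totals[type] = { steps: 0, duration_ms: 0, tokens_used: 0 };
    }
    totals[type].steps += 1;
    totals[type].duration_ms += entry.duration_ms ?? 0;
    totals[type].tokens_used += entry.tokens_used ?? 0;
  }
  return totals;
}
```

A per-type breakdown like this shows at a glance whether AI processing, storage, or database steps dominate an execution's duration.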

Error Source Identification

The system automatically categorizes errors by service:

D1 Database Errors:

  • Connection timeouts
  • Schema constraint violations
  • Query syntax errors
  • Permission issues

R2 Storage Errors:

  • Upload failures
  • Access permission errors
  • Quota exceeded errors
  • Network connectivity issues

AI Service Errors:

  • Model inference failures
  • Rate limit exceeded
  • Invalid prompt formats
  • Service unavailability

🚀 Best Practices for Administrators

Daily Monitoring Routine

  1. Check Health Overview - Review overall system health score
  2. Scan Recent Failures - Identify any new failure patterns
  3. Monitor Long-Running Executions - Address stuck processes
  4. Review Performance Trends - Track system performance over time

Weekly Analysis Tasks

  1. Failure Pattern Analysis - Deep dive into recurring issues
  2. Performance Optimization - Identify bottlenecks and improvements
  3. Capacity Planning - Monitor resource usage trends
  4. System Maintenance - Apply necessary updates and fixes

Incident Response Workflow

  1. Detection: Automated alerts or manual discovery
  2. Assessment: Use dashboard APIs to understand scope
  3. Investigation: Identify root cause using error categorization
  4. Resolution: Apply appropriate troubleshooting steps
  5. Follow-up: Monitor system recovery and prevent recurrence

Proactive Monitoring

  • Set up automated health checks
  • Configure alert thresholds appropriate for your usage
  • Regularly review and update troubleshooting procedures
  • Maintain documentation of common issues and solutions

🔗 API Reference Summary

| Endpoint | Method | Purpose | Key Parameters |
| --- | --- | --- | --- |
| /api/admin/dashboard/overview | GET | System health & performance metrics | None |
| /api/admin/dashboard/failures | GET | Detailed failure analysis | timeframe: 24h, 7d, 30d |
| /api/admin/dashboard/executions | GET | Bulk execution monitoring | status: all, completed, failed, processing; limit: 1-100 |
| /api/admin/status/:trackingId | GET | Individual execution details | trackingId: specific execution |
| /api/admin/profiles | GET | Client profile management | page, limit, status, search |

📝 Changelog & Version Information

Version: 2.3.1-enhanced
Last Updated: January 2024

Features Added:

  • Comprehensive admin dashboard API system
  • Automated error categorization and troubleshooting
  • WorkflowLogger integration for step-by-step tracking
  • Real-time health monitoring and alerting
  • Performance trend analysis and optimization insights

Dependencies:

  • Cloudflare Workers Runtime
  • D1 Database (STRATEGIC_INTELLIGENCE_DB)
  • R2 Storage (STRATIQX_REPORTS)
  • Cloudflare AI Services
  • WorkflowLogger integration

For technical support or feature requests, please refer to the project repository or contact the development team.

Strategic Intelligence Hub Documentation