Skip to content

Cumulative Asset Storage Issue

📋 Summary

When executing the orchestrator multiple times for the same tracking ID, the system creates and stores new assets without removing or replacing previous ones. This results in accumulating multiple sets of assets for a single profile.


🔍 Current Behavior

The AI Consulting Platform currently exhibits cumulative asset storage behavior where:

  • Multiple orchestrator executions for the same tracking_id create new assets
  • Previous assets remain in storage and database records
  • Each execution generates a unique deliveryId
  • No cleanup or archival mechanism exists for outdated assets

📊 Evidence from Testing

Profile Analysis: M4A4SSPV

First execution: Created 2 assets (PDF + HTML) with delivery ID fallback_1754968421525_o3gbze
Second execution: Created 2 more assets with delivery ID fallback_1754968866964_owoiqx
Third execution: Created 2 more assets with delivery ID fallback_1755035925041_76rwmq

Result: 6 total assets for one profile (3 PDFs, 3 HTMLs)

API Response Evidence

From the status API response:

json
{
  "downloadLinks": [
    // Latest execution assets
    {
      "id": "asset_1755035926033_o8ug1nuyk", 
      "deliveryId": "fallback_1755035925041_76rwmq"
    },
    {
      "id": "asset_1755035926267_rf75hl6tg", 
      "deliveryId": "fallback_1755035925041_76rwmq"
    },

    // Previous execution assets
    {
      "id": "asset_1754968867989_yn16wr3xm", 
      "deliveryId": "fallback_1754968866964_owoiqx"
    },
    {
      "id": "asset_1754968867751_5tym7faoq", 
      "deliveryId": "fallback_1754968866964_owoiqx"
    },

    // Even older execution assets
    {
      "id": "asset_1754968422257_eyu1cwp2t", 
      "deliveryId": "fallback_1754968421525_o3gbze"
    },
    {
      "id": "asset_1754968422443_t7ahihtre", 
      "deliveryId": "fallback_1754968421525_o3gbze"
    }
  ]
}

🏗️ Storage Pattern Analysis

Based on the current implementation:

R2 Storage Structure

  • Assets stored at: reports/{deliveryId}/{assetId}.{extension}
  • Each execution generates a new unique deliveryId
  • Previous asset files remain in R2 storage

Database Structure

  • Records created in report_assets table with tracking_id as foreign key
  • No archival flags or version management
  • All assets for a tracking_id returned in API responses

Asset Generation Process

  1. Orchestrator completes successfully
  2. PDF and HTML assets generated
  3. New deliveryId created (timestamp-based)
  4. Assets uploaded to R2 with new delivery structure
  5. Database records inserted without cleanup

⚠️ Potential Issues

1. Storage Bloat

  • R2 storage continuously grows with duplicate content
  • Unnecessary storage of outdated reports
  • Linear growth with each re-execution

2. Database Growth

  • report_assets table accumulates redundant records
  • No cleanup mechanism for historical data
  • Query performance may degrade over time

3. API Response Size

  • Status API returns increasingly large payloads
  • Multiple versions of same report type
  • Network overhead for client applications

4. User Experience Issues

  • Confusion about which assets are current
  • UI displays multiple versions without clear indication
  • Difficulty identifying latest deliverables

5. Cost Implications

  • Unnecessary storage costs for duplicate assets
  • Bandwidth costs for large API responses
  • Resource inefficiency in asset management

When executing orchestrator for an existing tracking_id:

Database Schema Update

sql
-- Add archival columns to report_assets table
ALTER TABLE report_assets ADD COLUMN archived BOOLEAN DEFAULT false;
ALTER TABLE report_assets ADD COLUMN archived_at DATETIME DEFAULT NULL;

Implementation Logic

sql
-- Before creating new assets
UPDATE report_assets
SET archived = true, archived_at = CURRENT_TIMESTAMP
WHERE tracking_id = ? AND archived = false;

Benefits

  • Maintains historical record with archived flag
  • Only returns non-archived assets in status API
  • Optional R2 cleanup for archived assets
  • Preserves audit trail

Option 2: Replace Assets

Implementation Logic

sql
-- Delete previous assets before creating new ones
DELETE FROM report_assets WHERE tracking_id = ?;
-- Also delete corresponding R2 objects

Benefits

  • Simpler implementation approach
  • Immediate storage cleanup
  • Reduced database size

Drawbacks

  • Loses historical execution data
  • No audit trail for previous versions
  • May conflict with compliance requirements

Option 3: Versioning System

Database Schema Update

sql
-- Add version tracking columns
ALTER TABLE report_assets ADD COLUMN version INTEGER DEFAULT 1;
ALTER TABLE report_assets ADD COLUMN is_latest BOOLEAN DEFAULT true;

Implementation Logic

sql
-- Mark previous versions as non-latest
UPDATE report_assets 
SET is_latest = false 
WHERE tracking_id = ? AND is_latest = true;

-- Insert new assets with is_latest = true and incremented version

Benefits

  • Comprehensive version history
  • Flexible access to previous versions
  • Clear latest version identification
  • Future-proof for advanced features

🔧 Implementation Locations

Based on the codebase structure, changes would be needed in:

1. StorageHandler.js:321 - storeAsset() method

Add cleanup logic before storing new assets

2. AdminHandlerRoutes.js:663 - Status API query

Modify to filter archived/old assets:

sql
SELECT * FROM report_assets
WHERE tracking_id = ? AND archived = false
ORDER BY created_at DESC

3. AdminHandlerRoutes.js:501 - Storage processing step

Add pre-storage cleanup routine


📝 Minimal Code Change Suggestion

StorageHandler.js Implementation

javascript
async processOrchestratorResponse(trackingId, orchestratorData) {
    // Archive existing assets for this tracking ID
    await this.archiveExistingAssets(trackingId);

    // Continue with normal processing...
    // ... existing implementation
}

async archiveExistingAssets(trackingId) {
    try {
        // Mark existing assets as archived
        await this.db.prepare(`
            UPDATE report_assets
            SET archived = true, archived_at = CURRENT_TIMESTAMP
            WHERE tracking_id = ? AND (archived IS NULL OR archived = false)
        `).bind(trackingId).run();

        // Optionally: Schedule R2 cleanup for archived assets
        console.log(`Archived existing assets for tracking ID: ${trackingId}`);
    } catch (error) {
        console.error('Error archiving assets:', error);
        // Non-blocking - continue with new asset creation
    }
}

AdminHandlerRoutes.js Status API Update

javascript
// Modified query to exclude archived assets
const assetQuery = `
    SELECT * FROM report_assets 
    WHERE tracking_id = ? 
    AND (archived IS NULL OR archived = false)
    ORDER BY created_at DESC
`;

✅ Benefits of Addressing This Issue

Operational Benefits

  • Reduced storage costs: Elimination of duplicate asset storage
  • Improved performance: Smaller API payloads and faster queries
  • Better user experience: Clear identification of current assets
  • Maintainable history: Optional access to previous versions

Technical Benefits

  • Database efficiency: Reduced table size and improved query performance
  • Storage optimization: R2 storage usage aligned with actual needs
  • API responsiveness: Faster status endpoint responses
  • System scalability: Better resource utilization as system grows

Business Benefits

  • Cost reduction: Lower cloud storage and bandwidth costs
  • User satisfaction: Cleaner interface with relevant assets only
  • Compliance readiness: Proper asset lifecycle management
  • Operational efficiency: Reduced confusion and support overhead

🚨 Migration Considerations

Database Migration Script

sql
-- Add new columns for archival functionality
ALTER TABLE report_assets ADD COLUMN archived BOOLEAN DEFAULT false;
ALTER TABLE report_assets ADD COLUMN archived_at DATETIME DEFAULT NULL;

-- Create index for performance
CREATE INDEX idx_report_assets_archived ON report_assets(tracking_id, archived);

-- Optional: Archive assets older than latest for each tracking_id
UPDATE report_assets SET archived = true, archived_at = CURRENT_TIMESTAMP
WHERE id NOT IN (
    SELECT MAX(id) FROM report_assets GROUP BY tracking_id
);

Deployment Strategy

  1. Phase 1: Deploy database schema changes
  2. Phase 2: Update StorageHandler with archival logic
  3. Phase 3: Modify Status API to filter archived assets
  4. Phase 4: Optional R2 cleanup implementation

📊 Monitoring & Validation

Success Metrics

  • Reduced average API response size for status endpoint
  • Decreased R2 storage growth rate
  • Improved status API response times
  • Reduced duplicate asset complaints from users

Monitoring Queries

sql
-- Track archived assets ratio
SELECT 
    COUNT(*) as total_assets,
    SUM(CASE WHEN archived = true THEN 1 ELSE 0 END) as archived_assets,
    ROUND(SUM(CASE WHEN archived = true THEN 1 ELSE 0 END) * 100.0 / COUNT(*), 2) as archive_percentage
FROM report_assets;

-- Monitor asset accumulation per tracking_id
SELECT 
    tracking_id,
    COUNT(*) as total_assets,
    SUM(CASE WHEN archived = false THEN 1 ELSE 0 END) as active_assets
FROM report_assets 
GROUP BY tracking_id 
HAVING COUNT(*) > 2
ORDER BY total_assets DESC;

This issue affects system efficiency and user experience. Implementing the recommended archival solution will improve storage utilization, API performance, and overall system maintainability.

Strategic Intelligence Hub Documentation