Cumulative Asset Storage Issue
📋 Summary
When executing the orchestrator multiple times for the same tracking ID, the system creates and stores new assets without removing or replacing previous ones. This results in accumulating multiple sets of assets for a single profile.
🔍 Current Behavior
The AI Consulting Platform currently exhibits cumulative asset storage behavior where:
- Multiple orchestrator executions for the same
tracking_idcreate new assets - Previous assets remain in storage and database records
- Each execution generates a unique
deliveryId - No cleanup or archival mechanism exists for outdated assets
📊 Evidence from Testing
Profile Analysis: M4A4SSPV
First execution: Created 2 assets (PDF + HTML) with delivery ID fallback_1754968421525_o3gbze
Second execution: Created 2 more assets with delivery ID fallback_1754968866964_owoiqx
Third execution: Created 2 more assets with delivery ID fallback_1755035925041_76rwmq
Result: 6 total assets for one profile (3 PDFs, 3 HTMLs)
API Response Evidence
From the status API response:
{
"downloadLinks": [
// Latest execution assets
{
"id": "asset_1755035926033_o8ug1nuyk",
"deliveryId": "fallback_1755035925041_76rwmq"
},
{
"id": "asset_1755035926267_rf75hl6tg",
"deliveryId": "fallback_1755035925041_76rwmq"
},
// Previous execution assets
{
"id": "asset_1754968867989_yn16wr3xm",
"deliveryId": "fallback_1754968866964_owoiqx"
},
{
"id": "asset_1754968867751_5tym7faoq",
"deliveryId": "fallback_1754968866964_owoiqx"
},
// Even older execution assets
{
"id": "asset_1754968422257_eyu1cwp2t",
"deliveryId": "fallback_1754968421525_o3gbze"
},
{
"id": "asset_1754968422443_t7ahihtre",
"deliveryId": "fallback_1754968421525_o3gbze"
}
]
}🏗️ Storage Pattern Analysis
Based on the current implementation:
R2 Storage Structure
- Assets stored at:
reports/{deliveryId}/{assetId}.{extension} - Each execution generates a new unique
deliveryId - Previous asset files remain in R2 storage
Database Structure
- Records created in
report_assetstable withtracking_idas foreign key - No archival flags or version management
- All assets for a
tracking_idreturned in API responses
Asset Generation Process
- Orchestrator completes successfully
- PDF and HTML assets generated
- New
deliveryIdcreated (timestamp-based) - Assets uploaded to R2 with new delivery structure
- Database records inserted without cleanup
⚠️ Potential Issues
1. Storage Bloat
- R2 storage continuously grows with duplicate content
- Unnecessary storage of outdated reports
- Linear growth with each re-execution
2. Database Growth
report_assetstable accumulates redundant records- No cleanup mechanism for historical data
- Query performance may degrade over time
3. API Response Size
- Status API returns increasingly large payloads
- Multiple versions of same report type
- Network overhead for client applications
4. User Experience Issues
- Confusion about which assets are current
- UI displays multiple versions without clear indication
- Difficulty identifying latest deliverables
5. Cost Implications
- Unnecessary storage costs for duplicate assets
- Bandwidth costs for large API responses
- Resource inefficiency in asset management
💡 Recommended Solutions
Option 1: Archive Previous Assets (Recommended)
When executing orchestrator for an existing tracking_id:
Database Schema Update
-- Add archival columns to report_assets table
ALTER TABLE report_assets ADD COLUMN archived BOOLEAN DEFAULT false;
ALTER TABLE report_assets ADD COLUMN archived_at DATETIME DEFAULT NULL;Implementation Logic
-- Before creating new assets
UPDATE report_assets
SET archived = true, archived_at = CURRENT_TIMESTAMP
WHERE tracking_id = ? AND archived = false;Benefits
- Maintains historical record with archived flag
- Only returns non-archived assets in status API
- Optional R2 cleanup for archived assets
- Preserves audit trail
Option 2: Replace Assets
Implementation Logic
-- Delete previous assets before creating new ones
DELETE FROM report_assets WHERE tracking_id = ?;
-- Also delete corresponding R2 objectsBenefits
- Simpler implementation approach
- Immediate storage cleanup
- Reduced database size
Drawbacks
- Loses historical execution data
- No audit trail for previous versions
- May conflict with compliance requirements
Option 3: Versioning System
Database Schema Update
-- Add version tracking columns
ALTER TABLE report_assets ADD COLUMN version INTEGER DEFAULT 1;
ALTER TABLE report_assets ADD COLUMN is_latest BOOLEAN DEFAULT true;Implementation Logic
-- Mark previous versions as non-latest
UPDATE report_assets
SET is_latest = false
WHERE tracking_id = ? AND is_latest = true;
-- Insert new assets with is_latest = true and incremented versionBenefits
- Comprehensive version history
- Flexible access to previous versions
- Clear latest version identification
- Future-proof for advanced features
🔧 Implementation Locations
Based on the codebase structure, changes would be needed in:
1. StorageHandler.js:321 - storeAsset() method
Add cleanup logic before storing new assets
2. AdminHandlerRoutes.js:663 - Status API query
Modify to filter archived/old assets:
SELECT * FROM report_assets
WHERE tracking_id = ? AND archived = false
ORDER BY created_at DESC3. AdminHandlerRoutes.js:501 - Storage processing step
Add pre-storage cleanup routine
📝 Minimal Code Change Suggestion
StorageHandler.js Implementation
async processOrchestratorResponse(trackingId, orchestratorData) {
// Archive existing assets for this tracking ID
await this.archiveExistingAssets(trackingId);
// Continue with normal processing...
// ... existing implementation
}
async archiveExistingAssets(trackingId) {
try {
// Mark existing assets as archived
await this.db.prepare(`
UPDATE report_assets
SET archived = true, archived_at = CURRENT_TIMESTAMP
WHERE tracking_id = ? AND (archived IS NULL OR archived = false)
`).bind(trackingId).run();
// Optionally: Schedule R2 cleanup for archived assets
console.log(`Archived existing assets for tracking ID: ${trackingId}`);
} catch (error) {
console.error('Error archiving assets:', error);
// Non-blocking - continue with new asset creation
}
}AdminHandlerRoutes.js Status API Update
// Modified query to exclude archived assets
const assetQuery = `
SELECT * FROM report_assets
WHERE tracking_id = ?
AND (archived IS NULL OR archived = false)
ORDER BY created_at DESC
`;✅ Benefits of Addressing This Issue
Operational Benefits
- Reduced storage costs: Elimination of duplicate asset storage
- Improved performance: Smaller API payloads and faster queries
- Better user experience: Clear identification of current assets
- Maintainable history: Optional access to previous versions
Technical Benefits
- Database efficiency: Reduced table size and improved query performance
- Storage optimization: R2 storage usage aligned with actual needs
- API responsiveness: Faster status endpoint responses
- System scalability: Better resource utilization as system grows
Business Benefits
- Cost reduction: Lower cloud storage and bandwidth costs
- User satisfaction: Cleaner interface with relevant assets only
- Compliance readiness: Proper asset lifecycle management
- Operational efficiency: Reduced confusion and support overhead
🚨 Migration Considerations
Database Migration Script
-- Add new columns for archival functionality
ALTER TABLE report_assets ADD COLUMN archived BOOLEAN DEFAULT false;
ALTER TABLE report_assets ADD COLUMN archived_at DATETIME DEFAULT NULL;
-- Create index for performance
CREATE INDEX idx_report_assets_archived ON report_assets(tracking_id, archived);
-- Optional: Archive assets older than latest for each tracking_id
UPDATE report_assets SET archived = true, archived_at = CURRENT_TIMESTAMP
WHERE id NOT IN (
SELECT MAX(id) FROM report_assets GROUP BY tracking_id
);Deployment Strategy
- Phase 1: Deploy database schema changes
- Phase 2: Update StorageHandler with archival logic
- Phase 3: Modify Status API to filter archived assets
- Phase 4: Optional R2 cleanup implementation
📊 Monitoring & Validation
Success Metrics
- Reduced average API response size for status endpoint
- Decreased R2 storage growth rate
- Improved status API response times
- Reduced duplicate asset complaints from users
Monitoring Queries
-- Track archived assets ratio
SELECT
COUNT(*) as total_assets,
SUM(CASE WHEN archived = true THEN 1 ELSE 0 END) as archived_assets,
ROUND(SUM(CASE WHEN archived = true THEN 1 ELSE 0 END) * 100.0 / COUNT(*), 2) as archive_percentage
FROM report_assets;
-- Monitor asset accumulation per tracking_id
SELECT
tracking_id,
COUNT(*) as total_assets,
SUM(CASE WHEN archived = false THEN 1 ELSE 0 END) as active_assets
FROM report_assets
GROUP BY tracking_id
HAVING COUNT(*) > 2
ORDER BY total_assets DESC;This issue affects system efficiency and user experience. Implementing the recommended archival solution will improve storage utilization, API performance, and overall system maintainability.