Software Mapping Catalog System

Overview

The Software Mapping Catalog is a shared knowledge base, not scoped to any tenant, that maps software instance names to their core software products and families. Because each mapping is cached once and reused across all customers, an AI API call is needed only for instance names the catalog has not seen before.

Benefits

  1. Reduced AI Costs: Instead of calling AI for every software instance, we check the catalog first
  2. Shared Knowledge: All tenants benefit from the collective software mappings
  3. Faster Processing: Instant lookups instead of waiting for AI responses
  4. Continuous Learning: The catalog grows with each new mapping

Architecture

Model: SoftwareMapping

Located at: /backend/models/softwaremapping.js

Key fields:

  • instanceName: The full software instance name (e.g., "Microsoft Visual C++ 2022 X64 Runtime - 14.36.32532")
  • normalizedInstanceName: Lowercase, cleaned version for matching
  • software.name: The core software product (e.g., "Microsoft Visual C++ Redistributable")
  • software.vendor: The vendor name
  • family.name: The software family (e.g., "Runtime Environments")
  • metadata.usageCount: How many times this mapping has been used
  • metadata.source: Where the mapping came from (manual, ai, import, system)
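
As a rough illustration of how instanceName might be reduced to normalizedInstanceName, the sketch below lowercases the name, strips a trailing dotted version, and collapses whitespace. The function name normalizeInstanceName and the exact cleaning rules are assumptions made for this example; the real rules are whatever softwaremapping.js implements.

```javascript
// Hypothetical helper: the actual cleaning rules in softwaremapping.js may differ.
function normalizeInstanceName(instanceName) {
  return String(instanceName)
    .toLowerCase()
    // Drop a trailing dotted version such as " v1.2.3" or " - 14.36.32532".
    .replace(/[\s-]*\bv?\d+(\.\d+)+\s*$/, "")
    // Collapse runs of whitespace into a single space.
    .replace(/\s+/g, " ")
    .trim();
}

console.log(normalizeInstanceName("My Custom Software v1.2.3"));
// prints "my custom software"
```

Stripping the version means different releases of the same product share one catalog entry, which is what lets a single mapping serve many instances.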

Hooks V2

  • softwareInstanceHooksV2.js: Uses catalog instead of AI for instance → software mapping
  • softwareHooksV2.js: Uses catalog instead of AI for software → family mapping

How It Works

Automated Flow:

  1. Software Instance Created (from scan data)

  2. Hook checks mapping catalog:

    • If mapping exists → Creates Software and Family CIs with relationships (tenant-specific)
    • If no mapping → Adds the instance name to the catalog with 'pending' status and stops, without calling AI

  3. Batch AI Processing (run periodically):

    • Process all pending mappings
    • Update catalog with AI results
    • The next time the same software is seen, the hook resolves it from the catalog
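
The hook's decision in step 2 could be sketched roughly as follows. This is a minimal in-memory model, not the code in softwareInstanceHooksV2.js: the Map stands in for the MongoDB collection, and the function name onSoftwareInstanceCreated, the status value "mapped", and the shape of the returned object are all assumptions for illustration.

```javascript
// In-memory stand-in for the shared catalog (assumption: the real hooks
// query MongoDB through the SoftwareMapping model).
const catalog = new Map();

// Hypothetical normalizer; the model defines the real rules.
const normalize = (name) => name.toLowerCase().replace(/\s+/g, " ").trim();

// Sketch of what a V2 instance hook might do when a Software Instance is created.
function onSoftwareInstanceCreated(instanceName, tenantId) {
  const key = normalize(instanceName);
  const mapping = catalog.get(key);

  if (mapping && mapping.status === "mapped") {
    mapping.metadata.usageCount += 1;
    // The catalog is shared, but the resulting CIs belong to one tenant.
    return {
      action: "create-hierarchy",
      tenantId,
      software: mapping.software.name,
      family: mapping.family.name,
    };
  }

  if (!mapping) {
    // Unknown software: record it for the periodic batch job. No AI call here.
    catalog.set(key, {
      instanceName,
      status: "pending",
      metadata: { usageCount: 0 },
    });
  }
  return { action: "pending", tenantId };
}
```

A name that is already pending is not inserted again, so repeated sightings of the same unknown software leave a single catalog entry for the batch job to resolve.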

Usage Workflow

1. Initial Setup (One-time)

# Migrate existing mappings to catalog
node scripts/migrate-software-to-mapping-catalog.js

2. Normal Operations

  • Hooks automatically check catalog and create hierarchy when mapping exists
  • Unknown software is added to catalog as 'pending'
  • No AI calls during normal operations

3. Process Pending Mappings (Periodic)

# Process pending mappings with AI (run daily/weekly)
node scripts/process-pending-software-mappings.js
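
For orientation, the periodic job might look something like the sketch below. It is not the contents of process-pending-software-mappings.js: the function name processPendingMappings, the injected classify callback (standing in for the AI call), the batchSize parameter, and the status value "mapped" are assumptions, and whether the real script sends several names per AI request is not stated in this document.

```javascript
// Sketch of a periodic batch job. `classify` stands in for the AI call and is
// injected so the example runs without any external service (assumption).
async function processPendingMappings(catalog, classify, batchSize = 25) {
  const pending = [...catalog.values()].filter((m) => m.status === "pending");
  let processed = 0;

  for (let i = 0; i < pending.length; i += batchSize) {
    const batch = pending.slice(i, i + batchSize);
    // One classifier call per batch of instance names.
    const results = await classify(batch.map((m) => m.instanceName));

    batch.forEach((mapping, idx) => {
      const result = results[idx];
      if (!result) return; // leave the entry pending if nothing came back
      mapping.software = result.software;
      mapping.family = result.family;
      mapping.metadata = { ...mapping.metadata, source: "ai" };
      mapping.status = "mapped";
      processed += 1;
    });
  }
  return processed;
}
```

Entries the classifier cannot resolve stay pending, so a later run picks them up again rather than storing a guess.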

4. Bulk Discovery Mode

# Disable hooks during bulk discovery
export DISABLE_SOFTWARE_HOOKS=true

# Run your discovery process...

# After discovery, process in batches
node scripts/classify-software-instances-with-catalog.js
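
One way the hooks could honor DISABLE_SOFTWARE_HOOKS is sketched below. How the real hooks read the variable is not shown in this document, so the helper names softwareHooksDisabled and guardHook, and the choice to treat only the literal string "true" as disabled, are assumptions.

```javascript
// Assumption: only the literal string "true" disables the hooks.
function softwareHooksDisabled(env = process.env) {
  return env.DISABLE_SOFTWARE_HOOKS === "true";
}

// Hypothetical wrapper that turns a hook into a no-op during bulk discovery.
function guardHook(hook, env = process.env) {
  return (...args) => (softwareHooksDisabled(env) ? undefined : hook(...args));
}
```

Because the flag is read each time the wrapped hook runs, it must be set in the environment of the process that performs the discovery, as the export line above does.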

Migration Steps

  1. Run the migration script (one-time):

    node scripts/migrate-software-to-mapping-catalog.js

    This copies existing software mappings to the catalog.

  2. Update hooks (already done):

    • The system now uses V2 hooks that check the catalog first

  3. For new discoveries:

    • Set DISABLE_SOFTWARE_HOOKS=true during bulk import
    • Run batch processing after import

Maintenance

Adding Manual Mappings

// In MongoDB shell or script
db.softwaremappings.insertOne({
  instanceName: "My Custom Software v1.2.3",
  normalizedInstanceName: "my custom software",
  software: {
    name: "My Custom Software",
    vendor: "My Company",
    normalizedName: "my custom software"
  },
  family: {
    name: "Enterprise Applications",
    category: "Enterprise Applications"
  },
  metadata: {
    source: "manual",
    confidence: 100,
    usageCount: 0
  }
});

// Find most used mappings
db.softwaremappings.find().sort({ "metadata.usageCount": -1 }).limit(10)

Cleaning Up Bad Mappings

// Remove low-confidence or incorrect mappings
db.softwaremappings.deleteMany({ "metadata.confidence": { $lt: 50 } })

Performance Impact

Before:

  • 5,000 software instances = 5,000 AI calls
  • Cost: ~$50-100 per large discovery
  • Time: 30-60 minutes

After:

  • 5,000 software instances = ~50-100 AI calls (only for new/unknown software)
  • Cost: ~$1-2 per large discovery
  • Time: 5-10 minutes
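
The before/after figures imply that roughly 98-99% of instances are resolved from the catalog. The short calculation below works that out from the numbers above; the function name catalogHitRate is introduced here only for illustration.

```javascript
// Figures taken from the before/after lists above.
const totalInstances = 5000;
const aiCallsAfter = { low: 50, high: 100 };

// Percentage of instances resolved from the catalog without an AI call.
function catalogHitRate(total, aiCalls) {
  return Math.round((1 - aiCalls / total) * 100);
}

console.log(catalogHitRate(totalInstances, aiCallsAfter.high)); // prints 98
console.log(catalogHitRate(totalInstances, aiCallsAfter.low)); // prints 99
```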

Future Enhancements

  1. Pattern Matching: Add regex patterns for complex software names
  2. Version Extraction: Extract version info from instance names
  3. Confidence Scoring: Track accuracy of mappings
  4. Admin UI: Build interface to manage the catalog
  5. Export/Import: Share catalog between environments