Software Mapping Catalog System
Overview
The Software Mapping Catalog is a tenant-less, shared knowledge base that maps software instance names to their core software products and families. This system dramatically reduces AI API calls by caching mappings across all customers.
Benefits
- Reduced AI Costs: Instead of calling AI for every software instance, we check the catalog first
- Shared Knowledge: All tenants benefit from the collective software mappings
- Faster Processing: Instant lookups instead of waiting for AI responses
- Continuous Learning: The catalog grows with each new mapping
Architecture
Model: SoftwareMapping
Located at: /backend/models/softwaremapping.js
Key fields:
- instanceName: The full software instance name (e.g., "Microsoft Visual C++ 2022 X64 Runtime - 14.36.32532")
- normalizedInstanceName: Lowercase, cleaned version for matching
- software.name: The core software product (e.g., "Microsoft Visual C++ Redistributable")
- software.vendor: The vendor name
- family.name: The software family (e.g., "Runtime Environments")
- metadata.usageCount: How many times this mapping has been used
- metadata.source: Where the mapping came from (manual, ai, import, system)
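As an illustration of what "lowercase, cleaned" could mean for normalizedInstanceName, here is a minimal sketch. The function name normalizeInstanceName and its rules (strip version-like tokens, drop dangling separators, collapse whitespace) are assumptions, not the model's actual logic. They were chosen so that the manual mapping example in this document ("My Custom Software v1.2.3" → "my custom software") comes out the same.

```javascript
// Hypothetical normalizer: the real rules live in the SoftwareMapping model
// and may differ from this sketch.
function normalizeInstanceName(instanceName) {
  return instanceName
    .toLowerCase()
    // Drop dotted version tokens such as "v1.2.3" or "14.36.32532".
    // Plain numbers like "2022" and tokens like "x64" are kept.
    .replace(/\bv?\d+(\.\d+)+\b/g, " ")
    // Drop a separator left dangling between spaces, e.g. " - ".
    .replace(/\s[-–]\s/g, " ")
    // Collapse runs of whitespace and trim the ends.
    .replace(/\s+/g, " ")
    .trim();
}

console.log(normalizeInstanceName("My Custom Software v1.2.3"));
// → my custom software
console.log(normalizeInstanceName("Microsoft Visual C++ 2022 X64 Runtime - 14.36.32532"));
// → microsoft visual c++ 2022 x64 runtime
```

Whatever rules are actually used, the same normalizer should be applied both when writing to the catalog and when looking entries up, otherwise lookups will miss.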
Hooks V2
- softwareInstanceHooksV2.js: Uses catalog instead of AI for instance → software mapping
- softwareHooksV2.js: Uses catalog instead of AI for software → family mapping
How It Works
Automated Flow:
- Software Instance Created (from scan data)
- Hook checks mapping catalog:
  - If mapping exists → Creates Software and Family CIs with relationships (tenant-specific)
  - If no mapping → Adds to catalog with 'pending' status and stops
- Batch AI Processing (run periodically):
  - Process all pending mappings
  - Update catalog with AI results
  - The next time the same software is seen, the hook uses the catalog
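The lookup-or-queue decision in this flow can be sketched with an in-memory Map standing in for the catalog. Everything named here is hypothetical: the function onSoftwareInstanceCreated, the "mapped" status value (the flow above only names 'pending'), and the returned action strings. The real hooks query the SoftwareMapping collection and create CIs rather than returning a plain object.

```javascript
// In-memory stand-in for the shared catalog, keyed by normalized instance name.
const catalog = new Map([
  ["microsoft visual c++ 2022 x64 runtime", {
    status: "mapped", // assumed status value for a resolved mapping
    software: { name: "Microsoft Visual C++ Redistributable", vendor: "Microsoft" },
    family: { name: "Runtime Environments" },
    metadata: { source: "manual", usageCount: 0 },
  }],
]);

// Hypothetical hook body: reports what the hook would do for one instance.
function onSoftwareInstanceCreated(normalizedInstanceName) {
  const mapping = catalog.get(normalizedInstanceName);

  if (mapping && mapping.status !== "pending") {
    mapping.metadata.usageCount += 1;
    // The real hook would create the tenant-specific Software and Family CIs here.
    return {
      action: "create-hierarchy",
      software: mapping.software.name,
      family: mapping.family.name,
    };
  }

  if (!mapping) {
    // Unknown software: record it as pending for the next batch AI run.
    catalog.set(normalizedInstanceName, {
      status: "pending",
      metadata: { usageCount: 0 },
    });
  }
  // Pending entries (new or already queued) stop here; no AI call is made.
  return { action: "queued-as-pending" };
}

console.log(onSoftwareInstanceCreated("microsoft visual c++ 2022 x64 runtime").action);
// → create-hierarchy
console.log(onSoftwareInstanceCreated("some unknown tool").action);
// → queued-as-pending
```

Seeing the same unknown name twice does not create a second pending entry, which keeps the batch step from classifying duplicates.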
Usage Workflow
1. Initial Setup (One-time)
# Migrate existing mappings to catalog
node scripts/migrate-software-to-mapping-catalog.js
2. Normal Operations
- Hooks automatically check catalog and create hierarchy when mapping exists
- Unknown software is added to catalog as 'pending'
- No AI calls during normal operations
3. Process Pending Mappings (Periodic)
# Process pending mappings with AI (run daily/weekly)
node scripts/process-pending-software-mappings.js
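What the periodic step does can be sketched as follows. This is not the contents of process-pending-software-mappings.js. The function processPendingMappings, the classify callback (a stand-in for the AI call), and the "mapped" status value are all assumptions made for illustration.

```javascript
// Hypothetical batch step over an in-memory catalog (a Map keyed by
// normalized instance name). `classify` stands in for the AI request and
// must resolve to { software, family, confidence }.
async function processPendingMappings(pendingCatalog, classify) {
  let processed = 0;

  for (const [name, entry] of pendingCatalog) {
    if (entry.status !== "pending") continue; // already resolved, no AI call

    const result = await classify(name);

    // Overwrite the pending entry so later lookups hit the catalog.
    pendingCatalog.set(name, {
      status: "mapped", // assumed status value
      software: result.software,
      family: result.family,
      metadata: { ...entry.metadata, source: "ai", confidence: result.confidence },
    });
    processed += 1;
  }

  return processed; // number of AI calls made in this run
}
```

A caller passes the catalog and a classifier and awaits the count, for example `const n = await processPendingMappings(catalog, callAi)` where callAi is whatever wraps the AI provider. Only entries still marked pending cost an AI call, so running the step repeatedly is cheap when nothing new has arrived.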
4. Bulk Discovery Mode
# Disable hooks during bulk discovery
export DISABLE_SOFTWARE_HOOKS=true
# Run your discovery process...
# After discovery, process in batches
node scripts/classify-software-instances-with-catalog.js
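How the hooks read this flag is not shown in this document. One plausible guard, assuming the hooks compare the variable against the exact string "true" (the function name softwareHooksDisabled is hypothetical):

```javascript
// Hypothetical guard. Environment variables are always strings, so this
// treats only the exact value "true" as "hooks disabled".
function softwareHooksDisabled(env = process.env) {
  return env.DISABLE_SOFTWARE_HOOKS === "true";
}

// A hook would return early when the flag is set:
//   if (softwareHooksDisabled()) return;
console.log(softwareHooksDisabled({ DISABLE_SOFTWARE_HOOKS: "true" })); // → true
console.log(softwareHooksDisabled({})); // → false
```

If the real check is looser (for example accepting "1"), the export line above still works, but other values may behave differently from this sketch. Remember to unset the variable after discovery so normal operations resume.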
Migration Steps
- Run the migration script (one-time):
  node scripts/migrate-software-to-mapping-catalog.js
  This copies existing software mappings to the catalog.
- Update hooks (already done):
  - The system now uses V2 hooks that check the catalog first
- For new discoveries:
  - Set DISABLE_SOFTWARE_HOOKS=true during bulk import
  - Run batch processing after import
Maintenance
Adding Manual Mappings
// In MongoDB shell or script
db.softwaremappings.insertOne({
  instanceName: "My Custom Software v1.2.3",
  normalizedInstanceName: "my custom software",
  software: {
    name: "My Custom Software",
    vendor: "My Company",
    normalizedName: "my custom software"
  },
  family: {
    name: "Enterprise Applications",
    category: "Enterprise Applications"
  },
  metadata: {
    source: "manual",
    confidence: 100,
    usageCount: 0
  }
});
Viewing Popular Mappings
// Find most used mappings
db.softwaremappings.find().sort({ "metadata.usageCount": -1 }).limit(10)
Cleaning Up Bad Mappings
// Remove low-confidence mappings (confidence below 50)
db.softwaremappings.deleteMany({ "metadata.confidence": { $lt: 50 } })
Performance Impact
Before:
- 5,000 software instances = 5,000 AI calls
- Cost: ~$50-100 per large discovery
- Time: 30-60 minutes
After:
- 5,000 software instances = ~50-100 AI calls (only for new/unknown software)
- Cost: ~$1-2 per large discovery
- Time: 5-10 minutes
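The figures above can be cross-checked with a little arithmetic. The sketch below assumes a flat cost per AI call, which the document does not state. The value of $0.01 is derived from the "before" numbers ($50 for 5,000 calls), and the function name estimateSavings is hypothetical.

```javascript
// Rough estimate under an assumed flat per-call cost.
function estimateSavings({ instances, aiCallsAfter, costPerCall }) {
  const callsAvoided = instances - aiCallsAfter;
  return {
    hitRate: callsAvoided / instances,       // share of instances served by the catalog
    costBefore: instances * costPerCall,     // every instance triggers an AI call
    costAfter: aiCallsAfter * costPerCall,   // only new/unknown software does
  };
}

const estimate = estimateSavings({ instances: 5000, aiCallsAfter: 100, costPerCall: 0.01 });
console.log(`${(estimate.hitRate * 100).toFixed(0)}% catalog hits`);
console.log(`$${estimate.costBefore.toFixed(2)} before, $${estimate.costAfter.toFixed(2)} after`);
```

At 100 remaining calls the catalog serves 98% of instances, and the cost falls from about $50 to about $1, in line with the ranges listed above. At the upper cost bound ($0.02 per call) the same run costs about $100 before and $2 after.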
Future Enhancements
- Pattern Matching: Add regex patterns for complex software names
- Version Extraction: Extract version info from instance names
- Confidence Scoring: Track accuracy of mappings
- Admin UI: Build interface to manage the catalog
- Export/Import: Share catalog between environments