Software Identification and Mapping System Workflow
Overview
This document describes how the CMDB system identifies, maps, and creates relationships for software discovered during network scans. The process runs in multiple stages, from initial discovery through AI-powered enrichment and relationship creation.
System Architecture
The software identification system consists of several key components:
- Data Models:
  - RawScanData - Raw discovery data from network scans
  - ServerAppsInstalled - Installed applications on servers
  - SoftwareMapping - Normalized software name mappings with pattern support
  - CI - Configuration Items (Software Instance, Software, Software Family)
  - CIRelationship - Relationships between CIs
  - CPEData - Common Platform Enumeration entries for vulnerability tracking
- Scan Processors: Automated processors that run during scan import
- Batch Processors: Scheduled jobs for AI enrichment and hierarchy creation
- Manual Tools: UI components for manual mapping and enrichment
- Controllers & Services: API endpoints for process identification and mapping
Timeline of Software Discovery and Mapping
Phase 1: Initial Discovery and Data Import
1. Network Scan Execution
- Discovery agents scan network devices using WMI, SSH, SNMP, etc.
- Agents collect server information, installed applications, processes, network connections
- Data is sent to backend as RawScanData with status = 'pending'
2. WMI Processor Activation (wmi_processor.js)
- Triggers when new RawScanData arrives
- Creates Server CI for each discovered device
- Stores server details (OS, hardware, network info) in CI custom fields
- Calls ciCollectionProcessor to process collections
- Updates RawScanData status to 'processed'
Phase 2: Collection Processing and Software Record Creation
3. Collection Processor (cicollection_processor.js)
- Processes all collections within the scan data:
  - Creates ServerDisk records
  - Creates ServerUserAccount records
  - Creates ServerNetworkConnection records
  - Creates ServerNetworkAdapter records
  - Creates ServerProcess records
  - Creates ServerAppsInstalled records ← This is where software discovery begins
4. ServerAppsInstalled Creation
- For each installed application found:
{
  ci_id: [Server CI ID],
  name: "Microsoft SQL Server 2019",
  vendor: "Microsoft Corporation",
  version: "15.0.2000.5",
  install_date: "2023-01-15",
  install_location: "C:\\Program Files\\Microsoft SQL Server"
}
- Immediately triggers softwareInstanceProcessor.linkOrCreateSoftwareInstance()
Phase 3: Software Instance CI Creation
5. Software Instance Processor (softwareInstanceProcessor.js)
- For each ServerAppsInstalled record:
  a. Check for existing mapping:
    - Looks up SoftwareMapping collection
    - Uses findBestMatch() to find mapping by:
      - Exact instance name match
      - Normalized name match (removes versions, architecture)
      - Vendor + name combination
      - Regex pattern matching
  b. If mapping exists:
    - Creates Software Instance CI using mapped software/family info
    - Inherits customer/tenant from parent Server CI
    - Updates ServerAppsInstalled.softwareInstanceCiId
    - Increments mapping usage count
    - Creates Software CI if needed (when mapping includes software info)
  c. If no mapping exists:
    - Creates a "pending" SoftwareMapping entry
    - Creates Software Instance CI with basic info
    - Marks for later AI enrichment
  d. CPE Integration:
    - If cpe_name provided, looks up CPEData
    - Links Software Instance to CPE entry
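The branching in step 5 can be sketched as follows. This is a minimal in-memory illustration under stated assumptions, not the code in softwareInstanceProcessor.js: the two Maps stand in for the SoftwareMapping and CI collections, the lookup is exact-name only (a placeholder for findBestMatch()), and field names such as usageCount, customerId and needsEnrichment are hypothetical.

```javascript
// Hypothetical in-memory stand-ins for the SoftwareMapping and CI collections.
const mappings = new Map();    // instance name -> mapping
const instanceCIs = new Map(); // `${serverCiId}|${name}|${version}` -> Software Instance CI

// Simplified lookup: exact name, active mappings only. The documented
// findBestMatch() also tries normalized names, vendor + name, and regex patterns.
function findMapping(name) {
  const m = mappings.get(name);
  return m && m.status === 'active' ? m : null;
}

function linkOrCreateSoftwareInstance(app, serverCi) {
  const key = `${serverCi.id}|${app.name}|${app.version}`;
  // Upsert behaviour: re-processing the same record returns the existing CI.
  if (instanceCIs.has(key)) return instanceCIs.get(key);

  const mapping = findMapping(app.name);
  const ci = {
    id: `swi-${instanceCIs.size + 1}`,
    type: 'Software Instance',
    name: app.name,
    version: app.version,
    customerId: serverCi.customerId, // inherited from the parent Server CI
    software: mapping ? mapping.softwareName : null,
    family: mapping ? mapping.familyName : null,
    needsEnrichment: !mapping,       // picked up later by AI enrichment
  };

  if (mapping) {
    mapping.usageCount += 1;
  } else if (!mappings.has(app.name)) {
    // No mapping at all: record a pending one for later review.
    mappings.set(app.name, { status: 'pending', usageCount: 0 });
  }

  instanceCIs.set(key, ci);
  app.softwareInstanceCiId = ci.id;  // back-reference on the ServerAppsInstalled record
  return ci;
}

// Usage with one mapped and one unmapped application.
mappings.set('Microsoft SQL Server 2019', {
  status: 'active',
  softwareName: 'Microsoft SQL Server',
  familyName: 'Databases',
  usageCount: 0,
});
const server = { id: 'srv-1', customerId: 'cust-42' };
const mapped = linkOrCreateSoftwareInstance(
  { name: 'Microsoft SQL Server 2019', version: '15.0.2000.5' }, server);
const unmapped = linkOrCreateSoftwareInstance(
  { name: 'Acme Widget Tool', version: '1.0' }, server);
```

Keying the store on server, name and version is one way to get the idempotency the upsert pattern is meant to provide; the real key may differ.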
Phase 4: Relationship Creation
6. Software Installation Relationships (software_installation_relation_processor.js)
- Finds all ServerAppsInstalled records with softwareInstanceCiId
- Creates "Installed On" relationships:
  - Source: Software Instance CI
  - Target: Server CI
  - Type: "Installed On"
- Links software instances to their host servers
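A minimal sketch of step 6, assuming records shaped like the ServerAppsInstalled example earlier in this document. The function name buildInstalledOnRelationships and the de-duplication set are hypothetical; the source does not say how the real processor avoids creating the same relationship twice.

```javascript
// Build "Installed On" relationships from ServerAppsInstalled-like records.
// `existing` holds keys of relationships already created, so re-runs add nothing.
function buildInstalledOnRelationships(appRecords, existing = new Set()) {
  const created = [];
  for (const rec of appRecords) {
    if (!rec.softwareInstanceCiId) continue; // not linked to a Software Instance CI yet
    const key = `${rec.softwareInstanceCiId}->${rec.ci_id}`;
    if (existing.has(key)) continue;
    existing.add(key);
    created.push({
      source: rec.softwareInstanceCiId, // Software Instance CI
      target: rec.ci_id,                // Server CI
      type: 'Installed On',
    });
  }
  return created;
}

const records = [
  { ci_id: 'srv-1', name: 'Microsoft SQL Server 2019', softwareInstanceCiId: 'swi-1' },
  { ci_id: 'srv-1', name: 'Unlinked App' },
];
const seen = new Set();
const firstRun = buildInstalledOnRelationships(records, seen);
const secondRun = buildInstalledOnRelationships(records, seen);
console.log(firstRun.length, secondRun.length); // 1 0
```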
7. Network Connection Relationships (cirelation_processor.js)
- Processes ServerNetworkConnection records
- Creates "Connected To" relationships between servers
- Enriches relationships with process information:
  - Source process name
  - Target process name
  - Port and protocol info
Phase 5: AI Enrichment (Optional)
8. AI Insights Generation (ciai_insights.js)
- If AWS Bedrock credentials are configured:
- Generates AI insights for Server CIs
- Analyzes server role, criticality, dependencies
- Stores insights in AiInsights collection
9. AI Relationship Analysis (ciairelation_processor.js)
- If enabled, analyzes network connections using AI
- Identifies application protocols and communication patterns
- Enriches relationships with AI-discovered metadata
Phase 6: Batch Processing and Hierarchy Creation
These processes run separately as scheduled jobs or manual triggers:
10. Software Instance to Software Mapping (softwareInstanceToSoftwareProcessor.js)
- Finds Software Instance CIs without Software relationships
- For each unlinked instance:
  a. Extracts software name, vendor, version from CI attributes
  b. Uses AWS Bedrock AI (Claude 3 Sonnet) to identify generic software product
  c. AI prompt specifically designed to extract product name without version
  d. Falls back to pattern-based extraction if AI unavailable
  e. Creates Software CI if not exists (tenant-aware)
  f. Updates Software Instance with software reference
  g. Creates "Instance Of" relationship
- Example: "Microsoft SQL Server 2019 (15.0.2000.5)" → "Microsoft SQL Server"
- Batch processing: Default 10 items per run, configurable
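The pattern-based fallback in step 10d could look roughly like the sketch below. The function name extractProductName and the specific regular expressions are assumptions for illustration; the actual patterns in softwareInstanceToSoftwareProcessor.js are not shown in this document.

```javascript
// Strip version and architecture details to get a generic product name.
function extractProductName(instanceName) {
  return instanceName
    .replace(/\([^)]*\)/g, ' ')                    // parenthesised details, e.g. "(15.0.2000.5)" or "(x64)"
    .replace(/\b(x64|x86|64-bit|32-bit)\b/gi, ' ') // architecture markers
    .replace(/\bv?\d+(\.\d+)+\b/gi, ' ')           // dotted versions, e.g. "115.0.2"
    .replace(/\b(19|20)\d{2}\b/g, ' ')             // four-digit release years, e.g. "2019"
    .replace(/\s+/g, ' ')
    .trim();
}

console.log(extractProductName('Microsoft SQL Server 2019 (15.0.2000.5)')); // Microsoft SQL Server
console.log(extractProductName('Mozilla Firefox 115.0.2 (x64 en-US)'));     // Mozilla Firefox
```

Stripping years is a blunt rule: it would also shorten a product whose name legitimately contains a year, which is one reason the AI path is preferred when available.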
11. Software to Family Classification (softwareToFamilyProcessor.js)
- Finds Software CIs without family relationships
- Uses AWS Bedrock AI to classify software into 13 standard families:
- Browsers, Databases, Development Tools, Runtime Environments
- Office Productivity, Security Software, Operating Systems
- Virtualization, Web Servers, Monitoring Tools
- Multimedia, Utilities, Enterprise Applications
- AI prompt includes guidelines for each family category
- Falls back to pattern-based classification if AI unavailable
- Creates Software Family CIs (tenant-aware with caching)
- Updates Software CI with family reference
- Creates "Member Of" relationships
- Example: "Microsoft SQL Server" → "Database Management Systems"
- Family cache: 1-hour TTL, tenant-isolated
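A tenant-isolated cache with a one-hour TTL, as described in the last bullet, might be structured like this. The class name FamilyCache, the key format and the injectable clock are illustrative assumptions, not the implementation in softwareToFamilyProcessor.js.

```javascript
// Cache of Software Family CIs, keyed by tenant so entries never leak across tenants.
class FamilyCache {
  constructor(ttlMs = 60 * 60 * 1000, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, which keeps expiry testable
    this.entries = new Map();
  }

  key(tenantId, familyName) {
    return `${tenantId}::${familyName.toLowerCase()}`;
  }

  get(tenantId, familyName) {
    const k = this.key(tenantId, familyName);
    const hit = this.entries.get(k);
    if (!hit) return undefined;
    if (this.now() - hit.storedAt >= this.ttlMs) {
      this.entries.delete(k); // expired: drop it and report a miss
      return undefined;
    }
    return hit.value;
  }

  set(tenantId, familyName, value) {
    this.entries.set(this.key(tenantId, familyName), { value, storedAt: this.now() });
  }
}

// Usage with a fake clock.
let clock = 0;
const cache = new FamilyCache(60 * 60 * 1000, () => clock);
cache.set('tenant-a', 'Databases', { ciId: 'fam-1' });
const sameTenant = cache.get('tenant-a', 'Databases');  // { ciId: 'fam-1' }
const otherTenant = cache.get('tenant-b', 'Databases'); // undefined
clock += 60 * 60 * 1000;
const afterExpiry = cache.get('tenant-a', 'Databases'); // undefined
```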
12. CPE Data Enrichment (softwareInstanceCPEProcessor.js)
- Finds Software Instance CIs without CPE data
- Implements multiple search strategies (in priority order):
- Version-based search (most effective)
- Vendor + Product keyword search
- Full name search (fallback)
- Advanced features:
- Vendor normalization mappings (e.g., "Microsoft Corporation" → "microsoft")
- Product name cleaning (removes versions, editions)
- Special handling for Visual C++ year-to-version mapping
- Match scoring algorithm with weighted criteria
- NVD API integration:
- Uses NVD REST API v2.0
- Rate limiting: 0.7s delay between requests with an API key, 7s without
- Retry logic and error handling
- Creates CPEData entries if not exist
- Updates Software Instances with CPE references
- Enables vulnerability tracking and compliance checks
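The search-strategy ordering and request spacing in step 12 can be sketched as below. The helper names (normalizeVendor, cleanProductName, buildSearchPlan, nvdDelayMs) are hypothetical, the alias table contains only the one mapping given above, and how each strategy is translated into NVD API query parameters is deliberately left out because this document does not specify it.

```javascript
// Vendor normalization table; only this entry is taken from the text above.
const VENDOR_ALIASES = {
  'microsoft corporation': 'microsoft',
};

function normalizeVendor(vendor) {
  const v = (vendor || '').trim().toLowerCase();
  return VENDOR_ALIASES[v] || v;
}

// Remove versions, years and parenthesised details (assumed cleaning rules).
function cleanProductName(name) {
  return name
    .replace(/\([^)]*\)/g, ' ')
    .replace(/\bv?\d+(\.\d+)+\b/g, ' ')
    .replace(/\b(19|20)\d{2}\b/g, ' ')
    .replace(/\s+/g, ' ')
    .trim();
}

// Ordered list of searches; a caller would stop at the first one that yields a match.
function buildSearchPlan({ name, vendor, version }) {
  const product = cleanProductName(name);
  const plan = [];
  if (version) plan.push({ strategy: 'version', product, version });
  if (vendor) plan.push({ strategy: 'vendor+product', vendor: normalizeVendor(vendor), product });
  plan.push({ strategy: 'full-name', keywords: name });
  return plan;
}

// Delay between NVD requests: 0.7 s with an API key, 7 s without.
function nvdDelayMs(apiKey) {
  return apiKey ? 700 : 7000;
}

const plan = buildSearchPlan({
  name: 'Microsoft SQL Server 2019',
  vendor: 'Microsoft Corporation',
  version: '15.0.2000.5',
});
console.log(plan.map((step) => step.strategy)); // [ 'version', 'vendor+product', 'full-name' ]
```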
Phase 7: Manual Mapping and Enrichment
13. Process Mapping UI (CMDBSettings.tsx)
The Process Mapping Configuration UI provides several modes:
a. Enrich Mode:
- Finds software mappings without process names
- Uses three strategies:
  - Pattern matching against known process mappings:
    - Visual C++ Redistributables
    - Browsers (Chrome, Firefox, Edge)
    - Development tools (VS Code, Visual Studio)
    - Databases (SQL Server, MySQL, PostgreSQL)
    - System utilities and services
  - Discovery from existing CI relationships
  - AI identification using Claude service
- Updates mappings with discovered process names
- Stores platform-specific process names (Windows, macOS, Linux)
- Tracks Windows service names separately
b. Apply Mode:
- Finds unmapped CI relationships (those with process names but no application info)
- Looks up software mappings by process name
- Updates relationships with:
- Application name and vendor
- Software family information
- Sets aiEnhanced flag to true
- Special handling for Windows OS processes:
- Can apply "Microsoft OS Services" mapping to all Windows system processes
- Includes: svchost.exe, services.exe, lsass.exe, csrss.exe, etc.
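Apply Mode can be illustrated with the sketch below. The relationship and mapping field names (processName, applicationName, softwareFamily), the case-insensitive lookup, and the family assigned to the OS mapping are assumptions; the process set contains only the four names listed above, whereas the real list is longer.

```javascript
// Windows system processes named in this document; the real list continues ("etc.").
const WINDOWS_OS_PROCESSES = new Set(['svchost.exe', 'services.exe', 'lsass.exe', 'csrss.exe']);

// Fill in application info on relationships that have a process name but no application yet.
function applyProcessMappings(relationships, mappingsByProcess, windowsOsMapping = null) {
  let updated = 0;
  for (const rel of relationships) {
    if (!rel.processName || rel.applicationName) continue; // only unmapped relationships
    const proc = rel.processName.toLowerCase();
    const mapping = mappingsByProcess.get(proc) ||
      (WINDOWS_OS_PROCESSES.has(proc) ? windowsOsMapping : null);
    if (!mapping) continue;
    rel.applicationName = mapping.applicationName;
    rel.vendor = mapping.vendor;
    rel.softwareFamily = mapping.softwareFamily;
    rel.aiEnhanced = true;
    updated += 1;
  }
  return updated;
}

const mappingsByProcess = new Map([
  ['sqlservr.exe', { applicationName: 'Microsoft SQL Server', vendor: 'Microsoft', softwareFamily: 'Databases' }],
]);
const osMapping = { applicationName: 'Microsoft OS Services', vendor: 'Microsoft', softwareFamily: 'Operating Systems' };
const rels = [
  { processName: 'SQLSERVR.EXE' },
  { processName: 'svchost.exe' },
  { processName: 'unknown.exe' },
];
const updatedCount = applyProcessMappings(rels, mappingsByProcess, osMapping);
console.log(updatedCount); // 2
```

Lower-casing the process name before lookup sidesteps the case sensitivity problems mentioned under Troubleshooting, at least for Windows executables.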
c. Fix Relationships Mode:
- Creates missing Software Family CIs
- Repairs broken software hierarchy
- Ensures all relationships are properly linked
14. AI-Powered Process Identification
When enriching mappings, the system uses Claude AI to:
- Identify process names for software products
- Provide OS-specific process names (Windows, Linux, macOS)
- Return structured data including:
- Main process names by platform
- Associated Windows services
- Process type (application/service/library)
- Additional notes or special instructions
- Return confidence scores for accuracy
- Only applies mappings with confidence ≥ 0.7
- Stores metadata about source (pattern/discovery/ai)
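The confidence gate in step 14 might be applied as in this sketch. The shape of aiResult and the mapping fields written here (processNames, windowsServices, processSource, processConfidence) are assumed for illustration, and the sample process names are illustrative values rather than output from the real service.

```javascript
const MIN_CONFIDENCE = 0.7;

// Apply an AI identification result to a mapping only if it meets the threshold.
function applyAiProcessResult(mapping, aiResult) {
  if (typeof aiResult.confidence !== 'number' || aiResult.confidence < MIN_CONFIDENCE) {
    return false; // below threshold: leave the mapping unchanged
  }
  mapping.processNames = {
    windows: aiResult.processNames?.windows ?? [],
    linux: aiResult.processNames?.linux ?? [],
    macos: aiResult.processNames?.macos ?? [],
  };
  mapping.windowsServices = aiResult.windowsServices ?? [];
  mapping.processType = aiResult.processType; // application / service / library
  mapping.processSource = 'ai';               // pattern / discovery / ai
  mapping.processConfidence = aiResult.confidence;
  return true;
}

const chromeMapping = { name: 'Google Chrome' };
const accepted = applyAiProcessResult(chromeMapping, {
  confidence: 0.92,
  processNames: { windows: ['chrome.exe'], linux: ['chrome'], macos: ['Google Chrome'] },
  windowsServices: [],
  processType: 'application',
});

const otherMapping = { name: 'Obscure Tool' };
const rejected = applyAiProcessResult(otherMapping, {
  confidence: 0.55,
  processNames: { windows: ['tool.exe'] },
});
console.log(accepted, rejected); // true false
```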
Key Functions and Their Roles
Core Processing Functions
- linkOrCreateSoftwareInstance() (softwareInstanceProcessor.js)
  - Creates Software Instance CIs with tenant inheritance
  - Links to existing mappings and increments usage
  - Can create Software CIs when mapping includes software info
  - Handles CPE linkage if available
  - Uses upsert pattern for idempotency
  - Triggers on every ServerAppsInstalled creation
- findBestMatch() (SoftwareMapping model)
  - Intelligent software name matching with multiple strategies:
    - Exact instance name match (preserves versions)
    - Normalized name match (removes versions, architecture)
    - Vendor prefix matching
    - Regex pattern matching from instancePatterns array
  - Only returns active mappings (status = 'active')
  - Returns best mapping or null
- createOrUpdateMapping() (SoftwareMapping model)
  - Creates new software mappings
  - Updates existing mappings
  - Tracks usage and confidence
- enrichMappingsWithProcessNames() (processApplicationMapping.js)
  - Enriches mappings with process information
  - Uses patterns, discovery, and AI
  - Updates mapping metadata
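The strategy cascade described for findBestMatch() can be sketched over a plain array. This is an illustration under assumptions, not the SoftwareMapping model method: the mapping fields instanceName and softwareName, the normalization rules, and the reading of "vendor prefix matching" as "same vendor and the normalized name starts with the mapped software name" are all hypothetical.

```javascript
// Lower-case and strip versions, years, architecture and parenthesised details.
function normalizeName(name) {
  return name
    .toLowerCase()
    .replace(/\([^)]*\)/g, ' ')
    .replace(/\b(x64|x86|64-bit|32-bit)\b/g, ' ')
    .replace(/\bv?\d+(\.\d+)+\b/g, ' ')
    .replace(/\b(19|20)\d{2}\b/g, ' ')
    .replace(/\s+/g, ' ')
    .trim();
}

// Try each strategy in priority order and return the first active mapping that matches.
function findBestMatch(mappings, instanceName, vendor = '') {
  const active = mappings.filter((m) => m.status === 'active');
  const normalized = normalizeName(instanceName);
  const strategies = [
    // 1. Exact instance name (version preserved)
    (m) => m.instanceName === instanceName,
    // 2. Normalized name
    (m) => normalizeName(m.instanceName) === normalized,
    // 3. Same vendor and matching name prefix (one possible reading)
    (m) => Boolean(vendor) && Boolean(m.vendor) && Boolean(m.softwareName) &&
      m.vendor.toLowerCase() === vendor.toLowerCase() &&
      normalized.startsWith(normalizeName(m.softwareName)),
    // 4. Regex patterns from instancePatterns
    (m) => (m.instancePatterns || []).some((p) => new RegExp(p, 'i').test(instanceName)),
  ];
  for (const matches of strategies) {
    const hit = active.find(matches);
    if (hit) return hit;
  }
  return null;
}

const catalog = [
  { instanceName: 'Microsoft SQL Server 2019', softwareName: 'Microsoft SQL Server',
    vendor: 'Microsoft Corporation', status: 'active', instancePatterns: [] },
  { instanceName: 'Google Chrome', softwareName: 'Google Chrome',
    vendor: 'Google LLC', status: 'active', instancePatterns: ['^chrome\\b'] },
  { instanceName: 'Old Tool', softwareName: 'Old Tool', vendor: 'Acme', status: 'pending' },
];

const byNormalized = findBestMatch(catalog, 'Microsoft SQL Server 2022 (16.0.1000.6)');
const byPattern = findBestMatch(catalog, 'Chrome Enterprise 120');
const pendingOnly = findBestMatch(catalog, 'Old Tool');
console.log(byNormalized.softwareName, byPattern.softwareName, pendingOnly);
// Microsoft SQL Server Google Chrome null
```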
AI Integration Functions
- identifyProcessNamesWithAI()
  - Calls Claude API for process identification
  - Provides context about software (name, vendor, category)
  - Returns structured process information:
    - Platform-specific executables
    - Associated services
    - Process type classification
  - Handles API errors gracefully
- classifySoftwareFamily()
  - Uses AWS Bedrock AI to categorize software
  - Validates against 13 standard families
  - Falls back to pattern-based classification
  - Creates family hierarchy with tenant isolation
  - Maintains consistency through caching
Trigger Mechanisms
Automatic Triggers
- On RawScanData import → WMI processor → Collection processor
- On ServerAppsInstalled save → Software Instance creation (synchronous)
- On Software Instance CI save → Hooks trigger:
- CPE processor (if no CPE assigned)
- Software linking processor (if no software relationship)
- On Software CI save → Family processor (if no family assigned)
- Post-scan completion → Installation relationship processor
Manual Triggers
- Process Mapping UI → User-initiated enrichment/application
- Scheduled Jobs → Batch processors for hierarchy creation
- API Endpoints:
  - /api/cmdb-settings/process-mapping/run - Comprehensive processing
  - /api/process-application-mapping/apply - Direct mapping application
  - /api/process-identification/* - Process identification operations
- Migration Scripts → One-time data migration and cleanup
Event-Driven Triggers
- Model hooks → Pre/post save operations
- Queue processing → Asynchronous job execution
- WebSocket events → Real-time updates
Data Flow Summary
Network Scan → RawScanData → WMI Processor → Collection Processor
                                                      ↓
                                             ServerAppsInstalled
                                                      ↓
                                         Software Instance Processor
                                                      ↓
                                            Software Instance CI
                                                      ↓
                                    ┌─────────────────┴─────────────────┐
                                    ↓                                   ↓
                         Relationship Processors                Batch Processors
                                    ↓                                   ↓
                             "Installed On"                    Software Hierarchy
                             "Connected To"                    (Software → Family)
                                                                        ↓
                                                                  AI Enrichment
                                                                 (Process Names,
                                                                  Classification)
Best Practices
- Maintain Software Mappings:
  - Regularly review and update mappings for accuracy
  - Use pattern-based mappings for common software
  - Set appropriate confidence scores
- Use AI Enrichment:
  - Enable AWS Bedrock for automatic classification
  - Configure API keys for better rate limits
  - Monitor AI usage and costs
- Monitor Pending Mappings:
  - Check for pending mappings that need review
  - Use bulk operations for efficiency
  - Track mapping usage statistics
- Run Batch Processors:
  - Schedule regular runs of hierarchy creation jobs
  - Configure appropriate batch sizes
  - Monitor success/failure rates
- Leverage Manual Tools:
  - Use Process Mapping UI for complex scenarios
  - Apply known mappings to Windows OS processes
  - Track unmapped processes for future enhancement
- Performance Optimization:
  - Use caching for frequently accessed data
  - Implement rate limiting for external APIs
  - Batch process large datasets
Troubleshooting
Common Issues
- Missing Software Instance CIs
  - Check if SoftwareMapping exists and is active
  - Verify SoftwareInstance CIType is defined (check variations)
  - Ensure parent CI has customer/tenant info
  - Review processor logs for errors
- Incomplete Hierarchy
  - Run "Fix Relationships" mode in CMDBSettings
  - Check AWS Bedrock credentials and permissions
  - Verify Software/Family CITypes exist
  - Check for rate limiting on AI APIs
  - Review fallback pattern matching
- Process Mapping Failures
  - Review Claude AI API credentials
  - Check for pattern matching conflicts
  - Verify CI relationship data integrity
  - Ensure process names are properly extracted
  - Check for case sensitivity issues
Monitoring Points
- SoftwareMapping collection for pending entries and usage statistics
- CI relationships without application data (use unmapped-stats endpoint)
- Software CIs without family relationships
- Software Instance CIs without software relationships
- Process enrichment statistics in CMDBSettings UI
- CPE matching success rates
- AI API usage and rate limiting
- Batch processor performance metrics
Additional Capabilities
Software Mapping Management
- Pattern-Based Matching:
  - Support for regex patterns in instancePatterns array
  - Version extraction patterns for parsing version numbers
  - Architecture removal patterns (x64, x86, 32-bit, 64-bit)
- Mapping Metadata:
  - Usage count tracking for popularity analysis
  - Confidence scores (0-100) for mapping quality
  - Source tracking (manual, ai, import, system, pending)
  - Status management (active, pending, review)
  - Last verified timestamps
- Platform-Specific Process Management:
  - Separate process names for Windows, macOS, Linux
  - Windows service name tracking
  - Process type classification (application/service/library)
  - Process details with notes and special instructions
API Endpoints Summary
- CMDB Settings (/api/cmdb-settings/):
  - GET /process-mapping/analyze - Current mapping statistics
  - GET /software-families - Available software families
  - POST /process-mapping/run - Execute process mapping
- Process Application Mapping (/api/process-application-mapping/):
  - POST /apply - Apply mappings to relationships
  - GET /unmapped-stats - Statistics on unmapped processes
- Process Identification (/api/process-identification/):
  - GET /software-without-processes - List unmapped software
  - POST /identify/:softwareId - Identify processes for software
  - GET /process-names/:softwareId - Get process names
  - POST /bulk-identify - Batch process identification
- Software Mapping (Additional endpoints from routes):
  - Standard CRUD operations for manual mapping management
Bulk Operations and Scripts
- Analysis Scripts:
  - check-mapping-summary.js - Overall mapping statistics
  - simple-mapping-check.js - Quick validation checks
  - analyze-software-instance-mappings.js - Instance analysis
- Migration Scripts:
  - migrate-software-to-mapping-catalog.js - Migrate existing data
  - bulk-apply-process-mappings.js - Batch apply mappings
  - create-missing-software-mappings.js - Fill gaps in mappings
- Cleanup Scripts:
  - clean-softwaremapping-duplicates.js - Remove duplicates
  - fix-software-relationships.js - Repair broken links
  - update-relationships-software-family.js - Update family links
Integration with Model Hooks
The system uses Mongoose model hooks for automation:
- Software Instance Hooks V2:
  - Post-save: Triggers CPE processor if no CPE assigned
  - Post-save: Triggers software linking if no software relationship
- Software Hooks:
  - Post-save: Triggers family assignment if no family
  - Pre-save: Validates required fields
- ServerAppsInstalled Hooks:
  - Post-save: Triggers software instance creation inline
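The guard conditions on the CI post-save hooks can be expressed as a small decision function. This sketch deliberately avoids Mongoose so that it runs on its own; the function name processorsToTrigger and the field names cpeId, softwareId and familyId are assumptions. In the real system the equivalent checks would sit inside post-save middleware registered on the schema.

```javascript
// Decide which follow-up processors a post-save hook would queue for a CI.
// Each guard skips work that is already done, so saving an enriched CI
// does not re-trigger the same processor.
function processorsToTrigger(ci) {
  const queue = [];
  if (ci.type === 'Software Instance') {
    if (!ci.cpeId) queue.push('cpe');                     // no CPE assigned yet
    if (!ci.softwareId) queue.push('software-linking');   // no Software relationship yet
  } else if (ci.type === 'Software') {
    if (!ci.familyId) queue.push('family');               // no family assigned yet
  }
  return queue;
}

console.log(processorsToTrigger({ type: 'Software Instance' }));
// [ 'cpe', 'software-linking' ]
console.log(processorsToTrigger({ type: 'Software Instance', cpeId: 'cpe-1', softwareId: 'sw-1' }));
// []
console.log(processorsToTrigger({ type: 'Software' }));
// [ 'family' ]
```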
Known Process Mappings
The system includes predefined mappings for common software:
- Microsoft Visual C++ Redistributables (2005-2022)
- Browsers: Chrome, Firefox, Edge, Safari, Opera
- Development Tools: Visual Studio, VS Code, IntelliJ IDEA
- Databases: SQL Server, MySQL, PostgreSQL, MongoDB
- Web Servers: Apache, Nginx, IIS
- Runtime Environments: Java, .NET, Node.js, Python
- Monitoring Tools: Motadata Agent, Zabbix, Nagios
- Office Software: Microsoft Office, LibreOffice
- Security Software: Antivirus, firewalls
- Utilities: 7-Zip, WinRAR, Notepad++
Conclusion
The software identification and mapping system provides a comprehensive solution for discovering, normalizing, and organizing software data in the CMDB. By combining automatic discovery, intelligent mapping, and AI-powered enrichment, the system maintains accurate software inventory and relationships with minimal manual intervention. The extensive API endpoints, bulk operations, and model hooks ensure the system can handle enterprise-scale deployments while maintaining data quality and consistency.