Software Identification and Mapping System Workflow

Overview

This document describes how the CMDB system identifies software discovered during network scans, maps it to normalized records, and creates relationships for it. The process runs in stages, from initial discovery through AI-powered enrichment and relationship creation.

System Architecture

The software identification system consists of several key components:

Data Models:
- RawScanData - Raw discovery data from network scans
- ServerAppsInstalled - Installed applications on servers
- SoftwareMapping - Normalized software name mappings with pattern support
- CI - Configuration Items (Software Instance, Software, Software Family)
- CIRelationship - Relationships between CIs
- CPEData - Common Platform Enumeration entries for vulnerability tracking
Scan Processors: Automated processors that run during scan import
Batch Processors: Scheduled jobs for AI enrichment and hierarchy creation
Manual Tools: UI components for manual mapping and enrichment
Controllers & Services: API endpoints for process identification and mapping

Timeline of Software Discovery and Mapping

Phase 1: Initial Discovery and Data Import

1. Network Scan Execution

Discovery agents scan network devices using protocols such as WMI, SSH, and SNMP
Agents collect server information, installed applications, processes, and network connections
Data is sent to the backend as RawScanData with status = 'pending'

2. WMI Processor Activation (wmi_processor.js)

Triggers when new RawScanData arrives
Creates Server CI for each discovered device
Stores server details (OS, hardware, network info) in CI custom fields
Calls ciCollectionProcessor to process collections
Updates RawScanData status to 'processed'
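
The hand-off described in steps 1 and 2 can be sketched roughly as follows. This is a minimal illustration, not the contents of wmi_processor.js: the `payload` field and the injected `createServerCi` and `processCollections` functions are hypothetical stand-ins for the real processors, and error handling is omitted because the source does not describe it.

```javascript
// Minimal sketch of the Phase 1 hand-off. `createServerCi` and
// `processCollections` are hypothetical stand-ins for the real processors.
function processRawScan(scan, { createServerCi, processCollections }) {
  // Only scans that are still waiting to be imported are handled.
  if (scan.status !== 'pending') return null;

  // Create the Server CI, then let the collection processor create the
  // per-server records (disks, processes, installed apps, ...).
  const serverCi = createServerCi(scan.payload);
  processCollections(serverCi, scan.payload);

  scan.status = 'processed';
  return serverCi;
}

// Usage with in-memory stubs.
const processedCollections = [];
const scan = {
  status: 'pending',
  payload: { hostname: 'db01', os: 'Windows Server 2019' },
};
const serverCi = processRawScan(scan, {
  createServerCi: (payload) => ({
    type: 'Server',
    name: payload.hostname,
    customFields: { os: payload.os },
  }),
  processCollections: (ci) => processedCollections.push(ci.name),
});
```

Checking the status first makes the function safe to call again on a scan that has already been imported.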

Phase 2: Collection Processing and Software Record Creation

3. Collection Processor (cicollection_processor.js)

Processes all collections within the scan data:
- Creates ServerDisk records
- Creates ServerUserAccount records
- Creates ServerNetworkConnection records
- Creates ServerNetworkAdapter records
- Creates ServerProcess records
- Creates ServerAppsInstalled records ← This is where software discovery begins

4. ServerAppsInstalled Creation

For each installed application found:

{
  ci_id: [Server CI ID],
  name: "Microsoft SQL Server 2019",
  vendor: "Microsoft Corporation",
  version: "15.0.2000.5",
  install_date: "2023-01-15",
  install_location: "C:\\Program Files\\Microsoft SQL Server"
}

Saving each record immediately triggers softwareInstanceProcessor.linkOrCreateSoftwareInstance()

Phase 3: Software Instance CI Creation

5. Software Instance Processor (softwareInstanceProcessor.js)

For each ServerAppsInstalled record:

a. Check for existing mapping:
- Looks up SoftwareMapping collection
- Uses findBestMatch() to find mapping by:
  - Exact instance name match
  - Normalized name match (removes versions, architecture)
  - Vendor + name combination
  - Regex pattern matching
b. If mapping exists:
- Creates Software Instance CI using mapped software/family info
- Inherits customer/tenant from parent Server CI
- Updates ServerAppsInstalled.softwareInstanceCiId
- Increments mapping usage count
- Creates Software CI if needed (when mapping includes software info)
c. If no mapping exists:
- Creates a "pending" SoftwareMapping entry
- Creates Software Instance CI with basic info
- Marks for later AI enrichment
d. CPE Integration:
- If cpe_name provided, looks up CPEData
- Links Software Instance to CPE entry

Phase 4: Relationship Creation

6. Software Installation Relationships (software_installation_relation_processor.js)

Finds all ServerAppsInstalled records that have a softwareInstanceCiId set
Creates "Installed On" relationships:
- Source: Software Instance CI
- Target: Server CI
- Type: "Installed On"
Links software instances to their host servers
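
As a rough sketch of step 6, the function below derives "Installed On" relationships from ServerAppsInstalled-like records. The `ci_id` and `softwareInstanceCiId` fields appear in this document; the relationship field names (`sourceCiId`, `targetCiId`) and the de-duplication key are assumptions made for illustration.

```javascript
// Build one "Installed On" relationship per linked application record,
// skipping records that are not linked yet and pairs already seen.
function buildInstalledOnRelationships(appRecords, existingKeys = new Set()) {
  const relationships = [];
  for (const app of appRecords) {
    if (!app.softwareInstanceCiId) continue; // no Software Instance CI yet

    const key = `${app.softwareInstanceCiId}|${app.ci_id}|Installed On`;
    if (existingKeys.has(key)) continue; // keep the operation idempotent
    existingKeys.add(key);

    relationships.push({
      sourceCiId: app.softwareInstanceCiId, // Software Instance CI
      targetCiId: app.ci_id,                // Server CI
      type: 'Installed On',
    });
  }
  return relationships;
}

const installedOn = buildInstalledOnRelationships([
  { ci_id: 'srv-1', softwareInstanceCiId: 'si-1' },
  { ci_id: 'srv-1' },                               // skipped: not linked
  { ci_id: 'srv-1', softwareInstanceCiId: 'si-1' }, // skipped: duplicate
]);
```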

7. Network Connection Relationships (cirelation_processor.js)

Processes ServerNetworkConnection records
Creates "Connected To" relationships between servers
Enriches relationships with process information:
- Source process name
- Target process name
- Port and protocol info

Phase 5: AI Enrichment (Optional)

8. AI Insights Generation (ciai_insights.js)

If AWS Bedrock credentials are configured:
- Generates AI insights for Server CIs
- Analyzes server role, criticality, dependencies
- Stores insights in AiInsights collection

9. AI Relationship Analysis (ciairelation_processor.js)

If enabled, analyzes network connections using AI
Identifies application protocols and communication patterns
Enriches relationships with AI-discovered metadata

Phase 6: Batch Processing and Hierarchy Creation

These processes run separately as scheduled jobs or manual triggers:

10. Software Instance to Software Mapping (softwareInstanceToSoftwareProcessor.js)

Finds Software Instance CIs without Software relationships
For each unlinked instance:
a. Extracts software name, vendor, version from CI attributes
b. Uses AWS Bedrock AI (Claude 3 Sonnet) to identify generic software product
c. AI prompt specifically designed to extract product name without version
d. Falls back to pattern-based extraction if AI unavailable
e. Creates Software CI if not exists (tenant-aware)
f. Updates Software Instance with software reference
g. Creates "Instance Of" relationship
Example: "Microsoft SQL Server 2019 (15.0.2000.5)" → "Microsoft SQL Server"
Batch processing: Default 10 items per run, configurable
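
The pattern-based fallback in step (d) is not spelled out here, so the following is only one plausible sketch of it. The regular expressions and the `softwareId` field are assumptions; the default batch size of 10 comes from the text above.

```javascript
// Derive a generic product name by stripping version-like details.
// This is a naive heuristic: names that embed year ranges or unusual
// version formats would need extra rules.
function extractGenericProductName(instanceName) {
  return instanceName
    .replace(/\([^)]*\)/g, ' ')                     // "(15.0.2000.5)", "(x64)"
    .replace(/\b(x64|x86|32-bit|64-bit)\b/gi, ' ')  // bare architecture markers
    .replace(/\bv?\d+(\.\d+)+\b/gi, ' ')            // dotted version numbers
    .replace(/\b(19|20)\d{2}\b/g, ' ')              // four-digit release years
    .replace(/\s+/g, ' ')
    .trim();
}

const DEFAULT_BATCH_SIZE = 10;

// Pick the next batch of Software Instance CIs that have no Software link.
function nextBatch(instances, batchSize = DEFAULT_BATCH_SIZE) {
  return instances.filter((ci) => !ci.softwareId).slice(0, batchSize);
}
```

For example, `extractGenericProductName('Microsoft SQL Server 2019 (15.0.2000.5)')` yields `'Microsoft SQL Server'`, matching the example above.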

11. Software to Family Classification (softwareToFamilyProcessor.js)

Finds Software CIs without family relationships
Uses AWS Bedrock AI to classify software into 13 standard families:
- Browsers, Databases, Development Tools, Runtime Environments
- Office Productivity, Security Software, Operating Systems
- Virtualization, Web Servers, Monitoring Tools
- Multimedia, Utilities, Enterprise Applications
AI prompt includes guidelines for each family category
Falls back to pattern-based classification if AI unavailable
Creates Software Family CIs (tenant-aware with caching)
Updates Software CI with family reference
Creates "Member Of" relationships
Example: "Microsoft SQL Server" → "Databases"
Family cache: 1-hour TTL, tenant-isolated
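
A minimal sketch of the classification fallback and the tenant-isolated cache might look like this. The 13 family names and the 1-hour TTL come from the text above; the pattern table, function names, and cache layout are hypothetical and far smaller than a production rule set would be.

```javascript
const STANDARD_FAMILIES = [
  'Browsers', 'Databases', 'Development Tools', 'Runtime Environments',
  'Office Productivity', 'Security Software', 'Operating Systems',
  'Virtualization', 'Web Servers', 'Monitoring Tools',
  'Multimedia', 'Utilities', 'Enterprise Applications',
];

// Illustrative fallback patterns, checked in order.
const FAMILY_PATTERNS = [
  [/\b(sql server|mysql|postgresql|mongodb)\b/i, 'Databases'],
  [/\b(chrome|firefox|edge|safari|opera)\b/i, 'Browsers'],
  [/\b(apache|nginx|iis)\b/i, 'Web Servers'],
  [/\b(visual studio|vs code|intellij)\b/i, 'Development Tools'],
  [/\b(zabbix|nagios)\b/i, 'Monitoring Tools'],
];

function classifyByPattern(softwareName) {
  for (const [pattern, family] of FAMILY_PATTERNS) {
    if (pattern.test(softwareName)) return family;
  }
  return null; // leave unclassified rather than guess
}

// Accept the AI answer only if it is one of the standard families;
// otherwise fall back to pattern matching.
function resolveFamily(aiAnswer, softwareName) {
  if (STANDARD_FAMILIES.includes(aiAnswer)) return aiAnswer;
  return classifyByPattern(softwareName);
}

// Cache of Software Family CI ids, keyed by tenant so that one tenant
// never sees another tenant's family CIs. Entries expire after ttlMs.
class FamilyCache {
  constructor(ttlMs = 60 * 60 * 1000, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
    this.entries = new Map();
  }

  get(tenantId, family) {
    const key = `${tenantId}:${family}`;
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() - entry.storedAt >= this.ttlMs) {
      this.entries.delete(key); // expired
      return undefined;
    }
    return entry.value;
  }

  set(tenantId, family, value) {
    this.entries.set(`${tenantId}:${family}`, { value, storedAt: this.now() });
  }
}
```

The clock is injectable only so that expiry can be exercised without waiting an hour.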

12. CPE Data Enrichment (softwareInstanceCPEProcessor.js)

Finds Software Instance CIs without CPE data
Implements multiple search strategies (in priority order):
1. Version-based search (most effective)
2. Vendor + Product keyword search
3. Full name search (fallback)
Advanced features:
- Vendor normalization mappings (e.g., "Microsoft Corporation" → "microsoft")
- Product name cleaning (removes versions, editions)
- Special handling for Visual C++ year-to-version mapping
- Match scoring algorithm with weighted criteria
NVD API integration:
- Uses NVD REST API v2.0
- Rate limiting: 0.7 s delay between requests with an API key, 7 s without
- Retry logic and error handling
Creates CPEData entries if not exist
Updates Software Instances with CPE references
Enables vulnerability tracking and compliance checks
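
The search order and request pacing can be sketched as below. The strategy order, the "Microsoft Corporation" → "microsoft" alias, and the 0.7 s / 7 s delays come from the text above; the other alias, the edition list, and all function names are assumptions. For context, NVD publishes limits of 50 requests per rolling 30 seconds with an API key and 5 without, so these delays sit slightly above the minimum spacing. No network call is made in this sketch.

```javascript
// Vendor aliases: the first entry is from this document, the second is illustrative.
const VENDOR_ALIASES = {
  'microsoft corporation': 'microsoft',
  'oracle corporation': 'oracle',
};

function normalizeVendor(vendor) {
  const key = (vendor || '').trim().toLowerCase();
  return VENDOR_ALIASES[key] || key;
}

// Remove versions, release years, and edition words from a product name.
function cleanProductName(name) {
  return name
    .replace(/\([^)]*\)/g, ' ')
    .replace(/\bv?\d+(\.\d+)+\b/g, ' ')
    .replace(/\b(19|20)\d{2}\b/g, ' ')
    .replace(/\b(express|standard|enterprise|professional|developer)( edition)?\b/gi, ' ')
    .replace(/\s+/g, ' ')
    .trim();
}

// Return the searches to attempt, most specific first.
function buildSearchStrategies({ name, vendor, version }) {
  const vendorKey = normalizeVendor(vendor);
  const product = cleanProductName(name);
  // Avoid repeating the vendor when the product name already starts with it.
  const bareProduct =
    vendorKey && product.toLowerCase().startsWith(`${vendorKey} `)
      ? product.slice(vendorKey.length + 1)
      : product;

  const strategies = [];
  if (version) strategies.push({ kind: 'version', keywords: `${bareProduct} ${version}` });
  if (vendorKey) strategies.push({ kind: 'vendor+product', keywords: `${vendorKey} ${bareProduct}` });
  strategies.push({ kind: 'full-name', keywords: name });
  return strategies;
}

// Delay to wait between NVD requests, in milliseconds.
function requestDelayMs(hasApiKey) {
  return hasApiKey ? 700 : 7000;
}
```

A caller would try each strategy in turn, waiting `requestDelayMs(...)` between requests and stopping at the first strategy that returns candidates worth scoring.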

Phase 7: Manual Mapping and Enrichment

13. Process Mapping UI (CMDBSettings.tsx)

The Process Mapping Configuration UI provides several modes:

a. Enrich Mode:

Finds software mappings without process names
Uses three strategies:
1. Pattern matching against known process mappings:
  - Visual C++ Redistributables
  - Browsers (Chrome, Firefox, Edge)
  - Development tools (VS Code, Visual Studio)
  - Databases (SQL Server, MySQL, PostgreSQL)
  - System utilities and services
2. Discovery from existing CI relationships
3. AI identification using Claude service
Updates mappings with discovered process names
Stores platform-specific process names (Windows, macOS, Linux)
Tracks Windows service names separately

b. Apply Mode:

Finds unmapped CI relationships (those with process names but no application info)
Looks up software mappings by process name
Updates relationships with:
- Application name and vendor
- Software family information
- aiEnhanced flag set to true
Special handling for Windows OS processes:
- Can apply "Microsoft OS Services" mapping to all Windows system processes
- Includes: svchost.exe, services.exe, lsass.exe, csrss.exe, etc.
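
Apply Mode can be pictured with the following sketch, which works on plain objects. The "Microsoft OS Services" mapping name, the listed system processes, and the `aiEnhanced` flag come from the text above; the other field names, the vendor and family given to the OS mapping, and the case-insensitive lookup are assumptions.

```javascript
// Windows system processes named in this document (the real list is longer).
const WINDOWS_OS_PROCESSES = new Set([
  'svchost.exe', 'services.exe', 'lsass.exe', 'csrss.exe',
]);

// Hypothetical shape for the catch-all OS mapping.
const OS_MAPPING = {
  softwareName: 'Microsoft OS Services',
  vendor: 'Microsoft',
  family: 'Operating Systems',
};

// Fill in application details on relationships that only carry a process
// name. Returns the number of relationships updated.
function applyMappings(relationships, mappings) {
  // Index mappings by lower-cased process name for case-insensitive lookup.
  const byProcess = new Map();
  for (const mapping of mappings) {
    for (const proc of mapping.processNames || []) {
      byProcess.set(proc.toLowerCase(), mapping);
    }
  }

  let updated = 0;
  for (const rel of relationships) {
    if (!rel.processName || rel.applicationName) continue; // already mapped

    const proc = rel.processName.toLowerCase();
    const mapping =
      byProcess.get(proc) || (WINDOWS_OS_PROCESSES.has(proc) ? OS_MAPPING : undefined);
    if (!mapping) continue; // stays unmapped for later enrichment

    rel.applicationName = mapping.softwareName;
    rel.vendor = mapping.vendor;
    rel.softwareFamily = mapping.family;
    rel.aiEnhanced = true;
    updated += 1;
  }
  return updated;
}
```

Lower-casing both sides of the lookup avoids the case-sensitivity problems that otherwise leave processes such as `SVCHOST.EXE` unmapped.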

c. Fix Relationships Mode:

Creates missing Software Family CIs
Repairs broken software hierarchy
Ensures all relationships are properly linked

14. AI-Powered Process Identification

When enriching mappings, the system uses Claude AI to:

Identify process names for software products
Provide OS-specific process names (Windows, Linux, macOS)
Return structured data including:
- Main process names by platform
- Associated Windows services
- Process type (application/service/library)
- Additional notes or special instructions
Return confidence scores for accuracy
Only applies mappings with confidence ≥ 0.7
Stores metadata about source (pattern/discovery/ai)
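
The confidence gate described above can be illustrated as follows. The 0.7 threshold and the source labels come from the text; the shape of the result object (`processNames`, `windowsServices`, `processType`, `notes`) is a hypothetical reading of the structured data listed above, not the service's actual response format.

```javascript
const MIN_CONFIDENCE = 0.7;

// Example of what a structured identification result might look like.
const exampleResult = {
  softwareName: 'Microsoft SQL Server',
  processNames: { windows: ['sqlservr.exe'], linux: ['sqlservr'], macos: [] },
  windowsServices: ['MSSQLSERVER'],
  processType: 'service',
  notes: 'Named instances use a different service name.',
  confidence: 0.92,
};

// Keep only results confident enough to apply, and record where they came from.
function selectApplicableResults(results, source = 'ai') {
  return results
    .filter((r) => typeof r.confidence === 'number' && r.confidence >= MIN_CONFIDENCE)
    .map((r) => ({ ...r, source }));
}

const applicable = selectApplicableResults([
  exampleResult,
  { softwareName: 'Obscure Tool', confidence: 0.4 },
  { softwareName: 'No Score' }, // missing confidence is treated as not applicable
]);
```

Results below the threshold are not discarded by this sketch's caller necessarily; they could be kept for manual review, which the document leaves open.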

Key Functions and Their Roles

Core Processing Functions

linkOrCreateSoftwareInstance() (softwareInstanceProcessor.js)
- Creates Software Instance CIs with tenant inheritance
- Links to existing mappings and increments usage
- Can create Software CIs when mapping includes software info
- Handles CPE linkage if available
- Uses upsert pattern for idempotency
- Triggers on every ServerAppsInstalled creation
findBestMatch() (SoftwareMapping model)
- Intelligent software name matching with multiple strategies:
  - Exact instance name match (preserves versions)
  - Normalized name match (removes versions, architecture)
  - Vendor prefix matching
  - Regex pattern matching from instancePatterns array
- Only returns active mappings (status = 'active')
- Returns best mapping or null
createOrUpdateMapping() (SoftwareMapping model)
- Creates new software mappings
- Updates existing mappings
- Tracks usage and confidence
enrichMappingsWithProcessNames() (processApplicationMapping.js)
- Enriches mappings with process information
- Uses patterns, discovery, and AI
- Updates mapping metadata

AI Integration Functions

identifyProcessNamesWithAI()
- Calls Claude API for process identification
- Provides context about software (name, vendor, category)
- Returns structured process information:
  - Platform-specific executables
  - Associated services
  - Process type classification
- Handles API errors gracefully
classifySoftwareFamily()
- Uses AWS Bedrock AI to categorize software
- Validates against 13 standard families
- Falls back to pattern-based classification
- Creates family hierarchy with tenant isolation
- Maintains consistency through caching

Trigger Mechanisms

Automatic Triggers

On RawScanData import → WMI processor → Collection processor
On ServerAppsInstalled save → Software Instance creation (synchronous)
On Software Instance CI save → Hooks trigger:
- CPE processor (if no CPE assigned)
- Software linking processor (if no software relationship)
On Software CI save → Family processor (if no family assigned)
Post-scan completion → Installation relationship processor

Manual Triggers

Process Mapping UI → User-initiated enrichment/application
Scheduled Jobs → Batch processors for hierarchy creation
API Endpoints:
- /api/cmdb-settings/process-mapping/run - Comprehensive processing
- /api/process-application-mapping/apply - Direct mapping application
- /api/process-identification/* - Process identification operations
Migration Scripts → One-time data migration and cleanup

Event-Driven Triggers

Model hooks → Pre/post save operations
Queue processing → Asynchronous job execution
WebSocket events → Real-time updates

Data Flow Summary

Network Scan → RawScanData → WMI Processor → Collection Processor
                                                      ↓
                                            ServerAppsInstalled
                                                      ↓
                                         Software Instance Processor
                                                      ↓
                                            Software Instance CI
                                                      ↓
                                    ┌─────────────────┴─────────────────┐
                                    ↓                                   ↓
                          Relationship Processors              Batch Processors
                                    ↓                                   ↓
                            "Installed On"                    Software Hierarchy
                            "Connected To"                    (Software → Family)
                                                                       ↓
                                                               AI Enrichment
                                                              (Process Names,
                                                               Classification)

Best Practices

Maintain Software Mappings:
- Regularly review and update mappings for accuracy
- Use pattern-based mappings for common software
- Set appropriate confidence scores
Use AI Enrichment:
- Enable AWS Bedrock for automatic classification
- Configure API keys for better rate limits
- Monitor AI usage and costs
Monitor Pending Mappings:
- Check for pending mappings that need review
- Use bulk operations for efficiency
- Track mapping usage statistics
Run Batch Processors:
- Schedule regular runs of hierarchy creation jobs
- Configure appropriate batch sizes
- Monitor success/failure rates
Leverage Manual Tools:
- Use Process Mapping UI for complex scenarios
- Apply known mappings to Windows OS processes
- Track unmapped processes for future enhancement
Performance Optimization:
- Use caching for frequently accessed data
- Implement rate limiting for external APIs
- Batch process large datasets

Troubleshooting

Common Issues

Missing Software Instance CIs
- Check if SoftwareMapping exists and is active
- Verify the SoftwareInstance CIType is defined (check for naming variations)
- Ensure parent CI has customer/tenant info
- Review processor logs for errors
Incomplete Hierarchy
- Run "Fix Relationships" mode in CMDBSettings
- Check AWS Bedrock credentials and permissions
- Verify Software/Family CITypes exist
- Check for rate limiting on AI APIs
- Review fallback pattern matching
Process Mapping Failures
- Review Claude AI API credentials
- Check for pattern matching conflicts
- Verify CI relationship data integrity
- Ensure process names are properly extracted
- Check for case sensitivity issues

Monitoring Points

SoftwareMapping collection for pending entries and usage statistics
CI relationships without application data (use unmapped-stats endpoint)
Software CIs without family relationships
Software Instance CIs without software relationships
Process enrichment statistics in CMDBSettings UI
CPE matching success rates
AI API usage and rate limiting
Batch processor performance metrics

Additional Capabilities

Software Mapping Management

Pattern-Based Matching:
- Support for regex patterns in instancePatterns array
- Version extraction patterns for parsing version numbers
- Architecture removal patterns (x64, x86, 32-bit, 64-bit)
Mapping Metadata:
- Usage count tracking for popularity analysis
- Confidence scores (0-100) for mapping quality
- Source tracking (manual, ai, import, system, pending)
- Status management (active, pending, review)
- Last verified timestamps
Platform-Specific Process Management:
- Separate process names for Windows, macOS, Linux
- Windows service name tracking
- Process type classification (application/service/library)
- Process details with notes and special instructions

API Endpoints Summary

CMDB Settings (/api/cmdb-settings/):
- GET /process-mapping/analyze - Current mapping statistics
- GET /software-families - Available software families
- POST /process-mapping/run - Execute process mapping
Process Application Mapping (/api/process-application-mapping/):
- POST /apply - Apply mappings to relationships
- GET /unmapped-stats - Statistics on unmapped processes
Process Identification (/api/process-identification/):
- GET /software-without-processes - List unmapped software
- POST /identify/:softwareId - Identify processes for software
- GET /process-names/:softwareId - Get process names
- POST /bulk-identify - Batch process identification
Software Mapping (additional routes):
- Standard CRUD operations for manual mapping management

Bulk Operations and Scripts

Analysis Scripts:
- check-mapping-summary.js - Overall mapping statistics
- simple-mapping-check.js - Quick validation checks
- analyze-software-instance-mappings.js - Instance analysis
Migration Scripts:
- migrate-software-to-mapping-catalog.js - Migrate existing data
- bulk-apply-process-mappings.js - Batch apply mappings
- create-missing-software-mappings.js - Fill gaps in mappings
Cleanup Scripts:
- clean-softwaremapping-duplicates.js - Remove duplicates
- fix-software-relationships.js - Repair broken links
- update-relationships-software-family.js - Update family links

Integration with Model Hooks

The system uses Mongoose model hooks for automation:

Software Instance Hooks V2:
- Post-save: Triggers CPE processor if no CPE assigned
- Post-save: Triggers software linking if no software relationship
Software Hooks:
- Post-save: Triggers family assignment if no family
- Pre-save: Validates required fields
ServerAppsInstalled Hooks:
- Post-save: Triggers software instance creation inline
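
The decisions these hooks make can be summarized in a single pure function. This is a sketch of the logic only: the processor names come from this document, while the `type`, `cpeId`, `softwareId`, and `familyId` fields are hypothetical. In a Mongoose model, a function like this would typically be called from a handler registered with `schema.post('save', ...)`.

```javascript
// Decide which follow-up processors a post-save hook should trigger for a CI.
function processorsToTrigger(ci) {
  const queue = [];
  if (ci.type === 'Software Instance') {
    if (!ci.cpeId) queue.push('softwareInstanceCPEProcessor');
    if (!ci.softwareId) queue.push('softwareInstanceToSoftwareProcessor');
  } else if (ci.type === 'Software') {
    if (!ci.familyId) queue.push('softwareToFamilyProcessor');
  }
  return queue;
}

// A freshly created Software Instance CI needs both CPE lookup and linking.
const freshInstanceQueue = processorsToTrigger({ type: 'Software Instance' });
```

Because each processor is only queued while its target field is empty, saving a fully linked CI again triggers nothing, which keeps the hook chain from looping.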

Known Process Mappings

The system includes predefined mappings for common software:

Microsoft Visual C++ Redistributables (2005-2022)
Browsers: Chrome, Firefox, Edge, Safari, Opera
Development Tools: Visual Studio, VS Code, IntelliJ IDEA
Databases: SQL Server, MySQL, PostgreSQL, MongoDB
Web Servers: Apache, Nginx, IIS
Runtime Environments: Java, .NET, Node.js, Python
Monitoring Tools: Motadata Agent, Zabbix, Nagios
Office Software: Microsoft Office, LibreOffice
Security Software: Antivirus, firewalls
Utilities: 7-Zip, WinRAR, Notepad++

Conclusion

The software identification and mapping system discovers, normalizes, and organizes software data in the CMDB. It combines automatic discovery, mapping lookups, and AI-powered enrichment to keep the software inventory and its relationships accurate with minimal manual intervention. The API endpoints, bulk scripts, and model hooks support large deployments while preserving data quality and consistency.
