Software Identification and Mapping System Workflow

Overview

This document describes how the CMDB system identifies software discovered during network scans, maps it to normalized software records, and creates the relationships between them. The process runs in several stages, from initial discovery through AI-powered enrichment and relationship creation.

System Architecture

The software identification system consists of several key components:

  1. Data Models:

    • RawScanData - Raw discovery data from network scans
    • ServerAppsInstalled - Installed applications on servers
    • SoftwareMapping - Normalized software name mappings with pattern support
    • CI - Configuration Items (Software Instance, Software, Software Family)
    • CIRelationship - Relationships between CIs
    • CPEData - Common Platform Enumeration entries for vulnerability tracking
  2. Scan Processors: Automated processors that run during scan import

  3. Batch Processors: Scheduled jobs for AI enrichment and hierarchy creation

  4. Manual Tools: UI components for manual mapping and enrichment

  5. Controllers & Services: API endpoints for process identification and mapping

Timeline of Software Discovery and Mapping

Phase 1: Initial Discovery and Data Import

1. Network Scan Execution

  • Discovery agents scan network devices using WMI, SSH, SNMP, etc.
  • Agents collect server information, installed applications, processes, network connections
  • Data is sent to backend as RawScanData with status = 'pending'

2. WMI Processor Activation (wmi_processor.js)

  • Triggers when new RawScanData arrives
  • Creates Server CI for each discovered device
  • Stores server details (OS, hardware, network info) in CI custom fields
  • Calls ciCollectionProcessor to process collections
  • Updates RawScanData status to 'processed'

Phase 2: Collection Processing and Software Record Creation

3. Collection Processor (cicollection_processor.js)

  • Processes all collections within the scan data:
    • Creates ServerDisk records
    • Creates ServerUserAccount records
    • Creates ServerNetworkConnection records
    • Creates ServerNetworkAdapter records
    • Creates ServerProcess records
    • Creates ServerAppsInstalled records ← This is where software discovery begins

4. ServerAppsInstalled Creation

  • For each installed application found:
    {
    ci_id: [Server CI ID],
    name: "Microsoft SQL Server 2019",
    vendor: "Microsoft Corporation",
    version: "15.0.2000.5",
    install_date: "2023-01-15",
    install_location: "C:\\Program Files\\Microsoft SQL Server"
    }
  • Immediately triggers softwareInstanceProcessor.linkOrCreateSoftwareInstance()

Phase 3: Software Instance CI Creation

5. Software Instance Processor (softwareInstanceProcessor.js)

  • For each ServerAppsInstalled record:

    a. Check for existing mapping:

    • Looks up SoftwareMapping collection
    • Uses findBestMatch() to find mapping by:
      • Exact instance name match
      • Normalized name match (removes versions, architecture)
      • Vendor + name combination
      • Regex pattern matching

    b. If mapping exists:

    • Creates Software Instance CI using mapped software/family info
    • Inherits customer/tenant from parent Server CI
    • Updates ServerAppsInstalled.softwareInstanceCiId
    • Increments mapping usage count
    • Creates Software CI if needed (when mapping includes software info)

    c. If no mapping exists:

    • Creates a "pending" SoftwareMapping entry
    • Creates Software Instance CI with basic info
    • Marks for later AI enrichment

    d. CPE Integration:

    • If cpe_name provided, looks up CPEData
    • Links Software Instance to CPE entry
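The lookup order in step (a) can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual findBestMatch() implementation: the mapping fields (instanceName, normalizedName, vendor, instancePatterns, status), the normalizeName helper, and the interpretation of "vendor + name combination" as a vendor match plus a name prefix match are all hypothetical.

```javascript
// Hypothetical normalizer: lowercases, strips architecture tags and dotted versions.
function normalizeName(name) {
  return name
    .toLowerCase()
    .replace(/\b(x64|x86|32-bit|64-bit)\b/g, " ")
    .replace(/\bv?\d+(\.\d+)+\b/g, " ") // dotted versions such as 15.0.2000.5
    .replace(/[()]/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

// Tries each strategy in order and returns the first active mapping that matches, or null.
function findBestMatch(mappings, instanceName, vendor) {
  const active = mappings.filter((m) => m.status === "active");
  const normalized = normalizeName(instanceName);

  // 1. Exact instance name match (version preserved).
  let hit = active.find((m) => m.instanceName === instanceName);
  if (hit) return hit;

  // 2. Normalized name match (versions and architecture removed).
  hit = active.find((m) => m.normalizedName === normalized);
  if (hit) return hit;

  // 3. Vendor + name combination (assumed: same vendor, name starts with the mapped name).
  if (vendor) {
    const v = vendor.toLowerCase();
    hit = active.find(
      (m) =>
        m.vendor &&
        m.normalizedName &&
        m.vendor.toLowerCase() === v &&
        normalized.startsWith(m.normalizedName)
    );
    if (hit) return hit;
  }

  // 4. Regex pattern matching against the stored pattern strings.
  hit = active.find((m) =>
    (m.instancePatterns || []).some((p) => new RegExp(p, "i").test(instanceName))
  );
  return hit || null;
}
```

Returning null rather than throwing lets the caller fall through to step (c) and create a pending mapping.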

Phase 4: Relationship Creation

6. Software Installation Relationships (software_installation_relation_processor.js)

  • Finds all ServerAppsInstalled records with softwareInstanceCiId
  • Creates "Installed On" relationships:
    • Source: Software Instance CI
    • Target: Server CI
    • Type: "Installed On"
  • Links software instances to their host servers
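As a rough sketch of this step, the function below turns ServerAppsInstalled records into "Installed On" relationship objects. The ci_id and softwareInstanceCiId fields come from the record shape described above; the relationship field names (sourceCiId, targetCiId, type) and the de-duplication by source/target pair are assumptions for illustration only.

```javascript
// Builds one "Installed On" relationship per linked ServerAppsInstalled record.
function buildInstalledOnRelationships(appsInstalled) {
  const seen = new Set(); // assumed safeguard against duplicate source/target pairs
  const relationships = [];
  for (const app of appsInstalled) {
    if (!app.softwareInstanceCiId) continue; // no Software Instance CI yet, nothing to link
    const key = `${app.softwareInstanceCiId}->${app.ci_id}`;
    if (seen.has(key)) continue;
    seen.add(key);
    relationships.push({
      sourceCiId: app.softwareInstanceCiId, // Software Instance CI
      targetCiId: app.ci_id,                // Server CI
      type: "Installed On",
    });
  }
  return relationships;
}
```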

7. Network Connection Relationships (cirelation_processor.js)

  • Processes ServerNetworkConnection records
  • Creates "Connected To" relationships between servers
  • Enriches relationships with process information:
    • Source process name
    • Target process name
    • Port and protocol info

Phase 5: AI Enrichment (Optional)

8. AI Insights Generation (ciai_insights.js)

  • If AWS Bedrock credentials are configured:
    • Generates AI insights for Server CIs
    • Analyzes server role, criticality, dependencies
    • Stores insights in AiInsights collection

9. AI Relationship Analysis (ciairelation_processor.js)

  • If enabled, analyzes network connections using AI
  • Identifies application protocols and communication patterns
  • Enriches relationships with AI-discovered metadata

Phase 6: Batch Processing and Hierarchy Creation

These processes run separately as scheduled jobs or manual triggers:

10. Software Instance to Software Mapping (softwareInstanceToSoftwareProcessor.js)

  • Finds Software Instance CIs without Software relationships
  • For each unlinked instance:
    a. Extracts software name, vendor, version from CI attributes
    b. Uses AWS Bedrock AI (Claude 3 Sonnet) to identify generic software product
    c. AI prompt specifically designed to extract product name without version
    d. Falls back to pattern-based extraction if AI unavailable
    e. Creates Software CI if not exists (tenant-aware)
    f. Updates Software Instance with software reference
    g. Creates "Instance Of" relationship
  • Example: "Microsoft SQL Server 2019 (15.0.2000.5)" → "Microsoft SQL Server"
  • Batch processing: Default 10 items per run, configurable
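The pattern-based fallback for extracting a generic product name could look roughly like the sketch below. The regular expressions are assumptions chosen to reproduce the example above; they are not taken from softwareInstanceToSoftwareProcessor.js.

```javascript
// Hypothetical fallback: strips versions, release years, and architecture tags
// from an instance name to approximate the generic product name.
function extractProductName(instanceName) {
  return instanceName
    .replace(/\([^)]*\)/g, " ")                     // parenthesized text, e.g. "(15.0.2000.5)"
    .replace(/\b(x64|x86|32-bit|64-bit)\b/gi, " ")  // architecture tags
    .replace(/\bv?\d+(\.\d+)+\b/g, " ")             // dotted versions, e.g. 115.0.2
    .replace(/\b(19|20)\d{2}\b/g, " ")              // four-digit release years, e.g. 2019
    .replace(/\s+/g, " ")
    .trim();
}

console.log(extractProductName("Microsoft SQL Server 2019 (15.0.2000.5)"));
// prints "Microsoft SQL Server"
```

Stripping release years is a deliberate simplification here. For products where the year distinguishes separate packages, such as the Visual C++ Redistributables, a real fallback would need an exception list.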

11. Software to Family Classification (softwareToFamilyProcessor.js)

  • Finds Software CIs without family relationships
  • Uses AWS Bedrock AI to classify software into 13 standard families:
    • Browsers, Databases, Development Tools, Runtime Environments
    • Office Productivity, Security Software, Operating Systems
    • Virtualization, Web Servers, Monitoring Tools
    • Multimedia, Utilities, Enterprise Applications
  • AI prompt includes guidelines for each family category
  • Falls back to pattern-based classification if AI unavailable
  • Creates Software Family CIs (tenant-aware with caching)
  • Updates Software CI with family reference
  • Creates "Member Of" relationships
  • Example: "Microsoft SQL Server" → "Databases"
  • Family cache: 1-hour TTL, tenant-isolated
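A tenant-isolated cache with a 1-hour TTL, as described for Software Family CIs, might be structured like the sketch below. The class name, key format, and injectable clock are assumptions made so the example is self-contained and testable; the real processor may cache differently.

```javascript
// Minimal in-memory cache keyed by tenant and family name, with expiry.
class FamilyCache {
  constructor(ttlMs = 60 * 60 * 1000, now = () => Date.now()) {
    this.ttlMs = ttlMs; // 1 hour by default
    this.now = now;     // injectable clock, useful for tests
    this.entries = new Map();
  }

  // Including the tenant in the key keeps one tenant's families invisible to another.
  key(tenantId, familyName) {
    return `${tenantId}::${familyName.toLowerCase()}`;
  }

  get(tenantId, familyName) {
    const k = this.key(tenantId, familyName);
    const entry = this.entries.get(k);
    if (!entry) return undefined;
    if (this.now() - entry.storedAt >= this.ttlMs) {
      this.entries.delete(k); // expired, force a fresh lookup
      return undefined;
    }
    return entry.value;
  }

  set(tenantId, familyName, value) {
    this.entries.set(this.key(tenantId, familyName), { value, storedAt: this.now() });
  }
}
```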

12. CPE Data Enrichment (softwareInstanceCPEProcessor.js)

  • Finds Software Instance CIs without CPE data
  • Implements multiple search strategies (in priority order):
    1. Version-based search (most effective)
    2. Vendor + Product keyword search
    3. Full name search (fallback)
  • Advanced features:
    • Vendor normalization mappings (e.g., "Microsoft Corporation" → "microsoft")
    • Product name cleaning (removes versions, editions)
    • Special handling for Visual C++ year-to-version mapping
    • Match scoring algorithm with weighted criteria
  • NVD API integration:
    • Uses NVD REST API v2.0
    • Rate limiting: 0.7 s between requests with an API key, 7 s without
    • Retry logic and error handling
  • Creates CPEData entries if not exist
  • Updates Software Instances with CPE references
  • Enables vulnerability tracking and compliance checks
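The request pacing and vendor normalization described above can be sketched as follows. The endpoint is the public NVD CPE API v2.0 and the apiKey header is the one NVD documents; the function names, the single-entry vendor table, and the resultsPerPage value are assumptions, and the retry logic mentioned above is left out for brevity.

```javascript
const NVD_CPE_ENDPOINT = "https://services.nvd.nist.gov/rest/json/cpes/2.0";

// Hypothetical vendor table; only the example from the text is included.
const VENDOR_ALIASES = { "microsoft corporation": "microsoft" };

function normalizeVendor(vendor) {
  const key = vendor.trim().toLowerCase();
  return VENDOR_ALIASES[key] || key;
}

// Delay between consecutive NVD requests: 0.7 s with an API key, 7 s without.
function requestDelayMs(apiKey) {
  return apiKey ? 700 : 7000;
}

function buildCpeSearchUrl(vendor, product) {
  const url = new URL(NVD_CPE_ENDPOINT);
  url.searchParams.set("keywordSearch", `${normalizeVendor(vendor)} ${product}`.trim());
  url.searchParams.set("resultsPerPage", "20");
  return url.toString();
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Runs the searches one at a time, pausing between requests to respect the rate limit.
async function searchSequentially(queries, apiKey, fetchImpl = fetch) {
  const results = [];
  for (let i = 0; i < queries.length; i++) {
    if (i > 0) await sleep(requestDelayMs(apiKey));
    const headers = apiKey ? { apiKey } : {};
    const res = await fetchImpl(buildCpeSearchUrl(queries[i].vendor, queries[i].product), { headers });
    results.push(await res.json());
  }
  return results;
}
```

Passing fetchImpl as a parameter keeps the network call replaceable, which makes the pacing logic easy to exercise without contacting NVD.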

Phase 7: Manual Mapping and Enrichment

13. Process Mapping UI (CMDBSettings.tsx)

The Process Mapping Configuration UI provides several modes:

a. Enrich Mode:

  • Finds software mappings without process names
  • Uses three strategies:
    1. Pattern matching against known process mappings:
      • Visual C++ Redistributables
      • Browsers (Chrome, Firefox, Edge)
      • Development tools (VS Code, Visual Studio)
      • Databases (SQL Server, MySQL, PostgreSQL)
      • System utilities and services
    2. Discovery from existing CI relationships
    3. AI identification using Claude service
  • Updates mappings with discovered process names
  • Stores platform-specific process names (Windows, macOS, Linux)
  • Tracks Windows service names separately

b. Apply Mode:

  • Finds unmapped CI relationships (those with process names but no application info)
  • Looks up software mappings by process name
  • Updates relationships with:
    • Application name and vendor
    • Software family information
    • Sets aiEnhanced flag to true
  • Special handling for Windows OS processes:
    • Can apply "Microsoft OS Services" mapping to all Windows system processes
    • Includes: svchost.exe, services.exe, lsass.exe, csrss.exe, etc.
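Apply Mode can be pictured with the sketch below, which fills in application details on relationships that have a process name but no application info. Only the aiEnhanced flag and the listed Windows processes come from the description above; the relationship and mapping field names, the Map keyed by lowercased process name, and the osMapping parameter are assumptions.

```javascript
// Subset of Windows system processes named in the text.
const WINDOWS_OS_PROCESSES = new Set(["svchost.exe", "services.exe", "lsass.exe", "csrss.exe"]);

// Returns a new array; relationships that cannot be mapped are returned unchanged.
function applyMappings(relationships, mappingsByProcess, osMapping) {
  return relationships.map((rel) => {
    if (!rel.processName || rel.applicationName) return rel; // nothing to map or already mapped
    const key = rel.processName.toLowerCase(); // case-insensitive lookup
    const mapping =
      mappingsByProcess.get(key) ||
      (osMapping && WINDOWS_OS_PROCESSES.has(key) ? osMapping : undefined);
    if (!mapping) return rel;
    return {
      ...rel,
      applicationName: mapping.name,
      vendor: mapping.vendor,
      softwareFamily: mapping.family,
      aiEnhanced: true,
    };
  });
}
```

Lowercasing the process name before lookup avoids misses caused by casing differences such as "SVCHOST.EXE" versus "svchost.exe".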

c. Fix Relationships Mode:

  • Creates missing Software Family CIs
  • Repairs broken software hierarchy
  • Ensures all relationships are properly linked

14. AI-Powered Process Identification

When enriching mappings, the system uses Claude AI to:

  • Identify process names for software products
  • Provide OS-specific process names (Windows, Linux, macOS)
  • Return structured data including:
    • Main process names by platform
    • Associated Windows services
    • Process type (application/service/library)
    • Additional notes or special instructions
  • Return confidence scores for accuracy
  • Only applies mappings with confidence ≥ 0.7
  • Stores metadata about source (pattern/discovery/ai)
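The confidence gate can be illustrated with a short filter. The result shape (softwareId, processNames, confidence) is hypothetical; only the 0.7 threshold and the source label come from the description above.

```javascript
const MIN_CONFIDENCE = 0.7;

// Keeps only AI results that meet the threshold and tags them with their source.
function selectApplicableResults(aiResults) {
  return aiResults
    .filter((r) => typeof r.confidence === "number" && r.confidence >= MIN_CONFIDENCE)
    .map((r) => ({
      softwareId: r.softwareId,
      processNames: r.processNames,
      confidence: r.confidence,
      source: "ai",
    }));
}
```

Results without a numeric confidence are dropped rather than treated as trusted. If mapping confidence is stored on the 0-100 scale listed under Mapping Metadata, the value would need converting before it is saved.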

Key Functions and Their Roles

Core Processing Functions

  1. linkOrCreateSoftwareInstance() (softwareInstanceProcessor.js)

    • Creates Software Instance CIs with tenant inheritance
    • Links to existing mappings and increments usage
    • Can create Software CIs when mapping includes software info
    • Handles CPE linkage if available
    • Uses upsert pattern for idempotency
    • Triggers on every ServerAppsInstalled creation
  2. findBestMatch() (SoftwareMapping model)

    • Intelligent software name matching with multiple strategies:
      • Exact instance name match (preserves versions)
      • Normalized name match (removes versions, architecture)
      • Vendor prefix matching
      • Regex pattern matching from instancePatterns array
    • Only returns active mappings (status = 'active')
    • Returns best mapping or null
  3. createOrUpdateMapping() (SoftwareMapping model)

    • Creates new software mappings
    • Updates existing mappings
    • Tracks usage and confidence
  4. enrichMappingsWithProcessNames() (processApplicationMapping.js)

    • Enriches mappings with process information
    • Uses patterns, discovery, and AI
    • Updates mapping metadata

AI Integration Functions

  1. identifyProcessNamesWithAI()

    • Calls Claude API for process identification
    • Provides context about software (name, vendor, category)
    • Returns structured process information:
      • Platform-specific executables
      • Associated services
      • Process type classification
    • Handles API errors gracefully
  2. classifySoftwareFamily()

    • Uses AWS Bedrock AI to categorize software
    • Validates against 13 standard families
    • Falls back to pattern-based classification
    • Creates family hierarchy with tenant isolation
    • Maintains consistency through caching

Trigger Mechanisms

Automatic Triggers

  1. On RawScanData import → WMI processor → Collection processor
  2. On ServerAppsInstalled save → Software Instance creation (synchronous)
  3. On Software Instance CI save → Hooks trigger:
    • CPE processor (if no CPE assigned)
    • Software linking processor (if no software relationship)
  4. On Software CI save → Family processor (if no family assigned)
  5. Post-scan completion → Installation relationship processor

Manual and Scheduled Triggers

  1. Process Mapping UI → User-initiated enrichment/application
  2. Scheduled Jobs → Batch processors for hierarchy creation
  3. API Endpoints:
    • /api/cmdb-settings/process-mapping/run - Comprehensive processing
    • /api/process-application-mapping/apply - Direct mapping application
    • /api/process-identification/* - Process identification operations
  4. Migration Scripts → One-time data migration and cleanup

Event-Driven Triggers

  1. Model hooks → Pre/post save operations
  2. Queue processing → Asynchronous job execution
  3. WebSocket events → Real-time updates

Data Flow Summary

Network Scan → RawScanData → WMI Processor → Collection Processor
                                                       ↓
                                              ServerAppsInstalled
                                                       ↓
                                          Software Instance Processor
                                                       ↓
                                             Software Instance CI
                                                       ↓
                                     ┌─────────────────┴─────────────────┐
                                     ↓                                   ↓
                          Relationship Processors                Batch Processors
                                     ↓                                   ↓
                              "Installed On"                    Software Hierarchy
                              "Connected To"                    (Software → Family)
                                                                         ↓
                                                                   AI Enrichment
                                                                  (Process Names,
                                                                  Classification)

Best Practices

  1. Maintain Software Mappings:

    • Regularly review and update mappings for accuracy
    • Use pattern-based mappings for common software
    • Set appropriate confidence scores
  2. Use AI Enrichment:

    • Enable AWS Bedrock for automatic classification
    • Configure API keys for better rate limits
    • Monitor AI usage and costs
  3. Monitor Pending Mappings:

    • Check for pending mappings that need review
    • Use bulk operations for efficiency
    • Track mapping usage statistics
  4. Run Batch Processors:

    • Schedule regular runs of hierarchy creation jobs
    • Configure appropriate batch sizes
    • Monitor success/failure rates
  5. Leverage Manual Tools:

    • Use Process Mapping UI for complex scenarios
    • Apply known mappings to Windows OS processes
    • Track unmapped processes for future enhancement
  6. Performance Optimization:

    • Use caching for frequently accessed data
    • Implement rate limiting for external APIs
    • Batch process large datasets

Troubleshooting

Common Issues

  1. Missing Software Instance CIs

    • Check if SoftwareMapping exists and is active
    • Verify the SoftwareInstance CIType is defined (check for naming variations)
    • Ensure parent CI has customer/tenant info
    • Review processor logs for errors
  2. Incomplete Hierarchy

    • Run "Fix Relationships" mode in CMDBSettings
    • Check AWS Bedrock credentials and permissions
    • Verify Software/Family CITypes exist
    • Check for rate limiting on AI APIs
    • Review fallback pattern matching
  3. Process Mapping Failures

    • Review Claude AI API credentials
    • Check for pattern matching conflicts
    • Verify CI relationship data integrity
    • Ensure process names are properly extracted
    • Check for case sensitivity issues

Monitoring Points

  • SoftwareMapping collection for pending entries and usage statistics
  • CI relationships without application data (use unmapped-stats endpoint)
  • Software CIs without family relationships
  • Software Instance CIs without software relationships
  • Process enrichment statistics in CMDBSettings UI
  • CPE matching success rates
  • AI API usage and rate limiting
  • Batch processor performance metrics

Additional Capabilities

Software Mapping Management

  1. Pattern-Based Matching:

    • Support for regex patterns in instancePatterns array
    • Version extraction patterns for parsing version numbers
    • Architecture removal patterns (x64, x86, 32-bit, 64-bit)
  2. Mapping Metadata:

    • Usage count tracking for popularity analysis
    • Confidence scores (0-100) for mapping quality
    • Source tracking (manual, ai, import, system, pending)
    • Status management (active, pending, review)
    • Last verified timestamps
  3. Platform-Specific Process Management:

    • Separate process names for Windows, macOS, Linux
    • Windows service name tracking
    • Process type classification (application/service/library)
    • Process details with notes and special instructions
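To make the metadata above concrete, the sketch below shows what a mapping document might contain, together with a small validator for the enumerated values. The field names and the example values are illustrative assumptions, not the actual SoftwareMapping schema; only the allowed sources, statuses, process types, and the 0-100 confidence range come from the list above.

```javascript
const SOURCES = ["manual", "ai", "import", "system", "pending"];
const STATUSES = ["active", "pending", "review"];
const PROCESS_TYPES = ["application", "service", "library"];

// Illustrative mapping document (hypothetical field names and values).
const exampleMapping = {
  softwareName: "Microsoft SQL Server",
  vendor: "Microsoft Corporation",
  instancePatterns: ["^Microsoft SQL Server \\d{4}"],
  usageCount: 42,
  confidence: 90,
  source: "manual",
  status: "active",
  lastVerified: "2024-01-15T00:00:00Z",
  processNames: { windows: ["sqlservr.exe"], macos: [], linux: ["sqlservr"] },
  windowsServices: ["MSSQLSERVER"],
  processType: "service",
};

// Returns a list of problems; an empty list means the mapping looks well formed.
function validateMapping(mapping) {
  const errors = [];
  if (!SOURCES.includes(mapping.source)) errors.push("invalid source");
  if (!STATUSES.includes(mapping.status)) errors.push("invalid status");
  if (!PROCESS_TYPES.includes(mapping.processType)) errors.push("invalid processType");
  if (!(mapping.confidence >= 0 && mapping.confidence <= 100)) errors.push("confidence out of range");
  for (const pattern of mapping.instancePatterns || []) {
    try {
      new RegExp(pattern); // throws SyntaxError on a malformed pattern
    } catch (err) {
      errors.push(`invalid pattern: ${pattern}`);
    }
  }
  return errors;
}
```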

API Endpoints Summary

  1. CMDB Settings (/api/cmdb-settings/):

    • GET /process-mapping/analyze - Current mapping statistics
    • GET /software-families - Available software families
    • POST /process-mapping/run - Execute process mapping
  2. Process Application Mapping (/api/process-application-mapping/):

    • POST /apply - Apply mappings to relationships
    • GET /unmapped-stats - Statistics on unmapped processes
  3. Process Identification (/api/process-identification/):

    • GET /software-without-processes - List unmapped software
    • POST /identify/:softwareId - Identify processes for software
    • GET /process-names/:softwareId - Get process names
    • POST /bulk-identify - Batch process identification
  4. Software Mapping (additional endpoints defined in the routes):

    • Standard CRUD operations for manual mapping management

Bulk Operations and Scripts

  1. Analysis Scripts:

    • check-mapping-summary.js - Overall mapping statistics
    • simple-mapping-check.js - Quick validation checks
    • analyze-software-instance-mappings.js - Instance analysis
  2. Migration Scripts:

    • migrate-software-to-mapping-catalog.js - Migrate existing data
    • bulk-apply-process-mappings.js - Batch apply mappings
    • create-missing-software-mappings.js - Fill gaps in mappings
  3. Cleanup Scripts:

    • clean-softwaremapping-duplicates.js - Remove duplicates
    • fix-software-relationships.js - Repair broken links
    • update-relationships-software-family.js - Update family links

Integration with Model Hooks

The system uses Mongoose model hooks for automation:

  1. Software Instance Hooks V2:

    • Post-save: Triggers CPE processor if no CPE assigned
    • Post-save: Triggers software linking if no software relationship
  2. Software Hooks:

    • Post-save: Triggers family assignment if no family
    • Pre-save: Validates required fields
  3. ServerAppsInstalled Hooks:

    • Post-save: Triggers software instance creation inline
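The hook conditions above amount to a small decision table, sketched below as a plain function. The CI field names (type, cpeId, softwareId, familyId) and the job labels are assumptions; in a Mongoose model this logic would typically be called from a post-save middleware registered with schema.post("save", ...).

```javascript
// Decides which follow-up processors a saved CI should trigger.
function processorsForSavedCi(ci) {
  const jobs = [];
  if (ci.type === "Software Instance") {
    if (!ci.cpeId) jobs.push("cpe");                    // no CPE assigned yet
    if (!ci.softwareId) jobs.push("software-linking");  // no Software relationship yet
  } else if (ci.type === "Software") {
    if (!ci.familyId) jobs.push("family");              // no family assigned yet
  }
  return jobs;
}

// Assumed wiring inside a Mongoose schema definition (not executed here):
//   ciSchema.post("save", function (doc) {
//     for (const job of processorsForSavedCi(doc)) enqueue(job, doc._id); // enqueue is hypothetical
//   });
```

Keeping the decision in a pure function makes the trigger conditions testable without a database connection.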

Known Process Mappings

The system includes predefined mappings for common software:

  • Microsoft Visual C++ Redistributables (2005-2022)
  • Browsers: Chrome, Firefox, Edge, Safari, Opera
  • Development Tools: Visual Studio, VS Code, IntelliJ IDEA
  • Databases: SQL Server, MySQL, PostgreSQL, MongoDB
  • Web Servers: Apache, Nginx, IIS
  • Runtime Environments: Java, .NET, Node.js, Python
  • Monitoring Tools: Motadata Agent, Zabbix, Nagios
  • Office Software: Microsoft Office, LibreOffice
  • Security Software: Antivirus, firewalls
  • Utilities: 7-Zip, WinRAR, Notepad++

Conclusion

The software identification and mapping system discovers, normalizes, and organizes software data in the CMDB. By combining automatic discovery, multi-strategy mapping, and AI-powered enrichment, it keeps the software inventory and its relationships accurate with minimal manual intervention. The API endpoints, bulk scripts, and model hooks provide the automation and repair paths needed to maintain data quality and consistency in large deployments.