Discovery Overview
NopeSight Discovery automatically finds and maps your entire IT infrastructure, creating a real-time, accurate view of all hardware, software, and their relationships. Using multiple discovery methods and AI-powered analysis, it eliminates manual inventory processes and ensures your CMDB stays current.
What is Discovery?
Discovery is the automated process of:
- 🔍 Finding devices and applications on your network
- 📊 Collecting detailed configuration and state information
- 🔗 Mapping relationships and dependencies
- 🧠 Analyzing patterns and anomalies with AI
- 🔄 Updating the CMDB with current data
Discovery Architecture
Discovery Methods
🔌 Agent-Based Discovery
Advantages:
- Deep system information
- Real-time updates
- Behind firewall access
- Minimal network impact
Supported Platforms:
- Windows (PowerShell agent)
- Linux (Python agent)
- Unix (Shell agent)
- Container environments
📡 Agentless Discovery
Network Scanning:
- SNMP v1/v2c/v3
- WMI (Windows)
- SSH (Linux/Unix)
- ICMP ping sweep
API-Based:
- VMware vSphere
- AWS EC2
- Azure Resource Manager
- Google Cloud Platform
- Kubernetes API
🔄 Hybrid Discovery
Combines agent and agentless methods for:
- Complete coverage
- Minimal blind spots
- Optimized performance
- Flexible deployment
Discovery Process
1. Initial Discovery
Phase 1 - Network Sweep:
- IP range scanning
- Port identification
- Basic device classification
- Initial inventory
Phase 2 - Deep Discovery:
- Credential-based access
- Detailed configuration
- Software inventory
- Process analysis
Phase 3 - Relationship Mapping:
- Network connections
- Application dependencies
- Service mapping
- Data flow analysis
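The IP range scanning step in Phase 1 starts from a list of candidate addresses. As a minimal sketch, an IPv4 CIDR block could be expanded into host addresses as shown below. The function names `ipToInt`, `intToIp`, and `expandCidr` are hypothetical and are not part of NopeSight.

```javascript
// Hypothetical helpers: expand an IPv4 CIDR block into its usable host addresses.
function ipToInt(ip) {
  return ip.split('.').reduce((acc, octet) => acc * 256 + Number(octet), 0);
}

function intToIp(n) {
  return [24, 16, 8, 0].map((shift) => Math.floor(n / 2 ** shift) % 256).join('.');
}

function expandCidr(cidr) {
  const [base, prefixText] = cidr.split('/');
  const prefix = Number(prefixText);
  const size = 2 ** (32 - prefix);
  // Align the base address to the start of the block
  const network = Math.floor(ipToInt(base) / size) * size;
  // For prefixes up to /30, skip the network and broadcast addresses
  const first = prefix >= 31 ? network : network + 1;
  const last = prefix >= 31 ? network + size - 1 : network + size - 2;
  const hosts = [];
  for (let n = first; n <= last; n++) {
    hosts.push(intToIp(n));
  }
  return hosts;
}

console.log(expandCidr('10.1.1.0/30')); // [ '10.1.1.1', '10.1.1.2' ]
```

Each returned address would then be a candidate for the ping sweep and port identification steps.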
2. Continuous Discovery
AI-Powered Discovery Features
Intelligent Device Classification
{
  "discovered_device": {
    "ip": "10.1.1.50",
    "open_ports": [22, 80, 443, 3306],
    "banner_info": "Apache/2.4.41 (Ubuntu)"
  },
  "ai_classification": {
    "type": "Web Server",
    "os": "Ubuntu Linux",
    "role": "Application Server",
    "services": ["Apache Web Server", "MySQL Database"],
    "confidence": 94
  }
}
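The AI model behind this classification is not described here. To illustrate the kind of signal it works from, the following is a simple rule-based stand-in that maps open ports and banner text to services. The names `PORT_SERVICES` and `classifyDevice` are hypothetical, and the port table is an assumption based on well-known port assignments.

```javascript
// Hypothetical rule-based stand-in; this is not the AI classifier itself.
const PORT_SERVICES = {
  22: 'SSH',
  80: 'HTTP',
  443: 'HTTPS',
  3306: 'MySQL Database',
};

function classifyDevice(device) {
  // Keep only ports we recognize and translate them to service names
  const services = device.open_ports
    .filter((port) => port in PORT_SERVICES)
    .map((port) => PORT_SERVICES[port]);
  const banner = device.banner_info || '';
  const isWeb = device.open_ports.includes(80) || device.open_ports.includes(443);
  return {
    type: isWeb ? 'Web Server' : 'Unknown',
    os: /ubuntu/i.test(banner) ? 'Ubuntu Linux' : 'Unknown',
    services,
  };
}

const result = classifyDevice({
  ip: '10.1.1.50',
  open_ports: [22, 80, 443, 3306],
  banner_info: 'Apache/2.4.41 (Ubuntu)',
});
console.log(result.type, '/', result.os); // Web Server / Ubuntu Linux
```

A static table like this produces no confidence score, which is one reason a learned classifier is useful for ambiguous devices.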
Pattern Recognition
- Identifies standard deployment patterns
- Detects application stacks
- Recognizes clustering configurations
- Maps load-balanced services
Anomaly Detection
Anomaly Detected:
  Type: New Device
  IP: 10.1.5.200
  Classification: Unknown Web Server
  Risk: Medium
  Recommendation:
    - Verify if authorized
    - Check security compliance
    - Update firewall rules
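The "New Device" anomaly above can be approximated without any AI by comparing scan results against the known inventory. This is a minimal sketch under that assumption. `findNewDevices` and the record shapes are hypothetical, and risk scoring is left out because the document does not describe how it is computed.

```javascript
// Hypothetical sketch: flag scanned IPs that are missing from the known inventory.
function findNewDevices(knownIps, scannedDevices) {
  const known = new Set(knownIps);
  return scannedDevices
    .filter((device) => !known.has(device.ip))
    .map((device) => ({
      type: 'New Device',
      ip: device.ip,
      classification: device.classification || 'Unknown',
      recommendation: [
        'Verify if authorized',
        'Check security compliance',
        'Update firewall rules',
      ],
    }));
}

const anomalies = findNewDevices(
  ['10.1.5.10', '10.1.5.11'],
  [
    { ip: '10.1.5.10', classification: 'Database Server' },
    { ip: '10.1.5.200', classification: 'Unknown Web Server' },
  ]
);
console.log(anomalies.map((a) => a.ip)); // [ '10.1.5.200' ]
```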
Discovery Credentials
Credential Management
Credential Vault:
  Windows Domain:
    Type: Active Directory
    Scope: "*.corp.local"
    Privileges: Read-only
  Linux Systems:
    Type: SSH Key
    Scope: Production Subnet
    Sudo: NOPASSWD for discovery
  Network Devices:
    Type: SNMP v3
    Scope: Network Infrastructure
    Security: authPriv
Security Best Practices
- ✅ Use read-only credentials
- ✅ Implement credential rotation
- ✅ Limit scope by IP range
- ✅ Monitor credential usage
- ✅ Encrypt credentials at rest
Discovery Scheduling
Schedule Types
Full Discovery
- Complete infrastructure scan
- All attributes collected
- Relationship rebuild
- Schedule: Weekly
Incremental Discovery
- Changes only
- New/modified devices
- Quick updates
- Schedule: Every 4 hours
Real-time Discovery
- Agent-based changes
- Immediate updates
- Critical systems only
- Schedule: Continuous
Smart Scheduling
{
  "smart_schedule": {
    "production_servers": {
      "method": "agent",
      "frequency": "real-time"
    },
    "development_servers": {
      "method": "agentless",
      "frequency": "daily"
    },
    "network_devices": {
      "method": "snmp",
      "frequency": "every_4_hours"
    },
    "workstations": {
      "method": "agent",
      "frequency": "on_login"
    }
  }
}
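One way a scheduler might consume the configuration above is to translate each frequency label into a polling interval. The sketch below assumes the labels shown in the example (`daily`, `every_4_hours`, `real-time`, `on_login`). The helpers `frequencyToIntervalMs` and `planFor` are hypothetical.

```javascript
// Hypothetical helpers: turn a smart_schedule entry into a method plus polling interval.
const HOUR_MS = 60 * 60 * 1000;

function frequencyToIntervalMs(frequency) {
  if (frequency === 'daily') return 24 * HOUR_MS;
  const match = /^every_(\d+)_hours$/.exec(frequency);
  if (match) return Number(match[1]) * HOUR_MS;
  // 'real-time' and 'on_login' are event-driven, so they have no polling interval
  return null;
}

function planFor(schedule, category) {
  const entry = schedule[category];
  if (!entry) return null;
  return { method: entry.method, intervalMs: frequencyToIntervalMs(entry.frequency) };
}

const smartSchedule = {
  production_servers: { method: 'agent', frequency: 'real-time' },
  network_devices: { method: 'snmp', frequency: 'every_4_hours' },
};
console.log(planFor(smartSchedule, 'network_devices')); // { method: 'snmp', intervalMs: 14400000 }
```

Event-driven entries return a `null` interval so the caller can route them to an agent push channel instead of a timer.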
CI Matching and Reconciliation
Overview
CI matching and reconciliation prevents duplicate Configuration Items during discovery. Before creating a new CI, the system runs a hierarchical lookup for an existing one, which preserves data integrity.
Matching Hierarchy
The system uses a priority-based matching approach to identify existing CIs:
Priority Order:
1. Serial Number + MAC Address # Highest confidence - unique hardware
2. Serial Number Only # High confidence - usually unique
3. MAC Address Only # Medium confidence - can change
4. Hostname # Low confidence - can be duplicated
5. IP Address # Last resort - dynamic/reusable
Tenant Isolation
Critical Requirement: All CI lookups MUST include tenant filtering to ensure proper data isolation:
// Correct: Includes tenant in lookup
const existingCI = await CI.findOne({
  'customFields.serial_number': serialNumber,
  'type': deviceType,
  'tenant': scanData.tenant // REQUIRED for isolation
});

// Incorrect: Missing tenant filter (causes duplicates)
const existingCI = await CI.findOne({
  'customFields.serial_number': serialNumber,
  'type': deviceType
  // Missing tenant filter - will create duplicates!
});
Matching Algorithm
Implementation Details
1. Hardware Identifier Priority
// Hardware identifiers take precedence
const hasHardwareIdentifiers = serialNumber || macAddress;
if (hasHardwareIdentifiers) {
  // Skip IP-based matching when hardware IDs exist
  // This prevents overwriting different physical devices
  // that happen to share an IP address
}
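The matching hierarchy, the tenant requirement, and the hardware-identifier rule can be combined into an ordered list of lookup filters. This is a sketch, not the product's implementation. `buildLookupQueries` is hypothetical, and the `hostname` and `ipAddress` field names on the CI are assumptions (only `customFields.serial_number` and `customFields.mac_address` appear in this document).

```javascript
// Hypothetical helper: build lookup filters in priority order, each scoped to the tenant.
function buildLookupQueries(scanData) {
  const { serialNumber, macAddress, hostname, ipAddress, deviceType, tenant } = scanData;
  if (!tenant) throw new Error('tenant is required for CI lookups');

  const base = { type: deviceType, tenant };
  const queries = [];

  // 1. Serial number + MAC address
  if (serialNumber && macAddress) {
    queries.push({
      ...base,
      'customFields.serial_number': serialNumber,
      'customFields.mac_address': macAddress,
    });
  }
  // 2. Serial number only
  if (serialNumber) queries.push({ ...base, 'customFields.serial_number': serialNumber });
  // 3. MAC address only
  if (macAddress) queries.push({ ...base, 'customFields.mac_address': macAddress });
  // 4. Hostname (assumed field name)
  if (hostname) queries.push({ ...base, hostname });
  // 5. IP address (assumed field name), skipped when hardware identifiers exist
  if (ipAddress && !serialNumber && !macAddress) queries.push({ ...base, ipAddress });

  return queries;
}

const queries = buildLookupQueries({
  serialNumber: 'ABC123',
  macAddress: '00:11:22:33:44:55',
  ipAddress: '192.168.1.100',
  deviceType: 'server',
  tenant: 'tenant-a',
});
console.log(queries.length); // 3
```

A caller would try each filter in order (for example with `CI.findOne`) and stop at the first hit, so the highest-confidence identifier always wins.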
2. Conflict Detection
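// Detect when the same IP now belongs to different hardware
if (existingCI) {
  const existingSerial = existingCI.customFields?.serial_number;
  const existingMAC = existingCI.customFields?.mac_address;
  // Compare only identifiers known on both sides; a matching serial outranks a changed MAC
  const bothSerials = Boolean(serialNumber && existingSerial);
  const serialDiffers = bothSerials && serialNumber !== existingSerial;
  const macDiffers = Boolean(macAddress && existingMAC) && macAddress !== existingMAC;
  if (serialDiffers || (!bothSerials && macDiffers)) {
    // IP conflict detected - different physical device
    // Create new CI instead of updating
    logger.warn('IP conflict: Creating new CI for different device');
  }
}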
3. Virtual Machine Handling
Virtual machines require special handling because:
- Cloned VMs may share serial numbers
- MAC addresses can be regenerated
- VMware serial numbers use the format VMware-42 XX XX XX...
// VM Detection
const isVirtualMachine = serialNumber?.startsWith('VMware-') ||
  serialNumber?.includes('Virtual');
if (isVirtualMachine) {
  // Use MAC address as primary identifier
  // Consider VM UUID if available
}
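The empty branch above could be filled in along these lines. This sketch follows the comments literally: for VMs the MAC address is tried first and a VM UUID is used when no MAC is available. The function `pickPrimaryIdentifier`, the `vmUuid` input, and the `vm_uuid` field name are all hypothetical.

```javascript
// Hypothetical helper: choose which identifier to match a CI on.
function pickPrimaryIdentifier({ serialNumber, macAddress, vmUuid }) {
  const isVirtualMachine = Boolean(
    serialNumber?.startsWith('VMware-') || serialNumber?.includes('Virtual')
  );

  if (isVirtualMachine) {
    // Serial numbers may be shared by cloned VMs, so they are not trusted here
    if (macAddress) return { field: 'mac_address', value: macAddress };
    if (vmUuid) return { field: 'vm_uuid', value: vmUuid }; // assumed field name
  }
  if (serialNumber) return { field: 'serial_number', value: serialNumber };
  if (macAddress) return { field: 'mac_address', value: macAddress };
  return null;
}

console.log(
  pickPrimaryIdentifier({ serialNumber: 'VMware-42 1a 2b', macAddress: '00:50:56:aa:bb:cc' }).field
); // mac_address
console.log(
  pickPrimaryIdentifier({ serialNumber: 'ABC123', macAddress: '00:11:22:33:44:55' }).field
); // serial_number
```

Because VM MAC addresses can be regenerated, an environment that reliably reports VM UUIDs may prefer to reverse the first two checks.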
Common Matching Scenarios
Scenario 1: Hardware Refresh
Old Device:
  Serial: ABC123
  MAC: 00:11:22:33:44:55
  IP: 192.168.1.100
New Device:
  Serial: XYZ789            # Different
  MAC: AA:BB:CC:DD:EE:FF    # Different
  IP: 192.168.1.100         # Same (reused)
Result: Creates new CI (different hardware)
Scenario 2: Network Change
Before:
  Serial: ABC123
  MAC: 00:11:22:33:44:55
  IP: 192.168.1.100
After:
  Serial: ABC123            # Same
  MAC: 00:11:22:33:44:55    # Same
  IP: 10.0.0.50             # Different (moved)
Result: Updates existing CI (same hardware)
Scenario 3: MAC Address Change
Before:
  Serial: ABC123
  MAC: 00:11:22:33:44:55
After:
  Serial: ABC123            # Same
  MAC: AA:BB:CC:DD:EE:FF    # Different (NIC replaced)
Result: Updates existing CI (serial match)
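The three scenarios can be replayed against a small in-memory matcher. This is an illustrative sketch only: `matchCI` is hypothetical, it omits hostname matching and tenant filtering for brevity, and a real implementation would query the CMDB instead of an array.

```javascript
// Hypothetical in-memory matcher used to replay the three scenarios above.
function matchCI(inventory, scan) {
  const bySerial = scan.serial && inventory.find((ci) => ci.serial === scan.serial);
  if (bySerial) return { action: 'update', ci: bySerial };

  const byMac = scan.mac && inventory.find((ci) => ci.mac === scan.mac);
  if (byMac) return { action: 'update', ci: byMac };

  // Hardware identifiers are present but unknown: never fall back to IP
  if (scan.serial || scan.mac) return { action: 'create', ci: null };

  const byIp = scan.ip && inventory.find((ci) => ci.ip === scan.ip);
  return byIp ? { action: 'update', ci: byIp } : { action: 'create', ci: null };
}

const inventory = [{ serial: 'ABC123', mac: '00:11:22:33:44:55', ip: '192.168.1.100' }];

// Scenario 1: hardware refresh on a reused IP
const refresh = matchCI(inventory, { serial: 'XYZ789', mac: 'AA:BB:CC:DD:EE:FF', ip: '192.168.1.100' });
// Scenario 2: same hardware moved to a new IP
const moved = matchCI(inventory, { serial: 'ABC123', mac: '00:11:22:33:44:55', ip: '10.0.0.50' });
// Scenario 3: NIC replaced, serial unchanged
const newNic = matchCI(inventory, { serial: 'ABC123', mac: 'AA:BB:CC:DD:EE:FF' });

console.log(refresh.action, moved.action, newNic.action); // create update update
```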
Duplicate Prevention
Root Cause of Duplicates
Duplicates occur when:
- Missing Tenant Filter: Lookups don't include tenant field
- Timing Issues: Concurrent scans of same device
- Identifier Changes: Hardware changes between scans
- Data Quality: Missing or invalid identifiers
Prevention Strategies
// 1. Always include tenant in lookups
const lookupQuery = {
  $and: [
    { 'customFields.serial_number': serialNumber },
    { 'type': deviceType },
    { 'tenant': scanData.tenant } // Critical!
  ]
};

// 2. Use transactions for atomic operations
const session = await mongoose.startSession();
try {
  await session.withTransaction(async () => {
    const existingCI = await CI.findOne(lookupQuery).session(session);
    if (!existingCI) {
      await CI.create([ciData], { session });
    }
  });
} finally {
  // Release the session whether or not the transaction succeeded
  await session.endSession();
}

// 3. Implement retry logic for conflicts
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
const maxRetries = 3;
for (let i = 0; i < maxRetries; i++) {
  try {
    await processCI(scanData);
    break;
  } catch (error) {
    if (error.code === 11000 && i < maxRetries - 1) {
      // Duplicate key error (raised only when a unique index exists) - retry with exponential backoff
      await sleep(100 * Math.pow(2, i));
    } else {
      throw error;
    }
  }
}
Troubleshooting Duplicates
Finding Duplicates
// Script to identify duplicate CIs
const duplicates = await CI.aggregate([
{
$match: {
'customFields.serial_number': { $exists: true, $ne: '' }
}
},
{
$group: {
_id: {
serial: '$customFields.serial_number',
tenant: '$tenant'
},
count: { $sum: 1 },
ids: { $push: '$_id' }
}
},
{
$match: { count: { $gt: 1 } }
}
]);
Resolution Steps
- Identify duplicates using the script above
- Compare last_scan timestamps to find most recent
- Merge custom fields and relationships
- Update references in related documents
- Delete older duplicate CIs
- Verify no orphaned relationships remain
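The compare-and-merge steps above can be sketched as a pure function that keeps the most recently scanned CI and fills in any custom fields it lacks from the older copies. `mergeDuplicates` is hypothetical. Relationship merging and reference updates are not shown because they depend on the CMDB schema.

```javascript
// Hypothetical sketch: keep the newest CI and fold in fields it is missing.
function mergeDuplicates(duplicateCIs) {
  // Newest last_scan first
  const sorted = [...duplicateCIs].sort(
    (a, b) => new Date(b.last_scan).getTime() - new Date(a.last_scan).getTime()
  );
  const [survivor, ...older] = sorted;
  const customFields = { ...survivor.customFields };

  for (const ci of older) {
    for (const [key, value] of Object.entries(ci.customFields || {})) {
      // Only fill gaps; values on the newest record win
      if (customFields[key] === undefined || customFields[key] === '') {
        customFields[key] = value;
      }
    }
  }

  return {
    merged: { ...survivor, customFields },
    idsToDelete: older.map((ci) => ci._id),
  };
}

const mergeResult = mergeDuplicates([
  { _id: 'a', last_scan: '2024-01-01T00:00:00Z', customFields: { serial_number: 'ABC123', rack: 'R12' } },
  { _id: 'b', last_scan: '2024-03-01T00:00:00Z', customFields: { serial_number: 'ABC123', rack: '' } },
]);
console.log(mergeResult.merged._id, mergeResult.merged.customFields.rack, mergeResult.idsToDelete); // b R12 [ 'a' ]
```

Returning the IDs to delete, instead of deleting inside the function, leaves room to update references and verify relationships before anything is removed.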
Best Practices
- Always Include Tenant: Every CI lookup must filter by tenant
- Use Hardware IDs: Prioritize serial/MAC over IP address
- Handle Conflicts: Detect and log identifier mismatches
- Monitor Duplicates: Regular audits for duplicate detection
- Test Thoroughly: Verify matching logic with edge cases
Discovery Data Processing
Data Flow Pipeline
Data Quality Controls
Validation Rules:
- Required field checks
- Format validation
- Range verification
- Consistency checks
Deduplication Logic:
- Serial number matching
- MAC address correlation
- Hostname resolution
- UUID comparison
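The validation rules listed above might look like the following when applied to a single scan record. This is a sketch with assumed field names (`tenant`, `macAddress`, `ipAddress`, `serialNumber`, `hostname`). `validateScan` and `MAC_PATTERN` are hypothetical.

```javascript
// Hypothetical validator covering one example of each rule type.
const MAC_PATTERN = /^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$/;

function validateScan(scan) {
  const errors = [];

  // Required field check
  if (!scan.tenant) errors.push('tenant is required');

  // Format validation
  if (scan.macAddress && !MAC_PATTERN.test(scan.macAddress)) {
    errors.push('macAddress has an invalid format');
  }

  // Range verification: each IPv4 octet must be 0-255
  if (scan.ipAddress) {
    const octets = scan.ipAddress.split('.');
    const inRange =
      octets.length === 4 && octets.every((o) => /^\d+$/.test(o) && Number(o) <= 255);
    if (!inRange) errors.push('ipAddress is out of range');
  }

  // Consistency check: at least one identifier is needed for matching
  if (!scan.serialNumber && !scan.macAddress && !scan.hostname && !scan.ipAddress) {
    errors.push('no identifier present');
  }

  return errors;
}

console.log(validateScan({ tenant: 't1', macAddress: '00:11:22:33:44:55', ipAddress: '10.1.1.50' })); // []
console.log(validateScan({ macAddress: 'not-a-mac', ipAddress: '10.1.1.300' }).length); // 3
```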
Performance & Scalability
Discovery Metrics
Performance Targets:
  Devices per Hour: 10,000
  Concurrent Scans: 500
  Data Processing: 1M attributes/min
  CMDB Updates: 50,000/min
Current Performance:
  Network Utilization: 12%
  CPU Usage: 35%
  Memory Usage: 4.2 GB
  Queue Depth: 127 devices
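The throughput and concurrency targets together imply a budget for how long each scan may take. The arithmetic below applies Little's law (concurrency = throughput × average duration) and assumes a steady state in which both targets hold at the same time. It is a back-of-the-envelope check, not a measured figure.

```javascript
// Worked arithmetic on the target figures above.
const devicesPerHour = 10000;
const concurrentScans = 500;

// Throughput in devices per second
const devicesPerSecond = devicesPerHour / 3600;
// Little's law rearranged: average duration = concurrency / throughput
const maxAvgScanSeconds = concurrentScans / devicesPerSecond;

console.log(devicesPerSecond.toFixed(2)); // 2.78
console.log(Math.round(maxAvgScanSeconds)); // 180
```

In other words, with all 500 scan slots busy, the hourly target is met as long as the average scan finishes within about three minutes.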
Scaling Strategies
Horizontal Scaling:
- Multiple discovery engines
- Distributed processing
- Load balancing
- Regional collectors
Optimization Techniques:
- Parallel scanning
- Batch processing
- Caching mechanisms
- Smart scheduling
Discovery Reporting
Discovery Dashboard
{
  "discovery_summary": {
    "total_devices": 15847,
    "discovered_today": 234,
    "failed_discoveries": 12,
    "success_rate": "98.7%",
    "coverage": {
      "servers": "100%",
      "workstations": "94%",
      "network": "100%",
      "cloud": "87%"
    }
  }
}
Key Reports
- Discovery Coverage
  - Discovered vs expected
  - Blind spots analysis
  - Credential failures
  - Network unreachable
- Discovery Performance
  - Scan duration trends
  - Success/failure rates
  - Resource utilization
  - Queue statistics
- New Device Report
  - Newly discovered items
  - Unauthorized devices
  - Shadow IT detection
  - Compliance gaps
Troubleshooting
Common Issues
Duplicate CIs Created:
  Symptom: Multiple CIs for same device
  Causes:
    - Missing tenant filter in lookups
    - Concurrent scan processing
    - Changed hardware identifiers
    - IP address reuse
  Resolution:
    - Verify tenant filtering in processors
    - Run duplicate detection script
    - Merge duplicate CIs carefully
    - Update matching logic
  Prevention:
    - Always include tenant in queries
    - Use hardware IDs for matching
    - Implement proper error handling
    - Monitor for duplicates regularly
Discovery Failures:
  Symptom: Devices not discovered
  Causes:
    - Network connectivity
    - Firewall blocking
    - Invalid credentials
    - Service disabled
  Resolution:
    - Check port access
    - Verify credentials
    - Review firewall logs
    - Enable services
Performance Issues:
  Symptom: Slow discovery
  Causes:
    - Network congestion
    - Overloaded targets
    - Large scan ranges
    - Insufficient resources
  Resolution:
    - Adjust scheduling
    - Limit concurrent scans
    - Increase resources
    - Optimize queries
CI Matching Failures:
  Symptom: CIs not updating, creating new instead
  Causes:
    - Missing serial numbers
    - Changed MAC addresses
    - Dynamic IP assignment
    - Tenant mismatch
  Resolution:
    - Verify hardware identifiers present
    - Check tenant assignment
    - Review matching hierarchy
    - Update lookup logic if needed
Best Practices
1. Planning
- ✅ Map network topology first
- ✅ Identify critical systems
- ✅ Plan discovery phases
- ✅ Set realistic schedules
2. Implementation
- ✅ Start with small pilot
- ✅ Validate discovered data
- ✅ Tune discovery patterns
- ✅ Monitor performance
3. Optimization
- ✅ Regular schedule review
- ✅ Credential maintenance
- ✅ Performance tuning
- ✅ Coverage analysis
4. Governance
- ✅ Discovery approval process
- ✅ Change notification
- ✅ Compliance validation
- ✅ Regular audits
Next Steps
- 📖 Network Scanning - Detailed network discovery
- 📖 Agent Deployment - Installing discovery agents
- 📖 Credential Management - Managing discovery credentials
- 📖 Discovery Patterns - Custom discovery patterns