Windows Async ICMP Implementation Analysis
Executive Summary
Problem Statement
The Windows async ICMP implementation experiences flakiness when enrichment (DNS/ASN lookups) is enabled with short timeouts (e.g., 70ms). The root cause was identified as resource contention in Tokio’s blocking thread pool, where 30+ concurrent spawn_blocking tasks for ICMP operations compete with enrichment tasks.
Key Findings
After parallel exploration of four different approaches:
- All alternative approaches performed worse than the current implementation
- Windows ICMP APIs have fundamental limitations that prevent true async I/O
- The current
spawn_blocking approach is actually near-optimal given API constraints
- The real issue is not the implementation, but the timeout configuration
Final Recommendation
Keep the current implementation with minor optimizations:
Implement a semaphore to limit concurrent blocking tasks (Not needed - Tokio has 512 blocking threads by default)
- Increase minimum timeout recommendations for Windows
- Add warning for users about potential flakiness with <100ms timeouts + enrichment
Update: The semaphore approach was initially implemented but then removed after realizing that Tokio’s default blocking thread pool has 512 threads, making exhaustion unlikely. Limiting to 20 would actually impair performance by serializing probes unnecessarily.
Approach Comparison Table
| Approach |
Success Rate |
Avg Execution Time |
Implementation Complexity |
Resource Usage |
Recommendation |
| Current (Baseline) |
80-100% |
150-275ms |
Low |
30+ blocking tasks |
Keep with limits |
| Option 1: WaitForMultipleObjects |
0% (50% hop detection) |
400-500ms |
High (497 lines) |
1 wait thread |
❌ Reject |
| Option 2: Runtime Optimization |
55-100% (unstable) |
245-1920ms |
Low (30 lines) |
4-8 threads + 128 blocking |
❌ Reject |
| Option 3: Dedicated Thread |
100% |
492-514ms |
Very High (800 lines) |
1 dedicated thread |
❌ Reject |
| Option 4: Windows IOCP |
N/A |
N/A |
Not Possible |
N/A |
❌ Not Viable |
Detailed Analysis
Approach: Replace individual spawn_blocking tasks with a single thread using WaitForMultipleObjects to monitor up to 62 events simultaneously.
Results:
- ✅ Eliminated thread pool exhaustion (1 thread vs 30+)
- ❌ Only detected 50% of hops (vs 80% baseline)
- ❌ 0% success rate for complete path detection
- ❌ Added 250ms overhead to execution time
Why it failed: Despite correct implementation, the approach missed ~50% of ICMP responses. The complexity of coordinating events across threads likely introduced timing issues that caused responses to be lost.
Option 2: Runtime Optimization
Approach: Tune Tokio runtime with multi-threaded configuration and larger blocking thread pool.
Results:
- ✅ Simple implementation (environment variables)
- ❌ Highly unstable performance (245ms to 1920ms)
- ❌ Made critical test case 598% slower
- ❌ Introduced new failure modes
Why it failed: Multi-threaded runtime added overhead without addressing the root cause. The increased concurrency actually made enrichment timing issues worse.
Option 3: Dedicated ICMP Thread
Approach: Isolate all ICMP operations in a dedicated thread, communicating via channels.
Results:
- ✅ Complete isolation from Tokio runtime
- ✅ 100% reliability maintained
- ❌ 2-3x slower (300-350ms overhead)
- ❌ Channel communication dominated performance
Why it failed: The overhead of cross-thread communication (10-20ms per probe) far exceeded any benefits. The Windows ICMP API with events is already async-friendly, making isolation counterproductive.
Option 4: Windows IOCP
Approach: Investigate using Windows I/O Completion Ports for true async I/O.
Results:
- ❌ Windows ICMP APIs don’t support IOCP
- ❌ No OVERLAPPED structure support
- ❌ Alternative approaches (raw sockets, APC) not viable
Why it failed: Fundamental API limitation - Windows ICMP APIs use their own async model (events/APCs) and cannot integrate with IOCP.
Root Cause Analysis
Why Current Implementation Has Issues
- Resource Contention: 30 concurrent
spawn_blocking tasks can saturate Tokio’s blocking thread pool
- Enrichment Timing: DNS/ASN lookups add additional async tasks that compete for resources
- Tight Timeouts: 70ms leaves little margin for scheduling delays
- Single-threaded Runtime: Limited capacity to handle concurrent operations
Why Alternative Approaches Failed
- API Constraints: Windows ICMP APIs are designed for event-based async, not IOCP
- Overhead Dominates: Any abstraction layer (channels, thread coordination) adds 10-20ms per probe
- Already Optimized:
spawn_blocking with events is actually the most efficient approach
- Complexity Penalty: More complex implementations introduced new failure modes
Recommended Solution
1. Implement Blocking Task Limiter
// Add to WindowsAsyncIcmpSocket
struct WindowsAsyncIcmpSocket {
// ... existing fields ...
blocking_semaphore: Arc<Semaphore>, // Limit to 20 concurrent
}
// In send_probe_and_recv
let _permit = self.blocking_semaphore.acquire().await?;
let wait_handle = tokio::task::spawn_blocking(move || {
// ... existing code ...
});
2. Adjust Timeout Recommendations
- Document that Windows users should use ≥100ms timeouts with enrichment
- Add warning if timeout <100ms and enrichment enabled on Windows
- Consider different defaults for Windows vs other platforms
3. Minor Optimizations
- Pre-allocate event handles pool
- Reuse reply buffers where possible
- Consider batching enrichment lookups
Expected Outcomes
- Eliminate thread pool exhaustion
- Maintain current performance levels
- Improve reliability with enrichment enabled
- Minimal code changes (<100 lines)
Lessons Learned
Windows API Limitations
- ICMP APIs are special: Don’t follow standard Windows async I/O patterns
- Event-based is optimal: For APIs already using events, wrapping adds overhead
- No silver bullet: IOCP isn’t always the answer for Windows async
Async Programming Trade-offs
- Simplicity often wins: Current
spawn_blocking approach is simple and effective
- Channel overhead is real: 10-20ms per cross-thread operation adds up quickly
- Resource limits matter: Default thread pools can be exhausted
- Measure first: All “optimizations” made performance worse
- Understand the API: Working with platform constraints is better than fighting them
- Incremental improvements: Small changes (semaphore) better than rewrites
Conclusion
The parallel exploration revealed that the current Windows async ICMP implementation is already near-optimal given API constraints. Rather than a complete rewrite, the solution is to:
- Add resource limits to prevent thread pool exhaustion
- Adjust timeout guidance for Windows users
- Accept platform limitations and work within them
The investigation provided valuable insights into Windows async programming and demonstrated that sometimes the simplest solution is the best one.