ftr - Fast TraceRoute

Windows Async ICMP Implementation Analysis

Executive Summary

Problem Statement

The Windows async ICMP implementation experiences flakiness when enrichment (DNS/ASN lookups) is enabled with short timeouts (e.g., 70ms). The root cause was identified as resource contention in Tokio’s blocking thread pool, where 30+ concurrent spawn_blocking tasks for ICMP operations compete with enrichment tasks.

Key Findings

After parallel exploration of four different approaches:

  1. All alternative approaches performed worse than the current implementation
  2. Windows ICMP APIs have fundamental limitations that prevent true async I/O
  3. The current spawn_blocking approach is actually near-optimal given API constraints
  4. The real issue is not the implementation, but the timeout configuration

Final Recommendation

Keep the current implementation with minor optimizations:

  1. Implement a semaphore to limit concurrent blocking tasks (Not needed - Tokio has 512 blocking threads by default)
  2. Increase minimum timeout recommendations for Windows
  3. Add warning for users about potential flakiness with <100ms timeouts + enrichment

Update: The semaphore approach was initially implemented but then removed after realizing that Tokio’s default blocking thread pool has 512 threads, making exhaustion unlikely. Limiting to 20 would actually impair performance by serializing probes unnecessarily.

Approach Comparison Table

Approach Success Rate Avg Execution Time Implementation Complexity Resource Usage Recommendation
Current (Baseline) 80-100% 150-275ms Low 30+ blocking tasks Keep with limits
Option 1: WaitForMultipleObjects 0% (50% hop detection) 400-500ms High (497 lines) 1 wait thread ❌ Reject
Option 2: Runtime Optimization 55-100% (unstable) 245-1920ms Low (30 lines) 4-8 threads + 128 blocking ❌ Reject
Option 3: Dedicated Thread 100% 492-514ms Very High (800 lines) 1 dedicated thread ❌ Reject
Option 4: Windows IOCP N/A N/A Not Possible N/A ❌ Not Viable

Detailed Analysis

Option 1: WaitForMultipleObjects Implementation

Approach: Replace individual spawn_blocking tasks with a single thread using WaitForMultipleObjects to monitor up to 62 events simultaneously.

Results:

Why it failed: Despite correct implementation, the approach missed ~50% of ICMP responses. The complexity of coordinating events across threads likely introduced timing issues that caused responses to be lost.

Option 2: Runtime Optimization

Approach: Tune Tokio runtime with multi-threaded configuration and larger blocking thread pool.

Results:

Why it failed: Multi-threaded runtime added overhead without addressing the root cause. The increased concurrency actually made enrichment timing issues worse.

Option 3: Dedicated ICMP Thread

Approach: Isolate all ICMP operations in a dedicated thread, communicating via channels.

Results:

Why it failed: The overhead of cross-thread communication (10-20ms per probe) far exceeded any benefits. The Windows ICMP API with events is already async-friendly, making isolation counterproductive.

Option 4: Windows IOCP

Approach: Investigate using Windows I/O Completion Ports for true async I/O.

Results:

Why it failed: Fundamental API limitation - Windows ICMP APIs use their own async model (events/APCs) and cannot integrate with IOCP.

Root Cause Analysis

Why Current Implementation Has Issues

  1. Resource Contention: 30 concurrent spawn_blocking tasks can saturate Tokio’s blocking thread pool
  2. Enrichment Timing: DNS/ASN lookups add additional async tasks that compete for resources
  3. Tight Timeouts: 70ms leaves little margin for scheduling delays
  4. Single-threaded Runtime: Limited capacity to handle concurrent operations

Why Alternative Approaches Failed

  1. API Constraints: Windows ICMP APIs are designed for event-based async, not IOCP
  2. Overhead Dominates: Any abstraction layer (channels, thread coordination) adds 10-20ms per probe
  3. Already Optimized: spawn_blocking with events is actually the most efficient approach
  4. Complexity Penalty: More complex implementations introduced new failure modes

1. Implement Blocking Task Limiter

// Add to WindowsAsyncIcmpSocket
struct WindowsAsyncIcmpSocket {
    // ... existing fields ...
    blocking_semaphore: Arc<Semaphore>, // Limit to 20 concurrent
}

// In send_probe_and_recv
let _permit = self.blocking_semaphore.acquire().await?;
let wait_handle = tokio::task::spawn_blocking(move || {
    // ... existing code ...
});

2. Adjust Timeout Recommendations

3. Minor Optimizations

Expected Outcomes

Lessons Learned

Windows API Limitations

  1. ICMP APIs are special: Don’t follow standard Windows async I/O patterns
  2. Event-based is optimal: For APIs already using events, wrapping adds overhead
  3. No silver bullet: IOCP isn’t always the answer for Windows async

Async Programming Trade-offs

  1. Simplicity often wins: Current spawn_blocking approach is simple and effective
  2. Channel overhead is real: 10-20ms per cross-thread operation adds up quickly
  3. Resource limits matter: Default thread pools can be exhausted

Performance vs Complexity

  1. Measure first: All “optimizations” made performance worse
  2. Understand the API: Working with platform constraints is better than fighting them
  3. Incremental improvements: Small changes (semaphore) better than rewrites

Conclusion

The parallel exploration revealed that the current Windows async ICMP implementation is already near-optimal given API constraints. Rather than a complete rewrite, the solution is to:

  1. Add resource limits to prevent thread pool exhaustion
  2. Adjust timeout guidance for Windows users
  3. Accept platform limitations and work within them

The investigation provided valuable insights into Windows async programming and demonstrated that sometimes the simplest solution is the best one.