Unusual data loss between two 10Gbps network cards.
I'm seeing frequent packet loss when testing with iPerf3. My recent tests show increasing loss and unstable speeds, with loss around 0.15%. I've tried various flags and settings: -s -V on the server, and -c NIC_IP -V -u -b 3.1G on the client, also with -R. On Windows, iPerf3 hasn't been updated since a certain release, so I suspect compatibility issues.
I also see significant packet loss when using UDP, with throughput around only 1Gb. Switching to TCP with -P 10 (ten parallel streams) restores performance close to 10Gb.
Of course, the exact throughput and loss figures iperf reports are specific to the test conditions and environment.
I get 9.1Gbps. Individual packets aren't visible since TCP is involved, and iperf3 on Windows doesn't accept the -P 10 option with UDP. I've re-run tests with my own tool and found some oddities. Loss is significantly lower than expected, though not completely zero. This is currently tested against a single edge device at a time; when I run four together, something unusual happens, which could point to more bugs in my code. Each packet carries a unique ID matching the count of packets sent. So if only packets 1 and 10 reach the destination, the receiver captured just two packets, but the latest ID is 10, implying eight were missed (NIC to NIC, I found they arrive in order). From the last run I'm counting 100,000 ID gaps and comparing them to the frame deltas, which gives roughly 0.1% missing packets (100,000 * 8000 bytes). If this stays consistent across all edge devices, it might be within acceptable limits.
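The ID-gap accounting described above can be sketched like this (a minimal illustration only; the real tool's packet format and field names are not shown in this thread):

```python
def count_missing(received_ids):
    """Infer lost packets from gaps in monotonically increasing
    per-packet sequence IDs (packets assumed to arrive in order)."""
    missing = 0
    for prev, cur in zip(received_ids, received_ids[1:]):
        missing += cur - prev - 1  # IDs skipped between consecutive arrivals
    return missing

# The example from the post: only IDs 1 and 10 arrive, so 8 are missing.
ids = [1, 10]
print(count_missing(ids))  # -> 8

# Loss ratio relative to everything the sender emitted:
sent = ids[-1] - ids[0] + 1
print(count_missing(ids) / sent)  # -> 0.8
```

This only works because the in-order delivery observed NIC-to-NIC makes every gap an actual loss rather than a reordering.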
You're running a custom test? That might explain why you're seeing false positives. I believe the packet loss you're observing is more about CPU usage than raw network speed. UDP processing demands more CPU because your tool has to count losses and track data, on top of the higher packet volume. Packets per second (PPS), not just throughput, is what limits networking gear. Consider the datagram size, memory constraints, and any stalls while flushing buffers. Are you using jumbo frames?
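To put the PPS point in numbers, a back-of-envelope sketch (line rate only; preamble, inter-frame gap, and header overhead are ignored for simplicity):

```python
def packets_per_second(bits_per_second, frame_bytes):
    """Rough packet rate at a given bit rate and frame size."""
    return bits_per_second / (frame_bytes * 8)

rate = 10e9  # 10 Gbps
print(round(packets_per_second(rate, 1500)))  # 833333 pps at a 1500-byte MTU
print(round(packets_per_second(rate, 9000)))  # 138889 pps with 9000-byte jumbo frames
```

At the 1500-byte rate the receiver has roughly a microsecond of CPU budget per packet, which is why per-packet software work dominates at 10Gb speeds.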
I understood that jumbo frames are only an issue when there's a mismatch between two NICs. If I connect PC to PC, control both machines, and set both to jumbo frames, then it's not an issue? And how is packet loss CPU-related? Am I not calling recv fast enough? Or is it the driver itself?
That's only a minor part of it. Jumbo frames aren't uniform; implementations differ between vendors, and the performance gain usually doesn't justify the trouble. It's typically best to leave the feature disabled, especially while testing, since you're simulating traffic. Generating that traffic consumes CPU on the sender. On the receiver, every header from L3 upward must be examined, which strains the CPU and forces buffering, or dropping packets once capacity is exceeded. All of this happens per packet, in software. Faster clock speeds and multithreaded applications help, but the NIC itself only handles the L1/L2 layers, so everything above that lands on the CPU, which adds significant load.
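On the "am I calling recv fast enough" question, a minimal receiver-side sketch (the port and buffer size here are illustrative assumptions, not values from the thread): if the application doesn't drain the socket at the incoming packet rate, the kernel's receive buffer fills and datagrams are dropped before recv ever sees them.

```python
import socket

def make_udp_receiver(port, rcvbuf=8 * 1024 * 1024):
    """UDP socket with an enlarged kernel receive buffer.
    A bigger SO_RCVBUF only buys headroom; it doesn't fix a
    consumer that is persistently slower than the sender."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # The kernel may silently cap this (net.core.rmem_max on Linux).
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf)
    sock.bind(("0.0.0.0", port))
    return sock

# Receive loop: one recvfrom() syscall per datagram, so CPU cost
# scales with packets per second, not just with bytes per second.
# sock = make_udp_receiver(5201)
# while True:
#     data, addr = sock.recvfrom(9000)
```

So it can be either: a slow recv loop overflows this socket buffer, while a driver or NIC ring-buffer overrun drops packets even earlier. Both show up to your tool as the same missing IDs.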
I set the MTU to 1500 and turned off jumbo packets. Performance improved significantly and packet loss dropped considerably. Still, occasionally one end device goes through a burst of excessive packet drops. I can't predict when this will happen, but I notice it when it does. When I'm sending video frames and loss is minimal, a single missing packet just means one skipped frame. But if the problematic machine enters its unstable state instead of holding a smooth FPS, I lose at least 30% of frames. Someone suggested aiming for zero packet loss in a peer-to-peer setup; I'm not sure that's realistic. As long as the machine stays stable, the losses are manageable. Oddly, I don't see problems when four devices communicate together, only when one goes unstable.
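A rough model of why a small packet-loss rate can wipe out a much larger share of video frames: a frame is skipped if any one of its packets is missing. The independence assumption here understates bursty drops like the unstable state described above, and the packets-per-frame figure is illustrative, not from the thread.

```python
def frame_loss_probability(packet_loss, packets_per_frame):
    """Probability a frame is skipped, if each of its packets is
    lost independently with the same per-packet loss rate."""
    return 1 - (1 - packet_loss) ** packets_per_frame

# e.g. 0.1% packet loss with 100 packets per frame:
print(round(frame_loss_probability(0.001, 100), 3))  # -> 0.095, i.e. ~9.5% of frames
```

This is why "manageable" per-packet loss and the 30%+ frame loss during unstable bursts aren't contradictory: frame loss amplifies packet loss by roughly the number of packets per frame.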