lwip - Fix memory leak in k64f cyclic-buffer overflow #3135
Conversation
This was actually several bugs colluding together.

1. Confusion about the buffer-semaphore paradigm led to misuse of the tx semaphore and the potential for odd behaviour.
2. Equality tests on tx_consume_index and tx_produce_index did not handle overflow correctly. This allowed tx_consume_index to catch up to tx_produce_index and trick the k64f_rx_reclaim function into forgetting about a whole buffer of pbufs.
3. On top of all of that, ENET_BUFFDESCRIPTOR_TX_READY_MASK was not correctly read immediately after being set, due to either a compiler optimization or hardware delays. This caused k64f_low_level_output to eagerly overrun existing buffer descriptors before they had been completely sent. Adopting the counting-semaphore paradigm (the fix for 1) avoided this concern.

As pointed out by @infinnovation, the overflow only occurs in the rare case that the 120MHz CPU can actually generate packets faster than the ENET hardware can transmit on a 100Mbps link.
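For reference, the usual overflow-safe way to track a ring like this is to compare free-running counters by unsigned subtraction instead of testing masked indices for equality. The sketch below only illustrates that idea; the names (TX_RING_SIZE, tx_produce_count, tx_consume_count) are hypothetical and this is not the actual driver code:

```c
#include <stdint.h>
#include <stdbool.h>

#define TX_RING_SIZE 8u  /* hypothetical descriptor count */

/* Free-running 32-bit counters; only reduced modulo the ring size when
 * indexing. Unsigned subtraction stays correct across wraparound, so
 * full/empty tests never rely on equality of masked indices. */
static volatile uint32_t tx_produce_count;
static volatile uint32_t tx_consume_count;

static uint32_t tx_in_flight(void)  { return tx_produce_count - tx_consume_count; }
static bool     tx_ring_full(void)  { return tx_in_flight() == TX_RING_SIZE; }
static bool     tx_ring_empty(void) { return tx_in_flight() == 0u; }

/* Slot to fill for the next outgoing frame. */
static uint32_t tx_produce_index(void) { return tx_produce_count % TX_RING_SIZE; }
```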
/morph test-nightly
Your command has finished executing! Result: FAILURE
Output: mbed Build Number: 947. Test failed!
LGTM - successfully tested at 100Mbps with a few variations of burst size, also at 10Mbps. FYI, with 5.2.0 on a 10Mbps link, a burst of more than 8 packets caused a lock-up as I expected. Would it be churlish to suggest that the only thing missing from your PR was the 20-line comment explaining the Theory of Operation to the next poor soul to visit the code? ;-)
@@ -526,15 +524,14 @@ static err_t k64f_low_level_output(struct netif *netif, struct pbuf *p)

  /* Wait until a descriptor is available for the transfer. */
  /* THIS WILL BLOCK UNTIL THERE ARE A DESCRIPTOR AVAILABLE */
Is this comment still correct?
Yep, though less directly. I can remove the bolded portion.
Was this code actually getting used? I don't see where xTXDCountSem is released. If it isn't getting released, then this would essentially lock up lwip if it occurred.
Can you be more specific and provide line numbers or a code block that should get more documentation?
@@ -526,15 +524,14 @@ static err_t k64f_low_level_output(struct netif *netif, struct pbuf *p)

  /* Wait until a descriptor is available for the transfer. */
  /* THIS WILL BLOCK UNTIL THERE ARE A DESCRIPTOR AVAILABLE */
  while (g_handle.txBdCurrent->control & ENET_BUFFDESCRIPTOR_TX_READY_MASK)
    osSemaphoreWait(k64f_enet->xTXDCountSem.id, osWaitForever);
  osSemaphoreWait(k64f_enet->xTXDCountSem.id, osWaitForever);
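For context, the counting-semaphore paradigm the description refers to, and that this hunk moves towards, looks roughly like the sketch below. The names (NUM_TX_BDS, tx_desc_sem, tx_queue_frame, tx_reclaim_one) are hypothetical; this illustrates the pattern only, not the actual k64f driver code:

```c
#include "cmsis_os.h"
#include "lwip/err.h"

#define NUM_TX_BDS 8   /* hypothetical number of TX buffer descriptors */

osSemaphoreDef(tx_desc_sem);
static osSemaphoreId tx_desc_sem_id;

void tx_init(void)
{
    /* One token per free TX descriptor. */
    tx_desc_sem_id = osSemaphoreCreate(osSemaphore(tx_desc_sem), NUM_TX_BDS);
}

err_t tx_queue_frame(void)
{
    /* Take a token before touching a descriptor; this blocks only when all
     * descriptors are still owned by the MAC. */
    osSemaphoreWait(tx_desc_sem_id, osWaitForever);
    /* ... fill the descriptor and hand it to the hardware ... */
    return ERR_OK;
}

void tx_reclaim_one(void)
{
    /* Called once per descriptor the MAC has finished with: free the
     * associated pbuf, then return the token. */
    osSemaphoreRelease(tx_desc_sem_id);
}
```

The point of the pattern is that the token count, not the READY bit, is the source of truth for how many descriptors are free, so the TX path never has to re-read a descriptor flag the hardware may not have updated yet.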
It strikes me that there are two possible behaviours when output packets are generated faster than the hardware can send them: (a) wait for the hardware; (b) drop packets. Depending on the sizes of packets, the code here may do either: drop them (if there is not enough memory in the pool) or block waiting for a slot. Which is a little inconsistent.
If the preferred policy is to drop packets, then maybe it should use a zero timeout when getting the semaphore and bail out if none is available.
If the preferred policy is to block, then maybe it should defer allocating the pbuf until the slot is obtained (at which point the memory allocation is guaranteed to succeed).
Actually, maybe it isn't guaranteed, since the pool is shared with the receive side.
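For illustration, the zero-timeout option suggested above might look roughly like this sketch, assuming CMSIS-RTOS v1 osSemaphoreWait() semantics; the function and variable names are hypothetical and this is not the driver's actual code:

```c
#include "cmsis_os.h"
#include "lwip/pbuf.h"
#include "lwip/err.h"

/* Try to claim a TX descriptor without blocking; on failure, free the frame
 * copy and report a drop. With CMSIS-RTOS v1, osSemaphoreWait() with a zero
 * timeout returns 0 (or a negative error) when no token is available. */
static err_t try_claim_tx_descriptor(osSemaphoreId tx_desc_sem, struct pbuf *frame_copy)
{
    if (osSemaphoreWait(tx_desc_sem, 0) <= 0) {
        pbuf_free(frame_copy);  /* nothing was queued; release the copy */
        return ERR_BUF;         /* drop; higher layers (e.g. TCP) recover */
    }
    return ERR_OK;
}
```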
Sorry I haven't been able to look into this yet. Do you think adopting a blocking memory allocation would be the correct solution (if such exists in lwip)?
Blocking is well defined at the Network socket API layer, so that is another option. If it was done there, then the LWIP thread would be free to continue processing RX packets.
The k64f_low_level_output function is responsible for sending ethernet frames and is significantly detached from the socket layer. LWIP only maintains the state for non-blocking network operations, not hardware operations.
In the situation where we can't continue, we have two choices:
- Block the lwip thread
  - Adds potential delay to every socket operation
  - Still valid for non-blocking socket operations
- Drop the packet
  - Relies on higher-level packet-loss logic to recover
  - Adds overhead to the network transaction
Looking at this, my gut says the current behaviour is reasonable.
Full hardware buffers are not an uncommon state for low-power devices, and blocking effectively throttles the network logic in the application without adding any overhead that would further slow down the device. Running out of memory, however, is much more concerning.
Dropping the packet and bailing out of any IP logic will probably give the device the best chance to recover.
I would normally say always drop packets. A buffer overflow is a buffer overflow. IP has to be resilient to buffer overflows in every device and every router. There's no reason to take extra precautions locally in your own Ethernet driver to avoid buffer overflow drops - you just need "enough" buffering.
The only pathological case you generally need to worry about is IP fragmentation - that's the only thing that pushes multiple frames in one go, requires them all to go, and fails if they don't. As long as you have enough buffering to take the largest fragmented datagram you expect to attempt, dropping should be fine.
Blocking the network stack for any significant time could be very bad - it would be quite possible to pull out your Ethernet cable from one interface and bring the entire network stack (all interfaces) to a halt. Network stacks are usually single-threaded, and won't be able to do anything while a transmit call to one driver is blocking.
@kjbracey-arm, you make a good argument. I'm still hesitant, but suspect I'm weighing efficiency too heavily over robustness and consistency.
I've pushed up a new patch that drops frames when the hardware buffers are exhausted. As expected, @infinnovation's example now only gets ~6 big packets through each iteration, but congestion control should be able to work around that.
I would be interested in seeing a performance comparison of the two strategies on a 10Mbps link with something high-level like TCP, but I don't currently have the hardware available.
BTW, nightly test looked ok; there's still an issue with the NCS36510 on master.
It's not necessarily about specific lines of code. E.g. declarations and code dealing with the pool of buffers are spread across (at least) 4 functions, three threads, one .c file and two .h files. What were the design decisions and how were they arrived at? Why were the sizes of tx and rx ring buffers chosen? Why use these buffers at all for tx? (Guess: lwip may send a packet as a chain of pbufs; we could use one buffer descriptor for each pbuf but that adds extra complexity; so we copy to ensure contiguous memory and one buffer per packet, with the alignment required by uDMA.) What are the circumstances in which packets may get dropped? How do task priorities affect operation? (E.g. if tx reclaim thread has insufficient priority, rx packets may get dropped too.) What are the invariants? And so on...
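On the "copy to ensure contiguous memory" guess: lwip's pbuf_copy_partial() is the usual way to flatten a pbuf chain into one buffer, so a driver taking that approach might do something like the sketch below. The function and buffer names are hypothetical; this is an illustration, not the k64f driver's code:

```c
#include "lwip/pbuf.h"
#include "lwip/err.h"

/* Flatten a possibly-chained pbuf into one contiguous buffer so a single
 * buffer descriptor (and one uDMA transfer) can cover the whole frame. */
static err_t copy_frame_for_dma(struct pbuf *p, u8_t *dma_buf, u16_t buf_len)
{
    if (p->tot_len > buf_len) {
        return ERR_BUF;  /* frame larger than one hardware buffer: drop it */
    }
    /* pbuf_copy_partial() walks the chain and copies tot_len bytes. */
    if (pbuf_copy_partial(p, dma_buf, p->tot_len, 0) != p->tot_len) {
        return ERR_BUF;
    }
    return ERR_OK;
}
```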
Previously, exhausting hardware buffers would begin blocking the lwip thread. This patch changes the emac layer to simply drop ethernet frames, leaving recovery up to a higher level protocol. This is consistent with the behaviour of the emac layer when unable to allocate dynamic memory.
Honestly, I'm guessing at this code as much as you are. This driver has been around since before my time, and I suspect there have been enough contributors patching this code that there is no longer a single author who could explain it. I'm hesitant to add much documentation without knowing the facts. Modifying this driver has felt a bit like eating pasta with a teaspoon, and there have been other concerns raised about this driver's heap and thread usage. There is motivation to refactor the driver, but it should probably occur as a separate pull request.
/morph test-nightly
@infinnovation (or any other reviewer mentioned above): happy with this patch?
LGTM |
Your command has finished executing! Result: FAILURE
Output: mbed Build Number: 976. Test failed!
Results look good, currently chasing down an issue in Greentea that's causing it to show up as a failure. |
(This PR, 3135: lwip - Fix memory leak in k64f cyclic-buffer overflow, was included in the mbed OS 5.2.2 / mbed lib v129 release.)
This is hopefully a fix for #3118, where @infinnovation put in a lot of work in tracking down the exact problem.
Related issues: #3118, #2553
Supersedes #3124
cc @infinnovation, @bogdanm, @kjbracey-arm