Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Description
We have seen a couple of incidents where Dali stops receiving connections from new clients.
In the most recent case, threads from Dali and a number of clients were captured, and it could be seen that Dali's MPConnectThread had its select triggered, indicating that data was available.
At this point it should be reading some client initialization data (8 bytes). However, the subsequent recv() stalled.
The client thread was waiting for a reply from the socket, having written the initialization data.
Eventually MPConnectThread timed out, logging the IP:port of the client.
This happened over a period of time with many clients but eventually recovered.
It would be useful to add some tracing to the recv and possibly shorten the timeout and loop, so that there are clues not only when it times out, but also when a recv() after a select like this is slow to receive, which should afaik always be quick.
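As a rough illustration of the kind of tracing suggested above, here is a minimal C++ sketch using plain POSIX sockets. It is not taken from the Dali/MP sources: tracedRecv, TracedRecvResult, and the sliceMs/totalMs parameters are hypothetical names, and poll() stands in for whatever select wrapper MPConnectThread actually uses. The idea is to wait in short slices and log each slice that expires with no data, rather than blocking silently until one long timeout fires.

```cpp
#include <cassert>
#include <cerrno>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical result type for this sketch (not part of the Dali/MP code).
struct TracedRecvResult
{
    size_t bytesRead = 0;   // bytes actually received
    unsigned slowWaits = 0; // wait slices that expired with no data
    bool complete = false;  // true only if all requested bytes arrived
};

// Try to read exactly len bytes from fd. Waits in slices of sliceMs, up to
// totalMs overall, and traces to stderr every time a slice expires.
TracedRecvResult tracedRecv(int fd, void *buf, size_t len, int sliceMs, int totalMs)
{
    assert(buf != nullptr && sliceMs > 0 && totalMs > 0);
    using Clock = std::chrono::steady_clock;
    const auto start = Clock::now();
    auto elapsedMs = [&start]() -> long long {
        return std::chrono::duration_cast<std::chrono::milliseconds>(Clock::now() - start).count();
    };

    TracedRecvResult res;
    char *out = static_cast<char *>(buf);
    while (res.bytesRead < len)
    {
        long long elapsed = elapsedMs();
        if (elapsed >= totalMs)
            break; // overall timeout reached
        long long remaining = totalMs - elapsed;
        int waitMs = remaining < sliceMs ? static_cast<int>(remaining) : sliceMs;

        pollfd pfd{fd, POLLIN, 0};
        int rc = poll(&pfd, 1, waitMs);
        if (rc < 0)
        {
            if (errno == EINTR)
                continue; // interrupted by a signal, retry
            break;        // genuine poll error
        }
        if (rc == 0)
        {
            // Slow path: no data within this slice. This is the extra clue.
            ++res.slowWaits;
            std::fprintf(stderr, "tracedRecv: fd %d still waiting after %lld ms (%zu of %zu bytes)\n",
                         fd, elapsedMs(), res.bytesRead, len);
            continue;
        }
        // Non-blocking read, so a spurious wakeup cannot stall the thread.
        ssize_t n = recv(fd, out + res.bytesRead, len - res.bytesRead, MSG_DONTWAIT);
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR))
            continue; // nothing to read after all, go back to waiting
        if (n <= 0)
            break; // 0 means the peer closed, negative is an error
        res.bytesRead += static_cast<size_t>(n);
    }
    res.complete = (res.bytesRead == len);
    return res;
}
```

A caller in the connect thread might request the 8 initialization bytes with something like tracedRecv(sock, buf, 8, 1000, 30000) (values chosen arbitrarily here) and log the client's IP:port whenever slowWaits is non-zero, even if the read eventually completes.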