  1. HPCC
  2. HPCC-12242

Add additional tracing to help diagnose why on rare occasions MPConnect gets stuck on recv

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • None
    • 5.0.4
    • Core Libraries
    • None

    Description

      We have seen a couple of incidents where Dali stops receiving connections from new clients.
      In the most recent case, threads from Dali and a number of clients were captured, and it could be seen that Dali's MPConnectThread had its select triggered, indicating that data was available.
      At this point it should be reading some client initialization data (8 bytes). However, the subsequent recv() stalled.

      The client thread was waiting for a reply from the socket, having written the initialization data.

      Eventually MPConnectThread timed out, logging the IP:port of the client.
      This happened over a period of time with many clients but eventually recovered.

      It would be useful to add some tracing to the recv, and possibly shorten the timeout and loop, so that there are clues not only when it times out but also when a recv() after a select like this is slow to receive, which should afaik always be quick.
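
      A minimal sketch of the kind of tracing suggested above, assuming plain POSIX sockets. The helper name recvTraced, its parameters and the log text are hypothetical; this is not the existing MPConnectThread code or the jlib socket API. It reads a fixed-size block in short poll() slices and logs every slice that passes without the read completing, so a slow recv() after a triggered select leaves a trail before the overall timeout fires.

```cpp
#include <algorithm>
#include <cassert>
#include <cerrno>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

// Milliseconds elapsed since `start` on a monotonic clock.
static long elapsedMs(std::chrono::steady_clock::time_point start)
{
    using namespace std::chrono;
    return (long)duration_cast<milliseconds>(steady_clock::now() - start).count();
}

// Hypothetical helper (an assumption, not HPCC code): try to read exactly `len`
// bytes from `fd` within `totalTimeoutMs`, waiting in slices of `sliceMs` and
// tracing each slice that ends with the read still incomplete.
// Returns the bytes read (== len on success, < len on timeout or peer close),
// or -1 on a socket error. `slowSlices`, if given, receives the slice count.
ssize_t recvTraced(int fd, void *buf, size_t len, int totalTimeoutMs, int sliceMs,
                   unsigned *slowSlices = nullptr)
{
    assert(buf != nullptr && sliceMs > 0);
    unsigned dummy = 0;
    unsigned &slow = slowSlices ? *slowSlices : dummy;
    slow = 0;
    const auto start = std::chrono::steady_clock::now();
    size_t got = 0;
    while (got < len)
    {
        const long remaining = totalTimeoutMs - elapsedMs(start);
        if (remaining <= 0)
        {
            fprintf(stderr, "recvTraced: fd=%d timed out after %d ms with %zu of %zu bytes\n",
                    fd, totalTimeoutMs, got, len);
            break;
        }
        pollfd pfd = { fd, POLLIN, 0 };
        const int rc = poll(&pfd, 1, (int)std::min<long>(sliceMs, remaining));
        if (rc < 0)
        {
            if (errno == EINTR)
                continue; // interrupted by a signal, retry
            return -1;
        }
        if (rc == 0)
        {
            // A whole slice passed with nothing readable: this is the "slow" clue.
            ++slow;
            fprintf(stderr, "recvTraced: fd=%d slow, %ld ms elapsed, %zu of %zu bytes\n",
                    fd, elapsedMs(start), got, len);
            continue;
        }
        // Non-blocking read so a spurious readiness report cannot stall the thread.
        const ssize_t n = recv(fd, static_cast<char *>(buf) + got, len - got, MSG_DONTWAIT);
        if (n == 0)
            break; // peer closed the connection
        if (n < 0)
        {
            if (errno == EINTR || errno == EAGAIN || errno == EWOULDBLOCK)
                continue;
            return -1;
        }
        got += (size_t)n;
    }
    return (ssize_t)got;
}
```

      For the 8-byte initialization read described above, a call such as recvTraced(sock, hdr, 8, 5000, 100) would log once per 100 ms while the read stays incomplete; the timeout and slice values here are illustrative only.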

          People

            jakesmith Jake Smith
            jakesmith Jake Smith