There are several places where we use spinlocks in the system. We tend to use them when they cover a very short window, and we want to reduce the locking overhead to a minimum. However, their performance deteriorates fairly rapidly if they are heavily contended, especially if the number of active threads exceeds the number of CPUs on the system.
So we should check whether any of the spinlocks (e.g., roxie memory manager) are likely to be contended in any reasonable situation, and if so either avoid the lock (e.g., by creating unique allocators) or replace it with a mutex.
Mark Kelly also suggested an improvement to the spinlock: actually spin on the CPU for a while before calling threadYield(). I suspect a variation on it would be a good idea, but there are open questions:
- How many times should it iterate first?
- Should it call _mm_pause to free up the CPU while spinning?
It needs testing and tuning on some real-world queries - in both roxie and thor - to determine if we should take it.
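
To make the spin-before-yield idea concrete, here is a minimal sketch in standard C++. It is not the existing spinlock implementation: the class name SpinThenYieldLock, the spinLimit parameter, and the CPU_RELAX macro are all hypothetical, std::this_thread::yield() stands in for threadYield(), and the default of 100 iterations is a placeholder rather than a tuned value. On x86, _mm_pause is a hint to the processor that the thread is in a spin-wait loop; on other architectures this sketch simply spins without a hint.

```cpp
#include <atomic>
#include <thread>

#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#include <immintrin.h>
#define CPU_RELAX() _mm_pause()   // spin-wait hint on x86
#else
#define CPU_RELAX() ((void)0)     // assumption: no hint on other architectures
#endif

// Hypothetical illustration only, not the system's spinlock class.
class SpinThenYieldLock
{
public:
    // limit = failed attempts to spin through before yielding.
    // 0 means "yield after every failed attempt".
    explicit SpinThenYieldLock(unsigned limit = 100) : spinLimit(limit) {}

    // Single attempt. The relaxed load avoids hammering the cache line
    // with writes while another thread holds the lock.
    bool tryEnter()
    {
        return !locked.load(std::memory_order_relaxed) &&
               !locked.exchange(true, std::memory_order_acquire);
    }

    void enter()
    {
        unsigned spins = 0;
        while (!tryEnter())
        {
            if (spins < spinLimit)
            {
                ++spins;
                CPU_RELAX();                 // stay on the CPU, briefly
            }
            else
            {
                spins = 0;
                std::this_thread::yield();   // give up the timeslice
            }
        }
    }

    void leave()
    {
        locked.store(false, std::memory_order_release);
    }

private:
    std::atomic<bool> locked{false};
    const unsigned spinLimit;
};
```

Making spinLimit a constructor argument means the iteration count can be varied per lock while testing, and building once with CPU_RELAX defined as a no-op would give a direct comparison for the _mm_pause question.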