Project

General

Profile

Bug #1401

A lock inversion

Added by Avinoam Meir over 4 years ago. Updated over 4 years ago.

Status:
Closed
Priority:
Normal
Category:
libcharon
Target version:
Start date:
13.04.2016
Due date:
Estimated time:
Affected version:
5.4.0
Resolution:
Fixed

Description

Hello,

I ran a lock inversion detector on the strongSwan code. A lock inversion was found with the mutexes of private_processor_t and private_thread_t in the following flow:

Stack trace:

Thread A:
libstrongswan/threading/thread.c cancel() (locks private_thread_t.mutex)
libstrongswan/processing/processor.c cancel() (locks private_processor_t.mutex)
libcharon/daemon.c destroy()
libcharon/daemon.c libcharon_deinit

Thread B:
libstrongswan/processing/processor.c restart - lock private_processor_t mutex
libstrongswan/threading/thread.c thread_cleanup - lock private_thread_t mutex

Although, it is not clear that this leads to real deadlock, this seems to be error-prone. A cancellation callback which tries to hold the mutex will lead to a deadlock which may be somewhat surprising to the caller. Generally, I find that locking a mutex before running callbacks is an anti-pattern which I try to avoid. Would love to hear what you think.

Thank you,
Avinoam

Associated revisions

Revision 960632ff (diff)
Added by Tobias Brunner over 4 years ago

thread: Don't hold mutex when calling cleanup handlers while terminating

This could interfere with cleanup handlers that try to acquire
mutexes while other threads holding these try to e.g. cancel the threads.

As cleanup handlers are only queued by the threads themselves we don't need
any synchronization to access the list.

Fixes #1401.

History

#1 Updated by Tobias Brunner over 4 years ago

  • Status changed from New to Feedback

Thanks for the report.

Thread A:
libstrongswan/threading/thread.c cancel() (locks private_thread_t.mutex)
libstrongswan/processing/processor.c cancel() (locks private_processor_t.mutex)
libcharon/daemon.c destroy()
libcharon/daemon.c libcharon_deinit

Thread B:
libstrongswan/processing/processor.c restart - lock private_processor_t mutex
libstrongswan/threading/thread.c thread_cleanup - lock private_thread_t mutex

Although, it is not clear that this leads to real deadlock, this seems to be error-prone.

There is a theoretical possibility for deadlocks here. Thread A will call processor_t.set_threads(0) before calling processor_t.cancel(). All idle threads will therefore terminate themselves, which causes them to call thread_cleanup() (via pthread_cleanup_pop() in thread_main()).

So a deadlock could definitely occur:

Thread A                    Thread B
processor_t.set_threads(0)
                            thread_cleanup()
                             lock mutex T
processor_t.cancel()
 lock mutex P               
thread_t.cancel()           restart()
 lock mutex T                lock mutex P

I think there is no reason to get the lock in thread_cleanup() before calling the cleanup handlers. These are only queued by this same thread, so there is no need to synchronize access to the list. We still need to get it afterwards, though, to synchronize the terminated flag and calling thread_destroy() as that has an effect on other threads calling thread_t.join(). I pushed this change to the 1401-thread-cleanup branch.

#2 Updated by Avinoam Meir over 4 years ago

I tested the 1401-thread-cleanup branch, and ran a lock inversion detector and a race detector, and everything looks good.

Thank you for the fix.

#3 Updated by Tobias Brunner over 4 years ago

  • Tracker changed from Issue to Bug
  • Status changed from Feedback to Closed
  • Assignee set to Tobias Brunner
  • Target version set to 5.5.0
  • Resolution set to Fixed

Great, thanks for testing the fix. I actually saw that the deadlock I described above should not occur, as the cleanup handler is only registered while a job is getting executed. So for idle threads this won't be a problem (and for canceled threads it's also no issue as they just have to wait until Thread A is done calling thread_t.cancel() until they may acquire the lock in thread_cleanup() and then wait again until Thread A waits on the condvar an releases the mutex in processor_t).

Also available in: Atom PDF