Bug #1185

Charon/vici deadlock if start_action = trap|start is used

Added by J. Bill Chilton almost 5 years ago. Updated over 4 years ago.

Status: Closed
Priority: Normal
Category: libcharon
Target version: 5.4.0
Start date: 29.10.2015
Affected version: 5.3.3
Resolution: Fixed

Description

In Strongswan-5.3.0, I’m able to induce a deadlock condition in the charon daemon’s vici implementation with good repeatability using an eight-host full mesh test configuration.

The initial symptom of this deadlock is that any attempt to run a swanctl --list-conns or --load-conns operation causes the swanctl process to hang until terminated, and one of charon's worker threads to hang permanently. Each additional list-conns or load-conns attempt incapacitates another worker thread, until charon becomes entirely unresponsive to swanctl.

Here’s the deadlock scenario based on what I know so far:

1. A worker thread acquires a write lock on the rwlock in a private_vici_config_t instance at src/libcharon/plugins/vici/vici_config.c:1692 (the top of merge_config()). This rwlock will later be found blocking all the other hung threads.

2. merge_config() goes on to call run_start_actions(), which calls run_start_action(), which calls find_reqid(), which gets an IKE SA enumerator (line 1493) and then attempts to get the first IKE SA from it with enumerator->enumerate(). This call to enumerate() never returns.

3. The IKE SA enumerator invoked above is enumerator_filter_wait() at src/libcharon/sa/ike_sa_manager.c:1474. It calls wait_for_entry(), which enters but never leaves the while loop at ike_sa_manager.c:707.

Because run_start_actions() never returns, merge_config() never gets to the unlock call at vici_config.c:1728.
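
To make the lock pattern concrete, the following standalone pthreads program reproduces the shape of the deadlock. It is purely illustrative (not strongSwan code; names like config_worker, sa_worker and entry_checked_in are invented): one thread holds the write lock on the config rwlock while waiting for an entry to be checked in, while the thread that has the entry checked out needs the read lock before it can ever check it back in (that second half of the cycle is worked out in comment #1 below).

/*
 * Standalone illustration of the deadlock shape described above.
 * NOT strongSwan code; all names are invented. Build: cc demo.c -lpthread
 */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

static pthread_rwlock_t config_lock = PTHREAD_RWLOCK_INITIALIZER;
static pthread_mutex_t entry_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t entry_checked_in = PTHREAD_COND_INITIALIZER;
static bool checked_out = true;               /* the half-established SA entry */

/* plays the vici thread in merge_config()/run_start_actions() */
static void *config_worker(void *arg)
{
    (void)arg;
    pthread_rwlock_wrlock(&config_lock);      /* write lock as in merge_config() */
    printf("config worker: write lock held, enumerating SAs\n");
    pthread_mutex_lock(&entry_mutex);         /* wait_for_entry() equivalent */
    while (checked_out)
    {   /* blocks forever: the other thread can never check the entry in */
        pthread_cond_wait(&entry_checked_in, &entry_mutex);
    }
    pthread_mutex_unlock(&entry_mutex);
    pthread_rwlock_unlock(&config_lock);      /* never reached */
    return NULL;
}

/* plays the thread that created the SA for the incoming IKE_SA_INIT */
static void *sa_worker(void *arg)
{
    (void)arg;
    sleep(1);                                 /* let the config worker win the race */
    printf("SA worker: needs the read lock to look up a config\n");
    pthread_rwlock_rdlock(&config_lock);      /* blocks behind the write lock */
    pthread_mutex_lock(&entry_mutex);         /* would check the entry back in here */
    checked_out = false;
    pthread_cond_signal(&entry_checked_in);
    pthread_mutex_unlock(&entry_mutex);
    pthread_rwlock_unlock(&config_lock);
    return NULL;
}

int main(void)
{
    pthread_t a, b;
    pthread_create(&a, NULL, config_worker, NULL);
    pthread_create(&b, NULL, sa_worker, NULL);
    pthread_join(a, NULL);                    /* hangs: each thread waits on the other */
    pthread_join(b, NULL);
    return 0;
}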

Associated revisions

Revision 20df9d31 (diff)
Added by Tobias Brunner over 4 years ago

vici: Don't hold write lock while running or undoing start actions

Running or undoing start actions might require enumerating IKE_SAs,
which in turn might have to enumerate peer configs concurrently, which
requires acquiring a read lock. So if we keep holding the write lock while
enumerating the SAs we provoke a deadlock.

By preventing other threads from acquiring the write lock while handling
actions, and thus preventing the modification of the configs, we largely
maintain the current synchronous behavior. This way we also don't need to
acquire additional refs for config objects as they won't get modified/removed.

Fixes #1185.
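
In pthreads terms, the locking scheme the commit message describes could be sketched roughly like this (an illustrative simplification, not the actual vici_config.c change; handling_actions, actions_mutex and actions_cond are invented names): the write lock is dropped before any start actions run, so SA threads can still take the read lock for config lookups, while a separate flag keeps other config writers out until the actions have finished.

/* Illustrative sketch only; not the actual strongSwan implementation. */
#include <pthread.h>
#include <stdbool.h>

static pthread_rwlock_t config_lock = PTHREAD_RWLOCK_INITIALIZER;
static pthread_mutex_t actions_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t actions_cond = PTHREAD_COND_INITIALIZER;
static bool handling_actions = false;

/* writers back off and wait while another thread is handling actions */
static void config_write_lock(void)
{
    for (;;)
    {
        pthread_rwlock_wrlock(&config_lock);
        pthread_mutex_lock(&actions_mutex);
        if (!handling_actions)
        {
            pthread_mutex_unlock(&actions_mutex);
            return;                             /* write lock held, configs stable */
        }
        pthread_rwlock_unlock(&config_lock);    /* don't block readers meanwhile */
        pthread_cond_wait(&actions_cond, &actions_mutex);
        pthread_mutex_unlock(&actions_mutex);
    }
}

static void merge_config_sketch(void)
{
    config_write_lock();
    /* ... merge/replace the loaded connection ... */

    pthread_mutex_lock(&actions_mutex);
    handling_actions = true;                    /* keep other writers out */
    pthread_mutex_unlock(&actions_mutex);
    pthread_rwlock_unlock(&config_lock);        /* readers may now proceed */

    /* run_start_actions(): may enumerate IKE_SAs; the threads holding them
     * can take the read lock for config lookups without deadlocking */

    pthread_mutex_lock(&actions_mutex);
    handling_actions = false;
    pthread_cond_broadcast(&actions_cond);      /* wake waiting writers */
    pthread_mutex_unlock(&actions_mutex);
}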

History

#1 Updated by J. Bill Chilton almost 5 years ago

OK, I see the rest of the deadlock now.

The IKE SA that wait_for_entry() is waiting for (Godot-style) is called (unnamed)[2]. That SA was created by another thread in response to an incoming IKE_SA_INIT message. The creating thread is waiting for a read lock on the same rwlock that was write-locked upstack from wait_for_entry(), which apparently prevents it from ever checking (unnamed)[2] back in.

#2 Updated by J. Bill Chilton almost 5 years ago

I'm able to avoid the deadlock above, which is created when a new connection (one that has "start_action = trap" in it) is loaded via VICI at about the same time as a connection is being created in response to an incoming INIT message, with the following experimental patch:

*** ike_sa_manager.c    2015-11-04 08:30:55.148126506 -0500
--- ike_sa_manager.c.01    2015-11-04 08:33:17.107523760 -0500
***************
*** 1474,1479 ****
--- 1474,1485 ----
  static bool enumerator_filter_wait(private_ike_sa_manager_t *this,
                                   entry_t **in, ike_sa_t **out, u_int *segment)
  {
+    /* XXX - Must avoid waiting on an SA created by checkout_by_message() but not checked in yet */
+    if ((*in)->processing != -1)
+    {
+        DBG2(DBG_MGR, "XXX enumerator_filter_wait() - skipping entry w/ processing = %d", (*in)->processing);
+        return FALSE;
+    }
    if (wait_for_entry(this, *in, *segment))
    {
        *out = (*in)->ike_sa;

Does this make sense?

#3 Updated by J. Bill Chilton over 4 years ago

Nope, the above isn't sufficient. I still found SA "checkout" vs. config rwlock deadlocks incapacitating my charon daemon.

Moving the check for in-process IKE SAs into wait_for_entry() seems to have fixed it:

Index: src/libcharon/sa/ike_sa_manager.c
===================================================================
--- src.orig/libcharon/sa/ike_sa_manager.c    2015-11-13 16:46:35.953860249 -0500
+++ src/libcharon/sa/ike_sa_manager.c    2015-11-25 12:29:46.700451352 -0500
@@ -719,7 +719,8 @@
         /* we are not allowed to get this */
         return FALSE;
     }
-    while (entry->checked_out && !entry->driveout_waiting_threads)
+    /* Must avoid waiting on an SA created by checkout_by_message() but not checked in yet */
+    while (entry->checked_out && !entry->driveout_waiting_threads && entry->processing == -1)
     {
         /* so wait until we can get it for us.
          * we register us as waiting. */
@@ -728,7 +729,7 @@
         entry->waiting_threads--;
     }
     /* hm, a deletion request forbids us to get this SA, get next one */
-    if (entry->driveout_waiting_threads)
+    if (entry->driveout_waiting_threads || entry->processing != -1)
     {
         /* we must signal here, others may be waiting on it, too */
         entry->condvar->signal(entry->condvar);

#4 Updated by Chris Buechler over 4 years ago

J. Bill Chilton: I'm fighting what appears to be a similar, if not identical, issue. Building strongSwan 5.3.5 with the patch from the 25th in the last comment doesn't seem to help, though. It wasn't clear to me whether the last patch you posted is sufficient on its own to resolve what you were seeing, or whether both patches are required.

I'm also struggling to replicate in a lab environment across a variety of circumstances, though we have a few pfSense users who are hitting it in production. Even using their exact configs, I'm unable to replicate in a lab. Could you share the config you're using and steps to replicate?

Any assistance would be much appreciated. Thanks!

#5 Updated by J. Bill Chilton over 4 years ago

To clarify, my second [experimental] patch replaces the first one. I don't claim this is the best way to avoid the race condition deadlock (between VICI config additions and incoming message config adds), but it did fix the problem for me.

My test set-up consists of (currently) 32 Linux LXC containers running charon and some local application code. A test run fires up all 32 at once, then all of them start adding connections via VICI and initiating them. This loads down the LXC host enough to flush out various race conditions and, most of the time (without the patch), results in one or two charon processes deadlocking as described above.

#6 Updated by M. C. over 4 years ago

Hello everyone

I have had the same problem. After some time, seemingly at random, the vici interface gets into a deadlock and the swanctl command hangs. I tried the second patch posted by J. Bill Chilton, but after a few days the server had the same problem again. So I guess the patch still doesn't fix the problem completely.

Just wanted to let you know.

#7 Updated by M. C. over 4 years ago

Since I installed the 2nd patch posted by J. Bill Chilton, the deadlock doesn't happen as often as before, but it still happens. I am on strongSwan 5.3.4.
I have been able to reproduce it twice so far. It still hangs in this part of the code, even with the fix:

-    while (entry->checked_out && !entry->driveout_waiting_threads)
+    /* Must avoid waiting on an SA created by checkout_by_message() but not checked in yet */
+    while (entry->checked_out && !entry->driveout_waiting_threads && entry->processing == -1)
     {
         /* so wait until we can get it for us.
          * we register us as waiting. */
@@ -728,7 +729,7 @@
         entry->waiting_threads--;
     }

So the "&& entry->processing == -1" check doesn't completely fix the problem.
The calls that led to this function and to the deadlock came from these points in the code:

diff --git a/src/libcharon/sa/ike_sa_manager.c b/src/libcharon/sa/ike_sa_manager.c
index 4625df5..92414fe 100644
--- a/src/libcharon/sa/ike_sa_manager.c
+++ b/src/libcharon/sa/ike_sa_manager.c
@@ -1146,6 +1146,7 @@ METHOD(ike_sa_manager_t, checkout, ike_sa_t*,

        if (get_entry_by_id(this, ike_sa_id, &entry, &segment) == SUCCESS)
        {
+               // Triggers the deadlock
                if (wait_for_entry(this, entry, segment))
                {
                        entry->checked_out = TRUE;
@@ -1343,6 +1344,7 @@ METHOD(ike_sa_manager_t, checkout_by_message, ike_sa_t*,
                        DBG1(DBG_MGR, "ignoring request with ID %u, already processing",
                                 entry->processing);
                }
+               // Triggers the deadlock!
                else if (wait_for_entry(this, entry, segment))
                {
                        ike_sa_id_t *ike_id;

The swanctl --stats command starts to hang as well. This happens here:

diff --git a/src/libcharon/plugins/vici/libvici.c b/src/libcharon/plugins/vici/libvici.c
index 7c98c8b..c7d703d 100644
--- a/src/libcharon/plugins/vici/libvici.c
+++ b/src/libcharon/plugins/vici/libvici.c
@@ -392,6 +392,7 @@ vici_res_t* vici_submit(vici_req_t *req, vici_conn_t *conn)
        conn->mutex->lock(conn->mutex);
        while (conn->wait == WAIT_IDLE)
        {
+               // Deadlock
                conn->cond->wait(conn->cond, conn->mutex);
        }
        switch (conn->wait)

#8 Updated by Jim Thompson over 4 years ago

There is no doubt it still occurs. It occurs on pfSense as well.

Your first diff is garbled; compare https://github.com/strongswan/strongswan/blob/master/src/libcharon/sa/ike_sa_manager.c#L1337

#9 Updated by Tobias Brunner over 4 years ago

  • Tracker changed from Issue to Bug
  • Subject changed from Charon/vici deadlock to Charon/vici deadlock if start_action = trap|start is used
  • Status changed from New to Feedback
  • Assignee set to Tobias Brunner

J. Bill analyzed the first part of the deadlock pretty accurately in the original description. But the conclusion that there is a problem in the IKE_SA manager is incorrect. If the root cause of this issue were in the code of this central component, we'd see deadlocks all over the place, which is not the case.

So, as J. Bill said, the problem is basically that the write lock in merge_config() (and unload_conn()) is held while trying to enumerate IKE_SAs. Threads working on IKE_SAs keep them checked out and thus block the IKE_SA enumerator in wait_for_entry(). These threads might concurrently try to acquire the read lock in create_ike_cfg_enumerator() or create_peer_cfg_enumerator(): when handling an IKE_SA_INIT message an IKE config is looked up, and later, when handling the IKE_AUTH message and the peer's identity is known, a peer config is looked up. That is also why the IKE_SA the enumerator waits for in #1185-1 is unnamed; it has no peer config assigned yet.

In my opinion the proper way to fix this is to avoid holding the write lock while initiating or terminating SAs in the first place. I did something to that effect in the 1185-vici-action-unlock branch. Let me know if this works for you.

#10 Updated by Chris Buechler over 4 years ago

Tobias Brunner wrote:

I did something to that effect in the 1185-vici-action-unlock branch. Let me know if this works for you.

With the changes from that branch, we're still hitting the deadlock we've been fighting. I'm looking for a lab-replicable test case.

#11 Updated by M. C. over 4 years ago

I just had the patched strongswan hang again. In GDB I can see that one thread crashed.


Thread 12 (Thread 0x7fe54fb27700 (LWP 1822)):
#0 0x00007fe55f94b334 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007fe55f9465d8 in _L_lock_854 () from /lib64/libpthread.so.0
#2 0x00007fe55f9464a7 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007fe5604ae0ea in lock (this=<value optimized out>) at threading/mutex.c:106
#4 0x00007fe560047cb5 in log_ (this=0x22adf20, group=<value optimized out>, level=<value optimized out>, thread=6, ike_sa=0x7fe5100140c0, message=0x7fe54fb25280 "thread 6 received 11") at bus/listeners/sys_logger.c:91
#5 0x00007fe5604856e8 in invoke_function (this=<value optimized out>, fn=0x7fe55fff3420 <log_cb>, d1=0x7fe54fb25680, d2=0x40223d, d3=0x0, d4=0x0, d5=0x7fe54fb255a0) at collections/linked_list.c:414
#6 0x00007fe55fff3c4c in vlog (this=0x22a9a00, group=DBG_DMN, level=LEVEL_CTRL, format=0x402228 "thread %u received %d", args=0x7fe54fb25700) at bus/bus.c:400
#7 0x00007fe55fff3dcf in log_ (this=<value optimized out>, group=<value optimized out>, level=<value optimized out>, format=<value optimized out>) at bus/bus.c:430
#8 0x000000000040132f in segv_handler (signal=11) at charon.c:186
#9 <signal handler called>
#10 0x00007fe55f41afd5 in malloc_consolidate () from /lib64/libc.so.6
#11 0x00007fe55f41ec28 in _int_malloc () from /lib64/libc.so.6
#12 0x00007fe55f41f7ed in calloc () from /lib64/libc.so.6
#13 0x00007fe55f413f2d in open_memstream () from /lib64/libc.so.6
#14 0x00007fe55f489a5b in __vsyslog_chk () from /lib64/libc.so.6
#15 0x00007fe55f48a0d0 in syslog () from /lib64/libc.so.6
#16 0x00007fe560047d22 in log_ (this=0x22adf20, group=<value optimized out>, level=<value optimized out>, thread=6, ike_sa=<value optimized out>, message=<value optimized out>) at bus/listeners/sys_logger.c:97
#17 0x00007fe5604856e8 in invoke_function (this=<value optimized out>, fn=0x7fe55fff3420 <log_cb>, d1=0x7fe54fb26680, d2=0x7fe560049791, d3=0x0, d4=0x382065736e6f7073, d5=0x1) at collections/linked_list.c:414
#18 0x00007fe55fff3c4c in vlog (this=0x22a9a00, group=DBG_ENC, level=LEVEL_CTRL, format=0x7fe560049784 "generating %s", args=0x7fe54fb26700) at bus/bus.c:400
#19 0x00007fe55fff3dcf in log_ (this=<value optimized out>, group=<value optimized out>, level=<value optimized out>, format=<value optimized out>) at bus/bus.c:430
#20 0x00007fe560000130 in generate_message (this=0x234f5a0, keymat=0x7fe540008470, out_generator=0x7fe54fb26ac8, encrypted=0x7fe54fb26ac0) at encoding/message.c:1737
#21 0x00007fe5600009cf in generate (this=0x234f5a0, keymat=0x7fe540008470, packet=0x7fe54fb26b40) at encoding/message.c:1851
#22 0x00007fe5600132ee in generate_message (this=0x7fe5100140c0, message=0x234f5a0, packet=0x7fe54fb26b40) at sa/ike_sa.c:1034
#23 0x00007fe5600158d0 in generate_message_fragmented (this=0x7fe5100140c0, message=0x234f5a0, packets=0x7fe54fb26b88) at sa/ike_sa.c:1083
#24 0x00007fe56002053e in generate_message (this=<value optimized out>, message=<value optimized out>, packets=0x7fe554007d90) at sa/ikev2/task_manager_v2.c:289
#25 0x00007fe560021289 in build_response (this=0x7fe554007cd0, message=<value optimized out>) at sa/ikev2/task_manager_v2.c:828
#26 process_request (this=0x7fe554007cd0, message=<value optimized out>) at sa/ikev2/task_manager_v2.c:1072
#27 0x00007fe560021be9 in process_message (this=0x7fe554007cd0, msg=0x7fe51c00c8f0) at sa/ikev2/task_manager_v2.c:1334
#28 0x00007fe560013617 in process_message (this=0x7fe5100140c0, message=0x7fe51c00c8f0) at sa/ike_sa.c:1401
#29 0x00007fe56000e617 in execute (this=0x7fe534009c40) at processing/jobs/process_message_job.c:74
#30 0x00007fe56049cc09 in process_job (worker=0x233ece0) at processing/processor.c:235
#31 process_jobs (worker=0x233ece0) at processing/processor.c:321
#32 0x00007fe5604adb6c in thread_main (this=0x233ed10) at threading/thread.c:303
#33 0x00007fe55f944aa1 in start_thread () from /lib64/libpthread.so.0
#34 0x00007fe55f48d93d in clone () from /lib64/libc.so.6

The problem seems to be in vlog() in bus.c. In that method there is even a comment that it's not 100% thread-safe.

#12 Updated by Tobias Brunner over 4 years ago

I just had the patched strongswan hang again. In GDB I can see that one thread crashed.

That's something different. There seems to be a segmentation fault here while logging to syslog() (i.e. the crash happens in libc). Since the mutex in our sys-logger is held during the call to syslog(), to avoid splitting multi-line log messages, this obviously causes a deadlock when the signal handler then tries to log something via the sys-logger again.

You should try to find out why your system crashes in the allocator here:

#9 <signal handler called>
#10 0x00007fe55f41afd5 in malloc_consolidate () from /lib64/libc.so.6
#11 0x00007fe55f41ec28 in _int_malloc () from /lib64/libc.so.6
#12 0x00007fe55f41f7ed in calloc () from /lib64/libc.so.6
#13 0x00007fe55f413f2d in open_memstream () from /lib64/libc.so.6
#14 0x00007fe55f489a5b in __vsyslog_chk () from /lib64/libc.so.6
#15 0x00007fe55f48a0d0 in syslog () from /lib64/libc.so.6
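
The effect described above can be reduced to a few lines (again just an illustrative sketch, not strongSwan code): a fault handler that logs through the same non-recursive mutex that the interrupted code already holds simply blocks against its own thread. Locking a mutex from a signal handler is not async-signal-safe, which is exactly the problem here.

/* Minimal illustration of the secondary hang; not strongSwan code. */
#include <pthread.h>
#include <signal.h>
#include <stdio.h>

static pthread_mutex_t logger_mutex = PTHREAD_MUTEX_INITIALIZER;

static void log_line(const char *msg)
{
    pthread_mutex_lock(&logger_mutex);    /* keeps multi-line messages together */
    fprintf(stderr, "%s\n", msg);
    pthread_mutex_unlock(&logger_mutex);
}

static void segv_handler(int sig)
{
    (void)sig;
    /* not async-signal-safe: relocking the default (non-recursive) mutex
     * already held by this thread blocks forever on typical glibc systems */
    log_line("thread received SIGSEGV");
}

int main(void)
{
    signal(SIGSEGV, segv_handler);

    pthread_mutex_lock(&logger_mutex);    /* logger is in the middle of a message */
    raise(SIGSEGV);                       /* crash arrives while the lock is held */
    pthread_mutex_unlock(&logger_mutex);  /* never reached */
    return 0;
}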

#13 Updated by Tobias Brunner over 4 years ago

Chris Buechler wrote:

Tobias Brunner wrote:

I did something to that effect in the 1185-vici-action-unlock branch. Let me know if this works for you.

With the changes from that branch, we're still hitting the deadlock we've been fighting. I'm looking for a lab-replicable test case.

Could you provide more information on this? Or was that what M.C. posted?

#14 Updated by M. C. over 4 years ago

The last crash I reported in comment #1185-11 was caused by something on our side, as you correctly pointed out.
And I haven't had the deadlock on the vici side anymore since I started using the code from the 1185-vici-action-unlock branch.

#15 Updated by Tobias Brunner over 4 years ago

  • Status changed from Feedback to Closed
  • Target version set to 5.4.0
  • Resolution set to Fixed

And I haven't had the deadlock on the vici side anymore since I started using the code from the 1185-vici-action-unlock branch.

OK, thanks for the feedback. I'll close this issue and the patch will be included in the next release.

Chris, if you find the issue you had is not fixed, please open a new ticket and provide details. Thanks!
