Bug #1453
Starter is getting stuck handling ipsec reload
Description
Hi,
We have several hundred connections defined in ipsec.conf. We have script which calls "ipsec reload" whenever a new connection definition is added. If the script calls "ipsec reload" few times in loop due to any reason we are seeing that starter getting stuck in the following call stack and never recovers. We started seeing this issue after upgraded to 5.3.5. Any help in debugging this issue is highly appreciated.
(gdb) bt
#0  0x00007faeee9edcb0 in __write_nocancel () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007faeeee31c25 in update (this=<optimized out>) at processing/watcher.c:121
#2  0x00007faeeee31cce in remove_ (this=0xf60490, fd=6) at processing/watcher.c:490
#3  0x00007faeeee2cc1e in destroy (this=0xfe2730) at networking/streams/stream.c:263
#4  0x0000000000405dc3 in send_stroke_msg (msg=0xf61840) at starterstroke.c:119
#5  0x0000000000406476 in starter_stroke_add_conn (cfg=0xfd0e30, conn=0xfbe000) at starterstroke.c:280
#6  0x0000000000403023 in main (argc=<optimized out>, argv=<optimized out>) at starter.c:902
(gdb) info threads
  Id   Target Id         Frame
* 1    Thread 0x7faeef276700 (LWP 25088) "starter" 0x00007faeee9edcb0 in __write_nocancel () from /lib/x86_64-linux-gnu/libpthread.so.0
Related issues
History
#1 Updated by Tobias Brunner over 9 years ago
- Description updated (diff)
- Category set to starter
- Status changed from New to Feedback
If the script calls "ipsec reload" few times in loop
That command just sends a SIGUSR1 to the starter daemon. I haven't looked into this in detail yet, but maybe the signal handling/multi threading does not interact very well at that point.
On the other hand, you perhaps should just call ipsec reload once from your script if any changes occurred and not for every single change. Actually, you should probably use ipsec update, which has no effect on unchanged connections. And you might want to have a look at vici/swanctl.
We started seeing this issue after upgraded to 5.3.5
What version did you use before? (Streams and watcher_t are used since 5.2.0)
#2 Updated by Sankar Penniboyina over 9 years ago
Thanks Tobias. I have updated the script to use "ipsec update" but want to make sure if same issue can occur with "ipsec update" as well as this call stack is hit even when running "ipsec update". I briefly checked the swanctl but don't see any option to reload only the changed connection configs.
We were using 5.1.0 before moving to 5.3.5.
#3 Updated by Tobias Brunner over 9 years ago
I have updated the script to use "ipsec update" but want to make sure if same issue can occur with "ipsec update" as well as this call stack is hit even when running "ipsec update".
Yes, it could still happen, especially if you call it in rapid succession (that's why I said you should only call it once after doing all your file changes, not for every single update). But with update fewer stroke messages are transmitted, so chances for conflicts are reduced.
I briefly checked the swanctl but don't see any option to reload only the changed connection configs.
The vici plugin merges them automatically and leaves unchanged configs as they are. And multiple --load-conns calls are serialized.
#4 Updated by Sankar Penniboyina over 9 years ago
Thanks Tobias. I will try swanctl.
#5 Updated by Noel Kuntze over 8 years ago
- Status changed from Feedback to Closed
- Resolution set to No feedback
#6 Updated by Tomas Paukrt about 8 years ago
We have experienced exactly same issue with strongSwan 5.5.3, so I looked into the source code and found out that function update in watcher.c is sending one character to write-end of notify pipe in blocking mode while there is no thread for extracting data from read-end of notify pipe, so program starter stopped working after 65536 writes in our case.
I attached quick fix that switches both ends of notify pipe to non-blocking mode.
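For readers who want to see the failure mode in isolation, here is a minimal sketch in C. It is not the attached patch and does not use strongSwan code; set_nonblocking() and fill_unread_pipe() are hypothetical helpers made up for illustration. It writes single bytes into a pipe that nobody reads, as described above, with both ends switched to non-blocking mode so that the write fails with EAGAIN once the kernel buffer is full instead of blocking forever.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: switch a file descriptor to non-blocking mode. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags == -1)
    {
        return -1;
    }
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Hypothetical demo: write one byte at a time into a pipe whose read end
 * is never drained. Returns the number of bytes the kernel accepted before
 * the pipe was full, or -1 on an unexpected error.
 */
static long fill_unread_pipe(void)
{
    int fds[2];
    long written = 0;
    char c = 'u';
    ssize_t len;
    int full;

    if (pipe(fds) == -1)
    {
        return -1;
    }
    if (set_nonblocking(fds[0]) == -1 || set_nonblocking(fds[1]) == -1)
    {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    for (;;)
    {
        len = write(fds[1], &c, 1);
        if (len == 1)
        {
            written++;
            continue;
        }
        if (len == -1 && errno == EINTR)
        {
            continue; /* interrupted by a signal, retry */
        }
        break;
    }
    /* In blocking mode the last write() would hang here instead. */
    full = (len == -1 && (errno == EAGAIN || errno == EWOULDBLOCK));
    close(fds[0]);
    close(fds[1]);
    return full ? written : -1;
}
```

Calling fill_unread_pipe() returns the pipe capacity in bytes. That capacity is system-dependent, so the exact figure may differ from the 65536 writes reported above.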
#7 Updated by Tobias Brunner about 8 years ago
- Affected version changed from 5.3.5 to 5.5.3
We have experienced exactly same issue with strongSwan 5.5.3, so I looked into the source code and found out that function update in watcher.c is sending one character to write-end of notify pipe in blocking mode while there is no thread for extracting data from read-end of notify pipe, so program starter stopped working after 65536 writes in our case.
I see. Since there are no threads in starter there is, as you noticed, nobody reading from the other end of the notify pipe. However, while we use a stream to connect to the stroke socket, we don't register any callbacks (i.e. these calls happen blocking, without any interaction with watcher). So there is not really any need for this (source:src/libstrongswan/networking/streams/stream.c#L263) call to remove the FD from watcher (doing this conditionally would require an additional flag, though), or calling update() if there were no changes in watcher (source:src/libstrongswan/processing/watcher.c#L545). I've pushed a commit that changes the latter to the watcher-no-update branch.
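The idea of skipping update() when nothing changed can be sketched roughly as follows. This is a toy model, not the commit on the watcher-no-update branch: toy_watcher_t, toy_add(), toy_remove() and toy_update() are invented names, and the notifications counter stands in for the byte that the real update() writes to the notify pipe.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_FDS 16

/* Toy stand-in for the watcher: a flat list of registered FDs. */
typedef struct {
    int fds[TOY_MAX_FDS];
    size_t count;
    unsigned notifications; /* counts simulated notify-pipe writes */
} toy_watcher_t;

/* Stands in for update(): would write one byte to the notify pipe. */
static void toy_update(toy_watcher_t *w)
{
    w->notifications++;
}

/* Register an FD; the set changed, so a notification is needed. */
static void toy_add(toy_watcher_t *w, int fd)
{
    if (w->count < TOY_MAX_FDS)
    {
        w->fds[w->count++] = fd;
        toy_update(w);
    }
}

/* Remove an FD, but only notify if an entry was actually removed. */
static void toy_remove(toy_watcher_t *w, int fd)
{
    bool found = false;
    size_t i = 0;

    while (i < w->count)
    {
        if (w->fds[i] == fd)
        {
            /* overwrite with the last entry; recheck index i afterwards */
            w->fds[i] = w->fds[--w->count];
            found = true;
        }
        else
        {
            i++;
        }
    }
    if (found)
    {
        toy_update(w);
    }
}
```

With this shape, a caller such as starter that never registered its FD causes no notification when the stream is destroyed, so an unread notify pipe does not fill up from such calls.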
#8 Updated by Tobias Brunner about 8 years ago
- Has duplicate Issue #2414: Charon sometimes doesn't react to stroke command added
#9 Updated by Tobias Brunner about 8 years ago
- Has duplicate deleted (Issue #2414: Charon sometimes doesn't react to stroke command)
#10 Updated by Tobias Brunner about 8 years ago
- Related to Issue #2414: Charon sometimes doesn't react to stroke command added
#11 Updated by Tobias Brunner about 8 years ago
- Status changed from Closed to Assigned
- Assignee set to Tobias Brunner
- Resolution deleted (No feedback)
#12 Updated by Tobias Brunner about 8 years ago
- Tracker changed from Issue to Bug
- Target version set to 5.6.1
#13 Updated by Tobias Brunner almost 8 years ago
- Status changed from Assigned to Closed
- Resolution set to Fixed