Issue #2146

strongSwan dual-active HA | charon daemon crashes when node re-integrates

Added by Danny Kulchinsky almost 9 years ago. Updated over 7 years ago.

Status: Closed
Priority: Normal
Assignee: -
Category: high availability (ha plugin)
Affected version: 5.5.0
Resolution: No feedback

Description

Following issue #2139

We have two nodes running CentOS 7.2, kernel 4.7.2 (vanilla + HA patches) and strongSwan 5.5.0, configured as a dual-active HA cluster.

The following patches were applied on both nodes:
1) Patch from issue #1192 - 1192-half-open-ha branch
2) Patch from issue #2144 - ha-pool.patch

Both nodes had been running for some time when I tried to re-integrate one of them (MAPR-PDG-POC04). After it rejoined the cluster, it synced the tunnel states, but about a minute later charon crashed on the existing node (MAPR-PDG-POC03). When starter restarted charon on MAPR-PDG-POC03, charon crashed on the other node (again about a minute later), and the cycle repeated.

Here are the steps and logs:

Stable state: strongSwan is active on both nodes (MAPR-PDG-POC03 and MAPR-PDG-POC04), ~2700 tunnels distributed across both nodes.
Step 1: Stopped strongSwan on MAPR-PDG-POC04 using "systemctl stop strongswan" (charon apparently did not respond to this request and was eventually killed).

Oct 12 17:15:54 MAPR-PDG-POC04 systemd: Stopping strongSwan IPsec IKEv1/IKEv2 daemon using ipsec.conf...
Oct 12 17:16:02 MAPR-PDG-POC04 ipsec: starter_stop_charon(): charon does not respond, sending KILL
Oct 12 17:16:02 MAPR-PDG-POC04 ipsec: charon stopped after 8200 ms
Oct 12 17:16:02 MAPR-PDG-POC04 ipsec: ipsec starter stopped
Oct 12 17:16:02 MAPR-PDG-POC04 systemd: Stopped strongSwan IPsec IKEv1/IKEv2 daemon using ipsec.conf.

Step 2: Checked the surviving node (MAPR-PDG-POC03); it had taken ownership of all the tunnels.
Step 3: Restarted strongSwan on MAPR-PDG-POC04. I didn't see any UDP In Errors or Receive Buffer Errors (the counters did not change), and according to the swanctl -S output on both nodes, the joining node synced all the tunnels.

About a minute after MAPR-PDG-POC04 joined the cluster, charon crashed on MAPR-PDG-POC03. Once it recovered and synced from MAPR-PDG-POC04, charon crashed on MAPR-PDG-POC04 as well. This cycle continues until I stop one of the nodes from restarting charon.

Here is the log data I was able to gather about the crashes from both nodes:

MAPR-PDG-POC04 charon.log:

2016-10-12 17:18:59.736 07[DMN] <some name|827> thread 7 received 11
2016-10-12 17:18:59.738 07[LIB] <some name|827>  dumping 2 stack frame addresses:
2016-10-12 17:18:59.738 07[LIB] <some name|827>   /lib64/libpthread.so.0 @ 0x7f1d2526f000 [0x7f1d2527e100]
2016-10-12 17:18:59.743 07[LIB] <some name|827>     -> sigaction.c:?
2016-10-12 17:18:59.743 07[LIB] <some name|827>     [0x7f1c901348b0]
2016-10-12 17:18:59.747 07[DMN] <some name|827> killing ourself, received critical signal

MAPR-PDG-POC04 messages log:

Oct 12 17:15:54 MAPR-PDG-POC04 systemd: Stopping strongSwan IPsec IKEv1/IKEv2 daemon using ipsec.conf...
Oct 12 17:16:02 MAPR-PDG-POC04 ipsec: starter_stop_charon(): charon does not respond, sending KILL
Oct 12 17:16:02 MAPR-PDG-POC04 ipsec: charon stopped after 8200 ms
Oct 12 17:16:02 MAPR-PDG-POC04 ipsec: ipsec starter stopped
Oct 12 17:16:02 MAPR-PDG-POC04 systemd: Stopped strongSwan IPsec IKEv1/IKEv2 daemon using ipsec.conf.
Oct 12 17:17:16 MAPR-PDG-POC04 systemd: Started strongSwan IPsec IKEv1/IKEv2 daemon using ipsec.conf.
Oct 12 17:17:16 MAPR-PDG-POC04 systemd: Starting strongSwan IPsec IKEv1/IKEv2 daemon using ipsec.conf...
Oct 12 17:17:16 MAPR-PDG-POC04 ipsec: Starting strongSwan 5.5.0 IPsec [starter]...
Oct 12 17:17:16 MAPR-PDG-POC04 ipsec: charon (19486) started after 20 ms
Oct 12 17:18:59 MAPR-PDG-POC04 ipsec: dumping 2 stack frame addresses:
Oct 12 17:18:59 MAPR-PDG-POC04 ipsec: /lib64/libpthread.so.0 @ 0x7f1d2526f000 [0x7f1d2527e100]
Oct 12 17:18:59 MAPR-PDG-POC04 ipsec: -> sigaction.c:?
Oct 12 17:18:59 MAPR-PDG-POC04 ipsec: [0x7f1c901348b0]
Oct 12 17:18:59 MAPR-PDG-POC04 ipsec: charon has died -- restart scheduled (5sec)
Oct 12 17:19:04 MAPR-PDG-POC04 ipsec: charon (20381) started after 20 ms
Oct 12 17:19:45 MAPR-PDG-POC04 ipsec: dumping 1 stack frame addresses:
Oct 12 17:19:45 MAPR-PDG-POC04 ipsec: /lib64/libpthread.so.0 @ 0x7fdfa8b27000 [0x7fdfa8b36100]
Oct 12 17:19:45 MAPR-PDG-POC04 ipsec: -> sigaction.c:?
Oct 12 17:19:45 MAPR-PDG-POC04 ipsec: charon has died -- restart scheduled (5sec)
Oct 12 17:19:50 MAPR-PDG-POC04 ipsec: charon (20641) started after 40 ms
Oct 12 17:26:10 MAPR-PDG-POC04 ipsec: dumping 16 stack frame addresses:
Oct 12 17:26:10 MAPR-PDG-POC04 ipsec: /lib64/libpthread.so.0 @ 0x7fb7064fb000 [0x7fb70650a100]
Oct 12 17:26:10 MAPR-PDG-POC04 ipsec: -> sigaction.c:?
Oct 12 17:26:10 MAPR-PDG-POC04 ipsec: /usr/lib/ipsec/libcharon.so.0 @ 0x7fb706a19000 [0x7fb706a2c535]
Oct 12 17:26:10 MAPR-PDG-POC04 ipsec: -> /root/strongSwan/strongswan-5.5.0/src/libcharon/config/proposal.c:105
Oct 12 17:26:10 MAPR-PDG-POC04 ipsec: /usr/lib/ipsec/libstrongswan.so.0 @ 0x7fb706ca2000 [0x7fb706cb8fd4]
Oct 12 17:26:10 MAPR-PDG-POC04 ipsec: -> /root/strongSwan/strongswan-5.5.0/src/libstrongswan/collections/enumerator.c:525
Oct 12 17:26:10 MAPR-PDG-POC04 ipsec: /usr/lib/ipsec/libcharon.so.0 @ 0x7fb706a19000 [0x7fb706a2c8af]
Oct 12 17:26:10 MAPR-PDG-POC04 ipsec: -> /root/strongSwan/strongswan-5.5.0/src/libcharon/config/proposal.c:140
Oct 12 17:26:10 MAPR-PDG-POC04 ipsec: /usr/lib/ipsec/libcharon.so.0 @ 0x7fb706a19000 [0x7fb706a45ce2]
Oct 12 17:26:10 MAPR-PDG-POC04 ipsec: -> /root/strongSwan/strongswan-5.5.0/src/libcharon/sa/child_sa.c:406
Oct 12 17:26:10 MAPR-PDG-POC04 ipsec: /usr/lib/ipsec/libcharon.so.0 @ 0x7fb706a19000 [0x7fb706a44ec5]
Oct 12 17:26:10 MAPR-PDG-POC04 ipsec: -> /root/strongSwan/strongswan-5.5.0/src/libcharon/sa/child_sa.c:873 (discriminator 1)
Oct 12 17:26:10 MAPR-PDG-POC04 ipsec: /usr/lib/ipsec/libcharon.so.0 @ 0x7fb706a19000 [0x7fb706a45100]
Oct 12 17:26:10 MAPR-PDG-POC04 ipsec: -> /root/strongSwan/strongswan-5.5.0/src/libcharon/sa/child_sa.c:1282
Oct 12 17:26:10 MAPR-PDG-POC04 ipsec: /usr/lib/ipsec/libcharon.so.0 @ 0x7fb706a19000 [0x7fb706a4926f]
Oct 12 17:26:10 MAPR-PDG-POC04 ipsec: -> /root/strongSwan/strongswan-5.5.0/src/libcharon/sa/ike_sa.c:2777 (discriminator 1)
Oct 12 17:26:10 MAPR-PDG-POC04 ipsec: /usr/lib/ipsec/libcharon.so.0 @ 0x7fb706a19000 [0x7fb706a4bee1]
Oct 12 17:26:10 MAPR-PDG-POC04 ipsec: -> /root/strongSwan/strongswan-5.5.0/src/libcharon/sa/ike_sa_manager.c:126
Oct 12 17:26:10 MAPR-PDG-POC04 ipsec: /usr/lib/ipsec/libcharon.so.0 @ 0x7fb706a19000 [0x7fb706a4e07d]
Oct 12 17:26:10 MAPR-PDG-POC04 ipsec: -> /root/strongSwan/strongswan-5.5.0/src/libcharon/sa/ike_sa_manager.c:1785
Oct 12 17:26:10 MAPR-PDG-POC04 ipsec: /usr/lib/ipsec/plugins/libstrongswan-ha.so @ 0x7fb6ff760000 [0x7fb6ff76485e]
Oct 12 17:26:10 MAPR-PDG-POC04 ipsec: -> /root/strongSwan/strongswan-5.5.0/src/libcharon/plugins/ha/ha_dispatcher.c:605
Oct 12 17:26:10 MAPR-PDG-POC04 ipsec: /usr/lib/ipsec/libstrongswan.so.0 @ 0x7fb706ca2000 [0x7fb706cd1d4e]
Oct 12 17:26:10 MAPR-PDG-POC04 ipsec: -> /root/strongSwan/strongswan-5.5.0/src/libstrongswan/processing/jobs/callback_job.c:78
Oct 12 17:26:10 MAPR-PDG-POC04 ipsec: /usr/lib/ipsec/libstrongswan.so.0 @ 0x7fb706ca2000 [0x7fb706cd25e2]
Oct 12 17:26:10 MAPR-PDG-POC04 ipsec: -> /root/strongSwan/strongswan-5.5.0/src/libstrongswan/processing/processor.c:235
Oct 12 17:26:10 MAPR-PDG-POC04 ipsec: /usr/lib/ipsec/libstrongswan.so.0 @ 0x7fb706ca2000 [0x7fb706ce2a55]
Oct 12 17:26:10 MAPR-PDG-POC04 ipsec: -> /root/strongSwan/strongswan-5.5.0/src/libstrongswan/threading/thread.c:332 (discriminator 2)
Oct 12 17:26:10 MAPR-PDG-POC04 ipsec: /lib64/libpthread.so.0 @ 0x7fb7064fb000 [0x7fb706502dc5]
Oct 12 17:26:10 MAPR-PDG-POC04 ipsec: -> pthread_create.c:?
Oct 12 17:26:10 MAPR-PDG-POC04 ipsec: /lib64/libc.so.6 @ 0x7fb705f35000 (clone+0x6d) [0x7fb70602bced]
Oct 12 17:26:10 MAPR-PDG-POC04 ipsec: -> ??:?
Oct 12 17:26:10 MAPR-PDG-POC04 ipsec: charon has died -- restart scheduled (5sec)
Oct 12 17:26:15 MAPR-PDG-POC04 ipsec: charon (23909) started after 20 ms
Oct 12 17:37:39 MAPR-PDG-POC04 ipsec: charon has died -- restart scheduled (5sec)
Oct 12 17:37:44 MAPR-PDG-POC04 ipsec: charon (29489) started after 40 ms
Oct 12 17:44:09 MAPR-PDG-POC04 ipsec: dumping 2 stack frame addresses:
Oct 12 17:44:09 MAPR-PDG-POC04 ipsec: /lib64/libpthread.so.0 @ 0x7fa7f461a000 [0x7fa7f4629100]
Oct 12 17:44:09 MAPR-PDG-POC04 ipsec: -> sigaction.c:?
Oct 12 17:44:09 MAPR-PDG-POC04 ipsec: [0x7fa7c40480d0]
Oct 12 17:44:09 MAPR-PDG-POC04 ipsec: charon has died -- restart scheduled (5sec)
Oct 12 17:44:14 MAPR-PDG-POC04 ipsec: charon (31943) started after 40 ms
Oct 12 17:55:41 MAPR-PDG-POC04 ipsec: dumping 2 stack frame addresses:
Oct 12 17:55:41 MAPR-PDG-POC04 ipsec: /lib64/libpthread.so.0 @ 0x7f3e21b8a000 [0x7f3e21b99100]
Oct 12 17:55:41 MAPR-PDG-POC04 ipsec: -> sigaction.c:?
Oct 12 17:55:41 MAPR-PDG-POC04 ipsec: [0x7f3dec0635a0]
Oct 12 17:55:41 MAPR-PDG-POC04 ipsec: charon has died -- restart scheduled (5sec)
Oct 12 17:55:46 MAPR-PDG-POC04 ipsec: charon (1667) started after 40 ms
Oct 12 17:56:18 MAPR-PDG-POC04 ipsec: dumping 2 stack frame addresses:
Oct 12 17:56:18 MAPR-PDG-POC04 ipsec: /lib64/libpthread.so.0 @ 0x7fd322c23000 [0x7fd322c32100]
Oct 12 17:56:18 MAPR-PDG-POC04 ipsec: -> sigaction.c:?
Oct 12 17:56:18 MAPR-PDG-POC04 ipsec: [0x7fd2e00721b0]
Oct 12 17:56:18 MAPR-PDG-POC04 ipsec: charon has died -- restart scheduled (5sec)
Oct 12 17:56:23 MAPR-PDG-POC04 ipsec: charon (1856) started after 40 ms
Oct 12 18:02:41 MAPR-PDG-POC04 ipsec: dumping 1 stack frame addresses:
Oct 12 18:02:41 MAPR-PDG-POC04 ipsec: /lib64/libpthread.so.0 @ 0x7fa5cd0a0000 [0x7fa5cd0af100]
Oct 12 18:02:41 MAPR-PDG-POC04 ipsec: -> sigaction.c:?
Oct 12 18:02:41 MAPR-PDG-POC04 ipsec: charon has died -- restart scheduled (5sec)
Oct 12 18:02:46 MAPR-PDG-POC04 ipsec: charon (3159) started after 40 ms
Oct 12 18:03:23 MAPR-PDG-POC04 ipsec: dumping 1 stack frame addresses:
Oct 12 18:03:23 MAPR-PDG-POC04 ipsec: /lib64/libpthread.so.0 @ 0x7fab3979d000 [0x7fab397ac100]
Oct 12 18:03:23 MAPR-PDG-POC04 ipsec: -> sigaction.c:?
Oct 12 18:03:23 MAPR-PDG-POC04 ipsec: charon has died -- restart scheduled (5sec)
Oct 12 18:03:28 MAPR-PDG-POC04 ipsec: charon (3351) started after 40 ms
Oct 12 18:14:39 MAPR-PDG-POC04 ipsec: dumping 13 stack frame addresses:
Oct 12 18:14:39 MAPR-PDG-POC04 ipsec: /lib64/libpthread.so.0 @ 0x7fa010ddd000 [0x7fa010dec100]
Oct 12 18:14:39 MAPR-PDG-POC04 ipsec: -> sigaction.c:?
Oct 12 18:14:39 MAPR-PDG-POC04 ipsec: /usr/lib/ipsec/libcharon.so.0 @ 0x7fa0112fb000 [0x7fa011327cdf]
Oct 12 18:14:39 MAPR-PDG-POC04 ipsec: -> /root/strongSwan/strongswan-5.5.0/src/libcharon/sa/child_sa.c:406
Oct 12 18:14:39 MAPR-PDG-POC04 ipsec: /usr/lib/ipsec/libcharon.so.0 @ 0x7fa0112fb000 [0x7fa011326ec5]
Oct 12 18:14:39 MAPR-PDG-POC04 ipsec: -> /root/strongSwan/strongswan-5.5.0/src/libcharon/sa/child_sa.c:873 (discriminator 1)
Oct 12 18:14:39 MAPR-PDG-POC04 ipsec: /usr/lib/ipsec/libcharon.so.0 @ 0x7fa0112fb000 [0x7fa011327100]
Oct 12 18:14:39 MAPR-PDG-POC04 ipsec: -> /root/strongSwan/strongswan-5.5.0/src/libcharon/sa/child_sa.c:1282
Oct 12 18:14:39 MAPR-PDG-POC04 ipsec: /usr/lib/ipsec/libcharon.so.0 @ 0x7fa0112fb000 [0x7fa01132b26f]
Oct 12 18:14:39 MAPR-PDG-POC04 ipsec: -> /root/strongSwan/strongswan-5.5.0/src/libcharon/sa/ike_sa.c:2777 (discriminator 1)
Oct 12 18:14:39 MAPR-PDG-POC04 ipsec: /usr/lib/ipsec/libcharon.so.0 @ 0x7fa0112fb000 [0x7fa01132dee1]
Oct 12 18:14:39 MAPR-PDG-POC04 ipsec: -> /root/strongSwan/strongswan-5.5.0/src/libcharon/sa/ike_sa_manager.c:126
Oct 12 18:14:39 MAPR-PDG-POC04 ipsec: /usr/lib/ipsec/libcharon.so.0 @ 0x7fa0112fb000 [0x7fa01133007d]
Oct 12 18:14:39 MAPR-PDG-POC04 ipsec: -> /root/strongSwan/strongswan-5.5.0/src/libcharon/sa/ike_sa_manager.c:1785
Oct 12 18:14:39 MAPR-PDG-POC04 ipsec: /usr/lib/ipsec/plugins/libstrongswan-ha.so @ 0x7fa00a042000 [0x7fa00a04685e]
Oct 12 18:14:39 MAPR-PDG-POC04 ipsec: -> /root/strongSwan/strongswan-5.5.0/src/libcharon/plugins/ha/ha_dispatcher.c:605
Oct 12 18:14:39 MAPR-PDG-POC04 ipsec: /usr/lib/ipsec/libstrongswan.so.0 @ 0x7fa011584000 [0x7fa0115b3d4e]
Oct 12 18:14:39 MAPR-PDG-POC04 ipsec: -> /root/strongSwan/strongswan-5.5.0/src/libstrongswan/processing/jobs/callback_job.c:78
Oct 12 18:14:39 MAPR-PDG-POC04 ipsec: /usr/lib/ipsec/libstrongswan.so.0 @ 0x7fa011584000 [0x7fa0115b45e2]
Oct 12 18:14:39 MAPR-PDG-POC04 ipsec: -> /root/strongSwan/strongswan-5.5.0/src/libstrongswan/processing/processor.c:235
Oct 12 18:14:39 MAPR-PDG-POC04 ipsec: /usr/lib/ipsec/libstrongswan.so.0 @ 0x7fa011584000 [0x7fa0115c4a55]
Oct 12 18:14:39 MAPR-PDG-POC04 ipsec: -> /root/strongSwan/strongswan-5.5.0/src/libstrongswan/threading/thread.c:332 (discriminator 2)
Oct 12 18:14:39 MAPR-PDG-POC04 ipsec: /lib64/libpthread.so.0 @ 0x7fa010ddd000 [0x7fa010de4dc5]
Oct 12 18:14:39 MAPR-PDG-POC04 ipsec: -> pthread_create.c:?
Oct 12 18:14:39 MAPR-PDG-POC04 ipsec: /lib64/libc.so.6 @ 0x7fa010817000 (clone+0x6d) [0x7fa01090dced]
Oct 12 18:14:39 MAPR-PDG-POC04 ipsec: -> ??:?
Oct 12 18:14:39 MAPR-PDG-POC04 ipsec: charon has died -- restart scheduled (5sec)
Oct 12 18:14:44 MAPR-PDG-POC04 ipsec: charon (5341) started after 40 ms
Oct 12 18:16:36 MAPR-PDG-POC04 ipsec: dumping 1 stack frame addresses:
Oct 12 18:16:36 MAPR-PDG-POC04 ipsec: /lib64/libpthread.so.0 @ 0x7fb79dc9c000 [0x7fb79dcab100]
Oct 12 18:16:36 MAPR-PDG-POC04 ipsec: -> sigaction.c:?
Oct 12 18:16:36 MAPR-PDG-POC04 ipsec: charon has died -- restart scheduled (5sec)
Oct 12 18:16:41 MAPR-PDG-POC04 ipsec: charon (5794) started after 40 ms
Oct 12 18:17:22 MAPR-PDG-POC04 ipsec: dumping 2 stack frame addresses:
Oct 12 18:17:22 MAPR-PDG-POC04 ipsec: /lib64/libpthread.so.0 @ 0x7f7e53f30000 [0x7f7e53f3f100]
Oct 12 18:17:22 MAPR-PDG-POC04 ipsec: -> sigaction.c:?
Oct 12 18:17:22 MAPR-PDG-POC04 ipsec: [0x7f7dc800be50]
Oct 12 18:17:22 MAPR-PDG-POC04 ipsec: charon has died -- restart scheduled (5sec)
Oct 12 18:17:27 MAPR-PDG-POC04 ipsec: charon (5984) started after 60 ms

MAPR-PDG-POC03 charon.log:

2016-10-12 17:18:36.441 04[DMN] <some name|320377> thread 4 received 11
2016-10-12 17:18:36.442 04[LIB] <some name|320377>  dumping 1 stack frame addresses:
2016-10-12 17:18:36.442 04[LIB] <some name|320377>   /lib64/libpthread.so.0 @ 0x7f70f4398000 [0x7f70f43a7100]
2016-10-12 17:18:36.469 04[LIB] <some name|320377>     -> sigaction.c:?
2016-10-12 17:18:36.486 04[DMN] <some name|320377> killing ourself, received critical signal

MAPR-PDG-POC03 messages log:

Oct 12 17:18:36 MAPR-PDG-POC03 ipsec: dumping 1 stack frame addresses:
Oct 12 17:18:36 MAPR-PDG-POC03 ipsec: /lib64/libpthread.so.0 @ 0x7f70f4398000 [0x7f70f43a7100]
Oct 12 17:18:36 MAPR-PDG-POC03 ipsec: -> sigaction.c:?
Oct 12 17:18:36 MAPR-PDG-POC03 ipsec: charon has died -- restart scheduled (5sec)
Oct 12 17:18:41 MAPR-PDG-POC03 ipsec: charon (6899) started after 40 ms
Oct 12 17:19:24 MAPR-PDG-POC03 ipsec: dumping 1 stack frame addresses:
Oct 12 17:19:24 MAPR-PDG-POC03 ipsec: /lib64/libpthread.so.0 @ 0x7f9415703000 [0x7f9415712100]
Oct 12 17:19:24 MAPR-PDG-POC03 ipsec: -> sigaction.c:?
Oct 12 17:19:24 MAPR-PDG-POC03 ipsec: charon has died -- restart scheduled (5sec)
Oct 12 17:19:29 MAPR-PDG-POC03 ipsec: charon (7377) started after 20 ms
Oct 12 17:25:47 MAPR-PDG-POC03 ipsec: charon has died -- restart scheduled (5sec)
Oct 12 17:25:53 MAPR-PDG-POC03 ipsec: charon (10820) started after 20 ms
Oct 12 17:37:19 MAPR-PDG-POC03 ipsec: dumping 2 stack frame addresses:
Oct 12 17:37:19 MAPR-PDG-POC03 ipsec: /lib64/libpthread.so.0 @ 0x7f9d2e79f000 [0x7f9d2e7ae100]
Oct 12 17:37:19 MAPR-PDG-POC03 ipsec: -> sigaction.c:?
Oct 12 17:37:19 MAPR-PDG-POC03 ipsec: [0x7f9ca80759f0]
Oct 12 17:37:19 MAPR-PDG-POC03 ipsec: charon has died -- restart scheduled (5sec)
Oct 12 17:37:24 MAPR-PDG-POC03 ipsec: charon (17397) started after 40 ms
Oct 12 17:43:42 MAPR-PDG-POC03 ipsec: dumping 13 stack frame addresses:
Oct 12 17:43:42 MAPR-PDG-POC03 ipsec: /lib64/libpthread.so.0 @ 0x7f8249d45000 [0x7f8249d54100]
Oct 12 17:43:42 MAPR-PDG-POC03 ipsec: -> sigaction.c:?
Oct 12 17:43:42 MAPR-PDG-POC03 ipsec: /usr/lib/ipsec/libcharon.so.0 @ 0x7f824a263000 [0x7f824a28fced]
Oct 12 17:43:42 MAPR-PDG-POC03 ipsec: -> /root/strongSwan/strongswan-5.5.0/src/libcharon/sa/child_sa.c:406
Oct 12 17:43:42 MAPR-PDG-POC03 ipsec: /usr/lib/ipsec/libcharon.so.0 @ 0x7f824a263000 [0x7f824a28eec5]
Oct 12 17:43:42 MAPR-PDG-POC03 ipsec: -> /root/strongSwan/strongswan-5.5.0/src/libcharon/sa/child_sa.c:873 (discriminator 1)
Oct 12 17:43:42 MAPR-PDG-POC03 ipsec: /usr/lib/ipsec/libcharon.so.0 @ 0x7f824a263000 [0x7f824a28f100]
Oct 12 17:43:42 MAPR-PDG-POC03 ipsec: -> /root/strongSwan/strongswan-5.5.0/src/libcharon/sa/child_sa.c:1282
Oct 12 17:43:42 MAPR-PDG-POC03 ipsec: /usr/lib/ipsec/libcharon.so.0 @ 0x7f824a263000 [0x7f824a29326f]
Oct 12 17:43:42 MAPR-PDG-POC03 ipsec: -> /root/strongSwan/strongswan-5.5.0/src/libcharon/sa/ike_sa.c:2777 (discriminator 1)
Oct 12 17:43:42 MAPR-PDG-POC03 ipsec: /usr/lib/ipsec/libcharon.so.0 @ 0x7f824a263000 [0x7f824a295ee1]
Oct 12 17:43:42 MAPR-PDG-POC03 ipsec: -> /root/strongSwan/strongswan-5.5.0/src/libcharon/sa/ike_sa_manager.c:126
Oct 12 17:43:42 MAPR-PDG-POC03 ipsec: /usr/lib/ipsec/libcharon.so.0 @ 0x7f824a263000 [0x7f824a29807d]
Oct 12 17:43:42 MAPR-PDG-POC03 ipsec: -> /root/strongSwan/strongswan-5.5.0/src/libcharon/sa/ike_sa_manager.c:1785
Oct 12 17:43:42 MAPR-PDG-POC03 ipsec: /usr/lib/ipsec/plugins/libstrongswan-ha.so @ 0x7f8242faa000 [0x7f8242fae85e]
Oct 12 17:43:42 MAPR-PDG-POC03 ipsec: -> /root/strongSwan/strongswan-5.5.0/src/libcharon/plugins/ha/ha_dispatcher.c:605
Oct 12 17:43:42 MAPR-PDG-POC03 ipsec: /usr/lib/ipsec/libstrongswan.so.0 @ 0x7f824a4ec000 [0x7f824a51bd4e]
Oct 12 17:43:42 MAPR-PDG-POC03 ipsec: -> /root/strongSwan/strongswan-5.5.0/src/libstrongswan/processing/jobs/callback_job.c:78
Oct 12 17:43:42 MAPR-PDG-POC03 ipsec: /usr/lib/ipsec/libstrongswan.so.0 @ 0x7f824a4ec000 [0x7f824a51c5e2]
Oct 12 17:43:42 MAPR-PDG-POC03 ipsec: -> /root/strongSwan/strongswan-5.5.0/src/libstrongswan/processing/processor.c:235
Oct 12 17:43:42 MAPR-PDG-POC03 ipsec: /usr/lib/ipsec/libstrongswan.so.0 @ 0x7f824a4ec000 [0x7f824a52ca55]
Oct 12 17:43:42 MAPR-PDG-POC03 ipsec: -> /root/strongSwan/strongswan-5.5.0/src/libstrongswan/threading/thread.c:332 (discriminator 2)
Oct 12 17:43:42 MAPR-PDG-POC03 ipsec: /lib64/libpthread.so.0 @ 0x7f8249d45000 [0x7f8249d4cdc5]
Oct 12 17:43:42 MAPR-PDG-POC03 ipsec: -> pthread_create.c:?
Oct 12 17:43:42 MAPR-PDG-POC03 ipsec: /lib64/libc.so.6 @ 0x7f824977f000 (clone+0x6d) [0x7f8249875ced]
Oct 12 17:43:42 MAPR-PDG-POC03 ipsec: -> ??:?
Oct 12 17:43:42 MAPR-PDG-POC03 ipsec: charon has died -- restart scheduled (5sec)
Oct 12 17:43:47 MAPR-PDG-POC03 ipsec: charon (20015) started after 40 ms
Oct 12 17:55:22 MAPR-PDG-POC03 ipsec: dumping 1 stack frame addresses:
Oct 12 17:55:22 MAPR-PDG-POC03 ipsec: /lib64/libpthread.so.0 @ 0x7fdde9490000 [0x7fdde949f100]
Oct 12 17:55:22 MAPR-PDG-POC03 ipsec: -> sigaction.c:?
Oct 12 17:55:22 MAPR-PDG-POC03 ipsec: charon has died -- restart scheduled (5sec)
Oct 12 17:55:27 MAPR-PDG-POC03 ipsec: charon (22336) started after 20 ms
Oct 12 17:56:00 MAPR-PDG-POC03 ipsec: dumping 2 stack frame addresses:
Oct 12 17:56:00 MAPR-PDG-POC03 ipsec: /lib64/libpthread.so.0 @ 0x7f3c554bc000 [0x7f3c554cb100]
Oct 12 17:56:00 MAPR-PDG-POC03 ipsec: -> sigaction.c:?
Oct 12 17:56:00 MAPR-PDG-POC03 ipsec: [0x7f3bcc09fee0]
Oct 12 17:56:00 MAPR-PDG-POC03 ipsec: charon has died -- restart scheduled (5sec)
Oct 12 17:56:05 MAPR-PDG-POC03 ipsec: charon (22531) started after 40 ms
Oct 12 17:56:38 MAPR-PDG-POC03 ipsec: charon has died -- restart scheduled (5sec)
Oct 12 17:56:43 MAPR-PDG-POC03 ipsec: charon (22564) started after 40 ms
Oct 12 18:03:00 MAPR-PDG-POC03 ipsec: dumping 13 stack frame addresses:
Oct 12 18:03:00 MAPR-PDG-POC03 ipsec: /lib64/libpthread.so.0 @ 0x7f0103952000 [0x7f0103961100]
Oct 12 18:03:00 MAPR-PDG-POC03 ipsec: -> sigaction.c:?
Oct 12 18:03:00 MAPR-PDG-POC03 ipsec: /usr/lib/ipsec/libcharon.so.0 @ 0x7f0103e70000 [0x7f0103e9cced]
Oct 12 18:03:00 MAPR-PDG-POC03 ipsec: -> /root/strongSwan/strongswan-5.5.0/src/libcharon/sa/child_sa.c:406
Oct 12 18:03:00 MAPR-PDG-POC03 ipsec: /usr/lib/ipsec/libcharon.so.0 @ 0x7f0103e70000 [0x7f0103e9bec5]
Oct 12 18:03:00 MAPR-PDG-POC03 ipsec: -> /root/strongSwan/strongswan-5.5.0/src/libcharon/sa/child_sa.c:873 (discriminator 1)
Oct 12 18:03:00 MAPR-PDG-POC03 ipsec: /usr/lib/ipsec/libcharon.so.0 @ 0x7f0103e70000 [0x7f0103e9c100]
Oct 12 18:03:00 MAPR-PDG-POC03 ipsec: -> /root/strongSwan/strongswan-5.5.0/src/libcharon/sa/child_sa.c:1282
Oct 12 18:03:00 MAPR-PDG-POC03 ipsec: /usr/lib/ipsec/libcharon.so.0 @ 0x7f0103e70000 [0x7f0103ea026f]
Oct 12 18:03:00 MAPR-PDG-POC03 ipsec: -> /root/strongSwan/strongswan-5.5.0/src/libcharon/sa/ike_sa.c:2777 (discriminator 1)
Oct 12 18:03:00 MAPR-PDG-POC03 ipsec: /usr/lib/ipsec/libcharon.so.0 @ 0x7f0103e70000 [0x7f0103ea2ee1]
Oct 12 18:03:00 MAPR-PDG-POC03 ipsec: -> /root/strongSwan/strongswan-5.5.0/src/libcharon/sa/ike_sa_manager.c:126
Oct 12 18:03:00 MAPR-PDG-POC03 ipsec: /usr/lib/ipsec/libcharon.so.0 @ 0x7f0103e70000 [0x7f0103ea507d]
Oct 12 18:03:00 MAPR-PDG-POC03 ipsec: -> /root/strongSwan/strongswan-5.5.0/src/libcharon/sa/ike_sa_manager.c:1785
Oct 12 18:03:00 MAPR-PDG-POC03 ipsec: /usr/lib/ipsec/plugins/libstrongswan-ha.so @ 0x7f00fcbb7000 [0x7f00fcbbb85e]
Oct 12 18:03:00 MAPR-PDG-POC03 ipsec: -> /root/strongSwan/strongswan-5.5.0/src/libcharon/plugins/ha/ha_dispatcher.c:605
Oct 12 18:03:00 MAPR-PDG-POC03 ipsec: /usr/lib/ipsec/libstrongswan.so.0 @ 0x7f01040f9000 [0x7f0104128d4e]
Oct 12 18:03:00 MAPR-PDG-POC03 ipsec: -> /root/strongSwan/strongswan-5.5.0/src/libstrongswan/processing/jobs/callback_job.c:78
Oct 12 18:03:00 MAPR-PDG-POC03 ipsec: /usr/lib/ipsec/libstrongswan.so.0 @ 0x7f01040f9000 [0x7f01041295e2]
Oct 12 18:03:00 MAPR-PDG-POC03 ipsec: -> /root/strongSwan/strongswan-5.5.0/src/libstrongswan/processing/processor.c:235
Oct 12 18:03:00 MAPR-PDG-POC03 ipsec: /usr/lib/ipsec/libstrongswan.so.0 @ 0x7f01040f9000 [0x7f0104139a55]
Oct 12 18:03:00 MAPR-PDG-POC03 ipsec: -> /root/strongSwan/strongswan-5.5.0/src/libstrongswan/threading/thread.c:332 (discriminator 2)
Oct 12 18:03:00 MAPR-PDG-POC03 ipsec: /lib64/libpthread.so.0 @ 0x7f0103952000 [0x7f0103959dc5]
Oct 12 18:03:00 MAPR-PDG-POC03 ipsec: -> pthread_create.c:?
Oct 12 18:03:00 MAPR-PDG-POC03 ipsec: /lib64/libc.so.6 @ 0x7f010338c000 (clone+0x6d) [0x7f0103482ced]
Oct 12 18:03:00 MAPR-PDG-POC03 ipsec: -> ??:?
Oct 12 18:03:00 MAPR-PDG-POC03 ipsec: charon has died -- restart scheduled (5sec)
Oct 12 18:03:05 MAPR-PDG-POC03 ipsec: charon (24054) started after 60 ms
Oct 12 18:14:11 MAPR-PDG-POC03 ipsec: dumping 13 stack frame addresses:
Oct 12 18:14:11 MAPR-PDG-POC03 ipsec: /lib64/libpthread.so.0 @ 0x7ff8453cf000 [0x7ff8453de100]
Oct 12 18:14:11 MAPR-PDG-POC03 ipsec: -> sigaction.c:?
Oct 12 18:14:11 MAPR-PDG-POC03 ipsec: /usr/lib/ipsec/libcharon.so.0 @ 0x7ff8458ed000 [0x7ff845919ced]
Oct 12 18:14:11 MAPR-PDG-POC03 ipsec: -> /root/strongSwan/strongswan-5.5.0/src/libcharon/sa/child_sa.c:406
Oct 12 18:14:11 MAPR-PDG-POC03 ipsec: /usr/lib/ipsec/libcharon.so.0 @ 0x7ff8458ed000 [0x7ff845918ec5]
Oct 12 18:14:11 MAPR-PDG-POC03 ipsec: -> /root/strongSwan/strongswan-5.5.0/src/libcharon/sa/child_sa.c:873 (discriminator 1)
Oct 12 18:14:11 MAPR-PDG-POC03 ipsec: /usr/lib/ipsec/libcharon.so.0 @ 0x7ff8458ed000 [0x7ff845919100]
Oct 12 18:14:11 MAPR-PDG-POC03 ipsec: -> /root/strongSwan/strongswan-5.5.0/src/libcharon/sa/child_sa.c:1282
Oct 12 18:14:11 MAPR-PDG-POC03 ipsec: /usr/lib/ipsec/libcharon.so.0 @ 0x7ff8458ed000 [0x7ff84591d26f]
Oct 12 18:14:11 MAPR-PDG-POC03 ipsec: -> /root/strongSwan/strongswan-5.5.0/src/libcharon/sa/ike_sa.c:2777 (discriminator 1)
Oct 12 18:14:11 MAPR-PDG-POC03 ipsec: /usr/lib/ipsec/libcharon.so.0 @ 0x7ff8458ed000 [0x7ff84591fee1]
Oct 12 18:14:11 MAPR-PDG-POC03 ipsec: -> /root/strongSwan/strongswan-5.5.0/src/libcharon/sa/ike_sa_manager.c:126
Oct 12 18:14:11 MAPR-PDG-POC03 ipsec: /usr/lib/ipsec/libcharon.so.0 @ 0x7ff8458ed000 [0x7ff84592207d]
Oct 12 18:14:11 MAPR-PDG-POC03 ipsec: -> /root/strongSwan/strongswan-5.5.0/src/libcharon/sa/ike_sa_manager.c:1785
Oct 12 18:14:11 MAPR-PDG-POC03 ipsec: /usr/lib/ipsec/plugins/libstrongswan-ha.so @ 0x7ff83e634000 [0x7ff83e63885e]
Oct 12 18:14:11 MAPR-PDG-POC03 ipsec: -> /root/strongSwan/strongswan-5.5.0/src/libcharon/plugins/ha/ha_dispatcher.c:605
Oct 12 18:14:11 MAPR-PDG-POC03 ipsec: /usr/lib/ipsec/libstrongswan.so.0 @ 0x7ff845b76000 [0x7ff845ba5d4e]
Oct 12 18:14:11 MAPR-PDG-POC03 ipsec: -> /root/strongSwan/strongswan-5.5.0/src/libstrongswan/processing/jobs/callback_job.c:78
Oct 12 18:14:11 MAPR-PDG-POC03 ipsec: /usr/lib/ipsec/libstrongswan.so.0 @ 0x7ff845b76000 [0x7ff845ba65e2]
Oct 12 18:14:11 MAPR-PDG-POC03 ipsec: -> /root/strongSwan/strongswan-5.5.0/src/libstrongswan/processing/processor.c:235
Oct 12 18:14:11 MAPR-PDG-POC03 ipsec: /usr/lib/ipsec/libstrongswan.so.0 @ 0x7ff845b76000 [0x7ff845bb6a55]
Oct 12 18:14:11 MAPR-PDG-POC03 ipsec: -> /root/strongSwan/strongswan-5.5.0/src/libstrongswan/threading/thread.c:332 (discriminator 2)
Oct 12 18:14:11 MAPR-PDG-POC03 ipsec: /lib64/libpthread.so.0 @ 0x7ff8453cf000 [0x7ff8453d6dc5]
Oct 12 18:14:11 MAPR-PDG-POC03 ipsec: -> pthread_create.c:?
Oct 12 18:14:11 MAPR-PDG-POC03 ipsec: /lib64/libc.so.6 @ 0x7ff844e09000 (clone+0x6d) [0x7ff844effced]
Oct 12 18:14:11 MAPR-PDG-POC03 ipsec: -> ??:?
Oct 12 18:14:11 MAPR-PDG-POC03 ipsec: charon has died -- restart scheduled (5sec)
Oct 12 18:14:16 MAPR-PDG-POC03 ipsec: charon (26038) started after 40 ms
Oct 12 18:15:00 MAPR-PDG-POC03 ipsec: dumping 1 stack frame addresses:
Oct 12 18:15:00 MAPR-PDG-POC03 ipsec: /lib64/libpthread.so.0 @ 0x7fafc7f40000 [0x7fafc7f4f100]
Oct 12 18:15:00 MAPR-PDG-POC03 ipsec: -> sigaction.c:?
Oct 12 18:15:00 MAPR-PDG-POC03 ipsec: charon has died -- restart scheduled (5sec)
Oct 12 18:15:05 MAPR-PDG-POC03 ipsec: charon (26312) started after 40 ms
Oct 12 18:16:58 MAPR-PDG-POC03 ipsec: dumping 1 stack frame addresses:
Oct 12 18:16:58 MAPR-PDG-POC03 ipsec: /lib64/libpthread.so.0 @ 0x7fa55765d000 [0x7fa55766c100]
Oct 12 18:16:58 MAPR-PDG-POC03 ipsec: -> sigaction.c:?
Oct 12 18:16:58 MAPR-PDG-POC03 ipsec: charon has died -- restart scheduled (5sec)
Oct 12 18:17:03 MAPR-PDG-POC03 ipsec: charon (26702) started after 40 ms
Oct 12 18:17:42 MAPR-PDG-POC03 ipsec: dumping 1 stack frame addresses:
Oct 12 18:17:42 MAPR-PDG-POC03 ipsec: /lib64/libpthread.so.0 @ 0x7f2049e14000 [0x7f2049e23100]
Oct 12 18:17:42 MAPR-PDG-POC03 ipsec: -> sigaction.c:?
Oct 12 18:17:42 MAPR-PDG-POC03 ipsec: charon has died -- restart scheduled (5sec)
Oct 12 18:17:47 MAPR-PDG-POC03 ipsec: charon (26738) started after 60 ms

History

#1 Updated by Danny Kulchinsky almost 9 years ago

Maybe I should build strongSwan with --enable-bfd-backtraces or --enable-unwind-backtraces (or both)?

Perhaps you can guide me on how to set up strongSwan to produce core dumps, so we can better analyze why it crashes?

#2 Updated by Danny Kulchinsky almost 9 years ago

Not sure if it's related, but yesterday node MAPR-PDG-POC03 had a kernel panic (it seems to be related to cryptd; log below). Once it had booted and re-integrated into the cluster, it triggered the charon crashes described above on the second node (MAPR-PDG-POC04) and subsequently on itself (about 10 times). Then the crashes stopped, and both nodes have now been running fine for ~26 hours.

I'm completely lost here :( I can't get this HA setup to work.

Perhaps the kernel we are running (4.7.2) is not stable enough? Any ideas or leads on why charon crashes when a node re-integrates?

Kernel panic vmcore-dmesg:

[677402.899694] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
[677402.899739] IP: [<ffffffffa04b5cea>] esp_input_done2+0x3a/0x240 [esp4]
[677402.899800] PGD 1c60b7067 PUD 23307f067 PMD 0 
[677402.899821] Oops: 0000 [#1] SMP
[677402.899835] Modules linked in: iptable_nat nf_nat_ipv4 nf_nat iptable_filter ip_tables nfnetlink_queue nfnetlink_log nfnetlink bluetooth rfkill authencesn binfmt_misc authenc echainiv xfrm6_mode_tunnel xfrm4_mode_tunnel xfrm4_tunnel tunnel4 ipcomp xfrm_ipcomp esp4 ah4 af_key ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_policy ip6table_filter ipt_REJECT ip6_tables nf_reject_ipv4 vmw_vsock_vmci_transport ipt_CLUSTERIP vsock nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack sb_edac edac_core intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ppdev intel_rapl_perf vmw_balloon pcspkr input_leds vmw_vmci sg i2c_piix4 parport_pc shpchp parport acpi_cpufreq xfs libcrc32c sr_mod cdrom ata_generic pata_acpi
[677402.900133]  sd_mod vmwgfx crc32c_intel serio_raw mptspi drm_kms_helper scsi_transport_spi syscopyarea sysfillrect sysimgblt fb_sys_fops mptscsih vmxnet3 ttm mptbase ata_piix drm libata floppy fjes dm_mirror dm_region_hash dm_log dm_mod [last unloaded: ip_tables]
[677402.900236] CPU: 5 PID: 20348 Comm: kworker/5:0 Not tainted 4.7.2+ #1
[677402.900259] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/17/2015
[677402.900297] Workqueue: crypto cryptd_queue_worker [cryptd]
[677402.900317] task: ffff880172a9ad00 ti: ffff88004df08000 task.ti: ffff88004df08000
[677402.900342] RIP: 0010:[<ffffffffa04b5cea>]  [<ffffffffa04b5cea>] esp_input_done2+0x3a/0x240 [esp4]
[677402.900375] RSP: 0018:ffff88004df0bd38  EFLAGS: 00010246
[677402.900394] RAX: 0000000000000000 RBX: ffff8800b80c7c00 RCX: ffff880172a9ad00
[677402.900418] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff81601520
[677402.900441] RBP: ffff88004df0bd88 R08: ffff880163720c00 R09: 0000000000000020
[677402.900465] R10: ffff880043b665b0 R11: 0000000000000001 R12: ffffe8ffffd419d8
[677402.900488] R13: ffff880163720da0 R14: ffff88023fd5d200 R15: 0000000000000001
[677402.900512] FS:  0000000000000000(0000) GS:ffff88023fd40000(0000) knlGS:0000000000000000
[677402.900539] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[677402.900558] CR2: 0000000000000004 CR3: 000000015e48a000 CR4: 00000000000406e0
[677402.900621] Stack:
[677402.901325]  0000000000000000 0000000000000000 0000000000000000 ffff880163720c00
[677402.902034]  000000008e22ad01 ffff8800b80c7c00 ffffe8ffffd419d8 ffff880163720da0
[677402.902763]  ffff88023fd5d200 0000000000000140 ffff88004df0bda0 ffffffffa04b61a6
[677402.903485] Call Trace:
[677402.904185]  [<ffffffffa04b61a6>] esp_input_done+0x16/0x30 [esp4]
[677402.904893]  [<ffffffffa03e9419>] cryptd_blkcipher_crypt+0x69/0xa0 [cryptd]
[677402.905592]  [<ffffffffa03e946c>] cryptd_blkcipher_decrypt+0x1c/0x20 [cryptd]
[677402.906281]  [<ffffffffa03e9c34>] cryptd_queue_worker+0x64/0x90 [cryptd]
[677402.906979]  [<ffffffff810988b2>] process_one_work+0x152/0x400
[677402.907660]  [<ffffffff810991a5>] worker_thread+0x125/0x4b0
[677402.908324]  [<ffffffff81099080>] ? rescuer_thread+0x380/0x380
[677402.908973]  [<ffffffff8109ecd8>] kthread+0xd8/0xf0
[677402.909606]  [<ffffffff8172207f>] ret_from_fork+0x1f/0x40
[677402.910227]  [<ffffffff8109ec00>] ? kthread_park+0x60/0x60
[677402.910832] Code: 41 54 53 48 89 fb 48 83 ec 28 44 8b bf 80 00 00 00 89 75 b4 65 48 8b 04 25 28 00 00 00 48 89 45 d0 31 c0 48 8b 47 68 48 8b 7f 50 <8b> 48 04 8d 51 ff 48 63 d2 4c 8b 64 d0 08 49 8b 84 24 f0 02 00 
[677402.912678] RIP  [<ffffffffa04b5cea>] esp_input_done2+0x3a/0x240 [esp4]
[677402.913275]  RSP <ffff88004df0bd38>
[677402.913848] CR2: 0000000000000004

#3 Updated by Tobias Brunner almost 9 years ago

  • Status changed from New to Feedback

Maybe I should build strongSwan with --enable-bfd-backtraces or --enable-unwind-backtraces (or both) ?

Backtraces are already logged, so that's not necessary. They seem to indicate that during an IKE_SA delete (triggered by the other node) some traffic selectors of an attached CHILD_SA were invalid/NULL. I'm not really sure how that could happen, as the TS should already have been accessed before the deletion (i.e. when the CHILD_SA was created and installed in the kernel). The logs don't tell us much about what's going on during the synchronization; maybe there were already some issues when the SAs were created.
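To illustrate the failure mode described above, here is a minimal C sketch of a defensive check while iterating a CHILD_SA's traffic selectors. The types `ts_t` and `child_t` and the function `count_ipv4_ts()` are hypothetical, heavily simplified stand-ins, not strongSwan's `traffic_selector_t`/`child_sa_t` API; the only real values are the IKEv2 traffic selector type numbers 7 (IPv4 address range) and 8 (IPv6 address range). Assuming, for illustration only, that a synced CHILD_SA could end up with an unset selector, the sketch detects and reports it instead of dereferencing it. On Linux, such a dereference ends in SIGSEGV, which is the "received 11" seen in charon.log.

```c
#include <stddef.h>
#include <stdio.h>

/* HYPOTHETICAL, heavily simplified stand-ins; these are NOT strongSwan's
 * traffic_selector_t / child_sa_t. Only the enum values are real:
 * IKEv2 traffic selector types 7 and 8 (RFC 7296). */
typedef enum {
    TS_IPV4_ADDR_RANGE = 7,
    TS_IPV6_ADDR_RANGE = 8,
} ts_type_t;

typedef struct {
    ts_type_t type;
} ts_t;

#define MAX_TS 4

typedef struct {
    ts_t *ts[MAX_TS];   /* selectors, e.g. as restored from an HA sync */
    size_t ts_count;    /* number of used slots, at most MAX_TS */
} child_t;

/* Count the IPv4 selectors of a CHILD_SA, skipping (and reporting) NULL
 * entries instead of dereferencing them. */
size_t count_ipv4_ts(const child_t *child, size_t *missing)
{
    size_t ipv4 = 0;
    size_t n = child->ts_count < MAX_TS ? child->ts_count : MAX_TS;

    *missing = 0;
    for (size_t i = 0; i < n; i++)
    {
        const ts_t *ts = child->ts[i];

        if (ts == NULL)
        {
            /* an unguarded ts->type here would crash with SIGSEGV */
            (*missing)++;
            continue;
        }
        if (ts->type == TS_IPV4_ADDR_RANGE)
        {
            ipv4++;
        }
    }
    if (*missing > 0)
    {
        fprintf(stderr, "CHILD_SA has %zu NULL traffic selector(s)\n", *missing);
    }
    return ipv4;
}
```

A guard like this only hides the symptom. The real question raised above is how a selector could become invalid between installation and deletion.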

EDIT: I mainly concentrated on the messages on MAPR-PDG-POC03, but I noticed that the first backtrace on MAPR-PDG-POC04 is actually very strange. There is a jump from source:src/libcharon/sa/child_sa.c@74de8c3727#L406, where traffic_selector_t:get_type() is called, to proposal_t:get_algorithm() source:src/libcharon/config/proposal.c@74de8c3727#L140. This doesn't make much sense. I've seen backtraces like this when libraries/plugins/executables from different versions or builds were mixed at runtime. Could you check that all the binaries come from the same build? It's strange, though, that this would only have an effect when reintegrating a node.
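One way to check the mixed-build hypothesis on a live node is to look at which files the running charon process actually has mapped, rather than only at the files on disk. The sketch below assumes Linux and read access to /proc/<pid>/maps (root, for charon). It prints every distinct mapped file under a given prefix, such as /usr/lib/ipsec. A path with a " (deleted)" suffix means the file was replaced on disk after the process loaded it, which typically happens when a new build is installed while charon is running. `list_mapped_files()` is a hypothetical helper, not part of strongSwan.

```c
#include <stdio.h>
#include <string.h>

/* Print each distinct file under `prefix` listed in a /proc/<pid>/maps
 * file and return how many were printed, or -1 if it can't be opened.
 * A " (deleted)" suffix means the file was replaced after being mapped. */
int list_mapped_files(const char *maps_path, const char *prefix)
{
    char line[4096], last[4096] = "";
    int count = 0;
    FILE *fp = fopen(maps_path, "r");

    if (fp == NULL)
    {
        perror(maps_path);
        return -1;
    }
    while (fgets(line, sizeof(line), fp) != NULL)
    {
        /* the pathname is the only field that can contain a '/' */
        char *path = strchr(line, '/');

        if (path == NULL)
        {
            continue;   /* anonymous mapping, [heap], [stack], ... */
        }
        path[strcspn(path, "\n")] = '\0';
        /* skip other files and adjacent segments of the same file */
        if (strncmp(path, prefix, strlen(prefix)) != 0 || strcmp(path, last) == 0)
        {
            continue;
        }
        printf("%s\n", path);
        snprintf(last, sizeof(last), "%s", path);
        count++;
    }
    fclose(fp);
    return count;
}
```

For example, calling list_mapped_files("/proc/<pid>/maps", "/usr/lib/ipsec") with charon's current PID on both nodes should list the same libcharon, libstrongswan, and plugin paths, with no " (deleted)" entries.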

Perhaps you can guide me on how to setup strongSwan to produce core dumps so we could analyze better why it crashes... ?

That probably depends on your system.
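As a general Linux sketch, not strongSwan-specific advice: a process normally writes a core file only if its RLIMIT_CORE soft limit is non-zero and its dumpable flag is set. Where the file goes is controlled system-wide by /proc/sys/kernel/core_pattern. On systemd hosts, the limit can also be raised per service with LimitCORE= in the unit file. Changing a process's effective UID or GID, as daemons may do when dropping privileges, resets the dumpable flag. The hypothetical helper `enable_core_dumps()` below raises the soft limit to the hard limit and sets the flag again via prctl(PR_SET_DUMPABLE). Whether and where charon would need such a call is an assumption to verify against your build.

```c
#include <stdio.h>
#include <sys/prctl.h>
#include <sys/resource.h>

/* Let the calling process write core dumps up to its hard RLIMIT_CORE
 * and mark it dumpable again. Returns 0 on success, -1 on error. */
int enable_core_dumps(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
    {
        perror("getrlimit");
        return -1;
    }
    /* an unprivileged process may raise its soft limit up to the hard limit */
    rl.rlim_cur = rl.rlim_max;
    if (setrlimit(RLIMIT_CORE, &rl) != 0)
    {
        perror("setrlimit");
        return -1;
    }
    /* changing the effective UID/GID resets this flag, so set it again */
    if (prctl(PR_SET_DUMPABLE, 1, 0, 0, 0) != 0)
    {
        perror("prctl");
        return -1;
    }
    return 0;
}
```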

Not sure it if it's related, however yesterday node MAPR-PDG-POC03 had a kernel panic

I don't think this is related. It looks like an issue similar to #2139 (asynchronous ESP handling).

#4 Updated by Danny Kulchinsky almost 9 years ago

Hi Tobias, thank you for getting back to me!

I've been building strongSwan from source on both machines, always from the same base (5.5.0) plus the various patches.

I did not run make uninstall when deploying new builds (i.e. before make install). Could this cause issues?

EDIT: I figured out why core dumps were not being generated on our nodes and fixed it, and it seems I caught this issue ~4 hours ago on both nodes. Please let me know if you need anything specific from the core dumps (using gdb?); I can also share them with you if required.

Also, I verified with md5sum that all executables, libraries and plugins (/usr/libexec/ipsec, /usr/lib/ipsec and /usr/lib/ipsec/plugins) are identical on both nodes.

I did the same for all configuration files and keys/certificates. They are all identical, except ha.conf, where the local/remote values are swapped.

#5 Updated by Tobias Brunner over 7 years ago

  • Category set to high availability (ha plugin)
  • Status changed from Feedback to Closed
  • Resolution set to No feedback

Closing old issues. If this is still a problem, please reopen.