Bug #2500

Crash in charon when updating interface and IKE_SAs newly created via HA plugin exist

Added by Emeric Poupon 4 months ago. Updated 2 months ago.

Status: Closed
Priority: Normal
Category: high availability (ha plugin)
Target version:
Start date:
Due date:
Estimated time:
Affected version: 5.5.3
Resolution: Fixed

Description

Hello,

I have hit a crash, but unfortunately it is very difficult to reproduce.

Using strongSwan 5.5.3 on FreeBSD 9.3:

Current language:  auto; currently asm
(gdb) bt
#0  0x0000000801acd66c in thr_kill () at thr_kill.S:3
#1  0x0000000801b73923 in abort () at /usr/home/build/FreeBSD/tmp/usr/src/lib/libc/stdlib/abort.c:65
#2  0x00000000004018a1 in segv_handler (signal=Could not find the frame base for "segv_handler".
) at charon.c:187
#3  0x00000008018523e9 in handle_signal (actp=0x7ffffe3f17d0, sig=11, info=0x7ffffe3f1bb0, ucp=0x7ffffe3f1840)
    at /usr/home/build/FreeBSD/tmp/usr/src/lib/libthr/thread/thr_sig.c:240
#4  0x0000000801852650 in thr_sighandler (sig=11, info=0x7ffffe3f1bb0, _ucp=0x7ffffe3f1840) at /usr/home/build/FreeBSD/tmp/usr/src/lib/libthr/thread/thr_sig.c:183
#5  <signal handler called>
#6  0x0000000800b33df9 in ike_cfg_has_address (cfg=0x0, addr=0x806517600, local=true) at config/ike_cfg.c:572
#7  0x0000000800b6a663 in roam (this=0x8042aa400, address=true) at sa/ike_sa.c:2578
#8  0x0000000800b5e5a2 in execute (this=0x80717deb0) at processing/jobs/roam_job.c:72
#9  0x00000008008730ca in process_job (this=0x802021740, worker=0x8021aa8e0) at processing/processor.c:235
#10 0x000000080087339b in process_jobs (worker=0x8021aa8e0) at processing/processor.c:321
#11 0x000000080088c4d4 in thread_main (this=0x8021a3b20) at threading/thread.c:331
#12 0x000000080184de10 in thread_start (curthread=0x80200a800) at /usr/home/build/tmp/usr/src/lib/libthr/thread/thr_create.c:284

Here are the contents of the IKE_SA whose ike_cfg is a null pointer:

(gdb) f 7
#7  0x0000000800b6a663 in roam (this=0x8042aa400, address=true) at sa/ike_sa.c:2578
2578    sa/ike_sa.c: No such file or directory.
    in sa/ike_sa.c
Current language:  auto; currently c
(gdb) p *this
$1 = {
  public = {
    get_id = 0x800b67f50 <get_id>, 
    get_version = 0x800b67f70 <get_version>, 
    get_unique_id = 0x800b64d80 <get_unique_id>, 
    get_state = 0x800b66070 <get_state>, 
    set_state = 0x800b66090 <set_state>, 
    get_name = 0x800b64da0 <get_name>, 
    get_statistic = 0x800b64df0 <get_statistic>, 
    set_statistic = 0x800b64e30 <set_statistic>, 
    get_my_host = 0x800b64e60 <get_my_host>, 
    set_my_host = 0x800b64e80 <set_my_host>, 
    get_other_host = 0x800b64ed0 <get_other_host>, 
    get_other_sns_user_id = 0x800b6b390 <get_other_sns_user_id>, 
    set_other_sns_user_id = 0x800b6b340 <set_other_sns_user_id>, 
    get_other_sns_user = 0x800b6b440 <get_other_sns_user>, 
    get_other_sns_user_domain = 0x800b6b460 <get_other_sns_user_domain>, 
    set_other_sns_user = 0x800b6b3b0 <set_other_sns_user>, 
    set_other_host = 0x800b64ef0 <set_other_host>, 
    float_ports = 0x800b66d80 <float_ports>, 
    update_hosts = 0x800b66e70 <update_hosts>, 
    get_my_id = 0x800b67f90 <get_my_id>, 
    set_my_id = 0x800b67fb0 <set_my_id>, 
    get_other_id = 0x800b68000 <get_other_id>, 
    get_other_eap_id = 0x800b68020 <get_other_eap_id>, 
    set_other_id = 0x800b68140 <set_other_id>, 
    get_ike_cfg = 0x800b65aa0 <get_ike_cfg>, 
    set_ike_cfg = 0x800b65ac0 <set_ike_cfg>, 
    get_peer_cfg = 0x800b64f60 <get_peer_cfg>, 
    set_peer_cfg = 0x800b64f80 <set_peer_cfg>, 
    get_auth_cfg = 0x800b65030 <get_auth_cfg>, 
    add_auth_cfg = 0x800b65070 <add_auth_cfg>, 
    create_auth_cfg_enumerator = 0x800b650c0 <create_auth_cfg_enumerator>, 
    verify_peer_certificate = 0x800b651c0 <verify_peer_certificate>, 
    get_proposal = 0x800b65720 <get_proposal>, 
    set_proposal = 0x800b65740 <set_proposal>, 
    set_message_id = 0x800b657a0 <set_message_id>, 
    get_message_id = 0x800b65810 <get_message_id>, 
    add_peer_address = 0x800b66a50 <add_peer_address>, 
    create_peer_address_enumerator = 0x800b66a80 <create_peer_address_enumerator>, 
    clear_peer_addresses = 0x800b66ae0 <clear_peer_addresses>, 
    has_mapping_changed = 0x800b66b20 <has_mapping_changed>, 
    enable_extension = 0x800b65b00 <enable_extension>, 
    supports_extension = 0x800b65b30 <supports_extension>, 
    set_condition = 0x800b65b90 <set_condition>, 
    has_condition = 0x800b65b60 <has_condition>, 
    get_pending_updates = 0x800b66d60 <get_pending_updates>, 
    set_pending_updates = 0x800b66d40 <set_pending_updates>, 
    initiate = 0x800b67ac0 <initiate>, 
    retry_initiate = 0x800b67d80 <retry_initiate>, 
    delete = 0x800b68660 <delete_>, 
    roam = 0x800b6a5d0 <roam>, 
    process_message = 0x800b67de0 <process_message>, 
    generate_message = 0x800b67250 <generate_message>, 
    generate_message_fragmented = 0x800b67460 <generate_message_fragmented>, 
    retransmit = 0x800b69c20 <retransmit>, 
    send_dpd = 0x800b65ea0 <send_dpd>, 
    send_keepalive = 0x800b65850 <send_keepalive>, 
    redirect = 0x800b69a00 <redirect>, 
    handle_redirect = 0x800b69850 <handle_redirect>, 
    get_redirected_from = 0x800b64f40 <get_redirected_from>, 
    get_keymat = 0x800b66740 <get_keymat>, 
    add_child_sa = 0x800b68190 <add_child_sa>, 
    get_child_sa = 0x800b681f0 <get_child_sa>, 
    get_child_count = 0x800b68290 <get_child_count>, 
    create_child_sa_enumerator = 0x800b683a0 <create_child_sa_enumerator>, 
    remove_child_sa = 0x800b68460 <remove_child_sa>, 
    rekey_child_sa = 0x800b684c0 <rekey_child_sa>, 
    delete_child_sa = 0x800b68530 <delete_child_sa>, 
    destroy_child_sa = 0x800b685b0 <destroy_child_sa>, 
    rekey = 0x800b68820 <rekey>, 
    reauth = 0x800b68890 <reauth>, 
    reestablish = 0x800b68e30 <reestablish>, 
    set_auth_lifetime = 0x800b69f40 <set_auth_lifetime>, 
    add_virtual_ip = 0x800b66760 <add_virtual_ip>, 
    clear_virtual_ips = 0x800b66910 <clear_virtual_ips>, 
    create_virtual_ip_enumerator = 0x800b66a00 <create_virtual_ip_enumerator>, 
    add_configuration_attribute = 0x800b6a9f0 <add_configuration_attribute>, 
    create_attribute_enumerator = 0x800b6ac30 <create_attribute_enumerator>, 
    set_kmaddress = 0x800b67690 <set_kmaddress>, 
    create_task_enumerator = 0x800b6ac70 <create_task_enumerator>, 
    flush_queue = 0x800b6acb0 <flush_queue>, 
    queue_task = 0x800b6acf0 <queue_task>, 
    queue_task_delayed = 0x800b6ad30 <queue_task_delayed>, 
    inherit_pre = 0x800b6ad70 <inherit_pre>, 
    inherit_post = 0x800b6ae50 <inherit_post>, 
    reset = 0x800b66630 <reset>, 
    destroy = 0x800b6b480 <destroy>
  }, 
  ike_sa_id = 0x804350000, 
  version = IKEV2, 
  unique_id = 1566, 
  state = IKE_CONNECTING, 
  ike_cfg = 0x0, 
  peer_cfg = 0x0, 
  my_auth = 0x806657320, 
  other_auth = 0x8066573e0, 
  my_auths = 0x804340860, 
  other_auths = 0x8043408a0, 
  proposal = 0x8065c8e00, 
  task_manager = 0x8038e5a40, 
  my_host = 0x806517600, 
  other_host = 0x806517a00, 
  my_id = 0x806657440, 
  other_id = 0x8066574a0, 
  extensions = EXT_DPD, 
  conditions = 0, 
  child_sas = 0x0, 
  keymat = 0x80656d580, 
  my_vips = 0x0, 
  other_vips = 0x0, 
  attributes = 0x8043408b0, 
  peer_addresses = 0x0, 
  nat_detection_dest = {
    ptr = 0x0, 
    len = 0
  }, 
  pending_updates = 0, 
  keepalive_interval = 20, 
  keepalive_job = 0x0, 
  retry_initiate_interval = 0, 
  retry_initiate_queued = false, 
  stats = {0, 0, 0, 0, 3001, 3001}, 
  keyingtry = 0, 
  local_host = 0x0, 
  remote_host = 0x0, 
  flush_auth_cfg = false, 
  fragment_size = 1280, 
  follow_redirects = true, 
  redirected_from = 0x0, 
  redirected_at = 0x0, 
  other_sns_user_id = 0x0, 
  other_sns_user = 0x0, 
  other_sns_user_domain = 0x0
}

Associated revisions

Revision 007a2701 (diff)
Added by Tobias Brunner 2 months ago

ike: Don't handle roam events if no IKE config is available

IKE_SAs newly created via HA_IKE_ADD message don't have any IKE or peer
config assigned yet (this happens later with an HA_IKE_UPDATE message).
And because the state is initially set to IKE_CONNECTING the roam() method
does not immediately return, as it later would for passive HA SAs. This
might cause the check for explicitly configured local addresses to crash
the daemon with a segmentation fault.

Fixes #2500.

History

#1 Updated by Tobias Brunner 4 months ago

  • Status changed from New to Feedback

Hm, looks strange. The IKE_SA has no ike_cfg_t assigned (no peer_cfg_t either). Not sure when this could happen because as initiator the config should be set right away and as responder it's basically the first thing that happens in task_manager_v2::process_message() when the message that created the IKE_SA is processed (unless parsing that message fails or no config is found, in which case the SA is destroyed). Since the IKE_SA is locked during that time the roam job should only have access to it once a config is set. Anything special in your environment? HA plugin? Code modifications (other_sns_user... members are not from us, for example)?

#2 Updated by Emeric Poupon 4 months ago

Hello,

Well, we are still using ipsec stroke, and the HA plugin is loaded and indeed in use.
We also use the following to prevent bad synchronization states: https://lists.strongswan.org/pipermail/dev/2015-March/001281.html
The extra members you see in the IKE_SA are just additional information added by our custom authentication plugin to ease debugging.

There are many connections (1024) and many virtual interfaces (also 1024).
I can't really say when it crashed (it has only crashed twice). I'm not sure whether it was right after a segment responsibility change or a 'reboot' command.

#3 Updated by Tobias Brunner 4 months ago

  • Tracker changed from Issue to Bug
  • Subject changed from Crash in charon when updating interface to Crash in charon when updating interface and IKE_SAs newly created via HA plugin exist
  • Category set to high availability (ha plugin)
  • Target version set to 5.6.2

> Well we are still using ipsec stroke, the HA plugin is loaded and indeed in use.

OK. With the HA plugin, there is the HA_IKE_ADD message, which evidently creates an IKE_SA without a config (it is sent after key derivation, i.e. while processing IKE_SA_INIT). The state of this IKE_SA is set to IKE_CONNECTING, not, e.g., IKE_PASSIVE, which would cause the roam() method to return immediately. The config (the peer config, from which the IKE config is fetched) is set later with an HA_IKE_UPDATE message. Only once a config has been set by the message triggered by ike_updown() is the state changed from IKE_CONNECTING to IKE_PASSIVE. Between these two events, the observed crash in the roam() method might occur.

The crash is easily fixed (see the 2500-ha-roam branch), but I wonder whether there are other such cases. That said, roam() is kind of a special case, as other method calls are either triggered by inbound messages, which only the other HA peer receives, or by jobs scheduled by the IKE_SA itself, which is also not the case for such passive SAs. It might be possible to handle the IKE_PASSIVE state differently, e.g. not as a state itself but as a condition of the IKE_SA, like COND_STALE, i.e. in addition to the IKE_SA's normal state. But since a lot currently relies on this state, such a change might lead to other subtle bugs.

#4 Updated by Tobias Brunner 2 months ago

  • Status changed from Feedback to Closed
  • Assignee set to Tobias Brunner
  • Resolution set to Fixed
