Issue #1106

Memory Leak Charon FreeBSD

Added by Adam Piasecki over 4 years ago. Updated over 2 years ago.

Status: Closed
Priority: Normal
Assignee: -
Category: freebsd
Affected version: 5.3.2
Resolution:

Description

This box has one IPsec tunnel, but I have another box with 100 tunnels that exhausts its 2 GB of memory and its swap space after about 3 days (the box then crashes because no swap is left). I have searched everywhere and can't find an issue with my configuration. If I kill charon and start it again, it gradually eats up memory again. Rebooting or doing a fresh install does not stop it from eating memory.

ipsec statusall
Status of IKE charon daemon (strongSwan 5.3.2, FreeBSD 10.1-RELEASE-p15, amd64):
  uptime: 44 days, since Jul 27 13:24:48 2015
  worker threads: 10 of 16 idle, 6/0/0/0 working, job queue: 0/0/0/0, scheduled: 3
  loaded plugins: charon unbound aes des blowfish rc2 sha1 sha2 md4 md5 random nonce x509 revocation constraints pubkey pkcs1 pkcs7 pkcs8 pkcs12 pgp dnskey sshkey ipseckey pem openssl fips-prf xcbc cmac hmac curl attr kernel-pfkey kernel-pfroute resolve socket-default stroke smp updown eap-identity eap-sim eap-md5 eap-mschapv2 eap-dynamic eap-radius eap-tls eap-ttls eap-peap xauth-generic xauth-eap whitelist addrblock unity
Listening IP addresses:
  X.X.X.X
  X.X.X.X
  X.X.X.X
  X.X.X.X
Connections:
   bypasslan:  %any...%any  IKEv1/2
   bypasslan:   local:  uses public key authentication
   bypasslan:   remote: uses public key authentication
   bypasslan:   child:  X.X.X.X/X|/0 === X.X.X.X/X|/0 PASS
        con1:  X.X.X.X...X.X.X.X  IKEv2, dpddelay=10s
        con1:   local:  [X.X.X.X] uses pre-shared key authentication
        con1:   remote: [X.X.X.X] uses pre-shared key authentication
        con1:   child:  X.X.X.X/X|/0 === X.X.X.X/X|/0 X.X.X.X/X|/0 TUNNEL, dpdaction=restart
Shunted Connections:
   bypasslan:  X.X.X.X/X|/0 === X.X.X.X/X|/0 PASS
Routed Connections:
        con1{2233}:  ROUTED, TUNNEL, reqid 1
        con1{2233}:   X.X.X.X/X|/0 === X.X.X.X/X|/0 X.X.X.X/X|/0
Security Associations (1 up, 0 connecting):
        con1[188]: ESTABLISHED 3 hours ago, X.X.X.X[X.X.X.X]...X.X.X.X[X.X.X.X]
        con1[188]: IKEv2 SPIs: 634c4b09653e75f6_i 3f096e7e6e227d29_r*, pre-shared key reauthentication in 4 hours
        con1[188]: IKE proposal: AES_CBC_256/HMAC_MD5_96/PRF_HMAC_MD5/MODP_1024
        con1{2232}:  INSTALLED, TUNNEL, reqid 1, ESP SPIs: c42e8a95_i c67e08b4_o
        con1{2232}:  AES_CBC_256/HMAC_MD5_96, 597807 bytes_i (7711 pkts, 0s ago), 10543976 bytes_o (9315 pkts, 1161s ago), rekeying in 17 minutes
        con1{2232}:   X.X.X.X/X|/0 === X.X.X.X/X|/0 X.X.X.X/X|/0
        con1{2234}:  INSTALLED, TUNNEL, reqid 1, ESP SPIs: c7e77372_i ced2ce23_o
        con1{2234}:  AES_CBC_256/HMAC_MD5_96, 255838 bytes_i (3165 pkts, 0s ago), 1549768 bytes_o (3179 pkts, 0s ago), rekeying in 34 minutes
        con1{2234}:   X.X.X.X/X|/0 === X.X.X.X/X|/0 X.X.X.X/X|/0

/var/etc/ipsec: top | grep charon
35874 root       17  20    0   497M   297M uwait   1   5:21   0.00% charon

sigwait.patch (501 Bytes) Tobias Brunner, 17.09.2015 16:02
valgrind-initial.log (20.2 KB) Tobias Brunner, 17.09.2015 16:17
valgrind-rekeyings.log (20.2 KB) Tobias Brunner, 17.09.2015 16:17

Associated revisions

Revision 85814809 (diff)
Added by Tobias Brunner over 4 years ago

Replace usages of sigwait(3) with sigwaitinfo(2)

This is basically the same call, but it has the advantage of being
supported by FreeBSD's valgrind, which sigwait() is not.

References #1106.
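
For readers unfamiliar with the two calls, here is a minimal Python sketch of the semantics the commit message relies on. It is not the strongSwan change itself (that code is C); Python's signal module merely wraps the same POSIX calls. The helper names and the choice of SIGUSR1 are assumptions made for illustration, and the sketch needs Python 3.8+ on a platform that provides sigwaitinfo(2).

```python
import signal


def wait_with_sigwait(signum):
    """Block signum, raise it in this thread, then dequeue it with sigwait()."""
    signal.pthread_sigmask(signal.SIG_BLOCK, {signum})
    signal.raise_signal(signum)        # stays pending because it is blocked
    return signal.sigwait({signum})    # returns the signal number


def wait_with_sigwaitinfo(signum):
    """Same flow, but dequeue the pending signal with sigwaitinfo()."""
    signal.pthread_sigmask(signal.SIG_BLOCK, {signum})
    signal.raise_signal(signum)
    info = signal.sigwaitinfo({signum})  # returns a struct_siginfo
    return info.si_signo                 # the signal number is one of its fields


if __name__ == "__main__":
    same = wait_with_sigwait(signal.SIGUSR1) == wait_with_sigwaitinfo(signal.SIGUSR1)
    print("both calls reported the same signal:", same)
```

For a daemon that only needs to know which signal arrived, the two calls are interchangeable in this way; sigwaitinfo() additionally exposes the siginfo structure.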

History

#1 Updated by Tobias Brunner over 4 years ago

  • Description updated (diff)
  • Category changed from charon to freebsd
  • Status changed from New to Feedback

Looks like you are using pfSense. I'm not aware of any memory leaks in charon. If there are any on this system, they could be due to modifications added by the pfSense guys (which I think are still not publicly available, so I can't comment on that), or perhaps to some FreeBSD peculiarity that we handle incorrectly (e.g. something related to the PF_KEY or UDP sockets). If this is a classic memory leak (i.e. caused by not freeing allocated memory), it would be helpful to know where. As I don't think our own leak detective works on FreeBSD, running charon in valgrind might be the best option to find it.

#2 Updated by Adam Piasecki over 4 years ago

It is pfSense, and it seems at least a couple of pfSense users are running into this same issue. Thanks for the tips. I'll probably install a fresh FreeBSD/strongSwan and see if I can get it to behave like the pfSense version.

#3 Updated by Chris Buechler over 4 years ago

Didn't realize someone had opened a ticket here. I've confirmed the same memory leak occurs on stock FreeBSD using the strongswan available via 'pkg install'.

The port we build strongswan from, along with changes, is here:
https://github.com/pfsense/FreeBSD-ports/tree/devel/security/strongswan
but the leak is not specific to anything we're doing there. There might be something there that makes some situations worse, but it is reproducible on stock FreeBSD and stock strongSwan.

Bringing up 400 connections like the following, changing only left/rightsubnet on subsequent ones:

conn con1
        fragmentation = yes
        keyexchange = ikev1
        reauth = yes
        forceencaps = no
        mobike = no
        rekey = yes
        installpolicy = yes
        type = tunnel
        dpdaction = restart
        dpddelay = 10s
        dpdtimeout = 60s
        auto = route
        left = 10.2.44.114
        right = 10.2.44.168
        leftid = 10.2.44.114
        ikelifetime = 7200s
        lifetime = 3600s
        ike = aes256-sha1-modp1024!
        leftauth = psk
        rightauth = psk
        rightid = 10.2.44.168
        aggressive = no
        rightsubnet = 100.66.1.1
        leftsubnet = 100.65.1.1

and letting them rekey, it leaks around 4 MB RAM per hour.
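
A test configuration of this shape can be generated rather than written by hand. The following Python sketch is one way to do it. The shared options are copied from the sample above, but how the subnets change from one connection to the next is not stated here, so the "add one to the address" scheme and the function names are assumptions.

```python
import ipaddress

# Options shared by every connection, copied from the sample conn above.
COMMON_OPTIONS = [
    ("fragmentation", "yes"), ("keyexchange", "ikev1"), ("reauth", "yes"),
    ("forceencaps", "no"), ("mobike", "no"), ("rekey", "yes"),
    ("installpolicy", "yes"), ("type", "tunnel"), ("dpdaction", "restart"),
    ("dpddelay", "10s"), ("dpdtimeout", "60s"), ("auto", "route"),
    ("left", "10.2.44.114"), ("right", "10.2.44.168"),
    ("leftid", "10.2.44.114"), ("ikelifetime", "7200s"),
    ("lifetime", "3600s"), ("ike", "aes256-sha1-modp1024!"),
    ("leftauth", "psk"), ("rightauth", "psk"),
    ("rightid", "10.2.44.168"), ("aggressive", "no"),
]


def render_conn(index, left_base="100.65.1.1", right_base="100.66.1.1"):
    """Render one conn section; only the subnets depend on index (1-based)."""
    # Assumption: each subsequent connection uses the next address.
    left = ipaddress.IPv4Address(left_base) + (index - 1)
    right = ipaddress.IPv4Address(right_base) + (index - 1)
    lines = [f"conn con{index}"]
    lines += [f"        {key} = {value}" for key, value in COMMON_OPTIONS]
    lines.append(f"        rightsubnet = {right}")
    lines.append(f"        leftsubnet = {left}")
    return "\n".join(lines)


def render_config(count=400):
    """Render count conn sections separated by blank lines."""
    return "\n\n".join(render_conn(i) for i in range(1, count + 1)) + "\n"


if __name__ == "__main__":
    print(render_conn(1))
```

The output of render_config() would be appended to ipsec.conf on the test box; the peer needs a matching set of connections.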

Any tips on running charon in valgrind? I'm failing with attempts along the lines of the following.

 # valgrind --tool=memcheck --leak-check=full --show-leak-kinds=all --track-origins=yes ipsec start
==11175== Memcheck, a memory error detector
==11175== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==11175== Using Valgrind-3.10.0 and LibVEX; rerun with -h for copyright info
==11175== Command: /usr/local/sbin/ipsec start
==11175==

valgrind: m_syswrap/syswrap-freebsd.c:3302 (vgSysWrap_freebsd_sys_fcntl_before): Assertion 'Unimplemented functionality' failed.
valgrind: valgrind

host stacktrace:
==11175==    at 0x38052372: ??? (in /usr/local/lib/valgrind/memcheck-amd64-freebsd)
==11175==    by 0x38052494: ??? (in /usr/local/lib/valgrind/memcheck-amd64-freebsd)
==11175==    by 0x38052616: ??? (in /usr/local/lib/valgrind/memcheck-amd64-freebsd)
==11175==    by 0x380B6B00: ??? (in /usr/local/lib/valgrind/memcheck-amd64-freebsd)
==11175==    by 0x380A2EE5: ??? (in /usr/local/lib/valgrind/memcheck-amd64-freebsd)
==11175==    by 0x3809F81C: ??? (in /usr/local/lib/valgrind/memcheck-amd64-freebsd)
==11175==    by 0x380A0C66: ??? (in /usr/local/lib/valgrind/memcheck-amd64-freebsd)
==11175==    by 0x380ADEDC: ??? (in /usr/local/lib/valgrind/memcheck-amd64-freebsd)

sched status:
  running_tid=1

Thread 1: status = VgTs_Runnable
==11175==    at 0x518D63A: _fcntl (in /lib/libc.so.7)
==11175==    by 0x50A9ED1: fcntl (in /lib/libc.so.7)
==11175==    by 0x40D0B0: ??? (in /bin/sh)
==11175==    by 0x41204C: ??? (in /bin/sh)
==11175==    by 0x4105A1: ??? (in /bin/sh)
==11175==    by 0x402EDE: ??? (in /bin/sh)
==11175==    by 0x4020FFF: ???

or similar to this post: https://lists.strongswan.org/pipermail/dev/2015-July/001415.html

#4 Updated by Tobias Brunner over 4 years ago

> I've confirmed the same memory leak occurs on stock FreeBSD using the strongswan available via 'pkg install'.

OK, thanks for the confirmation.

> The port we build strongswan from, along with changes, is here:
> https://github.com/pfsense/FreeBSD-ports/tree/devel/security/strongswan

Thanks, good to know.

> Bringing up 400 connections like the following, changing only left/rightsubnet on subsequent ones:
> [...]
> and letting them rekey, it leaks around 4 MB RAM per hour.

Is it specific to rekeying (of the IKE_SA or the CHILD_SA)? Does the IKE version make a difference? (The OP uses IKEv2, so I guess not really.)

> Any tips on running charon in valgrind? I'm failing with attempts along the lines of the following.

No idea what this particular problem is about, but even when running charon in valgrind directly it apparently doesn't work right away. We use sigwait(), which FreeBSD's valgrind does not support (unhandled syscall: 429, followed by [DMN] error -1 while waiting for a signal), so this requires patching either valgrind or charon.

#5 Updated by Tobias Brunner over 4 years ago

The attached patch fixes the sigwait() issue. And by modifying source:src/starter/invokecharon.c#L129 (similar to the GDB stuff) it is possible to run charon with ipsec.conf-based configs in valgrind on FreeBSD.

I did some tests on FreeBSD 10.2, and while I can confirm the RSS increase in ps/top for each CHILD_SA rekeying, I don't find any memory leaks with valgrind, at least not when I compile with --with-printf-hooks=builtin. The default (glibc, which is also supported by FreeBSD's libc) seems to cause leaks for our printf hook callbacks.

The two attached log files are from two runs of charon in valgrind with --leak-check=full --show-leak-kinds=all --undef-value-errors=no. valgrind-initial.log was taken after the initial CHILD_SA got established. valgrind-rekeyings.log was taken in a second run after four additional rekeyings. They are exactly the same except for the changed number of allocations/frees and one moved block. So there do not seem to be any leaks caused by the rekeyings (or any other leaks for that matter). However, when I do the same thing without valgrind, I can see the RSS increase in ps/top with each rekeying. I currently don't know what causes this. Maybe it's an issue with FreeBSD's memory management (fragmentation?).
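
To put numbers on the RSS increase seen in ps/top, the resident set size can be sampled at intervals and turned into a growth rate. The Python sketch below is one way to do that, not something used for the tests above; the function names are made up for illustration, and it assumes a ps that accepts `-o rss= -p PID` and reports RSS in 1024-byte units.

```python
import subprocess


def read_rss_kib(pid):
    """Return the resident set size of pid in KiB, as reported by ps."""
    result = subprocess.run(
        ["ps", "-o", "rss=", "-p", str(pid)],
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip())


def growth_kib_per_hour(samples):
    """Average RSS growth between the first and the last sample.

    samples is a list of (seconds, rss_kib) tuples in chronological order.
    """
    (t_first, rss_first), (t_last, rss_last) = samples[0], samples[-1]
    if t_last <= t_first:
        raise ValueError("need at least two samples taken at different times")
    return (rss_last - rss_first) * 3600 / (t_last - t_first)
```

Collecting `(time.time(), read_rss_kib(pid))` for the charon PID once per rekeying interval and passing the list to growth_kib_per_hour() gives a figure that can be compared across builds. It measures what the kernel reports as resident, so on its own it cannot separate a real leak from allocator fragmentation.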

I'm also not sure what the leaks related to the printf hooks are about, but it seems to be an issue in FreeBSD's libc related to register_printf_function() when used with [v]snprintf.

#6 Updated by Jim Thompson over 4 years ago

Hi,

I'm working with Chris on this issue. All testing is occurring on FreeBSD. Once we find the issue, we'll retest on pfSense.

The leak shows up (but not as badly) with --with-printf-hooks=vstr as well.

I need to go back and re-test with --with-printf-hooks=builtin.

I strongly suspect a leak in at least the TS printf-hook ("%#R").

p.s. I think this "(which I think are still not publicly available, so I can't comment on that)" was a bit unfair.

#7 Updated by Renato Botelho over 4 years ago

By default, strongSwan built from FreeBSD ports does not force --with-printf-hooks. It is left as 'auto', which ends up selecting the glibc version.

A test comparing vstr with builtin would be a good experiment.

#8 Updated by Adam Piasecki over 4 years ago

This was fixed in the latest release of pfSense, 2.2.5. See https://redmine.pfsense.org/issues/5149

#9 Updated by Chris Buechler over 4 years ago

Adam Piasecki wrote:

> This was fixed in the latest release of pfSense, 2.2.5. See https://redmine.pfsense.org/issues/5149

More worked around than fixed, but the worst of the leaks are no longer a problem at least. Building with --with-printf-hooks=vstr works around the bulk of the memory leaking. We then also switched from SMP to vici for status output, because vstr broke SMP output (and we were planning on switching to vici anyway in the future).

We also updated the FreeBSD port to default to --with-printf-hooks=builtin. builtin was preferred as the default there because it doesn't have the vstr dependency. It appears to leak significantly less than glibc, the former default, but still leaks noticeably over time in larger-scale setups (multi-hundred connections).
https://www.freshports.org/security/strongswan

#10 Updated by Noel Kuntze over 2 years ago

  • Status changed from Feedback to Closed

#11 Updated by Renato Botelho over 2 years ago

Was the leak fixed? Is there a specific commit to point out?

#12 Updated by Tobias Brunner over 2 years ago

There is no leak.
