Issue #1106
Memory Leak Charon FreeBSD
Description
This box has one IPsec tunnel, but I have another box with 100 tunnels that runs out of its 2 GB of memory and swap space after about 3 days (the box crashes once swap is exhausted). I have searched everywhere and can't find an issue with my configuration. If I kill charon and load it again, it gradually eats up memory again. Reboot, fresh install, nothing stops it from eating memory.
ipsec statusall
Status of IKE charon daemon (strongSwan 5.3.2, FreeBSD 10.1-RELEASE-p15, amd64):
  uptime: 44 days, since Jul 27 13:24:48 2015
  worker threads: 10 of 16 idle, 6/0/0/0 working, job queue: 0/0/0/0, scheduled: 3
  loaded plugins: charon unbound aes des blowfish rc2 sha1 sha2 md4 md5 random nonce x509 revocation constraints pubkey pkcs1 pkcs7 pkcs8 pkcs12 pgp dnskey sshkey ipseckey pem openssl fips-prf xcbc cmac hmac curl attr kernel-pfkey kernel-pfroute resolve socket-default stroke smp updown eap-identity eap-sim eap-md5 eap-mschapv2 eap-dynamic eap-radius eap-tls eap-ttls eap-peap xauth-generic xauth-eap whitelist addrblock unity
Listening IP addresses:
  X.X.X.X
  X.X.X.X
  X.X.X.X
  X.X.X.X
Connections:
  bypasslan: %any...%any IKEv1/2
  bypasslan: local: uses public key authentication
  bypasslan: remote: uses public key authentication
  bypasslan: child: X.X.X.X/X|/0 === X.X.X.X/X|/0 PASS
  con1: X.X.X.X...X.X.X.X IKEv2, dpddelay=10s
  con1: local: [X.X.X.X] uses pre-shared key authentication
  con1: remote: [X.X.X.X] uses pre-shared key authentication
  con1: child: X.X.X.X/X|/0 === X.X.X.X/X|/0 X.X.X.X/X|/0 TUNNEL, dpdaction=restart
Shunted Connections:
  bypasslan: X.X.X.X/X|/0 === X.X.X.X/X|/0 PASS
Routed Connections:
  con1{2233}: ROUTED, TUNNEL, reqid 1
  con1{2233}: X.X.X.X/X|/0 === X.X.X.X/X|/0 X.X.X.X/X|/0
Security Associations (1 up, 0 connecting):
  con1[188]: ESTABLISHED 3 hours ago, X.X.X.X[X.X.X.X]...X.X.X.X[X.X.X.X]
  con1[188]: IKEv2 SPIs: 634c4b09653e75f6_i 3f096e7e6e227d29_r*, pre-shared key reauthentication in 4 hours
  con1[188]: IKE proposal: AES_CBC_256/HMAC_MD5_96/PRF_HMAC_MD5/MODP_1024
  con1{2232}: INSTALLED, TUNNEL, reqid 1, ESP SPIs: c42e8a95_i c67e08b4_o
  con1{2232}: AES_CBC_256/HMAC_MD5_96, 597807 bytes_i (7711 pkts, 0s ago), 10543976 bytes_o (9315 pkts, 1161s ago), rekeying in 17 minutes
  con1{2232}: X.X.X.X/X|/0 === X.X.X.X/X|/0 X.X.X.X/X|/0
  con1{2234}: INSTALLED, TUNNEL, reqid 1, ESP SPIs: c7e77372_i ced2ce23_o
  con1{2234}: AES_CBC_256/HMAC_MD5_96, 255838 bytes_i (3165 pkts, 0s ago), 1549768 bytes_o (3179 pkts, 0s ago), rekeying in 34 minutes
  con1{2234}: X.X.X.X/X|/0 === X.X.X.X/X|/0 X.X.X.X/X|/0
/var/etc/ipsec: top | grep charon
35874 root 17 20 0 497M 297M uwait 1 5:21 0.00% charon
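To put a number on "gradually eats up memory", the resident size in such top lines can be sampled over time. Below is a minimal Python sketch, not part of the original report. It assumes the FreeBSD top column order PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND and sizes suffixed with K, M or G. Both function names are made up for illustration.

```python
import re

# Multipliers that convert top's size suffixes to megabytes.
_UNIT_MB = {"K": 1 / 1024, "M": 1.0, "G": 1024.0}


def res_mb_from_top_line(line):
    """Return the RES column of a FreeBSD top(1) process line, in MB.

    Assumption: the columns are
    PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND,
    so RES is the seventh whitespace-separated field.
    """
    fields = line.split()
    match = re.fullmatch(r"(\d+)([KMG])", fields[6])
    if match is None:
        raise ValueError(f"unexpected RES field: {fields[6]!r}")
    value, unit = match.groups()
    return int(value) * _UNIT_MB[unit]


def growth_mb_per_hour(samples):
    """Average growth between the first and last (seconds, MB) sample."""
    (t0, rss0), (t1, rss1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("samples must span a positive time interval")
    return (rss1 - rss0) / (t1 - t0) * 3600.0
```

Feeding it one top line per interval, together with the time each line was captured, gives a growth rate that can be compared between builds or configurations.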
Associated revisions
History
#1 Updated by Tobias Brunner over 5 years ago
- Description updated (diff)
- Category changed from charon to freebsd
- Status changed from New to Feedback
Looks like you are using pfSense. I'm not aware of any memory leaks in charon. If there are any on this system, that could be due to a modification added by the pfSense guys (which I think are still not publicly available, so I can't comment on that). Or perhaps because of some FreeBSD peculiarity that we handle incorrectly (e.g. something related to the PF_KEY or UDP sockets). If this is a classic memory leak (i.e. caused by not freeing allocated memory), it would be helpful to know where. As I don't think our own leak detective works on FreeBSD, running charon in valgrind might be the best option to find it.
#2 Updated by Adam Piasecki over 5 years ago
It is pfSense; it seems at least a couple of pfSense users are running into this same issue. Thanks for the tips, I'll probably install a fresh FreeBSD/strongSwan and see if I can get it to act like the pfSense version.
#3 Updated by Chris Buechler over 5 years ago
Didn't realize someone had opened a ticket here. I've confirmed the same memory leak occurs on stock FreeBSD using the strongswan available via 'pkg install'.
The port we build strongswan from, along with changes, is here:
https://github.com/pfsense/FreeBSD-ports/tree/devel/security/strongswan
The leak is not specific to anything we're doing there, though. Something there might make some situations worse, but it is replicable on stock FreeBSD and stock strongSwan.
Bringing up 400 connections like the following, changing only left/rightsubnet on subsequent ones:
conn con1
    fragmentation = yes
    keyexchange = ikev1
    reauth = yes
    forceencaps = no
    mobike = no
    rekey = yes
    installpolicy = yes
    type = tunnel
    dpdaction = restart
    dpddelay = 10s
    dpdtimeout = 60s
    auto = route
    left = 10.2.44.114
    right = 10.2.44.168
    leftid = 10.2.44.114
    ikelifetime = 7200s
    lifetime = 3600s
    ike = aes256-sha1-modp1024!
    leftauth = psk
    rightauth = psk
    rightid = 10.2.44.168
    aggressive = no
    rightsubnet = 100.66.1.1
    leftsubnet = 100.65.1.1
and letting them rekey, it leaks around 4 MB RAM per hour.
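For anyone trying to reproduce this, the 400 near-identical conn sections can be generated instead of written by hand. The Python sketch below is only an illustration. The common options are copied from the conn above, but the way the subnets change between conns (one address higher each time) is an assumption, since the comment does not say how they were varied. `make_conns` is a made-up helper name.

```python
import ipaddress

# Options shared by every conn, copied from the sample conn above.
COMMON_OPTIONS = [
    ("fragmentation", "yes"), ("keyexchange", "ikev1"), ("reauth", "yes"),
    ("forceencaps", "no"), ("mobike", "no"), ("rekey", "yes"),
    ("installpolicy", "yes"), ("type", "tunnel"), ("dpdaction", "restart"),
    ("dpddelay", "10s"), ("dpdtimeout", "60s"), ("auto", "route"),
    ("left", "10.2.44.114"), ("right", "10.2.44.168"),
    ("leftid", "10.2.44.114"), ("ikelifetime", "7200s"),
    ("lifetime", "3600s"), ("ike", "aes256-sha1-modp1024!"),
    ("leftauth", "psk"), ("rightauth", "psk"), ("rightid", "10.2.44.168"),
    ("aggressive", "no"),
]


def make_conns(count, left_base="100.65.1.1", right_base="100.66.1.1"):
    """Return ipsec.conf text with `count` conn sections.

    Assumption: each conn uses the next address after the previous one
    for leftsubnet/rightsubnet; the original test may have differed.
    """
    left = ipaddress.IPv4Address(left_base)
    right = ipaddress.IPv4Address(right_base)
    sections = []
    for i in range(count):
        lines = [f"conn con{i + 1}"]
        lines += [f"    {key} = {value}" for key, value in COMMON_OPTIONS]
        lines.append(f"    rightsubnet = {right + i}")
        lines.append(f"    leftsubnet = {left + i}")
        sections.append("\n".join(lines))
    return "\n\n".join(sections) + "\n"
```

Writing `make_conns(400)` to a file that ipsec.conf includes should give a setup of the same shape as the one described.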
Any tips on running charon in valgrind? I'm failing with attempts along the lines of the following.
# valgrind --tool=memcheck --leak-check=full --show-leak-kinds=all --track-origins=yes ipsec start
==11175== Memcheck, a memory error detector
==11175== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==11175== Using Valgrind-3.10.0 and LibVEX; rerun with -h for copyright info
==11175== Command: /usr/local/sbin/ipsec start
==11175==
valgrind: m_syswrap/syswrap-freebsd.c:3302 (vgSysWrap_freebsd_sys_fcntl_before): Assertion 'Unimplemented functionality' failed.
valgrind: valgrind
host stacktrace:
==11175== at 0x38052372: ??? (in /usr/local/lib/valgrind/memcheck-amd64-freebsd)
==11175== by 0x38052494: ??? (in /usr/local/lib/valgrind/memcheck-amd64-freebsd)
==11175== by 0x38052616: ??? (in /usr/local/lib/valgrind/memcheck-amd64-freebsd)
==11175== by 0x380B6B00: ??? (in /usr/local/lib/valgrind/memcheck-amd64-freebsd)
==11175== by 0x380A2EE5: ??? (in /usr/local/lib/valgrind/memcheck-amd64-freebsd)
==11175== by 0x3809F81C: ??? (in /usr/local/lib/valgrind/memcheck-amd64-freebsd)
==11175== by 0x380A0C66: ??? (in /usr/local/lib/valgrind/memcheck-amd64-freebsd)
==11175== by 0x380ADEDC: ??? (in /usr/local/lib/valgrind/memcheck-amd64-freebsd)
sched status:
running_tid=1
Thread 1: status = VgTs_Runnable
==11175== at 0x518D63A: _fcntl (in /lib/libc.so.7)
==11175== by 0x50A9ED1: fcntl (in /lib/libc.so.7)
==11175== by 0x40D0B0: ??? (in /bin/sh)
==11175== by 0x41204C: ??? (in /bin/sh)
==11175== by 0x4105A1: ??? (in /bin/sh)
==11175== by 0x402EDE: ??? (in /bin/sh)
==11175== by 0x4020FFF: ???
or similar to this post: https://lists.strongswan.org/pipermail/dev/2015-July/001415.html
#4 Updated by Tobias Brunner over 5 years ago
I've confirmed the same memory leak occurs on stock FreeBSD using the strongswan available via 'pkg install'.
OK, thanks for the confirmation.
The port we build strongswan from, along with changes, is here:
https://github.com/pfsense/FreeBSD-ports/tree/devel/security/strongswan
Thanks, good to know.
Bringing up 400 connections like the following, changing only left/rightsubnet on subsequent ones:
[...]
and letting them rekey, it leaks around 4 MB RAM per hour.
Is it specific to rekeying (of the IKE_SA or the CHILD_SA)? Does the IKE version make a difference? (The OP uses IKEv2, so I guess not really.)
Any tips on running charon in valgrind? I'm failing with attempts along the lines of the following.
No idea what this particular problem is about, but even when running charon in valgrind directly it apparently doesn't work right away. We use sigwait(), which FreeBSD's valgrind does not support (unhandled syscall: 429, followed by [DMN] error -1 while waiting for a signal), so this requires patching either valgrind or charon.
#5 Updated by Tobias Brunner over 5 years ago
- File sigwait.patch sigwait.patch added
- File valgrind-initial.log valgrind-initial.log added
- File valgrind-rekeyings.log valgrind-rekeyings.log added
The attached patch fixes the sigwait() issue. And by modifying source:src/starter/invokecharon.c#L129 (similar to the GDB stuff) it is possible to run charon with ipsec.conf-based configs in valgrind on FreeBSD.
I did some tests on FreeBSD 10.2 and while I can confirm the RSS increase in ps/top for each CHILD_SA rekeying, I don't find any memory leaks with valgrind. At least not when I compile with --with-printf-hooks=builtin; the default (glibc, which is also supported by FreeBSD's libc) seems to cause leaks for our printf hook callbacks.
The two attached log files are from two runs of charon in valgrind with --leak-check=full --show-leak-kinds=all --undef-value-errors=no. valgrind-initial.log was taken after the initial CHILD_SA got established. valgrind-rekeyings.log was taken in a second run after four additional rekeyings. They are exactly the same except for the changed number of allocations/frees and one moved block. So there do not seem to be any leaks caused by the rekeyings (or any other leaks for that matter). However, when I do the same thing without valgrind I can see the RSS increase in ps/top with each rekeying. I currently don't know what causes this. Maybe it's an issue with FreeBSD's memory management (fragmentation?).
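A comparison like the one described for the two logs can be partly automated by diffing the LEAK SUMMARY totals of two runs. The Python sketch below is an illustration only and was not used for the attached logs. It assumes memcheck's usual summary lines of the form "==PID==    definitely lost: N bytes in M blocks". Both function names are made up.

```python
import re

# One LEAK SUMMARY line, e.g. "==123==    still reachable: 1,024 bytes in 3 blocks".
_SUMMARY_LINE = re.compile(
    r"^==\d+==\s+(definitely lost|indirectly lost|possibly lost|"
    r"still reachable|suppressed):\s+([\d,]+) bytes in ([\d,]+) blocks",
    re.MULTILINE,
)


def parse_leak_summary(log_text):
    """Map each leak kind to (bytes, blocks).

    If the log contains several summaries, the last one wins.
    """
    summary = {}
    for kind, nbytes, nblocks in _SUMMARY_LINE.findall(log_text):
        summary[kind] = (
            int(nbytes.replace(",", "")),
            int(nblocks.replace(",", "")),
        )
    return summary


def summary_growth(before, after):
    """Return {kind: (byte delta, block delta)} for kinds whose totals differ."""
    growth = {}
    for kind in sorted(set(before) | set(after)):
        b_bytes, b_blocks = before.get(kind, (0, 0))
        a_bytes, a_blocks = after.get(kind, (0, 0))
        if (a_bytes, a_blocks) != (b_bytes, b_blocks):
            growth[kind] = (a_bytes - b_bytes, a_blocks - b_blocks)
    return growth
```

An empty result from `summary_growth` for an initial run and a run with extra rekeyings would match the observation that the rekeyings add no leaked blocks. It says nothing about RSS growth that valgrind does not count as a leak.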
I'm also not sure what the leaks related to the printf hooks are about, but it seems to be an issue in FreeBSD's libc related to register_printf_function() when used with [v]snprintf.
#6 Updated by Jim Thompson about 5 years ago
Hi,
I'm working with Chris on this issue. All testing is occurring on freebsd. Once we find the issue, we'll retest on pfsense.
The leak shows up (but not as badly) with --with-printf-hooks=vstr as well.
I need to go back and re-test with --with-printf-hooks=builtin
I strongly suspect a leak in at least the TS printf-hook ("%#R").
p.s. I think this "(which I think are still not publicly available, so I can't comment on that)" was a bit unfair.
#7 Updated by Renato Botelho about 5 years ago
By default, the strongSwan FreeBSD port does not force --with-printf-hooks; it is left at 'auto', which ends up selecting the glibc version.
A test comparing vstr versus builtin would be a good experiment.
#8 Updated by Adam Piasecki about 5 years ago
This was fixed in the latest release of pfSense, 2.2.5: https://redmine.pfsense.org/issues/5149
#9 Updated by Chris Buechler about 5 years ago
Adam Piasecki wrote:
This was fixed in the latest release of pfSense, 2.2.5: https://redmine.pfsense.org/issues/5149
More worked around than fixed, but the worst of the leaks are no longer a problem at least. with-printf-hooks=vstr works around the bulk of the memory leaking, then we also switched from SMP to Vici for status output because vstr broke SMP output (and we were planning on switching to vici anyway in the future).
We also updated the FreeBSD port to default to with-printf-hooks=builtin. builtin was preferred as default there because it doesn't have the vstr dependency. builtin appears to leak significantly less than glibc, the former default, but still leaks noticeably in larger scale setups (multi-hundred connections) over time.
https://www.freshports.org/security/strongswan
#10 Updated by Noel Kuntze over 3 years ago
- Status changed from Feedback to Closed
#11 Updated by Renato Botelho over 3 years ago
Was the leak fixed? Is there a specific commit to point out?
#12 Updated by Tobias Brunner over 3 years ago
There is no leak.
Replace usages of sigwait(3) with sigwaitinfo(2)
This is basically the same call, but it has the advantage of being
supported by FreeBSD's valgrind, which sigwait() is not.
References #1106.
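As a rough illustration of why the two calls are interchangeable for this purpose, here is a small Python sketch. It is not the patch itself (charon is written in C). It assumes a platform where Python exposes signal.sigwaitinfo and Python 3.8 or later for signal.raise_signal. `wait_both_ways` is a made-up name.

```python
import signal


def wait_both_ways(signum=signal.SIGUSR1):
    """Collect the same signal once via sigwait() and once via sigwaitinfo()."""
    sigset = {signum}
    # Both calls are meant for signals that are blocked in the calling thread.
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, sigset)
    try:
        # The signal stays pending because it is blocked.
        signal.raise_signal(signum)
        via_sigwait = signal.sigwait(sigset)  # returns the signal number

        signal.raise_signal(signum)
        info = signal.sigwaitinfo(sigset)  # returns a siginfo structure
        return via_sigwait, info.si_signo
    finally:
        # Restore the signal mask that was active before the call.
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
```

Both waits report the same signal number; sigwaitinfo() merely returns it inside a structure that carries extra details.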