Bug #990
unsafe use of malloc after fork
Description
After upgrading strongwan to 5.2.2 (retried again with 5.3.0), we noticed deadlocks in charon from time to time.
Problem is within the new process handler (added in 5.2.1), calling malloc() after the fork in closefrom();
According to the following post this is not in compliance with POSIX 2008 (SUSv4):
http://lists.uclibc.org/pipermail/uclibc/2011-March/045130.html
For more information about multithreaded applications calling fork:
http://www.linuxprogrammingblog.com/threads-and-fork-think-twice-before-using-them
We use uClibc 0.9.33, where only one mutex protects the malloc area and deadlocks occur quite soon after running charon.
I havent tested strongswan with glibc, but this should happen there too. However glibc uses memory allocation arenas
for multithreaded applications, so its not so likely to happen as with uClibc.
History
#1 Updated by Ulrich Weber about 10 years ago
Attached stress test program to test libc implementation.
Segfaults after some time on my ubuntu workstation with glibc.
#2 Updated by Tobias Brunner about 10 years ago
- Tracker changed from Issue to Bug
- Status changed from New to Feedback
I can't reproduce a crash with your test program on an Ubuntu 14.04 machine by the way (ran fine for over an hour). As far as I could determine it should be safe to call malloc()
after fork()
with glibc (via malloc_atfork()
).
Nevertheless, this is definitely not ideal. I now saw that some implementations (e.g. Pyhton's) directly use syscall()
to enumerate the FD's in /proc
to avoid malloc()
, and there was a discussion on LML about a possible nextfd() syscall to simplify the implementation of closefrom()
a few years ago (which didn't amount to anything it seems).
Doing something similar to Python is definitely possible, the patch in the closefrom-fix branch does so. I suppose we could remove the unsafe fallback and just resort to closing all possible FDs (which could be quite slow if the maximum was increased on a specific system). But that implementation would need some fixing anyway (e.g. if opendir()
uses an FD itself, maybe directly using opendir()
and dirfd()
could fix that - still opendir()
will use malloc()
).
#3 Updated by Ulrich Weber about 10 years ago
My workstation is ubuntu 14.04 too, 64 bit on a quad core i5-3470S. Try to stop and start the program if it not crashes after 5 minutes.
Had one run where id didnt crash too.
merlin@gir:~$ time ./a.out a.out: malloc.c:2372: sysmalloc: Assertion `(old_top == (((mbinptr) (((char *) &((av)->bins[((1) - 1) * 2])) - __builtin_offsetof (struct malloc_chunk, fd)))) && old_size == 0) || ((unsigned long) (old_size) >= (unsigned long)((((__builtin_offsetof (struct malloc_chunk, fd_nextsize))+((2 *(sizeof(size_t))) - 1)) & ~((2 *(sizeof(size_t))) - 1))) && ((old_top)->size & 0x1) && ((unsigned long) old_end & pagemask) == 0)' failed. real 0m22.105s user 0m3.186s sys 0m49.410s merlin@gir:~$ time ./a.out a.out: malloc.c:2372: sysmalloc: Assertion `(old_top == (((mbinptr) (((char *) &((av)->bins[((1) - 1) * 2])) - __builtin_offsetof (struct malloc_chunk, fd)))) && old_size == 0) || ((unsigned long) (old_size) >= (unsigned long)((((__builtin_offsetof (struct malloc_chunk, fd_nextsize))+((2 *(sizeof(size_t))) - 1)) & ~((2 *(sizeof(size_t))) - 1))) && ((old_top)->size & 0x1) && ((unsigned long) old_end & pagemask) == 0)' failed. Segmentation fault (core dumped) real 3m6.941s user 0m35.719s sys 6m45.173s merlin@gir:~$ time ./a.out *** Error in `./a.out': double free or corruption (out): 0x00007fde780008c0 *** Aborted (core dumped) real 1m35.279s user 0m17.237s sys 3m39.286s merlin@gir:~$ time ./a.out *** Error in `./a.outSegmentation fault (core dumped) real 0m38.299s user 0m7.267s sys 1m25.885s
#4 Updated by Tobias Brunner about 10 years ago
My workstation is ubuntu 14.04 too, 64 bit on a quad core i5-3470S. Try to stop and start the program if it not crashes after 5 minutes.
Even multiple instances ran fine here (Quad Core Xeon E3-1230 with HT, 8 GB Ram) for quite a long time. But one instance now eventually crashed with a segmentation fault (after 82 minutes). So, yes, it seems even glibc is not 100% safe.
#5 Updated by Tobias Brunner about 10 years ago
- Status changed from Feedback to Closed
- Assignee set to Tobias Brunner
- Target version set to 5.3.3
- Resolution set to Fixed