Project

General

Profile

Bug #990

unsafe use of malloc after fork

Added by Ulrich Weber over 5 years ago. Updated about 5 years ago.

Status:
Closed
Priority:
Normal
Category:
libstrongswan
Target version:
Start date:
11.06.2015
Due date:
Estimated time:
Affected version:
5.3.2
Resolution:
Fixed

Description

After upgrading strongwan to 5.2.2 (retried again with 5.3.0), we noticed deadlocks in charon from time to time.
Problem is within the new process handler (added in 5.2.1), calling malloc() after the fork in closefrom();

According to the following post this is not in compliance with POSIX 2008 (SUSv4):
http://lists.uclibc.org/pipermail/uclibc/2011-March/045130.html

For more information about multithreaded applications calling fork:
http://www.linuxprogrammingblog.com/threads-and-fork-think-twice-before-using-them

We use uClibc 0.9.33, where only one mutex protects the malloc area and deadlocks occur quite soon after running charon.
I havent tested strongswan with glibc, but this should happen there too. However glibc uses memory allocation arenas
for multithreaded applications, so its not so likely to happen as with uClibc.

pthread_malloc_fork_stress_test.c (750 Bytes) pthread_malloc_fork_stress_test.c Ulrich Weber, 11.06.2015 10:03

Associated revisions

Revision b410d7f8 (diff)
Added by Tobias Brunner over 5 years ago

utils: Don't use directory enumerator to close open FDs in closefrom()

Calling malloc() after fork() is potentially unsafe, so we should avoid
it if possible. opendir() will still require an allocation but that's
less than the variant using the enumerator wrapper, thus, decreasing
the conflict potential. This way we can also avoid closing the
FD for the enumerated directory itself.

References #990.

Revision f25f4192 (diff)
Added by Tobias Brunner over 5 years ago

utils: Directly use syscall() to close open FDs in closefrom()

This avoids any allocations, since calling malloc() after fork() is
potentially unsafe.

Fixes #990.

History

#1 Updated by Ulrich Weber over 5 years ago

Attached stress test program to test libc implementation.
Segfaults after some time on my ubuntu workstation with glibc.

#2 Updated by Tobias Brunner over 5 years ago

  • Tracker changed from Issue to Bug
  • Status changed from New to Feedback

I can't reproduce a crash with your test program on an Ubuntu 14.04 machine by the way (ran fine for over an hour). As far as I could determine it should be safe to call malloc() after fork() with glibc (via malloc_atfork()).

Nevertheless, this is definitely not ideal. I now saw that some implementations (e.g. Pyhton's) directly use syscall() to enumerate the FD's in /proc to avoid malloc(), and there was a discussion on LML about a possible nextfd() syscall to simplify the implementation of closefrom() a few years ago (which didn't amount to anything it seems).

Doing something similar to Python is definitely possible, the patch in the closefrom-fix branch does so. I suppose we could remove the unsafe fallback and just resort to closing all possible FDs (which could be quite slow if the maximum was increased on a specific system). But that implementation would need some fixing anyway (e.g. if opendir() uses an FD itself, maybe directly using opendir() and dirfd() could fix that - still opendir() will use malloc()).

#3 Updated by Ulrich Weber over 5 years ago

My workstation is ubuntu 14.04 too, 64 bit on a quad core i5-3470S. Try to stop and start the program if it not crashes after 5 minutes.
Had one run where id didnt crash too.

merlin@gir:~$ time ./a.out 
a.out: malloc.c:2372: sysmalloc: Assertion `(old_top == (((mbinptr) (((char *) &((av)->bins[((1) - 1) * 2])) - __builtin_offsetof (struct malloc_chunk, fd)))) && old_size == 0) || ((unsigned long) (old_size) >= (unsigned long)((((__builtin_offsetof (struct malloc_chunk, fd_nextsize))+((2 *(sizeof(size_t))) - 1)) & ~((2 *(sizeof(size_t))) - 1))) && ((old_top)->size & 0x1) && ((unsigned long) old_end & pagemask) == 0)' failed.
real    0m22.105s
user    0m3.186s
sys    0m49.410s

merlin@gir:~$ time ./a.out 
a.out: malloc.c:2372: sysmalloc: Assertion `(old_top == (((mbinptr) (((char *) &((av)->bins[((1) - 1) * 2])) - __builtin_offsetof (struct malloc_chunk, fd)))) && old_size == 0) || ((unsigned long) (old_size) >= (unsigned long)((((__builtin_offsetof (struct malloc_chunk, fd_nextsize))+((2 *(sizeof(size_t))) - 1)) & ~((2 *(sizeof(size_t))) - 1))) && ((old_top)->size & 0x1) && ((unsigned long) old_end & pagemask) == 0)' failed.
Segmentation fault (core dumped)
real    3m6.941s
user    0m35.719s
sys    6m45.173s

merlin@gir:~$ time ./a.out 
*** Error in `./a.out': double free or corruption (out): 0x00007fde780008c0 ***
Aborted (core dumped)
real    1m35.279s
user    0m17.237s
sys    3m39.286s

merlin@gir:~$ time ./a.out 
*** Error in `./a.outSegmentation fault (core dumped)
real    0m38.299s
user    0m7.267s
sys    1m25.885s

#4 Updated by Tobias Brunner over 5 years ago

My workstation is ubuntu 14.04 too, 64 bit on a quad core i5-3470S. Try to stop and start the program if it not crashes after 5 minutes.

Even multiple instances ran fine here (Quad Core Xeon E3-1230 with HT, 8 GB Ram) for quite a long time. But one instance now eventually crashed with a segmentation fault (after 82 minutes). So, yes, it seems even glibc is not 100% safe.

#5 Updated by Tobias Brunner about 5 years ago

  • Status changed from Feedback to Closed
  • Assignee set to Tobias Brunner
  • Target version set to 5.3.3
  • Resolution set to Fixed

Also available in: Atom PDF