
Issue #1276

Threading: ext-auth hook blocks any other connection attempt

Added by Michael Schmoock over 4 years ago. Updated over 4 years ago.

Status:
Feedback
Priority:
Normal
Assignee:
-
Category:
charon
Affected version:
5.3.5
Resolution:

Description

When using an ext-auth script that performs blocking I/O (e.g. external web API authorization requests),
it blocks all other connection attempts until the script returns.

One can easily reproduce this by adding an ext-auth.sh script containing just "sleep 1".
This effectively limits (parallel) connection establishment to one per second.

Without an ext-auth script, the connection of 10 parallel clients takes about 200 milliseconds:

[root@vpntest3 ipsec-server.git]# time ipsec load-tester initiate 10
..........++++++++++
real    0m0.264s
user    0m0.006s
sys     0m0.003s

Using an ext-auth.sh that contains just a one-second sleep, the required time increases to 10 seconds instead of the roughly 1.2 seconds (the ~0.2 s baseline plus one overlapping 1 s sleep) it would take if the hook didn't block other threads:

[root@vpntest3 ipsec-server.git]# time ipsec load-tester initiate 10
..........+*********+++++++++
real    0m10.390s
user    0m0.006s
sys     0m0.004s

Using ext-auth for additional authorization checks seems like a fairly common use case to me; it should not block other parallel connection attempts.
Is there a technical reason for this?

History

#1 Updated by Michael Schmoock over 4 years ago

Update: This has been tested using IKEv2

#2 Updated by Michael Schmoock over 4 years ago

Update 2: after extensive testing, the problem also applies to IKEv1 and does not go away with various config options. It seems to be a code issue within the job scheduler.

Any advice on how I can fix this myself? I'm not familiar with the codebase yet.

#3 Updated by Noel Kuntze over 4 years ago

Assuming that you have not tuned the settings yet: there is a single lock on the ike_sa table that prevents any other thread from handling requests concurrently.
You can tune your settings and see if that improves the situation.

#4 Updated by Michael Schmoock over 4 years ago

I have optimized things a bit, but the problem remains.
It runs on a 2x12-core Xeon. This is my strongswan.conf:

charon {
    load_modular = yes
    threads = 96
    ikesa_table_size = 1024
    ikesa_table_segments = 32
    dos_protection = no
    plugins {
        include strongswan.d/charon/*.conf
        ext-auth {
            load = yes
            script = /root/ipsec-server.git/scripts/extauth.sh
        }
        stroke {
            max_concurrent = 10
            timeout = 1000
        }
        load-tester {
            enable = yes
        }
    }
}

The ipsec.conf is pretty much default IKEv2 pubkey auth with an IPv4 virtual IP pool.
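
For reference, a minimal ipsec.conf along those lines might look like this (illustrative values, not the exact configuration used here):

conn ikev2-pubkey
    keyexchange=ikev2
    leftcert=serverCert.pem
    leftauth=pubkey
    leftsubnet=0.0.0.0/0
    rightauth=pubkey
    rightsourceip=10.0.1.0/24
    auto=add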

#5 Updated by Michael Schmoock over 4 years ago

It's not fixed, but I have been able to identify the lock that is causing the pain.
For the ext-auth plugin, an authorization listener is called within the bus, in its authorize function.
Before the listener is called (synchronously), the bus's mutex is locked.

I don't know the consequences yet, but at least my modified strongSwan is able to authenticate
multiple connections in parallel. Before the change it managed about 1 connection per second;
after the modification it was about 10 per second.

@@ -878,11 +878,11 @@ METHOD(bus_t, authorize, bool,

        ike_sa = this->thread_sa->get(this->thread_sa);

-       this->mutex->lock(this->mutex);
+       //this->mutex->lock(this->mutex);
        enumerator = this->listeners->create_enumerator(this->listeners);
        while (enumerator->enumerate(enumerator, &entry))
        {
-               if (entry->calling || !entry->listener->authorize)
+               if (/*entry->calling ||*/ !entry->listener->authorize)
                {
                        continue;
                }
@@ -900,7 +900,7 @@ METHOD(bus_t, authorize, bool,
                }
        }
        enumerator->destroy(enumerator);
-       this->mutex->unlock(this->mutex);
+       //this->mutex->unlock(this->mutex);
        if (!success)
        {
                alert(this, ALERT_AUTHORIZATION_FAILED);

What's happening here? I disabled the bus lock when calling ext-auth. Additionally, I needed to remove the "calling" check, since it would otherwise prevent the listener from being called in parallel at all. Maybe some active strongSwan developer can give me some hints. I know that the solution is not clean (yet), but I'm also sure the authorize listener should not block ALL other connection attempts to a responder when using an ike_sa hashtable of 1024/32 (ikesa_table_size/ikesa_table_segments).

Not only did this increase connection establishments per second, it also raised the maximum number of stable connections from ~1300 to ~2200 on the smaller test setup I was using that day.

#6 Updated by Michael Schmoock over 4 years ago

I dove a bit deeper into the code. It turns out that the bus locks out every other listener invocation (not just connection/authentication attempts) whenever a listener is called. The reason for this is that listeners can be removed and added at run-time (systemctl reload, config reload, etc.), and therefore the operations are synchronized.

Furthermore, a listener implementation returns TRUE/FALSE to tell the bus whether the listener instance should stay registered or be removed after execution. However, the 'normal' listener classes always return TRUE (unconditionally), meaning a listener of that kind never removes itself after execution; these calls could therefore all take read locks and run in parallel (see the sketch after this list), e.g.:

- updown_listener.c:407
- ext_auth_listener.c:176
- whitelist_listener.c:100
- xauth_pam_listener.c:113
- ... even the high-availability ha_ike plugin always returns TRUE
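
For illustration, a minimal sketch of such a callback, using the authorize() hook signature from listener_t (private_my_listener_t and check_externally() are made-up names):

METHOD(listener_t, authorize, bool,
    private_my_listener_t *this, ike_sa_t *ike_sa, bool final, bool *success)
{
    if (final)
    {
        /* potentially blocking I/O happens here, e.g. running an external
         * script; while it runs, the bus mutex stays locked for everyone */
        *success = check_externally(ike_sa);
    }
    /* TRUE keeps the listener registered; FALSE would unregister it */
    return TRUE;
}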

The drawback of the current implementation is that multi-threading performance becomes really slow whenever a blocking-I/O listener (ext-auth, OCSP, RADIUS, updown, ...) is called from any thread.

I suggest the following change: introduce read/write locks for the bus listener structure. Read locks can be acquired in parallel from any thread; write locks have to be synchronized across all threads, waiting for all outstanding read locks to be released. The listener interface can be extended so that, before a listener is actually executed, it can tell the bus whether it is a synchronous or asynchronous listener (operating on several or on a single IKE_SA) and whether it may remove itself after execution.
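
A self-contained sketch of that locking pattern, using plain POSIX rwlocks for illustration (strongSwan ships its own rwlock_t wrapper in threading/rwlock.h): with a shared read lock, ten threads that each block for a second in a listener finish in about one second total instead of ten.

/* compile with: cc -pthread rwlock_demo.c */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_rwlock_t bus_lock = PTHREAD_RWLOCK_INITIALIZER;

static void *invoke_listener(void *arg)
{
    pthread_rwlock_rdlock(&bus_lock);   /* shared: many readers at once */
    printf("thread %ld: blocking listener running\n", (long)arg);
    sleep(1);                           /* stands in for the I/O-bound hook */
    pthread_rwlock_unlock(&bus_lock);
    return NULL;
}

static void modify_listeners(void)
{
    pthread_rwlock_wrlock(&bus_lock);   /* exclusive: waits for all readers */
    /* ...add or remove listener entries here... */
    pthread_rwlock_unlock(&bus_lock);
}

int main(void)
{
    pthread_t threads[10];
    long i;

    for (i = 0; i < 10; i++)
    {
        pthread_create(&threads[i], NULL, invoke_listener, (void*)i);
    }
    modify_listeners();
    for (i = 0; i < 10; i++)
    {
        pthread_join(threads[i], NULL);
    }
    return 0;   /* total runtime ~1s instead of ~10s with a plain mutex */
}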

These changes can improve performance by a factor of 2 to 10 (depending on the listener configuration). The bus.c code is rather old; I wonder whether I have missed the point somewhere, or whether it's just that nobody uses blocking-I/O listeners much (OCSP, request/response RADIUS, ext-auth scripts, etc.). It seems like a regular use case to me.

Here are the results showing how fast the connection of 10 parallel clients with an asynchronous but blocking-I/O hook should be:

time ipsec load-tester initiate 10
..........++++++++++
real    0m1.023s
user    0m0.007s
sys     0m0.004s

#7 Updated by Tobias Brunner over 4 years ago

  • Status changed from New to Feedback
  • Priority changed from High to Normal

The reason for this is that listeners can be removed and added at run-time (systemctl reload, config reload, etc.), and therefore the operations are synchronized.

That's actually not the main reason. The main reason this strict locking exists is that, this way, listeners/plugins basically don't have to care about any locking or multi-thread synchronization themselves. They can be implemented as if there were only a single thread operating on a single IKE_SA.

However, the 'normal' listener classes always return TRUE (unconditionally).

There are a few such listeners in client implementations (controller_t, Android, NetworkManager, Maemo), and some of our users might use this functionality too (but it is not really an issue, as it could probably also be implemented without such restrictive locking).

The drawback of the current implementation is that multi-threading performance becomes really slow whenever a blocking-I/O listener (ext-auth, OCSP, RADIUS, updown, ...) is called from any thread.

Yes, the current implementation definitely has its drawbacks, and this is a known issue. I personally would very much like to change this so that multiple threads can call methods on bus_t for different IKE_SAs concurrently. However, it would require us to look closely at ALL implementations of listener_t and make them thread-safe. That would be quite a lot of work and could easily lead to subtle race conditions or even deadlocks. Another option would be to do this just for some of the callbacks (e.g. authorize) and use separate managers/interfaces for these with different semantics, as sketched below.
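
A rough sketch of what such a dedicated interface could look like (all names below are hypothetical; this is not existing strongSwan API):

typedef struct authorizer_t authorizer_t;

struct authorizer_t {
    /* called concurrently for different IKE_SAs; implementations must
     * therefore be thread-safe, unlike regular bus_t listeners */
    bool (*authorize)(authorizer_t *this, ike_sa_t *ike_sa, bool final);
};

typedef struct authorize_manager_t authorize_manager_t;

struct authorize_manager_t {
    /* invokes all registered authorizers under a shared read lock */
    bool (*authorize)(authorize_manager_t *this, ike_sa_t *ike_sa,
                      bool final);
    /* registration/removal takes the exclusive write lock */
    void (*add_authorizer)(authorize_manager_t *this, authorizer_t *a);
    void (*remove_authorizer)(authorize_manager_t *this, authorizer_t *a);
};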

Also, OCSP does not block users of bus_t, as it uses a different interface. The same applies to RADIUS authentication (although, like OCSP, it does block things like shutting down the daemon, because the IKE_SA is locked). RADIUS accounting is problematic in some cases, though (not for interim updates, but for start/stop messages).

#8 Updated by Michael Schmoock over 4 years ago

If you need me/us to improve the code a bit, just tell me. For the moment I'm going to use a patched version of strongSwan for our client. Especially the high-availability plugin synchronizes on the bus_t; I can't imagine a setup where the HA plugin can deliver acceptable performance.
