Project

General

Profile

Bug #632

MOBIKE issue when multiple tasks are queued

Added by Andre Valentin over 6 years ago. Updated about 6 years ago.

Status:
Closed
Priority:
Normal
Category:
libcharon
Target version:
Start date:
04.07.2014
Due date:
Estimated time:
Affected version:
5.1.3
Resolution:
Fixed

Description

At first I have to thank you for this great software. I'm currently using it in a chroot environement on my cellphone to establish a vpn connection. I have android 4.4 with the annouying bug, that's why I use this approach.
The setup:
Strongswan is installed on a firewall.
The device roams between wifi (private connection to the firewall) and public mobile network.

That my ipsec.conf

config setup
        uniqueids=no

conn kal
        left=%any
        leftid=@android.XX.net
        rightallowany=yes
        ikelifetime=8h
        keylife=2h
        keyexchange=ikev2
        authby=secret
        compress=no
        rekey=yes
    mobike=yes
        keyingtries=%forever
        dpdaction=restart
    leftsourceip = %config4
        right=kal.sip.XX.net
        rightid=@kal.dyn.XX.net
        rightsubnet=0.0.0.0/0,::/0
        auto=start

Steps to produce the error:
  1. A connection via mobile network to strongswan ipsec server is created. Ping works
  2. The device roams to wifi. Ping works
  3. The device roams back to mobile network. Ping does not work.

If I start with the wifi connection in step 1 it's vice versa.

Look at this:

Jul  3 22:03:04 14[KNL] creating roam job due to address/link change
Jul  3 22:03:04 14[MGR] checkout IKE_SA
Jul  3 22:03:04 14[MGR] IKE_SA kal[2] successfully checked out
Jul  3 22:03:04 14[KNL] using 37.85.26.21 as address to reach 84.139.36.183/32
Jul  3 22:03:04 14[IKE] old path is not available anymore, try to find another
Jul  3 22:03:04 14[IKE] looking for a route to 84.139.36.183 ...
Jul  3 22:03:04 14[KNL] using 37.85.26.21 as address to reach 84.139.36.183/32
Jul  3 22:03:04 14[IKE] requesting address change using MOBIKE
Jul  3 22:03:04 14[IKE] delaying task initiation, INFORMATIONAL exchange in progress
Jul  3 22:03:04 14[MGR] checkin IKE_SA kal[2]
Jul  3 22:03:04 14[MGR] check-in of IKE_SA successful.
Jul  3 22:03:04 09[KNL] virtual IP 10.150.242.130 is on interface ipsec0
Jul  3 22:03:04 09[KNL] virtual IP 10.150.242.130 is on interface ipsec0
Jul  3 22:03:04 09[KNL] virtual IP 10.150.242.130 is on interface ipsec0
Jul  3 22:03:04 09[KNL] virtual IP 10.150.242.130 is on interface ipsec0
Jul  3 22:03:04 09[KNL] virtual IP 10.150.242.130 is on interface ipsec0
Jul  3 22:03:04 09[KNL] virtual IP 10.150.242.130 is on interface ipsec0
Jul  3 22:03:04 09[KNL] virtual IP 10.150.242.130 is on interface ipsec0
Jul  3 22:03:04 09[KNL] virtual IP 10.150.242.130 is on interface ipsec0
Jul  3 22:03:04 09[KNL] virtual IP 10.150.242.130 is on interface ipsec0
Jul  3 22:03:04 09[KNL] virtual IP 10.150.242.130 is on interface ipsec0
Jul  3 22:03:04 04[JOB] watched FD 16 ready to read
Jul  3 22:03:04 04[JOB] watcher going to select()
Jul  3 22:03:04 01[JOB] next event in 99ms, waiting
Jul  3 22:03:04 04[JOB] watcher got notification, rebuilding
Jul  3 22:03:04 04[JOB] watcher going to select()
Jul  3 22:03:04 02[NET] sending packet: from 192.168.203.77[4500] to 84.139.36.183[4500]
Jul  3 22:03:04 02[NET] error writing to socket: Invalid argument
Jul  3 22:03:04 01[JOB] got event, queuing job for execution
Jul  3 22:03:04 01[JOB] next event in 1s 392ms, waiting
Jul  3 22:03:04 12[KNL] creating roam job due to route change
Jul  3 22:03:04 12[MGR] checkout IKE_SA
Jul  3 22:03:04 12[MGR] IKE_SA kal[2] successfully checked out
Jul  3 22:03:04 12[KNL] using 37.85.26.21 as address to reach 84.139.36.183/32
Jul  3 22:03:04 12[IKE] old path is not available anymore, try to find another
Jul  3 22:03:04 12[IKE] looking for a route to 84.139.36.183 ...
Jul  3 22:03:04 12[KNL] using 37.85.26.21 as address to reach 84.139.36.183/32
Jul  3 22:03:04 12[IKE] requesting address change using MOBIKE
Jul  3 22:03:04 12[IKE] delaying task initiation, INFORMATIONAL exchange in progress
Jul  3 22:03:04 12[MGR] checkin IKE_SA kal[2]
Jul  3 22:03:04 12[MGR] check-in of IKE_SA successful.
Jul  3 22:03:04 02[NET] sending packet: from 192.168.203.77[4500] to 84.139.36.183[4500]
Jul  3 22:03:04 02[NET] error writing to socket: Invalid argument
Jul  3 22:03:05 02[NET] sending packet: from 192.168.203.77[4500] to 84.139.36.183[4500]
Jul  3 22:03:05 02[NET] error writing to socket: Invalid argument
Jul  3 22:03:05 02[NET] sending packet: from 192.168.203.77[4500] to 84.139.36.183[4500]
Jul  3 22:03:05 02[NET] error writing to socket: Invalid argument
Jul  3 22:03:05 02[NET] sending packet: from 192.168.203.77[4500] to 84.139.36.183[4500]
Jul  3 22:03:05 02[NET] error writing to socket: Invalid argument
Jul  3 22:03:06 01[JOB] got event, queuing job for execution
Jul  3 22:03:06 01[JOB] next event in 22s 567ms, waiting
Jul  3 22:03:06 16[MGR] checkout IKE_SA
Jul  3 22:03:06 16[MGR] IKE_SA kal[2] successfully checked out
Jul  3 22:03:06 16[IKE] path probing attempt 1
Jul  3 22:03:06 16[KNL] using 37.85.26.21 as address to reach 84.139.36.183/32
Jul  3 22:03:06 16[IKE] checking path 37.85.26.21[4500] - 84.139.36.183[4500]
Jul  3 22:03:06 16[NET] sending packet: from 37.85.26.21[4500] to 84.139.36.183[4500] (76 bytes)
Jul  3 22:03:06 16[KNL] using ::1 as address to reach 2003:68:2d7f:b7c3:f10a:5a32:accd:17a2/128
Jul  3 22:03:06 16[IKE] checking path ::1[4500] - 2003:68:2d7f:b7c3:f10a:5a32:accd:17a2[4500]
Jul  3 22:03:06 16[NET] sending packet: from ::1[4500] to 2003:68:2d7f:b7c3:f10a:5a32:accd:17a2[4500] (76 bytes)
Jul  3 22:03:06 16[KNL] using ::1 as address to reach 2003:68:2d37:c301::1/128
Jul  3 22:03:06 16[IKE] checking path ::1[4500] - 2003:68:2d37:c301::1[4500]
Jul  3 22:03:06 16[NET] sending packet: from ::1[4500] to 2003:68:2d37:c301::1[4500] (76 bytes)
Jul  3 22:03:06 16[MGR] checkin IKE_SA kal[2]
Jul  3 22:03:06 16[MGR] check-in of IKE_SA successful.
Jul  3 22:03:06 02[NET] sending packet: from 37.85.26.21[4500] to 84.139.36.183[4500]
Jul  3 22:03:06 02[NET] sending packet: from ::1[4500] to 2003:68:2d7f:b7c3:f10a:5a32:accd:17a2[4500]
Jul  3 22:03:06 02[NET] no socket found to send IPv6 packet from port 4500
Jul  3 22:03:06 02[NET] sending packet: from ::1[4500] to 2003:68:2d37:c301::1[4500]
Jul  3 22:03:06 02[NET] no socket found to send IPv6 packet from port 4500
Jul  3 22:03:06 01[JOB] next event in 2s 499ms, waiting
Jul  3 22:03:06 02[NET] sending packet: from 192.168.203.77[4500] to 84.139.36.183[4500]
Jul  3 22:03:06 02[NET] error writing to socket: Invalid argument
Jul  3 22:03:06 02[NET] sending packet: from 192.168.203.77[4500] to 84.139.36.183[4500]
Jul  3 22:03:06 02[NET] error writing to socket: Invalid argument
Jul  3 22:03:06 05[NET] received packet: from 84.139.36.183[4500] to 37.85.26.21[4500]
Jul  3 22:03:06 05[ENC] parsing header of message
Jul  3 22:03:06 05[ENC] parsing HEADER payload, 76 bytes left
Jul  3 22:03:06 05[ENC]   parsing rule 0 IKE_SPI
Jul  3 22:03:06 05[ENC]   parsing rule 1 IKE_SPI
Jul  3 22:03:06 05[ENC]   parsing rule 2 U_INT_8
Jul  3 22:03:06 05[ENC]   parsing rule 3 U_INT_4
Jul  3 22:03:06 05[ENC]   parsing rule 4 U_INT_4
Jul  3 22:03:06 05[ENC]   parsing rule 5 U_INT_8
Jul  3 22:03:06 05[ENC]   parsing rule 6 RESERVED_BIT
Jul  3 22:03:06 05[ENC]   parsing rule 7 RESERVED_BIT
Jul  3 22:03:06 05[ENC]   parsing rule 8 FLAG
Jul  3 22:03:06 05[ENC]   parsing rule 9 FLAG
Jul  3 22:03:06 05[ENC]   parsing rule 10 FLAG
Jul  3 22:03:06 05[ENC]   parsing rule 11 FLAG
Jul  3 22:03:06 05[ENC]   parsing rule 12 FLAG
Jul  3 22:03:06 05[ENC]   parsing rule 13 FLAG
Jul  3 22:03:06 05[ENC]   parsing rule 14 U_INT_32
Jul  3 22:03:06 05[ENC]   parsing rule 15 HEADER_LENGTH
Jul  3 22:03:06 05[ENC] parsing HEADER payload finished
Jul  3 22:03:06 05[ENC] parsed a INFORMATIONAL response header
Jul  3 22:03:06 05[NET] waiting for data on sockets
Jul  3 22:03:06 07[MGR] checkout IKE_SA by message
Jul  3 22:03:06 07[MGR] IKE_SA kal[2] successfully checked out
Jul  3 22:03:06 07[NET] received packet: from 84.139.36.183[4500] to 37.85.26.21[4500] (76 bytes)
Jul  3 22:03:06 07[ENC] parsing body of message, first payload is ENCRYPTED
Jul  3 22:03:06 07[ENC] starting parsing a ENCRYPTED payload
Jul  3 22:03:06 07[ENC] parsing ENCRYPTED payload, 48 bytes left
Jul  3 22:03:06 07[ENC]   parsing rule 0 U_INT_8
Jul  3 22:03:06 07[ENC]   parsing rule 1 U_INT_8
Jul  3 22:03:06 07[ENC]   parsing rule 2 PAYLOAD_LENGTH
Jul  3 22:03:06 07[ENC]   parsing rule 3 CHUNK_DATA
Jul  3 22:03:06 07[ENC] parsing ENCRYPTED payload finished
Jul  3 22:03:06 07[ENC] verifying payload of type ENCRYPTED
Jul  3 22:03:06 07[ENC] ENCRYPTED payload verified, adding to payload list
Jul  3 22:03:06 07[ENC] ENCRYPTED payload found, stop parsing
Jul  3 22:03:06 07[ENC] process payload of type ENCRYPTED
Jul  3 22:03:06 07[ENC] found an encrypted payload
Jul  3 22:03:06 07[ENC] parsed content of encryption payload
Jul  3 22:03:06 07[ENC] verifying message structure
Jul  3 22:03:06 07[ENC] parsed INFORMATIONAL response 16 [ ]
Jul  3 22:03:06 07[IKE] activating new tasks
Jul  3 22:03:06 07[IKE]   activating IKE_MOBIKE task
Jul  3 22:03:06 07[KNL] using 37.85.26.21 as address to reach 84.139.36.183/32
Jul  3 22:03:06 07[ENC] generating INFORMATIONAL request 17 [ ]
Jul  3 22:03:06 07[ENC] generating payload of type HEADER
Jul  3 22:03:06 07[ENC]   generating rule 0 IKE_SPI
Jul  3 22:03:06 07[ENC]   generating rule 1 IKE_SPI
Jul  3 22:03:06 07[ENC]   generating rule 2 U_INT_8
Jul  3 22:03:06 07[ENC]   generating rule 3 U_INT_4
Jul  3 22:03:06 07[ENC]   generating rule 4 U_INT_4
Jul  3 22:03:06 07[ENC]   generating rule 5 U_INT_8
Jul  3 22:03:06 07[ENC]   generating rule 6 RESERVED_BIT
Jul  3 22:03:06 07[ENC]   generating rule 7 RESERVED_BIT
Jul  3 22:03:06 07[ENC]   generating rule 8 FLAG
Jul  3 22:03:06 07[ENC]   generating rule 9 FLAG
Jul  3 22:03:06 07[ENC]   generating rule 10 FLAG
Jul  3 22:03:06 07[ENC]   generating rule 11 FLAG
Jul  3 22:03:06 07[ENC]   generating rule 12 FLAG
Jul  3 22:03:06 07[ENC]   generating rule 13 FLAG
Jul  3 22:03:06 07[ENC]   generating rule 14 U_INT_32
Jul  3 22:03:06 07[ENC]   generating rule 15 HEADER_LENGTH
Jul  3 22:03:06 07[ENC] generating HEADER payload finished
Jul  3 22:03:06 07[ENC] generating payload of type ENCRYPTED
Jul  3 22:03:06 07[ENC]   generating rule 0 U_INT_8
Jul  3 22:03:06 07[ENC]   generating rule 1 U_INT_8
Jul  3 22:03:06 07[ENC]   generating rule 2 PAYLOAD_LENGTH
Jul  3 22:03:06 07[ENC]   generating rule 3 CHUNK_DATA
Jul  3 22:03:06 07[ENC] generating ENCRYPTED payload finished
Jul  3 22:03:06 07[KNL] using 37.85.26.21 as address to reach 84.139.36.183/32
Jul  3 22:03:06 07[IKE] checking path 37.85.26.21[4500] - 84.139.36.183[4500]
Jul  3 22:03:06 07[NET] sending packet: from 37.85.26.21[4500] to 84.139.36.183[4500] (76 bytes)
Jul  3 22:03:06 02[NET] sending packet: from 37.85.26.21[4500] to 84.139.36.183[4500]
Jul  3 22:03:06 07[KNL] using ::1 as address to reach 2003:68:2d7f:b7c3:f10a:5a32:accd:17a2/128
Jul  3 22:03:06 07[IKE] checking path ::1[4500] - 2003:68:2d7f:b7c3:f10a:5a32:accd:17a2[4500]
Jul  3 22:03:06 07[NET] sending packet: from ::1[4500] to 2003:68:2d7f:b7c3:f10a:5a32:accd:17a2[4500] (76 bytes)
Jul  3 22:03:06 02[NET] sending packet: from ::1[4500] to 2003:68:2d7f:b7c3:f10a:5a32:accd:17a2[4500]
Jul  3 22:03:06 07[KNL] using ::1 as address to reach 2003:68:2d37:c301::1/128
Jul  3 22:03:06 07[IKE] checking path ::1[4500] - 2003:68:2d37:c301::1[4500]
Jul  3 22:03:06 02[NET] no socket found to send IPv6 packet from port 4500
Jul  3 22:03:06 07[NET] sending packet: from ::1[4500] to 2003:68:2d37:c301::1[4500] (76 bytes)
Jul  3 22:03:06 07[MGR] checkin IKE_SA kal[2]
Jul  3 22:03:06 02[NET] sending packet: from ::1[4500] to 2003:68:2d37:c301::1[4500]
Jul  3 22:03:06 07[MGR] check-in of IKE_SA successful.
Jul  3 22:03:06 01[JOB] next event in 2s 290ms, waiting
Jul  3 22:03:06 02[NET] no socket found to send IPv6 packet from port 4500
Jul  3 22:03:06 02[NET] sending packet: from 192.168.203.77[4500] to 84.139.36.183[4500]
Jul  3 22:03:06 02[NET] error writing to socket: Invalid argument
Jul  3 22:03:06 02[NET] sending packet: from 192.168.203.77[4500] to 84.139.36.183[4500]
Jul  3 22:03:06 02[NET] error writing to socket: Invalid argument
Jul  3 22:03:06 02[NET] sending packet: from 192.168.203.77[4500] to 84.139.36.183[4500]
Jul  3 22:03:06 02[NET] error writing to socket: Invalid argument
Jul  3 22:03:06 02[NET] sending packet: from 192.168.203.77[4500] to 84.139.36.183[4500]
Jul  3 22:03:06 02[NET] error writing to socket: Invalid argument
Jul  3 22:03:06 02[NET] sending packet: from 192.168.203.77[4500] to 84.139.36.183[4500]
Jul  3 22:03:06 02[NET] error writing to socket: Invalid argument
Jul  3 22:03:06 02[NET] sending packet: from 192.168.203.77[4500] to 84.139.36.183[4500]
Jul  3 22:03:06 02[NET] error writing to socket: Invalid argument
Jul  3 22:03:06 02[NET] sending packet: from 192.168.203.77[4500] to 84.139.36.183[4500]
Jul  3 22:03:06 02[NET] error writing to socket: Invalid argument
Jul  3 22:03:06 02[NET] sending packet: from 192.168.203.77[4500] to 84.139.36.183[4500]
Jul  3 22:03:06 02[NET] error writing to socket: Invalid argument
Jul  3 22:03:06 02[NET] sending packet: from 192.168.203.77[4500] to 84.139.36.183[4500]
Jul  3 22:03:06 02[NET] error writing to socket: Invalid argument
Jul  3 22:03:06 02[NET] sending packet: from 192.168.203.77[4500] to 84.139.36.183[4500]
Jul  3 22:03:06 02[NET] error writing to socket: Invalid argument
Jul  3 22:03:06 02[NET] sending packet: from 192.168.203.77[4500] to 84.139.36.183[4500]
Jul  3 22:03:06 02[NET] error writing to socket: Invalid argument
Jul  3 22:03:06 02[NET] sending packet: from 192.168.203.77[4500] to 84.139.36.183[4500]
Jul  3 22:03:06 02[NET] error writing to socket: Invalid argument
Jul  3 22:03:06 02[NET] sending packet: from 192.168.203.77[4500] to 84.139.36.183[4500]
Jul  3 22:03:06 02[NET] error writing to socket: Invalid argument
Jul  3 22:03:06 05[NET] received packet: from 84.139.36.183[4500] to 37.85.26.21[4500]
.......

As you can see in the last lines of the log, charon received an update from it source IP, but it does not use it afterwards!
This has been testet with 5.1.3 and 5.2.0rc1

Thanks for taking a look!

Associated revisions

Revision 5dec7d2f
Added by Tobias Brunner about 6 years ago

Merge branch 'mobike-fixes'

These changes improve the handling of MOBIKE tasks, for instance, when
retransmitting and no path is available.

Fixes #632.

History

#1 Updated by Tobias Brunner over 6 years ago

Thanks for the report. I have analyzed the logs and I found the main problem.

When IP addresses/interfaces/routes change a route lookup is done and if one of the peer's IP addresses seems reachable a MOBIKE task gets queued. When this task runs it will send packets from all known local addresses to all known and compatible remote addresses to find a working path. While this check runs further address/interface/route changes may happen and additional MOBIKE tasks may get queued. To keep track a counter is maintained and if the running MOBIKE tasks receives a response it reduces and checks that counter to see if a more recent task was queued, if so, it discards the response and terminates, then the next task runs.

Since queuing more than one MOBIKE task makes not much sense (actually, depending on the flags set in the task it might, so the fix described below might not be the final solution) there is a check that destroys new MOBIKE tasks in case one is queued already. Therefore we see this in the log:

Jul  3 21:55:26 07[IKE] requesting address change using MOBIKE
Jul  3 21:55:26 07[IKE] queueing IKE_MOBIKE task
Jul  3 21:55:26 07[IKE] activating new tasks
Jul  3 21:55:26 07[IKE]   activating IKE_MOBIKE task
...
Jul  3 21:55:26 13[IKE] requesting address change using MOBIKE
Jul  3 21:55:26 13[IKE] queueing IKE_MOBIKE task
Jul  3 21:55:26 13[IKE] delaying task initiation, INFORMATIONAL exchange in progress
...
Jul  3 21:55:26 09[IKE] requesting address change using MOBIKE
Jul  3 21:55:26 09[IKE] delaying task initiation, INFORMATIONAL exchange in progress

First a task is queued and dequeued and initiated right away because no other task is running. The second time the task is queued for later execution, and the third time the task is destroyed as a MOBIKE task is already queued (no "queueing IKE_MOBIKE task" message is seen).

The bug is that the aforementioned counter is increased when the task gets created, but it does not get decreased when the task is destroyed in case one is already queued. So when the queued task finally runs it ignores the successful response because it thinks another task is in the queue, which is not the case so the SA is stuck with a unusable source address.

The attached patches should fix this issue. Would be great if you could try them.

#2 Updated by Andre Valentin over 6 years ago

Hi Tobias,

thank you very much for your investigation and solution! That's why I also like strongswan, the competent and good support!

The attached patches should fix this issue. Would be great if you could try them.

Tonight I have a long migration, so hopefully I will have some time tomorrow evening!

André

#3 Updated by Andre Valentin over 6 years ago

Hi Tobias!

Thank you! Also for the detailed explanation. Now I understand where there problem lies. I made a quick check this evening and the problem does not arise again. Now I will try to run charon in the background. If this kind of error occurs again, I will inform you.
I'm interested if this is reliable on android. Perhaps you could take a look at the Android.mk und 4.4. I'd love to run it native. As I can see on my company S5, there is already strongswan installed by Samsung. They crippled it.... But nevertheless, they saw how good strongswan is :-)

I also removed the other charon bind addresses on the vpn gateway . This also seems to enhance stability of the android app which I used in the meantime. Perhaps the bug lurks there also.

#4 Updated by Tobias Brunner over 6 years ago

I made a quick check this evening and the problem does not arise again. Now I will try to run charon in the background. If this kind of error occurs again, I will inform you.

Thanks for testing the patches.

I'm interested if this is reliable on android. Perhaps you could take a look at the Android.mk und 4.4. I'd love to run it native.

What do you mean?

As I can see on my company S5, there is already strongswan installed by Samsung. They crippled it.... But nevertheless, they saw how good strongswan is :-)

That's interesting. Do they provide it in an app? Or did they modify the built-in VPN client to support IKEv2?

I also removed the other charon bind addresses on the vpn gateway .

"bind addresses"? Are you referring to the IPv6 addresses?

This also seems to enhance stability of the android app which I used in the meantime. Perhaps the bug lurks there also.

It definitely suffers from the same bug. But the retransmission timeouts are shorter so it will relatively quickly start to reestablish the IKE_SA from scratch.

But another thing I noticed is that the app should probably not try to use IPv6 addresses if it has no IPv6 socket available (as is the case in the Android app and also in your case). There are a lot of these messages:

Jul  3 21:54:40 02[NET] sending packet: from ::1[4500] to 2003:68:2d7f:b7c3:f10a:5a32:accd:17a2[4500]
Jul  3 21:54:40 02[NET] no socket found to send IPv6 packet from port 4500

Side note, that the app even tries to send from ::1 (i.e. localhost) to a public IP is because Android reroutes all traffic via lo if no connectivity is available.

#5 Updated by Andre Valentin over 6 years ago

Tobias Brunner wrote:

I made a quick check this evening and the problem does not arise again. Now I will try to run charon in the background. If this kind of error occurs again, I will inform you.

Thanks for testing the patches.

I'm interested if this is reliable on android. Perhaps you could take a look at the Android.mk und 4.4. I'd love to run it native.

What do you mean?

I just send you my report to the users mailinglist. Perhaps you could take a short look.

As I can see on my company S5, there is already strongswan installed by Samsung. They crippled it.... But nevertheless, they saw how good strongswan is :-)

That's interesting. Do they provide it in an app? Or did they modify the built-in VPN client to support IKEv2?

It's the built-in VPN Client. Strongswan is called if you create an IKEv2 VPN.

I also removed the other charon bind addresses on the vpn gateway .

"bind addresses"? Are you referring to the IPv6 addresses?

I meant
charon.interfaces_use = pppoe-wan
Beforce it was not defined so the server informed the initiator that it had 7 IP addresses, which all were tested. Perhaps that makes this bug more probable.

This also seems to enhance stability of the android app which I used in the meantime. Perhaps the bug lurks there also.

It definitely suffers from the same bug. But the retransmission timeouts are shorter so it will relatively quickly start to reestablish the IKE_SA from scratch.

Side note, that the app even tries to send from ::1 (i.e. localhost) to a public IP is because Android reroutes all traffic via lo if no connectivity is available.

If you want me to test something, I'm here! Android sometime goes strange ways....

Thanks for all,

André

#6 Updated by Andre Valentin over 6 years ago

Hi Tobias,

I run a long test with my setup and there still seems to be a problem with mobike and address updates.
Here for documentation my config:

config setup
        uniqueids=no

conn kal
        left=%any
        leftid=@android.test.net
        rightallowany=yes
        ikelifetime=24h
        keylife=2h
        keyexchange=ikev2
        authby=secret
        compress=no
        rekey=yes
    mobike=yes
        keyingtries=%forever
        dpdaction=none
    leftsourceip = %config4
        right=1.2.3.4
        rightid=@kal.dyn.test.net
        rightsubnet=0.0.0.0/0,::/0
    closeaction=restart
        auto=start

conn vpn
        left=%any
        leftid=@android.test.net
        rightallowany=yes
        ikelifetime=24h
        keylife=2h
        keyexchange=ikev2
        authby=secret
        compress=no
        rekey=yes
    mobike=yes
        keyingtries=%forever
        dpdaction=none
    leftsourceip = %config4
        right=1.2.3.5
        rightid=@vpn.test.net
        rightsubnet=0.0.0.0/0,::/0
    closeaction=restart
        auto=start


All vpn servers have public IPs. There are not NAT gateways in from of them.

Here is the log, but just a part of it. Please take a look (interesting at the end):

(Connection went find until this starts) ...
Jul 26 21:14:16 16[ENC] generating INFORMATIONAL request 96 [ ]
Jul 26 21:14:16 16[NET] sending packet: from 37.81.185.253[4500] to 1.2.3.5[4500] (76 bytes)
Jul 26 21:14:16 11[NET] received packet: from 1.2.3.4[4500] to 37.81.185.253[4500] (76 bytes)
Jul 26 21:14:16 11[ENC] parsed INFORMATIONAL response 98 [ ]
Jul 26 21:14:16 01[NET] received packet: from 1.2.3.5[4500] to 37.81.185.253[4500] (76 bytes)
Jul 26 21:14:16 01[ENC] parsed INFORMATIONAL response 96 [ ]
Jul 26 21:23:07 12[IKE] old path is not available anymore, try to find another
Jul 26 21:23:07 12[IKE] looking for a route to 1.2.3.4 ...
Jul 26 21:23:07 12[IKE] looking for a route to 192.168.203.254 ...
Jul 26 21:23:07 12[IKE] looking for a route to 2003:68:2d4c:1000::1 ...
Jul 26 21:23:07 12[IKE] requesting address change using MOBIKE
Jul 26 21:23:07 12[ENC] generating INFORMATIONAL request 99 [ ]
Jul 26 21:23:07 12[IKE] checking path ::1[4500] - 2003:68:2d4c:1000::1[4500]
Jul 26 21:23:07 12[NET] sending packet: from ::1[4500] to 2003:68:2d4c:1000::1[4500] (76 bytes)
Jul 26 21:23:07 02[NET] no socket found to send IPv6 packet from port 4500
Jul 26 21:23:07 12[IKE] checking path ::1[4500] - fd00:f88:10:a203::ffff[4500]
Jul 26 21:23:07 12[NET] sending packet: from ::1[4500] to fd00:f88:10:a203::ffff[4500] (76 bytes)
Jul 26 21:23:07 02[NET] no socket found to send IPv6 packet from port 4500
Jul 26 21:23:07 12[IKE] checking path ::1[4500] - 2003:68:2d4c:1001::1[4500]
Jul 26 21:23:07 12[NET] sending packet: from ::1[4500] to 2003:68:2d4c:1001::1[4500] (76 bytes)
Jul 26 21:23:07 02[NET] no socket found to send IPv6 packet from port 4500
Jul 26 21:23:07 12[IKE] checking path ::1[4500] - 2003:68:2d7f:cb8e:2126:9cf:1319:eb7b[4500]
Jul 26 21:23:07 12[NET] sending packet: from ::1[4500] to 2003:68:2d7f:cb8e:2126:9cf:1319:eb7b[4500] (76 bytes)
Jul 26 21:23:07 02[NET] no socket found to send IPv6 packet from port 4500
Jul 26 21:23:07 12[IKE] old path is not available anymore, try to find another
Jul 26 21:23:07 12[IKE] looking for a route to 1.2.3.5 ...
Jul 26 21:23:07 12[IKE] looking for a route to 123:123:0:3168:217:14:168:23 ...
Jul 26 21:23:07 12[IKE] requesting address change using MOBIKE
Jul 26 21:23:07 12[ENC] generating INFORMATIONAL request 97 [ ]
Jul 26 21:23:07 12[IKE] checking path ::1[4500] - 123:123:0:3168:217:14:168:23[4500]
Jul 26 21:23:07 12[NET] sending packet: from ::1[4500] to 123:123:0:3168:217:14:168:23[4500] (76 bytes)
Jul 26 21:23:07 02[NET] no socket found to send IPv6 packet from port 4500
Jul 26 21:23:08 02[NET] error writing to socket: Network is unreachable
Jul 26 21:23:08 02[NET] error writing to socket: Network is unreachable
Jul 26 21:23:08 02[NET] error writing to socket: Network is unreachable
Jul 26 21:23:08 02[NET] error writing to socket: Network is unreachable
Jul 26 21:23:08 02[NET] error writing to socket: Network is unreachable
Jul 26 21:23:09 02[NET] error writing to socket: Network is unreachable
Jul 26 21:23:09 02[NET] error writing to socket: Network is unreachable
Jul 26 21:23:11 02[NET] error writing to socket: Network is unreachable
Jul 26 21:23:17 02[NET] error writing to socket: Network is unreachable
Jul 26 21:23:18 02[NET] error writing to socket: Network is unreachable
Jul 26 21:23:18 01[IKE] path probing attempt 1
Jul 26 21:23:18 01[IKE] checking path ::1[4500] - 2003:68:2d4c:1000::1[4500]
Jul 26 21:23:18 01[NET] sending packet: from ::1[4500] to 2003:68:2d4c:1000::1[4500] (76 bytes)
Jul 26 21:23:18 01[IKE] checking path ::1[4500] - fd00:f88:10:a203::ffff[4500]
Jul 26 21:23:18 02[NET] no socket found to send IPv6 packet from port 4500
Jul 26 21:23:07 12[IKE] old path is not available anymore, try to find another
Jul 26 21:23:07 12[IKE] looking for a route to 1.2.3.5 ...
Jul 26 21:23:07 12[IKE] looking for a route to 123:123:0:3168:217:14:168:23 ...
Jul 26 21:23:07 12[IKE] requesting address change using MOBIKE
Jul 26 21:23:07 12[ENC] generating INFORMATIONAL request 97 [ ]
Jul 26 21:23:07 12[IKE] checking path ::1[4500] - 123:123:0:3168:217:14:168:23[4500]
Jul 26 21:23:07 12[NET] sending packet: from ::1[4500] to 123:123:0:3168:217:14:168:23[4500] (76 bytes)
Jul 26 21:23:07 02[NET] no socket found to send IPv6 packet from port 4500
Jul 26 21:23:08 02[NET] error writing to socket: Network is unreachable
Jul 26 21:23:08 02[NET] error writing to socket: Network is unreachable
Jul 26 21:23:08 02[NET] error writing to socket: Network is unreachable
Jul 26 21:23:08 02[NET] error writing to socket: Network is unreachable
Jul 26 21:23:08 02[NET] error writing to socket: Network is unreachable
Jul 26 21:23:09 02[NET] error writing to socket: Network is unreachable
Jul 26 21:23:09 02[NET] error writing to socket: Network is unreachable
Jul 26 21:23:11 02[NET] error writing to socket: Network is unreachable
Jul 26 21:23:17 02[NET] error writing to socket: Network is unreachable
Jul 26 21:23:18 02[NET] error writing to socket: Network is unreachable
Jul 26 21:23:18 01[IKE] path probing attempt 1
Jul 26 21:23:18 01[IKE] checking path ::1[4500] - 2003:68:2d4c:1000::1[4500]
Jul 26 21:23:18 01[NET] sending packet: from ::1[4500] to 2003:68:2d4c:1000::1[4500] (76 bytes)
Jul 26 21:23:18 01[IKE] checking path ::1[4500] - fd00:f88:10:a203::ffff[4500]
Jul 26 21:23:18 02[NET] no socket found to send IPv6 packet from port 4500
Jul 26 21:23:18 06[IKE] path probing attempt 1
Jul 26 21:23:18 06[IKE] checking path ::1[4500] - 123:123:0:3168:217:14:168:23[4500]
Jul 26 21:23:18 06[NET] sending packet: from ::1[4500] to 123:123:0:3168:217:14:168:23[4500] (76 bytes)
Jul 26 21:23:18 01[NET] sending packet: from ::1[4500] to fd00:f88:10:a203::ffff[4500] (76 bytes)
Jul 26 21:23:18 01[IKE] checking path ::1[4500] - 2003:68:2d4c:1001::1[4500]
Jul 26 21:23:18 01[NET] sending packet: from ::1[4500] to 2003:68:2d4c:1001::1[4500] (76 bytes)
Jul 26 21:23:18 01[IKE] checking path ::1[4500] - 2003:68:2d7f:cb8e:2126:9cf:1319:eb7b[4500]
Jul 26 21:23:18 01[NET] sending packet: from ::1[4500] to 2003:68:2d7f:cb8e:2126:9cf:1319:eb7b[4500] (76 bytes)
Jul 26 21:23:18 02[NET] no socket found to send IPv6 packet from port 4500
Jul 26 21:23:18 02[NET] no socket found to send IPv6 packet from port 4500
Jul 26 21:23:18 02[NET] no socket found to send IPv6 packet from port 4500
Jul 26 21:23:18 02[NET] no socket found to send IPv6 packet from port 4500
Jul 26 21:23:19 02[NET] error writing to socket: Network is unreachable
Jul 26 21:23:22 02[NET] error writing to socket: Network is unreachable
Jul 26 21:23:23 12[IKE] path probing attempt 2
Jul 26 21:23:23 12[IKE] checking path 37.81.185.253[4500] - 1.2.3.5[4500]
Jul 26 21:23:23 12[NET] sending packet: from 37.81.185.253[4500] to 1.2.3.5[4500] (76 bytes)
Jul 26 21:23:23 12[IKE] checking path ::1[4500] - 123:123:0:3168:217:14:168:23[4500]
Jul 26 21:23:23 12[NET] sending packet: from ::1[4500] to 123:123:0:3168:217:14:168:23[4500] (76 bytes)
Jul 26 21:23:23 02[NET] no socket found to send IPv6 packet from port 4500
Jul 26 21:23:23 13[IKE] path probing attempt 2
Jul 26 21:23:23 13[IKE] checking path 37.81.185.253[4500] - 1.2.3.4[4500]
Jul 26 21:23:23 13[NET] sending packet: from 37.81.185.253[4500] to 1.2.3.4[4500] (76 bytes)
Jul 26 21:23:23 13[IKE] checking path 37.81.185.253[4500] - 192.168.203.254[4500]
Jul 26 21:23:23 13[NET] sending packet: from 37.81.185.253[4500] to 1.2.3.4[4500] (76 bytes)
Jul 26 21:23:23 13[IKE] checking path 37.81.185.253[4500] - 192.168.203.254[4500]
Jul 26 21:23:23 13[NET] sending packet: from 37.81.185.253[4500] to 192.168.203.254[4500] (76 bytes)
Jul 26 21:23:23 13[IKE] checking path ::1[4500] - 2003:68:2d4c:1000::1[4500]
Jul 26 21:23:23 13[NET] sending packet: from ::1[4500] to 2003:68:2d4c:1000::1[4500] (76 bytes)
Jul 26 21:23:23 02[NET] no socket found to send IPv6 packet from port 4500
Jul 26 21:23:23 13[IKE] checking path ::1[4500] - fd00:f88:10:a203::ffff[4500]
Jul 26 21:23:23 13[NET] sending packet: from ::1[4500] to fd00:f88:10:a203::ffff[4500] (76 bytes)
Jul 26 21:23:23 02[NET] no socket found to send IPv6 packet from port 4500
Jul 26 21:23:23 13[IKE] checking path ::1[4500] - 2003:68:2d4c:1001::1[4500]
Jul 26 21:23:23 13[NET] sending packet: from ::1[4500] to 2003:68:2d4c:1001::1[4500] (76 bytes)
Jul 26 21:23:23 02[NET] no socket found to send IPv6 packet from port 4500
Jul 26 21:23:23 13[IKE] checking path ::1[4500] - 2003:68:2d7f:cb8e:2126:9cf:1319:eb7b[4500]
Jul 26 21:23:23 13[NET] sending packet: from ::1[4500] to 2003:68:2d7f:cb8e:2126:9cf:1319:eb7b[4500] (76 bytes)
Jul 26 21:23:23 02[NET] no socket found to send IPv6 packet from port 4500
Jul 26 21:23:26 16[NET] received packet: from 1.2.3.5[4500] to 37.81.185.253[4500] (76 bytes)
Jul 26 21:23:26 16[ENC] parsed INFORMATIONAL response 97 [ ]
Jul 26 21:23:26 14[NET] received packet: from 1.2.3.4[4500] to 37.81.185.253[4500] (76 bytes)
Jul 26 21:23:26 14[ENC] parsed INFORMATIONAL response 99 [ ]
Jul 26 21:23:30 08[ESP] ESP sequence number verification failed:
Jul 26 21:23:30 08[ESP]   src 1.2.3.5, dst 37.81.185.253, SPI 9f3f4dba [seq 213]
Jul 26 21:23:30 08[ESP] ESP sequence number verification failed:
Jul 26 21:23:30 08[ESP]   src 1.2.3.5, dst 37.81.185.253, SPI 9f3f4dba [seq 214]
Jul 26 21:23:30 08[ESP] ESP sequence number verification failed:
Jul 26 21:23:30 08[ESP]   src 1.2.3.5, dst 37.81.185.253, SPI 9f3f4dba [seq 215]
Jul 26 21:23:31 08[ESP] ESP sequence number verification failed:
Jul 26 21:23:31 08[ESP]   src 1.2.3.5, dst 37.81.185.253, SPI 9f3f4dba [seq 216]
Jul 26 21:23:31 08[ESP] ESP sequence number verification failed:
Jul 26 21:23:31 08[ESP]   src 1.2.3.5, dst 37.81.185.253, SPI 9f3f4dba [seq 217]
Jul 26 21:23:32 08[ESP] ESP sequence number verification failed:
Jul 26 21:23:32 08[ESP]   src 1.2.3.5, dst 37.81.185.253, SPI 9f3f4dba [seq 218]
Jul 26 21:23:32 08[ESP] ESP sequence number verification failed:
Jul 26 21:23:32 08[ESP]   src 1.2.3.5, dst 37.81.185.253, SPI 9f3f4dba [seq 219]
Jul 26 21:23:32 08[ESP] ESP sequence number verification failed:
Jul 26 21:23:32 08[ESP]   src 1.2.3.5, dst 37.81.185.253, SPI 9f3f4dba [seq 220]
Jul 26 21:23:32 08[ESP] ESP sequence number verification failed:
Jul 26 21:23:32 08[ESP]   src 1.2.3.5, dst 37.81.185.253, SPI 9f3f4dba [seq 221]
Jul 26 21:23:32 08[ESP] ESP sequence number verification failed:
Jul 26 21:23:32 08[ESP]   src 1.2.3.5, dst 37.81.185.253, SPI 9f3f4dba [seq 222]
Jul 26 21:23:32 08[ESP] ESP sequence number verification failed:
Jul 26 21:23:32 08[ESP]   src 1.2.3.5, dst 37.81.185.253, SPI 9f3f4dba [seq 223]
Jul 26 21:23:32 08[ESP] ESP sequence number verification failed:
Jul 26 21:23:32 08[ESP]   src 1.2.3.5, dst 37.81.185.253, SPI 9f3f4dba [seq 224]
Jul 26 21:24:24 08[ESP] ESP sequence number verification failed:
Jul 26 21:24:24 08[ESP] ESP sequence number verification failed:
Jul 26 21:24:24 08[ESP]   src 1.2.3.5, dst 37.81.185.253, SPI 9f3f4dba [seq 236]
Jul 26 21:24:24 08[ESP] ESP sequence number verification failed:
Jul 26 21:24:24 08[ESP]   src 1.2.3.5, dst 37.81.185.253, SPI 9f3f4dba [seq 237]
Jul 26 21:24:24 08[ESP] ESP sequence number verification failed:
Jul 26 21:24:24 08[ESP]   src 1.2.3.5, dst 37.81.185.253, SPI 9f3f4dba [seq 238]
Jul 26 21:24:24 08[ESP] ESP sequence number verification failed:
Jul 26 21:24:24 08[ESP]   src 1.2.3.5, dst 37.81.185.253, SPI 9f3f4dba [seq 239]
Jul 26 21:29:35 06[KNL] 192.168.203.77 appeared on wlan0
Jul 26 21:29:35 11[IKE] sending address list update using MOBIKE
Jul 26 21:29:35 11[ENC] generating INFORMATIONAL request 100 [ N(ADD_4_ADDR) ]
Jul 26 21:29:35 11[NET] sending packet: from 37.81.185.253[4500] to 1.2.3.4[4500] (76 bytes)
Jul 26 21:29:35 11[IKE] sending address list update using MOBIKE
Jul 26 21:29:35 11[ENC] generating INFORMATIONAL request 98 [ N(ADD_4_ADDR) ]
Jul 26 21:29:35 11[NET] sending packet: from 37.81.185.253[4500] to 1.2.3.5[4500] (76 bytes)
Jul 26 21:29:37 06[KNL] 37.81.185.253 disappeared from rmnet0
Jul 26 21:29:37 15[KNL] interface rmnet0 deactivated
Jul 26 21:29:37 02[NET] error writing to socket: Invalid argument
Jul 26 21:29:37 01[IKE] old path is not available anymore, try to find another
Jul 26 21:29:37 01[IKE] looking for a route to 1.2.3.4 ...
Jul 26 21:29:37 01[IKE] requesting address change using MOBIKE
Jul 26 21:29:37 01[IKE] old path is not available anymore, try to find another
Jul 26 21:29:37 01[IKE] looking for a route to 1.2.3.5 ...
Jul 26 21:29:37 01[IKE] requesting address change using MOBIKE
Jul 26 21:29:38 02[NET] error writing to socket: Invalid argument
Jul 26 21:29:38 02[NET] error writing to socket: Invalid argument
Jul 26 21:29:38 02[NET] error writing to socket: Invalid argument
Jul 26 21:29:38 02[NET] error writing to socket: Invalid argument
Jul 26 21:29:39 06[IKE] retransmit 1 of request with message ID 100
Jul 26 21:29:39 06[NET] sending packet: from 37.81.185.253[4500] to 1.2.3.4[4500] (76 bytes)
Jul 26 21:29:39 02[NET] error writing to socket: Invalid argument
Jul 26 21:29:39 11[IKE] retransmit 1 of request with message ID 98
Jul 26 21:29:39 11[NET] sending packet: from 37.81.185.253[4500] to 1.2.3.5[4500] (76 bytes)
Jul 26 21:29:39 02[NET] error writing to socket: Invalid argument
Jul 26 21:29:39 02[NET] error writing to socket: Invalid argument
Jul 26 21:29:39 02[NET] error writing to socket: Invalid argument
Jul 26 21:29:39 02[NET] error writing to socket: Invalid argument
Jul 26 21:29:40 12[KNL] fe80::8638:38ff:fe85:754c appeared on wlan0
Jul 26 21:29:40 15[IKE] old path is not available anymore, try to find another
Jul 26 21:29:40 15[IKE] looking for a route to 1.2.3.4 ...
Jul 26 21:29:40 15[IKE] requesting address change using MOBIKE
Jul 26 21:29:40 15[IKE] old path is not available anymore, try to find another
Jul 26 21:29:40 15[IKE] looking for a route to 1.2.3.5 ...
Jul 26 21:29:40 15[IKE] requesting address change using MOBIKE
Jul 26 21:29:41 02[NET] error writing to socket: Invalid argument
Jul 26 21:29:42 02[NET] error writing to socket: Invalid argument
Jul 26 21:29:43 02[NET] error writing to socket: Invalid argument
Jul 26 21:29:43 01[IKE] retransmit 2 of request with message ID 100
Jul 26 21:29:43 01[NET] sending packet: from 37.81.185.253[4500] to 1.2.3.4[4500] (76 bytes)
Jul 26 21:29:43 02[NET] error writing to socket: Invalid argument
Jul 26 21:29:43 06[IKE] retransmit 2 of request with message ID 98
Jul 26 21:29:43 06[NET] sending packet: from 37.81.185.253[4500] to 1.2.3.5[4500] (76 bytes)
Jul 26 21:29:43 02[NET] error writing to socket: Invalid argument
Jul 26 21:29:43 02[NET] error writing to socket: Invalid argument
Jul 26 21:29:43 02[NET] error writing to socket: Invalid argument
...

In my opinion here lies the error:
Jul 26 21:29:35 06[KNL] 192.168.203.77 appeared on wlan0
Jul 26 21:29:35 11[IKE] sending address list update using MOBIKE
Jul 26 21:29:35 11[ENC] generating INFORMATIONAL request 100 [ N(ADD_4_ADDR) ]
Jul 26 21:29:35 11[NET] sending packet: from 37.81.185.253[4500] to 1.2.3.4[4500] (76 bytes)

I miss the path probing. Also I set retransmit_tries = 65534 with a retransmit_base = 1 to enforce keeping SA.
Some defaults have been patched for the same reason:

diff --git a/src/libcharon/sa/task_manager.h b/src/libcharon/sa/task_manager.h
index e7a6bf4..c916a73 100644
--- a/src/libcharon/sa/task_manager.h
+++ b/src/libcharon/sa/task_manager.h
@@ -49,12 +49,12 @@ typedef enum task_queue_t task_queue_t;
 /**
  * Interval for mobike routability checks in ms.
  */
-#define ROUTEABILITY_CHECK_INTERVAL 2500
+#define ROUTEABILITY_CHECK_INTERVAL 5000

 /**
  * Number of routability checks before giving up
  */
-#define ROUTEABILITY_CHECK_TRIES 10
+#define ROUTEABILITY_CHECK_TRIES 65534

 /**
  * Type of task queues the task manager uses to handle tasks

Please notice that I detected a error in IPv6 router advertisement in the setup, which resulted in no IPv6 on clients.

#7 Updated by Tobias Brunner over 6 years ago

In my opinion here lies the error:

Jul 26 21:29:35 06[KNL] 192.168.203.77 appeared on wlan0
Jul 26 21:29:35 11[IKE] sending address list update using MOBIKE
Jul 26 21:29:35 11[ENC] generating INFORMATIONAL request 100 [ N(ADD_4_ADDR) ]
Jul 26 21:29:35 11[NET] sending packet: from 37.81.185.253[4500] to 1.2.3.4[4500] (76 bytes)
I miss the path probing.

That's actually fine. At that point the IP address 37.81.185.253 is still installed and the client has a valid path to the server. So it just tries to notify the peer about the additional IP address, no path probing required.

The IP disappears shortly thereafter, though:

Jul 26 21:29:37 06[KNL] 37.81.185.253 disappeared from rmnet0

This is a situation the daemon currently does not handle well. If a MOBIKE exchange is initiated packets will be sent even if roam events later occur and indicate a lost connection. So the SA will eventually time out (depending on the retransmission settings this takes a while, but it will not recover in the mean time, unless the original path becomes available again).

I pushed some changes that try to fix this to the mobike-fixes branch of our repository (includes the patches above). Path probing is now enabled for the active MOBIKE task. I also added a check so that IPv6 addresses are not considered if no IPv6 socket is available. When retransmitting MOBIKE messages, available paths are re-checked and if none are available the current MOBIKE task is deferred (we already did that before queing such a task).

Since we use MOBIKE tasks also for DPD if MOBIKE is enabled this should fix the issue too when a DPD exchange gets stuck. But if other exchanges are currently active (e.g. a rekeying) this won't help. But that might be something we could consider in later updates, that is, use any IKE message for path probing and defer any exchange if no path is available - but that'd require moving parts of the MOBIKE task to the task manager.

What's also not handled very well is the case when the other peer already deleted the SA (e.g. due to DPD) when we finally can reach it again. Basically the SA has to go through the full retransmission (or path probing) cycle until it gets destroyed (and optionally recreated if dpdaction=restart is configured). If the responder would send back an INVALID_SPI notify, which strongSwan currently doesn't, the client could use that as indication to perhaps reduce the retransmission timeouts (since they are unauthenticated, we can't just destroy the SA, but terminating more quickly would be an option).

Please notice that I detected a error in IPv6 router advertisement in the setup, which resulted in no IPv6 on clients.

Please note that IPv6 addresses can't be used as outer tunnel addresses with kernel-libipsec when IPsec is handled by the Linux kernel on the other end. With kernel-libipsec we currently force UDP encapsulation so ESP packets can be sent over the IKE sockets. Unfortunately, the Linux kernel currently does not support UDP encapsulation for IPv6 (that's why the Android app has charon.plugins.socket-default.use_ipv6 disabled).

#8 Updated by Andre Valentin over 6 years ago

Tobias Brunner wrote:

In my opinion here lies the error:
[...] I miss the path probing.

That's actually fine. At that point the IP address 37.81.185.253 is still installed and the client has a valid path to the server. So it just tries to notify the peer about the additional IP address, no path probing required.

Okay, I understand.

The IP disappears shortly thereafter, though:

[...]

This is a situation the daemon currently does not handle well. If a MOBIKE exchange is initiated packets will be sent even if roam events later occur and indicate a lost connection. So the SA will eventually time out (depending on the retransmission settings this takes a while, but it will not recover in the mean time, unless the original path becomes available again).

I tried that but I could not get the old IP again..

I pushed some changes that try to fix this to the mobike-fixes branch of our repository (includes the patches above). Path probing is now enabled for the active MOBIKE task. I also added a check so that IPv6 addresses are not considered if no IPv6 socket is available. When retransmitting MOBIKE messages, available paths are re-checked and if none are available the current MOBIKE task is deferred (we already did that before queing such a task).

Thanks a lot for this. Strongswan is just compiling, I will run a test the next days and inform you.

Since we use MOBIKE tasks also for DPD if MOBIKE is enabled this should fix the issue too when a DPD exchange gets stuck. But if other exchanges are currently active (e.g. a rekeying) this won't help. But that might be something we could consider in later updates, that is, use any IKE message for path probing and defer any exchange if no path is available - but that'd require moving parts of the MOBIKE task to the task manager.

What's also not handled very well is the case when the other peer already deleted the SA (e.g. due to DPD) when we finally can reach it again. Basically the SA has to go through the full retransmission (or path probing) cycle until it gets destroyed (and optionally recreated if dpdaction=restart is configured). If the responder would send back an INVALID_SPI notify, which strongSwan currently doesn't, the client could use that as indication to perhaps reduce the retransmission timeouts (since they are unauthenticated, we can't just destroy the SA, but terminating more quickly would be an option).

That makes sense. I tested my setup with dpd enabled, but I did not help. Perhaps now..
But I do not like to enable DPD. DPD costs tansfer and battery power. With the nat keepalive, it's optimal. If strongswan detects NAT it sends keepalives. In the other case it doesn't.

Please notice that I detected a error in IPv6 router advertisement in the setup, which resulted in no IPv6 on clients.

Please note that IPv6 addresses can't be used as outer tunnel addresses with kernel-libipsec when IPsec is handled by the Linux kernel on the other end. With kernel-libipsec we currently force UDP encapsulation so ESP packets can be sent over the IKE sockets. Unfortunately, the Linux kernel currently does not support UDP encapsulation for IPv6 (that's why the Android app has charon.plugins.socket-default.use_ipv6 disabled).

I knew;-). I just wanted to tell you because my test setup has changed. Hopefully in future kernel will have that feature. But IPv4 is currently enough if it works.

Again I would like to thank you for your help and struggle.

#9 Updated by Andre Valentin over 6 years ago

Just a small update. Today with a lot of network changes the SA stood fine.
I monitored it and discovered no problems in connectivity:-)

But this, by the way, looked weired for me:

Jul 29 11:58:39 08[IKE] sending keep alive to A.B.C.D[4500]
Jul 29 12:20:47 08[IKE] sending keep alive to A.B.C.D[4500]

The NAT keepalive timer is set to 30s. Does this mean there was the whole time traffic so keepalive was not needed, or is it perhaps android specific(no native build yet)?

#10 Updated by Andre Valentin over 6 years ago

Currently still up und running! I have reachability problems from gateway because of the random NAT keep alives.
Therefore I am compiling a native version strongswan, perhaps this behave better.

I will keep you posted if there are errors, but it looks very good.

#11 Updated by Andre Valentin over 6 years ago

Still very promising. I rebuild it now as a native Android binary and it still works good.
But I have some issues:
-No DNS Lookup (always throws error)
-NAT Keep Alive is still unreliable (sometimes 10min without a packet)

loaded plugins: charon android-log openssl fips-prf random nonce pubkey pkcs1 pkcs8 pem xcbc hmac kernel-netlink kernel-libipsec socket-default android-dns stroke eap-identity eap-mschapv2 eap-md5 eap-gtc

I do not know if this should be part of this ticket. Perhaps you have an idea about this?
Of course I can provide detailed logs if needed.

Kind regards,

André

#12 Updated by Tobias Brunner over 6 years ago

Thanks for the feedback.

-No DNS Lookup (always throws error)

What exactly do you mean? Does strongSwan complain? Or other applications? Do you push DNS servers to the client?

-NAT Keep Alive is still unreliable (sometimes 10min without a packet)

They are sent only when no outbound traffic (IKE or ESP) was recorded in the last keep-alive interval (20 seconds by default).

By the way, with this plugin order: ...kernel-netlink kernel-libipsec... the IPsec implementation in the Linux kernel will be used, the kernel-libipsec plugin will have no effect (seems your kernel has all the required modules for this to work).

Edit: When you say "no native build", what do you mean? What other kind of build is there?

#13 Updated by Andre Valentin over 6 years ago

Hi Tobias,

thanks for taking a look.

-No DNS Lookup (always throws error)

What exactly do you mean? Does strongSwan complain? Or other applications? Do you push DNS servers to the client?

I'am using a VPN server with a dynamic IP. In past I have used DNS for this. So the error is, that charon cannot lookup the DNS Name I inserted in the left field. There is no info why the lookup did not work, or even a hint. I do not use the DNS Servers pushed py the VPN Gateway. I took a look at the source and could not find any android specific lookup changes. Perhaps it is not supported ?
If not, I will take a look at the mediation support.

The android-dns plugin in my module list is a left over because I do not need it.

-NAT Keep Alive is still unreliable (sometimes 10min without a packet)

They are sent only when no outbound traffic (IKE or ESP) was recorded in the last keep-alive interval (20 seconds by default).

Okay, I will take a look with tcpdump. I thought the traffic on the SAs does not count.

By the way, with this plugin order: ...kernel-netlink kernel-libipsec... the IPsec implementation in the Linux kernel will be used, the kernel-libipsec plugin will have no effect (seems your kernel has all the required modules for this to work).

YES, I was very delighted when I saw that the "original" kernel has support for netkey:-) So for now kernel-libipsec is just a fallback if I roll it out later.

Another point is that the client should have a different MTU on those routes. Perhaps I will take a look if I can change the code to allow this. Do you have any hints?
I thought of the 2 following configurable constants:
route_mtu_esp_encap = 1400
route_mtu_udp_encap = 1300
If these are not set, the routes will be generated without MTU, like now. If set, it will set the MTU on these routes. I noticed that I needed MSSFIX on my VPN-GW. I did that now but I think this should be handled on both sites.
As you once stated, the routes have now assignment to the connection, so there's only the possibility to work with a general setting.

Kind regards,

André

#14 Updated by Tobias Brunner over 6 years ago

-No DNS Lookup (always throws error)

What exactly do you mean? Does strongSwan complain? Or other applications? Do you push DNS servers to the client?

I'am using a VPN server with a dynamic IP. In past I have used DNS for this. So the error is, that charon cannot lookup the DNS Name I inserted in the left field.

left? Are you referring to the option you set on the server? Or do you configure left on the client? Or do your rather mean right on the client? If so, does nslookup <hostname> work on the client (without VPN)?

-NAT Keep Alive is still unreliable (sometimes 10min without a packet)

They are sent only when no outbound traffic (IKE or ESP) was recorded in the last keep-alive interval (20 seconds by default).

Okay, I will take a look with tcpdump. I thought the traffic on the SAs does not count.

No it does. The idea being that it doesn't really matter what packets are sent to keep the NAT mapping alive (IKE and ESP are both sent from UDP port 4500 in case a NAT is detected).

Another point is that the client should have a different MTU on those routes. Perhaps I will take a look if I can change the code to allow this. Do you have any hints?

We currently don't do anything related to the MTU or MSS and let users deal with it themselves. Due to PMTUD and the kernel's automatic MSS derivation many setups automatically work as intended (but PMTUD seems to be prone to get broken easily due to firewalls blocking ICMPs etc.). You could try the attached patches to set a global MTU and/or MSS for the installed routes.

#15 Updated by Andre Valentin over 6 years ago

Tobias Brunner wrote:

-No DNS Lookup (always throws error)

What exactly do you mean? Does strongSwan complain? Or other applications? Do you push DNS servers to the client?

I'am using a VPN server with a dynamic IP. In past I have used DNS for this. So the error is, that charon cannot lookup the DNS Name I inserted in the left field.

left? Are you referring to the option you set on the server? Or do you configure left on the client? Or do your rather mean right on the client? If so, does nslookup <hostname> work on the client (without VPN)?

Aehm. Of course right. With the original ping tool (from firmware) resolving works:

root@klte:/system/etc # ping www.heise.de
PING www.heise.de (193.99.144.85) 56(84) bytes of data.

nslookup is from busybox and it only works if I create a resolv.conf in /etc/etc
root@klte:/system/etc # nslookup www.heise.de                                  
Server:    217.14.160.130
Address 1: 217.14.160.130 rns1.marcant.net

Name:      www.heise.de
Address 1: 2a02:2e0:3fe:1001:7777:772e:2:85 www.heise.de
Address 2: 193.99.144.85 www.heise.de

This is what I get in the log:
08-04 22:32:41.325 I/charon  (15141): 14[CFG] received stroke: initiate 'kal'
08-04 22:32:41.325 I/charon  (15141): 17[LIB] resolving 'kal.XX.net' failed: No address associated with hostname
08-04 22:32:41.325 I/charon  (15141): 12[IKE] unable to resolve kal.XX.net,0.0.0.0/0,::/0, initiate aborted

I can't get it more verbose through ipsec.conf setting charondebug="lib 2,knl 2". Other strongswan.d/... log settings do also not to impact the log level. But this is another thing.
Do you have an idea where to look?

-NAT Keep Alive is still unreliable (sometimes 10min without a packet)

They are sent only when no outbound traffic (IKE or ESP) was recorded in the last keep-alive interval (20 seconds by default).

Okay, I will take a look with tcpdump. I thought the traffic on the SAs does not count.

No it does. The idea being that it doesn't really matter what packets are sent to keep the NAT mapping alive (IKE and ESP are both sent from UDP port 4500 in case a NAT is detected).

You're right, I verified it that it works as written. Thanks for the hint!

Another point is that the client should have a different MTU on those routes. Perhaps I will take a look if I can change the code to allow this. Do you have any hints?

We currently don't do anything related to the MTU or MSS and let users deal with it themselves. Due to PMTUD and the kernel's automatic MSS derivation many setups automatically work as intended (but PMTUD seems to be prone to get broken easily due to firewalls blocking ICMPs etc.).

In a normal case with 2 peers directly connected to the uplink this works perfect.
But in real life, as a network admin also configuring VPNs, I have to say that a working path MTU is a phantasy :-( . Therefore I always use iptables to do the job with MSSFIX, especially for road warriors. (Hotel LAN, Office, Homeoffice, Mobile,..)

I think it would be a good idea to make this somehow connection dependent. Then you can configure normal site-to-site tunnels and roadwarrior vpn dialins without the need of extra iptables. Just an idea. For me a solution with iptables is simple, but it raises complexity.

You could try the attached patches to set a global MTU and/or MSS for the installed routes.

Wow, thank you for that quick patch. I already applied them. Works as expexted, I love it :-)

#16 Updated by Andre Valentin over 6 years ago

Sorry to bother, but take a look at this (dump at a router in between):

22:49:01.397507 IP 192.168.203.77.4500 > 211.123.168.23.4500: UDP-encap: ESP(spi=0xcaacaf97,seq=0xd1), length 132
22:49:01.399985 IP 192.168.203.77.4500 > 211.123.168.23.4500: UDP-encap: ESP(spi=0xcaacaf97,seq=0xd2), length 132

22:50:45.714451 IP 211.123.168.23.4500 > 192.168.203.77.4500: UDP-encap: ESP(spi=0xc15b2ca1,seq=0xc0), length 260
22:50:45.714478 IP 211.123.168.23.4500 > 192.168.203.77.4500: UDP-encap: ESP(spi=0xc15b2ca1,seq=0xc1), length 260

And this

08-04 22:47:37.055 I/charon  (15718): 14[IKE] sending keep alive to 211.123.168.23[4500]
08-04 22:47:57.055 I/charon  (15718): 08[IKE] sending keep alive to 211.123.168.23[4500]

08-04 22:52:26.496 I/charon  (15718): 12[IKE] sending keep alive to 211.123.168.23[4500]
08-04 22:52:46.496 I/charon  (15718): 13[IKE] sending keep alive to 211.123.168.23[4500]

If the display is active, the interval between data/keepalive is exactly 20s. With display of. I have these lags. If you think this has nothing to do with strongswan, just tell. I only want to locate the problem.

#17 Updated by Andre Valentin over 6 years ago

DNS Resolution by the way can be fixed if charon runs with

charon {
    user = vpn
    group = inet
}

At least on my system.

#18 Updated by Tobias Brunner over 6 years ago

I can't get it more verbose through ipsec.conf setting charondebug="lib 2,knl 2". Other strongswan.d/... log settings do also not to impact the log level. But this is another thing.

The log level of the android-log plugin can be set via charon.plugins.android_log.loglevel in strongswan.conf. Since it's a single level for all subsystems this will result in quite verbose logs.

If the display is active, the interval between data/keepalive is exactly 20s. With display of. I have these lags. If you think this has nothing to do with strongswan, just tell. I only want to locate the problem.

It definitely could be that Android puts processes to sleep when the display is off (to safe battery power). Maybe it's possible to start charon in a way to prevent that, not sure how though. Perhaps registering it as service in init.rc would help.

DNS Resolution by the way can be fixed if charon runs with

charon {
    user = vpn
    group = inet
}

Interesting. We actually hard coded the user to vpn (AID_VPN) already (i.e. charon.user has no effect on Android). Something similar could probably also be done for the group (e.g. set it to inet (AID_INET)). Setting the user was originally done for the deprecated frontend patch (Samsung actually based their strongSwan integration in the S5 on it), which required charon to be registered in init.rc (patch is on that page) where only the group is set not the user. I saw that in the current sources racoon is now configured in that file with the vpn, net_admin and inet groups.

#19 Updated by Andre Valentin about 6 years ago

Hi!

I'am really happy with the solutions. So from my point of view this can be closed.

Thanks a lot for your help and hints,

André

#20 Updated by Tobias Brunner about 6 years ago

  • Status changed from Feedback to Closed
  • Resolution set to Won't fix

#21 Updated by Tobias Brunner about 6 years ago

  • Resolution changed from Won't fix to Fixed

Also available in: Atom PDF