Project

General

Profile

Issue #2829

IKE_SA is deleted when rekey retry

Added by Xiaoqiang Fu almost 2 years ago. Updated over 1 year ago.

Status:
Closed
Priority:
Normal
Category:
configuration
Affected version:
5.6.1
Resolution:
No change required

Description

Hi,

IKE_SA is deleted in such flow:
1. both side schedule rekey CHILD_SA;
2. rekey successfully, schedule delete old CHILD_SA, both side status is REKEYED => DELETING;
3. local initiate IKE_SA rekey, receive TEMPORARY_FAILURE, rekey failed, retry after 14s;
4. local delete IKE_SA, no rekey retried.

Whether it is an issue?

Local log:

    Line 365634: Nov 20 14:31:18 ei-1 charon[1724]: 14[CHD] CHILD_SA heatipv6~heatipv6{50} state change: INSTALLING => INSTALLED
    Line 365635: Nov 20 14:31:18 ei-1 charon[1724]: 14[IKE] outbound CHILD_SA heatipv6~heatipv6{50} established with SPIs c142ebe6_i c3cbacbd_o and TS 9900:1010:1010::10/128 === 8800:1010:1010::10/128
    Line 365636: Nov 20 14:31:18 ei-1 charon[1724]: 14[IKE] outbound CHILD_SA heatipv6~heatipv6{50} established with SPIs c142ebe6_i c3cbacbd_o and TS 9900:1010:1010::10/128 === 8800:1010:1010::10/128
    Line 365637: Nov 20 14:31:18 ei-1 charon[1724]: 14[CHD] CHILD_SA heatipv6~heatipv6{49} state change: REKEYING => REKEYED
    Line 365638: Nov 20 14:31:18 ei-1 charon[1724]: 14[IKE] closing CHILD_SA heatipv6~heatipv6{49} with SPIs c79444cf_i (0 bytes) cc065073_o (0 bytes) and TS 9900:1010:1010::10/128 === 8800:1010:1010::10/128
    Line 365639: Nov 20 14:31:18 ei-1 charon[1724]: 14[IKE] closing CHILD_SA heatipv6~heatipv6{49} with SPIs c79444cf_i (0 bytes) cc065073_o (0 bytes) and TS 9900:1010:1010::10/128 === 8800:1010:1010::10/128
    Line 365640: Nov 20 14:31:18 ei-1 charon[1724]: 14[IKE] sending DELETE for ESP CHILD_SA with SPI c79444cf
    Line 365641: Nov 20 14:31:18 ei-1 charon[1724]: 14[CHD] CHILD_SA heatipv6~heatipv6{49} state change: REKEYED => DELETING
    Line 365642: Nov 20 14:31:18 ei-1 charon[1724]: 04[IKE] received DELETE for ESP CHILD_SA with SPI cc065073
    Line 365644: Nov 20 14:31:18 ei-1 charon[1724]: 04[IKE] CHILD_SA closed
    Line 365672: Nov 20 14:31:20 ei-1 charon[1724]: 16[IKE] initiating IKE_SA heatipv6~heatipv6[23] to 2a00:8a00:8000:134::d:b506
    Line 365673: Nov 20 14:31:20 ei-1 charon[1724]: 16[IKE] initiating IKE_SA heatipv6~heatipv6[23] to 2a00:8a00:8000:134::d:b506
    Line 365674: Nov 20 14:31:20 ei-1 charon[1724]: 16[LIB] size of DH secret exponent: 1535 bits
    Line 365675: Nov 20 14:31:20 ei-1 charon[1724]: 16[CFG] configured proposals: IKE:AES_CBC_256/HMAC_MD5_96/PRF_HMAC_MD5/MODP_1536
    Line 365676: Nov 20 14:31:20 ei-1 charon[1724]: 06[IKE] received TEMPORARY_FAILURE notify error
    Line 365677: Nov 20 14:31:20 ei-1 charon[1724]: 06[IKE] IKE_SA rekeying failed, trying again in 14 seconds
    Line 365705: Nov 20 14:31:22 ei-1 charon[1724]: 13[IKE] peer initiated rekeying, but a child is half-open
    Line 365706: Nov 20 14:31:23 ei-1 charon[1724]: destroy_child_sa,
    Line 365708: Nov 20 14:31:23 ei-1 charon[1724]: spi=0xc79444cf,current->get_spi=0xc79444cf
    Line 365709: Nov 20 14:31:23 ei-1 charon[1724]: 06[CHD] CHILD_SA heatipv6~heatipv6{49} state change: DELETING => DESTROYING
    Line 365738: Nov 20 14:31:30 ei-1 charon[1724]: 06[CFG] vici client 65 connected
    Line 365739: Nov 20 14:31:30 ei-1 charon[1724]: 09[CFG] vici client 65 registered for: list-sa
    Line 365740: Nov 20 14:31:30 ei-1 charon[1724]: 13[CFG] vici client 65 requests: list-sas
    Line 365741: Nov 20 14:31:30 ei-1 charon[1724]: 09[CFG] vici client 65 disconnected
    Line 365751: Nov 20 14:31:31 ei-1 charon[1724]: 15[CFG] Entered alert, alert is 25, ike sa is 0x7fffa4004e10
    Line 365758: Nov 20 14:31:31 ei-1 charon[1724]: 15[IKE] deleting IKE_SA heatipv6~heatipv6[21] between 2a00:8a00:8000:5002:0:15:0:c[2a00:8a00:8000:5002:0:15:0:c]...2a00:8a00:8000:134::d:b506[2a00:8a00:8000:134::d:b506]
    Line 365759: Nov 20 14:31:31 ei-1 charon[1724]: 15[IKE] deleting IKE_SA heatipv6~heatipv6[21] between 2a00:8a00:8000:5002:0:15:0:c[2a00:8a00:8000:5002:0:15:0:c]...2a00:8a00:8000:134::d:b506[2a00:8a00:8000:134::d:b506]
    Line 365760: Nov 20 14:31:31 ei-1 charon[1724]: 15[IKE] sending DELETE for IKE_SA heatipv6~heatipv6[21]
    Line 365761: Nov 20 14:31:31 ei-1 charon[1724]: 04[IKE] IKE_SA deleted
    Line 365762: Nov 20 14:31:31 ei-1 charon[1724]: 04[IKE] IKE_SA deleted

remote log:

Nov 20 06:31:18 vfpc-hzdcsp-5 charon[32421]: 13[CHD] CHILD_SA heatipv6~heatipv6{75} state change: INSTALLING => INSTALLED
Nov 20 06:31:18 vfpc-hzdcsp-5 charon[32421]: 13[IKE] inbound CHILD_SA heatipv6~heatipv6{75} established with SPIs c3cbacbd_i c142ebe6_o and TS 8800:1010:1010::10/128 === 9900:1010:1010::10/128
Nov 20 06:31:18 vfpc-hzdcsp-5 charon[32421]: 13[IKE] inbound CHILD_SA heatipv6~heatipv6{75} established with SPIs c3cbacbd_i c142ebe6_o and TS 8800:1010:1010::10/128 === 9900:1010:1010::10/128
Nov 20 06:31:18 vfpc-hzdcsp-5 charon[32421]: 13[CHD] CHILD_SA heatipv6~heatipv6{74} state change: INSTALLED => REKEYING
Nov 20 06:31:18 vfpc-hzdcsp-5 charon[32421]: 13[CHD] CHILD_SA heatipv6~heatipv6{74} state change: REKEYING => REKEYED
Nov 20 06:31:18 vfpc-hzdcsp-5 charon[32421]: 16[IKE] received DELETE for ESP CHILD_SA with SPI c79444cf
Nov 20 06:31:18 vfpc-hzdcsp-5 charon[32421]: 16[IKE] closing CHILD_SA heatipv6~heatipv6{74} with SPIs cc065073_i (0 bytes) c79444cf_o (0 bytes) and TS 8800:1010:1010::10/128 === 9900:1010:1010::10/128
Nov 20 06:31:18 vfpc-hzdcsp-5 charon[32421]: 16[IKE] closing CHILD_SA heatipv6~heatipv6{74} with SPIs cc065073_i (0 bytes) c79444cf_o (0 bytes) and TS 8800:1010:1010::10/128 === 9900:1010:1010::10/128
Nov 20 06:31:18 vfpc-hzdcsp-5 charon[32421]: 16[IKE] sending DELETE for ESP CHILD_SA with SPI cc065073
Nov 20 06:31:18 vfpc-hzdcsp-5 charon[32421]: 16[CHD] CHILD_SA heatipv6~heatipv6{74} state change: REKEYED => DELETING
Nov 20 06:31:18 vfpc-hzdcsp-5 charon[32421]: 16[IKE] CHILD_SA closed
Nov 20 06:31:18 vfpc-hzdcsp-5 charon[32421]: 16[CHD] adding outbound ESP SA
Nov 20 06:31:18 vfpc-hzdcsp-5 charon[32421]: 16[CHD]   SPI 0xc142ebe6, src 2a00:8a00:8000:134::d:b506 dst 2a00:8a00:8000:5002:0:15:0:c
Nov 20 06:31:18 vfpc-hzdcsp-5 charon[32421]: 16[CHD] CHILD_SA heatipv6~heatipv6{75} state change: INSTALLED => INSTALLED
Nov 20 06:31:18 vfpc-hzdcsp-5 charon[32421]: 16[IKE] outbound CHILD_SA heatipv6~heatipv6{75} established with SPIs c3cbacbd_i c142ebe6_o and TS 8800:1010:1010::10/128 === 9900:1010:1010::10/128
Nov 20 06:31:18 vfpc-hzdcsp-5 charon[32421]: 16[IKE] outbound CHILD_SA heatipv6~heatipv6{75} established with SPIs c3cbacbd_i c142ebe6_o and TS 8800:1010:1010::10/128 === 9900:1010:1010::10/128
Nov 20 06:31:20 vfpc-hzdcsp-5 charon[32421]: 14[IKE] peer initiated rekeying, but a child is half-open
Nov 20 06:31:22 vfpc-hzdcsp-5 charon[32421]: 11[IKE] initiating IKE_SA heatipv6~heatipv6[35] to 2a00:8a00:8000:5002:0:15:0:c
Nov 20 06:31:22 vfpc-hzdcsp-5 charon[32421]: 11[IKE] initiating IKE_SA heatipv6~heatipv6[35] to 2a00:8a00:8000:5002:0:15:0:c
Nov 20 06:31:22 vfpc-hzdcsp-5 charon[32421]: 11[CFG] configured proposals: IKE:AES_CBC_256/HMAC_MD5_96/PRF_HMAC_MD5/MODP_1536
Nov 20 06:31:22 vfpc-hzdcsp-5 charon[32421]: 09[IKE] received TEMPORARY_FAILURE notify error
Nov 20 06:31:22 vfpc-hzdcsp-5 charon[32421]: 09[IKE] IKE_SA rekeying failed, trying again in 14 seconds
Nov 20 06:31:23 vfpc-hzdcsp-5 charon[32421]: 05[CHD] CHILD_SA heatipv6~heatipv6{74} state change: DELETING => DESTROYING
Nov 20 06:31:31 vfpc-hzdcsp-5 charon[32421]: 12[IKE] received DELETE for IKE_SA heatipv6~heatipv6[34]
Nov 20 06:31:31 vfpc-hzdcsp-5 charon[32421]: 12[IKE] deleting IKE_SA heatipv6~heatipv6[34] between 2a00:8a00:8000:134::d:b506[2a00:8a00:8000:134::d:b506]...2a00:8a00:8000:5002:0:15:0:c[2a00:8a00:8000:5002:0:15:0:c]
Nov 20 06:31:31 vfpc-hzdcsp-5 charon[32421]: 12[IKE] deleting IKE_SA heatipv6~heatipv6[34] between 2a00:8a00:8000:134::d:b506[2a00:8a00:8000:134::d:b506]...2a00:8a00:8000:5002:0:15:0:c[2a00:8a00:8000:5002:0:15:0:c]
Nov 20 06:31:31 vfpc-hzdcsp-5 charon[32421]: 12[IKE] IKE_SA deleted
Nov 20 06:31:31 vfpc-hzdcsp-5 charon[32421]: 12[IKE] IKE_SA deleted
Nov 20 06:31:31 vfpc-hzdcsp-5 charon[32421]: 12[CHD] CHILD_SA heatipv6~heatipv6{75} state change: INSTALLED => DESTROYING

History

#1 Updated by Tobias Brunner almost 2 years ago

  • Category set to configuration
  • Status changed from New to Feedback

3. local initiate IKE_SA rekey, receive TEMPORARY_FAILURE, rekey failed, retry after 14s;
4. local delete IKE_SA, no rekey retried.

That's probably because your rekey settings are too narrow, i.e. they don't allow for this 14 delay and the SA expires before that (see ExpiryRekey).

#2 Updated by Xiaoqiang Fu almost 2 years ago

I found that there is a job queued for DELETE SA when REKEYED => DELETING, the default time value is 5s, before reaching 5s, TEMPORARY_FAILURE and schedule a 14s delay. When 5s is over, SA is deleted, when readch 14s delay there is no SA to retry.

Do you think it is an issue?

#3 Updated by Tobias Brunner almost 2 years ago

I found that there is a job queued for DELETE SA when REKEYED => DELETING, the default time value is 5s, before reaching 5s, TEMPORARY_FAILURE and schedule a 14s delay. When 5s is over, SA is deleted, when readch 14s delay there is no SA to retry.

Do you think it is an issue?

No, that's not related, that delete job is for the old CHILD_SA (only the inbound SA to be exact). That has nothing to do the the delay to rekey the IKE_SA (the IKE_SA is deleted by an ike_delete_job_t queued when the SA got established). If your IKE_SA rekey settings are too narrow, you need to change that (i.e. if the chances are high that the time between triggering the rekeying and expiring the IKE_SA is lower than 14 seconds). Again, refer to the link above for the settings and the formula used to calculate the rekeying times.

#4 Updated by Xiaoqiang Fu almost 2 years ago

No, that's not related, that delete job is for the old CHILD_SA (only the inbound SA to be exact). That has nothing to do the the delay to rekey the IKE_SA (the IKE_SA is deleted by an ike_delete_job_t queued when the SA got established).

Yes, you are right. The delete job is for CHILD_SA.

If your IKE_SA rekey settings are too narrow, you need to change that (i.e. if the chances are high that the time between triggering the rekeying and expiring the IKE_SA is lower than 14 seconds). Again, refer to the link above for the settings and the formula used to calculate the rekeying times.

After check with ExpiryRekey, and conbine with rekey retry delay calc fomula

retry = RETRY_INTERVAL - (random() % RETRY_JITTER);

The rekey time is best 15s before SA expire.
According to the fomula
margintime + random(0, margintime * rekeyfuzz)
rekeyfuzz = 50% // my setting

margintime MUST be larther than 15s, right?

Could you kindly check my analysis?

#5 Updated by Tobias Brunner almost 2 years ago

After check with ExpiryRekey, and conbine with rekey retry delay calc fomula
[...]
The rekey time is best 15s before SA expire.

Yes, 15 seconds is the maximum retry delay. But the rekeying itself would also take some time afterwards (in particular, if there e.g. are retransmissions or even another collision) so it should be quite a bit more.

According to the fomula
[...]
margintime MUST be larther than 15s, right?

Yes, but make sure 1.5 * margintime (with rekeyfuzz=50%) does not equal or exceed ikelifetime.

#6 Updated by Xiaoqiang Fu almost 2 years ago

Thanks for your reminder.

But the rekeying itself would also take some time afterwards (in particular, if there e.g. are retransmissions or even another collision) so it should be quite a bit more.

Could you give a suggestion for a proper margin time from your expertee insight? Currently, the lifetime is restricted to 60s minimum. 20s is the smallest margintime to avoid the conflict between expire and retry, at least at the first time.

#7 Updated by Tobias Brunner almost 2 years ago

But the rekeying itself would also take some time afterwards (in particular, if there e.g. are retransmissions or even another collision) so it should be quite a bit more.

Could you give a suggestion for a proper margin time from your expertee insight? Currently, the lifetime is restricted to 60s minimum. 20s is the smallest margintime to avoid the conflict between expire and retry, at least at the first time.

That depends, for instance, on the retransmission settings. Using a lifetime as low as 60s doesn't make sense in practice (networks are unstable, collisions occur). Why set it so low?

#8 Updated by Xiaoqiang Fu almost 2 years ago

Why set it so low?

Yes, we are trying to restrict it to a larger time interval. Could you give a practical time from your insight?

#9 Updated by Tobias Brunner almost 2 years ago

Yes, we are trying to restrict it to a larger time interval. Could you give a practical time from your insight?

I don't have any explicit recommendations. It's pretty much an individual choice depending on the use case/scenario and other settings and conditions.

#10 Updated by Tobias Brunner over 1 year ago

  • Status changed from Feedback to Closed
  • Assignee set to Tobias Brunner
  • Resolution set to No change required

Also available in: Atom PDF