Project

General

Profile

Bug #891

[eap_radius] AcctInputOctets increase twice in 5 seconds and decrease to normal after 5 seconds.

Added by bronze man over 5 years ago. Updated about 5 years ago.

Status:
Closed
Priority:
Normal
Category:
libcharon
Target version:
Start date:
14.03.2015
Due date:
Estimated time:
Affected version:
5.2.2
Resolution:
Fixed

Description

I used to see some AcctInputOctets decrease error on my server's log after 37 hours of the strongswan servers start.

This kind of bug happens somethings after hours of running of strongswan servers.

Here is the log I got from my servers:

{"Cat":"RadiusRecord","Time":"2015-03-14T23:08:35+08:00","Data":[{"AcctInputOctets":"355483","AcctInputPackets":"3770","AcctOutputOctets":"1574243","AcctOutputPackets":"3105","AcctSessionId":"1426212679-2666","AcctSessionTime":"1420","AcctStatusType":"InterimUpdate","CalledStationId":"10.60.153.62[4500]","CallingStationId":"xxx.xxx.xxx.xxx[xxx]","Code":"AccountingRequest","FramedIPAddress":"172.20.2.88","Identifier":"217","NASIPAddress":"10.60.153.62","NASIdentifier":"strongSwan","NASPort":"2666","NASPortId":"ios_ikev2_psk","NASPortType":"Virtual","ServiceType":"Framed","UserName":"xSxx"}]}
{"Cat":"RadiusRecord","Time":"2015-03-14T23:08:40+08:00","Data":[{"AcctInputOctets":"364547","AcctInputPackets":"3883","AcctOutputOctets":"1710271","AcctOutputPackets":"3239","AcctSessionId":"1426212679-2666","AcctSessionTime":"1425","AcctStatusType":"InterimUpdate","CalledStationId":"10.60.153.62[4500]","CallingStationId":"xxx.xxx.xxx.xxx[xxx]","Code":"AccountingRequest","FramedIPAddress":"172.20.2.88","Identifier":"166","NASIPAddress":"10.60.153.62","NASIdentifier":"strongSwan","NASPort":"2666","NASPortId":"ios_ikev2_psk","NASPortType":"Virtual","ServiceType":"Framed","UserName":"xSxx"}]}
{"Cat":"RadiusRecord","Time":"2015-03-14T23:08:45+08:00","Data":[{"AcctInputOctets":"369369","AcctInputPackets":"3934","AcctOutputOctets":"1754406","AcctOutputPackets":"3279","AcctSessionId":"1426212679-2666","AcctSessionTime":"1430","AcctStatusType":"InterimUpdate","CalledStationId":"10.60.153.62[4500]","CallingStationId":"xxx.xxx.xxx.xxx[xxx]","Code":"AccountingRequest","FramedIPAddress":"172.20.2.88","Identifier":"226","NASIPAddress":"10.60.153.62","NASIdentifier":"strongSwan","NASPort":"2666","NASPortId":"ios_ikev2_psk","NASPortType":"Virtual","ServiceType":"Framed","UserName":"xSxx"}]}
{"Cat":"RadiusRecord","Time":"2015-03-14T23:08:50+08:00","Data":[{"AcctInputOctets":"369421","AcctInputPackets":"3935","AcctOutputOctets":"1754406","AcctOutputPackets":"3279","AcctSessionId":"1426212679-2666","AcctSessionTime":"1435","AcctStatusType":"InterimUpdate","CalledStationId":"10.60.153.62[4500]","CallingStationId":"xxx.xxx.xxx.xxx[xxx]","Code":"AccountingRequest","FramedIPAddress":"172.20.2.88","Identifier":"27","NASIPAddress":"10.60.153.62","NASIdentifier":"strongSwan","NASPort":"2666","NASPortId":"ios_ikev2_psk","NASPortType":"Virtual","ServiceType":"Framed","UserName":"xSxx"}]}
{"Cat":"RadiusRecord","Time":"2015-03-14T23:08:55+08:00","Data":[{"AcctInputOctets":"371088","AcctInputPackets":"3940","AcctOutputOctets":"1755446","AcctOutputPackets":"3283","AcctSessionId":"1426212679-2666","AcctSessionTime":"1440","AcctStatusType":"InterimUpdate","CalledStationId":"10.60.153.62[4500]","CallingStationId":"xxx.xxx.xxx.xxx[xxx]","Code":"AccountingRequest","FramedIPAddress":"172.20.2.88","Identifier":"31","NASIPAddress":"10.60.153.62","NASIdentifier":"strongSwan","NASPort":"2666","NASPortId":"ios_ikev2_psk","NASPortType":"Virtual","ServiceType":"Framed","UserName":"xSxx"}]}
{"Cat":"RadiusRecord","Time":"2015-03-14T23:09:00+08:00","Data":[{"AcctInputOctets":"742956","AcctInputPackets":"7895","AcctOutputOctets":"3565404","AcctOutputPackets":"6608","AcctSessionId":"1426212679-2666","AcctSessionTime":"1445","AcctStatusType":"InterimUpdate","CalledStationId":"10.60.153.62[4500]","CallingStationId":"xxx.xxx.xxx.xxx[xxx]","Code":"AccountingRequest","FramedIPAddress":"172.20.2.88","Identifier":"189","NASIPAddress":"10.60.153.62","NASIdentifier":"strongSwan","NASPort":"2666","NASPortId":"ios_ikev2_psk","NASPortType":"Virtual","ServiceType":"Framed","UserName":"xSxx"}]}
{"Cat":"RadiusRecord","Time":"2015-03-14T23:09:05+08:00","Data":[{"AcctInputOctets":"372016","AcctInputPackets":"3956","AcctOutputOctets":"1801182","AcctOutputPackets":"3319","AcctSessionId":"1426212679-2666","AcctSessionTime":"1450","AcctStatusType":"InterimUpdate","CalledStationId":"10.60.153.62[4500]","CallingStationId":"xxx.xxx.xxx.xxx[xxx]","Code":"AccountingRequest","FramedIPAddress":"172.20.2.88","Identifier":"39","NASIPAddress":"10.60.153.62","NASIdentifier":"strongSwan","NASPort":"2666","NASPortId":"ios_ikev2_psk","NASPortType":"Virtual","ServiceType":"Framed","UserName":"xSxx"}]}
{"Cat":"error","Time":"2015-03-14T23:09:05+08:00","Data":["[radius.Accounting Update] connId 1426212679-2666_2666 toIncr less 0, in:372016 out:1801182 last:4308360"]}
{"Cat":"RadiusRecord","Time":"2015-03-14T23:09:06+08:00","Data":[{"AcctInputOctets":"372016","AcctInputPackets":"3956","AcctOutputOctets":"1801182","AcctOutputPackets":"3319","AcctSessionId":"1426212679-2666","AcctSessionTime":"1451","AcctStatusType":"Stop","AcctTerminateCause":"UserRequest","CalledStationId":"10.60.153.62[4500]","CallingStationId":"xxx.xxx.xxx.xxx[xxx]","Code":"AccountingRequest","FramedIPAddress":"172.20.2.88","Identifier":"41","NASIPAddress":"10.60.153.62","NASIdentifier":"strongSwan","NASPort":"2666","NASPortId":"ios_ikev2_psk","NASPortType":"Virtual","ServiceType":"Framed","UserName":"xSxx"}]}
{"Cat":"error","Time":"2015-03-14T23:09:06+08:00","Data":["[radius.Accounting Stop] connId 1426212679-2666_2666 toIncr less than 0, in:372016 out:1801182 last:4308360"]}

Associated revisions

Revision b29c5c0f
Added by Tobias Brunner about 5 years ago

Merge branch 'accounting-fixes'

This fixes the usage stats reported via RADIUS Accounting in several
corner cases (e.g. with interim updates while rekeying a CHILD_SA).

Fixes #891.

History

#1 Updated by bronze man over 5 years ago

Here is the strongswan log of connection Id 2666 at the same time.

Mar 14 22:44:55 11[IKE] <ios_ikev2_psk|2666> IKE_SA ios_ikev2_psk[2666] established between 10.60.153.62[xxx.xxx.xxx.143]...xxx.xxx.xxx.145[xSxxx]
Mar 14 22:44:55 11[IKE] <ios_ikev2_psk|2666> CHILD_SA ios_ikev2_psk{2047} established with SPIs cbcd8a9a_i 07a12430_o and TS 0.0.0.0/0 === 172.20.2.88/32
Mar 14 23:08:57 12[IKE] <ios_ikev2_psk|2666> CHILD_SA ios_ikev2_psk{2047} established with SPIs cedc6235_i 052cb296_o and TS 0.0.0.0/0 === 172.20.2.88/32
Mar 14 23:09:00 14[IKE] <ios_ikev2_psk|2666> closing CHILD_SA ios_ikev2_psk{2047} with SPIs cbcd8a9a_i (371764 bytes) 07a12430_o (1779902 bytes) and TS 0.0.0.0/0 === 172.20.2.88/32
Mar 14 23:09:01 02[IKE] <ios_ikev2_psk|2666> CHILD_SA ios_ikev2_psk{2047} established with SPIs c0e451e5_i 03c79c6b_o and TS 0.0.0.0/0 === 172.20.2.88/32
Mar 14 23:09:01 03[IKE] <ios_ikev2_psk|2666> closing CHILD_SA ios_ikev2_psk{2047} with SPIs cedc6235_i (860 bytes) 052cb296_o (21228 bytes) and TS 0.0.0.0/0 === 172.20.2.88/32
Mar 14 23:09:06 01[IKE] <ios_ikev2_psk|2666> deleting IKE_SA ios_ikev2_psk[2666] between 10.60.153.62[xxx.xxx.xxx.143]...xxx.xxx.xxx.145[xSxxx]
Mar 14 23:09:06 15[IKE] <ios_ikev2_psk|2666> IKE_SA deleted

I think CHILD_SA stuff cause this bug.

#2 Updated by bronze man over 5 years ago

version:
Linux strongSwan U5.2.2/K3.13.0-36-generic

I think the AcctInputOctets should not decrease during the Interim Update.

Event list:
a IKE_SA established at 22:44:55
a CHILD_SA established at 23:08:57
The AcctInputOctets increase from 371088 to 742956 at 2015-03-14T23:09:00.

a CHILD_SA closing at 23:09:00
The AcctInputOctets decrease from 742956 to 372016 at 2015-03-14T23:09:05.
a CHILD_SA established at 23:09:01
a CHILD_SA closing at 23:09:01

My radius server notice that at 2015-03-14T23:09:05, and kill the client with vici at 2015-03-14T23:09:06.
IKE_SA deleted at 23:09:06

https://www.ietf.org/rfc/rfc2869.txt
rfc2869 says that :

   Note that all information in an interim message is cumulative (i.e.
   number of packets sent is the total since the beginning of the
   session, not since the last interim message).

#3 Updated by bronze man over 5 years ago

I just found that bytes increase during the rekey of CHILD_SA.and the increase bytes is the same with the old CHILD_SA's bytes.

I have read the code, find that there is a data race during following three events:
Event 1: add CHILD_SA current bytes in the foreach of IKE_SA's child_sas array in send_interim.
Event 2: add CHILD_SA finish bytes when child_rekey happens.
Event 3: Delete CHILD_SA from the IKE_SA so that event 1 will not add old CHILD_SA

If following order of the events happens, the bytes will decrease after the rekey finish.
Event 2,Event 1,Event 3.

The code of Event 1:

enumerator = ike_sa->create_child_sa_enumerator(ike_sa);

https://github.com/strongswan/strongswan/blob/master/src/libcharon/plugins/eap_radius/eap_radius_accounting.c#L347

The code of Event 2:

METHOD(listener_t, child_rekey, bool,   

https://github.com/strongswan/strongswan/blob/master/src/libcharon/plugins/eap_radius/eap_radius_accounting.c#L674

        charon->bus->child_rekey(charon->bus, this->child_sa,
                            this->child_create->get_child(this->child_create));

https://github.com/strongswan/strongswan/blob/master/src/libcharon/sa/ikev2/tasks/child_rekey.c#L365

The code of Event 3:

    this->child_delete = child_delete_create(this->ike_sa, protocol, spi, FALSE);

https://github.com/strongswan/strongswan/blob/master/src/libcharon/sa/ikev2/tasks/child_rekey.c#L377

I can not find an easy way to change the code to make it work correct.
Save the rekey old CHILD_SA_id in the entry_t ,and check it in send_interim may fix this problem.
I can not find a workaround.

#4 Updated by Tobias Brunner over 5 years ago

  • Status changed from New to Feedback

Thanks for analyzing this issue. There are several similar situations that can cause the numbers to jump up and down (in particular for IKEv1, where rekeyed CHILD_SAs are kept around until they expire).

As you mentioned fixing it is not that easy. I've attempted to do so in the accounting-fixes branch with additional caching in the plugin.

#5 Updated by Tobias Brunner about 5 years ago

  • Tracker changed from Issue to Bug
  • Category set to libcharon
  • Assignee set to Tobias Brunner
  • Target version set to 5.3.1
  • Resolution set to Fixed

The associated merge commit fixes the usage stats in several corner cases and will be included in the next release.

#6 Updated by Tobias Brunner about 5 years ago

  • Status changed from Feedback to Closed

Also available in: Atom PDF