A program for statistical processing of Wireshark analyzer data and a study of incoming traffic
Veniamin Tarasov <tarasov-vn@psuti.ru>, Sergey Malakhov <malakhov-sv@psuti.ru> PSUTI, 443090, Moskovskoe sh., 77, Samara, Russia
Abstract. The paper presents an add-on program for the Wireshark traffic analyzer that calculates the moments of a random variable: the interval between packets of incoming traffic. It also presents an analytical solution for the average waiting time in a queuing system (QS) of type H2/M/1, where H2 denotes the second-order hyperexponential distribution of the time intervals of the input flow. The final result is obtained by solving Lindley's integral equation with the method of spectral decomposition. It is shown that in this case the distribution of the intervals between input flow requirements can be approximated at the level of its first three moments. Used together, these results make it possible to analyze incoming traffic fully by queuing-theory methods.
Keywords: traffic analyzer, Wireshark program, numerical characteristics of random variables, Lindley's equation, method of spectral decomposition.
1. Introduction
Identifying the distribution laws of the intervals between packets is a particularly difficult problem, especially since traffic, as a random process, changes constantly. Queuing theory is based on the distribution laws of the intervals between arrivals and of the service times of requirements. It is therefore important to know the numerical characteristics of these intervals, that is, their moments. In this paper we propose to use the Wireshark analyzer to determine such characteristics [[1]].
2. Description of the Wireshark program
Wireshark (previously Ethereal) is a traffic analyzer for Ethernet and some other computer networking technologies. In June 2006 the project was renamed Wireshark due to trademark issues [[1]].
The functionality of Wireshark is very similar to that of the tcpdump program, but Wireshark has a graphical user interface and additional features for sorting and filtering information. By switching the network card to promiscuous mode, the program lets the user view all traffic passing through the network in real time (Fig. 1).
Wireshark can display the structure of a wide variety of network protocols and can therefore parse network packets, showing the value of each protocol field at any level. Because it uses the Pcap packet capture library, it can capture data only from those networks that this library supports. However, Wireshark can work with many input data formats and can open data files captured by other programs, which extends its capture capabilities. The features include:
• deep analysis of hundreds of protocols, with the regular addition of new ones;
• capturing network traffic in real time, followed by analysis at any time;
• standard three-pane packet browser;
• cross-platform: there are versions for most types of UNIX, including Linux, Solaris, FreeBSD, NetBSD, OpenBSD, Mac OS X, as well as for Windows;
• captured network data can be viewed in the graphical user interface or with the TTY-mode utility TShark;
• the most powerful sorting and filtering in the industry;
• rich VoIP analysis;
• reading and writing of many capture file formats: tcpdump (libpcap), Pcap NG, Catapult DCT2000, Cisco Secure IDS iplog, Microsoft Network Monitor, Network General Sniffer® (compressed and uncompressed), Sniffer® Pro, and NetXray®, Network Instruments Observer, NetScreen snoop, Novell LANalyzer, RADCOM WAN/LAN Analyzer, Shomiti/Finisar Surveyor, Tektronix K12xx, Visual Networks Visual UpTime, WildPackets EtherPeek/TokenPeek/AiroPeek, and many others;
• capture files compressed with gzip can be decompressed on the fly;
• live data can be captured from Ethernet, IEEE 802.11, PPP/HDLC, ATM, Bluetooth, USB, Token Ring, Frame Relay, FDDI, and other media (depending on the platform);
• decryption support for many protocols, including IPsec, ISAKMP, Kerberos, SNMPv3, SSL/TLS, WEP, and WPA/WPA2;
• coloring rules can be applied to the packet list for quick, intuitive analysis;
• output data can be exported to XML, PostScript®, CSV, or plain text.
Fig. 1. An example of network traffic capture by Wireshark.
CSV is one of the data export formats and is convenient for viewing (Fig. 2). Such a file can be opened in any text editor or spreadsheet editor for analysis and calculation of performance characteristics.
However, with intense traffic the data are difficult to process even in a spreadsheet editor. Furthermore, the traffic data may be stored in more than one file. This article describes a software solution for calculating the numerical characteristics of packet arrival intervals. The main advantage of the Wireshark analyzer is that it works on a fine time scale (microseconds), in contrast to, for example, NetFlow Analyzer, which records a packets-per-minute rate.
3. Determination of the moments of the interarrival time of incoming traffic
The program developed by the authors complements the analyzer: it retrieves the packet arrival times and isolates the incoming traffic from the entire data set captured by Wireshark. The moment characteristics of the intervals are then determined with the well-known formulas of mathematical statistics. We use statistics up to the third order, which give an adequate picture of the distribution of the intervals.
For example, the coefficient of variation shows how far the traffic departs from a Poisson flow and, together with the asymmetry, indicates how heavy the tails of the distribution are.
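The average value of the interval between adjacent packets is
$$\bar{\tau} = \frac{1}{N}\sum_{k=0}^{N-1}(t_{k+1} - t_k),$$
where $t_k$ are the packet arrival times and $N$ is the number of intervals analyzed. The sample variance (dispersion) is $D = \overline{t^2} - \bar{\tau}^2$, where
$$\overline{t^2} = \frac{1}{N}\sum_{k=0}^{N-1}(t_{k+1} - t_k)^2$$
is the second initial moment.
The coefficient of variation is $c = \sigma/\bar{\tau}$, where $\sigma = \sqrt{D}$. The asymmetry is $As = (\overline{t^3} - 3\,\overline{t^2}\,\bar{\tau} + 2\bar{\tau}^3)/\sigma^3$, where
$$\overline{t^3} = \frac{1}{N}\sum_{k=0}^{N-1}(t_{k+1} - t_k)^3$$
is the third initial moment.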
Fig. 2. An example of data exported to CSV format (columns: Time, Source, Destination, Protocol, Length, Info).
If a large data set is divided into several blocks, these formulas are first applied to each block to obtain the group averages, and then the mean of those values is taken.
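As an illustration, the formulas of this section can be sketched in C#. This is a minimal sketch, not the authors' TrafficLogParams implementation: the class name IntervalMoments and the method Compute are hypothetical, and the arrival times are assumed to be sorted in ascending order and given in seconds.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (assumption): not the TrafficLogParams class of the paper.
public static class IntervalMoments
{
    // Returns the mean, variance, coefficient of variation and asymmetry
    // of the intervals between consecutive arrival times.
    public static (double Mean, double Variance, double Variation, double Asymmetry)
        Compute(IReadOnlyList<double> arrivalTimes)
    {
        int n = arrivalTimes.Count - 1; // number of intervals N
        if (n < 1)
            throw new ArgumentException("At least two arrival times are required.");

        double m1 = 0, m2 = 0, m3 = 0; // sums for the initial moments
        for (int k = 0; k < n; k++)
        {
            double dt = arrivalTimes[k + 1] - arrivalTimes[k];
            m1 += dt;
            m2 += dt * dt;
            m3 += dt * dt * dt;
        }
        m1 /= n;
        m2 /= n;
        m3 /= n;

        double variance = m2 - m1 * m1;
        double sigma = Math.Sqrt(variance);
        // If all intervals are equal, sigma is 0 and the asymmetry is not defined (NaN).
        double variation = sigma / m1;
        double asymmetry = (m3 - 3 * m2 * m1 + 2 * m1 * m1 * m1) / (sigma * sigma * sigma);
        return (m1, variance, variation, asymmetry);
    }
}
```

For the arrival times 0, 1 and 3 the intervals are 1 and 2, so the call IntervalMoments.Compute(new double[] { 0, 1, 3 }) gives a mean of 1.5, a variance of 0.25 and zero asymmetry, as expected for two values placed symmetrically about their mean.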
4. Time data analysis software and results
To calculate the moments of the interval between adjacent packets, we developed a program that reads an input file containing captured network traffic data, selects only the data related to inbound packets, and calculates the intervals and their moments.
The features include:
• selection of the arrival times of the data packets received by the specified host;
• calculation of the time intervals between incoming packets;
• calculation of the moment characteristics of the intervals between received packets;
• saving of the packet arrival times in binary and text formats;
• saving of the packet arrival intervals in binary and text formats;
• output and saving of the moment characteristics in text format.
The program handles text files containing data as shown in Fig. 2 or similar. Two classes (in terms of object-oriented programming) were developed for the program:
• TrafficLogParams - stores the packet arrival times and their intervals and calculates the moment characteristics; it also provides methods for saving data to files and loading data from them;
• LogParser - a static class that parses the input file and adds data to a TrafficLogParams object.
The main method of LogParser takes the file name and the IP address of the host as input. It processes each line of the source file and extracts the time and two IP addresses: the sender's address and the recipient's address. If the recipient field matches the host IP address, the packet arrival time is added to the array of such times in the TrafficLogParams class.
// Requires: using System; using System.IO; using System.Windows.Forms;
public static TrafficLogParams TextFileParser(string fileName, string ip, bool isIncoming) {
    TrafficLogParams log = new TrafficLogParams();
    StreamReader file = new StreamReader(fileName);
    string[] currentLine;
    int lineNumber = 0;
    // Index in the parsed array: 0 - time, 1 - sender IP, 2 - recipient IP.
    int ipIndex;
    if (isIncoming)
        ipIndex = 2;
    else
        ipIndex = 1;
    while (!file.EndOfStream) {
        currentLine = GetDataArray(file.ReadLine().Trim());
        lineNumber++;
        try {
            // Skip lines (for example, the header) in which no address was found.
            if (currentLine[ipIndex] == null)
                continue;
            // Keep the packet only if the selected address matches the host.
            if (MinimizeIp(currentLine[ipIndex]) == MinimizeIp(ip)) {
                log.AddTime(ParseDouble(currentLine[0]));
            }
        }
        catch (FormatException ex) {
            MessageBox.Show(string.Format("{0}\nLine = {1}", ex.Message, lineNumber));
        }
    }
    file.Close();
    return log;
}
The second most important method of LogParser splits the input string into elements, checks whether each element has the format of a time or an IP address, and returns the matching elements as an array.
private static string[] GetDataArray(string input) {
    // Holds up to three values: time, sender IP, recipient IP.
    string[] data = new string[3];
    string currentValue = "";
    int symbolIndex = 0;
    int valueIndex = 0;
    while (symbolIndex < input.Length && valueIndex < 3) {
        // Accumulate a run of digits and separators ("." or ",").
        while (symbolIndex < input.Length && (char.IsDigit(input[symbolIndex])
               || IsSeparator(input[symbolIndex]))) {
            currentValue += input[symbolIndex];
            symbolIndex++;
        }
        if (currentValue != "") {
            // Keep the run only if it is a valid number or IP address.
            if (IsDouble(currentValue) || IsIp(currentValue)) {
                data[valueIndex] = currentValue;
                valueIndex++;
            }
            currentValue = "";
            if (valueIndex >= 3) {
                symbolIndex = input.Length;
            }
        }
        // Skip characters that cannot belong to a time or an IP address.
        while (symbolIndex < input.Length && !char.IsDigit(input[symbolIndex])
               && !IsSeparator(input[symbolIndex])) {
            symbolIndex++;
        }
    }
    return data;
}
A helper method, IsSeparator, checks whether the input symbol is a separator, "." or ",". This check matters only for the time data, because in some countries (for example, in Russia) the fractional part is separated by a comma rather than a point. For the same reason, the string representation of a time value is converted to a real number not by the standard method of the programming language alone, but by a modified version that takes the regional settings into account.
// Requires: using System.Globalization;
private static double ParseDouble(string value) {
    // Convert the separator to the one expected by the current culture.
    if (CultureInfo.CurrentCulture.NumberFormat.NumberDecimalSeparator == ".") {
        value = value.Replace(',', '.');
    }
    else {
        value = value.Replace('.', ',');
    }
    return double.Parse(value);
}
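An alternative to adapting the separator to the current culture is to normalize it once and parse with the invariant culture, so that the result does not depend on the regional settings at all. The following is a minimal sketch under that assumption; the class TimeParsing and the method ParseInvariant are hypothetical names and are not part of the authors' program. It assumes that time stamps contain no thousands separators.

```csharp
using System.Globalization;

// Hypothetical alternative (assumption): not the ParseDouble method of the paper.
public static class TimeParsing
{
    // Replaces a decimal comma with a point and parses the value
    // with the invariant culture, independent of the regional settings.
    public static double ParseInvariant(string value)
    {
        string normalized = value.Trim().Replace(',', '.');
        return double.Parse(normalized, NumberStyles.Float, CultureInfo.InvariantCulture);
    }
}
```

With this variant both "88,121305" and "88.121305" are parsed to the same number on any machine, and exponent notation such as "5,097781e-003" is accepted as well.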
When the IP address of the host is compared with the IP address in the current line of the log file, both are first reduced (minimized) to a common form. In other words, the addresses 010.014.000.011 and 10.14.0.11 are treated as equal. The program was used to analyze a data file of the traffic arriving at the proxy server of the university, covering almost an hour. The input file contains more than 2150000 rows, which could not be processed manually. The following results were obtained (Fig. 3):
Initial moment of the 1st order: 5,097781e-003
Initial moment of the 2nd order: 3,325837e-004
Initial moment of the 3rd order: 5,505049e-005
Dispersion: 3,065963e-004
Variation coefficient: 3,434807e+000
Asymmetry: 1,025441e+001
Packets count: 628183
Fig. 3. Output of the log-file analysis program.
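The paper does not list the method that reduces IP addresses to a common form, so the following is only a sketch of one possible behavior: stripping leading zeros from each octet of a dotted IPv4 address. The class name IpNormalizer is hypothetical, the body of MinimizeIp is an assumption, and IPv6 addresses are not handled.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Assumed behavior (not shown in the paper): normalizes dotted IPv4 addresses only.
public static class IpNormalizer
{
    // Parses each octet as an integer and joins the octets again,
    // which removes leading zeros: "010.014.000.011" becomes "10.14.0.11".
    public static string MinimizeIp(string ip)
    {
        var octets = ip.Trim().Split('.')
            .Select(o => int.Parse(o, CultureInfo.InvariantCulture)
                            .ToString(CultureInfo.InvariantCulture));
        return string.Join(".", octets);
    }
}
```

A non-numeric octet makes int.Parse throw a FormatException, which is the exception type that TextFileParser catches and reports together with the line number.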
5. Study of the H2/M/1 queuing system
The data indicate that the analyzed traffic differs from a Poisson flow (the coefficient of variation is c = 3.43 instead of 1), and the asymmetry value As = 10.25 indicates that the distribution of the intervals between packets is heavy-tailed; for comparison, As = 2 for a Poisson flow. Calculating the characteristics of such traffic requires an appropriate mathematical apparatus. For the analysis of such traffic, new results for the H2/M/1 system were proposed in [[2]]. We describe the basic results of that article below.
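The reference values c = 1 and As = 2 for a Poisson flow follow from the moments of the exponential distribution of its intervals. As a short check, with the flow rate denoted here by $\lambda$ (a symbol introduced only for this illustration):

```latex
\bar{\tau} = \frac{1}{\lambda}, \qquad
\overline{t^2} = \frac{2}{\lambda^2}, \qquad
\overline{t^3} = \frac{6}{\lambda^3},
\\[4pt]
D = \overline{t^2} - \bar{\tau}^2 = \frac{1}{\lambda^2}, \qquad
\sigma = \frac{1}{\lambda}, \qquad
c = \frac{\sigma}{\bar{\tau}} = 1,
\\[4pt]
As = \frac{\overline{t^3} - 3\,\overline{t^2}\,\bar{\tau} + 2\bar{\tau}^3}{\sigma^3}
   = \frac{(6 - 6 + 2)/\lambda^3}{1/\lambda^3} = 2 .
```

Values of c well above 1 together with As well above 2 therefore indicate intervals with a heavier tail than the exponential one.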
It is known, for example from [[3]], that the study of G/G/1 queuing systems (QS) relies on Lindley's integral equation:
$$W(y) = \begin{cases} \displaystyle\int_{-\infty}^{y} W(y-u)\,dC(u), & y \ge 0, \\ 0, & y < 0, \end{cases} \qquad (1)$$
where $W(y)$ is the probability distribution function (PDF) of the waiting time of a requirement in the queue and $C(u)$ is the PDF of the limiting random variable $\tilde{u} = \lim_{n\to\infty} u_n$, $u_n = x_n - t_{n+1}$. Here $x_n$ is the service time of the $n$-th requirement $C_n$, and $t_{n+1}$ is the time interval between the arrivals of requirements $C_n$ and $C_{n+1}$.
To solve (1), a spectral method is used. It takes the expression $A^*(-s)B^*(s) - 1$, where $A^*(s)$ and $B^*(s)$ are the Laplace transforms of the densities of the interarrival and service times, and represents it as a product of two factors that yields a rational function of $s$ [3]. Thus, to find the waiting time distribution, the following spectral decomposition is used:
$$A^*(-s)B^*(s) - 1 = \frac{\psi_+(s)}{\psi_-(s)}, \qquad (2)$$
where $\psi_+(s)$ and $\psi_-(s)$ are rational functions of $s$ that can be factored. The functions $\psi_+(s)$ and $\psi_-(s)$ must satisfy certain conditions [3]:
1. For $\mathrm{Re}(s) > 0$, the function $\psi_+(s)$ is analytic and has no zeros in this half-plane.
2. For $\mathrm{Re}(s) < D$, the function $\psi_-(s)$ is analytic and has no zeros in this half-plane, (3)
where $D$ is a positive constant determined from the condition
$$\lim_{t\to\infty} \frac{a(t)}{e^{-Dt}} < \infty.$$
Moreover, the functions $\psi_+(s)$ and $\psi_-(s)$ must have the following properties:
$$\text{for } \mathrm{Re}(s) > 0:\ \lim_{|s|\to\infty} \frac{\psi_+(s)}{s} = 1; \qquad \text{for } \mathrm{Re}(s) < D:\ \lim_{|s|\to\infty} \frac{\psi_-(s)}{s} = -1. \qquad (4)$$
All the main characteristics of a QS are derived from the average waiting time, so all subsequent calculations are performed for the average waiting time of requirements in the queue. Consider the H2/M/1 QS, where H2 denotes the second-order hyperexponential distribution of the interarrival times with the density function
$$a(t) = p\lambda_1 e^{-\lambda_1 t} + (1-p)\lambda_2 e^{-\lambda_2 t}, \qquad (5)$$
and M denotes the exponential service law with the density function
$$b(t) = \mu e^{-\mu t}. \qquad (6)$$
The Laplace transform of (5) has the form
$$A^*(s) = p\frac{\lambda_1}{s+\lambda_1} + (1-p)\frac{\lambda_2}{s+\lambda_2}, \qquad (7)$$
and that of (6) is
$$B^*(s) = \frac{\mu}{s+\mu}. \qquad (8)$$
Now we write out (2) for the distributions (5) and (6) using (7) and (8):
$$\frac{\psi_+(s)}{\psi_-(s)} = \left[p\frac{\lambda_1}{\lambda_1 - s} + (1-p)\frac{\lambda_2}{\lambda_2 - s}\right]\frac{\mu}{\mu+s} - 1 = \frac{\mu(a_0 - a_1 s) - (\lambda_1 - s)(\lambda_2 - s)(\mu+s)}{(\lambda_1 - s)(\lambda_2 - s)(\mu+s)}, \qquad (9)$$
where the coefficients are $a_0 = \lambda_1\lambda_2$ and $a_1 = p\lambda_1 + (1-p)\lambda_2$.
The numerator of the right side of (9) is the third-degree polynomial $-s(s^2 - c_2 s - c_1)$, and it remains to determine its coefficients in order to factor it. The coefficients of the polynomial are
$$c_1 = \mu[\lambda_1(1-p) + \lambda_2 p] - \lambda_1\lambda_2, \qquad c_2 = \lambda_1 + \lambda_2 - \mu.$$
Then the expression (9) can be factored:
$$\frac{\psi_+(s)}{\psi_-(s)} = \frac{-s(s^2 - c_2 s - c_1)}{(\lambda_1 - s)(\lambda_2 - s)(\mu+s)} = \frac{-s(s+s_1)(s-s_2)}{(\lambda_1 - s)(\lambda_2 - s)(\mu+s)},$$
where $-s_1 = -\left(\sqrt{c_2^2/4 + c_1} - c_2/2\right)$ is the negative root of the quadratic equation in the numerator and $s_2 = \sqrt{c_2^2/4 + c_1} + c_2/2$ is the positive root. Further, omitting some calculations, we obtain the Laplace transform of the density function of the waiting time:
$$W^*(s) = \frac{s_1(s+\mu)}{\mu(s+s_1)}.$$
Hence
$$\frac{dW^*(s)}{ds} = \frac{s_1\mu(s_1+s) - s_1(s+\mu)\mu}{\mu^2(s+s_1)^2}.$$
Using the properties of the Laplace transform, we find the average waiting time:
$$-\left.\frac{dW^*(s)}{ds}\right|_{s=0} = \frac{-s_1^2\mu + \mu^2 s_1}{\mu^2 s_1^2} = \frac{1}{s_1} - \frac{1}{\mu}.$$
Finally, the average waiting time is
$$\overline{W} = \frac{1}{s_1} - \frac{1}{\mu}, \qquad (10)$$
where $s_1 = \sqrt{c_2^2/4 + c_1} - c_2/2$, $c_1 = \mu[\lambda_1(1-p) + \lambda_2 p] - \lambda_1\lambda_2$, $c_2 = \lambda_1 + \lambda_2 - \mu$.
6. Practical use of the results
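Consider the result (10) for the example of the heavy-tailed input distribution (Fig. 3). Using the Laplace transform (7), we can determine the initial moments of the distribution (5):
$$\bar{\tau} = \frac{p}{\lambda_1} + \frac{1-p}{\lambda_2}, \qquad \overline{t^2} = \frac{2p}{\lambda_1^2} + \frac{2(1-p)}{\lambda_2^2}, \qquad \overline{t^3} = \frac{6p}{\lambda_1^3} + \frac{6(1-p)}{\lambda_2^3}.$$
Next, substituting the initial moments of the distribution of the intervals between packets obtained above (Fig. 3) in order to determine the unknown parameters $\lambda_1$, $\lambda_2$ and $p$ of the input distribution (5), we obtain the following system of equations:
$$\begin{cases} \dfrac{p}{\lambda_1} + \dfrac{1-p}{\lambda_2} = 5.0978\cdot10^{-3}, \\[6pt] \dfrac{2p}{\lambda_1^2} + \dfrac{2(1-p)}{\lambda_2^2} = 3.3258\cdot10^{-4}, \\[6pt] \dfrac{6p}{\lambda_1^3} + \dfrac{6(1-p)}{\lambda_2^3} = 5.5050\cdot10^{-5}. \end{cases} \qquad (11)$$
The solution of (11) in the Mathcad package yields the following results: $p \approx 0.950$, $\lambda_1 \approx 417.985$, $\lambda_2 \approx 17.556$.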
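The quoted solution can be checked by substituting it back into the moment formulas of the distribution (5). The sketch below does this in C#; the class H2Moments and the method InitialMoment are hypothetical names, and the general form k!(p/λ1^k + (1-p)/λ2^k) is the standard expression for the k-th initial moment of a mixture of two exponential distributions, of which the three formulas above are the cases k = 1, 2, 3.

```csharp
using System;

// Hypothetical helper (assumption): used only to check the solution of system (11).
public static class H2Moments
{
    // k-th initial moment of the H2 density (5):
    // k! * (p / lambda1^k + (1 - p) / lambda2^k).
    public static double InitialMoment(int k, double p, double lambda1, double lambda2)
    {
        double factorial = 1;
        for (int i = 2; i <= k; i++)
            factorial *= i;
        return factorial * (p / Math.Pow(lambda1, k) + (1 - p) / Math.Pow(lambda2, k));
    }
}
```

With the rounded values p = 0.950, λ1 = 417.985 and λ2 = 17.556, the three moments returned by InitialMoment agree with the measured values in Fig. 3 to within roughly one percent; the remaining difference is consistent with the rounding of the parameters.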
For a channel load of 0.4, the intermediate parameters are $c_1 \approx 10999.4$, $c_2 \approx -54.655$, $s_1 \approx 135.707$, and the average waiting time is $\overline{W} \approx 5.329\cdot10^{-3}$ s.
For comparison, consider the average waiting time for an M/M/1 system. In this case the service rate is $\mu \approx 490.196$ and the channel load is $\rho = 0.4$.
Then the average waiting time of packets is
$$\overline{W} = \frac{\rho/\mu}{1-\rho} = \frac{0.4/490.196}{1-0.4} \approx 1.36\cdot10^{-3}\ \text{s}.$$
Thus the queuing model that takes the heavy tail of the input distribution into account gives a delay about four times larger than the classical model.
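The comparison above can be reproduced with a few lines of C#. This is a minimal sketch of result (10) and of the classical M/M/1 formula; the class WaitingTime and its method names are hypothetical, and the inputs are the rounded parameter values quoted in the text.

```csharp
using System;

// Hypothetical helper (assumption): evaluates result (10) and the M/M/1 formula.
public static class WaitingTime
{
    // Mean waiting time for H2/M/1 from (10): W = 1/s1 - 1/mu,
    // where -s1 is the negative root of s^2 - c2*s - c1 = 0.
    public static double H2M1(double p, double lambda1, double lambda2, double mu)
    {
        double c1 = mu * (lambda1 * (1 - p) + lambda2 * p) - lambda1 * lambda2;
        double c2 = lambda1 + lambda2 - mu;
        double s1 = Math.Sqrt(c2 * c2 / 4 + c1) - c2 / 2;
        return 1 / s1 - 1 / mu;
    }

    // Classical M/M/1 mean waiting time: W = (rho / mu) / (1 - rho).
    public static double MM1(double rho, double mu)
    {
        return rho / mu / (1 - rho);
    }
}
```

Calling WaitingTime.H2M1(0.950, 417.985, 17.556, 490.196) gives a value of about 5.3·10^-3 s, slightly different from the figure quoted in the text, presumably because the intermediate values there were computed from unrounded parameters; WaitingTime.MM1(0.4, 490.196) gives about 1.36·10^-3 s. As a consistency check, when λ1 = λ2 the H2 law reduces to the exponential one and H2M1 returns the same value as MM1.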
7. Conclusion
This paper has shown how optimistic the results of the classical M/M/1 system are in comparison with those of the H2/M/1 system when the distribution of the input flow is heavy-tailed. The approach can therefore be applied in modern teletraffic theory wherever packet delays in the incoming traffic are significant.
Note that the distribution (5), which contains three unknown parameters $\lambda_1$, $\lambda_2$ and $p$, allows the moment equations to be used to approximate the unknown input distribution at the level of its first three moments.
References
[1]. Wireshark official web-site. URL: http://www.wireshark.org/
[2]. Tarasov V.N., Bakhareva N.F., Gorelov G.A. Matematicheskaya model trafika s tyazhelokhvostnym raspredeleniem na osnove sistemy massovogo obsluzhivaniya H2/M/1 [A mathematical model of heavy-tailed traffic based on the H2/M/1 queuing system]. Infokommunikacionnye tehnologii, 2014, no. 3, pp. 36-41.
[3]. Kleinrock L. Queueing Theory. Translated from English, edited by V.I. Neiman. Moscow, Mashinostroenie (Mechanical Engineering), 1979. 432 p.