Statistical Data Handling Program of Wireshark Analyzer and Incoming Traffic Research
Veniamin Tarasov <tarasov-vn@psuti.ru>, Sergey Malakhov <malakhov-sv@psuti.ru> Volga Region State University of Telecommunications and Informatics, 77 Moskovskoe sh., Samara, 443090, Russian Federation
Abstract. This article is devoted to the virtualization of computing clusters built on Software Defined Networks (SDN). An example of creating a virtual computing cluster and an analysis of SDN performance are reviewed.
The Cbench utility is used to measure the throughput of the controller, which is determined by the number of flows per second that the controller can handle. Cbench supports two modes of operation: latency mode and throughput mode. In latency mode, each emulated switch keeps exactly one new-flow request outstanding and waits for a reply before sending the next request; this measures the time the OpenFlow controller takes to process a request at low load. In throughput mode, each switch sends as many requests as buffering allows. The throughput mode thus measures the maximum flow setup rate that the controller can sustain.
The delay introduced by the OpenFlow protocol is determined. Experiments were carried out with and without OpenFlow support to assess the impact of the protocol on network performance. Traffic is generated by the iperf utility installed on the cluster nodes, so that it fills the maximum channel bandwidth, and is captured with the tcpdump utility. To assess the performance of a virtual computing cluster in Software Defined Networks, two basic tasks must be solved. The first is to study the performance of the controller; NOX is selected as the controller.
The second is to determine the delays introduced by the OpenFlow protocol. A virtual computing cluster of 17 compute nodes was deployed on the servers using the OpenNebula software system. The Floodlight Open SDN Controller with its default applications was installed as the OpenFlow controller.
Keywords: traffic analyzer, Wireshark program, numerical characteristics of random variables, Lindley's equation, method of spectral decomposition.
DOI: 10.15514/ISPRAS-2015-27(3)-21
For citation: Tarasov Veniamin, Malakhov Sergey. Statistical Data Handling Program of Wireshark Analyzer and Incoming Traffic Research. Trudy ISP RAN/Proc. ISP RAS, vol. 27, issue 3, 2015, pp. 303-314. DOI: 10.15514/ISPRAS-2015-27(3)-21.
1. Introduction
Identifying the distribution laws of interarrival intervals is a particularly difficult problem, because traffic, as a random process, is constantly changing. Queuing theory is built on the distribution laws of the intervals between arrivals and of the service times. It is therefore important to know the numerical characteristics of these intervals, that is, their moments. In this paper we propose to use the Wireshark analyzer to determine such characteristics [[1]].
2. Description of the Wireshark program
Wireshark (formerly Ethereal) is a traffic analyzer for Ethernet and some other networking technologies. In June 2006 the project was renamed Wireshark due to trademark issues [[1]].
The functionality of Wireshark is very similar to that of the tcpdump program, but Wireshark has a graphical user interface and additional features for sorting and filtering information. The program lets the user view all traffic passing through the network in real time by switching the network card to promiscuous mode (Fig. 1).
Wireshark knows the structure of a wide variety of network protocols and can therefore parse network packets and show the value of each protocol field at any level. Because packets are captured with the Pcap library, data can be captured only from the network types that this library supports. However, Wireshark can read many input formats and can open files captured by other programs, which extends its capabilities. The features include:
• deep analysis of hundreds of protocols, with the regular addition of new ones;
• capturing network traffic in real time, followed by analysis at any time;
• standard three-pane packet browser;
• cross-platform: there are versions for most types of UNIX, including Linux, Solaris, FreeBSD, NetBSD, OpenBSD, Mac OS X, as well as for Windows;
• captured network data can be viewed in the graphical user interface or with the TTY-mode utility TShark;
• the most powerful sorting and filtering in the industry;
• rich VoIP analysis;
• reading and writing of a large number of capture file formats: tcpdump (libpcap), Pcap NG, Catapult DCT2000, Cisco Secure IDS iplog, Microsoft Network Monitor, Network General Sniffer® (compressed and uncompressed), Sniffer® Pro, and NetXray®, Network Instruments Observer, NetScreen snoop, Novell LANalyzer, RADCOM WAN / LAN Analyzer, Shomiti / Finisar Surveyor, Tektronix K12xx, Visual Networks Visual UpTime, WildPackets EtherPeek / TokenPeek / AiroPeek, and many others;
• capture files compressed with gzip can be decompressed on the fly;
• live data can be captured from Ethernet, IEEE 802.11, PPP / HDLC, ATM, Bluetooth, USB, Token Ring, Frame Relay, FDDI, and others (depending on the platform);
• decryption support for many protocols, including IPsec, ISAKMP, Kerberos, SNMPv3, SSL / TLS, WEP, and WPA / WPA2;
• coloring rules can be applied to the packet list for quick, intuitive analysis;
• output data can be exported to XML, PostScript®, CSV, or plain text.
Fig. 1. The example of a network traffic capture by Wireshark.
CSV is one of the data export formats and is convenient for viewing (Fig. 2). Such a file can be opened in any text editor or spreadsheet editor for analysis and calculation of performance characteristics.
However, when traffic is intense, the data are difficult to process even in a spreadsheet editor. Furthermore, the traffic data can be stored in more than one file. This article describes a software solution for calculating the numerical characteristics of packet arrival intervals. The main advantage of this analyzer is that it works on a fine time scale (microseconds), in contrast to a program such as NetFlow Analyzer, which records packets-per-minute rates.
3. Determination of the moments of the interarrival time of incoming traffic
The program developed by the authors of the present paper complements the analyzer: it isolates the incoming traffic from the entire data set captured by Wireshark and extracts the packet arrival times. The moment characteristics of the interarrival intervals are then obtained from the well-known formulas of mathematical statistics. We use statistics up to the third order, which give a picture of the distribution of the intervals.
For example, the coefficient of variation shows how far the flow departs from a Poisson flow, and together with the asymmetry it indicates how heavy the tail of the distribution is.
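The average value of the interval between adjacent packets is
$$\bar{\tau} = \frac{1}{N}\sum_{k=0}^{N-1}(t_{k+1} - t_k),$$
where $t_k$ are the packet arrival times and $N$ is the number of intervals analyzed. The sample variance (dispersion) is $D = \overline{\tau^2} - \bar{\tau}^2$, where
$$\overline{\tau^2} = \frac{1}{N}\sum_{k=0}^{N-1}(t_{k+1} - t_k)^2$$
is the second initial moment. The coefficient of variation is $c = \sigma/\bar{\tau}$, where $\sigma = \sqrt{D}$. The asymmetry is
$$A_s = \frac{\overline{\tau^3} - 3\,\overline{\tau^2}\,\bar{\tau} + 2\bar{\tau}^3}{\sigma^3},$$
where
$$\overline{\tau^3} = \frac{1}{N}\sum_{k=0}^{N-1}(t_{k+1} - t_k)^3.$$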
Fig. 2. The example of the data exported to the CSV format.
If a large data set is split into several blocks, these formulas are applied to each block to obtain group averages, and the group averages are then averaged.
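The formulas of this section can be illustrated with a short sketch. It is written in Python for brevity and is not the authors' program; the function name `interval_moments` and the sample arrival times are hypothetical.

```python
import math

def interval_moments(arrival_times):
    """Sample moments of the intervals between consecutive arrival times."""
    intervals = [b - a for a, b in zip(arrival_times, arrival_times[1:])]
    n = len(intervals)
    if n == 0:
        raise ValueError("need at least two arrival times")
    m1 = sum(intervals) / n                   # mean interval
    m2 = sum(x ** 2 for x in intervals) / n   # second initial moment
    m3 = sum(x ** 3 for x in intervals) / n   # third initial moment
    variance = m2 - m1 ** 2                   # D
    sigma = math.sqrt(variance)               # zero for constant intervals
    return {
        "m1": m1, "m2": m2, "m3": m3,
        "variance": variance,
        "cv": sigma / m1,
        "asymmetry": (m3 - 3 * m2 * m1 + 2 * m1 ** 3) / sigma ** 3,
    }

# Hypothetical arrival times (seconds); the intervals are 1, 2, 1, 2.
stats = interval_moments([0.0, 1.0, 3.0, 4.0, 6.0])
print(stats)
```

The sketch assumes the intervals are not all equal; otherwise $\sigma = 0$ and the coefficient of variation and asymmetry are undefined.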
4. Time data analysis software and Results
To calculate the moments of the interval between adjacent packets, we developed a program that selects from the input file (a capture of network traffic) only the data related to inbound packets and calculates the intervals and their moments.
The features include:
• extraction of the arrival times of the packets received by the specified host;
• calculation of the time intervals between the incoming packets;
• calculation of the moment characteristics of the intervals between received packets;
• saving the packet arrival times in binary and text formats;
• saving the packet arrival intervals in binary and text formats;
• output and saving of the moment characteristics in a text format.
The program handles text files containing data as shown in Fig. 2 or similar. Two classes (in terms of object-oriented programming) were developed for the program:
• TrafficLogParams - stores the packet arrival times and their intervals and calculates the moment characteristics. It also provides methods to save the data to files and load them from files;
• LogParser - a static class that parses the input file and adds data to a TrafficLogParams object.
The main method of LogParser takes the file name and the IP address of the host. Each line of the source file is processed, and the time and two IP addresses (the sender's and the recipient's) are extracted from it. If the recipient field matches the host IP address, the packet arrival time is added to the array of arrival times in the TrafficLogParams object.
// Requires: using System; using System.IO; using System.Windows.Forms;
public static TrafficLogParams TextFileParser(string fileName, string ip, bool isIncoming)
{
    TrafficLogParams log = new TrafficLogParams();
    // Index 1 holds the sender address, index 2 the recipient address.
    int ipIndex = isIncoming ? 2 : 1;
    int lineNumber = 0;
    using (StreamReader file = new StreamReader(fileName))
    {
        while (!file.EndOfStream)
        {
            string[] currentLine = GetDataArray(file.ReadLine().Trim());
            lineNumber++;
            // Skip lines (for example, a header) in which no address was found.
            if (currentLine[ipIndex] == null)
            {
                continue;
            }
            try
            {
                if (MinimizeIp(currentLine[ipIndex]) == MinimizeIp(ip))
                {
                    log.AddTime(ParseDouble(currentLine[0]));
                }
            }
            catch (FormatException ex)
            {
                MessageBox.Show(string.Format("{0}\nLine = {1}", ex.Message, lineNumber));
            }
        }
    }
    return log;
}
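TextFileParser relies on MinimizeIp, which the paper does not list. The sketch below, in Python for brevity, assumes that the method only strips leading zeros from each octet of an IPv4 address so that two spellings of the same address compare equal; the name `minimize_ip` and the validation rules are assumptions.

```python
def minimize_ip(address):
    """Reduce an IPv4 address to canonical dotted form without leading zeros."""
    octets = address.strip().split(".")
    if len(octets) != 4:
        raise ValueError("not an IPv4 address: %r" % address)
    values = [int(octet) for octet in octets]  # int() drops leading zeros
    if any(v < 0 or v > 255 for v in values):
        raise ValueError("octet out of range in %r" % address)
    return ".".join(str(v) for v in values)

print(minimize_ip("010.014.000.011"))
```

Comparing the normalized strings, as TextFileParser does, then treats 010.014.000.011 and 10.14.0.11 as the same host.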
The second most important method of LogParser splits the input string into elements, checks whether each element has the format of a time value or an IP address, and returns the matching elements as an array.
private static string[] GetDataArray(string input)
{
    // Holds up to three values found in the line (per the text: the time and two IP addresses).
    string[] data = new string[3];
    string currentValue = "";
    int symbolIndex = 0;
    int valueIndex = 0;
    while (symbolIndex < input.Length && valueIndex < 3)
    {
        // Collect a run of digits and separators ('.' or ',').
        while (symbolIndex < input.Length
               && (char.IsDigit(input[symbolIndex]) || IsSeparator(input[symbolIndex])))
        {
            currentValue += input[symbolIndex];
            symbolIndex++;
        }
        if (currentValue != "")
        {
            // Keep the run only if it is a number or an IP address.
            if (IsDouble(currentValue) || IsIp(currentValue))
            {
                data[valueIndex] = currentValue;
                valueIndex++;
            }
            currentValue = "";
            if (valueIndex >= 3)
            {
                symbolIndex = input.Length; // all three values found, stop scanning
            }
        }
        // Skip characters that cannot belong to a number or an address.
        while (symbolIndex < input.Length
               && !char.IsDigit(input[symbolIndex])
               && !IsSeparator(input[symbolIndex]))
        {
            symbolIndex++;
        }
    }
    return data;
}
The IsSeparator method checks whether the input symbol is a separator, "." or ",". This test matters only for the time values, because in some countries (for example, Russia) the fractional part is separated by a comma rather than a point. For the same reason, the string representation of a time value is not passed directly to the standard conversion method of the programming language; a modified version is used that takes the regional settings into account.
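// Requires: using System.Globalization;
private static double ParseDouble(string value)
{
    // Rewrite the decimal mark to the one expected by the current regional settings.
    if (CultureInfo.CurrentCulture.NumberFormat.NumberDecimalSeparator == ".")
    {
        value = value.Replace(',', '.');
    }
    else
    {
        value = value.Replace('.', ',');
    }
    return double.Parse(value);
}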
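An alternative to adapting the string to the regional settings is to normalize the decimal mark before parsing. The Python sketch below shows this variant; it is not the authors' method, the name `parse_time` is hypothetical, and it assumes the time stamps contain no thousands separators.

```python
def parse_time(value):
    """Parse a time stamp written with either '.' or ',' as the decimal mark."""
    # Python's float() always expects '.', independent of the locale.
    return float(value.strip().replace(",", "."))

print(parse_time("0,000153"), parse_time("0.000153"))
```

With this approach the result does not depend on the machine's regional settings, at the cost of rejecting inputs in which a comma or point is used as a digit-group separator.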
When the host IP address is compared with the IP address on the current line of the log file, both are first reduced to a common form. In other words, the address 010.014.000.011 is treated as equal to 10.14.0.11.
The program was used to analyze a traffic log of almost one hour captured at the university proxy server. The input file contains more than 2,150,000 rows, which could not be processed manually. The following results were obtained (Fig. 3):
File Help
Initial moment of the 1st order: 5,097781e-003
Initial moment of the 2nd order: 3,325837e-004
Initial moment of the 3rd order: 5,505049e-005
Dispersion: 3,065963e-004
Variation coefficient: 3,434807e+000
Asymmetry: 1,025441e+001
Packets count: 62S183
Ready!
Fig. 3. The result of the program's analysis of the log file.
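The values in Fig. 3 can be cross-checked against the formulas of Section 3. The Python sketch below takes the three displayed initial moments as input (with the decimal comma written as a point) and recomputes the derived quantities. With these inputs the dispersion and the coefficient of variation are reproduced, and the displayed asymmetry coincides numerically with the ratio of the third initial moment to $\sigma^3$, while the centered expression of Section 3 gives about 9.36. This is only an observation on the printed numbers, not a statement about how the program computes them.

```python
import math

# Initial moments as displayed in Fig. 3.
m1, m2, m3 = 5.097781e-3, 3.325837e-4, 5.505049e-5

variance = m2 - m1 ** 2
sigma = math.sqrt(variance)
cv = sigma / m1
centered_asymmetry = (m3 - 3 * m2 * m1 + 2 * m1 ** 3) / sigma ** 3
raw_ratio = m3 / sigma ** 3  # third initial moment divided by sigma^3

print("dispersion         %.6e" % variance)
print("variation coeff.   %.6f" % cv)
print("centered asymmetry %.4f" % centered_asymmetry)
print("m3 / sigma^3       %.4f" % raw_ratio)
```

Either value of the asymmetry is well above that of an exponential distribution, so the qualitative conclusion of a heavy tail does not depend on which expression is used.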
5. Research of queuing system H2/M/1
The data indicate that the analyzed traffic differs from a Poisson flow (the coefficient of variation is c = 3.43 instead of 1), and the asymmetry value As = 10.25 indicates that the distribution of the intervals between packets is heavy-tailed. For comparison, As = 2 for a Poisson flow. Calculating the characteristics of such traffic requires an appropriate mathematical apparatus. For the analysis of such traffic the authors of [[2]] proposed new results for the system H2/M/1. We describe the basic results of that article below.
It is known, for example from [[3]], that queuing systems (QS) G/G/1 are studied with the Lindley integral equation:
$$W(y) = \begin{cases} \displaystyle\int_{-\infty}^{y} W(y-u)\,dC(u), & y \ge 0, \\ 0, & y < 0, \end{cases} \tag{1}$$
where $W(y)$ is the probability distribution function (PDF) of the waiting time of a customer (requirement) in the queue, $C(u)$ is the PDF of the limiting random variable $U = \lim_{n\to\infty} U_n$, $U_n = x_n - t_{n+1}$, $x_n$ is the service time of the $n$-th customer $C_n$, and $t_{n+1}$ is the time interval between the arrivals of customers $C_n$ and $C_{n+1}$.
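Equation (1) describes the stationary distribution of the waiting time. On a sample path the same quantities $x_n$ and $t_{n+1}$ enter the standard Lindley recursion $w_{n+1} = \max(0,\, w_n + x_n - t_{n+1})$, which is convenient for checking analytical results by simulation. A minimal Python sketch follows; the function name `lindley_waits`, the assumption that the first customer finds the system empty, and the sample values are all hypothetical.

```python
def lindley_waits(service_times, interarrival_times):
    """Waiting times from w_0 = 0 and w_{n+1} = max(0, w_n + x_n - t_{n+1})."""
    waits = [0.0]  # assumption: the first customer finds the system empty
    for x_n, t_next in zip(service_times, interarrival_times):
        waits.append(max(0.0, waits[-1] + x_n - t_next))
    return waits

# service_times[n] is x_n; interarrival_times[n] is t_{n+1}.
waits = lindley_waits([2.0, 1.0, 3.0], [1.0, 4.0, 1.0])
print(waits)
```

Averaging such waiting times over a long run of generated intervals gives an empirical estimate of the mean waiting time.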
To solve (1), a spectral method is used. It reduces to taking the expression $A^*(-s)B^*(s) - 1$, where $A^*(s)$ and $B^*(s)$ are the Laplace transforms of the interarrival-time and service-time densities, and representing it as a product of two factors that together give a rational function of $s$ [3]. Thus, to find the waiting time distribution, the following spectral decomposition is used:
$$A^*(-s)\,B^*(s) - 1 = \frac{\psi_+(s)}{\psi_-(s)}, \tag{2}$$
where $\psi_+(s)$ and $\psi_-(s)$ are rational functions of $s$ that can be factored. The functions $\psi_+(s)$ and $\psi_-(s)$ must satisfy certain conditions [3]:
1. For $\mathrm{Re}(s) > 0$, the function $\psi_+(s)$ is analytic and has no zeros in this half-plane.
2. For $\mathrm{Re}(s) < D$, the function $\psi_-(s)$ is analytic and has no zeros in this half-plane, (3)
where $D$ is a positive constant determined from the following condition:
$$\lim_{t\to\infty}\frac{a(t)}{e^{-Dt}} < \infty.$$
Moreover, the functions $\psi_+(s)$ and $\psi_-(s)$ must have the following properties:
$$\lim_{|s|\to\infty}\frac{\psi_+(s)}{s} = 1 \ \text{ for } \mathrm{Re}(s) > 0; \qquad \lim_{|s|\to\infty}\frac{\psi_-(s)}{s} = -1 \ \text{ for } \mathrm{Re}(s) < D. \tag{4}$$
It is known that all the main characteristics of a QS are derived from the average waiting time, so all subsequent calculations are carried out for the average waiting time of a customer in the queue. Consider the QS H2/M/1, where H2 denotes the second-order hyperexponential distribution of the interarrival times with density function
$$a(t) = p\lambda_1 e^{-\lambda_1 t} + (1-p)\lambda_2 e^{-\lambda_2 t} \tag{5}$$
and M denotes the exponential service law with density function
$$b(t) = \mu e^{-\mu t}. \tag{6}$$
The Laplace transform of (5) has the form
$$A^*(s) = p\,\frac{\lambda_1}{s+\lambda_1} + (1-p)\,\frac{\lambda_2}{s+\lambda_2}, \tag{7}$$
and that of function (6) is
$$B^*(s) = \frac{\mu}{s+\mu}. \tag{8}$$
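Now we write out (2) for the distributions (5) and (6) using (7) and (8):
$$\frac{\psi_+(s)}{\psi_-(s)} = \left[p\,\frac{\lambda_1}{\lambda_1 - s} + (1-p)\,\frac{\lambda_2}{\lambda_2 - s}\right]\frac{\mu}{\mu + s} - 1 = \frac{\left[p\lambda_1(\lambda_2 - s) + (1-p)\lambda_2(\lambda_1 - s)\right]\mu - (\lambda_1 - s)(\lambda_2 - s)(\mu + s)}{(\lambda_1 - s)(\lambda_2 - s)(\mu + s)} = \frac{\mu(a_0 - a_1 s) - (\lambda_1 - s)(\lambda_2 - s)(\mu + s)}{(\lambda_1 - s)(\lambda_2 - s)(\mu + s)}, \tag{9}$$
where the coefficients are $a_0 = \lambda_1\lambda_2$ and $a_1 = p\lambda_1 + (1-p)\lambda_2$.
Expanding the numerator of the right side of (9), the constant terms cancel and a third-degree polynomial $-s(s^2 - c_2 s - c_1)$ remains; it remains to determine its coefficients and factor it. The coefficients of the polynomial are
$$c_1 = \mu\left[\lambda_1(1-p) + \lambda_2 p\right] - \lambda_1\lambda_2, \qquad c_2 = \lambda_1 + \lambda_2 - \mu.$$
Absorbing the minus sign into the factor $(\lambda_1 - s)$ of the denominator, expression (9) can be factored:
$$\frac{\psi_+(s)}{\psi_-(s)} = \frac{s(s^2 - c_2 s - c_1)}{(s - \lambda_1)(\lambda_2 - s)(\mu + s)} = \frac{s(s + s_1)(s - s_2)}{(s - \lambda_1)(\lambda_2 - s)(\mu + s)},$$
where $-s_1 = -\left(\sqrt{c_2^2/4 + c_1} - c_2/2\right)$ is the negative root of the quadratic equation in the numerator and $s_2 = \sqrt{c_2^2/4 + c_1} + c_2/2$ is the positive root. Conditions (3) and (4) are then satisfied by $\psi_+(s) = s(s + s_1)/(s + \mu)$ and $\psi_-(s) = (s - \lambda_1)(\lambda_2 - s)/(s - s_2)$.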
Further, omitting some calculations, we obtain the Laplace transform of the density function of the waiting time:
$$W^*(s) = \frac{s_1(s + \mu)}{\mu(s + s_1)}.$$
Hence
$$\frac{dW^*(s)}{ds} = \frac{s_1\mu(s_1 + s) - s_1(s + \mu)\mu}{\mu^2(s + s_1)^2}.$$
Using the properties of the Laplace transform, we find that the average waiting time is
$$\overline{W} = -\left.\frac{dW^*(s)}{ds}\right|_{s=0} = -\frac{s_1^2\mu - \mu^2 s_1}{\mu^2 s_1^2} = \frac{1}{s_1} - \frac{1}{\mu}.$$
Finally, the average waiting time is
$$\overline{W} = \frac{1}{s_1} - \frac{1}{\mu}, \tag{10}$$
where $s_1 = \sqrt{c_2^2/4 + c_1} - c_2/2$, $c_1 = \mu\left[\lambda_1(1-p) + \lambda_2 p\right] - \lambda_1\lambda_2$, $c_2 = \lambda_1 + \lambda_2 - \mu$.
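Result (10) translates directly into a few lines of code. The Python sketch below is an illustration rather than the authors' implementation; the function name `h2m1_mean_wait` and the sample parameter values are hypothetical. As a sanity check, for $\lambda_1 = \lambda_2 = \lambda$ the H2 distribution degenerates into an exponential one, $s_1$ becomes $\mu - \lambda$, and (10) reduces to the M/M/1 value $\lambda/(\mu(\mu - \lambda))$.

```python
import math

def h2m1_mean_wait(p, lam1, lam2, mu):
    """Mean waiting time in the H2/M/1 queue according to equation (10)."""
    c1 = mu * (lam1 * (1 - p) + lam2 * p) - lam1 * lam2
    c2 = lam1 + lam2 - mu
    if c1 <= 0:
        # c1 > 0 is equivalent to a channel load below 1.
        raise ValueError("the queue is unstable for these parameters")
    s1 = math.sqrt(c2 ** 2 / 4 + c1) - c2 / 2
    return 1 / s1 - 1 / mu

# Degenerate case lam1 == lam2: the result must equal the M/M/1 value.
lam, mu = 200.0, 500.0
w = h2m1_mean_wait(0.3, lam, lam, mu)
print(w, lam / (mu * (mu - lam)))
```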
6. Practical use of the results
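Let us apply result (10) to an input distribution with a heavy tail, using the example of Fig. 3. From the Laplace transform (7), the initial moments of distribution (5) are
$$\bar{\tau}_\lambda = \frac{p}{\lambda_1} + \frac{1-p}{\lambda_2}, \qquad \overline{\tau_\lambda^2} = \frac{2p}{\lambda_1^2} + \frac{2(1-p)}{\lambda_2^2}, \qquad \overline{\tau_\lambda^3} = \frac{6p}{\lambda_1^3} + \frac{6(1-p)}{\lambda_2^3}.$$
Next, equating these expressions to the initial moments of the intervals between packets obtained above (Fig. 3), we obtain the following system of equations for the unknown parameters $\lambda_1$, $\lambda_2$ and $p$ of the input distribution (5):
$$\begin{cases} \dfrac{p}{\lambda_1} + \dfrac{1-p}{\lambda_2} = 5.0978\cdot 10^{-3}, \\[2mm] \dfrac{2p}{\lambda_1^2} + \dfrac{2(1-p)}{\lambda_2^2} = 3.3258\cdot 10^{-4}, \\[2mm] \dfrac{6p}{\lambda_1^3} + \dfrac{6(1-p)}{\lambda_2^3} = 5.5050\cdot 10^{-5}. \end{cases} \tag{11}$$
The solution of (11) in the Mathcad package yields the following results: $p \approx 0.950$, $\lambda_1 \approx 417.985$, $\lambda_2 \approx 17.556$.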
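System (11) can also be solved in closed form, which gives a way to check the numerical solution. This is a different technique from the Mathcad solve used in the paper: with $u_k = \overline{\tau_\lambda^k}/k!$, the quantities $x_1 = 1/\lambda_1$ and $x_2 = 1/\lambda_2$ are the two roots of $x^2 - bx + c = 0$, where $b$ and $c$ follow from the linear relations $u_2 = b u_1 - c$ and $u_3 = b u_2 - c u_1$. The Python sketch below uses this approach; the function name `fit_h2` is hypothetical.

```python
import math

def fit_h2(m1, m2, m3):
    """Return (p, lam1, lam2) of an H2 distribution with the given initial moments."""
    # Reduced moments u_k = m_k / k! = p / lam1**k + (1 - p) / lam2**k.
    u1, u2, u3 = m1, m2 / 2.0, m3 / 6.0
    spread = u2 - u1 ** 2
    if spread <= 0:
        raise ValueError("the coefficient of variation must exceed 1")
    b = (u3 - u1 * u2) / spread   # b = x1 + x2
    c = b * u1 - u2               # c = x1 * x2
    disc = b ** 2 - 4.0 * c
    if b <= 0 or c <= 0 or disc <= 0:
        raise ValueError("moments cannot be matched by an H2 distribution")
    root = math.sqrt(disc)
    x1 = (b - root) / 2.0         # smaller mean, i.e. the larger rate lam1
    x2 = (b + root) / 2.0
    p = (x2 - u1) / (x2 - x1)     # weight of the phase with rate lam1
    return p, 1.0 / x1, 1.0 / x2

p, lam1, lam2 = fit_h2(5.0978e-3, 3.3258e-4, 5.5050e-5)
print(round(p, 3), round(lam1, 1), round(lam2, 3))
```

With the right-hand sides of (11) as input, the result agrees with the Mathcad values quoted above to within the rounding of the moments.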
For a channel load of 0.4, the intermediate parameters are $c_1 \approx 10999.4$, $c_2 \approx -54.655$, $s_1 \approx 135.707$, and the average waiting time is $\overline{W} \approx 5.329\cdot 10^{-3}$ s.
For comparison, consider the average waiting time in an M/M/1 system. In this case the service rate is $\mu \approx 490.196$ and the channel load is $\rho = 0.4$. Then the average waiting time of packets is
$$\overline{W} = \frac{\rho/\mu}{1-\rho} = \frac{0.4/490.196}{1-0.4} \approx 1.36\cdot 10^{-3}\ \text{s}.$$
Thus the queuing model that takes into account the input distribution and the weight of its tail gives a delay about four times larger than the classical model.
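The comparison can be reproduced with a few lines of arithmetic. The Python sketch below takes the values of $\mu$, $\rho$ and $s_1$ reported in the text as given and evaluates equation (10) and the M/M/1 formula; the variable names are hypothetical.

```python
mu = 490.196   # service rate reported in the text
rho = 0.4      # channel load
s1 = 135.707   # intermediate parameter reported in the text

w_h2m1 = 1 / s1 - 1 / mu         # equation (10)
w_mm1 = (rho / mu) / (1 - rho)   # classical M/M/1 mean waiting time
ratio = w_h2m1 / w_mm1

print("H2/M/1: %.3e s, M/M/1: %.3e s, ratio: %.2f" % (w_h2m1, w_mm1, ratio))
```

The ratio comes out at roughly 3.9, which is the "about four times" difference stated above.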
7. Conclusion
This paper has shown how optimistic the results of the classical M/M/1 system are compared with those of the H2/M/1 system when the distribution of the input flow has a heavy tail. The approach can therefore be applied in modern teletraffic theory, where packet delays in incoming traffic are significant.
Note that distribution (5), which contains three unknown parameters $\lambda_1$, $\lambda_2$ and $p$, makes it possible to use the moment equations to approximate an unknown input distribution by its first three moments.
References
[1]. Wireshark official web-site URL: http://www.wireshark.org/
[2]. Tarasov V.N., Bakhareva N.F., Gorelov G.A. Matematicheskaya model' trafika s tyazhelokhvostnym raspredeleniem na osnove sistemy massovogo obsluzhivaniya H2/M/1 [Mathematical model of traffic with a heavy-tailed distribution based on the H2/M/1 queuing system]. Infokommunikacionnye tehnologii [Infocommunication Technologies], 2014, no. 3, pp. 36-41.
[3]. Kleinrock L. Queueing Theory. Translated from English, edited by V.I. Neumann. Moscow, Mashinostroenie [Mechanical Engineering], 1979. 432 p.
Abstract (Russian-language summary). The paper presents an add-on program for the Wireshark traffic analyzer that calculates the moments of a random variable, the interval between packets of incoming traffic. An analytical solution is given for the average waiting time in a QS of type H2/M/1, where H2 is the second-order hyperexponential distribution law of the time intervals of the input flow. The final result is obtained by solving the Lindley integral equation by the method of spectral decomposition. It is shown that in this case the distribution laws of the intervals between the requirements of the input flow can be approximated at the level of their first three moments. Used together, these results make it possible to analyze incoming traffic fully by the methods of queuing theory.