Kernel Sampling Based Parameter Estimation in Detected Community in Weighted Graph in Big Data

Ram Milan; Diwakar Shukla


Department of Computer Science and Applications, Dr. Harisingh Gour Vishwavidyalaya (A Central University), Sagar, M.P.; rammilan.in@gmail.com, diwakarshukla@rediffmail.com

Abstract

Social media platforms are examples of big data, in which volume, velocity and variety can be observed over time. Registered users of such platforms communicate frequently with others, and such groups can be identified as communities. Many methods (algorithms) exist in the literature to detect these groups of frequent communication. This paper contributes a way to estimate parameters of the detected communities using a sampling procedure. A Kernel sampling procedure is suggested for the setting of detected communities, and the efficiency of the resulting method is assessed through confidence intervals. A simulation procedure over multiple samples is used to obtain the lower and upper limits of the confidence intervals.

Keywords: Community Detection, Weighted Graph, Big Data, Internet Technology, 4G, Sampling, Simulation, Confidence Interval.

I. Introduction

With the expansion of social media platforms and technologies, large numbers of users interact with each other by forming groups based on common characteristics. Some of the most popular social networking sites are Facebook, Twitter, Instagram and WhatsApp, where users register themselves and communicate with like-minded people. This motivates the study of how communities form and how they can be detected. Formation is usually based on commonality, but detection requires scientific methodologies.

One can assume that each registered user of a social networking platform is a vertex of a graph and that his or her communication with another user is an edge of the graph. The amount of connectivity changes rapidly over time, generating voluminous data in a short span of time (volume). Communication differs in mode, such as text, voice, images and videos, which reveals variety in the data. Moreover, data on social networking platforms grows very quickly, which reveals the velocity characteristic.

Detecting the size and type of communities is one aspect that yields information about popularity and security. Duan et al. [1] suggested algorithms for community mining that treat each user as a vertex and a dense set of connecting edges as a community. An approach to community discovery based on evaluating a partition matrix, together with the detection of change points, has also been considered. Pizzuti [2] used a genetic algorithm approach, grounded in graph theory, for detecting communities in social networks. Du et al. [3] detected community development in large-scale social networks. Newman [4] suggested a fast algorithm for obtaining community structure. A community may be subdivided into smaller sub-communities, whose formation and analysis were studied by Ferrara [5]. Graph-theoretic methods for community detection and analysis were treated by Fortunato [6].

When communication on a social networking platform becomes highly frequent, close and intense, it reaches the level of sentiment. Deitrick et al. [7] suggested a sentiment analysis approach for data obtained from social media platforms. Leskovec et al. [8] compared several algorithms for network community detection. Survey contributions on community detection procedures are due to Plantie and Crampes [9] and Sivarajah et al. [10]. This paper focuses on developing a parameter estimation approach that is applied after communities have been detected.

II. Graph Based Rules

Community detection aims to find groups of vertices within which connections are dense. Consider a graph G with vertex set V(G) and edge set E(G). Rules for forming cliques and kernels can be constructed from collections of vertices and their corresponding edges, as follows.

III. Community detection in weighted graph

A clique is a cohesive substructure, and maximal cliques provide a tool for community detection; overlapping maximal cliques form a kernel. Following N. Du et al. [3], the rules are as follows:

Rule 1. If $S \subseteq V(G)$ and $(u, v) \in E(G)$ for all $u, v \in S$ with $u \neq v$, then S is a clique in G. If, for any other clique $S'$, $S' \supseteq S$ holds iff $S' = S$, then S is a maximal clique of G.

Rule 2. For a given vertex v, $N(v) = \{u \mid (v, u) \in E(G)\}$ is the set of all neighbours of v. Given a set $S \subseteq V(G)$, $N_S = \bigcup_{v_i \in S} N(v_i) - S$ is the set of all neighbours of S.

Rule 3. Let $Com(G)$ be the set of all components of G. The giant component is denoted by $C_g$, and $M(C_g)$ is the set of all maximal cliques of $C_g$. We use $V_m \subseteq V(G)$ to denote the set of all vertices covered by $M(C_g)$.

Rule 4. Let $P_0, P_1, \ldots, P_{n-1}$ be subgraphs of G such that $V(P_i) \cap V(P_j) = \emptyset$ for all $i \neq j$, and $V(P_0) \cup \cdots \cup V(P_{n-1}) = V(G)$. For any pair $P_i$ and $P_j$, if $|E(P_i)| > |N_{P_i} \cap V(P_j)|$, then $P_i$ is defined as a community of G.

Rule 5. Given a vertex $v_i \in V_m$, define $C_i = \{S \mid S \in M(C_g), v_i \in S\}$ as the set of all maximal cliques containing $v_i$, and let C be the set of all such $C_i$. For $C_i, C_j \in C$, if $|C_i \cap C_j| / |C_j| \geq \phi$, where $\phi$ is a threshold describing the extent to which $C_i$ overlaps with $C_j$, we say that $C_j$ is contained in $C_i$, denoted $C_j \prec C_i$. If $C_i$ is not contained in any other element of C, $C_i$ is called a kernel of G and $v_i$ is the center of $C_i$.

Rule 6. Let K be the set of all kernels in G. $V_K = \{v_i \mid v_i \in K_j, K_j \in K\}$ is the set of all vertices covered by K, and $I_K = \bigcup (K_i \cap K_j)$, $K_i, K_j \in K$, $i \neq j$, is the union of the vertices that any pair of elements of K has in common.

IV. Problem undertaken

Assume that several communities have been detected using one of the existing algorithms. One may be interested in estimating an unknown parameter of a characteristic associated with the edge between pairs of vertices within a community, in the graphical population of a social media platform in a big data setting. For example, when a large number of users are registered on a social networking platform, the average time spent between any pair of users within a community is a quantity to be worked out. Because the data are large and grow fast over time and space, computing such a quantity exactly is time- and cost-consuming. This paper considers a sampling-based solution to this problem.

V. A Graphical Structure

Consider Figure 1, in which the cliques are enumerated. Among the cliques so constituted there exist maximal cliques: complete subgraphs that can represent a close relationship within a single entity of the network.

Enumerating the cliques of the graph using Rules 1 to 5 gives:

C0 = {(V0, W1, W2), (V1, W1, W2), (V4, W1, W2), (V5, W1, W2)}, {(V0, W1, W2), (V1, W1, W2), (V3, W1, W2), (V4, W1, W2)}, {(V0, W1, W2), (V2, W1, W2), (V3, W1, W2), (V4, W1, W2)}, {(V0, W1, W2), (V6, W1, W2), (V4, W1, W2), (V5, W1, W2)}, with V0 as the center.

C1 = {(V0, W1, W2), (V1, W1, W2), (V4, W1, W2), (V5, W1, W2)}, {(V0, W1, W2), (V1, W1, W2), (V3, W1, W2), (V4, W1, W2)}

C2 = {(V0, W1, W2), (V2, W1, W2), (V4, W1, W2), (V3, W1, W2)}

C3 = {(V0, W1, W2), (V2, W1, W2), (V4, W1, W2), (V3, W1, W2)}

C4 = {(V0, W1, W2), (V1, W1, W2), (V4, W1, W2), (V6, W1, W2)}, {(V0, W1, W2), (V2, W1, W2), (V3, W1, W2), (V4, W1, W2)}, {(V1, W1, W2), (V4, W1, W2), (V5, W1, W2), (V5, W1, W2)}

C5 = {(V5, W1, W2), (V1, W1, W2), (V4, W1, W2), (V5, W1, W2)}

C6 = {(V5, W1, W2), (V1, W1, W2), (V4, W1, W2), (V5, W1, W2)}

C7 = {(V7, W1, W2), (V8, W1, W2), (V9, W1, W2), (V10, W1, W2)}, {(V7, W1, W2), (V9, W1, W2), (V11, W1, W2), (V10, W1, W2)}

C8 = {(V7, W1, W2), (V8, W1, W2), (V9, W1, W2), (V10, W1, W2)}, C9 = {(V7, W1, W2), (V8, W1, W2), (V9, W1, W2), (V10, W1, W2)}, C10 = {(V7, W1, W2), (V8, W1, W2), (V9, W1, W2), (V10, W1, W2)}, C11 = {(V7, W1, W2), (V8, W1, W2), (V9, W1, W2), (V10, W1, W2)}

C8, C9, C10 and C11 are contained in C7. Therefore C0 and C7 are two different kernels, with weights associated with their vertices.
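The clique and kernel rules can be exercised on a small graph. The Python sketch below is illustrative only: it assumes that the edges of Figure 1 are exactly the vertex pairs inside the cliques listed for C0 and C7, enumerates maximal cliques with the Bron-Kerbosch algorithm (Rule 1), builds each C_i (Rule 5), and keeps the C_i that are not contained in any other. The threshold `phi = 1.0`, and the rule that vertices with identical C_i share one kernel centred on the lowest-numbered vertex, are assumptions (under this assumed edge set C4 coincides with C0, so some tie-breaking rule is needed); all function names are hypothetical.

```python
from itertools import combinations

def graph_from_cliques(cliques):
    """Adjacency sets of the graph whose edges are all pairs inside the given cliques."""
    adj = {}
    for clique in cliques:
        for u in clique:
            adj.setdefault(u, set())
        for u, v in combinations(clique, 2):
            adj[u].add(v)
            adj[v].add(u)
    return adj

def maximal_cliques(adj):
    """Bron-Kerbosch enumeration of maximal cliques, with pivoting (Rule 1)."""
    found = []

    def expand(r, p, x):
        if not p and not x:
            found.append(frozenset(r))
            return
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return found

def kernels(adj, phi=1.0):
    """Kernels of Rule 5: C_i is a kernel if it is contained in no other C_j."""
    cliques = maximal_cliques(adj)
    C = {v: frozenset(c for c in cliques if v in c) for v in sorted(adj)}
    C = {v: cs for v, cs in C.items() if cs}          # keep only vertices in V_m

    def contained(j, i):                              # is C_j contained in C_i?
        return len(C[i] & C[j]) / len(C[j]) >= phi

    result = {}
    for i in C:
        # Assumption: identical C_i are merged, centred on the lowest-numbered vertex
        if any(j < i and C[j] == C[i] for j in C):
            continue
        if not any(C[j] != C[i] and contained(i, j) for j in C):
            result[i] = C[i]
    return result

CLIQUES = [(0, 1, 4, 5), (0, 1, 3, 4), (0, 2, 3, 4), (0, 6, 4, 5),
           (7, 8, 9, 10), (7, 9, 11, 10)]
adj = graph_from_cliques(CLIQUES)
print(sorted(kernels(adj)))  # [0, 7]
```

On this assumed graph the sketch recovers two kernels centred on V0 and V7, matching the kernels C0 and C7 named above.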

VI. Parameter estimation

Consider the graph in Figure 3, where the first weight is the age of the users registered on the social networking sites and the second weight is the number of hours for which the sites are used. Figure 2 shows social media communities detected through algorithms, each with unknown parameters, from which samples can be extracted.

Figure 2: Social media communities and unknown parameters (panels: Community 1, Community 2, ..., Community p).

Consider the graph as a population classified into k kernel-based groups, as below:

Table 1: Kernel based groups

Group: I, II, III, IV, V, VI, VII, VIII, IX, X, XI, XII, ...

Kernel: Ke1, Ke2, Ke3, Ke4, Ke5, Ke6, Ke7, Ke8, Ke9, Ke10, Ke11, Ke12, ..., Ken

VII. Kernel Sampling

One can consider the graphical population of vertices (nodes) and edges G = (V, E) divided into k kernel-based groups derived from the graphical population (see Table 1). This constitutes the setup of Kernel Sampling. Assume the strata sizes are $N_1, N_2, \ldots, N_k$ such that $\sum_{i=1}^{k} N_i = N$.

Let the total size of the population be N, from which a sample of size n (n < N) is drawn and divided group-wise as $n_1, n_2, \ldots, n_k$ such that $\sum_{i=1}^{k} n_i = n$. Let the sample means of the k strata be $m_1, m_2, \ldots, m_k$, respectively (Sukhatme et al. [11]; Cochran [12]).
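Kernel sampling, as set up here, amounts to stratified sampling with the kernel-based groups as strata. A minimal Python sketch, assuming simple random sampling without replacement inside each group (the scheme is not spelled out above) and hypothetical unit labels 1 to 36 for two groups of sizes N1 = 16 and N2 = 20; `kernel_sample` is an illustrative name:

```python
import random

def kernel_sample(groups, sizes, seed=None):
    """Draw n_i units without replacement from each kernel-based group i."""
    if len(groups) != len(sizes):
        raise ValueError("one sample size is needed per kernel-based group")
    rng = random.Random(seed)
    drawn = []
    for units, n_i in zip(groups, sizes):
        units = list(units)                      # random.sample needs a sequence
        if not 0 < n_i < len(units):
            raise ValueError("each n_i must satisfy 0 < n_i < N_i")
        drawn.append(rng.sample(units, n_i))
    return drawn

# Hypothetical unit labels for k = 2 groups with N1 = 16 and N2 = 20
groups = [range(1, 17), range(17, 37)]
sample = kernel_sample(groups, [6, 4], seed=42)
N = sum(len(g) for g in groups)    # N = N1 + N2
n = sum(len(s) for s in sample)    # n = n1 + n2
```

Fixing the seed makes a draw reproducible, which is convenient when the sampling step is later repeated many times.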

Consider the vertices of the graph G = (V, E), which carry two variables: $W_2$, the number of hours per month for which the user uses the social media website (auxiliary variable), and $W_1$, the age of the user in completed years (main variable). The unknown parameter is the average number of hours $\bar{W}_2$ consumed by a user. It may be assumed that the mean age $\bar{W}_1$ of users in the population is known (from registration data entered when accounts are created on social networking sites). The ith kernel-based group has size $N_i$ and pairs of values $(W_{1ij}, W_{2ij})$, where $W_{1ij}$ and $W_{2ij}$ are the jth values in the ith group of the age and of the number of hours consumed, respectively.

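$$\bar{W}_1 = \frac{1}{N}\sum_{i=1}^{k}\sum_{j=1}^{N_i} W_{1ij} \quad \text{(known parameter)} \qquad (4.1)$$

$$\bar{W}_2 = \frac{1}{N}\sum_{i=1}^{k}\sum_{j=1}^{N_i} W_{2ij} \quad \text{(unknown parameter, to be estimated)} \qquad (4.2)$$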
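Because the k kernel-based groups partition the population, (4.1) and (4.2) can also be written as weighted averages of the group means $\bar{W}_{1i}$ and $\bar{W}_{2i}$ (the population means of group i), using the weights $Z_i = N_i/N$ that appear in Table 2. This short derivation is added for clarity:

$$\bar{W}_1 = \frac{1}{N}\sum_{i=1}^{k}\sum_{j=1}^{N_i} W_{1ij} = \sum_{i=1}^{k}\frac{N_i}{N}\left(\frac{1}{N_i}\sum_{j=1}^{N_i} W_{1ij}\right) = \sum_{i=1}^{k} Z_i\,\bar{W}_{1i}, \qquad \bar{W}_2 = \sum_{i=1}^{k} Z_i\,\bar{W}_{2i}$$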

Some other symbols are as follows: $\bar{W}_{1i}$ is the population mean of the ith stratum for variable $W_1$, and $\bar{W}_{2i}$ is the population mean of the ith stratum for variable $W_2$.

Estimation method under Kernel Sampling:

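To estimate the unknown $\bar{W}_2$, random samples of sizes $n_i$ are drawn from the $N_i$ paired values $(W_{1ij}, W_{2ij})$ of the ith group, with sample means

$$\bar{w}_{1i} = \frac{1}{n_i}\sum_{j=1}^{n_i} w_{1ij} \qquad (4.3)$$

$$\bar{w}_{2i} = \frac{1}{n_i}\sum_{j=1}^{n_i} w_{2ij} \qquad (4.4)$$

where $(w_{1ij}, w_{2ij})$ are the pairs of sample observations from the ith group. The method used for estimation of $\bar{W}_2$ is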

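$$M = \sum_{i=1}^{k} Z_i\, f(z_i, z'_i)\, \bar{W}_{2i}, \quad f(z_i, z'_i) = z_i z'_i, \quad z_i = \bar{w}_{1i}, \quad z'_i = \frac{1}{\bar{w}_{2i}} \qquad (4.5)$$

where $Z_i = N_i/N$ and $\bar{W}_{2i}$ is assumed known. Thus $f(z_i, z'_i) = \bar{w}_{1i}/\bar{w}_{2i} = r_i$, the sample ratio of group i.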

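The mean square error of the method M is

$$\mathrm{MSE}(M) = \sum_{i=1}^{k} Z_i^2\left(\frac{1}{n_i}-\frac{1}{N_i}\right)(S_i^{*})^2 \qquad (4.6)$$

where

$$R_i = \frac{\bar{W}_{1i}}{\bar{W}_{2i}}, \qquad (S_i^{*})^2 = S_{iW_1}^2 + R_i^2 S_{iW_2}^2 - 2R_i S_{iW_1W_2} \qquad (4.7)$$

$$S_{iW_1}^2 = \frac{1}{N_i-1}\sum_{j=1}^{N_i}(W_{1ij}-\bar{W}_{1i})^2, \quad S_{iW_2}^2 = \frac{1}{N_i-1}\sum_{j=1}^{N_i}(W_{2ij}-\bar{W}_{2i})^2, \quad S_{iW_1W_2} = \frac{1}{N_i-1}\sum_{j=1}^{N_i}(W_{1ij}-\bar{W}_{1i})(W_{2ij}-\bar{W}_{2i}) \qquad (4.8)$$

$$\bar{W}_{1i} = \frac{1}{N_i}\sum_{j=1}^{N_i} W_{1ij}, \qquad \bar{W}_{2i} = \frac{1}{N_i}\sum_{j=1}^{N_i} W_{2ij} \qquad (4.9)$$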

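The estimate of MSE(M) is

$$\mathrm{est.}(\mathrm{MSE}) = \sum_{i=1}^{k} Z_i^2\left(\frac{1}{n_i}-\frac{1}{N_i}\right)\mathrm{est.}(S_i^{*})^2$$

where $\mathrm{est.}(S_i^{*})^2 = s_{iW_1}^2 + r_i^2 s_{iW_2}^2 - 2 r_i s_{iW_1W_2}$, in which $s_{iW_1}^2$, $s_{iW_2}^2$ and $s_{iW_1W_2}$ are the sample analogues of (4.8) and $r_i = \bar{w}_{1i}/\bar{w}_{2i}$ is computed from the sample.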
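The estimator (4.5) and the MSE estimate above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: `kernel_estimate`, `covariance` and `ci95` are hypothetical names, the sample variances and covariance use the divisor n_i - 1 (mirroring (4.8)), and the 95% interval is formed as M ± 1.96 √est.(MSE). With the first sample of Table 3 and the constants of Table 2 (N_i, Z_i and the known W2 group means), M comes out at about 14.20, in line with the tabulated 14.19; the variance terms are computed directly from the formulas and are not meant to reproduce the rounded table entries.

```python
from statistics import mean, variance

def covariance(x, y):
    """Sample covariance with divisor n - 1 (sample analogue of S_iW1W2)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def kernel_estimate(samples, N, Z, W2_known):
    """Return M of (4.5) and its estimated MSE for k kernel-based groups.

    samples[i]  -- list of (w1, w2) pairs drawn from group i
    N[i], Z[i]  -- group size N_i and weight Z_i = N_i / N
    W2_known[i] -- known population mean of W2 in group i
    """
    M, est_mse = 0.0, 0.0
    for pairs, N_i, Z_i, W2_i in zip(samples, N, Z, W2_known):
        w1 = [a for a, _ in pairs]
        w2 = [b for _, b in pairs]
        n_i = len(pairs)
        r_i = mean(w1) / mean(w2)              # f(z_i, z'_i) = w1_bar / w2_bar
        M += Z_i * r_i * W2_i
        s_star2 = variance(w1) + r_i**2 * variance(w2) - 2 * r_i * covariance(w1, w2)
        est_mse += Z_i**2 * (1 / n_i - 1 / N_i) * s_star2
    return M, est_mse

def ci95(M, mse):
    """95% interval M -/+ 1.96 * sqrt(MSE)."""
    half = 1.96 * max(mse, 0.0) ** 0.5
    return M - half, M + half

# First sample of Table 3, with N_i, Z_i and the known W2 means taken from Table 2
group1 = [(15, 6), (16, 9), (12, 6), (15, 7), (18, 2), (13, 7)]
group2 = [(15, 3), (12, 7), (18, 7), (19, 3)]
M, est_mse = kernel_estimate([group1, group2], N=[16, 20], Z=[0.44, 0.55],
                             W2_known=[5.93, 4.5])
low, high = ci95(M, est_mse)
```

Keeping the point estimate and the interval in separate functions makes it easy to reuse `ci95` for any other MSE value.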

The 95% confidence interval for estimating $\bar{W}_1$ is:

$$P\left[M - 1.96\sqrt{\mathrm{MSE}(M)} < \bar{W}_1 < M + 1.96\sqrt{\mathrm{MSE}(M)}\right] = 0.95 \qquad (4.10)$$

VIII. Simulation procedure for confidence interval

Step I: Draw a random sample of size n.

Step II: Compute the lower and upper limits of the confidence interval.

Step III: Repeat Steps I and II K times (K = 200).

Step IV: Over all K samples, compute the less-than-type and more-than-type cumulative frequencies of the lower limit and of the upper limit of the confidence interval.

Step V: Plot the data of Step IV on a graph. The foot of the perpendicular drawn from the point of intersection to the x-axis gives the simulated value of the lower (respectively upper) limit of the confidence interval for the parameter to be estimated.

IX. Numerical illustration

Consider Figure 3, which has 12 vertices, each carrying data in the tuple $(V_i, W_{1i}, W_{2i})$. The relationships between vertices are expressed as edges, which are used to form cliques and kernels.

Figure 3: Graph with weights representing age and hours of use.

Figure 3 has two kernels, C0 and C7, which give the kernel-based group structure of the graphical population. From Figure 4, samples are extracted from group 1 (C0) and group 2 (C7).

As per Figure 3, the vertices with weights W1 (age of user) and W2 (time consumed by user) are given below as Vi = (W1i, W2i):

V0 = (15, 6), V1 = (16, 9), V2 = (17, 4), V3 = (18, 2), V4 = (12, 6), V5 = (15, 7),

V6 = (13, 7), V7 = (15, 3), V8 = (12, 7), V9 = (18, 7), V10 = (19, 3), V11 = (18, 2)

Group 1 (kernel C0) contains 16 tuples (N1 = 16) and group 2 (kernel C7) contains 20 tuples (N2 = 20).

A random sample of size n1 = 6 is drawn from N1 = 16. Similarly, a random sample of size n2 = 4 is drawn from N2 = 20 (n1 < N1, n2 < N2). Using these sample values, the objective is to estimate the unknown population mean.

Table 2: Description of population parameters

Group I: N1 = 16, Z1 = N1/N = 0.44; mean W1 (G1) = 14.75, S² W1 (G1) = 4.2; mean W2 (G1) = 5.93, S² W2 (G1) = 3.02

Group II: N2 = 20, Z2 = N2/N = 0.55; mean W1 (G2) = 16.6, S² W1 (G2) = 6.25; mean W2 (G2) = 4.5, S² W2 (G2) = 4.47

N = N1 + N2 = 36; R = 14.75/16.6 = 0.88
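As a check on Table 2, the Group I population values can be recomputed from the vertex weights, assuming (as the count N1 = 16 suggests) that the Group I population consists of one (W1, W2) tuple per vertex of each of the four maximal cliques listed for C0. A short Python sketch; `W`, `C0_CLIQUES` and `tuples` are illustrative names:

```python
from statistics import mean, variance

# Vertex weights (W1 = age, W2 = hours) for the vertices of kernel C0
W = {0: (15, 6), 1: (16, 9), 2: (17, 4), 3: (18, 2), 4: (12, 6), 5: (15, 7), 6: (13, 7)}

# The four maximal cliques that make up kernel C0
C0_CLIQUES = [(0, 1, 4, 5), (0, 1, 3, 4), (0, 2, 3, 4), (0, 6, 4, 5)]

# One (W1, W2) tuple per vertex per clique gives the N1 = 16 population units
tuples = [W[v] for clique in C0_CLIQUES for v in clique]
w1 = [a for a, _ in tuples]
w2 = [b for _, b in tuples]

print(len(tuples), mean(w1), variance(w1), mean(w2))  # 16 14.75 4.2 5.9375
```

Under this assumption the Group I mean and variance of W1 (14.75 and 4.2) are reproduced exactly, and the mean of W2, 5.9375, appears in Table 2 truncated to 5.93.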


Table 3: Sample based computation (First Sample)

Group I: n1 = 6; sample values (Vi, W1, W2): (V0,15,6), (V1,16,9), (V4,12,6), (V5,15,7), (V3,18,2), (V6,13,7); mean w1 = 14.83, s² w1 = 4.51; mean w2 = 6.16, s² w2 = 5.47; r1 = 2.4; est.(S1*)² = 12.92

Group II: n2 = 4; sample values (Vi, W1, W2): (V7,15,3), (V8,12,7), (V9,18,7), (V10,19,3); mean w1 = 16.0, s² w1 = 10; mean w2 = 5.0, s² w2 = 5.33; r2 = 3.2; est.(S2*)² = 131.2

M = 14.19; est.(MSE) = 7.89; 95% C.I. = [8.70, 19.67]

Table 4: Sample based computation (Second Sample)

Group I: n1 = 6; sample values (Vi, W1, W2): (V0,15,6), (V4,12,6), (V5,15,7), (V3,18,2), (V6,13,7), (V1,16,9); mean w1 = 14.83, s² w1 = 4.51; mean w2 = 6.16, s² w2 = 5.47; r1 = 2.40; est.(S1*)² = 12.82

Group II: n2 = 4; sample values (Vi, W1, W2): (V7,15,3), (V11,18,2), (V8,12,7), (V9,18,7); mean w1 = 15.75, s² w1 = 8.25; mean w2 = 4.75, s² w2 = 6.91; r2 = 3.31; est.(S2*)² = 52.31

M = 14.48; est.(MSE) = 3.37; 95% C.I. = [10.98, 17.98]

Table 5: Sample based computation (Third Sample)

Group I: n1 = 6; sample values (Vi, W1, W2): (V3,18,2), (V4,12,6), (V5,15,7), (V6,13,7), (V1,16,9), (V0,15,6); mean w1 = 14.83, s² w1 = 4.51; mean w2 = 6.16, s² w2 = 5.47; r1 = 2.40; est.(S1*)² = 12.77

Group II: n2 = 4; sample values (Vi, W1, W2): (V7,15,3), (V11,18,2), (V9,18,7), (V10,19,3); mean w1 = 17.5, s² w1 = 3.0; mean w2 = 3.75, s² w2 = 4.91; r2 = 4.66; est.(S2*)² = 52.57

M = 17.831; est.(MSE) = 5.56; 95% C.I. = [13.19, 22.47]

Table 6: Sample based computation (Fourth Sample)

Group I: n1 = 6; sample values (Vi, W1, W2): (V5,15,7), (V4,12,6), (V6,13,7), (V0,15,6), (V1,16,9), (V2,17,4); mean w1 = 14.66, s² w1 = 3.46; mean w2 = 6.5, s² w2 = 2.7; r1 = 2.25; est.(S1*)² = 22.75

Group II: n2 = 4; sample values (Vi, W1, W2): (V10,19,3), (V9,18,7), (V8,12,7), (V7,15,3); mean w1 = 16, s² w1 = 10; mean w2 = 5, s² w2 = 5.33; r2 = 3.2; est.(S2*)² = 103.9

M = 13.79; est.(MSE) = 6.66; 95% C.I. = [8.74, 18.84]

Table 7: Sample based computation (Fifth Sample)

Group I: n1 = 6; sample values (Vi, W1, W2): (V4,12,6), (V3,18,2), (V6,13,7), (V2,17,4), (V1,16,9), (V0,15,6); mean w1 = 15.16, s² w1 = 5.36; mean w2 = 5.66, s² w2 = 5.86; r1 = 2.67; est.(S1*)² = 69.31

Group II: n2 = 4; sample values (Vi, W1, W2): (V8,12,7), (V9,18,7), (V10,19,3), (V11,18,2); mean w1 = 16.75, s² w1 = 10.24; mean w2 = 4.75, s² w2 = 6.91; r2 = 3.52; est.(S2*)² = 55.57

M = 15.69; est.(MSE) = 4.64; 95% C.I. = [11.48, 19.9]

Table 8: Sample based computation (Sixth Sample)

Group I: n1 = 6; sample values (Vi, W1, W2): (V4,12,6), (V0,15,6), (V2,17,4), (V5,15,7), (V1,16,9), (V3,18,2); mean w1 = 15.5, s² w1 = 4.3; mean w2 = 5.66, s² w2 = 5.86; r1 = 2.73; est.(S1*)² = 49.05

Group II: n2 = 4; sample values (Vi, W1, W2): (V11,18,2), (V10,19,3), (V9,18,7), (V8,12,7); mean w1 = 16.75, s² w1 = 10.24; mean w2 = 4.75, s² w2 = 6.91; r2 = 3.52; est.(S2*)² = 260.11

M = 15.86; est.(MSE) = 16.52; 95% C.I. = [7.91, 23.81]

Table 9: Sample based computation (Seventh Sample)

Group I: n1 = 6; sample values (Vi, W1, W2): (V0,15,6), (V1,16,9), (V4,12,6), (V5,15,7), (V2,17,4), (V3,18,2); mean w1 = 15.5, s² w1 = 4.3; mean w2 = 5.66, s² w2 = 5.86; r1 = 2.73; est.(S1*)² = 49.43

Group II: n2 = 4; sample values (Vi, W1, W2): (V7,15,3), (V8,12,7), (V9,18,7), (V10,19,3); mean w1 = 16.00, s² w1 = 10.0; mean w2 = 4.75, s² w2 = 5.41; r2 = 3.36; est.(S2*)² = 140.67

M = 15.47; est.(MSE) = 9.37; 95% C.I. = [9.48, 21.46]

Table 10: Sample based computation (Eighth Sample)

Group I: n1 = 6; sample values (Vi, W1, W2): (V3,18,2), (V4,12,6), (V5,15,7), (V6,13,7), (V2,17,4), (V1,16,9); mean w1 = 14.83, s² w1 = 5.44; mean w2 = 5.83, s² w2 = 6.15; r1 = 2.54; est.(S1*)² = 55.55

Group II: n2 = 4; sample values (Vi, W1, W2): (V7,15,3), (V8,12,7), (V9,18,7), (V11,18,2); mean w1 = 15.75, s² w1 = 8.25; mean w2 = 4.75, s² w2 = 6.91; r2 = 3.31; est.(S2*)² = 79.59

M = 14.82; est.(MSE) = 5.82; 95% C.I. = [10.1, 19.54]

Table 11: Sample based computation (Ninth Sample)

Group I: n1 = 6; sample values (Vi, W1, W2): (V4,12,6), (V3,18,2), (V6,13,7), (V0,15,6), (V2,17,4), (V1,16,9); mean w1 = 15.16, s² w1 = 5.36; mean w2 = 5.66, s² w2 = 5.86; r1 = 2.67; est.(S1*)² = 59.16

Group II: n2 = 4; sample values (Vi, W1, W2): (V8,12,7), (V10,19,3), (V9,18,7), (V11,18,2); mean w1 = 16.75, s² w1 = 10.24; mean w2 = 5.00, s² w2 = 8; r2 = 3.35; est.(S2*)² = 155.78

M = 15.26; est.(MSE) = 10.45; 95% C.I. = [8.93, 21.59]

Table 12: Sample based computation (Tenth Sample)

Group I: n1 = 6; sample values (Vi, W1, W2): (V6,13,7), (V5,15,7), (V4,12,6), (V0,15,6), (V2,17,4), (V1,16,9); mean w1 = 14.66, s² w1 = 3.46; mean w2 = 6.5, s² w2 = 2.7; r1 = 2.25; est.(S1*)² = 21.55

Group II: n2 = 4; sample values (Vi, W1, W2): (V7,15,3), (V10,19,3), (V11,18,2), (V8,12,7); mean w1 = 16.00, s² w1 = 10.0; mean w2 = 3.75, s² w2 = 4.91; r2 = 4.26; est.(S2*)² = 242.52

M = 16.42; est.(MSE) = 14.95; 95% C.I. = [8.86, 23.98]

Table 13: Sample based computation (Eleventh Sample)

Group I: n1 = 6; sample values (Vi, W1, W2): (V6,13,7), (V5,15,7), (V4,12,6), (V3,18,2), (V2,17,4), (V1,16,9); mean w1 = 15.16, s² w1 = 5.36; mean w2 = 5.83, s² w2 = 6.15; r1 = 2.60; est.(S1*)² = 56.98

Group II: n2 = 4; sample values (Vi, W1, W2): (V8,12,7), (V10,19,3), (V11,18,2), (V9,18,7); mean w1 = 16.75, s² w1 = 10.24; mean w2 = 4.75, s² w2 = 6.91; r2 = 3.52; est.(S2*)² = 172.80

M = 15.49; est.(MSE) = 11.43; 95% C.I. = [8.87, 22.11]

Table 14: Sample based computation (Twelfth Sample)

Group I: n1 = 6; sample values (Vi, W1, W2): (V2,17,4), (V1,16,9), (V0,15,6), (V4,12,6), (V5,15,7), (V6,13,7); mean w1 = 14.66, s² w1 = 3.46; mean w2 = 6.5, s² w2 = 2.7; r1 = 2.25; est.(S1*)² = 22.76

Group II: n2 = 4; sample values (Vi, W1, W2): (V7,15,3), (V10,19,3), (V9,18,7), (V8,12,7); mean w1 = 16.00, s² w1 = 10.0; mean w2 = 5.00, s² w2 = 5.33; r2 = 3.2; est.(S2*)² = 129.42

M = 13.79; est.(MSE) = 8.19; 95% C.I. = [8.19, 19.39]

Table 15: Sample based computation (Thirteenth Sample)

Group I: n1 = 6; sample values (Vi, W1, W2): (V2,17,4), (V1,16,9), (V3,18,2), (V5,15,7), (V6,13,7), (V0,15,6); mean w1 = 15.66, s² w1 = 3.06; mean w2 = 5.83, s² w2 = 6.15; r1 = 2.68; est.(S1*)² = 39.85

Group II: n2 = 4; sample values (Vi, W1, W2): (V11,18,2), (V10,19,3), (V9,18,7), (V8,12,7); mean w1 = 16.75, s² w1 = 10.24; mean w2 = 4.75, s² w2 = 6.91; r2 = 3.52; est.(S2*)² = 172.80

M = 15.71; est.(MSE) = 11.10; 95% C.I. = [9.23, 22.19]

Table 16: Sample based computation (Fourteenth Sample)

Group I: n1 = 6; sample values (Vi, W1, W2): (V5,15,7), (V4,12,6), (V6,13,7), (V3,18,2), (V2,17,4), (V1,16,9); mean w1 = 15.16, s² w1 = 5.36; mean w2 = 5.83, s² w2 = 6.15; r1 = 2.60; est.(S1*)² = 56.98

Group II: n2 = 4; sample values (Vi, W1, W2): (V10,19,3), (V11,18,2), (V7,15,3), (V8,12,7); mean w1 = 16.00, s² w1 = 10.0; mean w2 = 3.75, s² w2 = 4.91; r2 = 4.26; est.(S2*)² = 229.74

M = 17.32; est.(MSE) = 14.85; 95% C.I. = [9.82, 24.82]

Table 17: Sample based computation (Fifteenth Sample)

Group I: n1 = 6; sample values (Vi, W1, W2): (V1,16,9), (V0,15,6), (V4,12,6), (V5,15,7), (V2,17,4), (V3,18,2); mean w1 = 15.5, s² w1 = 4.3; mean w2 = 5.66, s² w2 = 5.86; r1 = 2.73; est.(S1*)² = 49.43

Group II: n2 = 4; sample values (Vi, W1, W2): (V11,18,2), (V10,19,3), (V9,18,7), (V8,12,7); mean w1 = 16.75, s² w1 = 10.24; mean w2 = 4.75, s² w2 = 6.91; r2 = 3.52; est.(S2*)² = 172.8

M = 15.85; est.(MSE) = 11.29; 95% C.I. = [9.27, 22.43]

Table 18: Sample based computation (Sixteenth Sample)

Group I: n1 = 6; sample values (Vi, W1, W2): (V6,13,7), (V5,15,7), (V4,12,6), (V0,15,6), (V2,17,4), (V3,18,2); mean w1 = 15.0, s² w1 = 5.2; mean w2 = 5.33, s² w2 = 3.85; r1 = 2.81; est.(S1*)² = 66.45

Group II: n2 = 4; sample values (Vi, W1, W2): (V11,18,2), (V10,19,3), (V9,18,7), (V7,15,3); mean w1 = 17.50, s² w1 = 3.0; mean w2 = 3.75, s² w2 = 4.91; r2 = 4.66; est.(S2*)² = 70.39

M = 18.87; est.(MSE) = 5.46; 95% C.I. = [14.31, 23.43]

Table 19: Sample based computation (Seventeenth Sample)

Group I: n1 = 6; sample values (Vi, W1, W2): (V5,15,7), (V4,12,6), (V6,13,7), (V2,17,4), (V1,16,9), (V0,15,6); mean w1 = 14.66, s² w1 = 3.46; mean w2 = 6.5, s² w2 = 2.7; r1 = 2.25; est.(S1*)² = 26.08

Group II: n2 = 4; sample values (Vi, W1, W2): (V11,18,2), (V10,19,3), (V7,15,3), (V8,12,7); mean w1 = 16.00, s² w1 = 10.0; mean w2 = 3.75, s² w2 = 4.91; r2 = 4.26; est.(S2*)² = 242.52

M = 16.42; est.(MSE) = 15.04; 95% C.I. = [8.84, 24]

Table 20: Sample based computation (Eighteenth Sample)

Group I: n1 = 6; sample values (Vi, W1, W2): (V4,12,6), (V3,18,2), (V2,17,4), (V1,16,9), (V0,15,6), (V5,15,7); mean w1 = 15.5, s² w1 = 4.3; mean w2 = 5.66, s² w2 = 5.86; r1 = 2.73; est.(S1*)² = -0.476

Group II: n2 = 4; sample values (Vi, W1, W2): (V11,18,2), (V10,19,3), (V9,18,7), (V8,12,7); mean w1 = 16.75, s² w1 = 10.24; mean w2 = 4.75, s² w2 = 6.91; r2 = 3.52; est.(S2*)² = 172.80

M = 15.85; est.(MSE) = 10.37; 95% C.I. = [9.35, 22.35]

Table 21: Sample based computation (Nineteenth Sample)

Group I: n1 = 6; sample values (Vi, W1, W2): (V4,12,6), (V3,18,2), (V2,17,4), (V1,16,9), (V0,15,6), (V6,13,7); mean w1 = 15.16, s² w1 = 5.36; mean w2 = 5.66, s² w2 = 5.86; r1 = 2.67; est.(S1*)² = 60.92

Group II: n2 = 4; sample values (Vi, W1, W2): (V7,15,3), (V10,19,3), (V9,18,7), (V8,12,7); mean w1 = 16.00, s² w1 = 10.0; mean w2 = 5.00, s² w2 = 5.33; r2 = 3.2; est.(S2*)² = 129.42

M = 14.89; est.(MSE) = 8.90; 95% C.I. = [9.05, 20.73]

Table 22: Sample based computation (Twentieth Sample)

Group I: n1 = 6; sample values (Vi, W1, W2): (V2,17,4), (V1,16,9), (V4,12,6), (V5,15,7), (V6,13,7), (V0,15,6); mean w1 = 14.66, s² w1 = 3.46; mean w2 = 6.5, s² w2 = 2.7; r1 = 2.25; est.(S1*)² = 22.96

Group II: n2 = 4; sample values (Vi, W1, W2): (V9,18,7), (V7,15,3), (V10,19,3), (V11,18,2); mean w1 = 17.50, s² w1 = 3.0; mean w2 = 3.75, s² w2 = 4.9; r2 = 4.66; est.(S2*)² = 63.47

M = 17.41; est.(MSE) = 4.23; 95% C.I. = [13.4, 21.42]
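Tables 3 to 22 report twenty of the K = 200 repetitions called for in Steps I to III of the simulation procedure. A minimal Python sketch of those steps, assuming (as in the tables) that each repetition draws n1 = 6 distinct vertices of kernel C0 and n2 = 4 distinct vertices of kernel C7 without replacement, and taking N_i, Z_i and the known W2 group means from Table 2. The interval uses the estimated MSE; `VERTICES`, `FRAMES`, `one_interval` and `simulate` are illustrative names, and the seed is arbitrary.

```python
import random
from statistics import mean, variance

# Vertex weights (W1 = age, W2 = hours) as listed for Figure 3
VERTICES = {0: (15, 6), 1: (16, 9), 2: (17, 4), 3: (18, 2), 4: (12, 6), 5: (15, 7),
            6: (13, 7), 7: (15, 3), 8: (12, 7), 9: (18, 7), 10: (19, 3), 11: (18, 2)}
FRAMES = [[0, 1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11]]  # vertices of kernels C0 and C7
SIZES = [6, 4]             # n1, n2 used in Tables 3-22
N = [16, 20]               # N1, N2 from Table 2
Z = [0.44, 0.55]           # Z1, Z2 from Table 2
W2_KNOWN = [5.93, 4.5]     # known group means of W2 from Table 2

def one_interval(rng):
    """Steps I-II: draw one kernel sample and return the 95% interval limits."""
    M = mse = 0.0
    for frame, n_i, N_i, Z_i, W2_i in zip(FRAMES, SIZES, N, Z, W2_KNOWN):
        pairs = [VERTICES[v] for v in rng.sample(frame, n_i)]
        w1 = [a for a, _ in pairs]
        w2 = [b for _, b in pairs]
        m1, m2 = mean(w1), mean(w2)
        r = m1 / m2                                            # r_i
        cov = sum((a - m1) * (b - m2) for a, b in pairs) / (n_i - 1)
        s_star2 = variance(w1) + r * r * variance(w2) - 2 * r * cov
        M += Z_i * r * W2_i
        mse += Z_i * Z_i * (1 / n_i - 1 / N_i) * s_star2
    half = 1.96 * max(mse, 0.0) ** 0.5
    return M - half, M + half

def simulate(K=200, seed=1):
    """Step III: repeat Steps I-II K times with a reproducible generator."""
    rng = random.Random(seed)
    return [one_interval(rng) for _ in range(K)]

limits = simulate()
```

The list `limits` holds the K (lower, upper) pairs whose cumulative frequencies Step IV tabulates.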

Table 23: For confidence interval calculations

For the lower limit of the confidence interval:

Class interval | Probability over 200 samples | LTT | MTT
Below 8.0 | 0.08 | 0.08 | 1.00
8.0-9.0 | 0.38 | 0.46 | 0.92
9.0-10.0 | 0.29 | 0.75 | 0.54
10.0-11.0 | 0.14 | 0.89 | 0.25
11.0-12.0 | 0.04 | 0.93 | 0.11
12.0-13.0 | 0.05 | 0.98 | 0.07
13.0-14.0 | 0.01 | 0.99 | 0.02
Above 14.0 | 0.01 | 1.00 | 0.01

For the upper limit of the confidence interval:

Class interval | Probability over 200 samples | LTT | MTT
Below 17.0 | 0.05 | 0.05 | 1.00
17.0-18.0 | 0.13 | 0.18 | 0.95
18.0-19.0 | 0.16 | 0.34 | 0.82
19.0-20.0 | 0.18 | 0.52 | 0.66
20.0-21.0 | 0.23 | 0.75 | 0.48
21.0-22.0 | 0.14 | 0.89 | 0.25
22.0-23.0 | 0.09 | 0.98 | 0.11
23.0-24.0 | 0.01 | 0.99 | 0.02
Above 24.0 | 0.01 | 1.00 | 0.01

LTT: less than type; MTT: more than type.

Probability = $f_i / \sum f_i$, where $f_i$ is the frequency of the ith class interval and $\sum f_i$ is the total frequency; P[A] denotes the probability of event A.
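Steps IV and V locate each simulated limit where the less-than-type and more-than-type cumulative curves cross. Because MTT = 1 - LTT at every class boundary, the crossing is where LTT = 0.5, i.e. the grouped median of the simulated limits. A minimal Python sketch, assuming the crossing is found by linear interpolation between class boundaries (the paper reads it from a plot) and taking the class probabilities of Table 23; `ogive_crossing` is an illustrative name:

```python
def ogive_crossing(boundaries, freqs):
    """Point where the less-than (LTT) and more-than (MTT) ogives cross.

    boundaries -- class boundaries b_1 < ... < b_m
    freqs      -- m + 1 class frequencies: below b_1, [b_1, b_2), ..., at or above b_m
    """
    if len(freqs) != len(boundaries) + 1:
        raise ValueError("need exactly one more frequency than boundaries")
    total = sum(freqs)
    ltt, cum = [], 0.0
    for f in freqs[:-1]:
        cum += f / total
        ltt.append(cum)                    # share of limits below b_k
    mtt = [1.0 - p for p in ltt]           # share of limits at or above b_k
    for k in range(1, len(boundaries)):
        d0, d1 = ltt[k - 1] - mtt[k - 1], ltt[k] - mtt[k]
        if d0 <= 0.0 <= d1:
            # Linear interpolation of the crossing between b_{k-1} and b_k
            t = 0.0 if d1 == d0 else -d0 / (d1 - d0)
            return boundaries[k - 1] + t * (boundaries[k] - boundaries[k - 1])
    raise ValueError("the ogives do not cross inside the tabulated classes")

# Class probabilities over 200 samples, from Table 23
lower_limit = ogive_crossing([8, 9, 10, 11, 12, 13, 14],
                             [0.08, 0.38, 0.29, 0.14, 0.04, 0.05, 0.01, 0.01])
upper_limit = ogive_crossing([17, 18, 19, 20, 21, 22, 23, 24],
                             [0.05, 0.13, 0.16, 0.18, 0.23, 0.14, 0.09, 0.01, 0.01])
```

With raw (ungrouped) simulated limits, `statistics.median` gives the same crossing point without any interpolation.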

Figure 5: Intersecting point of LTT and MTT for the upper limit (cumulative probability against the estimated upper limit of the confidence interval; the curves intersect at b = 4.9).

a = 2.2, b = 4.9

Confidence interval: P[a < 2.3 < b] = 0.95, where P[A] is the probability of event A. Other computations: (S1*)² = 17.64, (S2*)² = 39.08, and MSE(M) = 4.70.

X. Conclusion

In this paper, a graphical population structure has been considered, and closed communities have been detected using the rules of kernel formation. Closeness is based on the criterion of clique formation. To estimate the unknown population parameter (average hours used), a scheme named the Kernel Sampling estimation method is used, and 95% confidence intervals have been computed. The 95% confidence intervals were found to contain the true values. The simulation procedure suggested here provides a well-predicted estimated interval. This contribution opens avenues for combining community detection with parameter estimation.

References:

[1]. Dongsheng Duan, Yuhua Li, et al., "Community Mining on Dynamic Weighted Directed Graphs", CNIKM '09, November 6, 2009, Hong Kong, China.

[2]. C. Pizzuti, "Community detection in social networks with genetic algorithms", Annual Conference on Genetic and Evolutionary Computation, pp. 1137-1138, 2008.

[3]. Nan Du, B. Wu, Xin Pei, et al., "Community detection in large-scale social networks", SNA-KDD, pp. 16-25, August 12, 2007, California, USA.

[4]. M. E. J. Newman, "Fast algorithm for detecting community structure in networks", Phys Rev E Stat Nonlin Soft Matter Phys, 2004.

[5]. E. Ferrara, "A large-scale community structure analysis in Facebook", EPJ Data Sci, 1(9), 2012.

[6]. S. Fortunato, "Community detection in graphs", Phys Rep, 2010; 486(3-5): 75-174.

[7]. W. Deitrick, B. Valyou, et al., "Enhancing sentiment analysis on Twitter using community detection", Communications and Network, 2013; 5(3): 192-197.

[8]. J. Leskovec, K. J. Lang, M. W. Mahoney, "Empirical comparison of algorithms for network community detection", In: International Conference on World Wide Web (WWW), 2010, pp. 631-640.

[9]. M. Plantie, M. Crampes, "Survey on social community detection", In: Social Media Retrieval, Computer Communications and Networks, Springer, 2013, pp. 65-85.

[10]. Uthayasankar Sivarajah, et al., "Critical analysis of Big Data challenges and analytical methods", Journal of Business Research, 2017, pp. 263-286.

[11]. P. V. Sukhatme, B. V. Sukhatme, et al., "Sampling Theory of Surveys with Applications", Iowa State University Press and Indian Society of Agricultural Statistics (New Delhi), 1984.

[12]. W. G. Cochran (2005), "Sampling Techniques", John Wiley and Sons, New York.
