Research article in the field "Philosophy, Ethics, Religious Studies": What's in My Profile: VKontakte Data as a Tool for Studying the Interests of Modern Teenagers

CC BY-NC-ND
Keywords
social networking services / VKontakte / adolescence / interests / machine learning

Abstract of the research article in philosophy, ethics, and religious studies. Authors: Katerina Polivanova, Ivan Smirnov

Children's interests play a key role in their psychological development. However, research in this field faces serious methodological problems, as it has traditionally relied on questionnaire surveys that cannot adequately describe the diverse and dynamic world of interests of a developing person. The article suggests using information on the VKontakte communities followed by teenagers to explore their interests. Apart from being comprehensive, VKontakte data is, unlike questionnaire answers, also uncensored. The method's potential is demonstrated through the example of a Moscow school with 674 students following 20,203 different VKontakte communities. The analysis reveals that teenagers' interests vary depending on their gender, age, and academic performance. The degree of such variance is demonstrated on an extended dataset on the interests of 290,182 VKontakte users. It turns out that the communities followed by teenagers predict with high accuracy not only their gender (97%) and age (98%) but also the performance of the schools they attend (83%). The findings point to the heterogeneity of age-related behavior patterns, in particular to their correlation with gender and academic achievements. Acknowledgement of the heterogeneity of interests and the diversity of age-related behavior patterns creates conditions for the further development of student-centered education, in the absence of which education is becoming more and more alienated from real life, ignoring the interests of real people.


What's in My Profile: VKontakte Data as a Tool for Studying the Interests of Modern Teenagers

Katerina Polivanova, Ivan Smirnov

Received in December 2016

Katerina Polivanova
Doctor of Sciences in Psychology, Professor, Director of the Center for Modern Childhood Research, Institute of Education, National Research University Higher School of Economics. Email: kpolivanova@hse.ru

Ivan Smirnov
Junior Researcher at the Institute of Education, National Research University Higher School of Economics. Email: ibsmirnov@hse.ru

Address: 20 Myasnitskaya St, 101000 Moscow, Russian Federation.


DOI: 10.17323/1814-9545-2017-2-134-152

The publication was prepared as part of research project No. 16-06-00916 supported by the Russian Foundation for the Humanities.

Changes in interests and their development are the fundamental nature of adolescence, according to Lev Vygotsky. The chapter Development of Interests in the Age of Transition of his Pedology of the Adolescent begins with an often quoted sentence: "The problem of interests in the age of transition is the key to the whole problem of mental development of the adolescent." [Vygotsky 1984a: 6] And further:

"Research in this area should begin with recognizing that not only skills and psychological functions (attention, memory, thinking, etc.) of the child develop; mental development is based first of all on the evolution of children's behavior and interests, i. e. changes to the structure of their behavioral orientation." (italics added) [Ibid.] Vygotsky argued that Russian research tradition defined age as a social situation of development, as attitude toward "reality, social reality first of all." [Vygotsky 1984b:258] Therefore, attitude (at least as applied to adolescence) and interest are at least close notions, if not synonyms. In the period of adolescence, attitude assumes the shape of interest. Hence, adolescence can be described through the structure of interests.

Another basic assumption of Vygotsky, important for this study, is the uniformity of age-graded behavior patterns. Age is described as a universal characteristic by Vygotsky as well as other classical age-periodization researchers: Piaget, Freud, and Erikson. In fact, developmental psychology as such is premised on the idea, or metaphor, of the universal "ladder" of age-bound stages. Any deviation from this universal trajectory of growing up is interpreted as acceleration or a delay in development, but never as diversity. Erica Burman is critical of this viewpoint: "A normal child—an ideal type constructed from the results of empirical research of different age periods—thus becomes a myth, or fiction." [Burman 2006: 30] As Burman believes, the concept of "normal" behavioral patterns dominates real-life manifestations of individuality.

By combining the two ideas—that of interest as the fundamental nature of age and that of uniform age-graded behavior patterns—we raise the following questions: to what extent are the interests of one age cohort, namely adolescents, uniform and with uniform objects, to what extent can these objects be identified as uniform, and if they cannot be at all, does it indicate a deviation from normal development or just various types of development?

Interest research methods

Interests and related constructs (needs, motives, inclinations, and orientations) have traditionally been studied using the survey method, most often fill-out questionnaires [Lubovsky 2003]. In preparing such questionnaires, psychologists and social researchers always (except when using projective tests) predetermine a standard, "normal" pattern of the phenomenon, function, or process, i.e. the object of research. This pattern is operationalized, and then criteria are set to determine the compliance of real values with the standard (average) ones (usually at the validation stage). When individual tests are analyzed, researchers assess compliance of empirically observed values with the standard ones. However, on the one hand, questions put to respondents may be irrelevant for an individual, who may be indifferent to the issue, which will produce random and useless answers [Bertrand, Mullainathan 2001]. On the other hand, researchers risk missing out on something that really matters to an individual. This problem becomes especially acute when it comes to research on interests as a significant private domain of an individual's life.

An alternative solution is offered by projective methods [Sokolova 1980], which are based on Freud's idea of projecting one's feelings upon exterior objects. The methodology consists in presenting stimuli that allow infinite interpretations; interpretations provided by the respondent are treated as induced by the individual's unique system of inner experiences. Analysis of adolescent behavior on social networking websites can be regarded as an investigation of an individual's unconscious projections. The very choice of content (from a virtually infinite set of options) is a projection of personal interest, which becomes manifested to both the individual and the researcher, if the latter succeeds in explicating these choices.

Regardless of the methods selected (qualitative or quantitative, objective or projective, etc.), there is another problem that significantly distorts the potential results of adolescence research: censoredness (social acceptability) of answers. As soon as research involves an issue that really matters to a respondent, there is a high risk of evasion and disguise, which even the respondent will not always recognize. This is especially important when applied to adolescents, because one of the characteristics of this age period is the emergence of one's own "zone", deliberately concealed from external observers [Osorina 2011].

Thus, theoretical efforts are focused on the key characteristic of interest, which is, however, inaccessible to observation or empirical research. As a result, there is a critical need for new methods that can reveal the interests of adolescents per se.

We believe that the investigation of adolescent online behavior can be a step in the right direction towards the development of such methods. Social media has become a natural habitat for modern adolescents, who use mobile devices all the time [Koroleva 2016a]. When a school student creates a profile, they add friends, post content, join communities, etc.—independently, at their own discretion, choosing from a virtually infinite number of options. This study proceeds from the assumption that adolescents' profiles represent maps, of sorts, of users' interests which provide the unique opportunity to study them objectively.

VKontakte

Regularly checking their inbox and scrolling through the newsfeeds of social media communities has become an everyday ritual for adolescents [Koroleva 2016b]. VKontakte (VK) appears to be the network of choice among school students: 86% of the respondents reported having their primary social media accounts on this website [Koroleva 2016a]. The VKontakte newsfeed is made up of the content posted by the user's friends and the communities that he or she follows. Social media communities afford a unique opportunity to investigate the interests of adolescents, which cannot be revealed by any traditional empirical study [Lewis et al. 2008]. There are over 26 million communities on VK, the most popular of them being followed by hundreds of thousands of users. Communities may be dedicated to a popular computer game, movie, book, actress, performer, politician, etc. Most organizations and brands, from Mariinsky Theater to tattoo studios, have a VK community of their own. Extremely diverse interests and hobbies, from embroidery to quantum physics, are represented on VK. Nearly every school has a community called "Overheard in ...", where students share gossip. Communities can also be devoted to sexual relationships, one-sided love, or a specific lifestyle or mood (e.g. "Warm blanket, cocoa and fireplace").

As we can see, social media content can be regarded as stimuli for users to project their interests, and communities that users follow as maps of their personal interests.

Research data and methods

A dataset on 674 students from a Moscow school was used, including their academic performance (GPA for the last academic year), gender, grade (5-10), and the VK communities that they follow. The overall number of communities followed by at least one student was 20,203. After the communities followed by no more than ten students were removed, 883 were left for analysis.
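The filtering step described above can be sketched as follows. This is a minimal illustration and not the authors' software: the function name `filter_communities`, the dictionary layout, and the `min_followers` parameter are assumptions introduced here. The default of 11 mirrors the rule that communities followed by no more than ten students were dropped.

```python
from collections import Counter


def filter_communities(follows, min_followers=11):
    """Keep only communities followed by at least `min_followers` students.

    `follows` maps a (hypothetical) student key to the list of community
    names that student follows.
    """
    # Count each community once per student, even if it is listed twice.
    counts = Counter(c for communities in follows.values() for c in set(communities))
    kept = {c for c, n in counts.items() if n >= min_followers}
    filtered = {s: sorted(set(cs) & kept) for s, cs in follows.items()}
    return filtered, kept
```

Called with a lower threshold on toy data, for example `filter_communities({"s1": ["a", "b"], "s2": ["a"]}, min_followers=2)`, the function keeps only community `"a"`.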

A special software application making requests to an API (application programming interface) was used to collect data. The software is launched by a school representative who inputs the list of students. Using this list, the app matches students to their VK profiles. To enhance coverage, the app scans not only through users who disclose their school information but also through their VK friends. In addition, the app uses an extensive set of name variations (Anna, Anya, Anechka, Anyutka, etc.). Once all the profiles have been identified and the information has been uploaded, the app removes all the names and other VK identifiers. The anonymized data is then analyzed further. The effectiveness of this approach and the absence of significant bias were demonstrated by Smirnov, Sivak & Kozmina [2016].
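The matching and anonymization steps might look roughly like the sketch below. It is an assumption-laden illustration rather than the actual application: the variant table `NAME_VARIANTS` contains only the four forms quoted in the text, and the record keys `name` and `vk_id` are hypothetical.

```python
import secrets

# Hypothetical variant table; a real one would cover many more names.
NAME_VARIANTS = {
    "anna": {"anna", "anya", "anechka", "anyutka"},
}


def same_first_name(roster_name, profile_name):
    """Return True if two first names are equal or known variants of one name."""
    a, b = roster_name.strip().lower(), profile_name.strip().lower()
    if a == b:
        return True
    return any(a in forms and b in forms for forms in NAME_VARIANTS.values())


def anonymize(records):
    """Drop names and profile identifiers, replacing them with random tokens."""
    cleaned = []
    for record in records:
        row = {k: v for k, v in record.items() if k not in ("name", "vk_id")}
        row["token"] = secrets.token_hex(8)  # random, not derived from the identity
        cleaned.append(row)
    return cleaned
```

Using random tokens rather than hashes of the original identifiers is one possible design choice: a random token cannot be reversed by re-hashing a list of known profile IDs.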

In order to assess the diversity of interests among students of the same school, we produced maps of their interests. First, the ratio of female to male followers, the mean GPA, and the mean age of followers were determined for each community. Next, community names were plotted on a coordinate plane according to the values obtained. The X-axis shows the age of students (expressed as their grade), and the Y-axis indicates their academic performance.
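The per-community values behind such a map can be computed along the following lines. The sketch assumes a hypothetical record layout (keys `gender`, `grade`, `gpa`, `communities`) and reports the share of girls rather than a female-to-male ratio, which avoids division by zero for all-girl communities.

```python
from collections import defaultdict
from statistics import mean


def community_coordinates(students):
    """Compute share of girls, mean grade, and mean GPA for every community."""
    members = defaultdict(list)
    for student in students:
        for community in set(student["communities"]):
            members[community].append(student)

    coords = {}
    for community, followers in members.items():
        coords[community] = {
            "share_girls": sum(f["gender"] == "F" for f in followers) / len(followers),
            "mean_grade": mean(f["grade"] for f in followers),  # X-axis of the map
            "mean_gpa": mean(f["gpa"] for f in followers),      # Y-axis of the map
        }
    return coords
```

Communities with `share_girls` above 0.5 would go on the first map and the rest on the second, with `mean_grade` and `mean_gpa` used as plotting coordinates.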

Interest maps

Figure 1 presents communities where over 50% of the followers are girls, and Figure 2 presents those where over 50% are boys.

Figure 1. Map of students' interests. Communities with the percentage of girls of over 50%

[Figure not reproduced: community names plotted by followers' grade (X-axis, "Grade") and mean GPA (Y-axis, "GPA").]

Figure 2. Map of students' interests. Communities with the percentage of boys of over 50%

[Figure not reproduced: community names plotted by followers' grade (X-axis, "Grade") and mean GPA (Y-axis, "GPA").]

This qualitative data alone is enough for some preliminary discussion and judgments. The communities followed mostly by boys form a diagonal that crosses the map from the upper left corner to the bottom right one, i. e. from high-performing sixth-graders to low-performing ninth-graders. The girls' map has a less defined structure in this regard. Such distribution of the communities has to do with the natural trend of academic performance declining from grade 5 to grade 10, boys normally showing a steeper drop.

The maps demonstrate quite clearly that the choice of communities changes from middle to high school. Girls join communities on average later than boys (fewer communities in the left part). Low-performing girls (bottom right corner) follow such communities as "Love Horoscope", "Unorthodox Horoscope", or "Holy cow, what a tattoo!". Their better-performing female peers join learning-related communities, like "USE/BSE". An even higher level of academic performance is observed among girls following "40 KG", the community devoted to healthy eating, fitness and sports. Notably, girls begin to join "Not children anymore", a community about sex that uses rather coarse language, as early as the sixth grade; fewer boys of the same age are observed among the followers. "Fun Time - Male Humor" only appears in grade 8 among low performers. Higher-performing boys are also members of USE and BSE communities, similar to their female peers.

Gender- and performance-based differentiation

Tables 1 and 2 present "girlish" and "boyish" communities with the highest percentages of the dominant gender, while Tables 3 and 4 contain communities with the highest and lowest GPA among their followers, respectively. The communities singled out using this technique were studied more closely (at least the 15 most recent posts). They were grouped into categories describing typical content of their posts. Some communities could not be assigned any specific category due to the high degree of content heterogeneity, so they were classified as "Other". All of these communities largely post images and videos. Text posts are short and usually serve as descriptions for images and videos; purely text posts are extremely rare.

The most popular communities include those dedicated to romantic relationships (including sex), computer games, football, and various "ideas", whether it be smart inventions or tips for unusual uses of everyday items. Nearly all the posts, except for purely romantic ones, contain jokes and "gags". However, there is also a dedicated "Humor" category.

The texts often feature obscene language (sometimes abbreviated or with one letter replaced with a symbol); meanwhile, the romance-related communities use words like "cuteness", "hugs", etc. as well as "cute" images a lot.

Table 1. Communities with girls accounting for over 80% of the followers

Girls, %  Community                   Category
100       Finishing School            Girlish
100       Best Girl Friends           Girlish
100       Good to be a girl           Girlish
97        Girls will grasp it         Girlish
96        Caramel                     Girlish, dreaming about family, fidelity, etc.
95        40 KG                       Weight loss, healthy eating, sports
95        Girls like it               Girlish
93        Family won't get it         Girlish, growing up
91        Beauty School               Beauty
90        #SWAG                       Girlish, lovey-dovey, humor
87        Creative IDEAS              Needlework, etc.
85        I Want...                   Other
84        Grow Up                     Love, dreams, platitudes
83        Best poems of great poets   Poems, mostly about love
82        Do it yourself!             Needlework
82        Bad Girl                    Girlish, inclined to uniqueness and bitchiness
82        Idea Factory                Needlework, etc.
82        Art Ideas                   Needlework, etc.
80        Not children anymore        Love
80        suffocating.                Ideal love, uniqueness

Table 2. Communities with boys accounting for over 80% of the followers

Boys, %   Community                                     Category
100       CS:GO                                         Computer games
100       Windows Blog                                  Computers
100       GameFan                                       Computer games
96        Videogame MythBusters                         Computer games
96        Football Vine Video                           Football
95        Minecraft | Майнкрафт                         Computer games
93        Champions Cup | FOOTBALL                      Football
92        IGM                                           Computer games
91        Football Memes                                Football
91        Success                                       Other
91        Real Football                                 Football
90        FOOTBALL MEMES                                Football
88        Academy of Decent Guys                        Boyish
86        Our Football                                  Football
82        Book of Records (lots of interesting stuff)   Interesting facts
80        EVIL NIGGER                                   Racist jokes
80        HORROR MOVIES                                 Horror films

Boys' interests focus around football, computer games and various sorts of humor. Communities followed by boys often use obscene language and slang and post a lot of photos of recognized sex symbols. Irrespective of their content and major topics, such communities contain lots of joke-like videos and texts with surprising endings as well as unexpected situations, like a video of a multiple vehicle collision on a slippery road ("Slippery Canada").

The maps of the communities followed by girls look totally different. Girls are interested in "girlish stuff", romance first of all. There are a lot of posts "about him": he must understand, he must accept, he will come back for sure and regret the break-up even years later. Abundant posts describe break-ups, loneliness and abandonment. Presumably, the narrative knot is what excites interest, while conflict-free, safe situations have no such drama in them. A lot of communities and posts touch upon relationships with a close girl friend, which indicates the phase of development, first described by Freud, where adolescents experience a strong eroticized feeling for a same-sex friend. The "Bad Girl" community also revolves around girlish issues, yet with elements of causticity and cynicism, which are implied by the name: the character is presented as smart, strong and little bothered by what others think of her. There is a fairly large number of needlework-related communities, which can be associated, at a stretch though, with the "Idea Factory", popular among the boys. The language of the girls' communities contains fewer vulgarisms and obscenities, although it would be too soon to say anything particular about the ratio of foul language used by boys and girls at this stage. Meanwhile, posts in the girls' communities feature numerous "pet" and hypocoristic words, such as "cuteness" or "hugs".

While communities followed by boys can be structurally associated with strong action, the girls' ones are largely about emotional experiences.

Table 3. Communities with the mean GPA of over 4.1 among followers

GPA     Community                                 Category
4.30*   what your mom googles                     Humor, self-identification
4.28*   Legendary stuff!                          Ideas
4.22*   Do it yourself!                           Ideas
4.20*   EeOneGuy                                  Other
4.16*   Good to be a girl                         Girlish
4.15*   Interesting Facts                         Interesting facts
4.15*   Rare Snaps                                Photography
4.14*   Why wasn't it me who came up with this?   Ideas
4.13*   Book of Records                           Interesting facts
4.13*   COOL FACTS!                               Interesting facts
4.12    Ideas for Life                            Ideas
4.12    Family won't get it                       Humor, self-identification
4.12    Art Ideas                                 Ideas
4.12    WAC                                       Humor
4.12    Idea Factory                              Ideas
4.12    PE TEACHER                                Other
4.11    ART BOX                                   Needlework, etc.
4.10    Best poems of great poets                 Poetry

Table 4. Communities with the mean GPA of under 3.8 among followers

GPA     Community                   Category
3.68*   Love Horoscope              Horoscopes
3.68*   Unorthodox Horoscope        Horoscopes
3.70*   Holy cow, what a tattoo!    Tattoos
3.71*   HORROR MOVIES               Films
3.74*   I Love You                  Romance
3.75*   Sarcasm                     Humor
3.76*   Fun Time - Male Humor       Humor
3.76*   Minecraft | Майнкрафт       Computer games
3.77*   Cinemania                   Films
3.77*   Vine Video                  Video
3.77*   Empire of Cinema            Films
3.77*   TTLFCKP                     Humor
3.78*   Popular Music               Music
3.78*   Five Best Movies            Films
3.79    Evil Corporation            Humor
3.79    Selfish                     Other
3.79    Orlyonok                    Humor
3.79    FCKDP                       Humor
3.79    Laugh Corporation           Humor

Note: * denotes a value that is significantly lower than average at significance level 0.05.

The ranking of "A students' communities" is topped by "what your mom googles", a very peculiar community posting imaginary search engine queries of a mother that have the structure "OK Google, why does my daughter ...?", e.g. "OK Google, why is my daughter waiting for a letter from some Hogwarts?" That is to say, adolescents describe the imaginary concerns of a mother (probably of both parents) about what is going on with her daughter. This is a rather complex cognitive mechanism: an adolescent figures out what in their behavior, while obvious and natural for them, could seem strange to their parents. The obviousness for some (adolescents themselves) and incomprehensibility for others (parents) is what makes every post funny (to a greater or lesser extent). The same community mentions the social networking service Odnoklassniki ("Classmates") and mail.ru mailboxes as instances of something obsolete. However, this humor is not mean or blunt. The "Family won't get it" community could also be partly classified into this category, but this one is rather about self-descriptions that adults will not understand, as the name suggests; consider the "personal fable", a characteristic introduced by David Elkind [Alberts, Elkind, Ginsberg 2007].

Besides, the communities followed by "A students" include the "Idea Factory", which offers amusing or serious-minded unorthodox ways of using everyday items, weird combinations of them, etc., communities dedicated to humor, films and music, and "Best poems of great poets" (mostly about love). The subject of sex is also represented in nearly every community, but most often within other subjects (e. g. when parents come home to find a young couple having sex).

The communities followed by lower-performing students are a totally different selection. They include horoscopes ("love" and "unorthodox" ones), "Holy cow, what a tattoo!" with numerous photos of tattoos and sometimes funny tattoo pictures, selections of films, videos, music, and various types of humor, sometimes very rude.

Communities followed by both types of users—high- and low-performers—use an extensive variety of language, including vulgarisms and obscenities.

Adolescent communities almost never touch upon school issues. School, learning, and the content of school subjects are thus left beyond the domains of children's interests (except for tests and examinations, which are obviously given attention for purely external reasons, out of necessity). Low-performing students seem to show no interest even in exams.

As can be seen, a qualitative comparison of communities followed by boys and girls, or "A students" and "F students", reveals differences in the interests of these adolescent categories. Differences can also be observed between different age cohorts, but their detailed description requires further research on the data, content analysis probably being the most appropriate method of comparison. This was not an objective at this stage of the research due to the large number of communities (over 800).

Level of differentiation of interests

Interest maps allow qualitative assessment of the differentiation of adolescents' interests depending on their gender, age and academic performance. However, the significance of the revealed differences remains an open question, and using traditional methods to answer it is a challenging task. Let us consider an example. Suppose we would like to assess gender differences in academic performance. If we used the traditional approach, gender would be the independent variable and academic performance the dependent one. A regression model would be built to predict academic performance based on student gender and a number of control variables. The value of the gender variable coefficient and its significance level would indicate the degree of association between academic performance and gender. Despite its widespread use, this approach has been strongly criticized [Berk 2004] since the classical study of Edward Leamer [Leamer 1983], and it is far from being the only one possible [Breiman 2001]. The traditional approach is clearly inappropriate for answering the question raised here: users' interests form a variable with an enormous number of dimensions (hundreds of thousands), which cannot be used as the dependent variable in a regression model.

Predictive power of interests

We maintain that the level of differentiation of interests can be evaluated through their predictive power, namely through the accuracy of the model predicting a specific cohort (boys/girls, middle/high school students, etc.) based on the interests of adolescents. In other words, if a high-accuracy gender-prediction model can be built, it will mean that interests are gender-differentiated. As with statistical hypotheses, the converse is not always true: if no such model can be created, it does not mean that there is no differentiation at all.

Let us illustrate this thought with an example. Take gender and physical appearance: obviously, it is gender that determines appearance, not vice versa. However, it would be unreasonable to try to predict appearance based on gender, as the "gender" variable can only take two values while "appearance" can take billions. Yet, the correlation between gender and appearance is beyond argument—exactly because gender can be predicted based on appearance with high accuracy. The same can be true for interests: if interests can predict gender as accurately as appearance can, they will be at least as gender-specific as appearance is.

Building a prediction model requires a much larger dataset than the school one, so information on all VK users born between 1993 and 2002 who specified the St. Petersburg school they attend or had attended was additionally collected. Information on each user included gender, year of birth, the mean USE score among the graduates of their school over the last five years, and the list of communities they followed. All in all, this city dataset contained information on 290,182 users, and the overall number of distinct communities followed by at least one user was 886,191. Because VK data could not be matched with real-life indicators in this case, the reliability of the dataset was enhanced by removing the users who had no VK friends from the school they claimed to attend or to have attended.

Model building

That being said, the choice of a specific prediction model is not critical. We prefer the approach proposed by Michal Kosinski and his coauthors [Kosinski, Stillwell, Graepel 2013]: first, because this is the best-known study that used data similar to ours, which means that we will be able to compare our results with international findings; second, because Kosinski's approach is straightforward and does not require advanced machine learning methods to be understood. A similar approach to big data analysis is used in social research [Eagle, Pentland 2009].

Users following fewer than 50 communities and communities followed by fewer than 50 users were removed from the city dataset. The resulting dataset covered 116,912 users and 40,774 communities. Membership in each community was coded using binary variables $a_j$ ($j = 1, \ldots, 40{,}774$), where $a_j = 0$ if the user does not follow the $j$th community and $a_j = 1$ if the user does. Thus, the whole dataset is represented by a $116{,}912 \times 40{,}774$ matrix $(a_{ij})$ whose entries are 1 if the $i$th user follows the $j$th community and 0 otherwise. Next, singular value decomposition was performed to identify the 100 principal components $b_k$ ($k = 1, \ldots, 100$) describing users' interests.
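A toy-scale sketch of this construction with NumPy is given below. The function names, the dictionary input, and the single-pass filtering (rows and columns are both filtered on the counts of the unfiltered matrix) are assumptions; the article does not specify the order of filtering or the software used. A dense `np.linalg.svd` is only practical for small matrices; at the scale reported above, a sparse truncated solver such as `scipy.sparse.linalg.svds` would be the more realistic choice.

```python
import numpy as np


def membership_matrix(follows, min_count=50):
    """Build the binary user-by-community matrix and drop sparse rows and columns."""
    users = sorted(follows)
    communities = sorted({c for cs in follows.values() for c in cs})
    col = {c: j for j, c in enumerate(communities)}
    A = np.zeros((len(users), len(communities)))
    for i, user in enumerate(users):
        for c in follows[user]:
            A[i, col[c]] = 1.0  # entry is 1 if user i follows community j
    keep_rows = A.sum(axis=1) >= min_count  # users with enough communities
    keep_cols = A.sum(axis=0) >= min_count  # communities with enough followers
    return A[keep_rows][:, keep_cols]


def interest_components(A, k=100):
    """Return per-user scores on the top-k singular directions and the directions."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(s))
    return U[:, :k] * s[:k], Vt[:k]
```

The first returned array plays the role of the $b_k$ variables: one row per user, one column per component.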

The $b_k$ variables were used as predictors in a logistic regression. Target variables included the user's gender, age cohort, and attendance at a best- or worst-performing school. To avoid model overfitting, cross-validation was performed, with the dataset divided into ten parts.
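The evaluation scheme can be illustrated with a small self-contained sketch: a logistic regression fitted by plain gradient descent and scored with ten-fold cross-validation. The optimizer, learning rate, number of steps, and function names are assumptions made for the illustration; the article does not describe how the model was fitted.

```python
import numpy as np


def fit_logistic(X, y, lr=0.1, steps=500):
    """Fit logistic regression weights (last weight is the intercept)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))      # predicted probabilities
        w -= lr * Xb.T @ (p - y) / len(y)      # gradient step on the log-loss
    return w


def predict(w, X):
    """Predict class 1 when the linear score is positive."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return (Xb @ w > 0).astype(int)


def cross_validated_accuracy(X, y, folds=10, seed=0):
    """Accuracy measured only on parts of the data not used for fitting."""
    order = np.random.default_rng(seed).permutation(len(y))
    parts = np.array_split(order, folds)
    correct = 0
    for i in range(folds):
        test = parts[i]
        train = np.concatenate([parts[j] for j in range(folds) if j != i])
        w = fit_logistic(X[train], y[train])
        correct += int((predict(w, X[test]) == y[test]).sum())
    return correct / len(y)
```

Here `X` would hold the component scores of the users and `y` a binary target such as gender coded as 0 or 1. Because every prediction is made for a user the model has not seen during fitting, a high score cannot be obtained by memorizing the training data.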

Predictive power of community following

The models built on the city dataset predict the user's gender with 97% accuracy. They also classify the user into one of two age cohorts with 98% accuracy, provided that the age gap is nine years (years of birth 2002 and 1993). However, when the age gap is only four years (years of birth 2002 and 1998), accuracy drops to 88%. The model identifies even an age difference as small as two years with 70% accuracy, and it distinguishes users from the best-performing 1% of schools from users of the worst-performing 1% with 83% accuracy. However, only 62% accuracy is achieved in dividing users between the best-performing 50% of schools and the worst-performing 50%. Such a low level of accuracy is no surprise, since individual academic achievements of students from schools of these two categories largely overlap.

Using the interests of adolescents, their gender can be predicted with nearly 100% accuracy. This means that, given the set of interests of a girl, we can find a similar set of interests in another girl but almost never in a boy. In theory, high prediction accuracy could be achieved by merely memorizing which sets of interests correspond to which gender. In that case, however, no meaningful conclusions could be drawn from the high accuracy. Cross-validation was performed to rule this out: model accuracy is tested on data other than those used to build the model.
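The difference between memorizing and generalizing can be shown with a small sketch that is not part of the original study. A one-nearest-neighbour rule stores the training rows verbatim, and the labels here are deliberately random, so there is nothing to learn; the data, the split, and the function name `nearest_neighbour_predict` are assumptions for illustration.

```python
import numpy as np

def nearest_neighbour_predict(X_train, y_train, X_query):
    """Predict the label of the closest training row (a pure memorizer)."""
    # Squared Euclidean distance between every query row and every training row.
    d = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    return y_train[d.argmin(axis=1)]

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 10))
y = rng.integers(0, 2, size=400)        # labels are pure noise by construction
X_train, y_train = X[:200], y[:200]
X_test, y_test = X[200:], y[200:]

# Scored on its own training rows, the memorizer looks perfect ...
train_acc = np.mean(nearest_neighbour_predict(X_train, y_train, X_train) == y_train)
# ... but on rows it has never seen it can do no better than guessing.
test_acc = np.mean(nearest_neighbour_predict(X_train, y_train, X_test) == y_test)
print(train_acc, test_acc)
```

Accuracy measured on the training rows says nothing about whether interests carry information about the target, whereas accuracy on held-out rows does. This is why the figures reported above are only meaningful as cross-validated estimates.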

Conclusion

A virtual environment, in particular the VKontakte social networking service used by the overwhelming majority of Russian school students, contains not only personal accounts and profiles but also communities (public pages and groups) that gather users with the same or similar interests. In this study, the fact of following a community was regarded as an indicator of a student's interest in the content that prevails in the community. Communities followed by school students were identified and grouped based on the data on students from a Moscow school (student lists and GPAs). On the whole, the findings do not contradict the existing information on the interests of school students. For instance, they are fully in line with the conception of four dominants (remoteness, strain, romance, and egocentrism) described in Pedology of the Adolescent by Vygotsky and with Elkind's theory of adolescent egocentrism [Elkind 1967].

Mathematical procedures used in the course of the research extended and deepened these ideas considerably and, most importantly, filled them with specific content as a result of community content analysis. A special procedure needs to be developed in order to describe and identify the content offered by communities in more detail and more comprehensively. We believe that content analysis will be an adequate formalized tool for comparing content across different communities. However, the existing versions of this tool cannot yet be applied directly, as community content is not restricted to text posts (which are actually the least frequent) but also includes videos, pictures, and audio. Therefore, it was decided to confine the study to a qualitative description of community content with an indication of the language used by the users.

As a result, we offer a new tool for analyzing the interests of adolescents: the most informative domain of their lives. The results of tool application are largely consistent with the existing assumptions, making them more specific and comprehensive. The most important implication of this research is the possibility of differentiating the interests of school students based on their academic performance. We can now safely hypothesize that school students follow different trajectories in the development of their interests depending on their academic attainment. It means that information about the diversity of educational trajectories is confirmed by more specific information on adolescents' internal psychological processes, i. e. their interests.

The study undertaken has both theoretical and purely practical significance. Theoretically, it points out the heterogeneity of age-graded behavior patterns and their relationship with gender and academic performance. Whereas diversity of boys' and girls' interests has always been implied one way or another (though never used for teaching purposes), performance-related differences in the interests of same-sex children have only been "discussed" outside the academic community, e. g. in films and literature on children. From now on, this diversity can be studied systematically and pointedly. A very important avenue of further research consists in describing the categories in more detail, based on at least three indicators (gender, age/grade, and academic performance) instead of only one (boys/girls or high-/low-performers). Differentiation of adolescents based on the socioeconomic and cultural backgrounds that they come from also suggests itself naturally, but this type of information is not available in the form of big data as it is not disclosed in users' profiles.

The practical significance of information on the interests typical of adolescents in different categories seems obvious. First, availability of such information will create conditions for the humanization of education, which will otherwise become disconnected from real people, ignoring their personal interests; it will also enable the evolution of learning and leisure activities. Second, the recognition of the diversity of interests and age-graded behavior patterns can facilitate practical work with adolescents, encourage the development of their interests, and enhance personalized psychological support.

The education system continues to rely on the idea of uniform age-graded behavior patterns, which is widely exploited in research on developmental psychology. However, this idea, dating as far back as pedology, seems to be increasingly out of touch with reality.

References

Alberts A., Elkind D., Ginsberg S. (2007) The Personal Fable and Risk-Taking in Early Adolescence. Journal of Youth and Adolescence, vol. 36, no 1, pp. 71-76.

Berk R. A. (2004) Advanced Quantitative Techniques in the Social Sciences. Regression Analysis: A Constructive Critique, vol. 11. Thousand Oaks, CA: Sage.

Bertrand M., Mullainathan S. (2001) Do People Mean What They Say? Implications for Subjective Survey Data. The American Economic Review, vol. 91, no 2, pp. 67-72.

Breiman L. (2001) Statistical Modeling: The Two Cultures. Statistical Science, vol. 16, no 3, pp. 199-231.

Burman E. (2006) Dekonstruktivnaya psikhologiya razvitiya [Deconstructing Developmental Psychology], Izhevsk: Udmurt State University, ERGO.

Eagle N., Pentland A. S. (2009) Eigenbehaviors: Identifying Structure in Routine. Behavioral Ecology and Sociobiology, vol. 63, no 7, pp. 1057-1066.

Elkind D. (1967) Egocentrism in Adolescence. Child Development, vol. 38, no 4, pp. 1025-1034.

Koroleva D. (2016a) Vsegda onlayn: ispolzovanie mobilnykh tekhnologiy i sotsialnykh setey sovremennymi podrostkami doma i v shkole [Always Online: Using Mobile Technology and Social Media at Home and at School by Modern Teenagers]. Voprosy obrazovaniya / Educational Studies. Moscow, no 1, pp. 206-224. doi: 10.17323/1814-9545-2016-1-205-224

Koroleva D. (2016b) Issledovanie povsednevnosti sovremennykh podrostkov: prisutstvie v sotsialnykh setyakh kak neotyemlemaya sostavlyayushchaya obshcheniya [A Study of the Daily Life of Modern Teenagers: The Presence in Social Networks as an Integral Component of Communication]. Journal of Modern Foreign Psychology, vol. 5, no 2, pp. 55-61. doi: 10.17759/jmfp.201605020

Kosinski M., Stillwell D., Graepel T. (2013) Private Traits and Attributes Are Predictable from Digital Records of Human Behavior. Proceedings of the National Academy of Sciences, vol. 110, no 15, pp. 5802-5805.

Leamer E. E. (1983) Let's Take the Con out of Econometrics. The American Economic Review, vol. 73, no 1, pp. 31-43.

Lewis K., Kaufman J., Gonzalez M., Wimmer A., Christakis N. (2008) Tastes, Ties, and Time: A New Social Network Dataset Using Facebook. com. Social Networks, vol. 30, no 4, pp. 330-342.

Lubovsky D. (2003) Psikhodiagnosticheskie metody v rabote s uchashchimisya 3-4 klassov [Psychodiagnostic Methods in Teaching Students of Grades 3-4], Moscow—Voronezh: Flinta, Moscow Psychological and Social Institute.

Miller S. (2002) Psikhologiya razvitiya: metody issledovaniya [Developmental Research Methods], Saint Petersburg: Piter.

Osorina M. (2011) Sekretny mir detey v prostranstve mira vzroslykh [The Secret World of Children in the Adult Environment], Saint Petersburg: Piter.

Smirnov I., Sivak E., Kozmina Y. (2016) V poiskakh utrachennykh profiley: dostovernost' dannykh «VKontakte» i ikh znachenie dlya issledovaniy obrazovaniya [In Search of Lost Profiles: The Reliability of VKontakte Data and Its Importance for Educational Research]. Voprosy obrazovaniya / Educational Studies. Moscow, no 4, pp. 106-122. doi: 10.17323/1814-9545-2016-4-106-122

Sokolova E. (1980) Proektivnye metody issledovaniya lichnosti [Projective Techniques in Personality Research], Moscow: Moscow State University.

Vygotsky L. S. (1984a) Pedologiya podrostka [Adolescent Pedology]. Sobranie Sochineniy L. S. Vygotskogo [Collected Works of L. S. Vygotsky], vol. 4. Moscow: Pedagogika, pp. 5-242.

Vygotsky L. S. (1984b) Problema vozrasta [The Problem of Age]. Sobranie Sochineniy L. S. Vygotskogo [Collected Works of L. S. Vygotsky], vol. 4. Moscow: Pedagogika, pp. 244-269.
