Identification of features of probable reliability of news
Ospanova Ulzhan Abayevna,
Master of Management, Project Manager, Department of Applied Research and Development, JSC "Information-Analytical Center", [email protected]
Atanayeva Miraim Kazhmukhambetovna,
Master of Business Administration, Acting President of JSC "Information-Analytical Center", [email protected]
Akoyeva Inessa Georgiyevna,
Chief Analyst, Department of Applied Research and Development, JSC "Information-Analytical Center", [email protected]
Buldybayev Timur Kerimbekovich,
Director, Department of Applied Research and Development, JSC "Information-Analytical Center", [email protected]
The article considers one of the topical issues of modern society: whether the main features by which the reliability of news published in online media is understood can be identified. The paper points out the need to develop a methodology for determining the degree of probable reliability of news in the Kazakhstani segment. The study shows that the overwhelming majority of the population cannot distinguish well-prepared journalistic material from material based on rumours and false information. Based on the results of the study, we identified the most suitable indicators for assessing the probable degree of reliability of news. The choice of these indicators is motivated by the possibility of automating their verification and is supported by the results of an expert survey. These informative features will subsequently be used to develop a methodology for assessing the impact of mass media on society using machine learning methods.
Keywords: reliability, impact on society, informative features
Introduction
The reliability of publications remains a focus of attention for scholars all over the world. According to a study by the Massachusetts Institute of Technology [1], the top 1% of false news items spread on Twitter reached audiences of between 1,000 and 100,000 people, whereas truthful posts rarely reached more than 1,000 people. False news spread farther, faster, deeper and more broadly than the truth in all categories of information, and the effects were more pronounced for false political news than for false news about terrorism, natural disasters, science, urban legends or financial information.
Despite the relevance of the topic, there is very little research on news content analysis in Kazakhstan; in practice, the issue is dealt with only at the state level, in order to identify potentially dangerous content that directly threatens or harms society and the state and undermines national security. Nevertheless, the issue is raised increasingly often and does not go unnoticed.
For example, at the Eurasian Media Forum 2018 in Almaty, international experts discussed the negative impact of biased information, distortion of facts, fake news and falsehood in the media, and their excessive dissemination via new technologies, as well as the growth of the blogosphere, the threat it poses to professional journalists and professional media, and governments' responses to this challenge. In addition, the Factcheck.kz project [2], which checks news for reliability, has recently begun operating in Kazakhstan. The primary goal of the project is to fight unreliable information, information manipulation and fake news. Any person, media outlet or organisation can be subjected to fact-checking, including their statements, publications, phrases and figures. Worldwide, fact-checking is a relatively new type of journalistic investigation.
Thus, research on identifying indicators for assessing the reliability of textual information is clearly urgent, given the world community's concern about the dissemination of unreliable information and its impact on the behaviour of large population groups [1, 3, 4]. In this regard, developing a methodology for determining the level of probable reliability of publications in the Kazakhstani segment is timely.
This article was prepared as part of the implementation of program-targeted funding (PTF) No. БН05236839 of the Science Committee of the Ministry of Education and Science of the Republic of Kazakhstan.
Related work
The rapid development of information technology makes it possible to produce and disseminate information much faster than in the past. The world community is increasingly discussing the problem of the spread of fake news, since it affects the political, economic and social well-being of humankind [5]. At the same time, Hilbig [6] and Min-Hsin Su et al. [7] note that fake news spreads faster than truthful news.
An analysis of international practice in the field of media shows that the legal systems regulating media activities differ between countries. According to the Bureau of International Information Programs [8], in some countries the rights and duties of journalists are set out clearly and in detail in legislation, while in others they rest on the traditions and principles of common law. At the same time, regulations governing mass media activities are universal in that they aim to ensure objectivity, the publication of information that corresponds to reality, the pursuit of truth and accountability to society.
The analysis of the SPJ Code of Ethics [9] shows that in most journalistic codes of ethics the central principle is the reliability and objectivity of published information. However, applying these norms to counter unreliable information is difficult, and in practice they are rarely used because there are no unambiguous indicators for determining the reliability of information.
The Law on Mass Media of Kazakhstan stipulates the following main principles of mass media activities: objectivity, legitimacy, reliability, and respect for the privacy, honour and dignity of a person and citizen. Legislation obliges journalists not to disseminate information that does not correspond to reality and to respect the legal rights and interests of individuals and legal entities. Still, more and more messages appear that cause a public outcry, and the population cannot always separate truthful information from lies. Delays in official reports and refutations of such information increase distrust of information sources, which is confirmed to some degree by the results of our survey. As noted by Fuller et al. [10], with reference to earlier studies [11], the recognition of fake news is a highly relevant issue: as public polls confirm, most people cannot distinguish reliable information from false information, and this widespread inability does not depend on the reader's level of preparation. Although attempts are being made to teach people to distinguish fake news from reliable news, according to a global survey of sentiment, the Edelman Trust Barometer [12], the vast majority of respondents (73%) in 27 countries around the world are concerned about false information and fake news. This concern is also present in Kazakhstan. Moreover, the majority of respondents do not have the knowledge and skills to detect or identify unreliable information. This, in turn, prompted us to raise the question of whether the identification of false information in news reports can be automated on the basis of expert review and a dictionary-based approach adapted to the realities of Kazakhstan.
The literature review showed that the approaches and criteria for determining the reliability of publications are to some extent universal [10, 13] and can largely be adapted to a system for assessing the reliability of news publications in Kazakhstan. One of the main objectives of the study is to identify informative features of the probable reliability of a publication, so that an automated approach with minimal involvement of experts can subsequently be used to assess the impact of open textual information sources on society.
Approach to the identification of features of reliability
The following algorithm was developed to implement the objectives of the study:
1. Conducting a sociological survey of the population of Kazakhstan and an expert survey to highlight informative features to determine the degree of probable reliability of publications.
2. Confirmation of the degree of significance and the construction of the Trust Index.
3. Formation and annotation of the corpus of news publications based on reliability features.
4. Identification of patterns and rules for the reliability features on the basis of expert assessment.
5. Experimental confirmation of the feature selection.
In order to determine acceptable indicators for assessing a publication in the context of Kazakhstan, we conducted a sociological survey [14] to find out the opinion of the population. The survey sample is representative of the population of Kazakhstan and includes an equal number of respondents in each of the 14 regions and two cities of republican significance, without quotas for gender and age. In total, the study covered 3,200 respondents aged 15 years and older, 200 in each of the country's territorial units (14 regions and two cities of republican significance). The total sample size n=3200 corresponds to a 95% confidence level with a margin of error of ± 1.73%. The regional sample size n=200 corresponds to a 95% confidence level with a margin of error of ± 6.93%.
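The two margins of error quoted above can be reproduced with the standard formula for a proportion under simple random sampling. The sketch below is only an illustration: it assumes the most conservative proportion p = 0.5 and the normal quantile z = 1.96 for a 95% confidence level, neither of which is stated explicitly in the survey description, and the function name is ours.

```python
import math


def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Half-width of the confidence interval for a proportion.

    Assumes simple random sampling; p = 0.5 gives the widest interval
    and z = 1.96 corresponds to a 95% confidence level.
    """
    return z * math.sqrt(p * (1.0 - p) / n)


# Sample sizes reported for the whole country and for a single region.
for n in (3200, 200):
    print(f"n={n}: +/- {100 * margin_of_error(n):.2f}%")
```

Under these assumptions the function returns about 1.73% for n=3200 and about 6.93% for n=200, which matches the values given in the text.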
To sample respondents in the regions of the country, we used multistage stratified sampling with quota selection of observation units [15]. For the given sample size, the composition of respondents follows the population proportions by the parameters "region" and "residence locality (urban/rural)", according to the data of the Statistics Committee of the Ministry of National Economy of the Republic of Kazakhstan.
Controlled quota parameters are as follows: residence locality (2 groups: urban and rural population); gender (2 groups: men and women); age groups: 15-19, 20-24, 25-34, 35-44, 45-54, 55-64, 65 years and older.
The survey included all social groups in the country by gender (52.5% men, 47.5% women), age group (15-19, 20-24, 25-34, 35-44, 45-54, 55-64, 65 and over), nationality (61.8% Kazakhs, 26.8% Russians and 11.4% other nationalities), language of communication, marital status, residence locality (56.7% urban, 43.3% rural), type of employment (28.8% private sector employees, 17.5% public sector employees, 13.1% retirees, 4.6% entrepreneurs, and others) and material well-being.
The survey results show that television and news websites are the most trusted channels among the citizens of the country: 86% and 83% of respondents, respectively, trust these channels of information. Even though only 6% use newspapers as their main source of information, 77% of respondents trust the print media to some extent. Private conversations and social networks are the least trusted sources of information: 20% and 30% of respondents, respectively, distrust these channels to some extent.
In sum, the data obtained during the survey make it possible to identify key points for determining the informative features that indicate the probable reliability of publications.
Ilyin specifies that an essential criterion of reliability is the bias of the information source, and that the reference to the source determines the reliability of the information [16]. The level of probable reliability of information increases if the author refers to the primary source. In this case, however, the challenge of validating the information source itself arises. A source is considered competent if it has authority in a particular field or specialises in a certain area, for example, if it has official status or if the publication refers to data produced by statistical agencies, research institutes and other reliable organisations. References to official documents (regulatory and legal acts, formal agreements, contracts and the like) can be considered entirely credible in terms of information reliability.
Another informative feature for determining a publication's probable reliability is the reputation of the source of information and of the author of the publication [17]. The more independent the source, the more objective the information it disseminates, and hence the higher the level of trust in that information. As for authors, the main factor shaping a journalist's reputation is the sources of information he or she uses. The volume of information received from a source also determines the level of trust in it: the higher the percentage of information confirmed as valid, the higher the reliability of the source. When the source of information is a person, increasing the level of trust requires analysing verifiable information about the author (level of education, reputation in the relevant areas of expertise, and so on) and considering his or her other articles and the sources he or she uses. If the reliability of the information cannot be verified in any way, the level of trust in it drops.
In the overwhelming majority of cases, unnamed sources reduce the reliability of information, especially when an article contains negative information that the reader was not previously aware of [18]. The testimony of witnesses or eyewitnesses is much less objective, since their account of the fragment of reality they observed is a product of how that reality was reflected in their minds and is therefore subjective. Its reliability increases only if many witnesses describe the situation in the same way.
When information is anonymous, its credibility is questionable [19]. The name of the author and his or her reputation partially support the reliability of the information, but they should be considered in combination with additional factors such as the reputation of the source and a comprehensive approach to the coverage of events.
Additionally, one way to verify the reliability of information is to compare sources. The reliability of information increases if a large number of authors have used the same information. Information is further supported when it contains a fact obtained from a source, such as official statistics or other reliable and verifiable data, and when it is repeated in other publications on similar topics.
According to the hypothesis put forward in the study, the reputation of a source, as a determinant of the probable credibility of a publication, can be measured by the following indicators. First, the level of citation of the source in other publications; here it is necessary to take into account citations in which the source is referred to as the ultimate truth. Second, references to the source's data by government, political and public organisations in official statements. Third, the history of the source: how long it has existed and whether it has suffered reputational losses. Fourth, which other sources it cooperates with, how authoritative and respected are the authors who cooperate with it on a permanent or temporary basis, and what opinion print and electronic media hold of it. Fifth, information about the owner of the resource: his or her political and economic views, personal reputation, and the possibility of placing paid ("custom") materials.
The population named the following as the main factors of trust: the channel of information distribution (television, newspapers, Internet sites), the status of the source of information (republican/regional/city), and the presence of reasoned arguments and expert evaluation in the message. The survey showed that the speed and availability of information are not considered a guarantee of confidence in the messages received. This, in turn, confirms the assumption that the credibility of information rests primarily on an assessment of its reliability rather than its promptness.
Based on the survey results, we calculated a Trust Index for each information dissemination channel. The indices were calculated using equally weighted groups, according to the following formula:
$$I_r = \frac{n_{+} + 0.5\,n_{1\pm} - 0.5\,n_{2\pm} - n_{-}}{n_{+} + n_{1\pm} + n_{2\pm} + n_{-} + n_{0}}$$
where:
$n_{+}$ is the number of respondents who trust,
$n_{1\pm}$ is the number of respondents who rather trust,
$n_{2\pm}$ is the number of respondents who rather do not trust,
$n_{-}$ is the number of respondents who do not trust,
$n_{0}$ is the number of those who could not answer.
Based on the results of the calculation, the indices can take values from 1 to -1, where 1 is absolute trust, and -1 is absolute mistrust.
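The calculation of the index can be illustrated with a short sketch. The function and argument names are ours, and the respondent counts in the example are hypothetical and are not taken from the survey data.

```python
def trust_index(trust: int, rather_trust: int, rather_distrust: int,
                distrust: int, no_answer: int) -> float:
    """Trust Index for one channel, on a scale from -1 to 1.

    Full answers are weighted 1 and -1, partial answers 0.5 and -0.5;
    respondents who could not answer only enlarge the denominator.
    """
    total = trust + rather_trust + rather_distrust + distrust + no_answer
    if total == 0:
        raise ValueError("at least one respondent is required")
    weighted = trust + 0.5 * rather_trust - 0.5 * rather_distrust - distrust
    return weighted / total


# Hypothetical counts for one channel (100 respondents in total).
example = trust_index(trust=50, rather_trust=30, rather_distrust=10,
                      distrust=5, no_answer=5)
print(round(example, 2))
```

For these hypothetical counts the index is (50 + 15 - 5 - 5) / 100 = 0.55. A channel trusted by every respondent gives 1, and a channel distrusted by every respondent gives -1.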
Thus, television (0.6), news websites (0.49) and print media (0.48) received the highest Trust Index values. For the country as a whole, the Trust Index across all information dissemination channels is 0.43, which indicates an insufficient level of public trust in existing information sources (Fig. 1).
The survey also showed that the older the respondent, the more he/she trusts the official media. Conversely, the younger the respondent, the more often he/she trusts the opinions of acquaintances and friends on social networks.
Figure 1. Trust indices for information dissemination channels
The results show that television continues to be the most trusted channel for disseminating information. Moreover, respondents most often chose this source as the most reliable and truthful (61%).
The survey showed that the attitude towards the sources that disseminate information plays an essential role in the formation of trust. The population regards the information dissemination channel (television, newspapers, websites), the status of the source of information (republican/regional/city) and the presence of reasoned arguments and expert judgment in the message as the main factors of trust. The promptness and accessibility of the information transmitted are not considered crucial for trust in the messages received. In turn, these findings confirm that trust in information is primarily based on an assessment of its reliability rather than its promptness.
In order to compare the level of trust across information dissemination channels, we also calculated the Trust Index for news websites (Fig. 2). The websites nur.kz (0.54), mail.ru (0.38), tengrinews.kz (0.36) and zakon.kz (0.32) received the highest index values.
According to the results of the survey, the overwhelming majority of Kazakhstanis (76%) put more trust in messages published on the websites of officially registered media than in messages that other people and bloggers post on social networks (Facebook, Twitter, VKontakte and others).
Figure 2. Trust indices in information websites
The survey found that the audience does not make a significant distinction between trust in professional journalism and trust in social networks: the level of trust in journalism is only about 9 percentage points higher than the level of trust in social networks (52.2% compared with 43.3%).
One of the reasons for the audience's dissatisfaction with both social networks and official media, and for the lack of trust in information, is the spread of false news, which, according to respondents, is a frequent phenomenon not only on social networks but also on the websites of officially registered media. At the same time, respondents believe that materials based on rumours and false information are published on social networks 1.5 times more often than in registered media (38% and 21% respectively).
Thus, according to the results of the bibliographic analysis and the results of the sociological survey, we determined the most acceptable indicators for estimating the probable degree of reliability of the news publication.
We identified the following informative features for determining the reliability of a publication:
1. Reference in the publication to a competent and objective source.
2. Reference to the primary information source.
3. Indication of the author's name in the article.
4. Rating/reputation of the publisher (websites of news and information agencies, electronic media and other similar sites) where the publication is posted.
5. Coverage of the same event by various media: cross-checking, the presence of discourse with other publications.
6. Conformity of the title with the content of the publication.
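For automated processing, the six features listed above could be stored as one annotation record per publication. The sketch below is a hypothetical illustration: the class and field names are ours, each feature is simplified to a yes/no value (the publisher's rating in particular is reduced to a flag), and the simple count is a convenience for inspecting annotations, not a scoring rule proposed in the study.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ReliabilityFeatures:
    """Hypothetical annotation record for one news publication."""
    competent_source_cited: bool   # reference to a competent, objective source
    primary_source_cited: bool     # reference to the primary source
    author_named: bool             # author's name is indicated
    publisher_reputable: bool      # simplified flag for publisher rating
    covered_by_other_media: bool   # same event covered by various media
    title_matches_content: bool    # title conforms to the content

    def present_count(self) -> int:
        """Number of features marked as present (illustrative only)."""
        return sum(bool(getattr(self, f.name)) for f in fields(self))


# Hypothetical annotation of a single publication.
record = ReliabilityFeatures(
    competent_source_cited=True,
    primary_source_cited=True,
    author_named=False,
    publisher_reputable=True,
    covered_by_other_media=False,
    title_matches_content=True,
)
print(record.present_count())
```

Keeping the record immutable (`frozen=True`) makes it harder to change an expert's annotation by accident after it has been entered.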
To assess the selected indicators, a test corpus of 5,000 news texts was formed. This choice is due to the following factors:
- The results of the sociological survey on the influence of open information sources on society showed that for the majority of respondents from Kazakhstan (59.8%) the main sources of news are social networks and bloggers (30.3%) and Internet news sites (29.5%).
- One of the most important trends in modern communication is a sharp increase in the influence of online and electronic media. The results of a study by the agency GlobalWebIndex indicate that digital sources of information predominate over traditional media, meaning television.
The corpus of news texts was formed from 5 sources using two-stage systematic cluster sampling. Publications were selected from the period August 2017 to August 2018. In order to determine patterns in publications, texts were selected for three consecutive days.
The resulting sample meets the necessary statistical standards (each sample unit has an equal chance of being selected), which in turn makes it possible to generalise the conclusions of the testing to the general population of articles.
The principle of corpus balance was observed through quota balancing of the number of news items from each news resource within the total number of news publications.
The principle of representativeness, that is, the ability of the corpus to reflect all the properties of the study area, was taken into account when forming the corpus, which included news media texts of various genres, types and classes from different sources. In addition, in keeping with this principle, most artificial constraints on corpus composition were eliminated: any publication from the selected sources during the analysed period had an equal chance of entering the sample.
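One possible reading of the two-stage systematic cluster sampling described above is sketched below. The exact procedure, the intervals and the starting points are not given in the text, so every parameter here (`step_days`, `article_step`, the start offsets) is an assumption made for illustration: at the first stage, three-day clusters are chosen at a fixed interval across the period, and at the second stage every k-th publication is taken within each cluster.

```python
from datetime import date, timedelta


def systematic_sample(items, step, start=0):
    """Every `step`-th item, beginning at index `start`."""
    return items[start::step]


def three_day_clusters(first_day, last_day, step_days, start=0):
    """Stage 1: pick cluster start dates at a fixed interval.

    Each cluster covers up to three consecutive days inside the period.
    """
    n_days = (last_day - first_day).days + 1
    all_days = [first_day + timedelta(days=i) for i in range(n_days)]
    clusters = []
    for s in systematic_sample(all_days, step_days, start):
        days = [s + timedelta(days=k) for k in range(3)]
        clusters.append([d for d in days if d <= last_day])
    return clusters


def sample_corpus(articles_by_day, clusters, article_step):
    """Stage 2: within each cluster keep every `article_step`-th article."""
    sample = []
    for cluster in clusters:
        pool = [a for day in cluster for a in articles_by_day.get(day, [])]
        sample.extend(systematic_sample(pool, article_step))
    return sample


# Assumed interval of 30 days over the period August 2017 to August 2018.
clusters = three_day_clusters(date(2017, 8, 1), date(2018, 8, 31), step_days=30)
print(len(clusters), clusters[0])
```

In practice the starting offsets would be drawn at random, so that every publication in the period keeps an equal chance of selection.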
After selection, primary and then expert annotation was carried out for the selected informative features of reliability. To annotate the corpus objectively, the experts used the triangulation method [20]: they were divided into 3 groups of 3 people, and each group was given the same publications in the same quantity, so that every group evaluated the same publications. Based on the results, the agreement between the coders was assessed.
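The text does not say which measure of agreement between coders was used. As a minimal sketch, assuming each group assigns one label per publication, the simplest option is pairwise percent agreement between the three groups; chance-corrected measures such as Cohen's or Fleiss' kappa are common alternatives. The group names and labels below are hypothetical.

```python
from itertools import combinations


def percent_agreement(labels_a, labels_b):
    """Share of publications on which two coders gave the same label."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both coders must label the same publications")
    if not labels_a:
        raise ValueError("at least one publication is required")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def pairwise_agreement(groups):
    """Percent agreement for every pair of coder groups."""
    return {
        (g1, g2): percent_agreement(groups[g1], groups[g2])
        for g1, g2 in combinations(sorted(groups), 2)
    }


# Hypothetical labels (1 = reliable, 0 = unreliable) for four publications.
groups = {
    "group_1": [1, 1, 0, 1],
    "group_2": [1, 0, 0, 1],
    "group_3": [1, 1, 0, 0],
}
for pair, value in pairwise_agreement(groups).items():
    print(pair, value)
```

Publications on which the groups disagree can then be returned to the experts for discussion before the final label is fixed.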
At the next stage of the study, linguistic experts carried out in-depth analysis and detailed annotation of the texts in order to formalise the rules for determining the informative features of probable reliability and to compile a dictionary for these features.
The preliminary results of this work revealed a significant difference in the manner in which news resources present information. News sites that respondents rated as the least trusted and as publishing inaccurate information contain up to 57% unreliable news, while for the more popular and sought-after resources the share of reliable publications reaches 98%. It is worth noting that the reliability of publications cannot be used as the only indicator of the reliability and truthfulness of a news resource. In assessing the impact of media on society, this indicator should be considered in conjunction with other indicators, such as objectivity, tonality, and the absence or presence of politicisation of texts and manipulative techniques. Confirmation of the identified patterns on an unlabelled corpus of news publications by means of machine learning tools is scheduled for the next stage.
Conclusion
The results of the research showed that the overwhelming majority of the population cannot distinguish well-prepared journalistic material from material based on rumours and false information. Moreover, the audience uses very few criteria to recognise fake news, and these criteria are not always consistent.
In this regard, the majority of respondents answered positively to the question of whether an instrument is needed for assessing the reliability of representatives of the journalistic community.
Thus, according to the results of the bibliographic analysis and the results of the sociological survey, we determined the most acceptable indicators for estimating the probable degree of reliability of the news publication.
The choice of these reliability indicators is motivated by the possibility of automating their verification and is supported by the results of an expert survey. These informative features will be used to develop a methodology for assessing the impact of mass media on society. If the machine learning results are satisfactory, the next stage of the experimental study will repeat this procedure on an unlabelled corpus of texts, together with expert testing, in order to conduct a comparative analysis and confirm or refute the effectiveness of the developed methodology.
Identification of features of probable reliability of news
Ospanova U.A., Atanayeva M.K., Akoyeva I.G., Buldybayev T.K.
JSC "Information-Analytical Center"
The article addresses one of the topical issues of modern society: the ability to identify the main features by which the reliability of news published in online mass media is understood. The paper points out the need to develop a methodology for determining the degree of probable reliability of news in the Kazakhstani segment. The study shows that the overwhelming majority of the population cannot distinguish well-prepared journalistic material from material based on rumours and false information. According to the results of the study, we determined the most suitable indicators for estimating the probable degree of reliability of news. The choice of these indicators is motivated by the possibility of automating their verification and is supported by the results of an expert survey. These informative features will subsequently be used to develop a methodology for assessing the impact of mass media on society using machine learning methods.
Keywords: reliability, impact on society, informative features
References
1. Vosoughi, S., Roy, D., & Aral, S. (2018). The spread of true and false news online. Science, 359(6380), 1146-1151. doi:10.1126/science.aap9559
2. About Factcheck. (March 2019). Retrieved from: http://factcheck.kz/o-nas/
3. Xu, K., Wang, F., Wang, H., & Yang, B. (2018). A First Step Towards Combating Fake News over Online Social Media. Wireless Algorithms, Systems, and Applications. Lecture Notes in Computer Science, 521-531. doi:10.1007/978-3-319-94268-1_43
4. Rodny-Gumede, Y., (2018). Fake It till You Make It: The Role, Impact and Consequences of Fake News. In: Mutsvairo B., Karam B. (eds) Perspectives on Political Communication in Africa. Palgrave Macmillan, Cham
5. Figueira, A., Oliveira, L., (2017). The current state of fake news: challenges and opportunities, Procedia Computer Science. Vol. 121, pp. 817-825
6. Hilbig, B. E. (2012). How framing statistical statements affects subjective veracity: validation and application of a multinomial model for judgments of truth. Cognition, 125(1), 37-48. doi:10.1016/j.cognition.2012.06.009. PubMed PMID: 22832179
7. Su, M., Liu, J., & Mcleod, D. M. (2019). Pathways to news sharing: Issue frame perceptions and the likelihood of sharing. Computers in Human Behavior, 91, 201-210. doi:10.1016/j.chb.2018.09.026
8. Media Law Handbook. (2010). Bureau of International Information Programs United States Department of State Retrieved from http://www.america.gov/publications/books/media-law.html, p. 74.
9. SPJ Code of Ethics. (July 29 2018). Retrieved from https://www.spj.org/ethicscode.asp
10. Fuller, C. M., Biros, D. P., & Wilson, R. L. (2009). Decision support for determining veracity via linguistic-based cues. Decision Support Systems, 46(3), 695-703. doi:10.1016/j.dss.2008.11.001
11. Bond, J. C., & Depaulo, B. M. (2006). Accuracy of Deception Judgments: Appendix B. Personality and Social Psychology Review, 10(3). doi:10.1207/s15327957pspr1003_2b
12. Edelman Trust Barometer. Executive Summary. 2019. (February 2019). Retrieved from: https://www.edelman.com/sites/g/files/aatuss191/files/2019-01/2019_Edelman_Trust_Barometer_Executive_Summary.pdf
13. Rubin, V., Conroy, N., & Chen, Y. (2015). Towards News Verification: Deception Detection Methods for News Discourse. Retrieved from https://www.researchgate.net/profile/Victoria_Rubin/publication/270571080_Towards_News_Verification_Deception_Detection_Methods_for_News_Discourse/links/54ad9bb80cf2828b29fcaf7f.pdf
14. Results of a sociological survey on the impact of open information sources (electronic media) on society. (2018). "Information and Analytical Center", JSC of the Ministry of Education and Science of the Republic of Kazakhstan. Astana.
15. Dobren'kov, V. I., & Kravchenko, A. I. (2004). Metody sotsiologicheskogo issledovaniia: Uchebnik. Moskva: Infra-M.
16. Ilyin, K. (2006). Tsennost' istochnikov informatsii. (April 2019). Retrieved from: http://www.itsec.ru/articles2/control/cennost_istochnikov_informacii
17. Conroy, N. J., Rubin, V. L., & Chen, Y. (2015). Automatic deception detection: Methods for finding fake news. Proceedings of the Association for Information Science and Technology, 52(1), 1-4. doi:10.1002/pra2.2015.145052010082
18. Ishida, Y., & Kuraya, S. (2018). Fake News and its Credibility Evaluation by Dynamic Relational Networks: A Bottom up Approach. Procedia Computer Science, 126, 2228-2237. doi:10.1016/j.procs.2018.07.226
19. Dou, X., Walden, J. A., Lee, S., & Lee, J. Y. (2012). Does source matter? Examining source effects in online product reviews. Computers in Human Behavior, 28(5), 1555-1563. doi:10.1016/j.chb.2012.03.015
20. Kosharnaya, G., & Kosharnyy, V. (2016). Triangulation as a Means Ensuring Validity of Empirical Study Results. University Proceedings. Volga Region. Social Sciences, (2), 117-122. doi:10.21685/2072-3016-2016-2-13