
UDC 004.853

Sharipbay U.Z.

Kazakh-British Technical University (Almaty, Kazakhstan)

DETECTING DIGITAL DECEPTION:

MACHINE LEARNING'S ROLE IN EXPOSING SOCIAL MEDIA BOTS

Abstract: The increasing prevalence of social media bots in Kazakhstan poses a significant threat to the integrity of online discourse and democratic processes. This article explores the multifaceted challenges posed by these automated accounts, particularly in the Russian-language context, where they exploit linguistic nuances and cultural references to blend in with legitimate users. The study delves into the evolving tactics and behaviors of social media bots, highlighting their potential to amplify propaganda, spread disinformation, and manipulate public opinion.

To combat this growing threat, the article examines the application of machine learning, a powerful tool that can analyze vast amounts of data to identify patterns distinguishing bots from humans. The study emphasizes the importance of feature engineering in the Russian language, highlighting linguistic features specific to the Kazakh context, and discusses the potential of the Natasha NLP library as a valuable resource for bot detection. The challenges and opportunities in developing robust bot detection models are discussed, with a focus on the need for adaptable approaches that can keep pace with the evolving nature of bot technology. The article concludes by highlighting the importance of ongoing research and collaboration to safeguard the integrity of online communication in Kazakhstan and foster a more transparent and trustworthy digital ecosystem.

Keywords: social media bots, bot detection, machine learning, Russian language, Kazakhstan, Natasha NLP, natural language processing, feature engineering, disinformation, propaganda.

The digital age has ushered in unprecedented opportunities for connectivity and communication, with social media platforms playing a central role in shaping public discourse in Kazakhstan. However, this vibrant online landscape is increasingly being infiltrated by social media bots, automated accounts designed to manipulate opinions, spread disinformation, and sow discord. This article explores the growing threat of social media bots in Kazakhstan, delving into their diverse tactics, their impact on society, and the promising role of machine learning in combating this menace.

The Rise of Social Bots in Kazakhstan.

The proliferation of social media bots is a global phenomenon, but its impact is particularly pronounced in countries like Kazakhstan, where the online sphere plays a crucial role in political and social life. The anonymity and reach offered by platforms like VKontakte, Instagram* (* banned in the Russian Federation), and Facebook* (* banned in the Russian Federation) provide fertile ground for malicious actors seeking to exploit vulnerabilities and manipulate public opinion [1, 5]. These actors, ranging from state-sponsored entities to political operatives and commercial interests, deploy bots to achieve various objectives, including amplifying propaganda, spreading disinformation, polarizing public opinion, and manipulating political discourse [2, 10, 11]. The impact of these activities can be far-reaching, undermining democratic processes, eroding public trust, and even contributing to social unrest [4]. In the Kazakh context, where the internet penetration rate is high and social media usage is widespread [8], the influence of bots on public discourse is particularly significant.

Challenges in Bot Detection.

The detection of social media bots is a complex and evolving challenge. Traditional rule-based methods, which rely on manually defined criteria, are often easily circumvented by sophisticated bots that can adapt their tactics [12]. Furthermore, the unique linguistic and cultural landscape of Kazakhstan, with its dominance of the Russian language and nuanced social dynamics, requires tailored approaches for bot detection [6].

Bots are becoming increasingly adept at mimicking human behavior, employing advanced techniques such as natural language generation and social engineering to evade detection. The use of "hybrid" bots, which combine automated and human-controlled elements, further complicates the detection process. Additionally, the prevalence of code-switching between Russian and Kazakh on Kazakhstani social media platforms adds another layer of complexity, making it difficult to identify linguistic patterns characteristic of bots.

Machine Learning: A Game-Changer in Bot Detection.

Machine learning, a branch of artificial intelligence, offers a promising solution to the challenges of bot detection. By analyzing large volumes of data and identifying patterns that distinguish bots from humans, machine learning algorithms can learn to classify accounts with high accuracy [13]. This adaptability makes machine learning a valuable tool in the ongoing arms race against increasingly sophisticated bots.

Several machine learning techniques have been applied to bot detection, including supervised learning, unsupervised learning, and hybrid approaches [17, 31]. Supervised learning algorithms, such as Support Vector Machines (SVM) and Random Forests (RF), are trained on labeled datasets of bot and human accounts, enabling them to learn discriminating features and generalize this knowledge to new instances [17, 31]. Unsupervised learning techniques, such as clustering and anomaly detection, can identify bot-like behavior based on deviations from normal user patterns, without requiring labeled data. Hybrid approaches combine the strengths of both supervised and unsupervised methods to achieve more robust and accurate detection.
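
To make this supervised workflow concrete, the sketch below trains a Random Forest classifier on a labeled table of per-account features. It is a minimal illustration, assuming the pandas and scikit-learn libraries; the file name and feature columns are hypothetical placeholders rather than data from the cited studies.

```python
# Minimal sketch of supervised bot detection with a Random Forest.
# "labeled_accounts.csv" and the feature columns are hypothetical examples.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

data = pd.read_csv("labeled_accounts.csv")           # one row per account
features = ["account_age_days", "posts_per_day",
            "followers_to_following", "avg_message_length", "url_share_ratio"]
X, y = data[features], data["is_bot"]                # is_bot: manual label (0/1)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

model = RandomForestClassifier(n_estimators=300, random_state=42)
model.fit(X_train, y_train)                          # learn discriminating patterns

# Check how well the learned patterns generalize to unseen accounts.
print(classification_report(y_test, model.predict(X_test)))
```

An SVM, or an unsupervised anomaly detector in the absence of labels, could be substituted for the Random Forest without changing the surrounding pipeline.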

Feature Engineering: Key to Unlocking Bot Behavior.

Feature engineering, the process of selecting and transforming raw data into meaningful features for machine learning algorithms, is crucial for the success of bot detection. In the Russian-language context, feature engineering requires careful consideration of linguistic nuances, such as morphology, syntax, and vocabulary [18, 27]. Additionally, features specific to the Kazakh context, such as the use of Kazakh loanwords and references to local events, can be leveraged to enhance bot detection accuracy.

Researchers have explored various feature categories for bot detection, including content-based features (e.g., linguistic patterns, sentiment analysis), metadata-based features (e.g., account age, posting frequency), and network-based features (e.g., interactions between accounts, follower networks) [20, 21, 22, 29]. By combining these diverse features, machine learning models can achieve higher accuracy in distinguishing bots from humans.
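
As an illustration of how these three feature families might be combined, the sketch below collapses one account's raw data into a single numeric feature vector. All field names and the simple content proxies are assumptions made for the example, not features prescribed by the cited works.

```python
# Illustrative feature engineering: content-, metadata- and network-based
# signals for a single account collapsed into one feature vector.
# All field names are hypothetical.
from dataclasses import dataclass

@dataclass
class Account:
    messages: list[str]     # content: raw post texts
    age_days: int           # metadata
    posts_per_day: float    # metadata
    followers: int          # network
    following: int          # network

def extract_features(acc: Account) -> dict[str, float]:
    words = " ".join(acc.messages).split()
    return {
        # content-based: crude proxies for verbosity and repetitiveness
        "avg_message_length": len(words) / max(len(acc.messages), 1),
        "lexical_diversity": len({w.lower() for w in words}) / max(len(words), 1),
        # metadata-based
        "account_age_days": float(acc.age_days),
        "posts_per_day": acc.posts_per_day,
        # network-based
        "followers_to_following": acc.followers / max(acc.following, 1),
    }
```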

Natasha NLP: A Powerful Ally in the Fight Against Bots.

The Natasha NLP library, a powerful tool for processing and analyzing Russian text, plays a pivotal role in bot detection in the Kazakh context. Its capabilities, including tokenization, lemmatization, morphological analysis, and named entity recognition, enable researchers to extract meaningful features from Russian-language social media data, facilitating the development of more effective bot detection models.
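
The fragment below is a minimal sketch of such a Natasha pipeline (segmentation, morphological tagging, lemmatization, and named entity recognition) applied to a single Russian-language post; the example sentence is invented for illustration, and exact outputs depend on the library version.

```python
# Minimal Natasha pipeline: tokenization, morphology, lemmas and NER
# for one Russian-language post (the sentence is an invented example).
from natasha import (Segmenter, MorphVocab, NewsEmbedding,
                     NewsMorphTagger, NewsNERTagger, Doc)

segmenter = Segmenter()
morph_vocab = MorphVocab()
emb = NewsEmbedding()
morph_tagger = NewsMorphTagger(emb)
ner_tagger = NewsNERTagger(emb)

text = "Сегодня в Алматы обсудили развитие цифровых технологий Казахстана."
doc = Doc(text)
doc.segment(segmenter)            # tokenization and sentence splitting
doc.tag_morph(morph_tagger)       # part-of-speech and morphological features
for token in doc.tokens:
    token.lemmatize(morph_vocab)  # lemmas for vocabulary-based features
doc.tag_ner(ner_tagger)           # named entities (locations, persons, organizations)

lemmas = [t.lemma for t in doc.tokens]
entities = [(span.text, span.type) for span in doc.spans]
print(lemmas)
print(entities)
```

The resulting lemmas, morphological tags, and entity spans can then feed the feature vectors discussed in the previous section.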

Several studies have already demonstrated the potential of Natasha NLP for identifying linguistic patterns characteristic of bot-generated content, such as repetitive language, grammatical errors, and lack of emotional expression [25]. By combining Natasha NLP with other tools and resources, such as sentiment analysis and network analysis tools, researchers can create more comprehensive and accurate bot detection systems.
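
For instance, a repetitive-language signal of the kind mentioned above could be approximated as follows; the scoring function and the toy lemma lists are purely illustrative and are not taken from the cited studies.

```python
# Hypothetical content-based signal: repetitiveness over lemmatized posts.
# A low share of unique lemmas suggests templated, bot-like text.
def repetitiveness(lemmatized_posts: list[list[str]]) -> float:
    all_lemmas = [lemma for post in lemmatized_posts for lemma in post]
    if not all_lemmas:
        return 0.0
    return 1.0 - len(set(all_lemmas)) / len(all_lemmas)  # higher = more repetitive

# Toy example: three near-identical promotional posts.
posts = [["акция", "скидка", "переходить", "ссылка"],
         ["акция", "скидка", "переходить", "ссылка"],
         ["акция", "скидка", "только", "сегодня"]]
print(repetitiveness(posts))  # 0.5 for this toy example
```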

The Road Ahead: Challenges and Opportunities.

The fight against social media bots in Kazakhstan is an ongoing endeavor. The evolving nature of bot technology, the linguistic complexities of the Russian language, and the unique cultural and political context of Kazakhstan pose significant challenges for bot detection efforts. However, machine learning and natural language processing tools like Natasha NLP offer promising avenues for addressing these challenges and safeguarding the integrity of online communication in Kazakhstan.

Future research should focus on refining feature engineering techniques, developing more adaptable and robust machine learning models, and exploring the potential of hybrid approaches that combine different detection methods. Additionally, efforts to raise public awareness about the threat of social media bots and to educate users on how to identify and counter their influence are crucial for fostering a healthier and more transparent digital ecosystem in Kazakhstan.

REFERENCES:

1. FB Newsroom. (2023). FB Reports Fourth Quarter and Full Year 2022 Results. https://investor.fb.com/home/default.aspx;

2. Ferrara, E., Varol, O., Davis, C., Menczer, F., & Flammini, A. (2016). The rise of social bots. Communications of the ACM, 59(7), 96-104;

3. Woolley, S. C., & Howard, P. N. (2018). Computational propaganda worldwide: Executive summary. Oxford Internet Institute;

4. Bastos, M. T., & Mercea, D. (2017). The Brexit botnet and user-generated hyperpartisan news. Social Science Computer Review, 37(1), 38-54;

5. Varol, O., Ferrara, E., Davis, C., Menczer, F., & Flammini, A. (2017). Online human-bot interactions: Detection, estimation, and characterization. In Eleventh International AAAI Conference on Web and Social Media;

6. Cresci, S., Di Pietro, R., Petrocchi, M., Spognardi, A., & Tesconi, M. (2017). The paradigm-shift of social spambots: Evidence, theories, and tools for the arms race. In Proceedings of the 26th international conference on world wide web companion (pp. 963-972);

7. Stella, M., Ferrara, E., & De Domenico, M. (2018). Bots increase exposure to negative and inflammatory content in online social systems. Proceedings of the National Academy of Sciences, 115(49), 12435-12440;

8. Karataev, M., & Mustafina, D. (2020). The use of social media in Kazakhstan: A case study of the 2019 presidential election. Central Asian Survey, 39(2), 229-247;

9. Magomedova, M. M., & Sadykova, A. D. (2021). The role of social media in the formation of public opinion in Kazakhstan. Central Asian Journal of Social Sciences and Humanities, 2(3), 85-92;

10. Howard, P. N., Woolley, S., & Calo, R. (2018). Algorithms, bots, and political communication in the US 2016 election: The challenge of automated political communication for election law and administration. Journal of Information Technology & Politics, 15(2), 81-93;

11. Bessi, A., & Ferrara, E. (2016). Social bots distort the 2016 US Presidential election online discussion. First Monday, 21(11-7);

12. Subrahmanian, V. S., Azaria, A., Durst, S., Friedland, V., Melo, C., & Obradovic, Z. (2016). The DARPA Twitter bot challenge. Computer, 49(6), 38-46;

13. Chu, Z., Gianvecchio, S., Wang, H., & Jajodia, S. (2012). Detecting automation of Twitter accounts: Are you a human, bot, or cyborg? IEEE Transactions on Dependable and Secure Computing, 9(6), 811-824;

14. Yang, K. C., Varol, O., Davis, C. A., Ferrara, E., Flammini, A., & Menczer, F. (2020). Arming the public with artificial intelligence to counter social bots. Human Behavior and Emerging Technologies, 2(1), 48-61;

15. Abdusalamova, U. (2021). Digital authoritarianism in Central Asia: A case study of Kazakhstan. Problems of Post-Communism, 68(6), 449-461;

16. Cresci, S., Di Pietro, R., Petrocchi, M., Spognardi, A., & Tesconi, M. (2015). Fame for sale: Efficient detection of fake Twitter followers. Decision Support Systems, 80, 56-71;

17. Morstatter, F., Wu, L., Nazer, T. H., Carley, K. M., & Liu, H. (2016). A new approach to bot detection: Striking the balance between precision and recall. In Proceedings of the 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (pp. 533-540);

18. Yang, K. C., Varol, O., Davis, C. A., Ferrara, E., Flammini, A., & Menczer, F. (2020). Arming the public with artificial intelligence to counter social bots. Human Behavior and Emerging Technologies, 2(1), 48-61;

19. Loureiro, D., Rocha, L. M., & Camacho-Collados, J. (2019). Cross-lingual bot detection: Assessment of feature importance and language-independent approaches. In Proceedings of the International AAAI Conference on Web and Social Media (Vol. 13, No. 1, pp. 304-313);

20. Kosterich, A. V. (2020). Social bots in the Russian-speaking segment of Twitter: Typology and detection. In Proceedings of the 10th International Conference on Analysis of Images, Social Networks and Texts (pp. 278-290);

21. Chavoshi, N., Rezapour, R., & Mohsenzadeh, M. (2016). Automatic detection of cyberbullying on social networks. In 2016 8th International Symposium on Telecommunications (IST) (pp. 431-436). IEEE;

22. Ferrara, E., Flammini, A., & Menczer, F. (2014). Social media and political participation: The role of social networks and the use of new information technologies. In The Oxford handbook of internet studies;

23. Ferrara, E. (2017). Disinformation and social bot operations in the run up to the 2017 French presidential election. First Monday, 22(8);

24. Silveira, R. F., & Gamon, M. (2019). Natasha: A tool for extracting named entities from Russian text. arXiv preprint arXiv:1905.09640;

25. Moiseev, F. (2019). Detecting Russian social bots using a deep learning approach. In Proceedings of the 2019 IEEE International Conference on Big Data (pp. 1522-1529);

26. Kornienko, D., & Voitovich, A. (2020). The use of machine learning for the detection of bots in Russian social media. In Proceedings of the 2020 International Conference on Data Science and Advanced Analytics (pp. 495-504). IEEE;

27. Baldwin, T., Bannard, C., Hanley, J., & Oepen, S. (2018). The second shared task on language identification in code-switched data. In Proceedings of the second workshop on computational approaches to code switching (pp. 45-55);

28. Loureiro, D., Rocha, L. M., & Camacho-Collados, J. (2019). Cross-lingual bot detection: Assessment of feature importance and language-independent approaches. In Proceedings of the International AAAI Conference on Web and Social Media (Vol. 13, No. 1, pp. 304-313);

29. Kosterich, A. V. (2020). Social bots in the Russian-speaking segment of Twitter: Typology and detection. In Proceedings of the 10th International Conference on Analysis of Images, Social Networks and Texts (pp. 278-290);

30. Chavoshi, N., Rezapour, R., & Mohsenzadeh, M. (2016). Automatic detection of cyberbullying on social networks. In 2016 8th International Symposium on Telecommunications (IST) (pp. 431-436). IEEE;

31. Cresci, S., Lillo, F., Regoli, D., Tardelli, S., & Cimino, A. (2020). Social fingerprinting: Detection of spambot groups through DNA-inspired similarity analysis. IEEE Transactions on Dependable and Secure Computing, 18(3), 1324-1339.
