CURRENT ISSUES OF ANALYTICAL TEXTOLOGY
UDC 81'33
A. V. Dzhunkovskiy
Postgraduate Student, Department of Applied and Experimental Linguistics, Institute of Applied and Mathematical Linguistics, Faculty of the English Language, MSLU; e-mail: [email protected]
STEGANOGRAPHY: A THREE-STAGE ANALYSIS METHOD FOR WRITTEN RUSSIAN TEXTS
The paper examines a steganalytical method for uncovering semantic information concealed by linguistic means in written Russian texts, based on an established typology of steganographic methods. Our three-stage steganalysis method comprises a stage of meta-analysis of the text, a stage of linguistic analysis proper, and a stage of contextual extralinguistic analysis of the semantic elements of the text. The method makes it possible to detect, with high probability, signs that semantic information has been concealed by linguistic means in written Russian texts. We developed the three-stage steganalysis method by synthesizing and refining the results of a typological analysis of the steganological methods proposed by Russian and foreign scholars. The method is currently used in our research on the experimental assessment of the effectiveness of individual linguistic steganographic methods for written Russian texts. The relevance of the paper stems from the need to develop effective steganalysis methods for written Russian texts. A number of studies on the steganalysis of spoken language exist, but steganalytical methods for written Russian texts remain insufficiently studied.
Keywords: steganography; steganalysis; container; cipher; vulnerability of steganographic methods; human perception; steganalysis methodology.
Dzhunkovskiy A.V.
Postgraduate Student, Department of Applied and Experimental Linguistics, Institute of Applied and Mathematical Linguistics, Faculty of the English Language, MSLU; e-mail: [email protected]
STEGANOGRAPHY: THREE-STAGE ANALYSIS METHODOLOGY APPLIED TO RUSSIAN WRITTEN TEXTS
In this paper we present our findings on the typology of steganalytical means applicable to uncovering information hidden in Russian written texts by means of linguistic steganography. Our three-stage steganalysis method comprises three major stages: meta-analysis of the text, linguistic analysis proper, and contextual extralinguistic analysis of the semantic elements of the text. The method allows signs of semantic information concealed by linguistic means in written Russian texts to be detected with high probability. We created and conceptualized the three-stage analysis methodology on the basis of analyzing, synthesizing, and improving upon the results of typological analyses of steganological methods carried out by foreign and Russian scholars. The method is currently used in our ongoing experimental research aimed at determining the efficiency of various linguistic steganographic methods for concealing information in Russian written texts. The relevance of the present paper stems from the need to develop effective steganalytical methods for written Russian texts. At present, the existing steganalysis studies for Russian deal with spoken language, and there is little research on steganalysis and steganography in written Russian texts. We present this methodology both as a basis for investigating the efficiency of various methods of steganography and steganalysis and as a tool for practical work on analyzing and concealing information in written Russian texts using linguistic methods.
Key words: steganography; steganalysis; container; cipher; steganographic method vulnerability; human perception; steganalysis methodology.
I. Introduction
Contemporary advances in computing technology have brought about a noteworthy phenomenon. Cryptography rapidly gained relevance and is now gradually becoming suboptimal as a means of protecting confidential information. The reason is that both cryptography and cryptanalysis, the practice of unauthorized decryption, have come to depend on automation. The very nature of cryptography dictates that encryption must follow strict, if often obscure and unintuitive, mathematical patterns. Consequently, both encrypting and decrypting information by cryptographic means have become a matter of computational power and algorithm design. While crypto methods have many practical applications whose usefulness cannot be denied, their evolution has made them nearly unusable for those who lack the necessary technology yet still need to transfer confidential information discreetly through insecure channels [Potapova 2010, p. 120].
The above is crucial for understanding the significance of steganography and steganalysis, which we collectively refer to as steganology. By its very nature, steganography is highly resistant to automated and computer-aided attacks (the term used for steganalytical and cryptanalytical attempts to gain unauthorized access to protected information), and attacking it requires a human expert [Alferov et al. 2012, p. 67]. Steganalysis of large volumes of information is prohibitively expensive, as it requires maintaining a large staff of highly skilled experts. This makes steganology a promising field of research.
We focus on linguistic (or text) methods of steganography in the Russian language. While the field of steganology as a whole is still underdeveloped, this is even more true of linguistic steganography [Bennet 2013, p. 15]. As for research limited to text steganography based on Russian-language material, no notable papers or studies exist as yet. That said, a small but sufficient number of papers on steganographic means in other languages [Wayner 2002, p. 78], as well as on such means in spoken Russian, give us an opportunity to develop steganographic and steganalytical methodologies for written Russian texts.
Furthermore, since steganographic methods are highly resistant to automated attacks [Lukyanov 2005, p. 7], we chose to focus on analyzing the efficiency of various steganographic methods against visual attacks carried out by human experts.
II. Methods
The three-stage steganalysis methodology was developed mainly through analysis and synthesis of the limited body of contemporary work by experts in steganography and steganalysis working with various languages [Xiang, Sun, Luo 2014, p. 1894], which has produced classifications of steganalytical [Katzenbeisser 2000, p. 87] and steganographic [Wayner 2005, p. 163] methods. No such work has previously been undertaken for steganography in written Russian texts. Moreover, the existing stego methodologies based on these typologies are, as a rule, not experimentally tested [Encyclopedia of Cryptography... 2014] and differ considerably from one another. In view of this, we synthesized the existing typologies to create both steganalytical and steganographic methodologies for concealing and detecting information in written Russian texts.
III. Results
We found that no classification existed either of visual steganalysis methods or of steganographic methods aimed at counteracting visual attacks. We have therefore created such classifications.
In outline, they are as follows. For visual steganalysis, we divide the existing methods into three types, which correspond to the steps of the methodology.
The first is the meta-analytical step, during which the format and design elements of the text are examined. Step two is the main linguistic analysis. Finally, step three is the extralinguistic analysis, during which the content of the message is checked against objective reality.
The classification of steganographic methods applicable to written Russian text mirrors the steganalytical classification. The first category is the alteration of metaelements, the second comprises linguistic alterations, and the third, extralinguistic alterations.
Our classification efforts have thus provided a sufficient basis for developing two distinct methodologies, one for steganalysis and one for steganography.
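The symmetry between the two classifications can be expressed as a one-to-one mapping from each category of alteration to the analysis stage aimed at it. The Python sketch below is illustrative only: the names Stage, Alteration, and detecting_stage are hypothetical, and the strict one-to-one pairing is our reading of "symmetrical" rather than a claim that each stage can detect only one kind of alteration.

```python
from enum import Enum


class Stage(Enum):
    """Steganalytical stages, numbered in the order they are applied."""
    META = 1             # design and formatting of the text
    LINGUISTIC = 2       # linguistic analysis proper
    EXTRALINGUISTIC = 3  # context and semantic analysis


class Alteration(Enum):
    """Categories of steganographic alteration of a written text."""
    METAELEMENT = "alteration of metaelements"
    LINGUISTIC = "linguistic alteration"
    EXTRALINGUISTIC = "extralinguistic alteration"


# Hypothetical one-to-one pairing mirroring the symmetrical classifications:
# each alteration category is paired with the stage primarily aimed at it.
DETECTING_STAGE = {
    Alteration.METAELEMENT: Stage.META,
    Alteration.LINGUISTIC: Stage.LINGUISTIC,
    Alteration.EXTRALINGUISTIC: Stage.EXTRALINGUISTIC,
}


def detecting_stage(alteration: Alteration) -> Stage:
    """Return the analysis stage paired with the given alteration category."""
    return DETECTING_STAGE[alteration]


if __name__ == "__main__":
    for alteration in Alteration:
        stage = detecting_stage(alteration)
        print(f"{alteration.value} -> stage {stage.value} ({stage.name})")
```

Keeping the two enumerations separate, rather than reusing one, reflects that the steganographic and steganalytical classifications are distinct methodologies that merely correspond to each other.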
The three-stage steganalytical method we have developed is as follows:
1. Text design and formatting analysis (the meta-analysis stage)
a. Letter, word, and paragraph spacing analysis;
b. Vertical alignment analysis of words and letters on the same line;
c. Font size, color and type analysis;
d. Line spacing analysis;
e. Text and individual text elements inclination analysis;
f. Individual grapheme feature analysis (especially relevant to handwritten text);
g. Grapheme connection analysis (when dealing with handwritten text).
2. Text linguistic features analysis (linguistic analysis proper)
a. Spelling peculiarities analysis;
b. Peculiarities on the morpheme level;
c. Peculiarities on the lexical level;
d. Peculiarities on the syntactic level;
e. Usus error analysis at all language levels;
f. Stylistic features analysis;
g. Diachronic usus consistency analysis;
h. Content analysis.
3. Extralinguistic text peculiarities analysis (the context and semantic analysis stage)
a. Comparative analysis of the given text against other texts by the same author;
b. Analysis of the consistency of the message with objective reality;
c. Analysis of the consistency of particular minutiae in the text with objective reality;
d. Analysis of whether the text style matches the social status of the communicating parties (when their identities, social statuses, and the peculiarities of their interpersonal communication are known).
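Although the method is applied manually by an expert, the checklist above lends itself to a simple record-keeping structure. The following Python sketch assumes hypothetical names (CHECKLIST, Analysis, record, flagged_stages, summary) and abbreviated item labels. Because steganalysis can at most indicate the possibility of a hidden message, its summary never reports more than a possible concealment.

```python
from dataclasses import dataclass, field

# Checklist items per stage, abbreviated from the three-stage method.
CHECKLIST = {
    1: ["letter/word/paragraph spacing", "vertical alignment",
        "font size, color and type", "line spacing", "inclination",
        "individual grapheme features", "grapheme connections"],
    2: ["spelling", "morpheme level", "lexical level", "syntactic level",
        "usus errors", "stylistic features", "diachronic usus consistency",
        "content analysis"],
    3: ["comparison with other texts by the author",
        "consistency with objective reality",
        "consistency of minutiae with objective reality",
        "style vs. social status of the parties"],
}


@dataclass
class Analysis:
    """Records an expert's findings, keyed by (stage, checklist item)."""
    findings: dict = field(default_factory=dict)

    def record(self, stage: int, item: str, note: str) -> None:
        # Reject items that are not part of the given stage's checklist.
        if item not in CHECKLIST.get(stage, []):
            raise ValueError(f"unknown checklist item for stage {stage}: {item}")
        self.findings[(stage, item)] = note

    def flagged_stages(self) -> list:
        """Stages at which at least one peculiarity was recorded, ascending."""
        return sorted({stage for stage, _ in self.findings})

    def summary(self) -> str:
        # The verdict is deliberately limited to "possible" concealment.
        if not self.findings:
            return "no signs of concealment found"
        stages = ", ".join(str(s) for s in self.flagged_stages())
        return f"possible concealment; peculiarities at stage(s) {stages}"


if __name__ == "__main__":
    analysis = Analysis()
    analysis.record(1, "line spacing", "uneven spacing in paragraph 3")
    analysis.record(3, "consistency with objective reality", "date mismatch")
    print(analysis.summary())
```

Validating items against the checklist keeps the record tied to the method's fixed structure, so that findings from different experts remain comparable stage by stage.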
This three-stage steganalytical method provides human experts with a sound methodological basis for their work. We now turn to a discussion of the methodology.
IV. Discussion
The three-stage steganalytical method allows for a thorough analysis of written Russian text when searching for alterations indicative of hidden information. It allows one to consider the text in three distinct ways: first, as a physical object, at the meta-analysis stage; second, as a linguistic product, at the stage of linguistic analysis proper; and finally, as part of a larger context and in connection with it.
Of course, the presented methodology and the typology it is based on can hardly be considered universal or final at this stage, as we foresee the need for modification and further development. That said, the present results have proven rather useful in experiments assessing the efficiency of various steganographic containers.
The first stage of the analysis deals with meta-elements and is the only stage where the human eye may miss certain peculiarities that automatic steganalysis would reveal. With this in mind, we conclude that at the first stage it is prudent to combine manual and automated means of analysis. One may ask why involving the human eye is beneficial in analyzing these aspects at all. Apart from verifying the results of automated analysis, the reason is that automated steganalysis of handwritten text remains a matter for future advances: even optical character recognition (OCR) of handwriting, let alone its steganalysis, is not yet reliable enough to serve as a standard procedure. Hence the need for visual analysis by a human expert.
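For text available in digital form, part of the first stage can be automated. The sketch below, written under our own assumptions, flags two meta-level peculiarities that are easy to miss visually: space-like characters other than the ordinary space (and runs of repeated spaces), and words that mix Cyrillic and Latin letters, as happens when a visually identical Latin letter is substituted for a Cyrillic one. The character set UNUSUAL_SPACES is illustrative, not exhaustive, and the function names are hypothetical. Such a scan complements, rather than replaces, the expert's visual inspection, and it does not apply to handwritten text.

```python
import re
import unicodedata

# Illustrative (not exhaustive) set of characters that render as, or nearly
# as, an ordinary space and could therefore carry hidden information.
UNUSUAL_SPACES = {
    "\u00a0",  # no-break space
    "\u2002",  # en space
    "\u2003",  # em space
    "\u2009",  # thin space
    "\u200a",  # hair space
    "\u200b",  # zero-width space
    "\u202f",  # narrow no-break space
}


def script_of(ch):
    """Return 'Cyrillic' or 'Latin' for letters of those scripts, else None."""
    name = unicodedata.name(ch, "")
    if name.startswith("CYRILLIC"):
        return "Cyrillic"
    if name.startswith("LATIN"):
        return "Latin"
    return None


def meta_scan(text):
    """Return sorted (position, description) pairs for meta-level peculiarities."""
    findings = []
    # 1. Space-like characters other than U+0020.
    for i, ch in enumerate(text):
        if ch in UNUSUAL_SPACES:
            findings.append((i, f"unusual space U+{ord(ch):04X}"))
    # 2. Runs of two or more ordinary spaces.
    for m in re.finditer(r" {2,}", text):
        findings.append((m.start(), f"run of {len(m.group())} spaces"))
    # 3. Words containing letters from both Cyrillic and Latin scripts.
    for m in re.finditer(r"\w+", text):
        scripts = {script_of(c) for c in m.group()} - {None}
        if len(scripts) > 1:
            findings.append((m.start(), f"mixed-script word {m.group()!r}"))
    return sorted(findings)


if __name__ == "__main__":
    # The first letter is a Latin "B" standing in for Cyrillic "В".
    sample = "B" + "стреча в  пятницу\u00a0утром"
    for position, description in meta_scan(sample):
        print(position, description)
```

On the sample string the scan reports the mixed-script word at position 0, the double space at position 9, and the no-break space at position 18; on the same sentence typed entirely in Cyrillic with single ordinary spaces it reports nothing.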
The second stage of the analysis covers ground that linguists and medical and forensic experts have been working on for a considerable time. Our main contribution here is therefore to synthesize the existing distinct elements of analysis into one comprehensive linguistic analysis technique.
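One item of the second stage, peculiarities on the lexical level, could be supported by a crude automated check: flagging words that are absent from, or rare in, a reference frequency list, since an unusual word choice may result from a substitution made to embed information. The sketch below makes several assumptions: lexical_outliers is a hypothetical helper, the frequency values in the example are made-up placeholders rather than corpus data, and no lemmatization is performed, so inflected forms must appear in the list exactly as they occur in the text. A flagged word is only a prompt for the expert's closer reading.

```python
import re


def lexical_outliers(text, freq_ipm, threshold=1.0):
    """Return words absent from, or rarer than `threshold` in, a frequency list.

    freq_ipm maps a word form to its frequency in instances per million
    words (ipm). No lemmatization is done, so inflected forms must be
    listed as they appear. Only Cyrillic words are considered.
    """
    words = re.findall(r"[а-яё]+", text.lower())
    outliers = []
    for word in words:
        # Unknown words get frequency 0.0 and are therefore always flagged;
        # each flagged word is reported once, in order of first occurrence.
        if freq_ipm.get(word, 0.0) < threshold and word not in outliers:
            outliers.append(word)
    return outliers


if __name__ == "__main__":
    # Placeholder frequencies for illustration only, not real corpus data.
    toy_freq = {"встреча": 100.0, "в": 10000.0, "пятницу": 50.0, "утром": 80.0}
    print(lexical_outliers("Встреча в пятницу утром, лазурь", toy_freq))
```

In practice the check would be run on lemmatized text against a published frequency dictionary of Russian, with the threshold chosen by the expert.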
The final stage of the analysis deals with the connection of the text and its context to objective reality, to the identities of the author and the addressee, and to the general and specific validity of the information presented.
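Item (a) of the third stage, comparing the questioned text with other texts by the same author, could be supported by standard stylometric measures. As one possible choice, which is ours and not part of the methodology described here, the sketch below compares the relative frequencies of a small, illustrative set of Russian function words using cosine similarity. The word list and function names are hypothetical; reliable stylometric comparison needs far longer texts and a validated feature set, and a low score would only indicate a stylistic deviation worth examining, not proof of concealment.

```python
import math
import re
from collections import Counter

# Illustrative set of Russian function words (hypothetical feature set).
FUNCTION_WORDS = ["и", "в", "не", "на", "что", "с", "как", "а", "но", "по", "к", "же"]


def profile(text):
    """Relative frequency of each function word among all Cyrillic words."""
    words = re.findall(r"[а-яё]+", text.lower())
    counts = Counter(words)
    total = max(len(words), 1)  # avoid division by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def style_similarity(questioned, reference_texts):
    """Compare the questioned text with the author's pooled reference texts."""
    reference = profile(" ".join(reference_texts))
    return cosine(profile(questioned), reference)


if __name__ == "__main__":
    known = ["Я пришёл домой и сразу лёг спать, но не уснул.",
             "Он сказал, что не знает, как это сделать."]
    print(round(style_similarity("Мы не знали, что делать, и ждали.", known), 3))
```

Pooling the reference texts into one profile is a simplification; comparing against each reference text separately would also show how much the author's own style varies.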
The main weakness of the present methodology is one shared by all steganalytical methods: even in the best case, the expert can only conclude that certain peculiarities of the text may contain a hidden message. Another weakness is that the methodology has not been tested on other languages or on spoken Russian texts. We suspect that substantial modifications would be required for it to work in these cases.
Nevertheless, we have grounds to believe our methodology to be one of the most promising existing methods of steganalysis usable for Russian written texts.
V. Conclusions
The presented three-stage analysis method provides a basis for analyzing the efficiency of individual means of steganalysis. The method is currently used in our ongoing research, where we employ it to classify various steganalysis techniques during experimental work on determining the viability of various alterations in written Russian texts as containers for concealing information.
While we remain cautious about its prospects and ready to improve the methodology, it has already proven fruitful as a tool in our own ongoing research.
At its core, the methodology allows one to view the text as three different entities simultaneously: as a physical object, as a self-contained language product, and, finally, as part of a larger textual nexus. It thus allows one to regard a text as a meta-object consisting of a multitude of potential containers in which additional information could be concealed.
Our research not only strengthens the theoretical basis of steganology but also has practical value for both concealing and revealing such information.
REFERENCES
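Alferov A. P. et al. Osnovy kriptografii [Fundamentals of Cryptography]. Moscow : Gelios ARV, 2012. 480 p.
Lukyanov G. V. Osnovy kodirovaniya i kriptograficheskogo preobrazovaniya informatsii [Fundamentals of Coding and Cryptographic Transformation of Information]. Moscow : Sarma, 2005. 128 p.
Potapova R. K. Rech': kommunikatsiya, informatsiya, kibernetika [Speech: Communication, Information, Cybernetics]. Moscow : URSS, 2010. 600 p.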
Bennet K. Linguistic Steganography: Survey, Analysis, and Robustness Concerns for Hiding Information in Text // CERIAS Tech Report. 2013. № 4. P. 1-30.
Encyclopedia of Cryptography and Security. 2nd ed. / ed. by H. C. A. van Tilborg, S. Jajodia. Berlin : Springer Science & Business Media, 2014. 1416 p.
Katzenbeisser S. Principles of Steganography. Boston : Artech House, 2000. 186 p.
Wayner P. Disappearing Cryptography: Information Hiding. San Francisco : Morgan Kaufmann, 2002. 221 p.
Wayner P. Strong Theoretical Steganography. Berlin : Cryptologia, 2005. 410 p.
Xiang L., Sun X., Luo G. Linguistic Steganalysis Using the Features Derived from Synonym Frequency // Multimedia Tools and Applications. 2014. № 3. P. 1893-1911.