
APPLICATIONS OF NATURAL LANGUAGE PROCESSING IN ECONOMICS AND FINANCE*

UDC 336, 004.5 DOI: 10.52063/25792652-2021.4-231

TIGRAN KARAMYAN

Yerevan State University,
Faculty of Economics and Management,
Lecturer, Ph.D. Student,
Yerevan, the Republic of Armenia
t.qaramyan@ysu.am

Natural Language Processing (NLP) is used to extract desired patterns from unstructured data and to convert raw data into actionable insights. This capability serves as the foundation for all AI-human interactions in the financial sector. Thanks to NLP, AI systems can gather and analyze data, issue warnings, and make predictions, in applications ranging from customer care to risk prevention.

This article presents some important applications of NLP in different fields of economics and finance. It aims to show how NLP can benefit these industries. The following topics are discussed here:

• how NLP can be helpful when predicting financial trends with stochastic time series,

• why financial statement analysis is extremely helpful in accounting,

• how to gain a general idea of the content of a document without diving into details, and how this affects one’s decision-making process (using the example of the Federal Reserve System (Fed)),

• detection of language (style) changes in financial reports (statements), using the example of the Federal Open Market Committee (FOMC),

• why words can be treated as triggers in finance and what the benefits are.

The work was written using scientific abstraction and a combined examination of different modern applications of the methodology. The reliability and validity of the sources were verified through their comprehensive study.

This article substantiates that the limitations of unstructured data can be overcome by using AI, specifically NLP. Human and artificial intelligence, when skillfully combined, can lead to better investment decisions and risk management; reducing human error is critical and can help expose the hidden intentions (here: trends, expectations, movements, etc.) contained in unstructured data.

Keywords: natural language processing, artificial intelligence, finance, economics, unstructured data, statement analysis, decision-making process.

* The article was submitted on 17.11.2021, reviewed on 26.11.2021, and accepted for publication on 25.12.2021.

Introduction. Natural Language Processing (NLP) enables econometric analyses based on information contained in user reviews, employee evaluations, surveys, news, and other textual sources. With this kind of information, economists are no longer constrained to examining only those phenomena for which measurements of the relevant elements are already accessible, typically obtained through thorough and often expensive or difficult-to-arrange research investigations, or by accident of existing datasets. The capacity to deal with textual data allows economists to investigate a wider range of issues and offers a wealth of information to mine in the expanding context of online documents and behavior. Reviews and other writings may be used to examine counterfactuals (what could have been). The ability to conduct what-if studies when experiments are impossible to run is extremely important. When controlled trials are feasible but their capacity to imitate real-world settings is questioned, textual analysis can assist in determining how realistic those trials are.

It is worth mentioning that NLP can reveal whether sentiments, information, tone, or themes are associated with specific results. These hints can be utilized to build testable hypotheses about which factors influence decisions and how. When NLP technologies are paired with appropriate models, they may occasionally yield solid evidence that beliefs expressed in text predict important events. NLP also makes it possible to measure how information about numerous characteristics of a product, person, or event is transmitted. NLP can offer the measures needed to determine what role the messenger of specific sorts of information plays. For example, we can investigate whether evaluations written in a more informal tone are more persuasive to younger people. Even in the absence of experiments, NLP tools allow us to explore relationships between information, who it originates from, and what other circumstances might magnify or lessen its influence.

Applications of NLP. In the field of finance, the vast majority of models have traditionally been based on predominantly numeric data inputs. Nevertheless, recent advancements in NLP have made it possible to effectively combine numeric and textual data when analyzing financial questions such as stock price prediction, interest rate prediction, etc. This is a true game-changer for financial markets, as most of the decisions finance specialists make depend heavily on the financial news texts they are bombarded with on a daily basis. Therefore, the value of processing financial news texts cannot be overstated. One approach Pratyush Muthukumar and Jie Zhong describe in their article “A Stochastic Time Series Model for Predicting Financial Trends using NLP” (Muthukumar and Zhong) concerns the way such hybrid models are constructed. Specifically, the outcomes of sentiment analysis based on textual data processing are used directly as an input to the numeric model. The authors consider this method a novelty in the field. It is worth mentioning that they have proposed a novel architecture that applies Naive Bayes sentiment analysis to financial news texts and uses the learned representations alongside financial numerical data to train a Generative Adversarial Network (GAN) for time-series prediction of stock prices. Hybrid models like this are of high value, as they are able to account for factors that would not otherwise be considered due to the lack of perspective. Therefore, being able not only to watch the numbers talk but also to take a closer look at the “talk” itself gives an even better understanding of what is happening and what is likely to happen next.
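To make the hybrid idea concrete, the following is a minimal Python sketch, not the authors’ implementation: a Naive Bayes sentiment score learned from a handful of labelled headlines is appended as one extra column to numeric market features before they reach any downstream time-series model (the GAN from the paper is omitted). All headlines, labels and numbers are invented placeholders.

import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical labelled headlines: 1 = positive tone, 0 = negative tone.
headlines = [
    "Company beats earnings expectations, shares rally",
    "Regulator fines bank over reporting failures",
    "Strong guidance lifts outlook for the sector",
    "Profit warning issued amid falling demand",
]
labels = [1, 0, 1, 0]

# Learn a simple Naive Bayes sentiment model from the labelled headlines.
vectorizer = CountVectorizer(lowercase=True, stop_words="english")
sentiment_model = MultinomialNB().fit(vectorizer.fit_transform(headlines), labels)

def sentiment_feature(texts):
    # Probability of positive tone for each day's news, used as one extra column.
    return sentiment_model.predict_proba(vectorizer.transform(texts))[:, 1]

# Hypothetical numeric inputs for four trading days: [open, close, volume].
numeric = np.array([
    [101.2, 102.5, 1.1e6],
    [102.5, 101.8, 0.9e6],
    [101.8, 103.0, 1.3e6],
    [103.0, 101.1, 1.5e6],
])

# One headline per day in this toy example; the sentiment score is appended
# to the numeric features before they are passed to any time-series model.
hybrid_input = np.column_stack([numeric, sentiment_feature(headlines)])
print(hybrid_input.shape)  # (4, 4)

The design point is simply that the textual signal enters the numeric pipeline as an additional feature, so any forecasting model, from a regression to a GAN, can consume it without modification.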

As mentioned above, financial analysis suffers when it relies solely on quantitative methods, which often do not provide a good approximation of reality. Such deficiencies affect not only the investment decisions of portfolio managers who are trying to predict a specific trend or stock price, but also analysts who base their decisions on the scrutiny and thorough analysis of the financial statements of each company of interest. The latter analysis supports credit risk assessment, which can then be useful in both financing and investing decisions. In particular, many companies are required by law to publish periodic financial statements. These statements contain a great deal of numeric data related to the periodic performance of the company, as well as notes and disclosures which further explain the structure and calculations expressed in numeric terms. Besides, the textual data contains sections such as Management’s Discussion and Analysis (MD&A). This textual data serves as the basis for tone analysis and sentiment analysis of the company. Further combination of this analysis with the accounting choices made by the company can give a better understanding of the future prospects of the entity. The need for a qualitative tool in financial statement analysis has arisen from the flexibility that accounting standards provide companies with, which results in figures that often do not faithfully depict the economic reality of the company. For example, under IFRS (International Financial Reporting Standards) there are three acceptable methods a company can use to account for depreciation of assets. Depreciation, being reflected in the balance sheet of the company, affects the carrying amount of inventory and PPE (property, plant and equipment), and therefore total assets and any ratios calculated based on them. Besides, depreciation is also accounted for in the income statement through the cost of goods sold (inverse relationship); hence it affects gross profit as well as net income and any other profitability measures of the company. This is just one example of how the flexibility of accounting methodology creates an opportunity for manipulation of numeric information. Other examples include valuation of assets under fair value vs. historical cost models, capitalizing vs. expensing R&D costs, as well as differences arising from the various accepted accounting standards in different jurisdictions. Hence the need to process the textual data accompanying financial statements (Ravula).
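As a concrete illustration of what tone analysis of an MD&A section can look like, here is a minimal dictionary-based Python sketch. The word lists and the MD&A excerpt are invented placeholders; a real analysis would rely on an established financial sentiment lexicon (for example, the Loughran-McDonald word lists) and on the full filing text.

import re

# Illustrative placeholder word lists; a real analysis would use an
# established financial sentiment lexicon rather than these few words.
POSITIVE = {"growth", "improved", "strong", "favorable", "gain"}
NEGATIVE = {"decline", "impairment", "loss", "adverse", "restructuring"}

def mda_tone(text):
    # Crude tone score: (positive count - negative count) / total word count.
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / len(words)

# Invented excerpt standing in for an MD&A section.
mda_excerpt = ("Revenue growth remained strong, although the impairment of certain "
               "assets and restructuring charges led to a decline in operating margin.")
print(round(mda_tone(mda_excerpt), 4))  # negative value: cautionary wording dominates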

Over the course of the history of world economic development, the economies of different countries and regions have become more and more interconnected and interdependent. Nowadays, one of the biggest economies affecting most of the world is the U.S. economy. Several of the world’s financial crises have had their roots in the U.S., and it is evident that the financial health of the U.S. economy is vital for the world as a whole. Consequently, the U.S. economy is constantly on the radar of numerous analysts who try to gain an understanding of possible future economic developments in the U.S. and make data-driven conclusions based on these predictions. In particular, the Federal Reserve System (Fed) of the U.S. has a significant role in shaping global monetary and financial conditions. Therefore, any news, reports, communications and documentation published by the Fed are key indicators for all players in the broader financial system. Especially when it comes to FOMC (Federal Open Market Committee) meetings and the decisions the FOMC makes regarding the federal funds rate, financial markets react very quickly and strongly to whatever decision has been made. There are numerous tools in use for analyzing FOMC meetings, and recently NLP has joined the toolkit. In the article “FedNLP: An interpretable NLP System to Decode Federal Reserve Communications” (Lee et al.), Jean Lee and others have used a multi-component NLP system which gives end-users the ability to gain a general idea of the course of a meeting and to draw conclusions from this information without having to work through the lengthy and complex language of Fed communications. It is worth emphasizing that the value of the work lies in the simplicity with which the findings of the analysis are presented to end-users. It is also important to mention that the end-users are assumed to have a background in finance and no programming skills. This quality is achieved through the extensive use of a graphical interface as the means of communicating the analysis carried out. Another feature that makes the analysis more holistic is the use of different forms of communication, such as reports, post-meeting minutes, members’ speeches, press releases and others. The main objective of the analysis is to predict the direction of change of the federal funds rate (FFR). Some of the methods in use are sentiment analysis, summarization, topic modeling and others. For making predictions, the best-performing model was XGBoost with TF-IDF features, which achieved an accuracy of 0.73 and an F1 score of 0.66.
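The following minimal Python sketch shows the general shape of such a prediction pipeline, assuming the scikit-learn and xgboost packages are available: TF-IDF features extracted from FOMC-style sentences feed an XGBoost classifier that predicts the direction of the federal funds rate. The sentences, labels and hyperparameters below are invented placeholders, not the FedNLP dataset or configuration.

from sklearn.feature_extraction.text import TfidfVectorizer
from xgboost import XGBClassifier

# Invented placeholder sentences standing in for FOMC communications.
documents = [
    "The Committee decided to lower the target range amid weakening activity",
    "Economic activity is rising strongly and the target range was raised",
    "The Committee judges that the current stance of policy remains appropriate",
    "Inflation pressures warrant a further increase in the target range",
    "Downside risks prompted the Committee to reduce the target range",
    "The Committee will maintain the target range and monitor incoming data",
]
labels = [0, 2, 1, 2, 0, 1]  # placeholder encoding: 0 = cut, 1 = hold, 2 = hike

# TF-IDF features over unigrams and bigrams (an illustrative choice).
vectorizer = TfidfVectorizer(ngram_range=(1, 2), stop_words="english")
X = vectorizer.fit_transform(documents)

# Gradient-boosted classifier; hyperparameters here are arbitrary.
model = XGBClassifier(n_estimators=50, max_depth=3, learning_rate=0.3)
model.fit(X, labels)

new_statement = ["The Committee anticipates that an increase in the target range will soon be appropriate"]
print(model.predict(vectorizer.transform(new_statement)))

In practice the value of such a system lies less in the classifier itself than in how its output is summarized for finance professionals without programming skills, which is exactly the emphasis of the FedNLP work.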

There are numerous papers scrutinizing FOMC statements, and the Federal Reserve System itself has initiated conferences and research on this topic in recent years. One article of particular interest is “Hanging on every word: Semantic analysis of the FOMC’s post meeting statement” (Acosta and Meade). The analysis was conducted on FOMC statements from 1999 to 2014. The article describes how the language of FOMC statements has changed throughout this period. Nowadays, an untutored eye would not see much difference in language from one statement to the next. In fact, not many words are changed in the main text of the statement. Nevertheless, much has changed in the formulation of the statements since post-meeting statements were first issued. A more detailed analysis conducted by the authors suggests that the semantic persistence of the statements varies much more than it might seem. While the early statements are written at a reading grade level of 9 to 14 years of schooling, the more recent statements are written at a reading grade level of three years beyond a four-year college degree. The main methods used in the analysis are the semantic similarity of FOMC statements (cosine similarity), the TF-IDF methodology and other common NLP methods. One distinctive feature of the analysis is its more thorough examination of the FOMC statement following the 2008 Global Financial Crisis. Various NLP methods show how dissimilar the December 2008 FOMC statement is from the preceding statements.
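A minimal sketch of the similarity measure mentioned above might look as follows: successive statements are represented as TF-IDF vectors and compared with cosine similarity, so that an unusually low score flags a break in language such as the December 2008 statement. The statement texts here are heavily shortened placeholders.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Heavily shortened placeholder statements, in chronological order.
statements = [
    "The Committee decided today to keep its target for the federal funds rate unchanged",
    "The Committee decided today to keep its target for the federal funds rate at 1 percent",
    "The Committee decided to establish a target range for the federal funds rate of 0 to 1/4 percent",
]

tfidf = TfidfVectorizer(stop_words="english")
X = tfidf.fit_transform(statements)

# Similarity of each statement to the one immediately preceding it;
# a sharp drop would flag a break in language such as December 2008.
for i in range(1, len(statements)):
    sim = cosine_similarity(X[i - 1], X[i])[0, 0]
    print(f"statement {i - 1} vs {i}: cosine similarity = {sim:.2f}")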

Another intriguing article on this topic has recently been published by the Federal Reserve Bank of Kansas City: “How You Say It Matters: Text Analysis of FOMC Statements Using Natural Language Processing” (Doh et al.). The analysis is quite unique in nature, as are its findings. In particular, it concludes that the quantitative decision regarding the target policy rate is as important as the assessment of risk in the economy. Besides, the analysis identifies the tone of a post-meeting statement by quantifying how semantically close it is to alternative versions of the statement, whose more or less accommodative tones can be determined from the rationale given for each alternative in FOMC documents. The recent COVID-19 crisis only confirms the points made in the article. We now live in times when market participants look every day for clues as to when exactly tapering will start and when exactly the Fed will raise the target policy rate. Those who manage to predict these moments correctly will at least not bear the associated costs, if not benefit vastly. Since the first signs of the end of the COVID recession, the Fed has been bombarded with questions regarding the start of tapering and rate hikes, in response to which the public receives only qualitative information containing phrases like “sufficient levels”, “substantially improved” and so on. Therefore, now more than ever, Fed-watchers are looking for clues in the Federal Reserve’s communications via natural language processing methodologies.
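To illustrate the tone-placement idea in the simplest possible terms, the sketch below compares a released statement with hypothetical “more accommodative” and “less accommodative” alternative drafts and reads the tone off whichever alternative the statement is semantically closer to. All texts are invented placeholders, and the actual analysis in the article is considerably more elaborate.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Invented placeholder drafts with known tones, plus the released statement.
alternatives = {
    "more accommodative": "The Committee expects it will be appropriate to maintain an exceptionally low target range for an extended period",
    "less accommodative": "The Committee expects that further gradual increases in the target range will be appropriate",
}
released = "The Committee expects the target range to remain low for some time as the economy recovers"

tfidf = TfidfVectorizer(stop_words="english")
X = tfidf.fit_transform([released] + list(alternatives.values()))

# The released statement's tone is read off from whichever alternative
# it is semantically closer to.
scores = {name: cosine_similarity(X[0], X[i + 1])[0, 0] for i, name in enumerate(alternatives)}
print(scores)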

Conclusion. To conclude, this research raises important questions about how NLP can be used to transform large quantities of text into valuable insights for financial analysts, portfolio managers and economists, since data in such vast amounts is far beyond human capability to process. The applications of NLP discussed in this article show that, as with robotic process automation, reducing human error is very important and can help to reveal the threats or good intentions concealed in unstructured data.

Works Cited

1. Muthukumar, Pratyush, and Jie Zhong. “A Stochastic Time Series Model for Predicting Financial Trends using NLP”, 2021, arXiv:2102.01290.

2. Ravula, Sridhar. “Text analysis in financial disclosures”, 2021, arXiv:2101.04480.

3. Lee, Jean, et al. “FedNLP: An interpretable NLP System to Decode Federal Reserve Communications”, 2021, arXiv:2106.06247.

4. Acosta, Miguel, and Ellen Meade. “Hanging on every word: Semantic analysis of the FOMC’s post meeting statement”, Washington: Board of Governors of the Federal Reserve System, 2015, https://doi.org/10.17016/2380-7172.1580.


5. Doh, Taeyoung, et al. “How You Say It Matters: Text Analysis of FOMC Statements Using Natural Language Processing”, Federal Reserve Bank of Kansas City, 2021, vol. 106, no. 1, pp. 25-40.

