INFORMATION AND COMMUNICATION TECHNOLOGIES

UDC 004.7

Amadaev A.A., Ph.D. in Economics, Associate Professor,
Department of Humanities, Natural-Science and Social Disciplines, Medical Institute,
Chechen State University, Grozny, Russia
Dasaev D.R., postgraduate student,
Chechen State University, Grozny, Russia

NETWORKS OF CONNECTIVITY BETWEEN WEB PAGES: DESIGN AND PROCEDURE

Abstract: In the few years since its beginning, the World Wide Web (WWW) has grown dramatically in the number of users, servers, and its geographical distribution. It has arisen as a social medium that people use to create, maintain, and become part of communities. These technologies for the first time allowed people of all categories, regardless of economic status, age and background, to feel open to the new modern world. Indeed, the development of Internet technologies has brought convenience and benefits to the world.

Keywords: Internet technology, design, web page, analysis, communication.
1. Reading web pages
To read web pages, Python has a library called urllib, which makes getting data from the Internet quite straightforward. urllib makes it easy to retrieve data from web pages and to process them, since it returns the page as a string variable. It offers a simple interface in the form of the urlopen function, which reads the entire HTML of a web page into a single string, and it is capable of fetching URLs using a variety of different protocols. We simply indicate the web page we would like to retrieve, and urllib handles all of the HTTP protocol details [1].
After successfully executing the code in Fig.2 we get the web page as a string:

firstPage = urlopen(currURL).read()

Fig.2 Code snippet for reading a web page
Once the web page is opened with urllib.urlopen, we can treat it like a file and read through it. When the program runs we only see the content of the file in the output; the urllib code returns the data to us [2].
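A minimal, self-contained version of this step might look as follows. The starting URL and variable names are illustrative assumptions, not the authors' original code; on Python 3 the same function is available as urllib.request.urlopen.

## a minimal sketch of reading a web page into a string (Python 2 style,
## matching the urllib calls used in this section)
from urllib import urlopen

currURL = "http://www.example.com"    # hypothetical page to retrieve
firstPage = urlopen(currURL).read()   # the whole HTML document as one string
print(len(firstPage))                 # inspect how much data was received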
2. Regular expressions
One efficient way to parse data from web pages is to use regular expressions. Python provides sufficient means for searching for and extracting strings that match a particular pattern. To match and extract the link values from the web page we have built regular expressions as follows:
## perform string matching with regular expressions
## find the title of the web page by looking for the <title> tag
patFinderTitle = re.compile('<title>(.*)</title>')
## find all the links in the web page by looking for the anchor tag
patFinderLink = re.compile('<a href="http://w{3}?\.?(.*?)">')

Fig.3. Regular expressions for extracting links from web pages.
Our regular expression looks for strings that start with href="http://" followed by a capturing group such as "(.*?)". The question mark appended to the quantifier indicates that the match is to be done in a non-greedy way, which tries to find the smallest possible matching string. The parentheses in the code provided above indicate which part of our matched string we would like to extract [3].
The compiled pattern, used with the findall method, gives us all of the matches for a pattern. The code provided in Fig.3 shows the patterns which can be used to match all the links in a given input string, where the input string is the web page. It contains both internal and external links; however, we are interested only in external links. The following figure shows code which filters out only the external links by checking whether the full URL of the link contains the domain name of the page which is currently being analysed [4]:
Fig.4. Code snippet for filtering external links
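The code behind Fig.4 did not survive in this copy; the following sketch illustrates the filtering rule described above. The names domainName, links and externalLinks, and the example values, are assumptions rather than the authors' original identifiers.

## keep only external links: a link is treated as internal if its full URL
## contains the domain name of the page currently being analysed
domainName = "youtube.com"                  # hypothetical domain of the current page
links = ["youtube.com/inbox", "vimeo.com"]  # hypothetical list of extracted link values
externalLinks = [l for l in links if domainName not in l]
print(externalLinks)                        # ['vimeo.com']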
For instance, the youtube.com web page has a link youtube.com/inbox. The most obvious way of classifying it as an internal or external link is the method described above: youtube.com is the domain name, and youtube.com/inbox contains the full domain name, which makes it an internal link.
The findall regular expression method shown in the following figure gives us a list of all external and internal links found in the anchor tags of the page.
findPatTitle = re.findall(patFinderTitle, webpage)
findPatLink = re.findall(patFinderLink, webpage)

Fig.5 Executing string matching using regular expressions

Regular expressions work well when the HTML is well formatted. But if there are some poorly formatted links, regular expressions alone may not be enough to solve the problem. The problem can be solved by using a robust HTML parsing library.
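One possible example of such a library is BeautifulSoup; it is only an illustration here and is not part of the program described in this article. It tolerates poorly formatted markup while still extracting titles and anchor tags.

## illustration of a robust HTML parsing library (an assumption: BeautifulSoup
## is not used in the original program)
from bs4 import BeautifulSoup

webpage = '<title>Example</title><a href="http://example.com/a">a<a href="http://example.com/b">'
soup = BeautifulSoup(webpage, "html.parser")          # malformed anchors are still handled
links = [a.get("href") for a in soup.find_all("a")]   # href values of all anchor tags
title = soup.title.string if soup.title else None     # the page title, if present
print(title, links)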
3. Trees
The proposed research is based on building a web scraper in Python to extract data from web pages in order to compare the number of links to the number of pages and assess the efficiency and quality of connectivity between web pages. The search engine is developed using a recursive tree traversal method with a depth-first search algorithm. Tree traversal methods are used in many real-life applications such as searching, sorting, encryption, etc. [5].
To retrieve the data stored in a tree structure there are three types of tree traversal algorithms:
Preorder tree traversal algorithm
Inorder tree traversal algorithm
Postorder tree traversal algorithm
A binary tree can be traversed with the preorder algorithm using three steps:
Step1: Visit the root node
Step2: Traverse the left subtree of the node in preorder
Step3: Traverse the right subtree of the node in preorder
According to Chandra Mohan (2008), the preorder tree traversal algorithm is a backtracking procedure in which the visit begins at the root node and then descends recursively through its left branch through all leftmost nodes until the visit ends at the leftmost bottom node. Once the leftmost bottom node is reached, the visit backtracks to its immediate preceding level, and then descends through the right branch from there. This process is applied recursively until the rightmost bottom node is visited. We can summarize by saying that in this traversal method the root node is visited first and the rightmost bottom node is visited last [6].
Fig. 6. The nodes in this tree, visited using the preorder algorithm, are listed below: F - B - A - D - C - E - G - I - H
The inorder tree traversal algorithm traverses a binary tree using the following three steps:
Step1: Traverse the left subtree of the node in inorder
Step2: Visit the root node
Step3: Traverse the right subtree of the node in inorder
In the binary tree shown in Fig.7, the leftmost bottom node (A) is visited first. Then its immediate preceding node (B) is visited, then the left node (C) of its immediate right node (D), and then the node (D) itself. The order then goes to E and then to F, and the remaining order is G - H - I. So the complete result is: A - B - C - D - E - F - G - H - I.
Fig. 7. Binary tree (inorder traversal)
A binary tree is traversed with the postorder algorithm using the following three steps:
Step1: Traverse the left subtree of the node in postorder
Step2: Traverse the right subtree of the node in postorder
Step3: Visit the root node
Fig.8 Binary tree (postorder traversal)
The leftmost bottom node (A) in the binary tree shown in Fig.8 is visited first. Then the left node (C) is visited, then its immediate right node (E), then their parent node (D), and then node (B). This process continues until the root node (F) is included at the end of the traversal [7].
The postorder result for the binary tree in Fig.8 is: A - C - E - D - B - H - I - G - F.
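The three traversal orders discussed above can be reproduced with a short sketch. The Node class and function names below are illustrative assumptions; the tree itself is the one shown in Fig.6-Fig.8.

## illustrative sketch of the three traversal algorithms on the tree from Fig.6-Fig.8
class Node(object):
    def __init__(self, label, left=None, right=None):
        self.label, self.left, self.right = label, left, right

def preorder(node):   # root, left subtree, right subtree
    if node is None:
        return []
    return [node.label] + preorder(node.left) + preorder(node.right)

def inorder(node):    # left subtree, root, right subtree
    if node is None:
        return []
    return inorder(node.left) + [node.label] + inorder(node.right)

def postorder(node):  # left subtree, right subtree, root
    if node is None:
        return []
    return postorder(node.left) + postorder(node.right) + [node.label]

## the binary tree used in the figures
root = Node('F',
            Node('B', Node('A'), Node('D', Node('C'), Node('E'))),
            Node('G', None, Node('I', Node('H'))))

print(preorder(root))   # ['F', 'B', 'A', 'D', 'C', 'E', 'G', 'I', 'H']
print(inorder(root))    # ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I']
print(postorder(root))  # ['A', 'C', 'E', 'D', 'B', 'H', 'I', 'G', 'F']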
In this research, the recursive tree traversal method with a depth-first preorder search algorithm is used to extract data from web pages. Depth-first search is a technique which has been widely used in finding solutions to problems in artificial intelligence; basically, it is an algorithm which searches a tree. Prehn and Toetenel (1991) point out that depth-first search explores a path all the way, going deeper and deeper, before backtracking and finding another path [8].
In the proposed research, to extract data from web pages this algorithm works by visiting web sites and extracting the external links from them. It recursively visits a number of external links of each link in the first list. Moreover, the search engine allows us to keep track of the number of external links in the visited URLs and of the list of distinct web sites we have visited. We can determine how many links we have visited in order to evaluate the functionality and efficiency of a web site. The program also keeps track of the dead links we encountered and of the links to web sites we have already visited, in order to determine the reliability of a web page. A dead link is a link on a web page that points to another web page that is permanently unavailable or does not work. No one wants dead links on their page: nothing makes a person leave a page faster than encountering a dead link, and web sites with dead links are often considered unprofessional and unreliable.
Fig.9 Illustration of the depth-first search algorithm for web sites, where HP - home page, EL - external links. The numbers indicate the order of traversal.
From Fig.9 above it is clearly seen that the search engine first visits the home page of the web site, then goes through all the leftmost external links until the visit ends at the leftmost bottom external link. Then the visit backtracks to its immediate preceding level, and then descends through the right branch from there. This process is applied recursively until the rightmost bottom external link is visited. We can summarize by saying that in this traversal method the home page of the web site is visited first and the rightmost bottom external link is visited last [9].
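A condensed sketch of such a recursive depth-first crawler is shown below. It is only an illustration of the procedure described in this section: the function and variable names (crawl, maxDepth, maxWidth, visited, deadLinks), the depth and width limits, and the simplified link pattern are assumptions rather than the authors' original code.

## illustrative sketch of the recursive depth-first crawler (Python 2 style,
## matching the urllib calls used earlier)
import re
from urllib import urlopen

maxDepth, maxWidth = 3, 5                              # how deep and how wide the traversal goes
patFinderLink = re.compile('<a href="http://(.*?)"')   # simplified link pattern

visited, deadLinks = set(), []

def crawl(url, depth=1):
    if depth > maxDepth or url in visited:
        return
    visited.add(url)
    try:
        page = urlopen("http://" + url).read()
    except IOError:                            # unreachable page: record it as a dead link
        deadLinks.append(url)
        return
    domainName = url.split("/")[0]             # domain of the page currently being analysed
    links = patFinderLink.findall(page)
    external = [l for l in links if domainName not in l]
    for link in external[:maxWidth]:           # follow at most maxWidth external links (preorder)
        crawl(link, depth + 1)

crawl("www.example.com")
print(len(visited), "pages visited,", len(deadLinks), "dead links")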
4. Analysis
Fig.10 Table showing the topology of the resulting analysis tree
To analyse the structure of the network between web sites we introduce the notions of steps and width. Steps represent how deep the traversal goes during the scraping process. In Fig.10 it can be seen that a given page can have many external links, but the maximum width constant defines how many of them will be accessed. To investigate the network connectivity between web pages it is useful to calculate the average number of links for each step of the analysis. The following formula is used to calculate the average number of external links for the i-th step.
avg_i = n_i / w^(i-1), where n_i is the total number of links found at step i, w is the maximum width, and i ∈ [1..k] is the number of the step.
The following figure shows how the above formula is implemented programmatically.
## calculate the average number of external links for the current step of the analysis

Fig.11 Code snippet for calculation of the average number of links.
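The code of Fig.11 is not legible in this copy; a sketch consistent with the surviving comment and the formula above might look like this. The names numOfLinksAtStep, step and maxWidth, and their values, are assumptions used only for illustration.

## calculate the average number of external links for the current step of the analysis
numOfLinksAtStep = 42                    # n_i: external links found at the current step
step = 3                                 # i: number of the current step
maxWidth = 5                             # w: maximum number of links followed per page
pagesAtStep = maxWidth ** (step - 1)     # pages visited at step i
avgLinks = float(numOfLinksAtStep) / pagesAtStep
print(avgLinks)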
5. Displaying results
Matplotlib is the most widely used library for high-quality plotting, with support for a wide array of 2D and 3D plot types, precise layout control, a built-in LaTeX typesetting engine for label equations, and publication-quality output in all major image formats. The Chaco library is often used for building graphical interactive interfaces that tightly couple 2D data visualization to user controls. For high-end data visualization in three dimensions, Mayavi provides both a rich GUI to the powerful Visualization Toolkit (VTK) libraries and an easy-to-use Python library. As Prabhu Ramachandran and Gael Varoquaux describe in "Mayavi: 3D Visualization of Scientific Data", Mayavi wraps much of VTK's complexity in high-level objects that are easy to use for common tasks and directly support NumPy arrays. Finally, the VisIt (https://wci.llnl.gov/codes/visit) and ParaView (www.paraview.org) projects provide comprehensive visualization systems with parallel rendering support and rich feature sets that users can control and extend in Python (Wang and Hawk, 2011).
From the variety of available visualization software, matplotlib was chosen as the optimal solution [10].
## initialize variables required for plotting of graphs
bins = max(listNumOfLinks)
t = arange(0, bins + 1)
s = []
## calculate the number of pages having a given number of external links
for element in t:
    s.append(listNumOfLinks.count(element))
## plot the previously calculated list
ax = subplot(111)
ax.plot(t, s)
xlabel('Number of external links')
ylabel('Number of web sites')
title('Number of web sites against number of external links in them')
## display the plot
show()
Fig.12 Code snippet for setting up and executing matplotlib visualization procedures.
Fig.12 shows the code required to visualize the number of web sites against the number of external links in them. A description of the code lines in Fig.12 is provided in the comments.
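The snippet in Fig.12 assumes that names such as arange, subplot, xlabel and show have already been imported; a plausible (assumed) preamble, not visible in the original figure, is:

## imports assumed by the snippet in Fig.12
from numpy import arange
from matplotlib.pyplot import subplot, xlabel, ylabel, title, show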
References:
1. Sarawagi, S. (2008). Information extraction. Foundations and Trends in Databases 1(3), 261-377.
2. Summerfield, M. (2009). Programming in Python 3 (2nd ed.). Addison-Wesley Professional. ISBN 978-0321680563, pp. 36-37.
3. Tanaka, M. and Ishida, T. (2010). Ontology extraction from tables on the web. In SAINT '06: Proc. of the International Symposium on Applications and the Internet. IEEE Computer Society, Washington, DC, USA, 284-290.
4. Wang, P., Hawk, W. B., and Tenopir, C. (2011). Users' interaction with World Wide Web resources: an exploratory study using a holistic approach. Inf. Process. Manage. 36, 229-251.
5. Weikum, G. (2009). Harvesting, searching, and ranking knowledge on the web: invited talk. In WSDM '09: Proc. of the Second ACM International Conference on Web Search and Data Mining. ACM, New York, NY, USA, 3-4.
6. Winograd, T. (2010). Understanding Natural Language. Academic Press, Inc., Orlando, FL, USA.
7. Zanasi, A. (2009). Competitive intelligence through data mining public sources. In Competitive Intelligence Review, Vol. 9. Wiley, New York, NY, USA, 44-54.
8. Zhai, Y. and Liu, B. (2009). Web data extraction based on partial tree alignment. In WWW '05: Proc. of the 14th International Conference on World Wide Web. ACM, New York, NY, USA, 76-85.
9. Zhai, Y. and Liu, B. (2010). Structured data extraction from the web based on partial tree alignment. IEEE Trans. on Knowl. and Data Eng. 18(12), 1614-1628.
10. Zhao, H. (2010). Automatic wrapper generation for the extraction of search result records from search engines. Ph.D. thesis, State University of New York at Binghamton, Binghamton, NY, USA. Adviser: Meng, Weiyi.
UDC 338

Amiragyan L.M., second-year master's student, Institute of Management in Economic, Ecological and Social Systems
Pantykin D.S., second-year master's student, Institute of Computer Technologies and Information Security
Academy of Engineering and Technology, Southern Federal University, Taganrog, Russia

CHOOSING A CMS FOR E-COMMERCE

Abstract: The article reviews popular online store management systems and provides a comparative description of each system.

Keywords: CMS, Magento, WooCommerce, PrestaShop, the Internet, business, electronic commerce, online shop.

When planning to open an online store, every entrepreneur faces the question of how to proceed: choose a ready-made solution, or hire a development team and wait for the project to be completed. In today's market the second option is not always competitive, because the high cost of developing an IT product is a limiting factor for small businesses and novice entrepreneurs.

At the same time, the option of a boxed solution is very attractive, since it provides broad functionality, support from the developers, and ease of mastering the product. Let us consider the most popular store management systems: Magento, PrestaShop and WooCommerce.

If an entrepreneur is looking for a free ready-made solution that would allow the store to be modified and extended entirely on his own, without the services of developers, relying on the necessary minimum of modules, the choice would fall on PrestaShop.

Among the advantages of this system are the variety of modules (both paid and free), clear documentation, and stable operation over a slow Internet connection. Unfortunately, PrestaShop is not without drawbacks.

Many developers create modules for PrestaShop, and each of them works in his own unique style; because of this, the CMS architecture