
Section 6. Information technology

Khamdamov Utkir Rakhmatillaevich, Ph.D., docent of the Department of Hardware and Software of Control Systems in Telecommunication, Tashkent University of Information Technologies named after Muhammad al-Khwarizmi, Republic of Uzbekistan
E-mail: utkir.hamdamov@mail.ru

Mukhiddinov Mukhriddin Nuriddin ugli, senior researcher, Department of Hardware and Software of Control Systems in Telecommunication, Tashkent University of Information Technologies named after Muhammad al-Khwarizmi, Republic of Uzbekistan
E-mail: mmuhriddinm@gmail.com

Mukhamedaminov Aziz Odiljon ugli, senior researcher, Department of Hardware and Software of Control Systems in Telecommunication, Tashkent University of Information Technologies named after Muhammad al-Khwarizmi, Republic of Uzbekistan
E-mail: azizusmonov1992@gmail.com

Djuraev Oybek Nuruddinovich, senior researcher, Department of Hardware and Software of Control Systems in Telecommunication, Tashkent University of Information Technologies named after Muhammad al-Khwarizmi, Republic of Uzbekistan
E-mail: djuraev@bk.ru

A NOVEL METHOD FOR EXTRACTING TEXT FROM NATURAL SCENE IMAGES AND TTS

Abstract: Extracting text characters from natural scene images is a challenging problem due to variations in text form, font, size, orientation, and alignment, and due to complicated backgrounds. The text information available in images and video carries valuable data for content-based information indexing and retrieval, sign translation, and intelligent driving assistance. In natural scene text extraction, nearby character grouping and character stroke orientation methods are applied to search natural scene images for regions containing text strings. We propose a novel system that converts the extracted text data into audio. First, we detect an object-of-interest region in the natural scene image and then identify the text area within it. Next, we extract the text information and apply a text recognition method. Finally, we translate the recognized text into audio using TTS techniques in order to assist the visually impaired.

Keywords: Text detection, text extraction, visually impaired, natural scene text extraction, text to speech.

Introduction

Extracting text from pictures or videos is a long-standing challenge in numerous applications such as image indexing, document processing, image understanding, video content summarization, and video retrieval. In natural scene images, text usually appears on nearby signboards and hand-held objects and provides important awareness of the surrounding environment and objects. Natural scene images often suffer from low quality and low resolution, perspective distortion, and complicated backgrounds [1]. Natural scene text is hard to detect, extract, and recognize because it can appear at any slant or angle, under any lighting, on any surface, and may be partly occluded. Many methods for text detection in natural scene images have been introduced recently; to extract text captured by cameras in natural scenes, automatic and efficient scene text detection and recognition techniques are required. The main contribution of this paper is the combination of two proposed recognition designs. First, a character descriptor is proposed to extract representative and discriminative features from character patches; it combines several feature detectors (Harris corner, Maximally Stable Extremal Regions (MSER), and dense sampling) with Histogram of Oriented Gradients (HOG) descriptors [5]. Second, to produce a binary classifier for each character class in text retrieval, we propose an original stroke configuration derived from the character frame and outline to model character structure. The proposed technique merges scene text detection and scene text recognition methods.
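For illustration, the following Python sketch shows one way such a descriptor could be assembled with OpenCV: MSER proposes candidate character regions and a HOG vector is computed for each patch. The file name scene.jpg, the 32x32 patch size, and the HOG grid are illustrative assumptions, not the exact parameters of the proposed method.

    import cv2

    def character_descriptor(gray_patch):
        # Resize each candidate patch to a fixed size so the HOG vectors
        # of different detections are directly comparable.
        patch = cv2.resize(gray_patch, (32, 32))
        hog = cv2.HOGDescriptor((32, 32), (16, 16), (8, 8), (8, 8), 9)
        return hog.compute(patch).ravel()

    img = cv2.imread("scene.jpg")                  # illustrative input image
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # MSER proposes stable regions, many of which correspond to characters.
    mser = cv2.MSER_create()
    _, bboxes = mser.detectRegions(gray)
    descriptors = [character_descriptor(gray[y:y + h, x:x + w])
                   for (x, y, w, h) in bboxes]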

Related work

Modern optical character recognition (OCR) methods can achieve near-perfect recognition rates on printed text in scanned documents, but cannot reliably recognize text directly in camera-captured natural scene images and videos. Lu et al. [3] represented the inner character structure by building a dictionary of basic pattern codes to achieve character and word retrieval without OCR on scanned documents. Coates et al. [5] extracted local features of character patches with an unsupervised learning technique based on a variant of K-means clustering and merged them by cascading sub-patch features. In [8], a complete performance evaluation of natural scene text character recognition was carried out to produce a discriminative feature representation of natural scene text character structure. Weinman et al. [7] combined a Gabor-based appearance model, a language model of character co-occurrence frequency and character case, a similarity model, and a lexicon to perform scene character recognition. Neumann et al. [1] proposed a real-time scene text localization and recognition approach based on extremal regions. Smith et al. [2] built a similarity model of scene text characters based on SIFT and maximized the posterior probability of similarity constraints by integer programming.

Proposed method

Text detection and recognition are used to find text in natural scene images with complicated backgrounds. The system takes a text image as input and applies preprocessing to eliminate noise, converting the color image to a grayscale image. In the following step, a binarization method is applied, which enables efficient and accurate classification of the text that is passed to OCR. If part of the text information is lost during preprocessing, thinning and scaling are performed by a connectivity algorithm, after which we obtain connected text characters from the image; text recognition is then carried out. The proposed method is divided into three stages: the first and second stages are text detection and text recognition on the image, and the third stage is text character recognition. Text detection quickly extracts text regions in images with a very low false-positive rate. To obtain accurate recognition results, the test text image is segmented repeatedly, considering a different number of classes in the image each time.
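A minimal sketch of this preprocessing chain, assuming OpenCV with Otsu binarization and a median filter as one common choice (the paper does not fix a particular binarization algorithm or noise filter):

    import cv2

    img = cv2.imread("scene.jpg")                  # illustrative input image
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)   # color image -> grayscale
    gray = cv2.medianBlur(gray, 3)                 # suppress point noise

    # Otsu binarization; THRESH_BINARY_INV makes dark text white.
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

    # Connected-component analysis recovers connected text characters.
    n, labels, stats, _ = cv2.connectedComponentsWithStats(binary)
    components = [tuple(stats[i][:4])              # (x, y, w, h) per blob
                  for i in range(1, n)             # label 0 is background
                  if stats[i][cv2.CC_STAT_AREA] > 20]  # drop tiny noise blobs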

Figure 1. The flowchart of the proposed method

It is a demanding problem to automatically localize objects and text Regions of Interest (ROI) in captured natural scene images with cluttered environments, because text in such images is usually surrounded by outlier noise from diverse surroundings, and text characters appear in different fonts and colors [11]. Regarding content orientation, this work assumes that text lines in scene images stay within a limited range of vertical positions. Several techniques have been developed for localizing text areas in natural scene images; we can arrange them into two categories, component-based and operation-based. To identify and extract text from complex backgrounds with multiple and variable text patterns, we introduce a text localization method that combines region-based layout analysis with a horizontally trained text classifier, which computes characteristic maps based on stroke orientations and boundary arrangements in order to generate representative and discriminative text features that distinguish text characters from background outliers.

Object of Interest Detection:

The Object of Interest (OOI) detection step searches for the presence of text content in camera-captured natural scene images. Because of varied fonts, highlights, and cluttered backgrounds, accurate and fast text detection in scene images is still a challenging task. The procedure utilizes a character descriptor to segment text from an image. Initially, content is identified in multi-scale images using an edge-based system, morphological operations, and a projection profile of the image. The detected text regions are then verified using descriptor and wavelet features, which makes the algorithm robust to diversity in font style, size, and color. Vertical edges are detected with a predefined pattern, and the vertical boundaries are then grouped into text areas by a filtering process. Text normally appears in text lines composed of several characters of similar size rather than as single characters, and text strings are usually approximately horizontally aligned [12].
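One plausible reading of this edge-based step is sketched below with OpenCV; the Sobel operator for vertical edges, the closing-kernel size, and the blob filter are assumptions standing in for the unspecified predefined pattern and filtering process.

    import cv2

    gray = cv2.cvtColor(cv2.imread("scene.jpg"), cv2.COLOR_BGR2GRAY)

    # Character strokes respond strongly to a vertical-edge (x-gradient) filter.
    edges = cv2.convertScaleAbs(cv2.Sobel(gray, cv2.CV_16S, 1, 0, ksize=3))
    _, edges = cv2.threshold(edges, 0, 255,
                             cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # Morphological closing merges nearby vertical edges into line-level blobs.
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (15, 3))
    blobs = cv2.morphologyEx(edges, cv2.MORPH_CLOSE, kernel)

    # Filter blobs: text lines are typically wider than tall and not too small.
    contours, _ = cv2.findContours(blobs, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    text_regions = []
    for c in contours:
        x, y, w, h = cv2.boundingRect(c)
        if w > h and w * h > 300:                  # assumed heuristics
            text_regions.append((x, y, w, h))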

Text detection and extraction:

This step detects and extracts the text from camera-captured natural scene images, carefully separating the text on a nearby object held by the visually impaired user from the complex surroundings. Text localization finds the image regions that contain text, and text classification then transforms the image-based information into readable text. The characters are classified as they appear in the original input image; this is performed by multiplying the resulting values with a binary-transformed new image, and the final result is white text on a black background derived from that image. Text characters consist of strokes with constant or variable orientation as their basic structure. Here, we introduce a novel stroke-orientation feature to describe the local structure of text characters [13].

At the pixel level, the stroke direction is perpendicular to the gradient orientations at pixels on the stroke boundaries. To model text structure by stroke orientations, we propose a new operator that maps a stroke gradient feature to each pixel; it propagates the local structure of a stroke edge into its neighborhood through the gradient orientations.
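A hedged sketch of this per-pixel mapping, assuming OpenCV/NumPy (the edge-strength threshold and the 5x5 propagation neighborhood are illustrative choices, not the paper's operator):

    import cv2
    import numpy as np

    gray = cv2.cvtColor(cv2.imread("scene.jpg"), cv2.COLOR_BGR2GRAY)

    # Image gradients point across a stroke (perpendicular to its direction).
    gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
    grad_orient = np.arctan2(gy, gx)                   # gradient angle per pixel
    stroke_orient = (grad_orient + np.pi / 2) % np.pi  # stroke angle, [0, pi)

    # Keep orientations only at strong edges (stroke boundaries) ...
    mag = np.hypot(gx, gy)
    edge_mask = mag > 0.2 * mag.max()                  # assumed threshold
    orient_map = np.where(edge_mask, stroke_orient, 0.0)

    # ... then propagate each boundary value into its neighborhood with a
    # grayscale dilation, giving a dense per-pixel stroke-orientation map.
    kernel = np.ones((5, 5), np.uint8)                 # assumed neighborhood
    dense_map = cv2.dilate(orient_map, kernel)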

Text recognition and TTS:

A speaker notifies the user of the extracted text in the form of speech or audio; any kind of speaker can be used for the output. Text extraction is performed by a modern OCR engine before useful words are cropped from the extracted text regions. A text area specifies the minimum rectangle enclosing the characters within it, so the border of the text region touches the edge boundary of the characters. However, the method performs better if the text region is first given proper margins and binarized to segment the text characters from the background; thus each detected text region is enlarged by increasing its height and width by 10 pixels each. Both open- and closed-source solutions exist with APIs that handle the final stage of translation into character codes. The recognized text codes are recorded in script files, and the TTS module then loads the script files and produces the audio output of the text data.
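As one concrete realization of this final stage, the sketch below uses pytesseract and pyttsx3 as stand-ins for the unnamed OCR API and TTS engine, applying the 10-pixel margin before recognition; the function name and script-file path are hypothetical.

    import cv2
    import pytesseract   # Python wrapper around the Tesseract OCR engine
    import pyttsx3       # offline text-to-speech engine

    def region_to_speech(img, box, pad=10, script_path="output.txt"):
        x, y, w, h = box
        # Enlarge the detected region by the margin before binarization/OCR.
        roi = img[max(0, y - pad):y + h + pad, max(0, x - pad):x + w + pad]
        gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)
        _, binary = cv2.threshold(gray, 0, 255,
                                  cv2.THRESH_BINARY + cv2.THRESH_OTSU)

        text = pytesseract.image_to_string(binary)   # OCR on the clean patch

        # Record the recognized text in a script file, then speak it aloud.
        with open(script_path, "w", encoding="utf-8") as f:
            f.write(text)
        engine = pyttsx3.init()
        engine.say(text)
        engine.runAndWait()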

Conclusion and Future work

We have introduced a technique for extracting natural scene text from detected text regions that is compatible with both PC-based and mobile applications; the extracted text is also converted into audio. The method reads the text information on objects and informs visually impaired users about it. It detects text regions in natural scene images and extracts text information from the detected regions. In image text detection, analysis of color decomposition and horizontal alignment is conducted to search for image regions containing text strings. The system can efficiently distinguish the object of interest from the complex background and other objects in the camera view. Nearby character grouping is implemented to determine candidate text patches prepared for text classification, and an AdaBoost training model is employed to localize the text in camera-based images. Text extraction performs word recognition on the localized text regions and converts the result into audio output for visually impaired users. To model text character structure for the text retrieval scheme, we have designed a distinct stroke configuration map as a feature representation based on boundary and skeleton.

References:

1. Epshtein B., Ofek E., Wexler Y. "Detecting text in natural scenes with stroke width transform", in Proc. CVPR, 2010, P. 2963-2970.

2. Beaufort R., Mancas-Thillou C. "A weighted finite-state framework for correcting errors in natural scene OCR", in Proc. 9th Int. Conf. Document Anal. Recognit., 2007, P. 889-893.

3. Chen X., Yang J., Zhang J., Waibel A. "Automatic detection and recognition of signs from natural scenes", IEEE Trans. Image Process., Vol. 13, No. 1, 2004, P. 87-99.

4. Coates A. et al. "Text detection and character recognition in scene images with unsupervised feature learning", in Proc. ICDAR, 2011, P. 440-445.

5. Dalal N., Triggs B. "Histograms of oriented gradients for human detection", in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2005, P. 886-893.

6. de Campos T., Babu B., Varma M. "Character recognition in natural images", in Proc. VISAPP, 2009.

7. Smith R. "An overview of the Tesseract OCR engine", in Proc. Int. Conf. Document Anal. Recognit., 2007, P. 629-633.

8. Lucas S. M., Panaretos A., Sosa L., Tang A., Wong S., Young R. "ICDAR 2003 robust reading competitions", in Proc. Int. Conf. Document Anal. Recognit., 2003.

9. Zheng Q., Chen K., Zhou Y., Cong G., Guan H. "Text localization and recognition in complex scenes using local features", in Proc. 10th ACCV, 2010, P. 121-132.

10. Khamdamov U. "Algorithms for parallel bitmap image processing based on the Haar wavelet", in Proc. Int. Conf. on Information Science and Communications Technologies (ICISCT), Nov. 2017.

11. Yoon H., Kim B.-H., Mukhiddinov M., Cho J. "Salient region extraction based on global contrast enhancement and saliency cut for image information recognition of the visually impaired", KSII Trans. on Internet and Information Systems, Vol. 12, No. 5, May 2018.

12. Khamdamov U., Zaynidinov H. "Parallel algorithms for bitmap image processing based on Daubechies wavelets", in Proc. 10th Int. Conf. on Communication Software and Networks (ICCSN), 2018, P. 537-541.
