IT job offer analysis through data mining

This article presents the main reasons for starting a research project on the IT labour market. The research will be carried out in the second semester of 2005 and is financially supported by the Romanian Ministry of Education and Research through CNCSIS grant no. 1464.

Prof. Constanta Nicoleta Bodea,

Academia de Studii Economice din



Key words: labour market, text mining, data mining, educational system, educational programs

1. Introduction

With the exponential development of information technologies, the labour market in this domain has grown considerably. Although still expanding, the IT labour market already stands out among other domains because of the increasing demand for specialists in information technology.

Because hardware and software technologies evolve rapidly, the demand for professional competences in IT is also very dynamic: the required level of professionalism keeps rising and the specializations in this industry sector keep diversifying. The educational system must identify these changes and adapt continuously in order to offer graduates real chances of integration into economic life.

The dynamics of the IT labour market generate a series of malfunctions specific to the Romanian environment, in two main directions. On the one hand, rising requirements and the rising level of professionalism demanded generate a "competence export": well-trained young people migrate to countries where, for the same requirements, they are paid much better. On the other hand, the increasing diversification of the specializations requested by employers leads to a lack of well-trained personnel, especially in the new domains of IT.

The government program (2005-2008) attaches great importance to education policy ([3]). The government strategy aims to achieve high-quality education and to prepare the knowledge-based society, to turn education into a basic resource for Romania's modernization, and to develop lifelong education institutionally. The Romanian Government will ensure high-quality education by: treating education as the main force for change in technology, economy and administration and for promoting values within society; adapting the educational system to the needs of the local, national and European labour market; re-establishing the continuing training system for teaching staff and school managers; launching the program for education requested by firms and companies; re-correlating the continuing and initial training of teaching staff; and implementing a syllabus that includes new teaching methods and school pedagogies, within an education system centered on the pupil and student and oriented towards forming cognitive and action capacities. Aligning university scientific research with the needs of technology, economy and administration, and stimulating competition on the free market for textbooks and educational materials in order to increase their quality, are further means through which the government seeks to improve the educational process.

The emphasis that the government strategy (2005-2008) places on the educational requirements of employers on the labour market, together with the rapid development of Romanian websites that publish job offers, led our research team to study the Romanian on-line labour market using text mining and data mining techniques. www.munca.ro, hosted by Neogen, has a database containing over 20,000 subscribed employers, over 45,000 job seekers and over 1,000 active job offers ([7]). In one month, this site is visited by over 2,000 guests, who view over 20,000 job offers. www.bestjobs.ro, hosted by the same company, contains over 3,000 active job offers, of which over 700 are in the Software and technologies field ([4]).

The existence of this on-line labour market and, moreover, its prospects for further development justify researching, through text mining and data mining methods, the similarity between the job offers on the labour market and the educational offers of public and private universities. The large number of IT job offers compared to those in other domains allows enough data to be collected to refine the analysis in this domain.

2. Objectives, phases and activities

The main objective of this research is to analyze the information provided by electronic labour markets in order to identify solutions for improving the professional education of IT specialists. The main phases of the research are (Fig. 1):

1. Designing and carrying out the analysis of job offers through text mining and data mining techniques

a. Establishing data sources, i.e. identifying websites that present relevant job offers

b. Identifying data required for the analysis

c. Identifying preprocessing requirements

d. Identifying text mining and data mining algorithms to be applied on the data set

e. Selecting results evaluation methods

f. Programming text mining algorithms

g. Preprocessing

h. Knowledge extraction

i. Results evaluation.

2. Designing and carrying out the analysis of the academic IT educational programs of universities with IT specializations, and identifying solutions for improving the educational

Fig. 1 Phases of analysis process

Text collections are represented by databases containing job offers, each consisting of the job description, the employer's requirements, data about the employer, the posting date, the expiration date, etc. In the first phase, these collections are analyzed through text mining techniques in order to identify the main clusters of jobs in relation to the main requested key competences and to the job title. This analysis makes it possible to build a normalized database by finding keywords or significant values for each attribute. Data mining algorithms based on symbolic and neural computation will then be applied to this database. Text mining analysis is therefore a preprocessing phase for the data provided by the text collections. The sub-collections of text obtained through text mining will be analyzed manually in order to determine and document the categories of information to be stored in the database on which the data mining algorithms will be applied. The database shall have the following structure:

- Job code
- Job category
- Company code
- City
- Job domain
- Job type
- Job description
- Education level
- Personal abilities
- Driving license
- Offered salary
- Foreign languages
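The database structure listed above could be represented, for illustration, as a simple record type. The following Python sketch is only an assumption about how such a record might look: the field names mirror the attributes listed above, but the types, the defaults and the sample values are hypothetical and are not specified in the research plan.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class JobOffer:
    """One normalized job-offer record (types and defaults are assumptions)."""
    job_code: str
    job_category: str
    company_code: str
    city: str
    job_domain: str
    job_type: str
    job_description: str
    education_level: str
    personal_abilities: List[str] = field(default_factory=list)
    driving_license: Optional[bool] = None   # None when the offer does not say
    offered_salary: Optional[float] = None   # None when no salary is published
    foreign_languages: List[str] = field(default_factory=list)


# Hypothetical sample record, invented purely for illustration.
offer = JobOffer(
    job_code="J0001",
    job_category="Programmer",
    company_code="C0042",
    city="Bucharest",
    job_domain="Software and technologies",
    job_type="Full time",
    job_description="Java developer with SQL experience",
    education_level="University degree",
    foreign_languages=["English"],
)
```

Optional fields are left empty when an offer does not state them, which keeps missing values distinguishable from real ones during the later data mining step.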

The text mining algorithms will be programmed by the research team in Visual C++. The data mining process will be carried out with the analysis tools provided by the WEKA software and the MATLAB Neural Networks toolbox.
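As an illustration of the keyword-finding step described above, the following minimal Python sketch tags a job description with the competences it mentions and counts them over a collection. It is not the team's Visual C++ implementation: the vocabulary `COMPETENCE_KEYWORDS`, the function names and the sample descriptions are all hypothetical, and simple exact-token matching stands in for whatever text mining algorithm is finally chosen.

```python
import re
from collections import Counter

# Hypothetical competence vocabulary: lowercase token -> normalized label.
COMPETENCE_KEYWORDS = {
    "java": "Java",
    "sql": "SQL",
    "c++": "C++",
    "oracle": "Oracle",
}


def extract_competences(description):
    """Return the normalized competences mentioned in one job description."""
    # Keep '+' and '#' inside tokens so that names such as "c++" survive.
    tokens = re.findall(r"[a-z0-9+#]+", description.lower())
    found = []
    for token in tokens:
        label = COMPETENCE_KEYWORDS.get(token)
        if label is not None and label not in found:
            found.append(label)
    return found


def competence_counts(descriptions):
    """Count in how many descriptions each competence appears."""
    counts = Counter()
    for text in descriptions:
        counts.update(extract_competences(text))
    return counts


# Invented sample descriptions, used only to exercise the functions.
sample = [
    "Programmer with Java and SQL experience; C++ is a plus.",
    "Database administrator, Oracle and SQL required.",
]
counts = competence_counts(sample)
```

The resulting counts give one normalized value per competence, which is the kind of attribute value the normalized database needs.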

The goal of the neural network approach is to predict the demand for the main core competences associated with the most requested job categories, based on the evolution of the job offers. To achieve this goal, we will use a feed-forward network architecture for each of three categories of jobs. Each network will have the following architecture:

- an input layer with 5 units: 4 input units represent the number of posted jobs for the analyzed category in each of the last four decades (ten-day periods), and one input unit represents the number of posted jobs for the analyzed category in the same decade of the previous year;

- an output layer with one processing element, representing the number of posted jobs for the analyzed category in the current decade;

- a hidden layer, necessary for network training, with 5 processing elements (the number of neurons in the hidden layer results from several empirical design methods).
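The architecture described above (5 inputs, 5 hidden units, 1 output) can be sketched as a forward pass in plain Python. This is only an illustration under stated assumptions: the class name `FeedForward551`, the random weight initialization and its range are hypothetical, the logistic activation is assumed for both layers, and the actual networks are to be built with the MATLAB Neural Networks toolbox.

```python
import math
import random


def sigmoid(x):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-x))


class FeedForward551:
    """Feed-forward network: 5 inputs, 5 hidden units, 1 output (a sketch)."""

    def __init__(self, n_in=5, n_hidden=5, seed=0):
        rng = random.Random(seed)
        # Small random initial weights; the range is an arbitrary assumption.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                         for _ in range(n_hidden)]
        self.b_hidden = [0.0] * n_hidden
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b_out = 0.0

    def forward(self, inputs):
        """Compute the network output for one 5-element input vector."""
        hidden = [
            sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(self.w_hidden, self.b_hidden)
        ]
        return sigmoid(sum(w * h for w, h in zip(self.w_out, hidden)) + self.b_out)


net = FeedForward551()
# Inputs: four most recent decades plus the same decade of the previous year,
# already scaled to [0, 1] (invented values).
prediction = net.forward([0.2, 0.4, 0.1, 0.3, 0.5])
```

Because the output unit is a logistic sigmoid, the prediction always lies strictly between 0 and 1, so job counts would have to be scaled before training and scaled back afterwards.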

To simulate the job offers for the analyzed categories, we will use the MATLAB Neural Networks toolbox. In the training phase, each network will use 36 training sets, each containing as desired output the number of job offers for one of 36 decades, together with the 5 inputs corresponding to that output. The activation function of the neurons will be the logistic sigmoid, both in the hidden layer and in the output layer.
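The construction of the training sets can be sketched as follows. The sketch assumes that a decade is a ten-day period, so that a year contains 36 decades and "the same decade of the previous year" lies 36 positions back in the series; under that assumption, 72 decades of history yield exactly 36 training sets. The function names, the min-max scaling and the sample series are hypothetical. Scaling is included only because a logistic output unit produces values between 0 and 1.

```python
DECADES_PER_YEAR = 36  # assumption: one decade = ten days


def scale(counts):
    """Min-max scale job counts into [0, 1]."""
    lo, hi = min(counts), max(counts)
    span = (hi - lo) or 1  # avoid division by zero for a constant series
    return [(c - lo) / span for c in counts]


def build_training_sets(counts, n_recent=4, year_lag=DECADES_PER_YEAR):
    """Return (inputs, target) pairs with 5 inputs each.

    inputs = the n_recent preceding decades + the same decade one year earlier
    target = the job count of the current decade (all values scaled)
    """
    scaled = scale(counts)
    samples = []
    for t in range(year_lag, len(scaled)):
        recent = scaled[t - n_recent:t]
        same_decade_last_year = scaled[t - year_lag]
        samples.append((recent + [same_decade_last_year], scaled[t]))
    return samples


# Invented series of 72 decades (two years) of job counts for one category.
counts = [10 + (i % 36) for i in range(72)]
training_sets = build_training_sets(counts)
```

Each pair can then be presented to the network as one training example, with the first element as the 5 inputs and the second as the desired output.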

By applying data mining algorithms we intend to reveal interesting patterns in the data. In line with the purpose of this analysis, we will focus mainly on finding job patterns oriented towards certain key competences, as well as on identifying the competences specific to the main job clusters. In this way, academic curricula can be improved by including in the first cycle (degree specialization) the disciplines related to the identified key competences, and by including in the second cycle (master's degree specializations) the disciplines related to competences adjacent to the key competences, for the main job clusters.

3. Conclusions


