Программные продукты и системы /Software & Systems
№ 2 (110), 2015
DOI: 10.15827/0236-235X.109.095-097 Received 05.02.2015
DYNAMIC STRUCTURES IN RELATIONAL DATABASES
A.A. Poltavtsev, Ph.D. (Engineering), Associate Professor, [email protected] (Tver State Technical University, Nikitin Quay 22, Tver, 1 70026, Russian Federation)
Abstract. There is a set of problems in the development of applications for work with databases. The principal cause of these problems is in non-optimized SQL queries and stored procedures. To achieve good productivity, it is necessary to make SQL queries correctly, create (or delete) additional indexes, denormalize a database (in certain cases), shift a part of logic on triggers and stored procedures. It is necessary to maintain agreed algorithms structuring methods, structuring methods of data used in these algorithms and structuring methods (storage schemes construction) for these data in databases based on a relational model. This paper is devoted to generation and a manipulation data structures in relational databases whose components in programming languages are connected by explicit indexes. The author considers a specific, rather simple data structure. It is a linear unidirectional coherent list. Lists as trees are among the most basic data models used in computer programs. In a sense, lists are simple forms of trees because one can think of a list as a binary tree where every left child is a leaf. However, lists also present some aspects that are not special cases of what we have learned about trees. For instance, we shall talk about operations on lists such as pushing and popping that have no common analog for trees.
Keywords: data model, relational database, database design, data abstraction, data type, integrity constraints, data structuring.
There are close analogies between algorithms structuring methods and data structuring methods. As with all analogies, there are distinctions. However comparison of methods for programs structuring and data structuring can explain problems.
An elementary unstructured statement is r-value assignment to a variable. Its corresponding member in data structures is the scalar, unstructured type. Both are the atomic building blocks for composite statements and data types.
The simplest structures obtained using enumeration or sequencing are a compound statement and a structure record. Both consist of a finite (usually small) number of explicitly enumerated components which may be different from each other. If all components are identical, they do not need to be written out individually. For this case we use FOR statement and ARRAY structure to indicate replication by a known finite factor. A choice among two or more elements is expressed by the conditional or the CASE statement and by extensions of record types, respectively.
A repetition by an initially unknown (and potentially infinite) factor is implemented by WHILE and REPEAT statements. The corresponding data structure is the SEQUENCE (file), the simplest kind which allows constructing infinite cardinality.
The question is whether a data structure that corresponds to the PROCEDURE statement in a similar way exists. Naturally, the most interesting and novel property of procedures in this respect is recursion. Values of such a recursive data type should contain one or more components belonging to the same base type, in analogy to a procedure containing one or more calls to itself. Data type definitions should be directly or indirectly recursive as procedures.
The parallelism of corresponding programs and data structures is shown in the Table [1].
Use of data structures corresponding to algorithm simplifies software creation and increases its quality.
The “array”, “record”, and “set” structures were introduced as fundamental data structures [2]. They are called fundamental because they constitute building blocks to form more complex structures. They also ap-
pear most frequently in practice. In these structures definition the range of values assumed by these variables is fixed, as well as their storage pattern. Hence, variables declared in this way are said to be STATIC.
Construction Pattern Program Statement Data Type
atomic element assignment scalar
enumeration compound statement record
repetition (i=1 n) FOR array
choice IF union (variant record)
repetition WHILE sequence
recursion procedure statement recursive data
general graph GOTO structure linked by pointers
However, there are many problems which involve far more complicated information structures. The problems are that not only the values but also data structures change during computation. They are called DYNAMIC structures. Naturally, the components of such structures are at some level of resolution are static, i.e. of one of the fundamental data types.
The characteristic property of dynamic structures which clearly distinguishes them from fundamental structures (arrays, records, sets) is their ability to vary in size. Hence, it is impossible to assign a fixed amount of storage for a recursively defined structure.
In programming languages we use pointers for this purpose. The system itself responds for these pointers.
In databases where a relational model considers data as relations and links are implemented only using explicit data, there is a need to create more general data structures in addition to applied domain data. These general data structures should provide an opportunity to manipulate links to applied domain data.
The problem of mapping complex data structures in a database model is mentioned firstly by RDB founder. Unfortunately, it still has no unequivocal decision [3]. The most informative reviews can be found in [4, 5].
The simplest way to connect (to establish interrelation) the set of elements is arranging them in the form of
95
Программные продукты и системы /Software & Systems
№ 2 (110), 2015
a list or a queue. In this case, each element requires only one connection (link) to address to its descendant.
Knuth [6] is still a fundamental source on list data structures. While it is hard to trace the origins of very basic notions such as “list” or “stack,” the first programming language which used lists as a part of its data model was IPL-V (Newell et al. [7]). Although among the early list-processing languages only Lisp (McCarthy et al. [8]) has survived so far. Lisp, by the way, stands for “LISt Processing”.
The use of stacks in run-time implementation of recursive programs is discussed in detail in Aho, Sethi and Ullman [9]. Aho [10] surveys a number of algorithms involving matching of character strings.
When implementing a storage scheme of a linear unidirectional coherent list in relational database, it is necessary to provide performance of all typical operations with it: construction, insert/removal from a head/ end of the list, an insert/removal of intermediate elements.
Insertion: it is simply to insert an element after the specified one or before it which is more complicated. Removal: to remove an element which is before a specified one or an element which is specified itself that is more complicated. List traverse: the most often executable operation with this structure. It is used when it is necessary to apply the same operation to all list elements. Search of an element with a specified key: it is carried out by consecutive search of elements.
The simplest operation is an insertion of an element into the head of the list. It is necessary to insert an element specifying that its descendant is an element in the head of the list. Given operation is the elementary form of list creation: to create a list it is enough to repeat it a necessary number of times. Thus the order of elements sequence appears inverse to the order of their insertion: the element in a head of the list will be the last inserted. It is often not comprehensible and it is required to insert elements in the end of the list. The drawback of this approach is that the first element is generated by the algorithm distinct from other elements insert algorithm.
In programming languages to work with the list it is offered to have (and to store) an additional pointer in a list head (besides list structure). From these considerations in [11] it is offered to have a field “type” in the list storage table. This field should show whether the given element is the list head or not. The field accepts the following values: the first element of the list is 1, the last element of the list is 2, and the average is 0. If the list has only one element, then the given field value is 3.
In my opinion, an introduction of such a field brings redundancy to a database and it is a violation of relational approach principles. To determine a head of the list it is enough to organize operations with the list, so that the element with the smallest number (or the greatest) will be in a head of the list.
It is also necessary to distinguish two variants of using list structure: a) when the element number is synthetically entered data that is necessary only for implementation and b) when the element number has an applied sense (for example, it should be controlled by the information system programs developer). In first case it
is possible to use autoincremental type for “element number” attribute.
Relations for list structure storage:
a) list_1 ((id int IDENTITY(1,1)NOT NULL (PK), next_id int NULL (FK id),
data char(1) NOT NULL)
b) list (id int NOT NULL (PK), next_id int NULL (FK id), data char(1) NOT NULL)
Further, there are some SQL scripts examples for basic lists operations:
1. List formation by inserting an element in the end of the list:
a)
--1. first element insertion (’c’) insert into list 1 (data) values (’c’);
--2. subsequent element insertion
DECLARE @prev_id int;
select @prev id = MAX(id) from list 1 insert into list 1 (data) values (’d’);
DECLARE @next_id int;
select @next_id = MAX(id) from list_1
UPDATE list_1
set next id = @next id
WHERE id = @prev_id
b)
--1. first element insertion (11,’c’)
--insert into list (id, data) values (11,'c1)
--2. subsequent element insertion (12,’d’)
DECLARE @prev_id int;
SELECT @prev_id = id FROM list WHERE next_id IS NULL
DECLARE @ins_id int;
--SET @ins_id = 12;
insert into list (id,data,next id) values (15,’g’,NULL)
UPDATE list SET next_id = @ins_id WHERE id = @prev id
2. Inserting an element to the head of the list:
a) DECLARE @first_id int;
select @first_id = MIN(id) from list_1 SET IDENTITY_INSERT list_1 ON insert into list 1 (id,data,next id) values (@first_id - 1,’a’,@first_id)
SET IDENTITY_INSERT list_1 OFF
b) DECLARE @prev_id int;
SELECT @prev_id = MIN(id) FROM list DECLARE @ins_id int;
SET @ins_id = @prev_id-1;
insert into list (id,data,next id) values (@ins id,’a’,@prev id)
3. Inserting an element between elements of the list:
a) DECLARE @prev_data char(1);
SET @prev_data = ’e’;
DECLARE @next_data char(1);
SET @next_data = ’f’;
DECLARE @next_id int;
SELECT @next_id = id FROM list_1 WHERE data =
’f’;
insert into list 1 (data,next id) values (’z’,@next id)
SELECT @next_id = id FROM list_1 WHERE data =
’z’;
UPDATE list_1 SET next_id = @next_id WHERE data = ’e’
b) DECLARE @prev_id int;
SET @prev_id = 13;
DECLARE @next_id int;
SET @next_id = 14;
DECLARE @max_id int
select @max_id = MAX(id) FROM list insert into list (id,data,next id) values (@max id+1,’z’,@next id)
UPDATE list SET next_id = (@max_id+1) WHERE id = @prev id
Naturally, it would be better to form these scripts (and other necessary) as stored procedures. This can al-
96
Программные продукты и системы /Software & Systems
№ 2 (110), 2015
low a reliable storage of all functionality in the database server.
Data models are the abstractions used to describe problems. Data structures are the programming-language constructs used to represent data models. Algorithms are the techniques used to obtain solutions by manipulating data as represented by the abstractions of a data model, by data structures. When a data model of the language in which we are writing a program lacks a built-in representation for the data model of the problem at hand, we must represent the needed data model using the abstractions supported by the language. For this purpose, we study data structures which are methods for representing abstractions in the data model of a programming language, they are not an explicit part of that language. Different programming languages may have strikingly different data models.
This paper showed how a relational theory can be applied to clarify the meaning of certain constructs that are widely used to undertake logical modeling. In particular, we focused on the relationship construct which is central to many modeling methodologies, but seemingly difficult to use in a clear and unambiguous way. We considered relational structures and SQL operations on lists which are an important data model representing sequences of elements. Stacks and queues are important special types of lists.
References
1. Wirth N. Algorithms and Data Structures. Prentice/Hall Int. Publ., 1986, 288 p.
2. Cormen T.H., Leiserson Ch.E., Rivest R.L. Introduction to Algorithms. MIT Press and McGraw-Hill Publ., 2009, 1312 p.
3. Date C.J. SQL and Relational Theory, 2nd Ed. O'Reilly Media Publ., 2011, 448 p.
4. Celko J. Trees and Hierarchies in SQL for Smarties. Morgan Kaufman Publ., 2004, 225 p.
5. Poltavtseva М. Derevya i grafy v relyatsionnoy baze dannykh [Trees and graphs in a relational database]. LAP LAMBERT Academic Publ., 2012, 196 p.
6. Knuth D.E. The Art of Computer Programming. Vol. I. Fundamental Algorithms. Addison-Wesley Publ., Reading, Mass, 1968.
7. Newell A., Tonge F.M., Feigenbaum E.A., Green B.F., Mealy G.H. Information Processing Language-VManual. Prentice-Hall Publ., Englewood Cliffs, New Jersey, 1961.
8. McCarthy J. LISP 1.5 Programmer's Manual. MIT Computation Center and Research Laboratory of Electronics Publ., Cambridge, Mass, 1962.
9. Aho A.V., Sethi R., Ullman J.D. Compilers: Principles, Techniques, and Tools. Addison-Wesley Publ., Reading, Mass, 1986.
10. Aho A.V. Algorithms for finding patterns in strings.
Handbook of Theoretical Computer Science. Vol. A: Algorithms and Complexity. J. Van Leeuwen (Ed.), MIT Press, Cambridge, Mass, 1990.
11. Gladkov М., Shibanov S. Complex structures in relational databases. Otkrytye sistemy [Open systems]. 2004, no. 2, pp. 62-67 (in Russ.).
УДК 519.68 Дата подачи статьи: 05.02.15
DOI: 10.15827/0236-235X.110.095-097
ДИНАМИЧЕСКИЕ СТРУКТУРЫ В РЕЛЯЦИОННЫХ БАЗАХ ДАННЫХ
Полтавцев А.А., к.т.н., доцент, [email protected] (Тверской государственный технический университет, наб. Аф. Никитина, 22, г. Тверь, 170026, Россия)
В разработке приложений для работы с базами данных есть ряд проблем. Основная причина проблем приложений баз данных лежит в неоптимизированных SQL-запросах и хранимых процедурах. Чтобы добиться хорошей производительности, нужно правильно составлять SQL-запросы, создавать (или удалять) дополнительные индексы, в определенных случаях денормализовывать базу данных, перекладывать часть логики на триггеры и хранимые процедуры. Необходимо поддерживать согласованными методы структурирования алгоритмов, методов структурирования использующихся в этих алгоритмах данных и методов структурирования (построения схем хранения) этих данных в базах данных, основанных на реляционной модели. Работа посвящена генерации и манипулированию в РСУБД структурами данных, чьи компоненты в языках программирования связаны явными указателями. Рассмотрена специфическая, относительно простая структура данных - линейный однонаправленный связный список.
Ключевые слова: модель данных, реляционная база данных, разработка баз данных, абстракции данных, типы данных, ограничения целостности, структурирование данных.
Литература
1. Wirth N. Algorithms and Data Structures. Prentice/Hall Int. Publ., 1986, 288 p.
2. Cormen T.H., Leiserson Ch.E., Rivest R.L. Introduction to Algorithms. MIT Press and McGraw-Hill Publ., 2009, 1312 p.
3. Date C.J. SQL and Relational Theory, 2nd Ed. O’Reilly Media Publ., 2011, 448 p.
4. Celko J. Trees and Hierarchies in SQL for Smarties. Morgan Kaufman Publ., 2004, 225 p.
5. Полтавцева М. Деревья и графы в реляционной базе данных. LAP LAMBERT Academic Publishing. 2012. ISBN 9783-659-14890-3.
6. Knuth D.E. The Art of Computer Programming. Vol. I. Fundamental Algorithms. Addison-Wesley Publ., Reading, Mass, 1968.
7. Newell A., Tonge F.M., Feigenbaum E.A., Green B.F., Mealy G.H. Information Processing Language-V Manual. Prentice-Hall Publ., Englewood Cliffs, New Jersey, 1961.
8. McCarthy J. LISP 1.5 Programmer’s Manual. MIT Computation Center and Research Laboratory of Electronics Publ., Cambridge, Mass, 1962.
9. Aho A.V., Sethi R., Ullman J.D. Compilers: Principles, Techniques, and Tools. Addison-Wesley Publ., Reading, Mass, 1986.
10. Aho A.V. Algorithms for finding patterns in strings. Handbook of Theoretical Computer Science. Vol. A: Algorithms and Complexity. J. Van Leeuwen (Ed.), MIT Press, Cambridge, Mass, 1990.
11. Гладков М., Шибанов С. Сложные структуры в реляционных базах данных // Открытые системы. 2004. № 2. С. 62-67.
97