Mathematical Structures and Modeling 2015, N. 3(35), pp. 34–41
UDC 378
Why Big-O and Little-o in Algorithm Complexity: A Pedagogical Remark
O. Kosheleva
Ph.D. (Phys.-Math.), Associate Professor, e-mail: [email protected]
V. Kreinovich
Ph.D. (Phys.-Math.), Professor, e-mail: [email protected]
University of Texas at El Paso, El Paso, TX 79968, USA
Abstract. In the comparative analysis of different algorithms, O- and o-notions are frequently used. While their use is productive, most textbooks do not provide a convincing student-oriented explanation of why these particular notations are useful in algorithm analysis. In this note, we provide such an explanation.
Keywords: O(f(n)), o(f(n)), algorithm analysis.
1. Formulation of the Problem
O is ubiquitous in algorithm analysis. To gauge the speed of an algorithm, O-estimates are usually used: e.g., the algorithm requires time O(n), or O(n²), etc.; see, e.g., [1]. Often, o-estimates are also used.
Need for O and o is not clearly explained. In many textbooks, the need to consider O- and o-estimates is not clearly explained.
What we do in this paper. The main objective of this short paper is to fill this gap by providing the students with a simple and — hopefully — convincing explanation of why O and o estimates are natural.
2. Analysis of the Problem and the Explanation of O
What we really want. What we want is to estimate how fast the algorithm is, i.e., how much time T(x) it takes on different inputs x.
Worst-case and average computation time. In some situations, e.g., in automatic control, we need to make decisions “in real time”, i.e., within a certain period of time. For example, if an automatic car sees a suddenly appearing obstacle, it needs to compute the needed change in trajectory so as to have time to avoid the collision. In general, the computation time T(x) is different for different inputs x, so we want to be sure that for all these inputs, this computation time does not exceed a given threshold. In other words, we need to make sure that the worst-case time $\max\limits_x T(x)$ does not exceed this threshold.
Of course, the computation time depends on the size of the input: e.g., to answer a query, we sometimes need to look at all the records in the corresponding database. The larger the database, the more time it takes, so if we simply take the maximum over all possible inputs x, we get a meaningless infinity. So, to get a meaningful description of the algorithm’s speed, it is natural to limit ourselves to inputs of a given length (e.g., length in bits), i.e., to consider the worst-case time
$$T^w(n) \stackrel{\text{def}}{=} \max\{T(x) : \mathrm{len}(x) = n\}.$$
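Comment. To make this definition concrete, here is a minimal Python sketch; the toy algorithm and its cost model are our own illustration, not from the paper, and a step count is used as a stand-in for the actual time T(x):

```python
from itertools import product

def toy_algorithm(x):
    """Scan the bit tuple x; return (number of 1-bits, steps taken).
    The step count is a stand-in for the running time T(x)."""
    steps, ones = 0, 0
    for bit in x:
        steps += 1
        ones += bit
    if ones == len(x):        # an extra pass on the all-ones input,
        steps += len(x)       # so that T(x) actually varies with x
    return ones, steps

def worst_case_time(n):
    """T^w(n) = max{T(x) : len(x) = n}, by enumerating all 2^n inputs."""
    return max(toy_algorithm(x)[1] for x in product((0, 1), repeat=n))

print(worst_case_time(8))     # 16: the all-ones input takes two passes
```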
Once we know that we are within the time limit, a natural next thing to estimate is how much time overall we spend on the corresponding computations. This overall time is, in general, proportional to the number of times when we call our algorithm, so, in effect, what we want to estimate is the average computation time
$$T^{av}(n) \stackrel{\text{def}}{=} \sum_x p(x) \cdot T(x),$$
where p(x) is the frequency with which the input x happens among all inputs of the given length n = len(x).
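Comment. Continuing the toy sketch above (and reusing its `toy_algorithm` and `product`), the average time can be computed the same way; the uniform frequency p(x) = 2⁻ⁿ below is an assumption made purely for the demo:

```python
def average_time(n):
    """T^av(n) = sum_x p(x) * T(x) over all inputs of length n,
    with the uniform frequency p(x) = 2^{-n} assumed for this demo."""
    inputs = list(product((0, 1), repeat=n))
    p = 1.0 / len(inputs)                       # uniform frequency p(x)
    return sum(p * toy_algorithm(x)[1] for x in inputs)

print(average_time(8))   # 8.03125: the rare all-ones input barely matters
```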
How to estimate worst-case and average computation time: the main challenge. The actual time of an algorithm depends on which computer it is implemented on, since different computers have different times for different elementary operations (such as addition, multiplication, etc.).
As a result, if we compare algorithms by simply using worst-case and average computation times as defined earlier, we may get different results depending on the computer on which the two compared algorithms are implemented. It is desirable to come up with a way of comparing algorithms themselves, a way that would not depend on the underlying computer.
How to compare algorithms: main idea. We want to make the comparison of algorithms independent of the difference in times needed to perform different elementary operations on different computers.
If we knew the number of elementary operations of each type, we could simply:
• count the number $t_i(x)$ of operations of each type $i$,
• multiply this number $t_i(x)$ of operations by the computation time $w_i$ needed for a single operation of this type, and then
• add up the times needed for all the types:
$$T(x) = \sum_i w_i \cdot t_i(x).$$
Since we want a value that does not depend on the times $w_i$, let us fix an arbitrary time for each type; the simplest idea is to assume that each elementary operation of each type takes exactly one unit of time: $w_i = 1$. In this case, what we are doing is simply counting the overall number of elementary operations $t(x) = \sum_i t_i(x)$.
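Comment. A minimal Python sketch of this counting idea; the operation types, counts, and per-operation times below are illustrative assumptions, not a real machine model:

```python
def count_ops(x):
    """Return the counts t_i(x) of elementary operations, by type,
    for a toy scan of the bit tuple x."""
    t = {"compare": 0, "add": 0}
    for bit in x:
        t["compare"] += 1       # one comparison per element
        if bit == 1:
            t["add"] += 1       # one addition per 1-bit
    return t

def running_time(t, w):
    """T(x) = sum_i w_i * t_i(x) for per-operation times w_i."""
    return sum(w[i] * t[i] for i in t)

ops = count_ops((1, 0, 1, 1))
print(running_time(ops, {"compare": 1, "add": 1}))  # 7 = t(x), the unit-cost count
print(running_time(ops, {"compare": 2, "add": 5}))  # 23 on a slower hypothetical machine
```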
Comment. As one can easily check, the comparison results remain the same whether we use one unit of time for all types or we use different times for different types. Therefore, for simplicity, it is convenient to use the simplest option and take all the times equal to 1.
What is the relation between computation time and the overall number of operations. The overall number $t(x)$ of elementary operations on an input $x$ is equal to the sum
$$t(x) = \sum_i t_i(x),$$
where $t_i(x)$ is the total number of elementary operations of type $i$.
In these terms, the overall computation time is equal to
$$T(x) = \sum_i w_i \cdot t_i(x),$$
where $w_i > 0$ is the time needed for a single elementary operation of type $i$.
Let $m \stackrel{\text{def}}{=} \min(w_1, w_2, \ldots) > 0$ denote the smallest of the times $w_i$, and let $M \stackrel{\text{def}}{=} \max(w_1, w_2, \ldots)$ denote the largest of these times. Then, for every $i$, we have
$$m \le w_i \le M.$$
Multiplying all three parts of this double inequality by $t_i(x) \ge 0$, we conclude that
$$m \cdot t_i(x) \le w_i \cdot t_i(x) \le M \cdot t_i(x).$$
Adding up the terms corresponding to all possible types of elementary operations, we conclude that
$$\sum_i m \cdot t_i(x) \le \sum_i w_i \cdot t_i(x) \le \sum_i M \cdot t_i(x),$$
i.e., that
$$m \cdot t(x) \le T(x) \le M \cdot t(x).$$
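Comment. A small numerical illustration (the numbers are ours, for illustration only): suppose there are two operation types with $w_1 = 1$ ns and $w_2 = 3$ ns, so that $m = 1$ and $M = 3$, and suppose an input $x$ requires $t_1(x) = 100$ and $t_2(x) = 50$ operations, so that $t(x) = 150$. Then
$$T(x) = 1 \cdot 100 + 3 \cdot 50 = 250 \text{ ns}, \qquad m \cdot t(x) = 150 \le 250 \le 450 = M \cdot t(x).$$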
By taking the maximum over all inputs x of length n, we conclude that
$$m \cdot t^w(n) \le T^w(n) \le M \cdot t^w(n),$$
where we denoted
$$t^w(n) \stackrel{\text{def}}{=} \max\{t(x) : \mathrm{len}(x) = n\}.$$
Similarly, by taking the average over all inputs x of length n, we conclude that
$$m \cdot t^{av}(n) \le T^{av}(n) \le M \cdot t^{av}(n),$$
where we denoted
$$t^{av}(n) \stackrel{\text{def}}{=} \sum_x p(x) \cdot t(x).$$
These quantities $t^w(n)$ and $t^{av}(n)$ are known as, correspondingly, the worst-case and the average-case computational complexity.
How can we compare the algorithms. A natural way to compare the two algorithms is as follows. We say that an algorithm A is faster than an algorithm B if
• whenever we fix a computer running the algorithm B,
• we can find another computer on which — in the worst case or in the average case — the algorithm A is always faster
(assuming that in principle, we can always find computers which run as fast as we want).
For the worst-case computation time, this means that:
• for each computer running the algorithm B,
• we can find a computer for running the algorithm A for which $T_A^w(n) \le T_B^w(n)$ for all $n$.
For the average-case computation time, this means that:
• for each computer running the algorithm B,
• we can find a computer for running the algorithm A for which $T_A^{av}(n) \le T_B^{av}(n)$ for all $n$.
How can we describe these properties in terms of the worst-case and average-case computational complexity? To answer this question, let us consider the property that $T_A^w(n) \le T_B^w(n)$ for all $n$.
We know that $m_A \cdot t_A^w(n) \le T_A^w(n)$ and that $T_B^w(n) \le M_B \cdot t_B^w(n)$. Thus, the desired inequality $T_A^w(n) \le T_B^w(n)$ implies that $m_A \cdot t_A^w(n) \le M_B \cdot t_B^w(n)$ for all $n$, i.e., equivalently, that for all $n$, we have
$$t_A^w(n) \le C \cdot t_B^w(n),$$
where we denoted $C \stackrel{\text{def}}{=} \dfrac{M_B}{m_A}$.
Similarly, for the average-case complexity, we conclude that for all $n$, we have
$$t_A^{av}(n) \le C \cdot t_B^{av}(n)$$
for the same constant $C$.
In both cases, we have two functions $f(n) > 0$ and $g(n) > 0$ with the property that for some $C > 0$, we have $f(n) \le C \cdot g(n)$ for all $n$. This property is abbreviated as $f = O(g)$. In these terms, the above conditions can be described, correspondingly, as
$$t_A^w(n) = O(t_B^w(n)) \quad \text{and} \quad t_A^{av}(n) = O(t_B^{av}(n)).$$
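Comment. The property $f = O(g)$ can be spot-checked numerically. The following Python sketch is our illustration (the complexity functions are made up, not from the paper); a finite check can only support, never prove, the property:

```python
import math

def seems_big_O(f, g, C, n_max=10_000):
    """Check f(n) <= C * g(n) for n = 1, ..., n_max.
    Passing this finite check only suggests, never proves, f = O(g)."""
    return all(f(n) <= C * g(n) for n in range(1, n_max + 1))

# illustrative complexity functions, not taken from the paper
f = lambda n: 5 * n * math.log(n + 1)   # roughly n log n growth
g = lambda n: n * n                     # quadratic growth
print(seems_big_O(f, g, C=5))           # True, since log(n + 1) <= n for n >= 1
```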
Let us show that, vice versa, if one of these two properties is satisfied, then the algorithm A is faster than the algorithm B in the above sense. Indeed, let us assume that for some C and for all n, we have
$$t_A^w(n) \le C \cdot t_B^w(n).$$
We have assumed that we can select an A-computer which is arbitrarily fast, i.e., for which the largest per-operation time $M_A$ can be made as small as we wish. Let us use this assumption and select a computer with
$$M_A \le \frac{m_B}{C}.$$
For this choice, we have $C \cdot M_A \le m_B$.
Then, from $T_A^w(n) \le M_A \cdot t_A^w(n)$ and $t_A^w(n) \le C \cdot t_B^w(n)$, we conclude that
$$T_A^w(n) \le C \cdot M_A \cdot t_B^w(n).$$
Due to our choice of $M_A$, we have $C \cdot M_A \le m_B$ and thus, $T_A^w(n) \le m_B \cdot t_B^w(n)$. We know that $m_B \cdot t_B^w(n) \le T_B^w(n)$, and therefore, we can conclude that $T_A^w(n) \le T_B^w(n)$ for all $n$ — exactly what we wanted.
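Comment. A numerical illustration with numbers of our own: if $C = 2$ and the B-computer has $m_B = 4$ ns, it suffices to select an A-computer with $M_A \le \dfrac{m_B}{C} = 2$ ns; then
$$T_A^w(n) \le M_A \cdot t_A^w(n) \le M_A \cdot C \cdot t_B^w(n) \le m_B \cdot t_B^w(n) \le T_B^w(n).$$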
Similarly, for the average-case complexity, $t_A^{av}(n) = O(t_B^{av}(n))$ implies that, for an appropriately selected A-computer, $T_A^{av}(n) \le T_B^{av}(n)$ for all $n$. So, we arrive at the following conclusion.
Conclusions.
• The algorithm A is faster than the algorithm B in terms of the worst-case complexity if and only if $t_A^w(n) = O(t_B^w(n))$.
• Similarly, the algorithm A is faster than the algorithm B in terms of the average-case complexity if and only if $t_A^{av}(n) = O(t_B^{av}(n))$.
These two results explain why O-notations are used to compare algorithms.
3. Explanation of o
Idea. Instead of selecting a fast computer for the algorithm A, we can compare the implementations of the algorithms A and B on arbitrary computers. Of course, if the computer for A is much slower than the computer for B, then we cannot expect the computations of A to be faster than those of B on all the inputs, but we can require this for all sufficiently long inputs.
Towards a more precise description of this idea. Let us say that an algorithm A is much faster than an algorithm B if:
• for every implementation of A and
• for every implementation of B,
• there exists a threshold $n_0$ such that for $n > n_0$, we have $T_A^w(n) \le T_B^w(n)$ — or, correspondingly, $T_A^{av}(n) \le T_B^{av}(n)$.
Let us reformulate this definition in terms of the computational complexities $t^w(n)$ and $t^{av}(n)$. Let us start with the case when the algorithms A and B operate on the same computer. In this case, there exists an $n_0$ for which, starting with this $n_0$,
we have $T_A^w(n) \le T_B^w(n)$. Due to $m_A \cdot t_A^w(n) \le T_A^w(n)$ and $T_B^w(n) \le M_B \cdot t_B^w(n)$, this implies that $m_A \cdot t_A^w(n) \le M_B \cdot t_B^w(n)$, i.e., that
$$t_A^w(n) \le C \cdot t_B^w(n),$$
where $C \stackrel{\text{def}}{=} \dfrac{M_B}{m_A}$.
For any real number $\varepsilon > 0$, we can now consider a new B-computer which is $\dfrac{C}{\varepsilon}$ times faster than the previous one. For this new B-implementation $B'$, the computation times of elementary operations are $\dfrac{C}{\varepsilon}$ times smaller, and thus,
$$M_{B'} = \frac{\varepsilon}{C} \cdot M_B.$$
Let us apply the assumption that the algorithm A is much faster than the algorithm B to the original implementation of the algorithm A and to the new implementation $B'$ of the algorithm B. We then conclude that there exists an $n_0$ for which, for all $n > n_0$, we have $T_A^w(n) \le T_{B'}^w(n)$. Similarly to the above, we can thus deduce that
$$t_A^w(n) \le C' \cdot t_B^w(n)$$
(note that the complexity $t_B^w(n)$ is a count of operations and thus does not depend on the computer, so $t_{B'}^w(n) = t_B^w(n)$), where
$$C' \stackrel{\text{def}}{=} \frac{M_{B'}}{m_A} = \frac{\varepsilon}{C} \cdot \frac{M_B}{m_A} = \frac{\varepsilon}{C} \cdot C = \varepsilon.$$
The resulting inequality $t_A^w(n) \le \varepsilon \cdot t_B^w(n)$ implies that
$$\frac{t_A^w(n)}{t_B^w(n)} \le \varepsilon.$$
Thus, for every $\varepsilon > 0$, there exists a natural number $n_0$ such that for all $n > n_0$, we have $0 < \dfrac{t_A^w(n)}{t_B^w(n)} \le \varepsilon$. This is exactly the definition of a positive sequence having zero as the limit. So, we conclude that if the algorithm A is much faster than the algorithm B, then the ratio $\dfrac{t_A^w(n)}{t_B^w(n)}$ tends to 0. This is what is denoted by
$$t_A^w(n) = o(t_B^w(n)).$$
Similarly, the fact that A is much faster than B in terms of the average computation time implies that
$$t_A^{av}(n) = o(t_B^{av}(n)).$$
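Comment. This limit behavior is easy to see numerically. In the following Python sketch, the complexities $t_A^w(n) = n \cdot \log n$ and $t_B^w(n) = n^2$ are our illustrative choices, not taken from the paper:

```python
import math

t_A = lambda n: n * math.log(n)   # an n log n complexity (illustrative)
t_B = lambda n: n * n             # a quadratic complexity (illustrative)

for n in (10, 100, 10_000, 10**6):
    # the ratio equals log(n)/n and shrinks toward 0 as n grows,
    # which is exactly what t_A(n) = o(t_B(n)) asserts
    print(n, t_A(n) / t_B(n))
```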
Let us show that, vice versa, if $t_A^w(n) = o(t_B^w(n))$, then the algorithm A is much faster than the algorithm B — in the sense of the above definition. Indeed, if
$$\frac{t_A^w(n)}{t_B^w(n)} \to 0,$$
this means that for every $\varepsilon > 0$, there exists an $n_0$ such that for all $n > n_0$, we have $\dfrac{t_A^w(n)}{t_B^w(n)} \le \varepsilon$, i.e., equivalently, $t_A^w(n) \le \varepsilon \cdot t_B^w(n)$.
Let us now assume that the algorithms A and B are implemented on some computers:
• the algorithm A is implemented on a computer with parameters mA and MA, and
• the algorithm B is implemented on a computer with parameters mB and MB.
Let us take $\varepsilon \stackrel{\text{def}}{=} \dfrac{m_B}{M_A}$. Then, $t_A^w(n) \le \varepsilon \cdot t_B^w(n)$ means that $t_A^w(n) \le \dfrac{m_B}{M_A} \cdot t_B^w(n)$, i.e., equivalently, that
$$M_A \cdot t_A^w(n) \le m_B \cdot t_B^w(n).$$
We know that $T_A^w(n) \le M_A \cdot t_A^w(n)$ and that $m_B \cdot t_B^w(n) \le T_B^w(n)$. Thus, we conclude that for all $n > n_0$, we have $T_A^w(n) \le T_B^w(n)$. This is exactly what it means for the algorithm A to be much faster than the algorithm B.
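Comment. Again with illustrative numbers of our own: if the A-computer has $M_A = 2$ ns and the B-computer has $m_B = 4$ ns, we take $\varepsilon = \dfrac{m_B}{M_A} = 2$; once $n$ is large enough that $t_A^w(n) \le 2 \cdot t_B^w(n)$, we get
$$T_A^w(n) \le M_A \cdot t_A^w(n) \le 2 \cdot 2 \cdot t_B^w(n) = m_B \cdot t_B^w(n) \le T_B^w(n).$$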
A similar conclusion can be made about the average-case computation time, so we arrive at the following conclusions.
Conclusions.
• The algorithm A is much faster than the algorithm B in terms of the worst-case complexity if and only if $t_A^w(n) = o(t_B^w(n))$.
• Similarly, the algorithm A is much faster than the algorithm B in terms of the average-case complexity if and only if $t_A^{av}(n) = o(t_B^{av}(n))$.
These two results explain why o-notations are used to compare algorithms.
Acknowledgments
This work was supported in part by the National Science Foundation grants HRD-0734825 and HRD-1242122 (Cyber-ShARE Center of Excellence), and DUE-0926721.
The authors are thankful to all the participants of the North American Annual Meeting of the Association of Symbolic Logic (Urbana-Champaign, Illinois, March 25-28, 2015), especially to Yuri Gurevich, for valuable discussions.
References
1. Cormen T.H., Leiserson C.E., Rivest R.L., Stein C. Introduction to Algorithms. Cambridge, Massachusetts: MIT Press, 2009.
A Methodological Justification for Introducing O-Notations for Estimating Algorithm Complexity: Pedagogical Remarks

O. Kosheleva
Ph.D. (Phys.-Math.), Associate Professor, e-mail: [email protected]
V. Kreinovich
Ph.D. (Phys.-Math.), Professor, e-mail: [email protected]
University of Texas at El Paso, USA

Abstract. O-notations (“big O” and “little o”) are widely used in the comparative analysis of algorithms. Despite the effectiveness of this analysis tool, most textbooks do not provide a convincing, student-oriented justification of the usefulness of O-notations. This paper aims to fill this gap.

Keywords: O(f(n)), o(f(n)), algorithm analysis.