Algorithms Illuminated: Part 1: The Basics

Algorithms are the heart and soul of computer science. Their applications range from network routing and computational genomics to public-key cryptography and machine learning. Studying algorithms can make you a better programmer, a clearer thinker, and a master of technical interviews. Algorithms Illuminated is an accessible introduction to the subject for anyone with at least a little programming experience. The exposition emphasizes the big picture and conceptual understanding over low-level implementation and mathematical details---like a transcript of what an expert algorithms tutor would say over a series of one-on-one lessons. Part 1 covers asymptotic analysis and big-O notation, divide-and-conquer algorithms and the master method, randomized algorithms, and several famous algorithms for sorting and selection.
Algorithms Illuminated
Part 1: The Basics
Tim Roughgarden

© 2017 by Tim Roughgarden
All rights reserved. No portion of this book may be reproduced in any form
without permission from the publisher, except as permitted by U. S. copyright
law.
Printed in the United States of America
First Edition
Cover image: Stanza, by Andrea Belag
ISBN: 978-0-9992829-0-8 (Paperback)
ISBN: 978-0-9992829-1-5 (ebook)
Library of Congress Control Number: 2017914282

Soundlikeyourself Publishing, LLC
San Francisco, CA
tim.roughgarden@gmail.com
www.algorithmsilluminated.org

To Emma

Contents

Preface

1 Introduction
1.1 Why Study Algorithms?
1.2 Integer Multiplication
1.3 Karatsuba Multiplication
1.4 MergeSort: The Algorithm
1.5 MergeSort: The Analysis
1.6 Guiding Principles for the Analysis of Algorithms
Problems

2 Asymptotic Notation
2.1 The Gist
2.2 Big-O Notation
2.3 Two Basic Examples
2.4 Big-Omega and Big-Theta Notation
2.5 Additional Examples
Problems

3 Divide-and-Conquer Algorithms
3.1 The Divide-and-Conquer Paradigm
3.2 Counting Inversions in O(n log n) Time
3.3 Strassen’s Matrix Multiplication Algorithm
*3.4 An O(n log n)-Time Algorithm for the Closest Pair
Problems

4 The Master Method
4.1 Integer Multiplication Revisited
4.2 Formal Statement
4.3 Six Examples
*4.4 Proof of the Master Method
Problems

5 QuickSort
5.1 Overview
5.2 Partitioning Around a Pivot Element
5.3 The Importance of Good Pivots
5.4 Randomized QuickSort
*5.5 Analysis of Randomized QuickSort
*5.6 Sorting Requires Ω(n log n) Comparisons
Problems

6 Linear-Time Selection
6.1 The RSelect Algorithm
*6.2 Analysis of RSelect
*6.3 The DSelect Algorithm
*6.4 Analysis of DSelect
Problems

A Quick Review of Proofs By Induction
A.1 A Template for Proofs by Induction
A.2 Example: A Closed-Form Formula
A.3 Example: The Size of a Complete Binary Tree

B Quick Review of Discrete Probability
B.1 Sample Spaces
B.2 Events
B.3 Random Variables
B.4 Expectation
B.5 Linearity of Expectation
B.6 Example: Load Balancing

Index

Preface

This book is the first of a four-part series based on my online algorithms
courses that have been running regularly since 2012, which in turn
are based on an undergraduate course that I’ve taught many times at
Stanford University.
What We’ll Cover
Algorithms Illuminated, Part 1 provides an introduction to and basic
literacy in the following four topics.
Asymptotic analysis and big-O notation. Asymptotic notation
provides the basic vocabulary for discussing the design and analysis
of algorithms. The key concept here is “big-O” notation, which is a
modeling choice about the granularity with which we measure the
running time of an algorithm. We’ll see that the sweet spot for clear
high-level thinking about algorithm design is to ignore constant factors
and lower-order terms, and to concentrate on how an algorithm’s
performance scales with the size of the input.
Divide-and-conquer algorithms and the master method.
There’s no silver bullet in algorithm design, no single problem-solving
method that cracks all computational problems. However, there are
a few general algorithm design techniques that find successful application across a range of different domains. In this part of the
series, we’ll cover the “divide-and-conquer” technique. The idea is
to break a problem into smaller subproblems, solve the subproblems
recursively, and then quickly combine their solutions into one for the
original problem. We’ll see fast divide-and-conquer algorithms for
sorting, integer and matrix multiplication, and a basic problem in
computational geometry. We’ll also cover the master method, which is
a powerful tool for analyzing the running time of divide-and-conquer
algorithms.
Randomized algorithms. A randomized algorithm “flips coins” as
it runs, and its behavior can depend on the outcomes of these coin
flips. Surprisingly often, randomization leads to simple, elegant, and
practical algorithms. The canonical example is randomized QuickSort,
and we’ll explain this algorithm and its running time analysis in detail.
We’ll see further applications of randomization in Part 2.
Sorting and selection. As a byproduct of studying the first three
topics, we’ll learn several famous algorithms for sorting and selection,
including MergeSort, QuickSort, and linear-time selection (both randomized and deterministic). These computational primitives are so
blazingly fast that they do not take much more time than that needed
just to read the input. It’s important to cultivate a collection of such
“for-free primitives,” both to apply directly to data and to use as the
building blocks for solutions to more difficult problems.
For a more detailed look into the book’s contents, check out the
“Upshot” sections that conclude each chapter and highlight the most
important points.
Topics covered in the other three parts. Algorithms Illuminated, Part 2 covers data structures (heaps, balanced search trees,
hash tables, bloom filters), graph primitives (breadth- and depth-first
search, connectivity, shortest paths), and their applications (ranging from deduplication to social network analysis). Part 3 focuses
on greedy algorithms (scheduling, minimum spanning trees, clustering, Huffman codes) and dynamic programming (knapsack, sequence
alignment, shortest paths, optimal search trees). Part 4 is all about
NP-completeness, what it means for the algorithm designer, and
strategies for coping with computationally intractable problems, including the analysis of heuristics and local search.
Skills You’ll Learn
Mastering algorithms takes time and effort. Why bother?
Become a better programmer. You’ll learn several blazingly fast
subroutines for processing data and several useful data structures for
organizing data that can be deployed directly in your own programs.
Implementing and using these algorithms will stretch and improve
your programming skills. You’ll also learn general algorithm design
paradigms that are relevant for many different problems across different domains, as well as tools for predicting the performance of such
algorithms. These “algorithmic design patterns” can help you come
up with new algorithms for problems that arise in your own work.
Sharpen your analytical skills. You’ll get lots of practice describing and reasoning about algorithms. Through mathematical analysis,
you’ll gain a deep understanding of the specific algorithms and data
structures covered in these books. You’ll acquire facility with several mathematical techniques that are broadly useful for analyzing
algorithms.
Think algorithmically. After learning about algorithms it’s hard
not to see them everywhere, whether you’re riding an elevator, watching a flock of birds, managing your investment portfolio, or even
watching an infant learn. Algorithmic thinking is increasingly useful
and prevalent in disciplines outside of computer science, including
biology, statistics, and economics.
Literacy with computer science’s greatest hits. Studying algorithms can feel like watching a highlight reel of many of the greatest
hits from the last sixty years of computer science. No longer will you
feel excluded at that computer science cocktail party when someone
cracks a joke about Dijkstra’s algorithm. After reading these books,
you’ll know exactly what they mean.
Ace your technical interviews. Over the years, countless students have regaled me with stories about how mastering the concepts
in these books enabled them to ace every technical interview question
they were ever asked.
How These Books Are Different
This series of books has only one goal: to teach the basics of algorithms
in the most accessible way possible. Think of them as a transcript
of what an expert algorithms tutor would say to you over a series of
one-on-one lessons.

There are a number of excellent more traditional and more encyclopedic textbooks on algorithms, any of which usefully complement this
book series with additional details, problems, and topics. I encourage
you to explore and find your own favorites. There are also several
books that, unlike these books, cater to programmers looking for
ready-made algorithm implementations in a specific programming
language. Many such implementations are freely available on the Web
as well.
Who Are You?
The whole point of these books and the online courses they are based
on is to be as widely and easily accessible as possible. People of all
ages, backgrounds, and walks of life are well represented in my online
courses, and there are large numbers of students (high-school, college,
etc.), software engineers (both current and aspiring), scientists, and
professionals hailing from all corners of the world.
This book is not an introduction to programming, and ideally
you’ve acquired basic programming skills in a standard language (like
Java, Python, C, Scala, Haskell, etc.). For a litmus test, check out
Section 1.4—if it makes sense, you’ll be fine for the rest of the book.
If you need to beef up your programming skills, there are several
outstanding free online courses that teach basic programming.
We also use mathematical analysis as needed to understand
how and why algorithms really work. The freely available lecture
notes Mathematics for Computer Science, by Eric Lehman and Tom
Leighton, are an excellent and entertaining refresher on mathematical
notation (like Σ and ∀), the basics of proofs (induction, contradiction,
etc.), discrete probability, and much more.1 Appendices A and B also
provide quick reviews of proofs by induction and discrete probability,
respectively. The starred sections are the most mathematically intense
ones. The math-phobic or time-constrained reader can skip these on
a first reading without loss of continuity.
Additional Resources
These books are based on online courses that are currently running
on the Coursera and Stanford Lagunita platforms. There are several
1

http://www.boazbarak.org/cs121/LehmanLeighton.pdf.

resources available to help you replicate as much of the online course
experience as you like.
Videos. If you’re more in the mood to watch and listen than
to read, check out the YouTube video playlists available from
www.algorithmsilluminated.org. These videos cover all of the topics of this book series. I hope they exude a contagious enthusiasm for
algorithms that, alas, is impossible to replicate fully on the printed
page.
Quizzes. How can you know if you’re truly absorbing the concepts
in this book? Quizzes with solutions and explanations are scattered
throughout the text; when you encounter one, I encourage you to
pause and think about the answer before reading on.
End-of-chapter problems. At the end of each chapter you’ll find
several relatively straightforward questions to test your understanding,
followed by harder and more open-ended challenge problems. Solutions
to these end-of-chapter problems are not included here, but readers
can interact with me and each other about them through the book’s
discussion forum (see below).
Programming problems. Many of the chapters conclude with
a suggested programming project, where the goal is to develop a
detailed understanding of an algorithm by creating your own working
implementation of it. Data sets, along with test cases and their
solutions, can be found at www.algorithmsilluminated.org.
Discussion forums. A big reason for the success of online courses
is the opportunities they provide for participants to help each other
understand the course material and debug programs through discussion forums. Readers of these books have the same opportunity, via
the forums available from www.algorithmsilluminated.org.
Acknowledgments
These books would not exist without the passion and hunger supplied
by the thousands of participants in my algorithms courses over the
years, both on-campus at Stanford and on online platforms. I am particularly grateful to those who supplied detailed feedback on an earlier
draft of this book: Tonya Blust, Yuan Cao, Jim Humelsine, Bayram
Kuliyev, Patrick Monkelban, Kyle Schiller, Nissanka Wickremasinghe,
and Daniel Zingaro.
I always appreciate suggestions and corrections from readers, which
are best communicated through the discussion forums mentioned
above.
Stanford University
Stanford, California

Tim Roughgarden
September 2017

Chapter 1

Introduction

The goal of this chapter is to get you excited about the study of
algorithms. We begin by discussing algorithms in general and why
they’re so important. Then we use the problem of multiplying two
integers to illustrate how algorithmic ingenuity can improve on more
straightforward or naive solutions. We then discuss the MergeSort
algorithm in detail, for several reasons: it’s a practical and famous
algorithm that you should know; it’s a good warm-up to get you ready
for more intricate algorithms; and it’s the canonical introduction to
the “divide-and-conquer” algorithm design paradigm. The chapter
concludes by describing several guiding principles for how we’ll analyze
algorithms throughout the rest of the book.

1.1 Why Study Algorithms?

Let me begin by justifying this book’s existence and giving you some
reasons why you should be highly motivated to learn about algorithms.
So what is an algorithm, anyway? It’s a set of well-defined rules—a
recipe, in effect—for solving some computational problem. Maybe
you have a bunch of numbers and you want to rearrange them so that
they’re in sorted order. Maybe you have a road map and you want
to compute the shortest path from some origin to some destination.
Maybe you need to complete several tasks before certain deadlines,
and you want to know in what order you should finish the tasks so
that you complete them all by their respective deadlines.
So why study algorithms?
Important for all other branches of computer science. First,
understanding the basics of algorithms and the closely related field
of data structures is essential for doing serious work in pretty much
any branch of computer science. For example, at Stanford University,
every degree the computer science department offers (B.S., M.S., and
Ph.D.) requires an algorithms course. To name just a few examples:
1. Routing protocols in communication networks piggyback on
classical shortest path algorithms.
2. Public-key cryptography relies on efficient number-theoretic
algorithms.
3. Computer graphics requires the computational primitives supplied by geometric algorithms.
4. Database indices rely on balanced search tree data structures.
5. Computational biology uses dynamic programming algorithms
to measure genome similarity.
And the list goes on.
Driver of technological innovation. Second, algorithms play a
key role in modern technological innovation. To give just one obvious
example, search engines use a tapestry of algorithms to efficiently
compute the relevance of various Web pages to a given search query.
The most famous such algorithm is the PageRank algorithm currently
in use by Google. Indeed, in a December 2010 report to the United
States White House, the President’s council of advisers on science
and technology wrote the following:
“Everyone knows Moore’s Law –– a prediction made in
1965 by Intel co-founder Gordon Moore that the density
of transistors in integrated circuits would continue to
double every 1 to 2 years. . . in many areas, performance
gains due to improvements in algorithms have vastly
exceeded even the dramatic performance gains due to
increased processor speed.” 1

1
Excerpt from Report to the President and Congress: Designing a Digital
Future, December 2010 (page 71).

Lens on other sciences. Third, although this is beyond the scope
of this book, algorithms are increasingly used to provide a novel
“lens” on processes outside of computer science and technology. For
example, the study of quantum computation has provided a new
computational viewpoint on quantum mechanics. Price fluctuations
in economic markets can be fruitfully viewed as an algorithmic process.
Even evolution can be thought of as a surprisingly effective search
algorithm.
Good for the brain. Back when I was a student, my favorite classes
were always the challenging ones that, after I struggled through them,
left me feeling a few IQ points smarter than when I started. I hope
this material provides a similar experience for you.
Fun! Finally, I hope that by the end of the book you can see why
the design and analysis of algorithms is simply fun. It’s an endeavor
that requires a rare blend of precision and creativity. It can certainly
be frustrating at times, but it’s also highly addictive. And let’s not
forget that you’ve been learning about algorithms since you were a
little kid.

1.2 Integer Multiplication

1.2.1 Problems and Solutions

When you were in third grade or so, you probably learned an algorithm
for multiplying two numbers—a well-defined set of rules for transforming an input (two numbers) into an output (their product). It’s
important to distinguish between two different things: the description
of the problem being solved, and that of the method of solution (that is,
the algorithm for the problem). In this book, we’ll repeatedly follow
the pattern of first introducing a computational problem (the inputs
and desired output), and then describing one or more algorithms that
solve the problem.
1.2.2 The Integer Multiplication Problem

In the integer multiplication problem, the input is two n-digit numbers,
which we’ll call x and y. The length n of x and y could be any positive
integer, but I encourage you to think of n as large, in the thousands or
even more.2 (Perhaps you’re implementing a cryptographic application
that must manipulate very large numbers.) The desired output in the
integer multiplication problem is just the product x · y.
Problem: Integer Multiplication
Input: Two n-digit nonnegative integers, x and y.
Output: The product x · y.
1.2.3 The Grade-School Algorithm

Having defined the computational problem precisely, we describe an
algorithm that solves it—the same algorithm you learned in third
grade. We will assess the performance of this algorithm through the
number of “primitive operations” it performs, as a function of the
number of digits n in each input number. For now, let’s think of a
primitive operation as any of the following: (i) adding two single-digit
numbers; (ii) multiplying two single-digit numbers; or (iii) adding a
zero to the beginning or end of a number.
To jog your memory, consider the concrete example of multiplying
x = 5678 and y = 1234 (so n = 4). See also Figure 1.1. The algorithm
first computes the “partial product” of the first number and the last
digit of the second number: 5678 · 4 = 22712. Computing this partial
product boils down to multiplying each of the digits of the first number
by 4, and adding in “carries” as necessary.3 When computing the next
partial product (5678 · 3 = 17034), we do the same thing, shifting the
result one digit to the left, effectively adding a “0” at the end. And so
on for the final two partial products. The final step is to add up all
the partial products.
Back in third grade, you probably accepted that this algorithm is
correct, meaning that no matter what numbers x and y you start with,
provided that all intermediate computations are done properly, it
eventually terminates with the product x · y of the two input numbers.
2

If you want to multiply numbers with different lengths (like 1234 and 56),
a simple hack is to just add some zeros to the beginning of the smaller number
(for example, treat 56 as 0056). Alternatively, the algorithms we’ll discuss can be
modified to accommodate numbers with different lengths.
3
8 · 4 = 32, carry the 3, 7 · 4 = 28, plus 3 is 31, carry the 3, . . .

[5678 × 1234: partial products 22712, 17034, 11356, and 5678 (n rows, at most 2n operations per row), summing to 7006652]

Figure 1.1: The grade-school integer multiplication algorithm.

That is, you’re never going to get a wrong answer, and the algorithm
can’t loop forever.
1.2.4 Analysis of the Number of Operations

Your third-grade teacher might not have discussed the number of
primitive operations needed to carry out this procedure to its conclusion. To compute the first partial product, we multiplied 4 times
each of the digits 5, 6, 7, 8 of the first number. This is 4 primitive
operations. We also performed a few additions because of the carries.
In general, computing a partial product involves n multiplications
(one per digit) and at most n additions (at most one per digit), for
a total of at most 2n primitive operations. There’s nothing special
about the first partial product: every partial product requires at most
2n operations. Since there are n partial products—one per digit of the
second number—computing all of them requires at most n · 2n = 2n^2
primitive operations. We still have to add them all up to compute
the final answer, but this takes a comparable number of operations
(at most another 2n^2). Summarizing:

    total number of operations ≤ constant · n^2,

where the constant equals 4.

Thinking about how the amount of work the algorithm performs
scales as the input numbers grow bigger and bigger, we see that the
work required grows quadratically with the number of digits. If you
double the length of the input numbers, the work required jumps by
a factor of 4. Quadruple their length and it jumps by a factor of 16,
and so on.
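
To make this scaling concrete, here is a short Python sketch of the grade-school method that also tallies primitive operations. This is an illustration only, not code from the book; the function name and the exact accounting (two operations per single-digit multiply-and-carry step, and roughly n single-digit additions per row when summing the partial products) are assumptions made for the sketch.

def grade_school_multiply(x_digits, y_digits):
    """Multiply two equal-length digit strings; return (product, rough operation count)."""
    n = len(x_digits)
    ops = 0
    x = [int(ch) for ch in x_digits]
    y = [int(ch) for ch in y_digits]
    partial_products = []
    for shift, d in enumerate(reversed(y)):     # one partial product per digit of y
        carry, row = 0, []
        for xd in reversed(x):                  # multiply each digit of x by d
            prod = xd * d + carry               # one multiplication plus one addition for the carry
            ops += 2
            row.append(prod % 10)
            carry = prod // 10
        if carry:
            row.append(carry)
        value = int("".join(str(t) for t in reversed(row))) * 10**shift
        partial_products.append(value)
    total = 0
    for p in partial_products:                  # add up the n partial products
        total += p
        ops += n                                # crude count of single-digit additions per row
    return total, ops

for n in (4, 8, 16):
    product, ops = grade_school_multiply("5678" * (n // 4), "1234" * (n // 4))
    print(n, ops)                               # doubling n roughly quadruples the count
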
1.2.5 Can We Do Better?

Depending on what type of third-grader you were, you might well
have accepted this procedure as the unique or at least optimal way
to multiply two numbers. If you want to be a serious algorithm
designer, you’ll need to grow out of that kind of obedient timidity.
The classic algorithms book by Aho, Hopcroft, and Ullman, after
iterating through a number of algorithm design paradigms, has this
to say:
“Perhaps the most important principle for the good
algorithm designer is to refuse to be content.” 4
Or as I like to put it, every algorithm designer should adopt the
mantra:
Can we do better?
This question is particularly apropos when you’re faced with a naive
or straightforward solution to a computational problem. In the third
grade, you might not have asked if one could do better than the
straightforward integer multiplication algorithm. Now is the time to
ask, and answer, this question.

1.3 Karatsuba Multiplication

The algorithm design space is surprisingly rich, and there are certainly
other interesting methods of multiplying two integers beyond what
you learned in the third grade. This section describes a method called
Karatsuba multiplication.5
4

Alfred V. Aho, John E. Hopcroft, and Jeffrey D. Ullman, The Design and
Analysis of Computer Algorithms, Addison-Wesley, 1974, page 70.
5
Discovered in 1960 by Anatoly Karatsuba, who at the time was a 23-year-old
student.

1.3.1 A Concrete Example

To get a feel for Karatsuba multiplication, let’s re-use our previous
example with x = 5678 and y = 1234. We’re going to execute a
sequence of steps, quite different from the grade-school algorithm,
culminating in the product x · y. The sequence of steps should strike
you as very mysterious, like pulling a rabbit out of a hat; later in
the section we’ll explain exactly what Karatsuba multiplication is
and why it works. The key point to appreciate now is that there’s
a dazzling array of options for solving computational problems like
integer multiplication.
First, to regard the first and second halves of x as numbers in their
own right, we give them the names a and b (so a = 56 and b = 78).
Similarly, c and d denote 12 and 34, respectively (Figure 1.2).

[a = 56 and b = 78 are the halves of 5678; c = 12 and d = 34 are the halves of 1234]

Figure 1.2: Thinking of 4-digit numbers as pairs of double-digit numbers.

Next we’ll perform a sequence of operations that involve only the
double-digit numbers a, b, c, and d, and finally collect all the terms
together in a magical way that results in the product of x and y.
Step 1: Compute a · c = 56 · 12, which is 672 (as you’re welcome to
check).
Step 2: Compute b · d = 78 · 34 = 2652.
The next two steps are still more inscrutable.
Step 3: Compute (a + b) · (c + d) = 134 · 46 = 6164.
Step 4: Subtract the results of the first two steps from the result
of the third step: 6164 − 672 − 2652 = 2840.

Finally, we add up the results of steps 1, 2, and 4, but only after
adding four trailing zeroes to the answer in step 1 and 2 trailing zeroes
to the answer in step 4.
Step 5: Compute 10^4 · 672 + 10^2 · 2840 + 2652 = 6720000 + 284000 + 2652 = 7006652.

This is exactly the same (correct) result computed by the grade-school
You should not have any intuition about what just happened.
Rather, I hope that you feel some mixture of bafflement and intrigue,
and appreciate the fact that there seem to be fundamentally different
algorithms for multiplying integers than the one you learned as a kid.
Once you realize how rich the space of algorithms is, you have to
wonder: can we do better than the third-grade algorithm? Does the
algorithm above already do better?
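
If you would like to check the arithmetic yourself, the five steps replay directly in a few lines of Python (a throwaway check, not part of the book's presentation):

a, b, c, d = 56, 78, 12, 34
step1 = a * c                          # 672
step2 = b * d                          # 2652
step3 = (a + b) * (c + d)              # 134 · 46 = 6164
step4 = step3 - step1 - step2          # 2840
step5 = 10**4 * step1 + 10**2 * step4 + step2
print(step5, step5 == 5678 * 1234)     # 7006652 True
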
1.3.2 A Recursive Algorithm

Before tackling full-blown Karatsuba multiplication, let’s explore a
simpler recursive approach to integer multiplication.6 A recursive
algorithm for integer multiplication presumably involves multiplications of numbers with fewer digits (like 12, 34, 56, and 78 in the
computation above).
In general, a number x with an even number n of digits can be
expressed in terms of two n/2-digit numbers, its first half and second
half a and b:
x = 10^(n/2) · a + b.

Similarly, we can write

y = 10^(n/2) · c + d.

To compute the product of x and y, let’s use the two expressions
above and multiply out:

    x · y = (10^(n/2) · a + b) · (10^(n/2) · c + d)
          = 10^n · (a · c) + 10^(n/2) · (a · d + b · c) + b · d.        (1.1)

6

I’m assuming you’ve heard of recursion as part of your programming background. A recursive procedure is one that invokes itself as a subroutine with a
smaller input, until a base case is reached.

Note that all of the multiplications in (1.1) are either between pairs
of n/2-digit numbers or involve a power of 10.7
The expression (1.1) suggests a recursive approach to multiplying
two numbers. To compute the product x · y, we compute the expression (1.1). The four relevant products (a · c, a · d, b · c, and b · d) all
concern numbers with fewer than n digits, so we can compute each of
them recursively. Once our four recursive calls come back to us with
their answers, we can compute the expression (1.1) in the obvious way:
tack on n trailing zeroes to a · c, add a · d and b · c (using grade-school
addition) and tack on n/2 trailing zeroes to the result, and finally add
these two expressions to b · d.8 We summarize this algorithm, which
we’ll call RecIntMult, in the following pseudocode.9
RecIntMult
Input: two n-digit positive integers x and y.
Output: the product x · y.
Assumption: n is a power of 2.

if n = 1 then                    // base case
    compute x · y in one step and return the result
else                             // recursive case
    a, b := first and second halves of x
    c, d := first and second halves of y
    recursively compute ac := a · c, ad := a · d, bc := b · c, and bd := b · d
    compute 10^n · ac + 10^(n/2) · (ad + bc) + bd using grade-school addition and return the result
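
Here is one possible Python rendering of the RecIntMult pseudocode (a sketch only, not the book's code). It keeps the power-of-2 assumption and lets Python's built-in arithmetic stand in for the primitive operations; the function name and the convention of passing the digit count n explicitly are assumptions of the sketch.

def rec_int_mult(x, y, n):
    """Multiply two n-digit nonnegative integers recursively (n a power of 2)."""
    if n == 1:                              # base case: single-digit product
        return x * y
    half = n // 2
    a, b = divmod(x, 10**half)              # first and second halves of x
    c, d = divmod(y, 10**half)              # first and second halves of y
    ac = rec_int_mult(a, c, half)
    ad = rec_int_mult(a, d, half)
    bc = rec_int_mult(b, c, half)
    bd = rec_int_mult(b, d, half)
    return 10**n * ac + 10**half * (ad + bc) + bd

print(rec_int_mult(5678, 1234, 4))          # 7006652
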
Is the RecIntMult algorithm faster or slower than the grade-school
7
For simplicity, we are assuming that n is a power of 2. A simple hack for
enforcing this assumption is to add an appropriate number of leading zeroes to x
and y, which at most doubles their lengths. Alternatively, when n is odd, it’s also
fine to break x and y into two numbers with almost equal lengths.
8
Recursive algorithms also need one or more base cases, so that they don’t
keep calling themselves until the rest of time. Here, the base case is: if x and y
are 1-digit numbers, just multiply them in one primitive operation and return the
result.
9
In pseudocode, we use “=” to denote an equality test, and “:=” to denote a
variable assignment.

algorithm? You shouldn’t necessarily have any intuition about this
question, and the answer will have to wait until Chapter 4.
1.3.3 Karatsuba Multiplication

Karatsuba multiplication is an optimized version of the RecIntMult
algorithm. We again start from the expansion (1.1) of x · y in terms
of a, b, c, and d. The RecIntMult algorithm uses four recursive calls,
one for each of the products in (1.1) between n/2-digit numbers. But
we don’t really care about a · d or b · c, except inasmuch as we care
about their sum a · d + b · c. With only three quantities that we care
about—a · c, a · d + b · c, and b · d—can we get away with only three
recursive calls?
To see that we can, first use two recursive calls to compute a · c
and b · d, as before.
Step 1: Recursively compute a · c.
Step 2: Recursively compute b · d.

Instead of recursively computing a·d or b·c, we recursively compute
the product of a + b and c + d.10
Step 3: Compute a + b and c + d (using grade-school addition),
and recursively compute (a + b) · (c + d).
The key trick in Karatsuba multiplication goes back to the early
19th-century mathematician Carl Friedrich Gauss, who was thinking
about multiplying complex numbers. Subtracting the results of the
first two steps from the result of the third step gives exactly what we
want, the middle coefficient in (1.1) of a · d + b · c:
    (a + b) · (c + d) − a · c − b · d = a · d + b · c,

since (a + b) · (c + d) = a · c + a · d + b · c + b · d.

Step 4: Subtract the results of the first two steps from the result
of the third step to obtain a · d + b · c.
The final step computes (1.1), as in the RecIntMult algorithm.
10
The numbers a + b and c + d might have as many as (n/2) + 1 digits, but
the algorithm still works fine.

Step 5: Compute (1.1) by adding up the results of steps 1, 2, and
4, after adding n trailing zeroes to the answer in step 1 and n/2
trailing zeroes to the answer in step 4.
Karatsuba
Input: two n-digit positive integers x and y.
Output: the product x · y.
Assumption: n is a power of 2.

if n = 1 then                    // base case
    compute x · y in one step and return the result
else                             // recursive case
    a, b := first and second halves of x
    c, d := first and second halves of y
    compute p := a + b and q := c + d using grade-school addition
    recursively compute ac := a · c, bd := b · d, and pq := p · q
    compute adbc := pq − ac − bd using grade-school addition
    compute 10^n · ac + 10^(n/2) · adbc + bd using grade-school addition and return the result
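
A matching Python sketch of Karatsuba appears below. Again, this is a sketch rather than the book's code: it drops the power-of-2 assumption by splitting at half the length of the longer input, and it uses Python's built-in addition and subtraction in place of the grade-school subroutines.

def karatsuba(x, y):
    """Multiply two nonnegative integers using three recursive multiplications."""
    if x < 10 or y < 10:                        # base case: a single-digit factor
        return x * y
    half = max(len(str(x)), len(str(y))) // 2   # split position
    a, b = divmod(x, 10**half)                  # x = a·10^half + b
    c, d = divmod(y, 10**half)                  # y = c·10^half + d
    ac = karatsuba(a, c)
    bd = karatsuba(b, d)
    ad_plus_bc = karatsuba(a + b, c + d) - ac - bd   # Gauss's subtraction trick
    return 10**(2 * half) * ac + 10**half * ad_plus_bc + bd

print(karatsuba(5678, 1234))                    # 7006652

The single recursive call on (a + b) · (c + d) is what replaces the two calls on a · d and b · c.
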
Thus Karatsuba multiplication makes only three recursive calls! Saving a recursive call should save on the overall running time, but by
how much? Is the Karatsuba algorithm faster than the grade-school
multiplication algorithm? The answer is far from obvious, but it is an
easy application of the tools you’ll acquire in Chapter 4 for analyzing
the running time of such “divide-and-conquer” algorithms.
On Pseudocode
This book explains algorithms using a mixture of
high-level pseudocode and English (as in this section).
I’m assuming that you have the skills to translate
such high-level descriptions into working code in your
favorite programming language. Several other books
and resources on the Web offer concrete implementations of various algorithms in specific programming
languages.
The first benefit of emphasizing high-level descriptions over language-specific implementations is flexibility: while I assume familiarity with some programming language, I don’t care which one. Second, this
approach promotes the understanding of algorithms
at a deep and conceptual level, unencumbered by low-level details. Seasoned programmers and computer
scientists generally think and communicate about algorithms at a similarly high level.
Still, there is no substitute for the detailed understanding of an algorithm that comes from providing
your own working implementation of it. I strongly
encourage you to implement as many of the algorithms in this book as you have time for. (It’s also a
great excuse to pick up a new programming language!)
For guidance, see the end-of-chapter Programming
Problems and supporting test cases.

1.4 MergeSort: The Algorithm

This section provides our first taste of analyzing the running time of
a non-trivial algorithm—the famous MergeSort algorithm.
1.4.1 Motivation

MergeSort is a relatively ancient algorithm, and was certainly known
to John von Neumann as early as 1945. Why begin a modern course
on algorithms with such an old example?
Oldie but a goodie. Despite being over 70 years old, MergeSort
is still one of the methods of choice for sorting. It’s used all the time
in practice, and is the standard sorting algorithm in a number of
programming libraries.

Canonical divide-and-conquer algorithm. The “divide-and-conquer” algorithm design paradigm is a general approach to solving
problems, with applications in many different domains. The basic
idea is to break your problem into smaller subproblems, solve the
subproblems recursively, and finally combine the solutions to the
subproblems into one for the original problem. MergeSort is an ideal
introduction to the divide-and-conquer paradigm, the benefits it offers,
and the analysis challenges it presents.
Calibrate your preparation. Our MergeSort discussion will give
you a good indication of whether your current skill set is a good match
for this book. My assumption is that you have the programming and
mathematical backgrounds to (with some work) translate the high-level idea of MergeSort into a working program in your favorite
programming language and to follow our running time analysis of the
algorithm. If this and the next section make sense, then you are in
good shape for the rest of the book.
Motivates guiding principles for algorithm analysis. Our running time analysis of MergeSort exposes a number of more general
guiding principles, such as the quest for running time bounds that
hold for every input of a given size, and the importance of the rate
of growth of an algorithm’s running time (as a function of the input
size).
Warm-up for the master method. We’ll analyze MergeSort using the “recursion tree method,” which is a way of tallying up the
operations performed by a recursive algorithm. Chapter 4 builds on
these ideas and culminates with the “master method,” a powerful
and easy-to-use tool for bounding the running time of many different divide-and-conquer algorithms, including the RecIntMult and
Karatsuba algorithms of Section 1.3.
1.4.2 Sorting

You probably already know the sorting problem and some algorithms
that solve it, but just so we’re all on the same page:

Problem: Sorting
Input: An array of n numbers, in arbitrary order.
Output: An array of the same numbers, sorted from smallest to largest.
For example, given the input array

5 4 1 8 7 2 6 3
the desired output array is

1 2 3 4 5 6 7 8
In the example above, the eight numbers in the input array are
distinct. Sorting isn’t really any harder when there are duplicates,
and it can even be easier. But to keep the discussion as simple as
possible, let’s assume—among friends—that the numbers in the input
array are always distinct. I strongly encourage you to think about
how our sorting algorithms need to be modified (if at all) to handle
duplicates.11
If you don’t care about optimizing the running time, it’s not
too difficult to come up with a correct sorting algorithm. Perhaps
the simplest approach is to first scan through the input array to
identify the minimum element and copy it over to the first element
of the output array; then do another scan to identify and copy over
the second-smallest element; and so on. This algorithm is called
SelectionSort. You may have heard of InsertionSort, which can
be viewed as a slicker implementation of the same idea of iteratively
growing a prefix of the sorted output array. You might also know
BubbleSort, in which you identify adjacent pairs of elements that
11

In practice, there is often data (called the value) associated with each number
(which is called the key). For example, you might want to sort employee records
(with the name, salary, etc.), using social security numbers as keys. We focus on
sorting the keys, with the understanding that each key retains its associated data.

are out of order, and perform repeated swaps until the entire array is
sorted. All of these algorithms have quadratic running times, meaning
that the number of operations performed on arrays of length n scales
with n^2, the square of the input length. Can we do better? By using
the divide-and-conquer paradigm, the MergeSort algorithm improves
dramatically over these more straightforward sorting algorithms.12
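
For comparison, the SelectionSort idea described above fits in a few lines of Python (a sketch, not the book's code); the repeated scans over the remaining elements are exactly what make it quadratic:

def selection_sort(a):
    """Repeatedly copy the smallest remaining element to the output array."""
    remaining = list(a)
    output = []
    while remaining:                    # n iterations...
        smallest = min(remaining)       # ...each scanning the remaining elements
        remaining.remove(smallest)
        output.append(smallest)
    return output

print(selection_sort([5, 4, 1, 8, 7, 2, 6, 3]))   # [1, 2, 3, 4, 5, 6, 7, 8]
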
1.4.3 An Example

The easiest way to understand MergeSort is through a picture of
a concrete example (Figure 1.3). We’ll use the input array from
Section 1.4.2.
[Divide 5 4 1 8 7 2 6 3 into 5 4 1 8 and 7 2 6 3; the recursive calls return 1 4 5 8 and 2 3 6 7; the merge step produces 1 2 3 4 5 6 7 8]

Figure 1.3: A bird’s-eye view of MergeSort on a concrete example.

As a recursive divide-and-conquer algorithm, MergeSort calls itself
on smaller arrays. The simplest way to decompose a sorting problem
into smaller sorting problems is to break the input array in half. The
first and second halves are each sorted recursively. For example, in
12
While generally dominated by MergeSort, InsertionSort is still useful in
practice in certain cases, especially for small input sizes.

Figure 1.3, the first and second halves of the input array are {5, 4, 1, 8}
and {7, 2, 6, 3}. By the magic of recursion (or induction, if you prefer),
the first recursive call correctly sorts the first half, returning the array
{1, 4, 5, 8}. The second recursive call returns the array {2, 3, 6, 7}.
The final “merge” step combines these two sorted arrays of length 4
into a single sorted array of all 8 numbers. Details of this step are
given below, but the idea is to walk indices down each of the sorted
subarrays, populating the output array from left to right in sorted
order.
1.4.4 Pseudocode

The picture in Figure 1.3 suggests the following pseudocode, with
two recursive calls and a merge step, for the general problem. As
usual, our description cannot necessarily be translated line by line
into working code (though it’s pretty close).
MergeSort
Input: array A of n distinct integers.
Output: array with the same integers, sorted from
smallest to largest.
// ignoring base cases
C := recursively sort first half of A
D := recursively sort second half of A
return Merge (C,D)
There are several omissions from the pseudocode that deserve
comment. As a recursive algorithm, there should also be one or
more base cases, where there is no further recursion and the answer
is returned directly. So if the input array A contains only 0 or 1
elements, MergeSort returns it (it is already sorted). The pseudocode
does not detail what “first half” and “second half” mean when n is
odd, but the obvious interpretation (with one “half” having one more
element than the other) works fine. Finally, the pseudocode ignores
the implementation details of how to actually pass the two subarrays
to their respective recursive calls. These details depend somewhat
on the programming language. The point of high-level pseudocode is

to ignore such details and focus on the concepts that transcend any
particular programming language.
1.4.5 The Merge Subroutine

How should we implement the Merge step? At this point, the two
recursive calls have done their work and we have in our possession two
sorted subarrays C and D of length n/2. The idea is to traverse both
the sorted subarrays in order and populate the output array from left
to right in sorted order.13
Merge
Input: sorted arrays C and D (length n/2 each).
Output: sorted array B (length n).
Simplifying assumption: n is even.

1   i := 1
2   j := 1
3   for k := 1 to n do
4       if C[i] < D[j] then
5           B[k] := C[i]      // populate output array
6           i := i + 1        // increment i
7       else                  // D[j] < C[i]
8           B[k] := D[j]
9           j := j + 1

We traverse the output array using the index k, and the sorted
subarrays with the indices i and j. All three arrays are traversed
from left to right. The for loop in line 3 implements the pass over
the output array. In the first iteration, the subroutine identifies the
minimum element in either C or D and copies it over to the first
position of the output array B. The minimum element overall is either
in C (in which case it’s C[1], since C is sorted) or in D (in which case
it’s D[1], since D is sorted). Advancing the appropriate index (i or j)
13

We number our array entries beginning with 1 (rather than 0), and use
the syntax “A[i]” for the ith entry of an array A. These details vary across
programming languages.

effectively removes from further consideration the element just copied,
and the process is then repeated to identify the smallest element
remaining in C or D (the second-smallest overall). In general, the
smallest element not yet copied over to B is either C[i] or D[j]; the
subroutine explicitly checks to see which one is smaller and proceeds
accordingly. Since every iteration copies over the smallest element still
under consideration in C or D, the output array is indeed populated
in sorted order.
As usual, our pseudocode is intentionally a bit sloppy, to emphasize
the forest over the trees. A full implementation should also keep track
of when the traversal of C or D falls off the end, at which point
the remaining elements of the other array are copied into the final
entries of B (in order). Now is a good time to work through your own
implementation of the MergeSort algorithm.
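
As one possible reference point, here is a compact Python rendering of MergeSort and Merge (a sketch, not the book's code). It uses 0-based indexing and handles falling off the end of a subarray by copying over whatever remains of the other one.

def merge(c, d):
    """Merge two sorted lists into one sorted list."""
    b, i, j = [], 0, 0
    while i < len(c) and j < len(d):
        if c[i] < d[j]:                 # copy over the smaller front element
            b.append(c[i])
            i += 1
        else:
            b.append(d[j])
            j += 1
    b.extend(c[i:])                     # one of these is empty; the other holds
    b.extend(d[j:])                     # the remaining (largest) elements
    return b

def merge_sort(a):
    """Sort a list of distinct numbers with the MergeSort algorithm."""
    if len(a) <= 1:                     # base case: already sorted
        return a
    mid = len(a) // 2
    return merge(merge_sort(a[:mid]), merge_sort(a[mid:]))

print(merge_sort([5, 4, 1, 8, 7, 2, 6, 3]))   # [1, 2, 3, 4, 5, 6, 7, 8]
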

1.5 MergeSort: The Analysis

What’s the running time of the MergeSort algorithm, as a function of
the length n of the input array? Is it faster than more straightforward
methods of sorting, such as SelectionSort, InsertionSort, and
BubbleSort? By “running time,” we mean the number of lines of code
executed in a concrete implementation of the algorithm. Think of
walking line by line through this implementation using a debugger,
one “primitive operation” at a time. We’re interested in the number of
steps the debugger takes before the program completes.
1.5.1 Running Time of Merge

Analyzing the running time of the MergeSort algorithm is an intimidating task, as it’s a recursive algorithm that calls itself over and
over. So let’s warm up with the simpler task of understanding the
number of operations performed by a single invocation of the Merge
subroutine when called on two sorted arrays of length ℓ/2 each. We
can do this directly, by inspecting the code in Section 1.4.5 (where n
corresponds to ℓ). First, lines 1 and 2 each perform an initialization,
and we’ll count this as two operations. Then, we have a for loop that
executes a total of ℓ times. Each iteration of the loop performs a
comparison in line 4, an assignment in either line 5 or line 8, and
an increment in either line 6 or line 9. The loop index k also needs
to get incremented each loop iteration. This means that 4 primitive
operations are performed for each of the ℓ iterations of the loop.14
Totaling up, we conclude that the Merge subroutine performs at most
4ℓ + 2 operations to merge two sorted arrays of length ℓ/2 each. Let
me abuse our friendship further with a true but sloppy inequality that
will make our lives easier: for ℓ ≥ 1, 4ℓ + 2 ≤ 6ℓ. That is, 6ℓ is also
a valid upper bound on the number of operations performed by the
Merge subroutine.

Lemma 1.1 (Running Time of Merge) For every pair of sorted
input arrays C, D of length ℓ/2, the Merge subroutine performs at
most 6ℓ operations.
On Lemmas, Theorems, and the Like
In mathematical writing, the most important technical statements are labeled theorems. A lemma is a
technical statement that assists with the proof of a
theorem (much as Merge assists with the implementation of MergeSort). A corollary is a statement that
follows immediately from an already-proved result,
such as a special case of a theorem. We use the term
proposition for stand-alone technical statements that
are not particularly important in their own right.
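
To see Lemma 1.1's bound in action, here is a small instrumented merge in Python (a sanity check, not part of the book's analysis). It charges two operations for the initializations and four per loop iteration, mirroring the accounting above, and compares the total with 6ℓ.

def merge_op_count(c, d):
    """Merge sorted lists c and d, counting primitive operations as in the text."""
    ops = 2                                 # lines 1 and 2: initialize i and j
    b, i, j = [], 0, 0
    n = len(c) + len(d)
    for _ in range(n):
        ops += 4                            # comparison, assignment, increment, loop index
        if i < len(c) and (j >= len(d) or c[i] < d[j]):
            b.append(c[i])
            i += 1
        else:
            b.append(d[j])
            j += 1
    return b, ops

merged, ops = merge_op_count([1, 4, 5, 8], [2, 3, 6, 7])
print(ops, 6 * len(merged))                 # 34 operations, versus the bound 6·8 = 48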

1.5.2 Running Time of MergeSort

How can we go from the straightforward analysis of the Merge subroutine to an analysis of MergeSort, a recursive algorithm that spawns
further invocations of itself? Especially terrifying is the rapid proliferation of recursive calls, the number of which is blowing up exponentially
with the depth of the recursion. The one thing we have going for us
is the fact that every recursive call is passed an input substantially
smaller than the one we started with. There’s a tension between two
14

One could quibble with the choice of 4. Does comparing the loop index k to
its upper bound also count as an additional operation each iteration, for a total
of 5? Section 1.6 explains why such differences in accounting don’t really matter.
So let’s agree, among friends, that it’s 4 primitive operations per iteration.

competing forces: on the one hand, the explosion of different subproblems that need to be solved; and on the other, the ever-shrinking
inputs for which these subproblems are responsible. Reconciling these
two forces will drive our analysis of MergeSort. In the end, we’ll
prove the following concrete and useful upper bound on the number
of operations performed by MergeSort (across all its recursive calls).
Theorem 1.2 (Running Time of MergeSort) For every input array of length n ≥ 1, the MergeSort algorithm performs at most
6n log2 n + 6n
operations, where log2 denotes the base-2 logarithm.
On Logarithms
Some students are unnecessarily frightened by the
appearance of a logarithm, which is actually a very
down-to-earth concept. For a positive integer n, log2 n
just means the following: type n into a calculator,
and count the number of times you need to divide it
by 2 before the result is 1 or less.a For example, it
takes five divide-by-twos to bring 32 down to 1, so
log2 32 = 5. Ten divide-by-twos bring 1024 down to 1,
so log2 1024 = 10. These examples make it intuitively
clear that log2 n is much less than n (compare 10 vs.
1024), especially as n grows large. A plot confirms
this intuition (Figure 1.4).
a
To be pedantic, log2 n is not an integer if n is not a power
of 2, and what we have described is really log2 n rounded up
to the nearest integer. We can ignore this minor distinction.
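
The divide-by-two description translates directly into a few lines of Python (a quick illustration of the definition, not code from the book):

def log2_rounded_up(n):
    """Count how many times n must be halved before the result is at most 1."""
    count = 0
    while n > 1:
        n = n / 2
        count += 1
    return count

print(log2_rounded_up(32))      # 5
print(log2_rounded_up(1024))    # 10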

Theorem 1.2 is a win for the MergeSort algorithm and showcases
the benefits of the divide-and-conquer algorithm design paradigm. We
mentioned that the running times of simpler sorting algorithms, like
SelectionSort, InsertionSort, and BubbleSort, depend quadratically on the input size n, meaning that the number of operations
required scales as a constant times n^2. In Theorem 1.2, one of these

[Plot of f(n) = n and f(n) = log n for n between 0 and 40]

Figure 1.4: The logarithm function grows much more slowly than the
identity function. The base of the logarithm is 2; other bases lead to
qualitatively similar pictures.

factors of n is replaced by log2 n. As suggested by Figure 1.4, this
means that MergeSort typically runs much faster than the simpler
sorting algorithms, especially as n grows large.15
1.5.3 Proof of Theorem 1.2

We now do a full running time analysis of MergeSort, thereby substantiating the claim that a recursive divide-and-conquer approach results
in a faster sorting algorithm than more straightforward methods. For
simplicity, we assume that the input array length n is a power of 2.
This assumption can be removed with minor additional work.
The plan for proving the running time bound in Theorem 1.2 is to
use a recursion tree; see Figure 1.5.16 The idea of the recursion tree
method is to write out all the work done by a recursive algorithm in a
tree structure, with nodes of the tree corresponding to recursive calls,
and the children of a node corresponding to the recursive calls made
15 See Section 1.6.3 for further discussion of this point.
16 For some reason, computer scientists seem to think that trees grow downward.

by that node. This tree structure provides us with a principled way to
tally up all the work done by MergeSort across all its recursive calls.
[Recursion tree: level 0 is the outermost call on the entire input; level 1 holds the first recursive calls on the left half and right half; level 2 and later levels continue down to the leaves (single-element arrays)]

Figure 1.5: A recursion tree for MergeSort. Nodes correspond to recursive
calls. Level 0 corresponds to the outermost call to MergeSort, level 1 to its
recursive calls, and so on.

The root of the recursion tree corresponds to the outermost call
to MergeSort, where the input is the original input array. We’ll call
this level 0 of the tree. Since each invocation of MergeSort spawns
two recursive calls, the tree will be binary (that is, with two children
per node). Level 1 of the tree has two nodes, corresponding to the
two recursive calls made by the outermost call, one for the left half
of the input array and one for the right half. Each of the level-1
recursive calls will itself make two recursive calls, each operating on a
particular quarter of the original input array. This process continues
until eventually the recursion bottoms out with arrays of size 0 or 1
(the base cases).
Quiz 1.1
Roughly how many levels does this recursion tree have, as a
function of the length n of the input array?

a) A constant number (independent of n)
b) log2 n
c) √n
d) n
(See Section 1.5.4 for the solution and discussion.)
This recursion tree suggests a particularly convenient way to
account for the work done by MergeSort, which is level by level. To
implement this idea, we need to understand two things: the number
of distinct subproblems at a given recursion level j, and the length of
the input to each of these subproblems.
Quiz 1.2
What is the pattern? Fill in the blanks in the following
statement: at each level j = 0, 1, 2, . . . of the recursion tree,
there are [blank] subproblems, each operating on a subarray
of length [blank].
a) 2^j and 2^j, respectively
b) n/2^j and n/2^j, respectively
c) 2^j and n/2^j, respectively
d) n/2^j and 2^j, respectively
(See Section 1.5.4 for the solution and discussion.)
Let’s now put this pattern to use and tally all the operations that
MergeSort performs. We proceed level by level, so fix a level j of
the recursion tree. How much work is done by the level-j recursive
calls, not counting the work done by their recursive calls at later
levels? Inspecting the MergeSort code, we see that it does only three
things: make two recursive calls and invoke the Merge subroutine on
the results. Thus ignoring the work done by later recursive calls, the
work done by a level-j subproblem is just the work done by Merge.

24

Introduction

This we already understand from Lemma 1.1: at most 6` operations,
where ` is the length of the input array to this subproblem.
To put everything together, we can express the total work done
by level-j recursive calls (not counting later recursive calls) as
# of level-j subproblems ⇥ work per level-j subproblem .
|
{z
} |
{z
}
=2j

=6n/2j

Using the solution to Quiz 1.2, we know that the first term equals
2j , and the input length to each such subproblem is n/2j . Taking
` = n/2j , Lemma 1.1 implies that each level-j subproblem performs
at most 6n/2j operations. We conclude that at most
    2^j · (6n/2^j) = 6n

operations are performed across all the recursive calls at the jth
recursion level.
Remarkably, our bound on the work done at a given level j is
independent of j! That is, each level of the recursion tree contributes
the same number of operations to the analysis. The reason for this is
a perfect equilibrium between two competing forces—the number of
subproblems doubles every level, while the amount of work performed
per subproblem halves every level.
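For a concrete check, take n = 8: level 0 has 1 subproblem with input length 8 (at most 6 · 8 = 48 operations), level 1 has 2 subproblems of length 4 (2 · 24 = 48), level 2 has 4 subproblems of length 2 (4 · 12 = 48), and level 3 has 8 subproblems of length 1 (8 · 6 = 48), so every level contributes the same bound of 6n = 48 operations.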
We’re interested in the number of operations performed across all
levels of the recursion tree. By the solution to Quiz 1.1, the recursion
tree has log2 n + 1 levels (levels 0 through log2 n, inclusive). Using
our bound of 6n operations per level, we can bound the total number
of operations by
    (number of levels) × (work per level) ≤ (log2 n + 1) · 6n = 6n log2 n + 6n,

matching the bound claimed in Theorem 1.2. Q.e.d.17

17

“Q.e.d.” is an abbreviation for quod erat demonstrandum, and means “that
which was to be demonstrated.” In mathematical writing, it is used at the end of
a proof to mark its completion.

On Primitive Operations
We measure the running time of an algorithm like
MergeSort in terms of the number of “primitive operations” performed. Intuitively, a primitive operation
performs a simple task (like adding, comparing, or
copying) while touching a small number of simple
variables (like 32-bit integers).18 Warning: in some
high-level programming languages, a single line of
code can mask a large number of primitive operations.
For example, a line of code that touches every element
of a long array translates to a number of primitive
operations proportional to the array’s length.

1.5.4 Solutions to Quizzes 1.1–1.2
Solution to Quiz 1.1

Correct answer: (b). The correct answer is ≈ log2 n. The reason
is that the input size decreases by a factor of two with each level of
the recursion. If the input length in level 0 is n, the level-1 recursive
calls operate on arrays of length n/2, the level-2 recursive calls on
arrays of length n/4, and so on. The recursion bottoms out at the
base cases, with input arrays of length at most one, where there are
no more recursive calls. How many levels of recursion are required?
The number of times you need to divide n by 2 before obtaining a
number that is at most 1. For n a power of 2, this is precisely the
definition of log2 n. (More generally, it is log2 n rounded up to the
nearest integer.)
Solution to Quiz 1.2
Correct answer: (c). The correct answer is that there are 2^j distinct
subproblems at recursion level j, and each operates on a subarray
of length n/2^j. For the first point, start with level 0, where there is
one recursive call. There are two recursive calls at level 1, and more
18

More precise definitions are possible, but we won’t need them.

generally, since MergeSort calls itself twice, the number of recursive
calls at each level is double the number at the previous level. This
successive doubling implies that there are 2^j subproblems at each
level j of the recursion tree. Similarly, since every recursive call gets
only half the input of the previous one, after j levels of recursion the
input length has dropped to n/2^j. Or for a different argument, we
already know that there are 2^j subproblems at level j, and the original
input array (of length n) is equally partitioned among these—exactly
n/2^j elements per subproblem.

1.6 Guiding Principles for the Analysis of Algorithms

With our first algorithm analysis under our belt (MergeSort, in Theorem 1.2), it’s the right time to take a step back and make explicit
three assumptions that informed our running time analysis and interpretation of it. We will adopt these three assumptions as guiding
principles for how to reason about algorithms, and use them to define
what we actually mean by a “fast algorithm.”
The goal of these principles is to identify a sweet spot for the
analysis of algorithms, one that balances accuracy with tractability.
Exact running time analysis is possible only for the simplest algorithms;
more generally, compromises are required. On the other hand, we
don’t want to throw out the baby with the bathwater—we still want
our mathematical analysis to have predictive power about whether
an algorithm will be fast or slow in practice. Once we find the right
balance, we’ll be able to prove good running time guarantees for
dozens of fundamental algorithms, and these guarantees will paint an
accurate picture of which algorithms tend to run faster than others.
1.6.1   Principle #1: Worst-Case Analysis

Our running time bound of 6n log2 n + 6n in Theorem 1.2 holds for
every input array of length n, no matter what its contents. We made
no assumptions about the input beyond its length n. Hypothetically,
if there were an adversary whose sole purpose in life was to concoct a
malevolent input designed to make MergeSort run as slow as possible,
the 6n log2 n + 6n bound would still apply. This type of analysis is


called worst-case analysis, since it gives a running time bound that is
valid even for the “worst” inputs.
Given how naturally worst-case analysis fell out of our analysis
of MergeSort, you might well wonder what else we could do. One
alternative approach is “average-case analysis,” which analyzes the
average running time of an algorithm under some assumption about
the relative frequencies of different inputs. For example, in the sorting
problem, we could assume that all input arrays are equally likely and
then study the average running time of different sorting algorithms. A
second alternative is to look only at the performance of an algorithm
on a small collection of “benchmark instances” that are thought to be
representative of “typical” or “real-world” inputs.
Both average-case analysis and the analysis of benchmark instances
can be useful when you have domain knowledge about your problem,
and some understanding of which inputs are more representative
than others. Worst-case analysis, in which you make absolutely no
assumptions about the input, is particularly appropriate for general-purpose subroutines designed to work well across a range of application
domains. To be useful to as many people as possible, these books focus
on such general-purpose subroutines and, accordingly, use worst-case
analysis to judge algorithm performance.
As a bonus, worst-case analysis is usually much more tractable
mathematically than its alternatives. This is one reason why worst-case analysis naturally popped out of our MergeSort analysis, even
though we had no a priori focus on worst-case inputs.
1.6.2   Principle #2: Big-Picture Analysis

The second and third guiding principles are closely related. Let’s
call the second one big-picture analysis (warning: this is not a standard term). This principle states that we should not worry unduly
about small constant factors or lower-order terms in running time
bounds. We’ve already seen this philosophy at work in our analysis of
MergeSort: when analyzing the running time of the Merge subroutine
(Lemma 1.1), we first proved an upper bound of 4ℓ + 2 on the number of operations (where ℓ is the length of the output array) and then
settled for the simpler upper bound of 6`, even though it suffers from
a larger constant factor. How do we justify being so fast and loose
with constant factors?


Mathematical tractability. The first reason for big-picture analysis is that it’s way easier mathematically than the alternative of
pinning down precise constant factors or lower-order terms. This
point was already evident in our analysis of the running time of
MergeSort.
Constants depend on environment-specific factors. The second justification is less obvious but extremely important. At the level
of granularity we’ll use to describe algorithms, as with the MergeSort
algorithm, it would be totally misguided to obsess over exactly what
the constant factors are. For example, during our analysis of the Merge
subroutine, there was ambiguity about exactly how many “primitive
operations” are performed in each loop iteration (4, 5, or something
else?). Thus different interpretations of the same pseudocode can
lead to different constant factors. The ambiguity only increases once
pseudocode gets translated into a concrete implementation in some
high-level programming language, and then translated further into
machine code—the constant factors will inevitably depend on the programming language used, the specific implementation, and the details
of the compiler and processor. Our goal is to focus on properties of
algorithms that transcend the details of the programming language
and machine architecture, and these properties should be independent
of small constant-factor changes in a running time bound.
Lose little predictive power. The third justification is simply
that we’re going to be able to get away with it. You might be
concerned that ignoring constant factors would lead us astray, tricking
us into thinking that an algorithm is fast when it is actually slow in
practice, or vice versa. Happily, this won’t happen for the algorithms
discussed in these books.19 Even though we won’t be keeping track
of lower-order terms and constant factors, the qualitative predictions
of our mathematical analysis will be highly accurate—when analysis
suggests that an algorithm should be fast, it will in fact be fast in
practice, and conversely. So while big-picture analysis does discard
some information, it preserves what we really care about: accurate
guidance about which algorithms tend to be faster than others.20
¹⁹ With one possible exception, the deterministic linear-time selection algorithm in the optional Section 6.3.
²⁰ It's still useful to have a general sense of the relevant constant factors, however. For example, in the highly tuned versions of MergeSort that you'll find in many programming libraries, the algorithm switches from MergeSort over to InsertionSort (for its better constant factor) once the input array length becomes small (for example, at most seven elements).

1.6.3   Principle #3: Asymptotic Analysis

Our third and final guiding principle is to use asymptotic analysis
and focus on the rate of growth of an algorithm’s running time, as
the input size n grows large. This bias toward large inputs was
already evident when we interpreted our running time bound for
MergeSort (Theorem 1.2), of 6n log2 n + 6n operations. We then
cavalierly declared that MergeSort is “better than” simpler sorting
methods with running time quadratic in the input size, such as
InsertionSort. But is this really true?
For concreteness, suppose we have a sorting algorithm that performs at most n^2/2 operations when sorting an array of length n, and consider the comparison

    6n log2 n + 6n   vs.   n^2/2.

Looking at the behavior of these two functions in Figure 1.6(a), we see that n^2/2 is the smaller expression when n is small (at most 90 or so), while 6n log2 n + 6n is smaller for all larger n. So when we
say that MergeSort is faster than simpler sorting methods, what we
really mean is that it is faster on sufficiently large instances.
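You can reproduce the crossover point yourself with a few lines of Python (the constant 6 and the n^2/2 bound are the ones used above; the exact crossover depends on those constants):

    import math

    def mergesort_bound(n):
        return 6 * n * math.log2(n) + 6 * n

    def quadratic_bound(n):
        return n * n / 2

    # Smallest n at which the MergeSort bound drops below n^2/2.
    n = 2
    while mergesort_bound(n) >= quadratic_bound(n):
        n += 1
    print(n)   # 90, matching the "at most 90 or so" crossover in Figure 1.6(a)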
Why should we care more about large instances than small ones?
Because large problems are the only ones that require algorithmic
ingenuity. Almost any sorting method you can think of would sort an
array of length 1000 instantaneously on a modern computer—there’s
no need to learn about divide-and-conquer algorithms.
Given that computers are constantly getting faster, you might
wonder if all computational problems will eventually become trivial to
solve. In fact, the faster computers get, the more relevant asymptotic
analysis becomes. Our computational ambitions have always grown
with our computational power, so as time goes on, we will consider
larger and larger problem sizes. And the gulf in performance between
algorithms with different asymptotic running times only becomes
wider as inputs grow larger. For example, Figure 1.6(b) shows the
difference between the functions 6n log2 n + 6n and n^2/2 for larger (but still modest) values of n, and by the time n = 1500 there is roughly a factor-10 difference between them. If we scaled n up by another factor of 10, or 100, or 1000 to start reaching interesting problem sizes, the difference between the two functions would be huge.

[Figure 1.6 plots f(n) = n^2/2 and f(n) = 6n log2 n + 6n: panel (a) for small values of n, panel (b) for medium values of n.]

Figure 1.6: The function n^2/2 grows much more quickly than 6n log2 n + 6n as n grows large. The scales of the x- and y-axes in (b) are one and two orders of magnitude, respectively, bigger than those in (a).
For a different way to think about asymptotic analysis, suppose
you have a fixed time budget, like an hour or a day. How does the
solvable problem size scale with additional computing power? With an
algorithm that runs in time proportional to the input size, a four-fold
increase in computing power lets you solve problems four times as
large as before. With an algorithm that runs in time proportional to
the square of the input size, you would be able to solve problems that
are only twice as large as before.
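The arithmetic behind this comparison is easy to reproduce. In the sketch below (the budget and constants are made up; only the ratios matter), a fixed budget of operations determines the largest solvable input size for a linear-time and for a quadratic-time algorithm, before and after a four-fold increase in computing power.

    budget = 1_000_000                            # operations affordable in the time budget

    linear_before = budget                        # largest n with n <= budget
    quadratic_before = int(budget ** 0.5)         # largest n with n^2 <= budget

    budget *= 4                                   # four times the computing power
    linear_after = budget
    quadratic_after = int(budget ** 0.5)

    print(linear_after / linear_before)           # 4.0: four times larger problems
    print(quadratic_after / quadratic_before)     # 2.0: only twice as large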
1.6.4   What Is a “Fast” Algorithm?

Our three guiding principles lead us to the following definition of a
“fast algorithm:”
A “fast algorithm” is an algorithm whose worst-case
running time grows slowly with the input size.
Our first guiding principle, that we want running time guarantees
that do not assume any domain knowledge, is the reason why we
focus on the worst-case running time of an algorithm. Our second
and third guiding principles, that constant factors are language- and


machine-dependent and that large problems are the interesting ones,
are the reasons why we focus on the rate of growth of the running
time of an algorithm.
What do we mean when we say that the running time of an algorithm “grows slowly?” For almost all of the problems we'll discuss, the holy grail
is a linear-time algorithm, meaning an algorithm with running time
proportional to the input size. Linear time is even better than our
bound on the running time of MergeSort, which is proportional to
n log n and hence modestly super-linear. We will succeed in designing
linear-time algorithms for some problems but not for others. In any
case, it is the best-case scenario to which we will aspire.
For-Free Primitives
We can think of an algorithm with linear or near-linear running time as a primitive that we can use
essentially “for free,” since the amount of computation
used is barely more than what is required just to read
the input. Sorting is a canonical example of a for-free
primitive, and we will also learn several others. When
you have a primitive relevant for your problem that
is so blazingly fast, why not use it? For example, you
can always sort your data in a preprocessing step,
even if you’re not quite sure how it’s going to be
helpful later. One of the goals of this book series is to
stock your algorithmic toolbox with as many for-free
primitives as possible, ready to be applied at will.
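As one illustration of the sort-first philosophy (a sketch of the idea, with a made-up helper name, not an algorithm from this chapter): after a single O(n log n) sort, checking an array for duplicate entries needs only one linear scan, rather than the quadratic all-pairs comparison we would otherwise fall back on.

    def has_duplicate(A):
        B = sorted(A)              # the "for-free" preprocessing step: O(n log n)
        # After sorting, any duplicates sit next to each other, so one O(n) scan suffices.
        return any(B[i] == B[i + 1] for i in range(len(B) - 1))

    print(has_duplicate([5, 3, 8, 9, 1, 7, 0, 2, 6, 4]))   # False
    print(has_duplicate([5, 3, 8, 3, 1]))                  # True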

The Upshot

• An algorithm is a set of well-defined rules for solving some computational problem.

• The number of primitive operations performed by the algorithm you learned in grade school to multiply two n-digit integers scales as a quadratic function of the number n.

• Karatsuba multiplication is a recursive algorithm for integer multiplication, and it uses Gauss's trick to save one recursive call over a more straightforward recursive algorithm.

• Seasoned programmers and computer scientists generally think and communicate about algorithms using high-level descriptions rather than detailed implementations.

• The MergeSort algorithm is a “divide-and-conquer” algorithm that splits the input array into two halves, recursively sorts each half, and combines the results using the Merge subroutine.

• Ignoring constant factors and lower-order terms, the number of operations performed by MergeSort to sort n elements grows like the function n log2 n. The analysis uses a recursion tree to conveniently organize the work done by all the recursive calls.

• Because the function log2 n grows slowly with n, MergeSort is typically faster than simpler sorting algorithms, which all require a quadratic number of operations. For large n, the improvement is dramatic.

• Three guiding principles for the analysis of algorithms are: (i) worst-case analysis, to promote general-purpose algorithms that work well with no assumptions about the input; (ii) big-picture analysis, which balances predictive power with mathematical tractability by ignoring constant factors and lower-order terms; and (iii) asymptotic analysis, which is a bias toward large inputs, which are the inputs that require algorithmic ingenuity.

• A “fast algorithm” is an algorithm whose worst-case running time grows slowly with the input size.

• A “for-free primitive” is an algorithm that runs in linear or near-linear time, barely more than what is required to read the input.

Test Your Understanding
Problem 1.1 Suppose we run MergeSort on the following input
array:

5 3 8 9 1 7 0 2 6 4
Fast forward to the moment after the two outermost recursive calls
complete, but before the final Merge step. Thinking of the two
5-element output arrays of the recursive calls as a glued-together
10-element array, which number is in the 7th position?
Problem 1.2 Consider the following modification to the MergeSort
algorithm: divide the input array into thirds (rather than halves),
recursively sort each third, and finally combine the results using
a three-way Merge subroutine. What is the running time of this
algorithm as a function of the length n of the input array, ignoring
constant factors and lower-order terms? [Hint: Note that the Merge
subroutine can still be implemented so that the number of operations
is only linear in the sum of the input array lengths.]
a) n
b) n log n
c) n(log n)2
d) n2 log n
Problem 1.3 Suppose you are given k sorted arrays, each with n
elements, and you want to combine them into a single array of kn


elements. One approach is to use the Merge subroutine from Section 1.4.5 repeatedly, first merging the first two arrays, then merging
the result with the third array, then with the fourth array, and so on
until you merge in the kth and final input array. What is the running
time taken by this successive merging algorithm, as a function of k
and n, ignoring constant factors and lower-order terms?
a) n log k
b) nk
c) nk log k
d) nk log n
e) nk^2
f) n^2 k
Problem 1.4 Consider again the problem of merging k sorted length-n arrays into a single sorted length-kn array. Consider the algorithm that first divides the k arrays into k/2 pairs of arrays, and uses the Merge subroutine to combine each pair, resulting in k/2 sorted length-2n arrays. The algorithm repeats this step until there is only one
length-kn sorted array. What is the running time of this procedure,
as a function of k and n, ignoring constant factors and lower-order
terms?
a) n log k
b) nk
c) nk log k
d) nk log n
e) nk^2
f) n^2 k


Challenge Problems
Problem 1.5 You are given as input an unsorted array of n distinct
numbers, where n is a power of 2. Give an algorithm that identifies the
second-largest number in the array, and that uses at most n + log2 n − 2
comparisons. [Hint: What information do you have left over after
computing the largest number?]
Programming Problems
Problem 1.6 Implement Karatsuba’s integer multiplication algorithm in your favorite programming language.21 To get the most out
of this problem, your program should invoke the language’s multiplication operator only on pairs of single-digit numbers.
For a concrete challenge, what’s the product of the following two
64-digit numbers?22
3141592653589793238462643383279502884197169399375105820974944592
2718281828459045235360287471352662497757247093699959574966967627

²¹ Food for thought: does it make your life easier if the number of digits of each integer is a power of 2?
²² If you need help or want to compare notes with other readers, visit the discussion forums at www.algorithmsilluminated.org.

Chapter 2

Asymptotic Notation

This chapter develops the mathematical formalism that encodes our
guiding principles for the analysis of algorithms (Section 1.6). The
goal is to identify a sweet spot of granularity for reasoning about
algorithms—we want to suppress second-order details like constant
factors and lower-order terms, and focus on how the running time
of an algorithm scales as the input size grows. This is done formally
through big-O notation and its relatives—concepts that belong in the
vocabulary of every serious programmer and computer scientist.

2.1   The Gist

Before getting into the mathematical formalism of asymptotic notation,
let’s make sure the topic is well motivated, that you have a strong
sense of what it’s trying to accomplish, and that you’ve seen a couple
of simple and intuitive examples.
2.1.1   Motivation

Asymptotic notation provides the basic vocabulary for discussing the
design and analysis of algorithms. It’s important that you know what
programmers mean when they say that one piece of code runs in
“big-O of n time,” while another runs in “big-O of n-squared time.”
This vocabulary is so ubiquitous because it identifies the right
“sweet spot” for reasoning about algorithms. Asymptotic notation is
coarse enough to suppress all the details you want to ignore—details
that depend on the choice of architecture, the choice of programming
language, the choice of compiler, and so on. On the other hand, it’s
precise enough to make useful comparisons between different high-level algorithmic approaches to solving a problem, especially on larger
inputs (the inputs that require algorithmic ingenuity). For example,

asymptotic analysis helps us differentiate between better and worse
approaches to sorting, better and worse approaches to multiplying
two integers, and so on.
2.1.2   The High-Level Idea

If you ask a practicing programmer to explain the point of asymptotic
notation, he or she is likely to say something like the following:
Asymptotic Notation in Seven Words

    suppress constant factors and lower-order terms

(Constant factors are too system-dependent; lower-order terms are irrelevant for large inputs.)

We’ll see that there’s more to asymptotic notation than just these
seven words, but ten years from now, if you remember only seven
words about it, these are good ones.
When analyzing the running time of an algorithm, why would
we want to throw away information like constant factors and lower-order terms? Lower-order terms, by definition, become increasingly
irrelevant as you focus on large inputs, which are the inputs that
require algorithmic ingenuity. Meanwhile, the constant factors are
generally highly dependent on the details of the environment. If we
don’t want to commit to a specific programming language, architecture,
or compiler when analyzing an algorithm, it makes sense to use a
formalism that does not focus on constant factors.
For example, remember when we analyzed MergeSort (Section 1.4)? We gave an upper bound on its running time of
6n log2 n + 6n
primitive operations, where n is the length of the input array. The
lower-order term here is the 6n, as n grows more slowly than n log2 n,
so it will be suppressed in asymptotic notation. The leading constant
factor of 6 also gets suppressed, leaving us with the much simpler
expression of n log n. We would then say that the running time of
MergeSort is “big-O of n log n,” written O(n log n), or that MergeSort


is an “O(n log n)-time algorithm.” 1 Intuitively, saying that something
is O(f (n)) for a function f (n) means that f (n) is what you’re left with
after suppressing constant factors and lower-order terms.2 This “big-O
notation” buckets algorithms into groups according to their asymptotic
worst-case running times—the linear (O(n))-time algorithms, the
O(n log n)-time algorithms, the quadratic (O(n2 ))-time algorithms,
the constant (O(1))-time algorithms, and so on.
To be clear, I’m certainly not claiming that constant factors
never matter in algorithm design. Rather, when you want to make a
comparison between fundamentally different ways of solving a problem,
asymptotic analysis is often the right tool for understanding which
one is going to perform better, especially on reasonably large inputs.
Once you’ve figured out the best high-level algorithmic approach
to a problem, you might well want to work harder to improve the
leading constant factor, and perhaps even the lower-order terms. By
all means, if the future of your start-up depends on how efficiently
you implement a particular piece of code, have at it and make it as
fast as you can.

2.1.3   Four Examples

We conclude this section with four very simple examples. They are
so simple that, if you have any prior experience with big-O notation,
you should probably just skip straight to Section 2.2 to start learning
the mathematical formalism. But if these concepts are completely
new to you, these simple examples should get you properly oriented.
Consider first the problem of searching an array for a given integer t.
Let’s analyze the straightforward algorithm that performs a linear
scan through the array, checking each entry to see if it is the desired
integer t.

¹ When ignoring constant factors, we don't even need to specify the base of the logarithm (as different logarithmic functions differ only by a constant factor). See Section 4.2.2 for further discussion.
² For example, even the function 10^100 · n is technically O(n). In these books, we will only study running time bounds where the suppressed constant factor is reasonably small.


Searching One Array
Input: array A of n integers, and an integer t.
Output: Whether or not A contains t.
for i := 1 to n do
if A[i] = t then
return TRUE
return FALSE
This code just checks each array entry in turn. If it ever finds the
integer t it returns true, and if it falls off the end of the array without
finding t it returns false.
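If it helps to see it in a real language, here is a direct Python translation of the pseudocode above (with 0-based indices, as is usual in Python):

    def contains(A, t):
        for i in range(len(A)):
            if A[i] == t:
                return True
        return False

    print(contains([2, 7, 1, 8], 1))   # True
    print(contains([2, 7, 1, 8], 9))   # False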
We haven’t formally defined what big-O notation means yet, but
from our intuitive discussion so far you might be able to guess the
asymptotic running time of the code above.
Quiz 2.1
What is the asymptotic running time of the code above for
searching one array, as a function of the array length n?
a) O(1)
b) O(log n)
c) O(n)
d) O(n2 )
(See Section 2.1.4 for the solution and discussion.)
Our last three examples concern different ways of combining two
loops. First, let’s think about one loop followed by another. Suppose
we’re now given two integer arrays A and B, both of length n, and
we want to know whether a target integer t is in either one. Let’s
again consider the straightforward algorithm, where we just search
through A, and if we fail to find t in A, we then search through B. If
we don’t find t in B either, we return false.


Searching Two Arrays
Input: arrays A and B of n integers each, and an
integer t.
Output: Whether or not A or B contains t.
for i := 1 to n do
if A[i] = t then
return TRUE
for i := 1 to n do
if B[i] = t then
return TRUE
return FALSE
What, in big-O notation, is the running time of this longer piece of
code?
Quiz 2.2
What is the asymptotic running time of the code above for
searching two arrays, as a function of the array lengths n?
a) O(1)
b) O(log n)
c) O(n)
d) O(n2 )
(See Section 2.1.4 for the solution and discussion.)
Next let’s look at a more interesting example of two loops that are
nested, rather than in sequence. Suppose we want to check whether
or not two given arrays of length n have a number in common. The
simplest solution is to check all possibilities. That is, for each index i
into the array A and each index j into the array B, we check if A[i]
is the same number as B[j]. If it is, we return true. If we exhaust all
the possibilities without ever finding equal elements, we can safely
return false.


Checking for a Common Element
Input: arrays A and B of n integers each.
Output: Whether or not there is an integer t contained
in both A and B.
for i := 1 to n do
for j := 1 to n do
if A[i] = B[j] then
return TRUE
return FALSE

The question is the usual one: in big-O notation, what is the running
time of this piece of code?
Quiz 2.3
What is the asymptotic running time of the code above for
checking for a common element, as a function of the array
lengths n?
a) O(1)
b) O(log n)
c) O(n)
d) O(n2 )
(See Section 2.1.4 for the solution and discussion.)

Our final example again involves nested loops, but this time
we’re looking for duplicate entries in a single array A, rather than
in two different arrays. Here’s the piece of code we’re going to
analyze.


Checking for Duplicates
Input: array A of n integers.
Output: Whether or not A contains an integer more
than once.
for i := 1 to n do
for j := i + 1 to n do
if A[i] = A[j] then
return TRUE
return FALSE
There are two small differences between this piece of code and the
previous one. The first and more obvious change is that we’re comparing the ith element of A to the jth element of A, rather than to
the jth element of some other array B. The second and more subtle
change is that the inner loop now begins at the index i + 1 rather than
the index 1. Why not start at 1, like before? Because then it would
also return true in the very first iteration (since clearly A[1] = A[1]),
whether or not the array has any duplicate entries! Correctness could
be salvaged by skipping all the iterations where i and j are equal, but
this would still be wasteful: each pair of elements A[h] and A[k] of A
would be compared to each other twice (once when i = h and j = k
and once when i = k and j = h), while the code above compares
them only once.
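A tiny experiment (an illustration, not a proof) confirms the iteration count: in the worst case where no duplicate is found, the double for loop above runs its inner test exactly n(n - 1)/2 times for an array of length n.

    def inner_iterations(n):
        # Count the iterations of the double for loop in the pseudocode above,
        # assuming no early return (no duplicates found).
        count = 0
        for i in range(n):
            for j in range(i + 1, n):
                count += 1
        return count

    for n in (5, 10, 100):
        print(inner_iterations(n), n * (n - 1) // 2)   # 10 10, 45 45, 4950 4950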
The question is the usual one: in big-O notation, what is the
running time of this piece of code?
Quiz 2.4
What is the asymptotic running time of the code above for
checking for duplicates, as a function of the array length n?
a) O(1)
b) O(log n)
c) O(n)


d) O(n2 )
(See Section 2.1.4 for the solution and discussion.)
These basic examples should have given you a strong intuitive sense
of how big-O notation is defined and what it is trying to accomplish.
Next we move on to both the mathematical development of asymptotic
notation and some more interesting algorithms.
2.1.4   Solutions to Quizzes 2.1–2.4

Solution to Quiz 2.1

Correct answer: (c). The correct answer is O(n). Equivalently, we
say that the algorithm has running time linear in n. Why is that
true? The exact number of operations performed depends on the
input—whether or not the target t is contained in the array A and,
if so, where in the array it lies. In the worst case, when t is not in
the array, the code will do an unsuccessful search, scanning through
the entire array (over n loop iterations) and returning false. The key
observation is that the code performs a constant number of operations
for each entry of the array (comparing A[i] with t, incrementing the
loop index i, etc.). Here “constant” means some number independent
of n, like 2 or 3. We could argue about exactly what this constant is
in the code above, but whatever it is, it is conveniently suppressed
in the big-O notation. Similarly, the code does a constant number
of operations before the loop begins and after it ends, and whatever
the exact constant may be, it constitutes a lower-order term that is
suppressed in the big-O notation. Since ignoring constant factors and
lower-order terms leaves us with a bound of n on the total number of
operations, the asymptotic running time of this code is O(n).
Solution to Quiz 2.2
Correct answer: (c). The answer is the same as before, O(n). The
reason is that the worst-case number of operations performed (in an
unsuccessful search) is twice that of the previous piece of code—first
we search the first array, and then the second array. This extra factor
of 2 contributes only to the leading constant in the running time


bound and is therefore suppressed when we use big-O notation. So
this algorithm, like the previous one, is a linear-time algorithm.
Solution to Quiz 2.3
Correct answer: (d). This time, the answer has changed. For this
piece of code, the running time is not O(n), but is O(n2 ). (“Big-O of
n squared,” also called a “quadratic-time algorithm.”) So with this
algorithm, if you multiply the lengths of the input arrays by 10, the
running time will go up by a factor of 100 (rather than a factor of 10
for a linear-time algorithm).
Why does this code have a running time of O(n2 )? The code
again does a constant number of operations for each loop iteration
(that is, for each choice of the indices i and j) and a constant number
of operations outside the loops. What’s different is that there’s now
a total of n^2 iterations of this double for loop, one for each choice of i ∈ {1, 2, . . . , n} and j ∈ {1, 2, . . . , n}. In our first example, there
were only n iterations of a single for loop. In our second example,
because the first for loop completed before the second one began, we
had only 2n iterations overall. Here, for each of the n iterations of the
outer for loop, the code performs n iterations of the inner for loop.
This gives n × n = n^2 iterations in all.
Solution to Quiz 2.4
Correct answer: (d). The answer to this question is the same as
the last one, O(n2 ). The running time is again proportional to the
number of iterations of the double for loop (with a constant number
of operations per iteration). So how many iterations are there? The
answer is roughly n^2/2. One way to see this is to remember that this piece of code does roughly half the work of the previous one (since the inner for loop starts at j = i + 1 rather than j = 1). A second way is to observe that there is exactly one iteration for each subset {i, j} of two distinct indices from {1, 2, . . . , n}, and there are precisely "n choose 2" = n(n - 1)/2 such subsets.³

³ "n choose 2" is also sometimes referred to as a "binomial coefficient." See also the solution to Quiz 3.1.


2.2   Big-O Notation

This section presents the formal definition of big-O notation. We
begin with a definition in plain English, illustrate it pictorially, and
finally give the mathematical definition.
2.2.1   English Definition

Big-O notation concerns functions T (n) defined on the positive integers
n = 1, 2, . . .. For us, T (n) will almost always denote a bound on the
worst-case running time of an algorithm, as a function of the size n of
the input. What does it mean to say that T (n) = O(f (n)), for some
“canonical” function f (n), like n, n log n, or n2 ? Here’s the definition
in English.
Big-O Notation (English Version)
T (n) = O(f (n)) if and only if T (n) is eventually bounded
above by a constant multiple of f (n).

2.2.2   Pictorial Definition

See Figure 2.1 for a pictorial illustration of the definition of big-O
notation. The x-axis corresponds to the parameter n, the y-axis to
the value of a function. Let T (n) be the function corresponding to
the solid line, and f (n) the lower dashed line. T (n) is not bounded
above by f (n), but multiplying f (n) by 3 results in the upper dashed
line, which does lie above T (n) once we go far enough to the right
on the graph, after the “crossover point” at n0. Since T(n) is indeed eventually bounded above by a constant multiple of f(n), we can say that T(n) = O(f(n)).
2.2.3   Mathematical Definition

Here is the mathematical definition of big-O notation, the definition
you should use in formal proofs.


[Figure 2.1 plots three functions of n: T(n) (solid line), f(n) (lower dashed line), and 3 · f(n) (upper dashed line), with the crossover point marked at n0.]

Figure 2.1: A picture illustrating when T(n) = O(f(n)). The constant c quantifies the “constant multiple” of f(n), and the constant n0 quantifies “eventually.”

Big-O Notation (Mathematical Version)
T (n) = O(f (n)) if and only if there exist positive constants
c and n0 such that
    T(n) ≤ c · f(n)    (2.1)

for all n ≥ n0.

This is a direct translation of the English definition in Section 2.2.1.
The inequality in (2.1) expresses that T (n) should be bounded above
by a multiple of f (n) (with the constant c specifying the multiple).
The “for all n ≥ n0” expresses that the inequality only needs to hold
eventually, once n is sufficiently large (with the constant n0 specifying
how large). For example, in Figure 2.1, the constant c corresponds
to 3, while n0 corresponds to the crossover point between the functions
T (n) and c · f (n).
A game-theoretic view. If you want to prove that T (n) =
O(f (n)), for example to prove that the asymptotic running time


of an algorithm is linear in the input size (corresponding to f (n) = n),
then your task is to choose the constants c and n0 so that (2.1) holds
whenever n ≥ n0. One way to think about this is game-theoretically,
as a contest between you and an opponent. You go first, and have to
commit to constants c and n0 . Your opponent goes second and can
choose any integer n that is at least n0 . You win if (2.1) holds, your
opponent wins if the opposite inequality T (n) > c · f (n) holds.
If T(n) = O(f(n)), then there are constants c and n0 such that (2.1) holds for all n ≥ n0, and you have a winning strategy
in this game. Otherwise, no matter how you choose c and n0 , your
opponent can choose a large enough n ≥ n0 to flip the inequality and
win the game.
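The following Python sketch plays the opponent's role over a finite range of inputs (the functions and constants below are made up for illustration). Failing to find a counterexample up to some limit is of course not a proof that T(n) = O(f(n)), but finding one refutes a proposed choice of c and n0.

    def find_counterexample(T, f, c, n0, limit=100_000):
        # The "opponent": search for an n >= n0 with T(n) > c * f(n).
        for n in range(n0, limit + 1):
            if T(n) > c * f(n):
                return n
        return None

    T = lambda n: 10 * n + 50
    f = lambda n: n
    print(find_counterexample(T, f, c=5, n0=1))     # 1: the choice c = 5, n0 = 1 loses
    print(find_counterexample(T, f, c=11, n0=50))   # None: 10n + 50 <= 11n once n >= 50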
A Word of Caution
When we say that c and n0 are constants, we mean
they cannot depend on n. For example, in Figure 2.1,
c and n0 were fixed numbers (like 3 or 1000), and we
then considered the inequality (2.1) as n grows arbitrarily large (looking rightward on the graph toward
infinity). If you ever find yourself saying “take n0 = n”
or “take c = log2 n” in an alleged big-O proof, you
need to start over with choices of c and n0 that are
independent of n.

2.3   Two Basic Examples

Having slogged through the formal definition of big-O notation, let’s
look at a couple of examples. These examples won’t provide us with
any insights we don’t already have, but they serve as an important
sanity check that big-O notation is achieving its intended goal, of
suppressing constant factors and lower-order terms. They are also a
good warm-up for the less obvious examples we will encounter later.
2.3.1   Degree-k Polynomials are O(n^k)

Our first formal claim is that if T (n) is a polynomial with some
degree k, then T(n) = O(n^k).


Proposition 2.1 Suppose

    T(n) = a_k n^k + · · · + a_1 n + a_0,

where k ≥ 0 is a nonnegative integer and the a_i's are real numbers (positive or negative). Then T(n) = O(n^k).
Proposition 2.1 says that with a polynomial, in big-O notation,
all you need to worry about is the highest degree that appears in
the polynomial. Thus, big-O notation really is suppressing constant
factors and lower-order terms.
Proof of Proposition 2.1: To prove this proposition, we need to use the mathematical definition of big-O notation (Section 2.2.3). To satisfy the definition, it's our job to find a pair of positive constants c and n0 (each independent of n), with c quantifying the constant multiple of n^k and n0 quantifying "sufficiently large n." To keep things easy to follow but admittedly mysterious, let's pull values for these constants out of a hat: n0 = 1 and c equal to the sum of absolute values of the coefficients:⁴

    c = |a_k| + · · · + |a_1| + |a_0|.

Both of these numbers are independent of n. We now need to show that these choices of constants satisfy the definition, meaning that T(n) ≤ c · n^k for all n ≥ n0 = 1.

To verify this inequality, fix an arbitrary positive integer n ≥ n0 = 1. We need a sequence of upper bounds on T(n), culminating in an upper bound of c · n^k. First let's apply the definition of T(n):

    T(n) = a_k n^k + · · · + a_1 n + a_0.

If we take the absolute value of each coefficient a_i on the right-hand side, the expression only becomes larger. (|a_i| can only be bigger than a_i, and since n^i is positive, |a_i| n^i can only be bigger than a_i n^i.) This means that

    T(n) ≤ |a_k| n^k + · · · + |a_1| n + |a_0|.

Why is this step useful? Now that the coefficients are nonnegative, we can use a similar trick to turn the different powers of n into a common power of n. Since n ≥ 1, n^k is only bigger than n^i for every i ∈ {0, 1, 2, . . . , k}. Since |a_i| is nonnegative, |a_i| n^k is only bigger than |a_i| n^i. This means that

    T(n) ≤ |a_k| n^k + · · · + |a_1| n^k + |a_0| n^k = (|a_k| + · · · + |a_1| + |a_0|) · n^k = c · n^k.

This inequality holds for every n ≥ n0 = 1, which is exactly what we wanted to prove. QED

⁴ Recall that the absolute value |x| of a real number x equals x when x ≥ 0, and −x when x ≤ 0. In particular, |x| is always nonnegative.

How do you know how to choose the constants c and n0 ? The usual
approach is to reverse engineer them. This involves going through a
derivation like the one above and figuring out on-the-fly the choices
of constants that let you push the proof through. We’ll see some
examples of this method in Section 2.5.
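For a concrete feel for the constants in the proof of Proposition 2.1, here is a small numerical check on a sample degree-3 polynomial T(n) = 2n^3 - 5n^2 + 3, for which the proof's recipe gives c = |2| + |-5| + |0| + |3| = 10 and n0 = 1. (A finite check is an illustration, not a substitute for the proof.)

    def T(n):
        return 2 * n**3 - 5 * n**2 + 3

    c = abs(2) + abs(-5) + abs(0) + abs(3)   # sum of absolute coefficient values
    print(c)                                  # 10
    print(all(T(n) <= c * n**3 for n in range(1, 1001)))   # True: T(n) <= c * n^3 here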
2.3.2   Degree-k Polynomials Are Not O(n^(k-1))

Our second example is really a non-example: a degree-k polynomial is O(n^k), but is not generally O(n^(k-1)).

Proposition 2.2 Let k ≥ 1 be a positive integer and define T(n) = n^k. Then T(n) is not O(n^(k-1)).
Proposition 2.2 implies that polynomials with distinct degrees
are distinct with respect to big-O notation. (If this weren’t true,
something would be wrong with our definition of big-O notation!)
Proof of Proposition 2.2: The best way to prove that one function
is not big-O of another is usually with a proof by contradiction. In
this type of proof, you assume the opposite of what you want to
prove, and then build on this assumption with a sequence of logically
correct steps that culminate in a patently false statement. Such a
contradiction implies that the assumption can’t be true, and this
proves the desired statement.
So, assume that n^k is in fact O(n^(k-1)); we proceed to derive a contradiction. What does it mean if n^k = O(n^(k-1))? That n^k is eventually bounded by a constant multiple of n^(k-1). That is, there are positive constants c and n0 such that

    n^k ≤ c · n^(k-1)

for all n ≥ n0. Since n is a positive number, we can cancel n^(k-1) from both sides of this inequality to derive

    n ≤ c

for all n ≥ n0. This inequality asserts that the constant c is bigger than every positive integer, a patently false statement (for a counterexample, take n equal to c + 1, rounded up to the nearest integer). This shows that our original assumption that n^k = O(n^(k-1)) cannot be correct, and we can conclude that n^k is not O(n^(k-1)). QED

2.4   Big-Omega and Big-Theta Notation

Big-O notation is by far the most important and ubiquitous concept
for discussing the asymptotic running time of algorithms. A couple
of its close relatives, the big-omega and big-theta notations, are also
worth knowing. If big-O is analogous to “less than or equal to (≤),” then big-omega and big-theta are analogous to “greater than or equal to (≥)” and “equal to (=),” respectively. Let's now treat them a little
more precisely.
2.4.1   Big-Omega Notation

The formal definition of big-omega notation parallels that of big-O
notation. In English, we say that one function T (n) is big-omega of
another function f (n) if and only if T (n) is eventually bounded below
by a constant multiple of f(n). In this case, we write T(n) = Ω(f(n)).
As before, we use two constants c and n0 to quantify “constant multiple”
and “eventually.”
Big-Omega Notation (Mathematical Version)
T(n) = Ω(f(n)) if and only if there exist positive constants c and n0 such that

    T(n) ≥ c · f(n)

for all n ≥ n0.

You can imagine what the corresponding picture looks like:


[The picture plots T(n) (solid line), f(n) (upper dashed line), and (1/4) · f(n) (lower dashed line) against n, with the crossover point marked at n0.]

T(n) again corresponds to the function with the solid line. The function f(n) is the upper dashed line. This function does not bound T(n) from below, but if we multiply it by the constant c = 1/4, the result (the lower dashed line) does bound T(n) from below for all n past the crossover point at n0. Thus T(n) = Ω(f(n)).
2.4.2   Big-Theta Notation

Big-theta notation, or simply theta notation, is analogous to “equal to.”
Saying that T(n) = Θ(f(n)) just means that both T(n) = Ω(f(n)) and T(n) = O(f(n)). Equivalently, T(n) is eventually sandwiched between two different constant multiples of f(n).⁵
Big-Theta Notation (Mathematical Version)
T(n) = Θ(f(n)) if and only if there exist positive constants c1, c2, and n0 such that

    c1 · f(n) ≤ T(n) ≤ c2 · f(n)

for all n ≥ n0.

⁵ Proving this equivalence amounts to showing that one version of the definition is satisfied if and only if the other one is. If T(n) = Θ(f(n)) according to the second definition, then the constants c2 and n0 prove that T(n) = O(f(n)), while the constants c1 and n0 prove that T(n) = Ω(f(n)). In the other direction, suppose you can prove that T(n) = O(f(n)) using the constants c2 and n0' and T(n) = Ω(f(n)) using the constants c1 and n0''. Then T(n) = Θ(f(n)) in the sense of the second definition, with constants c1, c2, and n0 = max{n0', n0''}.


A Word of Caution
Algorithm designers often use big-O notation even
when big-theta notation would be more accurate. This
book will follow that tradition. For example, consider
a subroutine that scans an array of length n, performing a constant number of operations per entry (like
the Merge subroutine in Section 1.4.5). The running
time of such a subroutine is obviously Θ(n), but it's
common to only mention that it is O(n). This is
because, as algorithm designers, we generally focus
on upper bounds—guarantees about how long our
algorithms could possibly run.
The next quiz checks your understanding of big-O, big-omega, and
big-theta notation.
Quiz 2.5
Let T(n) = n^2/2 + 3n. Which of the following statements
are true? (There might be more than one correct answer.)
a) T(n) = O(n)
b) T(n) = Ω(n)
c) T(n) = Θ(n^2)
d) T(n) = O(n^3)
(See Section 2.4.5 for the solution and discussion.)
2.4.3   Little-O Notation

There’s one final piece of asymptotic notation, “little-o notation,”
that you see from time to time. If big-O notation is analogous to
“less than or equal to,” little-o notation is analogous to “strictly less
than.” 6
⁶ Similarly, there is a “little-omega” notation that corresponds to “strictly greater than,” but we won't have occasion to use it. There is no “little-theta” notation.


Little-O Notation (Mathematical Version)
T (n) = o(f (n)) if and only if for every positive constant
c > 0, there exists a choice of n0 such that
    T(n) ≤ c · f(n)    (2.2)

for all n ≥ n0.

Proving that one function is big-O of another requires only two
constants c and n0 , chosen once and for all. To prove that one
function is little-o of another, we have to prove something stronger,
that for every constant c, no matter how small, T (n) is eventually
bounded above by the constant multiple c · f (n). Note that the
constant n0 chosen to quantify “eventually” can depend on c (but not
n!), with smaller constants c generally requiring bigger constants n0 .
For example, for every positive integer k, n^(k-1) = o(n^k).⁷
2.4.4   Where Does Notation Come From?

Asymptotic notation was not invented by computer scientists—it has
been used in number theory since around the turn of the 20th century.
Donald E. Knuth, the grandfather of the formal analysis of algorithms,
proposed using it as the standard language for discussing rates of
growth, and in particular for algorithm running times.
“On the basis of the issues discussed here, I propose
that members of SIGACT,8 and editors of computer
science and mathematics journals, adopt the O, Ω, and Θ notations as defined above, unless a better alternative
can be found reasonably soon.” 9
⁷ Here's the proof. Fix an arbitrary constant c > 0. In response, choose n0 to be 1/c, rounded up to the nearest integer. Then for all n ≥ n0, n0 · n^(k-1) ≤ n^k and hence n^(k-1) ≤ (1/n0) · n^k ≤ c · n^k, as required.
⁸ SIGACT is the special interest group of the ACM (Association for Computing Machinery) that concerns theoretical computer science, and in particular the analysis of algorithms.
⁹ Donald E. Knuth, “Big Omicron and Big Omega and Big Theta,” SIGACT News, Apr.-June 1976, page 23. Reprinted in Selected Papers on Analysis of Algorithms (Center for the Study of Language and Information, 2000).


2.4.5   Solution to Quiz 2.5

Correct answers: (b), (c), (d). The final three responses are all correct, and hopefully the intuition for why is clear. T(n) is a quadratic function. The linear term 3n doesn't matter for large n, so we should expect that T(n) = Θ(n^2) (answer (c)). This automatically implies that T(n) = Ω(n^2) and hence T(n) = Ω(n) also (answer (b)). Note that Ω(n) is not a particularly impressive lower bound on T(n), but it is a legitimate one nonetheless. Similarly, T(n) = Θ(n^2) implies that T(n) = O(n^2) and hence also T(n) = O(n^3) (answer (d)). Proving these statements formally boils down to exhibiting appropriate constants to satisfy the definitions. For example, taking n0 = 1 and c = 1/2 proves (b). Taking n0 = 1 and c = 4 proves (d). Combining these constants (n0 = 1, c1 = 1/2, c2 = 4) proves (c). The argument in the proof of Proposition 2.2 can be used to prove formally that (a) is not a correct answer.
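If you want to see the constants above in action, this short check verifies them for T(n) = n^2/2 + 3n over a finite range of n (so it is illustrative rather than a proof).

    def T(n):
        return n * n / 2 + 3 * n

    ns = range(1, 10_001)
    print(all(T(n) >= 0.5 * n for n in ns))                     # (b): the Omega(n) bound
    print(all(0.5 * n**2 <= T(n) <= 4 * n**2 for n in ns))      # (c): the Theta(n^2) bounds
    print(all(T(n) <= 4 * n**3 for n in ns))                    # (d): the O(n^3) bound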

2.5   Additional Examples

This section is for readers who want additional practice with asymptotic notation. Other readers can skip the three additional examples
here and proceed straight to Chapter 3.
2.5.1   Adding a Constant to an Exponent

First we have another example of a proof that one function is big-O
of another.
Proposition 2.3 If T(n) = 2^(n+10), then T(n) = O(2^n).
That is, adding a constant to the exponent of an exponential
function does not change its asymptotic rate of growth.
Proof of Proposition 2.3: To satisfy the mathematical definition of
big-O nota