Skip to content

Madhu: Instructions on Posting


Go to (Come to?) this website: ; and scroll down all the way to the bottom. At the bottom right, you’ll see a bunch of buttons under the header “Meta”. One of these is “Login”. Press the button. Enter the username and password sent to you. You should now be able to add/edit posts.


Algebra and Computation: Welcome

Welcome to Algebra and Computation, the blog partner of Algebra and Computation, the website for the course of the same name at MIT.

Hopefully we’ll blog about things we care about there, here.

Elette Boyle: LLL Lattice Basis Reduction Algorithm


Having just come back from a workshop on lattices, I would have to say my current algorithm of choice is the LLL Lenstra-Lenstra-Lovasz lattice basis reduction algorithm. Given a basis B = (v_1,…,v_n) for a lattice in R^n, the algorithm outputs an new basis for the same lattice whose vectors are shorter and more orthogonal. This is useful for things like trying to find a short vector in a lattice given only a “bad” (ie, skewed, long-vector) basis. Indeed, LLL can be used to solve the Shortest Vector Problem within exponential approximation factors, which is surprisingly nontrivial and useful. What impresses me about this algorithm is that it is very simple and yet has a wide array of applications, by framing (sometimes seemingly unrelated) problems in the language of lattices. Plus, after 30 years it still remains our best tool for many of these problems.

The algorithm loosely works by repeating three steps: (1) computing the Gram Schmidt orthogonalization of the current basis, (2) stepping through the basis vectors v_i and subtracting out the projections onto the Gram Schmidt vectors up to i (rounding to ensure the resulting vector stays in the lattice), and (3) swapping basis vectors occasionally, if a certain condition holds.

An example application of the algorithm is in cryptanalyzing schemes based on knapsack or subset sum. (Yes, this is the cryptographer’s perspective of solving these problems) : ) In the subset sum problem, you are given a modulus N, a huge list of numbers x = (x_1,…,x_n), and a challenge value z such that z = \sum_{i \in I} x_i mod N for some secret subset I of [n]; the problem is to identify the corresponding subset I of values who sum to z. One way to view this is as a lattice problem. Consider the lattice of all vectors y = (y_1,…,y_{n+1}) such that y \cdot (x|-z) = 0: ie, for which \sum_{i=1}^n (x_i)(y_i) = (z)(y_{i+1}). In this language, the desired set I of indices corresponds to a 0-1 vector contained in the lattice, such that its last component is 1. In fact, this is going to be one of the only *short* vectors in this lattice. Thus, one approach to solving the problem is to run the LLL algorithm on a basis of the lattice and use it to find short vectors.

Jing Jian: Ellipsoid algorithm

My favorite algorithm is the ellipsoid algorithm in solving linear program-
ming problems. The algorithm uses the duality theorem to first transform a min-
imization/maximization linear programming problem into a problem of fi nding
a feasible solution for a linear system. Let P = {ax = b} be a system of linear
inequalities (x and b are vectors). Let the point set E = {(x-z)^T D^{-1} (x-z)} be
a generalized ellipsoid containing the polytope defi ned by P centered at point
z. The algorithm is as follows:
We start with E and test to see if z satis es P: if it does, the algorithm ter-
minates; if it does not, we find the hyperplane defi ned by the violated constraint
and nd ellipsoid that contains the half of E on the other side of the hyperplane.
We substitute the smaller ellipsoid for E and repeat the above procedure until
the ellipsoid’s center is a feasible point.
The algorithm terminates in O(n^6). The analysis of the algorithm bases on
upper bounding the size of the original ellipsoid and lower bounding the size of
the nal ellipsoid using the fact that our input is polynomial sized. Combined
with the fact that with each iteration of the loop the size of the ellipsoid shrinks
by a constant ratio.

Eric Price: Edge-connectivity via Network-Coding

I’m not sure what my favorite “surprising” algorithm of all time is, but my favorite algorithm from the last couple years is for single-source edge connectivity via network coding (Ho Yee Cheung, Lap Chi Lau, Kai Man Leung, FOCS 2011).

The nicest, simplest case is for single-source edge connectivity on a DAG. If the source s has degree d, we will compute edge connectivity to all other nodes in O(md^2) time. We associate with each vertex v a subspace S_v of (F_q)^d, for q >> m^2 . At the beginning, S_s it the entire space, and S_v = {0} for v != s. We then visit each vertex in topological order. At each vertex v, we scan through the outgoing edges. For each edge out to a vertex u, we choose a random element x of S_v. We then update S_u to be the the subspace spanned by S_u and x. With probability 1-dn/q, at the end the edge connectivity to each vector v is the dimension of S_v.

Updating each subspace can be done in O(d^2) time, giving total time O(md^2). Correctness follows from network coding analysis: for any vertex t consider the min-cut of (s-t), which has size equal to the edge connectivity R. Then R vectors were sent from the component containing s to the component containing t, so every subspace in the component containing t is in the span of those R vectors; hence the dimension of S_t is at most R.

To show that the dimension is at least R with high probability, note that we can think of each “message” x sent from v to u as a random linear combination of the “messages” received by v, with the source receiving the d elementary unit vectors along dummy edges D_1, …, D_d. Define a variable k_{e, e’} to be the coefficient of the message sent across edge e in the message sent across e’, where the endpoint of e is the same as the start point of e’. Then the ith coefficient of the message sent along each edge e is sum_{paths ending at e and starting at D_i} product_{(e, e’) consecutive on path} k_{e, e’}.

Consider the R disjoint paths from s to t. We can create an R x d matrix M, where row contains the message received by t along one of the paths. We can regard M as a random variable over the variables k. If k_{e, e’} = 1 for (e, e’) along one of the R disjoint paths and 0 otherwise, then M will have full rank. Hence det(M) is a non-zero polynomial. But det(M) has degree at most dn, so the probability that det(M) is zero is at most dn/q by the Schwarz-Zippel lemma. But if M has full rank, then the R received messages are independent so S_t has dimension R as desired.

Jin Hao Wan: Probabilistic LSAP and 2-out graph


I read Walkup’s approximation algorithm for the probabilistic linear sum assignment problem (LSAP) in Michael Steele’s book, Probability Theory and Combinatorial Optimization, this past IAP. The algorithm became one of my favorites instantly.

In LSAP, we have a set of n machines, indexed by i, and a set of n jobs, indexed by j. The cost to do job j on machine i is c_{ij}. Our goal is to find an assignment, or a permutation \pi: [n] \mapsto [n], that solves: A_n = \min_{\pi} \sum_{i=1}^{n} c_{i \pi(i)}. The problem becomes interesting once we add “probabilistic” to it. Now c_{ij}’s are independent random variables with the uniform distribution on [0,1]; and we aim to compute the expected value of A_n.

To approximate E[A_n], we can upper-bound it by the expected cost of any assignment algorithm. If we start out with the greedy algorithm, looking at either one job at a time (locally greedy) or all unassigned jobs at once (globally greedy), we will find out that the expected cost in this case is \Omega(\log{n}). In 1979, however, Walkup came up with an algorithm whose expected cost is 3 + o(1). Thus, E[A_n] is bounded above by a constant, irrespective of n!

Before we go on, it is convenient to view the machines and the jobs as the two partitions of a bipartite graph, and the costs as edges. The algorithm selects a subset of edges, E, based on a probability space defined in terms of c_{ij}’s. It turns out that edges in E have two desirable properties. First, E forms a random 2-out graph. A 2-out graph is a directed bipartite graph where the out-degree of every vertex is 2. Walkup proves that with high probability a random 2-out graph has a perfect matching. This allows the algorithm to reduce the original problem to finding a minimum assignment in a much sparser 2-out graph. Now the second property of E comes into play: edges in E have small costs on average. This implies that the total cost, C, of the assignment returned by the algorithm will be small. Taking the expectation, we find out that E[C] = 3 + o(1).

Note: The proof in Walkup’s paper is non-constructive; so the word “Walkup’s approximation algorithm” in the first sentence of this writeup is misleading. It was Karp, Ronnooy Kan and Vohra who came up with the algorithm in 1995. Since my favorite part of this algorithm (introducing the idea of a 2-out graph) is due to Walkup, I feel justified in using “Walkup’s approximation algorithm.”

Alan Guo: NL = coNL


My favorite surprising algorithm is the one that shows that NL = coNL.

The specific problem solved by the algorithm is the decision problem: Given a graph G and nodes s and t, is t unreachable from s? It’s clear that the complementary problem, deciding if t is reachable from s, is in NL since one can nondeterministically guess the path from s to t, storing only one node at a time. This problem happens to be NL-complete, so if it is shown that it is in coNL (equivalently, the unreachability problem is in NL), then it is coNL-complete, and hence NL=coNL.

The idea is as follows. Suppose in addition to G, s, and t, we know the number k of nodes reachable in G from s. Then an NL machine could do the following: guess all the nodes that are reachable from s (guessing a path for each one), and check that the number of such nodes is k and that t is not among them. This  would suffice to prove that t is not reachable from s. But how to acquire the number k? The idea is to inductively compute k(i), the number of nodes reachable from s in at most i steps, for i=1,…,n, using a similar idea. To determine if a candidate node v is reachable in i steps, we loop through all nodes and guess if they are reachable in i-1 steps, and if so, check if any of them are connected to v. If so, then v is reachable in i steps, otherwise if we already checked k(i-1) candidate nodes, then we know v cannot be reachable in i steps. We repeat for all nodes v. By the end, we will have computed k = k(n).

This algorithm is surprising since the result is quite counterintuitive (at least to me)! It is widely suspected that NP and coNP are not equal, since it seems easy to verify the existence of something but hard to verify the nonexistence of something. On the other hand, this algorithm shows that verifying existence and nonexistence are equally doable in logarithmic space.

Michael Forbes: Generating Random Pre-factored Integers

One neat algorithm: Generating a uniform prime in [n], along with the factorization of p-1, all in expected poly(n) time. (Assumes some primality testing algo.  Given an RP algorithm, some error in the distribution occurs. With AKS, get the exact distribution).

It was first done by Eric Bach (say 1970’s), but then was simplified (but worse running time) by Adam Kalai post 2000.  The main idea is to first generate a random number m in [n], and then repeat until m is prime.  To generate such an m, we can follow the heuristic that we take all primes in [n], and for such a prime p, we return p^e with probability 1/p^e, with suitable normalization so this is a probability distribution.  We then output the product of all such p^e. Thus, a number m will be generated with probability 1/m.  If we keep that number with probability m/n (and otherwise repeat the algorithm), then we have generated m with probability 1/n.  One can show that there is a probability 1/lg n of success (using Merten’s theorem).

However, the issue in the above is that we cannot loop through all primes in [n], as there are too many.  Even if we did, not that many primes would even generate p^e with e>0, so most of that time would be wasted.  Thus, to make the above efficient, we will develop an alternate way to sample from the desired distribution. The following process works: start with n, and select uniformly from [n].  Call the result t.  Then select uniformly from [t], and repeat this process with the new value.  One can show that the resulting distribution has that a number m is output e times with probability 1/m^e (pre-normalization).  Thus, performing this process, and then selecting the primes, will yield the needed distribution to select a uniform number from [n].

For details, the book by Shoup (see for an online coup) has a nice writeup.

Zeyuan Allen Zhu: Disjoint-Set Unions

I think the algorithm that surprised me the most in my lifetime is the disjoint union data structure. Perhaps because I was too young (10 years ago, at age of 13), but definitely that was my first time seeing an amortized complexity bound, and in addition to it, if you only do path compression the complexity seems to be log* but if you add it with union-by-rank it decreases to inverse Ackermann. From a practical perspective, doing path compression is enough… So amazing, isn’t it?

Mohammad Bavarian: Equivalence of Read-Once Branching Programs

I was going to nominate NL=Co-NL  but I thought some people might already comment on that.

For me, one of the very surprising algorithm I saw recently was the randomized algorithm for equivalence of read-once branching programs.

As usual when faced with such a problem, you tell yourself that two once-read branching programs could be almost but a single  input out of 2^n possible inputs could behave the same so to efficiently decide the equivalency looks like an impossible task. The great thing is the read-once property allows you to extend the notion of branching programs to much larger set of inputs, while preserving consistency.  By going over a suitably large finite field. you extend the input range of variables of the branching program, while preserving the fact that the (multilinear) polynomial you get from two branching program is the same iff they are equivalent. The equivalence test then follows from another great algorithm polynomial identity testing.

This whole consistent arithmetization I think is a wonderful and surprising gem in this case.

TB Schardl: Splay Trees

I think one of the more surprising results I’ve seen in algorithms is splay trees.Splay trees are self-adjusting binary search trees that achieve optimal performance bounds for many problems on BST’s simultaneously. Splay trees thus address the problem of storing an ordered set of elements and allowing those elements to be accessed efficiently.

The idea behind splay trees is to always move an element to the root of the tree immediately after it is accessed. The element is moved up the tree using BST tree rotation operations. In particular, this movement operation is performed using a sequence of double tree rotations (so called “zig-zig” or “zig-zag” rotations), possibly followed by one single rotation. Hence, after accessing an element x in the tree, the tree is modified to place x at the root of the tree, regardless of the balance of the resulting tree.

What I find remarkable about splay trees is that this simple strategy of invariably placing the most recently accessed element at the root of the tree allows splay trees to achieve optimal performance (amortized) for a variety of problems. For example, splay trees achieve good balance, meaning that for a sequence S of m accesses on n<m elements, the cost of performing S in a splay tree is O((m+n)log n), which is as good as any balanced tree. Furthermore, splay trees achieve “static optimality,” meaning that they are as good as any fixed tree for performing the sequence S of accesses. Formally, if an item i is accessed p_i*m times in S, then the cost of performing S is O(m + m \sum_{i=1}^n p_i log 1/p_i) in a splay tree, which matches the entropic lower bound for performing S on any static tree. Other performance properties on splay trees can be shown, such as the “working set theorem” and the “static finger theorem,” and all of these results follow from the same amortized analysis of splay trees, using different weight functions on the nodes of the tree.

If you would like more of an explanation as to how splay trees or their analysis works, feel free to ask me.