[Solved]: Greedy algorithm for Set Cover problem - need help with approximation

Problem Detail: I want to approximate how close is the greedy algorithm to the optimal solution for the Set Cover Problem, which I’m sure most of you are familiar with, but just in case, you can visit the link above.
The problem is NP-Hard, and I’m trying to find a bound on how well does the greedy algorithm perform.
I know it looks a lot, but please bare with me. I pretty much did most of the work, I’m just missing that last small piece. Here is the pseudo code: Input: $U$ – set of elements, $F$ – family of sets s.t. $bigcup_{Sin F}S=U$
Output: $C$ – a family of sets; $Csubseteq F$ s.t. $bigcup_{Sin C}S=U$

initially C is empty  while U is not empty do:     choose S from F that maximizes the cover of elements in U     add S to C     subtract S's elements from U  return C

The algorithm is pretty straight forward, and it is easy to see that it is indeed polynomial. This is my attempt to approximate:
Claim 1: In a set $U$ of $m$ objects, that can be covered with $k$ sets, there has to be a set $Ssubseteq U$ whose size is at least $frac{1}{k}m$.
Proof: Trivial. (I decided not to prove it) A corollary is that given the situation described in that claim, the greedy algorithm will choose a set whose size is at least $frac{1}{k}m$. Claim 2: Given a universe $U$, if there exist a cover of size $k$, then after $k$ iterations, the greedy algorithm will cover at least half of the elements, meaning at least $frac{1}{2}n$ elements.
Proof: By claim 1, in the first iteration the algorithm will cover at least $frac{1}{k}n$ elements. Upon entering the second iteration, there are at most $n-nfrac{1}{k}$ elements, and so the greedy algorithm will at least cover additional $frac{1}{k}(n-nfrac{1}{k})$ elements. In general, on the $i$’th iteration, the algorithm will cover $frac{1}{k}(n-nfrac{i-1}{k})$ elements. So after $k$ iterations:
$$sum_{i=1}^{k}frac{1}{k}(n-nfrac{i-1}{k})=sum_{i=0}^{k-1}frac{1}{k}(n-nfrac{i}{k})$$
$$=sum_{i=0}^{k-1}frac{n}{k}-sum_{i=0}^{k-1}frac{ni}{k^2}=n-frac{n}{k^2}sum_{i=0}^{k-1}i$$
$$=n-frac{n}{k^2}(frac{k(k-1)}{2})=n-frac{n}{2}(frac{k-1}{k})geqfrac{1}{2}n$$ OK, now this is where I need help: I know that in the first $k$ iterations, the algorithm picks at least half of the elements. After another $k$ iterations, another half was covered, out of what’s left (meaning another $frac{1}{4}n$). So in general, I know that the bound is $klog n$, I just can’t figure out how to formalize it.
What formula represents this behaviour? $T(ki)=T(frac{1}{2^i}n)$ and solve for $i$? It didn’t work…
What formula or equation should I solve to actually show that the number of iterations is bounded by $klog n$?

Asked By : so.very.tired

Answered By : Yuval Filmus

You have shown that if there is a set cover of size $k$ and there are $n$ elements left uncovered, then the greedy algorithm finds $k$ sets which cover at least $n/2$ elements, and so at most $n/2$ elements are left uncovered. You can prove by induction that if there is a set cover of size $k$ and there are $n$ elements left uncovered, then for each $t$, the first $tk$ sets found by the greedy algorithm cover all but at most $n/2^t$ elements. The idea now is to pick a value of $t$ such that $n/2^t < 1$. Since the number of elements left uncovered by the algorithm is an integer, if the number $m$ of elements left uncovered satisfies $m < 1$, then in fact $m = 0$, i.e., all elements are covered. As a side note, you can actually improve on your reasoning and show that the first $k$ sets cover at least $(1-1/e)n$ elements, and so leave only $n/e$ elements uncovered. This improves the bound from your $klog_2 n$ to the sharp $kln n$, which is known to be tight (unless P=NP).

Best Answer from StackOverflow

Question Source : http://cs.stackexchange.com/questions/40764

[Solved]: Greedy algorithm for Set Cover problem – need help with approximation

Asked By : so.very.tired

Answered By : Yuval Filmus

Best Answer from StackOverflow

Related