Problem Detail: Suppose we're receiving numbers in a stream. After each number is received, a weighted sum of the last $N$ numbers needs to be calculated, where the weights are always the same, but arbitrary. How efficiently can this be done if we are allowed to keep a data structure to help with the computation? Can we do any better than $\Theta(N)$, i.e. recomputing the sum each time a number is received?

For example: Suppose the weights are $W = \langle w_1, w_2, w_3, w_4 \rangle$. At one point we have the list of the last $N$ numbers $L_1 = \langle a, b, c, d \rangle$ and the weighted sum $S_1 = w_1*a + w_2*b + w_3*c + w_4*d$. When another number, $e$, is received, we update the list to get $L_2 = \langle b, c, d, e \rangle$ and we need to compute $S_2 = w_1*b + w_2*c + w_3*d + w_4*e$.

Consideration using FFT

A special case of this problem appears to be solvable efficiently by employing the Fast Fourier Transform. Here, we compute the weighted sums $S$ in multiples of $N$. In other words, we receive $N$ numbers and only then can we compute the corresponding $N$ weighted sums. To do this, we need the $N-1$ past numbers (for which sums have already been computed) and the $N$ new numbers, in total $2N-1$ numbers. If this vector of input numbers and the weight vector $W$ define the coefficients of the polynomials $P(x)$ and $Q(x)$, with the coefficients of $Q$ reversed, we see that the product $P(x) \times Q(x)$ is a polynomial whose coefficients of $x^{N-1}$ up to $x^{2N-2}$ are exactly the weighted sums we seek. These can be computed using the FFT in $\Theta(N \log N)$ time, which gives us an average of $\Theta(\log N)$ time per input number. This is, however, not a solution to the problem as stated, because it is required that the weighted sum be computed efficiently each time a new number is received; we cannot delay the computation.
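The claim about the product coefficients can be sanity-checked with a short Python sketch (the function names are mine, not from the question). A plain $O(N^2)$ polynomial multiplication stands in for the FFT here, since only the indexing of the coefficients is being verified; replacing it with an FFT-based multiplication is what would give the $\Theta(N \log N)$ batch time.

```python
def poly_mul(p, q):
    """Multiply two coefficient lists (constant term first).
    Naive O(N^2); an FFT-based multiply would do this in O(N log N)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

def batch_weighted_sums(window, weights):
    """window: the 2N-1 most recent inputs, oldest first.
    weights: w_1 .. w_N, with w_1 applied to the oldest number of each window.
    Returns the N sliding weighted sums, one per newly received number."""
    n = len(weights)
    assert len(window) == 2 * n - 1
    # Coefficients x^{N-1} .. x^{2N-2} of P(x) * Q(x), with Q's coefficients
    # reversed, are exactly the N weighted sums.
    prod = poly_mul(window, list(reversed(weights)))
    return prod[n - 1:2 * n - 1]
```

For instance, with $N = 4$, `batch_weighted_sums([a, b, c, d, e, f, g], [w1, w2, w3, w4])` returns the four sums starting with $w_1 a + w_2 b + w_3 c + w_4 d$ and ending with $w_1 d + w_2 e + w_3 f + w_4 g$.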
Asked By : Ambroz Bizjak
Answered By : Yuval Filmus
Here is an elaboration of your approach. Every $m$ iterations, we use the FFT algorithm to compute $m$ values of the convolution in time $O(n \log n)$, assuming that the subsequent $m$ values are zero. In other words, we are computing $$ \sum_{i=0}^{n-1} w_i a_{t-i+k}, \quad 0 \leq k \leq m-1, $$ where $w_i$ are the $n$ weights (or the reversed weights), $a_i$ is the input sequence, $t$ is the current time, and $a_{t'} = 0$ for $t' > t$. For each of the following $m$ iterations, we are able to calculate the required convolution in time $O(m)$ (the $i$th iteration needs time $O(i)$). So the amortized time is $O(m) + O(n \log n / m)$. This is minimized by choosing $m = \sqrt{n \log n}$, which gives an amortized running time of $O(\sqrt{n \log n})$.

We can improve this to a worst-case running time of $O(\sqrt{n \log n})$ by breaking the computation into parts. Fix $m$, and define $$ b_{T,p,o} = \sum_{i=0}^{m-1} w_{pm+i} a_{Tm-i+o}, \quad C_{T,p} = b_{T,p,0}, \ldots, b_{T,p,m-1}. $$ Each $C_{T,p}$ depends only on $2m$ inputs, so it can be computed in time $O(m \log m)$. Also, given $C_{\lfloor t/m \rfloor - p, p}$ for $0 \leq p \leq n/m - 1$, we can compute the convolution in time $O(n/m + m)$. The plan therefore is to maintain the list $$ C_{\lfloor t/m \rfloor - p, p}, \quad 0 \leq p \leq n/m - 1. $$ For each period of $m$ inputs, we need to update $n/m$ of these. Each update takes time $O(m \log m)$, so if we spread these updates evenly, each input will take up work $O((n/m^2) \cdot m \log m) = O((n/m) \log m)$. Together with computing the convolution itself, the time complexity per input is $O((n/m) \log m + m)$. Choosing $m = \sqrt{n \log n}$ as before, this gives $O(\sqrt{n \log n})$.
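The first (amortized) scheme can be sketched in Python as follows. The class name and indexing convention ($w_0$ multiplies the newest input, matching the answer's $\sum_i w_i a_{t-i+k}$) are mine, and the periodic batch step uses a naive $O(nm)$ loop where the answer would use an FFT; the point of the sketch is the "precompute assuming future inputs are zero, then patch with $O(i)$ correction terms" structure.

```python
class SlidingWeightedSum:
    """After each push, returns the weighted sum of the last n inputs.
    Every m inputs, a batch step precomputes 'base' sums that pretend the
    next m inputs are zero; each arriving input then patches its base value
    with at most m correction terms (the i-th iteration needs O(i) work)."""

    def __init__(self, weights, m):
        self.w = list(weights)        # w[0] multiplies the newest input
        self.n = len(weights)
        self.m = m
        self.hist = [0] * self.n      # last n inputs, newest last (zero-padded)
        self._batch()

    def _batch(self):
        # base[j-1] = sum_{i=j}^{n-1} w_i * a_{t+j-i} for j = 1..m:
        # the terms with i < j involve not-yet-seen inputs, treated as zero.
        # Naive O(n*m) here; the answer computes this batch via FFT.
        w, n, hist = self.w, self.n, self.hist
        self.base = []
        for j in range(1, self.m + 1):
            self.base.append(
                sum(w[i] * hist[n - 1 - (i - j)] for i in range(j, n))
            )
        self.recent = []              # inputs received since the last batch

    def push(self, a):
        self.hist = self.hist[1:] + [a]
        self.recent.append(a)
        j = len(self.recent)          # 1 <= j <= m steps since the batch
        # O(j) correction: add back the terms involving the j newest inputs.
        s = self.base[j - 1] + sum(
            self.w[i] * self.recent[j - 1 - i] for i in range(j)
        )
        if j == self.m:
            self._batch()
        return s
```

With an FFT in `_batch`, the batch costs $O(n \log n)$ once per $m$ inputs and each push costs $O(m)$, giving the $O(m) + O(n \log n / m)$ amortized bound from the answer.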
Best Answer from Stack Exchange
Question Source : http://cs.stackexchange.com/questions/10612