- http://link.springer.com/article/10.1007/s00453-009-9376-2
- http://en.wikipedia.org/wiki/Longest_increasing_subsequence_problem
- http://en.wikipedia.org/wiki/Longest_common_subsequence_problem
- http://en.wikipedia.org/wiki/Kalman_filter
Update: was thinking a little bit more about it and started from the end, so instead of computing all differences between numbers (top-right corner of the matrix) I can derive small $k$ value from “fixed value” I mentioned at the end of original question. For instance if I am going to raise the alarm when 25% of all numbers are in some sequence I need to focus on small “triangles” in matrix and number of computations required is smaller (much smaller). When I add some sampling then it should be simple enough to implement at scale. Update 2 – Implemented @D.W. algorithm, sample run below:
11:51:06 ~$ time nodejs progression.js L: 694000000,694000002,694000006,694000007,694000009,694000010, 694000013,694000015,694000018,694000019,694000021,694000022,694000023, 694000026,694000028,694000030,694000034,694000036,694000038,694000040, 694000043,694000045,694000046,694000048,694000051,694000053,694000055, 694000057,694000060,694000061,694000063,694000067,694000069,694000072, 694000074,694000076,694000077,694000079,694000080,694000082,694000083, 694000084,694000086,694000090,694000091,694000093,694000095,694000099, 694000102,694000103,694000105,694000108,694000109,694000113,694000116, 694000118,694000122,694000125,694000128,694000131,694000134,694000137, 694000141,694000143,694000145,694000148,694000152,694000153,694000154, 694000157,694000160,694000162,694000163,694000166,694000170,694000173, 694000174,694000177,694000179,694000180,694000181,694000184,694000185, 694000187,694000189,694000193,694000194,694000198,694000200,694000203, 694000207,694000211,694000215,694000219,694000222,694000226,694000228, 694000232,694000235,694000236 N: 100 P: 0.1 L: 10 (min) D: 26 (max) [ 9, 18, 27, 36, 45, 54, 63, 72, 81, 90 ] Found progression of 10 elements, difference: 16 starts: 694000045, ends: 694000189. real 0m0.065s user 0m0.052s sys 0m0.004s
Asked By : Maciej Łopaciński
Answered By : D.W.
Given an input sequence $L[0..N-1]$, check whether an arithmetic progression of length $ge N/4$ appears as a subsequence of $L$.
This is a significantly easier problem, and I can suggest a very efficient algorithm for you. Let me build up to it by first explaining a few key ideas underlying the algorithm. Idea 1. Given any two elements of $L$, it is easy to extend them to the maximum-length arithmetic progression containing those two elements. In particular, suppose we have $x,y in L$ with $x<y$. Set $d=y-x$. Then we can extend this sequence on the right as the longest prefix of $y+d,y+2d,y+3d,dots$ that is wholly contained in $L$, and we can extend it on the left with the longest prefix of $x-d,x-2d,x-3d,dots$ that is wholly contained in $L$. Idea 2. Say that an arithmetic progression has gap $d$ if the difference between consequence elements of the arithmetic progression is $d$. If $L$ contains an arithmetic progression of length $ge N/4$, then its gap $d$ must satisfy $d le 4(L[N-1]-L[0])/N$. Idea 3. Suppose there is an arithmetic progression of length $ge N/4$, and let $alpha,beta$ denote the start and end index (in $L$) of the subsequence. Since the subsequence has to be of length $ge N/4$, the interval $[alpha,beta]$ must contain one or more of the following numbers: $0.2N$, $0.4N$, $0.6N$, $0.8N$ (rounding all of them to an integer). To give you some intuition for this: Imagine you’re playing Battleship on a $N times 1$ board, and your opponent has a single ship: a super-long battleship, that is $N/4$ cells long. Could you find his battleship? You sure could. It’s only gonna take you 4 shots to find his battleship. You can shoot at cells $0.2N$, $0.4N$, $0.6N$, $0.8N$, and it’s guaranteed that at least one of them will be a hit (that’s Idea 3). Moreover, once you get a single hit, it’s not going to be hard to sink the rest of his battleship (that’s basically Ideas 1 and 2). With these ideas, I’m ready to propose an algorithm:
- Set $d_max gets 4(L[N-1]-L[0])/N$.
- For each $i_0 in {0.2N, 0.4N, 0.6N, 0.8N}$, do:
- For each $i$ such that $ige i_0$ and $L[i] le L[i_0]+d_max$, do:
- For each $j$ such that $j>i$ and $L[j] le L[i]+d_max$, do:
- Extend the arithmetic progression $L[i],L[j]$ on the left and right as far as possible (using Idea 1).
- For each $j$ such that $j>i$ and $L[j] le L[i]+d_max$, do:
- For each $i$ such that $ige i_0$ and $L[i] le L[i_0]+d_max$, do:
- Look at the longest arithmetic progression found at any point above. If it has length $ge N/4$, then yes, there exists an arithmetic progression of length $ge N/4$. If its length is $< N/4$, then no, there is no arithmetic progression of length $ge N/4$.
Hopefully you can see why this works. Idea 3 means that, if there is an arithmetic progression of length $ge N/4$, then at least one of the indices $i$ that are visited in the above algorithm must be the index of one element of that progression. Moreover, the index of the next element of that progression must be one of the values taken on by $j$ in the innermost loop. Therefore, at some point the pair $L[i],L[j]$ will be a pair of consecutive elements of the arithmetic progression — which is enough to discover that long arithmetic progression. I would expect that this algorithm will typically be very efficient, on real-world data. If all of the integers in $L$ are in the range $[1,M]$, then we know $d_max le 4M/N$. Moreover, the loop over $i$ typically involves about $d_max N/M le 4$ iterations (on average), and similarly for the loop over $j$. So, we expect to perform maybe $5 times 4 times 4 = 80$ iterations of the innermost statement, regardless of $N$, if the data is randomly chosen (and in particular, not adversarily chosen). Consequently, this should be extremely efficient if the data doesn’t conspire against you. It is possible to further improve this algorithm, for instance, to improve its worst-case running time and reduce the opportunity for worst-case data to cause the running time to blow up, at the cost of making the code more complicated. However, I suspect that this simple algorithm might be sufficient for your needs.
Best Answer from StackOverflow
Question Source : http://cs.stackexchange.com/questions/18951