Average Time Complexity of Two Nested Loops

Problem Detail: Recently I was asked this question in an interview. You have 1 million sorted integers and a value $x$; compare each pair in the array, and if the sum of the pair is less than or equal to $x$, increment the pair counter. I implemented the solution in C++, but I will write pseudocode here (I am sorry, I am not very good at pseudocode):

Initialise ARR_SIZE to 1000000
Initialise index_i to 0
Initialise pairCount to 0
Initialise x to 54321
while index_i is less than ARR_SIZE
    Initialise index_j to index_i + 1
    while index_j is less than ARR_SIZE
        if array[index_i] + array[index_j] is less or equal to x
            Increment pairCount
        increment index_j
    end of while
    increment index_i
end of while

At first I said it is $O(n \log n)$, but then, after the hint that the inner loop on its own has average complexity $O(n/2)$, I said the overall complexity would be $O(n \cdot n/2)$, and that in big-$O$ notation this would be $O(n)$ because $n/2$ is a constant (although I was not too sure). So what is the average complexity of this algorithm overall? PS: I know that I could have decreased the work by adding an extra else that sets index_j = ARR_SIZE, which I thought would give $O(N)$ complexity, but I couldn't think of it during the interview; a sketch of that variant is shown below.
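For reference, below is a minimal C++ sketch of the early-exit variant mentioned in the PS (the function name and signature are my illustration, not from the original post). Because the array is sorted, once array[index_i] + array[index_j] > x, every later index_j must fail as well, so breaking out of the inner loop has the same effect as setting index_j = ARR_SIZE. Note that the early exit only helps when few pairs qualify: the running time is proportional to n plus the number of counted pairs, so it is not O(N) in general.

#include <cstddef>

// Early-exit pair count over a sorted (ascending) array; illustrative
// sketch, names are not from the original question.
long long countPairs(const int* array, std::size_t arrSize, int x)
{
    long long pairCount = 0;
    for (std::size_t i = 0; i + 1 < arrSize; ++i)
    {
        for (std::size_t j = i + 1; j < arrSize; ++j)
        {
            if (array[i] + array[j] <= x)
                ++pairCount;
            else
                break;  // same effect as "index_j = ARR_SIZE" in the PS
        }
    }
    return pairCount;
}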

Asked By : Sarp Kaya

Answered By : svinja

How to solve it in O(N)

Let's take an example sequence of numbers: 1 3 4 6 7 9. Assume we want the number of pairs (A, B) where A + B <= 12.

First, let's compare the 1st element (the low element) and the Nth element (the high element): 9 + 1 <= 12, so that's one pair. But our list is sorted and 9 is the biggest number, so if 9 + 1 <= 12, then X + 1 <= 12 for any other number X in the list! Meaning we can just add 6 - 1 (high index minus low index) to the count; we don't need to test those pairs separately. Those are the pairs (9,1), (7,1) ... (3,1).

Now we increase our low index to 2, moving on to 3. 9 + 3 <= 12 still, so we add 6 - 2 to the count. We already added (3,1) in the previous step, so this is fine; this adds the pairs (9,3), (7,3), (6,3), (4,3).

Again we increase the low index. Now 9 + 4 > 12. When this happens, we decrease the high index and add nothing to the count. The low index is now 3 and the high index is N - 1. 7 + 4 <= 12, so we add 5 - 3. And so on. Each step moves either the low index up or the high index down, so there are at most N steps in total, which is why the whole scan is O(N).

If it is unclear why I add high index minus low index every time, take the 7 + 4 step as an example. I know that, because I am at high = 7, I had to skip 9 because it was too large, so I don't want to add any pair involving 9. I also know that, because I am at low = 4, I already counted the pairs involving 1 and 3 in previous steps. So in this step I only need to add the pairs (7,4) and (6,4).

int count = 0;
// Two-pointer scan over the sorted list: each iteration either advances
// low or retreats high, so the loop body runs at most N times.
for (int low = 0, high = list.Count - 1; high >= 1 && low != high; )
{
    if (list[high] + list[low] <= x)
    {
        // Every element from list[low+1] up to list[high] pairs with list[low].
        count += high - low;
        low++;
    }
    else
    {
        high--;
    }
}
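For concreteness, here is a self-contained C++ rendering of the same two-pointer scan (my sketch, not the answerer's original code), run against the worked example above; for the list 1 3 4 6 7 9 and x = 12 it counts 5 + 4 + 2 = 11 pairs, matching the steps traced out earlier.

#include <cstdio>
#include <vector>

int main()
{
    // Example sequence and threshold from the answer above.
    std::vector<int> list = {1, 3, 4, 6, 7, 9};
    const int x = 12;

    int count = 0;
    int low = 0, high = (int)list.size() - 1;
    while (low < high)  // same stopping rule as the loop above
    {
        if (list[low] + list[high] <= x)
        {
            count += high - low;  // list[low] pairs with everything up to list[high]
            low++;
        }
        else
        {
            high--;
        }
    }
    std::printf("pairs: %d\n", count);  // prints 11 for this input
    return 0;
}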

Regarding constants and complexity

Judging by your comment, you are confused about what is ignored and why. First, n/2 is not a constant, because it depends on n: as n grows, n/2 grows, so it is clearly not constant. The constant that is ignored in n*n/2 is the 1/2, and what remains is n*n, so the complexity is O(N^2). The actual number of inner loop executions is about N^2/2, but that is still O(N^2).

The fact you brought up in your comment, that the inner loop will run 50 times where 10^2 would indicate 100 times, is irrelevant, and here is why. Constants are not meaningful unless they are extremely large (say, a 2^32 factor because your algorithm tests every 32-bit integer) or you calculate the average-case cycle count on a reference architecture. Otherwise, actual speed depends on the language used, on how many instructions each operation actually performs, on branch prediction, caching, pipelining, and so on, and it is difficult to compare the constants of two different algorithms; outright counting syntactic constructs is very misguided. 2X operations may run faster than X operations if what is being done, how, and in which order is different. So until you dig a bit deeper and look into the things mentioned above, counting operations in a high-level language is pure wishful thinking. In my opinion it is a sign of questionable CS teaching, because it teaches a nonsense model of code execution, instead of using a valid one on a simple reference architecture (like Knuth does) if you insist on counting operations.
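To make the dropped constant concrete: since index_j starts at index_i + 1, the inner loop of the original program executes

$$\sum_{i=0}^{n-2} (n - 1 - i) = \frac{n(n-1)}{2} = \frac{1}{2}n^2 - \frac{1}{2}n$$

times, and discarding the 1/2 factor and the lower-order term leaves $O(n^2)$.

Now, consider these two pieces of code: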

for (int j = 0; j < N - 1; j++)
    for (int i = j + 1; i < N; i++)
        r += A[i][j];

And:

for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
        r += A[i][j];

The first example has about N^2/2 inner loop repetitions; the second has N^2. The inner loop body is the same. Yet the second one is actually much faster on a standard computer (in fact, a smart compiler with certain settings might rewrite your code if you write the first one, so if you test this, turn off optimization)! This is because the second one has much better memory access locality: it accesses memory locations in order, while the first one jumps around. That causes a much bigger difference in speed than running the loop twice as many times does. And this is just one of many things that affect run speed and are not immediately visible in the code.
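If you want to see the locality effect for yourself, here is a small C++ timing sketch (my illustration, not part of the original answer; the size N and the matrix contents are arbitrary). Compile it without optimization, as suggested above, or the compiler may rewrite the first loop.

#include <chrono>
#include <cstdio>
#include <vector>

int main()
{
    const int N = 4000;  // arbitrary; large enough that the matrix exceeds the caches
    std::vector<std::vector<int>> A(N, std::vector<int>(N, 1));
    long long r = 0;

    auto t0 = std::chrono::steady_clock::now();
    for (int j = 0; j < N - 1; j++)      // ~N^2/2 iterations, column-order access
        for (int i = j + 1; i < N; i++)
            r += A[i][j];
    auto t1 = std::chrono::steady_clock::now();
    for (int i = 0; i < N; i++)          // N^2 iterations, row-order access
        for (int j = 0; j < N; j++)
            r += A[i][j];
    auto t2 = std::chrono::steady_clock::now();

    auto ms = [](std::chrono::steady_clock::duration d) {
        return (long long)std::chrono::duration_cast<std::chrono::milliseconds>(d).count();
    };
    std::printf("half the iterations, column order: %lld ms\n", ms(t1 - t0));
    std::printf("all the iterations,  row order:    %lld ms\n", ms(t2 - t1));
    return (int)(r & 1);  // use r so the loops are not eliminated
}

Second example, two pieces of code: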

for (int i = 0; i < N - 1; i++)
{
    for (int j = i + 1; j < N; j++)
    {
        int a = A[i][j] * 3;
        a += 5;
        r += a;
    }
}

And:

for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
        r += A[i][j];

Can you tell me which one has a bigger "constant"? Again, the first inner loop runs half as many times as the second one, but the first one has three lines of code inside the inner loop while the second one has just one. On my machine, the first one is faster. By what factor? I have no idea (I could measure it, but I cannot tell from looking at the code). If you decide on a constant A for the first program and B for the second by looking at the code, A/B will probably not be anything like the ratio of the actual execution times, making the constants meaningless.

So the 1/2 constant is meaningless. Whether a program runs twice as fast as some other program depends on so many things that it is not productive to account for small constants at such a superficial level.

By the way, you have a bug in your program. The outer loop condition should be "less than ARR_SIZE - 1": in the last outer iteration you set index_j to ARR_SIZE, so the inner loop does nothing useful, and in an implementation without the inner bounds check you would be reading from beyond the end of your 0-based buffer.

Best Answer from StackExchange

Question Source : http://cs.stackexchange.com/questions/13302
