You have a stream of incoming numbers in range 0 to 60000 and you have a function which will take a number from that range and return the count of occurrence of that number till that moment. Give a suitable Data structure/algorithm to implement this system.
The stream is infinite, so if fixed size data structures re used, i.e. primitive types in Java or C, they will overflow. So there is the need to use data structures that have a size that grows over time. As pointed by the interviewer, the memory occupied by those data structures will diverge. The model of computation is a Turing machine with three tapes:
- infinite read-only one-way input tape;
- constant space bounded read-write two way work tape;
- infinite write-only one-way output tape.
The main reason to choose the model above is that in the real world there is virtually no limit to the quantity of input that can be acquired using a keyboard or a network connection. Also, there is virtually no limit to the quantity of information that can be displayed on amonitor over time. But memory is limited and expensive. I modeled the problem as the problem to recognize the language L of all couples (number,number of occurrences so far). As a corollary of the Theorem 3.13 in Hopcroft-Ullman I know that every language recognized by a constant space bounded machine is regular. But, in any given moment, the language L is a finite language, because the number of couples to be recognized is finite: 60001. So I can’t use the pumping lemma for regular languages to prove that such language is not regular. Is there a way I can complete my proof? The original question is here.
Asked By : Vitalij Zadneprovskij
Answered By : Sasho Nikolov
Best Answer from StackOverflow
Question Source : http://cs.stackexchange.com/questions/2948