- Wikipedia says that the model used here is the multitape Turing machine. This does not seem to make much sense to me, since in the analysis of matrix multiplication, scalar multiplication is supposed to take constant time, which is not the case on Turing machines.
- Some texts describe the complexity only vaguely as the number of arithmetic operations used. However, what exactly counts as an arithmetic operation in this context? I suppose addition, multiplication, and probably subtraction. But what about division, integer division, remainder, etc.? And what about bitwise operations – how do they fit into this setting?
- Finally, I have recently discovered an article that uses the BSS machine as the model of computation. However, this also seems a little strange to me, since for integer matrices, for example, it does not make much sense to disallow operations such as integer division.
I would be grateful to anyone who could help me sort these things out.
Asked By : 042
Answered By : Yuval Filmus
- Certain linear combinations $\alpha_i$ of the entries $a_{jk}$ of the first input matrix are calculated.
- Certain linear combinations $\beta_i$ of the entries $b_{jk}$ of the second input matrix are calculated.
- The products $\gamma_i = \alpha_i \times \beta_i$ are computed.
- Each entry in the output matrix is a linear combination of the $\gamma_i$s.
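For concreteness, here is a sketch (my own, not from the answer) of this four-step form instantiated with Strassen's seven products for $2\times 2$ matrices; the names `alpha`, `beta`, and `m` follow the $\alpha_i$, $\beta_i$, $\gamma_i$ of the text:

```python
def strassen_2x2(a, b):
    """Multiply 2x2 matrices using Strassen's seven products,
    written in bilinear normal form."""
    # Step 1: linear combinations alpha_i of the entries of a.
    alpha = [
        a[0][0] + a[1][1],  # for M1
        a[1][0] + a[1][1],  # for M2
        a[0][0],            # for M3
        a[1][1],            # for M4
        a[0][0] + a[0][1],  # for M5
        a[1][0] - a[0][0],  # for M6
        a[0][1] - a[1][1],  # for M7
    ]
    # Step 2: linear combinations beta_i of the entries of b.
    beta = [
        b[0][0] + b[1][1],
        b[0][0],
        b[0][1] - b[1][1],
        b[1][0] - b[0][0],
        b[1][1],
        b[0][0] + b[0][1],
        b[1][0] + b[1][1],
    ]
    # Step 3: the only multiplications, gamma_i = alpha_i * beta_i.
    m = [x * y for x, y in zip(alpha, beta)]
    # Step 4: each output entry is a linear combination of the gamma_i.
    return [
        [m[0] + m[3] - m[4] + m[6], m[2] + m[4]],
        [m[1] + m[3], m[0] - m[1] + m[2] + m[5]],
    ]
```

Note that only step 3 multiplies two quantities that both depend on the inputs; everything else is additions and subtractions, which is exactly what makes the recursive (tensoring) analysis below go through.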
This is known as bilinear normal form. In the matrix multiplication algorithm shown above, the products $x_{jk}y_{jk}$ function as the $\gamma_i$, but in Strassen's algorithm the linear combinations are more interesting; they are the $M_i$'s in Wikipedia's description.

Using a tensoring approach (i.e. recursively applying the same algorithm), similar to the asymptotic analysis of Strassen's algorithm, one can show that given such an algorithm for multiplying $n\times n$ matrices with $r$ products (i.e. $r$ variables $\gamma_i$), arbitrary $N\times N$ matrices can be multiplied in complexity $O(N^{\log_n r})$; thus only the number of products matters asymptotically. In Strassen's algorithm, $n = 2$ and $r = 7$, and so the bound is $O(N^{\log_2 7})$. The problem of finding the minimal number of products needed to compute matrix multiplication can be phrased as finding the rank of a third-order tensor (a "matrix" with three indices rather than two), and this forms the connection to algebraic complexity theory. You can find more information in this book or these lecture notes (continued here).

The reason this model is used is twofold: first, it is very simple and amenable to analysis; second, it is closely related to the more common RAM model. Straight-line programs can be implemented in the RAM model, and the complexity in both models is strongly related: arithmetic operations have unit cost in several variants of the model (for example, the RAM model with real numbers), and are otherwise related polynomially to the size of the numbers. In the case of modular matrix multiplication, therefore, arithmetic complexity provides an upper bound on complexity in the RAM model. In the case of integer or rational matrix multiplication, one can show that for bilinear algorithms resulting from tensorization, the size of the numbers doesn't grow too much, and so arithmetic complexity provides an upper bound for the RAM model, up to logarithmic factors.
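The tensoring step can be made concrete: applying the $2\times 2$ scheme recursively to the quadrants of a power-of-two-sized matrix uses $7$ recursive products per level, giving the $O(N^{\log_2 7})$ bound. A rough sketch (mine, with hypothetical helper names `split`, `add`, `sub`), assuming the matrix side is a power of two:

```python
import math

def split(M):
    # Partition a 2^k x 2^k matrix into four quadrants.
    n = len(M) // 2
    return ([row[:n] for row in M[:n]], [row[n:] for row in M[:n]],
            [row[:n] for row in M[n:]], [row[n:] for row in M[n:]])

def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def sub(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def strassen(A, B):
    # Recursive (tensored) Strassen; side length must be a power of 2.
    if len(A) == 1:
        return [[A[0][0] * B[0][0]]]
    a11, a12, a21, a22 = split(A)
    b11, b12, b21, b22 = split(B)
    # Seven recursive products instead of eight.
    m1 = strassen(add(a11, a22), add(b11, b22))
    m2 = strassen(add(a21, a22), b11)
    m3 = strassen(a11, sub(b12, b22))
    m4 = strassen(a22, sub(b21, b11))
    m5 = strassen(add(a11, a12), b22)
    m6 = strassen(sub(a21, a11), add(b11, b12))
    m7 = strassen(sub(a12, a22), add(b21, b22))
    c11 = add(sub(add(m1, m4), m5), m7)
    c12 = add(m3, m5)
    c21 = add(m2, m4)
    c22 = add(sub(add(m1, m3), m2), m6)
    # Reassemble the quadrants.
    top = [r1 + r2 for r1, r2 in zip(c11, c12)]
    bottom = [r1 + r2 for r1, r2 in zip(c21, c22)]
    return top + bottom

# n = 2, r = 7: the resulting exponent is log_2 7.
print(math.log(7, 2))
```

The recursion satisfies $T(N) = 7\,T(N/2) + O(N^2)$, which resolves to $O(N^{\log_2 7})$ by the master theorem.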
It could a priori be the case that a RAM machine can pull some tricks that the arithmetic model is oblivious to. But often we want matrix multiplication algorithms to work for matrices over arbitrary fields (or even rings), and in that case a uniform algorithm should only use the arithmetic operations specified by the model. So this model is a formalization of "field-independent" algorithms.
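To illustrate field-independence (my own example, not from the answer): a bilinear straight-line program uses only $+$, $-$, and $\times$, so the identical code runs unchanged over the rationals or over $\mathbb{Z}/5\mathbb{Z}$ (via a minimal, hypothetical wrapper class `Z5`):

```python
from fractions import Fraction

def mat_mul_bilinear(a, b):
    # Standard 2x2 multiplication as a straight-line program:
    # only +, -, * on the entries, hence field-independent.
    return [
        [a[0][0]*b[0][0] + a[0][1]*b[1][0], a[0][0]*b[0][1] + a[0][1]*b[1][1]],
        [a[1][0]*b[0][0] + a[1][1]*b[1][0], a[1][0]*b[0][1] + a[1][1]*b[1][1]],
    ]

# The same code over the rationals...
q = [[Fraction(1, 2), Fraction(1, 3)], [Fraction(0), Fraction(1)]]
print(mat_mul_bilinear(q, q))

# ...and over Z/5Z, with a toy wrapper supplying ring operations.
class Z5:
    def __init__(self, v): self.v = v % 5
    def __add__(self, o): return Z5(self.v + o.v)
    def __sub__(self, o): return Z5(self.v - o.v)
    def __mul__(self, o): return Z5(self.v * o.v)
    def __repr__(self): return f"Z5({self.v})"

m = [[Z5(2), Z5(3)], [Z5(4), Z5(1)]]
print(mat_mul_bilinear(m, m))
```

An algorithm that used integer division or bitwise operations, by contrast, would not type-check over an abstract field, which is exactly what the arithmetic model rules out.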
Question Source : http://cs.stackexchange.com/questions/20245