# Board index » cppbuilder » multiplication of a vector with a matrix

## multiplication of a vector with a matrix

2005-10-21 02:18:36 AM
cppbuilder62
Hi,
I have a mathematical problem that requires a large
amount of calculation time and really slows my program
down.
Maybe there is a clever solution to this "easy" problem
somewhere.
"multiplication of a vector with a matrix"
The corresponding C code is:

```cpp
for (int i = 0; i < 100; i++) {
    vector1(i) = 0.;
    for (int j = 0; j < 100; j++) {
        vector1(i) = vector1(i) + matrix(i,j) * vector2(j);
    }
}
```
So, vector1 is the correct product of vector2 and the
matrix.
The two nested for-loops lead to a very high amount
of calculation time. Does anybody know a way to program
this in two (or even one) separate loops?
Thank you,
Frank =:^)

## Re:multiplication of a vector with a matrix

Frank < XXXX@XXXXX.COM >writes:
##### Quote
Maybe there is a clever solution of this "easy" problem
anywhere.
The only "clever" solution I know is the insight that many matrix
operations are calculated in order to view just a few elements, rather than
all of them. Some fast math libraries use "lazy evaluation" and set
things up so that an element's value is calculated only when you
actually access it.
##### Quote
```cpp
for (int i = 0; i < 100; i++) {
    vector1(i) = 0.;
    for (int j = 0; j < 100; j++) {
        vector1(i) = vector1(i) + matrix(i,j) * vector2(j);
    }
}
```
It looks like vector1(i) is calling an overloaded operator(), used
like operator[]. Is that right? Perhaps that function is slow, or
not inlined, or something to that effect. Since it appears to be
returning references, you should be able to "halve" the calls to
vector1(i) for each assignment by doing this:
```cpp
vector1(i) += matrix(i,j) * vector2(j);
```
However, that does not eliminate the O(N*M) complexity itself; it just
reduces the work done in the loop.
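To make that concrete, here is a minimal compilable sketch of the += form. The poster's actual matrix/vector types are unknown, so plain row-major std::vector storage stands in for them here:

```cpp
#include <cassert>
#include <vector>

// Multiply an n-by-n row-major matrix by a vector using the
// compound-assignment form suggested above: writing
// vector1[i] += ... avoids re-reading vector1(i) on the
// right-hand side of every inner iteration.
std::vector<double> matvec(const std::vector<double>& matrix,
                           const std::vector<double>& vector2,
                           std::size_t n) {
    std::vector<double> vector1(n, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            vector1[i] += matrix[i * n + j] * vector2[j];
    return vector1;
}
```

The complexity is still O(n^2); only the constant factor per iteration shrinks.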
--
Chris (TeamB);

## Re:multiplication of a vector with a matrix

Frank wrote:
##### Quote
```cpp
for (int i = 0; i < 100; i++) {
    vector1(i) = 0.;
    for (int j = 0; j < 100; j++) {
        vector1(i) = vector1(i) + matrix(i,j) * vector2(j);
    }
}
```
I don't know anything about the math here, but I'd suggest doing this to
speed it up:
```cpp
int i = 0;
int j;

while (i < 100)
{
    vector1(i) = 0;
    j = 0;
    while (j < 100)
    {
        vector1(i) = vector1(i) + matrix(i, j) * vector2(j);
        ++j;
    }
    ++i;
}
```
Also, it seems like you are setting vector1(i) to 0; is that right? Because
if you are, then this seems a lot simpler:
```cpp
int i = 0;
int j;

while (i < 100)
{
    vector1(i) = 0;
    j = 0;
    while (j < 100)
    {
        vector1(i) = matrix(i, j) * vector2(j);
        ++j;
    }
    ++i;
}
```
HTH
Jonathan


## Re:multiplication of a vector with a matrix

"Jonathan Benedicto" < XXXX@XXXXX.COM >writes:
##### Quote
Frank wrote:
```cpp
for (int i = 0; i < 100; i++) {
    vector1(i) = 0.;
    for (int j = 0; j < 100; j++) {
        vector1(i) = vector1(i) + matrix(i,j) * vector2(j);
    }
}
```

I don't know anything about the math here, but I'd suggest doing this to
speed it up:

```cpp
int i = 0;
int j;

while (i < 100)
{
    vector1(i) = 0;
    j = 0;
    while (j < 100)
    {
        vector1(i) = vector1(i) + matrix(i, j) * vector2(j);
        ++j;
    }
    ++i;
}
```
Why would you expect this to run any faster? If (and it's a big if!)
declaring the variables i and j inside the loops actually took CPU
time, which I highly doubt, the savings would be insignificant
compared to the real problem, which is the computational complexity of
nested loops. (I assume that 100x100 is only for example, and the
numbers may be much larger.)
##### Quote
Also it seems like you are setting vector1(i) to 0, is that right ? Because
if you are then this seems a lot simpler:

```cpp
int i = 0;
int j;

while (i < 100)
{
    vector1(i) = 0;
    j = 0;
    while (j < 100)
    {
        vector1(i) = matrix(i, j) * vector2(j);
        ++j;
    }
    ++i;
}
```
It starts at zero, then accumulates the sum of all the calculated
values into it. This second rewrite changes the meaning of the
expression. (Notice the vector1(i) term you removed? That was adding
the new value to the "old" one.)
--
Chris (TeamB);

## Re:multiplication of a vector with a matrix

Chris Uzdavinis (TeamB) wrote:
##### Quote
Why would you expect this to run any faster? If (and it's a big if!)
declaring the variables i and j inside the loops actually took CPU
time, which I highly doubt, the savings would be insignificant
compared to the real problem, which is the computational complexity of
nested loops. (I assume that 100x100 is only for example, and the
numbers may be much larger.)
I seem to recall reading somewhere that using while instead of for, and ++i
instead of i++ made a faster loop because it allowed the compiler to
optimize better.
##### Quote
It's starting at zero, then accumulating the sum of all the values
calculated into it. This 2nd rewrite changes the meaning of the
expression. (Notice the vector1(i) term you removed? That was adding
the new stuff to the "old" value.)
Thank you, I didn't notice that.
Jonathan

## Re:multiplication of a vector with a matrix

Frank wrote:
##### Quote
Hi,

I have a mathematical problem that requires a large
amount of calculation time and really slows my program
down.

Maybe there is a clever solution to this "easy" problem
somewhere.

"multiplication of a vector with a matrix"

The corresponding C code is:

```cpp
for (int i = 0; i < 100; i++) {
    vector1(i) = 0.;
    for (int j = 0; j < 100; j++) {
        vector1(i) = vector1(i) + matrix(i,j) * vector2(j);
    }
}
```

So, vector1 is the correct product of vector2 and the
matrix.

The two nested for-loops lead to a very high amount
of calculation time. Does anybody know a way to program
this in two (or even one) separate loops?

That's the way it is. Multiplying a vector of length N by an NxN
matrix has complexity O(N^2) [which means that computation time is
proportional to N^2]. If you know some characteristics of the matrix
(e.g. that it is sparse), it is possible to speed things up. But for a
general matrix, you are limited by that complexity.
As Chris said, you might wish to check that operator() for both your
matrix and vector types are efficient. That won't change the
complexity of the algorithm, but can give improvement.
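One way to keep operator() cheap (a sketch only, since the poster's real types are unknown) is to define it in the class body over flat storage, so the compiler can inline the index arithmetic instead of paying a function call per element access:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal matrix type. operator() is defined inline in
// the class body and does nothing but one multiply-add into a flat,
// row-major std::vector, so an optimizing compiler can inline it.
class Matrix {
public:
    Matrix(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols, 0.0) {}

    double& operator()(std::size_t i, std::size_t j)       { return data_[i * cols_ + j]; }
    double  operator()(std::size_t i, std::size_t j) const { return data_[i * cols_ + j]; }

private:
    std::size_t rows_, cols_;
    std::vector<double> data_;
};
```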
I'd probably simplify your algorithm a little, to something like:

```cpp
double temp;
for (int i = 0; i < 100; ++i) {
    temp = 0.;
    for (int j = 0; j < 100; ++j) {
        temp += matrix(i,j) * vector2(j);
    }
    vector1(i) = temp;
}
```
Again, this doesn't change complexity of the algorithm, but probably
makes individual steps in the loop more efficient. You'll need to
confirm that by testing though.
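A compilable version of the accumulator idea above, with flat row-major std::vector storage standing in for the unknown matrix/vector types:

```cpp
#include <cassert>
#include <vector>

// Accumulator variant: keep the running sum in a local 'temp' so
// the inner loop never re-reads the output element through an
// accessor; there is only one store into vector1 per row.
std::vector<double> matvec_temp(const std::vector<double>& matrix,
                                const std::vector<double>& vector2,
                                std::size_t n) {
    std::vector<double> vector1(n);
    for (std::size_t i = 0; i < n; ++i) {
        double temp = 0.0;
        for (std::size_t j = 0; j < n; ++j)
            temp += matrix[i * n + j] * vector2[j];
        vector1[i] = temp;
    }
    return vector1;
}
```

A local double like temp can live in a register for the whole inner loop, which is the point of the rewrite.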

## Re:multiplication of a vector with a matrix

"Jonathan Benedicto" < XXXX@XXXXX.COM >writes:
##### Quote
I seem to recall reading somewhere that using while instead of for,
and ++i instead of i++ made a faster loop because it allowed the
compiler to optimize better.
Most compilers implement their parser in a way that has an internal
representation of the program in some sort of tree-like structure. At
this level, individual language constructs like "while" and "for"
generally get represented identically, and when this tree is traversed
during the code-gen phase, the same assembly is generated.
I'd be quite surprised to find any compiler that had measurable
differences between while and for loops. (Though I have not
benchmarked Borland specifically and don't know this for certain.)
There is some truth to preferring the prefix rather than the postfix
increment/decrement when you don't need the return value; however,
on a builtin numeric type, it is usually a difference of zero. The
habit is good to have when you are dealing with user-defined
types, such as some iterators, because those must internally implement
the postfix operators inefficiently compared to the prefix version
(since copies and temporaries are required). For an integer, it doesn't
matter.
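To illustrate the copy cost on a user-defined type, here is a hypothetical counter type (not from the thread) whose copy constructor keeps a tally:

```cpp
#include <cassert>

// Minimal type that counts copy-constructions, to show why postfix ++
// can cost more on user-defined types: it must return a copy of the
// old value, while prefix ++ just returns *this.
struct Counter {
    int value;
    static int copies;  // running count of copy-constructions
    explicit Counter(int v) : value(v) {}
    Counter(const Counter& o) : value(o.value) { ++copies; }
    Counter& operator++() { ++value; return *this; }                      // prefix: no copy
    Counter operator++(int) { Counter old(*this); ++value; return old; }  // postfix: copies old value
};
int Counter::copies = 0;

// How many copies does one increment of each flavor make?
int copies_from_prefix()  { Counter c(0); Counter::copies = 0; ++c; return Counter::copies; }
int copies_from_postfix() { Counter c(0); Counter::copies = 0; c++; return Counter::copies; }
```

For a plain int the compiler emits identical code for both forms when the result is discarded, which is why this only matters for class types.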
That said, a speedup of ++i over i++ is O(1), while the overall
complexity of the problem is O(N*M), which for a square matrix
is O(N^2).
In an O(n^2) problem, a minuscule constant speedup tends to be
insignificant, assuming N is large enough. The graph (in measuring
the time it takes) of N^2 starts nearly horizontal but eventually
bends until it's approaching vertical. A constant speedup merely
shifts this entire graph, but an algorithmic improvement can flatten
it. For large N, an unoptimized implementation of a fast algorithm
will be light-years faster than a highly-optimized implementation of
the inefficient algorithm.
Thus, focusing on the algorithm is where you tend to get the most
"bang for the buck" in terms of optimizations.
--
Chris (TeamB);

## Re:multiplication of a vector with a matrix

Chris Uzdavinis (TeamB) < XXXX@XXXXX.COM >writes:
##### Quote
it. For large N, an unoptimized implementation of a fast algorithm
will be light-years faster than a highly-optimized implementation of
the inefficient algorithm.
Also, I am aware that light-years are a measure of distance, not
speed, but am claiming metaphoric liberty here. :)
--
Chris (TeamB);

## Re:multiplication of a vector with a matrix

Thank you for so much help, Rob and Chris!!!
I think it is a really difficult one. And Chris, you are right:
100 is only an example for i_max and j_max; it could be more.
And sorry, the matrix is not sparse, so I cannot use that
approach to make it easier.
One "weak" solution I found is this:
First I separate the initialization:

```cpp
for (int i = 0; i < 100; i++) {
    vector1(i) = 0.;
}
```
Then it is the same whether I write:

```cpp
for (int i = 0; i < i_max; i++) {
    for (int j = 0; j < j_max; j++) {
        vector1(i) = vector1(i) + matrix(i,j) * vector2(j);
    }
}
```
or this:

```cpp
for (int j = 0; j < j_max; j++) {
    for (int i = 0; i < i_max; i++) {
        vector1(i) = vector1(i) + matrix(i,j) * vector2(j);
    }
}
```
Now I put the index (i or j) with the bigger _max value in the
inner loop. Putting the long loop inside helps a little bit, but I
think there must be some way to make this easier.
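As an aside, the choice of loop order interacts with memory layout: for a row-major matrix (the usual C/C++ layout), the i-outer/j-inner order walks each matrix row contiguously, which is the cache-friendly access pattern. A sketch with flat arrays, since the actual matrix type in the thread is unknown:

```cpp
#include <cassert>
#include <vector>

// i outer, j inner: the inner loop reads m[i*n+0], m[i*n+1], ...
// which is contiguous memory for a row-major matrix.
std::vector<double> matvec_ij(const std::vector<double>& m,
                              const std::vector<double>& v, std::size_t n) {
    std::vector<double> out(n, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            out[i] += m[i * n + j] * v[j];
    return out;
}

// j outer, i inner: the inner loop strides by n doubles between
// accesses, touching a different cache line almost every step.
std::vector<double> matvec_ji(const std::vector<double>& m,
                              const std::vector<double>& v, std::size_t n) {
    std::vector<double> out(n, 0.0);
    for (std::size_t j = 0; j < n; ++j)
        for (std::size_t i = 0; i < n; ++i)
            out[i] += m[i * n + j] * v[j];
    return out;
}
```

Both orders compute the same product; only the memory traffic differs, so for large n the row-contiguous order tends to be measurably faster.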
CU,
Frank ;)
Chris Uzdavinis (TeamB) wrote:
##### Quote
Chris Uzdavinis (TeamB) < XXXX@XXXXX.COM >writes:

>it. For large N, an unoptimized implementation of a fast algorithm
>will be light-years faster than a highly-optimized implementation of
>the inefficient algorithm.

Also, I am aware that light-years are a measure of distance, not
speed, but am claiming metaphoric liberty here. :)

## Re:multiplication of a vector with a matrix

Doesn't this look like a case where templates, vaguely like the N! example,
might be justified? They could effectively auto-generate hard-coded
operations in place of the inner loop.
. Ed
##### Quote
Chris Uzdavinis wrote in message
news: XXXX@XXXXX.COM ...

>Maybe there is a clever solution of this "easy" problem
>anywhere.

The only "clever" solution I know is the insight that many matrix
operations are calculated in order to view just a few elements, rather than
all of them. Some fast math libraries use "lazy evaluation" and set
things up so that an element's value is calculated only when you
actually access it. ...

## Re:multiplication of a vector with a matrix

"Ed Mulroy" < XXXX@XXXXX.COM >writes:
##### Quote
Doesn't this look like a case where templates, vaguely like the N!
example, might be justified? They could effectively auto-generate
hard-coded operations in place of the inner loop.
I think that expression-templates may be useful here. I think there
are some matrix libraries for fast operations that use some template
tricks, such as the uBLAS library in Boost. However, I'm not sure if
it works with the Borland compiler, because it's not listed in their
supported compiler table. (But it doesn't say that it's NOT
supported, except indirectly.)
,----
| The current version of uBLAS expects a modern (ISO standard
| compliant) compiler. Compilers targeted and tested with this
| release are:
|
| * GCC 3.2.3, 3.3.x, 3.4.x, 4.0.x
| * MSVC 7.1, 8.0
| * ICC 8.0, 8.1
| * Visual age 6
| * Codewarrior 9.4, 9.5
|
| The version of uBLAS in Boost 1.32.0 (and earlier) support many older
| compilers. If you are using such a compiler please use this version of
| uBLAS. Compilers known to accept this older library are:
|
| * MSVC 6.0 with STLPort-4.5.3, 7.0, 7.1
| * GCC 2.95.x, 3.0.x, 3.1.x, 3.2.x, 3.3.x, 3.4.x
| * ICC 7.0, 7.1 8.0
| * Comeau 4.2.x
| * Codewarrior 8.3
`----
In any case, the library documentation is here:
www.boost.org/libs/numeric/ublas/doc/index.htm
--
Chris (TeamB);

## Re:multiplication of a vector with a matrix

That is unrelated to what I said but yes, there is a chance that Boost would
work.
. Ed
##### Quote
Chris Uzdavinis wrote in message
news: XXXX@XXXXX.COM ...

I think that expression-templates may be useful here. I think there
are some matrix libraries for fast operations that use some template
tricks, such as the uBLAS library in Boost. However, I'm not sure if
it works with the Borland compiler, because it's not listed in their
supported compiler table. (But it doesn't say that it's NOT
supported, except indirectly.)

,----
| The current version of uBLAS expects a modern (ISO standard
| compliant) compiler. Compilers targeted and tested with this
| release are:
|
| * GCC 3.2.3, 3.3.x, 3.4.x, 4.0.x
| * MSVC 7.1, 8.0
| * ICC 8.0, 8.1
| * Visual age 6
| * Codewarrior 9.4, 9.5
|
| The version of uBLAS in Boost 1.32.0 (and earlier) support many older
| compilers. If you are using such a compiler please use this version of
| uBLAS. Compilers known to accept this older library are:
|
| * MSVC 6.0 with STLPort-4.5.3, 7.0, 7.1
| * GCC 2.95.x, 3.0.x, 3.1.x, 3.2.x, 3.3.x, 3.4.x
| * ICC 7.0, 7.1 8.0
| * Comeau 4.2.x
| * Codewarrior 8.3
`----

In any case, the library documentation is here:

www.boost.org/libs/numeric/ublas/doc/index.htm

## Re:multiplication of a vector with a matrix

"Ed Mulroy" < XXXX@XXXXX.COM >writes:
##### Quote
That is unrelated to what I said but yes, there is a chance that
Boost would work.
I was thinking you meant expression templates instead, since
hard-coded solutions like factorial<100> involve so much compile-time
overhead that they would probably exceed the template instantiation depth
long before producing a solution. It is also hard to use at runtime
since, even though it generates each type factorial<1>, factorial<2>,
..., factorial<99>, factorial<100>, using runtime-calculated values to
find those template types is hard.
It's a mapping of runtime VALUE to compile-time TYPE. For example we
cannot do this:
```cpp
int x = some_number();
int fact = factorial<x>::result;  // will not compile: x is not a compile-time constant
```
The problem is that each instantiation of factorial<N> (where N is a
compile-time constant) is a unique type, and C++ offers no way to
synthesize types at runtime.
I didn't think you meant something like what is below, as it suffers
from very small arbitrary upper limits, and is bloated and awkward to
use, especially if you need to instantiate a large N of your objects.
```cpp
#include <iostream>

struct abstract_factorial_result
{
    virtual unsigned long get_result() const = 0;
    virtual ~abstract_factorial_result() {}  // needed if deleting through a base pointer
};

template <unsigned long N>
struct factorial : abstract_factorial_result
{
    static const unsigned long result = N * factorial<N-1>::result;
    virtual unsigned long get_result() const { return result; }
};

template <>
struct factorial<0> : abstract_factorial_result
{
    static const unsigned long result = 1;
    virtual unsigned long get_result() const { return result; }
};

abstract_factorial_result const * factorials[] =
{
    new factorial<0>,
    new factorial<1>,
    new factorial<2>,
    new factorial<3>,
    new factorial<4>,
    new factorial<5>,
    new factorial<6>,
    new factorial<7>,
    new factorial<8>,
    new factorial<9>,
};

int main()
{
    for (int i = 0; i < 10; ++i)
    {
        std::cout << i << "! = " << factorials[i]->get_result() << std::endl;
    }
}
```
--
Chris (TeamB);

## Re:multiplication of a vector with a matrix

I did mean that, but had not thought it through. It is limited in the
manner you describe and the man's program is likely to need runtime values
for the dimensions.
. Ed
##### Quote
Chris Uzdavinis wrote in message
news: XXXX@XXXXX.COM ...

>That is unrelated to what I said but yes, there is a chance that
>Boost would work.

I was thinking you meant expression templates instead, since
hard-coded solutions like factorial<100> involve so much compile-time
overhead that they would probably exceed the template instantiation depth
long before producing a solution. It is also hard to use at runtime
since, even though it generates each type factorial<1>, factorial<2>,
..., factorial<99>, factorial<100>, using runtime-calculated values to
find those template types is hard.

It's a mapping of runtime VALUE to compile-time TYPE. For example we
cannot do this:

```cpp
int x = some_number();
int fact = factorial<x>::result;  // will not compile: x is not a compile-time constant
```

The problem is that each instantiation of factorial<N> (where N is a
compile-time constant) is a unique type, and C++ offers no way to
synthesize types at runtime.

I didn't think you meant something like what is below, as it suffers
from very small arbitrary upper limits, and is bloated and awkared to
use, especially if you can instantiate a large N of your objects.

```cpp
#include <iostream>

struct abstract_factorial_result
{
    virtual unsigned long get_result() const = 0;
};

template <unsigned long N>
struct factorial : abstract_factorial_result
{
    static const unsigned long result = N * factorial<N-1>::result;
    virtual unsigned long get_result() const { return result; }
};

template <>
struct factorial<0> : abstract_factorial_result
{
    static const unsigned long result = 1;
    virtual unsigned long get_result() const { return result; }
};

abstract_factorial_result const * factorials[] =
{
    new factorial<0>,
    new factorial<1>,
    new factorial<2>,
    new factorial<3>,
    new factorial<4>,
    new factorial<5>,
    new factorial<6>,
    new factorial<7>,
    new factorial<8>,
    new factorial<9>,
};

int main()
{
    for (int i = 0; i < 10; ++i)
    {
        std::cout << i << "! = " << factorials[i]->get_result() << std::endl;
    }
}
```

## Re:multiplication of a vector with a matrix

Chris Uzdavinis (TeamB) wrote:
##### Quote
Most compilers implement their parser in a way that has an internal
representation of the program in some sort of tree-like structure. At
this level, individual language constructs like "while" and "for"
generally get represented identically, and when this tree is traversed
during the code-gen phase, the same assembly is generated.

I'd be quite surprised to find any compiler that had measurable
differences between while and for loops. (Though I have not
benchmarked Borland specifically and don't know this for certain.)
Thank you for this info.
##### Quote
There is some truth to preferring the prefix rather than the postfix
increment/decrement when you don't need the return value; however,
on a builtin numeric type, it is usually a difference of zero. The
habit is good to have when you are dealing with user-defined
types, such as some iterators, because those must internally implement
the postfix operators inefficiently compared to the prefix version
(since copies and temporaries are required). For an integer, it doesn't
matter.
Thank you again.
##### Quote
Thus, focusing on the algorithm is where you tend to get the most
"bang for the buck" in terms of optimizations.
I agree, yes.
Jonathan