Path: utzoo!utgpu!news-server.csri.toronto.edu!rutgers!tut.cis.ohio-state.edu!purdue!mentor.cc.purdue.edu!l.cc.purdue.edu!cik
From: cik@l.cc.purdue.edu (Herman Rubin)
Newsgroups: comp.arch
Subject: Re: benchmark for evaluating extended precision
Summary: Use an appropriate test, using machine instructions
Keywords: extended precision,multiply,benchmark,arithmetic
Message-ID: <2550@l.cc.purdue.edu>
Date: 13 Sep 90 13:05:35 GMT
References: <3989@bingvaxu.cc.binghamton.edu> <1990Sep12.223253.9574@csc.ti.com>
Distribution: usa
Organization: Purdue University Statistics Department
Lines: 31
In article <1990Sep12.223253.9574@csc.ti.com>, bmk@csc.ti.com (Brian M Kennedy) writes:
> =>It has been claimed that a lack of 32x32->64 multiplication
> =>makes a factor of 10 difference in the running time of
> =>typical extended precision arithmetic routines. Although it
> =>obviously makes _a_ difference in run time I do not measure
> =>an order of magnitude difference.
............................
> Instead I will measure
> an upper-bound on the performance increase by comparing:
>
> 64*64->64 via 32*32->32 vs. 32*32->32
[Long description deleted.]
The original problem was 32x32 -> 64 compared to 32x32 -> 32. To
do a reasonable type of test, consider the general problem of NxN -> 2N
vs. NxN -> N. Now to do this properly, one should remember that in the
machine with only NxN -> N, N is the longest word length available. Thus, in
adding two N-bit numbers, one must use a test-for-carry to detect a bit in
position N (counting from 0). Also, the comparison should not depend on
the peculiarities of a particular compiler, but should be done at the
machine-language level. The code needed is not long.
To carry out the benchmark, one could use N = 16 (or even 8) to get a
general idea.
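
As a rough illustration of what the NxN -> N machine has to do, here is a
sketch in C for N = 16. It only shows the sequence of operations; a real
comparison would still have to be written in machine language, as argued
above. The names add16, mul16x16_32 and check are made up for this sketch,
and the choice of splitting each operand into 8-bit halves is an assumption
about how one would get a double-length product from single-length multiplies.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: N-bit add (N = 16) with a test-for-carry.
   The carry is the bit that would land in position N. */
static uint16_t add16(uint16_t x, uint16_t y, unsigned *carry)
{
    uint16_t s = (uint16_t)(x + y);   /* keep only the low 16 bits */
    *carry = (s < x);                 /* wrapped around => carry out */
    return s;
}

/* 16x16 -> 32 using only results that fit in 16 bits.
   Each operand is split into 8-bit halves, so every partial
   product (8x8) fits in a single 16-bit word. */
static void mul16x16_32(uint16_t a, uint16_t b, uint16_t *hi, uint16_t *lo)
{
    uint16_t a0 = a & 0xFF, a1 = a >> 8;
    uint16_t b0 = b & 0xFF, b1 = b >> 8;

    uint16_t p00 = (uint16_t)(a0 * b0);
    uint16_t p01 = (uint16_t)(a0 * b1);
    uint16_t p10 = (uint16_t)(a1 * b0);
    uint16_t p11 = (uint16_t)(a1 * b1);

    unsigned c_mid, c_lo;
    /* The sum of the cross terms can exceed 16 bits: test for carry. */
    uint16_t mid = add16(p01, p10, &c_mid);
    /* Low half of mid is added into the upper byte of the low word. */
    *lo = add16(p00, (uint16_t)(mid << 8), &c_lo);
    /* High word: p11 + high half of mid + both carries. This cannot
       overflow, since the true product is below 2^32. */
    *hi = (uint16_t)(p11 + (mid >> 8) + (c_mid << 8) + c_lo);
}

/* Cross-check against the host's native wide multiply (reference only). */
static void check(uint16_t a, uint16_t b)
{
    uint16_t hi, lo;
    mul16x16_32(a, b, &hi, &lo);
    assert((((uint32_t)hi << 16) | lo) == (uint32_t)a * b);
}
```

On the machine with NxN -> 2N the whole of mul16x16_32 is one instruction;
here it takes four multiplies, two adds with carry tests, and several shifts
and masks. Counting those at the instruction level is the comparison being
proposed.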
--
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet) {purdue,pur-ee}!l.cc!cik(UUCP)