Path: utzoo!attcan!uunet!snorkelwacker!usc!zaphod.mps.ohio-state.edu!rpi!leah!bingvaxu!kym
From: kym@bingvaxu.cc.binghamton.edu (R. Kym Horsell)
Newsgroups: comp.arch
Subject: Re: int x int -> long for * (or is it 32x32->64)
Keywords: arithmetic,arbitrary precision,benchmark,modular arithmetic
Message-ID: <4002@bingvaxu.cc.binghamton.edu>
Date: 13 Sep 90 16:46:05 GMT
References: <3984@bingvaxu.cc.binghamton.edu> <41425@mips.mips.COM> <353@kaos.MATH.UCLA.EDU> <119977@linus.mitre.org>
Reply-To: kym@bingvaxu.cc.binghamton.edu.cc.binghamton.edu (R. Kym Horsell)
Organization: SUNY Binghamton, NY
Lines: 35
Xref: dummy dummy:1
X-OldUsenet-Modified: added Xref
In article <119977@linus.mitre.org> bs@linus.mitre.org (Robert D. Silverman) writes:
\\\
>Even Peter reports a 5 fold speed increase. One difference between his
                      ^^^^^^
>code and mine that would tend to exaggerate the difference is that he
\\\
#            SUN 3/280            SUN 4/260        (both with FPUs)
#        gcc 1.37.1    cc     gcc 1.37.1    cc
# MUL1      10.40     13.52       3.67     2.92    (simple integer arithmetic)
# MUL2       9.00     10.38       2.03     2.05    (floating point arithmetic)
# MUL3      10.14     11.30       6.11     6.32    (break into 16-bit pieces)
# MUL4       1.88      1.98       1.82     1.83    (assembly code)
Ratio of MUL3/MUL1:
              .97       .83        1.7      2.2
There does not seem to be a 5-fold difference between using ``simple integer
arithmetic'' and ``break into 16-bit pieces'' in Peter's figures.
Of course the ratio between ``simple integer arithmetic'' and
``assembly code'' is much higher -- but that would be comparing apples & oranges.
Since Peter has various assembly sources to hand, I would be
interested to see figures concerning:
32x32->64 multiply
and
32x32->32 via 16-bit pieces multiply
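
For concreteness, the two variants might look roughly like this in C. This
is only a sketch: the function names are made up for illustration, the code
is not taken from Peter's or Bob's sources, and uint64_t is used purely as a
convenient way to hand back the two result words.

```c
#include <stdint.h>

/* Hypothetical helper: full 32x32->64 product built from four
 * 16x16->32 partial products, with the carries propagated by hand. */
uint64_t mul32x32_64_pieces(uint32_t a, uint32_t b)
{
    uint32_t al = a & 0xFFFFu, ah = a >> 16;
    uint32_t bl = b & 0xFFFFu, bh = b >> 16;

    uint32_t ll = al * bl;      /* contributes to bits  0..31 */
    uint32_t lh = al * bh;      /* contributes to bits 16..47 */
    uint32_t hl = ah * bl;      /* contributes to bits 16..47 */
    uint32_t hh = ah * bh;      /* contributes to bits 32..63 */

    /* Middle column: at most 3 * 0xFFFF, so it cannot overflow. */
    uint32_t mid = (ll >> 16) + (lh & 0xFFFFu) + (hl & 0xFFFFu);

    uint32_t lo = (ll & 0xFFFFu) | (mid << 16);
    uint32_t hi = hh + (lh >> 16) + (hl >> 16) + (mid >> 16);

    return ((uint64_t)hi << 32) | lo;
}

/* Hypothetical helper: 32x32->32 (low word only) from 16-bit pieces.
 * The ah*bh term and all carries out of bit 31 are simply dropped,
 * so three multiplies suffice instead of four. */
uint32_t mul32x32_32_pieces(uint32_t a, uint32_t b)
{
    uint32_t al = a & 0xFFFFu, ah = a >> 16;
    uint32_t bl = b & 0xFFFFu, bh = b >> 16;

    return al * bl + ((al * bh + ah * bl) << 16);
}
```

The second routine skips one multiply and all of the carry handling, which
is why timing the two against each other in assembler would isolate the
cost of the missing upper word.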
Another experiment, still in assembler, would be to get figures for
the same program using a non-naive multiply and divide, in which
the effect of lacking 32x32->64 would be even less marked.
-Kym Horsell