Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!usc!julius.cs.uiuc.edu!rpi!leah!bingvaxu!kym
From: kym@bingvaxu.cc.binghamton.edu (R. Kym Horsell)
Newsgroups: comp.arch
Subject: FFT break-even point
Summary: initial figure only -- 160 bits
Keywords: FFT,discrete,multiply,arithmetic,high precision
Message-ID: <4024@bingvaxu.cc.binghamton.edu>
Date: 15 Sep 90 02:06:42 GMT
Reply-To: kym@bingvaxu.cc.binghamton.edu (R. Kym Horsell)
Organization: SUNY Binghamton, NY
Lines: 69
Xref: dummy dummy:1
X-OldUsenet-Modified: added Xref
Initial report on break-even in discrete FFTs for XP multiply
-------------------------------------------------------------
The thy regarding FFT and extended-precision multiply can be found in any
reasonable book on computer algorithms and complexity (see, e.g. DEK,
Hwang&Briggs, Wilf, etc).
The main result, which can be applied to a class of algorithms including
polynomial multiplication (encompassing fixed-point multiplication), show
these can be performed in ``approximately'' O(n log n) time where n is the
size of the operands.
With careful selection of parameters (and exact details will remain vague
until I verify whether any of this is worth publishing) the FFT
``constant of proportionality'' can be made quite reasonable.
The question is -- where is the break-even point wrt naive ``shift-and-add''
multiplication?
From an initial benchmark, that does not purport to be of a _correctly_
working fft multiply (regard it as ``synthetic'' at this point),
I have obtained the following data (times in ms, figures in parens with
optimization):
-------------------------------------------------------
fft naive
VAX 6440
4020 2590
(3840) (2590)
VAX 8530
6617 6516
(6400) (6483)
Sun 3/60
9434 9750
(7017) (5717)
Sun Sparc 1
8233 19600
(5200) (6533)
-------------------------------------------------------
The size of the operands is approximately 160 bits (giving a 160-bit
========
result: high-order being discarded). [Remember, I'm talking about
XP calcs implemented in s/w here -- this figure is not directly
applicable to h/w implementations of, say, extended precision FP
multiplication].
As far as I can determine, these figures represent a true break-even point
on all the above architectures except the 6440. Note also that optimization
tends to shift the break-even (e.g. Sun 3/60). A more detailed analysis may
determine _why_ this is in each case; meanwhile anyone else is free to jump
the gun.
Since FFTs have an extremely unusual ``constant of proportionality'',
that depends on the canonical factorization of the size of the operands
the exact characterization of a break-even point is quite difficult.
Hence the ``preliminary-only'' nature of this 160-bit figure.
[For those that may like to dispute this initial figure out of hand
please consider: we wish to compare _apples_ with _apples_. Although
it may be possible to hand-code a naive XP mult to give extremely good
performance on a given architecture (and, naturally, _none_ of my code is
in assembler), the same would be true for the FFT algorithm.]
I will post a more detailed report, for those interested, inaculadayz.
-Kym Horsell
---
PS If anyone can save me some work by pointing at >1 authoritative
reference ON THIS SAME WORK, please post or email.