Radix-64 Floating-Point Division and Square Root: Iterative and Pipelined Units
- 25 May 2023
- journal article
- research article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Computers
- Vol. 72 (10), 2990-3001
- https://doi.org/10.1109/tc.2023.3280136
Abstract
Digit-recurrence algorithms are widely used in current microprocessors to compute floating-point division and square root. These iterative algorithms offer a good trade-off among performance, area, and power. Traditionally, commercial processors have used iterative division and square root units in which the iteration logic is reused over several cycles. The main drawbacks of these iterative units are long latency and low throughput, caused by reusing part of the logic over several cycles, and hardware complexity, caused by separate logic for division and square root. We present a radix-64 floating-point division and square root algorithm with a common iteration for both operations in which, to keep the implementation affordable, each radix-64 iteration is made of two simpler radix-8 iterations. The radix-64 algorithm yields low-latency operations, and the common division and square root radix-64 iteration results in some area reduction. The algorithm is mapped onto two different microarchitectures: a low-latency, low-area iterative unit and a low-latency, high-throughput pipelined unit. In both units, speculation between consecutive radix-8 iterations is used to reduce the delay.
This publication has 11 references indexed in Scilit:
- Low-Latency and High-Bandwidth Pipelined Radix-64 Division and Square Root Unit. Published by Institute of Electrical and Electronics Engineers (IEEE), 2022
- Low Latency Floating-Point Division and Square Root Unit. International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019
- Quad Precision Floating Point on the IBM z13. Published by Institute of Electrical and Electronics Engineers (IEEE), 2016
- Performance/Power Space Exploration for Binary64 Division Units. International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2015
- The Floating-Point Unit of the Jaguar x86 Core. Published by Institute of Electrical and Electronics Engineers (IEEE), 2013
- Split-Path Fused Floating Point Multiply Accumulate (FPMAC). Published by Institute of Electrical and Electronics Engineers (IEEE), 2013
- Advanced Clockgating Schemes for Fused-Multiply-Add-Type Floating-Point Units. Published by Institute of Electrical and Electronics Engineers (IEEE), 2009
- A Radix-10 Digit-Recurrence Division Unit: Algorithm and Architecture. IEEE Transactions on Computers, 2007
- Floating point division and square root algorithms and implementation in the AMD-K7™ microprocessor. Published by Institute of Electrical and Electronics Engineers (IEEE), 2003
- 1 GHz HAL SPARC64® Dual Floating Point Unit with RAS features. Published by Institute of Electrical and Electronics Engineers (IEEE), 2002
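Illustrative sketch
The abstract describes each radix-64 iteration as two simpler radix-8 iterations. The Python sketch below illustrates that composition for division only, and it is not the authors' design. It assumes operands in [1, 2), a signed digit set {-4, ..., 4}, and exact digit selection by rounding, whereas hardware would select digits from a truncated residual estimate. Square root and the speculation between radix-8 iterations are not modeled. The names `radix8_step` and `radix64_divide` are hypothetical.

```python
from fractions import Fraction


def radix8_step(w, d):
    """One radix-8 digit-recurrence step (simplified, exact selection).

    Shifts the residual by the radix, picks a signed digit q, and
    subtracts q*d. Assumes |w| <= d/2 on entry, so q lies in {-4..4}
    and |new residual| <= d/2 on exit.
    """
    shifted = 8 * w
    q = round(shifted / d)  # assumption: exact rounding, not a hardware estimate
    return q, shifted - q * d


def radix64_divide(x, d, iterations):
    """Divide x by d using radix-64 iterations built from two radix-8 steps.

    Returns (quotient, remainder) as exact Fractions such that
    x == d * quotient + remainder.
    """
    x, d = Fraction(x), Fraction(d)
    if not (1 <= x < 2 and 1 <= d < 2):
        raise ValueError("this sketch assumes operands in [1, 2)")

    w = x / 4              # initial scaling so that |w| <= d/2
    quotient = Fraction(0)
    weight = Fraction(1)   # weight of the most recent radix-64 digit

    for _ in range(iterations):
        q_hi, w = radix8_step(w, d)   # first radix-8 iteration
        q_lo, w = radix8_step(w, d)   # second radix-8 iteration
        digit64 = 8 * q_hi + q_lo     # combined radix-64 digit (6 bits per iteration)
        weight /= 64
        quotient += digit64 * weight

    # Undo the initial scaling by 4 for both quotient and remainder.
    return 4 * quotient, 4 * w * weight


if __name__ == "__main__":
    x, d = Fraction(3, 2), Fraction(5, 4)
    q, rem = radix64_divide(x, d, iterations=9)  # 9 * 6 = 54 quotient bits
    print(float(q), float(x / d))
```

Under these assumptions, each radix-64 iteration retires 6 quotient bits, so 9 iterations produce 54 bits, enough to cover a 53-bit binary64 significand before rounding. Exact rational arithmetic is used only to keep the invariant `x == d * quotient + remainder` easy to check.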