# **Design and Simulation of Vedic Multiplier using Urdhva Scheme**

## *Santhana Krishnan*

*Abstract ---* **To improve performance, a new multiplier is designed using a modified form of ancient Indian Vedic mathematics. The multiplier is an essential component of high-speed processors. The adders used for partial product addition adversely affect the multiplier's speed. Vedic mathematics is used to obtain the partial products of an 8-bit multiplier. The carry-skip method illustrates the principle used for partial product addition in the Vedic multiplier. The 8X8 Vedic multiplier is coded in VHDL using Vivado 2015.2.**

*Index Terms—***Vedic Multiplier, Urdhva Scheme, Multiplier, Verilog, VLSI**

# **I. MATHEMATIC TECHNIQUES IN ANCIENT INDIA**

The following are the various schemes of Vedic mathematics, listed by their Sanskrit names [1-4]:

- Ekanyunena Purven
- Yaavadunam
- Nikhilam Navatascaramam Dasatah
- Shunnyamanyat
- Chalanakalabyham
- Ekadhikina Purvena
- Gunakasamuncyay
- Gunita Samchyay
- Parraavartya Yoj
- Purana Purnabyham
- Sankalana Vyanvakalanabhyam
- Shesanyanleena Charamena
- Shuniyam Soamyasamuccaye
- Sopaantyadvayamantyam
- Urdhva Tiryakbhyam(UT)
- Vyashti Samanstih

## **II. UT SCHEME**

Urdhva Tiryakbhyam (UT) is a general technique applicable to all cases of multiplication. It is also involved in the division of one large number by another. The name means "vertically and crosswise". Fig. 1 illustrates the scheme for the product of two 2-digit numbers.



Fig. 1. Product of two 2 Digit Numbers as 17X19
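As a worked illustration of the vertical and crosswise steps for the product named in Fig. 1, the arithmetic can be laid out as follows. This is a sketch of the usual digit-by-digit procedure; the layout of the original figure may differ.

$$
\begin{aligned}
\text{units (vertical):} \quad & 7 \times 9 = 63 && \Rightarrow \text{write } 3,\ \text{carry } 6 \\
\text{tens (crosswise):} \quad & 1 \times 9 + 7 \times 1 + 6 = 22 && \Rightarrow \text{write } 2,\ \text{carry } 2 \\
\text{hundreds (vertical):} \quad & 1 \times 1 + 2 = 3 && \Rightarrow \text{write } 3
\end{aligned}
$$

Reading the written digits from the last step to the first gives $17 \times 19 = 323$.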

# **III. MULTIPLICATION PROCEDURE**

The word "Vedic" derives from "Veda", which denotes the store of all knowledge; Vedic mathematics takes its name from it. Vedic mathematics deals with subjects such as geometry and algebra using 16 schemes, and Urdhva Tiryakbhyam is one of them. Because the partial products are generated in parallel, the multiplier is not affected by the clock frequency [5-9]. Thus, with the UT scheme, the microprocessor does not need to operate at a high clock frequency.

#### **STEPS**

1. Consider two 4-bit numbers, A and B.

2. The four bits of A are numbered $A(0)$, $A(1)$, $A(2)$, $A(3)$, with $A(0)$ the least significant bit; the bits of B are numbered in the same way.



RESULT: C(6)S(6)S(5)S(4)S(3)S(2)S(1)S(0).

3. The product of  $A(0)$  and  $B(0)$  will be  $S(0)$ .

4. The products $A(0) \& B(1)$ and $A(1) \& B(0)$ are added; the sum is $S(1)$ and the carry is $C(1)$.

5. The products $A(2) \& B(0)$, $A(1) \& B(1)$ and $A(0) \& B(2)$ are added to $C(1)$; the sum is $S(2)$ and the carry is $C(2)$.

6. The products $A(3) \& B(0)$, $A(2) \& B(1)$, $A(1) \& B(2)$ and $A(0) \& B(3)$ are added to $C(2)$; the sum is $S(3)$ and the carry is $C(3)$.

7. Similarly, the products $A(3) \& B(1)$, $A(2) \& B(2)$ and $A(1) \& B(3)$ are added to $C(3)$; the sum is $S(4)$ and the carry is $C(4)$.

8. In the same way, the products $A(3) \& B(2)$ and $A(2) \& B(3)$ are added to $C(4)$; the sum is $S(5)$ and the carry is $C(5)$.

9. The product $A(3) \& B(3)$ is added to $C(5)$; the sum is $S(6)$ and the carry is $C(6)$.

10. The final result is C(6)S(6)S(5)S(4)S(3)S(2)S(1)S(0).
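The steps above can be sketched in software as a column-by-column loop. This is a behavioural model in Python, not the VHDL design of the paper; the function name `ut_multiply_4bit` is hypothetical, and the carry is modelled as a small integer because the middle columns can produce a carry larger than one bit.

```python
def ut_multiply_4bit(a: int, b: int) -> int:
    """Multiply two 4-bit unsigned integers column by column (UT scheme)."""
    if not (0 <= a < 16 and 0 <= b < 16):
        raise ValueError("operands must be 4-bit unsigned values")

    # A[0] and B[0] are the least significant bits, as in steps 1 and 2.
    A = [(a >> i) & 1 for i in range(4)]
    B = [(b >> i) & 1 for i in range(4)]

    sum_bits = []
    carry = 0  # C(k); assumption: kept as an integer, it may exceed 1
    for k in range(7):  # columns 0..6 produce S(0)..S(6)
        # Crosswise terms A(i) & B(k - i) that fall into column k.
        column = sum(A[i] & B[k - i] for i in range(4) if 0 <= k - i < 4)
        total = column + carry
        sum_bits.append(total & 1)  # S(k)
        carry = total >> 1          # C(k), passed on to column k + 1

    # Assemble C(6) S(6) S(5) ... S(0); C(6) is the most significant bit.
    result = carry << 7
    for k, s in enumerate(sum_bits):
        result |= s << k
    return result
```

For example, `ut_multiply_4bit(13, 11)` walks through the seven columns and returns the same value as `13 * 11`.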

## **IV. 8X8 VEDIC MULTIPLIER ARCHITECTURE**

The same methodology applies to multiplying two binary numbers of any bit width. The UT-based 8-bit Vedic multiplier follows the scheme described below [10-19]. It uses four 4X4 multiplier blocks and several adder units to realize the topology. The design aims to improve the speed of the topology by combining the UT scheme with a carry-save architecture.



Fig. 3. 8X8 Multiplier Architecture

#### **V. 2x2 VEDIC MULTIPLIER**

The UT-based 2-bit Vedic multiplier uses four AND gates and two half adders to generate the partial products and perform the required additions. The delay of this design is estimated as one AND gate delay (since all partial products are computed by AND gates) plus two half adder delays. The multiplier's performance is noticeably improved by high-speed adder architectures and a modified AND gate topology.

Fig.4. 2X2 Vedic multiplier block design
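A gate-level sketch of this block in Python, using exactly four AND operations and two half adders. The names `half_adder` and `vedic_2x2` are hypothetical, and the wiring shown is the common textbook arrangement, assumed here rather than taken from Fig. 4.

```python
def half_adder(x: int, y: int) -> tuple:
    """Return (sum, carry) for two single bits."""
    return x ^ y, x & y


def vedic_2x2(a: int, b: int) -> int:
    """Multiply two 2-bit unsigned integers with four ANDs and two half adders."""
    if not (0 <= a < 4 and 0 <= b < 4):
        raise ValueError("operands must be 2-bit unsigned values")

    a0, a1 = a & 1, (a >> 1) & 1
    b0, b1 = b & 1, (b >> 1) & 1

    # Four AND gates generate all partial products in parallel.
    p00 = a0 & b0
    p10 = a1 & b0
    p01 = a0 & b1
    p11 = a1 & b1

    s1, c1 = half_adder(p10, p01)  # crosswise column
    s2, c2 = half_adder(p11, c1)   # vertical column plus incoming carry

    # Product bits, most significant first: c2 s2 s1 p00.
    return (c2 << 3) | (s2 << 2) | (s1 << 1) | p00
```

The second half adder depends on the carry of the first, which is why the critical path contains two half adder delays after the single AND gate delay.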

#### **VI. 4x4 BIT VEDIC MULTIPLIER**

The four-bit binary multiplier can be understood from the line diagram of the UT scheme. The partial products are generated in parallel by AND gates and are added by several carry-save adder arrangements followed by a final vector-merging adder. The use of half adder units in the carry-save adder arrangements, in the first bit-addition stage, and in the final vector-merging adder determines the overall layout area and power consumption of the topology. The 4-bit binary multiplication is analogous to that of the 2-bit multiplier.

Multiplication of two 8-bit inputs A[7:0] and B[7:0] yields a 16-bit output P[15:0]. Because the adder blocks use carry-save arrangements, no large latency is introduced. The architecture has five delays in total: one 4X4 multiplier delay, three full adder delays, and one 8-bit ripple carry adder delay. The improved performance comes at the cost of a higher transistor count. The outputs of the four-bit multiplications are zero-padded before they enter the ripple carry adder blocks, so that all inputs to a particular adder stage have the same bit length.
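The composition of an 8X8 product from four 4X4 products can be sketched as follows. This is a behavioural Python model under stated assumptions: `mul4` is a stand-in for one 4X4 Vedic block, the three-adder arrangement is one plausible ordering rather than the exact adder tree of the architecture figure, and all function names are hypothetical.

```python
def mul4(a: int, b: int) -> int:
    """Stand-in for one 4X4 multiplier block; returns an 8-bit product."""
    if not (0 <= a < 16 and 0 <= b < 16):
        raise ValueError("operands must be 4-bit unsigned values")
    return a * b


def ripple_carry_add(x: int, y: int, width: int) -> int:
    """Add two width-bit values one full adder at a time.

    Shorter operands are implicitly zero-padded to `width` bits.
    The result has width + 1 bits; the final carry is the top bit.
    """
    carry, result = 0, 0
    for i in range(width):
        xi, yi = (x >> i) & 1, (y >> i) & 1
        result |= (xi ^ yi ^ carry) << i
        carry = (xi & yi) | (carry & (xi ^ yi))
    return result | (carry << width)


def vedic_8x8(a: int, b: int) -> int:
    """Multiply two 8-bit unsigned integers using four 4X4 blocks."""
    if not (0 <= a < 256 and 0 <= b < 256):
        raise ValueError("operands must be 8-bit unsigned values")

    a_lo, a_hi = a & 0xF, a >> 4
    b_lo, b_hi = b & 0xF, b >> 4

    ll = mul4(a_lo, b_lo)  # weight 2**0
    hl = mul4(a_hi, b_lo)  # weight 2**4
    lh = mul4(a_lo, b_hi)  # weight 2**4
    hh = mul4(a_hi, b_hi)  # weight 2**8

    # Add the two cross products (8 bits each, 9-bit result).
    cross = ripple_carry_add(hl, lh, 8)
    # Add the upper nibble of ll, zero-padded to 9 bits.
    mid = ripple_carry_add(cross, ll >> 4, 9)
    # Add the overflow of the middle columns to the high product.
    upper = ripple_carry_add(hh, mid >> 4, 8)

    # P[3:0] = ll[3:0], P[7:4] = mid[3:0], P[15:8] = upper.
    return (upper << 8) | ((mid & 0xF) << 4) | (ll & 0xF)
```

The zero padding is visible in the second and third additions, where a 4-bit or 6-bit operand is widened to match the other adder input.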





## **VII. ANALYSIS AND CONCLUSION**

The Vedic multiplier improves the computational speed of multiplication operations in digital image processing and can be easily realized in hardware.

#### **A. Simulation of Vedic Multiplier in DCT Applications**

The Vedic multiplier greatly speeds up image-processing computations. The Vedic algorithm is one of many hardware approaches for implementing important functions such as the DFT, DWT, and FFT. Here, the DCT is computed using the Vedic algorithm.

The images reconstructed with standard multipliers and with the Vedic algorithm are not visibly different. In a transform coding algorithm, the original image is divided into sub-images (blocks). The transform coefficients of each block are calculated; those closer to the top-left corner of the block contain most of the information, which allows the image to be quantized accurately. In JPEG image compression, the image is divided into 8X8 or 16X16 blocks, and the two-dimensional DCT of each block is computed.

The JPEG receiver recovers a single image by decoding the quantized DCT coefficients and computing the inverse two-dimensional DCT of each block. Many DCT coefficients are close to zero, so discarding them does not noticeably affect the quality of the reconstructed image.

#### **B. JPEG Process**

The JPEG process first divides the original image into 8 by 8 blocks. The pixel values of a black-and-white image normally range from 0 to 255, but the DCT is designed to work on values from -128 to 127, so each block must be modified to fit that range. The modified block is multiplied by the DCT matrix on its left and by the transpose of the DCT matrix on its right; in this way the DCT is applied to each block. Each block is then compressed by quantization, followed by entropy encoding of the quantized matrix, and the image is reconstructed through the reverse process. Decompression requires the inverse DCT. By virtue of the coherence and symmetry of Vedic algorithms, the design can have a regular silicon layout, occupy less area, and consume less power.
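The block-level arithmetic described above can be sketched in plain Python. This is an illustration only: it uses floating-point products where the hardware would use its multiplier blocks, the uniform quantization step `step=16` is a hypothetical placeholder rather than a JPEG quantization table, and all function names are assumptions.

```python
import math


def dct_matrix(n: int = 8) -> list:
    """Orthonormal DCT-II matrix D (D times its transpose is the identity)."""
    rows = []
    for i in range(n):
        scale = math.sqrt((1 if i == 0 else 2) / n)
        rows.append([scale * math.cos((2 * j + 1) * i * math.pi / (2 * n))
                     for j in range(n)])
    return rows


def matmul(x: list, y: list) -> list:
    """Plain matrix product; every term is one multiplication."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def transpose(x: list) -> list:
    return [list(row) for row in zip(*x)]


def forward_dct(block: list) -> list:
    """Shift pixels from 0..255 to -128..127, then compute D * M * D^T."""
    d = dct_matrix(len(block))
    shifted = [[p - 128 for p in row] for row in block]
    return matmul(matmul(d, shifted), transpose(d))


def inverse_dct(coeffs: list) -> list:
    """Compute D^T * C * D and undo the level shift."""
    d = dct_matrix(len(coeffs))
    shifted = matmul(matmul(transpose(d), coeffs), d)
    return [[round(v) + 128 for v in row] for row in shifted]


def quantize(coeffs: list, step: int = 16) -> list:
    """Uniform quantization (assumption: a single step for all coefficients)."""
    return [[round(c / step) for c in row] for row in coeffs]


def dequantize(levels: list, step: int = 16) -> list:
    return [[v * step for v in row] for row in levels]
```

Without quantization, `inverse_dct(forward_dct(block))` returns the original block; with quantization, small coefficients round to zero and the reconstruction becomes approximate.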

#### **C. Digital Signal Processing Applications**

Signal processing algorithms are developed in high-level languages such as C or Matlab using floating-point number representations. Mapping a floating-point representation onto an architecture uses more hardware and is consequently expensive, so silicon implementations use fixed-point number representations. The aim is therefore to develop minimal hardware modules for the multiplication operations in FT, FIR, IIR, image processing, seismic signal processing, optical signal processing, etc. Any effort to simplify the architecture of this basic block during the development stage is worthwhile. Digital signal processing applications are precise with the 16-bit Q15 format and the 32-bit Q31 format.

Q-format multiplication requires only integer multipliers, which gives it an edge over floating-point multiplication: integer multipliers are faster and consume less die area.
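A minimal sketch of a Q15 multiplication built on one integer multiply, written in Python for illustration. The helper names are hypothetical, and the rounding and saturation behaviour shown here is a common convention assumed for the example, not something specified in this paper.

```python
Q15_ONE = 1 << 15      # scaling factor 2**15
Q15_MAX = Q15_ONE - 1  # largest Q15 value, 32767
Q15_MIN = -Q15_ONE     # smallest Q15 value, -32768


def to_q15(x: float) -> int:
    """Convert a real number in [-1, 1) to a saturated 16-bit Q15 integer."""
    return max(Q15_MIN, min(Q15_MAX, round(x * Q15_ONE)))


def from_q15(q: int) -> float:
    """Convert a Q15 integer back to a real number."""
    return q / Q15_ONE


def q15_mul(a: int, b: int) -> int:
    """Multiply two Q15 values using a single integer multiplication."""
    product = a * b                        # 16 x 16 bit integer product (Q30)
    rounded = (product + (1 << 14)) >> 15  # round, then rescale to Q15
    return max(Q15_MIN, min(Q15_MAX, rounded))  # saturate the -1 * -1 case
```

For instance, `q15_mul(to_q15(0.5), to_q15(0.5))` equals `to_q15(0.25)`; the only arithmetic unit needed is a 16-bit integer multiplier followed by a shift.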

*Santhana Krishnan (ECE, Francis Xavier Engineering College, India; santhanakrishnan291@gmail.com)*

#### **REFERENCES**

- [1] Suryasnata Tripathy, L B Omprakash, Sushanta K. Mandal and B S Patro (KIIT University, Bhubaneswar, India-751024), "Low Power Multiplier Architectures Using Vedic Mathematics in 45nm Technology for High Speed Computing", International Conference on Communication, Information & Computing Technology (ICCICT), Jan. 16-17, 2015, Mumbai, India, IEEE, 2015.
- [2] Amrita Nanda, "Design and Implementation of Urdhva-Tiryakbhyam Based March -2014
- [3] C. F. Law, S. S. Rofail, and K. S. Yeo, "A Low-Power 16×16-Bit Parallel Multiplier Utilizing Pass-Transistor Logic", IEEE Journal of Solid State Circuits, vol. 34, no. 10, pp. 1395-1399, October 1999.
- [7] K. K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation. New York, NY, USA: Wiley, 1999.
- [9] L. D. Van and C. C. Yang, "Generalized low-error area-efficient fixed-width multipliers," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 52, no. 8, pp. 1608–1619, Aug. 2005.
- [10] M. E. Paramasivam and Dr. R. S. Sabeenian, "An Efficient Bit Reduction Binary Multiplication Algorithm using Vedic Methods", IEEE 2nd Computing Conference, 2010, ISBN: 978-1-42444791-6/10, pp. 25-28.
- [13] Rakshith T R and Rakshith Saligram, "Design of High Multiplier using Reversible logic: a Vedic Mathematical Approach", International Conference on Circuits, Power and Computing Technologies (ICCPCT-2013), ISBN: 978-1-4673-4922-2/13, pp.775.
- [14] Sushma R. Huddar, Sudhir Rao Rupanagudi, Kalpana M and Surabhi Mohan, "Novel High Speed Vedic Multiplier using Compressors", International Multi conference on Automation, Computing, Communication, Control and Compressed Sensing (iMac4s), 22-23 March 2013, Kottayam, ISBN: 978-1-4673-5090-7/13, pp. 465-469.
- [15] S. N. Tang, J. W. Tsai, and T. Y. Chang, "A 2.4-Gs/s FFT processor for OFDM-based WPAN applications," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 57, no. 6, pp. 451–455, Jun. 2010.
- [16] S. C. Hsia and S. H. Wang, "Shift-register-based data transposition for cost-effective discrete cosine transform," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 15, no. 6, pp. 725–728, Jun. 2007.
- [18] Vigneswaran T., "VLSI design of low power digital FIR filter using PSPICE and VLSI design of high speed digital FIR filter using VERILOG HDL", Chapter 5, 2013.
- [19] Y. H. Chen, T. Y. Chang, and C. Y. Li, "High throughput DA-based DCT with high accuracy error compensated adder tree," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 19, no. 4, pp. 70 Apr. 2011.