Monday, March 11, 2013

Using ARM Neon to optimize square root function

SA,

I was working on a platform with an ARM Cortex-A9 CPU (Samsung Exynos 4412) and had some computer vision code that makes a few calls to sqrtf, and it was really slow :-).

Calculating the square root is helpful in many cases, especially in game development, for example to calculate the distance between points. Everyone who has worked on game development, especially in the era of software rendering but even on modern CPUs, knows that sqrt from cmath (math.h in C) is really heavy on the processor. The same goes for calculating sin and cos; one of the tricks is to use a lookup table that holds the precomputed values for most angles, and then retrieve the result from the table when you need it.

Carmack, the lead programmer of most id Software games (Quake, Doom), is often credited with a very fast inverse square root function that uses a Newton-Raphson approximation, and it was really fast; the function can be found in the Quake 3 source code. I tried that function and it was also fast, but not as fast as using the ARM NEON intrinsics.
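For reference, here is a sketch in the spirit of the Quake 3 routine, not a verbatim copy of it. The magic constant 0x5f3759df is the one from the Quake 3 source; the names fast_rsqrt and fast_sqrt are made up for this example, and memcpy is used instead of the original pointer cast to keep the type punning well defined. The routine computes 1/sqrt(x), so the square root itself is recovered as x * (1/sqrt(x)).

```c
#include <stdint.h>
#include <string.h>

/* Approximate 1/sqrt(x) for x > 0 using the bit trick plus one Newton-Raphson step. */
float fast_rsqrt(float x)
{
    float half = 0.5f * x;
    uint32_t bits;
    float y;

    memcpy(&bits, &x, sizeof bits);      /* reinterpret the float as an integer */
    bits = 0x5f3759dfu - (bits >> 1);    /* initial guess for 1/sqrt(x) */
    memcpy(&y, &bits, sizeof y);

    y = y * (1.5f - half * y * y);       /* one Newton-Raphson refinement */
    return y;
}

/* sqrt(x) = x * (1/sqrt(x)); only meant for x > 0. */
float fast_sqrt(float x)
{
    return x * fast_rsqrt(x);
}
```

With a single refinement step the result is only good to a fraction of a percent, so fast_sqrt(4.0f) comes out close to 2 rather than exactly 2. A second Newton-Raphson step improves accuracy at the cost of a few more multiplications.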

I wrote a small test application with one version that uses sqrt from cmath and one that uses ARM NEON, and this was the result: the NEON intrinsics version was two times faster than the normal square root.



I also used OpenMP's timer (omp_get_wtime) to measure the elapsed time of each of the two functions.

3 comments:

  1. Hello, can you please send me the code?

  2. The source code of Quake and Doom is available on the internet. There are many resources about optimizing the square root. I just tested it on an ARM platform.

  3. Hello, the ARM NEON algorithm is fast because of data parallelization. The algorithm itself is actually slower than Carmack's, but it is more precise. So if you don't care much about accuracy, you can use Carmack's; otherwise just use the NEON algorithm.
