Wednesday, October 21, 2009

Cranking iPhone Performance to 11, Without Inline Assembly

I've been writing high performance code on microcontrollers and digital signal processors for embedded systems since the early 90s. I've made a career in realtime control software, the likes of which you'd see in military tracking systems or even in the computer running the engine of your car.

At GDC Austin 2009, Noel Llopis (@SnappyTouch) gave a compelling presentation entitled "Squeezing Every Drop Of Performance Out Of The iPhone". His presentation provides a wonderful overview of the performance concerns in developing applications for the iPhone. This is good stuff, close to my heart.

Noel also gave a presentation at 360iDev here in Denver entitled "Cranking Floating Point Performance To 11". The core of his presentation revolves around utilizing the vector floating point unit by moving more code into inline assembly.

Here is where our opinions differ. I was on that path too. In the early 90s I wrote about half of my code in assembly language. By the early 2000s it was probably 10%, and now it's zero. You see, over the last ten years or so things have changed. The machine instruction sets in modern processors are more powerful than ever. They're also getting more difficult to understand. This is especially the case when it comes to optimization and 'hidden' repercussions like pipeline stalls. Because instructions flow through a pipeline, simply reordering your code can cause either a great performance boost or a great loss.

A few months back I got together with one of my fellow iPhone developers, former CTO of Tendril Networks. We were working on an audio project to do pitch detection and shifting (what some people call "autotuning"). The application required running two FFTs every 20 milliseconds. We were pretty much at the limit of the device's capabilities.

We spent the whole weekend working with the Xcode iPhone debugger, Shark, and Instruments and this was our strategy...

Empirical Compiler Optimization:
1. Measure the performance of the time critical section
2. Adjust the optimization settings of the compiler
3. Repeat

Empirical Code Style Optimization:
1. Measure the performance of the time critical section
2. Adjust the C/C++/Obj C code
3. Review results in the debugger's disassembler
3a. Look for fewer lines of assembly (which is almost always faster)
3b. Look for fewer library calls
4. Repeat
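
As a rough sketch of the "measure" step in both loops above, here is one way to time a hot section in plain C. Everything named here is an assumption for illustration: critical_section() is a hypothetical stand-in for your own code, and the 20 ms figure is simply the frame budget from the audio project. It uses the standard clock() call, which reports CPU time, so treat the numbers as relative rather than absolute.

```c
#include <stddef.h>
#include <time.h>

/* Frame budget taken from the audio project above: 20 milliseconds. */
#define FRAME_BUDGET_SECONDS 0.020

/* Hypothetical stand-in for the time-critical section being tuned. */
static float critical_section(const float *in, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i)
        acc += in[i] * in[i];
    return acc;
}

/* Runs the section `reps` times and returns the mean CPU time per call,
 * in seconds. The buffer is perturbed between repetitions so the
 * optimizer has to recompute each call, and the accumulated result is
 * written to *sink so the loop is not discarded as dead code. */
static double mean_seconds_per_call(float *buf, size_t n, int reps, float *sink)
{
    float acc = 0.0f;
    clock_t start = clock();
    for (int r = 0; r < reps; ++r) {
        buf[(size_t)r % n] += 1.0f;
        acc += critical_section(buf, n);
    }
    clock_t stop = clock();
    *sink = acc;
    return (double)(stop - start) / CLOCKS_PER_SEC / reps;
}

/* Returns 1 if the measured time per call fits inside the frame budget. */
static int fits_budget(double seconds_per_call)
{
    return seconds_per_call <= FRAME_BUDGET_SECONDS;
}
```

Run it once, note the number, change one compiler setting or one piece of source, and run it again. Changing a single thing per pass is what makes the result attributable.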

Why we favor this approach:
1. Learning curve involves only cursory understanding of the VPU assembly
2. More time for developing usability and marketing your application
3. It's much, much, much less error prone than inline assembly
4. Not likely to inadvertently trigger performance problems with pipeline stalling
5. Performance will likely be BETTER than inline assembly in all but the rarest of circumstances

The primary compiler optimization setting tricks (optimizing for speed over size):
1. Disable Thumb mode
2. Set optimization level to "Fastest -O3"
3. Unroll Loops
4. Other C Flags = "-falign-loops=16"
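
It is easy to flip a setting in one build configuration and then measure a different one. A minimal sketch for checking what the compiler actually did, assuming a GCC or Clang toolchain: both define the __thumb__ macro when generating Thumb code and __OPTIMIZE__ when any optimization level above -O0 is in effect. The command line in the comment is my assumed equivalent of the Xcode settings above, not something taken from the project, and dsp.c is a placeholder file name.

```c
/*
 * Assumed command-line equivalent of the settings above (GCC, ARM target):
 *   gcc -marm -O3 -funroll-loops -falign-loops=16 -c dsp.c
 */

/* Returns 1 if this file was compiled to Thumb code, otherwise 0. */
static int built_as_thumb(void)
{
#if defined(__thumb__)
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 if the file was compiled with optimization enabled. */
static int built_with_optimization(void)
{
#if defined(__OPTIMIZE__)
    return 1;
#else
    return 0;
#endif
}
```

Logging both values once at startup is usually enough to confirm that the binary being profiled matches the settings you think you chose.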

The primary source optimization tricks:
1. Condense code into smaller chunks that fit inside the instruction cache
2. Use all floating point
3. Don't do indexed array lookups inside a bunch of floating point math (this will stall the vpu pipe)
4. Don't use any division; instead, always multiply by the reciprocal (1/x)
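
Tricks 3 and 4 can be sketched in C as follows. The function and parameter names are hypothetical, and whether the restructured version is actually faster on a given device is something to measure rather than assume. Note also that multiplying by a reciprocal can differ from a true divide in the last bit of the result.

```c
#include <stddef.h>

/* Baseline: every sample pays for an indexed table lookup and a divide
 * in the middle of the floating point math. */
static void apply_gain_naive(float *out, const float *in, const float *table,
                             const unsigned short *idx, size_t n, float norm)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = in[i] * table[idx[i]] / norm;
}

/* Restructured: lookups and float math are split into separate passes,
 * and the divide is replaced by a multiply with a precomputed reciprocal.
 * `out` must not alias `in`, because pass 1 overwrites `out`. */
static void apply_gain_fast(float *out, const float *in, const float *table,
                            const unsigned short *idx, size_t n, float norm)
{
    const float inv_norm = 1.0f / norm;   /* the only divide, done once */

    for (size_t i = 0; i < n; ++i)        /* pass 1: indexed lookups only */
        out[i] = table[idx[i]];

    for (size_t i = 0; i < n; ++i)        /* pass 2: float multiplies only */
        out[i] = in[i] * out[i] * inv_norm;
}
```

Splitting the loop costs an extra trip through the output buffer, so this is a trade, not a free win. It is exactly the kind of change the measure, adjust, repeat loop is meant to judge.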

I hope you're able to use some of this advice in your own applications - and crank it to eleven. :)

(Image courtesy Joseph Tey)

Friday, October 2, 2009

Apple TV takes on Wii


Apple has challenged the smart phone industry with the iPhone. However, I'm not so sure we saw the challenge coming to the handheld gaming industry with the iPod Touch. Apple didn't target handheld gaming; the market just pushed its products there, Jobs said in a recent interview.

Apple will release new Apple TV hardware. The primary addition will be an improved controller. Like the Nintendo Wii controller, and so many Apple products, the new controller will contain an accelerometer. This will speed text entry and add tilt input for applications and gaming. With each passing day iTunes makes more headway with the sale and rental of television, movie, music, and application content for Apple's platforms. Driving applications into the Apple TV is a logical and simple next step. Apple already has a devoted and thriving third party developer base and can shift that development and distribution framework over pretty easily to the Apple TV. We'll start to see Apple TV ports of all our favorite iPhone applications first. The Apple TV will become a serious contender to the console market and the Nintendo Wii, just as we saw the iPod Touch become a contender with the Nintendo DS.