Friday, February 8, 2013

Optimizing for Augmented Reality: Ideas for working around the polygon count limit when rendering 3D models

Mobile phones are (still) limited in compute power compared to desktop computers. Part of this is due to their weaker CPUs and GPUs, but another part is power consumption: a desktop or laptop plugged into a power outlet can carry out computations that would drain a mobile phone's battery. As a result, implementation issues and limitations that are crucial on mobile phones can be close to irrelevant on a desktop.

One such limitation is the degree of complexity of the 3D models rendered by the various AR libraries. The more complex an object, the more polygons need to be handled and drawn. Granted, the GPU takes care of much of the work, but the limit can still be hit fairly easily with a large enough polygon count (a few thousand).

This limitation is critical for much of the gaming industry, but for Augmented Reality implementations the practical limit is much lower. The main reason, I believe, is that a lot of image processing runs in parallel with rendering the model: namely, the marker or image tracking used to anchor the models in the "real world".

A "simple" solution to the problem is to dumb-down the model, i.e. take a complex object and calculate (off-line) a simpler, approximate, version of the model (for example, by using the Blender Poly Reducer script, or the QTip plugin in 3D Studio Max). That would reduce the number of polygons and boost performance (as is currently done when rendering far away models, and in a technique called "model swapping"). If the simplification is done on a server ahead-of-time, and the server is aware of the mobile client's processing power, a model with a slightly adjusted degree of simplification could be sent to the mobile client based on how powerful the mobile client is. Alternatively, a fairly complex model could be stored on the server, and a simpler model calculated on-demand based on the requesting client's processing power. This sort of simplification tactic can reduce communication bandwidth as well (due to the simplified models also having a smaller memory footprint).


The downside of the dynamic approach is that the algorithms used to reduce the polygon count must be well trusted not to wreck the model's general appearance, since there is no human testing involved to verify the visuals. On the other hand, when the simplification is slight (say, reducing the polygon count by 10%-20%), the chances of wrecking the 3D model should be much smaller.
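One way to reduce that risk without a human in the loop is a cheap automated sanity check: bound the geometric error the simplification introduced. A sketch, assuming the simplifier records which simplified vertex replaced each original one (the mapping format is an assumption for illustration):

```python
def max_displacement(original_vertices, simplified_vertices, old_to_new):
    """Largest distance any original vertex moved when it was merged
    into its simplified representative. A server could reject any
    simplification where this exceeds some tolerance, e.g. a few
    percent of the model's bounding-box diagonal."""
    worst = 0.0
    for i, (x, y, z) in enumerate(original_vertices):
        sx, sy, sz = simplified_vertices[old_to_new[i]]
        d = ((x - sx) ** 2 + (y - sy) ** 2 + (z - sz) ** 2) ** 0.5
        worst = max(worst, d)
    return worst
```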

Now, what if we wanted to show a 3D model in its finest detail without sacrificing performance?
Other than waiting for stronger, less power-hungry mobile CPUs and GPUs to come along, are there other options?

If you consider eye-tracking technology, the phone's or tablet's front-facing camera could be used to track (and estimate, to a degree) where on the screen the user is focusing.

One way to use such technology is to exploit the fact that our eyes only see clearly what is in focus. So what if we defined an approximation algorithm whose degree of approximation varies with how close the polygons are to the focal point? That way, only the area actually in focus would be rendered at maximum precision, while the areas around it would be rendered approximately. If the approximation is cheap enough to compute, it should be easier for the phone or tablet to handle than fully rendering a very complex 3D model.
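A sketch of what such gaze-dependent detail selection might look like, in Python. The normalized screen coordinates, the radius thresholds, and the three-level scheme are all assumptions for illustration:

```python
def lod_for_polygon(poly_center_screen, focal_point, fovea_radius=0.1):
    """Pick a level of detail (0 = full detail) for a polygon based on
    its screen-space distance from the estimated gaze point.
    Coordinates are normalized to [0, 1]; the radius is a guess."""
    dx = poly_center_screen[0] - focal_point[0]
    dy = poly_center_screen[1] - focal_point[1]
    dist = (dx * dx + dy * dy) ** 0.5
    if dist < fovea_radius:
        return 0   # in focus: render at full precision
    if dist < 3 * fovea_radius:
        return 1   # near periphery: moderate simplification
    return 2       # far periphery: aggressive simplification
```

The renderer would then draw each region of the model from the simplified variant matching its level, swapping variants as the gaze estimate moves.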

Of course, one factor preventing this today is that, to my knowledge, current APIs do not provide a way to access both cameras at the same time, but perhaps a future version will allow it.

Generally speaking, this type of optimization could be just as useful for games on consoles and PCs as for augmented reality, and possibly even more feasible there in the near term, say, using a webcam as the input for eye-tracking-based focal-point detection.

However, it is rather likely that with 3D models in AR there are areas where viewers generally focus more and areas where they focus less. Meaning, if we could combine some user-based analysis of interactions with such models with focal-point detection, we could probably come up with "hot spots" on the model, i.e. areas where users focus more than others. We could then use the gradually adjusted simplification I talked about above to create a version of the model that is more detailed where users generally focus, and skips a few details where they tend not to. By doing that, we should be able to reduce the overall polygon count; not as much as real-time focal-point tracking and adjustment would, but perhaps better than rendering a highly detailed model in which some parts never draw the user's attention yet still cost compute power.
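The offline analysis could be as simple as binning logged gaze samples into a coarse grid over the model's projected view and keeping the most-visited cells. A sketch, where the sample format, grid size, and names are made up for illustration:

```python
from collections import Counter

def hot_spots(gaze_samples, grid=8, top_k=3):
    """Bin recorded gaze points (normalized screen coordinates collected
    across many user sessions) into a grid x grid heatmap and return the
    cells users looked at most. Those cells keep full detail; the rest
    get the gradually adjusted simplification."""
    counts = Counter()
    for (x, y) in gaze_samples:
        cell = (min(int(x * grid), grid - 1), min(int(y * grid), grid - 1))
        counts[cell] += 1
    return [cell for cell, _ in counts.most_common(top_k)]
```

The resulting per-region detail budget can be baked into a single precomputed model variant, so no eye tracking is needed at render time.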
