Friday, February 15, 2013

Passenger Safety: Augmented Reality for Blind Spots

In the process of looking to buy a car, I noticed that different cars have different blind spots, some worse than others.
The two cars I tried, a Ford Focus and a Toyota Prius, have a very different feel in terms of blind spots.

For example, the rear-view mirror on the Prius isn't all that useful. The back window is small and, more importantly, narrow, so it is hard to get a sense of vehicles behind you to the left or right that might be attempting to pass you from either direction.

On the other hand, the back window on the Focus is large and wide, and the rear-view mirror provides a much broader view of the road behind you.

However, I noticed that the Focus's wide view from the back provides a false sense of security: vehicles may still be in your blind spots, yet you are led to believe that no one is trying to pass you. On the Prius, the narrow back window constantly reminds you to look over your shoulder before and during a lane change.

That got me thinking... if my rear-view mirror could "see through" the frame of the car and show me what the road looks like from all directions, my road sense as a driver would keep me alert to traffic approaching behind me in neighbouring lanes. So, if I had a couple of cameras installed on the outside of my car's roof, one on either side, aimed at the blind-spot areas, and a way to superimpose their images onto what is seen in the rear-view mirror (with little distortion), then my rear-view mirror would essentially eliminate the blind spots.
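Just to make the idea a bit more concrete, here is a very rough sketch of what merging a blind-spot camera feed into the mirror image might look like. It's written in Python with OpenCV purely for illustration; the homography matrix, blending weight, and the assumption of a single roof camera are all made up, and a real system would need proper per-vehicle calibration.

```python
# Rough sketch: blend a roof-mounted blind-spot camera frame into the
# rear-view image. The homography and blend weight are placeholders;
# a real setup would calibrate them per vehicle.
import cv2
import numpy as np

# Hypothetical homography mapping the blind-spot camera's view onto the
# rear-view mirror's image plane (would come from an offline calibration).
H = np.array([[0.9, 0.05, 30.0],
              [0.0, 0.95, 10.0],
              [0.0, 0.0,   1.0]])

def merge_blind_spot(mirror_frame, blind_spot_frame, alpha=0.4):
    """Superimpose the warped blind-spot view onto the mirror view."""
    h, w = mirror_frame.shape[:2]
    warped = cv2.warpPerspective(blind_spot_frame, H, (w, h))
    # Only blend where the warped frame actually has content.
    mask = (warped.sum(axis=2) > 0)[:, :, None]
    blended = np.where(mask,
                       (alpha * warped + (1 - alpha) * mirror_frame).astype(np.uint8),
                       mirror_frame)
    return blended
```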



While searching for more info on the topic, I bumped into a paper that covers current technology for eliminating blind spots, as well as how to use Augmented Reality in a low-cost and efficient way for that purpose.


Friday, February 8, 2013

Optimizing for Augmented Reality: Ideas for working around the polygon count limit when rendering 3D models

Mobile phones are (still) limited in terms of compute power when compared to desktop computers. Part of it is due to the limited CPUs and GPUs, but another part is due to power consumption: a desktop or laptop plugged into a power outlet can carry out computations that would drain the battery of a mobile phone. Therefore, there are implementation issues and limitations that may be crucial on mobile phones while being close to irrelevant on a desktop.

One such limitation is the degree of complexity of the 3D models rendered by the various AR libraries. The more complex the object, the larger the number of polygons that need to be handled and drawn. Granted, the GPU takes care of a lot of the work, but the limit can still be hit fairly easily with a large enough number of polygons (a few thousand).

This limitation matters for much of the gaming industry too, but with Augmented Reality implementations the limit should be set much lower. The main reason, I believe, is that there is a lot of image processing going on in parallel to rendering the model, namely the marker or image tracking that is used to anchor the models in the "real world".

A "simple" solution to the problem is to dumb-down the model, i.e. take a complex object and calculate (off-line) a simpler, approximate, version of the model (for example, by using the Blender Poly Reducer script, or the QTip plugin in 3D Studio Max). That would reduce the number of polygons and boost performance (as is currently done when rendering far away models, and in a technique called "model swapping"). If the simplification is done on a server ahead-of-time, and the server is aware of the mobile client's processing power, a model with a slightly adjusted degree of simplification could be sent to the mobile client based on how powerful the mobile client is. Alternatively, a fairly complex model could be stored on the server, and a simpler model calculated on-demand based on the requesting client's processing power. This sort of simplification tactic can reduce communication bandwidth as well (due to the simplified models also having a smaller memory footprint).


The downside of the dynamic approach is that the algorithms used for reducing the polygon count need to be well trusted not to wreck the model's general appearance, since there is no human-based testing involved to verify the visuals. On the other hand, when the simplifications are slight (e.g. reducing the number of polygons by 10%-20%), the chances of wrecking the 3D model should be much smaller.

Now, what if we wanted to show a 3D model in its finest detail without sacrificing performance?
Other than waiting around for stronger and less power-hungry mobile CPUs and GPUs to come along, are there other options?

If you consider eye-tracking technology, the phone's or tablet's front-facing camera could be used to track (and estimate, to a degree) where on the screen the user is focusing.

One way to use such a technology is to take into account that our eyes only see clearly what's in focus. So what if we could define an approximation algorithm that uses a varying degree of approximation based on how close the polygons are to the focal point? That way, only the area that is actually in focus would be rendered with maximum precision, while the areas around it would be rendered in an approximated way. If the approximation is cheap enough to calculate, it should be easier for the phone or tablet to handle than fully rendering a very complex 3D model.
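As a toy illustration of what such a focal-point-driven approximation might look like, you could bucket mesh patches by their screen-space distance from the estimated gaze point. The distance thresholds and level names below are arbitrary placeholders, not values from any real renderer.

```python
# Toy sketch: assign a level of detail to each mesh patch based on its
# screen-space distance from the estimated gaze point. Thresholds and
# LOD names are arbitrary placeholders.
import math

def lod_for_patch(patch_center_px, gaze_px, screen_diag_px):
    dist = math.dist(patch_center_px, gaze_px) / screen_diag_px  # normalised 0..~1
    if dist < 0.1:
        return "full"        # in focus: render original geometry
    elif dist < 0.3:
        return "half"        # near periphery: roughly half the polygons
    else:
        return "coarse"      # far periphery: heavily simplified proxy

# e.g. for a 1920x1080 screen with the user looking near the centre:
diag = math.hypot(1920, 1080)
print(lod_for_patch((1000, 560), (960, 540), diag))  # -> "full"
```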

Of course, one factor that prevents this from happening is that, to my knowledge, current APIs do not provide a way to access both cameras at the same time, but perhaps a future version would allow this.

Generally speaking, this type of optimization could be as useful for games on consoles and PCs as it is for augmented reality, and possibly even more feasible in the near term on the former, say, using a webcam as the input for eye-tracking-based focal-point detection.

However, it is rather likely that with 3D models in AR there are areas where viewers generally focus more and areas where they focus less. Meaning, if we could do some user-based analysis of interactions with such models, combined with focal-point detection, we could probably come up with "hot spots" on the model, i.e. areas where users generally focus more than others. We could then use the gradually adjusted simplification I talked about above to create a version of the model that is more detailed in the areas where users generally focus and skips a few details where they tend not to. By doing that, we should be able to reduce the overall number of polygons; not as much as with real-time focal-point tracking and adjustment, but perhaps better than rendering a highly detailed model in which some parts may not draw the user's attention yet still cost compute power.
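A crude sketch of that offline "hot spot" idea follows. The log format (one region id per recorded gaze sample), the attention threshold, and the keep ratios are all assumptions made purely to show the shape of the computation.

```python
# Sketch: build per-region "attention" scores from logged gaze hits on a
# model and turn them into detail levels for a pre-baked simplification.
# Region ids, thresholds and the log format are invented for illustration.
from collections import Counter

def detail_levels(gaze_hits, keep_ratio_hot=1.0, keep_ratio_cold=0.3):
    """gaze_hits: iterable of region ids, one per recorded gaze sample."""
    counts = Counter(gaze_hits)
    total = sum(counts.values())
    levels = {}
    for region, n in counts.items():
        share = n / total
        # Regions drawing a large share of attention keep full detail;
        # rarely viewed regions get aggressively simplified.
        levels[region] = keep_ratio_hot if share > 0.25 else keep_ratio_cold
    return levels

print(detail_levels(["face", "face", "face", "hands", "feet", "face"]))
# -> {'face': 1.0, 'hands': 0.3, 'feet': 0.3}
```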

Monday, February 4, 2013

Superman isn't the only one who can See Through Walls - Improving Vehicle Passenger Safety with Augmented Reality

For us ordinary humans who do not possess the power to see through walls, there's technology. For example, MIT Lincoln Lab's project for seeing through walls using microwave radio frequencies.


Or a more passive "see through walls" radar system, developed by University College London, which takes advantage of wireless routers. When the radio waves emitted by WiFi routers bounce off moving objects, their frequency gets slightly shifted by the Doppler effect. A sensitive receiver can pick up on those frequency shifts and locate the moving objects.
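For a rough sense of the scale involved (this back-of-the-envelope figure is mine, not from the UCL work): for a reflector moving at speed v toward the transmitter/receiver, the two-way Doppler shift is approximately

```latex
% Approximate two-way Doppler shift for a reflector moving at speed v toward
% the WiFi transmitter/receiver (f is the carrier frequency, c the speed of
% light). At 2.4 GHz, a person walking at about 1.5 m/s shifts the signal by
% only a couple of tens of hertz, which is why the receiver has to be so
% sensitive.
\Delta f \approx \frac{2 v f}{c}
```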


However, one might not need to see through walls to improve passenger safety with Augmented Reality. One might only need to see around them, or rather, around corners.

In many situations where line-of-sight is obstructed by buildings, approaching an intersection or turning into one may be dangerous. Crossing vehicles, pedestrians, or cyclists might surprise a driver whose braking distance might carry them too far into crossing traffic.


Some intersections are equipped with convex mirrors for this purpose.



However, with such mirrors the distance to an object is harder for a driver to estimate, as is its speed. They also require the driver to shift their focus to the mirror rather than the road, which can create other problems with hazards that are directly in the driver's line-of-sight.



A system developed by Carnegie Mellon University uses Augmented Reality techniques to overlay live footage from cameras positioned on the walls of buildings at a problematic intersection onto live footage taken from the driver's point-of-view, combining them in real-time so that both appear to be taken from the driver's point-of-view.


With networks of CCTV cameras installed in many cities nowadays, it should be possible, according to the researchers, to integrate such sources of live footage into the system.

The idea has been explored before by researchers from Japan, in a system that integrates footage from surveillance cameras to allow the user of the device to see obscured and hidden areas, essentially "through walls" and other obstacles.

The disadvantage of the approach is that it takes a lot of processing power to get right, and depending on where the CCTV cameras are positioned it may be hard or even impossible to overlay the image without distortion. Therefore, some researchers (also from Japan) have explored the option of using the data from CCTV cameras but, instead of overlaying the image as if the driver could "see through walls", using an abstract object (such as a circle) to mark an object approaching from an obscured area. This approach may be more practical to implement, but its advantages over the "see-through" effect are yet to be fully studied. Although the processing power needed may be reduced and the accuracy increased (due to the simplified problem), it remains to be seen whether drivers can respond to the information in real-time as effectively as with the "see-through" approach.
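To illustrate the "abstract marker" variant in the roughest terms (everything below, from the camera matrices to the detection format, is a made-up placeholder rather than anything from the cited work), one could project the estimated world position of an object detected by a roadside camera into the driver's view and draw a simple warning circle there, instead of attempting a full see-through overlay:

```python
# Sketch of the "abstract marker" approach: take the estimated 3D position
# of an object detected by a roadside camera and draw a circle at the
# corresponding spot in the driver's view. The intrinsics/extrinsics are
# placeholders; a real system would obtain them by calibration.
import cv2
import numpy as np

K = np.array([[800.0,   0.0, 640.0],   # driver-camera intrinsics (placeholder)
              [  0.0, 800.0, 360.0],
              [  0.0,   0.0,   1.0]])
R = np.eye(3)                          # driver-camera orientation in the world frame
t = np.zeros((3, 1))                   # placeholder: camera at the world origin

def mark_hidden_object(driver_frame, object_world_xyz):
    """Project a world-space point into the driver's view and circle it."""
    p_cam = R @ np.asarray(object_world_xyz, dtype=float).reshape(3, 1) + t
    if p_cam[2, 0] <= 0:               # behind the driver, nothing to draw
        return driver_frame
    uvw = K @ p_cam
    u, v = int(uvw[0, 0] / uvw[2, 0]), int(uvw[1, 0] / uvw[2, 0])
    cv2.circle(driver_frame, (u, v), radius=30, color=(0, 0, 255), thickness=4)
    return driver_frame
```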


I would imagine the best application for this being integrated into a vehicle's heads-up display, notifying the driver of a possible collision. For safety purposes this would have to be instantaneous (or "real-time"). Take a look at the linked paper on a practical application of this approach; the intent there is to make it simple enough to run in real-time.

Current pre-collision detection systems use varying methods for detecting a potential collision with objects that are in the driver's line-of-sight (e.g. radar or real-time video analysis).
As an extension to the model of pre-collision warning systems, it would be great if information gathered from cameras or sensors at problematic junctions presented itself in much the same way on the driver's windshield (perhaps accompanied by an audible signal) to warn the driver of a potential collision, or for the car to take action and brake in emergency situations.
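Sketching that warning decision in the simplest possible terms: given the crossing object's distance to the conflict point and its speed (as estimated by the roadside sensors), plus our own, warn if the estimated arrival times overlap. The 1.5-second margin and the function shape are illustrative assumptions, not values from any real pre-collision system.

```python
# Minimal sketch of the warning decision: compare the estimated times at
# which our car and the hidden object reach the conflict point, and warn
# if they are close. The margin is an arbitrary illustrative value.
def should_warn(own_dist_m, own_speed_mps, obj_dist_m, obj_speed_mps, margin_s=1.5):
    if own_speed_mps <= 0 or obj_speed_mps <= 0:
        return False
    own_eta = own_dist_m / own_speed_mps       # seconds until we reach the conflict point
    obj_eta = obj_dist_m / obj_speed_mps       # seconds until the hidden object does
    return abs(own_eta - obj_eta) < margin_s   # likely to occupy it at the same time

# e.g. we are 40 m out at 14 m/s, a hidden cyclist is 20 m out at 6 m/s:
print(should_warn(40, 14, 20, 6))  # -> True (ETAs of roughly 2.9 s and 3.3 s)
```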
Another interesting extension of this model is for fully autonomous cars, such as Google's Driverless Car. If you could detect approaching objects around corners, it would give the car more time to react to hazardous situations.