Augmented reality has entered the area of interest of consumers and, of course, of programmers too, along with the development of processors and graphics chips in mobile devices. However, one of the first devices built around the idea behind this technology was the Sensorama, created by Morton Heilig over 40 years ago. That device worked on similar principles, but in a more "rudimentary" manner of implementation. What made augmented reality famous was the appearance of the well-known Google Glass, and what managed to push the barriers further was the device patented by Microsoft, the Kinect, together with virtual reality glasses. I will not focus on these topics, as they belong to a different category, which I would call "still experimental". Nevertheless, these technological "pushes" made the appearance of Augmented Reality (AR) possible on mobile devices as well. Nowadays, even a novice programmer can build such an application with the help of some powerful SDKs made available to anyone.
AR is a way of augmenting physical elements by superposing digital content on them. On mobile devices, applications make use of the device's various sensors, such as the GPS, the video camera or the microphone. The industry most "affected" by this trend is gaming, followed closely by retail, but more and more domains are finding uses for augmented reality. Whether they are e-learning applications that can identify texts, logos or other graphical elements, or applications that give you information when you simply point the camera at a historical monument, they prove that this technology is already beginning to take shape.
AR creates a connection between the user, the environment and the virtual world. The core AR technique is that of attaching 2D or 3D imagery to real elements by means of so-called "markers". An example of a visual marker is a 2D bar code. In addition, AR makes use of numerous sensors, such as motion and tracking sensors, image recognition or gesture analysis and, most of the time, the GPS.
In order for the application to know exactly where you are and what you are looking at (your location and orientation in the room), a calibrated video camera is needed. The system through which the location and its relative orientation are calculated is called tracking, and it is one of the foundations of augmented reality. However, in order to correctly transpose a virtual object into reality, something more is required: a marker. Its role is to define the size of the virtual object as well as to establish the orientation of the video camera. A good marker is one that is easy to detect in any circumstances, such as markers based on brightness differences rather than on color variations, which can become difficult to interpret because of changing light. Many marker systems therefore rely on black-and-white squares in order to make a clear distinction between markers and non-markers (a minimal sketch of this brightness-based idea follows the list below). Markers can be of several types:
template markers - where the match is made with the help of a black-and-white template. It is advisable to use a clearly defined image, framed by a border.
bar codes - formed, in most cases, of black-and-white cells framed by a border, or accompanied by some graphical marks.
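To illustrate why brightness-based markers are robust, here is a minimal C# sketch (hypothetical code, not taken from any particular SDK) that binarizes a grayscale camera frame with a fixed threshold, the usual first step before searching for the black border of a template marker:

public static class MarkerPreprocessor {

    // Marker systems based on brightness differences only need to tell
    // "dark" pixels from "bright" ones, which is far more stable under
    // changing light than matching colors.
    public static bool[,] Binarize(byte[,] gray, byte threshold) {
        int h = gray.GetLength(0), w = gray.GetLength(1);
        var dark = new bool[h, w];
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                dark[y, x] = gray[y, x] < threshold; // true = candidate marker pixel
        return dark;
    }
}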
Another way of tracking is model-based. This system consists of comparing a digital model to a real object within a scene. The concept is based on the sequential analysis of a visual scene and on providing conceptual descriptions of the events occurring inside it. To better understand this system, I suggest the following scenario: a street where cars pass every day, with a video camera above it. First of all, the static elements must be separated from the dynamic ones; in other words, motion segmentation (a sketch of this step is shown below). Next come the creation of 3D geometric models to superimpose on as many car categories as possible, and of a movement pattern in contrast to the static road. Thus, we can create a scene where the cars are taken out of context and become the object of focus.
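To make the motion-segmentation step concrete, the following C# sketch (again purely illustrative, assuming grayscale frames of equal size) marks as "moving" every pixel that differs noticeably from a reference frame of the static background; real systems use more robust background models, but the idea is the same:

using System;

public static class MotionSegmenter {

    // Compare the current frame against a static background frame and
    // flag the pixels whose brightness changed by more than a tolerance.
    public static bool[,] SegmentMotion(byte[,] background, byte[,] frame, int tolerance) {
        int h = frame.GetLength(0), w = frame.GetLength(1);
        var moving = new bool[h, w];
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                moving[y, x] = Math.Abs(frame[y, x] - background[y, x]) > tolerance;
        return moving;
    }
}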
There are already several libraries on the market that help programmers invest their time in the product concept and the software idea rather than in the algorithms needed to create markers and to use the various sensors of a mobile device. Most of these frameworks are cross-platform, meaning that they can be used on several devices and systems. Of all of them, three SDKs have drawn my attention and are worth mentioning.
Qualcomm's platform offers wide support for different systems, thus providing the possibility of writing a native application and making it available on a wide range of devices. It uses a technology based on Computer Vision for perceiving and tracking planar images (Image Targets) and simple 3D objects, such as cuboids or spheres, in real time. Among the advantages, we should mention that it is a free library which offers support for iOS, Android and Unity 3D. 3D objects can also be created by means of code (a short sketch follows); it supports multi-target tracking, extended tracking (when the marker is no longer present in the shot) and face tracking and, last but not least, it works very well with the NinivehGL graphics engine. Moreover, its tracking is much more stable than that of the other platforms. Among the disadvantages: it does not have a graphical interface, developing an application is harder until you get accustomed to the platform, and you have to write separate code for each system (though this can be solved by integrating it with Unity 3D).
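As a taste of the "created by means of code" point, here is how a simple 3D object can be spawned at runtime in Unity, with which Vuforia integrates; this uses only standard Unity API and is meant as a sketch:

using UnityEngine;

public class RuntimeObjectSpawner : MonoBehaviour {

    void Start () {
        // Create a sphere primitive entirely from code, with no imported asset.
        GameObject sphere = GameObject.CreatePrimitive(PrimitiveType.Sphere);
        sphere.transform.position = new Vector3(0f, 1f, 0f);
        sphere.transform.localScale = Vector3.one * 0.5f; // half-size sphere
    }
}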
Total Immersion's package has wide support for most devices. It has a rather good graphical interface in which you can create the entire scenario. The programming is done in Lua, and the Android and iPhone libraries come precompiled, so applications built in D'Fusion are independent of the operating system. It offers support for Unity 3D and is compatible with files from Maya or Blender. The D'Fusion Studio development platform can be downloaded for free. D'Fusion is mostly oriented towards retail, providing many tools along this line.
Another fashionable and very easy to use platform is Metaio. Just like the SDKs mentioned above, it provides support for most of the known tracking methods: markers, 3D models, image targets etc. Major companies such as Ikea, Lego and Audi have turned to this platform to develop successful applications. But Metaio does not offer "Code Once" tools, so it is necessary to program separately for iOS and Android. Metaio shows a lot of potential, but the fact that you have to pay to use the framework, together with its rather poorly written documentation, keeps many potential programmers at a distance.
Unity 3D is an extremely powerful 3D engine as well as a very user-friendly development environment for interactive applications. It has the advantage of being easy to use both by people without solid programming knowledge and by professionals. Another benefit is that Unity Technologies offers developers two versions: a free one and a paid Pro version. The Pro version offers more features, some of them for the amount of $1,500. However, this price is completely justified if we consider that the Unity publishing license is very permissive. For a beginner, the free version should be enough. A short comparison of the two versions can be found at http://unity3d.com/unity/licenses, which is also where you can download the free version.
The engine uses three programming languages: C#, Boo and Unity JavaScript, and it can be used to develop applications for most operating systems, including mobile ones. In addition, it offers the opportunity to work directly in the 3D environment, suitable for creating game levels, menus and animations, and for developing scripts and attaching them to objects. All of this is available within a few clicks, the graphical interface being extremely easy to learn. A Unity project is simply a folder which contains every resource belonging to the game or interactive application.
The assets represent every resource that the application uses. Under the name of "Assets" we therefore include the 3D models, materials, textures, audio resources, scripts and fonts. Apart from a few simple objects considered primitives, such as cubes and spheres, Unity cannot create these assets itself. Instead, they have to be created externally, using 3D modelling applications and graphical painting tools, and then imported into Unity. This is very easy to achieve, the import process being both robust and smart. Unity accepts all popular file formats, including 3D Studio Max, Blender, Maya and FilmBox, preserving materials, textures and rigging.
The scenes are the places where objects from the assets are placed and arranged in order to create game screens. The Hierarchy panel shows the content of the current scene in a tree format.
The scripts are known as behaviours. They handle the manipulation of resources and the creation of interactivity between them. They can be reused across several objects, and attaching them to a resource is extremely simple. Several scripts can also be added to the same game object.
Example (C#):
using UnityEngine;
using System.Collections;

public class PlayerScript : MonoBehaviour {

    // Use this for initialization
    void Start () {
    }

    // Update is called once per frame
    void Update () {
    }
}
Note: The name of the class must be the same as the name of the file in which it was created.
All the scripts attached to an object contain the Start() and Update() methods. The Start() method is called only once, when the object is created, whereas the Update() method is called once per frame.
void Update () {
    // Read the input axes (by default mapped to the arrow keys / WASD)
    float horizontal = Input.GetAxis("Horizontal");
    float vertical = Input.GetAxis("Vertical");
    // Move the object accordingly; for frame-rate independent movement
    // you would typically scale these values by Time.deltaTime.
    transform.Translate(horizontal, vertical, 0);
}
Now that we have created the script, it should be assigned to an asset. This can be done by "drag-and-drop" onto the game object. With the script assigned, the game can be run.
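If you prefer code over drag-and-drop, Unity's standard API can attach the same behaviour at runtime, from within any other script on the object:

// Attach PlayerScript to an existing game object from another script.
gameObject.AddComponent<PlayerScript>();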
Unity can publish to Windows, OS X and the web through the Web Player plug-in. Web Player is a browser plug-in which works with all well-known browsers and offers the same performance as the stand-alone desktop application.
With Unity Pro, you can publish for a wider range of platforms, including: Android, iOS, Wii, Xbox One, Xbox 360, PS3, PS4, Windows Store, Windows Phone, Flash.
Vuforia incorporates several technologies in its SDK which help developers. Among them is Computer Vision, the technology through which developers can position and orient virtual objects, such as 3D models, in relation to real-world images viewed through the camera of a mobile device. The virtual object follows the position and orientation of the image in real time, so that the user's perspective on the object corresponds to the perspective on the target image. The virtual object therefore appears to be part of the real-world scene. Vuforia allows several variations when implementing augmented reality: the model over which the virtual world or virtual object is overlaid is an image, a unique target called an Image Target, which can even be a marker offered by Qualcomm. Vuforia also offers the possibility of multiple targets.
The SDK supports a variety of target types, including "markerless" targets, multi-target 3D configurations, virtual buttons using "Occlusion Detection" and the possibility to create and reconfigure classes of targets at runtime. Vuforia offers APIs in C++, Java, Objective-C and, through the extension to the Unity engine, in .NET languages. In this way, the SDK supports both development in the native Android and iOS environments and the development of AR applications in Unity, which are then easy to port to several platforms, including Android and iOS.
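To give a taste of the .NET API exposed through the Unity extension, the sketch below reacts when an Image Target is found or lost. It follows the pattern of Vuforia's sample trackable event handler, but type names and namespaces differ between SDK versions, so treat it as an outline under those assumptions rather than copy-paste code:

using UnityEngine;
using Vuforia; // namespace used by later Vuforia Unity packages (assumption)

public class TargetFoundHandler : MonoBehaviour, ITrackableEventHandler {

    private TrackableBehaviour trackable;

    void Start () {
        // Register to be notified when the tracking state of this target changes.
        trackable = GetComponent<TrackableBehaviour>();
        if (trackable != null)
            trackable.RegisterTrackableEventHandler(this);
    }

    // Called by Vuforia on every tracking state change.
    public void OnTrackableStateChanged(TrackableBehaviour.Status previousStatus,
                                        TrackableBehaviour.Status newStatus) {
        if (newStatus == TrackableBehaviour.Status.DETECTED ||
            newStatus == TrackableBehaviour.Status.TRACKED) {
            Debug.Log("Image Target found: show the 3D model");
        } else {
            Debug.Log("Image Target lost: hide the 3D model");
        }
    }
}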
In the example described below, we will use a free Vuforia marker. The 3D object is overlaid on the image; it was built with a set of Blender and Photoshop tools. Through sophisticated Computer Vision algorithms, the features of the image are detected and tracked. The target is recognized through successive comparisons of these features and characteristics against those of the image kept in a database. From the moment the target is detected, it will be tracked for as long as it remains in the view of the camera. Creating targets requires logging into your user account on the Vuforia site. Targets are created from .jpg or .png (RGB or greyscale) files, and their characteristics are kept in a database, organised into data sets.
We are going to briefly describe, in the following lines, all the steps (some of them can be, of course, omitted through alternative approaches) in the development of an AR application.
We assume you have installed compatible versions of Unity and the Vuforia extension for Unity. In addition, you need a web camera or the camera of a smartphone or tablet. You should also print the target image on an A4 sheet of paper, once you have created it.
After installing the tools, you will have to create an account on the Vuforia official site: https://developer.vuforia.com/user .
The next step is to create the target (Image Target). Navigate to the Target Manager web application on the developer portal. This application allows the creation of a target database, so that targets can be used on particular devices as well as in the cloud. Create a database, give it a name and assign a target to it. After the upload of the target is complete, Vuforia runs the necessary checks and processing. You can then download the image target as a file with the .unitypackage extension.
Start Unity, create a new project and import the Vuforia .unitypackage files (the SDK and the image target). Delete the Main Camera from the scene hierarchy. Now import the 3D model that you wish to place over the image target. In the Project window, open the Assets/Qualcomm Augmented Reality/Prefabs folder. Place the ARCamera object in the scene. With this object selected, look in the Inspector and make sure the "Load Data Set" option for your database (Image Target) is set to "Active". From the same Prefabs folder, bring the image target into the scene. With the image selected, use the Inspector to set its "Data Set" to your image target. The previously created image should now be visible in the Unity editor. By "drag-and-drop", add the model under the image target object in the Unity hierarchy. Use the Inspector values and the move tools on the x, y and z axes to fix the 3D object right in the center of the target. From now on, everything depends on your creativity. One suggestion we can make is to place a light source (a directional light) in Unity to shed light on the model. The example can be run with the "Play" button: Vuforia and Unity will detect the web camera, Vuforia will apply the detection and tracking algorithms and will place the object on the printed image. The application can then be ported, with the help of Unity's built-in tools, to run on a mobile device.
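The suggested directional light can also be added from a script instead of the editor menu, using only standard Unity API (the rotation values here are arbitrary):

using UnityEngine;

public class SceneLightSetup : MonoBehaviour {

    void Start () {
        // Create a directional light to shed light on the model over the target.
        GameObject lightObj = new GameObject("Directional Light");
        Light light = lightObj.AddComponent<Light>();
        light.type = LightType.Directional;
        lightObj.transform.rotation = Quaternion.Euler(50f, -30f, 0f); // arbitrary angle
    }
}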
I have tried in these lines to draw an overall picture of this emergent technology. It is not as if augmented reality and its implications in everyday life have only just been discovered. But overcoming the problems that emerge in the development of this technology (and, implicitly, its adoption in more areas) takes time. These problems originate in several domains, among them the sociological one.
AR applications are useful in situations where human perception can be enhanced and where the use of virtual objects in our everyday life can significantly improve our living. Such applications can bring us a new way to see and interact with the real environment and the virtual one at the same time: an improved reality in our own pocket.