Human-Computer Interaction: Real-Time Web-Chat Using LeapMotion and SignalR

George Rus
Software Developer @ Yardi România

Daniela Fati
Software Developer @ Yardi România

In a world of technology and smart devices, communication is a key factor, and new ways of human-computer interaction should be considered alongside the traditional ones.

The aim of this article is to illustrate alternative ways of interacting with the computer using currently available and affordable technologies. The proposal is exemplified by a real-time web chat application built with ASP.NET MVC and SignalR, which receives its input from a virtual keyboard controlled by a Leap Motion Controller.

Technologies

To implement the real-time web-chat application, in which gesture input from the Leap Motion Controller is translated into actions on the virtual keyboard, the following technologies were used:

ASP.NET MVC 5 - http://www.asp.net/mvc/mvc5
SignalR - http://www.asp.net/signalr
LeapJS - https://github.com/leapmotion/leapjs
THREE.js - http://threejs.org/
Tween.js - https://github.com/tweenjs/tween.js/

The web-chat functionality is developed using ASP.NET MVC and SignalR. The virtual keyboard is implemented with THREE.js and its CSS3DRenderer. The gesture input and feedback signals of the Leap Motion Controller are interpreted using the LeapJS API and the Leap-widgets.js extension.

Real-Time Communication - SignalR

In the past, web pages were static: no one expected them to refresh themselves, and each user action required all the content to be resent from the server. Slowly the web expanded and the road to dynamic pages began to take shape. A first attempt came with the introduction of the iframe tag and the ActiveX control. Later, with the introduction of Ajax and jQuery, dynamic web pages took over from the old static HTML pages (http://www.evolutionoftheweb.com/).

When we surf the internet we use a web browser to access and display a web page. The browser has its own rendering engine, which interprets and coordinates the various tags, elements and resources of the page so that it can be displayed to the user. A few years ago we used only our PC to surf the internet; there were no smartphones, tablets, smart TVs, watches or other smart and highly capable gadgets.

Nowadays almost all our gadgets run a browser that allows us to surf the internet, and we want the web applications we develop to be accessible on all of these devices. The wide variety of browsers and rendering engines means that our content might look good on some and bad on others. We also want applications such as a monitoring system or a weather application to display new information in real time (as soon as it becomes available) without us having to request it. If we connect from our devices to an online social media application, we expect any update we make from our mobile device to be instantly visible on our tablet, smart TV and so on, as well as on our friends' devices, without anyone having to explicitly request these updates.

So how can we build real-time web applications that we can access on all the devices that are out there? SignalR comes in to help us decide on how we can transport data between the server and the client in a fast and reliable way for real-time communication.

SignalR is an open source library supported by Microsoft. It offers bidirectional (full duplex) communication between the server and the client. As opposed to the classic client-server model, in which only the client initiates requests, the client and the server share an open channel, so the server can also contact the client. SignalR can also deliver content asynchronously, works across a wide range of browsers, and has a smart way of deciding which transport to use for passing messages.

The transport decision is based on the browser user agent and on other client and server configuration. SignalR tries several transport techniques (WebSocket, Server-Sent Events, Forever Frame and long polling) and falls back to the next one if a transport is not supported.

The connection is started as standard HTTP and, if possible, it is promoted to WebSocket, which is the ideal transport from SignalR's perspective. The WebSocket protocol is rather new (standardized in 2011) and older devices have no support for it. With SignalR we do not have to write special code for older clients, because it decides by itself which other transport it can use for its connections. If WebSocket is not available, it chooses between Server-Sent Events (not supported by Internet Explorer), Forever Frame (only for Internet Explorer) and Ajax long polling.
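
The fallback order described above can be pictured as a simple priority list. The following lines are only an illustrative sketch, not SignalR's actual negotiation code: the chooseTransport function and the supported capability object are hypothetical, while the transport names mirror the ones used by SignalR.

```javascript
// Preferred order, best first, as described in the text.
const TRANSPORT_ORDER = ["webSockets", "serverSentEvents", "foreverFrame", "longPolling"];

// Hypothetical helper: returns the first transport marked as supported.
// `supported` maps transport names to booleans.
function chooseTransport(supported) {
    for (const name of TRANSPORT_ORDER) {
        if (supported[name]) {
            return name;
        }
    }
    // Long polling only needs plain HTTP, so it serves as the last resort.
    return "longPolling";
}

// A browser without WebSocket support falls back to the next option.
console.log(chooseTransport({ webSockets: false, serverSentEvents: true }));
```

A client that reports no capabilities at all ends up on long polling, which matches the role of that transport as the final fallback.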

SignalR provides two communication models: Persistent Connections and Hubs. A persistent connection is a configurable, lower-level API, similar to working with web sockets directly. If, for example, our application has special requirements and we need control over what happens when a connection is opened, we can override the default behavior.

Hubs are built on top of the Connection API. With the hubs offered by SignalR we can call methods on the server from the client, and vice versa, directly. Connection management is handled entirely by SignalR, so with Hubs we let SignalR do the dirty work.

Server side:

using Microsoft.AspNet.SignalR;

// DemoHub derives from the Hub class. This allows us to create public
// methods that can be called from script in web pages.
public class DemoHub : Hub
{
    // Method that can be called from the client.
    public void MessageToServer(string sampleText)
    {
        // Clients.All gives access to all the clients connected to the hub,
        // so this call executes the JavaScript function messageToClient on
        // every one of them. The function has to be defined on the client.
        Clients.All.messageToClient("message received");
    }
}

Client side:

// Reference to a hub proxy.
var demo = jQuery.connection.demoHub;

// JavaScript function on the client that can be called from the hub.
demo.client.messageToClient = function (message) {
    alert(message);
};

// Open a connection to the hub.
jQuery.connection.hub.start().done(function () {
    // Call the MessageToServer method on the hub from the client.
    demo.server.messageToServer("Sending message");
});

LeapMotion Controller

The Leap Motion Controller is a device that captures and interprets signals from optical and infrared sensors in order to identify hands, fingers and pointing tools. This tracking device is a good choice for introduction to motion control due to the fact that it has multi-language API support. It also offers a virtual space mapping along with real-time feedback, gesture training and recognition. The next image presents the device along with its coordinate system.

Figure 1. Leap Motion Controller coordinate system. Source: https://developer.leapmotion.com/documentation/cpp/devguide/Leap_Overview.html

The field of view of the optical sensors is about 150 degrees and the working / detection range extends from 25 to 600 millimeters above the device. Measurement units of the device are represented by millimeters for distance, microseconds for time, millimeters per second for speed and radians for angles.
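
Since the device reports values in its own units, small conversion helpers are often convenient on the client. The sketch below is only an illustration: the helper names are hypothetical, the range limits are the ones quoted above, and the height is assumed to be measured along the vertical (y) axis of the device.

```javascript
// Detection range above the device, in millimeters (values from the text).
const MIN_HEIGHT_MM = 25;
const MAX_HEIGHT_MM = 600;

// Hypothetical helpers for converting the controller's native units.
function microsecondsToSeconds(microseconds) {
    return microseconds / 1e6;
}

function radiansToDegrees(radians) {
    return radians * 180 / Math.PI;
}

// True when a height (in millimeters above the device) lies inside the range.
function isInDetectionRange(heightMm) {
    return heightMm >= MIN_HEIGHT_MM && heightMm <= MAX_HEIGHT_MM;
}

console.log(microsecondsToSeconds(2500000));
console.log(radiansToDegrees(Math.PI / 2));
console.log(isInDetectionRange(180));
```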

The entity model of the data captured by the controller is denoted by a Frame, containing information from the working space, such as motion and gestures, hands, fingers and/or pointing tools. Using this information gestures such as circle, key tap, screen tap, or swipe may be identified.

The integration of the Leap Motion Controller in the application will be based on the WebSocket interface provided by the system architecture. Data captured by the device is transmitted in a JSON structure through a WebSocket. In this way, client side JavaScript libraries can interpret and parse the messages. The interpretation of the raw data coming from the WebSocket server is done in the browser by the leap.js framework.
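
To make this data flow more concrete, the following sketch parses a heavily simplified frame message of the kind that travels over the WebSocket. The sample payload and the summarizeFrame helper are assumptions made for illustration only; real frames carry many more fields, and in the application this parsing is done by LeapJS.

```javascript
// Simplified sample payload (assumed shape; positions are in millimeters).
const sampleMessage = JSON.stringify({
    id: 1024,
    timestamp: 4729292670,
    hands: [{ id: 7, palmPosition: [12.5, 180.0, -30.2] }],
    gestures: [{ type: "keyTap", state: "stop", position: [10.1, 150.3, -25.0] }]
});

// Hypothetical helper: extracts the parts the virtual keyboard cares about.
function summarizeFrame(message) {
    const frame = JSON.parse(message);
    const hands = frame.hands || [];
    const gestures = frame.gestures || [];
    return {
        frameId: frame.id,
        palmPositions: hands.map(function (hand) { return hand.palmPosition; }),
        keyTaps: gestures.filter(function (gesture) { return gesture.type === "keyTap"; })
    };
}

const summary = summarizeFrame(sampleMessage);
console.log(summary.frameId, summary.palmPositions.length, summary.keyTaps.length);
```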

Virtual Keyboard

The visual element that mediates the communication with the computer is the virtual keyboard. This keyboard is implemented using client-side JavaScript technologies: THREE.js and Tween.js, with CSS3DRenderer as the rendering engine.

Using THREE.js, a 3D scene is created to present the elements of the keyboard. Animations are easily integrated with the help of the Tween.js framework. CSS3DRenderer was chosen for rendering because it is fast, lightweight, works on mobile browsers as well, and is mainly based on CSS.

The following JavaScript lines show the minimal code needed to create a THREE.CSS3DObject denoting a key and to initialize the scene.

function CreateButton(key, keyDescription) {
    // Outer element representing the key itself.
    var keyElement = document.createElement('div');
    keyElement.className = 'key';
    keyElement.id = key;

    // Letter shown on the key.
    var letter = document.createElement('div');
    letter.className = 'letter';
    letter.textContent = key;
    keyElement.appendChild(letter);

    // Short description shown below the letter.
    var description = document.createElement('div');
    description.className = 'description';
    description.innerHTML = keyDescription;
    keyElement.appendChild(description);

    // Wrap the HTML element so it can be placed in the 3D scene.
    var css3dObject = new THREE.CSS3DObject(keyElement);
    css3dObject.name = key;

    return css3dObject;
}

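function InitScene() {
    var camera = new THREE.PerspectiveCamera(45, window.innerWidth / window.innerHeight, 1, 1000);
    camera.position.set(0, 0, 200);

    var scene = new THREE.Scene();

    // Create a single key and place it in the scene.
    var key = CreateButton('w', 'w key');
    key.position.set(20, 20, 20);
    scene.add(key);

    // The CSS3DRenderer draws the scene using CSS 3D transforms.
    var css3dRenderer = new THREE.CSS3DRenderer();
    css3dRenderer.setSize(window.innerWidth, window.innerHeight);
    document.getElementById('scene').appendChild(css3dRenderer.domElement);

    // Camera controls, limited to a sensible zoom range.
    var trackballControls = new THREE.TrackballControls(camera, css3dRenderer.domElement);
    trackballControls.minDistance = 50;
    trackballControls.maxDistance = 1000;

    // Draw the scene once; an animation loop would repeat this call
    // after trackballControls.update().
    css3dRenderer.render(scene, camera);
}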
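
A full keyboard needs one such object per key, each at its own position. The sketch below computes centered positions for rows of keys. The layoutKeys helper, the key size and the gap are hypothetical values chosen for illustration; the resulting coordinates could be passed to position.set on the objects returned by CreateButton.

```javascript
// Hypothetical helper: computes x/y positions for rows of keys so that
// every row is centered horizontally around x = 0.
function layoutKeys(rows, keySize, gap) {
    const step = keySize + gap;
    const positions = [];
    rows.forEach(function (row, rowIndex) {
        const rowWidth = (row.length - 1) * step;
        row.split("").forEach(function (key, columnIndex) {
            positions.push({
                key: key,
                x: columnIndex * step - rowWidth / 2,
                // Rows grow downwards, so later rows get smaller y values.
                y: 0 - rowIndex * step
            });
        });
    });
    return positions;
}

const layout = layoutKeys(["qwertyuiop", "asdfghjkl", "zxcvbnm"], 20, 4);
console.log(layout.length); // 26 keys in total
console.log(layout[0]);     // position of the "q" key
```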

The following picture illustrates the states of a key: normal, hover and tapped. All these states are realized using CSS rules applied to HTML elements that are wrapped in THREE.CSS3DObjects and rendered into the scene by the CSS3DRenderer.

These CSS classes will be used during the interaction of the keys with the Leap Motion Controller in order to provide visual feedback for the user's actions.

LeapJS and Interaction with the Virtual Keyboard

The interaction between the virtual keyboard and the LeapMotion controller will be realized through the LeapJS framework. The information provided by the framework will be updated in a loop capturing data with a frequency of 60 frames per second.

Because we need information about the position of the hand on the screen and about the gestures made by the user, the controller is initialized with the following lines of JavaScript code:

var leapController = Leap.loop({ enableGestures: true })
                         .use('screenPosition');
leapController.connect();

In order to identify what action is intended by the user, the Frame object provided by LeapJS framework should be considered.

The Frame object provides useful information captured from the detection area: its id, the current frame rate, fingers, gestures, hands, pointables, tools, the interaction box, and a flag indicating the validity of the data provided. A Frame is considered valid when it contains data for all of the detected items. Along with these attributes, the Frame object also provides a set of functions, such as dump(), finger(), hand(), pointable(), rotationAngle(), rotationAxis(), rotationMatrix(), scaleFactor(), tool(), toString() and translation(). These functions are very useful for querying and processing the captured data.

Frame event callbacks are used to handle the information about the recognized gestures and to take the necessary actions. First of all, the position of the hand is mapped onto the screen. This is accomplished using the 'hand' detection event, as illustrated below:

var leapController =
    Leap.loop({ enableGestures: true },
              {
                  hand: function (hand) {
                      var screenPosition = hand.screenPosition(hand.palmPosition);
                      //... screen position related logic implementation
                  }
              })
         .use('screenPosition');

Based on the position of the hand on the screen, the elements of interest are identified, in this case the key that the user intends to tap. As feedback to the user, the CSS classes presented in the previous section are used to illustrate the hover and tapped states.

The tapped state is applied to a key using gesture-related information from the current Frame object. If the gestures collection of the frame contains a keyTap gesture, the key situated at that position changes its state and is added to a collection, which is used to build the message that will be sent to the chat mechanism.
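
The hit testing and message building described above can be sketched without any rendering code. In this illustration the key rectangles, the findKeyAt helper and the MessageBuffer object are hypothetical; in the application the screen position would come from the screenPosition plugin and the tap from a keyTap gesture.

```javascript
// Hypothetical key rectangles in screen coordinates (pixels).
const keys = [
    { key: "h", left: 0, top: 0, width: 50, height: 50 },
    { key: "i", left: 60, top: 0, width: 50, height: 50 }
];

// Returns the key under the given screen position, or null if there is none.
function findKeyAt(keyList, x, y) {
    for (const candidate of keyList) {
        const insideX = x >= candidate.left && x <= candidate.left + candidate.width;
        const insideY = y >= candidate.top && y <= candidate.top + candidate.height;
        if (insideX && insideY) {
            return candidate;
        }
    }
    return null;
}

// Collects tapped keys until the message is handed over to the chat.
function MessageBuffer() {
    this.letters = [];
}

MessageBuffer.prototype.tap = function (keyList, x, y) {
    const hit = findKeyAt(keyList, x, y);
    if (hit !== null) {
        this.letters.push(hit.key);
    }
    return hit;
};

MessageBuffer.prototype.flush = function () {
    const message = this.letters.join("");
    this.letters = [];
    return message;
};

const buffer = new MessageBuffer();
buffer.tap(keys, 10, 10);    // lands on "h"
buffer.tap(keys, 80, 25);    // lands on "i"
buffer.tap(keys, 300, 300);  // misses every key
console.log(buffer.flush()); // prints "hi"
```

The string returned by flush is what would be passed to the hub, for example through a call similar to demo.server.messageToServer in the SignalR section.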

Conclusion

Using ASP.NET MVC and SignalR, a real-time web chat mechanism was implemented. By combining THREE.js with the CSS3DRenderer, a virtual 3D keyboard was realized using CSS-styled HTML elements. The interpretation of user gestures as a form of interaction with the keyboard was accomplished by integrating the Leap Motion Controller into the application through LeapJS and Leap-widgets.js. The result of linking these pieces together is a functional real-time web-chat application whose input is an interactive keyboard that understands the user's gestures.