3D user interaction


In 3D user interaction (3DUI), a human interacts with a computer or other device in a way that involves three-dimensional space. The interaction takes place through interfaces, which act as intermediaries between human and machine.

The 3D space used for interaction can be real physical space, a virtual space simulated by the computer, or a combination of both. When real physical space is used for data input, the user performs actions with an input device that detects, among other things, the relative position and distance of those actions. When it is used for data output, the simulated 3D virtual scene is projected onto the real environment through an output device.

The principles of 3D interaction are applied in a variety of domains such as tourism, art, gaming, simulation, education, information visualization, or scientific visualization.[1]


Research in 3D interaction and 3D display began in the 1960s, pioneered by researchers such as Ivan Sutherland, Fred Brooks, Bob Sproull, Andrew Ortony and Richard Feldman. In 1962, Morton Heilig invented the Sensorama simulator.[2] It provided 3D video feedback, as well as motion and audio feedback, to produce a virtual environment. The next stage of development was Ivan Sutherland’s completion in 1968 of his pioneering work, the Sword of Damocles.[3] He created a head-mounted display that produced a 3D virtual environment by presenting a left and a right still image of that environment.

Limited availability of technology and impractical costs held back the development and application of virtual environments until the 1980s. Since then, further research and technological advances have opened the way to applications in areas such as education, entertainment, and manufacturing.

3D user interfaces

Scheme of 3D User Interaction phases

3D user interfaces are user interfaces in which 3D interaction takes place; that is, the user’s tasks occur directly within a three-dimensional space. The user must communicate commands, requests, questions, intent, and goals to the system, and the system in turn must provide feedback, requests for input, information about its status, and so on.

The user and the system do not share the same language, so interfaces must serve as intermediaries or translators between them to make communication possible.

The way the user transforms perceptions into actions is called the human transfer function, and the way the system transforms signals into display information is called the system transfer function. In practice, 3D user interfaces are physical devices that connect the user and the system with minimal delay. There are two types: 3D user interface output hardware and 3D user interface input hardware.

3D user interface output hardware

These hardware devices, usually called display devices or output devices, present information to one or more users through the human perceptual system. Most of them stimulate the visual, auditory, or haptic senses; in rare cases they can also stimulate the user’s olfactory system.

3D visual displays

These devices are the most popular type; their goal is to present the information produced by the system through the human visual system in three dimensions. The main features that distinguish them are field of regard and field of view, spatial resolution, screen geometry, light transfer mechanism, refresh rate, and ergonomics.

Another way to characterize these devices is by the categories of depth perception cues they use to help the user understand three-dimensional information. The main types of displays used in 3D UIs are monitors, surround-screen displays, workbenches, hemispherical displays, head-mounted displays, arm-mounted displays, and autostereoscopic displays.

3D audio displays

3D audio displays are devices that present information (in this case sound) through the human auditory system. Their objective is to generate and display spatialized 3D sound so that users can apply their psychoacoustic skills to determine the location and direction of the sound. Several localization cues are available: binaural cues, spectral and dynamic cues, head-related transfer functions, reverberation, sound intensity, and vision and environment familiarity.
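One binaural cue, the interaural time difference (ITD), can be illustrated with a short calculation. The sketch below uses Woodworth's spherical-head approximation, which is not taken from this article; the head radius and speed of sound are assumed typical values, and the function name is hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C (assumed)
HEAD_RADIUS = 0.0875    # m, a commonly assumed average head radius


def interaural_time_difference(azimuth_rad, head_radius=HEAD_RADIUS, c=SPEED_OF_SOUND):
    """Woodworth's spherical-head approximation of the ITD, in seconds.

    azimuth_rad: direction of the source, 0 = straight ahead,
    pi/2 = directly to one side. Intended for |azimuth| <= pi/2.
    """
    return (head_radius / c) * (azimuth_rad + math.sin(azimuth_rad))


itd_front = interaural_time_difference(0.0)          # source straight ahead
itd_side = interaural_time_difference(math.pi / 2)   # source directly to one side
```

Under these assumptions a source straight ahead produces no time difference, while a source directly to one side reaches the far ear roughly two thirds of a millisecond later. An audio display can reproduce such a delay between the two channels to suggest direction.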

3D haptic displays

These devices use the sense of touch to simulate physical interaction between the user and a virtual object. There are three types of 3D haptic displays: those that give the user a sense of force, those that simulate the sense of touch, and those that do both. The main features that distinguish these devices are haptic presentation capability, resolution, and ergonomics. The human haptic system relies on two fundamental kinds of cues, tactile and kinesthetic. Tactile cues come from a wide variety of receptors located below the surface of the skin, which provide information about texture, temperature, pressure, and damage. Kinesthetic cues come from receptors in the muscles, joints, and tendons, which provide information about the angle of joints and the stress and length of muscles.

3D user interface input hardware

These hardware devices are called input devices; their aim is to capture and interpret the actions performed by the user. The number of degrees of freedom (DOF) is one of their main features. Classical interface components (such as mice, keyboards, and arguably touchscreens) are often inappropriate for non-2D interaction needs.[1] These systems are also differentiated by how much physical interaction is needed to use them: purely active devices must be manipulated to produce information, whereas purely passive devices need not be. The main categories of these devices are desktop input devices, tracking devices, 3D mice, and brain-computer interfaces.

Desktop input devices

These devices are designed for 3D interaction on a desktop. Many of them were originally designed for traditional two-dimensional interaction, but with an appropriate mapping between the system and the device they can work well in three dimensions. They include keyboards, 2D mice and trackballs, pen-based tablets, and joysticks. Nonetheless, many studies have questioned the appropriateness of desktop interface components for 3D interaction,[1][4][5] though this is still debated.[6][7]

Tracking devices

3D user interaction systems are based primarily on motion tracking technologies, which obtain the necessary information from users by analyzing their movements or gestures.

To be fully developed, a 3D user interaction system needs access to a few basic parameters, which it should know at least partially: the relative position of the user, the absolute position, angular velocity, rotation data, orientation, and height.

These data are collected through spatial tracking systems and sensors in multiple forms, using a variety of techniques. The ideal system for this type of interaction tracks position with six degrees of freedom (6-DOF). Such systems can obtain the absolute 3D position of the user and, in this way, information on all possible three-dimensional angles.
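As an illustration of what a 6-DOF measurement contains, the following minimal sketch stores three translational and three rotational values and uses them to map a point from the tracked device's local frame into world coordinates. The class name, the axis convention (y up) and the rotation order (yaw, then pitch, then roll) are assumptions made for this example, not a standard fixed by any particular tracker.

```python
import math
from dataclasses import dataclass


@dataclass
class Pose6DOF:
    # Three translational degrees of freedom (metres)
    x: float = 0.0
    y: float = 0.0
    z: float = 0.0
    # Three rotational degrees of freedom (radians)
    yaw: float = 0.0    # rotation about the vertical y axis
    pitch: float = 0.0  # rotation about the lateral x axis
    roll: float = 0.0   # rotation about the forward z axis


def _matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rotation_matrix(pose):
    """Build R = R_yaw * R_pitch * R_roll (assumed rotation order)."""
    cy, sy = math.cos(pose.yaw), math.sin(pose.yaw)
    cp, sp = math.cos(pose.pitch), math.sin(pose.pitch)
    cr, sr = math.cos(pose.roll), math.sin(pose.roll)
    r_yaw = [[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]]
    r_pitch = [[1.0, 0.0, 0.0], [0.0, cp, -sp], [0.0, sp, cp]]
    r_roll = [[cr, -sr, 0.0], [sr, cr, 0.0], [0.0, 0.0, 1.0]]
    return _matmul(_matmul(r_yaw, r_pitch), r_roll)


def local_to_world(pose, point):
    """Rotate a device-local point by the pose orientation, then translate it."""
    r = rotation_matrix(pose)
    rotated = [sum(r[i][k] * point[k] for k in range(3)) for i in range(3)]
    return (rotated[0] + pose.x, rotated[1] + pose.y, rotated[2] + pose.z)


# A tracker located at x = 1 m and turned 90 degrees about the vertical axis
tracker = Pose6DOF(x=1.0, yaw=math.pi / 2)
tip = local_to_world(tracker, (0.0, 0.0, 1.0))  # a point 1 m in front of the device
```

With fewer than six values the mapping is incomplete: a device reporting only orientation, for example, cannot say where in the room the point lies.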

These systems can be implemented with various technologies, such as electromagnetic, optical, or ultrasonic tracking, but all share the same main limitation: they need a fixed external reference, whether a base, an array of cameras, or a set of visible markers, so they can only be used in prepared areas.

Inertial tracking systems, which are based on movement, do not require an external reference. They collect data using accelerometers, gyroscopes, or video cameras, in most cases without any mandatory fixed reference. Their main problem is that they cannot obtain the absolute position: because they do not start from a pre-set external reference point, they only ever measure the user's relative position, which causes errors to accumulate as data are sampled.
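The accumulation of error can be seen in a minimal dead-reckoning sketch. It assumes a stationary device whose accelerometer reports a small constant bias; the bias value, sample rate and function name are hypothetical and chosen only for illustration.

```python
def integrate_position(accel_samples, dt):
    """Estimate 1D position by integrating acceleration twice (simple Euler steps)."""
    velocity = 0.0
    position = 0.0
    for a in accel_samples:
        velocity += a * dt         # first integration: acceleration -> velocity
        position += velocity * dt  # second integration: velocity -> position
    return position


BIAS = 0.01  # m/s^2, hypothetical constant sensor bias
DT = 0.01    # s between samples, i.e. a 100 Hz sensor

# The device is at rest, so the true position stays at 0; only the bias is measured.
drift_1s = integrate_position([BIAS] * 100, DT)
drift_10s = integrate_position([BIAS] * 1000, DT)
```

Because the bias is integrated twice, the position error grows roughly with the square of elapsed time: in this sketch it is about 5 mm after one second and about half a metre after ten. Without an external reference to correct it, the estimate keeps drifting.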

The goal of a 3D tracking system is a 6-DOF system that provides absolute positioning and precise movement and orientation data, with very high precision over a large, unrestricted space. A mobile phone is a rough example: it has all the motion capture sensors as well as GPS tracking of latitude, but such systems are not currently accurate enough to capture data with centimeter precision and are therefore unsuitable.

However, several systems come close to these objectives. Their determining feature is that they are self-contained, i.e., all-in-one, and do not require a fixed prior reference. These systems are as follows:

Nintendo Wii Remote ("Wiimote")
Wiimote device

The Wii Remote does not offer 6-DOF tracking because, again, it cannot provide absolute position. It is, however, equipped with a multitude of sensors that turn a 2D device into a capable tool for interaction in 3D environments.

The device has gyroscopes to detect rotation of the user, ADXL330 accelerometers to measure the speed and movement of the hands, optical sensors to determine orientation, and electronic compasses and infrared devices to capture position.

Note that this type of device can be affected by external infrared sources such as light bulbs or candles, which cause errors in position accuracy.

Google Tango Devices
Google's Project Tango tablet, 2014

The Tango Platform is an augmented reality computing platform, developed and authored by the Advanced Technology and Projects (ATAP), a skunkworks division of Google. It uses computer vision and internal sensors (like gyroscopes) to enable mobile devices, such as smartphones and tablets, to detect their position relative to the world around them without using GPS or other external signals. It can therefore be used to provide 6-DOF input which can also be combined with its multi-touch screen.[8] The Google Tango devices can be seen as more integrated solutions than the early prototypes combining spatially-tracked devices with touch-enabled-screens for 3D environments.[9][10][11]

Microsoft Kinect
Kinect Sensor

The Microsoft Kinect offers a different motion capture technology for tracking.

Instead of relying on sensors worn or held by the user, it is based on a structured light scanner housed in a bar, which tracks the entire body by detecting about 20 spatial points. Three degrees of freedom are measured for each point to obtain its position, velocity, and rotation.

Its main advantage is ease of use, since the user does not need to wear or hold any external device. Its main disadvantage is its inability to detect the orientation of the user, which limits certain spatial and guidance functions.

Leap Motion
Leap Motion Controller

The Leap Motion is a hand-tracking system designed for small spaces. It enables a new form of interaction with 3D environments in desktop applications, offering fluid, realistic browsing through three-dimensional environments.

It is a small device that connects to a computer via USB and uses two cameras with infrared LEDs to analyze a hemispherical area of about 1 meter above its surface. It records at around 300 frames per second and sends the information to the computer, where it is processed by the company's dedicated software.

3D Interaction Techniques

3D interaction techniques are the different ways in which the user can interact with the 3D virtual environment to carry out different kinds of tasks. The quality of these techniques has a profound effect on the quality of the entire 3D user interface. They can be classified into three groups: navigation, selection and manipulation, and system control.


Navigation is the task users perform most in large 3D environments. It presents several challenges, such as supporting spatial awareness, providing efficient movement between distant places, and making navigation bearable so that the user can focus on more important tasks. Navigation techniques can be divided into two components: travel and wayfinding.


Travel is a conceptual technique that consists of moving the viewpoint from one location to another. In immersive virtual environments, the orientation of the viewpoint is usually handled by head tracking. There are five types of travel interaction techniques:

  • Physical movement: uses the user's body motion to move through the virtual environment. It is appropriate when an enhanced feeling of being present is required or when physical effort is required from the user.
  • Manual viewpoint manipulation: the user's hand movements determine the displacement in the virtual environment. For example, users may move their hands as if grabbing a virtual rope and pulling themselves up. This technique can be easy to learn and efficient, but can cause fatigue.
  • Steering: the user must constantly indicate where to move. It is a common and efficient technique. One example is gaze-directed steering, where the head orientation determines the direction of travel.
  • Target-based travel: the user specifies a destination point and the system performs the displacement. The travel can be executed by teleportation, where the user is instantly moved to the destination point, or the system can execute a transitional movement to the destination. These techniques are very simple from the user’s point of view, because only the destination has to be indicated.
  • Route planning: the user specifies the path that should be taken through the environment and the system executes the movement. The user may draw a path on a map of the virtual environment to plan a route. This technique lets users control travel while leaving them free to do other tasks during motion.
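Gaze-directed steering, one of the techniques listed above, can be sketched as a per-frame update that moves the viewpoint along the direction the head is facing. The axis convention (y up, yaw of zero looking along +z), the function names and the speed are assumptions made for this example.

```python
import math


def forward_vector(yaw, pitch):
    """Unit view direction for a head orientation given in radians.

    Assumed convention: y is up, yaw = 0 looks along +z,
    and positive pitch looks upward.
    """
    return (math.sin(yaw) * math.cos(pitch),
            math.sin(pitch),
            math.cos(yaw) * math.cos(pitch))


def steer(position, yaw, pitch, speed, dt):
    """Advance the viewpoint by speed * dt along the current gaze direction."""
    fx, fy, fz = forward_vector(yaw, pitch)
    step = speed * dt
    return (position[0] + fx * step,
            position[1] + fy * step,
            position[2] + fz * step)


# Simulate one second at 60 frames per second while the user looks along +x
viewpoint = (0.0, 1.7, 0.0)
for _ in range(60):
    viewpoint = steer(viewpoint, yaw=math.pi / 2, pitch=0.0, speed=2.0, dt=1 / 60)
```

In a real system the yaw and pitch would be read from the head tracker on every frame, so the user changes direction simply by looking elsewhere. Target-based travel would instead replace the loop with a single jump, or an interpolated transition, to the chosen destination.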


Wayfinding is the cognitive process of defining a route through the virtual environment, using and acquiring spatial knowledge to construct a cognitive map of the virtual environment.

For good wayfinding, users should receive wayfinding support during travel in the virtual environment, because the constraints of the virtual world make the task harder.

This support can be user-centered, such as a large field of view or even non-visual support such as audio, or environment-centered, such as artificial cues and a structural organization that clearly defines different parts of the environment. Some of the most used artificial cues are maps, compasses, and grids, as well as architectural cues such as lighting, color, and texture.

Selection and Manipulation

Selection and manipulation techniques for 3D environments must accomplish at least one of three basic tasks: object selection, object positioning, and object rotation.


Selecting objects or 3D volumes in a 3D environment requires first finding the desired target and then selecting it. Most 3D datasets and environments suffer from occlusion problems,[12] so the first step of finding the target relies on manipulating the viewpoint or the 3D data itself in order to properly identify the object or volume of interest. This initial step is therefore tightly coupled with manipulation in 3D. Once the target is visually identified, users have access to a variety of techniques to select it.

Usually, the system provides the user with a 3D cursor represented as a human hand whose movements correspond to the motion of the hand tracker. This virtual hand technique[13] is rather intuitive because it simulates real-world interaction with objects, but it is limited to objects within the user's reach.

Many techniques have been suggested to overcome this limit, such as the Go-Go technique.[14] This technique lets the user extend the reach area using a non-linear mapping of the hand: when the user extends the hand beyond a fixed threshold distance, the mapping becomes non-linear and the virtual arm grows.
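The non-linear mapping can be written as a short function. The sketch below follows the form given in the Go-Go paper,[14] in which the virtual hand distance equals the real distance up to a threshold D and grows by k(r − D)² beyond it; the threshold and coefficient used here are illustrative values, and the function name is hypothetical.

```python
def go_go_reach(real_distance, threshold, k):
    """Map the real hand distance from the body to the virtual hand distance.

    Below the threshold the mapping is one-to-one; beyond it the virtual
    hand moves farther than the real hand by a quadratic term.
    """
    if real_distance < threshold:
        return real_distance
    return real_distance + k * (real_distance - threshold) ** 2


THRESHOLD_CM = 40.0  # illustrative threshold distance D
K = 0.1              # illustrative growth coefficient k

near = go_go_reach(30.0, THRESHOLD_CM, K)  # within the linear zone
far = go_go_reach(60.0, THRESHOLD_CM, K)   # beyond the threshold
```

With these values a hand held 30 cm from the body maps to a virtual hand at 30 cm, while a hand extended to 60 cm maps to a virtual hand at 100 cm. The mapping is continuous at the threshold, so the transition between the two zones is not felt as a jump.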

Another technique for selecting and manipulating objects in 3D virtual spaces consists of pointing at objects using a virtual ray emanating from the virtual hand.[15] When the ray intersects an object, that object can be manipulated. Several variations of this technique have been proposed, such as the aperture technique, which uses a conic pointer directed from the user's eye position, estimated from the head location, to select distant objects. This technique also uses a hand sensor to adjust the size of the conic pointer.
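A minimal sketch of ray-based selection is given below. It assumes, purely for simplicity, that each selectable object is approximated by a bounding sphere; the scene contents and function names are hypothetical. The ray is tested against every sphere and the nearest hit is selected.

```python
import math


def ray_sphere_distance(origin, direction, center, radius):
    """Distance along a unit-length ray to its first hit on a sphere, or None."""
    ox, oy, oz = (origin[i] - center[i] for i in range(3))
    dx, dy, dz = direction
    b = ox * dx + oy * dy + oz * dz
    c = ox * ox + oy * oy + oz * oz - radius * radius
    disc = b * b - c
    if disc < 0:
        return None  # the ray misses the sphere
    root = math.sqrt(disc)
    t = -b - root
    if t < 0:
        t = -b + root  # the ray starts inside the sphere
    return t if t >= 0 else None


def pick(origin, direction, objects):
    """Return the name of the nearest object hit by the ray, or None."""
    best_name, best_t = None, math.inf
    for name, (center, radius) in objects.items():
        t = ray_sphere_distance(origin, direction, center, radius)
        if t is not None and t < best_t:
            best_name, best_t = name, t
    return best_name


# Hypothetical scene: name -> (centre, radius) of a bounding sphere
scene = {
    "near_ball": ((0.0, 0.0, 5.0), 1.0),
    "far_ball": ((0.0, 0.0, 10.0), 1.0),
    "side_ball": ((5.0, 0.0, 5.0), 1.0),
}
selected = pick((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), scene)  # ray from the hand along +z
missed = pick((0.0, 0.0, 0.0), (0.0, 1.0, 0.0), scene)    # ray pointing straight up
```

Choosing the nearest hit is one simple way to resolve the case where the ray passes through several objects; here the ray along +z crosses both `near_ball` and `far_ball`, and the closer one is returned.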

Many other techniques, relying on different input strategies, have also been developed.[16]


3D manipulation occurs both before a selection task (in order to visually identify a 3D selection target) and after a selection has occurred, to manipulate the selected object. 3D manipulation requires 3 DOF for rotation (1 DOF per axis, namely x, y, z), 3 DOF for translation (1 DOF per axis), and at least 1 additional DOF for uniform zoom (or alternatively 3 additional DOF for non-uniform zoom operations).

3D manipulation, like navigation, is one of the essential tasks with 3D data, objects, or environments. It is the basis of many widely used 3D software packages (such as Blender, Autodesk, VTK). This software, available mostly on computers, is thus almost always used with a mouse and keyboard. To provide enough DOFs (the mouse offers only 2), it relies on modifier keys to control separately all the DOFs involved in 3D manipulation. With the recent advent of multi-touch-enabled smartphones and tablets, the interaction mappings of this software have been adapted to multi-touch (which offers more simultaneous DOF manipulation than a mouse and keyboard). A survey conducted in 2017 of 36 commercial and academic mobile applications on Android and iOS, however, suggested that most applications did not provide a way to control the minimum 6 DOFs required,[7] but that among those that did, most used a 3D version of the RST (Rotation Scale Translation) mapping: one finger is used for rotation around x and y, while two-finger interaction controls rotation around z and translation along x, y, and z.
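The two-finger part of an RST mapping can be sketched as follows. Given the positions of two touch points before and after a gesture, the function derives a rotation about the z axis, a pinch ratio and an in-plane translation. How an application uses the pinch ratio (as a scale factor or as translation along z) varies; the function name and return layout are assumptions made for this example.

```python
import math


def two_finger_rst(p1, p2, q1, q2):
    """Derive rotation, scale and translation from a two-finger gesture.

    p1, p2: (x, y) touch positions when the gesture starts.
    q1, q2: (x, y) positions of the same fingers afterwards.
    Returns (rotation about z in radians, pinch ratio, (tx, ty)).
    """
    vx0, vy0 = p2[0] - p1[0], p2[1] - p1[1]
    vx1, vy1 = q2[0] - q1[0], q2[1] - q1[1]

    # Rotation: change in the angle of the line joining the two fingers
    angle = math.atan2(vy1, vx1) - math.atan2(vy0, vx0)
    rotation_z = math.atan2(math.sin(angle), math.cos(angle))  # wrap into [-pi, pi]

    # Scale: change in the distance between the fingers (pinch)
    scale = math.hypot(vx1, vy1) / math.hypot(vx0, vy0)

    # Translation: movement of the midpoint between the fingers
    tx = (q1[0] + q2[0]) / 2 - (p1[0] + p2[0]) / 2
    ty = (q1[1] + q2[1]) / 2 - (p1[1] + p2[1]) / 2
    return rotation_z, scale, (tx, ty)


rotation_z, scale, translation = two_finger_rst(
    (0.0, 0.0), (1.0, 0.0),  # fingers start side by side
    (1.0, 1.0), (1.0, 3.0),  # fingers end one above the other, farther apart
)
```

In this example the fingers turn by a quarter of a circle, double their separation, and their midpoint moves by (0.5, 2.0), so a single gesture controls several DOFs at once, which a mouse cannot do without modifier keys.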

System Control

System control techniques allow the user to send commands to an application, change the interaction mode, or modify a parameter. Issuing a command always involves selecting an element from a set. System control techniques can be categorized into four groups:

  • Graphical menus: visual representations of commands.
  • Voice commands: menus accessed via voice.
  • Gestural interaction: commands accessed via body gestures.
  • Tools: virtual objects with an implicit function or mode.

Different hybrid techniques also exist that combine some of these types.

References


  1. Bowman, Doug A. (2004). 3D User Interfaces: Theory and Practice. Redwood City, CA, USA: Addison Wesley Longman Publishing Co., Inc. ISBN 0201758679.
  2. US 3050870A, Heilig, Morton L, "Sensorama simulator", published 1961-01-10, issued 1962-08-28.
  3. Sutherland, I. E. (1968). "A head-mounted three dimensional display". Proceedings of AFIPS 68, pp. 757-764.
  4. Chen, Michael; Mountford, S. Joy; Sellen, Abigail (1988). A study in interactive 3-D rotation using 2-D control devices (PDF). New York, New York, USA: ACM Press. doi:10.1145/54852.378497. ISBN 0-89791-275-6.
  5. Yu, Lingyun; Svetachov, Pjotr; Isenberg, Petra; Everts, Maarten H.; Isenberg, Tobias (2010-10-28). "FI3D: Direct-Touch Interaction for the Exploration of 3D Scientific Visualization Spaces" (PDF). IEEE Transactions on Visualization and Computer Graphics. Institute of Electrical and Electronics Engineers (IEEE). 16 (6): 1613–1622. doi:10.1109/TVCG.2010.157. ISSN 1077-2626.
  6. Terrenghi, Lucia; Kirk, David; Sellen, Abigail; Izadi, Shahram (2007). Affordances for manipulation of physical versus digital media on interactive surfaces. New York, New York, USA: ACM Press. doi:10.1145/1240624.1240799. ISBN 978-1-59593-593-9.
  7. Besançon, Lonni; Issartel, Paul; Ammi, Mehdi; Isenberg, Tobias (2017). Mouse, Tactile, and Tangible Input for 3D Manipulation. New York, New York, USA: ACM Press. doi:10.1145/3025453.3025863. ISBN 978-1-4503-4655-9.
  8. Besancon, Lonni; Issartel, Paul; Ammi, Mehdi; Isenberg, Tobias (2017). "Hybrid Tactile/Tangible Interaction for 3D Data Exploration". IEEE Transactions on Visualization and Computer Graphics. Institute of Electrical and Electronics Engineers (IEEE). 23 (1): 881–890. doi:10.1109/tvcg.2016.2599217. ISSN 1077-2626.
  9. Fitzmaurice, George W.; Buxton, William (1997). An empirical evaluation of graspable user interfaces. New York, New York, USA: ACM Press. doi:10.1145/258549.258578. ISBN 0-89791-802-9.
  10. Angus, Ian G.; Sowizral, Henry A. (1995-03-30). Fisher, Scott S.; Merritt, John O.; Bolas, Mark T., eds. Embedding the 2D interaction metaphor in a real 3D virtual environment. SPIE. doi:10.1117/12.205875.
  11. Poupyrev, I.; Tomokazu, N.; Weghorst, S. Virtual Notepad: handwriting in immersive VR (PDF). IEEE Comput. Soc. doi:10.1109/vrais.1998.658467. ISBN 0-8186-8362-7.
  12. Shneiderman, B. The eyes have it: a task by data type taxonomy for information visualizations. IEEE Comput. Soc. Press. doi:10.1109/vl.1996.545307. ISBN 0-8186-7508-X.
  13. Poupyrev, I.; Ichikawa, T.; Weghorst, S.; Billinghurst, M. (1998). "Egocentric Object Manipulation in Virtual Environments: Empirical Evaluation of Interaction Techniques". Computer Graphics Forum. Wiley. 17 (3): 41–52. doi:10.1111/1467-8659.00252. ISSN 0167-7055.
  14. Poupyrev, Ivan; Billinghurst, Mark; Weghorst, Suzanne; Ichikawa, Tadao. "The go-go interaction technique: non-linear mapping for direct manipulation in VR" (PDF). ACM Digital Library. doi:10.1145/237091.237102. Retrieved 2018-05-18.
  15. Mine, Mark R. (1995). Virtual Environment Interaction Techniques (PDF) (Technical report). Department of Computer Science University of North Carolina.
  16. Argelaguet, Ferran; Andujar, Carlos (2013). "A survey of 3D object selection techniques for virtual environments" (PDF). Computers & Graphics. Elsevier BV. 37 (3): 121–136. doi:10.1016/j.cag.2012.12.003. ISSN 0097-8493.
Reading List
  1. Bowman, D., Kruijff, E., LaViola, J., Poupyrev, I. (2001, February). An Introduction to 3-D User Interface Design. Presence, 10(1), 96–108.
  2. Csisinko, M., Kaufmann, H. (2007, March). Towards a Universal Implementation of 3D User Interaction Techniques [Proceedings of Specification.
  3. Rhijn, A. van (2006). Configurable Input Devices for 3D Interaction using Optical Tracking. Eindhoven: Technische Universiteit Eindhoven.
  4. Bowman, Doug. 3D User Interfaces. Interaction Design Foundation. Retrieved October 15, 2015
  5. Interaction Techniques. DLR - Simulations- und Softwaretechnik. Retrieved October 18, 2015