It’s been a couple of years since I became keenly interested in cross reality (XR) app design and development. I began by experimenting with Google Daydream VR (virtual reality) and had also recently acquired an Oculus Go headset, which had become widely available and popular. Around this time I started thinking about ideas for a self-directed project with immersive technologies.
My first serious VR development experience was during a 48-hour app jam (48 hours spread out over a week). I recall switching between the Go and Daydream as the build target while repeatedly running into major limitations with each SDK (software development kit): the Oculus SDK provided poor support for my Mac development environment, while the Google Daydream SDK appeared to have been all but abandoned – a status Google has since essentially confirmed.
I had chosen the Unity game engine as my development environment due to its comprehensive documentation and developer community, as well as its close association with C# and .NET, a language and framework I was already familiar with from my web development work.
These early XR experiences brought home the fast-evolving nature of the development environment and hardware ecosystem within the still-emerging immersive technologies industry.
I have since moved to a Windows environment for VR experimentation and development. Oculus has now discontinued production of the Go HMD (head-mounted display), which has been replaced by the hugely successful Quest – a more powerful, yet lighter, standalone headset. The Unity XR components have also been through several major redesigns during this time.
Alongside my VR experimentation I have been working on a concept for an AR (augmented reality) app that would allow users to place virtual lovelocks in real-world physical locations. The idea was inspired by newspaper articles I had repeatedly seen about the plight of historic bridges in Paris and elsewhere, which were being seriously damaged by the practice of attaching real padlocks, in huge numbers, to these structures.
My AR technical research resulted in a simple prototype built with Unity and Vuforia. However, there were serious limitations:
- A lovelock would not persist in its position once the user closed the app.
- A lovelock could only be placed on a horizontal plane.
To overcome the first limitation, I turned to Cloud Anchors, a feature of Google’s ARCore framework designed for multiplayer and collaborative AR experiences. My tests found that this technology, while promising, was still generally unfit for a commercially viable mainstream AR app: the process of recognising a user’s environment was slow and cumbersome, the positioning of shared objects was not always accurate, and at the time of my experimentation the anchors expired after 24 hours. Nevertheless, I remained hopeful that the 24-hour limitation would eventually be lifted and the other issues would improve over time – this is, after all, emergent technology, and compromise will always be part of the risk of ‘early adoption’ of new tech.
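For illustration, here is a minimal sketch of the host-and-resolve flow I was testing, based on the XPSession cloud anchor API in the ARCore SDK for Unity of that era; exact class and method names may differ in current releases.

```csharp
using GoogleARCore;
using GoogleARCore.CrossPlatform;
using UnityEngine;

public class LovelockCloudAnchors : MonoBehaviour
{
    // Hosts a locally created anchor with Google's cloud service so
    // that other sessions can resolve it later via its cloud ID.
    public void HostAnchor(Anchor localAnchor)
    {
        XPSession.CreateCloudAnchor(localAnchor).ThenAction(result =>
        {
            if (result.Response != CloudServiceResponse.Success)
            {
                Debug.LogWarning("Hosting failed: " + result.Response);
                return;
            }
            // At the time of my tests, this ID was only resolvable
            // for 24 hours after hosting.
            Debug.Log("Hosted cloud anchor: " + result.Anchor.CloudId);
        });
    }

    // Re-localises a previously hosted anchor in a new session.
    public void ResolveAnchor(string cloudId)
    {
        XPSession.ResolveCloudAnchor(cloudId).ThenAction(result =>
        {
            if (result.Response == CloudServiceResponse.Success)
            {
                // A lovelock prefab would be instantiated at the
                // resolved anchor's pose here.
                Debug.Log("Resolved at: " + result.Anchor.transform.position);
            }
        });
    }
}
```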
The second major limitation is much harder to overcome. Plane detection – the process of identifying horizontal and vertical surfaces such as floors, table-tops and walls – already works well, based on my own experimentation. But I would need a more precise approach: I want to be able to position my virtual lovelocks not on the ground or a wall, but as part of intricate structures.
Recently, Unity launched a new product – MARS (Mixed and Augmented Reality Studio) – which they say “brings environment and sensor data into the creative workflow”, meaning developers are able to “build intelligent AR apps that are context-aware and responsive to physical space, working in any location and with any kind of data”. This sounds like exactly what I need. I signed up for MARS during its beta and I am impressed with many of the features, especially the real-world simulations and built-in context-aware controls. However, these controls are still extremely limited. They take plane detection a small step further, using algorithms to detect planes of a specific shape rather than general ones. For example, I can specify that I only want to target a horizontal plane smaller than one metre in width and two metres in length. But this doesn’t come close to identifying intricate shapes and structures; for that, I’d need much more complex algorithms. My perception of MARS is that of a ‘complex-content-aware-ready’ studio: you have to provide the AI (artificial intelligence) yourself in order to get any complex content awareness into the mix.
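To make that gap concrete, the predicate below captures roughly what the built-in plane conditions let me express. This is plain Unity C# written for illustration only, not the actual MARS API – in MARS the equivalent conditions are configured declaratively on a proxy object.

```csharp
using UnityEngine;

// Illustrative only: the level of context awareness the built-in
// plane conditions offer, expressed as a simple predicate.
public static class PlaneTargetFilter
{
    // Accept only horizontal planes no larger than 1m x 2m.
    public static bool IsViableTarget(Vector3 planeNormal, Vector2 planeSize)
    {
        bool horizontal = Vector3.Dot(planeNormal.normalized, Vector3.up) > 0.95f;
        bool smallEnough = planeSize.x <= 1.0f && planeSize.y <= 2.0f;
        return horizontal && smallEnough;
    }
}
```

Everything here reduces to simple geometry on a detected plane; nothing in it could recognise, say, the lattice of a bridge truss.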
To better understand how AI can be deployed to recognise intricate features in the real world – a process sometimes described as computer vision – I began researching applied ML (machine learning). This research culminated in the development of an experimental app based on the game I Spy, in which the user plays against the AI (the device) by identifying and guessing objects spotted in the immediate surroundings.
The underlying technology, powered by the TensorFlow platform, uses a method known as deep learning – more precisely, a deep neural network (DNN), a type of artificial neural network (ANN) used to compute a certain output from a given input. In the field of computer vision, the class of network most commonly used is the convolutional neural network (CNN), named after the mathematical operation of convolution it employs. In my experimental app, I tested a range of formats of pre-trained models: networks trained to identify sets of visual object classes with varying levels of probability. In other words, the ML platform, combined with a model pre-trained on a dataset of images and labels, can identify objects within a photograph or video frame and map each object to the class to which it most likely belongs – horse, phone, chair, table, etc.
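As a rough sketch of the inference step, here is how a frozen, pre-trained CNN can be queried from C# using the TensorFlowSharp bindings I experimented with. The tensor names (“input”, “output”) are assumptions that vary from model to model, and the frame is assumed to be already resized and normalised into a tensor.

```csharp
using System.IO;
using TensorFlow; // TensorFlowSharp bindings

public class ObjectClassifier
{
    readonly TFGraph graph;
    readonly TFSession session;

    // Load a frozen, pre-trained classification model
    // (for example a MobileNet-style CNN).
    public ObjectClassifier(string modelPath)
    {
        graph = new TFGraph();
        graph.Import(File.ReadAllBytes(modelPath));
        session = new TFSession(graph);
    }

    // Run one prepared camera frame through the network and
    // return per-class probabilities. "input" and "output" are
    // assumed layer names; they differ by model.
    public float[] Classify(TFTensor imageTensor)
    {
        var output = session.GetRunner()
            .AddInput(graph["input"][0], imageTensor)
            .Fetch(graph["output"][0])
            .Run();

        var probabilities = (float[,])output[0].GetValue(jagged: false);
        var result = new float[probabilities.GetLength(1)];
        for (int i = 0; i < result.Length; i++)
            result[i] = probabilities[0, i];

        // The index with the highest probability maps to a label
        // in the model's dataset: horse, phone, chair, table, etc.
        return result;
    }
}
```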
My question at this point is whether or not it is feasible to research a potential solution using deep neural networks to help identify detailed, specific structural characteristics. If this can be done, it could serve as a data source for Unity MARS to help detect viable targets for an app like my LoveLocks idea – targeting the type of structure to which a padlock could be attached. This does seem like a huge undertaking for a mere fun app idea. But, if successful, the research could have much wider-reaching applications within engineering and future AR innovations. I note that identifying the appropriate target is not the whole solution. The target must also be virtualised within the app so that when a virtual lock is attached, the relevant physical laws apply – for example, if the padlock is attached to a diagonal bar, it should slide down to the nearest obstacle, either another lock or a structural joint. All the while, the virtual representation must be superimposed over the real-world structure through tracking, and even persisted there, so that the particular section of the structure can be recognised during other user sessions and the virtual padlock can be seen by other players too.
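The physics step, at least, is well within reach of standard engine features. A minimal sketch, assuming the bar has already been virtualised as a Rigidbody with a collider: a Unity ConfigurableJoint constrains the padlock to slide only along the bar’s length, and gravity does the rest.

```csharp
using UnityEngine;

public class PadlockSlide : MonoBehaviour
{
    // Attach a virtual padlock to a (virtualised) bar so it can
    // only slide along the bar's length under gravity.
    public void AttachToBar(Rigidbody padlock, Rigidbody bar)
    {
        var joint = padlock.gameObject.AddComponent<ConfigurableJoint>();
        joint.connectedBody = bar;

        // Assumes the padlock has been oriented so its local X axis
        // runs along the bar; the joint then permits sliding on that
        // axis only.
        joint.axis = Vector3.right;

        joint.xMotion = ConfigurableJointMotion.Free;
        joint.yMotion = ConfigurableJointMotion.Locked;
        joint.zMotion = ConfigurableJointMotion.Locked;
        joint.angularXMotion = ConfigurableJointMotion.Locked;
        joint.angularYMotion = ConfigurableJointMotion.Locked;
        joint.angularZMotion = ConfigurableJointMotion.Locked;

        // Colliders on neighbouring locks and structural joints act
        // as the obstacles that stop the slide.
    }
}
```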
My view is that the machine learning research and development needed to achieve all of this represents a considerable undertaking, but it is doable. Other researchers are likely already working on similar problems, and I may be able to draw on their work to help my own development. I do believe, however, that the persistence of virtual objects in the exact same spot in the real world poses a near-insurmountable challenge at this time.
GPS (Global Positioning System) is nowhere near accurate enough given the small margin of precision required. Furthermore, the approach currently used to persist cloud anchors – points of interest (POI) – would not be reliable given the repetitive nature of bridge-like structures; POI matching requires uniqueness to distinguish a specific part of an environment with any degree of accuracy.
Despite the problems with persisting virtual AR objects in the real world, creative solutions could be found in the meantime. For example, the action of placing the lovelock could be recorded as an AR video/animation and saved to the GPS location. Overall, though, I conclude that the commercial viability of mainstream AR remains rather limited until development tools can provide much better context awareness and persistence across user sessions.
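A minimal sketch of that workaround: tag each recorded placement clip with the device’s last GPS fix so it can be surfaced again near the same location. The record type and field names here are hypothetical, invented for illustration.

```csharp
using System;
using UnityEngine;

[Serializable]
public class LovelockPlacementRecord
{
    public string videoClipPath;  // recorded AR animation of the placement
    public double latitude;
    public double longitude;
    public DateTime placedAtUtc;
}

public class PlacementRecorder : MonoBehaviour
{
    // Tag a recorded placement clip with the device's last GPS fix.
    // Input.location must have been started (with location permission
    // granted) earlier in the session via Input.location.Start().
    public LovelockPlacementRecord Tag(string clipPath)
    {
        var fix = Input.location.lastData;
        return new LovelockPlacementRecord
        {
            videoClipPath = clipPath,
            latitude = fix.latitude,
            longitude = fix.longitude,
            placedAtUtc = DateTime.UtcNow
        };
    }
}
```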