In addition to moving augmented images (see previous), my team’s tracking tech enables object detection & tracking on iOS & Android:
The Object Detection and Tracking API identifies the prominent object in an image and then tracks it in real time. Developers can use this API to create a real-time visual search experience through integration with a product search backend such as Cloud Product Search.
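The "find the prominent object, then follow it frame to frame" pattern can be sketched in a few lines. This is a toy stand-in, not the actual API (which runs on-device in iOS/Android apps); the detector output and frames below are made up, and the tracker is just a nearest-box match by intersection-over-union:

```python
# Hypothetical sketch of the detect-then-track pattern behind APIs like
# this one. Boxes are (x1, y1, x2, y2) tuples; "candidates" stand in
# for per-frame detector output.

def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def track(prev_box, candidates, threshold=0.3):
    """Pick the candidate box that best overlaps last frame's box."""
    best = max(candidates, key=lambda c: iou(prev_box, c), default=None)
    return best if best and iou(prev_box, best) >= threshold else None

# Toy frames: the "prominent object" drifts right a few pixels per frame;
# the distant second box in frame two is correctly ignored.
box = (10, 10, 50, 50)  # initial detection
frames = [[(12, 10, 52, 50)], [(15, 11, 55, 51), (200, 200, 220, 220)]]
for candidates in frames:
    box = track(box, candidates)
print(box)  # → (15, 11, 55, 51)
```

The real API adds classification and hands you stable tracking IDs, which is what lets a visual-search backend query once per object rather than once per frame.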
Hmm—I’ve never had occasion to use this solipsistic-but-cool flight mode on my drone, but now I’m tempted to try capturing some epic dronies. (Just gotta figure out where I misplaced my moody Scottish highlands…)
A couple of years ago, we used Google’s super-resolution tech (think “Genuine Fractals gone wild,” fellow oldsters) to dramatically reduce bandwidth costs for users without any perceptible loss in visual quality. Now that tech is used in the Pixel phone camera, and this quick video gives a nice overview of how it works:
Who knew that the goofball mannequin challenge could generate a 2000-video dataset that could help train AI to compute depth, segment humans, and (optionally) content-aware fill them out of existence? This new work from Google Research handles scenes where both the camera & human subjects are moving. Check it out:
During their performance that night, Steven Drozd of The Flaming Lips, who usually plays a variety of instruments, played a “magical bowl of fruit” for the first time. He tapped the fruits in the bowl, each of which played a different musical tone and “sang” its own name. With help from Magenta, the band broke into a brand-new song, “Strawberry Orange.”
The Flaming Lips also got help from the audience: At one point, they tossed giant, blow-up “fruits” into the crowd, and each fruit was also set up as a sensor, so any audience member who got their hands on one played music, too. The end result was a cacophonous, joyous moment when a crowd truly contributed to the band’s sound.
New research from Samsung Moscow can turn a single image (or, for better quality results, a series of images) into a puppet that can be driven by another person’s performance. (Hmm, new feature for Google Arts & Culture’s artistic doppelgänger-finder? 😌)
This is the first time people will be able to use Tilt Brush on a completely wireless VR system. It costs $19.99, though if you previously purchased it on Oculus Home, you’ll have it for free on Oculus Quest.
The original Glass will be to AR wearables as the Apple Newton was to smartphones—ambitious, groundbreaking, unfocused, premature. After that first… well, learning experience… Google didn’t give up, and folks have cranked away quietly to find product-market fit. Check out the new device—dramatically faster, more extensible, and focused on specific professionals in medicine, manufacturing, and more:
My team has been collaborating with TensorFlow Lite & researchers working on human-pose estimation (see many previous posts) to accelerate on-device machine learning & enable things like the fun “Dance Like” app on iOS & Android:
Hmm… am I a size 10.5 or 11 in this brand? These questions are notoriously tough to answer without trying on physical goods, and cracking the code for reliable size estimation promises to enable more online shoe buying with fewer returns.
Now Nike seems to have cracked said code. The Verge writes,
With this new AR feature, Nike says it can measure each foot individually — the size, shape, and volume — with accuracy within 2 millimeters and then suggest the specific size of Nike shoe for the style that you’re looking at. It does this by matching your measurements to the internal volume already known for each of its shoes, and the purchase data of people with similar-sized feet.
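Nike hasn’t published its algorithm, but the “match measurements to known internal volume” idea can be illustrated with a toy sketch. Everything here is invented for illustration (the volumes, the margin, the size table); the real system presumably also weighs shape and purchase data:

```python
# Toy illustration (NOT Nike's actual algorithm): recommend the smallest
# size whose known internal volume comfortably fits the measured foot.

# Hypothetical internal volumes (cm^3) for one shoe style, per US size.
INTERNAL_VOLUME = {9.0: 880.0, 9.5: 905.0, 10.0: 930.0,
                   10.5: 955.0, 11.0: 980.0}

def suggest_size(foot_volume_cm3, margin=10.0):
    """Smallest size whose internal volume exceeds foot volume + margin."""
    for size in sorted(INTERNAL_VOLUME):
        if INTERNAL_VOLUME[size] >= foot_volume_cm3 + margin:
            return size
    return None  # foot larger than every size on offer

print(suggest_size(940.0))  # → 10.5
```

Even this crude version shows why per-foot measurement beats a single “size 10.5” label: the answer depends on the style’s internal volume, not just foot length.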
My team has been accelerating machine learning on devices and enabling AR face effects for developers (via ARCore & ML Kit). In recent months we’ve worked with Care OS, makers of smart mirror technology, to enable virtual try-ons via their hardware. Here’s a quick demo from Google I/O:
The app consists of two modes — a cutout mode and a collage mode.
The idea is that you should walk around and collect a bunch of different materials from the world in front of your camera’s viewfinder while in the cutout mode. These images are cut into shapes that you then assemble when you switch to collage mode. To do so, you’ll arrange your cutouts in the 3D space by moving and tapping on the phone’s screen.
You can also adjust the shapes by holding down a finger and dragging up, down, left, or right (for example, to rotate and scale your “weird cuts” collage shapes).
Unrelated (AFAIK), this little app lets you sketch in 2D, then put the results into AR space. (Adobe Capture should do this!)
What if speech impediments were no impediment to interacting with devices & making oneself understood? Google researchers (the crew behind the amazing Live Transcribe) have been working with folks affected by ALS, deafness, & other conditions to make their speech & even voice utterances work well with computers & other humans. Take a look:
Environmental HDR uses machine learning with a single camera frame to understand high dynamic range illumination in 360°. It takes in available light data, and extends the light into a scene with accurate shadows, highlights, reflections and more. When Environmental HDR is activated, digital objects are lit just like physical objects, so the two blend seamlessly, even when light sources are moving.
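To see why an HDR lighting estimate matters for AR, note that once you know the light directions and intensities, shading a diffuse virtual surface is just a cosine-weighted sum (Lambert’s law). The lights below are made-up stand-ins for what the estimator would produce, not real API output:

```python
# Minimal Lambertian-shading sketch: given estimated lights, a virtual
# object's brightness follows the cosine of the angle to each light,
# which is what makes it match nearby physical objects.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def lambert(normal, lights):
    """Diffuse intensity at a surface with the given unit normal.
    lights: list of (unit_direction_toward_light, intensity) pairs."""
    return sum(i * max(0.0, dot(normal, d)) for d, i in lights)

# One bright "sun" overhead plus dim bounce light from the side.
lights = [((0.0, 1.0, 0.0), 1.0), ((1.0, 0.0, 0.0), 0.2)]
up = (0.0, 1.0, 0.0)
side = (1.0, 0.0, 0.0)
print(lambert(up, lights))    # → 1.0 (lit mostly by the sun)
print(lambert(side, lights))  # → 0.2 (only the dim bounce light)
```

The hard part, of course, is inferring those lights from a single camera frame in the first place; that’s the machine-learning piece.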
Check out the results on a digital mannequin (left) and physical mannequin (right):
I haven’t yet tried it, but sample results look impressive:
It’s free to download, but usage carries a somewhat funky pricing structure. PetaPixel explains,
You’ll need to sign up for an API key through the website and be connected to the Internet while using it. You’ll be able to do 50 background removals in a small size (625×400, or 0.25 megapixels) through the plugin every month for free (and unlimited removals through the website at that size). If you work with larger volumes or higher resolutions (up to 4000×2500, or 10 megapixels), you’ll need to buy credits.
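The quoted tiers check out arithmetically, and the gap between them is bigger than it might sound:

```python
# Quick check on the plugin's resolution tiers quoted above.
free_px = 625 * 400    # free tier
paid_px = 4000 * 2500  # top paid tier
print(free_px / 1e6)         # → 0.25 megapixels
print(paid_px / 1e6)         # → 10.0 megapixels
print(paid_px // free_px)    # → 40 (paid tier covers 40x the pixels)
```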
The rockstar crew behind Night Sight have created a neural network that takes a standard RGB image from a cellphone & produces a relit image, displaying the subject as though s/he were illuminated via a different environment map. Check out the results:
I spent years wanting & trying to get capabilities like this into Photoshop—and now it’s close to running in realtime on your telephone (!). Days of miracles and… well, you know.
Our method is trained on a small database of 18 individuals captured under different directional light sources in a controlled light stage setup consisting of a densely sampled sphere of lights. Our proposed technique produces quantitatively superior results on our dataset’s validation set compared to prior works, and produces convincing qualitative relighting results on a dataset of hundreds of real-world cellphone portraits. Because our technique can produce a 640 × 640 image in only 160 milliseconds, it may enable interactive user-facing photographic applications in the future.
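That 160 ms figure is what backs the “interactive” claim — it works out to a few relit frames per second on a single portrait stream:

```python
# Back-of-envelope on the paper's quoted latency for a 640x640 output.
ms_per_frame = 160
fps = 1000 / ms_per_frame
print(fps)  # → 6.25 frames per second
```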
It took Belgian designer Gilles Augustijnen about eight months of on-and-off work, using After Effects, C4D, Photoshop, Illustrator, Substance Painter, ZBrush, Fusion360, and DAZ3D to bring the sequence to life, aided by Pieterjan Djufri Futra and Loris Ayné (who provided feedback, support, and help with hard-surface modeling).
Oh, and since you’re here already, fancy a little Song Of Vanilla Ice & Fire? Sure ya do!