Category Archives: Try-on

Kling AI promises virtual try-ons

Accurately rendering clothing on humans, and especially estimating body dimensions to enable proper fit (and thus reduce costly returns), has remained a seductive yet stubbornly difficult problem. I’ve written previously about challenges I observed at Google, plus possible steps forward.

Now Kling is promising to use generative video to pair real people & real outfits for convincing visualization (but not fit estimation). Check it out:

AI Holiday Leftovers, Vol. 1

Dig in, friends. 🙂

  • Drawing/painting:
    • Using a simple kids’ drawing tablet to create art: “I used @Vizcom_ai to transform the initial sketch. This tool has gotten soo good by now. I then used @LeonardoAi_’s image to image to enhance the initial image a bit, and then used their new motion feature to make it move. I also used @Magnific_AI to add additional details to a few of the images and Decohere AI’s video feature.”
    • Latte art: “Photoshop paint sent to @freepik’s live canvas. The first few seconds of the video are real-time to show you how responsive it is. The music was made with @suno_ai_. Animation with Runway’s Gen-2.”
  • Photo editing:
    • Google Photos gets a generative upgrade: “Magic Eraser now uses gen AI to fill in detail when users remove unwanted objects from photos. Google Research worked on the MaskGIT generative image transformer for inpainting, and improved segmentation to include shadows and objects attached to people.”
    • Clothing/try-on:
      • PICTURE: PhotorealistIC virtual Try-on from UnconstRained dEsigns: “We propose a novel virtual try-on from unconstrained designs (ucVTON) task to enable photorealistic synthesis of personalized composite clothing on input human images.”
      • AnyDoor is “a diffusion-based image generator with the power to teleport target objects to new scenes at user-specified locations in a harmonious way.”
    • SDXL Auto FaceSwap lets you create new images using the face from a source image.

Google uses generative imaging for virtual try-on

In my time at Google, we tried (and failed) repeatedly to make virtual try-on happen using AR. It’s extremely hard to…

  • measure bodies (to make buying decisions based on fit)
  • render virtual clothing accurately (placing virtual clothing over real clothing, or getting people to disrobe, which is even harder; plus simulating materials in realtime)
  • get a sizable corpus of 3D assets (in a high-volume, low-margin industry)

Outside of a few limited pockets (trying on makeup, glasses, and shoes—all for style, not for fit), I haven’t seen anyone (Amazon, Snap, etc.) crack the code here. Researcher Ira Kemelmacher-Shlizerman (who last I heard was working on virtual mirrors, possibly leveraging Google’s Stargate tech) acknowledges this:

Current techniques like geometric warping can cut-and-paste and then deform a clothing image to fit a silhouette. Even so, the final images never quite hit the mark: Clothes don’t realistically adapt to the body, and they have visual defects like misplaced folds that make garments look misshapen and unnatural.

So, it’s interesting to see Google trying again (“Try on clothes with generative AI”):

This week we introduced an AI-powered virtual try-on feature that uses the Google Shopping Graph to show you how clothing will look on a diverse set of real models.

Our new guided refinements can help U.S. shoppers fine-tune products until you find the perfect piece. Thanks to machine learning and new visual matching algorithms, you can refine using inputs like color, style and pattern.

They’ve posted a technical overview and a link to their project site:

Inspired by Imagen, we decided to tackle VTO using diffusion — but with a twist. Instead of using text as input during diffusion, we use a pair of images: one of a garment and another of a person. Each image is sent to its own neural network (a U-net) and shares information with each other in a process called “cross-attention” to generate the output: a photorealistic image of the person wearing the garment. This combination of image-based diffusion and cross-attention make up our new AI model.
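
If you think in code, here’s a tiny, hypothetical sketch of the fusion idea described above: two toy image encoders (standing in for the parallel U-Nets) whose features are mixed via cross-attention, with the person features querying the garment features. Module names and shapes are illustrative only, not Google’s actual implementation.

```python
# Hypothetical sketch of person/garment cross-attention (not Google's code).
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    def __init__(self, channels=64, heads=4):
        super().__init__()
        # Toy convolutional encoders for the person and garment images.
        self.person_enc = nn.Conv2d(3, channels, kernel_size=3, padding=1)
        self.garment_enc = nn.Conv2d(3, channels, kernel_size=3, padding=1)
        # Person features act as queries; garment features as keys/values.
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.out = nn.Conv2d(channels, 3, kernel_size=3, padding=1)

    def forward(self, person, garment):
        p = self.person_enc(person)        # (B, C, H, W)
        g = self.garment_enc(garment)      # (B, C, H, W)
        b, c, h, w = p.shape
        q = p.flatten(2).transpose(1, 2)   # (B, H*W, C) person queries
        kv = g.flatten(2).transpose(1, 2)  # (B, H*W, C) garment keys/values
        fused, _ = self.attn(q, kv, kv)    # person attends to garment
        fused = fused.transpose(1, 2).reshape(b, c, h, w)
        return self.out(fused)             # toy "person wearing garment" output

# Usage with random stand-in images:
model = CrossAttentionFusion()
person = torch.randn(1, 3, 64, 64)
garment = torch.randn(1, 3, 64, 64)
print(model(person, garment).shape)        # torch.Size([1, 3, 64, 64])
```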

They note that “We don’t promise fit and for now focus only on visualization of the try on. Finally, this work focused on upper body clothing.”

It’s a bit hard to find exactly where one can try out the experience. They write:

Starting today, U.S. shoppers can virtually try on women’s tops from brands across Google, including Anthropologie, Everlane, H&M and LOFT. Just tap products with the “Try On” badge on Search and select the model that resonates most with you.

DALL·E + Snapchat = Clothing synthesis + try-on

Though we don’t (yet?) have the ability to use 3D meshes (e.g. those generated from a photo of a person) to guide text-based synthesis through systems like DALL·E, here’s a pretty compelling example of making 2D art, then wrapping it onto a body in real time:

Pinterest adds AR shopping

“I’m like, ‘Bro, how much furniture do you think I buy??'”

I forget who said this while I was working on AR at Google, but it’s always made me laugh, because nearly every demo inevitably gets into the territory of, “Don’t you wish you could see whether this sofa fits in your space?”

Still, though, it’s a useful capability—especially if one can offer a large enough corpus of 3D models (something we found challenging, at least a few years back). Now, per the Verge:

Pinterest is adding a “Try On for Home Decor” feature to its app, letting you see furniture from stores like Crate & Barrel, CB2, Walmart, West Elm, and Wayfair in your house… According to the company’s announcement post, you’ll be able to use its Lens camera to try out over 80,000 pieces of furniture from “shoppable Pins.”

As the article notes,

Of course, this isn’t a new idea — Ikea’s app let you drop virtual furniture into your house in 2013, and in the almost decade since, companies like Target, Amazon, Shopify, and even Etsy have introduced ways to let you see how certain products will work in your house. Other companies, like Walmart, have gone even further, imagining (and trademarking ideas for) entire virtual shopping experiences.

To me the progress here is access & ubiquity, making it commonly possible for shoppers to try these experiences. I’m glad to see it.

Strike a pose with Adobe AI

My colleagues Jingwan, Jimei, Zhixin, and Eli have devised new tech for re-posing bodies & applying virtual clothing:

Our work enables applications of pose-guided synthesis and virtual try-on. Thanks to spatial modulation, our result preserves the texture details of the source image better than prior work.

Check out some results (below), see the details of how it works, and stay tuned for more.

Body Movin’: Google AI releases new tech for body tracking, eye measurement

My old teammates keep slapping out the bangers, releasing machine-learning tech to help build apps that key off the human form.

First up is MediaPipe Iris, which enables depth estimation for faces without fancy (iPhone X-/Pixel 4-style) depth hardware; that in turn opens up access to accurate virtual try-on for glasses, hats, etc.:

https://twitter.com/GoogleAI/status/1291430839088103424
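
For the curious, the reason a single RGB camera can yield metric depth here is that the human iris has a nearly constant physical diameter (roughly 11.7 mm), so once the model measures the iris in pixels, distance falls out of simple pinhole-camera geometry. Here’s a rough back-of-the-envelope sketch (my own illustration, not MediaPipe code):

```python
# Depth-from-iris, illustrative only: similar triangles under a pinhole camera.
IRIS_DIAMETER_MM = 11.7  # assumed average horizontal iris diameter

def depth_from_iris_mm(focal_length_px: float, iris_diameter_px: float) -> float:
    """Estimate camera-to-eye distance in millimeters."""
    return focal_length_px * IRIS_DIAMETER_MM / iris_diameter_px

# Example: a 1000 px focal length and a 24 px iris width puts the eye about half a meter away.
print(round(depth_from_iris_mm(1000.0, 24.0)))  # ~488 mm
```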

The model enables cool tricks like realtime eye recoloring:

I always find it interesting to glimpse the work that goes in behind the scenes. For example:

To train the model from the cropped eye region, we manually annotated ~50k images, representing a variety of illumination conditions and head poses from geographically diverse regions, as shown below.

The team has followed up this release with MediaPipe BlazePose, which is in testing now & planned for release via the cross-platform ML Kit soon:

Our approach provides human pose tracking by employing machine learning (ML) to infer 33 2D landmarks of a body from a single frame. In contrast to current pose models based on the standard COCO topology, BlazePose accurately localizes more keypoints, making it uniquely suited for fitness applications…

If one leverages GPU inference, BlazePose achieves super-real-time performance, enabling it to run subsequent ML models, like face or hand tracking.
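
If you’d like to poke at that 33-landmark topology yourself, MediaPipe also exposes pose tracking from Python via its “solutions” package; here’s a minimal sketch (the image path is just a placeholder):

```python
# Minimal pose-landmark sketch using MediaPipe's Python solutions API.
import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose

with mp_pose.Pose(static_image_mode=True) as pose:
    image = cv2.imread("person.jpg")  # placeholder path; use any photo of a person
    results = pose.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
    if results.pose_landmarks:
        # 33 landmarks, each with normalized x/y (plus z and a visibility score).
        for i, lm in enumerate(results.pose_landmarks.landmark):
            print(i, round(lm.x, 3), round(lm.y, 3))
```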

Now I can’t wait for apps to help my long-suffering CrossFit coaches actually quantify the crappiness of my form. Thanks, team! 😛

New AR effects debut in Google Duo

In the past I’ve mentioned augmented reality lipstick, eyeshadow, & entertainment effects running in YouTube. I’m pleased to say that fun effects are arriving in Google Duo as well:

In addition to bringing masks and effects to our new family mode, we’re bringing them to any one-on-one video calls on Android and iOS—starting this week with a Mother’s Day effect. We’re also rolling out more effects and masks that help you express yourself, from wearing heart glasses to transforming into a flower. 

I can haz cheeseburgAR?

Here’s an… appetizing one? The LA Times is offering “an augmented reality check on our favorite burgers.”

I’ve gotta say, they look pretty gnarly in 3D (below). I wonder whether these creepy photogrammetry(?)-produced results are net-appealing to customers. I have the same question about AR clothing try-on: even if we make it magically super accurate, do I really want to see my imperfect self rocking some blazer or watch, or would I rather see a photo of Daniel Craig doing it & just buy the dream that I’ll look similar?

Fortunately, I found the visual appearance much more pleasing when rendered in AR on my phone than when rendered in 3D on my Mac, at least until I zoomed in excessively.

Introducing AR makeup on YouTube

I’m so pleased to be able to talk about the augmented reality try-on feature we’ve integrated with YouTube, leveraging the face-tracking ML tech we recently made available for iOS & Android:

Today, we’re introducing AR Beauty Try-On, which lets viewers virtually try on makeup while following along with YouTube creators to get tips, product reviews, and more. Thanks to machine learning and AR technology, it offers realistic, virtual product samples that work on a full range of skin tones. Currently in alpha, AR Beauty Try-On is available through FameBit by YouTube, Google’s in-house branded content platform.

M·A·C Cosmetics is the first brand to partner with FameBit to launch an AR Beauty Try-On campaign. Using this new format, brands like M·A·C will be able to tap into YouTube’s vibrant creator community, deploy influencer campaigns to YouTube’s 2 billion monthly active users, and measure their results in real time.

As I noted the other day with AR in Google Lens, big things have small beginnings. Stay tuned!

AR: Nike aims for visual foot-size estimation, jersey try-on

Hmm… am I a size 10.5 or 11 in this brand? These questions are notoriously tough to answer without trying on physical goods, and cracking the code for reliable size estimation promises to enable more online shoe buying with fewer returns.

Now Nike seems to have cracked said code. The Verge writes,

With this new AR feature, Nike says it can measure each foot individually — the size, shape, and volume — with accuracy within 2 millimeters and then suggest the specific size of Nike shoe for the style that you’re looking at. It does this by matching your measurements to the internal volume already known for each of its shoes, and the purchase data of people with similar-sized feet.
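
Purely for illustration (this is not Nike’s algorithm, and every number below is made up), the matching idea the article describes might look something like comparing a measured foot against the known internal dimensions of each size and recommending the closest comfortable fit:

```python
# Hypothetical size matching: measured foot length vs. per-size internal length.
SHOE_INTERNAL_LENGTH_MM = {  # made-up internal lengths per US size
    9.0: 268.0, 9.5: 272.0, 10.0: 276.0, 10.5: 280.0, 11.0: 284.0,
}
TOE_ALLOWANCE_MM = 8.0  # assumed comfort margin beyond the measured foot

def recommend_size(foot_length_mm: float) -> float:
    target = foot_length_mm + TOE_ALLOWANCE_MM
    # Smallest size whose internal length meets the target; otherwise the largest size.
    for size, length in sorted(SHOE_INTERNAL_LENGTH_MM.items()):
        if length >= target:
            return size
    return max(SHOE_INTERNAL_LENGTH_MM)

print(recommend_size(266.0))  # -> 10.0 with these made-up numbers
```

Nike’s actual recommendation reportedly also folds in purchase data from people with similar-sized feet, which a toy sketch like this obviously ignores.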

Seems like size estimation could be easily paired with visualization a la Wanna Kicks.

On a semi-related note, Nike has also partnered with Snapchat to enable virtual try-on of soccer jerseys:

Inside Google’s realtime face ML tech

I’m delighted that my teammates are getting to share the details of how the awesome face-tracking tech they built works:

We employ machine learning (ML) to infer approximate 3D surface geometry to enable visual effects, requiring only a single camera input without the need for a dedicated depth sensor. This approach provides the use of AR effects at realtime speeds, using TensorFlow Lite for mobile CPU inference or its new mobile GPU functionality where available. This technology is the same as what powers YouTube Stories’ new creator effects, and is also available to the broader developer community via the latest ARCore SDK release and the ML Kit Face Contour Detection API.
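
If you want to play with this class of model directly from Python, MediaPipe’s Face Mesh solution is one readily scriptable way in (the post’s own pointers are the ARCore SDK and the ML Kit Face Contour Detection API); a minimal sketch, assuming a local test photo:

```python
# Minimal face-geometry sketch using MediaPipe's Face Mesh solution.
import cv2
import mediapipe as mp

mp_face_mesh = mp.solutions.face_mesh

with mp_face_mesh.FaceMesh(static_image_mode=True, max_num_faces=1) as face_mesh:
    image = cv2.imread("face.jpg")  # placeholder path; use any photo with a face
    results = face_mesh.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
    if results.multi_face_landmarks:
        landmarks = results.multi_face_landmarks[0].landmark
        # 468 points of approximate 3D surface geometry; each carries
        # normalized x/y plus a relative z depth.
        print(len(landmarks), "landmarks")
```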

We’ve been hard at work ensuring that the tech works well for really demanding applications like realistic makeup try-on:

If you’re a developer, dig into the links above to see how you can use the tech—and everyone else, stay tuned for more fun, useful applications of it across Google products.