3D Object Reconstruction from Depth Maps - a noob's guesses

I want to 3D print the most comfortable (and least glasses fogging!) face mask... but my nose doesn't fit in the Montana Mask. Even their excellent shnozmod didn't sit quite right.

However, it turns out that the Google Pixel 4 has pretty snazzy front camera depth map tech. Can I use it to create the perfect-fit mask? Or short of that, how about a nose-bridge that holds a cloth mask closer to my face?

Step 1: Get Data (of my FACE)

Backing up to Step 0: get a picture out of the Android camera2 API. This is harder than I guessed it would be: there are a lot of callbacks, and I'm still not sure if you need a "preview surface." Luckily, Kotlin supports "Flows" which IMHO make it a lot more fun to quickly grab a few images:

FlowCam(this@MainActivity, textureView, ImageFormat.DEPTH16).flow().take(4).toList(images)
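
The heart of it is wrapping ImageReader's push-style callback in a callbackFlow - roughly like this (heavily simplified sketch, with all the capture-session plumbing elided; whoever collects the flow has to close() each Image when done with it):

    import android.media.Image
    import android.media.ImageReader
    import android.os.Handler
    import android.os.Looper
    import kotlinx.coroutines.channels.awaitClose
    import kotlinx.coroutines.flow.Flow
    import kotlinx.coroutines.flow.callbackFlow

    // Turn ImageReader's push-style callback into a Flow you can take(4) from.
    fun depthImages(reader: ImageReader): Flow<Image> = callbackFlow {
        reader.setOnImageAvailableListener({ r ->
            // Hand each new DEPTH16 frame to the flow; the collector
            // is responsible for calling Image.close() on each frame.
            r.acquireLatestImage()?.let { trySend(it) }
        }, Handler(Looper.getMainLooper()))
        // take(4) cancels the flow, which detaches the listener here.
        awaitClose { reader.setOnImageAvailableListener(null, null) }
    }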

No sweat! I cheated a bit by stuffing all the information into RGB pixels so I could sneak it out of the Android camera DEPTH16 format and onto my desktop (sketch of the packing after the list):
  • red = big 256mm[1] steps of depth
  • green = small 1mm steps of depth
  • blue = confidence
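
The packing boils down to something like this - a simplified sketch rather than exactly what I ran, leaning on the DEPTH16 layout from the Android docs (13 low bits of range in millimeters, 3 high bits of confidence):

    // Pack one DEPTH16 sample into an ARGB pixel that survives a trip
    // through an ordinary PNG: red = coarse depth, green = fine depth,
    // blue = confidence.
    fun depthToRgb(rawSample: Short): Int {
        val raw = rawSample.toInt() and 0xFFFF
        val depthMm = raw and 0x1FFF           // low 13 bits: range in mm
        val confidence = (raw shr 13) and 0x7  // high 3 bits: confidence code
        val r = (depthMm shr 8) and 0xFF       // big 256mm steps of depth
        val g = depthMm and 0xFF               // small 1mm steps of depth
        val b = confidence * 32                // stretch 0..7 so it's visible
        return (0xFF shl 24) or (r shl 16) or (g shl 8) or b
    }

Decoding on the desktop is just the inverse: depth = red * 256 + green.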

So's my face. (Please don't use it to hack into my phone using face-unlock)

Step 2: Make a Mesh

  1. Discard any pixel with bad confidence.
  2. Discard any pixel that is far away.
  3. Discard triangles that are a funny shape.
  4. Dump the surviving triangles into STL's ASCII format (rough sketch below).
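
The whole pass is short. This is a cleaned-up sketch rather than exactly what I ran - the cutoffs (maxDepthMm, maxEdge) are just plausible numbers, and it assumes the depth map is already decoded into flat arrays:

    import java.io.File

    // One vertex per depth pixel: x, y in pixel units, z in millimeters.
    // (Real back-projection needs camera intrinsics; see the FOV grumbling below.)
    data class V(val x: Float, val y: Float, val z: Float)

    fun edgeTooLong(a: V, b: V, max: Float): Boolean {
        val dx = a.x - b.x; val dy = a.y - b.y; val dz = a.z - b.z
        return dx * dx + dy * dy + dz * dz > max * max
    }

    fun writeStl(depthMm: IntArray, confidence: IntArray,
                 width: Int, height: Int, out: File,
                 maxDepthMm: Int = 600,  // "far away": past arm's length is background
                 maxEdge: Float = 10f) { // "funny shape": kill long skinny triangles
        fun vertexAt(x: Int, y: Int): V? {
            val d = depthMm[y * width + x]
            if (confidence[y * width + x] == 1) return null // code 1 = 0% confidence
            if (d == 0 || d > maxDepthMm) return null       // missing or far away
            return V(x.toFloat(), y.toFloat(), d.toFloat())
        }
        out.printWriter().use { w ->
            w.println("solid face")
            for (y in 0 until height - 1) for (x in 0 until width - 1) {
                // Two triangles per 2x2 cell of pixels.
                val v00 = vertexAt(x, y);     val v10 = vertexAt(x + 1, y)
                val v01 = vertexAt(x, y + 1); val v11 = vertexAt(x + 1, y + 1)
                for (tri in listOf(listOf(v00, v10, v11), listOf(v00, v11, v01))) {
                    val (a, b, c) = tri
                    if (a == null || b == null || c == null) continue
                    if (edgeTooLong(a, b, maxEdge) || edgeTooLong(b, c, maxEdge) ||
                        edgeTooLong(c, a, maxEdge)) continue
                    w.println("  facet normal 0 0 0") // slicers recompute normals
                    w.println("    outer loop")
                    for (v in listOf(a, b, c)) w.println("      vertex ${v.x} ${v.y} ${v.z}")
                    w.println("    endloop")
                    w.println("  endfacet")
                }
            }
            w.println("endsolid face")
        }
    }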

I'm all torn up over this 

Great! This is huge progress! But... still messy, and I'm a human topo map (those step sizes look bigger than 1mm). Not to worry: I'll just average together a few shots, maybe get a different angle to fill in the holes, and everything will be fine.

Wait.

How do you do that in 3D?

Combining Scans

In 2D panorama stitching, Hugin would do some SIFT image registration to line up interesting features, do a pairwise comparison to register a few photos, figure out the shifts and rotations, and then apply a median filter to get rid of noise (or image stacking to produce some nice HDR landscapes).
In 3D it should be easier, because you don't need to worry about finding and correlating identifying features - the 3D shape is right there! I tried not to change expression too much, so my cheeks are a consistent static 3D shape, and the depth maps are taken from a few different camera orientations. Granted, I don't know the camera's FOV characteristics, but c'mon... the 3D shape is right there!
Hmm. OK...
  • The camera pose is a six-number question: position (x, y, z) plus a look-at vector (x1, y1, z1).
  • No zooming, so the FOV doesn't change between shots... or maybe I could use the lens distortion values. Or maybe ignore them.
  • Not all of the mesh overlaps. Sometimes you see the side of my face or nose, sometimes you don't. But that should be solved with local optimization, right?
This feels like it should still be easy: given a reasonable base shape of a generic head-shaped oval thing, orient the current view to the shape. Wiggle until the camera angle fits best. Now use that to refine the best guess of the base shape. Add another snapshot, repeat, iterate, throw lots of CPU time at it, wait for your laptop to heat up, and get a reasonable "carved" mesh out. (A toy version of the wiggle step is sketched below.)
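
To make "wiggle until it fits" concrete, here's a toy sketch - nothing like a real registration algorithm, just coordinate descent over the six pose numbers, scoring each candidate pose by mean nearest-neighbor distance (brute-force neighbor search, so only sane for a few thousand points):

    import kotlin.math.cos
    import kotlin.math.sin
    import kotlin.math.sqrt

    data class P(val x: Double, val y: Double, val z: Double)

    // Apply the six pose numbers: rotate about X, then Y, then Z, then translate.
    // pose = [tx, ty, tz, rx, ry, rz]; translations in mm, rotations in radians.
    fun transform(p: P, pose: DoubleArray): P {
        val cx = cos(pose[3]); val sx = sin(pose[3])
        val y1 = p.y * cx - p.z * sx; val z1 = p.y * sx + p.z * cx
        val cy = cos(pose[4]); val sy = sin(pose[4])
        val x2 = p.x * cy + z1 * sy; val z2 = -p.x * sy + z1 * cy
        val cz = cos(pose[5]); val sz = sin(pose[5])
        val x3 = x2 * cz - y1 * sz; val y3 = x2 * sz + y1 * cz
        return P(x3 + pose[0], y3 + pose[1], z2 + pose[2])
    }

    // Score a pose: mean distance from each transformed scan point to its
    // nearest neighbor in the base shape. Brute force, so keep the clouds small.
    fun score(scan: List<P>, base: List<P>, pose: DoubleArray): Double =
        scan.sumOf { s ->
            val t = transform(s, pose)
            base.minOf { b ->
                val dx = t.x - b.x; val dy = t.y - b.y; val dz = t.z - b.z
                sqrt(dx * dx + dy * dy + dz * dz)
            }
        } / scan.size

    // "Wiggle until it fits": coordinate descent over the six pose numbers,
    // shrinking the wiggle each round.
    fun wiggle(scan: List<P>, base: List<P>): DoubleArray {
        val pose = DoubleArray(6)
        val step = doubleArrayOf(20.0, 20.0, 20.0, 0.2, 0.2, 0.2)
        var best = score(scan, base, pose)
        repeat(40) {
            for (i in 0 until 6) for (sign in intArrayOf(-1, 1)) {
                val trial = pose.copyOf()
                trial[i] += sign * step[i]
                val s = score(scan, base, trial)
                if (s < best) { best = s; trial.copyInto(pose) }
            }
            for (i in 0 until 6) step[i] *= 0.8
        }
        return pose
    }

The grown-up version of this is ICP (iterative closest point) with a k-d tree for the neighbor lookups, but this is the shape of the idea.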

I don't think it will be that easy.  SLAM is hard, and this is like an inside-out SLAM.

[1] I think that each unit is 1mm of depth (the DEPTH16 docs say the low 13 bits are range in millimeters), but even if it isn't, this should be something you could optimize for... right?
