Looks like the 3D model is just for navigating the street-view-like set of panoramic photospheres. For now that's a necessary compromise: you get much more realistic imagery long before perfect structure, material and lighting capture/rendering is possible (the benefit is clear in their scan of the old YC building). But if this is to be useful for VR (and it definitely would be), they'll need something in between those panoramic photospheres.
I couldn't help but notice that it doesn't let you zoom in on the "dollhouse" view, i.e. the one actually showing the 3D model. No way to see how these models look up close.
This'll work great for navigating buildings—I can imagine it being a hit with real-estate sites—but the hiding of detail in the 3D view suggests the tech wouldn't work as well for other applications, like VR. (I'd love to be wrong here, though!)
The dollhouse view is actually a low-detail version of the 3D model with low-resolution textures; it's just not meant for close inspection :) Load time and performance have been high priorities for us, particularly because we also support mobile devices. We might do something like streaming in a higher-quality mesh/textures in the future and allow closer inspection.
As for VR... we're playing around with it a whole lot, and it actually looks really great! Hopefully it'll get in front of more people somehow soon.
As a second ask, why does the dollhouse view have all the noise and random parts of the ceiling visible? It makes it difficult to 'see through' to the floor while navigating, and I suspect just cutting the ceiling off completely would make for a better visualization.
It's actually harder than it seems to cut off the ceiling: houses can have all sorts of weird heights, angles and corners, doors, arches... Any simple solution might work for 80-90% of models, but our viewer has to work for 100%. So currently we only remove faces via backface culling; we're going to do better soon though :)
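For anyone curious what backface culling means here: with single-sided materials the GPU skips triangles whose front side faces away from the camera, so ceiling polygons that are wound to face down into the rooms simply vanish when the dollhouse is viewed from above. A minimal three.js-style sketch of that setup (not Matterport's actual code; the function name and material choice are just illustrative):

    // Single-sided material: the GPU culls triangles whose front side faces
    // away from the camera. Interior scans usually wind faces to point into
    // the rooms, so the ceiling culls itself when viewed from above.
    import * as THREE from "three";

    function makeDollhouseMaterial(texture: THREE.Texture): THREE.MeshBasicMaterial {
      return new THREE.MeshBasicMaterial({
        map: texture,
        side: THREE.FrontSide, // the default; THREE.DoubleSide would bring the ceiling back
      });
    }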
I think it's too early to know what will and won't ultimately work for VR. VR is going to be a very imperfect representation of physical reality for a while, but I think representations that are unrealistic in all sorts of ways still have a lot of potential for tricking us enough to be fun and useful. There are going to be tradeoffs, and we won't know which ones they are until they're tried.
This is a cool thesis. I would definitely like to see how this could work in VR.
Yeah - they build the model from photos, but they limit movement to the spots in the model where the photos were taken, and then show you the photo from that spot. It's the same thing Microsoft Photosynth did a few years ago.
Saying "Our model quality has changed a lot in 2 years as well." and then presenting the 2014 photo is pretty sketchy.
An accurate comparison would be viewing the model from the same position as the photo, using the model, not the photo. Or allowing arbitrary angles - I bet if you look under the machines on the front desk on the actual model, they've melted into the desk a little bit.
Or - I do understand the quality is not there yet - say 'Our viewing experience has changed a lot in 2 years as well.' and show the old viewer next to the new 2D/3D combined viewer.
You could bring the entire .obj into the Unreal Engine first-person template right now. Having whole rooms as models (rather than BSP brushes) is actually pretty common now. You might have to fix the lightmaps and maybe make a low-poly version first.
I got curious how much space it would take to have enough spherical panoramas of the YC office to walk freely around it like in a first person shooter game. No 3D model involved, just a panorama for every possible position the user might want to walk to.
I think 60 frames per second would be smooth enough. Walking speed of 5 km/hour is roughly 1.4 meters/second, so the per-frame distance is about 0.02 meters.
Looking at the map, the YC office seems to be about 30 x 40 meters. Imagining a grid with lines every 0.02 meters overlaid on it, you would need about 3,000,000 panoramas. That might sound like way too many, but wait! We live in the future.
One spherical panorama of 10000x5000 pixels seems to be about 4MB jpeg-compressed. So you would need only 12 terabytes of space. Also since you need 60 of these each second, you need storage that can move 240MB/sec, which is lower than the current speeds of SSDs.
A 1TB SSD seems to cost about $400, so for only $4800 you would have enough fast storage for the panoramas. Enough to explore the whole YC building with no snapping at all between frames, with complete realism. Actually you could even do stereo 3D, since you'd already have the data.
Now it's an entirely different question how we could take those 3,000,000 panoramas. Even if you had a Double Robotics bot with a spherical camera attached, going around the space and snapping 10 panoramas per second, it would take about 4 days to complete. While that itself is tolerable, how it would know its position and control its movement to the required accuracy, I have no idea.
Still it blows my mind that storing all that would be possible.
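For what it's worth, here is the same back-of-envelope arithmetic as a tiny script; every input is just the rough assumption from this comment (30 x 40 m floor, 0.02 m grid, 4 MB per panorama, 60 fps, $400 per terabyte), not a measured value:

    // Back-of-envelope check of the numbers above; all inputs are the rough
    // assumptions from this comment, not measurements.
    const floorWidthM = 30;
    const floorDepthM = 40;
    const gridStepM = 0.02;        // roughly one frame of walking at 60 fps
    const panoramaMB = 4;          // 10000x5000 equirectangular JPEG, rough guess
    const framesPerSecond = 60;
    const dollarsPerTB = 400;

    const panoramas = (floorWidthM / gridStepM) * (floorDepthM / gridStepM); // 3,000,000
    const storageTB = (panoramas * panoramaMB) / 1e6;                        // ~12 TB
    const bandwidthMBps = framesPerSecond * panoramaMB;                      // 240 MB/s
    const costUSD = storageTB * dollarsPerTB;                                // ~$4800

    console.log({ panoramas, storageTB, bandwidthMBps, costUSD });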
Quoting: "Whereas many previous systems have used still pho- tography and 3D scene modeling, we avoid explicit 3D reconstruction because it tends to be brittle."
A simple optimization could probably make things much better. If you think about it, the delta between panoramas corresponding to two adjacent points is very small. So compressing image data across panoramas, instead of each one separately, would, I think, make it _so_ much better.
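A toy sketch of the idea, assuming decoded byte buffers of equal size (a real system would use a video-style codec with motion compensation and entropy coding rather than raw per-pixel deltas):

    // Toy inter-panorama delta coding: keep one panorama as a key frame and
    // store each neighbor as per-pixel differences, which stay near zero for
    // nearby viewpoints and therefore compress well.
    function encodeDelta(reference: Uint8Array, current: Uint8Array): Int16Array {
      const delta = new Int16Array(current.length);
      for (let i = 0; i < current.length; i++) {
        delta[i] = current[i] - reference[i];
      }
      return delta;
    }

    function decodeDelta(reference: Uint8Array, delta: Int16Array): Uint8Array {
      const out = new Uint8Array(reference.length);
      for (let i = 0; i < reference.length; i++) {
        out[i] = reference[i] + delta[i]; // reconstructs the original byte exactly
      }
      return out;
    }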
Such a grid would mean you could move in a straight line in only 8 directions; all other movement angles would have to be rounded to those 8. I think this could result in quite 'jumpy' movement, but maybe a human wouldn't be able to notice such jumps?
I'm not sure copying the controls from Google Maps is a good thing; I've always hated them and find them incredibly frustrating to use. It could simply be that I'm so used to FPS controls that these feel uncanny-valley, but I've had years to get used to them and they still feel awkward.
I appreciate it could just be a personal thing (I've never asked anyone else what they think), but I seem to regularly look up by accident because click-to-move often competes with viewport manipulation. You then can't get the view 'level' again, which is mildly annoying to my inner OCD, and the controls also feel backwards: you have to pull right to go left, pull down to look up, etc. Also, the up/down sensitivity seems exaggerated compared to left/right, though that might just be because it's windowed; I don't actually know.
I can't describe very well what's wrong, it just generally 'feels' wrong.
Google aren't exactly known for their UIs, and as far as I can remember that control system was their first go at it and they've never changed it.
If you're curious what the camera+tripod setup looks like when it's taking photos, I found this "selfie" in the 3D model for the Four Seasons Silicon Valley, Presidential Suite: http://i.imgur.com/clHfeC5.png
I told a landscape designer it should be possible to walk around a property with a camera and then automatically build a 3D model. Turns out others are working on it. Good job. FYI, what he does is take measurements, make a 2D top-down drawing, and then start sketching concepts for hardscape ideas around the building.
Ultimately he'd want a drone to fly around a house and automatically create a 3D model and 2D plan. That would probably be exceptionally useful. He's pretty efficient at doing measurements and drawings, so fully automatic is almost the only way for this to be useful to him. Of course that's just one guy, but I thought the example might be helpful.
I tried a Rift last week for the first time and the first thing I thought of was getting a drone to do a scan of a room. People could use it to set up their "home space" like in Snow Crash/Gibson's stuff.
I've been interested in the two approaches to mapping out cities and larger places. Google, for example, has StreetView cars, building a point cloud and texture map of parts of the world.
In video games like GTA, LA Noire and Watch Dogs, real cities are mapped out using a sort of "conceptual compression" that keeps landmarks but somehow brings the space closer together. My sense is that this is a labor-intensive process, but what if it could be automated?
It would be an interesting way to explore a place, though with the obvious pitfall that what's not included in the map "doesn't exist".
The process would still be somewhat labor-intensive, since someone would have to apply significance to the landmarks.
GTA, specifically, doesn't use all the landmarks they find - they instead use rough facsimiles that give the same impression. It's really rather brilliant - a different design, but somehow it's familiar if you've been there or seen that.
Ultimately, for video games, they are still designed by hand, because of the "if it's not there, it doesn't exist" problem and for a host of other reasons. Unfortunately, for space reasons, most of it isn't interactive in any meaningful way.
There's some interesting work being done by ESRI that could hopefully lead to virtual city designs that are almost fully interactive. Imagine GTA where every building could be entered and every object could be interacted with because they are generated on the fly.
Sort of a realistic Minecraft. Very sort of, but still.
I would argue the "significant landmark" problem has essentially been solved as a side effect of online photo sharing sites (originally Flickr, now everything) -- the most frequently photographed things are the landmarks, and the number of photos scales with the significance of the landmark. When you search for "Rome" on Flickr, the clusters of photos that pop out as being 3D-reconstructable are precisely the landmarks.
Quoting:
"The data set consists of 150,000 images from Flickr.com associated with the tags "Rome" or "Roma". Matching and reconstruction took a total of 21 hours on a cluster with 496 compute cores. Upon matching, the images organized themselves into a number of groups corresponding to the major landmarks in the city of Rome. Amongst these clusters can be found the Colosseum, St. Peter's Basilica, Trevi Fountain and the Pantheon."
Hi, I downloaded the 3d model you have on your site. It looks really good. Here's a link to a standalone I made using Unity that allows you to move around the model (OSX only).
You can call it photogrammetry; it uses both 2D imagery and 3D depth data from depth sensors (think Microsoft Kinect or Google Tango) to make the 3D model. The viewer limits your movement when you're inside so it can project panorama images onto the model; you get the image quality of 2D photos while still being in a 3D model. 3D scanning tech just isn't good enough yet that you'd be happy with how it looks up close if you're trying to sell a house, for instance.
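Not how their viewer actually does it, but the simplest stand-in for "show the panorama from its capture point" is texturing an inward-facing sphere with the equirectangular image; roughly, in three.js (the function name and parameters are just for illustration):

    // Minimal three.js sketch: display an equirectangular panorama from the
    // position where it was captured by texturing the inside of a sphere.
    // (Matterport projects the panoramas onto the reconstructed mesh instead;
    // this is only the simplest stand-in for that idea.)
    import * as THREE from "three";

    function addPanorama(scene: THREE.Scene, url: string, capturePos: THREE.Vector3): void {
      const texture = new THREE.TextureLoader().load(url);
      const geometry = new THREE.SphereGeometry(10, 64, 32);
      const material = new THREE.MeshBasicMaterial({
        map: texture,
        side: THREE.BackSide, // render the inside of the sphere
      });
      const sphere = new THREE.Mesh(geometry, material);
      sphere.position.copy(capturePos);
      scene.add(sphere);
    }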
I don't know PhotoScan or 123D Catch well enough to comment specifically, but in general many other 3D companies focus on scanning small objects or features, while we do large buildings/indoor spaces/rooms better (faster/better quality/cheaper/more convenient) than anyone else I've seen :)
That's amazing, even considering the $4500 for the camera (which will probably pay for itself in a couple of months if you're a small architecture/design studio).
I've lost touch with architectural 3D "capture" - who else is working in this space? Are there any consumer-level offerings?
Looks very cool, but can't get the demo to work at all.
On iPad it said "upgrade to iOS 8"; on W7/FF33 it said "Oops, something went wrong"; on W8.1/FF33 it went all the way through loading and then got stuck with just one pixel of the progress bar left to go. :-|
Hey, dev here, thanks for not giving up on the first try :) WebGL is still pretty new, so it has some quirks.. but you seemed to be particularly unlucky. Does WebGL usually work on your W7/FF33 setup? (does http://get.webgl.org/ work?) As for the W8.1/FF33 case, does it work if you try again, or is it still the same?
On W8.1 it cycles between a "500 Internal Server Error" (with a blank page, and not just for the above link but for the homepage as well) and the stuck case. When it gets stuck, here's the console output from Firebug -
The Typekit errors are due to referrers being blocked; this is very common and never disastrous. Disqus and Google Analytics are just blocked at the domain level.
Really cool, but tbh I was a little frustrated with the navigation interface. Is it supposed to have only 2 degrees of freedom? I would love to turn my head without clicking the mouse.