Seems super fast, some are saying 600x faster [0] than the version made from Google's paper, but it is a little less accurate. Point clouds are less useful, but some people on Reddit and the authors have tools to try to convert them to meshes [1][2]. It does feel like Stable Diffusion-level generation of good 3D assets is right around the corner. It will be interesting to see which tech wins out: some variant of depth estimation like SD2 and non-AI tools can do, object spinning/multi-angle views like Google's tool does, or whatever this tool does.
> The main problem with mesh generation from stuff like this is that usually the topology is a mess and needs a lot of cleanup to be usable. It's not quite so bad for static non-deforming objects, but anything that needs to deform when animated, or that is organic looking, would likely need retopologizing by hand.
>
> That's one of the worst parts of 3D modeling so it's like you're getting the AI to do the fun part and leaving you to do all the boring cleanup process.
From [1]. There seems to be a pattern in AI generators of asking a model to learn only from final results and then immediately expecting it to produce the structure underneath them. I suppose the lack of specialization in the application domains of NNs is a deliberate design choice for these high-profile projects, in a vague hope of simulating emergent behaviors as seen in nature and of avoiding becoming another expert system (while being one!), but that attitude seems to limit usefulness, here and again.
People developing these models are very aware of what 3D workflow is like.
The issue is that image->point cloud training data is very easy to get, whereas image or point cloud -> clean 3d mesh training data is very hard to get in unconstrained domains.
Generating point clouds is where the state of the art is now. That doesn't mean that the whole field isn't entirely aware that text->3d mesh unlocks many more capabilities.
Seems like video game engines and the like would be a useful way to get lots of 3D models with corresponding point cloud data. What's the blocker to doing that? The models shown on that page look like 3D graphics circa the 2000s or earlier.
I agree that randomly sampling the surfaces of 3D meshes seems like a reasonable way to generate synthetic data for mesh → point cloud.
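To make the "randomly sample the mesh surface" idea concrete, here's a minimal sketch (not from the thread, just an illustration): pick triangles with probability proportional to their area, then draw uniform barycentric coordinates within each chosen triangle. The mesh representation (plain numpy `(V, 3)` vertex and `(F, 3)` face arrays) is an assumption for the example.

```python
import numpy as np

def sample_point_cloud(vertices, faces, n_points, seed=0):
    """Uniformly sample n_points from the surface of a triangle mesh.

    vertices: (V, 3) float array; faces: (F, 3) int index array.
    Triangles are chosen with probability proportional to their area,
    then points are drawn with uniform barycentric coordinates.
    """
    rng = np.random.default_rng(seed)
    tris = vertices[faces]  # (F, 3, 3): the three corners of each face
    # Triangle areas via the cross product of two edge vectors.
    cross = np.cross(tris[:, 1] - tris[:, 0], tris[:, 2] - tris[:, 0])
    areas = 0.5 * np.linalg.norm(cross, axis=1)
    idx = rng.choice(len(faces), size=n_points, p=areas / areas.sum())
    a, b, c = tris[idx, 0], tris[idx, 1], tris[idx, 2]
    # Uniform sampling inside a triangle: draw (u, v) in the unit square
    # and reflect any point that falls outside the lower-left triangle.
    u, v = rng.random(n_points), rng.random(n_points)
    flip = u + v > 1
    u[flip], v[flip] = 1 - u[flip], 1 - v[flip]
    return a + u[:, None] * (b - a) + v[:, None] * (c - a)

# Example: sample 1024 points from the surface of a unit tetrahedron.
verts = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], float)
faces = np.array([[0, 1, 2], [0, 1, 3], [0, 2, 3], [1, 2, 3]])
cloud = sample_point_cloud(verts, faces, 1024)
print(cloud.shape)  # (1024, 3)
```

The area weighting matters: without it, regions tessellated with many small triangles would be oversampled relative to large flat faces.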
Without knowing a dang thing about AI, it feels like the problem lies more in:
1. Math related to topology: vertices, faces, edges, tri vs quad etc
2. Different topologies for the same object are better for different use cases. Rendering, skinning, morphing, physics, etc. all have different optimal topologies, and the definition of optimal varies with the workflow, the scene specifics, or even the individual artist's topological preferences. In other words, I'm not sure how much of the 3D workflow is standardized at all. Getting topological data for workflows is no easy task, and the model output isn't very usable until it can plug right into a workflow and the existing DCC ecosystem.
text2img generates a static asset; text2mesh is far more interesting beyond just the static rendering part, which is where mesh topology becomes a big sticking point.
* There isn't software that generates point clouds from video games. This should be solvable but AFAIK hasn't been done yet.
* The diversity of models in video games is much lower than in the real world.
* Games use a bunch of techniques to reduce the poly count while making assets look like they are high poly (e.g. texture mapping). It's unclear what should be generated here.
Or ask CG designers, with their consent and with credit, for recordings of their intermediate steps. Same for illustrations. It almost seems like circumventing experts is the point.
Don't human designers do image or point cloud → clean 3D mesh in an iterative manner? I can see that iteratively deforming a cube into a tree with an NN would be significantly more computationally expensive, but I don't see why it isn't a solution.
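The "iteratively deform one shape toward another" idea can be sketched in a few lines. This is a toy illustration, not anything from the Point-E paper: each step nudges every source point toward its nearest neighbor in the target cloud (a crude one-sided chamfer descent). A real pipeline would add a learned prior and regularizers that preserve mesh topology.

```python
import numpy as np

def one_sided_chamfer(src, dst):
    """Mean distance from each src point to its nearest dst point."""
    d = np.linalg.norm(src[:, None] - dst[None, :], axis=2)
    return d.min(axis=1).mean()

def deform_toward(points, target, steps=50, lr=0.5):
    """Toy iterative deformation: repeatedly move each source point a
    fraction lr toward its nearest neighbor in the target cloud.
    Brute-force O(N*M) nearest-neighbor search per step."""
    pts = points.copy()
    for _ in range(steps):
        d = np.linalg.norm(pts[:, None] - target[None, :], axis=2)
        nearest = target[d.argmin(axis=1)]
        pts += lr * (nearest - pts)
    return pts

# Stand-ins for "cube" and "tree": two random clouds, the target offset away.
rng = np.random.default_rng(0)
cube = rng.random((64, 3))
tree = rng.random((128, 3)) + 2.0
before = one_sided_chamfer(cube, tree)
after = one_sided_chamfer(deform_toward(cube, tree), tree)
print(after < before)  # prints True: the cloud converges onto the target
```

The expense the comment mentions shows up immediately: even this toy version does a full pairwise distance computation per step, and NN-based versions add a forward/backward pass on top of that for every iteration.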
The thing is, it's been shown time and time again (with ChatGPT, for example) that you can get really good results just by giving massive amounts of final results to the model. That approach is better by far than anything we've ever had in either text or image generation AI.
It’s a fun demo. Worth noting that on mobile it didn’t include any button to download the generated point cloud data itself, at least not that I could find. Might be the same on desktop too.
Additionally, I think the time taken depends on the number of visitors. I had to wait about 7 minutes for it to finish.
Too many users. I don't know Hugging Face's rules, but they seem to cap how much compute each demo can use. When I ran it originally there were about 12 people using it; the queue now looks to be around 300, and Hugging Face doesn't spin up more instances. That said, the model is relatively small and can be run locally with at least 5 GB of VRAM, according to the Stable Diffusion subreddit.
[0] https://twitter.com/DrJimFan/status/1605175485897625602?t=H_...
[1] https://www.reddit.com/r/StableDiffusion/comments/zqq1ha/ope...
[2] https://github.com/openai/point-e/blob/main/point_e/examples...