GALA3D: Towards Text-to-3D Complex Scene Generation (gala3d.github.io)
78 points by jfoster 8 months ago | 28 comments



The media is claiming we’re in an AI Cold War with China, yet here’s a Sino-American paper casually talking about automating the simple task of the entirety of human spatial understanding. Maybe we can all get along, after all?

EDIT: is this paper… broken… for anyone else? I can’t read past the first two pages on phone or Mac, haven’t tried Linux. https://arxiv.org/pdf/2402.07207.pdf


Individuals have an easier time cooperating than their respective governments.


Yeah, I guess I was alluding to the idea that the disagreements between the governments are not based on the actual concerns of the people. Though it’s hard to tell at what level the vague nationalistic ideas of “we deserve to rule X land” and “they’re brainwashed” can be described as actual concerns, much less the more openly racist and xenophobic ones.


The US media is claiming.

The CIA and Fake News in 1980s https://www.youtube.com/watch?v=XibCflWxZuA


This is more accurate than I initially gave you credit for! I couldn’t find a single Chinese source of note from their government or media. E.g. https://en.wikipedia.org/wiki/Artificial_Intelligence_Cold_W...


Not hard to find Chinese sources if you know Chinese. http://old2022.bulletin.cas.cn/publish_article/2024/1/202401... This one even has an English abstract.


Am I the only one reluctant to get into lengthy discussions with a computer about making 3D models (or any other creative activity, in fact) instead of just doing it? Especially when I have specific ideas about certain details that are not easy to put into words that both I and the receiving end (in this case, the computer) understand in exactly the same (potentially cultural) way. These pandas and things are fun early demos for play, but how far along is this, and more importantly, how reliable can it be? How useful can this eventually be for serious matters where it really counts? Perhaps there is a reason that practically no real engineering discipline (the ones working with 3D models professionally) relies on anecdotes, conversations, or storytelling, on going to the place of manufacture or construction and describing in pretty words how to form the product. They rely instead on technical drawings following specific rules that limit what is told and how, which the people involved spend years of training learning to make and read. If casual conversation were enough, perhaps fields other than art and entertainment would not have their restricted languages either (on top of specially formatted documents).

Make me a house... no, not that tall... a bit taller now... good... and a family home only... also it is ugly, make it look like in England... no, not modern England but old....not that old, from after Victorian era... ah, not the ones made for coal miners, those are ugly, make it detached and something a middle class family in Surrey would like... no, that's too big ... put it on the shelf darling... hey, I did not talk to you, do not put the house on a shelf! ... good, but I do not have that much land, make it narrow... and have American kitchen in it... no, not with American stove but an induction one... and do not start the stairs right at the entrance that is unpractical... have the bathroom in the ground floor... I don't care if it is not a proper English home anymore just do what I tell.... but not exactly how I tell but do it the way I want!....


I think the best we can hope for, at least for the foreseeable future, is a workflow where you ask the AI to do a thing, it gets it 80% right (someone who's good at writing prompts might even get a bit more correct), and then you manually fix the things that it didn't get exactly like you wanted them.

I think we're quite far from replacing skilled professionals entirely, but making them a lot more productive is within reach!


This obviously depends on context, but taking someone else's 3d model from partial to finished is often slower than starting over from scratch, because of topological considerations. For a random piece of environment, it's probably a good workflow, but for hero objects, where topology is downstream from the art direction, I'm not sure you get much time savings from a partially-complete model.

Of course, there may be workarounds to topological speedbumps with respect to deformation/animation, texture, detail, etc, that would make the 80% right 3d model very useful.


Sounds promising. In a way.

But perhaps I was hoping for the other 20% to be done for me: tidying up the remaining 20% of the whole that usually takes 80% of the time and is tedious, boring fiddling with loose pieces, progressing slowly after the bulk is done. I'd like automation for the soul-crushing, tedious, and repetitive tasks, so I can concentrate on the essence.

(An evolving actual trouble of mine nowadays: when I do something, the predictive algorithms suggest too much, which diverts attention in 80% of the cases and is only useful in 20% of the situations. Since creating something new does not fit into the usual logic, and the tool cannot read my mind, it mostly has no chance of figuring out what I want, yet it is constantly in my face. So this plentiful "help" is starting to get in the way of the goal. I believe this is just a dark UX trend: product makers force their expensive, competition-fuelled, spectacularly presented predictive functionality on us and put it in our face, instead of letting it linger in the background to jump in when called for. A recent annoyance is the default settings of a predictive editing tool that inserts suggestions directly instead of merely offering them, in an attempt to be the fastest functionality ever. I haven't even thought about it and it is already there. Wow, amazing squared, it looks pretty in marketing material, but it is in the way, breaking my flow: I need to go back and fix what the tool did, revert it, and fiddle with settings a lot to turn off, and off, and off all the very advanced and spectacular help that does not really help but does just the opposite, making me the underling finishing details for the master.)


> These panda and things are fun early demos for play but how far is this and more importantly, how far can it be reliable? How useful this can be eventually for serious matter where it really counts to be used?

This reminds me of the thoughts the industry had on transformers, attention, etc. It seems OpenAI had to train GPT-2 to show others (including Google itself!) that this was perhaps something worthy of more research. And even then many doubted it had much further potential to improve and be more useful.

Maybe it (these generative 3D scenes) will lead to something and maybe it won't but it is a very interesting thing to research.


I think for regular folks who just want to quickly generate something, it may look like that. But people who need precise control will get it. Look at the ControlNet extensions for Stable Diffusion and imagine something like that, but for 3D models. Perhaps you will provide a sketch of the 3D scene with rough object placement and the AI will fill in the rest. Or draw a fantasy map with points of interest and the AI will generate the landscape, terrain, and foliage to fit your description.


If natural language can't exactly describe what you want, just generate more variations and let Monte Carlo sampling cover the gap.
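A minimal sketch of that idea, with a toy one-number "scene" standing in for a real 3D model and an invented scoring function standing in for the user's hard-to-verbalize preference (both are assumptions for illustration, not anything from the paper):

```python
import random

def generate_variation(rng):
    """Stand-in for a generative model: each sample is a random candidate.
    Here a candidate is just a number; a real system would return a scene."""
    return rng.uniform(0.0, 10.0)

def score(candidate, target=7.0):
    """Stand-in for the user's preference: higher is better,
    peaking at a target the user could never quite put into words."""
    return -abs(candidate - target)

def monte_carlo_pick(n_samples=1000, seed=42):
    """Sample many variations and keep the one scored best."""
    rng = random.Random(seed)
    candidates = [generate_variation(rng) for _ in range(n_samples)]
    return max(candidates, key=score)

best = monte_carlo_pick()
print(round(best, 2))  # with enough samples, this lands near the target of 7.0
```

The catch, as the sibling comments note, is that in real life "score" is a human looking at renders, which is exactly the lengthy back-and-forth being complained about.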


Should I perhaps feed the results back into some generative method and a genetic algorithm supported by decision support, and cross-analyze the optimal choices too, instead of simply making what I need? Too simple, right? Wallowing in the plethora of tools to fascinate the world with my full technological skill and sophistication at once is much better, right?! : )


Humans have been known for successfully using tools to complete their tasks in a more comfortable and efficient manner for quite some time now, even at the cost of introducing additional complexity. So if the results seem promising and the increased cognitive load isn't too much to bear, then the answer is perhaps yes? If not, then fortunately no one is forcing you to use AI tooling! Just make it :)


I wish there was more progress in text-to-3D mesh for creating basic but very specific and functional shapes. With the last few years of progress, it really feels like it should be possible, but none of the big players are finding it worthwhile to look at. It would give the 3D printing community a massive boost.


Depends what "very specific and functional shapes" you have in mind, I guess. For my use of a 3D printer, I believe something like OpenSCAD is going to be a significantly more efficient textual description of the object than ~english -- for me and the computer.
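To illustrate why a formal description can beat ~english here, a sketch that generates OpenSCAD source for a hypothetical corner-drilled mounting plate (the part, its dimensions, and the helper name are all made up for the example):

```python
def mounting_plate_scad(width=40, depth=30, thickness=3, hole_d=3.2, margin=5):
    """Emit OpenSCAD source for a rectangular plate with a screw hole in
    each corner: five numbers pin down exactly what a paragraph of plain
    English would leave ambiguous."""
    holes = []
    for x in (margin, width - margin):
        for y in (margin, depth - margin):
            # Cylinders overshoot the plate on both faces so the
            # difference() cut goes cleanly through.
            holes.append(
                f"translate([{x}, {y}, -1]) "
                f"cylinder(h={thickness + 2}, d={hole_d}, $fn=32);"
            )
    body = f"cube([{width}, {depth}, {thickness}]);"
    return "difference() {\n  " + body + "\n  " + "\n  ".join(holes) + "\n}"

print(mounting_plate_scad())
```

Changing the hole diameter to fit a different screw is a one-argument edit, versus another round of "no, slightly bigger holes" with a chatbot.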


OpenSCAD will definitely be more efficient but not as many people speak it fluently. When we learn or talk about objects, we do it in English, even if we have to resort to very technical jargon. I think this is a case of opening it up to more people rather than making it more efficient for people who already do this.


Hi, can you explain this problem a bit more. I’m a new PhD student and love low-hanging fruit.


I haven't thought it through completely, but it could start out as basically DALL-E for 3D meshes. Then what would be really useful is if it could faithfully represent some specs you give it: dimensions, shape, etc. Imagine all the specific instructions someone who knows a lot about gears (but nothing about how to use CAD) could give as a prompt. All those specs should be followed faithfully. It should be able to create any arbitrary gear you describe to it. Gears are just one simple example, but you get the idea.
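To make the gear example concrete, here is what a prompt like "a 24-tooth spur gear, module 2" actually pins down, using the standard metric spur-gear proportions (standard formulas, not anything from the comment; a text-to-CAD system would have to honor every one of these exactly):

```python
def spur_gear_dims(module_mm, teeth):
    """Key dimensions of a metric spur gear, from the usual proportions:
    addendum = 1 module, dedendum = 1.25 modules."""
    pitch_d = module_mm * teeth            # pitch diameter: where teeth mesh
    outside_d = module_mm * (teeth + 2)    # addendum adds one module per side
    root_d = module_mm * (teeth - 2.5)     # dedendum removes 1.25 per side
    return {"pitch_d": pitch_d, "outside_d": outside_d, "root_d": root_d}

print(spur_gear_dims(module_mm=2, teeth=24))
# {'pitch_d': 48, 'outside_d': 52, 'root_d': 43.0}
```

A model that gets these numbers "roughly right" produces a gear that simply doesn't mesh, which is why faithful spec-following is the hard part, not the mesh generation.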


Do we really want a boom in the field of half-baked plastic trinkets made on a whim without much consideration put into them? I think something made on a whim is a lot more likely to be discarded. If somebody wants something made out of plastic, it should at least be something they're sure they want. Having a human invest some time in designing it seems like a good thing.


I think this is putting the cart before the horse. In my experience, the reason many 3d prints are useless trinkets (NOT functional parts) is because it takes a lot of effort to design your own custom piece. Most 3d prints that are actually useful are custom to your use case.

The amount of effort it takes makes it so that if something close enough exists, I will buy it online. If it doesn't exist I will not bother designing it from scratch since this is not a massive part of my life that I'm willing to sink much more time into than it is worth.


When you want a very specific shape, text is probably not the right input modality. See also image generation, where to get very specific outputs you're better off defining the large-scale structure spatially with a ControlNet and only using text for the visual style and decorative details that don't need to be precisely controlled.

What shapes would you ask a text-to-3D model to create for you?


That makes me wonder whether an LLM for code generation could actually work for OpenSCAD-style designs as well, especially as it has quite a small set of functions. The only thing that makes me doubtful is that the coordinates have to map correctly to 3D space.


Have you seen https://text-to-cad.zoo.dev/ ?

But yes, there's a lot of potential and, in my uneducated opinion, low-hanging fruit for 3D printing especially.


Oh my, now that is exciting! A friend of mine has implemented procedural rigging and physics-based generative animations used in some pretty big-name games. Combine those two technologies with this, and you've lowered the bar to entry for video game production by a lot!


Was this at Ubisoft maybe? I remember their presentations, some really amazing tech a decade ago... Never seen anything as straightforward to use in off-the-shelf software since.


It wasn't. My friend certainly wasn't the first to do something similar; this was much more recent. I was just impressed by a demo he showed me of his work. He did procedural rigging for Darktide 40k, and physics-based animations for The Finals. Unfortunately I don't think his software is available outside the companies that hired him.

