The world doesn't need Text-to-CAD. The world needs a fully capable open source parametric 3D geometric CAD kernel.
Solidworks, Creo, AutoCAD, Fusion, etc., can all take their bug-ridden, unoptimized, single-threaded, rent-seeking monstrosities and stick 'em where the sun don't shine.
Seriously - if anyone wants to create an absolutely world-changing piece of software, start working on a new CAD kernel that takes the last 50 years of computer science advances into account, because none of the entrenched industry standards have done so. Don't worry about having to provide customer service, because none of the entrenched industry standards worry about that either.
And no - while OpenCascade and SolveSpace are impressive, they aren't fully capable, nor do they start from a modern foundation.
Truck[1] and Fornjot[2] are recent attempts in the Rust space; both are still WIP.
But both seem to be going the traditional way, i.e. B-Rep that can be converted to (trimmed) NURBS.
I think if one wanted to incorporate the last 50 years of computer science, particularly computer graphics, one would need to broaden the feature set considerably.
You need support for precision subdivision surface modeling with variable radius creases (either via reverse subdivision, where you make sure the limit surface passes through given constraints, or via an interpolating subdivision scheme that has the same perks as e.g. Catmull-Clark).
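To illustrate the interpolating option: a minimal sketch of the classic 4-point (Dubuc-Deslauriers) scheme on a closed polyline, which keeps every original vertex on the limit curve. Purely illustrative Python/NumPy; a kernel would need the surface analogue plus crease rules on top.

```python
import numpy as np

def four_point_subdivide(points: np.ndarray, levels: int = 1) -> np.ndarray:
    """4-point interpolating subdivision on a closed polyline.

    Old vertices are kept verbatim (that's what "interpolating" means), and
    each new edge midpoint is placed at (9*(p1 + p2) - (p0 + p3)) / 16,
    which converges to a smooth C^1 limit curve through the control points.
    """
    for _ in range(levels):
        n = len(points)
        new_pts = []
        for i in range(n):
            p0, p1 = points[(i - 1) % n], points[i]
            p2, p3 = points[(i + 1) % n], points[(i + 2) % n]
            new_pts.append(p1)                                    # keep old vertex
            new_pts.append((9.0 * (p1 + p2) - (p0 + p3)) / 16.0)  # new midpoint
        points = np.array(new_pts)
    return points

square = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
print(four_point_subdivide(square, levels=3).shape)  # (32, 2); corners preserved
```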
Then you need to have SDF modeling ofc (toy sketch below).
Possibly point based representations. If only as inputs.
And traditional B-Rep.
Finally, the kernel should be able to go back and forth losslessly between these representations wherever possible.
And everything must be node-based, like e.g. Houdini. Completely non-destructive.
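To make the SDF point above concrete: CSG on signed distance fields is just min/max over scalar functions. A toy Python sketch (a real kernel would evaluate fields like these massively in parallel on the GPU):

```python
import math

# Signed distance functions: negative inside, positive outside.
def sphere(cx, cy, cz, r):
    return lambda x, y, z: math.sqrt((x-cx)**2 + (y-cy)**2 + (z-cz)**2) - r

def box(hx, hy, hz):  # axis-aligned box at the origin, half-extents hx/hy/hz
    def d(x, y, z):
        qx, qy, qz = abs(x) - hx, abs(y) - hy, abs(z) - hz
        outside = math.sqrt(max(qx, 0)**2 + max(qy, 0)**2 + max(qz, 0)**2)
        inside = min(max(qx, qy, qz), 0.0)
        return outside + inside
    return d

# Booleans are one-liners on the distance fields.
def union(a, b):     return lambda x, y, z: min(a(x, y, z), b(x, y, z))
def subtract(a, b):  return lambda x, y, z: max(a(x, y, z), -b(x, y, z))

shape = subtract(box(1, 1, 1), sphere(1, 1, 1, 0.8))
print(shape(0, 0, 0) < 0)  # True: the origin survives the spherical cut
```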
The main problem I have with most CAD software is that I can't do:
1. create an object, e.g. a cube
2. perform lots of random transformations
3. perform the inverse of the above transformations
4. subtract the object from the original object
5. end up with exactly nothing
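For intuition, a tiny NumPy illustration (no CAD kernel involved) of why step 5 fails: the round-trip in steps 2-3 alone doesn't return bit-identical geometry in floating point, so the boolean in step 4 compares two subtly different solids.

```python
import numpy as np

def rot_z(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

p = np.array([1.0, 2.0, 3.0])
q = p.copy()
for _ in range(1000):          # lots of transformations...
    q = rot_z(0.1) @ q
for _ in range(1000):          # ...and their exact mathematical inverses
    q = rot_z(-0.1) @ q

print(np.max(np.abs(q - p)))   # tiny but nonzero residual, not 0.0
```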
How are modern approaches solving this (robustness), if at all?
As someone who does CAD work (product design) every day, I'm curious about the use case? Starting with surfaces (in Creo, at least) can provide such flexibility.
As far as I'm aware, we do not have the math needed to do what you're imagining. CSG requires a lot of numerical methods even for relatively simple operations.
While we are not fully capable yet, that's the end goal by next year. Years from now we cannot still be using these single-threaded nightmares from 30 years ago.
Very interesting. Do you have information/documentation on the engine itself? I'm not seeing anything technical that explains what it is/isn't capable of, or how it relates to existing CAD kernels.
Super cool work either way! I'll be wishing you luck.
It seems a bizarre statement that OpenCASCADE isn't fully capable. It's the only open-source-licensed kernel that'll read a STEP file. Also, "modern foundation" is a misleading requirement: any CAD kernel bearing any kind of relevance implies a codebase that's been around for a quarter century. Like it or not, OpenCASCADE is the hand that was dealt. I've worked with the technology [1] extensively and it provided the underpinnings for a startup I founded [2]. pythonocc is the bees knees; it allows you to develop a proper CAD app.
I'd consider CGAL a modern kernel, but it doesn't cover CAD since there is no BRep support [4]
Don't take my word for it, but see also the many publications that have built on the tech [3]
Wondering why no BRep support means it is not a CAD kernel. It seems that OpenVSP doesn't use BRep but uses parametric surfaces [1]. I wonder if mesh-based modeling + some constraint solving would get you a CAD, or are there other requirements that I don't know about? I only work on a mesh processing library in my free time and I don't know much about BRep.
To be clear - I do think OpenCascade is impressive. Incredibly so, once one becomes aware of the magnitude of the problem it is trying to solve. I will also admit I haven't used it in the past couple of years, but when I did, its limitations in filleting and chamfering alone were enough to make it a non-starter for industry use.
My broader point was that there is a need to start from a new paradigm that leverages the possibilities of modern, highly parallel computing hardware. The hardware requirements for performant and reliable CAD software are incredibly high, and the incumbents' reliance on high-clock-speed single-core processors is quickly being left behind by modern processing hardware.
It does seem a bit of a throwaway statement regarding OCCT - I also work with it every day and, for the most part, it has all the same eccentricities and limitations of any large heritage-listed C++ library. There's a lot it can do!
The math is quite difficult to do right, and there's a billion corner cases to make a kernel useful for real world designs. Take a fillet: It needs to handle inside corners, outside corners, compound angles coming in from arbitrary numbers of directions, it probably needs the ability to vary along its distance, create more geometry when adjacent faces don't leave enough room, etc etc.
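For a sense of scale: even the most trivial 2D version of a fillet - one arc tangent to two straight arms at a single convex corner - takes some care, and everything listed above is what piles on top of it. A hedged Python sketch of just that trivial case:

```python
import math

def fillet_corner(p, a, b, r):
    """Fillet arc center and tangent points for corner p with arms toward a, b.

    Handles only a single planar corner with 0 < angle < pi; a real kernel
    must also handle tangent arms, reflex corners, varying radii, and
    neighboring faces that don't leave room for the arc.
    """
    ux, uy = a[0]-p[0], a[1]-p[1]; lu = math.hypot(ux, uy); ux, uy = ux/lu, uy/lu
    vx, vy = b[0]-p[0], b[1]-p[1]; lv = math.hypot(vx, vy); vx, vy = vx/lv, vy/lv
    half = math.acos(max(-1.0, min(1.0, ux*vx + uy*vy))) / 2.0  # half-angle
    t = r / math.tan(half)               # corner -> tangent point distance
    d = r / math.sin(half)               # corner -> arc center distance
    bx, by = ux + vx, uy + vy            # angle bisector direction
    lb = math.hypot(bx, by)
    center = (p[0] + bx/lb*d, p[1] + by/lb*d)
    return center, (p[0] + ux*t, p[1] + uy*t), (p[0] + vx*t, p[1] + vy*t)

# 90-degree corner at the origin, radius 1: center (1,1), tangents (1,0), (0,1)
print(fillet_corner((0, 0), (5, 0), (0, 5), 1.0))
```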
That's just the start of a single feature type. Now you need a bunch more feature types, and they all need to interact well with each other. The kernel also needs some way of solving the topological naming problem to be useful (FreeCAD might get a basic version of this after a decade(?) of work).
It's probably tantamount to writing a modern-day browser in terms of complexity.
I've written some custom code for computational geometry (like computing the offset for a geometric object using Apple Metal). It was a lot of fun, but also quite hard, and a lot of edge cases I just didn't deal with because I had a particular use case and speed was paramount.
Maybe the idea of a "kernel" is the problem here. A kernel the size of a browser is not a kernel.
I think what's really needed is a full-blown integration with a theorem proving system (which has an easier to define kernel of its own).
I doubt it is possible to have a practical small kernel. Take mesh processing in https://github.com/elalish/manifold for example: we encountered a lot of problems when trying to deal with inexact floating point arithmetic. Using exact arithmetic can probably result in much simpler code (it doesn't make sense in our case because the point of the library is about dealing with inexact arithmetic), but exact means slow. Also, a lot of code is about fast paths for cases that occur frequently, and data structures and algorithms that cut down complexity but are difficult to implement. Can we remove those fast paths and complex algorithms? Probably yes, but this will slow things down a lot.
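A tiny Python illustration of the exact-vs-float trade-off, using Fraction for exact rational arithmetic (this is not Manifold's actual approach, just the textbook contrast):

```python
from fractions import Fraction

def orient2d(a, b, c):
    """Signed area predicate: >0 left turn, <0 right turn, 0 collinear."""
    return (b[0]-a[0]) * (c[1]-a[1]) - (b[1]-a[1]) * (c[0]-a[0])

# Floats silently lose the low-order bits that decide such predicates:
print((1e16 + 1.0) - 1e16)   # 0.0 -- the +1 was rounded away entirely

# Exact rationals never round, so the predicate is always correct...
pts = [tuple(map(Fraction, p)) for p in [(0, 0), (10**16, 1), (2 * 10**16, 3)]]
print(orient2d(*pts))        # 10000000000000000: an unambiguous left turn

# ...but intermediate results keep growing, so exact arithmetic is far
# slower, which is why robust kernels reserve it for near-degenerate cases.
```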
I completely agree. You want to be able to be fast and correct, where correct also depends on some assumptions you can make depending on what you apply your algorithm to. That's why I think a theorem prover kernel is needed, which connects the various algorithms, and the theory behind them that is needed to prove correctness. Until you get there, you will have various different algorithms that don't really fit together, and where it is not clear how to make them work together and still be confident about the result.
Yes, you are right. Time for better theorem proving tools! A theorem proving tool shouldn't add to your time spent designing and implementing, it should help you to get your stuff done faster and with (much) higher quality.
I can't wait for Idris2 (or similar - it's quite good for programming while having e.g. core data structure implementations verified to uphold their API invariants, without making you prove everything around them) to get a good fine-tuned LLM attached to replace the human in the interactive theorem proving steps/parts.
I'd expect this to be an LLM application that's unusually suited to automatic-feedback reinforcement learning.
Especially because much of the interactive proving task/process is quite straightforward nondeterministic (Turing) machine stuff:
You try (with enough randomization/temperature to not get stuck in non-creative stupidity) a proof attempt/step/hint, and get feedback after a moment of calculation from the proof assistant. Then you try further, until hopefully getting "success" as the assistant's feedback.
Once you've gotten it to succeed once, or seeded it with originally human-made step sequences to teach some basic sense into the model, you know an upper bound on the _required_ step count and assistant calculation time to prove the theorem at hand. Thus you can let the LLM auto-play/train with a step limit and computation timeout close to the known upper bound, rewarding expected total runtime/effort so as to combine "spamming cheap proof tactics and hoping something sticks" with "an elaborate, careful proof process that is likely to succeed but always expensive/(semi-exhaustive) to go through".
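A sketch of that loop in Python pseudocode - every interface here (`prover`, `llm`) is a hypothetical stand-in, not any real proof assistant or model API:

```python
import random

def attempt_proof(llm, prover, goal, step_limit, temperature=0.8):
    """One rollout: sample tactics until the prover reports success or we give up."""
    state = prover.start(goal)                     # hypothetical prover session
    transcript = []
    for step in range(step_limit):
        tactic = llm.suggest(state, temperature)   # hypothetical model call
        state, ok, done = prover.apply(state, tactic)
        transcript.append((state, tactic, ok))
        if done:
            return transcript, step + 1            # success and its step count
    return transcript, None                        # timed out

def reinforce(llm, prover, goal, seed_steps):
    """Keep the step budget near the best known proof; reward shorter ones."""
    best = len(seed_steps)                         # human/seeded upper bound
    for episode in range(1000):
        budget = best + random.randint(0, 5)       # stay close to the known bound
        transcript, used = attempt_proof(llm, prover, goal, budget)
        reward = (best / used) if used else 0.0    # cheaper proofs score higher
        llm.update(transcript, reward)             # hypothetical RL update
        if used and used < best:
            best = used                            # tighten the bound
```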
Perhaps even with a GPT-4 like multi-agent LLM to specialize into the various approaches and have some way of rating/predicting each agent's expected efficiency/cost each "chat message":
Turns out, interactive theorem proving is literally a (beyond-)NP heuristics-guiding sampler (traditionally a human with trained gut feeling based problem-solving brainstorming creativity) chatting with a non-creative algorithmic oracle.
At the start, if initiated not by the "human", the oracle would info-dump the theorem and appropriate context along with a description of what "today's" task is:
This may be:
1) "I expect the theorem to be False because: [reason for what caused the expectation]."
2) "I expect the theorem to be True because: [reason for what caused the expectation]."
3) "I expect a weaker (in it's implications, so less general) form of that theorem to be sufficient in the proof for this situation here. It could make proving them for data types we'd like to use with this implementation/specialization cheaper than the general API's demands, potentially (though not the reason for today's task!) allowing weaker data types to be used here than in the general API.
In particular, there (maybe) are cheaper (to implement) yet weaker (in effects) variants of the functions we call on the supplied data type, that still suffice in our algorithm (like weaker demands on a comparison function when only stable sorting is used; dropping distinctiveness demands for either hash function or comparison function if only the other is relied on for Set/Map data structure efficiency).
Confidence of it to be True is at X (confidence measure/score the LLM can feel natively) and here's what certain outcomes would be worth (either enumerated weaker forms or a scoring function (in a form the LLM agent comparer can utilize to plan what to aim for, when to pivot, and when to declare defeat))."
4) "I expect a weaker (in it's demands, so more general) form of that theorem to be sufficient in the proof for this situation here. It would allow us to use this implementation/specialization with more data types than the general API promises. Confidence of it to be True is at X (confidence measure/score the LLM can feel natively) and here's what certain outcomes would be worth (either enumerated weaker requirements for the theorem or a scoring function (in a form the LLM agent comparer can utilize to plan what to aim for, when to pivot, and when to declare defeat))."
5) "Find (and codify!) sufficient invariants demanded from (functions on) data types so we can uphold our API's promised invariants. [Optionally, aim for something that matches this natural language description of what (parts/aspects of) the data type's function's purpose[s]/semantics are supposed to be.]"
6) "Prove this is constant-time/constant-memory-access-pattern w.r.t. that part of data (even in ways where the data in question may be in chunks behind some memcpy-reducing indirection), or e.g. that this key here affects nothing persistent other than that ciphertext/plaintext/signature/hash/success-flag."
7) "Prove time/memory complexity of this implementation. Here's what certain bounds are worth: split between various kinda of bounds, e.g. lower bounds, upper bounds, average (w.r.t. that data) complexity, best/worst case (w.r.t. that data), handle those parameters by enumerating over those and proving with their values fixed (because a general equation may be too complicated/weak, or even just too hard to derive)."
8) A classic: "Prove these two implementations produce identical (under that comparison/test-sampling method) results."
9) "Find bounds on when (conditions) and/or how much (some appropriate supplied measure) these two implementations differ."
10) "Find a faster implementation along with proven limits on it's inaccuracy. Combining the two (+) dimensions of candidate quality from the obvious pareto frontier into a single number score is according to this: [formula in useful format for directing search/exploration]."
11) "Cough up an implementation limited to those numerical primitives there, along with proof of it complying with these accuracy requirements, for this implementation that uses (inherently computationally-unsuitable) real numbers. Speed/memory performance importance: [scoring function suitable for directing where to aim, when to stop, and when to give up]."
And afterwards, the "human" would ask/explore the oracle about context, suggest/try proof tactics, and in some cases write/transform code in both LLM-style and by commanding ("textbook"/library/archive, or even freshly written) rewrite rules.
That process could then be trained with reinforcement learning, even if intermediate states have no useful score function defined, as the presence of certain results after certain amounts of expended effort is directly useful as a score for the solver/agent itself. The multi-agent suggestion applicability/efficiency predictor/arbiter should be amenable to more normal (stochastic) backpropagation at a completed-chat granularity, as the final efficiency/score will be known, and a perfect predictor would have predicted that exact score the entire time along the path that was taken. The intermediate predictions of how much effort the chosen agent's suggestion actually took to complete are also easily recorded, for training the per-step cost predictions as a more fine-grained aspect of the final-score-when-taking-this-branching-path-now machinery.
Have you worked with Open Cascade recently? As someone who works with it every day for developing a CAD application, it would be interesting to know what people see as its limitations. It's the only geometry kernel I've had access to, and it seems like an absolute gift
> Build123d is a python-based, parametric, boundary representation (BREP) modeling framework for 2D and 3D CAD. It's built on the Open Cascade geometric kernel and allows for the creation of complex models using a simple and intuitive python syntax. Build123d can be used to create models for 3D printing, CNC machining, laser cutting, and other manufacturing processes. Models can be exported to a wide variety of popular CAD tools such as FreeCAD and SolidWorks.
> Build123d could be considered as an evolution of CadQuery where the somewhat restrictive Fluent API (method chaining) is replaced with stateful context managers - e.g. with blocks - thus enabling the full python toolbox: for loops, references to objects, object sorting and filtering, etc.
> This package is inspired by the NURBS-Python package, however uses a NumPy-based backend for better performance.
> Curve and Surface are non-uniform non-rational B-Spline geometries (NUBS); RationalCurve and RationalSurface are non-uniform rational B-Spline geometries (NURBS). They are all built upon the class BSpline. Coordinates have to be in 3D space (x, y, z).
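For anyone who hasn't poked at these before, evaluating a point on a B-spline curve is surprisingly little code. A NumPy sketch of de Boor's algorithm, independent of either package quoted above (and skipping the degenerate-knot edge cases a library must handle):

```python
import numpy as np

def de_boor(t, degree, knots, ctrl):
    """Evaluate a (non-rational) B-spline curve at parameter t via de Boor."""
    # Find the knot span k with knots[k] <= t < knots[k+1], clamped to valid range.
    k = np.searchsorted(knots, t, side='right') - 1
    k = min(max(k, degree), len(ctrl) - 1)
    d = [np.array(ctrl[j], dtype=float) for j in range(k - degree, k + 1)]
    for r in range(1, degree + 1):              # triangular de Boor recurrence
        for j in range(degree, r - 1, -1):
            i = j + k - degree
            alpha = (t - knots[i]) / (knots[i + degree - r + 1] - knots[i])
            d[j] = (1.0 - alpha) * d[j - 1] + alpha * d[j]
    return d[degree]

# A clamped cubic with 4 control points degenerates to a Bezier curve.
knots = [0, 0, 0, 0, 1, 1, 1, 1]
ctrl = [(0, 0), (0, 1), (1, 1), (1, 0)]
print(de_boor(0.5, 3, knots, ctrl))  # [0.5, 0.75], the Bezier midpoint
```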
I think the interesting thing about CAD kernels is that there are different representations, each with its own limitations. You have triangular (or polygonal) meshes, BREP (which uses NURBS), and SDF (which is based on functional representation). I have experience working with triangular meshes and SDF, so here are my opinions about them; please correct me if I am wrong:
Triangular meshes are conceptually simple, but require many faces to approximate curved surfaces with high precision (you may be able to use subdivision surfaces in some cases, but intersection/union is then more challenging). Also, for more complicated models, floating point errors really add up, and you either have to use an exact representation (which is really slow) or try some other approach that can be robust w.r.t. errors (e.g. https://github.com/elalish/manifold but it is really hard to get right). Another disadvantage compared with BREP is the lack of constraint solving, which I will write about below.
SDF is nice for mathematically defined objects. It is computationally intensive, so some SDF libraries use the GPU to speed up the computation. There are approaches that can speed up the evaluation, but they don't work well if the function is not a true distance (https://github.com/curv3d/curv/blob/master/docs/shapes/Shape...).
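To see why a true distance matters: sphere tracing, the standard way to render an SDF, steps along a ray by exactly the field's value, which is only safe when that value never overestimates the distance to the surface. A minimal Python sketch:

```python
import math

def sphere_sdf(x, y, z, r=1.0):
    """True signed distance to a sphere of radius r at the origin."""
    return math.sqrt(x*x + y*y + z*z) - r

def sphere_trace(o, d, sdf, max_steps=128, eps=1e-6):
    """March along the ray o + t*d, stepping by the SDF value each time.

    Valid only because a true SDF never overestimates the distance to the
    nearest surface; with a non-distance field the march can overshoot
    (holes in the render) or barely move (slow), which is the curv docs' point.
    """
    t = 0.0
    for _ in range(max_steps):
        x, y, z = o[0] + t*d[0], o[1] + t*d[1], o[2] + t*d[2]
        dist = sdf(x, y, z)
        if dist < eps:
            return t           # hit the surface
        t += dist              # safe step: can't pass through geometry
    return None                # miss (or ran out of steps)

print(sphere_trace((0, 0, -3), (0, 0, 1), sphere_sdf))  # ~2.0
```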
-----
Constraint solving: This is a big problem with mesh-based CAD. Traditional CAD usually allows you to have under-defined constraints, and users can iteratively add constraints until the model is fully defined. There is no such thing (yet) with mesh-based CAD. Also, we don't really have nice ways to represent constraints relative to curved surfaces, because there are no curved surfaces in our mesh...
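For readers who haven't used one: a sketch constraint solver is essentially root-finding on constraint residuals. A deliberately naive Gauss-Newton sketch in NumPy with made-up toy constraints (real solvers also diagnose under- and over-constrained systems, which is the hard part):

```python
import numpy as np

A = np.array([0.0, 0.0])               # anchor point, held fixed

def residuals(x):
    """Each constraint contributes one residual that should be driven to 0."""
    B, C = x[:2], x[2:]
    return np.array([
        np.linalg.norm(B - A) - 2.0,    # distance(A, B) == 2
        np.linalg.norm(C - B) - 2.0,    # distance(B, C) == 2
        C[1] - A[1],                    # C constrained to A's horizontal
    ])

def jacobian(x, h=1e-7):
    """Numerical Jacobian of the residuals via finite differences."""
    J = np.zeros((3, 4))
    for i in range(4):
        dx = np.zeros(4); dx[i] = h
        J[:, i] = (residuals(x + dx) - residuals(x)) / h
    return J

# 3 equations, 4 unknowns: an under-defined sketch, like a half-constrained
# drawing. The pseudo-inverse step still converges to *a* valid solution
# near the initial guess, which is exactly the interactive-CAD behavior.
x = np.array([1.0, 1.0, 3.0, 0.5])      # initial guess for points B and C
for _ in range(50):
    x = x - np.linalg.pinv(jacobian(x)) @ residuals(x)

print(x.round(4), residuals(x).round(8))
```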
Also, one particular challenge with text-based (or code-based) CAD is how to select the surfaces with an ergonomic API. GUI can solve this problem but writing a good GUI is a complicated task (that I am not willing to touch).
So I’ve been collaborating with some mechanical engineering friends over the past year, and one experiment we did was exploring ways to make cad better.
One conclusion we came to is that large language models do not understand geometry, in a pretty fundamental way! They do understand pairwise relationships that are more topological in character. But that's not quite the same thing.
The blocker on the text to concept art to 3D model to CAD route is that the meshes and geometry you'll generate won't have the easy adjustment and parameterization manipulations you can take for granted in human-authored CAD!
That’s also ignoring developing a robust geometry kernel that would live underneath all this!
To me, the more interesting AI/CAD capabilities integration will be when specialist AIs can analyse models and understand real-world constraints.
Imagine an AI helping an architect ensure their drawings are compliant with local code ordinances, and producing some of the paperwork itself.
Imagine an AI that understands the manufacturing processes well enough to guide you or give useful advice about how to manufacture or modify your parts so they become easier and cheaper to build.
Imagine an AI that knows more about an electronic or mechanical project and can take outputs from various simulation tools to advise you on your design's compliance with regulation, or that would recognise weak design choices, pulling from a knowledge base of part failures or other real-world constraints.
It could propel computer-aided design in ways we can't imagine today, but this integration will probably be hard and not be just text-based.
Try a combination of taking screenshots and GPT-4 image input. You would be surprised how good it is at analyzing a part even without prompting. This is one of the things I'm investigating to see if it's worth putting effort into. Just basic company process/rules following (for our own internal products).
Anything else, we are talking about computational design + AI for interpretability.
I haven't checked all the examples in the link posted, but I know the Autodesk AI Lab has some seriously impressive papers out. (Code too.)
I don’t do mechanical design anymore but did prior for about a decade. A lot of the examples the author has is text to 3D model which is not the same as text to CAD. Think of AutoCAD (even though it’s 2D CAD, the example still stands). You would design a blueprint of your house with it but you won’t make an ice cream sundae model. You would use Maya for that.
I'm not that optimistic about an algorithm producing useful results from a text-based description, given that the solution space is any possible combination of any number of shapes. The data structure for 3D objects also depends on your software. Where CAD matters professionally, there usually are strict requirements or business rules that would rule out an average of ".OBJ files that had these tags".
Now, I AM optimistic about algorithms solving for useful relationships between pre-defined objects, like routing conduit through a building without collisions or optimizing a lumber cut-list. Finch and Hypar are interesting small companies in this space.
Having an expert (or other more specialised software) optimise an otherwise satisfactory 3D model is going to be a vastly simpler process than what can be a painstaking process of getting the "original vision" for a shape into a digital form.
I'm better with CAD-style workflows than I am with freeform modelling, but I'm also not an experienced enough user of any particular CAD program, so I spend ages frustratingly trying to understand where to go to edit the correct part of the chain of operations that built up a particular fillet, chamfer, or sweep. Every program is a little different... just different enough to throw me off, since I don't use them often enough to really get over that learning threshold. For instance, I've used Onshape one time in the last 12 months... spent an hour trying to work something out... and then 5 minutes actually making my edit...
If I could have opened up the design in 2D blueprint form and put in some kind of multimodal query to the effect of "these are my blueprints, can you convert them to (CAD program of choice), and while you are converting them, increase cross section CS1 vertically from 8mm to 10mm, keeping everything else fixed"... it would save me a lot of time making adjustments to people's 3D models for printing. It can be really annoying to fuck around as much as is sometimes necessary, due to individual CAD workflow preferences, just to make conceptually simple edits like "this hanging pot, but stretched 50% taller, and with all the holes/cutouts kept the same size please".
The ML models might not be able to do a perfect curve for a gorgeous arcing buttress or a complex bolt hole pattern for attaching multiple elements to a primary support structure… but it should be able to do a lot of the sorts of work that people use OpenSCAD for.
I've been working on lightweight text-to-CAD as part of my participation in Paddle's AI Launchpad accelerator.
I am an amateur woodworker and wanted easier ways to quickly prototype ideas.
The sweet spot for me is more accurate measurements and better drawings than my pen and paper but without the overhead of firing up Fusion 360 and trying to lay out the 2D then 3D process.
Neither of the above is great for iteratively exploring designs either.
My last project was a custom drill press workbench, and I did the 3D in SketchUp and Fusion to get a feel for both tools popular with woodworker hobbyists.
These types of designs are often sold for a few bucks with the project assembly videos posted on YouTube.
I did my initial testing of this using iterative prompts to OpenAI models asking them to refine the design of an outdoor wooden bench with dimensions appropriate for a toddler.
I had some live edge donor wood and wanted it to comply with the thickness of the materials as input.
I was able to prove to myself it could be done with generated scripts for the Blender API.
I set my aim at a single page that can record spoken audio, perform STT, process it into valid Blender Python, export a .glb, and display it on the same page.
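For flavor, the kind of script the pipeline asks the LLM to emit (a hypothetical example; the bpy operator calls are standard in recent Blender versions, and the bench dimensions are made up):

```python
import bpy

# Clear the default scene, then build a crude toddler bench from scaled cubes.
bpy.ops.object.select_all(action='SELECT')
bpy.ops.object.delete()

def slab(name, size, location):
    """Add a unit cube and scale it into a board of the given dimensions."""
    bpy.ops.mesh.primitive_cube_add(size=1.0, location=location)
    obj = bpy.context.active_object
    obj.name = name
    obj.scale = size                                  # (x, y, z) in meters
    return obj

slab("seat", (0.60, 0.25, 0.04), (0.0, 0.0, 0.26))    # 40mm-thick top
slab("leg_left", (0.04, 0.22, 0.24), (-0.26, 0.0, 0.12))
slab("leg_right", (0.04, 0.22, 0.24), (0.26, 0.0, 0.12))

bpy.ops.export_scene.gltf(filepath="bench.glb")       # .glb for in-page display
```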
Making a great demo is a lot of integration work and a lot of LLM programming: pre- and post-processing, system context refinement, etc.
But it’s pretty awesome.
In my experience generating for Fusion is dicier than blender, but I suspect with specialized model training and a bunch of dark art LLM incantations this could become a prosumer tool, and possibly speed along professional work as described in the blog.
So far this is not stuff with complex mechanics or fancy hinges, so it might not meet the threshold of CAD for some. But there are a lot of folks who want "CAD-like" experiences without having to muck with the tools.
I’d love feedback if anyone is interested in checking it out. Demo day is toward end of this month, I can be reached at the email in my HN profile.
To your point about iterative design - something I learned in school for industrial design is that 3D modeling will never be used for iterative design. The closest I've seen is VR modeling. But even then, the commitment is too much, when instead I can sketch on a napkin instantly and move to 3D only when I'm satisfied with what I want to construct.
I wouldn't be so worried about LLMs making guns. The SketchIT project found that it's very difficult to describe a mechanical device to other humans using just text; images are needed too[0]. I'd also worry about the gun produced being structurally sound: how can one be so sure that the gun barrel hasn't been hallucinated to be too thin?
Guns and other mechanical devices don't exist alone. A gun must interface with a bullet, a part of an aircraft must interface with other parts. So CAD AI must be able to understand the geometric context of the parts it is making.
That being said, I think AI will soon be capable of making mechanical devices. There has been some improvement in physical reasoning benchmarks like PHYRE[1]. Understanding physical reasoning and how multiple objects move with respect to each other is important in the synthesis of new mechanical devices.
SketchIT[0] demonstrated that by making a reduced 2-DoF description of how pairs of objects in a device may move with respect to each other, it's possible to synthesize a new device which performs the same function.
Solving PHYRE problems requires reasoning with larger degrees of freedom. The first example on the homepage has something like 5 objects which each have 3 positional DoF (translation and rotation). Even reasoning with 3DoF is quite difficult for approaches like those used in SketchIT.
Given that approaches like slotformer[2] already do somewhat well at solving these huge DoF problems, I don't think we're very far from AI being able to design complicated mechanical devices.
There is so much wrong with this article. Throw a little bit of ML pixie dust on everything for more hype? Check. Compare wildly different things as if they were the same? Double check.
digdugdirk has the right idea, and AFAIR, there is some work on that front (https://www.fornjot.app/).
Also the Fiat 500 goes 100km on about 4l of gas, while the Ford F150 uses 7l. No clue where the author gets the idea that the Fiat would get worse mileage, perhaps he's dividing by weight?
Deepfakes, weapons, racist content, copyright infringement, and "digital colonialism"? The author is really grasping at straws for downsides of using generative AI, while downplaying the benefits.
Sure it may help others produce lots of low quality content you don't enjoy, but it will also help you generate content you really enjoy. Especially things you can then manifest in the physical world.
If anything I think it will help elevate more obscure styles, as it creates an effectively infinite number of artists that can produce such styles.
Looking at the images of comparisons (e.g. the pig wearing a backpack), I am dreaming of the moment I can 'talk' to an AI engine and ask it: "design XYZ with LEGO bricks, then print me the list so I can order them, and instructions to build it".
(yeah I hate it that the Star Trek universe is not signing a deal with Lego)(I would much rather have an Enterprise than any Star Wars item)(https://xkcd.com/1563/)