Hacker News

This is gold. I spend my days debugging LLM training setups in support of research and I'd have loved these notes when I started!


I'm a game developer looking to get into ML/DL. The biggest hurdle for me is finding a problem with real value, not too difficult, that I can work on to learn. I believe I've found one, and I'm curious if you'd offer any thoughts on it.

Right now there are two main systems for capturing motion data for animations in games/movies: inertial and optical. Inertial is easier and more affordable but introduces errors and inaccuracies that require manual correction. Optical is more accurate and requires less cleanup, but comes with expensive hardware and space requirements.

My thought is to have someone wear an inertial mocap suit and record inertial and optical sessions at the same time, then use ML to learn how to automatically correct the inertial data. Then you could theoretically get the precision of optical capture from inertial recordings after passing them through the model.

Curious if you think something like this is tractable for a first project? If you have any suggestions for how you'd solve this, or if there are any existing projects you could point me to, I'd appreciate any help!


Yes, I think this is very tractable for a first project. I've played around with using AI to do optical-only capture with pose-detection models -- if I had to do it again, I would probably start with this model and try to get it running locally:

https://github.com/facebookresearch/co-tracker

This sounds like a perfect place for you to get started!


Incredible, I would never have guessed that pixel tracking was possible like that! Thank you.


If you have the means to collect the data then this seems pretty tractable to me. Mostly because you don't need to deal with the raw optical data but rather just the derived trajectories. So data formats and volumes shouldn't be a distraction.

I'm assuming you've done some tutorials or courses on the basics of DL and can program in Python. At that point, the easiest first step would be to train an MLP to convert a single time step of inertial data to match the optical prediction (presumably they are in the same coordinate system and temporally close enough).
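A minimal sketch of that first step: a one-hidden-layer MLP trained with plain NumPy gradient descent on synthetic stand-in data. The 6 inertial channels and 3 target coordinates here are made-up shapes for illustration, not any real suit's format.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: 6 inertial channels in, 3 target coordinates out.
X = rng.normal(size=(256, 6))
Y = X @ rng.normal(size=(6, 3)) + 0.01 * rng.normal(size=(256, 3))

# One-hidden-layer MLP, trained with full-batch gradient descent.
W1 = 0.1 * rng.normal(size=(6, 32)); b1 = np.zeros(32)
W2 = 0.1 * rng.normal(size=(32, 3)); b2 = np.zeros(3)
lr = 0.05

def forward(X):
    h = np.maximum(X @ W1 + b1, 0.0)  # ReLU hidden layer
    return h, h @ W2 + b2

_, pred0 = forward(X)
loss0 = np.mean((pred0 - Y) ** 2)

for _ in range(500):
    h, pred = forward(X)
    err = 2.0 * (pred - Y) / len(X)   # dL/dpred for mean-squared error
    gW2 = h.T @ err; gb2 = err.sum(0)
    dh = err @ W2.T * (h > 0)         # backprop through the ReLU
    gW1 = X.T @ dh; gb1 = dh.sum(0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

_, pred = forward(X)
loss = np.mean((pred - Y) ** 2)
print(loss0, loss)  # training loss should drop substantially
```

In practice you'd reach for a framework like PyTorch rather than hand-rolled backprop, but the structure of the problem (fixed-size input vector in, joint positions out) is the same.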

The crux of building something good would be in how you handle the temporal aspect, I'd imagine. Clearly you want to use multiple inertial samples over time to get more accurate positional estimates. I'd imagine a fixed window of the past n inertial samples would be a good start. I wouldn't worry about more complicated temporal modeling, e.g. an RNN or transformer, unless you can't get satisfactory results with the MLP.
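Building that fixed window of past samples could look something like this. A sketch only: the array shapes (6 inertial channels, 3 optical coordinates) and per-frame layout are assumptions, not a real capture format.

```python
import numpy as np

def make_windows(inertial, optical, n=8):
    """Pair each optical frame with the past n inertial frames.

    inertial: (T, d_in) array of per-frame sensor readings
    optical:  (T, d_out) array of target positions
    Returns X of shape (T-n+1, n*d_in) and y of shape (T-n+1, d_out).
    """
    T, d_in = inertial.shape
    X = np.stack([inertial[t - n + 1 : t + 1].ravel() for t in range(n - 1, T)])
    y = optical[n - 1 :]
    return X, y

# Toy data: 100 frames, 6 inertial channels, 3 target coordinates.
rng = np.random.default_rng(0)
inertial = rng.normal(size=(100, 6))
optical = rng.normal(size=(100, 3))
X, y = make_windows(inertial, optical, n=8)
print(X.shape, y.shape)  # (93, 48) (93, 3)
```

Each row of X then feeds the same MLP as before, just with a wider input layer.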

My gut says there's probably a non-ML approach to this too, some sort of Kalman filter etc. It's always best to avoid ML if a simpler solution exists :)


Awesome, thank you! Really appreciate you taking the time to reply. I'm still learning Python and working my way through this course https://fleuret.org/dlc/ but without a problem I care about it's hard to stay motivated. Your encouragement and tips are much appreciated! Knowing that this idea isn't too ambitious will give me the motivation to keep pushing. Thank you.


I started learning ML at a video game studio too. My 2c is to start on a tractable problem that immediately pays off, so you have the grace to learn something more complicated later. I started with a recommendation system for our in-game store and email promotions (custom coupons based on previous purchases).



