I am writing a tabular data textbook for O'Reilly on building ML systems with a feature store.

I try to be opinionated about modelling: XGBoost is all you really need. The real challenges are more like you say: how to prevent data leakage (use an ASOF LEFT JOIN, or a feature store), how to separate model-independent data transformations from model-specific ones, APIs for things like time-series splits, logging, monitoring, and building and operating the pipelines. All pretty standard software engineering in Python nowadays.
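The ASOF LEFT JOIN is what keeps future feature values out of training rows. Here is a minimal sketch in plain Python. It assumes hypothetical field names (`entity_id`, `ts`, `avg_spend`, `fraud`) and integer timestamps, so treat it as an illustration rather than the book's method. For each label row it picks the latest feature row for the same entity whose timestamp is at or before the label's timestamp. Label rows with no such feature row are kept with `None` values, which is the LEFT part of the join. In pandas, `pd.merge_asof(labels, features, on="ts", by="entity_id", direction="backward")` does the same thing on DataFrames sorted by `ts`.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Any, Optional


def asof_left_join(
    labels: list[dict[str, Any]],
    features: list[dict[str, Any]],
    key: str = "entity_id",  # hypothetical entity column name
    ts: str = "ts",          # hypothetical timestamp column name
) -> list[dict[str, Any]]:
    """Point-in-time (ASOF LEFT) join of label rows against feature rows."""
    # Group feature rows by entity and sort each group by timestamp.
    by_entity: dict[Any, list[dict[str, Any]]] = defaultdict(list)
    for row in features:
        by_entity[row[key]].append(row)
    for rows in by_entity.values():
        rows.sort(key=lambda r: r[ts])
    times = {k: [r[ts] for r in rows] for k, rows in by_entity.items()}

    feature_cols = sorted({c for r in features for c in r} - {key, ts})
    joined = []
    for label in labels:
        rows = by_entity.get(label[key], [])
        # bisect_right returns the index of the first feature timestamp strictly
        # after the label timestamp, so rows[i - 1] is the latest one <= label ts.
        i = bisect_right(times.get(label[key], []), label[ts])
        match: Optional[dict[str, Any]] = rows[i - 1] if i > 0 else None
        out = dict(label)
        for col in feature_cols:
            # Keep unmatched labels (LEFT join) with None feature values.
            out[col] = match[col] if match is not None else None
        joined.append(out)
    return joined


# Usage with made-up rows: the feature at ts=30 is in the future for the
# label at ts=25, so it must not be joined.
features = [
    {"entity_id": 1, "ts": 10, "avg_spend": 5.0},
    {"entity_id": 1, "ts": 20, "avg_spend": 7.0},
    {"entity_id": 1, "ts": 30, "avg_spend": 9.0},
    {"entity_id": 2, "ts": 40, "avg_spend": 1.0},
]
labels = [
    {"entity_id": 1, "ts": 25, "fraud": 0},
    {"entity_id": 2, "ts": 15, "fraud": 1},
]
rows = asof_left_join(labels, features)
print(rows)
# → [{'entity_id': 1, 'ts': 25, 'fraud': 0, 'avg_spend': 7.0},
#    {'entity_id': 2, 'ts': 15, 'fraud': 1, 'avg_spend': None}]
```

Including exact timestamp matches (`<=`) mirrors `merge_asof`'s default `allow_exact_matches=True`. If a feature value computed at the same instant as the label would itself leak the outcome, switching `bisect_right` to `bisect_left` restricts the join to strictly earlier feature rows.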

Free chapters:

https://www.hopsworks.ai/lp/oreilly-book-building-ml-systems...

Personally, I am very interested in building a big data pipeline for machine learning, initially on batch and then on real-time ECG and seismic data, for CVD screening/early detection and earthquake early detection/prediction respectively. Any idea when the completed book will be available?

I'm also wondering what the main difference is between your book and this one, Architecting Data and Machine Learning Platforms, also from O'Reilly:

https://www.oreilly.com/library/view/architecting-data-and/9...


My book is a hands-on book, where you build AI systems. The first 4 chapters are already out.

I have run a course at KTH for years. Here are the AI systems the students built in 2024 over a 2-3 week period. There was an earthquake project amongst them!

https://id2223kth.github.io/assignments/2024/ID2223Projects2...
