All the pipelines are async, so during ingestion we have typically seen that R2R can saturate the vector DB or the embedding provider. We don't have backpressure yet, so it is up to the client to rate limit.
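For what it's worth, client-side rate limiting can be as simple as capping the number of in-flight requests. Here is a rough sketch using an asyncio semaphore. The names `ingest_with_limit`, `ingest_one`, and `max_in_flight` are made up for illustration and are not part of R2R; `ingest_one` stands for whatever coroutine you use to send a single file.

```python
import asyncio


async def ingest_with_limit(paths, ingest_one, max_in_flight=4):
    """Call ingest_one(path) for every path, at most max_in_flight at a time.

    ingest_one is a hypothetical coroutine supplied by the caller; it is
    a placeholder for the real ingestion call, not an R2R function.
    """
    semaphore = asyncio.Semaphore(max_in_flight)

    async def guarded(path):
        # Wait for a free slot before sending the next file.
        async with semaphore:
            return await ingest_one(path)

    # gather returns results in the same order as the input paths.
    return await asyncio.gather(*(guarded(p) for p in paths))
```

You would run it with something like `asyncio.run(ingest_with_limit(paths, my_ingest))` and tune `max_in_flight` to whatever your vector DB and embedding provider can sustain.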

Ingestion is pretty straightforward: you can call R2R directly, or use the client-server interface and pass the HTML files straight to the ingest_files endpoint (https://r2r-docs.sciphi.ai/api-reference/endpoint/ingest_fil...).

The data parsers are all fairly simple and easy to customize. Right now we use bs4 (BeautifulSoup) to handle HTML, but we have been considering other approaches.
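To give a sense of how small an HTML-to-text step can be, here is a standalone sketch. Note that it uses Python's built-in `html.parser` rather than bs4, and it is not R2R's actual parser or parser interface; `TextExtractor` and `html_to_text` are hypothetical names for illustration only.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped elements.
        if not self._skip_depth and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return "\n".join(self._chunks)


def html_to_text(html):
    """Return the visible text of an HTML string, one text run per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

A customized parser would mostly change what gets skipped and how the text runs are joined.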

What specific features around ingestion have you found lacking?