lizen_one's comments | Hacker News

Rust is a "new" language and many packages from other languages get reimplemented in it. This is similar to Julia. Unfortunately, I had the experience that many Julia packages are not of high quality, not maintained, or do not run any more on the newest version.

How is this in Rust?


Like every package repository (or human endeavor in general) it follows Sturgeon's Law: 80% of everything is crap. That said, there's 100k crates on crates.io, and many of them are fantastic (well-supported, actively developed, documented, etc.). For a new user, understanding which are high-quality is a daunting task, and is expedited by just asking an experienced person for specific recommendations.


Honestly, if you're writing Rust, just stick with the top 100 downloaded packages and you'll be fine. That's basically what I do, unless the task requires quite specialized work.


Didn’t expect to see such harsh criticism of the Rust ecosystem by this user.


Contrary to the memes, a lot of long running Rust users are happy to tell you about where Rust is deficient.


Not what I meant.


What did you mean?


Fine. I imagine someone posting that they just published their first ever crate. Maybe it's the first and only Rust binding to some useful library. But hold your horses, this poster says (by implication), because 80% of everything is crap. Of course no one says that out loud. But that is the inevitable conclusion.

Maybe the main point of four average Rust users publishing a crate each is so that burntsushi can publish one great one.


(For whatever it's worth, I'm not sure why you are downvoted in reply to me; I had upvoted you.)


It could look curt so I can understand it. Thanks.


Although I cannot comment on this specifically for Rust, what I would confidently say is that one of the best methods for finding the “best” dependencies in any language is to read lots of code. Find the popular and/or most useful projects written in the language on GitHub and see which dependencies that project uses and how they are used. At least in my career this method has served me well. For a given problem domain I was able to quickly identify the best/most popular packages to use by reading the code that was heavily used by others. Obviously the more you do this the easier it becomes.


Julia's engineering is notoriously low quality (perhaps because it's more popular for scientific code). Almost any other language has a higher bar for what level of best practices is normal, IME.


I don't know how it is in general, but the few Rust packages I checked seem to be of very high quality.


What car do you have that comes with a LiDAR? Is it already used for a useful driving assistant?


I read or saw a similar paper. I guess that they used (multiple) 3D cameras instead of a LiDAR to get the point cloud. But otherwise it was similar. They used an octree or a similar data structure for speed-up. What did you use?


I take it you didn't "read or see" anything I wrote above.

But thanks for taking the time to post a comment!


They were talking about the concept of carving out empty space to leave a 3d representation of the thing you're interested in, which was the primary method in your comment. This can be done with images by using background segmentation and cutting with the edge/profile of the object for each 2d view, assisted by other 2d-to-3d methods.

It can be used for live 3d models with single (tracked and moving) or multiple (known and fixed) cameras.

If it was the same one that I read, it was a neat paper, but I'm having trouble finding it within a few searches. I don't recall if they used a point cloud or voxels, but the difference between storing the 3d structure as a point cloud or as voxels doesn't seem to warrant your response. They're trivially converted to one another, in this context.
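
For anyone curious, here is a minimal sketch of the carving idea (my own illustration, not the paper's code): keep only candidate voxels whose projection falls inside the object's silhouette mask in every view. The `project` functions and masks are hypothetical inputs.

    import numpy as np

    def carve(voxels, views):
        """voxels: (N, 3) candidate points; views: list of (project_fn, mask) pairs."""
        keep = np.ones(len(voxels), dtype=bool)
        for project, mask in views:
            # project maps 3d points to integer pixel coordinates (u, v) in this view
            u, v = project(voxels)
            h, w = mask.shape
            inside = mask[np.clip(v, 0, h - 1), np.clip(u, 0, w - 1)]
            keep &= inside.astype(bool)
        # whatever survives every silhouette is (an over-approximation of) the object
        return voxels[keep]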


Very interesting. What LiDAR are you using? Does it happen if the sun shines directly into the LiDAR from the front?


Standard commercial unit, no clue. It happens as you say, with sun shining directly in front, especially when passing under bridges or when another vehicle passes the sun (but not in your lane). The stark difference of direct sunlight and long shadows triggers it the most.


LiDAR is finally cheap.

I repeat my comment on LiDAR that I gave a few days ago. The gist is that LiDAR is cheap, and you will be able to buy a LiDAR with sufficient resolution in the next 1-2 years, because it will be integrated into normal passenger cars for L2/L3 assistants. These cars are coming out now or in the next year.

LiDAR is finally getting cheap. OEMs (like VW) are very price-sensitive. It is estimated that the sensors from Valeo cost about 500 dollars. The fact that you see more and more normal passenger cars with higher-resolution LiDARs means that LiDARs are getting cheaper.

The Audi A8 used the first-generation, low-resolution Scala 1 LiDAR from Valeo (developed with Ibeo). Mercedes' new models will be using Valeo's second (or third) generation LiDAR. All these are used for L2/L3 assistants. Valeo is a traditional large automotive supplier.

Luminar, a public company from the US, cooperates with Volvo. Some models will come with a LiDAR in the base configuration. These are "new LiDARs" with high resolution.

Innoviz, a 'startup' from Israel, will supply LiDARs to VW. Its angular resolution is (in its focus area) about 0.1 (or 0.2) degrees, which is sufficient for higher levels of autonomy and equals or surpasses the resolution of the expensive Velodyne sensors of the past. They will probably be in the same price range. Due to the technology's limited FOV, you will need multiple LiDARs.
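
As a rough back-of-the-envelope check (mine, not from any spec sheet), 0.1 degrees of angular resolution translates to roughly the following spacing between neighboring points at typical driving distances:

    import math

    for distance_m in (50, 100, 200):
        spacing_m = distance_m * math.tan(math.radians(0.1))
        print(f"{distance_m} m -> ~{spacing_m:.2f} m between neighboring points")
    # ~0.09 m at 50 m, ~0.17 m at 100 m, ~0.35 m at 200 m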

Many new models from Chinese car brands will also be equipped with a LiDAR, most of them from Chinese LiDAR manufacturers like RoboSense or Hesai; some are equipped by European manufacturers like Ibeo/ZF. For example, there is the automotive sensor AT128 by Hesai. It targets normal vehicles (see price range above) and claims similar performance (except for FOV, so you need multiple) to the Velodyne Ultra Puck (~$50,000).

So the cost of LiDARs is no longer the prohibitive obstacle it was in the past. The only problem could be that the new LiDAR manufacturers cannot scale up series production. For example, Ibeo just filed for insolvency because they could not close another funding round after aggressively increasing spending in the past years.


DVC had the following problems when I tested it (half a year ago):

It gets super slow (waiting minutes) when there are a few thousand files tracked. Thousands of files have to be tracked if you have, e.g., a 10 GB file per day and region plus the artifacts generated from it.

You are encouraged (it can only track artifacts) if you model your pipeline in DVC (think like make). However, it cannot run tasks in parallel, so running a pipeline takes a lot of time even on a beefy machine, because only one core is used. Obviously, you also cannot run other tools (e.g. Snakemake) to distribute/parallelize across multiple machines. Running one (part of a) stage also has some overhead, because it does checks before and commits after running the task's executable.

Sometimes you get merge conflicts if you manually run one part of a (partially parametrized) stage on one machine and the other part on another machine. These are cumbersome to fix.

Currently, I think they are more focused on ML features like experiment tracking (I prefer other mature tools here) instead of performance and data safety.

There is an alternative implementation from a single developer (I cannot find it right now) that fixes some problems. However, I do not use this because it probably will not have the same development progress and testing as DVC.

This sounds negative, but I think it is currently one of the best tools in this space.


You might be referring to me/Dud[0]. If you are, first off, thanks! I'd love to know more about what development progress you are hoping for. Is there a specific set of features that bar you from using Dud? As far as testing, Dud has a large and growing set of unit and integration tests[1] that are run in Github CI. I'll never have the same resources as Iterative/DVC, but my hope is that being open source will attract collaborators. PRs are always welcome ;)

[0]: https://github.com/kevin-hanselman/dud

[1]: https://github.com/kevin-hanselman/dud/tree/main/integration...


> You are encouraged if you model your pipeline in DVC.

Encouraged to do what?

You might want to slow down on the use of parentheses; we are both getting lost in them.


I assume they meant to say "you are encouraged to use DVC to run your model and experiment pipeline". They want to encourage you to do this because they are trying to build a business around being a data science ops ecosystem. But the truth is that DVC is not a great tool for running "experiments" searching over a parameter space. It could be improved in that regard, but that's just not what I use it for, nor is it what I recommend it to other people for.

However, it's fantastic for tracking artifacts throughout a project that have been generated by other means, for keeping those artifacts tightly in sync with Git, and for making it easy to share those artifacts without forcing people to re-run expensive pipelines.


> But the truth is that DVC is not a great tool for running "experiments" searching over a parameter space.

Would love your feedback on what's missing there! We've been improving it lately - e.g.

- Hydra support https://dvc.org/doc/user-guide/experiment-management/hydra

- VS Code extension - https://marketplace.visualstudio.com/items?itemName=Iterativ...


Last I checked it wasn't easy to use something like optuna to do hyperparameter tuning with hydra/DVC.

Ideally I'd like the tool I use for data versioning (DVC/git-lfs/git-annex) to be orthogonal to the one I use for hyperparameter sweeping (DVC/Optuna/SageMaker Experiments), orthogonal to the one I use for configuration management (DVC/Hydra/plain YAML), and orthogonal to the one I use for experiment DAG management (DVC/Makefile).

Optuna is becoming very popular in the data-science/deep-learning ecosystem at the moment. It would be great to see more composable tools, rather than having to go all-in on a given ecosystem.
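
To make the composability point concrete, here is a minimal, illustrative Optuna sweep (my own sketch, not tied to DVC or Hydra); the objective function is a stand-in for a real training run:

    import optuna

    def objective(trial):
        # suggest a hyperparameter; a real setup would train a model here
        lr = trial.suggest_float("lr", 1e-5, 1e-1, log=True)
        return (lr - 1e-3) ** 2  # stand-in for a validation loss

    study = optuna.create_study(direction="minimize")
    study.optimize(objective, n_trials=20)
    print(study.best_params)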

Love the work that DVC is doing to tackle these difficult problems, though!


Big +1 about composability and orthogonality. I don't want one "do it all" tool, I want a collection of small tools that interoperate nicely. Like how you can use Airflow and DBT together, but neither tool really tries to do what the other one does (not that Airflow is "small", but still).


DVC is great for use cases that don't get to this scale or have these needs. And the issues here are non-trivial to solve. I've spent a lot of time figuring out how to solve them in Pachyderm which is good for use cases where you do need higher levels of scale or might run into merge conflicts with DVC. There's trade-offs though. DVC is definitely easier for a single developer / data scientist to get up and running with.


I think it's worth noting that DVC can be used to track artifacts that have been generated by other tools. For example, you could use MLFlow to run several model experiments, but at the end track the artifacts with DVC. Personally I think that this is the best way to use it.

However, I agree that in general it's best for smaller projects and use cases. For example, it still shares the primary deficiency of Make in that it can only track files on the file system, and not things like ensuring a database table has been created (unless you 'touch' your own sentinel files).
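
For reference, the sentinel-file workaround mentioned above looks roughly like this (a hypothetical example; the database and file names are made up): create the non-file artifact, then touch a marker file that DVC/Make can actually track as the stage's output.

    from pathlib import Path
    import sqlite3

    conn = sqlite3.connect("example.db")
    conn.execute("CREATE TABLE IF NOT EXISTS features (id INTEGER, value REAL)")
    conn.commit()

    # the pipeline tool only sees files, so this marker stands in for the table
    Path("features_table.created").touch()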


The alternative tool you are referring to is `Dud` I believe

DVC is the best tool (that I found) in spite of being dead slow and complex (trying to do many things).

What alternatives would you recommend?


What’s best if parallel step processing is required?


Yeah we had a lot of problems with things getting out of sync and we just got tired of it


I used Julia in a robotics project doing statistics/estimation/easy optimization but not deep learning. I also do ML/DL:

Julia vs. Python

- PyTorch is standard and it is hard to convince other people to switch

- long compile time on startup during deployment (not so good for a robot) but also for plotting; other people really hated this

Julia vs. C++

- Julia has a JIT and is MUCH faster than Python if you cannot write it as a sequence of numpy operations, e.g. if you have loops and if-blocks in the main loop; C++ obviously also shines here

- however, similar to Python, you can only detect problems in the code when running it - the linters etc. are not good enough; hence, I also fear changing even a few lines; in C++ this is much easier, and you have much more confidence that the code is correct if it compiles

After learning JAX in Python, which JIT-compiles numeric code, I have almost no reason to use Julia anymore. Of course, DifferentialEquations.jl and many optimization libraries are top notch.
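
As a small illustration (my own sketch, not from the project in question) of the kind of loop-and-branch numeric code meant above: under jax.jit, Python-level loops and if-blocks on traced values are expressed with lax.scan and jnp.where, and the whole function then runs as one compiled kernel.

    import jax
    import jax.numpy as jnp

    @jax.jit
    def clipped_cumsum(xs, threshold=1.0):
        # lax.scan stands in for the Python loop; jnp.where stands in for the if-block
        def step(carry, x):
            new = carry + jnp.where(x > threshold, threshold, x)
            return new, new
        _, out = jax.lax.scan(step, jnp.zeros(()), xs)
        return out

    print(clipped_cumsum(jnp.array([0.5, 2.0, 0.3])))  # [0.5 1.5 1.8]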


I guess the text was extracted using two different methods. One results in 0.8TB and the other in 0.5TB text.

1) I assume 1TB (not TiB) of uncompressed (?) text

2) I assume one character is one byte

3) I assume 5 (actually it seems to be 4.7 in English) characters per word

So 1 TB / (1 B per character) / (5 characters per word) = 1.0E12 / 5 = 2.0E11 = 0.2T = 200B words.
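
The same estimate as a one-line sanity check (using assumptions 1-3 above):

    bytes_of_text = 1.0e12                        # 1 TB of uncompressed text
    chars_per_word = 5                            # ~4.7 in English, rounded up
    words = bytes_of_text / 1 / chars_per_word    # 1 byte per character
    print(f"{words:.1e}")                         # 2.0e+11, i.e. ~200B words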

Your article mentioned that Chinchilla is trained on 1.4T tokens. So there is quite some difference.

The article also mentions different mysterious book data sets with 27B tokens, 560B tokens, or 390B tokens.

The latter datasets were made by Google. So you are still behind Google's massive book dataset even if you use probably the largest book dataset "available" to people or institutions outside of Google.

EDIT: I thought I made a mistake, but T stands for trillion or tera which are both 1E12.


This sounds like a very interesting area! I guess you are the "Statistical Process Monitoring"/"Control charts"/"Shewhart charts" [0] equivalent for images. Very cool!

Is this correct, or is your solution totally different? In what aspects is it most similar to and most different from "Control charts"?

Are there any keywords for interested Hacker News readers to research this further and play with this concept? Is it correct that you do "just" outlier detection on the embeddings of the images? I guess it works something like this:

1) Image --CNN--> Embedding: maybe enforce properties of the distribution of the embedding (something like a VAE)

2) Approximate this distribution and call a (sequence of) image(s) an outlier if its likelihood is small. Alternatively, compare the empirical distribution of a few collected images to a distribution of "good images", e.g. via embedding into an RKHS.
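
If the guess above is roughly right, a toy version of steps 1) and 2) might look like this (purely my sketch; the random vectors stand in for real CNN embeddings, and Mahalanobis distance stands in for whatever density model is actually used):

    import numpy as np
    from sklearn.covariance import EmpiricalCovariance

    rng = np.random.default_rng(0)
    good = rng.normal(size=(500, 128))   # embeddings of known-good images
    new = rng.normal(size=(10, 128))     # embeddings of incoming images

    # 2) approximate the "good" distribution and flag low-likelihood images
    cov = EmpiricalCovariance().fit(good)
    threshold = np.quantile(cov.mahalanobis(good), 0.99)
    print(cov.mahalanobis(new) > threshold)  # True marks a suspected anomaly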

What types of anomalies can be detected? Does it evaluate each image separately (i.e. it cannot differentiate between objects going from left to right), or does it "understand" short sequences of images? The latter sounds even more interesting. Could you provide some keywords for it?

On the production line, there are already cameras and computer vision products, e.g. Halcon. These can be used to "drag/drop" a computer vision pipeline together. Could your software be integrated into them such that the output can be further processed in Halcon etc.?

[0]: https://en.wikipedia.org/wiki/Control_chart

