Hacker News | 0b01's comments

Portuguese should have the flag of Brazil.

Don’t dim screen on iPhone during conversation.

The tutor should terminate the lesson when its goals are achieved and do a warm handoff.

Overall it’s quite good.


For me I like to use a tournament tracker.


If it doesn’t have speech to text, I ain’t listening to all that


Same way you write self-referential structs: use index types into whatever arena you're using. Indexes are usually 32-bit, so they're half the size of a 64-bit pointer and a bit faster to work with.

If you're building one-off trees, such as for parsing and AST transforms, bumpalo is your friend.

In your case, you can look into generational arenas and slabs, which are useful for graphs.
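To make the index-into-arena idea concrete, here's a minimal sketch (the `Arena` and `Node` names are mine, not from any crate): nodes refer to their children by `u32` index into a `Vec`, so there are no lifetimes or pointers to fight with.

```rust
// A tree node that refers to its children by index into a Vec arena
// instead of by reference. u32 indices are half the size of a 64-bit
// pointer, which also helps cache density.
struct Node {
    value: i32,
    children: Vec<u32>, // indices into Arena::nodes
}

struct Arena {
    nodes: Vec<Node>,
}

impl Arena {
    fn new() -> Self {
        Arena { nodes: Vec::new() }
    }

    // Allocate a node and return its index: a cheap, Copy handle.
    fn alloc(&mut self, value: i32) -> u32 {
        self.nodes.push(Node { value, children: Vec::new() });
        (self.nodes.len() - 1) as u32
    }

    fn add_child(&mut self, parent: u32, child: u32) {
        self.nodes[parent as usize].children.push(child);
    }

    // Walk the subtree rooted at `idx` by following indices.
    fn sum(&self, idx: u32) -> i32 {
        let node = &self.nodes[idx as usize];
        node.value + node.children.iter().map(|&c| self.sum(c)).sum::<i32>()
    }
}

fn main() {
    let mut arena = Arena::new();
    let root = arena.alloc(1);
    let a = arena.alloc(2);
    let b = arena.alloc(3);
    arena.add_child(root, a);
    arena.add_child(root, b);
    assert_eq!(arena.sum(root), 6);
}
```

The trade-off vs. `Rc`/`RefCell` is that "dangling" indices are a logic error rather than a compile error, which is what generational arenas fix.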


https://tail2.com

Continuous profiling for Linux. It uses eBPF to sample running programs and unwind their stacks, then uploads the results to a server so you know why it took 10 seconds to process that customer’s request last Christmas. :)

Currently supports x86-64 and ARM64

Contact: feel free to join the Discord linked on the website


The simplest mutual recursion is a Set-Reset flip flop circuit.
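For anyone who hasn't seen it, the SR latch really is two NOR gates, each taking the other's output as an input. A toy sketch (my own modeling choice, not from any textbook code): iterate the pair of mutually dependent equations Q = NOR(R, Q') and Q' = NOR(S, Q) until the feedback loop settles.

```rust
fn nor(a: bool, b: bool) -> bool {
    !(a || b)
}

// Toy SR latch: the two NOR gates feed each other, so we iterate the
// pair of equations to a fixed point. A few iterations suffice for
// valid (non S=R=1) inputs.
fn sr_latch(s: bool, r: bool, mut q: bool, mut q_bar: bool) -> (bool, bool) {
    for _ in 0..4 {
        let new_q = nor(r, q_bar);
        let new_q_bar = nor(s, q);
        if new_q == q && new_q_bar == q_bar {
            break;
        }
        q = new_q;
        q_bar = new_q_bar;
    }
    (q, q_bar)
}

fn main() {
    // Set: S=1, R=0 drives Q high.
    let (q, _) = sr_latch(true, false, false, true);
    assert!(q);
    // Reset: S=0, R=1 drives Q low, starting from the held state.
    let (q, _) = sr_latch(false, true, q, !q);
    assert!(!q);
}
```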


This still makes no sense at all. Self-hosting means running a program on your own machine instead of on a cloud platform such as Cloudflare.


There is an ongoing debate in the self-hosting subreddits about whether it counts as self-hosted when it's not on your own hardware.

The consensus seems to be yes for rented servers like Hetzner, but it gets more heated when it comes to "serverless".


So what do you want to call it then? "Running on a lower level general compute abstraction with provided networking infrastructure" instead of self-hosted?


Have you tried DuckDB? It’s the columnar version of SQLite.


Not yet, I might give it a try.


DuckDB is also great for stuff like this. You can replace a MapReduce cluster with a single SQL query.


I checked DuckDB and your statement appears to be untrue.

    >>> con.execute("CREATE TABLE passwords (hash TEXT, count INT)")
    <duckdb.DuckDBPyConnection object at 0x7fc7bceb55f0>
    >>> con.execute("CREATE INDEX ix_hash ON passwords (hash)")
    <duckdb.DuckDBPyConnection object at 0x7fc7bceb55f0>
    >>> con.execute("COPY passwords FROM 'pwned-passwords-sha1-ordered-by-hash-v8.txt' (SEPARATOR ':')")
    100%  
    100% 
It froze in an attempt to load the data. Nothing happens after it displays 100%.


CREATE INDEX currently has the restriction that the index must fit in memory [1]. As the data is already sorted, creating an index is not necessary anyway. The min/max indexes created automatically by the system are sufficient to complete the query in a few milliseconds.

  D CREATE TABLE passwords (hash TEXT, count INT);
  D COPY passwords FROM '~/Downloads/pwned-passwords-sha1-ordered-by-hash-v8.txt' (SEPARATOR ':');
  D .timer on
  D SELECT \* FROM passwords WHERE hash=upper('5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8');
  ┌──────────────────────────────────────────┬─────────┐
  │                   hash                   │  count  │
  │                 varchar                  │  int32  │
  ├──────────────────────────────────────────┼─────────┤
  │ 5BAA61E4C9B93F3F0682250B6CF8331B7EE68FD8 │ 9545824 │
  └──────────────────────────────────────────┴─────────┘
  Run Time (s): real 0.005 user 0.007455 sys 0.000584
[1] https://duckdb.org/docs/sql/indexes


based on the headline, it must be under 1 ms. love the table


I cannot even ssh into the server after trying to use DuckDB. It is completely dead (with all the ducks, what a misery).

The reason is probably that it's using a full index, in contrast with the sparse index in ClickHouse, and maybe it's trying to build it in memory, going to swap (the server has 32 GB memory).


Interesting - the database file looks ok, but the data is lost (the table is empty):

  ubuntu@ip-172-31-3-138:~$ ls -l
  total 69561648
  -rw-rw-r-- 1 ubuntu ubuntu 17631031296 Dec 16 23:57 my-db.duckdb
  -rw-rw-r-- 1 ubuntu ubuntu         326 Dec 16 23:53 my-db.duckdb.wal
  -rw-rw-r-- 1 ubuntu ubuntu 16257755606 Jan 21  2022 pwned-passwords-sha1-ordered-by-hash-v8.7z
  -rw-rw-r-- 1 ubuntu ubuntu 37342268646 Dec  2  2021 pwned-passwords-sha1-ordered-by-hash-v8.txt
  ubuntu@ip-172-31-3-138:~$ python3
  Python 3.10.6 (main, Nov 14 2022, 16:10:14) [GCC 11.3.0] on linux
  Type "help", "copyright", "credits" or "license" for more information.
  >>> import duckdb
  >>> con = duckdb.connect(database='my-db.duckdb')
  >>> con.execute("SELECT count(*) FROM passwords").fetchall()
  [(0,)]


Because DuckDB is ACID-compliant [1], data is loaded in an all-or-nothing manner. As the load was interrupted due to the system running out of memory, the table is expected to be empty.

[1] https://en.wikipedia.org/wiki/ACID


Curious: Are you affiliated with ClickHouse or any other Columnar DB project in any way? If so, you may want to add that as a disclosure.


Yes, I'm working on ClickHouse, here is my GitHub profile: https://github.com/alexey-milovidov

I'm also trying to follow every existing technology in the data engineering space :)


If you load the data properly (creating the index after insertion, which is definitely preferable in this case), it will load extremely quickly (milliseconds).

You should also disclose your relationship with a competing project. For the record, I use DuckDB in personal projects and love it. You seem to be misusing it. :)


have tested duckdb v0.6.0 2213f9c946

  4e17b76fc101c9db7222e0cd8d6f5eee  pwned-passwords-sha1-ordered-by-hash-v8.txt

  select count(*) from read_csv('pwned-passwords-sha1-ordered-by-hash-v8.txt', delim=':', header=False, columns={'Hash': 'VARCHAR', 'Count': 'INT'});
60.32s, 847223402 rows

  create table hashes as select * from ...
OOM :( set PRAGMA temp_directory

  create table ...
144.92s (83.19s on BATCH CREATE, 61.53s on READ CSV)

  select \* from hashes where Hash = 'F2B14F68EB995FACB3A1C35287B778D5BD785511'; -- secret123

  0.0269s -- 1st
  0.0043s -- 2nd
  0.0026s -- 3rd
  0.0062s -- 4th
  0.0047s -- 5th
edits: attempt to fix formatting


4. Cast to `&'static` and raw pointers (`*const`/`*mut`). Expert Rust hacker level.


`Box::leak()` gets you a `&'static` at runtime. No raw pointers and no `unsafe`, but also no way to free the memory (because `&'static`).

This is a technique I've used in anger: https://github.com/AS207960/xml-serde/pull/8
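A minimal sketch of the `Box::leak()` pattern (the `make_static` helper is mine, for illustration): the heap allocation is handed back as a `&'static` reference, safely, at the cost of never being freed.

```rust
// Box::leak turns a heap allocation into a &'static reference. Safe,
// no unsafe and no raw pointers, but the memory is never freed, so
// only do this for values that must live for the whole program.
fn make_static(s: String) -> &'static str {
    Box::leak(s.into_boxed_str())
}

fn main() {
    let config = make_static(format!("debug={}", true));
    // `config` now has type &'static str and can be stored anywhere,
    // e.g. in a global cache or a struct with no lifetime parameter.
    assert_eq!(config, "debug=true");
}
```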


I didn't read the pull request in detail, but won't leaking the fields lead to memory leaks when `xml-serde` is used in a long-running application?


This particular situation had to do with `&'static str`s baked into the program repeatedly getting compiled into `Regex`es at runtime. It wasn't possible to precompile these `Regex`es due to `serde` architectural limitations.

I chose to cache them at runtime by compiling each `&'static str` once and leaking the result to get a corresponding `&'static Regex`. This is a "leak" insofar as I can't ever release them, but it's leaking into a global cache, and it's bounded because the input strings can't ever be released either. There is a separate code path that handles dynamic strings, and that path still allocates and frees regexes after the changeset.
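The compile-once-and-leak cache looks roughly like this. This is a std-only sketch of the pattern, not the actual xml-serde code: I use a `String` as a stand-in for the compiled `Regex` so it runs without the regex crate, and `CACHE`/`cached_compile` are names I made up.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// Global cache of leaked values, keyed by the 'static input string.
// Bounded, because the set of possible keys is itself baked into the
// program.
static CACHE: Mutex<Option<HashMap<&'static str, &'static str>>> =
    Mutex::new(None);

// Compile once, leak the result so it can be handed out as &'static,
// and return the same leaked allocation on every later call.
fn cached_compile(pattern: &'static str) -> &'static str {
    let mut guard = CACHE.lock().unwrap();
    let map = guard.get_or_insert_with(HashMap::new);
    *map.entry(pattern).or_insert_with(|| {
        // Stand-in for the expensive Regex::new(pattern) step.
        Box::leak(format!("compiled({})", pattern).into_boxed_str())
    })
}

fn main() {
    let a = cached_compile("[a-z]+");
    let b = cached_compile("[a-z]+");
    // The second call hits the cache: same leaked allocation.
    assert!(std::ptr::eq(a, b));
    assert_eq!(a, "compiled([a-z]+)");
}
```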

