Hacker News | 0b01's comments

Portuguese should have the flag of Brazil.

Don’t dim screen on iPhone during conversation.

The tutor should terminate the lesson when its goals are achieved and do a warm handoff.

Overall it’s quite good.


For me I like to use a tournament tracker.


If it doesn’t have speech to text, I ain’t listening to all that


Same way you write self-referential structs: use index types into whatever arena you're using. Indexes are usually 32-bit, so they're half the size of a 64-bit pointer and a bit faster to work with.

If you're building one-off trees, such as for parsing and AST transforms, bumpalo is your friend.

In your case, you can look into generational arenas and slabs, which are useful for graphs.
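To make the index-into-arena idea concrete, here's a minimal sketch (the `Arena` and `Node` names are mine, not from any crate): nodes refer to their children by `u32` index into a `Vec`, so there are no lifetimes or pointers to fight with.

```rust
// A tree node that refers to its children by index into a Vec arena
// instead of by reference. u32 indices are half the size of a 64-bit
// pointer, which also helps cache density.
struct Node {
    value: i32,
    children: Vec<u32>, // indices into Arena::nodes
}

struct Arena {
    nodes: Vec<Node>,
}

impl Arena {
    fn new() -> Self {
        Arena { nodes: Vec::new() }
    }

    // Allocate a node and return its index: a cheap, Copy handle.
    fn alloc(&mut self, value: i32) -> u32 {
        self.nodes.push(Node { value, children: Vec::new() });
        (self.nodes.len() - 1) as u32
    }

    fn add_child(&mut self, parent: u32, child: u32) {
        self.nodes[parent as usize].children.push(child);
    }

    // Walk the subtree rooted at `idx` by following indices.
    fn sum(&self, idx: u32) -> i32 {
        let node = &self.nodes[idx as usize];
        node.value + node.children.iter().map(|&c| self.sum(c)).sum::<i32>()
    }
}

fn main() {
    let mut arena = Arena::new();
    let root = arena.alloc(1);
    let a = arena.alloc(2);
    let b = arena.alloc(3);
    arena.add_child(root, a);
    arena.add_child(root, b);
    assert_eq!(arena.sum(root), 6);
}
```

The trade-off vs. `Rc`/`RefCell` is that "dangling" indices are a logic error rather than a compile error, which is what generational arenas fix.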


https://tail2.com

Continuous profiling for Linux. It uses eBPF to sample running programs and unwind their stacks, then uploads the results to a server so you know why it took 10 seconds to process that customer’s request last Christmas. :)

Currently supports x86-64 and ARM64

Contact: feel free to join the Discord linked on the website


The simplest mutual recursion is a Set-Reset flip flop circuit.
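For anyone who hasn't seen it, the SR latch really is two NOR gates, each taking the other's output as an input. A toy sketch (my own modeling choice, not from any textbook code): iterate the pair of mutually dependent equations Q = NOR(R, Q') and Q' = NOR(S, Q) until the feedback loop settles.

```rust
fn nor(a: bool, b: bool) -> bool {
    !(a || b)
}

// Toy SR latch: the two NOR gates feed each other, so we iterate the
// pair of equations to a fixed point. A few iterations suffice for
// valid (non S=R=1) inputs.
fn sr_latch(s: bool, r: bool, mut q: bool, mut q_bar: bool) -> (bool, bool) {
    for _ in 0..4 {
        let new_q = nor(r, q_bar);
        let new_q_bar = nor(s, q);
        if new_q == q && new_q_bar == q_bar {
            break;
        }
        q = new_q;
        q_bar = new_q_bar;
    }
    (q, q_bar)
}

fn main() {
    // Set: S=1, R=0 drives Q high.
    let (q, _) = sr_latch(true, false, false, true);
    assert!(q);
    // Reset: S=0, R=1 drives Q low, starting from the held state.
    let (q, _) = sr_latch(false, true, q, !q);
    assert!(!q);
}
```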


This still makes no sense at all. Self-hosting means running a program on your own machine instead of on a cloud platform such as Cloudflare.


There is an ongoing debate in the self-hosting subreddits about whether it counts as self-hosted when it's not on your own hardware.

The consensus seems to be yes for rented servers like Hetzner, but it gets more heated when it comes to "serverless".


So what do you want to call it then? "Running on a lower level general compute abstraction with provided networking infrastructure" instead of self-hosted?


Have you tried DuckDB? It’s the columnar version of SQLite.


Not yet, I might give it a try.


DuckDB is also great for stuff like this. You can replace a MapReduce cluster with a single SQL query.


I checked DuckDB and your statement appears to be untrue.

    >>> con.execute("CREATE TABLE passwords (hash TEXT, count INT)")
    <duckdb.DuckDBPyConnection object at 0x7fc7bceb55f0>
    >>> con.execute("CREATE INDEX ix_hash ON passwords (hash)")
    <duckdb.DuckDBPyConnection object at 0x7fc7bceb55f0>
    >>> con.execute("COPY passwords FROM 'pwned-passwords-sha1-ordered-by-hash-v8.txt' (SEPARATOR ':')")
    100%  
    100% 
It froze in an attempt to load the data. Nothing happens after it displays 100%.


CREATE INDEX currently has the restriction that the index must fit in memory [1]. As the data is already sorted, creating an index is not necessary anyway. The min/max indexes created automatically by the system are sufficient to complete the query in a few milliseconds.

  D CREATE TABLE passwords (hash TEXT, count INT);
  D COPY passwords FROM '~/Downloads/pwned-passwords-sha1-ordered-by-hash-v8.txt' (SEPARATOR ':');
  D .timer on
  D SELECT \* FROM passwords WHERE hash=upper('5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8');
  ┌──────────────────────────────────────────┬─────────┐
  │                   hash                   │  count  │
  │                 varchar                  │  int32  │
  ├──────────────────────────────────────────┼─────────┤
  │ 5BAA61E4C9B93F3F0682250B6CF8331B7EE68FD8 │ 9545824 │
  └──────────────────────────────────────────┴─────────┘
  Run Time (s): real 0.005 user 0.007455 sys 0.000584
[1] https://duckdb.org/docs/sql/indexes


based on the headline, it must be under 1 ms. love the table


I cannot even ssh into the server after trying to use DuckDB. It is completely dead (with all the ducks, what a misery).

The reason is probably that it's using a full index, in contrast with the sparse index in ClickHouse, and maybe it's trying to build it in memory, going to swap (the server has 32 GB memory).


Interesting - the database file looks ok, but the data is lost (the table is empty):

  ubuntu@ip-172-31-3-138:~$ ls -l
  total 69561648
  -rw-rw-r-- 1 ubuntu ubuntu 17631031296 Dec 16 23:57 my-db.duckdb
  -rw-rw-r-- 1 ubuntu ubuntu         326 Dec 16 23:53 my-db.duckdb.wal
  -rw-rw-r-- 1 ubuntu ubuntu 16257755606 Jan 21  2022 pwned-passwords-sha1-ordered-by-hash-v8.7z
  -rw-rw-r-- 1 ubuntu ubuntu 37342268646 Dec  2  2021 pwned-passwords-sha1-ordered-by-hash-v8.txt
  ubuntu@ip-172-31-3-138:~$ python3
  Python 3.10.6 (main, Nov 14 2022, 16:10:14) [GCC 11.3.0] on linux
  Type "help", "copyright", "credits" or "license" for more information.
  >>> import duckdb
  >>> con = duckdb.connect(database='my-db.duckdb')
  >>> con.execute("SELECT count(*) FROM passwords").fetchall()
  [(0,)]


Because DuckDB is ACID-compliant [1], data is loaded in an all-or-nothing manner. As the load was interrupted due to the system running out of memory, the table is expected to be empty.

[1] https://en.wikipedia.org/wiki/ACID


Curious: Are you affiliated with ClickHouse or any other Columnar DB project in any way? If so, you may want to add that as a disclosure.


Yes, I'm working on ClickHouse, here is my GitHub profile: https://github.com/alexey-milovidov

I'm also trying to follow every existing technology in the data engineering space :)


If you load the data properly (creating the index after insertion, which is definitely preferable in this case), it will load extremely quickly (milliseconds).

You should also disclose your relationship with a competing project. For the record, I use DuckDB in personal projects and love it. You seem to be misusing it. :)


have tested duckdb v0.6.0 2213f9c946

  4e17b76fc101c9db7222e0cd8d6f5eee  pwned-passwords-sha1-ordered-by-hash-v8.txt

  select count(*) from read_csv('pwned-passwords-sha1-ordered-by-hash-v8.txt', delim=':', header=False, columns={'Hash': 'VARCHAR', 'Count': 'INT'});
60.32s, 847223402 rows

  create table hashes as select * from ...
OOM :( set PRAGMA temp_directory

  create table ...
144.92s (83.19s on BATCH CREATE, 61.53s on READ CSV)

  select \* from hashes where Hash = 'F2B14F68EB995FACB3A1C35287B778D5BD785511'; -- secret123

  0.0269s -- 1st
  0.0043s -- 2nd
  0.0026s -- 3rd
  0.0062s -- 4th
  0.0047s -- 5th
edits: attempt to fix formatting


4. Cast to `&'static` and raw pointers (`*const`/`*mut`). Expert Rust hacker level.


`Box::leak()` gets you a `&'static` at runtime. No raw pointers and no `unsafe`, but also no way to free the memory (because `&'static`).

This is a technique I've used in anger: https://github.com/AS207960/xml-serde/pull/8
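A minimal sketch of the `Box::leak()` pattern (the `make_static` helper is mine, for illustration): the heap allocation is handed back as a `&'static` reference, safely, at the cost of never being freed.

```rust
// Box::leak turns a heap allocation into a &'static reference. Safe,
// no unsafe and no raw pointers, but the memory is never freed, so
// only do this for values that must live for the whole program.
fn make_static(s: String) -> &'static str {
    Box::leak(s.into_boxed_str())
}

fn main() {
    let config = make_static(format!("debug={}", true));
    // `config` now has type &'static str and can be stored anywhere,
    // e.g. in a global cache or a struct with no lifetime parameter.
    assert_eq!(config, "debug=true");
}
```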


I didn't read the pull request in detail, but won't leaking the fields lead to memory leaks when `xml-serde` is used in a long-running application?


This particular situation had to do with `&'static str`s baked into the program repeatedly getting compiled into `Regex`es at runtime. It wasn't possible to precompile these `Regex`es due to `serde` architectural limitations.

I chose to cache them at runtime by compiling each `&'static str` once and leaking the result to get a corresponding `&'static Regex`. This is a "leak" insofar as I can't ever release them, but it's leaking into a global cache, and it's bounded because the input strings can't ever be released either. There is a separate code path that handles dynamic strings, and that path still allocates and frees regexes after the changeset.
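The compile-once-and-leak cache looks roughly like this. This is a std-only sketch of the pattern, not the actual xml-serde code: I use a `String` as a stand-in for the compiled `Regex` so it runs without the regex crate, and `CACHE`/`cached_compile` are names I made up.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// Global cache of leaked values, keyed by the 'static input string.
// Bounded, because the set of possible keys is itself baked into the
// program.
static CACHE: Mutex<Option<HashMap<&'static str, &'static str>>> =
    Mutex::new(None);

// Compile once, leak the result so it can be handed out as &'static,
// and return the same leaked allocation on every later call.
fn cached_compile(pattern: &'static str) -> &'static str {
    let mut guard = CACHE.lock().unwrap();
    let map = guard.get_or_insert_with(HashMap::new);
    *map.entry(pattern).or_insert_with(|| {
        // Stand-in for the expensive Regex::new(pattern) step.
        Box::leak(format!("compiled({})", pattern).into_boxed_str())
    })
}

fn main() {
    let a = cached_compile("[a-z]+");
    let b = cached_compile("[a-z]+");
    // The second call hits the cache: same leaked allocation.
    assert!(std::ptr::eq(a, b));
    assert_eq!(a, "compiled([a-z]+)");
}
```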

