More

toth · 2024-06-15T15:28:23 1718465303

> Are you just being overawed by IQ tests, which are notorious for measuring only ability to pass IQ tests?

People like to say things like this, but nothing could be further than the truth. There is a vast literature showing that IQ predicts things like job performance, school performance, income and wealth [1]. IQ is highly persistent across time for fixed individuals. Yes, "intelligence" is not a precisely defined concept, but that doesn't mean that it isn't real. A lot of useful concepts have some vagueness about, even "height" to take the example parodied in the OP.

And "super intelligence" is admittedly even vaguer, it just means sufficiently smarter than humans. If you do have a problem with that presentation just think of specific capabilities a "super intelligence" would be expected to have. For instance, the ability to attain super-human performance in a game (e.g., chess or go) that it had never seen before. The ability to produce fully functional highly complex software from a natural language spec in instants. The ability to outperform any human at any white-collar job without being specifically trained for it.

Are you confident that a machine with all those capabilities are impossible?

[1] https://en.wikipedia.org/wiki/Intelligence_quotient#Social_c...

card_zero · 2024-06-15T15:59:14 1718467154

I think those capabilities are made out of ideas. I think yes, a machine could have ideas, and then it would be a person, and an anticlimax.

ben_w · 2024-06-15T16:38:38 1718469518

> and then it would be a person, and an anticlimax.

It might be a person (but: can you prove those things are sufficient for personhood?), but even then it sure isn't a human person.

And what do you mean by an anticlimax?

To circumvent any question of what it takes to make an AI work, let's posit a brain upload. Just one person, so it's a memetic monoculture — no matter how many instances you make, they'll all have the same skills and same flaws.

Transistors are faster than synapses by about the same ratio to which a marathon runner is faster than continental drift, so even if the only difference is speed, not quality of thought, that's such a huge chasm of difference that I can't see how it would be an anti climax even if the original person is extraordinarily lazy.

toth · 2024-05-05T19:45:08 1714938308

You make a valid point, but I feel there is something in the direction the article is gesturing at...

The mean of the n-dimensional gaussian is an element of R^n, an unbounded space. There's no uninformed prior over this space, so there is always a choice of origin implicit in some way...

As you say, you can shrink towards any point and you get a valid James-Steiner estimator that is strictly better than the naive estimator. But if you send the point you are shrinking towards to infinity you get the naive estimator again. So it feels like the fact you are implicitly selecting a finite chunk of R^n around an origin plays a role in the paradox...

kgwgk · 2024-05-05T19:58:52 1714939132

> But if you send the point you are shrinking towards to infinity you get the naive estimator again.

You get close to it but strictly speaking wouldn’t it always be better than the naive estimator?

toth · 2024-05-05T20:41:58 1714941718

Right, it's a limit at infinity

rssoconnor · 2024-05-05T21:28:20 1714944500

> There's no uninformed prior over this space, so there is always a choice of origin implicit in some way...

You could use an uninformed improper prior.

kgwgk · 2024-05-06T05:49:41 1714974581

You would just need to come up with a way to pick a point at random uniformly from an unbounded space.

rssoconnor · 2024-05-06T12:49:55 1714999795

You can just use the function that is constantly 1 everywhere as your improper prior.

Improper priors are not distributions so they don't need to integrate to 1. You cannot sample from them. However, you can still apply Bayes' rule using improper priors and you usually get a posterior distribution that is proper.

kgwgk · 2024-05-06T13:30:36 1715002236

Sure.

The point is that you wrote that « you can pick any point […] » and when toth pointed out that « there is always a choice of origin implicit in some way » you replied that « you could use an uninformed improper prior. »

However, it seems that we agree that you cannot pick a point using an uninformed improper prior - and in any method for picking a point there will be an implicit departure from that (improper) uniform distribution.

rssoconnor · 2024-05-06T14:15:15 1715004915

Oh.

When I said "you can pick any point P", I meant universal quantification, i.e "for all points P", rather than a randomly chosen P.

I did say "choose P", which was pretty bad phrasing on my part.

toth · 2024-05-05T19:34:58 1714937698

Ditto.

I think that ship has sailed, but I think it's unfortunate that "ReLU(x)" became a popular notation for "max(0,x)". And using the name "rectified linear unit" for basically "positive part" seems like a parody, like insisting on calling water "dihydrogen monoxide".

BlueTemplar · 2024-05-06T09:07:23 1714986443

It hasn't "sailed" as long as they want to communicate with non-machine-learning people.

toth · 2024-04-30T20:36:19 1714509379

To me it sounded like part of the problem is that people on ECMO cannot leave the ICU because at any moment they might have a complication that requires immediate emergency care.

So it's not enough to make them smaller and cheaper, they also have to be made much less prone to these complications. I am sure that will happen in time, but I am also sure we'll be able to grow people new lungs in time

FireBeyond · 2024-04-30T20:45:35 1714509935

Critical care paramedic: that's very much the bigger issue. Some life flight helicopters are being fitted for ECMO and there is NOT much space in a helicopter, once you fit in two providers, a patient on a gurney and care equipment (most HEMS units are Bell 429s and EC/H-135s - MSP uses much larger AW-139s).

https://live.staticflickr.com/3142/2639039443_ba623ddca0_b.j... shows the working space on a -135. Note that access to most of the patient is heavily restricted - only chest and head, really.

Still, to be clear, we are not really at the 'portable' stage either. There's about 65lb of equipment needed for an ECMO patient just for the ECMO itself, beyond other things like Lifepaks for monitoring.

iancmceachern · 2024-05-01T11:09:43 1714561783

Thank yoi for what you do.

If anyone wants to design a portable, integrated ECMO system reach out.

toth · 2024-03-27T17:21:43 1711560103

Is github copilot using GPT-4 or 3.5? I've tried to find out for sure but I can't seem to find the information anywhere

mellosouls · 2024-03-27T17:27:32 1711560452

I think 3.5, that was the last official note.

Copilot Chat uses 4, but it's suspiciously free of confirmation that is also used in the more contextual Copilot (no-chat).

paradite · 2024-03-27T17:25:14 1711560314

GitHub Copilot uses OpenAI Codex, which is a much older model fined-tuned on GPT-3.

Definitely not GPT-4, otherwise it would not be less than $10 a month for constant usage.

csnweb · 2024-03-27T18:08:21 1711562901

The chat part (mostly) uses GPT-4, you can also see which model is called in the request logs. Here is the official announcement: https://github.blog/changelog/2023-11-30-github-copilot-nove...

paradite · 2024-03-27T18:14:32 1711563272

Okay thanks for pointing that out.

I figure if they do this, they have to throttle or nert it somehow since it is cheaper than ChatGPT Plus which also gives access to GPT-4.

csnweb · 2024-03-27T19:05:46 1711566346

It won’t answer questions that are not somehow related to code or computing, I usually don’t need anything else so I didn’t really test the limits of that so far.

toth · 2024-02-01T10:29:22 1706783362

I think matplotlib's main strength is its breadth and power. It really lets you do exactly what you want if you spend enough time fiddling and digging through the documentation.

All this versatility comes at the expense of ease of use. It could certainly do a better job of making the simple common use cases more straightforward.

gnuplot arguably has similar power and versatility and it does make the simple stuff easier.

One thing that matplotlib is IMO bad at is interactive plots. They are very slow, and the controls are not intuitive. 99% of the time you just want to zoom and pan and those should be default actions.

gnuplotlib looks interesting and I will have a look, but these days most of the plots I do are in jupyter notebooks and I really want inline interactive plots so I don't think I will use it much. FWIW, what I use currently is plotly - the interactivity is very good (way better than matplotlib's) and plotly.express is very easy to use for the simple use cases.

toth · 2024-01-29T11:38:33 1706528313

> The originally published equations were "20 or so" because one equation was written for each scalar component.

> Rewriting the equations in vector form reduces the number to the modern number.

And if you use the differential form or 4d tensor notation they get reduced to 1 equation. Of course, for a lot of practical problems this is not very useful and it's better to work with the 3d vector form.

> The variant with 4 equations is the simplified variant for vacuum, which is mostly useless, except for the purpose of studying the propagation of electromagnetic radiation in vacuum.

> Instead of learning a large number of simplified variants of the Maxwell equations with limited applicability, it would have been much better if a manual would present since the beginning the only complete variant that is always true, which must be in integral form, as initially published by Maxwell.

Here I have to strongly disagree. The version of Maxwell's equations that is fundamental and exactly correct [1] is the vacuum version. The ones with magnetization and displacement vectors are only approximations where you assume continuous materials that respond to fields in simple way. In truth, materials are made of atoms and are mostly vacuum: there is no actual displacement vector if you look close enough.

Also the vacuum Maxwell equations are useful in many scenarios. For instance, that's how you compute the energy levels of Hydrogen atom or how you derive QED. Also, you have to start from them to derive the macroscopic versions with magnetization and displacement that you seem to like.

[1] Well, up to non-linear quantum mechanic effects.

adrian_b · 2024-01-29T13:48:52 1706536132

Even the vacuum version is incomplete without adding an equation for force or energy, because no meaning can be assigned to the electromagnetic field or potential otherwise than by its relationship with the force or energy.

Even today, there exists no consensus about which is the correct expression for the electromagnetic force. Most people are happy to use approximate expressions that are known to be valid only in restricted circumstances (like when the forces are caused by interactions with closed currents, or the forces are between stationary charges).

Moreover, when the vacuum equations are written in the simplified form present in most manuals, it is impossible to deduce how they should be applied to systems in motion, without adding extra assumptions, which usually are not listed together with the simple form of the equations (e.g. the curl and the divergence are written as depending on a system of coordinates, so it is not obvious how these coordinates can be defined, i.e. to which bodies they are attached).

While the vacuum equations are fundamental, they may be used as such only in few applications like quantum mechanics, where much more is needed beyond them.

In all practical applications of the Maxwell equations you must use the approximation of continuous media that can be characterized by averaged physical quantities that describe the free and bound carriers of electric charge. The useful form of the Maxwell equations is that complete with electric polarization, magnetization, electric current of the free carriers and electric charge of the free carriers. It is trivial to set all those quantities to zero, to retrieve the vacuum form of the equations.

toth · 2024-01-29T17:37:30 1706549850

I agree that to fully specify electromagnetism you also need to include how the fields affect charged matter. So EM = Maxwell's equations + Lorentz force equation (not sure why you say there is no consensus about what this is, that is new to me).

This is just a matter of taste, but OTOH I would not include descriptions of how some materials respond to the fields in the continuous limit as part of a definition of EM.

It is true that for most terrestrial applications you do need those to do anything useful with EM. But if you want to study plasmas you need to add Navier-Stokes to EM, doesn't mean hydrodynamics is part of EM. To study charged black holes you need EM + GR, but it still makes sense to treat them as mostly separate theories.

neutronicus · 2024-01-29T19:38:54 1706557134

You also need to include how charged matter affects the forcing fields in Maxwell's equations (i.e. moving charges depositing a current field).

I actually basically agree with your viewpoint, I studied Plasma Physics in graduate school in a regime where we did _not_ use Navier-Stokes or constitutive relations and everything was in fact just little smeared-out packets of charge moving according to the Lorentz Force Law and radiating.

PaulHoule · 2024-01-29T12:56:36 1706532996

The fact that you can write it in one equation shows that the theory is very simple because it is an expression of symmetry. E and B are not these two different things related by an inscrutable cross product but just two aspects of the same thing.

scotty79 · 2024-01-29T17:11:44 1706548304

You could write all physics in a single simple equation. deltaW=0 Where deltaW is deviation of the universe from the relevant math.

Writing Maxwell's as 1 equation or 4 or more is just esthetic choice where you decide what to accentuate.

20 might be too much because three dimensions are not really different from each other so the notation that maps over them wholesale is probably a good idea.

4 equations seem perfect if you want to differentiate between classical effects of the electric field and relativistic effects (magnetism).

I don't know if single equation really shows that they really have the same source and the relativity is involved or is it just a matrix mashup of the 4 separate equations that doesn't really provide any insights.

toth · 2024-01-29T17:23:58 1706549038

It's true that you can always define notation to combine all equations you want into one. This means that, by itself, the observation that you can write Maxwell's equations as a single equation doesn't say anything very meaningful.

However, the notation that lets you do this in this specific case is very natural and not specific to Maxwell's equations. Differential forms are very natural objects in differential geometry, mathematicians would have likely introduced them and studied without inspiration from physics. The fact that Maxwell's equations are very simple in this natural geometrical language does say something meaningful about their nature and elegance, I think.

toth · 2024-01-29T11:28:49 1706527729

>You can make the electric field disappear by choosing the right gauge. Same goes for the magnetic field (can't make both disappear together though).

What? No you can't. The fields are invariant under gauge transformations.

ThePhysicist · 2024-01-29T12:37:39 1706531859

You're right, sorry I was thinking of a Lorentz transformation that would make either the magnetic or electric field disappear under certain conditions.

wch4999 · 2024-01-29T13:10:05 1706533805

"transform into each other" would be more appropriate. The gauge choice you mentioned is not totally wrong. The gauge freedom can be used to set the electric field to zero, but only once at a single point.

toth · 2024-01-29T17:17:27 1706548647

Sorry, but gauge transformations do not (by construction) affect the physical fields at all. You cannot set E to 0, even at a point, with a gauge transformation.

toth · 2024-01-04T15:41:21 1704382881

You don't need the look up table. All you are asked for is min/mean/max which can all be computed in one pass without storing the data. All you need is a hash table with 400 entries and 3 floats (running min, mean and max) and and int (count, for updating running mean). That's just 16 bytes, if you use 16 bytes for name of station you can fit everything under 16K.

IO will dominate the running time for this, and JSON parsing will be second.

refulgentis · 2024-01-04T16:00:39 1704384039

I don't have a CS background and when I eventually had to do interviews for Google, "Calculate mean/median/mode of temperatures" was the interview question I went with, intending to avoid BS leetcode*

I always worried it was too easy, but I'm heartened by how many comments miss the insight I always looked for and you named: you don't need to store a single thing.

I do wonder if it'll work here, at least as simply as the tracking vars you mention, with so many rows, and the implicit spirit of "you should be able to handle _all_ data", overflow might become a legitimate concern - ex. we might not be able to track mean as simply as maintaining sum + count vars.

* "oh sorry times up, but the right answer is a red black self balancing terenary tree recursive breadth-first search", ironically, was one of my Apple interviews.

toth · 2024-01-04T17:18:47 1704388727

In general, median and mode are much harder than min/avg/max. You can't compute the former with constant memory in one pass (you can do approximate median, but not exact median).

(Here there is a restricted range for the temperature with only 199 possible values (-99.9 to 99.9 with 0.1 increment) so you could do it constant memory, need something like 4*199 bytes per unique place name))

For the sum overflow is not an issue if you use 64-bit integers. Parse everything to integers in tenths of degree and even if all 1 billion rows are 99.9 temperature for same place name (worst possible casE), you are very far from overflowing.

refulgentis · 2024-01-04T20:45:55 1704401155

Putting this on top comment because I can't edit:

I am silly and wrote 'mode' and shouldn't have :P (wetware error: saw list of 3 items corresponding to leetcode and temperature dataset, my 3 were min/max/average, their 3 are mean/median/mode)

londons_explore · 2024-01-04T16:21:26 1704385286

Maintaining sum+count is more expensive than you'd imagine... Because to maintain the sum, you need to convert the numbers from ascii to decimal. And to do that you at least need to know the memory alignment of the number and the position of the decimal point.

All far more expensive than the state machine approach which is pretty much 'alignment doesn't matter, ascii doesn't matter, we're gonna just handle all possible states that could end up in a 32 bit register').

CamperBob2 · 2024-01-04T17:21:37 1704388897

Streaming calculation of the exact median with no storage at all is non-trivial at best in the general case, and I'm not aware of any way to calculate the mode at all. Any pointers to the methods you used in your interview answers?

If you came up with them on the fly, then... well, sign here for your bonus. Can you start Monday?

feoren · 2024-01-04T19:36:29 1704396989

> Streaming calculation of the exact median with no storage at all is non-trivial at best in the general case

It's not "non-trivial" it's impossible. I'm not sure why people think median can be approximated at all. You need to look at every data point and store a counter for the lesser of: (a) all possible values, or (b) all elements. Just consider a data set with 1 million ones and 999,999 zeroes. Throwing away (or failing to look at) one single number can give you an error of 50%, or 100% for two numbers. If you want to make it really nasty, throw in another million random 64-bit floats between [-0.1, 0) and a million between (1, 1.1]. Four million elements, two million of them unique, in the range [-0.1, 1.1], and failing to account for two elements can get your answer wrong by +/- 1.0.

Unless you start by sorting the list, but I wouldn't call that a "streaming" calculation.

CamperBob2 · 2024-01-04T19:47:06 1704397626

Yeah, that's why I hedged so heavily; AFAIK it's impossible to compute a streaming median (having spent some time trying).

If someone on HN knows how to do it, they will jump in and tell me exactly why it's "trivial," and I'll get some nice R&D work for free. Of course, it will probably involve mounting an FTP account, spinning up a CA and running rsync...

refulgentis · 2024-01-04T20:44:34 1704401074

> If someone on HN knows how to do it, they will jump in and tell me exactly why it's "trivial,"

n.b. to you and your parent comment's commenter: you're the only two who used the word trivial in the entire comments section modulo another comment not in this thread that contains "SQL in theory makes this trivial...". The HNer claiming to build a streaming median function in a weekend fantasy might not come to fruition :(

menaerus · 2024-01-05T12:35:28 1704458128

It certainly is possible to approximate the median: https://aakinshin.net/posts/p2-quantile-estimator/

A quick search shows that Boost.Accumulators have several median estimation implementations available to choose from: https://www.boost.org/doc/libs/1_84_0/doc/html/accumulators/...

> Median estimation based on the P^2 quantile estimator, the density estimator, or the P^2 cumulative distribution estimator.

feoren · 2024-01-05T19:12:59 1704481979

No, it is not possible to approximate the median in the general case. The linked paper does no such thing. Their algorithm can be completely defeated by carefully tailored inputs. They only ever test them against very well-behaved random distributions.

But it brings up an important point: real data that you actually encounter is not a random sampling from "the general case". It is often possible to do a pretty good job approximating the median from real-world data, if you have some understanding of the distribution of that data. You just have to accept the fact that you might be totally wrong, if the data behaves in ways you don't expect. But of course, the more you have to know about your data before running the algorithm, the less useful the algorithm itself is.

The difference is whether you can make guarantees or not. Most algorithm design is concerned about such guarantees: on all possible inputs, this algorithm has at worst such-and-such performance. Hence the reliance on Big-O. You cannot ever make such guarantees on a "median approximator" without specifying some sort of precondition on the inputs.

What guarantees do we want to make? With "an estimator", it'd be nice to say something like: we'd like our approximation to get better given more values. That is: the more values we see, the less the next single value should be able to change our approximation. If you've looked at a billion values, all between [min, max], it'd be nice if you knew that looking at the next one or two values could only have an effect of at most 1 / f(1 billion) for some monotonically increasing f. Median does not have that property: looking at just two more values (even still within the range of [min, max]) could move your final answer all the way from max to min. If you stopped 2 data points earlier, your answer would be as wrong as possible. This remains true, for some inputs, even if you've looked at 10^10^10^10^300 values. The next 2 might change your answer by 100%.

kadoban · 2024-01-05T04:55:19 1704430519

Yeah, you can do it in O(n) time, but you can't do it at all in a streaming fashion. I think you just plain need O(n) memory.

refulgentis · 2024-01-04T18:14:51 1704392091

I am silly and wrote 'mode' and shouldn't have :P (wetware error: saw list of 3 items corresponding to leetcode and temperature dataset, my 3 were min/max/average, their 3 are mean/median/mode)

londons_explore · 2024-01-04T16:17:35 1704385055

Merely finding the start/end of each line will use more computation (in the inner loop) than the approach I outlined. Let alone converting the number from ascii to a float, or looking up the place name in a has table (oh, and the name is variable length, so you're gonna need to find how long the name is first).

toth · 2024-01-04T17:40:38 1704390038

I deleted two previous comments because I realized I misunderstood your proposal. I understand it better now, but I am still confused about something...

Your state machine would need at least 2*160,000 states (you need an extra bit to flag whether you have reached a newline in the last word and need to increment a counter or not), correct? And you are assuming the input is 4 bytes, so won't your transition table need (2^32)*2*160,000 ~ 10^15 entries (at least 3 bytes each)?

londons_explore · 2024-01-04T18:40:50 1704393650

The states don't need to map 1:1 with cities or temperatures. They merely need to encode all information collected so far which is still relevant. They also don't need to represent all possible situations - anything that is super rare (eg. temperature of 95C) can simply be diverted to a special "invalid" state which triggers regular code to take over for those few entries.

toth · 2024-01-04T19:30:35 1704396635

Hmm, still doesn't seem feasible. Even if you only have 256 "relevant" states (which I think you'll agree is far less than what you need) then given a 32-bit input your state transition table is 2^32*256 = 1 Terabyte.

You could shrink your input size to 2 bytes but then you can't work on a word at a time, and for a realistic number of relevant states your transition table is still way bigger than you can fit in even L3 cache.

Unless I am missing something very basic, this doesn't seem like a viable approach.

usefulcat · 2024-01-04T15:59:54 1704383994

> IO will dominate the running time for this, and JSON parsing will be second.

Memory bandwidth might dominate, but probably not I/O. The input file is ~12GB, the machine being used for the test has 32GB, and the fastest and slowest of five runs are discarded. The slowest run will usually be the first run (if the file is not already cached in memory), after which there should be little or no file I/O.

nojvek · 2024-01-07T10:45:41 1704624341

Is there a way to validate from app whether all file pages are cached in memory?

What if the code was run against constraints such as Max memory limit in a docker container.

toth · 2024-01-04T15:48:14 1704383294

s/JSON/string/

Twirrim · 2024-01-04T20:59:20 1704401960

I took a few shots at this in rust (following on from https://www.reddit.com/r/rust/comments/18ws370/optimizing_a_...), and every single time the bottleneck was down to the string parsing, everything else was largely irrelevant.

You can do this whole thing in about 24 seconds, as long as you're smart about how you chunk up the text and leverage threading. https://github.com/coriolinus/1brc/blob/main/src/main.rs

toth · 2024-01-04T21:10:46 1704402646

Interesting, I guess that makes sense.

Having a quick look at your code, couple of thoughts:

   - You shouldn't bother with parsing and validating UTF-8. Just pretend it's ASCII. Non ASCII characters are only going to show up in the station name anyway, and all you are doing with it is hashing it and copying it.

   - You are first chopping the file into line chunks and then parsing the line. You can do it it one go, just look at each character byte by byte until you hit a semicolon, and compute a running hash byte by byte. You can also parse the number into an int (ignoring decimal point) using custom code and be faster than the generic float parser.

   - If instead of reading the file using standard library, you mmap it, should also speed things up a bit.

toth · on Dec 11, 2023

I think what you need is something that is at least as large as the largest Fibonaci number you are going to compute. Since the Fibonnaci numbers grow as `phi^n` where `phi` is the golden ratio (~= 1.618) if you want to use something of the form `A*b^n` you need `b >= phi` and to pick A to make it work out for small `n`. So, if you want integer `b` then 2 is the smallest you can do. But you could also do `b=1.7` or something like `A*17^n/10^n` if you want to use only integers.