
Suggestion to the void: update the map so that it shows the counties and districts without power.

If you click on the state, you get a county view.

I had this exact same thought yesterday.

I’d go so far as to add one more layer to monitor this one, and then stop adding layers. My thinking is that this meta-awareness is all you need.

No data to back my hypothesis up. So take it for what it’s worth.


This is where I was headed, but I think you said it better. Some kind of executive process monitoring the situation, the random stream of consciousness, and the actual output. Looping back around to outdated psychology, you have the ego, which is the output (speech); the superego is the executive process; and the id is the <think>internal monologue</think>. This isn't the standard definition of those three, but close enough.

My thought on the same guess: do all tokens live in the same latent space, or in many spaces, with each logical unit trained separately from the others?

Do I understand correctly that these apps are able to bypass OS permissions of whether to allow location data?


I wonder if apps are abusing background app refresh to do this on iOS.

My understanding is that it isn't difficult to create a background task that can periodically make network requests. Just have the background task make an HTTP request, including some unique identifier, to an ad network server, then have the server handle IP geolocation.

While the accuracy won't be great on a lot of mobile networks, you can get pretty granular on wifi as some ISPs have their IPs as granular as a neighborhood.

I disable background app refresh for almost all apps in anticipation of this and haven't had a degradation in app experience.

I noticed something when using 1Blocker on iOS, which creates a dummy on-device VPN to block tracker IP requests. After I turned off background app refresh, I noticed that the number of blocked requests went down a lot. While some were innocuous diagnostics, like Sentry, the vast majority were not.

I'd appreciate if someone familiar with iOS development could weigh in on whether this would be practical or not, given all of the execution limits of background tasks.


> you can get pretty granular on wifi as some ISPs have their IPs as granular as a neighborhood

I’ve heard that this might be the case in some places in the USA. Meanwhile, I have not seen that level of granularity for residential IP addresses in Norway for example.

The MaxMind GeoIP databases include information about how accurate (granular) the location data is for each entry in their db according to https://support.maxmind.com/hc/en-us/articles/4407630607131-...

Has anyone done analysis on the MaxMind GeoIP data to see how the granularity of the data differs between different cities and countries and published anything about that online?
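That accuracy field is exposed directly by MaxMind's reader, so a per-entry granularity analysis is mostly a matter of iterating over a database. A minimal sketch with the geoip2 Python library (the database path and IP handling are my assumptions, not from their docs page above):

    import geoip2.database

    # GeoLite2-City.mmdb is the free city-level database; the path is an assumption
    reader = geoip2.database.Reader("GeoLite2-City.mmdb")

    def locate(ip):
        r = reader.city(ip)  # raises AddressNotFoundError for IPs not in the db
        # accuracy_radius is MaxMind's own granularity estimate, in kilometers
        return (r.location.latitude, r.location.longitude,
                r.location.accuracy_radius, r.postal.code)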


I'm in the US and my current IP address puts me in an area about 30 miles away currently. However, last year up until a few weeks ago my IP would place me in my current ZIP code (using ipinfo).

My city is comprised of several ZIP codes so you could have figured out where I live within a ~1.5 mile radius.

The granularity may not matter that much though. You can infer a fair bit of data. If you remove mobile network IP addresses, which tend to be quite vague here, you can sort of tell how often someone leaves the house, goes on vacation, or if they visit a friend/family member often.


>I'm in the US and my current IP address puts me in an area about 30 miles away currently. However, last year up until a few weeks ago my IP would place me in my current ZIP code (using ipinfo).

>My city is comprised of several ZIP codes so you could have figured out where I live within a ~1.5 mile radius.

How do you know that it accurately knows your location down to the ZIP code level, and not just that your ZIP code happened to match up? After all, a broken clock is right twice a day.

>The granularity may not matter that much though. You can infer a fair bit of data. If you remove mobile network IP addresses, which tend to be quite vague here, you can sort of tell how often someone leaves the house, goes on vacation, or if they visit a friend/family member often.

That might be useful for stalker-ish reasons, but it requires work to implement, and it's unclear why advertisers would care about this sort of stuff. You go to work 9-5 and visit your friends on weekends, how can you turn that into money? "people with a job and friends" isn't exactly a very lucrative marketing demographic.


Meanwhile, working on legitimate GPS requests in an app, I found that my fiber-optic ISP puts the GPS location of my IP about two streets up from where I live. I took a stroll and sure enough there's a big ol' grey communications box there.


You know, I'm totally okay with that.


What app actually needs background refresh? I suppose messaging (SMS, iMessage) and email, assuming you want those fetched asynchronously rather than pulled on app open. Curious what you’ve found you left enabled or had to re-enable, because I agree with aggressively restricting apps.


"a significant amount of this geolocation dataset appears to be inferred by IP address to geolocation lookups, meaning the vendor or their source is deriving the user's geolocation by checking their IP address rather than by using GNSS [Global Navigation Satellite System]/GPS data. That would suggest that the data is not being sourced entirely from a location data SDK."

Probably it would use the location if the permission was enabled, and otherwise fall back to IP geolocation.

Real-time bidding is a privacy nightmare - basically spraying your actions in real-time to every ad provider, with a pinky promise that they won't abuse it.


Pinky promises from scoundrels. With that group, asking for a pinky promise is pretty much an invitation to abuse it.


No. Wherever fine-grained location data is available, users granted it.

I don’t know why Candy Crush would require fine-grained data, but I am pretty confident CC doesn’t ask for it.


It's not even listed as a permission on the manifest, so it can't even request it: https://play.google.com/store/apps/details?id=com.king.candy...


These are almost guaranteed to be a waste of time for others, though, because you created them for yourself; perhaps less so for you.

Learning isolated facts via flashcards gives you the illusion of learning something. Most likely, when it comes time to apply it, it will not surface.


I got very positive feedback from students preparing for high school keystone exams (given that it can generate questions from a given online resource / PDF) so I'm pretty happy about that!


Didn’t see it mentioned but good to know about: UCR matrix profile.

The Matrix Profile is honestly one of the most underrated tools in the time series analysis space - it's ridiculously efficient. The killer feature is how it just works for finding motifs and anomalies without having to mess around with window sizes and thresholds like you do with traditional techniques. Solid across domains too, from manufacturing sensor data to ECG analysis to earthquake detection.

https://www.cs.ucr.edu/~eamonn/MatrixProfile.html
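To see how little setup it takes, here's a minimal sketch using the stumpy library (one of several Matrix Profile implementations; the library choice and the toy data are my own, and a window length m is still something you pick):

    import numpy as np
    import stumpy

    rng = np.random.default_rng(0)
    ts = np.sin(np.linspace(0, 40 * np.pi, 4000)) + 0.1 * rng.standard_normal(4000)
    ts[2000:2050] += 2.0  # inject an anomaly into the otherwise repetitive signal

    m = 100  # subsequence window length
    mp = stumpy.stump(ts, m)  # column 0: profile values, column 1: nearest-neighbor index

    profile = mp[:, 0].astype(float)
    print("discord (anomaly) at", np.argmax(profile))        # farthest from its nearest neighbor
    print("motif (repeated pattern) at", np.argmin(profile))  # closest to its nearest neighbor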


What do you mean you don't have to mess around with window sizes? Matrix profile is highly dependent on the window size.


The MP is so efficient that you can test ALL window lengths at once! This is called MADRID [a].

[a] Matrix Profile XXX: MADRID: A Hyper-Anytime Algorithm to Find Time Series Anomalies of all Lengths. Yue Lu, Thirumalai Vinjamoor Akhil Srinivas, Takaaki Nakamura, Makoto Imamura, and Eamonn Keogh. ICDM 2023.


Thank you for your kind words ;-)


MP is one of the best univariate methods, but it's actually mentioned in the article.


Thanks for sharing; I am most intrigued by the sales pitch. But the website is downright ugly.

This is a better presentation by the same folks. https://matrixprofile.org/


I don't think it's being updated. The latest blog posts are from 2020, and the GitHub repos haven't seen commits for the last 5-6 years. MP has come a long way since then.


I don’t think it’s the same people.


Ah, you are right. I got the link from the original URL, so I just assumed. Thanks for the correction.


What is the relationship? And why is there a foundation for something like this?


Are you being serious? The first page actually has information on it. You can add margins in the devtools.


Covered in section 6.2.1.


What does it do? Anything to do with matrices, like, from math?


It slides each subsequence across the full series (computed efficiently, convolution-style) and records the distance to its closest match. This can detect both outliers (long distance to the closest match) and repeated patterns (short distance).
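A naive version makes the definition concrete; this O(n^2) sketch is just for illustration (real implementations like STOMP/SCRIMP are far faster, and the exclusion-zone width below is a conventional choice of mine):

    import numpy as np

    def matrix_profile(ts, m):
        subs = np.lib.stride_tricks.sliding_window_view(ts, m).astype(float)
        # z-normalize each subsequence so matches are based on shape, not scale
        subs = (subs - subs.mean(axis=1, keepdims=True)) / (subs.std(axis=1, keepdims=True) + 1e-12)
        n = len(subs)
        mp = np.empty(n)
        for i in range(n):
            d = np.linalg.norm(subs - subs[i], axis=1)      # distance to every other subsequence
            d[max(0, i - m // 2): i + m // 2 + 1] = np.inf  # exclusion zone: skip trivial self-matches
            mp[i] = d.min()                                 # nearest-neighbor distance
        return mp  # high values = outliers/discords, low values = motifs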



I saw this hypothesis in another thread. Although interesting, I did wonder why a regular helicopter doing the same thing would not be sufficient.

I don’t know anything about such scans but that was my immediate rebuttal.

Like you, though, at least I find this explanation plausible.


Gas mileage, safety, noise, cost, on top of the fact that they already have tons of long-range single-prop drones kitted out with surveillance gear. Almost all of the videos look like Predator-style drones flying search grids. Maybe they have some large quadcopter drones in the air, but I haven’t seen a single photo or video of anything that looks like one.


The military has all sorts of cool techniques to find things without using drones, which makes me wonder if it's a private party conducting a search. Or is it in order to make it deniable for the military, by not _looking_ like a military op?


Yeah wouldn't they pull out all the stops if this were the case? Including running these drones during the day and all night (and making up a cover story)?


Has anyone seen how these constraints affect the quality of the output out of the LLM?

In some instances, I'd rather parse Markdown or plain text if it means the quality of the output is higher.


Working with OpenAI's models I've found a very good strategy is to have two passes if you can afford the extra tokens: one pass uses a heavy model and natural language with markdown sections discussing the reasoning and providing a final natural language answer (ideally labeled clearly with a markdown header). The second pass can use a cheaper and faster model to put the answer into a structured output format for consumption by the non-LLM parts of the pipeline.

You basically use JSON schema mode to draw a clean boundary around the wishy-washy language bits, using the LLM as a preprocessor to capture its own output in a useful format.
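Something like this, as an untested sketch (the model names, prompt wording, and schema are illustrative assumptions, not a recommendation of specifics):

    from openai import OpenAI

    client = OpenAI()
    report = "..."  # whatever document you are analyzing

    # Pass 1: heavy model reasons freely in markdown, with no format constraints.
    draft = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content":
                   "Analyze the report below. Reason step by step and finish "
                   "with a section titled '## Final Answer'.\n\n" + report}],
    ).choices[0].message.content

    # Pass 2: cheap model extracts just the answer into a strict schema.
    structured = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content":
                   "Extract the final answer from this analysis:\n\n" + draft}],
        response_format={"type": "json_schema", "json_schema": {
            "name": "answer", "strict": True,
            "schema": {"type": "object",
                       "properties": {"severity": {"type": "string"},
                                      "summary": {"type": "string"}},
                       "required": ["severity", "summary"],
                       "additionalProperties": False}}},
    ).choices[0].message.content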


It depends how fine-tuned the model is to JSON output.

Also, you need to tell the model the schema. If you don't, you will get weird tokenization issues.

For example, if the schema expects a JSON key "foobarbaz" and the canonical BPE tokenization is ["foobar", "baz"], the token mask generated by all current constrained-output libraries will let the model choose from "f", "foo", "foobar" (assuming these are all valid tokens). The model might then choose "foo", and the constraint will then force e.g. "bar" and "baz" as the next tokens. Now the model will see ["foo", "bar", "baz"] instead of ["foobar", "baz"] and will get confused [0].

If the model knows from the prompt "foobarbaz" is one of the schema keys, it will generally prefer "foobar" over "foo".
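You can inspect the canonical split yourself; a small sketch with tiktoken (the encoding name is my choice, and the exact split varies by tokenizer):

    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode("foobarbaz")
    print([enc.decode([t]) for t in tokens])  # the canonical split for this tokenizer
    # A constrained decoder that lets the model emit "foo" first forces a
    # different, non-canonical split that the model rarely saw in training.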

[0] In modern models these tokens are related because of regularization, but they are not the same.


YMMV, it's a negative effect in terms of "reasoning" but the delta isn't super significant in most cases. It really depends on the LLM and whether your prompt is likely to generate a JSON response to begin with; the more you have to coerce the LLM, the less likely it is to generate sane output. With smaller models you more quickly end up at the edge of the space where the LLM has meaningful predictive power, and so the outputs start getting closer to random noise.

FWIW this was measured by me using a vibes-based method, nothing rigorous, just a lot of hours spent on various LLM projects. I have not used these particular tools yet, but ollama was previously able to guarantee JSON output through what I assume are similar techniques, and my partner and I previously worked on a jsonformer-like thing for oobabooga, another LLM runtime tool.


We’ve been keeping a close eye on this as research comes out. We’re looking into improving sampling as a whole, on both speed and accuracy.

Hopefully with those changes we might also enable general structured generation, not limited to JSON.


Who is "we"?


I authored the blog with some other contributors and worked on the feature (PR: https://github.com/ollama/ollama/pull/7900).

The current implementation uses llama.cpp GBNF grammars. The more recent research (Outlines, XGrammar) points to potentially speeding up the sampling process through FSTs and GPU parallelism.
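For anyone curious what the feature looks like from the Python client, a short sketch (the model name is an assumption, and it requires a local Ollama server with that model pulled):

    from pydantic import BaseModel
    import ollama

    class Country(BaseModel):
        name: str
        capital: str

    resp = ollama.chat(
        model="llama3.2",
        messages=[{"role": "user", "content": "Tell me about Canada."}],
        format=Country.model_json_schema(),  # sampling is constrained to this schema
    )
    country = Country.model_validate_json(resp.message.content)  # parses, or raises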


If you want to avoid startup cost, llguidance [0] has no compilation phase and by far the fullest JSON support [1] of any library. I did a PoC llama.cpp integration [2], though our focus is mostly server-side [3].

[0] https://github.com/guidance-ai/llguidance [1] https://github.com/guidance-ai/llguidance/blob/main/parser/s... [2] https://github.com/ggerganov/llama.cpp/pull/10224 [3] https://github.com/guidance-ai/llgtrt


I have been thinking about your PR regularly, and pondering how we should go about getting it merged in.

I really want to see support for additional grammar engines merged into llama.cpp, and I'm a big fan of the work you did on this.


This looks really useful. Thank you!


Thank you for the details!


I can say that I was categorically wrong about the utility of things like instructor.

It’s easy to burn a lot of tokens, but if the thing you’re doing merits the cost? You can be a bully with it, and while it’s never the best, 95% as good for zero effort is a tool in one’s kit.


There was a paper going around claiming that structured outputs did hurt the quality of the output, but it turns out their experiment setup was laughably bad [0].

It looks like, so long as you're reasonable with the prompting, you tend to get better outputs when using structure.

0. https://blog.dottxt.co/say-what-you-mean.html


I’ve seen one case where structured output was terrible: OCR transcription of handwritten text in a form with blanks. You want a very low temperature for transcription, but as soon as the model starts to see multiple blank sequences, it starts to hallucinate that “” is the most likely next token.


Same here. I noticed that when you ask the model to generate an elaborate response in natural text and then come up with an answer, the quality is orders of magnitude better and more in line with what you would expect from human-like reasoning.

Asking the LLM to directly generate JSON gives much worse results, similar to either a random guess or intuition.


That was funny even if I don’t agree with the sentiment.


Who is this guy and why should we listen to him?


Given that he has no formal qualifications in neuroscience or any related field, you shouldn’t trust him. The author of the book posted the thread. Anyone offering “actionable insights to fortify the brain against degenerative diseases and enhance cognitive performance” should have a prefix of “Dr.” with MD or PhD after their name, and peer-reviewed experiments as evidence. He is a “certified IT project manager” and is calling his personal anecdotes “experiments.”


Perplexity says:

Said Hasyim is a certified IT project manager and an author with a strong focus on productivity enhancement. He has dedicated significant time to self-experimentation and research in areas such as bio-hacks, willpower, lifestyle changes, neuroplasticity, and sleep routines, which have informed his writing and personal development strategies[1][2].

### Credentials and Career

- *IT Project Management*: Said Hasyim is certified in IT project management, which underscores his professional background in technology and management.
- *Author*: He has authored several books aimed at improving productivity and personal development. His works include the Peak Productivity book series, which covers topics such as optimizing one's body's clock, enhancing self-control, and leveraging brain plasticity for better memory and learning[2][3].
- *Entrepreneurship*: Beyond writing, Hasyim is also described as a serial tech entrepreneur[4].

### Publications

Said Hasyim's notable publications include:

- *Peak Human Clock*: A guide on optimizing daily routines around the body's natural rhythms to enhance productivity.
- *Peak Self-Control*: Focuses on building willpower to achieve significant life goals.
- *Peak Brain Plasticity*: Discusses methods to improve memory and cognitive function through neuroplasticity[2][3].

Said Hasyim's work is recognized for its practical strategies and actionable advice aimed at helping individuals maximize their productivity and unlock their potential.

Sources

[1] Said Hasyim - Award-winning Author - indieBRAG https://www.bragmedallion.com/self-published-authors/said-ha...
[2] Said Hasyim - The Independent Author Network https://www.independentauthornetwork.com/said-hasyim.html
[3] Said Hasyim – Audio Books, Best Sellers, Author Bio | Audible.com https://www.audible.com/author/Said-Hasyim/B08MYDCMFP
[4] Said Hasyim: Serial Tech Entrepreneur, Author, and Lifetime Student https://www.saidhasyim.com
[5] Said Hasyim - Singapore | Professional Profile | LinkedIn https://sg.linkedin.com/in/saidhasyim


The importance of SRS is often overstated.

To my knowledge, there isn’t a single study showing SRS as effective for language learning where it was an experimental variable.

There’s anecdotal evidence thrown about, which gives us some indication that it’s helpful. But I have doubts that it’s a good return on investment.

To avoid diving deep into long arguments about this or that, I’ll keep my advice short: if you use an SRS, make sure the items you test go through the brain structures you want to get good at. Reading can eventually help with listening, but because you’re not processing the language through the typical brain structures that handle it, you’re delaying getting good until you’ve exercised those “muscles”.

Also, don’t learn words in isolation. Better is to learn the words in context. Better yet is to vary the practice; maybe hook up an LLM to vary the cloze word, if that’s your cup of tea (a sketch follows below).

Use audio if possible. If you’re comfortable with the language, use a TTS.
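That LLM-varied cloze idea could look something like this; an untested sketch where the model name, prompt, and helper function are all illustrative assumptions:

    from openai import OpenAI

    client = OpenAI()

    def cloze_variants(sentence, word, n=3):
        """Ask an LLM for n fresh sentences using `word`, then blank it out."""
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content":
                       f"Write {n} short sentences using the word '{word}', one per "
                       f"line, in contexts different from: {sentence}"}],
        )
        return [line.replace(word, "____")  # turn each sentence into a cloze card
                for line in resp.choices[0].message.content.splitlines()
                if word in line]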


These criticisms are all of text-only flashcards, not of spaced repetition itself. It is a common mistake to conflate the two, but it's perfectly easy to have videos, images, and audio on your flashcards in Anki.

Spaced repetition is a fundamental learning method that is attuned to how the brain works. It's not really tied to a specific method like flashcards. Rather, flashcards are merely the most logical and easy implementation of spaced repetition.
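For the curious, the scheduling core is tiny. A minimal sketch of SM-2, the classic algorithm that Anki's scheduler descends from (Anki uses a modified version, so treat this as the textbook form, not Anki's actual code):

    def sm2(quality, reps, interval, ease=2.5):
        """quality: 0-5 self-rated recall. Returns (reps, interval_days, ease)."""
        if quality < 3:          # failed recall: relearn from scratch
            return 0, 1, ease
        # ease factor drifts with how hard the recall felt, floored at 1.3
        ease = max(1.3, ease + 0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02))
        if reps == 0:
            interval = 1         # first successful review: see it again tomorrow
        elif reps == 1:
            interval = 6         # second: wait roughly a week
        else:
            interval = round(interval * ease)  # after that, intervals grow geometrically
        return reps + 1, interval, ease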

