This is where I was headed, but I think you said it better. Some kind of executive process monitoring the situation, the random stream of consciousness, and the actual output. Looping back around to outdated psychology, you have the ego, which is the output (speech); the superego is the executive process; and the id is the <think>internal monologue</think>. This isn't the standard definition of those three, but it's close enough.
I wonder if apps are abusing background app refresh to do this on iOS.
My understanding is that it isn't difficult to create a background task that can periodically make network requests. Just have a background task make an HTTP request, including some unique identifier, to some ad network server, then have the server handle IP geolocation.
While the accuracy won't be great on a lot of mobile networks, you can get pretty granular on Wi-Fi, as some ISPs allocate their IPs as granularly as a single neighborhood.
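A minimal sketch of what such a beacon could look like. Everything here is made up for illustration: the endpoint name, the device ID, and the toy prefix table (a real ad network would use a commercial GeoIP database server-side).

```python
import urllib.parse

# Hypothetical ad-network endpoint; the hostname is made up for illustration.
BEACON_HOST = "https://tracker.example-ads.net/beacon"

def build_beacon_url(device_id: str) -> str:
    """Client side: attach a stable unique identifier as a query parameter.
    The request body never needs explicit location -- the server just
    reads the connection's source IP."""
    query = urllib.parse.urlencode({"id": device_id})
    return f"{BEACON_HOST}?{query}"

# Toy server-side lookup: map an IP prefix to a coarse area.
# A real implementation would query a GeoIP database instead.
PREFIX_TO_AREA = {
    "203.0.113.": "Example Town / neighborhood A",
    "198.51.100.": "Example City / downtown",
}

def geolocate(ip: str) -> str:
    for prefix, area in PREFIX_TO_AREA.items():
        if ip.startswith(prefix):
            return area
    return "unknown"
```

The point is how little the client has to do: one periodic request with a stable ID, and all the location inference happens on the server.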
I disable background app refresh for almost all apps in anticipation of this and haven't had a degradation in app experience.
I noticed something when using 1Blocker on iOS, which creates a dummy on-device VPN to block tracker IP requests. After I turned off background app refresh, I noticed that the number of blocked requests went down a lot. While some were innocuous diagnostics, like Sentry, the vast majority were not.
I'd appreciate it if someone familiar with iOS development could weigh in on whether this would be practical or not, given all of the execution limits on background tasks.
> you can get pretty granular on wifi as some ISPs have their IPs as granular as a neighborhood
I’ve heard that this might be the case in some places in the USA. Meanwhile, I have not seen that level of granularity for residential IP addresses in Norway for example.
Has anyone done analysis on the MaxMind GeoIP data to see how the granularity of the data differs between different cities and countries and published anything about that online?
I'm in the US and my current IP address puts me in an area about 30 miles away currently. However, last year up until a few weeks ago my IP would place me in my current ZIP code (using ipinfo).
My city is comprised of several ZIP codes so you could have figured out where I live within a ~1.5 mile radius.
The granularity may not matter that much though. You can infer a fair bit of data. If you remove mobile network IP addresses, which tend to be quite vague here, you can sort of tell how often someone leaves the house, goes on vacation, or if they visit a friend/family member often.
>I'm in the US and my current IP address puts me in an area about 30 miles away currently. However, last year up until a few weeks ago my IP would place me in my current ZIP code (using ipinfo).
>My city is comprised of several ZIP codes so you could have figured out where I live within a ~1.5 mile radius.
How do you know that it accurately knows your location down to the ZIP code level, and not that your ZIP code just happened to match up? After all, a broken clock is right twice a day.
>The granularity may not matter that much though. You can infer a fair bit of data. If you remove mobile network IP addresses, which tend to be quite vague here, you can sort of tell how often someone leaves the house, goes on vacation, or if they visit a friend/family member often.
That might be useful for stalker-ish reasons, but it requires work to implement, and it's unclear why advertisers would care about this sort of stuff. You go to work 9-5 and visit your friends on weekends, how can you turn that into money? "people with a job and friends" isn't exactly a very lucrative marketing demographic.
Meanwhile, working on legitimate GPS requests in an app, my fiber optic ISP has the GPS of my IP about 2 streets up from where I live. I took a stroll and sure enough there's a big ol' grey communications box there.
What app actually needs background refresh? I suppose messaging (SMS, iMessage) and email, assuming you want those fetched asynchronously and not pulled on app open. Curious what you've found you left enabled or had to re-enable, because I agree with aggressively restricting apps.
"a significant amount of this geolocation dataset appears to be inferred by IP address to geolocation lookups, meaning the vendor or their source is deriving the user's geolocation by checking their IP address rather than by using GNSS [Global Navigation Satellite System]/GPS data. That would suggest that the data is not being sourced entirely from a location data SDK."
Probably it would use the location if the permission was enabled, and otherwise fall back to IP geolocation.
Real-time bidding is a privacy nightmare - basically spraying your actions in real-time to every ad provider, with a pinky promise that they won't abuse it.
I got very positive feedback from students preparing for high school keystone exams (given that it can generate questions from a given online resource / PDF) so I'm pretty happy about that!
Didn’t see it mentioned but good to know about: UCR matrix profile.
The Matrix Profile is honestly one of the most underrated tools in the time series analysis space - it's ridiculously efficient. The killer feature is how it just works for finding motifs and anomalies without having to mess around with window sizes and thresholds like you do with traditional techniques. Solid across domains too, from manufacturing sensor data to ECG analysis to earthquake detection.
The MP is so efficient that you can test ALL window lengths at once! This is called MADRID [a].
[a] Matrix Profile XXX: MADRID: A Hyper-Anytime Algorithm to Find Time Series Anomalies of all Lengths. Yue Lu, Thirumalai Vinjamoor Akhil Srinivas, Takaaki Nakamura, Makoto Imamura, and Eamonn Keogh. ICDM 2023.
I don't think it's being updated. The latest blog posts are from 2020, and the GitHub repos haven't seen commits in the last 5-6 years. MP has come a long way since then.
It slides each subsequence across the whole series, and then reports the distance to that subsequence's closest match.
This can detect both outliers (long distance from closest match) as well as repeated patterns (short distance).
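A naive sketch of that idea, assuming z-normalized Euclidean distance and an exclusion zone to skip trivial self-matches. Real implementations (STOMP/SCRIMP, or the stumpy library) use FFT-based tricks to do this far faster; this O(n²m) version just shows the definition.

```python
import math

def znorm(x):
    """Z-normalize a subsequence so matching is shape-based, not scale-based."""
    mu = sum(x) / len(x)
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x)) or 1.0
    return [(v - mu) / sd for v in x]

def matrix_profile(series, m):
    """Naive matrix profile: for each length-m subsequence, the distance
    to its nearest non-trivial match anywhere else in the series."""
    n = len(series) - m + 1
    subs = [znorm(series[i:i + m]) for i in range(n)]
    profile = []
    for i in range(n):
        best = math.inf
        for j in range(n):
            if abs(i - j) < m:  # exclusion zone: ignore trivial self-matches
                continue
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(subs[i], subs[j])))
            best = min(best, d)
        profile.append(best)
    return profile

# Repeating pattern with one injected anomaly at index 10.
series = [0, 1, 0, -1] * 6
series[10] = 5
mp = matrix_profile(series, 4)
```

Subsequences in the repeating region find an exact twin elsewhere (profile near 0, a motif); the subsequences covering the spike have no good match, so the profile peaks there (a discord).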
Gas mileage, safety, noise, cost, on top of the fact they already have tons of long range single prop drones kitted out with surveillance gear. Almost all of the videos look like predator style drones flying search grids. Maybe they have some large quadcopter drones in the air but I haven’t seen a single photo or video of anything that looks like one.
The military has all sorts of cool techniques to find things without using drones, which makes me wonder if it's a private party conducting a search. Or is this in order to make it deniable for the military - not _looking_ like a military op.
Yeah wouldn't they pull out all the stops if this were the case? Including running these drones during the day and all night (and making up a cover story)?
Working with OpenAI's models I've found a very good strategy is to have two passes if you can afford the extra tokens: one pass uses a heavy model and natural language with markdown sections discussing the reasoning and providing a final natural language answer (ideally labeled clearly with a markdown header). The second pass can use a cheaper and faster model to put the answer into a structured output format for consumption by the non-LLM parts of the pipeline.
You basically use JSON schema mode to draw a clean boundary around the wishy-washy language bits, using the LLM as a preprocessor to capture its own output in a useful format.
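A sketch of the two-pass split described above, with the model calls stubbed out. The `## Final Answer` header name and the stub functions are my assumptions, not an OpenAI convention; in practice pass 1 goes to a heavy model and pass 2 to a cheap model with JSON-schema mode enabled.

```python
def call_heavy_model(prompt: str) -> str:
    # Stub standing in for the expensive natural-language reasoning pass.
    return (
        "## Reasoning\n"
        "The user asked for 17 * 3. Multiplying gives 51.\n"
        "## Final Answer\n"
        "51\n"
    )

def extract_final_answer(markdown: str, header: str = "## Final Answer") -> str:
    """Pull the text under the clearly labeled answer header, so the
    cheap second pass only has to format a short span, not re-reason."""
    _, _, tail = markdown.partition(header)
    return tail.strip()

def call_cheap_model_as_json(answer_text: str) -> dict:
    # Stub for the structured-output pass; a real call would constrain
    # a fast model to a schema like {"answer": <string>}.
    return {"answer": answer_text}

result = call_cheap_model_as_json(extract_final_answer(call_heavy_model("17 * 3?")))
```

The labeled header is what makes this reliable: the second pass doesn't need to understand the reasoning, only to reformat the clearly delimited answer.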
It depends how fine-tuned the model is to JSON output.
Also, you need to tell the model the schema. If you don't, you will get more weird tokenization issues.
For example, if the schema expects a JSON key "foobarbaz" and the canonical BPE tokenization is ["foobar", "baz"], the token mask generated by all current constrained output libraries will let the model choose from "f", "foo", "foobar" (assuming these are all valid tokens). The model might then choose "foo", and then the constraint will force e.g. "bar" and "baz" as the next tokens. Now the model will see ["foo", "bar", "baz"] instead of ["foobar", "baz"] and will get confused. [0]
If the model knows from the prompt "foobarbaz" is one of the schema keys, it will generally prefer "foobar" over "foo".
[0] In modern models these tokens are related because of regularization, but they are not the same.
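The mismatch can be shown with a toy greedy longest-match tokenizer. The vocabulary and the key "foobarbaz" are the thread's example, not any real model's vocabulary.

```python
# Toy vocabulary: no single "foobarbaz" token exists.
VOCAB = {"f", "foo", "foobar", "bar", "baz"}

def greedy_tokenize(text: str) -> list[str]:
    """Longest-match-first tokenization, standing in for canonical BPE."""
    tokens = []
    while text:
        for k in range(len(text), 0, -1):  # try the longest prefix first
            if text[:k] in VOCAB:
                tokens.append(text[:k])
                text = text[k:]
                break
        else:
            raise ValueError(f"untokenizable: {text!r}")
    return tokens

canonical = greedy_tokenize("foobarbaz")  # what the model saw in training
# If the constrained sampler lets the model emit "foo" first, the grammar
# then forces "bar" and "baz" -- a token sequence the model rarely saw:
forced = ["foo", "bar", "baz"]
```

Both sequences decode to the same string, but only the canonical one matches the token statistics the model was trained on, which is why the prompt should name the schema keys up front.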
YMMV; it's a negative effect in terms of "reasoning", but the delta isn't super significant in most cases. It really depends on the LLM and whether your prompt is likely to generate a JSON response to begin with: the more you have to coerce the LLM, the less likely it is to generate sane output. With smaller models you more quickly end up at the edge of the space where the LLM has meaningful predictive power, and so the outputs start getting closer to random noise.
FWIW, this was measured by me using a vibes-based method: nothing rigorous, just a lot of hours spent on various LLM projects. I have not used these particular tools yet, but ollama was previously able to guarantee JSON output through what I assume are similar techniques, and my partner and I previously worked on a jsonformer-like thing for oobabooga, another LLM runtime tool.
The current implementation uses llama.cpp GBNF grammars. The more recent research (Outlines, XGrammar) points to potentially speeding up the sampling process through FSTs and GPU parallelism.
If you want to avoid startup cost, llguidance [0] has no compilation phase and by far the fullest JSON support [1] of any library. I did a PoC llama.cpp integration [2], though our focus is mostly server-side [3].
I can say that I was categorically wrong about the utility of things like instructor.
It’s easy to burn a lot of tokens, but if the thing you’re doing merits the cost? You can be a bully with it, and while it’s never the best, 95% as good for zero effort is a tool in one’s kit.
There was a paper going around claiming that structured outputs did hurt the quality of the output, but it turns out their experiment setup was laughably bad [0].
It looks like, so long as you're reasonable with the prompting, you tend to get better outputs when using structure.
I’ve seen one case where structured output was terrible: OCR transcription of handwritten text in a form with blanks. You want a very low temperature for transcription, but as soon as the model starts to see multiple blank sequences, it starts to hallucinate that “” is the most likely next token.
Same here. I noticed that when you ask the model to generate an elaborate response in natural text and then come up with an answer, the quality is orders of magnitude better, something in line with what you would expect from human-like reasoning.
Asking the LLM to directly generate JSON gives much worse results, similar to either a random guess or intuition.
Given that he has no formal qualifications in neuroscience or any related field, you shouldn’t trust him. The author of the book posted the thread. Anyone offering “actionable insights to fortify the brain against degenerative diseases and enhance cognitive performance” should have a prefix of “Dr.” with MD or PhD after their name, and peer-reviewed experiments as evidence. He is a “certified IT project manager” and is calling his personal anecdotes “experiments.”
Said Hasyim is a certified IT project manager and an author with a strong focus on productivity enhancement. He has dedicated significant time to self-experimentation and research in areas such as bio-hacks, willpower, lifestyle changes, neuroplasticity, and sleep routines, which have informed his writing and personal development strategies[1][2].
### Credentials and Career
- *IT Project Management*: Said Hasyim is certified in IT project management, which underscores his professional background in technology and management.
- *Author*: He has authored several books aimed at improving productivity and personal development. His works include the Peak Productivity book series, which covers topics such as optimizing one's body's clock, enhancing self-control, and leveraging brain plasticity for better memory and learning[2][3].
- *Entrepreneurship*: Beyond writing, Hasyim is also described as a serial tech entrepreneur[4].
### Publications
Said Hasyim's notable publications include:
- *Peak Human Clock*: A guide on optimizing daily routines around the body's natural rhythms to enhance productivity.
- *Peak Self-Control*: Focuses on building willpower to achieve significant life goals.
- *Peak Brain Plasticity*: Discusses methods to improve memory and cognitive function through neuroplasticity[2][3].
Said Hasyim's work is recognized for its practical strategies and actionable advice aimed at helping individuals maximize their productivity and unlock their potential.
To my knowledge, there isn’t a single study showing SRS as effective for language learning where it was an experimental variable.
There’s anecdotal evidence thrown about, which gives us some indication that it’s helpful. But I have doubts that it’s a good return on investment.
To avoid diving deep into long arguments about this or that, I’ll keep my advice short:
If you use an SRS, make sure that the item you test goes through the brain structures you want to get good at. Eventually reading can help with listening, but because you’re not processing the language through the typical brain structures that handle it, you’re delaying getting good until you’ve exercised those “muscles”.
Also, don’t learn words in isolation. It’s better to learn words in context. Better yet, vary the practice: maybe hook up an LLM to vary the cloze word, if that’s your cup of tea.
Use audio if possible. If you’re comfortable with the language, use a TTS.
These criticisms are all of text-only flashcards, not of spaced repetition itself. It is a common mistake to conflate the two, but it's perfectly easy to have videos, images, and audio on your flashcards in Anki.
Spaced repetition is a fundamental learning method that is attuned to how the brain works. It's not really tied to a specific method like flashcards. Rather, flashcards are merely the most logical and easy implementation of spaced repetition.
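Spaced repetition really is just a scheduling policy, independent of what's on the card. A minimal sketch of an SM-2-style scheduler makes that concrete; the constants here are illustrative, not Anki's exact defaults.

```python
def next_interval(interval_days: float, ease: float, passed: bool) -> tuple[float, float]:
    """Minimal SM-2-style update: a successful review stretches the gap
    multiplicatively; a lapse resets it to a short relearning gap.
    Constants are illustrative, not any tool's real defaults."""
    if passed:
        new_interval = max(1.0, interval_days * ease)
        new_ease = min(ease + 0.1, 3.0)   # card keeps getting "easier"
    else:
        new_interval = 1.0                # relearn from a one-day gap
        new_ease = max(ease - 0.2, 1.3)   # but never below a floor
    return new_interval, new_ease

# A card reviewed successfully four times in a row:
interval, ease = 1.0, 2.5
schedule = []
for _ in range(4):
    interval, ease = next_interval(interval, ease, passed=True)
    schedule.append(interval)
```

The exponentially growing gaps are the whole trick; whether each review is a text cloze, an audio clip, or a video still is entirely up to the card.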