Hacker News | rfurmani's comments

Completely agree on both counts! I loved those two games and felt Conquests of the Longbow didn't get the recognition it deserved.

On the second point, when I read his book (https://kensbook.com/) I was disappointed not to hear about the magic of the games themselves and the creative process behind them. It became clear that his primary goal was to grow a business; he thought being a game distributor was more exciting, but was then disrupted by Steam, shareware, and online distribution.


I'm building such tools at https://sugaku.net; right now it supports chatting with a paper and browsing similar papers. Generally arXiv and other repositories want you to link to them rather than embed their papers, which makes it hard to build inline reading tools, but supporting that for uploaded papers is on my roadmap. I'd love to hear if you have any feature requests there.


One feature could be automatically fetching the papers it refers to and feeding them through the LLM as well, perhaps applied recursively. This could give the AI a better overview of the related literature.


After I opened up https://sugaku.net to be usable without login, it was astounding how quickly the crawlers started. I'd like the site to be accessible to all, but I've had to restrict most of the dynamic features to logged in users, restrict robots.txt, use cloudflare to block AI crawlers and bad bots, and I'm still getting ~1M automated requests per day (compared to ~1K organic), so I think I'll need to restrict the site to logged in users soon.
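For what it's worth, a robots.txt that opts out of the major AI crawlers is one cheap first layer (only well-behaved bots honor it, which is exactly the problem; the user-agent names below are the commonly published ones):

```
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```

The gap between ~1M automated and ~1K organic requests is why this alone doesn't cut it, hence the Cloudflare blocking on top.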


Has anyone made a honeypot for AI yet?

Take regular papers, change their words or keywords to something outrageous, and watch the AI feed it to users.


This kinda fits, though it's on a personal blog level:

https://www.brainonfire.net/blog/2024/09/19/poisoning-ai-scr...


If there were a non-profit dedicated to doing this, I would donate.


One thing that worked well for me was layering obstacles.

It really sucks that this is the way things are, but here's what I did:

10 page requests in a minute and you get captcha'd (with a little apology and the option to bypass it by logging in). Asset loads don't count.

After a captcha pass, 100 requests in an hour gets you auth-walled.

It’s really shitty but my industry is used to content scraping.

This allows legit users to get what they need. Although my users maybe don’t need prolonged access ahem.
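The first tier is really just a counter per client. A toy sketch in shell (real deployments key this per IP or captcha token in the proxy or a store like Redis, and reset it every minute; everything here is illustrative):

```shell
# Toy first-tier rate limiter: captcha a client after 10 page requests in the window.
count=0

log_request() {
  count=$((count + 1))
  if [ "$count" -gt 10 ]; then
    echo "captcha"   # over the 10/minute threshold: serve the captcha page
  else
    echo "ok"        # under threshold: serve the request normally
  fi
}
```

The second tier (100/hour after a captcha pass) is the same counter keyed on the captcha token instead of the address.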


What happens if you use the proper rate-limiting status, 429? It can include a retry time via the Retry-After header [1]. I'm curious what (probably small) fraction of crawlers would respect it.

[1] https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/...
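For anyone wiring this up on the client side, a minimal sketch of honoring Retry-After (the response text here is made up):

```shell
# Parse the Retry-After value out of a 429 response before retrying.
response='HTTP/1.1 429 Too Many Requests
Retry-After: 7
Content-Length: 0'

# Pull the header value, case-insensitively, stripping any trailing CR.
wait=$(printf '%s\n' "$response" | awk 'tolower($1)=="retry-after:" {print $2}' | tr -d '\r')
echo "retrying in ${wait:-1}s"
```

Note that Retry-After can also be an HTTP date rather than a number of seconds, which a real client has to handle too.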


Probably makes sense for a B2B app where you publish status codes as part of the API.

Bad actors don't care, and annoying actors would make fun of you for it on Twitter.


I've wanted to but wasn't sure how to keep track of individuals. What works for you? IP addresses, cookies, something else?


I use IP addy. Users behind CGNAT are already used to getting a captcha the first time around.

There's some stuff you can do, like creating risk scores (if a user changes IP and uses the same captcha token, increase the score). Many vendors do that, as does my captcha provider.


> This allows legit users to get what they need.

Of course they could have just used the site directly.


If bots and scrapers respected the robots and tos, we wouldn’t be here

It sucks!


Or just buy cloudflare :)


What is your website?


Very cool! This is also one of my beliefs in building tools for research: if you can solve the problem of predicting and ranking the top references for a given idea, then you've learned to understand a lot about problem solving and decomposing problems into their ingredients. I've been pleasantly surprised by how well LLMs can rank relevance, compared to supervised training of a relevancy score. I'll read the linked paper (shameless plug: here it is on my research tools site: https://sugaku.net/oa/W4401043313/)


I'm serving AI models on Lambda Labs, and after some trial and error I found that a single vllm server plus caddy, behind Cloudflare DNS, works really well and is really easy to set up:

vllm serve ${MODEL_REPO} --dtype auto --api-key $HF_TOKEN --guided-decoding-backend outlines --disable-fastapi-docs &

sudo caddy reverse-proxy --from ${SUBDOMAIN}.sugaku.net --to localhost:8000 &


It's really best to avoid running web servers as root. It's easy to forward port 80 with iptables, change the kernel knob that lets unprivileged users bind port 80 and above, or set the network capability on the binary.

https://stackoverflow.com/questions/413807/
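For reference, the three options sketched as commands (the port and binary path are illustrative; pick one):

```
# 1. Redirect port 80 to the unprivileged port with iptables
sudo iptables -t nat -A PREROUTING -p tcp --dport 80 -j REDIRECT --to-port 8000

# 2. Lower the unprivileged-port floor kernel-wide (Linux 4.11+)
sudo sysctl -w net.ipv4.ip_unprivileged_port_start=80

# 3. Grant just the caddy binary the low-port bind capability
sudo setcap 'cap_net_bind_service=+ep' "$(command -v caddy)"
```

The capability route is the narrowest: nothing else on the box gets low-port access, but you have to re-run it whenever the binary is replaced.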


You can use Cloudflare Tunnel, which is even better and simpler than running an extra service.


As a former mathematician, I found research to be a very winding path. While that can be fun, I felt there's a lot of opportunity to train LLMs and ML models on the corpus of math papers, to try to make research more deliberate and less reliant on talking to the right person at the right time.

This is very much a work in progress but so far you can:

* Browse through similar papers

* Get recommendations for new papers and collaborators

* Chat with papers and ask questions to all the major reasoning models

* Have it come up with future paper ideas (along with references), given a potential title or collaborators.

My focus is very much on the exploratory stages, since that's where a lot of the time is spent, but I intend to integrate more tools for problem solving, writing, and computation.


I think you should have an "about us" section on your webpage if you want people to give you their email addresses. I already get loads of spam that knows my email address belongs to someone with a PhD (though they are often shaky on the details). I looked at your site and there's no information about who is doing it and why.


That's fair, though there's not much to say since I'm building it out myself as a benefit corporation. I also have strict opt-out for any communications and a proper privacy policy.

I've also tried to keep as much as I can accessible without login, but I want to protect some of the more expensive features from being spammed.

Without signing up you can:

* explore works (but not chat with them) https://sugaku.net/oa/W4206400500/

* explore authors https://sugaku.net/oa/A5059543195/

* see and share AI answers (e.g. https://sugaku.net/qna/4e59662a-a938-404e-8c0b-b9dc79e37c29/ and https://sugaku.net/qna/517930ff-42ad-47c5-9d9c-e807d06a8453/)

* prompt for new paper ideas https://sugaku.net/current/papergen/

* see and share these ideas https://sugaku.net/current/papergen/idea/719aed36-8dcd-4fd1-...


No matter how original you think you are, it's almost always been done already. You think you've found a new theorem, and then you check some old PDF from 20+ years ago and it's already there.

If you can pull it off, and the result is actually novel and not trivial, you can get a PhD. That is how hard it is.


The flipside of that is seeing hints of a result that would be really helpful. I still remember how excited I was to stumble on a book from 1931 (The Taylor Series by Dienes), since it had the only English-language proofs of some results by Szegő and Pólya that I felt could unblock my research. My hope is that this discovery problem can be largely solved.

This is also why I'm not as excited by the focus on pure reasoning and olympiad problem solving in the math and AI space. It's like the early career phase of trying to solve Collatz and Riemann but just repeating work from decades ago.


Wow, Gemini 2 has a 9 step process for the first question https://sugaku.net/qna/1d922e55-fd28-4c70-b3e9-d70c3a05ce1f/

> Here's how you can get exactly 6 liters of water using a 12-liter jug and a 6-liter jug:

> 1. Fill the 6-liter jug completely.

> 2. Pour the water from the 6-liter jug into the 12-liter jug. You now have 6 liters in the 12-liter jug.

> 3. Fill the 6-liter jug completely again.

> 4. Pour water from the 6-liter jug into the 12-liter jug until the 12-liter jug is full. Since the 12-liter jug already had 6 liters, you will pour 6 liters. This leaves you with 0 liters in the 6 liter jug.

> 5. Now empty the 12 liter jug.

> 6. Pour the water from the 6-liter jug into the 12-liter jug.

> 7. Fill the 6-liter jug completely again.

> 8. Pour the water from the 6-liter jug into the 12-liter jug until the 12-liter jug is full. Since the 12-liter jug already had 0 liters, you will pour 6 liters. This leaves you with 0 liters in the 6 liter jug.

> 9. There are 6 liters left in the 6 liter jug.


It really loves the sound of its own voice!


As for the Rs in strawberry, trying a bunch of models side by side, only Sky T-1 and Gemini 2 Flash got it wrong! https://sugaku.net/qna/792ac8cc-9a41-4adc-a98f-c5b2e8d89f9b/
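(The boring deterministic one-liner, for reference:)

```shell
# Count occurrences of "r" in "strawberry": grep -o emits one match per line
printf 'strawberry' | grep -o r | wc -l
```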

Simple questions like 1+1 can also be fun, since R1 goes overboard (as do some other models when you include a system prompt asking them to think) https://sugaku.net/qna/a1b970c0-de9f-4e62-9e03-f62c5280a311/

And if that fails you can ask for the zeros of the ζ function! https://sugaku.net/qna/c64d6db9-5547-4213-acb2-53d10ed95227/


Absolutely agree. There are some interesting articles in a recent [AMS Bulletin](https://www.ams.org/journals/bull/2024-61-02/home.html?activ...) giving perspectives on this question: what does it do to math if there's a strong theorem prover out there, in what ways can AI help mathematicians, and what is math, exactly?

I find that a lot of AI+Math work is focused on the endgame, where you have a clear problem to solve, rather than the early exploratory work where most of the time is spent. The challenge is in making the right connections and analogies, discovering hidden useful results, asking the right questions, and translating between fields.

I'm getting ready to launch [Sugaku](https://sugaku.net), where I'm trying to build tools for the above, based on processing the published math literature and training models on it. The kind of MR search you mentioned doing is exactly the sort of thing a computer should do for you. I can create an account for you and would love some feedback.


You're definitely not the only one! We [1] have been predominantly in-person in San Francisco, at first due to visa sponsorship rules, but also due to the energy you get at a fast-moving early-stage startup, allowing us to scale super fast.

We've had to constantly evaluate whether we're making the right decision, especially as we say no to really talented people who are remote, but some of our early engineers put their foot down and let us know that they chose us /because/ we were in person. And time and again we've seen that there are plenty of people like you who want to be around people and feel connected to everything happening, especially those looking at series A/B startups.

Frankly, it also just seems that you need to be a lot more rigid, focusing on specs, structure, and documentation, when remote-first, and that is not as fun when hacking. We are of course flexible about working from home, or traveling to see family or be in other places, but last time I was with family I definitely noticed that working over Zoom and Slack was way more exhausting.

[1] https://parafin.com/ https://www.linkedin.com/company/buildparafin

