Wanted to share my weekend project, Giggle. I started using Google's Programmable Search Engine recently and needed an easy way to use it. Let me know what you think!
I made a similar thing for myself, except that it parses Google's HTML response and therefore has no limits, except when a Captcha is sent, which the server detects, redirecting the client to another instance.
My code is a total messy hack, because I'm not really that good of a programmer yet.
I really didn't want to write this as an advertisement, lol (: I made it because my slow laptop can't handle Google's homepage (:
YaCy is also worth checking out; it's a peer-to-peer crawling network and search engine. It's really CPU- and RAM-hungry, though, so with that in mind (and not knowing Java), I wrote my script in C, with libxml2 for the HTTP/1.0 requests and parsing, and libmicrohttpd for serving results.
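Roughly, the shape of it is this (not my actual code, just a minimal sketch: the HTTP/1.0 fetch is left out, a canned page stands in for Google's HTML, and the function and variable names are made up):

    /* Sketch only: pull links out of (normally fetched) HTML with libxml2 and
     * serve them with libmicrohttpd (assumes libmicrohttpd >= 0.9.71 for
     * enum MHD_Result).
     * Build: gcc sketch.c $(xml2-config --cflags --libs) -lmicrohttpd */
    #include <microhttpd.h>
    #include <libxml/HTMLparser.h>
    #include <libxml/xpath.h>
    #include <stdio.h>
    #include <string.h>

    /* Stand-in for the HTML that would normally come from the HTTP/1.0 fetch.
     * The real thing also checks the fetched page for a Captcha wall and
     * 302-redirects the client to another instance when it finds one. */
    static const char *page =
        "<html><body><a href=\"https://example.org/\">Example</a></body></html>";

    static enum MHD_Result handle(void *cls, struct MHD_Connection *conn,
                                  const char *url, const char *method,
                                  const char *version, const char *upload_data,
                                  size_t *upload_size, void **con_cls)
    {
        char body[4096] = "Results:\n";

        /* Lenient HTML parse, then collect every <a href> via XPath. */
        htmlDocPtr doc = htmlReadMemory(page, (int)strlen(page), "google", NULL,
                                        HTML_PARSE_RECOVER | HTML_PARSE_NOERROR |
                                        HTML_PARSE_NOWARNING);
        xmlXPathContextPtr ctx = xmlXPathNewContext(doc);
        xmlXPathObjectPtr hrefs = xmlXPathEvalExpression(BAD_CAST "//a/@href", ctx);
        for (int i = 0; hrefs && i < hrefs->nodesetval->nodeNr; i++) {
            xmlChar *link = xmlNodeGetContent(hrefs->nodesetval->nodeTab[i]);
            strncat(body, (const char *)link, sizeof(body) - strlen(body) - 2);
            strcat(body, "\n");
            xmlFree(link);
        }
        xmlXPathFreeObject(hrefs);
        xmlXPathFreeContext(ctx);
        xmlFreeDoc(doc);

        struct MHD_Response *resp = MHD_create_response_from_buffer(
            strlen(body), body, MHD_RESPMEM_MUST_COPY);
        enum MHD_Result ret = MHD_queue_response(conn, MHD_HTTP_OK, resp);
        MHD_destroy_response(resp);
        return ret;
    }

    int main(void)
    {
        struct MHD_Daemon *d = MHD_start_daemon(MHD_USE_INTERNAL_POLLING_THREAD,
                                                7327, NULL, NULL, &handle, NULL,
                                                MHD_OPTION_END);
        if (!d)
            return 1;
        getchar(); /* serve http://localhost:7327/ until Enter is pressed */
        MHD_stop_daemon(d);
        return 0;
    }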
The code is in a git repo linked at the bottom of the page. Example instance: http://1426059603:7327/?q=search+engine+site%3Awikipedia.org
The interface is in Slovene, but the code and README are in English. The interface strings are preprocessor defines in src/i18n.h and can be edited.
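Something like this, for illustration (the macro names here are made up; the real ones in src/i18n.h may differ):

    /* src/i18n.h -- illustrative sketch only.
     * Translate the UI by editing these literals and recompiling. */
    #ifndef I18N_H
    #define I18N_H

    #define I18N_TITLE       "Iskalnik"     /* "Search engine" */
    #define I18N_SEARCH      "Išči"         /* "Search" */
    #define I18N_NO_RESULTS  "Ni zadetkov." /* "No results." */

    #endif /* I18N_H */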
I'm requesting the very lightweight WAP site that Google serves to the Nokia 6020 mobile phone. When I used that phone, I noticed Google sent it a more lightweight page, but even that was sometimes too much for the phone to handle.
Later on, I added h=yes&l=20 parameters for use with this specific phone: links get rewritten from HTTPS to HTTP (the phone doesn't support HTTPS) and results are limited to 20, because the phone can't handle a page that big.
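The rewrite itself is tiny, roughly along these lines (illustrative only, not the actual code):

    /* Illustrative only: turn an "https://" link into "http://" in place,
     * which is what the h=yes mode does so the Nokia 6020 can open links. */
    #include <string.h>

    static void drop_tls(char *url)
    {
        if (strncmp(url, "https://", 8) == 0)
            memmove(url + 4, url + 5, strlen(url + 5) + 1); /* drop the 's' */
    }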
Soon after I just went back to using a smartphone (:
Pretty high, given that this is exactly how I would scrape and resell Google results (e.g. for an SEO service) if I wanted to evade their automated abuse systems: by using stolen Google account credentials.
This is using a Google API the way it's meant to be used, and the API offers up to 100 free requests a day, after which you need to pay.
The odds of Google deciding to ban you for using their public API are presumably similar to the odds of them deciding to ban you for using their other public services. So maybe high enough that you wouldn't want to build a business plan around it, but I'm not sure why you're specifically worried that a user would get banned from the whole of Google for using a Google API.
That's not the API-using part of the code; it's the part that pretends to be a browser, logs in, opens the "custom search engines" control panel, and pulls the name+ID data. It's definitely not a public API.
"Custom Search JSON API provides 100 search queries per day for free. If you need more, you may sign up for billing in the API Console. Additional requests cost $5 per 1000 queries, up to 10k queries per day.
If you need more than 10k queries per day and your Programmable Search Engine searches 10 sites or fewer, you may be interested in the Custom Search Site Restricted JSON API, which does not have a daily query limit."
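For anyone curious, a query against that endpoint looks roughly like this. This is a sketch using libcurl purely for illustration (not necessarily what Giggle uses); the API key and engine ID are placeholders:

    /* Minimal sketch of a Custom Search JSON API call. Fill in your own API
     * key and engine ID (cx). Build: gcc cse.c -lcurl */
    #include <curl/curl.h>
    #include <stdio.h>

    int main(void)
    {
        const char *key = "YOUR_API_KEY";   /* from the Google API Console */
        const char *cx  = "YOUR_ENGINE_ID"; /* from Programmable Search Engine */
        char url[512];

        /* q must be URL-encoded; num caps a single request at 10 results.
         * The site-restricted variant lives at .../customsearch/v1/siterestrict. */
        snprintf(url, sizeof(url),
                 "https://www.googleapis.com/customsearch/v1"
                 "?key=%s&cx=%s&q=%s&num=10",
                 key, cx, "search+engine");

        curl_global_init(CURL_GLOBAL_DEFAULT);
        CURL *curl = curl_easy_init();
        if (!curl)
            return 1;
        curl_easy_setopt(curl, CURLOPT_URL, url);
        CURLcode rc = curl_easy_perform(curl); /* JSON body goes to stdout */
        curl_easy_cleanup(curl);
        curl_global_cleanup();
        return rc == CURLE_OK ? 0 : 1;
    }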
I counted my Google searches from yesterday in my browser history. 100/day may just be enough for me if I make sure I don't run the same search twice, because there are duplicates in the list.
Thanks for posting this. I should take this opportunity to say Giggle currently uses the non-site-restricted endpoint for queries but could easily be updated to use the site restricted one. Cheers!
How does this one compare with Whoogle?
As an example, on Whoogle I don't need to configure any account. Are the results with Giggle better in terms of speed and accuracy somehow?
Oh nice, I hadn't heard of Whoogle. Giggle requires a Google account to log into the Programmable Search Engine service and get a list of the account's Custom Search Engines. This is necessary because Google doesn't currently offer an API to do it.
As far as speed goes, I can't really say, because I haven't used Whoogle before. From what I've seen so far, result responses are pretty snappy. I will say the results could be considered more accurate, since you can filter in/out the sites you actually care about and curate site lists any way you want. They both use Google search, so I'm sure the underlying quality of the results is the same.
That could be another way to do it, for sure. I went this direction for my own needs, really: I didn't want to spend time manually managing the ID strings/engine names, and Puppeteer login seemed like a good fit. Since I run it locally, it feels pretty safe for me at the moment. I definitely get the concern, though. For what it's worth, the Puppeteer session is cached, so you could technically remove your credentials after the initial login and re-enter them any time it expires.
I really like how Google's extremely Javascripty login page can be "bypassed" with a browser. Really nice!
> A Google account without MFA - You'll need to inject its username and password as environment variables. If you're curious/concerned about how they're used, check out this file. Basically, Giggle uses Puppeteer to log in to Google in order to retrieve your custom search engine IDs. If you do not have an account without MFA, just go ahead and make a new one - MAKE SURE NOT TO TURN ON MFA!
Creating a new account without Mother Fucker Authentication does not help protect your "main" Google account. Google regularly bans all accounts associated with bad-acting accounts, and yours will surely be "associated".
I think that the idea is terrific, but the implementation is dangerous.
If you'd rather use serpapi.com as a backend, reach out to our support and we can add Ludicrous Speed to our free plan offerings for that usage. It's as close to perfect response times as it can be: https://serpapi.com/status