spulec's comments

Yes, we've been improving our scraping technology for the past 8 years as we've worked on YipitData (the #1 provider of web data to Wall Street).

Python is the only supported language right now. Scrapy is an awesome project, but we have a very different approach. We strive to be Flask instead of Django.

If you want to, you can use ReadyPipe entirely in the browser through Jupyter notebooks instead of needing to set up a local environment. This is especially helpful with more complicated systems using Selenium and Puppeteer. We discuss a lot more of the features that differentiate us on our homepage (https://readypipe.com/) and in the docs (https://docs.readypipe.io/).

Feel free to reach out (email in profile) with additional questions.


YipitData, NYC, Full-time. Visas welcome.

Data Systems Engineer: http://yipitdata.com/jobs/#job-engineer

YipitData develops clever ways to learn about businesses from online data. We specialize in acquiring this difficult-to-obtain but extremely valuable information through modern, large-scale technology.

We are profitable, VC-backed, and our clients include the largest funds on Wall Street. Our goal is to predict quarterly financial performance and identify long-term inflection points for every public internet company. There are over 100 such companies today that create meaningful data, and 15 new ones go public every year.

We are building the premier destination for understanding data about important companies. We think this is the future (http://avc.com/2012/08/it-is-hard-to-hide-from-the-web/) of investment research.

YipitData:

- Python + Redis + MySQL

- Continuous integration with Travis CI and Buildbot

- Continuous deployment

- Cloud hosting in AWS with CloudFormation and Chef

- "Automate the boring stuff"

You:

- Conscientious about assumptions baked into a system

- Critical eye for small things that "seem off"

- Disdain for inefficient processes (you often find yourself complaining about elevator algorithms)

- Curiosity to dig deep into issues that you don't fully understand

Your first day will consist of:

- Fresh bagels (is there any other way to start a new job in NYC?)

- Unwrapping your new MacBook

- Your first commit (adding yourself to our about page)

- Your first production rollout (this is automatic since we practice continuous deployment)

- Introduction meetings with team members from across the company

- Learn about our different products by going through the codebase with another engineer

- Welcome drinks!

Your first week will involve training for the following:

- Overview of our infrastructure

- Introduction to Finance

- Web Extraction

- Introduction to Sell Side Research

- Introduction to Outsourcing

- Overview of our internal libraries that allow us to quickly develop new products

Within three months of joining, you will:

- Investigate and evaluate the potential for new data products

- Help write the technical spec for a new data product

- Take primary ownership for the development and maintenance of that product

- Configure the servers in the production environment for this product

- Work with data analysts to refine the resulting data from this product

- Start to contribute back to the shared libraries we use across products

- Pair program with data analysts on smaller projects

- Help in recruiting more data engineers

We offer a highly competitive salary, equity, and excellent benefits. YipitData promotes strong engineering and company cultures; candidates should be excited about being a part of a fast-growing start-up.

Send me an email if you are interested: steve@yipit.com


Agreed. We've had success using a very basic Ubuntu AMI and then having the user data install Chef and do a Chef run.

This results in slower startups for new machines, but the flexibility seems to be worth it. Eventually we'll get to a point where the system will just build a new AMI anytime a cookbook change is committed.


I'm curious about 2012-08 and 2012-11. Pet feature, hackathon, large release with hotfixes?


Refactorings to help make our codebase less of a giant hairball and more of a DAG. Big project that a lot of people helped out on and that I cared about a lot. Also it was relatively easy for me to pitch in cause most of the work was tedious but uncomplicated.


Hey Timothy, thanks for the response. You are correct. I'll add a warning to the library and work on a solution. Unfortunately, I think it will need to involve ctypes.


Have you tried patching the now() method onto datetime.datetime instead of replacing the whole class? If that doesn't work, you could replace datetime first:

    import freezegun; freezegun.monkey_patch()
    from datetime import datetime

After that, freeze_time would just set a flag on your datetime class.


Sadly, it's not possible. datetime is a C class, so it is immutable.
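For anyone who wants to see the limitation first-hand, here is a minimal sketch, assuming CPython (where datetime.datetime comes from the C extension module). The helper name try_patch_now is made up for illustration; it tries to assign a new now() onto the class and returns the error message instead of crashing.

```python
import datetime


def try_patch_now():
    """Try to assign a replacement now() onto datetime.datetime.

    Returns the TypeError message if the assignment is rejected,
    or None if it was (unexpectedly) accepted.
    """
    try:
        # Hypothetical replacement that would always return None
        datetime.datetime.now = classmethod(lambda cls, tz=None: None)
    except TypeError as exc:
        return str(exc)
    return None


error = try_patch_now()
print(error)  # the exact wording differs between Python versions
```

On CPython the assignment is rejected with a TypeError, which is why patching a single method onto the class is not an option.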


Correct. I've gone with his latter solution for now and added a warning about import order. I'm not very happy with this solution though, and will be spending some time with ctypes in the next few days to come up with something better.
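To illustrate why import order matters with the "replace the class" approach, here is a rough sketch. FakeDatetime is a hypothetical stand-in written for this example, not freezegun's actual implementation. A name bound before the swap keeps pointing at the real class, while anything looked up afterwards sees the fake one.

```python
import datetime as datetime_module

real_datetime = datetime_module.datetime


class FakeDatetime(real_datetime):
    """Hypothetical stand-in whose now() always reports a fixed moment."""

    frozen = real_datetime(2012, 1, 14)

    @classmethod
    def now(cls, tz=None):
        return cls.frozen


# Simulates a module that ran "from datetime import datetime" BEFORE the patch
early_reference = datetime_module.datetime

# Swap the module attribute, the way a class-replacing patch would
datetime_module.datetime = FakeDatetime
try:
    # Simulates a module that imports datetime AFTER the patch
    late_reference = datetime_module.datetime

    early_is_frozen = early_reference.now() == FakeDatetime.frozen
    late_is_frozen = late_reference.now() == FakeDatetime.frozen
finally:
    # Always restore the real class so nothing else is affected
    datetime_module.datetime = real_datetime

print(early_is_frozen, late_is_frozen)
```

The early reference still returns the real current time, so code that imported datetime before the patch is never frozen.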


Cool. We had looked at doing it that way, but decided that we wanted to have the flexibility of accessing the entire repo so that we can run things like "./manage.py validate" and quick unit tests.

A leftover pdb.set_trace() that snuck into production is the exact reason we first added this :)
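As a rough sketch of the kind of check a pre-commit hook could run against the staged files (this is not the actual hook discussed here; the function name find_set_trace and the regular expression are assumptions made for illustration):

```python
import re

# Matches calls such as "pdb.set_trace()" or "import pdb; pdb.set_trace()"
SET_TRACE = re.compile(r"\bpdb\.set_trace\s*\(")


def find_set_trace(source):
    """Return the 1-based line numbers that contain a pdb.set_trace( call."""
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if SET_TRACE.search(line)
    ]


sample = "def view(request):\n    import pdb; pdb.set_trace()\n    return 1\n"
print(find_set_trace(sample))  # [2]
```

A hook could refuse the commit whenever this returns a non-empty list for any staged file. Note that the sketch is a plain text match, so it also flags calls that appear in comments or strings.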


So wouldn't:

git checkout-index --prefix=../temp/ --all

work?


Absolutely. Stashing seemed like the cleaner choice to me, but either would work.


If you're still interested in NYC startups, check out yipit.com/jobs and email me (steve@yipit.com). We're hiring and have sponsored visas before.



Yeah, I really don't get it. Google is valuable because (the theory is) you buy advertising and you get more business and you make more money. So the more money you spend on Google, the more money you make. OK, I get it.

Facebook is valuable because it increases the (perceived) size of your social circle, and maybe you make more friends, maybe you get laid more. OK, I get it.

Dropbox: you get synced files? OK, that's useful obviously, but so what? It seems like at best a $100M business to me, not a $4B one. What's the angle here?


You must never have had a hard drive crash, or a laptop stolen, or had to share a 2 GB project with a friend.

2 things people pay for:

   - Convenience
   - Peace of Mind

I don't pay for Dropbox, but I'd just like to say that "peace of mind" is pretty valuable.

My friends use it to share pictures and music/movies. We don't have to think about it anymore. That's something.

Lastly, I'm a professional freaking software engineer and I don't know how to get files onto my Droid X. Then I remembered Dropbox has an app. Well what do you know; put file in folder; file appears on phone..! =)


New York, NY - Yipit

Just off raising $6 million, we are looking for the 12th member of the team (when we posted this last month, we were looking for our 8th).

Come join us on the ground floor of one of the best startups in New York. Right now, great companies like 10Gen, FourSquare, Hunch, SeatGeek, and YCharts are all here growing together. Silicon Alley is going through a renaissance and you can be part of it.

- UI Lead Architect: Our interface sits on top of over 350 daily deal services and is used by hundreds of thousands of people. We need you to own that interface.

- HTML5/CSS3/jQuery Developer: All user-facing activity relies on these technologies. We will commit the full resources of the team to supporting you.

- Python (Django) Developers: We work with the latest technology including: Amazon Web Services, RabbitMQ, Gunicorn, Nginx, and Git. This should excite you.

Go to http://yipit.com/about/jobs/ to apply. Email steve@yipit.com with any questions.


New York, NY - Yipit

Just off raising $6 million, we are looking for our 8th member of the team. Come join us on the ground floor of one of the best startups in New York. Right now, great companies like 10Gen, FourSquare, Hunch, SeatGeek, and YCharts are all here growing together. Silicon Alley is going through a renaissance and you can be part of it.

- UI Lead Architect: Our interface sits on top of over 350 daily deal services and is used by hundreds of thousands of people. We need you to own that interface.

- HTML5/CSS3/jQuery Developer: All user-facing activity relies on these technologies. We will commit the full resources of the team to supporting you.

- Python (Django) Developers: We work with the latest technology including: Amazon Web Services, RabbitMQ, Gunicorn, Nginx, and Git. This should excite you.

Go to http://yipit.com/about/jobs/ to apply. Email steve@yipit.com with any questions.

