Google Picks Diane Greene to Expand Its Cloud Business (nytimes.com)
175 points by scommab on Nov 19, 2015 | 90 comments



Google Compute has taken some awesome strides in the last year. I now prefer GCE over AWS because of pricing (per-minute billing) and simplicity. AWS can quickly get overwhelming and overly complex.


Not just Compute: Google Network, Cloud Storage, Container Engine (Kubernetes is open source, so you can run it internally too), Cloud Shell, Pub/Sub, and their new Cloud Console are awesome.


Pub/Sub is pretty meh. The Google Cloud team still don't seem to understand the need for a Kafka-like unified log service, unlike AWS (Kinesis is 2 years old this week) or even IBM Bluemix (who just launched Message Hub, which is true hosted Kafka).


Depending on the sort of logging you're looking for, there's a logging API (https://cloud.google.com/logging/docs/api/) and you can also stream to BigQuery.

(Disclaimer: I work on GCP)


This is the best, but I keep thinking that I will no longer need ELK, and yet I still do. Can you create a dashboard system that runs off of BigQuery?


One thing you gotta remember is that Pub/Sub charges per volume, regardless of speed (in other words, scaling is free). AWS will charge you rates that vary by orders of magnitude at different scales, in addition to volume.


What's meh about it versus Kinesis?


Cloud Pub/Sub is really a competitor to Amazon SQS, not Kinesis. It's more helpful to think of Kafka and Kinesis as databases containing first-class, immutable streams; writing to the streams and reading from the streams are completely decoupled, unlike in a traditional pub-sub system. Jay Kreps' blog post explains it better than I can:

https://engineering.linkedin.com/distributed-systems/log-wha...


In my limited Pub/Sub experience, this seems to be how it works. You publish to a topic (an immutable stream), and then create a decoupled subscription that reads messages from the topic. Am I missing something?


I think this sentence [1] helps to explain the difference:

> When you create a subscription, the system establishes a sync point. That is, your subscriber is guaranteed to receive any message published after this point.

[1] https://cloud.google.com/pubsub/subscriber

With Kafka or Kinesis, I can write events to a stream/topic completely independently of any consumer. I can then bring as many consumers online as I want, and they can start processing from the beginning of my stream if they want. If one of my consumers has a bug in it, I can ask it to go back and start again. That's what I mean by an immutable stream in Kafka or Kinesis.


Cloud Pub/Sub engineer here. You can create as many consumers as you want. You can create them offline and bring them up and down whenever you want. Each consumer will receive a full copy of the stream, starting with its sync point (subscriber creation). Each message is delivered, and redelivered, to each consumer until that consumer acks that message.

If I understand your point correctly, the only expectation we haven't matched is the ability to "go back and start again". We hear you.
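A toy, in-memory model may make the subscription semantics described above concrete. `ToyTopic` and its methods are hypothetical names, not the Cloud Pub/Sub client library; the sketch assumes only what the comment states: each subscription gets its own copy of every message published after the subscription was created (its sync point), and an unacked message keeps being redelivered.

```python
class ToyTopic:
    """Hypothetical toy model of Pub/Sub-style fan-out, not a real client API."""

    def __init__(self):
        self._next_id = 0
        self._pending = {}  # subscription name -> {message id: payload}

    def create_subscription(self, name):
        # The sync point: anything published before this call is never seen.
        self._pending[name] = {}

    def publish(self, payload):
        msg_id = self._next_id
        self._next_id += 1
        # Fan out a copy to every subscription that exists right now.
        for pending in self._pending.values():
            pending[msg_id] = payload
        return msg_id

    def pull(self, name):
        # Every message not yet acked is (re)delivered, oldest first.
        return sorted(self._pending[name].items())

    def ack(self, name, msg_id):
        self._pending[name].pop(msg_id, None)


topic = ToyTopic()
topic.create_subscription("early")
topic.publish("a")
topic.create_subscription("late")
topic.publish("b")
print(topic.pull("early"))  # [(0, 'a'), (1, 'b')]
print(topic.pull("late"))   # [(1, 'b')]  "a" predates this sync point
topic.ack("early", 0)
print(topic.pull("early"))  # [(1, 'b')]  "b" is redelivered until acked
```

Note there is no way in this model to ask for message 0 again once it has been acked, or to start a new subscription from before its creation; that is the "go back and start again" gap discussed in the replies.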


From your comment it sounds like you haven't used Kinesis or Kafka yourself - rather than take my word for it, I'd suggest your team give both of those platforms a serious try-out to really understand the capability gaps. I'd be surprised if a lot of your [prospective] customers weren't asking for these kinds of unified log capabilities in Cloud Pub/Sub.


We hear you.

Let me see if I'm understanding the criticism: when creating a consumer, the sync point of a new consumer really should start from the very beginning of the topic, at a predictable explicit start point, rather than at the current end of the topic. This makes a lot of sense, and yes, there is a disconnect between the models. We think the capabilities you are talking about are great and those use cases are important. All I can say is keep your eyes open.

We went with defaults from Google's internal use of Pub/Sub, which is older than the public release of Kinesis and Kafka. Internal use involves an approach where topics and consumers are very long-lived. Topics are high throughput, in terms of bytes published per unit time. Retaining all messages and starting consumers from the very beginning wasn't a sensible default; our focus was more centered on making sure that, once topics and consumers were set up, consumers could keep up over time.

One example use case to help illustrate this thinking is doing real-time sentiment analysis on tweets: https://www.youtube.com/watch?v=O3mfuc-syTI

In the work described by that video, they were essentially publishing tweets in real time into a Cloud Pub/Sub topic, thus making an "all tweets on Twitter in realtime" topic. This is a great example of a topic where producers and consumers are completely decoupled from each other. It doesn't necessarily make sense to retain all tweets forever by default (although there certainly are use cases for that). There are plenty of use cases where a consumer might want to say "ok, please start retaining all tweets made from here on out" rather than starting from a specific tweet.


Thanks for the detailed explanation jganetsk.

> when creating a consumer, the sync point of a new consumer really should start from the very beginning of the topic, at a predictable explicit start point, rather than at the current end of the topic

I'll talk about Kinesis because that's the technology we use more at Snowplow. When creating a Kinesis consumer, I can specify whether I want to start reading from a) TRIM_HORIZON (which is the earliest events in the stream which haven't yet been expired aka "trimmed"), b) LATEST which is the Cloud Pub/Sub capability, c) AT_SEQUENCE_NUMBER {x} which means from the event in the stream with the given offset ID or d) AFTER_SEQUENCE_NUMBER {x} which is the event immediately after c).

Kinesis streams or Kafka topics don't themselves care about the progress of any individual consumer - consumers are responsible for tracking their own position in the stream via sequence numbers / offset IDs.
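The four starting positions above can be sketched against a toy, in-memory shard. The iterator-type strings are the ones named in the comment, but `start_index` and `read_from` are hypothetical helpers, not the boto3/Kinesis API, and sequence numbers are small integers for readability. The reader, not the shard, decides where to begin, which is the point about consumers tracking their own position.

```python
from bisect import bisect_left


def start_index(seq_numbers, iterator_type, seq=None):
    """Index in a toy shard (sorted sequence numbers of untrimmed records)
    where a reader should begin for the given iterator type."""
    if iterator_type == "TRIM_HORIZON":
        return 0                    # oldest record not yet trimmed
    if iterator_type == "LATEST":
        return len(seq_numbers)     # only records written from now on
    if iterator_type not in ("AT_SEQUENCE_NUMBER", "AFTER_SEQUENCE_NUMBER"):
        raise ValueError(f"unknown iterator type {iterator_type!r}")
    if seq is None:
        raise ValueError(f"{iterator_type} needs a sequence number")
    i = bisect_left(seq_numbers, seq)
    if i == len(seq_numbers) or seq_numbers[i] != seq:
        raise KeyError(f"sequence number {seq} not in shard")
    return i if iterator_type == "AT_SEQUENCE_NUMBER" else i + 1


def read_from(records, iterator_type, seq=None):
    """records: list of (sequence_number, data), sorted by sequence number."""
    seqs = [s for s, _ in records]
    return records[start_index(seqs, iterator_type, seq):]


shard = [(100, "a"), (105, "b"), (110, "c")]
print(read_from(shard, "TRIM_HORIZON"))                # all three records
print(read_from(shard, "LATEST"))                      # []
print(read_from(shard, "AT_SEQUENCE_NUMBER", 105))     # [(105, 'b'), (110, 'c')]
print(read_from(shard, "AFTER_SEQUENCE_NUMBER", 105))  # [(110, 'c')]
```

A consumer that checkpointed 105 resumes with `AFTER_SEQUENCE_NUMBER`; one that had a bug can simply reread from `TRIM_HORIZON`, and the shard never needs to know either consumer exists.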

> It doesn't necessarily make sense to retain all tweets forever by default (although there certainly are use cases for that)

Completely agree. I think a good point of distinction between pub/sub systems and unified log is: use pub/sub when the messages are a means-to-an-end (which is feeding one or more downstream apps); use unified log when the events are an end-in-themselves (i.e. you would still want to preserve the events even if there were no consumers live).

Anyway, I could talk about this stuff all day :-) - if you'd like to chat further, my details are in my profile!


I'm not familiar with Kafka

1. Can you direct the consumer to a point in the stream? (Ideally time-based, i.e. messages from 16 Nov UTC.)

2. Can old events be auto removed defined by rules?


I haven't played with kafka in a while, but basically,

1. each group id represents a point in the stream that a consumer is processing off of. You could technically have multiple processes consuming off of a single group id.

2. If I remember correctly, there were configuration settings for how long to keep things as well as how much space to use, and basically there has to be: there's a pretty hard limit on what you can store on disk.

edit: changed consumer id to group id. If you want more info, feel free to ping me about the ecosystem
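The time- and size-based retention described above, plus the time-based "where do I start?" question, can be sketched for a single partition. Kafka does have topic-level `retention.ms` and `retention.bytes` settings, but `apply_retention` and `first_offset_at_or_after` are hypothetical helpers on a toy log. Real Kafka deletes whole log segments rather than individual records, so treat this as a simplification.

```python
def apply_retention(log, now_ms, retention_ms=None, retention_bytes=None):
    """Toy retention for one partition.

    log: list of (offset, timestamp_ms, payload_bytes) tuples, oldest first.
    Records older than retention_ms are dropped first; then the oldest
    remaining records are dropped until the total payload size fits in
    retention_bytes. Offsets are never renumbered.
    """
    kept = list(log)
    if retention_ms is not None:
        kept = [r for r in kept if now_ms - r[1] <= retention_ms]
    if retention_bytes is not None:
        total = sum(len(r[2]) for r in kept)
        start = 0
        while start < len(kept) and total > retention_bytes:
            total -= len(kept[start][2])
            start += 1
        kept = kept[start:]
    return kept


def first_offset_at_or_after(log, timestamp_ms):
    """First offset whose timestamp is >= timestamp_ms, assuming timestamps
    never decrease along the log; None if nothing is that recent."""
    for offset, ts, _payload in log:
        if ts >= timestamp_ms:
            return offset
    return None


log = [(0, 1000, b"aaaa"), (1, 2000, b"bb"), (2, 3000, b"cccc")]
print([r[0] for r in apply_retention(log, now_ms=3500, retention_ms=2000)])  # [1, 2]
print([r[0] for r in apply_retention(log, now_ms=3500, retention_bytes=6)])  # [1, 2]
print(first_offset_at_or_after(log, 1500))  # 1
```

A consumer group's committed offset is just a number like these; if retention removes the records it points at, the group can no longer resume from there.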


Take a look at PubSub + Google Cloud Dataflow combo.


There are some great aspects to Google Compute, but I hope this change results in catch up on features needed for corporate adoption compared to AWS.

The lack of a billing API and the lack of centralized management are really painful if you're trying to adopt it across an organization.


There is a billing API: https://cloud.google.com/billing/

There have been some recent introductions of account and key management that help too.


Excellent, thanks. Hopefully we'll see tools like Cloudability support it then.

I'll check it out again, really the frustrating management aspect was the lack of org oversight over multiple projects started within your domain.


Pricing is fine. Tech support is a joke. I wish I could pay more for Google to actually offer good support. We tried GCE for a while but it just didn't cut it in terms of mean response time to questions. AWS really leads on support.


We do have paid support tiers (https://cloud.google.com/support/) with explicit time-to-response targets. Which were you using?

Disclaimer: I work on GCE.


Well, tbh, we got answers, but they were usually "nope, not offered" or "cannot be fixed". Three weeks later I met a Google engineer working on GCE at a party, who fixed it for us the day after hearing about our issue, which suggests it could have been fixed the first time.


I will admit that the party support plan is a bit pricey.


Sounds like we should throw more parties ;).


I think this is a top priority customer ask; I'm running it up to our new brass.


I haven't used GCE but I've had one of the worst support experiences ever with Google Cloud Billing Support while using Google App Engine.

I had a very simple question about billing (why was my bill higher than it seemed like it should be). Each reply would take a week and they'd often consist of copy and paste messages asking me to enter information I had already supplied or requests that I take screenshots of my console (all information they should readily have available). Then right at the end they swapped out who I was talking to for someone else and asked me to look up more information they already had and just ignored my question in the last email which would mean another week before I got an answer.

I had, luckily, experimented with the configuration and figured out what was wrong. The default instance class in reality is one higher (F2) than the documentation says it is ("If you do not specify a class, F1 is assigned by default."). Nowhere on the Console does it list what instance class is being used (which would have made the problem obvious), so there was really no way of knowing this without just guessing what the problem was. They never did answer my question "What is the default instance class?" (instead they just abruptly ended the support ticket after I proposed my theory about what was wrong).

Then I started getting emails about a billing account being past due. It was an old billing account from before I moved to High Replication (I have no idea how I ended up with two billing accounts... it was during the dark time when the console was even worse than it is now). That billing account was assigned to no projects and had no outstanding balance. I jumped in and just deleted the unused billing account. Then a few days later they sent a scary email saying that the billing account had been terminated (even though I had deleted it), which made me scramble to make sure they hadn't closed my in-use billing account out of nowhere (thankfully they hadn't).

None of this has left me with any confidence in Google's Cloud offerings.

I plan to migrate off GAE as soon as I can rewrite the app (luckily it's not very big).


Can you say more about your experience with AWS support? My experience has been consistently quite bad. I've had to resort to AWS technical support perhaps a dozen times over the last few years. They're always slow to respond (even if you have an SLA promising otherwise), and it always takes half a dozen go-rounds to get to a resolution. They will stick with you until the issue is resolved, but they don't add much value along the way. It's a slow process of "did you do obvious thing X" (even if you've already provided facts indicating that couldn't be the problem), "please provide six different pieces of information" (that don't seem to bear on the actual question), and "I've never seen this before, let me go research it".


For the lowest support plan (Silver), high priority issues were addressed pretty quickly. Low priority ones are a bit slow, but hey, they're low priority. Plus the Silver support plan is very cheap.


Has anyone tried Cloud BigTable? Performance numbers are compelling but I'm not always sure where it fits in with the rest of the GCP storage options.


Bigtable is best thought of as an "event database". High reads, high writes, single index, accessible through the Hbase API. Cassandra and Hbase are similar technologies that are inspired by the original Bigtable paper.

One big benefit of Bigtable is its scalability. To scale up, you turn the 'scale' knob. By contrast, Cassandra and Hbase are headaches to scale (Apple has acquired Cassandra companies to aid in operation and scale).

Here's a couple of guys from Sungard, who scaled to about 3,000,000 writes per second with a couple weekends' worth of effort (something only few beyond the likes of Facebook, Netflix, and Apple can achieve) https://cloud.google.com/bigtable/pdf/SunGardCATCaseStudy.pd...
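The "single index" point can be sketched with a toy sorted table: rows are ordered by row key, so a point lookup or a scan over a contiguous key range (such as a prefix) is cheap, while a lookup by any other field would mean scanning every row. `ToyTable` is a hypothetical illustration of the data model, not the Bigtable or HBase client API, and the `sensor#timestamp` key layout is only an assumed example schema.

```python
from bisect import bisect_left, insort


class ToyTable:
    """Hypothetical single-index table: rows kept sorted by row key only."""

    def __init__(self):
        self._keys = []   # sorted row keys
        self._rows = {}   # row key -> value

    def put(self, row_key, value):
        if row_key not in self._rows:
            insort(self._keys, row_key)
        self._rows[row_key] = value

    def get(self, row_key):
        return self._rows.get(row_key)

    def scan_prefix(self, prefix):
        # Keys sharing a prefix are contiguous in sorted order, so the scan
        # starts with a binary search and stops at the first non-matching key.
        i = bisect_left(self._keys, prefix)
        out = []
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            out.append((self._keys[i], self._rows[self._keys[i]]))
            i += 1
        return out


table = ToyTable()
table.put("sensor42#2015-11-19T10:01", 21.5)
table.put("sensor42#2015-11-19T10:00", 21.0)
table.put("sensor7#2015-11-19T10:00", 19.0)
print(table.scan_prefix("sensor42#"))
# [('sensor42#2015-11-19T10:00', 21.0), ('sensor42#2015-11-19T10:01', 21.5)]
```

Because the row key is the only index, you design it around the read pattern; the `#` separator here keeps a scan for `sensor4` readings from also matching `sensor42`.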


Hey. I'm one of the "guys from SunGard", although I'm no longer there. The longer version is this: https://cloud.google.com/bigtable/pdf/ConsolidatedAuditTrail... . A lot of it is related to the use case, but yeah, Bigtable handled pretty much whatever we wanted to throw at it. No other cloud provider can offer this sort of scale and performance right now without a ton of manual management or significant compromises, something that seems to have yet to sink in (although few companies need the scale we went up to).

It did take a lot more work than "a couple weekends" though :).


I don't understand why the reporters hold Google in such high regard, particularly in the cloud business. Here is a statement from the article:

> "At the same time, analysts say, the company’s offerings in cloud development services — computing, storage, data analytics and others — are already comparable to Amazon’s."

AWS is far, far ahead of GC and is in no way comparable. Plus, the ecosystem around AWS has evolved and is much more stable. There are a lot more articles explaining "how to fix X" or "how to do Y" with AWS than with GC.

I also don't think Google will ever have the level of customer obsession that Amazon has. Your account got hacked? No worries, AWS will waive the fee, but I honestly don't think Google will ever do that.

Google is a technology company and might outrun Amazon in terms of technical superiority, but I don't think they can simply outrun Amazon in cloud business.


In the ones they mentioned, GCE vs EC2, GCS vs S3, and BigQuery vs Redshift, they seem pretty comparable to me.


GCE > EC2, GCS > S3, BigQuery > Redshift. Compute Engine costs less and gives you more FLOPS per buck than EC2. GCS has a unified storage interface across all the storage classes, which makes developers' lives easy, whereas Amazon has two different services (S3 and Glacier). With Redshift, you create a cluster of fixed size and pay for it whether you use it or not. With BigQuery, you just pay for what you use.
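The fixed-cluster versus pay-per-use difference can be framed as a break-even point: a fixed-size cluster costs the same every month, while pay-per-query cost grows with the data your queries scan. The prices below are placeholders supplied by the caller, not quoted rates for either service, and the model ignores storage, concurrency and performance.

```python
HOURS_PER_MONTH = 730  # average month, an approximation


def monthly_costs(tb_scanned, price_per_tb, cluster_hourly, hours=HOURS_PER_MONTH):
    """Return (pay_per_use_cost, fixed_cluster_cost) for one month."""
    pay_per_use = tb_scanned * price_per_tb
    fixed_cluster = cluster_hourly * hours  # billed whether or not queries run
    return pay_per_use, fixed_cluster


def break_even_tb(price_per_tb, cluster_hourly, hours=HOURS_PER_MONTH):
    """TB scanned per month at which both models cost the same."""
    return cluster_hourly * hours / price_per_tb


# Placeholder prices for illustration only.
print(monthly_costs(50, price_per_tb=5.0, cluster_hourly=2.0))  # (250.0, 1460.0)
print(break_even_tb(price_per_tb=5.0, cluster_hourly=2.0))      # 292.0
```

Below the break-even volume, pay-per-use wins; above it, a cluster kept busy can be cheaper, which is why the comparison depends on how bursty your workload is.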


I like to think that BigQuery is far, far ahead of Redshift in terms of performance + scale + manageability + cost (but I may be biased).


I don't have direct experience with Redshift, but I have heard some stuff. BigQuery has really variable performance for us though, not sure how it compares to Redshift.


But what about ECS? Lambda? Role Based Authentication? EBS snapshots?


GKE is miles ahead of ECS, and being based on Kubernetes is huge. We don't really have an equivalent to Lambda (yet?) but classic App Engine isn't actually too far off 'technology-wise' (pay per use, containerized, instant start).

Our lack of IAM is beyond painful. We're sorry. We're fixing it.

PD has had snapshots since Day 1; they're differential, fast and we even encourage people to use them for super-fast "rsync"!
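The idea behind differential snapshots can be sketched with a toy block device: each snapshot stores only the blocks written since the previous snapshot, and restoring replays the chain of deltas in order. `ToyDisk` is a hypothetical illustration of the general technique, not a description of how Persistent Disk is actually implemented.

```python
class ToyDisk:
    """Hypothetical block device with differential snapshots."""

    def __init__(self, num_blocks):
        self.blocks = [b""] * num_blocks
        self._dirty = set(range(num_blocks))  # first snapshot is a full copy
        self.snapshots = []                   # list of {block index: data}

    def write(self, index, data):
        self.blocks[index] = data
        self._dirty.add(index)

    def snapshot(self):
        # Store only blocks changed since the last snapshot.
        delta = {i: self.blocks[i] for i in sorted(self._dirty)}
        self.snapshots.append(delta)
        self._dirty.clear()
        return len(self.snapshots) - 1  # snapshot id

    def restore(self, snap_id):
        # Rebuild the image by applying every delta up to snap_id, oldest first.
        image = [b""] * len(self.blocks)
        for delta in self.snapshots[: snap_id + 1]:
            for i, data in delta.items():
                image[i] = data
        return image


disk = ToyDisk(4)
disk.write(0, b"x")
first = disk.snapshot()    # stores all 4 blocks
disk.write(2, b"y")
second = disk.snapshot()   # stores only block 2
print(disk.snapshots[second])  # {2: b'y'}
print(disk.restore(second))    # [b'x', b'', b'y', b'']
```

Because each new snapshot only copies what changed, taking them frequently stays cheap, which is what makes the "snapshot as fast rsync" trick plausible.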


GKE is why I am personally switching from AWS to GCP. I'm running Kubernetes on AWS at my current gig, but I'd rather not have to build and maintain the cluster myself if I don't have to.


Also I've found the network load balancer to be amazingly better than ELBs.


Haven't had a chance to try GKE, but I'll give it a whirl for my upcoming project.

With Lambda, it's the whole ecosystem around it that makes it better than App Engine. A file changes in S3 and you want to do something? Lambda, in a few simple lines of code.
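To make the "few simple lines" concrete, here is a minimal sketch of a Python handler for S3 event notifications. The `lambda_handler(event, context)` signature and the `Records[].s3.bucket.name` / `Records[].s3.object.key` fields follow AWS's S3 notification format; configuring the bucket to invoke the function is done separately, and `handle_object` is a hypothetical placeholder for whatever work you actually need.

```python
from urllib.parse import unquote_plus


def handle_object(bucket, key):
    """Hypothetical placeholder for the real work (resize, index, copy...)."""
    return f"s3://{bucket}/{key}"


def lambda_handler(event, context):
    processed = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Keys arrive URL-encoded in the notification (a space becomes '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        print(f"{record.get('eventName')}: s3://{bucket}/{key}")
        processed.append(handle_object(bucket, key))
    return {"processed": processed}
```

For a local check you can call it with a hand-built event such as `{"Records": [{"eventName": "ObjectCreated:Put", "s3": {"bucket": {"name": "my-bucket"}, "object": {"key": "photos/my+cat.jpg"}}}]}`, which yields `s3://my-bucket/photos/my cat.jpg`.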


Google Container Engine is far ahead of AWS ECS. Google does not have a counterpart to Lambda and IAM. Block storage is far better with Google Cloud (things like the ability to mount a drive read-only on multiple instances simultaneously make sharing data a breeze; with AWS EBS we sometimes ran into weird race conditions related to inode pointers being modified). And yes, it supports snapshots.


Reading this I just want them all to drop the incomprehensible acronyms.


Even the full terms probably wouldn't make sense unless you've used them. Elastic Block Store: I surely didn't get what it was supposed to do when I first read it aloud.

Elastic Container Service - "Why the heck is it Elastic!?" was my reaction the first time I read the term.


There is no Elastic Container Service. ECS stands for EC2 Container Service.

I realise this isn't any better :-)


Market share difference must be something like 10x


Diane was 43 when she founded VMware, and 56(?) when she started Bebop. That's encouraging. Does anybody have details about her background in the years 1978 to 1998?


http://www.theguardian.com/technology/2008/jan/11/computing

> What is known about her life is that she grew up in Annapolis, on the coast of Maryland, in a house on the shores of the Chesapeake Bay. Her father was an engineer and her mother a teacher. It was on the north-eastern seaboard that she developed a passion for water sports, especially sailing and later windsurfing. She helped to organise the first windsurfing world championship in 1974 and two years later won the women's national double-handed dinghy championship.

> Her love of the sea influenced her choice of college education after she studied mechanical engineering at the University of Vermont. She moved to MIT to study naval architecture before a brief spell working for an oil consultancy based in San Francisco. She left that job relatively quickly to go to Hawaii to design windsurfing equipment, but returned to the US a few years later to study computer science at Berkeley. She worked for a succession of Silicon Valley stalwarts: Sybase, Silicon Graphics and Tandem. But her first big break came with the founding of her own media streaming business, VXtreme, in the early days of the dotcom boom. It was sold for a rumoured $75m in 1997.


She needs to write a book. Interesting life...


She gave a talk at Startup School 2013: https://www.youtube.com/watch?v=zSEeFxq2X_c


She did indeed, I was there and she was not only intelligent and knowledgeable but probably the most endearing speaker on the day.


I'm glad to hear that "the cloud" is becoming less likely to mean "the Amazon cloud". Monocultures are dangerous, and I doubt that any monoculture is quite as dangerous -- or at least as unexpected -- as a monoculture for providing distribution and resilience.


Relational databases are one key area where Google Cloud can improve. Adding support for the most-used databases (MariaDB, Postgres, and maybe Google's own SQL-like database, along the lines of AWS Aurora, to support large data stores) and easing the migration path for companies moving their databases into the cloud would ease a lot of customer pain and could considerably accelerate migration to the cloud. It is very difficult for companies to move to the cloud and adopt NoSQL stores simultaneously.



I know. Compared to Amazon's offering of MySQL, MariaDB, Aurora, SQL Server and Oracle, Google's MySQL offering doesn't look so good. It's hard to persuade everyone to migrate to MySQL; everyone has their own reasons for using different databases.


Oh yeah, in terms of variety, totally true. I tend to just spin up a compute instance if I need any one of those though.


Their managed SQL service doesn't support PostgreSQL and PostGIS; it's the only thing stopping me from moving over to GCN.


Trust me, it won't be the only thing.

It's just the first roadblock.


Oh? I have found GCN to be quite impressive. Their prices are very cheap and I've found their international offerings to be second to none. What issues have you had?


An easy area of improvement is documentation, samples etc. They are seriously lagging behind AWS and Azure. Becomes hard to choose Google because the cost of learning is higher.


I dunno, I've found AWS documentation to be atrocious. For example, I've had a very hard time figuring out how to do things like reattaching EBS volumes to EC2 spot instances. Maybe it's because I don't use AWS much, but I get frustrated and have a hard time understanding why simple tasks need to be so complex.

I wonder how effective the Microsoft-style API lock-in strategy will be for AWS. My personal guess is very effective.


I second this. I like DigitalOcean a lot for the simplicity and the community. Obviously, it takes a much more bare-bones approach, but being greeted with 50 icons on your dashboard, each of them a proprietary product, plus services with per-second pricing, makes it super hard to figure out what you are even paying, much less what you are receiving.


Yeah, AWS docs are really not very good in my limited experience. Google's docs have some pretty good parts, but they can be hard to find. Authentication is still kind of confusing on GCP.


Yes, it isn't perfect but they have more samples for every language, tutorials etc. They also have quite a few training videos etc. online.


Moving to GCE would be great, if they offered to eat the cost of moving all my data off S3 ;)


How much data? 10 TB will "only" cost $900. Alternatively there's the "ship us hard drives" option. Failing that, if you're at a crazy scale, we did announce a switch-and-save program as part of Nearline (https://cloud.google.com/storage-nearline/).

Depending on how much you're storing, the one time hit to move may make sense given our ~25% lower cost per byte (and as you mention way better GCE pricing).
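The $900 figure is consistent with about $0.09 per GB of egress for 10 TB (taken as 10,000 GB). A back-of-the-envelope sketch of when the one-time move pays for itself, given the ~25% lower per-byte storage price mentioned above, could look like this. The S3 storage price is a caller-supplied assumption, ingress is assumed free, and per-request charges are ignored.

```python
def migration_break_even(data_gb, egress_per_gb, storage_per_gb_month,
                         savings_fraction=0.25):
    """Return (one_time_egress_cost, months_to_recoup_it).

    storage_per_gb_month is the current (S3) storage price; the new
    provider is assumed to be cheaper by savings_fraction per byte.
    """
    one_time = data_gb * egress_per_gb
    monthly_saving = data_gb * storage_per_gb_month * savings_fraction
    return one_time, one_time / monthly_saving


# 10 TB at $0.09/GB egress; $0.03/GB-month storage is a placeholder price.
cost, months = migration_break_even(10_000, 0.09, storage_per_gb_month=0.03)
print(round(cost, 2), round(months, 1))  # 900.0 12.0
```

Note that the data volume cancels out of the payback period: it is just egress price divided by monthly savings per GB, so storing more data does not make the move pay off faster, it only makes the absolute savings larger.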


How'd you arrive at that number? To my understanding, you get hit up twice -- paying for data leaving S3 and data entering GCS. The other thing to keep in mind is that if lots of your customers are also using AWS, their S3 traffic from you is free.


No, you only pay for egress (outbound traffic) not ingress. You pay for operations (GETS and PUTS) on each side but the total costs are going to be dominated by the egress.

In addition to free egress within GCP regions, we also recently announced reduced egress pricing for major CDN partners: https://cloud.google.com/interconnect/cdn-interconnect .

So how much data do you store and serve? ;)


In 5 years will AWS still have 25% operating margins?


Jeff Bezos' quote "Your margin is my opportunity" might come back to haunt their AWS Business.


I suspect yes. AWS seems to be pushing hard towards more managed application services (things like lambda) built on top of their core AWS services like EC2 and S3, and they can maintain their margin on those services even if competition drives down the prices on EC2.


Those services are incredibly proprietary to AWS. I really question why companies would want to tie themselves to a single cloud provider.


Because technology evolves too quickly to build cross-platform, cross-cloud solutions, and the pressure to get work done now is greater than ever. I tried to do it for two cloud platforms plus on-premise, and you will literally want to kill yourself supporting your code three ways. Do it right once for one platform, then take the next step. AWS isn't going away any time soon. But seriously, you don't just click buttons all day to spin up your infrastructure. Lock-in with GCP is like lock-in with AWS, Heroku, or App Engine: their APIs belong to them. Even if you build around Kubernetes, you lock yourself into their Bigtable or BigQuery. Whatever your SRE/Infra/DevOps team is building now to automate your company's infrastructure is locked into one kind of solution.


Why not Kubernetes?

It runs on all major cloud platforms, VM environments and bare metal. It's supported by every major player except Amazon (for obvious reasons: why would they want to support something that makes it easy to migrate away from them?)


Kubernetes is supported on AWS by various companies in the industry, including my own, Kismatic (the enterprise Kubernetes company). We have been pushing for the team at ECS to adopt K8S as a standard framework on which to run containerized workloads. As Kubernetes gets adopted as a container cluster management standard for orchestrating and running microservice-oriented apps, all the cloud providers will need to support it based on customer demand, in the same way they are all supporting Docker due to the demand.


You can, but is Google going to make their environment that compatible with other cloud platforms? Actually, how many major players do we have in this space capable of delivering a true cloud environment?

AWS, Microsoft, and Google.

When AWS first started, it was EC2 and S3, so the model was about VMs without worrying about the bare metal. But as the platform continues to grow to challenge its competitors, it will begin to add more services that are only available on, and proprietary to, its own platform.


> is Google going to make their environment so compatible with other cloud platform?

Yes. Google's strategy with Kubernetes is to commoditize the cloud - making them all functionally interchangeable. Write to Kubernetes, and your app runs on AWS, GCE, Azure, etc...

They are betting that they can deliver raw CPU cycles, network bandwidth, lower latency etc. - better/faster/cheaper than their competitors.


I am not familiar with GCE, so I could be completely wrong, but my thinking is this: whenever anyone says the open source version can run anywhere, that's true, but when it comes to offering a paid service, making a technology native to the platform means there can be differences, such as customized APIs or customized features, which may never get backported into the community/open source version.


That is very clearly not how they are working with Kubernetes. They are doing it entirely in the open and even recently hired one of their huge community contributors (shoutout to Kelsey Hightower) to the team. If they were doing an "open core" version, my money would be on Kelsey speaking out against it. Besides, given their goal of making the cloud a commodity, they can't do what you say.


Do you write everything cross operating system or cross database engine or cross language runtime or cross chipset architecture?


yeah totally


Did Diane's bebop even launch? I can't find their website. It is kind of frustrating that everybody involved (employees, investors) in bebop are getting a payday, without really putting in much work, or verifying their ideas. Just leaves a sour taste in my mouth (old boys network) as a two-time failed entrepreneur.


I understand how you feel, but you should realize that this point of view is wrong-headed. Her success, or anyone else's, has zero impact on you, so there is no justification for feeling bad about their achievements being "undeserved".

By the way, it is often really hard to see just how much other people put into their projects, so from the outside their success does not seem very justified (hence the phenomenon of the ten-year overnight success). I find myself making this mistake as well, but it is useful to remind yourself that it is the wrong way to think.


I have no personal knowledge of Diane, but I do know people who have worked closely with her and they have very good things to say. They are people who do not often have good things to say. So the bit would stay un-flipped for me.


You could look at this as a very high-end acquihire.


Google thinks bebop is better than whatever they could do internally (including finding people to hire up) to build a comparable product for that budget.


It is who you know.


Seriously? You feel it is unfair that Google would talent acquire a company with a founder that revolutionized virtualization and dismiss this with an implication that she was acquired because she knew "the right people" (and you apparently don't) vs. being one of the most impressive people Google could hire to run their cloud business?


And besides this, she's not even an unknown quantity herself. She's been on Google's board for 3 years! :)



