TICKscript is indeed the killer feature; the problem is that debugging is difficult because the language itself is esoteric. Simply being able to print statements to the console would be a game changer. So would the ability to write tests outside of Chronograf, and the ability to mock inputs. The problem is that you've reinvented the wheel and now have to build the debugging ecosystem that is already a well-beaten path for other languages. Love your guys' product. It's also hard to give you money, because there's no offering you make that quite fits any of our needs.
Thanks for the TICKscript feedback. That’s all stuff that we want to have addressed in Flux. This alpha release doesn’t have that yet, but printf, a test runner built into the influx CLI, and test inputs and outputs are all on the near-term roadmap.
I looked at Flux, and it seems pretty compatible with TICKscript, although I don't yet have a clear understanding of how easy it will be to write and debug alerts/queries.
I would be curious to try out the REPL:
> We plan to provide a flux command line program that exposes a REPL and talks to various data sources.
Especially interested to see how easy it would be to edit and troubleshoot a multi-line transformation query like this one in it:
cpu = data
    // only get the last 5m of data
    |> range(start: -5m)
    // only get the "usage_user" data from the _measurement "cpu"
    |> filter(fn: (r) => r._measurement == "cpu" and r._field == "usage_user")
Why create a new language vs. using Javascript or Lua?
e.g. your Flux example above could have been just:
let cpu = data.range({start: "-5m"}).filter(function (r) { return r.measurement == "cpu" && r.field == "usage_user" })
Is there any specific feature of Flux that requires a new language? I read your blog post here:
There isn't a specific feature of Flux that requires a new language, although part of the language is a query planner and optimizer that is tied to certain functions in it. Technically any Turing complete language is equivalent, but in practice that is seldom true. Performance is one vector, but so is expressiveness, aesthetics, ease of use, etc.
We chose not to use Lua because it's not a widely known language. Instead, we chose to create a language that looks and feels much like Javascript, which I think is the most widely used language today. However, we wanted to limit the scope of it and to add new syntax over time to make frequent tasks easier to express.
Any time you create a new API or library, you create new surface area for a developer to know. A language is no different than this. Some languages are also much easier to learn than others. For example, Go was a language whose basics I could pick up very quickly, while Rust is something I'm still learning after months of effort. I like both languages, but their learning curves are very different.
I think Flux is quite easy to pick up for many programmers, although I'll have to see questions and have countless interactions with people trying to learn it to prove that out and make adaptations over time.
Finally, one goal with including the UI in InfluxDB 2.0 is that as we improve it, most users won't have to learn Flux at all. They'll be able to accomplish what they want by just clicking around the interface. We're not there yet, but it's what we aspire to. And we want to have control over the language to build it in conjunction with UI tooling to automatically manipulate it.
>We chose not to use Lua because it's not a widely known language. Instead, we chose to create a language that looks and feels much like Javascript
This makes sense; in that case, why not take a strict subset of Javascript?
> Finally, one goal with including the UI in InfluxDB 2.0 is that as we improve it, most users won't have to learn Flux at all. They'll be able to accomplish what they want by just clicking around the interface.
This will definitely be helpful, and auto-generation makes sense; however, in our use cases we treat Kapacitor alerts as code that is reviewed.
Flux is almost a subset of Javascript. There are only two exceptions. One is our pipe-forward syntax, which I've seen proposed for Javascript, but who knows if that's going to happen. The other is our named parameters with optional defaults. You can get close to this in JS by passing an object literal combined with destructuring.
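To make that concrete, here's a rough sketch of both features in Flux, reusing the cpu example from above (the cpuUsage helper and the "telegraf" bucket name are just illustrative, and exact syntax may shift while we're in alpha):

// a function with named parameters; field and start have defaults
cpuUsage = (bucket, field="usage_user", start=-5m) =>
    from(bucket: bucket)
        |> range(start: start)
        |> filter(fn: (r) => r._measurement == "cpu" and r._field == field)

// call with named arguments and pipe-forward the result onward
cpuUsage(bucket: "telegraf")
    |> mean()

The call site passes named parameters (field: and start: can be omitted because they have defaults), and the pipe-forward operator chains the result into further transformations.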
However, there are other things we'll probably be adding to the language over time. Also, if you're only allowing a subset of JS, is it actually JS anymore? I'd imagine that some JS programmers could get frustrated when things they think should work (because they're valid JS) don't because they're not in the subset.
But say we went with that. Then we'd need to adopt an existing JS engine (most likely written in C++) and then modify it based on our needs. And then integrate that into our broader Go codebase. All of that seems a bit less than ideal.
Ultimately, I really do feel that if you're someone that knows Javascript, you can learn the elements of the Flux language in less than an hour. The bigger learning curve is the library of functions and the API, which would exist regardless of what language we chose.
I think the proof will come over time based on what kinds of things we enable in the language and the platform. In the near term I expect a large number of totally reasonable people to question the choice of a new language. After all, that's probably the rational response. But as we improve it, add to it, refine it, and improve the developer and user experience, I expect to win more converts. Developers pick up new tools because they enable them to get their jobs done faster. Ease of use, speed of development, and productivity are our guiding lights.
> But say we went with that. Then we'd need to adopt an existing JS engine (most likely written in C++) and then modify it based on our needs.
This is definitely not an easy choice to make, and writing a new language interpreter in Go seems like a way easier approach initially.
One thing that Go helped me understand, though, is that the language does not matter as much as the tooling around it: debugging, compiling, and support around the language are much harder to achieve than writing a language parser.
Thanks for sharing your thoughts on this design choice; I think this discussion is very relevant for a couple of reasons:
There are other Go projects taking the same approach to achieve simplicity rather than picking an existing language, like OPA [1]. Others, like Helm, are picking Lua [2], so there is clearly a problem the Go and infrastructure community is facing, and a split in how people are approaching it.
We've faced a similar dilemma with Teleport as we design our extensions system. My original plan was to use Lua; however, after discussions with the team we settled on gRPC with Go [3], trading the expressiveness, simplicity, and freedom of Lua/JS in favor of the industrial features the Go runtime and gRPC provide out of the box.
For smaller extension plugins we decided not to create a new language, and ended up with an interpreted subset of Go [4].
I wonder if there is a place for some subset of Javascript or TypeScript that is fully interpreted in Go, with native extensions for debugging, to be used by the community.
They're probably looking at Flux as a sunk cost now, but I agree, Lua would have probably fit the bill.
Either way, I'm hoping this is "The Release" where things stay consistent. They've pivoted several times, and it's all been good, but we do need a couple of years of consistency before we can make long-term investments in Flux.
Lua was certainly an option and it's one I was even considering as an embedded scripting language back in the fall of 2014. I gave a talk in London where I asked for a show of hands on Lua vs. Javascript as the choice, and the majority raised their hands for JS.
One of the reasons we didn't go with that is that we didn't want a separate query vs. scripting experience like you have with SQL engines that embed programming languages. We wanted something that felt and looked seamless.
Awesome, great explanation, thank you. I disagree on your point about Lua, but given what you guys have built so far I have no doubt you'll be successful.
I tried out InfluxDB a while ago in my spare time and was intrigued by the feature set, but ultimately couldn't get past the abstruse query language, especially coming from the simpler and more flexible PromQL (not being able to do ad-hoc math across time series was a big deal for my use case). I'm eagerly looking forward to giving it another shot with Flux and have super-high hopes.
What does the data model for time series look like in 2.0? Mostly the same as 1.x, or has that gotten more flexible as well?
For now we take writes in the 1.x line protocol, so it’s still measurement, tags, and fields. However, Flux doesn’t really make that a requirement. So in the future we plan on having a way to write series in without requiring a field or even a measurement.
Once the planner gets the data to the Flux processing engine it views everything as a table of data with columns and records. So it’s much more flexible in how we can represent data.
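For example, a single 1.x line protocol point like this (names and values made up):

cpu,host=server01 usage_user=23.5 1556813561098000000

reaches the Flux engine as a record in a table with a column for the tag (host) plus _measurement, _field, _value, and _time, so from there on it's just columnar data like any other source.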
I love InfluxDB, apart from the need to have separate retention policies when reducing granularity over time.
Are there any plans for a more unified method of performing continuous queries - so that we can query high granularity and older, downsampled data at the same time?
Absolutely. We won't address that in the initial release of 2.0, but there will be ways to get it done. The eventual solution will probably revolve around using the tasks system to downsample into buckets with different retention. Then, at query time, a Flux function can look at the metadata of the buckets in the query and the time range, and select the precision based on that. We should be able to show examples of how to do this in Flux later this year.
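As a very rough sketch of that direction (the bucket names, the measurement, and the intervals here are all just illustrative, and details may change before we ship it), a downsampling task in Flux might look something like:

option task = {name: "downsample-cpu", every: 1h}

from(bucket: "telegraf")
    |> range(start: -task.every)
    |> filter(fn: (r) => r._measurement == "cpu")
    // roll the raw data up into 5m averages
    |> aggregateWindow(every: 5m, fn: mean)
    |> to(bucket: "telegraf_downsampled")

The query-time half, a Flux function that picks the raw or downsampled bucket based on the requested time range, is the part we still need to show examples of.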
Any HA features or sharding to look out for in 2.0? Or is the general idea to set up streaming relays of InfluxDB TSMs and treat HA as an L7 proxy routing problem (shadowing metric traffic using Envoy, for example)? How do people handle this in their production setups? Curious to know.
It would be cool if the query engine could talk to multiple shards spanning multiple machines for dealing with high cardinality series.
Right now we’re prioritizing work on the single open source server and our cloud service, which has a very different design. Flux will be able to query multiple servers and combine their results (in OSS), but that would be a building block for some HA or clustering.
So you could certainly layer in your own HA solution. We’re still working out what, if any, clustered or federated features will exist in open source.
In the last few months we have made quite a few improvements to data storage and indexing. Features that are available by default in 2.0, and which are significant changes from releases earlier than, say, 1.6, include:
- Significant TSM encoding and decoding performance improvements.
- The TSI index will be on by default.
- Queries that use the same tag key/value filters will be answered from the index more quickly using an LRU cache.
- Field keys will now be indexed in 2.0, making filtering/grouping on field keys more efficient.
- Improvements to how series are extracted from the index and how point data is read from the TSM engine, which helps with memory performance for queries.
- Significant performance improvements to measurement deletion.
> And can data points be incremented instead of the current field-replacement crap when you get new points with the same tag set?
I want to use influx to store _statistics_ not _events_. Basically, my data points are tag-sets and counts.
There are several ways to achieve this; for example, you can send the events to influx and have continuous queries gather the statistics. That doesn't work well when you have a lot of events, or when they arrive out of order and at high latency, etc.
So what you typically end up having to build is a stats thing that sits in front of influx, tracks the counts of events with particular tag sets in particular time buckets, and then keeps uploading these to influx.
And there are two ways to do that:
1) you are not stateful and you keep uploading deltas, incrementing the nanoseconds to avoid data-point collisions; you can then get the data out of influx with sum() on the fields, grouping by whatever the time bucket is (a rough query sketch follows after this list). I tried this and influx grinds to a halt eventually.
2) you are stateful and track the totals outside influx, and keep uploading a newly-written data-point to overwrite the fields for that bucket in influx. This is much less data in influx and much easier to query, avoiding sum() etc. It's like I end up with something in front of influx doing what I want influx to do.
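For reference, the query side of 1) in Flux terms would be roughly this (the bucket, measurement, and field names are assumed for illustration):

from(bucket: "events")
    |> range(start: -1h)
    |> filter(fn: (r) => r._measurement == "requests" and r._field == "count")
    // add up the uploaded deltas per time bucket at query time
    |> aggregateWindow(every: 1m, fn: sum)

That per-query summing is exactly the work that gets expensive once the raw delta volume is large.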
What would greatly simplify life is if the line protocol also accepted a variant where the line is prefixed with a + sign, and influx then knows to add the fields if the data-point collides with another rather than overwrite them.
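Something like this, where the first line is today's format and the second shows the proposed + prefix (measurement, tags, and values made up):

requests,region=us-east,status=200 count=12 1556813520000000000
+requests,region=us-east,status=200 count=3 1556813520000000000

Today the second write would overwrite the field and leave count=3; with the + prefix influx would add them and store count=15.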
This would mean that people trying to store statistics in influx could add to those statistics statelessly. A massive simplification.
I've had other problems, like having way more than 1M series. It's painful. My influx boxes hit iowait far too often, which is weird because the boxes have more RAM than the total dataset.