You could achieve something similar in a slightly different way, by using the Linux nftables (the new iptables) netfilter interface. I have a setup like this for measuring the traffic used by certain daemons running under a specific user:
table inet raw {
    ...
    counter postgre_tcp_traffic_out {
        packets 0 bytes 0
    }
    ...
    chain output {
        ...
        meta l4proto tcp skuid postgres counter name "postgre_tcp_traffic_out" notrack
        ...
    }
}
Since nft -j outputs JSON, it can then easily be ingested back into Postgres and indexed. I personally use it together with Zabbix to track per-second differences in the values. It needs some more work, because netfilter can match packets by UID/GID only on output; input has to be matched by destination port instead, 5432 in the case of Postgres.
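For completeness, here is a minimal sketch of that ingestion step in Python, assuming psycopg2, a target table nft_counters(ts, name, packets, bytes), a placeholder connection string, and the JSON shape that recent nft versions emit (all of those are my assumptions, not part of the setup above). It also has to run with enough privileges to call nft.

    import json
    import subprocess

    import psycopg2

    # Placeholder DSN; adjust for your environment.
    conn = psycopg2.connect("dbname=metrics")

    # Dump all named counters as JSON.
    out = subprocess.run(["nft", "-j", "list", "counters"],
                         capture_output=True, text=True, check=True)
    objects = json.loads(out.stdout)["nftables"]

    with conn, conn.cursor() as cur:
        for obj in objects:
            counter = obj.get("counter")
            if counter is None:  # skip the metainfo entry
                continue
            # Assumed table: nft_counters(ts timestamptz DEFAULT now(),
            #                             name text, packets bigint, bytes bigint)
            cur.execute(
                "INSERT INTO nft_counters (name, packets, bytes)"
                " VALUES (%s, %s, %s)",
                (counter["name"], counter["packets"], counter["bytes"]),
            )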
I spent some time searching for a tool that can ingest realtime network traffic data into Postgres but had no luck, so I developed this extension and we use it internally in our team. Thanks to Rust, pgx and libpcap, the development journey has been easy and enjoyable.
I would like to hear more feedback. Any contributions, feature requests, bug reports or ideas are welcome.
Our pricing model is based on database usage, which includes db egress, so for now we just use it to monitor db egress traffic. We are all db fans and we like putting things into the db, so all the downstream processes, like searching, tracking, analysis and so on, are a lot easier.
I've written a tool to do this not once but twice now! The first was for ingesting NetFlow data, the second for ingesting AWS VPC flow logs. In both cases it was storing 'flows' rather than raw packet information.
I'd be super interested in tooling to help with this. I'm running – literally right now – a Python script that is writing about 100k flows per second into Postgres (TimescaleDB). It is batch rather than realtime, and it geocodes the flows on the way in.
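Not the actual script, but a rough sketch of what that batched write path can look like with psycopg2's execute_values (the flows table, its columns and the DSN are invented for illustration):

    import psycopg2
    from psycopg2.extras import execute_values

    conn = psycopg2.connect("dbname=flows")  # placeholder DSN

    def write_batch(rows):
        # rows: list of (ts, src_ip, dst_ip, src_port, dst_port, bytes, country)
        # tuples, with the geocoded country already attached.
        with conn, conn.cursor() as cur:
            execute_values(
                cur,
                "INSERT INTO flows"
                " (ts, src_ip, dst_ip, src_port, dst_port, bytes, country)"
                " VALUES %s",
                rows,
                page_size=10_000,  # rows per generated statement
            )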
Yes, you need root to grant capture permission to Postgres, but Postgres doesn't need to run as root. The main purpose is to easily ingest network traffic data into Postgres for further processing. Yes, we could use tcpdump to do the same thing, but that would need another tool to load the data into Postgres.
For another alternative: I added some monitoring, by using file_fdw. This is a standard pg extension that presents files or program stdout as tables. So I file_fdw'd some files in /proc and some system utilities.
For harder things, read the file as lines instead of fields, then create a view with some regexes to split it into fields:
CREATE FOREIGN TABLE IF NOT EXISTS proc_meminfo (
    line text
) SERVER pglog OPTIONS ( filename '/proc/meminfo', header 'false', delimiter '$' );

CREATE OR REPLACE VIEW proc_meminfo_interpreted AS
WITH arr AS (
    SELECT regexp_split_to_array(line, ':| +') AS a FROM proc_meminfo
)
SELECT a[1] AS name, a[3] AS value FROM arr;
Hardest part is creating semi-legible source code in HN ;-)
Nearly all use cases for this seem to be recreating a time series database...
Postgres isn't a great TSDB - the indexes don't understand columns that will only have updates at one end, it doesn't do column compression, range queries are expensive, etc.
Perhaps it's time to just set up a time series database... like InfluxDB.
Thanks for the mentions (and for using TimescaleDB).
If anyone's curious about TimescaleDB, it's packaged as an extension to Postgres, optimizing for performance, storage, and analysis of time series data. Implementing columnar compression algorithms is a big part of the secret sauce that makes TimescaleDB a popular choice with Postgres and SQL developers. You can read more about that on the Timescale blog (I'm Timescale's community manager btw). https://www.timescale.com/blog/search/?query=compression
I recommend using VictoriaMetrics instead of InfluxDB, for many reasons. PostgreSQL is a good SQL database for many purposes, but as a TSDB on big installations it doesn't work very well.