Where I work, we use a few different tools for logging events:
- Logstash + Graylog / Elasticsearch: mostly for monitoring application error logs and for easy ad hoc querying and debugging.
- StatsD + Graphite + Nagios/PagerDuty: general monitoring/alerting and performance stats.
- ZeroMQ (currently being replaced by Kafka) + Storm and Redis: real-time event analytics dashboards. We also write the events to HDFS and run batch jobs over the data for more in-depth processing.
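For anyone unfamiliar with the StatsD part of that stack: applications emit metrics as tiny plain-text UDP packets of the form `<name>:<value>|<type>`, and the StatsD daemon aggregates them and flushes them to Graphite. The following is a minimal sketch, assuming a StatsD daemon listening on localhost on its usual default UDP port 8125. The metric names (`app.login`, `app.request_time`) are hypothetical, and only the counter (`c`), timer (`ms`) and gauge (`g`) types are handled.

```python
import socket


def format_metric(name: str, value: float, metric_type: str) -> str:
    """Build a StatsD line: <name>:<value>|<type> (c=counter, ms=timer, g=gauge)."""
    if metric_type not in {"c", "ms", "g"}:
        raise ValueError(f"unsupported StatsD metric type: {metric_type}")
    return f"{name}:{value}|{metric_type}"


def send_metric(line: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget UDP send; the StatsD daemon aggregates and flushes to Graphite.

    Assumption: a StatsD daemon on host:port. UDP means no delivery guarantee,
    which is acceptable for performance stats and keeps the app from blocking.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


if __name__ == "__main__":
    # Hypothetical metric names: one login counted, one request timed at 320 ms.
    print(format_metric("app.login", 1, "c"))
    print(format_metric("app.request_time", 320, "ms"))
```

Because the send is over UDP, a missing or overloaded StatsD daemon never slows the application down; the trade-off is that individual packets can be lost.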
We also still maintain a legacy SQL Server database where we store events and logs, and that setup may be closer to what you need. For context, we analyse more than 500 million records per day, and we had to make a few optimisations there:
- If the database supports it, partition the table by date.
- Create separate tables for different applications and/or different event types.
- Use one table per day; at the start of each new day, merge the previous day's table into a monthly table in a separate read-only database.
- Create daily summary tables and use those for analytics.
- If you really need to query all the data, combine the monthly tables or the summary tables with UNION ALL (plain UNION removes duplicate rows).
- This is probably a given, but with large volumes of data, batch your writes and use bulk inserts.
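The workflow in the list above can be sketched end to end with Python's built-in sqlite3 module standing in for SQL Server. SQLite has no native table partitioning, so the per-day tables play that role here. The table names (`events_YYYYMMDD`, `events_YYYYMM`, `summary_daily`), the three-column schema and the batch size are all hypothetical. The "separate read-only database" is an ATTACHed in-memory database called `archive`, which is not actually read-only; in production it would be a separate database with read-only access for the analytics side. Per-application tables are not shown.

```python
import sqlite3
from datetime import date

# Hypothetical schema shared by daily and monthly tables.
COLUMNS = "(ts TEXT NOT NULL, app TEXT NOT NULL, event TEXT NOT NULL)"


def daily_table(day: date) -> str:
    # Table names are built only from dates, never from user input.
    return f"events_{day:%Y%m%d}"


def monthly_table(day: date) -> str:
    return f"events_{day:%Y%m}"


def create_daily_table(conn: sqlite3.Connection, day: date) -> None:
    conn.execute(f"CREATE TABLE IF NOT EXISTS {daily_table(day)} {COLUMNS}")


def bulk_insert(conn, day, rows, batch_size=1000):
    """Insert rows in batches with executemany, committing once per batch."""
    sql = f"INSERT INTO {daily_table(day)} (ts, app, event) VALUES (?, ?, ?)"
    for start in range(0, len(rows), batch_size):
        with conn:  # commits the batch, or rolls it back on error
            conn.executemany(sql, rows[start:start + batch_size])


def roll_over(conn, day):
    """Run at the start of the next day: summarise `day`, archive it, drop it."""
    src = daily_table(day)
    dst = f"archive.{monthly_table(day)}"
    with conn:  # one transaction for the whole roll-over
        conn.execute(f"CREATE TABLE IF NOT EXISTS {dst} {COLUMNS}")
        conn.execute(f"INSERT INTO {dst} SELECT ts, app, event FROM {src}")
        conn.execute(
            "CREATE TABLE IF NOT EXISTS summary_daily "
            "(day TEXT, app TEXT, event TEXT, n INTEGER)"
        )
        conn.execute(
            f"INSERT INTO summary_daily "
            f"SELECT ?, app, event, COUNT(*) FROM {src} GROUP BY app, event",
            (day.isoformat(),),
        )
        conn.execute(f"DROP TABLE {src}")


def count_events(conn, months):
    """Query across several monthly tables with UNION ALL, then aggregate."""
    union = " UNION ALL ".join(
        f"SELECT app, event FROM archive.{monthly_table(m)}" for m in months
    )
    return conn.execute(
        f"SELECT app, COUNT(*) FROM ({union}) GROUP BY app ORDER BY app"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS archive")
    for day in (date(2024, 1, 30), date(2024, 1, 31), date(2024, 2, 1)):
        create_daily_table(conn, day)
        # Synthetic events: every third one comes from the "api" app.
        rows = [(f"{day}T00:00:{i % 60:02d}", "web" if i % 3 else "api", "click")
                for i in range(2500)]
        bulk_insert(conn, day, rows)
        roll_over(conn, day)
    print(count_events(conn, [date(2024, 1, 1), date(2024, 2, 1)]))
```

UNION ALL is used rather than UNION because plain UNION removes duplicate rows, so two identical events would be counted once. For day-to-day analytics you would normally query `summary_daily` instead, which stays small no matter how many raw events arrive.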
I suggest taking a step back and thinking hard about exactly how you want to access and query the data, and which tool will serve you best in the long run.