I’ve got a lot of services, some in docker, some in LXC or a VM in proxmox. Currently I’ve got no monitoring service. Recently a service went down and I didn’t notice for quite a while so now I’ve got a bunch of missing data. What monitoring tools do you all use? Looking for something that works with docker and plain Linux CTs/VMs and can notify me if a website is down, docker container crashed, VM is offline, etc.
As a bonus, I'd like something I can run on two machines, so that if an entire machine dies the other one will notice and I'll still receive a notification.
The notification can be anything: email, SMS, push, etc.
Uptime Kuma is what I use; it'll watch TCP connections, docker containers, websites… whatever. And the notification options are pretty comprehensive and probably cover anything anyone in 2023 would want to be using.
+1 for Uptime Kuma. Dead simple to set up and configure, and it has alert support for dozens of services.
I administer a large Zabbix environment in my day job, and while it’s not complicated to get set up, it’s overkill for simple up/down service monitoring.
I use Uptime Kuma too, on my VPS, and I monitor it with Node-RED at home (which is itself monitored by Uptime Kuma! 😁). So if anything goes down (the monitoring tool included) I receive alerts. Both of them send me alerts via ntfy.
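For anyone wiring up their own checks to ntfy as described above, here is a minimal sketch of publishing an alert with only the Python standard library. It assumes the public `https://ntfy.sh` server and a made-up topic name (`homelab-alerts`); the function names are hypothetical, and a self-hosted server URL can be swapped in.

```python
import urllib.request


def build_ntfy_request(topic, message, title=None, priority=None,
                       server="https://ntfy.sh"):
    """Build (but do not send) a POST request that publishes to an ntfy topic.

    The message goes in the request body; Title and Priority are optional
    headers. Header values should be plain ASCII in this sketch.
    """
    headers = {}
    if title:
        headers["Title"] = title
    if priority:
        headers["Priority"] = priority
    return urllib.request.Request(
        f"{server.rstrip('/')}/{topic}",
        data=message.encode("utf-8"),
        headers=headers,
        method="POST",
    )


def send_alert(topic, message, **kwargs):
    """Send the alert and return the HTTP status code of the response."""
    request = build_ntfy_request(topic, message, **kwargs)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Calling `send_alert("homelab-alerts", "web is down", title="Outage", priority="high")` would push the message to every device subscribed to that topic. Topic names on the public server are effectively passwords, so pick something hard to guess.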
I use:
- Monitoring server - prometheus
- Alert manager for prometheus - alertmanager. You can write any triggers here.
- Web UI for prometheus - Grafana
- Exporters for prometheus - node-exporter, blackbox-exporter, mysql-exporter, psql-exporter etc. You can find an exporter for everything you need.
- Some services support Prometheus natively. Docker for example: https://docs.docker.com/config/daemon/prometheus/
If you want a cluster, you can add Thanos on top of Prometheus.
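To give an idea of what an exporter actually hands to Prometheus: it is just plain text served over HTTP, which Prometheus scrapes on a schedule. Below is a rough sketch that renders one gauge in the text exposition format. The metric name, label, and function name are made up for illustration, and label values are not escaped here, so this is not a substitute for a real client library.

```python
def render_gauge(name, help_text, samples):
    """Render a gauge metric in the Prometheus text exposition format.

    `samples` is a list of (labels_dict, value) pairs. Label values are
    written as-is: this sketch does not escape quotes or backslashes.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        label_text = ",".join(f'{key}="{val}"' for key, val in sorted(labels.items()))
        suffix = f"{{{label_text}}}" if label_text else ""
        lines.append(f"{name}{suffix} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical metric: one temperature reading per sensor.
print(render_gauge(
    "room_temperature_celsius",
    "Temperature per sensor.",
    [({"sensor": "rack"}, 27.5), ({"sensor": "desk"}, 22.0)],
))
```

Serving that text at a `/metrics` URL and adding the host to the Prometheus scrape config is, roughly, all an exporter does.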
I’d like to explore Prometheus (I’ve never used it). Right now I use InfluxDB to store some data (ping times, temperature, server load, etc.). Can Prometheus read those values and react if something is off, or would I have to store everything twice?
Prometheus uses its own time series database. You can connect InfluxDB to Grafana and send alerts from Grafana, but I think Alertmanager is better. node-exporter can collect all this data (sensors, VM/PC load, etc.).
I’ve never used alerting in Grafana, how does it work? Is it possible to get an alert if a ping is higher than xx for a period of time? What are Alertmanager and node-exporter? Plugins or standalone tools? Sorry for all the questions! 😁 And thanks for the info!
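On the "higher than xx for a period of time" part: the usual trick is to fire only once the value has stayed above the limit for the whole period, and to reset as soon as one reading drops back. Prometheus alert rules express this with a `for:` duration. Here is a small sketch of the same logic in Python; the class name and the numbers are invented for the example.

```python
class SustainedThreshold:
    """Fire only when a value stays above `limit` for at least `window_s` seconds."""

    def __init__(self, limit, window_s):
        self.limit = limit
        self.window_s = window_s
        self.breach_started = None  # timestamp of the first reading above the limit

    def observe(self, timestamp_s, value):
        """Record one reading; return True if the alert should be firing now."""
        if value <= self.limit:
            # Back to normal: forget the breach and start over next time.
            self.breach_started = None
            return False
        if self.breach_started is None:
            self.breach_started = timestamp_s
        return timestamp_s - self.breach_started >= self.window_s


# Ping above 100 ms for 5 minutes (300 s) triggers the alert.
rule = SustainedThreshold(limit=100, window_s=300)
for ts, ping_ms in [(0, 150), (200, 180), (300, 120), (360, 50)]:
    print(ts, ping_ms, rule.observe(ts, ping_ms))
```

A single slow ping never fires this rule, which is the point: it filters out short spikes.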
Grafana sends an email with a screenshot of the graph when an alert on that graph is triggered. Have a look at the alert section of any graph panel to see how it works.
You know what? I never knew about that. I’ve just set it up! Thanks!!!
If you strip monitoring down, all you need is a notification when something goes down.
I use monocker; it monitors status changes on containers and sends a notification when one happens.
That's all.
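The idea of "notify on status change" is simple enough to sketch: take a snapshot of container states, compare it with the previous one, and report the differences. This is not how monocker is implemented, just an illustration; it assumes the `docker` CLI is on the PATH, and the function names are made up.

```python
import subprocess


def parse_states(ps_output):
    """Turn tab-separated 'name<TAB>state' lines into a {name: state} dict."""
    states = {}
    for line in ps_output.splitlines():
        if "\t" in line:
            name, state = line.split("\t", 1)
            states[name] = state
    return states


def snapshot():
    """Ask the docker CLI for the current state of every container."""
    output = subprocess.run(
        ["docker", "ps", "--all", "--format", "{{.Names}}\t{{.State}}"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_states(output)


def status_changes(previous, current):
    """List human-readable differences between two snapshots."""
    changes = []
    for name in sorted(set(previous) | set(current)):
        before = previous.get(name, "absent")
        after = current.get(name, "absent")
        if before != after:
            changes.append(f"{name}: {before} -> {after}")
    return changes
```

Run `snapshot()` on a timer, feed consecutive results to `status_changes()`, and send whatever comes back to the notifier of your choice.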
For a handful of servers, try Zabbix. Every distribution has a packaged Zabbix agent. It has everything: a web UI, a way to auto-discover things with a bit of setup, nice graphs, alerting, LDAP user management if you need it, and a way to define per-person/group alerting and notification schedules. And the community is big enough that many common services (fail2ban/postfix/MySQL/etc.) have premade custom monitoring scripts. Adding your own metrics is also very easy.
I have Conky on my desktop and do a curl to a known page on my server to monitor if a web service is up every 60 seconds. If it’s down, I swap to a blinking animated gif as an icon and play an alert sound.
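The same curl-every-60-seconds idea works without a desktop widget. A minimal sketch in Python, assuming a plain HTTP(S) page that answers with a 2xx/3xx status when healthy; the URL passed in and the function names are placeholders, and `notify` defaults to `print` only for demonstration.

```python
import http.client
import time
import urllib.error
import urllib.request


def is_up(url, timeout=5.0):
    """Return True if the URL answers with a 2xx or 3xx status in time."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # HTTP errors, refused connections, timeouts, and bad URLs all count as down.
        return False


def transition(was_up, now_up):
    """Name the state change between two checks, or None if nothing changed."""
    if was_up and not now_up:
        return "DOWN"
    if not was_up and now_up:
        return "RECOVERED"
    return None


def watch(url, interval_s=60, notify=print):
    """Poll forever, notifying only when the state flips."""
    was_up = True
    while True:
        now_up = is_up(url)
        event = transition(was_up, now_up)
        if event:
            notify(f"{url} is {event}")
        was_up = now_up
        time.sleep(interval_s)
```

Notifying only on transitions avoids getting a fresh alert every minute while the service stays down.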
Checkmk maybe?
CheckMK is too complicated for my monkey brain. After a few days of going through docs, I can’t even get a log file monitoring going.
I use LibreNMS and Healthchecks.io. I also use Grafana to display all the important data in a dashboard on a portrait mounted monitor on my desk.
If all you want is uptime monitoring, Uptime Kuma.
Nagios user here. No pretty charts, just "is it down?"
Monit works for me. Good basic monitoring solution that can also restart a service/interface.
I also use LibreNMS to do alerting for a variety of conditions (syslog events, sensor conditions, outages and services via nagios). But this is more work to get set up.
- How do you observe your server functions? - Lemmy.world
- https://github.com/awesome-foss/awesome-sysadmin#monitoring
I use netdata (agent only, not the cloud/SaaS stuff) for metrics/HTTP checks/alerting, and rsyslog+graylog (or just lnav on small setups) for log analysis. Plus a bunch of other scanners (debsecan, lynis, debsums…)