-
Fix memory leak in chan::tick cf. BurntSushi/chan#11 and branch mem_leak_chan_tick.
This is known behavior of chan::tick. The leak amounted to 236 KB of allocated memory over 64 hours. That doesn't hurt much, especially because we restart the process every 24 hours for log rotation. Skipping this for now.
-
Transform all wsrep values to metrics
-
Add metadata for all wsrep values
-
Check additional state for metrics
-
Handle tags
-
Automate packaging for Ubuntu
- Ansible Role
- Update Readme: Link to package and Ansible role
-
Support TLS and basic auth for Bosun
-
Add auth and transport encryption to Galera collector
-
Add auth and transport encryption to Mongo collector
-
lib version check -- reduce multiple versions of dependent crates
-
Clippy-fy
-
Release 0.1
-
Move to serde; cf. Galera collector
-
Redo collectors as real state machine
-
[+] Failure Modes
- Reinitialize collector if collection fails.
- Reconnect Logic for Galera Collector
- Remove collector if too many collection failures.
- Remove collector if collection thread does not respond anymore.
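
The failure modes above could be driven by a small per-collector failure counter. The following is only a sketch: `Action`, `FailureTracker` and `max_failures` are hypothetical names, not existing rs-collector code, and the policy (reinitialize on every failure, remove after too many consecutive ones) is an assumption about the intended behavior.

```rust
/// What the supervisor should do with a collector after one collection run.
/// (Hypothetical type for illustration.)
#[derive(Debug, PartialEq)]
enum Action {
    Keep,
    Reinitialize,
    Remove,
}

/// Counts consecutive collection failures for one collector.
struct FailureTracker {
    consecutive_failures: u32,
    max_failures: u32, // assumed threshold for "too many collection failures"
}

impl FailureTracker {
    fn new(max_failures: u32) -> Self {
        FailureTracker { consecutive_failures: 0, max_failures }
    }

    /// Record the outcome of one collection run and decide what to do next.
    fn record(&mut self, success: bool) -> Action {
        if success {
            // Any success resets the counter.
            self.consecutive_failures = 0;
            return Action::Keep;
        }
        self.consecutive_failures += 1;
        if self.consecutive_failures >= self.max_failures {
            Action::Remove
        } else {
            Action::Reinitialize
        }
    }
}
```

A non-responding collection thread could be fed into the same tracker by treating a timeout as a failed run.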
-
Add timestamps to log messages
-
Tests
-
Clean up
-
Make it safe
- Clippy-fy
- Fix Todos
- Eliminate unwraps
-
Rust documentation
-
Enhance deb package
- Don't overwrite changed config files
-
Move project to Rheinwerk
-
Extend bosun_emitter to send multiple data points
-
Support multiple Galera Collectors -- also change in Ansible role
-
Make threads resilient against panics (current workaround: abort on panic so that no thread dies unnoticed)
- Check for IP bound to interface -- keepalived VIP side effect
- [+] Postfix metrics
- Queue len
- Send / Recv stats
- [+] MongoDB
- [+] replication metrics -- cf. replSetGetStatus
- myState (A)
- [+] Oplog replication lag (A)
- Explain lag spikes due to idle times -- cf. Mongo documentation
- Show alert example
- Heartbeat latency = lastHeartbeatRecv - lastHeartbeat (A)
- roundtrip time = pingMs
- uptime = uptime -> Rate
- health = health only from point of view of primary (A)
- Balancer Status
- other metrics?
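
The replication metrics above can be derived from the `members` array of `replSetGetStatus`. This is a minimal sketch, assuming the fields have already been parsed into a plain struct with times in seconds since the Unix epoch; `Member` and both function names are hypothetical, and no MongoDB driver is involved here.

```rust
/// Subset of one `members[]` entry from `replSetGetStatus`.
/// (Hypothetical struct; times are seconds since the Unix epoch.)
struct Member {
    state: i32,                    // myState / state: 1 = PRIMARY, 2 = SECONDARY
    optime_secs: i64,              // optimeDate
    last_heartbeat_secs: i64,      // lastHeartbeat
    last_heartbeat_recv_secs: i64, // lastHeartbeatRecv
}

/// Oplog replication lag of `me` relative to the current primary.
/// Returns None if no primary is visible in the member list.
fn oplog_lag_secs(members: &[Member], me: &Member) -> Option<i64> {
    let primary = members.iter().find(|m| m.state == 1)?;
    // Clamp at zero so clock skew never yields a negative lag.
    Some((primary.optime_secs - me.optime_secs).max(0))
}

/// Heartbeat latency = lastHeartbeatRecv - lastHeartbeat.
fn heartbeat_latency_secs(m: &Member) -> i64 {
    m.last_heartbeat_recv_secs - m.last_heartbeat_secs
}
```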
- Internal metrics
rs-collector.*
- Version -- can also be used to check liveness and as heartbeat
- Number of transmitted samples -- can also be used to check liveness and as heartbeat
- RSS cf. procinfo -- can also be used to check liveness and as heartbeat
- Docker
- Use rust-docker
- ifconfig / network interface frame metrics
- DNS
- Serial numbers of all authoritative servers
- Ceph metrics
- MySQL performance metrics
- MongoDB performance metrics
- Tomcat management servlet metrics
- LACP / interface bond metrics
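
For the item on making threads resilient against panics, one possible alternative to aborting is to supervise each worker through its join handle and restart it when it panics. This is a sketch under assumptions: `supervise` and `max_restarts` are hypothetical names, and the approach only works when the binary is built with unwinding panics, not with `panic = "abort"`.

```rust
use std::thread;

/// Run `work` in its own thread. If the thread panics, start a fresh one,
/// up to `max_restarts` times. Returns Ok(number of restarts needed) once
/// `work` finishes normally, or Err(restarts) if the limit was reached.
fn supervise<F>(work: F, max_restarts: u32) -> Result<u32, u32>
where
    F: Fn() + Send + Clone + 'static,
{
    let mut restarts = 0;
    loop {
        let w = work.clone();
        let handle = thread::spawn(move || w());
        // `join` returns Err if the thread panicked, so no thread dies unnoticed.
        match handle.join() {
            Ok(()) => return Ok(restarts),
            Err(_) if restarts < max_restarts => restarts += 1,
            Err(_) => return Err(restarts),
        }
    }
}
```

Blocking on `join` keeps the sketch short; a real supervisor would more likely watch several collector threads at once, for example over a channel.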