
add prometheus monitoring foundation #5736

Merged
merged 4 commits into develop from prometheus-monitoring on Jan 24, 2017

Conversation

Sing-Li
Member

@Sing-Li Sing-Li commented Jan 24, 2017

Add support for Prometheus monitoring (https://prometheus.io/)

This can become our own foundation to instrument everything. A first step in realizing #5730

Only one "sample metric" has been added (the accumulated number of messages sent), but more can readily be added.

Prometheus exposes useful default Node.js metrics.

There are excellent add-on modules that can expose GC metrics, Docker metrics, and OS metrics.

Used the following prometheus.yml for testing and development:

global:
  scrape_interval:     15s # By default, scrape targets every 15 seconds.

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: 'codelab-monitor'

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s
    metrics_path: /api/metrics
    static_configs:
      - targets: ['127.0.0.1:3000']
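
For context, here is a minimal sketch (not this PR's actual code) of the server side that the config above scrapes, using the prom-client Node library. The metric name and the use of Meteor's WebApp handler are assumptions, and prom-client API shapes vary by version (register.metrics() is synchronous in versions contemporary with this PR):

client = require 'prom-client'

# Collect the default Node.js process metrics mentioned above.
client.collectDefaultMetrics()

# The one sample metric: accumulated number of messages sent.
messagesSent = new client.Counter
  name: 'rocketchat_messages_sent_total'
  help: 'Accumulated number of messages sent'

# Serve the feed at the path the scrape config points to.
# WebApp comes from Meteor's webapp package.
WebApp.connectHandlers.use '/api/metrics', (req, res) ->
  res.writeHead 200, 'Content-Type': client.register.contentType
  res.end client.register.metrics()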

@engelgabriel engelgabriel temporarily deployed to rocket-chat-pr-5736 January 24, 2017 04:58
@engelgabriel engelgabriel temporarily deployed to rocket-chat-pr-5736 January 24, 2017 05:05
@engelgabriel engelgabriel temporarily deployed to rocket-chat-pr-5736 January 24, 2017 05:34
@engelgabriel engelgabriel requested a review from rodrigok January 24, 2017 12:43
@engelgabriel engelgabriel added this to the 0.51.0 milestone Jan 24, 2017
@rodrigok
Member

Should we replace all Datadog metrics with Prometheus?

@engelgabriel engelgabriel merged commit 540f673 into develop Jan 24, 2017
@engelgabriel engelgabriel deleted the prometheus-monitoring branch January 24, 2017 15:07
@geekgonecrazy
Contributor

@rodrigok I cast my vote 100% for replacing Datadog with Prometheus. I think this will let us fine-tune what we want/need to monitor within Rocket.Chat itself too.

@Sing-Li
Member Author

Sing-Li commented Jan 24, 2017

@rodrigok 100% vote also here to replace datadog 😄

@Sing-Li
Member Author

Sing-Li commented Jan 24, 2017

One gotcha to share when hooking up Prometheus: for a multi-instance install, you need to connect directly to each instance (not through the load balancer) to get a meaningful metrics feed.
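
For example, a scrape config along these lines, listing each instance directly (the addresses and ports here are hypothetical):

scrape_configs:
  - job_name: 'rocketchat'
    metrics_path: /api/metrics
    static_configs:
      # One target per Rocket.Chat instance, bypassing the load balancer.
      - targets: ['10.0.0.11:3000', '10.0.0.12:3000']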

@graywolf336
Contributor

If we're going to add more stats this way, then let's move them all to a new /api/prometheus/-style API and not keep adding things to the default API.

@geekgonecrazy
Contributor

Should we go as specific as /api/prometheus/, or can we not just use /api/metrics/?

@graywolf336
Contributor

How generic is the data returned?

@geekgonecrazy
Contributor

@graywolf336 I think you're right. At the very least it should move to /api/metrics/prometheus.

@Sing-Li
Member Author

Sing-Li commented Jan 24, 2017

Not quite an "API" ... it is a data feed that is accessed via HTTP/HTTPS.
As new metrics are added they become part of the feed, so no extra APIs will be added.
The format is a pretty standard monitoring data feed (counter, gauge, histogram, summary).
The default path for Prometheus is actually /metrics (https://prometheus.io/).
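
For illustration, the feed is the plain-text Prometheus exposition format; a counter looks like this (the name and value are examples only):

# HELP rocketchat_messages_sent_total Accumulated number of messages sent
# TYPE rocketchat_messages_sent_total counter
rocketchat_messages_sent_total 42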

@graywolf336
Contributor

While I can understand what you're trying to do, it should be implemented in a manner that is extensible and easily supports additional metric systems beyond the ones that speak the Prometheus format. Before this makes it into production, I would like to see the changes in this pull request reworked to use the internal RocketChat.callbacks, which are not only deferred but also make sense from an architectural standpoint as the place to collect metrics. I would also like the endpoint to use the structure @geekgonecrazy recommended above, which is /api/metrics/{service}.

cc: @engelgabriel

@engelgabriel
Member

@Sing-Li what other systems use the same data feed format as Prometheus?

@engelgabriel
Member

I think the API should be a wrapper around our own stats collector, and not yet another stats-collecting engine. This way we can support multiple systems, and even show some basic data and graphs on the admin panel.
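
A rough sketch of that shape (every name here is hypothetical, not an existing Rocket.Chat API): one internal stats collector, with thin per-service exporters mounted under /api/metrics/{service}:

# Hypothetical sketch only.
exporters =
  # Each supported system is just a formatter over the same internal stats.
  prometheus: (stats) -> formatPrometheus stats

WebApp.connectHandlers.use '/api/metrics', (req, res) ->
  service = req.url.replace /^\/+|\/+$/g, ''
  exporter = exporters[service]
  unless exporter
    res.writeHead 404
    return res.end()
  res.writeHead 200, 'Content-Type': 'text/plain'
  # statsCollector.get() stands in for whatever internal collector we build.
  res.end exporter(statsCollector.get())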

@graywolf336
Contributor

@engelgabriel I agree 👍

@Sing-Li
Member Author

Sing-Li commented Jan 24, 2017

@graywolf336 As stated previously, as a single-endpoint data feed you are free to locate it anywhere. It is not an API.

@engelgabriel it actually is. The client part of Prometheus does not care how metrics are collected. It is just a data structure (a buffer) that is updated and then exported/rendered to the feed. How you collect those metrics is absolutely flexible and not dictated in any way, so if @graywolf336 chooses to use defer or callbacks, he certainly can.
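
For instance, a sketch of that deferred style, assuming Rocket.Chat's existing afterSaveMessage callback hook (the metrics object is the one this PR adds):

RocketChat.callbacks.add 'afterSaveMessage', ((message, room) ->
  # Deferred metric update, decoupled from the message send path.
  RocketChat.metrics.messagesSent.inc()
  message
), RocketChat.callbacks.priority.LOW, 'metrics-messages-sent'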

Prometheus solves the instrumentation and monitoring problem elegantly, and it can be implemented with minimal disturbance to existing systems; hence its growing popularity.

@engelgabriel
Member

I'd like to have something like #3824 running internally, powered by our own #726

@Sing-Li
Member Author

Sing-Li commented Jan 24, 2017

@engelgabriel as most of the metrics enumerated in #726 are counters, and the https://moovel.github.io/teamchatviz/ site referenced in #3824 shows time-series graphs ... this is almost a classic use case for a Prometheus-plus-Grafana pipeline.

But now I do understand (thanks for the offline talk) that some plans are already in motion to accomplish part of this using REST APIs and other means.

So indeed it is your call on the approach - to avoid duplication of effort and resources.

RocketChat.sendMessage user, message, room
RocketChat.metrics.messagesSent.inc()
Contributor

This line broke several things which rely on the return result of RocketChat.sendMessage.
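
In CoffeeScript the last expression of a function is its implicit return value, so appending the .inc() call changed what the surrounding function returns. A minimal fix sketch:

result = RocketChat.sendMessage user, message, room
RocketChat.metrics.messagesSent.inc()
result  # preserve the original return value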

Contributor

I propose we remove the sendMessage hook here and work on a PR that makes use of the statistics collection we are already creating. That way we don't adversely affect things.
