
add prometheus monitoring foundation #5736

Merged
merged 4 commits into develop from prometheus-monitoring on Jan 24, 2017

Conversation

Sing-Li
Member

@Sing-Li Sing-Li commented Jan 24, 2017

Add support for Prometheus monitoring (https://prometheus.io/)

This can become our own foundation to instrument everything. A first step in realizing #5730

Only one "sample metric" has been added (the accumulated number of messages sent), but more can readily be added.

Prometheus exposes useful default Node.js metrics.

There are excellent add-on modules that can expose GC metrics, Docker metrics, and OS metrics.

Used the following prometheus.yml for testing and development:

global:
  scrape_interval:     15s # By default, scrape targets every 15 seconds.

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: 'codelab-monitor'

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s
    metrics_path: /api/metrics
    static_configs:
      - targets: ['127.0.0.1:3000']
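
For context, here is a minimal sketch (not this PR's actual code) of the server side that the config above scrapes, using the prom-client Node library. The metric name and the use of Meteor's WebApp handler are assumptions, and prom-client API shapes vary by version (register.metrics() is synchronous in versions contemporary with this PR):

client = require 'prom-client'

# Collect the default Node.js process metrics mentioned above.
client.collectDefaultMetrics()

# The one sample metric: accumulated number of messages sent.
messagesSent = new client.Counter
  name: 'rocketchat_messages_sent_total'
  help: 'Accumulated number of messages sent'

# Serve the feed at the path the scrape config points to.
# WebApp comes from Meteor's webapp package.
WebApp.connectHandlers.use '/api/metrics', (req, res) ->
  res.writeHead 200, 'Content-Type': client.register.contentType
  res.end client.register.metrics()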

@engelgabriel engelgabriel temporarily deployed to rocket-chat-pr-5736 January 24, 2017 04:58
@engelgabriel engelgabriel temporarily deployed to rocket-chat-pr-5736 January 24, 2017 05:05
@engelgabriel engelgabriel temporarily deployed to rocket-chat-pr-5736 January 24, 2017 05:34
@engelgabriel engelgabriel requested a review from rodrigok January 24, 2017 12:43
@engelgabriel engelgabriel added this to the 0.51.0 milestone Jan 24, 2017
@rodrigok
Member

Should we replace all Datadog metrics with Prometheus?

@engelgabriel engelgabriel merged commit 540f673 into develop Jan 24, 2017
@engelgabriel engelgabriel deleted the prometheus-monitoring branch January 24, 2017 15:07
@geekgonecrazy
Contributor

@rodrigok I cast my vote 100% for replacing Datadog with Prometheus. I think this will let us fine-tune what we want/need to monitor within Rocket.Chat itself too.

@Sing-Li
Member Author

Sing-Li commented Jan 24, 2017

@rodrigok 100% vote also here to replace datadog 😄

@Sing-Li
Member Author

Sing-Li commented Jan 24, 2017

One gotcha to share when hooking up Prometheus: for a multi-instance install, you need to connect directly to each instance (not through the load balancer) to get a meaningful metrics feed.
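
For example, a scrape config along these lines, listing each instance directly (the addresses and ports here are hypothetical):

scrape_configs:
  - job_name: 'rocketchat'
    metrics_path: /api/metrics
    static_configs:
      # One target per Rocket.Chat instance, bypassing the load balancer.
      - targets: ['10.0.0.11:3000', '10.0.0.12:3000']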

@graywolf336
Contributor

If we're going to add more stats this way, then let's move them all to a new /api/prometheus/-style API and not keep adding things to the default API.

@geekgonecrazy
Contributor

Should we go as specific as /api/prometheus/, or can we not just use /api/metrics/?

@graywolf336
Contributor

How generic is the data returned?

@geekgonecrazy
Contributor

@graywolf336 I think you're right. At the very least it should move to /api/metrics/prometheus.

@Sing-Li
Member Author

Sing-Li commented Jan 24, 2017

Not quite an "API" ... it is a data feed that is accessed via HTTP/HTTPS.
As new metrics are added they become part of the feed, so no extra APIs will be added.
The format is a pretty standard monitoring data feed (counter, gauge, histogram, summary).
The default path for Prometheus is actually /metrics (https://prometheus.io/).
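
For illustration, the feed is the plain-text Prometheus exposition format; a counter looks like this (the name and value are examples only):

# HELP rocketchat_messages_sent_total Accumulated number of messages sent
# TYPE rocketchat_messages_sent_total counter
rocketchat_messages_sent_total 42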

@graywolf336
Contributor

While I can understand what you're trying to do, it should be implemented in a manner that is extensible and easily supports additional metric systems beyond the ones that speak the Prometheus format. Before this makes it into production, I would like to see the changes in this pull request reworked to use the internal RocketChat.callbacks, which are not only deferred but also make sense from an architectural standpoint as the place to collect metrics. I would also like the endpoint to use the structure @geekgonecrazy recommended above, which is /api/metrics/{service}.

cc: @engelgabriel

@engelgabriel
Member

@Sing-Li what other systems use the same data feed format as Prometheus?

@engelgabriel
Member

I think the API should be a wrapper around our own stats collector, and not yet another stats-collecting engine. This way we can support multiple systems, and even show some basic data and graphs on the admin panel.
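
A rough sketch of that shape (every name here is hypothetical, not an existing Rocket.Chat API): one internal stats collector, with thin per-service exporters mounted under /api/metrics/{service}:

# Hypothetical sketch only.
exporters =
  # Each supported system is just a formatter over the same internal stats.
  prometheus: (stats) -> formatPrometheus stats

WebApp.connectHandlers.use '/api/metrics', (req, res) ->
  service = req.url.replace /^\/+|\/+$/g, ''
  exporter = exporters[service]
  unless exporter
    res.writeHead 404
    return res.end()
  res.writeHead 200, 'Content-Type': 'text/plain'
  # statsCollector.get() stands in for whatever internal collector we build.
  res.end exporter(statsCollector.get())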

@graywolf336
Contributor

@engelgabriel I agree 👍

@Sing-Li
Member Author

Sing-Li commented Jan 24, 2017

@graywolf336 As stated previously, as a single-endpoint data feed you are free to locate it anywhere. It is not an API.

@engelgabriel it actually is. The client part of Prometheus does not care how metrics are collected. It is just a data structure (a buffer) that is updated and then exported/rendered to the feed. How you collect those metrics is absolutely flexible and not dictated in any way, so if @graywolf336 chooses to use defer or callbacks, he certainly can.
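
For instance, a sketch of that deferred style, assuming Rocket.Chat's existing afterSaveMessage callback hook (the metrics object is the one this PR adds):

RocketChat.callbacks.add 'afterSaveMessage', ((message, room) ->
  # Deferred metric update, decoupled from the message send path.
  RocketChat.metrics.messagesSent.inc()
  message
), RocketChat.callbacks.priority.LOW, 'metrics-messages-sent'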

Prometheus solves the instrumentation and monitoring problem elegantly, and it can be implemented with minimal disturbance to existing systems; hence its growing popularity.

@engelgabriel
Member

I'd like to have something like #3824 running internally, powered by our own #726

@Sing-Li
Member Author

Sing-Li commented Jan 24, 2017

@engelgabriel as most of the metrics enumerated in #726 are counters, and the https://moovel.github.io/teamchatviz/ site referenced in #3824 shows time-series graphs ... this is almost a classic use case for a Prometheus-plus-Grafana pipeline.

But now I do understand (thanks for the offline talk) that some plans are already in motion to accomplish part of this using REST APIs and other means.

So indeed it is your call on the approach - to avoid duplication of effort and resources.

RocketChat.sendMessage user, message, room
RocketChat.metrics.messagesSent.inc()
Contributor

This line broke several things which rely on the return result of RocketChat.sendMessage.
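
In CoffeeScript the last expression of a function is its implicit return value, so appending the .inc() call changed what the surrounding function returns. A minimal fix sketch:

result = RocketChat.sendMessage user, message, room
RocketChat.metrics.messagesSent.inc()
result  # preserve the original return value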

Contributor

I propose we remove the sendMessage hook here and work on a PR that makes use of the statistics collection we are already creating. That way we don't adversely affect things.
