[Metricbeat] Create kube-state-metrics metricset, including all objects #14306

Conversation

@odacremolbap (Contributor) commented Oct 29, 2019

Create kube-state-metrics metricset for the kubernetes module.

Implements #12938
Fixes #13986
Fixes #11624
Related to #7058

THIS IS WIP

Pre-release review

This PR is modifying the prometheus fetcher, adding a pre-processing filter, and using more than 20 kubernetes objects.

To make it easier to review, there will be at least 3 milestones, one of which is already here.

Milestones

Milestone 1

Milestone 1 is in review status

  • Prometheus ExtendedInfoMetric:

    Based on the InfoMetric type in the prometheus package, it accepts a Configuration object with an option to include any label on the metric that hasn't been mapped explicitly.
    The underlying commonMetric object has been modified: it holds a configuration item, and MetricProcessingOptions (an array of functions to apply to already retrieved elements) is included inside this configuration object. Existing client code does not need to be updated. As an alternative, a function signature that took both a configuration and a metricProcessingOption did not seem right, since the latter sounds like part of the former. (See the sketch after this list.)

  • pre-process filtering:

    The proposal implemented here is to use either a whitelist or a blacklist of items. The docs included in the PR show how it would work.
    The main caveat would be that we are making up names for each group that can be filtered (in fact the name is inferred from the kube_<name>_ prefix from kube-state-metrics, and I would say that is intuitive). The alternative would be to use some pattern that matches the source metric name, so instead of filtering for cronjob the user would write kube_cronjob_*. To my eyes the alternative does not look good.

  • The first 2 group implementations, certificatesigningrequests and configmaps, have been added to get a feel of how the code would look. This is also fully working if you want to test it.
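
A minimal sketch of what an ExtendedInfoMetric mapping could look like for one group (not the PR's final code; the Configuration field names StoreNonMappedLabels and NonMappedLabelsPlacement are borrowed from the proposal later in this thread and may differ in the actual implementation; p is the prometheus helper package and mb the metricbeat framework package):

    Metrics: map[string]p.MetricMap{
        "kube_configmap_info": p.ExtendedInfoMetric(p.Configuration{
            // keep any label without an explicit mapping under "labels.*"
            StoreNonMappedLabels:     true,
            NonMappedLabelsPlacement: "labels",
        }),
    },
    Labels: map[string]p.LabelMap{
        "namespace": p.KeyLabel(mb.ModuleDataKey + ".namespace"),
        "configmap": p.KeyLabel("name"),
    },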

Milestone 2

Milestone 2 is not started yet

  • Include groups that can be enriched (pods, deployments, ...)
  • Add the enricher

Milestone 3

Milestone 3 is not started yet

  • Add all other object groups

@odacremolbap self-assigned this on Oct 29, 2019
@odacremolbap added the in progress, Metricbeat, and containers labels on Oct 29, 2019
@exekias requested a review from a team on Oct 30, 2019

// ExtendedInfoMetric obtains info labels from the given metric and puts them
// into events matching all the key labels present in the metric
@exekias (Contributor):

I understand the difference between this and InfoMetric is that we use it to catch all labels?

@odacremolbap (Contributor Author):

yes, re-reading the comment that is not clear.
Will improve it, thanks

@odacremolbap (Contributor Author):

Also, I think most existing uses of InfoMetric could use this extended version out of the box, so one option was evolving the current InfoMetric instead of creating a new one.

That would need a lot of testing, which I'd rather not do here, but I don't see a reason why we should leave some labels coming from prometheus unpushed. That might be something to check for each existing use of InfoMetric if this extended type gets in with this PR.

@exekias (Contributor):

I had the same thought, we should consider moving all existing metricsets to the new behavior. This could perhaps be a parameter for InfoMetric

@odacremolbap (Contributor Author):

Changing the signature for InfoMetric could be done, but I would rather attempt to merge InfoMetric and ExtendedInfoMetric in a different PR.

The current InfoMetric parameter option func is intended for post-processing and does not play well with the intended behaviour here; I don't think this is doable without changing the current signature.

WDYT?

@exekias (Contributor):

That sounds good to me, as long as we keep the plan for unifying them to avoid confusion


// if label for this metric is not found at the label mappings but
// it is configured to store any labels found, make it so
// TODO dedot
labels.Put(labelsLocation+"."+k, v)
@exekias (Contributor):

I think prometheus labels don't contain dots?

@odacremolbap (Contributor Author):

that's interesting, I will need to test, but prometheus is copying labels + values from kubernetes, which might contain dots.

I'm not sure if it will encode dots into something else. I'll keep the TODO until I test.

- module: kubernetes
  enabled: true
  filter:
    blacklist:
@exekias (Contributor) commented Oct 30, 2019:

what would be the reasoning to do blacklist instead of using the drop_event processor?

@odacremolbap (Contributor Author):

Pros

blacklisting happens before processing; it avoids including items in the mapping object, which should save memory, CPU, and time.
In fact, I was wondering how to introduce this same feature for all prometheus metrics, but I don't want to abstract it until it proves worthwhile for this metricset.

To me it feels better to write configuration to receive everything but configmaps and daemonsets like:

metricset:
  kube-state-metrics:
    i-want-all-but:
      - configmaps
      - daemonsets

than

metricset:
  kube-state-metrics:

processors:
  once-processed-remove:
    - events-that-match
       event.somekey equals kube_configuration_*
    - events-that-match
       event.somekey equals kube_daemonset_*

Cons

not using drop_event is not the Beats standard; this might be a one-off unless we find it valuable for other metrics. That said, you can still use drop_event even if blacklisting makes it into this PR. A sketch of the proposed syntax follows.
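
For concreteness, the blacklist from the config snippet under review above could be filled in like this (a sketch of the proposed syntax; group names are the kube_<name>_ prefixes):

    - module: kubernetes
      enabled: true
      filter:
        blacklist:
          - configmaps
          - daemonsets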

@exekias (Contributor):

I wonder how often people will want to disable these. What would be the expected behavior for resources that are not present in k8s? This will probably report none or only a few documents.

@odacremolbap (Contributor Author):

I agree with the doubts about introducing features no one asked for.

I think there is a total of 28 groups, among them replicationcontrollers, certificatesigningrequests, storageclasses ... and the default would be to enable them all. But as soon as a user notices information that is not useful for their use case, or that brings redundant information (sometimes pv/pvc) and keeps using storage, it looks like a candidate to be filtered.

> What would be the expected behavior for resources that are not present in k8s?

  • not present at prometheus --> no data is processed nor sent
  • present but boring (storageclasses, whatever the user considers boring) --> constant data with no value

The use case for this would be the same as the drop_event processor, but besides saving storage at ES, it saves resources + time, and groups metrics under familiar names.

@odacremolbap (Contributor Author):

I'm leaning towards dropping the blacklist and whitelist filtering. It would be very welcome here, but I don't want to go forward until discussing it out of band with the team.

@exekias (Contributor):

SGTM, we can always add them in a followup PR

@exekias (Contributor) left a comment:

This is looking good! left a few questions

@odacremolbap (Contributor Author):

Re-use of label names across different metrics

reason is a label coming from different families within the same prometheus web resource.
The current implementation of the prometheus fetcher does not allow tracking which family a label belongs to, hence reason can be mapped only once, no matter which family uses it.

That hasn't been an issue so far, since each label was unique per family or, if repeated, it had the same meaning and was reported at the same location in the output event.

In kube-state-metrics we find these 2 families reported:

  • container status waiting:
kube_pod_container_status_waiting_reason{namespace="kube-system",pod="etcd-minikube",container="etcd",reason="ContainerCreating"} 0
...
  • container status terminated
kube_pod_container_status_terminated_reason{namespace="kube-system",pod="kube-addon-manager-minikube",container="kube-addon-manager",reason="OOMKilled"} 0
...

Retrieving a resource that contains both families in a single fetch could be done using these labels as keys:

  • namespace
  • pod
  • container
  • reason

This would group families that match the values of those labels. The problem is that while namespace, pod and container can be placed at:

  • kubernetes.statemetrics.pod.name
  • kubernetes.statemetrics.namespace
  • kubernetes.statemetrics.pod.container

the label reason is not that generic; this does not sound good:

  • kubernetes.statemetrics.pod.container.reason

Ideally this would be placed at a different location in each event (see the sketch after this list):

  • kubernetes.statemetrics.pod.status.terminated.reason
  • kubernetes.statemetrics.pod.status.waiting.reason
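
For illustration only (not existing code): if both families reported a sample for the same namespace/pod/container, grouping on those key labels would produce a single event roughly like the following, which shows why one generic location for reason cannot hold both values:

    event := map[string]interface{}{
        "kubernetes.statemetrics.namespace":     "kube-system",
        "kubernetes.statemetrics.pod.name":      "etcd-minikube",
        "kubernetes.statemetrics.pod.container": "etcd",
        // both families carry a "reason" label, each with its own semantics
        "kubernetes.statemetrics.pod.status.waiting.reason":    "ContainerCreating",
        "kubernetes.statemetrics.pod.status.terminated.reason": "OOMKilled",
    }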

Solution proposal

In order to do this, probably the most straightforward solution is to add an optional field to the label mapping (creating a new label type) that adds context about the metric it is being retrieved from. This way we would be able to register label mappings for elements with the same name to different event locations (the implementation would not be exactly like this, but similar):

    map[string]prometheus.LabelMap{
        "reason": prometheus.ExtendedKeyLabel(
            []Configuration{
                {MetricContext: "kube_pod_container_status_last_terminated_reason", MapTo: "terminated.reason"},
                {MetricContext: "kube_pod_container_status_last_waiting_reason", MapTo: "waiting.reason"},
            },
        ),
    }

Alternatives

Dismiss the single metricset for kube-state-metrics and use the previous approach of one metricset per metric group

@exekias waiting for your decision

@exekias (Contributor) commented Nov 18, 2019

> Which would group families that match the values for those labels. The problem is that while namespace, pod and container can be placed at:
>
>   • kubernetes.statemetrics.pod.name
>   • kubernetes.statemetrics.namespace
>   • kubernetes.statemetrics.pod.container

Shouldn't this be kubernetes.pod.name, kubernetes.namespace and kubernetes.pod.container?

> Label reason is not that generic, this does not sound that good:
>
>   • kubernetes.statemetrics.pod.container.reason
>
> Ideally this would be placed at a different location at each event
>
>   • kubernetes.statemetrics.pod.status.terminated.reason
>   • kubernetes.statemetrics.pod.status.waiting.reason

> Solution proposal
>
> In order to do this probably the most straightforward solution is to add an optional field to the label mapping (creating a new label type) that adds context about the metric it is being retrieved from. This way we would be able to register label mappings for elements with the same name to different event locations:
> (ExtendedKeyLabel sketch quoted in the previous comment)

I'm wondering if this should be more like the following. If I understand things correctly, there can be only one status+reason pair at a time?

{
   "kubernetes.container.status": "terminated",
   "kubernetes.container.reason": "waiting"
}

@odacremolbap (Contributor Author):

@exekias continuing on the last conversation:

To be able to extract context from labels belonging to metrics, the proposal is to declare the labels inside the metric map, much closer to mapping the exact structure of what we find at the prometheus resource.

The two example metric families below define 2 labels with the same name. One of them, label2_for_metric, has a common meaning, won't overlap, and can be put at some global location; this is the same behaviour we have so far with the current label mappings.
On the other hand, label1_for_metric is present at multiple metrics, but its meaning and desired location are bound to each metric reporting it:

MetricMap{
    "metric_from_prometheus_resource": prometheus.ExtendedMetric(
        prometheus.Configuration{
            // location of the retrieved metric value at the output event
            EventLocation: "event.location.metric.value",

            // how labels should be mapped
            Labels: map[string]prometheus.LabelMap{
                "label1_for_metric": prometheus.KeyLabel("event.location.metric.label1"),
                "label2_for_metric": prometheus.KeyLabel("event.location.label2"),
            },

            // what to do with non mapped labels
            StoreNonMappedLabels:     true,
            NonMappedLabelsPlacement: "labels",
        },
    ),

    "another_metric_from_prometheus_resource": prometheus.ExtendedMetric(
        prometheus.Configuration{
            EventLocation: "event.location.another.metric.value",
            Labels: map[string]prometheus.LabelMap{
                "label1_for_metric": prometheus.KeyLabel("event.location.another.metric.label1"),
                "label2_for_metric": prometheus.KeyLabel("event.location.label2"),
            },
        },
    ),
}

This proposal is compatible with the previous metric mapping specification and reuses its objects. An ExtendedMetric object is added that allows defining labels inside the metric scope, as they are reported at prometheus.

A real world example can be found at https://github.com/kubernetes/kube-state-metrics/blob/master/docs/pod-metrics.md, look for kube_pod_status_phase and kube_pod_status_ready, where the same key labels are reported but the condition label must be reported independently for both metrics within the same pod event (sketched below).
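
A sketch of how the proposal could look for those two families (the label names phase and condition come from the kube-state-metrics docs; the event locations and the ExtendedMetric helper are illustrative, matching the proposal above rather than existing code):

    "kube_pod_status_phase": prometheus.ExtendedMetric(
        prometheus.Configuration{
            EventLocation: "status.phase.active",
            Labels: map[string]prometheus.LabelMap{
                "phase": prometheus.KeyLabel("status.phase.name"),
            },
        },
    ),
    "kube_pod_status_ready": prometheus.ExtendedMetric(
        prometheus.Configuration{
            EventLocation: "status.ready.count",
            Labels: map[string]prometheus.LabelMap{
                "condition": prometheus.KeyLabel("status.ready.condition"),
            },
        },
    ),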

Looking forward to your feedback.

@exekias (Contributor) commented Nov 19, 2019

This sounds reasonable, I wonder if the mapping may get too complex in some cases. Right now for most metricsets it's really straightforward, e.g.:

Metrics: map[string]p.MetricMap{
    "kube_pod_info":                                      p.InfoMetric(),
    "kube_pod_container_info":                            p.InfoMetric(),
    "kube_pod_container_resource_limits_cpu_cores":       p.Metric("cpu.limit.cores"),
    "kube_pod_container_resource_requests_cpu_cores":     p.Metric("cpu.request.cores"),
    "kube_pod_container_resource_limits_memory_bytes":    p.Metric("memory.limit.bytes"),
    "kube_pod_container_resource_requests_memory_bytes":  p.Metric("memory.request.bytes"),
    "kube_pod_container_status_ready":                    p.BooleanMetric("status.ready"),
    "kube_pod_container_status_restarts":                 p.Metric("status.restarts"),
    "kube_pod_container_status_restarts_total":           p.Metric("status.restarts"),
    "kube_pod_container_status_running":                  p.KeywordMetric("status.phase", "running"),
    "kube_pod_container_status_terminated":               p.KeywordMetric("status.phase", "terminated"),
    "kube_pod_container_status_waiting":                  p.KeywordMetric("status.phase", "waiting"),
    "kube_pod_container_status_terminated_reason":        p.LabelMetric("status.reason", "reason"),
    "kube_pod_container_status_waiting_reason":           p.LabelMetric("status.reason", "reason"),
},
Labels: map[string]p.LabelMap{
    "pod":          p.KeyLabel(mb.ModuleDataKey + ".pod.name"),
    "container":    p.KeyLabel("name"),
    "namespace":    p.KeyLabel(mb.ModuleDataKey + ".namespace"),
    "node":         p.Label(mb.ModuleDataKey + ".node.name"),
    "container_id": p.Label("id"),
    "image":        p.Label("image"),
},

Would this change mean that we need to define labels for every single p.Metric there? It could be useful to explore how this one would end up looking.

@odacremolbap (Contributor Author) commented Nov 19, 2019

It would move labels inside metrics, so some caveats:

  1. yes, metrics would no longer be one-liners but a structure. That's uglier than what we have, but it's the price to pay to accurately represent the metric/label relation at prometheus. I haven't found a better way to put it together. If we move forward, I'll take care that the final mapping blocks don't look ugly.

  2. keylabels are based on the prometheus label name. This would no longer be the case for keylabels identified as the proposal suggests: we would need to take into account the mapped path for the label and use that string as part of the key to group events. Given the example above, the label label1_for_metric exists at both metrics but needs to be considered a different entity at each one; if it were a keylabel, the event path location should be used.

  3. Although it would not be needed to map all labels (previous label definitions would continue to exist), for maintainability I would keep all labels mapped. Sometimes it is hard to track which label is being used by which metric family; keeping labels mapped inside metrics would match exactly what we see at prometheus resources. This also means some more memory, but I would say < 5K.

As done before with the filters, I can move on and work on a tentative implementation to see how this would look.

Alternative

A very ugly one: when finding this reuse of family-scoped labels, use only one of them and discard the less important metric.

@exekias (Contributor) commented Nov 19, 2019

Sounds like having a definition like that per metric family (or arbitrary metric name prefix) could be a good compromise? That way you can have a definition like the one I linked for each family. WDYT?

@odacremolbap (Contributor Author):

not sure if I follow.
At the example #14306 (comment) all families share the same label mapping.

There are ways to make the proposal look like that, defining labels somewhere else and then referencing them from the metric definition, making it a one-liner.

I'd suggest discussing out of band if that suits you.

@odacremolbap (Contributor Author):

discarding the proposal for now.
LabelMetric should fit the case I mentioned above.

Will bring the proposal back if needed later

@odacremolbap (Contributor Author):

hitting another scenario:

The kube_pod_init_container_info and kube_pod_container_info metrics both report the same labels:

  • container
  • pod
  • namespace
  • image
  • image_id
  • container_id

I'm considering the first 3 as primary keys (keyLabels as of the prometheus helper at metricbeat).
They won't clash because container names are unique, but we should mark init containers with init_container: true at the event.

I don't think there is currently a way of doing this, is there?

If not, I guess the non-disruptive solution would be to add a param to the ExtendedInfoMetric created in this same PR to add extra fields along with the family labels.

@exekias agree?

@odacremolbap (Contributor Author):

@exekias please check 895dd2c

that would allow us to add fields per family when needed.
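
Something along these lines (illustrative only; the ExtraFields option name here is hypothetical and may not match what 895dd2c actually implements):

    "kube_pod_container_info": p.ExtendedInfoMetric(p.Configuration{}),
    "kube_pod_init_container_info": p.ExtendedInfoMetric(p.Configuration{
        // extra fields emitted along with this family's labels
        ExtraFields: map[string]interface{}{"init_container": true},
    }),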

@odacremolbap (Contributor Author):

another scenario:

kube_pod_container_resource_requests{namespace="kube-system",pod="coredns-5644d7b6d9-2gbs2",container="coredns",node="minikube",resource="cpu",unit="core"} 0.1
kube_pod_container_resource_requests{namespace="kube-system",pod="coredns-5644d7b6d9-2gbs2",container="coredns",node="minikube",resource="memory",unit="byte"} 7.340032e+07

this family is used to report memory and cpu depending on the resource label. Again, some extensions could be added to the current prometheus helper, but this feels like adding too many exceptions to the current design.
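
For reference, the event fields one would want out of those two samples, reusing the field names from the pod mapping quoted earlier (7.340032e+07 bytes is 70 MiB); the open question is how to split a single family by the value of its resource label:

    fields := map[string]interface{}{
        "cpu.request.cores":    0.1,
        "memory.request.bytes": 7.340032e+07,
    }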

@exekias (Contributor) commented Nov 21, 2019

> I'm considering the first 3 as primary keys (keyLabels as of the prometheus helper at metricbeat).
> They won't clash because container names are unique, but we should mark init containers with init_container: true at the event.
>
> I don't think there is currently a way of doing this, is there?

Where do you get the info about a container being init container?

@odacremolbap (Contributor Author):

> I don't think there is currently a way of doing this, is there?
>
> Where do you get the info about a container being init container?

that's determined by the metric family:

kube_pod_container_info
kube_pod_init_container_info

see #14306 (comment)

@odacremolbap (Contributor Author):

Halting work on this. Read here: #14682 (comment)

We can probably follow the same path we planned, but gain some extra experience by creating a number of separate metricsets for each kube-state-metrics group.
