App list UI gets really slow when there is an RC with many failing pods #443

bryk · 2016-02-25T13:51:39Z

Try deploying app with 10000 replicas.

This is because we check for errors for every pod. We could stop doing this after first error is found.

cc @floreks

floreks · 2016-02-25T13:53:22Z

What if there will be 2 pods with different errors?

bryk · 2016-02-25T13:56:15Z

Dont we already display "and possibly others"? Which means we trim the list to one?

floreks · 2016-02-25T13:58:32Z

We actually do not. Only actual error messages are displayed. Adding and possibly others suffix does not fit every error msg. We would end with:

pod (frontend-p5mjc) failed to fit in any node fit failure on node (127.0.0.1): Node didn't have enough resource: CPU, requested: 1000, used: 7620, capacity: 8000 and possibly others or

Failed to pull image "longname-longname-longname-longname-longname-longname": Error: image library/longname-longname-longname-longname-longname-longname not found and possibly others

bryk · 2016-02-25T14:02:32Z

Can we get events in one call instead of N calls? This should work according to the API: http://kubernetes.io/third_party/swagger-ui/#!/api%2Fv1/listNamespacedEvent

Am I correct?

floreks · 2016-02-25T14:05:17Z

There is TODO in code for this I think. getEvents call should be changed to get events in 1 call.

Edit: Ye, there is :)

// TODO(floreks): refactor this to make single API call instead of N api calls

bryk · 2016-02-25T14:12:43Z

All right.

floreks · 2016-02-25T15:08:44Z

I've already working solution for that. Just need to add some tests. UI speed with many failed pods increased significantly.

cheld · 2016-02-26T08:46:18Z

So the steps are:

Go to deploy page
Enter app data
enter invalid image
enter 10000 as pod count

But if I enter a valid image, then the cluster would have much bigger problems, right?

In general the pod count should be limited to much smaller numbers, I think

maciaszczykm · 2016-02-26T08:57:06Z

@cheld We cannot limit pod count if core doesn't do that. IMO we can display confirmation dialog, i.e. Are you sure? It can slow down your cluster (just a template) instead.

floreks · 2016-02-26T08:57:26Z

The only thing that we can do is display a warning asking if user is sure that he wants to create App with so many pods. There may be some users with big clusters that can handle even 10000 pods. For 1000 nodes cluster this is 10 pods per node.

cheld · 2016-02-26T09:40:42Z

We know that kubernetes supports currently 30 pods per node. This could be starting point.

We could make pop-up (similar to delete) or add a red message to the learn more text on the right side

floreks · 2016-03-01T08:36:44Z

@bryk helped mi with the investigation of this bug and we've established that finding the right solution for this issue may take some time and there are more urgent bugs that need fixing now.

My current solution was based on creating pod selector from the list of pods and it turns out that it was wrong as pod selector is based on AND condition not OR.

Using the EventInterface Search method is possible candidate to use in the fix.
https://github.com/kubernetes/kubernetes/blob/master/pkg/client/unversioned/events.go#L42

If there will be time before friday then I'll take a look at this, otherwise this should be postponed after 1.0.

If anyone wants to take a look at that then here is my current solution:
https://github.com/floreks/dashboard/tree/events-one-call

floreks · 2016-03-15T09:27:10Z

Looking on the implementation of the EventInterface Search function https://github.com/kubernetes/kubernetes/blob/master/pkg/client/unversioned/events.go#L153
I'd say that we can not use that. It can get only events related to single object. It may be a little faster but doesn't resolve issue with making single API call to get events related to list of pods.

floreks · 2016-03-15T10:34:59Z

I'm going through the k8s API and some issues, and it appears that there is no direct way of getting events based on list of objects. Closest solution I've found is based on these issues:

kubernetes/kubernetes#3295
kubernetes/kubernetes#5548
kubernetes/kubernetes#4817

This would require grouping list of pods by matching label selector and in worst case scenario still we would have N api calls but normally we are retrieving pod events from single replication controller so there would be only single API call based on pod label selector. It looks like it's still under development so we have to wait I think.

@bryk what do you think?

bryk · 2016-03-18T13:18:49Z

So you say that the only way to really fix this is to do this in apiserver?

How about finally start using goroutines and parallelize all API calls that we do? This would speed up things on our side a lot, I expect. It does not solve the real problem of N API calls, but just speeds up the UI. Would you mind taking a look on this? It should be a nice piece of engineering work.

floreks · 2016-03-18T13:23:56Z

Sounds interesting. I'll put in on my list. I'm sure I'll have some questions before starting the implementation but first I have to investigate this more.

floreks · 2016-03-22T13:59:39Z

For now I'll prepare only concurrent API calls for events. What I've in mind is to use goroutine with channel for simple function that calls API and push results to channel then only get them with select, merge events into 1 list and return.

This should be a good place to start: https://github.com/kubernetes/dashboard/blob/master/src/app/backend/events.go#L160

There is also possibility to process concurrently API calls to our backend with some worker and dispatcher based on goroutines. This would only work for pure POST request and for now there are not many of them, but we can keep that in mind because it may be needed at some point.

bryk · 2016-03-22T14:21:54Z

Sounds perfect.

floreks · 2016-03-31T12:23:47Z

Fixed

bryk added this to the Post v1.0 milestone Mar 1, 2016

maciaszczykm added enhancement labels Mar 8, 2016

maciaszczykm assigned floreks Mar 8, 2016

maciaszczykm removed the area/usability label Mar 9, 2016

floreks mentioned this issue Mar 29, 2016

Make single API call to get events and filter them on backend #587

Merged

floreks closed this as completed Mar 31, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

App list UI gets really slow when there is an RC with many failing pods #443

App list UI gets really slow when there is an RC with many failing pods #443

bryk commented Feb 25, 2016

floreks commented Feb 25, 2016

bryk commented Feb 25, 2016

floreks commented Feb 25, 2016

bryk commented Feb 25, 2016

floreks commented Feb 25, 2016

bryk commented Feb 25, 2016

floreks commented Feb 25, 2016

cheld commented Feb 26, 2016

maciaszczykm commented Feb 26, 2016

floreks commented Feb 26, 2016

cheld commented Feb 26, 2016

floreks commented Mar 1, 2016

floreks commented Mar 15, 2016

floreks commented Mar 15, 2016

bryk commented Mar 18, 2016

floreks commented Mar 18, 2016

floreks commented Mar 22, 2016

bryk commented Mar 22, 2016

floreks commented Mar 31, 2016

App list UI gets really slow when there is an RC with many failing pods #443

App list UI gets really slow when there is an RC with many failing pods #443

Comments

bryk commented Feb 25, 2016

floreks commented Feb 25, 2016

bryk commented Feb 25, 2016

floreks commented Feb 25, 2016

bryk commented Feb 25, 2016

floreks commented Feb 25, 2016

bryk commented Feb 25, 2016

floreks commented Feb 25, 2016

cheld commented Feb 26, 2016

maciaszczykm commented Feb 26, 2016

floreks commented Feb 26, 2016

cheld commented Feb 26, 2016

floreks commented Mar 1, 2016

floreks commented Mar 15, 2016

floreks commented Mar 15, 2016

bryk commented Mar 18, 2016

floreks commented Mar 18, 2016

floreks commented Mar 22, 2016

bryk commented Mar 22, 2016

floreks commented Mar 31, 2016