Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

App list UI gets really slow when there is an RC with many failing pods #443

Closed
bryk opened this issue Feb 25, 2016 · 19 comments
Closed

App list UI gets really slow when there is an RC with many failing pods #443

bryk opened this issue Feb 25, 2016 · 19 comments
Assignees

Comments

@bryk
Copy link
Contributor

bryk commented Feb 25, 2016

Try deploying app with 10000 replicas.

This is because we check for errors for every pod. We could stop doing this after first error is found.

cc @floreks

@floreks
Copy link
Member

floreks commented Feb 25, 2016

What if there will be 2 pods with different errors?

@bryk
Copy link
Contributor Author

bryk commented Feb 25, 2016

Dont we already display "and possibly others"? Which means we trim the list to one?

@floreks
Copy link
Member

floreks commented Feb 25, 2016

We actually do not. Only actual error messages are displayed. Adding and possibly others suffix does not fit every error msg. We would end with:

pod (frontend-p5mjc) failed to fit in any node fit failure on node (127.0.0.1): Node didn't have enough resource: CPU, requested: 1000, used: 7620, capacity: 8000 and possibly others or

Failed to pull image "longname-longname-longname-longname-longname-longname": Error: image library/longname-longname-longname-longname-longname-longname not found and possibly others

@bryk
Copy link
Contributor Author

bryk commented Feb 25, 2016

Can we get events in one call instead of N calls? This should work according to the API: http://kubernetes.io/third_party/swagger-ui/#!/api%2Fv1/listNamespacedEvent

Am I correct?

@floreks
Copy link
Member

floreks commented Feb 25, 2016

There is TODO in code for this I think. getEvents call should be changed to get events in 1 call.

Edit: Ye, there is :)

// TODO(floreks): refactor this to make single API call instead of N api calls

@bryk
Copy link
Contributor Author

bryk commented Feb 25, 2016

All right.

@floreks
Copy link
Member

floreks commented Feb 25, 2016

I've already working solution for that. Just need to add some tests. UI speed with many failed pods increased significantly.

@cheld
Copy link
Contributor

cheld commented Feb 26, 2016

So the steps are:

  1. Go to deploy page
  2. Enter app data
  3. enter invalid image
  4. enter 10000 as pod count

But if I enter a valid image, then the cluster would have much bigger problems, right?

In general the pod count should be limited to much smaller numbers, I think

@maciaszczykm
Copy link
Member

@cheld We cannot limit pod count if core doesn't do that. IMO we can display confirmation dialog, i.e. Are you sure? It can slow down your cluster (just a template) instead.

@floreks
Copy link
Member

floreks commented Feb 26, 2016

The only thing that we can do is display a warning asking if user is sure that he wants to create App with so many pods. There may be some users with big clusters that can handle even 10000 pods. For 1000 nodes cluster this is 10 pods per node.

@cheld
Copy link
Contributor

cheld commented Feb 26, 2016

We know that kubernetes supports currently 30 pods per node. This could be starting point.

We could make pop-up (similar to delete) or add a red message to the learn more text on the right side

@floreks
Copy link
Member

floreks commented Mar 1, 2016

@bryk helped mi with the investigation of this bug and we've established that finding the right solution for this issue may take some time and there are more urgent bugs that need fixing now.

My current solution was based on creating pod selector from the list of pods and it turns out that it was wrong as pod selector is based on AND condition not OR.

Using the EventInterface Search method is possible candidate to use in the fix.
https://github.com/kubernetes/kubernetes/blob/master/pkg/client/unversioned/events.go#L42

If there will be time before friday then I'll take a look at this, otherwise this should be postponed after 1.0.

If anyone wants to take a look at that then here is my current solution:
https://github.com/floreks/dashboard/tree/events-one-call

@bryk bryk added this to the Post v1.0 milestone Mar 1, 2016
@floreks
Copy link
Member

floreks commented Mar 15, 2016

Looking on the implementation of the EventInterface Search function https://github.com/kubernetes/kubernetes/blob/master/pkg/client/unversioned/events.go#L153
I'd say that we can not use that. It can get only events related to single object. It may be a little faster but doesn't resolve issue with making single API call to get events related to list of pods.

@floreks
Copy link
Member

floreks commented Mar 15, 2016

I'm going through the k8s API and some issues, and it appears that there is no direct way of getting events based on list of objects. Closest solution I've found is based on these issues:

kubernetes/kubernetes#3295
kubernetes/kubernetes#5548
kubernetes/kubernetes#4817

This would require grouping list of pods by matching label selector and in worst case scenario still we would have N api calls but normally we are retrieving pod events from single replication controller so there would be only single API call based on pod label selector. It looks like it's still under development so we have to wait I think.

@bryk what do you think?

@bryk
Copy link
Contributor Author

bryk commented Mar 18, 2016

So you say that the only way to really fix this is to do this in apiserver?

How about finally start using goroutines and parallelize all API calls that we do? This would speed up things on our side a lot, I expect. It does not solve the real problem of N API calls, but just speeds up the UI. Would you mind taking a look on this? It should be a nice piece of engineering work.

@floreks
Copy link
Member

floreks commented Mar 18, 2016

Sounds interesting. I'll put in on my list. I'm sure I'll have some questions before starting the implementation but first I have to investigate this more.

@floreks
Copy link
Member

floreks commented Mar 22, 2016

For now I'll prepare only concurrent API calls for events. What I've in mind is to use goroutine with channel for simple function that calls API and push results to channel then only get them with select, merge events into 1 list and return.

This should be a good place to start: https://github.com/kubernetes/dashboard/blob/master/src/app/backend/events.go#L160

There is also possibility to process concurrently API calls to our backend with some worker and dispatcher based on goroutines. This would only work for pure POST request and for now there are not many of them, but we can keep that in mind because it may be needed at some point.

@bryk
Copy link
Contributor Author

bryk commented Mar 22, 2016

Sounds perfect.

@floreks
Copy link
Member

floreks commented Mar 31, 2016

Fixed

@floreks floreks closed this as completed Mar 31, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants