App list UI gets really slow when there is an RC with many failing pods #443
What if there are 2 pods with different errors?
Don't we already display "and possibly others"? Which means we trim the list to one?
We actually do not. Only actual error messages are displayed. Adding
Can we get events in one call instead of N calls? This should work according to the API: http://kubernetes.io/third_party/swagger-ui/#!/api%2Fv1/listNamespacedEvent Am I correct?
There is a TODO in the code for this, I think. Edit: Yep, there is :)
All right.
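For illustration, here is a minimal sketch of the single-list-call idea from the question above. It assumes a recent client-go and a local kubeconfig; the namespace "default" is just a placeholder and none of this is the dashboard's own client code.

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from the local kubeconfig (assumes running outside the cluster).
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// One API call: list every event in the namespace, then match events to
	// pods client-side instead of issuing one request per pod.
	events, err := clientset.CoreV1().Events("default").List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, ev := range events.Items {
		fmt.Printf("%s/%s: %s\n", ev.InvolvedObject.Kind, ev.InvolvedObject.Name, ev.Message)
	}
}
```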
I already have a working solution for that. Just need to add some tests. UI speed with many failed pods increased significantly.
So the steps are:
But if I enter a valid image, then the cluster would have much bigger problems, right? In general, the pod count should be limited to much smaller numbers, I think.
@cheld We cannot limit the pod count if core doesn't do that. IMO we can display a confirmation dialog, i.e.
The only thing we can do is display a warning asking whether the user is sure they want to create an App with so many pods. There may be some users with big clusters that can handle even 10000 pods. For a 1000-node cluster this is 10 pods per node.
We know that Kubernetes currently supports 30 pods per node. This could be a starting point. We could add a pop-up (similar to delete) or add a red message to the "learn more" text on the right side.
@bryk helped me with the investigation of this bug and we've established that finding the right solution for this issue may take some time, and there are more urgent bugs that need fixing now. My current solution was based on creating a pod selector from the list of pods, and it turns out that it was wrong, as the pod selector is based on ... Using the ... If there is time before Friday then I'll take a look at this, otherwise this should be postponed until after 1.0. If anyone wants to take a look at it, here is my current solution:
Looking at the implementation of the
I'm going through the k8s API and some issues, and it appears that there is no direct way of getting events based on a list of objects. The closest solution I've found is based on these issues: kubernetes/kubernetes#3295. This would require grouping the list of pods by matching label selector, and in the worst-case scenario we would still have N API calls, but normally we are retrieving pod events from a single replication controller, so there would be only a single API call based on the pod label selector. It looks like it's still under development, so we have to wait, I think. @bryk what do you think?
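As an aside, a rough sketch of the grouping idea, assuming pods created by the same replication controller carry an identical label set; the function name is hypothetical, not dashboard code.

```go
package events

import (
	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/labels"
)

// groupPodsByLabels buckets pods by their serialized label set, so that pods
// created by the same replication controller fall into one bucket and events
// can be fetched once per bucket instead of once per pod.
func groupPodsByLabels(pods []v1.Pod) map[string][]v1.Pod {
	groups := make(map[string][]v1.Pod)
	for _, pod := range pods {
		key := labels.Set(pod.Labels).String() // deterministic "k1=v1,k2=v2" form
		groups[key] = append(groups[key], pod)
	}
	return groups
}
```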
So you say that the only way to really fix this is to do it in the apiserver? How about we finally start using goroutines and parallelize all the API calls that we do? This would speed things up a lot on our side, I expect. It does not solve the real problem of N API calls, but it speeds up the UI. Would you mind taking a look at this? It should be a nice piece of engineering work.
Sounds interesting. I'll put it on my list. I'm sure I'll have some questions before starting the implementation, but first I have to investigate this more.
For now I'll prepare only concurrent API calls for events. What I have in mind is to use a goroutine with a channel for a simple function that calls the API and pushes results to the channel, then collect them with select, merge the events into one list and return. This should be a good place to start: https://github.com/kubernetes/dashboard/blob/master/src/app/backend/events.go#L160 There is also the possibility of processing API calls to our backend concurrently with some worker and dispatcher based on goroutines. This would only work for pure POST requests, and for now there are not many of them, but we can keep that in mind because it may be needed at some point.
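A minimal sketch of that fan-out/fan-in pattern; `fetchPodEvents` and the `Event` type are stand-ins for whatever the dashboard backend actually uses, and a buffered channel plus a WaitGroup replaces the select loop to keep the example short.

```go
package events

import "sync"

// Event is a placeholder for the dashboard's event type.
type Event struct {
	PodName string
	Message string
}

// fetchPodEvents stands in for the existing per-pod events API call.
type fetchPodEvents func(podName string) ([]Event, error)

// getEventsConcurrently issues one goroutine per pod, pushes each result onto
// a channel, and merges everything into a single list. The first error wins.
func getEventsConcurrently(podNames []string, fetch fetchPodEvents) ([]Event, error) {
	type result struct {
		events []Event
		err    error
	}

	results := make(chan result, len(podNames))
	var wg sync.WaitGroup

	for _, name := range podNames {
		wg.Add(1)
		go func(name string) {
			defer wg.Done()
			events, err := fetch(name)
			results <- result{events: events, err: err}
		}(name)
	}

	wg.Wait()
	close(results)

	var merged []Event
	for r := range results {
		if r.err != nil {
			return nil, r.err
		}
		merged = append(merged, r.events...)
	}
	return merged, nil
}
```

If pod counts get very large, capping the number of in-flight goroutines with a worker pool would be the natural next step.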
Sounds perfect. |
Fixed |
Try deploying an app with 10000 replicas.
This is because we check for errors for every pod. We could stop doing this after the first error is found.
cc @floreks
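A sketch of the short-circuit idea from the description above; `checkPod` is a hypothetical stand-in for the per-pod error check, not the actual dashboard code.

```go
package events

// firstPodError walks the pod list only until the first failing pod is found,
// instead of running the per-pod error check for every replica.
func firstPodError(podNames []string, checkPod func(name string) (string, bool)) (string, bool) {
	for _, name := range podNames {
		if msg, failed := checkPod(name); failed {
			return msg, true // short-circuit: skip the remaining pods
		}
	}
	return "", false
}
```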