-
Notifications
You must be signed in to change notification settings - Fork 93
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reforking steep #1492
base: master
Are you sure you want to change the base?
Reforking steep #1492
Conversation
@@ -7,7 +7,7 @@ PATH | |||
csv (>= 3.0.9) | |||
fileutils (>= 1.1.0) | |||
json (>= 2.1.0) | |||
language_server-protocol (>= 3.15, < 4.0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @pocke ! Great work!! 👏 I'm wondering if it's necessary to fork type check workers from the main process at the first time, not forking from the primary worker from the first time. It seems like there is (almost) no benefit to do so because the memory shared cannot be increased. So, wondering if changing the architecture to fork from the primary process from the first time makes the code simpler. |
@@ -124,6 +124,18 @@ module Steep | |||
|
|||
def self.response: (String id, result) -> untyped | |||
end | |||
|
|||
module Refork |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a comment to explain what the request is doing, who sends the request, ...?
This PR adds an experimental feature, reforking Steep.
Background
This feature focuses on reducing memory usage.
Steep consumes too much memory because it launches as many processes as there are CPUs.
A single process uses a lot of memory. For example, in a middle-size Rails application, one Steep process consumes ~1.5 GB of memory.
About reforking
The Refork feature solves this problem. I borrowed this idea from Puma. https://github.com/puma/puma/blob/master/docs/fork_worker.md
It works like the following:
steep langserver
startsIn the first step, these workers share limited memories. The master process and worker processes share class objects, and so on. But they do not share objects for type checking.
In the third step, after reforking, these workers share more memories. Because the new workers forked from a worker. They share objects for type-checking also.
Implementation on Steep
I implemented this feature as
--refork
option ofsteep langserver
command.The reforking architecture is suitable for long-lived processes. So it's disabled on other commands such as
steep check
.This option is disabled by default and marked as experimental.
I'd like to enable it by default in the future, but currently we need more investigation for the memory reduction.
Benchmarking
I benchmarked with this script. https://gist.github.com/pocke/ebce89b7341525b5c013cf3a6fa47721
It runs type checking, and execute
free -h
command to check memory usage.I executed it on a docker container. In this test, the reforking feature reduce 1.9GB memory.
Implementation Details
How to share IO
Steep shares IO between the master and workers to read and write LSP messages. For ordinary workers, the IO is created by
IO.pipe
and shared withfork
.But we can't share the IO with
fork
for the reforked workers because the workers are not direct children of the master.To share the IO, I use a UNIX socket created by
socketpair(2)
.UNIX socket allows you to send and receive IO. The master send IO for communication between the master and a grandchild worker to the primary worker via the socket, then the primary worker passes the IO with
fork
to the grandchild worker.Alternative approach
I considered the following approach also.
~/.steep/pipes/#{master-pid}-#{worker_index}
to communicate between the master and the grandchild.socketpair(2)
is more secure.How to monitor the grandchild process
I needed to change how to monitor the worker processes. Because
waitpid(2)
allows monitoring only for the direct child processes, but Steep forks workers from the primary worker. So the master process needs to monitor grandchild processes.I changed the primary worker to monitor the reforked workers.
The primary worker traps
SIGCHLD
signal. This signal is triggered by any child process exits.The signal handler checks all exited processes by
waitpid(2)
withWNOHANG
. If the exited process is one of the workers, the primary worker process dies.https://github.com/soutaro/steep/pull/1492/files#diff-0de0797cd551532b3e8ae27d86e3d2a768f05dba817d6ef009f401d1c30626cbR69-R75
After refork, the master process only monitors the primary worker process. When a grandchild worker process unexpectedly exits, the master recognizes it as the primary worker's exit.