Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Every time ensure clause runs even when program terminated with exit #1935

Closed
makenowjust opened this issue Dec 8, 2015 · 15 comments
Closed

Comments

@makenowjust
Copy link
Contributor

In Python and Ruby, see following programs:

import sys

try:
  sys.exit()
finally:
  print("Exit!")
begin
  exit
ensure
  puts "Exit!"
end

while program terminated with exit function/method, both programs display "Exit!". It means we expect ensure clause (also called finally clause) runs every time.
On top of that, it's interested, both Python and Ruby implement exit by throwing SystemExit exception. SystemExit is one of special exception, which is not caught by catch clause usually. So if program catches SystemExit exception, program is not terminated. To terminate program immediately, Ruby has exit! method, and Python has os._exit too. These method/functions are dangerous.

I suggested spec that is such a following:

  1. Add SystemExit class
  2. exit method raises a SystemExit, but catch clause does not catch this object normally.
  3. In main function, it catches SystemExit object and decides a return code.

and discussion points are:

  1. What type hierarchy does SystemExit class have? Should does SystemExit inherit Exception?
  2. Should do we add Ruby's exit! like method to Crystal? How is difference between exit! and LibC.exit? Or, is exit! alias to LibC.exit?
  3. Should does catch clause have the ability to catch SystemExit? But we finally catch SystemExit in main function.
  4. I think Fiber has same problem to run ensure clause. If program was terminated, should we run ensure clause by sending SystemExit to all fibers?
  5. At all, should does ensure clause run every time? For example in Java, it cannot runs finally clause when program terminated by System.exit. (I believe not, but it exists for fair discussion.)

(This issue is connected with #1921)

References:

Python's sys.exit: https://docs.python.org/3/library/sys.html#sys.exit
Ruby's exit: http://ruby-doc.org/core-2.1.1/Kernel.html#method-i-exit
Java's System.exit: https://docs.oracle.com/javase/8/docs/api/java/lang/System.html#exit-int-

@asterite
Copy link
Member

asterite commented Dec 8, 2015

You are right. But what's wrong with at_exit? I would really like to keep things simple. Having rescue catch some exceptions adds one more rule to the language, and having exit throw an exception sounds like unnecessary overhead.

@technorama
Copy link
Contributor

But what's wrong with at_exit?

First there is a consistency issue. When writing a program I expect every ensure block to run. They may include database shutdown, transaction commits or aborts, graceful or fast network disconnections, logging or other code that must run before exiting.

Consider an application running multiple http services + opening multiple key value databases. Any service failure should stop all other services. The databases should be gracefully closed last.

In the following example of any of the services call exit() the databases will not be closed with the current ensure behavior. Calling LibC.exit() may leave the databases in an inconsistent state forcing recovery on the next startup or corrupting the databases. Existing transactions or data not yet saved to disk may be lost.

open_dbs do
  spawn { http_service; stop }
  spawn { https_service; stop }
  spawn { debugging_http_service; stop }
  @stop_channel.receive
  # stop all other services 
end

def stop
  @stop_channel.send nil
end

def open_dbs
  LevelDb.open(...) do |user_db|
    LevelDb.open(...) do |other_db|
      yield
    end
  end
end

Should every key value database register an at_exit handler? How do you remove them when the database is closed? What about other databases (SQL)? Other network services? GUI shutdown handlers? Other? The normal way to handle this in other languages (including ruby) is to use ensure/finally.

@technorama
Copy link
Contributor

having exit throw an exception sounds like unnecessary overhead.

Most programs don't call exit. The few that do call them once. Overhead is negligible.

@makenowjust
Copy link
Contributor Author

Thanks @technorama. Your description is detailed and I think so too.

@asterite
Copy link
Member

I really can't imagine a long-running application that in between its lines calls exit. For example, doing an exit on an HTTP request, or maybe on a database failure. Exceptions communicate failures, exit has very few use cases.

Java works that way too, and I never heard anyone complaining about it. Go also works that way: "The program terminates immediately; deferred functions are not run.". And people are writing wonderful, robust services in Go.

I really don't think we should complicate the language and the exception hierarchy because of exit. The solution is simple: don't use exit in long-running applications. I personally only use exit for small scripts, mostly when you get options from ARGV and something's missing.

@technorama
Copy link
Contributor

exit(status_code) can be used to indicate status to the parent process. Raising an exception always returns 1. Using exit to indicate status doesn't necessarily mean cleanup routines (database or other) shouldn't run.

When you say exception hierarchy you mean a base exception class and a SystemExit/ExitException/Exception::Exit class? Two additional classes aren't really that complicated and both ruby and python have analogues.

An alternative I guess is providing a way to indicate exit status when raising an exception. raise Exception.new("foo", exit_status: 2). Wrapped/masked exceptions should probably preserve the exit status if those features are provided in the future.

@intentionaccident
Copy link

Have any decisions been made on this in some other discussion?

I have also been wondering about interrupts. In ruby an interrupt is processed as an exception and you can optionally rescue it or ensure cleanup code execution. With crystal your only recourse it to trap the interrupt explicitly and figure out what needs to be cleaned up inside the trap.

This behaviour also lets ruby's at_exit callbacks fire if a program was interrupted without having to trap anything.

@straight-shoota
Copy link
Member

straight-shoota commented Apr 3, 2018

@technorama If you need to exit with a specific status code and ensure to run some tasks, you can use an application-specific termination method which calls cleanup handlers and before calling exit.
This won't work with ensure blocks but should fit for most use cases such as cleanly terminating open connections etc.
You could even implement a custom SystemExit exception and use it in the described way. Without special language support you just need to make sure to never rescue from this exception class.

But these are two ways to accomplish the desired behaviour with minimal effort. I don't think it makes sense to add additional overhead to the language for a feature with pretty rare use case.

@technorama
Copy link
Contributor

@straight-shoota What overhead could be relevant for an abnormal code path? Where are the benchmarks to prove running ensure and exception handlers add any significant amount of runtime overhead especially considering the normal program path has the exact same overhead?

Code overhead is 2 empty classes, a rescue and a small change to exit().

What overhead are you speaking of?

The current behavior violates the principle of least surprise. The behavior of ruby/python is what most programmers both want and expect especially when designing libraries.

Some examples where this falls apart (using partial/simplified code):

Signal.trap 'TERM' { exit 0 }

class Server
  def loop
    NewRelic.measure do
      TempFile.open("foo.tmp") do |tempfile|
        InMemoryDb.open do |dbfile|
          loop { handle_requests }
        end
    end
  ensure
    log "Server exiting"
  end
end

In the example the following surprises happen:

  • Tempfiles are left around possibly permanently if the programmer does not take steps to clean them up.
  • The programmer may be unaware that temp files are used of a library uses tempfiles so they never know there is anything to clean up leaving junk for sysadmins or other systems to clean up.
  • The database is left in an unclean state possibly losing data or requiring additional recovery methods on the next start.
  • NewRelic or other performance measurement services don't receive the last measurements before the service is stopped.
  • There is no log message that the server exited cleanly leaving the sysadmins wondering wtf happened.
  • Normal usage patterns from most other languages from c/ruby/python can't be used.
  • There is no standard way of doing it including for libraries. Every program must handle exit using their own custom way AND be aware of of every library using Tempfiles, logging, external services, in memory databases and/or possibly other things including having perfect knowledge of any libraries using libraries and how exit() effects them.
  • Lingering bugs exist in many programs using exit() that are hard to track down.

By the time these programs make it to the ops team they have no idea about how to handle the subtle failures/inconsistencies. They just tend to hate the program for failing in unexpected ways and advocate it's disuse.

Ask me sometime about a program that used to take over 16 hours to rebuild it's indices if they were out of sync from the main db file.

Libraries:

class MmappedFileDb or BufferedInMemoryDb or BerkeleyDb
  def self.open
     db = self.new
  ensure
    db.close
  end

  def initialize
    open file...
  end

  def close
    flush writes...
    close temporary files...
    delete temporary files...
    update wal...
  end

end

You could remove exit() and only provide an exit!() with warnings to never call it unless you know what you doing, have written or read every library used in the program yourself and understand all the implications of skipping ensure/rescue blocks within the particular program.

Not providing a standardized set of exceptions and reliable ensure clauses is setting the stage for subtle bugs that every programmer will handle differently. The following should demonstrate why an exception hierarchy is necessary for writing robust code (ruby/python figured this out ages ago).

# could be a local file based DB and not a remote SQL db that will fail gracefully when the connection closes.
class DB 
    # this must treat every abnormal code path as an incomplete transaction
    def transaction
      ...
    rescue AllExceptions => ex
      rollback
      raise ex
    end
end

class Server
  # log failures and keep running.  most abnormal paths are safe to log and ignore.
  def loop
    loop do
    rescue StandardException
      log "exception occured"
    end
  # should always run regardless of exit method.
  # may be cleaning up temp, pid files or other.
  ensure
    cleanup
  end
end

There's still the issue of attempting to exit with a non 0 or 1 value and providing a safe way to do so where all library/program authors expectations are met.

raise SystemExit(exit_code: 255) could be used as an alternative but this has other problems without a standardized exception hierarchy.

Code overhead is 1-2 empty classes, a rescue and a small change to exit(). Really, what's the big deal?

@straight-shoota
Copy link
Member

Well, another use case I stumbled upon is in specs where you need to do some cleanup like deleting a tempfile and you want it to clean up even if the spec itself is aborted (Strg+C). But I don't think the proposed solution could actually make this work as SystemException can't just be raise in the main fiber from the trap handler. You'd need a dedicated cleanup handler that can be called from an at_exit hook or Spec.abort!.

@RX14
Copy link
Member

RX14 commented Apr 6, 2018

Being able to "catch" an exit massivle violates the priciple of least surprise to me. People coming from ruby and python will expect an exit exception, people coming from any other language (java, C#, C++, C, Go, Rust, I could go on) this is absolutely not what they expect. Because having an "exit exception" is asking for the impossible. It's not reasonable to expect to be able to clean up in every situation. Raise can fail, you can segfault, you can get kill -9'd. The only robust way to write programs is to expect the power plug to be yanked out of the computer at any moment.

@RX14
Copy link
Member

RX14 commented Apr 6, 2018

I strongly vote close.

@intentionaccident
Copy link

intentionaccident commented Apr 7, 2018

There's always going to be cases where the program terminates and cannot clean up, that doesn't mean we shouldn't create mechanisms for clean up. Additionally those Java and C# programmers might not expect an exit exception, but they won't see one either, since the ExitException or SignalException would have to be explicitly rescued by type or with rescue Exception, which is expected to be bad practice in every language.

As for the threading problem with signals, it is possible to raise from the real signal handler, not the spawned one listening on the loopback pipe. This behaviour can be used to send a signal interrupt exception through to the currently executing Fiber, and with a bit more work you can send that same exception to every currently executing Fiber when they are resumed.

This actually allows a single fibered app or long running fibers to be properly interrupted by an incoming signal instead of having to explicitly yield, in the case that you have registered a signal handler.

The idea would be to create default signal handlers for signals that have default exit behaviour (SIGQUIT, SIGINT). This default handler would send the signal into the loopback pipe and then raise a SignalException for that signal. The currently executing fiber then gets to respond to that signal and if the exception is not rescued the fiber terminates. Eventually the libevent loop fiber is reached and reads the signal on the loopback pipe. It looks at the currently running fibers and schedules an interrupt for each of them. When that fiber is resumed it looks at it's interrupt queue and raises any interrupt from there. This way all fibers get to respond to the interrupt.

In order to not affect current developers relying on trap functionality, trap would have an extra parameter 'raises' that would determine if the real handler raised an exception or not and by default it would not. This would mean that users who wrote code with no traps would get similar behaviour (their program would exit due to exception instead of kernel termination), while users who used traps would get the same behaviour (their program would ignore the interrupt and the kernel would run their handler instead) unless they changed it to also raise by their choice.

There are issues with receiving multiple signals, but it's still better than hard exiting. You can write your program to be robust on start up and shutdown.

@RX14
Copy link
Member

RX14 commented Apr 7, 2018

@strigonLeader kill -9 doesn't give you a handleable signal, it just instantly kills your program without any warning. Implying that kill -9 isn't common or won't happen to your app and that you can actually "clean up" just isn't a good idea. You should write your program to clean up it's mess when it starts back up.

Exit exceptions are just incredibly leaky abstractions which quite rightly haven't been picked up by any other language.

@ysbaddaden
Copy link
Contributor

Exit means that the program must stop, otherwise you'd raise an exception, so ensure blocks are called, and the exception be handled (even if just to exit cleanly).

We can run cleanup code using at_exit blocks. This is a best effort thought, since SIGKILL and some other signals can't be handled: a program must be prepared to leak some temp files or UNIX sockets.

We could tell the scheduler to raise an exception when a fiber tries to stop/resume, but what if there are a hundred thousand fibers? Shall we raise a hundred thousand exceptions every time the program exits?

I don't think the improvement is worth the complexity. I'm not even sure that would be an improvement.

I'm also in favour of closing. I don't think we'll implement this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

9 participants