Feature Request: Second Derivatives for User Defined Functions #1198
I'd be curious if you could point to one or a couple of examples in ML/statistics where there's a significant benefit from using methods that require second-order derivatives over the first-order methods that JuMP already supports with user-defined functions. This would help justify the implementation effort. (But either way, I don't expect to spend time on this in the next few months given everything else going on with JuMP development.)
Thanks for your prompt response, and thanks for creating this amazing package. Virtually all forms of machine learning and statistics involve choosing an optimal parameter theta such that an in-sample or out-of-sample loss function L(y,f(x,theta)) is minimized, where x is data being used to model or predict y. The predictive/modeling function f(x,theta) can be very complex to articulate because it is often designed either to be very flexible or to reflect the rules of a true data-generating process. One example is neural nets, which use an f(x,theta) that is the iterated composition of many linear and nonlinear functions as a flexible predictive function. In biostatistics, the data-generating process f(x,theta) often requires computing the evolution of a complex system. In economics, f(x,theta) can involve computing equilibrium strategies for the firms or individuals that generated the data (this may even require solving for a fixed point of a multi-function or nested optimization problems). I'm also happy to provide specific references to examples in some of these fields.
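For concreteness, a minimal sketch of that structure in Julia (the names and the particular functional form are illustrative, not from any specific application): `predict` plays the role of f(x, theta) and `loss` the role of L(y, f(x, theta)).

```julia
# Illustrative only: a composed, "neural-net-like" f(x, θ) and a squared-error loss.
predict(x, θ) = θ[3] .* tanh.(θ[1] .* x .+ θ[2])   # f(x, θ)
loss(θ, x, y) = sum(abs2, y .- predict(x, θ))      # L(y, f(x, θ))

# The estimation problem is min over θ of loss(θ, x, y); the request in this
# issue is for JuMP to supply exact second derivatives of such a user-defined loss.
```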
As for why one needs second-order methods: in some cases, L(y,f(x,theta)) may have low dimension (in theta) but take an extremely long time to compute (potentially on the order of minutes, hours, or even longer). In such cases, second-order methods can help reduce the number of function evaluations required. Additionally, it is not uncommon for L(y,f(x,theta)) to have substantial cross-partials in theta, meaning that exact second-order methods will do much better at finding the true argmin than Hessian approximations.
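For reference, the reasoning above is about the Newton step, written here in generic notation:

```latex
% Newton step on L(\theta) := L(y, f(x, \theta)), using the exact Hessian:
\theta_{k+1} = \theta_k - \left[\nabla^2_{\theta} L(\theta_k)\right]^{-1} \nabla_{\theta} L(\theta_k)
% Quasi-Newton methods (BFGS, L-BFGS) replace \nabla^2_{\theta} L(\theta_k) with an
% approximation B_k built from gradient differences; when L has large cross-partials
% in \theta, B_k can model the curvature poorly and convergence slows.
```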
Just want to second @UserQuestions here - there are a number of cases in economics (at the least) that require two-step optimization, where the inner ("first") step requires solving contraction mappings, etc., and may not be feasibly described in JuMP syntax - but the outer optimization would see serious improvements from being able to use second derivatives.
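A hedged sketch of that pattern (all names and the particular map are illustrative): each evaluation of the outer objective solves an inner fixed point v = T(v; θ) by iteration, so the objective has no closed-form JuMP expression.

```julia
# Inner ("first") step: iterate a contraction T(v; θ) = 0.5*cos(v) + θ to its fixed point.
function solve_fixed_point(θ; tol = 1e-10, maxiter = 10_000)
    v = 0.0
    for _ in 1:maxiter
        v_new = 0.5 * cos(v) + θ              # |∂T/∂v| ≤ 0.5, so this converges
        abs(v_new - v) < tol && return v_new
        v = v_new
    end
    return v
end

# Outer step: the statistical objective depends on θ only through the fixed point.
outer_loss(θ, target) = (solve_fixed_point(θ) - target)^2
```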
More than two years later, I want to third @UserQuestions. It would be a game changer if JuMP could do optimization with Hessians, and especially if the Hessian were a sparse array. CasADi, a second-order optimizer with autodiff based on Ipopt, is the only thing that keeps me in Python...
Here's an example from Discourse where Ipopt failed to converge without the second-order information on the problem as formulated by the user: https://discourse.julialang.org/t/nonlinear-objective-function-splatting/51251 (However, it could be reformulated and solved with first-order information only.) |
For all of you curious about this issue: one can pass the function, gradient, and Hessian directly to Ipopt.jl, without using JuMP and MathOptInterface. The documentation on how to do it is somewhat hidden, but it can be found. In my tests, one could use AD tools in the definition of the functions (such as ForwardDiff.jl or Zygote.jl) or use ModelingToolkit.jl to compile the gradient and Hessian. I also think that ComponentArrays.jl could be used, which would make the definition of the functions easier, but I have not tested it. I personally fail to understand why developers continue pushing MathOptInterface for nonlinear problems. It is true that it does a great job for convex problems, but for nonlinear problems it falls so far behind what one needs (not being able to use vectors, not being able to pass the Hessian) that I would classify it as experimental at this point.
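For readers who want to try this route, here is a minimal unconstrained sketch, assuming the C-wrapper API in recent Ipopt.jl (`CreateIpoptProblem` / `IpoptSolve`; older releases spelled these `createProblem` / `solveProblem`), with ForwardDiff.jl supplying the gradient and the exact Hessian. The callback signatures below follow the Ipopt.jl README; check the README for the version you have installed.

```julia
using Ipopt, ForwardDiff

f(x) = (x[1] - 1)^2 + 100 * (x[2] - x[1]^2)^2   # toy Rosenbrock-style objective

n = 2                                   # number of variables
m = 0                                   # no constraints
x_L = fill(-10.0, n); x_U = fill(10.0, n)
g_L = Float64[];      g_U = Float64[]

eval_f(x) = f(x)
eval_g(x, g) = nothing                  # no constraints to fill in
eval_grad_f(x, grad) = copyto!(grad, ForwardDiff.gradient(f, x))
eval_jac_g(x, rows, cols, values) = nothing   # empty constraint Jacobian

# Dense lower triangle of the Hessian of the Lagrangian (here just obj_factor * ∇²f).
function eval_h(x, rows, cols, obj_factor, lambda, values)
    if values === nothing               # first call: report the sparsity structure
        k = 1
        for i in 1:n, j in 1:i
            rows[k] = i; cols[k] = j; k += 1
        end
    else                                # later calls: report the values
        H = ForwardDiff.hessian(f, x)
        k = 1
        for i in 1:n, j in 1:i
            values[k] = obj_factor * H[i, j]; k += 1
        end
    end
    return
end

prob = Ipopt.CreateIpoptProblem(
    n, x_L, x_U, m, g_L, g_U, 0, div(n * (n + 1), 2),
    eval_f, eval_g, eval_grad_f, eval_jac_g, eval_h,
)
prob.x = [0.0, 0.0]                     # starting point
status = Ipopt.IpoptSolve(prob)
```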
We're aware of the current NLP limitations of MOI: jump-dev/MathOptInterface.jl#846 We encourage people to use JuMP because many users already know the JuMP syntax, but they won't know about the specifics of computing gradients and Hessians. If you have specific needs that JuMP isn't meeting, it probably isn't the right tool for the job, and you should consider other options.
I think there's plenty of agreement on what the areas for improvement are for JuMP and MOI with regard to nonlinear optimization.
I'm not sure what you mean by "pushing". We all want the state of the art to improve and are more than happy to point people to the best tools for the job. For example, I've been advocating for a CasADi interface in Julia since 2014: casadi/casadi#1105. |
It would be extremely helpful for JuMP to support second derivatives for user-defined functions. Ideally this could be done as efficiently as ReverseDiffSparse, but even just calling ForwardDiff.hessian! would be a helpful option. There is a broad class of problems that require optimizing functions that do not easily translate into the typical JuMP syntax (especially in ML/statistics), so having JuMP handle such cases would be a huge benefit to people working on those problems and would greatly expand the number of potential JuMP users.
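To make the request concrete, a hedged sketch assuming a JuMP version with the legacy nonlinear interface (`register` / `@NLobjective`); the toy objective and names are illustrative. Today the `autodiff = true` path produces gradients only, so Ipopt falls back to a quasi-Newton Hessian approximation; the last two lines show that `ForwardDiff.hessian!` can already compute the exact Hessian that this feature request would like JuMP to forward to the solver.

```julia
using JuMP, Ipopt, ForwardDiff

# A user-defined objective that is awkward to write as an algebraic JuMP expression.
loss(θ...) = (θ[1] - 1.0)^2 + (θ[2] - 2.0)^2 + sin(θ[1] * θ[2])

model = Model(Ipopt.Optimizer)
@variable(model, θ[1:2])

# Status quo: registering with autodiff = true gives first derivatives only.
register(model, :loss, 2, loss; autodiff = true)
@NLobjective(model, Min, loss(θ[1], θ[2]))
optimize!(model)

# What the request would forward to the solver: an exact (here dense) Hessian.
H = zeros(2, 2)
ForwardDiff.hessian!(H, v -> loss(v...), value.(θ))
```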