Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Maybe wrong pre_hooks order when compiling katt #2682

Closed
tothlac opened this issue Feb 14, 2022 · 7 comments
Closed

Maybe wrong pre_hooks order when compiling katt #2682

tothlac opened this issue Feb 14, 2022 · 7 comments

Comments

@tothlac
Copy link
Contributor

tothlac commented Feb 14, 2022

I've created this repository: https://github.com/tothlac/try_katt
Using rebar3 3.18.0 I got this error:

rebar3 eunit
===> Performing EUnit tests...
F
Failures:

  1) try_katt_test:katt_test/0: module 'try_katt_test'
     Failure/Error: {error,undef,
                        [{katt_bluprint,module_info,[],[]},
                         {try_katt_test,katt_test,0,
                             [{file,
                                  "/Users/tothlac/otp/try_katt/test/try_katt_test.erl"},
                              {line,6}]},
                         {try_katt_test,katt_test,0,[]}]}
     Output:

Finished in 0.084 seconds
1 tests, 1 failures
===> Error running tests

The reason is that priv/compile-parser can't be called from pre_hooks anymore (in an older version of katt it was working with pre_hooks, and probably because of the same problem they had to change it in this commit: for-GET/katt@2bc3255)

For debugging I've modified katt's code a little bit so now priv/compie-parser is called twice (from pre and post hooks), and I'm printing out when the script is called, like this:

main(_) ->
    io:format("Compile katt_blueprint~n", []),
    DepsDir = os:getenv("REBAR_DEPS_DIR"),
    Neotoma =  DepsDir ++ "/neotoma/ebin",
    code:add_pathz(filename:absname(Neotoma)),
    try
        neotoma:file("priv/katt_blueprint.peg", [{output, "src/"}])
    catch _:Reason ->
        io:format("Could not compile katt_bluepring.peg. Reason: ~p~n", [Reason])
    end.

I see the following:

Compile katt_blueprint
Could not compile katt_bluepring.peg. Reason: undef
===> Analyzing applications...
===> Compiling mimerl
===> Compiling tdiff
===> Compiling neotoma
===> Compiling jsx
===> Compiling unicode_util_compat
===> Compiling ssl_verify_fun
===> Compiling parse_trans
===> Compiling metrics
===> Compiling idna
===> Compiling certifi
===> Compiling hackney
===> Compiling katt
Compile katt_blueprint

As you can see the pre_compile hook of katt is not compiled right before compiling katt, but it happens before compiling all dependencies. Because of this when the script is called by pre_hooks neotoma is not compiled, that's why we have:

Compile katt_blueprint
Could not compile katt_bluepring.peg. Reason: under

When the script is called from the post_hooks katt_blueprint.erl is successfully generated, but as compile had been already called katt_bluprint.beam won't be generated from the generated katt_blueprint.erl.

tothlac:try_katt find . -name katt_blueprint.erl
./_build/test/lib/katt/src/katt_blueprint.erl
tothlac:try_katt find . -name katt_blueprint.beam
tothlac:try_katt

Do you have any ideas how this issue could be fixed?

I've also created a katt ticket for this : for-GET/katt#84

@tothlac
Copy link
Contributor Author

tothlac commented Feb 14, 2022

If you try it with an older version of rebar3 the compile order is a lot different:

===> Compiling mimerl
===> Compiling tdiff
===> Compiling neotoma
===> Compiling jsx
===> Compiling unicode_util_compat
===> Compiling ssl_verify_fun
===> Compiling parse_trans
===> Compiling metrics
===> Compiling idna
===> Compiling certifi
===> Compiling hackney
===> Compiling katt
Compile katt_blueprint
Compile katt_blueprint
===> Compiling try_katt

As you can see with this version all dependencies of katt are compiled first, and only after that pre_hooks is called, then it compiles katt, and finally calls post_hooks.

With v3.18.0 it first calls pre_hooks, then compiles the dependencies, compiles katt, calls post_hooks. I'm not sure if this is the right order or the one implemented in 3.13.3.

@ferd
Copy link
Collaborator

ferd commented Feb 14, 2022

Yeah, this is because we massively reworked how the compiler makes things work. I think this might be related to the same sort of refactors that had happened in #2563 (comment) for some other code base you have.

I'm guessing that the undefined module here is neotoma, because all the hooks run before all the modules get compiled under the new model, so you can't get in the older mechanism's mode where each dep had its full beginning-to-end run completed as one.

There is likely no good fix for this, but you may want to consider using something like a neotoma plugin (https://github.com/tsloughter/rebar3_neotoma_plugin) which would ensure neotoma is built and available as part of the compilation cycle. There's a possibility this is also broken, but that would surprise me because we'd have broken tons of plugins by doing that if it were so.

@tothlac
Copy link
Contributor Author

tothlac commented Feb 15, 2022

It looks like it can be a workaround. Now neotoma is fetched twice though. One instance is in the plugins directory, the other one is a dependency. It would be also good to support optional parameters, because katt_blueprint.peg is stored under the priv folder in katt:

{rebar3_neotoma_plugin, [{src_dir, "priv"},
                                              {target_dir, "src"}]}.

, but anyway thanks, it looks like it will work, I'm just not entirely sure if the current compile order is good in rebar3. Now the pre_hooks of myapp is called before the dependencies of myapp are compiled, which is probably not right.

@ferd
Copy link
Collaborator

ferd commented Feb 15, 2022

It is right, we purposefully made that change. This was necessary to make builds correctly tracking all dependencies in a DAG and then breaking down the phases as in:

compile(State, Providers, Apps, Tag) ->
?DEBUG("Compile (~p)", [if Tag =:= undefined -> untagged; true -> Tag end]),
Apps1 = [prepare_compile(State, Providers, App) || App <- Apps],
Apps2 = [prepare_compilers(State, Providers, App) || App <- Apps1],
Apps3 = run_compilers(State, Providers, Apps2, Tag),
Apps4 = [finalize_compilers(State, Providers, App) || App <- Apps3],
Apps5 = [prepare_app_file(State, Providers, App) || App <- Apps4],
Apps6 = compile_app_files(State, Providers, Apps5),
Apps7 = [finalize_app_file(State, Providers, App) || App <- Apps6],
[finalize_compile(State, Providers, App) || App <- Apps7].

The thing though is that to order compilation phases properly, we must analyze the code. But to analyze the code, the source must be in place, which may be done by the pre-hooks. Therefore, to analyze the code, the pre-hooks must have all been run. So all of the pre-hooks now run across all apps, then we do an analysis phase on the source files, then run all the .yrl compilers, then all the .erl compilers, then do all the .app file building, etc. and end up running final post-hooks.

You can compare this with the older version which handled everything on a per-app basis: https://github.com/erlang/rebar3/blob/v3.13/src/rebar_prv_compile.erl#L158-L188

So we took that bit of undefined behaviour ("what people would do with transient build artifacts we know nothing about") and used it to do better graph analysis, which in large repos yielded speedups from 5x to 20x in build performance (see #2282), opened the door to parallelizing build runs further (which hadn't yielded significant enough improvements at the time to keep in but would be nice to revisit), and fixed all sorts of correctness bugs where app build order would conflict.

The plans for this were laid out back in January 2020 (#2200 (comment)). You can see the specific concern in this comment: #2200 (comment)

Essentially all the pre-hooks (aside from app_compile) are open before any other project app compilation takes place. I dubbed this the "prepare" part of compiling, where things are put in place for the actual compilation (including all hooks run) in what should be the same order as today (reverse alphabetical due to the topsort):

  1. pre-compile hook for app2
  2. pre-erlc_compile hook for app1

As long as the pre-hooks of a given project app don't depend on the post-hooks of another app, we should be fine, and that shouldn't happen much given paths we expose.

It appears your projects fell literally into that case I expected few people to rely on, and it took a couple of years for it to show up. The hook separation phase was literally step 1 of the whole refactor to allow per-app analysis and tracking (again, we can't track what modules exist in the code unless we've run the pre-hooks). We essentially had to choose whether to keep the pre-hooks working before any analysis in order to maintain functionality for projects where the hooks create source files (which made builds faster and more correct), or to choose to keep the hooks which relied on the ordering of apps having run their other hooks to work fine, which we did not know any app was relying on.

Unfortunately, your specific use cases relies on both at once (the hook lays out source files while relying on the run of other apps) and there is literally no way to support that without breaking something else or having changed nothing of the old (buggy) compiler behaviour.

@tothlac
Copy link
Contributor Author

tothlac commented Feb 15, 2022

It's ok, I can understand the reason why the compilation order has changed here, and fortunately, the rebar3_neotoma_plugin has completely solved the issue. Maybe would not it be a good thing to add something like {pre_compile_hooks, .....} option to rebar3 which is executed right before compilation after the dependencies has been built, like in older rebar versions? Maybe someone would like to use it like that (I think katt is used by a lot of people), or they should solve similar issues with plugins?

@ferd
Copy link
Collaborator

ferd commented Feb 15, 2022

These exist currently, but only in umbrella projects and when defined from the top-level: https://www.rebar3.org/docs/configuration/configuration/#hookable-points-in-providers

image

@tothlac
Copy link
Contributor Author

tothlac commented Feb 16, 2022

Ok, as I see using the rebar3_neotoma_plugin will fix the problem. We've even tried it on our internal repositories and fortunately it works. We can close the ticket now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants