Performance of the rebis-dev branch: Are we missing something obvious? #1242

triska · 2022-01-20T19:54:45Z

triska
Jan 20, 2022

Dear all,

let's consider a very simple DCG, describing the relation between a list and its reversal:

:- use_module(library(dcgs)).

rev([]) --> [].
rev([R|Rs]) --> rev(Rs), [R].

Sample use:

?- phrase(rev("abcd"), Ls).
   Ls = "dcba".

Now, using the newly available rebis-dev branch, let's perform this goal 100 000 times:

?- use_module(library(time)).
   true.
?- use_module(library(between)).
   true.
?- time((between(1,100_000,_),phrase(rev("abcd"),_),false)).
   % CPU time: 11.045s

The time of the additional between/3 goal is small in comparison, as can be seen from running it in isolation:

?- time((between(1,100_000,_),false)).
   % CPU time: 0.116s

So, rev//1 takes most of the time.

For comparison, let's consider an equivalent query in SWI-Prolog:

?- time((between(1,100_000,_),phrase(rev([a,b,c,d]),_),false)).
% 700,002 inferences, 0.081 CPU in 0.086 seconds (94% CPU, 8645623 Lips)
false.

This means that Scryer Prolog currently incurs a more than 135 fold overhead for this query, even with the rebis-dev branch.

I find this huge gap surprising, and unlikely to be resolvable by comparative micro-optimizations such as better WAM code. We can see the WAM instructions generated for rev//1:

?- use_module(library(diag)).
   true.
?- use_module(library(format)).
   true.
?- use_module(library(lists)).
   true.
?- wam_instructions(rev/3, Is),
   maplist(portray_clause, Is).
switch_on_term(1,external(1),external(2),external(8),fail).
try_me_else(6).
get_constant(level(shallow),[],x(1)).
put_value(x(2),1).
get_variable(x(1),2).
put_value(x(3),2).
execute(=,2).
trust_me(0).
allocate(3).
get_list(level(shallow),x(1)).
unify_variable(y(2)).
unify_variable(x(1)).
get_variable(y(1),3).
put_variable(y(3),3).
call(rev,3).
put_unsafe_value(3,1).
put_list(level(shallow),x(2)).
set_value(y(2)).
set_local_value(y(1)).
deallocate.
execute(=,2).

So, that's 21 instructions, and they look reasonable to me. For comparison, here are the WAM instructions generated by GNU Prolog, one of the fastest Prolog systems, especially for such small benchmarks. They look roughly the same:

    switch_on_term(1,2,fail,4,fail),
label(1),
    try_me_else(3),
label(2),
    get_nil(0),
    get_value(x(2),1),
    proceed,
label(3),
    trust_me_else_fail,
label(4),
    allocate(3),
    get_variable(y(1),2),
    get_list(0),
    unify_variable(y(0)),
    unify_variable(x(0)),
    put_variable(y(2),2),
    call(rev/3),
    put_unsafe_value(y(2),0),
    get_list(0),
    unify_value(y(0)),
    unify_local_value(y(1)),
    deallocate,
    proceed]).

A 100-fold improvement in runtime seems unlikely to stem from any particular optimization that affects any of these instructions in isolation, nor from replacing or avoiding one or even a few of these instructions. A 2- or 3-fold speedup, maybe, but not a 100-fold one.

I would greatly appreciate it if someone who is interested in this topic could have a look at this specific sample program and its execution with the rebis-dev branch, and try to find out what is going on here. Is there anything that can be done to speed up the execution by such a huge margin, aside from the planned improvements that are already mentioned in the README?

Thank you and all the best!
Markus


triska · 2022-01-20T20:25:22Z

triska
Jan 20, 2022
Author

Addendum: The query is much faster in Scryer Prolog if we use the compiled DCG translation directly (instead of phrase/2), so the huge difference in this case may be connected to goal expansion:

?- time((between(1,100_000,_),rev("abcd",Ls0,[]),false)).
   % CPU time: 0.657s

Still, that's a roughly 8-fold overhead, and I think well worth looking into if someone is interested in helping to improve the performance of Scryer Prolog!
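For readers unfamiliar with the translation, here is a minimal sketch of roughly what rev//1 compiles to. It is inferred from the execute(=,2) instructions in the WAM listing above, so the exact clause shapes are an assumption. The name rev_/3 is hypothetical, chosen so that it does not clash with the rev/3 generated from the DCG.

```prolog
% Hand-written approximation of the DCG translation of rev//1.
% Each nonterminal gets two extra arguments: the list before (S0)
% and the remainder after (S) the described sequence.

% rev([]) --> [].
rev_([], S0, S) :-
    S0 = S.

% rev([R|Rs]) --> rev(Rs), [R].
rev_([R|Rs], S0, S) :-
    rev_(Rs, S0, S1),   % first describe the reversal of the tail
    S1 = [R|S].         % then the terminal [R]
```

Calling rev_([a,b,c,d], Ls, []) binds Ls to [d,c,b,a], which corresponds to the direct call rev("abcd",Ls0,[]) timed above.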

0 replies

josd · 2022-01-20T21:06:52Z

josd
Jan 20, 2022

Well, swipl also runs faster without phrase/2.
Running the same goal 1 million times gives

$ scryer-prolog
?- [user].
:- use_module(library(dcgs)).

rev([]) --> [].
rev([R|Rs]) --> rev(Rs), [R].

?- use_module(library(time)).
   true.
?- use_module(library(between)).
   true.
?- time((between(1,1_000_000,_),rev("abcd",Ls0,[]),false)).
   % CPU time: 1.479s
false.

versus

$ swipl -q
?- [user].
|: rev([]) --> [].
|: rev([R|Rs]) --> rev(Rs), [R].
|: ^Dtrue.

?- time((between(1,1_000_000,_),rev("abcd",Ls0,[]),false)).
% 2,000,003 inferences, 0.045 CPU in 0.045 seconds (100% CPU, 44379091 Lips)
false.

which is about a factor of 33 faster than Scryer.

1 reply

josd Jan 20, 2022

The swipl vm code is

?- vm_list(rev).
========================================================================
rev/3
========================================================================
       0 s_virgin
       1 i_exit
----------------------------------------
clause 1 (<clause>(0x558514157960)):
----------------------------------------
       0 h_nil
       1 i_enter
       2 b_unify_vv(1,2)
       5 i_exit
----------------------------------------
clause 2 (<clause>(0x55851437b440)):
----------------------------------------
       0 h_list_ff(3,4)
       3 i_enter
       4 b_var(4)
       6 b_var1
       7 b_firstvar(5)
       9 i_call(user:rev/3)
      11 b_unify_var(5)
      13 h_list
      14 h_var(3)
      16 h_var(2)
      18 h_pop
      19 b_unify_exit
      20 i_exit
true.

josd · 2022-01-20T22:47:48Z

josd
Jan 20, 2022

The number of events measured with valgrind is 17291342699 versus 745806340, a factor of about 23:

$ cat test.pl
:- use_module(library(dcgs)).
:- use_module(library(between)).

rev([]) --> [].
rev([R|Rs]) --> rev(Rs), [R].

run :-
    between(1,1_000_000,_),
    rev("abcd",_,[]),
    false.
run.

$ valgrind --tool=callgrind scryer-prolog -g run,halt test.pl
==32586== Callgrind, a call-graph generating cache profiler
==32586== Copyright (C) 2002-2017, and GNU GPL'd, by Josef Weidendorfer et al.
==32586== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==32586== Command: scryer-prolog -g run,halt test.pl
==32586==
==32586== For interactive control, run 'callgrind_control -h'.
==32586==
==32586== Events    : Ir
==32586== Collected : 17291342699
==32586==
==32586== I   refs:      17,291,342,699

$ callgrind_annotate --show-percs=yes --auto=no callgrind.out.32586
--------------------------------------------------------------------------------
Profile data file 'callgrind.out.32586' (creator: callgrind-3.15.0)
--------------------------------------------------------------------------------
I1 cache:
D1 cache:
LL cache:
Timerange: Basic block 0 - 3394805606
Trigger: Program termination
Profiled target:  scryer-prolog -g run,halt test.pl (PID 32586, part 1)
Events recorded:  Ir
Events shown:     Ir
Event sort order: Ir
Thresholds:       99
Include dirs:
User annotated:
Auto-annotation:  off

--------------------------------------------------------------------------------
Ir
--------------------------------------------------------------------------------
17,291,342,699 (100.0%)  PROGRAM TOTALS

--------------------------------------------------------------------------------
Ir                      file:function
--------------------------------------------------------------------------------
2,290,602,250 (13.25%)  /home/jdroo/github.com/mthom/scryer-prolog/src/machine/dispatch.rs:scryer_prolog::machine::Machine::run_module_predicate
1,338,025,971 ( 7.74%)  /rustc/02072b482a8b5357f7fb5e5637444ae30e423c40/library/core/src/slice/index.rs:scryer_prolog::machine::Machine::run_module_predicate
  728,377,277 ( 4.21%)  /home/jdroo/github.com/mthom/scryer-prolog/src/types.rs:scryer_prolog::machine::Machine::run_module_predicate
  612,738,277 ( 3.54%)  /rustc/02072b482a8b5357f7fb5e5637444ae30e423c40/library/core/src/iter/range.rs:scryer_prolog::machine::Machine::run_module_predicate
  524,310,613 ( 3.03%)  /build/glibc-eX1tMB/glibc-2.31/malloc/malloc.c:_int_free [/usr/lib/x86_64-linux-gnu/libc-2.31.so]
  491,418,445 ( 2.84%)  /home/jdroo/github.com/mthom/scryer-prolog/src/machine/machine_state.rs:scryer_prolog::machine::Machine::run_module_predicate
  483,652,943 ( 2.80%)  /home/jdroo/github.com/mthom/scryer-prolog/src/machine/machine_state_impl.rs:scryer_prolog::machine::machine_state_impl::<impl scryer_prolog::machine::machine_state::MachineState>::unify [/home/jdroo/github.com/mthom/scryer-prolog/target/release/scryer-prolog]
  431,200,506 ( 2.49%)  /home/jdroo/github.com/mthom/scryer-prolog/src/machine/machine_state_impl.rs:scryer_prolog::machine::machine_state_impl::<impl scryer_prolog::machine::machine_state::MachineState>::bind [/home/jdroo/github.com/mthom/scryer-prolog/target/release/scryer-prolog]
  382,156,270 ( 2.21%)  /home/jdroo/github.com/mthom/scryer-prolog/src/types.rs:scryer_prolog::machine::machine_state_impl::<impl scryer_prolog::machine::machine_state::MachineState>::unify
  370,717,617 ( 2.14%)  /home/jdroo/github.com/mthom/scryer-prolog/src/machine/machine_state_impl.rs:scryer_prolog::machine::Machine::run_module_predicate
  355,767,198 ( 2.06%)  /build/glibc-eX1tMB/glibc-2.31/malloc/malloc.c:malloc [/usr/lib/x86_64-linux-gnu/libc-2.31.so]
  338,224,560 ( 1.96%)  /rustc/02072b482a8b5357f7fb5e5637444ae30e423c40/library/core/src/hash/sip.rs:<std::collections::hash::map::DefaultHasher as core::hash::Hasher>::write
  333,029,144 ( 1.93%)  /home/jdroo/github.com/mthom/scryer-prolog/src/types.rs:scryer_prolog::machine::machine_state_impl::<impl scryer_prolog::machine::machine_state::MachineState>::bind
  308,291,548 ( 1.78%)  /home/jdroo/github.com/mthom/scryer-prolog/src/machine/machine_state_impl.rs:_ZN13scryer_prolog7machine18machine_state_impl69_$LT$impl$u20$scryer_prolog..machine..machine_state..MachineState$GT$5store17h14e364b13ccb4ff1E.llvm.12768906043438295677 [/home/jdroo/github.com/mthom/scryer-prolog/target/release/scryer-prolog]
  307,885,649 ( 1.78%)  /rustc/02072b482a8b5357f7fb5e5637444ae30e423c40/library/alloc/src/vec/mod.rs:scryer_prolog::machine::Machine::run_module_predicate
  304,656,048 ( 1.76%)  /home/jdroo/.cargo/registry/src/garden.eu.org-1ecc6299db9ec823/hashbrown-0.11.2/src/raw/mod.rs:hashbrown::raw::inner::RawTable<T,A>::reserve_rehash [/home/jdroo/github.com/mthom/scryer-prolog/target/release/scryer-prolog]
  272,532,967 ( 1.58%)  /home/jdroo/github.com/mthom/scryer-prolog/src/machine/machine_state_impl.rs:scryer_prolog::machine::machine_state_impl::<impl scryer_prolog::machine::machine_state::MachineState>::trail [/home/jdroo/github.com/mthom/scryer-prolog/target/release/scryer-prolog]
  267,173,532 ( 1.55%)  /home/jdroo/github.com/mthom/scryer-prolog/src/machine/machine_state_impl.rs:scryer_prolog::machine::machine_state_impl::<impl scryer_prolog::machine::machine_state::MachineState>::read_s [/home/jdroo/github.com/mthom/scryer-prolog/target/release/scryer-prolog]
  212,993,743 ( 1.23%)  /home/jdroo/.cargo/registry/src/garden.eu.org-1ecc6299db9ec823/hashbrown-0.11.2/src/raw/mod.rs:hashbrown::raw::inner::RawTable<T,A>::insert [/home/jdroo/github.com/mthom/scryer-prolog/target/release/scryer-prolog]
  191,282,202 ( 1.11%)  /home/jdroo/github.com/mthom/scryer-prolog/src/machine/mod.rs:scryer_prolog::machine::Machine::run_module_predicate [/home/jdroo/github.com/mthom/scryer-prolog/target/release/scryer-prolog]
  177,539,817 ( 1.03%)  /build/glibc-eX1tMB/glibc-2.31/malloc/malloc.c:free [/usr/lib/x86_64-linux-gnu/libc-2.31.so]

versus

$ valgrind --tool=callgrind swipl -g run,halt test.pl
==32590== Callgrind, a call-graph generating cache profiler
==32590== Copyright (C) 2002-2017, and GNU GPL'd, by Josef Weidendorfer et al.
==32590== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==32590== Command: swipl -g run,halt test.pl
==32590==
==32590== For interactive control, run 'callgrind_control -h'.
ERROR: /tmp/test.pl:1:
ERROR:    source_sink `library(dcgs)' does not exist
Warning: /tmp/test.pl:1:
Warning:    Goal (directive) failed: user:use_module(library(dcgs))
ERROR: /tmp/test.pl:2:
ERROR:    source_sink `library(between)' does not exist
Warning: /tmp/test.pl:2:
Warning:    Goal (directive) failed: user:use_module(library(between))
==32590==
==32590== Events    : Ir
==32590== Collected : 745806340
==32590==
==32590== I   refs:      745,806,340

$ callgrind_annotate --show-percs=yes --auto=no callgrind.out.32590
--------------------------------------------------------------------------------
Profile data file 'callgrind.out.32590' (creator: callgrind-3.15.0)
--------------------------------------------------------------------------------
I1 cache:
D1 cache:
LL cache:
Timerange: Basic block 0 - 121034866
Trigger: Program termination
Profiled target:  swipl -g run,halt test.pl (PID 32590, part 1)
Events recorded:  Ir
Events shown:     Ir
Event sort order: Ir
Thresholds:       99
Include dirs:
User annotated:
Auto-annotation:  off

--------------------------------------------------------------------------------
Ir
--------------------------------------------------------------------------------
745,806,340 (100.0%)  PROGRAM TOTALS

--------------------------------------------------------------------------------
Ir                    file:function
--------------------------------------------------------------------------------
330,044,799 (44.25%)  /home/jdroo/github.com/SWI-Prolog/swipl-devel/build/../src/pl-vmi.c:PL_next_solution___LD
 51,000,061 ( 6.84%)  /home/jdroo/github.com/SWI-Prolog/swipl-devel/build/../src/pl-arith.c:pl_between3_va [/usr/local/lib/swipl/lib/x86_64-linux/libswipl.so.8.5.5]
 45,228,955 ( 6.06%)  /home/jdroo/github.com/SWI-Prolog/swipl-devel/build/../src/pl-wam.c:PL_next_solution___LD [/usr/local/lib/swipl/lib/x86_64-linux/libswipl.so.8.5.5]
 43,006,312 ( 5.77%)  /home/jdroo/github.com/SWI-Prolog/swipl-devel/build/../src/pl-gmp.c:put_number___LD [/usr/local/lib/swipl/lib/x86_64-linux/libswipl.so.8.5.5]
 41,001,230 ( 5.50%)  /home/jdroo/github.com/SWI-Prolog/swipl-devel/build/../src/pl-alloc-inline.h:VM_globalIndirectFromCode___LD [/usr/local/lib/swipl/lib/x86_64-linux/libswipl.so.8.5.5]
 39,000,621 ( 5.23%)  /home/jdroo/github.com/SWI-Prolog/swipl-devel/build/../src/pl-gmp.c:PL_unify_number___LD [/usr/local/lib/swipl/lib/x86_64-linux/libswipl.so.8.5.5]
 27,205,657 ( 3.65%)  /home/jdroo/github.com/SWI-Prolog/swipl-devel/build/../src/pl-inline.h:PL_next_solution___LD
 26,999,973 ( 3.62%)  /home/jdroo/github.com/SWI-Prolog/swipl-devel/build/../src/pl-arith.c:ar_add_ui [/usr/local/lib/swipl/lib/x86_64-linux/libswipl.so.8.5.5]
 24,005,252 ( 3.22%)  /home/jdroo/github.com/SWI-Prolog/swipl-devel/build/../src/pl-gmp.c:cmpNumbers [/usr/local/lib/swipl/lib/x86_64-linux/libswipl.so.8.5.5]
 18,002,088 ( 2.41%)  /home/jdroo/github.com/SWI-Prolog/swipl-devel/build/../src/pl-alloc.c:allocGlobal___LD [/usr/local/lib/swipl/lib/x86_64-linux/libswipl.so.8.5.5]
 11,673,251 ( 1.57%)  /home/jdroo/github.com/SWI-Prolog/swipl-devel/build/../src/pl-vmi.c:PL_next_solution___LD'2
 10,000,074 ( 1.34%)  /home/jdroo/github.com/SWI-Prolog/swipl-devel/build/../src/pl-attvar.h:PL_unify_number___LD

1 reply

triska Jan 20, 2022
Author

Please make sure to compare the systems in the right way: To make SWI-Prolog treat "abcd" as a list of characters as in Scryer Prolog, you must set the Prolog flag double_quotes to chars also in SWI-Prolog. This is why I used [a,b,c,d] in the sample query I posted with SWI-Prolog.

josd · 2022-01-21T00:13:30Z

josd
Jan 21, 2022

You are right and I completely missed that :-(
With

$ cat test_swi.pl
:- set_prolog_flag(double_quotes,chars).

rev([]) --> [].
rev([R|Rs]) --> rev(Rs), [R].

run :-
    between(1,1_000_000,_),
    rev("abcd",_,[]),
    false.
run.

we now get

$ swipl -q test_swi.pl
?- time((between(1,1_000_000,_),rev("abcd",Ls0,[]),false)).
% 7,000,003 inferences, 0.202 CPU in 0.202 seconds (100% CPU, 34708525 Lips)
false.

which is indeed about your factor 8.

0 replies

mthom · 2022-01-21T03:21:24Z

mthom
Jan 21, 2022
Maintainer

I don't think this comparison is totally fair to Scryer.

SWI's implementation of between/3 is written in hand-rolled C while Scryer's is written in Prolog.

If we benchmark our implementation of between/3 on SWI (that is, Scryer's but with the initial error checking removed):

:- use_module(library(dcgs)).

:- set_prolog_flag(double_quotes, chars).

rev([]) --> [].
rev([R|Rs]) --> rev(Rs), [R].

p_between(Lower, Upper, X) :-
    (   nonvar(X) ->
        Lower =< X,
        X =< Upper
    ;   Lower =< Upper,
        between_(Lower, Upper, X)
    ).

between_(Lower, Upper, Lower1) :-
   Lower < Upper,
   !,
   (  Lower1 = Lower
   ;  Lower0 is Lower + 1,
      between_(Lower0, Upper, Lower1)
   ).
between_(Lower, Lower, Lower).

on SWI we get:

2 ?- time((p_between(1,1_000_000,_),rev("abcd",Ls0,[]),false)).
% 9,000,004 inferences, 0.351 CPU in 0.352 seconds (100% CPU, 25611329 Lips)
false.

and on Scryer:

?- time((between(1,1_000_000,_),rev("abcd",Ls0,[]),false)).
   % CPU time: 1.749s

Scryer is 5x slower on my machine. I think that's a reasonable range for now. There are many opportunities for improving the processing of strings in various WAM instructions. Several improvements are already being planned.

0 replies

mthom · 2022-01-21T15:33:09Z

mthom
Jan 21, 2022
Maintainer

To answer the original question, the old phrase_/3 interpreter was slow because it invoked expand_goal eagerly and often. (,)/2 and family had the same problem until recently. I realized a long time ago that it could be replaced by dcg_body but I punted the task until now.

0 replies

triska · 2022-01-21T16:47:30Z

triska
Jan 21, 2022
Author

Thank you a lot, this change has led to a very impressive speedup of the original query that caused me to start this discussion in the first place:

?- time((between(1,100_000,_),phrase(rev("abcd"),_),false)).
   % CPU time: 0.656s

That's a more than 16-fold speedup, and hence the speed difference between the two systems is now well within an order of magnitude on all (equivalent) queries we have discussed here. That's indeed a great achievement, especially for this early stage of development that Scryer Prolog is now in!

4 replies

UWN Jan 21, 2022

What is the overhead of call(rev("abcd"),_,[]) vs rev("abcd",_,[])? One option is indeed to use the same approach as YAP did. That is, implementing phrase(NT__0, Xs0,Xs) :- call(NT__0, Xs0,Xs). with the obvious auxiliary predicates like []/2, !/2, (',')/4 &c.

mthom Jan 21, 2022
Maintainer

call/1 calls expand_goal/3 on its argument but the plain version does not.

UWN Jan 21, 2022

What is expand_goal/3 good for in this context? The last time I compared systems overheads were quite dramatic (because of similar issues). Maybe this would be a good moment to reconsider this.

mthom Jan 21, 2022
Maintainer

Dynamic dispatch is one (probably unlikely and infrequent) application. For instance, if a module's goal_expansion is declared dynamic, expand_goal/3 is meant to use clauses asserted to goal_expansion between calls in the same query.
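UWN's suggestion above, defining phrase/3 through call/3, can be sketched as follows. This is only an illustration: phrase_via_call/3, seq/4 (standing in for (',')/4) and rev_/3 (a hand-written stand-in for the compiled rev//1) are hypothetical names, not the actual definitions in Scryer Prolog or YAP.

```prolog
% Hypothetical sketch: phrase/3 as a thin wrapper around call/3.
phrase_via_call(NT, Xs0, Xs) :-
    call(NT, Xs0, Xs).

% Stand-in for the auxiliary predicate (',')/4: describe A, then B,
% threading the intermediate list S1 between them.
seq(A, B, S0, S) :-
    call(A, S0, S1),
    call(B, S1, S).

% Hand-written equivalent of the compiled rev//1, for demonstration only.
rev_([], S, S).
rev_([R|Rs], S0, S) :-
    rev_(Rs, S0, S1),
    S1 = [R|S].
```

With these definitions, phrase_via_call(seq(rev_([a,b]), rev_([c,d])), Ls, []) binds Ls to [b,a,d,c]. Whether this removes the overhead in a given system still depends on what call/N itself does, as mthom's reply about expand_goal/3 indicates.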
