-
Dear all, let's consider a very simple DCG, describing the relation between a list and its reversal: :- use_module(library(dcgs)). rev([]) --> []. rev([R|Rs]) --> rev(Rs), [R]. Sample use: ?- phrase(rev("abcd"), Ls). Ls = "dcba". Now, using the newly available ?- use_module(library(time)). true. ?- use_module(library(between)). true. ?- time((between(1,100_000,_),phrase(rev("abcd"),_),false)). % CPU time: 11.045s The time of the additional ?- time((between(1,100_000,_),false)). % CPU time: 0.116s So, For comparison, let's consider an equivalent query in SWI-Prolog: ?- time((between(1,100_000,_),phrase(rev([a,b,c,d]),_),false)). % 700,002 inferences, 0.081 CPU in 0.086 seconds (94% CPU, 8645623 Lips) false. This means that Scryer Prolog currently incurs a more than 135 fold overhead for this query, even with the I find this huge gap surprising, and unlikely to be resolvable by comparative micro-optimizations such as better WAM code. We can see the WAM instructions generated for ?- use_module(library(diag)). true. ?- use_module(library(format)). true. ?- use_module(library(lists)). true. ?- wam_instructions(rev/3, Is), maplist(portray_clause, Is). switch_on_term(1,external(1),external(2),external(8),fail). try_me_else(6). get_constant(level(shallow),[],x(1)). put_value(x(2),1). get_variable(x(1),2). put_value(x(3),2). execute(=,2). trust_me(0). allocate(3). get_list(level(shallow),x(1)). unify_variable(y(2)). unify_variable(x(1)). get_variable(y(1),3). put_variable(y(3),3). call(rev,3). put_unsafe_value(3,1). put_list(level(shallow),x(2)). set_value(y(2)). set_local_value(y(1)). deallocate. execute(=,2). So, that's 21 instructions, and they look reasonable to me. For comparison, here are the WAM instructions generated by GNU Prolog, one of the fastest Prolog systems especially for such small benchmarks, and they look roughly the same: switch_on_term(1,2,fail,4,fail), label(1), try_me_else(3), label(2), get_nil(0), get_value(x(2),1), proceed, label(3), trust_me_else_fail, label(4), allocate(3), get_variable(y(1),2), get_list(0), unify_variable(y(0)), unify_variable(x(0)), put_variable(y(2),2), call(rev/3), put_unsafe_value(y(2),0), get_list(0), unify_value(y(0)), unify_local_value(y(1)), deallocate, proceed]). A 100-fold improvement in runtime seems unlikely to stem from any particular optimization that affects any of these instructions in isolation, nor from replacing or avoiding one or even a few of these instructions. A 2- or 3-fold speedup, maybe, but not a 100-fold one. I would greatly appreciate if someone who is interested in this topic could have a look at this specific sample program and its execution with the Thank you and all the best! |
Beta Was this translation helpful? Give feedback.
Replies: 7 comments 6 replies
-
Addendum: The query is much faster in Scryer Prolog if we use the compiled DCG translation directly (instead of ?- time((between(1,100_000,_),rev("abcd",Ls0,[]),false)). % CPU time: 0.657s Still, that's a roughly 8-fold overhead, and I think well worth looking into if someone is interested in helping to improve the performance of Scryer Prolog! |
Beta Was this translation helpful? Give feedback.
-
Well, using swipl without
versus
which is about a factor 33 faster than Scryer. |
Beta Was this translation helpful? Give feedback.
-
The number of events measured with valgrind is 17291342699 versus 745806340 which is also a factor 23:
versus
|
Beta Was this translation helpful? Give feedback.
-
You are right and I completely missed that :-(
we now get
which is indeed about your factor 8. |
Beta Was this translation helpful? Give feedback.
-
I don't think this comparison is totally fair to Scryer. SWI's implementation of If we benchmark our implementation of between/3 on SWI (that is, Scryer's but with the initial error checking removed):
on SWI we get:
and on Scryer:
Scryer is 5x slower on my machine. I think that's a reasonable range for now. There's many opportunities for improving processing of strings in various WAM instructions. Several improvements are already being planned. |
Beta Was this translation helpful? Give feedback.
-
To answer the original question, the old |
Beta Was this translation helpful? Give feedback.
-
Thank you a lot, this change has led to a very impressive speedup of the original query that caused me to start this discussion in the first place: ?- time((between(1,100_000,_),phrase(rev("abcd"),_),false)). % CPU time: 0.656s That's a more than 16-fold speedup, and hence the speed difference between the two systems is now well within an order of magnitude on all (equivalent) queries we have discussed here. That's indeed a great achievement, especially for this early stage of development that Scryer Prolog is now in! |
Beta Was this translation helpful? Give feedback.
To answer the original question, the old
phrase_/3
interpreter was slow because it invokedexpand_goal
eagerly and often.(,)/2
and family had the same problem until recently. I realized a long time ago that it could be replaced bydcg_body
but I punted the task until now.