Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Buggy planning in 2093 #1020

Closed
jcoupey opened this issue Oct 24, 2023 · 7 comments
Closed

Buggy planning in 2093 #1020

jcoupey opened this issue Oct 24, 2023 · 7 comments

Comments

@jcoupey
Copy link
Collaborator

jcoupey commented Oct 24, 2023

I've noticed a buggy behavior of plan mode when using high values of timestamps. This was showing up in a more complex setup but can be reproduced with the following randomly generated minimal (non-)working example.

  • test_2023.json has a single job with no TW constraint and a single vehicle with a 2-hours long TW starting today at 9:00 GMT.
  • test_2093.json is exactly the same instance but shifted in time 70 years from now (only the vehicle TW differ).
$ diff test_2023.json test_2093.json
14,15c14,15
<         1698138000,
<         1698145200
---
>         3907213200,
>         3907220400

Now using plan mode on those yields different results:

$ vroom -c -i test_2023.json | jq .summary.violations
[]
$ vroom -c -i test_2093.json | jq .summary.violations
[
  {
    "duration": 1845631911,
    "cause": "delay"
  }
]

The delay showing up is on the job step which is clearly wrong. The fact that some delay shows up on a non time constrained task, plus the fact that it happens when significantly increasing the TW values seems to hint to some kind of overflow somewhere.

@kkarbowiak
Copy link
Contributor

I had a look and was able to find the problem. It happens during scaling of end of job time window to UserDuration in

const auto user_tw_end =
utils::scale_to_user_duration(job.tws[tw_rank].end);

The job time window is default, that is start is 0 and end is std::numeric_limits<Duration>::max(). This is expected as evidenced by the assertion in
assert(job.tws[tw_rank].end % DURATION_FACTOR == 0 ||
job.tws[tw_rank].is_default());

but then the scaling function cannot fit the scaled value in the Duration type.
As a quick check I changed the scaling function to check for this max value

inline UserDuration scale_to_user_duration(Duration d) {
  if (d == std::numeric_limits<Duration>::max()) {
    return std::numeric_limits<UserDuration>::max();
  }

  return static_cast<UserDuration>(d / DURATION_FACTOR);
}

and it worked fine, but this isn't really a proper solution.

Maybe it would be better to redefine the default time window to start = 0, end = utils::scale_from_user_duration(std::numeric_limits<UserDuration>::max()). This should allow scaling back and forth without overflows.

@jcoupey
Copy link
Collaborator Author

jcoupey commented Nov 17, 2023

Thanks for investigating. I get it that when end is std::numeric_limits<Duration>::max(), then end / DURATION_FACTOR as computed by scale_to_user_duration is too high for a UserDuration.

But now I first thought the overflow was related to the higher values in the vehicle time window. But it looks like the overflow is happening for both instances (and any time we have a default TW), it's just that in the second instance the user_service_start value happens to be high enough to trigger the following if clause:

const auto user_tw_end =
utils::scale_to_user_duration(job.tws[tw_rank].end);
if (user_tw_end < user_service_start) {
current.violations.types.insert(VIOLATION::DELAY);
v_types.insert(VIOLATION::DELAY);
current.violations.delay = user_service_start - user_tw_end;
user_delay += current.violations.delay;
}

@kkarbowiak do you share this analysis? In which case this is worse than I first thought.

@jcoupey
Copy link
Collaborator Author

jcoupey commented Nov 17, 2023

Well, adding the following assert to scale_to_user_duration does break in both cases so I have my answer.

diff --git a/src/structures/typedefs.h b/src/structures/typedefs.h
index 79ef1ec1..49f9607d 100644
--- a/src/structures/typedefs.h
+++ b/src/structures/typedefs.h
@@ -195,6 +195,8 @@ inline Duration scale_from_user_duration(UserDuration d) {
 }
 
 inline UserDuration scale_to_user_duration(Duration d) {
+  assert(d <= DURATION_FACTOR * static_cast<Duration>(
+                                  std::numeric_limits<UserDuration>::max()));
   return static_cast<UserDuration>(d / DURATION_FACTOR);
 }

We should probably add that prior to any fix to fail early in any other similar situation, same for scale_to_user_cost.

Your suggestion on updating the default TimeWindow::end value makes total sense, it should probably have been done anyway upon separating the Duration and UserDuration types. Would you care to submit a PR for this? That would be a good timing to include the fix in the next release as I'm still polishing things and have to tag yet another release candidate anyway because of #1038.

@kkarbowiak
Copy link
Contributor

@kkarbowiak do you share this analysis? In which case this is worse than I first thought.

Yes, the overflow happens in any instance with a default TW. It looks like the bug will affect all calculations with dates starting at May, 1st, 2035.

@kkarbowiak
Copy link
Contributor

Would you care to submit a PR for this?

Yup. Will have a PR ready today.

@jcoupey
Copy link
Collaborator Author

jcoupey commented Nov 17, 2023

the bug will affect all calculations with dates starting at May, 1st, 2035.

I'd still like to get that fixed in 2023. ;-)

@jcoupey jcoupey added this to the v1.14.0 milestone Nov 17, 2023
@jcoupey
Copy link
Collaborator Author

jcoupey commented Nov 17, 2023

Fixed in #1040

@jcoupey jcoupey closed this as completed Nov 17, 2023
jcoupey added a commit that referenced this issue Nov 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants