Replies: 5 comments 7 replies
-
I have also written a much faster micros() function for ESP32 to speed things up. Integrates without any change to AceRoutine via ClockInterface. |
Beta Was this translation helpful? Give feedback.
-
Hi, Thanks for posting, you got a lot of good ideas. Here are my initial thoughts: General Comments You are using an impressive number of coroutines, 20. The most that I have used is about 8. In general, I prefer to avoid using the c-preprocessor macros to activate/deactivate features in my libraries. Partly because changing the macro often forces the entire application and dependent libraries to be recompiled to avoid subtle compile-time and link-time errors. And partly because the Arduino IDE stubbornly refuses to support the ACE_ROUTINE_ENABLE_NAME AceRoutine automatically added the name string to each coroutine prior to v1.3, but I took that feature out in a flurry of refactoring to save 800-1000 bytes of flash memory on 8-bit processors. In retrospect, I should have made coroutine names an optional feature, instead of taking it out completely. It would cost only 2 bytes of static RAM on 8-bit processors if not used. I have been thinking about adding this feature back in. ACE_ROUTINE_USE_32BIT_TIMERS As far as I can tell, the primary reason that you need 32-bit counters is because you need the dynamic range of a 32-bit integer for your histograms. My expectation was that if someone needed to call In your case, you need to have the entire 32-bit dynamic range to update your histogram. Is that a fair assessment? [Having gone through your code while writing the rest of this comment, I get the impression that you don't really need these to be Following my general preference about avoiding compile-time macros to control behavior, I would prefer to avoid using the Option 1: Use Option 2: Change the name of the delay macros to be My initial thought is that Option 2 would be too disruptive for most people, to save a few narrow edge cases. So I'm leaning towards Option 1. ACE_ROUTINE_BENCHMARK This is a cool feature, nicely done. I understand the logarithmic binning and I understand the numbers in the JSON output as a frequency distribution. But I don't think I understand the y-axis of your graph. Did you normalize the frequency distribution so that value for the 1 microsecond bin is always 1? But that doesn't make sense because some of your coroutines have a count of 0 at the 1 microsecond bin, which means -infinity on a log graph. So with this feature, I would again like to avoid using the So roughly, something like this: class CoroutineProfiler {
public:
void updateElapsedMicros(uint32_t micros);
};
class Coroutine {
public:
...
void setProfiler(CoroutineProfiler* profiler) { mProfiler = profiler; }
CoroutineProfiler* getProfiler() const { return mProfiler; }
private:
...
CoroutineProfiler* mProfiler = nullptr;
};
CoroutineSchedule::runCoroutine() {
...
if ((*current)->mProfiler) {
uint32_t startMicros = T_COROUTINE::coroutineMicros();
(*current)->runCoroutine();
uint32_t elapsedMicros = _COROUTINE::coroutineMicros() - startMicros;
(*current)->mProfiler->updateElapsedMicros(elapsedMicros);
} else {
(*current)->runCoroutine();
}
...
} There's a bunch of design decisions to be made for
But I think you can see where I'm going with this. PS Instead of a zip file, maybe you can send me a link to a GitHub repo with the forked code next time? It's easier to see a diff and play with the code using a git repo. |
Beta Was this translation helpful? Give feedback.
-
Greetings! I've taken your advice and made a fork in https://github.com/peufeu2/AceRoutine-1 All the latest code is in there, I'd like to know what you think about it.
I agree with that! It was a quick'n'dirty test. I've reimplemented it with templates, that allow configuring
Name is not required for profiling. If it's not there, it will use a hex pointer address, which is... not that helpful if you have more than one.
I put the 32 bit code in Coroutine32bit.h, I added it to the main #include but if it isn't used, then it isn't necessary to #include it.
Since all coroutines in the application must derive from the same base class, either they all support profiling, or not. I am very fond of your low-overhead approach, so I wanted to not use a single byte more than the original code if the extra features are disabled. I think that worked, at least on ESP32. So I didn't use virtual functions for a few things that would have been convenient, but would have used extra resources. I did use them in the profiler, because that's gonna use lots of RAM to store the histograms anyway. This has a few drawbacks, for example if the Name feature is enabled, it is enabled for every coroutine. Oh, well. Configuration is done in the user code like that:
That puts the pieces together and enables the features that should be pretty obvious given the class names.
Yes, it's pretty much the reason. Another one was to use COROUTINE_DELAY( variable ) and not have to bother with if( it is longer than 32768 ms) and stuff like that.
I've changed pretty much everything from my previous code and now it uses mDelayStart. In fact it uses it twice: When profiling coroutine wait times, when the coroutine is about to run, "micros()-mDelayStart" is how much it waited. The profiler doesn't record that, rather it records how much more it waited relative to the requested mDelayDuration, because that's the interesting thing to record, and there is no point in filling the leftmost part of the histogram with zeros. So, if a coroutine uses several delays with different values, it will work, and the zero X axis of the histogram means "DELAY() delivered exactly what was asked." When profiling coroutine run times, mDelayStart was unused, so I recycled it to store the time when the coroutine enters the run state. So when it goes back to yield/delay, the profiler uses that to know how long it ran. I've added another clock in ClockInterface so runtime can be profiled using CPU cycles instead of micros. Since a profiler is an object that records time intervals, you can make two instances, call them WaitProfiler and RunProfiler, and have stats about both the run-time and the wait-times.
I went for option 3, make the user explicitly configure it for 32 bit. But, do users read what they copypaste, that remains to be seen...
Done ;) I changed it so it receives messages from the Coroutine, not the Scheduler, because only the coroutine knows if its delay is encoded in millis, micros or seconds. Right now I haven't implemented the profiler that is compatible with the 16-bit counters, but I would not want to prevent that from happening.
I've added a few histogram classes, and since ESP32 has a FPU I put one with a non-power-of-2 log histogram... this one should really be disabled on small cpus though. I've added an example in examples/Profiler, with 2 commented plots to answer your questions about the Y axis. The python script that generates them is trash, but it's usable. An interesting thing is, these Profiler objects make histograms of integer values. So they can also profile functions, it just needs a bit of boilerplate code to store the cpu cycle counter at the beginning of the function and call the profiler at the end. And I've also used them to make a histogram of logic level transition times on my onewire bus to debug it, see here pstolarz/OneWireNg#38 (comment) I've tried to add enough comments in the code. Have a nice day ;) |
Beta Was this translation helpful? Give feedback.
-
Ok, I merged everything into the
Moving the profile instrumentation into the
I think we can say that adding Profiling adds almost no CPU overhead if it is not used. I'm going to let this simmer for a couple of days before pushing a new release. I often find a bunch of typos and other little fixes before a release. |
Beta Was this translation helpful? Give feedback.
-
I just pushed out v1.5.0 with this feature. I made a few tweaks from my last update. 1)) I added a If the original 2)) It looks like I did not run my benchmarking properly when figuring out how much extra runtime overhead was consumed by profiling. When I updated
In other words, 3)) I hope everything is explained sufficiently clearly in USER_GUIDE.md#CoroutineProfiling. |
Beta Was this translation helpful? Give feedback.
-
Greetings!
I really like this library, one of the best for arduino!
I'm using it with ESP32 for my home heating controller, it has 20 coroutines at the moment.
I wanted the input filtering and debouncing coroutine to execute every 10-20ms. So I had to hunt slow stuff in the other coroutines to prevent them from executing for longer than that.
In order to do that, I've added a few features, so here are the files, in case you find them useful.
There are 3 #defines to enable the features:
#define ACE_ROUTINE_ENABLE_NAME
This juts adds a "const char*name" to the coroutine base class. I initialize it in derived classes.
#define ACE_ROUTINE_USE_32BIT_TIMERS
This one replaces the delay timers with a single 32 bit timer using microseconds. It makes no sense to save a few bytes of RAM/Flash on ESP32. This gives longer possible delays using one single function (COROUTINE_DELAY_MICROS).
#define ACE_ROUTINE_BENCHMARK 20
This one requires the above two #defines to be active. It adds an array mRuntimeStats[] to each coroutine. Every time the scheduler executes this coroutine, it measures its execution time, and increments mRuntimeStats[ log2( time in µs ) ]. So this array contains a histogram of execution times, in a form that doesn't use too much RAM.
I've added functions to encode that as JSON, here's a sample output:
That gives a useful plot of the probability distribution of runCoroutine() time:
Thanks to this I've managed to find the coroutines that were taking too long (it was mostly the I2C LCD) and snipe some YIELDs into them, which keeps the maximum runCoroutine() runtime of all of them below 16ms now. So everything runs fine.
Have a nice day!
ace_routine.zip
Beta Was this translation helpful? Give feedback.
All reactions