Discussion: Precision issue of CreateTimerQueueTimer()
Rainny
2006-11-13 02:53:13 UTC
I use the timer-queue API to create a timer that fires at a regular
interval, say 10 milliseconds. The precision does not seem very
satisfying. I wonder whether that is natural for a non-real-time OS,
and whether there is anything I can do about it? Thanks.

Here's the code:

#define _WIN32_WINNT 0x0500
#include <windows.h>
#include <iostream>

static volatile DWORD last;

VOID CALLBACK timer_proc(PVOID, BOOLEAN)
{
    DWORD cur = ::GetTickCount();
    std::cout << "timer_proc() called, delta = " << cur - last << std::endl;
    last = cur;
}

int main(int argc, char *argv[])
{
    ::timeBeginPeriod(1);
    last = ::GetTickCount();

    // Create a timer in the default timer queue: 10 ms due time, 10 ms period
    HANDLE handle;
    ::CreateTimerQueueTimer(&handle, NULL, timer_proc, NULL, 10, 10, 0);

    ::Sleep(110);

    // Change due time & period
    std::cout << "Changing timer, delta = " << ::GetTickCount() - last << std::endl;
    last = ::GetTickCount();
    ::ChangeTimerQueueTimer(NULL, handle, 130, 20);

    ::Sleep(310);

    // Terminate the timer, waiting for any running callback to finish
    std::cout << "Deleting timer, delta = " << ::GetTickCount() - last << std::endl;
    last = ::GetTickCount();
    ::DeleteTimerQueueTimer(NULL, handle, INVALID_HANDLE_VALUE);
    std::cout << "Timer deleted, delta = " << ::GetTickCount() - last << std::endl;

    ::timeEndPeriod(1);

    return 0;
}

Compiled with 'cl timer_queue_test.cpp /EHsc winmm.lib' on Windows XP
Pro SP2, Visual C++ 2005, AMD Athlon 64 2800+ overclocked to 3200+.

The output is like this:

X:\Test>timer_queue_test.exe
timer_proc() called, delta = 16
timer_proc() called, delta = 15
timer_proc() called, delta = 16
timer_proc() called, delta = 0
timer_proc() called, delta = 16
timer_proc() called, delta = 15
timer_proc() called, delta = 0
timer_proc() called, delta = 16
timer_proc() called, delta = 15
timer_proc() called, delta = 0
Changing timer, delta = 16
timer_proc() called, delta = 125
timer_proc() called, delta = 16
timer_proc() called, delta = 31
timer_proc() called, delta = 16
timer_proc() called, delta = 15
timer_proc() called, delta = 16
timer_proc() called, delta = 31
timer_proc() called, delta = 16
timer_proc() called, delta = 15
Deleting timer, delta = 16
Timer deleted, delta = 0
anton bassov
2006-11-13 04:13:42 UTC
Hi mate
Post by Rainny
I use the timer-queue API to create a timer that fires at a regular
interval, say 10 milliseconds.
You should clearly realize that the maximum *theoretically* possible
precision is that of a clock tick - the system has no notion of time
in between ticks. Therefore, if you specify an interval of, say, 5 ms
on a system whose clock tick is, say, 15 ms, the best you can
theoretically get is 15 ms. That is the theory.
Post by Rainny
The precision does not seem very satisfying. I wonder whether that is
natural for a non-real-time OS,
In practice, don't forget that there are many threads in the system,
some of them with a priority above that of yours. Your callback is
invoked in the context of a worker thread, so you have no option but to
wait until that thread is scheduled to run - even if your timer has
already expired. This is how a general-purpose OS works; if you really
need high precision, you may consider an RTOS.
Post by Rainny
is there anything I can do about it?
You may consider increasing the base priority of your process and
thread. However, even this step does not offer an ultimate solution.
Unless the currently running thread yields the CPU voluntarily, it is
going to run until its quantum expires (on Windows Server systems the
default quantum is 12 clock ticks, although on workstations it is only
2 clock ticks).
Therefore, even if you set your thread's priority to the highest
possible level of 31, some delays are still possible (I am assuming
there are no other threads of the same priority in the system - if
multiple threads try to be "very special" and raise their priority to
the maximum, the actual meaning of real-time priority gets somewhat
diluted, for understandable reasons).
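
For illustration, the boost itself is just a couple of calls - a
minimal, hypothetical sketch (HIGH_PRIORITY_CLASS is usually enough for
experiments; REALTIME_PRIORITY_CLASS requires the "increase scheduling
priority" privilege and can starve the rest of the system; also keep in
mind that the timer-queue callback runs on a worker thread you do not
directly control):

#include <windows.h>

int main()
{
    // Raise the process class first, then the thread priority within it.
    ::SetPriorityClass(::GetCurrentProcess(), HIGH_PRIORITY_CLASS);
    ::SetThreadPriority(::GetCurrentThread(),
                        THREAD_PRIORITY_TIME_CRITICAL);

    // ... timing-sensitive work goes here ...
    return 0;
}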


Anton Bassov
Alexander Grigoriev
2006-11-13 05:37:45 UTC
A higher-priority thread gets the CPU immediately, as soon as it
becomes ready to run, without waiting for a lower-priority thread to
use up its time slice.
Post by anton bassov
You may consider increasing the base priority of your process and
thread. However, even this step does not offer an ultimate solution.
Unless the currently running thread yields the CPU voluntarily, it is
going to run until its quantum expires (on Windows Server systems the
default quantum is 12 clock ticks, although on workstations it is only
2 clock ticks).
anton bassov
2006-11-13 05:53:01 UTC
Alexander,
Post by Alexander Grigoriev
A higher-priority thread gets the CPU immediately, as soon as it
becomes ready to run, without waiting for a lower-priority thread to
use up its time slice.
This is true...

Thank you for the correction.

Anton Bassov
William DePalo [MVP VC++]
2006-11-13 05:36:55 UTC
Post by Rainny
I use the timer-queue API to create a timer that fires at a regular
interval, say 10 milliseconds. The precision does not seem very
satisfying. I wonder whether that is natural for a non-real-time OS,
and whether there is anything I can do about it? Thanks.
Before the place in your application where you first need the timer, place a
call to

timeBeginPeriod(1);

and when you are done using the timer add a

timeEndPeriod(1);

There will still be no guarantees, but chances are you'll see better
results. And on a busy system, you'll likely need to elevate the priority of
your thread / application.

Regards,
Will
Rainny
2006-11-13 05:58:04 UTC
I've already included these two calls in my code. It doesn't help,
unfortunately.

"William DePalo [MVP VC++] wrote:
"
William DePalo [MVP VC++]
2006-11-13 06:54:23 UTC
Post by Rainny
I've already included these two calls in my code. It doesn't help,
unfortunately.
Oh, sorry. I didn't scrutinize your code, but when I saw the 125 ms
delay I just assumed that you could do better by fiddling with the
timer.

Looking at it now, though, I wonder if your attempt to display the
elapsed time is skewing the results (console I/O is slow). You might
want to store the tick counts for the period you are timing in an
array and then display the results when you are done.

Regards,
Will
Rainny
2006-11-13 07:42:08 UTC
Will:

The following code and output show that 10 ms is the precision limit
of the timer queue.

#define _WIN32_WINNT 0x0500
#include <windows.h>
#include <iostream>
#include <vector>

volatile DWORD last;
std::vector<DWORD> delta;

VOID CALLBACK timer_proc(PVOID, BOOLEAN)
{
    DWORD cur = ::GetTickCount();
    delta.push_back(cur - last);
    last = cur;
}

int main(int argc, char *argv[])
{
    ::timeBeginPeriod(1);

    delta.reserve(10000);   // preallocate so the callback never reallocates
    last = ::GetTickCount();

    // Create a timer in the default timer queue: 10 ms due time, 10 ms period
    HANDLE handle;
    ::CreateTimerQueueTimer(&handle, NULL, timer_proc, NULL, 10, 10, 0);

    ::Sleep(315);

    ::DeleteTimerQueueTimer(NULL, handle, INVALID_HANDLE_VALUE);

    for (std::vector<DWORD>::iterator it = delta.begin(); it != delta.end(); ++it) {
        std::cout << "delta = " << *it << std::endl;
    }

    ::timeEndPeriod(1);

    return 0;
}

X:\Test>timer_queue_test.exe
delta = 15
delta = 16
delta = 0
delta = 16
delta = 15
delta = 0
delta = 16
delta = 15
delta = 0
delta = 16
delta = 16
delta = 0
delta = 15
delta = 16
delta = 0
delta = 16
delta = 15
delta = 16
delta = 0
delta = 15
delta = 16
delta = 0
delta = 16
delta = 15
delta = 0
delta = 16
delta = 16
delta = 0
delta = 15
Tom Widmer [VC++ MVP]
2006-11-13 10:39:30 UTC
Post by Rainny
VOID CALLBACK timer_proc(PVOID, BOOLEAN)
{
    DWORD cur = ::GetTickCount();
GetTickCount() has terrible resolution. Use timeGetTime() or
QueryPerformanceCounter() instead.

With that change, the accuracy looks fine (I get 10 or 11 ms for each
delta except the first two; increasing the time until the first wakeup
fixes this).
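
For example, the measurement loop from the earlier post could be redone
along these lines - a minimal sketch, not Rainny's exact code;
timeGetTime() lives in winmm.lib, its resolution follows
timeBeginPeriod(), and the 100 ms due time avoids the skewed first
readings:

#define _WIN32_WINNT 0x0500
#include <windows.h>
#include <mmsystem.h>   // timeGetTime(); link with winmm.lib
#include <iostream>
#include <vector>

static volatile DWORD last;
static std::vector<DWORD> deltas;

VOID CALLBACK timer_proc(PVOID, BOOLEAN)
{
    // With timeBeginPeriod(1) in effect, timeGetTime() is accurate to
    // roughly 1 ms, unlike GetTickCount()'s 10-16 ms granularity.
    DWORD cur = ::timeGetTime();
    deltas.push_back(cur - last);   // no console I/O in the callback
    last = cur;
}

int main()
{
    ::timeBeginPeriod(1);
    deltas.reserve(1000);
    last = ::timeGetTime();

    // 100 ms due time, 10 ms period; the first delta reflects the due time.
    HANDLE handle;
    ::CreateTimerQueueTimer(&handle, NULL, timer_proc, NULL, 100, 10, 0);
    ::Sleep(500);
    ::DeleteTimerQueueTimer(NULL, handle, INVALID_HANDLE_VALUE);

    for (std::vector<DWORD>::size_type i = 0; i < deltas.size(); ++i)
        std::cout << "delta = " << deltas[i] << std::endl;

    ::timeEndPeriod(1);
    return 0;
}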

Tom
Rainny
2006-11-13 10:59:22 UTC
Yes, timeGetTime() works.
Thank you all!

"Tom Widmer [VC++ MVP] wrote:"
anton bassov
2006-11-13 15:42:27 UTC
Tom,
Post by Tom Widmer [VC++ MVP]
GetTickCount() has terrible resolution. Use timeGetTime() or
QueryPerformanceCounter() instead.
QueryPerformanceCounter() is quite a dodgy function. It means quite
different things on UP and MP HALs (the system timer and the RDTSC
instruction, respectively), and hence has different resolutions, which
has to be taken into account when converting a measured period to
milliseconds. In fact, it is not supposed to be used for measuring
elapsed time in the first place (only for timestamps).
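
Whichever source backs it on a given HAL, a raw counter delta only
becomes milliseconds via QueryPerformanceFrequency() - a minimal
sketch of the conversion:

#include <windows.h>

// The counter frequency depends on the HAL, so query it at run time
// rather than assuming a fixed rate.
double qpc_delta_ms(LARGE_INTEGER start, LARGE_INTEGER end)
{
    LARGE_INTEGER freq;
    ::QueryPerformanceFrequency(&freq);   // counts per second
    return double(end.QuadPart - start.QuadPart) * 1000.0
           / double(freq.QuadPart);
}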


Furthermore, it disables interrupts on some platforms, so calling it
too frequently may have undesirable consequences for the system...


Anton Bassov
Tom Widmer [VC++ MVP]
2006-11-13 16:25:36 UTC
Post by anton bassov
Tom,
Post by Tom Widmer [VC++ MVP]
GetTickCount() has terrible resolution. Use timeGetTime() or
QueryPerformanceCounter() instead.
QueryPerformanceCounter() is quite a dodgy function. It means quite
different things on UP and MP HALs (the system timer and the RDTSC
instruction, respectively), and hence has different resolutions, which
has to be taken into account when converting a measured period to
milliseconds. In fact, it is not supposed to be used for measuring
elapsed time in the first place (only for timestamps).
Is this Microsoft's advice, or yours? If MS's, do you have a citation?
Finally, what do you recommend as a general-purpose solution for
sub-millisecond accurate time measurements?
Post by anton bassov
Furthermore, it disables interrupts on some platforms, so calling it
too frequently may have undesirable consequences for the system...
MS recommends not calling it more than necessary:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/directx9_c/Game_Timing_and_Multicore_Processors.asp

Tom
anton bassov
2006-11-13 17:04:29 UTC
Tom,
Post by Tom Widmer [VC++ MVP]
Is this Microsoft's advice, or yours? If MS's, do you have a citation?
Calling QueryPerformanceCounter() eventually results in a
KeQueryPerformanceCounter() call in kernel mode. This is what the
KeQueryPerformanceCounter() documentation in the WDK says on the
subject:

//////////////////
KeQueryPerformanceCounter is intended for time-stamping packets or for
computing performance and capacity measurements. It is not intended for
measuring elapsed time, for computing stalls or waits, or for
iterations.

Use this routine as infrequently as possible. Depending on the
platform, KeQueryPerformanceCounter can disable system-wide interrupts
for a minimal interval. Consequently, calling this routine frequently,
as in an iteration, defeats its purpose of returning very fine-grained,
running time-stamp information. Calling this routine too frequently can
degrade I/O performance for the calling driver and for the system as a
whole.
////////////
Post by Tom Widmer [VC++ MVP]
Finally, what do you recommend as a general-purpose solution for
sub-millisecond accurate time measurements?
Why would you need it in the first place? However, if you are just
desperate, you can try the RDTSC instruction.
Just keep in mind that:

1. Every CPU on an MP system has its own counter. Therefore, if you
intend to use this instruction, you have to make sure the target
thread is allowed to run on only one CPU (see the sketch below).

2. What happens if an interrupt or a context switch occurs while your
code executes? How will it affect the accuracy of your results? In
practical terms, since you have no control over these things in user
mode whatsoever, doing the measurement in user mode just defeats the
purpose.
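
A user-mode sketch of point 1 (hypothetical code; it relies on the
MSVC __rdtsc() intrinsic from <intrin.h>, and point 2 still applies in
full - converting the cycle count to time also requires knowing the
CPU frequency):

#include <windows.h>
#include <intrin.h>   // __rdtsc() intrinsic (VC++ 2005 and later)

unsigned __int64 read_tsc_on_one_cpu()
{
    // Pin the calling thread to CPU 0 so successive readings come
    // from the same per-CPU counter; the thread is rescheduled onto
    // an allowed processor if it is currently running elsewhere.
    DWORD_PTR old_mask = ::SetThreadAffinityMask(::GetCurrentThread(), 1);

    unsigned __int64 tsc = __rdtsc();

    ::SetThreadAffinityMask(::GetCurrentThread(), old_mask);  // restore
    return tsc;
}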

Anton Bassov
Tim Roberts
2006-11-15 09:00:51 UTC
Post by anton bassov
Post by Tom Widmer [VC++ MVP]
Finally, what do you recommend as a general-purpose solution for
sub-millisecond accurate time measurements?
Why would you need it in the first place?
Are you saying there is no need for sub-millisecond time measurements?
You must live in a different driver universe from me.
Post by anton bassov
However, if you are just desperate, you can try the RDTSC instruction.
Remember, however, that with a multiprocessor HAL,
KeQueryPerformanceCounter does nothing other than read RDTSC, so all of the
RDTSC cautions apply to KQPC, and hence QPC.
Post by anton bassov
1. Every CPU on an MP system has its own counter. Therefore, if you
intend to use this instruction, you have to make sure the target
thread is allowed to run on only one CPU.
It annoys me that this is now a problem. Earlier versions of NT
synchronized the cycle counters of a multi-CPU system so that they were
a few dozen cycles apart -- a negligible difference. Today, when I
test, it's not uncommon to find the two processors tens of millions of
cycles apart. The delta is a constant, so it should be trivial to
compensate for this.
--
Tim Roberts, ***@probo.com
Providenza & Boekelheide, Inc.
anton bassov
2006-11-15 16:42:17 UTC
Post by Tim Roberts
Are you saying there is no need for sub-millisecond time measurements?
You must live in a different driver universe from me.
Actually, the only thing I am saying is that, since user-mode code
cannot protect itself against interrupts and context switches, there is
no guarantee that its measurements are perfectly accurate, which just
defeats the purpose of obtaining a high-precision measurement from user
mode. I think this is the main reason why kernel32.dll does not provide
any function (apart from QPC, of course) that measures time on a
sub-millisecond scale.
Post by Tim Roberts
Remember, however, that with a multiprocessor HAL,
KeQueryPerformanceCounter does nothing other than read RDTSC, so all of the
RDTSC cautions apply to KQPC, and hence QPC.
This is the only reason why I mentioned RDTSC in the first place - if
Tom agrees that, due to caution 2, it just does not make sense to
measure time with RDTSC in user mode, he will apparently accept my
argument against using QueryPerformanceCounter() for this purpose from
user mode as well.

Anton Bassov
Ben Voigt
2006-11-24 16:25:47 UTC
Post by Tim Roberts
It annoys me that this is now a problem. Earlier versions of NT
synchronized the cycle counters of a multi-CPU system so that they were
a few dozen cycles apart -- a negligible difference. Today, when I
test, it's not uncommon to find the two processors tens of millions of
cycles apart. The delta is a constant, so it should be trivial to
compensate for this.
The delta is a constant, until you run SpeedStep, Cool'n'Quiet,
PowerNow!, or any of the other schemes that vary the clock rate and may
not do so for all processors/cores simultaneously.