Debugging
This is a small summary for beginners on
how to debug a program
(experts in turn should read it and then send me corrections/additions :-)
Since this is a complex topic, I only try to explain the first steps
of locating errors that result in a "fatal" signal,
like
SIGSEGV: in simple terms, this is a problem
so severe that the operating system notices it and forces the
program to quit. More subtle problems are not caught this way ...
Even non-programmers should be able to handle this, however
one should be roughly familiar with the operating system being used.
Tools
Ideally you should have the
source code for
the application and all involved libraries. However I assume that standard
libraries don't easily crash, but give some "reasonable"
feedback so I concentrate on bugs within the non-standard parts.
An essential prerequisite is a
debugger, a tool which
allows you to run an application step by step and examine the internal data.
On many systems the GNU debugger (gdb) is present,
however it may be inferior to commercial ones (like dbx, ladebug, etc.).
Two basic kinds of debugging applications are available:
-
Source code checkers. The most famous example is lint.
On the "freeware side of life" check out
splint (formerly "lclint")!
-
Tools that work with the existing executable, performing runtime or even
post-mortem analysis. Examples are
malloc debugging libraries and
profiling tools.
I try to outline the procedure to be followed and specify some details
within the brackets [] on a fictitious example written in C which assumes
gcc (GNU C compiler) and gdb are present to create and debug
an executable "foo.exe". Note that the short versions of the gdb commands
are used here; these are less descriptive than the full command strings,
but I'm too lazy to write them here ...
-
Prepare a set of debuggable binaries
(i.e. libraries and executables).
This involves specifying a flag to the compiler (and
linker) [-g] upon compiling and linking. Otherwise
the compiler/linker may not store the valuable information which maps
the object code to the corresponding source and vice versa.
gcc -o foo.exe -O0 -g foo.c
-
You should disable optimization when compiling
[
-O0 is used for many compilers]!
Note that changing the optimization level may have subtle effects on the program's behavior.
-
Run the debugger on the executable
[
gdb foo.exe]
-
Optionally set a breakpoint on
exit()
[b exit]
The debugger will stop execution when the location assigned to a breakpoint
(a line in the source code, a function, etc.)
is reached.
Then the command prompt is presented again.
Check below for more information about
exit().
Stopping there is not always necessary, but required when looking for
X protocol errors.
-
Actually
run the program
[
r].
Take care here if debugging
X protocol errors.
In case you need to specify commandline arguments
you can do this before running the code either by an explicit call
[set args bar]
or "implicitly"
[r bar].
-
Now try to reproduce the bug in the
simplest possible way, i.e. minimize the number of "actions" like
keystrokes and mouse clicks, use the simplest input data file, etc.
Remember exactly (even better, write down) what you do here!
-
The debugger commandline
will tell you technically what
has happened (e.g. a segmentation fault,
SIGSEGV,
has been caught). Now the important thing is to locate it in detail.
Produce a stack trace
[where].
This will show the location within the program where the
crash happened.
Attach a copy of this output to your bug report!
-
Given the crash has happened in a function for which you have
the source code (I assume so) you can now
list the source code, i.e.
the "bogus" line
[
l].
The next step would be examining further details (stack itself,
arguments to current function, invalid data, etc.).
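To practice the steps above, a small, deliberately broken piece of code helps. The following is only a sketch of what the fictitious foo.c could contain; the function name string_length is made up for illustration. It omits a NULL check, so handing it a null pointer (for instance argv[1] when the program is started without arguments) is undefined behavior which on most systems ends in a SIGSEGV.

```c
#include <stddef.h>

/* Hypothetical helper for the fictitious foo.c.
 * BUG (on purpose): s is dereferenced without a NULL check. */
size_t string_length(const char *s)
{
    size_t n = 0;

    while (s[n] != '\0')   /* a null pointer is dereferenced here */
        n++;
    return n;
}
```

If main() calls string_length(argv[1]) and the program is run without arguments inside gdb, the stack trace [where] should name string_length() and the listing [l] should show the loop around the offending line.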
Advanced Issues
Though I can not explain further procedures in this recipe-like
style, I want to give some "well-known" hints & ideas and
address some standard tasks to be done.
An
X protocol error happens when some improper request
was sent to the X server. It can happen when programming
Xlib
directly, or if the toolkit in use issues such a broken
request. So if you're not actually programming low-level
or working on an X11-based GUI library this error won't be
your fault.
This section is also useful if you're a newcomer to debugging X11 programs
in general; it is not limited to X protocol errors.
Within the debugging procedure outlined
above you should check on how to set a breakpoint on
exit().
Check whether the application uses the X Intrinsics (Xt)
library (e.g. run ldd foo.exe
outside gdb within a shell). If it's listed there run the application with the
-sync option [r -sync].
If not (pure Xlib) set the global variable _Xdebug to a non-zero value
from within a debugger or even within the source code
near the beginning of main().
Note that this may also change the "location" and/or "appearance"
of the bug or even cause it to disappear!
Alternatively one may trigger on the interfaces which the X11 libraries call
if there's a problem which they can detect:
You may try setting breakpoints on the Xlib calls
_XDefaultError(), _XError(), _XIOError()
and Intrinsic lib calls
XtError(), XtWarning().
If things go wrong upon system/libc calls from within those
libraries, these interfaces won't be called so you rely on
your libc only.
If your breakpoint on exit()
doesn't help when an X11 application crashes, check out
the alternatives.
An important issue while debugging X applications is that
one can very easily lock up the display so that mouse and keyboard
no longer accept input. To avoid this there are some workarounds:
-
Run the debugger on a console (or PM window/fullscreen session if on OS/2).
-
Run the application in
Xvfb (X Virtual
Framebuffer), Xnest or on another
(local) display (e.g. ":1"). While the first one
is a good idea only for code which may be examined non-interactively
(e.g. check for memory-handling, profiling, etc.) the latter two
alternatives are not only substitutes but almost better than
running on the current X server. Especially Xnest is
helpful if you need to take care of data like Atoms being stored
in the X server.
Unfortunately both tools are not always available.
Everyone should know those tricks ...
-
Use a
malloc debugging library like
dmalloc, dbmalloc
or efence.
Probably the majority of all bugs are due to improper memory handling.
These libraries may also help to detect memory leaks.
-
Discover more warning flags of your compiler!
Beyond some basic problems a verbose compiler might tell you about
more subtle problems.
-
Ensure the compiler is in an ANSI conforming mode and not just
K&R or something else.
-
Try compiling/building on a different operating system/architecture
with a different compiler.
-
If not disabled (see "man ulimit") programs often produce
a core dump when they crash. The resulting data file is an image of the memory
when the app was stopped.
You can perform a post-mortem crash analysis with this file.
Run the debugger with the executable and the core dump
as arguments [gdb foo.exe core] and proceed
as explained
above.
-
exit()
is a function which is usually called upon program termination.
It performs a lot of internal cleanup.
Setting a breakpoint there might also help when you're looking
for memory leaks. When debugging C++ apps crashing upon
exit() look for destructors being called there.
-
Sometimes an application crashes but you can't easily
detect where and your breakpoint on
exit()
doesn't help. Then you should check other "legal" procedures within
libc a program may call on exit, including
abort() and _exit().
-
main() has a similar
significance to exit():
a lot of things are performed which the simple-minded
programmer might not be aware of. In contrast to
exit(),
these things happen before the program's main()
is called. So your program might crash due to all kinds of improper
initialization (e.g. variable assignments) or within C++ constructors
called at startup.
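Regarding the core dump hint in the list above: core files are frequently suppressed by a soft resource limit of zero. As a minimal sketch, assuming a POSIX system which provides getrlimit() and setrlimit(), a program can raise its own soft limit up to the hard limit early in main(). The function name enable_core_dumps is made up for this example.

```c
#include <sys/resource.h>

/* Raise the soft core-file size limit to the hard limit so that a crash
 * may leave a core file behind. Returns 0 on success, -1 on failure.
 * The function name is hypothetical; getrlimit()/setrlimit() are POSIX. */
int enable_core_dumps(void)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_CORE, &lim) != 0)
        return -1;
    lim.rlim_cur = lim.rlim_max;   /* the hard limit itself stays unchanged */
    return setrlimit(RLIMIT_CORE, &lim);
}
```

If the hard limit is zero as well, this still does not produce a core file, and where the file ends up depends on the system configuration.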
Compile Time Problems
You may even have problems getting some code to compile.
In this section I will again list a couple of ideas that have
helped me at least once to get things resolved.
Some of them may depend on the compiler being used, however.
-
Macros obscure the effective code.
Try
gcc -E to get the output as it's being passed from the
preprocessor (cpp) to the compiler.
-
The former command does not tell you which macros are actually being set.
Try
gcc -E -dD and man cc for that.
-
Compilers may choke on source code which is not in the native text
format (DOS vs. un*x) or which includes some special characters. While
the former points to a broken legacy compiler, the latter is quite
reasonable behavior ...
-
Often macros shield declarations and definitions in the
system headers. If a symbol is reported missing in your compilation although
you can locate its declaration/definition in the system headers,
check the man page of your compiler for the switches or macros that enable it.
(Obviously the direct approach is reading the headers, but
the macros used may be deeply nested ...)
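A classic case where the preprocessor output explains a surprise is a macro written without parentheses. The macro and function names below are invented for illustration; running gcc -E on such a file shows the expanded expressions noted in the comments.

```c
/* Both macro names are hypothetical examples. */
#define BAD_SQUARE(x)  x * x          /* missing parentheses */
#define GOOD_SQUARE(x) ((x) * (x))

int bad_result(void)
{
    return BAD_SQUARE(1 + 2);    /* expands to 1 + 2 * 1 + 2 */
}

int good_result(void)
{
    return GOOD_SQUARE(1 + 2);   /* expands to ((1 + 2) * (1 + 2)) */
}
```

Here bad_result() yields 5 instead of the intended 9, which is hard to spot in the source but obvious in the preprocessed text.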
Nowadays
portability is very important.
Writing clean code saves you a lot of time and also increases the chance
to get it built easily on 64-bit machines as well.
So in case you run across problems within an application,
the cause may not be source code which is totally
broken, but code which has only been written and tested in a
single environment. From this point of view
portability is crucial for reducing
the amount of useless debugging.
In the following I briefly mention some famous portability
issues.
Language Level
-
Never use pre-ANSI interfaces, e.g. those included within
the headers
<memory.h>, <strings.h> or
<varargs.h>!
-
Don't use compiler or preprocessor extensions!
Famous examples are the GNU extensions __FUNCTION__ and
typeof().
-
Check whether compiler switches cause a different behavior
of the executable. One example is the compilers on machines based on
the alpha processor (AXP), which require
-ieee or
-mieee to get applications to work which rely on IEEE conformance
and proper handling of numerical exceptions.
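If an extension cannot be avoided, it can at least be confined to one place and guarded by preprocessor checks. The sketch below assumes that a compiler announcing C99 via __STDC_VERSION__ supports the standard identifier __func__ and that compilers defining __GNUC__ accept __FUNCTION__; the names CURRENT_FUNCTION and where_am_i are made up for this example.

```c
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#  define CURRENT_FUNCTION __func__          /* standard since C99 */
#elif defined(__GNUC__)
#  define CURRENT_FUNCTION __FUNCTION__      /* GNU extension */
#else
#  define CURRENT_FUNCTION "(unknown function)"
#endif

/* Returns the name of this function as seen by the compiler,
 * or a placeholder string if neither variant is available. */
const char *where_am_i(void)
{
    return CURRENT_FUNCTION;
}
```

The rest of the code then only uses CURRENT_FUNCTION, so porting to a compiler without either feature means touching a single block.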
"Bitness"
-
The famous byte order issue:
most widely used are
LITTLE_ENDIAN ("1234", on i386)
and BIG_ENDIAN ("4321").
-
Do not assume
char to be signed or unsigned!
-
Do not assume
sizeof(int) equals sizeof(long)!
-
Do not assume
sizeof(void *) equals sizeof(int)!
i.e. don't assign pointer values to int
-
Do not assume that the result type of the
sizeof
operator is int!
Its type (an unsigned integer type) is size_t, which is defined in
the <stddef.h> header.
-
Always use full prototypes and make your function
calls match the specified signature, using explicit
type casts where needed.
-
The former rule is important when using varargs interfaces
(variable number of arguments), e.g.
va_start() from
<stdarg.h> or from other libraries like the
XtVa*() interfaces from the X Intrinsics library. Usually they
need a NULL pointer to indicate the end of their argument list.
Implicit conversion from 0 won't work, so you have to use
NULL (though even a strangely defined NULL may
cause trouble).
-
Even in the beginning of the 200x decade using C9x-features is not a good idea:
at least the full set of features is rarely implemented and in even
more rare occasions you will actually find such a compiler on your target
machine ...
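Two of the points above can be sketched in C: checking the byte order at run time instead of assuming it, and terminating a varargs argument list with an explicitly cast null pointer. Both function names are hypothetical and only serve as illustration.

```c
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if the least significant byte of an unsigned int is stored
 * at the lowest address (little-endian style), 0 otherwise. */
int is_little_endian(void)
{
    unsigned int probe = 1;
    unsigned char first_byte;

    memcpy(&first_byte, &probe, 1);   /* lowest-addressed byte of probe */
    return first_byte == 1;
}

/* Sums the lengths of all strings up to a terminating null pointer.
 * Hypothetical example of a NULL-terminated varargs interface. */
size_t total_length(const char *first, ...)
{
    va_list ap;
    size_t sum = 0;
    const char *s;

    va_start(ap, first);
    for (s = first; s != NULL; s = va_arg(ap, const char *))
        sum += strlen(s);
    va_end(ap);
    return sum;
}
```

A call would look like total_length("foo", "bar", (const char *)NULL). The explicit cast guarantees that a value of pointer type is passed, whatever the definition of NULL on the platform looks like.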
Bug Reporting
If you're going to write a proper bug report you need to
consider a couple of things:
First tell exactly which
version of the code you are using,
give the exact version of that distribution or the CVS checkout date.
You should specify all libraries which are involved. Run
ldd foo
to see all shared libraries linked to the executable.
(Note that
ldd is not a standard tool; it may have a different name
on other platforms or may not even exist on your installation!)
If the error happens while compiling/linking always give the full command line,
perhaps even the complete output of some "make" command.
In addition you need to fully
specify your system.
"
uname -a" should give you the details about your
operating system as well as the basic hardware (CPU architecture).
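An application can also collect this information itself, e.g. to print it along with its version number. The following is only a sketch, assuming a POSIX system with uname() and a C library which provides snprintf(); the function name describe_system is made up.

```c
#include <stdio.h>
#include <sys/utsname.h>

/* Writes a one-line system description into buf, similar in spirit to
 * the output of "uname -a". Returns 0 on success, -1 on failure.
 * The function name is hypothetical; uname() itself is POSIX. */
int describe_system(char *buf, size_t len)
{
    struct utsname u;

    if (uname(&u) < 0)
        return -1;
    /* the output is truncated if buf is too small */
    snprintf(buf, len, "%s %s %s %s",
             u.sysname, u.release, u.version, u.machine);
    return 0;
}
```

The resulting string can be pasted into the bug report next to the version of the code and the list of libraries.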
Happy debugging!