Debugging

This is a short summary for beginners on how to debug a program (experts should read it too and then send me corrections/additions :-).
Since this is a complex subject, I only try to explain the first steps of locating errors that show up as a "fatal" signal, like SIGSEGV. In simple words, this is a problem so severe that the operating system notices it and forces the program to quit. More subtle problems are not caught this way ...
Even non-programmers should be able to handle this, but you should be roughly familiar with the operating system being used.

Tools

Ideally you should have the source code for the application and all involved libraries. However, I assume that standard libraries don't crash easily but give some "reasonable" feedback, so I concentrate on bugs within the non-standard parts. An essential prerequisite is a debugger, a tool which allows you to run an application step by step and examine its internal data. On many systems the GNU debugger (gdb) is present, although it may be inferior to commercial ones (like dbx, ladebug, etc.).
Two basic kinds of debugging applications are available:

Procedure

I try to outline the procedure to be followed and give some details within brackets [] for a fictitious example written in C, which assumes that gcc (GNU C compiler) and gdb are present to create and debug an executable "foo.exe". Note that the short versions of the gdb commands are used here; these are less descriptive than the full command strings, but I'm too lazy to write them out here ...
  1. Prepare a set of debuggable binaries (i.e. libraries and executables).
    This involves specifying a flag to the compiler (and linker) [-g] upon compiling and linking. Otherwise the compiler/linker may not store the valuable information which maps the source code to the object code and vice versa.
    gcc -o foo.exe -O0 -g foo.c
    
  2. You should disable optimization when compiling [-O0 is used for many compilers]!
    Note that this may have apparently subtle effects:
    • Bugs may come and go due to internal flaws of the compiler when changing the optimization level.
    • Bugs may also be spurious due to common optimization procedures:
      e.g. if your code contains a statement which would trigger an error at run time, it might just disappear when optimization is turned on (e.g. by removal of "unused" code). The sample below is a candidate for that phenomenon:
      #include <math.h>

      void foo(void) {
        int dummy; /* The var "dummy" is only referenced once in the code,
                      namely in the assignment below */

        /* try to calculate the square root of (-1): */
        dummy = sqrt(-1); /* should trigger an error due to illegal argument */
      }
      
    • gcc has a shortcoming in its design: completely disabling optimization (-O0) limits its ability to detect potential problems in the source, so some warnings will not be issued. Read the gcc manual!
  3. Run the debugger on the executable [gdb foo.exe]
  4. Optionally set a breakpoint on exit() [b exit]
    The debugger will stop execution when the location assigned to a breakpoint (a line in the source code, a function, etc.) is reached. Then the command prompt is presented again.
    See below for more information about exit(). Stopping there is not always necessary, but it is required when looking for X protocol errors.
  5. Actually run the program [r].
    Take care here if debugging X protocol errors.
    In case you need to specify command line arguments, you can do this before running the code, either by an explicit call [set args bar] or "implicitly" [r bar].
  6. Now try to reproduce the bug in the simplest fashion, i.e. minimize the number of "actions" like keystrokes and mouse clicks, use the simplest input data file, etc.
    Remember exactly (even better write down) what you do here!
  7. The debugger commandline will tell you technically what has happened (e.g. a segmentation fault, SIGSEGV, has been caught). Now the important thing is to locate it in detail.
    Produce a stack trace [where]. This will show the location within the program where the crash happened.
    Attach a copy of this output to your bug report!
  8. Given the crash has happened in a function for which you have the source code (I assume so) you can now list the source code, i.e. the "bogus" line [l].

The next step would be examining further details (stack itself, arguments to current function, invalid data, etc.).

Advanced Issues

Though I can not explain further procedures in this recipe-like style, I want to give some "well-known" hints & ideas and address some standard tasks to be done.

Resolving X Errors

An X protocol error happens when an improper request has been sent to the X server. It can happen when programming Xlib directly, or if the toolkit being used issues such a broken request. So if you're not actually programming at a low level or working on an X11-based GUI library, this error won't be your fault. This section is also useful if you're a newcomer to debugging X11 programs in general; it is not limited to X protocol errors.

Within the debugging procedure outlined above you should set a breakpoint on exit(). Check whether the application uses the X Intrinsics (Xt) library (e.g. run ldd foo.exe in a shell, outside gdb). If it is listed there, run the application with the -sync option [r -sync]. If not (pure Xlib), set the global variable _Xdebug from within the debugger or even within the source code near the beginning of main().
Note that this may also change the "location" and/or "appearance" of the bug or even cause it to disappear!

Alternatively one may trigger on the interfaces which the X11 libraries call if there's a problem which they can detect: you may try setting breakpoints on the Xlib calls _XDefaultError(), _XError(), _XIOError() and the Intrinsics library calls XtError(), XtWarning().
If things go wrong in system/libc calls made from within those libraries, these interfaces won't be called, so you have to rely on your libc only. If your breakpoint on exit() doesn't help when an X11 application crashes, check out the alternatives.

An important issue while debugging X applications is that one can very easily lock up the display so that mouse and keyboard no longer accept input. To avoid these issues there are some workarounds:

Further details on debugging X11 apps can be found at these valuable writings:

Well-known Tricks

Everyone should know those tricks ...

Compile Time Problems

You may even have trouble getting some code compiled. In this section I will again list a couple of ideas that have helped me at least once to get things resolved. Some of them may depend on the compiler being used, however.

Portability Issues

Nowadays portability is very important. Writing clean code saves you a lot of time and also increases the chance of getting it built easily on 64-bit machines as well.
So in case you run across problems within an application, it might be that the source code is not totally broken, but has just been written & tested in a single environment. From this point of view, portability is crucial for reducing the amount of useless debugging.

In the following I briefly mention some famous portability issues.

Language Level

"Bitness"

Bug Reporting

If you're going to write a proper bug report you need to consider a couple of things:
First, tell exactly which version of the code you are using: give the exact version of the distribution or the CVS checkout date. You should specify all libraries which are involved. Run ldd foo to see all shared libraries linked to the executable. (Note that ldd is not a standard tool; it may have a different name on other platforms or may not exist on your installation at all!)
If the error happens while compiling/linking always give the full command line, perhaps even the complete output of some "make" command.
In addition you need to fully specify your system. "uname -a" should give you the details about your operating system as well as the basic hardware (CPU architecture).

Happy debugging!


Last modified on 20020102