Discussion:
print_backtrace not printing full stack
Mike Mason
2006-08-10 19:35:10 UTC
Permalink
Is there some trick to getting print_backtrace to work? The following
script:

probe syscall.open.return {
if (isinstr(retstr, "ENOENT")) {
printf("%s: %s (%s) = ", execname(), name, argstr)
printf("%s\n", retstr)
print_backtrace();
}
}

yields this output:

ls: open () = -2 (ENOENT)
trace for 7416 (ls)
0xffffffff8011c7c5 : kretprobe_trampoline+0x1/0x4 []

I get one line from the stack and that's it. I assume I should see the
full stack.

Mike
Frank Ch. Eigler
2006-08-10 20:45:22 UTC
Permalink
Post by Mike Mason
Is there some trick to getting print_backtrace to work? The following
probe syscall.open.return { [...]
print_backtrace();
I get one line from the stack and that's it. I assume I should see the
full stack.
If by "full stack" you mean to include the user-space program that
performed open(2), then unfortunately we can't do that yet. We also
know that kretprobes did or does have problems with corrupting
backtraces for even plain nested kernel-side probes. To some extent,
one might work around this by storing the backtrace at the point of
syscall entry thusly:

global bt
probe syscall.open { bt[tid()] = backtrace() }
probe syscall.open.return { if (...) { print_stack (bt[tid()]) }
/* delete bt[tid()] */ }

- FChE
Mike Mason
2006-08-10 21:35:27 UTC
Permalink
Hmm... I didn't think about the user portion of the stack being missing. Perhaps open() isn't a good example. I tried probing the entry to open()and printing the backtrace. Saw the same results. Then I tried a few other probe points, including schedule() in sched.c. No matter where I probe, I only get one line of the stack.

Mike
Post by Frank Ch. Eigler
Post by Mike Mason
Is there some trick to getting print_backtrace to work? The following
probe syscall.open.return { [...]
print_backtrace();
I get one line from the stack and that's it. I assume I should see the
full stack.
If by "full stack" you mean to include the user-space program that
performed open(2), then unfortunately we can't do that yet. We also
know that kretprobes did or does have problems with corrupting
backtraces for even plain nested kernel-side probes. To some extent,
one might work around this by storing the backtrace at the point of
global bt
probe syscall.open { bt[tid()] = backtrace() }
probe syscall.open.return { if (...) { print_stack (bt[tid()]) }
/* delete bt[tid()] */ }
- FChE
Jim Keniston
2006-08-11 21:43:57 UTC
Permalink
Post by Mike Mason
Hmm... I didn't think about the user portion of the stack being missing.
Perhaps open() isn't a good example. I tried probing the entry to open()
and printing the backtrace. Saw the same results.
You're probing sys_open(), which is dispatched directly from the system
call table, at least on some architectures. On i386, I'd expect to see
an entry for sys_open and one for the code that called sys_open
(sysenter_past_esp in entry.S), but that's all. sysenter_past_esp is a
label in what is essentially the entry point into the kernel for system
calls.

Which architecture are you testing?
Post by Mike Mason
Then I tried a few
other probe points, including schedule() in sched.c.
No matter where I
probe, I only get one line of the stack.
That sounds like a bug. Is that one line in the probed function, in its
caller, or what? (I'm assuming you don't see the kretprobe_trampoline
line if it's not a kretprobe.)

BTW, on powerpc, at least, print_backtrace() reports the bulk of the
stack trace as one long line, apparently no matter how many functions
are reported. Are you seeing just one function reported, or multiple
functions on one line?
Post by Mike Mason
Mike
Jim
Mike Mason
2006-08-14 21:16:35 UTC
Permalink
Post by Jim Keniston
Post by Mike Mason
Hmm... I didn't think about the user portion of the stack being missing.
Perhaps open() isn't a good example. I tried probing the entry to open()
and printing the backtrace. Saw the same results.
You're probing sys_open(), which is dispatched directly from the system
call table, at least on some architectures. On i386, I'd expect to see
an entry for sys_open and one for the code that called sys_open
(sysenter_past_esp in entry.S), but that's all. sysenter_past_esp is a
label in what is essentially the entry point into the kernel for system
calls.
Which architecture are you testing?
x86_64. I just tried the same script on i386 and print_backtrace() works as
expected. The problem appears to be x86-64 specific. I'll file a bug.
Post by Jim Keniston
Post by Mike Mason
Then I tried a few
other probe points, including schedule() in sched.c.
No matter where I
probe, I only get one line of the stack.
That sounds like a bug. Is that one line in the probed function, in its
caller, or what? (I'm assuming you don't see the kretprobe_trampoline
line if it's not a kretprobe.)
It depends. If I probe sys_open(), I see sys_open() only. If I probe
schedule() in sched.c, I see a line for __sched_text_start and that's all.
You're right, I only see kretprobe_trampoline for return probes.
Post by Jim Keniston
BTW, on powerpc, at least, print_backtrace() reports the bulk of the
stack trace as one long line, apparently no matter how many functions
are reported. Are you seeing just one function reported, or multiple
functions on one line?
On powerpc, I'm seeing as you describe. All the functions appear to be
printed, but most are on one long line.

Mike
Post by Jim Keniston
Post by Mike Mason
Mike
Jim
Jim Keniston
2006-08-10 22:23:00 UTC
Permalink
Post by Frank Ch. Eigler
We also
know that kretprobes did or does have problems with corrupting
backtraces for even plain nested kernel-side probes.
Still does. You'll see "kretprobe_trampoline" in the stack trace
wherever kretprobes has hijacked the return address of a kretprobed
function. (The stack trace is not otherwise corrupted, to my
knowledge.)

It is theoretically possible, though a bit messy, for kprobes to dig out
the real return address corresponding to the Nth appearance of
kretprobe_trampoline in your stack trace. Since SystemTap has its own
implementation of backtrace(), it could be made to do that. But then
we'd run up against the Catch-22 of
a) There's no in-kernel use of this kretprobes feature ("and by the way,
SystemTap is $#%&!").
b) Don't $#%&! up the in-kernel stack-trace implementations with this
$#%&!. Fix kretprobes!

Jim
Mike Mason
2006-08-10 22:31:50 UTC
Permalink
Post by Jim Keniston
Post by Frank Ch. Eigler
We also
know that kretprobes did or does have problems with corrupting
backtraces for even plain nested kernel-side probes.
Still does. You'll see "kretprobe_trampoline" in the stack trace
wherever kretprobes has hijacked the return address of a kretprobed
function. (The stack trace is not otherwise corrupted, to my
knowledge.)
I can live with kretprobe_trampoline in the stack trace of a return probe.
The problem is that's all I'm seeing :-)

Mike
Roland McGrath
2006-08-10 22:42:50 UTC
Permalink
The new backtrace support enabled by CONFIG_UNWIND uses dwarf unwinding for
proper backtraces (and ia64 has its own similar reality). So perhaps all
that's needed is to define some DWARF CFI for the kretprobe trampoline code
that will recover the real frame.
Loading...