使用crash工具分析linux 的crash dump文件的常用命令

December 09, 2010 | 6 Minute Read

Crash 工具主页
http://people.redhat.com/anderson/

(1) “ crash + dump 文件 + 调试版本内核“命令,启动crash
例如:
crash vmcore vmlinux-2.6.18-92.EL.fulldebug

注意这个内核应该和你dump的机器的内核相对应,这里还可以指定 system.map的符号文件等。


(2) bt 命令查看调用堆栈,估计的最有效的一个命令的了。

bt + pid 列出相应的进程堆栈

bt -f 会列出所有堆栈里面数据  

比如

crash> bt -f 1854
PID: 1854   TASK: ffff810834d770c0  CPU: 3   COMMAND: "oracle"
 #0 [ffff81011d6bbbb0] crash_kexec at ffffffff800aa977
    ffff81011d6bbbb8: ffff81085d100000 ffff81085d0f0000 
    ffff81011d6bbbc8: 0000000000000004 0000000000000004 
    ffff81011d6bbbd8: 0000000000000040 ffffffff881922dd 
    ffff81011d6bbbe8: ffff81085d838000 0000000000000009 
    ffff81011d6bbbf8: ffff81085e6a11c0 0000000000000000 
    ffff81011d6bbc08: 0000000000000000 0000000000000000 
    ffff81011d6bbc18: 00000000000003f9 0000000000000000 
    ffff81011d6bbc28: 0000000000000000 ffffffff80198c24 
    ffff81011d6bbc38: ffffffff800aa956 0000000000000010 
    ffff81011d6bbc48: 0000000000000046 ffff81011d6bbbb8 
    ffff81011d6bbc58: 0000000000000018 ffffffff80080605 
    ffff81011d6bbc68: ffffffff881922dd ffffffff8008f44e 
 #1 [ffff81011d6bbc70] panic at ffffffff8008f44e
    ffff81011d6bbc78: 0000003000000020 ffff81011d6bbd68 
    ffff81011d6bbc88: ffff81011d6bbc98 ffff81011e82eac0 
    ffff81011d6bbc98: 0000000000000007 ffff81085d1b3350 
    ffff81011d6bbca8: 00000000000040c8 0000000000004400 
    ffff81011d6bbcb8: ffff8104ab8e1e80 0000000000000004 
    ffff81011d6bbcc8: 0000000000000000 ffff81011e82eac0 
    ffff81011d6bbcd8: 0000000000000002 0000000000000000 
    ffff81011d6bbce8: 0000000000000004 ffffffff8818e317 
    ffff81011d6bbcf8: ffff8100351e9380 ffff810000000002 
    ffff81011d6bbd08: ffff81085d1b3350 0000022000000046 
    ffff81011d6bbd18: ffff81011e87ae60 0000000000004400 
    ffff81011d6bbd28: ffff81011dc6f790 ffff81011dc6f790 
    ffff81011d6bbd38: 0000000000000000 ffff81011dc6f790 
    ffff81011d6bbd48: ffff81011dc6f790 ffff8100351e9380 
    ffff81011d6bbd58: 0000000000000040 ffffffff8818e734 
 #2 [ffff81011d6bbd60] myfunction1 at ffffffff8818e734    ///ffff81011d6bbd60 这里是call指令压入的返回地址
    ffff81011d6bbd68: ffff81085d1b3350 ffff81085d322d00 
    ffff81011d6bbd78: ffff81085d1b3350 00000000000020c8 
    ffff81011d6bbd88: ffff81085d099400 ffffffff88186c4e   //ffff81011d6bbd88 是函数栈的起始位置
 #3 [ffff81011d6bbd90] myfunction2 at ffffffff88186c4e    // 其他栈可以分析,可以结合函数汇编,看看push指令和rsp等寄存器的修改等。
    ffff81011d6bbd98: 0000000000000000 0000000000000000 
    ffff81011d6bbda8: ffff81085d0f0000 ffff81085d099400 
    ffff81011d6bbdb8: 0000000000000000 ffffffff88187845 
 #4 [ffff81011d6bbdc0] myfunction3 ffffffff88187845


x86_64函数参数,大多都通过寄存器传递,寄存器的使用顺序都是固定的,可以自己写个简单函数测试一下,我以前一篇文章也有提到。局部变量一般通过 
mov 0x20(%rsp), rax  这样的居于rsp寄存器栈顶偏移来定位。




(3) mod 命令用来加载调试符号,有时一些结构或者函数的符号信息不在调试版本内核里面,需要用gcc的 -g选项编译自己的模块,然后用mod命令加载里面的调试信息。这样 sym 和whatis 命令就能正确解释我们自己模块里面自定义的结构等信息。



crash> mod
     MODULE       NAME                   SIZE  OBJECT FILE
ffffffff88009e80  ehci_hcd              65741  (not loaded)  [CONFIG_KALLSYMS]
ffffffff88019280  ohci_hcd              54493  (not loaded)  [CONFIG_KALLSYMS]
ffffffff88028e00  uhci_hcd              57433  (not loaded)  [CONFIG_KALLSYMS]
ffffffff88041c00  jbd                   93873  (not loaded)  [CONFIG_KALLSYMS]
ffffffff8806bb00  ext3                 167249  (not loaded)  [CONFIG_KALLSYMS]
ffffffff880a8c00  scsi_mod             188665  (not loaded)  [CONFIG_KALLSYMS]
ffffffff880b8980  sd_mod                56257  (not loaded)  [CONFIG_KALLSYMS]
ffffffff881bab80  pcspkr                36289  (not loaded)  [CONFIG_KALLSYMS]
ffffffff881ca780  edac_mc               60193  (not loaded)  [CONFIG_KALLSYMS]
ffffffff881dd200  shpchp                70765  (not loaded)  [CONFIG_KALLSYMS]
ffffffff881eac00  serio_raw             40517  (not loaded)  [CONFIG_KALLSYMS]
ffffffff881f6280  i5000_edac            42177  (not loaded)  [CONFIG_KALLSYMS]
ffffffff88208f00  sg                    69993  (not loaded)  [CONFIG_KALLSYMS]
ffffffff8821b600  cdrom                 68713  (not loaded)  [CONFIG_KALLSYMS]
ffffffff88229400  sr_mod                50789  (not loaded)  [CONFIG_KALLSYMS]
ffffffff8823cb00  parport               73165  (not loaded)  [CONFIG_KALLSYMS]
ffffffff88249080  lp                    47121  (not loaded)  [CONFIG_KALLSYMS]
ffffffff8825a000  parport_pc            62313  (not loaded)  [CONFIG_KALLSYMS]
ffffffff88265500  ac                    38729  (not loaded)  [CONFIG_KALLSYMS]
ffffffff88270a80  acpi_memhotplug       40133  (not loaded)  [CONFIG_KALLSYMS]
ffffffff8827e480  asus_acpi             50917  (not loaded)  [CONFIG_KALLSYMS]
ffffffff8828a900  battery               43849  (not loaded)  [CONFIG_KALLSYMS]
ffffffff88295c00  button                40545  (not loaded)  [CONFIG_KALLSYMS]
ffffffff882a4900  i2c_core              56129  (not loaded)  [CONFIG_KALLSYMS]
ffffffff882af480  i2c_ec                38593  (not loaded)  [CONFIG_KALLSYMS]
ffffffff882ba980  backlight             39873  (not loaded)  [CONFIG_KALLSYMS]
ffffffff882c8080  sbs                   49921  (not loaded)  [CONFIG_KALLSYMS]
ffffffff882d6d80  video                 53197  (not loaded)  [CONFIG_KALLSYMS]
ffffffff882efa80  dm_mod                99737  (not loaded)  [CONFIG_KALLSYMS]
ffffffff882fec80  dm_multipath          52945  (not loaded)  [CONFIG_KALLSYMS]
ffffffff8830ea80  dm_mirror             60617  (not loaded)  [CONFIG_KALLSYMS]
ffffffff8831dd80  autofs4               57289  (not loaded)  [CONFIG_KALLSYMS]
ffffffff88329280  crypto_api            42177  (not loaded)  [CONFIG_KALLSYMS]
ffffffff88335900  xfrm_nalgo            43845  (not loaded)  [CONFIG_KALLSYMS]
ffffffff88340000  hangcheck_timer       37465  (not loaded)  [CONFIG_KALLSYMS]
crash> 
crash> 
crash> mod -d test_module
crash> mod -s test_module  test_module.ko
     MODULE       NAME                   SIZE  OBJECT FILE
ffffffff88198180  test_module         165128  test_module.ko 




(4) struct 命令把某个地址当成一个结构解析出来,很好用的一个命令,当然前提是你的结构的symbol符号信息已经用mod命令加载或者已经存在。
还有你要知道结构的地址 ^_^ 这个可以通过panic的报告还有分析函数堆栈得到。类似的命令还有p等

crash> struct skb_shared_info  ffff8104ab8e1e80
struct skb_shared_info {
  dataref = {
    counter = 1
  }, 
  nr_frags = 4, 
  gso_size = 0, 
  gso_segs = 0, 
  gso_type = 0, 
  ip6_frag_id = 0, 
  frag_list = 0x0, 
  frags = {
      page = 0xffff810103eb5a00, 
      page_offset = 0, 
      size = 4096
    }, {
      page = 0xffff810103eb5a38, 
      page_offset = 0, 
      size = 616
    }, {
      page = 0xffff81011b9daca8, 
      page_offset = 0, 
      size = 4096
    }, 


http://people.redhat.com/anderson/crash_whitepaper/help_pages/mod.html
的说明
NAME
  mod - module information and loading of symbols and debugging data

SYNOPSIS
  mod [ -s module [objfile] | -d module | -S [directory] | -D | -r | -o ] 

DESCRIPTION
  With no arguments, this command displays basic information of the currently
  installed modules, consisting of the module address, name, size, the
  object file name (if known), and whether the module was compiled with
  CONFIG_KALLSYMS.
 
  The arguments are concerned with with the loading or deleting of symbolic
  and debugging data from a module's object file.  A modules's object file
  always contains symbolic data (symbol names and addresses), but contains
  debugging data only if the module was compiled with the -g CFLAG.  In
  addition, the module may have compiled with CONFIG_KALLSYMS, which means
  that the module's symbolic data will have been loaded into the kernel's
  address space when it was installed.  If the module was not compiled with
  CONFIG_KALLSYMS, then only the module's exported symbols will be loaded
  into the kernel's address space.  Therefore, for the purpose of this
  command, it should noted that a kernel module may have been compiled in
  one of following manners:

  1. If the module was built without CONFIG_KALLSYMS and without the -g CFLAG,
     then the loading of the module's additional non-exported symbols can
     be accomplished with this command.
  2. If the module was built with CONFIG_KALLSYMS, but without the -g CFLAG,
     then there is no benefit in loading the symbols from the module object
     file, because all of the module's symbols will have been loaded into the
     kernel's address space when it was installed.
  3. If the module was built with CONFIG_KALLSYMS and with the the -g CFLAG,
     then the loading of the module's debugging data can be accomplished
     with this command.
  4. If the module was built without CONFIG_KALLSYMS but with the -g CFLAG,
     then the loading of the both module's symbolic and debugging data can
     be accomplished with this command.
 
  -s module [objfile]  Loads symbolic and debugging data from the object file
                       for the module specified.  If no objfile argument is
                       appended, a search will be made for an object file
                       consisting of the module name with a .o or .ko suffix,
                       starting at the /lib/modules/<release> directory on
                       the host system.  If an objfile argument is appended,
                       then that file will be used.
            -d module  Deletes the symbolic and debugging data of the module
                       specified.
       -S [directory]  Load symbolic and debugging data from the object file
                       for all loaded modules.  For each module, a search
                       will be made for an object file consisting of the
                       module name with a .o or.ko suffix, starting at the
                       /lib/modules/<release> directory of the host system.
                       If a directory argument is appended, then the search
                       will be restricted to that directory.
                   -D  Deletes the symbolic and debugging data of all modules.
                   -r  Reinitialize module data. All currently-loaded symbolic
                       and debugging data will be deleted, and the installed
                       module list will be updated (live system only).
                   -o  Load module symbols with old mechanism.
 
  After symbolic and debugging data have been loaded, backtraces and text
  disassembly will be displayed appropriately.  Depending upon the processor
  architecture, data may also printed symbolically with the "p" command;
  at a minimum, the "rd" command may be used with module data symbols.
 
  If crash can recognize that the set of modules has changed while running a
  session on a live kernel, the module data will be reinitialized the next
  time this command is run; the -r option forces the reinitialization.

====================================================




(5) 其他命令

dmesg  查看内核崩溃时的  log 
rd  读相应的内存等。
可以用help命令查看帮助
dis -l 可以反汇编出错的地址,并且列出源代码的行数


crash32> help dis

NAME
  dis - disassemble

SYNOPSIS
  dis [-r][-l][-u][-b [num]] [address | symbol | (expression)] [count]

DESCRIPTION
  This command disassembles source code instructions starting (or ending) at
  a text address that may be expressed by value, symbol or expression:

            -r  (reverse) displays all instructions from the start of the 
                routine up to and including the designated address.
            -l  displays source code line number data in addition to the 
                disassembly output.
            -u  address is a user virtual address in the current context;
                otherwise the address is assumed to be a kernel virtual address.
                If this option is used, then -r and -l are ignored.
      -b [num]  modify the pre-calculated number of encoded bytes to skip after
                a kernel BUG ("ud2a") instruction; with no argument, displays
                the current number of bytes being skipped. (x86 and x86_64 only)
       address  starting hexadecimal text address.
        symbol  symbol of starting text address.  On ppc64, the symbol

crash32> dis -l c012c033
/BUILD/linux-2.4.9-e.71/polled_io.c: 177
0xc012c033 <polled_io_poll+83>: test   %eax,0xc(%ebx)