A technique for running Linux programs on Windows

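Microsoft announced Bash on Windows in 2016, a technology that lets Linux programs run on Windows.
I believe plenty of articles have already explained how Bash on Windows works,
so this article instead explains how to implement a simple native Linux program loader yourself.
The loader is implemented in user space. Its principle is not quite the same as that of
Bash on Windows and is closer to Wine on Linux.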

The complete code of the example program is on GitHub at
https://github.com/303248153/HelloElfLoader

A first look at the ELF format

First, let us understand what a native Linux program is.
The following description is taken from Wikipedia:

In computing, the Executable and Linkable Format (ELF, formerly named Extensible Linking Format), is a common standard file format for executable files, object code, shared libraries, and core dumps. First published in the specification for the application binary interface (ABI) of the Unix operating system version named System V Release 4 (SVR4),[2] and later in the Tool Interface Standard,[1] it was quickly accepted among different vendors of Unix systems. In 1999, it was chosen as the standard binary file format for Unix and Unix-like systems on x86 processors by the 86open project.

By design, ELF is flexible, extensible, and cross-platform, not bound to any given central processing unit (CPU) or instruction set architecture. This has allowed it to be adopted by many different operating systems on many different hardware platforms.

Linux uses the ELF format for its executable files,
while Windows uses the PE format,
which is the format of the exe files we use every day.
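The two formats can be told apart by their first bytes: an ELF file starts with 0x7f 'E' 'L' 'F', and a PE file starts with the DOS signature 'M' 'Z'. The following is a minimal sketch of that check. The names `exe_format` and `detect_format` are hypothetical and are not taken from HelloElfLoader.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical enum for this sketch; not part of any real API. */
enum exe_format { FORMAT_UNKNOWN, FORMAT_ELF, FORMAT_PE };

/* Guess the executable format from the first bytes of a file.
 * ELF files start with 0x7f 'E' 'L' 'F'; PE files start with 'M' 'Z'. */
enum exe_format detect_format(const unsigned char *buf, size_t len) {
    assert(buf != NULL || len == 0);
    /* "\x7f" and "ELF" are separate literals so that 'E' is not
     * parsed as part of the hexadecimal escape. */
    if (len >= 4 && memcmp(buf, "\x7f" "ELF", 4) == 0)
        return FORMAT_ELF;
    if (len >= 2 && buf[0] == 'M' && buf[1] == 'Z')
        return FORMAT_PE;
    return FORMAT_UNKNOWN;
}
```

A loader would typically run a check like this on the first few bytes it reads and refuse anything that is not ELF.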

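The structure of the ELF format is as follows:

Image 1

It can roughly be divided into these parts:

  • ELF header: located at the very beginning of the file, stores the type, version and similar information
  • Program headers: used by the interpreter when the program runs
  • Section headers: used by the linker when the program is compiled; they do not need to be read at run time
  • Section contents: each section serves a different purpose
    • .text: the code section, holds the main program code
    • .rodata: holds read-only data, such as strings (const char*)
    • .data: holds readable and writable data, such as global variables
    • and various other sections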
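To make the role of the ELF header concrete, here is a minimal sketch of reading it from a raw file buffer. The field names follow the ELF64 specification. The struct is redefined by hand on the assumption that `<elf.h>` is not available (as on Windows), and `Elf64Header` and `read_elf64_header` are hypothetical names chosen for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The 64-byte ELF64 file header. */
typedef struct {
    unsigned char e_ident[16]; /* magic, class, data encoding, version, ... */
    uint16_t e_type;           /* file type, e.g. executable */
    uint16_t e_machine;        /* target architecture */
    uint32_t e_version;
    uint64_t e_entry;          /* entry point address */
    uint64_t e_phoff;          /* file offset of the program headers */
    uint64_t e_shoff;          /* file offset of the section headers */
    uint32_t e_flags;
    uint16_t e_ehsize;         /* size of this header */
    uint16_t e_phentsize;      /* size of one program header */
    uint16_t e_phnum;          /* number of program headers */
    uint16_t e_shentsize;      /* size of one section header */
    uint16_t e_shnum;          /* number of section headers */
    uint16_t e_shstrndx;       /* index of the section name string table */
} Elf64Header;

/* Copy the header out of a raw file buffer and validate it.
 * Returns 0 on success, -1 if the buffer is not a little-endian ELF64 file.
 * Assumes the host is little-endian as well (true for x86-64). */
int read_elf64_header(const unsigned char *buf, size_t len, Elf64Header *out) {
    assert(out != NULL);
    if (buf == NULL || len < sizeof(Elf64Header)) return -1;
    if (memcmp(buf, "\x7f" "ELF", 4) != 0) return -1;
    if (buf[4] != 2) return -1; /* EI_CLASS: 2 means 64-bit */
    if (buf[5] != 1) return -1; /* EI_DATA: 1 means little endian */
    memcpy(out, buf, sizeof *out);
    return 0;
}
```

With the header in hand, a loader can find the program headers at `e_phoff` (there are `e_phnum` of them, each `e_phentsize` bytes) and later jump to `e_entry`.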

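Let us take a look at what a real Linux executable looks like.
The build environment used below is Ubuntu 16.04 x64 + gcc 5.4.0;
a different build environment may produce different results.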

First create hello.c and write the following code into it:

#include <stdio.h>

int max(int x, int y) {
    return x > y ? x : y;
}

int main() {
    printf("max is %d\n", max(123, 321));
    printf("test many arguments %d %d %d %s %s %s %s %s %s\n", 1, 2, 3, "a", "b", "c", "d", "e", "f");
    return 100;
}

Then compile the code with gcc:

gcc hello.c

Once compilation finishes, you will see a new file a.out next to hello.c.
This is the Linux executable, and you can now run it on Linux:

./a.out

You will see the following output:

max is 321
test many arguments 1 2 3 a b c d e f

Let us see what a.out contains. ELF files can be parsed with the readelf command:

readelf -a ./a.out

It prints the following information:

ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x400430
  Start of program headers:          64 (bytes into file)
  Start of section headers:          6648 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         9
  Size of section headers:           64 (bytes)
  Number of section headers:         31
  Section header string table index: 28

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .interp           PROGBITS         0000000000400238  00000238
       000000000000001c  0000000000000000   A       0     0     1
  [ 2] .note.ABI-tag     NOTE             0000000000400254  00000254
       0000000000000020  0000000000000000   A       0     0     4
  [ 3] .note.gnu.build-i NOTE             0000000000400274  00000274
       0000000000000024  0000000000000000   A       0     0     4
  [ 4] .gnu.hash         GNU_HASH         0000000000400298  00000298
       000000000000001c  0000000000000000   A       5     0     8
  [ 5] .dynsym           DYNSYM           00000000004002b8  000002b8
       0000000000000060  0000000000000018   A       6     1     8
  [ 6] .dynstr           STRTAB           0000000000400318  00000318
       000000000000003f  0000000000000000   A       0     0     1
  [ 7] .gnu.version      VERSYM           0000000000400358  00000358
       0000000000000008  0000000000000002   A       5     0     2
  [ 8] .gnu.version_r    VERNEED          0000000000400360  00000360
       0000000000000020  0000000000000000   A       6     1     8
  [ 9] .rela.dyn         RELA             0000000000400380  00000380
       0000000000000018  0000000000000018   A       5     0     8
  [10] .rela.plt         RELA             0000000000400398  00000398
       0000000000000030  0000000000000018  AI       5    24     8
  [11] .init             PROGBITS         00000000004003c8  000003c8
       000000000000001a  0000000000000000  AX       0     0     4
  [12] .plt              PROGBITS         00000000004003f0  000003f0
       0000000000000030  0000000000000010  AX       0     0     16
  [13] .plt.got          PROGBITS         0000000000400420  00000420
       0000000000000008  0000000000000000  AX       0     0     8
  [14] .text             PROGBITS         0000000000400430  00000430
       00000000000001f2  0000000000000000  AX       0     0     16
  [15] .fini             PROGBITS         0000000000400624  00000624
       0000000000000009  0000000000000000  AX       0     0     4
  [16] .rodata           PROGBITS         0000000000400630  00000630
       0000000000000050  0000000000000000   A       0     0     8
  [17] .eh_frame_hdr     PROGBITS         0000000000400680  00000680
       000000000000003c  0000000000000000   A       0     0     4
  [18] .eh_frame         PROGBITS         00000000004006c0  000006c0
       0000000000000114  0000000000000000   A       0     0     8
  [19] .init_array       INIT_ARRAY       0000000000600e10  00000e10
       0000000000000008  0000000000000000  WA       0     0     8
  [20] .fini_array       FINI_ARRAY       0000000000600e18  00000e18
       0000000000000008  0000000000000000  WA       0     0     8
  [21] .jcr              PROGBITS         0000000000600e20  00000e20
       0000000000000008  0000000000000000  WA       0     0     8
  [22] .dynamic          DYNAMIC          0000000000600e28  00000e28
       00000000000001d0  0000000000000010  WA       6     0     8
  [23] .got              PROGBITS         0000000000600ff8  00000ff8
       0000000000000008  0000000000000008  WA       0     0     8
  [24] .got.plt          PROGBITS         0000000000601000  00001000
       0000000000000028  0000000000000008  WA       0     0     8
  [25] .data             PROGBITS         0000000000601028  00001028
       0000000000000010  0000000000000000  WA       0     0     8
  [26] .bss              NOBITS           0000000000601038  00001038
       0000000000000008  0000000000000000  WA       0     0     1
  [27] .comment          PROGBITS         0000000000000000  00001038
       0000000000000034  0000000000000001  MS       0     0     1
  [28] .shstrtab         STRTAB           0000000000000000  000018ea
       000000000000010c  0000000000000000           0     0     1
  [29] .symtab           SYMTAB           0000000000000000  00001070
       0000000000000660  0000000000000018          30    47     8
  [30] .strtab           STRTAB           0000000000000000  000016d0
       000000000000021a  0000000000000000           0     0     1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), l (large)
  I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
  O (extra OS processing required) o (OS specific), p (processor specific)

There are no section groups in this file.

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000000040 0x0000000000400040 0x0000000000400040
                 0x00000000000001f8 0x00000000000001f8  R E    8
  INTERP         0x0000000000000238 0x0000000000400238 0x0000000000400238
                 0x000000000000001c 0x000000000000001c  R      1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                 0x00000000000007d4 0x00000000000007d4  R E    200000
  LOAD           0x0000000000000e10 0x0000000000600e10 0x0000000000600e10
                 0x0000000000000228 0x0000000000000230  RW     200000
  DYNAMIC        0x0000000000000e28 0x0000000000600e28 0x0000000000600e28
                 0x00000000000001d0 0x00000000000001d0  RW     8
  NOTE           0x0000000000000254 0x0000000000400254 0x0000000000400254
                 0x0000000000000044 0x0000000000000044  R      4
  GNU_EH_FRAME   0x0000000000000680 0x0000000000400680 0x0000000000400680
                 0x000000000000003c 0x000000000000003c  R      4
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     10
  GNU_RELRO      0x0000000000000e10 0x0000000000600e10 0x0000000000600e10
                 0x00000000000001f0 0x00000000000001f0  R      1

 Section to Segment mapping:
  Segment Sections...
   00     
   01     .interp 
   02     .interp .note.ABI-tag .note.gnu.build-id .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt .init .plt .plt.got .text .fini .rodata .eh_frame_hdr .eh_frame 
   03     .init_array .fini_array .jcr .dynamic .got .got.plt .data .bss 
   04     .dynamic 
   05     .note.ABI-tag .note.gnu.build-id 
   06     .eh_frame_hdr 
   07     
   08     .init_array .fini_array .jcr .dynamic .got 

Dynamic section at offset 0xe28 contains 24 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x000000000000000c (INIT)               0x4003c8
 0x000000000000000d (FINI)               0x400624
 0x0000000000000019 (INIT_ARRAY)         0x600e10
 0x000000000000001b (INIT_ARRAYSZ)       8 (bytes)
 0x000000000000001a (FINI_ARRAY)         0x600e18
 0x000000000000001c (FINI_ARRAYSZ)       8 (bytes)
 0x000000006ffffef5 (GNU_HASH)           0x400298
 0x0000000000000005 (STRTAB)             0x400318
 0x0000000000000006 (SYMTAB)             0x4002b8
 0x000000000000000a (STRSZ)              63 (bytes)
 0x000000000000000b (SYMENT)             24 (bytes)
 0x0000000000000015 (DEBUG)              0x0
 0x0000000000000003 (PLTGOT)             0x601000
 0x0000000000000002 (PLTRELSZ)           48 (bytes)
 0x0000000000000014 (PLTREL)             RELA
 0x0000000000000017 (JMPREL)             0x400398
 0x0000000000000007 (RELA)               0x400380
 0x0000000000000008 (RELASZ)             24 (bytes)
 0x0000000000000009 (RELAENT)            24 (bytes)
 0x000000006ffffffe (VERNEED)            0x400360
 0x000000006fffffff (VERNEEDNUM)         1
 0x000000006ffffff0 (VERSYM)             0x400358
 0x0000000000000000 (NULL)               0x0

Relocation section '.rela.dyn' at offset 0x380 contains 1 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000600ff8  000300000006 R_X86_64_GLOB_DAT 0000000000000000 __gmon_start__ + 0

Relocation section '.rela.plt' at offset 0x398 contains 2 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000601018  000100000007 R_X86_64_JUMP_SLO 0000000000000000 printf@GLIBC_2.2.5 + 0
000000601020  000200000007 R_X86_64_JUMP_SLO 0000000000000000 __libc_start_main@GLIBC_2.2.5 + 0

The decoding of unwind sections for machine type Advanced Micro Devices X86-64 is not currently supported.

Symbol table '.dynsym' contains 4 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND printf@GLIBC_2.2.5 (2)
     2: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND __libc_start_main@GLIBC_2.2.5 (2)
     3: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND __gmon_start__

Symbol table '.symtab' contains 68 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 0000000000400238     0 SECTION LOCAL  DEFAULT    1 
     2: 0000000000400254     0 SECTION LOCAL  DEFAULT    2 
     3: 0000000000400274     0 SECTION LOCAL  DEFAULT    3 
     4: 0000000000400298     0 SECTION LOCAL  DEFAULT    4 
     5: 00000000004002b8     0 SECTION LOCAL  DEFAULT    5 
     6: 0000000000400318     0 SECTION LOCAL  DEFAULT    6 
     7: 0000000000400358     0 SECTION LOCAL  DEFAULT    7 
     8: 0000000000400360     0 SECTION LOCAL  DEFAULT    8 
     9: 0000000000400380     0 SECTION LOCAL  DEFAULT    9 
    10: 0000000000400398     0 SECTION LOCAL  DEFAULT   10 
    11: 00000000004003c8     0 SECTION LOCAL  DEFAULT   11 
    12: 00000000004003f0     0 SECTION LOCAL  DEFAULT   12 
    13: 0000000000400420     0 SECTION LOCAL  DEFAULT   13 
    14: 0000000000400430     0 SECTION LOCAL  DEFAULT   14 
    15: 0000000000400624     0 SECTION LOCAL  DEFAULT   15 
    16: 0000000000400630     0 SECTION LOCAL  DEFAULT   16 
    17: 0000000000400680     0 SECTION LOCAL  DEFAULT   17 
    18: 00000000004006c0     0 SECTION LOCAL  DEFAULT   18 
    19: 0000000000600e10     0 SECTION LOCAL  DEFAULT   19 
    20: 0000000000600e18     0 SECTION LOCAL  DEFAULT   20 
    21: 0000000000600e20     0 SECTION LOCAL  DEFAULT   21 
    22: 0000000000600e28     0 SECTION LOCAL  DEFAULT   22 
    23: 0000000000600ff8     0 SECTION LOCAL  DEFAULT   23 
    24: 0000000000601000     0 SECTION LOCAL  DEFAULT   24 
    25: 0000000000601028     0 SECTION LOCAL  DEFAULT   25 
    26: 0000000000601038     0 SECTION LOCAL  DEFAULT   26 
    27: 0000000000000000     0 SECTION LOCAL  DEFAULT   27 
    28: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS crtstuff.c
    29: 0000000000600e20     0 OBJECT  LOCAL  DEFAULT   21 __JCR_LIST__
    30: 0000000000400460     0 FUNC    LOCAL  DEFAULT   14 deregister_tm_clones
    31: 00000000004004a0     0 FUNC    LOCAL  DEFAULT   14 register_tm_clones
    32: 00000000004004e0     0 FUNC    LOCAL  DEFAULT   14 __do_global_dtors_aux
    33: 0000000000601038     1 OBJECT  LOCAL  DEFAULT   26 completed.7585
    34: 0000000000600e18     0 OBJECT  LOCAL  DEFAULT   20 __do_global_dtors_aux_fin
    35: 0000000000400500     0 FUNC    LOCAL  DEFAULT   14 frame_dummy
    36: 0000000000600e10     0 OBJECT  LOCAL  DEFAULT   19 __frame_dummy_init_array_
    37: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS hello.c
    38: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS crtstuff.c
    39: 00000000004007d0     0 OBJECT  LOCAL  DEFAULT   18 __FRAME_END__
    40: 0000000000600e20     0 OBJECT  LOCAL  DEFAULT   21 __JCR_END__
    41: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS 
    42: 0000000000600e18     0 NOTYPE  LOCAL  DEFAULT   19 __init_array_end
    43: 0000000000600e28     0 OBJECT  LOCAL  DEFAULT   22 _DYNAMIC
    44: 0000000000600e10     0 NOTYPE  LOCAL  DEFAULT   19 __init_array_start
    45: 0000000000400680     0 NOTYPE  LOCAL  DEFAULT   17 __GNU_EH_FRAME_HDR
    46: 0000000000601000     0 OBJECT  LOCAL  DEFAULT   24 _GLOBAL_OFFSET_TABLE_
    47: 0000000000400620     2 FUNC    GLOBAL DEFAULT   14 __libc_csu_fini
    48: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND _ITM_deregisterTMCloneTab
    49: 0000000000601028     0 NOTYPE  WEAK   DEFAULT   25 data_start
    50: 0000000000601038     0 NOTYPE  GLOBAL DEFAULT   25 _edata
    51: 0000000000400624     0 FUNC    GLOBAL DEFAULT   15 _fini
    52: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND printf@@GLIBC_2.2.5
    53: 0000000000400526    22 FUNC    GLOBAL DEFAULT   14 max
    54: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND __libc_start_main@@GLIBC_
    55: 0000000000601028     0 NOTYPE  GLOBAL DEFAULT   25 __data_start
    56: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND __gmon_start__
    57: 0000000000601030     0 OBJECT  GLOBAL HIDDEN    25 __dso_handle
    58: 0000000000400630     4 OBJECT  GLOBAL DEFAULT   16 _IO_stdin_used
    59: 00000000004005b0   101 FUNC    GLOBAL DEFAULT   14 __libc_csu_init
    60: 0000000000601040     0 NOTYPE  GLOBAL DEFAULT   26 _end
    61: 0000000000400430    42 FUNC    GLOBAL DEFAULT   14 _start
    62: 0000000000601038     0 NOTYPE  GLOBAL DEFAULT   26 __bss_start
    63: 000000000040053c   109 FUNC    GLOBAL DEFAULT   14 main
    64: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND _Jv_RegisterClasses
    65: 0000000000601038     0 OBJECT  GLOBAL HIDDEN    25 __TMC_END__
    66: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND _ITM_registerTMCloneTable
    67: 00000000004003c8     0 FUNC    GLOBAL DEFAULT   11 _init

Version symbols section '.gnu.version' contains 4 entries:
 Addr: 0000000000400358  Offset: 0x000358  Link: 5 (.dynsym)
  000:   0 (*local*)       2 (GLIBC_2.2.5)   2 (GLIBC_2.2.5)   0 (*local*)    

Version needs section '.gnu.version_r' contains 1 entries:
 Addr: 0x0000000000400360  Offset: 0x000360  Link: 6 (.dynstr)
  000000: Version: 1  File: libc.so.6  Cnt: 1
  0x0010:   Name: GLIBC_2.2.5  Flags: none  Version: 2

Displaying notes found at file offset 0x00000254 with length 0x00000020:
  Owner                 Data size   Description
  GNU                  0x00000010   NT_GNU_ABI_TAG (ABI version tag)
    OS: Linux, ABI: 2.6.32

Displaying notes found at file offset 0x00000274 with length 0x00000024:
  Owner                 Data size   Description
  GNU                  0x00000014   NT_GNU_BUILD_ID (unique build ID bitstring)
    Build ID: debd3d7912be860a432b5c685a6cff7fd9418528
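
The two LOAD entries in the program headers above describe what has to be placed in memory before the program can run. As a rough sketch of how a loader might work out how much address space to reserve, the function below computes the page-aligned range that covers every LOAD segment. The reduced `Segment` struct and the name `load_span` are hypothetical, and the page size is passed in as an assumption rather than queried from the operating system.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PT_LOAD 1u /* program header type of a loadable segment */

/* Only the fields this sketch needs; a real ELF64 program header also
 * carries the file offset, file size, flags and alignment. */
typedef struct {
    uint32_t type;
    uint64_t vaddr; /* VirtAddr column in the readelf output */
    uint64_t memsz; /* MemSiz column in the readelf output */
} Segment;

/* Compute the page-aligned [start, end) range covering all PT_LOAD segments.
 * Returns 0 on success, -1 if there is no loadable segment. */
int load_span(const Segment *segs, size_t count, uint64_t page,
              uint64_t *start, uint64_t *end) {
    assert(page != 0 && (page & (page - 1)) == 0); /* must be a power of two */
    uint64_t lo = UINT64_MAX, hi = 0;
    for (size_t i = 0; i < count; i++) {
        if (segs[i].type != PT_LOAD) continue;
        uint64_t seg_end = segs[i].vaddr + segs[i].memsz;
        if (segs[i].vaddr < lo) lo = segs[i].vaddr;
        if (seg_end > hi) hi = seg_end;
    }
    if (lo == UINT64_MAX) return -1;
    *start = lo & ~(page - 1);            /* round down to a page boundary */
    *end = (hi + page - 1) & ~(page - 1); /* round up to a page boundary */
    return 0;
}
```

For the a.out shown here and a 4 KiB page size, the segments at 0x400000 (size 0x7d4) and 0x600e10 (size 0x230) give the range 0x400000 to 0x602000. The second segment's MemSiz is 8 bytes larger than its FileSiz, which matches the 8-byte .bss section that occupies memory but no file space.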

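From the information above we can tell that the class of this file is ELF64,
that is, a 64-bit executable, and that it has 9 program headers and 31 section headers.
The purpose of each section is documented online; this article only touches on the following sections:

  • .init: the program's initialization code, used for things like initializing global variables
  • .rela.dyn: the list of variables that need to be relocated
  • .rela.plt: the list of functions that need to be relocated
  • .plt: the code that calls dynamically linked functions
  • .text: the main program code
  • .fini: the program's termination code, used for things like destructing global variables
  • .rodata: read-only data, such as strings (const char*)
  • .data: readable and writable data, such as global variables
  • .dynsym: the symbol table for dynamic linking
  • .dynstr: the symbol name strings for dynamic linking
  • .dynamic: the information needed for dynamic linking, used at run time (no need to access the section headers)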
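
As an illustration of how .dynamic and the relocation sections fit together, here is a minimal sketch that looks up a tag in a .dynamic array and decodes the Info column of a relocation entry. The struct layouts and constant values follow the ELF64 specification. The type and function names (`Elf64Dyn`, `Elf64Rela`, `find_dynamic`, `rela_symbol`, `rela_type`) are chosen for this example and are not taken from HelloElfLoader.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Entry layouts from the ELF64 specification. */
typedef struct { int64_t d_tag; uint64_t d_val; } Elf64Dyn;
typedef struct { uint64_t r_offset; uint64_t r_info; int64_t r_addend; } Elf64Rela;

/* Tag and relocation type values, as shown in the readelf output. */
enum { DT_NULL = 0, DT_STRTAB = 5, DT_SYMTAB = 6, DT_RELA = 7, DT_JMPREL = 23 };
enum { R_X86_64_GLOB_DAT = 6, R_X86_64_JUMP_SLOT = 7 };

/* Scan a DT_NULL-terminated .dynamic array for a tag.
 * Returns 1 and stores the value if found, 0 otherwise. */
int find_dynamic(const Elf64Dyn *dyn, int64_t tag, uint64_t *value) {
    assert(dyn != NULL && value != NULL);
    for (; dyn->d_tag != DT_NULL; dyn++) {
        if (dyn->d_tag == tag) {
            *value = dyn->d_val;
            return 1;
        }
    }
    return 0;
}

/* r_info packs the .dynsym index in the high 32 bits
 * and the relocation type in the low 32 bits. */
uint32_t rela_symbol(const Elf64Rela *r) { return (uint32_t)(r->r_info >> 32); }
uint32_t rela_type(const Elf64Rela *r) { return (uint32_t)(r->r_info & 0xffffffffu); }
```

Applied to the output above, the .rela.plt entry with Info 000100000007 decodes to symbol index 1 (printf in .dynsym) and type 7 (R_X86_64_JUMP_SLOT), and the JMPREL tag in .dynamic gives 0x400398, the address of .rela.plt. This is why the section headers are not needed at run time: everything can be reached through .dynamic.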


Version symbols section '.gnu.version' contains 4 entries:
 Addr: 0000000000400358  Offset: 0x000358  Link: 5 (.dynsym)
  000:   0 (*local*)       2 (GLIBC_2.2.5)   2 (GLIBC_2.2.5)   0 (*local*)    

Version needs section '.gnu.version_r' contains 1 entries:
 Addr: 0x0000000000400360  Offset: 0x000360  Link: 6 (.dynstr)
  000000: Version: 1  File: libc.so.6  Cnt: 1
  0x0010:   Name: GLIBC_2.2.5  Flags: none  Version: 2

Displaying notes found at file offset 0x00000254 with length 0x00000020:
  Owner                 Data size   Description
  GNU                  0x00000010   NT_GNU_ABI_TAG (ABI version tag)
    OS: Linux, ABI: 2.6.32

Displaying notes found at file offset 0x00000274 with length 0x00000024:
  Owner                 Data size   Description
  GNU                  0x00000014   NT_GNU_BUILD_ID (unique build ID bitstring)
    Build ID: debd3d7912be860a432b5c685a6cff7fd9418528

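From the information above we can see that this file is of type ELF64, that is, a 64-bit executable, and that it has 9 program headers and 31 section headers.
Documentation on what each section does is easy to find online; this article only touches on the following sections

  • .rela.dyn list of variables that need relocation
  • .rela.plt list of functions that need relocation
  • .plt code used to call dynamically linked functions
  • .text the main program code
  • .init the program's initialization code, used to initialize global variables and so on
  • .fini the program's termination code, used to destruct global variables and so on
  • .rodata read-only data, for example strings (const char*)
  • .data readable and writable data, for example global variables
  • .dynsym symbol table for dynamic linking
  • .dynstr symbol name strings for dynamic linking
  • .dynamic information needed for dynamic linking, used at run time (no access to the section headers is needed)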

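What is dynamic linking

The program above calls printf, but the implementation of that function is not in ./a.out.
So where is printf, and how does it get called?

printf is implemented in the glibc library, that is, in /lib/x86_64-linux-gnu/libc.so.6.
When ./a.out runs, the function is located in glibc and called from there.
Let's look at the code involved.

Run the following command to disassemble ./a.out

objdump -d -S ./a.out

This shows the following code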

00000000004003f0 <printf@plt-0x10>:
  4003f0:   ff 35 12 0c 20 00       pushq  0x200c12(%rip)        # 601008 <_GLOBAL_OFFSET_TABLE_+0x8>
  4003f6:   ff 25 14 0c 20 00       jmpq   *0x200c14(%rip)        # 601010 <_GLOBAL_OFFSET_TABLE_+0x10>
  4003fc:   0f 1f 40 00             nopl   0x0(%rax)

0000000000400400 <printf@plt>:
  400400:   ff 25 12 0c 20 00       jmpq   *0x200c12(%rip)        # 601018 <_GLOBAL_OFFSET_TABLE_+0x18>
  400406:   68 00 00 00 00          pushq  $0x0
  40040b:   e9 e0 ff ff ff          jmpq   4003f0 <_init+0x28>

000000000040053c <main>:
  40053c:   55                      push   %rbp
  40053d:   48 89 e5                mov    %rsp,%rbp
  400540:   be 41 01 00 00          mov    $0x141,%esi
  400545:   bf 7b 00 00 00          mov    $0x7b,%edi
  40054a:   e8 d7 ff ff ff          callq  400526 <max>
  40054f:   89 c6                   mov    %eax,%esi
  400551:   bf 38 06 40 00          mov    $0x400638,%edi
  400556:   b8 00 00 00 00          mov    $0x0,%eax
  40055b:   e8 a0 fe ff ff          callq  400400 <printf@plt>

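In this code we can see that a call to printf first goes to printf@plt at 0x400400.
printf@plt is responsible for finding the actual printf function at run time and jumping to it.
Here the address of the actual printf is stored at 0x400406 + 0x200c12 = 0x601018.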
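The address arithmetic can be checked with a small sketch. A RIP-relative operand is resolved against the address of the next instruction, which is why the sum starts at 0x400406 rather than 0x400400. The helper name ripRelativeTarget is made up for this illustration and is not part of HelloElfLoader.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: absolute address referenced by a RIP-relative operand.
// On x86-64, %rip already points at the next instruction when the
// displacement is applied, so the instruction length has to be added first.
constexpr std::uint64_t ripRelativeTarget(std::uint64_t instrAddr,
                                          std::uint64_t instrLen,
                                          std::int32_t disp) {
    return instrAddr + instrLen + static_cast<std::int64_t>(disp);
}

// jmpq *0x200c12(%rip) at 0x400400 is 6 bytes long (ff 25 12 0c 20 00).
static_assert(ripRelativeTarget(0x400400, 6, 0x200c12) == 0x601018,
              "GOT slot read by printf@plt");
// jmpq *0x200c14(%rip) at 0x4003f6 reads its target from 0x601010.
static_assert(ripRelativeTarget(0x4003f6, 6, 0x200c14) == 0x601010,
              "GOT slot read by the first PLT entry");
// The relative call in main (e8 a0 fe ff ff at 0x40055b) uses a negative
// displacement: 0xfffffea0 is -0x160 as a signed 32-bit value.
static_assert(ripRelativeTarget(0x40055b, 5, -0x160) == 0x400400,
              "callq printf@plt");
```

The three static_assert lines reproduce the addresses objdump prints in its comments, so the sketch is verified at compile time.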

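Note that 0x601018 does not point to the actual printf function at first; it points to 0x400406.
Why is that?
For performance reasons, a Linux executable does not resolve all dynamically linked functions at startup; it uses lazy resolution instead.
On the first call, jmpq *0x200c12(%rip) jumps to the next instruction at 0x400406,
which pushes the relocation index and jumps on to 0x4003f0, which in turn jumps to the address stored at 0x601010.
The address stored at 0x601010 is the implementation of the lazy resolver. Once the first lazy resolution has succeeded,
0x601018 points to the actual printf,
and later calls jump straight to the actual printf.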
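The lazy resolution scheme can be modelled in a few lines of portable C++. This is only a sketch of the idea: gotSlot stands in for the slot at 0x601018, lazyResolver for the code behind 0x601010, and realFunction for the real printf. All of these names are invented for the illustration and none of them come from glibc or HelloElfLoader.

```cpp
#include <cassert>

// Hypothetical model of one GOT slot and its PLT stub.
using Func = int (*)(int);

int realFunction(int x) { return x * 2; }  // stands in for the real printf

int resolverCalls = 0;                     // counts how often the slow path runs
int lazyResolver(int x);                   // stands in for the lazy resolver

// At startup the slot does NOT hold the real address yet.
Func gotSlot = lazyResolver;

int lazyResolver(int x) {
    ++resolverCalls;
    gotSlot = realFunction;  // patch the slot once
    return gotSlot(x);       // then continue into the real function
}

// Stands in for printf@plt: always jump through the slot.
int callThroughPlt(int x) { return gotSlot(x); }
```

The first call to callThroughPlt goes through lazyResolver and patches the slot; every later call reaches realFunction directly, so resolverCalls stays at 1.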

Program entry point

A Linux program starts running at the _start function.
The entry point address 0x400430 shown by readelf above is the address of _start.

0000000000400430 <_start>:
  400430:   31 ed                   xor    %ebp,%ebp
  400432:   49 89 d1                mov    %rdx,%r9
  400435:   5e                      pop    %rsi
  400436:   48 89 e2                mov    %rsp,%rdx
  400439:   48 83 e4 f0             and    $0xfffffffffffffff0,%rsp
  40043d:   50                      push   %rax
  40043e:   54                      push   %rsp
  40043f:   49 c7 c0 20 06 40 00    mov    $0x400620,%r8
  400446:   48 c7 c1 b0 05 40 00    mov    $0x4005b0,%rcx
  40044d:   48 c7 c7 3c 05 40 00    mov    $0x40053c,%rdi
  400454:   e8 b7 ff ff ff          callq  400410 <__libc_start_main@plt>
  400459:   f4                      hlt    
  40045a:   66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)

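Next, _start calls __libc_start_main,
an initialization function defined in libc
that is responsible for initializing global variables, calling main and so on.
Here _start passes main (0x40053c) in %rdi, __libc_csu_init (0x4005b0) in %rcx and __libc_csu_fini (0x400620) in %r8.

__libc_start_main is also responsible for passing on main's return value and exiting the process.
As you can see, the instruction after the call to __libc_start_main is hlt,
and that instruction is never executed.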
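As a rough mental model, the sequence can be sketched as an ordinary function. This is a deliberately simplified, hypothetical model and not glibc's implementation: startMainModel, MainFn and HookFn are invented names, and the real function ends by exiting the process instead of returning.

```cpp
#include <cassert>

// Hypothetical, simplified model of the startup sequence driven by the
// values that _start puts in registers (main, an init hook, a fini hook).
using MainFn = int (*)(int, char**);
using HookFn = void (*)();

int startMainModel(MainFn mainFn, int argc, char** argv, HookFn init, HookFn fini) {
    if (init) init();                 // run initializers before main
    int status = mainFn(argc, argv);  // call the program's main
    if (fini) fini();                 // run finalizers afterwards
    return status;                    // the real function exits the process here
}
```

Because the real __libc_start_main never returns, whatever follows the call in _start (the hlt instruction) is unreachable in a working program.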

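Implementing the Linux program loader

With the knowledge above, we can first sketch out what the loader needs to do.

Because x64 Windows and Linux programs use the same CPU instruction set, we can execute the machine code directly without an instruction emulator.
And since this time I intend to implement everything in user space, we cannot emulate syscalls the way Bash On Windows does.
Instead, this loader emulates the libc functions, as in the figure below

Figure 3

So the loader needs to do the following:

  • Parse the ELF file
  • Load the program code to the specified memory address
  • Load the data to the specified memory address
  • Provide implementations of the dynamically linked functions
  • Execute the loaded program code

These tasks are carried out one by one in the example program below; the complete source code is available at the link at the top of the article

First we need to copy the code describing the ELF file format from binutils.
It contains the ELF header, the program header and the related data structures.
The fields are declared as unsigned char[] to avoid alignment padding,
so that the structs can be converted directly from the file contents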
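Since every field is a raw byte array, reading a value means assembling an integer from those bytes. The following helper is a sketch of one way to do that without relying on pointer casts; readField is a hypothetical name that does not exist in HelloElfLoader, and it assumes the file is little endian (ELFDATA2LSB).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: read an integer stored as unsigned char[N], as in the
// Elf64_External_* structs. The bytes are combined explicitly, so the result
// does not depend on the host's alignment rules or byte order.
template <typename T, std::size_t N>
T readField(const unsigned char (&field)[N]) {
    static_assert(sizeof(T) == N, "field width must match the integer width");
    std::uint64_t value = 0;
    for (std::size_t i = 0; i < N; ++i) {
        // Byte i is the i-th least significant byte in a little-endian file.
        value |= static_cast<std::uint64_t>(field[i]) << (8 * i);
    }
    return static_cast<T>(value);
}
```

For example, an e_entry field holding the bytes 30 04 40 00 00 00 00 00 yields 0x400430 via readField<std::uint64_t>(header.e_entry). The static_assert rejects a call whose integer type does not match the field width, such as reading the 8-byte e_phoff as a 32-bit value.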

ELFDefine.h:

#pragma once

namespace HelloElfLoader {
    // The following is copied from
    // https://github.com/aeste/binutils/blob/develop/elfcpp/elfcpp.h
    // https://github.com/aeste/binutils/blob/develop/include/elf/external.h

    // Offsets of the individual fields within e_ident
    const int EI_MAG0 = 0;
    const int EI_MAG1 = 1;
    const int EI_MAG2 = 2;
    const int EI_MAG3 = 3;
    const int EI_CLASS = 4;
    const int EI_DATA = 5;
    const int EI_VERSION = 6;
    const int EI_OSABI = 7;
    const int EI_ABIVERSION = 8;
    const int EI_PAD = 9;
    const int EI_NIDENT = 16;

    // ELF file class
    enum {
        ELFCLASSNONE = 0,
        ELFCLASS32 = 1,
        ELFCLASS64 = 2
    };

    // ByteOrder
    enum {
        ELFDATANONE = 0,
        ELFDATA2LSB = 1,
        ELFDATA2MSB = 2
    };

    // Program header types
    enum PT
    {
        PT_NULL = 0,
        PT_LOAD = 1,
        PT_DYNAMIC = 2,
        PT_INTERP = 3,
        PT_NOTE = 4,
        PT_SHLIB = 5,
        PT_PHDR = 6,
        PT_TLS = 7,
        PT_LOOS = 0x60000000,
        PT_HIOS = 0x6fffffff,
        PT_LOPROC = 0x70000000,
        PT_HIPROC = 0x7fffffff,
        // The remaining values are not in the standard.
        // Frame unwind information.
        PT_GNU_EH_FRAME = 0x6474e550,
        PT_SUNW_EH_FRAME = 0x6474e550,
        // Stack flags.
        PT_GNU_STACK = 0x6474e551,
        // Read only after relocation.
        PT_GNU_RELRO = 0x6474e552,
        // Platform architecture compatibility information
        PT_ARM_ARCHEXT = 0x70000000,
        // Exception unwind tables
        PT_ARM_EXIDX = 0x70000001
    };

    // Dynamic section entry types
    enum DT
    {
        DT_NULL = 0,
        DT_NEEDED = 1,
        DT_PLTRELSZ = 2,
        DT_PLTGOT = 3,
        DT_HASH = 4,
        DT_STRTAB = 5,
        DT_SYMTAB = 6,
        DT_RELA = 7,
        DT_RELASZ = 8,
        DT_RELAENT = 9,
        DT_STRSZ = 10,
        DT_SYMENT = 11,
        DT_INIT = 12,
        DT_FINI = 13,
        DT_SONAME = 14,
        DT_RPATH = 15,
        DT_SYMBOLIC = 16,
        DT_REL = 17,
        DT_RELSZ = 18,
        DT_RELENT = 19,
        DT_PLTREL = 20,
        DT_DEBUG = 21,
        DT_TEXTREL = 22,
        DT_JMPREL = 23,
        DT_BIND_NOW = 24,
        DT_INIT_ARRAY = 25,
        DT_FINI_ARRAY = 26,
        DT_INIT_ARRAYSZ = 27,
        DT_FINI_ARRAYSZ = 28,
        DT_RUNPATH = 29,
        DT_FLAGS = 30,

        // This is used to mark a range of dynamic tags.  It is not really
        // a tag value.
        DT_ENCODING = 32,

        DT_PREINIT_ARRAY = 32,
        DT_PREINIT_ARRAYSZ = 33,
        DT_LOOS = 0x6000000d,
        DT_HIOS = 0x6ffff000,
        DT_LOPROC = 0x70000000,
        DT_HIPROC = 0x7fffffff,

        // The remaining values are extensions used by GNU or Solaris.
        DT_VALRNGLO = 0x6ffffd00,
        DT_GNU_PRELINKED = 0x6ffffdf5,
        DT_GNU_CONFLICTSZ = 0x6ffffdf6,
        DT_GNU_LIBLISTSZ = 0x6ffffdf7,
        DT_CHECKSUM = 0x6ffffdf8,
        DT_PLTPADSZ = 0x6ffffdf9,
        DT_MOVEENT = 0x6ffffdfa,
        DT_MOVESZ = 0x6ffffdfb,
        DT_FEATURE = 0x6ffffdfc,
        DT_POSFLAG_1 = 0x6ffffdfd,
        DT_SYMINSZ = 0x6ffffdfe,
        DT_SYMINENT = 0x6ffffdff,
        DT_VALRNGHI = 0x6ffffdff,

        DT_ADDRRNGLO = 0x6ffffe00,
        DT_GNU_HASH = 0x6ffffef5,
        DT_TLSDESC_PLT = 0x6ffffef6,
        DT_TLSDESC_GOT = 0x6ffffef7,
        DT_GNU_CONFLICT = 0x6ffffef8,
        DT_GNU_LIBLIST = 0x6ffffef9,
        DT_CONFIG = 0x6ffffefa,
        DT_DEPAUDIT = 0x6ffffefb,
        DT_AUDIT = 0x6ffffefc,
        DT_PLTPAD = 0x6ffffefd,
        DT_MOVETAB = 0x6ffffefe,
        DT_SYMINFO = 0x6ffffeff,
        DT_ADDRRNGHI = 0x6ffffeff,

        DT_RELACOUNT = 0x6ffffff9,
        DT_RELCOUNT = 0x6ffffffa,
        DT_FLAGS_1 = 0x6ffffffb,
        DT_VERDEF = 0x6ffffffc,
        DT_VERDEFNUM = 0x6ffffffd,
        DT_VERNEED = 0x6ffffffe,
        DT_VERNEEDNUM = 0x6fffffff,

        DT_VERSYM = 0x6ffffff0,

        // Specify the value of _GLOBAL_OFFSET_TABLE_.
        DT_PPC_GOT = 0x70000000,

        // Specify the start of the .glink section.
        DT_PPC64_GLINK = 0x70000000,

        // Specify the start and size of the .opd section.
        DT_PPC64_OPD = 0x70000001,
        DT_PPC64_OPDSZ = 0x70000002,

        // The index of an STT_SPARC_REGISTER symbol within the DT_SYMTAB
        // symbol table.  One dynamic entry exists for every STT_SPARC_REGISTER
        // symbol in the symbol table.
        DT_SPARC_REGISTER = 0x70000001,

        DT_AUXILIARY = 0x7ffffffd,
        DT_USED = 0x7ffffffe,
        DT_FILTER = 0x7fffffff
    };

    // Definition of the ELF header
    typedef struct {
        unsigned char   e_ident[16];        /* ELF "magic number" */
        unsigned char   e_type[2];      /* Identifies object file type */
        unsigned char   e_machine[2];       /* Specifies required architecture */
        unsigned char   e_version[4];       /* Identifies object file version */
        unsigned char   e_entry[8];     /* Entry point virtual address */
        unsigned char   e_phoff[8];     /* Program header table file offset */
        unsigned char   e_shoff[8];     /* Section header table file offset */
        unsigned char   e_flags[4];     /* Processor-specific flags */
        unsigned char   e_ehsize[2];        /* ELF header size in bytes */
        unsigned char   e_phentsize[2];     /* Program header table entry size */
        unsigned char   e_phnum[2];     /* Program header table entry count */
        unsigned char   e_shentsize[2];     /* Section header table entry size */
        unsigned char   e_shnum[2];     /* Section header table entry count */
        unsigned char   e_shstrndx[2];      /* Section header string table index */
    } Elf64_External_Ehdr;

    // Definition of a program header
    typedef struct {
        unsigned char   p_type[4];      /* Identifies program segment type */
        unsigned char   p_flags[4];     /* Segment flags */
        unsigned char   p_offset[8];        /* Segment file offset */
        unsigned char   p_vaddr[8];     /* Segment virtual address */
        unsigned char   p_paddr[8];     /* Segment physical address */
        unsigned char   p_filesz[8];        /* Segment size in file */
        unsigned char   p_memsz[8];     /* Segment size in memory */
        unsigned char   p_align[8];     /* Segment alignment, file & memory */
    } Elf64_External_Phdr;

    // Definition of one entry in the contents of the DYNAMIC program header
    typedef struct {
        unsigned char   d_tag[8];       /* entry tag value */
        union {
            unsigned char   d_val[8];
            unsigned char   d_ptr[8];
        } d_un;
    } Elf64_External_Dyn;

    // Relocation record for dynamic linking; some systems use Elf64_External_Rel instead
    typedef struct {
        unsigned char   r_offset[8];    /* Location at which to apply the action */
        unsigned char   r_info[8];  /* index and type of relocation */
        unsigned char   r_addend[8];    /* Constant addend used to compute value */
    } Elf64_External_Rela;

    // Symbol information for dynamic linking
    typedef struct {
        unsigned char   st_name[4];     /* Symbol name, index in string tbl */
        unsigned char   st_info[1];     /* Type and binding attributes */
        unsigned char   st_other[1];        /* No defined meaning, 0 */
        unsigned char   st_shndx[2];        /* Associated section index */
        unsigned char   st_value[8];        /* Value of the symbol */
        unsigned char   st_size[8];     /* Associated symbol size */
    } Elf64_External_Sym;
}

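Next we define a class that reads and executes an ELF file.
The class opens the file as fileStream_ when it is constructed, and the execute function is responsible for running it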

HelloElfLoader.h:

#pragma once
#include <string>
#include <fstream>

namespace HelloElfLoader {
    class Loader {
        std::ifstream fileStream_;

    public:
        Loader(const std::string& path);
        Loader(std::ifstream&& fileStream);
        void execute();
    };
}

The constructors are as follows; this is just standard C++ code for opening a file

HelloElfLoader.cpp:

Loader::Loader(const std::string& path) :
    Loader(std::ifstream(path, std::ios::in | std::ios::binary)) {}

Loader::Loader(std::ifstream&& fileStream) :
    fileStream_(std::move(fileStream)) {
    if (!fileStream_) {
        throw std::runtime_error("open file failed");
    }
}

Next we implement the steps described above, starting with parsing the ELF file

void Loader::execute() {
    std::cout << "====== start loading elf ======" << std::endl;

    // Check that the loader itself is running as a 64-bit program
    if (sizeof(intptr_t) != sizeof(std::int64_t)) {
        throw std::runtime_error("please use x64 compile and run this program");
    }

    // Read the ELF header
    Elf64_External_Ehdr elfHeader = {};
    fileStream_.seekg(0);
    fileStream_.read(reinterpret_cast<char*>(&elfHeader), sizeof(elfHeader));

    // Check the ELF header; only 64-bit, little-endian programs are supported
    if (std::string(reinterpret_cast<char*>(elfHeader.e_ident), 4) != "\x7f\x45\x4c\x46") {
        throw std::runtime_error("magic not match");
    }
    else if (elfHeader.e_ident[EI_CLASS] != ELFCLASS64) {
        throw std::runtime_error("only support ELF64");
    }
    else if (elfHeader.e_ident[EI_DATA] != ELFDATA2LSB) {
        throw std::runtime_error("only support little endian");
    }

    // Get the program table information
    // e_phoff and e_shoff are 8 bytes wide in ELF64, so read them as 64-bit values
    std::uint64_t programTableOffset = *reinterpret_cast<std::uint64_t*>(elfHeader.e_phoff);
    std::uint16_t programTableEntrySize = *reinterpret_cast<std::uint16_t*>(elfHeader.e_phentsize);
    std::uint16_t programTableEntryNum = *reinterpret_cast<std::uint16_t*>(elfHeader.e_phnum);
    std::cout << "program table at: " << programTableOffset << ", "
        << programTableEntryNum << " x " << programTableEntrySize << std::endl;

    // Get the section table information
    // The section table is only used by the linker; the loader does not actually need it
    std::uint64_t sectionTableOffset = *reinterpret_cast<std::uint64_t*>(elfHeader.e_shoff);
    std::uint16_t sectionTableEntrySize = *reinterpret_cast<std::uint16_t*>(elfHeader.e_shentsize);
    std::uint16_t sectionTableEntryNum = *reinterpret_cast<std::uint16_t*>(elfHeader.e_shnum);
    std::cout << "section table at: " << sectionTableOffset << ", "
        << sectionTableEntryNum << " x " << sectionTableEntrySize << std::endl;

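The ELF file begins with the ELF header, whose layout matches the Elf64_External_Ehdr struct,
so we can read it straight into an Elf64_External_Ehdr.
The ELF header contains the offsets of the program headers and the section headers, and we can fetch those values up front

The section headers are not needed at run time; what has to be walked at run time are the program headers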
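The identification checks can also be tried in isolation on a plain byte buffer. The function below is a sketch rather than code from HelloElfLoader: checkIdent is a hypothetical name, and the literal indices and values mirror EI_CLASS (4), EI_DATA (5), ELFCLASS64 (2) and ELFDATA2LSB (1) from ELFDefine.h.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: classify the first bytes of a file with the same
// checks the loader applies to e_ident, returning a message instead of throwing.
std::string checkIdent(const unsigned char* ident, std::size_t size) {
    if (size < 16) {  // e_ident is EI_NIDENT = 16 bytes long
        return "too short";
    }
    if (ident[0] != 0x7f || ident[1] != 'E' || ident[2] != 'L' || ident[3] != 'F') {
        return "magic not match";
    }
    if (ident[4] != 2) {  // EI_CLASS must be ELFCLASS64
        return "only support ELF64";
    }
    if (ident[5] != 1) {  // EI_DATA must be ELFDATA2LSB
        return "only support little endian";
    }
    return "ok";
}
```

Returning a message instead of throwing keeps the sketch easy to test; the loader itself throws std::runtime_error with the same texts.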

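    // Information needed for dynamic linking
    std::uint64_t jmpRelAddr = 0; // start address of the relocation records
    std::uint64_t pltRelType = 0; // type of the relocation records, RELA or REL
    std::uint64_t pltRelSize = 0; // total size of the relocation records
    std::uint64_t symTabAddr = 0; // start address of the dynamic symbol table
    std::uint64_t strTabAddr = 0; // start address of the dynamic symbol name table
    std::uint64_t strTabSize = 0; // total size of the dynamic symbol name table

    // Walk the program headers
    if (programTableEntrySize != sizeof(Elf64_External_Phdr)) {
        throw std::runtime_error("unexpected program header entry size");
    }
    std::vector<Elf64_External_Phdr> programHeaders;
    programHeaders.resize(programTableEntryNum);
    // Seek explicitly instead of relying on the table directly following the ELF header
    fileStream_.seekg(programTableOffset);
    fileStream_.read(reinterpret_cast<char*>(programHeaders.data()), programTableEntryNum * programTableEntrySize);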
    std::vector<std::shared_ptr<void>> loadedSegments;
    for (const auto& programHeader : programHeaders) {
        std::uint32_t type = *reinterpret_cast<const std::uint32_t*>(programHeader.p_type);
        if (type == PT_LOAD) {
            // Load the file contents (program code and data) into virtual memory; this example ignores address conflicts
            std::uint64_t fileOffset = *reinterpret_cast<const std::uint64_t*>(programHeader.p_offset);
            std::uint64_t fileSize = *reinterpret_cast<const std::uint64_t*>(programHeader.p_filesz);
            std::uint64_t virtAddr = *reinterpret_cast<const std::uint64_t*>(programHeader.p_vaddr);
            std::uint64_t memSize = *reinterpret_cast<const std::uint64_t*>(programHeader.p_memsz);
            if (memSize < fileSize) {
                throw std::runtime_error("invalid memsz in program header, it shouldn't less than filesz");
            }
            // Allocate memory at the specified virtual address
            std::cout << std::hex << "allocate address at: 0x" << virtAddr <<
                " size: 0x" << memSize << std::dec << std::endl;
            void* addr = ::VirtualAlloc((void*)virtAddr, memSize, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
            if (addr == nullptr) {
                throw std::runtime_error("allocate memory at specific address failed");
            }
            loadedSegments.emplace_back(addr, [](void* ptr) { ::VirtualFree(ptr, 0, MEM_RELEASE); });
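            // Copy the file contents into virtual memory
            // Write to virtAddr, not addr: VirtualAlloc returns the base of the allocated
            // pages, which is rounded down when virtAddr is not aligned (e.g. 0x600e10)
            fileStream_.seekg(fileOffset);
            if (!fileStream_.read(reinterpret_cast<char*>(virtAddr), fileSize)) {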
                throw std::runtime_error("read contents into memory from LOAD program header failed");
            }
        }
        else if (type == PT_DYNAMIC) {
            // Walk the dynamic section
            std::uint64_t fileOffset = *reinterpret_cast<const std::uint64_t*>(programHeader.p_offset);
            fileStream_.seekg(fileOffset);
            Elf64_External_Dyn dynSection = {};
            std::uint64_t dynSectionTag = 0;
            std::uint64_t dynSectionVal = 0;
            do {
                if (!fileStream_.read(reinterpret_cast<char*>(&dynSection), sizeof(dynSection))) {
                    throw std::runtime_error("read dynamic section failed");
                }
                dynSectionTag = *reinterpret_cast<const std::uint64_t*>(dynSection.d_tag);
                dynSectionVal = *reinterpret_cast<const std::uint64_t*>(dynSection.d_un.d_val);
                if (dynSectionTag == DT_JMPREL) {
                    jmpRelAddr = dynSectionVal;
                }
                else if (dynSectionTag == DT_PLTREL) {
                    pltRelType = dynSectionVal;
                }
                else if (dynSectionTag == DT_PLTRELSZ) {
                    pltRelSize = dynSectionVal;
                }
                else if (dynSectionTag == DT_SYMTAB) {
                    symTabAddr = dynSectionVal;
                }
                else if (dynSectionTag == DT_STRTAB) {
                    strTabAddr = dynSectionVal;
                }
                else if (dynSectionTag == DT_STRSZ) {
                    strTabSize = dynSectionVal;
                }
            } while (dynSectionTag != 0);
        }
    }

Remember the information we read with readelf above?

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000000040 0x0000000000400040 0x0000000000400040
                 0x00000000000001f8 0x00000000000001f8  R E    8
  INTERP         0x0000000000000238 0x0000000000400238 0x0000000000400238
                 0x000000000000001c 0x000000000000001c  R      1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                 0x00000000000007d4 0x00000000000007d4  R E    200000
  LOAD           0x0000000000000e10 0x0000000000600e10 0x0000000000600e10
                 0x0000000000000228 0x0000000000000230  RW     200000
  DYNAMIC        0x0000000000000e28 0x0000000000600e28 0x0000000000600e28
                 0x00000000000001d0 0x00000000000001d0  RW     8
  NOTE           0x0000000000000254 0x0000000000400254 0x0000000000400254
                 0x0000000000000044 0x0000000000000044  R      4
  GNU_EH_FRAME   0x0000000000000680 0x0000000000400680 0x0000000000400680
                 0x000000000000003c 0x000000000000003c  R      4
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     10
  GNU_RELRO      0x0000000000000e10 0x0000000000600e10 0x0000000000600e10
                 0x00000000000001f0 0x00000000000001f0  R      1

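Headers of type LOAD indicate that file contents need to be loaded into memory.
Offset is the offset within the file, VirtAddr is the virtual memory address,
FileSiz is the number of bytes to load from the file, MemSiz is the amount of memory to allocate,
and Flags is the memory access permission.
This example ignores access permissions and uses PAGE_EXECUTE_READWRITE throughout.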
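If the permissions were to be honoured, the Flags column could be translated into a Windows page protection value. The following is only a sketch of such a mapping and is not part of HelloElfLoader: protectionFromFlags and the k-prefixed names are invented here, while the numeric values repeat PF_X, PF_W, PF_R from the ELF specification and the PAGE_* constants from winnt.h so that the code compiles without windows.h.

```cpp
#include <cassert>
#include <cstdint>

// Segment permission bits stored in p_flags.
constexpr std::uint32_t PF_X = 1, PF_W = 2, PF_R = 4;

// Numeric values of the Windows PAGE_* protection constants.
constexpr std::uint32_t kPageNoAccess = 0x01;
constexpr std::uint32_t kPageReadOnly = 0x02;
constexpr std::uint32_t kPageReadWrite = 0x04;
constexpr std::uint32_t kPageExecute = 0x10;
constexpr std::uint32_t kPageExecuteRead = 0x20;
constexpr std::uint32_t kPageExecuteReadWrite = 0x40;

// Hypothetical mapping from ELF segment flags to a page protection value.
std::uint32_t protectionFromFlags(std::uint32_t flags) {
    const bool r = (flags & PF_R) != 0;
    const bool w = (flags & PF_W) != 0;
    const bool x = (flags & PF_X) != 0;
    if (x) {
        return w ? kPageExecuteReadWrite : (r ? kPageExecuteRead : kPageExecute);
    }
    if (w) {
        return kPageReadWrite;  // Windows has no write-only protection
    }
    return r ? kPageReadOnly : kPageNoAccess;
}
```

With this mapping the first LOAD header (R E) would become PAGE_EXECUTE_READ and the second (RW) PAGE_READWRITE. A loader would still need the pages to be writable while it copies the file contents, so the final protection would have to be applied afterwards, for example with VirtualProtect.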

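This program has two LOAD headers. The first contains the code and read-only data (the contents of sections such as .text, .init
and .rodata), and the second contains the writable data (the contents of sections such as .init_array,
.fini_array and .data).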

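Once the contents described by the LOAD headers are loaded at the specified memory addresses, the second and third steps of our plan are done:
the code and the data are now in memory.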
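The relationship between FileSiz and MemSiz is easy to miss: MemSiz may be larger, and the extra bytes (the .bss part, 8 bytes in the second LOAD header here) must read as zero. The sketch below shows the same copy into an ordinary buffer instead of a fixed virtual address; loadSegment is a hypothetical, portable stand-in and not the loader's actual code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for the PT_LOAD branch: returns a buffer of memSize
// bytes whose first fileSize bytes come from the file and whose tail is zero.
std::vector<unsigned char> loadSegment(const std::vector<unsigned char>& file,
                                       std::uint64_t fileOffset,
                                       std::uint64_t fileSize,
                                       std::uint64_t memSize) {
    if (memSize < fileSize) {
        throw std::runtime_error("memsz must not be smaller than filesz");
    }
    if (fileOffset > file.size() || fileSize > file.size() - fileOffset) {
        throw std::runtime_error("segment lies outside the file");
    }
    // The vector is zero-initialized, so everything past fileSize stays zero.
    std::vector<unsigned char> memory(static_cast<std::size_t>(memSize));
    const auto first = file.begin() + static_cast<std::ptrdiff_t>(fileOffset);
    std::copy(first, first + static_cast<std::ptrdiff_t>(fileSize), memory.begin());
    return memory;
}
```

In the real loader the zero tail comes for free, because memory committed by VirtualAlloc is initialized to zero.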

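Next we need to handle the dynamically linked functions.
The information needed for that comes from the DYNAMIC header.
The DYNAMIC header contains the following

Dynamic section at offset 0xe28 contains 24 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]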
 0x000000000000000c (INIT)               0x4003c8
 0x000000000000000d (FINI)               0x400624
 0x0000000000000019 (INIT_ARRAY)         0x600e10
 0x000000000000001b (INIT_ARRAYSZ)       8 (bytes)
 0x000000000000001a (FINI_ARRAY)         0x600e18
 0x000000000000001c (FINI_ARRAYSZ)       8 (bytes)
 0x000000006ffffef5 (GNU_HASH)           0x400298
 0x0000000000000005 (STRTAB)             0x400318
 0x0000000000000006 (SYMTAB)             0x4002b8
 0x000000000000000a (STRSZ)              63 (bytes)
 0x000000000000000b (SYMENT)             24 (bytes)
 0x0000000000000015 (DEBUG)              0x0
 0x0000000000000003 (PLTGOT)             0x601000
 0x0000000000000002 (PLTRELSZ)           48 (bytes)
 0x0000000000000014 (PLTREL)             RELA
 0x0000000000000017 (JMPREL)             0x400398
 0x0000000000000007 (RELA)               0x400380
 0x0000000000000008 (RELASZ)             24 (bytes)
 0x0000000000000009 (RELAENT)            24 (bytes)
 0x000000006ffffffe (VERNEED)            0x400360
 0x000000006fffffff (VERNEEDNUM)         1
 0x000000006ffffff0 (VERSYM)             0x400358
 0x0000000000000000 (NULL)               0x0

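Let's go through the entry types used in the code above one by one

  • DT_JMPREL: start address of the relocation records,
    pointing to where the .rela.plt section is kept in memory
  • DT_PLTREL: type of the relocation records, RELA or REL; here it is RELA
  • DT_PLTRELSZ: total size of the relocation records; here it is 24 * 2 = 48

Relocation section '.rela.plt' at offset 0x398 contains 2 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000601018  000100000007 R_X86_64_JUMP_SLO 0000000000000000 printf@GLIBC_2.2.5 + 0
000000601020  000200000007 R_X86_64_JUMP_SLO 0000000000000000 __libc_start_main@GLIBC_2.2.5 + 0

  • DT_SYMTAB: start address of the dynamic symbol table,
    pointing to where the .dynsym section is kept in memory
  • DT_STRTAB: start address of the dynamic symbol name table,
    pointing to where the .dynstr section is kept in memory
  • DT_STRSZ: total size of the dynamic symbol name table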

Symbol table '.dynsym' contains 4 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND printf@GLIBC_2.2.5 (2)
     2: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND __libc_start_main@GLIBC_2.2.5 (2)
     3: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND __gmon_start__

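After walking the program headers, we know that two dynamically linked functions need to be relocated:
__libc_start_main and printf.
__libc_start_main is the one responsible for calling main.
Next we need to set the addresses of these functions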

    // Read the dynamic symbol name table and the dynamic symbol table
    std::string dynamicSymbolNames(reinterpret_cast<char*>(strTabAddr), strTabSize);
    Elf64_External_Sym* dynamicSymbols = reinterpret_cast<Elf64_External_Sym*>(symTabAddr);

    // Set the addresses of the dynamically linked functions
    std::cout << std::hex << "read dynamic entries at: 0x" << jmpRelAddr <<
        " size: 0x" << pltRelSize << std::dec << std::endl;
    if (jmpRelAddr == 0 || pltRelType != DT_RELA || pltRelSize % sizeof(Elf64_External_Rela) != 0) {
        throw std::runtime_error("invalid dynamic entry info, rel type should be rela");
    }
    std::vector<std::shared_ptr<void>> libraryFuncs;
    for (std::uint64_t offset = 0; offset < pltRelSize; offset += sizeof(Elf64_External_Rela)) {
        Elf64_External_Rela* rela = (Elf64_External_Rela*)(jmpRelAddr + offset);
        std::uint64_t relaOffset = *reinterpret_cast<const std::uint64_t*>(rela->r_offset);
        std::uint64_t relaInfo = *reinterpret_cast<const std::uint64_t*>(rela->r_info);
        std::uint64_t relaSym = relaInfo >> 32; // ELF64_R_SYM
        std::uint64_t relaType = relaInfo & 0xffffffff; // ELF64_R_TYPE
        // Look up the symbol
        Elf64_External_Sym* symbol = dynamicSymbols + relaSym;
        std::uint32_t symbolNameOffset = *reinterpret_cast<std::uint32_t*>(symbol->st_name);
        std::string symbolName(dynamicSymbolNames.data() + symbolNameOffset);
        std::cout << "relocate symbol: " << symbolName << std::endl;
        // Replace the function address
        // This should really be resolved lazily; to keep it simple the slot is overwritten directly
        void** relaPtr = reinterpret_cast<void**>(relaOffset);
        std::shared_ptr<void> func = resolveLibraryFunc(symbolName);
        if (func == nullptr) {
            throw std::runtime_error("unsupport symbol name");
        }
        libraryFuncs.emplace_back(func);
        *relaPtr = func.get();
    }

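The code above walks the DT_JMPREL relocation records
and sets the addresses of these functions at load time.
Strictly speaking this should be done through lazy resolution, but to keep things simple the slots are overwritten with the final addresses right away.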
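The two lookups involved, splitting r_info and fetching a name from the string table, can be sketched separately from the loader. The names RelaInfo, decodeRelaInfo and symbolNameAt are hypothetical and chosen for this illustration only. The split follows the ELF64_R_SYM and ELF64_R_TYPE macros, and type 7 is R_X86_64_JUMP_SLOT.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Result of splitting the 64-bit r_info field.
struct RelaInfo {
    std::uint32_t symbolIndex;  // upper 32 bits, index into .dynsym
    std::uint32_t type;         // lower 32 bits, relocation type
};

RelaInfo decodeRelaInfo(std::uint64_t info) {
    return RelaInfo{ static_cast<std::uint32_t>(info >> 32),
                     static_cast<std::uint32_t>(info & 0xffffffffu) };
}

// Hypothetical helper: fetch the NUL-terminated name starting at offset,
// returning an empty string when the offset lies outside the table.
std::string symbolNameAt(const std::string& stringTable, std::uint32_t offset) {
    if (offset >= stringTable.size()) {
        return std::string();
    }
    return std::string(stringTable.c_str() + offset);
}
```

For the first record shown by readelf, decodeRelaInfo(0x000100000007) gives symbol index 1 and type 7, which matches the printf entry in .dynsym. The bounds check in symbolNameAt is an addition of this sketch; the loader code above trusts st_name.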

The logic for obtaining the actual address of a function lives in resolveLibraryFunc, which is implemented in a separate file,
shown below

namespace HelloElfLoader {
    namespace {
        // The original return address
        thread_local void* originalReturnAddress = nullptr;

        void* getOriginalReturnAddress() {
            return originalReturnAddress;
        }

        void setOriginalReturnAddress(void* address) {
            originalReturnAddress = address;
        }

        // Emulates the libc function that calls main; passing argc and argv is not supported yet
        void __libc_start_main(int(*main)()) {
            std::cout << "call main: " << main << std::endl;
            int ret = main();
            std::cout << "result: " << ret << std::endl;
            std::exit(0);
        }

        // Emulates the printf function
        int printf(const char* fmt, ...) {
            int ret;
            va_list myargs;
            va_start(myargs, fmt);
            ret = ::vprintf(fmt, myargs);
            va_end(myargs);
            return ret;
        }

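        // Converts from the System V AMD64 ABI to the Microsoft x64 calling convention
        // VC++ does not support inline asm when targeting x64, so the machine code is written as hex
        // This thunk supports any number of arguments at some cost in performance;
        // if the number of arguments is known, faster loader code can be written
        // unsigned char is used because values such as 0x89 do not fit in a signed char
        const unsigned char generic_func_loader[]{
            // Lay the arguments out contiguously on the stack
            // [1st argument] [2nd argument] [3rd argument] ...
            0x58, // pop %rax, stash the original return address
            0x41, 0x51, // push %r9, the 6th argument; later arguments are already on the stack
            0x41, 0x50, // push %r8, the 5th argument
            0x51, // push %rcx, the 4th argument
            0x52, // push %rdx, the 3rd argument
            0x56, // push %rsi, the 2nd argument
            0x57, // push %rdi, the 1st argument

            // Call setOriginalReturnAddress to save the original return address
            0x48, 0x89, 0xc1, // mov %rax, %rcx, the first argument is the original return address
            0x48, 0x83, 0xec, 0x20, // sub $0x20, %rsp, reserve 32 bytes of shadow space
            0x48, 0xb8, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // movabs $0, %rax
            0xff, 0xd0, // callq *%rax, call setOriginalReturnAddress
            0x48, 0x83, 0xc4, 0x20, // add $0x20, %rsp, release the shadow space

            // Switch to the Microsoft x64 calling convention
            0x59, // pop %rcx, the 1st argument
            0x5a, // pop %rdx, the 2nd argument
            0x41, 0x58, // pop %r8, the 3rd argument
            0x41, 0x59, // pop %r9, the 4th argument

            // Call the target function
            0x48, 0x83, 0xec, 0x20, // sub $0x20, %rsp, reserve 32 bytes of shadow space
            0x48, 0xb8, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // movabs $0, %rax
            0xff, 0xd0, // callq *%rax, call the emulated function
            0x48, 0x83, 0xc4, 0x30, // add $0x30, %rsp, release the shadow space and arguments (32 bytes + 2 arguments * 8 bytes)
            0x50, // push %rax, save the return value

            // Call getOriginalReturnAddress to fetch the original return address
            0x48, 0xb8, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // movabs $0, %rax
            0xff, 0xd0, // callq *%rax, call getOriginalReturnAddress
            0x48, 0x89, 0xc1, // mov %rax, %rcx, keep the original return address in rcx
            0x58, // pop %rax, restore the return value
            0x51, // push %rcx, put the original return address on top of the stack
            0xc3 // retq, return
        };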
        const int generic_func_loader_set_addr_offset = 18;
        const int generic_func_loader_target_offset = 44;
        const int generic_func_loader_get_addr_offset = 61;
    }

    // Returns the address to call for a dynamically linked function
    std::shared_ptr<void> resolveLibraryFunc(const std::string& name) {
        void* funcPtr = nullptr;
        if (name == "__libc_start_main") {
            funcPtr = reinterpret_cast<void*>(&__libc_start_main);
        }
        else if (name == "printf") {
            funcPtr = reinterpret_cast<void*>(&printf);
        }
        else {
            return nullptr;
        }
        // Allocate executable memory for a private copy of the conversion stub
        void* addr = ::VirtualAlloc(nullptr,
            sizeof(generic_func_loader), MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
        if (addr == nullptr) {
            throw std::runtime_error("allocate memory for generic_func_loader failed");
        }
        std::shared_ptr<void> result(addr, [](void* ptr) { ::VirtualFree(ptr, 0, MEM_RELEASE); });
        std::memcpy(addr, generic_func_loader, sizeof(generic_func_loader));
        // Patch the three movabs immediates with the real function addresses
        char* addr_c = reinterpret_cast<char*>(addr);
        *reinterpret_cast<void**>(addr_c + generic_func_loader_set_addr_offset) =
            reinterpret_cast<void*>(&setOriginalReturnAddress);
        *reinterpret_cast<void**>(addr_c + generic_func_loader_target_offset) = funcPtr;
        *reinterpret_cast<void**>(addr_c + generic_func_loader_get_addr_offset) =
            reinterpret_cast<void*>(&getOriginalReturnAddress);
        return result;
    }
}
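
The three offset constants (18, 44, 61) have to match the stub byte for byte, so they are worth double-checking. The following is a minimal sketch, not part of the original sample: it copies the stub bytes into a hypothetical array `loader_template`, scans for the `48 b8` opcode of `movabs $imm64, %rax`, and patches an immediate with `memcpy`. The names `findMovabsImmediates` and `patchImmediate` are made up for illustration, and the scan is a plain byte search rather than a disassembler, so it is only reliable for a small hand-written stub like this one.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Same bytes as generic_func_loader, repeated here so the sketch is self-contained.
const unsigned char loader_template[] = {
    0x58, 0x41, 0x51, 0x41, 0x50, 0x51, 0x52, 0x56, 0x57,
    0x48, 0x89, 0xc1,
    0x48, 0x83, 0xec, 0x20,
    0x48, 0xb8, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0xff, 0xd0,
    0x48, 0x83, 0xc4, 0x20,
    0x59, 0x5a, 0x41, 0x58, 0x41, 0x59,
    0x48, 0x83, 0xec, 0x20,
    0x48, 0xb8, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0xff, 0xd0,
    0x48, 0x83, 0xc4, 0x30,
    0x50,
    0x48, 0xb8, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0xff, 0xd0,
    0x48, 0x89, 0xc1,
    0x58, 0x51, 0xc3
};

// Returns the offset of every 8-byte immediate that follows a 48 b8 opcode pair.
std::vector<std::size_t> findMovabsImmediates(const unsigned char* code, std::size_t size) {
    std::vector<std::size_t> offsets;
    for (std::size_t i = 0; i + 10 <= size; ++i) {
        if (code[i] == 0x48 && code[i + 1] == 0xb8) {
            offsets.push_back(i + 2); // the immediate starts right after the opcode
            i += 9;                   // skip the rest of this 10-byte instruction
        }
    }
    return offsets;
}

// Overwrites the 8-byte immediate at `offset` with `value` (little-endian host assumed).
void patchImmediate(std::vector<unsigned char>& code, std::size_t offset, std::uint64_t value) {
    std::memcpy(code.data() + offset, &value, sizeof(value));
}
```

Calling `findMovabsImmediates(loader_template, sizeof(loader_template))` should yield the same three offsets as the constants above. Using `memcpy` for the patch also avoids writing through a misaligned `void**`, which the x64 CPU tolerates but the C++ standard does not guarantee.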

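To understand this code you first need to know what x86 calling conventions are.
There are many ways to pass function arguments at the assembly level:
cdecl puts every argument on the stack, laid out from low to high addresses,
while fastcall puts the first argument in ecx, the second in edx, and the rest on the stack.

The 64-bit Linux programs we want to emulate pass arguments according to the System V AMD64 ABI:
the first arguments go into RDI, RSI, RDX, RCX, R8, R9 in that order, and any further arguments go on the stack.
64-bit Windows uses the Microsoft x64 calling convention instead:
the first arguments go into RCX, RDX, R8, R9 in that order, any further arguments go on the stack,
and the caller must also reserve 32 bytes of shadow space.
To let a Linux program call a function that lives in a Windows program,
the arguments have to be rearranged, and that is exactly what the assembly code above does.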

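The stack layout before the conversion is:

[original return address 8 bytes] [seventh argument] [eighth argument] ...

The stack layout after the conversion is:

[return address 8 bytes] [shadow space 32 bytes] [fifth argument] [sixth argument] [seventh argument] ...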
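
The register shuffle can also be modelled in plain C++ without any machine code. The sketch below is only an illustration under my own assumptions: `SysVCall`, `MsX64Call` and `convert` are hypothetical names, registers are modelled as struct fields, and the shadow space is left out because it holds no arguments.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical model of the argument state on entry under the System V AMD64 ABI.
struct SysVCall {
    std::uint64_t rdi, rsi, rdx, rcx, r8, r9; // arguments 1 to 6
    std::vector<std::uint64_t> stack;          // arguments 7 and up, lowest address first
};

// Hypothetical model of what a Microsoft x64 callee expects.
struct MsX64Call {
    std::uint64_t rcx, rdx, r8, r9;   // arguments 1 to 4
    std::vector<std::uint64_t> stack; // arguments 5 and up, located above the shadow space
};

// Mirrors the stub: arguments 1 to 4 move between registers,
// arguments 5 and 6 move from r8/r9 to the front of the stack arguments.
MsX64Call convert(const SysVCall& in) {
    MsX64Call out{in.rdi, in.rsi, in.rdx, in.rcx, {}};
    out.stack.push_back(in.r8);
    out.stack.push_back(in.r9);
    out.stack.insert(out.stack.end(), in.stack.begin(), in.stack.end());
    return out;
}
```

For a call with eight arguments `1..8`, `convert` leaves `1, 2, 3, 4` in `rcx, rdx, r8, r9` and `5, 6, 7, 8` on the stack. Like the stub, it always spills `r8` and `r9` even when the callee takes fewer than five arguments, which is harmless because the callee simply never reads them.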

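Because an arbitrary number of arguments has to be supported,
the code above keeps the original return address in a thread local variable.
This hurts performance; if the number of arguments is known, more efficient conversion code can be used instead.

With the addresses of the dynamically linked functions in place, the fourth step of the plan is done,
and we can now run the main program.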

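    // Get the entry point
    std::uint64_t entryPointAddress = *reinterpret_cast<const std::uint64_t*>(elfHeader.e_entry);
    void(*entryPointFunc)() = reinterpret_cast<void(*)()>(entryPointAddress);
    // Print the address as a data pointer; streaming a function pointer directly may print it as a bool
    std::cout << "entry point: " << reinterpret_cast<void*>(entryPointAddress) << std::endl;
    std::cout << "====== finish loading elf ======" << std::endl;

    // Run the main program
    // It calls __libc_start_main first, which then calls main
    // The instruction after the call to __libc_start_main is hlt, so __libc_start_main must exit the process itself
    entryPointFunc();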

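The entry point address can be read from the ELF header; it is the address of the _start function.
We cast it to a function pointer of type void() and call it.
At this point the sample program does everything we planned.

The result looks like this:

Image 4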
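
The loader reads `e_entry` by casting the `unsigned char[8]` field to a `std::uint64_t*`, which works on x64 but relies on the host being little endian and tolerant of unaligned reads. As a hedged alternative, here is a small sketch that decodes the field byte by byte; `readLittleEndian64` is a hypothetical helper, not part of the original sample.

```cpp
#include <cstdint>

// Decodes an 8-byte little-endian field, as stored in the Elf64_External_* structs,
// without depending on host byte order or alignment.
std::uint64_t readLittleEndian64(const unsigned char (&field)[8]) {
    std::uint64_t value = 0;
    for (int i = 0; i < 8; ++i) {
        value |= static_cast<std::uint64_t>(field[i]) << (8 * i);
    }
    return value;
}
```

With it, the entry point lookup could be written as `readLittleEndian64(elfHeader.e_entry)`; the result is then cast to `void(*)()` exactly as before.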

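This sample program still has plenty of shortcomings: it does not support 32-bit Linux programs,
it cannot load other Linux shared libraries (so), it does not pass command line arguments, and so on.
It also works differently from Bash On Windows,
because syscalls cannot be emulated in user space.
I hope it gives you a first idea of how executables from another system can be run.
If you want to dig deeper into how syscalls can be emulated,
search for material on the rdmsr and wrmsr instructions.

Finally, here are the links I consulted while writing this sample program:

Correction (2017-10-28): user space can in fact emulate syscalls through the vsyscall mechanism.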
