【gcc编译优化系列】如何(不)回收未发生调用的函数

嵌入式物联网开发 2022-07-11 5050

描述

1 问题场景

大家都知道，我们在开发单片机类的嵌入式固件时，一般使用的FLASH存储空间都是比较有限的，小的可能几十KB，大一点的可能也就几百KB，可以说是寸金寸土的FLASH空间，可容不得我们半点垃圾代码。如果我们在写代码的过程中，随便写一些没用的代码，比如一些测试代码，最后版本释放的时候，这些测试代码又没有删掉，还是参与了编译，那么势必最后这个函数的代码实现就会保留在我们的固件包里面，这样我们的固件包的bin文件大小势必会增加，这显然不是我们想要的。另外，还有一种场景下，有些函数我们使用static修饰的局部函数，只在初始化的时候通过初始化列表的形式调用一下，比如RT-Thread的初始化实现，INIT_DEVICE_EXPORT(device_init_func)，那么我们是不希望这个函数被优化掉的，否则最后会出逻辑问题。在使用GCC作为编译器的环境下，有什么办法可以实现呢？

2 需求分析

这里的需求两点：

没有被调用的函数需要移除，不出现在最后的固件文件里面；

某些特殊的函数实现，没有被显式调用，但是需要保留它，不能被优化掉。

3 需求实现

3.1 示例代码

实现的一个示例代码如下所示，功能很简单就定义了2个没被调用的函数，一个我希望优化移除，一个我希望被优化保留。

#include 

#define CODE_SECTION(x)               __attribute__((section(x)))
#define CODE_KEEP_USED                CODE_SECTION(".text.keep.used.code")

void unused_func1(int a)
{
    printf("a: %d\n", a);
}

CODE_KEEP_USED void unused_func2(int a)
{
    printf("a: %d\n", a);
}

int main(int argc, const char *argv[])
{
    printf("Hello world !\n");
    return 0;
}

3.2 链接脚本

链接脚本是GCC在链接所有目标文件变成可执行文件的时候，需要读取的一个配置文件，该文件决定了最后的可执行文件是如何分布的。我这里使用的是Ubuntu X64平台 GCC默认的链接脚本，由此改造而来。至于如何获取GCC默认的链接脚本，请参考这里的教程。修改后的链接脚本如下：

/* Script for -z combreloc -z separate-code */
/* Copyright (C) 2014-2020 Free Software Foundation, Inc.
   Copying and distribution of this script, with or without modification,
   are permitted in any medium without royalty provided the copyright
   notice and this notice are preserved.  */
OUTPUT_FORMAT("elf64-x86-64", "elf64-x86-64",
          "elf64-x86-64")
OUTPUT_ARCH(i386:x86-64)
ENTRY(_start)
SEARCH_DIR("=/usr/local/lib/x86_64-linux-gnu"); SEARCH_DIR("=/lib/x86_64-linux-gnu"); SEARCH_DIR("=/usr/lib/x86_64-linux-gnu"); SEARCH_DIR("=/usr/lib/x86_64-linux-gnu64"); SEARCH_DIR("=/usr/local/lib64"); SEARCH_DIR("=/lib64"); SEARCH_DIR("=/usr/lib64"); SEARCH_DIR("=/usr/local/lib"); SEARCH_DIR("=/lib"); SEARCH_DIR("=/usr/lib"); SEARCH_DIR("=/usr/x86_64-linux-gnu/lib64"); SEARCH_DIR("=/usr/x86_64-linux-gnu/lib");
SECTIONS
{
  PROVIDE (__executable_start = SEGMENT_START("text-segment", 0x400000)); . = SEGMENT_START("text-segment", 0x400000) + SIZEOF_HEADERS;
  .interp         : { *(.interp) }
  .note.gnu.build-id  : { *(.note.gnu.build-id) }
  .hash           : { *(.hash) }
  .gnu.hash       : { *(.gnu.hash) }
  .dynsym         : { *(.dynsym) }
  .dynstr         : { *(.dynstr) }
  .gnu.version    : { *(.gnu.version) }
  .gnu.version_d  : { *(.gnu.version_d) }
  .gnu.version_r  : { *(.gnu.version_r) }
  .rela.dyn       :
    {
      *(.rela.init)
      *(.rela.text .rela.text.* .rela.gnu.linkonce.t.*)
      *(.rela.fini)
      *(.rela.rodata .rela.rodata.* .rela.gnu.linkonce.r.*)
      *(.rela.data .rela.data.* .rela.gnu.linkonce.d.*)
      *(.rela.tdata .rela.tdata.* .rela.gnu.linkonce.td.*)
      *(.rela.tbss .rela.tbss.* .rela.gnu.linkonce.tb.*)
      *(.rela.ctors)
      *(.rela.dtors)
      *(.rela.got)
      *(.rela.bss .rela.bss.* .rela.gnu.linkonce.b.*)
      *(.rela.ldata .rela.ldata.* .rela.gnu.linkonce.l.*)
      *(.rela.lbss .rela.lbss.* .rela.gnu.linkonce.lb.*)
      *(.rela.lrodata .rela.lrodata.* .rela.gnu.linkonce.lr.*)
      *(.rela.ifunc)
    }
  .rela.plt       :
    {
      *(.rela.plt)
      PROVIDE_HIDDEN (__rela_iplt_start = .);
      *(.rela.iplt)
      PROVIDE_HIDDEN (__rela_iplt_end = .);
    }
  . = ALIGN(CONSTANT (MAXPAGESIZE));
  .init           :
  {
    KEEP (*(SORT_NONE(.init)))
  }
  .plt            : { *(.plt) *(.iplt) }
.plt.got        : { *(.plt.got) }
.plt.sec        : { *(.plt.sec) }
  .text           :
  {
    *(.text.unlikely .text.*_unlikely .text.unlikely.*)
    *(.text.exit .text.exit.*)
    *(.text.startup .text.startup.*)
    *(.text.hot .text.hot.*)
    *(SORT(.text.sorted.*))
    *(.text .stub .text.* .gnu.linkonce.t.*)
    /* .gnu.warning sections are handled specially by elf.em.  */
    *(.gnu.warning)
  }
  .fini           :
  {
    KEEP (*(SORT_NONE(.fini)))
  }
  PROVIDE (__etext = .);
  PROVIDE (_etext = .);
  PROVIDE (etext = .);
  . = ALIGN(CONSTANT (MAXPAGESIZE));
  /* Adjust the address for the rodata segment.  We want to adjust up to
     the same address within the page on the next page up.  */
  . = SEGMENT_START("rodata-segment", ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)));
  .rodata         : { *(.rodata .rodata.* .gnu.linkonce.r.*) }
  .rodata1        : { *(.rodata1) }
  .eh_frame_hdr   : { *(.eh_frame_hdr) *(.eh_frame_entry .eh_frame_entry.*) }
  .eh_frame       : ONLY_IF_RO { KEEP (*(.eh_frame)) *(.eh_frame.*) }
  .gcc_except_table   : ONLY_IF_RO { *(.gcc_except_table .gcc_except_table.*) }
  .gnu_extab   : ONLY_IF_RO { *(.gnu_extab*) }
  /* These sections are generated by the Sun/Oracle C++ compiler.  */
  .exception_ranges   : ONLY_IF_RO { *(.exception_ranges*) }
  /* Adjust the address for the data segment.  We want to adjust up to
     the same address within the page on the next page up.  */
  . = DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE));
  /* Exception handling  */
  .eh_frame       : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) }
  .gnu_extab      : ONLY_IF_RW { *(.gnu_extab) }
  .gcc_except_table   : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) }
  .exception_ranges   : ONLY_IF_RW { *(.exception_ranges*) }
  /* Thread Local Storage sections  */
  .tdata      :
   {
     PROVIDE_HIDDEN (__tdata_start = .);
     *(.tdata .tdata.* .gnu.linkonce.td.*)
   }
  .tbss          : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) }
  .preinit_array    :
  {
    PROVIDE_HIDDEN (__preinit_array_start = .);
    KEEP (*(.preinit_array))
    PROVIDE_HIDDEN (__preinit_array_end = .);
  }
  .init_array    :
  {
    PROVIDE_HIDDEN (__init_array_start = .);
    KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*)))
    KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors))
    PROVIDE_HIDDEN (__init_array_end = .);
  }
  .fini_array    :
  {
    PROVIDE_HIDDEN (__fini_array_start = .);
    KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*)))
    KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors))
    PROVIDE_HIDDEN (__fini_array_end = .);
  }

  /* Here use to keep some sections, which are not wanted to be removed. */
  .text_keep_used_code :
  {
    KEEP (*(.text.keep.used.code))
  }

  .ctors          :
  {
    /* gcc uses crtbegin.o to find the start of
       the constructors, so we make sure it is
       first.  Because this is a wildcard, it
       doesn't matter if the user does not
       actually link against crtbegin.o; the
       linker won't look for a file to match a
       wildcard.  The wildcard also means that it
       doesn't matter which directory crtbegin.o
       is in.  */
    KEEP (*crtbegin.o(.ctors))
    KEEP (*crtbegin?.o(.ctors))
    /* We don't want to include the .ctor section from
       the crtend.o file until after the sorted ctors.
       The .ctor section from the crtend file contains the
       end of ctors marker and it must be last */
    KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors))
    KEEP (*(SORT(.ctors.*)))
    KEEP (*(.ctors))
  }
  .dtors          :
  {
    KEEP (*crtbegin.o(.dtors))
    KEEP (*crtbegin?.o(.dtors))
    KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors))
    KEEP (*(SORT(.dtors.*)))
    KEEP (*(.dtors))
  }
  .jcr            : { KEEP (*(.jcr)) }
  .data.rel.ro : { *(.data.rel.ro.local* .gnu.linkonce.d.rel.ro.local.*) *(.data.rel.ro .data.rel.ro.* .gnu.linkonce.d.rel.ro.*) }
  .dynamic        : { *(.dynamic) }
  .got            : { *(.got) *(.igot) }
  . = DATA_SEGMENT_RELRO_END (SIZEOF (.got.plt) >= 24 ? 24 : 0, .);
  .got.plt        : { *(.got.plt) *(.igot.plt) }
  .data           :
  {
    *(.data .data.* .gnu.linkonce.d.*)
    SORT(CONSTRUCTORS)
  }
  .data1          : { *(.data1) }
  _edata = .; PROVIDE (edata = .);
  . = .;
  __bss_start = .;
  .bss            :
  {
   *(.dynbss)
   *(.bss .bss.* .gnu.linkonce.b.*)
   *(COMMON)
   /* Align here to ensure that the .bss section occupies space up to
      _end.  Align after .bss to ensure correct alignment even if the
      .bss section disappears because there are no input sections.
      FIXME: Why do we need it? When there is no .bss section, we do not
      pad the .data section.  */
   . = ALIGN(. != 0 ? 64 / 8 : 1);
  }
  .lbss   :
  {
    *(.dynlbss)
    *(.lbss .lbss.* .gnu.linkonce.lb.*)
    *(LARGE_COMMON)
  }
  . = ALIGN(64 / 8);
  . = SEGMENT_START("ldata-segment", .);
  .lrodata   ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) :
  {
    *(.lrodata .lrodata.* .gnu.linkonce.lr.*)
  }
  .ldata   ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) :
  {
    *(.ldata .ldata.* .gnu.linkonce.l.*)
    . = ALIGN(. != 0 ? 64 / 8 : 1);
  }
  . = ALIGN(64 / 8);
  _end = .; PROVIDE (end = .);
  . = DATA_SEGMENT_END (.);
  /* Stabs debugging sections.  */
  .stab          0 : { *(.stab) }
  .stabstr       0 : { *(.stabstr) }
  .stab.excl     0 : { *(.stab.excl) }
  .stab.exclstr  0 : { *(.stab.exclstr) }
  .stab.index    0 : { *(.stab.index) }
  .stab.indexstr 0 : { *(.stab.indexstr) }
  .comment       0 : { *(.comment) }
  .gnu.build.attributes : { *(.gnu.build.attributes .gnu.build.attributes.*) }
  /* DWARF debug sections.
     Symbols in the DWARF debugging sections are relative to the beginning
     of the section so we begin them at 0.  */
  /* DWARF 1 */
  .debug          0 : { *(.debug) }
  .line           0 : { *(.line) }
  /* GNU DWARF 1 extensions */
  .debug_srcinfo  0 : { *(.debug_srcinfo) }
  .debug_sfnames  0 : { *(.debug_sfnames) }
  /* DWARF 1.1 and DWARF 2 */
  .debug_aranges  0 : { *(.debug_aranges) }
  .debug_pubnames 0 : { *(.debug_pubnames) }
  /* DWARF 2 */
  .debug_info     0 : { *(.debug_info .gnu.linkonce.wi.*) }
  .debug_abbrev   0 : { *(.debug_abbrev) }
  .debug_line     0 : { *(.debug_line .debug_line.* .debug_line_end) }
  .debug_frame    0 : { *(.debug_frame) }
  .debug_str      0 : { *(.debug_str) }
  .debug_loc      0 : { *(.debug_loc) }
  .debug_macinfo  0 : { *(.debug_macinfo) }
  /* SGI/MIPS DWARF 2 extensions */
  .debug_weaknames 0 : { *(.debug_weaknames) }
  .debug_funcnames 0 : { *(.debug_funcnames) }
  .debug_typenames 0 : { *(.debug_typenames) }
  .debug_varnames  0 : { *(.debug_varnames) }
  /* DWARF 3 */
  .debug_pubtypes 0 : { *(.debug_pubtypes) }
  .debug_ranges   0 : { *(.debug_ranges) }
  /* DWARF Extension.  */
  .debug_macro    0 : { *(.debug_macro) }
  .debug_addr     0 : { *(.debug_addr) }
  .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) }
  /DISCARD/ : { *(.note.GNU-stack) *(.gnu_debuglink) *(.gnu.lto_*) }
}

关键地方在于.textkeepused_code段的定义，其他都是默认链接脚本就已有的内容。

3.3 编译脚本

Ubuntu x64下的编译脚本，支持编译启用回收优化和不启用回收优化的情况，参考如下：

#! /bin/bash -e

CFLAGS="-save-temps=obj -Wall"
LDFLAGS="-Wl,-Map=test.map"

CFLAGS_GC="-fdata-sections -ffunction-sections"
LDFLAGS_GC="-Wl,--gc-sections"
LDFLAGS_MAP_GC="-Wl,-Map=test_gc.map"

PRINT_GC="-Wl,--print-gc-sections"

GCC_LDS=default.lds

if [ "$1" = "clean" ]; then
    rm -rf test* *.i *.s *.o *.map
    echo "Clean build done !"
    exit 0
elif [ "$1" = "gc" ]; then
    echo "gcc compile with gc ..."
    single_c_file=`ls *.c | cut -d . -f 1`
    cmd1="gcc -o $single_c_file.o -c *.c $CFLAGS $CFLAGS_GC"
    cmd2="gcc -o test_gc $single_c_file.o $LDFLAGS_GC $LDFLAGS_MAP_GC -T $GCC_LDS $PRINT_GC"
    echo "$cmd1 && $cmd2" && $cmd1 && $cmd2
else
    echo "gcc compile without gc ... (default)"
    cmd="gcc *.c $CFLAGS $LDFLAGS -o test"
    echo $cmd && $cmd
fi

exit 0

3.4 验证测试

3.4.1 验证不启用编译回收优化的情况

使用./build.sh编译输出各种文件，使用grep-rsn unused_func验证：

**grep -rsnw unused_func1**
main.c:9:void unused_func1(int a)
**Binary file test matches**
test.i:735:void unused_func1(int a)
Binary file test.o matches
test.s:7:       .globl  unused_func1
test.s:8:       .type   unused_func1, @function
test.s:9:unused_func1:
test.s:31:      .size   unused_func1, .-unused_func1
test.map:193:                0x0000000000001169                unused_func1

**grep -rsnw unused_func2**
main.c:14:CODE_KEEP_USED void unused_func2(int a)
**Binary file test matches**
test.i:740:__attribute__((section(".text.keep.used.code"))) void unused_func2(int a)
Binary file test.o matches
test.s:33:      .globl  unused_func2
test.s:34:      .type   unused_func2, @function
test.s:35:unused_func2:
test.s:57:      .size   unused_func2, .-unused_func2
test.map:197:                0x00000000000011b7                unused_func2

我们可以发现，在最后生成的test可执行文件中，都找到了unusedfunc1和testfunc2，也就是说在不启用回收优化的情况下，跟我们之前的预期是一样的，这样增加固件包的尺寸。

3.4.2 验证启用编译回收优化的情况

使用./build.sh gc编译输出各种文件，使用grep-rsn unused_func验证：

**grep -rsnw unused_func1**
Binary file main.o matches
main.c:9:void unused_func1(int a)
test_gc.map:35: .text.unused_func1
main.i:735:void unused_func1(int a)
main.s:6:       .section        .text.unused_func1,"ax",@progbits
main.s:7:       .globl  unused_func1
main.s:8:       .type   unused_func1, @function
main.s:9:unused_func1:
main.s:31:      .size   unused_func1, .-unused_func1

**grep -rsnw unused_func2**
Binary file main.o matches
main.c:14:CODE_KEEP_USED void unused_func2(int a)
test_gc.map:215:                0x0000000000401169                unused_func2
main.i:740:__attribute__((section(".text.keep.used.code"))) void unused_func2(int a)
main.s:33:      .globl  unused_func2
main.s:34:      .type   unused_func2, @function
main.s:35:unused_func2:
main.s:57:      .size   unused_func2, .-unused_func2
**Binary file test_gc matches**

从中，我们发现最后的可执行文件testgc里面只有unusedfunc2，而unused_func1就没回收了，这个就实现了我们前面定义的需求。

4 原理分析

4.1 实现原理

这里实现的原理主要有4个部分：第1部分主要修改的是代码编写阶段，在不希望被回收优化的函数前面添加特殊的段名称，比如__attribute__((section(".text.keep.used.code"))), 第2部分主要修改的是编译阶段，通过在CFLAGS中添加-fdata-sections-ffunction-sections来实现, 第3部分主要修改的是链接阶段，通过在LDFLAGS中添加-Wl,-gc-sections来实现, 第4部分主要修改的是链接脚本，通过在段名称中，新增下面的段申明，主要是为了限制指定的段，不被回收。

.text_keep_used_code :
  {
    KEEP (*(.text.keep.used.code))
  }

从原理上说，编译时使用-fdata-sections-ffunction-sections是为了让data数据和每一个函数都生成特定的段，以函数xxx为例，那么它将会放在.text.xxx段里面，然后在链接阶段的时候使用-Wl,-gc-sections回收那些不使用的段。注意这里回收的最小单位是段，所以编译阶段那两个选项是必不可少的。

4.2 原理验证分析

4.2.1 确认编译阶段的函数所在的段

这个确认我们可以map文件和.s汇编文件就可以确认，

/* map文件中 unused_fun1的描述 */
35  .text.unused_func1
36                 0x0000000000000000       0x28 main.o

/* 文件中 unused_fun2的描述 */
213  .text.keep.used.code
214                 0x0000000000401169       0x28 main.o
215                 0x0000000000401169                unused_func2

/* 汇编文件中 unused_fun1的描述 */
  6         .section        .text.unused_func1,"ax",@progbits
  7         .globl  unused_func1
  8         .type   unused_func1, @function
  9 unused_func1:
 10 .LFB0:
 11         .cfi_startproc
 12         endbr64
 13         pushq   %rbp
 14         .cfi_def_cfa_offset 16
 15         .cfi_offset 6, -16

/* 汇编文件中 unused_fun2的描述 */
 32         .section        .text.keep.used.code,"ax",@progbits
 33         .globl  unused_func2
 34         .type   unused_func2, @function
 35 unused_func2:
 36 .LFB1:
 37         .cfi_startproc
 38         endbr64
 39         pushq   %rbp
 40         .cfi_def_cfa_offset 16
 41         .cfi_offset 6, -16
 42         movq    %rsp, %rbp
 43         .cfi_def_cfa_register 6

从上面的分析，可以知道函数的段分布是完全符合预期的。

4.2.2 确认链接阶段的函数所在的段的回收情况

为了验证这一点，我们可以在LDFLAGS里面添加这一个选项-Wl,--print-gc-sections，这样我们就可以观察到链接阶段最后移除了那些没有引用的段，从而确认unusedfunc1和unusedfunc2是否被回收。输出的关键log如下：

 ./build.sh gc
gcc compile with gc ...
gcc -o main.o -c *.c -save-temps=obj -Wall -fdata-sections -ffunction-sections && gcc -o test_gc main.o -Wl,--gc-sections -Wl,-Map=test_gc.map -T default.lds -Wl,--print-gc-sections
/usr/bin/ld: removing unused section '.rodata.cst4' in file '/usr/lib/gcc/x86_64-linux-gnu/9/../../../x86_64-linux-gnu/Scrt1.o'
/usr/bin/ld: removing unused section '.data' in file '/usr/lib/gcc/x86_64-linux-gnu/9/../../../x86_64-linux-gnu/Scrt1.o'
/usr/bin/ld: removing unused section '.text.unused_func1' in file 'main.o'

log很明显就告诉我们，unused section '.text.unused_func1'被回收移除了，而.text.keep.used.code是没有被回收的，这切好证明了我们的猜想。

5 经验总结

使用-gc-sections可以回收不使用的代码段，从而减少代码尺寸，降低固件占用FLASH的存储空间；
在特定场景下，修改链接脚本可以实现某个函数不被链接优化，达到特定的目的。

6 更多分享

本项目的所有测试代码和编译脚本，均可以在我的github仓库01workstation中找到。

欢迎关注我的github仓库01workstation，日常分享一些开发笔记和项目实战，欢迎指正问题。

同时也非常欢迎关注我的CSDN主页和专栏：

【CSDN主页：架构师李肯】

【RT-Thread主页：架构师李肯】

【C/C++语言编程专栏】

【GCC专栏】

【信息安全专栏】

【RT-Thread开发笔记】

【freeRTOS开发笔记】

有问题的话，可以跟我讨论，知无不答，谢谢大家。

　　审核编辑：汤梓红

打开APP阅读更多精彩内容