Reducing the Memory Footprint of Linux Driver Modules (.ko): A Walkthrough

AVCh_LinuxDev 2023-09-25

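Linux driver modules can be built separately as .ko files. Although a .ko is typically only a few MB in size, on a small Linux system with only a few tens of MB of total memory it is often still well worth optimizing. Using a real example, this article walks through the process of reducing the memory footprint of a .ko in detail.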

1. Stripping the file

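Because a .ko file is a standard ELF file, the first thing that usually comes to mind is shrinking it with the strip command. strip offers the following options for a .ko: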

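strip --strip-all test.ko       # removes all symbols and relocation information; the file shrinks a lot, but the .ko can no longer be loaded with insmod

strip --strip-debug test.ko     # removes only the debug sections; the file shrinks very little, and the .ko still loads with insmod

strip --strip-unneeded test.ko  # removes all symbols not needed for relocation processing; the file shrinks very little, and the .ko still loads with insmod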

The resulting .ko file sizes, in bytes:

6978208 origin-test.ko* // no strip

1984856 strip-all-test.ko* // strip --strip-all

6884544 strip-debug-test.ko* // strip --strip-debug

6830704 strip-unneeded-test.ko* // strip --strip-unneeded
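To put the listing above in proportion, the relative savings can be computed directly from the byte counts. The following is a minimal sketch: the helper name pct_saved is made up for illustration, and it only feeds the sizes shown above to awk.

```shell
#!/bin/sh
# pct_saved ORIGINAL STRIPPED: print the size reduction as a percentage
# with two decimals. (Hypothetical helper, not part of binutils.)
pct_saved() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", (a - b) * 100 / a }'
}

# Byte counts copied from the listing above.
orig=6978208
echo "strip-all:      $(pct_saved "$orig" 1984856)%"
echo "strip-debug:    $(pct_saved "$orig" 6884544)%"
echo "strip-unneeded: $(pct_saved "$orig" 6830704)%"
```

With these inputs the two variants that keep the module loadable save only about 1.3% and 2.1% of the file, while --strip-all saves about 71.6% at the cost of a module that no longer loads.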

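As the numbers show, as long as the .ko must remain loadable, strip does not shrink the file by much. Moreover, after all of these operations, the sizes of the key text/data/bss segments in the .ko have not changed at all: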

$ size *.ko

text data bss dec hex filename

1697671 275791 28367 2001829 1e8ba5 origin-test.ko

1697671 275791 28367 2001829 1e8ba5 strip-all-test.ko

1697671 275791 28367 2001829 1e8ba5 strip-debug-test.ko

1697671 275791 28367 2001829 1e8ba5 strip-unneeded-test.ko

Question 1: Does strip have other options that would trim more? What does strip fundamentally do, and what exactly does it remove?

We read the ELF section information before and after strip and compare the two:

$ readelf -S origin-test.ko

There are 48 section headers, starting at offset 0x6a6ea0:

Section Headers:

[Nr] Name Type Address Offset

Size EntSize Flags Link Info Align

[ 0] NULL 0000000000000000 00000000

0000000000000000 0000000000000000 0 0 0

[ 1] .note.gnu.build-i NOTE 0000000000000000 00000040

0000000000000024 0000000000000000 A 0 0 4

[ 2] .note.Linux NOTE 0000000000000000 00000064

0000000000000018 0000000000000000 A 0 0 4

[ 3] .text PROGBITS 0000000000000000 0000007c

00000000001393d6 0000000000000000 AX 0 0 2

[ 4] .rela.text RELA 0000000000000000 003b9b90

00000000002b7550 0000000000000018 I 45 3 8

[ 5] .text.unlikely PROGBITS 0000000000000000 00139452

0000000000000d74 0000000000000000 AX 0 0 2

[ 6] .rela.text.unlike RELA 0000000000000000 006710e0

0000000000001950 0000000000000018 I 45 5 8

[ 7] .init.text PROGBITS 0000000000000000 0013a1c6

000000000000016e 0000000000000000 AX 0 0 2

[ 8] .rela.init.text RELA 0000000000000000 00672a30

...

$ readelf -S strip-all-test.ko

There are 27 section headers, starting at offset 0x1e4298:

Section Headers:

[Nr] Name Type Address Offset

Size EntSize Flags Link Info Align

[ 0] NULL 0000000000000000 00000000

0000000000000000 0000000000000000 0 0 0

[ 1] .note.gnu.build-i NOTE 0000000000000000 00000040

0000000000000024 0000000000000000 A 0 0 4

[ 2] .note.Linux NOTE 0000000000000000 00000064

0000000000000018 0000000000000000 A 0 0 4

[ 3] .text PROGBITS 0000000000000000 0000007c

00000000001393d6 0000000000000000 AX 0 0 2

[ 4] .text.unlikely PROGBITS 0000000000000000 00139452

0000000000000d74 0000000000000000 AX 0 0 2

[ 5] .init.text PROGBITS 0000000000000000 0013a1c6

000000000000016e 0000000000000000 AX 0 0 2

...

From this output, strip mainly removed the sections whose Flags are I, while sections whose Flags include A cannot be removed. The section flags are defined in the key printed at the end of the readelf output:

Key to Flags:

W (write), A (alloc), X (execute), M (merge), S (strings), I (info),

L (link order), O (extra OS processing required), G (group), T (TLS),

C (compressed), x (unknown), o (OS specific), E (exclude),

p (processor specific)

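We also found that, for a .ko file, the sections whose names begin with .rela. must not be removed, because insmod needs this information. .rela.text, for example, takes up a lot of space, but it cannot simply be stripped away.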
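One way to see exactly which sections a given strip mode removed is to diff the section names before and after. This is a rough sketch under a few assumptions: the helper names section_names and removed_sections are hypothetical, and the parsing expects the one-line-per-section layout printed by readelf -S -W.

```shell
#!/bin/sh
# section_names: read `readelf -S -W` output on stdin and print one
# section name per line, sorted. The unnamed NULL entry is dropped.
section_names() {
    sed -n 's/^ *\[ *[0-9][0-9]*\] *\([^ ][^ ]*\).*/\1/p' | grep -v '^NULL$' | sort
}

# removed_sections BEFORE.ko AFTER.ko: names present before but not after.
removed_sections() {
    before=$(mktemp) && after=$(mktemp) || return 1
    readelf -S -W "$1" | section_names > "$before"
    readelf -S -W "$2" | section_names > "$after"
    comm -23 "$before" "$after"
    rm -f "$before" "$after"
}

# Example invocation, only run when two files are given on the command line.
if [ "$#" -eq 2 ]; then
    removed_sections "$1" "$2"
fi
```

Saved as a script and called with origin-test.ko and strip-all-test.ko as its two arguments, it would list the sections that --strip-all dropped, which makes it easy to check whether any .rela. section is among them.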

Question 2: Do the sections with Flags I in a .ko file occupy memory after the module has been loaded with insmod?

The main flow in the kernel code when a .ko file is loaded dynamically with insmod:

SYSCALL_DEFINE3(finit_module) / SYSCALL_DEFINE3(init_module)
  |→ load_module()
      |→ layout_and_allocate()
      |   |→ setup_load_info()   // info->index.mod = section ".gnu.linkonce.this_module"
      |   |
      |   |→ layout_sections()   // parse the .ko ELF file and tally the sections that must be loaded into memory,
      |   |                      // accumulating their lengths into mod->core_layout.size and mod->init_layout.size
      |   |
      |   |→ layout_symtab()     // parse the .ko ELF file and tally the symbol table that must be loaded into memory,
      |   |                      // accumulating its length into mod->core_layout.size
      |   |
      |   |→ move_module()       // allocate space with vmalloc according to mod->core_layout.size and
      |                          // mod->init_layout.size, then copy the corresponding sections into memory
      |→ apply_relocations()     // apply relocations to the sections loaded into memory
      |→ do_init_module()        // run the driver's module_init() function, then free the mod->init_layout memory

Looking at the code in detail, only sections with the ALLOC attribute (that is, Flags containing A) are counted and copied into memory when the module is loaded:

static void layout_sections(struct module *mod, struct load_info *info)
{
    /* (1) Only sections carrying SHF_ALLOC are recognized */
    static unsigned long const masks[][2] = {
        /* NOTE: all executable code must be the first section
         * in this array; otherwise modify the text_size
         * finder in the two loops below */
        { SHF_EXECINSTR | SHF_ALLOC, ARCH_SHF_SMALL },
        { SHF_ALLOC, SHF_WRITE | ARCH_SHF_SMALL },
        { SHF_RO_AFTER_INIT | SHF_ALLOC, ARCH_SHF_SMALL },
        { SHF_WRITE | SHF_ALLOC, ARCH_SHF_SMALL },
        { ARCH_SHF_SMALL | SHF_ALLOC, 0 }
    };
    unsigned int m, i;

    for (i = 0; i < info->hdr->e_shnum; i++)
        info->sechdrs[i].sh_entsize = ~0UL;

    /* (2) Walk the sections of the .ko file and, using the masks above,
     *     add the ALLOC sections to mod->core_layout.size */
    pr_debug("Core section allocation order:\n");
    for (m = 0; m < ARRAY_SIZE(masks); ++m) {
        for (i = 0; i < info->hdr->e_shnum; ++i) {
            Elf_Shdr *s = &info->sechdrs[i];
            const char *sname = info->secstrings + s->sh_name;

            if ((s->sh_flags & masks[m][0]) != masks[m][0]
                || (s->sh_flags & masks[m][1])
                || s->sh_entsize != ~0UL
                || module_init_section(sname))
                continue;
            s->sh_entsize = get_offset(mod, &mod->core_layout.size, s, i);
            pr_debug("\t%s\n", sname);
        }
        /* per-mask alignment bookkeeping is omitted in this excerpt */
    }

    /* (3) Walk the sections of the .ko file again and add the ALLOC sections
     *     whose names start with '.init' to mod->init_layout.size */
    pr_debug("Init section allocation order:\n");
    for (m = 0; m < ARRAY_SIZE(masks); ++m) {
        for (i = 0; i < info->hdr->e_shnum; ++i) {
            Elf_Shdr *s = &info->sechdrs[i];
            const char *sname = info->secstrings + s->sh_name;

            if ((s->sh_flags & masks[m][0]) != masks[m][0]
                || (s->sh_flags & masks[m][1])
                || s->sh_entsize != ~0UL
                || !module_init_section(sname))
                continue;
            s->sh_entsize = (get_offset(mod, &mod->init_layout.size, s, i)
                             | INIT_OFFSET_MASK);
            pr_debug("\t%s\n", sname);
        }
        /* per-mask alignment bookkeeping is omitted in this excerpt */
    }
}

Sections with Flags I are only consulted by apply_relocations() during relocation; they do not stay resident in memory.
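The same split can be estimated from user space before loading anything, by totalling the section sizes with and without the A flag. The sketch below makes several assumptions: sum_sections is a hypothetical helper, the input is the wide readelf -S -W layout, and the result ignores the alignment padding and the core/init split that layout_sections() applies, so it is only an approximation of what the kernel counts.

```shell
#!/bin/sh
# sum_sections: read `readelf -S -W` output on stdin and print
# "<bytes in ALLOC sections> <bytes in all other sections>".
sum_sections() {
    # Drop the "[Nr]" index so the remaining columns split on whitespace:
    # name type addr off size es [flags] link info align
    sed -n 's/^ *\[ *[0-9][0-9]*\] *//p' | {
        alloc=0
        other=0
        while read -r name type addr off size es flg rest; do
            # Skip rows whose size column is not a hex number.
            case "$size" in *[!0-9a-fA-F]*|'') continue ;; esac
            case "$flg" in
                *A*) alloc=$((alloc + 0x$size)) ;;  # flags contain A (alloc)
                *)   other=$((other + 0x$size)) ;;  # no A flag, or no flags
            esac
        done
        echo "$alloc $other"
    }
}

# Example invocation, only run when a file is given on the command line.
if [ "$#" -eq 1 ]; then
    readelf -S -W "$1" | sum_sections
fi
```

The first number is the part that can end up resident after insmod; the second covers relocation, symbol and debug data that exists only in the file.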

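Conclusion: stripping a .ko file only trims a small number of I sections, so the file shrinks slightly, but the memory used after the module is loaded is not affected at all.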

2. Runtime memory footprint

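Still, life goes on, and the savings have to come from somewhere. Let us look closely at how much memory the key text/data/bss segments occupy as the module is loaded.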

Before loading:

$ size test.ko

text data bss dec hex filename

1697671 275791 28367 2001829 1e8ba5 test.ko

Memory usage after insmod: because the memory is allocated with vmalloc(), we can inspect it through vmallocinfo:

# cat /sys/module/test/coresize

4203425

# cat /sys/module/test/initsize

# cat /proc/vmallocinfo

// core_layout.size occupies 4.2 MB of memory

0x00000000fd4ec521-0x000000007ff17966 4210688 load_module+0x1b86/0x1c8e pages=1027 vmalloc vpages

0x000000007ff17966-0x000000004e29ad2e 16384 load_module+0x1b86/0x1c8e pages=3 vmalloc

As shown, before loading, the text/data/bss segments of test.ko total about 2 MB, but after loading the module occupies 4.2 MB of memory in total.
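The coresize and vmallocinfo figures above can be cross-checked with a little page arithmetic. This is a sketch under two assumptions that are not stated in the article but are consistent with its numbers: a 4 KiB page size, and a vmallocinfo size that covers the data pages plus one guard page. The helper names are hypothetical.

```shell
#!/bin/sh
PAGE_SIZE=4096   # assumption: 4 KiB pages

# pages_for BYTES: number of whole pages needed to hold BYTES.
pages_for() {
    echo $(( ($1 + PAGE_SIZE - 1) / PAGE_SIZE ))
}

# vmalloc_area_bytes BYTES: expected size column in /proc/vmallocinfo,
# assuming the data pages are followed by one guard page.
vmalloc_area_bytes() {
    echo $(( ($(pages_for "$1") + 1) * PAGE_SIZE ))
}

coresize=4203425      # /sys/module/test/coresize from the output above
size_total=2001829    # "dec" column of `size test.ko` from the output above

echo "pages:                $(pages_for "$coresize")"
echo "vmalloc area bytes:   $(vmalloc_area_bytes "$coresize")"
echo "beyond text/data/bss: $((coresize - size_total)) bytes"
```

For the coresize shown above this yields 1027 pages and an area of 4210688 bytes, matching the pages=1027 entry in vmallocinfo, and it puts the unexplained gap at 2201596 bytes.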

Question 3: Why does the module use extra memory after it is loaded?

We added debug output to the kernel code to trace how mod->core_layout.size changes, and finally found the key point:

SYSCALL_DEFINE3(finit_module) / SYSCALL_DEFINE3(init_module)
  |→ load_module()
      |→ layout_and_allocate()
      |   |→ setup_load_info()   // mod->core_layout.size = 0x0
      |   |
      |   |→ layout_sections()   // mod->core_layout.size = 0x1f8390
      |   |
      |   |→ layout_symtab()     // mod->core_layout.size = 0x4023a1
      |   |
      |   |→ move_module()       // allocates according to mod->core_layout.size and mod->init_layout.size

The extra length is added in layout_symtab(). This function only takes effect when CONFIG_KALLSYMS is enabled, and what it stores is the driver module's symbol table.
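The contribution of layout_symtab() follows from the two traced values by plain subtraction. A minimal sketch, using only the hex values from the trace above (the helper name delta_bytes is made up for illustration):

```shell
#!/bin/sh
# Values of mod->core_layout.size taken from the trace above.
after_sections=0x1f8390   # after layout_sections()
after_symtab=0x4023a1     # after layout_symtab()

# delta_bytes A B: print B - A in decimal; accepts 0x... or decimal input.
delta_bytes() {
    echo $(( $2 - $1 ))
}

symtab=$(delta_bytes "$after_sections" "$after_symtab")
total=$(( $after_symtab ))

echo "added by layout_symtab(): $symtab bytes"
echo "final core_layout.size:   $total bytes"
awk -v s="$symtab" -v t="$total" \
    'BEGIN { printf "share of core size:       %.1f%%\n", s * 100 / t }'
```

With these inputs the symbol table accounts for 2138129 of 4203425 bytes, roughly half of the module's core allocation.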

In most cases we do not need the module symbol table, so we can turn off the kernel's CONFIG_KALLSYMS option and check the memory usage again:

# cat /sys/module/test/coresize

2092876

# cat /sys/module/test/initsize

# cat /proc/vmallocinfo

// core_layout.size occupies about 2.1 MB of memory

0x000000009e1c62e8-0x000000001024ef17 2097152 0xffffffff8006f3de pages=511 vmalloc

0x000000004070c817-0x00000000cc1b6736 28672 0xffffffff41534922 pages=6 vmalloc

The extra memory, about 2.1 MB (4203425 - 2092876 = 2110549 bytes), has been trimmed away cleanly.
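Before comparing measurements between two kernels, it helps to confirm which state CONFIG_KALLSYMS is actually in. The following is a sketch only: kconfig_state is a hypothetical helper that parses standard .config text, and the example at the end assumes the running kernel exposes /proc/config.gz, which requires CONFIG_IKCONFIG_PROC.

```shell
#!/bin/sh
# kconfig_state OPTION: read kernel .config text on stdin and print
# "y", "m", "n" or "unknown" for OPTION.
kconfig_state() {
    awk -v opt="$1" '
        $0 == opt "=y"               { state = "y" }
        $0 == opt "=m"               { state = "m" }
        $0 == "# " opt " is not set" { state = "n" }
        END { print (state == "" ? "unknown" : state) }
    '
}

# Example: inspect the running kernel if it exposes its configuration.
if [ -r /proc/config.gz ]; then
    zcat /proc/config.gz | kconfig_state CONFIG_KALLSYMS
fi
```

The same helper can be pointed at the .config in a kernel build tree by redirecting that file to its standard input.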

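However, this method only reduces the static memory footprint of the .ko; memory that the driver allocates dynamically can only be reduced by analyzing the code logic.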

Conclusion: turning off CONFIG_KALLSYMS removes the memory used by the .ko module's symbol table, and the savings are well worth having.

Editor: 黄飞
