Turning a .o into a .ko: A Strange Trip Through the ELF Format

How It Started

I had a piece of user-space code that needed to run inside the kernel. The normal approach would be to recompile it through the kernel build system, but that means maintaining a Kbuild setup, and rewriting all the user-space idioms in the code is genuinely painful.

That got me thinking: could I just "convert" the already-compiled binary directly into a .ko?

Intuitively this seemed feasible. A .ko is, after all, just an ELF relocatable file (ET_REL), and an ordinary .o produced by the compiler is also ET_REL. The skeleton of the format is identical. What differs is metadata.

Once I actually started, I discovered there were far more pitfalls than I had imagined.

Background: Four ELF Types, Two Different Worlds

Let me clarify a few basic concepts first. ELF files come in four types:

Type	Description	Who handles relocation
ET_REL (.o, .ko)	Relocatable, unlinked	Linker / kernel module loader
ET_DYN (.so)	Dynamic shared library, position-independent	ld.so (user space)
ET_EXEC	Executable	Kernel ELF loader
ET_CORE	Core dump	—

The key insight: a .so is ET_DYN, a .ko is ET_REL. They are different ELF types from the start.

Why You Can't Just Convert a .so

A .so is ET_DYN (a dynamic shared library), and it differs from a .ko in fundamental ways.

1. The kernel rejects it outright

The first thing the kernel does when loading a module is check the ELF type. It must be ET_REL, otherwise the kernel returns -ENOEXEC immediately. No matter what else you do, an ET_DYN .so cannot even get past the front door. This check lives in elf_validity_check() in kernel/module.c:

// kernel/module.c: elf_validity_check()
if (memcmp(info->hdr->e_ident, ELFMAG, SELFMAG) != 0
    || info->hdr->e_type != ET_REL         // only ET_REL is accepted
    || !elf_check_arch(info->hdr)
    || info->hdr->e_shentsize != sizeof(Elf_Shdr))
    return -ENOEXEC;

2. Too much dynamic linking machinery

A .so is stuffed with dynamic linking infrastructure: .dynamic, .dynsym, .dynstr, .hash, .gnu.hash, .got, .got.plt, .plt.got, .rel.dyn, .rel.plt, .interp, and so on. These sections carry information that is meaningless to the kernel module loader and must all be removed.

3. Symbol names carry version suffixes

Symbols in a .so look like this:

puts@GLIBC_2.2.5
malloc@GLIBC_2.0

The kernel's exported symbols carry no @ suffix. If you look up a kernel symbol using a versioned name, the kernel's simplify_symbols() calls resolve_symbol_wait(), which does a strict strcmp comparison. It will never match.

4. PLT/GOT relocations are a mess

Function calls in a .so default to going through the PLT (Procedure Linkage Table), which generates a large number of PLT-related relocation entries. The kernel loader's apply_relocations() walks every SHT_RELA section and processes relocations one by one, including these PLT entries. But the kernel's relocation logic is completely different from user-space ld.so, so the results are wrong.

Conclusion: compile with gcc -c -fPIC to produce a .o (which is ET_REL), then convert directly from ET_REL to ET_REL.

The Kernel's Full Module Loading Path

Before diving into specific pitfalls, let me walk through the entire module loading path in the kernel source code (using Linux 5.10 as a reference). The root cause of every pitfall below can be traced to some step in this chain.

Step 1: ELF validity check — elf_validity_check()

// kernel/module.c
static int elf_validity_check(struct load_info *info)
{
    if (info->len < sizeof(*(info->hdr)))
        return -ENOEXEC;

    if (memcmp(info->hdr->e_ident, ELFMAG, SELFMAG) != 0
        || info->hdr->e_type != ET_REL          // (1) must be ET_REL
        || !elf_check_arch(info->hdr)            // (2) architecture must match
        || info->hdr->e_shentsize != sizeof(Elf_Shdr))
        return -ENOEXEC;

    // (3) section header table must be within file bounds
    if (info->hdr->e_shoff >= info->len
        || (info->hdr->e_shnum * sizeof(Elf_Shdr) >
            info->len - info->hdr->e_shoff))
        return -ENOEXEC;

    info->sechdrs = (void *)info->hdr + info->hdr->e_shoff;
    // ... also validates the section name string table index
}

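Four hard checks sit in that first condition: ELF magic, ET_REL type, architecture match, and section header entry size. Failing any of them returns -ENOEXEC immediately. This is why a .so cannot be loaded. Patching the ELF header to change ET_DYN into ET_REL only gets a file past this one check; the dynamic-linking metadata described earlier is still there for every later step to trip over. On ARM64, elf_check_arch() only compares e_machine; the section structure is validated later, in module_frob_arch_sections() (Step 3).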
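The header checks quoted above can be replayed in user space before a file ever reaches insmod. This is a minimal sketch for 64-bit ELF only; the function name looks_like_module_elf and the expected_machine parameter are my own, and the e_machine comparison is a stand-in for the kernel's elf_check_arch().

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if buf passes the same header checks
 * that elf_validity_check() applies, 0 otherwise. 64-bit ELF only. */
int looks_like_module_elf(const void *buf, size_t len, unsigned expected_machine)
{
    const Elf64_Ehdr *hdr = buf;

    if (len < sizeof(*hdr))
        return 0;
    if (memcmp(hdr->e_ident, ELFMAG, SELFMAG) != 0)
        return 0;
    if (hdr->e_type != ET_REL)                 /* a .so (ET_DYN) stops here */
        return 0;
    if (hdr->e_machine != expected_machine)    /* stand-in for elf_check_arch() */
        return 0;
    if (hdr->e_shentsize != sizeof(Elf64_Shdr))
        return 0;
    /* the section header table must lie inside the file */
    if (hdr->e_shoff >= len ||
        (size_t)hdr->e_shnum * sizeof(Elf64_Shdr) > len - hdr->e_shoff)
        return 0;
    return 1;
}
```

Passing EM_X86_64 or EM_AARCH64 as expected_machine selects the target; a file that fails here would be rejected by the kernel with -ENOEXEC as well.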

Step 2: Kernel metadata check — check_modinfo()

// kernel/module.c
static int check_modinfo(struct module *mod, struct load_info *info, int flags)
{
    const char *modmagic = get_modinfo(info, "vermagic");  // extracted from .modinfo
    int err;

    if (flags & MODULE_INIT_IGNORE_VERMAGIC)
        modmagic = NULL;

    if (!modmagic) {
        err = try_to_force_load(mod, "bad vermagic");  // missing vermagic taints the kernel
        if (err)
            return err;
    } else if (!same_magic(modmagic, vermagic, info->index.vers)) {
        pr_err("%s: version magic '%s' should be '%s'\n",
               info->name, modmagic, vermagic);         // vermagic mismatch -> reject
        return -ENOEXEC;
    }

    if (!get_modinfo(info, "intree")) {                  // check whether in-tree
        if (!test_taint(TAINT_OOT_MODULE))
            pr_warn("%s: loading out-of-tree module taints kernel.\n", mod->name);
        add_taint_module(mod, TAINT_OOT_MODULE, LOCKDEP_STILL_OK);
    }

    check_modinfo_retpoline(mod, info);                  // retpoline check
    // ... staging, livepatch checks, and so on
}

The vermagic comparison logic lives in same_magic():

// kernel/module.c
static inline int same_magic(const char *amagic, const char *bmagic,
                             bool has_crcs)
{
    if (has_crcs) {
        amagic += strcspn(amagic, " ");
        bmagic += strcspn(bmagic, " ");
    }
    return strcmp(amagic, bmagic) == 0;    // strict string comparison
}

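The has_crcs argument is info->index.vers, so it is true when the module carries a __versions section, which in practice means a CONFIG_MODVERSIONS build. In that case the comparison skips the prefix up to the first space (which is UTS_RELEASE) before comparing. Otherwise it is a full strcmp.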
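To see what this comparison accepts and rejects, it can be copied into user space almost verbatim. The sketch below is such a copy under a name of my own (vermagic_matches); it is meant for experimenting with candidate strings, not as a replacement for reading the target kernel's source.

```c
#include <stdbool.h>
#include <string.h>

/* User-space copy of the comparison. has_crcs is true when the module
 * carries a __versions section. */
int vermagic_matches(const char *module_magic, const char *kernel_magic,
                     bool has_crcs)
{
    if (has_crcs) {
        /* skip everything before the first space, i.e. UTS_RELEASE */
        module_magic += strcspn(module_magic, " ");
        kernel_magic += strcspn(kernel_magic, " ");
    }
    return strcmp(module_magic, kernel_magic) == 0;
}
```

With has_crcs set, two strings that differ only in the release prefix compare equal; without it, even a single trailing space makes the comparison fail.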

Step 3: Architecture-specific processing — module_frob_arch_sections()

Before allocating module memory, the kernel calls an architecture hook to inspect and preprocess the section structure. The ARM64 implementation (arch/arm64/kernel/module-plts.c) is particularly worth a look:

// arch/arm64/kernel/module-plts.c
int module_frob_arch_sections(Elf_Ehdr *ehdr, Elf_Shdr *sechdrs,
                              char *secstrings, struct module *mod)
{
    unsigned long core_plts = 0;
    unsigned long init_plts = 0;
    Elf_Shdr *tramp = NULL;
    int i;

    for (i = 0; i < ehdr->e_shnum; i++) {
        if (!strcmp(secstrings + sechdrs[i].sh_name, ".plt"))
            mod->arch.core.plt_shndx = i;
        else if (!strcmp(secstrings + sechdrs[i].sh_name, ".init.plt"))
            mod->arch.init.plt_shndx = i;
        else if (!strcmp(secstrings + sechdrs[i].sh_name,
                         ".text.ftrace_trampoline"))
            tramp = sechdrs + i;
    }

    if (!mod->arch.core.plt_shndx || !mod->arch.init.plt_shndx) {
        pr_err("%s: module PLT section(s) missing\n", mod->name);
        return -ENOEXEC;         // .plt and .init.plt are both required
    }
    // ... PLT entry pre-allocation logic
}

The ARM64 module linker script confirms this (arch/arm64/include/asm/module.lds.h):

// arch/arm64/include/asm/module.lds.h
SECTIONS {
    .plt 0 : { BYTE(0) }
    .init.plt 0 : { BYTE(0) }
    .text.ftrace_trampoline 0 : { BYTE(0) }
}

A .ko built by the kernel toolchain naturally carries these three sections, each holding nothing more than a single placeholder byte. If you build a .ko yourself and .plt or .init.plt is missing, ARM64's module_frob_arch_sections() returns -ENOEXEC directly. x86_64 has no such hard requirement.
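The same name scan is easy to run on a converted file before shipping it to an ARM64 device. A minimal sketch, taking the same inputs the kernel hook receives (section header array plus section name string table); the function name has_arm64_plt_sections is hypothetical.

```c
#include <elf.h>
#include <string.h>

/* Hypothetical pre-flight check: returns 1 if both .plt and .init.plt
 * appear among the section headers, as the ARM64 hook requires. */
int has_arm64_plt_sections(const Elf64_Shdr *sechdrs, unsigned shnum,
                           const char *secstrings)
{
    int have_plt = 0, have_init_plt = 0;
    unsigned i;

    for (i = 0; i < shnum; i++) {
        const char *name = secstrings + sechdrs[i].sh_name;

        if (strcmp(name, ".plt") == 0)
            have_plt = 1;
        else if (strcmp(name, ".init.plt") == 0)
            have_init_plt = 1;
    }
    return have_plt && have_init_plt;
}
```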

Step 4: Symbol resolution — simplify_symbols()

// kernel/module.c
static int simplify_symbols(struct module *mod, const struct load_info *info)
{
    Elf_Shdr *symsec = &info->sechdrs[info->index.sym];
    Elf_Sym *sym = (void *)symsec->sh_addr;
    unsigned long secbase;
    unsigned int i;
    int ret = 0;
    const struct kernel_symbol *ksym;

    for (i = 1; i < symsec->sh_size / sizeof(Elf_Sym); i++) {
        const char *name = info->strtab + sym[i].st_name;

        switch (sym[i].st_shndx) {

        case SHN_UNDEF:                              // key case: undefined symbol
            ksym = resolve_symbol_wait(mod, info, name);
            if (ksym && !IS_ERR(ksym)) {
                sym[i].st_value = kernel_symbol_value(ksym);
                break;                                // found in the kernel symbol table
            }
            if (!ksym && ELF_ST_BIND(sym[i].st_info) == STB_WEAK)
                break;                                // weak symbol, may be missing
            ret = PTR_ERR(ksym) ?: -ENOENT;
            pr_warn("%s: Unknown symbol %s (err %d)\n",
                    mod->name, name, ret);
            break;                                    // not found -> failure

        case SHN_ABS:
            break;                                    // absolute symbol, no relocation needed

        default:
            secbase = info->sechdrs[sym[i].st_shndx].sh_addr;
            sym[i].st_value += secbase;               // section-local: add section base
            break;
        }
    }
    return ret;
}

The core logic:

shndx == SHN_UNDEF (value 0) → search the kernel symbol table; on success, write the kernel address into st_value.
shndx == SHN_ABS → already absolute, untouched.
Otherwise → symbol defined within a section; st_value is added to the section's load base address.

So the shndx of any UND symbol must be 0. If you rewrite it to SHN_ABS (0xFFF1) during conversion, the resolve_symbol_wait branch is never taken, the kernel never consults the symbol table, and all external references end up dangling.

Step 5: Relocation processing — apply_relocations()

// kernel/module.c
static int apply_relocations(struct module *mod, const struct load_info *info)
{
    unsigned int i;
    int err = 0;

    for (i = 1; i < info->hdr->e_shnum; i++) {
        unsigned int infosec = info->sechdrs[i].sh_info;  // target section index

        if (infosec >= info->hdr->e_shnum)
            continue;

        if (!(info->sechdrs[infosec].sh_flags & SHF_ALLOC))
            continue;                                     // skip non-allocated sections

        if (info->sechdrs[i].sh_type == SHT_REL)
            err = apply_relocate(info->sechdrs, info->strtab,
                                 info->index.sym, i, mod);
        else if (info->sechdrs[i].sh_type == SHT_RELA)
            err = apply_relocate_add(info->sechdrs, info->strtab,
                                     info->index.sym, i, mod);  // handle RELA
        if (err < 0)
            break;
    }
    return err;
}

This loop walks every section header, finds those of type SHT_RELA (relocation sections), and calls the architecture-specific apply_relocate_add() to process each entry.

.rela.gnu.linkonce.this_module is processed right here. When the kernel iterates to this section, it writes the final addresses of init_module and cleanup_module to the corresponding offsets within .gnu.linkonce.this_module. Those offsets are exactly where the init/exit function pointers live inside struct module.

Step 6: Kicking off initialization — do_init_module()

// kernel/module.c
static noinline int do_init_module(struct module *mod)
{
    int ret = 0;
    struct mod_initfree *freeinit;

    freeinit = kmalloc(sizeof(*freeinit), GFP_KERNEL);
    freeinit->module_init = mod->init_layout.base;

    do_mod_ctors(mod);

    if (mod->init != NULL)                     // the critical check
        ret = do_one_initcall(mod->init);       // invoked via function pointer

    if (ret < 0)
        goto fail_free_freeinit;

    mod->state = MODULE_STATE_LIVE;
    // ... uevent, async sync, free init memory
}

The kernel invokes the mod->init function pointer. Its value was written by the relocation handling in Step 5. If the .rela.gnu.linkonce.this_module section is missing, or if the offsets are wrong, mod->init stays NULL, the kernel silently skips the entire init flow, and no error is reported.

The same applies on unload (the delete_module system call in kernel/module.c):

// kernel/module.c: module unload path
if (mod->exit != NULL)
    mod->exit();

mod->exit is also populated via relocation. Two function pointers, two relocations, neither optional.

Step 7: CRC verification — check_version()

If the kernel is built with CONFIG_MODVERSIONS, every external symbol reference must have its CRC verified:

// kernel/module.c
static int check_version(const struct load_info *info,
                         const char *symname,
                         struct module *mod,
                         const s32 *crc)
{
    Elf_Shdr *sechdrs = info->sechdrs;
    unsigned int versindex = info->index.vers;
    unsigned int i, num_versions;
    struct modversion_info *versions;

    if (!crc)
        return 1;                          // kernel did not provide a CRC, allow
    if (versindex == 0)
        return try_to_force_load(mod, symname) == 0;  // module lacks __versions

    versions = (void *)sechdrs[versindex].sh_addr;
    num_versions = sechdrs[versindex].sh_size
        / sizeof(struct modversion_info);

    for (i = 0; i < num_versions; i++) {
        u32 crcval = *crc;                 // simplified: relative-CRC handling omitted

        if (strcmp(versions[i].name, symname) != 0)
            continue;
        if (versions[i].crc == crcval)
            return 1;                      // CRC matches
        goto bad_version;                  // name matches but CRC does not -> fail
    }

    pr_warn_once("%s: no symbol version for %s\n", info->name, symname);
    return 1;                              // no entry found, warn and continue

bad_version:
    pr_warn("%s: disagrees about version of symbol %s\n",
            info->name, symname);
    return 0;
}

The module's __versions section stores an array of struct modversion_info (64 bytes each: CRC plus symbol name). The kernel compares CRCs one by one; a mismatch causes the load to fail. The CRC of the module_layout symbol effectively represents the structural signature of the entire struct module.
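For the patching stage it helps to be able to read a reference module's __versions data directly. The sketch below assumes a 64-bit target, where each entry is an 8-byte CRC slot followed by a 56-byte name; the struct and function names are mine and only mirror the kernel's struct modversion_info.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout for a 64-bit target: 8-byte CRC slot + 56-byte name. */
struct modversion_entry {
    uint64_t crc;
    char name[64 - sizeof(uint64_t)];
};

/* Returns 1 and stores the CRC if symname has an entry, 0 otherwise. */
int find_symbol_crc(const struct modversion_entry *versions, size_t count,
                    const char *symname, uint32_t *crc_out)
{
    size_t i;

    for (i = 0; i < count; i++) {
        if (strcmp(versions[i].name, symname) == 0) {
            *crc_out = (uint32_t)versions[i].crc;
            return 1;
        }
    }
    return 0;
}
```

The entry count is the section's sh_size divided by 64, the same division the kernel performs.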

That covers every kernel checkpoint between insmod and the actual execution of init. Now let me walk through the specific pitfalls encountered during conversion.

The Core Idea: a Two-Stage Pipeline

Once the kernel loading path was clear, I designed a two-stage pipeline:

Development host                      Target device
----------------                      -------------
.o file                               reference.ko (any existing module)
  |                                       |
  v                                       v
[offline conversion] -> .ko (with placeholders) -> [in-place patching] -> .ko (loadable)

Stage one (offline conversion) runs on the development host and handles the structural ELF transformation: drop sections that are not needed, keep sections that are, add the required kernel metadata sections, and re-index symbols and relocations. Anything that depends on the target kernel is filled in with a placeholder.

Stage two (in-place patching) runs on the target device. Pick an existing .ko on the device that loads cleanly, treat it as a "reference," and extract every kernel-specific parameter from it to overwrite the placeholders.

The point of this design is that the conversion tool does not need to know anything about the target kernel. vermagic, the size of struct module, field offsets, CRCs — all of it comes from the reference .ko.

A Full Catalogue of Pitfalls

Sorted roughly by how hard they were to find.

Pitfall 1: Section Allow-list and Block-list

The first step of conversion is deciding which sections to keep and which to drop.

Sections that must be dropped:

Everything related to dynamic linking (.dynamic, .dynsym, .dynstr, .hash, .gnu.hash, and roughly ten others).
GOT/PLT-related sections (.got, .got.plt, .plt.got, .plt.sec).
Original relocation sections (.rel.*, .rela.*) — section indices have changed, so the old relocation entries reference stale indices and must be discarded and rebuilt against the new section table.

Sections that must be kept:

.text, .data, .rodata, .bss (basic code and data).
.init_array, .eh_frame, and other auxiliary sections.
.comment, .note.*, and other informational sections.

Empty sections that must be created on ARM64:

The module_frob_arch_sections() source above shows that ARM64 looks up .plt and .init.plt by name and returns -ENOEXEC if they are missing. The ARM64 linker script module.lds.h also explicitly defines these three empty sections. So the conversion stage must produce:

.plt — 12 bytes, SHT_NOBITS, SHF_EXECINSTR | SHF_ALLOC
.init.plt — 12 bytes, SHT_NOBITS, SHF_EXECINSTR | SHF_ALLOC
.text.ftrace_trampoline — 12 bytes, SHT_NOBITS, SHF_EXECINSTR | SHF_ALLOC

If .plt or .init.plt is missing, the kernel rejects the module with only "module PLT section(s) missing", with no indication of which one.

Original relocation sections must be deleted. Once sections have been removed and section indices reshuffled, the section references inside the old relocation entries are invalid. If old and new relocation sections coexist (for example two .rela.text sections), the loader processes the second batch and finds non-zero values already present at the target locations, producing "Invalid relocation target, existing value is nonzero".
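The drop rules above can be expressed as a single name filter. This is a minimal sketch: the list of exact names is illustrative rather than exhaustive, and the function name should_drop_section is hypothetical.

```c
#include <string.h>

/* Returns 1 if a section from the input .o should be dropped.
 * The exact-name list is illustrative, not exhaustive. */
int should_drop_section(const char *name)
{
    static const char *const exact[] = {
        ".dynamic", ".dynsym", ".dynstr", ".hash", ".gnu.hash",
        ".interp", ".got", ".got.plt", ".plt.got", ".plt.sec",
    };
    size_t i;

    for (i = 0; i < sizeof(exact) / sizeof(exact[0]); i++)
        if (strcmp(name, exact[i]) == 0)
            return 1;

    /* old relocation sections are rebuilt, never copied */
    if (strncmp(name, ".rela.", 6) == 0 || strncmp(name, ".rel.", 5) == 0)
        return 1;
    return 0;
}
```

Because every original .rel.* and .rela.* section is dropped here, the output can never end up with two relocation sections for the same target.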

Pitfall 2: ET_REL Addresses Are Section-Relative; Your ELF Library May Hand You Absolute Ones

Every address in an ET_REL file is section-relative:

A symbol's st_value is the offset of that symbol within its containing section.
A relocation's r_offset is the offset within the target section.

For example, a symbol's value should look like 0x10 ("the function entry is at offset 0x10 in .text"), not a virtual address like 0x7f0000001000.

The corresponding logic in the kernel is the default branch of simplify_symbols():

default:
    secbase = info->sechdrs[sym[i].st_shndx].sh_addr;  // load base of section
    sym[i].st_value += secbase;                          // st_value + base = absolute address

The kernel assumes st_value is a section offset and adds the section's load base to obtain the final address. If your st_value is already an absolute VA, adding the base sends it off into the void.

But certain ELF parsing library APIs return absolute virtual addresses when processing ET_REL, because internally they go through the ET_DYN/ET_EXEC code path.

The fix is to explicitly subtract the section's virtual base from every symbol value and every relocation offset. Do not rely on the assumption that "all section VAs in an ET_REL are zero."
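The subtraction itself is trivial, but it is worth wrapping in a helper that also rejects addresses outside the section. A minimal sketch with a hypothetical name (va_to_section_offset); sec_addr and sec_size are whatever the parsing library reports for the containing section.

```c
#include <stdint.h>

/* Converts an address reported by a parsing library into a
 * section-relative offset. Returns 0 on success, -1 if va lies
 * outside [sec_addr, sec_addr + sec_size]. */
int va_to_section_offset(uint64_t va, uint64_t sec_addr, uint64_t sec_size,
                         uint64_t *offset_out)
{
    if (va < sec_addr || va - sec_addr > sec_size)
        return -1;
    *offset_out = va - sec_addr;
    return 0;
}
```

When the section's address really is zero the helper is a no-op, so it is safe to apply unconditionally. The upper bound is inclusive so that end-of-section marker symbols survive.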

Pitfall 3: Architecture Prefixes Hidden in Relocation Types

The ELF parsing library used here (LIEF) encodes architecture information into the high bits of relocation type values:

Relocation type	Standard value	LIEF returns
R_X86_64_64	1	0x08000001
R_X86_64_PC32	2	0x08000002
R_AARCH64_ABS64	257 (0x101)	0x10000101

If you write LIEF's encoded type back into the ELF's r_info field, the receiver decodes it per spec and gets a completely different number. Masking with 0x7FFFFFF (the low 27 bits, which sit below LIEF's architecture tag) strips the prefix and leaves the standard type value, which then goes into the low 32 bits of r_info.
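In C the masking and the r_info packing fit in one small helper. This sketch assumes the tagged type has already been read out of the parsing library as a plain integer; the macro and function names are my own, and ELF64_R_INFO comes from elf.h.

```c
#include <elf.h>
#include <stdint.h>

#define LIB_RELOC_TYPE_MASK 0x7FFFFFFu   /* low 27 bits, as described above */

/* Rebuilds r_info from a new symbol index and a library-tagged type. */
uint64_t make_r_info(uint32_t new_sym_index, uint32_t tagged_type)
{
    uint32_t elf_type = tagged_type & LIB_RELOC_TYPE_MASK;

    return ELF64_R_INFO((uint64_t)new_sym_index, elf_type);
}
```

An untagged standard value passes through the mask unchanged, so the helper is safe to call either way.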

Pitfall 4: Do Not Touch UND Symbols

External symbols referenced by a kernel module — _printk, kmalloc, kfree — have shndx = 0 (SHN_UNDEF) in the original file.

Recall the simplify_symbols() source:

case SHN_UNDEF:
    ksym = resolve_symbol_wait(mod, info, name);  // only taken when shndx == 0
    if (ksym && !IS_ERR(ksym)) {
        sym[i].st_value = kernel_symbol_value(ksym);
        break;
    }

The kernel uses shndx == SHN_UNDEF to decide whether to search the global symbol table. SHN_UNDEF is defined as 0.

Removing and adding sections during conversion requires remapping section indices. It is very easy to write something like this:

if (orig_shndx > 0 && orig_shndx < SHN_ABS)
    new_shndx = section_map[orig_shndx];  // map to new section index (section_map is illustrative)
else
    new_shndx = SHN_ABS;                  // BUG: shndx == 0 falls into this branch!

shndx = 0 gets silently rewritten to SHN_ABS (0xFFF1). The kernel sees SHN_ABS, takes the break branch, and leaves st_value untouched — which, for a UND symbol, means zero. Every external call dangles, and the kernel never even attempts a symbol lookup.

Lesson: when shndx is 0, leave it as 0. A single if (orig_shndx == 0) guard at the top is enough.
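A guarded version of the remapping might look like the sketch below. The names (remap_shndx, section_map) are hypothetical; section_map[i] is assumed to hold the new index of original section i, with 0 meaning the section was dropped.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Remaps a symbol's section index into the new section table.
 * Returns 0 on success, -1 if the symbol's section was dropped
 * (the caller should then drop the symbol too). */
int remap_shndx(uint16_t orig_shndx, const uint16_t *section_map,
                size_t map_len, uint16_t *new_shndx)
{
    if (orig_shndx == SHN_UNDEF || orig_shndx >= SHN_LORESERVE) {
        *new_shndx = orig_shndx;   /* UND stays 0; ABS/COMMON stay as-is */
        return 0;
    }
    if (orig_shndx >= map_len || section_map[orig_shndx] == 0)
        return -1;                 /* section no longer exists */
    *new_shndx = section_map[orig_shndx];
    return 0;
}
```

Handling SHN_UNDEF first means no later branch can turn an external reference into something the kernel will not look up.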

Pitfall 5: Symbols With Empty Names Are Not Necessarily Garbage

This is the most counterintuitive trap.

When writing the symbol filter, it is natural to skip symbols whose name is empty, value is zero, and size is zero. But there is a symbol type called STT_SECTION that represents "the section itself." Its name is genuinely empty, its value may be zero, but it is an essential relocation target.

When does a relocation reference an STT_SECTION symbol?

String constants in .rodata → the relocation needs the .rodata base address.
Exception handling tables in .text (eh_frame) → the relocation needs the .text base.
Anywhere a relocation needs a section base as its computation anchor.

These relocations bind to the STT_SECTION symbol via the symbol index stored in r_info; the STT_SECTION symbol's shndx identifies the target section; and simplify_symbols()'s default branch finally adds the section base.

If STT_SECTION symbols are pruned as "invalid entries," any relocation that referenced them no longer finds a target — it either points at symbol zero (the null symbol) or runs out of bounds.

Lesson: a symbol's fate cannot be decided by name and value alone. STT_SECTION symbols must be preserved, and a mapping from original section index to new symbol index must be built.
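A symbol filter that respects this rule only needs to test the type before it tests for emptiness. A minimal sketch, with a hypothetical function name (keep_symbol); the name argument is the string already resolved from the string table.

```c
#include <elf.h>

/* Returns 1 if the symbol must be carried over. name may be "". */
int keep_symbol(const Elf64_Sym *sym, const char *name)
{
    if (ELF64_ST_TYPE(sym->st_info) == STT_SECTION)
        return 1;              /* nameless, but relocations depend on it */
    if (name[0] == '\0' && sym->st_value == 0 && sym->st_size == 0)
        return 0;              /* genuinely empty entry */
    return 1;
}
```

For every STT_SECTION symbol that passes, the converter should also record which new symbol index now stands for its st_shndx, so relocations can be pointed at it later.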

Pitfall 6: Symbol Version Suffixes

When extracting symbols from a .so (even if you never plan to feed a .so into the pipeline, this is worth knowing), symbol names may carry version suffixes:

puts@GLIBC_2.2.5
__cxa_atexit@GLIBC_2.2.5

This is GNU's symbol versioning mechanism. The kernel's resolve_symbol_wait() does a straightforward strcmp; a @GLIBC_2.2.5 suffix obviously will not match. When building the output symbol table, locate the @ character and truncate.
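The truncation is a one-liner with strchr. The sketch below works in place on a writable copy of the name; the function name strip_symbol_version is my own.

```c
#include <string.h>

/* Truncates "name@VERSION" or "name@@VERSION" to "name", in place. */
void strip_symbol_version(char *name)
{
    char *at = strchr(name, '@');

    if (at != NULL)
        *at = '\0';
}
```

Cutting at the first @ also covers the double-@ form used for default versions.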

Pitfall 7: vermagic Requires an Exact Match

The source of check_modinfo() and same_magic() shows that vermagic is compared as a strict string (or compared after skipping the prefix up to the first space). vermagic is assembled by the macros in include/linux/vermagic.h:

// include/linux/vermagic.h
#define VERMAGIC_STRING                         \
    UTS_RELEASE " "                             \
    MODULE_VERMAGIC_SMP MODULE_VERMAGIC_PREEMPT \
    MODULE_VERMAGIC_MODULE_UNLOAD MODULE_VERMAGIC_MODVERSIONS \
    MODULE_ARCH_VERMAGIC                        \
    MODULE_RANDSTRUCT

Whether each macro expands to anything depends on the matching CONFIG option:

CONFIG_SMP → "SMP "
CONFIG_PREEMPT_BUILD → "preempt "
CONFIG_MODULE_UNLOAD → "mod_unload "
CONFIG_MODVERSIONS → "modversions "
MODULE_ARCH_VERMAGIC → "aarch64" on ARM64, empty on x86_64

A typical vermagic looks like:

6.19.11+kali-amd64 SMP preempt mod_unload

Note that there may be a trailing space after mod_unload (depending on whether the macro "mod_unload " carried one), and that space participates in the comparison.

The fix: do not guess vermagic at conversion time. Extract the complete vermagic value from the .modinfo section of a reference .ko and overwrite the target with it verbatim.
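Extracting a value from a reference module's .modinfo needs nothing beyond a walk over NUL-terminated key=value strings. This is a minimal user-space sketch in the spirit of the kernel's get_modinfo(); the function name modinfo_get is hypothetical, and blob and len are assumed to be the raw section bytes and size.

```c
#include <stddef.h>
#include <string.h>

/* Looks up key in a .modinfo blob of NUL-terminated "key=value" strings.
 * Returns a pointer to the value inside blob, or NULL if absent. */
const char *modinfo_get(const char *blob, size_t len, const char *key)
{
    size_t klen = strlen(key);
    size_t pos = 0;

    while (pos < len) {
        const char *entry = blob + pos;
        const char *nul = memchr(entry, '\0', len - pos);
        size_t elen;

        if (nul == NULL)              /* unterminated tail: stop */
            break;
        elen = (size_t)(nul - entry);
        if (elen > klen && entry[klen] == '=' &&
            strncmp(entry, key, klen) == 0)
            return entry + klen + 1;
        pos += elen + 1;
    }
    return NULL;
}
```

The returned string includes any trailing space, which is exactly what the verbatim copy needs.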

Pitfall 8: The Size and Layout of struct module Are Unpredictable

struct module is the data structure the kernel maintains in memory for every loaded module (defined in include/linux/module.h, a hundred-line behemoth). Its size and field layout depend entirely on the kernel's build configuration:

CONFIG_MODULE_UNLOAD → controls whether exit-related fields exist.
CONFIG_SYSFS → inserts sysfs attribute fields.
CONFIG_KALLSYMS → adds symbol-table-related fields.
CONFIG_TRACEPOINTS → inserts tracepoint fields.
... and dozens more.

For the same kernel version built from different defconfigs, sizeof(struct module) can differ by hundreds to thousands of bytes.

If a .ko's .gnu.linkonce.this_module section size does not match the target kernel, layout_and_allocate() will lay out module memory based on the kernel's own sizeof, and any mismatch causes section overlap or out-of-bounds access.

The fix: copy the entire .gnu.linkonce.this_module section data from a reference .ko. The reference was built with this kernel's own Kbuild, so its struct module is, by definition, correct.

The module name lives inside this structure as well. The offset of the name field is not guaranteed across kernel versions and configurations either, even though it happened to be the same in the two kernels examined here:

Kernel version / arch	Name offset
x86_64 Linux 6.19	24
ARM64 Linux 5.10 (Android 12)	24

Do not hard-code the offset. Search the reference struct data for the reference module's own name string, locate the name field, and write the new module name in that position.
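The search-and-overwrite step can be sketched as two small helpers. The function names are hypothetical, and the field capacity passed to write_module_name is an assumption the caller must supply (56 bytes matches MODULE_NAME_LEN on 64-bit kernels).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns the offset of name (terminator included) inside data, or -1. */
long find_name_offset(const uint8_t *data, size_t len, const char *name)
{
    size_t nlen = strlen(name) + 1;   /* match the NUL as well */
    size_t off;

    if (nlen > len)
        return -1;
    for (off = 0; off + nlen <= len; off++)
        if (memcmp(data + off, name, nlen) == 0)
            return (long)off;
    return -1;
}

/* Overwrites the name field at off; field_size is caller-supplied. */
int write_module_name(uint8_t *data, size_t len, size_t off,
                      const char *new_name, size_t field_size)
{
    size_t nlen = strlen(new_name) + 1;

    if (nlen > field_size || off + field_size > len)
        return -1;
    memset(data + off, 0, field_size);
    memcpy(data + off, new_name, nlen);
    return 0;
}
```

Matching the terminator as well avoids a false hit on a longer string that merely starts with the reference module's name.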

Pitfall 9: The .modinfo Field Specification

.modinfo is a string table of key=value\0 entries embedded inside an ELF section. The kernel looks up entries via get_modinfo(info, "field_name") — for example the earlier get_modinfo(info, "vermagic") call.

Fields that must be present (combining the check_modinfo() source with kernel conventions):

Field	Purpose	Consumer
vermagic	Kernel version plus build option signature	check_modinfo() does a strict comparison
name	Module name	check_modinfo() display, modprobe
license	License (e.g. GPL)	Gates access to GPL-only exported symbols
intree	Marks the module as in-tree	Checked by check_modinfo(); OOT modules taint the kernel
retpoline	Indicates Retpoline mitigation enabled	Checked by check_modinfo_retpoline()
init	Init function name	modprobe and user-space tooling
cleanup	Cleanup function name	modprobe and user-space tooling

Worth emphasizing: init=init_module and cleanup=cleanup_module are only conventions for tools like modprobe. The kernel itself does not parse these fields to find entry points. The kernel's only path to init/exit is via the function pointers inside struct module (see the next pitfall).

Pitfall 10: Why init_module Was Never Called — the Core Pitfall

This is the bug that took the longest to track down.

Symptom: the module loaded successfully (insmod returned 0), unloading worked, no error logs. But the code inside init simply did not execute. Changing the return value of init to -1 still let the load succeed — proof that init was never actually called.

Look again at the do_init_module() source:

// kernel/module.c
if (mod->init != NULL)
    ret = do_one_initcall(mod->init);
if (ret < 0) {
    goto fail_free_freeinit;
}

So where does the value of mod->init come from? Not from a symbol table lookup. Not from the init= field in .modinfo. It is written by apply_relocations() when it processes the .rela.gnu.linkonce.this_module section.

While processing relocations, the kernel walks every SHT_RELA section. When it encounters .rela.gnu.linkonce.this_module, it performs operations equivalent to:

r_offset 0x138: write the absolute address of symbol init_module    into .gnu.linkonce.this_module + 0x138
r_offset 0x4c0: write the absolute address of symbol cleanup_module into .gnu.linkonce.this_module + 0x4c0

Those two offsets (0x138 and 0x4c0) are exactly the offsets of the init/exit function pointers inside struct module.

Sample contents of a real .rela.gnu.linkonce.this_module (x86_64 Linux 6.19):

Offset        Type             Symbol
0x0138        R_X86_64_64      init_module
0x04c0        R_X86_64_64      cleanup_module

The same on ARM64 Android 5.10:

Offset        Type              Symbol
0x0190        R_AARCH64_ABS64   init_module
0x03c0        R_AARCH64_ABS64   cleanup_module

Offsets differ across architectures and kernel versions, but the principle is identical: the kernel installs the function pointers via relocation, not via name lookup.

So the conversion stage must produce a .rela.gnu.linkonce.this_module section with two relocations that point to init_module and cleanup_module. Initialize the offsets with known values (0x138/0x4c0 on x86_64, 0x190/0x3c0 on ARM64), then in the in-place patching stage, extract the actual offsets from the reference .ko's same-named relocation section and correct any mismatches.

If the section is missing, or its offsets are wrong, mod->init is NULL and the kernel silently skips init without reporting an error.
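Emitting the two entries is straightforward once the symbol indices of init_module and cleanup_module in the new symbol table are known. A minimal sketch for 64-bit RELA targets; the function name is hypothetical, and the offsets passed in are placeholders until the patching stage corrects them from a reference .ko.

```c
#include <elf.h>
#include <stdint.h>

/* Fills two RELA entries for .rela.gnu.linkonce.this_module.
 * abs64_type is R_X86_64_64 or R_AARCH64_ABS64. */
void make_this_module_relas(Elf64_Rela out[2],
                            uint64_t init_off, uint32_t init_sym,
                            uint64_t exit_off, uint32_t exit_sym,
                            uint32_t abs64_type)
{
    out[0].r_offset = init_off;    /* where mod->init lives */
    out[0].r_info   = ELF64_R_INFO((uint64_t)init_sym, abs64_type);
    out[0].r_addend = 0;

    out[1].r_offset = exit_off;    /* where mod->exit lives */
    out[1].r_info   = ELF64_R_INFO((uint64_t)exit_sym, abs64_type);
    out[1].r_addend = 0;
}
```

A zero addend with an absolute 64-bit type makes the loader store the symbol's final address unchanged, which is what a function pointer field needs.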

Pitfall 11: Determining and Re-indexing the Target Section of a Relocation

A relocation entry's r_offset marks "write the patched value at this offset within the target section." But each relocation entry must be grouped under the correct target section.

A reliable approach uses a two-level lookup:

First, use the library's "section containing this relocation" API if it is available.
If the library cannot tell you, match the r_offset (an absolute VA in that case) against the VA range of every retained section and find the one that contains it.

If neither step finds a target — for instance the relocation points at a section that has already been deleted — skip it and do not emit anything.

The r_info field of each relocation must be recomputed: the high 32 bits hold the symbol's index in the new symbol table (not the original), and the low 32 bits hold the relocation type with the architecture prefix stripped.
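The fallback lookup by address range can be sketched as a linear scan over the retained sections. The struct and function names are hypothetical; addr and size are assumed to be the values the parsing library reports for each section that survived filtering.

```c
#include <stddef.h>
#include <stdint.h>

struct kept_section {       /* hypothetical record for a retained section */
    uint64_t addr;          /* VA reported by the parsing library */
    uint64_t size;
};

/* Returns the index of the retained section containing va, or -1. */
int find_section_for_va(const struct kept_section *secs, size_t count,
                        uint64_t va)
{
    size_t i;

    for (i = 0; i < count; i++)
        if (secs[i].size != 0 &&
            va >= secs[i].addr && va - secs[i].addr < secs[i].size)
            return (int)i;
    return -1;
}
```

A result of -1 corresponds to the "skip it and do not emit anything" case. This only works when the library assigns distinct address ranges to sections; if every section reports address zero, the first lookup level is the only reliable one.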

Pitfall 12: Relocation Section Naming and ELF Structure

The naming rule for output relocation sections is .rela plus the target section name. For target .gnu.linkonce.this_module, the relocation section is .rela.gnu.linkonce.this_module. For .text, it is .rela.text.

Each SHT_RELA section header must have:

sh_link → points to .symtab (symbol table section index)
sh_info → points to the target section (the section being relocated)
sh_type → SHT_RELA (or SHT_REL, depending on architecture)

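apply_relocations() relies on sh_info to find the target section. The symbol table index it passes down is info->index.sym rather than sh_link, but sh_link should still point at the same .symtab so that readelf and other tools agree with what the kernel does. Get sh_info wrong and the kernel will either skip the section (target index out of range, or target not SHF_ALLOC) or apply the relocations to the wrong section.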
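Filling in such a header takes only a few assignments. A minimal sketch for 64-bit RELA sections; the function name is hypothetical, all index arguments refer to the new section table, and setting SHF_INFO_LINK is my choice to match what compilers normally emit, not something the text above requires.

```c
#include <elf.h>
#include <string.h>

/* Builds the header for a RELA section. name_off is the offset of the
 * section name inside .shstrtab. */
Elf64_Shdr make_rela_shdr(uint32_t name_off, uint64_t file_off,
                          uint64_t n_entries, uint32_t symtab_idx,
                          uint32_t target_idx)
{
    Elf64_Shdr sh;

    memset(&sh, 0, sizeof(sh));
    sh.sh_name      = name_off;
    sh.sh_type      = SHT_RELA;
    sh.sh_flags     = SHF_INFO_LINK;   /* sh_info holds a section index */
    sh.sh_offset    = file_off;
    sh.sh_size      = n_entries * sizeof(Elf64_Rela);
    sh.sh_link      = symtab_idx;      /* .symtab */
    sh.sh_info      = target_idx;      /* section being relocated */
    sh.sh_addralign = 8;
    sh.sh_entsize   = sizeof(Elf64_Rela);
    return sh;
}
```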

Pitfall 13: Summary of ARM64 Specifics

ARM64 kernel modules have a hard section dependency that x86_64 does not. The ARM64 module_frob_arch_sections() source makes it explicit: both .plt and .init.plt must be present; if either is missing the kernel returns -ENOEXEC immediately.

In addition, the ARM64 module.lds.h linker script requires that these three special sections exist in the default link layout. If a module is not built through the kernel's Kbuild (as is the case here), the three sections must be synthesized by hand.

Other ARM64 differences:

The init/exit offsets inside struct module are different (Android 12 / 5.10 uses 0x190/0x3c0 instead of 0x138/0x4c0).
vermagic carries an aarch64 suffix.
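LIEF's relocation type encoding uses a different architecture tag in the high bits than on x86_64; the standard value of R_AARCH64_ABS64 is itself 257 (0x101).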
The exported name of printk in Android kernels may be _printk rather than printk.

Key Takeaways

A .o can be turned into a .ko; a .so cannot. elf_validity_check() rejects anything that is not ET_REL, and a .so is ET_DYN. Compiling with gcc -c -fPIC to produce a .o and converting it into a .ko is the cleanest path.
Do not guess any parameter of the target kernel. vermagic is assembled from UTS_RELEASE plus half a dozen config-dependent macros; struct module's layout is shaped by dozens of options. Extracting from an existing .ko on the target device is far more accurate than guessing.
The only way to get the kernel to call your init is through the relocations in .rela.gnu.linkonce.this_module. Do not be misled by the init= line in .modinfo — that is for modprobe. The kernel fills mod->init in apply_relocations() and calls it in do_init_module(). The relocation is the only data path.
Symbols with empty names are not always garbage. STT_SECTION symbols have empty names but are referenced by relocations. Drop them and every section-base reference is wrong.
The shndx of a UND symbol must stay 0. The kernel dispatches on case SHN_UNDEF inside simplify_symbols(). Rewriting it to SHN_ABS skips the symbol table lookup entirely.
Every address in an ET_REL is section-relative. Absolute VAs returned by the library cannot be used directly; subtract the section base. The kernel's default branch in simplify_symbols() does st_value + secbase, which assumes a section offset, not an absolute value.
Relocation type encodings have a sharp edge. LIEF tags standard type values with architecture bits above the low 27 bits; mask with & 0x7FFFFFF before writing back.
Do not hard-code the offset of the name field inside struct module. Search the reference module's name string within the reference struct data to locate it.
ARM64 requires additional sections. module_frob_arch_sections() fails if .plt or .init.plt is missing, and the module linker script defines .text.ftrace_trampoline alongside them, so synthesize all three. These are not optional.
The patching tool should aim for zero dependencies. The target device may have no libstdc++, no Python, no cmake. Pure C plus elf.h, buildable with any C compiler — only then can it truly "run on any device."