
Inlined vs Non-inlined Code

Many programming languages allow code to be inlined. In C++, for example, the inline keyword suggests to the compiler that a function should be inlined, and static gives a function internal linkage, which makes inlining easier because no other translation unit can refer to it. Non-inlined code can incur extra function calls, and in position-independent code it can also incur indirection through the global offset table (GOT) or procedure linkage table (PLT).

Inlining can reduce runtime overhead and eliminate function call indirection, though it may increase code size: the function body is duplicated at every call site where it is inlined, and the definition must be available in every translation unit that uses it.

Consider the following non-inlined code:

int g;

int foo() {
    return g;
}

int bar() {
    g = 1;
    foo(); 
    return g;
}

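If we compile this code with gcc -O3 -Wall -fPIC -m32, the assembly shows that foo was not inlined. bar still makes an explicit call to foo through the PLT, and both functions load the address of g from the GOT.

This may be surprising, because at -O2 or -O3 the compiler often inlines small functions automatically, even without the inline keyword. It does not happen here because of -fPIC: foo has default visibility, so another shared object could supply its own foo at load time (symbol interposition), and GCC keeps the call rather than assume which definition will run.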

foo():
        call    __x86.get_pc_thunk.ax
        add     eax, OFFSET FLAT:_GLOBAL_OFFSET_TABLE_
        mov     eax, DWORD PTR g@GOT[eax]
        mov     eax, DWORD PTR [eax]
        ret
bar():
        push    esi
        push    ebx
        call    __x86.get_pc_thunk.bx
        add     ebx, OFFSET FLAT:_GLOBAL_OFFSET_TABLE_
        sub     esp, 4
        mov     esi, DWORD PTR g@GOT[ebx]
        mov     DWORD PTR [esi], 1
        call    foo()@PLT
        mov     eax, DWORD PTR [esi]
        add     esp, 4
        pop     ebx
        pop     esi
        ret
g:
        .zero   4
__x86.get_pc_thunk.ax:
        mov     eax, DWORD PTR [esp]
        ret
__x86.get_pc_thunk.bx:
        mov     ebx, DWORD PTR [esp]
        ret
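Besides the inline keyword, there are other ways to tell GCC that foo cannot be replaced at load time. The following is a minimal sketch, not taken from the original example: the names foo_static, foo_hidden, bar_static, bar_hidden, and check_variants are made up for illustration. It assumes GCC or Clang, since the visibility attribute is a compiler extension.

```cpp
#include <cassert>

int g;

// Internal linkage: no other module can provide a replacement for this
// function, so the compiler is free to inline it even under -fPIC.
static int foo_static() {
    return g;
}

// Hidden visibility (GCC/Clang extension): the symbol is not exported from
// the shared object, so it cannot be interposed and needs no PLT entry.
__attribute__((visibility("hidden"))) int foo_hidden() {
    return g;
}

int bar_static() {
    g = 1;
    foo_static();
    return g;
}

int bar_hidden() {
    g = 2;
    foo_hidden();
    return g;
}

// Sanity check: both variants behave like the original bar().
void check_variants() {
    assert(bar_static() == 1);
    assert(bar_hidden() == 2);
}
```

Compiling this with the same flags and inspecting the assembly should show whether the calls disappear. GCC also has a -fno-semantic-interposition flag that lets it assume exported functions are not replaced, though the exact output depends on the compiler version.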

Now consider the use of the inline keyword.

int g;

inline int foo() {
    return g;
}

int bar() {
    g = 1;
    foo(); 
    return g;
}

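Compiled with the same flags, this version makes GCC generate far fewer instructions: foo has essentially been merged into bar, and no separate copy of foo is emitted. A function declared inline in C++ must have the same definition everywhere it is used, so GCC can inline it without worrying about interposition.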

If this were a small program that made repeated calls to these functions, this optimization could reduce overhead and increase performance.

bar():
        call    __x86.get_pc_thunk.ax
        add     eax, OFFSET FLAT:_GLOBAL_OFFSET_TABLE_
        mov     eax, DWORD PTR g@GOT[eax]
        mov     DWORD PTR [eax], 1
        mov     eax, 1
        ret
g:
        .zero   4
__x86.get_pc_thunk.ax:
        mov     eax, DWORD PTR [esp]
        ret

The inline keyword is only a hint to the compiler. The compiler is free to ignore it, depending on context, and it may inline and optimize other functions by default even when they carry no such hint.
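When a hint is not enough, GCC and Clang offer function attributes that override the inlining heuristics. The sketch below is illustrative only: counter, read_hint, read_forced, read_call, update, and check_update are hypothetical names, and the attributes are compiler extensions rather than standard C++.

```cpp
#include <cassert>

int counter;

// Plain hint: the compiler decides whether to inline.
inline int read_hint() {
    return counter;
}

// GCC/Clang extension: asks the compiler to inline this function
// regardless of its usual heuristics.
__attribute__((always_inline)) inline int read_forced() {
    return counter;
}

// GCC/Clang extension: keeps this function out of line, which is useful
// when comparing the generated assembly against the inlined versions.
__attribute__((noinline)) int read_call() {
    return counter;
}

int update(int value) {
    counter = value;
    return read_hint() + read_forced() + read_call();
}

// Sanity check: all three readers observe the same value.
void check_update() {
    assert(update(2) == 6);
}
```

All three readers return the same value, so the only difference between them should be in the generated code, which can be compared by inspecting the assembly.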

But if you're experimenting with inline code, caveats may apply[1][2][3].
