A small reimplementation of the memory allocation functions available in C
(e.g. malloc, realloc, calloc, and free).
See the commit history to follow how each feature was integrated with the previous ones.
- Currently uses C23, though C11 is sufficient since both versions define
max_align_t, which is needed for memory alignment. The project initially used C99, and C23 was chosen arbitrarily. If you'd like to stick to C99, see this post.
- GDB as the main debugging tool.
- AddressSanitizer for detecting memory errors.
DISCLAIMER: I'm no expert, and there may be incorrect information in these sources, but they've helped me build this allocator. Make sure to do your own research or fact-check them with an expert if you can!
These are all the references I found in the process of creating this custom
allocator so far (subject to change), organized into categories and referenced
like so: 1.2 means "1. General Tutorials: 2nd source." 3.x refers to "3.
Endianness" as a whole. These aren't ordered by importance; instead, they're in
the order I needed them to understand the next step in expanding
this allocator's features. I hope that these can help you as well!
These tutorials cover the general steps and give an overview of how to make your
own malloc. They won't explain everything, but they'll give you a sense of
what's important in an allocator. As for 1.1, I couldn't at first replicate
the error found in the first step of
debugging, but I managed to
replicate it on my machine with a few workarounds. I explain how
here.
These explain what LD_PRELOAD is as introduced in source 1.1, used to
override system implementations of the memory allocation functions.
I had a few misconceptions as to what endianness was, which were:
- Endianness affects the ordering of bits in memory. Endianness actually affects the ordering of bytes in multi-byte values.
- It affects whether memory is laid out from lower-to-higher or higher-to-lower addresses. This is an unrelated architectural decision.
- Hexadecimal values are somehow "special" with respect to endianness. In reality, hexadecimal is only a representation of a number; endianness only affects how that number is stored in memory.
- Big and little endian have no practical difference. They do.
These sources clarified those misunderstandings for me, which matters for how
I implemented guard canaries (more on those in 5.x).
- Wikipedia: Endianness
- @boubkerelmaayouf (Medium): Big Endian vs Little Endian — The Story, The Why, and Easy Explanation
- YC Hacker News: Big-endian vs. Little-endian
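
To make the endianness points above concrete, here is a minimal sketch that inspects how a multi-byte value sits in memory. The helper names (`is_little_endian`, `bytes_in_memory`, `endianness_demo`) are made up for illustration and aren't part of this allocator.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 on a little-endian machine, 0 otherwise. */
int is_little_endian(void)
{
    const uint16_t probe = 0x0102;
    unsigned char first;
    memcpy(&first, &probe, 1);      /* the lowest-addressed byte of probe */
    return first == 0x02;           /* low-order byte first: little-endian */
}

/* Copies the bytes of value into out in increasing address order. */
void bytes_in_memory(uint32_t value, unsigned char out[4])
{
    memcpy(out, &value, sizeof value);
}

/* Hypothetical demo: hex is only notation, byte order is what changes. */
void endianness_demo(void)
{
    unsigned char hex[4], dec[4];

    bytes_in_memory(0x11223344u, hex);
    bytes_in_memory(287454020u, dec);   /* the same number, in decimal */
    assert(memcmp(hex, dec, sizeof hex) == 0);

    /* Little-endian prints 44 33 22 11, big-endian prints 11 22 33 44. */
    printf("%s-endian: %02X %02X %02X %02X\n",
           is_little_endian() ? "little" : "big",
           hex[0], hex[1], hex[2], hex[3]);
}
```

Note that only the order of whole bytes differs between the two layouts; the bits inside each byte are untouched.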
I didn't understand what memory alignment really meant before I read into
these. Basically, "CPUs always read memory at its word size" (4.4), and I
recommend reading 4.5 to understand what that means. 4.1 is more about
memory alignment in structs, but it helps to connect this concept to structs
because memory alignment doesn't only apply to memory allocation.
- Catb: The Lost Art of Structure Packing
- StackOverflow: Memory Alignment in C/C++
- StackOverflow: Purpose of memory alignment
- SWE StackExchange: How important is memory alignment? Does it still matter?
- Wikipedia: Word (computer architecture)
- Wikipedia: Data Structure Alignment
- Reddit: Calling new or malloc, what is the real size reserved by the operating system?
- Reddit: Is there a way to determine sizeof(max_align_t) in C99?
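
As a rough sketch of what alignment means for an allocator, the snippet below rounds a requested size up to a multiple of `alignof(max_align_t)`. It assumes C11 or later, and the names `align_up`, `padded_request`, and `is_aligned` are hypothetical; the allocator in this repo may do this differently.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Rounds size up to the next multiple of alignment.
 * alignment must be a power of two. Does not guard against
 * overflow when size is close to SIZE_MAX. */
size_t align_up(size_t size, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (size + alignment - 1) & ~(alignment - 1);
}

/* Pads a request so that blocks placed back to back stay
 * suitably aligned for any standard type. */
size_t padded_request(size_t size)
{
    return align_up(size, alignof(max_align_t));
}

/* Returns 1 if ptr is a multiple of alignment, 0 otherwise. */
int is_aligned(const void *ptr, size_t alignment)
{
    return ((uintptr_t)ptr % alignment) == 0;
}
```

For example, with a 16-byte `alignof(max_align_t)` (common on x86-64, but not guaranteed), a 17-byte request would be padded to 32 bytes.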
These resources are actually about the stack, not the heap. As far as I know, guard canaries on the heap aren't implemented exactly as described in these sources. Many production memory allocation functions (for optimization and other reasons) rarely allocate exactly what you request and often allocate more, and writing past what you allocated is undefined behavior. I implemented guard canaries in one version of this allocator anyway out of curiosity.
Endianness (3.x) mattered here specifically because on little-endian
architectures like x86, the byte array {0xCA, 0xFE, 0xBA, 0xBE} is stored as
CA FE BA BE in memory, while the 32-bit integer 0xCAFEBABE is stored as
BE BA FE CA. CA FE BA BE is clearly easier to recognize when debugging.
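
Here is a minimal sketch of the byte-array approach, assuming the canary sits immediately after the payload. The names `CANARY`, `canary_write`, `canary_intact`, and `canary_demo` are hypothetical, and the layout is simplified compared to a real allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stored byte by byte, so a memory dump shows CA FE BA BE
 * regardless of the machine's endianness. */
static const unsigned char CANARY[4] = {0xCA, 0xFE, 0xBA, 0xBE};

/* Writes the canary at the first byte past the payload. */
void canary_write(unsigned char *end_of_payload)
{
    memcpy(end_of_payload, CANARY, sizeof CANARY);
}

/* Returns 1 if the canary is untouched, 0 if it was overwritten. */
int canary_intact(const unsigned char *end_of_payload)
{
    return memcmp(end_of_payload, CANARY, sizeof CANARY) == 0;
}

/* Hypothetical demo: an 8-byte payload followed by the canary. */
void canary_demo(void)
{
    unsigned char block[8 + sizeof CANARY];
    unsigned char *guard = block + 8;

    canary_write(guard);
    assert(canary_intact(guard));

    memset(block, 'A', 9);          /* writes one byte past the payload */
    assert(!canary_intact(guard));  /* the overflow is detected */
}
```

Using `memcpy` and `memcmp` also avoids reading a possibly unaligned 32-bit integer at the end of the payload.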
These sources might not directly help you build your own malloc, but they
might help you understand memory allocation better in general.
- makecleanandmake: How to malloc() the Right Way
- Reddit: What situations cause malloc to return NULL?
- pvs-studio: Four reasons to check what the malloc function returned
- @Code_Analysis (Medium): Why it is bad idea to check result of malloc call with assert
- StackOverflow: Is it required to free a pointer variable before using realloc?
All the sources I found are free online, so I'd like to keep the code here free as well for anyone else to learn from. See LICENSE for details.