Linux Kernel Memory Management
A seemingly innocent question is: what is the "active" memory statistic in top? That is, what do these lines mean:
Mem: 508476k av, 456424k used, 52052k free, 0k shrd, 86780k buff
173524k active, 131488k inactive
Swap: 2048276k av, 0k used, 2048276k free 218144k cached
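As a rough reading of the figures above, the sketch below splits "used" into buffers, page cache, and whatever remains. It assumes that top's "used" figure counts buffers and cache (as the procps top of that era does); the variable names and the pct helper are illustrative only.

```python
# Figures copied from the top output above, in kB.
total_kb = 508476
used_kb = 456424
free_kb = 52052
buffers_kb = 86780
cached_kb = 218144
active_kb = 173524

# Assumption: "used" includes buffers and page cache, so the memory held
# by processes and the kernel itself is roughly what remains.
cache_like_kb = buffers_kb + cached_kb
remainder_kb = used_kb - cache_like_kb


def pct(part, whole):
    """Return part as a percentage of whole, rounded to one decimal."""
    return round(100.0 * part / whole, 1)


print("used + free == total:", used_kb + free_kb == total_kb)
print("buffers + cache: %d kB (%s%% of used)" % (cache_like_kb, pct(cache_like_kb, used_kb)))
print("remainder:       %d kB" % remainder_kb)
print("active:          %s%% of total" % pct(active_kb, total_kb))
```

Read this way, about two thirds of the "used" memory is buffers and cache that the kernel can give back when something else needs the pages.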
It helps first to understand why nearly all of the memory shows as used. A modern OS does not treat disk and memory as separate stores; rather, memory is a soup of pages that live either on disk or in RAM, and the OS's virtual memory manager decides when those pages get swapped out.

The system from above, after a reboot and with pretty much the same kind of tasks running, has only 12 MB or so of active memory (here is the 2.4.31 config file). Why is this? Active simply means pages of memory that were recently touched. Note that this is different from committed memory: just because an application allocates memory doesn't mean the program has touched those pages. At startup, although the processes exist and have claimed memory, the system hasn't been up long enough for them to have touched much of it. Over time, these processes, and other scheduled tasks, touch more of the pages and make them "active".

Here are some resources to read about the surrounding issues:
Memory Management
What is the page-LRU
System Performance Tuning
Rik van Riel:
Page replacement in Linux 2.4 memory management
Towards an O(1) VM
/proc/meminfo Explained
Understanding Linux Memory Management
Some of the VM settings for the kernel can be changed using sysctl. Here is a listing of the default settings for a system on 2.4.31:
[root@www root]# sysctl -a | grep vm.
vm.block_dump = 0
vm.laptop_mode = 0
vm.max_map_count = 65536
vm.max-readahead = 31
vm.min-readahead = 3
vm.page-cluster = 3
vm.pagetable_cache = 25 50
vm.kswapd = 512 32 8
vm.overcommit_memory = 0
vm.bdflush = 30 500 0 0 500 3000 60 20 0
vm.vm_passes = 60
vm.vm_lru_balance_ratio = 2
vm.vm_anon_lru = 0
vm.vm_mapped_ratio = 100
vm.vm_cache_scan_ratio = 6
vm.vm_vfs_scan_ratio = 6
vm.vm_gfp_debug = 0
[root@www root]#
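If the listing needs to be compared across machines or kernels, it can be turned into a dictionary. The following is a minimal sketch; parse_sysctl is a hypothetical helper, and it assumes every value is a whitespace-separated list of integers, which holds for the vm settings shown above but not for sysctl output in general.

```python
def parse_sysctl(text):
    """Parse 'name = v1 v2 ...' lines into {name: [int, ...]}."""
    settings = {}
    for line in text.splitlines():
        if " = " not in line:
            continue  # skip shell prompts and blank lines
        name, _, value = line.partition(" = ")
        # Assumption: all values are integers (true for the vm.* listing above).
        settings[name.strip()] = [int(v) for v in value.split()]
    return settings


# A few lines copied from the listing above.
sample = """\
[root@www root]# sysctl -a | grep vm.
vm.max-readahead = 31
vm.kswapd = 512 32 8
vm.bdflush = 30 500 0 0 500 3000 60 20 0
vm.vm_vfs_scan_ratio = 6
"""

settings = parse_sysctl(sample)
for name in sorted(settings):
    print(name, settings[name])
```

Note that the unescaped dot in "grep vm." matches any character, so on a system with more settings a pattern such as '^vm\.' is the safer filter.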
The same settings are also exposed as files under /proc:
[root@www root]# ls /proc/sys/vm
bdflush max-readahead vm_anon_lru vm_passes
block_dump min-readahead vm_cache_scan_ratio vm_vfs_scan_ratio
kswapd overcommit_memory vm_gfp_debug
laptop_mode page-cluster vm_lru_balance_ratio
max_map_count pagetable_cache vm_mapped_ratio
[root@www root]#
[root@www root]# cat /proc/sys/vm/vm_vfs_scan_ratio
6
[root@www root]#
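The two views line up because each sysctl name corresponds to a file under /proc/sys, with the dots replaced by slashes. The sketch below shows that mapping; sysctl_path and read_sysctl are hypothetical helper names, and the root parameter exists only so the example can run against a scratch directory instead of the live /proc.

```python
import os
import tempfile


def sysctl_path(name, root="/proc/sys"):
    """Map a sysctl name such as 'vm.page-cluster' to its file under root."""
    return root + "/" + name.replace(".", "/")


def read_sysctl(name, root="/proc/sys"):
    """Return the setting's values as a list of strings."""
    with open(sysctl_path(name, root)) as f:
        return f.read().split()


# Rehearse against a throwaway directory laid out like /proc/sys.
demo_root = tempfile.mkdtemp()
os.makedirs(demo_root + "/vm")
with open(sysctl_path("vm.vm_vfs_scan_ratio", demo_root), "w") as f:
    f.write("6\n")

print(sysctl_path("vm.vm_vfs_scan_ratio"))
print(read_sysctl("vm.vm_vfs_scan_ratio", demo_root))
```

On the real system the default root applies, so read_sysctl("vm.vm_vfs_scan_ratio") would read the same file as the cat command above.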
Another view in /proc is meminfo:
[root@www vm]# cat /proc/meminfo
total: used: free: shared: buffers: cached:
Mem: 520679424 513597440 7081984 0 86335488 227442688
Swap: 2097434624 0 2097434624
MemTotal: 508476 kB
MemFree: 6916 kB
MemShared: 0 kB
Buffers: 84312 kB
Cached: 222112 kB
SwapCached: 0 kB
Active: 157576 kB
Inactive: 148916 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 508476 kB
LowFree: 6916 kB
SwapTotal: 2048276 kB
SwapFree: 2048276 kB
[root@www vm]#
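For scripting, the "Name: value kB" lines are the easiest part of this file to work with. The following is a small sketch of a parser; parse_meminfo is a hypothetical helper, and it deliberately ignores the byte-count table that 2.4 kernels print at the top of the file.

```python
def parse_meminfo(text):
    """Return {field: kB} for the 'Name: value kB' lines of /proc/meminfo."""
    fields = {}
    for line in text.splitlines():
        parts = line.split()
        # Keep only lines shaped like 'MemTotal: 508476 kB'; this skips the
        # 2.4-style 'Mem:' and 'Swap:' byte-count rows.
        if len(parts) == 3 and parts[0].endswith(":") and parts[2] == "kB":
            fields[parts[0][:-1]] = int(parts[1])
    return fields


# Lines copied from the listing above.
sample = """\
        total:    used:    free:  shared: buffers:  cached:
Mem:  520679424 513597440  7081984        0 86335488 227442688
Swap: 2097434624        0 2097434624
MemTotal:       508476 kB
MemFree:          6916 kB
Buffers:         84312 kB
Cached:         222112 kB
Active:         157576 kB
Inactive:       148916 kB
"""

info = parse_meminfo(sample)
reclaimable_kb = info["MemFree"] + info["Buffers"] + info["Cached"]
print("Active:   %d kB" % info["Active"])
print("Inactive: %d kB" % info["Inactive"])
print("Free + buffers + cache: %d kB" % reclaimable_kb)
```

On a live system the same function can be fed the contents of /proc/meminfo directly.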
The default settings and an explanation of each are written up in the kernel source:
[root@www mm]#
[root@www mm]# head /usr/src/linux-2.4.31/mm/vmscan.c -n 86 | tail -n 58
/*
* "vm_passes" is the number of vm passes before failing the
* memory balancing. Take into account 3 passes are needed
* for a flush/wait/free cycle and that we only scan 1/vm_cache_scan_ratio
* of the inactive list at each pass.
*/
int vm_passes = 60;
/*
* "vm_cache_scan_ratio" is how much of the inactive LRU queue we will scan
* in one go. A value of 6 for vm_cache_scan_ratio implies that we'll
* scan 1/6 of the inactive lists during a normal aging round.
*/
int vm_cache_scan_ratio = 6;
/*
* "vm_mapped_ratio" controls the pageout rate, the smaller, the earlier
* we'll start to pageout.
*/
int vm_mapped_ratio = 100;
/*
* "vm_lru_balance_ratio" controls the balance between active and
* inactive cache. The bigger vm_balance is, the easier the
* active cache will grow, because we'll rotate the active list
* slowly. A value of 2 means we'll go towards a balance of
* 1/3 of the cache being inactive.
*/
int vm_lru_balance_ratio = 2;
/*
* "vm_vfs_scan_ratio" is what proportion of the VFS queues we will scan
* in one go. A value of 6 for vm_vfs_scan_ratio implies that 1/6th of
* the unused-inode, dentry and dquot caches will be freed during a normal
* aging round.
*/
int vm_vfs_scan_ratio = 6;
/*
* "vm_anon_lru" select if to immdiatly insert anon pages in the
* lru. Immediatly means as soon as they're allocated during the
* page faults.
*
* If this is set to 0, they're inserted only after the first
* swapout.
*
* Having anon pages immediatly inserted in the lru allows the
* VM to know better when it's worthwhile to start swapping
* anonymous ram, it will start to swap earlier and it should
* swap smoother and faster, but it will decrease scalability
* on the >16-ways of an order of magnitude. Big SMP/NUMA
* definitely can't take an hit on a global spinlock at
* every anon page allocation. So this is off by default.
*
* Low ram machines that swaps all the time want to turn
* this on (i.e. set to 1).
*/
int vm_anon_lru = 0;
[root@www mm]#
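To get a feel for what these defaults imply, the sketch below plugs in the Active and Inactive figures from the /proc/meminfo listing. It rests on several assumptions: a 4 kB page size (as on i386), the comment's "1/3 inactive" being read as 1 / (vm_lru_balance_ratio + 1), and the meminfo totals standing in for the cache lists the comments describe. It is an interpretation of the comments, not something checked against the code paths in vmscan.c.

```python
# Defaults quoted from mm/vmscan.c above.
vm_passes = 60
vm_cache_scan_ratio = 6
vm_lru_balance_ratio = 2

# From the /proc/meminfo listing, in kB.
active_kb = 157576
inactive_kb = 148916

PAGE_KB = 4  # assumption: 4 kB pages, as on i386

# "we'll scan 1/6 of the inactive lists during a normal aging round"
inactive_pages = inactive_kb // PAGE_KB
pages_per_pass = inactive_pages // vm_cache_scan_ratio

# "3 passes are needed for a flush/wait/free cycle"
cycles_before_failing = vm_passes // 3

# Assumption: a ratio of 2 means inactive tends toward 1 / (2 + 1).
target_inactive_share = 1.0 / (vm_lru_balance_ratio + 1)
current_inactive_share = inactive_kb / float(active_kb + inactive_kb)

print("inactive pages:          %d" % inactive_pages)
print("pages scanned per pass:  %d" % pages_per_pass)
print("flush/wait/free cycles:  %d" % cycles_before_failing)
print("target inactive share:   %.2f" % target_inactive_share)
print("current inactive share:  %.2f" % current_inactive_share)
```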
There is also some documentation in /usr/src/linux-2.4.31/Documentation/sysctl.
To set one of these values, you can use sysctl:
[root@www sysctl]# sysctl vm.vm_anon_lru
vm.vm_anon_lru = 0
[root@www sysctl]# sysctl -w vm.vm_anon_lru=1
vm.vm_anon_lru = 1
[root@www sysctl]# sysctl vm.vm_anon_lru
vm.vm_anon_lru = 1
[root@www sysctl]#
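The same change can be made by writing to the corresponding file under /proc/sys, which is convenient from a script. The sketch below mirrors the read, write, read-back sequence of the session above; set_sysctl is a hypothetical helper, and the demonstration runs against a scratch directory because writing to the real file requires root.

```python
import os
import tempfile


def set_sysctl(name, value, root="/proc/sys"):
    """Write value to the file behind a sysctl name; return what reads back."""
    path = root + "/" + name.replace(".", "/")
    with open(path, "w") as f:
        f.write("%s\n" % value)
    with open(path) as f:
        return f.read().strip()


# Rehearse against a throwaway directory laid out like /proc/sys.
demo_root = tempfile.mkdtemp()
os.makedirs(demo_root + "/vm")
with open(demo_root + "/vm/vm_anon_lru", "w") as f:
    f.write("0\n")

result = set_sysctl("vm.vm_anon_lru", 1, demo_root)
print("vm.vm_anon_lru = %s" % result)
```

Either way, a value set like this lasts only until the next reboot; to make it permanent, it is usually added to /etc/sysctl.conf.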
Note that in general, you shouldn't need to worry much about these settings until there is a problem. Our concern was active memory, and after reading the above, here is our explanation of what happens with it. Active memory consists of pages that were used quite recently. Kswapd goes through those pages occasionally and frees the ones it can. There is a hierarchy of pages: they are marked with varying states depending on their age and on how they are released. The scanning of these pages by kswapd is configured in vmscan.c.

Now, we happen to be running 2.4.31. The 2.4 kernel saw a lot of changes in the VM subsystem. For a glimpse at some of the issues during the transition from 2.4.9 to 2.4.10, see this article. The 2.6 kernel handles this differently still; however, the ideas are the same.
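The general idea of pages moving between an active and an inactive list can be illustrated with a toy model. The sketch below is emphatically not the kernel's algorithm: TwoListLRU and its methods are invented for illustration, and real kernels track reference bits, dirty state, and much more.

```python
from collections import OrderedDict


class TwoListLRU:
    """Toy active/inactive page lists; not the kernel's algorithm."""

    def __init__(self):
        # Both lists are ordered oldest first, most recently added last.
        self.active = OrderedDict()
        self.inactive = OrderedDict()

    def touch(self, page):
        # A touched page becomes (or stays) active and moves to the young end.
        self.inactive.pop(page, None)
        self.active.pop(page, None)
        self.active[page] = True

    def age(self, count):
        # Demote up to `count` of the least recently touched active pages.
        for _ in range(min(count, len(self.active))):
            page, _flag = self.active.popitem(last=False)
            self.inactive[page] = True

    def reclaim(self, count):
        # Free up to `count` of the oldest inactive pages and return them.
        freed = []
        for _ in range(min(count, len(self.inactive))):
            page, _flag = self.inactive.popitem(last=False)
            freed.append(page)
        return freed


lru = TwoListLRU()
for page in ["a", "b", "c", "d"]:
    lru.touch(page)
lru.age(2)        # "a" and "b" are demoted to the inactive list
lru.touch("a")    # "a" is used again and returns to the active list
freed = lru.reclaim(1)
print(freed, list(lru.active), list(lru.inactive))
# prints: ['b'] ['c', 'd', 'a'] []
```

The point of the second list is visible in the last three calls: a page that is touched again before reclaim gets a second chance, while one that stays untouched is the first to be freed.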