The full-screen output of atop consists of a top half with system-level statistics and a bottom half with process-level statistics. The type of process-level statistics can be changed by pressing certain keys, as shown by these screenshots. This page does not describe the meaning of every single counter; a full description can be found in the atop manual page.
Note that for some screenshots an additional example is provided, showing how extra columns appear dynamically when the window is wider than 80 positions.


Generic information — default

The generic screen gives an overview of the consumption on system and process level of the four major hardware resources, i.e. cpu, memory, disk and network. Since the kernel does not maintain per-process network statistics, network consumption on process level is only shown when you have installed the netatop kernel module.

Some details: In the line with the label PRC, the counter '#exit' shows that one process finished during the last interval. The bottom half shows which one: process 'find' with process-id 31085. Before it ended, it consumed 0.45 seconds of cpu-time in system mode and 0.10 seconds of cpu-time in user mode, so 0.55 cpu-seconds in total (2.75% of one cpu during the interval of 20 seconds). The column ST (state) shows 'E' (exit) and the column EXCODE shows the process' exit code 1 (an exit code of 0 would indicate a normal run).
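
The percentage in this example can be checked with a few lines of Python. This is only an illustrative sketch of the arithmetic; the function name cpu_percent is made up for this example and is not part of atop.

```python
def cpu_percent(sys_seconds, user_seconds, interval_seconds):
    """Return cpu consumption as a percentage of one cpu over the interval."""
    total_seconds = sys_seconds + user_seconds
    return 100.0 * total_seconds / interval_seconds


# Figures from the 'find' process above: 0.45s system, 0.10s user, 20s interval.
pct = cpu_percent(0.45, 0.10, 20)
print(f"{pct:.2f}% of one cpu")
```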

Generic information — default (wider window)

Compared with the previous screenshot, this wider one also shows the real and effective uid, the number of threads, and the current cpu-number of the main thread.


Scheduling information — key 's'

This screen shows specific scheduling information about the main thread of each process, like scheduling policy, nice value, priority, realtime priority and cpu-number (current or last used) and state.
Furthermore it shows how many threads within this process are in state 'running' (busy on cpu or waiting in the runqueue), 'interruptible sleeping' or 'non-interruptible sleeping'. The total number of threads can be determined by accumulating these three values (columns TRUN, TSLPI and TSLPU).

Some details: The process 'chrome' with process-id 30549 runs with 4 threads in total; one of these threads is 'running' and three are interruptible sleeping. The running thread appears to be the main thread of the process, because the state of the main thread (column S) is 'R'.
The process 'firefox' with process-id 4680 runs with 8 threads in total, of which one is 'running' (but not the main thread).
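
The reasoning about thread counts can be sketched as follows. The helper names are hypothetical; the values are those described for 'chrome' above, where a total of 4 threads with one running and three interruptible sleeping implies that TSLPU is 0.

```python
def total_threads(trun, tslpi, tslpu):
    """Total number of threads is the sum of columns TRUN, TSLPI and TSLPU."""
    return trun + tslpi + tslpu


def main_thread_is_running(state):
    """Column S holds the state of the main thread; 'R' means running."""
    return state == "R"


# Process 'chrome' (process-id 30549) as described above.
chrome_total = total_threads(trun=1, tslpi=3, tslpu=0)
chrome_main_running = main_thread_is_running("R")
print(chrome_total, chrome_main_running)
```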


Memory consumption — key 'm'

This screen shows specific memory-related information per process like total virtual and resident size (column VSIZE and RSIZE) and the virtual and resident growth during the last interval (column VGROW and RGROW). The memory percentage (column MEM) shows the resident memory occupation by this process, because that is what matters when your system starts swapping.

Some details: In the line with the label PAG, the counters 'swin' (swapins) and 'swout' (swapouts) show that this system suffers from memory overload. In the line with label LVM for logical volume 'vg00-lvswap', the 'read' and 'write' counters exactly match the 'swin' and 'swout' counters. This logical volume is also the main cause of the heavy load on the underlying disk 'sda'.
On process level a lot of negative resident growth (column RGROW) can be seen, because processes lose their resident pages by swapout. It appears that process 'lekker' with process-id 31048 grows heavily due to a memory leak; its resident size is currently 1.5 GiB (total memory 3.8 GiB).

Memory consumption — key 'm' (wider window)

Compared with the previous screenshot, this wider one also shows the real and effective uid.


Disk utilization — key 'd'

The lines with label LVM (logical volumes) and DSK (underlying physical disks) show the disk activity on system level.
On process level the disk activity is shown as the amount of data transferred by reads (column RDDSK) and writes (column WRDSK). Usually the written data is stored in the in-memory page cache before it is physically written to disk. When data is written to the page cache but discarded before it is physically written to disk, that amount is reported as cancelled (column WCANCL).
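
As an illustration of where such per-process figures can come from, Linux kernels with task I/O accounting expose the counters read_bytes, write_bytes and cancelled_write_bytes in /proc/<pid>/io. The sketch below assumes that these counters correspond to the columns RDDSK, WRDSK and WCANCL; that mapping, the function names and the sample values are assumptions made for this example, not taken from the atop source.

```python
def parse_proc_io(text):
    """Parse 'name: value' lines, as found in /proc/<pid>/io, into a dict."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            counters[name.strip()] = int(value)
    return counters


def disk_columns(before, after):
    """Return per-interval deltas in bytes, keyed by (assumed) atop column name."""
    mapping = {
        "RDDSK": "read_bytes",
        "WRDSK": "write_bytes",
        "WCANCL": "cancelled_write_bytes",
    }
    return {col: after[key] - before[key] for col, key in mapping.items()}


# Hypothetical samples taken at the start and at the end of one interval.
SAMPLE_START = """\
read_bytes: 1048576
write_bytes: 4194304
cancelled_write_bytes: 0
"""
SAMPLE_END = """\
read_bytes: 3145728
write_bytes: 8388608
cancelled_write_bytes: 1048576
"""

deltas = disk_columns(parse_proc_io(SAMPLE_START), parse_proc_io(SAMPLE_END))
print(deltas)
```

On a live system the text would be read from /proc/<pid>/io instead of the sample strings, which usually requires sufficient privileges for processes of other users.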

Disk utilization — key 'd' (wider window)

When this wider screenshot is compared with the previous one, columns are added for the system level statistics, like the number of KiB transferred per read and write request, the total throughput per second for reading and writing, and the average number of requests in the request-queue of the disk driver.

Some details: The line with the label DSK shows that disk 'sda' was 47% busy during the last interval, issuing 3096 read requests and 40 write requests. So most of the disk utilization is caused by processes that are reading.
The process that has read the most data seems to be 'bash' with process-id 21091. That process transferred 150 MiB (which is 97% of all accounted disk transfer). Since other processes did not transfer much data, 'bash' seems to have made the disk 47% busy during an interval of 20 seconds, which cannot be true. And it is not true! The process 'find' with process-id 31085 issued most of the disk transfers, but it finished during the interval. In that case atop obtains its information from the process accounting record of the exited process; unfortunately, the number of disk transfers is not registered there.
Another complicating factor is that the kernel adds the accounted disk transfer figures of a child process to its parent whenever the child process exits. Therefore 'bash' probably did not issue any disk transfer itself (the 150 MiB of data that was accounted to 'bash' was in reality read by 'find').


Variable information — key 'v'

This screen shows miscellaneous information about processes, like credentials (real uid and real gid), parent process-id, start date and start time, etcetera.

Variable information — key 'v' (wider window)

Compared with the previous screenshot, this wider one shows all flavors of uid and gid, and the exact end date and end time for processes that finished during the interval.


Command line — key 'c'

This screen shows the command line of the processes. If the window is widened, more command line arguments are shown.


Accumulated per program — key 'p'

This screen shows in the rightmost column which programs are active (or have been active during the last interval) and in the leftmost column how many processes (incarnations) of each program there are. The columns in between show the accumulated cpu consumption, the accumulated virtual and resident memory consumption (note that shared parts are counted for every process, so these totals are far too high), the accumulated data transferred from/to disk and (only when the netatop module is active) the accumulated network transfers.
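
The accumulation per program can be pictured as a simple group-by over the process list. The sketch below uses made-up process records and a hypothetical helper, accumulate_per_program; it only illustrates the idea and is not how atop itself is implemented.

```python
from collections import defaultdict

# Hypothetical process records: (program name, cpu seconds, resident size in KiB).
processes = [
    ("bash", 0.02, 3000),
    ("bash", 0.01, 2800),
    ("find", 0.55, 1200),
]


def accumulate_per_program(procs):
    """Sum process count, cpu time and resident size per program name."""
    totals = defaultdict(lambda: {"nproc": 0, "cpu": 0.0, "rsize_kib": 0})
    for name, cpu, rsize_kib in procs:
        entry = totals[name]
        entry["nproc"] += 1
        entry["cpu"] += cpu
        # Shared pages are counted once per process, so this sum overstates
        # the real memory use of the program.
        entry["rsize_kib"] += rsize_kib
    return dict(totals)


per_program = accumulate_per_program(processes)
for name, entry in per_program.items():
    print(entry["nproc"], round(entry["cpu"], 2), entry["rsize_kib"], name)
```

Grouping on a user name instead of a program name gives the same kind of totals per user.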


Accumulated per user — key 'u'

This screen shows in the rightmost column which users are active (or have been active during the last interval) and in the leftmost column how many processes each user runs/ran. The columns in between show the accumulated cpu consumption, the accumulated virtual and resident memory consumption (note that shared parts are counted for every process, so these totals are far too high), the accumulated data transferred from/to disk and (only when the netatop module is active) the accumulated network transfers.


Network utilization — key 'n' (with netatop module)

This screen shows the network activity per process when the kernel module netatop is loaded. With this module, network transfers are attributed to the process and thread concerned. The number of receives and sends is shown for TCP and UDP, even for finished processes when the netatopd daemon is active. If the window is widened, the average size per transfer is shown as well (as in this example). Notice that the utilization on interface level can easily be correlated to the bandwidth used per process: 'si' for interface p5p1 shows 28 Mbps for input, of which 23 Mbps ('BANDWI') is consumed by 'ssh' and almost 5 Mbps by 'attract', while 'so' for interface p5p1 shows 5571 Kbps for output, of which 588 Kbps ('BANDWO') is consumed by 'ssh' and 4983 Kbps by 'attract'.
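
The correlation between interface level and process level can be verified with the output figures quoted above. This is a small sketch of the arithmetic only; the variable names are chosen for this example.

```python
# Output bandwidth in Kbps, taken from the example above.
interface_out_kbps = 5571                            # 'so' for interface p5p1
per_process_out_kbps = {"ssh": 588, "attract": 4983}  # column BANDWO

accounted_kbps = sum(per_process_out_kbps.values())
unaccounted_kbps = interface_out_kbps - accounted_kbps

for name, kbps in per_process_out_kbps.items():
    share = 100.0 * kbps / interface_out_kbps
    print(f"{name}: {kbps} Kbps ({share:.1f}% of interface output)")
print(f"not accounted to these processes: {unaccounted_kbps} Kbps")
```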