A host that is sluggish or completely unresponsive is often suffering from network problems, but the local troubleshooting tools below can help you tell a loaded network from a loaded machine. When a machine is sluggish, it is usually because some resource on the system has been used up. The main resources are CPU, RAM, disk I/O, and network. Overuse of any of these can bog a system down to the point where the only recourse is your last resort: a reboot. If you can still log in to the system, however, there are a number of tools you can use to identify the cause.
System Load
|
10:55:37 up 6 days, 18:32, 3 users, load average: 0.30, 0.17, 0.16
|
This is the line that uptime prints (top shows the same figures in its first line). The three numbers after "load average:", 0.30, 0.17, and 0.16, are the 1-, 5-, and 15-minute load averages on the machine, respectively.
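As a rough sketch of how to read these numbers, the snippet below pulls the three averages out of an uptime-style line and divides one of them by the CPU count. The function names are made up for illustration, and falling back to a single CPU when the count is unknown is an assumption.

```python
import os
import re

def parse_load_averages(line):
    """Return the 1-, 5-, and 15-minute load averages from an uptime-style line."""
    match = re.search(r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)", line)
    if match is None:
        raise ValueError("no load average found in: %r" % line)
    return tuple(float(value) for value in match.groups())

def load_per_cpu(load, cpus=None):
    """Normalize a load figure by the CPU count (defaults to this machine's)."""
    cpus = cpus or os.cpu_count() or 1  # assumption: treat an unknown count as 1
    return load / cpus

sample = "10:55:37 up 6 days, 18:32, 3 users, load average: 0.30, 0.17, 0.16"
one, five, fifteen = parse_load_averages(sample)
print(one, five, fifteen)
print(load_per_cpu(one, cpus=1))
```

A per-CPU figure that stays above 1 for the 5- and 15-minute averages suggests the machine has more work queued than it can run.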
If the load is CPU-bound, check the Cpu(s) line in top:
- us: user CPU time
- sy: system CPU time
- ni: nice CPU time
- id: CPU idle time (high is good)
- wa: I/O wait (important)
|
Tasks: 145 total,   1 running, 144 sleeping,   0 stopped,   0 zombie
Cpu(s):  1.0%us,  0.3%sy,  0.0%ni, 98.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:    218548k total,   155732k used,    62816k free,     7500k buffers
Swap:   634528k total,   268480k used,   366048k free,    63832k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
20112 root      20   0  2576 1212  912 R  1.0  0.6   0:00.07 top
 3091 root      20   0 67900 8108 1428 S  0.3  3.7   6:54.52 Xorg
    1 root      20   0  3084  124   72 S  0.0  0.1   0:03.83 init
    2 root      15  -5     0    0    0 S  0.0  0.0   0:00.01 kthreadd
|
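The Cpu(s) fields listed above can be triaged with a few lines of code. This is only a sketch: it assumes the older Cpu(s) layout shown in the top output here (newer versions of top print the line differently, so the pattern would need adjusting), and the two thresholds are illustrative numbers, not recommendations.

```python
import re

def parse_cpu_line(line):
    """Map each field of a top 'Cpu(s):' line (us, sy, ni, id, wa, ...) to a float."""
    return {name: float(value) for value, name in re.findall(r"([\d.]+)%\s*(\w+)", line)}

def describe_cpu(stats, busy_threshold=80.0, wait_threshold=10.0):
    """Rough triage of CPU figures; both thresholds are hypothetical."""
    if stats.get("wa", 0.0) >= wait_threshold:
        return "waiting on I/O"
    if stats.get("us", 0.0) + stats.get("sy", 0.0) >= busy_threshold:
        return "CPU-bound"
    return "no clear bottleneck"

sample = "Cpu(s): 1.0%us, 0.3%sy, 0.0%ni, 98.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st"
stats = parse_cpu_line(sample)
print(stats["id"], describe_cpu(stats))
```

I/O wait is checked first because a machine stuck waiting on disk can show low user and system time while still feeling slow.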
SWAP death
|
             total       used       free     shared    buffers     cached
Mem:        218548     169584      48964          0       8792      76860
-/+ buffers/cache:      83932     134616
Swap:       634528     266012     368516
|
Check the Mem and Swap lines
- always check cached first, then swap used
- real RAM used ~= used - buffers - cached (add swap used to see everything your applications occupy)
- if you are out of RAM, press M (Shift+m) in top to sort processes by memory use
The key used figure to look at is the used value on the -/+ buffers/cache row (83932). This is how much memory your applications are currently using. For best performance, this number should be less than your total memory (218548). To prevent out-of-memory errors, it needs to be less than total memory plus swap space (218548 + 634528).
If you want to see quickly how much memory is free, look at the free value on the -/+ buffers/cache row (134616). This is the total memory minus the memory actually used: 218548 - 83932 = 134616.
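The arithmetic behind the -/+ buffers/cache row is easy to check by hand. The sketch below redoes it with the figures from the free output above; the function names are made up for illustration.

```python
def app_memory_used(used, buffers, cached):
    """Memory held by applications: 'used' minus what the kernel can reclaim."""
    return used - buffers - cached

def memory_available(total, used, buffers, cached):
    """Memory applications could still claim, counting buffers and cache as free."""
    return total - app_memory_used(used, buffers, cached)

# Figures (in KB) from the free output above.
total, used, buffers, cached = 218548, 169584, 8792, 76860
swap_total = 634528

app_used = app_memory_used(used, buffers, cached)
print(app_used)                                        # 83932
print(memory_available(total, used, buffers, cached))  # 134616
print(app_used < total)                # fits in RAM
print(app_used < total + swap_total)   # fits in RAM plus swap
```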
Troubleshooting High I/O wait
|
root@mon:/var/log# iostat
Linux 2.6.28-15-generic (mon)  22/11/09  _i686_  (1 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           4.46    0.17    3.45    0.74    0.00   91.20

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               2.94        45.90        42.22   26889903   24735208
sda1              2.47        36.38        33.06   21312181   19365096
sda2              0.00         0.00         0.00         34          0
sda5              0.48         9.52         9.17    5577168    5370112
|
Check for swapping first
- use iostat to get disk I/O diagnostics
- tps = transfers (I/O requests) per second
- Blk_read/s = blocks read per second
- Blk_wrtn/s = blocks written per second
- Blk_read = total blocks read
- Blk_wrtn = total blocks written
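If you capture iostat output in a script, the device table can be turned into a dictionary keyed by the column names described above. This is a minimal sketch that assumes the older column layout shown in this post (tps, Blk_read/s, and so on); the function name is hypothetical.

```python
def parse_iostat_devices(text):
    """Return {device: {column: value}} from the 'Device:' table of iostat output."""
    devices, columns = {}, None
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("Device"):
            columns = fields[1:]  # column names follow the 'Device:' label
        elif columns and len(fields) == len(columns) + 1:
            devices[fields[0]] = dict(zip(columns, map(float, fields[1:])))
    return devices

sample = """Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 2.94 45.90 42.22 26889903 24735208
sda1 2.47 36.38 33.06 21312181 19365096
sda2 0.00 0.00 0.00 34 0
sda5 0.48 9.52 9.17 5577168 5370112
"""
devices = parse_iostat_devices(sample)
busiest = max(devices, key=lambda name: devices[name]["tps"])
print(busiest, devices[busiest]["tps"])
```

Note that the whole-disk entry (sda here) includes the traffic of its partitions, so it will normally come out on top.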
Out of disk space issues
|
root@mon:/boot/grub# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1              55G   11G   42G  21% /
tmpfs                 107M     0  107M   0% /lib/init/rw
varrun                107M  136K  107M   1% /var/run
varlock               107M     0  107M   0% /var/lock
udev                  107M  144K  107M   1% /dev
tmpfs                 107M  1.5M  106M   2% /dev/shm
lrm                   107M  2.2M  105M   3% /lib/modules/2.6.28-15-generic/volatile
root@mon:/var/log# du -ckx | sort -nr
91296 total
91296 .
53736 ./atsar
13644 ./ConsoleKit
11240 ./mysql
1836 ./apache2
808 ./installer
228 ./apt
156 ./clamav
56 ./cacti
32 ./cups
24 ./gdm
20 ./mrtg
12 ./fsck
8 ./dbconfig-common
4 ./unattended-upgrades
4 ./sysstat
4 ./samba
4 ./news
4 ./dist-upgrade
4 ./apparmor
|
- start the diagnosis with df
- identify the full disk, then use du to find what is filling it
- sudo du -ckx | sort -nr > /tmp/duck-root
To solve:
- compress logs
- clear the package cache
- clean out /tmp (the dreaded vim full /tmp issue)
- get a bigger disk
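The df step can be automated along these lines. The sketch below reads df-style text and lists the mount points at or above a usage threshold; the function names and the 90% default are illustrative, and it assumes mount points without spaces.

```python
def parse_df(text):
    """Yield (mount_point, use_percent) pairs from df output, skipping the header."""
    for line in text.splitlines()[1:]:
        fields = line.split()
        if len(fields) >= 6 and fields[4].endswith("%"):
            yield fields[5], int(fields[4].rstrip("%"))

def nearly_full(text, threshold=90):
    """Mount points whose Use% is at or above threshold (a hypothetical cutoff)."""
    return [mount for mount, percent in parse_df(text) if percent >= threshold]

sample = """Filesystem Size Used Avail Use% Mounted on
/dev/sda1 55G 11G 42G 21% /
tmpfs 107M 0 107M 0% /lib/init/rw
udev 107M 144K 107M 1% /dev
"""
print(nearly_full(sample))                # nothing is close to full here
print(nearly_full(sample, threshold=20))  # lower cutoff, just to show a match
```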
Out of Inodes
|
root@mon:/var/log# df -ih
Filesystem            Inodes   IUsed   IFree IUse% Mounted on
/dev/sda1               3.5M    132K    3.4M    4% /
tmpfs                    27K       3     27K    1% /lib/init/rw
varrun                   27K      77     27K    1% /var/run
varlock                  27K       5     27K    1% /var/lock
udev                     27K    1.5K     26K    6% /dev
tmpfs                    27K       3     27K    1% /dev/shm
lrm                      27K      17     27K    1% /lib/modules/2.6.28-15-generic/volatile
|
- symptom: the file system behaves as if it is full, but df shows free space
- ext3 has a fixed number of inodes, set at mkfs time
- use df -i to check
- if you run out, delete some files
- or back up and reformat with more inodes
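The same check can be made from a script without parsing df -i. The sketch below uses os.statvfs, which is available on Unix only; the function names are made up, and treating a reported inode count of zero as 0% used is an assumption for file systems that do not expose one.

```python
import os

def inode_use_percent(total, free):
    """Percentage of inodes in use; 0.0 when the file system reports no inode count."""
    if total <= 0:
        return 0.0  # assumption: no fixed inode table to run out of
    return 100.0 * (total - free) / total

def inode_usage(path="/"):
    """Return (total, free, percent used) inodes for the file system holding path."""
    stats = os.statvfs(path)  # Unix only
    return stats.f_files, stats.f_ffree, inode_use_percent(stats.f_files, stats.f_ffree)

print(inode_use_percent(1000, 250))  # 75.0
```

On the host itself, something like inode_usage("/var") gives the live figures for the file system that holds /var.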
vmstat helps you see, among other things, whether your server is swapping.
|
root@ ( 1689 ~ ) # vmstat 1 2
procs -----------memory---------- ---swap-- -----io---- -system-- -----cpu------
 r  b   swpd   free   buff   cache   si   so    bi    bo   in   cs us sy id wa st
 4  0  72056 131836  79648 1638552    0    0     1   120    0    0 11  3 85  2  0
 3  0  72056 130736  79652 1639576    0    0     4     0 2342 3655 36  2 61  0  0
|
si (swap in) and so (swap out) show how much memory is being swapped in from disk and out to disk per second.
- The si/so numbers should be 0 (or close to it).
- Numbers in the hundreds or thousands indicate your server is swapping.
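The si/so rule of thumb can be sketched in code as follows. The parser assumes the column header begins with r and b, as in the vmstat output in this post; the function names and the cutoff of 100 are illustrative choices rather than fixed rules.

```python
def parse_vmstat(text):
    """Return one {column: int} dict per sample row of vmstat output."""
    rows, columns = [], None
    for line in text.splitlines():
        fields = line.split()
        if fields[:2] == ["r", "b"]:
            columns = fields  # the header row names every column
        elif columns and len(fields) == len(columns) and fields[0].isdigit():
            rows.append(dict(zip(columns, map(int, fields))))
    return rows

def is_swapping(rows, threshold=100):
    """True if any sample shows si or so at or above threshold (hypothetical cutoff)."""
    return any(row["si"] >= threshold or row["so"] >= threshold for row in rows)

sample = """procs -----------memory---------- ---swap-- -----io---- -system-- -----cpu------
 r b swpd free buff cache si so bi bo in cs us sy id wa st
 4 0 72056 131836 79648 1638552 0 0 1 120 0 0 11 3 85 2 0
 3 0 72056 130736 79652 1639576 0 0 4 0 2342 3655 36 2 61 0 0
"""
rows = parse_vmstat(sample)
print(len(rows), is_swapping(rows))
```

Keep in mind that the first row vmstat prints is an average since boot, so the later rows are the ones that reflect what the server is doing right now.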
The r (runnable), b (blocked), and w (waiting) columns help you see your server load. (The w column appears only in older versions of vmstat; the output above has just r and b.)
- Waiting processes are swapped out.
- Blocked processes are typically waiting on I/O.
- The runnable column is the number of processes trying to do something. These numbers combine to form the 'load' value on your server. Typically you want the load value to be one or less per CPU in your server.
The bi (blocks in) and bo (blocks out) columns show disk I/O (including swapping memory to/from disk) on your server.
The us (user), sy (system), and id (idle) columns show how your server's CPU time is being spent.
- The higher the idle value, the better.