
Monday, 15 April 2013

zeroMQ with PHP and next challenge



So this time round I wanted to experiment with PHP so that I can implement the front end for my little messaging application.

Why PHP? Because it's something that I already know and zeroMQ supports it :)

On the same server I have had to install a few things to get it working.

1] phpize5 - check that you have it available (you could also use phpinfo())
# phpize5 --help
Usage: /usr/bin/phpize5 [--clean|--help|--version|-v]

2] if not then you probably don't have php5-dev 
# apt-get install php5-dev

3] clone git repo for php-zmq
# git clone https://github.com/mkoppanen/php-zmq.git
# cd php-zmq

4] then follow the guide to finish it off ...
# phpize
# ./configure
# make && make install
# make test

5] add extension=zmq.so to php.ini
6] lastly
# apache2ctl graceful



So below is my PHP code. There's nothing special about it.

  1. First I create my context
  2. then I connect a SUB socket to port 5556, since this is the port that the sender server sends messages out on, and subscribe to every message (an empty filter)
  3. finally I run a small loop of 100 iterations, reading from the socket each time

<html>
  <body>
    <?php
       $context = new ZMQContext();
       //  Socket to talk to server
       $subscriber = new ZMQSocket($context, ZMQ::SOCKET_SUB);
       $subscriber->connect("tcp://localhost:5556");
       $subscriber->setSockOpt(ZMQ::SOCKOPT_SUBSCRIBE, "");
       $i = 0; 
           while( $i < 100 ) {
                $string = $subscriber->recv();
                echo $i."-".$string."<br>";
                $i++;
           }
     ?>
   </body>
</html>


<snip>
...
10- docho: dog:1363660548
11- docho: dog:1363660549
12- docho: dog:1363660550
13- docho: dog:1363660551
14- docho: dog:1363660552
15- docho: dog:1363660553
...
</snip>


Challenge

Right now I am just looping an arbitrary number of times to read messages from the queue.

What I need to do is let the web front know when there is a message to display.

I have a couple of ideas on how to do this ...

  1. memcached - store every message in memory and let the client loop over it. But this doesn't solve the trigger problem.
  2. add the messages to a database and let the client look for updated rows. Not so efficient.
  3. HTML5 Web Workers? The Web Workers specification defines an API for spawning background scripts in your web application. Web Workers allow you to do things like fire up long-running scripts to handle computationally intensive tasks, but without blocking the UI or other scripts to handle user interactions. They're going to help put an end to that nasty 'unresponsive script' dialog that we've all come to love. So the worker would connect to the port and update the div that holds the messages. Thoughts? http://www.html5rocks.com/en/tutorials/workers/basics/ 

Saturday, 13 April 2013

Is Python the right language for Sys Administrators?


Yesterday at work the big boss told me to give up on Perl and to start doing everything in Python!

This took me back a few years, to when I did all my sys stuff in bash, before I got exposed to Perl. I found Perl easy to pick up and the online community helped quite a bit! 

My natural reaction when I get asked "hey, I need you to do this and it's urgent" is to do it using Perl because:

a) I feel comfortable 
b) it does the job 
c) it's urgent. 

It has pretty much all the modules that I need to get by - web programming, database programming, OO programming, and general stuff like threading, MIME handling ...

So I have used Perl in pretty much everything that I do - log parsing, cgi, reporting, system checks and now automation.

I spoke to a few guys at work and they have said good things about Python and its ability to use C libraries. 

So what is your experience with Python? Is it the right tool for Sys Admin? Should I stop using Perl? :)

"Python is a programming language that lets you work more quickly and integrate your systems more effectively. You can learn to use Python and see almost immediate gains in productivity and lower maintenance costs."

Python lets you work more quickly ... how? I was told that parsing large logs takes quite a while. They say Python is easy but Perl is a pain. That's partially true about Perl, but maybe that's because they haven't read the docs very well ... this was the case for me :)

How does it integrate with your system *more* effectively?

http://silicainsilico.wordpress.com/2012/03/26/switching-from-perl-to-python-speed/ [Speed Test]
http://strombergers.com/python/ [Python is Cool (and Perl is not), Especially for C/C++ Programmers]
http://www.revolves.net/perl-vs-python-the-final-battle/



handy line to lookup country using IP address/hostname





Here is a handy lookup tool for all admins out there ... It accepts both an IP address and an FQDN (fully qualified domain name). It is useful when you have a few servers scattered all over the globe and you want to know which country each one sits in ...




geoiplookup(1) - Linux man page


Name

geoiplookup - look up country using IP Address or hostname

Synopsis

geoiplookup [-d directory] [-f filename] [-v] <ipaddress|hostname>

Description

geoiplookup uses the GeoIP library and database to find the Country that an IP address or hostname originates from.
For example
geoiplookup 80.60.233.195
will find the Country that 80.60.233.195 originates from, in the following format:
NL, Netherlands

Options

-f
Specify a custom path to a single GeoIP datafile.
-d
Specify a custom directory containing GeoIP datafile(s). By default geoiplookup looks in /usr/share/GeoIP
-v
Lists the date and build number for the GeoIP datafile(s).

Author

Written by T.J. Mather
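
If you want to run the lookup over a whole list of servers, the output is easy to parse. Below is a rough Python sketch, not part of the GeoIP package: the names parse_geoip_line and lookup_country are made up for illustration, and it assumes each result line ends with "CC, Country Name" as in the man page example, optionally preceded by a label and a colon.

```python
import re
import subprocess

# A two-letter country code and a country name at the end of the line,
# either at the very start or right after a "label: " prefix (assumption).
_COUNTRY_RE = re.compile(r"(?:^|:\s*)([A-Z]{2}),\s*(.+?)\s*$")


def parse_geoip_line(line):
    """Return (code, name) from one line of geoiplookup output, or None."""
    match = _COUNTRY_RE.search(line.strip())
    if match is None:
        return None
    return match.group(1), match.group(2)


def lookup_country(target):
    """Run geoiplookup for an IP address or hostname and parse the result."""
    result = subprocess.run(
        ["geoiplookup", target], capture_output=True, text=True, check=True
    )
    for line in result.stdout.splitlines():
        parsed = parse_geoip_line(line)
        if parsed is not None:
            return parsed
    return None
```

Calling lookup_country("80.60.233.195") needs geoiplookup and its database installed; parse_geoip_line on its own works on any saved output.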

Wednesday, 10 April 2013

ØMQ on Debian...


 

The ØMQ installation guide advises those who want to build on Unix-like systems to choose Ubuntu, as it's regarded as the most comfortable OS for development. 
However, since I have Debian in my VM, I decided to install ØMQ on Debian and had a little headache along the way :). It was worth it, and it's simple.  Just follow the guide.

 The following is what you need to do:
  1. Make sure that libtool, autoconf, automake are installed.
  2. Check whether uuid-dev package, uuid/e2fsprogs RPM or equivalent on your system is installed.
  3. Unpack the .tar.gz source archive.
  4. Run ./configure, followed by make.
  5. To install ØMQ system-wide run sudo make install.
  6. On Linux, run sudo ldconfig after installing ØMQ.


1. Make sure that libtool, autoconf, automake are installed.

    apt-get install libtool autoconf automake
 
2. Check whether the uuid-dev package, uuid/e2fsprogs RPM or equivalent on your system is installed. Use Debian's package manager to check whether you already have them installed on your system.

     dpkg -s uuid-dev e2fsprogs

If they are not installed, use the following commands to install them:
 
     apt-get install uuid-dev
     apt-get install e2fsprogs

3. Download the current stable release (tar.gz source code) as provided on the ØMQ page and unpack it.


   # tar -zxvf zeromq-3.2.2.tar.gz

4. Run ./configure, followed by make

(go to the unpacked source archive and run ./configure)

  # cd zeromq-3.2.2/
  # ./configure  

This script checks that the tools and libraries zmq needs are available on your computer, so you will see a long list of checks.
If at the end of the checks you get an error like "configure: error: Unable to find a working C++ compiler", no Makefile is generated, so running 'make' will fail

  # make


and you will get this error :

make: *** No targets specified and no makefile found.  Stop.



Therefore you will need to install the C++ compiler, or whatever else is missing.

    # apt-get install g++
   
After installing g++ (the C++ compiler), run './configure' and 'make' again, then finish with 'make install' and 'ldconfig' (steps 5 and 6 above).

Now you are ready to write, compile and run your code ;-) 

If you have written your code in C, simply do:
 
$ gcc -o myprogram myprogram.c -lzmq

$ ./myprogram


-lzmq: when you compile your code, you need to add -lzmq to tell the linker to link against the zeromq library.


 



 



  

     

     





Tuesday, 9 April 2013

ØMQ - 1 server and 3 client message processing

So after reading the documentation carefully I was able to figure out which constants to use for my little project.

For the server I need to use ZMQ_PULL. Here the socket collects messages from clients evenly using fair-queuing.

On the client side I am using ZMQ_PUSH. Each client will push a message to the server on the same port that the server listens on.


So here it is in action …

now what's left is to tidy up the code :)


Top left is the server and the rest are clients that are sending messages with time() to differentiate individual messages.



Server Code



Client Code



Monday, 8 April 2013

SUSE - Quick And Easy Local Filesystem Troubleshooting For SUSE Linux

To identify possible Filesystem problems

1. Identify OS
2. Figure out how many active/running local disks and/or volume groups
3. Identify hardware product and Check partition
4. Compare results: mounts that are supposed to be up but are not; mounts that are not supposed to be there
  • Check the USED% column in the output of your "df -l" command
  • Check the inodes column and ensure that those aren't all being used up either.
  • If you're running ReiserFS, use reiserfsck instead of plain fsck

Commands
  • uname -a
  • hwinfo --help
  • cat /proc/partitions
  • df -l
  • grep -v ":" /etc/fstab


To identify possible memory bottlenecks

I have gathered the following commands:
  1. top
    1. virtual memory
  2. ps aux | grep serviceProcess
    1. %MEM, VSZ and RSS
  3. vmstat 2
    1. “inact” and “active”
  4. ps -o vsz,rss,tsiz,dsiz,majflt,minflt,pmem,cmd PID
    1. for all the memory information
  5. cat /proc/PID/status
    1. detailed information
  6. swapon -s
    1. system swap partitions
  7. free
    1. used and free memory in terms of straight-up memory, buffers and cache
  8. cat /proc/meminfo
    1. detailed information on system memory and how it's being used
  9. sar -r
    1. memory usage defined in terms of memory, buffers and cache



I suggest reading up on a small but extremely useful command: “w”

“w” shows the load of a system, and this is the best indicator of system resource usage.

top shows high memory usage? So what? If we configure Oracle to use a lot of global memory, then it should be high.

top shows some process running at 100% CPU? That's nothing; a simple infinite loop can drive the CPU to 100%, but the system can still function normally.

But if “w” shows high 1-, 5- and 15-minute load averages, that is an indication that things are not good…



1. Figure out where you are and what OS you're on:
host # uname -a
2.4.x will be for SUSE 8.x and 2.6.x will be for SUSE 9.x.
2. Figure out how many local disks and/or volume groups you have active and running on your system:
# hwinfo --help
Usage: hwinfo options
Probe for hardware.
  --short        just a short listing
  --log logfile  write info to logfile
  --debug level  set debuglevel
  --version      show libhd version
  --dump-db n    dump hardware data base, 0: external, 1: internal
  --hw_item      probe for hw_item
  hw_item is one of:
    all, bios, block, bluetooth, braille, bridge, camera, cdrom, chipcard, cpu,
    disk, dsl, dvb, floppy, framebuffer, gfxcard, hub, ide, isapnp, isdn,
    joystick, keyboard, memory, modem, monitor, mouse, netcard, network,
    partition, pci, pcmcia, pcmcia-ctrl, pppoe, printer, scanner, scsi, smp,
    sound, storage-ctrl, sys, tape, tv, usb, usb-ctrl, vbe, wlan, zip
Find out what hardware product you are dealing with:
# hwinfo | grep system.product
  smbios.system.product = 'IBM eServer BladeCenter HS21
8853G1G'
  system.product = 'IBM eServer BladeCenter HS21
8853G1G'
Check out the partitions:
# cat /proc/partitions
major minor  #blocks  name
   8     0   71288832 sda
   8     1      56196 sda1
   8     2    3100545 sda2
   8     3    4152802 sda3
   8     4          1 sda4
   8     5    3100513 sda5
   8     6   36700461 sda6
   8     7   24121566 sda7
3. Check out your local filesystems and fix anything you find that's broken:

host # df -l
host # grep -v ":" /etc/fstab

compare results:
       mounts that are supposed to be up but are not
       mounts that are not supposed to be there
host # df -l
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sda2              3051824    615032   2281768  22% /
udev                   4089928       124   4089804   1% /dev
/dev/sda1                54416     19780     31827  39% /boot
/dev/sda5              3051792    906840   1989928  32% /usr
/dev/sda6             36123168  11928580  22359568  35% /var
/dev/sda7             23742552   5743236  16793240  26% /var/log

host # grep -v ":" /etc/fstab
/dev/sda2            /                    ext3  acl,user_xattr  1 1
/dev/sda1            /boot                ext3  acl,user_xattr  1 2
/dev/sda5            /usr                 ext3  acl,user_xattr  1 2
/dev/sda6            /var                 ext3  acl,user_xattr  1 2
/dev/sda7            /var/log             ext2  acl,user_xattr  1 2
/dev/sda3            swap                 swap  defaults        0 0
proc                 /proc                proc  defaults        0 0
sysfs                /sys                 sysfs noauto  0 0
debugfs              /sys/kernel/debug    debugfs       noauto  0 0
usbfs                /proc/bus/usb        usbfs noauto  0 0
devpts               /dev/pts             devpts        mode=0620,gid=5 0 0
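
Comparing the two listings by eye gets tedious with many mounts, so here is a rough Python sketch of the comparison. The function names are made up for illustration. It assumes only rows whose first column is a /dev/ device matter, that df prints each filesystem on a single line, and that mount points contain no spaces; swap and noauto entries in fstab are skipped because they are not expected to show up in df.

```python
def df_mounts(df_output):
    """Mount points of /dev/ devices in `df -l` output (last column)."""
    return {
        fields[-1]
        for fields in (line.split() for line in df_output.splitlines())
        if len(fields) >= 6 and fields[0].startswith("/dev/")
    }


def fstab_mounts(fstab_text):
    """Mount points fstab expects to be up: /dev/ devices, no swap, no noauto."""
    expected = set()
    for line in fstab_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or not fields[0].startswith("/dev/"):
            continue  # comments, blank lines and pseudo filesystems
        mount_point, fs_type, options = fields[1], fields[2], fields[3].split(",")
        if fs_type == "swap" or "noauto" in options:
            continue
        expected.add(mount_point)
    return expected


def compare_mounts(df_output, fstab_text):
    """Return (missing, unexpected) mount points as two sorted lists."""
    mounted = df_mounts(df_output)
    expected = fstab_mounts(fstab_text)
    return sorted(expected - mounted), sorted(mounted - expected)
```

Feeding it the df and fstab output shown above should give two empty lists, meaning everything that is supposed to be mounted is mounted and nothing extra is there.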

TIPS

       Check the USED% column in the output of your "df -l" command
       Check the inodes column and ensure that those aren't all being used up either.
        If you're running ReiserFS, use reiserfsck instead of plain fsck
 host # umount /uselessFileSystem
host # fsck -y /uselessFileSystem
....
host # mount /  

If you need to fsck any special filesystem, like root "/", you should ideally do it when booted off a CD-ROM or, at the very least, in single-user mode.

POP - useful commands

POP3
Version 3 of the Post Office Protocol (POP3) is comparatively simple, and only allows the user to download emails from the server to the client. The user can log in to an account, view the contents of the mailbox, transfer and delete emails, and log out, all via server port 110. This requires few resources, and there is little to configure, which means few sources of error.
The POP3 protocol is simple enough to use directly, in an interactive session:
user@linux:$ telnet mail.example.com 110
Trying 192.168.50.50...
Connected to mail.example.com.
Escape character is '^]'.
+OK Hello there.
USER tux
+OK Password required.
PASS secret
+OK logged in.
The LIST command summarizes all the messages it contains (nine in the following example) and their lengths:
LIST
+OK POP3 clients that break here, they violate STD53.
1 9586
2 1125022
3 53125
4 2451
5 5931
6 4943
7 4206
8 5231
9 9481
.
The message from Courier in the +OK answer refers to POP3 clients that erroneously expect the server to return the number of messages in answer to the LIST command:
LIST
+OK 2 messages (320 octets)
1 120
2 200
.
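
When scripting this instead of typing into telnet, the LIST reply is the part you usually want as data. A small Python sketch follows; parse_list_reply is a name invented here, and it assumes the reply arrives as a list of text lines: an optional "+OK ..." status line, one "number size" pair per message, and a lone dot at the end.

```python
def parse_list_reply(lines):
    """Turn the lines of a POP3 LIST reply into {message_number: size_in_octets}."""
    sizes = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("+OK"):
            continue  # status line, carries no per-message data
        if line == ".":
            break  # a lone dot terminates the multi-line reply
        number, size = line.split()[:2]
        sizes[int(number)] = int(size)
    return sizes
```

Python's standard poplib module can fetch the same listing with POP3.list(); it hands the entries back as bytes, so decode them before passing them to this function.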
RETR is used to retrieve a message from the server:
RETR 2
Return-Path: <p.heinlein@heinlein-support.de>
X-Original-To: p.heinlein@heinlein-support.de
Delivered-To: tux@example.com
Received: from 10.0.42.2 (unknown 10.0.42.2)
(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
(Client did not present a certificate)
by plasma.heinlein-support.de (Postfix) with ESMTP id BEA0581A4B
for <tux@example.com>; Sat, 7 Apr 2007 01:02:01 +0200 (CEST)
From: Peer Heinlein <p.heinlein@heinlein-support.de>
To: Tux <tux@example.com>
Subject: Test message 2
Date: Sat, 7 Apr 2007 01:02:01 +0200
User-Agent: KMail/1.9.5
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline
Message-Id: <200704070102.01895.p.heinlein@heinlein-support.de>
X-Length: 1519
Status: R
X-Status: NC
X-UID: 0
Hello!
I am a test message.
=2D-=20
Heinlein Professional Linux Support GmbH
Linux: Academy - Support - Hosting
http://www.heinlein-support.de

Legally required information according to =A735a HGB (German Commercial
Code)
HRB 93818 B / Amtsgericht Berlin-Charlottenburg,=20
Manager: Peer Heinlein =A0-- Seat: Berlin
Flagging message 2 for deletion after it has been read is just as simple:
DELE 2
+OK Deleted.
However, it will not actually be deleted until the user logs out. This allows us to undo the setting of the deletion flag:
RSET
+OK Resurrected.
 If we do not wish to transfer an entire message to the client, we can use the TOP command to retrieve only the message headers and a specified number of lines of the mail body, given in a second argument to the command (seven in this case):
TOP 2 7
Return-Path: <p.heinlein@heinlein-support.de>
X-Original-To: p.heinlein@heinlein-support.de
Delivered-To: tux@example.com
Received: from 10.0.42.2 (unknown 10.0.42.2)
(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
(Client did not present a certificate)
by plasma.heinlein-support.de (Postfix) with ESMTP id BEA0581A4B
for <tux@example.com>; Sat, 7 Apr 2007 01:02:01 +0200 (CEST)
From: Peer Heinlein <p.heinlein@heinlein-support.de>
To: Tux <tux@example.com>
 There is also an "idle" command that enables the client to keep the connection open:
NOOP
+OK Yup.
The QUIT command is used to terminate the connection:
QUIT
+OK Bye-bye.
Connection closed by foreign host.

IMAP - Useful Commands

  • Login
    • a1 LOGIN "tux" "hidden"
  • List all available directories
    • a2 LIST "" "*"
  • List specific directories
    • a3 LIST "" "INBOX.Priv*"
  • Select a specific FOLDER
    • a4 SELECT INBOX.Test
  • View ALL
    • a5 FETCH 1:3 ALL
  • View BODY
    • a6 FETCH 2 BODY[]
  • Download individual header lines
    • a7 FETCH 2 BODY[HEADER.FIELDS (Message-ID)]
  • Copy a message either to another FOLDER or back in the same FOLDER
    • a8 COPY 2:3 INBOX.Test
  • Searching for Email Contents
    • a11 SEARCH UNSEEN
  • Find messages marked for DELETION
    • a12 SEARCH 1:4 DELETED
  • Search message contents
    • a13 SEARCH ALL TEXT Heinlein
  • Account migration
    • imapsync --host1 oldmail.example.com --user1 tux \
      --password1 "secret" --host2 newmail.example.com --user2 t.tux \
      --password2 "secret"
  • Account migration with a password file
    • imapsync --host1 oldmail.example.com --user1 tux \
      --passfile1 /root/pw1 --host2 newmail.example.com --user2 t.tux \
      --passfile2 /root/pw2

Linux - netstat

Netstat returns a variety of information on active connections:
  • current status
  • what hosts are involved
  • which programs are involved
You can also see information about the routing table and even get statistics on your network interfaces.
  • netstat -l
    • To get an overview of everything running on your system, use this basic invocation
  • netstat -l -p --tcp --udp
    • display all listening TCP and UDP sockets and program doing the listening
  • netstat -a -p --tcp --udp
    • list all active TCP/UDP connections
  • netstat -t -n | cut -c 68- | sort | uniq -c | sort -n
    • This will show you a sorted list of how many sockets are in each connection state.
  • netstat -tlpn
    • which daemons are listening for TCP connections
  • netstat -ulpn
    • the same for UDP services
  • netstat -s
    • summary of the network stack state counters, going into way more detail than the RX/TX frames dropped counter of ifconfig.
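
The cut -c 68- trick in the pipeline above depends on the column width of your netstat build. A rough Python alternative, assuming the usual six-column layout of netstat -t -n (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State), is sketched below; count_tcp_states is a made-up name.

```python
from collections import Counter


def count_tcp_states(netstat_output):
    """Count sockets per connection state in `netstat -t -n` output."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows start with the protocol name (tcp or tcp6);
        # the connection state is the sixth column.
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            states[fields[5]] += 1
    return states
```

counts.most_common() then gives the states sorted by how many sockets are in each, which is what the sort | uniq -c | sort -n chain was after.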
Parameter  Description
  • -a  
    • Displays all connections and listening ports
  • -e  
    • Displays additional (extended) information
  • -n  
    • Displays addresses and port numbers in numerical form instead of using friendly names
  • -s  
    • Displays statistics categorized by protocol
  • -p
    • Shows the PID and name of the program that owns each socket
  • -r  
    • Displays the contents of the routing table
  • -c
    • Redisplays the selected information every second; press [Ctrl]C to stop
Common states

  • LISTEN
    • The socket is listening for incoming connections. Those sockets are only displayed if the -a or -l switch is set.
  • ESTABLISHED
    • The socket has an established connection.
  • SYN_SENT
    • The socket is actively attempting to establish a connection.
  • SYN_RECV
    • A connection request has been received from the network.
  • TIME_WAIT
    • The socket is waiting after close to handle packets still in the network.
  • FIN_WAIT1
    • The socket is closed, and the connection is shutting down.
  • FIN_WAIT2
    • The connection is closed and the socket is waiting for a shutdown from the remote end.
  • CLOSE_WAIT
    • The remote end has shut down, and it is waiting for the socket to close.
  • CLOSED
    • The socket is not being used. 

MYSQL Replication

[1] ENABLE BINARY LOGGING AND ESTABLISH UNIQUE SERVER IDS
why - the binary log is the basis for sending data changes from the master to its slave
[A] on the master server
            1. shutdown mysql
            2. edit my.cnf or my.ini
                        within the [mysqld] section add
                                    log-bin=mysql-bin
                                    server-id=1
            3. start server
for durability and consistency when using InnoDB, also set
            innodb_flush_log_at_trx_commit=1
            sync_binlog=1
in the master my.cnf file
also, ensure the skip-networking option is not enabled
[2] ESTABLISH A UNIQUE SERVER ID ON THE SLAVE SERVER
no binary logging is needed unless the slave acts as a master to another slave (complex setup)
[A] on the slave server
            1. shutdown mysql
            2. edit files
                        within the [mysqld] section add
                                    server-id=2
[3] CREATE USER FOR REPLICATION
why - so that the slave can connect to the master (note, user credentials will be stored in plain text in master.info)
CREATE USER 'USERNAME'@'%.DOMAIN' IDENTIFIED BY 'PASSWORD';
GRANT REPLICATION SLAVE ON *.* TO 'USERNAME'@'%.DOMAIN';
[4] OBTAIN REPLICATION MASTER BINARY LOG COORDINATES
            1. stop processing statements on the master
            2. obtain current binary log coordinates
            3. dump
            4. permit master to continue
            a. FLUSH TABLES WITH READ LOCK;
            b. ( in a different session )
                        SHOW MASTER STATUS;
- note the file name and position; these are the replication coordinates.
- in our case, since the master had been running without binary logging, use an empty string ('') for the file name and 4 for the position

[5] CREATE A SNAPSHOT USING MYSQLDUMP
            1. in a shell > mysqldump --all-databases --lock-all-tables > dbdump.db
            2. UNLOCK TABLES ;                     // release acquired lock on the master
[6] SETUP REPLICATION WITH EXISTING DATA
            1. start mysql on the slave with --skip-slave-start
            2. in a shell > mysql < dbdump.db
            3. configure the slave with the replication coordinates from the master and set up the master configuration on the slave
            CHANGE MASTER TO
            MASTER_HOST='DOMAIN',
            MASTER_USER='USERNAME',
            MASTER_PASSWORD='PASSWORD',
            MASTER_LOG_FILE='',
            MASTER_LOG_POS=4;
            4. start the slave threads with START SLAVE;
- the slave should now be able to connect to the master and catch up on any updates that have occurred since the snapshot was taken
If it fails:
            a. check server-id on both master and slave and ensure that they are unique
            b. check logging on slave
            c. ensure that the domain is correctly set on the master to grant replication access for the slave
            d. ensure username and password are correct
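
The server-id check is easy to automate if you can get at both config files. The Python sketch below is only an illustration with made-up function names. It assumes plain my.cnf files with a [mysqld] section and unindented key=value lines, and it does not follow !include or !includedir directives.

```python
import configparser


def read_server_id(cnf_text):
    """Return the server-id from the [mysqld] section of a my.cnf, or None."""
    parser = configparser.ConfigParser(
        allow_no_value=True,  # my.cnf allows bare flags such as skip-networking
        strict=False,         # tolerate repeated sections and options
        interpolation=None,   # treat % characters literally
    )
    parser.read_string(cnf_text)
    if not parser.has_section("mysqld"):
        return None
    for key in ("server-id", "server_id"):
        value = parser.get("mysqld", key, fallback=None)
        if value is not None:
            return int(value)
    return None


def server_ids_are_unique(master_cnf, slave_cnf):
    """True only when both files set a server-id and the two values differ."""
    master_id = read_server_id(master_cnf)
    slave_id = read_server_id(slave_cnf)
    return master_id is not None and slave_id is not None and master_id != slave_id
```

A misspelled option name makes read_server_id return None, so the check also catches the case where one side never set an id at all.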
the slave keeps its connection settings and log coordinates in master.info and relay-log.info
if you have made any corrections then you will need to
            STOP SLAVE;

            RESET SLAVE;

Linux - other ways to copy files to a remote host

Useful tips –

To copy a single file over without using scp
  1. cat file | ssh root@host 'cat > file'
  2. ssh root@host 'cat > file' < file
Using a public/private key to run scripts on a remote host without the need to enter passwords
  • cat to_be_remote_executed | ssh -i private-key-file root@host | cat > result
To transfer multiple files across in one go

  • tar -cvf - . | ssh root@host 'cd wherever; tar -xvf - '

Linux - Troubleshooting local sluggish or completely unresponsive system


Often a host that is sluggish or completely unresponsive can be caused by network issues, but below are some local troubleshooting tools you can use to tell the difference between a loaded network and a loaded machine.

When a machine is sluggish, it is often because you have consumed all of a particular resource on the system.
The main resources are CPU, RAM, disk I/O, and network. Overuse of any of these resources can cause a system to bog down to the point that often the only recourse is your last resort: a reboot. If you can log in to the system, however, there are a number of tools you can use to identify the cause.
System Load
10:55:37 up 6 days, 18:32,  3 users,  load average: 0.30, 0.17, 0.16
The three numbers after the load average, 0.30, 0.17, and 0.16, represent the 1-, 5-, and 15-minute load averages on the machine, respectively.  
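
To keep an eye on these numbers from a script, the line can be parsed directly. This is a rough Python sketch with invented function names; it assumes the "load average: a, b, c" format shown above, and it uses the rule of thumb of at most one unit of load per CPU.

```python
import re


def parse_load_averages(uptime_line):
    """Extract the 1-, 5- and 15-minute load averages from `uptime` or `w` output."""
    match = re.search(
        r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)", uptime_line
    )
    if match is None:
        raise ValueError("no load average found in: %r" % uptime_line)
    return tuple(float(value) for value in match.groups())


def is_overloaded(load, cpu_count):
    """True when the load exceeds one unit of load per CPU."""
    return load > cpu_count
```

On the machine itself you can skip the parsing: os.getloadavg() returns the same three numbers and os.cpu_count() gives the CPU count to compare against.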
If the load is CPU-bound

  • us: user CPU time
  • sy: system CPU time
  • ni: nice CPU time
  • id: CPU idle time (high is good)
  • wa: I/O wait (important)
Tasks: 145 total,   1 running, 144 sleeping,   0 stopped,   0 zombie
Cpu(s):  1.0%us,  0.3%sy,  0.0%ni, 98.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:    218548k total,   155732k used,    62816k free,     7500k buffers
Swap:   634528k total,   268480k used,   366048k free,    63832k cached
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
20112 root      20   0  2576 1212  912 R  1.0  0.6   0:00.07 top
 3091 root      20   0 67900 8108 1428 S  0.3  3.7   6:54.52 Xorg
    1 root      20   0  3084  124   72 S  0.0  0.1   0:03.83 init
    2 root      15  -5     0    0    0 S  0.0  0.0   0:00.01 kthreadd
SWAP death
             total       used       free     shared    buffers     cached
Mem:        218548     169584      48964          0       8792      76860
-/+ buffers/cache:      83932     134616
Swap:       634528     266012     368516
check mem and swap lines
  • always check cached first, then swap used
Real RAM used ~= used - cached + swap used
if out of RAM, hit M to sort top process by RAM use
The key used figure to look at is the buffers/cache row used value (83932). 
This is how much space your applications are currently using.  For best performance, this number should be less than your total (218548) memory.  To prevent out-of-memory errors, it needs to be less than the sum of total memory (218548) and swap space (634528).
If you wish to quickly see how much memory is free look at the buffers/cache row free value (134616). This is the total memory (218548) - the actual used (83932).  (218548 - 83932 = 134616)
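
The arithmetic behind the buffers/cache row can be reproduced from the Mem: row alone. The Python sketch below uses a made-up function name and assumes the older free layout shown above, with six numeric columns (total, used, free, shared, buffers, cached); newer versions of free use different columns.

```python
def parse_free(free_output):
    """Return memory figures (in KiB) derived from the Mem: row of `free` output."""
    for line in free_output.splitlines():
        fields = line.split()
        if fields and fields[0] == "Mem:":
            total, used, free, shared, buffers, cached = (int(v) for v in fields[1:7])
            return {
                "total": total,
                # Memory really held by applications: used minus buffers and cache.
                "used_by_apps": used - buffers - cached,
                # Memory applications could still get: free plus buffers and cache.
                "available": free + buffers + cached,
            }
    raise ValueError("no 'Mem:' row found")
```

For the output above this gives 169584 - 8792 - 76860 = 83932 used by applications and 48964 + 8792 + 76860 = 134616 available, matching the buffers/cache row.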
Troubleshooting High I/O wait  

root@mon:/var/log# iostat
Linux 2.6.28-15-generic (mon)   22/11/09        _i686_  (1 CPU)
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           4.46    0.17    3.45    0.74    0.00   91.20
Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               2.94        45.90        42.22   26889903   24735208
sda1              2.47        36.38        33.06   21312181   19365096
sda2              0.00         0.00         0.00         34          0
sda5              0.48         9.52         9.17    5577168    5370112
check for swapping first
  • use iostat to get disk I/O diagnostics
    • tps = transfers per second
    • Blk_read/s = blocks read per second
    • Blk_wrtn/s = blocks written per second
    • Blk_read = total blocks read
    • Blk_wrtn = total blocks written
Out of disk space issues

root@mon:/boot/grub# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1              55G   11G   42G  21% /
tmpfs                 107M     0  107M   0% /lib/init/rw
varrun                107M  136K  107M   1% /var/run
varlock               107M     0  107M   0% /var/lock
udev                  107M  144K  107M   1% /dev
tmpfs                 107M  1.5M  106M   2% /dev/shm
lrm                   107M  2.2M  105M   3% /lib/modules/2.6.28-15-generic/volatile

root@mon:/var/log# du -ckx | sort -nr
91296   total
91296   .
53736   ./atsar
13644   ./ConsoleKit
11240   ./mysql
1836    ./apache2
808     ./installer
228     ./apt
156     ./clamav
56      ./cacti
32      ./cups
24      ./gdm
20      ./mrtg
12      ./fsck
8       ./dbconfig-common
4       ./unattended-upgrades
4       ./sysstat
4       ./samba
4       ./news
4       ./dist-upgrade
4       ./apparmor
  • start diagnosis with df
  • identify the full disk, then use du to find what's causing it
  • sudo du -ckx | sort -nr > /tmp/duck-root
to solve
  • compress logs
  • clear package cache
  • dreaded vim full /tmp issue
  • get bigger disk
Out of Inodes
root@mon:/var/log# df -ih
Filesystem            Inodes   IUsed   IFree IUse% Mounted on
/dev/sda1               3.5M    132K    3.4M    4% /
tmpfs                    27K       3     27K    1% /lib/init/rw
varrun                   27K      77     27K    1% /var/run
varlock                  27K       5     27K    1% /var/lock
udev                     27K    1.5K     26K    6% /dev
tmpfs                    27K       3     27K    1% /dev/shm
lrm                      27K      17     27K    1% /lib/modules/2.6.28-15-generic/volatile
When the file system says it is full but df disagrees:
  • ext3 has a pre-set inode limit, fixed at mkfs time
  • use df -i to check
  • if you run out... delete some files
  • or back up and reformat...
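
The same inode figures that df -i prints are available from Python through os.statvfs, which is handy for a monitoring script. A minimal sketch, with a made-up function name, assuming a Unix-like system:

```python
import os


def inode_usage(path="/"):
    """Return (used, total, percent_used) inodes for the filesystem holding path."""
    stats = os.statvfs(path)
    total, free = stats.f_files, stats.f_ffree
    used = total - free
    # Some filesystems report zero total inodes; avoid dividing by zero.
    percent = 100.0 * used / total if total else 0.0
    return used, total, percent
```

A percentage creeping towards 100 on a filesystem that still has free blocks is the "full but df disagrees" situation described above.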
VMSTAT
vmstat helps you to see, among other things, if your server is swapping
root@ ( 1689 ~ )
# vmstat 1 2
procs -----------memory---------- ---swap-- -----io---- -system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 4  0  72056 131836  79648 1638552    0    0     1   120    0    0 11  3 85  2  0
 3  0  72056 130736  79652 1639576    0    0     4     0 2342 3655 36  2 61  0  0
si (swap in)
so (swap out)
  • The si/so numbers should be 0 (or close to it)
  • Numbers in the hundreds or thousands indicate your server is swapping
r (runnable), b (blocked) and w (waiting) columns help you see your server load
  • Waiting processes are swapped out. 
  • Blocked processes are typically waiting on I/O. 
  • The runnable column is the number of processes trying to do something.  These numbers combine to form the 'load' value on your server.  Typically you want the load value to be one or less per CPU in your server.
The bi (blocks in) and bo (blocks out)
  • columns show disk I/O (including swapping memory to/from disk) on your server
The us (user), sy (system) and id (idle)
  • columns show the amount of CPU your server is using. 
  • The higher the idle value, the better.
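
To watch for swapping automatically, the vmstat output can be turned into one dictionary per sample. This is a rough Python sketch with invented names; it assumes the column header line starts with "r" and "b" as in the output above, and the threshold of 100 is only a stand-in for "numbers in the hundreds".

```python
def parse_vmstat(vmstat_output):
    """Return one dict per sample row of `vmstat` output, keyed by column name."""
    lines = [line for line in vmstat_output.splitlines() if line.strip()]
    # The column names sit on the line whose first two fields are "r" and "b".
    header_index = next(
        i for i, line in enumerate(lines) if line.split()[:2] == ["r", "b"]
    )
    names = lines[header_index].split()
    return [
        dict(zip(names, (int(value) for value in line.split())))
        for line in lines[header_index + 1:]
    ]


def is_swapping(sample, threshold=100):
    """True when swap-in or swap-out reaches the (assumed) threshold."""
    return sample["si"] >= threshold or sample["so"] >= threshold
```

Keep in mind that the first row vmstat prints is an average since boot, so it is the later rows that tell you what the server is doing right now.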