Monday, 15 April 2013

zeroMQ with PHP and next challenge

So this time round I wanted to experiment with PHP so that I can implement the front end for my little messaging application.

Why PHP? Because it's something that I already know and zeroMQ supports it :)

On the same server I have had to install a few things to get it working.

1] phpize5 - check that you have it available (you could also use phpinfo())
# phpize5 --help
Usage: /usr/bin/phpize5 [--clean|--help|--version|-v]

2] if not then you probably don't have php5-dev

# apt-get install php5-dev

3] clone git repo for php-zmq
# git clone https://github.com/mkoppanen/php-zmq.git

# cd php-zmq

4] then follow the guide to finish it off ...

# phpize

# ./configure

# make && make install

# make test

5] add extension=zmq.so to php.ini

6] lastly

# apache2ctl graceful

So below is my PHP code. There's nothing special about it.

First I create my context, then subscribe to the socket on port 5556 (the port that the sender server publishes messages on), and finally loop 100 times, reading one message from the socket on each pass.

<html>
<body>
<?php
$context = new ZMQContext();
// Socket to talk to server
$subscriber = new ZMQSocket($context, ZMQ::SOCKET_SUB);
$subscriber->connect("tcp://localhost:5556");
// An empty filter subscribes to every message
$subscriber->setSockOpt(ZMQ::SOCKOPT_SUBSCRIBE, "");
$i = 0;
while ($i < 100) {
    // recv() blocks until the next message arrives
    $string = $subscriber->recv();
    echo $i."-".$string."<br>";
    $i++;
}
?>
</body>
</html>

<snip>
...
10- docho: dog:1363660548
11- docho: dog:1363660549
12- docho: dog:1363660550
13- docho: dog:1363660551
14- docho: dog:1363660552
15- docho: dog:1363660553
...
</snip>

Challenge

Right now I am looping an arbitrary number of times to read messages from the queue.

What I need to do is let the web front end know when there is a message to display.

I have a couple of ideas on how to do this ...

memcached - store every message in memory and let the client poll. But that doesn't solve the trigger problem.
add the messages to a database and let the client look for updated rows. Not so efficient.
HTML5 Web Workers? The Web Workers specification defines an API for spawning background scripts in your web application. Web Workers allow you to do things like fire up long-running scripts to handle computationally intensive tasks, but without blocking the UI or other scripts to handle user interactions. They're going to help put an end to that nasty 'unresponsive script' dialog that we've all come to love. So the worker would connect to the port and update the div that holds the messages. Thoughts? http://www.html5rocks.com/en/tutorials/workers/basics/

Saturday, 13 April 2013

Is Python the right language for Sys Administrators?

Yesterday at work the big boss told me to give up on Perl and to start doing everything in Python!

So this took me a few years back when I did all my sys stuff in bash before I got exposed to Perl. I found it easy to pick up and the online community helped quite a bit!

My natural reaction when I get asked "hey, I need you to do this and it's urgent" is to do it using Perl because:

a) I feel comfortable
b) it does the job
c) it's urgent.

It has pretty much all the modules that I need to get by - web programming, database programming, OO programming, and general stuff like threading, MIME handling ...

So I have used Perl in pretty much everything that I do - log parsing, cgi, reporting, system checks and now automation.

I spoke to a few guys at work and they have said good things about Python and its ability to use C libraries.

So what is your experience with Python? Is it the right tool for Sys Admin? Should I stop using Perl? :)

"Python is a programming language that lets you work more quickly and integrate your systems more effectively. You can learn to use Python and see almost immediate gains in productivity and lower maintenance costs."

Python lets you work more quickly ... how? I was told that parsing large logs takes quite a bit. They say Python is easy but Perl is a pain. It's partially true about Perl but maybe that's because they haven't read the doc very well ... this was the case for me :)

How does it integrate with your system *more* effectively?

http://silicainsilico.wordpress.com/2012/03/26/switching-from-perl-to-python-speed/ [Speed Test]
http://strombergers.com/python/ [Python is Cool (and Perl is not), Especially for C/C++ Programmers]
http://www.revolves.net/perl-vs-python-the-final-battle/

handy line to lookup country using IP address/hostname

Here is a handy lookup tool for all admins out there ... You can use both IP and FQDN (fully qualified domain name). It is useful when you have a few servers scattered all over the globe and you want to know where each one is located ...

geoiplookup(1) - Linux man page

Name

geoiplookup - look up country using IP Address or hostname

Synopsis

geoiplookup [-d directory] [-f filename] [-v] <ipaddress|hostname>

Description

geoiplookup uses the GeoIP library and database to find the Country that an IP address or hostname originates from.

For example

geoiplookup 80.60.233.195

will find the Country that 80.60.233.195 originates from, in the following format:

NL, Netherlands

Options

-f
Specify a custom path to a single GeoIP datafile.
-d
Specify a custom directory containing GeoIP datafile(s). By default geoiplookup looks in /usr/share/GeoIP
-v
Lists the date and build number for the GeoIP datafile(s).

Author

Written by T.J. Mather

Wednesday, 10 April 2013

ØMQ on Debian...

The ØMQ installation guide advises those who want to build on Unix-like systems to choose Ubuntu, as it's regarded as the most comfortable OS for developing.
However, since I have Debian in my VM, I decided to install ØMQ on Debian and had a little headache on the way :). It was worth it, and it's simple. Just follow the guide:

The following is what you need to do:

Make sure that libtool, autoconf, automake are installed.
Check whether uuid-dev package, uuid/e2fsprogs RPM or equivalent on your system is installed.
Unpack the .tar.gz source archive.
Run ./configure, followed by make.
To install ØMQ system-wide run sudo make install.
On Linux, run sudo ldconfig after installing ØMQ.

1. Make sure that libtool, autoconf, automake are installed.

    apt-get install libtool autoconf automake

2. Check whether the uuid-dev package, uuid/e2fsprogs RPM or equivalent is installed on your system. On Debian, use the package manager to check whether they are already installed.

    dpkg -s uuid-dev e2fsprogs

If they are not installed, use the following commands to install them:

    apt-get install uuid-dev
    apt-get install e2fsprogs

3. Download the current stable release (tar.gz source code) as provided on the ØMQ page and unpack it.

    # tar -zxvf zeromq-3.2.2.tar.gz

4. Run ./configure, followed by make

(go to the unpacked source archive and run ./configure)

# cd zeromq-3.2.2/
# ./configure

This script checks that the libraries and tools required by zmq are present on your computer, so you will see a list of checks.
If the checks end with an error like "configure: error: Unable to find a working C++ compiler", no Makefile is generated, so running 'make' will fail

# make

with this error:

make: *** No targets specified and no makefile found. Stop.

In that case you need to install the C++ compiler or whatever else is missing.

    # apt-get install g++

After installing g++, run './configure' and 'make' again, then finish with 'make install' and 'ldconfig' as listed above.

Now you are ready to write, compile and run your code ;-)

If you have written your code in C, simply do:

$ gcc -o myprogram myprogram.c -lzmq

$ ./myprogram

When you compile your code, you need to add -lzmq to tell the linker to link against zeromq.

eeDevelopers App for BlackBerry OS and BlackBerry 10 available NOW!

App gets feeds from Facebook pages and blog for you to easily keep up-to-date and share via social networking, BBM and Email!

eeDevelopers for BlackBerry 10 and OS is now available on App World!

http://appworld.blackberry.com/webstore/content/26721876/?countrycode=GB

Tuesday, 9 April 2013

ØMQ - 1 server and 3 client message processing

So after reading the documentation carefully I was able to figure out which constants to use for my little project.

For the server I need to use ZMQ_PULL. Here the socket collects messages from clients evenly using fair-queuing.

On the client side I am using ZMQ_PUSH. Each client will push a message to the server on the same port that the server listens on.

So here it is in action …

now what's left is to tidy up the code :)

Top left is the server and the rest are clients that are sending messages with time() to differentiate individual messages.

Server Code

Client Code

Monday, 8 April 2013

SUSE - Quick And Easy Local Filesystem Troubleshooting For SUSE Linux

To identify possible Filesystem problems

1. Identify OS
2. Figure out how many active/running local disks and/or volume groups
3. Identify hardware product and Check partition
4. compare results: mount errors that are supposed to be up but are not ; mounts that are not supposed to be there

Check the USED% column in the output of your "df -l" command
Check the inodes column and ensure that those aren't all being used up either.
If you're running ReiserFS, use reiserfsck instead of plain fsck

Commands

uname -a
hwinfo --help
cat /proc/partitions
df -l
grep -v ":" /etc/fstab

To identify possible memory bottlenecks

I have gathered the following commands:

top
- virtual memory
ps aux | grep serviceProcess
- %MEM, VSZ and RSS
vmstat 2
- “inact” and “active”
ps -o vsz,rss,tsiz,dsiz,majflt,minflt,pmem,cmd PID
- all the memory information for one process
cat /proc/PID/status
- detailed information
swapon -s
- system swap partitions
free
- used and free memory in terms of straight-up memory, buffers and cache
cat /proc/meminfo
- detailed information on system memory and how it's being used
sar -r
- memory usage defined in terms of memory, buffers and cache

I suggest reading up on a small but extremely useful command: “w”

“w” shows the load of a system, and this is the best indicator of system resource usage.

If top shows high memory usage, so what? If we configure Oracle to use a lot of global memory, then it should be high.

If top shows some process running at 100% CPU, that's nothing: a simple infinite loop can drive the CPU to 100% while the system still functions normally.

But if “w” shows high 1-, 5- and 15-minute load averages, that is an indication that things are not good…

1. Figure out where you are and what OS you're on:
host # uname -a
2.4.x will be for SUSE 8.x and 2.6.x will be for SUSE 9.x.
2. Figure out how many local disks and/or volume groups you have active and running on your system:
# hwinfo --help
Usage: hwinfo options
Probe for hardware.
--short        just a short listing
--log logfile write info to logfile
--debug level set debuglevel
--version      show libhd version
--dump-db n    dump hardware data base, 0: external, 1: internal
--hw_item      probe for hw_item
hw_item is one of:
    all, bios, block, bluetooth, braille, bridge, camera, cdrom, chipcard, cpu,
    disk, dsl, dvb, floppy, framebuffer, gfxcard, hub, ide, isapnp, isdn,
    joystick, keyboard, memory, modem, monitor, mouse, netcard, network,
    partition, pci, pcmcia, pcmcia-ctrl, pppoe, printer, scanner, scsi, smp,
    sound, storage-ctrl, sys, tape, tv, usb, usb-ctrl, vbe, wlan, zip
Find out what hardware product you are dealing with:
# hwinfo | grep system.product
smbios.system.product = 'IBM eServer BladeCenter HS21
~~8853G1G~~'
system.product = 'IBM eServer BladeCenter HS21
~~8853G1G~~'
Check out the partitions:
# cat /proc/partitions
major minor #blocks name
   8     0   71288832 sda
   8     1      56196 sda1
   8     2    3100545 sda2
   8     3    4152802 sda3
   8     4          1 sda4
   8     5    3100513 sda5
   8     6   36700461 sda6
   8     7   24121566 sda7
3. Check out your local filesystems and fix anything you find that's broken:

host # df -l
host # grep -v ":" /etc/fstab

compare results:
       mount errors that are supposed to be up but are not
       mounts that are not supposed to be there
@ - 16:36:42 UTC
( 511 /etc )

df -l

Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sda2              3051824    615032   2281768 22% /
udev                   4089928       124   4089804   1% /dev
/dev/sda1                54416     19780     31827 39% /boot
/dev/sda5              3051792    906840   1989928 32% /usr
/dev/sda6             36123168 11928580 22359568 35% /var
/dev/sda7             23742552   5743236 16793240 26% /var/log

@ - 16:37:45 UTC
( 512 /etc )

grep -v ":" /etc/fstab

/dev/sda2            /                    ext3 acl,user_xattr 1 1
/dev/sda1            /boot                ext3 acl,user_xattr 1 2
/dev/sda5            /usr                 ext3 acl,user_xattr 1 2
/dev/sda6            /var                 ext3 acl,user_xattr 1 2
/dev/sda7            /var/log             ext2 acl,user_xattr 1 2
/dev/sda3            swap                 swap defaults        0 0
proc                 /proc                proc defaults        0 0
sysfs                /sys                 sysfs noauto 0 0
debugfs              /sys/kernel/debug    debugfs       noauto 0 0
usbfs                /proc/bus/usb        usbfs noauto 0 0
devpts               /dev/pts             devpts        mode=0620,gid=5 0 0

TIPS

       Check the USED% column in the output of your "df -l" command
       Check the inodes column and ensure that those aren't all being used up either.
       If you're running ReiserFS, use reiserfsck instead of plain fsck
host # umount /uselessFileSystem
host # fsck -y /uselessFileSystem
....
host # mount /

If you need to fsck any special filesystem, like root "/", you should ideally do it when booted off a CD-ROM or, at the very least, in single-user mode.

POP - useful commands

POP3
Version 3 of the Post Office Protocol (POP3) is comparatively simple, and only allows the user to download emails from the server to the client. The user can log in to an account, view the contents of the mailbox, transfer and delete emails, and log out, all via server port 110. This requires few resources, and there is little to configure, which means few sources of error.
The POP3 protocol is simple enough to use directly, in an interactive session:
user@linux:$ telnet mail.example.com 110
Trying 192.168.50.50...
Connected to mail.example.com.
Escape character is '^]'.
+OK Hello there.
USER tux
+OK Password required.
PASS secret
+OK logged in.
The LIST command summarizes all the messages in the mailbox (nine in the following example) and their lengths:
LIST
+OK POP3 clients that break here, they violate STD53.
1 9586
2 1125022
3 53125
4 2451
5 5931
6 4943
7 4206
8 5231
9 9481
.
The message from Courier in the +OK answer refers to POP3 clients that erroneously expect the server to return the number of messages in answer to the LIST command:
LIST
+OK 2 messages (320 octets)
1 120
2 200
.
RETR is used to retrieve a message from the server:
RETR 2
Return-Path: <p.heinlein@heinlein-support.de>
X-Original-To: p.heinlein@heinlein-support.de
Delivered-To: tux@example.com
Received: from 10.0.42.2 (unknown 10.0.42.2)
(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
(Client did not present a certificate)
by plasma.heinlein-support.de (Postfix) with ESMTP id BEA0581A4B
for <tux@example.com>; Sat, 7 Apr 2007 01:02:01 +0200 (CEST)
From: Peer Heinlein <p.heinlein@heinlein-support.de>
To: Tux <tux@example.com>
Subject: Test message 2
Date: Sat, 7 Apr 2007 01:02:01 +0200
User-Agent: KMail/1.9.5
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline
Message-Id: <200704070102.01895.p.heinlein@heinlein-support.de>
X-Length: 1519
Status: R
X-Status: NC
X-UID: 0
Hello!
I am a test message.
=2D-=20
Heinlein Professional Linux Support GmbH
Linux: Academy - Support - Hosting
http://www.heinlein-support.de

Legally required information according to =A735a HGB (German Commercial
Code)
HRB 93818 B / Amtsgericht Berlin-Charlottenburg,=20
Manager: Peer Heinlein =A0-- Seat: Berlin
Flagging message 2 for deletion after it has been read is just as simple:
DELE 2
+OK Deleted.
However, it will not actually be deleted until the user logs out. This allows us to undo the setting of the deletion flag:
RSET
+OK Resurrected.
If we do not wish to transfer an entire message to the client, we can use the TOP command to retrieve only the message headers and a specified number of lines of the mail body, given in a second argument to the command (seven in this case):
TOP 2 7
Return-Path: <p.heinlein@heinlein-support.de>
X-Original-To: p.heinlein@heinlein-support.de
Delivered-To: tux@example.com
Received: from 10.0.42.2 (unknown 10.0.42.2)
(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
(Client did not present a certificate)
by plasma.heinlein-support.de (Postfix) with ESMTP id BEA0581A4B
for <tux@example.com>; Sat, 7 Apr 2007 01:02:01 +0200 (CEST)
From: Peer Heinlein <p.heinlein@heinlein-support.de>
To: Tux <tux@example.com>
There is also an "idle" command that enables the client to keep the connection open:
NOOP
+OK Yup.
The QUIT command is used to terminate the connection:
QUIT
+OK Bye-bye.
Connection closed by foreign host.
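
The same session can also be scripted. Below is a minimal sketch using Python's standard poplib module; the host, user and password would be whatever your own server expects, and summarize_listing and mailbox_summary are hypothetical helper names of mine, not part of poplib. It only logs in, runs LIST and logs out.

```python
import poplib


def summarize_listing(lines):
    """Turn LIST lines such as b"1 120" into (message_count, total_octets)."""
    sizes = [int(line.split()[1]) for line in lines]
    return len(sizes), sum(sizes)


def mailbox_summary(host, user, password, port=110):
    """Log in over plain POP3, run LIST, and log out again."""
    conn = poplib.POP3(host, port)
    try:
        conn.user(user)
        conn.pass_(password)
        # list() returns (response, [b"msgnum octets", ...], octets)
        _response, lines, _octets = conn.list()
        return summarize_listing(lines)
    finally:
        conn.quit()


if __name__ == "__main__":
    # Offline check using the two-message LIST reply shown above.
    print(summarize_listing([b"1 120", b"2 200"]))
```

For the two-message reply in the transcript this gives 2 messages and 320 octets, which matches the "+OK 2 messages (320 octets)" line.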

IMAP - Useful Commands

Login
- a1 LOGIN "tux" "hidden"
List all available directories
- a2 LIST "" "*"
List specific directories
- a3 LIST "" "INBOX.Priv*"
Select a specific FOLDER
- a4 SELECT INBOX.Test
View ALL
- a5 FETCH 1:3 ALL
View BODY
- a6 FETCH 2 BODY[]
Download individual header lines
- a7 FETCH 2 BODY[HEADER.FIELDS (Message-ID)]
Copy a message either to another FOLDER or back in the same FOLDER
- a8 COPY 2:3 INBOX.Test
Searching for Email Contents
- a11 SEARCH UNSEEN
Find messages marked for DELETION
- a12 SEARCH 1:4 DELETED
Search message contents
- a13 SEARCH ALL TEXT Heinlein
Account migration
- imapsync --host1 oldmail.example.com --user1 tux \
  --password1 "secret" --host2 newmail.example.com --user2 t.tux \
  --password2 "secret"
Account migration with a password file
- imapsync --host1 oldmail.example.com --user1 tux \
  --passfile1 /root/pw1 --host2 newmail.example.com --user2 t.tux \
  --passfile2 /root/pw2

Linux - netstat

Netstat returns a variety of information on active connections:

current status
what hosts are involved
which programs are involved

You can also see information about the routing table and even get statistics on your network interfaces.

netstat -l
- To get an overview of everything running on your system, use this basic invocation
netstat -l -p --tcp --udp
- display all listening TCP and UDP sockets and program doing the listening
netstat -a -p --tcp --udp
- list all active TCP/UDP connections
netstat -t -n | cut -c 68- | sort | uniq -c | sort -n
- This will show you a sorted list of how many sockets are in each connection state.
netstat -tlpn
- which TCP daemons are running and accepting connections
netstat -ulpn
- the same for UDP services
netstat -s
- summary of the network stack state counters, going into way more detail than the RX/TX frames dropped counter of ifconfig.
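
The "how many sockets are in each state" pipeline above can also be done without relying on a fixed column offset. This is a small sketch that counts the last field of each tcp row; the SAMPLE text is made-up illustrative output (invented addresses), and count_states is a hypothetical helper name.

```python
from collections import Counter

# Illustrative `netstat -t -n` output; the addresses are invented.
SAMPLE = """\
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 10.0.0.5:22             10.0.0.9:51234          ESTABLISHED
tcp        0      0 10.0.0.5:80             10.0.0.7:40112          TIME_WAIT
tcp        0      0 10.0.0.5:80             10.0.0.8:40113          TIME_WAIT
"""


def count_states(netstat_output):
    """Count TCP sockets per connection state in `netstat -t -n` output."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows start with the protocol name; the state is the last field.
        if fields and fields[0].startswith("tcp"):
            states[fields[-1]] += 1
    return states


if __name__ == "__main__":
    # Smallest count first, like the trailing `sort -n` in the pipeline.
    for state, count in sorted(count_states(SAMPLE).items(), key=lambda kv: kv[1]):
        print(count, state)
```

To use it on a live system you would feed it the real command output instead of SAMPLE.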

Parameter Description

-a
- Displays all connections and listening ports
-e
- Displays Ethernet statistics
-n
- Displays addresses and port numbers in numerical form instead of using friendly names
-s
- Displays statistics categorized by protocol
-p protocol
- Shows connections for the specified protocol, either TCP or UDP (this is the Windows form; on Linux, as in the commands above, -p shows the owning program)
-r
- Displays the contents of the routing table
interval
- Displays selected statistics, pausing interval seconds between each display; press [Ctrl]C to stop displaying statistics

Common states

LISTEN
- The socket is listening for incoming connections. Those sockets are only displayed if the -a or -l switch is set.
ESTABLISHED
- The socket has an established connection.
SYN_SENT
- The socket is actively attempting to establish a connection.
SYN_RECV
- A connection request has been received from the network.
TIME_WAIT
- The socket is waiting after close to handle packets still in the network.
FIN_WAIT1
- The socket is closed, and the connection is shutting down.
FIN_WAIT2
- The connection is closed and the socket is waiting for a shutdown from the remote end.
CLOSE_WAIT
- The remote end has shut down, and it is waiting for the socket to close.
CLOSED
- The socket is not being used.

MYSQL Replication

[1] ENABLE BINARY LOGGING AND ESTABLISH UNIQUE SERVER IDS
why - the binary log is the basis for sending data changes from the master to its slave
[A] on the master server
            1. shutdown mysql
            2. edit my.cnf or my.ini
                        within the [mysqld] section add
                                    log-bin=mysql-bin
                                    server-id=1
            3. start server
for durability and consistency using InnoDB
            innodb_flush_log_at_trx_commit=1
            sync_binlog=1
in the master my.cnf file
also, ensure skip-networking option is not enabled
[2] ESTABLISH A UNIQUE SERVER ID ON THE SLAVE SERVER
no binary logging needed unless the slave acts as a master to another slave (complex setup)
[A] on the slave server
            1. shutdown mysql
            2. edit files
                        within the [mysqld] section add
                                    server-id=2
[3] CREATE USER FOR REPLICATION
why - so that the slave can connect to the master (note, user credentials will be stored in plain text in master.info)
CREATE USER 'USERNAME'@'%.DOMAIN' IDENTIFIED BY 'PASSWORD';
GRANT REPLICATION SLAVE ON *.* TO 'USERNAME'@'%.DOMAIN';
[4] OBTAIN REPLICATION MASTER BINARY LOG COORDINATES
            1. stop processing statements on the master
            2. obtain current binary log coordinates
            3. dump
            4. permit master to continue
            a. FLUSH TABLES WITH READ LOCK;
            b. ( in a different session )
                        SHOW MASTER STATUS;
- note, file name, position for replication coordinates.
- in our case since the master has been running without binary logging, use ('') empty string and 4

[5] CREATE A SNAPSHOT USING MYSQLDUMP
            1. in a shell > mysqldump --all-databases --lock-all-tables > dbdump.db
            2. UNLOCK TABLES ;                     // release acquired lock on the master
[6] SETUP REPLICATION WITH EXISTING DATA
            1. mysql start with --skip-slave-start
            2. in a shell > mysql < dbdump.db
            3. configure the slave with the replication coordinates from the master and setup the master configuration on the slave
            CHANGE MASTER TO
            MASTER_HOST='DOMAIN',
            MASTER_USER='USERNAME',
            MASTER_PASSWORD='PASSWORD',
            MASTER_LOG_FILE='',
            MASTER_LOG_POS=4;
            4. start the slave threads: START SLAVE;
- the slave should now be able to connect to the master and catch up on any updates that have occurred since the snapshot was taken
Failed:
            a. check server-id on both master and slave and ensure that they are unique
            b. check logging on slave
            c. ensure that the domain is correctly set on the master to grant replication access for the slave
            d. ensure username and password are correct
the slave keeps its replication state in master.info and relay-log.info
if you have made any corrections then you will need to
            STOP SLAVE;

            RESET SLAVE;

Linux - other ways to copy files to a remote host

Useful tips –

To copy a single file over without using scp

cat file | ssh root@host 'cat > file'
ssh root@host 'cat > file' < file

using a public/private key to run scripts on a remote host without the need to enter passwords

cat to_be_remote_executed | ssh -i private-key-file root@host | cat > result

to transfer multiple files across in one go

tar -cvf - . | ssh root@host 'cd whereever; tar -xvf - '

Linux - Troubleshooting local sluggish or completely unresponsive system

Often a host that is sluggish or completely unresponsive can be caused by network issues, but below are some local troubleshooting tools you can use to tell the difference between a loaded network and a loaded machine.

When a machine is sluggish, it is often because you have consumed all of a particular resource on the system.

The main resources are CPU, RAM, disk I/O, and network. Overuse of any of these resources can cause a system to bog down to the point that often the only recourse is your last resort: a reboot. If you can log in to the system, however, there are a number of tools you can use to identify the cause.

System Load

10:55:37 up 6 days, 18:32, 3 users, load average: 0.30, 0.17, 0.16

The three numbers after the load average, 0.30, 0.17, and 0.16, represent the 1-, 5-, and 15-minute load averages on the machine, respectively.
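
If you want to read those three numbers from a script, here is a rough sketch that pulls them out of an uptime-style line. parse_load and load_per_cpu are hypothetical helper names, and the CPU count is something the caller has to supply; dividing by it is just the usual "load per CPU" rule of thumb.

```python
import re


def parse_load(uptime_line):
    """Extract the 1-, 5- and 15-minute load averages from an uptime line."""
    match = re.search(
        r"load average:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)", uptime_line
    )
    if match is None:
        raise ValueError("no load average found")
    return tuple(float(value) for value in match.groups())


def load_per_cpu(loads, cpu_count):
    """Scale each load average by the number of CPUs."""
    return tuple(load / cpu_count for load in loads)


if __name__ == "__main__":
    line = "10:55:37 up 6 days, 18:32, 3 users, load average: 0.30, 0.17, 0.16"
    print(parse_load(line))
```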

If the load is CPU-bound

us: user CPU time
sy: system CPU time
ni: nice CPU time
id: CPU idle time (high is good)
wa: I/O wait (important)

Tasks: 145 total, 1 running, 144 sleeping, 0 stopped, 0 zombie

Cpu(s): 1.0%us, 0.3%sy, 0.0%ni, 98.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st

Mem: 218548k total, 155732k used, 62816k free, 7500k buffers

Swap: 634528k total, 268480k used, 366048k free, 63832k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND

20112 root 20 0 2576 1212 912 R 1.0 0.6 0:00.07 top

3091 root 20 0 67900 8108 1428 S 0.3 3.7 6:54.52 Xorg

1 root 20 0 3084 124 72 S 0.0 0.1 0:03.83 init

2 root 15 -5 0 0 0 S 0.0 0.0 0:00.01 kthreadd

SWAP death

             total       used       free     shared    buffers     cached
Mem:        218548     169584      48964          0       8792      76860
-/+ buffers/cache:      83932     134616
Swap:       634528     266012     368516

check mem and swap lines

always check cached first, then swap used

Real RAM used ~= used - cached + swap used

if out of RAM, hit M to sort top process by RAM use

The key used figure to look at is the buffers/cache row used value (83932).

This is how much space your applications are currently using. For best performance, this number should be less than your total memory (218548). To prevent out of memory errors, it needs to be less than the sum of total memory (218548) and swap space (634528).

If you wish to quickly see how much memory is free look at the buffers/cache row free value (134616). This is the total memory (218548) - the actual used (83932). (218548 - 83932 = 134616)
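
The buffers/cache row is just arithmetic on the Mem: row, so it is easy to check by hand. A tiny sketch, using the numbers from the free output above (buffers_cache_row is a hypothetical function name):

```python
def buffers_cache_row(total, used, free, buffers, cached):
    """Recompute the '-/+ buffers/cache' row from the 'Mem:' row (values in KB)."""
    app_used = used - buffers - cached  # memory the applications really hold
    app_free = free + buffers + cached  # memory the kernel can hand back
    return app_used, app_free


if __name__ == "__main__":
    # Mem: row from the free output above.
    print(buffers_cache_row(218548, 169584, 48964, buffers=8792, cached=76860))
```

With these inputs it returns 83932 used and 134616 free, the same values as the buffers/cache row.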

Troubleshooting High I/O wait

root@mon:/var/log# iostat

Linux 2.6.28-15-generic (mon) 22/11/09 _i686_ (1 CPU)

avg-cpu: %user %nice %system %iowait %steal %idle

4.46 0.17 3.45 0.74 0.00 91.20

Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn

sda 2.94 45.90 42.22 26889903 24735208

sda1 2.47 36.38 33.06 21312181 19365096

sda2 0.00 0.00 0.00 34 0

sda5 0.48 9.52 9.17 5577168 5370112

check for swapping first

use iostat to get disk I/O diagnostics
tps = transactions per second

Blk_read/s = block read per second
Blk_wrtn/s = block written per second
Blk_read = total blocks read
Blk_wrtn = total blocks written

Out of disk space issues

root@mon:/boot/grub# df -h

Filesystem Size Used Avail Use% Mounted on

/dev/sda1 55G 11G 42G 21% /

tmpfs 107M 0 107M 0% /lib/init/rw

varrun 107M 136K 107M 1% /var/run

varlock 107M 0 107M 0% /var/lock

udev 107M 144K 107M 1% /dev

tmpfs 107M 1.5M 106M 2% /dev/shm

lrm 107M 2.2M 105M 3% /lib/modules/2.6.28-15-generic/volatile

root@mon:/var/log# du -ckx | sort -nr

91296 total

91296 .

53736 ./atsar

13644 ./ConsoleKit

11240 ./mysql

1836 ./apache2

808 ./installer

228 ./apt

156 ./clamav

56 ./cacti

32 ./cups

24 ./gdm

20 ./mrtg

12 ./fsck

8 ./dbconfig-common

4 ./unattended-upgrades

4 ./sysstat

4 ./samba

4 ./news

4 ./dist-upgrade

4 ./apparmor

start diagnosis with df
identify the full disk, then use du to find what's causing it
sudo du -ckx | sort -nr > /tmp/duck-root

to solve

compress logs
clear package cache
dreaded vim full /tmp issue
get bigger disk
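
If you prefer a script to the du pipeline, here is a rough equivalent. It is only a sketch: directory_sizes is a hypothetical name, it adds up apparent file sizes rather than disk blocks, and unlike du -x it does not stop at filesystem boundaries, so the numbers will differ a little from du.

```python
import os


def directory_sizes(root):
    """Return [(bytes, path)] for each directory directly under root, largest first."""
    results = []
    for entry in os.scandir(root):
        if not entry.is_dir(follow_symlinks=False):
            continue
        total = 0
        for dirpath, _dirnames, filenames in os.walk(entry.path):
            for name in filenames:
                try:
                    total += os.lstat(os.path.join(dirpath, name)).st_size
                except OSError:
                    pass  # file vanished or is unreadable; skip it
        results.append((total, entry.path))
    return sorted(results, reverse=True)


if __name__ == "__main__":
    # Sizes in KB for the subdirectories of the current directory.
    for size, path in directory_sizes("."):
        print(size // 1024, path)
```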

Out of Inodes

root@mon:/var/log# df -ih

Filesystem Inodes IUsed IFree IUse% Mounted on

/dev/sda1 3.5M 132K 3.4M 4% /

tmpfs 27K 3 27K 1% /lib/init/rw

varrun 27K 77 27K 1% /var/run

varlock 27K 5 27K 1% /var/lock

udev 27K 1.5K 26K 6% /dev

tmpfs 27K 3 27K 1% /dev/shm

lrm 27K 17 27K 1% /lib/modules/2.6.28-15-generic/volatile

* symptom: writes fail as if the file system is full, but df -h still shows free space

ext3 has pre-set inode limit set at mkfs
use df -i to check
if you run out...delete some files
or backup and reformat...
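
The same inode check can be done from a script with os.statvfs (Unix only). A minimal sketch; inode_usage is a hypothetical helper name, and the numbers may not match df -i exactly on every filesystem.

```python
import os


def inode_usage(path="/"):
    """Return (total, used, percent_used) inodes for the filesystem holding path."""
    stats = os.statvfs(path)
    total = stats.f_files
    used = total - stats.f_ffree
    # Some filesystems report 0 total inodes; avoid dividing by zero.
    percent = 100.0 * used / total if total else 0.0
    return total, used, percent


if __name__ == "__main__":
    total, used, percent = inode_usage("/")
    print(f"{used}/{total} inodes used ({percent:.0f}%)")
```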

VMSTAT

vmstat helps you to see, among other things, if your server is swapping

root@ ( 1689 ~ )

# vmstat 1 2

procs -----------memory---------- ---swap-- -----io---- -system-- -----cpu------

r b swpd free buff cache si so bi bo in cs us sy id wa st

4 0 72056 131836 79648 1638552 0 0 1 120 0 0 11 3 85 2 0

3 0 72056 130736 79652 1639576 0 0 4 0 2342 3655 36 2 61 0 0

si (swap in)
so (swap out)

The si/so numbers should be 0 (or close to it)
Numbers in the hundreds or thousands indicate your server is swapping

r (runnable) b (blocked) and w (waiting) columns help see your server load

Waiting processes are swapped out.
Blocked processes are typically waiting on I/O.
The runnable column is the number of processes trying to do something. These numbers combine to form the 'load' value on your server. Typically you want the load value to be one or less per CPU in your server.

The bi (blocks in) and bo (blocks out)

columns show disk I/O (including swapping memory to/from disk) on your server

The us (user), sy (system) and id (idle)

show the amount of CPU your server is using.
The higher the idle value, the better.
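
To watch for swapping automatically you can pair each vmstat value with its column name. This is a sketch based on the column header shown above; parse_vmstat_row and is_swapping are hypothetical names, and the threshold of 100 is my own assumption taken from the "hundreds or thousands" rule of thumb.

```python
HEADER = "r b swpd free buff cache si so bi bo in cs us sy id wa st"


def parse_vmstat_row(row, header=HEADER):
    """Pair each vmstat column name with its integer value."""
    return dict(zip(header.split(), (int(value) for value in row.split())))


def is_swapping(sample, threshold=100):
    """True when swap-in or swap-out reaches the (assumed) threshold."""
    return sample["si"] >= threshold or sample["so"] >= threshold


if __name__ == "__main__":
    # Second data row from the vmstat output above.
    row = "3 0 72056 130736 79652 1639576 0 0 4 0 2342 3655 36 2 61 0 0"
    sample = parse_vmstat_row(row)
    print(sample["si"], sample["so"], is_swapping(sample))
```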
