Albertech.net

Tag Archives: Linux

Amanda Backups: Exclude.gtar

April 30, 2009 5:39 pm / Albertech.net

The Amanda backup system is a great resource for backing up your Linux system. One thing I noticed with the latest version is that my exclusion list stopped working. For instance, Amanda was now backing up the /tmp folder, which caused it to complain about the PHP session lock files. All that was needed to fix it was adding a leading dot in front of each folder.

Here is a copy of my working amanda.gtar file:

./tmp
./dev
./sys
./config
./proc
./mnt
./cdrom
./lost+found
./opt
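
To see why the leading dot matters, here is a minimal sketch using tar directly rather than Amanda itself. It assumes the backup archives "." relative to the root directory, so every member name begins with "./" and the exclude patterns have to be written the same way. The directory and file names are made up for the demonstration.

```shell
# Build a tiny stand-in for a filesystem root (hypothetical paths).
root=$(mktemp -d)
mkdir -p "$root/tmp" "$root/etc"
echo "session lock" > "$root/tmp/sess_abc"
echo "config" > "$root/etc/app.conf"

# Exclude list written in the same style as the file above.
exclude=$(mktemp)
printf '%s\n' './tmp' > "$exclude"

# Archive "." relative to the root, so member names start with "./",
# then list what ended up in the archive.
listing=$(tar -cf - -C "$root" -X "$exclude" . | tar -tf -)
printf '%s\n' "$listing"

rm -rf "$root" "$exclude"
```

The listing should contain ./etc/app.conf but nothing under ./tmp.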

Posted in: Linux / Tagged: exclude.gtar, Linux

Analyzing large web log files

February 20, 2009 12:11 am / Albertech.net

The fastest way to trim down large web log files is through the UNIX/Linux shell. Large files exceeding 1 GB (millions of lines of logs) are not easily editable in a GUI, so the fastest way to parse them is via the command line. You can trim them down to a time range, remove internal requests from within the company, and remove bot/crawler data from the log files.

For instance, I have a 4GB log file with about two years' worth of data (2007-2009) in it. What if I just wanted the logs from 2008? First, run the “head -10” command on the log file to see what the general format of the log is.

Preferably, it is in the YYYY-MM-DD hour:min:second format. If you are building a custom log file from scratch, make sure you have the dash delimiter between the date parts to make it easier to work with later. The log file I was given used the YYYYMMDDHHMMSS timestamp format. Luckily, with regular expressions, you can still parse the file accordingly.

20070101000013 4ms 0ms 8ms
20070101000019 4ms 0ms 8ms

Make sure you have enough disk space before proceeding, at least enough to handle twice the size of the logfile. Run a “df” to check available space and “ls -lah” to see the size of the files in your log directory.
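
The space check can also be scripted. This is a rough sketch, not a definitive tool: has_room is a hypothetical helper name, and it assumes the output file will be written to the same filesystem as the log.

```shell
# Succeed if the filesystem holding the log has at least twice the
# log's size available. has_room is a hypothetical helper name.
has_room() {
    logfile=$1
    # File size in KB, rounded up.
    size_kb=$(( ($(wc -c < "$logfile") + 1023) / 1024 ))
    # Available KB on that filesystem: 4th column of the second line
    # of POSIX-format df output.
    avail_kb=$(df -Pk "$(dirname "$logfile")" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge $(( size_kb * 2 )) ]
}

# Demonstration on a small sample file.
sample=$(mktemp)
printf '%s\n' '20070101000013 4ms 0ms 8ms' > "$sample"
if has_room "$sample"; then
    echo "enough space"
else
    echo "not enough space"
fi
rm -f "$sample"
```
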
So, if you only want the logs from 2008 (using the example above) you can run the following command.

cat [INPUTLOGFILE] | grep '^2008' > [OUTPUTLOGFILE]

This reads each line and keeps only those that match the regular expression, i.e. those with 2008 at the beginning of the line. If the log lines do not start with the date, you can still match the date anywhere in the line (if it is in the YYYY-MM-DD format):

cat [INPUTLOGFILE] | grep '2008-' > [OUTPUTLOGFILE]
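
For a range narrower than a whole year, a comparison on the timestamp field can be easier than a regular expression. The sketch below assumes the YYYYMMDDHHMMSS timestamp is the first field, as in the sample lines earlier; the dates and log lines are made up. Because the timestamps all have the same width, comparing them as strings gives the same order as comparing them as dates.

```shell
# Small sample log in the YYYYMMDDHHMMSS format (made-up data).
log=$(mktemp)
cat > "$log" <<'EOF'
20071231235959 4ms 0ms 8ms
20080301000013 4ms 0ms 8ms
20080615120000 5ms 1ms 9ms
20080701000000 4ms 0ms 8ms
EOF

# Keep lines whose timestamp falls in March through June 2008.
range=$(awk '$1 >= "20080301000000" && $1 < "20080701000000"' "$log")
printf '%s\n' "$range"

rm -f "$log"
```

Only the two lines from March and June 2008 are kept.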

What if you want to omit Googlebot and other crawlers directly from the log file? Most log analysis programs have a filter that looks at the user agent string to check whether the request came from a crawler. But not all crawlers identify themselves there; some set the user agent string to that of a common browser. If you have DNS lookups turned on, you can try filtering out log lines that match the domain googlebot.com. The “grep -v” command is handy since it excludes all lines that match the given pattern. The dots are escaped so that they match only a literal period.

cat [INPUTLOGFILE] | grep -v 'googlebot\.com' > [OUTPUTLOGFILE]
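
Several crawlers can be dropped in one pass by combining patterns. This is a sketch under assumptions: the log layout and the pattern list (bot/, crawler, spider) are illustrative only, and you would adjust them to whatever actually shows up in your logs.

```shell
# Sample log with a hostname and a user agent on each line (made-up data).
log=$(mktemp)
cat > "$log" <<'EOF'
crawl-66-249-66-1.googlebot.com "GET /" "Mozilla/5.0 (compatible; Googlebot/2.1)"
host1.example.net "GET /about" "Mozilla/5.0 (Windows NT 6.0)"
host2.example.net "GET /feed" "ExampleBot/1.0 crawler"
EOF

# -v inverts the match, -i ignores case, -E allows alternation with "|".
# The dot is escaped so it only matches a literal ".".
humans=$(grep -viE 'googlebot\.com|bot/|crawler|spider' "$log")
printf '%s\n' "$humans"

rm -f "$log"
```

Here only the host1.example.net line survives the filter.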

Another use is to filter out all local requests coming from your department/building/company. This would be good to see where your external users are coming from. If you have DNS turned on:

cat [INPUTLOGFILE] | grep -v 'mycompany\.com' > [OUTPUTLOGFILE]

If you don’t have DNS turned on, you can filter by IP range

cat [INPUTLOGFILE] | grep -v '123\.456\.789\.' > [OUTPUTLOGFILE]
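
When filtering by IP prefix it helps to anchor the pattern and escape the dots, otherwise unrelated lines can be dropped by accident. The sketch below assumes the client address is the first field on each line; the addresses and the 10.1. prefix are examples only.

```shell
# Sample log with the client IP as the first field (made-up data).
log=$(mktemp)
cat > "$log" <<'EOF'
10.1.2.3 - - "GET /intranet"
10.12.0.7 - - "GET /index.html"
203.0.113.9 - - "GET /index.html"
198.51.100.10 - - "GET /from/10.1.2.3"
EOF

# "^" ties the match to the start of the line and "\." matches a literal dot,
# so only clients in 10.1.x.x are removed. An unanchored, unescaped '10.1.'
# would also drop the 10.12.0.7 line and the line that merely mentions 10.1.2.3.
external=$(grep -v '^10\.1\.' "$log")
printf '%s\n' "$external"

rm -f "$log"
```
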

Check the number of lines in the logfile (before and after comparison) with:
wc -l [LOGFILE]
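
A before and after comparison might look like the following sketch. The file names are temporary files created for the demonstration, and the filter is the 2008 example from earlier.

```shell
# Sample input and a filtered copy (made-up data).
src=$(mktemp)
dst=$(mktemp)
printf '%s\n' '20071231235959 a' '20080101000000 b' '20080102000000 c' > "$src"
grep '^2008' "$src" > "$dst"

# Reading from stdin keeps the file name out of the wc output; the
# arithmetic expansion strips any padding some wc versions add.
before=$(( $(wc -l < "$src") ))
after=$(( $(wc -l < "$dst") ))
echo "kept $after of $before lines, removed $(( before - after ))"

rm -f "$src" "$dst"
```
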

For more information, check out this resource:
http://www.robelle.com/smugbook/regexpr.html

What if you don’t have access to UNIX/Linux? There’s Cygwin for Windows, although I don’t recommend running it with huge log files. I’ll eventually try it at some point to see if it’ll work 😉
http://www.cygwin.com/

Posted in: Apache, Linux / Tagged: Linux, log files, regular expressions, unix

Text in SSH window stuck at fixed width

November 18, 2008 4:16 pm / Albertech.net

If you ssh into your server and the text is stuck at 80 characters per line, check the sshd settings in /etc/ssh/sshd_config (Debian).

Make sure X11Forwarding is set to yes. This will allow you to expand your terminal window beyond 80 characters per line. Otherwise, it will be limited to the display settings of the console.

X11Forwarding yes
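
To check what the file currently says without opening an editor, something like this sketch can help. It runs against a temporary sample file; on a real server you would point conf at /etc/ssh/sshd_config instead. It only reads the file and relies on sshd using the first value it finds for a keyword, so it does not account for Match blocks or included files.

```shell
# Sample config standing in for /etc/ssh/sshd_config (made-up contents).
conf=$(mktemp)
printf '%s\n' '#X11Forwarding no' 'X11Forwarding yes' > "$conf"

# Print the value of the first uncommented X11Forwarding line.
# Keywords are case-insensitive, hence tolower().
setting=$(awk 'tolower($1) == "x11forwarding" { print $2; exit }' "$conf")
echo "X11Forwarding is set to: $setting"

rm -f "$conf"
```

After changing the real file, sshd needs to be restarted or reloaded before the new value takes effect.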

Posted in: Linux / Tagged: Linux, ssh, tips

© Copyright 2013 Albertech.net