<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Albertech.net &#187; Apache</title>
	<atom:link href="http://albertech.net/category/apache/feed/" rel="self" type="application/rss+xml" />
	<link>http://albertech.net</link>
	<description>Tips, Tricks, and Reviews in Linux, Apache, MySQL, PHP</description>
	<lastBuildDate>Wed, 28 Jul 2010 16:09:13 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Debian: Installing Mailman with Exim4</title>
		<link>http://albertech.net/2009/09/debian-installing-mailman-with-exim4/</link>
		<comments>http://albertech.net/2009/09/debian-installing-mailman-with-exim4/#comments</comments>
		<pubDate>Tue, 29 Sep 2009 21:03:44 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Apache]]></category>
		<category><![CDATA[Linux]]></category>

		<guid isPermaLink="false">http://albertech.net/?p=283</guid>
		<description><![CDATA[I recently installed Mailman with Exim4, which was a challenge considering all the manual configuration you have to do. I found a few guides on the install, but they didn't seem to be "complete" enough to get the system working. Turns out, the biggest challenge was Exim4. The configuration files are confusing, especially since there are two sets of configuration files. ]]></description>
			<content:encoded><![CDATA[
<p>I recently installed Mailman with Exim4, which was a challenge considering all the manual configuration you have to do. I found a few guides on the install, but none of them seemed &#8220;complete&#8221; enough to get the system working. It turns out the biggest challenge was Exim4. Its configuration files are confusing, especially since there are two sets of them (a single-file setup and a split, multi-file setup).</p>
<p><strong>I used the following guide from</strong> <a href="http://www.debian-administration.org/article/Mailman_and_Exim4" target="_blank"><strong>http://www.debian-administration.org/article/Mailman_and_Exim4</strong></a> as a baseline, and I&#8217;ve added my own notes to make the install go more smoothly. This is probably the best guide I&#8217;ve found so far on this topic. Debian makes installing the software packages easy; however, the configuration is all manual work. <img src='http://albertech.net/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>
<blockquote><p><strong>Installing and Configuring Mailman</strong></p>
<p>To install mailman, simply run the following command:</p>
<p><code>apt-get install mailman</code></p>
<p>During the install, you will be prompted to choose which languages you want mailman to support.</p>
<p>After the install is complete, follow the instructions given during the install and set up the site-wide &#8220;mailman&#8221; mailing list.</p>
<p><code>newlist mailman</code></p>
<p>There are just a few changes that must be made to the basic configuration. Open /etc/mailman/mm_cfg.py and edit the following items:</p>
<p><code># Default domain for email addresses of newly created mailing lists<br />
DEFAULT_EMAIL_HOST = 'list.example.org'<br />
<br />
# Default host for the web interface of newly created mailing lists<br />
DEFAULT_URL_HOST   = 'list.example.org'<br />
<br />
# Uncomment this. In this setup, the alias file won't need to be changed.<br />
MTA=None   # Misnomer, suppresses alias output on newlist</code></p>
<p>Restart mailman so that the configuration changes take effect:</p>
<p><code>/etc/init.d/mailman restart</code></p>
<p>Now would be a good time to set up any other mailing lists you will need using the same &#8220;newlist&#8221; command. If your list will be using anything other than the DEFAULT_URL_HOST we set up earlier as its web interface hostname, make sure to pass that to newlist with the -u flag.</p></blockquote>
<p><strong>Exim Configuration</strong></p>
<p>Previously, you had to update /etc/aliases for each list you added to Mailman. With the Exim router below, this is no longer necessary. I strongly suggest using the split configuration, since it&#8217;s much easier to locate the right section of the configuration to modify, and the files below live in /etc/exim4/conf.d, which Debian only reads when the split configuration is enabled. By default, Exim is set up with a single configuration file, so you will need to change the setting by running &#8220;<strong>dpkg-reconfigure exim4-config</strong>&#8221;. On the next-to-last screen, choose to split the configuration into multiple small files instead of a single file.</p>
<blockquote><p><strong>Create the files listed below.</strong></p>
<p>/etc/exim4/conf.d/main/04_mailman_options:<br />
<code># Mailman macro definitions<br />
<br />
# Home dir for the Mailman installation<br />
MM_HOME=/var/lib/mailman<br />
<br />
# User and group for Mailman<br />
MM_UID=list<br />
MM_GID=list<br />
<br />
#<br />
# Domains that your lists are in - colon separated list<br />
# you may wish to add these into local_domains as well<br />
domainlist mm_domains=list.example.org<br />
<br />
# The path of the Mailman mail wrapper script<br />
MM_WRAP=MM_HOME/mail/mailman<br />
#<br />
# The path of the list config file (used as a required file when<br />
# verifying list addresses)<br />
MM_LISTCHK=MM_HOME/lists/${lc::$local_part}/config.pck</code></p>
<p>/etc/exim4/conf.d/router/450_mailman_aliases:<br />
<code>mailman_router:<br />
  driver = accept<br />
  domains = +mm_domains<br />
  require_files = MM_LISTCHK<br />
  local_part_suffix_optional<br />
  local_part_suffix = -admin : \<br />
    -bounces   : -bounces+*  : \<br />
    -confirm   : -confirm+*  : \<br />
    -join      : -leave      : \<br />
    -owner     : -request    : \<br />
    -subscribe : -unsubscribe<br />
  transport = mailman_transport<br />
</code></p>
<p>/etc/exim4/conf.d/transport/40_mailman_pipe:<br />
<code>mailman_transport:<br />
  driver = pipe<br />
  command = MM_WRAP \<br />
    '${if def:local_part_suffix \<br />
    {${sg{$local_part_suffix}{-(\\w+)(\\+.*)?}{\$1}}} \<br />
    {post}}' \<br />
    $local_part<br />
  current_directory = MM_HOME<br />
  home_directory = MM_HOME<br />
  user = MM_UID<br />
  group = MM_GID<br />
</code></p>
<p>After you finish creating the various configuration files, run the following commands to build the updated configuration file and restart exim:<br />
<code>update-exim4.conf<br />
/etc/init.d/exim4 restart</code></p>
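<p>To see how the router and transport above fit together, here is a rough Python model of the decision Exim makes for each incoming address. It is a simplified sketch, not Exim&#8217;s own matching code. It assumes suffixes are tried in the listed order and that a trailing <code>*</code> swallows the rest of the local part, and it reuses the example domain (list.example.org) and the Debian paths from this guide.</p>

```python
import re
from pathlib import Path
from typing import List, Optional, Tuple

# Values copied from 04_mailman_options (list.example.org is the example domain).
MM_HOME = Path("/var/lib/mailman")
MM_DOMAINS = {"list.example.org"}

# local_part_suffix list from mailman_router; entries ending in "*" are wildcards.
SUFFIXES = [
    "-admin",
    "-bounces", "-bounces+*",
    "-confirm", "-confirm+*",
    "-join", "-leave",
    "-owner", "-request",
    "-subscribe", "-unsubscribe",
]


def split_local_part(local_part: str) -> Tuple[str, str]:
    """Split e.g. 'mylist-bounces+bob=example.com' into (list name, suffix).

    Simplified model: the first matching suffix wins, and because the suffix
    is optional (local_part_suffix_optional), no match means an empty suffix.
    """
    for suffix in SUFFIXES:
        if suffix.endswith("*"):
            # Wildcard: the fixed text plus everything after it is the suffix.
            idx = local_part.find(suffix[:-1], 1)
            if idx != -1:
                return local_part[:idx], local_part[idx:]
        elif local_part.endswith(suffix) and len(local_part) > len(suffix):
            return local_part[: -len(suffix)], suffix
    return local_part, ""


def route(local_part: str, domain: str, mm_home: Path = MM_HOME) -> Optional[List[str]]:
    """Return the command mailman_transport would run, or None if the router declines."""
    if domain.lower() not in MM_DOMAINS:  # domains = +mm_domains
        return None
    listname, suffix = split_local_part(local_part)
    # require_files = MM_LISTCHK: the list's config.pck must exist.
    if not (mm_home / "lists" / listname.lower() / "config.pck").is_file():
        return None
    if suffix:
        # Same substitution as ${sg{$local_part_suffix}{-(\w+)(\+.*)?}{$1}}
        action = re.sub(r"-(\w+)(\+.*)?", r"\1", suffix)
    else:
        action = "post"  # no suffix: an ordinary post to the list
    return [str(mm_home / "mail" / "mailman"), action, listname]
```

<p>For example, a bounce sent to mylist-bounces+bob=example.com@list.example.org ends up running <code>/var/lib/mailman/mail/mailman bounces mylist</code>, while a message to plain mylist@list.example.org runs the <code>post</code> command.</p>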
<p><strong>Apache Configuration</strong></p>
<p>Mailman uses CGI to provide a web interface for its mailing lists, so Apache needs to be configured to get this piece working. First, create a file to store some new aliases for the web server.</p>
<p>/etc/apache2/conf.d/mailman:<br />
<code>Alias /pipermail /var/lib/mailman/archives/public<br />
Alias /images/mailman /usr/share/images/mailman<br />
&lt;directory /var/lib/mailman/archives/public&gt;<br />
    DirectoryIndex index.html<br />
&lt;/directory&gt;</code></p>
<p>Then create (or edit) a VirtualHost entry to allow the scripts to run.</p>
<p>/etc/apache2/sites-available/list.example.org:<br />
<code>&lt;virtualhost *:80&gt;<br />
        ServerName list.example.org<br />
        ServerAdmin webmaster@list.example.org<br />
        DocumentRoot /var/www/<br />
        &lt;directory /var/www/&gt;<br />
                Options Indexes FollowSymLinks MultiViews<br />
                AllowOverride None<br />
                Order allow,deny<br />
                allow from all<br />
                # This directive allows us to have apache2's default start page<br />
                # in /apache2-default/, but still have / go to the right place<br />
                RedirectMatch ^/$ /cgi-bin/mailman/listinfo<br />
        &lt;/directory&gt;<br />
<br />
        ScriptAlias /cgi-bin/ /usr/lib/cgi-bin/<br />
        &lt;directory "/usr/lib/cgi-bin"&gt;<br />
                AllowOverride None<br />
                Options +ExecCGI -MultiViews +SymLinksIfOwnerMatch<br />
                Order allow,deny<br />
                Allow from all<br />
        &lt;/directory&gt;<br />
&lt;/virtualhost&gt;</code></p>
<p>If this is a new file, remember to symlink it into the sites-enabled directory (Debian&#8217;s <code>a2ensite list.example.org</code> does this for you).</p>
<p>Finally, restart Apache so that the changes take effect.<br />
<code>/etc/init.d/apache2 restart</code></p></blockquote>
<h2>Mailman troubleshooting:</h2>
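<p><strong>If your lists are not showing up on the website</strong>, you will need to change the <strong>VIRTUAL_HOST_OVERVIEW</strong> setting, which is defined in<br />
<strong>/var/lib/mailman/Mailman/Defaults.py</strong><br />
When it is on, the overview page only shows lists whose web host matches the hostname in the URL.</p>
<p>Set <strong>VIRTUAL_HOST_OVERVIEW</strong> to &#8220;<strong>No</strong>&#8221;. Defaults.py itself asks you not to edit it directly, so the safer place for the override is /etc/mailman/mm_cfg.py: add the line <code>VIRTUAL_HOST_OVERVIEW = No</code> there.<br />
Restart Mailman: <code>/etc/init.d/mailman restart</code></p>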
<p><strong>Locking down the Create Lists &#8220;feature&#8221;</strong><br />
This is the <em>&#8220;If you have the proper authority, you can also create a new mailing list&#8221;</em> link on the list overview page.<br />
By default, Mailman leaves the Create Lists page open to everyone, so anyone can reach the form for creating new lists (it asks for the list creator password, but most sites have no reason to expose it at all). To lock it down, block access to that URL in your Apache2 configuration. Open /etc/apache2/sites-enabled/000-default (if you only have one web host on the server; otherwise, the virtual host that serves your lists) and find the line</p>
<p><code>ScriptAlias /cgi-bin/ /usr/lib/cgi-bin/</code></p>
<p>Add the following above the ScriptAlias line:</p>
<p><code>&lt;Location /cgi-bin/mailman/create&gt;<br />
Order deny,allow<br />
Allow from [YOUR IP]<br />
Deny from all<br />
&lt;/Location&gt;</code></p>
<p>Restart Apache2 and the creation of lists should now be limited to your computer only.</p>
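<p>The Allow/Deny lines above use Apache 2.2&#8217;s mod_authz_host syntax, where the <code>Order</code> directive decides which list wins (Deny,Allow is the default). As a rough illustration, here is a Python model of the documented evaluation rules. It is not Apache code, and the IP addresses in the usage note are placeholders. Apache 2.4 replaces these directives with <code>Require ip</code>, although mod_access_compat still accepts the old form.</p>

```python
import ipaddress
from typing import Iterable


def _matches(ip, rules: Iterable[str]) -> bool:
    """True if the client IP matches any rule ("all", a single IP, or a CIDR range)."""
    for rule in rules:
        if rule == "all" or ip in ipaddress.ip_network(rule, strict=False):
            return True
    return False


def apache22_allowed(client: str, allow: Iterable[str], deny: Iterable[str],
                     order: str = "deny,allow") -> bool:
    """Model of Apache 2.2 Order/Allow/Deny evaluation for one client address."""
    ip = ipaddress.ip_address(client)
    allowed, denied = _matches(ip, allow), _matches(ip, deny)
    if order == "deny,allow":
        # Access is allowed by default; an Allow match overrides a Deny match.
        return allowed or not denied
    # "allow,deny": access is denied by default; a Deny match overrides an Allow match.
    return allowed and not denied
```

<p>With the rules from the Location block, <code>apache22_allowed("203.0.113.7", allow=["203.0.113.7"], deny=["all"])</code> is True, while any other address gets False, which is why only your own machine can reach the create page.</p>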
<h2>Exim4 troubleshooting:</h2>
<p><strong>ERROR: Low level smtp error: (111, 'Connection refused')</strong></p>
<p>This error usually has one of three causes.</p>
<p>First, is your hostname a fully qualified domain name (FQDN)? It needs to be something like &#8220;mysite.com&#8221; rather than just &#8220;mysite&#8221;. Check the /etc/hosts file: 127.0.0.1 should be set up as localhost, and 127.0.1.1 as your FQDN (e.g. mysite.com).</p>
<p>Second, your firewall could be interfering. If your iptables rules are too strict, you won&#8217;t be able to connect to your own SMTP port. To test, run &#8220;telnet localhost 25&#8221;.<br />
If the connection is refused or hangs, either a firewall rule is blocking it or Exim is not listening on that port.</p>
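<p>If telnet isn&#8217;t installed, a few lines of Python can do the same check. This is a minimal sketch that assumes Exim should be listening on localhost port 25. A working server answers with a greeting line starting with 220, while a refused or timed-out connection returns None.</p>

```python
import socket
from typing import Optional


def smtp_banner(host: str = "localhost", port: int = 25,
                timeout: float = 5.0) -> Optional[str]:
    """Return the SMTP greeting line, or None if the connection fails or no
    greeting arrives within `timeout` seconds.

    Roughly what "telnet localhost 25" shows by hand.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            data = sock.recv(512)
    except OSError:
        # ConnectionRefusedError (errno 111) and socket.timeout are both OSErrors;
        # a timeout usually points at a firewall silently dropping the packets.
        return None
    return data.decode("ascii", errors="replace").strip()
```

<p>Calling <code>smtp_banner()</code> with no arguments checks localhost port 25; None with an instant failure suggests nothing is listening, while None after the full timeout suggests a firewall.</p>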
<p>Third, Exim4 may not be configured properly. Run &#8220;<strong>dpkg-reconfigure exim4-config</strong>&#8221;.<br />
Make sure the mail server is set up as an &#8220;internet site&#8221; so that other computers can send email to it. Leave the relay domains blank, and make sure the configuration is split into multiple files.</p>
<p><span id="main" style="VISIBILITY: visible"><span id="search" style="VISIBILITY: visible"><strong>ERROR: 550 <em>relay not permitted</em></strong><br />
This means Mailman is unable to send email to subscribers because Exim&#8217;s relay setting is not configured properly. If you got this far, Exim is already accepting the posts and routing them to Mailman; the problem is on the way back out. In Exim, make sure localhost has permission to send messages: run &#8220;<strong>dpkg-reconfigure exim4-config</strong>&#8221; and, in the relay network option, set the value to &#8220;127.0.1.1;127.0.0.1&#8221;. This allows your own machine to send email out. By default, Mailman sends everything through localhost, so the two settings need to match up. To verify that you can connect, make sure &#8220;telnet localhost 25&#8221; works.<br />
</span></span></p>
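<p>To reproduce the 550 error without going through Mailman, you can talk to Exim directly with Python&#8217;s smtplib. This sketch assumes Exim is on localhost port 25, and the sender and recipient addresses are placeholders. It stops after the RCPT TO command, which is where a relay refusal shows up, and resets the transaction so no mail is actually sent.</p>

```python
import smtplib
from typing import Optional, Tuple


def probe_relay(sender: str, recipient: str, host: str = "localhost",
                port: int = 25, timeout: float = 10.0) -> Optional[Tuple[int, str]]:
    """Return Exim's (code, message) reply to RCPT TO, or None if it can't connect.

    250 means the recipient was accepted; 550 with "relay not permitted"
    means Exim refused to relay for this client.
    """
    try:
        with smtplib.SMTP(host, port, local_hostname="localhost", timeout=timeout) as smtp:
            smtp.ehlo()
            smtp.mail(sender)
            code, message = smtp.rcpt(recipient)
            smtp.rset()  # abandon the transaction; nothing is delivered
    except OSError:
        # smtplib.SMTPException and ConnectionRefusedError both subclass OSError.
        return None
    return code, message.decode("ascii", errors="replace")
```

<p>Run it with an outside recipient address; if the result is 550 from localhost, the relay network setting is the thing to fix.</p>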

]]></content:encoded>
			<wfw:commentRss>http://albertech.net/2009/09/debian-installing-mailman-with-exim4/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>awstats: Import old log files</title>
		<link>http://albertech.net/2009/09/awstats-import-old-log-files/</link>
		<comments>http://albertech.net/2009/09/awstats-import-old-log-files/#comments</comments>
		<pubDate>Thu, 03 Sep 2009 16:26:34 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Apache]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[awstats]]></category>
		<category><![CDATA[import]]></category>
		<category><![CDATA[old logs]]></category>

		<guid isPermaLink="false">http://albertech.net/?p=256</guid>
		<description><![CDATA[One problem I've run into with Awstats is importing old Apache log files. This is useful if you are migrating data between servers. By default, Awstats will ignore ALL log entries dated before the most recent log entry it has already processed. To keep Awstats from finding that "most recent log entry date", you will need to move all Awstats cached files into a separate folder. ]]></description>
			<content:encoded><![CDATA[
<p>One problem I&#8217;ve run into with Awstats is importing old Apache log files, which is useful if you are migrating data between servers. By default, Awstats ignores ALL log entries dated before the most recent entry it has already processed. To keep Awstats from finding that &#8220;most recent log entry date&#8221;, you will need to move all the Awstats cached (history) files into a separate folder. I&#8217;ve verified that this works, so if you have any questions, feel free to comment.</p>
<p align="left"><strong>1) Locate your awstats data directory.</strong> Most commonly found in /var/lib/awstats</p>
<p align="left">2) <strong>Create a new folder inside the directory</strong>. <strong>Move all awstats* files to the new folder.</strong></p>
<p align="left">3) <strong>Run the awstats update process on all log files in chronological order</strong>. AWStats will not complain about a &#8220;too old record&#8221; because there are no history files in the DirData directory containing compiled data more recent than the record.</p>
<p align="left"><strong>Edit the awstats.conf inside /etc/awstats</strong><br />
Locate the Apache log file you need, starting with the oldest one you want to import. Awstats adds the data in the order you process the files, so make sure you go from oldest to newest. Edit the line starting with <strong>LogFile="[log file location]"</strong></p>
<p>Run the Awstats update (through the web browser or from the shell via the perl command line), then repeat the edit-and-run cycle for the next-oldest log file.</p>
<p>4) <strong>Move the history files back.</strong> Once you have processed all the log files, move the Awstats history files inside /var/lib/awstats/newfolder back into /var/lib/awstats.</p>
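<p>Steps 2 through 4 can be sketched in Python as follows. This is not part of AWStats: it assumes the /var/lib/awstats layout described above, uses a stash folder name chosen for this example (import-stash), and assumes awstats.pl lives at /usr/lib/cgi-bin/awstats.pl, which varies by distribution. Instead of editing LogFile before every run, it builds one update command per log using AWStats&#8217;s <code>-LogFile=</code> command-line option, oldest log first. <code>restore_history</code> deliberately refuses to overwrite a history file that the import re-created, so you can decide which copy of that month to keep.</p>

```python
import shutil
from pathlib import Path
from typing import Iterable, List

STASH = "import-stash"                      # hypothetical name for the temporary folder
AWSTATS_PL = "/usr/lib/cgi-bin/awstats.pl"  # assumed location; varies by distro


def stash_history(data_dir: str, stash: str = STASH) -> List[str]:
    """Step 2: move every awstats* history file into a sub-folder."""
    data = Path(data_dir)
    target = data / stash
    target.mkdir(exist_ok=True)
    moved = []
    for f in sorted(data.glob("awstats*")):
        if f.is_file():
            shutil.move(str(f), str(target / f.name))
            moved.append(f.name)
    return moved


def update_commands(config: str, logs_oldest_first: Iterable[str]) -> List[List[str]]:
    """Step 3: one update run per log file, in chronological order."""
    return [["perl", AWSTATS_PL, f"-config={config}", "-update", f"-LogFile={log}"]
            for log in logs_oldest_first]


def restore_history(data_dir: str, stash: str = STASH) -> List[str]:
    """Step 4: move the stashed files back without overwriting new ones.

    Returns the names left in the stash because the import created a file
    with the same name (i.e. the same month) in the meantime.
    """
    data = Path(data_dir)
    source = data / stash
    conflicts = []
    for f in sorted(source.iterdir()):
        dest = data / f.name
        if dest.exists():
            conflicts.append(f.name)
        else:
            shutil.move(str(f), str(dest))
    if not conflicts:
        source.rmdir()  # stash is empty, so clean it up
    return conflicts
```

<p>In practice you would call <code>stash_history</code>, run each command from <code>update_commands</code> with <code>subprocess.run(cmd, check=True)</code>, and finish with <code>restore_history</code>.</p>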

]]></content:encoded>
			<wfw:commentRss>http://albertech.net/2009/09/awstats-import-old-log-files/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Cleaner URLs in CodeIgniter</title>
		<link>http://albertech.net/2009/08/cleaner-urls-in-codeigniter/</link>
		<comments>http://albertech.net/2009/08/cleaner-urls-in-codeigniter/#comments</comments>
		<pubDate>Tue, 11 Aug 2009 22:13:55 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Apache]]></category>
		<category><![CDATA[Frameworks]]></category>
		<category><![CDATA[codeigniter]]></category>
		<category><![CDATA[rewrite]]></category>

		<guid isPermaLink="false">http://albertech.net/?p=231</guid>
		<description><![CDATA[Here's a quick tip to make cleaner looking URLs in CodeIgniter PHP framework. By default, if you have an application, the path of the application usually ends after index.php.  Using Apache RewriteEngine, you can make cleaner looking URLs. You can have something similar to mysite.com/cigniter/MyApplication instead of mysite.com/cigniter/index.php/MyApplication]]></description>
			<content:encoded><![CDATA[
<p>Here&#8217;s a quick tip for cleaner-looking URLs in the CodeIgniter PHP framework. By default, CodeIgniter URLs include index.php, and the path of your application comes after it. Using Apache&#8217;s RewriteEngine (mod_rewrite), you can make cleaner URLs: <strong>mysite.com/cigniter/MyApplication</strong> instead of <strong>mysite.com/cigniter/index.php/MyApplication</strong></p>
<ul>
<li><strong>Create a file named &#8220;.htaccess&#8221; inside your root CodeIgniter directory.</strong> It should look like the following:<br />
<code>RewriteEngine on<br />
RewriteCond $1 !^(index\.php|images|css|robots\.txt)<br />
RewriteRule ^(.*)$ /[CODE IGNITER BASE]/index.php/$1 [L]</code><br />
where [CODE IGNITER BASE] is the directory of your CodeIgniter install. For instance, if you named your base install folder &#8220;cigniter&#8221;, the RewriteRule would look like:<br />
<code>RewriteRule ^(.*)$ /cigniter/index.php/$1 [L]</code></li>
<li><strong>If you get an error</strong>, make sure mod_rewrite is enabled (on Debian, <code>a2enmod rewrite</code>) and that your Apache configuration allows rewrite rules in .htaccess files for that folder. This is usually set in /etc/apache2/sites-enabled/YOURSITE. The folder needs at least:<br />
<code>AllowOverride FileInfo<br />
Options +FollowSymLinks</code><br />
(AllowOverride AuthConfig alone is not enough, because the Rewrite directives fall under FileInfo.) See: <a href="http://www.whoopis.com/howtos/apache-rewrite.html" target="_blank">http://www.whoopis.com/howtos/apache-rewrite.html</a></li>
<li>More info on CodeIgniter framework:<br />
<a href="http://codeigniter.com/" target="_blank">http://codeigniter.com/</a></li>
</ul>
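<p>A quick way to sanity-check which requests the .htaccess rule sends to CodeIgniter is to model it in Python. This is only a sketch: it assumes the request path has already had the directory prefix stripped (as mod_rewrite does in .htaccess context), and it uses the example folder name &#8220;cigniter&#8221;.</p>

```python
import re

# Pattern from the RewriteCond line. There is no trailing anchor, so any path
# that merely starts with "images" or "css" (e.g. "cssfoo") is excluded as well.
EXCLUDED = re.compile(r"^(index\.php|images|css|robots\.txt)")


def rewrite(path: str, base: str = "cigniter") -> str:
    """Return the URL Apache serves for `path`, given relative to the CI folder.

    `path` is what RewriteRule ^(.*)$ captures as $1 in .htaccess context,
    i.e. the request path with the directory prefix already stripped.
    """
    if EXCLUDED.match(path):
        # RewriteCond fails, so the rule is skipped and the file is served as-is.
        # index.php must be on this list: in .htaccess context mod_rewrite re-runs
        # the rules on the rewritten URL, and without it the rewrite would loop.
        return f"/{base}/{path}"
    return f"/{base}/index.php/{path}"
```

<p>For example, <code>rewrite("js/app.js")</code> returns "/cigniter/index.php/js/app.js", so if you keep JavaScript in a js folder, add it to the RewriteCond list as well.</p>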

]]></content:encoded>
			<wfw:commentRss>http://albertech.net/2009/08/cleaner-urls-in-codeigniter/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>WordPress MU: Limit access to certain blogs</title>
		<link>http://albertech.net/2009/07/wordpress-mu-limit-access-to-certain-blogs/</link>
		<comments>http://albertech.net/2009/07/wordpress-mu-limit-access-to-certain-blogs/#comments</comments>
		<pubDate>Thu, 02 Jul 2009 20:22:45 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Apache]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[limit access]]></category>
		<category><![CDATA[wordpress mu]]></category>

		<guid isPermaLink="false">http://albertech.net/?p=203</guid>
		<description><![CDATA[I recently implemented a way to limit access by IP range on specific blogs on WordPress MU. As you know, WordPress MU uses Apache Rewrite engine to rewrite URLs. For instance, you have a blog on WordPress MU called "intranet". Apache Rewrite takes the "intranet" string in the URL and automatically rewrites it as a value in the PHP script. A side effect to Apache Rewrite is that "Directory" .htaccess parameters don't work. So, if you wanted only your company IPs to access an internal blog, you will need to use Apache Rewrite parameters instead. For the solution, read on...]]></description>
			<content:encoded><![CDATA[
<p>I recently implemented a way to limit access by IP range to specific blogs on WordPress MU. WordPress MU uses the Apache Rewrite engine to rewrite URLs. For instance, say you have a blog on WordPress MU called &#8220;intranet&#8221;. Apache Rewrite takes the &#8220;intranet&#8221; string in the URL and rewrites it into a value passed to the PHP script. A side effect is that per-directory (&#8220;Directory&#8221;) .htaccess rules don&#8217;t work for individual blogs, since a blog is not an actual directory on disk. So, if you want only your company IPs to access an internal blog, you will need to use Apache Rewrite rules instead.</p>
<p>Here&#8217;s how to limit access to an IP or subnet on a particular blog on your WordPress MU install:</p>
<p><strong>DISCLAIMER:</strong> <span style="text-decoration: underline;">Modifying .htaccess files can break your WordPress MU install. </span>ALWAYS back up your .htaccess file first. Simply copy it to .htaccess-backup (<code>cp .htaccess .htaccess-backup</code>).</p>
<p><strong>Step 1:</strong></p>
<p>Add a section <strong>after</strong> the &#8220;RewriteEngine On<br />
RewriteBase /&#8230;&#8221; section</p>
<p><code>RewriteCond %{REMOTE_ADDR} !^1\.2\.3\.4$<br />
RewriteCond %{REMOTE_ADDR} !^1\.2\.3\.5$<br />
RewriteCond %{REQUEST_URI} ^/BLOG1<br />
RewriteRule .* - [F]<br />
</code></p>
<p><strong>DO NOT SAVE YET. You will need to edit the IP address info and blog info first:</strong></p>
<p><strong>Step 2: Replace the 1.2.3.4 number with your company IPs</strong> (it&#8217;s easier if you have an entire subnet, or you can use internal IPs).</p>
<p><strong>For a class C</strong> (/24) subnet, the part after %{REMOTE_ADDR} would be<br />
!^1\.2\.3\.</p>
<p><strong>Step 3: Replace BLOG1</strong> with the blog you want to restrict to those IPs. So http://www.mysite.com/myfirstblog would be &#8220;^/myfirstblog&#8221;.</p>
<p><strong>Optional: </strong>If you have WordPress MU installed in a folder (i.e. not your root folder), you will need to add that directory in front of the blog name.</p>
<p>For instance, if http://mysite.com/blog is your WordPress MU root folder, ^/BLOG1 would become</p>
<p>^/blog/BLOG1</p>
<p><strong>Optional: Multiple blogs with same access restrictions</strong><br />
By default, Apache Rewrite combines consecutive RewriteCond lines with AND. If you have multiple blogs, add [OR] to the end of every blog line except the last one.</p>
<p><code>RewriteCond %{REQUEST_URI} ^/BLOG1 [OR]<br />
RewriteCond %{REQUEST_URI} ^/BLOG2</code></p>
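<p>To double-check the logic before saving, here is a small Python model of these conditions. It is a sketch, not mod_rewrite itself: the negated IP conditions are ANDed together, the [OR]-joined blog conditions form one group, and the rule fires (403 Forbidden) only when both are true. The IPs and blog names are the placeholders from the steps above, and the IP patterns end in <code>$</code> so that 1.2.3.4 does not also match 1.2.3.40.</p>

```python
import re

# Placeholder values from the steps above; replace with your own.
ALLOWED_IPS = [r"^1\.2\.3\.4$", r"^1\.2\.3\.5$"]
PROTECTED_BLOGS = [r"^/BLOG1", r"^/BLOG2"]


def is_forbidden(remote_addr: str, request_uri: str) -> bool:
    """Model of the RewriteCond chain that ends in RewriteRule .* - [F]."""
    # RewriteCond %{REMOTE_ADDR} !pattern : each negated line ANDed with the next
    not_allowed = all(not re.search(p, remote_addr) for p in ALLOWED_IPS)
    # RewriteCond %{REQUEST_URI} pattern [OR] : any one blog matching is enough
    protected = any(re.search(p, request_uri) for p in PROTECTED_BLOGS)
    return not_allowed and protected


if __name__ == "__main__":
    print(is_forbidden("9.9.9.9", "/BLOG1/"))    # True: outside IP, protected blog
    print(is_forbidden("1.2.3.4", "/BLOG1/"))    # False: allowed IP
    print(is_forbidden("1.2.3.45", "/BLOG1/"))   # True: the $ anchor stops a prefix match
    # An unescaped dot matches any character:
    print(bool(re.search(r"^1.2.3.4$", "1x2x3x4")))  # True
```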
<p><strong>Errors?</strong></p>
<ul>
<li>Make sure you have ^/ marks in front of the blog names</li>
<li>IP addresses must have a backslash before each dot. In a regular expression, an unescaped dot matches any single character, so it needs to be escaped to match a literal dot.</li>
<li>Make sure you don&#8217;t forget the !^ before the IP. Without the &#8220;!&#8221;, the rule blocks the listed IPs instead of everyone else, so you will be forbidden.</li>
</ul>
<p>If all else fails and you can&#8217;t fix the error, just copy the .htaccess-backup file back to .htaccess.</p>

]]></content:encoded>
			<wfw:commentRss>http://albertech.net/2009/07/wordpress-mu-limit-access-to-certain-blogs/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Sitemap generator for search engines</title>
		<link>http://albertech.net/2009/05/sitemap-generator-for-search-engines/</link>
		<comments>http://albertech.net/2009/05/sitemap-generator-for-search-engines/#comments</comments>
		<pubDate>Tue, 05 May 2009 20:55:42 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Apache]]></category>
		<category><![CDATA[sitemap generator]]></category>

		<guid isPermaLink="false">http://albertech.net/2009/05/sitemap-generator-for-search-engines/</guid>
		<description><![CDATA[<strong>Sitemaps are useful if you want search engines to look in specific directories of your website</strong>. The standard robots.txt notation only has the exclusion list; where not to look and the search frequency.

<strong>For instance, a really basic robots.txt file looks like this:
</strong><code>
User-agent: *
Crawl-delay: 3
Disallow:/cgi-bin/
</code>

For me, I set the Crawl-delay to 3 as a general rule to prevent crawlers from consuming all the web server bandwidth. Generally, Yahoo crawlers are the most aggressive on your site, Google averages about ~13 seconds per request. Anyway, a sitemap gives the crawler a better idea of where to search, rather than trying to discover ... ]]></description>
			<content:encoded><![CDATA[
<p><strong>Sitemaps are useful if you want search engines to look in specific directories of your website</strong>. The standard robots.txt notation only provides an exclusion list (where not to look) and the crawl frequency.</p>
<p><strong>For instance, a really basic robots.txt file looks like this:<br />
</strong><code><br />
User-agent: *<br />
Crawl-delay: 3<br />
Disallow:/cgi-bin/<br />
</code></p>
<p>I set the Crawl-delay to 3 (seconds) as a general rule to keep crawlers from consuming all the web server bandwidth. In my experience, Yahoo&#8217;s crawlers are the most aggressive, while Google averages about 13 seconds between requests. In any case, a sitemap gives the crawler a better idea of where to search, rather than leaving it to discover pages on its own starting from the root page.</p>
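<p>You can check how a crawler reads a robots.txt like the one above with Python&#8217;s standard urllib.robotparser module. This is a sketch: example.com and the agent name ExampleBot are placeholders.</p>

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from above, verbatim.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 3
Disallow:/cgi-bin/
"""


def check(url: str, agent: str = "ExampleBot"):
    """Return (may this agent fetch url?, crawl delay in seconds or None)."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url), parser.crawl_delay(agent)


if __name__ == "__main__":
    print(check("http://example.com/index.html"))       # (True, 3)
    print(check("http://example.com/cgi-bin/test.pl"))  # (False, 3)
```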
<p><strong>Here&#8217;s a great resource for generating sitemaps:</strong><br />
<a href="http://code.google.com/p/sitemap-generators/wiki/SitemapGenerators">http://code.google.com/p/sitemap-generators/wiki/SitemapGenerators</a></p>
<blockquote>
<h4><em>About Google Sitemap Generator</em></h4>
<p><em>Our new open-source </em><a title="Google Sitemap Generator" href="http://code.google.com/p/googlesitemapgenerator/"><em>Google Sitemap Generator</em></a><em> finds new and modified URLs based on your webserver&#8217;s traffic, its log files, or the files found on the server. By combining these methods, Google Sitemap Generator can be very fast in finding these URLs and calculating relevant metadata, thereby making your Sitemap files as effective as possible. Once Google Sitemap Generator has collected the URLs, it can create the following Sitemap files for you:</em></p></blockquote>
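<p>If you only have a handful of pages, a sitemap is simple enough to generate yourself. The sketch below uses Python&#8217;s xml.etree.ElementTree to produce the basic format from the sitemaps.org protocol: a urlset element holding one url entry per page, each with a loc and an optional lastmod. The URLs and output file name are placeholders.</p>

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Optional

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls: Iterable[str], lastmod: Optional[str] = None) -> ET.Element:
    """Build the urlset element for the given absolute URLs.

    lastmod, if given, should be a W3C date such as "2009-05-05".
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for address in urls:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = address  # ElementTree escapes & for us
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod
    return urlset


def save_sitemap(urlset: ET.Element, path: str) -> None:
    """Write the sitemap with an XML declaration, ready to upload as sitemap.xml."""
    ET.ElementTree(urlset).write(path, encoding="UTF-8", xml_declaration=True)
```

<p>Once the file is on your server, you can also point crawlers at it by adding a <code>Sitemap:</code> line with its full URL to robots.txt.</p>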

]]></content:encoded>
			<wfw:commentRss>http://albertech.net/2009/05/sitemap-generator-for-search-engines/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Analyzing large web log files</title>
		<link>http://albertech.net/2009/02/parse-large-logfile/</link>
		<comments>http://albertech.net/2009/02/parse-large-logfile/#comments</comments>
		<pubDate>Fri, 20 Feb 2009 05:11:00 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Apache]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[log files]]></category>
		<category><![CDATA[regular expressions]]></category>
		<category><![CDATA[unix]]></category>

		<guid isPermaLink="false">http://albertech.net/?p=91</guid>
		<description><![CDATA[<strong>The fastest way to trim down large web log files is through UNIX/Linux shell</strong>. Large files exceeding 1 GB (millions of lines of logs) are not easily editable using a GUI interface, so the fastest way to parse them is via the command line. You can trim them down according to a time range, remove internal requests from within the company, and remove bots/crawlers data from the log files.

For instance, I have a 4GB log file with about two years worth of info (2007-2009) in there. What if I just wanted the logs from 2008? First, run the "head -10" command ... ]]></description>
			<content:encoded><![CDATA[
<p><strong>The fastest way to trim down large web log files is through the UNIX/Linux shell</strong>. Large files exceeding 1 GB (millions of lines of logs) are not easy to edit in a GUI, so the fastest way to parse them is via the command line. You can trim them down to a time range, remove internal requests from within the company, and remove bot/crawler data from the log files.</p>
<p>For instance, I have a 4GB log file with about two years&#8217; worth of info (2007-2009) in it. What if I just wanted the logs from 2008? First, run the &#8220;head -10&#8221; command on the log file to see the general format of the log.</p>
<p>Preferably, it is in the YYYY-MM-DD hour:min:second format. If you are building a custom log file from scratch, make sure you have the dash delimiter between the date parts to make it easier to work with later. The log file I was given used the YYYYMMDDHHMMSS format (a compact timestamp). Luckily, with regular expressions, you can still parse the file accordingly.<br />
<code><br />
20070101000013 4ms 0ms 8ms<br />
20070101000019 4ms 0ms 8ms<br />
</code><br />
Make sure you have enough disk space before proceeding, at least enough to handle twice the size of the logfile. Run a &#8220;df&#8221; to check available space and &#8220;ls -lah&#8221; to see the size of the files in your log directory.<br />
So, if you only want the logs from 2008 (using the example above) you can run the following command.</p>
<p><code>cat [INPUTLOGFILE] | grep '^2008' &gt; [OUTPUTLOGFILE]</code></p>
<p>This will look through each line and keep only the lines where the regular expression matches 2008 at the beginning. If the log lines do not start with the date, you can still match the date anywhere in the line (if it is in the YYYY-MM-DD format):</p>
<p><code>cat [INPUTLOGFILE] | grep '2008-' &gt; [OUTPUTLOGFILE]</code></p>
<p>What if you want to omit Googlebot and other crawlers directly from the log file? Most log analysis programs have a filter that checks the user agent string to see whether a request came from a crawler. But not all crawlers identify themselves there; some set the user agent string to that of a common browser. If you have DNS lookups on, you can try filtering out lines that match the googlebot.com domain. The &#8220;grep -v&#8221; command is handy since it excludes all lines that match a particular value.</p>
<p><code>cat [INPUTLOGFILE] | grep -v 'googlebot.com' &gt; [OUTPUTLOGFILE]</code></p>
<p>Another use is to filter out all local requests coming from your department/building/company. This would be good to see where your external users are coming from. If you have DNS turned on:<br />
<code><br />
cat [INPUTLOGFILE] | grep -v 'mycompany.com' &gt; [OUTPUTLOGFILE]</code></p>
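<p>If you don&#8217;t have DNS turned on, you can filter by IP range instead (replace 192.168.1. with your own network prefix). The -F option makes grep match the dots literally instead of treating them as &#8220;any character&#8221;:<br />
<code><br />
cat [INPUTLOGFILE] | grep -vF '192.168.1.' &gt; [OUTPUTLOGFILE]</code></p>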
<p>Check the number of lines in the logfile (before and after comparison) with:<br />
<code>wc -l [LOGFILE]</code></p>
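<p>The grep filters above can also be combined into a single pass over the file. Here is a Python sketch that streams the log line by line, so memory use stays flat even for a 4 GB file. The year prefix and excluded strings are the examples from this post; substitute your own.</p>

```python
from typing import Iterable, TextIO, Tuple


def filter_log(src: TextIO, dst: TextIO, prefix: str = "2008",
               exclude: Iterable[str] = ("googlebot.com", "mycompany.com")) -> Tuple[int, int]:
    """Copy lines that start with `prefix` and contain none of the `exclude` strings.

    Roughly grep '^2008' followed by one grep -vF per excluded string.
    Returns (lines read, lines written), like running wc -l before and after.
    """
    exclude = tuple(exclude)
    read = written = 0
    for line in src:
        read += 1
        if not line.startswith(prefix):
            continue
        if any(s in line for s in exclude):
            continue
        dst.write(line)
        written += 1
    return read, written
```

<p>Usage, with placeholder file names: <code>with open("access.log", errors="replace") as src, open("access-2008.log", "w") as dst: print(filter_log(src, dst))</code>. The <code>errors="replace"</code> keeps a stray non-UTF-8 byte from stopping the run, at the cost of rewriting that byte in the output.</p>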
<p><strong>For more information, check out these resources:</strong><br />
<a href="http://www.robelle.com/smugbook/regexpr.html">http://www.robelle.com/smugbook/regexpr.html</a></p>
<p><strong>What if you don&#8217;t have access to UNIX/Linux?</strong> There&#8217;s Cygwin for Windows, although I don&#8217;t recommend running it with huge log files. I&#8217;ll eventually try it at some point to see if it&#8217;ll work <img src='http://albertech.net/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /><br />
<a href="http://www.cygwin.com/">http://www.cygwin.com/</a></p>

]]></content:encoded>
			<wfw:commentRss>http://albertech.net/2009/02/parse-large-logfile/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>awstats &#8211; remove image files</title>
		<link>http://albertech.net/2009/02/awstats-remove-image-files/</link>
		<comments>http://albertech.net/2009/02/awstats-remove-image-files/#comments</comments>
		<pubDate>Tue, 03 Feb 2009 22:41:33 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Apache]]></category>
		<category><![CDATA[awstats]]></category>
		<category><![CDATA[awstats.conf]]></category>
		<category><![CDATA[remove images]]></category>

		<guid isPermaLink="false">http://albertech.net/?p=76</guid>
		<description><![CDATA[To remove image files from your awstats reports, modify the SkipFiles variable in /etc/awstats/awstats.conf (or awstats.YOURHOST.conf).

Do a search for "SkipFiles" in the file using nano/vi/emacs and find the section that talks about "Use SkipFiles to ignore access to URLs that match one of the following entries..." The SkipFiles line should look similar to the following:

<code>SkipFiles="REGEX[.jpg$] REGEX[.gif$] REGEX[.png$]"
</code>
]]></description>
			<content:encoded><![CDATA[
<p>To remove image files from your awstats reports, modify the SkipFiles variable in /etc/awstats/awstats.conf (or awstats.YOURHOST.conf).</p>
<p>Search for &#8220;SkipFiles&#8221; in the file with nano, vi, or emacs and find the comment that reads &#8220;Use SkipFiles to ignore access to URLs that match one of the following entries&#8230;&#8221;. The SkipFiles line should look similar to the following:</p>
<p><code>SkipFiles="REGEX[\.jpg$] REGEX[\.gif$] REGEX[\.png$]"</code></p>
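<p>To skip other image types as well, you can fold the entries into one regular expression. This is a sketch under a few assumptions. Awstats evaluates REGEX[...] entries as Perl regular expressions. The extension list (jpg, jpeg, gif, png, ico) is only an example. The match is case sensitive, so files ending in .JPG would still be counted. The backslash makes the dot match a literal period rather than any character.</p>
<pre><code># Sketch: one case-sensitive regex covering several image extensions
SkipFiles="REGEX[\.(jpe?g|gif|png|ico)$]"
</code></pre>
<p>As far as I know, SkipFiles only affects log lines that awstats processes after the change. Image hits already recorded in existing reports stay there unless you rebuild those reports.</p>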

]]></content:encoded>
			<wfw:commentRss>http://albertech.net/2009/02/awstats-remove-image-files/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Awstats setup in Debian</title>
		<link>http://albertech.net/2009/01/awstats-debiansetup/</link>
		<comments>http://albertech.net/2009/01/awstats-debiansetup/#comments</comments>
		<pubDate>Mon, 26 Jan 2009 03:05:19 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Apache]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[awstats]]></category>
		<category><![CDATA[debian]]></category>
		<category><![CDATA[geoIP]]></category>

		<guid isPermaLink="false">http://albertech.net/?p=65</guid>
		<description><![CDATA[<strong>Installing awstats in Debian with GeoIP caching
</strong>

<strong>First, use apt-get to get the software:</strong>

apt-get install awstats

<strong>Configure awstats:</strong>
Instead of using the awstats configuration tool (a Perl script), add the awstats Apache configuration by hand. Following the <a href="http://www.debuntu.org/2006/04/21/33-how-to-setting-up-awstats-with-apache-2-on-debianubuntu">tutorial from debuntu</a>, create a file called awstats.conf in your Apache configuration folder (/etc/apache2) with the following lines:

<code>Alias /awstatsclasses "/usr/share/awstats/lib/"
Alias /awstats-icon/ "/usr/share/awstats/icon/"
Alias /awstatscss "/usr/share/doc/awstats/examples/css"
Options +ExecCGI -MultiViews +SymLinksIfOwnerMatch</code>

<strong>Inside apache2.conf (or the virtual host file in the sites-available folder), add this line at the end of the file:</strong>

<code>Include /etc/apache2/awstats.conf</code>

<strong>Next, copy the /usr/lib/cgi-bin/awstats.pl file to the apache cgi-bin folder.</strong> If you have virtual hosts enabled, copy it to ... ]]></description>
			<content:encoded><![CDATA[
<p><strong>Installing awstats in Debian with GeoIP caching</strong></p>
<p><strong>First, use apt-get to get the software:</strong></p>
<p><code>apt-get install awstats</code></p>
<p><strong>Configure awstats:</strong><br />
Instead of using the awstats configuration tool (a Perl script), add the awstats Apache configuration by hand. Following the <a href="http://www.debuntu.org/2006/04/21/33-how-to-setting-up-awstats-with-apache-2-on-debianubuntu">tutorial from debuntu</a>, create a file called awstats.conf in your Apache configuration folder (/etc/apache2) with the following lines:</p>
<p><code>Alias /awstatsclasses "/usr/share/awstats/lib/"<br />
Alias /awstats-icon/ "/usr/share/awstats/icon/"<br />
Alias /awstatscss "/usr/share/doc/awstats/examples/css"<br />
Options +ExecCGI -MultiViews +SymLinksIfOwnerMatch</code></p>
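<p>An Options line outside any section acts only as a server-wide default, and Debian&#8217;s own &lt;Directory&gt; sections can override it. If you want CGI execution enabled exactly where awstats.pl lives, one option is to wrap the line in a &lt;Directory&gt; block. This is a sketch that assumes the default Debian cgi-bin path; for a virtual host, point it at that host&#8217;s cgi-bin folder instead. Every option below carries a + or - prefix because Apache 2.4 rejects an Options line that mixes prefixed and unprefixed options.</p>
<pre><code>&lt;Directory /usr/lib/cgi-bin&gt;
    Options +ExecCGI -MultiViews +SymLinksIfOwnerMatch
&lt;/Directory&gt;
</code></pre>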
<p><strong>Inside apache2.conf (or the virtual host file in the sites-available folder), add this line at the end of the file:</strong></p>
<p><code>Include /etc/apache2/awstats.conf</code></p>
<p><strong>Next, copy /usr/lib/cgi-bin/awstats.pl to the Apache cgi-bin folder.</strong> If you use virtual hosts, copy it to the cgi-bin folder of the relevant virtual host instead.</p>
<p><strong>Configure the sample awstats.conf file located at /etc/awstats/awstats.conf:</strong><br />
1) Set LogFile to the server&#8217;s access log (usually /var/log/apache2/access.log).<br />
2) Set LogFormat to 1 (Apache&#8217;s combined log format).<br />
3) Set DNSLookup=0. No DNS lookups are needed when you use the GeoIP database (instructions are below).<br />
4) Uncomment this line (remove the # from the beginning), making sure it uses straight quotes:<br />
<code>LoadPlugin="geoip GEOIP_STANDARD /usr/share/GeoIP/GeoIP.dat"</code></p>
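<p>Put together, the four settings above would look roughly like this in the awstats configuration file. This is a sketch. The log path is the usual Debian location from step 1. LogFormat=1 assumes Apache writes the combined log format, which, as far as I know, is what Debian&#8217;s default site configuration uses. Type the quotes as plain straight quotes, since curly quotes copied from a web page will break the setting.</p>
<pre><code>LogFile="/var/log/apache2/access.log"
LogFormat=1
DNSLookup=0
LoadPlugin="geoip GEOIP_STANDARD /usr/share/GeoIP/GeoIP.dat"
</code></pre>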
<p><strong>Install GeoIP to speed up the visitor lookups.</strong> GeoIP looks up each visitor&#8217;s country in a local database, which is significantly faster than DNS lookups, since those can take a long time. Using <a href="http://www.ducea.com/2006/06/14/install-geoip-perl-module-on-debian/">this tutorial</a>, install the GeoIP C library:</p>
<p><code>wget http://geolite.maxmind.com/download/geoip/api/c/GeoIP-1.4.5.tar.gz</code></p>
<p>Extract the file (for example with <code>tar xzf GeoIP-1.4.5.tar.gz</code>). Inside the GeoIP-1.4.5 folder, run:</p>
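<p><code>./configure<br />
make<br />
make check<br />
make install    # as root<br />
ldconfig        # as root: refresh the shared library cache so libGeoIP in /usr/local/lib can be found</code></p>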
<p>Next, download the latest GeoIP Perl module as well as the latest GeoLite Country database.<br />
<a href="http://www.maxmind.com/download/geoip/api/perl/">http://www.maxmind.com/download/geoip/api/perl/</a> I usually download the module from the shell with wget: <code>wget http://geolite.maxmind.com/download/geoip/api/perl/Geo-IP-1.36.tar.gz</code></p>
<p>Extract the tar file, then inside the Geo-IP folder, run:</p>
<p><code>perl Makefile.PL<br />
make<br />
make test<br />
make install</code></p>
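<p>Before generating reports, you can run a quick sanity check. It confirms that the Perl module loads and that the country database is where awstats expects it. This is a minimal sketch: check_geoip is a hypothetical helper. It assumes you saved the GeoLite Country database as /usr/share/GeoIP/GeoIP.dat, the path used in the LoadPlugin line above. It only checks that the file exists and that <code>perl -MGeo::IP</code> succeeds; it does not open the database.</p>
<pre><code># Sketch: confirm the Geo::IP Perl module loads and the country database exists.
# check_geoip is a hypothetical helper name.
# Usage: check_geoip [PATH_TO_GeoIP.dat]
check_geoip() {
    # default to the path used by the LoadPlugin line in awstats.conf
    db=${1:-/usr/share/GeoIP/GeoIP.dat}
    if [ ! -f "$db" ]; then
        echo "missing database: $db"
        return 1
    fi
    # perl -MGeo::IP -e 1 loads the module and does nothing else;
    # it exits non-zero if the module cannot be loaded
    if perl -MGeo::IP -e 1; then
        echo "ok: Geo::IP loads and $db exists"
    else
        echo "Geo::IP failed to load"
        return 1
    fi
}
</code></pre>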
<p><strong>GeoIP should now be installed. You can now generate the reports:</strong><br />
<code>/usr/lib/cgi-bin/awstats.pl -config=[NAME OF YOUR INSTALL] -update</code><br />
For instance, if you named the install &#8220;sample&#8221;, you should have a file called /etc/awstats/awstats.sample.conf, and the command would be:<br />
<code>/usr/lib/cgi-bin/awstats.pl -config=sample -update</code></p>
<p><strong>To view the reports in a browser, go to:</strong><br />
<code>http://[SERVER]/cgi-bin/awstats.pl?config=[NAME OF YOUR INSTALL]</code></p>
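<p>To keep the statistics current without running the update by hand, you can schedule the same command with cron. The entry below is a hypothetical /etc/cron.d line for the &#8220;sample&#8221; install that updates every 10 minutes as the www-data user. That user must be able to read the Apache log (on Debian the logs are usually readable only by root and the adm group) and write to the awstats data directory. The Debian awstats package may already ship its own entry in /etc/cron.d/awstats, so check that file before adding another.</p>
<pre><code># m h dom mon dow user command
*/10 * * * * www-data /usr/lib/cgi-bin/awstats.pl -config=sample -update &gt;/dev/null
</code></pre>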

]]></content:encoded>
			<wfw:commentRss>http://albertech.net/2009/01/awstats-debiansetup/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
