New Hosting Provider

I recently switched providers for shared hosting used to host this site and a couple of others, and the improvement, at least in transfer rate, has been pretty noticeable.  I just happened to come across my site’s crawl statistics in Google Webmaster Tools today and the improvement speaks for itself:

Time spent downloading a page (in milliseconds)

I made the switch in the first week of April. I knew the site was running faster, but even this graph surprised me.

How to Maintain Simple, Static Pages in CakePHP

Cake’s default way of handling simple static content is to use the built-in PagesController to serve up .ctp files from /app/views/pages. This is a simple and straightforward approach and works for very small websites, but comes with some obvious drawbacks:

  • Making changes to the content of the pages requires editing template files;
  • There’s no easy way (generally) to edit these pages the way you’d edit other content on your site, using controllers with admin actions, for example;
  • There’s no way to specify a hierarchy of pages, which can be quite useful for large websites;
  • The URL structure of the pages, although it follows Cake’s URL conventions, isn’t intuitive and looks pretty clunky and unprofessional. It would be much nicer to have /about rather than /pages/about, and so forth.

Logically, it makes more sense to place static content in the database so that it can be manipulated just like any other model. Additionally, we want our URLs to be pretty.

This is surprisingly easy to accomplish, thanks in part to this article, which shows how to load a model from within our routes configuration. I use the same technique, but my code below is updated to run on CakePHP 1.2, and I’ll show you how to create a hierarchy of nested pages as well.

Database

Let’s start by creating the database schema for our model. I call mine “StaticPage”, but you can name yours whatever you’d like. I don’t use “Page” to avoid controller conflicts down the road should I ever decide to make use of Cake’s PagesController for anything.

CREATE TABLE IF NOT EXISTS `static_pages` (
  `id` INTEGER(12) NOT NULL AUTO_INCREMENT,
  `parent_id` INTEGER(12) NULL,
  `title` VARCHAR(128) NOT NULL,
  `slug` VARCHAR(128) NOT NULL,
  `content` LONGTEXT NOT NULL,
  PRIMARY KEY (`id`)
);

Model

Next, let’s create the actual model in CakePHP. We need to define the belongsTo and hasMany relationships for the tree hierarchy to work properly:

<?php
class StaticPage extends AppModel {

  var $belongsTo = array(
    'ParentPage' => array(
      'className' => 'StaticPage',
      'foreignKey' => 'parent_id'
  ));

  var $hasMany = array(
    'ChildPage' => array(
      'className' => 'StaticPage',
      'foreignKey' => 'parent_id',
      'dependent' => false
  ));
}
?>

Controller

Moving on, we create our StaticPagesController. You’ll likely want to create your own admin CRUD actions (the entire point of this, after all, is to be able to manage these pages dynamically!), but for simplicity I’m just going to define the one action we need to display our pages. Conveniently, we’re just going to use the “index” action:

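<?php
function index( $slug = null ) {
  if (!$slug) {
    // Send the visitor to the site root; redirecting back to this
    // same action without a slug would loop forever.
    $this->Session->setFlash(__('Invalid StaticPage.', true));
    $this->redirect('/');
    return;
  }
  $staticPage = $this->StaticPage->find('first', array(
    'conditions' => array(
      'StaticPage.slug' => $slug
  )));
  if (empty($staticPage)) {
    // No page has this slug, so show Cake's standard 404 page.
    $this->cakeError('error404');
    return;
  }
  $this->set(compact('staticPage'));
  $this->pageTitle = $staticPage['StaticPage']['title'];
}
?>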

Notice we’re going to be using the slug as the unique identifier when looking up the page. But what happens if we have two pages with the same slug? As you’ll see, that won’t be a problem as long as they’re nested under separate parent pages.

Routing

Now for the key part: we need to set up a custom route to handle our pages. In our routes.php file, we’re going to pull a list of static page slugs from the database and use them as the regular expression to match against with our route:

<?php
// routes.php

App::import('Model', 'StaticPage');
$page = new StaticPage();
$slugs = $page->find('list', array(
  'fields' => array('StaticPage.slug'),
  'order' => 'StaticPage.slug DESC'
));

Router::connect('/:slug/',
  array('controller' => 'static_pages', 'action' => 'index'),
  array(
    'pass' => array('slug'),
    'slug' => implode('|', $slugs)
));
?>

Now that everything is set up, we can start creating some static pages in the database. The key thing to remember is that the slug should be the full path to the page. So, for example, if we create a page called “about”, the slug should simply be “about”. The page will then be accessible at yourdomain.com/about. If we want to create a subpage of that page called “projects”, the slug for that page should be “about/projects”. Some people may not like storing the full path as the slug, but I find that it has two main advantages: it prevents ambiguity among pages with the same name/slug, and it makes managing your pages easier since you can immediately know the location of any given page.

This is also the reason that we load the slugs from the database in descending order for our regular expression: the route can first try to match the full URL of a subpage before the parent page is considered for matching.

If you want to take this one step further, you could write some sort of method, getFullSlug(), for the StaticPage model that generates the full slug (rather than storing it) by recursively appending the simple slug from parent pages. The obvious downside to this is that more SQL queries will be required, something we want to avoid, especially when dealing with what should be static content.

Happy baking!

Functional Programming, Here I Come!

My copy of Real World Haskell finally arrived this week, and now that I’m on co-op and have some free time in the evenings, I’ll be diving into it this week.

I’ve already been teaching myself a bit from online tutorials and documentation, but nothing beats a good book in order to really start understanding things.  And since it’s an O’Reilly book, the thing weighs in at just under three metric tons.

Passing A Variable Number of Arguments to a Function at Run-time

At the request of an Ozzu member last night, I wrote up two new quick tutorials outlining how to write functions that can accept a variable-length argument list at run-time.  I wrote the original tutorial for C programmers and later adapted it for PHP.  I’ll likely adapt it for a few other languages in the near future.

TUTORIAL: Pass Variable Number of Arguments to a C Function
TUTORIAL: Pass Variable Num. of Arguments to a PHP Function

Feedback is always welcome either here or in the tutorial topics themselves.
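
For anyone who just wants the gist without clicking through, here is a minimal sketch of the C side using the standard <stdarg.h> macros. The function name sum_ints() and the convention of passing the argument count first are my own choices for illustration and are not taken from the tutorials:

```c
#include <stdarg.h>

/* Hypothetical example: adds up the `count` int arguments that follow it.
   The caller is responsible for passing exactly `count` ints. */
int sum_ints(int count, ...)
{
    va_list args;
    int total = 0;
    int i;

    va_start(args, count);              /* start after the last named parameter */
    for (i = 0; i < count; ++i) {
        total += va_arg(args, int);     /* fetch the next argument as an int */
    }
    va_end(args);                       /* always pair va_start with va_end */

    return total;
}
```

A call such as sum_ints(3, 1, 2, 3) walks the three trailing arguments and returns their sum. C has no way to discover how many arguments were actually passed, which is why the count (or a sentinel value) has to be supplied by the caller.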

A Calendar Element for CakePHP

Here’s a quick calendar element I whipped up for a CakePHP application I’m writing.  I needed to use the calendar in a number of different places, so creating a view element for it in Cake made the most sense.

<<< March 2009 >>>

Su Mo Tu We Th Fr Sa
1 2 3 4 5 6 7
8 9 10 11 12 13 14
15 16 17 18 19 20 21
22 23 24 25 26 27 28
29 30 31

Download the source (zipped .ctp file) here: CakePHP Calendar Element

The code is pretty simple.  Pass the element a year and a month when rendering it and it will render that month in a simple table.  It also accepts two additional arguments, a day link and a month link.  The day link allows you to specify what the base URL for each day should be, and similarly, the month link specifies the base URL for traversing months.

Drop the calendar.ctp into your elements directory in your Cake app (usually /app/views/elements) and render it in any view you’d like.  Below is an example of rendering the element with all of its parameters:

<?php
echo $this->element('calendar', array(
    'year' => 2009,
    'month' => '11',
    'month_link' => '/controller/showmonth/',
    'day_link' => '/controller/showday/'
));
?>

The month link and day link provide the base URLs for linking to additional months and days, allowing you to modify the element for particular controller actions.

Feel free to use this element for anything you’d like, no strings attached.  If you make any improvements or enhancements to it, be sure to share!

Thursday Night Tech Talks

Last quarter (20081), my friend Tom and I started up a program within SSE called Thursday Night Tech Talks. The series gives undergraduate students in computer-related disciplines the chance to share with others some of the new and interesting things they’ve encountered while on co-op and in other activities outside of their department’s traditional curriculum.  Tom and I gave the first two presentations, and we’ve since seen a lot of interest from students looking to give presentations of their own.

Now that SSE has [finally] migrated to a new server, I am now able to make available the videos and slides from each of the past talks.  If you haven’t been around to attend any of the presentations, here are some of the topics that have been discussed so far:

- Microformats: Empowering Your Markup
- Adobe Flex
- Cocoa Fundamentals
- Cross-Platform OpenGL
- BlazeDS: Integrating Flex and Java
- Open Source Collaboration with Git and GitHub
- Amazing AJAX
- Don’t Forget to ____! – A Discussion on Configuration Management
- Plan 9 OS

There are at least two more confirmed talks scheduled for this quarter.  This Thursday (9/22) I’ll be giving a presentation on CakePHP, an MVC framework that borrows many ideas from Rails.  After that, Tom will be giving a talk on XForms.

Additionally, the Tech Talk series will be moving back to a bi-weekly schedule at the end of the winter quarter.  We’ve been seeing much bigger turnouts at recent presentations, and we believe this adjustment will help with the budgeting and planning that go into each talk.

I’d like to give my sincere thanks to Northrop Grumman for their very generous donation of $500 towards the program, as well as to everyone who has given a presentation and to everyone who has attended the weekly talks.  The program has grown very rapidly and couldn’t have done so without the support of everyone who has taken such a vested interest in it.

Creating a Tree Structure within a CakePHP Model

My assumption going into this was that creating a tree-like relationship within one of my models would be fairly straightforward with Cake’s model relationships.  As it turns out, I was absolutely correct.

Let’s say we want to create a model called “Category”, and each category can belong to a parent category.  In other words, we want our application to recognize that every category may potentially belong to a higher-level category, and likewise, that every category may also have child categories beneath it.

Consider a categories table like the following to back the Category model:

CREATE TABLE `categories` (
  `id` INTEGER(12) NOT NULL AUTO_INCREMENT,
  `name` VARCHAR(64) NOT NULL,
  `parent_id` INTEGER(12) NULL,

  PRIMARY KEY(`id`)
);

The first step is to define our model (obviously):

<?php
class Category extends AppModel {
  var $name = 'Category';
}
?>

Next, let’s tell Cake that a category can belong to other categories using the belongsTo association:

<?php

class Category extends AppModel {
  var $name = 'Category';

  var $belongsTo = array(
    'ParentCategory' => array(
      'className' => 'Category',
      'foreignKey' => 'parent_id'
  ));
}
?>

Finally, we want to tell Cake that a category can have other categories beneath it.  We’ll do this using the hasMany association:

<?php

class Category extends AppModel {
  var $name = 'Category';

  var $belongsTo = array(
    'ParentCategory' => array(
      'className' => 'Category',
      'foreignKey' => 'parent_id'
  ));

  var $hasMany = array(
    'ChildCategory' => array(
      'className' => 'Category',
      'foreignKey' => 'parent_id'
  ));
}
?>

That’s it.  Create a controller for categories and turn on scaffolding, and you’ll see how nicely this all works out.

10 Useful PHP Functions You May Not Have Known About

10. get_browser()

This function comes in handy when collecting information about a user’s browser to determine the content capabilities for that user.  It returns an object (or an associative array, if you pass true as the second argument) containing information such as the browser type, version, whether or not it is capable of handling frames, cookies, and JavaScript, and the platform on which the browser is running.  Note that it only works if the browscap setting in php.ini points to a valid browscap.ini file.

This kind of information could prove to be invaluable when tracking users, logging site usage, or building a statistics engine for your site.

9. parse_url()

Ever needed to split up a URL into its individual components to extract data from it?  Look no further than parse_url().  This function does all the dirty work of determining what is what in the URL and conveniently presents it to you in an associative array.  The resulting array will contain one or more elements of the URL. Take the following URL for example:

http://www.domain.com/controller.php?var=value&another=value2#anchor

Passing this URL string to parse_url() would generate an array like the following:

Array
(
    [scheme] => http
    [host] => www.domain.com
    [path] => /controller.php
    [query] => var=value&another=value2
    [fragment] => anchor
)

Don’t worry if the URL isn’t perfectly formed: parse_url() will do its best to extract whatever information it possibly can from the string passed to it, and it returns false only for seriously malformed URLs.

8. get_meta_tags()

This function does exactly what it says: it opens a file or URL, parses the HTML it finds there, and extracts the meta tags from it.  The results are returned in an associative array with the name attributes as the keys and the content attributes as the values.

This could be used to extract meta information such as keywords and descriptions from pages in any sort of PHP-based web crawler.

7. array_filter()

Filtering arrays in PHP is easy thanks to array_filter().  Let’s say you wanted to filter the following array of integers for any numbers less than 5:

$array = array(1, 11, 56, 3, 3, 633, 34, 2, 18, 9);

A classic way to approach this task would be to use a loop to examine each element and selectively generate a filtered array:

// Start with an empty result, then keep only the numbers below 5.
$filtered_array = array();
foreach( $array as $num ) {
    if( $num < 5 ) {
        $filtered_array[] = $num;
    }
}

However, there’s an easier way to accomplish this. Let’s define a function that checks a number for our filter condition (which, by the way, makes it easier to change/reuse later on):

function filter( $item ) {
    return $item < 5;
}

And now, generating the filtered integer array is a simple one-line task:

$filtered_array = array_filter($array, 'filter');

The resulting array looks like this:

Array
(
    [0] => 1
    [3] => 3
    [4] => 3
    [7] => 2
)

6. base_convert()

This function converts a string representation of a number in one format (radix) to a string representation of that same number in another format, up to and including base 36.  The most common use for this is to convert between decimal and hexadecimal, but there are many other potential uses depending on the scope of your application.

$numBase10 = 4567;

$numBase16 = base_convert($numBase10, 10, 16);
echo $numBase16;

$numBase2 = base_convert($numBase16, 16, 2);
echo $numBase2;

Results:

11d7
1000111010111

5. uniqid()

The use of unique identifiers is becoming increasingly widespread.  PHP’s on top of things and provides this function to generate them for your applications.  Keep in mind that uniqid() builds its result from the current time in microseconds, so the values are not true GUIDs and shouldn’t be treated as random or secure.  The two optional arguments allow you to specify a prefix for the identifier and whether or not to add more entropy to the end.

echo uniqid();
echo uniqid('prefix');
echo uniqid('', true);
echo uniqid('prefix', true);

Results:

48c88453d9702
prefix48c88453d970e
48c88453d97100.75111063
prefix48c88453d97130.65378524

4. levenshtein()

The Levenshtein distance between two strings is the minimum number of characters in one string that must be altered, added, or removed in order to end up with the second string.  For example, take the words cat and car.  The Levenshtein distance between the two words is 1, since to go from cat to car, the only change required is to replace the “t” with an “r”.

The levenshtein() PHP function finds this distance.  Pass it two strings, and it returns the minimum distance, or number of required changes to go from the first string argument to the second one.  Here are a few examples:

$lev_dist = levenshtein('tothbrush', 'toothbrush');
echo $lev_dist;

$lev_dist = levenshtein('car', 'truck');
echo $lev_dist;

$lev_dist = levenshtein('undergraduate', 'graduate');
echo $lev_dist;

Results:

1
5
5

This can be particularly useful for catching and correcting spelling errors in user input, especially when dealing with search terms. I’d be interested to see other clever uses for this that people come up with. Note that there is a 255 character limit imposed on the strings used as arguments to the function, but several variations of this function without the imposed limit have been posted on PHP.net.
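
If you’re curious how the distance is actually computed, the usual approach is a small dynamic-programming table. Below is a rough sketch of that idea in C rather than PHP; it is not how PHP implements levenshtein() internally, and the name lev_distance() is just something I made up for this example. It keeps a single row of the table in memory and assumes every edit costs 1:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns the Levenshtein distance between a and b,
   or -1 if memory for the working row could not be allocated. */
int lev_distance(const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b), i, j;
    int *row = malloc((lb + 1) * sizeof *row);
    int result;

    if (row == NULL) {
        return -1;
    }
    /* Distance from the empty prefix of a to each prefix of b. */
    for (j = 0; j <= lb; ++j) {
        row[j] = (int)j;
    }
    for (i = 1; i <= la; ++i) {
        int diag = row[0];              /* previous row, previous column */
        row[0] = (int)i;
        for (j = 1; j <= lb; ++j) {
            int up = row[j];            /* previous row, same column */
            int best = diag + (a[i - 1] == b[j - 1] ? 0 : 1);  /* match or replace */
            if (up + 1 < best) {
                best = up + 1;          /* remove a character from a */
            }
            if (row[j - 1] + 1 < best) {
                best = row[j - 1] + 1;  /* add a character to a */
            }
            row[j] = best;
            diag = up;
        }
    }
    result = row[lb];
    free(row);
    return result;
}
```

Each cell only depends on its left, upper, and upper-left neighbours, which is why one row plus a saved diagonal value is enough.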

3. pack()

Pack() works similarly to sprintf(), but rather than returning an ordinary string, pack() returns a binary string. This can be incredibly useful when writing custom serialization routines or when sending raw, protocol-specific data to a host.

As an example, let’s construct a data packet consisting of two short integers, followed by a character:

$sint_1 = 42;
$sint_2 = 11;
$my_char = 'x';

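// 'n' packs an unsigned 16-bit integer in big-endian byte order;
// 'C' packs an unsigned char and expects an integer, hence ord().
$bin_str = pack('nnC', $sint_1, $sint_2, ord($my_char));

The resulting binary string will contain the following bytes:

0x00 0x2A 0x00 0x0B 0x78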

The list of acceptable format characters includes individual characters for specifying byte order in integers as well, another reason why this can become handy when dealing with network applications.
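
To make the byte layout a little more concrete, here is a small C sketch that produces the same five bytes by hand. The function name pack_nnC() is made up for this example (it simply mirrors the 'nnC' format string), and the caller is assumed to supply a buffer of at least five bytes:

```c
#include <stddef.h>

/* Hypothetical helper mirroring the 'nnC' format: two 16-bit values in
   big-endian (network) byte order, followed by one unsigned byte.
   `out` must have room for at least 5 bytes. Returns the bytes written. */
size_t pack_nnC(unsigned char *out, unsigned int a, unsigned int b,
                unsigned char c)
{
    out[0] = (unsigned char)((a >> 8) & 0xFF);  /* high byte of a */
    out[1] = (unsigned char)(a & 0xFF);         /* low byte of a  */
    out[2] = (unsigned char)((b >> 8) & 0xFF);  /* high byte of b */
    out[3] = (unsigned char)(b & 0xFF);         /* low byte of b  */
    out[4] = c;                                 /* single raw byte */
    return 5;
}
```

Calling pack_nnC(buf, 42, 11, 'x') fills buf with 0x00 0x2A 0x00 0x0B 0x78: 42 is 0x2A, 11 is 0x0B, and the ASCII code for “x” is 0x78.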

2. soundex()

The soundex key of a string provides information about how that string is pronounced.  The function will generate a soundex key for a word, which then can be used when searching for similar sounding words.  This is especially useful when searching a database.

echo soundex('stake');
echo soundex('steak');
echo soundex('milk');
echo soundex('mill');

Results:

S320
S320
M420
M400

If you’re only dealing with words in the English language, you may prefer to use metaphone() instead, which uses a better understanding of English word pronunciation and is thus more accurate.

1.  date_sunset()

Given a specified day (and, optionally, a location as latitude and longitude), this function will return the time at which the sun will set.

The first thing that came to mind when I heard about this function was the ability to swap stylesheets based on the time of day at which a visitor visits a website. For example, this function could be used to create a simple system that displays a daytime theme to the visitor if the sun is still up in their area, and a nighttime theme once the sun has set.

echo "The sun sets at " . date_sunset(time(), SUNFUNCS_RET_STRING);

Other potential uses for this function include disabling various website functionality based on the time of the sunset (for whatever reason) and tracking daytime versus nighttime visitors.

There are obviously plenty of other interesting PHP functions out there. If you know of any other unique, noteworthy built-in PHP functions, I’d love to hear about them.

malloc() Causes a Segfault at _malloc_unlocked

I ran into a strange bug this week in the code for the C project I’ve been working on.  Seemingly at random, I was encountering a segfault in a call to malloc() while allocating memory for a new struct.  The problem had me completely baffled, and web searches turned up no useful information, since most people encounter this problem when dealing with multithreading, which I was not.

I fired up gdb and ran a backtrace from the fault:

#1  0x08055e4a in _malloc_unlocked () at src/file.c:263

This didn’t help much, but my guess was that I had corrupted the heap somehow somewhere in the code before the allocation. After a bit of careful code browsing, I found the culprit:

memcpy(ptr, &obj->data.data[19], obj->data.length);
ptr += obj->data.length;

A simple memory bounds error. My intention here was to copy all of the data from position 19 through the end of the data.data buffer into the buffer pointed at by ptr.  But I had left off the 19-byte adjustment from the size argument to memcpy and the subsequent pointer incrementation.  Correcting the bounds fixed the problem and the program went on its merry way:

memcpy(ptr, &obj->data.data[19], obj->data.length - 19);
ptr += obj->data.length - 19;

The problem with these kinds of errors is that they often don’t reveal themselves until later on in the execution of the program, in an area of code that has nothing to do with the actual problem.  The code above is used in a loop and executes successfully for a while, until malloc() tries to deal with an area of memory that was accidentally written over by memcpy(). At that point, bad things happen.
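
One way to make this kind of slip harder to repeat is to wrap the copy in a helper that derives the length from the offset and refuses to overrun the destination. Here’s a minimal sketch of that idea; copy_tail() and its parameters are names I’m inventing for illustration and are not code from the project:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies src[offset .. src_len) into dst.
   Returns the number of bytes copied, or 0 if the offset lies past the
   end of src or the tail would not fit in dst_cap bytes. */
size_t copy_tail(unsigned char *dst, size_t dst_cap,
                 const unsigned char *src, size_t src_len, size_t offset)
{
    size_t n;

    if (dst == NULL || src == NULL || offset > src_len) {
        return 0;
    }
    n = src_len - offset;   /* the adjustment that was missing in the bug */
    if (n > dst_cap) {
        return 0;           /* would write past the end of dst */
    }
    memcpy(dst, src + offset, n);
    return n;
}
```

The caller can then advance its pointer by the returned count, so the offset arithmetic lives in exactly one place. Note that a return value of 0 is ambiguous between “nothing to copy” and “refused to copy”, which is fine for a sketch but might deserve a separate error code in real code.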

Be careful with memcpy().

Object-Oriented C

Although C is regarded primarily as a procedural language, it is entirely possible to write C code structured in a way similar to code written in object-oriented languages such as C++.

Now, of course, you could go all out and write truly object-oriented C, complete with inheritance, type checking, and the like. But that’s not what we’re going to be doing here.  Instead of recreating the complete functionality of object-orientation, we’re going to look at how to write pseudo-object-oriented code in C. The key is that the code itself is still procedural, but organized in a way such that it can be used in an OO fashion.  The technique itself is very simple, and when used properly it can make code management much easier.

The first thing to address is data encapsulation.  How do we define a new data type so that the rest of the program is able to use it without knowing about its internal structure?

Doing this is rather easy.  In our header file we tell the compiler that the structure will be defined elsewhere by simply declaring the struct without defining it:

struct String;
typedef struct String String;    /* typedef'd for convenience */

Then, in our source file, we define the actual structure:

struct String {
    unsigned char *str;
    unsigned int len;
};

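Now, whenever we include our header file (mystring.h in the full example below) in our program, we have access to the String type; that is, we can declare pointers to String, but the internal data of the struct is hidden from us. We can’t declare String variables directly, since the compiler doesn’t know the size of the struct outside of the source file that defines it. Voila – encapsulation!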

The next step is to distinguish the scope of our type’s methods. This is equally simple, and we’ll start by establishing a few naming conventions that will allow us to simulate the scope of functions related to our data type.

For public methods, we’ll prefix the function with the name of the type and an underscore.  For example, if we wanted to create a public method for String called append(), then the corresponding function would be String_append().

For private methods, we’ll prefix the function name with only an underscore.  For example, if we wanted to add a private method to String called resize(), the corresponding function would be _resize().

These conventions help us to visually distinguish which methods should be called by other parts of the program and which ones should be limited for use by only the module containing the data type.

But let’s not rely on these conventions alone.  Where we place our function prototypes is just as important as how we name them. Since we want to make our public methods available to other parts of the program, we place their prototypes in the header file for our module.  This grants access to these functions to any file that includes our header, just like we did with the structure.

Our private methods, however, are declared in the source file as static methods.  This ensures that only other functions within the module will be able to access them.

Let’s create a data type called String to illustrate how the technique works.  We’ll start by defining our header file, mystring.h:

#ifndef STRING_H
#define STRING_H

/* declare the struct (but don't define it!) */
struct String;
typedef struct String String;

/* declare some public methods */
String* String_new( const char *init );
void String_delete( String *str );
void String_append( String *str, const char *other );

#endif

Since the code we’re writing isn’t truly object-oriented (and we’re not messing around with all sorts of function pointers), we need a way for the functions to know which object they are acting upon. For this reason, we pass a pointer to the object as the first argument of each function. In an actual object-oriented language, a method call would look like this:

obj.method(arg1, arg2, ...);

In our pseudo-object-oriented code, the method call looks like this:

method(obj, arg1, arg2, ...);
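
For contrast, here is roughly what the function-pointer route mentioned above could look like. This is only a sketch to show why we’re skipping it; the Accumulator type and its functions are hypothetical and have nothing to do with the String module:

```c
/* Hypothetical type for illustration only. The "method" is stored inside
   each object as a function pointer. */
typedef struct Accumulator Accumulator;

struct Accumulator {
    int total;
    void (*add)( Accumulator *self, int amount );
};

/* The actual implementation behind the add "method". */
static void Accumulator_addImpl( Accumulator *self, int amount ) {
    self->total += amount;
}

/* Plays the role of a constructor: sets the data and wires up the method. */
void Accumulator_init( Accumulator *acc ) {
    acc->total = 0;
    acc->add = Accumulator_addImpl;
}
```

A call then reads acc.add(&acc, 5), which looks a little more like obj.method(), but the object still has to be passed explicitly, the struct can no longer be hidden from callers, and every object carries an extra pointer per method. For simple modules, the plain function-call style is less work.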

Now let’s move on to our source file, mystring.c, where we will define the struct, declare our private methods, and define both our public and private methods.

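#include <stdlib.h>
/* string.h is included for strlen() and memcpy(). */
#include <string.h>
#include "mystring.h"

/* define the struct */
struct String {
    unsigned char *str;    /* buffer of len + 1 bytes, NUL-terminated */
    unsigned int len;      /* length, not counting the terminator */
};

/* declare private methods */
static int _resize( String *str, const unsigned int newSize );

/* define private methods */

/* Resizes the buffer to hold newSize characters plus a terminator.
   Returns 1 on success, 0 if the allocation failed (in which case the
   string is left unchanged). */
static int _resize( String *str, const unsigned int newSize ) {
    unsigned char *tmp;
    if( newSize == str->len ) {
        return 1;
    }
    tmp = realloc(str->str, newSize + 1);
    if( tmp == NULL ) {
        return 0;
    }
    str->str = tmp;
    str->len = newSize;
    str->str[newSize] = '\0';
    return 1;
}

/* define public methods */
String* String_new( const char *init ) {
    String *retval = malloc(sizeof(String));
    if( retval == NULL ) {
        return NULL;
    }
    retval->len = (unsigned int)strlen(init);
    retval->str = malloc(retval->len + 1);
    if( retval->str == NULL ) {
        free(retval);
        return NULL;
    }
    /* copy the characters along with the terminating NUL */
    memcpy(retval->str, init, retval->len + 1);
    return retval;
}

void String_delete( String* str ) {
    if( str != NULL ) {
        free(str->str);
        free(str);
    }
}

void String_append( String* str, const char *other ) {
    unsigned int oldLen = str->len;
    unsigned int otherLen = (unsigned int)strlen(other);
    /* grow to the combined length, then copy other in after the old data */
    if( otherLen > 0 && _resize(str, oldLen + otherLen) ) {
        memcpy(str->str + oldLen, other, otherLen);
    }
}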

That’s essentially all there is to it. Other C modules in the program will be able to declare and create String objects, but will not have access to their internal variables and will only be allowed to call the public methods declared in the header file.
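
One practical consequence is worth spelling out: because the struct is opaque, callers can only read an object’s state through accessor functions that the module chooses to provide. Here’s a compact, single-file sketch of the whole pattern applied to a different type. The Counter type and all of its functions are hypothetical, and in a real project the first part would live in a header and the second in a source file:

```c
#include <stdlib.h>

/* --- what would go in a hypothetical counter.h --- */
typedef struct Counter Counter;

Counter* Counter_new( int start );
void Counter_delete( Counter *c );
void Counter_increment( Counter *c );
int Counter_value( const Counter *c );    /* accessor for hidden state */

/* --- what would go in a hypothetical counter.c --- */
struct Counter {
    int value;
};

/* private method: keeps values from going below zero */
static int _clamp( int v ) {
    return v < 0 ? 0 : v;
}

Counter* Counter_new( int start ) {
    Counter *c = malloc(sizeof(Counter));
    if( c != NULL ) {
        c->value = _clamp(start);
    }
    return c;
}

void Counter_delete( Counter *c ) {
    free(c);
}

void Counter_increment( Counter *c ) {
    c->value = _clamp(c->value + 1);
}

int Counter_value( const Counter *c ) {
    return c->value;
}
```

Code in other files would create a counter with Counter_new(), pass the pointer to Counter_increment(), and read the result with Counter_value(), all without ever seeing the value member or the _clamp() helper.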

On a final note, if you consider yourself a proficient C programmer, I highly recommend checking out the book I linked to at the beginning of this article.  It’s an excellent read and gives a truly insightful look into the inner workings of many of the object-oriented language constructs we’ve come to rely on.