10 Useful PHP Functions You May Not Have Known About

10. get_browser()

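This function comes in handy when collecting information about a user’s browser to determine the content capabilities for that user.  It returns an object (or an associative array, if you pass true as the second argument) containing information such as the browser type, version, whether or not it is capable of handling frames, cookies, and JavaScript, and the platform on which the browser is running.  Note that it relies on a browscap.ini file, so the browscap setting in php.ini must point to one before the function will work.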

This kind of information could prove to be invaluable when tracking users, logging site usage, or building a statistics engine for your site.

9. parse_url()

Ever needed to split up a URL into its individual components to extract data from it?  Look no further than parse_url().  This function does all the dirty work of determining what is what in the URL and conveniently presents it to you in an associative array.  The resulting array will contain one or more elements of the URL. Take the following URL for example:

http://www.domain.com/controller.php?var=value&another=value2#anchor

Passing this URL string to parse_url() would generate an array like the following:

Array
(
    [scheme] => http
    [host] => www.domain.com
    [path] => /controller.php
    [query] => var=value&another=value2
    [fragment] => anchor
)

Don’t worry if the URL isn’t perfectly formed: parse_url() will do its best to extract whatever information it can from the string passed to it, and returns false only for seriously malformed URLs.

8. get_meta_tags()

This function does exactly what it says: it reads an HTML file (given as a local path or a URL) and extracts the meta tags from it.  The results are returned in an associative array with the name attributes as the keys and the content attributes as the values.

This could be used to extract meta information such as keywords and descriptions from pages in any sort of PHP-based web crawler.

7. array_filter()

Filtering arrays in PHP is easy thanks to array_filter().  Let’s say you wanted to filter the following array of integers for any numbers less than 5:

$array = array(1, 11, 56, 3, 3, 633, 34, 2, 18, 9);

A classic way to approach this task would be to use a loop to examine each element and selectively generate a filtered array:

$filtered_array = array();
foreach( $array as $num ) {
    // keep only the numbers that pass the filter condition
    if( $num < 5 ) {
        $filtered_array[] = $num;
    }
}

However, there’s an easier way to accomplish this. Let’s define a function that checks a number for our filter condition (which, by the way, makes it easier to change/reuse later on):

function filter( $item ) {
    return $item < 5;
}

And now, generating the filtered integer array is a simple one-line task:

$filtered_array = array_filter($array, 'filter');

The resulting array looks like this (note that array_filter() preserves the original keys):

Array
(
    [0] => 1
    [3] => 3
    [4] => 3
    [7] => 2
)

6. base_convert()

This function converts a string representation of a number in one format (radix) to a string representation of that same number in another format, up to and including base 36.  The most common use for this is to convert between decimal and hexadecimal, but there are many other potential uses depending on the scope of your application.

$numBase10 = 4567;

$numBase16 = base_convert($numBase10, 10, 16);
echo $numBase16;

$numBase2 = base_convert($numBase16, 16, 2);
echo $numBase2;

Results:

11d7
1000111010111

5. uniqid()

The use of unique identifiers is becoming increasingly widespread.  PHP’s on top of things and provides this function to generate them for your applications, based on the current time in microseconds.  The two optional arguments allow you to specify a prefix for the identifier and whether or not to add more entropy to the end.  Keep in mind that the result is not a true GUID: it is neither guaranteed to be unique nor suitable for security-sensitive values.

echo uniqid();
echo uniqid('prefix');
echo uniqid('', true);
echo uniqid('prefix', true);

Results:

48c88453d9702
prefix48c88453d970e
48c88453d97100.75111063
prefix48c88453d97130.65378524

4. levenshtein()

The Levenshtein distance between two strings is the minimum number of characters in one string that must be altered, added, or removed in order to end up with the second string.  For example, take the words cat and car.  The Levenshtein distance between the two words is 1, since to go from cat to car, the only change required is to replace the “t” with an “r”.

The levenshtein() PHP function finds this distance.  Pass it two strings, and it returns the minimum distance, or number of required changes to go from the first string argument to the second one.  Here are a few examples:

$lev_dist = levenshtein('tothbrush', 'toothbrush');
echo $lev_dist;

$lev_dist = levenshtein('car', 'truck');
echo $lev_dist;

$lev_dist = levenshtein('undergraduate', 'graduate');
echo $lev_dist;

Results:

1
5
5

This can be particularly useful for catching and correcting spelling errors in user input, especially when dealing with search terms. I’d be interested to see other clever uses for this that people come up with. Note that versions of PHP before 8.0 impose a 255-character limit on the strings used as arguments to the function (it returns -1 for longer ones), but several variations of this function without the imposed limit have been posted on PHP.net.
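For the curious, the distance is usually computed with dynamic programming. Here’s a rough sketch of the textbook two-row approach, written in C rather than PHP. lev_distance() is a made-up name for illustration, and this is not PHP’s actual implementation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns the Levenshtein distance between a and b, or -1 if memory
   cannot be allocated. Only two rows of the table are kept at a time. */
static int lev_distance( const char *a, const char *b ) {
    size_t la, lb, i, j;
    int *prev, *curr, *swap, result;

    assert(a != NULL && b != NULL);
    la = strlen(a);
    lb = strlen(b);
    prev = malloc((lb + 1) * sizeof *prev);
    curr = malloc((lb + 1) * sizeof *curr);
    if( prev == NULL || curr == NULL ) {
        free(prev);
        free(curr);
        return -1;
    }
    for( j = 0; j <= lb; ++j ) {
        prev[j] = (int)j;       /* empty string -> first j chars of b */
    }
    for( i = 1; i <= la; ++i ) {
        curr[0] = (int)i;       /* first i chars of a -> empty string */
        for( j = 1; j <= lb; ++j ) {
            int sub = prev[j - 1] + (a[i - 1] != b[j - 1]);
            int del = prev[j] + 1;
            int ins = curr[j - 1] + 1;
            int best = sub < del ? sub : del;
            curr[j] = best < ins ? best : ins;
        }
        swap = prev;            /* the finished row becomes "previous" */
        prev = curr;
        curr = swap;
    }
    result = prev[lb];
    free(prev);
    free(curr);
    return result;
}
```

Each cell holds the cheapest way to turn a prefix of the first string into a prefix of the second, so the last cell of the last row is the answer.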

3. pack()

Pack() works similarly to sprintf(), but rather than returning an ordinary string, pack() returns a binary string. This can be incredibly useful when writing custom serialization routines or when sending raw, protocol-specific data to a host.

As an example, let’s construct a data packet consisting of two short integers, followed by a character:

$sint_1 = 42;
$sint_2 = 11;
$my_char = 'x';

// 'C' expects an integer, so convert the character to its byte value
$bin_str = pack('nnC', $sint_1, $sint_2, ord($my_char));

The resulting binary string will contain the following bytes:

0x00 0x2A 0x00 0x0B 0x78

The list of acceptable format characters includes individual characters for specifying byte order in integers as well, another reason why this can become handy when dealing with network applications.
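To make the byte order concrete, here’s a rough C sketch of the layout that the 'nnC' format describes: two 16-bit integers in big-endian (network) order, followed by a single byte. pack_nnC() is a hypothetical helper written for this illustration, not something PHP or the C library provides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: writes a and b as big-endian 16-bit values,
   then c as one byte. buf must have room for 5 bytes.
   Returns the number of bytes written. */
static size_t pack_nnC( uint8_t *buf, uint16_t a, uint16_t b, uint8_t c ) {
    assert(buf != NULL);
    buf[0] = (uint8_t)(a >> 8);     /* high byte first: network order */
    buf[1] = (uint8_t)(a & 0xFF);
    buf[2] = (uint8_t)(b >> 8);
    buf[3] = (uint8_t)(b & 0xFF);
    buf[4] = c;
    return 5;
}
```

Writing the bytes out one at a time like this gives the same result on any machine, regardless of its native byte order.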

2. soundex()

The soundex key of a string provides information about how that string is pronounced.  The function will generate a soundex key for a word, which then can be used when searching for similar sounding words.  This is especially useful when searching a database.

echo soundex('stake');
echo soundex('steak');
echo soundex('milk');
echo soundex('mill');

Results:

S320
S320
M420
M400

If you’re only dealing with words in the English language, you may prefer to use metaphone() instead, which uses a better understanding of English word pronunciation and is thus more accurate.

1.  date_sunset()

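Given a specified day (and, optionally, a location given as latitude and longitude), this function will return the time at which the sun will set.  Note that date_sunset() is deprecated as of PHP 8.1 in favor of date_sun_info().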

The first thing that came to mind when I heard about this function was the ability to swap stylesheets based on the time of day at which a visitor visits a website. For example, this function could be used to create a simple system that displays a daytime theme to the visitor if the sun is still up in their area, and a nighttime theme once the sun has set.

echo "The sun sets at " . date_sunset(time(), SUNFUNCS_RET_STRING);

Other potential uses for this function include disabling various website functionality based on the time of the sunset (for whatever reason) and tracking daytime versus nighttime visitors.

There are obviously plenty of other interesting PHP functions out there. If you know of any other unique, noteworthy built-in PHP functions, I’d love to hear about them.

malloc() Causes a Segfault at _malloc_unlocked

I ran into a strange bug this week in the code for the C project I’ve been working on.  Seemingly at random, I was encountering a segfault in a call to malloc() while allocating memory for a new struct.  The problem had me completely baffled, and web searches turned up no useful information, since most people encounter this problem when dealing with multithreading, which I was not.

I fired up gdb and ran a backtrace from the fault:

#1  0x08055e4a in _malloc_unlocked () at src/file.c:263

This didn’t help much, but my guess was that I had corrupted the heap somehow somewhere in the code before the allocation. After a bit of careful code browsing, I found the culprit:

memcpy(ptr, &obj->data.data[19], obj->data.length);
ptr += obj->data.length;

A simple memory bounds error. My intention here was to copy all of the data from position 19 through the end of the data.data buffer into the buffer pointed at by ptr.  But I had forgotten to subtract those 19 bytes from the size argument to memcpy() and from the subsequent pointer increment, so each call read and wrote 19 bytes too many.  Correcting the bounds fixed the problem and the program went on its merry way:

memcpy(ptr, &obj->data.data[19], obj->data.length - 19);
ptr += obj->data.length - 19;

The problem with these kinds of errors is that they often don’t reveal themselves until later in the program’s execution, in an area of code that has nothing to do with the actual problem.  The code above runs in a loop and executes successfully for a while, until malloc() tries to use an area of memory that memcpy() accidentally wrote over. At that point, bad things happen.
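One way to make this kind of mistake harder to repeat is to wrap the copy in a helper that does the offset arithmetic and the bounds checks in one place. Here’s a minimal sketch of that idea. copy_tail() and its parameters are names invented for illustration, and it assumes you know how much room is left in the destination buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies the bytes of src from `offset` through the
   end into dst. Copies nothing if the offset is past the end of src or
   if dst is too small. Returns the number of bytes copied. */
static size_t copy_tail( unsigned char *dst, size_t dst_size,
                         const unsigned char *src, size_t src_len,
                         size_t offset ) {
    size_t count;

    assert(dst != NULL && src != NULL);
    if( offset > src_len ) {
        return 0;               /* nothing lies past the end of src */
    }
    count = src_len - offset;   /* the adjustment that is easy to forget */
    if( count > dst_size ) {
        return 0;               /* copying would overflow dst */
    }
    memcpy(dst, src + offset, count);
    return count;
}
```

Because the helper returns the number of bytes it copied, the caller can advance its pointer by the return value instead of repeating the subtraction. A caller that needs to tell "copied zero bytes" apart from "refused to copy" would want a separate status code.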

Be careful with memcpy().

Object-Oriented C

Although C is regarded primarily as a procedural language, it is entirely possible to write C code structured in a way similar to code written in object-oriented languages such as C++.

Now, of course, you could go all out and write truly object-oriented C, complete with inheritance, type checking, and the like. But that’s not what we’re going to be doing here.  Instead of recreating the complete functionality of object-orientation, we’re going to look at how to write pseudo-object-oriented code in C. The key is that the code itself is still procedural, but organized in such a way that it can be used in an OO fashion.  The technique itself is very simple, and when used properly it can make code management much easier.

The first thing to address is data encapsulation.  How do we define a new data type so that the rest of the program is able to use it without knowing about its internal structure?

Doing this is rather easy.  In our header file we tell the compiler that the structure will be defined elsewhere by simply declaring the struct without defining it:

struct String;
typedef struct String String;    /* typedef'd for convenience */

Then, in our source file, we define the actual structure:

struct String {
    unsigned char *str;
    unsigned int len;
};

Now, whenever we include our header in a program, we have access to the String type. Because the type is incomplete outside of the source file, we can declare pointers to String (but not String variables themselves), and the internal data of the struct is hidden from us. Voila – encapsulation!
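To see the effect in a single file, here’s a small sketch that puts some client code above the struct definition, where only the incomplete type is visible. Point, Point_new(), Point_x(), Point_delete(), and client_read_x() are illustrative names and are not part of the String module.

```c
#include <assert.h>
#include <stdlib.h>

/* --- what a header would expose: an incomplete (opaque) type --- */
struct Point;
typedef struct Point Point;

Point *Point_new( int x, int y );
int Point_x( const Point *p );
void Point_delete( Point *p );

/* --- client code: sees only the declarations above --- */
static int client_read_x( const Point *p ) {
    /* p->x would not compile here, because the members are unknown.
       Likewise, `Point local;` is rejected: the size is unknown. */
    return Point_x(p);
}

/* --- what the source file would contain: the real definition --- */
struct Point {
    int x;
    int y;
};

Point *Point_new( int x, int y ) {
    Point *p = malloc(sizeof *p);
    if( p != NULL ) {
        p->x = x;
        p->y = y;
    }
    return p;
}

int Point_x( const Point *p ) {
    assert(p != NULL);
    return p->x;
}

void Point_delete( Point *p ) {
    free(p);
}
```

The client can pass the pointer around and hand it back to the module’s functions, but anything that needs the members has to go through an accessor such as Point_x().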

The next step is to distinguish the scope of our type’s methods. This is equally simple, and we’ll start by establishing a few naming conventions that will allow us to simulate the scope of functions related to our data type.

For public methods, we’ll prefix the function with the name of the type and an underscore.  For example, if we wanted to create a public method for String called append(), then the corresponding function would be String_append().

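For private methods, we’ll prefix the function name with only an underscore.  For example, if we wanted to add a private method to String called resize(), the corresponding function would be _resize().  (Strictly speaking, the C standard reserves file-scope identifiers that begin with an underscore, so if you’re worried about clashing with a compiler or library name, a trailing underscore works just as well.)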

These conventions help us to visually distinguish which methods should be called by other parts of the program and which ones should be limited for use by only the module containing the data type.

But let’s not rely on these conventions alone.  Where we place our function prototypes is just as important as how we name them. Since we want to make our public methods available to other parts of the program, we place their prototypes in the header file for our module.  This grants access to these functions to any file that includes our header, just like we did with the structure.

Our private methods, however, are declared in the source file as static functions.  This gives them internal linkage, which ensures that only other functions within the same source file will be able to call them.

Let’s create a data type called String to illustrate how the technique works.  We’ll start by defining our header file, mystring.h:

#ifndef STRING_H
#define STRING_H

/* declare the struct (but don't define it!) */
struct String;
typedef struct String String;

/* declare some public methods */
String* String_new( const char *init );
void String_delete( String *str );
void String_append( String *str, const char *other );

#endif

Since the code we’re writing isn’t truly object-oriented (and we’re not messing around with all sorts of function pointers), we need a way for the functions to know which object they are acting upon. For this reason, we pass a pointer to the object as the first argument of each function. In an actual object-oriented language, a method call would look like this:

obj.method(arg1, arg2, ...);

In our pseudo-object-oriented code, the method call looks like this:

method(obj, arg1, arg2, ...);

Now let’s move on to our source file, mystring.c, where we will define the struct, declare our private methods, and define both our public and private methods.

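#include <stdlib.h>
/* string.h is included for strlen() and memcpy() */
#include <string.h>
#include "mystring.h"

/* define the struct */
struct String {
    unsigned char *str;    /* holds len characters plus a terminating NUL */
    unsigned int len;
};

/* declare private methods */
static int _resize( String *str, const unsigned int newSize );

/* define private methods */
/* resize the buffer to hold newSize characters plus a NUL; returns 0 on
   success, or -1 if realloc() fails (the object is left unchanged) */
static int _resize( String *str, const unsigned int newSize ) {
    if( newSize != str->len ) {
        unsigned char *tmp = realloc(str->str, newSize + 1);
        if( tmp == NULL ) {
            return -1;
        }
        str->str = tmp;
        str->len = newSize;
    }
    return 0;
}

/* define public methods */
String* String_new( const char *init ) {
    String *retval = malloc(sizeof(String));
    if( retval == NULL ) {
        return NULL;
    }
    retval->len = (unsigned int)strlen(init);
    retval->str = malloc(retval->len + 1);
    if( retval->str == NULL ) {
        free(retval);
        return NULL;
    }
    /* copy the characters along with the terminating NUL */
    memcpy(retval->str, init, retval->len + 1);
    return retval;
}

void String_delete( String* str ) {
    if( str != NULL ) {
        free(str->str);
        free(str);
    }
}

void String_append( String* str, const char *other ) {
    unsigned int oldLen = str->len;
    unsigned int otherLen = (unsigned int)strlen(other);
    if( otherLen == 0 || _resize(str, oldLen + otherLen) != 0 ) {
        return;    /* nothing to append, or out of memory */
    }
    /* copy other, including its NUL, after the existing characters */
    memcpy(str->str + oldLen, other, otherLen + 1);
}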

That’s essentially all there is to it. Other C modules in the program will be able to create String objects through String_new() and hold pointers to them, but will not have access to their internal variables and will only be allowed to call the public methods declared in the header file.
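The same pattern carries over to other types. As a rough sketch, here it is applied to a hypothetical IntStack type, collapsed into one file so that it stands on its own. In a real project the first part would live in a header and the rest in a source file. All of the names are made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* --- header part: opaque type and public methods --- */
struct IntStack;
typedef struct IntStack IntStack;

IntStack *IntStack_new( void );
void IntStack_delete( IntStack *s );
int IntStack_push( IntStack *s, int value );   /* 0 on success, -1 on failure */
int IntStack_pop( IntStack *s, int *out );     /* 0 on success, -1 if empty */
unsigned int IntStack_size( const IntStack *s );

/* --- source part: definition, private method, public methods --- */
struct IntStack {
    int *items;
    unsigned int count;
    unsigned int capacity;
};

/* private: doubles the capacity; returns 0 on success, -1 on failure */
static int _grow( IntStack *s ) {
    unsigned int newCap = s->capacity ? s->capacity * 2 : 4;
    int *tmp = realloc(s->items, newCap * sizeof *s->items);
    if( tmp == NULL ) {
        return -1;
    }
    s->items = tmp;
    s->capacity = newCap;
    return 0;
}

IntStack *IntStack_new( void ) {
    IntStack *s = malloc(sizeof *s);
    if( s != NULL ) {
        s->items = NULL;
        s->count = 0;
        s->capacity = 0;
    }
    return s;
}

void IntStack_delete( IntStack *s ) {
    if( s != NULL ) {
        free(s->items);
        free(s);
    }
}

int IntStack_push( IntStack *s, int value ) {
    assert(s != NULL);
    if( s->count == s->capacity && _grow(s) != 0 ) {
        return -1;
    }
    s->items[s->count++] = value;
    return 0;
}

int IntStack_pop( IntStack *s, int *out ) {
    assert(s != NULL && out != NULL);
    if( s->count == 0 ) {
        return -1;
    }
    *out = s->items[--s->count];
    return 0;
}

unsigned int IntStack_size( const IntStack *s ) {
    assert(s != NULL);
    return s->count;
}
```

Client code only ever touches the IntStack_ functions, so the storage could later be swapped for a linked list without changing any callers.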

On a final note, if you consider yourself a proficient C programmer, I highly recommend checking out the book I linked to at the beginning of this article.  It’s an excellent read and gives a truly insightful look into the inner workings of many of the object-oriented language constructs we’ve come to rely on.