Archive for the ‘Tutorial’ Category

Picking a first programming language can be a daunting task for a novice. Many simply have no clear idea where to start, and don’t know the merits and drawbacks of the various choices they have. Even among experienced programmers, the debate over what new programmers should start with is long-running and has many different sides. In this article I will give my side.

My Perspective

Before I talk about which language people should learn first, I will discuss where I am coming from when I make that choice. The first language someone learns should, in my opinion, be fairly easy to develop with and give the programmer results fairly quickly. The main problem I see with fledgling programmers is that they get very frustrated with error messages, and in general struggle to connect the concepts they are working with to an actual useful program. I keep these things in mind when making my choice.

 

My Opinion

For new programmers, I highly recommend starting with a web development language, namely JavaScript or PHP, with PHP being my first recommendation. The reason is simple: PHP is a very easy language to program in. Because it is a scripting language, development goes much faster (there is no recompiling after changing a single line of code), and the language itself is much more forgiving. A new user doesn’t have to memorize many different syntactical rules, so they can focus on understanding the concepts and getting practice with solving problems. Also, PHP (and to an extent, JavaScript) comes right out of the box with many helper functions and algorithms that in other languages you would need to either import from a library or write yourself. This allows novice programmers to solve problems faster and more easily. However, this comes with a drawback. A programmer who gets used to PHP’s built-in functionality may find it difficult to migrate to languages and platforms that don’t provide it by default. On the other hand, while the programmer may become too reliant on these built-in functions, they do know to some extent how to solve a similar problem in another language. The new problem becomes finding out how the other language implements the functionality of the built-in PHP function, or how to write it themselves.
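
To give a rough idea of what "out of the box" means here, the sketch below uses a handful of functions that ship with PHP itself. The data and variable names are made up for illustration.

```php
<?php
// Illustrative data; the variable names are invented for this example.
$scores = [72, 95, 88, 61];

// Built-in array helpers: no imports or hand-written loops needed.
$total   = array_sum($scores); // add up every element
$highest = max($scores);       // largest element
sort($scores);                 // sorts the array in place, ascending

// Built-in string helpers.
$title = ucwords("my first php script");                // capitalise each word
$words = str_word_count("PHP ships with many helpers"); // count the words

echo "Total: $total, highest: $highest\n";
echo "Sorted: " . implode(", ", $scores) . "\n";
echo "Title: $title ($words words counted in the other sentence)\n";
```

In a language like C++, several of these would mean including a library header or writing a short loop by hand.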

The second reason I suggest PHP (or JavaScript), and web development in general, is that it makes putting the concepts they’ve learned into practice more rewarding. It is much easier to make an interactive GUI with PHP combined with HTML/CSS than with a programming language like C++, Java, or even Python. Because of this, it’s much easier to make a “cool” application even with only basic skills. Making a website where you can log in and see a welcome screen is extremely easy with PHP and HTML, but would be much more difficult for a novice in, say, C++ or Java.
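
As a small taste of how little code a welcome screen needs, here is a minimal sketch. The function name welcomeMessage is hypothetical, and a real page would take the name from a login form or session rather than a hard-coded string.

```php
<?php
// Hypothetical helper: builds the welcome markup for a logged-in user.
function welcomeMessage($username) {
    // htmlspecialchars() escapes characters such as < and > so that a
    // user-supplied name cannot inject markup into the page.
    $safeName = htmlspecialchars($username, ENT_QUOTES, 'UTF-8');
    return "<h2>Welcome back, $safeName!</h2>";
}

// In a real page the name would come from a login form or session.
echo welcomeMessage("Alice");
```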

 

Drawbacks

While learning PHP (or web development in general) first is not a bad idea, there are some drawbacks. One, which I mentioned earlier in this post, is that many novices who use PHP may become dependent on built-in functions that aren’t available in other languages without either importing a library or writing the code yourself. This can make transitioning to a more difficult language even harder. This type of problem isn’t unique to PHP either; it is common among people who start with most scripting languages and then transition to a more difficult, stricter language. For example, at my school the introductory programming class used Python, while the next class (Object Oriented Programming) is taught using C++. Many of the students who go from the first class to the second struggle early on with the unavailability of functionality that they got so used to using in Python.

Another issue is the bad habits that are easy to form with scripting languages. Since most scripting languages have no explicit typing, the concept of a variable with a type can be difficult to understand later, especially if you have been programming without it for a good amount of time.
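
To make the point about typing concrete, here is a short sketch of how PHP lets one variable change type, and how loose comparison hides the difference. The variable names are illustrative only.

```php
<?php
// The same variable can hold values of different types over its lifetime.
$value = 5;
$firstType = gettype($value);  // "integer"

$value = "5 apples";
$secondType = gettype($value); // "string"

// Loose comparison (==) converts types before comparing;
// strict comparison (===) also requires the types to match.
$looseMatch  = (5 == "5");
$strictMatch = (5 === "5");

var_dump($firstType, $secondType, $looseMatch, $strictMatch);
```

In a statically typed language such as C++, the second assignment would not compile, which is exactly the habit that has to be relearned.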

In general, since scripting languages are usually much easier than stricter programming languages, novice programmers are able to progress faster and without much frustration, but they can pick up bad habits, like expecting the language to provide certain functionality, or expecting the program to handle in the background many things that you have to set up yourself in stricter languages. The example that comes to mind first is garbage collection, which is available in most scripting languages, and even in some strict programming languages (the most notable being Java), but not in, say, C++. Scripting languages also hide much of their core machinery from the user. This can create programmers who know how to do something a certain way, but have no idea of the concept behind what allows them to accomplish their goal. This makes it much harder for these programmers to solve more complex problems that require knowledge of how programs and the underlying code actually work. For example, a novice may figure out how to stop a certain PHP notice from appearing by using the error suppression operator (the @ operator), but may have no clue why that operator makes the notice go away (it doesn’t actually fix the error; it only stops it from being reported).
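
The @ example can be shown in a few lines. This is a minimal sketch with a made-up $settings array: the suppressed read still produces nothing useful, while checking for the key deals with the actual cause.

```php
<?php
$settings = ['theme' => 'dark'];

// Reading a key that does not exist normally raises a notice (a warning
// in PHP 8). The @ operator hides the message, but the key is still
// missing and the expression still evaluates to null.
$suppressed = @$settings['language'];

// Addressing the cause: check for the key and supply a default.
$checked = isset($settings['language']) ? $settings['language'] : 'en';

var_dump($suppressed); // NULL
var_dump($checked);    // string(2) "en"
```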

However, these problems can be resolved by going from an easy scripting language to a more difficult, stricter programming language like C++ (in fact, I highly recommend starting with PHP/JavaScript, and even MySQL if you are adventurous, and then learning C++ in depth). This way, the programmer gets the benefits of learning a very strict language like C++, where you essentially have full control over your program and therefore have to know not only how to solve problems, but why certain code and concepts solve the problem. Also, the programmer won’t be nearly as frustrated since they already have a language under their belt. The learning curve of C++ is steep enough, but some of that difficulty can be mitigated if you already have practice with most or all of the basic programming concepts.

 

In short, for anyone who wishes to learn how to program and make applications, I suggest starting with web development, namely PHP (but JavaScript, ASP, or any other web based language would work), because development is faster and easier for a novice programmer, and seeing the results of your knowledge and making cool, useful applications is much easier, especially if you want to incorporate some sort of GUI elements.

As a side note, since not everyone is interested in web development, or has the means to purchase a server (although you can always download a software stack like LAMP or WAMP to turn your computer into a local web server), other scripting languages will work as well. If you are going to go this route, I personally suggest Python.

 

If you have any questions or comments, or disagree with what I say, leave a comment below! I would love to have a discussion with someone who has a differing opinion from mine.

Out of boredom I decided to create a very simple wrapper class for MySQL database management. This class, as I mentioned, is extremely simple. However, for those wishing to learn more about Object Oriented Programming, and how to implement a class that is useful, this is a good place to start. Anyone wishing to learn more about OOP could take this basic class and make it into something much more useful.

Note: This class is untested. I haven’t actually run the code, but I reviewed it to make sure any simple syntax/conceptual errors were removed.

If there is a syntax error or anything you see, leave a comment below so I can fix it!

Functionality

The class has very limited functionality. It basically provides normal MySQL functionality (like querying, fetching rows, and fetching all the rows of a result set) and has a built in sanitizing function.

Improvements

This class is by no means complete or even useful. However, there are many improvements that someone could add that would not only make this class fairly useful, but would also be great practice. Some obvious improvements could be as follows:

  • a function to build and execute a query based on function parameters (like the table, what to select, and where/order by statement(s))
  • an insert function that builds and executes a query based on parameters (the table to insert into, an array of columns, and an array of corresponding values)
  • stronger validation/sanitization
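
As a starting point for the insert idea, here is one possible sketch. The function buildInsertQuery is hypothetical and only builds the SQL text; it assumes the values would be bound separately through a prepared statement (available in mysqli or PDO), which is a different approach from the escaping function in the class itself.

```php
<?php
// Hypothetical helper: builds a parameterised INSERT statement.
// It returns SQL with ? placeholders; the values themselves would be
// bound separately through a prepared statement (mysqli or PDO).
function buildInsertQuery($table, array $columns) {
    if (count($columns) === 0) {
        throw new InvalidArgumentException("At least one column is required.");
    }

    // Wrap identifiers in backticks, doubling any embedded backtick.
    $quote = function ($identifier) {
        return '`' . str_replace('`', '``', $identifier) . '`';
    };

    $columnList   = implode(', ', array_map($quote, $columns));
    $placeholders = implode(', ', array_fill(0, count($columns), '?'));

    return 'INSERT INTO ' . $quote($table) . " ($columnList) VALUES ($placeholders)";
}

echo buildInsertQuery('users', ['name', 'email']), "\n";
```

Keeping the values out of the SQL string entirely is one way to approach the "stronger sanitization" item as well.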

There are many, many more things you could add to this class. A little competition for the readers: submit your version of the class below with added features. The best/coolest class will get a free page with links/advertisement to their website (or a website of their choice).


<?php
// Uses the legacy mysql_* extension, which was removed in PHP 7.0;
// on newer versions mysqli or PDO provide the equivalent calls.
class DB_manager {
    private $database; // database you wish to connect to
    private $username; // MySQL username
    private $password; // MySQL password
    private $server;   // MySQL server to connect to
    private $link;     // variable to hold the MySQL connection link
    private $currentResult = false; // result of the most recent query

    private $query; // text of the most recent query

    // simple error reporting function.
    private function reportError($error){
        echo $error;
    }

    // constructor
    public function __construct($username, $password, $server = "localhost"){
        $this->server = $server;
        $this->username = $username;
        $this->password = $password;
    }

    // connect to the server and select a database; returns true on success
    public function connect($database){
        $this->link = mysql_connect($this->server, $this->username, $this->password);
        // make sure we connected correctly
        if (!$this->link) { // open failed
            $this->reportError("Could not connect to server: <b>$this->server</b>.");
            return false;
        }
        $this->database = $database;
        if (!@mysql_select_db($this->database, $this->link)) { // no database
            $this->reportError("Could not open database: <b>$this->database</b>.");
            return false;
        }
        return true;
    }

    // run query q
    public function query($q){
        // Escape individual values with string_escape() before building $q;
        // escaping the whole statement would corrupt its quotes.
        $this->query = $q;
        $this->currentResult = mysql_query($q, $this->link);
        if (!$this->currentResult){
            $this->reportError("There was an error with your query: " . mysql_error($this->link));
            $this->currentResult = false;
            return false;
        }
        return $this->currentResult;
    }

    // returns one row from the result a query returned
    public function fetch_row($assoc = true){
        if (!$this->currentResult){
            $this->reportError("There is no valid result set to return data from.");
            return false;
        }

        if ($assoc) {
            return mysql_fetch_assoc($this->currentResult);
        } else {
            return mysql_fetch_array($this->currentResult);
        }
    }

    // escapes a single value so it can be placed safely inside a query
    public function string_escape($s){
        return mysql_real_escape_string($s, $this->link);
    }

    // returns every row of the current result set as an array
    public function fetch_all($assoc = true){
        if (!$this->currentResult) {
            $this->reportError("There is no valid result set to return data from.");
            return false;
        }

        $array = [];
        if ($assoc) {
            while ($row = mysql_fetch_assoc($this->currentResult)){
                $array[] = $row;
            }
        }
        else {
            while ($row = mysql_fetch_array($this->currentResult)){
                $array[] = $row;
            }
        }

        return $array;
    }
}

//usage
$db = new DB_manager('username', 'password', 'server (usually localhost)');
$db->connect("database_name");

$db->query("SELECT * FROM sometable");

$rows = $db->fetch_all();

foreach($rows as $r){
    print_r($r);
}

?>

If you have any questions, leave a comment!

This will be the first of a 4 part series on getting started with Unity. In this portion, I will go over some things I learned while starting my first project, as well as some design issues I came across and how to solve them. In addition, I will talk about starting a game from scratch. This section won’t be too heavy on code or follow-able examples, so if you want to skip ahead to the more code heavy parts in sections II, III, and IV, feel free. These posts assume you have basic knowledge of Unity’s GUI and know how to navigate it.

What I learned:

I’ve been using Unity for a few months now, and I’ve learned quite a bit about game design in general. The types of problems you come across in game programming are much different from those in other domains. For example, in web or mobile development, you spend a lot of time retrieving and processing data from various databases or calculations, and presenting it in a simple manner to the user. In game development, you have to work with things like positioning and rotation, and figure out solutions to things like path finding, combat, and artificial intelligence.

The most interesting part of developing a game is definitely designing the individual game mechanics. When deciding which mechanics to work on first, starting with the simplest, or the most fundamental to your game, is key. This is one of the things I learned while developing the simulated city game I am currently working on. For example, in a game like mine, where spawning certain buildings or other objects is a key part of the game, starting with a simple script to place objects (ignoring resources and similar) is important. Once you have that basic functionality done, you can continue adding to it without worrying about whether the foundation will work. In a platformer game (like Lerpz 2), the movement of the player controlled character is also very important and should be one of the first things designed.

In addition to coding, the look and feel of your game is also important. Unlike web design, where creativity can be borrowed from free templates and similar sources, getting visual content for your game can be difficult. There are online services that offer free game assets, but getting a large group of assets that look good together can be difficult. This actually brings me to my next point.

Starting from Scratch

Starting a game from scratch can become a daunting process, and unfortunately many would-be game developers get eternally stuck at this part of the process. I wrote a post a while ago with a bunch of different resources: http://blackscorner.me/2011/10/04/list-of-unity-game-assets-both-free-and-paid/. This is a great place to start for free assets of all kinds. Things like sounds and textures (I use http://www.cgtextures.com/ for textures, and http://www.freesound.org/ for sounds) are easy to get, and using these resources saves a whole lot of time. Models are harder: many, while of good quality, just don’t fit what you need. Some are too high poly, some are too low poly, and some just don’t fit the style. Sometimes you need to make your own models.

Modeling

Modeling can be a very difficult process, especially for those whose experience lies mostly in coding. Choosing which 3D modeling software to use depends on quite a few factors. The free and open source option, Blender (Blender.org), is a very popular choice for indie game developers who don’t have the money to put towards one of the proprietary packages. Other options include Maya and 3ds Max. I have experience with Blender, so that is what I will talk about briefly.

There are many online tutorials for Blender (you can find many on this page: http://en.wikibooks.org/wiki/Blender_3D:_Noob_to_Pro/Tutorial_Links_List). The learning curve is somewhat steep, but once you learn how to successfully create and texture a simple model, things get easier. I am by no means an expert, or really even good at modeling, but making your own models, at least as placeholders, can really make development faster. You can always change the models later, but if you are waiting on models before you can do code related work, the development process really slows down. The great part is that importing 3D models from Blender (and really, most other 3D modeling software) is extremely easy.

View the second part of this series here: Link

As a tutor and coding help forum poster, I notice many instances of people using OOP, but using it incorrectly. Now, I don’t necessarily mean that they have errors in their code, or that it isn’t logically correct, but rather that the programmer in question has problems with their concepts of OOP. In this post I’ll be discussing the most basic and common problems I see, and ways to think about OOP such that these mistakes can be avoided. I’ll only be going over two of the simplest OOP concepts, encapsulation and inheritance.

Object Oriented Programming: The Why?

Many people simply use Object Oriented Programming because it’s the “latest and greatest” paradigm in the computer science field. The problem is that they learn the syntax and semantics of OOP, but never truly learn what OOP is. Let me start by giving a basic explanation of what OOP brings to the table as far as programming is concerned.

First, the basic concept of OOP is that instead of programming procedurally or functionally, you program in such a way that your entire program consists of discrete, separate objects that interact with each other in certain ways. This concept is fairly easy to understand, but doesn’t quite explain how to implement a program that takes advantage of all OOP has to offer. So let’s take a simple example to illustrate it further. Let’s say we are writing a program to model driving a car. Each car is a separate object, of course, but these objects are all instances of the same class (in this case, a car) with different properties. Some are fast, some are strong, and some have great gas mileage. Each car, in turn, is composed of many other discrete objects, like the rear view mirror, the engine (which could itself be composed of other objects), and the tires (and of course these each have their own attributes). Then there is the driver of the car, which is another object that interacts directly with the car. The driver tells the car to drive, and the car in turn tells the engine to turn on and tells the wheels to spin. The driver never interacts with the individual components of a car, only with the car itself. The car interacts with each individual component. In (very PHP-like) pseudo code, this interaction might look something like

$Honda_Accord =  new Car();
$Honda_Accord->addEngine(new Engine());
//add rest of components
$bob = new Driver();
$bob->Drive($Honda_Accord);

//inside the drive method,
$Honda_Accord->start_engine();
$Honda_Accord->spin_wheels();
//etc..
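
For readers who want something they can run, the pseudo code above could be fleshed out roughly as follows. This is only a sketch: the Engine class, the isRunning() and isMoving() methods, and the private flags are assumptions added so the result can be observed.

```php
<?php
class Engine {
    private $running = false;

    public function start() { $this->running = true; }
    public function isRunning() { return $this->running; }
}

class Car {
    private $engine;
    private $wheelsSpinning = false;

    public function addEngine(Engine $engine) { $this->engine = $engine; }

    // The car, not the driver, talks to its own components.
    public function start_engine() { $this->engine->start(); }
    public function spin_wheels() { $this->wheelsSpinning = true; }

    public function isMoving() {
        return $this->engine->isRunning() && $this->wheelsSpinning;
    }
}

class Driver {
    // The driver only ever talks to the car itself.
    public function Drive(Car $car) {
        $car->start_engine();
        $car->spin_wheels();
    }
}

$Honda_Accord = new Car();
$Honda_Accord->addEngine(new Engine());
$bob = new Driver();
$bob->Drive($Honda_Accord);

var_dump($Honda_Accord->isMoving()); // bool(true)
```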

Encapsulation

This is probably one of the most widely known, but most often misunderstood, concepts that OOP brings to the table. The basic concept is that of data hiding. Each discrete object in your program has data inside of it that it hides from the rest of the program. If we consider the example above, each car would have certain properties (like speed, gas mileage, etc.). These properties would only be changed or acted upon by the car. The rest of the program cannot (and should not) have access to any of these properties. For example, the driver tells the car to start, but the car is the one that sends the signals to the tires to move. The driver doesn’t (and can’t) interact with each individual wheel.

Encapsulation is widely misused though. I see misuse in many forms, but the most common are as follows. The programmer may completely forgo encapsulation (for example, by making all data members public). The programmer may also take it too far (making every single data member private, even those that conceptually shouldn’t be). I also see many people doing a very curious thing. They make all (or most) of the class’s data members private, but create mutators and accessors for all of them, essentially making them public, but writing a lot more code to do so. For example

class A {
    private $some_int;
    private $some_string;

    public function get_some_int(){ return $this->some_int; }

    public function set_some_int($i){ $this->some_int = $i; }

    public function get_some_string(){ return $this->some_string; }

    public function set_some_string($s){ $this->some_string = $s; }

}

Now, trying to explain how to correctly use encapsulation is almost impossible, since it can be applied in so many ways. However, I often tell my students the following, which has mixed but predominantly positive results. When deciding which data members to make public, which to make private, and which to write accessors or mutators for, take a data member, say int x, and ask how it is going to be used:

  • If it is only ever going to be accessed and modified inside the class, make it private.
  • If it will be accessed outside of the class, but you only want to be able to change it inside, make it a private variable with an accessor (like GetX()).
  • If you are going to be accessing and changing it outside of the class without any restrictions (i.e. a programmer using this class can set the value of x to any reasonable value), make it public.
  • If it should be set and retrieved outside the class, but setting it has restrictions (like x cannot be over 10 or below 0), make the variable private and use an accessor and a mutator.
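
The last case is the one that actually justifies writing a mutator, so here is a minimal sketch of it. The class name Gauge and the $label member are invented for illustration; the 0 to 10 rule is the one mentioned above.

```php
<?php
class Gauge {
    public $label = ""; // no restrictions on it, so public is fine
    private $x = 0;     // restricted, so private with an accessor and a mutator

    public function getX() { return $this->x; }

    // Mutator that enforces the rule: x cannot be over 10 or below 0.
    public function setX($value) {
        if ($value < 0 || $value > 10) {
            return false; // reject out-of-range values and keep the old one
        }
        $this->x = $value;
        return true;
    }
}

$gauge = new Gauge();
$gauge->label = "fuel";   // unrestricted, set directly
$gauge->setX(7);          // accepted
$gauge->setX(42);         // rejected, x stays at 7
var_dump($gauge->getX()); // int(7)
```

Unlike the get/set pairs in class A, this mutator does real work, which is the difference between encapsulation and extra typing.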

Let’s take an example. Let’s say we are creating a Person class. The name of the person would obviously be private (since no one can change someone’s name but the person) but with an accessor (people can still ask for their name). The amount of cash someone has would again be private, but would have an accessor (someone can ask how much money they have) and a restricted mutator (someone can’t just set the amount of money they have, but a store keeper can ask for a certain amount, or give a certain amount back as change). The class could look something like:

class Person {
    private $name;
    private $cash;

    //constructor: the only place the name is ever set
    public function __construct($name, $cash = 0){
        $this->name = $name;
        $this->cash = $cash;
    }

    public function getName(){ return $this->name; }

    public function getCash(){ return $this->cash; }

    public function giveChange($c){ $this->cash += $c; } //function to return change back to user

    public function pay($price) { $this->cash -= $price; } //function to pay for something

}

Inheritance

Inheritance is one of the most commonly misused concepts I see, sometimes even more than encapsulation (since the implications of encapsulation are a bit easier to understand in a conceptual sense, its mistakes are much easier to correct). Inheritance, by contrast, is slightly more complex. The basic concept is rather easy to understand. Inheritance in OOP models inheritance in real life fairly closely. You have a Base or Parent object, and you have a Derived or Child object that inherits from it. The child has all the properties that the parent has, and more.

The main problem I see many students and novice programmers trip up on is having children inherit from parents for no apparent reason. For example (and this is a real example), one of my students had a lab recitation to model a car (much like the example above, but much simpler). Cars were composed of Engines and Wheels, and had some basic attributes (like model name and number). This student decided to have the engines and wheels inherit from the car because they were part of the car, and he took this to mean that they had to inherit from it. This shows two things. One, laziness, and two, a complete misunderstanding of the concept of inheritance.
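
A sketch of what that lab could have looked like instead: the car has an engine and wheels as members, and nothing inherits from Car. The class and method names here are assumptions, not the actual assignment.

```php
<?php
class Engine {}
class Wheel {}

// A car HAS an engine and wheels, so they are stored as members.
// An engine IS NOT a car, so Engine does not extend Car.
class Car {
    private $modelName;
    private $engine;
    private $wheels = [];

    public function __construct($modelName) {
        $this->modelName = $modelName;
        $this->engine = new Engine();
        for ($i = 0; $i < 4; $i++) {
            $this->wheels[] = new Wheel();
        }
    }

    public function getModelName() { return $this->modelName; }
    public function wheelCount() { return count($this->wheels); }
}

$car = new Car("Accord");
$engine = new Engine();

var_dump($car->wheelCount());     // int(4)
var_dump($engine instanceof Car); // bool(false)
```

A quick test that often helps: if the sentence "an X is a Y" sounds wrong, X probably should not inherit from Y.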

When deciding on using inheritance, take the following into consideration:

Say we have a system where we have a few different classes. Let’s say that these classes all have some sort of similarities. If they do, then I would take these similarities (generally in the form of the same attributes, or the same methods), move them into a base class, and have my other classes derive from it. Inheritance can be used to avoid having to repeat code. However, inheritance is unnecessary or irrelevant if we have two classes that are quite different from each other, even if they seem to have some similarities. For example, if we are creating a GUI system, we may have a bunch of different shapes. Each shape would have some similar methods and attributes (like the name of the shape, and a method for drawing the shape). Each shape should derive from some base shape class. Another example would be a system where we have musicians and a music store. Each musician and each store would have a stock of instruments (for example, a musician may have a guitar and drums, and a store may have a whole selection of guitars, basses, drums, etc.). Musicians and stores would also each have names, and other things. However, it really wouldn't make sense for either musicians to inherit from stores, or stores from musicians, since each has unique data members that the other doesn't have.

Taking the shapes example, the code may look something like



class Shape { //base shape class
    protected $name = "abstract shape"; //protected so derived classes can access it

    public function draw(){ /* ... */ }

}

class Triangle extends Shape { //triangle class that inherits from shape class

    private $height;
    private $base;

    public function __construct(){ $this->name = "triangle"; } //constructor to set name of this class to triangle

    public function draw() { /* ... draw the triangle ... */ }

}

 

If you have any questions, comments or feedback, please let me know in the comments!

As a frequent poster on a few programming help boards, I have noticed that a lot of well-meaning people simply don’t know how to ask a question right, and this leads to incomplete or inadequate answers, or to simply getting told off. The poster will usually ask follow up questions in a strained struggle to get their question/homework/project completed, and these usually end up annoying those who do want to help (as they are in the same bad format as the original question), or the poster gets shuffled off to some distant tutorial on the other side of the internet.

The problem is that they simply don’t know how to ask their question correctly, or more commonly are so fed up with the problem that they can’t be bothered to do any research and formulate an intelligent question that will get an intelligent answer. For example, I often see posts like this.

Hi I have a problem. I want my code to print my birthday from the database, but it won’t. Help

Code here Code here

This may seem like a perfectly reasonable question to a novice because they simply don’t realize how vague a question it is. The reason that the above code doesn’t work could be any one of a multitude of things, from simple syntax errors to very complex server problems. No forum poster wants to have to prod the user for more information to fix the problem (unless it’s super obvious), so questions like these can go unanswered, or get answered in the form of a question requesting more information.

 

So How do I ask?

Well, this depends entirely on the question. But a general formula for asking good questions, and seeming like you actually care about the Why of the answer, rather than just the answer is as follows:

  • First research the problem. For very simple problems, this can save a lot of time debugging, since someone else is bound to have run into the same problem. Generally, if I have some sort of error I will start by Googling the actual error text. Remember, Google is your friend, and should always be the first resort!
  • Try to debug yourself. Debugging is a practice learned through experience, but some helpful tips are as follows (Note these are rather specific to PHP, but the concept can be applied to anything):
  1. Output all the variables in question. Do they have values you think they have? If not trace where these values are set.
  2. If it’s a MySQL problem, make sure you are checking if there was a MySQL error. If there was, make sure you note it in any posts you make.
  3. Turn error reporting on. Generally, when people have simple syntax errors and don’t see them, it’s because error reporting is turned off.
  4. Comment out everything in your script until you pinpoint the problem line.
  • Once you have tried debugging without success, you can make your post. Make sure you list all the steps you took to debug, and post the relevant code. DO NOT POST ALL THE CODE. You don’t know how many posts I skip over because someone just posted their whole page, HTML and all. Not only does this make it seem like you simply don’t care about the other posters, but it is incredibly irritating to see a whole HTML page slathered on a forum post. Posting the relevant code makes it seem like you actually care not only about the problem, but about the posters who want to help you. It is also a good way to test yourself. It’s easy to post the whole page, but it’s a little harder to figure out what exactly is relevant to the problem.
  • Make sure that if you are posting code, you post it in code BBCode tags if the forum supports them (and most coding forums do).
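
For the PHP-specific debugging tips above, the first and third steps might look something like this. It is only a sketch: $birthday stands in for whatever value you are chasing, and debugLine() is a made-up helper.

```php
<?php
// Step 3: show every error, warning, and notice while developing.
error_reporting(E_ALL);
ini_set('display_errors', '1');

// Hypothetical helper for step 1: label a value and show exactly what it holds.
function debugLine($label, $value) {
    // var_export() with true returns a string instead of printing it.
    return $label . ' = ' . var_export($value, true);
}

// $birthday is a stand-in for whatever value you expected to see.
$birthday = null;
echo debugLine('birthday', $birthday), "\n"; // birthday = NULL
```

Output like this is also exactly the kind of detail worth pasting into a forum post.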

 

Why?

Why should you follow the above directions? Well, it’s not just about getting a good answer (which following the above advice will really help with). Debugging and figuring out problems is half of the battle in coding. Many times, when people who post questions like my example above get their answers, it’s in the form of code they can copy, paste, and move on from. This may make fixing things quicker and easier, but what happens when you are on your own? Being able to debug is a very important part of programming, and while asking for help is never a bad thing, when you go through the steps of trying to solve the problem yourself, you understand the problem better. After doing a little research, you may not find the answer, but you may find some tidbits of information that you didn’t know before. Above all, learning how to program efficiently is the most important part!

Also, when you put forth the effort to research your problem, it shows in the post. It’s obvious when you clearly haven’t even tried to solve the problem, opting instead to just post generic SOS messages on various forums until someone solves it for you. It’s also obvious when you have put forth the effort to try to solve the problem, and are genuinely interested in not only fixing it, but not repeating the mistake you made. When someone has clearly put in some effort, I will put forth equal effort into not only solving the problem, but also explaining it. I usually end up posting a few paragraphs of explanation along with annotated code (that I may have even tested on my local server!). For those who simply want the answer, and don’t care about the underlying problem, I simply post a few words of explanation and a code snippet, and I almost never test the code. Think about it. Why should I put effort into fixing a problem that you aren’t willing to put effort into fixing yourself?

 

Hopefully this gives the question poster good perspective on what a help poster thinks, and the internet will see better questions!

Functions are an integral part of any PHP coder’s programming career, whether you are developing complex social applications or creating simple scripts for use in a basic website. However, as I have noticed in my experience as a tutor and contributor to PHP help forums, many people have some bad habits with functions. This is especially true of new programmers, who see functions as a way to just place code they would normally copy and paste, without any notion of portability or reusability. For this tutorial, you should have a basic understanding of functions, including the syntax for defining one and calling one. Before I go over best/worst practices in terms of functions, let’s go over how functions work in PHP.

 

Function Basics

Functions are a very basic programming concept that many people learn within a few weeks of first learning to code. In PHP, functions have the following basic structure


function function_name(argument list){
function body

return statement (if applicable)
}

//function call
function_name(passed in arguments);

Very simple, right? Let’s break it down.

function function_name(argument list){

The first part of this line is the function keyword. This tells PHP that you are about to define a function. The function_name is the name of your function. Like a variable name, you refer to this name when calling the function. Calling a function means to invoke its code. The argument list is an optional list of variables that you give values to when you call the function. When a function needs information that is outside of its scope, then it is best to pass this information via the argument list.
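
The point about scope is easier to see in code. In this sketch (the function names and the $taxRate variable are made up), the first function cannot see a variable defined outside of it, while the second receives the value through its argument list.

```php
<?php
$taxRate = 0.25; // defined outside of any function

// PHP functions have their own scope, so $taxRate is not visible in here.
function taxRateIsVisible() {
    return isset($taxRate); // false: no such variable inside the function
}

// Passing the value in through the argument list makes the dependency explicit.
function addTax($price, $rate) {
    return $price + ($price * $rate);
}

var_dump(taxRateIsVisible());    // bool(false)
var_dump(addTax(200, $taxRate)); // float(250)
```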

function body

return statement (if applicable)

The function body is the meat of your function. It is the code that is run every time you call the function. This is probably the most important part of any function. The return statement is an optional statement that will return information back to the line that called the function. The return statement is extremely useful, and an integral part of functions in any programming language. Sometimes, however, you may not want or need to return anything. An example of a function that would return something is, say, a function that calculates the square root of a number. An example of a function that may not return anything would be, say, a function that outputs (i.e. with echo or print) some HTML that goes on a web page.


function_name(passed in arguments);

This is the function call. Here is where you actually invoke the code in a function, and pass it any relevant information via the argument list. If your function returns a value, you can set a variable equal to the function call, and the value returned from the function will be assigned to the variable. Let’s take a simple example, shall we!


//function definition
function square($number){
$product = $number * $number;
return $product;
}

//function call
$five = 5;
$five_squared = square($five);

Here we see an example of a function with an argument list and a return statement. Our code simply accepts a number and returns its square. In our case, the variable $five_squared will contain the value 25 (which is the value that our function returned). Let’s look at another example.

function showHTML(){
echo "<h2>Welcome to my Website!</h2>";
}

showHTML();

 

 

Here we have a function without a return statement or an argument list. This function simply shows some HTML. Functions of this type are used primarily for outputting some kind of content. Of course, the return statement and the argument list are not dependent on one another, so you can mix and match functions having an argument list and having a return statement to fit your needs.

 

Scope

In a normal PHP script, variables have a script-wide scope, meaning that when you define a variable, any line after that variable definition (as long as it’s not part of a function or class) can access and use that variable. However, functions are discrete, stand-alone entities. Variables available inside the script are not available in a function, and variables that are part of a function (including the variables in an argument list) are not available outside of the function. To illustrate this, consider the following example:

$somevar = "some value";
function test($var1, $var2){
    echo $somevar;//this will fail. $somevar is not in the function's scope, and thus is not available
    echo $var1 . $var2;//this is correct. $var1 and $var2 are part of the function's argument list, and thus are available in the function
    $somevar = "Another value";
    echo $somevar;//now this will work, as $somevar is now defined inside the function's scope. However, it will echo "Another value" rather than "some value"
    //because the function uses the variable from its own scope.

}//end function

echo $var1 . $var2;//this will fail, as $var1 and $var2 are part of the function's scope, not the rest of the script
echo $somevar;//this will echo "some value".

test(4,5);//outputs the following
/*************
a PHP notice (a warning as of PHP 8) will be issued for trying to use $somevar when it's not defined in the function's scope
45
Another value
*************/
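
The same rule can be seen from the other direction. The following is a minimal sketch (the function name changeLocal is made up for illustration) showing that assigning to a variable inside a function only creates a local variable and leaves the script-level variable of the same name untouched:

```php
<?php
$somevar = "some value";

// Hypothetical helper: the assignment below creates a *local* $somevar.
function changeLocal() {
    $somevar = "changed inside";
    return $somevar;
}

$inside = changeLocal();

echo $inside . "\n";   // the function's local value: changed inside
echo $somevar . "\n";  // the script-level variable is unchanged: some value
```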

 

Although the above seems like it would prevent you from accessing variables inside a function that are defined outside, there is actually a way around the scope problem. There is a keyword global in PHP that lets you essentially import a variable defined outside the scope of a function to a function. To illustrate


$var = "Hello World";

function incorrect(){
echo $var;
}

function correct(){
global $var;
echo $var;
}

incorrect();
correct();


In the above snippet, the first function call (the line: incorrect();) will fail because $var is not in the scope of the function incorrect(). However, in the function correct(), we use the global keyword, which puts the variable $var from the outside script into the function. The resulting output would be: “Hello World” (excluding the notice that would be issued from the first function.)

 

Global can be nifty, but for new programmers can become a crutch. Most experienced programmers will never use the global keyword (unless they are being lazy, or providing a quick fix that they will revisit later), as it kind of defeats the purpose of a function. A function should be a completely standalone piece of code. If you use the global keyword, you are making your function dependent on the environment it is called in. This results in your function only being usable in a very specific script, and forces a programmer to change or code pages that use this function in a specific way. This makes the function much less portable, and makes reusing the function very difficult. Usually adding the global keyword to give a function a variable that would normally be out of scope is done as a quick fix for a badly written function.

When creating a function, a programmer must always plan it out. You must first determine what information a function needs, and what information it will return. In almost every case where a programmer uses the global keyword, they could have easily passed the relevant variable’s value in via the argument list. Let’s take the example function correct() above, and rewrite it.


function correct($var){
echo $var;
}

As we see, this was a very simple change (and actually results in a smaller function!). This also makes it so that our function is much more portable and reusable. For example, with the old function (the one that uses global) if we wanted to call the correct() function more than once, and echo different things each time, before each call we would have to change the value of $var. This could result in very messy code, especially if $var (or whatever the global‘ed variable is named) is an important variable in the script. Tracking a bug with a variable that is changed so much just because it is used in a function can become tedious and frustrating. However, with our new version of correct, you simply have to change the value of the parameter in the call. Lets take a look at some code to illustrate what I mean:

function correct1($var){
echo $var;
}

function correct2(){
global $var;
echo $var;
}

//output 3 different things with correct1
correct1('Hello');
correct1("I'm outputting stuff");
correct1('weeeee');

//now output 3 different things with correct2
$var = "hello";
correct2();
$var = "I'm outputting stuff";
correct2();
$var = "weeeee";
correct2();

As you can see, the code using the first function looks much nicer, and is much clearer. For someone reading just the code that outputs the information, it is much clearer what is happening when using the first function.
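
The same idea applies when a function needs to change a value rather than just read it. As a rough sketch (addVisit and $visits are made-up names for illustration), the function can take the current value as an argument and return the new one, so it never needs global:

```php
<?php
// Hypothetical example: updating a counter without the global keyword.
function addVisit($count) {
    // Works on the passed-in copy and hands the result back to the caller.
    return $count + 1;
}

$visits = 0;
$visits = addVisit($visits);
$visits = addVisit($visits);

echo $visits . "\n"; // 2
```

Because the function only depends on its argument, it can be reused with any counter in any script.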

 

To Return, or To Not Return

Another common place I see new programmers forming bad habits is with return statements. As a general rule of thumb, when creating functions you should always think about returning something. Of course, sometimes a function just doesn’t need to return anything, but more often than not, I will see a function that by all accounts should return a value, but outputs the value instead. Let’s take an example:


function showHeader(){
echo "<h2>Some text thats part of the Header</h2>";
}

This function seems ok at first (and for all intents and purposes it is, you can apply the following concept to any function of this form). It simply outputs a Header that a programmer may want to show on multiple pages. However, what if we wanted to, say, change the header size from header 2 to header 3? Of course we could create a new function, but if instead of simply echoing the HTML we want to output, we return the string to output, we can manipulate the string. So for example

function showHeader1(){
echo "<h2>Some text thats part of the Header</h2>";
 }
function showHeader2(){
return "<h2>Some text thats part of the Header</h2>";
 }

showHeader1();//we can only output the hardcoded info that is in the function.
//however with our other function we can output like the first
echo showHeader2();
//but we can also manipulate the value if we want to
echo str_replace("h2", "h3", showHeader2());//change it from h2 to h3

This example may seem somewhat arbitrary and unnecessary, so lets take an example more relevant to fledgling developers.

 

Say we have a system where we output users’ comments. Assume we have a function called getComment($id) which returns the comment of a specific user (who has a User ID of $id). Of course, you must assume we have the correct database structure. Now assume we have another function called showComment($id) which outputs the comment from the specific user. When simply outputting the comment, both would perform the same. However, what if we wanted to show only the first 500 characters? Or what if we wanted to convert all the HTML characters in the comment into HTML entities via the htmlentities function? With the showComment function, we would have to change the function itself to achieve this (which may result in this change breaking another part of your website). However, with getComment, since it returns a string, we can choose to simply output the returned value, or manipulate it further (with substr or htmlentities, in the examples given above).
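
Here is a minimal sketch of that idea. It is not a real comment system: an array stands in for the database table, and getComment takes that array as a second argument (an assumption made here so the example runs on its own and doesn’t need global). It also cuts at 20 characters instead of 500 to keep the output short.

```php
<?php
// Hypothetical stand-in for a database table: user ID => comment text.
$comments = array(
    7 => 'I <b>love</b> this site! <script>alert("gotcha")</script>',
);

// Returns the comment instead of echoing it, so the caller decides what to do.
function getComment($id, $comments) {
    return isset($comments[$id]) ? $comments[$id] : '';
}

$comment = getComment(7, $comments);

echo $comment . "\n";                                    // raw output, all a showComment() could do
echo substr($comment, 0, 20) . "\n";                     // only the first 20 characters
echo htmlentities($comment, ENT_QUOTES, 'UTF-8') . "\n"; // HTML converted to entities
```

None of these three uses required touching getComment itself, which is the whole point of returning the value.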

 

Of course, I could talk about proper use of functions for days, but the problems I discussed above are the most common ones I see.

In my previous post (which you can reach here: http://blackscorner.me/2011/09/29/strings-and-interpolation-a-love-story-part-1/) I discussed interpolation in PHP.  Apart from professing my love for PHP’s implementation of interpolation (+1 poet points to me!), I showed some examples of its uses, and compared them to the equivalent statements using concatenation. Apart from novice PHP coders, most intermediate coders and above know about interpolation (whether or not they know it goes by that name). However, many intermediate coders (and some advanced ones!) don’t know the full capabilities of PHP’s interpolation implementation. The examples I showed in the previous post only tell one part of the story! There are actually two different syntaxes for interpolation in PHP: simple (which was shown in the previous post) and complex (which I will discuss in this article).

Lets take a look at an example of simple syntax just to refresh our memory (coders are notoriously forgetful!):

$var = "variable";

echo "I am a $var";

A lot of coders wrongly assume that this is the only form of interpolation available in PHP, and thus conclude that if you want to use, say, an element in an array, or an attribute of an object, you have to use concatenation. When given a scenario in which you want to use an array element in a string, many would write the following:


$arr = array("element1", "element2", "element3");
echo "the value at index 1 is: " . $arr[1];

However, PHP can interpolate the element $arr[1] using simple syntax! The above echo statement is equivalent to the following:

echo "The value at index 1 is: $arr[1]";

The same goes for associative arrays! So the following is also valid:


$assoc = array("key1" => "value1", "key2" => "value2");
echo "The value of the array at the key 'key1' is: $assoc[key1]";

Notice the lack of single quotes around the associative array’s key. As I said earlier, you can also use simple syntax to interpolate an objects attributes (or properties). Obviously the property in question has to be public. The following illustrates what I mean:

class foo {
    public $var;
    function __construct(){
        $this->var = "World!";//use $this-> to set the property rather than a local variable
    }
}
$foo = new foo();
//the following statements are equivalent
echo "Hello " . $foo->var;
echo "Hello $foo->var";
Here are some more examples for clarification (taken from the PHP manual)
<?php
$juices = array("apple", "orange", "koolaid1" => "purple");

echo "He drank some $juices[0] juice.".PHP_EOL;
echo "He drank some $juices[1] juice.".PHP_EOL;
echo "He drank some juice made of $juice[0]s.".PHP_EOL; // Won't work
echo "He drank some $juices[koolaid1] juice.".PHP_EOL;

class people {
public $john = "John Smith";
public $jane = "Jane Smith";
public $robert = "Robert Paulsen";

public $smith = "Smith";
}

$people = new people();

echo "$people->john drank some $juices[0] juice.".PHP_EOL;
echo "$people->john then said hello to $people->jane.".PHP_EOL;
echo "$people->john's wife greeted $people->robert.".PHP_EOL;
echo "$people->robert greeted the two $people->smiths."; // Won't work
?>
Awesome, right! But that’s not all! Not even close. PHP also has complex syntax (which I mentioned above). Despite what you may think, it’s not called complex syntax because it is in itself complex, but rather because it allows you to interpolate complex expressions. To use complex syntax, you must use curly brackets ({ and }) to surround the expression you want interpolated. The PHP manual describes the functionality of complex syntax as follows:
Any scalar variable, array element or object property with a string representation can be included via this syntax. Simply write the expression the same way as it would appear outside the string, and then wrap it in { and }. Since { can not be escaped, this syntax will only be recognized when the $ immediately follows the {. Use {\$ to get a literal {$.
Lets look at an example shall we! This will show improper use of complex syntax, then 2 examples of correct use. I will use a simple variable here.
<?php
$great = 'fantastic';

// Won't work, outputs: This is { fantastic}
echo "This is { $great}";
// Works, outputs: This is fantastic
echo "This is {$great}";
echo "This is ${great}";
?>
Of course you can also use complex notation with object properties and array elements as I did above with simple syntax
<?php
echo "This square is {$square->width} centimeters broad.";
echo "This works: {$arr['key']}";
?>
Notice that I used single quotes in the array element example. Complex syntax is the only place you can use single quoted array keys, as opposed to simple syntax which, as shown above, can’t have the single quotes (if you are curious why this is, leave a comment and I’ll try to get back to you).  Where complex syntax really shines, though, is in its ability to interpolate more complex expressions than simple object properties or array elements. For example, say you are taking advantage of PHP’s variable variables (link will be below). Assume you have a function called getName() which returns the name of the variable you want to access. You can do:
<?php
function getName(){ return 'var'; }
//note single quotes and thus lack of interpolation
$var = 'value of $var';

echo "This is the value of the var named by
the return value of getName(): {${getName()}}";
//output: This is the value of the var named by the return value of getName(): value of $var
//of course, if the function is a method of some class, you can also do this:
$object = new Some_Class_With_getName_Method();
echo "This is the value of the var named by the return value of \$object->getName(): {${$object->getName()}}";
?>
Unfortunately, you can’t interpolate just the return value from a function. So the following won’t work:
<?php
// Won't work, outputs: This is the return value of getName(): {getName()}
echo "This is the return value of getName(): {getName()}";
?>
Unfortunately, since PHP depends on the variable sigil ($) immediately following the opening curly bracket to recognize complex syntax, interpolating a function call on its own is impossible. But PHP’s ability to interpolate many things really makes it an amazing language. What is truly unfortunate is that many programmers are unaware of how far PHP’s capabilities go! Most know about simple syntax, but many are unaware of the power of complex syntax.
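There are a few easy ways around this limitation. The sketch below reuses the getName() function from the example above and shows three common options: storing the return value in a variable first, concatenating the call, or using sprintf().

```php
<?php
function getName() {
    return 'var';
}

// Option 1: store the return value in a variable, then interpolate it.
$name = getName();
echo "The return value of getName() is: $name\n";

// Option 2: concatenate the function call directly.
echo "The return value of getName() is: " . getName() . "\n";

// Option 3: use sprintf() with a %s placeholder.
echo sprintf("The return value of getName() is: %s\n", getName());
```

All three lines print the same text; which one reads best is mostly a matter of taste.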
Here are a bunch more examples (from the PHP manual) of things that will and won’t work with complex syntax so you can get an idea of its use.
<?php
// Show all errors
error_reporting(E_ALL);

$great = 'fantastic';

// Won't work, outputs: This is { fantastic}
echo "This is { $great}";

// Works, outputs: This is fantastic
echo "This is {$great}";
echo "This is ${great}";

// Works
echo "This square is {$square->width}00 centimeters broad.";

// Works, quoted keys only work using the curly brace syntax
echo "This works: {$arr['key']}";

// Works
echo "This works: {$arr[4][3]}";

// This is wrong for the same reason as $foo[bar] is wrong  outside a string.
// In other words, it will still work, but only because PHP first looks for a
// constant named foo; an error of level E_NOTICE (undefined constant) will be
// thrown.
echo "This is wrong: {$arr[foo][3]}";

// Works. When using multi-dimensional arrays, always use braces around arrays
// when inside of strings
echo "This works: {$arr['foo'][3]}";

// Works.
echo "This works: " . $arr['foo'][3];

echo "This works too: {$obj->values[3]->name}";

echo "This is the value of the var named $name: {${$name}}";

echo "This is the value of the var named by the return value of getName(): {${getName()}}";

echo "This is the value of the var named by the return value of \$object->getName(): {${$object->getName()}}";

// Won't work, outputs: This is the return value of getName(): {getName()}
echo "This is the return value of getName(): {getName()}";
?>
<?php
class foo {
var $bar = 'I am bar.';
}

$foo = new foo();
$bar = 'bar';
$baz = array('foo', 'bar', 'baz', 'quux');
echo "{$foo->$bar}\n";
echo "{$foo->$baz[1]}\n";
?>
<?php
// Show all errors.
error_reporting(E_ALL);

class beers {
const softdrink = 'rootbeer';
public static $ale = 'ipa';
}

$rootbeer = 'A & W';
$ipa = 'Alexander Keith\'s';

// This works; outputs: I'd like an A & W
echo "I'd like an {${beers::softdrink}}\n";

// This works too; outputs: I'd like an Alexander Keith's
echo "I'd like an {${beers::$ale}}\n";
?>
Well, I hope you enjoyed reading about string interpolation as much as I enjoyed writing about it. PHP really is an amazing language, and touches like this are what really set it apart from other languages and are key to PHP’s wide success.
Links of Interest
Part 1:
String Parsing (Go here for a full explanation of the examples I posted, as well as comments and discussion about it):
Simple Syntax:
Variable Variables:

I am a frequent forum goer to the website http://www.phpfreaks.com/ (My username is Mikesta707), and one thing I see a lot from PHP novices is a question like

Hey, I am trying to echo a string, but it doesn’t look right. I’m using a variable and but instead of its value all I see is the php code! my code looks like

$str = "Poop";

echo 'I have to take a $str';

please halp!

Now, anyone who uses PHP at an intermediate level or higher will tell this hypothetical novice to either use double quotes, or concatenate. The solutions would be as follows:


$str = "Poop";

echo "I have to take a $str";//double quotes

echo 'I have to take a ' . $str;//concatenation

And of course, their error will be fixed, and they can go on their merry way, or post about another problem they have. In addition to the above solutions, one could also use heredoc syntax, but that is generally reserved for multi-line strings and can be a little complex for a new programmer. Many forum posters wouldn’t bother even mentioning it.

Anyways, what many posters fail to do in these cases, is explain why what they wrote failed. As a result of this, even people who know to use double quotes or concatenate will often not really know why you have to do that, beyond “uh.. just because man.”  In my experience as a forum poster for phpfreaks, I see a solution that concatenates rather than using double quotes more frequently, and this produces a very interesting effect.

Concatenation is simple enough, and easy to rationalize the use of without wondering why you were wrong. Without knowing about the use of double quotes (or simply not realizing), one may naturally assume, after having the above problem and being told to concatenate, that they can’t put variables inside strings.

This is half correct, as it is true with single quotes, but incorrect in terms of double quotes. This can lead to an overall assumption that if you want to create a string that is composed partly of one or more variables, then you must always concatenate. Over the years I have worked with quite a few developers who have looked over my shoulder while I was writing some code, and told me, “Hey, you gotta concatenate there!” or something along those lines when I was using double quotes. Usually this will come with a look of quiet satisfaction at having corrected (or thinking they corrected) a fellow coder’s mistake.

Unfortunately, because of this assumption many people miss out on one of my favorite features of PHP called Interpolation,  as well as having a somewhat faulty understanding of how strings work in PHP. For those who don’t know, interpolation is defined by Wikipedia as the following:

The process of evaluating an expression or string literal containing one or more variables, yielding a result in which the variables are replaced with their corresponding values in memory. It is a specialized instance of concatenation.

Now this basically means that when you put a variable inside of a string, PHP does a sort of replace, where the name of the variable (say, $str) is replaced with its value (say, “Poop”). PHP supports interpolation, but only when using double quotes or the heredoc string syntax. To take a simple example:

//The following are all equivalent
$n = "Mike";
echo "Hi i'm $n";//interpolation
echo 'Hi i\'m ' . $n;//concatenation
//heredoc syntax (before PHP 7.3 the closing EOT; must be on a line by itself, so the comment goes up here)
echo <<<EOT
Hi i'm $n
EOT;

As I said earlier, interpolation is one of my favorite attributes of PHP. Interpolation isn’t supported in many languages (Perl, Tcl, Ruby and most Unix shells are among the others that do support it). PHP uses a sigil, a symbol attached to a variable name to show that it’s a variable (and not a global constant), unlike many languages (including C, C++, C#, Java, JavaScript, etc.), so interpolation is possible. Interpolation really makes creating strings that would be complex or look ugly in other languages easy and clean. Let’s compare two strings, one in PHP and one in C++/C#/Java/etc.


//assume we have created 3 variables, $a $b and $c in PHP, and a b and c in C++

//PHP string
$str = "The first 3 letters of the alphabet are $a $b and $c";

//equivalent string in C++/C#/java/etc...
//assume we have the proper header includes
string str = "The first 3 letters of the alphabet are " + a + " " + b + " and " + c;

As you can see, the PHP version is much easier on the eyes, and much nicer to write! Of course, most languages that don’t support interpolation offer something similar in function form through string formatting (for example, printf() in C).
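
PHP has its own versions of these formatting functions too (printf() and sprintf()), so the two approaches can be compared side by side. A small sketch, assuming $a, $b and $c hold the three letters:

```php
<?php
$a = 'a';
$b = 'b';
$c = 'c';

// Interpolation, as shown above.
$interpolated = "The first 3 letters of the alphabet are $a $b and $c";

// The same string built with sprintf(); each %s is replaced in order.
$formatted = sprintf('The first 3 letters of the alphabet are %s %s and %s', $a, $b, $c);

echo $formatted . "\n";
var_dump($interpolated === $formatted); // bool(true)
```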

To sum things up, I absolutely love interpolation, and think it’s one of the most amazing features of PHP. It’s a very simple feature, but elegant at the same time, and extremely useful. Out of all of PHP’s amazing features, I know that I use this one the most. I love interpolation so much, actually, that I plan on writing a part 2 to this post very soon. It will probably be a tad bit shorter, but will have some good information.

Part 2: [Click me]

Links of Interest:

String parsing (Interpolation):
http://www.php.net/manual/en/language.types.string.php#language.types.string.parsing

Single quotes:
http://www.php.net/manual/en/language.types.string.php#language.types.string.syntax.single

Double quotes:
http://www.php.net/manual/en/language.types.string.php#language.types.string.syntax.double

Heredoc:
http://www.php.net/manual/en/language.types.string.php#language.types.string.syntax.heredoc

Interpolation (Wikipedia):
http://en.wikipedia.org/wiki/Variable_(computer_science)#Interpolation

A few weeks ago I wrote a tutorial on how to validate your PHP variables (you can read it here: http://blackscorner.me/2011/07/28/when-and-how-to-validate-your-php-variables/). That tutorial is the precursor to this post, in which I explain the basics of sanitizing the PHP variables that you use in MySQL (or other SQL) queries.

In order to get the most out of this tutorial, you will need to be familiar with the following

  • The subjects discussed in the precursor post
  • basic SQL queries
  • using MySQL or other SQLs and PHP together
  • Escape characters (for example how is ‘ different from \’)

Whats The Point?

You may wonder why we need to go through all of this sanitation nonsense in the first place. Well when using SQLs, like MySQL along with user input (Like say storing some information that they gave the website via a form), your site could be vulnerable to what is called a SQL injection. SQL injections, and how they work is a pretty broad topic (one beyond the scope of this post), but they basically consist of a user injecting bad SQL into your query to make it do something you don’t want. Lets take a very basic example:

Let’s say that we have a user login system, and the query we use to make sure the login details are correct is:

SELECT * FROM user_table WHERE username='$username' AND password='$password'

Assume that $username and $password are defined earlier in the script. Now if we don’t sanitize the input, a malicious user can enter something like this as the username:

' OR 1=1 -- 

Now, that may seem like a useless string, but let’s look at what our query looks like:

SELECT * FROM user_table WHERE username='' OR 1=1 -- ' AND password=''

Everything after the -- doesn’t matter, because -- starts a comment in SQL and tells the database to ignore the rest of the line (MySQL requires a space after the two dashes). It is kind of like commenting out a line of code in PHP. Since 1=1 is always true, the WHERE clause matches every row in the table, and a typical login script will simply take the first row returned. This would allow a user to access someone’s account without the proper username and password. If you are unlucky, the admin account (if there is one) may be the first row (since it’s usually the first account created), and now the malicious user has administrative powers. Bad news.
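
You can see this happen without touching a database at all. The following sketch just builds the query string the way an unprotected login script would (the input values are simulated here rather than read from $_POST) and prints the result:

```php
<?php
// Simulated form input: what a malicious user might type into the username field.
$username = "' OR 1=1 -- ";
$password = "";

// The query is built by plain interpolation, with no escaping at all.
$query = "SELECT * FROM user_table WHERE username='$username' AND password='$password'";

echo $query . "\n";
// SELECT * FROM user_table WHERE username='' OR 1=1 -- ' AND password=''
```

Echoing the final query string like this is also a handy debugging habit whenever a query misbehaves.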

So How do we Stop them?

Well, removing all vulnerabilities from your site is almost impossible (as cyber security is an ever changing field), but protecting against SQL injections is pretty straightforward. PHP has a function made just for stopping the kind of attack you see above, but before we start using it on anything and everything we have to be aware of a few things. First, the magic quotes setting for PHP. Nowadays it is usually turned off, and it is recommended that it be turned off (it was removed from PHP entirely in version 5.4, so this only matters on older installations). What this setting does, when turned on, is automatically escape all incoming GET, POST and cookie data. This is a dangerous action because it is sort of like a one-size-fits-all glove for sanitation that has terrible effects depending on the data. You want to sanitize SQL, HTML and bbcode differently. However, if it is on in your system, you can do the following.

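//the function_exists() check is needed because get_magic_quotes_gpc() no longer exists in PHP 8
if (function_exists('get_magic_quotes_gpc') && get_magic_quotes_gpc()==1){
    //its on, lets undo what happened
    //do this for all variables
    $_POST['username'] = stripslashes($_POST['username']);
    $_POST['password'] = stripslashes($_POST['password']);
    //the rest of the post variables you are gonna use.
}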

Now of course it would just be easier to turn this setting off in your php.ini. Consult the manual for details on turning this setting off. What the stripslashes() function basically does is remove all the backslashes that PHP added in order to escape the data. PHP adds backslashes before single quotes, double quotes, backslashes and NULL characters. As you should know, escaping quotes lets you use them inside strings that are delimited by them!

 

Now the second thing I need to mention is that we need to consider what type of data we are using. If we are using string data (like usernames, passwords, etc.) then we should escape the string to prevent our basic SQL injection from happening. If it is number data, we should just convert whatever they entered into a number so we don’t have to worry about escaping it. (Entering a string where a number should be in a query will break the script, and may even show an error message that gives the attacker more information about your system.) So let’s go over each scenario.

Strings:

For strings, PHP provides a very useful function called mysql_real_escape_string(). It accepts a string as a parameter and returns the string escaped in such a way as to make it safe from basic SQL injections like the one above. (This function belongs to the old mysql extension, which was removed in PHP 7; mysqli_real_escape_string() is its counterpart in the mysqli extension.) So let’s give a code example and see what would happen if the attacker tried the simple SQL injection.


//assume we have turned off magic quotes.

$username = mysql_real_escape_string($_POST['username']);
$password = mysql_real_escape_string($_POST['password']);

$query = "SELECT * FROM user_table WHERE username='$username' AND password='$password'";
//rest of login stuff below

Now lets see what our injected query would look like

SELECT * FROM user_table WHERE username='\' OR 1=1 -- ' AND password=''

Here we see that our escape string function made the ' that originally broke our query harmless. The whole injected string is now treated as an (incorrect) username, so the attacker will not be logged in, just as you would expect. Awesome, right!
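
Escaping is not the only way to get this result. A different technique, prepared statements, keeps the user’s input out of the SQL text altogether. The sketch below is only an illustration under a few assumptions: it uses PDO with an in-memory SQLite database standing in for the MySQL server (so it needs the pdo_sqlite extension), and it stores a plain text password purely to mirror the query used in this post.

```php
<?php
// Assumption: an in-memory SQLite database stands in for the real MySQL server.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec("CREATE TABLE user_table (username TEXT, password TEXT)");
// Plain text password only to mirror the article's query; real sites store hashes.
$pdo->exec("INSERT INTO user_table (username, password) VALUES ('admin', 'secret')");

// The ? placeholders are filled in by the database driver, not by string building.
$stmt = $pdo->prepare("SELECT * FROM user_table WHERE username = ? AND password = ?");

// The injection attempt is treated as a literal (and non-matching) username.
$stmt->execute(array("' OR 1=1 -- ", ""));
$attack_rows = $stmt->fetchAll(PDO::FETCH_ASSOC);

// A legitimate login still matches.
$stmt->execute(array("admin", "secret"));
$login_rows = $stmt->fetchAll(PDO::FETCH_ASSOC);

echo count($attack_rows) . "\n"; // 0
echo count($login_rows) . "\n";  // 1
```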

Numbers

Now, if you are expecting the user to enter a number, we should convert whatever they enter into a number before it goes into the query. That way, even if they enter some malicious text, it will be converted to a harmless number. Assuming we want an integer, we can easily do something like


$area_code = intval($_POST['area_code']);

//rest of code

Now, if you need a float number, you can use the floatval function instead of intval()!
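
To get a feel for what these conversions actually do with bad input, here is a short sketch with a few sample strings (the values are made up for illustration):

```php
<?php
var_dump(intval("415"));           // int(415)
var_dump(intval("415abc"));        // int(415): parsing stops at the first non-digit
var_dump(intval("' OR 1=1 -- "));  // int(0): there are no leading digits at all
var_dump(floatval("3.14xyz"));     // float(3.14)
```

Note that garbage input quietly becomes 0, so if 0 is not a sensible value for your field you will still want to validate it before running the query.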

 

Now What?

Well, we have sanitized some stuff, but not everything! What if we are getting some information that we will show on our user’s profile page (like a description or a bio)? In fields like these, the user can enter any HTML they want, and could even enter malicious HTML/Javascript or include malicious HTML/Javascript from another site. This is known as Cross Site Scripting (XSS) and is a very real threat to many websites. Now, the most basic way to stop this is to turn all HTML into HTML entities. That way, even if they enter HTML, it becomes harmless. HTML entities are special sequences that represent characters that you don’t want to be interpreted as HTML. Take, for example, the less than sign (<). This can be used to show a mathematical inequality, but in HTML it denotes the beginning of an HTML tag (like <a for example). When you want to show the less than sign without the browser trying to interpret it as HTML, you use its HTML entity (&lt;).

PHP provides a function that converts all HTML special characters into their respective entities called, you guessed it! htmlentities(). Now lets show an example


$bio = htmlentities($_POST['bio']);

Now if the user enters some HTML, it will instead be harmless html entities.
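
Here is a quick sketch of what that looks like with some hostile input. The input string is simulated rather than read from $_POST, and the flags and character set are passed explicitly, since the defaults have changed between PHP versions:

```php
<?php
// Simulated $_POST['bio'] value containing a script tag.
$input = '<script>alert("hi")</script>';

$bio = htmlentities($input, ENT_QUOTES, 'UTF-8');

echo $bio . "\n";
// &lt;script&gt;alert(&quot;hi&quot;)&lt;/script&gt;
```

The browser will display the text exactly as the user typed it, but will not run it as a script.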

Now, if we want to remove the HTML outright, there is also a function for that. We can use the strip_tags() function which will remove all tags. We can also specify tags to exclude from removal, which would allow you to allow users to enter links, or text formatting tags, but not other tags. For example


$bio = strip_tags($_POST['bio']);//remove all html tags
//or
$bio = strip_tags($_POST['bio'], "<p><a>");//allow links and paragraph tags
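
To see the difference between the two calls, here is a small sketch run on a sample string (made up for illustration) instead of real form input:

```php
<?php
$input = '<p>Hello <b>world</b>, visit <a href="http://example.com">my site</a></p>';

$plain = strip_tags($input);
$some  = strip_tags($input, "<p><a>");

echo $plain . "\n";
// Hello world, visit my site

echo $some . "\n";
// <p>Hello world, visit <a href="http://example.com">my site</a></p>
```

Keep in mind that strip_tags() leaves the attributes of allowed tags alone, so allowing a tag also allows whatever attributes the user puts on it.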

 

And that’s it for basic sanitation! After reading this and the previous article, you should be able to add some basic security to your website! Now, there is much more to the website security field than just this, but following these simple steps should allow you to protect your site against most basic attacks. Here are some links to the functions used:

 

mysql_real_escape_string(): http://php.net/manual/en/function.mysql-real-escape-string.php

htmlentities(): http://php.net/manual/en/function.htmlentities.php

intval()/floatval(): http://php.net/manual/en/function.intval.php  http://www.php.net/manual/en/function.floatval.php

strip_tags(): http://php.net/manual/en/function.strip-tags.php

 

 

 

Sanitizing and validating user data to make it safe for processing is a very hot topic today, and not at all an exact science. It is very important, as many websites out there are susceptible to many different types of attacks since they don’t try to sanitize or verify any of the data they get from a user.

Now, this is a huge topic, and truly beyond the scope of a single blog post, but I am going to go through the simplest cases which should help most amateur or new web developers protect their site. To make sense out of this tutorial, you should have a basic understanding of:

  • basic PHP syntax
  • how to get input into the $_GET and $_POST super globals
  • basic string manipulation

Validation is important because it lets you reject input that doesn’t make sense before you do anything else with it. This can make sanitizing much simpler, because you have a rough idea of what to expect at that point. Since validation usually comes first in PHP scripts, I will go over that first.

Note: I will be going over the PHP side of things, and won’t really be paying attention to or talking about creating the forms on the front end. That is simple enough and you should know how to do this before reading any further. I will also only be going over validation. I will make my next post about sanitization to complete the tutorial.

We don’t need no.. Validation!

So what is validation exactly? Well consider this: Let’s say we have a simple user signup form. This form accepts an email address. We can validate this particular part of the form by making sure that it’s an actual email address and not some garbage. If it isn’t a valid email address, we can stop the script from inserting stuff into our database, and let the user know through some error message that the email is invalid!

Now, validating an email address may seem like a no-brainer, and most secure sites on the internet already do this. Now the question becomes how? Well for emails, while doing simple string checks (like checking for the position of the @ character, or making sure there are no random characters that don’t belong in an email address) can work, the best way is to use what are called regular expressions, or regex. Regex itself is a very broad topic, and so I won’t really be going over it beyond a simple explanation. So what is regex? Well at its base it’s a mathematical concept for describing a pattern and then checking whether a string matches that pattern. You can look at a tutorial for it here: http://www.regular-expressions.info/tutorial.html

So, what kind of pattern do we need for an email address? Well luckily for us and most new web developers, the pattern to match an email address is a commonly used one, and so a quick Google search will find us the answer! Consider the following:


// this is the pattern we will use to determine if the input is an email address
$pattern = '/^[a-zA-Z0-9._-]+@[a-zA-Z0-9-]+\.[a-zA-Z.]{2,5}$/';

// it may look scary, but don't worry, you don't have to change it or understand it to use it!
// it's a good idea to read through a regex tutorial so it at least makes a little bit of sense though

$email = isset($_POST['email']) ? $_POST['email'] : ''; // the input from our form (empty if the field is missing)

if (!preg_match($pattern, $email)) {
    echo "Your email was invalid! Please press the back button and enter a valid email address";
    exit();
}

So in this code we created our pattern that would match the email address. Then we extracted the input email address from the $_POST super global array, and tried to match them. If they did match, no error would pop up, but if they didn’t then the user would be given an error message and told to re-enter the email address. This is validation folks! A very simple implementation, but validation nonetheless. The pattern may seem scary, but don’t worry about it. We won’t ever really need to change it, and I will go over a simpler regex pattern with a more comprehensive description farther down.
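
If you would rather not maintain an email pattern yourself, PHP also ships with filter_var() and the FILTER_VALIDATE_EMAIL filter. The sketch below is an alternative to the regex above, not a replacement for understanding it. The helper name is_valid_email() is hypothetical, chosen just for this example.

```php
<?php
// Hypothetical helper (not a PHP built-in): true when $email looks like a valid address.
function is_valid_email($email)
{
    // filter_var() returns the filtered value on success, or false on failure.
    return filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
}

var_dump(is_valid_email('user@example.com'));     // bool(true)
var_dump(is_valid_email('user+tag@example.com')); // bool(true)
var_dump(is_valid_email('not an email'));         // bool(false)
```

Note that the second address contains a + sign, which the simple pattern above would reject because + is not in its character class.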

Ok so we know how to validate email addresses. What about other information? Well, besides email addresses, validating other data depends on what you expect, and what rules you have given your users for that specific field. So for example, on one site, a username may have to be at least 6 characters while on another it may have to be at least 4. Some accept only numbers and letters, while others allow dots and such. Let’s go over some of the most common ways to validate other fields to give you an idea of what to do, and then you can adapt what you have learned to your specific site.

Username/Password minimum length

This is a very common and very easy type of validation. For this, we will use the strlen function. Consider the following


$min_uname_len = 5; // usernames must be at least 5 characters long
$min_pword_len = 4; // passwords must be at least 4 characters long

$username = $_POST['username'];
$password = $_POST['password'];

if (strlen($username) < $min_uname_len) {
    // error!
    echo "Usernames must be at least $min_uname_len characters long. Please go back and try again!";
    exit();
}
if (strlen($password) < $min_pword_len) {
    // error!
    echo "Passwords must be at least $min_pword_len characters long. Please go back and try again!";
    exit();
}

In the code above, we simply use the strlen function to make sure that the password and username are at least the minimum required lengths in order to continue. If they aren’t, like with the email validation, we show the user an error, and stop the execution of the script.
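
Stopping at the first problem means the user only finds out about one mistake at a time. As a variation, here is a rough sketch that collects every length error before reporting anything. The function name check_lengths() and its default limits are assumptions made for this example, so swap in your own rules.

```php
<?php
// Hypothetical helper: returns a list of error messages (empty when the input is fine).
function check_lengths($username, $password, $min_uname_len = 5, $min_pword_len = 4)
{
    $errors = array();
    if (strlen($username) < $min_uname_len) {
        $errors[] = "Usernames must be at least $min_uname_len characters long.";
    }
    if (strlen($password) < $min_pword_len) {
        $errors[] = "Passwords must be at least $min_pword_len characters long.";
    }
    return $errors;
}

// Both values are too short here, so both messages are printed.
foreach (check_lengths('bob', 'abc') as $error) {
    echo $error . "\n";
}
```

In a real script you would pass in the values from $_POST and only continue when the returned array is empty.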

Restricted Characters

This is another very common type of validation that consists of not letting the user use certain characters in their input, or restricting all characters but alpha numeric characters. This, like the above, can be different from site to site, but I will go over a simple case, and you can try to figure out how to adapt it to your specific site.

Here I will use regex again, but it will be a much simpler pattern than the one we used with email. I will try to explain it so that you can alter it to fit your needs. Consider the following:


$pattern = '/^[a-zA-Z0-9_]+$/';
$username = $_POST['username']; // we want to restrict all characters but numbers, letters, and underscores in the username

if (!preg_match($pattern, $username)) {
    echo "You can only use numbers, letters and underscores in your username!";
    exit();
}

So what does that regex mean? Well the / character is simply a delimiter, which is required in PHP. / is the most commonly used delimiter but you can use any delimiter you want (but if you use a delimiter that may appear in your pattern, you have to escape it in the pattern. Since / doesn’t appear at all in our pattern, we don’t have to escape anything, and our regex looks cleaner!). The [ and ] characters signify that we are using a character class. What this means is that any one of the characters specified in that class will match. The inside of the character class is pretty straightforward. We use what’s called a range of characters (the first one being lowercase a to lowercase z, or a-z as we have written. The second is uppercase letters, and the last is numbers). After the three ranges, we also stick an underscore at the end so we can allow underscores to be used in the username. The + after the class means “one or more of these characters”, and the ^ and $ tie the pattern to the start and the end of the string. Those last pieces are important: without them the pattern would match as soon as a single allowed character appeared anywhere in the username. With them, if the string contains any character that isn’t in the class, then the string will not match. Not too scary, right?
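
Here is a small sketch of how a character class behaves once the pattern is anchored with ^ (start of string) and $ (end of string), compared with the same class left unanchored. The helper name is_valid_username() is hypothetical, and the sample usernames are made up.

```php
<?php
// Hypothetical helper: true when $username contains only letters, numbers and underscores.
function is_valid_username($username)
{
    // ^ and $ tie the match to the whole string; + requires at least one character.
    return preg_match('/^[a-zA-Z0-9_]+$/', $username) === 1;
}

var_dump(is_valid_username('john_doe99')); // bool(true)
var_dump(is_valid_username('john doe!'));  // bool(false)

// Without the anchors, one allowed character anywhere is enough for a match.
var_dump(preg_match('/[a-zA-Z0-9_]/', 'john doe!')); // int(1)
```

The last line is the reason a pattern meant to restrict characters should describe the whole string, not just a piece of it.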

And like the rest of the validation code blocks, we show the user an error and exit the script if the username isn’t valid. Regular expressions are an extremely valuable tool, and can be made to match many different types of strings. Here we show how to match based on what characters you will allow, but you could easily alter it to list the characters you won’t allow instead. This I will let you do some research on to try and find out (doing your own research and experimentation is important and a great learning exercise!). If you are having a hard time figuring it out, then shoot me a comment or email and I will try to help out!

The second part of this article can be read here: http://blackscorner.me/2011/08/04/when-and-how-to-sanitize-your-php-variables/

If you have any questions/comments, please don’t be afraid to share! Hope you enjoyed reading this tutorial and learned from it!