Today we take a short break from learning PHP. I would think that the bulk of the people reading these tutorials would like to learn about writing their own database driven web applications using PHP (that would be online shopping carts, message boards, and most any other use you would have for PHP scripts on the internet). There are definatly more concepts in PHP that we need to cover (looping, arrays, etc…) but I thought we could get a little background on a database server and then use a real world example to introduce looping in PHP later.
Today, we will talk ONLY about the database. Specifically, we will use the MySQL database server. This is a great open source database system that you can download from www.mysql.com. Next, things are MUCH easier if you have a nice interface to work with. By default, MySQL doesn’t come with a GUI interface. You can download their inteface, but last I checked, you needed two programs to do everything, and they were really more in the beta phase. Don’t judge the quality of the database server by the quality of these tools, they are new and I suspect written by a different programmer. In any case, SQL Yog is a great one. Get it at www.webyog.com. Download the free version.
If you already have a web server through a web hosting service, chances are that you already have MySQL available. You may need to look through the management interface that they provide to create a database or to set your passwords. Also, you may want to look at the option that allows you to specify what computers users can connect from. By default, many are set to local host. This is more secure, but it means that connections can only occur from software running on that server. If you want to use SQLYog, you will need to put in your IP, or put in the wildcard % to allow any IP to connect. Be aware that this means that you are loosening your security by doing so, so be careful. You can’t hold me responsible for messing up your own system 🙂
If you already have MySQL provided by your web hosting company, they may also have a management interface installed for you. PHPMyAdmin is a popular web based interface and will work great for our examples. In either case, you may have to log in with your username and password. The server (if needed) will be the name of the computer where the database server is located (localhost works if it is on the current computer, or you can use the IP address also). Real database servers have security built in from the start, not as an after thought like in Access or Excell. This is great for allowing multiple users or keeping hackers out.
Now for a little background. A database server is different than Excell and is also different from Access. It is BETTER! Of course, there is no need for a full featured database server for some tasks, but make no mistake, Access can’t hold a candle to a real database server like MySQL, MSSQL, or Oracle. The point of a database server isn’t to look pretty, it is super optimized to find and crunch data. Period.
At first, it will seem limited, especially for those who have worked with Excell or Access (“How do I format my page?”, “Why can’t I put an equation in this field?”, “Why can’t I create forms like in Access?”). Let me just say that a database was made to do one thing… Crunch DATA!! DATA, DATA, DATA!!!! All those other features present in other apps are nice, but they would interfere with the insane speed of the data processing in a real server. Try putting 1 million records into an Access database, then do the same thing in a MySQL server. You would be amazed at the difference in speed. You wouldn’t even dream of creating an Excell file that big.
A database server is the software that does the work. Internally it may hold several databases. Each database can be thought of as an Excel workbook (Excel file). It doesn’t actually hold data, it is just a container for tables which do hold data. By breaking things up into several databases, you can segregate the information and apply different security settings for each database. This means that I can have a database full of my companies information on the same server as your companies information and not have to worry about you getting into my stuff (try sharing an Access database and keeping your users limited to only certain data, or sharing an Excel file and only allow users to view certain sheets!). In these databases, you can put tables. A table is similar to an Excel work sheet in that you put your data there (it has columns and rows).
So how do I let my users enter data? You don’t give them direct access to the database. Instead, you use some language such as PHP to work with and manipulate the data via SQL statements. Then, in the case of PHP, you use HTML tags to beautify the information and make it appealing and easy to work with. The presentation layer (integrated in Access as forms) is a seperate function and is not included directly in the database server itself. This allows you to have a C++ application that connect to your database at the same time that your PHP web application is showing your products in real time online. You have many choices. With Access, you are stuck doing things the Access way and any one who wants to use the Access database has to have purchased the Access software and installed it on their machine to use it (you can’t open an Access MDB file without the Access software). With a database server, you can pick your language and development environment and make your front end look and act any way you want.
The primary method of communicating with a database server (There is a published standard even!) is via the SQL language. It isn’t a full featured language like a programming language, but it is decriptive enough to allow you to describe what you want the database to do. You can do just about anything on the database by using SQL statements (even manage it) but that is a bit overkill. It is much easier to use a GUI interface (PHPMyAdmin or SQLYog) for many of the menial tasks such as creating users.
The first task is to create a new database. A database server can host many databases. The security measures in place allow you to sandbox users in their own database and have full control over their own area. For those of you with a management interface on your web server, you may need to use the create new database function. Your user account itself doesn’t have permissions to creeate a database directly. Look around (not in PHPMyAdmin!). For the rest of you, there is probly an area to create a new database by entering the name. These interfaces simply take the name you specify and run the CREATE DATABASE command which would look something like this:
I generally use CAPITALS for all SQL keywords as this helps your eye distinguish between the different elements. This is not a requirement of SQL, but I recommend it.
This SQL statement can be entered into your SQL GUI (SQLYog or PHPMyAdmin). In some environments, as mentioned above, you may not have access to run this command directly (use your web server management software to create the database or contact your hosting company to get setup). If it works, you should see your database listed.
Next, we need to create a table. The Database itself (the thing you just created, not the server) is just a container that things can go in. The tables are where the actual data is stored. Think of it as an Excell work sheet. The Excell file (generally called a work book) can hold many work sheets (sheet1, sheet2, etc…). The Excell file corresponds to the Database while the individual sheets correspond to a table. Now that I have used that analogy, don’t make assumptions about the capabilities of a database table based on the cool calculations you created in Excell, that is where the similarities end.
To creat a table, the easiest way is to use the GUI interface and just punch in the fields. We will do a simple table the hard way (using SQL statements) to show you how its done. Each table is made up of columns (think of the spread sheet). A big difference between Excell and a database is that Excell has what appears to be infinite columns and rows already available. In each of these columns and rows, you can place ANY type of data (1, 5, 2.7, “howdy”, etc..). There is no constraint. In a database table, you have to define each column including its data type. When the data type is set, ONLY that type of information may be stored there (attempts to put incorrectly formatted information will either result in an error or result in that field value being thrown away!).
Ok, I know, it sounds very limiting. Why would you impose these rules? One word, Speed! By imposing these limits, the database server can take advantage of the characteristics of computing systems. The way the information is stored on the physical medium (hard disks!) is now predictable! We can now predict that each row will be exactly XXX number of bytes long and it makes it easy to figure out where row 100 is on the hard drive without having to wade through every record to find what we are looking for. Additionally, databases use indexing as another way to pre-calculate those jump points. For a small Excell file, this could be overkil, but for a multi million record database (think of Hotmails list of users!) this is absolutely necessary. Could you imagine waiting 5 minutes for hotmail to look up your user information when you were logging in?
Again, remember that the presentation layer (PHP/HTML) allows you to manipulate and beautify this information prior to presenting it to your users, so it isn’t quite as limiting as it sounds. And for those Access junkies, our presentation layer is more versatile and powerfull than yours (sorry, not an opinion, it is a fact, we can plug in the language/tools of choice).
The last thing to note before moving on. To take advantage of the databases strenghts, we need to have at least one indexed column. The first one is generally called the Primary Key. This column identifies each record uniquely. Usually just a numeric column is adequate and most database servers support an auto increment feature which means that a new record will just get the next available number (you don’t have to try and figure it out). Oracle likes sequences, which is a little bit hokey, but gives you more flexibility (they like things to be more powerfull and more difficult 🙂 ).
Ok, here it is at last, the SQL statement to create a table…
The create table statement will cause the database to make a new table with the specified columns and options. First, there is the name `mytable`. This helps you to identify which data you want (think of sheet 1, sheet 2 in Excell) and will be used in later queries.
Note that the ` symbol is used to surround our table and column names. This isn’t needed most of the time, but the ` symbol works kind of like a quote. This helps keep your column name from being mistakenly interperated as a key word. For example, if you wanted a table named bigint (look at the ID column), this could confuse the server, so you could put ` symbols around it to give the server a clue that you want that as a name not a data type.
The table we created will have 4 columns. The first column is a numeric column called ID. The data type in this case is a bigint. Big ints in MySQL are I believe 64 bits. A 32 bit integer can have a range of about -2 billion to postive 2 billion (you can look up the exact number) so a 64 bit integer can be REALLY big! If you make a 32 bit number unsigned, then you can have a range of zero to 4 billion (signed numbers use 1 bit to indicate a positive or negative, therefore the difference in ranges).
The reason for our first column is so that we can have an indexed primary key. The indexing process is what makes the searches REALLY fast (familliar with google desktop search? it finds stuff really fast on your computer because it has previously indexed it for you). Also, this column is set to Auto Increment which means we don’t have to specify the number, it will fill one in for us (the next number in line). With this we can uniquely identify each record (we could theoretically have two users named bobs, though in practice this isn’t a good idea). Really this column is a maintenance type column and doesn’t ever have to be exposed to your database users (know what your unique identifier is for your hotmail account? not including your email address ). This column can ONLY hold whole numbers (integers).
The next column (user_name) is defined as a varchar. Varchar columns can be zero to 255 characters long in MySQL. The number (varchar(200)) defines the max length of the values held in this column. This means that some one can enter a user name of 1 to 200 characters in length. If the user name is over 200 characters, it is cut off at the 200th character. 200 characters is a bit over kill for a user name, so there ends up being some wasted space in each record. The idea of a varchar versus a char field is that the record will shrink (take some of the air out and not waste the space) where a char field would allocate all 200 characters even if you only used 1. Char fields can be searched faster, but it can waste space. I generally just use varchar for most purposes. If you needed to hold a LARGE ammount of information (a whole document, something more than 255 characters) you could use a TEXT field.
The password field is similar, and we have an insane number of characters specified here also (200). You can have a whole poem as your password according to this. One thing that we programmers like to do is obfuscate the actual data when it comes to sensitive information such as passwords. We would rather that if some one did happen to look at the contents of the table, that every system on the network wouldn’t be compromised (most people use the same password in many different places). One way to mask the password is to use a hashing algorithm. It is an encryption that can’t be reversed. Also, most hashes (if not all?) create a hash of the same number of characters EVERY TIME regardless of the ammount of data you feed it. For instance, you could give the hashing algorithm a two gigabyte file and it would would come out with a 40 character hash. If you hand the hashing algorithm 1 character, it would still come out with a 40 character hash. In the case of the two gigabyte file, if you change just one bit in the file, then you would get a completely different hash. This is also a good way to ensure that the file is in tact after a download (notice any MD5 hashes listed next to a download link?). The fact that these hashes are not reversable means that the only way to crack them is to start with the correct password, hash the password, then compare the two hashes to see if they match (this is exactly how some authentication mechanisms work). When you are trying to crack a hash, you would have to make every possibl guess until the hashes match (this could take weeks, month, or even years!).
Moving on, we have a column called age. This column we made a integer. This is actually over kill since most people don’t live past 100 (int ranges from -2 billion to +2 billion). Since it is an integer, we can’t have 2.5 years, only 2 or 3 (which is fine for what we want).
Lastly, we included the NOT NULL syntax on most columns. This tells the database that each column has to be set. A NULL value indicates an empty field (meaning NOT SET). This is different from 0, or from an emptry string (those are actually values). NULL is a complete lack of value. It is a nice feature in some cases, but unless there is a reason to use it it simplifies things to not use them. By using the NOT NULL option, we specify that the column has to have a value, even if it is a default value (like 0 or an empty string).
The final option specifies which column will be the primary key. The primary key just adds an index on that column, but in this case, it is the primary index.
Most SQL statements have a very similar logical layout. We start with the command (CREATE TABLE) then follow with the name of the table (mytable) then define the columns. Notice that the columns have the ( and ) symbols wrapped around them. This gives the db a clue that we are starting and ending the list of columns in the table. The parenthesis will be used more in future SQL statements.
Now that we have a table, we can start putting data in it. When you work with SQL as a programmer, you generally use a GUI (like SQLYog or PHPMyAdmin) to do your administrative tasks (creating users, tables, databases). When you write your software though, you need to know how to manipulate the data from your program. This is where you use SQL exclusivly. The bulk of your usage is constrained to four different types of access which include getting data (SELECT), changing data (UPDATE), adding data (INSERT), and removing data (DELETE). These are the four main types of SQL queries that you need to master to write database driven applications.
This article is getting long, so I will show an example of each query here, but I will break each one down in later articles. Notice the similarities between the four statements.
Computer Magic And Software Design