I recently got the book “MongoDB and PHP” by Steve Francia which had an interesting paragraph in the first chapter that talks about how the use of stored data has changed in recent years:
With all the problems with ORMs, you may wonder why programmers use them at all. People were willing to make the compromises to adopt ORMs for one big reason; PHP applications are by and large CRUD applications. Rarely do they use all of the rich features the relational database provides, so giving them up seemed a small price to pay for the benefit of simplified access to the data. Additionally, there weren’t really any other good options. For very simple projects, one could write SQL in one’s code, but this was hard to debug and even harder to ensure that it was done securely. PHP is famous for enabling SQL injection attacks, as inexperienced developers pass variables right into the SQL without sanitization.
Which led me to think about how my own use of data has changed over the years.
There are quite a few circumstances where you may need to force a site, app or page to serve over a secure SSL connection. In PHP development there are two ways to approach this problem depending on your setup. If you’re using Apache to serve your site and you have the ability to use .htaccess files then you can use a rewrite rule to simplify this task server side.
If you can, use the mod_rewrite rule below to enforce this server side.
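The original rule isn't preserved here, but a minimal sketch, assuming Apache with mod_rewrite enabled and an .htaccess file at the site root, would look something like this:

RewriteEngine On
# Only touch requests that are not already on HTTPS
RewriteCond %{HTTPS} off
# Permanently redirect to the same host and path over HTTPS
RewriteRule ^(.*)$ https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]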
You may be in the position where you only need to redirect a certain page (although you could easily modify the rewrite rule above to do that), or you may not have the ability to use .htaccess files; in either case you will need to handle this within your actual PHP script.
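Again, the original snippet isn't preserved here, but a minimal sketch of the in-script approach, relying only on the standard $_SERVER superglobal, might be:

<?php
// If the request did not arrive over HTTPS, bounce to the secure URL and stop
if (empty($_SERVER['HTTPS']) || $_SERVER['HTTPS'] === 'off') {
    $secureUrl = 'https://' . $_SERVER['HTTP_HOST'] . $_SERVER['REQUEST_URI'];
    header('Location: ' . $secureUrl, true, 301);
    exit;
}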
For a recent piece of PHP I was writing I needed to find the start and end timestamps for a given week. I came up with the following piece of code to get those timestamps.
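That snippet isn't preserved here either, but a sketch along the same lines, assuming the week runs Monday to Sunday in the server's timezone, could be:

<?php
// ISO day of the week: 1 (Monday) through 7 (Sunday)
$dayOfWeek = (int) date('N');

// Start of the week: wind back from today's midnight to Monday 00:00:00
$weekStart = strtotime('today -' . ($dayOfWeek - 1) . ' days');

// End of the week: one second before the following Monday, i.e. Sunday 23:59:59
$weekEnd = strtotime('+7 days', $weekStart) - 1;

echo date('r', $weekStart) . "\n";
echo date('r', $weekEnd) . "\n";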
This PHP script will post an update to Twitter using OAuth. You can download the complete script and library here.
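The full script isn't reproduced in this post, but the core of it, sketched here assuming Abraham Williams' TwitterOAuth library and placeholder credentials from your Twitter application settings, boils down to a few lines:

<?php
// Assumes the TwitterOAuth library is available at this (illustrative) path
require_once 'twitteroauth/twitteroauth.php';

// Placeholder credentials from your Twitter application settings
$consumerKey       = 'YOUR_CONSUMER_KEY';
$consumerSecret    = 'YOUR_CONSUMER_SECRET';
$accessToken       = 'YOUR_ACCESS_TOKEN';
$accessTokenSecret = 'YOUR_ACCESS_TOKEN_SECRET';

// Build an authenticated connection and post a status update
$connection = new TwitterOAuth($consumerKey, $consumerSecret, $accessToken, $accessTokenSecret);
$connection->post('statuses/update', array('status' => 'Posting to Twitter from PHP with OAuth.'));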
While working on the new cache pages for this site I came across the interesting task of parsing GPX files with PHP. SimpleXML is quite easy to use, and with the children() function you can work with namespace extensions – for example, those in GPX files exported from the Geocaching site. You can get a lot more information out of a geocache GPX file that way than with standard XML parsing.
The following is an example of a GPX file from the Geocaching site:
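The original file isn't reproduced here; the trimmed, illustrative fragment below only shows the shape of the data, with made-up values and the Groundspeak namespace extensions (the exact namespace version may differ between exports):

<?xml version="1.0" encoding="utf-8"?>
<gpx xmlns="http://www.topografix.com/GPX/1/0"
     xmlns:groundspeak="http://www.groundspeak.com/cache/1/0/1" version="1.0">
  <wpt lat="51.12345" lon="-1.54321">
    <name>GC1ABCD</name>
    <groundspeak:cache id="12345" available="True" archived="False">
      <groundspeak:name>Example Cache</groundspeak:name>
      <groundspeak:placed_by>someone</groundspeak:placed_by>
      <groundspeak:type>Traditional Cache</groundspeak:type>
      <groundspeak:container>Regular</groundspeak:container>
      <groundspeak:difficulty>2</groundspeak:difficulty>
      <groundspeak:terrain>3</groundspeak:terrain>
      <groundspeak:short_description>A short description of the cache.</groundspeak:short_description>
      <groundspeak:long_description>A longer description with parking notes and so on.</groundspeak:long_description>
      <groundspeak:encoded_hints>Under the big rock.</groundspeak:encoded_hints>
    </groundspeak:cache>
  </wpt>
</gpx>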
Now, as you can see, there is a large amount of data contained within this file. Using PHP and SimpleXML you can extract this data quite easily by doing something similar to this:
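Here is a sketch of that parsing, assuming a file named cache.gpx laid out like the fragment above (the groundspeak namespace URI should match whatever your export declares):

<?php
// Load the GPX file; the standard GPX elements live in the default namespace
$gpx = simplexml_load_file('cache.gpx');

foreach ($gpx->wpt as $wpt) {
    // Standard waypoint data from the GPX attributes and elements
    $lat  = (string) $wpt['lat'];
    $lon  = (string) $wpt['lon'];
    $code = (string) $wpt->name;

    // The Groundspeak extensions sit in their own namespace, so use children()
    $cache = $wpt->children('http://www.groundspeak.com/cache/1/0/1')->cache;

    $name       = (string) $cache->name;
    $difficulty = (string) $cache->difficulty;
    $terrain    = (string) $cache->terrain;
    $hint       = (string) $cache->encoded_hints;

    echo "$code ($lat, $lon): $name, D$difficulty/T$terrain - $hint\n";
}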
The best thing about building a project that gets popular is also the worst thing about a project that grows rapidly. Tweekly.fm is currently gaining 1,500 users per month, most of whom choose to publish on a Sunday. This increases the load and run time of the delivery engine that pushes tweets outbound. The other side of an increased user base is that user pages get more popular. Although this is a good thing overall and shows our popularity, it also has its downsides.
The queries behind the simple "people" and "shared by" counts are expensive to run. We're indexing just short of 25,000 user records, and running the queries in the original way slowed the servers to a crawl. My response was to rewrite the user pages entirely and introduce caching at a couple of levels.
The diagram below shows the original connections between our servers and our users. As the diagram shows, there was no caching present anywhere in the system. This caused problems under load and ended up with the user pages of the site being temporarily taken offline.
After a little research and number crunching I came to the conclusion that caching points were needed at both the database server and the website (abstraction layer). Implementing the ADOdb layer underneath the existing database links has proven to be a great success: it has boosted response times by over 300%. The next diagram shows the new caching points added to the system.
Point A is the abstraction layer at the site level. At peak times, the queries issued here are cached for an hour; this scales back to 30 minutes at low-load times. Point B is the connection between the web server and the database server, where queries are cached at the database constantly via the MySQL query cache. Finally, Point C is the connection to the Twitter API. All calls there are cached for 30 minutes where possible, but due to the nature of the API and the service in general I'm not currently caching too much here. What is not shown on the diagram is the connection to the last.fm API, because this data is automatically cached and updated when needed.
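As a rough sketch of what the Point A caching looks like in code, assuming ADOdb with its file-based query cache and a made-up query and table (the real queries and peak-time check are more involved):

<?php
// Assumes the ADOdb library is installed; the include path will vary
require_once 'adodb5/adodb.inc.php';

// ADOdb writes its query-cache files to this directory
global $ADODB_CACHE_DIR;
$ADODB_CACHE_DIR = '/tmp/adodb-cache';

$db = ADONewConnection('mysqli');
$db->Connect('localhost', 'dbuser', 'dbpass', 'tweekly');

// Hypothetical peak-time check: cache for an hour at peak, 30 minutes off-peak
$isPeak    = ((int) date('G')) >= 18;
$cacheSecs = $isPeak ? 3600 : 1800;

// CacheGetAll runs the query once, then serves the cached result set
// for subsequent calls until the cache expires
$rows = $db->CacheGetAll($cacheSecs, 'SELECT username, shared_count FROM user_stats');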
The effects of these changes will be noticeable to all users: the front end of the site will be a lot snappier and responses should be quicker too. Hopefully this is just one of many changes that will improve the system for everyone.