Scaling PHP in a High Load Environment
Added by Michael Morgan, last edited by Michael Morgan on May 09, 2006

Outline

  • Application structure.
    • Think twice, code once.
    • Keep your options open.
    • Don't hold out for the perfect app.
  • Server configuration.
    • To SSL or not to SSL?
    • .htaccess in production.
    • MySQL optimization.
  • The landslide model of scalability.
    • Optimize your application.
    • Eliminate redundant compiling.
    • Caching.
    • Distribute your app across multiple machines.
    • Distribute your database across multiple machines.
    • Distribute your cache across multiple machines.
    • Throw more hardware at it.
  • Resources.

Application structure

Think twice, code once

It helps to start out on the right foot. That means having a good application structure, planned from the beginning around solid requirements. Extra time spent asking "is this solving our problem?" often leads to whiteboard discussions that improve the overall result of your project. Yes, it seems obvious, but once you start coding it's easy to lose track of this, and it happens far too often.

That said, scalability is definitely desirable, and it's often worth adopting additional technologies to achieve it. But your app may not need them, at least in the short term.

Keep your options open

Eventually your web application will be uber-cool and serve hundreds of thousands of people a minute. Tomorrow it just has to work. That doesn't mean you can't design it with extensibility in mind, or leave hooks for future preprocessing of page output, caching, etc.

Remember that you will still be maintaining this code a year or two from now, and you won't want to redesign the entire layout of your application just to add the technology that makes it scale.

Some things you might want to leave wide open:

  • What database interface you use – you might want to change this later, or just write your own.
  • How you wrap your PHP execution and output – you might want to cache all of it, or localize it.
  • How you store configuration options – you will definitely be adding or redoing this later as you add more technology.
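One way to keep the database interface open is to send every query through a small wrapper of your own. The driver underneath can then be swapped later without touching the rest of the code. The following is a minimal, language-neutral sketch written in Python, with the standard sqlite3 module standing in for your real driver. The `Database` class and its `query`/`execute` methods are hypothetical names, not an existing PHP API.

```python
import sqlite3
from typing import Any, Sequence


class Database:
    """Hypothetical thin wrapper: application code calls only query()/execute(),
    so the driver behind them can be replaced without touching callers."""

    def __init__(self, dsn: str) -> None:
        # sqlite3 stands in for whatever driver you actually use.
        self._conn = sqlite3.connect(dsn)

    def query(self, sql: str, params: Sequence[Any] = ()) -> list[tuple]:
        """Run a read statement and return all rows."""
        return self._conn.execute(sql, params).fetchall()

    def execute(self, sql: str, params: Sequence[Any] = ()) -> int:
        """Run a write statement, commit it, and return the affected row count."""
        cur = self._conn.execute(sql, params)
        self._conn.commit()
        return cur.rowcount

    def close(self) -> None:
        self._conn.close()


if __name__ == "__main__":
    db = Database(":memory:")
    db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    db.execute("INSERT INTO users (name) VALUES (?)", ("alice",))
    print(db.query("SELECT id, name FROM users"))  # [(1, 'alice')]
    db.close()
```

A single entry point like this is also a natural place to add query logging or caching later.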

Don't hold out for the perfect app.

In your travels as a web developer, you will become familiar with compromise and the difference between a deliverable and a really cool idea. Don't let yourself fall into common traps:

  • Taking forever to choose anything because it's not perfect
  • Adding unnecessary abstraction when you don't really need it
  • Adding features that aren't a part of your requirements "just because it'd be cool"

If you're running with a surplus of time, and you aren't worried about a deadline or project funding, then have fun – but in the real world you'll be pushed to deliver, and you should keep a balance between what's ideal and what's practical.

A good example is choosing a database abstraction layer like PEAR::DB or ADOdb, which lets your application run on different RDBMSs (MySQL, PostgreSQL, Oracle). Always ask whether you really need one instead of PHP's built-in database functions. You'll most likely never migrate your database, and meanwhile you'll pay for that unused abstraction on every request.

The bottom line is to use what you need, but give yourself enough room to modify or replace it in the future. Don't go overboard trying to make your app the all-knowing, all-singing, all-dancing crap of the world.

Server configuration

To SSL or not to SSL?

Reserve SSL for protected pages that carry important or sensitive information (usernames, passwords, etc.). Don't make the mistake of serving static content or images over SSL when they don't need to be encrypted. Every encrypted connection costs extra CPU on the server, and that cost grows with your load.

.htaccess in production

It's best to put your rewrites and local configuration in your vhost .conf file rather than in an .htaccess file. When .htaccess files are allowed (via AllowOverride), Apache looks for and parses them in each directory along the request path on every request, whereas the vhost configuration is read once at startup. Keeping your Apache configuration in the .conf file avoids that per-request overhead.

At the same time, it helps to ship an htaccess.dist file so people can easily configure a dev sandbox if they want to work on your application. That's why packages like Gallery 2 and WordPress ship with an .htaccess file instead of saying "just edit your vhost conf file". Editing Apache's configuration is a pain, and many users don't have access to it at all.

MySQL optimization

MySQL's query cache is a good idea if your data sets aren't huge and are read much more often than they change. Read more about it here:
http://mysql.osuosl.org/doc/refman/4.1/en/query-cache.html

Other settings, such as connection limits (max_connections) and memory buffer sizes, also cap what your database server can handle. Be aware of them and tune them for your load.
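The query cache favors read-heavy data because of how it invalidates entries. MySQL stores each SELECT result keyed by the exact query text, and any change to a table discards every cached result that uses that table. The toy model below shows that rule in Python. The `QueryCache` class and its crude regex for finding table names are illustrative assumptions, not MySQL's implementation.

```python
import re
from typing import Callable


class QueryCache:
    """Toy model of MySQL query-cache semantics (illustrative only):
    results are keyed by exact SQL text, and a write to a table evicts
    every cached result that reads from that table."""

    def __init__(self) -> None:
        self._results: dict[str, list] = {}
        self._by_table: dict[str, set[str]] = {}  # table -> cached SQL texts

    @staticmethod
    def _tables_in(sql: str) -> set[str]:
        # Crude assumption: table names follow FROM, JOIN, UPDATE or INTO.
        found = re.findall(r"\b(?:FROM|JOIN|UPDATE|INTO)\s+(\w+)", sql, re.I)
        return {name.lower() for name in found}

    def select(self, sql: str, run: Callable[[str], list]) -> list:
        if sql in self._results:          # hit: identical text seen before
            return self._results[sql]
        rows = run(sql)                   # miss: ask the database
        self._results[sql] = rows
        for table in self._tables_in(sql):
            self._by_table.setdefault(table, set()).add(sql)
        return rows

    def write(self, sql: str, run: Callable[[str], object]) -> None:
        run(sql)
        # Any write to a table invalidates all cached reads of that table.
        for table in self._tables_in(sql):
            for cached_sql in self._by_table.pop(table, set()):
                self._results.pop(cached_sql, None)


if __name__ == "__main__":
    calls = []

    def fake_db(sql):
        calls.append(sql)
        return [("row",)]

    cache = QueryCache()
    q = "SELECT * FROM posts"
    cache.select(q, fake_db)
    cache.select(q, fake_db)              # served from cache
    cache.write("UPDATE posts SET title = 'x'", fake_db)
    cache.select(q, fake_db)              # re-run after invalidation
    print(len(calls))  # 3
```

On a table that changes constantly, almost every SELECT becomes a miss, and the cache only adds bookkeeping.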

The landslide model of scalability

There are some typical steps you'll take once your application starts running into scalability problems. This section covers them in the order in which the bottlenecks typically appear:

  • Your application
  • PHP
  • Application server
  • Database server

Optimize your application

The first obvious thing you can do is to make sure your application isn't wasting resources. This means looking for things like:

  • Redundant queries
  • Bad or pointless looping
  • Unnecessary database connections

Once you are sure the code is reasonably well written, it's time to measure how much time your application spends on each task. A profiler such as APD (Advanced PHP Debugger) or the Zend Profiler will show you this. APD is free, so I would suggest starting there; there is a beta how-to on the APD site.

I would also check out this article from Linux Journal for more tips, as well as the PHP documentation on APD.
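APD and the Zend Profiler are PHP-specific, but the workflow is the same in any language: run a request under a profiler and sort the functions by time spent. As a stand-in, the following sketch shows that workflow with Python's standard cProfile and pstats modules. `handle_request`, `slow_helper` and `fast_helper` are hypothetical placeholders for your own code.

```python
import cProfile
import io
import pstats
import time


def slow_helper() -> None:
    time.sleep(0.05)          # stands in for an expensive query or loop


def fast_helper() -> int:
    return sum(range(1000))


def handle_request() -> None:
    """Hypothetical request handler whose time we want to break down."""
    slow_helper()
    fast_helper()


def profile_report(func, limit: int = 5) -> str:
    """Run func under the profiler and return the top `limit` entries as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    out = io.StringIO()
    # Sort by cumulative time so the most expensive call paths come first.
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(limit)
    return out.getvalue()


if __name__ == "__main__":
    print(profile_report(handle_request))
```

In the report, `slow_helper` dominates the cumulative time, which tells you where to look first.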

Eliminate redundant compiling

The cheapest way to get more out of your app is to use a bytecode (opcode) cache like eAccelerator or APC.

These packages keep the compiled bytecode of your scripts in shared memory, so files that are loaded on every request (includes!) aren't recompiled each time. Out of the box, PHP recompiles every script on every request, which doesn't scale well at all, especially for large libraries like PEAR::DB or Smarty. Using one of these caches will speed up your apps considerably, which is why so many PHP sites run one. It's a no-brainer.
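To make the mechanism concrete, here is a toy model in Python. It compiles a source file once, keeps the compiled code object keyed by path and modification time, and reuses it until the file changes. The `CompiledCache` class is a hypothetical illustration built on Python's built-in `compile()`. It does not show how APC or eAccelerator work internally; they cache PHP opcodes in shared memory across processes.

```python
import os


class CompiledCache:
    """Compile each source file once and reuse the code object until the
    file's modification time changes (like an opcode cache's stat check)."""

    def __init__(self) -> None:
        self._cache: dict[str, tuple[int, object]] = {}
        self.compiles = 0                    # how many times we actually compiled

    def load(self, path: str):
        mtime = os.stat(path).st_mtime_ns
        cached = self._cache.get(path)
        if cached is not None and cached[0] == mtime:
            return cached[1]                 # hit: skip reading and compiling
        with open(path, encoding="utf-8") as f:
            code = compile(f.read(), path, "exec")
        self.compiles += 1
        self._cache[path] = (mtime, code)
        return code

    def run(self, path: str) -> dict:
        namespace: dict = {}
        exec(self.load(path), namespace)     # execution still happens every time
        return namespace


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "page.py")
        with open(path, "w", encoding="utf-8") as f:
            f.write("greeting = 'hello'\n")
        cache = CompiledCache()
        for _ in range(100):                 # 100 "requests"
            cache.run(path)
        print(cache.compiles)                # 1
```

Execution still runs on every request; only the parse-and-compile step is skipped, and that is the saving an opcode cache provides.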

Caching

There are a variety of caching options, and places where you can implement a cache:

  • Page output
  • Object caching
  • Database query caching

These can be implemented with many software and hardware tools. The free and popular ones we encountered were:

  • Smarty
  • Cache_Lite
  • memcached

Of the three, memcached was the clear winner: it was the fastest and most scalable option we tried, and it can be used for caching at any of these levels.

Typically, caching page output will save your app a lot of cycles, but once that is no longer enough, there are still places you can go.
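Page-output caching usually means storing the rendered HTML keyed by URL and serving the stored copy until it expires. The following minimal in-process sketch is in Python. The `PageCache` class, the 60-second TTL and the `render` callback are assumptions for illustration. In production, the dictionary would be replaced by a shared store such as memcached so that every web server sees the same cache.

```python
import time
from typing import Callable


class PageCache:
    """Cache rendered page output per URL for `ttl` seconds."""

    def __init__(self, ttl: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self._clock = clock                   # injectable so expiry can be tested
        self._store: dict[str, tuple[float, str]] = {}

    def get_or_render(self, url: str, render: Callable[[str], str]) -> str:
        now = self._clock()
        entry = self._store.get(url)
        if entry is not None and entry[0] > now:
            return entry[1]                   # fresh copy: skip all page-building work
        html = render(url)                    # miss or expired: build the page
        self._store[url] = (now + self.ttl, html)
        return html

    def invalidate(self, url: str) -> None:
        """Drop a page early, e.g. after its content is edited."""
        self._store.pop(url, None)


if __name__ == "__main__":
    renders = []

    def render(url: str) -> str:
        renders.append(url)
        return f"<html>{url}</html>"

    cache = PageCache(ttl=60)
    for _ in range(1000):
        cache.get_or_render("/index", render)
    print(len(renders))  # 1
```

The TTL trades freshness for load. Explicit invalidation lets edited pages update immediately without shortening the TTL for everything else.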

Distribute your app across multiple machines

Your application layer, Apache plus PHP, can be spread across multiple machines. The main challenge is moving away from PHP functionality that assumes a single server. The one I've faced most often is moving from PHP's default session handler, which stores session data in files on the local server, to a database-driven session handler that all of your web servers can share.
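PHP lets you replace its session storage through `session_set_save_handler()`, which takes open, close, read, write, destroy and garbage-collection callbacks. The following Python sketch shows the storage side of such a handler, backed by a shared SQL table. sqlite3 stands in for the shared database server. The `sessions` table layout and the `DbSessionStore` class are hypothetical.

```python
import os
import sqlite3
import tempfile
import time


class DbSessionStore:
    """Session data in a table every web server can reach, instead of local files."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS sessions ("
            " id TEXT PRIMARY KEY, data TEXT NOT NULL, updated REAL NOT NULL)")

    def read(self, sid: str) -> str:
        row = self.conn.execute(
            "SELECT data FROM sessions WHERE id = ?", (sid,)).fetchone()
        return row[0] if row else ""          # empty string when there is no session

    def write(self, sid: str, data: str) -> None:
        # SQLite upsert syntax; MySQL offers REPLACE INTO for the same effect.
        self.conn.execute(
            "INSERT OR REPLACE INTO sessions (id, data, updated) VALUES (?, ?, ?)",
            (sid, data, time.time()))
        self.conn.commit()

    def destroy(self, sid: str) -> None:
        self.conn.execute("DELETE FROM sessions WHERE id = ?", (sid,))
        self.conn.commit()

    def gc(self, max_lifetime: float) -> int:
        """Delete sessions idle longer than max_lifetime seconds; return the count."""
        cur = self.conn.execute(
            "DELETE FROM sessions WHERE updated < ?", (time.time() - max_lifetime,))
        self.conn.commit()
        return cur.rowcount


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "sessions.db")
    web1 = DbSessionStore(sqlite3.connect(path))   # "server 1"
    web2 = DbSessionStore(sqlite3.connect(path))   # "server 2", same database
    web1.write("abc123", "user_id|i:42;")
    print(web2.read("abc123"))                      # user_id|i:42;
```

Because both "servers" read the same table, a user's next request can land on either machine without losing the session.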

Distribute your database across multiple machines

You can replicate your database and spread reads across multiple nodes, in a cluster or via round-robin DNS (RRDNS), but writes will always remain a single bottleneck, and that can only be battled with hardware.
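In application code, spreading reads often comes down to a small router: statements that modify data go to the write server, and SELECTs rotate across the read replicas. The following minimal Python sketch makes several assumptions. The host names are placeholders, the `ReadWriteRouter` class is hypothetical, and classifying statements by their first keyword is a simplification. Replicas can also lag behind, so a read that must see a just-committed write may need to go to the write server as well.

```python
import itertools


class ReadWriteRouter:
    """Pick a database host per statement: writes to the master, reads round-robin."""

    READ_KEYWORDS = ("SELECT", "SHOW", "DESCRIBE", "EXPLAIN")

    def __init__(self, master: str, replicas: list[str]) -> None:
        self.master = master
        # Fall back to the master if there are no replicas yet.
        self._replicas = itertools.cycle(replicas or [master])

    def host_for(self, sql: str) -> str:
        words = sql.lstrip().split(None, 1)
        first = words[0].upper() if words else ""
        if first in self.READ_KEYWORDS:
            return next(self._replicas)
        return self.master


if __name__ == "__main__":
    router = ReadWriteRouter("db-master", ["db-replica1", "db-replica2"])
    for sql in ["SELECT * FROM posts", "INSERT INTO posts VALUES (1)",
                "select * from users", "UPDATE users SET name = 'x'"]:
        print(router.host_for(sql), "<-", sql)
```

Adding read capacity then means adding a replica to the list, while the write path stays the single bottleneck.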

Distribute your cache across multiple machines

memcached lets you run instances on multiple machines and spread your cached data across them, which removes the cache bottleneck that can form at any single node.
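memcached servers don't coordinate with each other. The client decides which server holds each key by hashing the key. The following minimal Python sketch shows one common approach, a consistent-hash ring. With a ring, adding or removing a server moves only the keys that server claims, whereas a plain hash-modulo-N scheme would remap most keys. The server addresses and the `HashRing` class are illustrative. Real memcached clients implement their own key distribution, often with a consistent-hashing option.

```python
import bisect
import hashlib


class HashRing:
    """Consistent hashing: map each key to one of several cache servers."""

    def __init__(self, servers: list[str], replicas: int = 100) -> None:
        if not servers:
            raise ValueError("need at least one server")
        self._ring: list[tuple[int, str]] = []
        for server in servers:
            # Several virtual points per server spread the load more evenly.
            for i in range(replicas):
                self._ring.append((self._hash(f"{server}#{i}"), server))
        self._ring.sort()
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(value: str) -> int:
        # Any stable hash works; take 64 bits of SHA-256.
        return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")

    def server_for(self, key: str) -> str:
        # First ring point at or after the key's hash, wrapping at the end.
        idx = bisect.bisect(self._points, self._hash(key)) % len(self._points)
        return self._ring[idx][1]


if __name__ == "__main__":
    old = ["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"]
    before = HashRing(old)
    after = HashRing(old + ["10.0.0.4:11211"])
    keys = [f"user:{n}" for n in range(10000)]
    moved = [k for k in keys if before.server_for(k) != after.server_for(k)]
    # Every key that moved now lives on the newly added server.
    print(f"{len(moved) / len(keys):.0%} of keys moved")
```

Because existing servers keep most of their keys, adding a cache node doesn't empty the whole cache at once.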

Throw more hardware at it.

From here, it's just a matter of money. Large, high-traffic sites (LiveJournal, for example) can afford to support their apps with hundreds of machines at each tier. To do that, and do it well, you need a scalable application and cache. Given the right application structure and enough money, it's possible to scale anything.

For most of us, though, a few easy steps, like profiling with APD and caching with memcached, can buy enough headroom that we won't need another machine, at least for now.

Resources
