Scaling PHP in a High Load Environment
Current by Michael Morgan
on May 09, 2006 17:43.


 h1. Outline
  
 * [Application structure|#Application structure].
 ** Think twice, code once.
 ** Keep your options open.
 ** Don't hold out for the perfect app.
 * [Server configuration|#Server configuration].
 ** To SSL or not to SSL?
 ** .htaccess in production.
 ** MySQL optimization.
 * [The landslide model of scalability|#The landslide model of scalability].
 ** Optimize your application.
 ** Eliminate redundant compiling.
 ** Caching.
 ** Distribute your app across multiple machines.
 ** Distribute your database across multiple machines.
 ** Distribute your cache across multiple machines.
 ** Throw more hardware at it.
 * [Resources|#Resources].
  
 h1. Application structure
 h2. Think twice, code once
 It helps to start out on the right foot: a good application structure, planned from the beginning against solid requirements. Extra time spent asking, "Is this solving our problem?" will often lead to whiteboard discussions that improve the overall result of your project. It seems obvious, but once you get coding it's easy to lose track of this, and it happens too often.
  
 That said, scalability is definitely desirable, and it's often worth using additional technologies to achieve it. It just may not be necessary for your app, at least in the short term.
  
 h2. Keep your options open
 Eventually your web application will be uber-cool and be serving up hundreds of thousands of people a minute. Tomorrow it just has to work. But that doesn't mean you can't design your application with extensibility in mind, or leave hooks for future preprocessing of page output, caching, etc.
  
 Remember that you will have to deal with your code a year from now, two years from now, and you may not want to have to redesign the entire layout of your application to add some additional technology to make your application scalable.
  
 Some things you might want to leave wide open:
 * What database interface you use -- you might want to change this later, or just write your own.
 * How you wrap your PHP execution and output -- you might want to cache all of it, or localize it.
 * How you store configuration options -- you will definitely be adding or redoing this later as you add more technology.
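
As a rough illustration of leaving the database interface open, here is a minimal sketch of a thin wrapper. It is a few lines of your own code, not a full abstraction library. The names {{Database}}, {{PdoDatabase}}, {{CannedDatabase}}, and {{addon_names()}}, and the {{addons}} table, are hypothetical and only there for the example.

```php
<?php
// Hypothetical interface: application code depends on this, not on a
// specific driver or abstraction library.
interface Database
{
    // Run a query with bound parameters and return all rows as arrays.
    public function fetchAll(string $sql, array $params = []): array;
}

// One possible backend, built on PDO.
final class PdoDatabase implements Database
{
    private $pdo;

    public function __construct(PDO $pdo)
    {
        $this->pdo = $pdo;
    }

    public function fetchAll(string $sql, array $params = []): array
    {
        $stmt = $this->pdo->prepare($sql);
        $stmt->execute($params);
        return $stmt->fetchAll(PDO::FETCH_ASSOC);
    }
}

// A stand-in backend that returns canned rows, e.g. for a dev sandbox.
final class CannedDatabase implements Database
{
    private $rows;

    public function __construct(array $rows)
    {
        $this->rows = $rows;
    }

    public function fetchAll(string $sql, array $params = []): array
    {
        return $this->rows;
    }
}

// Application code only sees the interface, so the backend can change later.
function addon_names(Database $db): array
{
    $rows = $db->fetchAll('SELECT name FROM addons ORDER BY name');
    return array_column($rows, 'name');
}
```

Because callers only know about the interface, swapping in a different driver or a cache-first backend later should not require touching them.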
  
 h2. Don't hold out for the perfect app.
 In your travels as a web developer, you will become familiar with compromise and the difference between a deliverable and a really cool idea. Don't let yourself fall into common traps:
 * Taking forever to choose anything because it's not perfect
 * Adding unnecessary abstraction when you don't really need it
 * Adding features that aren't a part of your requirements "just because it'd be cool"
  
 If you're running with a surplus of time, and you aren't worried about a deadline or project funding, then have fun -- but in the real world you'll be pushed to deliver, and you should keep a balance between what's ideal and what's practical.
  
 A good example of the abstraction trap is choosing a database interface like PEAR::DB or ADOdb that lets your application use different RDBMSs (MySQL, PostgreSQL, Oracle). Always ask whether you really need these over PHP's built-in capabilities. You'll most likely never actually migrate your database, and you'll be paying on every request for abstraction you never use.
  
 The bottom line is to use what you need, but give yourself enough room to modify or replace it in the future. Don't go overboard trying to make your app the all-singing, all-dancing crap of the world.
  
 h1. Server configuration
 h2. To SSL or not to SSL?
 SSL should be reserved for protected pages that pass important or sensitive information (username, password, etc.). Don't make the mistake of serving static content or images that don't need encryption under a cert. The extra encryption work will cost you as your load increases.
  
 h2. .htaccess in production
 It's best to do your rewrites and local configuration in your vhost .conf file instead of in an .htaccess file. When overrides are enabled, Apache has to look for and parse .htaccess files on every request, whereas the .conf file is read once at startup. Putting your Apache configuration in the .conf file avoids that overhead.
  
 At the same time, it helps to have an htaccess.dist file so people can configure things easily in a dev sandbox if they want to develop on your application. That's why most things like Gallery 2, WordPress, etc., ship an .htaccess file instead of saying, "just edit your vhost conf file". It's a pain to edit Apache's config if you don't have access to it.
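
Moving rules out of .htaccess might look roughly like the fragment below. This is only a sketch: the server name, paths, and the rewrite rule itself are made-up placeholders, not taken from any real site.

```apache
<VirtualHost *:80>
    # Hypothetical host name and document root.
    ServerName www.example.com
    DocumentRoot /var/www/example

    <Directory /var/www/example>
        # Tell Apache not to look for .htaccess files under this directory.
        AllowOverride None
    </Directory>

    # Rewrite rules live here instead of in .htaccess.
    # In vhost context the pattern is matched against the full URL path.
    RewriteEngine On
    RewriteRule ^/addon/([0-9]+)$ /addon.php?id=$1 [L,QSA]
</VirtualHost>
```

The same rule can still be shipped in an htaccess.dist file for sandboxes; note that in .htaccess context the pattern would be written without the leading slash.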
  
 h2. MySQL optimization
 MySQL query caching is a good idea if your data sets aren't super-huge. Read more about it here:
 http://mysql.osuosl.org/doc/refman/4.1/en/query-cache.html
  
 There are other things like connection limits and memory settings that can affect the upper boundaries of what your database server can do. Try to be aware of those things.
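
As a starting point, the relevant settings live in the {{\[mysqld\]}} section of my.cnf. The values below are placeholders to show where the knobs are, not recommendations; they assume a MySQL version that still has the query cache (it was removed in MySQL 8.0).

```ini
[mysqld]
# Turn the query cache on and give it some memory (placeholder sizes).
query_cache_type  = 1
query_cache_size  = 32M
# Do not cache individual results larger than this.
query_cache_limit = 1M

# Upper bound on simultaneous client connections.
max_connections   = 200
# Index buffer for MyISAM tables.
key_buffer_size   = 64M
```

Watching {{SHOW STATUS LIKE 'Qcache%'}} over time is one way to check whether the cache is actually being hit.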
  
 h1. The landslide model of scalability
 There are some typical steps you will take once you start experiencing scalability issues with your application. This section deals with them in the order the typical bottlenecks tend to show up:
 * Your application
 * PHP
 * Application server
 * Database server
  
 h2. Optimize your application
 The first obvious thing you can do is to make sure your application isn't wasting resources. This means looking for things like:
 * Redundant queries
 * Bad or pointless looping
 * Unnecessary database connections
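
Redundant queries are often the easiest of these to fix. One simple approach, sketched below, is to remember results for the lifetime of a request. The {{memoize()}} helper and the fake {{$loadUser}} query are hypothetical; the counter is only there to show how many "queries" actually run.

```php
<?php
// Hypothetical helper: wraps a loader so that repeated lookups of the same
// key within one request run the expensive loader only once.
function memoize(callable $loader): callable
{
    $cache = [];
    return function ($key) use (&$cache, $loader) {
        if (!array_key_exists($key, $cache)) {
            $cache[$key] = $loader($key);
        }
        return $cache[$key];
    };
}

// Stand-in for a real query; counts how often it is actually executed.
$queryCount = 0;
$loadUser = function ($id) use (&$queryCount) {
    $queryCount++;
    return ['id' => $id, 'name' => "user{$id}"];
};

$getUser = memoize($loadUser);

// Five lookups, but only two distinct ids, so the loader runs twice.
foreach ([1, 2, 1, 1, 2] as $id) {
    $getUser($id);
}
```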
  
 Once you are sure the code is pretty well written, it's time to take a look at how much time your application spends on different tasks. You can profile this using APD or the Zend Profiler. APD is free, so I would suggest taking a look at that. There is a beta [how-to|http://apd.communityconnect.com/faq.html] on the APD site.
  
 I would also check out this [article from Linux Journal|http://www.linuxjournal.com/article/7213] for some tips as well as the [PHP documentation about APD|http://usphp.com/manual/en/ref.apd.php].
  
 h2. Eliminate redundant compiling
 The cheapest way to get more out of your app is to use a byte-code cache like [eAccelerator|http://eaccelerator.net/] or [APC|http://pecl.php.net/package/APC].
  
 The goal of these packages is to eliminate compile time for files that haven't changed (includes!). Out of the box, PHP compiles everything on every request, which doesn't scale well at all, especially with large libraries like PEAR::DB or Smarty. Using one of these options will speed up your apps considerably -- which is why many PHP sites use them. It's a no-brainer.
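
For APC, enabling the cache is mostly a matter of a few php.ini lines. The fragment below is a sketch with a placeholder memory size; check the APC documentation for your version, since older releases expect {{apc.shm_size}} as a plain number of megabytes rather than a value with an {{M}} suffix.

```ini
; Load the extension (file name assumes a Unix build).
extension = apc.so

; Turn the byte-code cache on.
apc.enabled = 1

; Shared memory reserved for compiled scripts (placeholder size).
apc.shm_size = 64M

; Re-check file modification times on each request, so edits are picked up.
apc.stat = 1
```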
  
 h2. Caching
 There are a variety of caching options, and places where you can implement a cache:
 * Page output
 * Object caching
 * Database query caching
  
 These can be accomplished with many software and hardware tools. The ones we encountered (the free and popular ones) were:
 * Smarty
 * Cache_Lite
 * memcached
  
 Of the three, the clear winner for us was [memcached|http://danga.com/memcached/], which was the fastest and most scalable of the options we tried and can be used for caching at any level.
  
 Typically you'll save your app a lot of cycles by caching page output -- but once that starts to crumble, there are still places you can go.
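
Page output caching can be sketched with PHP's output buffering functions. In the example below, {{cached_output()}} is a hypothetical helper and the store is a plain array so the snippet runs on its own; in a real app the store would be memcached, Cache_Lite, or something similar.

```php
<?php
// Hypothetical helper: returns the cached output for $key if it is still
// fresh; otherwise runs $render, captures what it echoes, and stores it.
function cached_output(array &$store, string $key, int $ttl, callable $render): string
{
    $now = time();
    if (isset($store[$key]) && $store[$key]['expires'] > $now) {
        return $store[$key]['html'];
    }

    ob_start();              // start capturing output
    $render();               // the page echoes as usual
    $html = ob_get_clean();  // grab the buffer and stop capturing

    $store[$key] = ['html' => $html, 'expires' => $now + $ttl];
    return $html;
}

// Stand-in for an expensive page; counts how often it is really rendered.
$store = [];
$renders = 0;
$page = function () use (&$renders) {
    $renders++;
    echo '<h1>Top add-ons</h1>';
};

$first  = cached_output($store, 'home', 60, $page);
$second = cached_output($store, 'home', 60, $page); // served from the store
```

The second call never runs the page code, which is where the saved cycles come from.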
  
 h2. Distribute your app across multiple machines
 Your application layer includes Apache and PHP and can be spread out across multiple machines. The main challenge is moving away from the single-server-dependent functionality PHP offers by default. The one I've faced the most has been moving from the default PHP session handler to a database-driven session handler.
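
A database-driven session handler can be sketched with PHP's {{SessionHandlerInterface}}. The class name {{PdoSessionHandler}} and the {{sessions}} table layout below are assumptions made for the example, and the sketch skips the locking that file-based sessions give you.

```php
<?php
// Sketch of a database-backed session handler. Assumes a hypothetical table:
//   CREATE TABLE sessions (id VARCHAR(128) PRIMARY KEY, data TEXT, updated_at INT)
final class PdoSessionHandler implements SessionHandlerInterface
{
    private $pdo;

    public function __construct(PDO $pdo)
    {
        $this->pdo = $pdo;
    }

    // Nothing to set up or tear down; the PDO connection already exists.
    public function open($path, $name): bool { return true; }
    public function close(): bool { return true; }

    // Return the stored session data, or an empty string for a new session.
    public function read($id): string
    {
        $stmt = $this->pdo->prepare('SELECT data FROM sessions WHERE id = ?');
        $stmt->execute([$id]);
        $data = $stmt->fetchColumn();
        return $data === false ? '' : (string) $data;
    }

    // Insert or overwrite the row for this session id.
    public function write($id, $data): bool
    {
        // REPLACE INTO is understood by both MySQL and SQLite.
        $stmt = $this->pdo->prepare(
            'REPLACE INTO sessions (id, data, updated_at) VALUES (?, ?, ?)'
        );
        return $stmt->execute([$id, $data, time()]);
    }

    public function destroy($id): bool
    {
        $stmt = $this->pdo->prepare('DELETE FROM sessions WHERE id = ?');
        return $stmt->execute([$id]);
    }

    // Remove sessions not written to for $max_lifetime seconds.
    public function gc($max_lifetime): int
    {
        $stmt = $this->pdo->prepare('DELETE FROM sessions WHERE updated_at < ?');
        $stmt->execute([time() - $max_lifetime]);
        return $stmt->rowCount();
    }
}

// Registration, before session_start(), would look like:
//   session_set_save_handler(new PdoSessionHandler($pdo), true);
```

Once every app server points at the same sessions table, it no longer matters which machine handles a given request.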
  
 h2. Distribute your database across multiple machines
 You can replicate your database and spread reads across multiple nodes, either in a cluster or via round-robin DNS (RRDNS), but you will always have a single bottleneck for database writes -- that can only be battled with hardware.
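
On the application side, splitting reads from writes can be as simple as choosing a host per query. The sketch below uses a hypothetical {{pick_server()}} function and made-up host names. It is deliberately naive: it only looks at the first keyword, so something like {{SELECT ... FOR UPDATE}} would need special handling, and it ignores replication lag.

```php
<?php
// Hypothetical router: writes always go to the single master, reads are
// spread over the replicas at random.
function pick_server(string $sql, string $master, array $replicas): string
{
    // Treat statements starting with SELECT or SHOW as reads.
    $isRead = preg_match('/^\s*(SELECT|SHOW)\b/i', $sql) === 1;

    if (!$isRead || count($replicas) === 0) {
        return $master;
    }
    return $replicas[array_rand($replicas)];
}

// Made-up host names for illustration.
$master   = 'db-master';
$replicas = ['db-replica1', 'db-replica2'];

$writeHost = pick_server('UPDATE addons SET downloads = downloads + 1', $master, $replicas);
$readHost  = pick_server('SELECT name FROM addons', $master, $replicas);
```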
  
 h2. Distribute your cache across multiple machines
 memcached lets you run instances on multiple machines and spread your cache across them. Doing so reduces the cache bottleneck that might occur at one particular node.
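
The memcached servers don't talk to each other; the client decides which server holds a key by hashing it. The sketch below shows the idea with a hypothetical {{server_for_key()}} function and made-up server names. It illustrates the principle only and is not the hashing scheme of any particular client library.

```php
<?php
// Hypothetical illustration of client-side key distribution: the same key
// always maps to the same server, so any app server can find it again.
function server_for_key(string $key, array $servers): string
{
    // crc32() can be negative on 32-bit builds, so mask off the sign bit.
    $hash = crc32($key) & 0x7fffffff;
    return $servers[$hash % count($servers)];
}

// Made-up server list for illustration.
$servers = ['cache1:11211', 'cache2:11211', 'cache3:11211'];

$a = server_for_key('addon:1234', $servers);
$b = server_for_key('addon:1234', $servers); // same key, same server
```

One caveat with plain modulo hashing is that adding or removing a server remaps most keys, which is why many clients offer consistent hashing as an option.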
  
 h2. Throw more hardware at it.
 From here, it's just a matter of money. The large corporations with high traffic (LiveJournal, for example) can afford to support their apps with hundreds of machines at each tier. In order to do that (and do it well) you have to have a scalable application and cache. So, given the right application structure and the right amount of money, it's possible to scale anything. :)
  
 For most of us, though, taking some easy steps, like using APD and memcached, can buy us enough headroom that we at least won't have to get another machine. :)
  
 h1. Resources
 * [Danga site|http://danga.com/memcached/] - The "about" section for memcached covers a lot of the challenges a high-load app might face -- it's a good read.
 * [My blog post, talking about addons.mozilla.org in particular, part 1|http://morgamic.com/2006/03/19/scalable-php-with-phpa-memcached-and-lvs-part-1/]
 * [My blog post, part 2|http://morgamic.com/2006/04/14/scalable-php-with-phpa-apc-memcached-and-lvs-part-2/]
 * [APC page|http://php.osuosl.org/manual/en/ref.apc.php]
 * [APD page|http://php.osuosl.org/manual/en/ref.apd.php]