More on Multi-core

Posted by Ronald M. Zownir Sat, 28 Jun 2008 23:41:00 GMT

There is a very good article at InfoQ that’s related to my post on the rise of multi-core computing and parallel programming.

Rails Deployment Options

Posted by Ronald M. Zownir Fri, 20 Jun 2008 02:45:00 GMT

There are quite a number of deployment options for Rails these days. There are choices to make at several layers of the stack, and the list of options keeps growing. What follows is my short list:

Ruby Virtual Machine

Several Ruby implementations can run Rails. Others cannot. The ability to run Rails is a major achievement for alternative Ruby VMs. For detailed comparisons check out Antonio Cangiano’s blog. He does the Ruby shootout.

  • MRI
    • 1.8 standard
    • Ruby 1.8.6 is the recommended version to run Rails on
  • YARV
    • 1.9 standard
    • Significantly faster than MRI
    • Rails is not yet fully compatible with Ruby 1.9
    • Many gems are not compatible with Ruby 1.9
  • JRuby
    • Java implementation of Ruby
    • Runs Rails
  • Rubinius
    • “Ruby in Ruby”
    • Runs Rails
  • Ruby Enterprise Edition
    • From the creators of mod_rails
    • Fork of MRI
    • 33% less memory consumption on average when used with mod_rails
  • MagLev
    • Commercial
    • Pending release
    • Lots of promise in terms of performance and features, but won’t run Rails for some time

For critical production applications, there are really only two implementations you should consider. If you are using mod_rails, go with Ruby Enterprise Edition. Otherwise, the standard Ruby implementation, MRI, is the way to go. The other implementations are progressing rapidly and in time will be good to go for production.

Server Configuration

There is much activity in this area. These are the choices worth noting:

  • nginx + mongrel | thin | ebb | fuzed (yaws)
    • nginx is a powerful lightweight frontend server/reverse proxy/load balancer that can take a licking and keep on ticking
    • mongrel is the veteran backend web server for Ruby on Rails
    • thin is an evented backend server that’s faster than mongrel and supports unix socket connections
    • ebb is an evented backend server written in C that’s faster than thin and also supports unix socket connections, but it uses more memory than thin while idling
    • fuzed allows Rails to be served up by yaws, a server written in Erlang that provides an unparalleled degree of concurrency
  • Apache + Passenger (mod_rails) + Ruby Enterprise Edition
    • New and exciting deployment option for Apache
    • Easy to set up
    • Deploying an app can be as simple as uploading your app
    • mod_rails and Ruby Enterprise Edition, both developed by Phusion, together provide a 33% lower memory footprint (for Rails) on average
    • Integrated monitoring and load balancing – monitors Rails processes and starts/kills them as necessary based on demand
  • LiteSpeed
    • Commercial
    • Relatively easy to set up
    • Better performance than most other solutions
    • Despite its qualities, not amazingly popular

I only mention LiteSpeed because of its performance. Few people actually use it for serious Rails deployments. I omitted lighttpd from the list because nginx has stolen the show. Ancient solutions like fastcgi were also omitted.

I personally use nginx + thin. I have not transitioned to ebb because of higher memory consumption (at least on the low end). I included the fuzed project in my list because I find yaws and Erlang fascinating. Yaws puts Apache to shame when it comes to concurrency. I’m not sure how polished the fuzed project is, but it looks like a contender to me! It’s also good to see cooperation between Ruby and Erlang. Mongrel, thin, and ebb are all good options. It all depends on your needs and preferences.

I have not tried out mod_rails. It is being touted as a breakthrough solution because of how simple it makes deployment. Not to take anything away from it, but my impression is that it is more for deployment novices and people with shared hosting provided by operations like DreamHost. With WebFaction, you have the freedom and ability to build your own stack. I’ve made this a breeze with a shake and bake shell script. Nginx is a better frontend server than Apache in its ability to serve static pages and with regard to memory usage. What would be great is if mongrel/thin/ebb could take advantage of the memory saving features of Ruby Enterprise Edition. I’m sure that the mod_rails solution is outstanding. I will check it out for myself and report.

Load Balancing

Load balancing allows your applications to scale horizontally.

  • Hardware
    • For very large applications
    • Most advanced
    • Expensive
  • HAProxy
    • For large applications
    • Very advanced
    • Very difficult to set up
  • nginx-upstream-fair
    • Third party module for nginx
    • Adds fair load balancing to nginx (replaces standard round-robin load balancing)
    • Very simple to set up
    • Small to large applications

I use the nginx-upstream-fair module for load balancing. Written by Grzegorz Nosek, the module works very well and is so easy to set up that there is no reason not to use it.
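To make the difference between round-robin and fair balancing concrete, here is a minimal Ruby sketch. It is only a model of the idea: the class names, the backend names, and the busy_counts hash are all hypothetical, and this is not how nginx or the nginx-upstream-fair module is actually implemented.

```ruby
# Hypothetical model of two backend-selection policies.
# Not the module's real algorithm, only an illustration of the idea.

# Hands out backends in a fixed rotation, ignoring how busy they are.
class RoundRobinBalancer
  def initialize(backends)
    @backends = backends
    @next_index = 0
  end

  def pick(_busy_counts)
    backend = @backends[@next_index]
    @next_index = (@next_index + 1) % @backends.size
    backend
  end
end

# Prefers whichever backend currently has the fewest requests in flight.
class LeastBusyBalancer
  def initialize(backends)
    @backends = backends
  end

  # busy_counts maps a backend name to its number of requests in flight.
  def pick(busy_counts)
    @backends.min_by { |backend| busy_counts.fetch(backend, 0) }
  end
end

backends = ["thin.0", "thin.1"]          # hypothetical backend names
busy     = { "thin.0" => 3, "thin.1" => 0 }

puts RoundRobinBalancer.new(backends).pick(busy) # thin.0, although it is busy
puts LeastBusyBalancer.new(backends).pick(busy)  # thin.1, the idle backend
```

With a slow request tying up one backend, the round-robin policy keeps queueing new requests behind it, while the least-busy policy routes around it.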

Monitoring

To make sure that your processes are behaving, you need a process monitor.

I use monit. I haven’t tried the god gem, but I’ve heard good things.
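At its core, a process monitor repeatedly asks whether the process recorded in a pid file is still alive. The Ruby sketch below shows just that check. The method name and the pid file path are my own placeholders, and tools like monit and god do far more than this (restarts, resource limits, alerts).

```ruby
# Returns true if a process with the pid stored in pid_file is running.
def process_alive?(pid_file)
  pid = Integer(File.read(pid_file).strip)
  Process.kill(0, pid) # signal 0 only tests for existence; nothing is sent
  true
rescue Errno::ENOENT, Errno::ESRCH, ArgumentError
  # Missing pid file, no such process, or unparsable file contents.
  false
rescue Errno::EPERM
  # The process exists but belongs to another user.
  true
end

# Hypothetical pid file path; adjust to wherever your server writes its pid.
puts process_alive?("tmp/pids/thin.0.pid")
```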

Multi-core Processors and Parallel Computing

Posted by Ronald M. Zownir Wed, 18 Jun 2008 16:30:00 GMT

The performance of modern computer processors can be likened to space in Manhattan. On the island of Manhattan, the fundamental problem of scaling outward is overcome by scaling upward. The opposite is true in today’s computer processors. Clock frequency is limited by physical and economic factors, such as power/cooling requirements. Computer performance continues to improve at a predictable rate, however, because an increasing number of processors are used to work in parallel. Methods for utilizing multiple processors include:

These technologies can be combined. Apple’s Mac Pro can be equipped with two quad-core processors. Sun manufactures multi-core processors with multiple hardware threads per core. An SMP capable UltraSPARC T2 Plus ships with 8 cores and 8 hardware threads per core. That’s 64 hardware threads per processor. A computer cluster can be composed of just about any computer system that can be networked.

From the list above, the most recent technology to enter the market is multi-core. Multi-core technology represents a fundamental shift in processor design. Performance is driven by core quantity rather than clock frequency. Clock frequency is still important, but not as much as it used to be.

It is no coincidence that Intel dropped the venerable Pentium name. The Pentium name correlates computer performance directly with clock frequency. The switch to the Core name helps consumers unfamiliar with the concept of benchmarking to discern apples from oranges. It also serves to forge a strong association between multi-core technology and Intel.

Multi-core technology has also changed the landscape of software development. Performance is now concurrency based. It’s no longer a sure bet that software will run faster if programmers leave it up to technology turnover alone. For best performance, software must be explicitly written to take advantage of multiple cores. Otherwise, performance is limited to that of a single core. All programs can benefit from multi-core technology at the operating system level through multitasking. Different processes can be handled concurrently by different cores. This means that a multi-core computer will not get bogged down while running a CPU intensive application. For the average user, only a few cores are sufficient to experience the full extent of this benefit.

Sequentially written programs can only utilize a single core. To utilize multiple cores, these programs must be parallelized. The degree to which a program can be parallelized determines how much faster it can run on a multi-core machine and how many cores are required to approach maximum performance. Parallel programming is subject to Amdahl’s Law.
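Amdahl’s Law puts a number on this. If a fraction p of a program’s work can be parallelized and the rest is sequential, the best possible speedup on n cores is 1 / ((1 - p) + p / n). The following Ruby sketch evaluates that formula; the method name and the 90% figure are just illustrative choices.

```ruby
# Amdahl's Law: best-case speedup on `cores` cores when `parallel_fraction`
# of the work can run in parallel and the remainder is strictly sequential.
def amdahl_speedup(parallel_fraction, cores)
  unless (0.0..1.0).cover?(parallel_fraction)
    raise ArgumentError, "parallel_fraction must be between 0 and 1"
  end
  raise ArgumentError, "cores must be at least 1" if cores < 1

  serial_fraction = 1.0 - parallel_fraction
  1.0 / (serial_fraction + parallel_fraction / cores)
end

# Illustrative case: 90% of the work parallelizes.
[2, 4, 8, 32].each do |cores|
  printf("%2d cores: %.2fx\n", cores, amdahl_speedup(0.9, cores))
end
```

With 90% of the work parallelized, the sequential 10% caps the speedup below 10x no matter how many cores are added, which is why the parallelizable fraction matters more than the core count.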

Many problems are easy to parallelize. These problems are called embarrassingly parallel. Other problems require various degrees of cleverness. Some problems are fundamentally sequential. Generally speaking, the larger a problem, the more likely it can be broken down and parallelized.
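As a sketch of what an embarrassingly parallel decomposition looks like, the Ruby method below splits a list into slices and processes each slice in its own thread. The method name parallel_map is my own. One caveat: the standard Ruby interpreter (MRI) does not execute Ruby code on several cores at once, so this shows the shape of the decomposition rather than a guaranteed speedup; an implementation with native parallel threads, such as JRuby, is needed for that.

```ruby
# Apply a block to every item, splitting the items across worker threads.
# Each item is processed independently, so no coordination is needed
# beyond collecting the results in order.
def parallel_map(items, workers, &block)
  return [] if items.empty?

  slice_size = (items.size / workers.to_f).ceil
  threads = items.each_slice(slice_size).map do |slice|
    Thread.new { slice.map { |item| block.call(item) } }
  end

  # Thread#value waits for the thread and returns its block's result.
  threads.flat_map(&:value)
end

squares = parallel_map((1..10).to_a, 3) { |n| n * n }
puts squares.inspect
```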

Parallel programming is inherently more complex than sequential programming. It introduces a unique set of behaviors which can result in errors that are difficult to debug. One such behavior is the race condition, where an outcome is sequence dependent. Even worse, nearly every programming language is fundamentally flawed in its support for parallel programming. Shared memory, locks, and mutexes are no good. Erlang gets it right. (I am currently learning the language and may write more extensively about it in the future.) However, Erlang may be too strange to achieve critical mass. I hope that this is not the case.
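To illustrate the alternative to shared variables and locks, here is a Ruby sketch loosely in the spirit of Erlang’s message passing. Workers never touch a shared counter; they receive jobs from one queue and send their subtotals to another. The method name and the :done marker are my own conventions, and unlike Erlang processes, Ruby threads still share memory, so this is a discipline rather than a guarantee.

```ruby
require 'thread' # provides Queue on older Rubies; harmless on newer ones

# Sum numbers using workers that communicate only through queues.
def sum_with_messages(numbers, worker_count)
  jobs    = Queue.new
  results = Queue.new

  numbers.each { |n| jobs << n }
  worker_count.times { jobs << :done } # one stop marker per worker

  workers = Array.new(worker_count) do
    Thread.new do
      subtotal = 0 # private to this worker, so no lock is needed
      while (job = jobs.pop) != :done
        subtotal += job
      end
      results << subtotal
    end
  end
  workers.each(&:join)

  total = 0
  worker_count.times { total += results.pop }
  total
end

puts sum_with_messages((1..100).to_a, 4)
```

Because each subtotal lives in exactly one thread and only finished values cross thread boundaries, the result does not depend on how the threads happen to be scheduled.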

The asymmetry between hardware and software development is well recognized. Unless something profound emerges, rapid expansion in processor cores per computer (“core sprawl”, to coin a phrase) will significantly widen the gap. Automatic or assisted parallelization would be tremendous. Unfortunately, there has been little to show for many decades of work on automatic parallelization.

Many people, companies, and institutions are hard at work trying to make parallel programming easier. Some encouraging news comes from Apple. Practically lost among the iPhone 3G hoopla at WWDC 2008, the basic plans for Mac OS X 10.6 (Snow Leopard) were publicly disclosed. The new operating system is supposed to be much leaner than its predecessor and multi-core optimized. Multi-core optimization comes from a set of technologies together called Grand Central. According to Apple:

Grand Central takes full advantage by making all of Mac OS X multicore aware and optimizing it for allocating tasks across multiple cores and processors. Grand Central also makes it much easier for developers to create programs that squeeze every last drop of power from multicore systems.

The most detailed account I have found about Grand Central comes from RoughlyDrafted (found via Mac Rumors). Other interesting articles on Grand Central come from AnandTech and Mac Rumors. Apple’s parallelization solution presumably works by “handling processes like network packets”. That would make it easier to delegate work across multiple cores.

Multi-core technology represents an exciting convergence. Personal computers have become very much like supercomputers in terms of performance scaling. Parallel programming techniques for supercomputers can be applied to modern personal computers. Clustering and distributed computing in general will benefit significantly from the rise in parallel programming competency. New and exciting applications will result and web application scaling will become easier.

Ruby on Rails Stack on WebFaction

Posted by Ronald M. Zownir Fri, 11 Apr 2008 16:53:00 GMT

I’ve created a shell script to build a complete Ruby on Rails stack (application environment) on WebFaction. Although written with WebFaction users in mind, apart from a few minor details, the script is actually generally applicable. All you have to do is edit a few variable assignments (install path, rails app name, and service ports) at the beginning of the script and execute. In less than 20 minutes, your rails app will be up and running with nginx reverse proxying (and fair load balancing) to a pair of thin servers and with monit keeping watch.

In case you’re unfamiliar with thin, it’s the likely successor to mongrel. It uses mongrel’s excellent http parser, provides various overall enhancements, and offers a number of features mongrel lacks. I specifically chose to use thin on WebFaction because of its support for unix socket listeners. For more technical information, see the comments in the script and the accompanying README.markdown file.

What you get:

  • Ruby
  • RubyGems
  • Gems: rails, merb, mongrel, mongrel_cluster, thin, capistrano, termios, ferret, acts_as_ferret, god, sqlite3-ruby, mysql, and typo
  • Git
  • Nginx (with nginx-upstream-fair third party module)
  • Monit
  • Startup scripts and working default configuration files for nginx and monit

I will try to keep this script reasonably up to date at GitHub. Last updated July 21, 2008.

Get it from GitHub

Rails and Git

Posted by Ronald M. Zownir Mon, 17 Mar 2008 02:00:00 GMT

My notes from Scott Chacon’s screencast. You should check it out for yourself. It is definitely well worth your time.

Instantiate a git repository with a newly minted rails app.

rails railsapp && cd railsapp
git init
touch .gitignore

Add the following lines to .gitignore:

config/database.yml
tmp/*
log/*

Add all the files to the repository and commit all.

git add .
git status # To check the status of the working copy.
git commit -a -m "Initial commit"
git log # To see the log for the repository.

Create a remote git repository from the one just created.

cd ..
git clone --bare railsapp/.git railsapp.git
scp -r railsapp.git username@remote-machine:/home/username/git-repos
cd railsapp
git remote add gitserver username@remote-machine:/home/username/git-repos/railsapp.git

Replace gitserver with a name you want to reference the remote machine by. Make sure that the path to the git binaries is defined in ~/.bashrc and not ~/.bash_profile because remote commands load the former and not the latter. Information about the remote is added in the git config.

git push gitserver # Push the code in local repository to gitserver

On the remote machine, in railsapp.git:

export GIT_DIR=.
git log

Branching and merging in git.

git branch -a # Show all git branches (including the remote machine).
git branch # Show all local git branches.
git checkout -b experimental # Create and switch to new branch "experimental".
git checkout master # Switch back to master branch.
git checkout experimental # Switch back to experimental branch.

To merge experimental into master:

git checkout master # Switch to the master branch as the working copy.
git pull . experimental # Does a fetch and then a merge; you could just merge.
git add filenameinconflict # Fix files in conflict and then do a git add.
git commit -a # After merging do a commit.

After merging the experimental branch into master, we’re finished with it so we can delete its identifier. The branch’s change history will still be there but the branch name is gone. To do so:

git branch -d experimental
git branch # See that the branch name is deleted.
gitk --all& # Visualize the change history using a TK GUI.

Database

database.yml

development:
  adapter: sqlite3
  database: db/development.sqlite3

test:
  adapter: sqlite3
  database: db/test.sqlite3

production:
  adapter: mysql
  encoding: utf8
  host: localhost
  database: production_db_name
  username: mysql_username
  password: mysql_password

Mongrel Cluster

mongrel_cluster.yml

--- 
user: user
group: user
environment: production
address: 127.0.0.1
port: 3000
servers: 2
cwd: /home/user/webapps/railsapp/current
log_file: log/mongrel.log
pid_file: tmp/pids/mongrel.pid

Capistrano

In railsapp, execute:

capify .

Capfile

load 'deploy' if respond_to?(:namespace) # cap2 differentiator
Dir['vendor/plugins/*/recipes/*.rb'].each { |plugin| load(plugin) }
load 'config/deploy'
load 'config/mongrel' # mongrel overrides

deploy.rb

set :application, "railsapp"
set :repository,  "user@webxx.webfaction.com:/home/user/git-repos/railsapp.git"
set :domain, "webxx.webfaction.com"
set :deploy_to, "/home/user/webapps/#{application}"
set :mongrel_conf, "#{current_path}/config/mongrel_cluster.yml"
set :scm, :git
set :deploy_via, :remote_cache
ssh_options[:paranoid] = false
set :user, "user"
set :runner, "user"
set :use_sudo, false
role :app, domain
role :web, domain
role :db,  domain, :primary => true

# If the production web server doesn't have access to your git server,
# uncomment the following two lines.
# set :deploy_via, :copy # instead of :remote_cache
# set :git_shallow_clone, 1 # optional, but makes things faster

# moves over server config files after deploying the code
task :update_config, :roles => [:app] do
  run "cp -Rf #{shared_path}/config/* #{release_path}/config/"
end
after 'deploy:update_code', :update_config

mongrel.rb

# mongrel-based overrides of the default tasks

namespace :deploy do
  namespace :mongrel do
    [ :stop, :start, :restart ].each do |t|
      desc "#{t.to_s.capitalize} the mongrel appserver"
      task t, :roles => :app do
        #invoke_command checks the use_sudo variable to determine how to run the mongrel_rails command
        invoke_command "mongrel_rails cluster::#{t.to_s} -C #{mongrel_conf}", :via => run_method
      end
    end
  end

  desc "Custom restart task for mongrel cluster"
  task :restart, :roles => :app, :except => { :no_release => true } do
    deploy.mongrel.restart
  end

  desc "Custom start task for mongrel cluster"
  task :start, :roles => :app do
    deploy.mongrel.start
  end

  desc "Custom stop task for mongrel cluster"
  task :stop, :roles => :app do
    deploy.mongrel.stop
  end

end

mongrel_cluster with Nonconsecutive Ports

Posted by Ronald M. Zownir Sun, 16 Mar 2008 02:30:00 GMT

Need to operate mongrel_cluster with nonconsecutive ports? No problem.

WebFaction assigns ports to its users through the control panel. By design, the panel assigns ports in such a way that users hoping to officially stake claim to a consecutive block of ports are out of luck. Ports that the panel intentionally does not assign can be put to use, but let us suppose that this practice is frowned upon. If you are interested in running mongrel_cluster, walking the line requires a little bit of effort. Out of the box, mongrel_cluster spawns mongrel_rails listeners on consecutive ports. Configuration is limited to specifying the first port and the number of instances. The situation outlined requires a more precise port configuration, and that in turn requires modification of the mongrel_cluster code. Luckily, this modification comes down to a one line addition.

The file requiring modification is lib/mongrel_cluster/init.rb inside the mongrel_cluster gem directory. The easiest way to find and open this file for editing is to execute the following command:

nano `locate lib/mongrel_cluster/init.rb`

Locate the read_options method. In version 1.0.5, it should read:

def read_options
    @options = { 
        "environment" => ENV['RAILS_ENV'] || "development",
        "port" => 3000,
        "pid_file" => "tmp/pids/mongrel.pid",
        "log_file" => "log/mongrel.log",
        "servers" => 2
    }
    conf = YAML.load_file(@config_file)
    @options.merge! conf if conf

    process_pid_file @options["pid_file"]
    process_log_file @options["log_file"]

    start_port = end_port = @only
    start_port ||=  @options["port"].to_i
    end_port ||=  start_port + @options["servers"] - 1
    @ports = (start_port..end_port).to_a
end

Add the following line to the end of the method:

@ports = @options["ports"] if @options["ports"] && !@only

What this addition does is acknowledge a parameter named ‘ports’ in config/mongrel_cluster.yml. Unless a single port is specified on the command line using the --only option, ‘ports’ will be respected over the ‘port’ and ‘servers’ parameters used to specify a continuous range. The ‘ports’ parameter should be accompanied by an array of integers in YAML format. An example mongrel_cluster.yml file that defines nonconsecutive ports follows:

---
cwd: /home/user/webapps/railsapp
environment: production
user: user
group: user
address: 127.0.0.1
log_file: log/mongrel.log
pid_file: tmp/pids/mongrel.pid
ports:
- 3333
- 3335
- 3359
- 3401

The one line addition does not allow you to define discontinuous ports on the command line. You must edit mongrel_cluster.yml to do so. This is merely a matter of convenience and has no operational impact whatsoever.
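To see how the patched method chooses ports, here is the selection logic pulled out into a standalone Ruby sketch. The method name resolve_ports is hypothetical: options stands in for the merged YAML settings and only for the value of the --only command-line option. It mirrors the logic shown above rather than reproducing the gem’s code.

```ruby
# Standalone sketch of the port selection logic after the one line addition.
# `options` stands in for the merged YAML settings, `only` for --only.
def resolve_ports(options, only = nil)
  # An explicit 'ports' list wins unless a single port was requested.
  return options["ports"] if options["ports"] && !only

  # Otherwise fall back to the original consecutive range.
  start_port = only || options["port"].to_i
  end_port   = only || start_port + options["servers"] - 1
  (start_port..end_port).to_a
end

defaults = { "port" => 3000, "servers" => 2 }

p resolve_ports(defaults)                                        # [3000, 3001]
p resolve_ports(defaults.merge("ports" => [3333, 3335, 3359]))   # [3333, 3335, 3359]
p resolve_ports(defaults.merge("ports" => [3333, 3335]), 3335)   # [3335]
```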

SSH Public Key Authentication

Posted by Ronald M. Zownir Wed, 27 Feb 2008 04:54:00 GMT

If you use ssh a lot, you should really take the time to learn about public key authentication. It is more secure than password-based authentication, and with the help of ssh-agent there is no need to enter a passphrase on each and every login. Setup is well worth the effort. I use public key authentication to ssh into my WebFaction shell account from my MacBook. The following instructions document how I set that up.

Instructions

~/.ssh Directory Creation

User specific ssh data is stored in the ~/.ssh directory. On both the client and the server execute:

mkdir ~/.ssh
chmod 700 ~/.ssh

If the directory already exists, make sure that the permissions are set to 700 (rwx------).

Key Pair Generation

Create the key pair on the client with:

ssh-keygen -q -f ~/.ssh/id_rsa -t rsa

Enter a passphrase when asked. It should be at least 16 characters long and not your account password.

Public Half Key Dissemination

Upload id_rsa.pub to the server with:

scp ~/.ssh/id_rsa.pub username@remote-machine:~/.ssh/

Replace username and remote-machine accordingly.

The public key data must be appended into the ~/.ssh/authorized_keys file on the server:

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
rm ~/.ssh/id_rsa.pub

SSH into Remote Machine

The first time you ssh into the remote machine from the client, execute:

ssh -o PreferredAuthentications=publickey username@remote-machine

Again, replace username and remote-machine accordingly. You will be asked to enter your passphrase.

Passphrase Tedium

Entering the private key passphrase each time you ssh into the remote machine can drive you nuts. If you are using Mac OS X 10.5 (Leopard), you have the option to save the passphrase in the Apple Keychain at the passphrase prompt. This feature isn’t available in Mac OS X 10.4 (Tiger) and lower. However, SSHKeychain gives you similar functionality. If you’re using another Unix-like system, check out the first resource below.

SSHKeychain Primer

I have a number of iMac G3s that I still use regularly. There is no out of the box keychain integration with Mac OS X 10.4 (Tiger), so I decided I would try out SSHKeychain. Setting up SSHKeychain was a little confusing at first, so I’ll explain the basics here.

There is nothing special about installation, although an installer is involved rather than a simple drag-and-drop action. Once installed, open up SSHKeychain from the Applications directory. Open up the Preferences dialog box. You can do this three ways. You can click “SSHKeychain” at the top left of the menubar and select “Preferences…”, click the keychain icon at the top right of the menubar and select “Preferences…”, or right click/click and hold the icon in the dock and select “Preferences…”.

Select the “Environment” tab and check the “Manage (and modify) global environment variables”. (That’s what I missed at first.) Select the “SSH Keys” tab and remove the default values using the minus sign button (unless those private keys actually do exist). Select the plus sign button and enter the full path of the private key you just created. For example: /Users/username/.ssh/id_rsa. Close the Preferences dialog box, and click “Agent” and select “Add all keys…”. You can find “Agent” on the menubar or the dock menu.

You will be prompted for the private key passphrase and have the option to add the passphrase to the Apple keychain. I had a problem typing in the entire passphrase in the password field. I solved this by typing it in my favorite text editor and doing a copy and paste. If you have to do this, make sure to copy meaningless text afterward. You really don’t want your passphrase to be exposed on the clipboard for any significant length of time.

Before you ssh into your servers using public key authentication managed by SSHKeychain, restart your computer. It should work nicely afterward. There is much more you can do with SSHKeychain, but the aforementioned should get you going along.

Disabling Standard Password Authentication

You may want to make it so that only public key authentication can be used to login to a remote machine using ssh. Check out the second resource for more information.

Other Resources