Thursday, July 10, 2008

Virtual Box 1.6.2 and Mac OS X 10.5.4: Still Fail

Just noticed VirtualBox released version 1.6.2. I tried installing Ubuntu 8.04 Server, both the 32-bit and 64-bit versions, on my Mac OS X 10.5.4 box. Depending on the Ubuntu flavor, it still either doesn't install or doesn't boot after being installed. Just saying.

Update: So later in the day I decided to experiment with the desktop version of Ubuntu. The 64-bit version does not work, as the VirtualBox CPU detector is confused, but the Ubuntu 8.04 Desktop 32-bit version works: it installs and boots into the desktop. Only NAT networking is implemented, which means "outbound" networking is fine: web browsing, apt-get updates, etc. But you cannot ssh or copy files (directly, anyway) into the virtual machine (you need bridged networking for this). The bug report is here. Boo.

Varargs in Python

"Varargs" is the C/C++ (?) name for "variable-length argument lists" for functions and methods, used for example by printf. In Python, what you are looking for is this: Arbitrary Function Arguments (Sections 4.7.3 and 4.7.4 of the Python tutorial).

This was written since most searches produce junk for this answer. Hopefully some web crawler will find this.
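
For the impatient, here is a minimal sketch of what those sections describe. The function names (`log`, `total`) and the `prefix` keyword are made up for illustration; only the `*args` / `**kwargs` syntax comes from Python itself.

```python
def log(fmt, *args, **kwargs):
    """printf-style helper: extra positional arguments are collected in `args`."""
    # `args` is a tuple, `kwargs` is a dict of any extra keyword arguments.
    prefix = kwargs.get("prefix", "")
    return prefix + (fmt % args)


def total(*numbers):
    """Accept any number of positional arguments."""
    return sum(numbers)


values = [1, 2, 3]
print(log("%s has %d items", "cart", 3))
print(log("plain message", prefix="> "))
# The reverse direction: unpack a list into separate arguments with `*`.
print(total(*values))
```

The `*` in the definition collects arguments; the `*` at the call site spreads a sequence back out, which is roughly what `va_list` forwarding does in C.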

Tuesday, May 6, 2008

VirtualBox 1.6 on Mac OS 10.5.2

Sun's free virtualization product VirtualBox released version 1.6, which supposedly fixes the PAE guest issue that prevented Ubuntu from installing last time.

The good news: The download of VirtualBox is simple and the UI is a lot better than the beta version I looked at a few weeks ago.

The bad news: There is a big warning saying "if you have a 64-bit host OS, then you need to run a 64-bit guest OS" (and the same for 32-bit). Ok, so I attempted to install Ubuntu Hardy Server 64-bit (i.e. the 'amd64' version), since Leopard/10.5 is a 64-bit OS. The VM complained that it needed "x86_64" but only found "i686". Sigh. Ok, let's install Hardy Server 32-bit. It installs, but does not boot, and gives a complaint about the CPU and "0.6" (i.e. a random error message).

Ok, so much for that. Looks like we'll wait for version 1.7.

Monday, April 28, 2008

Virtualization is faster than Native on Mac OS X 10.5?

That little article on Battle of Mac Virtualization Products caused quite a stir with Parallels and another web publisher contacting me. I'll follow up with them and report back but in the meantime check this out.

I mentioned last time that on Parallels, the Linux filesystem (ext3) seemed quite a bit faster than the native filesystem. There are many ways this could happen, and honestly I have neither the time nor the expertise to learn about the finer details of HFS+ vs. ext3. Filesystems are messy.

Not that CPU benchmarks are much better, but I do have a silly Python benchmark that tests raw performance. I stole the idea from http://furryland.org/~mikec/bench/. I use it mostly to see if compiler flags make any difference when compiling Python from source, or what the performance of a pre-made version of Python is. The code is at the end.

Apple OS X 10.5

$ ./benchmark.py 
test_hash: 2.833932
test_list: 3.647719

Ubuntu 8.04 Server 64-bit in VMware Fusion

$ ./benchmark.py
test_hash: 1.697652
test_list: 2.255775

Uhhhh, say what? VMware is 33%+ faster than native? Remember, this is running the same code on the same machine. I have no idea if Ubuntu being 64-bit (vs. 32) makes any difference. This code doesn't do very much, so I suspect that the Linux memory allocators must be quite a bit better than Mac OS X's.

This is of course a bit synthetic, and I can find other examples of the virtualized Linux being a lot slower than native Mac. However, it is interesting.
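
To pin the "33%+" down a bit, here is a small sketch that does the arithmetic on the four numbers above. The helper name `compare` is made up; the timings are copied from the two runs and are in seconds, since that is what `timeit` reports.

```python
# Timings copied from the two runs above (seconds, as reported by timeit).
native = {"test_hash": 2.833932, "test_list": 3.647719}
vm = {"test_hash": 1.697652, "test_list": 2.255775}


def compare(native_s, vm_s):
    """Return (fraction of time saved, speedup factor) for the VM run."""
    time_saved = (native_s - vm_s) / native_s
    speedup = native_s / vm_s
    return time_saved, speedup


for name in sorted(native):
    saved, speedup = compare(native[name], vm[name])
    print("%s: %.0f%% less time, %.2fx speedup" % (name, saved * 100, speedup))
```

By this arithmetic the VM run takes roughly 40% less time on test_hash and 38% less on test_list, i.e. a speedup of about 1.6x to 1.7x.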

Comments most welcome.


The Lame Test

from timeit import Timer

test_hash = """
for i in xrange(100):
    x={}
    for j in xrange(1000):
        x[j]=i
        x[j]
"""

test_list = """
v=['a','b','c','d','e','f','g']
for j in xrange(1000):
    v.append(j)
    v[j]
"""

t = Timer(stmt=test_hash)
print "test_hash: %f" % t.timeit(100)

t = Timer(stmt=test_list)
print "test_list: %f" % t.timeit(10000)
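
The script above is Python 2 (print statement, `xrange`). If you only have Python 3 around, the following is a rough port of the same two tests. The `run_benchmarks` wrapper and its parameter names are my own additions, and the timings will not be directly comparable to the numbers above since the interpreter is different.

```python
from timeit import Timer

# Same statements as the original, with xrange replaced by range.
TEST_HASH = """
for i in range(100):
    x = {}
    for j in range(1000):
        x[j] = i
        x[j]
"""

TEST_LIST = """
v = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
for j in range(1000):
    v.append(j)
    v[j]
"""


def run_benchmarks(hash_runs=100, list_runs=10000):
    """Time both statements; defaults match the repeat counts used above."""
    return {
        "test_hash": Timer(stmt=TEST_HASH).timeit(hash_runs),
        "test_list": Timer(stmt=TEST_LIST).timeit(list_runs),
    }


if __name__ == "__main__":
    for name, seconds in sorted(run_benchmarks().items()):
        print("%s: %f" % (name, seconds))
```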

Friday, April 25, 2008

Battle of Mac Virtualization Products

I just evaluated three virtualization products for the Mac: Parallels Desktop For Mac 3.0 (build 5584), VMware Fusion 1.1.2, and VirtualBox (version 1.5.51), to run Ubuntu Linux 8.04 (Hardy) Server (not the desktop version). My test system is a MacBook (not Pro) with 2 GB of RAM and a 2.2 GHz Intel Core 2 Duo on Mac OS X 10.5.2 (Leopard). This is a 64-bit CPU and OS.

Ok, now that I'm done adding keywords for search engines, you might ask why. The goal is to develop software on a platform that is as close as possible to the final deployed version. But isn't Mac with its "BSD-core" along with MacPorts or Fink or Gentoo Linux for Mac good enough? The short answer is no, and I'll write that up in another post. You might also ask why I don't just ditch Mac altogether and just use Ubuntu Desktop. Here the answer is because I like the Mac hardware, the OS, and mostly because I don't want to. I suppose later on that might well be a good answer, but even then, I'd still use virtualization (again, another post).

Most of these products heavily advertise Windows-Mac integration and accelerated graphics and whatnot. I'm not testing any of that. I'm testing how easy it is to install and the performance.

Parallels Desktop For Mac 3.0

I downloaded this, got my 15-day trial key and we are off to the races.

The UI is pretty good, and the installer, which asks a few questions, wasn't bad, although it wasn't always very clear. A few options I needed were hidden, and it took a few tries to figure out exactly what I needed to do. To be fair, it was my first time using any virtualization product.

Ok, the bad news is that Parallels needs a bit of help to get the server install to boot. The installer kernel is fine, but the stock server kernel does not boot. You then need to choose "Repair broken install", drop down to root, uninstall the kernel, and install the stock desktop kernel. (Read about it: here, here, and here)

64-bit Ubuntu kernels did not boot either. I didn't try the hack to swap kernel versions.

Anyways, once it's up, it works, and works well. Networking is no problem, ifconfig works, you can find your IP and ssh into the box, no problem. I did find that occasionally, after putting the computer to sleep, the networking didn't return. Doing "nothing" (waiting for login prompt) the Parallels VM takes 12-16% of the CPU. If you suspend the VM, 3% of CPU is being used (for what?). That said, overall the performance is very good. For a few of the processes I tested, the overall runtime was about equal to native, but much more CPU was used (which is fine).

Ok, here's one crazy thing. The file system seems faster on the VM than on native Mac OS X! (There are many ways this could be possible, but who can say.)

Virtual Box 1.5.51 Beta 3

Their big claim to fame is that it's open source and free. These guys just got bought by Sun. It's not clear what they are intending to do with them. But you can guess. It's a GPL VM so any Linux distro can include it. This will make OpenSolaris easy to install and try out. And you can guess that OpenSolaris will run the best. Who can say. Anyways...

Sadly, at least for now on Mac, you get what you pay for. The UI is very confusing and frequently dead-ends, where you effectively have to restart (you get a modal dialog with no buttons... nice). While the install kernel booted, the final image did not (this appears to be fixed in an upcoming version). Also, compared to Parallels, it was noticeably slower. The idle VM took 25% of the CPU (!!) during the install.

Given the price point it's worth checking out, and the version I checked was a "beta 3". We'll see what Sun does.

VMware Fusion 1.1.2

I spoke with a VC yesterday and we had a nice chat about virtualization on Mac; VMware Fusion was the recommendation.

Downloaded, 30-day trial key (retail is $80 same as Parallels), and..

Whoa. So literally within about 5 seconds of starting this up, my Ubuntu ISO image was booting and installing.

The final install booted without any monkey business, unlike with Parallels and VirtualBox.

VMWare correctly emulated, and I did not have to use

The idle VM takes only 5%, and when suspended VMware takes 0% CPU. I like that!

And without any modifications, VMware can install and boot 64-bit kernels (i.e. ubuntu-8.04-server-amd64.iso).

And as a bonus, VMware provides pre-configured VMs with various Linux distributions (Fedora, CentOS, etc.).

Conclusion

So right now, 25-Apr-2008, VMware is the clear winner. But honestly, the VM space is hyper-competitive. In a few months, this will probably be wrong. Parallels certainly works and works well. If VirtualBox cleans up its act, the whole basic server VM space on Mac might become a commodity.

Wednesday, April 9, 2008

Getting Started with EC2

Wow, is EC2 fussy. You know this already. Such is life when using public-key encryption. If you are just getting started, the following greatly simplifies the situation. It won't handle multiple accounts or multiple instances, but sometimes one is all you need.

First, go through the stock EC2 tutorial. And then compile all the keys and stuff into one spot like so:

export S3_PUBLIC='your s3 public key'
export S3_PRIVATE='your s3/private key'
export EC2_PRIVATE_KEY='./pk-YOURCERT.pem'  # somewhere
export EC2_CERT='./cert-YOURCERT.pem'  # somewhere

export EC2_HOME='./ec2-api-tools-1.3-19403'  # OR SOMETHING
export PATH=$EC2_HOME/bin:$PATH

export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/1.5.0/home

export AMI='ami-226e8b4b'   # or something

export EC2_SSH_KEY=yourname-keypair

Ok, now the following functions make it simple to start, stop, and log in:


# VERSION 3 -- now with ec2do
# VERSION 2 -- now with push/pull

function ec2start {
    ec2-run-instances $AMI -k $EC2_SSH_KEY
}

function ec2list {
    ec2-describe-instances
}

function ec2host {
    # public hostname: field 4 of the first INSTANCE line (the pipeline is one line)
    ec2-describe-instances | grep INSTANCE | grep amazonaws.com | head -n 1 | awk '{print $4}'
}

function ec2id {
    # instance id: field 2 of the first INSTANCE line (the pipeline is one line)
    ec2-describe-instances | grep INSTANCE | grep amazonaws.com | head -n 1 | awk '{print $2}'
}

function ec2stop {
    echo "Stopping..."
    ec2-terminate-instances `ec2id`
}

function ec2login {
    ssh -i "id_rsa-$EC2_SSH_KEY" "root@`ec2host`"
}

function ec2push {
    # quote $1 so local paths with spaces survive
    scp -i "id_rsa-$EC2_SSH_KEY" "$1" "root@`ec2host`:~/"
}

function ec2pull {
    scp -i "id_rsa-$EC2_SSH_KEY" "root@`ec2host`:$1" .
}

function ec2do {
    # "$@" passes each argument through as a separate word
    ssh -i "id_rsa-$EC2_SSH_KEY" "root@`ec2host`" "$@"
}

function ec2help {
    echo "haha"
    echo ""
    echo "ec2list  -- alias for ec2-describe-instances"
    echo "ec2start -- starts one instance"
    echo "ec2stop  -- terminates the instance"
    echo "ec2id    -- lists the instance id of a running instance"
    echo "ec2host  -- lists the host of a running instance"
    echo "ec2login -- log in to running instance"
    echo "ec2push localfile  -- push a file to remote homedir"
    echo "ec2pull remote -- pull a remote file to local current dir"
    echo "ec2do cmds -- execute a command remotely"
}

Put all of that into your profile, and you'll be all set. Enjoy..
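
If the grep/head/awk pipeline in ec2host and ec2id looks opaque, here is a rough Python sketch of the same field extraction. The function name and the SAMPLE text are made up for illustration (the ids and hostnames are placeholders, not real output), and the column layout is an assumption based on the awk field numbers used above.

```python
def first_running_instance(describe_output):
    """Mimic `grep INSTANCE | grep amazonaws.com | head -n 1 | awk '{print $2, $4}'`."""
    for line in describe_output.splitlines():
        if "INSTANCE" in line and "amazonaws.com" in line:
            fields = line.split()  # awk also splits on runs of whitespace
            # awk's $2 and $4 are 1-based; Python lists are 0-based.
            return fields[1], fields[3]
    return None


# Hypothetical output with placeholder values, shaped like ec2-describe-instances.
SAMPLE = (
    "RESERVATION\tr-00000000\t000000000000\tdefault\n"
    "INSTANCE\ti-00000000\tami-226e8b4b\tec2-0-0-0-0.compute-1.amazonaws.com\t"
    "domU-00-00-00-00-00-00.compute-1.internal\trunning\tyourname-keypair\t0\n"
)

print(first_running_instance(SAMPLE))
```

The second grep is what skips instances that do not have a public hostname yet, which is why ec2host can come back empty right after ec2start.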

Monday, April 7, 2008

Python and JSON performance

Ola. So I was working on a project and noticed that JSON encoding/decoding of a string-to-string dictionary was taking... uhhh 70% of the time. There are a few other benchmarks out there (kbyanc, hill-street), but none include the new simplejson 1.8.1, which has extensions to improve performance (links below).

My Tests

I'm only serializing dictionaries mapping strings to strings. Maybe occasionally a dict within a dict, but always primitive types. I don't need special ways of serializing custom objects. This test is just raw primitive types and data structure performance.

For this test, I made two dictionaries with a dozen entries each. One has both keys and values as regular ASCII Python strings; the other has the same data but uses unicode strings.
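
The shape of such a test can be sketched as follows. This is not the script behind the numbers below: it uses the standard-library `json` module as a stand-in for cjson and simplejson, and the dictionary contents, the `time_call` helper, and the repeat count are all assumptions.

```python
import json
from timeit import Timer

# Hypothetical test data: a dozen string-to-string entries.
data = dict(("key%02d" % i, "value%02d" % i) for i in range(12))
encoded = json.dumps(data)


def time_call(func, arg, number=10000):
    """Return total seconds taken by `number` calls of func(arg)."""
    return Timer(lambda: func(arg)).timeit(number)


print("Serialization, json: %f" % time_call(json.dumps, data))
print("Deserialization, json: %f" % time_call(json.loads, encoded))
```

To compare libraries, pass their encode and decode functions to `time_call` in place of `json.dumps` and `json.loads`.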

cjson 1.0.6

I used cjson 1.0.6, which has the bug fix described here; the original source and homepage are over here. It's a bit confusing, but it's worth it:
Serialization, cjson, ascii: 94
Serialization, cjson, unicode: 117
Deserialization, cjson, ascii: 80

simplejson 1.8.1

Available here. The author maintains a blog.

This version has C extensions for both encode and decode.

Serialization, simplejson, ascii: 549
Serialization, simplejson, unicode: 592
Deserialization, simplejson, ascii: 2068

simplejson 1.6

This version is 100% pure Python. It can be grabbed here.

Serialization, simplejson, ascii: 2150
Serialization, simplejson, unicode: 1744
Deserialization, simplejson, ascii: 3318

Conclusion

Well, guess which version I'll be using. simplejson has all sorts of other features that you might want. Sometimes speed isn't everything, but for my application it is. Enjoy!