Python links 2010-04-23

# Digg has released ‘clusto’, their cluster management system written in Python. “Clusto is a cluster management tool. It helps you keep track of your inventory, where it is, how it’s connected, and provides an abstracted interface for interacting with the elements of the infrastructure.”
Clusto on GitHub

# SQLAlchemy 0.6 has been released.
SQLAlchemy 0.6

# stdeb produces Debian packages from Python packages
stdeb on GitHub

# Mike Malone, an engineer at SimpleGeo, put together a Cassandra mock in Python.
Fake Cassandra Gist

# Mike Dirolf posted a recipe on using date range queries on a Mongo collection via the Python driver
Mongo date range
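
A date range query of this kind usually comes down to a filter document built from MongoDB's $gte and $lt operators. The sketch below only builds that filter. The helper name date_range_filter and the created_at field are hypothetical, and it assumes the result would be passed to PyMongo's collection.find().

```python
from datetime import datetime


def date_range_filter(field, start, end):
    """Build a MongoDB filter matching start <= field < end.

    Hypothetical helper for illustration; not part of PyMongo.
    """
    if start >= end:
        raise ValueError("start must be earlier than end")
    return {field: {"$gte": start, "$lt": end}}


# 'created_at' is an assumed field name.
april_2010 = date_range_filter(
    "created_at", datetime(2010, 4, 1), datetime(2010, 5, 1)
)
# With a live PyMongo collection this would be used as:
#     collection.find(april_2010)
```

Using a half-open range (inclusive start, exclusive end) keeps adjacent ranges from overlapping.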


Python links 2010-04-16

Some useful links from the past week:

Armin Ronacher has released a new micro-framework called “Flask”, inspired by Ruby’s Sinatra framework.
Flask 0.1
Flask on GitHub
Flask Project Page

Paul Bohm released “Tragedy”, a high-level Cassandra object abstraction for Python.
Tragedy on GitHub

Johann Rocholl gave a presentation to the Seattle Google Technology Users Group (GTUG) on profiling and performance tuning App Engine applications.
Making app engine fast


Mongo Distributed Consistency

I came across these two links on Distributed Consistency on the MongoDB wiki. Both articles are good reads and do a good job comparing the consistency model of MongoDB to other systems such as CouchDB and Cassandra.

On Distributed Consistency Part 1

On Distributed Consistency Part 2


Redis Include Directive

Jeremy Zawodny has forked Redis on GitHub and pushed an interesting patch that adds support for the “include” directive, presumably to make splitting up large configuration files simpler.

Check the redis fork here: Redis include directive fork


Python iteration anti-patterns

So I watched a slideshow where an author (name intentionally omitted) had a snippet that iterated over a list. That’s all well and good, but one thing that bugged me was that it checked for the length of the array first, and then iterated.

list_len = len(mylist)
if list_len > 0:
    for item in mylist:
        ...  # loop body

This just feels wrong, and I wanted to make two stylistic points here.

1) If you have a method that returns a list, always return a list. If it is empty, don’t return None and make me check whether the value is None. Just return an empty list [] or tuple ().

2) If the code follows point #1, don’t check; just iterate. If the list is empty, the loop body simply never runs, which is exactly what the snippet above wants.
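
Both points can be sketched in a few lines. The function names here (find_tags, shout) are made up for illustration; the only things that matter are that the first one always returns a list and the second one iterates without any length check.

```python
def find_tags(text):
    """Return every '#tag' word in text; an empty list when there are none."""
    return [
        word[1:]
        for word in text.split()
        if word.startswith("#") and len(word) > 1
    ]


def shout(tags):
    """Iterate directly; an empty list simply means zero iterations."""
    result = []
    for tag in tags:
        result.append(tag.upper())
    return result


print(shout(find_tags("learning #python and #django")))
print(shout(find_tags("no tags in this sentence")))
```

The second call works without any special casing because find_tags hands back [] rather than None.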


Cloud Confusion

I read this article on cloud hosting today and some things in it really bothered me. I haven’t actually browsed Technorati in a long time (maybe 2006?) and things seem to have changed dramatically. Technorati is now hosting content (blog postings). Maybe I was aware of this at some point but forgot about it… but I digress.

To start off, the author mentions three companies as examples of “cloud hosting”: Amazon, GoGrid and CrackerHost. Cracker who? Yeah, that was my reaction. I happened to scroll down and saw the author’s bio… You guessed it: Rudhir is the founder of CrackerHost. The overly aggressive, and frankly out-of-context, promotion of his service is already off-putting.

In the opening sections of the article Rudhir states the following definition of cloud computing: “Cloud hosting has been broadly defined as an on-demand, pay-as-you-go, high-availability service. … In simple words, cloud hosting allows you to use a web server for an hour, pay for an hour and be done with it”. A quick glance at the CrackerHost billing reveals monthly, and not hourly pricing.

Rudhir then continues to use an interesting definition of VPS in a comparison of “cloud computing” and VPS providers. According to him, provisioning a VPS instance is manual, not on-demand or real-time, and does not afford the ability to resize instances. So… where do Rackspace Cloud, Slicehost and Linode fit into this picture? And how do they differ from Amazon? I’m pretty sure Slicehost and Linode self-identify as VPS providers, though other VPS providers like arpnetworks and prgmr are closer to his definition of a “VPS” provider.

And finally in his last sentence he says:

  1. You should. One. In case of a disaster, a cloud hosting-based provider will be able to recover sooner. This means that your web site will have a lower downtime.
  2. Cloud hosting will allow your web hosting provider to ensure that backup recovery is fail-safe. So, this will mean better data protection for you.

Oh boy – where do I start. Perhaps he hasn’t had time to read the post-mortem of the recent GAE outages, or is forgetting about the various AWS outages. At best, and only in advanced setups (like GAE) will there be any cross data-center failover, and being a “cloud” provider has nothing to do with this. What facts or evidence support these claims? None as far as I can tell.

If you are interested read the article here


Mozilla Weave: a day late and a buck short

The Mozilla blog just announced the 1.0 release of Mozilla Weave, something I had been keenly interested in until recently. That was before Chrome. I understand that it was a massive undertaking, but honestly they are late to the market. Chrome has already won me over, and once there was a release for the Mac, frankly there was no going back. Chrome has been much more pleasant to use: both more responsive and less resource intensive. Firefox has been a memory hog recently (some blame Flash). The ‘dev channel’ (Chrome 5) finally has the bookmark manager and bookmark sync.

The way things are going – Google has taken this round in the browser wars.


2010

Wow – It has been a long time since I’ve posted anything here. That will change. I’m going to make it a resolution to write more often, and also explore different writing styles.

Happy New Year!


The origin of CloudFront’s competitive advantage

Sorry for the word play, but I’m about to drop some nerd up in here. There is some FUD being spread by people who either don’t understand the real advantages of Amazon’s CloudFront, or have a vested interest in spreading fear. One conversation that caught my attention was in the comments section of an article published on GigaOM: Amazon’s CloudFront could storm rival CDNs

The two real competitive advantages I see are the pricing (no contract, quick setup) and, more technically, that it eliminates the need for an origin server.

That last point is key. You’ll hear things like: “CloudFront doesn’t have the ability to pull from an origin server, therefore it is a big joke that can’t compete with Akamai, Limelight, etc.”

This is bogus.

In most typical CDN setups that I know of, there is what is called an ‘origin’ server: the server where you continually host the content you want pushed out to the delivery network. As requests come in for specific assets, most CDN providers will ‘pull’ the content off your server by convention (a specific URI/path) and publish it to their cloud.

The problem with this setup?

It’s so not cloud.

You are forced to maintain a perpetually running server, with enough storage for all the assets, which sits there slurping up space, electricity and maintenance (admin) fees.

Since CloudFront is built on the S3 storage service — S3 is in essence the origin server. In fact CloudFront is merely an S3 bucket that has been blessed into a ‘distribution’ via a simple RESTful API call…

That’s the origin of CloudFront’s competitive advantage.


Amazon CloudFront via SVN or GIT hook

So the big news of the night is that Amazon has released ‘CloudFront’, their S3-based CDN that competes really aggressively with existing players (Akamai, etc.).

Having worked with… less enlightened solutions… I’m thinking the ultimate CloudFront deployment scenario would be via an SVN or Git hook. If you commit an asset to a path such as /static/css, a post-commit hook would automatically publish it to CloudFront.

If I have time I’ll take a look at implementing something quick and dirty tomorrow.
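
As a rough idea of what such a hook could look like for Git, here is a minimal sketch. Everything in it is an assumption for illustration: the static/ prefix, the function names, and the upload callable, which stands in for whatever S3 client would actually push the file into the bucket behind the distribution.

```python
import subprocess

# Assumed repository layout: everything under static/ is a publishable asset.
STATIC_PREFIXES = ("static/",)


def changed_files_in_head():
    """List the paths touched by the most recent Git commit."""
    output = subprocess.check_output(
        ["git", "diff-tree", "--no-commit-id", "--name-only", "-r", "HEAD"],
        text=True,
    )
    return [line for line in output.splitlines() if line]


def select_static_assets(paths, prefixes=STATIC_PREFIXES):
    """Keep only the paths that live under one of the static prefixes."""
    return [p for p in paths if p.startswith(prefixes)]


def publish(paths, upload):
    """Call upload(path) for each static asset and return what was sent.

    `upload` is a hypothetical callable; in a real hook it would wrap
    whatever S3 client you use to put the file into the bucket.
    """
    published = []
    for path in select_static_assets(paths):
        upload(path)
        published.append(path)
    return published
```

A post-commit script would then call publish(changed_files_in_head(), upload). Note that the file list also includes deleted paths, so a real hook would need to skip those or remove them from the bucket.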


DjangoCon: Schema Evolution Panel

The Schema Migration panel, with Simon Willison, Russ Keith-Magee and Andrew Godwin, moderated by Michael Trier, was an interesting sampling of the various methods used in schema migration.

Simon Willison presented dmigrations. Installing dmigrations is as simple as adding it to INSTALLED_APPS, and it registers a few custom admin commands:

./manage.py dmigrate app APP_NAME

./manage.py dmigrate list

./manage.py dmigrate addcolumn

My take: dmigrations is great and will work for migration problems right now, but in its current form it is unlikely to end up in Django as the anointed migration solution. Why? Because it basically wraps SQL directly, losing some of the cross-database portability of the Django DB layer.

Andrew Godwin: South — described as the next step of dmigrations. Philosophy: Migrations are essential, branched development / missing migrations, inter-app dependencies, database abstraction needed too. Can handle model dependency (foreign keys.)

My Take: worth looking into, more database independent.

Russell Keith-Magee: Django-evolution, his response to a mailing list thread that would not end. Google Summer of Code 2006. Russell complained about the “magic” moniker attached to django-evolution; everything is done via introspection of models. Goals: hint and tweak, simple changes without user intervention, easy entry for customization, raw SQL, self-documenting, self-auditing, validation (where possible).

Signature: pickled summary of the Django models. Stores the state of the Django models at syncdb, can be diffed against the current models; the diffs are used to generate hints.

Mutation: Atomic unit of change, common operations built in. Can be user-defined, can be raw SQL, know the effect they will have.

Evolution: Ordered collection of mutations. Two flavors of evolutions: hinted and stored. Executed evolutions are stored in the database.

Hinted Evolution: Best guess by looking at the diff; if acceptable, it can be used to execute the evolution right away. Can also be used as a prototype for a stored evolution. Can’t resolve ambiguous updates (renames), can’t fill in the blanks (initial data).

Stored Evolution:
Named sequence of mutations. Defined per application, stored in evolutions directory of app (can be put into version control).
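
To make the signature/hint idea concrete, here is a toy sketch of diffing a stored model summary against the current one. It is not django-evolution's API or data format: the plain dict signatures, the function name and the tuple-shaped hints are all invented for illustration.

```python
def hint_mutations(stored, current):
    """Diff two {model: {field: type}} summaries into a list of hints.

    Toy stand-in for a signature diff; the hint tuples are hypothetical.
    """
    hints = []
    for model in sorted(set(stored) | set(current)):
        old = stored.get(model, {})
        new = current.get(model, {})
        for field in sorted(set(new) - set(old)):
            hints.append(("add", model, field, new[field]))
        for field in sorted(set(old) - set(new)):
            hints.append(("delete", model, field, old[field]))
        for field in sorted(set(old) & set(new)):
            if old[field] != new[field]:
                hints.append(("change", model, field, new[field]))
    return hints


# State captured at the last syncdb vs. the models as they are now.
stored = {"Blog": {"title": "CharField", "owner": "ForeignKey"}}
current = {"Blog": {"name": "CharField", "owner": "ForeignKey"}}

for hint in hint_mutations(stored, current):
    print(hint)
```

In this example the title field was renamed to name, but a diff alone can only report an add plus a delete. That is the ambiguity a hinted evolution cannot resolve and a hand-written stored evolution can.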

django-evolution extends syncdb: when ./manage.py syncdb detects a schema change, it tells you that you need an evolution.

Custom commands:

./manage.py evolve --hint

./manage.py evolve

My Take: Highly interesting, most likely to end up in django. Grok this.


DjangoCon: Django Code Design and Writing Patches

A follow-up to the “Inside the Django ORM” talk: Malcolm Tredinnick gave his second talk on Sunday, on Django code design and patch writing (aka Code Quality, Patch Quality).

After reading around 6000 bug tickets, certain patterns have emerged in the submitted patches…

Code quality matters. Funny quote from Leah Culver: “Have you ever written a library? It’s like people seeing you in your underwear. You gotta make sure it’s clean.”

Do the basics properly. Warning signs: the word “print” is in your patch; similarly for “import pdb”; you think PEP 8 is the name of a new energy drink. Run the Django test suite. Submit patches, not entire files. Start from the top of the tree when using svn diff.

Read the contributing document (contributing.txt), use sensible names, comments should last, comments should be correct.

Fix problems, not symptoms. The crowd is smarter than you think — the code mostly works. If you find yourself ripping out lots of code, stop and think.


DjangoCon: Inside the ORM

Malcolm Tredinnick gave an awesome presentation about the Django ORM.

The code for the ORM is located in django/db with juicy bits in the following locations:

django/db/models/query.py (public queryset API)

django/db/models/sql/* (Public API->SQL conversion. Deep dark internals. Doesn’t know the DB, knows SQL)

django/db/backends/* (Individual DB wrappers, third-party wrappers possible. This is where you actually talk to the DB. See the dummy or mysql dir for examples.)

The different pieces in depth

Blog.objects.filter(owner=user) = [model, manager, QuerySet method]

Model Managers:

Inherit from django.db.models.Manager. Ideal for extra methods that act on the whole table at once, not just one record. Usually wraps/proxies public QuerySet methods. Has a method called get_query_set() which returns a QuerySet (QuerySet.all).

QuerySets:

django.db.models.query.QuerySet

Every time you call a method on a QuerySet it returns a copy (clone) of that QuerySet (side effect free?) For example every time you call filter on a query set you get a different queryset.
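
The clone-on-every-call behaviour can be imitated in plain Python. The class below is a toy, not Django's QuerySet: its name, the _filters dict and the describe() method are assumptions made up for this sketch.

```python
class ToyQuerySet:
    """Toy stand-in for a QuerySet: every filter() returns a new object."""

    def __init__(self, filters=None):
        # Copy so that clones never share state with their parent.
        self._filters = dict(filters or {})

    def filter(self, **conditions):
        clone = ToyQuerySet(self._filters)
        clone._filters.update(conditions)
        return clone

    def describe(self):
        return sorted(self._filters.items())


everything = ToyQuerySet()
by_owner = everything.filter(owner="alice")
published = by_owner.filter(published=True)

print(everything.describe())
print(by_owner.describe())
print(published.describe())
```

Because each call works on a copy, by_owner can be reused as a base for other queries without picking up the published filter.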

Query

django.db.models.sql.query.Query, an attribute of QuerySet, holds the internal state of the current query and knows how to produce SQL. This is where you implement something that knows how to speak to MySQL, or CouchDB or Hadoop for example. (Aka the class from Hell, almost every data structure known to man is used here.) as_sql() is where the state is rewritten into SQL. This is a change from before the QuerySet refactor landed in trunk, when the internal representation for everything was strings.

Every method of QuerySet updates the Query (calls a method on Query).

Query.results_iter() is how the results get from the DB back into Python objects; reconstructing Python objects happens in various other Query methods: Query.select_related, Query.extra_select, etc.

Looking inside the Query: Query.add_filter() is the guts, the place to start when trying to grok the Query class. Query.setup_joins() converts filter/exclude into table joins. Query.join() is responsible for joining a pair of tables.

Recap on flow/organization: Manager->QuerySet->Query
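
The layering can be sketched without Django at all. The three classes below are simplified stand-ins written for this post, not Django's real implementations. Only the method names get_query_set, add_filter and as_sql echo the talk, and the SQL they produce is deliberately naive.

```python
import copy


class SketchQuery:
    """Holds the query state and knows how to render it as SQL."""

    def __init__(self, table):
        self.table = table
        self.where = []  # list of (column, value) pairs

    def add_filter(self, column, value):
        self.where.append((column, value))

    def as_sql(self):
        sql = "SELECT * FROM {}".format(self.table)
        if self.where:
            clauses = ["{} = %s".format(column) for column, _ in self.where]
            sql += " WHERE " + " AND ".join(clauses)
        params = [value for _, value in self.where]
        return sql, params


class SketchQuerySet:
    """Public API: each filter() clones itself and updates the Query."""

    def __init__(self, query):
        self.query = query

    def filter(self, **conditions):
        clone = SketchQuerySet(copy.deepcopy(self.query))
        for column, value in sorted(conditions.items()):
            clone.query.add_filter(column, value)
        return clone


class SketchManager:
    """Table-level entry point that proxies QuerySet methods."""

    def __init__(self, table):
        self.table = table

    def get_query_set(self):
        return SketchQuerySet(SketchQuery(self.table))

    def filter(self, **conditions):
        return self.get_query_set().filter(**conditions)


blogs = SketchManager("blog")
sql, params = blogs.filter(owner="alice").query.as_sql()
print(sql, params)
```

The manager never touches SQL and the Query never sees keyword arguments, which is the separation the recap describes.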

Customization:

Custom Managers (Easy)

Custom QuerySet (Not so Easy)

–> (Some example QuerySet) ValuesQuerySet, DateQuerySet, EmptyQuerySet

Custom Query (Not so Easy)

GeoDjango for example has to use very different types of query.

Note to self: I’m wondering how (or even if) one would use Django’s ORM with a Solr backend for integrated search via a Django ORM model.


DjangoCon: ReviewBoard

Presented by: Christian Hammond, David Trowbridge

Review-Board is a Python/Django-based code-review application. In 2004 VMware had approx. 600 employees and already had a rigorous code-review process in place. At first simple HTML snippets were emailed around. Fast forward to 2007 and the company had grown to 5000 employees, and the previous system of emailing snippets around was unmanageable.

And thus Review-Board was born.

Earlier in the day Guido mentioned that he wrote a similar, Python-based code review tool that is used internally at Google. I’ll try to dig that up and make a follow-up post.

Stats: 2 core developers, 74 contributors, 295 mailing list members.

I’ll be installing this on my own dev box. In the future expect a post or two about how it integrates with other tools I like (git/svn, trac, etc.).


DjangoCon 2008

I’ve been really, really inactive in posting. That is going to change. I’m currently at DjangoCon 2008, and will be posting a few things about some cool new things I’m learning about.
More soon.


Amazon EBS SAN for the cloud

Amazon has just released the latest addition to its cloud offerings: EBS, or “Elastic Block Store”. The pricing looks very reasonable at 10 cents per gigabyte, and 10 cents per million I/O operations. It also comes with some juicy features like the ability to create a snapshot to S3 at any point in time, and then create another EBS volume from that snapshot.

The SAN-like storage mounts are (only) available in a given “Availability Zone”. Nothing specific is mentioned about distributed file systems like GFS, so that is something I’ll be looking into ASAP. As with a regular volume, you can create several EBS volumes and do your own software RAID on top of them.

There are also some interesting scenarios around the snapshot facility, such as having one “master” EBS volume in the designated “write” availability zone and replicating out to several “slave” availability zones in geographically disparate regions.

Performance seems in line with high-end RAID arrays:

To recap:

Price: $0.10 per GB
Performance:
Features: Reliability, Snapshots, Traditional Posix filesystem
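
For a feel of what those two rates add up to, here is a small back-of-the-envelope calculator. It assumes the storage rate is charged per gigabyte per month, which the announcement quoted above does not spell out, and the function name and example workload are made up.

```python
STORAGE_RATE = 0.10         # dollars per GB (assumed to be per month)
IO_RATE_PER_MILLION = 0.10  # dollars per million I/O operations


def ebs_cost(gigabytes, io_operations):
    """Estimate the bill for a volume size and an I/O operation count."""
    storage = gigabytes * STORAGE_RATE
    io = (io_operations / 1_000_000) * IO_RATE_PER_MILLION
    return round(storage + io, 2)


# Hypothetical workload: a 100 GB volume doing 50 million I/O operations.
print(ebs_cost(100, 50_000_000))
```

Under these assumptions the example volume costs $10 for storage plus $5 for I/O.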


Django 1 Alpha 2 released

The Django foundation has pushed out the second alpha release of Django 1.0; see the blog post here.

Earlier in the week I received “Practical Django Projects” in the mail — chomping at the bit for enough time to get into it.

Exciting times.

I wonder if there isn’t some room in the Django ORM back-end to support stuff like S3 and Google App Engine — with some warnings of course that the store isn’t relational… etc


LinkedIn redirecting?

Has anyone else noticed that LinkedIn is redirecting to some video commenting startup? I did an nslookup on linkedin.com which resolves to 70.42.142.23 — I ran host 70.42.142.23 which resolved to redirect.linkedin.com

The page “Intense Debate” seems to be about some sort of video blogging system. See screenshot:
What is going on?

LinkedIn redirect to Intense Debate

Update: redirect is no longer happening as of 7:06 pm PST.


why -1 evaluates true

Languages that let you branch on non-boolean values, with or without a native boolean type, evaluate all non-zero values as “truthy”, so (-1) evaluates to true. You might be wondering, as I did, why that is. After poking around and looking at some x86 assembly instructions, I suspect it is a compiler optimization.

At first I thought it was a Perl-specific choice, but a few quick tests with C and Python confirmed this was not the case.

It seemed like an odd design choice given that truthiness in this (programming) context is loosely inspired by the presence of current in a given transistor. So why wouldn’t you want negative integers to be false?

Then I started thinking that it was an optimization issue, so I started looking at the x86 assembly generated by GCC for my C example (see below). I used the -S flag to gcc to save the assembly, so gcc -S test.c created test.s (the assembly). A quick glance revealed that this was indeed the case, particularly the following three lines:

  movl  $-1, -4(%rbp)
  cmpl  $0, -4(%rbp)
  je  .L2

movl moves the value of x (-1) to the appropriate place on the stack (this is int x = -1). The actual comparison then happens with the cmpl instruction, which compares that value to zero. If the two values are equal (0 and x), je (jump-if-equal) jumps to .L2, skipping the body of the if.

Now the next question, for another day: why does gcc assemble ifs to JE and not JLE, which would evaluate to false for anything less than 1?

My guess is that it’s more expensive. But more to come on that subject.

perl:

#!/usr/bin/env perl
if (-1) {
  print STDERR "I am true \n";
}

python:

#!/usr/bin/env python
num = -1
if num:
    print("I am true")

C:

#include <stdio.h>

int main (void) {
  int x = -1;
  if (x) {
    printf("I am true");
  } else {
    printf("not true");
  }
}
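
The compare-against-zero rule that the cmpl/je pair implements can be checked from Python as well. C defines an if condition as true when the expression compares unequal to 0, and Python integers follow the same rule. The helper name is_truthy and the sample values are just for this sketch.

```python
def is_truthy(value):
    """Mirror the cmpl/je pair: a value is true unless it equals zero."""
    return value != 0


samples = [-1, 0, 1, -42, 255]
for n in samples:
    # Python's own truth test agrees with the explicit comparison.
    assert bool(n) == is_truthy(n)
    print(n, bool(n))
```

Only 0 comes out false; the sign of the number never enters into it.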

On demand indexing

The world is full of buzz around the launch of cuil, whose premise is more efficient indexing. What about indexing on demand? More after I get off the bus.


