On demand indexing

The world is full of buzz around the launch of Cuil, whose premise is more efficient indexing. What about indexing on demand? More after I get off the bus.

Freebase Meetup Tomorrow

I’ll be heading to the Freebase (I keep writing FreeBSD…) meetup tomorrow at their SF HQ. Check out a description here: http://blog.freebase.com/2008/06/12/speakers-at-next-tuesdays-freebase-user-group/

What is next for OpenSocial

I’m at the Google I/O conference, currently in the “What’s next for OpenSocial” presentation. I’m going to try to live-blog this:
Speaker: How to balance dictatorship and anarchy…
OpenSocial 0.8 has been released; see http://opensocial.org
Enhancements to JS and XML definitions, RESTful APIs, templating language
JS/XML:
Main thing: cleanup and convenience (Gadgets XML, gadgets.*, opensocial.*)
Inlined message bundles (languages, [...]

Heritrix Conference Call

The Internet Archive is reaching out to connect with the crawling/harvesting community that uses its open source crawler, Heritrix.
The first call will take place via a ‘Skypecast’ at the following time:
15:00 GMT, Wednesday, May 28th
(8 a.m. San Francisco / 11 a.m. Washington, DC / 4 p.m. London)
The call will open with a brief overview of what’s new [...]
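The local times listed above can be double-checked with a few lines of Python. This is only a sketch: it assumes the call is in 2008 (the post does not give the year), treats GMT as UTC, and hard-codes the late-May daylight-saving offsets (PDT = UTC-7, EDT = UTC-4, BST = UTC+1) instead of consulting a time zone database.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the call is in 2008; the post only says "Wednesday May 28th".
CALL_UTC = datetime(2008, 5, 28, 15, 0, tzinfo=timezone.utc)

# Assumed summer (daylight saving) offsets in effect in late May.
ZONES = {
    "San Francisco (PDT)": timezone(timedelta(hours=-7)),
    "Washington DC (EDT)": timezone(timedelta(hours=-4)),
    "London (BST)": timezone(timedelta(hours=1)),
}


def local_times(when_utc):
    """Map each city label to the call's local wall-clock time."""
    return {city: when_utc.astimezone(tz) for city, tz in ZONES.items()}


if __name__ == "__main__":
    print(CALL_UTC.strftime("%A"))  # day of the week for the assumed date
    for city, t in local_times(CALL_UTC).items():
        print(f"{city}: {t:%H:%M}")
```

Run as a script, it prints Wednesday followed by 08:00, 11:00 and 16:00, which agrees with the times in the announcement.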

Blekko launches…something

Blekko, allegedly a search engine, has just scored another round of funding, according to Mike Arrington at TechCrunch: Blekko raised $3 million at a $23 million post-money valuation.
Mike is (implicitly) comparing Blekko to Cuill. Cuill (pronounced “cool”), also in stealth mode, claims to have a much cheaper and more efficient way of [...]
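For readers less used to the jargon: the post-money valuation is the pre-money valuation plus the new cash, so the reported figures imply roughly a $20 million pre-money valuation and a stake of about 13% for the new investors. A quick sketch of that arithmetic, assuming the figures are in US dollars and that the whole $3 million bought newly issued shares:

```python
def round_terms(raised, post_money):
    """Return (pre-money valuation, fraction of the company sold in the round)."""
    pre_money = post_money - raised
    stake = raised / post_money
    return pre_money, stake


if __name__ == "__main__":
    pre, stake = round_terms(raised=3_000_000, post_money=23_000_000)
    print(f"pre-money: ${pre:,}, new investors' stake: {stake:.1%}")
```

This prints `pre-money: $20,000,000, new investors' stake: 13.0%`.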

Powerset launches

Powerset finally came out of private beta this Sunday with a search product built around Wikipedia and Freebase.
It will be interesting to see how competitor Hakia responds.
I’m a little underwhelmed by Powerset, especially since it is searching semi-structured data, where some of the relationships are more explicit than out in the wild.
http://20bits.com/2008/05/12/powerset-launches-verdict-meh/
http://venturebeat.com/2008/05/12/powerset-opens-to-everyone-now-whats-next/
http://www.techcrunch.com/2008/05/11/powerset-launches-showcase-for-user-search-experience/
http://gigaom.com/2008/05/11/powerset-is-live/

Setting HTTP headers with Apache and mod_headers

Continuing my HTTP bender, I’d like to discuss some fun / necessary things you can do to manipulate HTTP headers using the Apache mod_headers module.
If you were to make a request to Slashdot and examine the HTTP headers using either Live HTTP Headers or Firebug, you would notice one of two unusual headers, X-Bender or X-Fry, with [...]
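In Apache, a custom header like these comes from a mod_headers directive of the form `Header set X-Bender "some value"`. To illustrate what such a rule does without an Apache install, here is a small Python WSGI middleware that adds (or replaces) one response header on every request. It is an analogy, not mod_headers itself, and the header value and the wrapped `hello_app` are hypothetical placeholders rather than Slashdot's actual setup.

```python
def add_header_middleware(app, name, value):
    """Wrap a WSGI app so every response carries `name: value`.

    Mirrors mod_headers' `Header set`: an existing header with the same
    name (compared case-insensitively) is replaced, not duplicated.
    """
    def wrapped(environ, start_response):
        def start_with_header(status, headers, exc_info=None):
            headers = [(k, v) for k, v in headers if k.lower() != name.lower()]
            headers.append((name, value))
            return start_response(status, headers, exc_info)
        return app(environ, start_with_header)
    return wrapped


def hello_app(environ, start_response):
    """Hypothetical stand-in for the site behind the server."""
    body = b"hello\n"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


app = add_header_middleware(hello_app, "X-Bender", "hypothetical quote")

if __name__ == "__main__":
    from wsgiref.simple_server import make_server
    # Browse to http://localhost:8000/ and inspect the response headers
    # with Live HTTP Headers or Firebug.
    with make_server("", 8000, app) as server:
        server.serve_forever()
```

Doing this in Apache rather than in application code keeps the header uniform across every page the server returns, which is the appeal of mod_headers.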

Search.com redesign

Search.com has just launched a redesign — very cool. Check it out: http://www.search.com

Handy WordPress plugins

I try not to rehash content from other sites, but here’s a link that is too tasty to pass up:
15 handy WordPress plugins (for power users)

Digg, please return semantically accurate HTTP status codes

Dear Digg,
I’m a huge fan. I check your site a few times a day. However, one thing has really been bugging me: the maintenance page. Whenever you throw up that oh-so-useful page full of links to your employees’ favorite sites, you return a 200 status code.
Think of me not [...]
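The conventional status for a planned outage is `503 Service Unavailable`. HTTP/1.1 (RFC 2616, section 10.5.4) describes it as a temporary condition and allows a `Retry-After` header to say when to come back. Below is a minimal WSGI sketch of such a maintenance page; the page body and the 600-second retry hint are placeholder assumptions, not anything Digg actually serves.

```python
MAINTENANCE_PAGE = (
    b"<html><body><h1>Down for maintenance</h1>"
    b"<p>Please check back shortly.</p></body></html>"
)


def maintenance_app(environ, start_response, retry_after_seconds=600):
    """Serve the maintenance page with 503 (temporary outage) instead of 200."""
    start_response("503 Service Unavailable", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(MAINTENANCE_PAGE))),
        # Hint to browsers and crawlers: try again after this many seconds
        ("Retry-After", str(retry_after_seconds)),
    ])
    return [MAINTENANCE_PAGE]


if __name__ == "__main__":
    from wsgiref.simple_server import make_server
    with make_server("", 8000, maintenance_app) as server:
        server.serve_forever()
```

Because the status is in the 5xx range, a client or crawler that receives this page can tell the outage is temporary, and is much less likely to treat the maintenance text as the real content of the URL than it would be with a 200.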

Evernote has me excited

I rarely get pumped about tech startups; they are a dime a dozen, and the ideas behind them are usually bad.
Evernote is an exciting startup: when I daydream about the future of the web, this is the sort of thing I imagine.
It’s a service that lets you take pictures of [...]

My favorite Firefox extensions

ChatZilla
Download Statusbar
Firebug
Forecastfox
Google Browser Sync
Live HTTP Headers
Web Developer
YSlow

Mahout: Machine Learning Lucene Subproject

Mahout is a new subproject from Grant Ingersoll and others at the Apache Lucene project. Lucene develops open source search libraries, and Mahout’s goal is to develop machine learning libraries around Hadoop and HBase.
This is definitely a project worth watching. It’s just starting (it was accepted into Apache on the 28th), so if you are looking to jump in, now is the time.

Wikia Wikia Wikia

Lots of movement at Wikia today.
1) Foowi (the “Social” aspects) has gone open source. See: http://svn.swlabs.org/foowi
2) I’ve released a few patches for the redesigned Grub crawler that aim to make it easier to use. The patch only slightly breaks things. See here: http://lists.wikia.com/pipermail/grub-dev/2008-January/thread.html
3) The Nutch stuff is supposed to be open sourced in the coming [...]

Grub Crawler Goes RESTful

This is a short blurb, but Jeremie (aka Jer) outlines his plan for taking “grub”, the crawler recently acquired from LookSmart, and replacing its heavy SOAP communication protocol with an essentially RESTful interface.
http://lists.wikia.com/pipermail/grub-dev/2007-November/000010.html

The problem with Zimbra

Zimbra is a full open source “collaboration” suite (email, calendar, …), though it really should be classified as a semantic application because of its “Documents” feature, which lets you create notebooks of “pages” (lightweight web pages) that you can easily drag and drop snippets of text into.
It is relatively easy to set up and run (I [...]

Will Microsoft buy Ask?

This month Barry Diller, CEO of IAC, split up his media empire into five distinct companies, aligned more or less with their respective sectors: HSN (Home Shopping Network), LendingTree, Ticketmaster, Interval International, and IAC. IAC contains all the remaining Internet properties, including Ask.com, Evite, Match.com and CollegeHumor.com.
The timing of this is a little [...]

Shelob the evil bot (spider from Juniper Networks)

I just noticed a new spider in my server logs: “shelob v1.0”, coming from host 208.223.208.181, which resolved to security-lab1.juniper.net. According to its project page (http://ella.slis.indiana.edu/~pwelsch/shelob/), Shelob stands for “Shelob Helps Examine Links on Blogs”.
For those of you who are keen Tolkien fans, you’ll remember Shelob is the “evil spider”. This story gets [...]
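An IP-to-hostname lookup like the one above can be made stronger with forward-confirmed reverse DNS: look up the hostname for the IP, then resolve that hostname and check that the original IP is among its addresses. A reverse (PTR) record alone is controlled by whoever runs the IP block, so it can claim any name. A sketch using Python's standard `socket` module; the lookup functions are injectable so the logic can be tested offline, and `host_in_domain` is a hypothetical helper for checking that the confirmed name belongs to the expected domain. Nothing here reflects Juniper's actual DNS records.

```python
import socket


def forward_confirmed_host(ip, reverse=socket.gethostbyaddr,
                           forward=socket.gethostbyname_ex):
    """Return the hostname for `ip` if reverse and forward DNS agree, else None."""
    try:
        hostname, _aliases, _addrs = reverse(ip)
        _name, _aliases, forward_ips = forward(hostname)
    except OSError:
        # socket.herror and socket.gaierror are both OSError subclasses
        return None
    return hostname if ip in forward_ips else None


def host_in_domain(hostname, domain):
    """True if `hostname` is `domain` itself or a subdomain of it."""
    hostname = hostname.lower().rstrip(".")
    domain = domain.lower().rstrip(".")
    return hostname == domain or hostname.endswith("." + domain)


if __name__ == "__main__":
    import sys
    ip = sys.argv[1] if len(sys.argv) > 1 else "127.0.0.1"
    print(ip, "->", forward_confirmed_host(ip))
```

Checking the domain with a leading dot matters: a plain `endswith("juniper.net")` would also accept a name like `evil-juniper.net`.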

Visvo search startup

There is a new crawler on the block: VisBot has been making its rounds. The seemingly legitimate crawler led me to its company’s site, Visvo, and I like what I’ve found. Why does Visvo matter among this new wave of search startups? Three reasons: 1) Each search result has an explain [...]

Download all Wikipedia images with WikiX

There are scores of interesting projects to do with the data Wikipedia makes available.
I recently needed to download all the images on Wikipedia, and an excellent project, wikix, was brought to my attention; it is the “best-practice” way of downloading Wikipedia image data.
It is an application written in C that parses [...]
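One detail that matters when fetching Wikipedia images in bulk: MediaWiki's default upload layout, as used by Wikimedia, stores each file under directories named from the MD5 hash of its file name (spaces turned into underscores), e.g. `a/ab/File_name.jpg`, where `a` and `ab` are the first one and two hex digits of the hash. The Python sketch below computes that path. It is not wikix's own code; it assumes the name is already in MediaWiki's normalized title form, and the base URL is an assumption (English Wikipedia's local uploads live under `/wikipedia/en/`, shared media under `/wikipedia/commons/`).

```python
import hashlib
from urllib.parse import quote

# Assumed base URL; swap in .../wikipedia/en for English Wikipedia's local uploads.
BASE_URL = "https://upload.wikimedia.org/wikipedia/commons"


def hashed_path(filename):
    """Return the MD5-based relative path, e.g. 'a/ab/File_name.jpg'."""
    name = filename.replace(" ", "_")
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return f"{digest[0]}/{digest[:2]}/{name}"


def image_url(filename, base_url=BASE_URL):
    """Full download URL; quote() percent-encodes unsafe characters but keeps '/'."""
    return f"{base_url}/{quote(hashed_path(filename))}"


if __name__ == "__main__":
    print(image_url("Example.jpg"))
```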
