The Future of Search




If you've seen the movie "A.I.", you may remember the scene where Gigolo Joe spends all the money he has so that David can ask Dr. Know how to find the Blue Fairy.  For eleven “NewBucks”, David was allowed to ask just seven questions, and even lost a question by inadvertently inquiring about the process itself.  All because, as Joe says, “In this day and age, David, nothing costs more than information.”  This is certainly nothing like today's search, where the most a question can cost you is some wasted time and a bit of spam.  But is it the future?

The major search engines certainly hope it is.  After all, they’re in business to make money, and so of course every one of them dreams of the day when they can charge not just for ad placement, but for the search itself.  The minute they try it though, we’ll all immediately drop them for one of the many free alternatives.  So we’re safe, and they have no hope of ever achieving that dream.  

Or do they?  In 2006, five search engines were used for 96.6% of all web searches.  That doesn’t seem like a wide open field, does it?  Two years later, Microsoft Live Search has joined the pack, and there are now six leading engines - with a combined market share of 99.85%.  For a new search engine to survive, it has to be able to attract advertisers - would you advertise your business at a place that has to share the remaining 0.15%?

The leading search engines all have one thing in common - each has spent huge sums building vast server farms that repeatedly scan, copy, and index virtually every web page on the internet.  Today, we all accept the notion that any useful search requires one of these massive, centralized data warehouses.  In doing so, we also accept the massive, super-rich corporations that own them.  As the web grows, so do they, and in time, it will simply become too expensive for anyone else to even get started.  Once that happens, will search continue to be as free and easy as today?  Is that how business works?

Like it or not, we seem destined for a future in which “nothing costs more than information.”  It just might be possible, though, for us to reclaim ownership of search.  The web doesn’t just grow, it evolves - and as it does so, it creates opportunities to try new approaches to old problems.  To explore how one new approach might work, let's start by focusing on a very common type of search - online shopping.  Most of us use one of two typical patterns.  The first begins with a keyword search at a general-purpose search engine.

Let's say there's one called "Giggle" where we'll begin our search for a new pair of shoes.  Pointing our browser to Giggle.Com, we type in "shoes" and hit Enter.  Anyone who's done this knows what comes next.  A word like "shoes" is a bit vague, so we'll have to ignore listings for brake shoes, tap shoes, and "Shoes The Musical" as we scan for those that seem promising.  Each time we find one, we'll follow the link to the web site itself, where we’ll search again, this time for the precise brand and size of shoes we need.  Eventually, we will have followed enough links to enough sites to convince ourselves that there’s nothing more to see, and so we’ll settle on the best of the current lot.

Of course, we don't have to be that vague in choosing a search term.  However, being specific can also have a downside. A web site may have exactly what we want but not mention it in such a way that it is ranked high enough in Giggle's results for us to encounter it.  Since being too specific can mean seeing too little, most of us will stick with being a bit vague and accept seeing too much instead.  And so we go, pursuing a long list of maybes to the sites themselves, repeating the same search again and again.

A second pattern goes straight to the sources.  We know we're shopping, we know what we're shopping for, so why not just go directly to the best online stores for shoes?  Okay then, just which web sites would they be?  Amazon is a great place to shop online, but the reality is that most of us go there because it was a very well chosen name, it got an early start, and we all have it memorized.  Except through personal experience - mostly gained through long hours of the first pattern - few of us can name more than a handful of good places to shop online for much of anything.  There may well be a hundred such web sites, but we'll be lucky to find just a few of them.

And what's truly tragic about that fact is that increasingly, these vendor web sites are the best places to run our search.  With every passing day, more and more of them are installing their own search engines.  Unlike Giggle, these engines have intimate knowledge of their own site’s inventory, updated with each incoming shipment and outgoing sale, with more detailed and reliable information than Giggle could ever hope to have.  But we can't ask them, because we don't know which sites to ask.

This phenomenon is actually well known in some circles, and has been given such names as the "Deep Web" and the "Invisible Internet."  It's estimated that as much as 90% of what's available on the web is not indexed - and so will not be found - by search engines like Giggle.  That's why the first method is always a two-step process.  Giggle can tell us that a web site once had something we might like, but we have to search again at the site itself to find out if they still have it, and if it is exactly what we want, at terms we can accept.  Because their pages mentioned "shoes" more than others, Giggle knows that the web site existed and may have been relevant to our search - but that's all Giggle knows, and all they will ever know.

As a result, we'll never get rid of the second step - searching the web site itself.  The question is, can we get rid of the first step?  Can we find a way to search the web sites themselves more directly, or at least a way to use a middleman that won't one day become a gatekeeper, and then a toll booth?  And ideally, in a way that lets us search multiple sites with a single command?

There have been various attempts over the years to make just such a search possible, with little or no success.  For a more direct search to be effective, some things that don't exist now - at least in a usable form - would have to be created.  Before we get into that though, let's imagine they do exist, and see how they would be used to help us find those shoes.

Our search would begin as usual, by opening our favorite browser or search program (say, for example, MetaFind).  Whichever we choose, let's think of it in general terms, as our Search Agent.  It's a little smarter than today's browser when it comes to searching for things, in one important way.  Just as the current generation of browsers can be told which search engine to use for our searches, this one can be told which Directory to use to start our new type of search.

You've probably heard the term "Directory" used before in relation to web sites.  This idea is nothing new.  A Directory is just a categorized online listing of web sites, usually sites that someone, somewhere, thinks are the best places to go for certain types of items.  These new Directories are a little different though.   Given a keyword or phrase, they will determine the appropriate category, and return a list of relevant, searchable web sites.  That leads to all sorts of questions as to how they get and maintain that information - for the moment, we’ll just assume that they do.

The first step our Search Agent takes is to submit the search term to our favorite Directory and retrieve a list of web sites to be searched.  The next step is to submit our search to each of these web sites, and return the results directly from the sites themselves.  The first step is taken silently, in the background - all we see is the second step, in which the results from each web site in the list are returned.  No outdated or irrelevant listings, no spam or scams, no link farms or phishermen - just relevant responses from a well-maintained list of just the right places to ask.
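To make the two steps concrete, here is a minimal sketch in Python of what a Search Agent might do.  Everything in it is an assumption made for illustration: the Directory is a small in-memory table, the site names are invented, and the per-site search is a placeholder that a real agent would replace with a live request to each site.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical Directory: each category lists sites known to be searchable.
DIRECTORY: Dict[str, List[str]] = {
    "shoes": ["shoes-r-us.example", "footwear-depot.example"],
}

# Hypothetical keyword table the Directory uses to pick a category.
KEYWORD_CATEGORIES: Dict[str, str] = {
    "shoes": "shoes",
    "sneakers": "shoes",
    "boots": "shoes",
}


def lookup_sites(term: str) -> List[str]:
    """Step 1 (silent): ask the Directory which sites to search."""
    for word in term.lower().split():
        category = KEYWORD_CATEGORIES.get(word)
        if category:
            return DIRECTORY.get(category, [])
    return []


def fake_site_search(site: str, term: str) -> List[str]:
    """Placeholder for a live query against one site's own search engine."""
    return [f"{site}: result for '{term}'"]


def agent_search(
    term: str,
    search_site: Callable[[str, str], List[str]] = fake_site_search,
) -> Dict[str, List[str]]:
    """Step 2 (visible): send the same search to every listed site."""
    sites = lookup_sites(term)
    if not sites:
        return {}
    # Query the sites in parallel so one slow site does not hold up the rest.
    with ThreadPoolExecutor(max_workers=len(sites)) as pool:
        results = pool.map(lambda site: search_site(site, term), sites)
        return dict(zip(sites, results))
```

Calling agent_search("red sneakers") returns one list of results per site in the "shoes" category, while a term the Directory cannot categorize returns nothing at all.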

No doubt you've already spotted some fatal flaws in this utopian fantasy.  Let's explore them:

How do these Directories get created, and where do they get their information?  Who maintains them, and what is their motivation?

The technology itself is not the issue.  All that’s needed is a much simpler form of the same web-crawl that powers today’s search engines, with a tiny fraction of its data storage needs.  The real question is, who would do this, and why?  Many of them would be the same people who currently maintain Directories, and they would have the same motivation as they do now.  Some top-shelf Directories might be subscription-based.  Others would require that you accept some degree of advertising along with the listings, perhaps even Sponsored Sites, similar to today’s Sponsored Listings.  Still others would offer complimentary services, such as in-depth reviews, advice columns and user forums - but of course, you would have to visit the sites for those, allowing them to support themselves through advertising.  And, of course, some would be former search engines.

Some Directories would, over time, be greatly enhanced through the collection of anonymous usage data and voluntary user ratings.  With your consent, your Search Agent would report which web sites in a category were most visited, and which were the final choices.  In a model similar to web sites such as Digg and Web of Trust, users would also rate sites for their usefulness and reliability.  In the long run, this community-driven approach would focus searches toward sites chosen by actual, successful search experience, rather than by Giggle's ever-shifting algorithms.
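As a rough illustration of how a Directory might turn that usage data into a ranking, consider the sketch below.  The scoring formula and its weight are invented for this example and are not taken from any existing Directory: it simply rewards sites that were often the final choice and adds the average user rating.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SiteStats:
    visits: int          # anonymous count of visits from searches
    final_choices: int   # how often the site was the searcher's final choice
    ratings: List[int]   # voluntary user ratings, assumed to run from 1 to 5


def score(stats: SiteStats, choice_weight: float = 3.0) -> float:
    """Combine 'final choice' rate and average rating (illustrative formula)."""
    choice_rate = stats.final_choices / stats.visits if stats.visits else 0.0
    avg_rating = sum(stats.ratings) / len(stats.ratings) if stats.ratings else 0.0
    return choice_weight * choice_rate + avg_rating / 5


def rank_sites(stats_by_site: Dict[str, SiteStats]) -> List[str]:
    """Return site names ordered from highest to lowest score."""
    return sorted(stats_by_site, key=lambda s: score(stats_by_site[s]), reverse=True)
```

A site with fewer visits can still outrank a busier one if searchers end up choosing it more often, which is the point of ranking by successful search experience rather than by raw traffic.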

If you search Giggle for “buy shoes”, they will boast of having found over 13 million relevant results.  How useful is that, really?  Just how much time did you plan to spend looking?  Would you rather see that, or a list of a few - or even a few hundred - web sites that sell shoes, and have results returned by searching those sites directly?

How can a single program or browser know how to search sites it's never even heard of before, when every web site has its own rules for searching?

This is a difficult question, with only one answer - standards.  Consider MetaFind’s Group Web Search.  The 5000+ web sites it knows how to search took over two months to compile, and must be constantly re-tested.  In addition, some potentially useful sites have been excluded from its list because they cannot be searched directly through a submitted URL.  The current landscape of web site search is complex, even chaotic - just the way Giggle likes it.

On the other hand, nearly 80% of the web sites in MetaFind use URL-based searches that fall into one of only twenty or so distinct patterns.  What this means is that a Search Agent that understood these patterns would only need two pieces of information to search a web site it had never encountered before - the address of that site, and the pattern its search matches.  In addition, as increasing numbers of sites add their own searches, a small industry that specializes in adding search capabilities to existing web sites has begun to grow.  This reduces the number of individual patterns, and makes the emergence of a set of standards much more likely.  It also creates the possibility of "retrofitting" a pattern-compliant search capability onto an existing web site's search engine without impacting any of its current functionality.  
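The "two pieces of information" idea can be sketched in a few lines of Python.  The pattern names and URL templates below are made up for illustration and are not MetaFind's actual patterns; the point is only that a site address plus a pattern identifier is enough to build a search URL for a site the agent has never seen before.

```python
from urllib.parse import quote_plus

# Hypothetical catalogue of URL-based search patterns the agent understands.
SEARCH_PATTERNS = {
    "query-q": "https://{site}/search?q={terms}",
    "query-keyword": "https://{site}/search?keyword={terms}",
    "results-php": "https://{site}/results.php?search={terms}",
}


def build_search_url(site: str, pattern_id: str, terms: str) -> str:
    """Build a search URL from a site address and the pattern it follows."""
    template = SEARCH_PATTERNS[pattern_id]
    # quote_plus escapes the search terms for use in a query string.
    return template.format(site=site, terms=quote_plus(terms))
```

For example, build_search_url("shoes-r-us.example", "query-q", "red shoes") yields https://shoes-r-us.example/search?q=red+shoes.  Supporting a newly listed site then means adding one line of data, not new code.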

Giggle gives me 20 results from a dozen web sites almost instantly.  Waiting for search pages from those same twelve sites to load takes forever.  I want my results now!

It’s true that the initial page of results arrives quickly.  As we have learned, though, that doesn’t mean that the search itself is finished.  The odds are you’re going to wait for those pages to load in any case, and starting with Giggle simply adds the time it takes to sift through its results and follow the links to the total search time.

However, even this delay can be greatly reduced.  Web sites currently optimize their design to capture your attention and provide functionality.  This content often isn’t necessary when someone comes to the site for a specific purpose, and could easily be omitted when a user arrives by way of a search request.  In principle, there’s no reason results from the sites themselves couldn’t arrive just as quickly as those from Giggle.  In fact, it can get even better than that, which brings us to the last question:

How do I get and view the search results?  Do I always have to go back and forth between multiple browser tabs, comparing the results from the different sites?

For now, yes.  Again, this is what you’re likely already doing, with the initial Giggle search and the time to sift through those results tacked on to the total.  Imagine, though, if there were a second set of standards, this time for search results.  Not only could these results be returned more quickly than entire web pages, they could be blended into a single page of results, and even filtered and customized by your Search Agent according to your preferences.  While there would be a variety of search result standards, each suited to a different purpose, every result would contain a standard “Result Type” attribute.  Your Search Agent would only have to peek inside any search result to know how to present it to you.
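Here is one way a Search Agent might use a “Result Type” attribute to blend results from several sites into a single list.  No such standard exists yet, so the field names ("result_type", "title", "price") and the two result types are assumptions chosen for this sketch.

```python
from typing import Callable, Dict, List, Optional


def render_product(result: dict) -> str:
    return f"{result['title']} - ${result['price']:.2f} ({result['site']})"


def render_article(result: dict) -> str:
    return f"{result['title']} ({result['site']})"


# The agent peeks at the result type to decide how to present each result.
RENDERERS: Dict[str, Callable[[dict], str]] = {
    "product": render_product,
    "article": render_article,
}


def blend(
    results_by_site: Dict[str, List[dict]],
    max_price: Optional[float] = None,
) -> List[str]:
    """Merge results from all sites into one list, applying a user preference."""
    lines = []
    for site, results in results_by_site.items():
        for result in results:
            renderer = RENDERERS.get(result.get("result_type"))
            if renderer is None:
                continue  # unknown result type: skip rather than guess
            too_expensive = (
                result["result_type"] == "product"
                and max_price is not None
                and result["price"] > max_price
            )
            if too_expensive:
                continue
            lines.append(renderer({**result, "site": site}))
    return lines
```

Because each result says what kind of thing it is, the filtering and presentation live entirely in the agent, and a site only has to return data.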

That same approach could be used for the web sites themselves - a simple, uniform way of asking “How do I search you?” and a simple, uniform way of answering that question.  Together, these standards create something known as “Discovery” - the ability, given only the address of a web site, to visit that site, learn how to search it, perform the search, and then correctly display the search results.  This in turn reduces the work required by Directories, who now need only know where to search, leaving the “how” to your Search Agent.  It also means that while Directories would be a valuable resource, you would also be free to find and add searchable web sites on your own, leaving it to your Search Agent to sort out the details.
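A sketch of what “Discovery” might look like from the Search Agent's side follows.  The descriptor format is hypothetical: we assume a site answers “How do I search you?” with a small JSON document containing a URL template and a result type.  A real standard would have to define where that document lives and what it contains.

```python
import json
from urllib.parse import quote_plus

# Hypothetical answer a site might give to "How do I search you?"
SAMPLE_DESCRIPTOR = (
    '{"search_url": "https://shoes-r-us.example/find?query={terms}",'
    ' "result_type": "product"}'
)

REQUIRED_FIELDS = {"search_url", "result_type"}


def discover(descriptor_text: str) -> dict:
    """Parse a site's search descriptor and check it has what the agent needs."""
    descriptor = json.loads(descriptor_text)
    missing = REQUIRED_FIELDS - descriptor.keys()
    if missing:
        raise ValueError(f"descriptor is missing fields: {sorted(missing)}")
    return descriptor


def search_url_for(descriptor: dict, terms: str) -> str:
    """Fill the discovered URL template with the user's search terms."""
    return descriptor["search_url"].replace("{terms}", quote_plus(terms))
```

With this in place, a Directory only needs to hand over a site's address.  The agent fetches the descriptor, builds the search URL, and already knows what type of results to expect.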

The last two answers raise yet another question:  Why would a web site spend the time and money to conform to such standards?  Consider the question from the web site’s point of view.  Most are very much at the mercy of Giggle et al.  They already spend large sums of money and many sleepless nights searching for ways to rise through the ranks of search results, and whether their business lives or dies often depends on how Giggle - not you or me, but only Giggle - sees them.  They only need to be convinced that another way is possible, and they will be very interested in anything that helps them escape what many consider an increasingly oppressive regime.  They have as much to gain as anyone from a fundamental change to the way search works - or doesn’t work - now.


Can we get there from here?

That’s the real question, isn’t it?  Not “Would it work?”, but “Will it work?”  There are multiple Chicken-and-Egg problems facing us.  How many web sites will change the way their searches work for just one program like MetaFind?  How many other developers will take the time to write this kind of program with no assurance they can reach a worthwhile number of web sites?  Why create a searchable Directory of web sites if there are no programs ready to consult it, and no users to help fill the popularity database?  Why use a Directory or a Search Agent if it doesn’t have access to a large enough number of searchable web sites?

Like many new ideas, this one may have to grow slowly, one piece at a time.  MetaFind is one such piece, and will hopefully find a niche and survive on its own.  Its database of web sites is not encrypted and uses an easily understood format.  If you have a Directory, or are a developer who wants to get into this market, you are free to make use of that database for your own project.  If you’re a wise developer or Directory owner, you’ll adopt that same philosophy, and we can all grow together.  As we grow, more web sites will be encouraged to participate, which will create greater incentive to develop standards-compliant search engines and Search Agents, and so it goes - each small change encouraging and supporting still more small changes.

In fact, this type of search already exists, if only on a small scale.  The Hybrid Search in MetaFind is one example.  It uses the results of a Blended MetaSearch as a Directory, to guide its choice of web sites for a Group Web Search.  Of course, the end result cannot compare to what would be possible if true community-based Directories were used to choose from all searchable web sites on the internet.  And yet, despite working with limited resources, it somehow manages to be surprisingly useful most of the time.  Just imagine, then, what could be accomplished without those limitations. 

Nobody can predict what shape searching the web will ultimately take.  We don’t claim to know the ideal solution, or that MetaFind is The One Search Tool that will solve all problems.  While we certainly believe it’s a fast, easy, and effective way to search the web, it’s also a collection of new ideas, and of new ways to look at search.  You - the searchers - will decide what works and what doesn’t, and in doing so, will determine for yourselves what the future of search will be.

In the meantime, we’re not planning to just sit around and see what happens.  In addition to being willing to support the efforts of other developers, we would be happy to talk to - and, if possible, assist - those who are

•  interested in creating searchable Directories, or in converting existing ones into searchable resources.
•  adding search capabilities to web sites, and interested in discussing the evolution of standards for discovery, search and results.
•  tapping into the power of the community, and interested in harnessing that power to drive searchable Directories.


We can be reached at questions@metafindsoftware.com, and our forums are in development and should be up and running in the near future.  We look forward to hearing from you.