World-wise web? Finally on the horizon are computers that can reason

Financial Times
03-Mar-2008
By Richard Waters in San Francisco

Bill Gates displays a paranoid tendency common among technology industry billionaires. "In this business, by the time you realise you're in trouble, it's too late to save yourself," he once said. "Unless you're running scared all the time, you're gone."

Those words came in an interview with Playboy magazine in 1994 - 10 years before Sergey Brin and Larry Page, two new rock stars of the tech world, sat down for their own heart-to-heart with the same magazine.

Tech fashions - and fortunes - shift with great speed. The Microsoft Mr Gates founded might not yet be on the scrapheap of history but, as its unsolicited takeover offer for Yahoo makes clear, even seemingly dominant companies find it hard to keep pace in the latest and most promising tech markets.

A decade ago, who could have imagined that the feared monopolist of the software business would be so roundly beaten in online search and advertising by Google that it would have to mount a hostile bid for another distant also-ran to try to catch up? A decade from now, as the editors at Playboy stroke the egos of some new Silicon Valley hotshot, will the Google founders' playful interview (to which Mr Brin, hot off the company volleyball court, went shoeless) be just a quaint memory?

Predicting where the next big disruptive change in the technology industry will come from is a perilous business. Google's rise has been as much a result of its business model innovation as its technological supremacy. By using advertising to support its internet services, it may eventually be able to pull the rug from under Microsoft in more traditional software markets.

It seems a fair bet, though, that some of the biggest fortunes will continue to be made in Google's area of focus: finding and manipulating information gathered from the world wide web. To hear the optimists in Silicon Valley describe it, a new wave of technology is on the way that will leave Google's early advances in its wake.

Imagine, for instance, being able to ask a computer, "Where should I go on holiday?" and receiving an answer that is as suitable as anything you could have come up with yourself. That level of computer-generated reasoning is on the horizon, says Nova Spivack, one of the entrepreneurs involved. It may still take 15 years or more to be fully realised, but between now and then lies a series of breakthroughs that will revolutionise the way we draw information from the web, he adds.

This technology draws its inspiration, and some of its techniques, from a field that has provided more than its fair share of disappointments over the years: artificial intelligence (AI). Based on a collection of technologies that includes natural language processing, image recognition and expert systems (programs that try to emulate the skills of experts), AI is a 50-year-old dream that was meant to lead to intelligent machines.

"I had some hope you could just put everything into some big neural network that would just start to think - but it doesn't take long working in AI to realise it's much more complex than that," says Danny Hillis, founder of Thinking Machines, a company whose rise and fall in the 1980s came to symbolise both the unbounded optimism and the failed hopes of the AI movement.

"I've shifted over time from trying to make machines smarter to trying to get machines to make people smarter," Mr Hillis says now. That more modest goal lies at the heart of the latest movement, with its pragmatic emphasis on melding approaches from AI with new core technologies that are changing the web.

As Google shows, being able to return a string of websites in response to a query can give rise to a multi-billion dollar business. With so much at stake, even small incremental improvements on the road to AI may create big business opportunities. "It isn't about being perfect," says Barney Pell, chief executive of Powerset, an ambitious new search company. "It's about being able to differentiate enough to make a commercial product. People are realising that the goals of AI may be way out, but in the field of AI the time is here for really exciting applications."

"There are vast areas of human activity that are slowly being chipped away at," agrees Mike Lynch, who heads Autonomy, another search technology company. "Even automating a tiny part of the problem can have a high economic impact."

The movement already has a name: web 3.0. Venture capital is drifting in, even though no one seems too sure exactly how to define the field and there are still sharp disagreements among the experts about the effectiveness of some of the technologies. "When we started, it was largely a science project," says Mr Spivack, who has raised $20m (£10m, €15m), a sign of the sudden interest of the financiers. Referring to recent developments in online social networking, he adds: "These are not little Facebook applications - these are significant technology investments."

The basic building block for this new technology movement is something known as the "semantic web". This has become one of the most controversial, and misused, terms in the internet industry, conjuring up as it does a vague promise that meaning will somehow become part of the medium.

Yet to suggest that computers will be able to determine meaning raises a thorny question: whether meaning itself has an independent existence or is something that arises only in the mind of the person perceiving it. Terms such as "meaning" and "understanding" are so closely linked to human intelligence that it is hard to conceive of their corollaries in a computer-mediated world.

In reality, the semantic web is based on a defined and narrow - even if still highly ambitious - set of goals. It is the brainchild of Sir Tim Berners-Lee, who invented the present web, a collection of documents connected by links using hypertext mark-up language. Tracing those links, companies such as Google are able to identify documents that are likely to be most relevant to a particular search - though they can only point to the document, not dig deeper to find the actual information that is being sought.

To overcome this, Sir Tim imagined a new web formed by linking the data contained inside the documents. That way the data, not just the documents, would become accessible to machines. Riding this network of links, computers would be able to follow related ideas from one website to another and draw together related information. A reference to Sir Tim in Wikipedia, the collaborative online encyclopaedia, could for instance be connected directly to his name in this article on FT.com and to his personal social network on Facebook.

"If you put data on the web about yourself in this form, I can pull data about you," he says. Subject to privacy and other restrictions, the web itself would in effect become one vast social network, tracing links between people, or between people and things, that were previously invisible.

This semantic web is the product of a set of core standards promoted by the World Wide Web Consortium, the organisation that Sir Tim leads. "It's happening - it has just taken a long time to build," he says. "HTML is a really simple language. All this data stuff is more complicated. It just takes more design work."

Now, nearly seven years after he outlined the idea, some supporters say enough pieces are in place to make the first semantic web services a reality. "A bunch of people have started making applications that share data across the web," says Thinking Machines' Mr Hillis. Linking information in this way is a first step. The next will be to write software that can find and manipulate the data, opening the way to that automated advice on holiday destinations.

Standing in the way of this grand vision, however, are some very big obstacles. This is not just a matter of technology: at a deeper level, it touches on philosophical questions about the nature of language and meaning.

At the heart of the problem is the need to make information on the web "understandable" to machines, so that it can be extracted, processed and made useful. To make this possible, machine-readable "tags" need to be attached to each piece of data to describe what type of information it represents - a person's name, for instance, or a day of the week. A computer that reads the tag knows to treat the first item as a name and can then match it against the same name found in other sources.
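
The tagging idea can be sketched in a few lines of Python. This is only an illustration, not any real semantic web standard: the TaggedItem class, the tag names ("person_name", "weekday") and the sources are all hypothetical. It shows how a program that knows an item is a name could match it against the same name found elsewhere.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedItem:
    tag: str     # hypothetical label for the kind of data, e.g. "person_name"
    value: str   # the piece of data itself
    source: str  # hypothetical name of the place it was found


def match_by_tag(items, tag):
    """Group the sources of all items carrying `tag`, keyed by normalised value."""
    groups = {}
    for item in items:
        if item.tag != tag:
            continue  # a weekday is never compared with a name
        key = item.value.strip().lower()
        groups.setdefault(key, []).append(item.source)
    return groups


# Invented sample data for illustration only.
items = [
    TaggedItem("person_name", "Tim Berners-Lee", "encyclopaedia"),
    TaggedItem("person_name", "tim berners-lee", "news_article"),
    TaggedItem("weekday", "Monday", "news_article"),
]

matches = match_by_tag(items, "person_name")
```

Here the two name items end up in one group because their tags agree and their values match once case is ignored; the weekday is left out because its tag says it is a different kind of information.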

Attaching these tags to every piece of information on the web is in itself a huge task. "Tagging is a complete non-starter: no one has the time to do it," says Mr Lynch of Autonomy. At Powerset, Mr Pell calls this a "chicken and egg problem". Without new semantic services capable of using it, there is no incentive to undertake the laborious work of tagging data, but creating the services is pointless unless the data exist in the first place. To overcome this, computers are being enlisted to "read" text and apply tags automatically.

Yet the process of tagging, or categorising, the world's information may be beyond the capabilities of even the human brain. "Information is relative; it's not objective," says Mr Lynch. "The possibility that the person tagging and the person reading it mean the same thing is very small." Context and subjective judgment play too big a role in how language is used, he adds.

To try to overcome the problem, the semantic web depends on a set of "ontologies", or dictionaries that help to create common definitions that can be universally applied. These may oversimplify the great complexities of meaning, but they are designed to establish a basic common level of understanding about language to allow machines to do their work. The word "city", for instance, conjures up different ideas in the heads of city planners, local politicians or sewerage experts, says Mr Hillis. But for most purposes, a lowest common denominator definition will do: for a city, they "all agree more or less on what it is".

To create those common ways of looking at the world, however, means crossing some deep political, philosophical and cultural divides. In areas such as religion, for instance, the meaning of words is closely tied to a broader world view. "Who's going to set all the rules?" asks Robert Cailliau, one of the developers of the world wide web. "You can say two plus two equals four. But there are things like the Bible and the Koran that also set out the rules about how you should see the world."

Some of the early web 3.0 companies are setting out to stamp their mark on this process, sensing the chance to put themselves at the centre of a new global information network by defining the standards that bring meaning to the cacophony.

"We're trying to create a useful point of view," says Mr Hillis, whose latest company is seeking to build what it calls an "open, shared database of the world's knowledge". Investors including Goldman Sachs have put more than $50m into the company. Known as Freebase, it has a database designed to operate similarly to Wikipedia. It tries to outline standard definitions that are then made available for anyone to access and link their own data to over the web.

A reference to London in a web document, for instance, might be linked back to the Freebase definition of London: this could then be connected to any other instances of the word London on the web that are connected to the Freebase definition. Freebase hopes that outlining this lowest common denominator of meaning to help link data could make it part of the web 3.0 foundations.
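
A rough sketch of that linking step follows. It does not use the real Freebase service or its identifiers: the "shared:london" key, the DEFINITIONS table and the document names are invented for illustration. The point is only that mentions tied to the same shared definition can be found from one another, while an untagged mention of the same word cannot.

```python
from collections import defaultdict

# Hypothetical shared definitions; the identifier scheme is made up.
DEFINITIONS = {
    "shared:london": "London, capital city of the United Kingdom",
}


def build_index(mentions):
    """Map each known definition id to the documents that link to it."""
    index = defaultdict(list)
    for document, word, definition_id in mentions:
        if definition_id in DEFINITIONS:  # skip untagged or unknown links
            index[definition_id].append(document)
    return dict(index)


# Invented mentions: (document, word as written, linked definition or None).
mentions = [
    ("travel_guide", "London", "shared:london"),
    ("news_story", "London", "shared:london"),
    ("novel", "London", None),  # same word, but never linked to a definition
]

index = build_index(mentions)
related = index["shared:london"]
```

In this toy version the travel guide and the news story become connected through the shared definition, and the third document stays invisible until someone, or some software, attaches the link.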

Meanwhile, technologies first developed for use in AI are being brought to bear. Chief among these is natural language processing, or teaching software to discern the meaning in a piece of text. Views about this technology differ sharply. Mr Lynch, for instance, declares it a "dead duck: the world is just too complex". The fundamental ambiguity of language, and its dependence on context for meaning, make it impossible to automate the process of extracting meaning from text, he says.

Even simple words or concepts can mean very different things to different people and their meaning changes depending on the circumstances in which they are used, says Mr Lynch. While the human mind can make the necessary adjustments, computers that follow strict rules about language find it hard to grasp the many context-specific meanings.

Although the companies trying to employ natural language processing admit it is far from perfect, they maintain that technical advances in recent years have at least given it a level of practical application. By using software to "read" text, services such as Powerset and Mr Spivack's Twine aim to add tags to data automatically. The natural language approach also raises the possibility of new applications, for example being able directly to answer questions posed by a user - which has long been a dream in web search.

Powerset has become the most visible champion of this approach. The plunging cost of computing and the wealth of data available on the web have combined to breathe new life into this technology, according to Mr Pell. "One of the big problems was just a lack of computing resources," he says of earlier attempts. Also, refining a natural language search engine requires "a tremendous amount of 'tuning'; you need data to improve these systems". Thanks to the explosion of information on the web, data are not in short supply.

Powerset is using technology licensed from Parc - the famed Silicon Valley research laboratories formerly owned by Xerox - to try to solve the problems of natural language processing. The software is based on similar ideas to those in quantum physics, says Mr Pell. A number of potential meanings for all the elements in the text are allowed to co-exist as equally accurate during the "reading", until the most likely answer is singled out at the end.

Even supporters of this type of natural language analysis limit their claims for the technology, though they say it does not need to be perfect to be useful. According to Mr Spivack, an accuracy level of 70 per cent in analysing and tagging text has its uses.

Combining this approach with other techniques of data analysis can lift the accuracy level further. One method relies on statistics - predicting the meaning of a word based on the probabilities of its proximity to other words in the text. "It treats language as a mathematical problem," says Mr Lynch, whose company uses this method in preference to natural language. As words do not appear in random sequences, the fact that one word has been used in a sentence increases the chance that a particular other word will also turn up. "Meaning depends on your viewpoint - it's not absolute," he says.
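
The statistical idea can be illustrated with a toy example. This is not Autonomy's method, which the article does not describe in detail; it is a simple word-count score with add-one smoothing (in the style of a naive Bayes classifier), and the example sentences and the sense labels "finance" and "river" are invented. It guesses which meaning of "bank" is intended from the words that appear around it.

```python
import math
from collections import Counter, defaultdict

# Invented training sentences, each labelled with the intended sense of "bank".
EXAMPLES = [
    ("finance", "the bank raised interest rates on loans"),
    ("finance", "she opened an account at the bank"),
    ("river", "they fished from the bank of the river"),
    ("river", "the river burst its bank after heavy rain"),
]


def train(examples):
    """Count how often each word appears alongside each sense."""
    counts = defaultdict(Counter)
    for sense, sentence in examples:
        counts[sense].update(sentence.lower().split())
    return counts


def guess_sense(counts, sentence):
    """Return the sense whose word counts make the sentence most probable."""
    vocabulary = {word for counter in counts.values() for word in counter}
    best_sense, best_score = None, -math.inf
    for sense, counter in counts.items():
        total = sum(counter.values())
        # Add-one smoothing so an unseen word does not zero out the score.
        score = sum(
            math.log((counter[word] + 1) / (total + len(vocabulary)))
            for word in sentence.lower().split()
        )
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


model = train(EXAMPLES)
money_sense = guess_sense(model, "the bank approved the loans")
water_sense = guess_sense(model, "heavy rain flooded the river bank")
```

No rule about grammar or definitions is involved: the program only relies on the fact that "loans" tends to turn up near one kind of bank and "river" near the other. A real system would need far more data than four sentences.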

While none of the semantic techniques has been perfected, some are reaching a level of sophistication that could lead to practical applications, at least in the eyes of the investors who are backing the start-ups. "That's the difference now - people are building artefacts that are actually useful," says Mr Hillis.

So what will these artefacts produce? Most expect the impact of the technology to be felt in stages. The early advances are likely to be "incremental improvements, and at first they won't be that noticeable", says Mr Spivack. For instance, a wide range of web services should start to become "smarter": search engines should return higher quality results, and services that rely on personalisation should make better guesses about your preferences, while targeted advertising systems should become more accurate.

The existing big names on the web, including Google, should benefit from these improvements - though entrepreneurs who are pushing the boundaries of semantic web technology, like Mr Spivack, hope that they can come up with advances that are distinctive enough to set them apart from older sites that have not mastered the approaches.

Connecting related data across the web may also usher in new types of service. A common example used by the web 3.0 visionaries again involves planning a holiday: a semantic web browser would be able to find and draw together travel schedules, hotel details, weather forecasts and other information needed to plan a trip.

Further in the future, adding a degree of reasoning to the software may enable it to filter and select information. That may start off simply - acting on your behalf, for instance, a software agent sets out across the web to compare prices for a product and identify the lowest. Eventually it may lead to making decisions on your behalf. As Eric Schmidt, Google's chief executive, told the FT last year: "The goal is to enable Google users to be able to ask the question, such as 'What shall I do tomorrow?' and 'Which job shall I take?'"

This fuller version of artificial intelligence is still over the horizon but the path towards it is "a continuum", says Mr Hillis. Contrary to the early dreams of AI, he adds, it will not be intelligent machines that provide many of the advances but dumb machines throwing up apparently smart answers by using tricks that the human brain cannot match.

The current kings of Silicon Valley certainly have no intention of being left behind. As Mr Brin said in that 2004 Playboy interview: "It's credible to imagine a leap as great as that from hunting through library stacks to a Google session, when we leap from today's search engines to having the entirety of the world's information as just one of our thoughts." But in the race to get to that point, Google is assured of many rivals.

FT.com
Copyright The Financial Times Ltd. All rights reserved.