September 01, 2000

Article in Newton Magazine

Mining the Motherlode

Lakes of data are fast accumulating in the world's computer banks – and it can all be very useful

By Wilson da Silva

HEARD OF data mining? It’s all the rage. Thanks to the burgeoning banks of seemingly unrelated information now accumulating on computer servers, along with some very savvy technology such as artificial intelligence, software agents and neural nets, all sorts of new discoveries are being made. It’s even happening at the supermarket.

Take Wal-Mart, the large national supermarket chain in the United States. After data mining tools were unleashed on months of sales across all of its stores, two seemingly unrelated products were flagged by the computers as being highly linked: nappies and beer. Company executives scratched their heads in bewilderment. 

But after a little detective work, they found the computers were right: young husbands on their way home from work were being asked by their wives to pick up nappies. While in the store, many were also stocking up on beer for the weekend, or to see them through the traditional Friday Night Football on TV. Suddenly, through the haze of raw sales data, a marketing opportunity had emerged. So store managers moved nappies to the front of their aisles and placed a freezer full of expensive imported beers right next to them. Sales of imported beers went through the roof.

Much the same is likely to be happening in Australia. But try getting anyone to talk about it. “This stuff is very strategic, no-one wants to talk about it,” says Steve Hitchman, managing director of Management Information Principles, a leading data mining vendor. Manufacturers such as the Broken Hill Proprietary Company are known to be using data mining to improve efficiency in their production processes, and at least one major bank and one regional bank have been using the techniques extensively to establish which clients are potential credit risks, based on their spending and saving patterns. At least 40 companies in Australia are known to be running pilot programs with the technology.

“With data mining tools, you don't really know what you're looking for,” says Bob Hayward, vice-president of leading technology consultants the Gartner Group. “You point these [software] tools at your data warehouse, and it discovers patterns, trends, relationships that you weren't aware of and wouldn't know to look for.”

Data mining is one of the hottest areas of information technology today, and the technique is being applied in all manner of situations, from manufacturing to mortgage loans. Some applications involve very deep databases that the programs can drill into; others merge many smaller data sets that the computers can cross-check and analyse for clues.

Most applications involve drilling deep into the mounds of data kept by retail and service outlets. The first stop along this road is warehousing: collecting the data, ‘scrubbing’ it – establishing a common format so different data sets can work together – and storing it in one place. Then you build up a numerical history: sales of different products over months or years, and sales from different distribution points. Finally, you merge this processed information with related data from external sources: census statistics, economic data, weather – anything that might have an impact on sales.
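To make the scrub-then-merge sequence concrete, here is a minimal Python sketch. The two store feeds, their formats and the monthly rainfall figure are all invented for illustration, and sales are pooled across stores for brevity. Each raw record is scrubbed into a common (month, product, units) form, totalled into a monthly history, then joined with an external data source keyed on month.

```python
from collections import defaultdict
from datetime import datetime

# Two hypothetical store feeds recording the same kind of facts in different formats
store_a = [("2000-06-02", "NAPPIES", "12"), ("2000-06-16", "BEER", "30")]
store_b = [("05/06/2000", "nappies ", 8), ("19/06/2000", "Beer", 25)]

def scrub(date_text, date_format, product, units):
    """Normalise one sale into a common (month, product, units) record."""
    month = datetime.strptime(date_text, date_format).strftime("%Y-%m")
    return month, product.strip().lower(), int(units)

records = [scrub(d, "%Y-%m-%d", p, u) for d, p, u in store_a]
records += [scrub(d, "%d/%m/%Y", p, u) for d, p, u in store_b]

# Build the history: total units per (month, product), pooled across stores
monthly = defaultdict(int)
for month, product, units in records:
    monthly[(month, product)] += units

# Merge with a hypothetical external source keyed on month (figure invented)
rainfall_mm = {"2000-06": 48.2}
merged = [
    {"month": m, "product": p, "units": u, "rainfall_mm": rainfall_mm.get(m)}
    for (m, p), u in sorted(monthly.items())
]
for row in merged:
    print(row)
```

A real warehouse would do this at far larger scale and keep each distribution point separate, but the order of operations (normalise, aggregate, then join outside data) is the one described above.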

Data miners look for affinity patterns – “something you can't detect by looking for it, something that usually surprises you,” as Hayward explains. But not everyone has the capacity to do it. “It's very data intensive,” he adds. “It's only within the last few years that it's become practical to use these sorts of tools, when you have reasonably inexpensive massively-parallel processors [high-speed computers working in tandem] and huge storage capabilities.”
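The article doesn't say how the software flagged nappies and beer as linked. One standard way to score such affinities in market-basket analysis is with support (how often a pair appears together), confidence (how often buying one item comes with the other) and lift (how much more often the pair occurs than chance alone would predict). The Python sketch below applies those measures to six invented shopping baskets; the function name, threshold and data are illustrative assumptions, not the retailer's actual method.

```python
from collections import Counter
from itertools import combinations

def pair_affinities(baskets, min_support=0.2):
    """Score product pairs that appear together in at least min_support of baskets."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # ignore duplicates within one basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    results = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        confidence = count / item_counts[a]  # share of baskets with a that also hold b
        lift = support / ((item_counts[a] / n) * (item_counts[b] / n))
        results.append((a, b, support, confidence, lift))
    # Highest lift first: pairs bought together far more often than chance predicts
    return sorted(results, key=lambda r: r[4], reverse=True)

baskets = [
    ["nappies", "beer", "chips"],
    ["nappies", "beer"],
    ["milk", "bread"],
    ["nappies", "beer", "milk"],
    ["bread", "chips"],
    ["milk", "bread", "chips"],
]
results = pair_affinities(baskets)
for a, b, support, confidence, lift in results:
    print(f"{a} + {b}: support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

In this toy data the beer and nappies pair comes out on top with a lift of 2, meaning the two turn up together twice as often as their individual popularity would suggest. That is the kind of surprise Hayward describes.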

The most promising use of data mining could be in research, where it can help scientists make discoveries. At Monash University in Melbourne, Dr Xindong Wu, a leading data mining researcher, is developing techniques that scour medical databases for links between symptoms, external factors such as weather and geography, and the onset of disease.

Using a database covering two million individuals at one hospital, Wu hopes to uncover clues that would help medical researchers recognise ailments more easily, or treat them better. “Many people have predicted that it will become the biggest application in artificial intelligence,” says Wu.