Wilson da Silva

Science journalist, feature writer and editor.

Aug 27, 1996
Published on: The Age

A tide of data is pouring into the world’s computer banks from sources as varied as sales inventories and credit card transactions. It is creating a wealth of personal information for eager marketeers and cautious bankers. But it can also save lives, as Wilson da Silva reports.

YOU APPLY for a mortgage at a bank. The loans officer asks the usual questions, you fill out a form with your personal details and sign a waiver allowing the bank to check your credit history. She enters the details into the computer.

“I’m sorry, sir,” she says minutes later, peering up from the screen and donning a pleasant smile. “I’m afraid we won’t be able to lend you the amount you require at this stage.”

Normally, it would be no big deal, not with other banks in town. Only this is the fourth bank you’ve been to. You’re on the same salary as your sister, who did get a loan. You keep up your credit card payments. What’s going on?

Your friend goes to hospital with a nasty gash on his leg. The wound is puffed up and getting bigger, and he’s running a temperature.

The nurse checks the wound and asks a series of questions, entering the information into a palmtop computer. Finally, she asks, “Where were you bushwalking?” He tells her. She looks up from her screen and orders an emergency team to the room. Your friend is pumped full of antibiotics and told he has a good chance of surviving necrotising fasciitis, the “flesh-eating bug”.

Both scenarios are possible or already real, thanks to the application of some very clever artificial intelligence techniques to the rich mother lodes of seemingly unrelated data now accumulating in corporate and government computer banks. These are the fruits, and the frustrations, of “data mining”, one of the hottest things in information technology today.

It has led to concerns about privacy that have prompted the Victorian Government to adopt a far-sighted plan to set up an advisory council to ensure that the personal information it holds remains private.

Multimedia Minister, Alan Stockdale, said last week that transactions such as payment of parking fines, obtaining a birth certificate or notifying an agency of an address change would be carried out electronically by 2000. The advisory council would recommend how to keep such personal information private.

But governments are far from the only organisations gathering vast amounts of primary data. Supermarkets monitor product sales and match them with other data sets that might have only a notional relationship with the primary data, such as time of purchase, quantities, weather and suburban demographics.

You “warehouse” the data in one place, scrub it, sort it and save it. Then, when you have a sizeable body of information, you let loose a neural network program that sorts through the reams of seemingly meaningless data and - bingo! The artificial intelligence agent discovers links you would never have dreamed of.

A data mining program at the Wal-Mart retail chain in the US found a relationship between two completely different products: nappies and beer. Sales of the two would shoot up on Friday nights. It didn’t seem to make sense, but the correlations were too high for the neural net to ignore.

The link was soon pinpointed: young husbands whose wives had called and asked them to pick up nappies on the way home were stocking up on beer for the weekend.

A new market niche had been identified. Wal-Mart moved the nappies to the front of its stores, and put high-margin imported beers next to them. Sales went through the roof, and profit margins on beer soared.
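The nappies-and-beer finding is an example of what the trade calls affinity, or market-basket, analysis. The article credits a neural network; the sketch below swaps in a much simpler and widely used measure, “lift”, which compares how often two products turn up in the same basket with how often they would if shoppers chose them independently. A lift above 1 means the pair sells together more than chance would predict. The baskets are invented for illustration and are not Wal-Mart data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical Friday-night baskets, invented for illustration only.
baskets = [
    {"nappies", "beer", "chips"},
    {"nappies", "beer"},
    {"nappies", "beer", "milk"},
    {"milk", "bread"},
    {"bread", "chips"},
    {"milk", "nappies"},
    {"beer", "chips"},
    {"bread", "milk"},
]

def pair_lift(baskets):
    """Return {(a, b): lift} for every pair of items bought together at least once."""
    n = len(baskets)
    item_counts = Counter(item for b in baskets for item in b)
    pair_counts = Counter(
        pair for b in baskets for pair in combinations(sorted(b), 2)
    )
    lifts = {}
    for (a, b), together in pair_counts.items():
        p_ab = together / n          # share of baskets containing both items
        p_a = item_counts[a] / n     # share containing item a
        p_b = item_counts[b] / n     # share containing item b
        lifts[(a, b)] = p_ab / (p_a * p_b)  # > 1: bought together more than chance
    return lifts

lifts = pair_lift(baskets)
for pair, lift in sorted(lifts.items(), key=lambda kv: -kv[1])[:3]:
    print(pair, round(lift, 2))
```

In this toy data, nappies and beer come out on top with a lift of 1.5. A real retailer would also require a minimum number of baskets before trusting a pair, since rare items can produce large but meaningless lifts.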

In another case, a British supermarket chain discovered through data mining that it sold a lot more of the smaller, more expensive ice-cream packs than it expected. Why? When the data was matched with existing demographics and household surveys, the reason became obvious: many customers lived on their own and had small fridges that could fit only the smaller packs. The company began promoting small packs to the market segment it had discovered.

“With data mining tools, you don’t really know what you’re looking for,” said Bob Hayward, vice-president of leading technology consultants, the Gartner Group. “It’s a discovery process using very advanced technical means. You point these tools at your data warehouse, and it discovers patterns, trends, relationships that you weren’t aware of and would not know to look for.”

Data mining is the hot topic in marketing. In the United States, companies have sprung up in the past two years that do nothing but establish data warehouses and then plumb their depths for nuggets of useful information.

Others are industry stalwarts, such as IBM and Hewlett-Packard, that have started playing big in the market.

The technique is being applied in many situations, from manufacturing processes to loan applications. Some involve deep, deep databases of information that the programs can drill into; others are short data sets from a particular process that neural nets can analyse for clues. Data mining has been used in nuclear power plants to understand why reactors sometimes ran at high yields and at other times at low yields.

At the large US printing firm R. R. Donnelley, of Tennessee, technicians used data mining to analyse the incidence of “banding” in the firm’s high-speed presses, a production snag that requires whole rolls to be thrown out and print jobs to be started all over again. After the neural nets compared clean production runs with failed runs over time, they discovered relationships the human printers couldn’t see. Within a short time, the incidence of banding fell by 90 per cent.

“There were about 40 things they were measuring in each run, including the thickness and roughness of the paper, the amount of solvent in the ink, the temperature and humidity and all sorts of things,” said Professor Ross Quinlan of the University of Sydney, a recognised authority in the rarefied world of data mining. “Once they realised the combination (that led to banding), they could predict when it was going to occur and develop rules to reduce it.”
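Quinlan is best known for decision-tree induction (his ID3 and C4.5 programs), a family of methods that turns labelled examples into explicit if-then rules of the kind he describes. The article attributes the Donnelley result to neural nets and does not say exactly which method was used, so the sketch below is only an illustration of the rule-finding idea: a one-level tree, or “decision stump”, that picks the single measurement and threshold that best separate clean runs from banded ones by information gain. The measurement names and values are hypothetical.

```python
import math

# Hypothetical press runs: two of the ~40 measurements, plus whether banding occurred.
runs = [
    {"ink_solvent_pct": 31, "humidity_pct": 70, "banded": True},
    {"ink_solvent_pct": 33, "humidity_pct": 75, "banded": True},
    {"ink_solvent_pct": 34, "humidity_pct": 60, "banded": True},
    {"ink_solvent_pct": 40, "humidity_pct": 72, "banded": False},
    {"ink_solvent_pct": 42, "humidity_pct": 65, "banded": False},
    {"ink_solvent_pct": 45, "humidity_pct": 58, "banded": False},
]
FEATURES = ["ink_solvent_pct", "humidity_pct"]

def entropy(labels):
    """Shannon entropy, in bits, of a list of booleans."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def best_split(runs, features):
    """Return (feature, threshold, gain) for the split 'value <= threshold' with the highest information gain."""
    labels = [r["banded"] for r in runs]
    base = entropy(labels)
    best = (None, None, 0.0)
    for f in features:
        values = sorted({r[f] for r in runs})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r["banded"] for r in runs if r[f] <= t]
            right = [r["banded"] for r in runs if r[f] > t]
            remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / len(runs)
            gain = base - remainder
            if gain > best[2]:
                best = (f, t, gain)
    return best

feature, threshold, gain = best_split(runs, FEATURES)
print(f"rule: banding likely if {feature} <= {threshold} (gain {gain:.2f} bits)")
```

A full decision-tree learner repeats this split recursively on each side, which is how combinations of several measurements, rather than a single threshold, can emerge as rules.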

Most applications so far involve drilling deep into the mounds of stored data kept by retail and service outlets. The first stop along this road is warehousing: collecting the data, scrubbing it, making sure it’s of high integrity, establishing a common format so different data sets can work together, and storing it in one place. Then you build a numerical backlog: sales of different products over months or years, and sales from different distribution points.

You then merge it with relevant or notionally related data from external sources: census statistics, economic data, weather - anything that might have some impact on sales.
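The warehousing steps described above (scrub the records, bring them into a common format, aggregate them into a sales history, then merge in external data) can be sketched in a few lines. Everything here is hypothetical: the two store feeds, their field names and date formats, and the weather table are invented to show the shape of the process, not any real retailer’s system.

```python
from collections import defaultdict
from datetime import datetime

# Two hypothetical feeds recording the same kind of sale in different formats.
store_a = [
    {"date": "1996-08-02", "sku": "BEER-6PK", "qty": "12"},
    {"date": "1996-08-02", "sku": "beer-6pk", "qty": "3"},
    {"date": "1996-08-03", "sku": "NAPPY-L", "qty": ""},  # missing quantity: scrubbed
]
store_b = [
    {"day": "02/08/1996", "product": "BEER-6PK ", "units": 5},
    {"day": "03/08/1996", "product": "NAPPY-L", "units": 7},
]
# Hypothetical external data (daily maximum temperature), keyed by ISO date.
max_temp_c = {"1996-08-02": 14, "1996-08-03": 11}

def scrub(records, date_field, date_fmt, sku_field, qty_field, store):
    """Convert one feed to a common format, dropping rows that fail validation."""
    clean = []
    for r in records:
        qty = str(r[qty_field]).strip()
        if not qty.isdigit():
            continue  # incomplete or corrupt row
        day = datetime.strptime(r[date_field], date_fmt).date().isoformat()
        clean.append({"store": store, "date": day,
                      "sku": r[sku_field].strip().upper(), "qty": int(qty)})
    return clean

warehouse = (scrub(store_a, "date", "%Y-%m-%d", "sku", "qty", "A")
             + scrub(store_b, "day", "%d/%m/%Y", "product", "units", "B"))

# Numerical history: total units per (date, sku) across all stores.
history = defaultdict(int)
for row in warehouse:
    history[(row["date"], row["sku"])] += row["qty"]

# Merge with the external data so later analysis can look for links.
merged = [{"date": d, "sku": s, "units": u, "max_temp_c": max_temp_c.get(d)}
          for (d, s), u in sorted(history.items())]
for row in merged:
    print(row)
```

Most of the work in practice lies in the scrubbing rules; the join with external data is only useful once dates and product codes mean the same thing in every feed.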

You look for affinity purchasing patterns, “something you can’t detect by looking for it, something that surprises you,” said Hayward. “You may identify new buyer segments. You can do cross-product marketing and you can improve your store layout.”

The ramifications can be disturbing. In the mortgage lending case above, you might be rejected for a loan by the neural net program, not because you were a bad risk, but because your profile matches those of borrowers who have, in the past, ended up being high-risk.

Federal privacy legislation prevents your personal information being made available to others, and provides for fines of up to $150,000 for violations. But there is nothing in common law to prevent companies from using customer information to build profiles of “bad customers”.

“Lenders accumulate vast quantities of personal data from the time credit is applied for,” Daniel Marks, a senior associate at lawyers Gadens Ridgeway, told a recent data mining seminar in Sydney. “During the course of the credit, more information is gathered about how customers use and repay credit.”

The use of such data to forecast potential “problem customers” and refuse them credit, or set restrictions on their activity, can probably be done without breach of existing privacy laws, he said.

“The rationale for using this information is that (it) is already held by the data user, or lender, and is merely being used to produce a trend . . . providing broad results that do not refer to individuals,” he said.

But the trends generated can then be applied by corporations to make judgments about individuals: to class people as potential defaulters, for example, even if they’ve never defaulted before.

You might be denied credit not because of who you are, but because of the kind of person you are judged likely to be. Whether you have a spotless credit record may or may not matter.
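To see how a loan could be refused on profile alone, consider a toy nearest-neighbour classifier, used here as one simple stand-in for the neural nets the article describes. It labels a new applicant by the outcomes of the most similar past borrowers; the applicant’s own repayment record is not one of its inputs. The borrower records and the chosen attributes are invented for illustration.

```python
import math

# Hypothetical past borrowers:
# (age, years at current address, number of credit cards) -> defaulted?
past_borrowers = [
    ((23, 0.5, 4), True),
    ((25, 1.0, 5), True),
    ((24, 0.8, 3), True),
    ((45, 10.0, 1), False),
    ((38, 6.0, 2), False),
    ((52, 15.0, 1), False),
]

def predict_default(applicant, history, k=3):
    """Majority vote of the k past borrowers closest to the applicant.

    Uses plain Euclidean distance on unscaled attributes, which is only
    acceptable for a toy example like this one.
    """
    nearest = sorted(history, key=lambda rec: math.dist(applicant, rec[0]))[:k]
    defaults = sum(1 for _, defaulted in nearest if defaulted)
    return defaults > k // 2

# An applicant who has never missed a payment, but resembles past defaulters.
applicant = (24, 0.7, 4)
print("flag as high risk:", predict_default(applicant, past_borrowers))
```

Nothing in the model asks whether this applicant has ever defaulted, which is exactly the concern Marks raises: the judgment comes from resemblance to others.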

While financial institutions rely on consent forms signed by customers at the time of credit application, the courts may well disagree with the extent of the use made of the data.

“The issue always arises as to whether the consent was truly an informed consent,” said Marks.

The federal Privacy Act of 1988 protects customers from a trade in private information, but only covers governments, telecommunication providers and financial institutions. In New South Wales, the Privacy and Data Protection Bill 1996 has been introduced into Parliament, seeking to extend this to the private sector as well as strengthen existing provisions.

Victorian Treasurer and Minister for Multimedia, Alan Stockdale, has recently indicated that Victoria will also introduce enhanced data protection laws.

In the finance industry, such tools are already being used extensively for things like fraud detection, credit card authorisation, and portfolio and investment analysis by brokerage companies trying to forecast what Wall Street will do tomorrow. Governments are also employing them, especially for welfare payments.

But most companies in Australia are largely unaware of data mining, and very few use it. “It’s very data intensive,” said Hayward. “It’s only within the last year or two that it’s become practical to use, when you have reasonably inexpensive massively parallel processors and huge storage capabilities.”

Australian companies using data mining tend to keep quiet about it. BHP is known to be using it for manufacturing analysis, and at least one major and one regional bank have been employing the techniques extensively.

“This stuff is very strategic. No one wants to talk about it,” said Steve Hitchman, director of Management Information Principles, a leading data mining merchant.

“In Australia it’s quite rare, and even overseas, say in America, it’s not enormously widespread,” said Professor Quinlan.

Yet, data mining is gaining currency in Australian business: last week, the Data Warehousing Institute of Australia was set up, the first time the US-based organisation has opened a chapter outside the country.

The group has its eye not just on collecting the data, but mining it. “The bottom line is that with better management of information, business decisions can now have a sounder base,” said Phillip Clewlow, president of the new institute.

At Monash University in Melbourne, Dr Xindong Wu, one of the country’s leading data mining researchers, sees the technology as a force for good. He is developing programs that scour medical databases for links between symptoms, external factors such as weather and geography, and the onset of disease.

Using a database of two million individuals at one hospital, Wu hopes to uncover clues that would help medical researchers more easily recognise ailments, or treat them better.

“Many people have predicted that it will become the biggest application in artificial intelligence,” said Wu. “Data mining is becoming more and more practical.”