
Inconsistent data within a column

I think that if you simply want to treat them as one in reports, you can add
something like a tree structure to your persons table.

The simplest way is to add a column to the persons table that contains
either the row's own key or the key of another person.
(If the id for 'IBM' is 123, then for any 'I.B.M.' row this column must
contain 123.)

Then you can group by this column.
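
A minimal sketch of the idea, using Python's built-in sqlite3 module only so that it runs anywhere. The column name canonical_id and the sample rows other than the 'IBM' = 123 example are assumptions made up for illustration, not something from the original table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE persons (
        id           INTEGER PRIMARY KEY,
        name         TEXT NOT NULL,
        -- hypothetical column: holds the row's own id, or the id of the
        -- row that should represent it in reports
        canonical_id INTEGER REFERENCES persons(id)
    )
""")

# Sample data (assumed): three spellings of IBM all point at id 123.
conn.executemany(
    "INSERT INTO persons (id, name, canonical_id) VALUES (?, ?, ?)",
    [
        (123, "IBM", 123),
        (124, "I.B.M.", 123),
        (125, "International Business Machines", 123),
        (126, "Borland", 126),
    ],
)

# Group by the canonical row, and show that row's name in the report.
report = conn.execute("""
    SELECT c.name, COUNT(*) AS entries
    FROM persons p
    JOIN persons c ON c.id = p.canonical_id
    GROUP BY c.id, c.name
    ORDER BY c.name
""").fetchall()

for name, entries in report:
    print(name, entries)
```

The same SELECT should work in most SQL databases; only the table setup is specific to SQLite here.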

Quote
Richard Crawford wrote in message <3648F88F.72883...@awod.com>...
>Is there any way I can make data consistent within a column, or
>standardize data easily? I've got about 15 different ways a company name
>appears in a column, such as 'IBM', 'International Business
>Machines', 'I.B.M.', etc., making queries and reporting very difficult,
>and since this data is collected via a form on the Internet there is no
>way to normalize it - people can enter data the way they want. Any easy
>solution?

>Thanks,

>Richard

 

Re:Inconsistent data within a column


The easiest way for the user is to prepare a lookup table with the
company names. The main table then stores only a reference to one entry
in the lookup table.
The user chooses the desired value from a combo box. If the desired
value does not exist yet, you have to provide a way to insert a new
value.
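
Roughly, the lookup-table approach could look like the sketch below. It uses Python's sqlite3 module just to stay self-contained; the table names companies and submissions, the helper functions, and the sample values are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- lookup table: one row per company, each name stored once
    CREATE TABLE companies (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    -- main table: stores only a reference to the lookup entry
    CREATE TABLE submissions (
        id         INTEGER PRIMARY KEY,
        contact    TEXT NOT NULL,
        company_id INTEGER NOT NULL REFERENCES companies(id)
    );
""")

def company_choices(conn):
    """Names to fill the combo box with, in alphabetical order."""
    rows = conn.execute("SELECT name FROM companies ORDER BY name")
    return [name for (name,) in rows]

def get_or_create_company(conn, name):
    """Return the id for an existing name, inserting it if it is new."""
    name = name.strip()
    row = conn.execute(
        "SELECT id FROM companies WHERE name = ?", (name,)
    ).fetchone()
    if row is not None:
        return row[0]
    cur = conn.execute("INSERT INTO companies (name) VALUES (?)", (name,))
    return cur.lastrowid

ibm_id = get_or_create_company(conn, "IBM")
conn.execute(
    "INSERT INTO submissions (contact, company_id) VALUES (?, ?)",
    ("sample contact", ibm_id),
)
same_id = get_or_create_company(conn, "IBM")      # existing entry is reused
borland_id = get_or_create_company(conn, "Borland")
print(company_choices(conn))
```

In a web form, company_choices would feed the drop-down list and get_or_create_company would handle the "other" field.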

Heri Bender

Re:Inconsistent data within a column


Richard,

If you can't normalize the data into a separate table with keys for company
names, there is a tool available from DataFlux (called the dfPower Series)
that lets you quickly standardize data within a column by building
standardization schemes for any type of data, not just company names. It is
pretty easy to use because it is entirely point-and-click driven. The other
thing I like about it is that it connects directly to the database via ODBC.
It also does other things, such as identifying duplicate data via 'fuzzy
logic'.
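
To give a feel for what a standardization scheme with fuzzy matching means in general, here is a small sketch in plain Python. It is not how dfPower works internally and does not use any DataFlux API; the SCHEME dictionary, the standardize function, and the 0.85 cutoff are assumptions chosen for illustration. The fuzzy step uses difflib from the Python standard library.

```python
import difflib

# Hypothetical scheme: known variants (lowercased) mapped to one standard form.
SCHEME = {
    "ibm": "IBM",
    "i.b.m.": "IBM",
    "international business machines": "IBM",
}

def standardize(value, scheme=SCHEME, cutoff=0.85):
    """Map a raw entry to its standard form, or return it unchanged."""
    # Normalize case and whitespace before looking anything up.
    key = " ".join(value.lower().split())
    if key in scheme:
        return scheme[key]
    # Fall back to the closest known variant, if one is similar enough.
    close = difflib.get_close_matches(key, list(scheme), n=1, cutoff=cutoff)
    return scheme[close[0]] if close else value

for raw in ["  IBM ", "Internatonal Business Machines", "Borland"]:
    print(repr(raw), "->", standardize(raw))
```

Entries that match nothing in the scheme are left as they are, so they can be reviewed by hand and added to the scheme later.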

They provide a free 'data quality' diagnostic tool that allows you to
determine the extent of data quality problems that you might have in the
database based on your own data:

http://www.dataflux.com/dfroi.htm

And if that works for you then you can move up to the dfPower Series.

Hope this is helpful!

Denise
