Data are the raw material from which information is produced. It is therefore
not surprising that in today’s information-driven environment, data are a
valuable asset that requires careful management. To assess data’s monetary
value, look at what is stored in a company database: data about customers,
suppliers, inventory, operations, and so on. How many opportunities are lost if
the data are lost? What is the actual cost of data loss? For example, an
accounting firm whose entire database is lost would incur significant direct
and indirect costs, and its problems would be magnified if the data loss
occurred during tax season. Data loss puts any company in a difficult position.
The company might be unable to handle daily operations effectively, it might
lose customers who require quick and efficient service, and it might lose the
opportunity to gain new customers. Data are a valuable resource that can
translate into information. If the information is accurate and timely, it is
likely to trigger actions that enhance the company’s competitive position and
generate wealth. In effect, an organization is subject to a
data-information-decision cycle; that is, the data user applies intelligence to
data to produce information that is the basis of knowledge used in decision
making by the user. This cycle is illustrated in Figure 15.1.

Note in Figure 15.1 that the decisions made by high-level managers trigger
actions within the organization’s lower levels. Such actions produce additional
data to be used for monitoring company performance. In turn, the additional
data must be recycled within the data-information-decision framework. Thus,
data form the basis for decision making, strategic planning, control, and
operations monitoring. A critical success factor of an organization is
efficient asset management. To manage data as a corporate asset, managers must
understand the value of information, that is, processed data. In fact, there
are companies (for example, those that provide credit reports) whose only
product is information and whose success is solely a function of information
management.

Most organizations continually seek new ways to leverage their data resources
to get greater returns. This leverage can take many forms, from data warehouses
that support improved customer relationship management to tighter integration
with customers and suppliers in support of electronic supply chain management.
As organizations become more dependent on information, the accuracy of that
information becomes ever more critical. Dirty data, or data that suffer from
inaccuracies and inconsistencies, become an even greater threat to these
organizations. Data can become dirty for many reasons, such as:
- Lack of enforcement of integrity constraints (not null, uniqueness, referential integrity, etc.).
- Data entry typographical errors.
- Use of synonyms and/or homonyms across systems.
- Nonstandardized use of abbreviations in character data.
- Different decompositions of composite attributes into simple attributes across systems.
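
The first cause in the list above, unenforced integrity constraints, can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The customer and invoice tables, their column names, and the sample values are hypothetical and are not taken from the text; the point is only that a database with declared constraints rejects rows that would otherwise become dirty data.

```python
import sqlite3

# Hypothetical schema for illustration; table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
    CREATE TABLE customer (
        cus_code  INTEGER PRIMARY KEY,
        cus_email TEXT NOT NULL UNIQUE
    );
    CREATE TABLE invoice (
        inv_number INTEGER PRIMARY KEY,
        cus_code   INTEGER NOT NULL REFERENCES customer (cus_code)
    );
""")
conn.execute("INSERT INTO customer VALUES (1, 'ann@example.com')")
conn.commit()  # make the valid row permanent before testing bad rows


def try_insert(sql, params):
    """Run an insert; return the constraint error message, or None on success."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)


rejected = {
    "not null": try_insert("INSERT INTO customer VALUES (?, ?)", (2, None)),
    "unique": try_insert("INSERT INTO customer VALUES (?, ?)", (3, "ann@example.com")),
    "referential": try_insert("INSERT INTO invoice VALUES (?, ?)", (100, 99)),
}
for rule, message in rejected.items():
    print(f"{rule}: {message}")
```

Each of the three inserts is refused by the database itself, so the bad rows never reach the tables. The other causes in the list, such as synonyms or inconsistent abbreviations across systems, cannot be caught by constraints in a single database.
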

Some causes of dirty data can be addressed at the individual database level,
for example through the proper implementation of constraints. However,
addressing other causes of dirty data is more complicated. Some dirty data
arise from the movement of data across systems, as in the creation of a data
warehouse. Efforts to control dirty data are generally referred to as data
quality initiatives. Data quality is a comprehensive approach to ensuring the
accuracy, validity, and timeliness of the data. The idea that data quality is
comprehensive is important. Data quality is concerned with more than just
cleaning dirty data; it also focuses on preventing future inaccuracies in the
data and on building user confidence in the data. Large-scale data quality
initiatives tend to be complex and expensive projects. As such, the alignment
of these initiatives with business goals is a must, as is buy-in from top
management. While data quality efforts vary greatly from one organization to
another, most involve an interaction of:
- A data governance structure that is responsible for data quality.
- Measurements of current data quality.
- Definition of data quality standards in alignment with business goals.
- Implementation of tools and processes to ensure future data quality.
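
As a rough illustration of the second and third items above, measuring current data quality against defined standards, the following sketch scores a handful of records field by field. The records, the field names, and the three standards (non-empty name, two-letter state code, dashed phone format) are all assumptions made up for this example; a real initiative would derive its standards from business goals.

```python
import re

# Hypothetical customer records; field names and values are made up for illustration.
records = [
    {"name": "Ann Lee", "state": "TN", "phone": "615-555-0101"},
    {"name": "Bob Ray", "state": "Tenn.", "phone": "615-555-0102"},
    {"name": "", "state": "KY", "phone": "5550103"},
    {"name": "Cy Dunn", "state": "GA", "phone": None},
]

# Assumed standards: each field maps to a predicate that accepts a valid value.
standards = {
    "name": lambda v: bool(v and v.strip()),
    "state": lambda v: bool(v) and re.fullmatch(r"[A-Z]{2}", v) is not None,
    "phone": lambda v: bool(v) and re.fullmatch(r"\d{3}-\d{3}-\d{4}", v) is not None,
}


def conformance(rows, rules):
    """Return, per field, the fraction of rows whose value meets the standard."""
    return {
        field: sum(1 for row in rows if is_valid(row.get(field))) / len(rows)
        for field, is_valid in rules.items()
    }


scores = conformance(records, standards)
for field, score in scores.items():
    print(f"{field}: {score:.0%} conforming")
```

Tracking such scores over time is one simple way to tell whether the tools and processes put in place are actually improving the data.
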

A number of tools can assist in the implementation of data quality initiatives.
In particular, data profiling and master data management software is available
from many vendors. Data profiling software consists of programs that gather
statistics and analyze existing data sources. These programs analyze existing
data and the metadata to determine data patterns, and they can compare the
existing data patterns against standards that the organization has defined.
This analysis can help the organization to understand the quality of the data
that are currently in place and to identify sources of dirty data. Master data
management (MDM) software helps to prevent dirty data by coordinating common
data across multiple systems. MDM provides a master copy of entities, such as
customers, that appear in numerous systems throughout the organization. While
these technological approaches provide an important piece of data quality, the
overall solution to high-quality data within an organization still relies
heavily on the administration and management of the data.
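
The kind of analysis a data profiling program performs can be sketched in a few lines of Python. This is a toy illustration rather than a description of any vendor's product: the `shape` and `profile` helpers, the sample phone column, and the `999-999-9999` standard are all hypothetical. It gathers basic statistics for one column, reduces each value to a pattern, and compares the patterns found against the assumed standard.

```python
from collections import Counter


def shape(value):
    """Reduce a value to its pattern: letters become 'A', digits become '9'."""
    if value is None:
        return "<null>"
    return "".join(
        "A" if ch.isalpha() else "9" if ch.isdigit() else ch
        for ch in str(value)
    )


def profile(values):
    """Gather simple statistics for one column of values."""
    values = list(values)
    non_null = [v for v in values if v is not None]
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "patterns": Counter(shape(v) for v in values),
    }


# Hypothetical phone column drawn from an existing data source.
phones = ["615-555-0101", "615-555-0102", "6155550103", None, "615-555-0101"]
stats = profile(phones)

expected = "999-999-9999"  # assumed organizational standard
off_standard = sum(n for p, n in stats["patterns"].items() if p != expected)
print(stats)
print(f"{off_standard} of {stats['rows']} values deviate from {expected}")
```

Values whose pattern differs from the standard, including nulls, point to where dirty data are entering and are candidates for cleaning.
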

Reference:
Coronel, Carlos, Morris, Steven, and Rob, Peter. Database Systems: Design,
Implementation & Management, 9th Edition. Course Technology, 2011.