Report: Cost/Benefit of Enterprise Warehouse Solutions

In-depth Comparison of IBM Smart Analytics System 7700,
Teradata Active Enterprise Data Warehouse and
Oracle Exadata Database Machine.

Data warehousing has emerged as one of the IT world’s fastest growth areas. New deployments continue to accelerate, and numbers of applications and users within organizations continue to expand. Demand for high-quality, current information and for tools to interpret and exploit it shows no signs of abating. High double-digit growth in data volumes has become the norm.

The business benefits of data warehouse applications are clearly recognized. But, increasingly, users are faced with escalating expenditure not only on data warehouse solutions, but also on underlying platforms. At a time of budgetary pressures, questions are raised about the most cost-effective means of realizing information value.

This is particularly the case for special-purpose platforms offered by IBM (Smart Analytics System, Netezza TwinFin), Oracle (Exadata Database Machine), Teradata (Active Enterprise Data Warehouse) and smaller players. Architectures and technologies of these systems are often unfamiliar to organizations that deploy them. Techniques for measuring comparative performance and cost are rudimentary.

Challenges are compounded by several factors. One is that the performance of different architectures depends on the workloads they execute. Another is that data warehouse usage tends to evolve rapidly – organizations that deploy platforms for specific applications may soon find that they must deal with significantly different environments. A third is that vendor pricing may vary widely between customers.

This report sets some parameters for comparisons. To do this, it takes into account types of workload (in particular, a key distinction is drawn between complex mixed workloads and queries involving large sequential table scans), compares overall three-year as well as acquisition costs, and bases platform calculations on “street” pricing (i.e., discounted prices paid by users).

The report focuses on three platforms: IBM Smart Analytics System 7700, Oracle Exadata Database Machine and Teradata’s flagship Active Enterprise Data Warehouse (Active EDW) 6650. Results are based on input from 46 users of these systems and their recent predecessors, on other industry sources, and on research and analysis conducted by the International Technology Group (ITG).

Two sets of cost comparisons, based on performance and user data, are presented.

Conclusions

Cost comparisons presented here are based on typical configurations, utilization and staffing levels, along with street prices reported by users for data warehouses characterized by complex mixed workloads. In practice, configurations and applications vary between and in some cases within organizations, and vendors may price more aggressively in genuinely competitive situations.

Certain conclusions may nevertheless be drawn. The economics of special-purpose data warehouse systems are affected not only by pricing, but also by how well architectures handle specific types of workload and by levels of data compression that may be achieved in practice. Personnel costs are also affected by the extent of automation, and facilities costs by the efficiency with which systems operate.
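The cost elements named above (street pricing, personnel and facilities) can be combined into a rough three-year figure. The following is a minimal sketch, not the method used in the report: the class, the field names, the separate support-fee category and all sample figures are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class PlatformCosts:
    """Hypothetical cost inputs for one platform (placeholder figures only)."""
    list_price: float         # vendor list price for hardware and software
    street_discount: float    # fraction taken off list, e.g. 0.40 for 40 percent
    annual_support: float     # yearly support fees (assumed category)
    annual_personnel: float   # yearly cost of DBAs and other IT staff
    annual_facilities: float  # yearly power, cooling and floor space

    def acquisition_cost(self) -> float:
        """Street price: list price after the negotiated discount."""
        return self.list_price * (1.0 - self.street_discount)

    def three_year_cost(self) -> float:
        """Acquisition cost plus three years of recurring costs."""
        recurring = (self.annual_support
                     + self.annual_personnel
                     + self.annual_facilities)
        return self.acquisition_cost() + 3 * recurring


def cost_per_user_tb(costs: PlatformCosts, user_data_tb: float) -> float:
    """Three-year cost divided by terabytes of user data supported."""
    if user_data_tb <= 0:
        raise ValueError("user_data_tb must be positive")
    return costs.three_year_cost() / user_data_tb


if __name__ == "__main__":
    # Invented numbers, used only to exercise the arithmetic.
    example = PlatformCosts(
        list_price=1_000_000,
        street_discount=0.40,
        annual_support=50_000,
        annual_personnel=200_000,
        annual_facilities=25_000,
    )
    print(f"acquisition: {example.acquisition_cost():,.0f}")
    print(f"three-year:  {example.three_year_cost():,.0f}")
    print(f"per user TB: {cost_per_user_tb(example, 50):,.0f}")
```

Dividing by user data rather than raw disk capacity means that a platform achieving higher compression shows a lower cost per terabyte for the same hardware, which is why compression levels matter to the comparison.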

From these perspectives, key distinctions must be drawn among the three platforms that are the focus of this report. In terms of underlying architecture, system design and hardware and software implementation, Smart Analytics System 7700 and Active EDW 6650 are better optimized – by wide margins – to handle complex mixed workloads than Exadata Database Machines.

The level of optimization is highest for Smart Analytics System 7700, which employs a newer architecture. Teradata systems are more constrained by legacy characteristics.

The capabilities of Exadata Database Machines, however, differ significantly from those of Smart Analytics System 7700 and Teradata Active EDW systems. This is particularly the case in the following areas:

  • Workloads. Exadata Database Machines deliver their best performance for workloads characterized by large sequential table scans.

    Such workloads are generated by applications that are structurally simple, but require a great deal of processing power; e.g., identifying and collating specific variables in large volumes of records. These applications typically support small numbers of executives and/or analysts.

    The entire Exadata Database Machine architecture is geared to this type of application and workload. This is the case for the overall system design as well as for three key technologies – (1) Smart Scan, (2) Exadata Hybrid Columnar Compression (EHCC) and (3) Smart Flash Cache – presented by Oracle as Exadata Database Machine differentiators.

    The emphasis on large sequential table scans reflects, at least to some extent, earlier development of the “data warehouse appliance” market. By the mid-2000s, Teradata’s dominant market position in special-purpose systems was being eroded by Netezza, which offered lower-cost appliances built around “commodity” components.

    Netezza systems were rapidly adopted by many organizations – by September 2010, the company claimed 373 installed customers – for applications generating this type of workload. When Oracle Exadata Database Machine was introduced, it was generally seen as aimed at Netezza.

    Although Oracle has since positioned Exadata Database Machines more broadly, design for high-volume table scan processing remains a fundamental characteristic.

  • Compression. The extent to which systems can compress data without unacceptable performance degradation has a major impact on system capacities and, if measurements are based on user data, comparative costs. System processing may also be accelerated and I/O times reduced.

    IBM DB2 9.7, which is employed by Smart Analytics System 7700, features one of the industry’s most effective across-the-board implementations of data compression. Compression extends to rows as well as indexes, temporary tables, log files, large objects and other data structures. Users have routinely experienced overall compression levels of 55 percent to more than 85 percent. Among Smart Analytics System 7700 users who contributed to this report, for example, overall compression levels averaged 72 percent. As a general metric, IBM employs a 60 percent ratio.

    Teradata expanded compression in version 13.10 of its database. This was introduced only in November 2010, and no Teradata users contacted for this report had practical experience with its compression capabilities.

    Oracle employs two compression technologies. Database Machines implement Advanced Compression, which is a feature of Oracle Database 11g. Although higher levels have been achieved, users have found that unacceptable performance degradation typically occurs above approximately 25 percent.

    Exadata Storage Servers implement EHCC technology, which is designed to compress large tables, and is most effective when these tables are processed sequentially. Oracle has claimed compression rates of up to 70 times. Among users, rates of two to three times have been reported. EHCC does not, however, have a similar effect for other data structures and types of workload.

  • Scalability. Experience has shown that Smart Analytics System 7700 and Teradata systems can scale to very large configurations. Teradata systems routinely contain dozens – in some cases, hundreds – of nodes. Among Smart Analytics System 7700 users, seven production systems were reported to support 40TB to more than 120TB of user data. Larger systems were planned.

    Oracle Database Machines implement RAC architecture, which is built around a “shared everything” model; i.e., multiple processors share a common memory pool. In comparison, Smart Analytics System 7700 and Teradata systems employ “shared nothing” architecture, in which processors and memory are subdivided among nodes. The Exadata Storage Server also employs a “shared nothing” model.

    “Shared everything” models are generally regarded as more vulnerable to scalability constraints – contention for system resources increases as systems expand. It is unclear whether this is the case for Exadata Database Machines, as deployments to date have been of relatively small configurations. Users have reported installations of 1/2 to 2 racks, with 3TB to 10TB of user data.

  • Automation. IBM Smart Analytics Systems and Teradata systems typically require fewer DBAs and other IT personnel than Oracle equivalents. Levels of automation are significantly higher for DB2 9.7 and (to a lesser extent) Teradata 13.10 than for Oracle Database 11g.

    The impact on FTE staffing levels is magnified when databases undergo frequent changes – which is typically the case when numerous applications, diverse user populations and complex workloads must be supported. Users of all types of special-purpose systems noted this effect.

Performance competitiveness and cost-effectiveness of Exadata Database Machines thus tend to be workload-specific. Users have cited “massive table scans and joins,” “very large table scans” and similar terms in describing their workloads. For many organizations, however, the viability of this platform will tend to decline as data warehouses evolve towards more complex mixed workload environments.

All Smart Analytics System 7700 and Teradata users who contributed to this report said that they had either planned for multifunction data warehouses, or that usage had developed in this direction.

There will clearly still be demand for special-purpose systems to handle large sequential table scans. But users should be aware of the limitations of such systems. Platform decisions should be based not only on short-term requirements, but also on how organizations expect data warehouse demand to evolve in the future.
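The compression figures quoted above are stated in two different units: percentages for DB2 and Advanced Compression, and “times” for EHCC. The sketch below puts them on a common scale. It assumes that a compression level of N percent means stored data occupies N percent less space than the uncompressed original, an interpretation the source does not state explicitly. The function names are illustrative only.

```python
def reduction_to_ratio(reduction_pct: float) -> float:
    """Convert a space reduction in percent to an 'N times' ratio.

    Assumes reduction_pct is the share of space saved, so 60 percent
    means compressed data occupies 40 percent of its original size.
    """
    if not 0 <= reduction_pct < 100:
        raise ValueError("reduction_pct must be in the range [0, 100)")
    return 100.0 / (100.0 - reduction_pct)


def ratio_to_reduction(ratio: float) -> float:
    """Convert an 'N times' compression ratio to percent of space saved."""
    if ratio < 1:
        raise ValueError("ratio must be at least 1")
    return 100.0 * (1.0 - 1.0 / ratio)


def user_data_capacity_tb(usable_disk_tb: float, reduction_pct: float) -> float:
    """Uncompressed user data that fits in a given amount of usable disk."""
    return usable_disk_tb * reduction_to_ratio(reduction_pct)


if __name__ == "__main__":
    # Percentages quoted in the text, expressed as ratios.
    for pct in (25, 60, 72):
        print(f"{pct} percent saved = {reduction_to_ratio(pct):.2f} times")
    # Ratios quoted in the text, expressed as percent saved.
    for ratio in (2, 3, 70):
        print(f"{ratio} times = {ratio_to_reduction(ratio):.1f} percent saved")
```

Under this assumption, the 72 percent average reported by Smart Analytics System 7700 users corresponds to roughly 3.6 times, while the two to three times reported by EHCC users corresponds to roughly 50 to 67 percent, bearing in mind that the EHCC figures apply to large tables rather than to all data structures.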

