Part 1: SQL Server Parallel Data Warehouse – Best Thing Since Sliced Bread?

Update: As of an April 2014 announcement, Microsoft is calling the next iteration of its Parallel Data Warehouse Edition-based offerings the Analytics Platform System, and relative unknown Quanta joins Dell and HP as a hardware provider.

In my first post, I want to look at the capabilities of Microsoft's SQL Server Parallel Data Warehouse offering and contrast them with a more established offering: IBM's PureData System for Analytics, probably still better known today as Netezza.

Parallel Data Warehouse (PDW) is an offering you can order from HP as the AppSystem for Parallel Data Warehouse or from Dell as the Dell Parallel Data Warehouse. Both run SQL Server 2012 Parallel Data Warehouse Edition, which combines capabilities of Microsoft's SMP-only SQL Server with technology from Microsoft's 2008 acquisition of DATAllegro.

PDW Background/History

When general availability of PDW V1 was first announced in November 2010, the message seemed to me to be that PDW's MPP (massively parallel processing), or shared-nothing, architecture was something new and revolutionary. In fact, other vendors had been using it for very large databases for two decades. IBM introduced DB2 Parallel Edition, today called the DB2 Database Partitioning Feature (DPF), in 1995. Netezza, today called PureData System for Analytics, came out in 2003. Teradata had an offering in the 1980s. It is positive that Microsoft introduced an option for customers hitting the wall with BI on SQL Server (whereas Oracle, for example, persists with its RAC shared-data architecture for everything). Still, many customers recognized long ago that shared nothing was the right approach for working with large data sets and have been using shared-nothing platforms to gain insights from their data. The small number of PDW case studies Microsoft was highlighting up to two years after the V1 release suggests that adoption has been slow.

In the first half of 2013, PDW V2 came out with a significantly different architecture. It moved from deploying the software directly on the servers in a rack to running it under Hyper-V virtualization, from a SAN to JBODs (just a bunch of disks), and from a two-rack to a one-rack starting configuration with more CPU, memory, and disk. There were also a few database-level enhancements, the most notable being the use of ColumnStore indexes to improve query performance.

Proof Points

Reading about a product's benefits in a vendor's solution brief is useful. Hearing from actual customers is much better. PureData System for Analytics is used by over 1,000 customers, and all of them echo the same key points. Customers see excellent total cost of ownership, not only in software and hardware costs but also in ongoing management costs, because big data volumes tend to create big data complexity. They also invariably see excellent out-of-the-box performance for their most demanding analytic workloads. By contrast, a January 2014 InformationWeek article on big data analytics platforms by Doug Henschen observed: "There's no doubt that Microsoft is amassing all the pieces, but it's early days for HDInsight, and we still don't see many PDW deployments after three years in the market." Proof points are not everything in a world of rapidly evolving technology, but they are worth paying attention to.

Stay tuned for the following parts…
