tag:blogger.com,1999:blog-52988824997180104822024-03-08T06:33:35.438-05:00Unimpeded By SanityChad Dinermanhttp://www.blogger.com/profile/12096950784001563394noreply@blogger.comBlogger30125tag:blogger.com,1999:blog-5298882499718010482.post-46711988189104018722017-06-17T06:00:00.000-04:002017-06-28T14:18:40.168-04:00DataWorks Summit 2017 - Netflix: Scaling Data QualityNetflix processes over 700 billion events / day, with 300 TB data warehouse writes each day and 5 PB of data warehouse reads each day. When quality starts to go bad, it can escalate into a major problem very quickly.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEijau7a38vRMvpveEOKM-srJbsY9SScmy615q_C9B-N2j-6zlFzTpJ3NXizx7IYsDnnqxv_DENd_AHMH2PK7Yo1SXdwUPTZ1_h6utAnL9kD1yTBOce7_7XdqVKJlOYuSgd4S3j_vZViMhw/s1600/Netflix_data_ingest.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="377" data-original-width="854" height="175" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEijau7a38vRMvpveEOKM-srJbsY9SScmy615q_C9B-N2j-6zlFzTpJ3NXizx7IYsDnnqxv_DENd_AHMH2PK7Yo1SXdwUPTZ1_h6utAnL9kD1yTBOce7_7XdqVKJlOYuSgd4S3j_vZViMhw/s400/Netflix_data_ingest.png" width="400" /></a></div>
<br />
Data quality problems tend to be from upstream systems when either the volume changes dramatically or values shift out from underneath the system. It is more important to find out <b>"when"</b> there is a problem, than <b>"why"</b> there is a problem. To support this, developers have created a few tools:<br />
<br />
<ul>
<li>Metacat</li>
<ul>
<li>Federated Metastore, like an extended HCatalog</li>
<li>Contains statistics about the data on the partition</li>
<ul>
<li>Missing data</li>
<li>Life Cycle</li>
<li>Audience using the data</li>
<li>Sunset date</li>
<li>Sunrise date</li>
</ul>
</ul>
<li>Quint</li>
<ul>
<li>DQ Service</li>
<li>Defines metrics using a sql-like syntax</li>
<li>Evaluation rules include normal distribution compared over X partitions</li>
</ul>
<li>WAP (Write, Audit, Publish)</li>
<ul>
<li>This is an ETL pattern that leverages the idea of separating schema from data.</li>
<ul>
<li>Write the additional data to a new partition file</li>
<li>Audit table definition is created to match the original table + the new partition</li>
<ul>
<li>Validate the data using the Quint rules</li>
</ul>
<li>Publish the partition by updating the production table definition to include the new partition</li>
</ul>
</ul>
</ul>
<div>
This pattern is not easy to put into place, but library components help simplify it it by applying common patterns based on the audience and life cycle of the data.</div>
<div>
<br /></div>
<div>
Lessons learned</div>
<div>
<ul>
<li>Query based validation may be enough</li>
<li>Not all tables require quality coverage</li>
<li>One size does not fit all, use multiple small components so that customizations don't require the process to be completely on its own.</li>
</ul>
</div>
Chad Dinermanhttp://www.blogger.com/profile/12096950784001563394noreply@blogger.com5tag:blogger.com,1999:blog-5298882499718010482.post-35378856751493924982017-06-15T06:00:00.000-04:002017-06-22T11:30:09.455-04:00DataWorks Summit 2017 - Verizon Finance Data LakeVerizon has stood up a finance data lake to be a shared Enterprise Data Repository and enable self-service data discovery. This combines data from multiple SAP systems into a single, easy to access data repository. Benefits included:<br />
<br />
<ul>
<li>Simplifies access to ERP data</li>
<li>Reduces data replication</li>
<li>Enables sharing of data (eliminating point to point integration)</li>
<li>Drive data archiving strategy</li>
<li>Rationalize data sources</li>
<li>Central set of business rules</li>
<li>Common reporting and analytic tools</li>
<li>Unified location to apply Data Governance and Security</li>
</ul>
<div>
The Architecture has data flow up from the source systems and into the data lake (~1,000 tables). The data movement is a combination of direct file imports, batch ingestion with Sqoop and incremental ingestion. The incremental ingestion was done by CDC and partition swaps originally but was eventually replaced with Attunity Replicate to simplify the maintenance and speed up processing.</div>
<div>
Governance and security is applied, the sensitive data is either tokenized or not allowed into the cluster. The data is processed through the ETL layer, consisting mostly of Hive processes. The Logical model is the semantic layer provided by the reporting tools. </div>
<div>
<br /></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhppDIejnpoJANTi4R5Qq9TDdIbNoxkV5rAXaHj0j0LM5M25dN2ouMXcAU0R811XUCC1cSHI4W3ZgkSbXcJLacny_UkKzG3XzvMn83Njfv3WI-BjeE71eTt2ulLEafDWABvbPoN3WJRu5U/s1600/vzw_dl_arch.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="526" data-original-width="637" height="330" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhppDIejnpoJANTi4R5Qq9TDdIbNoxkV5rAXaHj0j0LM5M25dN2ouMXcAU0R811XUCC1cSHI4W3ZgkSbXcJLacny_UkKzG3XzvMn83Njfv3WI-BjeE71eTt2ulLEafDWABvbPoN3WJRu5U/s400/vzw_dl_arch.png" width="400" /></a></div>
<br />
There are areas to improve.<br />
<br />
<ul>
<li>Data Validations are executed with manual scripts after the fact.</li>
<li>Data Governance is limited and needs review as more users are given access to the data</li>
<li>There is no managed data catalog at this time</li>
<li>Very few curated and maintained data sets</li>
<li>Self-service data preparation tools are being reviewed</li>
<li>Capabilities are limited to small number of uses</li>
</ul>
Chad Dinermanhttp://www.blogger.com/profile/12096950784001563394noreply@blogger.com14tag:blogger.com,1999:blog-5298882499718010482.post-71915387232775566312017-06-14T09:39:00.003-04:002017-06-14T09:50:36.276-04:00DataWorks Summit 2017 - Day 1 KeynoteI was able to get to my first Hadoop conference this year, so far it's been really good. Below are my notes from the sessions I was able to attend today.<br />
<br />
The sessions I went to today include:<br />
<br />
<ul>
<li>Summit Keynote</li>
<li>Verizon: Finance Data Lake</li>
<li>Whoops, the numbers are wrong! Scaling Data Quality @ Netflix</li>
<li>Dancing Elephants - Efficiently working with object stores from Apache Spark and Hive</li>
<li>Governance Bots - Metadata driven compliance</li>
<li>Cloudy with a chance of Hadoop - real world considerations</li>
<li>Hadoop Journey at Walgreens</li>
<li>LLAP: Building Cloud-first BI</li>
</ul>
<br />
<h2>
Summit Keynote</h2>
The keynote was considerably flashier than I was expecting, the first 10 minutes or so was a lot of fog and laser light shows stuff. Not really my thing and I wish they didn't do that.<br />
<h3>
IBM Announcement</h3>
They did announce a very interesting partnership with IBM. IBM's Big Insights was based on open source Hadoop, maintaining their own distribution.Now IBM is going to leverage Hortonworks Data Platform (HDP) with Big SQL and Data Science Experience (DSX) layered on top of it.<br />
<br />
Hortonworks has stated that for a distributed Data Warehouse Big SQL is their recommendation, but it doesn't seem to change their message about Hive. Hive still is their platform for Analytical SQL processing. The thing that Big SQL brings to the table is their "Fluid Query" (Federation) across multiple data sources that live in Hadoop (Hive, HBase, Spark and Object stored through HDFS) and conventional RDBMS (Oracle, SQL Server, Teradata, DB2, PDA / Netezza, etc.). Big SQL is a query optimizer and compiler to gather data across all of those systems. I've never used it, but the notion is interesting.<br />
<br />
As for DSX, I have only seen a little bit of it. At first glance it appears that it stitches together some opensource Machine Learning (ML) platforms (and SPSS) into a set of coherent tools that focus upon collaboration. There are video tutorials to help team members with varying experience in the tools learn about using them through DSX. They have a website for machine learning use cases by industry at <a href="http://www.ibm-ml-hub.com/">www.ibm-ml-hub.com</a>, but there isn't much up there yet. Their visualization engine has been open-sourced, it's called brunel and is available on <a href="https://github.com/Brunel-Visualization/Brunel" target="_blank">github</a>. It can be used with Python, R and Spark, but I'm not clear how it stacks up to its competitors. They also focused on how simple it is to integrate the models generated into existing applications. The models can be tested, scored, monitored and generate alerts based on defined conditions.<br />
<h3>
Round-table</h3>
There was also a round-table of folks using Hortonworks in different industries, including:<br />
<br />
<ul>
<li>Duke Energy</li>
<li>Black Night (Fidelity Mortgages)</li>
<li>Healthcare Services</li>
</ul>
<h4>
Duke Energy</h4>
<div>
They came to the conclusion that they need to be a digital company that sells energy, instead of an energy company using some technology. This is a completely different approach from what they did before and required a major mental shift for everyone. There was a complete shift from commercial to open source software whenever the option exists. This model has been an advantage for attracting and retaining top talent. Above all else, they recommend making sure that Governance be considered early in the journey, otherwise it will lead to problems.</div>
<div>
<br /></div>
<div>
They have many sensors (>4 million) feeding into their system which would have conventional data warehouse appliances, so most data was getting lost. They are able to identify customer issues faster than ever and are using customer data to fuel their warehouse and applying machine learning to help with customer calls. An interesting thing they are working on is dynamically looking at the customer calling in, identify anticipated needs and offer the options to them faster than ever.</div>
<div>
<br /></div>
<h4>
Black Night (Fidelity Mortgage)</h4>
<div>
Black Night uses Hadoop to improve their understanding of the customer. Similar to Duke, the cultural shift was hard but it also attracted (and retained) talent that otherwise would not of joined the company. They are using the data they gather to predict customer behavior and areas of interest. They want to reach the point where a customer can get their mortgage in a few minutes instead of the time intensive process it has been in the past.</div>
<div>
<br /></div>
<h4>
Healthcare Services</h4>
<div>
Healthcare Services are two years into their data journey. This industry is even more heavily regulated and tends to towards data silos than most. To break out of the silos wherever possible they are switching to opensource software to fuel their ongoing machine learning efforts. For example, zip codes are highly correlated to lifespan, this can help identify populations that need help and preventive care. By working to provide clinics and focus attention on those areas those numbers can be improved.<br />
<h3>
Microsoft using YARN</h3>
Microsoft is switching to using YARN on their COSMOS clusters. These are very large clusters that need to support > 50k nodes (each!) distributed throughout the globe. Previously it used their own resource negotiator that processed Dryad DAGs (talk about a blast from the past!). Microsoft has been steadily contributing changes to YARN so that it can swap out their homegrown solution for an indsutry standard approach. They are a major project contributor and have dedicated MS Research members to work on improving cluster size and creating predictable and preemptable processing allocations. This effectively allows them to create higher priority jobs that will always execute when needed and allow other jobs to run when resources free up again. They are achieving the scalability by adding a federation layer and having each YARN instance scale to 10k nodes. Fascinating stuff, I love keeping an eye on the <a href="https://www.microsoft.com/en-us/research/" target="_blank">MS Research site</a>.<br />
<br /></div>
Chad Dinermanhttp://www.blogger.com/profile/12096950784001563394noreply@blogger.com0tag:blogger.com,1999:blog-5298882499718010482.post-32203097206559454892015-03-30T20:26:00.001-04:002015-03-30T20:26:49.647-04:00EDW 2015: Data Governance Organization
<br />
<div style="border-width: 100%; direction: ltr;">
<div style="direction: ltr; margin-left: 0in; margin-top: 0in; width: 5.302in;">
<div style="direction: ltr; margin-left: 0in; margin-top: 0in; width: 5.302in;">
<div style="color: #1e4e79; font-family: Calibri; font-size: 16.0pt; margin: 0in;">
Pharmaceutical model</div>
<div style="font-family: Calibri; font-size: 11.0pt; margin: 0in;">
Janet Lichtenburger and Nancy Makos presented a terrific view of how Data Governance works at their company. They implement functional (small) data governance, with the exception of data sensitivity.</div>
<div style="font-family: Calibri; font-size: 11.0pt; margin: 0in;">
<br /></div>
<div style="font-family: Calibri; font-size: 11.0pt; margin: 0in;">
Here is my brain dump of the session today. You're welcome to it if you can make heads or tails of it. :-) </div>
<div style="font-family: Calibri; font-size: 11.0pt; margin: 0in;">
<br /></div>
<div style="font-family: Calibri; font-size: 11.0pt; margin: 0in;">
Some numbers that show the challenge of Data Governance there are: </div>
<ul style="direction: ltr; margin-bottom: 0in; margin-left: .375in; margin-top: 0in; unicode-bidi: embed;" type="disc">
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;"><span style="font-family: Calibri; font-size: 11.0pt;">5 people in Enterprise
Architecture</span></li>
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;"><span style="font-family: Calibri; font-size: 11.0pt;">3 people in Data Governance</span></li>
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;"><span style="font-family: Calibri; font-size: 11.0pt;">130 people provide Data
Stewardship support</span></li>
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;"><span style="font-family: Calibri; font-size: 11.0pt;">350k total employees</span></li>
</ul>
<div style="font-family: Calibri; font-size: 11.0pt; margin: 0in;">
<br /></div>
<div style="font-family: Calibri; font-size: 11.0pt; margin: 0in;">
Data Governance is
about Policies and Standards and is typically independent of implementations,
as part of the Enterprise Architecture or Finance groups.</div>
<div style="font-family: Calibri; font-size: 11.0pt; margin: 0in;">
<br /></div>
<div style="font-family: Calibri; font-size: 11.0pt; margin: 0in;">
To encourage
adoption, Data Governance could be considered an internal consulting service to
support projects ,that is not charged back.</div>
<div style="font-family: Calibri; font-size: 11.0pt; margin: 0in;">
<br /></div>
<div style="color: #1e4e79; font-family: Calibri; font-size: 16.0pt; margin: 0in;">
Organizational
Model</div>
<div style="color: #2e75b5; font-family: Calibri; font-size: 14.0pt; margin: 0in;">
Overall
Model</div>
<ol style="direction: ltr; font-family: Calibri; font-size: 11.0pt; font-style: normal; font-weight: normal; margin-bottom: 0in; margin-left: .375in; margin-top: 0in; unicode-bidi: embed;" type="1">
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;" value="1"><span style="font-family: Calibri; font-family: Calibri; font-size: 11.0pt; font-size: 11.0pt; font-style: normal; font-weight: normal;">Enterprise Data Governance
Executive Committee</span></li>
<ol style="direction: ltr; font-family: Calibri; font-size: 11.0pt; font-style: normal; font-weight: normal; margin-bottom: 0in; margin-left: .375in; margin-top: 0in; unicode-bidi: embed;" type="a">
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;" value="1"><span style="font-family: Calibri; font-family: Calibri; font-size: 11.0pt; font-size: 11.0pt; font-style: normal; font-weight: normal;">Meets only a few
times a year</span></li>
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;"><span style="font-family: Calibri; font-size: 11.0pt;">Limited number of senior
executives</span></li>
</ol>
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;"><span style="font-family: Calibri; font-size: 11.0pt;">Data Governance Committees</span></li>
<ol style="direction: ltr; font-family: Calibri; font-size: 11.0pt; font-style: normal; font-weight: normal; margin-bottom: 0in; margin-left: .375in; margin-top: 0in; unicode-bidi: embed;" type="a">
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;" value="1"><span style="font-family: Calibri; font-family: Calibri; font-size: 11.0pt; font-size: 11.0pt; font-style: normal; font-weight: normal;">Each committee is
chaired by Data Goverenance</span></li>
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;"><span style="font-family: Calibri; font-size: 11.0pt;">Divided into specific
domains </span></li>
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;"><span style="font-family: Calibri; font-size: 11.0pt;">Meets as often as required
by projects</span></li>
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;"><span style="font-family: Calibri; font-size: 11.0pt;">Up to 15 people on each
Domain Specific Committee.</span></li>
</ol>
</ol>
<div style="font-family: Calibri; font-size: 11.0pt; margin-left: .75in; margin: 0in;">
<br /></div>
<div style="color: #2e75b5; font-family: Calibri; font-size: 14.0pt; margin: 0in;">
Data
Steward Roles</div>
<ul style="direction: ltr; margin-bottom: 0in; margin-left: .375in; margin-top: 0in; unicode-bidi: embed;" type="disc">
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;"><span style="font-family: Calibri; font-size: 11.0pt;">Executive Stewards</span></li>
<ul style="direction: ltr; margin-bottom: 0in; margin-left: .375in; margin-top: 0in; unicode-bidi: embed;" type="circle">
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;"><span style="font-family: Calibri; font-size: 11.0pt;">Member of the Data
Governance Executive Committee</span></li>
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;"><span style="font-family: Calibri; font-size: 11.0pt;">Strategic Direction</span></li>
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;"><span style="font-family: Calibri; font-size: 11.0pt;">Authority</span></li>
</ul>
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;"><span style="font-family: Calibri; font-size: 11.0pt;">Enterprise Stewards</span></li>
<ul style="direction: ltr; margin-bottom: 0in; margin-left: .375in; margin-top: 0in; unicode-bidi: embed;" type="circle">
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;"><span style="font-family: Calibri; font-size: 11.0pt;">Member of the Data
Governance Committee</span></li>
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;"><span style="font-family: Calibri; font-size: 11.0pt;">Development Support of Data
Governance Policies</span></li>
</ul>
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;"><span style="font-family: Calibri; font-size: 11.0pt;">Operational Stewards</span></li>
<ul style="direction: ltr; margin-bottom: 0in; margin-left: .375in; margin-top: 0in; unicode-bidi: embed;" type="circle">
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;"><span style="font-family: Calibri; font-size: 11.0pt;">Communicates to promote
policies</span></li>
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;"><span style="font-family: Calibri; font-size: 11.0pt;">Endorses Data Standards</span></li>
</ul>
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;"><span style="font-family: Calibri; font-size: 11.0pt;">Domain Stewards</span></li>
<ul style="direction: ltr; margin-bottom: 0in; margin-left: .375in; margin-top: 0in; unicode-bidi: embed;" type="circle">
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;"><span style="font-family: Calibri; font-size: 11.0pt;">Recommends canonical
structures</span></li>
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;"><span style="font-family: Calibri; font-size: 11.0pt;">Endorses Data Standards</span></li>
</ul>
</ul>
<div style="font-family: Calibri; font-size: 11.0pt; margin: 0in;">
<br /></div>
<div style="font-family: Calibri; font-size: 11.0pt; margin: 0in;">
However the model is
constructed, it must make sense for the business. All new policies should be
considered from a Cost / Benefit analysis, the exception is with regulatory
requirements. Regulatory and Legal compliance are critical to avoid jail. </div>
<div style="font-family: Calibri; font-size: 11.0pt; margin: 0in;">
<br /></div>
<div style="color: #1e4e79; font-family: Calibri; font-size: 16.0pt; margin: 0in;">
Policies</div>
<div style="font-family: Calibri; font-size: 11.0pt; margin: 0in;">
The goal of policies
is to drive behavior changes needed for Enterprise Information Management to
succeed.</div>
<div style="font-family: Calibri; font-size: 11.0pt; margin: 0in;">
<br /></div>
<div style="font-family: Calibri; font-size: 11.0pt; margin: 0in;">
People generally
don't like change, a way to get buy-in is to amend existing processes instead
of creating new ones. These are smaller, less intrusive and stakeholders have
already been identified. This also leads to more partnerships and shared
endorsements of the changes.</div>
<div style="font-family: Calibri; font-size: 11.0pt; margin: 0in;">
<br /></div>
<div style="font-family: Calibri; font-size: 11.0pt; margin: 0in;">
There are typically
a small number of extremely high value areas, policy should focus upon those.</div>
<div style="font-family: Calibri; font-size: 11.0pt; margin: 0in;">
<br /></div>
<div style="font-family: Calibri; font-size: 11.0pt; margin: 0in;">
Align with the
Enterprise goals and other Enterprise ranging groups, there are a lot of shared
concerns and ways that the teams can support each other.</div>
<div style="font-family: Calibri; font-size: 11.0pt; margin: 0in;">
<br /></div>
<div style="font-family: Calibri; font-size: 11.0pt; margin: 0in;">
Keep the policies
easily accessible, do not hide them in a 500 page volume, instead keep them
somewhere easily discoverable, such as a wiki on the corporate intranet.</div>
<div style="font-family: Calibri; font-size: 11.0pt; margin: 0in;">
<br /></div>
<div style="font-family: Calibri; font-size: 11.0pt; margin: 0in;">
Policies that are
overly broad and not enforceable can quickly cause legal / compliance problems.
In those cases, no policy is a better choice than an unenforceable one.</div>
<div style="font-family: Calibri; font-size: 11.0pt; margin: 0in;">
<br /></div>
<div style="color: #2e75b5; font-family: Calibri; font-size: 14.0pt; margin: 0in;">
Data
Classification</div>
<div style="font-family: Calibri; font-size: 11.0pt; margin: 0in;">
The source and type
of data both define the data classification required. Similarly, data from
several, more open sources can be combined to escalate the protection required.</div>
<div style="font-family: Calibri; font-size: 11.0pt; margin: 0in;">
<br /></div>
<ul style="direction: ltr; margin-bottom: 0in; margin-left: .375in; margin-top: 0in; unicode-bidi: embed;" type="disc">
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;"><span style="font-family: Calibri; font-size: 11.0pt;">Restricted</span></li>
<ul style="direction: ltr; margin-bottom: 0in; margin-left: .375in; margin-top: 0in; unicode-bidi: embed;" type="circle">
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;"><span style="font-family: Calibri; font-size: 11.0pt;">Financial information, such
as credit cards</span></li>
</ul>
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;"><span style="font-family: Calibri; font-size: 11.0pt;">Protected</span></li>
<ul style="direction: ltr; margin-bottom: 0in; margin-left: .375in; margin-top: 0in; unicode-bidi: embed;" type="circle">
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;"><span style="font-family: Calibri; font-size: 11.0pt;">Regulated information, such
as HIPPA data</span></li>
</ul>
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;"><span style="font-family: Calibri; font-size: 11.0pt;">Private</span></li>
<ul style="direction: ltr; margin-bottom: 0in; margin-left: .375in; margin-top: 0in; unicode-bidi: embed;" type="circle">
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;"><span style="font-family: Calibri; font-size: 11.0pt;">Named Persons</span></li>
</ul>
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;"><span style="font-family: Calibri; font-size: 11.0pt;">Internal</span></li>
<ul style="direction: ltr; margin-bottom: 0in; margin-left: .375in; margin-top: 0in; unicode-bidi: embed;" type="circle">
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;"><span style="font-family: Calibri; font-size: 11.0pt;">Business Data</span></li>
</ul>
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;"><span style="font-family: Calibri; font-size: 11.0pt;">Public</span></li>
<ul style="direction: ltr; margin-bottom: 0in; margin-left: .375in; margin-top: 0in; unicode-bidi: embed;" type="circle">
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;"><span style="font-family: Calibri; font-size: 11.0pt;">Everything Else</span></li>
</ul>
</ul>
<div style="font-family: Calibri; font-size: 11.0pt; margin: 0in;">
<br /></div>
<div style="font-family: Calibri; font-size: 11.0pt; margin: 0in;">
The data
classification levels are combined with Information Security levels for systems
to identify where the data is able to be transmitted.</div>
<div style="font-family: Calibri; font-size: 11.0pt; margin: 0in;">
<br /></div>
<div style="color: #2e75b5; font-family: Calibri; font-size: 14.0pt; margin: 0in;">
Anonymizing
Data</div>
<div style="color: #5b9bd5; font-family: Calibri; font-size: 12.0pt; margin: 0in;">
Safe
Harbor</div>
<div style="font-family: Calibri; font-size: 11.0pt; margin: 0in;">
Remove 18
identifying attributes from the data, which renders it fairly useless.</div>
<div style="font-family: Calibri; font-size: 11.0pt; margin: 0in;">
<br /></div>
<div style="color: #5b9bd5; font-family: Calibri; font-size: 12.0pt; margin: 0in;">
Expert
Determination</div>
<div style="font-family: Calibri; font-size: 11.0pt; margin: 0in;">
Expert certifies
that the data available is too low of a probability to re-identify an
individual.</div>
<div style="font-family: Calibri; font-size: 11.0pt; margin: 0in;">
<br /></div>
<div style="color: #5b9bd5; font-family: Calibri; font-size: 12.0pt; margin: 0in;">
<span style="font-style: italic;">Automation</span></div>
<div style="font-family: Calibri; font-size: 11.0pt; margin: 0in;">
There are tools out
there, such as Parit(?) that are capable of doing this automatically after a
survey and analysis of the data.</div>
<div style="font-family: Calibri; font-size: 11.0pt; margin: 0in;">
<br /></div>
<div style="color: #1e4e79; font-family: Calibri; font-size: 16.0pt; margin: 0in;">
Standards
Examples</div>
<div style="color: #5b9bd5; font-family: Calibri; font-size: 12.0pt; margin: 0in;">
<br /></div>
<div style="font-family: Calibri; font-size: 11.0pt; margin: 0in;">
USPS Publication 28
specifies international address requirements</div>
<div style="font-family: Calibri; font-size: 11.0pt; margin: 0in;">
ISO15836 standards
for tagging unstructured documents</div>
<div style="font-family: Calibri; font-size: 11.0pt; margin: 0in;">
ISO/IEC 11179 -4 and
-5 has naming standards for business metadata</div>
<div style="font-family: Calibri; font-size: 11.0pt; margin: 0in;">
<br /></div>
</div>
</div>
</div>
Chad Dinermanhttp://www.blogger.com/profile/12096950784001563394noreply@blogger.com1tag:blogger.com,1999:blog-5298882499718010482.post-53216834572742450282015-03-29T22:11:00.002-04:002015-03-29T22:11:56.950-04:00EDW 2015: Data Strategy
<div style="border-width: 100%; direction: ltr;">
<div style="direction: ltr; margin-left: 0in; margin-top: 0in; width: 5.95in;">
<div style="direction: ltr; margin-left: 0in; margin-top: 0in; width: 5.95in;">
<div style="font-family: Calibri; font-size: 11.0pt; margin: 0in;">
This session was
presented by Lewis Broome (from Data Blueprint) and Brian Cassel (from the
Massey Cancer Center). </div>
<div style="font-family: Calibri; font-size: 11.0pt; margin: 0in;">
<br /></div>
<div style="font-family: Calibri; font-size: 11.0pt; margin: 0in;">
Lewis presented a
his strategic model and roadmap using the case study of a logistics company
implementing an ambitious project.</div>
<div style="font-family: Calibri; font-size: 11.0pt; margin: 0in;">
<br /></div>
<div style="font-family: Calibri; font-size: 11.0pt; margin: 0in;">
Brian spoke about
the challenges he faced within the cancer center as he implemented analytics
across data hidden in silos. This was primarily culture based but once he was
past that he was able to use the existing Data Analytics hub to build a
specialized data mart to support strategic review of the data.</div>
<div style="font-family: Calibri; font-size: 11.0pt; margin: 0in;">
<br /></div>
<div style="font-family: Calibri; font-size: 11.0pt; margin: 0in;">
This was really great stuff and my summary doesn't do it justice by a long shot.</div>
<div style="font-family: Calibri; font-size: 11.0pt; margin: 0in;">
</div>
<div style="color: #1e4e79; font-family: Calibri; font-size: 16.0pt; margin: 0in;">
Data
Strategy</div>
<div style="color: #2e75b5; font-family: Calibri; font-size: 14.0pt; margin: 0in;">
Business
Needs</div>
<div style="font-family: Calibri; font-size: 11.0pt; margin: 0in;">
In order to get
anywhere with discussions about data and mays to improve it throughout the
organization, the value of the effort has to be made clear. Clean data may seem
like the most obvious need in the world, but that view is too low level to make
it on to the radar of senior management. Instead, it needs to clearly address a
business need.</div>
<div style="font-family: Calibri; font-size: 11.0pt; margin: 0in;">
<br /></div>
<div style="font-family: Calibri; font-size: 11.0pt; margin: 0in;">
There are three
aspects to consider</div>
<ul style="direction: ltr; margin-bottom: 0in; margin-left: .375in; margin-top: 0in; unicode-bidi: embed;" type="disc">
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;"><span style="font-family: Calibri; font-size: 11.0pt;">How will mesh with the
company Mission and Brand Promises?</span></li>
<ul style="direction: ltr; margin-bottom: 0in; margin-left: .375in; margin-top: 0in; unicode-bidi: embed;" type="circle">
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;"><span style="font-family: Calibri; font-size: 11.0pt;">Ex. FedEx: Your package will
get there overnight. Guaranteed.</span></li>
</ul>
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;"><span style="font-family: Calibri; font-size: 11.0pt;">Does it improve the company's
market position / provide a competitive advantage?</span></li>
<ul style="direction: ltr; margin-bottom: 0in; margin-left: .375in; margin-top: 0in; unicode-bidi: embed;" type="circle">
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;"><span style="font-family: Calibri; font-size: 11.0pt;">Michael Porter's Market
Positioning Framework and his Competitive Advantage Framework provide a
good way to think about this.</span></li>
</ul>
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;"><span style="font-family: Calibri; font-size: 11.0pt;">Will it improve the operating
model and support the company's objectives?</span></li>
<ul style="direction: ltr; margin-bottom: 0in; margin-left: .375in; margin-top: 0in; unicode-bidi: embed;" type="circle">
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;"><span style="font-family: Calibri; font-size: 11.0pt;">Operating models improve by
changing the degree of business integration or standardization.</span></li>
</ul>
</ul>
<div style="font-family: Calibri; font-size: 11.0pt; margin: 0in;">
<br /></div>
<div style="font-family: Calibri; font-size: 11.0pt; margin: 0in;">
If the data changes
do not address any of these areas, it will not gain the support needed to
succeed.</div>
<div style="font-family: Calibri; font-size: 11.0pt; margin: 0in;">
<br /></div>
<div style="font-family: Calibri; font-size: 11.0pt; margin: 0in;">
<span style="font-weight: bold;">New capabilities that do not meet a business need
aren't a program, they are a </span><span style="background: yellow; font-weight: bold; mso-highlight: yellow;">science project</span><span style="font-weight: bold;">.</span></div>
<div style="font-family: Calibri; font-size: 11.0pt; margin: 0in;">
<br /></div>
<div style="color: #2e75b5; font-family: Calibri; font-size: 14.0pt; margin: 0in;">
Current
State of the Business</div>
<div style="font-family: Calibri; font-size: 11.0pt; margin: 0in;">
The current state
assessment looks at</div>
<ul style="direction: ltr; margin-bottom: 0in; margin-left: .375in; margin-top: 0in; unicode-bidi: embed;" type="disc">
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;"><span style="font-family: Calibri; font-size: 11.0pt;">Existing Assets and
Capabilities</span></li>
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;"><span style="font-family: Calibri; font-size: 11.0pt;">Gaps in Assets and
Capabilities</span></li>
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;"><span style="font-family: Calibri; font-size: 11.0pt; font-style: italic;">Constraints
and Interdependencies</span></li>
<ul style="direction: ltr; margin-bottom: 0in; margin-left: .375in; margin-top: 0in; unicode-bidi: embed;" type="circle">
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;"><span style="font-family: Calibri; font-size: 11.0pt;">This can be the toughest
stuff to identify.</span></li>
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;"><span style="font-family: Calibri; font-size: 11.0pt; font-weight: bold;">BEWARE
SHADOW SYSTEMS</span><span style="font-family: Calibri; font-size: 11.0pt;">
typically excel spreadsheets with macros or access data fixing before
feeding it into the next step of a process.</span></li>
</ul>
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;"><span style="font-family: Calibri; font-size: 11.0pt;">Cultural Readiness</span></li>
</ul>
<div style="font-family: Calibri; font-size: 11.0pt; margin: 0in;">
<br /></div>
<div style="color: #5b9bd5; font-family: Calibri; font-size: 12.0pt; margin: 0in;">
Cultural
Readiness</div>
<div style="font-family: Calibri; font-size: 11.0pt; margin: 0in;">
Cultural Readiness
depends on 5 different areas</div>
<ul style="direction: ltr; margin-bottom: 0in; margin-left: .375in; margin-top: 0in; unicode-bidi: embed;" type="disc">
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;"><span style="font-family: Calibri; font-size: 11.0pt;">Vision</span></li>
<ul style="direction: ltr; margin-bottom: 0in; margin-left: .375in; margin-top: 0in; unicode-bidi: embed;" type="circle">
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;"><span style="font-family: Calibri; font-size: 11.0pt;">A clear message of what the
program is expected to achieve</span></li>
</ul>
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;"><span style="font-family: Calibri; font-size: 11.0pt;">Skills</span></li>
<ul style="direction: ltr; margin-bottom: 0in; margin-left: .375in; margin-top: 0in; unicode-bidi: embed;" type="circle">
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;"><span style="font-family: Calibri; font-size: 11.0pt;">Ensure that the right people
are part of the program</span></li>
</ul>
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;"><span style="font-family: Calibri; font-size: 11.0pt;">Incentive</span></li>
<ul style="direction: ltr; margin-bottom: 0in; margin-left: .375in; margin-top: 0in; unicode-bidi: embed;" type="circle">
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;"><span style="font-family: Calibri; font-size: 11.0pt;">The value and importance of
the program should be clear to all of the participants</span></li>
</ul>
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;"><span style="font-family: Calibri; font-size: 11.0pt;">Resources</span></li>
<ul style="direction: ltr; margin-bottom: 0in; margin-left: .375in; margin-top: 0in; unicode-bidi: embed;" type="circle">
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;"><span style="font-family: Calibri; font-size: 11.0pt;">Backing the program will
require more than just good will, tools, environments and training may
all be required</span></li>
</ul>
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;"><span style="font-family: Calibri; font-size: 11.0pt;">Action Plan</span></li>
<ul style="direction: ltr; margin-bottom: 0in; margin-left: .375in; margin-top: 0in; unicode-bidi: embed;" type="circle">
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;"><span style="font-family: Calibri; font-size: 11.0pt;">The system boundaries being
developed should be clearly defined</span></li>
</ul>
</ul>
<div style="font-family: Calibri; font-size: 11.0pt; margin: 0in;">
<br /></div>
<div style="color: #5b9bd5; font-family: Calibri; font-size: 12.0pt; margin: 0in;">
Capability
Maturity Model Levels</div>
<ol style="direction: ltr; font-family: Calibri; font-size: 11.0pt; font-style: normal; font-weight: normal; margin-bottom: 0in; margin-left: .375in; margin-top: 0in; unicode-bidi: embed;" type="1">
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;" value="1"><span style="font-family: Calibri; font-family: Calibri; font-size: 11.0pt; font-size: 11.0pt; font-style: normal; font-weight: normal;">Starting point</span></li>
<ol style="direction: ltr; font-family: Calibri; font-size: 11.0pt; font-style: normal; font-weight: normal; margin-bottom: 0in; margin-left: .375in; margin-top: 0in; unicode-bidi: embed;" type="a">
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;" value="1"><span style="font-family: Calibri; font-family: Calibri; font-size: 11.0pt; font-size: 11.0pt; font-style: normal; font-weight: normal;">There is some
data in a pile over there.</span></li>
</ol>
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;"><span style="font-family: Calibri; font-size: 11.0pt;">Repeatable Process</span></li>
<ol style="direction: ltr; font-family: Calibri; font-size: 11.0pt; font-style: normal; font-weight: normal; margin-bottom: 0in; margin-left: .375in; margin-top: 0in; unicode-bidi: embed;" type="a">
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;" value="1"><span style="font-family: Calibri; font-family: Calibri; font-size: 11.0pt; font-size: 11.0pt; font-style: normal; font-weight: normal;">This is how we
sweep the data into a pile and remove the bits of junk we find.</span></li>
</ol>
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;"><span style="font-family: Calibri; font-size: 11.0pt;">Defined Process</span></li>
<ol style="direction: ltr; font-family: Calibri; font-size: 11.0pt; font-style: normal; font-weight: normal; margin-bottom: 0in; margin-left: .375in; margin-top: 0in; unicode-bidi: embed;" type="a">
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;" value="1"><span style="font-family: Calibri; font-family: Calibri; font-size: 11.0pt; font-size: 11.0pt; font-style: normal; font-weight: normal;">Sweep from left
to right, avoid the dead bugs. Leave data in a pile.</span></li>
</ol>
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;"><span style="font-family: Calibri; font-size: 11.0pt;">Managed Process</span></li>
<ol style="direction: ltr; font-family: Calibri; font-size: 11.0pt; font-style: normal; font-weight: normal; margin-bottom: 0in; margin-left: .375in; margin-top: 0in; unicode-bidi: embed;" type="a">
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;" value="1"><span style="font-family: Calibri; font-family: Calibri; font-size: 11.0pt; font-size: 11.0pt; font-style: normal; font-weight: normal;">The entire team
has the same brooms and the dead bugs are highlighted and automatically
avoided by the brooms.</span></li>
</ol>
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;"><span style="font-family: Calibri; font-size: 11.0pt;">Optimizing</span></li>
<ol style="direction: ltr; font-family: Calibri; font-size: 11.0pt; font-style: normal; font-weight: normal; margin-bottom: 0in; margin-left: .375in; margin-top: 0in; unicode-bidi: embed;" type="a">
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;" value="1"><span style="font-family: Calibri; font-family: Calibri; font-size: 11.0pt; font-size: 11.0pt; font-style: normal; font-weight: normal;">Maybe we can add
rules to avoid sweeping twigs into the pile as well.</span></li>
</ol>
</ol>
<div style="font-family: Calibri; font-size: 11.0pt; margin-left: .75in; margin: 0in;">
<br /></div>
<div style="color: #1e4e79; font-family: Calibri; font-size: 16.0pt; margin: 0in;">
Roadmap</div>
<div style="font-family: Calibri; font-size: 11.0pt; margin: 0in;">
The Roadmap
establishes the path of the Data Management Program to achieve the strategic
goals.</div>
<div style="font-family: Calibri; font-size: 11.0pt; margin: 0in;">
<br /></div>
<div style="direction: ltr;">
<table border="1" cellpadding="0" cellspacing="0" style="border-collapse: collapse; border-color: #A3A3A3; border-style: solid; border-width: 1pt; direction: ltr;" valign="top">
<tbody>
<tr>
<td style="border-color: #A3A3A3; border-style: solid; border-width: 1pt; padding: 2.0pt 3.0pt 2.0pt 3.0pt; vertical-align: top; width: 1.7375in;">
<div style="font-family: Calibri; font-size: 11.0pt; margin: 0in;">
Leadership and
Planning</div>
</td>
<td style="border-color: #A3A3A3; border-style: solid; border-width: 1pt; padding: 2.0pt 3.0pt 2.0pt 3.0pt; vertical-align: top; width: 3.0701in;">
<ul style="direction: ltr; margin-bottom: 0in; margin-left: .2055in; margin-top: 0in; unicode-bidi: embed;"><ul style="direction: ltr; margin-bottom: 0in; margin-left: 0in; margin-top: 0in; unicode-bidi: embed;" type="disc">
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;"><span style="font-family: Calibri; font-size: 11.0pt;">Planning and Business
Strategy Alignment</span></li>
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;"><span style="font-family: Calibri; font-size: 11.0pt;">Program Management</span></li>
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;"><span style="font-family: Calibri; font-size: 11.0pt;">Clearly Defined
Imperatives, Tactics and KPI</span></li>
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;"><span style="font-family: Calibri; font-size: 11.0pt;">Accountable to CDO</span></li>
</ul>
</ul>
</td>
</tr>
<tr>
<td style="border-color: #A3A3A3; border-style: solid; border-width: 1pt; padding: 2.0pt 3.0pt 2.0pt 3.0pt; vertical-align: top; width: 1.7375in;">
<div style="font-family: Calibri; font-size: 11.0pt; margin: 0in;">
Project
Development</div>
</td>
<td style="border-color: #A3A3A3; border-style: solid; border-width: 1pt; padding: 2.0pt 3.0pt 2.0pt 3.0pt; vertical-align: top; width: 3.0701in;">
<ul style="direction: ltr; margin-bottom: 0in; margin-left: .2055in; margin-top: 0in; unicode-bidi: embed;"><ul style="direction: ltr; margin-bottom: 0in; margin-left: 0in; margin-top: 0in; unicode-bidi: embed;" type="disc">
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;"><span style="font-family: Calibri; font-size: 11.0pt;">Outcome Based Targets</span></li>
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;"><span style="font-family: Calibri; font-size: 11.0pt;">Business Case and Project
Scope</span></li>
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;"><span style="font-family: Calibri; font-size: 11.0pt;">Program Execution</span></li>
</ul>
</ul>
</td>
</tr>
</tbody></table>
</div>
<div style="font-family: Calibri; font-size: 11.0pt; margin: 0in;">
<br /></div>
<div style="color: #2e75b5; font-family: Calibri; font-size: 14.0pt; margin: 0in;">
Project
Model</div>
<div style="font-family: Calibri; font-size: 11.0pt; margin: 0in;">
Big Projects tend to
fail, at least twice sometimes more than that as the business learns what it
really needs.</div>
<div style="font-family: Calibri; font-size: 11.0pt; margin: 0in;">
<br /></div>
<div style="font-family: Calibri; font-size: 11.0pt; margin: 0in;">
Always start with
crawling and walking before going to running.</div>
<ul style="direction: ltr; margin-bottom: 0in; margin-left: .375in; margin-top: 0in; unicode-bidi: embed;" type="disc">
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;"><span style="font-family: Calibri; font-size: 11.0pt;">Governance should start with
a small 'g', where it matters most. There are commonly 5-10 critical data
elements, take care of those before setting targets higher.</span></li>
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;"><span style="font-family: Calibri; font-size: 11.0pt;">Data Strategy as top-down
approach works best. Otherwise it is uncoordinated and is only capable of
supporting tactical initiatives.</span></li>
<li style="margin-bottom: 0; margin-top: 0; vertical-align: middle;"><span style="font-family: Calibri; font-size: 11.0pt;">Data Architecture must focus
on the business needs, not individual systems or applications.</span></li>
</ul>
</div>
</div>
</div>
Chad Dinermanhttp://www.blogger.com/profile/12096950784001563394noreply@blogger.com3tag:blogger.com,1999:blog-5298882499718010482.post-52168300514657805382015-03-28T19:54:00.002-04:002015-03-28T19:54:46.070-04:00Questions and where to seek answersThere always seems to be an unending stream of questions.<br />
<br />
In the past, a good number of them were about things like "How can we use this shiny new feature?" "What are the best practices for these scenarios?".<br />
<br />
However, times change and so do the questions.<br />
<br />
Microsoft has been following the lead of the SQL Server community and steadily become more open with road maps and dialogs. We understand the platform and internals better than ever before, making it easier to address concerns.<br />
<br />
<br />
Now the questions that need to be addressed are more strategic than tactical. Strategies, by their very nature are more challenging to find peers to discuss success and (more importantly) failures in plans and their implementations.<br />
<br />
I'm attending Enterprise Data World this year to have a chance to discuss ideas, strategies and technologies to bring new insights and make things run better than ever.<br />
<br />
This will be fun!Chad Dinermanhttp://www.blogger.com/profile/12096950784001563394noreply@blogger.com1tag:blogger.com,1999:blog-5298882499718010482.post-31466764786128595702012-11-10T07:37:00.002-05:002012-11-10T07:37:44.105-05:00PASS Summit Day 3<br />
The last day of PASS is pretty exhausting. My head was still spinning with ideas from the previous days and the tracks were a bit less balanced. To let the ideas settle in I investigated some different areas.<br />
<br />
<h3>
Hadoop</h3>
Microsoft is really getting into Hadoop in a big way. I am glad that they have partnered with Hortonworks (the orginal Yahoo Hadoop team) for the Windows implementation. Microsoft is making a serious attempt to be a good opensource citizen and is actively contributing back to the Apache project. Their Hadoop product now has a name, HDInsight and some interesting characteristics, such as:<br />
<br />
<br />
<ul>
<li>Available via NuGet</li>
<li>Web Management console</li>
<li>Map-Reduce can be written in C# and JavaScript</li>
<li>ODBC Driver to connect to Hive</li>
<ul>
<li>Normal .Net</li>
<li>LINQ </li>
<li>Excel (with a friendly plug-in!)</li>
</ul>
</ul>
<br />
<br />
There are two versions announced<br />
<br />
<ul>
<li>Azure Edition</li>
<ul>
<li>This version actually substitutes the Azure BlobStore for HDFS. Presumably this was done because they share many characteristics but the BlobStore is already tuned for the MS Datacenters and has presumably avoided the NameNode issue somehow.</li>
<li>A popular model is to gather data (such as web or application logs) into the blobstore and on a regular schedule spin up a large HDInsight cluster, run the analysis and then spin it back down. This technique is used by Halo 4.</li>
</ul>
<li>On-Premises Edition</li>
<ul>
<li>This sounds as though it is targeted at preconfigured Hyper-V Virtual Machines (I may be way off here).</li>
<li>The filesystem is HDFS, not NTFS or version of the BlobStore</li>
<li>Enterprise infratructure support coming</li>
<ul>
<li>Microsoft Virtual Machine Manager</li>
<li>System Center HDInsight Management Pack</li>
<li>Active Directory for unified authentication</li>
</ul>
</ul>
</ul>
<br />
<br />
Additional features (maybe)<br />
<br />
<ul>
<li>Scheduled</li>
<ul>
<li>Oozi Worksflow Scheduler</li>
<li>Mahout</li>
</ul>
<li>UnScheduled (but popular requests)</li>
<ul>
<li>SSIS Integration</li>
<li>SQL Server Management Studio</li>
</ul>
</ul>
<br />
<br />
<h3>
Azure</h3>
The Azure data integration story continues to improve. The model to move data is either using SSIS or Data Sync. However, the SSIS development best practices are a little bit different. To transfer data efficiently there are a combination of things to do:<br />
<br />
<ol>
<li>Compress your data being sent over the network</li>
<li>Parallelism is good but adding thread will reduce the transfer time</li>
<li>Robustness must be maintained by the package. A "blip" in th connection cannot force the entire operation to restart. This is long and painful process and should be able to pickup where the blip happened. This is achieved by using a loop (with sleep) to keep retrying and sending small sets into staging tables and then pushing loaded sets from staging into final tables.</li>
</ol>
<br />
<br />
<h3>
Encryption</h3>
Encryption is becoming more of an issue as PII regulations become more prominent. Some things to keep in mind when planning security<br />
<br />
<ul>
<li>Always salt the hash and encryption. </li>
<li>Be aware of CPU scaling concerns</li>
<li>Encryption makes everything bigger</li>
<li>Only 4000 characters at a time work with hashing and encryption functions built into SQL Server.</li>
<li>Encrypted and hashed values will not compress well</li>
</ul>
<br />
<br />
What types on encryption should be considered?<br />
<br />
<ul>
<li>Field Level</li>
<li>Data at rest (on disk or on backup)</li>
<li>Connection Level</li>
<li>SAN level</li>
</ul>
<br />
<br />
<h4>
Field Level</h4>
At the field level this can get complex because it is either managed in the DB or the Application. Each has tradeoffs<br />
<br />
<ul>
<li>The DB is hard to scale out by adding nodes if it is responsible for all encryption. </li>
<li>The DB could manage to invisibly decrypt columns based on account permissions</li>
<li>The application would have to wrap all sensitive data with encrypt / decrypt logic if it was responsible for doing it.</li>
<li>The application can scale out to additional servers if needed to distribute the CPU load.</li>
</ul>
<br />
<br />
<h4>
Data at Rest</h4>
When the data is at rest the database can encrypt it with minimal overhead using Transparent Data Encryption (TDE). This does mean that data is not encrypted in memory so an attacker could view data sitting through an exploit. TDE does do some changes silently. Instant File Initialization becomes disabled as soon as TDE is initiated. The shared database TEMPDB also must become encrypted.<br />
<br />
Backup encryption is critical because there is no telling what will happen when tapes leave the security of a private datacenter and goes touring through the outside world.<br />
<br />
<h4>
Connection Level</h4>
There are two ways to make sure you connection is encrypted. A simple one to setup is having Active Directory enact a policy requiring IPSec for all connections to the DB Server. The other one is to use certificates to establish the classic SSL connection encryption. This can be controlled at the connection string level so you can configure it for only a subset of the traffic and reduce the CPU load.<br />
<br />
<h4>
SAN </h4>
SAN Level Encryption (MPIO) is fascinating because it will stop the LUN from being restored to a machine without the proper certificates. This tends to require a lot of CPU resources but it is possible to offload that encryption work to special (such as the Emulex HPA) that can be integrated into the RSA Key Manager.<br />
<br />
<h3>
DQS</h3>
I did discover something a bit disappointing about DQS today. There are curerntly no matching services available. That means that there are no internal web services for identifying matches nor are they any SSIS operations available. Matching work can ONLY be done through the client GUI. Without automation this will have a real challenge working with some types of data..<br />
<br />
Chad Dinermanhttp://www.blogger.com/profile/12096950784001563394noreply@blogger.com4tag:blogger.com,1999:blog-5298882499718010482.post-29267398752772399692012-11-09T00:40:00.000-05:002012-11-09T00:40:05.119-05:00PASS Summit Day 2Another great day filled with learning new techniques and discussing different challenges. There was more of a focus on Query optimization than anything else today. After the sessions We all went to the Experience Music Project where there was food, live band karaoke and a Dalek!<div>
<br /></div>
<div>
Great fun all around, I just need to get back to a normal sleeping schedule, this 4 hours a night thing is exhausting.</div>
<div>
<br /></div>
<div>
Technical notes below!</div>
<div>
<br /></div>
<div>
<div>
The session that Adam Machanic did about forcing better parallel execution plans was astounding. He took a 28 minute query and optimized it to about 3 seconds and by improving the parallelization. </div>
<div>
<br /></div>
<h3>
The Optimizer Issue</h3>
<div>
The premise is that optimizer is outdated because the Memory to Data Volume ratio (size or speed) is nothing like the way it was 1997 (when the last big optimizer rewrite happened). This means the optimizer needs to be coaxed into doing the right thing. </div>
<div>
<br /></div>
<div>
The optimizer tends to serialize a lot of processing for a number of reasons that may not make sense now. The problem with serializing all of these processes is that Amdahl's Law (not theory, LAW) says that the number of serial processes substantially impacts performance as the number of processors increases. The goal is to make as many parallel as possible. The reason the optimizer does not default to this behaviour is because it is expecting greater disk latency than we have now that would have rendered this point moot.</div>
<div>
<br /></div>
<h3>
Possible Approach</h3>
<div>
An example of a common item that inhibit can parallelism is the Scalar Function. Potential solutions to this are: </div>
<div>
<ul>
<li>Convert it to an in-line TVF and CROSS APPLY it. </li>
<li>Create a CLR scalar function (weird but true, it works much faster)</li>
</ul>
</div>
<div>
<br /></div>
<div>
To check how balanced the parallelism is you can open the properties of any step of the query plan and check view the actual number of rows returned for each thread (there is a little arrow thing hiding there).</div>
<div>
<br /></div>
<div>
There is a traceflag that will force the optimizer to try to make some part of query parallel (Trace Flag 8649) but this is not a good answer. The issue is that it doesn't ensure that you are actually removing the serial parts to balance the load.</div>
<div>
<br /></div>
<div>
One way to encourage the optimizer to seek the parallelism path without the Trace Flag is to use a TOP(@i) with an OPTION(OPTIMIZE FOR (@i = 2000000000)). This will make the optimizer expect 2 billion rows even if there are only 10,000 in the table. This will top the estimated cost calculate to the side of making the query parallel.</div>
<div>
<br /></div>
<div>
MAXDOP should be configured to less than the number of processing cores for each numa node. This avoids making for foreign memory requests which may be slow. It is ok to go higher, but this is a very sensible starting spot, be sure to test carefully.</div>
<div>
</div>
<h3>
Parallel Apply Pattern</h3>
<div>
The Parallel Apply Pattern can be used to enourage working with an managing efficient streams with the optimizer. The abstract form of this query is:</div>
<div>
<br /></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">SELECT</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> myfields</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">FROM</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> [DRIVER TABLE]</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> CROSS APPLY</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> (</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> [CORRELATED PAYLOAD]</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> )</span></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
To execute this pattern the DRIVER of your query must be identified, for example is it really just calculating information about product stock levels? Then the driver table would be Product and the CORRELATED PAYLOAD would be the sophisticated joins, aggregations and calculations. </div>
<div>
<br /></div>
<div>
Roughly speaking, for each record in the DRIVER TABLE the CORRELATED PAYLOAD executes all of it's heavy set based work as a smaller piece rather than against the full DRIVER TABLE set. This is more efficient for parallel loads.</div>
<div>
<br /></div>
<div>
DRIVER TABLE issues</div>
<div>
<br /></div>
<div>
If the driver table is not going parallel there are a couple of choices after turning it into a derived table</div>
<div>
1. Use the TOP techique mentioned above. The optimizer will think there are more rows.</div>
<div>
2. Use a DISTINCT 0%ColA + ColA calculation in the select list to prevent a calculation from being extracted from the parallel stream.</div>
<div>
<br /></div>
<div>
CORRELATED PAYLOAD issues</div>
<div>
<br /></div>
<div>
If the CORRELATED PAYLOAD is still having it's calculations pulled away from the parallel stream that change the CROSS APPLY to an OUTER APPLY. This prevents the optimizer from knowing if it is safe to pull the calculations up (as a poor optimization choice).</div>
</div>
<div>
<br /></div>
Chad Dinermanhttp://www.blogger.com/profile/12096950784001563394noreply@blogger.com3tag:blogger.com,1999:blog-5298882499718010482.post-45405021969225983802012-11-08T09:22:00.004-05:002012-11-09T00:43:10.244-05:00PASS Summit Day 1<br />
Summit officially began today. I had a great time meeting new people, talking to vendors and learning stuff.<br />
<br />
The big things today were they Keynote, Extended Events, Enterprise Information Management and Performance Tuning.<br />
<br />
MS is clearly taking the threat from SAP's HANA seriously with making a new in-memory storage engine fr SQL Server. It will be in the next major release and is expected to speed up existing applications substantially.<br />
<br />
Extended Events are interesting and really are complex enough that there would be a real benefit from having an actual book assembled about them. It is replacing a very simple tool (Profiler) with a very complex and powerful one. I learned just enough to know that I need to send a lot of quality time to understand it as much as I'd like.<br />
<br />
So far it looks like Microsoft is taking the approach of going after the smaller, simpler data to Master instead of trying to battle in the arena of CDI or PIM where there are large established player. This follows the approach MS has had for a while. Something like Informatica is just overkill to manage Jill's spreadsheet that everyone shares, but MDS is perfect for managing and distributing the information throughout the company.<br />
<br />
Some of my notes from today are below:<br />
<br />
<h3>
Keynote</h3>
MS Announced Hekaton : An in-memory engine for SQL Server<br />
<br />
<ul>
<li>There will be an adviser on when to use it</li>
<li>Live as part of the DBMS server as a different storage engine. Heterogeneous</li>
<li>Lock and latch free</li>
<li>The storage engine for tables can be changed to move them in and out of memory. If the table no longer fits it switches to the conventional model. Because this can be done seamlessly the data is likely persisted to disk in the conventional DBMS way.</li>
<ul>
<li>Exadata uses flash as a sort of cache fusion</li>
<li>HANA is designed as Memory primary and disk secondary so has huge performance drops when memory limits are exceeded.</li>
</ul>
<li>Stored procedure optimization compiled to native code</li>
<li>Included in the next version of SQL Server</li>
</ul>
<br />
<br />
Parallel Data Warehouse<br />
* Polybase allows seems connection to Hadoop / Hive and SQL Server DB to allow a single point of federated querying<br />
<br />
<h3>
Extended Events</h3>
SQL Trace and Profiler are deprecated. With SQL 2012 every traceable is now available in Extended Events<br />
<br />
Extended events are much lighter than Profiler (similar to trace) of the previous options with support for multiple targets. http://www.sqlperformance.com/2012/10/sql-trace/observer-overhead-trace-extended-events<br />
The available targets are:<br />
<br />
<ul>
<li>Ringbuffer</li>
<li>Event Bucketing (Bucketizer) is a specialized ringbuffer, storing aggregates</li>
<li>Event Pairing allows the tracking of things like everything happening between lock_acquired and lock_released</li>
<li>Event file (xml)</li>
<li>Synchronous Event Counter only tracks number of times and event fired, no details</li>
</ul>
<br />
<br />
The SQL Server audit events have been merged into the normal extended events.<br />
<br />
System Overhead (as tested by SQLCAT)<br />
<br />
<ul>
<li>Query Time increases by 2 microseconds. As more actions are tracked this does increase.</li>
<li>Memory increases by 4MB for the RingBuffer</li>
<li>Disk storage is more verbose, always try to avoid logging to the Data / Log volumes</li>
<li>If the event tracking causes too much of a slowdown, it will be automatically disconnected until it has less impact on performance.</li>
</ul>
<br />
<br />
In general the learning curve is steeper but you can do more with Extended Events than with Profiler.<br />
<br />
The GUI is not as nice and profiler and there is a larger learning curve. Also the SQL 2012 GUI does not work with SQL 2008.<br />
<br />
Event information is recorded in an XML structure that is fairly consistent. Use XQuery to extract the results in a tabular form.<br />
<br />
One mental model for understanding the different parts of the setup is to think of it like an insert statement<br />
<br />
<span style="font-family: Courier New, Courier, monospace;">INSERT INTO [TARGET]<target></target></span><br />
<span style="font-family: Courier New, Courier, monospace;">SELECT [EVENT DATA]<event data="data"></event></span><br />
<span style="font-family: Courier New, Courier, monospace;">FROM [EVENT]<event></event></span><br />
<span style="font-family: Courier New, Courier, monospace;">LEFT OUTER JOIN [ACTION]<action></action></span><br />
<span style="font-family: Courier New, Courier, monospace;">WHERE [PREDICATE]<predicate></predicate></span><br />
<br />
The predicate will short circuit, this is means that Criteria should be placed with the most restrictive rules first!<br />
<br />
More good stuff on Jonathan Kehayias's <a href="http://sqlskills.com/blogs/jonathan/category/Extended-Events.aspx" target="_blank">blog </a><br />
<br />
Because traces are very light they can run in the background. For example, the old SQL Trace Blackbox has returned as an Extended Event session called "system_health" and is visible under: Server>Management>Extended Events>Sessions and is using two targets, an event file and the ring_buffer.<br />
<br />
<h3>
Performance Tuning</h3>
5 Mains areas to consider when there is a problem<br />
<br />
<ul>
<li>IO</li>
<li>CPU</li>
<li>Memory</li>
<li>Network</li>
<li>Locks and Blocks</li>
</ul>
<br />
<br />
Activity Monitor is MUCH better than it was in SQL 2005. Most of the common troubleshooting DMVs are presented there.<br />
<br />
The only system process with a SPID > 50 is... SERVICE BROKER! That is a feature that really gets no respect and it turns out the other system process don't want to be associated with it either<br />
<br />
When a WAIT x:y:z is displayed, the variables represent<br />
<br />
<ul>
<li>x : DBID</li>
<li>y : File ID</li>
<li>z : Page</li>
<ul>
<li>page 1 contains the page free space bitmap, contnention here is typically for heaps (most temp tables and table variables are heaps). Scalar and non-inlined TVFs can abuse this by creating and destroying objects swiftly.</li>
</ul>
</ul>
<br />
<div>
<br /></div>
Chad Dinermanhttp://www.blogger.com/profile/12096950784001563394noreply@blogger.com3tag:blogger.com,1999:blog-5298882499718010482.post-64387886361405064792012-11-07T09:00:00.001-05:002012-11-07T09:00:39.107-05:00Pre-Con Day 2 T-SQL Querying and Query Tuning Enhancements in SQL ServerThis was the second (and last) day of the pre-conference events. I selected the T-SQL session for two reasons, I know that there were a lot of enhancements to core and it was being given by Itzik Ben-Gan, who is a SQL Master (he literally wrote the book on T-SQL query techniques).<br />
<br />
I was a bit concerned because he enumerated the items he wanted to cover and I've used nearly all of them so I didn't expect to learn much. Much like yesterday, I was wrong.<br />
<br />
There were two areas that I struggled to keep up with him during the session, those were the Aggregate Windowing (which I never used) and all the crazy ways you can use the APPLY operator. The rest of the session was good but I kept coming back to (and testing out) those two areas.<br />
<br />
I didn't get much of a chance for socializing because I was tinkering with the new stuff I learned whenever I had a few free minutes and I did something bad to my ankle so I just hung around my room for the evening.<br />
<br />
Below are my notes from this session:<br />
<br />
<br />
<h2>
T-SQL Querying and Query Tuning Enhancements in SQL Server</h2>
<br />
<h3>
Extended Events Change</h3>
This was the last thing we discussed but it is SO important that I need to talk about it first. Extended Events have anew Global Field called Query Hash. This is a hash of the auto-paramterized query, capturing this allows the detection of all sorts of performance problems. All of the tiny, fast queries that execute ALL the time can be evaluated to see if your system is suffering the "death by a million cuts". This is HUGE.<br />
<br />
<h3>
CTE</h3>
CTEs join the rank of derived tables, views and table valued functions as a way to make the query syntax cleaner. By themselves, all of these options generate the exact same query plan and do nothing improve or degrade query performance. The biggest immediate benefit to non-recursive CTEs is the improved legibility they gain by allowing query reuse.<br />
<br />
CTEs are evaluated Top to Bottom. This allows the first query to be referenced by the second query, Both the first and second queries could be referneced by the third query, etc.<br />
<br />
To improve performance consider using something to actually materialize the data and reference that single result set. Examples of this would be actual tables, temp tables and table variables.<br />
<br />
Recursive CTEs do not follow the rules mentioned above, instead they need to keep a running tally of the data. It does not persist it to a normal table but instead creates a B-Tree structure for the data on disk that has been optimized for the Recursive CTE's usage patterns.<br />
<br />
<h3>
Windowing Functions</h3>
Craziness. Absolute, amazing crazy that has been hiding from me since SQL 2005 that can be applied to any aggregate (but works best with cumulative aggregates).<br />
<br />
This is work applied to the result set.<br />
<br />
The Windowing Functions are made up of a bunch of different parts<br />
<br />
<ul>
<li>OVER : Establishes the window of data over the result sets</li>
<li>PARTITION BY : Defines the size of the sliding window by referencing what columns cause the aggregate to reset. (Note - FILTER operator is not supported, which would be an explicit where to filter the partition)</li>
<li>ORDER BY : Sorting sequence to be used when calculating aggregates. This important for things like running totals.</li>
<li>Window Frame : </li>
<ul>
<li>ROWS BETWEEN ... AND ...</li>
<ul>
<li>Specify the number of result rows to consider, regardless of missing data </li>
<li>Duplicate ORDER BY values are calculated individually</li>
<li>Always Specify Row, it is not the default (Damn you ISO SQL Standards!) but it is much more efficient</li>
<li>UNBOUNDED PRECEDING : From the beginning of the Partition (Default)</li>
<li>UNBOUNDED FOLLOWING : To the end of the Partition</li>
<li>n PRECEDING : From the beginning of the Partition</li>
<li>n FOLLOWING : To the end of the Partition</li>
<li>CURRENT ROW : To the current row (think of a running total) (Default)</li>
</ul>
<li>RANGE BETWEEN</li>
<ul>
<li>Specify the value range to consider, this is not fully supported in SQL 2012 </li>
<li>Duplicate ORDER BY values are grouped together for the aggregate calculation</li>
<li>Cannot tell the difference between? </li>
<li>Specify the number of result rows to consider (may skip missing values)</li>
<li>Use with extreme caution, it is the default when there is a partition</li>
<li>UNBOUNDED PRECEDING : From the beginning of the Partition (Default)</li>
<li>CURRENT ROW : To the current row (think of a running total) (Default)</li>
</ul>
</ul>
</ul>
<br />
<br />
<h4>
Performance</h4>
This is NOT syntactic sugar. The query plan has a new operation called Window Spool which allows a "Fast Track" of results (for Linear Scaling) under two very common sets of conditions.<br />
<br />
<ol>
<li>"BETWEEN Unbounded Preceding" looks only at directly previous row instead of maintaining a full list for each record. </li>
<li>The aggregate is a Cumulative Aggregate (such as SUM or COUNT, not a MIN or MAX) it is calculated by using the "Unbounded Preceding" behind the scenes and doing appropriate.</li>
</ol>
<br />
<br />
If the Fast-Track conditions are <b>not </b>met it turns from a 2n cost to an n^2 cost.<br />
<br />
<br />
For indexing the window aggregate a POC pattern should be followed.<br />
<br />
<ul>
<li>P: Partition (Quirk: Is nearly always ASC unless the same column is in the order by as DESC)</li>
<li>O: Order By</li>
<li>C: Coverage (can be in the INCLUDE)</li>
</ul>
<br />
<br />
To check for performance issues, use Extended Events or Statistics IO (which show the worktable)<br />
<br />
<h4>
New SQL 2012 Window Aggregates</h4>
<br />
<ul>
<li>LAG : Element from previous row in the Partition</li>
<li>LEAD : Element from next row in the Partition</li>
<li>FIRST_VALUE : Element from the FIRST row in the FRAME</li>
<ul>
<li>Typically the frame and partition are the same when using FIRST_VALUE</li>
</ul>
<li>LAST_VALUE : Element from the LAST row in the FRAME</li>
<ul>
<li>Typically the frame and partition are the same when using LAST_VALUE</li>
</ul>
</ul>
<br />
<br />
<h3>
Paging</h3>
The new Offset - Fetch syntax was added as an extension of the ORDER BY clause. The implementation is not SQL Compliant yet because it is missing PERCENTAGE and WITH TIES options still<br />
<br />
<h3>
Performance</h3>
It performs similar to TOP with a discard of the first X rows (the offset). This is fine for the first few "pages" of data but does not scale deeply. To scale it, use a CTE on an indexed column and join to that, such as:<br />
<br />
<span style="font-family: Courier New, Courier, monospace;">WITH Keys AS</span><br />
<span style="font-family: Courier New, Courier, monospace;">(</span><br />
<span style="font-family: Courier New, Courier, monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>SELECT orderid</span><br />
<span style="font-family: Courier New, Courier, monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>FROM dbo.orders</span><br />
<span style="font-family: Courier New, Courier, monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>ORDER BY orderid</span><br />
<span style="font-family: Courier New, Courier, monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>OFFSET (@pagenum - 1) * @pagesize ROWS</span><br />
<span style="font-family: Courier New, Courier, monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>FETCH NEXT @pagesize ROWS ONLY</span><br />
<span style="font-family: Courier New, Courier, monospace;">)</span><br />
<span style="font-family: Courier New, Courier, monospace;">SELECT</span><br />
<span style="font-family: Courier New, Courier, monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>K.orderid</span><br />
<span style="font-family: Courier New, Courier, monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>, O.orderdate</span><br />
<span style="font-family: Courier New, Courier, monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>, O.custid</span><br />
<span style="font-family: Courier New, Courier, monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>, O.empid</span><br />
<span style="font-family: Courier New, Courier, monospace;">FROM</span><br />
<span style="font-family: Courier New, Courier, monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>Keys as K</span><br />
<span style="font-family: Courier New, Courier, monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>INNER JOIN Orders as O</span><br />
<span style="font-family: Courier New, Courier, monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>ON O.orderid = K.orderid</span><br />
<span style="font-family: Courier New, Courier, monospace;">ORDER BY K.orderid</span><br />
<br />
<h3>
PIVOT / UNPIVOT</h3>
This is not part of the SQL Standard. These are syntactical sugar for some common pivot requirements. As a pivot becomes more complex (such as multiple columns) they break down and require a more sophisticated model.<br />
<br />
An efficient UNPIVOT can be created with the CROSS APPLY using a VALUES clause (which creates a derived table behind the scenes). This method uses VALUES as a constant scan which feeds the stream aggregate avoiding any worktables.<br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;">CREATE TABLE #t (id int , [2008] int, [2009] int , [2010] int, [2011] int)</span><br />
<span style="font-family: Courier New, Courier, monospace;">;</span><br />
<span style="font-family: Courier New, Courier, monospace;">INSERT into #t</span><br />
<span style="font-family: Courier New, Courier, monospace;">values </span><br />
<span style="font-family: Courier New, Courier, monospace;">(1, 100, 101,102,104)</span><br />
<span style="font-family: Courier New, Courier, monospace;">,(2, 200, 201,202,204)</span><br />
<span style="font-family: Courier New, Courier, monospace;">,(3, 300, 301,302,304)</span><br />
<span style="font-family: Courier New, Courier, monospace;">,(4, 400, 401,402,404)</span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;">WITH </span><br />
<span style="font-family: Courier New, Courier, monospace;">u AS </span><br />
<span style="font-family: Courier New, Courier, monospace;">(</span><br />
<span style="font-family: Courier New, Courier, monospace;">SELECT t.id, sum(val) AS total</span><br />
<span style="font-family: Courier New, Courier, monospace;">FROM </span><br />
<span style="font-family: Courier New, Courier, monospace;">#t t</span><br />
<span style="font-family: Courier New, Courier, monospace;">CROSS APPLY</span><br />
<span style="font-family: Courier New, Courier, monospace;"> ( VALUES</span><br />
<span style="font-family: Courier New, Courier, monospace;"> (t.id, 2008, t.[2008])</span><br />
<span style="font-family: Courier New, Courier, monospace;"> , (t.id, 2009, t.[2009])</span><br />
<span style="font-family: Courier New, Courier, monospace;"> , (t.id, 2010, t.[2010])</span><br />
<span style="font-family: Courier New, Courier, monospace;"> , (t.id, 2011, t.[2011])</span><br />
<span style="font-family: Courier New, Courier, monospace;"> ) AS v (id, [year], val)</span><br />
<span style="font-family: Courier New, Courier, monospace;">GROUP BY </span><br />
<span style="font-family: Courier New, Courier, monospace;"> t.id</span><br />
<span style="font-family: Courier New, Courier, monospace;">)</span><br />
<span style="font-family: Courier New, Courier, monospace;">SELECT * FROM #t t inner join u on u.id = t.id</span><br />
<br />
<h3>
Sequences</h3>
The new seqeunce object uses the same internals as the identity feature but is more configurable with different defaults.<br />
<br />
<h4>
Defaults</h4>
<br />
<ul>
<li>MIN value: Most sequences are not user visible values so it starts at the lowest value possible for the data type.</li>
<li>DATATYPE : BIGINT. This is effectively an alternative to using the GUID keys so it should be support a OT of values. BEWARE! This datatype SHOULD match the datatype in the table (especially if being used as a constraint), otherwise there will be a lot of conversions that may not have been expected.</li>
<li>CYCLE : OFF is the default value but the reason it exists is for IDs used in staging data that have a relatively short lifespan to avoid running out of keys.</li>
</ul>
<br />
<br />
<h4>
Extended beyond the SQL Standard</h4>
<br />
<ul>
<li>It is able to be bound as a default constraint on a table. This will help with being able to eliminate the IDENTITY columns without impacting application code or adding triggers (which is the normal model).</li>
<li>There is an "OVER" clause available when generating IDs for a set to ensure the expected sequence.</li>
</ul>
<br />
<br />
<h4>
Notes</h4>
<br />
<ul>
<li>This has all of the dangers on the IDENTITY when it comes to sequential values. The process of getting IDs is outside of the transaction, so once it's been handed off it is gone (unless the sequence cycles).</li>
<li>A single sequence can be used across multiple tables but this does increase the gap likelihood.</li>
</ul>
<br />
<br />
<h3>
Merge</h3>
This is commonly thought of as an UPSERT but it can do a bit more. It compares the source (stg) to the target (active) and will cause different DML actions to be taken.<br />
<br />
The "USING ... ON" clause is used matching the table from the source and left joining to the target. This is not a "WHERE" but left join conditions so it DOES NOT filter the source table. To filter the source table use a CTE as the source.<br />
<br />
MS extended this to support executing an action if the record exists in the Target but not in the Source. This is great when doing full table refreshes.<br />
<br />
<h4>
Tips</h4>
<br />
<ul>
<li>Use a HOLDLOCK to keep the data consistent because the merge executes multiple actions.</li>
<li>Review any Target tables to ensure there are no triggers referrencing @@ROWCOUNT in the beginning. This value is set for the ENTIRE Merge command, not the single operation. Typically I've seen this used to do a quick eject from the trigger or switch the path from row parocessing to set processing. Look at the INSERTED / DELETED tables instead.</li>
<li>Unlike the OUTPUT from an INSERT statement, you can return the any generated values from both the source and target. If you need these values then make a merge with a non-mathcing condition (1=2).</li>
<li>If you only want to update the table when there has been a change there are two approaches, adding each column to the WHEN or adding an exists to the WHEN with an EXCEPT, like this:</li>
</ul>
<br />
<br />
<span style="font-family: Courier New, Courier, monospace;">MERGE INTO dbo.Customers AS TGT</span><br />
<span style="font-family: Courier New, Courier, monospace;">USING dbo.CustomersStage AS SRC</span><br />
<span style="font-family: Courier New, Courier, monospace;"> ON TGT.custid = SRC.custid</span><br />
<span style="font-family: Courier New, Courier, monospace;">WHEN MATCHED AND </span><br />
<span style="font-family: Courier New, Courier, monospace;"> EXISTS ( </span><br />
<span style="font-family: Courier New, Courier, monospace;"> SELECT * FROM CustomersStage AS S WHERE s.custID = src.custid</span><br />
<span style="font-family: Courier New, Courier, monospace;"> EXCEPT</span><br />
<span style="font-family: Courier New, Courier, monospace;"> SELECT * FROM dbo.Customers AS T WHERE t.custid = tgt.custid</span><br />
<span style="font-family: Courier New, Courier, monospace;"> ) THEN</span><br />
<span style="font-family: Courier New, Courier, monospace;"> UPDATE SET</span><br />
<span style="font-family: Courier New, Courier, monospace;"> TGT.companyname = SRC.companyname,</span><br />
<span style="font-family: Courier New, Courier, monospace;"> TGT.phone = SRC.phone,</span><br />
<span style="font-family: Courier New, Courier, monospace;"> TGT.address = SRC.address</span><br />
<br />
<br />
<h3>
Misc cool stuff</h3>
<br />
<ul>
<li>If the goal is to have a non-deterministic sort but an ORDER BY is required, the field list can replaced by a (SELECT NULL)</li>
<li>Left to Right reading of the Execution Plan is how the API works</li>
<li>The lack of a ; after each SQL Statement is officially Deprecated! </li>
<li>Avoid using IIF and CHOOSE statements. They were added as a stopgap to help with conversions from MS Access. Behind the scenes they are converted into CASE statements by the optimizer.</li>
<li>Interesting trick that can be embedded in other queries to get a delimited list:</li>
</ul>
<br />
<span style="font-family: Courier New, Courier, monospace;">SELECT N',' + QUOTENAME(orderyear) AS [text()]</span><br />
<span style="font-family: Courier New, Courier, monospace;"> FROM (SELECT DISTINCT YEAR(orderdate) AS orderyear FROM Sales.Orders) AS Y</span><br />
<span style="font-family: Courier New, Courier, monospace;"> ORDER BY orderyear</span><br />
<span style="font-family: Courier New, Courier, monospace;"> FOR XML PATH('')</span><br />
Chad Dinermanhttp://www.blogger.com/profile/12096950784001563394noreply@blogger.com1tag:blogger.com,1999:blog-5298882499718010482.post-6117566453494560642012-11-06T09:43:00.000-05:002012-11-06T09:58:12.043-05:00PreCon Day 1So the conference began. Registration was a snap, I got a nifty backpack that is big enough to hold my workstation (an Elitebook 8560w) and was remarkably comfortable. I don't think it will replace my Tom Bihn bag for travel but it may be my new commute backpack.<br />
<br />
The session I attended today was <b>SQL Server 2012 in a Highly Available World </b>and, while it was not what I expected I found a lot to learn. I was expecting a focus on Availability Groups with maybe a section on Failover Cluster Instances (FCI). Instead this was a much broader session. It started with Log Shipping, Database Mirroring (deprecated in SQL 2012), FCI and then finally Availability Groups. While I didn't want the first bits (Log Shipping has been around forever and Mirroring is going away) it was a good refresher and I learned a lot.<br />
<br />
The best part of today's session was working through a problem that the person next to me was having during the breaks. There were two real questions<br />
<br />
<ol>
<li>Can you mirror upstream from SQL 2008R2 to SQL 2012?</li>
<ul>
<li>It turns out you <a href="http://social.msdn.microsoft.com/Forums/en/sqldisasterrecovery/thread/cf7f96cd-0e46-4ebe-b052-785d740ad160" target="_blank">can</a>, if the patches / services packs are correct. In fact this will be demonstrated later this week in one of the sessions! </li>
</ul>
<li>Can you combine Failover Clusters with Availability Groups?</li>
<ul>
<li>Yes, but it prevents automatic failover between datacenters, this may be fixed in a service pack.</li>
</ul>
</ol>
<div>
I now have an intense need to get VMWare going on my laptop and start messing with cluster configurations.<br />
<br />
My notes from this session are below.</div>
<br />
<br />
<br />
<h2>
PASS Pre-Con 1: SQL Server 2012 in a Highly Available World</h2>
This was an overview of the High Availability options typically used in SQL 2012. Topics ranged from the venerable Log Shipping to the newer and shinier Availability Groups.<br />
<h3>
</h3>
<h3>
<br /></h3>
<h3>
Log Shipping</h3>
What happens when the database is set to "Bulk Logged" mode and log shipping is enabled? It turns out that the data never makes it over to the target system unless it is updated afterward. This is a good example of why data should only be bulk loaded into staging tables, cleaned and moved to final tables using normal, logged commands.<br />
<h4>
</h4>
<h4>
<br /></h4>
<h4>
Test log shipping Disaster Recovery <span style="font-size: x-small;">(with an outage window)</span> </h4>
<br />
<ol>
<li>Server 1: "BACKUP LOG ... NORECOVERY" </li>
<li>Server 2: "RESTORE LOG ... WITH RECOVERY"</li>
<li>Do stuff (nothing bad) and flip it back</li>
<li>Server 2: "BACKUP LOG ... NORECOVERY" </li>
<li>Server 1: "RESTORE LOG ... WITH RECOVERY"</li>
</ol>
This maintains allows you to maintain the log chain and test DR as long as only legitimate transactions happen as part of the test.<br />
<h3>
</h3>
<h3>
<br /></h3>
<h3>
Mirroring</h3>
The reason that this still matters is that is was JUST deprecated. That means it will still be there for at least the next two releases (SQL Server 2014 and 2016 presumably). The last version of SQL Server with mirroring will be supported for at least 10 years after release, that means it will be 2026 before it will be gone.<br />
<h4>
</h4>
<h4>
<br /></h4>
<h4>
Unsupported DB Objects</h4>
<ul>
<li>FileStream </li>
<li>FileTable</li>
</ul>
<h4>
</h4>
<h4>
<br /></h4>
<h4>
Endpoint</h4>
The Log Stream endpoint hangs out on port 5022. This endpoint will ONLY talk to other SQL Servers and is always encrypted, typically by a server signed certificate.<br />
<h4>
</h4>
<h4>
<br /></h4>
<h4>
Concerns</h4>
If everything is working well, Synchronous replication is expected to only add 5ms to the DML commands at distances < 100 km. If the distance is > 100km then Asynchronous Replication is recommended. The Mirroring overhead time is viewable in the Mirroring Monitor (which presumably fronts a DMV).<br />
<br />
The greatest visible lag in mirroring is when the application realizes that it must switch to the secondary node. The application needs to have a connection timeout happen (5 - 10 secs) before flipping to the next server (which will of course work).<br />
<br />
Mirroring is not impacted by single user mode (other than nobody doing transactions) because it is using a log reader and is not "in" the database.<br />
<h3>
</h3>
<h3>
<br /></h3>
<h3>
Always On - Clustering (Local)</h3>
<h4>
</h4>
<h4>
<br /></h4>
<h4>
Cluster Configuration</h4>
The dreaded Hardware Compartibility List (HCL) is gone! The reason for this is that the storage options for the cluster now include network shares (not just iSCSI). Now who in their right mind thinks that network shares over SMB are a good idea for the DB? Well the enhancements to SMB (starting in 2.2 with Win 2008R2 but much improved in 3.0 with Win2012) are pretty amazing. They have removed the overly chatty aspects of the protocol and made it a viable alternative to SAN storage. There is a lot of value going Win2012 instead of Win2008R2 for the SMB 3.0 enhancements.<br />
<br />
Sadly, the validation wizard still needs to execute to make sure that the node can be part of a cluster. The good news is that it is a lot less invasive (doesn't do things like take SAN Luns offline). It is still slow, while a 2 node cluster may take 10 minutes, larger ones can take a weekend or longer.<br />
<br />
When setting up the cluster, if you are not a domain admin you will need to either get access to add computers to AD or get the new cluster name added to the directory before installing.<br />
<br />
The heartbeat for the cluster is using IPv6, if it is not a dedicated connection between the servers the network must be able to accommodate it (it's 2012, probably time to update to IPv6 anyway).<br />
<br />
Windows Core is completely supported for the database engine. An interesting thing about core is that (in case of emergency) you can now activate the full GUI when required and then flip it off again when done. The core instance actually keeps all the DLLs and extra files on disk, but in a packed form that is unpacked (into memory?) when needed.<br />
<br />
<br />
Best Practice tip: Try to have your resource group name match the network group name for your cluster. It will improve manageability a lot.<br />
<div>
<h4>
</h4>
<h4>
<br /></h4>
<h4>
Quorum Modes configuration</h4>
Node Majority Set should be chosen for large clusters (> 2 nodes) with an odd number of nodes<br />
Node and File Share Quorum should be chosen for large clusters that may not have an odd number of nodes<br />
Quorum Drive can be used for 2 node clusters but remember, this is a single point of failure.<br />
Win2012 improvement: Dead Nodes are removed from voting after they are kicked out of the cluster so the quorum requirement is recalculated.</div>
<br />
<h4>
</h4>
<h4>
Database Cluster Configuration</h4>
TempDB can finally use direct storage! This allows the use of local SSD or even Fusion-IO cards (gasp!). This can be a lifesaver in systems with heavy TempDB usage.<br />
<br />
The number of available SQL Server nodes per cluster is getting absurd in Enterprise Edition (but is still stuck at 2 for Std and the new BI Editions). The cap is somewhere around 64, but is really limited by the number of drive letters (26 - A,B,C,D = 22). Technically, this can probably be worked around with mount points but requires further research.<br />
<br />
One thing that seemed to cause confusion was the idea that even named instances can be treated like a default instance if you left it hanging around on port 1433. I've always thought of instance naming as just a human readable shortcut for remembering the port number (I know for other protocols the name is more important but most systems are TCP/IP focused). So, if the instance is locked to port 1433, then when you try to connect without an instance name that is the door it knocks at.<br />
<br />
When reviewing the cluster resource groups make sure that the mount points are explicitly listed and set as proper dependencies, this does not happen automatically at this time. This can be complex for mount points because the label can be a bit obscure, use the Disk ID to match the volume in Disk Management to the one in the Cluster Manager.<br />
<br />
Slipstream installs are now deprecated. I didn't see that one coming. The reason for it is that every patch that comes out need to get an updated slipstream image generated. Now you can store the patches in a centralized location and point the installer at it. It will pickup and apply the patches automatically. This is a big improvement if you haven't automated your image generation yet, otherwise it's a minor change.<br />
<br />
Only the Database Engine, Analysis Services and SQL Agent are cluster aware. The other SQL Server goodies are STILL not cluster aware. This includes the MDM solution (MDS + DQS).<br />
<h3>
</h3>
<h3>
<br /></h3>
<h3>
Always On - Geo Distributed Clusters</h3>
There have always been a LOT of obstacles when doing Geo-Clustering, primarily around moving the data and managing the network configuration. As far as I know, MS is not touching the data side, they are leaving that to the SAN experts for this option. However, MS now has options for Multi-Subnet Failover where the IP Address can be set dynamically based on the available subnets. This feature <i>may</i> stop the network admins from wanting to kill you.<br />
<br />
<h3>
Always On - Availability Groups</h3>
<h4>
</h4>
<h4>
Different from Mirroring</h4>
This is really a fork of the mirroring technology used previously. It is still an encrypted transaction log stream replayed from the primary to the secondaries with the same overhead.<br />
<br />
There can only be a total of 4 replicas of any single availability group.<br />
<br />
<ul>
<li>Synchronous Replicas are limited to 2</li>
<li>Asynchronous Replicas are limited to 4</li>
</ul>
<br />
This means you can have 4 Async Replics or 1 Sync + 3 Async or 2 Sync + 2 Async but NOT 3 Sync + 1 Async.<br />
<br />
Improvements<br />
<br />
<ul>
<li>Read-Only replica (independent statistics kept in tempdb)</li>
<li>Routing based on application intent</li>
<li>Increased number of worker threads</li>
<li>Increased cap from 20 to 200 Replicas / instance </li>
<li>Replication Publishers and Subscribers are supported</li>
<li>Filestream / Filetable</li>
<li>Extended events are supported</li>
</ul>
<br />
Limitation<br />
<br />
<ul>
<li>Cannot be used across domains</li>
<li>Does not support the replication distributor database</li>
<li>No SQL Server Profiler Events are available for Availability Groups</li>
</ul>
<br />
<br />
Fancy stuff. You can set your read-intent route to point at a load balancer that will bounce you between several read-only instances if desired.<br />
<h4>
</h4>
<h4>
<br /></h4>
<h4>
FCI + AG</h4>
The combination of Failover Cluster Instances can be used in conjunction with Availability Groups for a very comprehensive solution (licenses may get pricey though). The sequence for doing this is:<br />
<br />
<ol>
<li>Install Windows Cluster Services</li>
<li>Install SQL Server Cluster</li>
<li>Enable Always On Availability Groups</li>
<li>Create / Restore Databases</li>
<li>Configure Availiability Groups</li>
</ol>
<br />
<br />
SQLCAT has a <a href="http://sqlcat.com/sqlcat/b/whitepapers/archive/2012/06/22/alwayson-architecture-guide-building-a-high-availability-and-disaster-recovery-solution-by-using-failover-cluster-instances-and-availability-groups.aspx" target="_blank">whitepaper </a>on implementing this combined cluster.<br />
<br />
<h4>
Backups</h4>
<br />
<ul>
<li>Full and Differential backups shold be taken from the primary</li>
<li>Transaction Log backups can be taken from the secondary. The info is pushed up to the primary so it knows that the logs have been archived.</li>
<li>Only maintenance plans (and Ola's script) support backup rules so far.</li>
</ul>
<br />
<h4>
Connecting</h4>
<br />
<ul>
<li>An Availability Group Listener is created (more Active Directory fun) and all systems should connect to that name.</li>
<li>The Availability Group Listener is not supported by the SQL Server Browser and is recommended to live on port 1433 to avoid confusion.</li>
<li>If there is a read intent connection and no secondaries are available to service the request it will execute on the primary.</li>
</ul>
<br />Chad Dinermanhttp://www.blogger.com/profile/12096950784001563394noreply@blogger.com0tag:blogger.com,1999:blog-5298882499718010482.post-23146545474082388962012-11-05T08:31:00.001-05:002012-11-05T08:31:15.208-05:00PASS Summit 2012 Pre-PreConI flew into Seattle last night and I'm here for PASS Summit 2012 this week. I can hardly wait for things to get going (hence blogging at 5am).<div>
<br /></div>
<div>
Why is it so exciting? Well this is what I liked so much last time.</div>
<div>
<br /></div>
<div>
This is a SQL Server conference, but it is not run by Microsoft and SQL Server (and Data in all it's varied forms) is the ONLY topic. This means that the party line was not always followed and DB folk were not playing second fiddle to Developers or Server Admins.</div>
<div>
<br /></div>
<div>
The presenters were all top-notch. Not just good presenters but some of the best SQL Server presenters from across the country (world?) are here. They are experts in their arena, not only specific parts of SQL Server but on the broader topics of <b>data </b>modeling, <b>data </b>integration, <b>data </b>warehousing or enterprise <b>data </b>architecture. There were discussions that continued long after the sessions were over and private dialogues sometimes continued past the conference over email.</div>
<div>
<br /></div>
<div>
The people, not just the presenters were great too. There is something special about being surrounded by a totally new group of peers with different and exciting experiences. Everyone was having fun sharing war stories about the weird corner-cases that were encountered.</div>
<div>
<br /></div>
<div>
I learned more from that week at PASS Summit 2009 than I ever expected.</div>
<div>
<br /></div>
<div>
This week is gonna be great.</div>
Chad Dinermanhttp://www.blogger.com/profile/12096950784001563394noreply@blogger.com0tag:blogger.com,1999:blog-5298882499718010482.post-35421404419335156202012-10-07T10:00:00.000-04:002012-10-07T10:42:24.178-04:00Some, Any and All oh my!<h2>
Tales of the exotic and strange</h2>
When discussing some of the strange parts of a "new to me" system, a co-worker said "That's nothing, our queries have rules where the data has to meet Some, Any or All criteria!"<br />
<div>
<br /></div>
<div>
Now this sounded completely absurd to me so I just let it slide by, but it got me thinking. There are a number of unusual ANSI operators that appear to be logically redundant, maybe this is real.</div>
<div>
<br /></div>
<div>
Sure enough, it <a href="http://msdn.microsoft.com/en-us/library/ms187074(v=sql.105).aspx" target="_blank">is real</a>.</div>
<div>
<br /></div>
<div>
What is surprising, is that isn't an <b>Operator</b> but instead is a <b>Modfier</b> to existing operator.<br />
<br /></div>
<h3>
</h3>
<h3>
Some / Any</h3>
<div>
Some and Any are synonyms. If the subset (query, or static list) has any records that meet the criteria records are returned.</div>
<h4>
Isn't that an "IN"?</h4>
<div>
Nope. "IN" is an operator, and would be the same as the combination "= ANY" but because this is a modifier you can pull some neat tricks like ">= ANY". This is handy for checking things like security levels or general product availability.</div>
<div>
<br /></div>
<div>
Example:</div>
<div>
<br /></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">SELECT firstName, lastName </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">FROM employee AS e </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">WHERE </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> e.secId >= ANY </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> (</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> SELECT minSecId </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> FROM securedAssets AS sa </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> WHERE </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> sa.region = 'Inventory Mgmt'</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> )</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">;</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"><br /></span></div>
<h3>
All</h3>
<div>
All is another one with some interesting rules. Everything that is returned needs to meet the criteria. This is pretty terrific for things like being able to test if there is enough inventory to build an assembled item. You know if you are building 3 widgets, you need to have at least 3 of each part. </div>
<div>
<br /></div>
<div>
<div>
Example:</div>
<div>
<br /></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">SELECT widgetName</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">FROM widget AS w </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">WHERE </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> 3 >= ALL </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> (</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> SELECT availableWidgetParts </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> FROM widgetParts AS wp </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> WHERE </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> wp.widgetId = w.widgetId</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> )</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">;</span></div>
</div>
<div>
<span style="font-family: Courier New, Courier, monospace;"><br /></span></div>
<h3>
<span style="font-family: Times, Times New Roman, serif;">Conclusion</span></h3>
<div>
<span style="font-family: Times, Times New Roman, serif;">Now this is a nifty feature, and there are plenty of other ways to do this for the majority of cases. I'll probably lump it in INTERSECT and EXCEPT as part of my seldom used, exotic query techniques.</span></div>
Chad Dinermanhttp://www.blogger.com/profile/12096950784001563394noreply@blogger.com0tag:blogger.com,1999:blog-5298882499718010482.post-7303487030074882662012-05-19T16:59:00.000-04:002012-05-19T17:42:43.171-04:00Data Modeling Rematch!So I've got a lot of data modeling ahead of me at work and I need tool. The system was being tracked in Visio and that doesn't scale, for the reason I've noted before. So I did a breakdown <a href="http://www.unimpededbysanity.com/2012/02/state-of-data-modeling-tools-conclusion.html">before</a> of the choices but I wanted to address my specific needs this time.<br />
<br />
I like <a href="http://modelright.com/">ModelRight</a>, a lot. If you read the review you know it has an excellent working style. The mode-less property boxes are incredibly efficient and let you make changes remarkably quickly.<br />
<br />
I also really enjoy <a href="http://embarcadero.com/products/er-studio-data-architect">ER/Studio Data Architect</a> (referred to as ER/Studio thoughtout this post). The User Interface is extremely smooth and responsive. I do find that the modal dialog boxes slow me down a bit.<br />
<br />
So I've been doing a bake-off between the two. They both support the same core functionality really well (forward and reverse engineering) and are fairly stable.<br />
<br />
I'm working with a larger (~700 table) model and seeing how it does.<br />
<br />
<h3>
Data Modeling</h3>
<div>
So which is faster for actually developing a new model?</div>
<div>
<br /></div>
<h4>
ER/Studio</h4>
<div>
This is very much an awkward process of dragging the table on to the screen, double-clicking it and adding entries to the table. There are custom domains that can be defined to group similar types together and provide guidelines to improve consistency. The tabs can be awkward and it feels like a very stilted process.</div>
<div>
<br /></div>
<h4>
ModelRight</h4>
<div>
This is a bit more natural because after you drag the table over, you can start editing on the canvas or go to the docked modeless window. The same domain features are available to provide consistency for your columns and types. The modeless windows really do make this a much faster process to create and modify the tables.<br />
<br /></div>
<h3>
</h3>
<h3>
Model Layout (Reverse Engineering)</h3>
<h4>
ER/Studio</h4>
ER/Studio offers several different layout models to improve layout accuracy<br />
<ul>
<li>Circular</li>
<li>Orthoganal</li>
<li>Hierarchical</li>
</ul>
These help to various extents based on your sub-model contents.<br />
<br />
The canvas the ER/Studio has is unmatched for it's smoothness when manipulating the entities and adjusting the layout.<br />
<h4>
ModelRight</h4>
ModelRight only has auto-arrange, with an adjustable number to indicate spacing.<br />
ModelRight does it better (after fiddling with the number).<br />
<br />
The diagram needs less adjustment for clarity even if the UI isn't as smooth as ER/Studio offers.<br />
<br />
<h3>
Reporting</h3>
It's important to be able to distribute the model. Typical options for this are JPG, XML, HTML and RTF.<br />
<br />
<h4>
ER/Studio</h4>
<ul>
<li>JPG: Good</li>
<li>XML: Limited information is available but ok</li>
<li>HTML: Good. It's a nice tree structure and they provide a fairly good interface for it</li>
<li>RTF: Good. It's a word doc, nothing super fancy but it is pretty clear.</li>
</ul>
<h4>
</h4>
<h4>
ModelRight</h4>
<ul>
<li>JPG: Good</li>
<li>XML: Good. Super Detailed info in this file</li>
<li>HTML: Brutal to setup. Basically it is an XML export that expects you to transform it with an XSLT. Complete Customization, but out of the box it is weak. On the other hand, the help file says that ModelRight support will help you craft any XSLT you need.</li>
<li>RTF: I don't think it's offered. May be available as an XSLT from support.</li>
</ul>
<br />
<h3>
Sales and Support</h3>
<div>
<h4>
ER/Studio</h4>
</div>
<div>
I've been going through the Embarcadero sales rep whenever I needed anything. There was some initial lag on responsiveness but when it was clear I was serious, things progressed well. She brought engineering on to a call to demonstrate functions they felt were superior to their competitors. It was very interesting and involved a lot of add on products.<br />
<br />
I also contact support for two incidents. There was about a one day lag between requests for help and responses. Neither incident was resolved but the staff was courteous and tried to help.</div>
<div>
<br />
<h4>
ModelRight</h4>
</div>
<div>
The sales rep has been very responsive through email, but that was the only communication channel. I questioned her about features in the next two versions and the answers were useful without over promising.<br />
<br /></div>
<div>
Support has been remarkable. The support team was a bit terse in it's replies to my questions but always responded quickly. They went so far as to make an application build with fixes for bugs a couple of issues I found the same day. This support is amazing considering I'm still running the trial version.</div>
<div>
<br /></div>
<div>
<h3>
Defaults</h3>
</div>
<div>
I'm not even sure if the section should be in here but I think it is important from the standpoint of balance. </div>
<div>
<br /></div>
<div>
<h4>
ER/Studio</h4>
</div>
<div>
The defaults here are so good that I didn't even realize that a can configure a lot of the options to do something different. A lot of thought went into the initial configuration of out of the box. </div>
<div>
<br /></div>
<div>
<h4>
ModelRight</h4>
</div>
<div>
This feels like it was developed by programmers. For example<br />
<br />
<ul>
<li>The screen defaults to being overly crowded (even at 1600 x 900) with too many widgets active by default. </li>
<li>Reports REQUIRE an XSLT to be legible outside of Internet Explorer.</li>
</ul>
<br />
<br />
On the other hand, this leads to things like<br />
<br />
<ul>
<li>A complete XML export of the model is available and can be diff'ed with older versions</li>
<li>Quick short-cut to take a JPG of your diagram</li>
<li>Ridiculously powerful scripting</li>
</ul>
<h3>
</h3>
<br />
<h3>
Conclusion</h3>
<div>
ER/Studio is an excellent, polished tool. I have the budget to purchase it for myself and my team but not all of the secondary features. I could not get the repository or portal functionality which is what I need to be able to distribute the model to the development team.</div>
<div>
<br /></div>
<div>
ModelRight has similar (or better) features when compared to the baseline ER/Studio package. The price is about half but it also includes a free viewer and more flexibility if you are interested in putting a bit more work into it. Future enhancements slated for ModelRight 4 look very impressive and I'm glad to be on board with them for this project. </div>
<div>
<br /></div>
<div>
TLDR: ModelRight wins! Embarcadero has a lot to worry about with ModelRight. I will be using them as my data modeling platform of choice for the foreseeable future. </div>
</div>Chad Dinermanhttp://www.blogger.com/profile/12096950784001563394noreply@blogger.com0tag:blogger.com,1999:blog-5298882499718010482.post-26752776557578573712012-05-02T17:37:00.002-04:002012-05-02T17:37:47.552-04:00Sure it scales UP, but does it scale DOWN?I've recently run into an interesting issue, scalability doesn't necessarily go both ways.<br />
<br />
Now, intellectually I understand that you make trade-offs to scale to millions of users but that doesn't help my "gut feeling" that I am doing something wrong when I accept the trade-off.<br />
<br />
Let's take the classic example of triggers. Triggers tend to be very wrong when it comes to performance, however they are highly effective at securing the data and ensuring business rules are followed. I weigh the need for triggers very carefully before I add them to any table. This can be much more effective than adding the logic to every system that must integrate with the database.<br />
<br />
Small databases allow you to pull all sorts of fun tricks that you would never consider otherwise.<br />
<br />
Performance concepts do not always scale down to provide the flexibility that is needed on some projects. I've been measuring system with 500GB as a baseline for a while now and forgetting about the ~100GB options. <br />
<br />
Right now, I'm looking at different ways to do row, column and cell level security. Almost none of these are viable in a large database. In fact, most of these are not even viable in a "small" database but this gives me a lot of freedom to experiment with concepts, which is fun.Chad Dinermanhttp://www.blogger.com/profile/12096950784001563394noreply@blogger.com0tag:blogger.com,1999:blog-5298882499718010482.post-68516727823078919392012-04-07T21:40:00.000-04:002012-04-07T21:40:03.042-04:00Change, it's inevitableI'm in the middle of changing jobs, I mean it literally. The last day for my old job was Friday, the first day of my new job is Monday, and it's Saturday night.<br />
<br />
There is much to miss with my old position. I had the chance to participate in changing the core data strategy and the luxury of doing it with the right people. I can't stress this enough, the right people make the difference. I can't mention leaving without mentioning them because they are an incredible group.<br />
<br />
<br />
<div>
When I announced my plans to move on, it was possible to promote from within because every had grown so very much. They are all brilliant and I will miss working with them.</div>
<div>
<br /></div>
<div>
So now I'm switching my role up a little bit. </div>
<div>
<br /></div>
<div>
I'm moving on to more of a straight Data Architect position instead of Enterprise data. I have the chance to build out a number of interesting "cloud" products that will have several very serious, real world applications. </div>
<div>
<br /></div>
<div>
There is a good bit of greenfield development and different agile techniques I expect to run into. I'll post as I learn new and interesting things. </div>Chad Dinermanhttp://www.blogger.com/profile/12096950784001563394noreply@blogger.com3tag:blogger.com,1999:blog-5298882499718010482.post-84656088484507257442012-02-18T15:31:00.000-05:002012-02-20T19:21:17.148-05:00The State of Data Modeling Tools (Conclusion)<br />
After reviewing a bunch of the data modeling tools out there, I've come to the conclusion that there are some great tools out there, some workable tools and some tools to avoid.<br />
<br />
<b>Data Modeling Tools Result Summary</b><br />
<ul>
<li><a href="http://www.blogger.com/blogger.g?blogID=5298882499718010482#ERwin">CA ERwin </a></li>
<ul>
<li>This is a toolset that has been in the decline for while. CA is trying to pull it out of the nose-dive it's in but it has a legacy code-base without a lot of the original team around to explain it. As a long term system I can't recommend it. It will continue to go downhill.</li>
</ul>
<li><a href="http://www.unimpededbysanity.com/2012/01/visio-2011-erd.html" target="">Visio 2010 Professional </a></li>
<ul>
<li>Fine for small models but it will quickly become to unwieldy with a significant model. Microsoft doesn't seem interested in this aspect of the tool market and are not improving Visio for it.</li>
</ul>
<li><a href="http://www.unimpededbysanity.com/2011/12/state-of-data-modeling-tools-mysql.html">MySQL Workbench</a></li>
<ul>
<li>Stability is important. It's free but my time is not. I don't recommend this tool.</li>
</ul>
<li><a href="http://www.unimpededbysanity.com/2012/01/sql-power-architect.html">SQL Power Architect (Community Edition)</a></li>
<ul>
<li>This tool needs better integration with the actual database. It has the most potential of any open source modeling tool I've seen.</li>
</ul>
<li><a href="http://www.unimpededbysanity.com/2012/02/erstudio-data-architect.html">Embarcadero's ER/Studio Data Architect (Trial)</a></li>
<ul>
<li>This tool works flawlessly. If you can afford it, buy it. They've been doing this for over 15 years and have built an entire product line around modeling tools.</li>
</ul>
<li><a href="http://www.unimpededbysanity.com/2012/02/toad-data-modeler.html">Toad Data Modeler (Trial)</a></li>
<ul>
<li>I admit that the lack of stability irritated me greatly but I don't think that this product has improved much over the last few years and it may be on life support from Quest. Not recommended. </li>
</ul>
<li><a href="http://www.unimpededbysanity.com/2012/02/modelright-professional.html">ModelRight Professional (Trial)</a></li>
<ul>
<li>For the price (roughly 25% of ER/Studio) this tool works amazingly well. The only thing it lacks is logical modeling, if you have one platform (like most folks) this is a great choice.</li>
</ul>
</ul>
<span style="font-size: small;">I hope that some of you found my research helpful. Your great comments help me remember to post. :-)</span>Chad Dinermanhttp://www.blogger.com/profile/12096950784001563394noreply@blogger.com2tag:blogger.com,1999:blog-5298882499718010482.post-46530062751386661592012-02-18T15:12:00.000-05:002012-02-18T15:12:05.774-05:00ModelRight Professional<br />
<span style="background-color: white; color: #333333; font-family: Arial, Helvetica, sans-serif; line-height: 18px;">This is part of continuing set of reviews / opinion pieces about the state of data modeling tools that began over </span><a href="http://www.unimpededbysanity.com/2011/12/state-of-data-modeling-tools.html" style="background-color: white; color: #7c93a1; font-family: Arial, Helvetica, sans-serif; line-height: 18px; text-decoration: none;" target="_blank">here</a><span style="background-color: white; color: #333333; font-family: Arial, Helvetica, sans-serif; line-height: 18px;">.</span><br />
<br />
<span style="background-color: white; color: #222222; font-family: Arial, Helvetica, sans-serif; line-height: 18px;"><b style="background-color: white; color: #333333;">Background</b><span style="background-color: white; color: #333333;"><br /></span></span><br />
<span style="background-color: white; color: #333333; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; line-height: 18px;"><span style="font-family: Arial, Helvetica, sans-serif;">I was talking to a co-worker who was at LogicWorks before they were purchased by Platinum Technology (who were later purchased by Computer Associates) and he mentioned that there was a new ERD company on the block, ModelRight. It turns out that some the original Logic Works folks put this together and I gave it a whirl. It's UI is a bit different but worth investigating further.</span></span><br />
<span style="color: #333333; font-family: Arial, Helvetica, sans-serif;"><span style="line-height: 18px;"><br /></span></span><br />
<span style="background-color: white; color: #222222; font-family: Arial, Helvetica, sans-serif; line-height: 18px;"><b style="background-color: white; color: #333333;">Stability</b></span><span style="background-color: white; color: #333333; font-family: Arial, Helvetica, sans-serif; line-height: 18px;"><br /></span><br />
<span style="background-color: white; color: #333333; font-family: Arial, Helvetica, sans-serif; line-height: 18px;">This ia another product I used frequently over the evaluation period. The small model </span><span style="background-color: white; color: #333333; font-family: Arial, Helvetica, sans-serif; line-height: 18px;">I tested with (< 50 table) worked great and the application was fast and responsive even after many days of use. </span><br />
<span style="background-color: white; font-family: Arial, Helvetica, sans-serif; line-height: 18px;"><br style="background-color: white; color: #333333;" /><b style="background-color: white; color: #333333;">Price</b><br style="background-color: white; color: #333333;" /><span style="background-color: white; color: #333333;">ModelRight Professional: </span><i style="background-color: white; color: #333333;">~$1,095 ($895 + $199 maint)</i><br style="background-color: white; color: #333333;" /><b style="background-color: white;"><span style="color: #333333;"></span></b></span><br />
<span style="background-color: white; font-family: Arial, Helvetica, sans-serif; line-height: 18px;"><span style="background-color: white; color: #333333;">ModelRight Professional for MySQL: <i>~$494 ($295 + $199 maint)</i></span></span><br />
<span style="background-color: white; font-family: Arial, Helvetica, sans-serif; line-height: 18px;"><i style="background-color: white; color: #333333;"><br /></i></span><br />
<span style="background-color: white; color: #222222; font-family: Arial, Helvetica, sans-serif; line-height: 18px;"><b style="background-color: white; color: #333333;">Features</b><br style="background-color: white; color: #333333;" /><span style="background-color: white; color: #333333;">Common Checklist</span></span><br />
<ul style="background-color: white; margin-bottom: 0.5em; margin-left: 0px; margin-right: 0px; margin-top: 0.5em; padding-bottom: 0px; padding-left: 2.5em; padding-right: 2.5em; padding-top: 0px;">
<li style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; line-height: 18px; margin-bottom: 0.25em; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"><span style="font-family: Arial, Helvetica, sans-serif;"><span style="background-color: white; color: #333333;">Logic Modeling: <i>No</i></span><span style="background-color: white; color: #333333;"><i> </i></span></span></li>
<li style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; line-height: 18px; margin-bottom: 0.25em; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"><span style="font-family: Arial, Helvetica, sans-serif;"><span style="background-color: white; color: #333333;">Physical Modeling: <i>Yes</i></span></span></li>
<li style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; line-height: 18px; margin-bottom: 0.25em; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"><span style="font-family: Arial, Helvetica, sans-serif;"><span style="background-color: white; color: #333333;">Reverse Engineering: </span><i style="background-color: white; color: #333333;">Yes</i><span style="background-color: white; color: #333333;"> </span></span></li>
<li style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; line-height: 18px; margin-bottom: 0.25em; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"><span style="font-family: Arial, Helvetica, sans-serif;"><span style="background-color: white; color: #333333;">Forward Engineering: <i>Yes</i></span></span></li>
<ul style="line-height: 18px;">
<li style="margin-bottom: 0.25em; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"><span style="color: #333333; font-family: Arial, Helvetica, sans-serif;"><i>Compare against files and live databases. </i></span></li>
</ul>
<li style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; line-height: 18px; margin-bottom: 0.25em; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"><span style="font-family: Arial, Helvetica, sans-serif;"><span style="background-color: white; color: #333333;">Sub Models:</span><i style="background-color: white; color: #333333;"> Yes</i></span></li>
</ul>
<span style="background-color: white;"></span><br />
<div>
<span style="background-color: white;"><span style="color: #333333; font-family: Arial, Helvetica, sans-serif;"><span style="line-height: 18px;">The UI is extremely responsive, not only do you grab things when you are intend to, but all of the properties for the selected item are immediately visible. This is because the properties are not in windows that popup but are in a docked pane. I like this very much but it requires a bit more screen space, which makes working with it on a laptop awkward. After I got used to it this made working with the model unexpectedly more efficient.</span></span></span><br />
<span style="background-color: white;"><span style="color: #333333; font-family: Arial, Helvetica, sans-serif;"><span style="line-height: 18px;"><br /></span></span></span><br />
<div style="color: #333333; font-family: Arial, Helvetica, sans-serif; line-height: 18px;">
Reversing out the DB worked as expected and the submodel management was intuitive. Just drag the entity you want to add to a submodel into the model pane and it was added. Slick, fast and efficient.</div>
<div style="color: #333333; font-family: Arial, Helvetica, sans-serif; line-height: 18px;">
<br /></div>
<div style="color: #333333; font-family: Arial, Helvetica, sans-serif; line-height: 18px;">
The compare and forward engineering created some of the most efficient scripts of all the modeling tools I've used. They actually demonstrated features I didn't know about on MySQL and worked very well. I don't think I had to re-write any of the generated scripts.</div>
<div style="color: #333333; font-family: Arial, Helvetica, sans-serif; line-height: 18px;">
<br /></div>
</div>
<span style="background-color: white;"></span><br />
<div style="color: #333333; font-family: Arial, Helvetica, sans-serif; line-height: 18px;">
<span style="background-color: white;"><b style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif;"><span style="font-family: Arial, Helvetica, sans-serif;">Conclusion</span></b></span></div>
<span style="background-color: white;"><div style="color: #333333; font-family: Arial, Helvetica, sans-serif; line-height: 18px;">
If you have a single DB platform, ModelRight is definitely a modeling tool to consider. It is the only one that actually made me more efficient than I expected. The SQL it generated was was excellent and it was rock-solid. </div>
</span>Chad Dinermanhttp://www.blogger.com/profile/12096950784001563394noreply@blogger.com0tag:blogger.com,1999:blog-5298882499718010482.post-49813703829955255812012-02-18T15:00:00.000-05:002012-02-20T16:11:26.830-05:00Toad Data Modeler<br />
<span style="background-color: white; color: #333333; font-family: Arial, Helvetica, sans-serif; line-height: 18px;"><br class="Apple-interchange-newline" />This is part of continuing set of reviews / opinion pieces about the state of data modeling tools that began over </span><a href="http://www.unimpededbysanity.com/2011/12/state-of-data-modeling-tools.html" style="background-color: white; color: #7c93a1; font-family: Arial, Helvetica, sans-serif; line-height: 18px; text-decoration: none;" target="_blank">here</a><span style="background-color: white; color: #333333; font-family: Arial, Helvetica, sans-serif; line-height: 18px;">.</span><br />
<br />
<span style="background-color: white; color: #222222; font-family: Arial, Helvetica, sans-serif; line-height: 18px;"><b style="background-color: white; color: #333333;">Background</b><span style="background-color: white; color: #333333;"><br /></span></span><br />
<span style="background-color: white; color: #333333; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; line-height: 18px;"><span style="font-family: Arial, Helvetica, sans-serif;">Toad is legendary for it's Oracle database management software. When I started at my current job I was asked to evaluate their SQL Server tooling. My evaluation was that the SQL Server tools were not fully baked and that Toad Data Modeler treated non-Oracle DBMS platforms as second class citizens. That was 4 years ago, Quest has put a lot of resources into the Toad family since then, lets see if Toad Data Modeler (TDM) has improved.</span></span><br />
<span style="color: #333333; font-family: Arial, Helvetica, sans-serif;"><span style="line-height: 18px;"><br /></span></span><br />
<span style="background-color: white; color: #222222; font-family: Arial, Helvetica, sans-serif; line-height: 18px;"><b style="background-color: white; color: #333333;">Stability</b></span><span style="background-color: white; color: #333333; font-family: Arial, Helvetica, sans-serif; line-height: 18px;"><br /></span><br />
<span style="background-color: white; color: #333333; font-family: Arial, Helvetica, sans-serif;"><span style="line-height: 18px;">I used it for a few days to make my evaluation. The system only supports 25 entities in trial mode so I had to limited my model pretty severely. I didn't realize that I hit the limit until I was using it for a while, and it crashed. As soon as I hit the limit, the app stopped saving and hours got lost. </span></span><br />
<span style="background-color: white; color: #333333; font-family: Arial, Helvetica, sans-serif; line-height: 18px;"><br /></span><br />
<span style="background-color: white; color: #333333; font-family: Arial, Helvetica, sans-serif; line-height: 18px;">Later on I used the application for a few hours and it crashed again. Luckily, because the model was kept small I was able to use my last save.</span><br />
<span style="color: #333333; font-family: Arial, Helvetica, sans-serif;"><span style="line-height: 18px;"><br /></span></span><br />
<span style="background-color: white; font-family: Arial, Helvetica, sans-serif; line-height: 18px;"><b style="background-color: white; color: #333333;">Price</b><br style="background-color: white; color: #333333;" /><span style="background-color: white; color: #333333;">Toad Data Modeler: </span><i style="background-color: white; color: #333333;">~$450</i><br style="background-color: white; color: #333333;" /><b style="background-color: white;"><span style="color: #333333;"></span></b></span><br />
<span style="color: #333333; font-family: Arial, Helvetica, sans-serif;"><span style="line-height: 18px;"><br /></span></span><br />
<span style="background-color: white; color: #222222; font-family: Arial, Helvetica, sans-serif; line-height: 18px;"><b style="background-color: white; color: #333333;">Features</b><br style="background-color: white; color: #333333;" /><span style="background-color: white; color: #333333;">Common Checklist</span></span><br />
<ul style="background-color: white; margin-bottom: 0.5em; margin-left: 0px; margin-right: 0px; margin-top: 0.5em; padding-bottom: 0px; padding-left: 2.5em; padding-right: 2.5em; padding-top: 0px;">
<li style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; line-height: 18px; margin-bottom: 0.25em; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"><span style="font-family: Arial, Helvetica, sans-serif;"><span style="background-color: white; color: #333333;">Logic Modeling: <i>Yes</i></span></span></li>
<li style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; line-height: 18px; margin-bottom: 0.25em; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"><span style="font-family: Arial, Helvetica, sans-serif;"><span style="background-color: white; color: #333333;">Physical Modeling: <i>Yes</i></span></span></li>
<li style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; line-height: 18px; margin-bottom: 0.25em; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"><span style="font-family: Arial, Helvetica, sans-serif;"><span style="background-color: white; color: #333333;">Reverse Engineering: </span><i style="background-color: white; color: #333333;">Yes</i><span style="background-color: white; color: #333333;"> </span></span></li>
<li style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; line-height: 18px; margin-bottom: 0.25em; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"><span style="font-family: Arial, Helvetica, sans-serif;"><span style="background-color: white; color: #333333;">Forward Engineering: <i>Yes</i></span></span></li>
<ul style="line-height: 18px;">
<li style="margin-bottom: 0.25em; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"><span style="color: #333333; font-family: Arial, Helvetica, sans-serif;"><i>Does not compare against the live DB, only against previous models. </i></span></li>
</ul>
<li style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; line-height: 18px; margin-bottom: 0.25em; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"><span style="font-family: Arial, Helvetica, sans-serif;"><span style="background-color: white; color: #333333;">Sub Models:</span><i style="background-color: white; color: #333333;"> Yes</i></span></li>
</ul>
<span style="background-color: white;"></span><br />
<div style="color: #333333; font-family: Arial, Helvetica, sans-serif; line-height: 18px;">
<span style="background-color: white;">The UI is reasonably responsive. </span><br />
<span style="background-color: white;"><br /></span><br />
<span style="background-color: white;">Reversing the DB works ok but the arrangement leaves a lot to be desired.</span><br />
<br /></div>
<div style="color: #333333; font-family: Arial, Helvetica, sans-serif; line-height: 18px;">
<span style="background-color: white;">The model compare and forward engineering works ok, but not great. Not being able to compare against a live DB leads to issues.</span><br />
<br /></div>
<span style="background-color: white;"></span><br />
<div style="color: #333333; font-family: Arial, Helvetica, sans-serif; line-height: 18px;">
<span style="background-color: white;"><b style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif;"><span style="font-family: Arial, Helvetica, sans-serif;">Conclusion</span></b></span></div>
<span style="background-color: white;"></span><br />
<div style="color: #333333; font-family: Arial, Helvetica, sans-serif; line-height: 18px;">
<span style="background-color: white;">Toad Data Modeler is not stable. I can't stress the value of stability in tool selection. I really didn't enjoy using the tool and cannot recommend it.</span></div>
<span style="background-color: white;">
</span>Chad Dinermanhttp://www.blogger.com/profile/12096950784001563394noreply@blogger.com2tag:blogger.com,1999:blog-5298882499718010482.post-31295166571204290652012-02-18T14:13:00.000-05:002012-02-18T14:13:27.024-05:00ER/Studio Data Architect<br />
<span style="background-color: white; color: #333333; font-family: Arial, Helvetica, sans-serif; line-height: 18px;">This is part of continuing set of reviews / opinion pieces about the state of data modeling tools that began over </span><a href="http://www.unimpededbysanity.com/2011/12/state-of-data-modeling-tools.html" style="background-color: white; color: #7c93a1; font-family: Arial, Helvetica, sans-serif; line-height: 18px; text-decoration: none;" target="_blank">here</a><span style="background-color: white; color: #333333; font-family: Arial, Helvetica, sans-serif; line-height: 18px;">.</span><br />
<br />
<span style="background-color: white; color: #222222; font-family: Arial, Helvetica, sans-serif; line-height: 18px;"><b style="background-color: white; color: #333333;">Background</b><span style="background-color: white; color: #333333;"><br /></span></span><br />
<span style="background-color: white; color: #333333; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; line-height: 18px;"><span style="font-family: Arial, Helvetica, sans-serif;">Embarcadero Technologies started off with making DB development and management tools. I ran into one of their tools (DBArtisan) in 1998 for managing my Sybase instances. At my next job I insisted on getting the full suite of Embarcadero Software. And my next job too. Unfortunately after that I wasn't able to justify the cost and it has been 7 years since I used the software. Did it age poorly, similar to what happened to ERwin? Let's see.</span></span><br />
<span style="color: #333333; font-family: Arial, Helvetica, sans-serif;"><span style="line-height: 18px;"><br /></span></span><br />
<span style="background-color: white; color: #222222; font-family: Arial, Helvetica, sans-serif; line-height: 18px;"><b style="background-color: white; color: #333333;">Stability</b></span><span style="background-color: white; color: #333333; font-family: Arial, Helvetica, sans-serif; line-height: 18px;"><br /></span><br />
<span style="background-color: white; color: #333333; font-family: Arial, Helvetica, sans-serif; line-height: 18px;">I used it nearly every day during the evaluation period, testing out a number of the different options. There were no problems with the small model I tested with (< 50 table) while it remained fast and responsive over many days being open and used. </span><br />
<span style="background-color: white; font-family: Arial, Helvetica, sans-serif; line-height: 18px;"><br style="background-color: white; color: #333333;" /><b style="background-color: white; color: #333333;">Price</b><br style="background-color: white; color: #333333;" /><span style="background-color: white; color: #333333;">ER/Studio Multi-Platform: </span><i style="background-color: white; color: #333333;">~$4,615.00 ($3,700 + $915 maint)</i><br style="background-color: white; color: #333333;" /><b style="background-color: white;"><span style="color: #333333;"></span></b></span><br />
<span style="background-color: white; font-family: Arial, Helvetica, sans-serif; line-height: 18px;"><span style="background-color: white; color: #333333;">ER/Studio Open Source Databases: <i>~$1,360 ($1,100 + $260 maint)</i></span></span><br />
<span style="background-color: white; font-family: Arial, Helvetica, sans-serif; line-height: 18px;"><i style="background-color: white; color: #333333;"><br /></i></span><br />
<span style="background-color: white; color: #222222; font-family: Arial, Helvetica, sans-serif; line-height: 18px;"><b style="background-color: white; color: #333333;">Features</b><br style="background-color: white; color: #333333;" /><span style="background-color: white; color: #333333;">Common Checklist</span></span><br />
<ul style="background-color: white; margin-bottom: 0.5em; margin-left: 0px; margin-right: 0px; margin-top: 0.5em; padding-bottom: 0px; padding-left: 2.5em; padding-right: 2.5em; padding-top: 0px;">
<li style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; line-height: 18px; margin-bottom: 0.25em; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"><span style="font-family: Arial, Helvetica, sans-serif;"><span style="background-color: white; color: #333333;">Logic Modeling: <i>Yes</i></span><span style="background-color: white; color: #333333;"><i> </i></span></span></li>
<li style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; line-height: 18px; margin-bottom: 0.25em; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"><span style="font-family: Arial, Helvetica, sans-serif;"><span style="background-color: white; color: #333333;">Physical Modeling: <i>Yes</i></span></span></li>
<li style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; line-height: 18px; margin-bottom: 0.25em; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"><span style="font-family: Arial, Helvetica, sans-serif;"><span style="background-color: white; color: #333333;">Reverse Engineering: </span><i style="background-color: white; color: #333333;">Yes</i><span style="background-color: white; color: #333333;"> </span></span></li>
<li style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; line-height: 18px; margin-bottom: 0.25em; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"><span style="font-family: Arial, Helvetica, sans-serif;"><span style="background-color: white; color: #333333;">Forward Engineering: <i>Yes</i></span></span></li>
<ul style="line-height: 18px;">
<li style="margin-bottom: 0.25em; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"><span style="color: #333333; font-family: Arial, Helvetica, sans-serif;"><i>Compare against files and live databases. </i></span></li>
</ul>
<li style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; line-height: 18px; margin-bottom: 0.25em; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"><span style="font-family: Arial, Helvetica, sans-serif;"><span style="background-color: white; color: #333333;">Sub Models:</span><i style="background-color: white; color: #333333;"> Yes</i></span></li>
</ul>
<span style="background-color: white;"></span><br />
<div style="color: #333333; font-family: Arial, Helvetica, sans-serif; line-height: 18px;">
<span style="background-color: white;">The UI is responsive, when you grab items they do what you expect them to. It's very similar to ERwin but the UI is much more responsive and stable.</span><br />
<span style="background-color: white;"><br /></span><br />
After reversing out an existing database, the initial entity arrangement is surprisingly good, they have clearly put a lot of effort into it. There are several layout options and all of them work pretty well. Ultimately you still need to re-arrange the model a bit, but far less than any other tool I've used.<br />
</div>
<div style="color: #333333; font-family: Arial, Helvetica, sans-serif; line-height: 18px;">
<span style="background-color: white;"><br /></span><br />
<span style="background-color: white;">The Database compare and forward engineering makes very usable scripts that respect the integrity of the original model.</span><br />
<br /></div>
<span style="background-color: white;"><div style="color: #333333; font-family: Arial, Helvetica, sans-serif; line-height: 18px;">
<b style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif;"><span style="font-family: Arial, Helvetica, sans-serif;">Conclusion</span></b></div>
<div style="color: #333333; font-family: Arial, Helvetica, sans-serif; line-height: 18px;">
ER/Studio is an excellent product. There are add-ons to manage the model with a centralized repository and share database development tasks. There are also tools to establishing and maintaining conventions throughout your diagram.</div>
<div style="color: #333333; font-family: Arial, Helvetica, sans-serif; line-height: 18px;">
<br /></div>
<div>
<span style="color: #333333; font-family: Arial, Helvetica, sans-serif;"><span style="line-height: 18px;">In conclusion, ER/Studio is a great product, the price is high but it works flawlessly.</span></span></div>
</span>Chad Dinermanhttp://www.blogger.com/profile/12096950784001563394noreply@blogger.com1tag:blogger.com,1999:blog-5298882499718010482.post-76365303399838300362012-01-07T18:13:00.000-05:002012-01-07T18:13:41.487-05:00Visio 2011 (ERD)<span style="background-color: white; color: #333333; font-family: Arial, Helvetica, sans-serif; line-height: 18px;">This is part of continuing set of reviews / opinion pieces about the state of data modeling tools that began over </span><a href="http://www.unimpededbysanity.com/2011/12/state-of-data-modeling-tools.html" style="background-color: white; color: #7c93a1; font-family: Arial, Helvetica, sans-serif; line-height: 18px; text-decoration: none;" target="_blank">here</a><span style="background-color: white; color: #333333; font-family: Arial, Helvetica, sans-serif; line-height: 18px;">.</span><br />
<br />
<span style="background-color: white; color: #222222; font-family: Arial, Helvetica, sans-serif; line-height: 18px;"><b style="background-color: white; color: #333333;">Background</b><span style="background-color: white; color: #333333;"><br /></span></span><br />
<span style="background-color: white; color: #333333; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; line-height: 18px;"><span style="font-family: Arial, Helvetica, sans-serif;">Visio is the default business diagramming tool in most organizations. I've used it off and on for after-the-fact documentation but never to create the actual model.</span></span><span style="background-color: white; color: #333333; font-family: Arial, Helvetica, sans-serif; line-height: 18px;"> The newest version of Visio (2010) included a slightly remodeled UI and some tools to help with choosing colors (surprisingly, that is very useful to me). </span><br />
<span style="color: #333333; font-family: Arial, Helvetica, sans-serif;"><span style="line-height: 18px;"><br /></span></span><br />
<span style="background-color: white; color: #222222; font-family: Arial, Helvetica, sans-serif; line-height: 18px;"><b style="background-color: white; color: #333333;">Stability</b><br style="background-color: white; color: #333333;" /><span style="background-color: white; color: #333333;">It's Visio, it works well for the small model I tested with (< 50 table) and had no stability issues. </span></span><br />
<span style="background-color: white; font-family: Arial, Helvetica, sans-serif; line-height: 18px;"><br style="background-color: white; color: #333333;" /><b style="background-color: white; color: #333333;">Price</b><br style="background-color: white; color: #333333;" /><span style="background-color: white; color: #333333;">Visio Professional: </span><i style="background-color: white; color: #333333;">~$400.00</i><br style="background-color: white; color: #333333;" /><b style="background-color: white;"><span style="color: #333333;"></span></b></span><br />
<span style="background-color: white; color: #222222; font-family: Arial, Helvetica, sans-serif; line-height: 18px;"><b style="background-color: white; color: #333333;">Features</b><br style="background-color: white; color: #333333;" /><span style="background-color: white; color: #333333;">Common Checklist</span></span><br />
<ul style="background-color: white; margin-bottom: 0.5em; margin-left: 0px; margin-right: 0px; margin-top: 0.5em; padding-bottom: 0px; padding-left: 2.5em; padding-right: 2.5em; padding-top: 0px;">
<li style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; line-height: 18px; margin-bottom: 0.25em; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"><span style="font-family: Arial, Helvetica, sans-serif;"><span style="background-color: white; color: #333333;">Logic Modeling: <i>No</i></span><span style="background-color: white; color: #333333;"><i> </i></span></span></li>
<li style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; line-height: 18px; margin-bottom: 0.25em; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"><span style="font-family: Arial, Helvetica, sans-serif;"><span style="background-color: white; color: #333333;">Physical Modeling: <i>Yes</i></span></span></li>
<li style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; line-height: 18px; margin-bottom: 0.25em; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"><span style="font-family: Arial, Helvetica, sans-serif;"><span style="background-color: white; color: #333333;">Reverse Engineering: </span><i style="background-color: white; color: #333333;">Yes</i><span style="background-color: white; color: #333333;"> </span></span></li>
<li style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; line-height: 18px; margin-bottom: 0.25em; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"><span style="font-family: Arial, Helvetica, sans-serif;"><span style="background-color: white; color: #333333;">Forward Engineering: <i>No</i></span></span></li>
<ul style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; line-height: 18px;">
<li style="margin-bottom: 0.25em; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"><span style="font-family: Arial, Helvetica, sans-serif;"><span style="background-color: white; color: #333333;"><i>Note, there is a plug-in created by Alberto Ferrari and hosted at <a href="http://forwardengineer.codeplex.com/" target="_blank">CodePlex</a> which does a <b>non-comparing</b> forward engineer of the DB</i></span></span></li>
</ul>
<li style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; line-height: 18px; margin-bottom: 0.25em; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"><span style="font-family: Arial, Helvetica, sans-serif;"><span style="background-color: white; color: #333333;">Sub Models:</span><i style="background-color: white; color: #333333;"> Yes</i></span></li>
</ul>
<span style="background-color: white;"></span><br />
<div style="color: #333333; font-family: Arial, Helvetica, sans-serif; line-height: 18px;">
<span style="background-color: white;">Once you get past the Visio UI quirks, documenting an existing system using Visio is a breeze. Yes, extra clicks are required everywhere. Yes, the relationship link points are limited and not fun to reset. </span></div>
<span style="background-color: white;">
</span><br />
<div style="color: #333333; font-family: Arial, Helvetica, sans-serif; line-height: 18px;">
<span style="background-color: white;"><br /></span></div>
<span style="background-color: white;">
</span><br />
<div>
<span style="background-color: white;"><span style="color: #333333; font-family: Arial, Helvetica, sans-serif;"><span style="line-height: 18px;">The initial arrangement of the entities is pretty bad, but after using SQL Power Architect and MySQL Workbench I can appreciate that the layout is not awful.</span></span></span></div>
<span style="background-color: white;">
<div style="color: #333333; font-family: Arial, Helvetica, sans-serif; line-height: 18px;">
<br /></div>
<div style="color: #333333; font-family: Arial, Helvetica, sans-serif; line-height: 18px;">
Microsoft has a team of writers maintaining the help system and there are several good web resources on building ERDs using Visio so there is plenty of available documentation.</div>
<div style="color: #333333; font-family: Arial, Helvetica, sans-serif; line-height: 18px;">
<br /></div>
<div style="color: #333333; font-family: Arial, Helvetica, sans-serif; line-height: 18px;">
The lack of a built-in forward engineering function can be worked around (as noted above) but it does not compare to the actual database. </div>
<div style="color: #333333; font-family: Arial, Helvetica, sans-serif; line-height: 18px;">
<b style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif;"><span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></b></div>
<div style="color: #333333; font-family: Arial, Helvetica, sans-serif; line-height: 18px;">
<b style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif;"><span style="font-family: Arial, Helvetica, sans-serif;">Conclusion</span></b></div>
<div style="color: #333333; font-family: Arial, Helvetica, sans-serif; line-height: 18px;">
I have used Visio to document data models in the past and I maintain that it can do this at an acceptable level.</div>
<div style="color: #333333; font-family: Arial, Helvetica, sans-serif; line-height: 18px;">
<br /></div>
<div style="color: #333333; font-family: Arial, Helvetica, sans-serif; line-height: 18px;">
Also Visio can be a good way to get down some initial thoughts on a model if that is all you have available to you.</div>
<div style="color: #333333; font-family: Arial, Helvetica, sans-serif; line-height: 18px;">
<br /></div>
<div style="color: #333333; font-family: Arial, Helvetica, sans-serif; line-height: 18px;">
I do not recommend using Visio for ongoing, iterative data model development. Forward engineering without a comparison function makes this work too awkward. </div>
</span>Chad Dinermanhttp://www.blogger.com/profile/12096950784001563394noreply@blogger.com1tag:blogger.com,1999:blog-5298882499718010482.post-6459069612974252582012-01-07T16:53:00.000-05:002012-01-07T16:53:20.845-05:00SQL Power Architect<span style="background-color: white; color: #333333; line-height: 18px;"><span style="font-family: Arial, Helvetica, sans-serif;">This is part of continuing set of reviews / opinion pieces about the state of data modeling tools that began over <a href="http://www.unimpededbysanity.com/2011/12/state-of-data-modeling-tools.html" target="_blank">here</a>.</span></span><br />
<b style="background-color: white; color: #333333; line-height: 18px;"><span style="font-family: Arial, Helvetica, sans-serif;"><br />
</span></b><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><b style="background-color: white; color: #333333; line-height: 18px;">Background</b><span style="background-color: white; color: #333333; line-height: 18px;"><br />
</span></span><br />
<span style="background-color: white; color: #333333; line-height: 18px;"><span style="font-family: Arial, Helvetica, sans-serif;">I must say that SQL Power Architect came out of no where for me. I was googling around when I stumbled upon it and it sounded too good to be true. A multi-platform, open source database modeling tool with an Enterprise version available (for pay naturally) if you need advanced features. Watching some of the videos it sounds like this was a tool developed in house that evolved into a product for the company. I couldn't find much more history than that, if anyone has some let me know and I'll add it in here.</span></span><br />
<span style="color: #333333; font-family: Arial, Helvetica, sans-serif;"><span style="line-height: 18px;"><br />
</span></span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><b style="background-color: white; color: #333333; line-height: 18px;">Stability</b><br style="background-color: white; color: #333333; line-height: 18px;" /><span style="background-color: white; color: #333333; line-height: 18px;">I was back to using my small (< 50 table) model and had no stability issues. Rock Solid.</span></span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><br style="background-color: white; color: #333333; line-height: 18px;" /><b style="background-color: white; color: #333333; line-height: 18px;">Price</b><br style="background-color: white; color: #333333; line-height: 18px;" /><span style="background-color: white; color: #333333; line-height: 18px;">SQL Power Architect: </span><i style="background-color: white; color: #333333; line-height: 18px;">~$00.00</i><br style="background-color: white; color: #333333; line-height: 18px;" /><span style="background-color: white; color: #333333; line-height: 18px;">SQL Power Architect Enterprise: </span><i style="background-color: white; color: #333333; line-height: 18px;">$495.00</i><br style="background-color: white; color: #333333; line-height: 18px;" /><b style="background-color: white; color: #333333; line-height: 18px;"><br />
</b></span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><b style="background-color: white; color: #333333; line-height: 18px;">Features</b><br style="background-color: white; color: #333333; line-height: 18px;" /><span style="background-color: white; color: #333333; line-height: 18px;">Common Checklist</span></span><br />
<ul><li><span style="font-family: Arial, Helvetica, sans-serif;"><span style="background-color: white; color: #333333; line-height: 18px;">Logic Modeling: <i>Yes</i></span><span style="background-color: white; color: #333333; line-height: 18px;"> </span></span></li>
<li><span style="font-family: Arial, Helvetica, sans-serif;"><span style="background-color: white; color: #333333; line-height: 18px;">Physical Modeling: <i>No</i></span><span style="background-color: white; color: #333333; line-height: 18px;"> </span></span></li>
<li><span style="font-family: Arial, Helvetica, sans-serif;"><span style="background-color: white; color: #333333; line-height: 18px;">Reverse Engineering: </span><i style="background-color: white; color: #333333; line-height: 18px;">Yes</i><span style="background-color: white; color: #333333; line-height: 18px;"> </span></span></li>
<li><span style="font-family: Arial, Helvetica, sans-serif;"><span style="background-color: white; color: #333333; line-height: 18px;">Forward Engineering: </span><i style="background-color: white; color: #333333; line-height: 18px;">Yes</i></span></li>
<li><span style="font-family: Arial, Helvetica, sans-serif;"><span style="background-color: white; color: #333333; line-height: 18px;">Sub Models:</span><i style="background-color: white; color: #333333; line-height: 18px;"> No</i><span style="background-color: white; color: #333333; line-height: 18px;"> </span></span></li>
</ul><span style="color: #333333; font-family: Arial, Helvetica, sans-serif; line-height: 18px;">This is an interesting contrast to MySQL Workbench. SQL Power Architect insists that you do all of your work in the Logical Model (called the PlayPen Database) and generate SQL to apply to your Physical Model. Of course this gets awkward if you need to update your Logical Model from the actual database.</span><br />
<span style="color: #333333; font-family: Arial, Helvetica, sans-serif;"><span style="line-height: 18px;"><br />
</span></span><br />
<span style="color: #333333; font-family: Arial, Helvetica, sans-serif;"><span style="line-height: 18px;">Once again Auto-Arrange is a very poor choice. So many of the relationships overlap that it is difficult to understand the model. </span></span><br />
<span style="color: #333333; font-family: Arial, Helvetica, sans-serif;"><span style="line-height: 18px;"><br />
</span></span><br />
<span style="color: #333333; font-family: Arial, Helvetica, sans-serif;"><span style="line-height: 18px;">There is no documentation for the Community edition available. To understand how this application works your best bet is to visit the YouTube <a href="http://www.youtube.com/user/giuliosqlpower?feature=BF" target="_blank">SQL Power Channel</a> and play around for a while. In all fairness the Enterprise Edition does have documentation available.</span></span><br />
<span style="color: #333333; font-family: Arial, Helvetica, sans-serif;"><span style="line-height: 18px;"><br />
</span></span><br />
<span style="color: #333333; font-family: Arial, Helvetica, sans-serif;"><span style="line-height: 18px;">The GUI, well it is in Java, so it does not feel native. The scrolling is wrong, either it scrolls too fast or begins scrolling before you would expect it to. In most apps this is isn't a problem but as you're trying to work on your diagram this can get really awkward.</span></span><br />
<span style="color: #333333; font-family: Arial, Helvetica, sans-serif;"><span style="line-height: 18px;"><br />
</span></span><br />
<b style="background-color: white; color: #333333; line-height: 18px;"><span style="font-family: Arial, Helvetica, sans-serif;">Conclusion</span></b><br />
<span style="background-color: white; color: #333333; line-height: 18px;"><span style="font-family: Arial, Helvetica, sans-serif;">This is easily the best open source tool I've seen for data modeling. It is easy to setup, works with JDBC connections and allows you to compare and forward engineer your changes.</span></span><br />
<span style="background-color: white; color: #333333; line-height: 18px;"><span style="font-family: Arial, Helvetica, sans-serif;"><br />
</span></span><br />
<span style="background-color: white; color: #333333; line-height: 18px;"><span style="font-family: Arial, Helvetica, sans-serif;">The downside is that all DB schema changes are expected to originate from it, or be added to the Logical Model manually. And the Logical Model does not support sub-models. And the GUI makes working with the models challenging.</span></span><br />
<span style="background-color: white; color: #333333; line-height: 18px;"><span style="font-family: Arial, Helvetica, sans-serif;"><br />
</span></span><br />
<span style="background-color: white;"><span style="color: #333333; font-family: Arial, Helvetica, sans-serif;"><span style="line-height: 18px;">These issues can be worked around by making a lot of small projects as sub-models. This raises it's own challenges with overlapping entities and synchronizing projects. </span></span></span><br />
<span style="background-color: white;"><span style="color: #333333; font-family: Arial, Helvetica, sans-serif;"><span style="line-height: 18px;"><br />
</span></span></span><br />
<span style="background-color: white;"><span style="color: #333333; font-family: Arial, Helvetica, sans-serif;"><span style="line-height: 18px;">All in all if you are working on an open source project with limited funds this may be your best approach but watch out, you will spend more effort managing this model than you expect.</span></span></span>Chad Dinermanhttp://www.blogger.com/profile/12096950784001563394noreply@blogger.com5tag:blogger.com,1999:blog-5298882499718010482.post-45219033016596549792011-12-31T13:55:00.000-05:002012-01-07T16:52:01.165-05:00The State of Data Modeling Tools (MySQL Workbench)<b><span style="font-family: Arial, Helvetica, sans-serif;">Background</span></b><br />
<span style="font-family: Arial, Helvetica, sans-serif;">I have only begun using MySQL and trying out the WorkBench tools from Oracle was the next logical step. This product was originally DBDesigner 4 from fabForce.net before it's developer was hired by MySQL AB to create a new set of MySQL GUI tools. This application has grown from a physical database designer for MySQL to include additional query and managment features. It is much closer to SQL Server Management Studio than it is to ERwin.</span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><br />
</span><br />
<b><span style="font-family: Arial, Helvetica, sans-serif;">Stability</span></b><br />
<span style="font-family: Arial, Helvetica, sans-serif;">I was using a relatively small (< 50 table) model to work with and the system would crash after about on hour of use. The crashing was more frequent on the Mac side of things than it was when running on the PC but neither were very stable.</span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><br />
</span><br />
<b><span style="font-family: Arial, Helvetica, sans-serif;">Price</span></b><br />
<span style="font-family: Arial, Helvetica, sans-serif;">MySQL Workbench: <i>~$00.00</i></span><br />
<b><span style="font-family: Arial, Helvetica, sans-serif;"> </span></b><br />
<b><span style="font-family: Arial, Helvetica, sans-serif;">Features</span></b><br />
<span style="font-family: Arial, Helvetica, sans-serif;">Common Checklist</span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><br />
</span><br />
<ul><li><span style="font-family: Arial, Helvetica, sans-serif;">Logic Modeling: <i>No</i></span></li>
<li><span style="font-family: Arial, Helvetica, sans-serif;">Physical Modeling: <i>Yes</i></span></li>
<li><span style="font-family: Arial, Helvetica, sans-serif;"> Reverse Engineering: <i>Yes</i></span></li>
<li><span style="font-family: Arial, Helvetica, sans-serif;">Forward Engineering: <i>Yes *</i></span></li>
<ul><li><i><span style="font-family: Arial, Helvetica, sans-serif;">* Note: Does NOT compare to existing schema. It will make fresh. new objects!</span></i></li>
</ul><li><span style="font-family: Arial, Helvetica, sans-serif;">Sub Models:<i> No</i></span></li>
</ul><span style="font-family: Arial, Helvetica, sans-serif;">The auto-arrange does a terrible job distributing the tables, many were overlapping. The relationship lines are not adjustable, they go where the app assigns them to be.</span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><br />
</span><br />
<span style="font-family: Arial, Helvetica, sans-serif;">The GUI makes me think of Eclipse, which is clearly not native on any platform and it generally works as expected. The notation can be changed to fit your conventions and the automatic screen refreshing worked flawlessly (not something to take for granted after using ERwin).</span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><br />
</span><br />
<b><span style="font-family: Arial, Helvetica, sans-serif;">Conclusion</span></b><br />
<span style="font-family: Arial, Helvetica, sans-serif;">This is a very good tool for visually showing developers how a few objects are related to each other and firmly belongs in the MySQL suite of tools.</span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><br />
</span><br />
<span style="font-family: Arial, Helvetica, sans-serif;">Unfortunately, this application is just a bit too primitive to use for anything other than the most basic data modeling. The lack of a Logical Model is less than ideal, but not a deal breaker. I did find the crashing, lack of sub-models, minimal control of the relationship lines and the poor forward engineering very frustrating. I recommend using a piece of graph paper instead.</span>Chad Dinermanhttp://www.blogger.com/profile/12096950784001563394noreply@blogger.com2tag:blogger.com,1999:blog-5298882499718010482.post-46389745576205320372011-12-30T20:00:00.000-05:002012-02-18T15:12:34.101-05:00The State of Data Modeling ToolsLately have had a chance to do some data modeling for a MySQL project and I thought I'd do a series of posts with a brief review the tools for making ERDs. <br />
<br />
<b>Data Modeling Tools</b><br />
<ul>
<li><a href="http://www.blogger.com/blogger.g?blogID=5298882499718010482#ERwin">CA ERwin (see below)</a></li>
<li><a href="http://www.unimpededbysanity.com/2012/01/visio-2011-erd.html" target="">Visio 2010 Professional </a></li>
<li><a href="http://www.unimpededbysanity.com/2011/12/state-of-data-modeling-tools-mysql.html">MySQL Workbench</a></li>
<li><a href="http://www.unimpededbysanity.com/2012/01/sql-power-architect.html">SQL Power Architect (Community Edition)</a></li>
<li><a href="http://www.unimpededbysanity.com/2012/02/erstudio-data-architect.html">Embarcadero's ER/Studio Data Architect (Trial)</a></li>
<li><a href="http://www.unimpededbysanity.com/2012/02/toad-data-modeler.html">Toad Data Modeler (Trial)</a></li>
<li><a href="http://www.unimpededbysanity.com/2012/02/modelright-professional.html">ModelRight Professional (Trial)</a></li>
</ul>
<b><span style="font-size: small;"> </span><span style="font-size: small;">Evaluation Criteria</span></b><br />
<ul>
<li>Stability</li>
<li>Price</li>
<li>Features</li>
</ul>
<span style="font-size: large;"><b><a href="http://www.blogger.com/blogger.g?blogID=5298882499718010482" id="ERwin">ERwin Review</a></b></span><br />
I have been using ERwin for a few years now so I am pretty familiar with this tool. Like it or not it has been around for ages (Logic Works, who originally made ERwin started in 1988) and is the benchmark all other system have to be measured against.<br />
<br />
<b>Stability</b><br />
The most recent version has <i>not</i> been entirely stable, I cannot make it crash on demand but it will occasionally lock up.<br />
<br />
<b>Price</b><br />
ERwin Data Modeler Std Edt + 1 yr Maint: <i>~$3,900</i><br />
<b> </b><br />
<b>Features</b><br />
Common Checklist<br />
<ul>
<li>Logic Modeling: <i>Yes</i></li>
<li>Physical Modeling: <i>Yes</i></li>
<li> Reverse Engineering: <i>Yes</i></li>
<li>Forward Engineering: <i>Yes</i></li>
<li>Sub Models:<i> Yes </i></li>
</ul>
Now having used it for a while I know a number of ERwin's quirks but stepping back and looking at it with a fresh eye, it is dated. The GUI looks like it is still built on an ancient toolkit which is likely why the work area screens require manual refreshes to clean up visual artifacts.<br />
<br />
<b>Conclusion</b><br />
ERwin has the core features I'd like, but it's not entirely stable, the GUI is quirky and the product is expensive. For the price I would expect CA to be rolling updates more frequently to address these issues but ERwin has had minimal updates these last few years.Chad Dinermanhttp://www.blogger.com/profile/12096950784001563394noreply@blogger.com0tag:blogger.com,1999:blog-5298882499718010482.post-60995885872116205722011-01-30T11:56:00.000-05:002011-01-30T11:56:42.601-05:00NoSQL Summary<span class="Apple-style-span" style="font-family: inherit;">The best short description of when not to use NoSQL that I've ever seen is from a reddit comment by bucknuggets</span><br />
<br />
<br />
<blockquote>Assumptions:</blockquote><br />
<ul><li><blockquote>a key-value pair data store isn't optimal storage in the perfect world since data quality is very difficult to manage, reporting is a nightmare, and since most data doesn't map elegantly to key-values app complexity grows.</blockquote></li>
<li><blockquote>but it may be a very effective storage solution at certain scalability thresholds</blockquote></li>
</ul><br />
<blockquote>Then for sites below those thresholds it is premature optimization - that adds unnecessary risk to the project.</blockquote>Chad Dinermanhttp://www.blogger.com/profile/12096950784001563394noreply@blogger.com0