The age of Labor Arbitrage is over; Now it’s Data Arbitrage

Outsourcing was an easy win to reduce IT costs, and executives loved it.  They loved it so much that giants like Accenture now rely on outsourcing for almost half their revenue (in March, Accenture posted $3.91B in outsourcing net revenues, 47% of its overall total).  Even Gartner’s Hype Cycle shows that outsourcing is reaching maturity.

Former Infosys CEO Vishal Sikka said, “We will not survive if we remain in the constricted space of doing as we are told, depending solely on cost arbitrage.”  That is, there’s significantly less money in future “labor arbitrage,” and if companies – especially Global System Integrators – want to continue the kind of 16% per annum growth curves they’ve seen, they have to turn to a different type of arbitrage.  Loosely stated, arbitrage is just the buying and selling of assets in different markets or forms to take advantage of price differences.  Labor was cheap in India and expensive in America. So, voila, outsourcing made sense.

A new kind of arbitrage is emerging: Data Arbitrage.  What exactly do I mean by data arbitrage?  I mean that today we pay a hefty price to get our data where we want it when we want it; and there’s a significant price difference for delivering that same data using a DataOps solution.  Same asset.  Radically different price.  Huge opportunity to leverage the difference.

Specifically, what’s the data arbitrage opportunity for application testing?  Consider 5 arbitrage opportunities:

Data is impersonal.  The data most testers use is either shared among many people (because cost forces them to use too few environments) or made personal at enormous cost (because making 100 copies of data for 100 testers isn’t free).

Arbitrage opportunity: If we can give every tester their own environment but do so at an extremely low cost, we get the benefit of de-coupling different testing pathways without the cost of proliferating hardware and storage to support those pathways.  Through data virtualization, DataOps tools can accomplish exactly that.

Data is insecure.  The dirty secret of many IT shops is that they pay lip service to masking – either they use crude homegrown solutions rife with security holes, or they find ways to “exempt” themselves through exceptions.  Further, those that do mask well usually don’t mask often because of the delay it imposes on getting data and the enormous expense of keeping masked copies around.

Arbitrage opportunity: If we can consistently mask every non-production environment before handoff, we significantly reduce risk.  If we can mask continuously, we can mask often.  And, if we can provide those masked environments without the cost of proliferating hardware and storage, we get much lower-risk data that doesn’t impede our application delivery pipeline.  Through integrated masking and through features that support distributed referential integrity (making sure that a name or a number is masked consistently across heterogeneous data sources), DataOps tools can accomplish exactly that.
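To make “distributed referential integrity” concrete, here is a minimal Python sketch (my own illustration, not Delphix’s implementation): deterministic masking maps the same input to the same masked output, so a value masked in one data source still joins correctly with the same value masked in another.

```python
import hashlib

def mask_value(value: str, secret_salt: str = "rotate-me") -> str:
    """Deterministically mask a sensitive value.

    The same input always yields the same output, so a customer name
    masked in the billing database matches the one masked in the CRM
    export -- that is the referential-integrity property.
    """
    digest = hashlib.sha256((secret_salt + value).encode("utf-8")).hexdigest()
    return "CUST-" + digest[:12]

# Two "heterogeneous" sources that share a key.
billing_rows = [{"customer": "Alice Jones", "amount": 120.50}]
crm_rows     = [{"customer": "Alice Jones", "segment": "enterprise"}]

masked_billing = [{**r, "customer": mask_value(r["customer"])} for r in billing_rows]
masked_crm     = [{**r, "customer": mask_value(r["customer"])} for r in crm_rows]

# The masked key still joins correctly across sources.
assert masked_billing[0]["customer"] == masked_crm[0]["customer"]
```

The salt would have to be managed as a secret and rotated deliberately; the point here is only that determinism is what preserves integrity across sources.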

Data is tethered.  It’s not that we never move it. It’s that once we put data on a host and in a database server, it’s hard to disentangle it from the host and server we put it in.  And since it is hard, we make more barriers to movement to mitigate the risk.

Arbitrage opportunity: If we can radically lower the cost of data mobility in time (moving the time pointer on a dataset) and in space (moving the dataset from one host/server to another), we unlock a productivity avalanche.  A tester can promote a 5 Tb application from one environment to another in minutes, not days.  If mobility is easy and copies are almost free, then a tester can share a bug with a developer almost instantly and then continue working in another pathway without fear of data loss.  The arbitrage opportunity for driving the cost of context switches and the cost of error to near zero can’t be overstated.  Breaking all of the key dependencies in the tester’s workflow has enormous speed and quality consequences because no one depends on anyone else anymore.  And that dramatically lowers the total cost of Concept-to-Realization for any feature of an application.  DataOps tools make near zero-cost, near zero-time context switches a reality, and that means the enormous cost of error we experience and the controls we have around it can both be driven down.

Data is heavyweight.  Data size will get bigger.  And the bigger it is, the harder it is to move and the higher the cost of error when something goes wrong.  Thus, the enormous timelines and contingency plans for anything that looks like migration from one platform to another or one place to another.

Arbitrage opportunity: If we can reduce not only the cost of data mobility but the cost of data provision as well, then the benefits of mobility extend to provisioning.  Not only can we move datasets in time and space with radical ease, we can create net new datasets in the same timeframes.  That’s data elasticity on demand. DataOps tools give you that on-demand elasticity.  It’s like VMware for data.  Spin it up here; spin it down there; repeat.

Data is passive.  While, yes, we do update our data every day – most of that big blob of data we manage is static.  It doesn’t change much, and it doesn’t do much.

Arbitrage opportunity: If we can turn all of the “dead” data we have lying around in copy after copy into live, shared, active data, we get the benefit that our data is being used to its maximum value, which reduces our storage cost for sure but also makes it dead simple to move groups of related data together very rapidly (such as we might do to get to the cloud, migrate data centers, etc.).  With DataOps tools, moving a single database or moving a family of 100 related databases is within a few percentage points of being the same operation in terms of cost and time.  Our concern becomes how related the datasets are, not how large they are, because every bit and byte is used to its maximum advantage.

How big is the arbitrage opportunity?  Business owners should pay attention. Numbers from dozens of real customers and real projects show project timeline savings in the 30 to 50% range, significantly increased testing density, and a massive left shift in testing defects (including a net reduction in defects of 40% or more), on top of storage costs falling 80% or more.  Imagine getting a real SAP or Oracle E-Business Suite project or migration done in half the time.  Now imagine getting it done without any perceptible errors post-launch.  What’s that worth to your business?

Data is fast and simple with DataOps.
Data is impersonal, insecure, expensive, tethered, heavyweight, and passive without it.
If you don’t have a DataOps tool and a strategy to help you exploit Data Arbitrage, get one.

Inclusion begins with Hospitality

A recent NY Times op-ed reminded me how easy it is to leave my own biases unchallenged and thus misunderstand what inclusion really means.  On reflection, my own past failures to be inclusive were born of:

  • My implicit bias – without vigilance, practice, and reliance on data, it’s easy to arrive at conclusions that satisfy us but aren’t supported by evidence.  Moreover, a preconceived notion or a foregone conclusion gives license to devalue ideas that clash with it.  Bias extends even to those who consider themselves objective (e.g., scientists), as aptly argued by Thomas Kuhn.
  • My context clouds everything – we all arrive at facts within our own context.  We can’t escape that context.  We can only be aware of it.
  • My navel-focus – Companies place a premium on profit, accomplishment, and teamwork.  But, when careers or deals are on the line, it’s seductively easy to let a deal or career ambition become more important than our mutual dignity.

In response to these realizations, I find my own epistemology of inclusiveness is emerging.  I know I am inclusive:

  • if I am aware of great power disparities within groups, and I make sure the voice of the less powerful person or group is heard.
  • if I insist on inviting and hearing the person that is least like me.
  • if dignity trumps other motives.

Having been on both sides of the power divide, I’ve arrived at some simple insights on inclusive behaviors:

  1. Much like a good host always makes their guests feel welcomed and special, an inclusive environment allows every person and viewpoint to have a seat at the table and a place in the conversation.
  2. Much like a good host makes all their guests feel comfortable with one another (even if they don’t know everyone), an inclusive environment allows very different people to feel comfortable working together.
  3. Disagreement shouldn’t make us disagreeable.  A good host can be affable with friend and foe alike, and attaches no prerequisites to extending respect and dignity to every person.

Even the best workplaces (and Delphix is certainly one!) have ample situations where bias, disrespect, and exclusionary behaviors occur.  I know they have happened to me; I’m sure they’ve happened to you.

For companies born as startups, disruption is the cornerstone of success.  Disrupting my own thinking is an important first step in closing my own inclusion gap.  Though many more steps need to be taken, I do draw this simple lesson: Inclusion begins with hospitality.


How does the Delphix Dynamic Data Platform support Oracle vs. SQL Server?

One of our premier partners sent me a message last week asking me to walk through the differences between how Delphix is implemented on Oracle vs. SQL Server.  If you are unfamiliar with the Delphix Dynamic Data Platform (DDP), this blog won’t make sense to you until you’ve read through Oracle Support and Requirements.  This blog provides an overview of those differences through the key perspectives that matter to technical folks implementing or explaining the platform.

Permissions

  • Similarities
    • Access. Both Oracle and MS SQL Server need users that can access data.  Both need basic permissions to read backup data from, and to access, the Source (usually production) host and database server.  When the Source and either the Staging or Target Hosts are in different places, extra permissions may be needed.
  • Differences
    • Table of MS SQL Server permissions, by component:
      • Environment – Delphix OS user (Windows domain user): member of Backup Operators or Local Administrators; db_datareader permission on master; sysadmin role on the SQL Server instance; the SQL instance should run as a domain user or local service account; PowerShell execution policy set to Unrestricted; iSCSI service set to start automatically; read permission on the backup share; Delphix Connector installed and addhostgui.cmd executed.
      • Database – Delphix SQL DB user (SQL authentication account): db_datareader permission on master and msdb; db_backupoperator on user databases.
      • Network: TCP/IP enabled for JDBC; firewall open for port 1433 (default); Shared Memory enabled.
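One practical way to sanity-check a few of the requirements above before an engagement is a quick script run as the account you intend to use.  This is a hypothetical helper, not Delphix tooling; the driver string, server name, and credentials are placeholders, and it only covers the role checks (not the PowerShell, iSCSI, or backup-share items).

```python
import pyodbc

# Placeholder connection details -- adjust for your environment.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=staging-host;DATABASE=master;UID=delphix_db;PWD=***",
    autocommit=True,
)
cur = conn.cursor()

def role_check(database, role):
    """True if the connected login is a member of `role` in `database`."""
    cur.execute(f"USE [{database}]")
    result = cur.execute("SELECT IS_ROLEMEMBER(?)", role).fetchone()[0]
    return bool(result)

# sysadmin is a server-level role, so it is checked with IS_SRVROLEMEMBER.
is_sysadmin = bool(cur.execute("SELECT IS_SRVROLEMEMBER('sysadmin')").fetchone()[0])

print("sysadmin on instance    :", is_sysadmin)
print("db_datareader on master :", role_check("master", "db_datareader"))
print("db_datareader on msdb   :", role_check("msdb", "db_datareader"))
```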

Data Collection & Connection

  • Similarities
    • Native Backup. In general, the Delphix Dynamic Data Platform (DDP) ingests data through native backup.
    • Recovery Model. Generally, we need to develop an understanding of how often the backups run, where they live, and how we gain access to those backups so that we are able to do that ingestion.
    • Use of Database Primitives. Most databases keep a pointer (aka a database primitive) to identify transactions.  Backups are often keyed to these primitives.  For example, you typically must have a continuous stream of transactions associated with these primitives to maintain consistency; if you break the chain, you effectively push the reset button (in Oracle, for example, breaking the chain forces a RESETLOGS event) and your next backup looks like a new database.  (A small sketch of this chain check follows the list below.)
  • Differences
    • Backup Facility.  Oracle’s native backup facility is RMAN in its various modes (Level 0, Level 1, etc.).  In SQL Server, the Delphix DDP relies on the customer’s own native SQL Server backups in their various Recovery Models (Simple, Full), which may include T-logs. The Delphix DDP can use pre-existing or new native SQL, LiteSpeed, and Redgate backups located on an SMB share.
    • Need for Staging Server.  The Delphix DDP implementation for Oracle does not need a Staging Server.  We read directly from the Oracle database server using the RMAN facility in modes that mimic both backup and log streaming.  In SQL Server, we must use a staging server where we can ingest those backups.  That staging server has storage allocated directly from the Delphix engine.  It is this storage that allows us to manipulate the data (after it has been ingested) through an always-recovering staging database.  The Staging Server must contain an instance of SQL Server which matches the version found on the Source (but doesn’t have to exactly match the Target). To the Delphix DDP, there is no functional difference between the staging and the target server, except that the O/S user that owns the instance on the “Staging” server needs to be able to reach production.  On the target server, that same owner does not.  So, the staging server O/S user has a superset of the privileges that a target server owner would have.
    • Name of Database Primitive.  In Oracle, the database primitive is called the SCN (System Change Number), whereas in Microsoft SQL Server it is called the LSN (Log Sequence Number).
    • Type of Backups. The type of backups you are doing affects the freshness and granularity of the Delphix DDP TimeFlow.  See: Delphix TimeFlow in Oracle vs. SQL Server. For SQL Server, the Delphix DDP also provides the capability for Delphix to take its own copy-only backup, which has no impact on the log chain.
    • Connector/Point of Access to Host.  Unlike adding an Oracle source, when we add MS SQL databases the Delphix DDP needs to use a connector (a small app that allows Delphix to communicate with the server).  We want to be as unintrusive as possible with Delphix.  So, we don’t want to install a connector on your prod server, since we only need the backup.  Instead, we install the connector on the Staging Server and the Target Server.  On the Staging Server, the operating-system owner of the SQL instance into which we will be recovering your production data needs to be able to find your database and its backups, read them, and ingest them into that staging server.  This is usually not a big deal if you are in the same data center, LAN, and domain.  Customers with different domains for their target, or with a separation between Staging and Production, require permissions to be granted either across domains (a cross-domain trust) or specifically to that user so they can access those backups on the production side.
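The “unbroken chain” idea from the Use of Database Primitives bullet above is easiest to see in code.  Here is a rough, simplified Python sketch using made-up LSN-style integer ranges (not Delphix logic, and real SQL Server LSNs are not simple integers):

```python
from dataclasses import dataclass

@dataclass
class LogBackup:
    """Simplified view of a log backup's primitive range (LSN-style)."""
    first_lsn: int
    last_lsn: int

def chain_is_unbroken(full_backup_lsn, log_backups):
    """True if every log backup picks up where the previous one ended.

    A gap means the chain is broken: ingestion has to start over from a
    new full backup (the 'reset button' described above).
    """
    expected = full_backup_lsn
    for log in sorted(log_backups, key=lambda b: b.first_lsn):
        if log.first_lsn > expected:     # a backup in the middle is missing
            return False
        expected = max(expected, log.last_lsn)
    return True

# Example: the 300-400 log backup was lost, so the chain is broken.
print(chain_is_unbroken(100, [LogBackup(100, 200), LogBackup(200, 300),
                              LogBackup(400, 500)]))   # -> False
```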

Data Presentation

  • Similarities
    • Common Delphix Features. Delphix Virtual Databases are generally treated the same within the Delphix DDP in terms of their ability to utilize the controls and features, particularly the data control features: Reset, Refresh, Rollback, Bookmark, Branch, etc.
  • Differences
    • Protocol.  SQL Server VDBs are presented to Target Hosts via iSCSI.  Oracle VDBs are presented via NFS v3.  Whereas the Delphix DDP uses NFS v3 for POSIX environments such as Oracle, it uses iSCSI for Windows O/S environments.  Crucially, the iSCSI that the Delphix DDP uses is NOT a hardware solution; we use software-based iSCSI.  This may require some configuration of the iSCSI services on the staging environment servers.

Supported Versions

Delphix Features: TimeFlow

  • Similarities
    • The Delphix DDP uses TimeFlow to represent the state of the database (or of a Container) in 2 ways:
      • SnapSync Cards – These represent the equivalent of a complete backup of a dataset as of a specific point in time.
      • LogSync Transaction-Level Points – These represent each of the individual transaction boundaries, uniquely identified by the database primitive.
  • Differences
    • Log Sync.  LogSync for Oracle is forward-facing; LogSync for SQL Server is backward-facing, bounded by the last time a new T-log was opened.  Since LogSync can take advantage of Oracle online redo logs, it can build the TimeFlow out in front of the last SnapSync card that was taken.  For SQL Server, the TimeFlow can be granular, but the granularity is a function of the last time the T-log was taken and never increments past that border.  A toy sketch of this difference follows.
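The sketch below is a toy Python model, not the actual Delphix data structures: snapshots anchor the timeline, and the freshness of the captured log stream determines how far past the last SnapSync card you can provision.

```python
from dataclasses import dataclass

@dataclass
class TimeFlow:
    """Toy model of a TimeFlow timeline, in database-primitive units."""
    snapsync_cards: list          # full-snapshot points (SCN/LSN values)
    last_log_captured: int        # newest change covered by captured logs

    def newest_point(self):
        """The newest point you can provision to.

        For Oracle, LogSync follows the live redo stream, so
        last_log_captured keeps advancing past the last SnapSync card.
        For SQL Server, last_log_captured only advances when a new
        T-log backup is taken, so the TimeFlow stops at that border.
        """
        return max(max(self.snapsync_cards), self.last_log_captured)

# Same snapshots; the only difference is how fresh the captured log stream is.
oracle_tf = TimeFlow(snapsync_cards=[1000, 2000], last_log_captured=2750)
mssql_tf  = TimeFlow(snapsync_cards=[1000, 2000], last_log_captured=2400)
print(oracle_tf.newest_point(), mssql_tf.newest_point())   # 2750 2400
```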

Architecture Diagram

Oracle:

SQL Server:

The Digital Transformation Divide

A few days ago at the NASDAQ center in San Francisco, I caught up with MRE CIO Ken Piddington, who also serves as an Executive Advisor to CIOs.  “Top of mind with CIOs and IT shops I’m talking to,” said Ken, “is Data Transformation.”  In fact, he often hears key players tell him, “I’m part of the Data Transformation Group.”   The problem is that Data Transformation has come to mean so many different things to CIOs that it’s hard to define, and even harder to relate new data innovations to their journey.

Digital transformation is a data-driven interconnectedness that impels hyper-awareness, well-informed decision-making, and rapid execution.   Within this context, three key innovations are changing the Data Transformation Journey for CIOs:

  • Data is free to roam
    • Applying the principles of DataOps* to thin/sparse clones has effectively decoupled Database Servers from their Content.  It used to be that moving data (like a 5 Tb ERP app) was torturous, requiring lots of time and expertise.  But, DataOps solutions give Data Scientists, Analysts, Developers and Testers the power to provision fresh, personalized and secure copies of such environments in minutes.  The kicker is that these copies are mobile and untethered from the Data Producer.  Moving my 5 Tb ERP from Amazon to Azure can be accomplished in 10 minutes.  In fact, such solutions make it simple both to cross the cloud boundary and to move between clouds.  That’s powerful.
  • Data Encapsulation amps up our velocity
    • We’re realizing in the data community what developers knew all along: just like encapsulation untangled code and made it far easier to scale, encapsulating data and the controls we need for it is accomplishing massive scale for Data Consumers.  By setting embedded data controls at “dataset creation time”, Data Operators (who want to make sure secure data never gets out) can control access, portability, masking, and a whole host of other available controls that persist with the dataset.  This untethers those Data Operators from those Data Consumers.  With security in place and persistent, Data Consumers use the data where they want, move it where they want (within the rules), and never have to go back for permission.  It seems simple, but the request-to-provision step of our Data Supply Chain is often the most cumbersome, slowest, and most bottleneck-prone part of the application delivery cycle for almost everyone who builds applications.
  • Data Synchronicity is a lot less expensive
    • Many make a distinction between “physical” transformations (like converting from Unix to Linux) and “logical” transformations (such as you might do with your ETL).  But the dirty little secret of ETL (and of MDM, for that matter) is that a huge chunk of the time spent has to do with time logic (e.g., how can I put data from sources A, B, and C in the right order when they arrive out of order?).  DataOps solutions also contain features that place the entire histories of many datasets at your fingertips.  Yes, you can ask for the content of Sources A, B, and C as they looked at the same point in time (not the time you received the file); a minimal sketch of that idea follows this list.  All the effort to massage data to get it to match up in time is simply unnecessary if you control the time pointer.  Again, it seems simple, but the reset-to-a-common-point step of our Data Supply Chain is another cumbersome, slow, and involved process that slows down our application delivery cycle.
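Here is the sketch referenced above: a minimal, generic Python illustration (not any particular DataOps product’s API) of asking three sources for their state as of one common point in time rather than whenever their files happened to arrive.

```python
from bisect import bisect_right
from datetime import datetime

# Each source keeps an ordered history of (timestamp, version-id) pairs.
histories = {
    "A": [(datetime(2018, 6, 1, 1), "a-01"), (datetime(2018, 6, 1, 9), "a-02")],
    "B": [(datetime(2018, 6, 1, 2), "b-01"), (datetime(2018, 6, 1, 7), "b-02")],
    "C": [(datetime(2018, 6, 1, 3), "c-01"), (datetime(2018, 6, 1, 11), "c-02")],
}

def as_of(history, point):
    """Latest version at or before `point` (None if the source didn't exist yet)."""
    times = [t for t, _ in history]
    i = bisect_right(times, point)
    return history[i - 1][1] if i else None

# One common time pointer for all three sources -- no time-massaging logic needed.
point = datetime(2018, 6, 1, 8)
print({name: as_of(h, point) for name, h in histories.items()})
# {'A': 'a-01', 'B': 'b-02', 'C': 'c-01'}
```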

Data Interconnectedness offers challenges we don’t yet understand.  What we do know is that 84% of companies fail at digital transformation.  They fail because they believe data mobility is still hard.  They fail because they still operate as though data is anchored and bounded by the vendor’s server in which it is stored, or because they fear data leakage through security controls that are loosely coupled to the data.  And, they have yet to take advantage of the simplification DataOps solutions can bring to complex, composite applications.  The old adage is still true: when you don’t know what to manage, you manage what you know.

New Destinations for your Data Journey

For CIOs just learning about DataOps, there are clear benefits for their journey to digital:

  • DataOps solutions give you the power to commoditize cloud providers, and make the cloud boundary fluid.
    • Since your dataset is mobile and secure and decoupled, there’s no reason you can’t move it seamlessly and quickly from Amazon to Azure in minutes.  Moreover, you can decide to move a dataset from your prem up to the cloud or from the cloud back to prem in minutes.  Switching costs have fallen dramatically, and cloud vendor lock-in can be a thing of the past.
  • DataOps solutions kill the friction between Data Producers and Data Consumers making App Development and tasks like Cloud Migration much faster.
    • The security and process bottlenecks your Developers, Testers, Analysts and Data Scientists experience accessing the data they need will diminish dramatically.  Setting masking and access controls at creation time keeps Data Consumers in a safe space.  Giving data consumers direct control over all of the usual operations they want to do (rollback, refresh, bookmark, etc.) squelches down all those requests to your infrastructure team to near zero.  Applications move forward at the speed of developers and testers, not the speed of your control process.  Longitudinal studies show this can result in a 30-50% increase in application delivery velocity.
  • DataOps solutions also amp up the speed and velocity of composite applications.
    • A lot of times, it doesn’t matter how fast you can deliver one app; it’s how fast you can deliver them all.  By giving you time-synchronized access to not just one but many datasets, all sorts of problems disappear.  You can create an end-to-end test environment for your 40 applications and it can be up in hours not months.  You can roll the whole thing back.  You can have all the fresh data you need to feed your ETL or your MDM or your data lake on command.   Data Virtualization makes those datasets not only fast and mobile, it makes them cheap too.

DataOps is disrupting our assumptions and our approaches to Data Transformation.  And, it’s the right concept to help those folks in the “Data Transformation Group” cross the digital divide.

DataOps is the alignment of people, process, and technology to enable the rapid, automated, and secure management of data. Its goal is to improve outcomes by bringing together those that need data with those that provide it, eliminating friction throughout the data lifecycle.

A Declaration of Data Independence

Your business must dissolve the barriers that continue to lock-in your data, and arrive at a data-driven interconnectedness that impels hyper-awareness, well-informed decision-making, and rapid execution.  A respect for the difficulties of Digital Transformation demands Data Operators and Consumers declare the causes behind such disentanglement.

Data usage should be friction-free, imbuing Data Consumers with the power to see and access authorized data in all its versions without regard to location, cloud platform, or storage vendor.

To secure these capabilities, solutions are implemented by businesses to deliver data to its Consumers under the care and consent of the Operators who govern it.  And whenever any platform, vendor or process becomes destructive of these ends, it is the right of Data Producers to avoid such obstacles, and to institute new methods to converge Data and Operations, laying their foundation on such principles as shall seem most likely to effect their Data Access, Mobility, and Security.  Prudence, indeed, has dictated that solutions long-established should not be changed for light and transient causes; and accordingly experience has shown that many companies and projects and IT shops are inclined to suffer such pain rather than to right themselves by abolishing such obstacles.  But, when a long train of exorbitant switching costs, project delays, quality failures, security breaches and data transport costs foments such Friction as to hold data ever-more captive, it is the right, indeed it is the duty, of disruptive companies to throw off such barriers and to provide capabilities to truly safeguard our future Data Liberty.  Such has been the patient forbearance of many companies; and such is now the necessity which constrains them to disrupt that Friction.  Our present system of virtualizing, securing, and managing data presents a history of repeated injuries to our simplest and most vital goals: growth, cost reduction, risk containment, and speed to market.  And it is data Friction that has established itself in Tyranny over our data.  To wit:

Friction routinely prevents access to data, despite data’s vast contribution to the health of the business, because of the fear of loss of control or exposure.

Friction forbids Data Operators from passing on data of immediate and pressing importance, requiring Data Consumers to return for the Assent of Data Operators which Assent is then also gated by ticketing systems that Friction permits to utterly neglect to attend to those same Data Consumers.

Friction impedes Data Operators from their desire to accommodate large datasets, unless those Data Consumers relinquish the right to receive data in a timely fashion, a right of enormous value to them.

Data Consumers often need their data in places unusual, uncomfortable, and distant whence the data was produced, and the Friction of delivering authorized, fresh data to such places fatigues those Consumers into accepting stale data and quality lapses.

Friction confounds the desire of Data Operators to deliver by opposing Data Consumer’s needs with the limitations of Systems which leave Operators under-equipped and constrained, thus allowing Friction to trample on the needs of those Consumers.

And Operators find that after a long time, the mounting menace of data breach causes such Friction that it engenders ever-tighter access and deployment controls instead of permitting authorized Consumers at large to deploy at will within a well-defined, and personal governance framework; thus, our speed to market is constantly under danger of project delay within, and data breach without.

Our data population continues to rise among all our systems, and the size of our datasets continues to obstruct our ability to harmonize change in copies near and far. Thus, more and more Operators must refuse to create new copies or to pass along changes in a timely fashion or engage in migrations – as the value of data is judged less than the cost of the infrastructure and resources to deliver it.

At every stage of Data Oppression, we have sought Redress in the most humble terms: Our repeated workarounds and improvements have been answered only by more limited access, greater immobility, and a governance regime that stifles speed.  Data Friction constrains the value of Data by these various acts of Data Tyranny, and solutions that perpetuate it are truly unfit to guide the lifecycle of data that has been liberated.

We have not been wanting in our attention to our Data Systems.  We have replaced them from time to time with incremental solutions to extend by some small measure their scalability and performance.  We have tried to address the data explosion with emigration to private, public and hybrid clouds. We have appealed to visionaries to find some way to virtualize the last great frontier in IT – our data. And, we have conjured solutions to tie together sprawling data that is in constant flux, and inevitably subject to the limits of bandwidth and the shipment of change.  We must, therefore, acquiesce in the necessity, and hold Data Friction our sworn enemy in the war to win markets and move data.

We, therefore, the proponents of the DataOps movement, do in the name and by the authority of the Data held hostage by Friction publish and declare that our data is and of right ought to be free.

What’s wrong with Test Data Management? Friction. And lots of it.

Market share, profitability, even business survival can be a function of feature deployment frequency.  And, your competitors are speeding up.  The best companies are deploying 30x faster, delivery times are dropping as much as 83%, and unicorns like Amazon are now deploying new software every second.  But, with data expected to grow to 44 zettabytes by 2020, all of our work to reduce coding friction and speed up application delivery will be for naught if we can’t reduce the friction in getting the right data to test it.

Companies face constant tension with test data:

  • Feature delivery can take a hard shift right as errors pile up from stale data or as rework enters because new data breaks the test suite.  Why is the data out of date? Most companies fail to provision multi-Tb test datasets in anywhere near the timeframes in which they can build their code. For example, 30% of companies take more than a day and 10% more than a week to provision new databases.
  • To solve the pain of provisioning large test datasets, test leaders often turn to subsetting to save storage and improve execution speed. Unfortunately, poorly crafted subsets are rife with mismatches because they fail to maintain referential integrity. And, they often result in hard-to-diagnose performance errors that crop up much later in the release cycle.  Solving these subset integrity issues often comes at the cost of employing many experts to write (seemingly endless) rulesets to avoid integrity problems that foul-up testing.  Unfortunately, it’s rare to find any mitigation for the performance bugs that subsetting will miss.
  • It’s worse with federated applications.  Testers are often at the mercy of an application owner or a backup schedule or a resource constraint that forces them to gather their copy of the dataset at different times.  These time differences create consistency problems the tester has to solve because without strict consistency, the distributed referential integrity problems can suddenly scale up factorially.  This leads to solutions with even more complex rulesets and time logic.  Compounding Federation with Subsetting can mean a whole new world of hurt as subset rules must be made consistent across the federated app.
  • Synthetic data can be essential for generating test data that doesn’t exist anywhere else.  But, when synthetic data is used as a band aid to make a subset “complete”, we re-introduce the drawbacks of subsets.  To reach completeness, the synthetic data may need to cover the gap where production data doesn’t exist, as well as determine integrity across both generated and subset data.  Marrying synthetic data and subsets can introduce new and unnecessary complexity.
  • Protecting your data introduces more speed issues.  Those that mask test data typically can’t deliver masked data fast or often enough to developers, so they are forced into a tradeoff between risk and speed – and exposure usually trumps speed when that decision is made.  As a Gartner analyst quipped: 80% of the problem in masking is the distribution of masked data.  Moreover, masking has its own rules that generally differ from subsetting rules.
  • Environment availability also prevents the right data from getting to the right place just in time.  Many testers use a limited number of environments, forcing platforms to be overloaded with streams; the resulting sharing and serialization cause delay, rework, and throwaway work.  Some testers wait until an environment is ready.  Others write new test cases rather than wait, and still others write test cases they know will be thrown away.
  • Compounding this problem, platforms that could be re-purposed as test-ready environments are fenced in by context-switching costs.  Testers know the high price of a context switch, and the real possibility that switching back will fail, so they simply hold their environment for “testing” rather than risk it.  Behaviors driven by the cost of context-switching create increased serialization, more subsetting, and (ironically), by “optimizing” their part of the product/feature delivery pipeline, testers end up contributing to one of the bottlenecks that prevent that pipeline from moving faster globally.
  • Reproducing defects can also slow down deployment.  Consider that quite often developers complain that they can’t reproduce the defect that a tester has found.  This often leads to a full halt in the testing critical path as the tester must “hold” her environment to let the developer examine it.  In some cases, whole datasets are held hostage while triage occurs.
  • These problems are all subsumed into a tester’s most basic need: to restart and repeat her test using the right data.  Consider, then, that repeating the work to restore an app (or worse a federated app), synchronize it, subset it, mask it, and distribute it scales up the entire testing burden in proportion to the number of test runs.  That’s manageable within a single app, but can quickly grow unwieldy at the scale of a federated app.

Blazing fast code deployment doesn’t solve the test data bottleneck.  Provision speed, data freshness, data completeness, data synchronicity and consistency within and among datasets, distribution speed, resource availability, reproducibility, and repeatability all contribute to longer deployment cycles.  Why is all this happening? Your test data is Not Agile.

How do you get to Agile Test Data Management?  One word: Delphix.

Data, Delphix, and the Internet of Things

We have many devices, our devices are doing more, and more things are becoming “devices”.  This is the essence of the Internet of Things (IoT), which Gartner calls the “network of physical objects that contain embedded technology to communicate and sense or interact with their internal states or the external environment.”

The IDC Digital Universe study concluded that the digital world would reach 44 zettabytes by 2020, with 10% of that coming from IoT.  That would mean that IoT data will account for as much data in 2020 as all the data in the world in 2013.   All of that data needs to be aggregated, synchronized, analyzed, secured, and delivered to systems inside of companies, and that’s where the opportunity comes in.

Growth in the Internet of Things, NCTA

That same IDC study showed 5 ways that IoT will create new opportunities:

Source: EMC/IDC

How does Delphix help companies with these 5 new opportunities?

New Business Models

  • It’s not how fast you get the data, it’s how fast you can respond to the data.  Delphix can deliver IOT data from a point of arrival to an analytic system, or drive much faster feature delivery for a feature you know you need based on the data.

Real Time Information on Mission Critical Systems

  • Data is one of the most difficult workloads to move around, but Delphix makes moving those workloads easy.  At a few clicks of a button, fresh data can push to your Analytics solution.  Or, you could easily cross the boundary and push your data up into the cloud for a while, then bring it back once the workload was finished.

Diversification of Revenue Streams

  • Monetizing services means delivering features faster and delivering applications faster. Data is the bottleneck in most of these kinds of deliveries, and Delphix has proven time and again that it can speed up application delivery time cycles by 30-50%.

Global Visibility

  • Powerful virtualization software can make far-flung datasets seem right next door, cranking up the speed of insight.

Efficient, Intelligent Operations

  • Delphix is expert at using virtual data to solve the data movement problem, giving your data the kind of agility that can let you actually achieve on-the-fly decision making.

At Delphix, we’re already talking to some of the key large-scale entrants into the IoT space.  Take power and electric companies, for example.  The applications and opportunities for Delphix around key utility initiatives like Smart Metering are manifold.  Specific states, regulatory bodies, and power grids may utilize Delphix to create or mobilize aggregate datasets to support new features in service delivery, service provision, or analytics.  Or, customers may start connecting smart appliances into a home grid that allows them to tune their electricity usage up or down, and that data may flow up to electric companies that need to do standard feature development for new service offerings.  The list goes on and on.

IoT data poses the same problems that companies have today with data mobility/agility and application delivery – problems that Delphix already solves.  From a new-frontier perspective, IoT data’s new opportunity spaces demand faster decision time, faster feature delivery time, and powerful ways to get data analytics done faster.  Delphix is uniquely poised to help with all of those problems as well.

Data Gravity.

Data has gravity.  And just like gravity can warp the space-time continuum, data can warp your project timeline.

The analogy of data in the information realm being like “mass” in the physical realm seems pretty clear.   But, gravity has other effects like bending light and dilating time.  Just like the gravity around a very massive object (like a star) can bend light, the size of your data can bend the path of data activities that get near that “mass” of data – data management activities like backup, copy, restore, and refresh as well as data mediation activities like ETL, Data Masking, and MDM.  And, just like “slow moving” masses seem to spend comparatively more clock time, your “slow moving” data will eat up more clock time compared to your nimbler competitors.

Data Gravity and the 4 key resources

Storage: A big dataset needs big datafiles.   A big datastore needs big backups.  Every copy you make for someone to use needs the same. And there can be a lot of copies and backups.  If the average app has 8 to 10 copies of production data and uses a 4-week backup cycle, you could easily be looking at 8 copies live on disk + 9 systems (production plus 8 copies) * 4 weekly full backups (and a bunch of incremental data).  That big dataset really has a data mass that’s 44 times its size.  That’s a lot of Data Gravity – even if you’re looking at a very average 5 Tb dataset. (About 220 Tb!)

Network: For your dataset alone, you’ve probably got at least 5 major traffic patterns: (1) transactions to the dataset itself, (2) backups of the dataset, (3) provisions and refreshes of copies of your dataset, (4) replication of your dataset to another live copy, and (5) migration of a dataset from one location to another (like moving to the cloud).  If the average app changes 5% a day, gets backed up full weekly/incremental daily, you refresh your downstream environments once a week (not even considering provisions), and you replicate production to at least one site, you could easily be looking at moving a data mass 10x the size of your dataset every week – again without even considering provisions or migrations or backups of non-production data.  (Transactions [5Tb*5%*7d] + Backup [5Tb + 6d*5%*5Tb] + Refresh [5Tb*8copies*1/week] + Replication [5Tb*5%*7 days])

Memory: Let’s say it’s not unreasonable for the memory set aside to service a dataset to be 25% of its size.  For your dataset, you’ve probably got memory allocated to service each copy of your data (e.g., in Oracle).  There’s also a bunch of memory for processes that do all those storage transactions we talked about in Storage.  So, without breaking a sweat, we could say that for the 8 copies, we’re using 25% * 8, or memory equivalent to a data mass 2x the size of your dataset (and we ignored all the memory for copy, provision, backup, refresh…). (That would be 2x 5 Tb, or 10 Tb of memory.)

CPU: It takes a lot of CPU horsepower to handle (1) transactions, (2) backups, (3) provisions/refreshes, (4) replication, and (5) migration.  If we just use the number from our “network” example, and assume that you can sustain about 800 MB/second on 8 cores, that would yield about 50 Tb / 800 MB/sec ≈ 65,500 seconds, or ~18 CPU-hours at full tilt.   Using our previous math, if we estimate the CPU “load” of the live dataset to be around 0.64 CPU-hours (5% change * 7 days * 5 Tb ÷ 800 MB/sec), we’re using CPU equivalent to ~28x the need of our production data mass.
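Since the arithmetic above is easy to lose track of, here is the whole baseline estimate as a small Python sketch.  The percentages, copy counts, and the 800 MB/s-on-8-cores throughput figure are the same assumptions used in the prose, not measurements.

```python
# Baseline "data gravity" estimate for a 5 Tb dataset (assumptions from the prose above).
DATASET_TB  = 5
COPIES      = 8        # non-production copies of production
CHANGE_RATE = 0.05     # 5% data change per day
WEEKS_KEPT  = 4        # weekly full backups retained
MB_PER_TB   = 1024 * 1024
THROUGHPUT_MB_S = 800  # ~800 MB/s sustained on 8 cores

def cpu_hours(tb):
    """Hours of full-tilt CPU needed to push `tb` terabytes at 800 MB/s."""
    return tb * MB_PER_TB / THROUGHPUT_MB_S / 3600

# Storage: 8 live copies + weekly fulls for 9 systems (production + 8 copies).
storage_multiple = COPIES + (COPIES + 1) * WEEKS_KEPT                     # 44x -> ~220 Tb

# Network per week: transactions + full/incremental backups + refreshes + replication.
network_tb = (DATASET_TB * CHANGE_RATE * 7                     # transactions
              + DATASET_TB + 6 * CHANGE_RATE * DATASET_TB      # weekly full + 6 daily incrementals
              + DATASET_TB * COPIES                            # refresh 8 copies once a week
              + DATASET_TB * CHANGE_RATE * 7)                  # replication of change -> 50 Tb (10x)

# Memory: ~25% of dataset size allocated per copy.
memory_tb = 0.25 * COPIES * DATASET_TB                                    # 10 Tb (2x)

print(storage_multiple)                                   # 44
print(network_tb)                                         # 50.0
print(memory_tb)                                          # 10.0
print(round(cpu_hours(network_tb), 1))                    # ~18.2 CPU-hours
print(round(cpu_hours(DATASET_TB * CHANGE_RATE * 7), 2))  # ~0.64 CPU-hours for production alone
```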

What would happen if we could change the laws of gravity?

The data virtualization magic of Delphix lets your data look the same in two places at once without actually making two full copies.  What does Delphix do?  It combines key technologies that allow block-level sharing, TimeFlow/shared data history, shared compressed memory cache, and virtual provisioning.  Delphix can help you change the laws of data gravity.  Let’s take a look at how these key Delphix technologies would change your consumption of our 4 key resources:

Storage: Delphix shares all of the blocks that are the same.  So, when you make those 8 copies of production data, you start near 0 bytes because nothing has changed yet. When you make your “full” backup, it often hasn’t changed much since the last “full” backup, so it’s also much smaller.  And, since your copies also share the same blocks, all of their files and backups get to share this way as well.  In addition, Delphix compresses everything in a way that doesn’t hurt your access speed. Further, the more often you refresh, the smaller each copy gets.  To keep the same mass of data as in our example above (assuming compressed data is 35% the size of the original data), Delphix would need on the order of 0.35*((1+7*0.05)+(8*7*0.05)) = 0.35*(4.15) = 1.45x the original size.  From 44x down to 1.45x.  That’s at least an order of magnitude no matter how you slice it.

Network: Delphix can’t change the mass of your transactions to the primary dataset.  But, the sharing technology means that you get a full backup for the price of an incremental forever after the first pass.  Since provisions are virtual, the network traffic to create the provisioned copy is almost nil (e.g., if you provision off a validated sync on an Oracle database) – network traffic is a function of the change since the last baseline, because that’s the major part of the data that Delphix has to send to get your virtual dataset up and running.  Replication works on similar principles: Delphix only ships change data after the first pass.  Migration is even more powerful.  If you’re moving a virtual dataset from one existing Delphix host to another, it’s a few clicks more than a simple shutdown and startup.  That’s powerful.   To transmit the same mass of data as in our example above (even ignoring compression), Delphix would need on the order of (Transactions [5% * 7 days] + Backup [ZERO (although 7*5% occurs, it’s already included in transactions!)] + Refresh [8 copies * 5% * 1/week] + Replication [5% * 7 days]) = 1.1x the original size.  From 10x down to 1.1x.  That’s about an order of magnitude before we consider compression.

Memory: Delphix uses massively shared compressed cache.  (To be clear, Delphix does NOT change the memory on the host that’s running your database; But, it CAN change how much memory you need to run it).  Memory makes data transactions fast (fetching from memory can easily be 10x faster than disk).  Today, we size memory for copies in much the same way as for the production.  The big assumption there is that there will be a cost to fetch from disk if we don’t.  But, what if “going to disk” was a lot cheaper a lot more of the time?  The Delphix cache changes those economics.

Our previous example required 10 Tb of total memory for 8 copies.  If we assume traffic on one copy is shaped like traffic on other copies, then we could infer an upper boundary of unique data in all those caches at around 5 Tb * (1 + 8*5%, an allowance for blocks unique to each copy), or about 7 Tb of unique data. If, like the previous example, we peg 25% of the 7 Tb as our memory need, that would mean a combined Delphix cache could service the same shape of traffic with just 25% * 7 Tb = 1.75 Tb.  Does that mean you can shrink the memory footprint on the actual hosts and still service the traffic in about the same time? That is exactly what several of our large and small customers do.   Let’s suppose that you can shrink each of the 8 copy databases’ memory allocation down to 5% from the original peg of 25%.  Apples to apples, the 1.75 Tb of Delphix memory plus the 5% minimum on the 8 copies shrinks the total memory needed to service the same traffic down to 3.75 Tb in our example.  From 10 Tb down to 3.75 Tb; from 2x the size of the dataset down to 0.75x; that’s less than half.

Of course, for all you solution architects and performance engineers – here’s the disclaimer:

  • This is an entirely hypothetical exercise with plenty of loopholes, because traffic shapes are notoriously difficult to pin down (and depend on all sorts of variables).
  • There’s no way anyone can guarantee the memory reduction.

But, what we CAN guarantee is that customers are doing exactly the sorts of changes described above to achieve the kinds of results predicted in this example.

CPU: Compression, sharing, and virtual provisioning have a dramatic effect on the CPU cycles we need.  If we just follow the math from our previous examples, the cost of backup is already included in what Delphix does (but we’ll use 1% to be totally safe).  The cost of refresh with our validated sync is almost zero (but we’ll use 1% to be totally safe). That means that to accomplish the same work, the CPU cycles you’d need with Delphix will be around 1.45 CPU-hours – less than 8% as much time.
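Continuing in the same back-of-the-envelope spirit (and with the same caveats as the disclaimer above), the “with Delphix” side of the comparison works out like this; the 35% compression ratio and the 1% allowances are the assumptions stated in the prose.

```python
# "With Delphix" versions of the same estimates (same assumptions and caveats as above).
DATASET_TB, COPIES, CHANGE_RATE = 5, 8, 0.05
MB_PER_TB, THROUGHPUT_MB_S = 1024 * 1024, 800
COMPRESSION = 0.35                      # compressed data assumed ~35% of original size

def cpu_hours(tb):
    """Hours of full-tilt CPU needed to push `tb` terabytes at 800 MB/s."""
    return tb * MB_PER_TB / THROUGHPUT_MB_S / 3600

# Storage: production plus a week of change, plus a week of change per copy, all compressed.
storage_multiple = COMPRESSION * ((1 + 7 * CHANGE_RATE) + (COPIES * 7 * CHANGE_RATE))    # ~1.45x

# Network: transactions, change-only "backups", virtual refreshes, change-only replication.
network_multiple = (CHANGE_RATE * 7          # transactions
                    + 0.0                    # backup: already covered by the change stream
                    + COPIES * CHANGE_RATE   # weekly refresh of 8 virtual copies
                    + CHANGE_RATE * 7)       # replication ships change only          -> ~1.1x

# Memory: ~25% of the ~7 Tb of unique blocks in shared cache, plus 5% per copy locally.
memory_tb = 0.25 * DATASET_TB * (1 + COPIES * CHANGE_RATE) + 0.05 * COPIES * DATASET_TB  # ~3.75 Tb

# CPU: transactions + 1% allowances for backup and refresh + replication of change.
cpu_tb = (DATASET_TB * CHANGE_RATE * 7
          + 0.01 * (DATASET_TB + 6 * CHANGE_RATE * DATASET_TB)
          + 0.01 * (DATASET_TB * COPIES)
          + DATASET_TB * CHANGE_RATE * 7)

print(round(storage_multiple, 2), round(network_multiple, 2), round(memory_tb, 2))  # 1.45 1.1 3.75
print(round(cpu_hours(cpu_tb), 2))   # ~1.44 CPU-hours, vs ~18 in the baseline
```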

The stuff of Science Fiction

Delphix gives you power over your data in ways that are simply hard to believe, and would have been impossible just a few years ago. Delphix is data anti-gravity, giving you the power to accomplish the same work with 1/10 the storage.  Delphix is like faster than light travel, letting you accomplish the same workload over your network in about 1/10 the bits transmitted.  Delphix shrinks the horizon of your information black hole, letting you accomplish the same workload with 1/2 the memory.  And finally, Delphix is like a quantum computer, letting you solve the same problem with 1/10th the cycles.


Profile, Pedigree, and Paradigm

What’s the power of Profile?

What’s the key to accelerating Data Masking adoption and usage?

Data Masking is nothing new.  In a broad sense, masking is just a data transformation.  And like other data transformations, you’ve got the cost of transporting the data and transforming the data: the plumbing and the logic.  From the plumbing side, the challenge isn’t masking it once, but keeping consistency among a set of masked datasets as you repeat that masking, as well as getting those masked copies re-distributed in a way that doesn’t cause huge disruption or take forever.   From the logic side, it turns into a whole project of its own.  You could spend as much time building your masking rules as you do your ETL rules.  That’s not a cheap investment, and since regulations and data change, it’s also going to be an open-ended investment.

This creates a counter-pressure situation for CIOs.  The CISO and the auditors push masking onto them, so they build something. But then the application teams complain constantly about how much time it takes to get the masked data, so the “masking” occurs once a quarter.  Now you’re secure.  But you’re paying the price of working on old data.  And even though that is a huge price, it’s one that can easily get shoved under the rug, because it’s the testing team and the operations team that pay it when bugs that could have been caught earlier are shifted right, and Ops gets the joy of spending some time developing in the field.

So, what’s the answer?  We need to automate.  We need to Profile that data – to automate the logic not only of deciding which fields to mask, but of deciding how to mask them for the 95% of cases which, honestly, are common to all companies.  And, when you combine that with the automated refresh and distribution of masked data using a Data as a Service solution, you can build that logic much faster, and you can deliver it to those app teams at the speed they need.  Your time and cost investment can be as much as 20 times less. That’s the power of Profiling.
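As a deliberately simplified illustration of what profiling automates, here is a Python sketch that flags likely-sensitive columns from their names and sample values.  The rule set is a tiny, made-up sample; a real profiler ships a far larger library of patterns plus confidence scoring.

```python
import re

# A tiny rule library: column-name hints plus value patterns for common sensitive fields.
PROFILE_RULES = {
    "ssn":    (re.compile(r"ssn|social", re.I),   re.compile(r"^\d{3}-\d{2}-\d{4}$")),
    "email":  (re.compile(r"e?mail", re.I),       re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")),
    "credit": (re.compile(r"card|cc_?num", re.I), re.compile(r"^\d{13,16}$")),
}

def profile_column(name, sample_values):
    """Return the sensitive-data types a column likely contains."""
    hits = []
    for label, (name_rx, value_rx) in PROFILE_RULES.items():
        name_hit = bool(name_rx.search(name))
        value_hit = sum(bool(value_rx.match(str(v))) for v in sample_values) >= len(sample_values) / 2
        if name_hit or value_hit:
            hits.append(label)
    return hits

# Example: columns a profiler might flag for the masking engine.
print(profile_column("cust_email", ["a@x.com", "b@y.org"]))         # ['email']
print(profile_column("contact",    ["123-45-6789", "987-65-4321"])) # ['ssn']
```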

What’s the power of Pedigree?
What’s the power of shared data history to save CIOs time and money?
We clone our datasets. A lot. And, a lot of technical folks see that as a break point – and they start managing that clone as a new entity.  They treat it that way, they manage it that way, they back it up that way, they think about it that way.  So much so, that even if they update the clone, that clone is also treated like a new entity.  And, that way of thinking made sense before Data as a Service.  But, it doesn’t make sense anymore.
The whole point of Data as a Service is not just to manage data that’s copied at some point in time, but to take advantage of the fact that large numbers of datasets have a shared history.  They may branch off at different points, but they almost always share a significant footprint of data.   Though you might mask your Prod data to turn it into Dev, it’s still mostly Prod data.  You don’t mask all of it.  When you create bug-fix environments, it’s often exactly the same data, just from a snapshot a few hours ago.  My point is, I could easily build a pedigree of how inter-related each of these datasets is, because even if they’ve been through massive change, they still share a lot of the same data.  Data as a Service takes advantage of that inter-relatedness and lets you manage, back up, and think about the entire Pedigree of clones as a single entity.  You could back up each system the old-fashioned way, but you’d be backing up what amounts to 80% redundant blocks – just wasted space.  You could continue to refresh a clone the old-fashioned “efficient” way by constantly provisioning a new snapshot for the clone to fork off of.  Or, you could just push a button and update the pointers to the main gold copy and stay out of managing snapshots for clones.
I’ve talked to many shops, and I think the hardest thing for people to get their head around is this: when copies of datasets break away from their parent, it makes sense to think about the cost of rebuilding something in terms of all of the tear-down, re-provisioning, and re-preparation of data.  But when you maintain a link between parents and copies through a shared history, the cost is measured in terms of deviation from a shared point in history.  To make it a little clearer: if I asked an old-school guy to refresh my 10 Tb data warehouse from Production for me every morning, I might get back a project plan and a $100,000 annual service fee for the 2 warm bodies who will stay up each night to make that happen.  With Data as a Service, I go in and click a checkbox that says “refresh for me, please,” and voila – each morning the 10 Tb data warehouse is refreshed and nobody was involved.  Why? Because all I am doing is moving the marker on that warehouse that says how much shared history these two datasets have.  Apply that same logic to every related copy in an application, and now you’re talking about crazy savings on Dev/Test/QA copies, copies for Backup and D/R, copies for bug fix, copies for your MDM and ETL – all of which have some shared history, and all of which live in a universe where you want to change the marker on how much shared history they have.  There’s enormous savings there.  And that’s the power of Pedigree.
How can we break the DevOps Logjam?

Building and operating resilient systems at scale in an environment of rapid change presumes that we can deploy environments and move data as fast as those development folks need it.  But, the reality is that much of the time this doesn’t happen or comes at an enormous cost.  I was on the phone with one VP who told me that it took him longer to prepare the environment for the sprint than it took to execute the sprint.  Why is that?  We’ve built a process and control infrastructure around the deployments of data and environments based on some specific assumptions, namely:

  • We’ve got to hold onto our environments because the time from request to deploy is so long.
  • We need huge and complex control structures around data because the cost and time to correct an error is so high.
  • The fresher the data is, the more expensive things are for everyone – the extract/restore/configure cycle is expensive and the more we do it, the more expensive it gets.
To break the logjam, we’ve got to unravel each of those assumptions.  And, Data as a Service does just that.
  • DaaS allows you to deploy and re-deploy on demand, shifting your data in time (refresh, rewind), in space (from one target host to another), and to put it on or take it off the shelf (bookmarking).  If you can release and redeploy your environment in 5 minutes, time to deploy is no longer an issue.
  • DaaS is like a DVR for data.  There’s an automatic record keeper keeping track of your data as it changes.  Great, you say, I already have that. OK. But that same record keeper can keep track of days and weeks of changes in a footprint smaller than the size of one backup.  And being able to redeploy in minutes means that data is available basically at a moment’s notice.  The cost of making an error falls dramatically. And, if you include the power of the end user to get their own data back via self-service – that means not only can the control process go away, most of the controllers no longer need to be involved.  For one of my customers, that meant that a 34-step, 12-week deployment process turned into a 20-minute, 1-click process.
  • The whole point of DaaS is that fresher data is cheaper.  With DaaS, the infrastructure cost is related not to the time it takes to deploy but rather to the deviation from a baseline.  So, when you ask for fresher data, you’re basically asking to go to the freshest baseline and thereby getting the least deviation.  That’s great.  But the real money comes from the fact that when you work on fresh data you aren’t chasing down all sorts of errors.  You aren’t missing corner cases because your data is old.  You’re not missing integration problems because you’re working on something that’s 3 weeks old.  To a company, the impact of DaaS on the top line – the time it takes to deliver a project – is 30%.  30% less has real impact on your Time to Market – especially if Amazon is eating your lunch.
Virtualization is a different Paradigm.  It inverts common wisdom.  You don’t need to hoard environments when you can deploy them in 5 minutes.  You don’t need 34 approvals when the cost of fixing an error is 3 clicks and a cup of coffee.  You don’t need to control a risk that no longer exists.  And that’s hard for a lot of IT shops to swallow.  And you have to shed the idea that fresher data is more expensive.  That’s the Paradigm shift.

Give me a Second

Photo Courtesy: Sean MacEntee

Nothing endures but change

Time stood still for one second last night.  Well… What really happened is that we forced one clock to match the time shown on another clock.  There are camps on both sides arguing whether we should or shouldn’t keep those clocks in sync.  But, it’s clear that understanding the two clocks in relation to one another matters.  Although time itself is a slippery concept, our measurement of time continues to be of great importance in terms of our data.

Supposedly, Einstein said “Time is what a clock measures”.  (Again, it’s notoriously difficult to explain what “Time” is, which is why he didn’t.)  Another approximation of Time is that it is a measure of change — of motion against a background.  Large DataSets (like databases) often have a built-in “timekeeper” that signifies individual changes.  And, some file systems utilize an “uberblock” to track individual changes to a file system.  Since these counters or trackers uniquely identify change, they can be thought of like a “Clock”.  And since they can be ordered, we can put each “tick” of the Clock onto a Timeline.

In the world of Oracle databases, that counter is the System Change Number (SCN), which uniquely identifies and orders change within a single DataSet (SCN #2 must come after SCN #1).  The SCN tracks change like a "Clock".  By distinguishing and ordering these "ticks" of the clock, you can build a timeline for that particular Oracle database.  From that database's perspective, time moves forward when the SCN "Clock" ticks; without a tick (i.e., an SCN change), there's no new data, and time stands still for that DataSet.  The same analogy applies to some file systems: if the uberblock tracks change, then the uberblock must change for that file system's clock to "tick".
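Here is a tiny Python illustration of the "DataSet clock" idea: each change carries a monotonically increasing counter (an SCN-like tick), and ordering the ticks yields the dataset's timeline.  The field names and values are made up for the sketch.

```python
# Each change carries a counter; ordering the counters gives the timeline.
from collections import namedtuple

Change = namedtuple("Change", ["scn", "description"])

changes = [
    Change(scn=1002, description="UPDATE orders SET status='shipped'"),
    Change(scn=1001, description="INSERT INTO orders ..."),
    Change(scn=1003, description="DELETE FROM carts ..."),
]

# Time, from the dataset's point of view, is just the ordered sequence of ticks.
timeline = sorted(changes, key=lambda c: c.scn)
for tick in timeline:
    print(tick.scn, tick.description)

# No new SCN means no new tick: the dataset's clock stands still.
```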

Time in a bottle

For this discussion, let's define a DataSet to be either a database or a collection of files in a file system for which we can build a timeline.  If we collect, store, and tag the set of data changes associated with each distinct "tick" of the "Clock" for any DataSet, we gain a real and powerful capability: we can return to the past (since we can restore the exact history), or create copies that have alternate futures.  (Isn't a DataSet "restore" just a return to the past on the DataSet's timeline?  Isn't a clone just a copy with an alternate future?)
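A minimal sketch of "restore" and "clone" in exactly those timeline terms: restore rewinds one dataset to an earlier tick, and a clone forks a copy whose future diverges from the original's.  Purely illustrative, not any product's behavior.

```python
# Restore = return to the past on a timeline; clone = a copy with an alternate future.
class Timeline:
    def __init__(self, ticks=None):
        self.ticks = list(ticks or [])

    def record(self, tick):
        self.ticks.append(tick)

    def restore(self, to_tick):
        """Return to the past: drop everything after the chosen tick."""
        self.ticks = [t for t in self.ticks if t <= to_tick]

    def clone(self, at_tick):
        """A copy with an alternate future: share history up to at_tick, then diverge."""
        return Timeline(t for t in self.ticks if t <= at_tick)

prod = Timeline([1001, 1002, 1003])
test = prod.clone(at_tick=1002)   # shares history through tick 1002
test.record(2001)                 # test's future diverges from prod's
prod.restore(to_tick=1001)        # prod returns to its own past
```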

If you've used Delphix before, you know that Delphix acts like a master timekeeper both for datasets with some shared history (clones of clones, for example) and for datasets with NO shared history (for example, 3 databases from different vendors that are each distinct but are all part of one unified application).

Time Flowing like a River

Mitchell River delta
Photo Courtesy: Feral Arts

Futurist and physicist Michio Kaku said Einstein viewed time like a river: it can speed up, it can slow down, it can even fork into two new rivers.  That's apt for our data as well.  First, unlike Wall Clocks, DataSet Clocks "tick" faster during busy hours and slower during quiet ones.  [Einstein might even call it time dilation for data.]  Second, from the perspective of data, our "awareness" of time turns out to be wildly different from wall-clock time.  We often update a dataset's clock in big chunks (e.g., when we apply the "deltas" since yesterday) rather than in the discrete "ticks" the clock gives us.  If we process one update at a different time than another, time moves forward very differently on different datasets.  As a result, it can be really hard to reliably find the last "concurrent moment" among a group of datasets so that we can restore all of them to the same point in time.  In fact, a lot of people spend a lot of time massaging data because of that very fact: they're forced to take whatever dataset they can get, and then they spend their time massaging it until it's synchronized with the other datasets in a larger group.

Though you may be getting data in trickles or floods, from differing sources at differing times, the ability to rebuild the exact timeline and coordinate it to a common clock is exceedingly powerful for producing a "single point in time" view across multiple datasets.  The power of being a good timekeeper, and of being able to recreate a point in time on command, is mastery over time – even when the clock jumps in huge chunks instead of just tick by tick.  With Delphix, you can easily find that last "concurrent moment" – even among large groups of related datasets.  The time machine fantasy that H.G. Wells envisioned for you and me is REALITY for your data with Delphix.
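For intuition, here is a short sketch of finding that "last concurrent moment."  If each dataset's timeline can be reproduced up to its latest captured wall-clock time, then the most recent point you can restore across ALL of them is simply the minimum of those maxima.  Dataset names and times below are invented for the example.

```python
# Latest wall-clock point captured for each dataset (hypothetical values).
from datetime import datetime

latest_captured = {
    "oracle_erp":    datetime(2015, 7, 1, 3, 15),
    "sqlserver_crm": datetime(2015, 7, 1, 3, 2),
    "files_reports": datetime(2015, 7, 1, 2, 58),
}

# The last moment common to every timeline is the earliest of the latest points.
last_concurrent_moment = min(latest_captured.values())
print("Restore every dataset to:", last_concurrent_moment)  # 2015-07-01 02:58
```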

Stepping into the same river twice

But that's just the tip of the iceberg.  The value of being a good timekeeper and being able to match up the clocks is significant.  If storing DataSets were free, and reproducing a DataSet were instantaneous, wouldn't everybody store every iteration of the DataSet Timeline?  Of course.  Because that would make development, testing, operations, and bug fixing run a lot closer to light speed.  But storage isn't free, and time is money (thanks, Ben).

Often, solutions to these problems force a tradeoff.  You can go back in time on your dataset, but only once.  (Once they go upriver to the past, their time machine raft is busted.  That's like reading H.G. Wells and stopping after chapter 3.)  Or you can go back in time or fork your dataset, but only at the cost of keeping another full copy of it.  (They want you to dig a new river instead of steering down a new branch.)

I contend that the best solutions manage the river delta of DataSet Timelines by sharing them via thin cloning, and further, that Delphix is head and shoulders above anyone else trying to do that.  With Delphix thin cloning, we can actually step into the same river twice.  We can reproduce a DataSet exactly as it was then, and we can reproduce it really fast.  So what?  It turns out that the more efficient your DataSet "time machine" is, the faster you can deliver an application that depends on that data.  Why?
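To show why thin cloning changes the economics, here is a toy copy-on-write model of it: a clone shares its parent's blocks and stores only the blocks it changes, which is why many clones of a multi-terabyte dataset can cost a small fraction of one full copy.  This is a simplified illustration, not Delphix's actual implementation.

```python
# Copy-on-write thin clone: shared, read-only history plus a private delta.
class ThinClone:
    def __init__(self, parent_blocks):
        self.parent = parent_blocks   # blocks shared with the parent (and other clones)
        self.delta = {}               # only the blocks this clone has changed

    def read(self, block_id):
        return self.delta.get(block_id, self.parent.get(block_id))

    def write(self, block_id, data):
        self.delta[block_id] = data   # copy-on-write: the parent stays untouched

parent = {0: "jan", 1: "feb", 2: "mar"}
clone_a = ThinClone(parent)
clone_b = ThinClone(parent)
clone_a.write(1, "feb-patched")
print(clone_a.read(1), clone_b.read(1))  # "feb-patched" "feb": same river, two branches
```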

Before thin cloning, we made a lot of wall-clock-time trade-offs to mitigate the cost of storage or the time to restore.  Instead of working on fresh data, we'd work on old data, because it took too long to get the right data back.  Instead of reusing our old test cases, we'd build new ones, because that was easier than fixing all the stuff that got corrupted during our test.  Instead of having our own environment, we'd share with lots of other folks.  That wasn't so bad until we had to wait for this team, then wait for that team.  And all the while, the little errors that crop up because my code isn't compatible with your code got bigger and bigger.  The accumulation of all those little errors we let slip by is what DevOps calls technical debt.

You can’t be successful in a DevOps/Agile paradigm if you’re operating this way. DevOps needs faster data.  With Delphix, we can step into the same river twice, and we can do it at powerboat speed.

Time to Reap

Humans are Wall Clock bigots.  We've lived with the rules of time we know for so long that we often miss the quirks of living in a relativistic world.  Being able to relate one clock to another, and to reproduce a point in time not just for a single dataset but for a whole group of datasets, lets you freeze time, fork new timelines, and move back and forth between timelines at will.  Being able to accept changes to the timeline in floods but reproduce them in trickles means large synchronization efforts become push-button instead of becoming projects.

Delphix gives you access to each timeline, and you can synchronize them all at will.  By using shared storage for shared history, you can radically speed up your app provisioning – even at scale, and even when LOTS of different datasets make up your app.  Through massive block sharing and EASY timeline sharing, a tester can find a bug in the data, "save" the timeline for someone else, and then go back to a previous point on their own timeline and keep working without having to wait for someone to "come and see my screen."  The Finders don't have to wait for the Fixers anymore.  That means the cost to get capability out the door goes down A LOT.  The ROI there isn't single digit; it's more like 50% faster.  That ROI will, quite simply, blow your socks off.
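A quick sketch of that Finder/Fixer handoff, using hypothetical names and a plain dictionary as the shared bookmark catalog.  It only illustrates the workflow shape: the tester saves the exact point where the bug reproduces, then rewinds and keeps working while the fixer provisions from the bookmark on their own schedule.

```python
# "Save the timeline for someone else, then go back and keep working."
from datetime import datetime

bookmarks = {}  # shared catalog of named points in time (hypothetical)

def bookmark(dataset, label, point_in_time):
    bookmarks[label] = (dataset, point_in_time)   # save the bug's exact moment for the Fixer

def rewind(env, to):
    env["point_in_time"] = to                     # Finder returns to an earlier point and keeps testing

tester_env = {"dataset": "orders-qa", "point_in_time": datetime(2015, 7, 1, 9, 30)}
bookmark(tester_env["dataset"], "bug-1234", tester_env["point_in_time"])
rewind(tester_env, to=datetime(2015, 7, 1, 9, 0))
# The Fixer later provisions their own clone from bookmarks["bug-1234"],
# with no need for the Finder's screen or environment.
```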

Ask yourself – who DOESN’T want a time machine?

 

Apologies for all the musical throwbacks.  My musical Wall Clock is wrong.