The age of Labor Arbitrage is over; Now it’s Data Arbitrage

Outsourcing was an easy win to reduce IT costs. And executives loved it.  They loved it so much that giants like Accenture now rely on outsourcing for almost half their revenue (in March, Accenture posted $3.91 B in outsourcing net revenues, which was 47% of their overall net revenues).  Even Gartner’s Hype Cycle shows that outsourcing is reaching its maturity.

Former Infosys CEO Vishal Sikka said, “We will not survive if we remain in the constricted space of doing as we are told, depending solely on cost arbitrage.”  That is, there’s significantly less money in future “labor arbitrage”, and if companies – especially Global System Integrators – want to continue the kind of 16% per annum growth curves they’ve seen, they have to turn to a different type of arbitrage.  Loosely stated, arbitrage is just the buying and selling of assets in different markets or forms to take advantage of price differences.  Labor was cheap in India, and expensive in America. So, voila, outsourcing made sense.

A new kind of arbitrage is emerging: Data Arbitrage.  What exactly do I mean by data arbitrage?  I mean that today we pay a hefty price to get our data where we want it when we want it; and there’s a significant price difference for delivering that same data using a DataOps solution.  Same asset.  Radically different price.  Huge opportunity to leverage the difference.

Specifically, what’s the data arbitrage opportunity for application testing?  Consider 5 arbitrage opportunities:

Data is impersonal.  The data most testers use is either shared among many people (because cost forces them to use too few environments) or made personal at enormous cost (because making 100 copies of data for 100 testers isn’t free).

Arbitrage opportunity: If we can give every tester their own environment but do so at an extremely low cost, we get the benefit of de-coupling different testing pathways without the cost of proliferating hardware and storage to support those pathways.  Through data virtualization, DataOps tools can accomplish exactly that.
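To make the economics concrete, here is a minimal sketch of the copy-on-write idea behind data virtualization. It is an illustration under stated assumptions, not how any particular DataOps product is built: the names GoldenImage and ThinClone are hypothetical, and a "block" is just a dictionary entry.

```python
class GoldenImage:
    """Read-only set of data blocks shared by every clone (hypothetical name)."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)  # block id -> contents


class ThinClone:
    """A tester's private view: shared blocks plus a small private delta."""

    def __init__(self, base):
        self.base = base
        self.delta = {}  # only the blocks this clone has changed

    def read(self, block_id):
        # Prefer the clone's own version, otherwise fall back to the shared block.
        if block_id in self.delta:
            return self.delta[block_id]
        return self.base.blocks[block_id]

    def write(self, block_id, contents):
        # Copy-on-write: the shared image is never modified.
        self.delta[block_id] = contents

    def private_blocks(self):
        return len(self.delta)


# One shared image of 1000 blocks, 100 testers, one tester changes one block.
base = GoldenImage({i: f"row-{i}" for i in range(1000)})
clones = [ThinClone(base) for _ in range(100)]
clones[0].write(7, "changed by tester 0")

# Storage consumed: the shared blocks once, plus each clone's private changes.
total_blocks_stored = len(base.blocks) + sum(c.private_blocks() for c in clones)
```

In this toy model, 100 full copies would hold 100,000 blocks, while the shared image plus deltas holds 1,001. Each tester still sees a complete, private dataset.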

Data is insecure.  The dirty secret of many IT shops is that they pay lip service to masking – either they use crude homegrown solutions rife with security holes, or they find ways to “exempt” themselves through exceptions.  Further, those that do mask well usually don’t mask often because of the delay it imposes on getting data and the enormous expense of keeping masked copies around.

Arbitrage opportunity: If we can consistently mask every non-production environment before handoff, we significantly reduce risk.  If we can mask continuously, we can mask often.  And, if we can provide those masked environments without the cost of proliferating hardware and storage, we get the much lower risk data that doesn’t impede our application delivery pipeline.  Through integrated masking and through features that support distributed referential integrity (making sure that a name or a number is masked consistently across heterogeneous data sources), DataOps tools can accomplish exactly that.
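One common way to get consistent masking across heterogeneous sources is to derive the masked value deterministically from the original with a keyed hash. The sketch below assumes that approach; it is not a description of any vendor's masking engine. SECRET_KEY, mask_value, and the two sample "sources" are hypothetical.

```python
import hashlib
import hmac

# Hypothetical key for illustration only; a real key would live in a secrets store.
SECRET_KEY = b"example-only-key"


def mask_value(value: str, key: bytes = SECRET_KEY) -> str:
    """Map a sensitive value to a stable pseudonym using a keyed hash."""
    # Normalize so the same name matches across sources with different formatting.
    normalized = value.strip().lower()
    digest = hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()
    return "MASK_" + digest[:12]


# Two different systems that store the same customer under different column names.
crm_rows = [{"customer": "Alice Smith", "plan": "gold"}]
billing_rows = [{"payer": "alice smith ", "amount": 120}]

masked_crm = [{**r, "customer": mask_value(r["customer"])} for r in crm_rows]
masked_billing = [{**r, "payer": mask_value(r["payer"])} for r in billing_rows]
```

Because the mapping depends only on the key and the normalized value, the masked name in the CRM data still joins to the masked name in the billing data, which is the "distributed referential integrity" property described above.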

Data is tethered.  It’s not that we never move it. It’s that once we put data on a host and in a database server, it’s hard to disentangle it from the host and server we put it in.  And since it is hard, we make more barriers to movement to mitigate the risk.

Arbitrage opportunity: If we can radically lower the cost of data mobility in time (moving the time pointer on a dataset) and in space (moving the dataset from one host/server to another), we unlock a productivity avalanche.  A tester can promote a 5 TB application from one environment to another in minutes, not days.  If mobility is easy, and copies are almost free, then a tester can share a bug with a developer almost instantly and then continue working in another pathway without fear of data loss.  The arbitrage opportunity for driving down the cost of context switch and the cost of error to near zero can’t be overstated.  Breaking all of the key dependencies in the tester’s workflow has enormous speed and quality consequences because no one depends on anyone else anymore.  And that dramatically lowers the total cost of Concept-to-Realization for any feature of an application.  DataOps tools make near zero-cost, near zero-time context switches a reality, and that means the enormous cost of error we experience and the controls we have around it can both be driven down.

Data is heavyweight.  Data size will get bigger.  And the bigger it is, the harder it is to move and the higher the cost of error when something goes wrong.  Thus, the enormous timelines and contingency plans for anything that looks like migration from one platform to another or one place to another.

Arbitrage opportunity: If we can reduce not only the cost of data mobility but also the cost of data provision, then the benefits of data mobility apply to data provision too.  Not only can we move datasets in time and space with radical ease, we can create net new datasets in the same timeframes.  That’s data elasticity on demand. DataOps tools give you that on-demand elasticity.  It’s like VMware for data.  Spin it up here; spin it down there; repeat.

Data is passive.  While, yes, we do update our data every day – most of that big blob of data we manage is static.  It doesn’t change much, and it doesn’t do much.

Arbitrage opportunity: If we can turn all of the “dead” data we have lying around in copy after copy into live, shared, active data, we get the benefit that our data is being used to its maximum value, which reduces our storage cost for sure but also makes it dead simple to move groups of related data together very rapidly (such as we might do to get to the cloud, or migrate data centers, etc.).  With DataOps tools, moving a single database or moving a family of 100 related databases is within a few percentage points of being the same operation in terms of cost and time.  Our concern is a lot more focused on how related the data sets are, and not how large they are, because every bit and byte is used to its maximum advantage.

How big is the arbitrage opportunity?  Business owners should pay attention. Numbers from dozens of real customers and real projects show project timeline savings in the 30 to 50% range, significantly increased testing density, and a massive left shift in testing defects (including a net reduction in defects of 40% or more), on top of storage costs falling 80% or more.  Imagine getting a real SAP or Oracle E-Business Suite project or migration done in half the time.  Now imagine getting it done without any perceptible errors post-launch.  What’s that worth to your business?

Data is fast and simple with DataOps.
Data is impersonal, insecure, expensive, tethered, heavyweight, and passive without it.
If you don’t have a DataOps tool and a strategy to help you exploit Data Arbitrage, get one.

Inclusion begins with Hospitality

A recent NY Times Op-ed reminded me how easy it is to leave my own biases unchallenged and thus misunderstand what inclusion really means.  On reflection, my own past failures to be inclusive were born of:

  • My implicit Bias – without vigilance, practice, and reliance on data – it’s easy to arrive at conclusions that satisfy us but aren’t supported by evidence.  Moreover, a preconceived notion or a foregone conclusion gives license to devalue ideas that clash with them.  Bias extends even to those who consider themselves objective (e.g. scientists), as aptly argued by Thomas Kuhn.
  • My context clouds everything – we all arrive at facts within our own context.  We can’t escape that context.  We can only be aware of it.
  • My navel-focus – Companies place a premium on profit, accomplishment, and teamwork.  But, when careers or deals are on the line, it’s seductively easy to let a deal or career ambition become more important than our mutual dignity.

In response to these realizations, I find my own epistemology of inclusiveness is emerging.  I know I am inclusive:

  • if I am aware of great power disparities within groups, and I make sure the voice of the less powerful person or group is heard.
  • if I insist on inviting and hearing the person that is least like me.
  • if dignity trumps other motives.

Having been on both sides of the power divide, I’ve arrived at some simple insights on inclusive behaviors:

  1. Much like a good host always makes their guests feel welcomed and special, an inclusive environment allows every person and viewpoint to have a seat at the table and a place in the conversation.
  2. Much like a good host makes all their guests feel comfortable with one another (even if they don’t know everyone), an inclusive environment allows very different people to feel comfortable working together.
  3. Disagreement shouldn’t make us disagreeable.  A good host can be affable with friend and foe alike, and permits no prerequisite to the attribution of respect or dignity to all persons.

Even the best workplaces (and Delphix is certainly one!) have ample situations where bias, disrespect, and exclusionary behaviors occur.  I know they have happened to me; I’m sure they’ve happened to you.

For companies born as startups, disruption is the cornerstone of success.  Disrupting my own thinking is an important first step in closing my own inclusion gap.  Though many more steps need to be taken, I do draw this simple lesson: Inclusion begins with hospitality.


How does the Delphix Dynamic Data Platform support Oracle vs. SQL Server?

One of our premier partners shot me a message last week to help him walk through the differences between how Delphix is implemented on Oracle vs. SQL Server.  If you are unfamiliar with the Delphix Dynamic Data Platform (DDP), this blog won’t make sense to you until you’ve read through Oracle Support and Requirements.  This blog provides an overview of those differences through the key perspectives that are of interest to technical folks implementing or explaining it.


  • Similarities
    • Access. Both Oracle and MS SQL Server need users that can access data.  Both need basic permissions to read backup data from and access the Source (usually production) host and database server.  When the Source and either the Staging or Target Hosts are in different places, there may need to be extra permissions.
  • Differences
    • Table of MS SQL Permissions (columns: Component; Requirements; Method; Source; Target / Validated Sync)
      • Environment, Delphix OS User: Windows Domain User; member of Backup Operator or Local Administrators; db_datareader permission on master; Sysadmin role on SQL Server Instance
      • Environment, SQL Instance: should run as Domain Users or Local service accounts
      • Environment, PowerShell Privileges: Execution Policy set to Unrestricted
      • Environment, iSCSI service: set to start Automatic in Service
      • Environment: read permission to backup share
      • Environment, Delphix Connector: installed and addhostgui.cmd executed
      • Database, Delphix SQL DB User (SQL Authentication Account): db_datareader permission on master and msdb; db_backupoperator for user databases
      • Network, Enable TCP/IP for JDBC: open firewall for port 1433 (default)
      • Network: Shared Memory

Data Collection & Connection

  • Similarities
    • Native Backup. In general, the Delphix Dynamic Data Platform (DDP) ingests data through native backup.
    • Recovery Model. Generally, we need to develop an understanding of how often the backups run, where they live, and how we gain access to those backups so that we are able to do that ingestion.
    • Use of Database Primitives. Most databases keep a pointer (aka database primitive) to identify transactions.  Backups are often keyed to these primitives.  You typically need the continuous stream of transactions associated with these primitives to maintain consistency; if you break the chain, you effectively push the reset button (in Oracle, for example, breaking the chain forces a resetlogs event) and your next backup looks like a new database.
  • Differences
    • Backup Facility.  Oracle’s native backup facility is RMAN in its various modes (Level 0, Level 1, etc.).  In SQL Server, the Delphix DDP relies on the customer’s own native SQL Server backups in its various Recovery Models (Simple, Full), which may include T-Logs. The Delphix DDP can use pre-existing or new native SQL, LiteSpeed, and RedGate backups located on an SMB share.
    • Need for Staging Server.  The Delphix DDP implementation for Oracle does not need a Staging Server.  We read directly from the Oracle database server using the RMAN facility in modes that mimic both backup and log streaming.  In SQL Server, we must use a staging server where we can ingest those backups.  That staging server has storage allocated directly from the Delphix engine.  It is this storage that allows us to manipulate the data (after it has been ingested) through an always-recovering staging database.  The Staging Server must contain an instance of SQL Server which matches the version found on the Source (but doesn’t have to exactly match the Target). To the Delphix DDP, there is no difference between the staging and the target server functionality-wise except that the O/S user that owns the instance on the “Staging” server needs to be able to find prod.  On the target server, that same owner does not need to be able to do that.  So, the staging server O/S user has a superset of the privileges that a target server owner would have.
    • Name of Database Primitive.  In Oracle, the database primitive is called SCN (System Change Number) whereas in Microsoft SQL Server it is called LSN (Log Sequence Number).
    • Type of Backups. The type of backups you are doing affect the freshness and granularity of the Delphix DDP TimeFlow.  See: Delphix TimeFlow in Oracle vs. SQL Server. For SQL Server, the Delphix DDP also provides the capability for Delphix to take its own copy-only backup which has no impact on the log chain.
    • Connector/Point of Access to Host.  Unlike adding an Oracle source, when we add MS SQL databases the Delphix DDP needs to use a connector (a small app that allows Delphix to communicate with the server).  We want to be as unintrusive as possible with Delphix.  So, we don’t want to install a connector on your prod server since we only need the backup.  Instead, we install the connector on the Staging Server and the Target server.   On the Staging server, the operating system owner of the SQL Instance into which we will be recovering your production data needs to be able to find your db and the backups for your db, read them, and ingest them into that staging server.  This is usually not a big deal if you are in the same Data Center, LAN, and domain.  Customers with different domains for their target, or with a separation between Staging and Production, require permissions to be granted either across domains (a cross-domain trust) or specifically to that user so they can access those backups on the production side.
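The "Use of Database Primitives" point above can be sketched in a few lines. This is a simplified model, not Delphix or database code: it assumes each log backup records the first and last sequence number it covers (LSN- or SCN-like integers) and that an intact chain means each backup starts exactly where the previous one ended. LogBackup and find_chain_break are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class LogBackup:
    name: str
    first_seq: int  # first sequence number covered (LSN- or SCN-like)
    last_seq: int   # sequence number where this backup ends


def find_chain_break(backups):
    """Return the name of the first backup that does not continue the chain, or None."""
    ordered = sorted(backups, key=lambda b: b.first_seq)
    for prev, cur in zip(ordered, ordered[1:]):
        # Simplifying assumption: a backup must begin where its predecessor ended.
        if cur.first_seq != prev.last_seq:
            return cur.name
    return None


intact = [
    LogBackup("log1", 100, 200),
    LogBackup("log2", 200, 350),
    LogBackup("log3", 350, 400),
]
# Same set with log2 missing: there is a gap between 200 and 350.
broken = [
    LogBackup("log1", 100, 200),
    LogBackup("log3", 350, 400),
]
```

With the intact list the function finds no break; with log2 missing it points at log3, the first backup that can no longer be applied on top of what came before.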

Data Presentation

  • Similarities
    • Common Delphix Features. Delphix Virtual Databases are generally treated the same within the Delphix DDP in terms of their ability to utilize the controls and features, particularly the data control features: Reset, Refresh, Rollback, Bookmark, Branch, etc.
  • Differences
    • Protocol.  SQL Server VDBs are presented to Target Hosts via iSCSI.  Oracle VDBs are presented via NFS v3.  Whereas the Delphix DDP uses NFS v3 for POSIX environments such as Oracle, it uses iSCSI for Windows O/S environments.  Crucially, the iSCSI that the Delphix DDP uses is NOT a hardware solution; we use a software-based iSCSI.  This may require some configuration of the iSCSI services on the staging environment servers.

Supported Versions

Delphix Features: TimeFlow

  • Similarities
    • The Delphix DDP uses TimeFlow to represent the state of the database (or of a Container) in 2 ways:
      • SnapSync Cards – These represent the equivalent of a complete backup of a dataset as of a specific point in time.
      • LogSync Transaction Level Points – these represent each of the individual transaction boundaries uniquely identified by the database primitive.
  • Differences
    • Log Sync.  Log Sync for Oracle is forward-facing; Log Sync for SQL Server is backward-facing, depending on the last time a new T-log was opened.  Since Log Sync can take advantage of Oracle Online and redo logs, it can build the TimeFlow in front of the last SnapSync card that was taken.  For SQL Server, TimeFlow can be granular, but the granularity is a function of the last time the T-log was taken and never increments past that border.

Architecture Diagram


SQL Server:

The Digital Transformation Divide

A few days ago at the NASDAQ center in San Francisco, I caught up with MRE CIO Ken Piddington, who also serves as an Executive Advisor to CIOs.  “Top of mind with CIOs and IT shops I’m talking to,” said Ken, “is Data Transformation.”  In fact, he often hears key players tell him, “I’m part of the Data Transformation Group.”   The problem is that Data Transformation has come to mean so many different things to CIOs that it’s hard to define, and even harder to relate new data innovations to their journey.

Digital transformation is a data-driven interconnectedness that impels hyper-awareness, well-informed decision-making, and rapid execution.   Within this context, three key innovations are changing the Data Transformation Journey for CIOs:

  • Data is free to roam
    • Applying the principles of DataOps* to Thin/Sparse clones has effectively decoupled Database Servers from their Content.  It used to be that moving data (like a 5 TB ERP app) was torturous, requiring lots of time and expertise.  But, DataOps solutions give Data Scientists, Analysts, Developers and Testers the power to provision fresh, personalized and secure copies of such environments in minutes.  The kicker is that these copies are mobile and untethered from the Data Producer.  Moving my 5 TB ERP from Amazon to Azure can be accomplished in 10 minutes.  In fact, such solutions make it simple both to cross and move the cloud boundary.  That’s powerful.
  • Data Encapsulation amps up our velocity
    • We’re realizing in the data community what developers knew all along: just like encapsulation disentangled code and made it far easier to scale, encapsulating data and the controls we need for it is accomplishing massive scale for Data Consumers.  By setting embedded data controls at “dataset creation time”, Data Operators (who want to make sure secure data never gets out) can control access, portability, masking, and a whole host of other available controls that persist with the dataset.  This untethers those Data Operators from those Data Consumers.  With security in place and persistent, Data Consumers use the data where they want, move it where they want (within the rules), and never have to go back for permission.  It seems simple, but the request-to-provision step of our Data Supply Chain is often the most cumbersome, slowest, and most prone to bottlenecking part of the application delivery cycle for almost everyone who builds applications.
  • Data Synchronicity is a lot less expensive
    • Many make a distinction between “physical” transformations (like converting from Unix to Linux) and “logical” transformations  (such as you might do with your ETL).  But, the dirty little secret of ETL (and of MDM for that matter) is that a huge chunk of the time spent has to do with time logic (e.g., How can I put data from sources A, B, and C in the right order when they arrive out of order?).  DataOps solutions also contain features that place the entire histories of many datasets at your fingertips.  Yes, you can ask for the content of Source A, B, and C as it looked at the same point in time (not the time you received the file).  All the effort to massage data to get it to all match up in time is simply unnecessary if you control the time pointer.  Again, it seems simple, but the reset-to-a-common-point step of our Data Supply Chain is another cumbersome, slow, and involved process that slows down our application delivery cycle.
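The "control the time pointer" idea in the last bullet can be illustrated with a small sketch. It assumes each dataset keeps its full history of states keyed by timestamp, which is a simplification for illustration and not a description of any product's internals. VersionedDataset and consistent_view are hypothetical names, and timestamps are plain integers.

```python
from bisect import bisect_right


class VersionedDataset:
    """Keeps every recorded state of a dataset, keyed by timestamp."""

    def __init__(self, name):
        self.name = name
        self._times = []
        self._states = []

    def record(self, timestamp, state):
        # Assumes states are recorded in increasing timestamp order.
        self._times.append(timestamp)
        self._states.append(state)

    def as_of(self, timestamp):
        # Index of the last state recorded at or before the requested time.
        i = bisect_right(self._times, timestamp) - 1
        if i < 0:
            raise LookupError(f"{self.name} has no state at or before {timestamp}")
        return self._states[i]


def consistent_view(datasets, timestamp):
    """Return each dataset as it looked at the same point in time."""
    return {d.name: d.as_of(timestamp) for d in datasets}


a, b, c = VersionedDataset("A"), VersionedDataset("B"), VersionedDataset("C")
a.record(10, "A@10")
a.record(30, "A@30")
b.record(5, "B@5")
b.record(20, "B@20")
c.record(15, "C@15")

# Ask for all three sources as of time 25, regardless of when each one arrived.
view = consistent_view([a, b, c], 25)
```

Here the view at time 25 holds A@10, B@20, and C@15: no reordering or time-matching logic is needed, because each source answers the same "as of" question itself.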

Data Interconnectedness offers challenges we don’t understand.  What we do know is that 84% of companies fail at digital transformation.  They fail because they believe data mobility is still hard.  They fail because they still operate as though data is anchored and bounded by the vendors’ server in which it is stored, or the fear of data leakage by security controls that are loosely coupled to the data.  And, they have yet to take advantage of the simplification DataOps solutions can bring to complex, composite applications.  The old adage is still true, When you don’t know what to manage, you manage what you know.

New Destinations for your Data Journey

For CIOs just learning about DataOps, there are clear benefits for their journey to digital:

  • DataOps solutions give you the power to commoditize cloud providers, and make the cloud boundary fluid.
    • Since your dataset is mobile and secure and decoupled, there’s no reason you can’t move it seamlessly and quickly from Amazon to Azure in minutes.  Moreover, you can decide to move a dataset from your prem up to the cloud or from the cloud back to prem in minutes.  Switching costs have fallen dramatically, and cloud vendor lock-in can be a thing of the past.
  • DataOps solutions kill the friction between Data Producers and Data Consumers making App Development and tasks like Cloud Migration much faster.
    • The security and process bottlenecks your Developers, Testers, Analysts and Data Scientists experience accessing the data they need will diminish dramatically.  Setting masking and access controls at creation time keeps Data Consumers in a safe space.  Giving data consumers direct control over all of the usual operations they want to do (rollback, refresh, bookmark, etc.) squelches down all those requests to your infrastructure team to near zero.  Applications move forward at the speed of developers and testers, not the speed of your control process.  Longitudinal studies show this can result in a 30-50% increase in application delivery velocity.
  • DataOps solutions also amp up the speed and velocity of composite applications.
    • A lot of times, it doesn’t matter how fast you can deliver one app; it’s how fast you can deliver them all.  By giving you time-synchronized access to not just one but many datasets, all sorts of problems disappear.  You can create an end-to-end test environment for your 40 applications and it can be up in hours not months.  You can roll the whole thing back.  You can have all the fresh data you need to feed your ETL or your MDM or your data lake on command.   Data Virtualization makes those datasets not only fast and mobile, it makes them cheap too.

DataOps is disrupting our assumptions and our approaches to Data Transformation.  And, it’s the right concept to help those folks in the “Data Transformation Group” cross the digital divide.

DataOps is the alignment of people, process, and technology to enable the rapid, automated, and secure management of data. Its goal is to improve outcomes by bringing together those that need data with those that provide it, eliminating friction throughout the data lifecycle.

A Declaration of Data Independence

Your business must dissolve the barriers that continue to lock-in your data, and arrive at a data-driven interconnectedness that impels hyper-awareness, well-informed decision-making, and rapid execution.  A respect for the difficulties of Digital Transformation demands Data Operators and Consumers declare the causes behind such disentanglement.

Data usage should be friction-free, imbuing Data Consumers with the power to see and access authorized data in all its versions without regard to location, cloud platform, or storage vendor.

To secure these capabilities, solutions are implemented by businesses to deliver data to its Consumers under the care and consent of the Operators who govern it.  And whenever any platform, vendor or process becomes destructive of these ends, it is the right of Data Producers to avoid such obstacles, and to institute new methods to converge Data and Operations, laying their foundation on such principles as shall seem most likely to effect their Data Access, Mobility, and Security.  Prudence, indeed, has dictated that solutions long-established should not be changed for light and transient causes; and accordingly experience has shown that many companies and projects and IT shops are inclined to suffer such pain rather than to right themselves by abolishing such obstacles.  But, when a long train of exorbitant switching costs, project delays, quality failures, security breaches and data transport costs foment such Friction as to hold data ever-more captive, it is the right, indeed it is the duty, of disruptive companies to throw off such barriers and to provide capabilities to truly safeguard our future Data Liberty.  Such has been the patient forbearance of many companies; and such is now the necessity which constrains them to disrupt that Friction.  Our present system of virtualizing, securing, and managing data presents a history of repeated injuries to our simplest and most vital goals: growth, cost reduction, risk containment, and speed to market.  And it is data Friction that has established itself in Tyranny over our data.  To wit:

Friction routinely prevents access to data, despite data’s vast contribution to the health of the business, because of the fear of loss of control or exposure.

Friction forbids Data Operators from passing on data of immediate and pressing importance, requiring Data Consumers to return for the Assent of Data Operators which Assent is then also gated by ticketing systems that Friction permits to utterly neglect to attend to those same Data Consumers.

Friction impedes Data Operators from their desire to accommodate large datasets, unless those Data Consumers relinquish the right to receive data in a timely fashion, a right of enormous value to them.

Data Consumers often need their data in places unusual, uncomfortable, and distant whence the data was produced, and the Friction of delivering authorized, fresh data to such places fatigues those Consumers into accepting stale data and quality lapses.

Friction confounds the desire of Data Operators to deliver by opposing Data Consumers’ needs with the limitations of Systems which leave Operators under-equipped and constrained, thus allowing Friction to trample on the needs of those Consumers.

And Operators find that after a long time, the mounting menace of data breach causes such Friction that it engenders ever-tighter access and deployment controls instead of permitting authorized Consumers at large to deploy at will within a well-defined, and personal governance framework; thus, our speed to market is constantly under danger of project delay within, and data breach without.

Our data population continues to rise among all our systems, and the size of our datasets continues to obstruct our ability to harmonize change in copies near and far. Thus, more and more Operators must refuse to create new copies or to pass along changes in a timely fashion or engage in migrations – as the value of data is judged less than the cost of the infrastructure and resources to deliver it.

At every stage of Data Oppression, we have sought Redress in the most humble terms: Our repeated workarounds and improvements have been answered only by more limited access, greater immobility, and a governance regime that stifles speed.  Data Friction constrains the value of Data by these various acts of Data Tyranny, and solutions that perpetuate it are truly unfit to guide the lifecycle of data that has been liberated.

We have not been wanting in our attention to our Data Systems.  We have replaced them from time to time with incremental solutions to extend by some small measure their scalability and performance.  We have tried to address the data explosion with emigration to private, public and hybrid clouds. We have appealed to visionaries to find some way to virtualize the last great frontier in IT – our data. And, we have conjured solutions to tie together sprawling data that is in constant flux, and inevitably subject to the limits of bandwidth and the shipment of change.  We must, therefore, acquiesce in the necessity, and hold Data Friction our sworn enemy in the war to win markets and move data.

We, therefore, the proponents of the DataOps movement, do in the name and by the authority of the Data held hostage by Friction publish and declare that our data is and of right ought to be free.

What’s wrong with Test Data Management? Friction. And lots of it.

Market share, profitability, even business survival can be a function of feature deployment frequency.  And, your competitors are speeding up.  The best companies are deploying 30x faster, delivery times are dropping as much as 83%, and unicorns like Amazon are now deploying new software every second.  But, with data expected to grow to 44 zettabytes by 2020, all of our work to reduce coding friction and speed up application delivery will be for naught if we can’t reduce the friction in getting the right data to test it.

Companies face constant tension with test data:

  • Feature delivery can take a hard shift right as errors pile up from stale data or as rework enters because new data breaks the test suite.  Why is the data out of date? Most companies fail to provision multi-TB test datasets in anywhere near the timeframes in which they can build their code. For example, 30% of companies take more than a day and 10% more than a week to provision new databases.
  • To solve the pain of provisioning large test datasets, test leaders often turn to subsetting to save storage and improve execution speed. Unfortunately, poorly crafted subsets are rife with mismatches because they fail to maintain referential integrity. And, they often result in hard-to-diagnose performance errors that crop up much later in the release cycle.  Solving these subset integrity issues often comes at the cost of employing many experts to write (seemingly endless) rulesets to avoid integrity problems that foul-up testing.  Unfortunately, it’s rare to find any mitigation for the performance bugs that subsetting will miss.
  • It’s worse with federated applications.  Testers are often at the mercy of an application owner or a backup schedule or a resource constraint that forces them to gather their copy of the dataset at different times.  These time differences create consistency problems the tester has to solve because without strict consistency, the distributed referential integrity problems can suddenly scale up factorially.  This leads to solutions with even more complex rulesets and time logic.  Compounding Federation with Subsetting can mean a whole new world of hurt as subset rules must be made consistent across the federated app.
  • Synthetic data can be essential for generating test data that doesn’t exist anywhere else.  But, when synthetic data is used as a band aid to make a subset “complete”, we re-introduce the drawbacks of subsets.  To reach completeness, the synthetic data may need to cover the gap where production data doesn’t exist, as well as determine integrity across both generated and subset data.  Marrying synthetic data and subsets can introduce new and unnecessary complexity.
  • Protecting your data introduces more speed issues.  Those that mask test data typically can’t deliver masked data fast or often enough to developers, so they are forced into a tradeoff between risk and speed, and exposure usually trumps speed when that decision is made.  As a Gartner analyst quipped: 80% of the problem in masking is the distribution of masked data.  Moreover, masking has its own rules that generally differ from subsetting rules.
  • Environment availability also prevents the right data from getting to the right place just in time.  Many testers use a limited number of environments, forcing platforms to be overloaded with streams such that the resultant sharing and serialization force delay, rework and throwaway work to happen.  Some testers wait until an environment is ready.  Others write new test cases rather than wait, and still others write test cases they know will be thrown away.
  • Compounding this problem, platforms that could be re-purposed as test-ready environments are fenced in by context-switching costs.  Testers know the high price of a context switch, and the real possibility that switching back will fail, so they simply hold their environment for “testing” rather than risk it.  Behaviors driven by the cost of context-switching create increased serialization and more subsetting.  Ironically, by “optimizing” their part of the product/feature delivery pipeline, testers end up contributing to one of the bottlenecks that prevent that pipeline from moving faster globally.
  • Reproducing defects can also slow down deployment.  Consider that quite often developers complain that they can’t reproduce the defect that a tester has found.  This often leads to a full halt in the testing critical path as the tester must “hold” her environment to let the developer examine it.  In some cases, whole datasets are held hostage while triage occurs.
  • These problems are all subsumed into a tester’s most basic need: to restart and repeat her test using the right data.  Consider, then, that repeating the work to restore an app (or worse a federated app), synchronize it, subset it, mask it, and distribute it scales up the entire testing burden in proportion to the number of test runs.  That’s manageable within a single app, but can quickly grow unwieldy at the scale of a federated app.

Blazing fast code deployment doesn’t solve the test data bottleneck.  Provision speed, data freshness, data completeness, data synchronicity and consistency within and among datasets, distribution speed, resource availability, reproducibility, and repeatability all contribute to slower, less frequent deployments.  Why is all this happening? Your test data is Not Agile.

How do you get to Agile Test Data Management?  One word: Delphix.

Data, Delphix, and the Internet of Things

We have many devices, our devices are doing more, and more things are becoming “devices”.  This is the essence of the Internet of Things (IoT), which Gartner calls the “network of physical objects that contain embedded technology to communicate and sense or interact with their internal states or the external environment.”

The IDC Digital Universe study concluded that the digital world would reach 44 Zettabytes by 2020, with 10% of that coming from IOT.  That would mean that IOT data will account for as much data in 2020 as all the data that was in the world in 2013.  All of that data needs to be aggregated, synchronized, analyzed, secured, and delivered to systems inside of companies, and that’s where the opportunity comes in.
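
To make the scale concrete, here is a minimal sketch of the arithmetic behind that comparison.  It uses only the two figures quoted above (44 Zettabytes and a 10% IOT share); the variable and function names are illustrative assumptions, not anything from the IDC study.

```python
# Figures quoted in the paragraph above (IDC Digital Universe study).
TOTAL_2020_ZB = 44.0   # projected size of the digital universe in 2020, in zettabytes
IOT_SHARE = 0.10       # share of that total attributed to IOT


def iot_volume_zb(total_zb: float, iot_share: float) -> float:
    """Return the IOT portion of a total data volume, in zettabytes."""
    return total_zb * iot_share


# Hypothetical name for illustration: the implied IOT data volume in 2020.
iot_2020_zb = iot_volume_zb(TOTAL_2020_ZB, IOT_SHARE)
print(f"IOT data in 2020: {iot_2020_zb:.1f} ZB")
```

By the text's own comparison, that roughly 4.4 Zettabyte figure is about the size of the entire digital world in 2013.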

Growth in the Internet of Things, NCTA

That same IDC study showed 5 ways that IOT will create new opportunities:

Source: EMC/IDC

How does Delphix help companies with these 5 new opportunities?

New Business Models

  • It’s not how fast you get the data, it’s how fast you can respond to the data.  Delphix can deliver IOT data from a point of arrival to an analytic system, or drive much faster feature delivery for a feature you know you need based on the data.

Real Time Information on Mission Critical Systems

  • Data is one of the most difficult workloads to move around, but Delphix makes moving those workloads easy.  With a few clicks, fresh data can be pushed to your Analytics solution.  Or, you could easily cross the boundary and push your data up into the cloud for a while, then bring it back once the workload is finished.

Diversification of Revenue Streams

  • Monetizing services means delivering features faster and delivering applications faster. Data is the bottleneck in most of these kinds of deliveries, and Delphix has proven time and again that it can speed up application delivery time cycles by 30-50%.

Global Visibility

  • Powerful virtualization software can make far-flung datasets seem right next door, cranking up the speed of insight.

Efficient, Intelligent Operations

  • Delphix is expert at using virtual data to solve the data movement problem, giving your data the kind of agility that can let you actually achieve on-the-fly decision making.

At Delphix, we’re already talking to some of the key large scale entrants into the IOT space.  Take Power & Electric companies for example.  The applications and opportunities for Delphix around key utilities initiatives like Smart Metering are manifold.  Specific states, regulatory bodies, and power grids may utilize Delphix to create or mobilize aggregate datasets to support new features in service delivery, service provision, or analytics.  Or, customers may start connecting smart appliances into a home grid that allows them to tune their electricity usage up or down, and that data may flow up to electric companies that need to do standard feature development for new service offerings.  The list goes on and on.

IOT Data poses the same problems that companies have today with data mobility/agility and application delivery – problems that Delphix already solves.  From a new frontier perspective, IOT Data’s new opportunity spaces demand faster decision time, faster feature delivery time, and powerful ways to get data analytics done faster.  Delphix is uniquely poised to help with all of those problems as well.