One of the challenges facing many development organizations is how to provide adequate development environments while keeping them isolated enough to prevent developer conflicts. Today's virtualization technology makes it possible to spin up developer-specific environments quickly and easily, with enough horsepower to come close to the capacity of physical hardware. I will leave the physical vs. virtual debate for another time. Here I will just talk about a recent experience and how handling our environment differently could have yielded a more pleasant result.
First I will tell you a little bit about how we work.
We rely heavily on virtual environments for our development and testing. We buy high-end hardware and run on high-end SAN solutions. I don't know all of the details; I just know that it works and that our SEs (System Engineers) bust their butts to give us what we need. Our old development and QA environments are based on some variation of VMware. Again, I don't know exactly which one, except that it's virtual hardware and it meets our needs. We need an environment, we ask, they deliver. We are in the process of migrating all of our virtual systems for development and QA to a VMware Lab Manager environment in a different data center. It provides the ability to spin up groups of isolated environments, clone environments to a new configuration and much more. I won't go into the details of VMware Lab Manager, but our SEs have handed us the power to create our own virtual environments as we need them. They have also provided us with over 9TB (and growing) of storage per team location to host the machines. Unfortunately, we are not 100% live on this environment because we have terabytes of data to move across the WAN to the new data center.
So here is what happened recently, and how things could have been different had we been completely migrated to our new VMware Lab Manager environment.
I was diagnosing a problem with deadlocks that was happening in our QA environment. Our QA environments are well protected: our developers, of whom I am one, do not have administrative privileges over the web, file, database or any other servers. Our QA teams manage them, and we are limited to essentially read-only access. So when it comes time to replicate problems found in QA, we must do it in our development environment. Had this environment been in our VMware Lab Manager environment, I would have been able to snapshot the QA environment, make myself an admin over the copy, and proceed with reproducing the errors. Unfortunately it was not, so I set out to locate an environment with the closest configuration and the same amount of data or more. Since we are not fully live on our new environment, all of our larger development and test environments are in our old system, where we cannot clone configurations.
I contacted the person who managed the environment I had been developing on a few weeks prior in order to determine the status of the setup. I informed them that I needed to do extensive testing and would be doing large amounts of data manipulation on the system to troubleshoot a problem with the project that the system was originally set up to develop and test. I was told that the environment was scheduled for decommission in two days and that it was not in use. I did not think I needed to back up the database, since it was scheduled for decommission and I would only be working with 4-5 tables with under a billion records, which I would be truncating in order to test the process that was deadlocking. Due to the size of the database (100GB+) and the low storage capacity in our old environment, I likely would not have been able to back up the database even if I had tried.
After receiving the all clear, I connected to the system, truncated the tables I would be working with and started my testing. It didn't take long to reproduce the problem and I was happy. The environment had served its EXACT purpose to me. As I was adjusting the code to eliminate the deadlocks, the person who had given me permission to use the environment stopped by my desk. They asked, in a troubled voice, "Are you working on the ________ environment?" Puzzled, since I had asked for permission less than two hours earlier, I replied "Uhhh... yeah. We just talked about that two hours ago." Well, APPARENTLY the system was not quite as "not in use" as they had believed. The system had been used for the past two weeks to do large scale testing. That testing was scheduled to complete the next day, with full reports to be generated. Unfortunately, my testing required truncating the very tables that were used to calculate the results of the scale testing. Two weeks of testing, POOF, GONE in less than 5 minutes.
How could this disaster have been prevented?
- I could have (attempted to) back up the database
- If successful, I could have copied that database to the new environment for testing
- I could have checked some of the logging tables in the database (there are several) to see how recently it had been used
- I could have found a different environment, which may have ended in the same result with a different team
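The logging-table check from the list above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses Python's built-in `sqlite3` as a stand-in for our actual database server, and the table names (`job_log`, `audit_log`), the timestamp columns, the function name and the seven-day idle threshold are all hypothetical.

```python
import sqlite3
from datetime import datetime, timedelta


def recently_used_tables(conn, log_tables, now, idle_for=timedelta(days=7)):
    """Return {table: last_activity} for log tables written to within `idle_for`."""
    cutoff = now - idle_for
    recent = {}
    for table, ts_column in log_tables.items():
        # Table and column names come from a trusted, hard-coded mapping,
        # so interpolating them into the SQL text is acceptable here.
        row = conn.execute(f"SELECT MAX({ts_column}) FROM {table}").fetchone()
        if row[0] is None:
            continue  # empty log table: no recorded activity
        last_activity = datetime.fromisoformat(row[0])
        if last_activity >= cutoff:
            recent[table] = last_activity
    return recent


# Hypothetical demo data: one log written to yesterday, one idle for months.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE job_log (finished_at TEXT)")
conn.execute("CREATE TABLE audit_log (logged_at TEXT)")
conn.execute("INSERT INTO job_log VALUES ('2009-11-16T09:30:00')")
conn.execute("INSERT INTO audit_log VALUES ('2009-08-01T12:00:00')")

now = datetime(2009, 11, 17, 10, 0, 0)
in_use = recently_used_tables(
    conn, {"job_log": "finished_at", "audit_log": "logged_at"}, now
)
if in_use:
    print("Do not truncate; recent activity in:", sorted(in_use))
```

Any non-empty result is a reason to stop and ask again before truncating anything, no matter what the environment's owner believes about its status.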
How will VMware Lab Manager prevent this in the future?
- Rather than taking over the environment I could have cloned it to a new fenced environment where my changes would not impact any other developer
Pretty simple!
In closing, I do not place blame on any one person for this. There are things that could, and should, have been done to prevent it. I certainly accept part of the blame, because there are things that I could have done to ensure the environment was actually unused before proceeding with my testing. Virtualization is just one solution for preventing this type of scenario, and it may not be the solution for your organization. But whatever your needs may be, make sure that you take the proper precautions to ensure that your development environments are isolated in such a way that developers do not wreak havoc on other developers' efforts. Work with your boss, your IT staff and the other developers to create a productive environment that meets your needs. Document your configurations so that when the question "Who is using this server/database/file?" is asked, it can be easily answered.
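Even a very small reservation record can answer the "who is using this?" question. The sketch below is one possible shape for it, not a description of any tool we actually use. The class names, the `dev-db-01` environment name and the dates are made up for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Reservation:
    environment: str
    owner: str
    purpose: str
    start: date
    end: date


class EnvironmentRegistry:
    """In-memory record of who has reserved which environment, and when."""

    def __init__(self):
        self._reservations = []

    def who_is_using(self, environment, start, end=None):
        """Return reservations on `environment` overlapping [start, end]."""
        end = end or start
        return [
            r for r in self._reservations
            if r.environment == environment and r.start <= end and start <= r.end
        ]

    def reserve(self, reservation):
        """Record a reservation, refusing it if the dates overlap another one."""
        conflicts = self.who_is_using(
            reservation.environment, reservation.start, reservation.end
        )
        if conflicts:
            owners = ", ".join(r.owner for r in conflicts)
            raise ValueError(
                f"{reservation.environment} is already reserved by {owners}"
            )
        self._reservations.append(reservation)


# Hypothetical usage: a scale test holds the environment for two weeks.
registry = EnvironmentRegistry()
registry.reserve(Reservation(
    "dev-db-01", "scale test team", "two-week scale test",
    date(2009, 11, 3), date(2009, 11, 18),
))

try:
    registry.reserve(Reservation(
        "dev-db-01", "developer", "deadlock troubleshooting",
        date(2009, 11, 17), date(2009, 11, 17),
    ))
except ValueError as err:
    print(err)
```

A shared spreadsheet or wiki page would do the same job. What matters is that the answer is written down somewhere other than one person's memory.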
Brandon Galderisi
SQL Server MVP
SQL Server Nation Co-Founder
Posted 11/17/2009 10:03 PM by BrandonGalderisi