Backup Sanity Check

One of my pet peeves is when an application thinks its smarter than its DBA. Our helpdesk system was occasionally complaining that the database hadn’t been backed up in the past 24 hours. And our helpdesk manager was understandably concerned. That’s when I started doing some digging.

My backup script was occasionally skipping databases that had a SIMPLE recovery model. I’m still trying to figure that one out. More importantly, I set the recovery model to FULL, as it should be. * The latest upgrade of the vendor’s software set it to SIMPLE. But still, I was scratching my brain, trying to figure out what was wrong with my script.

In the meantime, I wanted to update my backup script to add a sanity check, backing up anything that hadn’t been backed up in the past 24 hours. I headed for the master.sys.databases table to see if I could find the column that indicated when it was last backed up. That’s the logical place for it, right? Wrong.  After doing some spelunking, I drew up this little gem:

SELECT      sd.name, max(bs.backup_finish_date)
FROM        master.sys.databases sd
LEFT JOIN   msdb.dbo.backupset bs
ON          bs.database_name = sd.name
where       ISNULL(type,'D') = 'D'
GROUP BY    sd.name
HAVING      max(backup_finish_date) IS NULL
OR          datediff(hour,max(backup_finish_date), getdate()) > 24
order by    sd.name

The backupset table gave me a great place to start.  But I found a little problem–it was listing a bunch of databases with really old backup dates. That’s when I realized those databases had been dropped long ago. By joining it to the master database, it only showed databases that were currently attached. And as an added bonus, a couple of NULL checks allowed me to see anything that had never been backed up.

I shouldn’t have to use this, but it is nice to know that I can use it as a sanity check.

* In my environment, the only databases that get SIMPLE recovery is static data. Everything else gets set to FULL.

Failing the Smell Test

In addition to being a DBA, I’m a bit of a travel geek. I have a great love of airplanes, aviation, and travel.

Today, I was looking at flights and saw something that made me think that it was a total violation of constraints.

Just looking at this made me cringe. US Airways flight 0729 departs London’s Heathrow, stops in Philadelphia, and then continues onto Boston. Flights with intermediate stops are common. Heck, Southwest flights are notorious for going coast-to-coast with multiple stops along the way. But in most cases, that flight is operated by same plane. In this case, the flight doesn’t stop in Philadelphia. If you look closely, you depart in Philadelphia and get on an entirely different aircraft. It’s not just a different physical plane–it’s a whole new aircraft type. You’re going from a widebody Airbus A330 onto a regional Embraer E90. This is what I call failing the smell test. The data construct just doesn’t make sense.

We’re facing a very similar situation with my company’s ERP system right now. We make technology devices and the software that runs them. Somewhere many years ago, when we started selling high availability pairs, instead of selling two devices with a discount, someone in marketing decided we would sell them as a pair. This has been a complete nightmare in terms of data. Our ERP system doesn’t allow two serial numbers for a part. The first serial number is recorded in the ERP system and then the second one lands in a spreadsheet somewhere. The folks in customer support go crazy when they can’t find the serial number of the second item in the pair.

I’ve seen a lot of conflicts between technology and marketing people in situations like this. Unfortunately, we lose all too frequently in these situations. In these cases, we need to really think through the process. Had someone looked at our problem from the customer support perspective, we might have been able to convince marketing that this was a bad idea. It should have failed the smell test at inception.

When things work

I’ve mentioned in the past that our facility had some serious drawbacks during power outages.

During our migration to our new facility and data center a few weeks ago, we lost power to the building. Again this morning, we lost building power. Let me tell you what happened.

This sounds like one of those “well, duh!” moments, but I have to tell you, it was revolutionary for us.  Every one of these things was problematic in our old building.

It’s wonderful when things work the way they’re supposed to.

Migration

Tomorrow, my company moves into our new facility. It’s about 1.5 miles down the road from our old facilities.  We were in two different buildings about 100 yards apart. Now, we’re moving into a single building.

We’re a technology company who makes networking software and quite a bit of the hardware that runs our software .We also make management software. On top of that, we do final assembly and testing in-house for the majority of our products.  That means our IT group had two server rooms, engineering had two test labs, customer support had a test lab, and manufacturing had a major facility. For all intents and purposes, we had six data centers. That’s a whole heck of a lot of networking. In our new facility, that is consolidated down to four. The engineering “lab” is massive.  Our cabling contractor tells us they probably ran 150 miles of CAT6 cable in the new facility.

Tomorrow’s move meant we migrated our data center Friday night. It was bumpy but successful.

Six months ago, we bought a pair of EMC ClarIIon CX-120 SANs and a pair of IBM H series blade centers with HS22 blades. My coworker and I have spent the past six months consolidating servers, virtualizing servers, and migrating systems to new virtual machines. About 80% of our corporate systems are virtual now. I have to say that it made this weekend’s migration much easier. Well, almost.

In working with the people from EMC, they noticed that we were replicating the LUNs that contained my backup files as well as my tempdb files. This was causing a lot of replication traffic between the two SANs. While they were sitting next to each other, this was fine. But going across a 100 megabit connection, they thought it might be too much. They suggested that we not replicate those LUNs. Foolishly, I agreed.  After the migration (which was bumpy itself due to a few self-inflicted issues), the virtual machines that run all four of my primary database servers failed to start. I saw my career flash before my eyes.

Fortunately, EMC sent us an amazing engineer who actually made sure the SANs were completely in sync before moving. All he had to do was present the LUN as a VMWare storage group and my servers came up. The process took about 90 minutes longer than we had expected, but it was still quite successful. And now we have a better idea what to expect when we move our DR to a colo facility next month.

One of the blessings of this move is that we got a new alarm system. And that means my only remaining SQL 2000 instance got retired and replaced with a brand new SQL 2008 instance. That made me happy.

New Project: Aviation Statistics

Most technology professionals have a lab or test environment where they can work and play. Unfortunately, I’m not one of them. Sure, I have a test environment, but it’s only for my ERP system. And my company is really particular about software licensing. I could never set up a test environment that wasn’t explicitly related to work. And since we’re so touchy at proprietary data, it makes blogging about work data incredibly difficult.

Until today.

Thanks to another member of the SQL Server community, I got my hands on an MSDN membership. That gives me access to some software licenses from MIcrosoft. And that means I can start building my own test lab. And that’s exactly what I started doing tonight.

I have a reasonably powerful desktop that is running VMWare ESXi (free!), and my first virtual machine was built to play with SQL 2008 R2. If you read my personal blog, you know that I’m a bit of an aviation nut. So to have some good data to test with, I’ve downloaded a lot of data from the Bureau of Transportation Statistics.

In the coming weeks, I’ll have a few posts about the data, what I’m doing with it, and what I’m learning by playing with it. So far, I have about 20 million rows in a single table that’s the raw import I got from BTS. Once I have a clean import, I’ll make that data file available to anybody who wants it. It’ll be my little contribution to the SQL Server community. It’s some good data for testing.

Of course, I got sucked into a project, and I’m up past my bedtime. At least I was doing something fun.

Free Lunch – Without Lunch

The old adage is that there is no free lunch.  Every once in a while, a free event comes up that’s a great use of time at no cost. Today was one such example.

My friend Brent Ozar presented a virtual training class sponsored by Quest Software.  Today’s topic was “Performance Tuning & Troubleshooting with DMVs” and I found it pretty fantastic.

If you keep your eyes open, you can get some really great free training out there. They’re not always sit down training in a proper classroom. Sometimes they’re seminars or virtual events. My current boss seems to think that if it’s not led by an instructor in a classroom, it’s not worthwhile. I will say that I get more out of talking to and listening to other DBAs, learning from their experience.

Sure, these events are usually paid for by a sponsor, and you have to listen to the sales pitch. It’s well worth it. My only complaint is that Quest didn’t provide lunch today. They should have provided the bacon. Since I was sitting in my own office, I guess I can handle buying my own lunch.

Looking for a Security Solution

We have some pretty strict security policies in my company. That often means coming up with creative solutions to problems. Hopefully someone in the DBA community can help me.

We have a company that does some outside work for us. We have a database server at their location. We need to allow the database server to talk to our DB server in a network DMZ.  We’d use a linked server from the remote site to talk to our DMZ server. I’m okay setting up the username/password and the firewall ports. The problem is that we need to have the traffic between the two servers encrypted.

To further complicate this, we don’t want all of the traffic on these servers encrypted, just the traffic that goes across the internet.

I’m completely befuddled on how to do this properly. There is a very good chance that I’m making this harder than it needs to be, and I’d be thrilled if someone gave me a simple solution.

Upgraded

I’ve just upgraded all three of my sites (this one, my personal site, and my Pug site) to the latest version of WordPress. Keeping current on your software is one more way to prevent getting hacked.

My Updated SQL Server 2005 Install Checklist

If you’ve been following me lately (either my blog or Twitter), you know that I’m in the process of building up some new servers on a new SAN for a big database consolidation.

After the first server, I had a pretty good checklist. Brent Ozar has a pretty fantastic checklist that I stole a few items from.  This is my attempt at a more refined checklist. (of my own, not Brent’s)

In my new environment, each server is a VM. The physical box has 16 GB of RAM and the VM will get 8 GB. I have drives for the OS, SQL logs, SQL data, TempDB, and backups.

Here is my order of things:

Disabling autogrowth on TempDB may be a little controversial. I’m okay with that. If TempDB is growing that much on our servers, which aren’t that big, I want to know what the heck is going on.

Okay, fellow DBAs, what did I forget, and what did I do wrong here? I think I did some good stuff here, but I doubt it’s perfect.

Moving System Files

Having built one database server in my new environment, the second one is an awful lot easier.

The following six lines of code saved me a ton of time this morning.

ALTER DATABASE [tempdb] MODIFY FILE (name = tempdev, FILENAME = ‘D:\TempDB\tempdb.mdf’)
ALTER DATABASE [tempdb] MODIFY FILE (name = templog,
FILENAME = ‘D:\TempDB\templog.ldf’)
ALTER DATABASE [msdb] MODIFY FILE ( NAME = msdbdata , FILENAME = ‘G:\system\MSDBData.mdf’ )
ALTER DATABASE [msdb] MODIFY FILE ( NAME = msdblog , FILENAME = ‘e:\system\MSDBLog.ldf’ )
ALTER DATABASE [model] MODIFY FILE ( NAME = modeldev , FILENAME = ‘G:\system\model.mdf’ )
ALTER DATABASE [model] MODIFY FILE ( NAME = modellog , FILENAME = ‘e:\system\modellog.ldf’ )

I just can’t explain how big of a deal this is to me. In my current envrionment, every server was set up with a single C: drive. Some of these file systems are over a terabyte. Now we’re breaking things up, and I can only imagine the performance boost we’re going to see.