The New Year

So just in case you are wondering if I’m conscious of the New Year… yes, I am.  Believe it or not 🙂

I have just three resolutions this year and they’re quite personal.  I’ll share them with you, intrepid reader, since you can keep a secret.

  • Do better at maintaining my temper, blood pressure and sanity.
  • Sleep better.
  • Lose 80 pounds by the end of December 2009.

Think I can do it?  Who knows.  I have a gym membership, so I’m armed.

Happy New Year to all of you.  You know I love you bunches.  All of you.

Exchange 2007: SCR replication repair

Last week I had to do some serious debugging on standby continuous replication (SCR).  We discovered that one of our SCC clusters had quit replicating to the SCR node at the other site.  We’re not sure why (we think the SCR node was rebooted without replication being cleanly suspended), but the ramifications of failed replication are interesting.

In the Exchange 2003 world, you had to depend on your backups running smoothly to purge log files from the log disks, or else you’d eventually find your databases dismounting in the middle of the day because you’re out of space.  Exchange 2007 and storage group replication have added a new complexity to that.  Now, not only do your backups have to succeed, your log file replication has to be working well too.  If your replication is broken for any reason, Exchange 2007 will not purge those log files.  We discovered that log files were not being purged and voila… databases dismounted.

So, with that in mind, I thought I’d share some of the email that was sent around to the team that discusses how to troubleshoot the storage group replication processes just in case someone out there needs it.

(introduction cut)

Sometime last week, SCOM started complaining about the ReplayQueueLength being elevated on SCR02.  This meant that replication had, once again, halted for some reason.  I thought I’d share how to debug and correct this should it happen again.

Open up Exchange Management Shell ON THE PASSIVE NODE.  To check the replication status of a storage group, type:

Get-StorageGroupCopyStatus -server <servername> -standbymachine <SCRnode>

For instance:

Get-StorageGroupCopyStatus -server exchscc02 -standbymachine exchscr02

This will produce a list of the storage groups and the replication status.

First column lists the storage group.

Second column informs you of the overall status.  Should be == HEALTHY.

Third column lists the CopyQueueLength.  This is how many log files still have to be copied from the source active node to the passive SCR node.  Should be a low number or zero.  Anything higher that is not decrementing means there is likely an issue developing.  SCOM is probably (hopefully) alarming about it if that’s the case.

Fourth column lists the ReplayQueueLength.  This is how many log files need to be played into the passive copy of the database at the SCR side.  Should normally be 50 or below.  Above 50 indicates there is some kind of problem at the passive SCR side.  DO NOT BE ALARMED by this number being 50.  Exchange is hard-coded to not play anything into the database until it gets 50 log files.  We cannot change this.  If we were to activate the SCR side of the node, it would play these 50 files in.

Fifth column lists the last time the replication service checked the log files out (should be FAIRLY recent, depends on database usage).
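
If you’d rather not eyeball those columns, the same rules can be sketched as a small filter function.  This is only a sketch: Test-ScrCopyHealth and its $MaxCopyQueue threshold are made up for illustration, and the property names (SummaryCopyStatus, CopyQueueLength, ReplayQueueLength) are assumed to match what Get-StorageGroupCopyStatus returns, so check them with Format-List before trusting it.

```powershell
# Sketch only: list what looks wrong with one storage group copy.
# Property names are ASSUMED to match the cmdlet's output; confirm with
#   Get-StorageGroupCopyStatus ... | Format-List
function Test-ScrCopyHealth {
    param(
        $Status,                  # one object from Get-StorageGroupCopyStatus
        [int]$MaxCopyQueue = 10   # hypothetical cutoff for a "low" copy queue
    )
    $problems = @()
    if ("$($Status.SummaryCopyStatus)" -ne 'Healthy') {
        $problems += "status is $($Status.SummaryCopyStatus)"
    }
    if ($Status.CopyQueueLength -gt $MaxCopyQueue) {
        $problems += "CopyQueueLength is $($Status.CopyQueueLength)"
    }
    # A replay queue of exactly 50 is normal, so only flag values above it.
    if ($Status.ReplayQueueLength -gt 50) {
        $problems += "ReplayQueueLength is $($Status.ReplayQueueLength)"
    }
    # Emits one string per problem, or nothing at all when the copy looks fine.
    $problems
}

# Possible use on the SCR node (not run here): show only the troubled copies.
#   Get-StorageGroupCopyStatus -Server exchscc02 -StandbyMachine exchscr02 |
#       Where-Object { Test-ScrCopyHealth $_ }
```

The function returns the reasons instead of a plain true/false so that it can be used both as a Where-Object filter and as a quick explanation of why a copy was flagged.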

If you discover any of the storage groups are in a state of “suspended” or “failed” you must debug the issue.

If the storage group is in a “suspended” state, you may be able to restart the replication with little issue.  Try this (RUN THESE COMMANDS FROM THE SCR OR PASSIVE NODE ONLY!):

Resume-storagegroupcopy -identity <servername\sgname> -standbymachine <SCRnode>

If the log files are intact and things are happy, replication will restart and you’ll be told that all is well.  If something goes awry at this point, the storage group will go down to a failed state.  You can run the get-storagegroupcopystatus to double check where things are after trying a resume.

If you get a storage group in a FAILED state, things are a little more delicate.  Make sure there are no issues with the servers talking to each other.  CHECK EVENT LOGS, especially APPLICATION log for any errors (PLEASE always do this FIRST, for EVERY EVERY EVERY EVERY EVERY and I mean EVERY (did I say EVERY?) Exchange issue!)  Make sure the replication service is started on both nodes.  Make sure they can ping each other.  Make sure they can open a windows explorer window to each other’s c$ share.  Check all of that out before proceeding.

If you can find absolutely no reason why the servers cannot talk to each other and the SG’s should be replicating fine, you can try to reseed the databases.  This is a time-consuming operation and could consume lots of bandwidth.

Before reseeding a database, you must put the FAILED storage groups in the suspended state.  In this example, let’s assume exchscc02\SG02 went down to a FAILED state.  First, we suspend it:

Suspend-storageGroupCopy -identity exchscc02\sg02 -standbymachine exchscr02

Now do another get-StorageGroupCopyStatus command to verify it is suspended:

Get-StorageGroupCopyStatus -server exchscc02 -standbymachine exchscr02

Verify that SG02 is now showing SUSPENDED.

Now the moment of truth.  BE CAREFUL to execute this ONLY on the passive node (usually the SCR node).  This command DELETES THE PASSIVE COPY of the database and log files and restarts replication!  There’s no going back once you’ve made this decision.  Choose carefully.

Update-StorageGroupCopy -identity exchscc02\sg02 -standbymachine exchscr02 -deleteexistingfiles

After a confirmation and a pause, you should get a progress bar as the live copy of the edb file is copied over the wire to the passive copy and log files begin accumulating.

After this completes, be sure to run another get-StorageGroupCopyStatus command to verify everything is healthy again.

There are no reboots or storage group/database offlines required for any of these commands.

(end email)
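
For quick reference, the decision path in that email boils down to something like the sketch below.  Get-ScrNextStep is a hypothetical helper, not an Exchange cmdlet.  It deliberately returns the command text instead of running anything, since the reseed step is destructive and should only follow the event log and connectivity checks described above.

```powershell
# Sketch only: map a copy status to the next command(s) from the email.
# Nothing is executed; the strings are meant to be reviewed and run by hand
# on the passive/SCR node.
function Get-ScrNextStep {
    param(
        [string]$Status,          # e.g. Healthy, Suspended, Failed
        [string]$Identity,        # e.g. exchscc02\sg02
        [string]$StandbyMachine   # e.g. exchscr02
    )
    $target = "-Identity $Identity -StandbyMachine $StandbyMachine"
    switch ($Status) {
        'Healthy'   { }   # nothing to do
        'Suspended' { "Resume-StorageGroupCopy $target" }
        'Failed'    {
            # Reseed path: suspend first, then replace the passive copy.
            "Suspend-StorageGroupCopy $target"
            "Update-StorageGroupCopy $target -DeleteExistingFiles"
        }
        default     { "# Unrecognized status '$Status': check the event logs before doing anything" }
    }
}

# Example:
#   Get-ScrNextStep Failed exchscc02\sg02 exchscr02
```

Whatever it suggests, run Get-StorageGroupCopyStatus again afterwards to confirm where things ended up.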

Upon review of the notes and the activities that led up to the failed replication states, it was determined that as a standard operating procedure, replication should be manually suspended on all SCC –> SCR nodes prior to patching and rebooting machines.  This means, of course, that replication has to be restarted after your patch party is over.

To do this is pretty much the same as above:

Suspend-StorageGroupCopy -identity <servername\sg> -standbymachine <scr_server>

You could get fancy and do something like a pipe of Get-StorageGroupCopyStatus to this command and it would probably fill in all of the identity stuff.  That’d be fine, but I prefer to do things the hard way I guess.  I like to take it easy.

Then when your patch party is over:

Resume-StorageGroupCopy -identity <servername\sg> -standbymachine <scr_server>
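
If typing that out once per storage group gets old, here is a sketch of generating the commands for a whole list in one go.  Get-ScrPatchCommand is a hypothetical helper and the storage group names in the example are made up.  The example uses Suspend before patching and Resume afterwards, on the assumption that a cleanly suspended copy only needs to be resumed.  Like the commands above, the output is meant to be reviewed and pasted in by hand.

```powershell
# Sketch only: build one command line per storage group for a patch window.
# Returns strings; it does not touch Exchange.
function Get-ScrPatchCommand {
    param(
        [string]$Verb,             # Suspend (before patching) or Resume (after)
        [string]$Server,           # source server, e.g. exchscc02
        [string]$StandbyMachine,   # SCR target, e.g. exchscr02
        [string[]]$StorageGroup    # storage group names on the source server
    )
    foreach ($sg in $StorageGroup) {
        "${Verb}-StorageGroupCopy -Identity ${Server}\${sg} -StandbyMachine ${StandbyMachine}"
    }
}

# Example (storage group names are made up):
#   Get-ScrPatchCommand Suspend exchscc02 exchscr02 sg01,sg02
#   Get-ScrPatchCommand Resume  exchscc02 exchscr02 sg01,sg02
```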

Hope these notes help someone out there struggling with Exchange 2007.

Macworld speculation

Did anyone catch the memo?  The one that said Apple is done with Macworld after this year because they want to announce products on their own timeline?  Yeah that one.  I thought you saw that one.

People seem to have forgotten it already.  There’s rampant speculation all over the net about products that Phil Schiller will be announcing during the keynote next week.  Hey, I’ll be there (I’ve got a guaranteed seat for the keynote, yay!) but I’m not expecting anything Earth-shattering.  I’m actually expecting not much of anything except some horn tooting and market share numbers.

But should they choose to introduce new products, I won’t complain.

From the Philadelphia news leader: Woman charged with raping boy, 14 – 12/29/08 – Philadelphia News – 6abc.com

Just a quick post to point out that I’m radically happy my family is nowhere near this f-ed up.

Dish Network finally installed

The installers kept the appointment this time around – as of today we have Dish Network added to our plasma box of glory.  So far it’s good.  We ate breakfast while watching bizarre shows from the other side of the planet.

It’s amazing how quickly the wife tired of it 😉  That was a quick Americanization.

Dish Network Customer Service: A poor first impression

One of the good fortunes I’ve had this year was finally obtaining a “real” TV set to replace the small dinky tube I’ve had since I moved out of my house.  In case you’re wondering, that was about 19 years ago.

I had promised myself that once I obtained a home theater-type of setup, I was going to load it up with plenty of HD services.  When analyzing the bills, I realized that I was paying some fairly outrageous fees for Internet, cable and phone.  I could consolidate all of those services into Knology (added bonus: drop Comcast, which is not pro-consumer) for less than half of the monthly price.  I could then take those savings and get Chinese television for my family.

My wife has been here 7 years and has been quite tolerant of the lack of Chinese television.  As a matter of fact, the move is more to expose our half-Chinese children to the world of Mandarin rather than for her own enjoyment.  She really, really wanted to get ahold of YOYO-TV in particular.

Unfortunately, Dish Network is the only available option to obtain this channel.

I started off by calling the 1-888 number advertised on their website.  I wanted to add some HD channels to the package as well since, frankly, I’m a hair disappointed in the number of HD offerings from Knology.  I wanted Sci-Fi Channel and Cartoon Network in HD and for some unknown reason, Knology does not offer them.  Dish Network, however, does.  I decided to go for the all-HD silver package and the Taiwanese mega-pack to obtain the YOYO-TV channel.

I called the number and was greeted by a very helpful young man named Alan.  Alan understood exactly what I was after.  He priced out a fabulous package, made sure the installation was completely free and signed me up.  He scheduled the appointment for 8am – noon on the following Tuesday.

Great, thought I.  I’ll just telecommute from work that day and it’ll all go well.

Tuesday comes and my wife runs the kids to school.  I stay home and join the regular telecons and whatnot (what I sometimes refer to as a job).  Noon comes and goes.  By 12:30pm, I’m pretty sure they’re not going to show up.

So I call the 1-888 number once again to get an idea of what might have happened.  I was flabbergasted at the answer.

They had cancelled the appointment.  Not only did they cancel the appointment, no one bothered to call me and inform me of the cancellation.  It gets better.  They rescheduled the appointment for the following Thursday from 8am – noon.  I presume they just assumed an adult would be available anytime they pleased, since they can apparently reschedule the appointment without notice.

I asked why the appointment was cancelled – especially since the day before they had called to confirm the appointment and verify someone would be there in the house.  I was told that this confirmation system was not tied in to the dispatching system at all – they had no idea if the appointment would be held or not.  Unfortunately, my appointment was not held because they did not have the required international dish in stock.

I was furious.  It usually takes a lot to get my blood boiling this hot, but not only was the incompetence appalling, they seemed incredibly indifferent about it.  I kicked into instant pushback mode.  “I work in the service industry too,” I growled, “and if I had given this type of service to my customer I would be fired.”

“I understand your disappointment sir but there’s nothing I can do.  We have no way to determine if they have the part.  We can only schedule the appointment.”

“So you’re telling me that I have to reschedule the appointment, take off of work to be home to meet you and you guys may or may not show up?”

“If the required parts are in stock the appointment will be kept.”

I was seething at this point.  There was absolutely no way in hell this could be true.  “You’re telling me that I have absolutely no recourse whatsoever except to just sit here and wait to see if you have the part?”  At this point, I decided to make it sound far worse than it really was.  I wanted her to think that I had devoted vacation hours to this appointment and lost.  “I have to take off work again and you may not show up?  Are you kidding me?”

She wasn’t kidding.  Really.  She wasn’t.  Apparently, there is absolutely no way between appointment scheduling, inventory and dispatch for them to predict if they will be able to keep the appointment.  Due to their process, they cannot tell you if they will keep the appointment, confirmed or otherwise, until the day of the appointment.

This must be pure insanity.  I could not believe it.

After pushing back and arguing on this further, I finally gave in and made them reschedule the appointment for the following Sunday from 8am – noon.  I also asked for the manager so I could at least get a free month out of this.  I was assured that I would get a free month of service for the “inconvenience.”

Fast forward to Sunday.

At least this time they called.

That’s right.  At 8:30am they called and said that again, they didn’t have the required dish in stock and would not be showing up today.  It was my wife who took the call and the bad news.  She tried to reschedule the appointment and they refused!  They told her that since I was the one who ordered the service, only I could call back and reschedule the appointment.  Again, it was a crap shoot as to whether or not they would have the part – but I still had to reschedule.

While writing all of this, I wish I was making this up.

At this point I made another check with DirecTV to find out if I could get YOYO-TV from them.  Sadly, no.  I must just bend over and take this Dish Network abuse.

Anyway, so I call back in on Monday and get a very helpful rep who hears this entire story.  She apologizes profusely and decides to call dispatch herself and find out when they will have the dish.  Amazing.  Why didn’t the others bother to do this?  Dispatch confirms that “for sure” they will have the dish by Friday, so I went ahead with rescheduling the appointment.

So we’re 48 hours away from that.  We’ll see what happens.

My first impression of Dish Network?  Very unimpressed.  No wonder they hired a talentless comedian to try to pep up their commercials.  They obviously want to cheer you up before they put you through this ridiculous customer service racket.  Their process is so horribly broken that I wonder what I’m in for if I actually do get the service.

I guess we’ll see.  From what I see at dishnetworksucks.com I’m not in for a good time.

Abit’s Death Date Reportedly Set: 31st of December, 2008 – X-bit labs

Even though I don’t care too much about building PCs anymore, this is sad, sad news.  ABIT is shutting down completely at the end of this year.

However, on a positive note – at least that narrows down motherboard selections for enthusiasts.

A few random words

I’ve not had much in the way of time this month. I’ve not had much time to do much of anything productive, let alone write here. I’ve actually spent the few writing moments that I have had on some projects invisible to the casual Galaxycow reader. Perhaps, if luck shall have it, sometime next year you’ll see the fruits of that labor.

Aside from that, it’s all been about the day job and the kids at night. Christmas is a crazy time for us, especially now with three kids to feed. Not just that, but right out of the holidays we’re packing most of the family up and heading out to Macworld. I’ll probably have some pictures and blog my thoughts about Macworld, but by and large I’m sure you’ll stick to the sites with writers who are paid to cover it for you. I’m fine with that; I’m going to Macworld for work anyway. Who has time to do silly things like report on it?

I know most of the world has updated to WordPress 2.7. I thought about doing this tonight myself, but worried about theme incompatibilities and other fun issues, so I decided not to kick that off tonight. I’ll save it for another day or night when I’m just not so busy.

I fear that Twitter is starting to take the place of this blog and that shouldn’t happen. (Woo, told you this was a random post). Twitter serves the need to offload quick nuggets of thought very quickly, rather than planting them here and letting them grow into a full-fledged fruit-bearing monstrosity. Not that the fruit here is very fresh or anything, but you see the analogy. It’s a little problematic, I suppose, because if I offload a thought, chances are I won’t remember it again. It’s kind of like those plays I did in school and all the way up to 1999 – once the play is over, the lines just eject themselves from my brain. Sad, but it seems my brain only has a limited amount of space and will clean house to make sure there’s room for other material.

Anyway, it’s late. We’ve been ripped off from getting snow again this season. Hope that doesn’t happen again. Now that I hear my iPhone making that oh-so-familiar “dong” sound of new mail on my work server, I should retire for bed.

I promise to write more. Really.

Bun’s Bedtime Story

I’ve pretty much posted this everywhere I have a presence, but that’s because I’m just plain seriously proud.

My son has written his first story.  He wanted to share it with everyone on the video camera, so away we went.  I think my son may have a future in vlogging.

This is the same video that is posted on the family site and podcast, www.porkbuns.org.