Tuesday, September 6, 2011

Adaptiva One Site – Bare Metal OSD Using SCCM With Peer to Peer PXE

Once in a while there is an opportunity to present something that feels truly revolutionary. Even though I was involved with the design of the Peer to Peer stack in One Site, and wrote some of the code for it, by the time we were done, I was simply blown away by how beautifully it worked. It took a ton of work, and I’m very pleased to be able to describe it in this blog.

You can also download the actual user guide here: OneSite P2P PXE User Guide

1. About PXE

PXE (Preboot Execution Environment, pronounced as pixie) has been around forever. It originated as part of Intel’s “Wired for Management” initiative, and the current de facto standard, version 2.1, was published in 1999, jointly by Intel and Systemsoft.

At the time, I worked at Microsoft, as a program manager with the Windows Networking team, and it was really exciting to see, work on, and help manage a lot of new things coming out of Intel in those days.

However, when I look back at many of these things all these years later, they really didn’t work out that well: Wake on LAN, G.723.1/H.323 based IP conferencing, and yes, PXE.

Even though all these protocols were carefully designed, reused a lot of work that had been done at the standards bodies, and were vetted repeatedly, the kind of enterprise thinking that permeates these companies today wasn’t as prevalent at the time. It is quite ironic that I have wandered into the candy store a second time and been involved in alleviating the shortcomings of all three of these protocols.

Challenges of deploying PXE on large distributed enterprise-scale networks are well known to all of us. I’ll enumerate them quickly and we’ll move on to P2P PXE:

Deploying the server infrastructure required for PXE and TFTP
Network configuration changes (IP helpers) required to routers
DHCP server changes required, especially for those companies that run their DHCP servers on Cisco

2. Benefits of Peer to Peer PXE

These are exactly the reverse of the deployment challenges I listed above. This is not a coincidence: we designed our Peer to Peer PXE protocol specifically to remove the shortcomings of the current PXE protocol.

No servers or server roles are required
No router changes are required
No DHCP server configuration changes are required

There is nothing to install, or configure, in any of your branch offices, period.

All you have to do is:

Install one copy of Microsoft’s WAIK toolkit – a free download from here: WAIK Download
In the Adaptiva Workbench, select the collections where you want PXE
Click the “Enable” button in the Adaptiva Workbench

Peer to Peer PXE is automatically enabled on all machines that belong to the selected collections – all within the next 1-2 minutes! What you’ll have is a completely elastic and self-healing PXE infrastructure that continues to function no matter how many machines go up or down, come online or go offline, break, get fixed, and then get broken again.

3. Under the Covers

There are two core protocols that are required for a machine to boot its neighbor using PXE:

PXE Server
TFTP Server

Brutally efficient implementations of both these protocols are built into every One Site client.
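For readers who have not looked at these two protocols on the wire, here is a minimal sketch of the standard packet formats involved. It is a generic illustration, not One Site’s implementation, and the function names are hypothetical. A PXE client announces itself in its DHCP request with a vendor class identifier (DHCP option 60) that begins with “PXEClient”, and then pulls its boot files with TFTP read requests as defined in RFC 1350.

```python
import struct

OPT_VENDOR_CLASS = 60          # DHCP option 60: vendor class identifier
OP_RRQ, OP_DATA = 1, 3         # TFTP opcodes from RFC 1350
BLOCK_SIZE = 512               # default TFTP data block size


def is_pxe_request(dhcp_options: dict[int, bytes]) -> bool:
    """True if the DHCP options identify a PXE boot client ("PXEClient" vendor class)."""
    return dhcp_options.get(OPT_VENDOR_CLASS, b"").startswith(b"PXEClient")


def parse_rrq(packet: bytes) -> tuple[str, str]:
    """Parse a TFTP read request: 2-byte opcode, filename, NUL, mode, NUL."""
    (opcode,) = struct.unpack("!H", packet[:2])
    if opcode != OP_RRQ:
        raise ValueError(f"not a read request (opcode {opcode})")
    filename, mode = packet[2:].split(b"\x00")[:2]
    return filename.decode("ascii"), mode.decode("ascii").lower()


def data_packets(payload: bytes):
    """Yield TFTP DATA packets (opcode, block number, up to 512 bytes of data).

    A block shorter than 512 bytes, possibly empty, tells the client the file is complete.
    Block-number wraparound for files over ~32 MB is not handled in this sketch.
    """
    for block, offset in enumerate(range(0, len(payload) + 1, BLOCK_SIZE), start=1):
        yield struct.pack("!HH", OP_DATA, block) + payload[offset:offset + BLOCK_SIZE]
```

Note the end-of-file rule: a file whose size is an exact multiple of 512 bytes still needs a final, empty DATA block.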

When you download and install the WAIK toolkit, One Site server extracts the following seven tools, and packages them into an “Adaptiva WAIK Package”, which is then automatically deployed to all the required client machines – using the Peer to Peer protocols, and the WAN bandwidth management protocols that One Site is getting known for.

abortpxe.com
bcdedit.exe
boot.sdi
bootmgr.exe
pxeboot.com
pxeboot.n12
wgl4_boot.ttf
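One Site performs this extraction and packaging automatically. Purely to illustrate what a complete package must hold, here is a hypothetical check that a staging folder contains all seven files listed above; the folder layout and function names are assumptions, not part of the product.

```python
from pathlib import Path
from typing import Iterable

# The seven WAIK files listed above.
REQUIRED_PXE_FILES = (
    "abortpxe.com", "bcdedit.exe", "boot.sdi", "bootmgr.exe",
    "pxeboot.com", "pxeboot.n12", "wgl4_boot.ttf",
)


def missing_pxe_files(file_names: Iterable[str]) -> list[str]:
    """Return the required files absent from file_names (case-insensitive, as on Windows)."""
    present = {name.lower() for name in file_names}
    return [name for name in REQUIRED_PXE_FILES if name not in present]


def missing_in_folder(staging_dir: Path) -> list[str]:
    """Apply the same check to the regular files in a hypothetical staging folder."""
    return missing_pxe_files(p.name for p in staging_dir.iterdir() if p.is_file())
```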

The clients now have everything they need in order to PXE boot fellow machines. One Site’s Peer to Peer protocols take care of the rest.

4. The Real McCoy

Seeing is believing, so here is a complete step by step walkthrough, including screenshots.

5. Open the One Site Peer to Peer PXE Perspective

To start using Peer to Peer PXE, you must first open the “Peer to Peer PXE Perspective”, which contains UI for enabling and using Peer to Peer PXE.

1. Open the “Home” perspective by clicking on the “Home” icon in the toolbar, as shown below.

2. Open the Peer to Peer PXE perspective by double-clicking the item named: “One Site - Peer to Peer PXE Perspective”, as shown above

3. This will open the Peer to Peer PXE perspective, which is displayed below:

6. Enable Peer to Peer PXE

Enabling Peer to Peer PXE is straightforward and takes only a few moments.

1. Open the “P2P PXE Settings” editor by clicking the “Edit P2P PXE Policy Settings” item in the “P2P PXE Task Navigator”

2. This will open the editor window, which is shown below

3. Download the “Windows Automated Installation Kit”, or WAIK, from Microsoft’s web site, and install it on the Adaptiva Server machine

4. Specify the path of the installation folder in the “WAIK location” field of the editor

5. Check the box “Enable Unknown computer Support” if you’ll be supporting unknown computers in the future

6. Drag and drop one or more SCCM collections into the “Target collections” field

7. Click on the “Enable” button
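The settings gathered in the steps above can be thought of as a small record. The class and field names below are hypothetical, chosen to mirror the editor’s labels rather than One Site’s actual schema; the check simply reflects the steps: a WAIK location and at least one target collection are needed before clicking “Enable” makes sense.

```python
from dataclasses import dataclass, field


@dataclass
class P2PPxePolicy:
    """Hypothetical model of the fields in the "P2P PXE Settings" editor."""
    waik_location: str = ""                  # "WAIK location" field
    enable_unknown_computers: bool = False   # "Enable Unknown computer Support" checkbox
    target_collections: list[str] = field(default_factory=list)  # dropped SCCM collections

    def problems(self) -> list[str]:
        """Reasons the policy is not ready to enable; an empty list means ready."""
        issues = []
        if not self.waik_location.strip():
            issues.append("WAIK location is not set")
        if not self.target_collections:
            issues.append("no target collections selected")
        return issues
```

Unknown computer support is optional, so it does not affect readiness in this sketch.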

7. Watch As It Happens

At this point, you’ve done everything that you need to do. Just sit back and watch the magic happen.

8. Finally, One Site PXE Boots a Machine

Here is a VM booting up using Adaptiva One Site Peer to Peer PXE.

Remember, P2P PXE is included with your basic One Site license – there are no additional licensing costs, and no consulting services that need to be purchased.

Monday, September 5, 2011

Adaptiva One Site Scores Huge Win Over 1E Nomad

This weekend was champagne time at Adaptiva. We’re not averse to a few drops of local booze any time, but guzzling gallic bubblies on a perfect midsummer weekend in Seattle, and blogging about it?

The occasion demanded it.

It’s finally official. Adaptiva One Site has won a hard-fought battle against 1E’s Nomad product at a major bank. It is not only one of the largest deals we have ever done, in dollar terms, it is also one of the most closely contested battles we have ever won against our traditional rivals.

At stake was the opportunity to help consolidate the SCCM infrastructure across thousands of bank branches, and eliminate thousands of server roles – DPs, Secondary sites, and Child primary sites. This was especially significant since the bank was migrating away from a combination of Altiris and BigFix.

I’d like to thank, in no particular order:

- The people at the bank (which shall remain unnamed for now), for looking past all the sales and marketing presentations, getting down to the business of software design and architecture, measuring actual performance, and selecting the better solution.

- The Microsoft people who were involved, for their support, encouragement, and endorsement

- My team mates at Adaptiva, for creating truly magical software that is a delight to demonstrate

- Our competitors, for a battle well fought. See you again soon :)

Inside the Adaptiva Client Health Workflow Engine

The boys at Geico presented us with an interesting question last week. They were looking at ways to detect users who were logging in as local administrators on their machines.

The traditional solution would involve a lot of scripting, MIF files, and hardware inventory, before you could collect all this data and get it uploaded into an actionable database, from which you could then construct collections and web reports as needed. In other words, a lot of duct tape.

They, of course, had been there, done that, and didn’t want to do it again.

Their question was whether the Adaptiva Client Health Workflow Engine could do it, without writing any code. From Geico’s perspective, the question obviously made a lot of sense.

If a health check could be created which failed when the user was logged in as a local administrator, and reported back the user’s login credentials, then everything else would flow automatically. The admins could point at a machine or collection, run the health check, and see the results immediately. Or they could create a scheduled health check policy, and have the check run every day on all their machines.

The coolest thing would be the automated collection and uploading of results. All failing machines would automatically become members of the Client Health Collection for that health check, success / failure results would show up in the 40 pre-built Client Health Reports, and the login credentials of all users who were logging in as local admins would show up in an actionable SQL table.
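To picture what that automated upload produces, here is a hypothetical sketch that splits per-machine results into the failing machines (which would join the Client Health Collection) and the rows of a results table. The type, function, and machine names are illustrative assumptions, not One Site’s schema; in the product all of this happens without any code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HealthCheckResult:
    machine: str
    passed: bool      # the End node's ResultBoolean
    text: str = ""    # the End node's ResultText, e.g. the offending login


def summarize(results: list[HealthCheckResult]) -> tuple[set[str], list[tuple[str, str]]]:
    """Return (failing machines, (machine, result text) rows for the failures)."""
    failing = {r.machine for r in results if not r.passed}
    rows = [(r.machine, r.text) for r in results if not r.passed]
    return failing, rows
```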

In this blog we’ll discuss how such a workflow can be designed with a few minutes of drag-and-drop using the Adaptiva workflow designer, and without writing any code at all.

Step 1: Admire the completed workflow

Let us first admire the one object of pure beauty in this whole exercise. This is the actual workflow we’re going to create. In the rest of the article we’ll create this workflow, one step at a time.

Figure 1: The actual complete Workflow

You can see the basic elements of a workflow in this screenshot. There is a Start node and an End node. There is a Try-Catch construct that is used for handling unexpected errors. The If, True, and False nodes together provide a conditional branching construct. You can also see the Exit1 and Exit2 nodes, which terminate the workflow immediately, before the flow of execution reaches the End node.

You can also see how the designer generates unique names for each node. For example, when you drag and drop your first If node, it is given the name If1. If you were to drag and drop another If node, it would get the name If2, and so on. Of course, these are just default names. You can change the names of your nodes to anything you like, as long as each node has its own unique name that identifies it unambiguously.
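The naming rule just described (a per-type counter, plus uniqueness after renames) can be sketched in a few lines. This is a hypothetical illustration of the rule, not the designer’s code; in particular, skipping a default name that a renamed node already uses is an assumption about how collisions would be avoided.

```python
class NodeNamer:
    """Hypothetical sketch of default node naming: If1, If2, Exit1, ..."""

    def __init__(self) -> None:
        self.counters: dict[str, int] = {}   # last suffix handed out per node type
        self.taken: set[str] = set()         # every name currently in the workflow

    def next_name(self, node_type: str) -> str:
        """Return the next unused default name for node_type, skipping names already taken."""
        while True:
            self.counters[node_type] = self.counters.get(node_type, 0) + 1
            candidate = f"{node_type}{self.counters[node_type]}"
            if candidate not in self.taken:
                self.taken.add(candidate)
                return candidate

    def rename(self, old: str, new: str) -> None:
        """Rename a node, refusing any name that would not be unique."""
        if new in self.taken:
            raise ValueError(f"node name {new!r} is already in use")
        self.taken.discard(old)
        self.taken.add(new)
```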

Step 2: Switch to the Workflow Designer Perspective

The Adaptiva Workbench is our Admin console. It provides a “Perspective” based user interface. The workbench contains hundreds of different windows, explorers, editors, views, and other user interface elements. All these UI elements can be dragged around the workbench, resized, and rearranged to suit your taste.

Since there are so many of them, they have been organized into different perspectives.

Each perspective is simply a collection of those user interface elements that are relevant to performing a particular task. You can think of it as a layout. A lot of perspectives are built into the product, e.g. One Site Package Perspective, Security Management Perspective, Instant Client Health Machine Perspective, and so on. You can also select the windows you want, and create your own perspective.

The perspective for designing a workflow is called: “Workflow Designer Perspective”. You can switch to the “Home Perspective”, and then double-click on the “Workflow Designer Perspective” to switch to it.

Figure 2: The Workflow Designer Perspective

The figure above shows what the “Workflow Designer Perspective” looks like when it has just been opened.

On the left you see the “Workflow Designer Palette”, which is an explorer of all workflow activities which are available to be dragged and dropped into the workflow you’re building.

In the middle you see a gray area where the Workflow Editor will open up when you actually create the workflow. This will provide you the drawing canvas on which you will paint your logic by dragging, dropping, and arranging different workflow activities from the palette.

In the lower middle, you see the “Workflow Errors” explorer, which displays the mistakes in your workflow that you need to fix. You can double-click each error to navigate to the exact place in the workflow where it occurs, and fix it there.

On the right you see the “Workflow Properties” view. Each workflow activity that you drag and drop onto the canvas has a set of properties that control its behavior. This view lets you specify values for all the properties of each workflow activity.

Step 3: Create a new client workflow

The workflow designer allows you to create three different types of workflows: Client Workflows, Server Workflows, and Business Workflows.

For this blog, we’re mainly interested in client workflows, because these workflows are intended for execution on Adaptiva clients, and have some unique characteristics.

Chief among these is the ability of client workflows to self-deploy to clients. That is, whenever a client needs a client workflow, it can automatically discover, download, and deploy a copy of the latest version of the workflow. Under the covers, Client Workflow self-deployment uses the same peer-to-peer and bandwidth management technology on which our Adaptiva OneSite product has been built. This ensures that only one copy of each workflow will ever traverse your WAN, and that its transfer will never impact other traffic on your WAN.
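From a single client’s point of view, the “latest version” rule boils down to a version comparison against a local cache. Here is a hypothetical sketch: the cache layout, integer version numbers, and the download callback are assumptions, and the peer-to-peer, bandwidth-managed transfer is reduced to one function call.

```python
from typing import Callable

# workflow name -> (version, definition); a hypothetical local store on the client
Cache = dict[str, tuple[int, str]]


def ensure_latest(name: str, latest_version: int, cache: Cache,
                  download: Callable[[str, int], str]) -> str:
    """Return the workflow definition, fetching it only when missing or out of date."""
    cached = cache.get(name)
    if cached is None or cached[0] < latest_version:
        # Stand-in for the peer-to-peer, bandwidth-managed transfer.
        cache[name] = (latest_version, download(name, latest_version))
    return cache[name][1]
```

Repeated requests for an unchanged workflow are served from the cache, so a download happens only once per new version.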

To create a new client workflow, simply click on the “New Client Workflow” button in the toolbar. If you look at Figure 2, you’ll notice the toolbar near the top of the open perspective. The “New Client Workflow” button is the second button from the right.

As soon as you click the button, a workflow editor window opens up, and occupies the gray area near the center of the screen. This editor provides the drawing canvas where we’ll paint the workflow logic. It also provides a basic skeleton of a workflow, consisting of the Start and End nodes.

Figure 3: A new workflow – the drawing canvas is clearly visible in the workbench

In the screenshot above, you can clearly see how a new workflow has been created inside a workflow editor window, and a Start node and an End node have already been added to the workflow for you. The canvas is ready. We’ll soon start dragging workflow activities from the palette onto our canvas in order to flesh out the workflow.

Step 4: Set Start node and End node properties

As you can see in Figure 3, our brand new workflow was born with an error already showing up in the error view. The workflow needs a name, and the name is specified using the WorkflowName property of the Start node. Let us click on the Start1 node in the canvas. The Workflow Properties view on the right of the screen automatically switches to the properties of whichever node is currently selected in the canvas.

We can now give the workflow a name by setting the value of the WorkflowName property to “HealthCheck – Is User Local Admin”, as shown below. As soon as we do that, the error disappears.

Figure 4: Setting the workflow name in the Start node’s WorkflowName property

Let us also set the value of the ResultBoolean property of the End1 node to true. This is not strictly required, but it is part of the workflow design etiquette for Client Health workflows. The workflow uses the ResultBoolean property of its End node to communicate success or failure. By default, we set it to true, to indicate success. Whenever the health check fails, the workflow activity that detects the failure immediately sets the ResultBoolean property of the End node to false, which causes the health check to be reported as failed.

Figure 5: Setting the ResultBoolean property of the End node to true, indicating success for the health check

The screenshot above shows how we used the combobox to set the value of the ResultBoolean property of End1 to true.

Step 5: Add a Try-Catch node to the workflow

It is time to add some nodes to our workflow and flesh out the logic we want to implement. Let us drag and drop the Try-catch node from the Palette onto the canvas.

The palette contains so many activities that we don’t want to search through them manually. As in all explorer views in the Adaptiva Workbench, the palette contains a search box, which uses AJAX, much like Google Instant, and does all the searching for you. So we’ll just type the word Try into the search box, and this is what happens:

Figure 6: Making the search box find the Try node for us

Now we can drag the Try-catch exception node onto the drawing canvas, somewhere between the Start and End nodes, and this is the result:

Figure 7: We just dragged and dropped the Try-catch exception activity from the palette

As you can see, the designer added three nodes to the workflow, which together form the Try-Catch construct. As in programming, the Try-Catch construct is a wrapper for catching and handling any unexpected errors. The Try1 node is the parent node and serves as the container. The MainTry1 node is the child node that will contain the main logic for the workflow. The Catch1 node will contain our logic for dealing with errors.

Whenever an unexpected error occurs during the execution of the MainTry1 node, or any of its child nodes, the Catch1 node will execute immediately. In this case, we will handle the error by simply signaling that the health check has failed, and reporting back the error that occurred.

In the next step we’ll add this error-handling mechanism to the Catch1 node, and then we’ll be ready to put the core logic of our workflow into the MainTry1 node. It isn’t absolutely necessary to use a Try-Catch node. We could easily skip it, but we always do it, because it adds a layer of robustness to the workflow.

Step 6: Add exception handling capability to the Catch1 node

The workflow uses the following properties of the End node to signal failure and send data back to the SCCM Site:

ResultBoolean: true for success, false for failure

ResultText: in case of failure, it contains the error message that gets reported back to the SCCM Site

ResultWholeNumber: in case of failure, it contains the error code that gets reported back to the SCCM Site

Each node, when it executes, gets an opportunity to set the properties of all other nodes in the workflow. We’ll use this capability to modify the properties of the End1 node whenever the Catch1 node executes:

Figure 8: Setting properties on the End1 node whenever the Catch1 node executes

We’ll also drag and drop an Exit node to terminate the workflow immediately when an error occurs:

Figure 9: Adding an Exit node to terminate the workflow immediately when an error occurs

Even without this Exit node, execution would have simply fallen through to the End node, but adding the Exit node makes it more obvious that the workflow must end immediately whenever an error occurs.
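Steps 4 to 6 together amount to a familiar pattern: the End node starts out reporting success, and the Catch branch flips it to failure and records the error. Here is a minimal Python sketch of that pattern, assuming the error code is supplied by the caller (the code the product reports is not shown here). The field names deliberately copy the End node’s property names rather than following Python naming style, and run_with_catch and ExitWorkflow are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class EndResult:
    """Mirrors the End node properties that are reported back to the SCCM Site."""
    ResultBoolean: bool = True                 # default: the health check passed
    ResultText: Optional[str] = None           # error message on failure
    ResultWholeNumber: Optional[int] = None    # error code on failure


class ExitWorkflow(Exception):
    """Raised by an Exit node to stop the workflow immediately."""


def run_with_catch(main: Callable[[EndResult], None], error_code: int = 1) -> EndResult:
    """Run the MainTry logic; on any unexpected error, mark the check failed and stop."""
    end = EndResult()
    try:
        main(end)
    except ExitWorkflow:
        pass  # an Exit node inside MainTry ended the workflow early
    except Exception as error:
        # Catch1: record the failure on End1; Exit1 then terminates the workflow.
        end.ResultBoolean = False
        end.ResultText = str(error)
        end.ResultWholeNumber = error_code
    return end
```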

Step 7: Checking if the user is an admin

Now we come to the all-important step of determining whether a user is currently logged on to the system, and if so, whether that user is logged on as a local admin.

For this we’ll just drag and drop the IsUserLoggedOn activity from the palette onto the canvas:

Figure 10: Adding an IsUserLoggedOn node to detect the logged on user

When this activity executes, it first checks whether a user is logged on. If so, it obtains the user’s credentials and checks whether that user is logged on as a local admin. If no user is logged on, it determines the credentials of the user who last logged on to the system. The results are returned in the properties of the IsUserLoggedOn1 node.

Here is the complete list of properties:

Figure 11: Properties of the IsUserLoggedOn node contain the results of its execution

The names of all the properties of this node are suggestive of the data returned by each one of them. The properties of all the nodes of the workflow are accessible to all other nodes in that workflow. So, now that we have this data available inside the workflow, we can process it in any way we wish.
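The decision the activity makes can be modeled as a pure function over the data it gathers. In this hypothetical sketch, the session lookup and group-membership query are replaced by plain inputs, and only the output property names are taken from the article and the execution log below. Because that log also fills in the LastLoggedOn properties while a user is logged on, the sketch always sets them.

```python
from typing import Optional


def is_user_logged_on(current_user: Optional[tuple[str, str]],
                      last_user: tuple[str, str],
                      local_admins: set[str]) -> dict[str, object]:
    """Hypothetical model of the IsUserLoggedOn activity's output properties.

    Users are (domain, name) pairs; local_admins holds "DOMAIN\\name" strings.
    """
    props: dict[str, object] = {
        "IsUserLoggedOn": current_user is not None,
        "LastLoggedOnUserDomain": last_user[0],
        "LastLoggedOnUserName": last_user[1],
        "IsLoggedOnUserAnAdmin": False,
    }
    if current_user is not None:
        domain, name = current_user
        props["LoggedOnUserDomain"] = domain
        props["LoggedOnUserName"] = name
        admins = {a.lower() for a in local_admins}   # Windows account names are case-insensitive
        props["IsLoggedOnUserAnAdmin"] = f"{domain}\\{name}".lower() in admins
    return props
```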

Step 8: Conditional branching based on results

Next we’ll drag and drop an If node from the palette to the canvas, which will allow us to check the value of the IsLoggedOnUserAnAdmin property of the IsUserLoggedOn1 node, and branch differently depending on whether it is set to true or false:

Figure 12: Adding an If node to the workflow for conditional branching

You’ll notice above that dragging and dropping an If node actually added three nodes to the workflow: If1, True1, False1. The If1 node will contain the condition based on which branching will take place, True1 is the branch that will execute if the condition is true, and False1 is the branch that will execute if the condition is false.

Let us set the condition in the Condition property of the If1 node:

Figure 13: Setting the Condition property of the If1 node

The condition we have added simply checks if the IsLoggedOnUserAnAdmin property of the IsUserLoggedOn1 node is set to true or not. If it is set to true, then the True1 branch will execute. If it is set to false, then the False1 branch will execute.
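Because every node can read every other node’s properties, a workflow can be modeled as a shared store keyed by “Node.Property”. Here is a minimal sketch of the If/True/False construct over such a store. The store layout and function names are assumptions, and the condition is written as a Python predicate because the designer’s own expression syntax is not reproduced here.

```python
from typing import Callable

Properties = dict[str, object]  # "Node.Property" -> value, visible to every node


def run_if(condition: Callable[[Properties], bool],
           on_true: Callable[[Properties], None],
           on_false: Callable[[Properties], None],
           props: Properties) -> str:
    """Evaluate If1's condition, run exactly one branch, and return the branch name taken."""
    if condition(props):
        on_true(props)
        return "True1"
    on_false(props)
    return "False1"


def user_is_admin(props: Properties) -> bool:
    """The If1 condition: is IsUserLoggedOn1.IsLoggedOnUserAnAdmin true?"""
    return props.get("IsUserLoggedOn1.IsLoggedOnUserAnAdmin") is True
```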

Step 9: Failing the workflow if the user is an admin

Now we’re almost done. We know that the True1 node will execute only if a user is logged on as a local admin. All we need to do inside the True1 node is fail the workflow and return the user’s credentials, and that’s it.

We saw earlier that the LoggedOnUserName and LoggedOnUserDomain properties of the IsUserLoggedOn1 node contain the user’s credentials. We can use these properties to report the credentials back to the SCCM Site, by setting the ResultText property of the End1 node from them, as shown below:

Figure 14: Setting the workflow’s return values from the True1 node

This is the actual expression we provide in the ResultText property of the End1 node:

Figure 15: Returning the logged on user’s name and domain name to the SCCM Site

For good measure, we’ll also add an Exit node as a child of the True1 node, to emphasize that the workflow terminates immediately once the True1 branch has reported the failure.

Figure 16: Adding an Exit node under the True1 node – the workflow is complete
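In terms of the shared property store, the True1 branch does three assignments on End1 and then exits. The sketch below reproduces the ResultText format and the error code 1283 that appear in the execution log below; the function and exception names are hypothetical, and the meaning of 1283 is taken only from that log.

```python
class ExitWorkflow(Exception):
    """Raised by an Exit node (Exit2 here) to end the workflow immediately."""


LOCAL_ADMIN_ERROR_CODE = 1283  # the value reported in the execution log below


def true_branch(props: dict[str, object]) -> None:
    """True1: fail the health check, report the admin's credentials, then exit."""
    domain = props["IsUserLoggedOn1.LoggedOnUserDomain"]
    user = props["IsUserLoggedOn1.LoggedOnUserName"]
    props["End1.ResultBoolean"] = False
    props["End1.ResultText"] = f"Local admin: {domain}\\{user}"
    props["End1.ResultWholeNumber"] = LOCAL_ADMIN_ERROR_CODE
    raise ExitWorkflow()  # Exit2
```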

Step 10: Finally, the analysis

The workflow is now complete. Let us analyze how execution flows when a local admin is logged on.

The Start1 node executes, followed by Try1 and MainTry1, which simply pass control to the next node. Then the IsUserLoggedOn1 node executes; it performs the actual check and stores the results in its output properties. After this, the If1 node executes and performs conditional branching. If the user is an admin, the True1 node executes: it sets the properties of the End1 node to indicate that the health check failed, and reports back an error code along with the user name and domain name of the logged on user. Then the Exit2 node executes and immediately terminates the workflow.

I’ve enclosed below the execution log for the above workflow. It shows how the properties of the workflow’s nodes change and captures the flow of execution control.

PropertyValueChange: Start1.WorkflowInstanceId, WHOLE NUMBER, Old: None, New: 196
PropertyValueChange: Start1.LaunchEventType, TEXT, Old: None, New: AdaptivaNotification
PropertyValueChange: Start1.LaunchNotificationType, TEXT, Old: None, New: ExecuteHealthCheck
PropertyValueChange: Start1.LaunchNotificationQualifier, TEXT, Old: None, New: 44
PropertyValueChange: Start1.LaunchNotificationObject, JAVA OBJECT, Old: None, New: None
ExecutionNode: Starting: Start1
ExecutionNode: Starting: Start1.Try1
ExecutionNode: Starting: Start1.Try1.MainTry1
ExecutionNode: Starting: Start1.Try1.MainTry1.IsUserLoggedOn1
PropertyValueChange: IsUserLoggedOn1.IsUserLoggedOn, BOOLEAN, Old: None, New: true
PropertyValueChange: IsUserLoggedOn1.LoggedOnUserName, TEXT, Old: None, New: Administrator
PropertyValueChange: IsUserLoggedOn1.LoggedOnUserDomain, TEXT, Old: None, New: SMS-COMP
PropertyValueChange: IsUserLoggedOn1.LastLoggedOnUserName, TEXT, Old: None, New: Administrator
PropertyValueChange: IsUserLoggedOn1.LastLoggedOnUserDomain, TEXT, Old: None, New: SMS-COMP
PropertyValueChange: IsUserLoggedOn1.IsLoggedOnUserAnAdmin, BOOLEAN, Old: None, New: true
ExecutionNode: Ended: Start1.Try1.MainTry1.IsUserLoggedOn1
ExecutionNode: Starting: Start1.Try1.MainTry1.If1
PropertyValueChange: End1.ResultBoolean, BOOLEAN, Old: true, New: false
PropertyValueChange: End1.ResultText, TEXT, Old: None, New: Local admin: SMS-COMP\Administrator
PropertyValueChange: End1.ResultWholeNumber, WHOLE NUMBER, Old: None, New: 1283
ExecutionNode: Starting: Start1.Try1.MainTry1.If1.True1
ExecutionNode: Starting: Start1.Try1.MainTry1.If1.True1.Exit2
ExecutionNode: Ended: Start1.Try1.MainTry1.If1.True1.Exit2
ExecutionNode: Ended: Start1.Try1.MainTry1.If1.True1
ExecutionNode: Ended: Start1.Try1.MainTry1.If1
ExecutionNode: Ended: Start1.Try1.MainTry1
ExecutionNode: Ended: Start1.Try1
ExecutionNode: Ended: Start1
Terminated By[:Terminate Activity]
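The nesting of the ExecutionNode lines above follows from a simple rule: each node logs “Starting”, runs its children depth-first, and logs “Ended” even when an Exit unwinds the stack. Here is a hypothetical Python interpreter that reproduces those 15 lines (the PropertyValueChange lines are not modeled). It includes only the branches that actually ran, so Catch1 and False1 are omitted, and it assumes End1 sits after Try1 under Start1, a placement the log itself does not show.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


class ExitWorkflow(Exception):
    """Raised by an Exit node; unwinds through every enclosing node."""


@dataclass
class Node:
    name: str
    action: Optional[Callable[[], None]] = None          # work done when the node runs
    children: list["Node"] = field(default_factory=list)


def run(node: Node, log: list[str], parent: str = "") -> None:
    """Depth-first execution that logs Starting/Ended lines with dotted node paths."""
    path = f"{parent}.{node.name}" if parent else node.name
    log.append(f"ExecutionNode: Starting: {path}")
    try:
        if node.action is not None:
            node.action()
        for child in node.children:
            run(child, log, path)
    finally:
        # Logged even while an Exit unwinds the stack, as in the trace above.
        log.append(f"ExecutionNode: Ended: {path}")


def execute(root: Node) -> list[str]:
    log: list[str] = []
    try:
        run(root, log)
    except ExitWorkflow:
        log.append("Terminated By[:Terminate Activity]")
    return log


def exit_node() -> None:
    raise ExitWorkflow()


workflow = Node("Start1", children=[
    Node("Try1", children=[
        Node("MainTry1", children=[
            Node("IsUserLoggedOn1"),
            Node("If1", children=[
                Node("True1", children=[Node("Exit2", action=exit_node)]),
            ]),
        ]),
    ]),
    Node("End1"),  # never starts, because Exit2 terminates the workflow first
])

for line in execute(workflow):
    print(line)
```

Using a finally block for the “Ended” lines is what makes every enclosing node close out cleanly after Exit2, while End1 never starts.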

You can watch a high resolution video of the Adaptiva Client Health product here: http://adaptiva.com/videos.html