Tuesday, November 28, 2006

Merging and resolving conflicts using API

Recently, there were several questions on the MSDN forums about how to perform a merge and resolve conflicts using the Version Control object model API. Since the online help for that API is rather scarce, I thought I'd share the merge-related gotchas I have accumulated over time.
I will have a look at a fairly typical merge automation scenario (where all changes are merged with a uniform conflict resolution algorithm).

The first thing to do is to call the Merge method of the Workspace class:


    // _serverVc is a VersionControlServer instance obtained beforehand
    Workspace workspace = _serverVc.GetWorkspace("WORKSTATION",
                "DOMAIN\\user");
    GetStatus status = workspace.Merge("$/Project/Ongoing/Solution",
                        "$/Project/Branches/Solution",
                        null,
                        null,
                        LockLevel.None,
                        RecursionType.Full,
                        MergeOptions.None);

Let’s have a look at Merge and its parameters (I will talk about the more complex overload; the other version just takes default values for the unexposed parameters):

    public GetStatus Merge (string sourcePath,
                        string targetPath,
                        VersionSpec versionFrom,
                        VersionSpec versionTo,
                        LockLevel lockLevel,
                        RecursionType recursion,
                        MergeOptions mergeOptions)

The first two parameters specify the source path and the target path for the merge. These may be either server or local paths (the target path must be mapped in the workspace prior to the merge).
The “versionFrom” and “versionTo” parameters may be used to specify the range of versions to merge from. One may use them in several different ways. By default (null for the first parameter, null or the latest version for the second), all unmerged changes in the source will be merged. If the same changeset is specified in both parameters, only the changes in that changeset will be merged (a selective merge). To merge changes up to a specific changeset, the first parameter may be null and the second the version up to which the merge is required.
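
For example, a selective merge of a single changeset might look like the sketch below (changeset 123 is a hypothetical number):

    // Merge only the changes committed in changeset 123 (selective merge)
    VersionSpec changesetSpec = new ChangesetVersionSpec(123);
    GetStatus selectiveStatus = workspace.Merge("$/Project/Ongoing/Solution",
                        "$/Project/Branches/Solution",
                        changesetSpec,
                        changesetSpec,
                        LockLevel.None,
                        RecursionType.Full,
                        MergeOptions.None);
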
The “lockLevel” parameter specifies what lock will be placed on the pended changes (LockLevel.None will usually do – I cannot at the moment think of a scenario where you would want another lock).
The “recursion” parameter specifies the level of recursion to use on the source and target paths (with folders one will usually use RecursionType.Full).
The “mergeOptions” parameter is the one that defines the initial merge behavior. It may have the following values (see the MergeOptions enum):

  • None – no special options (same as using the Merge wizard in the VS UI)

  • AlwaysAcceptMine – discard any changes in the source and just update the merge history (same as the discard option of the tf.exe merge command, not available in the VS UI). Essentially, this option says to resolve all conflicts using Resolution.AcceptYours (more on that below)

  • ForceMerge – do not look at the merge history and perform the merge for the specified range of versions from the source as if no merges had been performed (same as the force option of the tf.exe merge command, not available in the VS UI). When this option is specified, the “versionFrom” and “versionTo” parameters must be set in the call to Merge

  • Baseless – perform a baseless merge (when the source and target items have no branch relationship between them)

  • NoMerge – do not perform the actual merge (same as the preview option of the tf.exe merge command, not available in the VS UI)

All options except NoMerge are mutually exclusive.

After all parameter values are specified and Merge is invoked, the next thing to do is to look at the returned value (of type GetStatus). Personally, I dislike it very much, as it provides information in a rather incomprehensible way – each field in the return value, as well as their combinations, tells you what happened during the merge.

The possibilities are (I learned these by trial and error, so there are probably other interesting combinations I have not encountered; a sketch of acting on them follows the list):

  • NoActionNeeded == true && NumOperations == 0  – means that no changes in source needed to be merged, so no actual changes were pended

  • NoActionNeeded == false && NumOperations > 0 && HaveResolvableWarnings == false  – means that merges were performed and all conflicts were resolved automatically. You only need to check in the pended merge changes and that’s about it

  • NoActionNeeded == false && NumConflicts > 0  – merge was performed and there are conflicts to resolve
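
Based on those combinations, here is a minimal sketch of how one might act on the returned value (the properties are those of GetStatus; the control flow is my own):

    if (status.NoActionNeeded && status.NumOperations == 0)
    {
        // Nothing to merge - no changes were pended
    }
    else if (status.NumConflicts > 0)
    {
        // Merge was performed; conflicts must be resolved (see below)
    }
    else if (status.NumOperations > 0 && !status.HaveResolvableWarnings)
    {
        // Merge changes were pended and auto-resolved - just check them in
    }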

The first two cases are obvious. In the last case there are conflicts and resolution is required. I will talk only about simple conflicts (content changes) and not about rename/delete changes (those are rather complicated, and I will leave them to the MS guys with access to the source code; besides, I doubt there is much merit in automating merge or conflict resolution for delete/rename changes).

Let’s try to implement a conflict resolution algorithm similar to the manual merge in Visual Studio.

First, one needs to retrieve the list of conflicts:

    Conflict[] conflicts = workspace.QueryConflicts(
        new string[] { "$/Project/Branches/Solution" }, true);

The QueryConflicts method is pretty self-explanatory – it returns all conflicts on the specified paths in the workspace, with the last parameter specifying whether the query should be recursive.
Now it is possible to iterate over the conflicts and resolve them one by one:

    foreach (Conflict conflict in conflicts)
    {
        if (workspace.MergeContent(conflict, true))
        {
            conflict.Resolution = Resolution.AcceptMerge;
            workspace.ResolveConflict(conflict);
        }
    }

The code above calls the MergeContent method for each conflict, which invokes the visual merge tool. After the user has performed the visual merge (indicated by MergeContent returning true), the conflict is ready for resolution.
To resolve a conflict, the Resolution property of the conflict is set according to the desired resolution (see the Resolution enum). The possible values are (I discuss only the options relevant to simple merge scenarios):

  • AcceptYours – local version is to be used for merge

  • AcceptTheirs – server version is to be used for merge

  • AcceptMerge – resolve conflict by doing manual merge

Now, in my code I use AcceptMerge, as it is expected that the user will create the merged version locally using the merge tool.
After the resolution is set, the ResolveConflict method is called to signal that the conflict is resolved (if resolution succeeded, the IsResolved property of the conflict will now return true). By the way, while we are on the subject of conflict properties, another useful one is IsNamespaceConflict; it lets you know whether the conflict is in the file content or in the version control namespace (rename etc.)
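
For instance, sticking to the content-only scope declared above, one might filter on that property (a sketch; the skipping policy is mine):

    // Handle only content conflicts; skip namespace (rename/delete) ones
    foreach (Conflict conflict in conflicts)
    {
        if (conflict.IsNamespaceConflict)
            continue;
        // ... resolve the content conflict as shown above ...
    }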
Surprisingly, the AcceptMerge resolution option goes the extra mile for you and does something similar to the code snippet below:

    if (conflict.IsResolved)
    {
        workspace.PendEdit(conflict.TargetLocalItem);
        File.Copy(conflict.MergedFileName, conflict.TargetLocalItem,
            true);
    }

After the resolution, you have an edit pending on the file, and the merged file is your local copy.

Once all conflicts are resolved, it is possible to check in the changed files and thus complete the merge.
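
A minimal sketch of that final step (the check-in comment is, of course, arbitrary):

    // Check in everything pended in the workspace to complete the merge
    PendingChange[] changes = workspace.GetPendingChanges();
    if (changes.Length > 0)
    {
        workspace.CheckIn(changes, "Merge from $/Project/Ongoing/Solution");
    }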

But conflict resolution raises several additional issues (even without taking renames and deletes into account). For example, if during conflict resolution one specifies that the source version should be taken, that essentially means the local file after resolution must be identical to the source version. It turns out that ResolveConflict will handle those situations for you: for example, after resolution with AcceptTheirs you will have the source version in your workspace without doing anything extra.
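
So an automated “always take the source version” policy could be as simple as the sketch below:

    // Resolve every conflict by taking the source (server) version;
    // ResolveConflict brings the source version into the workspace
    foreach (Conflict conflict in conflicts)
    {
        conflict.Resolution = Resolution.AcceptTheirs;
        workspace.ResolveConflict(conflict);
    }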

Obviously, the same conflict resolution steps may be used for conflicts that occur on check in (though I am not sure I see ready automation scenarios there).

In conclusion, I would not recommend using the Version Control API for merges and conflict resolution, but rather suggest sticking to the tf.exe command line client for advanced scenarios. While the thing is doable, you should be prepared to spend quite an amount of quality time on it, and be prepared later to fix bugs (mostly related to the myriad scenarios and border cases you did not think of).

Please take the examples above with a grain of salt; if you find errors/omissions do let me know so I can keep it updated.

Friday, November 24, 2006

(Not) getting latest on check out - a bug?

Did you know that TFS will not automatically get the latest version on check out? And what do you think about that?

In all probability, you know about that particular feature and roll with it (if you use TFS, that is). But the question that still appears to be actively discussed is whether it is such a big deal that “get latest version” is not done automatically on check out. When I started using TFS, my first impulse was to consider it somewhat problematic (coming from a VSS background). But after working with TFS for a year, and being now somewhat older (and maybe even wiser), I do not feel that way anymore, and even consider it an advantage of TFS Version Control over Visual SourceSafe. But as it happens, there is an opposite point of view; it was disagreement with it that moved me to write this post.

So let us start with a short preamble. If you used Visual SourceSafe for version control, you used it with exclusive check out only (yes, I know it allows checking out files concurrently, but I have never heard a success story about using VSS in that manner – while hearing lots of stories to the contrary). When you perform a check out using VSS, it conveniently retrieves the latest version for you and makes it writable.

Enter TFS Version Control. By default, check out is performed concurrently. When a check out is performed, the local file is made writable (no version is retrieved from the server).

My conclusion at that point would be: “Wow, Team Foundation Server is not the same as Visual SourceSafe and uses a different source control model, so we need to learn something about it and maybe even adjust our practices!” That should have been a no-brainer, don’t you think? But strangely enough, people tend to overlook that point from the very beginning and try to use TFS VC as the next version of VSS.
At that stage, the typical issues that arise are:


  • TFS won’t get latest version for me before check out

  • TFS will perform concurrent check out by default

  • TFS by default will not place any lock on checked out file

Can we work around these issues to make TFS exactly like SourceSafe? Unfortunately no. Is TFS any worse for that? Absolutely not. And I am going to prove that!

I am going to take a real life example. Before I got my hands dirty with TFS, I participated in the development of a largish application as part of a team of around 30 developers, using VS 2005 and VSS for source control. Let’s have a look at a typical check out and what can happen (and did happen) afterwards.

  1. The file I checked out was not modified on the server. Then essentially I have the latest version on my workstation, and simply making it writable would suffice. As nothing has changed, I shall be able to compile my project without problems.
  2. The file I checked out was modified by someone else and checked in, so my version is outdated. Get latest retrieves the newer version, but luckily the changes are such that my project still compiles.
  3. The file I checked out was modified by someone else and checked in, so my version is outdated. Get latest retrieves the newer version, but now the changes are such that my project does not compile (for example, a method signature in that file was changed, and that very method is used throughout other project files).

As one may see, only the third case creates a problem. Now, I bet that when you checked out that file you were not going to integrate the changes made to that file on the server into your project. My guess would be that you are implementing some feature, and implementing it requires modifying the file, so you checked it out. And now you cannot compile your project.
So instead of coding you are integrating changes. If the project is large, you probably will not perform “Get latest version” on the whole project recursively (in VSS it would take eons, and you are in the middle of development!). What you do instead is try to handle the files one by one – let’s get the latest version of the files that break my build! Surely that will help! OK, you do that. And it turns out that the latest version of some other file breaks your build in some other place. That’s called a chain reaction! At that point you have two choices – either perform “Get latest version” file after file until the project compiles, or start that recursive “Get latest version” beast and go pour some coffee (I assume that beating the crap out of the guy who broke your build is not a valid alternative :).
That’s how it goes as far as VSS is concerned – out of three cases, two work nicely and one is a major pain. I often worked over VPN (with a full recursive “Get latest version” on the project taking lots of time), so I was full of apprehension every time I checked out a “popular” file.
I can understand why the development guys at Microsoft wanted to help users out in that problematic case. The TFS solution is elegant, easy to understand, and supports concurrent development at that! But it appears it is never a good idea to take away freedom of choice (even if it means preventing people from shooting themselves in the foot). Anyway, here goes the TFS solution:

  1. The file I checked out was not modified on the server. Then essentially I have the latest version on my workstation, and TFS makes it writable. My project compiles as it did before check out, and there will be no problem checking in, as I am the only one who changed the file.
  2. The file I checked out was modified by someone else and checked in, so my version is outdated (but the changes are such that they do not affect other files). The local file is made writable, my project compiles as it did before check out, and all is well until check in. On check in, a conflict will occur for me to resolve (more on that later).
  3. The file I checked out was modified by someone else and checked in, so my version is outdated (and the changes are such that they do affect other files). The local file is made writable, my project compiles as it did before check out, and all is well until check in. On check in, a conflict will occur for me to resolve (more on that later).

In TFS, two cases create a problem for me instead of one in VSS! What’s happening here? We have paid some serious money for that souped-up VSS and it cannot even check in files, huh?
In fact, several things happened, not all of them obvious.
First, what you get is a boost in immediate productivity – the developer is allowed to develop (assuming one checks out in order to make new changes) without interruption, and integration is delayed until development is complete.

Second, there is overhead for conflict resolution on check in. This part is tricky, and I am afraid here it will be my personal opinion vs. yours. But as it is my blog, I am not afraid, so I can state plainly that the overhead depends on the quality of your code, the quality of your development tasks, and your engineering people.
If the code is modular and each engineer performs a well-defined task, in all probability the conflict will be resolved automatically – that is, the new version of the file will have changes that do not overlap with the changes made by the other developer (in agile shops developers may more often perform code-breaking tasks; but in agile settings effective communication is key, so it should be relatively easy to handle merge conflicts effectively and in real time).

But in the real world we have code-breaking changes, and code does overlap! Does TFS do a better job then, by highlighting those conflicts after the fact (as compared with VSS, which by breaking your build signals you before the fact)? In my opinion, the TFS VC approach is indeed better, and here is why. You check in your changes, the conflict cannot be resolved automatically, and you have that three-way merge window to stare at. At that point, you are either qualified to perform the merge or not. How can you be not qualified? For example, if the code you have written does a different thing than the same lines of code in the server version of the file. But wait a minute, that’s a sign of a different problem! You have been doing your job in parallel with someone else, and at that point the problem surfaces in TFS, while VSS would be hiding it!

I am well aware that my reasoning is not perfect, but overall I believe that adoption of TFS will lead to a significant productivity gain over VSS, even though it may require some changes in work habits. Would you like to do concurrent development? If so, how well would your VSS-centered model fare? Your new practices should answer those questions as well, maybe even before you start thinking about how that get-latest stuff will affect your development.

To conclude: while the get-latest thing may seem a deficiency, and surely the user should have a choice (as one apparently will in TFS v2), I do not view it as a showstopper, and I strongly believe that a client may be made aware of TFS advantages over VSS using that very feature (or the absence of it – depending on how you look at it). It seems that Microsoft somewhat underestimated the impact VSS has had on development practices over the years; but as VSS addicts get more hands-on experience with TFS, I do hope that VSS-only work patterns will fade away.

And to back up this argument, some links from the MS development team on that very subject:
Buck Hodges’ blog post (read the comments as well)
Adam Singer’s blog post (an excellent read, but be prepared – it is longer than this one)

I tried to be as concise as possible (unless the Thanksgiving dinner somehow got in the way :), but please drop me a line to let me know what you think and where I might have been wrong.

Tuesday, November 14, 2006

Creating custom tasks for Team Build

When you create a task for Team Build, the usual rules for custom MSBuild tasks apply, but there are a few differences.

Probably, when you create a custom task for Team Build, you will want to use the TFS object model to access version control. For that, you will first need to establish a connection to the TFS server.

The most obvious approach is to implement the task with a server name parameter and establish the connection using that parameter:


using Microsoft.Build.Framework;
using Microsoft.Build.Utilities;
using Microsoft.TeamFoundation.Client;

public class SimpleTfsTask : Task
{
    private string _serverName;

    [Required]
    public string ServerName
    {
        get { return _serverName; }
        set { _serverName = value; }
    }

    public override bool Execute()
    {
        TeamFoundationServer server =
            TeamFoundationServerFactory.GetServer(_serverName);
        // ... use the server here ...
        return true;
    }
}



Now, the downside of this approach is obvious – the server name has to be either hard-coded inside the build script or, somewhat better, supplied as an external parameter of the build.

But wait a minute – how come most of the predefined Team Build tasks (for example, those in the Microsoft.TeamFoundation.Build.Tasks.VersionControl assembly) do not have that parameter? Where is the Team Foundation Server URL retrieved from?

The solution is rather simple – Team Build tasks that do not have a server parameter must be used in the context of a workspace (that is, the build workspace is supposed to exist before those tasks are called). Tasks that cannot have such a context (for example, CreateWorkspaceTask) have a TeamFoundationServerUrl parameter.

Let's have a look at how to get the Team Foundation Server URL given a valid workspace.
The code below first gets the workspace in which the specified local path is mapped (not the Workspace class, but rather WorkspaceInfo, which uses the local cache), and then retrieves the Team Foundation server data from the workspace properties.

using Microsoft.Build.Framework;
using Microsoft.Build.Utilities;
using Microsoft.TeamFoundation.Client;
using Microsoft.TeamFoundation.VersionControl.Client;

public class AnotherTfsTask : Task
{
    private string _localPath;

    [Required]
    public string LocalPath
    {
        get { return _localPath; }
        set { _localPath = value; }
    }

    public override bool Execute()
    {
        // Find the cached workspace information for the mapped local path
        WorkspaceInfo workspace =
            Workstation.Current.GetLocalWorkspaceInfo(_localPath);
        TeamFoundationServer server =
            TeamFoundationServerFactory.GetServer(workspace.ServerUri.AbsoluteUri);
        // ... use the server here ...
        return true;
    }
}



An obvious advantage of the second approach is that whenever you have a workspace (retrieved either by path or by workspace name), you can always create a Team Foundation server instance to access version control or its other services.

In conclusion, I would like to note that it is wise to use TeamFoundationServerFactory rather than the constructor to create TeamFoundationServer instances, as the factory approach uses caching; given that a single build script usually connects to only one TFS server, that may result in a noticeable performance boost.
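
To illustrate the difference (the server URL here is hypothetical):

// Cached: repeated calls with the same URL return the same instance
TeamFoundationServer cached =
    TeamFoundationServerFactory.GetServer("http://tfsserver:8080");

// Not cached: every call constructs (and authenticates) a new instance
TeamFoundationServer fresh =
    new TeamFoundationServer("http://tfsserver:8080");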

Wednesday, November 08, 2006

TFS folder items and history

Recently, Richard Berg wrote a very concise post on what items are in TFS version control. In the post he states "TFS rarely makes a distinction between files and folders". While that is surely true at a high level, there are still some important differences in the small implementation details.

The one difference I'd like to address is history representation. The history displayed for a folder in the History tool window in Visual Studio contains both the changes performed on the folder itself and the changes to any files/folders contained in it. For example, when a folder is renamed, the change will appear in its history together with the change resulting from a new file being added to the folder.

To make things more interesting for the user, the History window for a folder does not indicate whether a change affected the folder itself or only the files in it (in History Sidekick we tried to display that information – history entries that include folder changes are marked as such). Also, folder history does not allow version comparison by simply selecting two changesets in the list (though you may compare folder versions using Tree Diff from the TFS Power Toys).

But other than that, files and folders do behave alike, and I hope that in the next TFS version folder history will be much more similar to file history.

On a related note, I would highly recommend watching Richard Berg's blog as he unveils the mysteries of TFS Version Control in his upcoming posts (two posts so far, so it's high time to jump on the bandwagon).

Sunday, November 05, 2006

Changing work item types affects source control

You might ask how changing a work item type would affect source control. My first answer would be: it will not really change anything, as the only connection between source control and work item tracking is through the association of changesets or files with work items. But it turned out not to be that simple.

Let's say that you have introduced a new validation rule as a result of a work item type change, and some field that was optional became mandatory. If there were existing items at the time of the change, that means that upon changing and saving any of these items you will have to specify a value for the field that became mandatory. But how is that related to source control?

Here goes the scenario: you are checking in some files and want to associate them with work items in the "Pending Changes" window, and the work items have data that became invalid as a result of the work item type conversion. The files to check in are selected, the work items to associate with are selected, you hit Check In – but it's a no go! The following message is displayed:



And if you think about it some more, it really makes sense, since the default check-in action for a changeset-to-work-item association is "Resolve", and that requires a work item modification.

But what about changing that check-in action? Let's change it to "Associate" and see if check in fares any better. No luck here (and I would say the result is even worse) – the files are checked in and the changeset is created, but the association to the work items is not, as seen on the following screenshot:



In that case, I am not so sure that the exhibited behavior is the expected one. I am creating an association between a work item and some other artifact – should full validation be performed? And why is the validation performed after the check in?

To conclude, it would be wise to handle data conflicts in existing work items as part of the work item type modification. When you have hundreds of work items and modify the type, you do not want all your developers to fill in the missing data – that is clearly part of the conversion (and I am not even talking about breaking changes; making several fields mandatory can seriously affect productivity).