These are the days of cheap storage, but even cheap storage can run out. Team Foundation Server keeps its artifacts in multiple databases, and those may use up your storage faster than you expected (if you want to know what to expect, refer to this classic post by Buck Hodges on database size calculations).
If that happens, the most probable culprit is the version control database (TfsVersionControl), in other words all the files that people check in to version control. File size matters because TFS stores only the differences for each new revision of a “small” file, whereas every new revision of a “large” file is stored as a full copy (by default TFS considers a file large if it is over 16 MB; read more on that topic in my previous post).
There are several ways to make sure that your users do not fill up version control with memory dumps, images of installation CDs and the like. Mind you, I am not saying that large files do not belong in version control; I am saying that adding a large file should be a) a conscious step and b) “revisionless” (i.e. with no versioning).
Myself, I have always been ambivalent about storing large binary thingies in source control: on the one hand, you get all content in one place (which is mighty convenient for builds etc.); on the other hand, many users will probably check in content that does not belong in source control. So here is my hit list of measures for dealing with large files in version control:
- Educate your users: make sure the average user understands that a DVD ISO added to version control ends up being transmitted to and stored in the database; perhaps what the user is looking for is a file server, not version control.
- Make users aware of their actions: it is possible to write a check-in policy that alerts the user at check-in time that the files being checked in are large and perhaps should not be in version control. And even if the user decides to override the policy, you can run a report on policy overrides.
- Monitor your storage: if both the high-level prevention (education) and the low-level prevention (check-in policy) fail, you can query the database to identify the offending files. The query below (with the usual caveats: it is provided AS IS etc.) gives you a list of large files in the database (it considers only the latest version of each file, not the combined size of all its versions):
DECLARE @LargeFile int;
-- return files larger than 16 MB
SET @LargeFile = 16 * 1024 * 1024;
USE TfsVersionControl; -- use source control DB
SELECT -- item path
Versions.ParentPath + Versions.ChildItem AS ItemPath,
-- size of latest version in DB
Files.CompressedLength AS DatabaseSize,
-- size of original file
Files.FileLength AS [Size],
-- whether item deleted
CASE WHEN Versions.DeletionId = 0 THEN 0
ELSE 1 END AS Deleted
FROM tbl_File Files, tbl_Version Versions
WHERE -- get item latest version
Versions.VersionTo = 2147483647
-- join to table with sizes
AND Versions.FileId = Files.FileId
-- return only files whose stored (compressed) size exceeds the threshold
AND Files.CompressedLength > @LargeFile
ORDER BY ItemPath;
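The check-in policy idea from the list above can also be approximated on the client side, before anything reaches the server. A real check-in policy is written against the TFS client API and is not shown here; what follows is only a minimal sketch in Python, assuming all you want is a list of files in a local workspace folder that exceed the same 16 MB threshold. The function name `find_large_files` and its parameters are made up for illustration and are not part of TFS.

```python
import os

# 16 MB: the size above which, per this post, TFS keeps a full copy of every revision
LARGE_FILE_BYTES = 16 * 1024 * 1024


def find_large_files(root, threshold=LARGE_FILE_BYTES):
    """Return (path, size) pairs for files under root larger than threshold.

    Hypothetical helper for illustration only; it inspects the local disk
    and knows nothing about TFS or pending changes.
    """
    large = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                # file disappeared or cannot be read; skip it
                continue
            if size > threshold:
                large.append((path, size))
    # biggest offenders first
    large.sort(key=lambda item: item[1], reverse=True)
    return large
```

Calling `find_large_files` on a workspace folder returns the large files with the biggest first; whether they belong in version control is still a decision for a human.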
I would be happy to hear your horror stories from applying the above query; mine was nothing more than a bunch of ISO images checked in :)
Thanks go to Chandru Ramakrishnan for reviewing the query.