About myself and everyone else.
Object version checker, the problem.
The particular design makes use of a primary Database(tier 1) for point of entry for the data. A subset of the data on this database is replicated using transactional replication to a destination called replicated database (tier 2). Based on business logic and a set time some of the data on the tier 1 database is moved to the archive (tier 1.1) . The archive is in turn also replicated to the tier 2 database. The tier 3 database transforms the data to a OLAP schema via triggers and store procedures and is the replicated to the final stage where the data is queried. This is called the (tier4) database. Each client have 8 databases. Tier 1,Tier 1.1 Archive, Distributors for tier 1 and 1.1 and distributor for tier 3, there is a Tier 2 database, Tier 4 and final data store where the front end resides on. All the databases are SQL server databases. The tier 1 and tier 1.1 database is used by the core department responsible for applying patches the those specific databases release by the core development department. They would use their own system for versioning. 3rd Party application developers will also release their software to the core department to be rolled out. Specialised development departments i.e. product or campaign centric will also be developing patches that are applied to this database (tier 1), an example of this external APIs to replace internal core functions .
The tier 3 development department develops patches and functionality to be applied to the tier 1 – 4 databases. Because it’s an independent department and developers they also use their own versioning system as their release cycle is different. 3rd Parties can also develop patches and functionality for these databases. The customer can also develop objects for the tier 4 database they may have a different versioning system or none at all.
Patches and new software releases between the departments aren’t synchronized, because they have independent goals and timelines. Each release has a beta period. Once the beta is finished the release is applied to all the systems.
The problem with versioning.
1 Different releases target different tables and with same database.
2 Different departments (Physical in different locations geographically).
3 No standards for code (3rd Parties mostly).
4 Unique goals on department level.
5 Individual timelines for departments.
6 Individual timelines for urgent releases and fixes.
7 All departments use the same testing team. Thus a sequential work process as opposed to parallel.
8 Rolling window of change – Because of a BETA period for a release one system could be experiencing issues of a BETA release. This would have a knock on effect of all other release from all other software developers. The issue would be fed back to dev for a fix but the release would continue to be deployed to systems unless it’s a major issue. Once the release is rolled out the system will immediately have the pending hotfix applied. In summary on an overall level very few systems are on the same version for a long duration.
9 Customer functionality and their own development. – Some systems will have only have a major release cycle , this could be because of request by the customer or because of different platforms used for the core module (tier 1). For instance, They would have their own APIs for expressing application behaviour. This means the development would only have major releases as opposed to major, minor and hotfixes.
I couldn’t develop a solution based on the developer’s versioning system as it is not consistent. I decided to make use to the system objects and meta-data for the systems in order to build a version catalogue. This version catalogue would run on system tables because there is no lower level available to describe an object in SQL. This same meta-data is used by the SQL engine and is extremely reliable and consistent across any major version of SQL server. System tables is SQL are also very efficiently designed because of the SQL engine using them for query plans and other internal processes.
Meta-data version construction
The meta data available in SQL server is very normalized for optimal performance. In order to derive a version of the data I used a name value lookup system combined with OTL (one true lookup), this is also used by windows registry. It had a key defining the object and it has an associated name and value within which delivers the property and value. I also combined this general logging system with an additional feature called the parent-child. In SQL server many items can be derived as parent child. As example; if a table is a parent and the type of trigger, the trigger name would be the child and the contents of the trigger would be its value. In many cases the content of the trigger would be textual. Text is not great for versioning and an internal SQL function is used in order to generate a checksum of the binary data. This would generate a unique number associated with the specific trigger and parent child relationship. Another example; when a table is parent and the type is column, each column name within the table is a child and the data type and data length would be the value.
With the end goal of the data being reported on and presented visually the concept of a hierarchy can be used to a massive advantage where a database version can be rolled up into a single common parent to all versioned objects. As example; When the hierarchy of all the parents and child is joined to create a hierarchy you will have a server as the top-tier, all the databases within it and within the database the relevant objects with their relevant versions. This visual presentation also look very similar to the tool SQL Server Management Studio which ships with SQL. Thus support engineers feels familiar with the visual appearance.
Pro’s of system tables.
a. Data always in a consistent state.
a. Always online
3- Homogenous schema.
a. Any major SQL Server version will have the same objects and column names.
a. Used internally by SQL Server; some objects are in memory.
5- Developer friendly
a. Using the parent – child, name and value design it’s very easy to introduce new items to the system to be versioned. The version value can also be calculated at run time.
The data set for collection.
The above gave me the method of querying the data and a basic dataset of all the relevant objects within a targeted SQL Server database. However because of the window of change of the systems I also had to introduce a time stamp, this could also be used to integrate with other monitoring systems and would allow for trending of the changed data. I also needed to add the source of the information, this is used in order to give the source i.e. Server name of the versioned data. This server name and source information is also used to build the hierarchy. Because there can be multi SQL instances on a single server additional information is also added, the SQL instance name.
The ETL – Extract transform and load.
The information is locally collected on the server by means of a scheduled job executing the procedures that pulls the versioning data directly from the system tables. Because of the JIT collection there is no need for additional space on the server for storing heaps of version information and other DDL modification logs. In many instances I’ve experienced these tables can grow quite large and provide unnecessary overheard to the system. Once collected the data is moved to a central database server into a stage table. This is done via a pull in order to reduce potential overhead on the system where versioning data is being collected from.
Once all data is collected the load process starts to populate a data warehouse. I designed the data warehouse according to a star schema. I created a dimension table for each parent and for each child. I’ve also create a data dimension which would later be used in order to generate SQL partitions from. This is done in order to minimize the load time for new data. The source information was also incorporated into the dimension model. I created a fact table for each type this fact table would have the relevant version as a measure. During the ETL the first upload would be done to add any new dimensions (this is a star schema with slow changing dimensions). Once all the dimensions are loaded the fact tables are loaded with the relevant dataset . This load process begins by identifying the relevant fact table that is to loaded and the date for which the data is to be loaded for. It will then use SQL servers partitioning to switch out the partition for which the data is being loaded. It does this in order to improve performance as only the swapped out partition will be scanned for the information. Once the partition is loaded is it switched back into the main fact table. This process reduces the loading time significantly. Now that the data is loaded and committed in the data warehouse and stubs are removed from collecting sources. This is to make sure that we have a clean slate for the next collection.
Reporting on the data.
I created a view that would render the detail into the data. This would in essence perform a join to the relevant dimensions and give us meaningful information about the child and it’s relevant version which in data warehouse terms are the measure. I did develop a SQL Server analysis services(SSAS) cube to present the data but I found that this was not very use friendly as many SQL engineers had to acquire the knowledge of how to interrogate SSAS. I designed an excel pivot table to display that version data. Data is imported in SQL which can be done natively in excel with the available data adapters. Once in excel I designed a pivot table. The pivot table includes all the information from the view which at this stage contains the source information, the versioning parent,child, timestamp, type , name and value.
When the pivot is created the source information is display as columns, the child(in our example trigger name) is displayed as rows. The value which was derived as a checksum from the original trigger text on our source database is display in the values area. The aggragation function applied to this is set to none. In that way our checksum is displayed as it was generated on the specific source database. Because our various sources are split out as column headings we can see where the trigger version is different. When we add the timestamp to the columns we can also view the trend.This wil indicate when the discrepancy took place we can then associate the date with a deployment schedule. The immediately value is added by an at a glance visibility of version differences on the systems.
This forms an effective and powerful tool in order to baseline and SQL server system as the collection relies on system tables. The visual appeal of excel can instantly highlight issues.