Simple.Data 2.0

Today I started work on the next version of Simple.Data, even though there hasn’t been an actual, official release of 1.0 yet. That’s something I hope to rectify in the next week or so, by fixing the few remaining issues and making sure the documentation is sufficient, if not exhaustive.

The initial work on 2.0 involves addressing some of the issues that have appeared as a result of a code-base growing organically from something that started out much less ambitious. These changes will clean the code up to provide a better base to build on, and hopefully make it easier to write adapters for a wider range of back-end data platforms. I’ll also be building a RavenDB adapter, not because RavenDB’s own API needs improving upon, but because I’d like RavenDB to be one of the platforms that can be swapped into an application built using Simple.Data.

Once that’s done, there are a few headline features for the 2.0 release:

Async support

Async’s strong point is non-blocking I/O operations, so getting it into Simple.Data is a no-brainer. I’ve got to work out the details, but you should be able to await everything.
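
Purely as illustration, here’s the sort of thing I mean; nothing below exists yet, the method names are made up, and the final API may look completely different:

// Hypothetical 2.0 syntax: await a query without blocking the calling thread.
dynamic db = Database.OpenConnection("connection-string-here");
var openOrders = await db.Orders.FindAllByStatus("Open").ToListAsync();

// Hypothetical: await a single-record fetch too.
var user = await db.Users.Get(42);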

Batch operations

You’ll be able to create a batch of operations and execute it with a single connect-run-disconnect process. This will also support adding arbitrary Actions and Funcs into the chain of operations to be run between database calls, using previously-returned data.

Metadata exposed

You’ll be able to access all the database schema information from the underlying database, without worrying about differing INFORMATION_SCHEMA or sys.* or what-have-you. Where appropriate, you’ll also be able to connect to a server and get database information too.

Plug-ins

There are lots of things that could be added to Simple.Data that would probably stop it being quite so Simple, so I’m going to add various hooks and extension points to allow either me or other people to add this stuff. Adapter authors will be able to add custom “methods” to the Database or Table objects, and you’ll be able to check whether those methods are available at runtime.

I might also refactor out some of the more esoteric functionality that Simple.Data currently has into plug-ins. We’ll see.

Better WithStuff

The With operations on queries will be getting much more powerful, allowing complex object graphs to be populated in a single call. I’m also looking at allowing object graphs to be saved with a single call, if that’s actually possible.

Oh, and IntelliSense

It’s not possible to glom onto the built-in IntelliSense in Visual Studio and augment it. Believe me, I tried. All you can do is completely replace it. Which, obviously, is a nightmare. But with the upcoming release of Roslyn (I am assured there will be a new preview release shortly after VS2013’s full release), I suspect replacing IntelliSense is going to be a lot easier.

It’s still a sufficiently big undertaking that I’m not going to just build it for Simple.Data. My plan is to create an open source IntelliSense replacement that is designed from line one to be extensible by anyone who wants to, and then to build a Simple.Data extension for it. I’m hoping I can make the engine for that extension modular enough to be able to create ReSharper and CodeRush completion plug-ins, too.

If you’re interested in contributing to that project, particularly on the UI/UX side of things, I’d be grateful for any help.

.NET 4.5

The biggest other change is that I’m pretty sure 2.0 is going to be .NET 4.5 (and up) only. With all the async/await stuff, maintaining compatibility with 4.0 will likely require too much time and effort, and will probably compromise the code and performance.

Contributors

If you’re the author of an adapter or provider for Simple.Data and you don’t want to continue to maintain that package through the 2.0 release, please let me know and we can bring it into the central fold. You’ll continue to be credited as the original author for the life of the package.

RFC

So, if you’ve got any comments or anything about any of that, please leave them below or reach out to me on Twitter.

Running TeamCity in Windows Azure

I’ve used JetBrains’ TeamCity continuous integration server on a couple of projects that I’ve worked on recently, and I really like it. It’s more approachable and flexible than the TFS Build system, which is kind of hung up on the whole MSBuild thing, and to be honest, if I wanted to write all my tasks as mind-bogglingly verbose XML, I’d use NAnt. TeamCity plays nice with all manner of build and test tools, version control systems and so on. It’s also free to use the Professional Edition; you only need to buy a license when your requirements exceed the generous limits (20 build configurations etc.). The only problem is that there is no hosted, as-a-Service provider as yet, so I decided to see if I could get it running on one of the new Windows Azure Infrastructure-as-a-Service VMs.

Initial setup

Getting things started was easy enough. I went to the Azure portal and created a new VM from the default Windows Server 2012 image; I used a small instance, with a single core and 1.7GB RAM. After waiting a few minutes while it was provisioned, I logged in via Remote Desktop, disabled Internet Explorer Enhanced Security Configuration, downloaded the TeamCity 7.1 installer and ran it. I set up the build server and a build agent on the same machine, just to see how it would cope with the limited resources. To make it accessible from the internet, I configured it to run the server on port 80; I had to open that port in the Windows Azure management portal and in Windows Firewall, as all ports except the RDP one are closed by default on Azure VMs.

It ran right away with no problems, but using its local storage engine, which is not recommended for production; you get a message box telling you to configure a proper database. Here begins the fun.

Database setup

TeamCity supports MySQL, PostgreSQL, Oracle and SQL Server. That’s the order of preference: the documentation recommends using MySQL unless you absolutely can’t for some reason. I guess this is due to its Java heritage. I could have set up another VM running Linux and MySQL, but that’s getting kind of expensive, so I decided to have a go at getting it working with a Windows Azure SQL Database (formerly known as SQL Azure). This is completely unsupported by JetBrains, but as we all know, unsupported doesn’t mean impossible.

I created a 1GB SQL Database through the portal, and followed the instructions for setting up an external database. The JTDS driver didn’t seem to get on with Azure for some reason, so I used Microsoft’s native JDBC driver. That allowed TeamCity to connect to the database, but when it tried to run its migration to set up the schema, incompatibility disaster struck. Windows Azure SQL Database requires a clustered index on every table. Most databases have a primary key on every table, which includes a clustered index, but the TeamCity database has a couple of tables with no primary key. The database setup appears to be compiled into the application somewhere; I couldn’t find any SQL scripts in the installation, so I couldn’t tweak anything.

I tried manually modifying the database to include clustered indexes and continuing the setup, but it wasn’t having it, so I went to plan B. I installed TeamCity on my workstation and set it up with a database on my local SQL 2012 instance, which worked without any problem. Then I used the SQL Azure Migration Wizard to copy that local database up to Azure. The Migration Wizard is a fantastic tool which automatically makes the necessary changes to schema as it copies the database, including creating clustered indexes. It also copies data using BCP, so all the standard configuration and user data was copied too.

Collation hell

Once I’d copied the database up, I tried again with the Azure-hosted installation, and it mostly worked, but a couple of features kept crashing. Checking the logs revealed that old chestnut, the collation error. I’d created the Azure database with the default collation, SQL_Latin1_General_CI_AS, but TeamCity wanted Latin1_General_CI_AS and wouldn’t budge. So I dropped the database and recreated it with that collation, and ran the Migration Wizard again. Happy with the new collation, TeamCity started working perfectly.

I hope that JetBrains will consider making the few changes necessary to support Windows Azure SQL Database as a database server in the future. There is an issue for it on the TeamCity YouTrack, but currently there appear to be no plans to address it.

I may put a script-dump of the database with the relevant changes somewhere for people to use to avoid this slightly tortuous route. The only issue is that you have to create the user account on the local database, but I could create one with a default administrator username and password. Let me know in the comments if you’d find such a thing useful.

Setting up a build

I wanted to test the system with a proper project with a good number of unit tests, so I used Simple.Data. TeamCity will pull from GitHub, and you can configure it to check for updates regularly. It will connect using HTTPS with your username and password, but I prefer using SSH, so I installed msysGit on the Azure VM and configured it with an SSH key. I’d set TeamCity up to run with a local user account, so I just copied the .ssh folder into that user’s home folder. With that set up, it was able to pull the Simple.Data repository with no problem.

The main Simple.Data project has a bunch of unit tests and integration tests that all use NUnit, for which support is built into TeamCity. The unit tests all ran fine, but the integration tests for SQL Server and SQL Compact 4.0 required some additional setup. For the SQL Compact tests to run, I had to install the redistributable on the VM, which was straightforward enough. For the SQL Server tests, I created another Azure SQL Database and tweaked the database creation script (which runs for every test run) to make it work; again, the changes were mainly to do with primary keys and clustered indexes, but also involved removing references to file groups and options that are not valid in the Azure environment.

The last problem was managing the connection string. I don’t want to leave a valid connection to an Azure database in the source code of an open source project. Fortunately, TeamCity can set environment variables for the duration of a build/test cycle, so I tweaked the test code to check for an environment variable and use that as a connection string if it found it. That meant I could put the connection string in the build configuration. I also set one test to conditionally ignore if the environment variable was detected, because it was to do with named connections and wouldn’t work.
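
The pattern in the test code is roughly this (a simplified sketch rather than the real thing; the environment variable name and fallback string here are made up):

// Prefer the connection string injected by TeamCity, if there is one;
// otherwise fall back to the local development database.
var connectionString =
    Environment.GetEnvironmentVariable("SIMPLEDATA_TEST_CONNECTION")
    ?? "data source=.;initial catalog=SimpleDataTest;integrated security=true";

var db = Database.OpenConnection(connectionString);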

With that done, I had all but three tests passing; the three failures were due to the Microsoft SQL Server CLR types assembly not being in the GAC on the VM. I copied the relevant assembly into the project and set the reference to Copy Local=true, and voila! All 800 tests pass!

Performance

Considering that the spec of the Azure VM is considerably below the recommendation for TeamCity, and that the server and a build agent are running on the same box, the performance is surprisingly good. A build takes less than a minute and a half, and a good chunk of that is running the integration tests. In terms of using the TeamCity web application, the performance is fine; having the database off the server probably helps there. I’ve enabled guest access, so you can take a look for yourself: teamcity.cloudapp.net (there’s a "Login as a Guest User" link at the bottom of the login window).

Production use

I’m intending to run TeamCity in Azure for proper production use, including continuous deployment to an Azure hosted service, in the near future. There’s only one change I intend to make to the setup I’ve got now, and that’s to have a separate build agent, which shouldn’t be very difficult to set up.

When I shared my experience with an Azure mailing list I’m on, someone else said they had got TeamCity running on an Ubuntu Linux VM with its own MySQL database and a Windows build agent, so that’s another option. And of course, if your project is entirely Linux-based, you can have a Linux build agent too.

Hidden Complexity

This coming Monday at Skills Matter in London, I’ll be giving my new talk, Hidden Complexity: Inside Simple.Data and Simple.Web, for the first time. In October, I’ll be presenting it at Dev Day in Krakow and Leetspeak in Malmo as well.

Simple.Data and Simple.Web are like the proverbial duck: on the surface, everything is clean and calm and simple, but underneath, there’s a whole lot going on. There’s some runtime code generation using System.Linq.Expressions; in the case of Simple.Data, there’s a lot of dynamic stuff going on; in Simple.Web, I’ve built a whole asynchronous pipeline on top of Tasks; and there are some downright abuses of the C# language in there for good measure.
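
To give a flavour of the System.Linq.Expressions part (a generic illustration of the technique, not code lifted from either project), this is the kind of runtime-compiled property getter you can build to avoid paying reflection costs on every call:

using System;
using System.Linq.Expressions;

// Compile a delegate that reads a named property; after the one-off
// compilation cost, calling it is close to the speed of a direct property access.
public static Func<T, object> BuildGetter<T>(string propertyName)
{
    var instance = Expression.Parameter(typeof(T), "instance");
    var property = Expression.Property(instance, propertyName);
    var boxed = Expression.Convert(property, typeof(object));
    return Expression.Lambda<Func<T, object>>(boxed, instance).Compile();
}

// Usage: var getName = BuildGetter<Customer>("Name"); object name = getName(someCustomer);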

In the course of writing these things, I’ve learned a hell of a lot, sometimes from trial and error, and sometimes from gurus like Jon Skeet (mainly abuses of C#) and Bill Wagner (mainly code generation), so this talk is my attempt to share some of that acquired knowledge. It might not be that useful in day-to-day programming – I certainly don’t use most of it on a regular basis – but I hope at the very least it will be an interesting and entertaining look at how fun code can get when you step off the well-trodden paths and into the backwoods of C# and .NET programming.

I’m on the look-out for more opportunities to present this one, so if you run a user group or have an event planned where this might fit in, please get in touch.

Simple.Data 0.18 / 1.0.0-rc1

Just published the latest Simple.Data build to NuGet with both the above versions (the 0.x versions get patched more often; generally the -pre packages are updated with a minor version bump, like 0.17 to 0.18). We’re really nearly at 1.0 now, just fixing the last few issues and getting the docs finished (Dan Maharry is doing a bang-up job on those).

New feature

I’ve had a couple of feature requests that have been outstanding for a very, very long time now: the ability to specify a command timeout; and a way to do “identity inserts” in SQL Server (that’s where you override a table’s identity column specification and supply your own values).

Both these features are kind of specific to ADO, so I didn’t want to implement them in a way that affected other adapters like MongoDB or OData. At the same time, I wanted to allow those adapters to have their own customisation options. I also didn’t want to make enormous changes to the code to support these features: if a command timeout were added as an optional parameter to the various “methods”, I’d have had to go through adding code in dozens of places.

So here’s how you insert a record into a SQL Server table, with an identity override and a command timeout of 1 second:

db.WithOptions(new AdoOptions(commandTimeout: 1, identityInsert: true))
    .Users.Insert(Id: 42, Name: "Marvin", Occupation: "Android");

db.Users.Insert(Name: "Eddie", Occupation: "Shipboard Computer");

Let me explain what happens there.

When you call WithOptions, a shadow copy of the underlying Database (or Transaction) object is created, with a shadow copy of the Adapter, and the options are set on that copy. So the first call to Users.Insert happens on that shadow copy, and the second call happens on the original, which doesn’t have those options set.

Somebody told me they thought having the options at that point in the call chain was clunky, but there are a couple of reasons for it being there; one technical, and one API design.

The technical reason is that putting it anywhere else is either difficult or impossible. I thought about putting it on the table object instead of the database, but that wouldn’t work for stored procedures, which are a common case for needing to set a longer timeout period. And it can’t go at the end of the call chain, because most methods return directly, so you can’t chain anything onto them.

The design reason is that this way allows you to create a reusable database object with options set, which strikes me as being potentially very useful. Maybe you want all commands executed against the database to have a shorter timeout than the default 30 seconds. In that case, you only need to set the options once, on a database object held in a field in a type, and then all your calls happen with those settings. Also, I can envisage other future uses for this Options-based approach, other options that might be database-wide, so putting the WithOptions call there leaves maximum scope for those future options.
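
As a sketch of that (the connection string and timeout value are placeholders, and I’m assuming the identityInsert option can be left at its default), something like this gives you a database object whose commands all get a two-minute timeout:

public class ReportingDatabase
{
    private readonly dynamic _db;

    public ReportingDatabase(string connectionString)
    {
        // Every command issued through _db uses the longer timeout.
        dynamic db = Database.OpenConnection(connectionString);
        _db = db.WithOptions(new AdoOptions(commandTimeout: 120));
    }

    public dynamic GetMonthlySummary()
    {
        // Stored procedures are the common case for needing more than the default 30 seconds.
        return _db.GetMonthlySummary();
    }
}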

Fixes

I fixed a bunch of small-to-medium problems in this release, too. There was an issue with the way I was scanning assemblies and using MEF to compose adapters and providers, which is now fixed. I’ve also fixed a problem with the magic casting in medium trust environments, and a handful of problems people reported on GitHub. There are still a couple of issues outstanding, which will get fixed in a 0.18.x patch soon, and then I sincerely hope there’ll be a short RC2 period before the long-awaited 1.0 final release. Not least because there’s a 1.5 that I really want to get on to, adding async support.

I’d like to thank everyone who has helped with this release, either with pull requests, or bug reports, diagnostic efforts and patience, or in one instance somebody who gave an excellent answer to a really obscure question on Stack Overflow. I can’t call out everybody, but you are all appreciated.

Learn Simple.Data & Nancy at SkillsMatter

I should have blogged about this ages ago, but I’ve been unbelievably busy. Still, better late-but-not-too-late than never-or-not-soon-enough.

Following the well-received half-day Introduction to Nancy and Simple.Data that I presented with Steven Robbins (@Grumpydev) at the Progressive .NET Tutorials 2011, Wendy at SkillsMatter approached me and asked if I’d be interested in doing a full two-day course. I’ve always been very impressed with the courses they offer, so I jumped at the chance to get involved. So you can now sign up for a full-on, two day deep dive into building web applications with minimal code.

The course will quickly cover the basics, such as View Engines, Dependency Injection, Nancy’s TDD/BDD test helpers and Simple.Data’s built-in data mocking functionality, then move on to more interesting things like: security, authentication (including OAuth) and authorisation; building RESTful web APIs (proper ones, not just HTTP web services); using different data stores with Simple.Data; and different methods for validating data.

It’ll be a heavily interactive couple of days. You can either turn up with an idea for a web application that you want to get started on, or you can work on an app I’ve designed to exercise the skills you’ll be learning. Either way, at the end of the course, that code is yours to take away and use as a starting-off point or a useful reference.

The first instance of the course is on the 12th and 13th of March, and there are still places available. Lots of places. All of them, in fact. So go sign up for it.

Simple.Data 1.0.0-beta1

At last

It’s taken me so much longer to get here than I originally expected, but I’ve released the first 1.0 beta version of Simple.Data.

If this post is the first you’ve heard of Simple.Data, then head over to the GitHub page and browse back through previous posts to find out more.

New features

I didn’t do a post for the 0.14 release, so I’ll cover the changes from that as well as what’s new in the 1.0 beta.

Eager-loading with With

Since very early on, Simple.Data has supported lazy-loading when you reference a joined property from the dynamic record type. There were two issues with that: firstly, if you assigned the record to a static type, the joined properties were not hydrated; secondly, multiple selects do not make DBAs happy, and should be avoided if possible.

Now you can use the With method to load the joined data at the same time as the main record.

var db = DatabaseHelper.Open();
Customer actual = db.Customers.FindAllByCustomerId(1).WithOrders().FirstOrDefault();

This uses a single SQL select statement to pull all the Customer rows, plus the Order detail rows, and groups the data in-memory; generally, that’s more efficient than running two select operations. When the record is converted to the Customer type, it sets the ICollection<Order> property at the same time, either creating a new instance of List<Order>, or populating an existing instance if the property is readonly.
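
For reference, the sort of POCO shape that example assumes looks like this (the property names are illustrative):

public class Customer
{
    public int CustomerId { get; set; }
    public string Name { get; set; }

    // Populated by WithOrders(): a new List<Order> is created if this is null,
    // or an existing instance is filled if the property is read-only.
    public ICollection<Order> Orders { get; set; }
}

public class Order
{
    public int OrderId { get; set; }
    public int CustomerId { get; set; }
    public Customer Customer { get; set; }
}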

The inverse works too:

var db = DatabaseHelper.Open();
Order actual = db.Orders.FindAllByOrderId(1).WithCustomer().FirstOrDefault();

If the property name is not the same as the database table name, you can use an alias to tweak it:

var db = DatabaseHelper.Open();
var actual = db.Orders.FindAllByOrderId(1).With(db.Orders.OrderItems.As("Items"));

And if there’s no referential integrity in the database, you can specify an explicit join separately:

dynamic manager;
var q = _db.Employees.All()
    .OuterJoin(_db.Employees.As("Manager"), out manager)
    .On(Id: _db.Employees.ManagerId)
    .With(manager);

(Oh, yeah, and check out the OuterJoin method. Finally.)

Of course, if there’s no referential integrity, it’s hard for Simple.Data to work out whether the joined property is a collection or a complex object, so you can specify WithOne or WithMany to help it out:

dynamic manager;
var q = _db.Employees.All()
    .OuterJoin(_db.Employees.As("Manager"), out manager)
    .On(Id: _db.Employees.ManagerId)
    .WithOne(manager);

And of course, you can mix and match all these. This test gives you a good idea of some of the heavy lifting this feature does for you, just in terms of the SQL it generates:

public void MultipleWithClauseJustDoesEverythingYouWouldHope()
{
    const string expectedSql = 
        "select [dbo].[employee].[id],[dbo].[employee].[name]," +
        "[dbo].[employee].[managerid],[dbo].[employee].[departmentid]," +
        "[manager].[id] as [__withn__manager__id]," +
        "[manager].[name] as [__withn__manager__name]," +
        "[manager].[managerid] as [__withn__manager__managerid]," +
        "[manager].[departmentid] as [__withn__manager__departmentid]," +
        "[dbo].[department].[id] as [__with1__department__id]," +
        "[dbo].[department].[name] as [__with1__department__name]" +
        " from [dbo].[employee] left join [dbo].[employee] [manager] " +
        "on ([manager].[id] = [dbo].[employee].[managerid])" +
        " left join [dbo].[department] " +
        "on ([dbo].[department].[id] = [dbo].[employee].[departmentid])";

    dynamic manager;
    var q = _db.Employees.All()
        .OuterJoin(_db.Employees.As("Manager"), out manager)
        .On(Id: _db.Employees.ManagerId)
        .With(manager)
        .WithDepartment();

    GeneratedSqlIs(expectedSql);
}

(Never mind the logic involved in turning that result set into the correct in-memory object graphs.)

Now, this works on multi-record queries, but not on single-record ones such as FindById or the key-driven Get method, and that’s more problematic since those methods don’t return a query you can modify, just a record. In the past I did toy with having the SimpleRecord type do lazy self-evaluation, but I was put off by the fact that NUnit’s Assert.IsNull wouldn’t accept that the object was null, however much it insisted that it was. (It works for Nullable<T>; no fair.)

Instead of that, and this is only in the 1.0 beta release, you can specify your With clause before the Get or FindBy:

var db = DatabaseHelper.Open();
Customer actual = db.Customers.WithOrders().Get(1);

So now you get all that goodness for single records, where it’s arguably more useful anyway.

Upsert

Inserting and updating is all very well, but sometimes you’ve got some data and you just don’t know whether it’s in the database already or not. If it is, you want to update the row with some new values; if it’s not, you want to insert it. Boring.

To save you the time and trouble, Simple.Data now provides the Upsert method. Give it a record, and it will do all the checking to see if it exists or not. And in beta2, there’ll be back-end database-specific optimizations; for example, if you’re using the SQL Server provider with SQL 2008 or later, it will use the MERGE operation.

Upsert returns the record as it is in the database following the operation, with any database-specified values intact.

var db = DatabaseHelper.Open();
var user = new User {Id = 2, Name = "Charlie", Password = "foobar", Age = 42};
var actual = db.Users.Upsert(user);

That’s one example, but there are plenty of other ways to use Upsert. Take a look at the tests to see the others.

NuGet and SemVer

I’ve pushed this release to NuGet using the Semantic Version number for pre-release (1.0.0-beta1), which NuGet added support for in 1.6. Using this form means that NuGet knows that it’s pre-release software, and you’ll have to explicitly tell it that that’s what you want. So to get the 1.0 beta releases, remember to use the -Pre flag for the Install-Package command. The great thing about this is that when we get to 1.0 RTW, I’ll start on 1.1.0-alpha1 and both package types will be available from the repository.

If you’re waiting for the Mono build, I hope to have it out as a tgz by the end of this weekend.

What’s still to do?

So with those two features, I’m drawing a temporary line in the sand and focusing on getting everything to release quality. That means implementing some optimizations around object creation and database tricks, but more importantly, working on test coverage, refactoring some messy code, and making the documentation comprehensive. Help on that last one would be much appreciated!

Come to that, if anybody wants to make a really nice website… 🙂

Macro-optimisations

I’ve used Simple.Data in a few production projects now (and it’s doing a great job so far). It’s not often you actually get to use the software that you write, but when you do, it’s a great opportunity to see it through users’ eyes, and I’ve made a few changes and improvements over the past year as a result.

The most recent project that we’ve used it on at Dot Net Solutions is the Met Office open data thing that was announced on Tuesday. And that forced me to bring forward an optimisation that’s been way down my to-do list, partly because I didn’t realise just how much of an optimisation it would be.

The Met Office project involves inserting something like 8 million records a day into a SQL Azure database, which isn’t a huge amount, but enough to need you to be smart about how you do it. The version of Simple.Data that was on NuGet when we started supported bulk inserts, but it wasn’t friendly to the error handling we needed and it assumed it needed to return the inserted records, doing that whole ‘select just-inserted-record’ thing, which is often completely unnecessary.

(So it turns out that when you’re handling TryInvokeMember in a DynamicObject, you can actually find out whether the return value is used by the caller, and not bother if it isn’t. But that’s another blog post.)

Anyway, I tweaked a couple of things and shaved off a fraction of the time it was taking, but it was a small fraction, and things were still far too slow. So we did what we should have done in the first place, and used SqlBulkCopy.

If you haven’t used this (SqlClient-specific) method, you should read up on it and keep it in your mental list of “things that are good that I might need some day”. It lets you prepare a big batch of rows in a DataTable (turns out they’re still good for something) and then insert them in a single operation, and man, it’s quick.
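
If you haven’t come across it, the basic usage looks like this (a minimal sketch with made-up table and column names, not the code inside the provider):

using System;
using System.Data;
using System.Data.SqlClient;

// Build a DataTable whose columns match the target table...
var table = new DataTable("Target");
table.Columns.Add("Guid", typeof(Guid));
table.Columns.Add("Text", typeof(string));

for (int i = 0; i < 10000; i++)
{
    var guid = Guid.NewGuid();
    table.Rows.Add(guid, guid.ToString("N"));
}

// ...then push the whole batch to SQL Server in one operation.
using (var bulkCopy = new SqlBulkCopy("data source=.;initial catalog=BulkInsertTest;integrated security=true"))
{
    bulkCopy.DestinationTableName = "Target";
    bulkCopy.ColumnMappings.Add("Guid", "Guid");
    bulkCopy.ColumnMappings.Add("Text", "Text");
    bulkCopy.WriteToServer(table);
}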

But it’s SQL Server specific, so I couldn’t support it in the generic ADO adapter code.

I’ve exposed a few interfaces in the Simple.Data.Ado assembly which providers can optionally implement if they need to do something a little differently or can do something better. The first instance was ICustomInserter, which is implemented in the Oracle provider to handle fetch-backs in a world without IDENTITY columns. Since then I’ve added more as I went along, and IBulkInserter was one of them because, as I said earlier, I had half a mind to implement this. And now I have.

Anyway, I’ll stop blathering now and just post the comparison code I wrote (measures time to insert 10,000 records, five times) and the before and after results.

Code:

namespace BulkInsertComparison
{
    using System;
    using System.Collections.Generic;
    using System.Diagnostics;
    using Simple.Data;

    class Program
    {
        static void Main(string[] args)
        {
            var db = Database.OpenConnection("data source=.;initial catalog=BulkInsertTest;integrated security=true");

            for (int i = 0; i < 5; i++)
            {
                Console.WriteLine(TimeInsert(db));
            }
        }

        private static TimeSpan TimeInsert(dynamic db)
        {
            var stopwatch = Stopwatch.StartNew();
            db.Target.Insert(GenerateItems(10000));
            stopwatch.Stop();
            return stopwatch.Elapsed;
        }

        static IEnumerable<Item> GenerateItems(int number)
        {
            for (int i = 0; i < number; i++)
            {
                var guid = Guid.NewGuid();
                yield return new Item(0, guid, guid.ToString("N"));
            }
        }
    }

    class Item
    {
        private readonly int _id;
        private readonly Guid _guid;
        private readonly string _text;

        public Item(int id, Guid guid, string text)
        {
            _id = id;
            _guid = guid;
            _text = text;
        }

        public string Text { get { return _text; } }
        public Guid Guid { get { return _guid; } }
        public int Id { get { return _id; } }
    }
}

Before:

00:00:16.9799819
00:00:17.1971797
00:00:18.0744958
00:00:19.1514537
00:00:17.3798541

After:

00:00:00.4616911 (First run includes MEFing IBulkInserter)
00:00:00.2757802
00:00:00.2852119
00:00:00.2504587
00:00:00.2453277

Totally worth it.

0.12.2 on NuGet now. Mini-roadmap: eager-loading (0.14) and upserts (0.15).

Simple.Data for Mono

TL;DR

I’ve got Simple.Data running on Mono 2.10.6. Most tests pass. YMMV. Download it here.

Ritalin version

Simple.Data on Mono is something that people have been asking about, and I’ve been meaning to sort out, pretty much since the project went from a proof-of-concept to an actual OSS product. One way or another I’ve never gotten round to it, but a couple of things have made it seem more relevant. Firstly, there are providers for lots of OSS databases now (MySQL, SQLite, PostgreSQL); secondly, a certain @fekberg has been very persistent in his status update requests, so I’ve taken the time this weekend to make it work.

The challenges

For the most part, I’ve been pleasantly surprised by how easy it’s been. I’ve found what I think is a bug in the Mono implementation of either dynamic or LINQ or the combination of the two, which I’m going to file with Xamarin once I’ve created a simple repro project. I’ve come up with a workaround involving old-school class-based IEnumerable/IEnumerator implementations, and the performance doesn’t seem to be affected, so that’s fine.
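
To give an idea of the shape of that workaround (a simplified illustration rather than the actual Simple.Data code), it’s the classic transformation from an iterator block into a hand-written enumerator:

using System.Collections;
using System.Collections.Generic;

// Instead of a method using "yield return", an explicit enumerable/enumerator pair.
class CountingEnumerable : IEnumerable<int>
{
    private readonly int _count;
    public CountingEnumerable(int count) { _count = count; }

    public IEnumerator<int> GetEnumerator() { return new Enumerator(_count); }
    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }

    private class Enumerator : IEnumerator<int>
    {
        private readonly int _count;
        private int _current = -1;
        public Enumerator(int count) { _count = count; }

        public int Current { get { return _current; } }
        object IEnumerator.Current { get { return Current; } }
        public bool MoveNext() { return ++_current < _count; }
        public void Reset() { _current = -1; }
        public void Dispose() { }
    }
}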

The hardest thing about the whole process is that MonoDevelop just isn’t Visual Studio 2010 + ReSharper. It’s not a bad IDE by any standards – I’d still take it over Eclipse any day – but I work day in, day out with VS and jumping into any other IDE just feels like somebody moved all the cheese. Add to that the fact that I’m running it under OSX, so even the standard Windows keyboard shortcuts don’t work, and it’s a bit like running in treacle. As I understand it, the Mono Tools for Visual Studio don’t support 2.10; Miguel de Icaza tells me that they’re working on some awesome new VS tooling, so I’m really looking forward to that.

The other big challenge was testing the ADO adapter against a real database. I only own the SQL Server and SQL Compact providers, so I really wanted to test with SQL Server to keep things simple and let me step-debug if necessary. I was expecting not to have fun with this, but it turned out fine. I run my Windows development environment on my MacBook using Parallels 7, so I set up the Host-Guest networking (easy) and opened inbound port 1433 in Windows Firewall (also easy) and that was it. There were a couple of failing tests, one calling a stored procedure with a DataTable and one involving a scalar function, but the rest just passed. I’m guessing that the majority of people who want to use Simple.Data on Mono will be using one of the OSS DB providers, so hopefully this won’t be a problem.

Releases

I don’t know what the Mono NuGet situation is, so I’ll be releasing Mono builds as tgz downloads from the GitHub project page. For the time being, there are differences between the Mono build of Simple.Data.SqlServer and the Microsoft .NET build, so if you want to use that provider, don’t use the NuGet version.

Right now, there’s a 0.11.4 build in the Downloads section which I hope works. However, I haven’t really exercised it to any great degree, so if you encounter any problems please raise an issue if you think it’s a bug, or ask on the mailing list if you aren’t sure.

If you are one of the developers of an adapter or provider, you might want to test against Mono. If, for any reason, that’s not an option, then let me know and I’ll try to find time to test it for you.

Going forward, I will support (in the OSS sense of the word) both Microsoft .NET and Mono for all releases.

Simple.Data 0.11

Some API changes and enhancements

After the slow-down in development caused by all that InMemoryAdapter stuff, there were a few important things I needed to address quickly. One of these will have broken third-party adapters (but not providers) so let’s talk about that one first.

Get

var db = DatabaseHelper.Open();
var user = db.Users.Get(1);
Assert.AreEqual(1, user.Id);

I’m not really sure why Get wasn’t already there, to be honest. Part of the problem is that it requires a new abstract method internally, and until adapter authors implement that method, their users are stuck on versions below 0.11, which is a situation I try to avoid where possible.

For the ADO adapter, Get will use the table’s primary key to construct the query (once; it’s then cached internally, no worries about performance). For the MongoDB adapter, I’d expect it to use the built-in id value that Mongo assigns to all records. Somebody is working on an OData adapter, for which, e.g., Customers.Get(1001) will resolve to the /Customers(1001) URL.

Get is supported in the InMemoryAdapter, but you’ll have to configure the key(s) for each table:

var adapter = new InMemoryAdapter();
adapter.SetKeyColumn("Test", "Id");
Database.UseMockAdapter(adapter);
var db = Database.Open();
db.Test.Insert(Id: 1, Name: "Alice");
var record = db.Test.Get(1);
Assert.IsNotNull(record);
Assert.AreEqual(1, record.Id);
Assert.AreEqual("Alice", record.Name);

Trace configurability

The ADO adapter has been writing all generated SQL to the Trace output at the point of execution for a while now. While this is often very useful, I’ve had a couple of people ask if I could make it turn-off-and-on-able, so I have. You can do this in two ways:

In code:

Database.TraceLevel = TraceLevel.Off;

In config:

<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  <configSections>
    <sectionGroup name="simpleData">
      <section name="simpleDataConfiguration"
               type="Simple.Data.SimpleDataConfigurationSection, Simple.Data"/>
    </sectionGroup>
  </configSections>
  <simpleData>
    <simpleDataConfiguration traceLevel="Error"/>
  </simpleData>
</configuration>

(Gotta love XML.)

ADO SQL output will happen with the trace level set to Info, Warning or Error.

More ADO connection control

I occasionally see how Simple.Data performs compared to other ORM/micro-ORM tools, using the PerformanceTests project from Dapper. I was running this through the other day, and I realised that Simple.Data was losing a lot of time opening and closing connections, while the other test cases were mostly using an open connection for the duration of the test. I’ve had a few comments from people who’d like more control over the connection, or who think Simple.Data is too aggressive in closing connections, so I decided to improve my standing in the Dapper smack-down and hopefully help some real people out too.

Start using an open connection like this:

SqlConnection connection = Program.GetOpenConnection();
((AdoAdapter) db.GetAdapter()).UseSharedConnection(connection);

And stop using it again like this:

((AdoAdapter) db.GetAdapter()).StopUsingSharedConnection();

And that’s it. In my performance test project (which is in the solution on Github), this knocked 20-30% off the runtime, with 500 FindById operations taking ~80ms, versus ~50ms using plain ADO.

I’ve also tried to tone down the aggression a little when it comes to closing connections, doing it as soon as possible, instead of (occasionally) before.

Immediate road-map

I want to get the Azure Table Service adapter done, and help with the OData adapter, and I’ve got a cool website to build using Simple.Data and Nancy, so Core will go into maintenance mode while I do those things for a while. I’ll try to fix any problems in a timely fashion, as usual. If you’ve got anything still outstanding as of 0.11.1 (now on NuGet) please gently remind me on Twitter.

Just a quick note about Tests

I’ve been fixing a few bugs in Simple.Data over the last couple of days, and I’m feeling the need to post something about the benefits of a good test suite.

For most feature development on Simple.Data, I do test-driven development. The dynamic nature of my API lends itself really well to this approach; this is one of the real reasons TDD is popular with dynamic languages like Ruby and Python. When I want to add a new Query operator or method, I can write a test that uses the syntax I’m aiming for, and the Behaviour test project will compile and run, and I’ll get a failing test, either with a failed assert or an exception. I really like the latter form of failure, since I get a stack trace and I can just dive into the code at that point and start working out what’s wrong.
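
A trivial illustration of what I mean (the names are made up, and it uses the in-memory adapter from the 0.11 post above rather than a real database): because everything is dynamic, a test like this compiles immediately, and it simply fails until the feature behind it works.

[Test]
public void FindByNameReturnsTheRightUser()
{
    var adapter = new InMemoryAdapter();
    Database.UseMockAdapter(adapter);

    var db = Database.Open();
    db.Users.Insert(Id: 1, Name: "Alice");

    // Fails (or throws) until the syntax above is actually supported.
    var user = db.Users.FindByName("Alice");
    Assert.AreEqual(1, user.Id);
}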

So that’s great, but the thing that makes me want to write this post is my “QA” process. When someone reports a bug, like this one, it means my test suite is incomplete. So again, the first thing I do is to create a test which reproduces the bug. Then I fix the code so that test passes. Then I do a Release build, and run the full set of tests (currently 560+ including integration tests) to make sure that the fix hasn’t broken anything else. And then, and this is the really awesome part: if all the tests pass, I package the build and push it to NuGet.

I can do this because I trust those tests to be verifying all the behaviour that Simple.Data users are relying on in their applications. If the tests pass, I’m not going to break anybody’s system when they update to the new version. On the (rare) occasions when this has gone wrong, it’s been because I didn’t have the right tests, and I’ve gone back and added them (would you believe the SQL Server test project didn’t test deletes until yesterday? Epic fail).

When people look at the Simple.Data repository, and say “wow, you’ve got lots of tests”, it’s as if they think that’s a discipline thing, that I’m just really conscientious about coverage. But that’s not it. I couldn’t do this without the tests. And neither can you, whatever you’re working on.