Code generation has over time turned into one of my pet peeves when working in .NET (although most of this will apply to similar platforms like Java). Let me be very clear from the beginning of this post: I am advocating that you use extreme caution before even considering code generation to solve whatever real or perceived problems you might find it useful for. I have yet to see an implementation that has not caused extra problems. That does not necessarily mean that it cannot be a valuable tool, but please take a step back and think before continuing down that path.
This is the longest blog post I have yet written. It is mostly ranting, and hopefully some insights.
Code generation is typically introduced to:
- remove duplication.
- reduce tedious coding.
- enforce security measures (authentication and authorization).
- enforce consistent handling of resources such as database connections and transactions.
- apply cross cutting concerns such as logging or caching.
And probably a number of other reasons. Basically, anything that seems hard to do in a nice way in the language and framework you are currently working with. Most of these things can be handled with more lightweight measures: simpler, well known designs based on patterns.
Regarding tedious coding, it would be wise to step back and consider why there is so much of it. If the number of lines of code compared to the functionality it provides is so high that you consider using code generation, I would put it to you that your design is not very good. Basically, you are trying to hammer a square peg into a round hole, and instead of seeing the error of that you just get a bigger hammer (called code generation).
How to Determine What to Generate
Usually, you would determine what to generate based on an external resource (like XML files), examining the existing code structure using the reflection API, or annotating the existing code somehow (probably using attributes) and then examining that using the reflection API.
No matter how you choose to do this, you are defining rules that are not clear from the context. An XML file is a very loose format, and what is generated from it is by no means clear to a developer looking at that XML. It can also be very hard to determine from the code of the code generator. Similar problems arise when using the existing code structure or attributes: it is simply not very clear what rules are being applied and how the resulting code will look.
Step back and consider the number of characters in those XML files. Now consider the number of characters of the code if you wrote it by hand in the same way the code generator does it. Now consider the number of characters for the code of the generator. Often you will find that you end up with more characters using the XML than just using the code. Then consider that XML files cannot be unit tested, the build process will be more complicated, and you will have to actually write the code generator.
If you determine what to generate by examining the existing code, you introduce the problem of simple code changes causing unwanted side effects. I still recall changing the name of a parameter in a function in a model class because it was misspelled. Compilation and tests worked fine, as they should. What I did not foresee was that our public API suddenly had a breaking change, as it was generated from the core model. Our public SDK picked up the same change, as it was also generated from the core model. So basically all our integration partners had to change their code, or we could roll back the release. I know this is an extreme scenario and probably not something you would ponder, but there really is no limit to the ingenuity (insanity) of developers when trying to solve problems.
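The parameter rename described above can be reproduced in a few lines. This is only a sketch in Python, not the generator from that project: `generate_client_stub`, `find_customer`, and `call_service` are hypothetical names, and the reflection is done with the standard `inspect` module. The point is that the internal parameter name ends up verbatim in the generated public code.

```python
import inspect


def generate_client_stub(func):
    """Emit source for a public wrapper by reflecting over func's signature."""
    params = list(inspect.signature(func).parameters)
    args = ", ".join(params)
    payload = ", ".join(f'"{name}": {name}' for name in params)
    return (
        f"def {func.__name__}({args}):\n"
        f'    return call_service("{func.__name__}", {{{payload}}})\n'
    )


# Hypothetical internal model function, before the fix: note the misspelling.
def find_customer(adress):
    ...


before = generate_client_stub(find_customer)


# The same function after an apparently harmless internal rename.
def find_customer(address):
    ...


after = generate_client_stub(find_customer)

print(before)
print(after)
```

Nothing in the internal code or its tests fails after the rename, yet the generated wrapper now sends a different key over the wire, which is exactly the kind of hidden coupling that bit us.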
If you end up doing code generation despite all my attempts to get you to move away from it, do work diligently on creating and maintaining documentation for your base generation format, whether it is XML based, annotations, or the structure of the code.
How to Actually Generate the Code
Once the motivation is in place and you have somehow figured out how to determine what the output code should look like, it is time to actually generate some code.
There are a number of different tools for doing this in the .NET stack, but all of them basically fall into two different categories:
+ generate by code API (such as the CodeDOM)
+ generate by template (such as T4 templates)
Using code to generate code is a feasible approach, but unfortunately the APIs in .NET for this are extremely cumbersome and verbose, leading to a lot of incomprehensible code. For instance, the combination of CodeDOM and reflection (for reading the base format) leads to some of the least readable code I have ever encountered in .NET. This means that it is very hard to envision the resulting code from the generator, which leads to random experimentation followed by examination of the resulting code to verify changes (if possible).
Using a template engine is a better alternative. The templates will most likely be a raw text format with placeholders for the code that has variation. This means it is a lot easier to envision the resulting code. Usually, the placeholders should be fairly few, otherwise, the reusability is very low and the code generation should probably never have been implemented at all.
Use a template based engine and put as much of the generation logic as possible in the template.
There is an alternative where you generate the code based on a 3rd party solution, such as PostSharp. In this scenario it is the 3rd party that defines the ‘how to determine what to generate’ and maintains most of the documentation.
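To illustrate why a template is easier to read than generator code, here is a minimal sketch using Python's standard `string.Template`. The template text, `render_dto`, and the `Customer` fields are made up for the example. Note one limitation: `string.Template` has no loops, so the per-field lines are built in code here, whereas an engine with control blocks (T4, for instance) would let that logic live in the template as well.

```python
from string import Template

# Hypothetical template: it reads like the code it produces, with a few
# placeholders for the parts that vary.
DTO_TEMPLATE = Template('''\
class ${name}Dto:
    """Data transfer object for ${name}."""

    def __init__(self, ${args}):
${assignments}
''')


def render_dto(name, fields):
    """Fill the template for one entity with the given field names."""
    args = ", ".join(fields)
    assignments = "\n".join(f"        self.{field} = {field}" for field in fields)
    return DTO_TEMPLATE.substitute(name=name, args=args, assignments=assignments)


source = render_dto("Customer", ["number", "name"])
print(source)
```

If the template needs many more placeholders than this, that is a hint that the generated classes do not have much in common and the generator is not buying you anything.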
Do not start from scratch. Search the market for generation tools that can do the bulk of the lifting for you.
Once the code is generated there are a number of ways it can interact with the rest of the codebase:
- never generate a file, just build it straight into a DLL
- generate a temp file as part of the build and delete it afterwards
- generate a temp file and try to hide it from the developers (the folder it is in is not included in the project etc.)
- generate a temp file and use it in the build (the build is a separate step from generation)
- only generate the code on demand and check it in with the rest of the code
The distinction between these methods can seem very narrow. The important part is whether the generated code is an artifact of the build process and how easy it is to see for the developers.
The first 3 options basically assume that it is not a good thing if the developer can see and modify the generated code. As if bad stuff would come from that, which it might.
The last 2 assume that the generated code is not directly an artifact of the build process. Item 4 is not that different from 1-3, but at least it keeps the code generation as a separate step from the build process. This could be a pre-process step when building with Visual Studio, where the actual build just picks up the file from the project.
Item 5 is basically a templating engine. You generate something once and it is perfectly acceptable to modify it afterwards. This is just like when you add a class file in Visual Studio and it comes with a bit of code prefilled, such as namespace, class name, and useless using statements. Similarly, with very simple commands you can generate new controllers and models in Rails.
Make sure the build process is not made more complicated (and slower) due to the code generation. Ensure that the code generation is an optional step that comes before the build. Consider only generating once and keeping the files in source control if feasible.
I am just going to run through some of the examples of code generation I have encountered so far. One is bad, but ultimately not that harmful, two have been extremely costly, and one is maybe ok.
Re-use of Internal Model in Other Services
The first example of a custom code generator I encountered was used to share models between a number of different services. Basically, the team had decided to go for a SOA design, but had not really understood how to implement it. It was a relatively simple business application consisting of an ASP.NET Web Forms site and no less than 12 SOAP based web services.
We eventually came to the conclusion that probably 3 services would have been just about right, as the services depended on each other to a large degree. The sheer work of maintaining a model that belonged in service A, but was used in service B, service C, and the website quickly became a pain with 12 services. Thus, a code generator had been built to generate these models, making this an example of bad design driving the creation of the code generator.
Luckily, the simplicity of the generated classes, the ease with which they could be extended, and the simplicity of the code generator itself ensured that it was only slightly cumbersome to work with.
Custom ORM NHibernate Wrapper
I could write a series of blog posts on this example alone, but basically this backend system was designed by someone that considered code generation his speciality and consequently applied it to everything possible.
Combine this with a desire to build a custom ORM and you get one of the most convoluted systems I have seen. Basically, more or less the entire backend system was generated from XML. The idea of the homemade ORM was eventually dropped, but its basic API was then used to wrap NHibernate. So as a developer you are basically stuck with a homemade XML format for queries and entities, which is then used to generate mappings and queries for NHibernate. Due to the complexity of such a system, the queries and the mappings are highly inefficient.
The worst part was how hard it was to actually write any kind of meaningful code for the system. The generated entities and queries (called views) were hidden and impossible to extend. In fact, the only intended way to implement business logic was static methods where everything was passed as parameters. Some of the hacks developers had implemented on top of this system to be able to deliver features only made it worse.
To top off the problems with this code generation it was full of bugs and very hard to modify. Part of the code generator was even code generated.
Possibly the worst design I have ever seen. For a relatively simple business domain and equally simple functionality.
Service Layer Method Generator
This is another example of a very complex code generation system. I cannot be completely certain of the motivation for this code generation system, but my guess is that it is yet another example of a bad design that caused a lot of manual work.
The core of the system is a number of entities, built by hand and mapped with NHibernate. The entities have specific queries as static methods in an Active Record style. This design in itself is not that bad. One could argue back and forth about the merits of Active Record compared to the Repository pattern, but at least it is a well known and tried pattern.
Based on the methods, properties, and certain custom attributes of these entities, a number of things were code generated (which was not documented, of course):
- A service layer directly exposing all the public methods of the web service. In this service layer certain cross cutting concerns were added, such as connection and transaction handling, exception handling, and security checks.
- A number of classes that were built into a .NET based SDK.
The idea behind the .NET based SDK was that it should work in an RPC style, and it contained both proxy objects and data objects generated directly from the data retrieved from the core entities. Exposing internal implementation details in this way is never a good idea. As a direct consequence of this, I once introduced a bug by renaming a function parameter, a change which propagated through the service layer and the SDK. This meant that consumers of the API suddenly experienced a breaking change.
Nevertheless, the main problem stems from the design. The proxy objects in the SDK do not contain any data. This means that if, for instance, you access the name property of a customer object, you end up going over the wire to retrieve this data. Obviously, this leads to extremely chatty behaviour by the consumers of the SDK, and even though it is possible to avoid, avoiding it is not the perceived default. The main problem though is that 50 or so entities with an average of 10 properties lead to 50 × 10 × 2 (both get and set) = 1000 methods on the service layer to implement the proxy objects (this could have been done with a PATCH like operation for partial updates, but alas it is not). With the standard methods for entities (create, update, delete) and some queries, the API ends up having approximately 1500 public methods.
I have removed the code generation completely and changed every method to be a ‘method as an object’ implementation. Combined with the template pattern this means that there is very little redundant code. Unfortunately, it does not change the fact that we have 1500 public methods, which is a nightmare to maintain for us, and equally bad for our consumers to understand and navigate. Fans of the code generation in question would probably argue that it is only a problem to maintain, because the code generation has been removed, and they would be partially right. But, taking into account the number of bugs in the code generation and the number of well hidden side effects I am very happy we got rid of it.
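For readers unfamiliar with the combination, here is a rough sketch of ‘method as an object’ plus the template pattern. It is written in Python for brevity and is not the actual code from that system: `Command`, `RenameCustomer`, the list used as a log, and the `"admin"` check are all hypothetical stand-ins for real transaction and security handling. The base class owns the cross cutting steps, and each operation only implements `execute`.

```python
from abc import ABC, abstractmethod


class Command(ABC):
    """Base class: run() is the template method holding the shared steps."""

    def __init__(self, log):
        self.log = log  # stand-in for real transaction and security services

    def run(self, user):
        if not self.is_allowed(user):
            raise PermissionError(f"{user} may not run {type(self).__name__}")
        self.log.append("begin")
        try:
            result = self.execute()
            self.log.append("commit")
            return result
        except Exception:
            self.log.append("rollback")
            raise

    def is_allowed(self, user):
        return True  # default; subclasses override when they need a check

    @abstractmethod
    def execute(self):
        """The only part each operation has to write."""


class RenameCustomer(Command):
    def __init__(self, log, customers, customer_id, new_name):
        super().__init__(log)
        self.customers = customers
        self.customer_id = customer_id
        self.new_name = new_name

    def is_allowed(self, user):
        return user == "admin"

    def execute(self):
        self.customers[self.customer_id] = self.new_name
        return self.new_name


log = []
customers = {1: "Old Name"}
RenameCustomer(log, customers, 1, "New Name").run("admin")
print(customers, log)
```

The shared behaviour lives in one readable place instead of being stamped into every generated method, and a change to transaction handling is a change to `run` only.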
When a design decision leaves you with 1500 different methods on a public API, you should really consider whether this design is a good idea.
This last example is actually one I designed myself. In a relatively simple ASP.NET MVC application we use attributes to implement AOP support using PostSharp. This actually works very well, as PostSharp has a fairly straightforward API and changing the code has no hidden side effects (everything is handled by annotating with attributes).
It is used for a number of things:
- checking function parameters for null
- checking function parameters for certain values
- checking security rights on controller actions
I am not sure I would implement it this way again. The addition of the PostSharp tool does add a level of complexity, and the fact that it must be installed on each and every developer machine also makes it a pricey dependency.
Alternatively, each controller action could have been implemented as a ‘method as an object’ and used the template pattern to achieve similar results. Or maybe we could just have done it by hand; it is not like the size of the application made it that much work. Basically, it is probably an example of overengineering that I am responsible for.
There are a number of general challenges with using code generation:
- It makes the system inherently a lot more complex.
- It reduces readability quite a bit.
- It makes it hard to implement business logic. The required features will be hard to determine up front, and thus the code generator will not correctly accommodate them. This leads to hacks to work around the code generator.
- The APIs in .NET that are often used for this really suck. If you really want to create code that is very hard to read, just combine CodeDOM code generation with the reflection API.
There are some challenges that will get harder without code generation. What if you have 1500 command classes that need to be changed? Depending on the change it could be put in a base class (if one is available), but if this is not possible it will be a harder challenge.
How do you change 1500 classes by hand? One option is hiring a lot of very cheap labour to do it for you, but more constructively it is also feasible to create a script that does the modification and is then thrown away, which was how they were generated in the first place.
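Such a throwaway script does not need to be sophisticated. As a sketch only, assuming the change is adding a hypothetical `context` parameter to every command's `execute` method, a regular expression over the source text is often enough. The file names and sources below are invented and kept in memory so the example runs as is.

```python
import re

# Hypothetical change: every command's execute() gains a 'context' parameter.
PATTERN = re.compile(r"def execute\(self\):")


def add_context_parameter(source):
    """Return the rewritten source and the number of replacements made."""
    return PATTERN.subn("def execute(self, context):", source)


# In a real run these would be read from and written back to files on disk.
sources = {
    "rename_customer.py": "class RenameCustomer:\n    def execute(self):\n        pass\n",
    "delete_customer.py": "class DeleteCustomer:\n    def execute(self):\n        pass\n",
}

for filename, source in sources.items():
    rewritten, count = add_context_parameter(source)
    sources[filename] = rewritten
    print(f"{filename}: {count} change(s)")
```

Run it once, review the diff in source control, and delete the script. Unlike a generator, it leaves nothing behind in the build.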
I have yet to see a really good example of code generation. Even those provided by Visual Studio, the Rails framework, or similar are at best minor convenience functionality.