Refactoring Databases: Evolutionary Database Design (Addison-Wesley Signature Series (Fowler)), by Scott Ambler and Pramod Sadalage.
For years the norm for developers was to work in an iterative and incremental manner but for database developers to work in a more serial manner.
I had heard a lot of praise for Scott Ambler's book Refactoring Databases: Evolutionary Database Design over the past few years; it reads as the database equivalent of Martin Fowler's Refactoring. The basic techniques for evolutionary database design include refactoring (the topic of the book), evolving the data model, database regression testing, configuration management, and developer sandboxes.
When Refactoring was first published, no tools supported the technique. Just a few years later, every integrated development environment (IDE) has code-refactoring features built right into it. At the time of this writing, there are no database refactoring tools in existence, although we do include all the code that you need to implement the refactorings by hand. Luckily, the Eclipse Data Tools Project (DTP) has indicated in its project prospectus the need to develop database-refactoring functionality in Eclipse, so it is only a matter of time before the tool vendors catch up.
Agility in a Nutshell

Although this is not specifically a book about agile software development, the fact is that database refactoring is a primary technique for agile developers. A process is considered agile when it conforms to the four values of the Agile Alliance. The values define preferences, not alternatives, encouraging a focus on certain areas but not eliminating others. In other words, whereas you should value the concepts on the right side, you should value the things on the left side even more. For example, processes and tools are important, but individuals and interactions are more important. The four agile values are as follows:

Individuals and interactions OVER processes and tools. The most important factors that you need to consider are the people and how they work together; if you do not get that right, the best tools and processes will not be of any use.

Working software OVER comprehensive documentation. The primary goal of software development is to create working software that meets the needs of its stakeholders. Documentation still has its place; written properly, it describes how and why a system is built, and how to work with the system.

Customer collaboration OVER contract negotiation. Only your customer can tell you what they want. Unfortunately, they are not good at this: they likely do not have the skills to exactly specify the system, nor will they get it right at first, and worse yet they will likely change their minds as time goes on. Having a contract with your customers is important, but a contract is not a substitute for effective communication. Successful IT professionals work closely with their customers, they invest the effort to discover what their customers need, and they educate their customers along the way.

Responding to change OVER following a plan. As work progresses on your system, your stakeholders' understanding of what they want changes, the business environment changes, and so does the underlying technology.
Change is a reality of software development, and as a result, your project plan and overall approach must reflect your changing environment if it is to be effective.

How to Read This Book

The majority of this book, Chapters 6 through 11, consists of reference material that describes each refactoring in detail. The first five chapters describe the fundamental ideas and techniques of evolutionary database development, and in particular, database refactoring. You should read these chapters in order:

Chapter 1, "Evolutionary Database Development," overviews the fundamentals of evolutionary development and the techniques that support it. It summarizes refactoring, database refactoring, database regression testing, evolutionary data modeling via an Agile Model-Driven Development (AMDD) approach, configuration management of database assets, and the need for separate developer sandboxes.

Chapter 2, "Database Refactoring," explores in detail the concepts behind database refactoring and why it can be so hard to do in practice. It also works through a database-refactoring example in both a "simple" single-application environment and a complex, multi-application environment.

Chapter 3, "The Process of Database Refactoring," describes in detail the steps required to refactor your database schema in both simple and complex environments. With single-application databases, you have much greater control over your environment, and as a result need to do far less work to refactor your schema. In multi-application environments, you need to support a transition period in which your database supports both the old and new schemas in parallel, enabling the application teams to update and deploy their code into production.

Chapter 4, "Deploying into Production," describes the process behind deploying database refactorings into production. This can prove particularly challenging in a multi-application environment because the changes of several teams must be merged and tested.

Chapter 5, "Database Refactoring Strategies," summarizes some of the "best practices" that we have discovered over the years when it comes to refactoring database schemas. We also float a couple of ideas that we have been meaning to try out but have not yet been able to.

This tradition reflects the fact that Martin's wife is a civil engineer, who at the time the book series started worked on horizontal projects such as bridges and tunnels.
This bridge is the Burlington Bay James N. At this site are three bridges. This bridge system is significant for two reasons. Most importantly, it shows an incremental approach to delivery.
The lift bridge originally bore the traffic through the area, as did another bridge that collapsed after being hit by a ship. The first span of the Skyway, the portion in the front with the metal supports above the roadway, opened to replace the lost bridge. Because the Skyway is a major thoroughfare between Toronto to the north and Niagara Falls to the south, traffic soon exceeded capacity.
The second span, the one without metal supports, opened to support the new load. Incremental delivery makes good economic sense in both civil engineering and software development. The second reason we used this picture is that Scott was raised in Burlington, Ontario; in fact, he was born in Joseph Brant hospital, which is near the northern footing of the Skyway.
Scott took the cover picture with a Nikon D70S.

Acknowledgments

We want to thank the following people for their input into the development of this book: In addition, Pramod wants to thank Irfan Shah, Narayan Raman, Anishek Agarwal, and my other teammates who constantly challenged my opinions and taught me a lot about software development.
I also want to thank Martin for getting me to write, talk, and generally be active outside of ThoughtWorks; Kent Beck for his encouragement; my colleagues at ThoughtWorks who have helped me in numerous ways and make working fun; my parents Jinappa and Shobha who put a lot of effort in raising me; and Praveen, my brother, who since my childhood days has critiqued and improved the way I write.
Chapter 1. Evolutionary Database Development

Waterfalls are wonderful tourist attractions. They are spectacularly bad strategies for organizing software development projects.
Scott Ambler

Modern software processes, also called methodologies, are all evolutionary in nature, requiring you to work both iteratively and incrementally. Working iteratively, you do a little bit of an activity such as modeling, testing, coding, or deployment at a time, and then do another little bit, then another, and so on. This process differs from a serial approach in which you identify all the requirements that you are going to implement, then create a detailed design, then implement to that design, then test, and finally deploy your system.
With an incremental approach, you organize your system into a series of releases rather than one big one.
Furthermore, many of the modern processes are agile, which for the sake of simplicity we will characterize as both evolutionary and highly collaborative in nature.
When a team takes a collaborative approach, they actively strive to find ways to work together effectively; you should even try to ensure that project stakeholders such as business customers are active team members.
Cockburn advises that you should strive to adopt the "hottest" communication technique applicable to your situation: Prefer face-to-face conversation around a whiteboard over a telephone call, prefer a telephone call over sending someone an e-mail, and prefer an e-mail over sending someone a detailed document. The better the communication and collaboration within a software development team, the greater your chance of success. Although both evolutionary and agile ways of working have been readily adopted within the development community, the same cannot be said within the data community.
Most data-oriented techniques are serial in nature, requiring the creation of fairly detailed models before implementation is "allowed" to begin. Worse yet, these models are often baselined and put under change management control to minimize changes.
If you consider the end results, this should really be called a change prevention process. Therein lies the rub: Common database development techniques do not reflect the realities of modern software development processes. It does not have to be this way. Our premise is that data professionals need to adopt the evolutionary techniques similar to those of developers. Although you could argue that developers should return to the "tried-and-true" traditional approaches common within the data community, it is becoming more and more apparent that the traditional ways just do not work well.
The bottom line is that the evolutionary and agile techniques prevalent within the development community work much better than the traditional techniques prevalent within the data community. It is possible for data professionals to adopt evolutionary approaches to all aspects of their work, if they choose to do so. The first step is to rethink the "data culture" of your IT organization to reflect the needs of modern IT project teams.
The Agile Data (AD) method (Ambler) does exactly that, describing a collection of philosophies and roles for modern data-oriented activities. The philosophies reflect how data is one of many important aspects of business software, implying that developers need to become more adept at data techniques and that data professionals need to learn modern development technologies and skills.
The AD method recognizes that each project team is unique and needs to follow a process tailored for their situation. The importance of looking beyond your current project to address enterprise issues is also stressed, as is the need for enterprise professionals such as operational database administrators and data architects to be flexible enough to work with project teams in an agile manner.
The second step is for data professionals, in particular database administrators, to adopt new techniques that enable them to work in an evolutionary manner.
In this chapter, we briefly overview these critical techniques, and in our opinion the most important technique is database refactoring, which is the focus of this book.
The evolutionary database development techniques are as follows:

Database refactoring. Evolve an existing database schema a small bit at a time to improve the quality of its design without changing its semantics.

Evolutionary data modeling. Model the data aspects of a system iteratively and incrementally, just like all other aspects of a system, to ensure that the database schema evolves in step with the application code.

Database regression testing. Ensure that the database schema actually works.

Configuration management of database artifacts. Your data models, database tests, test data, and so on are important project artifacts that should be managed just like any other artifact.

Developer sandboxes. Developers need their own working environments in which they can modify the portion of the system that they are building and get it working before they integrate their work with that of their teammates.
Let's consider each evolutionary database technique in detail.

Database Refactoring

Refactoring (Fowler) is a disciplined way to make small changes to your source code to improve its design, making it easier to work with. A critical aspect of a refactoring is that it retains the behavioral semantics of your code: you neither add nor remove anything when you refactor; you merely improve its quality. An example refactoring would be to rename the getPersons operation to getPeople. To implement this refactoring, you must change the operation definition, and then change every single invocation of this operation throughout your application code. A refactoring is not complete until your code runs again as before.

Similarly, a database refactoring is a simple change to a database schema that improves its design while retaining both its behavioral and informational semantics. You could refactor either structural aspects of your database schema, such as table and view definitions, or functional aspects, such as stored procedures and triggers. When you refactor your database schema, not only must you rework the schema itself, but also the external systems, such as business applications or data extracts, that are coupled to your schema. Database refactorings are clearly more difficult to implement than code refactorings; therefore, you need to be careful. Database refactoring is described in detail in Chapter 2, and the process of performing a database refactoring in Chapter 3.

Evolutionary Data Modeling

Regardless of what you may have heard, evolutionary and agile techniques are not simply "code and fix" with a new name. You still need to explore requirements and to think through your architecture and design before you build it, and one good way of doing so is to model before you code.
Figure 1. With AMDD, you create initial, high-level models at the beginning of a project, models that overview the scope of the problem domain that you are addressing as well as a potential architecture to build to.
The amount of detail shown in this example is all that you need at the beginning of a project; your goal is to think through major issues early in your project without investing in needless details right away. You can work through the details later on a just-in-time (JIT) basis.
Details are captured within your object model (which could be your source code) and your physical data model (PDM). These models are guided by your conceptual domain model and are developed in parallel along with other artifacts to ensure consistency.
If "cycle 0" was one week in length, a period of time typical for projects of less than one year, and development cycles are two weeks in length, this is the PDM that exists at the end of the seventh week on the project. The PDM reflects the data requirements, and any legacy constraints, of the project up until this point.
The data requirements for future development cycles are modeled during those cycles on a JIT basis. You need to take legacy data constraints into account, and as we all know, legacy data sources are often nasty beasts that will maim an unwary software development project. Luckily, good data professionals understand the nuances of their organization's data sources, and this expertise can be applied on a JIT basis as easily as it could on a serial basis.
You still need to apply intelligent data modeling conventions, just as Agile Modeling's Apply Modeling Standards practice suggests.
Database Regression Testing

To safely change existing software, either to refactor it or to add new functionality, you need to be able to verify that you have not broken anything after you have made the change.
In other words, you need to be able to run a full regression test on your system. If you discover that you have broken something, you must either fix it or roll back your changes. Within the development community, it has become increasingly common for programmers to develop a full unit test suite in parallel with their domain code, and in fact agilists prefer to write their test code before they write their "real" code. Just like you test your application source code, shouldn't you also test your database?
Important business logic is implemented within your database in the form of stored procedures, data validation rules, and referential integrity (RI) rules, business logic that clearly should be tested thoroughly.

Test-First Development (TFD), also known as Test-First Programming, is an evolutionary approach to development; you must first write a test that fails before you write new functional code. The steps of TFD, depicted as a UML activity diagram in the figure "A test-first approach to development," are as follows:

1. Quickly add a test, basically just enough code so that your tests now fail.
2. Run your tests (often the complete test suite, although for the sake of speed you may decide to run only a subset) to ensure that the new test does in fact fail.
3. Update your functional code so that it passes the new test.
4. Run your tests again. If the tests fail, return to Step 3; otherwise, start over again.

The primary advantages of TFD are that it forces you to think through new functionality before you implement it (you're effectively doing detailed design), it ensures that you have testing code available to validate your work, and it gives you the courage to know that you can evolve your system because you know that you can detect whether you have "broken" anything as the result of the change.
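The test-first cycle described above can be sketched against a database schema. This is a minimal illustration using Python's built-in sqlite3 module, not the authors' tooling; the helper name account_balance_column_exists and the column types are assumptions made for the example.

```python
import sqlite3


def account_balance_column_exists(conn):
    """Regression test: the Account table must expose a Balance column."""
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    columns = [row[1] for row in conn.execute("PRAGMA table_info(Account)")]
    return "Balance" in columns


# Hypothetical starting schema: Account has no Balance column yet.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Account (AccountID INTEGER PRIMARY KEY, CustomerID INTEGER)"
)

# Add the test and run it; it should fail because the column is missing.
failed_first = not account_balance_column_exists(conn)

# Make the smallest schema change that satisfies the test.
conn.execute("ALTER TABLE Account ADD COLUMN Balance NUMERIC NOT NULL DEFAULT 0")

# Rerun the test; it should now pass.
passed_after = account_balance_column_exists(conn)
```

In a real project the check would live in a unit-testing framework and run as part of the full regression suite; the point here is only the order of events: failing test first, schema change second.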
Just as having a full regression test suite for your application source code enables code refactoring, having a full regression test suite for your database enables database refactoring (Meszaros). You first write your code taking a TFD approach; then, after it is working, you ensure that your design remains of high quality by refactoring it as needed. As you refactor, you must rerun your regression tests to verify that you have not broken anything. An important implication is that you will likely need several unit testing tools, at least one for your database and one for each programming language used in external programs.

Configuration Management of Database Artifacts

Sometimes a change to your system proves to be a bad idea and you need to roll back that change to the previous state.
For example, renaming the Customer.FName column to Customer.FirstName might break 50 external programs, and the cost to update those programs may prove to be too great for now. To enable database refactoring, you need to put your database artifacts, such as data models, database tests, and test data, under configuration management control.

(Figure: Logical sandboxes to provide developers with safety.)
To successfully refactor your database schema, developers need to have their own physical sandboxes to work in, a copy of the source code to evolve, and a copy of the database to work with and evolve. By having their own environment, they can safely make changes, test them, and either adopt or back out of them. When they are satisfied that a database refactoring is viable, they promote it into their shared project environment, test it, and put it under change management control so that the rest of the team gets it.
This promotion often occurs once a development cycle, but could occur more or less often depending on your environment. The more often you promote your system, the greater the chance of receiving valuable feedback. Finally, after your system passes acceptance and system testing, it will be deployed into production.
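The sandbox idea described above can be illustrated with a small sketch: each developer works on a private copy of the database, so an experiment cannot disturb the shared environment. It assumes SQLite and Python's sqlite3 module (Connection.backup requires Python 3.7 or later); the function name create_sandbox and the sample row are hypothetical.

```python
import sqlite3


def create_sandbox(shared):
    """Copy the shared database into a private in-memory sandbox."""
    sandbox = sqlite3.connect(":memory:")
    shared.backup(sandbox)  # copies schema and data into the sandbox
    return sandbox


def column_names(conn, table):
    """Return the column names of a table, in definition order."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


# Stand-in for the shared project-integration database.
shared = sqlite3.connect(":memory:")
shared.execute("CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, FName TEXT)")
shared.execute("INSERT INTO Customer VALUES (1, 'Ada')")
shared.commit()

# Try a schema change in the sandbox only.
sandbox = create_sandbox(shared)
sandbox.execute("ALTER TABLE Customer ADD COLUMN FirstName TEXT")
sandbox.execute("UPDATE Customer SET FirstName = FName")
sandbox.commit()

sandbox_columns = column_names(sandbox, "Customer")
shared_columns = column_names(shared, "Customer")
```

If the experiment proves viable, the same change script would be promoted to the shared environment; if not, the sandbox is simply discarded.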
Impediments to Evolutionary Database Development Techniques

We would be remiss if we did not discuss the common impediments to adopting the techniques described in this book. The first impediment, and the hardest one to overcome, is cultural. Many of today's data professionals began their careers when "code-and-fix" approaches to development were common.
The IT community recognized that this approach resulted in low-quality, difficult-to-maintain code and adopted the heavy, structured development techniques that many still follow today.
Because of these experiences, the majority of data professionals believed that the evolutionary techniques introduced by the object technology revolution were just a rehash of the earlier code-and-fix approaches; to be fair, many object practitioners did in fact choose to work that way.
They have chosen to equate evolutionary approaches with low quality; but as the agile community has shown, this does not have to be the case. The end result is that the majority of data-oriented literature appears to be mired in the traditional, serial thought processes of the past and has mostly missed agile approaches. The data community has a lot of catching up to do, and that is going to take time.
The second impediment is a lack of tooling, although open source efforts (at least within the Java community) are quickly filling in the gaps. Just as it took several years for programming tool vendors to implement refactoring functionality within their tools (in fact, you would now be hard pressed to find a modern integrated development environment (IDE) that does not offer such features), it will take several years for database tool vendors to do the same. Clearly, a need exists for usable, flexible tools that enable evolutionary development of a database schema; the open source community is starting to fill that gap, and we suspect that the commercial tool vendors will eventually do the same.

What You Have Learned

Evolutionary approaches to development that are iterative and incremental in nature are the de facto standard for modern software development.
When a project team decides to take this approach to development, everyone on that team must work in an evolutionary manner, including the data professionals.
Luckily, evolutionary techniques exist that enable data professionals to work in an evolutionary manner. These techniques include database refactoring, evolutionary data modeling, database regression testing, configuration management of data-oriented artifacts, and separate developer sandboxes.

Chapter 2. Database Refactoring

As soon as one freezes a design, it becomes obsolete.
Fred Brooks

This chapter overviews the fundamental concepts behind database refactoring, explaining what it is, how it fits into your development efforts, and why it is often hard to do successfully. In the following chapters, we describe in detail the actual process of refactoring your database schema.

Code Refactoring

In Refactoring, Martin Fowler describes the programming technique called refactoring, which is a disciplined way to restructure code in small steps.
Refactoring enables you to evolve your code slowly over time, to take an evolutionary (iterative and incremental) approach to programming. A critical aspect of a refactoring is that it retains the behavioral semantics of your code. You do not add functionality when you are refactoring, nor do you take it away. A refactoring merely improves the design of your code, nothing more and nothing less. For example, consider pushing an operation down from Offering into its subclass Invoice. This change looks easy on the surface, but you may also need to change the code that invokes this operation to work with Invoice objects rather than Offering objects. After you have made these changes, you can say you have truly refactored your code because it works again as before.

Figure 2. Pushing a method down into a subclass.

Clearly, you need a systematic way to refactor your code, including good tools and techniques to do so. Most modern integrated development environments (IDEs) now support code refactoring to some extent, which is a good start. However, to make refactoring work in practice, you also need to develop an up-to-date regression-testing suite that validates that your code still works; you will not have the confidence to refactor your code if you cannot be reasonably assured that you have not broken it.

Many agile developers, and in particular Extreme Programmers (XPers), consider refactoring to be a primary development practice. It is just as common to refactor a bit of code as it is to introduce an if statement or a loop.
You should refactor your code mercilessly because you are most productive when you are working on high-quality source code.
When you have a new feature to add to your code, the first question that you should ask is "Is this code the best design possible that enables me to add this feature?" If the answer is no, first refactor your code to make it the best design possible, and then add the feature. On the surface, this sounds like a lot of work; in practice, however, if you start with high-quality source code, and then refactor it to keep it so, you will find that this approach works incredibly well.
Database Refactoring

A database refactoring (Ambler) is a simple change to a database schema that improves its design while retaining both its behavioral and informational semantics; in other words, you cannot add new functionality or break existing functionality, nor can you add new data or change the meaning of existing data. From our point of view, a database schema includes both structural aspects, such as table and view definitions, and functional aspects, such as stored procedures and triggers.
From this point forward, we use the terms code refactoring to refer to traditional refactoring as described by Martin Fowler and database refactoring to refer to the refactoring of database schemas. The process of database refactoring, described in detail in Chapter 3, is the act of making these simple changes to your database schema. Database refactorings are conceptually more difficult than code refactorings: Code refactorings only need to maintain behavioral semantics, whereas database refactorings must also maintain informational semantics.
Worse yet, database refactorings can become more complicated by the amount of coupling resulting from your database architecture, overviewed in Figure 2. Coupling is a measure of the dependence between two items; the more highly coupled two things are, the greater the chance that a change in one will require a change in another.

The single-application database architecture is the simplest situation: your application is the only one interacting with your database, enabling you to refactor both in parallel and deploy both simultaneously. These situations do exist and are often referred to as standalone applications or stovepipe systems. The second architecture is much more complicated because you have many external programs interacting with your database, some of which are beyond the scope of your control. In this situation, you cannot assume that all the external programs will be deployed at once, and must therefore support a transition period (also referred to as a deprecation period) during which both the old schema and the new schema are supported in parallel. More on this later.

Figure: The two categories of database architecture.

Don't worry. In Chapter 3, we describe strategies for working in this sort of situation.

To put database refactoring into context, let's step through a quick example. You have been working on a banking application for a few weeks and have noticed something strange about the Customer and Account tables depicted in Figure 2. Does it really make sense that the Balance column be part of the Customer table? No, so let's apply the Move Column refactoring to improve our database design.

Figure: The initial database schema for Customer and Account.

Single-Application Database Environments

Let's start by working through an example of moving a column from one table to another within a single-application database environment.
This is the simplest situation that you will ever be in, because you have complete control over both the database schema and the application source code that accesses it. The implication is that you can refactor both your database schema and your application code simultaneously; you do not need to support both the original and new database schemas in parallel because only the one application accesses your database.
In this scenario, we suggest that two people work together as a pair; one person should have application programming skills, and the other database development skills, and ideally both people have both sets of skills. This pair begins by determining whether the database schema needs to be refactored. Perhaps the programmer is mistaken about the need to evolve the schema, and how best to go about the refactoring.
The refactoring is first developed and tested within the developer's sandbox. When it is finished, the changes are promoted into the project-integration environment, and the system is rebuilt, tested, and fixed as needed. To apply the Move Column refactoring in the development sandbox, the pair first runs all the tests to see that they pass. Next, they write a new test, because they are working test-first; a likely test is to access a value in the Account.Balance column. After running the tests and seeing them fail, they introduce the Account.Balance column, as you see in Figure 2. They rerun the tests and see that the tests now pass. They then refactor the existing tests, which verify that customer deposits work properly, to use the Account.Balance column rather than the Customer.Balance column. They see that these tests fail, and therefore rework the deposit functionality to work with Account.Balance. They make similar changes to other code within the test suite and the application, such as withdrawal logic, that currently works with Customer.Balance.

Figure: The final database schema for Customer and Account.

They then back up the data in Customer.Balance, for safety purposes, and copy the data from Customer.Balance into the appropriate row of Account.Balance. They rerun their tests to verify that the data migration has safely occurred. To complete the schema changes, the final step is to drop the Customer.Balance column and then rerun all tests and fix anything as necessary.
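The schema side of the single-application Move Column walk-through above might look roughly like the following sketch. It uses SQLite through Python's sqlite3 module purely for illustration; the Name column, the sample rows, the backup table name CustomerBalanceBackup, and the one-account-per-customer simplification are all assumptions, not details from the book. The column is dropped by rebuilding the table, which works on SQLite versions that lack ALTER TABLE ... DROP COLUMN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, Name TEXT, Balance NUMERIC);
    CREATE TABLE Account  (AccountID INTEGER PRIMARY KEY, CustomerID INTEGER);
    INSERT INTO Customer VALUES (1, 'Ada', 100), (2, 'Grace', 250);
    INSERT INTO Account  VALUES (10, 1), (20, 2);
""")


def move_balance_to_account(conn):
    """Move Customer.Balance to Account.Balance (one account per customer assumed)."""
    # 1. Introduce the new column.
    conn.execute("ALTER TABLE Account ADD COLUMN Balance NUMERIC")
    # 2. Back up the original data for safety.
    conn.execute(
        "CREATE TABLE CustomerBalanceBackup AS "
        "SELECT CustomerID, Balance FROM Customer"
    )
    # 3. Copy each balance into the matching Account row.
    conn.execute("""
        UPDATE Account
        SET Balance = (SELECT c.Balance FROM Customer c
                       WHERE c.CustomerID = Account.CustomerID)
    """)
    # 4. Drop Customer.Balance by rebuilding the table without it.
    conn.executescript("""
        CREATE TABLE CustomerNew (CustomerID INTEGER PRIMARY KEY, Name TEXT);
        INSERT INTO CustomerNew SELECT CustomerID, Name FROM Customer;
        DROP TABLE Customer;
        ALTER TABLE CustomerNew RENAME TO Customer;
    """)
    conn.commit()


move_balance_to_account(conn)
balances = conn.execute(
    "SELECT AccountID, Balance FROM Account ORDER BY AccountID"
).fetchall()
customer_columns = [row[1] for row in conn.execute("PRAGMA table_info(Customer)")]
```

The regression tests would be rerun between these steps, as the text describes; the sketch compresses the schema and data-migration work into one function only to keep it short.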
When they finish doing so, they promote their changes into the project-integration environment as described earlier.

Multi-Application Database Environments

This situation is more difficult because the individual applications have new releases deployed at different times over the next year and a half. To implement this database refactoring, you do the same sort of work that you did for the single-application database environment, except that you do not delete the Customer.Balance column right away. Instead, you run both columns in parallel during a "transition period" that spans at least that year and a half. This portion of the database schema during the transition period is shown in Figure 2. Notice how there are two triggers, SynchronizeCustomerBalance and SynchronizeAccountBalance, which are run in production during the transition period to keep the two columns in sync.
Figure: The database schema during the transition period.
The transition period is this long because some applications are not currently being worked on, whereas others follow a traditional development life cycle and release only every year or so; your transition period must take into account the slow teams as well as the fast ones.
Furthermore, because you cannot count on the individual applications to update both columns, you need to provide a mechanism such as triggers to keep their values synchronized. There are other options to do this, such as views or synchronization after the fact, but as we discuss in Chapter 5, "Database Refactoring Strategies," we find that triggers work best. After the transition period, you remove the original column plus the trigger(s), resulting in the final database schema of Figure 2.
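A pair of synchronization triggers like the ones named above might look like the following sketch. It uses SQLite through Python's sqlite3 module; the trigger bodies, the table layouts, the choice of which trigger name attaches to which table, and the one-account-per-customer simplification are assumptions made for illustration, and a production database would use its own trigger dialect. The `IS NOT` guard makes each trigger a no-op when the two columns already agree, so the triggers do not keep firing each other.

```python
import sqlite3

# Hypothetical transition-period schema: both Balance columns exist.
TRANSITION_SCHEMA = """
CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, Balance NUMERIC);
CREATE TABLE Account  (AccountID INTEGER PRIMARY KEY, CustomerID INTEGER, Balance NUMERIC);

-- Keeps the new column current when an unrefactored application
-- still writes to Customer.Balance.
CREATE TRIGGER SynchronizeAccountBalance
AFTER UPDATE OF Balance ON Customer
BEGIN
    UPDATE Account SET Balance = NEW.Balance
     WHERE CustomerID = NEW.CustomerID AND Balance IS NOT NEW.Balance;
END;

-- Keeps the old column current when a refactored application
-- writes to Account.Balance.
CREATE TRIGGER SynchronizeCustomerBalance
AFTER UPDATE OF Balance ON Account
BEGIN
    UPDATE Customer SET Balance = NEW.Balance
     WHERE CustomerID = NEW.CustomerID AND Balance IS NOT NEW.Balance;
END;
"""


def open_transition_database() -> sqlite3.Connection:
    """Create an in-memory database with one customer and one account."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(TRANSITION_SCHEMA)
    conn.execute("INSERT INTO Customer VALUES (1, 100)")
    conn.execute("INSERT INTO Account VALUES (10, 1, 100)")
    conn.commit()
    return conn
```

With this in place, an update to either column is mirrored in the other, which is what lets old and new applications coexist during the transition period.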
You remove these things only after sufficient testing to ensure that it is safe to do so. At this point, your refactoring is complete. In Chapter 3, we work through implementing this example in detail.

Maintaining Semantics

When you refactor a database schema, you must maintain both the informational and behavioral semantics: you should neither add anything nor take anything away.
Informational semantics refers to the meaning of the information within the database from the point of view of the users of that information. Preserving the informational semantics implies that if you change the values of the data stored in a column, the clients of that information should not be affected by the change. For example, suppose you apply the Introduce Common Format database refactoring to a character-based phone number column to transform differently formatted values into one consistent format. Although the format has been improved, requiring simpler code to work with the data, from a practical point of view the true information content has not changed.
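As a rough illustration of Introduce Common Format on a phone number column, the sketch below reduces each value to its digits. The function names and the sample numbers used with it are hypothetical, and digits-only is just one possible target format.

```python
import re


def to_common_format(phone: str) -> str:
    """Reduce a phone number to its digits only (the assumed common format)."""
    return re.sub(r"\D", "", phone)


def same_information(before: str, after: str) -> bool:
    """Treat informational semantics as preserved when the digits are unchanged."""
    return to_common_format(before) == to_common_format(after)
```

For example, `to_common_format("(416) 555-1234")` yields `"4165551234"`: the representation changes while the digits, which carry the information, do not.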
Focusing on practicality is a critical issue when it comes to database refactoring. Martin Fowler likes to talk about the issue of "observable behavior" when it comes to code refactoring, his point being that with many refactorings you cannot be completely sure that you have not changed the semantics in some small way, that all you can hope for is to think it through as best you can, to write what you believe to be sufficient tests, and then run those tests to verify that the semantics have not changed.
In our experience, a similar issue exists when it comes to preserving informational semantics when refactoring a database schema: changing a value from one format to another may in fact have changed the semantics of that information for an application in some slightly nuanced way that we do not know about. When such a problem is eventually discovered, the affected code, for example a report that depends on the old format, may need to be updated to reflect the new format.
Similarly, with respect to behavioral semantics, the goal is to keep the black-box functionality the same: any source code that works with the changed aspects of your database schema must be reworked to accomplish the same functionality as before. For example, if you apply Introduce Calculation Method, you may want to rework other existing stored procedures to invoke that method rather than implement the same logic for that calculation.
Overall, your database still implements the same logic, but now the calculation logic is just in one place. It is important to recognize that database refactorings are a subset of database transformations. A database transformation may or may not change the semantics; a database refactoring does not. We describe several common database transformations in Chapter 11, "Non-Refactoring Transitions," because they are not only important to understand, they can often be a step within a database refactoring.
For example, when you applied Move Column earlier to move the Balance column from Customer to Account, you needed to apply the Introduce Column transformation as one of the steps.
On the surface, Introduce Column sounds like a perfectly fine refactoring; adding an empty column to a table does not change the semantics of that table until new functionality begins to use it. We still consider it a transformation but not a refactoring because it could inadvertently change the behavior of an application. For example, if we introduce the column in the middle of the table, any program logic using positional access (for example, code that refers to column 17 rather than the column's name) will break.
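The positional-access hazard can be reproduced with a small sketch. The `Customer` layout, the `Status` column, and the function names below are hypothetical; the point is only that code reading "the third column" silently returns the wrong value once a column is introduced before it, while code that names the column is unaffected.

```python
import sqlite3


def make_customer_table(add_middle_column: bool) -> sqlite3.Connection:
    """Build a one-row Customer table, optionally with a new middle column."""
    conn = sqlite3.connect(":memory:")
    if add_middle_column:
        conn.execute(
            "CREATE TABLE Customer (CustomerID INTEGER, Name TEXT, Status TEXT, Balance NUMERIC)"
        )
        conn.execute("INSERT INTO Customer VALUES (1, 'Ann', 'active', 100)")
    else:
        conn.execute("CREATE TABLE Customer (CustomerID INTEGER, Name TEXT, Balance NUMERIC)")
        conn.execute("INSERT INTO Customer VALUES (1, 'Ann', 100)")
    return conn


def balance_by_position(conn: sqlite3.Connection):
    """Fragile: assumes Balance is always the third column of SELECT *."""
    return conn.execute("SELECT * FROM Customer").fetchone()[2]


def balance_by_name(conn: sqlite3.Connection):
    """Robust: asks for the column by name."""
    return conn.execute("SELECT Balance FROM Customer").fetchone()[0]
```

Against the original table both functions return the balance; against the table with the extra column, `balance_by_position` returns the status string instead.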
Furthermore, COBOL code bound to a DB2 table will break if it is not rebound to the new schema, even if the column is added at the end of the table. In the end, practicality should be your guide.
If we were to label Introduce Column as a refactoring, or as a "Yabba Dabba Do" for that matter, would it affect the way that you use it? We hope not. We are often told by existing data professionals that the real solution is to model everything up front, and then you would not need to refactor your database schema. Although that is an interesting vision, and we have seen it work in a few situations, experience from the past three decades has shown that this approach does not seem to be working well in practice for the overall IT community.
The traditional approach to data modeling does not reflect the evolutionary approach of modern methods such as the RUP and XP, nor does it reflect the fact that business customers are demanding new features and changes to existing functionality at an accelerating rate.
The old ways are simply no longer sufficient. As discussed in Chapter 1, "Evolutionary Database Development," we suggest that you take an Agile Model-Driven Development (AMDD) approach, in which you do some high-level modeling to identify the overall "landscape" of your system, and then model storm the details on a just-in-time (JIT) basis.
Take advantage of the benefits of modeling without suffering from the costs of overmodeling, overdocumentation, and the resulting bureaucracy of trying to keep too many artifacts up-to-date and synchronized with one another.
Your application code and your database schema evolve as your understanding of the problem domain evolves, and you maintain quality through refactoring both.

Categories of Database Refactorings

We also distinguish six different categories of database refactorings, as described in Table 2. This categorization strategy was introduced to help organize this book, and hopefully to help organize future database refactoring tools.
Our categorization strategy is not perfect; for example, the Replace Method With View refactoring arguably fits into both the Architectural and Method categories.
We have put it into the Architectural category.

Table 2.
|Category|Description|Example|
|Structural| |Moving a column from one table to another or splitting a multipurpose column into several separate columns, one for each purpose.|
|Data Quality (Chapter 7)|A change that improves the quality of the information contained within a database.|Making a column non-nullable to ensure that it always contains a value or applying a common format to a column to ensure consistency.|
|Referential Integrity| |Adding a trigger to enable a cascading delete between two entities, code that was formerly implemented outside of the database.|
|Architectural (Chapter 9)|A change that improves the overall manner in which external programs interact with a database.|Replacing an existing Java operation in a shared code library with a stored procedure in the database. Having it as a stored procedure makes it available to non-Java applications.|
|Method (Chapter 10)|A change to a method (a stored procedure, stored function, or trigger) that improves its quality. Many code refactorings are applicable to database methods.|Renaming a stored procedure to make it easier to understand.|
|Non-Refactoring Transformation (Chapter 11)|A change to your database schema that changes its semantics.|Adding a new column to an existing table.|
Database Smells

Fowler introduced the concept of a "code smell," a common category of problem in your code that indicates the need to refactor it. Common code smells include switch statements, long methods, duplicated code, and feature envy.
Similarly, there are common database smells that indicate the potential need to refactor it (Ambler). These smells include the following:

Multipurpose column. If a column is being used for several purposes, it is likely that extra code exists to ensure that the source data is being used the "right way," often by checking the values of one or more other columns.
An example is a column used to store either someone's birth date if he or she is a customer or the start date if that person is an employee. Worse yet, you are likely constrained in the functionality that you can now support; for example, how would you store the birth date of an employee?
Multipurpose table. Similarly, when a table is being used to store several types of entities, there is likely a design flaw. An example is a generic Customer table that is used to store information about both people and corporations. The problem with this approach is that data structures for people and corporations differ: people have a first, middle, and last name, for example, whereas a corporation simply has a legal name. A generic Customer table would have columns that are NULL for some kinds of customers but not others.

Redundant data. Redundant data is a serious problem in operational databases because when data is stored in several places, the opportunity for inconsistency occurs. For example, it is quite common to discover that customer information is stored in many different places within your organization. In fact, many companies are unable to put together an accurate list of who their customers actually are. The problem is that the same customer can appear with different addresses in different tables. Suppose this is actually one person, John, who used to live on Main Street but moved last year; unfortunately, John did not submit two change-of-address forms to your company, one for each application that knows about him.

Tables with too many columns. When a table has many columns, it is indicative that the table lacks cohesion, that is, that it is trying to store data from several entities. Perhaps your Customer table contains columns to store three different addresses (shipping, billing, seasonal) or several phone numbers (home, work, cell, and so on). You likely need to normalize this structure by adding Address and PhoneNumber tables.

Tables with too many rows. Large tables are indicative of performance problems. For example, it is time-consuming to search a table with millions of rows. You may want to split the table vertically by moving some columns into another table, or split it horizontally by moving some rows into another table. Both strategies reduce the size of the table, potentially improving performance.

Smart columns. A smart column is one in which different positions within the data represent different concepts. For example, if the first four digits of the client ID indicate the client's home branch, then client ID is a smart column because you can parse it to discover more granular information (for example, home branch ID). Another example includes a text column used to store XML data structures; clearly, you can parse the XML data structure for smaller data fields. Smart columns often need to be reorganized into their constituent data fields at some point so that the database can easily deal with them as separate elements.

Fear of change. If you are afraid to change your database schema because you are afraid to break something (for example, the 50 applications that access it), that is the surest sign that you need to refactor your schema.
Fear of change is a good indication that you have a serious technical risk on your hands, one that will only get worse over time.

It is important to understand that just because something smells, it does not mean that it is bad: Limburger cheese smells even when it is perfectly fine.
However, when milk smells bad, you know that you have a problem. If something smells, look at it, think about it, and refactor it if it makes sense.
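To make the smart-column smell listed above concrete, here is a small sketch of splitting a client ID whose first four digits encode the home branch into its constituent fields. The `ClientId` type, the field names, and the all-digits assumption are hypothetical; a real refactoring would store the parts in separate columns rather than parse them on every read.

```python
from typing import NamedTuple


class ClientId(NamedTuple):
    home_branch: str      # first four digits of the smart column
    client_number: str    # the remaining digits


def split_client_id(client_id: str) -> ClientId:
    """Split a smart client ID into its home-branch and client-number parts."""
    if len(client_id) <= 4 or not client_id.isdigit():
        raise ValueError(f"not a valid client ID: {client_id!r}")
    return ClientId(home_branch=client_id[:4], client_number=client_id[4:])
```

The parts are kept as strings so that leading zeros in either field survive the split.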
Craig Larman summarizes the research evidence, as well as the overwhelming support among the thought leaders within the IT community, in support of evolutionary approaches.
Unfortunately, most data-oriented techniques are serial in nature, relying on specialists performing relatively narrow tasks, such as logical data modeling or physical data modeling.
Therein lies the rub: the two groups need to work together, but each wants to do so in a different manner. Our position is that data professionals can benefit from adopting modern evolutionary techniques similar to those of developers, and that database refactoring is one of several important skills that data professionals require. Unfortunately, the data community missed the object revolution, which means they missed out on opportunities to learn the evolutionary techniques that application programmers now take for granted.
In many ways, the data community is also missing out on the agile revolution, which is taking evolutionary development one step further to make it highly collaborative and cooperative. Database refactoring is a database implementation technique, just like code refactoring is an application implementation technique. You refactor your database schema to ease additions to it. You often find that you have to add a new feature to a database, such as a new column or stored procedure, but the existing design is not the best one possible to easily support that new feature.
You start by refactoring your database schema to make it easier to add the feature, and after the refactoring has been successfully applied, you then add the feature.
The advantage of this approach is that you are slowly, but constantly, improving the quality of your database design. This process not only makes your database easier to understand and use, it also makes it easier to evolve over time; in other words, you improve your overall development productivity. Notice how all the arrows are bidirectional. You iterate back and forth between activities as needed.
Also notice how there is neither a defined starting point nor a defined ending point; this clearly is not a traditional, serial process.
Figure: Potential development activities on an evolutionary development project.
Database refactoring is only part of the evolutionary database development picture. You still need to test your database schema and put it under configuration management control. And, you still need to tune it appropriately. These are topics better left to other books. This is true of code refactoring, and it is certainly true of database refactoring.
Our experience is that coupling becomes a serious issue when you start to consider behavioral issues (for example, code), something that many database books choose not to address.
The easiest scenario is clearly the single-application database because your database schema will only be coupled to itself and to your application. With the multi-application database architecture depicted in Figure 2.
Figure: Databases are highly coupled to external programs.
An effective way to decrease the coupling that your database is involved with is to encapsulate access to it. You do this by having external programs access your database via persistence layers, as depicted in Figure 2.
A persistence layer can be implemented in several ways: via data access objects (DAOs), which implement the necessary SQL code; by frameworks; via stored procedures; or even via Web services. As you see in the diagram, you can never get the coupling down to zero, but you can definitely reduce it to something manageable.
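One way to picture the DAO option is the sketch below, in which application code calls methods on a data access object instead of embedding SQL. The `AccountDao` class, its method names, and the `Account` table layout are hypothetical. The design point is that when a column moves, only the SQL inside the DAO has to change, not every caller.

```python
import sqlite3


class AccountDao:
    """Hypothetical data access object that hides the Account table's SQL."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def get_balance(self, account_id: int):
        """Return the balance for one account, or raise if it does not exist."""
        row = self._conn.execute(
            "SELECT Balance FROM Account WHERE AccountID = ?", (account_id,)
        ).fetchone()
        if row is None:
            raise LookupError(f"no account {account_id}")
        return row[0]

    def deposit(self, account_id: int, amount) -> None:
        """Add a positive amount to the account's balance."""
        if amount <= 0:
            raise ValueError("deposit amount must be positive")
        cur = self._conn.execute(
            "UPDATE Account SET Balance = Balance + ? WHERE AccountID = ?",
            (amount, account_id),
        )
        if cur.rowcount != 1:
            raise LookupError(f"no account {account_id}")
        self._conn.commit()
```

Callers would write `dao.deposit(10, 50)` and never mention table or column names, which is the reduction in coupling the text describes.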
Figure: Reducing coupling via encapsulating access.

What You Have Learned

Code refactoring is a disciplined way to restructure code in small, evolutionary steps to improve the quality of its design. A code refactoring retains the behavioral semantics of your code; it neither adds functionality nor takes functionality away.
Database refactoring is one of the core techniques that enable data professionals to take an evolutionary approach to database development. The greater the coupling that your database is involved with, the harder it will be to refactor.

Chapter 3. The Process of Database Refactoring

"A new scientific truth does not triumph by convincing its opponents and making them see the light, but rather because its opponents eventually die, and a new generation grows up that is familiar with it."
Max Planck

This chapter describes how to implement a single refactoring within your database. We work through an example of applying Move Column, a structural refactoring. Although this seems like a simple refactoring, and it is, you will see it can be quite complex to safely implement it within a production environment. The example moves the Customer.Balance column to the Account table, a straightforward change to improve the database design, as depicted in Figure 3.
Figure: Moving the Customer.Balance column to Account.
In Chapter 1, "Evolutionary Database Development," we overviewed the concept of logical working sandboxes: development sandboxes in which developers have their own copy of the source code and database to work with; a project-integration environment where team members promote and then test their changes; preproduction environments for system, integration, and user acceptance testing; and production.
The hard work of database refactoring is done within your development sandbox: it is considered, implemented, and tested before it is promoted into other environments. The focus of this chapter is on the work that is performed within your development sandbox.
Chapter 4, "Deploying into Production," covers the promotion and eventual deployment of your refactorings. Because we are describing what occurs within your development sandbox, this process applies to both the single-application database as well as the multi-application database environments.
The only real difference between the two situations is the need for a longer transition period (more on this later) in the multi-application scenario. The process begins with a developer who is trying to implement a new requirement or to fix a defect. The developer realizes that the database schema may need to be refactored. In this example, Eddy, a developer, is adding a new type of financial transaction to his application and realizes that the Balance column actually describes Account entities, not Customer entities.
Eddy works with Beverley, the team's DBA. Together they iteratively work through the following activities:

Figure: The database refactoring process.
- Choose the most appropriate database refactoring.
- Deprecate the original database schema.
- Test before, during, and after.
- Modify the database schema.
- Migrate the source data.
- Modify external access program(s).
- Run regression tests.
- Version control your work.
- Announce the refactoring.

In deciding whether a refactoring is appropriate, there are three issues to consider: Does the refactoring make sense?
Perhaps the existing table structure is correct. It is common for developers to either disagree with, or to simply misunderstand, the existing design of a database.
This misunderstanding could lead them to believe that the design needs to change when it really does not. The DBA should have a good knowledge of the project team's database, other corporate databases, and will know whom to contact about issues such as this. Therefore, they will be in a better position to determine whether the existing schema is the best one. Furthermore, the DBA often understands the bigger picture of the overall enterprise, providing important insight that may not be apparent when you look at it from the point of view of the single project.
However, in our example, it appears that the schema needs to change. Is the change actually needed now? This is usually a "gut call" based on her previous experience with the application developer.
Does Eddy have a good reason for making the schema change? Can Eddy explain the business requirement that the change supports? Does the requirement feel right? Has Eddy suggested good changes in the past?
Has Eddy previously changed his mind several days later, requiring Beverley to back out of the change? Depending on this assessment, Beverley may suggest that Eddy think the change through some more, or she may decide to continue working with him but wait for a longer period of time before they actually apply the change in the project-integration environment (Chapter 4) if they believe the change will need to be reversed.
Is it worth the effort? The next thing that Beverley does is to assess the overall impact of the refactoring.
To do this, Beverley should have an understanding of how the external program(s) are coupled to this part of the database. This is knowledge that Beverley has built up over time by working with the enterprise architects, operational database administrators, application developers, and other DBAs.
When Beverley is not sure of the impact, she needs to make a decision at the time and go with her gut feeling or decide to advise the application developer to wait while she talks to the right people.
Her goal is to ensure that she implements database refactorings that will succeed; if you are going to need to update, test, and redeploy 50 other applications to support this refactoring, it may not be viable for her to continue.
Even when there is only one application accessing the database, it may be so highly coupled to the portion of the schema that you want to change that the database refactoring simply is not worth it. In our example, the design problem is so clearly severe that she decides to implement it even though many applications will be affected.

Take Small Steps

Database refactoring changes the schema in small steps; each refactoring should be made one at a time. For example, assume you realize that you need to move an existing column, rename it, and apply a common format to it.
Instead of trying this all at once, you should successfully implement Move Column, then successfully implement Rename Column, and then apply Introduce Common Format, one step at a time. The advantage is that if you make a mistake, it is easy to find the bug because it will likely be in the part of the schema that you just changed.

Choose the Most Appropriate Database Refactoring

As you can see in this book, you could potentially apply a large number of refactorings to your database schema.
To determine which is the most appropriate refactoring for your situation, you must first analyze and understand the problem you face. When Eddy first approached Beverley, he may or may not have done this analysis. For example, he may have just gone to her and said that the Account table needs to store the current balance, and therefore that a new column should be added via the Introduce Column transformation. However, what he did not realize was that the column already exists in the Customer table, which is arguably the wrong place for it to be. Eddy had identified the problem correctly, but had misidentified the solution.
Based on her knowledge of the existing database schema, and her understanding of the problem identified by Eddy, Beverley instead suggests that they apply the Move Column refactoring.

Sometimes the Data Is Elsewhere

Your database is likely not the only source of data within your organization.
A good DBA should at least know about, if not understand, the various data sources within your enterprise to determine the best source of data. In our example, another database could potentially be the official repository of Account information. If that is the case, moving the column may not make sense because the true refactoring would be Use Official Data Source.

Deprecate the Original Database Schema

If multiple applications access your database, you likely need to work under the assumption that you cannot refactor and then deploy all of these programs simultaneously.
During the transition period, you support both the original and new schemas in parallel to provide time for the other application teams to refactor and redeploy their systems.
Typical transition periods last for several quarters, if not years. The potentially long time to fully implement a refactoring underscores the need to automate as much of the process as possible. Over a several-year period, people within your department will change, putting you at risk if parts of the process are manual. Having said that, even in the case of a single-application database, your team may still require a transition period of a few days within your project-integration sandbox: your teammates need to refactor and retest their code to work with the updated database schema.
You first implement it within the scope of your project, and if successful, you eventually deploy it into production.
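Because transition periods can outlast the people who started them, one small piece of automation is a machine-readable record of what has been deprecated and when it may be dropped. The sketch below is one hypothetical way to do that; the `Deprecation` record, the `due_for_removal` helper, and the dates used with it are illustrative assumptions, not something the text prescribes.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List


@dataclass(frozen=True)
class Deprecation:
    """A deprecated schema element and the date its transition period ends."""
    element: str
    drop_on: date


def due_for_removal(deprecations: Iterable[Deprecation], today: date) -> List[str]:
    """Return the elements whose transition period has ended as of `today`."""
    return [d.element for d in deprecations if d.drop_on <= today]
```

A scheduled job could call `due_for_removal` with the current date and report which columns and triggers are ready to be dropped, subject to the testing the text calls for.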