Collecting Excel-data into a relational database
Looking for an escape from what often is called “Excel hell”. The solutions proposed are numerous. So saying I have the solution for you is of course not the correct way of convincing you.
For many years I have been working with customers whose high valued Excel-solution for solving a business problem has come to a dead-end. The reason for this differs, some of them are:
- The “Excel-guru” responsible has stopped working in the department or left the company
- The volume of data is far to high. Practical limit seems in many cases to be below 100’ rows.
- The data has become to complex. After a while many Excel-solutions start to handle interrelated data. Combining data from different sheets sooner or later becomes a hassle.
- Complex integration with ERP-system. There will sooner or later be a need to feed data into the ERP-system of the company. This means the database in the ERP has to be in sync with the database stored in Excel.
These factors combined with the ordinary problems in Excel, where functions exists but the use of them are poorly known by the persons in charge.
- Keep tables free from duplicates
- Joining data between two sheets avoiding orphans
- Handling many versions of the truth in different workbooks.
Together this means that sooner or later you get the understanding that a relation database is a better place to store your data. In these systems there is a good support for avoiding many of the pitfalls in Excel.
In no way I would like to criticize the Excel-solutions I have had the benefit of investigating in my career. All of them are built by gifted persons knowing the business problem very well. They just want to find ways of doing it a little bit smarter with the use of their computer. That was their intention from the beginning. After a while persons in the neighborhood starts too give them ideas of how to make the solution better and solve some more related business problems. They put in a new sheet or two they create another workbook for this special case. In the beginning these habits work very well and the process of development goes very quick including new functions. After a while the work most of their time as a system developer and leave their ordinary responsibilities.
On my webb-site I have used these characters to describe the phases I have seen many professionals enter when becoming “Excel-gurus”. The upper row is the positive version and the lower is the negative in the three steps described. As always the time dimension is horizontal.
Unfortunately the step in knowledge needed for designing a relational database is often high if you not have any education in computer science.
The relational data model is very similar to Excel, it handles data in tables (sheets). It has functions supporting you to avoid duplicate rows in the table, and it has functions for describing the relationships between the data stored in tables in different sheets and workbooks. A very simple model could be Customers, Sales and Products, which you in an initial Excel solution probably would store in one sheet:
After a while you or someone else get the idea that it would be nice to have some more information about the customer and the items, you also find out that it would be valuable to have information about who took the order or if it was e-sales, and you would like to know the discount.. etc.
So the model grows and get wider and wider:
No problem the limit 256 columns is gone… The problem is that you are now staring to put in redundant information in your model. And if you do a misspelling of John’s name you will not discover he is a returning customer… As shown above you have put in one row the content of four different tables: Customer, Item, Invoice and Invoice row. Sooner or later this will give you trouble.