Whenever I read yet another article about software delivery at big tech companies such as Google, I'm amazed at how they seem to know what to do. I mean, most of us use agile practices and the best DevOps techniques, but after several months of running a project you might face one of 2 problems. The first problem is believing that following the process will make the project succeed. The reality is not that simple: by focusing on procedures you only succeed in producing well-written project plans and perfectly defined checklists. The second problem sits on the other side - you don't follow procedures at all, and as a result you increase the risk of firefighting and chaos.
I realize that anyone can fail a project, and here is one example. My point is that big tech companies have managed to build a culture of shared knowledge in software delivery. Some people call it data-driven engineering, but the name doesn't change the meaning much. It's time to re-read "You Are Not Google".
But what if we could all be Google?
Engineering data lake
Three years ago I started a side project. The idea was simple - save all Jira tickets to a database and see how they could be analyzed. When that was done I switched to Git commits. After that I moved on to operational data - releases, incidents and customer feedback - by writing integrations for OpsGenie, app stores, Twitter and other systems. What I got as a result was a pretty big database with lots of internal data about software delivery at my previous company. At that time I didn't think of it as a data lake for engineering and operational data, but it turned out that was the exact name for what I had built.
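To make the first step concrete, here is a minimal sketch of saving tickets to a database as raw JSON. It is not the code from my side project: the table layout, the function names (`open_lake`, `save_records`, `load_records`) and the sample ticket payloads are all assumptions made for illustration, and a real integration would fetch the tickets from the Jira API instead of a hard-coded list.

```python
import json
import sqlite3

# Hypothetical ticket payloads; a real integration would fetch these from Jira.
SAMPLE_TICKETS = [
    {"key": "ENG-1", "fields": {"summary": "Fix login bug", "status": "Done", "assignee": "alice"}},
    {"key": "ENG-2", "fields": {"summary": "Add dark mode", "status": "In Progress", "assignee": "bob"}},
]


def open_lake(path=":memory:"):
    """Open (or create) a SQLite file holding one table of raw records."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_records ("
        " source TEXT NOT NULL,"      # e.g. 'jira', 'git', 'opsgenie'
        " record_id TEXT NOT NULL,"   # id of the record in the source system
        " payload TEXT NOT NULL,"     # the untouched record, serialized as JSON
        " PRIMARY KEY (source, record_id))"
    )
    return conn


def save_records(conn, source, records, id_field="key"):
    """Store records as-is; re-saving the same id overwrites the old copy."""
    rows = [(source, record[id_field], json.dumps(record)) for record in records]
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT OR REPLACE INTO raw_records VALUES (?, ?, ?)", rows)
    return len(rows)


def load_records(conn, source):
    """Return all stored records for one source, ordered by record id."""
    cursor = conn.execute(
        "SELECT payload FROM raw_records WHERE source = ? ORDER BY record_id",
        (source,),
    )
    return [json.loads(payload) for (payload,) in cursor]


if __name__ == "__main__":
    lake = open_lake()
    save_records(lake, "jira", SAMPLE_TICKETS)
    for ticket in load_records(lake, "jira"):
        print(ticket["key"], ticket["fields"]["status"])
```

Keeping the whole payload rather than a few chosen columns means a later source, such as Git commits or incidents, can reuse the same table with a different `source` value.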
The rise of a data-driven culture
In the beginning, I had only my curiosity and determination, but soon my peers, other engineering managers and product people, started asking me for additional features. I was at the epicentre of the data-driven engineering movement at my company. It was an exciting time - we managed to improve the performance of the engineering team, find process bottlenecks in the design department and give management a clear understanding of what was going on.
You are not alone
When we started interviewing engineering managers from other companies, we realised that almost the same problem exists in other organizations. At some point, engineering leaders start building internal tools for engineering and operational data. We also saw that engineering leaders with internal engineering data lakes have a solid engineering culture and fewer problems with visibility, and as a result achieve their career goals. And it comes as no surprise that big tech companies have had their engineering data lakes for ages.
How to build your engineering data lake?
I'd like to share my personal view on this. There are 4 principles for building engineering data lakes:
1. Prefer unstructured data to no data at all - it's hard to know in advance which questions you will be able to answer with your data. Take your CI log data, for instance: it can show you server utilization, even if you don't need that information right now.
2. Join all data records via connections - connecting your Jira issues with data from your HR system will let you filter your data not only by names but also by positions.
3. Data should be accessible to everyone
4. Use the goal question metric approach - this approach saves a ton of time. What I see is that we have a tendency to create a metric first and only then connect it to the team's performance. Always start from why.
You can read more about metrics in Lean Analytics (https://www.amazon.com/Lean-Analytics-Better-Startup-Faster/dp/1449335675).
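Principles 2 and 4 can be sketched together in a few lines. The example below is only an illustration under my own assumptions: the `issues` and `employees` tables, their columns, the sample rows and the `Metric` class are all hypothetical. It joins issues to HR data so they can be grouped by position, and it attaches the goal and the question to the metric so that the why is written down before the query.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Metric:
    goal: str      # why we measure at all
    question: str  # what we want to learn
    sql: str       # the metric itself, defined last


def build_demo_lake():
    """Create an in-memory database with hypothetical Jira and HR data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE issues (key TEXT PRIMARY KEY, assignee TEXT, status TEXT);
        CREATE TABLE employees (username TEXT PRIMARY KEY, position TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO issues VALUES (?, ?, ?)",
        [
            ("ENG-1", "alice", "Done"),
            ("ENG-2", "bob", "In Progress"),
            ("ENG-3", "carol", "To Do"),
            ("ENG-4", "bob", "To Do"),
        ],
    )
    conn.executemany(
        "INSERT INTO employees VALUES (?, ?)",
        [
            ("alice", "Backend Engineer"),
            ("bob", "Backend Engineer"),
            ("carol", "Designer"),
        ],
    )
    conn.commit()
    return conn


# The join on assignee = username is the "connection" between the two sources.
OPEN_WORK_BY_POSITION = Metric(
    goal="Balance workload across the team",
    question="Which positions carry the most unfinished issues?",
    sql="""
        SELECT e.position, COUNT(*)
        FROM issues AS i
        JOIN employees AS e ON e.username = i.assignee
        WHERE i.status != 'Done'
        GROUP BY e.position
        ORDER BY e.position
    """,
)


def evaluate(conn, metric):
    """Run the metric query and return (position, open_issue_count) rows."""
    return conn.execute(metric.sql).fetchall()


if __name__ == "__main__":
    lake = build_demo_lake()
    print(OPEN_WORK_BY_POSITION.goal)
    print(OPEN_WORK_BY_POSITION.question)
    for position, open_issues in evaluate(lake, OPEN_WORK_BY_POSITION):
        print(position, open_issues)
```

With the sample rows above, the query reports 2 open issues for the backend engineers and 1 for the designer. Without the HR table the same data could only be grouped by name.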
How can I start today?
This is why we built valycs.com. Take a look at our use case, "All your engineering data where you want it". In the next article, I'll review databases for building engineering data lakes.