Research-driven - How spikes lay ground for projects

Photo of a row of old books
Photo by Thomas Kelley on Unsplash
Highly subjective content
This post reflects the author's personal opinions, shaped by their individual experiences, beliefs, and biases. It should be viewed as such, and interpreting it as factual may lead to misunderstandings. The author's perspectives are unique and may differ from the reader's, with no expectation of alignment.

All but the most trivial software projects require at least some research to be done, and technical decisions to be made throughout the project’s lifecycle - especially at its onset.

In some corporate cultures, technical decisions are made based on authority - the decision was made by “the big man”, such as a System Architect or a certain Manager, and thus is not to be challenged or asked about. Such decisions are frequently made in the privacy of an ivory tower, and the teams responsible for the projects rarely have a chance to learn about the why part - let alone being able to speak up and bring their perspective to the table. Such authority-based arrangements often suffer from various issues:

Therefore, I find it far more productive, healthy and inclusive to drive project-related decisions with thorough research, which in software engineering practice is often framed as a concept of Spike. Such research helps make better-grounded decisions, while also turning decision-making into a more impersonal, team activity. In my experience, it helps with making objectively better decisions while also improving team alignment, since team members can feel included and can trace back the rationale of project decisions, especially if followed by an Architecture Decision Record referencing relevant spikes.

What is a spike, then?

At a basic level, a spike is an activity within a software project aimed at providing input to action on later, rather than deliverables. While the details, definitions and ceremonies around spikes differ from methodology to methodology and from organization to organization, they usually aim at researching a topic, and delivering a document and/or PoC covering the research done. Following aspects are fairly typical for a spike regardless of particular software development methodology:

Preparing for a spike

To make the best out of a spike, it is vital to be well prepared before carrying it out. One cannot just do a spike without defining what is the goal of the spike, what needs to be done, and in what time frame.

Knowledge base

Most organizations have at least one destination for documentation and other reference resources. Some may have multiple - which may lead to confusion if it is not clear which should be used as the team’s knowledge base. This may lead to undesirable consequences:

In general, it is typically practical for a team to keep a single knowledge base to serve as a destination for internal project documentation. This might mean a dedicated documentation repository, Confluence Space, a dedicated directory tree within a broader Confluence Space etc. In any case, the next step to improve structure of such knowledge base is to create a dedicated destination for spike results - oftentimes another sub-directory, nested in project knowledge base. Furthermore, to improve accessibility of such knowledge base - and spike repository - is to cross-reference it frequently, for instance to provide context for work items. This way, team members can more easily access pieces of documentation relevant to the task at hand.

Clarify expectations

A well defined spike task includes the following elements:

With such inputs, a spike can be carried out efficiently, and bring tangible results that the team can action on later, or base the decisions on the research done.

Establish the knowns and unknowns

Let us reference the famous Rumsfeld Matrix in the context of spikes and research:

Awareness Knowledge
  Known Unknown
Known
 
 
Known Knowns
 
The facts and assumptions we are aware of.
Known Unknowns

The gaps we are aware of but don’t fully understand.
Unknown
 
 
Unknown Knowns
 
Things we don’t realize we know but could leverage.
Unknown Unknowns

The true surprises we don’t know exist.

Most of the time, the goal of the spike is to improve the team’s knowledge and/or awareness, especially by improving the knowledge gaps:

  1. Improving awareness of unknown unknowns so that they become known unknowns - if the team has poor awareness with regard to the topic, it is usually more efficient to start with a spike to establish a basic understanding of the topic, establish what are the options and industry standards with regard to the subject.
  2. Gathering knowledge and/or experience to turn unknown knowns into known knowns - once the team has built an awareness in the subject, and acknowledges its gaps, there is a room for a spike to explore a topic in-depth, gather available examples and/or prepare a PoC on how the subject could be addressed.

Generally speaking, I find spikes to be a good fit to address these two areas. With regard to unknown knowns, it is better handled with chores and housekeeping than with spikes, as it might indicate excessive tribal knowledge in the team and missing, outdated or poorly accessible project documentation. Addressing such issues with spikes may feel unwieldy, and may even make the situation worse if the results of the spike would suffer from accessibility issues.

In my experience, when one’s starting point is in the unknown unknowns quadrant, it is more practical to split the spike into stages, rather than expect to get to known knows state with a single spike. Otherwise, the results may be disappointing:

Therefore, it may be a good idea to begin with a spike to provide broad research of the topic and available options, do a rough comparison, and pick the most promising candidate(s) for further research. Then, one or more spikes are needed to evaluate in depth the options picked in the first stage. The work can be split between multiple team members to improve knowledge sharing, promote inclusion and avoid overwhelming one person with all of the burden.

Who should be responsible?

Though this is a highly opinionated statement, I firmly believe the responsibility for carrying out spikes should be shared across the team. There are multiple reasons for this:

Having said that, it is up to the team to decide how the research work should be distributed, and whether it should be done in an organized or organic manner. The team may find it desirable to trust the most experienced team members with carrying out spikes, or use them as a growth opportunity for juniors. It is also a matter of the team’s ways of working to determine if one person should pick up similar topics to become a team’s Subject Matter Expert, or related topics should be researched by multiple people to promote knowledge sharing.

Carrying out a spike

Once the preparations for a spike are complete, and the spike task is picked up by somebody, the most critical part begins. There are multiple factors affecting the quality of the spike outcomes:

Understandably, it is not always possible to not be opinionated beforehand about the subject, and there is always more that could be done about the research. However, it is vital to stay as neutral as feasible and let the findings drive the outcomes and conclusions, not vice versa.

How to structure spike outcomes?

It is quite customary to write down a document covering the spike’s findings, regardless of whether the spike was focused on exploring external resources, or carrying out a Proof of Concept. An approachable document structure improves readability, clarity and overall quality of the outcomes, and help maintain the flow of the document.

Over the years of carrying out spikes for various project, I have worked out a spike document structure that works well in this regard and often gets replicated as I introduce it:

  1. Table of Contents,
  2. Introduction, including:
    • Problem statement,
    • Objectives,
    • Constraints,
    • References to general internal and external literature.
  3. Research section(s):
    • One sub-section per considered option or scenario,
    • References to specific, detailed internal and external literature,
    • References to PoC sources (if present),
    • Written description of the findings,
    • Diagrams and/or images (if relevant).
  4. Conclusions:
    • Comparison of available options or considered scenarios (if present),
    • Summary of the findings,
    • Recommendations for decision-making (if needed),
    • A list of action items (if relevant).

While not fully exhaustive, this structure works reasonably well and can be applied quite flexibly to various spike subtypes. It’s structure helps navigate through the document, and omit parts the reader is not interested in - not all readers have time or will to read the entire document, and thus may choose to skip the details. For a more thorough reader or reviewer, references to documentation can be helpful for a deep dive into the topic, while the document itself only summarizes each aspect.

Introduction

Introduction sets the context for the entire spike document, providing the readers with the driving factors of a spike.

The problem statement states what issue, challenge, requirement or incident led to this spike being carried out. It can describe the observations made beforehand that led to the conclusion a spike is needed. An example of a problem statement could be as follows:

According to our metrics reports, the application resource utilization has been increasing over the last 3 months, leading to more frequent crashes caused by exhausted memory. Increased frequency of crashes led to reduced availability of the application - from 99.97% in May to 99.62% in August, getting close to the agreed SLO of 99.5% weekly availability.

The objectives describe what goals need to be achieved as a result of this spike, what are the expected outputs, and what knowledge gaps should be filled. The objectives of a spike could be defined as follows:

As a goal of this spike, we need to identify the root causes of increased memory utilization and come up with resolution to them, and/or with measures to prevent application availability from degrading any further.

The constraints focus what limitations does the team need to take into consideration. The spike constraints may include:

In our example, the constraints could be described as:

We estimate that at the current pace, the SLO would be breached in early October. For this reason, the resolution must be applicable within this time frame. Given the applications release schedule, the change must be ready for release before the end of September release cutoff. If root cause resolution cannot be delivered in this time frame, we need to come up with temporary measures first.

Finally, the references included in the introduction part should point to resources relevant to the spike as whole, rather than specific solutions considered. Examples include:

Research section(s)

These is the part where the analytical aspect of the spike is most exposed.

In case of a Proof of Concept, the research section would be committed to describing:

In case of comparing various available options, there would be multiple sections or sub-sections dedicated to each option. In this case, the sections should include:

Moreover, regardless of the spike type, the research should include the following information:

The advantages and disadvantages can either be summarized in a separate sub-section, or broken up and outlined in relevant sub-sections. It is vital to consider both the advantages and disadvantages of each solution - the Software Engineering industry is full of trade-offs, and the picture is almost never black-and-white. A reliable researcher should transparently and thoroughly document both pros and cons of each solution, regardless of their personal preferences and gut feelings. Otherwise, the spike outcomes could be considerably biased, and the document would raise red flags for readers and reviewers - and rightly so.

Conclusions

This is the closing section of a spike document, that should summarize the findings and provide guidance on the next steps.

In case of spikes exploring multiple options, it is useful to include a comparison sub-section that would briefly compare the available options in each respective aspect, as well as overall. Such an information-dense summary can then be easily referenced without losing out too many details. While the format of such comparison is up to the team and/or author to decide, I find the tabular format to be particularly useful, for instance:

Memory-related crash resolution alternatives:

Aspect / solution Carry out GC fine-tuning Find and fix memory leaks Increase memory limits
Time to production
 
+ should be doable within 1-2 weeks
- may run into unpredictable problems
- likely time-consuming
- difficult to estimate
+ can be done in less than a day
+ maintenance chore, can bypass release schedule
Root cause resolution
 
+ improves memory management
- does not solve suspected memory leak
+ resolves suspected root cause
 
- does not solve the root cause
- temporary measure only

The next useful sub-section is a summary, briefly describing the author’s conclusions considering the research findings. An example of such summary could be as follows:

The research has shown the increase in memory utilization is likely caused by one or more memory leaks in the application, which must have been introduced shortly before May. A long-term solution would be to find and resolve such memory leaks, so that the memory utilization does not increase indefinitely. However, this activity may be time-consuming, therefore as a short-term solution it is worth considering to increase memory limits and/or fine-tune application’s garbage collector.

If the spike is meant to be an input for decision-making, it is worth providing recommendations for the person or team responsible for making the decision, for instance:

Given the time constraints and possible gains, it is recommended to start with increasing memory limits by 10-25%, followed by GC fine-tuning. These efforts should secure the application’s SLOs short-term. Meantime, we should investigate and resolve suspected memory limits as a long-term solution.

If the spike is expected to be followed with actions instead of input for decisions, such action items should be listed, for instance:

  • Increase memory limits by at least 10%;
  • Create a task to fine-tune GC to work around memory leak;
  • Create a task to find and fix memory leaks in the application;

Once the spike is concluded, these action items would often be used to create follow-up work items in a tracker, such as Jira or Asana.

Spike document structure example

This Markdown snippet roughly reflects the actual structure of documents I typically write down, although I find Confluence to be a more common option:

# Spike title

* Auto-generated ToC items
* ...

# Introduction

## Problem statement

Describe what was the original problem that led to this spike

## Objectives

Describe what goals need to be achieved.

## Constraints

Describe the limitations that had to be considered in the spike.

## References

Links to relevant resources.

# Research

## Option A (B, C ...)

Include the following
* Links to resources relevant to the option
* Summary of the option

### Topic-specific subsection(s)

Describe how the option fits into the considered aspect in each sub-section.

### Advantages

Describe what are the advantages of the option.

### Disadvantages

Describe what are the disadvantages of the option.

### Examples / outcomes

If possible, provide examples of how the option can be applied, or if there is a PoC, the outcomes of it.

# Conclusions

## Comparison

Summarize the available options, their pros and cons in various aspects, e.g. in a table:

| Aspect   | Option A | Option B | ...
| -------- | -------- | -------- | ...
| Factor X | + pros   | + pros   | ...
|          | - cons   | - cons   | ...
| Factor Y | + pros   | + pros   | ...
|          | - cons   | - cons   | ...
| ...      | ...      | ...      | ...

## Summary

Summarize the findings, such as:
* whether the PoC was successful 
* which options seem to be promising, and which seem to be dead ends.

## Recommendations

If needed, provide recommendations for the decision makers based on the findings.

## Action items

If needed, provide action items that should be taken as a follow up of the spike.

What are the next steps?

When the active part of the spike is concluded, it is time for a spike review.

Similarly to code contributions and ADRs, spike outputs should be reviewed by the team to ensure consistent quality. Additionally, a review is a great opportunity for the team to get acknowledged with the findings. While knowledge bases rarely provide robust tooling for reviews, the generally available features should suffice, for instance:

Lastly, once the spike is reviewed and closed, the spike results should actually be used by the team and decision-makers, for the commitment to be useful in the first place.

Spike review

The primary goal of the review is to make sure the output meets the desired standards. When reviewing a spike document, or asking for a spike review, I generally expect following aspects to be considered and reviewed:

Taking action

The value of a spike is in its outputs usefulness for future work and decision-making. If a spike gets carried out, but then its outputs are never read or used later - it is essentially a wasted effort. Moreover, dismissing the spike outputs does not help foster a meritocratic and inclusive culture. After all, if the team’s inputs are not even considered when making a decision, it says a lot about how the team’s contributions are perceived and respected.

There are a few major ways the outputs of a spike can, and should be utilized:

In case of follow-up action items, the outputs of the spike that created said action items should be linked to ensure the context is easily accessible. Likewise, if decisions are documented in an Architecture Decision Record, the spike that provided input for the decision should be cross-referenced for clarity and transparency.

Summary

Spikes are a great utility to conduct research that helps drive projects to better results. If handled well by the team, they foster a more inclusive, meritocratic culture, while also improving knowledge sharing and team’s alignment with decisions made. By adopting spikes and utilizing their outputs in decision-making processes, teams are able to make better informed decisions.