Scoping Data Science Projects

Scoping a data science project is an iterative process: the scope gets refined both during scoping and during the project itself. Even so, it is important to have an initial scope that gives a clear idea of the project goals, deadlines, costs and requirements.

Data Science Project Scope

First of all, it is critical to identify the key stakeholders in the project: people who understand the business or policy problem, people who understand what data is available, people who will consume the outputs of the system or take action using it, and people who will be affected by it.

Once we have identified them, the next step is to understand the problem and the impact it has on the business.

1. Understanding the problem

One of the key elements here is to understand the true need behind the project. In most cases, once you start digging deeper, you see that the iceberg is huge and the problem is bigger than even the stakeholders may think.

Business understanding is the critical first step of any Data Science project. Asking many “Why” questions during the initial meetings helps the team define the most important questions at the heart of the project. For example, if the task is “We need to have more info about our users,” the Data Science team may ask: “Why do you want to know more about them? What is the goal? Do you want to remarket to them? Do you want to make them more engaged?” If the answer is engagement, the next question is: “What are some behaviors that show the customers are engaged?” With these answers, the Data Science team can identify the root business problem and the key pain points, and design an actionable solution with measurable impact.

It is also very important to know who or what is affected by the problem, how many are affected, and how much they are affected (the magnitude of the problem). Take, for example, low sales during the first week of August. Which markets are affected? Is it specific items or low sales in general? Is it the kids' section or the whole store?
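To make the magnitude concrete, the questions above can be answered by breaking the drop down by segment. The sketch below is only an illustration: the records, the market and section names, and the `sales_change_by` helper are all hypothetical, and it assumes you can compare the affected week against a comparable baseline week.

```python
from collections import defaultdict

# Hypothetical records: (market, section, baseline_week_sales, current_week_sales)
SALES = [
    ("Spain", "Kids", 1000.0, 600.0),
    ("Spain", "Women", 2000.0, 1900.0),
    ("France", "Kids", 800.0, 780.0),
    ("France", "Women", 1500.0, 1200.0),
]

def sales_change_by(records, key_index):
    """Return {segment: percentage change vs. baseline}, grouped by one column."""
    baseline = defaultdict(float)
    current = defaultdict(float)
    for row in records:
        segment = row[key_index]
        baseline[segment] += row[2]
        current[segment] += row[3]
    return {
        segment: round(100.0 * (current[segment] - baseline[segment]) / baseline[segment], 1)
        for segment in baseline
    }

by_market = sales_change_by(SALES, 0)   # which markets are affected?
by_section = sales_change_by(SALES, 1)  # kids' section or overall?
```

With this made-up data, the kids' section shows the largest drop, which is the kind of finding that helps narrow the scope of the project.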

Next, the data science team normally asks the organization to explain why the problem is a priority now and how it has been tackling the problem so far. Understanding how the organization has tackled the problem can help the data science team identify ways that data analysis can inform the organization’s actions to achieve its goals.

2. Defining the goals

While defining goals, we have to take into account context, needs, vision and outcome.

The key question here is how the success of the solution will be measured.

Be as specific as you can, for example “Increase overall August sales by 30%”. The objective here is to take the outcome we’re trying to achieve and turn it into a goal that is measurable and achievable.
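A goal stated as a percentage becomes measurable once it is tied to a baseline. As a minimal sketch, assuming a hypothetical baseline figure and hypothetical helper names, the 30% example translates into an absolute target like this:

```python
def sales_target(baseline: float, uplift_pct: float) -> float:
    """Absolute target implied by a percentage uplift over the baseline."""
    return baseline * (1 + uplift_pct / 100)

def goal_met(actual: float, baseline: float, uplift_pct: float) -> bool:
    """True when actual sales reach or exceed the target."""
    return actual >= sales_target(baseline, uplift_pct)

baseline_august = 200_000.0  # hypothetical: last year's August sales
target = sales_target(baseline_august, 30)
```

Agreeing on the baseline (last August, the previous month, a forecast) is part of defining the goal, since the same 30% means very different targets depending on what it is measured against.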

3. Defining what data you have and what data you need

It is very important to make a list of the data sources that are available inside the organization. This is an iterative process, since many organizations may not have a comprehensive list of their data sources. Other key questions here are: What data do you need? Can you start collecting the data? Can you purchase the data? Can you legally and ethically use the intended data? How clean is this data?
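These questions can be recorded in a simple data source inventory. The sketch below is one possible shape for it, not a standard: the `DataSource` fields, the `usable` rule, the 20% missing-values threshold and the example sources are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class DataSource:
    name: str
    available: bool           # do we already hold this data?
    can_collect_or_buy: bool  # if not, can we start collecting or purchase it?
    legal_and_ethical: bool   # are we allowed to use it for this purpose?
    missing_ratio: float      # rough cleanliness indicator: share of missing values

def usable(source: DataSource, max_missing: float = 0.2) -> bool:
    """A source is usable if it is obtainable, permitted, and clean enough."""
    obtainable = source.available or source.can_collect_or_buy
    return obtainable and source.legal_and_ethical and source.missing_ratio <= max_missing

# Hypothetical inventory for the sales example
inventory = [
    DataSource("pos_transactions", True, False, True, 0.02),
    DataSource("web_clickstream", False, True, True, 0.10),
    DataSource("third_party_profiles", False, True, False, 0.05),
    DataSource("legacy_crm_export", True, False, True, 0.45),
]

ready = [source.name for source in inventory if usable(source)]
```

Sources that fail the check are not necessarily discarded; they show where cleaning work, legal review or new data collection has to be added to the scope.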

4. What analysis needs to be done?

Analyses can use methods and tools from different areas: computer science, machine learning, data science, statistics, and social sciences. Is this a descriptive analysis, a predictive model, or a detection or behavior change task? How will the analysis be validated? What validation can be done using existing, historical data? 
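One common way to validate a predictive analysis on existing, historical data is to hold out the most recent period and check how well a model built on the earlier data would have predicted it. The sketch below assumes a hypothetical weekly sales series and uses a deliberately naive forecast (the training mean) as a stand-in for whatever model the project ends up using.

```python
from statistics import mean

def temporal_split(series, holdout):
    """Split a chronologically ordered series into past (train) and recent (test)."""
    if not 0 < holdout < len(series):
        raise ValueError("holdout must be between 1 and len(series) - 1")
    return series[:-holdout], series[-holdout:]

def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))

# Hypothetical weekly sales, oldest first
weekly_sales = [100, 110, 105, 120, 115, 130, 125, 140]

train, test = temporal_split(weekly_sales, holdout=2)
forecast = [mean(train)] * len(test)  # naive baseline: predict the training mean
mae = mean_absolute_error(test, forecast)
```

Splitting by time rather than at random keeps future information out of the training data, and the naive baseline gives a reference error that any proposed model should beat.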

How will you deploy your analysis as a new system so that it can be updated and integrated into the organization’s operations? How will you evaluate the new system in the field to make sure it accomplishes your goals? How will you monitor your system to make sure it continues to perform well over time?
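Monitoring can start with something as simple as comparing recent performance against the level measured at validation time. This is a minimal sketch under assumed names and thresholds: `needs_attention` and the 25% tolerance are hypothetical, and a real system would usually track several metrics.

```python
def needs_attention(baseline_error, recent_errors, tolerance=0.25):
    """Flag the system when the recent average error exceeds the
    validation-time baseline by more than the given tolerance."""
    if not recent_errors:
        raise ValueError("recent_errors must not be empty")
    recent_average = sum(recent_errors) / len(recent_errors)
    return recent_average > baseline_error * (1 + tolerance)

# Hypothetical weekly forecast errors after deployment
stable = needs_attention(10.0, [10.5, 11.0, 12.0])
degraded = needs_attention(10.0, [13.0, 14.0, 15.0])
```

Deciding up front who receives this flag and what they do about it belongs in the scope just as much as the model itself.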

Once these stages are clear, you can see which profiles you need to make this data science project happen: data science lead, data engineer, ML engineer, Python developer, etc.

If you need help with estimating and scoping your data science project, let us know! We are experts in this field. 

Author

  • Ekaterina Novoseltseva

    Ekaterina Novoseltseva is an experienced CMO and Board Director. She is a professor at prestigious business schools in Barcelona, where she teaches digital business design. Ekaterina is currently CMO at Apiumhub, a software development hub based in Barcelona, and organiser of the Global Software Architecture Summit. She is proud of having done software projects for companies like Tous, Inditex, Mango, Etnia, Adidas and many others. Ekaterina took an active part in opening the Apiumhub office in Paseo de Gracia and in helping companies like Bitpanda open their tech hubs in Barcelona.
