Performance Metrics for Data Scientists


Can automated performance analytics work for data scientists? Yes. Though the job title may change, the mission of performance analytics remains the same. Our basic goal is to improve team dynamics and facilitate excellence in software development. But how exactly does that work with respect to data scientists? That’s where the fun comes in. Two major issues can surface with the role of a data scientist: a) their job description, and b) a global shortage of data scientists.

What Do Data Scientists Do?

Data scientists use Big Data to create value. They find data, analyze it for potential value, and build tools to distill it into actionable insights people can understand without having an aneurysm. A data scientist’s job description covers more tasks, but that’s enough heavy-duty work to keep us busy for a little while.


Big Data involves data catalogs measured in terabytes (1 TB = 75 million pages of text). It’d take a few hundred people a year just to read through it, to say nothing of understanding or analyzing it. The data can be in hundreds of different file formats (txt, rtf, pdf, xlsx, even jpg). It’s likely derived from a hundred or more sources, some in hard copy. Ultimately, data scientists are able to assess whether there’s a meaningful relationship between two or more data points.

Let’s say we have a data scientist for an international retailer. They may track a few hundred data points for each product in both their own and their competitor’s inventory. They could look at a company’s last promotion of a specific product to see how it affected their sales and their competitor’s sales. Data scientists can take that much, much further. To gather such insights, data scientists must create algorithms – and display the data in a way that people can understand.
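As a rough illustration of checking whether two data points are related, the sketch below computes a Pearson correlation between a promotion's discount level and units sold. The weekly figures and the names `discount_pct` and `units_sold` are made up for the example; real retail analysis would involve far more variables and a check for statistical significance.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical weekly data: discount percentage and units sold.
discount_pct = [0, 5, 10, 15, 20, 25]
units_sold = [100, 108, 121, 135, 150, 158]

r = pearson(discount_pct, units_sold)
print(f"correlation between discount and units sold: {r:.2f}")
```

A value close to 1 or -1 suggests a strong linear relationship, while a value near 0 suggests little or none. Correlation alone does not show that the promotion caused the change in sales.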

The Big Problem? A Global Shortage of Data Scientists

According to QuantHub, there was a shortage of 250,000 data scientists in 2020 concurrent with steadily increasing demand. The US Bureau of Labor Statistics estimates a growth rate of 31% for data scientists and related positions through 2029. There are over 57,000 data scientist job postings on LinkedIn at the moment. All of this helps underscore that 86% of companies find it challenging to hire qualified IT talent.

Data scientists are often expected to have a Master’s degree and several years of experience. But companies also want industry professionals familiar with their particular market dynamics. And they want some coding skills. And, in many cases, despite the lockdowns, candidates should either live locally or be willing to relocate. And then… Mick Jagger just ran up to me and slapped me like the Orangeman from a Tango commercial, singing, “You can’t always get what you want, but if you try, sometimes – you get what you need.”

As the Toptal data scientist job description more succinctly puts it, data scientists are “x% scientist, y% software engineer, and z% hacker.” Only, a lot of the time, it’s also necessary to add “w% industry expert.” We can simply say that many data scientists are still in the process of learning and developing their job-specific skills. So, while they may be challenging to find, they are by no means unique with respect to the need for, and value of, continued education and training.

Moreover, companies with data scientists are likely to have more than one. They are very likely to be part of a team with other software engineers and developers. Companies with the need for more data scientists are often trying to source from within – perhaps training their engineers to be scientists while helping developers become engineers and engineering managers. So, knowledge sharing and team development are also factors that come into play.

How Does Software Development Performance Analytics Help?

Data scientists have metrics for almost everything related to what they’re analyzing and the value they create for the business. Their metrics don’t always extend to performance metrics like how productive or efficient they are in coding. That’s probably not something they should be spending their time on anyway, as, well, automated solutions like Gitential have you covered in this respect.

1. How much time do you want your data scientists to spend actually coding?

Several issues are tied to this question. Which programming languages do your data scientists know and which ones are they best at? Are they recent graduates? New to your company? Are you looking to have them help develop the data science skills of your entire development team? As it is, most software developers are lucky to spend half of their time coding. For many companies the scarcity of the data scientist’s skillsets requires balancing several priorities:
  1. Keeping their skills fresh and aligned to your company’s projects and SOP
  2. Continuously improving and expanding their skills
  3. Expanding the data science capabilities of your team
  4. Building company/team value
Data scientists new to your team will need to go through an onboarding process like all other team members. It can take 3-6 months for any new team member to achieve “normalized” efficiency. Stripe’s Developer Coefficient reports that industry-wide inefficiency in software development is running around 31.6%. Improvements can be made even with mature development teams. If your data scientist is expected to spend any significant amount of time coding, it’s worth measuring so you can:
  • Objectively assess a data scientist’s coding skills, on a per programming language basis, so they can be assigned tasks of corresponding complexity as needed.
  • See how they’re improving and where they can improve with respect to code churn, test coverage, defect rate, etc.
  • Evaluate the best mutual pairing options for including them in code reviews.
  • Get a sense of whether they’re starting to burn out, so you can adjust their workload, or whether they’re looking for other jobs, so you can take time to explore local options.
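To make the idea of per-language measurement concrete, here is a minimal sketch of a churn calculation. It assumes churn is defined as the share of a developer's added lines that were rewritten or deleted shortly afterwards; the `Commit` record, its `lines_reworked` field, and the sample data are hypothetical and are not taken from Gitential or any particular tool.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Commit:
    author: str
    language: str
    lines_added: int
    # Hypothetical field: lines from this commit rewritten or deleted soon after.
    lines_reworked: int


def churn_by_language(commits, author):
    """Return {language: reworked_lines / added_lines} for one author."""
    added = defaultdict(int)
    reworked = defaultdict(int)
    for commit in commits:
        if commit.author != author:
            continue
        added[commit.language] += commit.lines_added
        reworked[commit.language] += commit.lines_reworked
    # Skip languages with no added lines to avoid dividing by zero.
    return {lang: reworked[lang] / added[lang] for lang in added if added[lang] > 0}


# Made-up commit history for illustration only.
history = [
    Commit("dana", "Python", 200, 20),
    Commit("dana", "Python", 100, 10),
    Commit("dana", "R", 50, 25),
    Commit("lee", "Python", 400, 300),
]

churn = churn_by_language(history, "dana")
for language, rate in sorted(churn.items()):
    print(f"{language}: {rate:.0%} churn")
```

A lower rate in one language than another can hint at where someone is more comfortable, though small samples and exploratory work can easily skew the numbers.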

2. Determine the programming languages they use most and know best

Most data scientists specialize in a few programming languages. SQL and R are usually mandatory. They may be fluent in, or learning, additional languages and tools like VBA, Python, Java, JavaScript, C/C++, Scala, Matlab, SAS, TensorFlow, etc. Their skillsets will define what language/s are used within the company. However, data scientists using different languages can implement the same thing, just with a different way of getting to the results. That’s why it’s helpful to understand what language/s they know best. Instead of forcing them to implement in one language, let them use the one they are most familiar with, unless project or client requirements specify otherwise.

Gitential automatically tracks which programming languages your team is working in (we have ~99.5% accuracy with about 1,000 different file formats). This makes it easy for you to understand the languages in which your data scientists are most proficient. You’ll easily identify the languages used most by your team and by individual team members. In turn, you’ll have insight into each developer’s efficiency and churn rate.
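For a sense of how language usage can be tallied at all, the sketch below counts changed files by extension. It is a deliberately naive illustration and makes no claim about how Gitential detects languages; the `EXTENSION_TO_LANGUAGE` table and the file paths are assumptions chosen for the example.

```python
from collections import Counter
from pathlib import PurePosixPath

# Small, hypothetical mapping; a real tool would cover far more formats.
EXTENSION_TO_LANGUAGE = {
    ".py": "Python",
    ".r": "R",
    ".sql": "SQL",
    ".scala": "Scala",
    ".java": "Java",
    ".js": "JavaScript",
}


def language_counts(changed_files):
    """Count changed files per language, based only on the file extension."""
    counts = Counter()
    for path in changed_files:
        extension = PurePosixPath(path).suffix.lower()
        counts[EXTENSION_TO_LANGUAGE.get(extension, "Other")] += 1
    return counts


# Made-up list of files touched by one team member.
files = [
    "pricing/model.py",
    "pricing/features.py",
    "reports/elasticity.R",
    "queries/sales.sql",
    "README",
]

counts = language_counts(files)
print(counts.most_common())
```

Counting files is cruder than counting lines or commits, but it is enough to show which languages a person touches most often.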

3. Code Reusability

Your data scientists may very well be in a position of creating new wheels so that your developers don’t have to. It’s considered a best practice for developers to use existing solutions whenever possible. Creating new solutions takes more time and increases the chance of introducing bugs. When creating algorithms for pulling and analyzing data from a catalog or repository, it’s entirely possible that you can create derivatives with some minor changes. The individuals running reports often like to see the data in different ways, using different dimensions and variables.

A data scientist can show your developers how to modify an algorithm to show the data in different ways and different/additional data points. This cannot be tracked directly, but it should translate to an increase in developer productivity.

Comparing Project Complexity

For an end user, a data science project can be as simple as entering one number into a price elasticity calculator, and the tool will spit out different price ranges that can be competitive on the market. However, the data science model behind the scenes is far more complex than that. One objective is to understand the complexity and code heaviness of different projects. Code complexity can provide a better understanding of how much effort it took the team to code the actual tool vs. how much time it took to design it (e.g. logged hours on brainstorming sessions).
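One common way to put a number on code complexity is to count decision points per function, which roughly approximates cyclomatic complexity. The sketch below does this for Python source using the standard `ast` module. The choice of node types in `BRANCH_NODES`, the function name `rough_complexity`, and the sample code are assumptions for illustration, not a description of any specific tool's metric.

```python
import ast

# Node types treated as decision points in this rough estimate.
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp, ast.BoolOp)


def rough_complexity(source):
    """Return {function_name: 1 + number of branching nodes} for each function."""
    tree = ast.parse(source)
    scores = {}
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            branches = sum(isinstance(child, BRANCH_NODES) for child in ast.walk(node))
            scores[node.name] = 1 + branches
    return scores


# Made-up source code to score.
SAMPLE = '''
def suggest_price(cost, elasticity):
    if elasticity < -1:
        return cost * 1.2
    return cost * 1.5

def clean(rows):
    out = []
    for row in rows:
        if row is None or row == "":
            continue
        out.append(row)
    return out
'''

scores = rough_complexity(SAMPLE)
print(scores)
```

Summing such scores across a repository gives a crude basis for comparing projects, which can then be set against logged design hours.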

From this, you can better assess the skills needed most on different types of projects within the same organization. You can also better understand what’s happening with projects having the same, or comparable, complexity. For software development agencies, increasing your capacity for more complex projects expands the scope of projects you can confidently take on.

Cycle Time & Waste Production

For all of the reasons why devs are lucky to spend half their time coding, data scientists have a lot of tasks on their plate, too. We can start with participating in code reviews and retrospectives, PR merges, and setting up tests, but also finding data, then cleaning, organizing, and analyzing it. Another portion of their role is often to communicate it in a way that people understand, a task often as much an art as a science. So, they may need to write code to spit out everything into easy-to-understand reports. Then, to share what they’ve done so everyone else can understand it, they’re forced to make their own PowerPoint presentations! Sometimes the art of data science is handed off to Data Artists.

But a portion of a scientist’s role is to experiment. The sheer volume and variety of data that’s available today prompts questions that have never been asked in the history of civilization: is there a meaningful statistical relationship between the type of car one drives and the type or brand of beverage one prefers? Though probably not useful as a standalone statistic, taken together with other consumer statistics it can be used to create highly targeted promotional campaigns. Not all data is going to be useful, and even if it’s accurate, it may not be valuable.

So, there’s a lot of effort involved there, but how does that fit into cycle time? How much of their time is spent on writing code that never gets pushed to production? Is that work a waste, or can it be used for another project later down the road?
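The questions above can be framed as two simple numbers: the average time from starting work to deployment, and the share of written code that never reached production. The sketch below computes both under assumed definitions; the `WorkItem` record and the sample items are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class WorkItem:
    name: str
    lines_written: int
    started: date
    deployed: Optional[date]  # None means the code never reached production


def summarize(items):
    """Return (average cycle time in days, share of lines never deployed)."""
    shipped = [item for item in items if item.deployed is not None]
    total_lines = sum(item.lines_written for item in items)
    unshipped_lines = sum(item.lines_written for item in items if item.deployed is None)
    if shipped:
        avg_cycle_days = sum((item.deployed - item.started).days for item in shipped) / len(shipped)
    else:
        avg_cycle_days = None
    unshipped_share = unshipped_lines / total_lines if total_lines else 0.0
    return avg_cycle_days, unshipped_share


# Made-up work items for illustration only.
items = [
    WorkItem("price report", 300, date(2021, 1, 4), date(2021, 1, 14)),
    WorkItem("sales dashboard", 100, date(2021, 1, 10), date(2021, 1, 30)),
    WorkItem("car vs. beverage experiment", 400, date(2021, 1, 12), None),
]

avg_cycle_days, unshipped_share = summarize(items)
print(f"average cycle time: {avg_cycle_days} days")
print(f"share of lines never deployed: {unshipped_share:.0%}")
```

Whether the undeployed share counts as waste is a judgment call: experimental code that informs a later project may still have paid for itself.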

In Summary

The characteristics of a perfect data scientist include, at a minimum, strong programming and analytical skills, industry expertise and experience, and good communication and interpersonal skills. They already know everything about your business and they… live right across the street. There are a limited number of unicorns in our universe. But a company with a continuous development program that prioritizes skills when hiring for the role can have a hand in creating as many unicorns as it needs.

Sometimes, some simple coding experience and enthusiasm, with a measure of guidance, can grow a person to have a successful career as a data scientist. Everyone starts somewhere – no one spontaneously wakes up as a data scientist… even if it is a dream job.

Even the best software developers have a vested interest in improving their coding skills – and the skills of their teammates. At times, it may be necessary for a data scientist to fill the role of a software developer. However, the scarcity of their skills for most companies warrants a strategic view for knowledge sharing. Your developers and data scientists can work together to continuously and mutually improve their industry knowledge and coding skills.

Gitential for Data Scientists

Two additional points are worth mentioning. While not all experiments will be useful, those that are tend to make it easier to find other useful experiments. Some teams utilize their data scientists to improve their team’s overall data science capabilities. As Winston Churchill said, “Success consists of going from failure to failure without loss of enthusiasm.” Add that to Silicon Valley’s mantra of “fail fast and fail frequently” (one would also hope “painlessly”). If only by coaching developers on how to ask questions likely to lead to valuable insights, their value can go exponential.

The work of a data scientist is an abstract and creative process. There are performance metrics that can support their work and help their team leads understand the strengths and weaknesses of their data science teams and individual team members. At Gitential, we would like to provide visibility into the working habits and coding practices of data science teams so product managers and team leads can have a clearer view of expected timelines and challenges in such a creative and sometimes unpredictable area.

If you have any questions, please let us know at gitential@gitential.com. We welcome you to sign up for a free trial. No credit card is needed!