Development of an R package for calculating territorial indicators

A few words to introduce yourself?

My name is Cindy Andrieu-Dupin. I am Director of Digital Studies and Projects for Transportation at DRIEAT, the Regional Department for Development and Transportation in Île-de-France, which is a decentralized service of the Ministries of Territorial Development and Ecological Transition.

As part of a ministry initiative called the Knowledge Mission, I’m currently leading a national project in collaboration with the General Commission for Sustainable Development and the General Directorate of Infrastructure, Transport and Mobility (DGITM).

This project involves developing a tool for monitoring public mobility policies called the Sustainable Mobility Dashboard.

The tool takes the form of a web application that serves both state transportation services and local authorities. Its objective is to help identify the levers that can be used to reduce transport-related greenhouse gas emissions, and to guide mobility strategies at the territorial level, based on common indicators calculated at different scales: municipalities, intercommunal bodies, departments, regions, and nationwide. These indicators are built from national public data and updated annually.

These indicators help us understand travel patterns in a territory through, for example, modal share, vehicle fleet composition, or the range of available mobility services, such as the length of cycling infrastructure or public transport network coverage.

The indicators are calculated in R, then stored in an SQL database that the web application queries directly. And it was precisely to help us structure and optimize the calculation of these indicators that we called upon Data Champ’.

How did you discover Data Champ’, and what convinced you to collaborate with us?

When we contacted Data Champ’, we were in an experimental phase of application development. We had already created a first prototype with about a dozen indicators that were calculated within an R package.

As part of scaling up the project and developing the final application, we wanted to review and optimize the structure of this indicator-calculation package, and also to completely restructure the calculation chain, with the goal of standardizing the steps that allow us to calculate what we call territorialized indicators, meaning indicators broken down at different geographical scales.

For this mission, we did some sourcing, particularly on the Internet, and we contacted several providers with R expertise, since that was the main skill we were targeting. Data Champ’ was among them.

During our interviews with the different candidate providers, we spoke with Charles and Marc-Aurèle. We were immediately impressed by their very good understanding of our needs. They already had initial work proposals to offer us, even before sending the commercial proposal.

When we received the technical proposal, which was very clear and well-detailed, it confirmed our choice to collaborate with Data Champ’ for this mission.

What struck you the most about our way of working?

What I really appreciated about the Data Champ’ team is that, in addition to their undeniable R expertise, they have genuinely comprehensive expertise across all data processing steps. The team also took the business context into account, which isn’t always easy when it’s a field you don’t know.

Taking the business context into account allowed you to propose processing assumptions that were consistent with the data being handled. So, when there were suspicious values in the source data, you always tried to dig deeper to understand the reality on the ground and implement the appropriate treatments accordingly.

I remember a small issue with duplicate electric vehicle charging stations, where the duplicated stations had different attributes. Charles, you searched on the internet to find out what the truth was and to work out the best treatment for removing these duplicates from the source data.

So we really appreciated that you didn’t just do the job without trying to understand the data. It’s also worth noting that you had to adapt to our hardware constraints, which were not negligible, whether it was the computing limitations of our PCs or the access restrictions of the ministerial network. You knew how to work around these limitations and offer us solutions that still made it possible to optimize certain complex calculations requiring specific treatments.

I’m thinking particularly of the public transport stop indicators built from the GTFS files we have for all transport networks, which represented a very large volume of data that exceeded our PCs’ storage capacity, and for which you still managed to find a solution to make the processing work.

There was also the processing of cycling infrastructure with municipal redistricting, which required optimized calculations with DuckDB. There, Marc-Aurèle was very patient in helping us work around the installation blocks for DuckDB on our ministerial network. And in the end, it worked.

So, a big thank you for that.

Can you describe your overall experience collaborating with our team?

We were really very satisfied with our collaboration with Data Champ’. The team showed great professionalism while being attentive to our needs. They were also very responsive throughout the mission in adapting to our constraints, like the ones I just mentioned.

I also found that our weekly follow-up meetings were always of high quality and held in good spirits, which makes work pleasant. So we really worked with the team in full confidence throughout the mission.

How does our service fit into the overall project?

Thanks to the work done by Data Champ’, we now have, first of all, a clean R package dedicated to calculating territorialized indicators, and not just for transport.

So it’s a package that could be shared for indicator calculation needs in areas other than transport. We actually have other ministry projects that are starting to use it, particularly for a tool currently being developed on the energy renovation of housing.

So we’re going to promote this package within the ministry so that it can be reused for other purposes.

In our case, for our Sustainable Mobility Dashboard, we now have, thanks to Data Champ’, a real R project dedicated to calculating transport indicators for our application, and the DRIEAT team has taken good ownership of it. It meets expectations well, both in terms of use and maintainability.

Today the dashboard application is being deployed. It has also been enriched with new indicators, since Data Champ’ calculated other indicators that weren’t initially present.

So we are now well equipped to continue enriching our application with other indicators and to update existing indicators with new data each year.

What did you think of the solution we developed? Did it meet your expectations?

It’s true that at the beginning, in the specifications, we had already emphasized that we wanted to try to make all the functional blocks of data processing and indicator calculation reusable, including by others. But we weren’t quite sure yet what was technically possible.

We already had the idea of splitting the work between a package of calculation functions and a specific R project for transport indicators, but the technical feasibility wasn’t clear. On this, the Data Champ’ team very quickly proposed this breakdown into two packages, which were optimized throughout the mission.

We were also moving forward step by step, and the application deployment was being done in parallel, which challenged us again on the output data format. So that wasn’t easy to manage either.

On this, Data Champ’s expertise was really a great help. Today we really have this separation in two: on one side, a package with reusable functions that can calculate any type of indicator, and on the other, our project with the calculation functions for the mobility indicators that feed our application.

What aspect of our collaboration did you appreciate the most?

For me, it really comes down to the breadth of vision: not just R, which is perhaps more the impression I had from other experiences.

You really master the whole data chain and have multiple skills, whether in SQL or in other technologies that we weren’t able to exploit further.

We did use DuckDB a bit, but we could have gone further, or moved to Parquet files, things like that, I think, to further optimize the calculations.

We didn’t go that far. But in any case, these were things where we knew you had the skills to support us.

And that was a big strength.

And as I was saying, you really took the business context into account, which is important. Even if you can’t know everything about a business domain, of course, you still tried each time to understand, by yourselves or by asking us: what is this data? What does it represent? That way you could tell whether the results obtained were consistent.

Is there any point you think we could improve to make our service even more suitable?

I don’t have many remarks. I was really satisfied with everything. So, apart from maybe letting us know when you do a push, because sometimes we didn’t see the pushes. Jokes aside, I don’t have much to add on improvement.

How do you see the future?

We are currently at the application deployment stage. But of course, it’s an application that is meant to be enriched over time with new indicators.

So, today, we’ve taken ownership of the package and we’re more autonomous in adding new indicators. We’ve already added a few since the end of the service.

We don’t rule out calling upon Data Champ’ again for complex indicators that would require calculation optimizations we don’t have the skills for.

I’m thinking particularly of working with DuckDB. These are things we’re still discovering, so we could very well call upon Data Champ’ again to help us with these points, or possibly to optimize queries overall.

There is indeed the possibility of storing data in Parquet format, rather than directly in SQL tables, to speed up queries.

For now it’s not among our priorities. The application works well as it is, but we don’t rule out trying to improve and optimize it further later on.

In any case, we won’t hesitate to collaborate with Data Champ’ again on the continuation of this project, or maybe on other future projects.

Thank you, Cindy, for taking the time to answer our questions!
