Data drives Rappi's operations, everything from personalized recommendations to rapid delivery logistics. Zhong and his team leverage data to power decisions, optimize delivery times, run A/B tests, and refine algorithms. Data plays a crucial role in supporting the business’s diverse offerings, from groceries to fintech.
Rappi’s culture combines startup speed with a strong focus on collaboration. Zhong emphasizes a low-ceremony, high-urgency approach, encouraging his team to learn from each other and prioritize speed over perfection. He champions an open data culture, blending in-house and third-party solutions to move fast and keep things simple.
Rappi’s data stack includes tools like Fivetran, Snowflake, Power BI, and Airflow. They rely on a mix of Google Cloud and AWS for compute power, and use Amplitude for clickstream analysis. With a focus on standardization, Rappi is consolidating their tools and improving data governance to maintain efficiency as they scale.
Zhong’s team transformed Rappi’s data infrastructure by centralizing fragmented data sources, creating a single source of truth for metrics, and drastically improving data latency. These changes led to more accurate, real-time insights for business decisions, optimizing everything from delivery efficiency to pricing strategies.
Zhong’s focus is on cleaning up legacy data workflows and establishing better lineage tracking for smoother operations. The team is also eyeing the next level of data efficiency, from improving governance to exploring new tools. Their goal is to create a strong, scalable data infrastructure that supports Rappi's rapid growth across multiple regions.
Tarush Aggarwal (00:00)
Welcome to another episode of People of Data, where we get to highlight the wonderful people of data. Our guests are data leaders from various industries, geographies, and disciplines, and we talk about how companies are using data, their achievements, and the challenges they face. Today on the show, we're thrilled to have Yuan Zhong, who is a data leader with over 15 years of experience in the fields of data engineering, machine learning, and AI.
With his extensive experience leading cross-disciplinary data organizations, including ten years at Amazon, data engineering is an evergreen bedrock of his scope to power downstream functions across business analytics, data science, operations research, and artificial intelligence. Currently, Yuan is the VP and Global Head of Data and AI at Rappi. Yuan, welcome to the show.
Yuan Zhong (00:47)
Thank you for having me, Tarush.
Tarush Aggarwal (00:50)
Of course, very excited to have you here. Are you ready to get into it?
Yuan Zhong (00:52)
Let's get to it.
Tarush Aggarwal (00:54)
So what does Rappi do? How do you guys make money? What role does data play in helping Rappi do what it wants to?
Yuan Zhong (01:01)
So, for listeners that might not be familiar with Rappi, it is a Latin America-based on-demand delivery super app. Rappi was founded in 2015 in Colombia and right now operates in nine different countries in that region, serving more than 300,000 merchants and more than 30 million users. So we are an on-demand quick-commerce delivery app and a super app.
That means that we offer ordering and delivery for customers across restaurants, groceries, pharmacies, and many other categories, all in one app. We even have our own fintech and banking capabilities and travel booking capabilities. It's a very widely popular app for residents living in the region. And we make money essentially by serving four very distinct groups of audiences, all in this app.
We're serving the end users who need quick, high-quality delivery from a wide assortment. We charge them service fees and delivery fees. We also make money through take rates from the merchants, restaurants, and grocery shops who list on our platform. And we make money through monetization of advertisement placement and co-marketing budgets for the merchants joining our platform.
The other side of the audience that we also serve, which we don't make money from, is the hundreds of thousands of delivery agents living in the area. For LatAm, it's kind of a project and a social undertaking, transforming a generation of gig workers who are riding bicycles, scooters, and motorcycles and driving cars, delivering and hustling in the cities. Rappi as a platform also serves their best interest, helping them make a living and raise a family.
As for the part data plays at Rappi, it's not an exaggeration to say data is in everything we do here: making decisions, monitoring performance, and optimizing the efficiency of the business. That ranges from measuring the key KPIs so that we can keep improving, to running a high volume of A/B testing experimentation to figure out the right assortment, the right treatment, and the right pricing strategies, to the right algorithms that we roll out for dispatch and bundling, to pricing, and to all the backend optimization that we have here. It goes all the way up to data feeding into features that are fed into data science, machine learning, and artificial intelligence models to allow for better recommendation systems and personalization of the site experience. So data is in everything we do as a company. A lot of the executives working at Rappi come from other companies that have gone through a journey of data-driven decision-making, and we're embracing this culture like oxygen.
Tarush Aggarwal (03:58)
That's incredible. I've been a personal consumer of Rappi and been blown away by this concept of sub-10-minute deliveries, with a bigger and bigger library every time I go check it out. Big fan. What is the culture like at Rappi? It's such a
Yuan Zhong (04:12)
Thank you.
Tarush Aggarwal (04:17)
such an interesting platform and, as you mentioned, data plays a big part in it. What does the overall macro culture look like, and is the data culture on the data team just what cross-functional data teams typically are? What is the culture which you are instilling in the data group?
Yuan Zhong (04:34)
That's another awesome question. I can probably start by saying that, for Rappi overall as a company, the culture is still very distinctively that of a startup, a founder-led startup. We were founded in 2015, and we still haven't reached our first decade of existence. So a lot of things are put together with a lot of entrepreneurial spirit and a lot of hustle,
with a sense of urgency and an eagerness and willingness to make mistakes and then to commit fast. With that, the company culture is, in my opinion, very low ceremony and very high pace. There is also a very blunt kind of openness to each other in providing feedback, oftentimes very heated feedback, but behind the scenes people also hug it out as brothers and sisters, really sharing a common vision. And that's
very refreshing in many ways, especially for people working at Rappi who came from another bigger, more established company and see the drastic difference of a very flat organization, very high speed, and a very low level of patience for slowness and bureaucracy. That's the overall company culture that we operate in. And when it comes to data teams,
Tarush Aggarwal (05:42)
Yeah.
Yuan Zhong (05:50)
It's very hard to say right now that there is any more distinct culture outside of the broader corporate culture. But when it comes to data being cross-functional and multidisciplinary, we do have a very strong culture of openness, embracing open source tools and also embracing vendor solutions. We do not really have an obsession with saying, this solution has to be built in-house by my team, otherwise I'm not going to use it. We really are indexing on how we can move faster
and how we can really scale better. Sometimes we were even over-indexing on just trying to get things to work and worrying about how we'll clean up a bit later. That's a double-edged sword, but overall there is a very deliberate preference for just trying to move faster and then figuring out how we scale as we grow. And then another subculture within the data teams is the attitude toward, and embrace of, the cross-disciplinary nature of data,
Tarush Aggarwal (06:34)
Yeah.
Yeah.
Yuan Zhong (06:44)
where data engineers, analysts, machine learning engineers, operations researchers, and economists do have very different skill sets, and they do learn from each other. There isn't much snobbishness about certain types of data skills being more high-end than others. If anything, it's really about, let me learn from you, show me how you're doing this. And people are generally just very low ceremony. Again, I keep mentioning this word.
Tarush Aggarwal (07:03)
Yeah.
Yuan Zhong (07:12)
and very open to sharing with and coaching each other.
Tarush Aggarwal (07:17)
That's so incredible. I love the openness to it. And speaking about being a startup, what does the company footprint look like? How big is the company? How big is the data team today? What's the breakup?
Yuan Zhong (07:28)
So, not counting, for example, the temp workers, the contractors, and the delivery agents and couriers that we have, the corporate employees are still in the region of 4,000 to 6,000 people, with ups and downs based on the cycle of the business, and that's across nine different countries. As for the data teams, which are being centralized as we speak,
for the people who are explicitly wearing a data type of job title, we're talking in the region of 300-plus people. And then we do have another range of people who are not wearing a data job title, but a lot of the things they do are either empowered by data or they're heavy users of data as well.
Tarush Aggarwal (08:13)
Very interesting. Again, just from having done a bunch of these episodes and from my experience, we typically see that technology, as a marker, is about 20% of the organization, and data is about 20% of technology. So we're looking at about 4%, and at about 5,000 people you would expect roughly 200 people to be in data. So given that it's 300-plus,
that is a great
Yuan Zhong (08:39)
From that standpoint, from a ratio standpoint, we're over-indexing overall among the tech employees. And we do consider data people as part of the tech employees. But we also don't want to read too much into that, because we probably are not even provisioned right for tech overall. But for data, yeah, there is some very concentrated investment in growing that army and being very deliberate in
running A/B testing, switchback testing, and A/A testing to inform a lot of decisions. We learned a lot from Amazon and Meta from that standpoint, and also from Expedia, DoorDash, and Instacart.
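To make the testing vocabulary in this answer concrete, here is a minimal Python sketch of how assignment can differ between a user-level A/B test and a switchback test. It is an illustration under stated assumptions, not Rappi's implementation: the function names, the `salt` value, and the 30-minute window are all hypothetical. In a switchback design a whole region and time window is assigned to one arm, which is commonly used when users in a marketplace affect each other; in an A/A test both arms receive the same experience, which helps validate the assignment and measurement pipeline.

```python
import hashlib
from datetime import datetime, timezone


def _bucket(key: str) -> str:
    """Deterministically map a string key to one of two arms."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return "treatment" if digest[0] % 2 == 0 else "control"


def user_arm(user_id: str, salt: str = "exp-demo") -> str:
    """Classic A/B assignment: each user is bucketed independently."""
    return _bucket(f"{salt}:user:{user_id}")


def switchback_arm(region: str, ts: datetime,
                   window_minutes: int = 30, salt: str = "exp-demo") -> str:
    """Switchback assignment: every order in the same region and time
    window gets the same arm. `ts` should be timezone-aware."""
    window_index = int(ts.timestamp()) // (window_minutes * 60)
    return _bucket(f"{salt}:switchback:{region}:{window_index}")


# Two orders in the same hypothetical region and 30-minute window share an arm.
t1 = datetime(2024, 5, 1, 12, 0, 0, tzinfo=timezone.utc)
t2 = datetime(2024, 5, 1, 12, 29, 59, tzinfo=timezone.utc)
same_window = switchback_arm("bogota", t1) == switchback_arm("bogota", t2)
```

Hashing on a salted key keeps the assignment reproducible without storing state, which is one common design choice; a real experimentation platform would add exposure logging and analysis on top.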
Tarush Aggarwal (09:15)
Yeah, that makes a ton of sense, and over-indexing on data, given what you mentioned from a cultural perspective, is this sort of backdrop on top of which you're building. You mentioned the data teams are being centralized again. With an org of a few hundred people and a business expanding in multiple countries, how do you look at centralization versus decentralization? And what's the longer-term approach to this?
Yuan Zhong (09:40)
That's a very thoughtful question. When it comes to centralization versus decentralization, the consideration is less about geographical decentralization, because we operate in nine different countries and there are even people working outside of LatAm. From a remote standpoint,
people working virtually together is not new; even before the pandemic, it was already the practice for Rappi. The centralization is more about pulling the data experts into a central organization under the tech and data leadership, so we have consistent standards to evaluate, coach, develop, and drive synergies among those
data people, rather than having them overly embedded into each vertical business and operational line. As I mentioned, there is a second ring of professionals who are very data proficient. They're just not wearing a data title. They could be an operations manager, a product manager, or a finance professional who does data all the time. Those people are still experts, and they're still embedded with the business. But when it comes to running central BI tooling,
data ops, and data infrastructure, or building search engines and catalog-related AI capabilities, we centralized them under a few functional leaders within the tech world. We've seen companies go centralized, decentralized, and then hybrid all the time. For Rappi, this current centralization was preceded by
heavy decentralization, when each vertical was just given the mission to go really fast, grow and scale its business, and hire data professionals. But once we reached a certain point, we realized, okay, the standards are not necessarily consistent, and the stacks that they use, the data standards, and the data catalogs are all over the place. So we started to see synergies in centralizing. And I wouldn't be surprised if at a certain stage later we realize, okay, we can centralize this much, but then we're going to send back
Tarush Aggarwal (11:21)
Yeah.
Yeah.
Yuan Zhong (11:40)
certain centralized resources to be embedded with the business team, no longer separated into data engineers versus analysts versus scientists. They could now become part of the vertical data team that reports to the head of the business for that vertical. I can totally see that happen, right? We're essentially talking about how Meta organizes data, where they're all rolled up to the engineering VP in many cases, versus Amazon, where the data folks are most
commonly rolled up under the general manager of that business. Rappi right now has gone from the previous Amazon approach to the Meta approach. As we grow bigger, it's anybody's guess where we're going. But there will always be, I think, a component of the data team being centralized, because there's just a lot more efficiency from those professionals working together.
Tarush Aggarwal (12:11)
Yeah.
Yeah.
That makes a ton of sense. What does the data footprint look like? What stack do you use? What are some of the vendors being used today?
Yuan Zhong (12:37)
Yeah, so I think when you say data here, it's more around data engineering and data infrastructure, right? From that standpoint, we were not deliberately trying to go with different vendors. We did think about centralizing under a few vendors, but we're in the process of consolidating the data teams, and they do come with different choices.
Think of it this way: from the raw data from systems and from logs, we need to go through connectors, and the main connector vendor we use right now is Fivetran. When it comes to compute, it's a combination of Google's compute and Amazon's compute, but heavily Snowflake. In terms of workflow orchestration, it's Airflow. We've actually gone through building our own instances of Airflow, and right now we work very closely with Astronomer, the managed version of Airflow, co-developing the capabilities.
When it comes to visualization, business intelligence, and dashboarding, we explored different tools, and right now we're heavily on Power BI. For clickstream, site journey, and funnel analytics, Amplitude is the tool that we're using predominantly, and of course there are some open source implementations for some niche use cases. Storage is S3. When it comes to A/B testing experimentation, we're centering around two tools, Split and Statsig.
Even for a single purpose, there's sometimes more than one tool used across Rappi. But over the last year we've tried not to further proliferate those tools. Instead, we're consolidating, standardizing, and governing access, usage, and metering as well.
Tarush Aggarwal (14:21)
You know, especially if you think about multiple different geographies and the preferences of data people over there, for companies around the size you are today we see it's Snowflake or Databricks, and very often both, right? So this makes a lot of sense. One thing which I didn't hear, which is very typically a usual suspect in these stacks, is something like dbt for data modeling.
Yuan Zhong (14:35)
Yup.
Tarush Aggarwal (14:45)
Is this something which is used or, you know?
Yuan Zhong (14:48)
It was used by engineering and development teams there. But overall, we didn't further proliferate it or branch out into a major implementation. Not yet. We're trying to form a point of view there as well, because a lot of existing vendors are also approaching us to say, we can cross-sell, right? You're using us already with such a big contract, and when we come to renewal time,
Tarush Aggarwal (14:58)
Yeah.
Yuan Zhong (15:12)
yeah, here is another deal. It's a very, very common kind of conversation we're having, and we're open-minded.
Tarush Aggarwal (15:17)
Yeah.
What is one thing which you are particularly proud of the data team for achieving? Something which could be impacting the business, something which Rappi wants to do.
Yuan Zhong (15:30)
So actually, there are too many things that we're proud of almost every day, along different dimensions. But maybe today I'm going to highlight the area that sounds the least sexy but is actually very important for a company that has to thrive on operational excellence in a cutthroat, low-margin business, which is really to enable the rest of the company to have ready, very low-latency,
fine-granularity access to their performance data. We do data for recommendation systems, for image recognition, all the way to a lot of GenAI and LLM applications and fraud detection. But the thing that the business can see in the field every day, the least sexy one, is actually the one that we get recognized for and get a lot of satisfaction from as well. Just to give you the context: before the centralization,
the company didn't have a central data dictionary and catalog to define metrics, so the same metrics were calculated in many different ways. Even for the same KPI, there were multiple or dozens of different tables, versions of the same tables, and very dated queries. And then it was a matter of who happened to have a query, who happened to be closer to the leader, and they would be pulling the data. So it went from
that state of wild west to a consolidated source of truth. We built it into a medallion structure of bronze, silver, and gold, and layered governance on top: the query banks, definitions, and data dictionaries. On top of that, we further engineered the refresh cadence, the latency, of the operational reports. We went from data that was sometimes late by one or two days
to now being able to refresh some of our most important data on an hourly and sub-hourly basis for the key KPI dashboards.
Those changes are transformational for the business, which can now immediately pick up signals where things are not working as intended. Because we're a startup, operations were not that perfect in many ways, but we needed to be able to know, and to cut down on the time to detection. Those are the things that we have managed to enable over the last couple of quarters, and they make a night-and-day difference.
Tarush Aggarwal (17:35)
Yeah.
Yuan Zhong (17:44)
In parallel, the other science work is also benefiting from cleaner data features for the models and from a much lower cost of feature processing to begin with.
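The bronze, silver, and gold layering and the single metric definition described in this answer can be sketched in a few lines of Python. This is a toy illustration, not Rappi's pipeline: the sample rows, the `METRICS` dictionary, and the `avg_delivery_minutes` metric name are all made up for the example. The point it tries to show is that once a metric is defined in one place, every dashboard that reads the gold layer computes it the same way.

```python
from collections import defaultdict
from datetime import datetime

# Bronze: raw events as they land, duplicates and incomplete rows included.
# (Made-up sample data for illustration only.)
BRONZE = [
    {"order_id": "o1", "created": "2024-05-01T10:05:00", "delivered": "2024-05-01T10:14:00"},
    {"order_id": "o1", "created": "2024-05-01T10:05:00", "delivered": "2024-05-01T10:14:00"},
    {"order_id": "o2", "created": "2024-05-01T10:40:00", "delivered": "2024-05-01T11:01:00"},
    {"order_id": "o3", "created": "2024-05-01T11:10:00", "delivered": None},
]


def to_silver(bronze):
    """Silver: deduplicated, typed rows; undelivered orders are skipped."""
    seen, silver = set(), []
    for row in bronze:
        if row["order_id"] in seen or row["delivered"] is None:
            continue
        seen.add(row["order_id"])
        silver.append({
            "order_id": row["order_id"],
            "created": datetime.fromisoformat(row["created"]),
            "delivered": datetime.fromisoformat(row["delivered"]),
        })
    return silver


# Data dictionary: one agreed definition per metric (hypothetical metric name).
METRICS = {
    "avg_delivery_minutes": lambda rows: sum(
        (r["delivered"] - r["created"]).total_seconds() / 60 for r in rows
    ) / len(rows),
}


def to_gold(silver):
    """Gold: every dictionary metric, aggregated per hour of order creation."""
    by_hour = defaultdict(list)
    for r in silver:
        hour = r["created"].replace(minute=0, second=0, microsecond=0)
        by_hour[hour].append(r)
    return {
        hour: {name: fn(rows) for name, fn in METRICS.items()}
        for hour, rows in by_hour.items()
    }


gold = to_gold(to_silver(BRONZE))
```

With this sample data the gold layer holds a single hourly row, for the 10:00 hour, built from orders `o1` and `o2`. Rerunning `to_gold` on a schedule is where the refresh cadence mentioned above would come in.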
Tarush Aggarwal (17:55)
Yeah,
I love it. It's going back to the basics and doing an incredibly good job at decision support and being able to support the business with on-time, reliable, trustworthy data.
What has been one challenge, with the growth and the scale and with multiple countries, which the data team is solving, which is keeping you up at night, and which is something you're working towards?
Yuan Zhong (18:17)
Again, similar to the previous one, there are many, and just to carry on in the same spirit of looking at those boring topics that actually do make a difference: the thing that keeps the data infrastructure side of the team really on edge, and keeps us very humble about everything we do, is the fact that we're working with a train
that is operating at super high speed while we also need to change the engine on top, the data engine per se. Another aspect of that is that the company has evolved band-aid after band-aid, group after group, and acquisition after acquisition. We've inherited a very messy set of recurring and scheduled
ongoing data workflows that are not very well tagged in terms of the lineage of the data. You have the lineage of the jobs, but there is also the tagging of the job. What is that job, that particular DAG or that particular data feed? What exactly is it consumed for? They're loaded into certain data assets, right? But you do not really know, for that data asset, which team should actually be responsible for telling us whether they need it or not.
Tarush Aggarwal (19:24)
Yeah.
Yuan Zhong (19:31)
Or whether the logic is still up to date. When an alarm is triggered, we didn't really have the proper lineage or tagging to even know who to talk to. So we're in the process of running deprecation campaigns, running lineage tagging, and rolling out additional governance policies so that any new workflow being introduced to the pipeline does have the tags, and we can really trace it back and baseline it.
Tarush Aggarwal (19:37)
Yeah.
Yuan Zhong (19:55)
Without those things, it becomes even harder for us to know the true impact when there is an outage. We know how many DAGs got impacted, but we don't know how many real business decisions might have been affected by it. So this is ongoing work that we are actively doing right now, to clean up and have a really good baseline going forward, and then say, now, whichever vendor has a better solution,
Tarush Aggarwal (19:55)
Yeah.
Yuan Zhong (20:20)
try to beat this baseline that we've got. Right now, any vendor's pitch is very attractive in many, many ways. But there is really fundamental house cleaning that has to be done by somebody, and it has to be done by the team that is closest to the business.
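The tagging and governance idea in this answer can be sketched as a small Python model. Everything here is a hypothetical illustration rather than Rappi's tooling or any orchestrator's API: the `Workflow` class, the two required tags (`owner_team`, `consumed_for`), and the sample workflow names are assumptions. The sketch shows two things mentioned above: a gate that rejects new workflows without tags, and an outage report that separates "we know who to talk to" from "we do not".

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical governance policy: tags every new workflow must carry.
REQUIRED_TAGS = ("owner_team", "consumed_for")


@dataclass(frozen=True)
class Workflow:
    name: str
    produces: Tuple[str, ...]           # data assets this workflow loads
    owner_team: Optional[str] = None    # who decides if it is still needed
    consumed_for: Optional[str] = None  # the business use it feeds


def missing_tags(wf: Workflow) -> List[str]:
    """Return the required tags that are absent or empty."""
    return [tag for tag in REQUIRED_TAGS if not getattr(wf, tag)]


def admit(wf: Workflow) -> Workflow:
    """Governance gate for new workflows: refuse anything untagged."""
    missing = missing_tags(wf)
    if missing:
        raise ValueError(f"{wf.name} is missing required tags: {missing}")
    return wf


def outage_impact(workflows: Iterable[Workflow], failed: Iterable[str]) -> Dict[str, list]:
    """Split failed workflows into owners we can notify and unknowns."""
    failed_set = set(failed)
    impact: Dict[str, list] = {"notify": [], "unknown_owner": []}
    for wf in workflows:
        if wf.name not in failed_set:
            continue
        if missing_tags(wf):
            impact["unknown_owner"].append(wf.name)
        else:
            impact["notify"].append((wf.owner_team, wf.consumed_for, wf.produces))
    return impact


# Made-up inventory: one tagged workflow and one inherited, untagged feed.
WORKFLOWS = [
    Workflow("orders_hourly", ("gold.orders_kpi",), "ops-analytics", "hourly KPI dashboard"),
    Workflow("legacy_feed_17", ("tmp.feed_17",)),
]
impact = outage_impact(WORKFLOWS, ["orders_hourly", "legacy_feed_17"])
```

In this toy inventory, an outage hitting both workflows yields one entry to notify and one workflow with an unknown owner, which is the kind of baseline a deprecation or tagging campaign would try to shrink over time.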
Tarush Aggarwal (20:22)
Yeah, that makes a ton of sense. Yuan, thank you so much for your time, and thank you for your wisdom today.
Yuan Zhong (20:44)
Well, thank you for asking those thoughtful questions and thank you for organizing this event.