UNICEF Innovation Fund: Thinking Machines
On 5 April 2018, the UNICEF Innovation Fund announced six new investments in open source technology solutions. Thinking Machines is one of the six new portfolio companies to receive investment and will join a cohort of companies working on solutions using data science and artificial intelligence.
The UNICEF Innovation Fund invests in technology start-ups from developing markets that are working on open source solutions to improve children’s lives. The Innovation Fund applies a venture capital approach to source solutions that can impact the lives of the most vulnerable children. These solutions are clustered around $100 billion industries in frontier technology spaces. Check out www.unicefinnovationfund.org for more information, including real-time data, on each investment.
Thinking Machines: Developing OnTrackPH – a robust record matching engine and web tool optimizing huge numbers of records and datasets.
Suppose you have three identical card decks shuffled together. In order to make them useful, you decide to sort the cards into three different decks. Now let’s apply the card metaphor to a real-life organization. Suppose that instead of dealing with 52 cards – you are handling millions of data records (cards) across multiple databases (decks). In order to use your data effectively, you need to match each record and eliminate unnecessary duplicates. It is the sort of job that would leave you “lost in the shuffle”.
A huge amount of useful information is locked inside of organizations because there’s no easy way to match it with information outside of the organization. This makes things like “double checking” your spending very difficult. Thinking Machines provides the technology to help data analysts quickly and accurately identify matching entities across millions of records of data.
Our solution is a robust record matching engine and web tool that helps organizations get more out of their data. Specifically, the platform can be used to find duplicate records within a single dataset or to connect records in one dataset with records representing the same thing in another dataset. Beyond standard exact text matches, it handles descriptive text, keys hidden inside large documents, missing values, and inconsistent formats with better-than-average accuracy. Users need only upload their disparate datasets to our web app. Within minutes, they can download a merged dataset.
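To make the idea of matching beyond exact text concrete, here is a minimal sketch in Python. It is not the OnTrackPH algorithm, which this article does not describe. The sample records, the `name` field, the `normalize`, `similarity`, and `link_records` helpers, and the 0.85 threshold are all assumptions chosen for illustration. The only library call used is `difflib.SequenceMatcher` from the Python standard library.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase, replace punctuation with spaces, collapse whitespace; None becomes ''."""
    if text is None:
        return ""
    text = re.sub(r"[^a-z0-9 ]", " ", text.lower())
    return " ".join(text.split())


def similarity(a, b):
    """Return a 0..1 similarity score between two raw strings after normalization."""
    a, b = normalize(a), normalize(b)
    if not a or not b:
        return 0.0  # a missing value gives no evidence of a match
    return SequenceMatcher(None, a, b).ratio()


def link_records(left, right, field="name", threshold=0.85):
    """For each left record, keep the best-scoring right record at or above threshold."""
    links = []
    for i, l_rec in enumerate(left):
        best_j, best_score = None, threshold
        for j, r_rec in enumerate(right):
            score = similarity(l_rec.get(field), r_rec.get(field))
            if score >= best_score:
                best_j, best_score = j, score
        if best_j is not None:
            links.append((i, best_j))
    return links


# Hypothetical sample data: the same schools written inconsistently,
# plus one record with a missing name.
agency_a = [
    {"name": "San Isidro Elementary School"},
    {"name": "Mabini High School"},
    {"name": None},
]
agency_b = [
    {"name": "MABINI HIGH SCHOOL."},
    {"name": "san isidro  elementary school"},
]

links = link_records(agency_a, agency_b)
print(links)
```

Each pair in `links` is (index in `agency_a`, index in `agency_b`). The sketch compares every record with every other record, which would not scale to millions of rows, and finding duplicates within a single dataset would additionally require skipping each record's comparison with itself.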
What is unique about your solution and how is it different from what currently exists?
Other offerings are cost-prohibitive. What would happen if the Department of Education needed to remove duplicates from student data? The only options are expensive software-as-a-service packages. We solve this problem by offering our tool free of charge to social and public sector groups. Another differentiating aspect is usability: most other options are directed at profit-seeking organizations with vast amounts of commercial data. By contrast, our focus is on government agencies and NGOs that own project-level or individual-level data and deliver vital services to underserved communities. In this setting, the most useful keys for matching records are typically related to the location of the community, a special case that commercial matching tools do not often encounter. This focus allows us to develop our platform around well-defined, impact-centered use cases.
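One common way to exploit location keys in record matching is blocking: records are compared only when their location keys agree, which sharply reduces the number of comparisons. The sketch below illustrates that general technique and is not taken from the OnTrackPH code. The `municipality` field, the alias table, the sample records, and the `location_key` and `candidate_pairs` helpers are hypothetical.

```python
from collections import defaultdict

# Hypothetical alias table mapping spelling variants to one canonical key.
LOCATION_ALIASES = {
    "qc": "quezon city",
    "quezon city": "quezon city",
    "cebu": "cebu city",
    "cebu city": "cebu city",
}


def location_key(raw):
    """Map a raw place name to a canonical key; unknown names pass through cleaned."""
    text = (raw or "").lower().replace(".", "").replace(",", " ")
    cleaned = " ".join(text.split())
    return LOCATION_ALIASES.get(cleaned, cleaned)


def candidate_pairs(left, right, field="municipality"):
    """Yield (left_index, right_index) pairs whose location keys agree."""
    blocks = defaultdict(list)
    for j, rec in enumerate(right):
        blocks[location_key(rec.get(field))].append(j)
    for i, rec in enumerate(left):
        key = location_key(rec.get(field))
        if not key:
            continue  # records with no location are skipped in this sketch
        for j in blocks.get(key, []):
            yield i, j


# Hypothetical sample data from two separate datasets.
projects = [{"municipality": "Q.C."}, {"municipality": "Cebu"}, {"municipality": None}]
budgets = [
    {"municipality": "Cebu City"},
    {"municipality": "Quezon City"},
    {"municipality": "Davao City"},
]

pairs = list(candidate_pairs(projects, budgets))
print(pairs)
```

Only the candidate pairs produced here would then be passed to a more expensive comparison of names or descriptions. A real deployment would need a far more complete table of place-name variants than the four entries assumed above.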
Why does being open-source make your solution better?
Taking an open-source approach gives us the best chance of building what users ultimately want. It’s not a matter of some private vendor developing a record matching engine that they think governments or nonprofits will need. Instead, we’re bringing together a community of domain practitioners and grassroots analysts who, together, will help to shape and improve our record matching algorithm. By making our algorithms open source, we will be able to tap into a global community of data scientists who can further improve the algorithm and imagine new use cases in different communities, which will in turn help us to become better data scientists.
How did you come up with your solution and what inspired you to form your company?
In 2016, we piloted “OnTrackPH”, a proof-of-concept algorithm for matching millions of infrastructure records across data silos. We were able to link three different Philippine government databases, each with identifiable data, and accurately match records in a fraction of the man-hours needed to do so manually. We noticed that record matching was a common problem for many NGOs and government agencies across developing countries. We wondered whether we could build a better version of our tool that could handle different types of records, be more user-friendly, and be designed with underserved organizations in mind. The concept for “OnTrackPH 2.0” was born: a web platform that would allow any individual to easily match disparate datasets and download merged files.
How did your team come together? What is your team’s MO and drive towards the problem you’re trying to solve?
We’re a startup whose mission is to build data systems for humans. We care deeply about the community we live in, and want to build a great data science institution in the Philippines. Our team members share a drive to use technology to make a positive impact. As individuals, all of us have had great careers in various sectors, but we’ve chosen to work together because we think it takes a holistic team to shape a future where AI elevates humans. We believe in open data, open source, open AI, and building great technology ecosystems.
What do you plan on doing with UNICEF’s Venture Fund investment and how will you use that to leverage raising follow-on investment?
We will use the UNICEF Venture Fund investment to improve our record matching algorithm and build a web tool available to students, NGO analysts, journalists, and citizen users. Additionally, we will develop impact-focused use cases for the platform in collaboration with long-standing Thinking Machines partners such as the World Bank, Teach for All, and Save the Children. A successful proof-of-concept will allow us to validate the business model we need to scale the technology globally.
Photo Credits | © Thinking Machines