Hackathon Showcase

Kora: Synthetic Healthcare Data Generation

Team consisting of Leonce (Senior Data Modeler, HCA & Sony Music), Karthik (Data/ML Engineer, Michigan Tech), Henry (Vanderbilt ML/NLP researcher), and Varun — Python, SQL, AWS, GCP, Airflow, PySpark, RAG.

4 members

Project Description

Kora aims to address the reproducibility crisis in science. In healthcare research, authors can usually share the code behind their papers so that others can reproduce their findings, but privacy and regulatory constraints limit what data they can share. We built an application that takes a published paper and, optionally, the metadata describing the dataset the paper uses, and applies semantic understanding to generate synthetic sample data that approximates the distribution of the real data. We then use the DBtwin API to scale this sample data so that other researchers can reproduce the findings. One team member has a personal stake in the product: he did research at Vanderbilt and has papers whose findings he would like others to reproduce and validate, but he can no longer access the data. The application is built in Python with the Streamlit framework and deployed to Hugging Face.
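To make the generation step concrete, here is a minimal sketch of sampling rows from summary statistics of the kind a paper typically reports. It is an illustration under stated assumptions, not Kora's actual implementation: the class names, the `generate_rows` function, and the example figures are all hypothetical, and the column spec is written by hand here, whereas Kora would derive it from the paper text and dataset metadata. The DBtwin scaling step is not shown.

```python
import random
from dataclasses import dataclass


@dataclass
class NumericColumn:
    """Hypothetical spec for a numeric variable (e.g. mean and SD from a paper's Table 1)."""
    name: str
    mean: float
    std: float
    minimum: float
    maximum: float


@dataclass
class CategoricalColumn:
    """Hypothetical spec for a categorical variable with reported proportions."""
    name: str
    proportions: dict  # category label -> share of the cohort


def generate_rows(columns, n_rows, seed=0):
    """Draw each column independently from its reported marginal distribution."""
    rng = random.Random(seed)  # seeded so the sample is repeatable
    rows = []
    for _ in range(n_rows):
        row = {}
        for col in columns:
            if isinstance(col, NumericColumn):
                value = rng.gauss(col.mean, col.std)
                # Clip to the plausible range; this slightly distorts the tails.
                row[col.name] = min(max(value, col.minimum), col.maximum)
            else:
                labels = list(col.proportions)
                weights = list(col.proportions.values())
                row[col.name] = rng.choices(labels, weights=weights, k=1)[0]
        rows.append(row)
    return rows


# Made-up example values, not taken from any real paper.
spec = [
    NumericColumn("age", mean=62.0, std=12.0, minimum=18.0, maximum=100.0),
    CategoricalColumn("sex", {"F": 0.55, "M": 0.45}),
]
sample = generate_rows(spec, n_rows=1000, seed=42)
```

Sampling each column independently reproduces the marginal distributions only. Correlations between variables, which many published analyses depend on, would need a joint model, so this sketch should be read as the simplest possible baseline.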

Team

Products & Tools

AI Tinkerers, Google Gemini Flash, The Lighthouse

Additional Links

https://karthikgarimella-kora-health.hf.space

Link to deployed Product

https://drive.google.com/file/d/1kQHsNfviIyP0eSLuR5QweJKNSHBQbi4r/view

Link to Product Demo