Top Data Science Projects from CodeDay Labs 2020!
Our computer science and STEM student interns explored many different areas of the tech industry, from web development to AI! The data science projects are especially exciting and worth highlighting. These projects helped students gain a deeper understanding of real-world data science solutions, including ones that address current issues such as COVID-19.
Priyanka Mishra, Lalla Sankara, and Samantha Fernandes mentored by Anupam Dewan:
This project centers on keeping track of our health and improving COVID-19 detection and diagnosis. isCovid() is a web application whose regression model was trained on an open database of 406 COVID cases containing chest X-rays and CT images. The model also incorporated patient features such as temperature, pO2 saturation, and leukocyte count. The team built the app with Flask and simple HTML. To train the model, they moved from data processing to image processing, fine-tuned a pretrained ResNet50 model with batch normalization, and added a custom fully connected layer and a final classifier. The images came from COVID-positive and COVID-negative sets so the model could learn to classify positive and negative cases! They turned their data set into a working identifier for COVID cases, which makes it easier for people to decide how to better take care of themselves moving forward.
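A classifier head of the kind described above (batch normalization, a custom fully connected layer, and a final classifier on top of ResNet50 features) can be sketched in plain Python. This is not the team's code. It assumes the pretrained backbone has already turned each image into a feature vector (ResNet50's pooled output is 2048-dimensional; the example uses 3 dimensions for readability), omits batch normalization's learnable scale and shift, always normalizes with the current batch's statistics as in training mode, and uses hypothetical hand-picked weights in place of learned ones.

```python
import math
from typing import List, Tuple

Vector = List[float]


def batch_norm(batch: List[Vector], eps: float = 1e-5) -> List[Vector]:
    """Normalize each feature to zero mean and unit variance across the batch.

    Simplified: uses the current batch's statistics (training-mode behavior)
    and omits the learnable scale (gamma) and shift (beta) parameters.
    """
    n, dims = len(batch), len(batch[0])
    means = [sum(row[j] for row in batch) / n for j in range(dims)]
    variances = [sum((row[j] - means[j]) ** 2 for row in batch) / n for j in range(dims)]
    return [
        [(row[j] - means[j]) / math.sqrt(variances[j] + eps) for j in range(dims)]
        for row in batch
    ]


def dense_relu(x: Vector, weights: List[Vector], bias: Vector) -> Vector:
    """Fully connected layer (weights @ x + bias) followed by a ReLU activation."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(weights, bias)]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def classify(features: List[Vector], hidden_w: List[Vector], hidden_b: Vector,
             out_w: Vector, out_b: float, threshold: float = 0.5) -> List[Tuple[float, str]]:
    """Return (probability of COVID-positive, label) for each backbone feature vector."""
    results = []
    for x in batch_norm(features):
        hidden = dense_relu(x, hidden_w, hidden_b)
        p = sigmoid(sum(w * h for w, h in zip(out_w, hidden)) + out_b)
        results.append((p, "COVID-positive" if p >= threshold else "COVID-negative"))
    return results


if __name__ == "__main__":
    # Hypothetical 3-dimensional "backbone features" for two images and hand-picked weights.
    features = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
    hidden_w = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
    for prob, label in classify(features, hidden_w, [0.0, 0.0], [2.0, -2.0], 0.0):
        print(f"{label}: p={prob:.3f}")
```

In a real fine-tuning setup, a framework such as PyTorch would learn these weights by backpropagation, and batch normalization would switch to running averages at inference time.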
You can catch more details in their tech talk here:
Mehrab Hafiz, Shania Shani, and Yen Lu mentored by Omar Shehata:
This project visualizes Lyft's self-driving car data in an open-source way (for the very first time). The group used Cesium ion, JavaScript, CesiumJS, HTML5, CSS, Python, and Glitch. Goals for the project included converting Lyft's perception data from .bin to .las, visualizing the perception data set, and extracting movement data for the autonomous vehicle and nearby cars, cyclists, pedestrians, and other objects. The team overcame numerous challenges, such as converting translation points to latitude/longitude, setting origin points to georeference the cars, and converting location data to real-world coordinates. They implemented heat-wave coloring and a 3D car model to show movement across their 3D map, and although the project was challenging, they clearly illustrated how the vehicles interact with the map and calculated movement predictions.
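Georeferencing local translations, as described above, means choosing an origin point and converting offsets from it into latitude and longitude. The stdlib Python sketch below is one common way to do this, not necessarily the team's approach. It assumes each translation's x and y are east and north offsets in meters, ignores height, and uses a flat-Earth (local tangent plane) approximation; the origin and track values in the demo are hypothetical.

```python
import math
from typing import Iterable, List, Tuple

# WGS84 equatorial radius in meters.
EARTH_RADIUS_M = 6_378_137.0


def local_to_latlon(east_m: float, north_m: float,
                    origin_lat: float, origin_lon: float) -> Tuple[float, float]:
    """Convert a local (east, north) offset in meters to (latitude, longitude) in degrees.

    Uses a flat-Earth (local tangent plane) approximation around the origin,
    which is reasonable over distances of a few kilometers.
    """
    dlat = north_m / EARTH_RADIUS_M
    # A degree of longitude shrinks with the cosine of the latitude.
    dlon = east_m / (EARTH_RADIUS_M * math.cos(math.radians(origin_lat)))
    return origin_lat + math.degrees(dlat), origin_lon + math.degrees(dlon)


def georeference_track(points: Iterable[Tuple[float, float]],
                       origin_lat: float, origin_lon: float) -> List[Tuple[float, float]]:
    """Map (x, y) translations, assumed to be (east, north) meters, to (lat, lon) pairs."""
    return [local_to_latlon(x, y, origin_lat, origin_lon) for x, y in points]


if __name__ == "__main__":
    # Hypothetical origin and translations; not taken from the Lyft data set.
    origin = (37.4, -122.1)
    track = [(0.0, 0.0), (10.0, 5.0), (20.0, 12.0)]
    for lat, lon in georeference_track(track, *origin):
        print(f"{lat:.6f}, {lon:.6f}")
```

The resulting pairs could then be handed to CesiumJS, for example through `Cesium.Cartesian3.fromDegrees(longitude, latitude, height)`, which takes longitude first.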
Watch more about what they learned and what they plan to do for further work and improvements in their tech talk:
Sai Thatigotla, Gandhar Viragi, and Rakil Ahmed mentored by Yang Xu:
This team's original problem was to create a REST API for COVID-19 data and an application that calls the API to calculate the maximum death rate. They extended the project with a frontend dashboard that helps users visualize and interact with current COVID-19 data. They built the REST API with Node.js and Express, wrote a Java application, and used MongoDB for data storage. The user dashboard was built in React, and the API and dashboard were hosted on Heroku and Vercel. They implemented a fully interactive map, stat columns focused on US cases alongside a column for cases worldwide, and a comparison of total cases, recoveries, and deaths to show the full scope of the pandemic! This project made it much easier to keep track of current pandemic statistics and see everything in one place.
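The core calculation from the original assignment, finding the maximum death rate in the data the API returns, could look roughly like the sketch below. The team's stack was Node.js, Express, and Java; Python is used here only for consistency with the other examples. The JSON shape (a list of objects with `region`, `cases`, and `deaths` fields) is a hypothetical assumption, not the team's actual API schema.

```python
import json
from typing import Optional, Tuple


def max_death_rate(payload: str) -> Optional[Tuple[str, float]]:
    """Return (region, deaths / cases) for the highest-rate region, or None if no usable rows.

    Assumes a hypothetical JSON payload: a list of objects with "region", "cases", and "deaths".
    """
    best = None
    for record in json.loads(payload):
        cases = record.get("cases", 0)
        if cases <= 0:
            continue  # skip regions without cases to avoid division by zero
        rate = record.get("deaths", 0) / cases
        if best is None or rate > best[1]:
            best = (record["region"], rate)
    return best


if __name__ == "__main__":
    # Hypothetical sample response, standing in for a call to the REST API.
    sample = json.dumps([
        {"region": "A", "cases": 1000, "deaths": 20},
        {"region": "B", "cases": 500, "deaths": 25},
        {"region": "C", "cases": 0, "deaths": 0},
    ])
    print(max_death_rate(sample))
```

Skipping regions with zero cases keeps the ratio well defined; a production API might instead report those regions separately.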
You can see how they implemented this in more detail in their tech talk:
EFI: Endangered Flora Identification
Nzinga Eduardo, Thanh Le, and Sonali Shintre mentored by Keith Callenberg:
This project builds on a research paper that identifies wood from 46 different endangered flora species. The team used modern techniques such as feature extraction, feature engineering, and deep learning to identify these species, with a goal of 97% accuracy or better than the research paper reported. They gathered the same data set and trained machine learning models to classify these species. On the client side, all a user needs is a photo, and the model identifies which species it shows! To create the API, they used Seldon Core and built a Docker container to push the images to a cloud platform. This project was incredibly complex, with many layers, including a CNN model built from scratch and the generation of augmented images.
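Generating augmented images, as mentioned above, often starts with simple geometric transforms that multiply the training set without collecting new photos. The stdlib Python sketch below treats a grayscale image as a list of pixel rows and produces flipped and rotated copies. These particular transforms are an assumption for illustration, not necessarily the ones the team's pipeline applied, and a real pipeline would typically use an image library instead.

```python
from typing import List

Image = List[List[int]]  # grayscale image as rows of pixel intensities


def flip_horizontal(img: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def flip_vertical(img: Image) -> Image:
    """Mirror the image top to bottom."""
    return [list(row) for row in img[::-1]]


def rotate_90(img: Image) -> Image:
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def augment(img: Image) -> List[Image]:
    """Return the original image plus flipped and rotated copies (6 images total)."""
    r90 = rotate_90(img)
    r180 = rotate_90(r90)
    r270 = rotate_90(r180)
    return [img, flip_horizontal(img), flip_vertical(img), r90, r180, r270]


if __name__ == "__main__":
    # Hypothetical 2x3 "image"; real inputs would be photos of wood samples.
    tiny = [[1, 2, 3], [4, 5, 6]]
    for variant in augment(tiny):
        print(variant)
```

Because each copy keeps the original species label, the classifier sees the same texture in several orientations during training.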
Watch their tech talk to learn more about their process and everything the project involved: