
Liam McCarthy

Data Scientist

"Replicating HinDroid"

In my academic career, I built a pipeline for data ingestion, feature engineering, and model evaluation to replicate the malware classification approach described in the HinDroid paper. The project downloads APKs from APKPure and decompiles them into smali files using apktoolkit. The smali files are then cleaned and parsed into dictionary structures corresponding to the HinDroid matrices (apps and APIs, APIs that appear in the same code block, and APIs that share a package). A linear SVM is then trained using kernels formed from different products of these matrices, and the results across several metrics are recorded in Figure 1 below.


Figure 1: Table showing the results of the classification using the HinDroid method.
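To make the matrix step concrete, here is a minimal sketch of how dictionary-style app data could be turned into the three matrices and then into kernels over apps. It is an illustration under stated assumptions, not the project's actual code: the toy `apps` dictionary, the helper names (`package_of`, `build_matrices`, `metapath_kernel`), and the rule used to derive a package from an API name are all hypothetical, and plain Python lists stand in for the sparse matrices a full dataset would likely need.

```python
def package_of(api):
    """Package prefix of a smali-style API name (assumed naming convention)."""
    class_name = api.split(";->")[0]
    return class_name.rsplit("/", 1)[0]


def build_matrices(apps):
    """Build A (app x API), B (APIs sharing a code block), P (APIs sharing a package)."""
    apis = sorted({api for blocks in apps.values() for block in blocks for api in block})
    index = {api: i for i, api in enumerate(apis)}
    n = len(apis)
    A = [[0] * n for _ in apps]
    B = [[0] * n for _ in range(n)]
    P = [[int(package_of(a) == package_of(b)) for b in apis] for a in apis]
    for row, blocks in enumerate(apps.values()):
        for block in blocks:
            for a in block:
                A[row][index[a]] = 1          # the app calls this API
                for b in block:
                    B[index[a]][index[b]] = 1  # both APIs occur in one code block
    return A, B, P


def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(X, Y):
    Yt = transpose(Y)
    return [[sum(x * y for x, y in zip(row, col)) for col in Yt] for row in X]


def metapath_kernel(A, M=None):
    """Return A @ A.T when M is None, otherwise A @ M @ A.T."""
    left = A if M is None else matmul(A, M)
    return matmul(left, transpose(A))


# Toy, made-up input: app name -> list of code blocks -> list of API calls.
apps = {
    "app_one": [["Ljava/io/File;->exists", "Ljava/io/File;->delete"]],
    "app_two": [["Ljava/io/File;->delete",
                 "Landroid/telephony/SmsManager;->sendTextMessage"]],
}

A, B, P = build_matrices(apps)
K_AA = metapath_kernel(A)       # apps related through shared APIs
K_ABA = metapath_kernel(A, B)   # ... through APIs in the same code block
K_APA = metapath_kernel(A, P)   # ... through APIs in the same package
```

Each result is a square app-by-app matrix, so it could in principle be passed to an SVM that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`.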

"What Makes Wine Fine?"

In my academic studies, I worked with a partner to analyze trends in wine review data and determine the characteristics of different wines. We looked for correlations between aspects of a wine (such as origin, tasting notes, and age) and its rating. We then used these findings to build a dashboard, written in JavaScript, CSS, and HTML, that visualizes the vintage, tasting notes, expected price, and origin of different wine varieties to help novices choose a wine. A sample of the dashboard is shown in Figure 2 below, and the fully functioning website is available at https://lamccart.github.io.


Figure 2: A sample of the charts featured on the dashboard showing the tasting notes and expected price for Chardonnay wine.

"What Regions of San Diego Require Better Collision Response?"

In my academic studies, I used Tableau to analyze a dataset of collisions recorded by the San Diego Police Department. After cleaning the data and handling missing values, I explored the collisions through geographical plots like the one in the upper left of Figure 3 below, which shows the number of people injured and killed in collisions by police division. I found that certain regions had significantly higher numbers of people injured and set out to answer the question, "Which areas of San Diego require better collision response?" Using Tableau, I created the plot in the bottom left of Figure 3, a heat map of the average number of people killed in each town of San Diego. The table on the right lists all of the areas serviced by police whose injury rates fall in the upper half of the heat map's color gradient.


Figure 3: Tableau dashboard of the plots and tables created using data concerning vehicle collisions in San Diego

"Video Game Sales and Ratings"

In my academic studies, I worked in a group of three to analyze a real-world dataset and identify the most profitable genre and rating of video game. We found correlations between game sales and variables such as user score, critic score, and genre (shown in Figure 4 below). We then graphed our conclusions using Python and found that mature-rated games in the platforming genre were higher grossing worldwide. I worked collaboratively with my teammates to form a hypothesis and a plan for testing it. We also went through several rounds of trial and error in designing code and graphics that convey our message clearly and concisely to a general audience.

Figure 4: Number of Games Sold for Each Genre of Video Game
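As a rough illustration of the kind of correlation check described above, the sketch below computes a Pearson correlation coefficient in plain Python. The `pearson` helper and the sample numbers are made up for demonstration and are not the project's code or results; a project like this might just as well have used a library routine.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up illustrative values, NOT data from the project.
critic_scores = [60, 70, 80, 90]
global_sales = [1.0, 2.0, 3.0, 4.0]  # hypothetical units (millions of copies)

r = pearson(critic_scores, global_sales)
```

A value of `r` near 1 or -1 indicates a strong linear relationship, while a value near 0 indicates little linear relationship between the two variables.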

"Ants vs. SomeBees"

Another project I worked on in my studies was a video game in the tower defense style of "Plants vs. Zombies," but with ants and bees. I built the game with a partner, and we worked closely with an API to create sprites and classes for the different ants and bees (shown in Figure 5 below). We used Python to implement concepts we learned in the classroom, such as object-oriented programming, class inheritance, and abstract data types like linked lists, stacks, and queues. My partner and I wrote all of the code side by side, and both the writing of new code and the debugging were done collaboratively.

Figure 5: GUI used to run classes and functions implemented for the game mechanics.
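The object-oriented ideas mentioned above can be sketched roughly as follows. This is a simplified, hypothetical class hierarchy written for illustration: the class names, attributes, and the use of `collections.deque` as a queue of bees are assumptions, not the game's actual code.

```python
from collections import deque


class Insect:
    """Base class holding behavior shared by ants and bees."""

    def __init__(self, health):
        self.health = health

    def reduce_health(self, amount):
        self.health -= amount

    @property
    def is_alive(self):
        return self.health > 0


class Bee(Insect):
    damage = 1

    def sting(self, ant):
        ant.reduce_health(self.damage)


class Ant(Insect):
    food_cost = 0

    def action(self, bees):
        """Default ants do nothing on their turn; subclasses override this."""


class ThrowerAnt(Ant):
    """Inherits health handling from Insect and overrides Ant.action."""

    food_cost = 3
    damage = 1

    def action(self, bees):
        # Drop defeated bees from the front of the queue, then hit the next one.
        while bees and not bees[0].is_alive:
            bees.popleft()
        if bees:
            bees[0].reduce_health(self.damage)


bees = deque([Bee(health=2), Bee(health=1)])
thrower = ThrowerAnt(health=1)
for _ in range(3):
    thrower.action(bees)
```

Here `ThrowerAnt` reuses `reduce_health` and `is_alive` from `Insect` through inheritance, and the deque acts as a first-in, first-out queue of approaching bees.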

"Product Ratings with Scalable Analytics"

In my academic studies as a Data Scientist, I worked on a project that used a Spark cluster and Spark modules to clean a dataset of products and reviews, each with millions of entries. The cleaning tasks included calculating statistics on users and products, imputing null values, and flattening entries with either Map datatypes or array-of-arrays datatypes, all in a scalable and computationally efficient manner. I also engineered features using one-hot encoding and PCA, as well as word2vec, to extract meaning from categorical and text features. These features were then fed to a Decision Tree Regressor to predict a user's rating for a product, and several hyperparameters were tuned to find the best-performing configuration. In addition, all functions were timed: the cleaning tasks, along with the entire pipeline, ran in under 15 minutes on over 9 million entries.
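The logic of two of these cleaning steps, plus the function timing, can be sketched in plain Python. This is a stand-in that runs without a cluster, not the project's Spark code: the `timed`, `impute_mean`, and `flatten` names are hypothetical, and mean imputation is an assumed strategy. On a Spark DataFrame, comparable built-ins would be `DataFrame.fillna` and `pyspark.sql.functions.flatten`.

```python
import time
from functools import wraps
from statistics import mean


def timed(func):
    """Record how long the most recent call to func took, in seconds."""

    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        wrapper.last_seconds = time.perf_counter() - start
        return result

    wrapper.last_seconds = None
    return wrapper


@timed
def impute_mean(values):
    """Replace None entries with the mean of the known entries."""
    known = [v for v in values if v is not None]
    if not known:
        return list(values)  # nothing to impute from
    fill = mean(known)
    return [fill if v is None else v for v in values]


@timed
def flatten(nested):
    """Flatten an array of arrays by one level."""
    return [item for inner in nested for item in inner]


ratings = impute_mean([4.0, None, 2.0])
categories = flatten([["Books", "Fiction"], ["Audio"]])
```

After each call, `impute_mean.last_seconds` and `flatten.last_seconds` hold the elapsed time, which is one simple way to time every function in a pipeline.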
