
Data Scientist @ Day & Zimmermann

Part 1: Data Acquisition (Web Scraping)

To build a live prediction model, you need a data source that contains both historical and current data. Finding a good data source is key.

The site(s) or source(s) you choose will determine how much of the process you can automate.

Topics Covered:
- Determining which sites allow scraping, from their robots.txt files
- Scraping HTML tables with Python
- Advanced scraping: inspecting how data is communicated to a website
- Storing the data
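As a preview of the table-scraping topic, here is a minimal sketch using pandas' `read_html`, which parses every `<table>` on a page into a list of DataFrames. The inline HTML below is a made-up stand-in for a stats page; in practice you would pass the URL of the page you plan to scrape (note that `read_html` requires an HTML parser such as lxml to be installed).

```python
from io import StringIO

import pandas as pd

# Stand-in for a scraped stats page; player names and points are invented.
html = """
<table>
  <tr><th>Player</th><th>Points</th></tr>
  <tr><td>QB One</td><td>24.5</td></tr>
  <tr><td>RB Two</td><td>18.1</td></tr>
</table>
"""

# read_html returns one DataFrame per <table> found; it also accepts a URL.
tables = pd.read_html(StringIO(html))
df = tables[0]
print(df)
```

The first row of each table becomes the column headers, and numeric columns are parsed automatically.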

Can you scrape this site?

To check whether a site allows web scraping, you need to check that site's robots.txt file. To find it, go to the website's domain name…
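The standard library can do this check for you. Below is a minimal sketch using `urllib.robotparser`; the rules are inlined for the example, but against a real site you would call `rp.set_url("https://<domain>/robots.txt")` followed by `rp.read()` (the domain and paths here are placeholders).

```python
from urllib.robotparser import RobotFileParser

# Inline a tiny robots.txt for illustration; a real check would fetch
# the file from the site with set_url(...) and read().
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

# can_fetch(user_agent, url) answers "am I allowed to scrape this path?"
print(rp.can_fetch("*", "https://example.com/stats"))      # allowed
print(rp.can_fetch("*", "https://example.com/private/x"))  # disallowed
```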

Ever wanted to build your own predictive model to create optimal DFS (daily fantasy sports) lineups using data science and machine learning?

This article will be your road map, connecting the various processes needed to develop predictive player models and optimize your predictions. It is the hierarchical structure for several other posts that together will have you making predictions in no time.

Each section has its own article, with this one serving as the hub tying them all together. The data and Python code repo will be linked in each article. The main repo is here.

- Data Acquisition (Web Scraping)…

Part 5: Lineup Optimization

The last step in building your predictive DFS model is using linear programming to figure out which lineups are best based on your predictions. For this we will use the Python package PuLP. This article covers how to add salary and position constraints, as well as additional constraints to modify the lineups.

The data used is located here. Python Code is located here.

Let's import our modules and load our data. …
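The constraint setup described above can be sketched as follows. This is a toy instance, not the article's actual optimizer: the player pool, salaries, and projections are invented, the salary cap is shrunk to fit, and the roster is reduced to one QB and one RB so the output is easy to verify.

```python
from pulp import LpMaximize, LpProblem, LpVariable, lpSum, value, PULP_CBC_CMD

# Toy player pool -- names, salaries, and projected points are made up
# purely for illustration; the real data comes from the linked repo.
players = {
    "QB_A": {"salary": 7000, "proj": 22.0, "pos": "QB"},
    "QB_B": {"salary": 5500, "proj": 17.5, "pos": "QB"},
    "RB_A": {"salary": 8000, "proj": 20.0, "pos": "RB"},
    "RB_B": {"salary": 4500, "proj": 11.0, "pos": "RB"},
}
SALARY_CAP = 12000

prob = LpProblem("lineup", LpMaximize)
pick = {p: LpVariable(p, cat="Binary") for p in players}

# Objective: maximize total projected points of the chosen lineup.
prob += lpSum(pick[p] * players[p]["proj"] for p in players)
# Salary-cap constraint.
prob += lpSum(pick[p] * players[p]["salary"] for p in players) <= SALARY_CAP
# Position constraints: exactly one QB and one RB in this toy roster.
prob += lpSum(pick[p] for p in players if players[p]["pos"] == "QB") == 1
prob += lpSum(pick[p] for p in players if players[p]["pos"] == "RB") == 1

prob.solve(PULP_CBC_CMD(msg=0))
lineup = sorted(p for p in players if value(pick[p]) > 0.5)
print(lineup)
```

Here the cap rules out the two highest-projected players together, so the solver trades down at RB and selects QB_A with RB_B.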

Part 4: Live Model Implementation

Now that you have built your dataset, created some features, and trained and tuned your model, you need to bring it all together to create your live prediction: your prediction before an event occurs, in this case NFL Sunday.

For this to happen, we need to gather the necessary features for the upcoming week to make predictions on. This is why we used the .shift() function during ETL: so we can make predictions on the current week with the previous weeks' data.
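To illustrate the role of `.shift()`, here is a minimal sketch on an invented weekly stat line for a single player. Shifting pushes each stat down one row, so the feature row for week N holds week N-1's production, which is exactly what is known before kickoff.

```python
import pandas as pd

# Toy weekly stats for one player; the values are invented for illustration.
df = pd.DataFrame({
    "week":   [1, 2, 3, 4],
    "points": [18.0, 22.5, 9.0, 30.0],
})

# shift(1) moves each value down one row: week N's feature is week N-1's
# actual result, so no future information leaks into the training row.
df["points_prev_week"] = df["points"].shift(1)
print(df)
```

With multiple players in one frame you would shift within each player's history, e.g. `df.groupby("player")["points"].shift(1)`, so one player's stats never leak into another's rows.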

The data used is located here. Python Code is located here.

The first thing we need to do is update…

Part 3: Model Selection & Analysis

Now that we have a feature set, we will try out some models, analyze the results, and come up with a game plan for predicting next week's results.

The data used is located here. Python Code is located here.

In this section we will build predictive models based on the quarterbacks in our dataset. We will try two popular boosting machine learning algorithms: XGBoost and LightGBM.

Our target variable will be the QB's DraftKings points-scored rank for a given week. You can easily swap in a different target variable to predict who will throw the most TDs, run for the most yards, etc.

Part 2: ETL & Feature Engineering

Now that we have a dataset, it is time to shape and manipulate the data into a state from which we can create a predictive model.

The data used is located here. Python Code is located here.

Based on the data we have, a model for each position seems appropriate. In this article we will go over the ETL needed to create a dataset for predicting a running back's fantasy rank in a given week. The linked file includes the transformations necessary for each position, with their defensive components as well (RB vs. defense).
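The "RB vs. defense" pairing boils down to a join: attach each opponent's defensive stats to the running back's row for that week. Here is a minimal sketch; the frames and column names (`opponent`, `rush_yds_allowed`, etc.) are assumptions for illustration, not the repo's actual schema.

```python
import pandas as pd

# Hypothetical running-back weekly stats.
rbs = pd.DataFrame({
    "week": [1, 1],
    "player": ["RB_A", "RB_B"],
    "opponent": ["DEF_X", "DEF_Y"],
    "rush_yds": [85, 40],
})

# Hypothetical defensive stats for the same week.
defs = pd.DataFrame({
    "week": [1, 1],
    "defense": ["DEF_X", "DEF_Y"],
    "rush_yds_allowed": [120, 95],
})

# Left-join so every RB row gains its opponent's defensive numbers.
merged = rbs.merge(
    defs,
    left_on=["week", "opponent"],
    right_on=["week", "defense"],
    how="left",
)
print(merged)
```

The same join, repeated per position, produces the position-specific datasets the article describes.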

We have two main…

Taylor Monticelli
