Legal Innovation & Technology Lab
@ Suffolk Law School

Data Wrangling and Feature Engineering

Estimated Time (Reading & Exercises): ~45 min.

You are welcome to join our Slack Team. There you can ask and answer questions relating to this lesson under the #howto-datasci channel. See How To for more.

As mentioned previously in Why Data Science, data scientists make use of a number of open source tools. There is some debate over which tools one should use, but the primary fracture comes down to R vs. Python. Of course, as lawyers we know the answer really is "it depends." For our purposes we'll be using Python, largely because it is a programming language in its own right, and we'll be using this as an opportunity to learn some coding as well. Don't worry, we assume no prior programming experience on your part. We do, however, expect you to follow the Lab rule regarding tech issues: if you set out to do something and find yourself hitting your head against a wall for 30-plus minutes, ask for help in the appropriate Slack channel.

To help ease us into coding, we'll be making use of notebooks. Notebooks are documents accessed by your web browser which contain blocks of code and text, allowing you to run code next to notes and documentation. Normally, notebooks live on your local computer and just happen to use the browser as an interface. However, to avoid the troubleshooting that comes with having folks install something on a bunch of different computers, we'll be using a cloud-based version of Jupyter Notebooks from Microsoft.
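
To give a rough sense of what a block of code in a notebook looks like, here is a minimal sketch of a single Python cell. It is not taken from the lesson's notebook; the variable names and numbers are made up purely for illustration.

```python
# A notebook "cell" is a small chunk of Python you can run on its own.
# The names and values below are hypothetical, for illustration only.
minutes_stuck = [10, 25, 40]  # time spent on three made-up tech problems

# Flag any problem that has reached the Lab's 30-minute "ask for help" mark.
needs_help = [minutes >= 30 for minutes in minutes_stuck]

print(needs_help)  # prints [False, False, True]
```

In a notebook, running the cell shows its printed output directly beneath the code, with any notes sitting alongside it in text blocks.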

To access the notebooks associated with this exercise, you'll need a Microsoft account. If you don't have one, create one here. If you haven't already, you may want to follow the suggestions found under Get Your Accounts in Order.

Visit this collection of files (library) we've put on Azure Notebooks for this lesson, and click Clone.

This will prompt you to log in with your Microsoft account, grant Azure permissions, and create a copy of the files that you can actually play with (not just read). You should then be brought to your copy of the files, which the site calls a library. Find the file named Wrangling.ipynb and click it.

This will spin up a virtual server running the notebook, and inside the notebook you will find the actual lesson with tips for wrangling your data, including instructions meant to acquaint you with the use of notebooks and Python. Just follow along.
