Author Archives: Will


Dean Wampler at Data Science at Scale with Spark

Meet-Up Recap: Data Science at Scale with Spark

Summary: Dean Wampler from Lightbend presented at the Direct Supply MSOE offices on Tuesday, 4/5/2016.  Dean gave a high-level overview of Spark and its benefits: the code stays focused on business logic, and Spark is faster.  Those wanting to learn more should pick up Learning Spark from O’Reilly.


Build a Reporting Swipe File

Summary: Building a repository of good report components helps you quickly assemble reports that work.  Typical things to collect are opening statements, summary sections, key takeaways, useful dimensions and metrics, and recommendations.

Core Elements of Reports

Word cloud generated in R for Brothers Grimm stories

Text Mining Packages and Options in R

Summary: The tm and lsa packages let you turn your text data into a term-document matrix and create new numeric features.  The ngram package lets you find frequent word patterns (e.g. “The cow” is a bi-gram or 2-gram; “The cow said” is a tri-gram or 3-gram).  Lastly, for a quick visualization (though […]
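
To make the n-gram and term-document matrix ideas concrete, here is a minimal sketch in plain Python. It is not the API of the tm, lsa, or ngram packages; the helper names `ngrams` and `term_document_matrix` and the two sample sentences are made up for illustration, and the tokenizer is a simple lowercase-and-split-on-whitespace assumption.

```python
from collections import Counter


def ngrams(text, n):
    """Return the n-grams in text as tuples of lowercased words."""
    words = text.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def term_document_matrix(docs):
    """Return (terms, matrix), where matrix[i][j] counts term i in document j."""
    counts = [Counter(doc.lower().split()) for doc in docs]
    terms = sorted(set().union(*counts))
    matrix = [[c[term] for c in counts] for term in terms]
    return terms, matrix


# Hypothetical sample documents, used only for illustration.
docs = ["The cow said moo", "The cow ate grass"]

# Count every bi-gram across all documents.
bigrams = Counter(bg for doc in docs for bg in ngrams(doc, 2))

# One row per term, one column per document.
terms, matrix = term_document_matrix(docs)
```

Each row of `matrix` is a numeric feature vector for one term, which is roughly the kind of structure a term-document matrix gives you to work with.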


Free Data Mining and Data Science Books

I’ve been on a bit of a reading kick lately, so I wanted to compile a short list of some useful and free data mining / data science books.  Most are of a technical nature and come from academia.

Free Academic Texts on Data Mining

An Introduction to Statistical Learning with Applications in R: Covers […]