Applied Text Analysis with Python

Enabling Language-Aware Data Products with Machine Learning

Benjamin Bengfort, Rebecca Bilbro & Tony Ojeda (2017)


E-Book: 73 pages

Price: Free




The programming landscape of natural language processing has changed dramatically in the past few years. Machine learning approaches now require mature tools like Python’s scikit-learn to apply models to text at scale. This practical guide shows programmers and data scientists who have an intermediate-level understanding of Python and a basic understanding of machine learning and natural language processing how to become more proficient in these two exciting areas of data science.

This book presents a concise, focused, and applied approach to text analysis with Python, and covers topics including text ingestion and wrangling, basic machine learning on text, classification for text analysis, entity resolution, and text visualization. Applied Text Analysis with Python will enable you to design and develop language-aware data products.

You’ll learn how and why machine learning algorithms make decisions about language to analyze text; how to ingest, wrangle, and preprocess language data; and how the three primary text analysis libraries in Python work in concert.


About the Authors

Benjamin Bengfort is a data scientist who lives inside the Beltway but ignores politics (the normal business of DC), favoring technology instead. He is currently working to finish his PhD at the University of Maryland, where he studies machine learning and distributed computing. His lab does have robots (though this field of study is not one he favors) and, much to his chagrin, they seem to constantly arm said robots with knives and tools, presumably to pursue culinary accolades. Having seen a robot attempt to slice a tomato, Benjamin prefers his own adventures in the kitchen, where he specializes in fusion French and Guyanese cuisine as well as BBQ of all types. A professional programmer by trade and a data scientist by vocation, Benjamin writes on a diverse range of subjects, from natural language processing to data science with Python to analytics with Hadoop and Spark.

Rebecca Bilbro is a data scientist at the U.S. Department of Commerce Data Service. She specializes in data visualization for machine learning and has given several talks related to improving the model selection process with visualization.

Tony Ojeda is the founder of District Data Labs and focuses on applied analytics for business strategy. He has published a book on practical data science and has experience with hands-on education and data science curricula.