Make News Credible Again

Using Machine Learning to Classify News Articles

Approach

Our goal was to build a model to discern the credibility of an article based solely on its textual content.

Data Collection

Collect news articles from a set of credible and non-credible websites. Get training labels from OpenSources, a professionally curated database.

Sampling

Sample from our corpus so that, for each day of data collection, the training set contains an equal number of unique articles from credible and non-credible sources.
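The balancing step can be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: the record fields (`date`, `credible`, `text`) and the function name `balanced_daily_sample` are assumptions made for the example, and "unique" is approximated here by an exact match on article text.

```python
import random
from collections import defaultdict


def balanced_daily_sample(articles, seed=0):
    """Return a sample with equal credible / non-credible counts per day.

    `articles` is a list of dicts with hypothetical keys:
    'date' (str), 'credible' (bool), and 'text' (str).
    """
    rng = random.Random(seed)
    seen_texts = set()
    by_day = defaultdict(lambda: {True: [], False: []})

    for article in articles:
        # Assumption: duplicates are detected by identical text.
        if article["text"] in seen_texts:
            continue
        seen_texts.add(article["text"])
        by_day[article["date"]][article["credible"]].append(article)

    sample = []
    for day in sorted(by_day):
        groups = by_day[day]
        # The smaller class sets how many articles each class contributes.
        n = min(len(groups[True]), len(groups[False]))
        for label in (True, False):
            sample.extend(rng.sample(groups[label], n))
    return sample
```

Under this sketch, a day on which only one class was collected contributes nothing to the training set, since the smaller class has zero articles.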

Classifier

Build an ensemble classifier that considers the predictions of two separate models:
1. "Content-only" model (Multinomial Naive Bayes)
2. "Context-only" model (Adaptive Boosting)
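One plausible way to combine the two models is a weighted average of their predicted probabilities. The sketch below assumes each model outputs a probability that the article is credible; the combination rule, the function names, the default weights, and the 0.5 threshold are all assumptions for illustration, not a description of the project's implementation.

```python
def ensemble_probability(p_content, p_context, w_content=0.5, w_context=0.5):
    """Weighted average of two models' estimates of P(credible)."""
    total = w_content + w_context
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return (w_content * p_content + w_context * p_context) / total


def ensemble_label(p_content, p_context, w_content=0.5, w_context=0.5,
                   threshold=0.5):
    """Turn the combined probability into a label (threshold is an assumption)."""
    p = ensemble_probability(p_content, p_context, w_content, w_context)
    return "credible" if p >= threshold else "non-credible"
```

With equal weights, each model has the same say; shifting weight toward one model lets it override the other when the two disagree.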

Daily Learning

Each classifier is retrained daily and evaluated with cross-validation to obtain an updated accuracy score. These scores are then used to update each model's weight in the final ensemble classifier.
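The daily update can be illustrated with a small sketch. It assumes a simple k-fold scheme and assumes that ensemble weights are obtained by normalizing the accuracy scores so they sum to 1; neither detail is stated above, and `k_fold_accuracy`, `accuracy_weights`, and `train_fn` are hypothetical names.

```python
def k_fold_accuracy(train_fn, examples, labels, k=5):
    """Mean accuracy over k folds.

    `train_fn(X, y)` is a hypothetical callable that trains a model and
    returns a `predict(x)` function.
    """
    n = len(examples)
    if n < k:
        raise ValueError("need at least k examples")
    scores = []
    for fold in range(k):
        # Every k-th example, starting at `fold`, is held out for testing.
        test_idx = set(range(fold, n, k))
        train_X = [examples[i] for i in range(n) if i not in test_idx]
        train_y = [labels[i] for i in range(n) if i not in test_idx]
        predict = train_fn(train_X, train_y)
        correct = sum(predict(examples[i]) == labels[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / k


def accuracy_weights(accuracies):
    """Normalize per-model accuracies into weights that sum to 1."""
    total = sum(accuracies.values())
    if total <= 0:
        raise ValueError("at least one accuracy must be positive")
    return {name: acc / total for name, acc in accuracies.items()}
```

Run once per day after retraining, the second function would give the more accurate model a proportionally larger share of the final vote.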

A more detailed discussion of how we handled specific challenges throughout the course of this project can be found on Medium.

Team of Data Enthusiasts

About Us

Meet the team behind the project. Graduates of UC Berkeley's Master of Information and Data Science program, we are keen on developing machine learning solutions to real-world problems.

Umber Singh

Umber works on the Pricing & Product Strategy team at Salesforce, residing in San Francisco, CA.

Brennan Borlaug

Brennan works in urban transportation research and analytics at the National Renewable Energy Laboratory. He lives in Boulder, CO.

Sasanka (Sashi) Gandavarapu

Sashi has experience in building and deploying impactful and scalable analytics using machine learning and data science. He currently resides in Atlanta, GA.

Talieh Hajzargarbashi

Talieh has a PhD in Mechanical Engineering from the University of Arizona and works as a Lead Data Analyst at Opera Solutions.