Keeping up with the news is critical when trading stocks. Combing through news outlets and constantly separating noise from stories with material impact is difficult to do manually.
A great way to streamline this process is to scrape news headlines from Google and build a funnel that filters down the results. Here’s some code I’ve built to help you easily scrape news from Google.
How the Code Works
- The code takes a list of keywords you want to search for
- For each keyword search, it returns the headline, the link to the article, a Python datetime object for when the article was posted, the source of the article, your search term, and a 'google' label for the feed
Install the required Python libraries
pip install pytz
pip install feedparser
import feedparser
import pytz
from datetime import datetime
from urllib.parse import quote_plus


def google_rss(keywords):
    # Google RSS News: return one tuple per article for each keyword

    def parse_rss(rss_url):
        return feedparser.parse(rss_url)

    def get_headlines(feed):
        headlines = []
        for item in feed['items']:
            headlines.append(item['title'])
        return headlines

    def get_links(feed):
        links = []
        for item in feed['items']:
            links.append(item['link'])
        return links

    def get_pub_dates(feed):
        pub_dates = []
        for item in feed['items']:
            pub_dates.append(item['published'])
        return pub_dates

    def get_sources(feed):
        sources = []
        for item in feed['items']:
            sources.append(item['source']['title'])
        return sources

    headline = []
    link = []
    pub_date = []
    source = []
    rss_source = []
    keyword = []

    # Time zones used to convert the feed's GMT timestamps to US Eastern
    gmt = pytz.timezone('GMT')
    eastern = pytz.timezone('US/Eastern')

    for x in keywords:
        # URL-encode the keyword so multi-word searches produce a valid URL
        url = 'https://news.google.com/rss/search?q=' + quote_plus(x) + '+news'
        # Parse the feed once per keyword so the four lists stay aligned
        feed = parse_rss(url)

        for a in get_headlines(feed):
            headline.append(a)
            rss_source.append('google')
            keyword.append(x)

        for b in get_links(feed):
            link.append(b)

        for c in get_pub_dates(feed):
            # Drop the trailing ' GMT', attach the GMT zone, then convert
            date = datetime.strptime(c[:-4], '%a, %d %b %Y %H:%M:%S')
            date_gmt = gmt.localize(date)
            date_eastern = date_gmt.astimezone(eastern)
            pub_date.append(date_eastern)

        for d in get_sources(feed):
            source.append(d)

    # Each row: (headline, link, pub_date, source, keyword, rss_source)
    rss_list = list(zip(headline, link, pub_date, source, keyword, rss_source))
    return rss_list
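As a first stage of the funnel, you might keep only recent articles and sort them newest first. The sketch below is one possible way to do that, not part of the scraper itself. The helper name `recent_first`, its `max_age` parameter, and the sample rows are all made up for illustration. It assumes each row follows the tuple layout that google_rss returns (headline, link, publish date, source, keyword, feed label) and that `now` is a timezone-aware datetime, since the scraper's dates are timezone-aware.

```python
from datetime import datetime, timedelta, timezone


def recent_first(rows, now, max_age=timedelta(hours=24)):
    """Keep rows published within max_age of now, newest first.

    Each row is assumed to follow the google_rss tuple layout:
    (headline, link, pub_date, source, keyword, rss_source).
    """
    cutoff = now - max_age
    fresh = [row for row in rows if cutoff <= row[2] <= now]
    return sorted(fresh, key=lambda row: row[2], reverse=True)


# Made-up rows for illustration only; real rows would come from google_rss().
now = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)
sample_rows = [
    ("Old headline", "https://example.com/a",
     now - timedelta(days=3), "Source A", "AAPL", "google"),
    ("Fresh headline", "https://example.com/b",
     now - timedelta(hours=2), "Source B", "AAPL", "google"),
    ("Newest headline", "https://example.com/c",
     now - timedelta(minutes=30), "Source C", "MSFT", "google"),
]

for headline, link, pub_date, source, keyword, rss_source in recent_first(sample_rows, now):
    print(pub_date.isoformat(), keyword, headline)
```

With live data you would pass the list returned by google_rss in place of `sample_rows` and something like `datetime.now(timezone.utc)` for `now`.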
Here are my thoughts on where to go next to improve this project.
- Automatically populate a keyword list based on the symbols of my portfolio holdings
- Correlate news with price movement to determine if the news had an impact on the corresponding stock price
- Filter out duplicate headlines from multiple sources
- Filter out news that had no notable impact on the stock price
- Produce a weekly summary of the most notable articles, ranked by 1) impact on the market or the stock price or 2) popularity of the article
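The duplicate-headline idea in the list above could start out as something like the rough sketch below. It is only one possible approach, and both function names are hypothetical. It assumes each row is a tuple with the headline first, and that headlines may end with a " - Publisher" suffix, which is dropped before comparing. Headlines that are worded differently for the same story will not be caught.

```python
import re


def normalize_headline(title):
    """Reduce a headline to a comparison key.

    Assumes a trailing ' - Publisher' suffix, which is dropped so the same
    story listed under different source names can match.
    """
    base = title.rsplit(" - ", 1)[0]
    # Lowercase, then collapse punctuation and whitespace into single spaces
    return re.sub(r"[^a-z0-9]+", " ", base.lower()).strip()


def drop_duplicate_headlines(rows):
    """Keep the first row seen for each normalized headline (row[0])."""
    seen = set()
    unique = []
    for row in rows:
        key = normalize_headline(row[0])
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Made-up rows for illustration only.
sample_rows = [
    ("Acme beats earnings estimates - Example Wire", "https://example.com/1"),
    ("Acme Beats Earnings Estimates - Another Outlet", "https://example.com/2"),
    ("Acme announces buyback - Example Wire", "https://example.com/3"),
]

for row in drop_duplicate_headlines(sample_rows):
    print(row[0])
```

Keeping the first occurrence is an arbitrary choice; you could just as easily keep the earliest-published copy or prefer a particular source.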
Stay tuned to the blog for more updates on this project.