Web crawling

Thursday Extra: "Computational linguistics: crawling the Web for non-English data"

On Thursday, September 19, Kim Spasaro 2014 will discuss the construction of a digital collection of written text in a specific language. She writes:

This summer I interned with Carnegie Mellon's Language Technologies Institute. While there, I was part of a project working to enable machine translation for Bantu languages. More specifically, I was responsible for building a corpus of Kinyarwanda phrases to be used for machine learning. In this talk, I will discuss how I used the Apache Nutch web crawler to launch a large-scale web crawl in search of Kinyarwanda data.

Refreshments will be served at 4:15 p.m. in the Computer Science Commons (Noyce 3817). The talk, “Computational linguistics: crawling the Web for non-English data,” will follow at 4:30 p.m. in Noyce 3821. Everyone is welcome to attend!
