big data

Thursday Extra 11/3: Developing bioinformatics tools for analysis of big DNA sequence data

Thursday, November 3, 2016
4:15 p.m. in Science 3821
Refreshments at 4:00 p.m. in the Computer Science Commons (Science 3817)

Xiaoqiu Huang, Professor of Computer Science at Iowa State University, will present "Developing and using bioinformatics tools for analysis of big DNA sequence data."

Recent advances in next-generation sequencing technology provide an opportunity to develop and use bioinformatics tools for analysis of big DNA sequence data in order to further our understanding of living systems at the molecular level. In this talk, Huang will describe his recent work in developing and using bioinformatics tools to further our understanding of how genetic variation is generated in an asexual plant pathogen.

Huang's previous research interests include development of computer algorithms and software for reconstruction of genome sequences and for finding genes and other functional elements in genomes. He is currently interested in understanding evolutionary processes by applying these computer programs to big data sets of genomic DNA sequences. He is the author of the widely used CAP3 assembly program. He and his collaborators have developed a whole-genome assembly program named PCAP, which has been used by the Washington University Genome Center in the chimpanzee and chicken genome projects.

CS Table 9/13: Data Privacy in Higher Ed

For the CS Table on September 13, Peter-Michael Osera would like to discuss data collection and privacy in a place we normally don’t consider: higher education. In efforts to streamline operations and better the student experience via data analytics, universities are frequently turning to the cloud for answers. Does this have implications for how we as students and faculty manage our data? Read these two articles:

Printed copies of the readings will be available after noon on Friday at Charlie Curtsinger’s office (Noyce 3827). In addition to the readings, there is a short “homework” activity to get you in the spirit of the discussion. Try to answer these brief questions:

  1. What is FERPA, and how does it relate to your personal information? Read more about FERPA here: US DOE. “Family Educational Rights and Privacy Act (FERPA)”. http://www2.ed.gov/policy/gen/guid/fpco/ferpa/index.html
  2. Grinnell employs a number of third-party services that handle our (digital) data in various ways. List as many as you can.
  3. Why can these third-party services handle sensitive data that would otherwise be protected by FERPA? You can find the answer in this FERPA FAQ: US DOE. “FERPA Frequently Asked Questions (FAQ)”. http://familypolicy.ed.gov/faq-page#t62n218

Completing this homework isn’t required to attend CS Table, but we will start the discussion by tackling these questions, so please come prepared if you have the time!

    Computer science table (CS Table) is a weekly meeting of Grinnell College community members (students, faculty, staff, etc.) interested in discussing topics related to computing and computer science. CS Table meets Tuesdays from 12:00-1:00pm in JRC 224B. Contact the CS faculty for the weekly reading. Students on meal plans, faculty, and staff are expected to cover the cost of their meals. Visitors to the College and students not on meal plans can charge their meals to the department.

CS Table: Privacy, Anonymity, and Big Data in the Social Sciences

On Friday, 26 September 2014, at CS Table, we will consider some recent ethical issues with the use of "Big Data" in social sciences research, including data from xMOOCs (massive open online courses). Our readings will include a short article from The Atlantic on the recent Facebook controversy and a CACM article on uses of xMOOC data.

Sara M. Watson. Data Science: What the Facebook Controversy is Really About. The Atlantic. July 1, 2014. Available online at http://www.theatlantic.com/technology/archive/2014/07/data-science-what-the-facebook-controversy-is-really-about/373770/.

Facebook has always “manipulated” the results shown in its users’ News Feeds by filtering and personalizing for relevance. But this weekend, the social giant seemed to cross a line, when it announced that it engineered emotional responses two years ago in an “emotional contagion” experiment, published in the Proceedings of the National Academy of Sciences (PNAS).

Since then, critics have examined many facets of the experiment, including its design, methodology, approval process, and ethics. Each of these tacks tacitly accepts something important, though: the validity of Facebook’s science and scholarship. There is a more fundamental question in all this: What does it mean when we call proprietary data research data science?

As a society, we haven't fully established how we ought to think about data science in practice. It's time to start hashing that out.

Jon P. Daries, Justin Reich, Jim Waldo, Elise M. Young, Jonathan Whittinghill, Andrew Dean Ho, Daniel Thomas Seaton, and Isaac Chuang. 2014. Privacy, anonymity, and big data in the social sciences. Commun. ACM 57, 9 (September 2014), 56-63. DOI=10.1145/2643132 http://doi.acm.org/10.1145/2643132.

Open data has tremendous potential for science, but, in human subjects research, there is a tension between privacy and releasing high-quality open data. Federal law governing student privacy and the release of student records suggests that anonymizing student data protects student privacy. Guided by this standard, we de-identified and released a data set from 16 MOOCs (massive open online courses) from MITx and HarvardX on the edX platform. In this article, we show that these and other de-identification procedures necessitate changes to data sets that threaten replication and extension of baseline analyses. To balance student privacy and the benefits of open data, we suggest focusing on protecting privacy without anonymizing data by instead expanding policies that compel researchers to uphold the privacy of the subjects in open data sets. If we want to have high-quality social science research and also protect the privacy of human subjects, we must eventually have trust in researchers. Otherwise, we'll always have the strict tradeoff between anonymity and science illustrated here.

Printed copies of the readings are available next to Science 3821.

Computer science table is a weekly meeting of Grinnell College community members (students, faculty, staff, etc.) interested in discussing topics related to computing and computer science. CS Table meets Fridays from 12:10-12:50 in the Day PDR (JRC 224A). Contact Sam Rebelsky (rebelsky@grinnell.edu) for the weekly reading. Students on meal plans, faculty, and staff are expected to cover the cost of their meals. Students not on meal plans can charge their meals to the department.

Computer Science Table: Privacy in the age of big data and analytics

At this week's Computer Science Table (at noon on Friday, April 18, in Rosenfield 224A), we will discuss privacy in the age of big data and analytics, and specifically the issues raised in two videos (one recent, one classic):

“Demo: Big data and analytics at work in banking”
IBM Big Data and Analytics, YouTube, September 7, 2013
http://www.youtube.com/watch?v=1RYKgj-QK4I

“Scary pizza”
American Civil Liberties Union, YouTube, January 15, 2009
https://www.youtube.com/watch?v=33CIVjvYyEk

For more extensive discussions of some of these issues, you might want to read:

“Big data and the future of privacy”
John Podesta, whitehouse.gov, January 23, 2014
http://www.whitehouse.gov/blog/2014/01/23/big-data-and-future-privacy

“Comments of the Electronic Privacy Information Center to the Office of Science and Technology Policy: Request for information: Big data and the future of privacy”
Electronic Privacy Information Center, April 4, 2014
https://epic.org/privacy/big-data/EPIC-OSTP-Big-Data.pdf

Computer Science Table is an open weekly meeting of Grinnell College community members (students, faculty, staff, etc.) interested in discussing topics related to computing and computer science.
