Undergraduate computer science, access to journals, and why we are doomed to repeat our mistakes

I’ve discovered a wonderful perk at work – a library with vast access to academic journals. Partly for work research, and partly out of my own interest, I’ve started reading (or in some cases re-reading) some of the classic works of computer science.

My most recent read was E.F. Codd’s “A Relational Model of Data for Large Shared Data Banks”. Codd’s paper succinctly lays out the advantages of relational design in easy-to-read language and puts forward a strong argument for its use. In 10 pages Codd compares relational databases with flat files, introduces relational algebra and set theory, discusses elementary issues in scalability and covers the basics of redundancy and consistency.

But this should be no surprise, because this is the paper that kicked off the relational database movement. In it, Edgar Codd (as in Boyce-Codd Normal Form) convinced the computer science world that flat files weren’t good enough for large-scale applications and that a better solution was needed.

Looking back as a relatively recent graduate of computer science, I have to ask – why have I not read this paper before? There was nothing Codd said that I hadn’t read in lecture notes based on more recent work, and it was no more or less easy to read than anything else I’d read. But this was not just any article justifying relational databases, this was the article.

Sadly, this was all too common: most of my studies were based on recent works rather than on the classics of computer science. This leads me to believe it is part of the reason we see so much repetition in computer science. Not because we are re-examining the wheel, but because we are reinventing it. At an undergraduate level I never saw the drive to explore early computer science theory (partly my own fault, but also that of my lecturers for not pointing these works out). I don’t think this was due to malice or lack of access, as many of these papers are available for free or through a university library. Mostly it is because, with the fast growth of computer science, what is new becomes old very quickly, and with so much information being generated it takes effort just to keep up. But based on my reading of Codd, I can say that there is no good reason not to look back. When trying to rationalise the use of a relational database, who better to convince a student than the man who literally convinced the world?

So, with this newfound access I’ve begun rereading these ‘forgotten’ papers, because there is still so much to learn from them. In Codd’s case, I’ve learned that there is no better source of information than its originator, and I will encourage others to read his work as well. In hindsight his work seems obvious, but its historical context and his ability to convey the idea set it apart from every other database text I’ve read. If you get a chance to read this work, which is available freely thanks to UPenn, I strongly recommend it.