Having taken the course and made my share of mistakes, here is some information I think people should know to be successful in CS 373.
Expect a divided feel between the class and the projects. To ease the transition into the class, the first two projects stay relatively in line with what is covered in lecture; once project three comes up, however, you’ll quickly notice a divide. There is homework for class quizzes in the form of readings, and there are projects where you go off on your own to learn almost whatever you want. This is great if you’re a self-learner, but if not, you could potentially make an A in the class yet take away only 50% of what you could have learned.
You will need to work as a team and as a class. Besides the first project, all following projects are either pair or team programming (groups of 5-7). You’ll also soon find that if you collaborate with the other teams, the work you do becomes easier. If you feel up to taking greater responsibility for the class, be prepared to do whatever it takes to ensure a certain amount of consistency across teams; you’ll know where you need to be consistent once the projects are underway.
Do you already know web development? The second half of the projects (note I don’t say the second half of the class, because there is a division between the class and projects) focuses heavily on creating a website in Python. If you are already familiar with web dev in general, you will have a huge advantage – and could take on leadership roles in both your team and class, as I mentioned earlier.
Overall, be prepared to think of Software Engineering as two classes somewhat melded together – one reading-oriented (involves reading English and code), one project-oriented. You can technically make an A in the class by doing well in the reading-oriented side. But you will not learn what I believe to be the most important aspect of the class.
Everything’s pretty laid back now with presentations going on. Most of the work is done (although there are still readings), and the air of presentations seems to be more relaxed than professional.
With a test coming up (in my case, 3 tests) there’s still much to do, but I’d like to spend this blog going over some pointers to other teams (including mine) for their presentations.
Over-reliance on Prezi
Prezi is interesting when you see one among a slew of PowerPoints and Keynotes. Not so when everyone uses it. At that point it becomes just another presentation, and just as reading off of slides in PowerPoint is boring, so is reading straight off the Prezi frames. Focus the presentation on the site you spent six weeks working on, not the Prezi you spent a few hours making.
Describing the visuals
We can all see what you did by just looking at the visual you provided. Don’t tell us that you made a straight-up list of hyperlinks because you didn’t have time – we noticed that from the straight-up list of hyperlinks when the page loaded – and assumed the obvious. Tell us what’s going on behind the scenes. Or if you’re using a library, describe the library rather than just saying “here’s the output from it”.
Crowding the speaker
If the intention is for one person to speak, don’t have all six people shoulder-to-shoulder at front-stage. One person is up front and the rest should be to the side – watching the presentation from the perspective of the audience, not the presenter. If the intention is for two people to speak, two people are up front and the rest are to the side, and so on. Having the whole team clustered next to one person speaking exists nowhere in a proper presentation setting. There is no need to stand right beside the speaker and look pretty unless that’s your job.
Hopefully these recommendations didn’t sound like attacks on previous presenters. In short, it is proper etiquette to give the speaker his or her deserved spotlight, and provide your audience a different experience from one of just handing out your presentation slides.
I’ve always thought of refactoring as more of an art than a science. But thanks to the works of Martin Fowler and William Opdyke that is probably no longer the case. Looks like I’ll just have to find a new term.
Why do I say this? A program’s I/O is always the same before and after refactoring (ideally), and likewise with optimization. If you black box a program, no one would be any the wiser on exactly how it does what it does. At the same time, there are many refactorings I believe should just be labeled under common sense and others where I don’t see the sense at all. I’ll only focus on one of these to spare all the electrons that will have been inconvenienced from you reading this blog.
The most obvious refactoring technique that I don’t see sense in is method extraction (I personally am not a fan of the name “Extract Method” – sure it’s short, but we’d probably want to run a “Rename Method” on it so that it flows better). I respect the right of programmers to choose to have more methods than statements in their program, but I personally find following logic five or ten methods deep extremely annoying. Code as text is not designed for that sort of extraction – code is written in a linear format when it really should be represented as a directed graph. There is no good way of ordering the methods such that a person could naturally read them in the proper order for every use case.
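To put a toy example behind this complaint (entirely hypothetical code, not from Fowler’s book or any real codebase): both functions below compute the same order total, but the second forces the reader to chase three one-line helpers just to follow three lines of arithmetic.

```python
# A hypothetical order-total calculation, written linearly.
def total_linear(prices, tax_rate, discount):
    subtotal = sum(prices)
    discounted = subtotal * (1 - discount)
    return discounted * (1 + tax_rate)

# The same logic after aggressive Extract Method: three one-line
# helpers the reader must jump through to follow the arithmetic.
def _subtotal(prices):
    return sum(prices)

def _apply_discount(amount, discount):
    return amount * (1 - discount)

def _apply_tax(amount, tax_rate):
    return amount * (1 + tax_rate)

def total_extracted(prices, tax_rate, discount):
    return _apply_tax(_apply_discount(_subtotal(prices), discount), tax_rate)

print(total_linear([10.0, 20.0], 0.08, 0.10) ==
      total_extracted([10.0, 20.0], 0.08, 0.10))  # True: identical results
```

Same I/O before and after, exactly as the black-box view of refactoring predicts – the only thing that changed is how far the reader has to jump.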
I could go on for pages, but here’s the best example I could think of to express what I mean on just this one topic (yes, I must agree I like Martin Fowler’s use of examples, even though I personally don’t like the examples themselves): let’s take a very small excerpt of Python 2’s grammar productions, which are basically how a string is comprehended by a Python interpreter:
I tried to extract the minimum required statements possible (and probably screwed up somewhere along the way) from here, and what could I write from it?
1 is 1
With the above, I can’t even add – to be honest, I don’t think I can even use numbers yet. Now try implementing an if statement. This is how I feel when reading through code where people become trigger-happy with functions and methods. Provide your methods with a sense of worth; one way of doing that is by having them do more than one line of work before returning. Otherwise it’s sort of like that friend you only call when you need something: he knows exactly why you called, and that you’ll ignore him as soon as he does it. And are all those levels of indirection really needed (ok, maybe when designing a compiler)? I’m not happy because code written this way is harder to read (at least in linear files), and the CPU isn’t happy because of all the jump instructions. Let’s at least try and keep the CPU happy.
I didn’t expect to go on a rant about Prezi until I finished this post, but I feel like I should just get it out there since I’ve kept my dislike for it bottled up over the past years.
With phase II of the WCDB out of the way, it’s time to consider the presentation (well, since my group has search done already I guess we don’t have to worry about it as much). Keeping the sense of individualism in our project, we are approaching a method where each person limits the scope of his presentation to only what he worked on for the duration of the project.
I am considering having each person use whatever form of description works best for his part of the presentation (the report could just mean pulling up a PDF document, whereas the UML will require opening MySQL Workbench or a similar tool), though this may conflict with the requirement of using a Prezi presentation. Maybe my choice is just due to an internal hatred of Prezi as a novice designer and engineer. It’s a one-trick pony design-wise, and provides almost no room for specialization and customization engineering-wise. And I’m willing to bet that at least one team will take the hackneyed “zoom out and see the overall picture at the end” approach.
In all honesty, I’ve never used the software, but that’s because I’ve seen enough presentations on it to know that I don’t want to – either that, or I’ve only seen the really bad, generic Prezis for the past couple of years of my life. I would prefer PowerPoint (or even Keynote) over Prezi any day, since anything that can be done in Prezi can be emulated in PowerPoint with enough ingenuity, and PowerPoint has a slew of features Prezi can’t even dream of implementing because of the limited scope of its design. If we had to have a web-based presentation, I’d much rather hack something together with JS than use Prezi, so I would have more control over what I wanted to do.
With the end of the first test comes the beginning of Compilers – which, along with Software Engineering, Circuit Theory, and Artificial Intelligence, should be plenty of fun for the next four weeks. Luckily, my group already implemented the WCDB’s UI and search feature in Phase I, so I won’t have to work as hard with all the other classes going on.
Other than speaking about the test (which I won’t, because I think it’s already been talked to death), there’s not much to say. My group technically hasn’t met since Phase II started. In fact, we’ve only met a total of two times since Phase I started. Maybe that should change for the future. Unfortunately, it seems like we never really have time to all meet. I’ll have more to say next iteration.
I think this blog is a good fit for answering those four questions at the end of each iteration, so that’s what I’ll go ahead and do.
What did we do well?
Speaking for the class, I’d say we made a major leap forward by mostly (if not all of us) agreeing on the same XML schema. We took that a step further by (for all practical purposes) agreeing on a database design that models the XML as well [thought-provoking aside: did Professor Downing say no other class ever agreed on a schema as a technique to get us to agree, or has no other class actually agreed on a schema before?]. Either way, I’d say this is our greatest achievement – and it would have been the most difficult part of Phase III (citation).
While we’re celebrating, let’s go ahead and add the fact that we’re pushing the class’s standards by using virtualenv and Heroku – two popular industry tools for deploying apps not only in Django but also PHP and Ruby. Knowing the pace of technology, there’s a good chance that these tools will be replaced by new standards by the time I graduate, but that’s definitely part of what makes computer science interesting. I didn’t even know people used Python for web development until I heard about Django.
What have we learned?
I can’t speak for anyone here except myself, so I’ll go ahead and give my thoughts. Besides Heroku and virtualenv (I’d heard of them before but honestly was too lazy to use them because I could get Django and PHP working perfectly fine with Apache), one of the most important skills I learned was how to integrate work produced by others. My electrical engineering professor always told me never to reinvent the wheel, and that applies to programming as well. Although a few decades ago a person could be a pioneer in everything from the transistor level to the user interface, that is no longer the case. Too many branches and details have been added to this field for it to be practical. The ability to understand how a codebase works, black-box it, and use it to its utmost potential is necessary to thrive with the amount of information available today. There is no substitute for being able to build off of what has already been done.
What can we do better?
We must remember that we are human and that our physical and mental health comes before anything else. For my team in general, I feel there has been a division into two factions: those who feel overworked and those who feel the pace is too quick to keep up with. I hope to address this issue in full with a simple solution at our meeting on Monday.
What puzzles us?
Again, I speak for myself when I say this. When I’m puzzled, I find a way to become unpuzzled. Therefore, I am never puzzled for a long time, especially after a full iteration. But I do wonder why I’m conscious – and what is consciousness – and if computers will be conscious one day. I also wonder how many blog posts will be following this format this week.
Hardly any code written these days is authored by one person. And why should it be? Some people claim that programming solo is a great opportunity to establish one’s own process, but by merely opening an IDLE window or compiling a C program, people are already participating in the global collaboration that led to the existence of programming languages in the first place. Engineers and programmers always build off each other, and even when working alone they are influenced by each other.
So why not take advantage of that observation? One of the biggest realizations I found when working in teams is not only how to break up projects into sizable chunks to work on – that can be done even when programming alone – but how to make those modular chunks parallel, so that code can be developed in multiple areas at a time. And changes in one part of the code are reflected in the others as soon as they are committed.
This idea brings to mind programming by intention – a concept, referenced in Extreme Programming Installed by Jeffries here, where code is written with the intention in mind and that intention is implemented later on. However, I feel there is a subtle difference when working in teams that develop modules in parallel, so it seems better to use the term programming by assumption. In programming by assumption, there is no need to get bogged down wondering how an intended aspect of a program will work. You know that someone else is working on it, and – if you trust them to know what they’re doing – you can already assume it’s complete. There is no need to worry about too many levels of indirection, because the code behind them is already assumed to work.
I think most people will feel the two ideas are exactly the same, so I’ll share my personal experience from this week. As a more client-side programmer on this project, I did not want to worry about the details of where my data was coming from while I was creating the HTML canvas for the home page. I knew the data would eventually come from XML, and that there were people working on getting it and putting it where I wanted it to be. Therefore, I could assume that certain parts of the code already worked. With all that important but irrelevant information out of my mind, I could devote more effort to building an enriching experience for the end user – all because I could trust the other team members to get their parts of the project done.
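A minimal sketch of what I mean (the function and field names here are hypothetical, not from our actual WCDB code): the front end is written against a data-access function someone else owns, with a stub standing in until their module lands.

```python
# Hypothetical interface owned by the back-end teammate. Until their
# XML-parsing module lands, a stub with fake data keeps the front end moving.
def get_crisis_data(crisis_id):
    # Stub: assumed to be replaced later by the real XML-backed version.
    return {"id": crisis_id, "name": "Placeholder Crisis", "regions": ["N/A"]}

# Front-end code written "by assumption": it trusts the interface above
# and never worries about where the data actually comes from.
def render_home_page(crisis_id):
    data = get_crisis_data(crisis_id)
    return "<h1>%s</h1><p>Regions: %s</p>" % (
        data["name"], ", ".join(data["regions"]))

print(render_home_page(1))
```

When the real implementation of `get_crisis_data` is committed, the front-end code doesn’t change at all – that’s the whole point of the assumption.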
Well, it looks like the world crisis project hasn’t changed much from last semester besides the migration from the Google App Engine to Django. I think that was a good idea – Django > GAE in my opinion – so I don’t mind.
I wish there were a skeleton for the XML, though. Having to wait two weeks for a solidified schema is agonizing. Maybe we will be free to construct our own models, though I don’t think there would be much time left to publish if we had to reach another consensus on that. At the moment progress is slow, since everything depends on a back end that is still undefined. Anyway, that’s all my ranting for now. I’ll probably just busy myself with circuit theory in the meantime.
. . . an hour later . . .
Ok, so rather than doing circuit theory, I read a bit about XML. Apparently, it’s for data storage and transfer. Now I’m beginning to wonder why we’re even using XML if we’ll be using MySQL later. It just seems to be a middleman that can be cut out without much loss design-wise, not to mention that MySQL can do much more. You can’t store your users’ passwords in XML unless you’re looking to build a failed product from the ground up.
. . . many hours later . . .
I’m logging when I write this so there’s an explanation for why things don’t connect as much as normal. I’d just like to add on that I found it’s not too difficult to develop on the front end while waiting for the XML/database to catch up. Getting some UI out of the way is always nice.
First of all I must say this was a good week. Lots of work was completed, and I look forward to the experiment before me with the amazing people in this class.
Now, with regards to XML – it was surprisingly simple thanks to Python’s ElementTree. Those of you who are reading this and think otherwise should probably read the documentation more carefully – and do some experimenting. I must admit this would not be the fastest method, but it is the required method and arguably the easiest. I’d rather not use recursion, but at the moment I haven’t thought of a better, more elegant way of solving the problem. I should probably stop talking before I start writing all the code for you.
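For anyone still stuck, here is roughly the kind of thing I mean – a toy sketch with made-up element names, not our actual WCDB schema: ElementTree plus a short recursive walk handles nested XML in a handful of lines.

```python
import xml.etree.ElementTree as ET

# Made-up XML for illustration; our real schema is different.
xml_text = """
<crises>
  <crisis id="1">
    <name>Example Crisis</name>
    <region>Example Region</region>
  </crisis>
</crises>
"""

def walk(element):
    # Recursively visit every element, collecting (tag, text) pairs.
    found = [(element.tag, (element.text or "").strip())]
    for child in element:
        found.extend(walk(child))
    return found

root = ET.fromstring(xml_text)
for tag, text in walk(root):
    print(tag, text)
```

The recursion mirrors the tree’s nesting directly, which is why it’s hard to beat for elegance even if it isn’t the fastest approach.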
Finally, I must admit that Professor Downing’s method of teaching Python is very interesting. Rather than taking a broad look at the language, we’re going into a lot of tiny quirks that almost seem like they’d belong on some kind of Programmer’s Game Show (no doubt the one called the midterm). I do enjoy the little intricacies, and I will thank these lectures when this knowledge helps me prevent a bug. At the moment, however, I don’t really see the immense value of the fact that there is only one empty tuple – except that it makes sense to only have one. Maybe I need some more enlightenment.
As a web developer, I find myself constantly looking for better ways to protect my users’ security, and a part of that process involves ensuring their passwords are secure – even when my website is compromised. This is how I came across “A Future-Adaptable Password Scheme”, a 1999 paper by Niels Provos and David Mazières illustrating bcrypt: a password hashing method that security and cryptography sites still hold as a paradigm to this day.
Note: Although not necessary for an appreciation of this post, I shall begin my evaluation of “A Future-Adaptable Password Scheme” by providing as many resources as possible for those interested in learning more about the paper who for some reason or other do not or cannot read it (I understand – I once had a phobia of reading long papers; they’re scary things). For a better understanding of this post, however, it is recommended that the reader be familiar with common password hashing and cracking methods, along with the more specific information provided below. I provide external references to avoid regurgitating information that is already freely available on the internet, and because I could not have explained these concepts better myself.
Unix’s crypt – http://en.wikipedia.org/wiki/Crypt_(Unix)
The Feistel cipher – http://en.wikipedia.org/wiki/Feistel_cipher
What is bcrypt?
To explain as simply as possible: bcrypt uses the eksblowfish algorithm to derive an encryption state from the cost, salt, and password, then uses that state to encrypt a 192-bit magic value 64 times and outputs the result.
(The paper – and Wikipedia’s bcrypt article – includes a diagram of what this looks like.)
Eksblowfish is modeled on a Feistel network and expands a salt and key into a cipher state for a given cost. And you should know what a Feistel network is. More information on eksblowfish can be found on the Wikipedia page covering bcrypt.
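In the paper’s own pseudocode, the whole algorithm is only a few lines (EksBlowfishSetup is the expensive, cost-parameterized key schedule):

```
bcrypt(cost, salt, key):
    state <- EksBlowfishSetup(cost, salt, key)
    ctext <- "OrpheanBeholderScryDoubt"      # the 192-bit magic value
    repeat (64):
        ctext <- EncryptECB(state, ctext)    # Blowfish in ECB mode
    return Concatenate(cost, salt, ctext)
```

Everything expensive happens in the setup step, which is exactly where the cost parameter bites.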
The key stretching algorithm: bcrypt’s strength and weakness
I will hit this point first because bcrypt’s main adaptation centers on the idea of key stretching. In essence, bcrypt takes a user-adjustable cost. The higher the cost, the more work bcrypt must do to generate the state it encrypts with, and the longer it takes to hash an input. Why is this good? Well, when someone gains access to your database and pulls all the hashed passwords, those passwords take longer to crack. This is extremely beneficial as computer hardware, especially the GPU, becomes fast enough that constant-time hashes which once took hours to crack now take minutes or seconds.
So why is this a weakness? First, let’s take a look at the tradeoff. If cracking the password takes longer, so does hashing the password. This means that users have to accept a longer wait time at the login screen while a processor in the background is running the computations to see if their password is correct. Usually this tradeoff is reasonable – having a hash that takes a second or even a tenth of a second is long enough to deter all but the most well-equipped crackers.
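bcrypt isn’t in Python’s standard library, but the stdlib’s PBKDF2 exposes the same tunable-cost idea, so here’s a sketch of the tradeoff using it instead (the iteration counts are illustrative only, not a recommendation):

```python
import hashlib
import os
import time

password = b"correct horse battery staple"
salt = os.urandom(16)

# The same key-stretching idea bcrypt uses: a tunable work factor.
# Raising the iteration count slows down both the login check and
# every guess an attacker makes, by the same factor.
for iterations in (1000, 100000):
    start = time.perf_counter()
    digest = hashlib.pbkdf2_hmac("sha256", password, salt, iterations)
    elapsed = time.perf_counter() - start
    print("%6d iterations -> %.4f s" % (iterations, elapsed))
```

Run it on your own machine and pick the largest cost whose latency your users will tolerate – that is the “future-adaptable” part of the scheme.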
But do you see a problem now? What happens if someone does not wish to log in to a system, but is instead malicious? Password fields are often not limited by developers, and sending a few megabytes of data can add an enormous overhead to a server from a user who does not expect to be authenticated. A coordinated attack of this style from multiple sources has a strong potential of causing a denial of service to unprepared sites. Is this preventable? Yes. But developers who implement bcrypt or another key stretching algorithm dependent upon user input should be aware of the possible exploit and design their code with that in mind.
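Mitigating this is cheap: bound the input before it ever reaches the expensive hash. The 1024-byte cap below is an arbitrary illustration (bcrypt implementations typically only use the first 72 bytes of the password anyway, but schemes that digest the whole input make a cap cheap insurance):

```python
MAX_PASSWORD_BYTES = 1024  # arbitrary cap, for illustration only

def checked_password(raw):
    """Reject absurdly long inputs before running an expensive hash."""
    encoded = raw.encode("utf-8")
    if len(encoded) > MAX_PASSWORD_BYTES:
        raise ValueError("password too long")
    return encoded

print(len(checked_password("hunter2")))  # 7
```

A few lines like this in front of the hashing call turn a potential denial-of-service vector into an ordinary validation error.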
Security through obscurity?
I find an interesting point in this paper to be the authors’ assertion that “If a function of a password is secure . . . its output should not let an attacker guess any predicate more accurately than she could have without the function’s output.”
Side note: I find it amusing that the authors of this paper tend to use the feminine form of practically all the gender-associated pronouns they write, as if to compensate for the lack of women in the field. I hope this doesn’t sound offensive; it’s just an observation and should be treated as such.
However, later in the paper they mention that bcrypt passwords start with “$2a$”, which obviously helps an attacker predict that the function uses bcrypt. Although this may be outside the proper scope of the authors’ intention, I still believe that signaling bcrypt in the output gives an attacker unnecessary information – or at least points them in the right direction. Had they not known which hashing method was used, they would only know the length of the output.
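The giveaway is easy to see in the modular crypt format bcrypt uses (the hash below is a made-up illustration, not a real digest): the fields between the `$` signs announce the algorithm version and the cost to anyone who reads them.

```python
# A made-up bcrypt-style string in modular crypt format (NOT a real hash):
#   $<version>$<cost>$<22-char salt><31-char digest>
fake_hash = "$2a$12$" + "abcdefghijklmnopqrstuv" + "A" * 31

_, version, cost, salt_and_digest = fake_hash.split("$")
print(version)  # 2a -> announces bcrypt to anyone holding the hash
print(cost)     # 12 -> and even how expensive each guess will be
```

So a leaked database not only names the algorithm but advertises its work factor – convenient for the defender’s upgrade path, and equally convenient for the attacker’s planning.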
Key stretching: final note
Let’s consider key stretching again. What if you were running a netbook? Or, say, hosting your website on a Raspberry Pi? bcrypt may not be as useful to you, because while your rig may take up to one second to authenticate users, the same-cost hash on a cracker’s 100-GPU cluster might take as little as 0.01 seconds to run. Be aware that if you can only afford a low-cost bcrypt hash, you may as well go for a constant-time salted SHA.
So that’s my evaluation of this paper. I hope you were able to glean some insight about how to make safer passwords and build more secure sites for your users. Shoot a comment below if you have anything to say, or maybe have some kind of cool idea or critique about the paper yourself. For example, what if you try cracking passwords with high-precision analog circuits? Would the methods described in this paper, mainly concerned with increasing computational cost on digital systems, be rendered trivial by a more optimized system of variable signals? I’m not sure, but that sounds neat! Definitely let me know if you find any mistakes or if something needs updating – don’t want people to try and learn from cave paintings!