Notes at the halfway point
I'm halfway through the 3 month contract (a relationship with the WMF that I hope to continue) and it seems like a good place to write up some of my lessons learned.
Too many times I haven't spent enough time going over my code. Or, when I'm refactoring someone else's code, I don't look over it enough. Sometimes this is due to my own inexperience with the MW code base: If I had more experience, I would have a better idea from just looking at the code that dieUsage() was being used incorrectly.
Still, experience can be created, and when it is experience you lack, the effort to create experience must be made. This is where testing comes in. More than once while refactoring the UploadChunks api, Tim has just looked at the code and told me I wasn't testing it properly.
The same could be said for more trivial things like whitespace. Mixing whitespace commits with more substantive changes as well as just the way Emacs formats the code by default have irritated other developers. From this experience, I think I need to borrow an idea from Atul Gawande's Checklist Manifesto and make — gasp — a checklist.
The checklist is the first step, but the second would be automating as much of the checklist as possible and creating a pre-commit hook that I could use to check my own code.
From my experience, here are some ideas for a checklist:
- Whitespace use.
- Tabs, not spaces
- avoiding spaces for indention and lining up code
- spaces around parens
- Code correctness.
- Ensure each exit point has a test. (Since I prefer to write unit tests for my code, noting a test for each exit point in a code comment would take care of this. An automated check would just verify that the exit point has a notation.)
- Make sure no E_STRICT warnings are thrown during tests
Note that any automation here is going to fail to really enforce anything: it is impossible to verify code-correctness completely without actually running the code. Instead the idea is not to verify that the code is bug-free, but just to make sure that I've caught the things that more experienced MW developers aren't going to roll their eyes at.
That said, the number of developers actively working on MW code as well as the CodeReview extension on mediawiki.org make developing MW code an overwhelmingly positive experience. These things mean that — combined with the depth of PHP and MediaWiki knowledge and care that people take — problems see the light of day really quickly. When your code is likely to end up on one of the top-10 sites on the Internet, this amount of care is crucial.
I finished up work on cookies and HttpFunctions at the beginning of the week and found a few problems with the Pure PHP implementation regarding its handling of 404 errors. The code will still have problems with PHP versions before 5.2.10 (when the ingore_errors parameter was added), but it is better now. The only way around that, that I can think of offhand is to write a Pure PHP implementation based on using Socket rather than fopen. Which might not be a bad idea in the (very) long run — it would avoid requiring allow_fopen_url to be on — but isn't necessary right now.
With the help of Michael Dale, I got chunked uploading working against the mw:MwEmbed mwEmbed client. I'm still frustrated with the protocol that FireFogg has come up with — it seems very minimal — but at least the code is working.
It was a major relief for me (as well as others) to get the bogus calls to exit() out of the code. I'm always troubled when I see those. The next ugly bit was the embedded echo statements. These were fixed, but I think more work could be needed later for other third-party APIs.
PHPUnit Memory Problems
I also did some research on the PHPUnit tests and the memory problems. I have a strong feeling there are more than one bug in PHPUnit itself — the TestSuite code that I used in adapting the parser tests seems to have gotten minimal use — and since I didn't have any brilliant insights into tracking down memory problems I didn't spend too much time on it. (Tim gave me some good hints yesterday, though.)
Lacking any insight on the memory problem, I pulled out my old Centos4 and Ubuntu Hardy installations to test the HttpFunctions code against and discovered the perplexing problems mentioned above — namely that there doesn't seem to be a way to handle 404s in PHP's fopen prior to 5.2.10. Speaking of: I'm going to have to find a way to remotely trigger PHPUnit tests on other platforms for testing purposes. PHPUnit already has something to enable this (iirc), but now I need to implement it.
First couple of days
- Found a good resource on cookie security and implemented it.
- Began looking at PHPUnit and memory problems -- complaints about the test suite ballooning to use 1-2GB -- which really is excessive.
- Got prompted by TimStarling to focus on Chunked uploading
- Fixed HttpFunctions.php so that Http::get() returns false again if there are 404s or whatever. Found a bug in the Pure PHP implementation that didn't return proper errors on 404.
- Got a good portion of the complaints on my chunked uploading work addressed.
- Chunked uploading bugs galore.
- Tested chunked uploading against mwembed, got it working.