<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Bad Nomenclature</title>
	<atom:link href="http://blog.paulbiggar.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.paulbiggar.com</link>
	<description>Compilers, optimization and scripting languages.</description>
	<lastBuildDate>Mon, 08 Feb 2010 15:51:02 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>A rant about PHP compilers in general and HipHop in particular.</title>
		<link>http://blog.paulbiggar.com/archive/a-rant-about-php-compilers-in-general-and-hiphop-in-particular/</link>
		<comments>http://blog.paulbiggar.com/archive/a-rant-about-php-compilers-in-general-and-hiphop-in-particular/#comments</comments>
		<pubDate>Mon, 08 Feb 2010 15:46:39 +0000</pubDate>
		<dc:creator>Paul Biggar</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.paulbiggar.com/?p=54</guid>
		<description><![CDATA[I&#8217;ve worked on phc since 2005, and been its maintainer since 2007. I wrote the optimizer, and nearly everything performance related. I had mixed reactions upon hearing about the release of HPHP [1], the new PHP compiler from Facebook. There are a few aspects to this so I&#8217;ll start with the technical stuff. I always [...]]]></description>
			<content:encoded><![CDATA[<p><em>I&#8217;ve worked on</em> <a class="reference external" href="http://phpcompiler.org">phc</a> <em>since 2005, and been its maintainer since 2007. I wrote the optimizer, and nearly everything performance related.</em></p>
<p>I had mixed reactions upon hearing about the <a class="reference external" href="http://developers.facebook.com/news.php?blog=1&amp;story=358">release of HPHP</a> <a class="footnote-reference" href="#hphp" id="id1">[1]</a>, the new PHP compiler from Facebook.  There are a few aspects to this so I&#8217;ll start with the technical stuff.  I always love the social aspect, so skip to the <a class="reference internal" href="#bottom">bottom</a> if you like whining and tears.</p>
<div class="section" id="how-does-it-work">
<h1>How does it work?</h1>
<p>I don&#8217;t know the answer to this.  I haven&#8217;t seen anyone even mention PHP&#8217;s references, which are incredibly gnarly for a static analyser <a class="footnote-reference" href="#refs" id="id2">[2]</a>.  HPHP might just ignore it, which isn&#8217;t necessarily a bad idea.  I&#8217;m wary of ignoring edge cases, as they tend to interact in horrible ways, but I guess Facebook already run all their code off it, so it can&#8217;t be that bad.</p>
<p>In general, I&#8217;ve found that ignoring the edge cases is bad when compiling PHP.  There are a million of them, and they all interact.  They interact worst of all in static analysis, because you have to consider all possible paths.  It&#8217;s the sort of thing where if you nail it 100%, then you have something amazing and widely applicable, so that&#8217;s what I was aiming for with my PhD.  I suspect HPHP doesn&#8217;t consider all paths, and makes all sorts of hacky assumptions. <a class="footnote-reference" href="#hacks" id="id3">[3]</a> This is probably a really good idea.  I did the opposite in the optimizer, and the result is instead <a class="reference external" href="http://code.google.com/p/phc/source/browse/#svn/branches/dataflow">immature and slow</a>.</p>
</div>
<div class="section" id="speed">
<h1>Speed</h1>
<p>Facebook said HPHP halves the number of servers they need.  PHP&#8217;s libraries are already written in C, which gives it the appearance of being fast, even though the interpreter is dog-slow.  Compiling the PHP code presumably does little for the time already spent inside those C libraries, so halving the total implies that HPHP-compiled PHP code is much more than twice as fast as the PHP interpreter.  This probably means HPHP is way faster than <a class="reference external" href="http://phpcompiler.org">phc</a>&#8217;s compiled code as well.</p>
</div>
<div class="section" id="they-could-have-built-a-jit">
<h1>They could have built a JIT!!</h1>
<p>I saw some criticism for not building a JIT on LLVM.  But:</p>
<ol class="arabic simple">
<li>LLVM isn&#8217;t mature enough for a proper dynamic JIT yet, as the Unladen Swallow team <a class="reference external" href="http://www.python.org/dev/peps/pep-3146/">found out</a>.</li>
<li>JITs are very hard to build. They ratchet up the complexity of building a compiler by about 10 times, so it&#8217;s probably best to avoid them if you can. <a class="footnote-reference" href="#says" id="id4">[4]</a></li>
<li>PHP doesn&#8217;t really need a JIT. Server side programs in PHP don&#8217;t do a great deal of dynamic stuff, and it would be incredibly rare to load some random code at run-time, so a JIT wouldn&#8217;t be all <em>that</em> useful.</li>
</ol>
<p>PHP is not like other dynamic languages.  Duck-typing is possible, but most of the community best practices come from Java (along with the class system), and so it&#8217;s not used that much.  Monkey-patching &#8212; switching out classes and methods from objects at run-time &#8212; isn&#8217;t possible, except with <a class="reference external" href="http://php.net/manual/en/book.classkit.php">the hackiest of hacky unsupported extensions</a>.  Dynamism in PHP tends to involve templates instead, like with Smarty.  If you want to analyse it, then you just need to run it a few times, get all the templates to be instantiated, and compile all the generated PHP code.  I&#8217;ve started calling this &quot;deployment-time analysis&quot;, since on the server side you probably know all the code you&#8217;re going to compile at deployment-time.  So a compiler is a perfectly reasonable approach for PHP, and a JIT is probably not needed.</p>
</div>
<div class="section" id="will-it-be-useful-for-me">
<h1>Will it be useful for me?</h1>
<p>People seem to want to know if HPHP is widely useful outside of Facebook, and some people are saying &quot;no&quot;.  I disagree strongly.  In order for HPHP to be useful, you need to have a PHP application which is suffering due to PHP interpreter performance.  That matches Facebook perfectly, and they&#8217;ve always been the canonical example I use to explain why PHP compilers are interesting.  But you don&#8217;t have to be Facebook size or scale to have performance problems.</p>
<div class="section" id="do-you-really-need-more-speed">
<h2>Do you really need more speed? <a class="footnote-reference" href="#nospeed" id="id5">[5]</a></h2>
<p>I&#8217;ve heard the argument &quot;you don&#8217;t need a compiler, since PHP is rarely the bottleneck&quot; for many years.  I think it&#8217;s complete bollocks.  But I wrote a compiler for PHP, so I would say that.</p>
<p>Unless your PHP server is sitting there idling (which is probably the case for many PHP servers out there), then you could make use of a PHP compiler.  For small timers, all components of your application are going to be sitting on the same box, contending for the same resources.  Even if you assume the DB is the bottleneck, the resources the interpreter consumes could be more profitably spent on the DB.</p>
<p>The PHP interpreter is also quite memory hungry, as interpreters go.  Any PHP value in your program uses 68 bytes of overhead <a class="footnote-reference" href="#bit" id="id6">[6]</a>.  An array of a million values takes over 68 MB.  If HPHP is able to convert your million value array to native C types, it will only take 4 MB.  I&#8217;m sure your caches could make good use of those savings.</p>
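<p>The arithmetic behind those figures is easy to check. The sketch below is only a back-of-the-envelope estimate: the 68-byte overhead is the 32-bit figure quoted above, the 4-byte native size assumes a plain 32-bit integer, and the function and macro names are made up for illustration.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Per-value overhead quoted in the post for 32-bit PHP (taken from the
   text, not measured here). */
#define PHP_VALUE_OVERHEAD_BYTES 68u
/* Assumed size of one element once converted to a native 32-bit C integer. */
#define NATIVE_INT_BYTES 4u

/* Estimated bytes of overhead for `count` boxed interpreter values. */
static size_t boxed_bytes(size_t count) {
    assert(count <= SIZE_MAX / PHP_VALUE_OVERHEAD_BYTES); /* guard against overflow */
    return count * PHP_VALUE_OVERHEAD_BYTES;
}

/* Estimated bytes for the same `count` values stored as a packed C array. */
static size_t native_bytes(size_t count) {
    assert(count <= SIZE_MAX / NATIVE_INT_BYTES); /* guard against overflow */
    return count * NATIVE_INT_BYTES;
}
```

<p>For a million values this gives 68,000,000 bytes against 4,000,000 bytes, a factor of 17, before counting the payload the interpreter stores on top of its overhead.</p>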
<p>However, optimization isn&#8217;t only about speed.  The main value of compilers is that they give you freedom in how you code.</p>
<p>There is a meme in the scripting language communities that {PHP,Ruby,Python,Perl,etc} are &quot;fast-enough&quot;.  If you need it to go faster, then you should take your hot-loop and rewrite it in C.  HPHP will free you from such concerns.</p>
<p>You should consider also that PHP is considered relatively fast.  It&#8217;s not &#8212; the interpreter is dog-slow &#8212; but programs written in PHP are typically not <em>that</em> slow.  This is because most of PHP&#8217;s huge standard library is written in C, with a thin layer of PHP over it.  Anytime you call a string function, your PHP C string is passed into the C library, the pointers are manipulated and the bits are twiddled, and then it&#8217;s handed back to your code.  It&#8217;s a bit like driving in America: it takes a few minutes to get on the freeway, but once you&#8217;re on it you&#8217;re there in no time.</p>
<p>This is not necessarily a good thing:</p>
<ol class="arabic simple">
<li>if you want to write a library, and it needs to be fast, then it needs to be written in C,</li>
<li>if there is a PHP function that does almost what you need, and you write your own version instead, it will be slow.</li>
</ol>
<p>Believe me, if your entire application just ran PHP interpreted code, it would not be fast at all.  But people who write PHP functions and libraries don&#8217;t want to write C.  They like PHP, are productive in it, and any time spent arsing around in C is wasted when there&#8217;s a website falling apart and a long list of features due yesterday.  HPHP will free you from such concerns.</p>
<p>Compilers also provide other niceties.  You don&#8217;t have to unroll your own loops, or move constant expressions out of loop headers.  I don&#8217;t know if HPHP supports these, but I&#8217;m sure it could.</p>
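<p>To make the loop-header point concrete, here is a minimal sketch in C of the transformation I mean. It is an illustration only: the function names are invented, and I am not claiming HPHP performs this particular rewrite. The first version re-evaluates a constant expression in the loop header on every iteration; the second is what you would otherwise write by hand, and what an optimizer can produce for you.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Naive form: strlen(s) sits in the loop header, so as written it is
   evaluated again before every iteration. */
static size_t count_spaces_naive(const char *s) {
    assert(s != NULL);
    size_t n = 0;
    for (size_t i = 0; i < strlen(s); i++) {
        if (s[i] == ' ') {
            n++;
        }
    }
    return n;
}

/* Hoisted form: the loop-invariant length is computed once, up front. */
static size_t count_spaces_hoisted(const char *s) {
    assert(s != NULL);
    size_t n = 0;
    const size_t len = strlen(s);
    for (size_t i = 0; i < len; i++) {
        if (s[i] == ' ') {
            n++;
        }
    }
    return n;
}
```

<p>Both functions return the same answer; the only difference is how often the length is computed. With a compiler doing the hoisting, you get to write whichever form reads better.</p>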
<p>Allowing your existing code to go faster is hardly the point though.  Really, the point is that you can do more in less time.  Suppose you&#8217;ve decided that your application needs to respond to the user in 500ms.  The DB takes 200ms, the request takes 200ms, the framework takes 50ms and your code only has 50ms to run <a class="footnote-reference" href="#madeup" id="id7">[7]</a>.  That&#8217;s quite a constraint.  This leads to people using PHP as a simple templating layer, instead of as the Turing-complete language it is.  I expect we&#8217;ll hear a lot more about HPHP, simply because of how freeing it is to the user.</p>
<p>So even if you only have a small VPS, instead of massive server farms like Facebook, you&#8217;re likely to find a use for HPHP.  I&#8217;m sure shared hosts will set it all up soon for their users, and everyone will be happy.</p>
</div>
<div class="section" id="dynamic-constructs">
<h2>Dynamic constructs</h2>
<p>A better question is, how widely applicable is a compiler which doesn&#8217;t support all of PHP&#8217;s dynamic constructs?  Funnily enough I did a bit of research on this.  We chose the opposite tactic for <a class="reference external" href="http://phpcompiler.org">phc</a>: trying to stay 100% compatible with Zend, all the extensions (even the 3rd-party, unpublished, top-secret, proprietary ones), etc.  You would expect we would first research whether this was useful?  Well no.  We built it, and then I did some analysis to check whether it was useful.  You can read it in depth <a class="footnote-reference" href="#dynamicresearch" id="id8">[8]</a>.  Basically, I downloaded 700 packages off sourceforge, and wrote some <a class="reference external" href="http://phpcompiler.org">phc</a> plugins to check for <em>eval</em>s and dynamic includes (a dynamic include uses a variable as its parameter, instead of a constant or literal).</p>
<p>Result: 40% of PHP packages use dynamic constructs.  Now this isn&#8217;t quite as scientific as it should be.  Lots of those programs were old, and styles have changed.  <em>eval</em> is discouraged these days, but it&#8217;s probably still used, if only to get around the weaknesses of the PHP parser.  In particular, this doesn&#8217;t imply that HPHP is somehow fatally flawed for not supporting dynamic constructs. It just means it might not be so useful to you, the common PHP programmer.</p>
<p>An easy way to get round dynamic includes is to just consider the PHP files in the directory structure.  There was a good <a class="reference external" href="http://portal.acm.org/citation.cfm?id=1273442.1250739">research paper by Wassermann</a> on supporting that, but I find it very hard going, so you probably will too.  Still, a naive approach is to just stick a switch statement in, and compile everything that makes sense.  This is how you would deal with things like WordPress plugins.  It does mean that if you change your plugins, you&#8217;re just going to have to recompile.  If you&#8217;re using a compiler, I doubt you would find that a problem.</p>
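<p>The naive approach can be sketched roughly as follows. C cannot switch on strings, so this uses a lookup table as the moral equivalent of the switch statement. Everything here is hypothetical: the file paths, the entry-point functions and the <tt class="docutils literal"><span class="pre">dynamic_include</span></tt> name are invented for illustration and are not taken from phc or HPHP.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical entry points, one per PHP file found in the directory
   structure at compile time. The return values are placeholders. */
static int run_plugin_gallery(void) { return 1; }
static int run_plugin_stats(void)   { return 2; }

typedef int (*compiled_file_fn)(void);

struct compiled_file {
    const char *path;      /* path as it would appear in include $var */
    compiled_file_fn run;  /* compiled code for that file */
};

/* Every file the compiler knew about when the application was built. */
static const struct compiled_file known_files[] = {
    { "plugins/gallery.php", run_plugin_gallery },
    { "plugins/stats.php",   run_plugin_stats   },
};

/* Resolve a dynamic include at run-time. Returns -1 when the path was not
   compiled in, which is the "you have to recompile" case. */
static int dynamic_include(const char *path) {
    assert(path != NULL);
    for (size_t i = 0; i < sizeof known_files / sizeof known_files[0]; i++) {
        if (strcmp(path, known_files[i].path) == 0) {
            return known_files[i].run();
        }
    }
    return -1;
}
```

<p>Adding a plugin means adding a row to the table, which is exactly why changing your plugins forces a recompile.</p>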
</div>
</div>
<div class="section" id="social-stuff">
<span id="bottom"></span><br />
<h1>Social stuff</h1>
<p>But not everyone is happy with this new compiler, such as, well, me.</p>
<p>Let&#8217;s start with a quick whine.</p>
<ul class="simple">
<li>I contacted Facebook two years ago to test and demo <a class="reference external" href="http://phpcompiler.org">phc</a>,</li>
<li>I went and gave a <a class="reference external" href="http://www.paulbiggar.com/research/#phc-talks">talk</a> at Facebook, and met a number of engineers,</li>
<li>I know they&#8217;ve used <a class="reference external" href="http://phpcompiler.org">phc</a> internally in the past,</li>
<li>They&#8217;re releasing a PHP compiler.</li>
</ul>
<p>You would think they could at least invite me to the party. Those bastards.</p>
<p>More seriously, I actually was annoyed at all the news reports about HPHP, principally because they were largely bullshit.  And I knew it, and I couldn&#8217;t say more because I told Facebook I wouldn&#8217;t.  There is very little more irritating than idiots being wrong on the internet, and the news stories brought out thousands of them!  Reddit and Hacker News were literally covered with stories about HPHP.  Hundreds of trolls emerged from under their bridges, not knowing the difference between a bytecode-based interpreter, a caching PHP accelerator, and a native compiler (which is fine, until they start saying they&#8217;re all the same).  Thinking about it now still makes me angry.</p>
<p>I&#8217;m also slightly annoyed that people all of a sudden care about PHP compilers.  I worked on one for 4 years and I could not convince anyone to <a class="reference external" href="http://www.reddit.com/r/programming/comments/7itjo/compile_php_to_executable_new_release_of_phc_php/c06rrsd">give a shit</a>.  But now that it&#8217;s got the Facebook logo on it, all of a sudden PHP compilers are the greatest thing ever.  Bah.</p>
<p>One saving grace is that they didn&#8217;t patent it.  I have an email in my inbox from one of the HPHP developers saying he couldn&#8217;t talk to me about the compiler because they might patent it.  That&#8217;s pretty shitty.  Thank god they open-sourced it instead <a class="footnote-reference" href="#opensourced" id="id9">[9]</a>.  It sounds like it was a bit touch and go for a while.</p>
</div>
<div class="section" id="the-most-important-question">
<h1>The most important question!</h1>
<p>Which brings us to the question of whether they should have used <a class="reference external" href="http://phpcompiler.org">phc</a>?</p>
<p>Obviously it would be great if they had used <a class="reference external" href="http://phpcompiler.org">phc</a>, and I&#8217;m not privy to the reasons they didn&#8217;t <a class="footnote-reference" href="#letmeknow" id="id10">[10]</a>.  The design decisions we made in <a class="reference external" href="http://phpcompiler.org">phc</a> were aimed at maximum compatibility, and the performance suffered <a class="footnote-reference" href="#techtalk" id="id11">[11]</a> as a result.  The optimizer was designed to solve these problems, and I believe it would have, but it is not mature enough now, and was still a twinkle in my eye when HPHP started two years ago.</p>
<p>Facebook was solving their performance problems, not building a PHP compiler for general use.  If they were doing the latter, it would be much easier to criticise their approach, but for now I can&#8217;t say I would have advised them otherwise.  On the other hand, they probably didn&#8217;t need to build their own parser &#8211; it&#8217;s a tricky problem and <a class="reference external" href="http://phpcompiler.org">phc</a>&#8217;s parser and front-end are excellent.  Had they gone another way, then they could probably have started to use <a class="reference external" href="http://phpcompiler.org">phc</a>&#8217;s optimizer <a class="footnote-reference" href="#phdthesis" id="id12">[12]</a>, which while immature and slow to compile, is pretty state-of-the-art and has great potential (if I do say so myself).</p>
<p>A better approach would probably have been to hire all the programmers who worked on PHP compilers, to get that expertise in house.  They did try to hire me, but only recently.  I&#8217;m honestly surprised that they haven&#8217;t tried to hire Shannon Weyrick, who is currently working on <a class="reference external" href="http://code.roadsend.com/rphp">rphp</a>, his second PHP compiler <a class="footnote-reference" href="#roadsend" id="id13">[13]</a>.</p>
<div class="section" id="what-does-this-mean-for-phc">
<h2>What does this mean for phc?</h2>
<p>When they announced HPHP, I would have said it was <a class="reference external" href="http://phpcompiler.org">phc</a>&#8217;s death knell.  The original <a class="reference external" href="http://phpcompiler.org">phc</a> authors, Edsko and John, have moved on to other projects, and I&#8217;ve run it mostly solo for about two years.  But I haven&#8217;t worked on <a class="reference external" href="http://phpcompiler.org">phc</a> in about 6 months, and my hatred of PHP makes it unlikely I will again.  My <a class="reference external" href="http://phpcompiler.org/contribute.html">requests for new contributors</a> to step up have fallen on deaf ears, and my summer intern hasn&#8217;t decided to take over either.</p>
<p>Since no-one wants to take on the compiler, the new competition from Facebook should probably kill it, right?  Maybe not.  Over the last week, traffic to the <a class="reference external" href="http://phpcompiler.org">phc</a> website has increased by five times <a class="footnote-reference" href="#traffic" id="id14">[14]</a>.  Facebook has unleashed some sort of latent interest in PHP compilers that I haven&#8217;t been able to extract from people.  So perhaps this might be the rebirth of <a class="reference external" href="http://phpcompiler.org">phc</a>, not its death.</p>
<p>And <a class="reference external" href="http://phpcompiler.org">phc</a> is better than HPHP in some ways.  HPHP is almost certainly faster because they didn&#8217;t have to deal with <em>eval</em>, dynamic stuff, and because they don&#8217;t use the Zend libraries.  But <a class="reference external" href="http://phpcompiler.org">phc</a> was specifically designed to work with the Zend libraries, with <em>eval</em>, with everything.  So it&#8217;s probably a better fit for most projects than HPHP.</p>
<p>If you want to take over <a class="reference external" href="http://phpcompiler.org">phc</a>, then join the mailing lists, download the source code, read the <a class="reference external" href="http://www.phpcompiler.org/lists/phc-general/2010-January/001027.html">death notice</a> and contribution page, and email me for commit access.</p>
<p><a class="reference external" href="http://phpcompiler.org">phc</a> will likely live on anyway.  The front-end is pretty slick: Facebook ran it over their million lines of code and only had one or two problems.  It gives a lovely AST to allow all sorts of code transformation tools, has a nice plugin interface (for C++ lovers) and an XML interface (for the rest of you), and will spit your code out largely as you put it into it.  It&#8217;s certainly the most mature and well tested part of the whole project.</p>
<p>The optimizer is pretty slick as well, but in a different way.  I&#8217;m pretty sure it&#8217;s the most advanced static analyser for PHP, and it&#8217;s waiting to be put to good use.  That said, it&#8217;s damn slow, and not mature (read: pretty buggy), and itself doesn&#8217;t support <em>eval</em> and dynamic includes (surprise!!).  The optimizer is waiting for some love &#8212; I could imagine it making a pretty nice &quot;automatically find out what types your function may be passed&quot; kind of linter.</p>
<p>Otherwise, <a class="reference external" href="http://phpcompiler.org">phc</a> will only live on as part of the <a class="reference external" href="http://code.roadsend.com/rphp">Roadsend Raven</a> compiler.  I understand that they&#8217;re going to take the optimizer and the parser from <a class="reference external" href="http://phpcompiler.org">phc</a>, and that will be really interesting.</p>
<p>Finally, what does it mean for me?  Well, I&#8217;ve left that ship already.  I&#8217;ve hated PHP for a long time, and have no desire to go back to it.  I&#8217;m doing a <a class="reference external" href="http://www.newslabs.com">startup</a> now, but when I go back to regular employment I will be looking for another scripting language run-time.  There are plenty to choose from, in particular Unladen Swallow and TraceMonkey.  Mozilla looks like an amazing place to work, so I think I&#8217;ve worked out my backup plan.</p>
<table class="docutils footnote" frame="void" id="hphp" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id1">[1]</a></td>
<td>I can&#8217;t bring myself to call it Hiphop. Worst name ever. Rumor has it that the &#8216;H&#8217; in HPHP stands for &quot;Haiping&quot;, the author of HPHP, so I like that name better.</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="refs" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id2">[2]</a></td>
<td>See chapter 6 of my thesis</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="hacks" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id3">[3]</a></td>
<td>This is just a hunch. Obviously I have no way of verifying this, and I don&#8217;t really want to read the source when it comes out to check. So as accusations go, this one is obviously pretty baseless.</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="says" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id4">[4]</a></td>
<td>Says a guy who wrote a PHP compiler. I may be biased.</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="nospeed" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id5">[5]</a></td>
<td>I&#8217;m trying to imply you don&#8217;t, but everyone loves more speed. Plus, compilers are cool toys, even if you don&#8217;t need them. So 99% of the people who start using HPHP will use it because it&#8217;s cool and they love to go fast, not because they&#8217;ve carefully considered the design of their project and determined that a compiler would solve something.</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="bit" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id6">[6]</a></td>
<td>68 bytes is on 32-bit systems. I think it&#8217;s 96 bytes on 64-bit.</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="madeup" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id7">[7]</a></td>
<td>I pulled these numbers out of my arse.</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="dynamicresearch" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id8">[8]</a></td>
<td>You can read the <a class="reference external" href="http://www.paulbiggar.com/research/#wip-sac-journal">paper</a> (see Section 7.5) for my method, results, etc.</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="opensourced" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id9">[9]</a></td>
<td>Open-sourcing and patenting aren&#8217;t strictly mutually exclusive, but presumably they won&#8217;t patent it now. A firm word on the topic from Facebook would be nice. And yes, I&#8217;m aware of the patent climate in America, and how you have to patent everything you can get your hands on, and it doesn&#8217;t make you evil. It still fucks with people who write compilers though.</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="letmeknow" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id10">[10]</a></td>
<td>If you know why, let me know. This is open source, I can take it.</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="techtalk" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id11">[11]</a></td>
<td>I went into a bit of detail in my <a class="reference external" href="http://www.youtube.com/watch?v=kKySEUrP7LA">Google Tech Talk</a>, if you&#8217;re interested.</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="phdthesis" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id12">[12]</a></td>
<td>Interested parties can find more information in chapter 6 of my <a class="reference external" href="http://www.paulbiggar.com/research/#wip-phd-dissertation">PhD thesis</a></td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="roadsend" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id13">[13]</a></td>
<td>This is pronounced RoadsEnd, apparently. I&#8217;ve been mispronouncing it for years.</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="traffic" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id14">[14]</a></td>
<td>The <a class="reference external" href="http://phpcompiler.org">phc website</a> used to get 200 visits per day. On Tuesday it got 1100 visits, going up to over 2000 on Wednesday. And 1000 downloads by the look of it. Does anyone know how to find out how many people check the code out of a Google Code svn repository?</td>
</tr>
</tbody>
</table>
</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://blog.paulbiggar.com/archive/a-rant-about-php-compilers-in-general-and-hiphop-in-particular/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>Introducing Malicious Code Reviews</title>
		<link>http://blog.paulbiggar.com/archive/introducing-malicious-code-reviews/</link>
		<comments>http://blog.paulbiggar.com/archive/introducing-malicious-code-reviews/#comments</comments>
		<pubDate>Wed, 24 Dec 2008 01:11:00 +0000</pubDate>
		<dc:creator>Paul Biggar</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.paulbiggar.com/?p=7</guid>
		<description><![CDATA[I&#8217;m inventing a new sport today, which I call &#8220;malicious code reviews&#8221;. I spent a few hours reading some really very bad code, and in retaliation against its author(s), I&#8217;m going to code review it [1]. The code comes from PHP version 5.2.8, the latest stable release. This particular file is Zend/zend_operators.h [2]. You might [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m inventing a new sport today, which I call &#8220;malicious code reviews&#8221;. I spent a few hours reading some really very bad code, and in retaliation against its author(s), I&#8217;m going to code review it <a id="id1" class="footnote-reference" href="#id9">[1]</a>. The code comes from PHP version 5.2.8, the latest stable release. This particular file is <a class="reference external" href="http://cvs.php.net/viewvc.cgi/ZendEngine2/zend_operators.h?revision=1.94.2.4.2.14&amp;view=markup">Zend/zend_operators.h</a> <a id="id2" class="footnote-reference" href="#id10">[2]</a>. You might want to open it in a new window, or in a <a onclick="javascript:window.open(this);return false;" href="http://cvs.php.net/viewvc.cgi/ZendEngine2/zend_operators.h?revision=1.94.2.4.2.14&amp;view=markup">popup</a>, so that you can follow along.</p>
<hr class="docutils" /><p>I&#8217;ll start <a id="id3" class="footnote-reference" href="#id11">[3]</a> at the top:</p>
<div class="highlight">
<pre><span style="color: #BC7A00">#if 0</span><span style="color: #408080; font-style: italic">&amp;&amp;HAVE_BCMATH</span>
<span style="color: #408080; font-style: italic">#include "ext/bcmath/libbcmath/src/bcmath.h"</span>
<span style="color: #408080; font-style: italic">#endif</span></pre>
</div>
<p>I&#8217;ve skipped a few minor problems to go straight to the laughably poor, the <tt class="docutils literal"><span class="pre">#if</span> <span class="pre">0</span></tt>. Funny story though: I saw a &#8220;code review&#8221; in PHP recently which chastised the addition of a <tt class="docutils literal"><span class="pre">#if</span> <span class="pre">0</span></tt>. I initially thought that, finally, someone is actually stepping up to stop the rot within the PHP engine. Sadly, they instead <a class="reference external" href="http://article.gmane.org/gmane.comp.php.cvs.zend/7034">complained</a> that according to the rules of the PHP project, an <tt class="docutils literal"><span class="pre">#if</span> <span class="pre">0</span></tt> must also have the <strong>author&#8217;s name</strong> added to it. The mind boggles.</p>
<div class="line-block"></div>
<p> There is a limited amount of reasonable code, which I&#8217;ll skip, followed by the <a class="reference external" href="http://cvs.php.net/viewvc.cgi/ZendEngine2/zend_operators.h?revision=1.94.2.4.2.14&amp;view=markup#l97">is_numeric_string() function</a> <a id="id4" class="footnote-reference" href="#id12">[4]</a>, only one of the finest examples of poor code I&#8217;ve ever seen. I linked to it above, so I recommend that you actually <a class="reference external" href="http://cvs.php.net/viewvc.cgi/ZendEngine2/zend_operators.h?revision=1.94.2.4.2.14&amp;view=markup#l97">read along</a> as I go. This will be no fun unless you can actually see it.</p>
<p>It starts off, surprisingly, with the <a class="reference external" href="http://cvs.php.net/viewvc.cgi/ZendEngine2/zend_operators.h?revision=1.94.2.4.2.14&amp;view=markup#l97">most thorough comment</a> I have yet to see in PHP. It is merely average in terms of what you might read in a <em>gcc</em> source file, but here it is a shiny gold nugget floating in a murky brown sea. However, it degrades fairly rapidly. You might notice this giant function is in a header, and that it is declared to be <tt class="docutils literal"><span class="pre">static</span> <span class="pre">inline</span></tt>. This is a prelude for what&#8217;s to come.</p>
<p>The function starts with some whitespace skipping <a id="id5" class="footnote-reference" href="#id13">[5]</a>:</p>
<div class="highlight">
<pre><span style="color: #008000; font-weight: bold">while</span> (<span style="color: #666666">*</span>str <span style="color: #666666">==</span> <span style="color: #BA2121">' '</span> <span style="color: #666666">||</span> <span style="color: #666666">*</span>str <span style="color: #666666">==</span> <span style="color: #BA2121">'\t'</span> <span style="color: #666666">||</span> <span style="color: #666666">*</span>str <span style="color: #666666">==</span> <span style="color: #BA2121">'\n'</span>
    <span style="color: #666666">||</span> <span style="color: #666666">*</span>str <span style="color: #666666">==</span> <span style="color: #BA2121">'\r'</span> <span style="color: #666666">||</span> <span style="color: #666666">*</span>str <span style="color: #666666">==</span> <span style="color: #BA2121">'\v'</span> <span style="color: #666666">||</span> <span style="color: #666666">*</span>str <span style="color: #666666">==</span> <span style="color: #BA2121">'\f'</span>) {
   str<span style="color: #666666">++</span>;
   length<span style="color: #666666">--</span>;
}</pre>
</div>
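<p>For comparison, the six comparisons in that loop could sit behind one named predicate. What follows is only a minimal sketch: both <tt class="docutils literal"><span class="pre">ZEND_IS_WHITESPACE</span></tt> and <tt class="docutils literal"><span class="pre">skip_whitespace()</span></tt> are hypothetical names, and neither exists in the header.</p>

```c
/* Hypothetical macro: NOT part of zend_operators.h. It is named to sit
 * beside the existing digit macros. It evaluates c more than once, so
 * only pass it side-effect-free expressions. */
#define ZEND_IS_WHITESPACE(c) \
    ((c) == ' ' || (c) == '\t' || (c) == '\n' \
     || (c) == '\r' || (c) == '\v' || (c) == '\f')

/* Hypothetical helper: skips leading whitespace and shrinks *length to
 * match, mirroring the loop quoted above. Like the original, it relies
 * on the string being NUL-terminated to stop. */
static const char *skip_whitespace(const char *str, int *length)
{
    while (ZEND_IS_WHITESPACE(*str)) {
        str++;
        (*length)--;
    }
    return str;
}
```

<p>The behaviour is the same as the quoted loop; the only gain is that the intent now has a name.</p>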
<p>Just two lines above the function they have two macros, called <tt class="docutils literal"><span class="pre">ZEND_IS_DIGIT</span></tt> and <tt class="docutils literal"><span class="pre">ZEND_IS_XDIGIT</span></tt>. Could they not have added <tt class="docutils literal"><span class="pre">ZEND_IS_WHITESPACE</span></tt>? A pity, but a tiny flaw compared to the gaping maw of despair that follows a little bit later. The code continues fine: check for a digit, check if it&#8217;s hex (with comments, very good) until we come to this line:</p>
<div class="highlight">
<pre><span style="color: #008000; font-weight: bold">for</span> (type <span style="color: #666666">=</span> IS_LONG;
     <span style="color: #666666">!</span>(digits <span style="color: #666666">&gt;=</span> MAX_LENGTH_OF_LONG
        <span style="color: #666666">&amp;&amp;</span> (dval <span style="color: #666666">||</span> allow_errors <span style="color: #666666">==</span> <span style="color: #666666">1</span>));
     digits<span style="color: #666666">++</span>, ptr<span style="color: #666666">++</span>)</pre>
</div>
<p>I&#8217;d like somebody to come forward and explain why</p>
<div class="highlight">
<pre>type <span style="color: #666666">=</span> IS_LONG</pre>
</div>
<p>is in the loop initialization statement. And why the loop condition is so unreadable. And why the elements of the loop header are not related in any way at all!!! But this is just the start. The next line is a doozy:</p>
<div class="highlight">
<pre><span style="color: #A0A000">check_digits:</span></pre>
</div>
<p>Do you feel the fear? I feel the fear. It&#8217;s a label. That means that somewhere in this function, there is a goto. Not that there&#8217;s anything wrong with gotos.  Sure, if misused they can lead to unreadable, spaghetti co&#8211; <strong>OH MY GOOD GOD</strong>. I&#8217;ve found some gotos, but they go to a <em>different label</em>. Two labels.  And the first one is in a for-loop! Don&#8217;t panic, maybe it&#8217;s readable. Maybe the second one is also in the for-loop. Please? Pretty please?</p>
<p>Fuck. Fuck fuck fuck. I&#8217;d like to suggest you try to work it out yourself, but you&#8217;d probably prefer not to. If you haven&#8217;t looked at the code yet, <a class="reference external" href="http://cvs.php.net/viewvc.cgi/ZendEngine2/zend_operators.h?revision=1.94.2.4.2.14&amp;view=markup#l149">now is the time</a>.  Don&#8217;t worry if you don&#8217;t know C &#8212; with code like this, knowing the language is not the advantage you&#8217;d expect.</p>
<p>The first label, <tt class="docutils literal"><span class="pre">check_digits</span></tt>, is in a for-loop. That for-loop has two gotos (and one continue and one break, just to make things more readable), which both go to the <em>other label</em> <tt class="docutils literal"><span class="pre">process_double</span></tt>. <tt class="docutils literal"><span class="pre">process_double</span></tt> is outside the loop, deeply nested in a completely separate series of <em>if-else</em> statements. (They have also given up on comments by now.) After checking a few more conditions, only then do you jump back into the previous for-loop!!!! Oh wait.  No, that&#8217;s not right. They&#8217;re in different paths. They actually added control-flow edges from an else-body to within a for-loop in the if-body. Just wow.</p>
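<p>If that shape is hard to picture, here is a toy version of it. This is a sketch of the control-flow pattern only, not the PHP code: the functions <tt class="docutils literal"><span class="pre">tangled()</span></tt> and <tt class="docutils literal"><span class="pre">structured()</span></tt>, the label <tt class="docutils literal"><span class="pre">resume</span></tt> and the step counting are all made up for illustration.</p>

```c
/* Toy illustration (hypothetical, not taken from the PHP sources):
 * the else-branch jumps into the body of a for-loop that lives in the
 * if-branch, so the loop initialiser is silently skipped on that path. */
static int tangled(int start_in_loop, int n)
{
    int i = 0, steps = 0;
    if (start_in_loop) {
        for (i = 0; i < n; i++) {
resume:
            steps++;
        }
    } else {
        i = n / 2;      /* set-up that belongs to the other path */
        goto resume;    /* lands in the middle of the loop above */
    }
    return steps;
}

/* Same behaviour without the jump: the work the goto triggered on its
 * first visit is done explicitly, then one ordinary loop finishes. */
static int structured(int start_in_loop, int n)
{
    int i = 0, steps = 0;
    if (!start_in_loop) {
        i = n / 2;
        steps++;        /* the body runs once unconditionally */
        i++;            /* followed by the loop increment */
    }
    for (; i < n; i++) {
        steps++;
    }
    return steps;
}
```

<p>Both versions are legal C and return the same counts. The difference is that in the second one you can read the loop without first hunting for every place that might jump into it.</p>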
<p>I&#8217;d like to say that this horrendous function is over. After all, this is just the first function I&#8217;ve come across. But while there is only simple (but uncommented) code remaining in the function, I have a final nit to pick.</p>
<div class="highlight">
<pre><span style="color: #008000; font-weight: bold">if</span> (ptr <span style="color: #666666">!=</span> str <span style="color: #666666">+</span> length) {
    <span style="color: #008000; font-weight: bold">if</span> (<span style="color: #666666">!</span>allow_errors) {
        <span style="color: #008000; font-weight: bold">return</span> <span style="color: #666666">0</span>;
    }
    <span style="color: #008000; font-weight: bold">if</span> (allow_errors <span style="color: #666666">==</span> <span style="color: #666666">-1</span>) {
        zend_error(E_NOTICE,
            <span style="color: #BA2121">"A non well formed numeric value encountered"</span>);
    }
}</pre>
</div>
<p>There is a check to see if <tt class="docutils literal"><span class="pre">allow_errors</span></tt> is -1, even though the comment only mentioned two possible values for <tt class="docutils literal"><span class="pre">allow_errors</span></tt>. So what does it mean for it to be -1? We&#8217;re saved from figuring it out because the check can&#8217;t even trigger.  If <tt class="docutils literal"><span class="pre">allow_errors</span></tt> was non-zero, the function would have returned already.</p>
<p>Now, you might consider this a minor nit, and it&#8217;s easy to see why it wasn&#8217;t fixed when you consider how deeply it was nested <a id="id6" class="footnote-reference" href="#id14">[6]</a>. But this sort of thing is the rule, not the exception in PHP sources. Broken windows built on top of other broken windows.</p>
<hr class="docutils" />
<p>The rest of the header is OK, considering. There is a macro that&#8217;s not appropriately guarded <a id="id7" class="footnote-reference" href="#id15">[7]</a>, and a <a class="reference external" href="http://cvs.php.net/viewvc.cgi/ZendEngine2/zend_operators.h?revision=1.94.2.4.2.14&amp;view=markup#l337">few with no guards and no comments</a>.  A few macros use their parameters more than once (bane of post-incrementers everywhere), and some duplicate some code between them, or write the same code in multiple ways <a id="id8" class="footnote-reference" href="#id16">[8]</a>. There are then a ton of macros which can be used as lvalues:</p>
<div class="highlight">
<pre><span style="color: #BC7A00">#define Z_DVAL(zval)            (zval).value.dval</span>
<span style="color: #BC7A00">#define Z_STRVAL(zval)          (zval).value.str.val</span>
<span style="color: #BC7A00">#define Z_STRLEN(zval)          (zval).value.str.len</span></pre>
</div>
<p>except one</p>
<div class="highlight">
<pre><span style="color: #BC7A00">#define Z_BVAL(zval)            ((zend_bool)(zval).value.lval)</span></pre>
</div>
<p>where an enterprising soul mustn&#8217;t have thought hard before committing. PHP has long been lambasted for its inconsistency; it turns out that this inconsistency is not limited to just its libraries, syntax or semantics.</p>
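<p>Two of the macro hazards from this part of the header are easy to reproduce in isolation. The sketch below uses a stand-in struct and hypothetical <tt class="docutils literal"><span class="pre">TOY_</span></tt> macros that only imitate the shape of the real ones, so treat it as an illustration of the pattern, not of the Zend definitions.</p>

```c
/* Stand-in types and macros, hypothetical and for illustration only */
typedef unsigned char toy_bool;
typedef struct {
    union {
        long lval;
        double dval;
    } value;
} toy_zval;

#define TOY_DVAL(z)     (z).value.dval              /* member access: an lvalue */
#define TOY_BVAL(z)     ((toy_bool)(z).value.lval)  /* cast result: not an lvalue */
#define TOY_IS_DIGIT(c) ((c) >= '0' && (c) <= '9')  /* uses c twice */

static double set_dval(toy_zval *z, double d)
{
    TOY_DVAL(*z) = d;           /* assigning through the macro works */
    return TOY_DVAL(*z);
}

static int set_bval(toy_zval *z, int b)
{
    /* TOY_BVAL(*z) = b;  would not compile, because of the cast */
    z->value.lval = (b != 0);   /* so callers must reach past the macro */
    return TOY_BVAL(*z);
}

static int chars_consumed(const char *s)
{
    const char *p = s;
    /* The argument is evaluated once per use that actually runs, so p
     * can advance by two for what looks like a single step. */
    (void)TOY_IS_DIGIT(*p++);
    return (int)(p - s);
}
```

<p>With an argument like <tt class="docutils literal"><span class="pre">*p++</span></tt>, how far the pointer moves depends on whether the first comparison short-circuits, which is exactly the kind of surprise a post-incrementer does not want.</p>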
<hr class="docutils" />
<p>To finish off this file is a frankly baffling piece of code. Let this be a lesson about commenting. Sometimes the &#8216;why&#8217; of the comment is not sufficient &#8211; occasionally, you will need to explain what the code is doing.</p>
<div class="highlight">
<pre><span style="color: #BC7A00">#if HAVE_SETLOCALE &amp;&amp; defined(ZEND_WIN32) &amp;&amp; !defined(ZTS) \</span>
<span style="color: #BC7A00">    &amp;&amp; defined(_MSC_VER) &amp;&amp; (_MSC_VER &gt;= 1400)</span>
<span style="color: #408080; font-style: italic">/* This is performance improvement of tolower() on Windows</span>
<span style="color: #408080; font-style: italic"> * and VC2005</span>
<span style="color: #408080; font-style: italic"> * GIves 10-18% on bench.php</span>
<span style="color: #408080; font-style: italic"> */</span></pre>
</div>
<hr class="docutils" />
<div id="and-we-re-done" class="section">
<h1>And we&#8217;re done</h1>
<p>Unfortunately, this file is not an isolated incident.  The entire <em>Zend/</em> directory &#8212; the core of the whole PHP implementation &#8212; is a filthy mess.  While the <tt class="docutils literal"><span class="pre">is_numeric_string()</span></tt> function might be the worst code I have ever seen, most of the <em>Zend/</em> files contain a lot of hideous code: poorly organized, badly written, badly documented, unreadable messes.</p>
</div>
<hr class="docutils" />
<div id="notes" class="section">
<h1>Notes:</h1>
<table id="id9" class="docutils footnote" border="0" frame="void" rules="none">
<colgroup>
<col class="label"></col>
<col></col>
</colgroup>
<tbody>
<tr>
<td class="label"><a class="fn-backref" href="#id1">[1]</a></td>
<td>
<p class="first">I&#8217;m aware that as a &#8216;code review&#8217;, this is actually pretty poor, primarily due to its lack of constructive criticism. So this is more of a detailed, flame-bait-y rant about the quality of code in this file, which I will use to assure people that the rest of the code I&#8217;ve seen in the PHP project is of similar caliber. And none of that is really very constructive.</p>
<p class="last">As it happens, I am preparing a much more constructive piece about <em>why</em> the code in PHP is so bad, how it got this way, and how to fix it. But first I&#8217;d like to demonstrate how poor the code currently is, and it&#8217;s difficult to do this without some bile and vitriol.</p></td>
</tr>
</tbody>
</table>
<table id="id10" class="docutils footnote" border="0" frame="void" rules="none">
<colgroup>
<col class="label"></col>
<col></col>
</colgroup>
<tbody>
<tr>
<td class="label"><a class="fn-backref" href="#id2">[2]</a></td>
<td>I was originally planning to do zend_operators.c, and thought I&#8217;d quickly do the header first. But this was so poor that I ended up writing about it.</td>
</tr>
</tbody>
</table>
<table id="id11" class="docutils footnote" border="0" frame="void" rules="none">
<colgroup>
<col class="label"></col>
<col></col>
</colgroup>
<tbody>
<tr>
<td class="label"><a class="fn-backref" href="#id3">[3]</a></td>
<td>
<p class="first">I didn&#8217;t want to bog down the intro with my method, but it should probably be here anyway.</p>
<ul class="last simple">
<li>I&#8217;m analysing the latest release, PHP 5.2.8. It&#8217;s a little difficult to choose which version of PHP to pick on, but the latest stable release is probably not too unfair.</li>
<li>All files that I&#8217;ll choose come from the <em>Zend/</em> directory, which makes up the core of the PHP interpreter.</li>
<li>I could do some code archaeology, and find how the code got to the state that it did, and perhaps personally hunt down the person who did it, but I won&#8217;t. Even if I could find a sole contributor to blame, there are so many broken windows in PHP that I feel the blame should be spread across all the PHP internals developers.</li>
</ul>
</td>
</tr>
</tbody>
</table>
<table id="id12" class="docutils footnote" border="0" frame="void" rules="none">
<colgroup>
<col class="label"></col>
<col></col>
</colgroup>
<tbody>
<tr>
<td class="label"><a class="fn-backref" href="#id4">[4]</a></td>
<td>Unfortunately, <tt class="docutils literal"><span class="pre">is_numeric_string()</span></tt> was cleaned up at some point (though the version I review here is what&#8217;s in all the 5.2.x releases, and is scheduled to be in 5.3), so this post loses a smidgen of its sting.</td>
</tr>
</tbody>
</table>
<table id="id13" class="docutils footnote" border="0" frame="void" rules="none">
<colgroup>
<col class="label"></col>
<col></col>
</colgroup>
<tbody>
<tr>
<td class="label"><a class="fn-backref" href="#id5">[5]</a></td>
<td>I should point out that I&#8217;ve tidied up the code to fit in the blog. So if you&#8217;re thinking &#8220;at least the lines aren&#8217;t too long&#8221;, well, they are.</td>
</tr>
</tbody>
</table>
<table id="id14" class="docutils footnote" border="0" frame="void" rules="none">
<colgroup>
<col class="label"></col>
<col></col>
</colgroup>
<tbody>
<tr>
<td class="label"><a class="fn-backref" href="#id6">[6]</a></td>
<td>When they fixed this particular piece of code, the <tt class="docutils literal"><span class="pre">allow_errors</span></tt> checks became much <a class="reference external" href="http://cvs.php.net/viewvc.cgi/ZendEngine2/zend_operators.h?revision=1.135&amp;view=markup&amp;pathrev=MAIN#l239">less obfuscated</a>, but still were not removed, sadly.</td>
</tr>
</tbody>
</table>
<table id="id15" class="docutils footnote" border="0" frame="void" rules="none">
<colgroup>
<col class="label"></col>
<col></col>
</colgroup>
<tbody>
<tr>
<td class="label"><a class="fn-backref" href="#id7">[7]</a></td>
<td>A macro guard (if that is indeed the right name, and not one I just made up) is when you wrap a macro in a <tt class="docutils literal"><span class="pre">do-while(0)</span></tt> loop.</td>
</tr>
</tbody>
</table>
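<p>The guard described in note [7] can be sketched as follows. The two <tt class="docutils literal"><span class="pre">RESET_</span></tt> macros and both functions are hypothetical, invented only to show what goes wrong without the <tt class="docutils literal"><span class="pre">do-while(0)</span></tt> wrapper.</p>

```c
/* Hypothetical two-statement macros, for illustration only */
#define RESET_UNGUARDED(a, b)  (a) = 0; (b) = 0
#define RESET_GUARDED(a, b)    do { (a) = 0; (b) = 0; } while (0)

static int unguarded(int cond)
{
    int a = 1, b = 1;
    if (cond)
        RESET_UNGUARDED(a, b);  /* only "(a) = 0;" belongs to the if */
    return a + b;               /* b has been zeroed regardless of cond */
}

static int guarded(int cond)
{
    int a = 1, b = 1;
    if (cond)
        RESET_GUARDED(a, b);    /* the whole body belongs to the if */
    return a + b;
}
```

<p>With <tt class="docutils literal"><span class="pre">cond</span></tt> set to 0, the unguarded version still clears <tt class="docutils literal"><span class="pre">b</span></tt>, while the guarded one leaves both variables alone. Recent compilers may warn about the unguarded form, which is rather the point.</p>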
<table id="id16" class="docutils footnote" border="0" frame="void" rules="none">
<colgroup>
<col class="label"></col>
<col></col>
</colgroup>
<tbody>
<tr>
<td class="label"><a class="fn-backref" href="#id8">[8]</a></td>
<td>
<p class="first">If you know Zend internals, spot the difference:</p>
<div class="highlight">
<pre>SEPARATE_ZVAL_IF_NOT_REF(ppzv);</pre>
</div>
<p>and</p>
<div class="last">
<div class="highlight">
<pre><span style="color: #008000; font-weight: bold">if</span> (<span style="color: #666666">!</span>(<span style="color: #666666">*</span>ppzv)<span style="color: #666666">-&gt;</span>is_ref) SEPARATE_ZVAL(ppzv);</pre>
</div>
</div>
</td>
</tr>
</tbody>
</table>
</div>
]]></content:encoded>
			<wfw:commentRss>http://blog.paulbiggar.com/archive/introducing-malicious-code-reviews/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
	</channel>
</rss>
