<span style="font-weight: bold;">php for fun</span><br />This is a blog for anyone who loves PHP as much as I do. Using PHP for dynamic websites is so boring. Abstract and more complicated projects are way more fun. heh<br />da404lewzer (http://www.blogger.com/profile/02995961453909475185)<br /><br /><span style="font-weight: bold;">Learning, the Long Way</span><br />Published 2007-11-28 01:31, updated 2007-11-28 01:57 (UTC-8)<br /><br />Doing something like<br /><blockquote>print 'This is a test';</blockquote>is pretty simple, wouldn't you say? Print is a simple command that outputs what it's given to the browser. Indeed simple. Now let's look at this:<br /><blockquote>function foo($data){<br /> echo $data;<br />}<br />${'*'}='foo';<br />${'*'}('This is a test');<br /></blockquote>This example is a bit more abstract. It does the same thing, but in a different way. First, ${} is called a variable-variable. It allows you to use a string to specify a variable: $name and ${'name'} are the same variable. Valid PHP variable names start with an underscore or a letter, followed by any number of letters, digits, or underscores. Here ${'*'} creates a variable $*, if you will. This is not a valid variable name! Consider the following:<br /><blockquote>echo $*; // parse error<br />echo ${'*'}; // outputs: foo<br /></blockquote>Ok, so $* is invalid, but how does it work? If a variable-variable defines it, anything flies. Moving on, the last line, ${'*'}('This is a test');, is what is called a variable-function. Basically, it allows you to store the name of a function inside of a variable. Then, by adding () to the variable, it runs the function whose name matches the variable's value. If no function with that name is found, an error will occur. 
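<br /><br />Here is a minimal sketch of guarding a variable-function call so that a missing function doesn't kill the script. It reuses the foo wrapper from above; the $missing variable and the name no_such_function are made up for illustration, and is_callable is just one way to do the check.<br />

```php
<?php
// Wrapper function, because echo itself cannot be called through a variable.
function foo($data)
{
    echo $data;
}

// Variable-variable: the string '*' names a variable that plain $* could never reach.
${'*'} = 'foo';

// Variable-function: confirm the stored name is callable before calling it,
// since calling an undefined function is a fatal error.
if (is_callable(${'*'})) {
    ${'*'}("This is a test\n");
}

// Hypothetical name that is not defined anywhere in this script.
$missing = 'no_such_function';
if (!is_callable($missing)) {
    echo "Cannot call $missing\n";
}
```

The same check works in front of call_user_func, which takes the function name as a string too.<br /><br />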
Consider the following lines, which are now technically the same.<br /><blockquote>${'*'}('Test 1-2-3'); // outputs: Test 1-2-3<br />foo('Test 1-2-3'); // outputs: Test 1-2-3<br /></blockquote>The reason we used foo as an echo wrapper is that echo and print are language constructs, not functions, so they can't be called through a variable. Be sure to play with variable-variables and variable-functions; they are a lot of fun and are the next step towards making modular components and classes! This is technically an alternative to:<br /><blockquote>call_user_func('foo', 'This is a test');</blockquote>More info here: <a href="http://www.php.net/manual/en/language.variables.variable.php">Variable-variables</a> and <a href="http://www.php.net/manual/en/functions.variable-functions.php">Variable-functions</a><br /><br />wika wika out<br />Comments: 0<br /><br /><span style="font-weight: bold;">MySQL speed improvements</span><br />Published 2007-11-23 16:56, updated 2007-11-24 15:31 (UTC-8)<br /><br />MyISAM seems to be faster than InnoDB with the queries that I've been using for my web spider. 
I switched over my tables and found that my crawls were running 3 times faster because MyISAM supports INSERT DELAYED.<br /><br /><span style="font-weight: bold;">The Queries</span><br />For the example, we will be using entry #383558 (<span style="font-style: italic;">http://en.wikipedia.org/wiki/Category:World_Wrestling_Entertainment_alumni</span>).<br /><br /><blockquote><span style="font-weight: bold;">SELECT </span>uID, uURL <span style="font-weight: bold;">FROM </span>urls <span style="font-weight: bold;">WHERE </span>uError = 0 <span style="font-weight: bold;">AND </span>uUpdated < <span style="font-style: italic;">DATE_SUB</span>(<span style="font-style: italic;">NOW()</span> , interval 12 hour ) <span style="font-weight: bold;">ORDER BY </span><span style="font-style: italic;">rand()</span> <span style="font-weight: bold;">LIMIT </span>1 </blockquote>This selects a random URL from the urls table that hasn't errored and hasn't been updated in over 12 hours.<br /><br /><blockquote><span style="font-weight: bold;">UPDATE </span>urls <span style="font-weight: bold;">SET </span>uUpdated = <span style="font-style: italic;">NOW()</span> <span style="font-weight: bold;">WHERE </span>uID = 383558 <span style="font-weight: bold;">LIMIT </span>1</blockquote>This sets the last updated time to now() in the urls table.<br /><br /><blockquote><span style="font-weight: bold;">INSERT DELAYED INTO</span> data (dURLID, dData) <span style="font-weight: bold;">VALUES </span>( 383558, 'Category:World Wrestling Entertainment alumni - Wikipedia, the free encyclopedia\n /**/...' 
) <span style="font-weight: bold;">ON DUPLICATE KEY UPDATE</span> dData=<span style="font-weight: bold;">VALUES</span>(dData)</blockquote>This inserts (or updates) the current page's text into the data table.<br /><br /><blockquote><span style="font-weight: bold;">INSERT DELAYED IGNORE INTO</span> urls (uURL, uAdded, uSiteID) <span style="font-weight: bold;">VALUES </span>( 'http://en.wikipedia.org/favicon.ico', <span style="font-style: italic;">NOW()</span>, 383558 ),( 'http://en.wikipedia.org/wiki/Kurt_Angle', <span style="font-style: italic;">NOW()</span>, 383558 ),( 'http://en.wikipedia.org/wiki/Bryan_Clark', <span style="font-style: italic;">NOW()</span>, 383558 ),( 'http://en.wikipedia.org/wiki/Peter_Gasperino', <span style="font-style: italic;">NOW()</span>, 383558 ),( 'http://en.wikipedia.org/wiki/Robert_Horne_%28wrestler%29', <span style="font-style: italic;">NOW()</span>, 383558 )</blockquote>This inserts more urls into the urls table and records the parent url.<br /><br /><blockquote><span style="font-weight: bold;">UPDATE </span>urls <span style="font-weight: bold;">SET </span>uError = 1 <span style="font-weight: bold;">WHERE </span>uID = 383558 <span style="font-weight: bold;">LIMIT </span>1</blockquote>This marks the page as an error (e.g. 404). It will not be spidered in the future unless this mark is removed.<br /><br /><br />This is an extract from a typical set of queries that would get executed after a crawl. Obviously the dData wouldn't be truncated and there would be a lot more urls lol. tacos. l8r<br />Comments: 2<br /><br /><span style="font-weight: bold;">Using "IF Comments" in PHP</span><br />Published 2007-11-22 03:37, updated 2007-11-22 15:17 (UTC-8)<br /><br />Here's a nifty hack using comments to quickly enable/disable a block of code. I haven't seen examples of this before so I feel special. 
Here we go.<br /><br /><span style="font-weight: bold;">Teh Test</span><br />Suppose you have the following code:<br /><blockquote>$data = get_crap();<br /><br />echo "Debug:";<br />print_r($data);</blockquote>I store some data in $data and, as a debug, I want to view the contents of $data. I want to be able to disable the debug quickly. This is going to look weird at first but I will explain.<br /><br /><span style="font-weight: bold;">Example Code:<br /></span><blockquote>$data = get_crap();<br /><br /><span style="font-weight: bold; color: rgb(51, 204, 0);">//</span><span style="color: rgb(51, 204, 0);">/*</span><br />echo "Debug:";<br />print_r($data);<br /><span style="font-weight: bold; color: rgb(51, 204, 0);">//</span><span style="color: rgb(51, 204, 0);">*/</span><br /></blockquote><span style="font-weight: bold;">///*</span> is a valid single-line comment: it starts with //, so PHP ignores the /* that follows.<br />The echo and print_r run because only the line above them is commented out.<br /><span style="font-weight: bold;">//*/</span> is also a valid single-line comment: it starts with //, so PHP ignores the */ that follows.<br /><br /><span style="font-weight: bold;">To disable execution of code:</span><br />(Remove the very first set of //'s)<br /><blockquote>$data = get_crap();<br /><br /><span style="font-weight: bold; color: rgb(51, 204, 0);">/*</span><br /><span style="color: rgb(51, 204, 0);"> echo "Debug:";</span><br /><span style="color: rgb(51, 204, 0);"> print_r($data);</span><br /><span style="color: rgb(51, 204, 0);"> //</span><span style="font-weight: bold; color: rgb(51, 204, 0);">*/</span></blockquote><span style="font-weight: bold;">/*</span> starts a multi-line block comment. PHP ignores everything until it finds the closing */<br />The echo and print_r are not seen by the compiler.<br />The // in <span style="font-weight: bold;">//*/</span> is ignored because it is still inside the block comment. 
Execution resumes after the */<br /><br /><span style="font-weight: bold;">Wait a second...</span><br />Why not just do this?<br /><blockquote>$data = get_crap();<br /><br /><span style="font-weight: bold; color: rgb(51, 204, 0);">/*</span><br /><span style="color: rgb(51, 204, 0);"> echo "Debug:";</span><br /><span style="color: rgb(51, 204, 0);"> print_r($data);</span><br /><span style="color: rgb(51, 204, 0);"> </span><span style="font-weight: bold; color: rgb(51, 204, 0);">*/<br /></span></blockquote>Why not just use /* */? Why /* //*/?<br /><br />Because! If you remove the first /* to 'uncomment' without also removing the */ (an extra step, by the way), you get this:<br /><blockquote>$data = get_crap();<br /><br /><br /> echo "Debug:";<br /> print_r($data);<br /> */</blockquote>And this:<br /><br /><b>Parse error</b>: parse error in <b>c:\www\project\file.php</b> on line <b style="font-style: italic;">n</b><br /><br />meh, l8r<br />Comments: 0<br /><br /><span style="font-weight: bold;">My Web Spider</span><br />Published 2007-11-22 00:16, updated 2007-11-22 03:55 (UTC-8)<br /><br />I'm writing a web spider as a test to better my php skills. A web spider is a script that looks for urls and gathers page data for later usage; GoogleBot is an example of a web spider. So far I've indexed 8,861 of the 142,967 urls already spidered. I've only been running the spider a few hours (4 tabs) and my page_data table is already 736,217mb. Only text gets stored; I strip out html tags.<br /><br /><span style="font-weight: bold;">How it works:</span><br />A page is set to reload every 2 seconds. 
This initiates the parsing of a page, stores all new urls into the database, and displays a cool url list to show me it's working.<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj01y-t_mmeb6zYvDEhmVWDds-FPOvNrw6xQqRpPliHWLBvCw2W3u4Y_C8LErhjYhum-E-vIB2Pf1dg_8ulyt9Xb4rcydamaW7nNnYQ4bd5RgMAOsai_Ov2HrlzrOawUPikN93o0GuXQaq6/s1600-h/digg.jpg"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj01y-t_mmeb6zYvDEhmVWDds-FPOvNrw6xQqRpPliHWLBvCw2W3u4Y_C8LErhjYhum-E-vIB2Pf1dg_8ulyt9Xb4rcydamaW7nNnYQ4bd5RgMAOsai_Ov2HrlzrOawUPikN93o0GuXQaq6/s400/digg.jpg" alt="" id="BLOGGER_PHOTO_ID_5135584178768887138" border="0" /></a><br />Every time the page loads, a random url is loaded. The page is scanned and its urls are stored along with the page data. There is a 10% chance that the spider will go into 'source mode', which scans a preset list of urls that contain a constant supply of new urls, like digg and del.icio.us.<br /><br />The regular expression that parses out the urls:<br /><blockquote>/href="([^"]+)"/</blockquote>The urls that are loaded from the database are picked at random and must have been last updated more than 3 hours earlier (unless in source mode). I have had to keep pushing this time back as more urls are added. I will most likely need to create a system to determine which urls should be re-parsed sooner, to allow faster updating of news/social sites.<br /><br />The engine automatically ignores javascript, doubleclick.net and a few other junk links. I'll add more checks as I find the need to.<br /><br />I need another soda. I'll write more later as I feel like adding to the engine. 
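<br /><br />The link extraction described above can be sketched roughly like this. It uses the same regular expression; the function name extract_urls, the sample HTML, and the exact way javascript and doubleclick.net links get filtered are assumptions for illustration, not the spider's real code.<br />

```php
<?php
// Hypothetical helper: pull href values out of a page and drop the kinds
// of links the spider is said to ignore.
function extract_urls($html)
{
    // Same pattern as above: captures everything between href=" and the next ".
    preg_match_all('/href="([^"]+)"/', $html, $matches);

    $urls = [];
    foreach ($matches[1] as $url) {
        // Skip javascript: pseudo-links and anything pointing at doubleclick.net.
        if (stripos($url, 'javascript:') === 0 || stripos($url, 'doubleclick.net') !== false) {
            continue;
        }
        $urls[] = $url;
    }

    // Remove duplicates and reindex so the result is a plain list.
    return array_values(array_unique($urls));
}

// Made-up sample page with one good link (twice) and two links to skip.
$html = '<a href="http://example.com/a">A</a> '
      . '<a href="javascript:void(0)">B</a> '
      . '<a href="http://ad.doubleclick.net/x">C</a> '
      . '<a href="http://example.com/a">A again</a>';

print_r(extract_urls($html));
```

Note that this pattern only sees double-quoted href attributes; single-quoted or unquoted links slip through, which is fine for a toy spider but worth knowing.<br /><br />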
l8r<br />Comments: 5<br /><br /><span style="font-weight: bold;">Hello Everyone</span><br />Published 2007-11-20 14:28, updated 2007-11-20 14:40 (UTC-8)<br /><br />My name is Charlie and I love writing applications in php. I've been using php for close to 5 years and have learned a lot of shortcuts along the way. I hope to be able to share my greatest ideas with people who love to code.<br /><br />A lot of my 'fun' projects are what I call 'ridiculous.' I like to go out of the box to achieve similar results by writing my own functions. This allows me to better understand what php is doing internally. Variable-variables, variable-functions, and objects: I LOVE YOU GUYS. These are some (not all) of the best things in php.<br /><br />As I start to think of interesting things to put on here I shall. Until then, peace!<br />Comments: 0