Food, Web Development, Music, and the funny crap

Exxxcavate’s final descent

  • Here it goes. I have decided: no more f*&&$*in around. I am finishing this thing today!

    The crawler has been proving itself better as a cron job than as a manually run file. Please don’t ask me why; I am not really sure. But the experience is as follows.

    First I created a PHP file (plus the other files it includes) that browses the websites cataloged in the database and finds video clips of adult content. For each clip it records the title, the listing location, the view count, the duration of the video, the tags, and the category it was placed under. I then took that info and placed it into WordPress as a post, using the obvious info as normal post info (title, category, tags) and the rest as custom fields.
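    A rough sketch of that split between normal post info and custom fields, written in Python rather than the PHP the crawler actually uses. The names `ClipRecord` and `to_post` and the field names are made up for illustration; the real code and the WordPress calls it makes are not shown in this post.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ClipRecord:
    """One crawled clip. Field names are hypothetical, not the crawler's own."""
    title: str
    listing_url: str   # where the listing was found
    views: int
    duration: str      # kept as scraped, e.g. "4:35"
    tags: List[str] = field(default_factory=list)
    category: str = ""


def to_post(clip: ClipRecord) -> Dict[str, object]:
    """Split a clip into normal post info and custom fields."""
    return {
        # The "obvious" info becomes regular post data.
        "title": clip.title,
        "category": clip.category,
        "tags": list(clip.tags),
        # Everything else is stored as custom fields on the post.
        "custom_fields": {
            "listing_url": clip.listing_url,
            "views": clip.views,
            "duration": clip.duration,
        },
    }
```

    The resulting dict would then be handed to whatever actually creates the post; keeping the mapping in one place makes it easy to see which values end up as custom fields.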

    The site is hosted on Cirtex, a shared hosting company that bills itself as an ‘adult content leader’, being that they are the only shared hosting company that allows adult content. We all know shared hosting means problems. You can’t develop crap on a shared hosting account because it’s built for fewer than 1k daily visitors. (This isn’t always true: some shared hosts actually balance server load well and take care of their customers. They expect developers to be able to actually make something, and usually don’t give you the boot until you reach 15k daily visitors, as in the case of bluehost.com.)

    The site will be moved to a dedicated server after the project is complete. The cron job has a multi-crawler, multi-site memory, and can be split in many ways simply by adding different vars. Once I realized the MySQL limitations of Cirtex, I dropped the file down from grabbing a whole page of listings to grabbing only one at a time. This actually had a lot of benefits, as I was able to handle errors on a more detailed level.
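    The one-listing-per-run idea can be sketched roughly like this, again in Python instead of PHP. It assumes the crawler's "memory" is just a saved position in a queue of listing URLs; `crawl_next`, `fetch_clip`, and `save_post` are hypothetical names, and a plain dict stands in for the database.

```python
def crawl_next(queue, state, fetch_clip, save_post):
    """Process exactly one listing per call (one cron run).

    `state` stands in for whatever the crawler remembers between runs.
    """
    position = state.get("position", 0)
    if position >= len(queue):
        return "done"

    url = queue[position]
    # Advance first, so one broken listing cannot block every later run.
    state["position"] = position + 1

    try:
        save_post(fetch_clip(url))
    except Exception as exc:
        # With a single listing per run, the error points at one exact URL.
        state.setdefault("errors", []).append((url, str(exc)))
        return "error"
    return "ok"
```

    Because each run touches a single listing, a failure is logged against that one URL and the next run simply carries on with the following listing.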

    Today I am adding a search-by-video-length feature and a landing page. I am also making the CSS on the site act a bit more normal (my screen resolution is 1280 wide, and the site looks like shit on a screen of 1060 or less). None of that is a big deal, probably about six hours of work. I also need to add 30 more sites to the crawler’s database, and that might mean some extra work on my part. I am thinking… all-nighter?!
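    For the search-by-video-length feature, one possible approach is to turn the stored duration into seconds and filter on a range. This Python sketch assumes durations are saved as "mm:ss" or "hh:mm:ss" strings in a `duration` field; that format, and the function names, are guesses for illustration only.

```python
def duration_to_seconds(text):
    """Convert "ss", "mm:ss" or "hh:mm:ss" into a number of seconds."""
    parts = [int(part) for part in text.strip().split(":")]
    if not 1 <= len(parts) <= 3:
        raise ValueError("unexpected duration format: %r" % text)
    seconds = 0
    for part in parts:
        seconds = seconds * 60 + part
    return seconds


def filter_by_length(clips, min_seconds=0, max_seconds=None):
    """Keep clips whose duration falls inside the requested range."""
    matches = []
    for clip in clips:
        length = duration_to_seconds(clip["duration"])
        if length < min_seconds:
            continue
        if max_seconds is not None and length > max_seconds:
            continue
        matches.append(clip)
    return matches
```

    Storing the converted seconds as its own custom field at crawl time would let the database do this comparison directly instead of parsing strings on every search.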