Thread: Web aggregator / scraper

  1. #1
    Senior Member autotuner's Avatar
    Join Date
    Aug 2009
    Location
    I have no idea...
    Posts
    518
    Thanks
    93
    Thanked 209 Times in 134 Posts
    Rep Power
    220
    Reputation
    1279

    Web aggregator / scraper

    I'm no Web developer by any means, so please bear with me...
    I have to keep an eye on half a dozen sites in real time (don't ask why).
    Having to monitor several Web pages from different sources is a pain.

    Does anyone have any suggestions for a package that can pull/scrape/extract stuff from different pages and publish it all on one page?
    I have downloaded half a dozen trials with varying degrees of success, but all fall short.
    The stuff I want might be pictures, text or a combination of both, and I want to be able to ignore the surrounding crap.
    Oh, and the URLs aren't all static. The main part of each URL will remain the same, but the content/links below that could be dynamic...

    Any ideas?
    I'd rather have a bottle in front of me than a frontal lobotomy...



  • #2
    Senior Member
    Philquad's Avatar
    Join Date
    Jan 2008
    Location
    nelson bay
    Age
    55
    Posts
    3,872
    Thanks
    192
    Thanked 1,305 Times in 783 Posts
    Rep Power
    665
    Reputation
    16938

    Do you mean having to log in to different hosting packs for each site?
    If so, I suggest a hosting pack with addon domains.
    Then you can access all domains from one FTP login.

  • #3
    Senior Member autotuner's Avatar

    Phil,
    no, these are ordinary Web pages, not mine, but they provide a service to the company I work for.
    But there is a load of crap on them besides the stuff I need, and there are a few of them.
    So what I wanted to do was have an app running in the background, just pulling what I want from those pages and presenting it on a single page.

    If that makes any sense.

  • #4
    Senior Member Globe's Avatar
    Join Date
    Jan 2008
    Location
    Lost In The Matrix.
    Posts
    908
    Thanks
    66
    Thanked 273 Times in 128 Posts
    Rep Power
    255
    Reputation
    1399

    I've used Yahoo Pipes to do this sort of thing, but you need to set the pipe up so that its output is something your aggregator can understand. I'm no expert, but this can be done.


  • The Following 2 Users Say Thank You to Globe For This Useful Post:

    autotuner (19-08-10),Crabman (20-10-10)

  • #5
    Senior Member autotuner's Avatar

    Thanks, Globe, that may just do the job.

  • #6
    Junior Member
    Join Date
    Oct 2010
    Location
    Sydney
    Posts
    8
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Rep Power
    0
    Reputation
    10

    You could quite easily achieve this with PHP, using cURL and regex.

    So all you do is load the page with cURL (which gets the source code into a variable), go through the source code, find the markup that surrounds the HTML you want and match it with a regular expression. Then make any changes you want to it (strip out anything you don't want). If they have images that are hosted on their own domain and the src is relative, i.e. /images/something.jpg, you will have to find all of them (another regular expression to match all src=(.*)) and prepend the URL of the website to make each one absolute. Then output the HTML.
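
    A rough sketch of the steps above, written in Python rather than PHP purely for illustration (urllib stands in for cURL here). The function names, the example.com URL and the sample markup are all made up, and a regex like this only copes with simple, predictable pages:

```python
import re
import urllib.request
from urllib.parse import urljoin


def fetch(url):
    """Download a page and return its source as text (the 'load with cURL' step)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def extract(html, start_marker, end_marker):
    """Return the HTML between two literal markers, or None if they are not found."""
    pattern = re.escape(start_marker) + r"(.*?)" + re.escape(end_marker)
    match = re.search(pattern, html, re.DOTALL)
    return match.group(1) if match else None


def absolutize_src(fragment, base_url):
    """Rewrite every quoted src="..." so that relative paths point at base_url."""
    def fix(m):
        return m.group(1) + urljoin(base_url, m.group(2)) + m.group(3)
    return re.sub(r'''(src=["'])([^"']*)(["'])''', fix, fragment)


# Hypothetical page source, standing in for fetch("http://example.com/status")
page = '<div id="ads">junk</div><div id="news"><img src="/images/a.jpg">Hello</div><p>footer</p>'
wanted = extract(page, '<div id="news">', '</div>')
print(absolutize_src(wanted, "http://example.com/status"))
```

    The markers are the bits of HTML you find by hand on either side of the part you want. If the site changes its layout, the markers stop matching and extract() returns None, so that case is worth checking for.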

    Repeat for your half a dozen websites.
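
    Repeating it for several sites might look something like the sketch below (again Python for illustration, not PHP). The site names, markers and sample sources are invented, and the page sources are assumed to have been downloaded already:

```python
import re

# Hypothetical per-site rules: a label plus the literal markers around the part to keep.
SITES = [
    {"name": "Site A", "start": '<div id="news">', "end": "</div>"},
    {"name": "Site B", "start": "<!-- status -->", "end": "<!-- /status -->"},
]


def extract(html, start, end):
    """Return the text between two literal markers, or a placeholder if missing."""
    m = re.search(re.escape(start) + r"(.*?)" + re.escape(end), html, re.DOTALL)
    return m.group(1).strip() if m else "<em>content not found</em>"


def build_page(sources):
    """Combine one fragment per site into a single HTML page.

    sources maps each site name to its already-downloaded page source.
    """
    sections = []
    for site in SITES:
        fragment = extract(sources[site["name"]], site["start"], site["end"])
        sections.append("<h2>%s</h2>\n%s" % (site["name"], fragment))
    return "<html><body>\n" + "\n".join(sections) + "\n</body></html>"


# Made-up page sources standing in for the real downloads
sources = {
    "Site A": '<div id="ads">junk</div><div id="news">Server up</div>',
    "Site B": "<p>noise</p><!-- status --> 3 jobs queued <!-- /status -->",
}
print(build_page(sources))
```

    Keeping the per-site rules in one list means adding a seventh site is just one more entry, and a site whose markers no longer match shows up as "content not found" instead of breaking the whole page.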

    Note that this method requires an understanding of HTML (to work out what to match in your regular expression), an understanding of regular expressions and a basic understanding of PHP.
