
Thread: Extracting data from website

  1. #1
    trash (Senior Member, Tamworth)

    Extracting data from website

    I'm wondering if anybody knows of the best or simplest way to extract data from a website and then take that data and, say, send it to a serial port.
    Obviously there are a lot of applications for mining real-time data from websites and using it to control various things.

    An example is to watch the various weather stations on the BOM website for approaching wind gusts and then control an external device, either by
    sending the values to an external microcontroller that makes the decision and drives the hardware, or by having the PC make the decision and send the command straight to the device.
    The other thought was to reassemble the data and then send it on to another third-party web page.

    It's not just the BOM; that's just one example. There are quite a few pages I'd like to mine real-time data from.
    Yes I am an agent of Satan, but my duties are largely ceremonial.



  • #2
    tristen (Premium Member)


    Interesting project, trash.

    I had thought of doing much the same thing at one stage but decided to leave it for more pressing tasks at the time.

    I believe the term for monitoring data on websites like this is "web scraping".

    One way would be to write an application that fetches the HTML for a particular web page, then searches for and lists any values that have changed. (The HTML source for a web page can be readily examined in most popular web browsers.)

    Either the whole page or a particular piece of data could be monitored in this way.

    Out of interest, I'll compare html code for a page on the BOM website over several days to see what happens.

    A quick Google for "web-scraping software" found something that looks like a possible starting point.
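
    By way of illustration, a minimal PowerShell sketch of that fetch-and-compare idea (the URL and cache-file path are placeholders, not anything from the BOM):

```powershell
# Download the page, diff it against the copy saved last time, then re-save it.
# $url and $cache are placeholders - point them at the page you want to watch.
$url   = "http://www.example.com/observations/latest.shtml"   # stand-in for a BOM observations page
$cache = "C:\scrape\last.html"

$current = (Invoke-WebRequest -Uri $url -UseBasicParsing).Content

if (Test-Path $cache) {
    $previous = Get-Content -Path $cache -Raw
    if ($current -ne $previous) {
        Write-Host "Page changed - show which lines differ"
        Compare-Object ($previous -split "`n") ($current -split "`n") | Format-Table -AutoSize
    }
}

Set-Content -Path $cache -Value $current    # keep this copy for the next run
```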

  • #3
    simonm4 (Junior Member, The land of Oz)

    Trash, not sure how game you are, but you could try some of the PowerShell cmdlets. They're fairly easy to use for manipulating data, breaking it out into other files, or writing it out to a serial port. I haven't used these cmdlets for this specifically myself, but I work with PS constantly in the "non hobby" hours of the weekdaze... with a bit of planning you can do some pretty cool stuff.

    As tristen suggested, you could write a script that compares against the previous content and then writes out what you need to a COM port.

    Examples here:

    Reading HTTP content

    Writing to COM ports
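
    Something along these lines, as a rough PowerShell sketch (Invoke-WebRequest and the .NET SerialPort class are standard, but the URL, the regex, the "gust" field and the COM3/9600 settings are assumptions for illustration):

```powershell
# Read HTTP content - $url is a placeholder for whichever page or feed you watch
$url  = "http://www.example.com/observations.json"
$data = (Invoke-WebRequest -Uri $url -UseBasicParsing).Content

# Pull one value out of the text with a regex (pattern is illustrative only)
if ($data -match '"gust_kmh"\s*:\s*(\d+)') {
    $gust = [int]$Matches[1]

    # Write to a COM port via the .NET SerialPort class
    # (COM3, 9600 8N1 are assumptions - match them to your microcontroller)
    $port = New-Object System.IO.Ports.SerialPort -ArgumentList "COM3",9600,([System.IO.Ports.Parity]::None),8,([System.IO.Ports.StopBits]::One)
    $port.Open()
    $port.WriteLine("GUST $gust")
    $port.Close()
}
```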


  • #4
    Banned

    If the data to be extracted is RSS-formatted (meaning the data structure is known to you), then with a simple MortScript such as "RSSReader.mscr" you can easily fetch the data and process it any way you like.
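
    For what it's worth, the same idea is only a few lines of PowerShell as well, since RSS is just XML with a known layout (the feed URL below is a placeholder):

```powershell
# RSS has a known structure, so parsing is easy: cast the downloaded
# text to [xml] and walk the <item> elements.
$feedUrl = "http://www.example.com/warnings.rss"      # placeholder feed
[xml]$rss = (New-Object System.Net.WebClient).DownloadString($feedUrl)

foreach ($item in $rss.rss.channel.item) {
    # each <item>'s child elements (title, link, pubDate, ...) appear as properties
    "{0}  {1}" -f $item.pubDate, $item.title
}
```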

  • #5
    wwalford (Junior Member, Pretoria - South Africa)

    I have personally done this kind of extraction for real-time stock data. I would read the HTML, look for the various fields, and then save them into a database. Three things I struggled with:

    1. Frames: the website had multiple frames, so I had to find the original URL for each frame and process each frame separately.
    2. Changes to the HTML: because I did not own the website, whenever the owner made slight changes to the HTML my code would fail. I used various HTML parsers, both downloaded and written by myself, and they all struggled with some aspect of the changing HTML.
    3. IP blocking: I found that I could not read too quickly or too often, or the IP got banned. Not all sites do this, but some do.

    What I was doing was not illegal; it was the same as browsing each page and writing the values down, except my program did it for me. But I did stop it eventually, because I started missing data, and I found it easier to just buy the stock data I was looking for. If the data comes in as RSS it is super easy; if you are using the original HTML, good luck if it is a highly active website.
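
    To give a flavour of the sort of loop involved, here is a PowerShell sketch (made-up URL, made-up regex, and a five-minute interval; adjust to whatever the site will tolerate):

```powershell
# Poll a page for one field, append it to a CSV, and wait between requests
# so the server isn't hammered. URL, pattern and interval are all assumptions.
$url = "http://www.example.com/quote/XYZ"

while ($true) {
    $html = (Invoke-WebRequest -Uri $url -UseBasicParsing).Content

    # Regex scraping is brittle: any change to the page layout breaks the pattern
    if ($html -match '<span id="last">([\d\.]+)</span>') {
        "{0},{1}" -f (Get-Date -Format s), $Matches[1] | Add-Content "C:\scrape\prices.csv"
    }

    Start-Sleep -Seconds 300    # one request every five minutes
}
```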


  • #6
    Rick (Senior Member, Tassi)

    Son of Satan, what are you up to now, trash?

  • #7
    Chieflets (Senior Member)

    You can do this via VB.NET... I mean, do some processing and then send signals via the serial port or USB to the external device.


    Chieflets
