Channel: Extract plain-text xhtml data in php (regex)? - Stack Overflow

Answer by Poiz for Extract plain-text xhtml data in php (regex)?


One option, admittedly a roundabout one, is to wrap the extraction in a function that combines regex matching with a loop over the lines of the file. Consider the function below. Although it may look convoluted, it is not limited to the date value: it collects every key-value pair in the file, in case you need the others at some point.

<?php
    $file   = __DIR__ . "/file.txt";   //<== THE NAME OF THE FILE CONTAINING YOUR DATA

    /*************** BEGIN FUNCTIONS ***************/
    function parseFile($file){
        $arrFileContent    = [];

        // IF THE FILE DOES NOT EXIST RETURN NULL
        if(!file_exists($file)){
            return null;
        }

        // GET THE DATA FROM THE FILE & STORE IT IN A VARIABLE
        $strFileDataContent = file_get_contents($file);

        // IF THE FILE CONTAINS NOTHING RETURN NULL AS WELL
        if(empty($strFileDataContent)){
            return null;
        }

        // SPLIT THE CONTENTS OF THE FILE (STRING) AT THE END OF EACH LINE
        // THUS CREATING AN ARRAY OF LINES OF TEXT-DATA
        $arrFileDataLines   = explode("\n", $strFileDataContent);

        // LOOP THROUGH THE ARRAY PRODUCED ABOVE & PERFORM SOME PATTERN MATCHING
        // AND TEXT EXTRACTION WITHIN THE LOOP
        foreach($arrFileDataLines as $iKey=>$lineData){
            $arrSubLines   = explode("\n", $lineData);

            foreach($arrSubLines as $intKey=>$strKeyInfo){
                $rxClass    = "#(^@class:)(\s*)(.*$)#i";
                $rxSpan     = "#(^span:)(\s*)?(.+$)#si";
                preg_match($rxClass, $strKeyInfo, $matches);
                preg_match($rxSpan,  $strKeyInfo, $matches2);

                if($matches) {
                    list(, $key, $null, $val) = $matches;
                    $keyA   = str_replace("rtm_", "", $val);
                    if (!array_key_exists($keyA, $arrFileContent)) {
                        $arrFileContent[$keyA] = $val;
                    }
                }

                if($matches2) {
                    list(, $key2, $null, $val2) = $matches2;
                    $keyB   = $keyA ."Data";
                    if (!array_key_exists($keyB, $arrFileContent)) {
                        $arrFileContent[$keyB] = parseSpanValues($val2, str_replace("rtm_", "", $keyA));
                    }
                }
            }
        }
        return $arrFileContent;
    }

    function parseSpanValues($spanData, $prefix){
        $arrSpanData    = explode(", ",  preg_replace("#[\{\}\[\]\"\'\#\@]#", "", $spanData));
        $objSpanData    = new stdClass();
        $cleanVal       = "";

        if($prefix == "tags"){
            $cnt = 0;
            foreach($arrSpanData as $i=>$val){
                if(!stristr($val, ":")){
                    $cleanVal  .= ", " . $val ;
                    $cnt++;
                }
            }
            $arrSpanData[2] = $arrSpanData[2] . $cleanVal;
            array_splice($arrSpanData, 3, $cnt);
        }

        foreach($arrSpanData as $iKey=>&$spanVal){
            $arrSplit   = preg_split("#\:\s#", $cleanVal . $spanVal);
            $key        = "text";
            if($iKey == 0){
                $key    = "{$prefix}Text";
            }else if($iKey == 1){
                $key    = "{$prefix}TextClass";
            }else if($iKey == 2){
                $key    = "{$prefix}Value";
            }else if($iKey == 3){
                $key    = "{$prefix}ValueClass";
            }
            if(isset($arrSplit[1])){
                $objSpanData->$key  = $arrSplit[1];
            }
        }
        return $objSpanData;
    }
    /*************** END OF FUNCTIONS ***************/

    var_dump(parseFile($file));

    // PRODUCES SOMETHING LIKE:
    /*
    array (size=10)
      'due' => string 'rtm_due' (length=7)
      'dueData' =>
        object(stdClass)[1]
          public 'dueText' => string 'Due' (length=3)
          public 'dueTextClass' => string 'rtm_due_title' (length=13)
          public 'dueValue' => string 'Sat 16 Jul 16' (length=13)
          public 'dueValueClass' => string 'rtm_due_value' (length=13)
      'priority' => string 'rtm_priority' (length=12)
      'priorityData' =>
        object(stdClass)[2]
          public 'priorityText' => string 'Priority' (length=8)
          public 'priorityTextClass' => string 'rtm_priority_title' (length=18)
          public 'priorityValue' => string '1' (length=1)
          public 'priorityValueClass' => string 'rtm_priority_value' (length=18)
      'tags' => string 'rtm_tags' (length=8)
      'tagsData' =>
        object(stdClass)[3]
          public 'tagsText' => string 'Tags' (length=4)
          public 'tagsTextClass' => string 'rtm_tags_title' (length=14)
          public 'tagsValue' => string 'gcal-work, github, stack-overflow' (length=33)
          public 'text' => string 'rtm_tags_value' (length=14)
      'location' => string 'rtm_location' (length=12)
      'locationData' =>
        object(stdClass)[4]
          public 'locationText' => string 'Location' (length=8)
          public 'locationTextClass' => string 'rtm_location_title' (length=18)
          public 'locationValue' => string 'none' (length=4)
          public 'locationValueClass' => string 'rtm_location_value' (length=18)
      'list' => string 'rtm_list' (length=8)
      'listData' =>
        object(stdClass)[5]
          public 'listText' => string 'List' (length=4)
          public 'listTextClass' => string 'rtm_list_title' (length=14)
          public 'listValue' => string 'Work' (length=4)
          public 'listValueClass' => string 'rtm_list_value' (length=14)
    */
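
To see what the two patterns inside parseFile() capture on their own, here is a minimal sketch. The two sample lines are hypothetical (the actual contents of file.txt are not shown here), so treat them only as an assumption about the general "@class: ..." and "span: ..." shape the patterns expect.

```php
<?php
// HYPOTHETICAL SAMPLE LINES: THE REAL FILE FORMAT IS AN ASSUMPTION
$classLine = "@class: rtm_due";
$spanLine  = "span: Due";

// SAME PATTERNS AS IN parseFile(), WRITTEN WITH SINGLE QUOTES
$rxClass = '#(^@class:)(\s*)(.*$)#i';
$rxSpan  = '#(^span:)(\s*)?(.+$)#si';

preg_match($rxClass, $classLine, $m1);
preg_match($rxSpan,  $spanLine,  $m2);

// GROUP 3 HOLDS EVERYTHING AFTER THE LABEL AND THE WHITESPACE
$className = $m1[3];                               // "rtm_due"
$classKey  = str_replace("rtm_", "", $className);  // "due"
$spanValue = $m2[3];                               // "Due"

var_dump($className, $classKey, $spanValue);
```

Group 1 is the label, group 2 the whitespace after it, and group 3 the payload, which is why parseFile() unpacks each match with list(, $key, $null, $val).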

As it stands, if you want the due date from the resulting array (the dueData element), you can simply do something like this:

<?php
    $data          = parseFile($file);
    $dateDateValue = $data['dueData']->dueValue;

    var_dump($dateDateValue);  // PRODUCES: 'Sat 16 Jul 16'
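
If you need the value as an actual date rather than a string, one possible follow-up is to hand it to DateTime::createFromFormat(). This is only a sketch: the format string 'D d M y' is an assumption based on the single sample value 'Sat 16 Jul 16' shown above, and other entries in your file may be formatted differently.

```php
<?php
// VALUE AS SHOWN IN THE var_dump() OUTPUT ABOVE
$dueValue = 'Sat 16 Jul 16';

// ASSUMED FORMAT: D = DAY NAME, d = DAY OF MONTH, M = MONTH NAME, y = 2-DIGIT YEAR
// THE LEADING "!" RESETS ALL UNPARSED FIELDS (E.G. THE TIME) TO ZERO
$dueDate = DateTime::createFromFormat('!D d M y', $dueValue);

if ($dueDate === false) {
    // THE STRING DID NOT MATCH THE ASSUMED FORMAT
    echo "Could not parse the date" . PHP_EOL;
} else {
    echo $dueDate->format('Y-m-d') . PHP_EOL;   // 2016-07-16
}
```

Checking for false matters here because createFromFormat() returns false instead of throwing when the string does not fit the format.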

Hopefully this gives you at least a rough idea of how to adapt the approach to your own needs.

Cheers & Good Luck!!!

