WikiFeed: Wikipedia's Latest Headlines via RSS

17 October 2005 • article | PHP • PermaLink

I recently discovered Wikipedia’s Current Events page, and find it a nice source of international news headlines. Too bad they don’t have an RSS feed, I thought.

A bit of PHP and “screen scraping”, and I came up with wikifeed.rss.

Feel free to subscribe to it yourselves (don’t be abusive, or I’ll have to take it down). Or, better yet, install your own version with the source file, wikifeed.php.

<?php
define('NL', "\n");

$root_url = 'http://en.wikipedia.org';
$source_url = $root_url . '/wiki/Current_events';

$source = file_get_contents($source_url);

$build_date = date('r');

header('Content-type: application/rss+xml');

echo <<< EOB
<?xml version="1.0"?><rss version="2.0">
<channel><title>WikiFeed: Current Events</title>
<link>{$source_url}</link>
<description>Current events, from Wikipedia, the free encyclopedia</description>
<language>en-us</language>
<lastBuildDate>{$build_date}</lastBuildDate>
<copyright>http://www.gnu.org/copyleft/fdl.html</copyright>
<generator>Colin Viebrock</generator>
EOB;

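for ($i=0; $i<3; $i++) {

  // Midnight of the day $i days ago; mktime() normalises out-of-range days.
  $now = mktime(0,0,0,date('m'),date('d')-$i,date('Y'));

  // Anchor of that day's heading, e.g. 17_October_2005_.28Monday.29
  $key = date('j_F_Y_.28l.29', $now);
  $guid = date('Ymd', $now);

  // Find the first <ul>...</ul> after the day's anchor.
  $pos = strpos($source, '/w/index.php?title=Current_events');
  $pos = strpos($source, $key, $pos);
  $start = strpos($source,'<ul>', $pos);
  $end = strpos($source,'</ul>', $start);

  // Skip the 4 characters of '<ul>' and stop just before '</ul>'.
  $data = trim(substr($source,$start+4,$end-$start-4));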

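  if (preg_match_all('/<li>(.*?)<\/li>/', $data, $matches)) {

    // Items are numbered from the count down to 1 for the GUID.
    $j = count($matches[1]);
    foreach($matches[1] as $match) {

      $clean = strip_tags($match);
      // Make relative wiki links absolute.
      $relinked = str_replace('href="/wiki', 'href="' . $root_url . '/wiki', $match);
      // Cut the title at the first space at or after character 50, if any.
      $pos = (strlen($clean)>50) ? strpos($clean, ' ', 50) : false;
      if ($pos === false) {
        $pos = strlen($clean);
      }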

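      echo '<item>' . NL;
      // Escape only the XML-special characters; named HTML entities such as
      // &eacute; are not defined in XML.
      echo '<title>' . htmlspecialchars(substr($clean,0,$pos)) . ' ...</title>' . NL;
      echo '<description>' . htmlspecialchars($relinked) . '</description>' . NL;
      // Link to the day's anchor on the Current events page.
      echo '<link>' . $source_url . '#' . $key . '</link>' . NL;
      echo '<guid isPermaLink="false">' . $guid . '-' . $j . '</guid>' . NL;
      echo '</item>' . NL;

      $j--;

    }
  }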

  echo NL;

}

echo NL . '</channel></rss>';
?>
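
The two fiddliest pieces of the script are the per-day anchor key and the title cut-off. As a rough sketch, they could be pulled out into small functions and tried on their own. The names `wikifeed_day_key()` and `wikifeed_title()` are made up here for illustration and are not part of wikifeed.php.

```php
<?php
// Hypothetical helpers (not in wikifeed.php), shown only to illustrate
// the two steps in isolation.

// Build the anchor used for a day heading, e.g. for 17 October 2005.
// ".28" and ".29" are the encoded parentheses around the weekday name.
function wikifeed_day_key($timestamp) {
    return date('j_F_Y_.28l.29', $timestamp);
}

// Cut $text at the first space at or after $limit characters.
// Falls back to the whole string when it is short or has no later space.
function wikifeed_title($text, $limit = 50) {
    $pos = (strlen($text) > $limit) ? strpos($text, ' ', $limit) : false;
    if ($pos === false) {
        $pos = strlen($text);
    }
    return substr($text, 0, $pos) . ' ...';
}

// Example use:
echo wikifeed_day_key(mktime(0, 0, 0, 10, 17, 2005)), "\n";
echo wikifeed_title('Short headline'), "\n";
```

Keeping the fallback inside the helper means a long headline with no space after the cut-off point still gets a title instead of an empty one.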

Comments and improvements welcome.
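
One improvement that comes to mind, given the request not to be abusive: keep a local copy of the fetched page and only fetch it again when the copy is stale. The sketch below is an assumption about how that could look, not something wikifeed.php does. The function name `cached_fetch()`, the cache path and the 15-minute lifetime are all made up for illustration.

```php
<?php
// Hypothetical helper (not in wikifeed.php): return the contents of $url,
// re-using a local copy if it is younger than $max_age seconds.
function cached_fetch($url, $cache_file, $max_age = 900) {
    if (is_file($cache_file) && (time() - filemtime($cache_file)) < $max_age) {
        return file_get_contents($cache_file);
    }
    $data = file_get_contents($url);
    if ($data === false) {
        // Fetch failed: fall back to a stale copy if there is one.
        return is_file($cache_file) ? file_get_contents($cache_file) : false;
    }
    file_put_contents($cache_file, $data);
    return $data;
}

// In the script, the fetch line might then read (path is an assumption):
// $source = cached_fetch($source_url, '/tmp/wikifeed.html');
```

With something like this in place, Wikipedia would be hit at most once per cache lifetime, however many readers poll the feed.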

Comments

  1. Just a note that my source-code plugin for Textpattern is making the indentation a bit wonky.
    Colin Viebrock
    17 October 2005, 16:08 • PermaLink