At long last and by popular request (not by Wiki regulars of course – they want the raw Wiki power live, not a zipped version of it), the Offline Wiki has been set up. Thanks to the BeyondUnreal staff, especially MalHavoc and QAPete, for supporting it.
Post here if you'd like to discuss this feature, or if you have any problems using it. If you find anything that looks like a bug to you, we'd be happy to hear about it as well.
Discussion
Mychaeel: Phew. Took long enough, but now it's there. Will make a press release later today.
Jan: Cool. ;) Do you need an offline viewer?
Tarquin: Nice screenie. Pity it's rendering with IE :( - the blue sidebar should go all the way down in a real browser :p
Jan: This tool is only a side project, to view all my web documents offline. What I miss is a folder structure inside the wiki offline version.
Mychaeel: Nice. :-) I've flattened the folder structure to simplify the relative referencing of other pages, images, smileys and so on; the page names have to be rewritten anyway to accommodate various platforms' limitations (no more than 32 characters on the Mac, case-insensitivity on Windows). However, I could rather simply have the script create a .txt file containing a mapping table between actual page names and the generated file names – an offline reader could read that file and display the actual structure.
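A rough Python sketch of the mapping-table idea above. It is not the actual Offline Wiki script (which is written in Perl); the 31-character limit, the ".html" extension, the tab-separated layout and every function name are assumptions made for illustration only.

```python
import hashlib

MAX_LEN = 31  # assumed limit, chosen to stay under the Mac restriction mentioned above


def safe_file_name(page_name, taken, max_len=MAX_LEN, ext=".html"):
    """Return a lower-case, length-limited file name that is not yet in `taken`."""
    # Lower-case everything (Windows ignores case) and replace anything
    # that is not an ASCII letter or digit with a hyphen.
    stem = "".join(
        c if (c.isascii() and c.isalnum()) else "-" for c in page_name.lower()
    )
    stem = stem[: max_len - len(ext)]
    candidate = stem + ext
    if candidate in taken:
        # Two pages collapsed to the same name: append a short hash of the
        # original page name to tell them apart.
        digest = hashlib.md5(page_name.encode("utf-8")).hexdigest()[:6]
        stem = stem[: max_len - len(ext) - len(digest) - 1]
        candidate = f"{stem}-{digest}{ext}"
    taken.add(candidate)
    return candidate


def build_mapping(page_names):
    """Map each actual page name to its generated file name."""
    taken = set()
    return {name: safe_file_name(name, taken) for name in page_names}


def format_mapping(mapping):
    """Render the table as 'file name<TAB>page name' lines for a .txt file."""
    return "\n".join(f"{f}\t{page}" for page, f in sorted(mapping.items()))
```

An offline reader could parse such a file line by line and rebuild the original page hierarchy from the page-name column.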
El Muerte TDS: Too bad there's no MS HTML Help compiler for Linux, or else you could easily create HTML Help files. HTML Help files are very nice since they provide some very useful features like full text search. Project and glossary files are easy to create.
Mychaeel: Sunir got me thinking (over at Project Copyright/Discussion). Maybe we should make it very clear on Offline Wiki that the download is for private use only and not to be distributed commercially. (That's implied in Project Copyright already, to be sure, but we know people think if they don't read copyright statements they don't apply to them.)
Mychaeel: For some reason several files that only differ in the capitalization of their file name are in the "shared" directory (for instance, InterWiki-Wikipedia.png and InterWiki-Wikipedia.PNG). When extracting the archive under Windows that leads to unnecessary and confusing "Overwrite this file?" questions. Please delete one of the files, whichever is unnecessary.
ZxAnPhOrIaN: Agreed. We should adopt a standard of lower-case file extensions only (a.jpg, not a.JPG). It would solve that problem.
Mychaeel: Yeah... well. I take it it was an accident. Hope tarquin sees it. That's why I put that log message in that you overwrote by posting your comment.
Tarquin: Seen. It's because Photoshop puts CAPS extensions, and I forget to change them before uploading. The Perl script cares about case, but the eventual URL does not. ARG! But will fix.
Mychaeel: ...the server doesn't? http://wiki.beyondunreal.com/wiki-ext/wikilogo.jpg?link displays the Wiki logo, but http://wiki.beyondunreal.com/wiki-ext/WiKiLoGo.JpG?link yields a 404.
ZxAnPhOrIaN: I thought that you can't have mixed-case in file extensions.
Mychaeel: File extensions are just part of a file name separated from it by a dot, by convention. You can have any case you like in there like everywhere else.
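The clash described above (InterWiki-Wikipedia.png next to InterWiki-Wikipedia.PNG) could be caught before the archive is built. A minimal Python sketch, assuming the list of file names in the "shared" directory is already available; the function name is made up for this example.

```python
from collections import defaultdict


def find_case_collisions(file_names):
    """Return groups of names that differ only in capitalization."""
    groups = defaultdict(list)
    for name in file_names:
        # Names that are equal after lower-casing would overwrite each
        # other when extracted on a case-insensitive file system.
        groups[name.lower()].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


if __name__ == "__main__":
    names = ["InterWiki-Wikipedia.png", "InterWiki-Wikipedia.PNG", "wikilogo.jpg"]
    for group in find_case_collisions(names):
        print("Clash:", ", ".join(group))
```

Every reported group needs one survivor; which file to keep is still a manual decision.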
Chema: Hi people. Neat site you are running here. Was not until I found it that I felt like "mastering unreal". The offline wiki is another cute detail, especially for 56krs like me, or netecofreaks; like me ;-) Could I suggest using a solid compression algo, like RAR or Bzip2? Documents with repetitive text like headers and sidebars are the ideal meat for them. I just tried it: today's wiki (29,444,530 bytes) shrinks to 43% (12,706,692 bytes) when zipped; but it gets down to 27% (7,937,234 bytes) using solid RAR! That's almost 5 MB less. Even if you made the archive self extracting, it would be a lot lighter. Keep up the good work, and count me in, even if I spend most of my wiki time on the offline version!
Mychaeel: We're limited by the compression software that's available on the server. We have zip and bzip2 there, but no RAR compression utility (or I'm just bad at guessing names). If you could point me to one that runs under Linux, I'll ask the BeyondUnreal admins to install it.
El Muerte TDS: unrar is free, rar isn't.
El Muerte TDS: btw here are some stats:
  34226176 in total
  12728922 test.zip
   8134137 test.tar.bz2
   8966783 test.tar.gz
bzip2 generates a high system load, gzip doesn't. So I think using tar gzip is the best solution.
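For comparison with the percentages quoted earlier, the byte counts above can be turned into ratios. This is plain arithmetic on the figures posted in this thread; the helper name is only illustrative.

```python
TOTAL = 34226176  # uncompressed size in bytes, from the stats above
ARCHIVES = {
    "test.zip": 12728922,
    "test.tar.bz2": 8134137,
    "test.tar.gz": 8966783,
}


def ratios(total, archives):
    """Compressed size as a percentage of the original, one decimal place."""
    return {name: round(100 * size / total, 1) for name, size in archives.items()}


if __name__ == "__main__":
    for name, pct in ratios(TOTAL, ARCHIVES).items():
        print(f"{name}: {pct}% of the original size")
```

On these numbers zip comes out at roughly 37%, tar.bz2 at roughly 24% and tar.gz at roughly 26%, so gzip gives up about 0.8 MB to bzip2 in exchange for the lower system load.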
Mychaeel: Sounds like a good idea. Is .tgz widely enough supported by Windows archivers to allow us to completely scratch the .zip version in favor of a .tgz one?
Mysterial: WinZip and WinRAR support it and they're the two most widely used archivers.
Chema: Usually bandwidth is more precious ('myyyy preshiouuus' ;-) ) than some extra CPU cycles, but bzip2 is indeed not widely supported in Windows: just by WinRAR and, well, bzip2.exe, to my knowledge (WinZip barely provides tgz support – just decompression, because zip provides "similar functionality". haha).
Well, you could provide both formats, but that means even more cycles. But even better would be to have weekly snapshots, that would weigh a few hundred KBs in tgz. A simple "find . -ctime -7" (note the minus sign: changed within the last 7 days, not exactly 7 days ago) would do the trick; well, if your usual script is that simple too.
But I see there is still the zip pack. If I can help writing the stuff, just tell me so (with a brief, err, debriefing: sh or perl? target for cron?).
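The weekly-snapshot idea could be sketched roughly like this in Python. This is not the existing Perl script; it assumes the generated pages live in one directory tree, it uses the modification time (find's -ctime looks at the status-change time instead), and the function names are invented for the example.

```python
import os
import tarfile
import time


def changed_files(root, days=7, now=None):
    """Return paths (relative to `root`) modified within the last `days` days."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) >= cutoff:
                found.append(os.path.relpath(path, root))
    return sorted(found)


def write_snapshot(root, archive_path, days=7):
    """Pack the recently changed files into a gzip-compressed tar archive."""
    files = changed_files(root, days)
    with tarfile.open(archive_path, "w:gz") as tar:
        for rel in files:
            # Store paths relative to the wiki root so the snapshot can be
            # extracted on top of an older full download.
            tar.add(os.path.join(root, rel), arcname=rel)
    return files
```

A snapshot built this way only carries new and changed files; pages deleted since the last full download would stay behind on the user's disk.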
El Muerte TDS: I don't know if weekly snapshots are such a good idea, because it would require you to download the snapshot every week; you can't miss a single week or you will have to download the full snapshot again. And the offline wiki is only interesting for users without broadband.
Mychaeel: Adding support for RAR is a matter of (1) having a RAR compressor for Linux and (2) adding a single line of code to the current Offline Wiki (Perl) script. My concern is more for BeyondUnreal's bandwidth than the users' in this case; right now a single 13 MB-file is uploaded to all mirrors once a day, and each compression format we'd provide in addition to that would add to that load (multiplied by the number of mirrors, naturally).
Chema: Err, you got me wrong: I was talking about bzip2 format, not RAR. Anyway, yes, I think that having 2 formats is not elegant (not 'perlish' ;-) ), that's why I think the weekly snapshot is much better.
In response to El Muerte, you don't have to get the snapshot every week: I think they should be really tiny (100-500KB at most, in tgz? well, not counting when I upload my vacation pics!), and several of them could be stored on the Offline Wiki page. So you just need to check your last update, and get the newer snapshots.
El Muerte TDS: what about this: "snapshot on request". You can select the timeframe of the snapshot and then an archive will be created on the fly and fed in real time to your browser ;)
Mychaeel: Hmm... maybe that's not even so far off. Executing the Offline Wiki script takes several minutes to complete, but that's mostly due to several thousand pages that have to be read and formatted. Actually packing a subset of pages into a downloadable archive file takes much less time. – However, the problem remains that file downloads shouldn't be served directly from a BeyondUnreal-hosted site.
GRAF1K: Gehn, the offline wiki file size does change frequently due to constant edits. Thanks for the update.
Mychaeel: Offline Wiki is up and running again after it went down April 29, 2004, and nobody bothered telling me/us... according to hal from BeyondUnreal, the Perl script building the Offline Wiki has been throwing warning messages for quite a while, but they chose to ignore them rather than to tell us about them. :-/
Tarquin: I had no idea...
El Muerte: I've put up a read-only mirror of the UnrealWiki here; it uses this offline wiki for its content. Quite some things don't work (like search), and the page naming isn't one-to-one, but at least we'll have a fallback when BU goes offline again.
El Muerte: It's broken, the current archive is from November 5
Techno_JF: Just so it's clear, the offline wiki hasn't been updated in a while. I don't know if that's by accident or by design, but I think people should be aware of it either way. I just downloaded it, and the date on the pages said "Sat, November 5, 2005". For reference, today's date is February 21, 2006.
HC Denton: Is someone going to fix this? The offline wiki is a nice thing because it ensures you have access to all docs even if the wiki goes down temporarily.
FEID: I noticed the same problem: the offline wiki isn't working and I can't get the recent Added Pages!
fyfe: Almost a year now and the offline wiki is still offline. Is it dead for good, or is it just a glitch the higher-ups haven't noticed :p
Tarquin: It's down to BU admins. You can contact them yourself :)
feid: So you can do nothing at all? I mean, what's the point of this page if nothing is updated? It's pretty much useless when new information gets published. And what about when our precious wiki is gone? I want something to remember it by in the future.
Tarquin: Basically, no, I can't do anything. BU's server must be doing something wrong.
fyfe: Sent an e-mail to qapete, but I never got a response.
fyfe: Got an e-mail from qapete, it's now fixed :)
fyfe: But it doesn't look like it's hit the mirrors yet :(
fyfe: The mirrors have caught up now :D
Meindratheal: When will this next get updated? I (and a lot of others, I guess) could do with a more up-to-date version so I can read up on everything at home :)
Blocking offline readers
fyfe: Not sure if you've got anything like this implemented, but if you wanted to block the casual user from strip mining the wiki with an offline reader you can add the following to .htaccess
  RewriteEngine on
  # Send offline readers to offlinewiki.html
  RewriteCond %{HTTP_USER_AGENT} (Wget|CherryPickerSE|CherryPickerElite|ExtractorPro|WebStripper|WebCopier|WebZIP)
  RewriteRule ^.*$ offlinewiki.html [L]
then create a file called /offlinewiki.html (don't put any links in it)
  <html>
  <head>
  <title>Never use an offline reader to download the entire Unreal Wiki.</title>
  <meta name="robots" content="noindex,nofollow">
  </head>
  <body style="color:#000000;background-color:#FFFFFF;" bgcolor="#FFFFFF" color="#000000">
  <p><strong>Never use an offline reader to download the entire Unreal Wiki. If you do, we will ban you instantly and permanently.</strong> Offline readers put an insane stress on the web server because they follow each and every one of the processing-intensive maintenance links on the Wiki that are seldom used by real people. Don't say you haven't been warned.</p>
  <p>We're providing the entire Wiki content as a convenient single download on Offline Wiki [<span style="color:#FF0000;">http://wiki.beyondunreal.com/wiki/Offline_Wiki</span>] though. It's smaller, zipped and downloads faster than what any offline reader could give you. Of course you can also save individual pages using your web browser without problems.</p>
  </body>
  </html>
The same could be done for bad robots that ignore /robots.txt; just change the RewriteRule to
  # Send a "403 Forbidden" response
  RewriteRule ^.*$ - [F]
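To see which clients the RewriteCond pattern above would catch, the same alternation can be tried in Python. This is only an approximation: mod_rewrite uses its own regex engine, although for a plain alternation like this one the behaviour should match. The sample user-agent strings are made up for the test.

```python
import re

# Same pattern as in the RewriteCond above; unanchored and case-sensitive,
# as it is in mod_rewrite unless the [NC] flag is added.
BLOCKED = re.compile(
    r"(Wget|CherryPickerSE|CherryPickerElite|ExtractorPro|WebStripper|WebCopier|WebZIP)"
)


def is_offline_reader(user_agent):
    """True if the User-Agent header contains one of the listed names."""
    return BLOCKED.search(user_agent) is not None


if __name__ == "__main__":
    for ua in ["Wget/1.21", "wget/1.21", "Mozilla/5.0 (X11; Linux x86_64)"]:
        print(ua, "->", "blocked" if is_offline_reader(ua) else "allowed")
```

Because the match is case-sensitive, a client announcing itself as "wget" in lower case would slip through; adding [NC] to the RewriteCond would close that gap. Either way this only stops casual users, since the User-Agent header is trivial to change.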
Download Links
Wormbo: Apparently the FileFront mirror linked on BU FileWorks points to the wrong file. Q: How long until the download comes out?