Using multiple disks for DataDir and PubDir
Motivation

A TWiki site may reach a point where a single disk drive cannot house all files. Having PubDir on a different disk from the others doesn't help much because, on a large site, PubDir takes the most capacity by far. In that case, there is no option but to use multiple disks for PubDir.

It's possible to have a single multi-terabyte file system. But that doesn't mean it's practical to use a file system of such scale in your environment. You may need to use storage provided by a team you don't have control over, and the size limit might be 1T or 500G bytes. For reasons described later, it's not possible to utilize additional disks simply through symbolic links without enhancing TWiki code.
Overview

Using multiple disks means having multiple directories for $TWiki::cfg{DataDir} and $TWiki::cfg{PubDir}. There also needs to be a mechanism to specify which disk each web is housed on.

As you see below, utilizing multiple disks is not trivial. You need to cope with various implications. Because of that, even with this enhancement, it's better to seek ways to have a single large disk space. NoSQL databases may provide good alternatives.
Things to know

How to enable

Setting $TWiki::cfg{MultipleDisks} to a true value enables this feature.
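As a minimal sketch, the setting could look like the fragment below. The file it goes in is an assumption (TWiki sites commonly keep local settings in lib/LocalSite.cfg); any true Perl value would do.

```perl
# Assumed location: lib/LocalSite.cfg (adjust to wherever your site keeps its settings).
# Any true value enables the multiple disks feature.
$TWiki::cfg{MultipleDisks} = 1;
```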
On which disk a web resides

This feature requires that the metadata repository is in use, because a mechanism to specify which web resides on which disk is required. Since this feature is for a large site having thousands of webs, it's not practical to use a topic for specifying disks.
Obviously, the webs metadata file needs to be populated properly.
Specifically, each web record needs to specify the ID of the disk on which the web resides. The disk whose ID is "" (a zero width string) is required. When you add a disk, its ID needs to be the current largest ID plus 1. You must not skip a number. For example, a TWiki having three disks has the disk IDs "", 1, and 2.
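The numbering rule can be sketched as a small check. The function name disk_ids_are_valid is hypothetical and not part of TWiki; the sketch only encodes the rules stated above (the "" disk is required, and numeric IDs run 1, 2, 3, ... without gaps).

```perl
use strict;
use warnings;

# Hypothetical helper, not part of TWiki: checks a list of disk IDs
# against the numbering rule. Returns 1 if valid, 0 otherwise.
sub disk_ids_are_valid {
    my @ids = @_;

    # The disk whose ID is "" (a zero width string) is required.
    return 0 unless grep { $_ eq '' } @ids;

    # Only IDs made of digits are checked by this sketch.
    my @numeric = sort { $a <=> $b } grep { /^[0-9]+$/ } @ids;

    # Numeric IDs must be exactly 1, 2, 3, ... with no number skipped.
    for my $i ( 0 .. $#numeric ) {
        return 0 unless $numeric[$i] == $i + 1;
    }
    return 1;
}

print disk_ids_are_valid( '', 1, 2 ) ? "valid\n" : "invalid\n";
print disk_ids_are_valid( '', 1, 3 ) ? "valid\n" : "invalid\n";
```

The first call passes because the IDs are contiguous; the second fails because 2 is skipped.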
Which disk corresponds to which path

Then, you need to have the data and pub directories specified for each disk. There are two ways to do it. One is by the sites metadata in MetadataRepository and the other is by $TWiki::cfg{DataDir}, $TWiki::cfg{PubDir}, $TWiki::cfg{DataDir1}, $TWiki::cfg{PubDir1}, ...
Sites metadata

If you have the sites metadata in the MetadataRepository, each site is supposed to have the datadir, pubdir, datadir1, pubdir1, ... fields specifying the paths of the data and pub directories on each disk.

If you have the sites metadata and $TWiki::cfg{MultipleDisks} is true, $TWiki::cfg{DataDir} and $TWiki::cfg{PubDir} are not used. You should set them anyway, in case the TWiki core has bugs regarding them or plug-ins refer to them.
$TWiki::cfg{...}

If you don't have the sites metadata in MetadataRepository, you are supposed to have $TWiki::cfg{DataDir}, $TWiki::cfg{PubDir}, $TWiki::cfg{DataDir1}, $TWiki::cfg{PubDir1}, ... set.
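The naming scheme means the configuration key for a disk is the base name with the disk ID appended. A minimal sketch of that lookup follows; dirs_for_disk is a hypothetical helper, and the plain %cfg hash stands in for %TWiki::cfg. It does not show how the TWiki core resolves paths internally.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of TWiki: returns the data and pub
# directories configured for the given disk ID.
sub dirs_for_disk {
    my ( $cfg, $disk_id ) = @_;

    # Disk "" maps to DataDir/PubDir, disk 1 to DataDir1/PubDir1, and so on.
    my $data = $cfg->{"DataDir$disk_id"};
    my $pub  = $cfg->{"PubDir$disk_id"};

    die "No directories configured for disk '$disk_id'\n"
      unless defined $data && defined $pub;
    return ( $data, $pub );
}

# Stand-in for %TWiki::cfg, using illustrative paths.
my %cfg = (
    DataDir  => '/disk0/data',
    PubDir   => '/disk0/pub',
    DataDir1 => '/disk1/data',
    PubDir1  => '/disk1/pub',
);

my ( $data, $pub ) = dirs_for_disk( \%cfg, 1 );
print "$data $pub\n";
```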
Non numeric disk IDs

Actually, non numeric disk IDs are allowed on purpose. You can have disk IDs such as a, b, and c for read-only webs.
TWiki::getDiskList() doesn't return non numeric disk IDs except the zero width string one ("").
Non numeric disk IDs may be handy for migration to a newer version of TWiki. The new version can include webs that have not yet been migrated and show them as read-only, without disturbing them.
TrashxNx

Each disk needs to have its own Trash, which is named TrashxNx, e.g. Trashx1x, Trashx2x, ... For topic, attachment, and web deletion, the proper TrashxNx needs to be selected automatically. %TRASHWEB% is no longer constant; it is expanded to the proper trash web name depending on the current web.

Every time you add a new disk to TWiki, you need to register and create the trash web of the disk.

You may think Trash1, Trash2, ... are more straightforward and desirable than Trashx1x, Trashx2x, ... The reason for the naming convention is that the Trash web might be aged: every week or every day, Trash10 is deleted, ..., Trash1 is renamed to Trash2, Trash is renamed to Trash1, and a new Trash web is created. The names of the aged Trash webs would clash with those of the trashes for extra disks.
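The naming convention can be sketched as follows. trash_web_for_disk is a hypothetical helper, not the TWiki core routine behind %TRASHWEB%, and it assumes that the disk whose ID is "" keeps the plain Trash web.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of TWiki: maps a disk ID to the name
# of the trash web on that disk.
sub trash_web_for_disk {
    my ($disk_id) = @_;

    # Assumption: the disk whose ID is "" keeps the plain Trash web.
    return 'Trash' if $disk_id eq '';

    # Other disks use TrashxNx, which cannot clash with aged trash
    # webs such as Trash1, Trash2, ...
    return "Trashx${disk_id}x";
}

print trash_web_for_disk($_), "\n" for ( '', 1, 2 );
```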
Attachment URLs

They must be stable

Using multiple disks and exposing them in URLs are two different matters. Topic URLs are not affected by the disk on which topics reside. But attachment URLs may be affected if attachment retrieval is handled directly by the web server without TWiki being involved.

Attachment URLs must not be affected by the disks they are housed on, because otherwise an attachment's URL will change when the topic is moved to another disk. For example, if an attachment's URL is /pub1/FooWeb/BarTopic/file.png, it will become /pub2/FooWeb/BarTopic/file.png when FooWeb moves to disk 2. This is inconvenient. If %ATTACHURL% or %PUBURL% is used to refer to the attachment and they are expanded to different paths depending on which disk the topic resides on, it keeps working. Still, it is a bad idea because:
How to achieve it

To achieve attachment URL stability, you need to do either of the following:
Other implications

So far, the things you need to know to use multiple disks have been discussed. You don't have to know the following to set it up, but it is still good to know.
%WEBLIST{...}%

With %WEBLIST{...}%, the "canmoveto" filter (introduced by ReadOnlyAndMirrorWebs) of the webs parameter eliminates more webs: it also eliminates webs residing on a disk different from that of the current web.
Example

Metadata repository's webs file:
Non-federated site

If it's on a stand-alone, non-federated site:

$TWiki::cfg{DataDir} = '/disk0/data';
$TWiki::cfg{PubDir} = '/disk0/pub';
$TWiki::cfg{DataDir1} = '/disk1/data';
$TWiki::cfg{PubDir1} = '/disk1/pub';
$TWiki::cfg{DataDir2} = '/disk2/data';
$TWiki::cfg{PubDir2} = '/disk2/pub';
Federated sites

If it's on federated sites, the metadata repository's sites file would be:
Why enhancement is required
Having additional disks and putting symbolic links under PubDir for some webs to off-load the primary disk doesn't work.
This is because when a topic is deleted,

For considerations on a symbolic link based implementation, please read TWiki:Codev/UsingMultipleDisks#Considerations_on_symbolic_link

Related Topics: AdminDocumentationCategory, MetadataRepository, ReadOnlyAndMirrorWebs