quick note before i start: as mentioned on our social media pages, we've decided to change the (official)
rate of these blog posts to one every two weeks. this is mainly because i feel that i'm not able to
accomplish enough work within a single week to warrant a blog post, in addition to the fact that we've
already been missing the weekly deadline for the past two weeks.
anyways, onto the actual post!
almost the instant after posting the previous blog post, i posted a follow-up on my personal social media
pages, stating that i had made a miscalculation in the design of the filesystem. unfortunately, this
miscalculation was a rather large one, and as such, i've had to spend the past two weeks reworking the
parts of the filesystem that were affected by it.
as you may have already seen, the miscalculation in question was regarding the design of direct and indirect
block pointers in the inodes. originally, i had decided on a design of either 12 direct blocks, or 9 direct
blocks and 3 indirect blocks. however, after a very short amount of running the numbers through a
calculator (which should've been done way earlier), i realised that this design would (assuming a relatively
average block size of 512 bytes) limit the maximum file size to less than a megabyte. this is obviously
undesirable, and as such, i decided to rework things to be a bit more future-proof.
the new design is a bit more complex, in that it essentially allows for an unlimited number of direct blocks
and indirect blocks. the way that this works is that each indirect block can specify (for each pointer)
whether the pointer is an indirect block or a direct block. this means that the maximum file size is now
limited by the maximum number of indirect datablocks, which theroetically could be up to around 2^64 blocks,
although reaching this limit would require an incredibly large amount of storage space.
one potential issue with this design is that it requires a lot of work to index the blocks of a file. say
you wanted to access the second block of a file, you would have to read the first indirect block, then
depending on the pointer type of the first pointer listed, either skip it (if its a datablock), or read
that datablock just to figure out if the second pointer is stored in that datablock or the next one. this
also would mean that you'd need to essentially do this forever until you reach the datablock you want,
which is a big performance hit. however, i've decided to add one extra feature to the pointer listing to
account for this.
each pointer is accompanied by a u8 value, which is either 0 or 1. depending on that value, the pointer
either points to a datablock or an indirect block. importantly, immediately after that u8 comes a 56-bit
integer (thus in total, flags + u56 is a u64), which contains absolute number of datablocks that this indirect block points to (undefined if its
not an indirect pointer). this means that, if you want to access the second block of a file, you can
simply read the first indirect block, and then skip the first pointer if you know that the second block is
not stored in that indirect block (i.e. if the u56 value is less than or equal to 1). this should hopefully
drastically improve performance, while keeping the design relatively simple. (either that or i'll regret my
confidence in however long until i can test it).
in an ideal implementation, blocks are layed out by attempting to create a tree of
indirect blocks that is as balanced as possible. this means that, if you have a file that is 5 blocks long
(assuming a block size that can contain only two blocks for the purpose of this example), the blocks will
be layed out like this:
please note my uncertainty when i say this, but i'm pretty sure that this is how the implementation
i've written works. in practice, it may actually add a useless indirect block in between the root and datablock 5,
but i'm not sure. i'll have to look over the code again (and probably rewrite it to be better).
overall, this is pretty much all the work i've done over the past two weeks. trying to get the idea of this
into my head was somewhat tricky at first, and as such that took me quite a bit of the implementation time.
notably, a big issue with this implementation is that files can no longer easily be non-entirely updated via
a journaled multiblock write. this is because of the complexity of this structure, and i may end up either
changing the way the multiblock write works or adding another type of write to account for this.
all changes to the source code should be public by the time this is up, as always you can view it at
our git.
that should be it for this week's blog post! i'm hoping that i'll be able to get back on track with the
development of the filesystem, and hopefully by next time i'll have gotten the B-Tree implementation for
directories working! (i'm not sure if i'll actually be able to get it working by then, but i'll try my best!)
donations to our libera pay are of course appreciated as always (:
besides that, i'll see you all in two weeks!