[Content contribution] Several situations in which hard disk usage does not match file size
Super Moderator
2023-12-01 00:19

Covers both Unix-like systems and Windows, to help you find "missing" hard drive space.


  1. Cluster allocation

The smallest unit of file storage on a hard disk is not a byte but a run of contiguous bytes called a cluster. A file's data is stored in one or more clusters. If the last part of the file does not fill a whole cluster, the rest of that cluster is wasted: another file cannot use it and must start in a different cluster.

The cluster size is determined by the file system's design and by the options chosen when the file system is created (formatted). Choosing a cluster size is a complicated trade-off that affects both disk space utilization and performance, and this article will not go into detail. If the cluster size is too large, files take up more disk space than they need, which hurts most when storing a large number of small files. What we usually call a file's size is the size of its content, while its disk usage is the total size of the clusters it occupies; the latter is usually the larger of the two. Some file managers show both, while others show only one.

You can create a file and compare the two with the following commands. The first reports disk usage and the second reports the apparent (content) size; if the file is small enough, the first shows exactly one cluster.

du -B 1 <file>

du --apparent-size -B 1 <file>
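For readers who prefer to check this from a script, here is a minimal Python sketch of the same comparison. `allocated_size` and `usage_of` are hypothetical helper names. The rounding assumes every cluster is used whole (it ignores sparse files, compression, and file systems that store tiny files inline), and `st_blocks` is only available on Unix-like systems, where it counts 512-byte units.

```python
import os


def allocated_size(apparent_size: int, cluster_size: int) -> int:
    """Space a file occupies if it is stored in whole clusters.

    Simplified model: ignores sparse files, compression and inline storage.
    """
    # Round up to a whole number of clusters using integer arithmetic.
    clusters = (apparent_size + cluster_size - 1) // cluster_size
    return clusters * cluster_size


def usage_of(path: str) -> tuple[int, int]:
    """Return (apparent size, disk usage) in bytes, like the two du commands.

    Unix-like systems only: st_blocks counts 512-byte units.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


# Assuming a 4 KiB cluster, a 1-byte file still occupies a full cluster:
print(allocated_size(1, 4096))     # 4096
print(allocated_size(4097, 4096))  # 8192
```

Calling `usage_of` on a real file gives the same two numbers as the `du` commands, so you can see how much of the disk usage is cluster slack.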

To change the cluster size, you usually have to recreate (reformat) the file system.

  2. Hidden files

Everyone is familiar with this one. In Unix, files whose names begin with a dot (.) are hidden. This comes from the fact that a single dot . denotes the current directory and two dots .. denote the parent directory. Some early programs wanted to hide these two special entries and did so with a shortcut: checking whether the first character of the name is a dot. That also hid every other file and directory whose name starts with a dot, and the accident was later kept as a convention.

In most file managers, Ctrl+H toggles whether hidden files are shown. Many graphical applications also create hidden files or directories for their configuration or temporary files; the currently recommended practice, however, is to put them under the .local or .config directory (otherwise the home directory gets cluttered).

In Windows, whether a file is hidden has nothing to do with its name: any file can be made hidden by setting its hidden attribute, which is stored by Windows file systems such as NTFS. As a result, when Windows reads a Unix-like system's file system, or vice versa, the other side's hidden files become visible.

On GNU/Linux systems, some applications keep cached data in hidden directories (for example ~/.cache), so cleaning them up periodically can free some disk space. On Windows, application data is usually stored in the folder given by the %AppData% variable (though some applications that install without administrator privileges also use this path as their installation directory).

If you want to delete these hidden files, check carefully what they contain first. Deleting some of them may make an application lose its configuration; although such files can be regenerated, the configuration itself is a valuable asset for many users.

GNU/Linux has many intuitive disk usage analyzers, such as GNOME's baobab (Disk Usage Analyzer), that show which directories take up the most space.
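If you would rather script it, the following Python sketch approximates what such tools do for the hidden entries in a home directory. The function names are hypothetical. Sizes come from `st_blocks`, so it only works on Unix-like systems, and hard-linked files are counted once per link, so totals may differ slightly from `du`.

```python
import os
from pathlib import Path


def tree_usage(path: Path) -> int:
    """Approximate disk usage in bytes of a file or directory tree.

    Symlinks are not followed; hard-linked files are counted once per link.
    """
    total = os.lstat(path).st_blocks * 512
    if path.is_dir() and not path.is_symlink():
        # os.walk does not descend into symlinked directories by default.
        for root, dirs, files in os.walk(path):
            for name in dirs + files:
                try:
                    total += os.lstat(os.path.join(root, name)).st_blocks * 512
                except OSError:
                    pass  # skip entries that vanished or cannot be read
    return total


def hidden_usage(home: Path) -> list[tuple[str, int]]:
    """Disk usage of each top-level hidden (dot) entry in `home`, largest first."""
    results = []
    for entry in home.iterdir():
        if entry.name.startswith("."):
            try:
                results.append((entry.name, tree_usage(entry)))
            except OSError:
                continue  # e.g. permission denied
    results.sort(key=lambda item: item[1], reverse=True)
    return results
```

For example, `hidden_usage(Path.home())[:10]` returns the ten largest hidden entries; as noted above, check what a directory contains before deleting it.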

  3. Virtual memory

This confusion is more common on Windows. By default, Windows automatically reserves some space on the system drive as virtual memory, and users can change the settings to choose how much space on which partition is used for it. This naturally takes up disk space: even if you delete all your files, the drive will still show some space in use.

Besides virtual memory, enabling Windows hibernation also takes up disk space (no more than the size of your RAM). To turn hibernation on or off, run cmd or PowerShell as administrator and enter one of the following commands. You can also remove the hibernation data with the built-in Disk Cleanup by choosing "Clean up system files", but once hibernation has been turned off, it can only be turned back on with the command.

powercfg -H ON

powercfg -H OFF
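On Windows the virtual memory and hibernation data live in files at the root of the system drive, normally pagefile.sys and hiberfil.sys. Here is a minimal Python sketch for checking how much space they take. The function name is hypothetical, and the default path C:\ assumes Windows is installed on drive C. It reads sizes through `os.scandir`, because on Windows `DirEntry.stat()` reuses information from the directory listing instead of opening these system-locked files.

```python
import os

# Usual names of the Windows virtual memory and hibernation files.
DEFAULT_NAMES = ("pagefile.sys", "hiberfil.sys")


def system_file_sizes(root: str = "C:\\",
                      names: tuple[str, ...] = DEFAULT_NAMES) -> dict[str, int]:
    """Return {file name: size in bytes} for the given files directly under root."""
    wanted = {name.lower() for name in names}  # Windows names are case-insensitive
    sizes = {}
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.name.lower() in wanted:
                # On Windows this size comes from the directory listing,
                # so the locked file itself is never opened.
                sizes[entry.name] = entry.stat(follow_symlinks=False).st_size
    return sizes
```

Running `print(system_file_sizes())` before and after `powercfg -H OFF` should show hiberfil.sys disappearing from the result.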

On Unix-like systems the equivalent is called swap (swap space), which can be provided by a swap partition or a swap file. Swap also holds the hibernation image, so no separate hibernation file is needed. Because swap is usually sized by users themselves, and swap partitions are the more common choice, freeing disk space by shrinking virtual memory is usually only relevant on Windows.

On Unix-like systems you can change the mount configuration, including swap, by editing /etc/fstab. Some graphical partition editors, such as GNOME Disks, can also help you make these changes.
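On Linux, the swap areas currently in use (partitions or files) are listed in /proc/swaps, with sizes in KiB, which makes it easy to see how much disk space swap is taking. The following Python sketch parses that table. The function name and dictionary keys are hypothetical, and the parser assumes the usual whitespace-separated layout with a header row.

```python
import os


def parse_proc_swaps(text: str) -> list[dict]:
    """Parse the contents of /proc/swaps (Linux). Sizes are in KiB."""
    swaps = []
    for line in text.splitlines()[1:]:  # the first line is the column header
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        filename, kind, size, used, priority = fields[:5]
        swaps.append({
            "filename": filename,
            "type": kind,            # e.g. "partition" or "file"
            "size_kib": int(size),
            "used_kib": int(used),
            "priority": int(priority),
        })
    return swaps


if __name__ == "__main__":
    if os.path.exists("/proc/swaps"):
        with open("/proc/swaps") as f:
            for swap in parse_proc_swaps(f.read()):
                print(swap["filename"], swap["type"], swap["size_kib"] // 1024, "MiB")
```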

  4. Lost files

File system or hardware errors can sometimes damage files. In one case, a file's index entry is lost while its data remains on disk. Such files no longer appear in any file manager, yet they still occupy disk space. Short of recreating (reformatting) the file system, the way to recover them is a file system check tool.

On Unix-like systems, the fsck command checks and repairs file system problems. Recovered files are placed in a directory named lost+found. This is not an ordinary directory in the usual sense but closer to a feature of the file system itself; if it is deleted, it has to be recreated with the dedicated command mklost+found.

For Microsoft's NTFS and exFAT file systems, the tools on GNU/Linux may not handle every situation. If Windows is installed, you can repair the file system with its chkdsk command; recovered files are placed in a FOUND.000 directory. On some versions of Windows, however, this directory does not appear in File Explorer even when hidden files are shown. You can list and manipulate these files from the command line instead; for example, this command lists hidden files:

dir /A:H <directory>

Alternatively, the built-in Disk Cleanup offers an option to delete old chkdsk files.
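To see how much space recovered files take up, here is a minimal Python sketch that looks for lost+found and FOUND.000-style directories directly under a mount point or drive root and totals the apparent size of the files inside. The function name is hypothetical, and it assumes chkdsk numbers its directories FOUND.000, FOUND.001, and so on. Reading lost+found usually requires root (and reading FOUND.nnn may require an elevated prompt); unreadable subdirectories are silently skipped by `os.walk`, so a total can be understated.

```python
import os
import re

# Recovery directory names: "lost+found" (fsck on Unix-like systems)
# and "FOUND.000", "FOUND.001", ... (chkdsk on Windows; assumed numbering).
RECOVERED_NAME = re.compile(r"(?:lost\+found|FOUND\.\d{3})", re.IGNORECASE)


def recovered_dirs(root: str) -> dict[str, int]:
    """Map each recovery directory directly under root to the total size of its files."""
    found = {}
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False) and RECOVERED_NAME.fullmatch(entry.name):
                total = 0
                # os.walk ignores directories it cannot read by default.
                for dirpath, _dirs, files in os.walk(entry.path):
                    for name in files:
                        try:
                            total += os.lstat(os.path.join(dirpath, name)).st_size
                        except OSError:
                            pass
                found[entry.name] = total
    return found
```

For example, `recovered_dirs("/")` on Linux or `recovered_dirs("D:\\")` on Windows reports whether a check has left recovered data behind.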

Note that file system checks are relatively risky. Back up important data beforehand and make sure you have enough time and a stable power supply (on some drives the check can run for several hours); an immature repair tool may even make the situation worse. Read the documentation carefully and proceed with caution before running these commands.

Power outages, abnormal shutdowns, and unplugging a storage device without safely ejecting it can all cause file loss. (Modern systems generally use a write cache, so after clicking "safely eject" for a USB flash drive, wait until the system confirms it can be removed before unplugging it.) Because file systems are designed differently, their ability to recover files also varies: some check for problems automatically, while others require you to run repair commands manually to get the disk space back.

  5. Unit conversion

Although this is common knowledge, it still causes a lot of confusion, and the "missing" space is often a false alarm. Two systems of units are in use: a binary one, where 1 KB = 2^10 B = 1024 B, and a decimal one, where 1 KB = 10^3 B = 1000 B. To tell them apart, the binary units have been renamed KiB, MiB, GiB, and so on, but the two are still commonly mixed up. When a program reports disk space, first check whether it uses 1024 or 1000 as its base, especially since 1 TiB is nearly 100 GB more than 1 TB; a discrepancy that large cannot be ignored.
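The arithmetic behind that gap is easy to check. The following Python sketch (the constant and function names are just for illustration) computes the difference between 1 TiB and 1 TB and shows why a drive sold as 1 TB appears as about 931 GiB in tools that use 1024-based units.

```python
# Decimal (SI) units, as used on drive packaging.
GB = 10**9
TB = 10**12
# Binary (IEC) units, as used by many operating system tools.
KiB, MiB, GiB, TiB = 2**10, 2**20, 2**30, 2**40


def as_binary_units(n_bytes: int) -> str:
    """Format a byte count with 1024-based units (KiB, MiB, GiB, TiB)."""
    for unit, factor in (("TiB", TiB), ("GiB", GiB), ("MiB", MiB), ("KiB", KiB)):
        if n_bytes >= factor:
            return f"{n_bytes / factor:.2f} {unit}"
    return f"{n_bytes} B"


gap = TiB - TB
print(gap)                   # 99511627776
print(f"{gap / GB:.1f} GB")  # 99.5 GB
print(as_binary_units(TB))   # 931.32 GiB
```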

All Replies
2024-02-27 18:51

Thank you for sharing this informative article about the correspondence of clusters in file storage. It is interesting to know that the smallest unit of file storage on a hard disk is not a byte but a cluster, consisting of a continuous sequence of bytes.
