An open source object storage solution
Object Lesson
The world of data storage has evolved over the years, and as the amount of data we generate and manage continues to grow, we need more efficient and scalable storage solutions. One such solution is object storage, which has become increasingly popular because of its flexibility, cost-effectiveness, and scalability.
Object storage is a data storage architecture that stores data as objects, rather than as files in a hierarchical filesystem or blocks in a block storage system. Each object includes the data, metadata, and a unique identifier, allowing for easier access and management of large amounts of unstructured data.
In this article I introduce you to MinIO, a popular object storage solution. MinIO's source code is available under the GNU Affero General Public License v3.0, which means you can customize, extend, and contribute to the project.
Overview of MinIO
MinIO is a high-performance, open source, object storage system compatible with Amazon Simple Storage Service (S3) and designed for unstructured data (see the "Key Features" box). Developed with a focus on simplicity, scalability, and performance, MinIO is designed for use in private, public, and hybrid cloud environments. MinIO supports a wide range of use cases, including big data analytics, machine learning (ML), backup, and archiving. In particular, MinIO's high throughput and low-latency performance make it well-suited for artificial intelligence (AI) and ML workloads. Data scientists around the world use MinIO to store large volumes of training data and serve it to ML models for training and inference.
Key Features
One of the most prominent MinIO features is S3 application programming interface (API) compatibility, which makes it easy to integrate with existing applications and services built for Amazon S3. Since S3 was released in 2006, it has been heavily marketed and has been adopted by thousands of projects as one of their standard back-end options. Whenever you use a popular solution that supports various storage back ends, chances are S3 is among them.
Similar to S3, MinIO is designed for high-speed, low-latency access to data, making it suitable for use in demanding applications and environments. Mind you, in this case, the overall performance depends on you – especially on the hardware components you choose. Amazon S3 is famous for its high availability (99.99%, or "four nines," for S3 Standard storage class), but this number is achieved by Amazon having multiple nodes in its data centers around the globe. Although Amazon doesn't publish official numbers, some experts estimate that at least four copies of each object are stored in S3. If you need similar data durability and availability, you need to invest in hardware.
Another important feature is scalability: MinIO can scale seamlessly from a single node to a multinode distributed setup, allowing it to grow with your data storage needs. In terms of data protection, MinIO uses erasure coding and bit rot protection to ensure data durability and integrity. Additionally, it supports server-side and client-side encryption for data security.
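To make the overhead of erasure coding concrete, the short calculation below works out the usable capacity of a single erasure set. It is only a sketch: The drive count, parity count, and drive size are hypothetical values chosen for round numbers, not MinIO defaults, and the arithmetic is the generic data-plus-parity rule rather than anything specific to MinIO.

```shell
#!/usr/bin/env bash
# Sketch: usable capacity of one erasure set.
# All three input values are hypothetical, chosen only for illustration.
set -euo pipefail

DRIVES=8      # drives in the erasure set (assumption)
PARITY=2      # parity shards per object (assumption)
DRIVE_TB=4    # capacity of each drive in TB (assumption)

DATA=$(( DRIVES - PARITY ))             # data shards per object
RAW_TB=$(( DRIVES * DRIVE_TB ))         # raw capacity of the set
USABLE_TB=$(( DATA * DRIVE_TB ))        # capacity left after parity overhead
EFFICIENCY=$(( 100 * DATA / DRIVES ))   # usable share of raw capacity in percent

echo "raw=${RAW_TB}TB usable=${USABLE_TB}TB efficiency=${EFFICIENCY}%"
echo "objects stay readable with up to ${PARITY} drives lost"
```

With these assumed values, 24TB of the 32TB raw capacity remain usable. Keeping four full copies of every object on the same drives would leave only 8TB.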
Because MinIO is designed with a small footprint, it is suitable for deployment in containerized environments such as Docker and Kubernetes. It is also cost effective: By using commodity hardware and erasure coding for data protection, object storage can reduce the overall cost of ownership compared with traditional storage solutions. As with all object storage solutions, the flat address space and unique object identifiers make it easy to manage and access data, regardless of size.
Although the major public cloud platforms will tell you that going multicloud is an anti-pattern and will implement features to discourage you from using the services of the competition, in reality most large companies use more than one cloud. For example, they might use AWS for their core business, Azure for some of its features such as directory management, and Google Cloud Platform (GCP) for Kubernetes. MinIO's compatibility with the S3 API and support for multitenant environments make it an attractive option for hybrid cloud and multicloud storage deployments. Several companies use MinIO to create a storage layer across on-premises data centers, private clouds, and public cloud services such as Amazon S3, Azure Blob Storage, and Google Cloud Storage.
MinIO vs. Amazon S3
When you hear about MinIO as an object storage solution with full S3 API compatibility, your first question is probably, "How does it compare with S3?" Both are popular options for storing and managing vast amounts of unstructured data, but each comes with its own set of features and trade-offs.
Starting with similarities, the obvious one is the S3 API: Both MinIO and Amazon S3 use the same API for interacting with the storage service, making it easy to switch between them or integrate applications and services built for Amazon S3 with MinIO. They also employ the object storage model, which stores data as objects with unique identifiers and metadata in a flat address space. This structure enables seamless management and access to large amounts of unstructured data with excellent scalability: Both solutions offer virtually unlimited storage capacity, allowing you to scale horizontally across multiple nodes or clusters as your data storage requirements grow. Also, both MinIO and Amazon S3 provide data protection mechanisms, such as erasure coding and replication, to ensure the durability and availability of your data, even in the case of hardware failures.
As for differences, the first is related to deployment options. Amazon S3 is a managed cloud storage service provided by Amazon Web Services, whereas MinIO is an open source solution that can be deployed on-premises, in public clouds, or in hybrid cloud environments. This flexibility allows you to choose the deployment model that best meets your organization's requirements in terms of data sovereignty, compliance, and latency. In fact, if you wanted to, you could even deploy MinIO on AWS – although such a setup wouldn't make much sense in most cases.
Another practical difference lies in cost: Amazon S3 follows a pay-as-you-go pricing model that is based on data storage, data transfer, and the number of requests. MinIO, on the other hand, is free to use and deploy, with costs associated only with the underlying infrastructure and optional commercial support subscriptions. MinIO can therefore be the more cost-effective option, particularly for organizations with large data storage needs or variable workloads, which is probably the main reason it is so popular in large organizations: When you work with petabytes of data, the cost of S3 becomes overwhelming, and at some point it makes sense to deploy a more cost-efficient solution. Because MinIO is S3-compatible and battle-tested, it becomes the main choice in such scenarios.
Things become interesting when it comes to performance: MinIO is designed for high-speed, low-latency access to data and can deliver better performance than Amazon S3, especially in on-premises or private cloud deployments where network latency can be minimized. In other words, although you can't control the performance of S3, you can do a lot to optimize MinIO and make sure it works extremely fast if you eliminate bottlenecks.
As an open source solution, MinIO offers more customization and control over the storage system, allowing you to fine-tune configurations and even contribute to the project's development. Amazon S3, being a managed service, offers less control and customization, but also requires less management overhead.
As for security, both MinIO and Amazon S3 provide robust security features, including encryption, access control, and audit logging. However, with MinIO, you have complete control over your data and security configurations, which can be an advantage for organizations with strict compliance and data privacy requirements.
Installation
Before you begin the installation process, ensure that your system meets the requirements listed in Table 1.
Table 1
Before You Install
Factor | Requirement |
---|---|
Operating system | MinIO is compatible with Linux, macOS, and Windows operating systems. However, for production deployments, a Linux-based OS is recommended. |
Hardware | MinIO requires a minimum of 1GB of RAM and enough disk space for your data storage needs. For production environments, multiple drives or nodes are recommended for data protection and performance. The faster the drives, the better the performance of the whole cluster. |
Software | MinIO requires a modern web browser for its web-based management console, the MinIO Browser. |
In general, you have two ways to install and use MinIO: as a standalone server or as a multinode cluster. The first option is perfect for testing and initial evaluation; the second is used for production. Installing MinIO as a standalone server is a simple process, and the steps differ slightly depending on your operating system. On Linux, you download the MinIO server binary from the official MinIO website [1], make it executable, move it to the /usr/local/bin directory, create a data directory, and start the MinIO server with the default credentials and the console on port 9090:
wget https://dl.min.io/server/minio/release/linux-amd64/minio
chmod +x minio
sudo mv minio /usr/local/bin
mkdir /data
minio server /data --console-address :9090
After a few moments you should see output similar to that in Listing 1. At this point you should be able to connect to the console running on port 9090 (Figure 1) with the default credentials (minioadmin as the login and password). I show you how to change these insecure details later.
Listing 1
server Command Output
API: http://192.0.2.10:9000 http://127.0.0.1:9000
RootUser: minioadmin
RootPass: minioadmin
Console: http://192.0.2.10:9090 http://127.0.0.1:9090
RootUser: minioadmin
RootPass: minioadmin
Command-line: https://min.io/docs/minio/linux/reference/minio-mc.html
$ mc alias set myminio http://192.0.2.10:9000 minioadmin minioadmin
Documentation: https://min.io/docs/minio/linux/index.html
WARNING: Detected default credentials 'minioadmin:minioadmin', we recommend that you change these values with 'MINIO_ROOT_USER' and 'MINIO_ROOT_PASSWORD' environment variables.
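The warning at the end of Listing 1 names the two environment variables that replace the default credentials. A minimal sketch of how they might be set before starting the server follows. The user name demo-admin, the way the password is generated, and the DRY_RUN switch are assumptions made for illustration; by default the script only prints the server command instead of running it.

```shell
#!/usr/bin/env bash
# Sketch: start MinIO with non-default root credentials.
# DRY_RUN and the user name "demo-admin" are illustrative assumptions.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the server command, 0 = run it

# Print a command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# The two variables named in the server's warning message.
export MINIO_ROOT_USER="demo-admin"
# 16 random bytes rendered as 32 hexadecimal characters.
MINIO_ROOT_PASSWORD="$(od -An -N16 -tx1 /dev/urandom | tr -d ' \n')"
export MINIO_ROOT_PASSWORD

# This sketch does not save the password anywhere, so record it now.
echo "root user:     ${MINIO_ROOT_USER}"
echo "root password: ${MINIO_ROOT_PASSWORD}"

run minio server /data --console-address :9090
```

Running the script with DRY_RUN=0 starts the server with the generated credentials, which you then use in place of minioadmin when logging in to the console.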
Although you could stop here, it is also convenient to install the auxiliary MinIO Client (mc). Users of the venerable Midnight Commander will note a name clash, so you will need to rename one program or the other.
Start by downloading the client, making it executable, and moving it to somewhere in your $PATH:
wget https://dl.min.io/client/mc/release/linux-amd64/mc
chmod +x mc
sudo mv mc /usr/local/bin/mc
One of the most useful client commands to run just after the installation sets the so-called alias for your local deployment that will make it easier for you to refer to the deployment later. For example, the command
mc alias set mycluster1 http://127.0.0.1:9000 minio minio
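names the local deployment mycluster1 and stores minio as both the access key and the secret key that mc uses to connect. These values must match the credentials configured on the server.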
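Once the alias exists, it can stand in for the server address in other client commands. The following sketch creates a bucket, uploads a small file, lists the bucket, and reads the object back. The bucket name demo-bucket, the object name hello.txt, and the DRY_RUN switch are assumptions made for illustration; by default the script only prints the mc commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch: first steps with the alias defined in the text.
# The bucket name, object name, and DRY_RUN switch are illustrative assumptions.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the mc commands, 0 = run them
ALIAS="mycluster1"        # alias name used in the text
BUCKET="demo-bucket"      # hypothetical bucket name

# Print a command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# A small throwaway file to upload.
TMPFILE="$(mktemp)"
echo "hello object storage" > "$TMPFILE"

run mc mb --ignore-existing "${ALIAS}/${BUCKET}"      # create the bucket
run mc cp "$TMPFILE" "${ALIAS}/${BUCKET}/hello.txt"   # upload one object
run mc ls "${ALIAS}/${BUCKET}"                        # list the bucket
run mc cat "${ALIAS}/${BUCKET}/hello.txt"             # read the object back

rm -f "$TMPFILE"
```

If you renamed the MinIO Client to avoid the clash with Midnight Commander, substitute the new name for mc before setting DRY_RUN=0.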