Designing an Auction System for the iPhone 17

Author: Dinh Nguyen Truong

What would happen if the iPhone 17 just launched, and Apple decided to auction off the first one?

In the final 30 seconds, 100,000 people simultaneously spam the "Place Bid" button with prices jumping from $1,000 to $10,000. How would the system handle it when the database has to update the current price thousands of times per second? Who would win when 1000 people bid $9,999 in the same millisecond?

This is exactly the situation auction platforms like eBay face every day, and it's the problem I want to discuss in this article. We'll go from the most basic design of an auction system to scaling it up to handle this hot-iPhone scenario.

Introduction to the Problem

Requirements

  • Buyers can bid on products, and after the auction ends, the winner can purchase the product.

  • Handle multiple people bidding on the same product simultaneously.

  • When popular items are being auctioned, the system should respond within an acceptable time frame.

Overview of Auction Flow

A basic auction flow will include several steps:

  • After the seller posts a product and sets the auction start and end times, anyone can join the auction for that product.

  • Each time a buyer places a bid, the system will check the current price, and if the bid is higher, it will replace the old bid.

  • This process continues until the auction ends.

  • At this point, the system determines the winner and sends them a notification to make payment.

From here, we have a simple design as follows:

Initial Design

  1. API: handles requests from users, such as creating products, browsing, and placing bids.

    • Additionally, the API takes on the job of pushing the current price to users and notifying them when someone bids higher. We can use WebSockets or, more simply, Server-Sent Events to implement this feature.
  2. Database: Store information about products and orders.

    • So which type of database should we choose? Our system must handle concurrent bidding, so ACID properties are essential for maintaining data consistency => Relational databases (like PostgreSQL, MySQL, etc.) would be suitable.

    • We can start with simple modeling using 2 tables: one to store products and one to store bidding history. Although the stated requirements don't call for storing history, this data is essential in a bidding platform for transparency and fraud prevention. A sketch of this two-table schema follows this list.

  3. A cronjob to handle auction endings. Since we've stored the auction end time, the cronjob just needs to query using this field to satisfy the requirement.

    • SELECT … FROM AuctionProduct WHERE endBid < NOW() AND status = 'in-auction'

    • However, as the system scales and the number of auction products keeps growing, this polling approach is no longer optimal.
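
To make the modeling concrete, here's a minimal sketch of the two-table schema described in point 2, assuming PostgreSQL accessed via the node-postgres (pg) client. Table and column names follow the queries used in this article; everything else is illustrative.

import { Pool } from "pg";

const pool = new Pool(); // connection settings come from the environment

// One table for auction products, one for the full bid history.
await pool.query(`
  CREATE TABLE IF NOT EXISTS AuctionProduct (
    id           BIGSERIAL PRIMARY KEY,
    name         TEXT NOT NULL,
    startBid     TIMESTAMPTZ NOT NULL,              -- auction start time
    endBid       TIMESTAMPTZ NOT NULL,              -- auction end time
    currentPrice NUMERIC NOT NULL DEFAULT 0,
    status       TEXT NOT NULL DEFAULT 'in-auction'
  );

  CREATE TABLE IF NOT EXISTS Bids (
    id        BIGSERIAL PRIMARY KEY,
    productId BIGINT NOT NULL REFERENCES AuctionProduct(id),
    userId    BIGINT NOT NULL,
    price     NUMERIC NOT NULL,
    time      TIMESTAMPTZ NOT NULL DEFAULT NOW()
  );
`);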

While this basic design works for small-scale auctions, it faces significant challenges when we scale up to handle thousands of concurrent users.

How to handle multiple people bidding on the same product simultaneously?

Each time a user places a bid, the system will check step by step:

  • Is the bidding time valid?

  • Is the bid price higher than the current price?

When multiple people bid simultaneously, race conditions will inevitably occur. How can we solve this? I think we can start with a DB transaction combined with a row lock. The idea is that for one product, at any given moment, only one session can update currentPrice and append to the bidding history.

BEGIN;

-- 1. Lock the auction product row so no other session can update it
SELECT currentPrice, startBid, endBid
    FROM AuctionProduct
WHERE id = :auctionProductId
    FOR UPDATE;

-- 2. Validate in application code (shown here as pseudo-SQL):
--    IF NOW() < startBid OR NOW() > endBid THEN ROLLBACK;
--    IF :bidPrice <= currentPrice THEN ROLLBACK;

-- 3. Record the bid in the history table
INSERT INTO Bids (userId, price, time)
VALUES (:userId, :bidPrice, NOW());

-- 4. Update the current price
UPDATE AuctionProduct
    SET currentPrice = :bidPrice
WHERE id = :auctionProductId;

COMMIT;

When multiple people bid simultaneously, this design approach performs poorly:

  • If they're bidding on the same product, the locking mechanism makes transactions wait for extended periods (each one also has to run 2 DML statements while holding the lock), risking hanging the entire application through connection pool exhaustion.

  • Even when users aren't bidding on the same product, too many concurrent transactions, each with 2 DML statements, will still exhaust database resources.

Next, let's handle the remaining issues in this design. There are several difficult problems to solve.

Handling Scaling Issues

Let's examine the main bottlenecks in our initial design and explore solutions.

1. Optimizing Auction End Processing Flow

Instead of having to query continuously, is there a more optimized approach?

Every time we query, we check all current auctions. Instead of checking everything, is there a way for an auction that has ended to announce itself back to the system? Something like a message queue, but where messages are delivered back to the system only after a set time period?

Fortunately, such queues do exist; the RabbitMQ Delayed Message Plugin is a typical example. We can arrange for messages to appear in the queue only after a certain time period, and then the system processes them.
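
As a rough sketch of how scheduling could look with this plugin (using amqplib; the exchange, queue, and payload names are my own illustration, not a fixed design):

import amqp from "amqplib";

async function scheduleAuctionEnd(auctionProductId: string, endBid: Date) {
  const conn = await amqp.connect("amqp://localhost");
  const ch = await conn.createChannel();

  // The plugin adds a special "x-delayed-message" exchange type.
  await ch.assertExchange("auction-end", "x-delayed-message", {
    arguments: { "x-delayed-type": "direct" },
  });
  await ch.assertQueue("auction-end-queue");
  await ch.bindQueue("auction-end-queue", "auction-end", "end");

  // The message only becomes visible to consumers after the delay,
  // i.e. right when the auction is due to close.
  const delayMs = Math.max(0, endBid.getTime() - Date.now());
  ch.publish(
    "auction-end",
    "end",
    Buffer.from(JSON.stringify({ auctionProductId })),
    { headers: { "x-delay": delayMs } }
  );
}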

What if the seller changes the auction end time? We can invalidate the current message and schedule a new one (for example, by storing a version on the auction and discarding any delayed message whose version no longer matches when it arrives).

This approach not only separates the two logic flows but also makes new requirements easy to accommodate, such as platforms that run auctions without a fixed end time and close only after 1-2 hours have passed since the last bid. A delayed message queue is a good fit for that case too.

2. Optimizing When Multiple Users Bid Simultaneously

The simple, immediate approach is to add more servers to process user requests, add a cache to the system, and add a separate Push Service to handle real-time updates for the UI.

Cache will help optimize the system in many aspects:

  • Cache product information => browsing products is also faster

  • Cache the current product price: if a user bids lower than the cached price, we can reject that request before ever executing a transaction against the database (see the sketch below).
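
A minimal sketch of that pre-check, assuming ioredis and a price:<productId> key that the bid-processing flow keeps up to date (both the key naming and the update responsibility are my assumptions):

import Redis from "ioredis";

const redis = new Redis();

// Returns false if we can already tell the bid is too low, so the API
// can reject it without touching the database.
async function preCheckBid(productId: string, bidPrice: number): Promise<boolean> {
  const cached = await redis.get(`price:${productId}`);
  if (cached !== null && bidPrice <= Number(cached)) return false;
  // Cache miss or a higher bid: let it through to the authoritative check.
  return true;
}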

However, this solution doesn't yet handle the case where the number of users bidding suddenly increases (popular items).

The advantage of the current approach is that users receive immediate feedback after they place a bid. So what if we consider another direction: asynchronous processing?

After the API-layer processing (comparing against the cached price and eliminating anything lower), instead of handling the user's request immediately, we push the command into a queue to process later, and the system responds right away that the bid has been received, as sketched below.
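
Sketched with Express and amqplib (assuming an already-initialized app and channel, and reusing the hypothetical preCheckBid from the cache sketch above); HTTP 202 signals "received, not yet confirmed":

app.post("/products/:id/bids", async (req, res) => {
  const productId = req.params.id;
  const { userId, bidPrice } = req.body;

  // Cheap elimination at the API layer using the cached price.
  if (!(await preCheckBid(productId, bidPrice))) {
    return res.status(409).json({ accepted: false, reason: "bid too low" });
  }

  // Defer the real processing to the bid workers via the queue.
  channel.sendToQueue(
    "bids",
    Buffer.from(JSON.stringify({ productId, userId, bidPrice, at: Date.now() })),
    { persistent: true }
  );

  // Respond immediately: the bid has been received, not yet applied.
  return res.status(202).json({ accepted: true });
});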

We'll add a pool of workers to process these commands sequentially. The system then avoids congestion when a sudden burst of bids arrives and works through them gradually afterward. With a proper setup, I believe the system can comfortably handle 5-10 bids/second (400k-800k per day).

So what if on a given day, there's a product that's too popular, leading to 100k bids?

At this point our system still processes bids slowly, at 10 per second, meaning it takes almost 3 hours after the session ends to determine the winner. Everyone in that auction is stuck waiting for server feedback: bidders are eager to purchase but don't know whether their bid is good enough.

When processing asynchronously, we already have a very large set of commands sitting in the queue, so is there a way to use batching to optimize the write flow to the DB? Instead of one transaction writing 1 bid, we write 1,000 bids?

BEGIN;

-- Start: Product 1
SELECT currentPrice, startBid, endBid
    FROM AuctionProduct
WHERE id = :auctionProductId
    FOR UPDATE;
-- (same validity checks as before; note that a failed check here
--  rolls back the whole batch)
INSERT INTO Bids (userId, price, time)
VALUES (:userId, :bidPrice, NOW());
UPDATE AuctionProduct
    SET currentPrice = :bidPrice
WHERE id = :auctionProductId;
-- End: Product 1

-- Start: Product 2
...
-- End: Product 2

-- Start: Product 3
...
-- End: Product 3

COMMIT;

The weakness of this approach is that if any product's bid fails validation, the entire transaction rolls back.

Note that when there's a popular item, most events in the queue will be bidding commands for it => so we'll focus on optimizing the events of a single product.

So what if we take a batch of events for one product, compute the highest price at the application layer, and then write them all to the database at once?

This improves on the previous method, but events are still routed to workers randomly => multiple workers end up batching for the same product => transactions still wait on each other's locks.

To make batching more effective and further limit locking, we can set things up so one product is always routed to exactly one worker (achievable through the queue's subscription/partition configuration). At this point, the flow for processing commands looks like this:

Now a single transaction can truly process multiple events (for one product) and almost never waits on locks, so performance improves considerably. Assuming a product really does receive 100k bids, each second only needs one transaction batching 1,000 events, which cuts the processing time to about 2 minutes. A sketch of this per-product batching follows.
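
Here's what that could look like in a worker, assuming kafkajs with messages keyed by productId so one product always lands on one partition (and therefore one worker); writeBatch is a hypothetical helper that runs the single insert-all-bids-and-update-price transaction:

import { Kafka } from "kafkajs";

const kafka = new Kafka({ brokers: ["localhost:9092"] });
const consumer = kafka.consumer({ groupId: "bid-workers" });

await consumer.subscribe({ topic: "bids" });
await consumer.run({
  eachBatch: async ({ batch, resolveOffset, heartbeat }) => {
    // Messages in a batch come from a single partition, so bids for
    // one product are never split across workers.
    const byProduct = new Map<string, Array<{ userId: string; price: number }>>();
    for (const message of batch.messages) {
      const bid = JSON.parse(message.value!.toString());
      const list = byProduct.get(bid.productId) ?? [];
      list.push(bid);
      byProduct.set(bid.productId, list);
    }

    for (const [productId, bids] of byProduct) {
      // Compute the batch maximum up front, then write everything in
      // one transaction: insert all bids, update the price once.
      const highest = Math.max(...bids.map((b) => b.price));
      await writeBatch(productId, bids, highest); // hypothetical helper
    }

    // Only acknowledge offsets after the batch has been persisted.
    batch.messages.forEach((m) => resolveOffset(m.offset));
    await heartbeat();
  },
});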

Is there really a way to eliminate locks?

If at one point in time, for a specific product, only one thread handles updating its price, then race conditions can be avoided.

To achieve this, we first need to ensure that one product is always routed to only one worker, as above, and then ensure the worker never runs two threads for the same product simultaneously.

This approach squeezes out a bit more bidding performance, but it demands careful attention and thorough testing. Even better, we can apply the Optimistic Locking technique (combined with retries) here, since the chance of a conflict has become very low (a direction worth researching further); a minimal sketch follows.
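
For illustration, a minimal optimistic-locking sketch with a hypothetical version column added to AuctionProduct, reusing the node-postgres pool from the schema sketch; on conflict we simply re-read and retry:

async function placeBidOptimistic(productId: number, bidPrice: number, retries = 3): Promise<boolean> {
  for (let attempt = 0; attempt < retries; attempt++) {
    const { rows } = await pool.query(
      "SELECT currentPrice, version FROM AuctionProduct WHERE id = $1",
      [productId]
    );
    // node-postgres lowercases unquoted column names.
    const { currentprice, version } = rows[0];
    if (bidPrice <= Number(currentprice)) return false; // bid too low

    // Only succeeds if nobody else updated the row since we read it.
    const result = await pool.query(
      `UPDATE AuctionProduct
          SET currentPrice = $1, version = version + 1
        WHERE id = $2 AND version = $3`,
      [bidPrice, productId, version]
    );
    if (result.rowCount === 1) return true;
    // Lost the race: another bid landed first, so re-read and retry.
  }
  return false;
}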

In summary, we have a final design as follows:

Design When Scaling

And the flow for the bidding feature:

Although there are many more ideas for extending and optimizing this system, the article is already long, so I'll just list them briefly here:

  • Move the lock handling to the caching layer by storing the highest bid directly in Redis, so the system can accurately reject invalid bids right away and update the product's price in real time; with popular items like the one above, that immediate response makes for a much better experience. However, we need to handle HA for Redis and make sure the highest price in Redis and the database never drift out of sync (see the sketch after this list).

  • Additionally, partitioning techniques to improve the database's write performance are another direction worth exploring.
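
For the Redis idea above, a sketch of an atomic check-and-set on the highest bid using a small Lua script (Lua scripts execute atomically in Redis; the key naming is mine, continuing the ioredis setup from the cache sketch):

const CAS_SCRIPT = `
  local cur = tonumber(redis.call('GET', KEYS[1]) or '-1')
  if tonumber(ARGV[1]) > cur then
    redis.call('SET', KEYS[1], ARGV[1])
    return 1
  end
  return 0
`;

// Returns true only if this bid became the new highest price.
async function tryOutbid(productId: string, bidPrice: number): Promise<boolean> {
  const result = await redis.eval(CAS_SCRIPT, 1, `price:${productId}`, bidPrice);
  return result === 1;
}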

Summary

Okay, so you've made it through this entire article with me. A real auction system would have more complex requirements and models, and the approaches in this article might no longer be suitable there. But I believe these analyses offer insight into how to build a system from its starting point all the way to scale.

See you in the next articles.