Thesis Joris Roovers: 2010

Sunday, October 31, 2010

Different scenarios require different markets

Up until now, I've never really blogged about different type of markets as exist in other commodity markets. With different type of markets, I mean markets that trade commodities within different time frames. That is, in a normal commodity market model, a difference is made between a futures market and a spot market. When thinking about a cloud computing resource market, it is not difficult to imagine these market equivalents. That is, in a cloud resources market, cloud computing resources can be traded for a specific time frame in the future or directly on the spot. There is however, also a 3th kind of market that needs to be considered: the on-demand market. The on-demand market trades - as the name implies - cloud resources that are requested on-demand; which means that the resources need to be directly accessible, but that no additional information is given by the consumer on how long the resources will be needed.

While all 3 market types share a lot of ideas (e.g. constraint language), each one also has properties that make them different from the other types. Let's have a more in-depth look at each of these 3 markets.

Futures market
If we take wikipedia's definition of a futures market, we can modify it so that it can be applied to a cloud setting:

A futures exchange is a central cloud resource exchange where people can trade standardized futures contracts; that is, a contract to buy specific quantities of cloud resources at a specified price for a specified time interval in the future.

In other words, if we put this on a timeline, a cloud resource future contract is a contract that is traded on time t, that specifies a specific amount of cloud resources (that meet certain constraints) that need to be available from a time t+x until a time t+y.

Trading on the futures market allows providers to do better resource provisioning in their datacenters. Since trading resources on the futures market allows providers to reduce risk (that is, they can reduce the risk of having idle resources at some time in the future), future markets usually offer resources cheaper than on the e.g. on-demand market. However, there are many situations in which this can be different (e.g. if the electricity prices are expected to be higher in the future, than the future price of an ask might be higher than the current (on-demand or spot price) price).
Speculating on future prices (either by going short or going long) might result in significant financial gains or losses. These principles are not new, and are an inherit part of any futures commodity market. Making future markets a part of my market model is therefor absolutely necessary, if the model is to be considered economically viable.

It might be interesting to combine the on-demand idea (that is, not obligating the consumer to specify when a resource is no longer needed) with the idea of a futures contract. That is, it might be intersting for a consumer to specify that it he will need a specific set of resources in the future, without needing to specify how long he will need them.

This is however, not very interesting from the provider's point of view since in this scenario he no longer has any garantuee on how long the resources will be used. However, if we allow a user to specify a minimum end time and a maximum end time, we allow the provider to make fairly accurate resource provisioning plans while at the same time giving the consumer more flexibility in time specification. This is especially helpful when it is hard for a consumer to specify exactly how long he will need the resources, while it is fairly easy for him to give a minimum running time.

The futures market model as proposed here fits quite nicely in the market model I've been developing until now. For the futures market, an offer (both ask and bid) should contain:

metainfo
a price (minprice for asks, maxprice for bids)
a timeconstraint (which timeframe is being considered)
a set of constraints (resources, legal, etc)
an expiration date (until when is the offer valid)

Spot market
Let's again modify wikipedia's definition of a spot market to get a definition for a cloud spot market:

The cloud spot market is a public cloud computing resource market, in which cloud resources are traded and delivered immediately.

Spot market timeline, a bid starts at a time t and runs until t+3 when the spot price of the provider exceeds the maximum price of the bid

Spot markets typically represent the immediate need of the market, that is, providers will sell current overcapacity on the spot market, while consumers buy current undercapacity on the spot market. If demand is high, spot prices will be high, when demand is low, spot prices will be low.
A spot price is the price of a single unit of trade on the spot market. This unit of trade is always exactly defined. For example, on the gold spot market, gold is always traded per 1000 ounces. The spot price at a given time thus reflects the value the market attaches to 1000 ounces of gold at that moment.
However, as with electricity (to some extent), a cloud computing unit cannot be expressed in a single dimension, since the heterogenity between different cloud resources is so large. Multiple dimensions (price, time and type) are indeed needed to specify a single cloud unit. This makes spot trading more complex. In order to trade on the spot market and keep things simple, we will thus have to set one of these dimensions to a fixed value in order to compare separate offers with each other. For now, I fix the time dimension to a single hour (this is also done in the electricity spot market). This means that the price and constraints can vary between offers, but the timeframe for which on offer is valid is always defined as a single hour.

This also implies that cloud providers and consumers will need to resubmit their offers before the start of the next hour. This allows both providers and consumers to adapt their prices based on their current needs.

Although this model works, it does not take into account that consumers might want to buy resources on the spot market for several hours at once. That is, cloud spot markets (such as Amazon's EC2) are primarily used to run lower-importance applications as long as the current price of the spot market does not exceed a certain threshold. In the model I just described, consumers need to re-buy cloud resources every hour, which implies migrating applications every hour. Obviously, this is not the desired behaviour. We can solve this by adapting our bidding language to reflect cloud spot market principles.

An spot-ask should then contain:

meta-info
the current spot price for the hour
a set of constraints (resources, etc)
no timeconstraints
no expiration date (an ask always expires after a single hour)

While a spot-bid contains:

meta-info
a maximum price
a set of constraints (resources, etc)
no timeconstraints
no expiration date

A spot-bid will then match with a spot-ask, if the price of the spot-bid is lower than the price of the spot-ask. A match implies a contract between provider and consumer of undefined duration. However, this contract can always be broken by the consumer (on-demand character of spot market), and should only be broken by a provider when it's current spot price becomes higher than the maximum price specified by the consumer in his bid.

Although spot markets allow a lot of immediate flexibility for both consumers and providers, the spot market does not provide consumers with any garantuee on how long their instances will be able to run. That is, it could be that the spot price of a specific provider is very low at a given moment, and very high the next hour (which probably results in a shutdown of the consumers instance, since the spot price would exceed the consumers maximum price). Surely, a slightly more expensive provider that has less volatile spot prices might be more interesting, since the probability that an instance running on that provider is shutdown is a lot lower. This is especially so for jobs that have a high startup cost.

In order to accommodate this desire, I propose to use price history constraints on both spot bids and asks. Such a constraint might for example assert that the average spot price (over a certain time frame) of a specific provider does not exceed a certain value. Using such constraints allows the consumer to have better garantuees over the volatility of the spot price.

Of course, using such constraints raises an issue of trust. Can a provider be trusted to deliver correct history constraints? Probably not. Therefore, cloud providers should use brokers that independently monitor the provider's spot prices and attach the history constraints to the provider's ask when submitting the ask to the market (Note: to bootstrap this mechanism, the data from e.g. cloudexchange.com can be used). Using the custom constraint mechanism discussed in my previous post, these brokers can then distribute their price history constraint to consumers.

On Demand Market
As already said before, on the on-demand market, the traded cloud resources should be directly available, but the consumer does not need to specify how long he will be using these resources.

On demand market

To implement the on-demand market, we again need to slightly adjust our bidding language:
A offer (bid or ask) should contain:

metainfo
a unit price
a unit time granularity
constraints (resources, etc)
an expiration date

That is, an offer should both contain a price as the time indication for which this price is valid. For example, an offer can have a price of 10 ct for a single hour or 100 euro for 1 month. Using this bidding language allows providers with different accounting granularities to participate in the market.

However, if we wish to keep using the price sorted queues (as used by the continous double auction), the market will need to normalize the prices in the queues (e.g. price per minute) and take into account the unit time granularity as an extra timeconstraint when matching offers. Although this makes the matching algorithm more complex, I believe that the improved flexibility for providers justifies this.

Now, the question poses itself, what if the user knows how long he will be using a certain set of resources but also requires his bid to start immediatly. Does this kind of bid that requires a direct start but that also specifies an end time 'belongs' to the futures or on-demand market? The answer is that it can 'belong' to both. However, one would reasonably expect that an offer on the future market is cheaper since the insurance that a resource is used for a specific time frame is worth something for a provider (recall that reducing risk for the provider is the primary idea behind the futures market). By introducing a DurationConstraint alongside timeconstraints in the futures market, we can allow offers on the futures market that need to start directly while garantueeing a minimum resource usage for the provider.

Combining on-demand and futures market

Obviously, an end-user should not need to choose on which market to submit his offer. Brokers should submit a consumers bid on both the futures and on-demand market, and return the cheapest match. Note that in order for this to work, the market must allow matching without obligations for both consumers and providers. Allowing this complicates things, since in such a scenario we need a seperate service that contacts both the provider and consumer to seal the deal. I might shed further thought on this issue in the future since I think it is a good idea to add the match execution verification (do consumer and provider hold their promises?) to the same kind of service.

Component Based Pricing

I'm not entirely sure yet how the component based pricing reverse auction I discussed in the previous blog post fits in the models I just discussed (does every market have a seperate reverse auction, or is there a seperate reverse auction that is shared between the different markets ?). I will be exploring this further in the coming week.

That's about it for now :-) Until soon.

Saturday, October 30, 2010

Component based offers and custom constraints

It has been a while since my last blogpost. Although nothing appeared on this blog, I've been busy. In the past 2 weeks I've - among other things- continued developing the first market prototype. At this moment, that prototype contains most features described in its design which I discussed in a previous post.

Having implemented the prototype, I tried to validate it by creating bids and asks based on offers of some real-world cloud providers. The results were not really satisfying.

Although some of the providers fit nicely into my proposed double auction model, some others really don't. Most notably, providers that offer component based pricing (such as CloudSigma) really cannot participate into the market model I proposed. Also, I realized that some/many providers offer specific functionality that can not be easily expressed by a resource constraint (SLA's, response times, certification, etc). Furthermore, cloud customers might have specific wishes or constraints (region of deployment, SLA's, certification, etc) that similarly can not be expressed by these resource constraints.

In other words, by developing the prototype I discovered 2 problems:

There is no way to allow component based pricing in the current model
A more generic constraint mechanism is needed for both consumers and providers

When brainstorming for solutions about these 2 issues, I quickly discovered the (presumably) easy fix: allow aks to contain the logic to calculate the price for a bid and allow custom constraints in the market that include custom logic to determine whether it matches with an other (customer) constraint.

Although both ideas allow greatly improved flexibilty and differentation for cloud providers, they do make things considerably harder from a technical point of view. When allowing remote code to run within the market, we need to put certain mechanisms in place that verify that the remote code does not breach market security. In other words, we cannot allow arbitrary code to run within the market; extended analysis of the remote code is absolutely needed if we want the market to remain credible.

Doing such security analysis is far from trivial. Although examples exists (such as the mechanisms in Apache River - formerly Jini), implementing such mechanisms in the market would require a lot of time and effort.

There is however an other way to solve this problem. Instead of running remote code on the market itself, the market can also contact providers during the matching process and ask them for a price for a specific resource set or ask them whether a specific constraint matches an other constraint.

Before we have a closer look at both situations (requesting a price and matching a constraint), we first need to consider the impact of this 'reversal of control'.

By delegating market logic to external parties, we can offer improved flexibility without needing mechanisms to do complex code analysis since no remote code is running within the market. However, making such calls to external parties is typically a very slow operation since it involves webservice calls over a network. Since the market potentially needs to do a lot of these calls, it is important that we require a certain amount of QoS from the services that execute the remote logic. The most important constraint will probably be response time (max, avg).

To achieve this, it is probably interesting for providers to work with brokers that have a very fast network access to the market. Such brokers can for example run in the same datacenter or even on the same physical machine (different jvm) as the market itself. Providers can then deploy their custom logic on these brokers, in which way they garantuee that QoS requirements are met. Note that the brokers will still need a mechanism to verify that the remote code of providers can be trusted. The market however, does not; a compromised broker can never lead to a security breach on the market. (A compromised broker can however start generating a lot of network traffic or start using a lot of cpu, disk or memory resources, which can be problematic when the market runs on the same host or within the same datacenter. However, this is not an issue since both the jvm as the virtual machines in which the brokers are ran can easily be setup in a way that limits their resources).

The diagram below gives an conceptual overview of how the brokers, market and providers interact (for a component based price ask, custom constraint checking uses the same idea).

Let's take a moment to give some more thought to the specific issues of component based pricing and custom constraints.

Component Based Pricing

When introducing component priced asks, the market needs an updated mechanism to efficiently match incoming bids with asks (note that the market also still accepts 'normal', 'fixed' asks).

This can be done by combining the already existing continuous double auction mechanism with a reverse auction for component priced asks. Basically, we will now have 2 queues and a list in the market. The 2 queues still have the same function as before: there is a ask and a bid queue sorted by price (by increasing and decreasing price respectively). The list contains the component priced asks. When a new component based ask comes in, it is added to the list. A new 'normal' ask is first matched to the bid queue and if no match is found, it is added to the ask queue (this is the same behaviour as before).

The situation is however a bit different when a new bid comes in. When a new bid comes in, 2 threads are started. The first thread does the normal matching as before. The second thread will do the reverse auction by contacting every provider (or broker - typically referred to as a 'bidding agent') in the component based ask list (using webservices) and requesting a price for the incoming bid. When all these prices are in, the algorithm selects the cheapest component based provider. Then there are 2 possibilities: either the other thread has already found a match or it hasn't. If a match is already found, the algorithm will compare the price of the match with the cheapest component based price and return the cheapest of the two as the actual match. If no match has been found yet, the algorithm will look whether the price of the cheapest component based match is less than the price of the ask that is currently being matched with the incoming bid. If this is the case, the component based match is returned. If this is not the case, the algorithm continues searching for a match that is cheaper than the cheapest component based match. If no such match exists, the component based match is returned. If the price of the cheapest component based match is the same as the price of the cheapest non-component based match is the same, one of the two is randomly chosen and returned.

This algorithm can be improved by caching the prices of the component based offers. This can easily be done by associating an expiration date (or lease) with the returned price. By letting the market charge more for component based asks that return a short expiration date, there is an incentive for providers to limit the volatility of their component based prices. This increases the effectiveness of the caching mechanism.

By making sure that providers (or brokers) meet certain QoS constraints and by caching the prices, this algorithm is still fairly efficient while improving the market's flexibility.

Constraints

In we want to allow more generic constraints, it is a good idea to make or matching mechanism more generic by letting the constraints match with each other. In other words, move the matching logic out of the market itself and into the constraints. By doing this - note that we are using the idea of a command pattern here - the market no longer needs to be aware of specific constraint logic, which simplifies the overal market design. An offer (both bids and asks) will now thus have less components:

Details about who made the offer (including accounting info, etc)
A set of constraints
A price

Note that, in this model, there is no longer a difference between a resource, time or any other constraint.

A constraint contains an id which uniquely identifies its type. The market will only match constraints with the same type-id (this is done for efficiency reasons, trying to match every constraint of a bid with every constraint of an ask is very inefficient). Besides a type-id, a constraint also contains (key, value) pairs that give it meaning. For example, a time constraint of an ask may contain ("start", time) and ("end", time+5) to indicate a time interval in which an ask is valid.

Now, how do custom constraints fit into this picture? The idea is that there are many type of constraints already offered by the market. These include resource constraints (e.g. memory/cpu/disk constraint), legal constraints (e.g. SLAs, certification), time constraints and possibly others. These constraints that are offered by the market itself and thus trusted by the market. The code that does the actual constraint checking can thus safely run within the market, which is fast since it is executed locally.

Besides these constraints, there are these so-called custom constraints. These constraints can contain (key, value) pairs as normal constraints, but are also annotated with a webservice that can be called to perform the constraint check. That is, when the market wants to match 2 custom constraints of the same type, it will make a call to the annotated webservice (passing the 2 custom constraints), requesting a constraint check.

In order to optimize this idea, we need to minimize the number of webservice calls that are made. Caching might be an option, but is possibly less interesting here since it is more complex (can we use leases?). I'll need to have a further look into this. Caching is however not the only way to optimize the use of custom constraints. By providing a set of generic constraints that check for property equality or order, we can decrease the amount of custom constraints that are needed. That is, if we provide an 'EqualConstraint' that will check whether the properties of the 'EqualConstraint' it is checked against are equal to its own properties, we can probably already avoid the use a lot of custom constraints. Similarly, 'LessThan' and 'GreaterThan' constraints could verify that the properties of the constraint that is checked against are all less or greater than the properties of the constraint itself. Since the code that does this checking is provided by the market, it can also run safely and fast within the market.

The only question that remains is then how these custom constraints are shared between providers and consumers. A directory service in which providers can add custom constraints is the most logical solution and simple solution.

Prototypes

Although flawed in some ways, the initial prototype already contains most of the CDA logic. I can definitely reuse parts of this prototype. During this week, I also made a quick (local) prototype of the ideas concerning the custom constraints. I'll probably update this post to put that code online tomorrow. I've yet to make a reverse auction prototype for the component based asks, but I think this can be done rather quickly.

The next step involves putting all these prototypes together and making the actual webservice implementation. However, I think I'll explore some other open questions first.

These include:

Does this updated model can be applied in the real world?
Exploring application based bid specification (how do data storage and transfer costs fit into the model?)
Accounting, how does the market earn money ?
Trust: how can we be sure that a provider actually delivers?
Price discovery (see Dube)
Match execution: a match is found, how do consumer and provider connect with each other?
How do we bootstrap the market?
Updating the market architecture (see previous blog post)

Sunday, October 17, 2010

Bids, Offers and Prototype

I'd like to begin this post with an analysis of the necessary properties of bids and offers. In other words, of which parts does an bid or offer exist?

A bid should contain the following information:

Details about the consumer that made the bid
A specification of the required set of resources
Time constraints (spot/future bid, in which time frame should the requested resources be available)
Price constraints (max price, price tolerance)

A offer should contain similar information:

Details about the provider that made the offer
A specification of the offered set of resources
Time constraints (spot/future bid, in which time frame are the offered resources available)
Price constraints (min price, price tolerance)

The bids and offers should thus contain the same kind of information.
The 'details about consumers or providers' should contain some general information about the client (name, email, address, etc), accounting information (credit card number), and possibly some more technical consumer details (public keys, certificates, etc).
To define the resources that are part of the bid/offer, I think I'll go for Dube's resource sets (this is the most logic approach).

This means that we define the required resources as a set. Each resource in the set is of a certain type (e.g. cpu_architecture) and has a certain value (e.g. x86). By using sets we can allow the specification of partial resource sets, that is, a resource set does not need to contain a value for all the available resource types.

For example, the following are both valid resource sets:

{cpu_architecture:x86, cpu_clock: 2.2Ghz, memory: 2GB}
{cpu_architecture:x86, cpu_clock: 2.4Ghz, memory: 4GB, disk: 128GB, os: Ubuntu Linux 10.10}

Note that this allows for partial matching between bids and offers (as long as all requirements of the bid are fulfilled). For example, if the in the example above, the first set was a requirement set and the second set an offered set, the first set could be matched with the second set since the second set contains all the necessary resources.Matching the resource set is done by matching each of the items in the set with items in the matching set.

There are 2 different kind of matches that can occur here: exact matches and what I call 'inclusive' matches. An exact match means that the 2 resources need to have the exact same value (e.g. cpu_architecture: x86) in order to have a match. An inclusive match means that a resource is matched by a resource that is better (has a higher value) than the requested resource (e.g. cpu_clock: 2.2 Ghz is matched with cpu_clock: 2.4 Ghz).

This specification of resource sets is good enough to create a first implementation that allows matching. Further iterations can improve on this idea. More details about the other parts of the bids and offers (the time and price constraints in particular) will follow in a new blogpost.

IaaS Market: a first prototype

Based on this first definition of bids and offers, I decided to create a little prototype that already implements some of this functionality. The Java code can be found here. The prototype focuses on the definition of the resource sets. The UML scheme below gives an overview of the implementation of the resource sets.

Notice that I made a difference between Comparable resources (for which there exists a total order on its values; and inclusive matching is thus possible) and the NonComparable resources (for which an exact match is always necessary). By letting the comparable resources implement the Java Comparable interface it is possible to see if a value of a certain resource type is better than another value.

The prototype also includes a very basic emulator implementation that can generate random sets of resources, random bids and offers (at this moment those bids and offers only contain random resource sets, no time or price constraints) and can make bids and offers to a very rudimentary market. This market tries to do a matching using a simple matching algorithm: when a new bid or offer comes in, it searches the offer or bid list (queue) for an offer or bid that has a matching resource set. At this moment, the first match is always chosen. This means that it could be the case that there is a better match available (that would benefit the provider and consumer more), but in the current implementation this match is not returned. In order to do that, we first need to define what the 'best match' is.
I already have a few (unexplored) ideas about defining this 'best match'; I'll probably specify this more in a future post.

Note that all of this is implemented as a local console application; that is, no webservices have been used up to this point.
The zip-file contains a .project file to import the project into Eclipse. The emulator.Emulator class contains the main function.

Expect a post about price- and timeconstraints, possibly with a first definition of the 'best match' within the next few days.

Thursday, October 14, 2010

The GridEcon Project - Thoughts and Ideas

As I mentioned in a previous blogpost I will now write something more about the GridEcon Project.
On the website of the GridEcon Project, the following project description can be found

GridEcon 1.0 is a computational resource auctioning system built upon an innovative bid matching algorithm tailored specifically for the trading of computing power. It brings together both consumers and providers of available computational resources and not only has the capability to deliver those resources just like Amazon, IBM and Microsoft, but also allows you to trade your own excess computational power... all in a single, cost-effective marketplace.

In other words, the GridEcon project is in fact an attempt to implement the system that I am currently researching.

Is my work done then? Not quite. Although GridEcon's market mechanism contains some good ideas, it also makes an assumption that prevents it from being useful in a real world situation.
That is, it assumes that the 'unit of trade' is a single VM, and that providers give consumers the possibility to let such a VM run on their machines. Although GridEcon acknowledges the fact the providers offer different types of VMs, with different characteristics, it does not really provide an elaborate way for providers and consumers to specify which kind of VM is offered or requested (both parties should choose from a predefined set of VMs). Although virtualization can be considered as a core technology behind most IaaS implementations, and standardization of VM formats is considered by many as critical for the real commoditization of cloud resources, I believe that only offering a predefined specific set of VMs is too constraining. A more elaborate resource specification model (such as the one proposed by Dube) is required. This allows for more differentiation between different cloud providers and more choice for the consumer. Note that this also allows consumers to only specify their actual requirements, without needing to worry about all the other specifics of the VM. For example, a university researcher only needs to now that the machine he'll be using runs Fedora Linux 11 and has a dual core 2.2 Ghz processor with 2Gb of RAM. The amount of disk space, network connectivity, etc might not be important for the application that is used by the researcher (of course, these components should also always have a certain minimal value, e.g. 2Gb disk space, 10Mbit WAN connectivity). Using a more elaborate requirement/offering syntax (again, such as the one proposed by Dube) would allow matching of such partial requirement specifications.

Besides this remark, I do believe that GridEcon provides some other valuable ideas that, combined with some of Dube's concepts, can lead to a good, more realistic market model.
For example, the GridEcon project gives more attention to the issue of time constraints that are often part of offers and bids in a grid/cloud resource auction. Dube also very shortly mentions time constraints, but doesn't take it into account when matching bids with offers (this is partly because Dube's focus is not on the matching algorithm itself, but more on the market behavior as a result of the actions of the consumers and providers). When making a real-world implementation of a cloud spot market, taking those time constraints into account is a necessity. Because the market clearing algorithm of the GridEcon project does take these constraints into account, I believe that it may be valuable when I will be looking more close at the matching algorithm.

GridEcon also gives a good overview of which parts of the market can be considered as axillary services, and can thus be provided by brokers instead of the market itself. Although GridEcon's market architecture can be considered as complex, I like the concept of keeping the market itself fairly simple and delegating more complex tasks to brokers (possibly controlled or monitored by the market or by a separate Grid Authority). Such a simple market would only consist of mechanisms to accept bids and offers, match those offers and bids and contact other brokers to do accounting, administration and connect consumers with providers.

GridEcon demo and implementation
Since the GridEcon project already implements many of the ideas I'm researching, I thought it was a good idea to have a better look at the online demo (username:user1, password:user1) and at the source code.

Although the online demo seems to work well (I was able to place bids and offers and let them match), I was a bit disappointed at the offered functionality. For example, although the GridEcon research papers mention different types of VMs that can be chosen by the provider/consumer, the demo only seems to allow a single type of VM, but offers the user different types of applications to offer/use. This is not really the functionality I had in mind (In my opinion the market should not know which applications are run on the offered VMS. Even more, the market really should not be allowed to know this.)

Submitting an offer in the GridEcon demo

Although the basic functionality seems to work, I'm was not really convinced by this demo (although that some concepts of it might be usable).

When looking at the Java source code, I confirmed that the whole project uses Webservices to connect the different brokers and components (I already read about this in the various papers). I think that using webservices is certainly a good choice since after all, with cloud computing we are working in a webservices environment.

After investigating some more, I was supprised to see that some parts of it were relatively well documented. However, my promoter Kurt Vanmechelen pointed out to me that large parts of the code were less well documented and that a lot of the classes are generated using Apache Axis and that setting up the whole project (to see if the project could be reused) would probably involve regenerating all those classes (or using older versions of Axis/Tomcat etc, which is also not very beneficial). Since this can potentially involve quite a lot of work (without being sure that I can actually use some of the code), I decided that I won't be doing that for now.

Combining GridEcon and Dube: an overview
Having read about the GridEcon project and Dube's thesis, I will now try to outline a first rough idea of how I believe a market model for IaaS resources can be implemented.

I believe that the best way to approach the problem is to work top-down and incrementally. That is, we should first start by defining and implementing the market itself, gradually specifying the exact workings of it's inner components and gradually adding other components that interact with it. As said before, this market should focus on accepting bids and offers and matching them. This functionality should be offered using a webservice API. The syntax for bids should be expressive enough to contain detailed application requirements, price (range) and time constraints. The syntax for offers should allow detailed instance specification, price and time constraints. Using XML to define this syntax, so that it human-readable and plays nice with webservices, is probably the way to go.
The matching algorithm should maximize profit for both parties (note that this a core concept of the double auction), while taking into account price and time constraints. It is probably a good idea to first consider future bids/offers (in which time constraints are less of an issue), and then handle spot bids/offers (more realtime behaviour). The exact algorithm details are to be defined later (having a more in depth-look at GridEcon's and Dube's matching algorithms is probably a good idea).
Once those components are in place, adding a notification service is probably required to keep both provider, consumer and any (future) administrative services updated.
After that, I think that providing a separate emulation service is a good idea in order to test the market and monitor it's behaviour under various provider and consumer strategies. Note that this implies that the market itself should provide an interface to easily extract information about it's current state.

Initial Market Design

Of course, once this is implemented, other components should also be provided (such as accounting and consumer-provider connector services). Although providing these components are a necessity to make the market actually usable, I'm not sure whether I'll implement those components as they require a substantial coding effort. I'm a strong believer of quality above quantity, and as such I will first focus on delivering quality market and emulation components.

A last remark: technical details
Although it is too early to decide which technologies I'll be using, it speaks for itself that I'll be using open technologies that are widely used by cloud providers and on the web. The only thing that I'm quite sure of is that because of the it's pervasiveness around the web (and quite frankly, because of my personal affinity) I'll most probably be using the Java programming language.
I'll also pay attention to the ease of deployment, since I personally feel that this is important if any component is to be reused in the future.

Expect and update within a couple of days when I've worked out some of the details of this first proposal.

Tuesday, October 12, 2010

SuperComputing Futures: the Next Sharing Paradigm for HPC Resources

The past few days, I have been reading Nicolas Dube’s PhD thesis [1] that proposes and discusses an economic model for HPC Resource trading. I found this thesis very interesting to read, and believe that I will be able to use at least some (if not many) of the ideas that are presented.

Small summary of the thesis
The first chapter of the thesis gives a general introduction into High Performance and Grid Computing. It also explains the fundamental ideas of the grid as a utility (utility computing), and how theories and ideas from economics can apply to this new way of viewing computer resources (HPC in particular). The chapter ends with the problem statement and further thesis outline.
The second chapter gives an introduction to finance and market economics; specifically the topics of commodity trading (using auctions) and its analysis. At the end of the chapter, the author links these concepts to grid resources and explains how by viewing these resources as commodities, the same economic market principles as used in other commodity markets can be applied to those grid resources.
In the third chapter, Dube proposes his actual market model (or grid exchange). Some of the key issues addressed in this chapter are how to specify resources in bids and offers when participating in a auction, the notion of Grid Credits and a price discovery mechanism for providers and clients.
In the fourth chapter, a lot of simulations are done to verify the model proposed in the third chapter. Some interesting results concerning market equilibrium and price stabilization are shown. Special interest is given to the bootstrapping of the proposed market model.
In the fifth chapter, some possible model optimizations are discussed, together with some other issues that should be considered when actually installing a grid exchange.
Finally, the sixth chapter contains an overview of the discussed material and achieved results.

Personal Ideas
Dube takes a theoretical approach to designing a grid exchange, using proven concepts from economics and finance. I believe that I can benefit from the results of Dube’s thesis by incorporating some of his ideas in a more real-world implementation.
For example, the ideas of Resources Sets to specify the needed resources of clients and resources offered by providers can easily also be used in a cloud setting.
Also, a price discovery mechanism is a necessary tool for providers and clients to have an idea of how to make intelligent offers and bids. Dube’s proposed mechanism is at least a good attempt to fill-in the theoretical logic used by such a tool.
Furthermore, I can surely benefit a lot the various other remarks and ideas spread throughout the thesis.

However, a ifference between Dube’s grid and the grid that we are considering is the fact that we are working with cloud resources that typically use virtualization in order to optimize resource usage. This primarily affects the providers as in the cloud setting they will typically offer different types of instances instead of barebone hardware (under utilization of a machine is less of an issue, since the left-over capacity of a server is used to schedule other VMs).
Note that it can be interesting to create a market that allows both types of resources to be offered (barebone hardware vs VMs), since in some HPC applications, the need for specific hardware platforms will always remain.

I have not given further thought to whether VMs can easily be integrated into Dube’s model, this might be the topic of an other blog post.

However, in general I think that it would be a good idea to combine some of the more theoretical ideas from Dube with some more practical cloud oriented ideas and algorithms from the GridEcon Project (together with some of my own ideas ofcourse ;-) )

I will post some more ideas about Dube’s Resource Sets within the next couple of days.

References
[1] SuperComputing Futures: the Next Sharing Paradigm for HPC Resources, Nicolas Dube, http://www.nic.qc.ca/Site/PhD_Thesis.html

The past 2 weeks

Since this blog is only an hour old, and I've already been researching some topics in the past 2 weeks, I believe that giving an overview of what I've done until now is a good starting point.

An important starting point for the thesis was researching the energy trading market and evaluating the differences an parallels with the IaaS market. In this respect, I read the 5 first chapters of Fundamentals of Power System Economics [1].
These chapters give a good overview of how energy trading markets function, and on which principles they are based. They also cover some basic concepts from (micro-)economics. Although was already familiar with many of the explained topics (about micro-economics), I believe that it is good that those were repeated as I'll come accros the same concepts throughout my further research. I uploaded an overview of the contents of the first 5 chapters to give you an idea of the covered topics.
I did not read chapters 6-8 because these chapters focus more on the technical aspect of trading electrical power. In order to fully understand these chapters, a broader electrical engineering background is required. Futhermore, reading these chapters provides little added value to the research I'm doing.

Besides reading in [1], I also read some papers about the GridEcon project that has been developed as part of the European Community funded "Sixth Framework Programme".

GridEcon 1.0 is a computational resource auctioning system built upon an innovative bid matching algorithm tailored specifically for the trading of computing power. It brings together both consumers and providers of available computational resources and not only has the capability to deliver those resources just like Amazon, IBM and Microsoft, but also allows you to trade your own excess computational power... all in a single, cost-effective marketplace.

More specifically, I read the papers

Market Mechanisms for Trading Grid Resources [2]
A MarketPlace and its Market Mechanisms for Trading Commoditized Computing Resources [3]
GridEcon: A Market place for Computing Resources [4]

Additionally, I had a quick look at the online demo and the available source code . More about the GridEcon project in a future blog post.

I also started reading in the PhD thesis of Nicolas Dube [5], which seems to contain some interesting concepts concerning resource specification for markets. I also be writing a seperate blogpost to discuss some of the ideas of Dube's thesis.

On Wed 06/10/10, I had a first meeting with my Promoter Kurt Vanmechelen, with who I discussed the work I already did.
We also determined a preliminary outline for my thesis.

H1 Introduction
- What is cloud computing?
- Explain analogy with power systems
- Short comparison of power systems with clouds (level of market structure + goods)
- Research questions
H2 Market Design Principles
- Why a market?
- What are the advantages of introducing it?
- What are the difficulties introducing it?
- What properties should a market ideally have ?
   - Allocative efficiency
   - Individual Rationality
   - Incentive Incompatibility (is there an incentive to game the system or are bidders inclined to bid their true costs/valuations)
   - Computational Tractability
- Of what components does a market exist?
   -> Bidding language
   -> Clearing / allocation rule
   -> Pricing rule
- Scoping (what features will (not) be modelled by the market's bidding language)
H3 Cloud Exchange Market Design
- Description of market components
- Architectural diagram of market design
   -> Execute transactions
   -> Software component design + deployment (Web service technology)
H4 Evaluation / Simulation
- Assess a number of market properties
   -> Does the market behave as expected (i.e. price reaction to demand/supply changes)
- Evaluate scalability / computational tractability
H5 Future Work
H6 Conclusions

Normally, I'll finish reading Dube's thesis [5] today, and will also add a blogpost about that or about GridEcon.

References

[1] Fundamentals of Power System Economics, Daniel S. Kirschen, Goran Strbac, Wiley 2004
[2] Market Mechanisms for Trading Grid Resources, Costas Courcoubetis, Manos Draminitos, Thierry Rayna, Sergios Soursos and George D. Stamoulis
[3] A MarketPlace and its Market Mechanisms for Trading Commoditized Computing Resources, Jorn Altman, Costas Courcoubetis, Marcel Risch
[4] GridEcon: A Market place for Computing Resources, Jorn Altman, Costas Courcoubetis, George D. Stamoulis, Manos Dramitinos, Thierry Rayna, Marcel Risch, Chris Bannink
[5] SuperComputing Futures: the Next Sharing Paradigm for HPC Resources (Economic Model, Market Analysis and Consequences for the Grid), Nicolas Dube

Detailed description of the thesis

The following is the detailed description of the goals of my thesis (as it can also be found on http://www.esp.win.ua.ac.be/projects/show/541):

IaaS clouds deliver infrastructure level-resources (processing power, memory, data storage, networking) as a commodity to clients through an on-demand and pay-per-use model [1]. At present, the model for trading cloud resources at the major providers (Amazon, Microsoft, Rightscale), involves the provider defining its own trading mechanism. Amazon for example [2], currently one of the most successful IaaS providers, sells its cloud resources through a spot market with prices that vary based on supply and demand during the week, through a reservation-based market with fixed prices, and through a fixed pricing scheme for non-reserved resources.

The commoditization of IT resources brought forth by the IaaS paradigm eases the deployment and management of applications on third party remote resources significantly. It also increases the possibilities for trading IaaS resources on a scale that moves beyond the individual provider level. Such inter-provider trading is used in other commodities markets such as electricity markets, to efficiently balance supply and demand on relatively short timescales. In addition, an open market for IaaS resources can efficiently open up the cloud IaaS provider space to less dedicated resources such as pools of desktop machines which represent a huge potential pool of processing power.

At present however, the possibilities for creating such open markets for IaaS resources have been largely unexplored. The goal of this thesis is to investigate: 1) Whether it is theoretically feasible to design an open market for cloud resources 2) The design options for open IaaS markets. In this respect, you will also look into the design of markets for energy trading and evaluate the differences and parallels with IaaS markets. 3) The behavior of the proposed market design in terms of price formation and stability through simulation

[1] M. Armbrust et. al, Above the Clouds: A Berkeley View of Cloud Computing, Armbrust et al., EECS Department, University of California, Berkeley, 2009, http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-28.html

[2] Amazon, LLC, Amazon EC2 Pricing http://aws.amazon.com/ec2/pricing/

[3] N. Dube, Supercomputing futures the Next Sharing Paradigm for HPC Resources, http://www.nic.qc.ca/files/nicdubePhDthesis.pdf

Welcome to my thesis blog

Welcome to my thesis blog.
In the next academic year I will continue studying cloud computing, and I'll be researching whether it is possible to come up with (possibly implement) a market for IaaS Cloud Computing resources.

I will try to update this blog frequently so that my progression can be followed.