Inspecting gradients in Chainer

Chainer is my choice of framework when it comes to implementing Neural Networks. It makes working with and trouble shooting deep learning easy.

Printing out the gradients during back propagation to inspect their values is sometimes useful in deep learning, to see if your gradients are as expected and aren’t either exploding (numbers too large) or vanishing (numbers too small). Fortunately, this is easy to do in Chainer.

Chainer provides access to the parameters in your model, and for each parameter, you can check the gradient during the back propagation step, stored in the optimizer (such as SGD or Adam). To access these, you can extend to additionally output the gradients, by defining your own StandardUpdater like so:

class CustomStandardUpdater(
    def __init__(self, train_iter, optimizer, device):
        super(CustomStandardUpdater, self).__init__(
            train_iter, optimizer, device=device)

    def update_core(self):
        super(CustomStandardUpdater, self).update_core()
        optimizer = self.get_optimizer('main')
        for name, param in
            print(name, param.grad)

In lines 9-10 you can see the parameters (weights) of your neural network being accessed through the optimizer, and for each parameter, the name and gradient is being output. This StandardUpdater can be attached to your training module as follows:

model = MyChainerModel()
optimizer = chainer.optimizers.Adam()
train_iter = chainer.iterators.SerialIterator(train_dataset, batch_size=32, shuffle=True)
updater = CustomStandardUpdater(train_iter, optimizer, gpu)
trainer = training.Trainer(updater, stop_trigger=(100, 'epoch'))

Calculating running estimate of mean and standard deviation in Python

Say you have a stream of means and standard deviations for a random variable x that you want to combine. So essentially you’re combining two groups of means and standard deviations, G_{x,1} = \mu_{x,1}, \sigma_{x,1} and
G_{x,2} = \mu_{x,2}, \sigma_{x,2} .

If you have access to the random variable x‘s value coming in as a stream, you can collect the values for some n_{x,1} number of values and calculate the mean and standard deviation to form a group G_{x,1}, and then combine it with the mean and standard deviation of the next group G_{x,2} consisting of the next n_{x,2} values of x to form: G_{x,1:2} = \mu_{x,1:2}, \sigma_{x,1:2}

The formulas for the combined means and standard deviations are:

\mu_{x,1:2}=\frac{(n_{x,1}\mu_{x,1}) + (n_{x,2}\mu_{x,2})}{ n_{x,1:2} }
\sigma_{x,1:2}=\sqrt{\frac{(n_{x,1}-1)\sigma_{x,1}^{2} + (n_{x,2}-1)\sigma_{x,2}^{2} + n_{x,1}(\mu_{x,1} - \mu_{x,1:2})^{2} + n_{x,2}(\mu_{x,2} - \mu_{x,1:2})^{2} }{n_{x,1:2}-1}}

Note that this is the Bessel corrected standard deviation calculation according to, which I found leads to a better estimate.

In Python code, this is what it looks like:

import numpy as np

def combine_mean_std(mean_x_1, std_x_1, n_x_1, mean_x_2, std_x_2, n_x_2):
    n_x_1_2 = n_x_1 + n_x_2
    mean_x_1_2 = (mean_x_1 * n_x_1 + mean_x_2 * n_x_2) / n_x_1_2
    std_x_1_2 = np.sqrt(((n_x_1 - 1) * (std_x_1 ** 2) + (n_x_2 - 1) * (
                            std_x_2 ** 2) + n_x_1 * ((mean_x_1_2 - mean_x_1) ** 2)
                            + n_x_2 * ((mean_x_1_2 - mean_x_2) ** 2))
                        / (n_x_1_2 - 1))
    return mean_x_1_2, std_x_1_2, n_x_1_2

total_mean_x = None
total_std_x = None
total_n_x = 0

all_x = None # For getting the actual mean and std for comparison with the running estimate

for i in range(10):
    x = np.random.randint(0, 100, np.random.randint(1, 100))
    if all_x is None:
        all_x = x
        all_x = np.concatenate((all_x,x),axis=0)
    mean_x = x.mean()
    std_x = x.std()
    n_x = x.shape[0]
    if total_mean_x is None and total_std_x is None:
        total_mean_x = mean_x
        total_std_x = std_x
        total_n_x = n_x
        total_mean_x, total_std_x, total_n_x = combine_mean_std(total_mean_x, total_std_x, total_n_x,
                                                                mean_x, std_x, n_x)

print(total_mean_x, total_std_x, total_n_x)
print(all_x.mean(), all_x.std(), all_x.shape[0])

If you run the code above and inspect the values printed at the end, you’ll note that the running estimate in total_mean_x and total_std_x are almost exactly the same as the actual mean and std output by literally collecting all x values and calculating the two values (but which may not be possible or feasible in your task).

Changing floating point size in Chainer

The default floating point size in Chainer is 32 bit. That means for deep learning, Chainer will expect numpy.float32 for CPU or cupy.float32 for GPU under the hood, and will exit with error if the data is set at a different size.

However, there may be times you want more than 32 bits, such as when you’re getting NaN’s or inf’s in your training routine and want to troubleshoot.

Changing Chainer to use float64 is simple:

import chainer
import numpy as np
chainer.global_config.dtype = np.float64

Call this at the beginning of your program. And of course, you’ll want to make sure that the ndarray dtype’s for your data are set to float64 (as in np.array(…).astype(np.float64)) before being passed to Chainer.

Probabilistic Model-Based Reinforcement Learning Using The Differentiable Neural Computer

My experiments found that a model learned in a Differentiable Neural Computer outperformed a vanilla LSTM based model, on two gaming environments.

➡ Thesis PDF:


For my MSc Artificial Intelligence at the University of Edinburgh, my dissertation included 4 months of research. I investigated the use of the Differentiable Neural Computer (DNC) for model-based Reinforcement Learning / Evolution Strategies. A predictive, probabilistic model of the environment was learned in a DNC, and used to train a controller in video gaming environments to maximize rewards (score).

The difference between Reinforcement Learning (RL) and Evolution Strategies (ES) are detailed here. However, in this post and my dissertation, the two are used interchangeably as either is used to accomplish the same goal — given an environment (MDP or partially observable MDP), learn to maximize the cumulative rewards in the environment. The focus is rather on learning a model of the environment, which can be queried while training an ES or RL agent.

The authors of the DNC conducted some simple RL experiments using the DNC, given coded states. However, to the best of my knowledge, this is the first time the DNC was used in learning a model of the environment entirely from pixels, in order to train a complex RL or ES agent. The experiments I conducted showed the DNC outperforming Long Short Term Memory (LSTM) used similarly to learn a model of the environment.

Learning a Model

The model architecture is borrowed from the World Models framework (see my World Models implementation onGitHub). Given a state in an environment at timestep t, s_t, and an action a_t performed at that state, the task of the model is to predict the next state s_{t+1}. Thus the input to the model is [s_t + a_t], which produces output (prediction) s_{t+1}. Note that the states in this case consist of frames from the game at each timestep consisting of pixels. These states are compressed down to latent variables z using a Convolutional Variational Autoencoder (CVAE), therefore more specifically the model maps [z_t + a_t] => z_{t+1}.

The model consists of a Recurrent Neural Network, such as LSTM, that outputs the parameters of a Mixture Density Model — in this cased, a mixture of Gaussians. This type of architecture is known as a Mixture Density Network (MDN), where a neural network is used to output the parameters of a Mixture Density Model. My blog post on MDNs goes into more details. When coupled with a Recurrent Neural Network (RNN), the architecture is known as MDN-RNN.

Thus, in learning a model of an environment, the output of the MDN-RNN “model” is not simply z_{t+1}, but the parameters of a Gaussian Mixture Model (\alpha, \mu, \sigma) which are then used to sample the prediction of the next state z_{t+1}. This allows the model to be more powerful by becoming probabilistic, and encode stochastic environments where the next state after a given state and action can be one of multiple.

For the experiments I conducted, the architecture of the model used a DNC where the RNN is used in World Models, thus, the model is composed of a MDN-DNC. Simply, the recurrent layers used in the MDN are replaced with a DNC (which itself contains recurrent layers that are additionally coupled with external memory).

The hypothesis was that using the DNC instead of vanilla RNNs such as LSTM, will allow for a more robust and algorithmic model of the environment to be learned, thus allowing the agent to perform better. This would particularly be true in complex environments with long term dependencies (meaning, a state perhaps hundreds or thousands of timesteps ago needs to be kept in context for another state down the line).

Experiments And Results

Experimentation schematic, based on the World Models framework.

The model of the environment learned is then used to train an RL agent. More specifically, features from the model are used, and in the case of the World Models framework, this consists of the hidden and cell states h_t, c_t of the LSTM layers of the model at every timestep. These “features” of the model, coupled with the compressed latent representation of the environment state, z, at a given timestep t is used as input to a controller. Thus, the controller takes as input [z_t + h_t + c_t] to output action a_t to be taken to achieve a high reward. CMA-ES (more details in my blog post on CMA-ES in Python) was used to train the controller in my experiments.

The games the MDN-DNC was tested on were ViZDoom: Take Cover and Pommerman. For either game, a series of experiments were conducted to compare the results with a model of the environment learned in a MDN-DNC versus a MDN-LSTM.

ViZDoom: Take Cover

In the case of ViZDoom: Take Cover, a predictive model was trained in the environment using MDN-LSTM and MDN-DNC. Each was trained for 1 epoch, on random rollouts of 10,000 games which recorded the frames from the game (pixels) and actions taken at each timestep. The model was then used to train a CMA-ES controller. Note that the controllers were trained in the “dream” environment simulated by the model, as done in World Models.

A simulation of the environment — “dream” — where the controller is used to train using the learned model only, rather than the actual environment.

The controllers were tested in the environment throughout the generations for a 100 rollouts at each test. The results are plotted below. The MDN-DNC based controller clearly outperformed the MDN-LSTM based controller, and solved the game (achieving a mean score of 750 over 100 rollouts).

Comparison of a DNC based model versus a LSTM based model, used for training a controller in ViZDoom: Take Cover. The DNC based controller outperforms the LSTM controller.


In the case of Pommerman, only the model’s predictions were used to test the capacity of the predictive model learned in a MDN-DNC and a MDN-LSTM. A controller was not trained. This was possible given that the states in Pommerman are coded as integers, rather than pixels. Thus, given [s_t + a_t], the predicted state [s_{t+1}] could be compared with the ground truth state from the actual game for equality, and to measure how many components of the state (position, ammo available, etc) were correctly predicted.

Here again, the MDN-DNC model outperformed the MDN-LSTM model, where both were trained exactly the same way for the same number of epochs. The MDN-DNC was more accurately able to predict the individual components of the next state given a current state and an action.

The predictive power of a DNC based model versus a LSTM based model in the Pommerman environment. The DNC based model was able to predict future states more accurately.


Model-based Reinforcement Learning or Evolution Strategies involve using a model of the environment when training a Reinforcement Learning or Evolution Strategies agent. In my case, the World Models approach to learn a predictive, probabilistic model of the environment in an Mixture Density Network was used. The Mixture Density Network consisted of a Differentiable Neural Computer, which output the parameters of a Gaussian mixture model that were used to sample the next state in a game. My experiments found that a model learned in a Differentiable Neural Computer outperformed a vanilla LSTM based model, on two gaming environments.

Future work should include games with long term memory dependencies, whereas with the experiments performed for this work it is hard to justify there being such dependencies in the ViZDoom: Take Cover and Pommerman environments. Other such environments would perhaps magnify the capabilities of the Differentiable Neural Computer. Also, what exactly is going on in the memory of the Differentiable Neural Computer at each timestep? It would be useful to know what it has learned, and perhaps features from the external memory of the Differentiable Neural Computer itself could be used when training a controller. For example, the Differentiable Neural Computer emits read heads, r_t, at each timestep, which are selected from the full memory, and used to produce the output (a prediction of the next state). Perhaps the contents of the read heads, or other portions of the external memory, could provide useful information of the environment if exposed directly to the controller along with the hidden state and cell state of the underlying LSTM.

Full details on this work can be found in my MSc thesis at:


[1] Alex Graves et al., Hybrid computing using a neural network with dynamic external memory, 2016.
[2] David Ha, Jürgen Schmidhuber, World Models, 2018.
[3] Nikolaus Hansen, The CMA Evolution Strategy: A Tutorial, 2016.
[4] Christopher M. Bishop, Mixture Density Networks, 1994.

World Models Implementation In Chainer

World Models is a framework described by
David Ha and Jürgen Schmidhuber: The framework aims to train an AI agent that can perform well in virtual gaming environments. World Models consists of three main components: Vision (V), Model (M), and Controller (C).

World Models Schematic

As part of my MSc Artificial Intelligence dissertation at the University of Edinburgh, I implemented World Models from the ground up in Chainer. My implementation was picked up by Chainer and tweeted:

David Ha himself also tweeted it:

The full implementation and more details can be found at:

Covariance Matrix Adaptation Evolution Strategy (CMA-ES) in Python

According to OpenAI, Evolution Strategies are a scalable alternative to Reinforcement Learning. Where Reinforcement Learning is a guess and check on the actions, Evolution Strategies are a guess and check on the model parameters themselves. A “population” of “mutations” to seed parameters is created, and all mutated parameters are checked for fitness, and the seed adjusted towards the mean of the fittest mutations. CMA-ES is a particular evolution strategy where the covariance matrix is adapted, to cast a wider net for the mutations, in an attempt to search for the solution.

To demonstrate, here is a toy problem. Consider a shifted Schaffer function with a global minimum (solution) at f(x=10,y=10):

f(x,y)=0.5+\frac{sin^{2}[(x-10)^{2}-(y-10)^{2}]-0.5}{[1+0.001[(x-10)^{2}+(y-10)^{2}]]^{2}},\ \ \ \ \ f(10,10)=0

The fitness function F() for CMA-ES can be treated as the negative square error between the solution being tested, and the actual solution, against the Schaffer function:

F(s_1,s_2) = -(f(s_1,s_2) - f(10,10))^2

Therefore, the task for CMA-ES is to find the solution (s_1=10, s_2=10). Given the right population size \lambda and the right \mu for CMA-ES, it eventually converges to a solution. With \lambda=64 and \mu=0.25, a visualization of CMA-ES as it evolves a population over generations can be seen below.

The animation below depicts how CMA-ES creates populations of parameters that are tested against the fitness function. The blue dot represents the solution. The red dots the entire population being tested. And the green dot the mean of the population as it evolves, which eventually fits the solution. You see the “net” the algorithm casts (the covariance matrix) from which the population is sampled, is adapted as it is further or closer to the solution based on the fitness score.

CMA-ES in action, quickly finding a solution to a shifted Schaffer function.

A simple (yet powerful) implementation for CMA-ES in Python, and this toy problem, is available at I translated the (mu/mu_w, lambda)-CMA-ES algorithm to Python.

Mixture Density Networks in Chainer

There are excellent blog posts and guides available on Mixture Density Networks, so I will not try to replicate the effort. This post provides a quick summary, and implementation code in the Chainer deep learning framework.

In summary, Mixture Density Networks are Neural Networks that output the parameters of a Mixture Model, such as a Gaussian Mixture Model, instead of the desired output itself. The Mixture Model is then sampled from to get the final output. This is particularly useful when given a certain input, there could be multiple outputs based on some probability.

The outputs from the neural network, in the case of Gaussian Mixture Models, include a set of probabilities \alpha (coefficients), set of means \mu, and set of standard deviations \sigma. For example, if output is y given an x, and you choose to have 3 Gaussian mixtures, the output of your neural network would be: \alpha_1, \alpha_2, \alpha_3, \mu_1, \mu_2, \mu_2, \sigma_1, \sigma_2, \sigma_3. The \alpha sum to 1 and represent the probability of each mixture being the likely candidate, and the \mu and \sigma represent the distribution of y within the given mixture and can be used to sample y.

Take as a toy example consider the dataset of (x, y) coordinates represented in the graph below:

Toy dataset of (x, y) coordinates graphed.

The blue dots represent the desired y value given an x value. So at x=0.25, y could be {0, 0.5, 1} (roughly). In this case, training a neural network using Mean Squared Error to output y directly will cause the network to learn the average of the y values given x as inputs, as shown below (red dots represent y output from the neural network):

A neural network trained with Mean Squared Error produces an averaged output.

For this type of problem, a Mixture Density Network is perfectly suited. If trained properly, and sampled from enough times given all x values in the dataset, the Mixture Density Network produces the following y outputs. It better learns the distribution of the data:

A Mixture Density Network can learn the distribution of the data.

The output of the neural network is simply the number of dimensions in your output (1 in this example), times the number of desired mixtures, times 3 (the coefficient, mean, and standard distribution). In Chainer, a Linear layer can be used to output these numbers.

The loss function is essentially the negative log of the Gaussian equation multiplied by the softmax’d coefficients: -\ln\{\alpha\frac{1}{\sqrt{2\pi}\sigma}\exp{-\frac{(y-\mu)^2}{2\sigma^2}}\}, where \sum\alpha=1 [1]. This can be represented in Chainer easily as follows:

alpha = F.softmax(alpha)
density = F.sum(
    alpha *
    (1 / (np.sqrt(2 * np.pi) * F.sqrt(var))) *
    F.exp(-0.5 * F.square(y - mu) / var)
    , axis=1)
nll = -F.sum(F.log(density))

Full implementation of a Mixture Density Network in Chainer, for the toy problem shown above, can be found at:

[1] Christopher M. Bishop, Mixture Density Networks (1994)

Conditional GANs For Painting and Celebrity Generation


As a 4-month research project for a course at the University of Edinburgh, my group decided to use Generative Adversarial Nets to generate new paintings using the Painter by Numbers dataset. Additionally, we conditioned the paintings, allowing the generation of paintings with certain attributes (such as gender in the case of paintings of portraits). The project turned out to be a big success, and won 2nd place out of 124 other projects in a competition hosted by IBM!

The deep convolutional GAN architecture we chose for the basis of our model enforced a Lipschitz continuity constraint on the spectral norm of the weights for each layer in the discriminator, as proposed by Miyato et al. (2018). This technique, called SN-GAN by the authors, stabilizes the discriminator, allowing the generator to learn the data distribution more optimally.

We built upon the SN-GAN architecture to add conditioning, which involved introducing a one-hot encoded y vector of supervised labels to the generator and discriminator. The y vector provides some information about the image, such as gender in the case of portrait paintings, allowing the generator to get conditioned on the label to generate the class of image specified during test time. Our contribution was a novel model which we called SN-conditional-GAN or SNcGAN, which is illustrated below.

The SNcGAN architecture. The one-hot encoded y label is introduced to the generator and discriminator.

More details can be found in our paper linked above.

After the publication of the paper, we additionally trained a SNcGAN model on the celebA dataset, allowing the generation of conditioned celebrity images.

Calculate CryptoCurrency cross-exchange arbitrage in Java

CryptoCurrency cross-exchange arbitrage means buying or selling CryptoCurrency at separate exchanges to attempt to profit off the price differences for that currency at the exchanges.

Below I’ve pasted the Java code for a small project I undertook, which I later quickly abandoned. But I thought it might be useful to someone else, to give an idea how it could be done. The code is a hodge-podge and not very well put together. Take it for what it is, a quick experiment and proof of concept.

In summary, it looks at two CryptoCurrencies: Bitcoin and Ethereum. Across 3 exchanges. (Can easily be extended to do more currencies and exchanges). It gets the live books using the exchanges’ RESTful API (most provide these), and bases calculations off real asks and bids currently on the order books of each exchange. It then takes all combinations of buying and selling a crypto at one exchange vs another, and outputs what profits (or losses) could be had at the current moment. And that’s it.

It’s a nice tool for exploring what arbitrage opportunities may exist out there, and that’s all. It does not do any trading. Proceed with caution!

package cryparb;

import java.nio.file.Files;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.Date;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.TimeZone;

import org.json.JSONArray;
import org.json.JSONObject;

public class Calculate 
	public static final float INITIAL_TRADE_AMOUNT_ETH = 50f;
	public static final float INITIAL_TRADE_AMOUNT_BTC = 10f;
	public static final int WEBSERVICE_GET_RETRY_LIMIT = 3;
	public static final int WEBSERVICE_GET_RETRY_AFTER_SECS = 15;
	public static final int RETAIN_BOOKS_CACHE_SECS = 30;
	public static final String QUADRIGACX = "quadrigacx";
	public static final String GEMINI = "gemini";
	public static final String GATECOIN = "gatecoin";
	public static final LinkedHashMap<String,Exchange> exchanges = new LinkedHashMap<String,Exchange>()
				new Exchange(
				new Exchange(
				new Exchange(
	public static final HashMap<String,JSONObject> booksCache = new HashMap<String,JSONObject>();	

	public static final String QUADRIGACX_BOOKS_URL = "";
	public static final float QUADRIGACX_TRADE_FEE_PERCENT = 0.2f;
	public static final float QUADRIGACX_FIXED_WITHDRAWAL_FEE_BTC = 0f;
	public static final float QUADRIGACX_FIXED_WITHDRAWAL_FEE_ETH = 0f;
	public static final String QUADRIGACX_BIDS_KEY = "buy";
	public static final String QUADRIGACX_ASKS_KEY = "sell";
	public static final String QUADRIGACX_AMOUNT_KEY = "amount";
	public static final String QUADRIGACX_RATE_KEY = "rate";

	public static final String GEMINI_BOOKS_URL = "";
	public static final float GEMINI_TRADE_FEE_PERCENT = 0.25f;
	public static final float GEMINI_FIXED_WITHDRAWAL_FEE_BTC = 0f;
	public static final float GEMINI_FIXED_WITHDRAWAL_FEE_ETH = 0f;
	public static final String GEMINI_BIDS_KEY = "bids";
	public static final String GEMINI_ASKS_KEY = "asks";
	public static final String GEMINI_AMOUNT_KEY = "amount";
	public static final String GEMINI_RATE_KEY = "price";

	public static final String GATECOIN_BOOKS_URL = "";
	public static final float GATECOIN_TRADE_FEE_PERCENT = 0.35f;
	public static final float GATECOIN_FIXED_WITHDRAWAL_FEE_BTC = 0f;
	public static final float GATECOIN_FIXED_WITHDRAWAL_FEE_ETH = 0f;
	public static final String GATECOIN_BIDS_KEY = "bids";
	public static final String GATECOIN_ASKS_KEY = "asks";
	public static final String GATECOIN_AMOUNT_KEY = "volume";
	public static final String GATECOIN_RATE_KEY = "price";

	public static final String USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36";
    public static class Exchange
    	public String name;
    	public String booksUrl;
    	public float tradeFeePercent;
    	public float fixedWithdrawalFeeBtc;
    	public float fixedWithdrawalFeeEth;
    	public String bidsKey;
    	public String asksKey;
    	public String amountKey;
    	public String rateKey;
    	public Exchange(
	    	    	String name,
	    	    	String booksUrl,
	    	    	float tradeFeePercent,
	    	    	float fixedWithdrawalFeeBtc,
	    	    	float fixedWithdrawalFeeEth,
	    	    	String bidsKey,
	    	    	String asksKey,
	    	    	String amountKey,
	    	    	String rateKey
    	{ = name;
        	this.booksUrl = booksUrl;
        	this.tradeFeePercent = tradeFeePercent;
        	this.fixedWithdrawalFeeBtc = fixedWithdrawalFeeBtc;
        	this.fixedWithdrawalFeeEth = fixedWithdrawalFeeEth;
        	this.bidsKey = bidsKey;
        	this.asksKey = asksKey;
        	this.amountKey = amountKey;
        	this.rateKey = rateKey;
    public static class FullLoop 
    	public String originExchange;
    	public String terminationExchange;
    	public String originCryptoCurrency;
    	public float originAmount;
    	public float profit;
    	public Date timestamp;
    	public FullLoop(String originExchange, String terminationExchange, String originCryptoCurrency, float originAmount, float profit, Date timestamp)
    		this.originExchange = originExchange;
        	this.terminationExchange = terminationExchange;
        	this.originCryptoCurrency = originCryptoCurrency;
        	this.originAmount = originAmount;
    		this.profit = profit;
    		this.timestamp = timestamp;

    public static String getUrlAsString(String urlString) throws Exception 
        StringBuilder response = new StringBuilder();
        int retry = 0;
                URL url = new URL(urlString);
                HttpURLConnection httpUrlConnection = (HttpURLConnection) url.openConnection();
        		httpUrlConnection.setRequestProperty("User-Agent", USER_AGENT);
                BufferedReader in = new BufferedReader(
                                        new InputStreamReader(
                String inputLine;
                while ((inputLine = in.readLine()) != null) 
        	catch(Exception e)
        		System.out.println("Unable to fetch "+urlString+". Will sleep and try again.");
        		try { Thread.sleep(WEBSERVICE_GET_RETRY_AFTER_SECS*1000); } catch(Exception e1) { }

        return response.toString();

	public static float convertEthToBtcAtExchange(String exchangeName, float amountEthToConvertOriginal) throws Exception
		float btcGotten = 0;
		Exchange exchange = exchanges.get(exchangeName);
		float amountEthToConvert = amountEthToConvertOriginal - (exchange.tradeFeePercent/100)*amountEthToConvertOriginal;
		System.out.println("Converting eth->btc on "+exchangeName+": "+amountEthToConvertOriginal+"eth, after fee ("+exchange.tradeFeePercent+"%) amount: "+amountEthToConvert+"eth");

		JSONObject exchangeBooks = booksCache.get(exchangeName);
		if(exchangeBooks==null || (exchangeBooks.has("timestamp") && new Date().getTime()-exchangeBooks.getLong("timestamp")>(RETAIN_BOOKS_CACHE_SECS*1000)))
			System.out.println("Refreshing books");
			exchangeBooks = new JSONObject(getUrlAsString(exchange.booksUrl));
			exchangeBooks.put("timestamp", new Date().getTime());
			booksCache.put(exchangeName, exchangeBooks);
			System.out.println("Using cached books");
		JSONArray exchangeBids = exchangeBooks.getJSONArray(exchange.bidsKey);
		float ethRemaining = amountEthToConvert;
		for(int i=0; i<exchangeBids.length(); i++)
			JSONObject thisBid = exchangeBids.getJSONObject(i);
			float amount = Float.parseFloat(thisBid.get(exchange.amountKey).toString());
			float rate = Float.parseFloat(thisBid.get(exchange.rateKey).toString());
			float amountEthSpentOnThisOrder = ethRemaining<amount?ethRemaining:amount;
			float amountBtcGottenFromThisOrder = amountEthSpentOnThisOrder*rate;
			btcGotten += amountBtcGottenFromThisOrder;
			ethRemaining -= amountEthSpentOnThisOrder;
			System.out.println("Amount: "+amount+". Rate: "+rate+". Eth remaining before this order: "+(ethRemaining+amountEthSpentOnThisOrder)+". Selling "+amountEthSpentOnThisOrder+"eth, amounting to "+amountBtcGottenFromThisOrder+"btc. Total btc gotten so far: "+btcGotten+". Eth remaining after order: "+ethRemaining);
		System.out.println("Total eth spent: "+amountEthToConvertOriginal+". Total btc gotten: "+btcGotten+".");

		//TODO apply btc withdrawl fee if applicable
		return btcGotten;
	public static float convertBtcToEthAtExchange(String exchangeName, float amountBtcToConvertOriginal) throws Exception
		float ethGotten = 0;
		Exchange exchange = exchanges.get(exchangeName);
		float amountBtcToConvert = amountBtcToConvertOriginal - (exchange.tradeFeePercent/100)*amountBtcToConvertOriginal;

		System.out.println("Converting btc->eth on "+exchangeName+": "+amountBtcToConvertOriginal+"btc, after fee ("+exchange.tradeFeePercent+"%) amount: "+amountBtcToConvert+"btc");
		JSONObject exchangeBooks = booksCache.get(exchangeName);
		if(exchangeBooks==null || (exchangeBooks.has("timestamp") && new Date().getTime()-exchangeBooks.getLong("timestamp")>15000))
			System.out.println("Refreshing books");
			exchangeBooks = new JSONObject(getUrlAsString(exchange.booksUrl));
			exchangeBooks.put("timestamp", new Date().getTime());
			booksCache.put(exchangeName, exchangeBooks);
			System.out.println("Using cached books");

		JSONArray exchangeAsks = exchangeBooks.getJSONArray(exchange.asksKey);
		float btcRemaining = amountBtcToConvert;
		for(int i=0; i<exchangeAsks.length(); i++)
			JSONObject thisAsk = exchangeAsks.getJSONObject(i);
			float amount = Float.parseFloat(thisAsk.get(exchange.amountKey).toString());
			float rate = Float.parseFloat(thisAsk.get(exchange.rateKey).toString());
			float amountBtcAvailableOnThisOrder = amount*rate;
			float amountBtcSpentToThisOrder = btcRemaining<amountBtcAvailableOnThisOrder?btcRemaining:amountBtcAvailableOnThisOrder;
			float amountEthGottenFromThisOrder = amountBtcSpentToThisOrder/rate;
			ethGotten += amountEthGottenFromThisOrder;
			btcRemaining -= amountBtcSpentToThisOrder;
			System.out.println("Amount: "+amount+". Rate: "+rate+". Btc available on this order: "+amountBtcAvailableOnThisOrder+". Btc remaining before this order: "+(btcRemaining+amountBtcSpentToThisOrder)+". Selling "+amountBtcSpentToThisOrder+"btc, amounting to "+amountEthGottenFromThisOrder+"eth. Total eth gotten so far: "+ethGotten+". Btc remaining after order: "+btcRemaining);
		System.out.println("Total btc spent: "+amountBtcToConvertOriginal+". Total eth gotten: "+ethGotten+".");

		//TODO apply eth withdrawl fee if applicable

		return ethGotten;
	public static String printProfits(ArrayList<FullLoop> fullLoopList)
		StringBuilder profitsStr = new StringBuilder();
		for(int i=0; i<fullLoopList.size(); i++)
			FullLoop fullLoop = fullLoopList.get(i);
			profitsStr.append(fullLoop.timestamp+" - Profit if started with "+fullLoop.originAmount+fullLoop.originCryptoCurrency+" at origin exchange '"+fullLoop.originExchange+"' and ended at termination exchange '"+fullLoop.terminationExchange+"': "+fullLoop.profit+fullLoop.originCryptoCurrency+"\n");
		return profitsStr.toString();

	public static void main(String args[]) throws Exception
		File logFile = new File("log.txt");
		ArrayList<String> exchangeNames = new ArrayList<String>(exchanges.keySet());
		LinkedHashMap<Date,ArrayList<FullLoop>> fullLoops = new LinkedHashMap<Date,ArrayList<FullLoop>>();
			Date currentTimestamp = new Date();
			ArrayList<FullLoop> fullLoopList = new ArrayList<FullLoop>();
			fullLoops.put(currentTimestamp, fullLoopList);
			for(int i=0; i<exchangeNames.size(); i++)
				for(int j=0; j<exchangeNames.size(); j++)
					String originExchangeName = exchangeNames.get(i);
					String terminationExchangeName = exchangeNames.get(j);
					System.out.println("Origin '"+originExchangeName+"'. Termination '"+terminationExchangeName+"'");
					float btcGottenOriginExchange = convertEthToBtcAtExchange(originExchangeName, INITIAL_TRADE_AMOUNT_ETH);
					float ethGottenOriginExchange = convertBtcToEthAtExchange(originExchangeName, INITIAL_TRADE_AMOUNT_BTC);
					float ethRevertedTerminationExchange = convertBtcToEthAtExchange(terminationExchangeName, btcGottenOriginExchange);
					float btcRevertedTerminationExchange = convertEthToBtcAtExchange(terminationExchangeName, ethGottenOriginExchange);
					float profitEth = ethRevertedTerminationExchange-INITIAL_TRADE_AMOUNT_ETH;
					float profitBtc = btcRevertedTerminationExchange-INITIAL_TRADE_AMOUNT_BTC;
					fullLoopList.add(new FullLoop(originExchangeName, terminationExchangeName, "eth", INITIAL_TRADE_AMOUNT_ETH, profitEth, currentTimestamp));
					fullLoopList.add(new FullLoop(originExchangeName, terminationExchangeName, "btc", INITIAL_TRADE_AMOUNT_BTC, profitBtc, currentTimestamp));

			Collections.sort(fullLoopList, new Comparator<FullLoop>() {
			    public int compare(FullLoop o1, FullLoop o2) {
			        return new Float(o2.profit).compareTo(o1.profit);
			String profitsStr = printProfits(fullLoopList);
			Files.write(logFile.toPath(), (profitsStr+"\n").getBytes(), StandardOpenOption.CREATE, StandardOpenOption.APPEND);

Resuming a HTTP download in Java

Let’s say you’re downloading a large file using Java, like so:

File file = new File("/downloads/some.file.ext");
URL url = new URL("http://some.url/some.file.ext");
HttpURLConnection httpUrlConnection = (HttpURLConnection) url.openConnection();
//...any other httpUrlConnection setup, such as setting headers
BufferedInputStream in = new BufferedInputStream(httpUrlConnection.getInputStream());
FileOutputStream fos = new FileOutputStream(file);
BufferedOutputStream bout = new BufferedOutputStream(fos, 1024);
    byte[] data = new byte[1024];
    int x = 0;
    while ((x =, 0, 1024)) >= 0) 
    	bout.write(data, 0, x);
catch(Exception e)
	throw e;

Now let’s say the file already exists locally and you need to resume it. Assuming the HTTP server on the other end supports file resume using the HTTP Range header, you can simply do the following:

//Add this right after you initialize httpUrlConnection but before beginning download
	httpUrlConnection.setRequestProperty("Range", "bytes="+file.length()+"-");

//And then you'd initialize the file output stream like so:
	fos = new FileOutputStream(file, true); //resume download, append to existing file
	fos = new FileOutputStream(file);

And that’s it! The Range header you we set for httpUrlConnection does the magic.