fredag 21 juli 2017

Solving OpenAI Gym's CartPole-v0 environment

This is a cross-post of the write-up of my submission for OpenAI's CartPole problem (with some corrections and clarifications).

Introduction and background:

I'm a novice when it comes to Reinforcement Learning, but it's a subject I've become obsessed with lately. I've taken a couple of courses and done some machine learning experiments before (more specifically, a supervised-learning bot for Codingame's Coders Strike Back), but when it comes to reinforcement learning, pretty much all I've read is Andrej Karpathy's Pong from Pixels (which, incidentally, is also how I found the AI Gym).

I did not use any framework for the neural network; it is based on my own rudimentary architecture for supervised learning.

As for the environment, I used deeplearning4j's gym-java-client so I could code in Java. I should also mention that I had some issues with crashes in the simulator. This is why the evaluation is so short, and probably why it converges so early (it forced me to optimize so it would succeed before crashing).
You can see the evaluation and write-up here: neph1's evaluation


I trained a logistic neural network with two hidden layers. The input nodes were the observations from the last 5 steps.
I believe I'm using a policy gradient approach.

Neural Network Layout:

The size of the network had to be carefully designed. Deviating from the described network by only one or two nodes can cause severe degradation in performance. See “Alternatives and Red Herrings” for information about other designs I tried.
20 input nodes (observations from 5 latest steps)
26 hidden nodes in first layer
14 hidden nodes in second layer
2 output nodes
All nodes are logistic. Layers are fully connected.
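As an illustration of the layout above, a forward pass through the 20-26-14-2 logistic network could be sketched as follows. This is my own minimal sketch, not the actual submission code; the weight and bias arrays are assumed to be created and initialized elsewhere.

```java
public class CartPoleNet {
    // layer sizes from the post: 20 inputs, 26 and 14 hidden nodes, 2 outputs
    static final int IN = 20, H1 = 26, H2 = 14, OUT = 2;

    // logistic activation used by every node
    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    // one fully connected logistic layer: out[j] = sigmoid(sum_i in[i] * w[i][j] + b[j])
    static double[] layer(double[] in, double[][] w, double[] b) {
        double[] out = new double[b.length];
        for (int j = 0; j < out.length; j++) {
            double sum = b[j];
            for (int i = 0; i < in.length; i++) {
                sum += in[i] * w[i][j];
            }
            out[j] = sigmoid(sum);
        }
        return out;
    }

    // chain the three fully connected layers together
    static double[] forward(double[] input, double[][] w1, double[] b1,
                            double[][] w2, double[] b2,
                            double[][] w3, double[] b3) {
        return layer(layer(layer(input, w1, b1), w2, b2), w3, b3);
    }
}
```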

Initial Weights:

All weights are randomly initialized to a value between -1.0 and 1.0.
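A minimal sketch of that initialization, assuming the weights live in plain 2-D arrays:

```java
import java.util.Random;

public class WeightInit {
    // fill a weight matrix with uniform random values in [-1.0, 1.0)
    static double[][] randomWeights(int rows, int cols, Random rng) {
        double[][] w = new double[rows][cols];
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                w[i][j] = rng.nextDouble() * 2.0 - 1.0;
            }
        }
        return w;
    }
}
```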

Learning Rate:

Learning rate matters, but it is also quite forgiving. Anything between 0.15 and 0.7 works, but keeping it around 0.25 seems slightly better. This evaluation used 0.28, I believe.


Each step, the input is divided by the (observed) bounds of its range. The exception is values 2 and 4, where I found that blowing them up (dividing by 0.05) led to a significant improvement in learning. I expect this is due to enforcing a low velocity. Those values are then also capped to between -5 and 5. I believe that by blowing up the values I do something that RL purists might not be OK with, since I essentially tell the algorithm which values to focus on.
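That preprocessing can be sketched like this. The 0.05 divisor for values 2 and 4 (the velocities, indices 1 and 3) is from the text above; the bounds I use for the other two values are placeholders chosen for illustration, not the exact values from the submission.

```java
public class InputScaling {
    // divisors per observation value; 0.05 for the velocities "blows them up",
    // the other two bounds are placeholder values for illustration
    static final float[] BOUNDS = {2.4f, 0.05f, 0.21f, 0.05f};

    static float[] scale(float[] observation) {
        float[] scaled = new float[4];
        for (int i = 0; i < 4; i++) {
            scaled[i] = observation[i] / BOUNDS[i];
        }
        // the blown-up values (indices 1 and 3) are additionally capped to [-5, 5]
        for (int i : new int[]{1, 3}) {
            scaled[i] = Math.max(-5f, Math.min(5f, scaled[i]));
        }
        return scaled;
    }
}
```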


I tried a number of different scoring methods.
What worked best in the end was:
Compare the position of each observed input with the one from the previous step.
If better (closer to zero), set the gradients to positive (1.0), otherwise negative (-1.0). I've tried more variable gradients, like using the score directly, but it didn't work as well.

 float score = 0;
 for (int i = 0; i < 4; i++) {
     float delta = Math.abs(observation[i + 4]) - Math.abs(observation[i]);
     score += delta;
 }
 if (score > 0) {
     score = 1f;
 } else {
     score = -1f;
 }
 float[] scores = lastAction == 0 ? TEMP_SCORE_LEFT : TEMP_SCORE_RIGHT;
 decisions.add(new float[][]{lastInput, scores});
 updateScore(3, score);

Then the previous two inputs are also updated with a diminishing value in the updateScore method. The factor seems to be forgiving; 0.55 to 0.95 have been tried, and 0.8 seems to be a good level.

 private void updateScore(int turnsAgo, float scoreChange) {
     int size = decisions.size();
     int min = Math.min(size, turnsAgo);
     for (int i = 0; i < min; i++) {
         float[] scores = decisions.get(size - i - 1)[1];
         // increase scores by scoreChange
         scoreChange *= 0.8f;
     }
 }


When it came to training and using the training data I tried two different models.
1. Keep neural network between episodes. Train using only the data from the latest episode.
2. Reinitialize network each episode (keep initial weights). Keep all training data and train on whole set (all episodes).
In both cases the training data is shuffled before use and then back-propagated.
Despite the obvious benefit of less total training time with model 1, model 2 led to much better learning performance.
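The shuffle step mentioned above can be sketched as below, assuming decisions is the list of (input, scores) pairs collected during the episodes. A seed parameter is added here only to make the sketch reproducible.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class TrainingShuffle {
    // shuffle the collected (input, scores) pairs before back-propagating them
    static List<float[][]> shuffled(List<float[][]> decisions, long seed) {
        List<float[][]> copy = new ArrayList<>(decisions);
        Collections.shuffle(copy, new Random(seed));
        return copy;
    }
}
```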

Soft Max:

For a long time I used Softmax to determine the action. Once the network improved, it became obvious that a deterministic approach led to a better outcome (just use whichever output is higher).
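Both selection strategies can be sketched as follows (my own illustration of the two approaches, not code from the submission): sampling from the softmax distribution over the two outputs, versus deterministically picking the larger one.

```java
import java.util.Random;

public class ActionSelection {
    // probabilistic: sample an action from the softmax distribution over the outputs
    static int softmaxAction(float[] outputs, Random rng) {
        double[] exp = new double[outputs.length];
        double sum = 0;
        for (int i = 0; i < outputs.length; i++) {
            exp[i] = Math.exp(outputs[i]);
            sum += exp[i];
        }
        double r = rng.nextDouble() * sum;
        for (int i = 0; i < exp.length; i++) {
            r -= exp[i];
            if (r <= 0) return i;
        }
        return exp.length - 1;
    }

    // deterministic: just use whichever output is higher
    static int argmaxAction(float[] outputs) {
        int best = 0;
        for (int i = 1; i < outputs.length; i++) {
            if (outputs[i] > outputs[best]) best = i;
        }
        return best;
    }
}
```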


To observe performance I printed out the following information after each episode:
Episode number
Score for episode
Average over all episodes so far
Rolling average (this differed from between 10 and 100 episodes)
Highest value so far (less interesting once reaching the highest score became frequent, but still interesting to see when it reaches 200 for the first time).
Total reward
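The rolling average above can be tracked with a simple fixed-size window; a sketch:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class RollingAverage {
    private final Deque<Float> window = new ArrayDeque<>();
    private final int size;
    private float sum;

    RollingAverage(int size) {
        this.size = size;
    }

    // add an episode score and return the average over the last `size` episodes
    float add(float score) {
        window.addLast(score);
        sum += score;
        if (window.size() > size) {
            sum -= window.removeFirst();
        }
        return sum / window.size();
    }
}
```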

Alternatives and red herrings:

It is possible to train with 6 time steps (but not at the same level of performance):
24 input nodes
32 hidden nodes in first layer
16 hidden nodes in second layer
2 output nodes

One scoring method I tried was to compare relative movement with the previous step. The idea was that even if the model made a good move, if the velocity in the opposite direction was too high, it wouldn't get noticed when just watching the position of the rod and cradle. So, by comparing delta v of the previous move, I thought it might be possible to spot those things. But either my theory was wrong, or I didn't manage to find good values for the scoring.

Another thing I experimented with was comparing the current position(s) with that more than one step back. Again, I was not able to find any good way of factoring this information into the scoring. So instead I placed my bets on the network figuring this out for itself.

At one point when I was thinking about how to improve the learning, I thought that more training examples would be a good thing (it's one of the fundamental rules in supervised learning, after all). Since the observation space is symmetric (or so I thought), I should be able to mirror all observations and scores and get twice as many training examples. Awesome! Right? It turned out that wasn't the case. I was not able to use the mirrored data. I still think this would be worth exploring further.

Further improvements:

I still think mirroring the training examples should be possible. Doubling the training data should improve it significantly.
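A sketch of the mirroring idea as I understand it: negate every observation value and swap the two action scores. This is an illustration of the concept, not working code from the submission (which, as noted above, never got mirroring to work).

```java
public class MirrorData {
    // mirror one training example: negate all observation inputs
    // and swap the left/right action scores
    static float[][] mirror(float[][] example) {
        float[] input = example[0];
        float[] scores = example[1];
        float[] mirroredInput = new float[input.length];
        for (int i = 0; i < input.length; i++) {
            mirroredInput[i] = -input[i];
        }
        float[] mirroredScores = {scores[1], scores[0]};
        return new float[][]{mirroredInput, mirroredScores};
    }
}
```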

Given the design of the network, it seems a Recurrent Neural Network (RNN) would be a good alternative.

lördag 3 december 2016

Playing video in jMonkeyEngine 3 using vlcj

A little while ago I made a basic example of how to play video on a texture in jMonkeyEngine 3. I used vlcj as the backing video player due to its relative ease of use and good tutorials. I'm making this companion post to describe the method (code) and how to set it up (config).

This describes how to set it up in Netbeans on Windows (should be the same for other Maven projects).
First, set up vlcj and build it:
  1. Download vlcj.
  2. Set it up as a new project in Netbeans.
  3. Change config to 'release'
  4. Download suitable VLC.
  5. Place libvlc and libvlccore in project root/win32-x86-64
  6. The distribution ends up as a zip in the project root/target folder.
Now you can set up a project that will use the vlcj build and point it to use the .jars in the target folder as libraries.

I created a JmeRenderCallbackAdapter class based on this Direct Rendering tutorial. The class takes a Texture2D as an input and will update its ImageBuffer on every onDisplay call.
The method is very basic and there's an improvement to be made by using the native buffer directly.
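The per-frame copy can be sketched as below. In the actual class the source is the pixel buffer vlcj hands to onDisplay and the destination is the Texture2D's image buffer; the names and the RGBA byte order here are illustrative.

```java
import java.nio.ByteBuffer;

public class FrameCopy {
    // copy one frame of packed-int pixels into the texture's byte buffer as RGBA
    static void copyFrame(int[] nativeBuffer, ByteBuffer imageBuffer) {
        imageBuffer.clear();
        for (int pixel : nativeBuffer) {
            imageBuffer.put((byte) ((pixel >> 16) & 0xFF)); // R
            imageBuffer.put((byte) ((pixel >> 8) & 0xFF));  // G
            imageBuffer.put((byte) (pixel & 0xFF));         // B
            imageBuffer.put((byte) ((pixel >> 24) & 0xFF)); // A
        }
        imageBuffer.flip();
    }
}
```

Copying pixel by pixel like this is exactly the "very basic" part; wrapping the native buffer directly would avoid the per-frame loop.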

The TestVlcj class shows how to display the result.
The setupView method creates a Quad and an Unshaded material to display the video on.
It also creates a Texture2D (which is passed to JmeRenderCallbackAdapter) and an Image the same size as the application.
The way BufferFormatCallback and DirectMediaPlayerComponent work are again taken directly from the Direct Rendering tutorial.
TestVlcj takes a path to a video as input argument. The input argument can be overridden and mediaFile can be hard coded for testing.

måndag 13 juli 2015

Running jMonkeyEngine 3 on Android using AndroidHarnessFragment

There's a new way of running jMonkeyEngine 3 on Android, using fragments. Since I couldn't find a description of how to do it or a use case, I thought I'd write down how I did it.
The old (and still functional) way used a class called AndroidHarness that extended Activity and contained all the app-specific information. This post describes how to use the new AndroidHarnessFragment.
  • First of all, AndroidHarnessFragment contains a lot of app-specific settings. The most important of these is appClass, which is a String containing the qualified name of the Application to run. This and a lot of other fields are protected, and it seems the intended way of using the class is by extending it. That way you can set all of them in the constructor of the new class, even if there are other ways more aligned with Android conventions (such as using a Bundle).
  • Previously the auto-generated MainActivity class extended AndroidHarness. Now it should just extend Activity.
  • To create the fragment, we can create a layout file for the activity:
<?xml version="1.0" encoding="utf-8"?>
<LinearLayout xmlns:android="http://schemas.android.com/apk/res/android"
    android:layout_width="match_parent"
    android:layout_height="match_parent">
    <fragment android:name="com.mycompany.mygame.MyAndroidHarnessFragment"
        android:id="@+id/jmeFragment"
        android:layout_width="match_parent"
        android:layout_height="match_parent" />
</LinearLayout>
  • Finally we tell MainActivity to use the layout file we just created.
@Override
protected void onCreate(Bundle savedInstanceState) {
    super.onCreate(savedInstanceState);
    setContentView(R.layout.main); // the layout file created above
}
That's all that seems to be needed to run jMonkeyEngine inside a fragment!

söndag 22 februari 2015

Google Cardboard support for jMonkeyEngine

I've just committed support for Google Cardboard to jMonkeyEngine 3. I plan on doing a post with more details on the inner workings, but in the meantime here's a brief outline and how to use it.

First of all, it's a completely separate integration from the Android VR support project I started. This uses the Google Cardboard API directly and in a way that didn't fit with the architecture in that project.

Why is it useful?

It enables those who wish to use a complete game development package to deploy their application as a Google Cardboard app.

So, how do I use it?

Download the jme-cardboard.jar from the repo and add it to your jMonkeyEngine project.
  1. Turn on Android deployment for your project (Properties/Application/Android)
  2. In the generated Android Main Activity (Important Files/Android Main Activity), have it extend CardboardHarness instead of AndroidHarness.
  3. Change the appClass in the same file to your project's application class.
That should be it. For an example application, check out CardboardStarTravel example in the test package.

If you wish to build the sources, you need to have an android.jar attached to the project.

Known issues:
Drift with the accelerometer is pretty bad. I don't know if there is anything to be done about it on the application side.
Movement is fairly jittery. Adding a filter for the accelerometer might be desirable.

Like I said, there will be more to come!

lördag 24 januari 2015

Designing a virtual reality HMD for smart phones

Update: Design is now available on Thingiverse.

This is supposed to be about software, I know. But without proper hardware it's impossible to write any software. When putting together the Android VR library for jMonkeyEngine 3.0, I realized I had nothing to test it on.
The quickest solution to that problem would be cutting out a Google Cardboard. I'm not particularly fond of cardboard, however, and without it being laser-cut it would look terrible. I have the benefit of owning a 3D printer (Prusa i3), so I thought I'd have a go at designing my own HMD inspired by the Google Cardboard schematics. Those can only be used so far, though, since they're meant to be cut and folded. With a 3D printer there's the benefit of being able to print complex geometry directly.
I decided to build it in 3 steps, each as simple as possible to avoid overhang problems. The first one would be the cradle where the phone would rest.
I based the measurements around my Samsung Galaxy S4 and tried to design it to allow access to the buttons on the side of the phone as well as the USB and audio. In general, I tried to leave as much space as possible on the sides for different phone types.
Printing time is always an issue with hobby printers, which is why I left the back side open. This also allows the battery some fresh air. There are holes in it as well (in the design, anyway; the printer doesn't really make them). These are in case one would at some point like to mimic the Oculus Rift DK2's positional tracking by placing some LEDs there.

Next, I went to the piece next to the eyes, as the middle part would just be about creating some distance between the lenses and the screen (or so I thought). I happen to have an Oculus Rift DK1 which isn't seeing much use now, so I decided to butcher two of its eye cups for lenses. They are 36 mm in diameter, which should give some additional FOV compared to the 25 mm recommended for the Google Cardboard. Apart from making good fittings for the lenses, the biggest challenge with this piece was making it fit well around the face.

I think I actually spent most of the time making the middle piece. Modelling a good cup for the nose was a big challenge, and I've scrapped several prints due to it not printing well. It's very spacious and should suit most nose shapes and sizes. The other thing about it is that it's slightly wider at the bottom than at the top, because I made the piece next to the face slightly narrower than the phone cradle. In case I build a Note-sized cradle, this would be even more pronounced.

Below is the current state of the prototype. It's working very well together with the Google Cardboard demos. I can see some 5 mm outside of my S4 screen at the top and bottom, so maybe an S5 would be perfect for these lenses.

I want to have a fitting for a magnet on it as well, but I seem to have lost the magnets I bought, so that will have to wait for now.
I plan on sharing these as well as a BOM for a complete HMD once I'm happy with the design. Stay tuned for more.

fredag 2 januari 2015

Virtual Reality for Android using jMonkeyEngine

Oculus Rift may be leading the pack currently, but I'm sure there will be more contenders for the virtual reality throne, shortly. So, while the Oculus Rift plugin was a good start I think it is time to look into what it would take to support more devices. The architecture established for the Oculus Rift plugin is good enough and I decided to see how much effort it would be to implement a basic virtual reality API for Android. After all, the low-budget Google Cardboard probably makes it the most accessible device of all.

You can find the repository for the project here:


It's implemented with ease of usage in mind. An application wishing to use it needs to do two things.
  1. Create a VRAppState instance and supply a suitable HeadMountedDisplay (currently either a DummyDisplay or AndroidDisplay).
AndroidDisplay display = new AndroidDisplay();
VRAppState vrAppState = new VRAppState(display);
  2. For controls, get the StereoCameraControl from the VRAppState and add it as a Control to a Spatial. It will now follow the Spatial through the world.
Node observer = new Node("");
See AndroidVRTest for an example implementation. 


Like I stated in the beginning, it follows closely what has already been implemented in the Oculus Rift plugin, but classes have been abstracted to allow for more diverse future implementations.
It revolves around a class called VRAppState. This class sets up two viewports and a StereoCameraControl which handles the two different views.
The StereoCameraControl class gets its data (currently only rotation) from a class implementing a HeadMountedDisplay interface. In this example it's called AndroidDisplay. The AndroidDisplay class accesses the Android application and registers itself as a SensorEventListener for the Accelerometer and Magnetometer. The default update delay is way too slow, so it uses SENSOR_DELAY_GAME instead.
sensorManager = (SensorManager) JmeAndroidSystem.getActivity().getApplication().getSystemService(Activity.SENSOR_SERVICE);
accelerometer = sensorManager.getDefaultSensor(Sensor.TYPE_ACCELEROMETER);
magnetometer = sensorManager.getDefaultSensor(Sensor.TYPE_MAGNETIC_FIELD);
sensorManager.registerListener(this, accelerometer, SensorManager.SENSOR_DELAY_GAME);
sensorManager.registerListener(this, magnetometer, SensorManager.SENSOR_DELAY_GAME);
Once sensor data is updated, it's received by the onSensorChanged method. It updates our local values and confirms that data has been received before getting the rotational data of the device in the form of a Matrix. This is stored in a temporary field, and the orientation is then interpolated towards it, because using the raw data directly was much too jittery.
public void onSensorChanged(SensorEvent event) {
    if (event.sensor.getType() == Sensor.TYPE_ACCELEROMETER) {
        gravity = event.values;
    } else if (event.sensor.getType() == Sensor.TYPE_MAGNETIC_FIELD) {
        geomagnetic = event.values;
    }
    if (gravity != null && geomagnetic != null) {
        boolean success = SensorManager.getRotationMatrix(R, I, gravity, geomagnetic);
        if (success) {
            SensorManager.getOrientation(R, orientationVector);
            tempQuat.fromAngles(orientationVector[2], -orientationVector[1], orientationVector[0]);
            orientation.slerp(tempQuat, 0.2f);
        }
    }
}

It also needs to know the physical size of the screen, which is used by the distortion shader. With some conversion it can be deduced from the Android application's WindowManager.
DisplayMetrics displaymetrics = new DisplayMetrics();
JmeAndroidSystem.getActivity().getWindowManager().getDefaultDisplay().getMetrics(displaymetrics);
float screenHeight = displaymetrics.heightPixels / displaymetrics.ydpi * inchesToMeters;
float screenWidth = displaymetrics.widthPixels / displaymetrics.xdpi * inchesToMeters;
This and other information is stored in a class inspired by the Oculus Rift HMDInfo, called HeadMountedDisplayData. This contains data on the HMD itself, like distance between lenses, distance from screen to lens, resolution, etc.
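A minimal sketch of what such a data holder might contain; the field names here are my own illustration, and the actual HeadMountedDisplayData class in the project may differ.

```java
public class HeadMountedDisplayData {
    // physical properties of the HMD; names and units are illustrative
    final float interLensDistance;    // distance between lens centers, in meters
    final float screenToLensDistance; // in meters
    final int horizontalResolution;
    final int verticalResolution;

    HeadMountedDisplayData(float interLensDistance, float screenToLensDistance,
                           int horizontalResolution, int verticalResolution) {
        this.interLensDistance = interLensDistance;
        this.screenToLensDistance = screenToLensDistance;
        this.horizontalResolution = horizontalResolution;
        this.verticalResolution = verticalResolution;
    }
}
```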

The shader uses the same principle established early on in the Oculus Rift plugin, which itself was inspired by an example implementation on the Oculus Developer web site (it seems it has since been removed from the website; if anyone has a link, please let me know). Each display has a post-processing filter, and the necessary distortion correction is done in a fragment shader. It begins with the class called BarrelDistortionFilter, which is instantiated in the VRAppState class.
The BarrelDistortionFilter takes the information from the HeadMountedDisplayData and creates a projection matrix for the Camera associated with its ViewPort. It also prepares some variables for the shader.
The scaleFactor value is an arbitrary number used to fit a specific screen. This most likely needs a formula for different screen sizes.


jMonkeyEngine Oculus Rift plugin:
Sensors overview:
Registering sensors and reading orientation data:
Google Cardboard:

lördag 6 december 2014

Free floating VR menu in jMonkeyEngine with Oculus Rift

The recommended way of displaying GUIs and menus in VR is to not have them static, tied to the "screen" — this despite the fact that the first thing greeting you when using the Oculus Rift is a big warning screen pasted on your face.
The idea behind a free-floating GUI is that it's more natural and adds to the comfort of the user (which is a big thing in VR!). When it comes to menus, this also has the benefit of letting the user make selections without any controller other than the HMD itself.
In this tutorial we'll create a menu that lets the user select menu items simply by looking at them for a certain amount of time. I've chosen to write it similar to Packt's Cookbooks since it's something I'm familiar with and it will hopefully make owners of the jMonkeyEngine 3.0 Cookbook feel right at home!

Disclaimer: The API for the Oculus Rift plugin is in no way final and it might change after this tutorial is written.


We'll need the Oculus Rift plugin for jMonkeyEngine. How to download and set it up can be found here:
You can also find the full code for this tutorial in the "examples" folder.
This tutorial won't explain how the OVRApplication works, for this it's recommended to check out the project site's wiki. The tutorial will be implemented in two classes. One application class containing the basics and an AppState which will contain all the menu specific code.


We'll start with the application class, which is implemented in 5 steps.
  1. For this example we start by creating a class that extends OVRApplication.
  2. In the simpleInit method we start by removing the default GUI elements.
  3. Next we set up the GUI node for manual positioning and scale it somewhat.
  4. Like in the example application for OVRApplication, we create an observer node and assign the StereoCameraControl to it.
Node observer = new Node("Observer");
observer.setLocalTranslation(new Vector3f(0.0f, 0.0f, 0.0f));
  5. Now we add two lines to the simpleUpdate method to keep the GUI in place.
guiNode.setLocalTranslation(cam.getLocation().add(0, 0, 0.5f));
guiNode.lookAt(cam.getLocation(), Vector3f.UNIT_Y);
Apart from revisiting it to add the AppState, that's it for the application class! The basics of the menu will be done in the following 9 steps.
  1. We start by creating a class called MenuAppState that extends AbstractAppState.
  2. We define a number of colors we'll use to apply to the menu items to indicate their status.
private static ColorRGBA DEFAULT_COLOR = ColorRGBA.DarkGray;
private static ColorRGBA SELECT_COLOR = ColorRGBA.Gray;
private static ColorRGBA ACTIVATE_COLOR = ColorRGBA.White;
  3. Its constructor should receive a width, a height and the OculusGuiNode, all of which it stores in private fields.
  4. We also create and store a Vector2f called screenCenter, which should be initialized with width * 0.5f and height * 0.5f.
  5. In the initialize method we start by creating and storing a Node called menuNode and a Ray called sightRay.
  6. We set up a simple unshaded material to apply to the menu items and give it the DEFAULT_COLOR.
Material mat = new Material(app.getAssetManager(), "Common/MatDefs/Misc/Unshaded.j3md");
mat.setColor("Color", DEFAULT_COLOR);
  7. Next we create a number of menu items in the form of Quads. They're set up in a grid, and each gets its own clone of the material before being attached to the menuNode.
for (int x = -2; x < 2; x++) {
    for (int y = -2; y < 2; y++) {
        Geometry menuQuad = new Geometry("Test ", new Quad(width * 0.25f, height * 0.25f));
        menuQuad.setMaterial(mat.clone());
        menuQuad.setLocalTranslation(x * width * 0.25f, y * height * 0.25f, 0);
        menuNode.attachChild(menuQuad);
    }
}
  8. Then we attach the menuNode to the guiNode.
  9. Now we override the setEnabled method and add some logic so that the menuNode is added to the scenegraph when the AppState is enabled and removed when it's disabled.
if (enabled && !this.isEnabled()) {
    guiNode.attachChild(menuNode);
} else if (!enabled && this.isEnabled()) {
    guiNode.detachChild(menuNode);
}
Now we have a basic menu showing but no way of selecting anything. Adding this functionality will consist of an additional 17 steps.
  1. First of all we add another static field called ACTIVATE_TIME, which is a float set to 5f.
  2. In addition we need another float field called timer, a CollisionResults called collisionResults and a Geometry called selectedGeometry.
  3. We create a new method called selectItem which takes a Geometry called g as input.
  4. Inside, we set selectedGeometry to g and the color of its material to a clone of SELECT_COLOR.
selectedGeometry.getMaterial().setColor("Color", SELECT_COLOR.clone());
  5. Lastly we set timer to 0.
  6. Now we create another method called unselectItem.
  7. Inside, we check whether selectedGeometry is not null, and if so we set the color of its material to a clone of DEFAULT_COLOR before setting selectedGeometry to null.
  8. Next we override the update method and start by setting up the Ray we created in the initialize method. It should originate from the camera.
sightRay.setOrigin(app.getCamera().getWorldCoordinates(screenCenter, 0f));
  9. We do a collision check between the Ray and menuNode and see if any of the Geometries inside are being looked at. If any of them are, we perform the following piece of code to set the Geometry to the selected one.
Geometry g = collisionResults.getClosestCollision().getGeometry();
if (g != selectedGeometry) {
    unselectItem();
    selectItem(g);
}
  10. If instead none is being looked at, we call unselectItem to null a potentially selectedGeometry.
  11. After performing this check we call the clear method on collisionResults.
  12. Before doing the last bit in this method we create another (empty for now) method called activateItem.
  13. Continuing in the update method, we check whether selectedGeometry is not null.
  14. If it is assigned, we increase timer by tpf.
  15. Then we perform the following piece of code to make the color of the selectedGeometry brighter.
float interpolateValue = timer / ACTIVATE_TIME;
((ColorRGBA) selectedGeometry.getMaterial().getParam("Color").getValue()).interpolateLocal(SELECT_COLOR, ACTIVATE_COLOR, interpolateValue);
  16. Lastly, we check whether timer is greater than ACTIVATE_TIME, in which case we call the activateItem method.
  17. The final thing we need to do is to return to the application class and create an instance of the MenuAppState class.
MenuAppState menuAppState = new MenuAppState(settings.getWidth(), settings.getHeight(), (OculusGuiNode) guiNode);


At the beginning of the application class, we tell the OculusGuiNode that we'll manage the positioning of the GUI ourselves. AUTO mode would otherwise always place it in front of the camera and rotate with it. We still want it to be in front of the camera, which is why we manually place it there. The difference is that the camera's rotation won't be propagated to the guiNode's translation. We also make the GUI always face the camera, even if that's not strictly necessary for this example.
In the MenuAppState we create a bunch of placeholder menu items in the form of Quads which we align in a grid. In this example we attach and detach the menuNode via the setEnabled method but depending on the rest of the application it might be neater to do it using the stateAttached and stateDetached methods.
In the update method we use raycasting originating from the camera to detect whether any of the menu items is in the center of the screen. When this occurs we change the color of the item to indicate this to the user and also start a counter. We then interpolate the color of the selected item to indicate that something is happening. When the timer exceeds 5 seconds it triggers an activation of the item. It's not as quick as using a mouse, but control methods are different in VR, and this lets the user navigate menus without an additional controller.
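Stripped of the jME classes, the dwell-timer logic above boils down to a small state machine; a self-contained sketch:

```java
public class GazeTimer {
    static final float ACTIVATE_TIME = 5f;
    private float timer;
    private boolean itemSelected;
    private boolean activated;

    // call once per frame with the frame time and whether an item is under the gaze
    void update(float tpf, boolean lookingAtItem) {
        if (!lookingAtItem) {
            // gaze left the item: reset the selection and the timer
            timer = 0;
            itemSelected = false;
            return;
        }
        if (!itemSelected) {
            itemSelected = true;
            timer = 0;
        }
        timer += tpf;
        if (timer > ACTIVATE_TIME) {
            activated = true; // would call activateItem in the AppState
        }
    }

    boolean isActivated() {
        return activated;
    }

    // 0..1 value used to interpolate the highlight color
    float interpolateValue() {
        return Math.min(1f, timer / ACTIVATE_TIME);
    }
}
```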
The one thing the example leaves out is what to do when an item is selected. Since the example is using plain instances of Geometry, there's not much to do. A simple approach would be to create a MenuItem class that extends Geometry or, even better, add a custom Control to the Geometry which is called through the activateItem method.