Directed Acyclic Graphs and Executing Tasks in Order (and in Parallel) Based on Dependencies

A little while ago, there was a requirement to write a tool that could take a number of tasks, each with a set of dependencies, and execute them in parallel while respecting those dependencies.

The tasks themselves were meant for data migration, but that is not particularly relevant. Each task had its own set of dependencies, which could be empty (some of the tasks had to have no dependencies, or the process could never start).

It was assumed that there were no cyclic dependencies (which would be an error in this particular case anyway).
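The tool simply assumed the graph was acyclic, so it costs little to verify that assumption before running anything. This is not the original code. It is a minimal sketch of Kahn's algorithm, and it assumes the dependencies are available as a map from each task name to the names it depends on. `TopologicalOrder` and `sort` are hypothetical names, and so are the sample task names. The method returns an order in which every task comes after its dependencies. It throws if a dependency is unknown or if a cycle exists.

```java
import java.util.*;

public class TopologicalOrder {

    /**
     * Returns task names ordered so that every task comes after all of its dependencies.
     * Throws IllegalArgumentException on an unknown dependency or a cycle.
     */
    public static List<String> sort(Map<String, Set<String>> dependencies) {
        Map<String, Integer> remaining = new HashMap<>();       // unmet dependency count per task
        Map<String, List<String>> dependents = new HashMap<>(); // reverse edges: dependency -> tasks waiting on it
        for (Map.Entry<String, Set<String>> e : dependencies.entrySet()) {
            remaining.put(e.getKey(), e.getValue().size());
            for (String dep : e.getValue()) {
                if (!dependencies.containsKey(dep)) {
                    throw new IllegalArgumentException("Unknown dependency: " + dep);
                }
                dependents.computeIfAbsent(dep, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        // Tasks with no dependencies are ready immediately.
        Deque<String> ready = new ArrayDeque<>();
        remaining.forEach((name, count) -> { if (count == 0) ready.add(name); });

        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String name = ready.poll();
            order.add(name);
            // Finishing this task releases one dependency from each task waiting on it.
            for (String next : dependents.getOrDefault(name, List.of())) {
                if (remaining.merge(next, -1, Integer::sum) == 0) {
                    ready.add(next);
                }
            }
        }
        // Any task never released is part of (or waits on) a cycle.
        if (order.size() != dependencies.size()) {
            throw new IllegalArgumentException("Cyclic dependency detected");
        }
        return order;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> deps = new HashMap<>();
        deps.put("customers", Set.of());
        deps.put("routes", Set.of());
        deps.put("bookings", Set.of("customers", "routes"));
        System.out.println(sort(deps)); // customers and routes (in either order), then bookings
    }
}
```

Tasks that become ready at the same time have no ordering constraint between them. That freedom is what a parallel runner can exploit.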

Bear in mind that this was a quick and dirty tool meant to be used only three times, so some of the bits in here could do with tidying up.

Each task was defined to implement the following interface:

import java.util.Set;

public interface Task extends Runnable {
	String getName();
	Set<String> getDependencies(); // names of the tasks this task depends on
}

It should all be self-explanatory. Extending the Runnable interface ensures that a task can be passed to threads and other relevant bits of code. getDependencies is expected to return the names of the tasks that this task depends on.

The basic task runner described below does not check whether the tasks named in a dependency list actually exist. If a non-existent dependency is defined, it will most likely just throw a NullPointerException. I wrote this a long time ago, so I don't remember exactly.
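The runner itself is not reproduced here, so what follows is only a sketch of one way to build it. It assumes Java 9+ and `CompletableFuture`, not whatever the original tool used. `DependencyRunner`, `runAll` and the `task` helper are hypothetical, and so are the example task names. The nested `Task` interface mirrors the one above so that the example compiles on its own.

Each task is chained onto the futures of its dependencies, so it starts only when all of them have finished. Unrelated tasks run concurrently on the executor. Nothing starts until the whole graph has been wired up. Unlike the runner described above, this sketch reports an unknown dependency or a cycle as an `IllegalArgumentException` rather than a `NullPointerException`.

```java
import java.util.*;
import java.util.concurrent.*;

public class DependencyRunner {

    /** Mirrors the Task interface described above so this example compiles on its own. */
    public interface Task extends Runnable {
        String getName();
        Set<String> getDependencies();
    }

    /** Runs all tasks, each after its dependencies; independent tasks run in parallel. Blocks until done. */
    public static void runAll(Collection<? extends Task> tasks, ExecutorService executor) {
        Map<String, Task> byName = new HashMap<>();
        for (Task t : tasks) {
            byName.put(t.getName(), t);
        }
        CompletableFuture<Void> start = new CompletableFuture<>(); // gate: nothing runs before wiring is done
        Map<String, CompletableFuture<Void>> futures = new HashMap<>();
        for (Task t : tasks) {
            schedule(t, byName, futures, start, executor, new HashSet<>());
        }
        start.complete(null); // the graph is valid, so release the tasks with no dependencies
        // allOf completes once every task has finished; join() rethrows any failure.
        CompletableFuture.allOf(futures.values().toArray(new CompletableFuture<?>[0])).join();
    }

    /** Depth-first: schedules a task's dependencies, then chains the task onto their futures. */
    private static CompletableFuture<Void> schedule(Task task, Map<String, Task> byName,
            Map<String, CompletableFuture<Void>> futures, CompletableFuture<Void> start,
            Executor executor, Set<String> visiting) {
        CompletableFuture<Void> existing = futures.get(task.getName());
        if (existing != null) {
            return existing; // already scheduled through another path
        }
        // A name seen again in this traversal before its future exists lies on a cycle.
        if (!visiting.add(task.getName())) {
            throw new IllegalArgumentException("Cyclic dependency at " + task.getName());
        }
        List<CompletableFuture<?>> prerequisites = new ArrayList<>();
        prerequisites.add(start);
        for (String depName : task.getDependencies()) {
            Task dep = byName.get(depName);
            if (dep == null) {
                throw new IllegalArgumentException("Unknown dependency: " + depName);
            }
            prerequisites.add(schedule(dep, byName, futures, start, executor, visiting));
        }
        // Runs on the executor once all prerequisites complete normally; if any failed,
        // the task is skipped and its own future fails too.
        CompletableFuture<Void> future = CompletableFuture
                .allOf(prerequisites.toArray(new CompletableFuture<?>[0]))
                .thenRunAsync(task, executor);
        futures.put(task.getName(), future);
        return future;
    }

    /** Hypothetical helper that builds a Task from a name, an action and dependency names. */
    public static Task task(String name, Runnable action, String... dependencies) {
        Set<String> deps = Set.of(dependencies);
        return new Task() {
            @Override public String getName() { return name; }
            @Override public Set<String> getDependencies() { return deps; }
            @Override public void run() { action.run(); }
        };
    }

    public static void main(String[] args) {
        List<Task> tasks = List.of(
                task("customers", () -> System.out.println("migrating customers")),
                task("routes", () -> System.out.println("migrating routes")),
                task("bookings", () -> System.out.println("migrating bookings"), "customers", "routes"));
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            runAll(tasks, pool); // customers and routes may interleave; bookings always prints last
        } finally {
            pool.shutdown();
        }
    }
}
```

If a task throws, its dependents never start, and `runAll` rethrows the failure wrapped in a `CompletionException`. Tasks that do not depend on the failed one still run to completion.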

Java Object Size In Memory

Anyone who has worked with Java on a high-end application will be well aware of the double-edged sword that is Java garbage collection. When it works, it is awesome, but when it doesn't, it is an absolute nightmare. We work on a ticketing system where it is imperative that the system is as near real-time as possible. The biggest issue we have found is the JVM running out of memory, which triggers a stop-the-world garbage collection. That pause makes an individual node unresponsive for long enough that it is kicked out of the cluster, which results in cluster failures.

There are various ways to combat this issue, and the first instinct would be to suspect a memory leak. After eliminating that possibility, the next challenge was to identify where the memory was being taken up. This took some time and effort, and eventually the Hibernate second-level cache was identified as the culprit: we were storing far too much in it.
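One rough way to check which kinds of object are filling the heap is to compare used heap before and after allocating many instances. This is a minimal sketch, not the method we used. `HeapEstimate` and its methods are hypothetical names. `System.gc()` is only a request to the JVM, so the figures are approximate and vary between runs and JVMs. For precise per-object sizes, `java.lang.instrument.Instrumentation.getObjectSize` (via a Java agent) or a heap dump is more reliable.

```java
import java.lang.ref.Reference;
import java.util.function.Supplier;

public class HeapEstimate {

    /** Bytes currently in use on the heap (allocated minus free). */
    public static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /**
     * Very rough average size, in bytes, of objects produced by the factory.
     * Approximate only: System.gc() is a hint, and other allocations add noise.
     */
    public static double estimateBytesPerObject(Supplier<?> factory, int count) {
        Object[] holder = new Object[count]; // allocated before measuring so it is not counted
        System.gc();
        long before = usedHeap();
        for (int i = 0; i < count; i++) {
            holder[i] = factory.get();
        }
        System.gc();
        long after = usedHeap();
        Reference.reachabilityFence(holder); // keep the objects alive until after the second reading
        return (double) (after - before) / count;
    }

    public static void main(String[] args) {
        // Example factories; swap in whatever you suspect is filling the heap (e.g. cached entities).
        System.out.printf("new Object(): ~%.1f bytes%n", estimateBytesPerObject(Object::new, 100_000));
        System.out.printf("new long[128]: ~%.1f bytes%n", estimateBytesPerObject(() -> new long[128], 10_000));
    }
}
```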

Android – Parcel data to pass between Activities using Parcelable classes

Passing data between activities on Android is, unfortunately, not as simple as passing in parameters. What we need to do is attach the data to the Intent as extras. If the information we need to pass across is a simple value like a String or an Integer, this is easy enough:

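// Inside an Activity; the extra names below are illustrative
String stringParam = "String Parameter";
int intParam = 5;
Intent i = new Intent(this, MyActivity.class);
i.putExtra("stringParam", stringParam); // each extra needs its own unique name
i.putExtra("intParam", intParam);
startActivity(i);
// In MyActivity, read them back using the same names:
// String s = getIntent().getStringExtra("stringParam");
// int n = getIntent().getIntExtra("intParam", 0);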

Passing in custom objects is a little more complicated. You could just mark the class as Serializable and let Java take care of it. However, on Android there is a serious performance hit that comes with using Serializable. The solution is to use Parcelable.
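Here is a sketch of what that involves. The `Ticket` class and its fields are hypothetical, and the code needs the Android SDK rather than a plain JVM. A Parcelable class writes its fields to a `Parcel` in `writeToParcel`. It also exposes a static `CREATOR` that reads the fields back in exactly the same order.

```java
import android.os.Parcel;
import android.os.Parcelable;

/** Hypothetical example class; the field names are illustrative only. */
public class Ticket implements Parcelable {
    private final String passengerName;
    private final int seatNumber;

    public Ticket(String passengerName, int seatNumber) {
        this.passengerName = passengerName;
        this.seatNumber = seatNumber;
    }

    // Rebuilds the object, reading fields in the same order writeToParcel wrote them.
    private Ticket(Parcel in) {
        passengerName = in.readString();
        seatNumber = in.readInt();
    }

    @Override
    public void writeToParcel(Parcel dest, int flags) {
        dest.writeString(passengerName);
        dest.writeInt(seatNumber);
    }

    @Override
    public int describeContents() {
        return 0; // this object contains no file descriptors
    }

    // Android looks up this static field to recreate Ticket instances from a Parcel.
    public static final Parcelable.Creator<Ticket> CREATOR = new Parcelable.Creator<Ticket>() {
        @Override
        public Ticket createFromParcel(Parcel in) {
            return new Ticket(in);
        }

        @Override
        public Ticket[] newArray(int size) {
            return new Ticket[size];
        }
    };

    public String getPassengerName() { return passengerName; }
    public int getSeatNumber() { return seatNumber; }
}
```

The sending activity can then call `i.putExtra("ticket", ticket)`. The receiving one can read the object back with `Ticket ticket = getIntent().getParcelableExtra("ticket");`. API level 33 adds an overload that also takes `Ticket.class`. The price is this boilerplate. In return, Android avoids the reflection-based Java serialization that makes Serializable slow.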

Android – Multi-line Select List

It turns out to be surprisingly easy to add a multi-line select list to the UI.
There are four main parts to it: the layout file, a subclass of the adapter, the activity and, of course, the data itself.

Database Systems Compared

My first experiences of a computer started with DBase III+ (now dBASE) and then went on to FoxPro (now Microsoft Visual FoxPro). I have since used Filemaker Pro, Microsoft Access, Microsoft SQL Server, MySQL, PostgreSQL, SQLite and HSQLDB. I have not yet used IBM DB2 or Oracle. Wikipedia has a list of database systems.

Having worked with this range of database systems, and having done copious amounts of research into DB2, Oracle and other systems I have not mentioned, I would like to tackle the age-old question: which is the best database system?

Ah! If only it were that simple. No single database system is appropriate for every requirement. But then, if you have been in the technology sector long enough, you already know that. It's all about using the right tool for the job.

I separate these systems into two broad categories, plus Oracle. First, there are the desktop-based database systems:

  • DBase
  • FoxPro
  • SQLite
  • HSQLDB
  • Filemaker Pro
  • Microsoft Access
  • MySQL

DBase, FoxPro, Filemaker Pro and Microsoft Access are essentially GUI frontends with a database behind them.

Access is the best choice for this purpose in the majority of circumstances, and Filemaker Pro is a good fit in some. The usual reason to use DBase or FoxPro is simply that the developer is used to it, which is not a good enough reason.

I used DBase III+ to develop an office management suite back in 1994. Since then I have used Filemaker Pro to develop a simple contact management database (in 1998) and Microsoft Access to develop a patient management system for a clinic.

SQLite, HSQLDB and MySQL are database engines that are designed to have a separate frontend put on top of them; sometimes that frontend is Microsoft Access. Microsoft Access can also be used purely for its database engine.

Used purely as an engine, Access is usually the worst choice except as a stopgap. There are exceptions, though. One is as the backend for a web frontend, if the site is not too busy and it is running on a Microsoft platform. You don't have to go to the hassle of installing anything on the server, because the drivers take care of it all.

HSQLDB is an obvious choice for a lightweight Java-based application, and SQLite for other lightweight applications.

MySQL is substantially more powerful than either and scales a lot better. I include it in this section because, although it is a server-grade database system, it also works well in a desktop environment.

I have used Access for several web-based systems, and HSQLDB for unit testing Hibernate and for a quick and dirty MP3 library that linked into MusicBrainz. I have used SQLite only in passing, as the embedded database of some open source products.

I have used MySQL with an Access frontend as a management suite for a website as well.

Then there are the server-based database systems:

  • MySQL
  • Microsoft SQL Server
  • IBM DB2
  • PostgreSQL

MySQL was used as the backend database for the website. It was the perfect choice, since the most important requirement was speed, and the Query Cache and master-slave replication in particular made MySQL the best fit.

SQL Server was used as the backend for an online course for Scottish Enterprise around 1999/2000. While MySQL would have been a good choice for this, it was not of production quality at the time.

We have also used Microsoft SQL Server for an insurance company, since all of its infrastructure was based on Windows and PostgreSQL did not have a viable Windows version at the time.

We use PostgreSQL for megabus. Speed is critical there, but it is a ticketing system, which means that transactional integrity is absolutely critical.

While MySQL now supports transactions through InnoDB, we found its transactional support nowhere near as robust as what PostgreSQL provides through MVCC (Multi-version Concurrency Control). We could have used Microsoft SQL Server, but the cost savings with PostgreSQL are dramatic.

To summarise, each system has a specific use and specific strengths and weaknesses, and which one to use depends heavily on what it is for. I hope that this summary of what we have used each of these systems for is useful in deciding which one is best placed to solve a specific problem 😀

We have not yet used Oracle. It was a strong contender for megabus, but the serious heavyweight functionality Oracle provides comes at a price, and it is not yet a cost-effective option.

