Posts tagged: Linux

Perfect Linux

According to Bryan Lunduke, Ubuntu 9.10 is almost perfect, and I concur.

Being a bit of a purist, I ran Debian for many years but found that its stable releases lagged too far behind. This was largely due to the project’s perfectly understandable view that a release is ready only when it is right.

For a while, I ran their unstable distribution called Sid, named after the disturbed, hyperactive 10-year-old boy in the film Toy Story. The idea is that Sid breaks things, and it certainly did. While it taught me a heck of a lot about Linux (and the terminal), my computer was broken on a very regular basis.


Vista Guest, Linux Host, VirtualBox, Host Networking – Bridge

One would think that it would be straightforward, work off the bat, or at least have some reasonable documentation. Unfortunately, no!

I needed host networking to be able to access network resources (Samba shares etc.) which does not work if the guest OS is on NAT :-(

Solving it was easy though… I assume Vista is installed as a guest with the guest additions and that your user account is a part of the vboxusers group.

On the Linux host, first install bridge-utils. I run Ubuntu, so it was as easy as:

$ sudo aptitude install bridge-utils

Next, you need to set up the bridge; again, easy on Ubuntu:

Add the following section to /etc/network/interfaces:

auto br0
iface br0 inet dhcp
bridge_ports eth1

Add the interfaces to VirtualBox

$ sudo VBoxAddIF vbox0 'shri' br0

Within the VirtualBox guest settings, choose Host Networking and, for the interface, choose br0.

Bring the interface up:

$ sudo ifup br0

and start your guest os… et voila, it just works…

Making Twitter Faster

From my perspective, Twitter has a really interesting technical problem to solve: how to store and retrieve a large amount of data really quickly.

I am making some assumptions based on how I see Twitter working. I have little information about how it is architected, apart from some posts that suggest it runs Ruby on Rails with MySQL.

Twitter is in the rare category where a very large amount of data is being added. There should be no updates (except to user information, and there should be relatively few of those). There is no need for transactionality. If I guess right, the workload should be a large number of inserts and selects.

While a relational database is probably the only viable choice for the time being, I think that Twitter could scale and perform better if all the extra bits of a relational database system were removed.

I love challenges like this. Technical ones are easier 😉

If I didn’t have a lifetime job, I would prototype this in a bit more depth. Garry pointed me in the direction of Hadoop. Having had a quick look at it, it can take care of the infrastructure, clustering and massive horizontal scaling requirements.

Now for the data layer on top: how to store and retrieve the data. HBase is probably a good option, but doing it manually should be fairly straightforward too.

From my limited understanding of Twitter, there are two key pieces of functionality: the timelines and search.

The timelines can be solved by storing each tweet as a file within a directory structure. My tweets would go into

/w/o/r/d/s/o/n/s/a/n/d/

The filename would be -
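A minimal shell sketch of that layout, under my own assumptions: the names ROOT, shard_path and store_tweet are made up for illustration, the file name is supplied by the caller (example-tweet is just a placeholder), and a temporary directory stands in for the real storage root.

```shell
#!/bin/sh
# Sketch only: ROOT, shard_path and store_tweet are illustrative names.

# Turn "wordsonsand" into "/w/o/r/d/s/o/n/s/a/n/d/" (one directory per character).
shard_path() {
    printf '/%s\n' "$(printf '%s' "$1" | sed 's|.|&/|g')"
}

# store_tweet ROOT USER FILENAME TEXT
# Writes TEXT to ROOT/<sharded USER>/FILENAME, creating directories as needed.
store_tweet() {
    dir="$1$(shard_path "$2")"
    mkdir -p "$dir" && printf '%s\n' "$4" > "$dir$3"
}

ROOT=$(mktemp -d)   # throwaway root so the demo touches nothing real
store_tweet "$ROOT" wordsonsand example-tweet "hello world"
cat "$ROOT/w/o/r/d/s/o/n/s/a/n/d/example-tweet"
```

Reading a user’s timeline then comes down to listing one directory, which is the appeal of the scheme.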

For the public timeline, you have a similar folder structure, but keyed on the timestamp. For example, a tweet with the timestamp 1236158897 would go into the following structure as a symlink:

/1/2/3/6/1/5/8/8/9/7/
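Here is a rough shell sketch of the symlink step. The "public" sub-directory, the function names and the example-tweet file are assumptions of mine for the demo, and it runs against a temporary directory.

```shell
#!/bin/sh
# Sketch only: the "public" directory and both function names are assumptions.

# Turn "1236158897" into "/1/2/3/6/1/5/8/8/9/7/" (one directory per character).
shard_path() {
    printf '/%s\n' "$(printf '%s' "$1" | sed 's|.|&/|g')"
}

# link_public ROOT TIMESTAMP TWEET_FILE
# Symlinks TWEET_FILE (an absolute path) into ROOT/public/<sharded TIMESTAMP>/.
link_public() {
    dir="$1/public$(shard_path "$2")"
    mkdir -p "$dir" && ln -s "$3" "$dir$(basename "$3")"
}

ROOT=$(mktemp -d)                                  # throwaway root for the demo
printf 'hello world\n' > "$ROOT/example-tweet"     # stand-in for a stored tweet
link_public "$ROOT" 1236158897 "$ROOT/example-tweet"
cat "$ROOT/public/1/2/3/6/1/5/8/8/9/7/example-tweet"
```

Because the entry is only a symlink, the tweet itself is stored once and the public timeline just points at it.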

For search, pick up each word in the tweet and pop the tweet as a symlink into that folder. You could have a folder per word or follow the structure above.

/t/w/i/t/t/e/r/- OR

twitter/-
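As a sketch of the indexing step, here is the simpler folder-per-word variant in shell. The "search" directory, the index_tweet name and the example-tweet file are assumptions for illustration; words are lowercased and split on anything that is not a letter or digit, which is my choice rather than part of the original idea.

```shell
#!/bin/sh
# Sketch only: the "search" directory and index_tweet are illustrative names.

# index_tweet ROOT TWEET_FILE
# Symlinks TWEET_FILE (an absolute path) into ROOT/search/<word>/ for each word in it.
index_tweet() {
    # One lowercase alphanumeric word per line, duplicates removed.
    tr -cs '[:alnum:]' '\n' < "$2" | tr '[:upper:]' '[:lower:]' | sort -u |
    while read -r word; do
        [ -n "$word" ] || continue
        mkdir -p "$1/search/$word" &&
        ln -sf "$2" "$1/search/$word/$(basename "$2")"
    done
}

ROOT=$(mktemp -d)                                   # throwaway root for the demo
printf 'Twitter is fun\n' > "$ROOT/example-tweet"   # stand-in for a stored tweet
index_tweet "$ROOT" "$ROOT/example-tweet"
ls "$ROOT/search"
```

A search for a single word is then a directory listing. Multi-word queries would need the results of several listings to be intersected, which this sketch does not attempt.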

You would then have an application running on top with a distributed cache and an API that makes access to the data easier than direct file access. Running on Linux, the kernel will take care of most of the caching and buffering automatically, as long as there is enough RAM on the box.

This can in theory be done without Hadoop in between, by separating the directory structures across multiple servers, but that can have complications of its own, especially with adding and removing boxes for scalability.

You are also likely to run into filesystem limits on the number of files and sub-directories, but they can be solved by ‘archiving’ – multiple options for that too…

Thinking about this problem brought me back to the good old days of working on the search mechanism within megabus.com. We needed the site to deal with a large number of searches on limited hardware when the project was still classified as a pilot.

With some hard work and experimentation, we were able to reduce the search time to a tenth of the original time.

I’ll admit that I don’t know the details or the intricacies of the requirements that Twitter has. I have probably over-simplified the problem but it was still fun to think about. If you can think of problems with this – let me know; I wanna turn them into opportunities 😉

Master Jack

Everybody knows the old adage – “Jack of all trades, master of none”. I agree with this. So does the whole open source movement (in general). Thunderbird is a good example of an application that does one thing but does it well. Compare that to the old Mozilla suite, which did mail, newsgroups and web browsing in one integrated application, with all the problems that came with that.

At Kraya, we believe in being able to do one thing and one thing well. Ironic since we do so many things. Kraya has become more like an operating platform for building a whole set of tools on top. Similar to running multiple applications on your computer. They all do one thing and one thing well (unless we are talking about something like Microsoft Outlook which does loads of things badly).

Then there is the synergy between the different teams that is sometimes absolutely crucial to the success of some of the projects.

As an organisation, we are a little off the wall and don’t really follow the mainstream. We believe that tools and technologies have a specific space they are meant to fill (yes, including Microsoft products). Microsoft really has no competition as far as desktop operating systems are concerned for users who are not entirely technically literate. Sure, Apple Macs are great (and potentially better than Windows) but they have effectively priced themselves out of the mainstream market.

Linux, while fantastic, is just not user friendly enough for the masses. My argument with Microsoft is that it tries to fill gaps with products that just can’t handle them. The Server platform, while a good product, is not ideal for the range of circumstances under which it is deployed. Microsoft products are very easy to set up and use, which also means that they are generally very easy to set up wrong as well.

Products like Linux (yes, I am generalising) can be a lot more complex to set up, but once they are set up, they can usually run without any issues for years. I have servers & desktops that have not been restarted for years (apart from the odd hardware upgrade / change). This is a testament to the sturdiness of the software.

Microsoft Server software is getting there. I remember the days of NT, when I was responsible for a couple of servers. They had to be restarted every fortnight without fail, or they would simply stop working. It was like clockwork…

Part of the reason why we do so many things is that we are different enough to want to work with companies who can accommodate our very specific needs, and there are very few of them (none that we felt could do the job as well as we could, or better).

This has led us down the path of setting up teams specialising in each area.

  • Software Engineering (which is split up in several areas as well, Java, PHP, Web, Desktop, Middleware and so on),
  • Systems (again split into multiple areas, Web systems, Office Server Infrastructure, Linux Server, Windows Servers and so on),
  • Technical Support (again, Windows based desktops, Mac based desktops, Linux based desktops, Laptops, Hardware and so on).

Krish has set up a film production company to follow his passion for making films and this will be launching with a splash over the next few weeks / months. We have already won a couple of small projects, and he is working on a few fiction projects and music videos.

Kraya’s new R&D department is involved in developing 2 products and a third project completely unrelated to software.

So, how are we not the jack of all trades and master of none? Well, we are good at technology and all the other things we do as a coincidence. We are good at getting to understand needs, then going out there and finding all the tools that need to be put together to solve the problem in the most effective manner.

We are currently in the process of putting together some case studies that can demonstrate this in more detail… 😀

Customisation

Being an avid Linux user for years, I am seriously spoilt in terms of being able to customise everything / anything to be more the way I want it to be…

There are two main reasons for this. The first is that most software that comes on Linux is highly customisable to start off with. The second is that if you don’t like something, you can change it.

There is also the nice thing that most things you think would be cool or useful in software are already available in some form, since someone else thought so too, but before you did, and has had the chance to spend some time building them.

I love this so much that I have often put together a quick Linux box for doing things that one could easily replace with an embedded device like a router. I have swayed between the two options based on how much I want simplicity vs flexibility.

One of my favourite responses to someone telling me that we need something that we don’t have is – “we’ll build one”… The software customisation / writing has turned into a metaphor that I apply across more and more things. You need a new table with custom bits – let’s build it. You need a classic car with all the modern gizmos – you know what – let’s just build it.

This has its pros and cons. For one, it feels like anything is possible. It also becomes very frustrating to work with limited, limiting, or closed source software (especially when you just want to fix a quick bug that really irks you). It also eats up all your time as you try and do all the things you want… just because you can…

Striking a balance is hard, especially when a client asks if it is possible to do something very specific. The answer is of course yes, and there is a question that goes with that response: at what value does it become cost effective and provide a good Return On Investment (ROI)?
