Back In The Saddle

I haven’t posted anything in over a month. A lack of posts should not be construed to imply a lack of progress, however. On the contrary, over the past month I have refactored and rewritten large portions of WPISuite’s code to make it capable of hosting server-driven modules. In the process I also performed a general cleanup on the areas I was working with. The result is, in my opinion, a much better code flow. This also enabled some small, but significant, improvements to the user interface (loading bars, anyone?) and the unwinding of some especially nasty hacks (ding, dong, WPISuiteMessager is dead!). There are a lot more things I’d like to do (ActiveObjects must die!), but the client is now capable of doing what we need, our time is limited, and I have a lot of other, more important items on my to-do list.

There are three large items currently topping this list:

  1. Server configuration. Up to now, we’ve simply relied on an XML configuration file for this, but that’s simply not sufficient. For a server that can have (in theory) as many nodes as you can afford the hardware to support, uniformity of configuration is a big deal. I’ve come to the conclusion that as much configuration data as possible should be stored in the database. This will enable support for item two on my list…
  2. Server Administration module. Ideally, I’d like to build a module for the client capable of displaying the state of and configuring the entire server: every node, active or not. Whether or not this will come to pass, I can’t say. At minimum, this module should support the same tasks as its standalone counterpart, the System Administration module, does. Due to ActiveObjects hack-arounds, it’s not feasible to just write a new controller and reuse that GUI. Although I probably wouldn’t do that even if it were possible, because I have some HCI problems with the design.
  3. Server permissions system. Over break, I probably spent about a week in total researching and contemplating this problem. I’ve concluded that the usual suspects like RBAC, ABAC, and capabilities aren’t going to work for us. Not smoothly, at least. Capabilities are designed to solve a problem we don’t have, RBAC treats permissions as being largely orthogonal to your domain model, and ABAC is concerned with attributes of the user and the resource rather than attributes of the relationship between them. That last point is the most important one. I realized that when we ask the question, “is user X allowed to do action Y on resource Z?”, the answer is always in our object graph, and it’s always in our edges, not our nodes. That is, the answer to the permissions question is arrived at by asking the question, “what is user X’s relationship with resource Z, and what are the attributes of that relationship?”

    When I got to that point, I was suddenly reminded of a short piece by Martin Fowler that I had read a while back. He discusses the problem of having an anemic domain model and I realized that was exactly our problem. Well, more accurately, the problem is that we don’t have a domain model, just a data model. A lot of people tend to use these terms interchangeably, but they’re actually two different things. The data model just defines the structure of your data, i.e. how it’s persisted. The domain model is a layer on top of the data model that organizes your data in a way that makes sense for your application and enforces your business rules. As Fowler states, a lot of people think adding business logic to your model is something to be avoided. I think that’s probably because they’re trying to add the rules to their data model rather than their domain model, and that is a bad idea. But, as Fowler also points out, if you don’t put your business logic into your domain model, then you’ve incurred the problems of the object-oriented paradigm without taking advantage of any of the benefits, and you’ve slid back into the procedural paradigm.

    The point of that whole story is that security rules are just as much business logic as anything else. Our objects need to encapsulate behavior as well as data, if for no other reason than that I really don’t want to write the same permissions and validation logic in two different places. I’ve actually already done this on a smaller scale with our client configuration code. The objects we throw around are actually wrappers that encapsulate the underlying objects provided to us by Commons Configuration. They provide validation methods and build additional data based on the configuration data. For instance, the ServerConfigProfile iterates over its modules and determines if legacy support is required or not, along with building a set of all the exports the server needs to make available. These are later used by the ServerFunctionDelegate to make sure the server actually provides said exports before the main GUI is instantiated. If it doesn’t, login fails and an appropriate error is displayed.
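
To make the “permissions live in the edges” idea concrete, here is a minimal sketch in plain Java. Everything in it is hypothetical: the Course, Relationship, and Action names are invented for illustration and are not WPISuite classes. It only shows a domain object that owns its relationships and answers the permission question itself.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: none of these names come from WPISuite.
class PermissionSketch {
    enum Action { VIEW, SUBMIT, GRADE }

    // An edge in the object graph: how one user relates to one resource.
    static final class Relationship {
        final String role;          // e.g. "professor" or "student"
        final Set<Action> allowed;  // attributes of the relationship itself

        Relationship(String role, Set<Action> allowed) {
            this.role = role;
            this.allowed = allowed;
        }
    }

    // A domain object that holds its edges and enforces the rule itself.
    static final class Course {
        private final Map<String, Relationship> members = new HashMap<>();

        void enroll(String userId, Relationship relationship) {
            members.put(userId, relationship);
        }

        // "Is user X allowed to do action Y on this resource?"
        boolean permits(String userId, Action action) {
            Relationship r = members.get(userId);
            // No relationship at all means no permission.
            return r != null && r.allowed.contains(action);
        }
    }

    public static void main(String[] args) {
        Course course = new Course();
        course.enroll("alice", new Relationship("professor", EnumSet.allOf(Action.class)));
        course.enroll("bob", new Relationship("student", EnumSet.of(Action.VIEW, Action.SUBMIT)));

        System.out.println(course.permits("alice", Action.GRADE));
        System.out.println(course.permits("bob", Action.GRADE));
        System.out.println(course.permits("carol", Action.VIEW));
    }
}
```

Because the check sits on the domain object, the same rule would apply wherever the object is used, which is the “don’t write it in two places” goal described above.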

Well, that was a lot longer than I expected (as usual). I haven’t decided which problem I’m going to tackle next: the permissions or the configuration. The team is meeting in about an hour to discuss the project. We’re going to be focusing on the GUI for the Submission System module, but we may talk about these other issues as well if there’s time. I may make another post after the meeting depending on what we accomplish.

So a lot has been done to the server side of the module over break, and a lot remains to be done.

Some highlights of work that has been completed:

- Implemented commands that let the server carry out tasks such as compiling and testing source files.

- Finished implementing an Ant command that will be used to build code.

- Started implementing a JUnit command, its accompanying testing suite, and its client-side display. This is coming along nicely, but the command won’t be finished until we finalize the parameters for the testing suite. Once that happens, it will progress quickly.

 

What remains to be done generally relates to the commands and testing suites as well. Originally I wanted to continue to use the database for storing the results of tests and the commands themselves. However, each time I look at how that could be accomplished, I realize it’s a horrible idea. For example, to test a submission and store the results, we are going to need some way of linking the submission to the test results. Since our testing suites run as a separate process, this means passing in the submission id as a parameter in some input file. And what if the tests need to be compiled server side as well? Do we have a separate section of code strictly for commands that operate on files that are not part of a submission? What if we want to add some extension in the future?

Beyond this problem, a database is meant for storing large amounts of data that can quickly be searched, aggregated and returned, which isn’t what we really need for this portion of the system. This has led me to the conclusion that a file-based solution is a much better route to go. Since the commands are given a working directory as part of their parameters (for relative path names), it will require no real modification to the existing command code and will allow for extensions in the future.

 

Another possible benefit of this is that if a user’s client doesn’t have the proper module installed to view the results of some command, the file can still easily be opened and read, provided it is text based.
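
As a rough illustration of the file-based approach, the sketch below writes test results as plain text into a command’s working directory and reads them back. The class name, the results.txt file name, and the line format are all assumptions made for the example; the actual command code and result format are not shown in this post.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: the file name and result format are assumptions.
class ResultFileSketch {
    static final String RESULT_FILE = "results.txt";

    // Write one result per line into the command's working directory.
    static Path writeResults(Path workingDir, List<String> lines) {
        try {
            Files.createDirectories(workingDir);
            return Files.write(workingDir.resolve(RESULT_FILE), lines, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Anything that can read text can read the results back.
    static List<String> readResults(Path workingDir) {
        try {
            return Files.readAllLines(workingDir.resolve(RESULT_FILE), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // The working directory ties the results to the submission; no id needs to be passed around.
        Path workingDir = Files.createTempDirectory("submission-");
        writeResults(workingDir, Arrays.asList("testAdd: PASS", "testDivide: FAIL"));
        readResults(workingDir).forEach(System.out::println);
    }
}
```

Since the results are ordinary text, a user without the matching client module could still open the file in any editor.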

The Flying Spaghetti (Code) Monster

The five days that have passed since my last post have been… interesting, to say the least. On Sunday, I dealt with getting the dependencies that aren’t available in any public Maven repositories into our repository. Artifactory has a really nice web interface, so this was a breeze. It only took me about 15 minutes, and that was mostly because the Dirmi and Cojen POMs needed tweaking.

After that, I went back to what I had originally been planning to do: look at the changes we need to make to the WPISuite Client in order to accommodate our server-driven module. It turns out that the needed changes are far from trivial. I’ve mapped out a general plan to overhaul the client’s core classes, trying to balance several competing needs:

  1. User-friendliness. We could avoid changing anything client-side if we decided to make people log in to the server from inside our module, but forcing people to log in again after they’ve just logged in to WPISuite is not very user-friendly.
  2. Reasonable scope. Really, the entire client needs a massive overhaul, especially to get database-layer stuff out of the GUI, but it’s not in the scope of this project to do something that would require significant changes to all of the existing modules.
  3. Preserve functionality. Similar to #2, we need to avoid making changes that are likely to break existing modules, while still making enough changes to accommodate our own module.

Keep those in mind while you evaluate my plan. The plan will seem like an over-engineered response to the problem, but I think it provides the best balance of the stated goals.

Basically, the plan grows out of the fact that, to maintain user-friendliness, there must be a single source of truth about the system. There can be exactly one answer to questions like whether a given user exists or how many invalid password attempts are allowed before a user is locked out. The existing client database (hereafter referred to as the legacy database) is not secure. Therefore, the single source of truth must be the server database, and the legacy database must be “slaved” to it.

A corollary of this is that access to a server database’s corresponding legacy database cannot be allowed to happen independently of access to the server database, lest the two lose synchronization. To deal with this (and simplify client configuration), I plan to add a legacy support module to the server. This module will be very simple: all it will do is obtain the connection information for the legacy database from the server’s configuration file and give the information to clients that ask for it. The client will be responsible for ensuring that any changes it makes to the server database are reflected in the legacy database. This is for several reasons, the most important of which is that if, instead, the server makes the changes to the legacy database, a lot of things in the client that rely on knowing when it has updated the legacy database will probably break.

Client-side, step one of the plan is to overhaul the client’s configuration procedure. Currently, the client is configured by two separate files: wpisuite.module, which lists the modules to be loaded, and database_config.ini, which supplies the database connection information. I plan to replace these with a more elaborate, but unified, configuration file. This file would contain at least one configuration profile. Each configuration profile would contain three crucial pieces of information:

  1. The configuration type: whether the profile is for a standalone configuration or for a server configuration.
  2. The list of modules to be loaded.
  3. Connection information appropriate for the configuration type.

The first thing WPISuite Client does when it loads will be to validate the active configuration profile. As noted above, the configuration file can contain multiple profiles, only one of which can be active at a time. The reason for allowing multiple configurations is two-fold. First, users might like a feature that allows them to quickly switch between multiple databases and servers. Second, and more important: this will make testing changes much easier for us. To facilitate this, the login view will be augmented with a selection box that allows the user to easily change the active profile, and the “database chooser” view will be replaced by a “configuration editor” view.

Validating a profile means that we need to be able to tell whether a given module is allowed to be loaded for the given configuration type. For example, we would not want to allow the Submission System module to be loaded in a standalone profile. Furthermore, since the main GUI hasn’t been instantiated yet, we must do this without instantiating the module. I plan to address this issue by creating a @ClientModule annotation, which will be added to each module’s main class. It will contain a field that defines whether the module is a standalone module or a server module and a field that defines which configuration types are valid for the module. Coming at module validation from the other direction, it’s entirely possible that a server may not have the legacy support module installed. If so, we would not want to allow standalone modules to be loaded in a profile associated with that server.
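
Here is a minimal sketch of how such an annotation could be declared and checked without instantiating the module. The field names (kind, validFor), the enums, and the two example module classes are assumptions made for illustration; only the @ClientModule name comes from the plan above.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Arrays;

// Hypothetical sketch: field names and enums are assumptions, not the final design.
class ClientModuleSketch {
    enum ConfigurationType { STANDALONE, SERVER }
    enum ModuleKind { STANDALONE, SERVER }

    // RUNTIME retention lets us read the annotation from the Class object alone.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface ClientModule {
        ModuleKind kind();
        ConfigurationType[] validFor();
    }

    @ClientModule(kind = ModuleKind.SERVER, validFor = { ConfigurationType.SERVER })
    static class SubmissionSystemModule { }

    @ClientModule(kind = ModuleKind.STANDALONE,
                  validFor = { ConfigurationType.STANDALONE, ConfigurationType.SERVER })
    static class LegacyModule { }

    // Decide whether a module may be loaded, using only its class metadata.
    static boolean isLoadable(Class<?> moduleClass, ConfigurationType type,
                              boolean serverHasLegacySupport) {
        ClientModule meta = moduleClass.getAnnotation(ClientModule.class);
        if (meta == null) {
            return false; // not annotated, so not a recognized module
        }
        if (!Arrays.asList(meta.validFor()).contains(type)) {
            return false; // e.g. a server module in a standalone profile
        }
        // Standalone modules in a server profile need the legacy support module.
        if (type == ConfigurationType.SERVER && meta.kind() == ModuleKind.STANDALONE) {
            return serverHasLegacySupport;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isLoadable(SubmissionSystemModule.class, ConfigurationType.STANDALONE, false));
        System.out.println(isLoadable(LegacyModule.class, ConfigurationType.SERVER, true));
    }
}
```

The check never calls a constructor, so it can run before the main GUI exists.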

So we’ve dealt with overhauling configuration. Now we have to deal with retrofitting login and other pre-main GUI functionality (user registration, expired password, etc) to be usable under a server configuration in addition to a standalone configuration. Most of these classes currently contain their controller functions in their views, so these will have to be separated first. Then we will simply write controllers for server configurations and select the appropriate controller at run-time.
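
The separation described here could look roughly like the following. This is only a structural sketch: the LoginController interface, its single method, and the class names are hypothetical, and the controllers return a label instead of performing real authentication.

```java
// Hypothetical sketch: names are illustrative, and no real authentication happens here.
class LoginControllerSketch {
    enum ConfigurationType { STANDALONE, SERVER }

    // The view depends only on this interface and holds no login logic itself.
    interface LoginController {
        String backend();
    }

    // Would wrap the existing logic that talks to the legacy database.
    static final class StandaloneLoginController implements LoginController {
        public String backend() { return "legacy database"; }
    }

    // Would delegate authentication to the server instead.
    static final class ServerLoginController implements LoginController {
        public String backend() { return "server"; }
    }

    // Run-time selection based on the active configuration profile's type.
    static LoginController controllerFor(ConfigurationType type) {
        switch (type) {
            case SERVER:
                return new ServerLoginController();
            case STANDALONE:
            default:
                return new StandaloneLoginController();
        }
    }

    // A stand-in for the login view after its controller code has been pulled out.
    static final class LoginView {
        private final LoginController controller;

        LoginView(LoginController controller) { this.controller = controller; }

        String statusLine() { return "Logging in via " + controller.backend(); }
    }

    public static void main(String[] args) {
        LoginView view = new LoginView(controllerFor(ConfigurationType.SERVER));
        System.out.println(view.statusLine());
    }
}
```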

We’ve now reached the point of being able to use a server to authenticate users, but our work still isn’t done. The entire main GUI, which would be instantiated at this point along with the configuration profile’s listed modules, is driven by the ActiveState object. This critical object allows modules to get and set information like who the currently logged-in user is. Obviously, our module has an interest in knowing when such information changes, but the data objects ActiveState makes available are the ones created by ActiveObjects from the legacy database. These objects are useless to server-driven modules. Therefore, I plan to make a subclass of ActiveState that will make objects provided by the server available and that will properly synchronize changes to these bits of information, regardless of whether a server module or a standalone module is changing them. In this way, server modules get the support they need, while standalone modules are oblivious to the altered flow.

And that concludes the overhaul plan as I have currently conceived it. The plan will involve a lot of work and I expect it will take weeks to fully implement, but the end result should be a client capable of seamlessly integrating server modules with standalone modules, which is our primary objective.

Having said that, I can already see one complaint about my plan: it doesn’t allow for independent use of the legacy database. As stated, this is a deliberate design decision, and most of the negative consequences of this decision can be overcome simply by providing scripts to “upgrade” from a standalone system to a server system or to “downgrade” from a server system to a standalone system (although obviously you would lose any data specific to server modules in a downgrade). The only catch is that passwords stored in the two databases will not be compatible and are non-recoverable (well, they’re not recoverable in a short amount of time), so the script would need to replace the passwords with new, randomly generated ones, e-mail each user to inform them of the upgrade/downgrade, and give them their new, temporary password (similar to what currently happens when a new user registers an account).

Comments? Criticisms? Let me know as I’m planning to begin work on this soon. Side note: I may be the only one of us working on it for a while. At our meeting today, James and I collaborated on filling out the TEN and its testing modules, and he is now focusing his efforts on that. Meanwhile, Chance is still working on the GUI mockups.

Dependency Smells

If you’ve taken SoftEng, then you’ve heard of “code smells”: hacks, dead code, and other debris that a project’s code tends to accumulate as it ages. To go along with code smells, I’d like to introduce the concept of “dependency smells”, which are dependencies that are in your project that are no longer needed or used.

How did I end up here? Well, earlier today, I exchanged a few e-mails with Mike Voorhis about being unable to access the Artifactory installation I had created on our virtual server from the outside world. Fortunately, he was able to resolve the problem in short order. I poked around in Artifactory for a bit, noting with pleasant surprise that it comes preprogrammed with the information necessary to access and cache artifacts from all of the major public Maven repositories. In fact, I didn’t have to add references to any other repositories. All of the dependencies we were already managing with Ivy were in at least one of the predefined repositories!

I thought about poking around in the GUI code that James and Chance have been working on, but I couldn’t launch the project. It was then that I recalled that they had created a second libraries folder for dependencies needed by the WPISuite Client and svn:ignore’d it. I guess I could have just copied the appropriate libraries over from the main WPISuite project, but it had always been my intention to switch the client dependencies over to Ivy at some point anyway, so I figured I may as well do it now.

Of course, I’m not one to engage in half-measures, so that meant that I first needed to determine which libraries were direct dependencies. I took all of the libraries off the WPISuite build path and proceeded to go through all of the source files, noting which missing imports Eclipse was crying about, then re-adding the library necessary to satisfy each complaint. When I had finished, only 11 of the 21 JARs in the lib directory were on the build path. I figured some were probably transitive dependencies required by other libraries, so I was not surprised when my first attempts to run the client failed. Adding two more libraries to the build path made WPISuite happy.

That still only accounts for 13 out of 21 libraries though. What are the other eight for? Well, I’m fairly sure that two of them are needed for running unit tests. The other six… they appear to be dependency smells. I can’t be 100% certain as it’s not like I went and thoroughly tested every part of the client, but as I performed some rudimentary actions like working with users and projects, I didn’t get any exceptions complaining about missing classes. On top of those six, I discovered that those two aforementioned transitive dependencies are actually no longer necessary: a newer version of the library exists that removes the need for them.

So we’ve got eight libraries that either aren’t being used now or wouldn’t be necessary if that one library were upgraded. Add them up and you get just shy of a megabyte worth of unnecessary libraries.

I found some more traditional code smells too. I couldn’t understand why HSQLDB was a direct dependency until I took a second look at the one place where it was being imported. Turns out that somebody clicked the wrong option when Eclipse asked them which class they meant and accidentally imported org.hsqldb.Types instead of java.sql.Types. This was in edu.wpi.wpisuite.data.Permissions, by the way, so it’s been there for at least six months without anyone noticing. I also can’t figure out what email.pl is for. It looks like it was being used to send the new user e-mails at one time, but was superseded by the JavaMail library and never deleted. That file has been there since day one, a full year and a half ago.

At any rate, I accomplished my objective, and the dependencies we need are now being managed using Ivy. Note that the client dependencies we currently need are a subset of the dependencies the full WPISuite Client needs, since we’re currently working on a stripped down version of the Client that contains only what we need to write our module. At some point later on, when we add the rest of the modules back in, I’ll add those dependencies to Ivy as well. I may also do some additional digging to find out where each library is being used and document that in our Ivy configuration.

Right now, my next step is to deal with the real reason I set up Artifactory, which is to get the libraries we’re using that don’t exist in any public Maven repositories out of source control. Artifactory appears to have a convenient way of adding artifacts to its local repositories from its web interface, so I’ll probably tinker with that tomorrow.

The Last Week In Perspective

Since this project started, I don’t think I’ve ever gone a full week before without posting anything on this blog. So why the sudden gap? Part of it is because nobody seems to be reading these things. The other part is because I was busy writing code in an attempt to push a new database initiative, but didn’t want to start running my mouth until I was sure it could reasonably be expected to work.

Rewind to last week. After I finished implementing our session system, I started hunting for cures for what ails our database. I’d been doing this on and off for weeks and got nowhere with it. As it turned out, that was because I didn’t really know what I was looking for. I began my search looking for a solution to what was (and still is) our biggest problem: how to perform queries. Ideally, what I wanted was something that is both database-agnostic (going along with the whole “database abstraction” theme) and type-safe, meaning that if the model classes were refactored at some point in the future, the queries would be refactored as well. Or, at least, we would get errors at compile-time rather than errors at run-time (or worse yet, strange bugs).

I found most of what I was looking for in QueryDSL. Using annotation processing, it generates the source for special query classes that can then be used to write type-safe queries. Caveats include the need to have annotations in the code (side note: the annotations themselves aren’t my problem; my problem is the fact that they would have to be Hibernate annotations, which would make the model classes non-portable) and the fact that queries won’t be automatically refactored with the model and query classes. I noted that QueryDSL included support for JPA and JDO in addition to Hibernate, which set off a new round of thoughts.

I don’t know why this never occurred to me earlier. A couple years back, I was looking for a solution to this exact same problem. At the time, none existed, or at least none that were very good. But I did encounter JPA and JDO. At the time, I discarded them because I was looking for a type-safe query solution, not a database abstraction solution. Now, I needed both. A few more leaps brought me to DataNucleus, which, it appeared to me, is exactly what we need. Maybe.

DataNucleus has certain caveats to it. For one thing, the type-safe query mechanism isn’t finalized yet. DataNucleus is actually the reference implementation of JDO, and said query mechanism is actually an implementation of the mechanism proposed for addition to JDO in version 3.1. Although the DataNucleus developer in charge of the implementation also happens to be the guy more-or-less in charge of defining the mechanism for the standard, it’s entirely possible that things might change before it goes gold.

The other significant caveat is in how DataNucleus works. Hibernate approaches the problem using reflection, versus DataNucleus’ bytecode enhancement approach. Neither of these is a perfect solution. Reflection is slow and tends to impose additional programming burdens, but bytecode enhancement is essentially magic that developers have no control over.

Despite the caveats, I decided to experiment with DataNucleus. My initial results were promising, so I progressed to a full-blown port of our prototype to DataNucleus, completely removing Hibernate. This went swimmingly also, and I managed to reproduce nearly all of the prototype’s functionality under Hibernate using DataNucleus instead. Then I tried to cluster our model objects using Terracotta. As I’d feared, multiple independent frameworks trying to modify the same bytecode didn’t go over so well. I was able to fashion a workaround by creating a special object to be used only for clustering that would contain all the information needed to re-find the pertinent objects once a TEN picked up a work item.

I discussed my DataNucleus experiment with James, and we spent several hours arguing Hibernate versus DataNucleus. Essentially, he argued that we’ve spent the last few weeks dithering over problems with the database and we needed to start making progress towards building a working system rather than continue dealing with database issues. I countered that I believed DataNucleus solved those problems and that we would be able to implement functionality faster with DataNucleus than with Hibernate. He countered that it would be foolish to replace Hibernate with DataNucleus at this juncture, given that we don’t know what problems we might still run into and that bytecode enhancement is more likely to cause headaches for us than reflection.

In the end, we arrived at a compromise of sorts in which I conceded the argument, for now. Our plan is to move forward with Hibernate in the short-term, despite the issues we both have where Hibernate is concerned. Once we’ve got a working system, we’ll revisit this discussion. I’m hoping that it will turn out that we can substitute DataNucleus for Hibernate without too many problems, because DataNucleus is a far superior framework, in my opinion. I especially like the fact that, if you don’t have to conform to an existing database schema, you can opt to have your model define the schema for you. Perhaps more importantly, Hibernate only works with relational databases. By contrast, DataNucleus can work with a much wider variety of datastores, including some object databases, a couple cloud databases, and even a number of flat file formats. Best of all, you don’t have to change any of your code to switch between them. Right now, we’re using MySQL as our database because it’s convenient, but by the end of the project, I’d prefer to be using an object database. Object-relational mapping of any kind imposes a huge performance hit, based on the charts and graphs I’ve seen, and it’s completely unnecessary.

That discussion wrapped up Sunday night (or, more accurately, Monday morning), and James spent the next couple days restructuring the project (which we both agreed was necessary) and starting to build system functionality. The team met on Wednesday for several hours, focusing on those issues. While James and Chance focused on our GUI, I made a few changes to the restructuring, then turned my attention to that oft-discussed but still not actually finished issue of setting up a local Maven repository. I don’t like Linux and the lack of VNC access on our virtual server doesn’t help. I finally managed to get Artifactory up and running, but couldn’t access it from my laptop. We discovered that our MySQL server also cannot be accessed remotely, so it appears that there’s a firewall in place preventing connections.

I’m going to e-mail Mike Voorhis about this, but it occurred to me that our virtual server is probably not the best place for the repository. Our code, obviously, is going to outlive our MQP, but the virtual server presumably won’t. I don’t know if The Powers That Be have plans for getting rid of t1k1/2, but if not, I think one of them would probably make a better home for our repository. Right now, though, I don’t have time to pursue this issue as I have work in other classes that needs attention. So expect to see more on this sometime next week.

The Session Solution

At our meeting on Monday, we discussed most of the things I talked about in my last post. The major stumbling block was the session management issue. We agreed that option #3 was preferable and Professor Pollice suggested we take a look at EJB Session beans. I spent a few hours on Monday examining them. They seem pretty good, although a bit more complex than what we’re doing right now. Unfortunately, we can’t use them. I discovered that these beans are designed to be remoted using standard Java RMI, which, as we already know, doesn’t work for us. The remoting mechanism doesn’t appear to be configurable or pluggable either, so it’s not like we can put in Dirmi instead.

I went looking for other solutions. The more I thought about the problem, the more I realized that the problem isn’t just about building one object (the session object), it’s about building an entire graph of objects that are connected to the session object. Specifically, all of the exports have to be tied to the session object somehow, since they need to know who’s logged in (if the user is logged in at all). The exports also need to be tied to the database underneath. And all of this wiring has to be accomplished without referencing any concrete implementations of anything, only interfaces.

Eventually, I arrived at PicoContainer. I whipped up a small test project earlier and it appears to suit our needs. I’m working on integrating it into the prototype. My design goes like this:

  • Define two scopes: an “application” scope and a “session” scope. Objects with application scope are effectively singletons: they will only be instantiated once and will not know anything about sessions. They will be allowed to depend on other application-scope objects, but not on session-scope objects. Objects with session scope will be instantiated exactly once for each session. They will be allowed to depend on other session-scope objects and application-scope objects.
  • Dependencies will be defined in constructors and injected accordingly. This has various benefits, as discussed on the PicoContainer site, the most important of which is that it doesn’t make us explicitly dependent on PicoContainer.
  • Adjust our registry to handle both application and session scope objects. All objects with session scope will be assumed to be exports that a client can use. This can be changed later if necessary by adding an additional scope, but for now, this assumption holds true.
  • Use PicoContainer to automatically wire everything together and produce an object graph for each session. Critically, this allows us to easily ensure that each session object will be instantiated exactly once per session. That is, if we have session objects A, B, and C, with B and C depending on A, then A will be instantiated once and B and C will be injected with references to the same instance of A. It’s possible situations may arise in the future where developers wouldn’t want this, but again, it can be fixed by adding an additional scope.
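
The two-scope design can be illustrated without any container at all. The sketch below wires the graph by hand in plain Java, which is the work a container such as PicoContainer would automate; it does not use the PicoContainer API, and every class name in it is hypothetical.

```java
// Hypothetical sketch: hand-wired constructor injection, no PicoContainer calls.
class ScopeSketch {
    // Application scope: created once, knows nothing about sessions.
    static final class Database { }

    // Session scope: object "A", created exactly once per session.
    static final class SessionInfo {
        final String user;
        final Database database;

        SessionInfo(String user, Database database) {
            this.user = user;
            this.database = database;
        }
    }

    // Session scope: objects "B" and "C", both depending on A through their constructors.
    static final class UserManagementExport {
        final SessionInfo session;
        UserManagementExport(SessionInfo session) { this.session = session; }
    }

    static final class SubmissionExport {
        final SessionInfo session;
        SubmissionExport(SessionInfo session) { this.session = session; }
    }

    // One object graph per session; B and C receive the same instance of A.
    static final class SessionGraph {
        final SessionInfo info;
        final UserManagementExport users;
        final SubmissionExport submissions;

        SessionGraph(String user, Database sharedDatabase) {
            this.info = new SessionInfo(user, sharedDatabase);
            this.users = new UserManagementExport(info);
            this.submissions = new SubmissionExport(info);
        }
    }

    public static void main(String[] args) {
        Database database = new Database();            // application scope
        SessionGraph alice = new SessionGraph("alice", database);
        SessionGraph bob = new SessionGraph("bob", database);

        System.out.println(alice.users.session == alice.submissions.session);
        System.out.println(alice.info == bob.info);
        System.out.println(alice.info.database == bob.info.database);
    }
}
```

Because the classes only declare their dependencies in constructors, none of them would need to import anything from the container that eventually does the wiring.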

As I implied earlier, I’m working on this but I haven’t finished yet, so it’s possible other problems will appear. Hopefully this should be finished and committed by the end of the day tomorrow and we can get on with implementing all of the exports the client will need.

Midnight Musings

I haven’t posted anything since Thursday, but not because I haven’t been doing anything. In fact, I’ve been looking at several problems, which I’m going to talk about in some depth here. I even have solutions for some of them! Try not to act so surprised.

But first, a recap of what I did after my post on Thursday, for those who don’t read the commit logs. I executed the package restructuring we discussed at our Wednesday meeting. I also made some improvements to the Registry to decouple CFN modules from a specific implementation. Dirmi gave me some headaches in that regard, but I was able to work around them. Finally, I fixed the UserManagement export so it is no longer tied to the Hibernate implementation of IUserManager.

There are just two problems. First, we really need to come up with names that are less similar for these two areas because it’s already caused a lot of confusion between James and me. Second, the UserManagement export is not really decoupled from the Hibernate implementation of IUserManager. I discovered that UserManager (and IUserManager, by extension) implements ITableManager<User, Criterion>. Criterion happens to be a Hibernate class… which means that we still have Hibernate tendrils reaching up into layers they shouldn’t. Over the last few days, James hasn’t been online at the same times I have, so I haven’t been able to ask him about this yet.
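
One possible direction, which is not something we have decided on, is to make the query type parameter an implementation-neutral class and let the Hibernate implementation translate it into a Criterion internally. The sketch below shows the shape of that idea with an in-memory stand-in; the UserQuery class, the method names, and the simplified interfaces are all assumptions, since the real ITableManager methods aren’t listed here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: simplified stand-ins for ITableManager and IUserManager.
class QueryAbstractionSketch {
    static final class User {
        final String username;
        User(String username) { this.username = username; }
    }

    // Implementation-neutral description of what the caller is looking for.
    static final class UserQuery {
        final String usernameEquals;
        UserQuery(String usernameEquals) { this.usernameEquals = usernameEquals; }
    }

    // Q is the query type; nothing here refers to a persistence framework.
    interface TableManager<T, Q> {
        void save(T item);
        List<T> find(Q query);
    }

    interface UserManager extends TableManager<User, UserQuery> { }

    // In-memory stand-in. A Hibernate-backed version would convert UserQuery
    // to its own query objects inside find(), keeping them out of the interface.
    static final class InMemoryUserManager implements UserManager {
        private final List<User> users = new ArrayList<>();

        public void save(User user) { users.add(user); }

        public List<User> find(UserQuery query) {
            List<User> matches = new ArrayList<>();
            for (User user : users) {
                if (user.username.equals(query.usernameEquals)) {
                    matches.add(user);
                }
            }
            return matches;
        }
    }

    public static void main(String[] args) {
        UserManager manager = new InMemoryUserManager();
        manager.save(new User("alice"));
        manager.save(new User("bob"));
        System.out.println(manager.find(new UserQuery("alice")).size());
    }
}
```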

On the topic of modularity, we’ve made some progress towards modularizing our Hibernate configuration. The Hibernate configuration file is now eliminated and the database connection information is now pulled from the WPISuite Server configuration file. There’s still more work to be done though. Right now, we still have an issue with not having actually defined a standard User, only a SubmissionSystemUser. Having a special subclass is a bad idea, in my opinion, but the only reason we have it is to determine which courses a given user is a professor for, so I think we can eliminate this problem. James mentioned something at the meeting about trying a different way of handling this linkage. We also have the issue that the HibernateConfig class is currently housed in the Submission System Database package. We still haven’t really decided whether every module should run its own Hibernate session or whether they should all share one. I’d prefer the latter, but we have to figure out how to make it so all the modules can contribute to the mapping configuration first.

At the same time, I’ve been looking at modularity issues as they relate to Terracotta. Unlike with Hibernate, we won’t be able to completely eliminate the Terracotta configuration file. This is simply a consequence of the fact that Terracotta has to instrument classes at run-time, which means that it has to know which classes it needs to modify before they get loaded (and that Terracotta has to be up and running before any of our code executes). Still, we can and should be able to modularize a lot of the configuration. Any given module will have a set of roots, locks and classes that Terracotta needs to know about, and these can be specified modularly by making our module JARs conform to the Terracotta Integration Module (TIM) specification. On the surface, this isn’t particularly hard: all we have to do is define one OSGi header in our manifest and have an XML fragment containing the appropriate sections of Terracotta configuration in the JAR’s root.

That latter part is proving more troublesome than I expected. The Terracotta Eclipse plugin puts all the configuration into one file, which is good from a development perspective because it makes it easy to make and test changes. But it also means that the configuration information for every module is in the same file and it’s effectively impossible to tell which bits of configuration belong to which module. Right now, my best idea for dealing with this is to put comments in the configuration file to define the start and end sections for each module, then have an Ant task that parses this appropriately to build the fragments. Alternatively, the fragments could just be maintained manually. I haven’t really decided what to do yet.
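For the comment-marker idea, the parsing half might look something like the sketch below. The marker wording (BEGIN MODULE / END MODULE) and the FragmentSplitter class are assumptions made up for this example; wrapping it in an actual Ant task and writing the fragments into the JARs is left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FragmentSplitter {
    // Hypothetical marker comments; the exact wording is an assumption of this sketch.
    private static final Pattern BEGIN =
            Pattern.compile("<!--\\s*BEGIN MODULE (\\S+)\\s*-->");
    private static final Pattern END =
            Pattern.compile("<!--\\s*END MODULE (\\S+)\\s*-->");

    // Returns module name -> trimmed lines found between that module's markers,
    // in file order. A module may have several marked regions in one file.
    static Map<String, List<String>> split(List<String> configLines) {
        Map<String, List<String>> fragments = new LinkedHashMap<String, List<String>>();
        String current = null;
        for (String line : configLines) {
            Matcher begin = BEGIN.matcher(line);
            Matcher end = END.matcher(line);
            if (begin.find()) {
                if (current != null) {
                    throw new IllegalArgumentException("BEGIN inside module " + current);
                }
                current = begin.group(1);
                if (!fragments.containsKey(current)) {
                    fragments.put(current, new ArrayList<String>());
                }
            } else if (end.find()) {
                if (current == null || !current.equals(end.group(1))) {
                    throw new IllegalArgumentException("Unmatched END for " + end.group(1));
                }
                current = null;
            } else if (current != null) {
                fragments.get(current).add(line.trim());
            }
            // Lines outside any marked region are shared config and are skipped.
        }
        if (current != null) {
            throw new IllegalArgumentException("Missing END for " + current);
        }
        return fragments;
    }
}
```

Note that this only collects the lines between markers. A real task would also have to remember which configuration section each region came from and re-wrap the lines in that section's element before writing the fragment.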

This brings me to my final topic for this post, which is communication security. For the first time, I noticed that Terracotta actually doesn’t have any configuration options to prevent unauthorized access to a cluster. I figure we can work around this by noting it in our documentation and recommending a firewall rule: on the system(s) running the Terracotta server, attempts to talk to the DSO port should be blocked unless they come from an authorized IP address. (Note that our WPISuite Server instances are Terracotta clients.)

I also looked at security as it relates to Dirmi, mostly because the question of how to handle authorizing user actions is still bugging me. Basically, it seems to me that we have three choices for doing this:

  1. Send the username and password on every method call.
    Pros: Relatively easy to implement. Guarantees a high level of security.
    Cons: Lots of communication overhead (since we have to send two strings on every method call) and lots of computation overhead (since we have to validate them on every method call).
  2. Authenticate the user, then return a randomly generated session ID to be used in subsequent calls.
    Pros: Also fairly easy to implement. Much less communication and computation overhead than option #1.
    Cons: Increases memory usage. Doesn’t guarantee the same security as option #1; session hijacking is a well-known and well-researched issue. On the other hand, this is the way the Web works, and a sufficiently large and random ID should make life very difficult for attackers.
  3. Get Dirmi to handle the session issue for us.
    Pros: Easiest to implement. Even less overhead than option #2. May provide more security than option #2, given that Dirmi Sessions appear to operate in isolation from one another.
    Cons: More memory usage than option #2. Aspects of Dirmi that impact our security may change in the future. On the other hand, Dirmi is open-source, so we can always fork it and make our own “hardened” version if need be.
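For comparison, option #2 could be as small as the sketch below. The SessionRegistry class and its method names are invented for illustration, and the sketch assumes the username/password check has already happened before open() is called. It uses java.security.SecureRandom rather than a general-purpose random number generator so that IDs are hard to predict.

```java
import java.math.BigInteger;
import java.security.SecureRandom;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical server-side registry for option #2 (random session IDs).
class SessionRegistry {
    private final SecureRandom random = new SecureRandom();

    // Session ID -> username. This map is the extra memory cost noted above.
    private final Map<String, String> sessions = new ConcurrentHashMap<String, String>();

    // Called once, after the username and password have been validated.
    String open(String username) {
        String id = new BigInteger(128, random).toString(16); // 128 random bits as hex
        sessions.put(id, username);
        return id;
    }

    // Called on each later request: one map lookup instead of re-checking credentials.
    // Returns null when the ID is unknown or has been closed.
    String userFor(String sessionId) {
        return sessionId == null ? null : sessions.get(sessionId);
    }

    void close(String sessionId) {
        if (sessionId != null) {
            sessions.remove(sessionId);
        }
    }
}
```

This leaves out expiry and transport security, both of which matter for the hijacking concern: an ID that never expires or that travels in the clear is much easier to steal than to guess.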

My personal preference is for #3. Session isolation, if Dirmi really is doing that (it’s a little hard to tell from looking at the source code), provides the best security we could ask for. Even if sessions aren’t isolated, I’ve determined from debugging and reading the source that object IDs are long integers generated using the Mersenne Twister algorithm. That’s roughly 18 quintillion possible IDs to check, which is a pretty big haystack, and even if you find one that maps to an object, there’s no guarantee it’s the one you want. This too could change at any time, but again, it’s open-source, so if it’s really a problem, we can always make our own version. Or just not upgrade to newer versions, since we know the behavior of the current version.
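To put a number on the haystack, here is a small back-of-the-envelope calculation. The IdSpace class is made up for this post, and the guess rate is an arbitrary assumption. It only models blind guessing at one specific ID; it says nothing about how predictable the generator's output might be, and it ignores the fact that many objects may be live at once.

```java
import java.math.BigInteger;

// Back-of-the-envelope arithmetic for blind guessing of a 64-bit object ID.
class IdSpace {
    // Number of distinct values a 64-bit long can take: 2^64.
    static BigInteger size() {
        return BigInteger.ONE.shiftLeft(64);
    }

    // Blind search for one specific ID checks about half the space on average.
    static BigInteger expectedGuesses() {
        return size().shiftRight(1);
    }

    // Whole years needed at the given rate, using 365-day years.
    static BigInteger yearsAt(long guessesPerSecond) {
        BigInteger secondsPerYear = BigInteger.valueOf(365L * 24 * 60 * 60);
        return expectedGuesses()
                .divide(BigInteger.valueOf(guessesPerSecond))
                .divide(secondsPerYear);
    }
}
```

IdSpace.size() is 18,446,744,073,709,551,616, which is where the "18 quintillion" comes from. At an assumed one million guesses per second, IdSpace.yearsAt(1000000L) works out to roughly 290,000 years.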

Anyway, I hope that was all fairly coherent, since it’s now past two o’clock in the morning (yes, James, morning, not night!). More on all of this in the near future, hopefully…

The Coupling Game

Yesterday, the team met for six or seven hours. The main goal of the meeting was to iron out the remaining code and project structure issues so we can move past this point we’ve been stuck at for a while now. To make a long story short, James and I agreed that we wanted to make changes to the project structure and further decouple the datastore from the rest of the application. Unfortunately, technical limitations of the language (mostly in relation to generics) effectively prevent further decoupling, and sanity limitations prevented some of the project restructuring we were trying to do. In the end, we decided that these changes weren’t going to add sufficient value to be worth expending additional effort on, so we’re leaving things mostly the way they are.

So what does this mean for our datastore and how it interacts with the rest of our code? James could probably answer this better than I can, but the coupling should still be reasonably low, unlike with the WPISuite Client. Any datastore that might replace Hibernate in the future would have to follow a few rules:

  1. It would have to be able to deal with data classes that are POJOs. I think this is a good thing, since it prevents datastore-specific functionality from leaking into the rest of the application, but this would pose a problem for some datastores that require data classes to extend a base class or implement a base interface.
  2. All interactions with the database have to be done using classes that implement what we’re presently calling the “table manager” interface, so classes that implement this interface would have to be written for the other datastore.

We also briefly explored the possibility of using OSGi instead of JSPF. We decided against OSGi for a few reasons:

  1. Making OSGi and Terracotta work together looks painful. The only discussion I could find about doing this was written a couple years ago and involved writing a bunch of Java code to deal with the classloader problem.
  2. OSGi would introduce a lot more overhead than JSPF.
  3. OSGi doesn’t add any features we want, and arguably introduces some we don’t want. In particular, OSGi allows bundles to be removed at any time, which is something that would pose a serious problem for us, since it would mean that exports we offer to the client might suddenly become unavailable at any time. Yes, I know, there are event listeners that we can use to fix this problem, but the point is that using OSGi creates problems without delivering any benefits.

So, what are we planning to do now? Well, there’s going to be some package restructuring so that the structure makes more sense. We’re also going to finish interfacing out the table managers and have JSPF load the implementations at run-time, so that our code is not dependent on the Hibernate implementations. I’m also going to be adding some documentation to interfaces and classes elsewhere that I wrote. James has already mostly dealt with one of the issues I was going to look at, which was moving the database configuration into our WPISuite Server configuration file. I have a few quibbles with this that I’m going to talk to him about. After that, it’s full steam ahead on writing the exports that the clients will interact with.

Another day, another five-and-a-half hour meeting

As the title of the post implies, we met again today for 5.5 hours. We did not get that much work done, but we did iron out some miscommunications and solve (or rather, work around) some problems.

The big issue for our code was the whole “database as a plugin” thing. It turned out that James and I had completely different interpretations of what that meant, but we talked it out and agreed on a definition. Basically, interaction with the datastore will be defined by the individual modules. They’ll provide the implementations of TableManagerInterface for their data classes. In the event that a module needs access to data that is in the custody of another module, it will be able to obtain managers from the other module through the TableManagerUtil. I think that as long as all modules are using the same datastore, this should work fine. In the event that there was some sort of mixture of different datastores (especially different paradigms), I’m not sure how this would hold up. But presumably if there were a decision to switch to a different datastore in the future, all modules would be converted, not just some. In any case, it’s kind of an abstract problem and I’m not particularly concerned about it.

What I am particularly concerned about are the two Terracotta bugs we uncovered. The first is related to the StreamCorruptedException problem we first encountered when trying to run the demo on Monday. My belief that a version mismatch was the cause turned out to be wrong, and I ended up spending about another two hours trying to figure out the problem before I finally stumbled onto it. It turns out that the problem only occurs if you’re running Windows 7 (older versions may be affected also, but I don’t have any systems running them to test) in 32 or 64-bit trim, using the Terracotta Eclipse plugin version 3.4.0, using a recent version of the JRE/JDK (6u21 and 6u22 fail, but 6u16 and 6u17 work fine), and it’s not the first time you’ve tried to run the Terracotta server after importing the project. That last one is seriously weirding me out: if I delete the project and then re-import it from SVN, Terracotta runs fine the first time I try to run it, but after I stop it, the exceptions occur on any subsequent attempts to start it! This is why the demo failed on Monday: I ran the demo Sunday night to make sure everything worked, but I only ran it once, so I didn’t discover the problem.

The second bug will eventually prove very problematic if it’s not fixed, but only for James. While we were attempting to nail down the details of the SCE bug, I asked him to try different versions of the JDK on his computer. Well, he didn’t get any exceptions, no matter what version of the JDK he tried (6u16, 6u21, and 6u22). Instead, once he launched the Terracotta server, Eclipse just hung. And when I say it hung, I mean it really hung: the UI was completely unresponsive and he had to manually kill the processes. It’s as if the plugin still thinks it’s trying to launch the server even though the console quite clearly displayed the “this server is now an active node and is ready for work” message. Until a fix or a workaround is found, James won’t be able to develop anything that requires clustering behavior. Fortunately, right now we’re working on developing the CRUD functions that need to be exposed to the client, which means that James can run our code as a standard Java application with no problems because that doesn’t require clustering.

I registered an account over at Terracotta and replied to a thread in their forum started by someone else who was encountering the SCE issue, detailing what we’d discovered. One of their developers promptly replied and asked me to submit a bug in their JIRA tracker. I tried to do so, but JIRA kept rejecting my credentials. After a short back-and-forth with the developer over private messages, I realized that while their forum was happy to log me in as “CaptainR”, JIRA was only happy if I entered my username as “captainr”. Once I was able to get in, I filed bugs for both of the above issues. I’m also watching them, which means I get e-mail updates when anything happens on either of them. The SCE problem is CDV-1527 and the hang issue is CDV-1528. Immediately after I posted the hang issue, that same developer posted a comment asking for a thread dump of the server process during the hang. Since my computers are unaffected, I’ve asked James to create an account over at Terracotta and talk to this guy directly rather than try to be an intermediary between them.

In between dealing with all these problems, I did manage to get a few things done. First, before the meeting, I made some improvements to our export registry, which is now considerably more type-safe. During the meeting, I added the Ant build to the project’s build process so that module JARs are rebuilt every time the project is run. Previously the build had to be run manually, which risked developers changing a module, forgetting to rebuild the JARs, and then wondering why their changes had no effect. I also implemented our first real export: IUserManagement, which currently contains a registerUser() method that you can call to create a new user in our database.
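For anyone wondering what "more type-safe" means here, the general technique is to key the registry on the export's interface class, so lookups need no casts at the call site. The sketch below shows that technique in isolation; it is not our actual registry code. ExportRegistry, InMemoryUserManagement, and the registerUser(String) signature are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical signature; the real registerUser() parameters may differ.
interface IUserManagement {
    void registerUser(String username);
}

// Registry keyed by the export's interface class.
class ExportRegistry {
    private final Map<Class<?>, Object> exports = new HashMap<Class<?>, Object>();

    // The compiler rejects an implementation that does not implement T.
    <T> void register(Class<T> type, T implementation) {
        exports.put(type, type.cast(implementation));
    }

    // No cast at the call site: the Class token carries the return type.
    // Returns null when nothing is registered for the given interface.
    <T> T lookup(Class<T> type) {
        return type.cast(exports.get(type));
    }
}

// Toy implementation that just remembers usernames in memory.
class InMemoryUserManagement implements IUserManagement {
    final List<String> users = new ArrayList<String>();

    public void registerUser(String username) {
        users.add(username);
    }
}
```

A caller would write registry.lookup(IUserManagement.class).registerUser("alice") and get a compile error, rather than a ClassCastException at run time, if the types do not line up.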

James and Chance, meanwhile, were discussing problems related to WPISuite Client’s GUI and trying to figure out how to shoehorn our server-driven module into their client-driven structure. We’ve figured out how to get a blank panel we can work with to build our screens, but that still leaves the problem of authenticating users. Essentially, there are two ways to solve that problem:

  1. We can provide our own separate login. This has the benefit of not requiring us to mess with the rest of the client’s code, but means that it would no longer be single sign-on.
  2. We can replace the client’s login with a login that goes through our server. The pros and cons are the reverse of option #1: we have to mess with more client code, but we retain SSO.

The consensus was to go for option #2. What we’re planning to do is to have all of the user data, including password hashes, in the server database. This is the data that will be used for authentication purposes. At the same time, we’ll create a “shadow” record in the client database that contains all the same information, minus the password hash. All of the client modules will thus work as they do now, without modification, but the user data in the client database will only be used for linking to other things in the client database. It will not be used for authentication at all.
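In code terms, the split between the two records might look roughly like this. The class and field names (ServerUser, ShadowUser, id, username, passwordHash) are placeholders, not our actual schema, and a real user record would carry more fields.

```java
// Authoritative record, stored only in the server database.
// All names here are hypothetical placeholders.
class ServerUser {
    final long id;
    final String username;
    final String passwordHash; // used for authentication; never leaves the server

    ServerUser(long id, String username, String passwordHash) {
        this.id = id;
        this.username = username;
        this.passwordHash = passwordHash;
    }

    // Copies every field except the password hash.
    ShadowUser toShadow() {
        return new ShadowUser(id, username);
    }
}

// "Shadow" record stored in the client database. It exists so other client
// tables can link to a user; it carries nothing that could authenticate anyone.
class ShadowUser {
    final long id;
    final String username;

    ShadowUser(long id, String username) {
        this.id = id;
        this.username = username;
    }
}
```

Keeping the hash out of the ShadowUser type entirely means the client database cannot end up holding it by accident, and sharing the same id on both sides keeps the two records easy to match up.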

So that’s what went on today. I have a short list of miscellaneous things that need to be done, like adding documentation in a few places and figuring out how to move certain elements of the Hibernate configuration file (like what database to connect to) to our configuration file. I’m planning to work on these tomorrow.

Extra! Extra! Read all about it!

I haven’t made a blog post in a couple days now, mostly because after marathoning the MQP through most of last week, I needed to turn my attention to other classes. Yes, I know, that is utter blasphemy, but I’ve never NR’d any courses and I’d like to keep it that way. Seeing as I haven’t really done any coding since my last post, this post is basically just going to be a news roundup. So, in the news:

  • Ivy continues to be flaky, but it’s really Eclipse’s fault this time. As far as I can tell, there’s no way to specify a custom ivysettings.xml without hardcoding the project name, because all of Eclipse’s ${project_*} variables are determined by the currently selected project, rather than the project the file is part of.
  • Lousy database abstraction is what you get for not picking Team 4! (You know who you are, you!) Chance, James, and I met for about five hours yesterday and had a serious moment of consternation when we realized just how far ActiveObjects’s tendrils extend into WPISuite Client’s GUI. James did some work on this after the meeting and informed me late last night that he has come up with a solution.
  • Terracotta is probably fixed now. The problem appears to be caused by a version mismatch: my laptop had Terracotta server 3.3.0, but Eclipse plugin version 3.4.0, installed. Looks like somebody over at Terracotta screwed up and updated their Eclipse plugin a few days early, before version 3.4.0 of the server was released. I checked back today and server 3.4.0 is now available, so I’ll install that on my laptop and confirm that that’s the problem.
  • Our virtual server is now in the works. I got an e-mail about an hour ago from Mike Voorhis asking what software we need installed on it. I gave him the list, but told him that since there’s nothing especially hard to install on the list, he can feel free to just set up the server and let us handle it if he’s busy.

I’m going to be busy with other work all day today, so I probably won’t work on the Terracotta issue (or any other MQP-related stuff) until tomorrow.
