Importing FreeMind Mind Maps into TiddlyWiki 5

One of the biggest improvements I have made to my personal development workflow is keeping a commonplace book of all the things I have been tinkering with, for “fun” and for work. The process has worked best since I started keeping it in a TiddlyWiki, which suits the material nicely. That setup may be worth its own post at some point, though I have written about it at least once already.

I still use FreeMind mind maps for brainstorming or freewheeling research while things are very much in flux. I love mind maps during that phase, but I don’t find them as attractive a tool for longer-term knowledge management. This usually leads me to the point where I want to import my notes into TiddlyWiki. Now, it is entirely possible to simply export an image or HTML page from FreeMind and add the file to TiddlyWiki. It is also possible to attach the raw .mm file. In some cases, this may even make sense.

Sometimes, however, it makes more sense to dump the map as an outline in wiki text format. To help with this, I have created an XSLT stylesheet (mostly because I had never done real, dedicated work with XSLT) that can be used fairly readily. It is on GitHub at https://github.com/michaeljmcd/mm2tiddlywikitext under an MIT license. One of these days I might package it into a proper standalone utility. Maybe not. We’ll see.
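
Applying the stylesheet is just a matter of running the .mm file through an XSLT processor. Here is a minimal sketch using xsltproc, assuming the stylesheet is XSLT 1.0-compatible; the file names are illustrative, so check the repository for the actual stylesheet name:

 $ xsltproc mm2tiddlywikitext.xsl notes.mm > notes.txt

The resulting wiki text can then be pasted into a new tiddler.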

Creating Integration Tests with JNDI

Technically speaking, an automated test that requires JNDI is not a unit test. As an aside, it is preferable to segregate the portions of the application that access JNDI so that as much of the application as possible remains unit-testable. Nevertheless, if JNDI is used, something must ultimately exercise it, and it is preferable to be able to test that code before it is used in production.

So far, the best starting point for me has been to use Simple JNDI to provide the JNDI context and allow the rest of the code to work unimpeded. The original Simple JNDI project has decayed a bit; an updated fork with bug fixes is available on GitHub under a different group ID (https://github.com/h-thurow/Simple-JNDI). I also used H2’s in-memory database so that I could put real connection information in the test case.

To get started, I added these dependencies to my pom.xml:

<dependency>
    <groupId>com.github.h-thurow</groupId>
    <artifactId>simple-jndi</artifactId>
    <version>0.12.0</version>
    <scope>test</scope>
</dependency>
<dependency>
    <groupId>com.h2database</groupId>
    <artifactId>h2</artifactId>
    <version>1.4.192</version>
    <scope>test</scope>
</dependency>

It took some trial and error to get the configuration right, which is one of the motivations for this writeup. The first thing to do is add a jndi.properties file to your test resources. The settings discussed below were chosen to emulate the Tomcat server setup that we use here.

java.naming.factory.initial = org.osjava.sj.SimpleContextFactory
org.osjava.sj.root=target/test-classes/config/
org.osjava.sj.space=java:/comp
#org.osjava.sj.jndi.shared=true
org.osjava.sj.delimiter=/

Notice that the root is given relative to the pom. This is one thing that caused me a great deal of grief on the first pass. Another important element was the space option, which was needed to emulate Tomcat’s environment. Within the test resources, I added a config folder containing a single file, env.properties, whose contents looked like the following:

org.example.mydatasource/type=javax.sql.DataSource
org.example.mydatasource/url=jdbc:h2:mem:
org.example.mydatasource/driver=org.h2.Driver
org.example.mydatasource/user=
org.example.mydatasource/password=

The data source in question did indeed have a dotted name, which only added to the initial confusion. This allowed my code under test to work as anticipated. For reference’s sake, this is what the Java code looked like:

private DataSource retrieveDataSource() {
    try {
        Context initContext = new InitialContext();
        Context envContext  = (Context)initContext.lookup("java:/comp/env");

        return (DataSource)envContext.lookup(DATA_SOURCE_NAME);
    } catch(NamingException e) {
        log.error("Error attempting to find data source", e);
        return null;
    }
}
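
For completeness, the configuration can also be exercised directly with a small integration test. This is only a sketch, assuming JUnit 4 on the test classpath; the class and method names are mine, but the lookup strings come straight from the configuration above:

import static org.junit.Assert.assertNotNull;

import java.sql.Connection;
import javax.naming.Context;
import javax.naming.InitialContext;
import javax.sql.DataSource;
import org.junit.Test;

public class JndiDataSourceIT {
    @Test
    public void dataSourceIsBoundUnderJavaCompEnv() throws Exception {
        // jndi.properties on the test classpath points InitialContext at Simple JNDI
        Context initContext = new InitialContext();
        Context envContext = (Context) initContext.lookup("java:/comp/env");

        // "org.example.mydatasource" matches the keys in env.properties
        DataSource dataSource = (DataSource) envContext.lookup("org.example.mydatasource");
        assertNotNull(dataSource);

        // the jdbc:h2:mem: URL means a real connection can be opened with no external setup
        try (Connection connection = dataSource.getConnection()) {
            assertNotNull(connection);
        }
    }
}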

Some Additional Thoughts on Large ebook Conversions

I absolutely love exploring books and acquiring new reading material. The quest for more has often led me all over the public-domain-loving internet in search of obscure texts. Gutenberg, the Internet Archive, CCEL and Sacred Texts are among my favorite haunts. I often find myself converting texts for display on my Nook Simple Touch (this older piece of tech is probably worth its own post at some point). Calibre is, of course, a natural tool of choice, but I have found something odd: when dealing with larger texts, especially those of a more technical nature (as opposed to general fiction), Calibre has very limited options for taking a book from plain text to a formatted version. Most of the options it does present lean heavily on Markdown.

This design choice is a reasonable one, but it often breaks down for texts that are not sufficiently close to Markdown. One of my recent conversions is an excellent example. I have been looking for good concordances of the Bible for my ereader, to help with Bible study and general writing when all I have is a paper notebook and my Nook. It turns out that the options for concordances in either the Barnes and Noble or Amazon stores are relatively limited. So, I turned to CCEL and attempted to convert “Nave’s Topical Bible.”

When attempting to convert from plain text, one of the biggest difficulties is structure detection. If you look at the Calibre documentation on structure detection (https://manual.calibre-ebook.com/conversion.html#structure-detection), one of the more obvious things is that chapter detection occurs after a book has been converted to HTML. There are effectively no options to control structure detection in the conversion from plain text to HTML.

What I wound up doing was falling back on the old txt2html tool, which has some more complete options than those in Calibre. I ended up using commands like the following to convert to HTML manually.

 $ txt2html -pi -pe 0 ttt.txt -H '^[A-Z]$' -H '^\s\s\s[A-Z][A-Za-z- ]+$' > ntt.html

This approach isn’t all gravy. It requires some manual tinkering to find good regexes, and different books call for different ones. Here is another example from a book I converted.

 $ txt2html -pi -pe 0 ntb.txt -H '^[A-Z]$' -H '\s\s\s[A-Z-]+$' > ntb.html

In some cases, I even had to add a level of headers for use in the books.

Weighing in on JavaScript Package Managers

I have quite recently begun work on an open source project with a Node back end and front-end work planned in React. This is my first full effort to work with the latest and greatest in JavaScript tooling. We use Ext JS and Sencha Cmd at work and, whatever else you want to say about that stack, it is different. My last full-blown front-end development was before the real Node boom, and I pretty much did it the old-fashioned way — namely, downloading minified JavaScript by hand and referencing it in my markup (shaddup ya whippersnappers).

JavaScript saw a real explosion of package managers a few years ago, a natural development for a growing ecosystem that had none. Market forces naturally took over and many of the earlier entrants have been culled out of existence. There are really two main options at this point: NPM and Bower. Bower has enjoyed a healthy following, but (by my entirely unscientific survey) it appears that the NPM uber alles faction within the JavaScript world is growing stronger.

The sentiment is echoed in other places, but http://blog.npmjs.org/post/101775448305/npm-and-front-end-packaging gives a good overview of the fundamental syllogism. It runs roughly like this: package management is hard; NPM is large and established; therefore, you should use NPM everywhere rather than splitting packages across multiple managers.

The argument has a certain intrinsic appeal – after all, the fewer package managers, the better, right?

The real problem is that it is possible to use NPM as a front-end package manager, but it is deeply unpleasant. Systems like Browserify and Webpack are needed to prepare dependencies for usage on the front-end. These are complex and, to a degree, brittle (I ran into https://github.com/Dogfalo/materialize/issues/1422 while attempting to use Materialize with an NPM application).

Even if one assumes that every package can ultimately be Browserified (and it doesn’t seem like an overly optimistic assumption), the effort seems to be pure waste. Why would I spend time writing complex descriptors for modules on top of their existing packages? For all its shortcomings, Bower seems more robust. I spent a few hours fiddling with Browserify and Materialize without much success (although I think I do now see how Browserify would work), but mere minutes wiring up Bower.
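
For contrast, the Bower side of that experiment amounted to little more than the following (the package name here is from memory, so treat it as a sketch rather than gospel):

 $ bower install materialize --save

After that, the CSS and JavaScript can be referenced directly from the bower_components directory in the markup, with no bundling step in between.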

This does not get into the fact that Browserify/Webpack require additional information to extract CSS, images and web fonts. Even when things are working, it would require constant effort to keep it all up to date.

At the moment, NPM, even NPM 3, simply does not have good answers for setting up front-end development. The NPM proponents really, in my opinion, need to focus on making front-end modules more effective rather than pushing tools that are little more than hacks, like Browserify and Webpack. At this point, I am just going to rock out with Bower. Maybe someday I will be able to trim out Bower — but I would rather spend time coding my application than giving NPM some TLC.

Converting Large Text Files to epub with Calibre

I spent some time debugging a long-standing issue I have had using Calibre to convert large text documents to epubs for viewing on my Nook. The normal course of events was that I would feed a large (multi-megabyte; the example I was debugging with was 5.5 MB) text document into Calibre and attempt to convert it to an epub with the defaults. After a lot of churning, Calibre would throw a deep, deep stack trace with the following message at the bottom:

calibre.ebooks.oeb.transforms.split.SplitError: Could not find reasonable point at which to split: eastons3.html Sub-tree size: 2428 KB

I have long been aware that large HTML documents have to be chunked for epub conversion, although I do not claim to know whether this is mandated by the spec or merely allowed and needed as a technical requirement of individual readers. In either event, Adobe Digital Editions devices, like the Nook, require chunks of no more than 260 KB. The error is clear in this light: for some reason, Calibre was unable to create small enough chunks to avoid the issue.

My working assumption had been that Calibre would chunk the files at the required size, so that every 260 KB, give or take a bit to find the start of a tag, would become a new file. The default, however, is to split on page breaks. Page break detection is configurable, but it defaults to first- and second-level heading tags (h1 and h2) in the HTML. When your document is plain text, as opposed to Markdown or some such, few, if any, such headers will be generated. This can cause Calibre to regard the entire document as a single page, which it cannot determine how to split into smaller files.

Converting a large, plain-text document to Markdown or HTML by hand is far too manual a task for someone who simply wants to read an existing document. My approach was more straightforward: I changed the heuristic used to insert page breaks.

On the Structure Detection tab (when using the GUI), there is an option entitled “Insert page breaks before (XPath Expression):”. I replaced the default (which was the XPath for H1 and H2 tags) with the following:

 //p[position() mod 20 = 0] 

This will insert a page break every 20 paragraphs. The number was utterly arbitrary. Because paragraphs are usually well-detected, this worked fine. My large 5.5 MB file, a copy of Easton’s Bible Dictionary from CCEL, converted as expected.
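
The same change can be made from the command line. As best I recall, --page-breaks-before is the CLI counterpart of the GUI option, but treat this as a sketch and check it against your Calibre version’s ebook-convert options (the file names are illustrative):

 $ ebook-convert eastons3.txt eastons3.epub --page-breaks-before "//p[position() mod 20 = 0]"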

Features & Identity

Recently, I was reading the Wall Street Journal’s article about Facebook working to incorporate Twitter-style hashtags into its platform (source: http://online.wsj.com/article/SB10001424127887323393304578360651345373308.html). The article has comparatively little to say. Like most mainstream treatments of technology, it is mostly a fluff piece, but one thing caught my eye.

The writer, and most likely Facebook itself, have lost sight of vision while staring at features. Twitter’s hashtag concept works because Twitter is built as a broadcast system. What I say to anyone, I say to the world. So, cross-referencing user posts by tag gives me an idea of what everyone on Twitter has to say about a specific topic.

Facebook is not, by design, a broadcast system. It really does aim to be more of a social network. When I use Facebook, the focus is on the set of people that I know. The cross-referencing idea has very limited usefulness in the echo chambers of our own friends, family and acquaintances. For better or worse, we probably already know what they think.

Both Twitter and Facebook need to concentrate on vision, especially the latter, which seems to have the larger share of feature envy. The focus is not on hashtags; it is on whether I want to communicate with a circle of friends or broadcast to the whole world. In all honesty, there is room for both, provided that they can find a way to monetize the affair. That has actually been the sticking point for every social network so far: they get big, they get popular, and they do so with venture capital, then collapse when the growth can no longer be maintained. Therein lies the real challenge: coming up with a social networking concept that accomplishes the members’ goals in a sustainable way (and, yes, that means making money).

SpiralWeb v0.2 Released

SpiralWeb version 0.2 has just been released. I felt the urge to scratch a few more itches while using it on another project. As with version 0.1, it can be installed from PyPI using pip. The changelog follows:

== v0.2 / 2012-10-08
* bugfix: Exceptions when directory not found
* bugfix: PLY leaks information
* bugfix: Create version flag
* bugfix: Top level exceptions not handled properly
* bugfix: Exceptions when chunk not found
* bugfix: Pip package does not install cleanly
* Change CLI syntax
* Cleanup default help

Also of note: the source code has been moved over to GitHub:

https://github.com/michaeljmcd/spiralweb

Now, off to bed. I have to get to the gym in the morning.

Announcing SpiralWeb version 0.1

Version 0.1 of SpiralWeb is available for download at http://pypi.python.org/pypi/spiralweb/0.1. To install, make sure that you have Python and pip, then run pip install spiralweb to download and install. The project home page can be found at https://gitorious.org/spiralweb.

About SpiralWeb

SpiralWeb is a literate programming system written in Python. Its primary aim is to facilitate the use of literate programming in real-world applications by using lightweight, text-based markup (rather than TeX or LaTeX) and by integrating painlessly into build scripts. It is language agnostic. The default typesetter is Pandoc’s extended Markdown, but new backends can readily be added for any other system desired.

For more information on literate programming, please see literateprogramming.com.

Usage

The syntax is minimal:

@doc (Name)? ([option=value,option2=value2...])?

Denotes a document chunk. At the moment, the only option that is used is the out parameter, which specifies a path (either absolute or relative to the literate file it appears in) for the woven output to be written to.

@code (Name)? ([option=value,option2=value2...])?

Denotes the beginning of a code chunk. At present, the following options are used:

  • out, which specifies a path (either absolute or relative to the literate file it appears in) for the tangled output to be written to.
  • lang, which specifies the language the code is written in. This attribute is only used by the weaver, which uses it when emitting markup so that the code can be highlighted properly.

@<Name>

Within a code chunk, this indicates that another chunk will be inserted at this point in the final source code. It is important to note that SpiralWeb is indentation-sensitive, so any spaces or tabs preceding the reference will be used as the indentation for every line of the chunk’s output, even if there is also indentation within the chunk.

@@

At any spot in a literate file, this directive results in a simple @ symbol.
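
Putting the directives together, a small literate file might look something like the following. This is only a sketch based on the syntax described above; the chunk names and output paths are illustrative rather than taken from a real project:

@doc Example [out=example.md]
A trivial program to show the syntax. The code below prints a greeting.

@code Main [out=hello.py,lang=python]
@<Greeting>
print(greet("world"))

@code Greeting [lang=python]
def greet(name):
    return "Hello, " + name + "!"

Tangling this file should write hello.py with the Greeting chunk expanded in place of the reference, while weaving should produce the Markdown document.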

In Search of C# Omnicomplete for Vim

By day, I write in C#, mostly on a stock .NET install (version 4, as of this writing; I expect the principles laid out here to transfer forward, as the Vim ecosystem is fairly stable). I often find myself switching back and forth between Visual Studio 2010 (with VsVim) and gvim 7.3. Frankly, I should like to spend more time on the gvim side than I do. While a great deal of time and effort has gone into customizing my vimrc for .NET development, I often find myself switching back to VS to get the benefits of IntelliSense when working with parts of the very hefty .NET Framework that I do not know from memory.

Every so often, I do some fishing around for something useful to make my Vim Omnicomplete more productive. In this post, I will lay out my newest attempt and analyze the findings. As such, this post may or may not be a tutorial on what you should do. In any event, it will be a science experiment in the plainest sense of the word.

First, the hypothesis. While checking out the Vim documentation on Omnicomplete, we see that the Omnicomplete function for C makes heavy use of an additional tag file, generated from the system headers [http://vimdoc.sourceforge.net/htmldoc/insert.html#ft-c-omni], and that this file is used in conjunction with what Omnicomplete understands about the C programming language to make a good guess as to what the programmer likely intends.

It should be possible then, with minimum fuss, to generate a similar tag file for C#. It may also be necessary to tweak the completion function parameters. We will look at that after we have checked the results of the tag file generation.

It turns out that Microsoft releases the .NET 4 Framework’s source code under a reference-only license [http://referencesource.microsoft.com/netframework.aspx]. The initial vector of attack will be to download the reference code and build a tag file from it (this seems well in keeping with the intent behind the license; if it is not, I will gladly give up the exercise). The link with the relevant source is the first one (Product Name of “.NET” and version of “8.0” as of this writing). The source was placed under RefSrc in Documents.

After running:

ctags -R -f dotnet4tags *

in the RefSrc\Source\.Net\4.0\DEVDIV_TFS\Dev10\Releases\RTMRel directory, we got our first pass at a tag file. A little googling prompted the change to this [http://arun.wordpress.com/2009/04/10/c-and-vim/]:

ctags -R -f dotnet4tags --exclude="bin" --extra=+fq --fields=+ianmzS --c#-kinds=cimnp *

Then, as the documentation says, we added the tag to our list of tag files to search:

set tags+=~/Documents/RefSrc/Source/.Net/4.0/DEVDIV_TFS/Dev10/Releases/RTMRel/dotnet4tags

When this is used in conjunction with tag-file completion (C-X C-]), the results are superior to any previous attempt, particularly alongside the Taglist plugin [http://www.thegeekstuff.com/2009/04/ctags-taglist-vi-vim-editor-as-sourece-code-browser/].

With this alone, we do not get any real contextual searching. For example, if we type something like:

File f = File.O

and then initiate the matching, we get practically any method that begins with an O regardless of whether or not said method is a member of the File class.

If we stop here, we still have a leg up over what we had before. We can navigate to .NET framework methods and fetch their signatures through the Taglist browser—but we would still like to do better.

The only reason the resulting tag file is not included here is that it is fairly large—not huge, but much too large to be a simple attachment to this post.

Symbols vs. Keywords in Common Lisp

I was resuming work on my Sheepshead game today (more on this will be coming in time), and it occurred to me: what is the difference between a symbol and a keyword? If you type

(symbolp 'foo)

and

(symbolp :foo)

both return

T

Likewise, if you type

(eq 'foo 'foo)
(eq :foo :foo)

both return

T

yet

(eq :foo 'foo)

returns

NIL

Finally, if you type

(symbol-name 'foo)

and

(symbol-name :foo)

both return

"FOO"

So, what gives? Both are symbols, and symbols with the same print name, at that. The difference is that keywords are all interned in the KEYWORD package, while plain quoted symbols are interned in the current package (whatever *package* happens to be when the form is read). So,

* (symbol-package 'foo)                                                                   
#<PACKAGE "COMMON-LISP-USER">

but

* (symbol-package :foo)                                                                   
#<PACKAGE "KEYWORD">
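
As a further illustration (the exact REPL output will vary by implementation), keywords also evaluate to themselves, and a keyword read in any package denotes the same object in the KEYWORD package:

* :foo
:FOO
* (eq :foo (intern "FOO" :keyword))
T
* (eq 'foo (intern "FOO" :keyword))
NIL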

Just a quick little tidbit.