<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-7633517363101745721</id><updated>2011-07-08T07:55:28.101-07:00</updated><category term='install'/><category term='back-end'/><category term='introduction'/><category term='tools'/><category term='matrix compiler'/><category term='bibliometrics'/><category term='oops'/><category term='SAINT'/><category term='name'/><category term='website'/><category term='open source'/><category term='beta'/><category term='word splitter'/><category term='cross platform'/><category term='source'/><category term='web of knowledge'/><category term='Database'/><category term='relation calculator'/><category term='demonstration'/><category term='patentometrics'/><category term='2010.01 beta 1'/><category term='#essconf09'/><category term='matrix builder'/><category term='video'/><category term='Qt'/><category term='design'/><category term='isi'/><category term='2011.01'/><category term='launch'/><category term='pajek'/><category term='issue tracker'/><category term='record grouper'/><category term='release'/><title type='text'>SAINT toolkit</title><subtitle type='html'>SAINT, the Science Assessment Integrated Network Toolkit, is a free, open source toolkit to support bibliometric and patentometric analysis.</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://srtools.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7633517363101745721/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://srtools.blogspot.com/'/><link rel='hub' 
href='http://pubsubhubbub.appspot.com/'/><author><name>André</name><uri>http://www.blogger.com/profile/01802493956014089291</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='21' src='http://3.bp.blogspot.com/_s9HldKof-So/SjIkyJ7yfLI/AAAAAAAAAx0/BE3yYxDBMAY/s1600-R/Andre%2520Somers.JPG'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>24</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-7633517363101745721.post-2851330770979977393</id><published>2011-06-14T08:36:00.001-07:00</published><updated>2011-06-14T08:39:37.855-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='2011.01'/><category scheme='http://www.blogger.com/atom/ns#' term='release'/><category scheme='http://www.blogger.com/atom/ns#' term='beta'/><title type='text'>Release!</title><content type='html'>Yes, it is finally there: a new release of SAINT!&lt;br /&gt;&lt;br /&gt;The version I have put online is version 2011.01 beta 1. It contains all the basic tools, and supports both Access and MySql as database backends. Many issues were fixed, but it is still a beta. 
Please let me know about any problems, especially regressions.&lt;br /&gt;&lt;br /&gt;&lt;a href="https://www.assembla.com/spaces/srtools/documents/aB44T6LOir4kUleJe5cbCb/download/aB44T6LOir4kUleJe5cbCb"&gt;Get it&lt;/a&gt; while it is fresh!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7633517363101745721-2851330770979977393?l=srtools.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://srtools.blogspot.com/feeds/2851330770979977393/comments/default' title='Reacties plaatsen'/><link rel='replies' type='text/html' href='http://srtools.blogspot.com/2011/06/release.html#comment-form' title='0 reacties'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7633517363101745721/posts/default/2851330770979977393'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7633517363101745721/posts/default/2851330770979977393'/><link rel='alternate' type='text/html' href='http://srtools.blogspot.com/2011/06/release.html' title='Release!'/><author><name>André</name><uri>http://www.blogger.com/profile/01802493956014089291</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='21' src='http://3.bp.blogspot.com/_s9HldKof-So/SjIkyJ7yfLI/AAAAAAAAAx0/BE3yYxDBMAY/s1600-R/Andre%2520Somers.JPG'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7633517363101745721.post-2332322428613243226</id><published>2011-03-15T01:31:00.001-07:00</published><updated>2011-03-15T01:36:03.438-07:00</updated><title type='text'>Saint as Qt Ambassador</title><content type='html'>Recently, SAINT was added to the Nokia Qt Ambassador program. That means that it is being used as a showcase for the capabilities of Qt and the different areas Qt is used in. 
&lt;br /&gt;Check out SAINT as a Qt showcase &lt;a href="http://qt.nokia.com/qt-in-use/ambassadors/project?id=a0F20000006KgBSEA0"&gt;here&lt;/a&gt;!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7633517363101745721-2332322428613243226?l=srtools.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://srtools.blogspot.com/feeds/2332322428613243226/comments/default' title='Reacties plaatsen'/><link rel='replies' type='text/html' href='http://srtools.blogspot.com/2011/03/saint-as-qt-ambassador.html#comment-form' title='0 reacties'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7633517363101745721/posts/default/2332322428613243226'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7633517363101745721/posts/default/2332322428613243226'/><link rel='alternate' type='text/html' href='http://srtools.blogspot.com/2011/03/saint-as-qt-ambassador.html' title='Saint as Qt Ambassador'/><author><name>André</name><uri>http://www.blogger.com/profile/01802493956014089291</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='21' src='http://3.bp.blogspot.com/_s9HldKof-So/SjIkyJ7yfLI/AAAAAAAAAx0/BE3yYxDBMAY/s1600-R/Andre%2520Somers.JPG'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7633517363101745721.post-7506706097892547269</id><published>2011-03-14T08:31:00.001-07:00</published><updated>2011-03-15T03:01:08.704-07:00</updated><title type='text'>Database layer working: ISI parser to proof it</title><content type='html'>Moving away from Access as the only database backend supported by SAINT has been on the TODO list for quite a long time. As it turned out, creating a system to allow for multiple database backends was not all that easy. It required changes on many levels, which I'll detail a bit below. 
But don't let that spoil the big news: &lt;span style="font-weight:bold;"&gt;it works&lt;/span&gt;! &lt;br /&gt;&lt;br /&gt;Yes, that's right, you can now actually use databases other than Access with SAINT. At this moment, I have it working with just the ISI parser, but that is already a very important step. I use it as a testbed for all the other tools, which basically need the same infrastructure that the ISI parser does, though usually to a lesser extent. That comes down to: if it works for the ISI parser, it should (mostly) work with the rest of the tools too. I will release a beta to demonstrate the working ISI parser as soon as possible.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Nitty-gritty stuff&lt;/span&gt;&lt;br /&gt;So... what kind of changes were needed in the SAINT software stack to get all this to work? Quite a lot, as it turned out. &lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Driver backend&lt;/span&gt;&lt;br /&gt;The first thing that needed to be done was to come up with a driver backend that would support all the operations we need on databases and that would work for all of them. The standard Qt database drivers only support abstractions for manipulating the data (Data Manipulation Language or DML), not the data &lt;span style="font-style:italic;"&gt;structure&lt;/span&gt; (Data Definition Language or DDL). Since SAINT manipulates the data structure all the time (creating tables to store results, for instance), we need support for all of these operations. But: not all databases do that in the same way! SQL is a standard only up to a point... To further complicate things, not all databases support the same data types, nor do these types map to the data types used inside SAINT and Qt in a trivial way. To store a value like a date or a long text string, you need different data types in Access and MySql, for instance. 
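&lt;br /&gt;&lt;br /&gt;As a rough illustration of that last point (a sketch only, not the actual SAINT driver code: the abstract type names, the mapping table and the create_table_sql helper are all made up for this example), the same abstract field list can be turned into a different CREATE TABLE statement for each backend:&lt;br /&gt;
&lt;pre&gt;
```python
# Hypothetical sketch: none of these names come from SAINT or Qt.
# Each backend maps the same abstract field types to its own column types.
TYPE_MAP = {
    "access": {
        "integer": "LONG",
        "short_text": "TEXT(255)",
        "long_text": "MEMO",
        "datetime": "DATETIME",
    },
    "mysql": {
        "integer": "INT",
        "short_text": "VARCHAR(255)",
        "long_text": "LONGTEXT",
        "datetime": "DATETIME",
    },
}

# Identifier quoting also differs: Access uses brackets, MySQL uses backticks.
QUOTES = {"access": ("[", "]"), "mysql": ("`", "`")}


def create_table_sql(backend, table, fields):
    """Build a CREATE TABLE statement for one backend from abstract field types."""
    open_q, close_q = QUOTES[backend]
    types = TYPE_MAP[backend]
    columns = ", ".join(
        f"{open_q}{name}{close_q} {types[abstract_type]}"
        for name, abstract_type in fields
    )
    return f"CREATE TABLE {open_q}{table}{close_q} ({columns})"


fields = [("id", "integer"), ("title", "long_text"), ("published", "datetime")]
print(create_table_sql("access", "Articles", fields))
print(create_table_sql("mysql", "Articles", fields))
```
&lt;/pre&gt;
The Access statement ends up with bracket-quoted names and a MEMO column, while the MySql one uses backticks and LONGTEXT; the calling code only ever deals with the abstract field list.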
&lt;br /&gt;&lt;br /&gt;I chose to extend the standard Qt Sql database drivers with additional methods, and in that way create new drivers that can be used with the standard Qt database system, but that offer support for far more operations. Currently, I have drivers implemented for Access (on top of the Qt ODBC driver) and MySql (using the standard Qt MySql driver). &lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Abstracting database access inside the tools&lt;/span&gt;&lt;br /&gt;In order to be able to actually use these shiny new drivers, all access to the database needed to be abstracted out of the code. That is: nowhere in the code could direct SQL be used anymore, nor could I use things like type names in the form of strings. All SQL statements needed to be generated by the current driver based on a common format. I used Qt's own QSqlRecord for that, in combination with a hugely extended list of supported query types for QSqlDriver::sqlStatement(). Although this method is barely documented in the Qt documentation, it turns out to be essential if you want to make your application database agnostic. &lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;User interface&lt;/span&gt;&lt;br /&gt;Creating a user interface to select a file on your local file system is not all that complicated. But coming up with a good interface to select or create a generic database, either as a file or running on a server, is a different matter. It took quite a bit of effort to design and build a UI that seems to be easy enough to use, and that can be embedded into the different tools with ease. 
It will probably need some more tweaking, but here is an idea of the current UI:&lt;br /&gt;&lt;br /&gt;The widget that can be embedded in the different tools to select the database:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/-9tLGJnQ_hd8/TX47KAS7VeI/AAAAAAAAA7A/jIq-Z6siTng/s1600/DatabaseSelector.png"&gt;&lt;img style="float:left; margin:0 10px 10px 0;cursor:pointer; cursor:hand;width: 400px; height: 89px;" src="http://4.bp.blogspot.com/-9tLGJnQ_hd8/TX47KAS7VeI/AAAAAAAAA7A/jIq-Z6siTng/s400/DatabaseSelector.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5583965631050110434" /&gt;&lt;/a&gt;&lt;br /&gt;It features a drop down box with the 10 most recently used databases (with an icon representing the type of database), and a button on the right that opens a dialog box in which you can select a database that is not currently in the list. This widget &lt;span style="font-style:italic;"&gt;will&lt;/span&gt; allow drag &amp; drop in the near future, so you can simply drop your existing database file or .sdbd file (more on that later) on it.&lt;br /&gt;&lt;br /&gt;The Database Selection Dialog that you get when clicking the button:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/-B1Wy3qtm_aU/TX48wk-OO8I/AAAAAAAAA7I/wMgCPBBDSdI/s1600/DatabaseSelectorDialog%2Bprevious%2Bpage.png"&gt;&lt;img style="float:left; margin:0 10px 10px 0;cursor:pointer; cursor:hand;width: 400px; height: 222px;" src="http://1.bp.blogspot.com/-B1Wy3qtm_aU/TX48wk-OO8I/AAAAAAAAA7I/wMgCPBBDSdI/s400/DatabaseSelectorDialog%2Bprevious%2Bpage.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5583967393242037186" /&gt;&lt;/a&gt;&lt;br /&gt;This dialog lists the databases that are known to SAINT. 
For each of them, a small file is stored in your documents directory (under &lt;span style="font-style:italic;"&gt;SAINT/database descriptions/&lt;/span&gt;, to be exact) containing all the information that is needed to connect to the database, along with some meta information. You can even send these files to your colleague to easily share access to a server-based database if you want. &lt;br /&gt;&lt;br /&gt;Select or create another database:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/-Ra-LDdJbZ30/TX48xFIYgKI/AAAAAAAAA7Q/Ce1sT-pplo8/s1600/DatabaseSelectorDialog%2BOther%2Bpage%2Bfile.png"&gt;&lt;img style="float:left; margin:0 10px 10px 0;cursor:pointer; cursor:hand;width: 400px; height: 226px;" src="http://1.bp.blogspot.com/-Ra-LDdJbZ30/TX48xFIYgKI/AAAAAAAAA7Q/Ce1sT-pplo8/s400/DatabaseSelectorDialog%2BOther%2Bpage%2Bfile.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5583967401874587810" /&gt;&lt;/a&gt;&lt;br /&gt;This is the page that allows you to select or create a database that has not been used before. The drop down box at the top displays a list of the available database types. Currently on Windows, you can choose from Microsoft Access and MySql. The rest of the dialog will adapt itself accordingly. 
Showing on this screenshot is what you get when selecting an Access database.&lt;br /&gt;&lt;br /&gt;Dialog for MySql databases:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/-5qGDbTOUKKU/TX48xAaxVNI/AAAAAAAAA7Y/KgEl5Bxj2rM/s1600/DatabaseSelectorDialog%2BOther%2Bpage%2Bserver.png"&gt;&lt;img style="float:left; margin:0 10px 10px 0;cursor:pointer; cursor:hand;width: 400px; height: 242px;" src="http://2.bp.blogspot.com/-5qGDbTOUKKU/TX48xAaxVNI/AAAAAAAAA7Y/KgEl5Bxj2rM/s400/DatabaseSelectorDialog%2BOther%2Bpage%2Bserver.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5583967400609535186" /&gt;&lt;/a&gt;&lt;br /&gt;The above displays the same dialog, but with the MySql database type selected. Once you enter the relevant data on the server and your credentials, available databases on the server will automatically be displayed. &lt;br /&gt;&lt;br /&gt;All database types will insist on an Alias to be created for the database. This alias is the name for the database that you defined by selecting a file name, or by entering a server address, port, user credentials and database name. It is used in the file name for the .sdbd (Saint DataBase Description), in the drop down for the database selector widget and in the list of existing databases. 
A suitable alias will be suggested automatically.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7633517363101745721-7506706097892547269?l=srtools.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://srtools.blogspot.com/feeds/7506706097892547269/comments/default' title='Reacties plaatsen'/><link rel='replies' type='text/html' href='http://srtools.blogspot.com/2011/03/database-layer-working-isi-parser-to.html#comment-form' title='0 reacties'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7633517363101745721/posts/default/7506706097892547269'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7633517363101745721/posts/default/7506706097892547269'/><link rel='alternate' type='text/html' href='http://srtools.blogspot.com/2011/03/database-layer-working-isi-parser-to.html' title='Database layer working: ISI parser to proof it'/><author><name>André</name><uri>http://www.blogger.com/profile/01802493956014089291</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='21' src='http://3.bp.blogspot.com/_s9HldKof-So/SjIkyJ7yfLI/AAAAAAAAAx0/BE3yYxDBMAY/s1600-R/Andre%2520Somers.JPG'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/-9tLGJnQ_hd8/TX47KAS7VeI/AAAAAAAAA7A/jIq-Z6siTng/s72-c/DatabaseSelector.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7633517363101745721.post-9069280478705400091</id><published>2010-03-30T04:55:00.001-07:00</published><updated>2010-03-30T05:14:48.039-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Database'/><category scheme='http://www.blogger.com/atom/ns#' term='word splitter'/><title type='text'>Word stemming</title><content type='html'>One of the issues with working with 
(bibliometric) data is data cleaning. While we already have an experimental tool called &lt;span style="font-style:italic;"&gt;Record Grouper&lt;/span&gt; that is being refactored at the moment, we also needed a quick-and-dirty way to do some basic cleaning on words.&lt;br /&gt;&lt;br /&gt;To meet this need, I have added a new feature to the Word Splitter tool. The tool now builds an additional table called Wordstems, and it adds an additional field to the Words table with a reference to a record in this Wordstems table. Each word is now being &lt;span style="font-style:italic;"&gt;stemmed&lt;/span&gt; by using the Porter stemming algorithm. I am looking into replacing this algorithm with another, more accurate one, but the idea will stay the same if that happens.&lt;br /&gt;&lt;br /&gt;So now, each word will get its stem (according to this Porter algorithm) associated with it using the stem's ID number. This makes it easy to treat words that have the same stem as the same word, so we can get rid of the difference between "robot", "robots", "robotic" and "robotics" easily if we want.&lt;br /&gt;&lt;br /&gt;On a note related to the last post on database compatibility: while the Word Splitter still only works with MS Access, a lot has changed under the hood. I have implemented a driver system which extends the database drivers that are available in Qt by default, and works around a couple of bugs on the side. Based on this, I have created an Access Driver that is now being used by the Word Splitter tool. 
While it is not completely database independent yet, it is a good start and a good test for the new driver.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7633517363101745721-9069280478705400091?l=srtools.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://srtools.blogspot.com/feeds/9069280478705400091/comments/default' title='Reacties plaatsen'/><link rel='replies' type='text/html' href='http://srtools.blogspot.com/2010/03/word-stemming.html#comment-form' title='0 reacties'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7633517363101745721/posts/default/9069280478705400091'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7633517363101745721/posts/default/9069280478705400091'/><link rel='alternate' type='text/html' href='http://srtools.blogspot.com/2010/03/word-stemming.html' title='Word stemming'/><author><name>André</name><uri>http://www.blogger.com/profile/01802493956014089291</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='21' src='http://3.bp.blogspot.com/_s9HldKof-So/SjIkyJ7yfLI/AAAAAAAAAx0/BE3yYxDBMAY/s1600-R/Andre%2520Somers.JPG'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7633517363101745721.post-220845793967831845</id><published>2010-02-24T02:09:00.000-08:00</published><updated>2010-02-24T02:34:51.447-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='back-end'/><category scheme='http://www.blogger.com/atom/ns#' term='SAINT'/><category scheme='http://www.blogger.com/atom/ns#' term='Database'/><category scheme='http://www.blogger.com/atom/ns#' term='Qt'/><category scheme='http://www.blogger.com/atom/ns#' term='cross platform'/><title type='text'>Database compatabilities</title><content type='html'>At the moment, SAINT only supports Microsoft 
Access as a database to store your data. Because most of the tools are quite tightly bound to accessing this data structure, you are really bound to Access. That is really becoming a problem. &lt;br /&gt;&lt;br /&gt;First of all, Access itself is a problem. Access limits the amount of data it can handle to about 2 GB. However, that limit includes temporary space needed for the execution of some complex queries. In practice, you will thus run into this limit much earlier, in the form of a vague error message. Access has other problems too, but this is the main issue. Access does have two big strengths: first of all, its databases are easy to handle. You can just copy over files, and it is already installed in many professional environments. Second, it has a quite good visual query designer, making Access less of a database than an analysis environment in the context of using it with SAINT. &lt;br /&gt;&lt;br /&gt;The second problem with Access has to do with &lt;a href="http://qt.nokia.com"&gt;Qt&lt;/a&gt;, the toolkit that is used as the base of SAINT. Qt supplies a set of database drivers that make it possible to interface with different databases. It does not have a driver for Access/JET however. SAINT accesses the databases using a driver for the &lt;a href="http://en.wikipedia.org/wiki/Open_Database_Connectivity"&gt;ODBC&lt;/a&gt; layer. That has an impact on the speed of the connection, as well as on the supported features. A solution would of course be to write a database driver for Qt myself, but that is not a trivial task.&lt;br /&gt;&lt;br /&gt;The third problem is that of cross platform compatibility. Access only runs on Microsoft Windows, but in the scientific world, Macs and Linux machines are not all that uncommon. It would be very good if SAINT could work on those systems as well. 
Because Qt itself is cross platform, that would not be so hard, were it not that SAINT is bound (too) tightly to Access and thus to Windows.&lt;br /&gt;&lt;br /&gt;So, it was time to come up with a solution. &lt;br /&gt;Like I said, Qt supports database drivers for different databases. These work cross platform. However, they don't entirely abstract away the differences between the databases themselves. Different databases support, for instance, different data types. They also use slightly different query languages. The base is the same (SQL), but the dialects differ. That is easy to understand: the databases are different for a reason, and they support different features. That also means that you sometimes have to talk to them in a slightly different way. Qt helps with this, but not enough. &lt;br /&gt;&lt;br /&gt;This issue makes it hard to simply exchange one database for another. That would also create other problems for the users. What to do with existing data sets that are in Access already? An optimal solution would be to build in support for several data back-ends, and let the user choose in an easy way which one to use. This way, the tools can be used for small, simple sets that work perfectly on Access or some other small local database, but also on huge sets that run on a remote server. &lt;br /&gt;&lt;br /&gt;To achieve that, I need to extend Qt's abstraction from the database back-end. Currently, it supports basic manipulation and querying of the data in the database, but it does not support changing the database structure. That feature is needed in many tools though. Many parts of SAINT create new tables, or extend existing ones. The solution I have decided on focuses on extending the Qt SQL driver model to support these operations, and then re-factoring the existing SAINT code to use this abstraction and to remove the current explicit SQL code that is used everywhere. 
All SQL will then be generated by the abstraction layer, making it possible to support different databases as a back-end for SAINT by just replacing the database driver. Of course, there is also some work to be done in the interface, as connecting to a remote database over the Internet needs a different setup than just pointing to a file on your local file system. &lt;br /&gt;&lt;br /&gt;I am developing this code in a separate branch of the code for now. It will be for SAINT version 2.0. So far, the progress is promising, but it is still far from production ready. Expect a first beta release of this code somewhere in April.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7633517363101745721-220845793967831845?l=srtools.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://srtools.blogspot.com/feeds/220845793967831845/comments/default' title='Reacties plaatsen'/><link rel='replies' type='text/html' href='http://srtools.blogspot.com/2010/02/database-compatabilities.html#comment-form' title='1 reacties'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7633517363101745721/posts/default/220845793967831845'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7633517363101745721/posts/default/220845793967831845'/><link rel='alternate' type='text/html' href='http://srtools.blogspot.com/2010/02/database-compatabilities.html' title='Database compatabilities'/><author><name>André</name><uri>http://www.blogger.com/profile/01802493956014089291</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='21' 
src='http://3.bp.blogspot.com/_s9HldKof-So/SjIkyJ7yfLI/AAAAAAAAAx0/BE3yYxDBMAY/s1600-R/Andre%2520Somers.JPG'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7633517363101745721.post-8090461729814016972</id><published>2010-02-10T00:56:00.000-08:00</published><updated>2010-02-10T01:33:57.073-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='isi'/><title type='text'>Changes in ISI data importer: increased compatibility with other sources</title><content type='html'>Recently, I was contacted by somebody who tried to use SAINT on a dataset converted from PubMed to the ISI text format. That did not work. It turns out that the ISI data importer relied on the existence of a really ISI specific field to ensure records were unique. That field does not exist in the data the converter tool created, and thus the application regarded all of them as the same: the code was empty for all of them. &lt;br /&gt;&lt;br /&gt;A small update fixed that. Now, instead of only relying on the ISI code field, the program will try other options. The first option tried is to see if there is a valid DOI code. That code is unique too, if it is available. If it is, that code will be used as the unique identifier. If there is no DOI code, an artificial code will be generated based on the journal name, the ISSN number, the year, the full name of the first author and the title of the article. That should yield a pretty unique identifier. The code will be prepended with an identifier that tells you where the code came from. These can be "isi", "doi" or "saint".&lt;br /&gt;&lt;br /&gt;As a side-effect of this change, I have also added a new field in the articles table output. The DOI field is now included. That also creates an interesting matching opportunity with the cited references.&lt;br /&gt;&lt;br /&gt;All very nice perhaps, but... 
The changes can lead to &lt;span style="font-weight:bold;"&gt;two problems&lt;/span&gt;:&lt;br /&gt;1) If you import data from multiple sources, you may run into duplicate records. This can happen if, for instance, an article occurs in both an ISI source and another source. In the ISI source, there will be an ISI identifier, so that will be used as the unique code, even if there is a DOI field too. If the same article appears in another data source that does not have the ISI identifier, we'll run into a problem. The DOI may then be used as the identifier, and the two records are no longer recognized as the same article. &lt;br /&gt;&lt;br /&gt;2) If you use this newest version of the ISI data importer to augment data from an earlier version, old articles will not be recognized as the same anymore, because in the previous version, there was no "isi" prefix to the code. If you plan to do that, you should prepend the string "isi:" (without the quotes) to every code field. You can do that by running the query below:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;UPDATE Articles SET Articles.code = "isi:"+[Articles].[code];&lt;/blockquote&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7633517363101745721-8090461729814016972?l=srtools.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://srtools.blogspot.com/feeds/8090461729814016972/comments/default' title='Reacties plaatsen'/><link rel='replies' type='text/html' href='http://srtools.blogspot.com/2010/02/changes-in-isi-data-importer-increased.html#comment-form' title='0 reacties'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7633517363101745721/posts/default/8090461729814016972'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7633517363101745721/posts/default/8090461729814016972'/><link rel='alternate' type='text/html' 
href='http://srtools.blogspot.com/2010/02/changes-in-isi-data-importer-increased.html' title='Changes in ISI data importer: increased compatibility with other sources'/><author><name>André</name><uri>http://www.blogger.com/profile/01802493956014089291</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='21' src='http://3.bp.blogspot.com/_s9HldKof-So/SjIkyJ7yfLI/AAAAAAAAAx0/BE3yYxDBMAY/s1600-R/Andre%2520Somers.JPG'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7633517363101745721.post-6419301420433906946</id><published>2010-01-25T07:13:00.000-08:00</published><updated>2010-01-27T05:00:52.050-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='video'/><category scheme='http://www.blogger.com/atom/ns#' term='demonstration'/><category scheme='http://www.blogger.com/atom/ns#' term='isi'/><title type='text'>ISI importer demonstration</title><content type='html'>After some good responses on my video demonstrating the Matrix Builder tool, I decided to make a video demonstrating the ISI Data Importer tool. Again, the video is available at Youtube. &lt;br /&gt;&lt;br /&gt;The video demonstrates the complete process from selecting your input file to starting the actual parsing, and highlights the new features of the parser that help you identify the next step. 
I hope it will be useful to you!&lt;br /&gt;&lt;br /&gt;&lt;object width="425" height="344"&gt;&lt;param name="movie" value="http://www.youtube.com/v/ajn-JVrPiNo&amp;hl=nl_NL&amp;fs=1&amp;"&gt;&lt;/param&gt;&lt;param name="allowFullScreen" value="true"&gt;&lt;/param&gt;&lt;param name="allowscriptaccess" value="always"&gt;&lt;/param&gt;&lt;embed src="http://www.youtube.com/v/ajn-JVrPiNo&amp;hl=nl_NL&amp;fs=1&amp;" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="425" height="344"&gt;&lt;/embed&gt;&lt;/object&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Update (January 27, 2010):&lt;/span&gt; &lt;br /&gt;OK, I am still learning how to deal with Youtube, and how to optimize the videos properly. I have changed the video for an HD version. If all goes well, that should work now. Watch in full screen at 720p quality for optimal viewing. It may take a little time for Youtube to process the video though.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7633517363101745721-6419301420433906946?l=srtools.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://srtools.blogspot.com/feeds/6419301420433906946/comments/default' title='Reacties plaatsen'/><link rel='replies' type='text/html' href='http://srtools.blogspot.com/2010/01/isi-importer-demonstration.html#comment-form' title='0 reacties'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7633517363101745721/posts/default/6419301420433906946'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7633517363101745721/posts/default/6419301420433906946'/><link rel='alternate' type='text/html' href='http://srtools.blogspot.com/2010/01/isi-importer-demonstration.html' title='ISI importer 
demonstration'/><author><name>André</name><uri>http://www.blogger.com/profile/01802493956014089291</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='21' src='http://3.bp.blogspot.com/_s9HldKof-So/SjIkyJ7yfLI/AAAAAAAAAx0/BE3yYxDBMAY/s1600-R/Andre%2520Somers.JPG'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7633517363101745721.post-3509162603531334041</id><published>2010-01-15T07:28:00.000-08:00</published><updated>2010-01-27T04:37:01.881-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='video'/><category scheme='http://www.blogger.com/atom/ns#' term='matrix builder'/><category scheme='http://www.blogger.com/atom/ns#' term='demonstration'/><category scheme='http://www.blogger.com/atom/ns#' term='matrix compiler'/><title type='text'>Video demonstration</title><content type='html'>I did a first attempt at creating a demonstration video for our SAINT tools. The first tool to get it, is the Matrix Builder tool. It's very basic, and runs for less than a minute and a half. I will create more elaborate videos later on, but it may be useful for somebody already.&lt;br /&gt;&lt;br /&gt;&lt;object width="425" height="344"&gt;&lt;param name="movie" value="http://www.youtube.com/v/deOLG3Rdgzk&amp;hl=nl_NL&amp;fs=1&amp;"&gt;&lt;/param&gt;&lt;param name="allowFullScreen" value="true"&gt;&lt;/param&gt;&lt;param name="allowscriptaccess" value="always"&gt;&lt;/param&gt;&lt;embed src="http://www.youtube.com/v/deOLG3Rdgzk&amp;hl=nl_NL&amp;fs=1&amp;" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="425" height="344"&gt;&lt;/embed&gt;&lt;/object&gt;&lt;br /&gt;Update: the video has been replaced with a higher resolution version. This should improve readability quite a lot. 
It may take a little while until the HD version is actually updated and  you'll notice the improved quality.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7633517363101745721-3509162603531334041?l=srtools.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://srtools.blogspot.com/feeds/3509162603531334041/comments/default' title='Reacties plaatsen'/><link rel='replies' type='text/html' href='http://srtools.blogspot.com/2010/01/video-demonstration.html#comment-form' title='0 reacties'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7633517363101745721/posts/default/3509162603531334041'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7633517363101745721/posts/default/3509162603531334041'/><link rel='alternate' type='text/html' href='http://srtools.blogspot.com/2010/01/video-demonstration.html' title='Video demonstration'/><author><name>André</name><uri>http://www.blogger.com/profile/01802493956014089291</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='21' src='http://3.bp.blogspot.com/_s9HldKof-So/SjIkyJ7yfLI/AAAAAAAAAx0/BE3yYxDBMAY/s1600-R/Andre%2520Somers.JPG'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7633517363101745721.post-247947755565296837</id><published>2010-01-15T01:58:00.000-08:00</published><updated>2010-01-15T02:06:20.155-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='install'/><category scheme='http://www.blogger.com/atom/ns#' term='oops'/><title type='text'>Wrong file in download link</title><content type='html'>When uploading all the files, and updating the CMS to point to them, I made a mistake. 
The installer file I pointed to from our CMS at the &lt;a href="http://www.rathenau.nl/tools"&gt;Rathenau Website&lt;/a&gt; was actually a very old installer. The installer pointed to from this blog was &lt;a href="http://www.assembla.com/spaces/srtools/documents/ag952O_5ur3PG2eJe5aVNr/download/SAINTInstall2010.01beta1.exe"&gt;the correct one&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;The mistake has been fixed now, but still: embarrassing. Sorry for any confusion or inconvenience!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7633517363101745721-247947755565296837?l=srtools.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://srtools.blogspot.com/feeds/247947755565296837/comments/default' title='Reacties plaatsen'/><link rel='replies' type='text/html' href='http://srtools.blogspot.com/2010/01/wrong-file-in-download-link.html#comment-form' title='0 reacties'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7633517363101745721/posts/default/247947755565296837'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7633517363101745721/posts/default/247947755565296837'/><link rel='alternate' type='text/html' href='http://srtools.blogspot.com/2010/01/wrong-file-in-download-link.html' title='Wrong file in download link'/><author><name>André</name><uri>http://www.blogger.com/profile/01802493956014089291</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='21' src='http://3.bp.blogspot.com/_s9HldKof-So/SjIkyJ7yfLI/AAAAAAAAAx0/BE3yYxDBMAY/s1600-R/Andre%2520Somers.JPG'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7633517363101745721.post-5101644229156110555</id><published>2010-01-13T00:45:00.000-08:00</published><updated>2010-01-13T00:58:43.992-08:00</updated><category 
scheme='http://www.blogger.com/atom/ns#' term='release'/><category scheme='http://www.blogger.com/atom/ns#' term='2010.01 beta 1'/><title type='text'>Release 2010.01 beta 1 is out!</title><content type='html'>Yes, I finally made a new release ready for testing.&lt;br /&gt;&lt;br /&gt;This new release fixes several bugs, the one bugging you most probably being the &lt;a href="http://www.assembla.com/spaces/srtools/tickets/19-ISI-parser-does-not-fill-Couple_Articles_Author"&gt;ISI parser issue&lt;/a&gt;. Even though there was a work-around, it was an annoying issue. Sorry about that!&lt;br /&gt;&lt;br /&gt;Another big thing in this release is that it is the first release to include the community detection. As part of the Network Tools program, you can now employ the algorithm described in V. Blondel, J.-L. Guillaume, R. Lambiotte and E. Lefebvre 2008, &lt;span style="font-style: italic;"&gt;Fast unfolding of community hierarchies in large networks&lt;/span&gt;. More on that algorithm you can find on their &lt;a href="http://findcommunities.googlepages.com/"&gt;website&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;As already described in the last blog posting, there are also some changes under the hood of the Matrix Builder tool. Besides labels and edge values, you can now also output other properties to the Pajek .net output file, allowing you to display more aspects at the same time in your network.&lt;br /&gt;&lt;br /&gt;There are many other small tweaks and bugfixes in this version. Expect more additions before the final release. When that will be released? 
When it is done!&lt;br /&gt;&lt;br /&gt;Download the latest beta toolkit (installer for Microsoft Windows) &lt;a href="http://www.assembla.com/spaces/srtools/documents/ag952O_5ur3PG2eJe5aVNr/download/SAINTInstall2010.01beta1.exe"&gt;here&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7633517363101745721-5101644229156110555?l=srtools.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://srtools.blogspot.com/feeds/5101644229156110555/comments/default' title='Reacties plaatsen'/><link rel='replies' type='text/html' href='http://srtools.blogspot.com/2010/01/release-201001-beta-1-is-out.html#comment-form' title='0 reacties'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7633517363101745721/posts/default/5101644229156110555'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7633517363101745721/posts/default/5101644229156110555'/><link rel='alternate' type='text/html' href='http://srtools.blogspot.com/2010/01/release-201001-beta-1-is-out.html' title='Release 2010.01 beta 1 is out!'/><author><name>André</name><uri>http://www.blogger.com/profile/01802493956014089291</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='21' src='http://3.bp.blogspot.com/_s9HldKof-So/SjIkyJ7yfLI/AAAAAAAAAx0/BE3yYxDBMAY/s1600-R/Andre%2520Somers.JPG'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7633517363101745721.post-3056879711024165843</id><published>2009-12-09T03:46:00.001-08:00</published><updated>2010-01-15T07:35:50.975-08:00</updated><title type='text'>Big changes in Matrix Builder</title><content type='html'>More news on updated tools: the Matrix Builder.&lt;br /&gt;&lt;br /&gt;While simple in concept, the Matrix Builder is a very important tool in our toolbox. 
And it has seen some important changes! Let me take you through them.&lt;br /&gt;&lt;br /&gt;Just like the ISI parser, the matrix compiler has had some subtle interface changes. The window can now be minimized, and the close button has been removed. The old OK button has been renamed to Run. This should cause less confusion.&lt;br /&gt;&lt;br /&gt;More important is the introduction of &lt;span style="font-style: italic;"&gt;properties&lt;/span&gt;. That is: more data can be attached to the output than just connection strengths and vertex labels. There are many such properties possible, and we have a &lt;a href="http://www.assembla.com/spaces/srtools/tickets/9-Make-it-possible-to-use-Pajek-attributes"&gt;wish list&lt;/a&gt; for almost all of them. The basics are implemented, and all that remains now is to implement more 'plugins' to support more properties.&lt;br /&gt;&lt;br /&gt;Quite a few nice properties have already been implemented. The most useful will probably be the vertex size property (scale node sizes according to some value), and the vertex colouring.&lt;br /&gt;&lt;br /&gt;The new version will bear version number 1.3.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7633517363101745721-3056879711024165843?l=srtools.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://srtools.blogspot.com/feeds/3056879711024165843/comments/default' title='Reacties plaatsen'/><link rel='replies' type='text/html' href='http://srtools.blogspot.com/2009/12/big-changes-in-matrix-compiler.html#comment-form' title='0 reacties'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7633517363101745721/posts/default/3056879711024165843'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7633517363101745721/posts/default/3056879711024165843'/><link rel='alternate' type='text/html' 
href='http://srtools.blogspot.com/2009/12/big-changes-in-matrix-compiler.html' title='Big changes in Matrix Builder'/><author><name>André</name><uri>http://www.blogger.com/profile/01802493956014089291</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='21' src='http://3.bp.blogspot.com/_s9HldKof-So/SjIkyJ7yfLI/AAAAAAAAAx0/BE3yYxDBMAY/s1600-R/Andre%2520Somers.JPG'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7633517363101745721.post-9057697209700039321</id><published>2009-12-09T03:29:00.001-08:00</published><updated>2009-12-09T03:45:24.915-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='isi'/><title type='text'>ISI parser updates</title><content type='html'>It has been a long time since my last blog. That does not mean that developments have stopped! I'll try to fill you in on what happened in the meantime. I'll start with the oldest tool in the box: the ISI data importer.&lt;br /&gt;&lt;br /&gt;Recently, I made an update for the ISI data importer. Most changes are pretty small, but still...&lt;br /&gt;The first &lt;a href="http://www.assembla.com/spaces/srtools/tickets/17-Input-field-ISI-importer-is-too-limited"&gt;issue&lt;/a&gt; that was addressed is a limitation that existed when selecting files. If you selected lots of files, especially if they had long file names and sat deep in your directory hierarchy, it could happen that you'd run over 32 thousand characters for the file names. That resulted in leaving some files out, without any warning! The issue was addressed by changing the way multiple selected files are displayed in the file selection widget. 
Instead of just listing all the file names including their paths (which you will never read anyway for 50+ files), you now get a listing like &lt;blockquote&gt;'file_0.txt' and 999 other files in directory 'C:/Documents and Settings/andre/Desktop/data'&lt;/blockquote&gt;Much more readable. The way the display is formatted depends on the number of files selected.&lt;br /&gt;&lt;br /&gt;Another issue that was addressed is the laggy display of the progress. If you hid or obscured the progress window during a parsing operation, it would take until the next new file before it was updated again. This is now fixed. What's more, the window can be minimized, and the main window is hidden during the parsing process. The progress bars have also changed. The files progress bar has been replaced by a write progress bar that displays the number of parsed records (articles) being written to the database.&lt;br /&gt;&lt;br /&gt;A new feature has also been introduced: it is now possible to add data to an existing database! That means that if you select an existing database as your output file, you are now forced to choose what to do with that. You can either augment the existing data (no duplicates will be made) or you can overwrite the existing database completely.&lt;br /&gt;&lt;br /&gt;A user interface change enforces that you actually have to make a choice: the OK button has been renamed to Run, and the Close button has been removed. Closing is done by just closing the window. The Run button will only be available if no problems have been detected. 
If there are problems that prevent running, hovering your mouse over a small warning-sign icon will tell you about what they are.&lt;br /&gt;&lt;br /&gt;This new version of the ISI data importer will get version number 1.2.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7633517363101745721-9057697209700039321?l=srtools.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://srtools.blogspot.com/feeds/9057697209700039321/comments/default' title='Reacties plaatsen'/><link rel='replies' type='text/html' href='http://srtools.blogspot.com/2009/12/isi-parser-updates.html#comment-form' title='0 reacties'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7633517363101745721/posts/default/9057697209700039321'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7633517363101745721/posts/default/9057697209700039321'/><link rel='alternate' type='text/html' href='http://srtools.blogspot.com/2009/12/isi-parser-updates.html' title='ISI parser updates'/><author><name>André</name><uri>http://www.blogger.com/profile/01802493956014089291</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='21' src='http://3.bp.blogspot.com/_s9HldKof-So/SjIkyJ7yfLI/AAAAAAAAAx0/BE3yYxDBMAY/s1600-R/Andre%2520Somers.JPG'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7633517363101745721.post-7759811685510607694</id><published>2009-08-06T08:50:00.000-07:00</published><updated>2009-08-06T09:05:47.292-07:00</updated><title type='text'>Progress with Relation Calculator</title><content type='html'>The relation calculator tool that I introduced in my last blog, is shaping up nicely. Sure, it is not all smooth sailing, but it is really starting to look like something that could be very useful indeed. 
Let's use a screenshot of the current version to illustrate where things are heading:&lt;br /&gt;&lt;a style="" onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_s9HldKof-So/Snr8Mkmu0TI/AAAAAAAAAzU/-vsds_x9BWA/s1600-h/relationcalculator.png"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 363px; height: 263px;" src="http://1.bp.blogspot.com/_s9HldKof-So/Snr8Mkmu0TI/AAAAAAAAAzU/-vsds_x9BWA/s400/relationcalculator.png" alt="" id="BLOGGER_PHOTO_ID_5366879198880125234" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;What do you see?&lt;br /&gt;The main thing you'll notice is the area on the right, where you see boxes that are connected by lines. Each one of these boxes represents one simple step in the process of doing an analysis. In this case: calculating a Jaccard coefficient based on co-word/cited reference combinations between journal papers. A nice selection of such basic steps - components in the terminology of the application - is already there, though not fully implemented yet. The list of components can be extended later on using plugins. The available components are visible in the list on the top left.&lt;br /&gt;&lt;br /&gt;Each of the components has inputs on its left and/or outputs on its right. Outputs can be connected to one or more inputs of other components, thus creating a graph. Note that no circular connections are allowed. The user can use drag &amp;amp; drop to put components on the screen, and to connect inputs and outputs together.&lt;br /&gt;&lt;br /&gt;So, how do these separate components become an analysis?&lt;br /&gt;On the bottom left of the screen, you can see the Execution order window. Here you can see the order in which the components will be executed. This is determined by their connections, by whether they present an interface at run time, and by their positions on the screen. 
You can open and save analysis sequences for reuse and distribution.&lt;br /&gt;&lt;br /&gt;Once you have hooked up every component in the right order, you can run it. You will be presented with a wizard-type interface that guides you through the UI elements that each of the components presents (if any), and that present information about the progress of your analysis if there are long run times involved.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7633517363101745721-7759811685510607694?l=srtools.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://srtools.blogspot.com/feeds/7759811685510607694/comments/default' title='Reacties plaatsen'/><link rel='replies' type='text/html' href='http://srtools.blogspot.com/2009/08/progress-with-relation-calculator.html#comment-form' title='0 reacties'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7633517363101745721/posts/default/7759811685510607694'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7633517363101745721/posts/default/7759811685510607694'/><link rel='alternate' type='text/html' href='http://srtools.blogspot.com/2009/08/progress-with-relation-calculator.html' title='Progress with Relation Calculator'/><author><name>André</name><uri>http://www.blogger.com/profile/01802493956014089291</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='21' src='http://3.bp.blogspot.com/_s9HldKof-So/SjIkyJ7yfLI/AAAAAAAAAx0/BE3yYxDBMAY/s1600-R/Andre%2520Somers.JPG'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_s9HldKof-So/Snr8Mkmu0TI/AAAAAAAAAzU/-vsds_x9BWA/s72-c/relationcalculator.png' height='72' 
width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7633517363101745721.post-6632151564350922169</id><published>2009-07-17T03:20:00.000-07:00</published><updated>2009-07-17T04:09:01.123-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='tools'/><category scheme='http://www.blogger.com/atom/ns#' term='design'/><category scheme='http://www.blogger.com/atom/ns#' term='relation calculator'/><title type='text'>Building a new tool: Relation Calculator</title><content type='html'>As announced in the previous posting about development priorities, I will be working on a tool to calculate scores for relations between objects in the database. That sounds very general, and it actually is. Allow me to elaborate...&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;background&lt;/span&gt;&lt;br /&gt;The basic way of creating relations in our relational-database based data store is to create SQL queries that express the relation (e.g. correlation or distance measure) you are after. You can for instance relatively easily express a bibliometric coupling measure in SQL. It is also possible, but already more complicated, to express a Jaccard index over, say, title word similarities between articles in your database.&lt;br /&gt;&lt;br /&gt;This approach, while very flexible in theory, still has some limitations in actual practice. The first is user-related: not every researcher wanting to do this kind of analysis is a hero at writing SQL queries. That limits the usefulness of the tool set, not because what people want cannot be done, but because the learning curve is too steep to actually do so. Providing standard database views for standard analyses only solves this to a limited extent.&lt;br /&gt;&lt;br /&gt;Another limitation is that database engines are not always as efficient as they could be in evaluating the expressions that you need to construct often used measures. 
Also, they tend to use only a single thread to do a single query, thus making limited use of the resources of our modern multi-core computers. Using &lt;a href="http://gpgpu.org/"&gt;GPGPU&lt;/a&gt; techniques to speed up calculations is completely out of the scope of SQL for the foreseeable future. All this means that our calculations take a lot longer than they need to take, and sometimes run into arbitrary limits that they need not run into.&lt;br /&gt;&lt;br /&gt;As stated above, providing standard views can only work up to a point. We want to be able to calculate relationships between all kinds of items in the database (articles, journals, authors, ...), and we want to be able to use different measures for them as well (Jaccard, cosine, Salton, ...). Providing standard solutions for all of them is simply not doable. It would result in an exponential increase of pre-defined views for everything you want to add, and would basically create a mess in what is embedded in the database. That's not a very inviting prospect. What we need is something that is flexible enough to allow calculating relations between all kinds of items in all kinds of ways we can think of.&lt;br /&gt;&lt;br /&gt;So, we need a tool that can make these calculations:&lt;ol&gt;&lt;li&gt;easier to use (at least for the standard analyses), and&lt;br /&gt;&lt;/li&gt;&lt;li&gt;faster to compute, while still&lt;/li&gt;&lt;li&gt;being as flexible as possible.&lt;/li&gt;&lt;/ol&gt;&lt;span style="font-weight: bold;"&gt;introducing Relation Calculator&lt;br /&gt;&lt;/span&gt;The Relation Calculator tool should become that tool for you. It uses a concept of building blocks that can be put together to create a calculation that makes sense. Each building block will be responsible for a small part of the chain, like selecting which database to operate on, selecting views and fields, loading the data from the database, calculating a Jaccard index, etc. 
Each building block has inputs and/or outputs that can be connected to each other. In this way, a calculation for a relation can be defined. Some building blocks will present a UI to the user during execution of the calculation, for instance to allow selecting a database or some parameter like a threshold. Other building blocks will just perform some service, or maybe even one simple logical operation. Of course, these configurations of blocks can be loaded and saved, to be re-used later on. A set of pre-defined configurations can then be presented to the user.&lt;br /&gt;&lt;br /&gt;New building blocks can be added as plugins later on, making the tool extensible. Another option to extend the functionality is to use building blocks that execute a script as their payload. For instance, you would be able to define a function in JavaScript that expresses a relationship between two authors based on some input data. That script can then be used in a configuration. The possibilities are virtually endless.&lt;br /&gt;&lt;br /&gt;I envision a graphical environment where a user would be able to drag and drop building blocks on a canvas and graphically connect the inputs and outputs of the blocks. This would create a very simple way to define new configurations to calculate new relations.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;relation to existing code&lt;span style="font-weight: bold;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;There already is some code that does the kind of work I described here. As I mentioned in my previous blog, the current Record Grouper basically calculates such relations already. There is also already some code available that lays the basis for a script-based relation calculator. These existing pieces will of course not be just thrown away. 
They will have to be refactored to be able to use them as building blocks in the new Relation Calculator.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;status&lt;/span&gt;&lt;br /&gt;I am now very busy implementing the infrastructure for all this. Though it is a lot of work, I am confident that it will work. Some details still have to be filled in, but I don't expect major obstacles in the near future. I hope to get a basic model working soon.&lt;br /&gt;&lt;br /&gt;The idea is that&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7633517363101745721-6632151564350922169?l=srtools.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://srtools.blogspot.com/feeds/6632151564350922169/comments/default' title='Reacties plaatsen'/><link rel='replies' type='text/html' href='http://srtools.blogspot.com/2009/07/building-new-tool-relation-calculator.html#comment-form' title='0 reacties'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7633517363101745721/posts/default/6632151564350922169'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7633517363101745721/posts/default/6632151564350922169'/><link rel='alternate' type='text/html' href='http://srtools.blogspot.com/2009/07/building-new-tool-relation-calculator.html' title='Building a new tool: Relation Calculator'/><author><name>André</name><uri>http://www.blogger.com/profile/01802493956014089291</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='21' 
src='http://3.bp.blogspot.com/_s9HldKof-So/SjIkyJ7yfLI/AAAAAAAAAx0/BE3yYxDBMAY/s1600-R/Andre%2520Somers.JPG'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7633517363101745721.post-4060390674956163735</id><published>2009-06-30T02:38:00.000-07:00</published><updated>2009-06-30T04:09:56.634-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='record grouper'/><category scheme='http://www.blogger.com/atom/ns#' term='tools'/><category scheme='http://www.blogger.com/atom/ns#' term='word splitter'/><category scheme='http://www.blogger.com/atom/ns#' term='relation calculator'/><title type='text'>Development priorities</title><content type='html'>We have set some development priorities after the first release that we did recently. These are partly reflected in the &lt;a href="http://www.assembla.com/spaces/srtools/tickets"&gt;Ticket tracker&lt;/a&gt;, but to be honest that doesn't quite do it justice yet.&lt;br /&gt;&lt;br /&gt;A small but important update is to change the output format for the Matrix Compiler tool. Currently, it outputs a full matrix in DL format. This is a bit inflexible, as it does not allow attributes other than a label for the nodes, and it outputs a full matrix even for a large, sparse matrix. That results in bigger output files, and thus longer processing and I/O time. In the (hopefully near) future, it will allow adding attributes to nodes as well as connections. You can track the progress on this issue &lt;a href="http://www.assembla.com/spaces/srtools/tickets/8-Change-output-format-to-list-format"&gt;here&lt;/a&gt;. We're already testing it...&lt;br /&gt;&lt;br /&gt;A larger task has to do with &lt;a href="http://www.assembla.com/spaces/srtools/tickets/10-Split-record-grouper"&gt;reworking the Record Grouper and the Relation Calculator&lt;/a&gt;. 
The first part of that job is to specialize the Record Grouper to do only that: group objects based on some kind of relation between them. This means ripping out a large piece of complicated code (but not throwing it away, see later) and focusing on a good UI to make it easier to work with. This means fine-tuning the layout, but also adding options to undo, &lt;a href="http://www.assembla.com/spaces/srtools/tickets/4-Add-save-option-to-recordgrouper"&gt;store and re-load your work&lt;/a&gt;, etc. That is a lot of work in itself, but it will simplify the code considerably, making it easier to maintain.&lt;br /&gt;&lt;br /&gt;Another large task is to get the Relation Calculator into a usable shape. This is a complex tool. The basic idea is that it will become a specialized tool to calculate a similarity or distance measure between any two objects in the database, and be as flexible as possible as to how to calculate that. Currently, we only use SQL queries to calculate such scores, but that is sometimes limited, often complex, and usually relatively slow because most SQL backends don't use multi-threading for single queries, let alone utilize things like letting the video card do work for you.&lt;br /&gt;&lt;br /&gt;You can express a lot of such measures in SQL, but it is often complex to do, especially if you are not that used to using databases. That makes the current way of working harder to use for novices, but also for seasoned researchers who just are not that into SQL. It is however more flexible than being stuck to defaults too much. 
In this tool, we want to make the standard things easy, and yet be as flexible as possible to enable more advanced use.&lt;br /&gt;&lt;br /&gt;The goal is to make the standard analyses that you would normally run on a database generated by the ISI Data Importer and processed by the Word Splitter and optionally the Record Grouper as easy as selecting them and optionally setting some time slices or thresholds, after which ready-to-analyze output is generated. We aim to release a first working combination of these tools at the end of August.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7633517363101745721-4060390674956163735?l=srtools.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://srtools.blogspot.com/feeds/4060390674956163735/comments/default' title='Reacties plaatsen'/><link rel='replies' type='text/html' href='http://srtools.blogspot.com/2009/06/development-priorities.html#comment-form' title='0 reacties'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7633517363101745721/posts/default/4060390674956163735'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7633517363101745721/posts/default/4060390674956163735'/><link rel='alternate' type='text/html' href='http://srtools.blogspot.com/2009/06/development-priorities.html' title='Development priorities'/><author><name>André</name><uri>http://www.blogger.com/profile/01802493956014089291</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='21' 
src='http://3.bp.blogspot.com/_s9HldKof-So/SjIkyJ7yfLI/AAAAAAAAAx0/BE3yYxDBMAY/s1600-R/Andre%2520Somers.JPG'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7633517363101745721.post-1149955971152420804</id><published>2009-06-29T04:06:00.000-07:00</published><updated>2009-06-29T04:11:40.934-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='tools'/><category scheme='http://www.blogger.com/atom/ns#' term='name'/><title type='text'>New name: SAINT</title><content type='html'>We did it: we came up with a nice acronym as a name for the toolkit. Since Science Research Tools is a bit generic, and SciSA toolkit is too closely tied to our department name, we came up with a new name: SAINT. That is an acronym for &lt;span style="font-weight: bold;"&gt;S&lt;/span&gt;cience &lt;span style="font-weight: bold;"&gt;A&lt;/span&gt;ssessment (or Analysis, if you prefer) &lt;span style="font-weight: bold;"&gt;I&lt;/span&gt;ntegrated &lt;span style="font-weight: bold;"&gt;N&lt;/span&gt;etwork &lt;span style="font-weight: bold;"&gt;T&lt;/span&gt;oolkit. This name will be used to refer to the toolkit from now on, though it will take a bit of time for it to be used everywhere consistently. Note that our URLs will not change, so there is no need to update your bookmarks.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;SAINT will save you (time). &lt;/span&gt;&lt;br /&gt;That's just one of the many possible catchphrases of course... We'll come up with some new ones in due time to advertise the toolkit to the outside world. 
If you have any suggestions, please let us know!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7633517363101745721-1149955971152420804?l=srtools.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://srtools.blogspot.com/feeds/1149955971152420804/comments/default' title='Reacties plaatsen'/><link rel='replies' type='text/html' href='http://srtools.blogspot.com/2009/06/new-name-saint.html#comment-form' title='0 reacties'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7633517363101745721/posts/default/1149955971152420804'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7633517363101745721/posts/default/1149955971152420804'/><link rel='alternate' type='text/html' href='http://srtools.blogspot.com/2009/06/new-name-saint.html' title='New name: SAINT'/><author><name>André</name><uri>http://www.blogger.com/profile/01802493956014089291</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='21' src='http://3.bp.blogspot.com/_s9HldKof-So/SjIkyJ7yfLI/AAAAAAAAAx0/BE3yYxDBMAY/s1600-R/Andre%2520Somers.JPG'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7633517363101745721.post-1743994220867783333</id><published>2009-06-26T01:55:00.000-07:00</published><updated>2009-06-26T02:41:55.652-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='tools'/><category scheme='http://www.blogger.com/atom/ns#' term='#essconf09'/><category scheme='http://www.blogger.com/atom/ns#' term='release'/><category scheme='http://www.blogger.com/atom/ns#' term='demonstration'/><title type='text'>Mini-demo during coffee breaks</title><content type='html'>The official demonstration session at the e-Social Science conference today is scheduled for the last 20 minutes of the lunch break. 
Because of the nature of the lunches (a three-course affair that has run over the allotted time every time so far), I have decided to cancel this demonstration. The experience of others so far is that no one shows up for these sessions.&lt;br /&gt;&lt;br /&gt;For those interested, I will be giving mini demonstrations on my laptop during the coffee breaks in the morning and/or afternoon. Simply find me, and ask! I am excited to open my laptop and show you our work.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7633517363101745721-1743994220867783333?l=srtools.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://srtools.blogspot.com/feeds/1743994220867783333/comments/default' title='Reacties plaatsen'/><link rel='replies' type='text/html' href='http://srtools.blogspot.com/2009/06/mini-demo-during-coffee-break.html#comment-form' title='0 reacties'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7633517363101745721/posts/default/1743994220867783333'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7633517363101745721/posts/default/1743994220867783333'/><link rel='alternate' type='text/html' href='http://srtools.blogspot.com/2009/06/mini-demo-during-coffee-break.html' title='Mini-demo during coffee breaks'/><author><name>André</name><uri>http://www.blogger.com/profile/01802493956014089291</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='21' src='http://3.bp.blogspot.com/_s9HldKof-So/SjIkyJ7yfLI/AAAAAAAAAx0/BE3yYxDBMAY/s1600-R/Andre%2520Somers.JPG'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7633517363101745721.post-7582513243483933192</id><published>2009-06-26T00:07:00.000-07:00</published><updated>2009-06-26T01:52:14.479-07:00</updated><category 
scheme='http://www.blogger.com/atom/ns#' term='introduction'/><category scheme='http://www.blogger.com/atom/ns#' term='tools'/><category scheme='http://www.blogger.com/atom/ns#' term='pajek'/><category scheme='http://www.blogger.com/atom/ns#' term='matrix compiler'/><title type='text'>Introducing: Matrix compiler</title><content type='html'>In this third instalment of the 'Introducing...' series, I will be talking about the Matrix Compiler tool. While you can already see a lot of interesting things by just looking at the tables and views in the database that contains your data, visualization can be a huge help in recognizing patterns. That means that you will need to somehow transform the data in your database into a format that you can use for visualization.&lt;br /&gt;&lt;br /&gt;The Matrix Compiler is a tool that can do this. It can use the database and translate its data into a format that you can read into Pajek, a well-known visualization and network analysis package.&lt;br /&gt;&lt;br /&gt;Analysing and visualizing networks means that you will be dealing with two kinds of things. First, there are the objects that are connected, which we&lt;sup&gt;*&lt;/sup&gt; will call the nodes. Next, there are the connections between those nodes themselves. At least the latter should be available as a table or query/view in your database. For brevity, I will call all of these a view from here on. Depending on how you build up your database, it is possible that your connections view will contain complete labels (or other attributes), but it may well contain only the ID of the node in the database, with the label for the node defined in some other view. The Matrix Compiler supports both modes of operation: the labels for the nodes can be retrieved from either the connections view, or be looked up from an external view.&lt;br /&gt;&lt;br /&gt;After opening the database and indicating the name of the output file, the matrix can be set up. 
When constructing the matrix, you start by selecting the connection view, by placing the cursor in the corresponding box and either typing the name or selecting it from the view by double-clicking on it. You then select which fields in that view represent, respectively, the value for the relationship, the row and the column. A connection view thus needs to have at least three fields: two to represent the nodes you are connecting, and one to indicate the strength of that connection.&lt;br /&gt;&lt;br /&gt;The next step is to define the structure of your matrix. There are many options for that. If the types of objects in the rows and columns are the same, you may want to create a square matrix where all the nodes appear as both a row and a column. This makes sense if you are, for instance, constructing a matrix to represent co-authorships, but not if you want to display in which journals authors publish.&lt;br /&gt;&lt;br /&gt;If you choose to create a square matrix, you may also choose whether you want the matrix to be symmetric or not. In a symmetric matrix, the value of M(a,b) is identical to M(b,a). Again, in the case of co-authorships, saying that author a shares a co-authorship with author b means the same as saying that author b shares a co-authorship with author a. But for a citation relationship, a citation from a to b is different from one from b to a.&lt;br /&gt;In both cases, you can also choose whether you want to set the diagonal, that is M(a,a), to a fixed value, or whether you want to use the values occurring in your data. This can be useful for filtering out things like self-citations from your data.&lt;br /&gt;If your data contains values for the same relation more than once, you can choose how to deal with that. The options are to use the first occurrence, use the last, add the occurring values or multiply all the occurring values.&lt;br /&gt;&lt;br /&gt;The last step of creating your matrix is to choose where the labels for the nodes should come from. 
As explained above, there are two basic options. If the connection view already contains the labels, just select the appropriate option and you are done. If that view contains references to nodes though, you can now select where to get the actual labels. To continue our example on the co-authorships, it would make sense to select the Authors table as the table to find the labels, and use the full author name as the label for your nodes in the network.&lt;br /&gt;&lt;br /&gt;Note that if you use an external source of labels, you can choose how to deal with nodes in your label view that do not appear in the connection view. For instance, authors in your Authors table may not have any co-authorships. That means that they will not show up in your connection view. You may or may not want to include these unconnected nodes. The choice is yours.&lt;br /&gt;&lt;br /&gt;Note that outputting very big matrix files can take some time, as the output size is O(n&lt;sup&gt;2&lt;/sup&gt;). We are planning to change the output format shortly, from a matrix form to a list form. That will result in smaller output files for big matrices, and will also allow the inclusion of attributes other than a label for both the nodes and the connections.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;* Pajek itself uses a different terminology. 
It instead talks about Vertices for the nodes, and Arcs and Edges for the connections between these nodes.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7633517363101745721-7582513243483933192?l=srtools.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://srtools.blogspot.com/feeds/7582513243483933192/comments/default' title='Reacties plaatsen'/><link rel='replies' type='text/html' href='http://srtools.blogspot.com/2009/06/introducing-matrix-compiler.html#comment-form' title='0 reacties'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7633517363101745721/posts/default/7582513243483933192'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7633517363101745721/posts/default/7582513243483933192'/><link rel='alternate' type='text/html' href='http://srtools.blogspot.com/2009/06/introducing-matrix-compiler.html' title='Introducing: Matrix compiler'/><author><name>André</name><uri>http://www.blogger.com/profile/01802493956014089291</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='21' src='http://3.bp.blogspot.com/_s9HldKof-So/SjIkyJ7yfLI/AAAAAAAAAx0/BE3yYxDBMAY/s1600-R/Andre%2520Somers.JPG'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7633517363101745721.post-5127407622009160264</id><published>2009-06-24T21:43:00.000-07:00</published><updated>2009-06-24T22:05:54.471-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='introduction'/><category scheme='http://www.blogger.com/atom/ns#' term='tools'/><category scheme='http://www.blogger.com/atom/ns#' term='word splitter'/><title type='text'>Introducing: Word Splitter</title><content type='html'>In the second installment of the "Introducing" series, I will tell you something about our small utility called Word Splitter. 
The idea of the Word Splitter is simple. You point it to a field in your database that contains text, like a title or an abstract. The Word Splitter then builds up a table with all the words that occur in the database in that field, and a table with pointers between the record identifier that the text came from and the record identifier in the words table, so you can find which words belong to which text record. To make it possible to reconstruct the original text, the position of each word in the text is also stored in that pointer table.&lt;br /&gt;&lt;br /&gt;Our tool can do more than that though. First of all, it can process multiple text fields from your database at the same time, making it more efficient for you to work with. So, you can split both the title and the abstract simultaneously.&lt;br /&gt;&lt;br /&gt;Furthermore, the Word Splitter can use stopwords. Stopwords are, in their basic form, lists of words that are ignored when splitting the text, for instance because they are too common. That means that if you use stopwords, not all words from the text will be stored in the Words table, nor will pointers to them occur in the couple table. However, for different purposes, you may want to think of the procedure in two ways. One option is to first split the complete text, then remove the stopwords, and then store the words and their positions &lt;span style="font-style: italic;"&gt;after removing the stopwords&lt;/span&gt; to the database. This will result in consecutive word positions in the couple table, even if there used to be one or more stopwords between two words in the original text. Alternatively, you can split the complete text, note each word's position, remove the stopwords and only then store them into the database. 
This will result in word positions that reflect the original position of the word in the text, but leaves them non-consecutive.&lt;br /&gt;To provide maximum flexibility in the analysis, &lt;span style="font-style: italic;"&gt;both&lt;/span&gt; these positions are stored in the couple table.&lt;br /&gt;&lt;br /&gt;The Word Splitter can use several stop word lists at once, and furthermore knows three kinds of lists. First, it can use simple text files that contain lists of words. Second, it can use a field in an existing database that also contains a list of words. And last, it can use lists of regular expressions. These expressions are patterns that each word is matched against, and if it matches, it is regarded as a stop word. That allows you to, for instance, filter out numbers or dates without having to write them all out.&lt;br /&gt;&lt;br /&gt;To make it easy to use these stop word lists, you can create sets of these lists, and store such a set as a file again. This way, you can easily review the stop words you used for your analysis, and you can re-use the same set later on. You can also use one such set as your default set of stop words.&lt;br /&gt;&lt;br /&gt;This was a basic introduction to our Word Splitter tool. 
I hope you will like using it!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7633517363101745721-5127407622009160264?l=srtools.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://srtools.blogspot.com/feeds/5127407622009160264/comments/default' title='Reacties plaatsen'/><link rel='replies' type='text/html' href='http://srtools.blogspot.com/2009/06/introducing-word-splitter.html#comment-form' title='0 reacties'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7633517363101745721/posts/default/5127407622009160264'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7633517363101745721/posts/default/5127407622009160264'/><link rel='alternate' type='text/html' href='http://srtools.blogspot.com/2009/06/introducing-word-splitter.html' title='Introducing: Word Splitter'/><author><name>André</name><uri>http://www.blogger.com/profile/01802493956014089291</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='21' src='http://3.bp.blogspot.com/_s9HldKof-So/SjIkyJ7yfLI/AAAAAAAAAx0/BE3yYxDBMAY/s1600-R/Andre%2520Somers.JPG'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7633517363101745721.post-3562312008402643283</id><published>2009-06-24T12:43:00.000-07:00</published><updated>2009-06-25T00:52:34.823-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='launch'/><category scheme='http://www.blogger.com/atom/ns#' term='tools'/><category scheme='http://www.blogger.com/atom/ns#' term='install'/><category scheme='http://www.blogger.com/atom/ns#' term='#essconf09'/><title type='text'>Installer for windows online</title><content type='html'>With the official launch of the toolkit only a night away, I have just uploaded an &lt;a 
href="http://www.assembla.com/spaces/srtools/documents/cgjVQEyjWr3PPceJe5afGb/download/SciSA_Toolkit_Install.exe"&gt;installer&lt;/a&gt; for the Windows platform to the &lt;a href="http://www.assembla.com/spaces/srtools/documents"&gt;file storage&lt;/a&gt; we have on Assembla. Of course, our &lt;a href="http://www.rathenau.nl/tools"&gt;website&lt;/a&gt; has been updated to reflect that. Other files you can find there include documentation, but also test cases to reproduce bugs.&lt;br /&gt;&lt;br /&gt;Tomorrow at 11 AM, I will do the first demonstration (note, the time has changed from the earlier announcement) at the e-Social Science conference in Cologne. I will give a little bit of background, and then quickly move on to actually showing the attendees the tools on some real-life data. Of course, I will also show some pretty pictures that we made using the tools, courtesy of Edwin (thanks!)&lt;br /&gt;&lt;br /&gt;I hope everything will go all right. There will be another demonstration on Friday, so plenty of opportunities to see the tools in action!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7633517363101745721-3562312008402643283?l=srtools.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://srtools.blogspot.com/feeds/3562312008402643283/comments/default' title='Reacties plaatsen'/><link rel='replies' type='text/html' href='http://srtools.blogspot.com/2009/06/installer-for-windows-online.html#comment-form' title='0 reacties'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7633517363101745721/posts/default/3562312008402643283'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7633517363101745721/posts/default/3562312008402643283'/><link rel='alternate' type='text/html' href='http://srtools.blogspot.com/2009/06/installer-for-windows-online.html' title='Installer for windows 
online'/><author><name>André</name><uri>http://www.blogger.com/profile/01802493956014089291</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='21' src='http://3.bp.blogspot.com/_s9HldKof-So/SjIkyJ7yfLI/AAAAAAAAAx0/BE3yYxDBMAY/s1600-R/Andre%2520Somers.JPG'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7633517363101745721.post-4050434954921597036</id><published>2009-06-19T05:10:00.000-07:00</published><updated>2009-06-19T05:19:27.308-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='tools'/><category scheme='http://www.blogger.com/atom/ns#' term='issue tracker'/><category scheme='http://www.blogger.com/atom/ns#' term='website'/><category scheme='http://www.blogger.com/atom/ns#' term='source'/><title type='text'>Open source repository and issue tracker online</title><content type='html'>Yesterday, we reached an important milestone in the project. We have put a &lt;a href="http://code.assembla.com/srtools/git/nodes/stable"&gt;public repository&lt;/a&gt; online that contains the complete source code for all the tools. That's right: you can download the sources, tinker with them, and use them however you want.&lt;br /&gt;&lt;br /&gt;We selected &lt;a href="http://www.assembla.com"&gt;Assembla&lt;/a&gt; as our hosting for this project. It supports the Git distributed source code repository system, and nicely integrates that with an &lt;a href="http://www.assembla.com/spaces/srtools/tickets"&gt;issue tracker&lt;/a&gt;. So far, it seems to be pretty flexible and works nicely. While the institute is developing its new website, we have put up a &lt;a href="http://www.rathenau.nl/tools"&gt;temporary website&lt;/a&gt; on the toolkit as well. The address will not change once the new site is up, so bookmark away!&lt;br /&gt;&lt;br /&gt;If you are familiar with C++, or want to learn it: try your hand at helping to develop these tools. 
It's really not all that difficult. Of course, just reporting issues, suggesting documentation updates or giving ideas for improvements and extensions are also very valuable contributions!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7633517363101745721-4050434954921597036?l=srtools.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://srtools.blogspot.com/feeds/4050434954921597036/comments/default' title='Reacties plaatsen'/><link rel='replies' type='text/html' href='http://srtools.blogspot.com/2009/06/open-source-repository-and-issue.html#comment-form' title='0 reacties'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7633517363101745721/posts/default/4050434954921597036'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7633517363101745721/posts/default/4050434954921597036'/><link rel='alternate' type='text/html' href='http://srtools.blogspot.com/2009/06/open-source-repository-and-issue.html' title='Open source repository and issue tracker online'/><author><name>André</name><uri>http://www.blogger.com/profile/01802493956014089291</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='21' src='http://3.bp.blogspot.com/_s9HldKof-So/SjIkyJ7yfLI/AAAAAAAAAx0/BE3yYxDBMAY/s1600-R/Andre%2520Somers.JPG'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7633517363101745721.post-796733102890341666</id><published>2009-06-19T04:53:00.000-07:00</published><updated>2009-06-19T05:10:22.384-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='tools'/><category scheme='http://www.blogger.com/atom/ns#' term='web of knowledge'/><category scheme='http://www.blogger.com/atom/ns#' term='isi'/><title type='text'>Introducing: the ISI Data Importer</title><content type='html'>This is the 
first of what is to become a series of postings to introduce all the tools in the toolkit. I hope it will give a clear overview of what kind of tools we offer, and what they do.&lt;br /&gt;&lt;br /&gt;The ISI Data Importer is aimed at importing bibliographic data that you downloaded from ISI/Web of Knowledge. You can download data on the articles resulting from your searches in a text format. The ISI Data Importer tool can read these files and output them to a structured database format. The usage of structured databases is one of the basic ideas of the Science Research Toolkit. Using structured, standard databases to house the data allows us to use standard tools. Databases have been in development for decades, and are quite efficient for many tasks that suit the kind of work we do with the data. Also, getting the data in a form that is as structured as possible gives us maximum flexibility.&lt;br /&gt;&lt;br /&gt;The interface of the ISI Data Importer is quite simple: &lt;a style="" onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_s9HldKof-So/Sjt-HmIR_5I/AAAAAAAAAyw/VHtYJBCzWgY/s1600-h/isidataimporter.png"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 384px; height: 182px;" src="http://1.bp.blogspot.com/_s9HldKof-So/Sjt-HmIR_5I/AAAAAAAAAyw/VHtYJBCzWgY/s320/isidataimporter.png" alt="" id="BLOGGER_PHOTO_ID_5349007651392061330" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;On the first tab, you select the input file or files. You can select as many files as you want, as long as they are located in a single directory. As Web of Knowledge only allows you to download a maximum of 500 records in one go, you can end up with lots of separate files that each contain a fraction of your data. Simply select them all, and they will all be imported in a single run. 
Duplicate records will automatically be filtered out, so if you have created several sets that can overlap in their results, you will end up with a single, unified set without duplicate data points that can ruin your similarity measures later on.&lt;br /&gt;&lt;br /&gt;On the output tab, you can select an output file. Currently, the only supported database backend is Microsoft Access, but we are working on extending that to include other and better database backends. Access can be a bit limiting and slow, especially if you work with large datasets. The file you select does not need to exist yet. It will simply be created for you if it doesn't.&lt;br /&gt;&lt;br /&gt;Optionally, you can filter the data on the document types. Some of the more frequently occurring document types are included in the list on the Filter page. If you are missing an option, let me know, and I'll add it. Better yet: simply patch the list yourself, the sources are available!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7633517363101745721-796733102890341666?l=srtools.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://srtools.blogspot.com/feeds/796733102890341666/comments/default' title='Reacties plaatsen'/><link rel='replies' type='text/html' href='http://srtools.blogspot.com/2009/06/introducing-isi-data-importer.html#comment-form' title='0 reacties'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7633517363101745721/posts/default/796733102890341666'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7633517363101745721/posts/default/796733102890341666'/><link rel='alternate' type='text/html' href='http://srtools.blogspot.com/2009/06/introducing-isi-data-importer.html' title='Introducing: the ISI Data 
Importer'/><author><name>André</name><uri>http://www.blogger.com/profile/01802493956014089291</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='21' src='http://3.bp.blogspot.com/_s9HldKof-So/SjIkyJ7yfLI/AAAAAAAAAx0/BE3yYxDBMAY/s1600-R/Andre%2520Somers.JPG'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_s9HldKof-So/Sjt-HmIR_5I/AAAAAAAAAyw/VHtYJBCzWgY/s72-c/isidataimporter.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7633517363101745721.post-6209464096514337807</id><published>2009-06-17T07:28:00.000-07:00</published><updated>2009-06-25T00:51:48.547-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='#essconf09'/><category scheme='http://www.blogger.com/atom/ns#' term='release'/><category scheme='http://www.blogger.com/atom/ns#' term='demonstration'/><title type='text'>Demonstration on e-Social Science conference</title><content type='html'>As announced in the introductory posting, we will be launching our toolkit at the e-Social Science conference in Cologne, Germany. 
We will be doing that in a 20-minute demonstration session, where we will demonstrate how you can easily use a set of data downloaded from ISI/Web of Knowledge to create some maps of a research field, using a couple of database queries and our tools.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;font-size:85%;" &gt;Update, June 18:&lt;/span&gt;&lt;br /&gt;&lt;s&gt;As soon as I know the exact time, date and location of this session, I will post it here.&lt;/s&gt;&lt;br /&gt;There will be two demonstration sessions:&lt;br /&gt;&lt;br /&gt;11:00 – 11:30:  Thursday 25 June&lt;br /&gt;&lt;s&gt;16:00 – 16:30:  Thursday 25 June&lt;/s&gt;&lt;br /&gt;13:40 – 14:00:  Friday 26 June.&lt;br /&gt;&lt;br /&gt;All demos have been allocated to take place in the main foyer of Maternushaus.&lt;br /&gt;&lt;br /&gt;If you happen to be at this conference, don't hesitate to join in for this demonstration!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7633517363101745721-6209464096514337807?l=srtools.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://srtools.blogspot.com/feeds/6209464096514337807/comments/default' title='Reacties plaatsen'/><link rel='replies' type='text/html' href='http://srtools.blogspot.com/2009/06/demonstration-on-e-social-science.html#comment-form' title='0 reacties'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7633517363101745721/posts/default/6209464096514337807'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7633517363101745721/posts/default/6209464096514337807'/><link rel='alternate' type='text/html' href='http://srtools.blogspot.com/2009/06/demonstration-on-e-social-science.html' title='Demonstration on e-Social Science 
conference'/><author><name>André</name><uri>http://www.blogger.com/profile/01802493956014089291</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='21' src='http://3.bp.blogspot.com/_s9HldKof-So/SjIkyJ7yfLI/AAAAAAAAAx0/BE3yYxDBMAY/s1600-R/Andre%2520Somers.JPG'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7633517363101745721.post-5219328391777639197</id><published>2009-06-12T02:20:00.001-07:00</published><updated>2009-06-25T00:53:12.632-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='open source'/><category scheme='http://www.blogger.com/atom/ns#' term='introduction'/><category scheme='http://www.blogger.com/atom/ns#' term='tools'/><category scheme='http://www.blogger.com/atom/ns#' term='patentometrics'/><category scheme='http://www.blogger.com/atom/ns#' term='#essconf09'/><category scheme='http://www.blogger.com/atom/ns#' term='bibliometrics'/><title type='text'>First post</title><content type='html'>Every blog needs an introductory posting, and this one is no different. What is it about? What can we expect to hear? Why even bother blogging? Those are the kinds of questions both you and I would like to see answered. "You &lt;span style="font-style: italic;"&gt;and&lt;/span&gt; I", I hear you wonder? Yes, because it is not completely clear to me either what exactly I will and will not write about yet. So let's start with giving some idea about what I am doing, and why that is interesting to keep a weblog about.&lt;br /&gt;&lt;br /&gt;The &lt;a href="http://www.rathenauinstituut.com/showpage.asp?steID=2&amp;amp;ID=3062"&gt;Science System Assessment&lt;/a&gt; department of the &lt;a href="http://www.rathenauinstituut.com/default.asp?steID=2&amp;amp;ID=2351"&gt;Rathenau Instituut&lt;/a&gt; is dealing, among other things, with trying to apply bibliometrics and patentometrics to map the dynamics of science and knowledge transfer. 
The problem that our department quickly ran into was that the available tools that can deal with this kind of data are few and far between, require many manual, error-prone and labour-intensive steps, and don't fit together well. Worse, we soon ran into limits on the amount of data they could handle, and those limits started to affect our research.&lt;br /&gt;&lt;br /&gt;So, we decided to build some tools ourselves. Seeing that the tools that were (and are) available are not open, we had to start from scratch. That presented both a challenge and an opportunity, because in this way we could also rethink basic issues of how these tools should work. We decided to go for a design where all tools work against standard relational databases in which we structure the available data as well as possible. We also wanted the tools to be easy to use, so a clear graphical interface was a must. Since I have experience developing software using the excellent C++ based &lt;a href="http://www.qtsoftware.com/"&gt;Qt toolkit&lt;/a&gt;, I chose to use that as the environment to build these tools in. As an added bonus, cross-platform compatibility as well as database back end independence come practically for free.&lt;br /&gt;&lt;br /&gt;As the first tools became available in early versions, more and more ideas about what else we could do and needed began to pop up, and soon the idea to build some tools led to a complete toolkit that is still growing. Now the time has come to make these tools available to you too. The toolkit will officially be launched at the &lt;a href="http://www.ncess.ac.uk/conference-09/"&gt;5th conference of the National Centre for e-Social Science in Cologne&lt;/a&gt;. The first three tools will be released in their "1.0" or "ready to use" versions, while the rest of them are made available "as is".
Because we would have liked to contribute to the existing tools but could not, we have decided to avoid the same issue with our initiative.&lt;br /&gt;&lt;br /&gt;We would love to hear from you, and even better, to work with you to improve these tools! We expressly invite you to use them, test them, and improve on them. To make that possible, we are making all source code available under a liberal open source licence. We will also make a public issue tracker available, as well as a forum and other collaboration tools.&lt;br /&gt;&lt;br /&gt;And that brings us to the why of this blog: we feel that it is important to keep you up to date with what is happening, what we are planning, and what others are doing with these tools. This blog is one of the ways in which we can do that. We are also working on a nice website, and a temporary website will be up soon. If you have other ideas about how to communicate, want to aggregate your own blog, or have any other comments: don't hesitate to &lt;a href="http://www.rathenau.nl/showpageBreed.asp?steID=1&amp;amp;item=2023"&gt;contact me&lt;/a&gt;.
I'd love to hear from you!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7633517363101745721-5219328391777639197?l=srtools.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://srtools.blogspot.com/feeds/5219328391777639197/comments/default' title='Reacties plaatsen'/><link rel='replies' type='text/html' href='http://srtools.blogspot.com/2009/06/first-post.html#comment-form' title='0 reacties'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7633517363101745721/posts/default/5219328391777639197'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7633517363101745721/posts/default/5219328391777639197'/><link rel='alternate' type='text/html' href='http://srtools.blogspot.com/2009/06/first-post.html' title='First post'/><author><name>André</name><uri>http://www.blogger.com/profile/01802493956014089291</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='21' src='http://3.bp.blogspot.com/_s9HldKof-So/SjIkyJ7yfLI/AAAAAAAAAx0/BE3yYxDBMAY/s1600-R/Andre%2520Somers.JPG'/></author><thr:total>0</thr:total></entry></feed>
