GNU Planet

Syndicate content
Just another WordPress.com weblog Robert Millan's blog 2010-09-07T06:18:06Z ... for the curious in you GNOME Commit-Digest 2010-09-07T06:16:56Z Neal H. Walfield blog/hacking 2010-09-04T14:17:06Z Neal H. Walfield blog/hacking 2010-09-04T14:17:06Z your cadillac aint no betta' than my bus stop hesa's blog » Free Software 2010-09-07T06:15:53Z News, announcements and comments from the developers of the Bazaar distributed version control system Bazaar developers' blog 2010-09-07T06:15:42Z News, announcements and comments from the developers of the Bazaar distributed version control system Bazaar developers' blog 2010-09-07T06:15:42Z your cadillac aint no betta' than my bus stop hesa's blog » Free Software 2010-09-07T06:15:53Z ... for the curious in you GNOME Commit-Digest 2010-09-07T06:16:56Z Neal H. Walfield blog/hacking 2010-09-04T14:17:06Z Eitan's Pitch Monotonous.org » Software 2010-08-25T22:18:01Z nickclifton - LiveJournal.com nickclifton 2010-09-07T06:16:32Z ... for the curious in you GNOME Commit-Digest 2010-09-07T06:16:57Z
Updated: 1 hour 57 min ago

Debian Installer with ZFS

Mon, 09/06/2010 - 15:00
Long time no see… Not much hacking in the last few months. Not much ranting either (some of you I’m sure will appreciate ;-). Anyway, I recently grew excited to learn that ZFS is coming to Debian. I decided to bite the bullet, patched the missing bits in GRUB and Parted, a few small changes [...]

Iliad 0.9 is out!

Mon, 09/06/2010 - 15:00
We are proud to announce the release of Iliad 0.9. Changes include: several improvements in Formula, including number input support a Better detection of dirty widgets to avoid rebuilding widgets several times iliad.js can be enabled/disabled with iliad.enableAjax() / iliad.disableAjax() improvements of the API for redirection links the initial redirect in the dispatcher has been removed support for even/odd columns in ILDatagrid as usual, a lot of bug has been fixed Iliad 0.9 is available for GNU-Smalltalk, a Pharo port will follow. You can install it using gst-package: gst-package --download grease -t ~/.st gst-package --download iliad -t ~/.st read more 2010-09-06T12:02:57Z Nicolas Petton

GNU Parallel 20100906 released

Mon, 09/06/2010 - 07:00
GNU Parallel 20100906 has been released. It is available for download at: http://ftp.gnu.org/gnu/parallel/ New in this release: Using --shebang GNU Parallel can be used as the parser for a script. E.g: #!/usr/bin/parallel --shebang traceroute (followed by lines of hosts) First community generated bugfixes Alt Linux package of GNU Parallel. Thanks to Michael Shigorin Sunfreeware package of GNU Parallel. Thanks to Steven M. Christensen Untested CentOS, Fedora, Mandriva, RedHat, and SUSE packages available through OpenSUSE build service: https://build.opensuse.org/package/show?package=parallel&project=home%3Atange Review of GNU Parallel. Thanks to Andrew McFague amcfague at wgen dot net http://www.andrew-mcfague.com/linux/utilities-linux/commands-every-serious-nix-user-should-know/#parallel First 1000 views of the intro video http://www.youtube.com/watch?v=OpaiGYxkSuQ or at http://tinyogg.com/watch/TORaR/ and http://tinyogg.com/watch/hfxKj/ sql - a small script to access sql bases from the command line which is a handy companion to parallel --colsep About GNU Parallel GNU Parallel is a shell tool for executing jobs in parallel using one or more machines. A job is typically a single command or a small script that has to be run for each of the lines in the input. The typical input is a list of files, a list of hosts, a list of users, a list of URLs, or a list of tables. If you use xargs today you will find GNU Parallel very easy to use as GNU Parallel is written to have the same options as xargs. If you write loops in shell, you will find GNU Parallel may be able to replace most of the loops and make them run faster by running several jobs in parallel. If you use ppss or pexec you will find GNU Parallel will often make the command easier to read. GNU Parallel makes sure output from the commands is the same output as you would get had you run the commands sequentially. This makes it possible to use output from GNU Parallel as input for other programs. You can find more about GNU Parallel at: http://www.gnu.org/software/parallel/ Watch the intro video on http://www.youtube.com/watch?v=OpaiGYxkSuQ or at http://tinyogg.com/watch/TORaR/ and http://tinyogg.com/watch/hfxKj/

Issue 100

Sun, 09/05/2010 - 23:00
This week… 2545 commits, in 212 projects, by 256 happy hackers (and 731 were translation commits). Eitan Isaacson added a checkbox to use the default colour theme when highlighring keys in Caribou . (GNOME bug 622246) Joanmarie Diggs added support for object toolkis in Orca, this allows Orco to handle cases where multiple toolkits are used in [...]

GNU Guile 1.9.12 released

Sun, 09/05/2010 - 07:00
We are pleased to announce GNU Guile 1.9.12, the thirteenth and one of the last pre-releases before 2.0, featuring a compiler and virtual machine and a large set of exciting new features. As usual feedback, bug reports, portability issues, etc., are all welcome! Summer holidays in this hemisphere reduced the release pace, though the list of news and bug fixes is pretty long. These changes bring us closer to our expectations for 2.0. Among other things 1.9.12 comes with a new "recursive REPL" for debugging (similar to Emacs' recursive editing levels), improvements to the dynamic foreign function interface (FFI), and various R6RS bug fixes. See http://lists.gnu.org/archive/html/guile-devel/2010-09/msg00033.html for the original announcement.

data-stream-processing

Sat, 09/04/2010 - 16:00
Data Stream Processing I've spent the past few weeks reading about data stream processing. The underlying problem that research in data stream processing tries to address is given a stream of data and a standing query but not enough space to store all the data (or not enough processing power to rescan all the data each time the query is evaluate), how do we effectively evaluate the query? Consider the problem of determining how frequently an event occurs in the recent past. The recent past is relative. Thus, it is not enough to simply increment a counter when the corresponding event occurred, we also need to decrement the counter as recorded events leave the recent past. The naïve solution is to record the time at which each event occurred. The amount of space required is proportional to the number of events. If there are a lot of events, e.g., every use of a file, this adds up to a lot of space. Moreover, processing the query will be expensive. The space requirement is typically expressed as the amount of buffer space be sub-linear with respect to the data that is relevant to the query. Meeting this requirement is the domain of data stream processing. Limiting space often requires accepting approximate answers: computing exact answers sometimes requires more state than is available. Often, however, exact answer are not required; a good approximation includes the degree of error. Data stream processing typically tries to bound relative error. Relative error is the difference between the reported answer and the true answer. Thus, an acceptable relative error of at most 50% allows reporting any value between 5 and 15 if the correct value is 10 and any value between 500 and 1500 if the correct value is 1000. Another way to think about relative error is that the first few decimal places are correct and the rest is a (good) guess. Introductory Literature There are a lot of papers that have been published on data stream processing in the last dozen years. I found that "Issues in Data Stream Management--A Survey" by Golab and Özsu (2003) to be a useful survey of the issues being addressed in data stream processing community (note the SIGMOD Record paper is a shorter, abbreviated version). A really nice, concise text book on data streams is "Data Streams: Algorithms and Applications" by S. Muthu Muthukrishnan. The model that I've become rather interested in is the sliding window model, in which the stream is infinite but only the last N elements or time units are relevant to the query. This model is explained well in "STREAM: The Stanford Data Stream Management System" by Arasu et al. My Motivation: Capturing Files' Histories The example of tracking all accesses to all files is what motivated me to take a deeper look at data streaming processing. I'm currently working on a project (variously known as Woodchuck, Smart Storage and NetCzar) to enable aggressive prefetching of network data, particularly in the context of mobile devices, such as e-mails, RSS feeds, Podcasts, and place-shifted video. The major challenges of this project include: scheduling the downloads, determining how much space each application should use, and identifying data that can be deleted. The former two issues are topics for another post. As for determining what can be deleted: each application could implement this logic, however, that's a fair amount of work. We're investigating how effectively we can centralize this. To determine what should be deleted, we need to understand when the file no longer has value or only has marginal value. For a Podcast episode, this might be once it has been listened. For news-like content, this is may be once an episode been superseded by a new episode. In most cases, understanding the access history of the file as well as related files appears essential. My goal is to succinctly capture the access history of all relevant files. Conceptually, we can think of file accesses as access events. Associated with each file is a stream of access events. As implied by this article, (we think) saving all accesses is too expensive and we'd like a compact data structure that summarizes the history of the file. We'd like to be able to ask questions of the sort: how recently was a particular file last accessed? How frequently was the file used in the last month? in the last year? Counting Data Structures: Exponential Histograms & Waves After reviewing several dozen papers, a score or so in depth, I identified two data structures that appear to enable us to answer these recency and frequency queries: exponential histograms (from "Maintaining Stream Statistics Over Sliding Windows" by Datar et al.) and waves (from "Distributed Streams Algorithms for Sliding Windows" by Gibbons and Tirthapura). Both of these data structures are used to solve the so-called counting problem, the problem of determining, with a bound on the relative error, the number of 1s in the last N units of time. In other words, the data structures are able to answer the question: how many 1s appeared in the last n units of time within a factor of Error (e.g., 50%). The algorithms are neat, so I'll present them briefly. See the linked papers for more details and proofs of correctness. The exponential histogram data structure is a histogram in which buckets recording older data are exponentially wider than the buckets recording more recent data. As a first approximation, the first bucket records when the first 1 occurred, the second bucket when the first of the next two 1s occurred, the third bucket when the first of the next four 1s occurred, etc. Consider the following data structure and illustration: struct bucket { /* The time the most recent 1 that this bucket records was seen. */ time_t start; /* The number of 1s this bucket records. */ int ones; }; Buckets & Time: |1|3 |23 |27 | 1 1 1 1 1 1 One: 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 Note that the width of the bucket is in terms of 1s, not time. In this example, the second bucket records (23 - 2) = 20 seconds of history and 2 ones, and the third bucket records 27 - 23 = 4 seconds of history yet only 4 references. When we execute the query (the number of 1s seen in the last n units of time), we simply iterate over the buckets starting with the bucket containing the most recently recorded 1, and find the bucket that covers the time we are interested in. All of the previous buckets certainly recorded 1s that occurred during the time period in question. For the last bucket, we need to interpolate since we only know when the first 1 happened. Call the bucket we need to interpolate on bucket i. The minimum number of 1s that could have been seen is thus sum (buckets[0 <= j < i]) + 1: all the preceding references plus at least the first reference from bucket i. The maximum number of 1s that could have been seen is sum (buckets[0 <= j <= i]). Choosing the mid-point thus ensures a maximum relative error of 50% as sum (buckets[0 <= j < i]) + 1 == sum (buckets[0 <= j <= i]). Assume that i is 3, then we can visualize what is happening as follows: 1 2 4 8 ----- 7 Improving the relative error is straightforward: for each bucket width, we maintain more buckets. In particular, we need C=ceil(1/(2*Error)) + 1 buckets of each width. (Note that this equation says that we need 2 buckets per level for 50% error even though one is only required.) The only remaining issue is how to update buckets. Consider the above figure: if we add a bucket, we now have two buckets with width one. We can't adjust the buckets' boundaries as don't know when, e.g., event 3 happened. To work around this, we allow either C or C+1 buckets per width. When there are C+2 buckets of the same size, we coalesce the two oldest in the level. This data structure requires O(log^2(Error * N) / Error) bits, where N is the amount of time to cover: there are log (Error * N) buckets and each bucket needs to record log (N) bits for the time stamp and the number of 1s, respective. A slight space optimization is possible: we only need one bit to record the number of 1s per bucket. Assuming the buckets are ordered by age, we only need to indicate if the bucket in question records the same number of 1s as the previous bucket or twice as many. The exponential histograms requires a O(log(N)) update due to the requirement to coalesce buckets. The amortized cost, however, is just O(1). The waves data structure improves on this while having the same memory requirement: recording a new 1 requires just O(1) time. The wave data structure removes the requirement to merge buckets. It records the time at which critical 1s occurred, namely, the 1/Error+1 most recent 1s which are a multiple of 2^i, 0 <= i <= log(2*Error*N). The figure below illustrates the idea. See the paper for more details. 2 2 5 7 2^0: | | | | 2^1: | | | | 2^2: | | | 22 | 26 2^3: | | 12 | 20 | 0 8 16 24 These data structures can be used to the number of file accesses: file accesses are just 1s! A possible issue (for my purposes) with these two data structures is that the bounded relative error only applies to the query: how many 1s occurred in the last n units of time? If we want to know approximately when the ith 1 occurred, the error is potentially unbounded. Consider again the exponential histogram example presented above. The third 1 may have occurred any time between time 4 and time 22, inclusive. We could "flip the axis" (buckets are separated according to time and not 1s and they record references not time) to answer this query, however, we then have a similar problem when trying to determine the number of references that occurred in the last n units of time. This may not be a real issue as determining when a particular reference occurred (as opposed to the number of references) is primarily of interest for the most recent references, which the data structure does maintain. 2010-09-04T10:44:06Z

more-than-50-percent

Thu, 09/02/2010 - 16:00
I've spent the last few weeks reading papers on data stream processing. (Data stream processing is a really facinating topic. The basic idea is that you have constantly arriving new data and you want to continuously execute queries, e.g., you want to know the average of the last N elements. The difficulty is that you don't have enough space to store the whole stream or even the last N elements, or you don't have the time required to scan the last N. As such, you need an algorithm that not only incrementally updates the state to include new elements as they arrive, but also somehow expires old elements as they eclipse the event horizon! See "STREAM: The Stanford Data Stream Management System" by Arasu et al. for a good overview of the problem.) This blog post is not about stream processing, however. This blog post is about an algorithm, which I happened to come across it while researching that data stream processing. The algorithm is able to find the element that appears more than 50% of the time in an array, if there is one. That's not so amazing by itself. The amazing part is that it has an O(N) run time and an O(1) memory requirement. Further, the algorithm is amazingly short, yet not at all obvious to me. The algorithm is described in a paper entitled "Finding Repeated Elements" by Misra and Gries. The algorithm consists of two steps. First, it identifies a candidate value. The candidate value either occurs >50% of the time or there is no element in the array that occurs >50% of the time. The second step verifies whether the candidate element does indeed occur >50% of the time. Here's the algorithm, implemented in C: #define undefined (-1) int find_element_that_occurs_more_than_half_the_time (int array[], int n) { /* Step 1: Find the element that occurs >50% of the time, if there is one. */ int c = 0; int candidate = undefined; for (int i = 0; i < n; i ++) { if (array[i] == candidate) /* Match. */ c += 2; else if (i == c) /* New candidate. */ { c += 2; candidate = array[i]; } } /* Step 2: Verify that the candidate actually occurs >50% of the time. */ for (i = 0, c = 0; i < n; i ++) if (array[i] == candidate) c ++; if (c * 2 > n) return candidate; else return undefined; } Does it work? Valdidating a candidate is easy. My problem was understanding that selecting the candidate really works. I found it best to work through some examples. Let's consider one example, the array: [ 0, 1, 2, 2, 0, 0, 0 ], and what happens at each iteration of the the first loop: Initial state: i = 0, c = 0, candidate = undefined array[i = 0] = 0 => New candidate => c = 2, candidate = 0 array[i = 1] = 1 => Do nothing => c = 2, candidate = 0 array[i = 2] = 2 => New candidate => c = 4, candidate = 2 array[i = 3] = 2 => Match => c = 6, candidate = 2 array[i = 4] = 0 => Do nothing => c = 6, candidate = 2 array[i = 5] = 0 => Do nothing => c = 6, candidate = 2 array[i = 6] = 0 => New candidate => c = 8, candidate = 0 Yup, it works. The main insight required to understand the algorithm is that when c == i, there is no element that has appeared more than 50% of the time in the first i elements so the only possibility is array[i]. 2010-09-02T08:23:39Z

Nomination period open for Nordic Free Software Award

Thu, 09/02/2010 - 00:00
Until October 22 you can nominate a person, a project or an organisation for the Nordic Free Software Award. The Nordic Free Software Award given to people, projects or organisations in the Nordic countries that have made a prominent contribution to the advancement of Free Software. The award will be announced during [...]

farewell, Ian

Wed, 09/01/2010 - 08:00
I am grieved to say that my friend and colleague Ian Clatworthy died last night, after a long and horrible struggle with cancer. He and his wife Geri celebrated their 19th wedding anniversary yesterday before he passed away peacefully in his sleep, at home, with his family. I’ve known Ian for eleven years and he has worked at Canonical [...]

GNU M4 1.4.15 released [stable]

Wed, 09/01/2010 - 00:00
See the release announcement here: http://lists.gnu.org/archive/html/m4-announce/2010-08/msg00000.html

bugzilla-bzr integration

Tue, 08/31/2010 - 16:00
Max Kanat-Alexander’s new bugzilla-vcs extension (alpha) supports bzr, svn, hg, git, and cvs. Currently it supports linking bugs and commits, and displaying information about about commits in Bugzilla on the show_bug page.

Issue 99

Mon, 08/30/2010 - 16:00
This week… 2512 commits, in 181 projects, by 248 happy hackers (and 596 were translation commits). The synctex plugin, to synchronize TeX files and PDF output, has been merged into gedit-plugins. Both Lapo Calamandrei and Jakub Steiner worked on the metacity/mutter theme for GNOME 3. Nate Stedman added alpha support to backgrounds in Ease. A “save as PDF” plugin [...]

PetitParser for GNU Smalltalk

Mon, 08/30/2010 - 08:00
On and off the last couple of weeks I have ported PetitParser to GNU Smalltalk. I am still a Smalltalk newbie and it was a nice learning experience and it helped me to improve the way I code on GNU Smalltalk and to learn more about GNU Smalltalk and ANSI Smalltalk. To load the PetitParser.star one can type: gst-package http://smalltalk.gnu.org/project/petitparser/package.xml To load the PetitParser.star into the image do: PackageLoader fileInPackage: 'PetitParser' There are some differences between the real PetitParser and this port. GNU Smalltalk does not support binary selectors that have more than two charachters. This means that ==> and >=> had to be mapped to something else. I have picked => and >< for now.read more 2010-08-30T04:37:35Z Holger Hans Peter Freyther

FSCONS Embedded taks form

Mon, 08/30/2010 - 00:00
Over the last days we have started to add the accepted talks into the embedded schdule Getting familiar with embedded Linux using inexpensive hardware Embedding Linux for an Automotive Environment Workshop on file system formats Qt for smartphones And we have more coming in… just you wait and see //

Better mirror redirection

Fri, 08/27/2010 - 00:00
Brian Gough just improved mirror redirects in the download area: now you're redirected to the nearest mirror only when downloading a file, you stay on the reference server when browsing directories :) There is also a page footer warning people about the up-to-24h mirror replication delay.

lambdas-in-c

Thu, 08/26/2010 - 00:00
Today, I learned that gcc C has supported lambda functions since at least version 3.0.4. I wish I had known sooner. At the recent GNU Hackers Meeting in the Hague, Paolo Carlini gave a talk about C++0x and gcc. One of the new features he mentioned is C++'s support for lambda functions. If g++ has them, I thought, it shouldn't be too hard to introduce them into gcc. I asked Paulo and he said that he had heard of a project that was working on exactly this. Cool. Today, I went searching for the people working on lambda functions in gcc. I didn't find them. Instead, I found something even cooler: GNU C's compound statement and a macro are enough to get 95% of lambdas. Consider: #include #include /* Create a lambda function. Note: unlike lambdas in functional languages, this lambda does not capture the containing environment. Thus, if you access the enclosing environment, you must ensure that the lifetime of this lambda is bound by the lifetime of the enclosing environment (i.e., until the enclosing function returns). This means that if you access local variables, bad things will happen. If you don't access local variables, you're fine. */ #define lambda(l_ret_type, l_arguments, l_body) \ ({ \ l_ret_type l_anonymous_functions_name l_arguments \ l_body \ &l_anonymous_functions_name; \ }) int main (int argc, char *argv[]) { int array[] = { 4, 3, 1, 2, 5 }; void dump (void) { int i; for (i = 0; i < sizeof (array) / sizeof (array[0]); i ++) printf ("%d ", array[i]); printf ("\n"); } printf ("Initial: "); dump (); /* Ensure that the lambda is a nested function and thus requires a trampoline. */ int comparison = 0; qsort (array, sizeof (array) / sizeof (array[0]), sizeof (array[0]), lambda (int, (const void *a, const void *b), { dump (); printf ("Comparison %d: %d and %d\n", ++ comparison, *(const int *) a, *(const int *) b); return *(const int *) a - *(const int *) b; })); printf ("Sorted: "); dump (); return 0; } Output: $ gcc -Wall a.c -o a && ./a Initial: 4 3 1 2 5 4 3 1 2 5 Comparison 1: 4 and 3 3 4 1 2 5 Comparison 2: 2 and 5 3 4 1 2 5 Comparison 3: 1 and 2 3 4 1 2 5 Comparison 4: 3 and 1 3 4 1 2 5 Comparison 5: 3 and 2 3 4 1 2 5 Comparison 6: 3 and 5 3 4 1 2 5 Comparison 7: 4 and 5 Sorted: 1 2 3 4 5 That's awesome. And, it only uses features that have been supported by gcc for over a decade. FTW. The question that remained was: is this behavior reliable or is it based on an implementation quirk? I emailed the gcc-help list. Ian Lance Taylor replied that the behavior is undefined, but so far stable. His suggestion was to open a bug and request that this desirable (!!!) behavior be explicitly permitted. 2010-08-25T15:24:45Z

dynamic plug-in interface standards

Wed, 08/25/2010 - 08:00
Encouraging development of free plug-ins, and discouraging development of proprietary plug-ins. http://www.gnu.org/prep/standards/html_node/Dynamic-Plug_002dIn-Interfaces.html

YouTube Cookies and Gnash

Tue, 08/24/2010 - 16:00
Since bug reports are pouring in, I figured I'd answer this in just one place. YouTube uses cookies to redirect the browser to a regional video server closer to the user. This got broken in Gnash around the time of the 0.8.7 release, as Gnash used XPCOM to handle cookies. I fixed this back in March or so by using newer features of NPAPI, which requires a newer version of xulrunner. The 0.8.8 release of Gnash handles this correctly, but the user needs to delete all their YouTube cookies first and restart with a fresh one. That's all! For older browsers, you also have to block the YouTube cookies after deleting them to get this to work. read more 2010-08-24T14:08:17Z rob

Liquid War 6 0.0.9beta released

Tue, 08/24/2010 - 08:00
Hi, Quoting the NEWS file: -----8<------------------------------------------------------------- August 2010: release of 0.0.9beta. This releases fixes bugs 28030 (killing spree with small teams), 30409 (OS/X packaging), and 30354 (libpng detection). Frags are now by default calculated using a new method which gives bonuses to numerous teams when another team is dead. Network node auto-detection and listing works, however it's not activated by default. -----8<------------------------------------------------------------- From a player's point of view, this is mostly a bug-fix release. Under the hood, lots of new stuff, network nodes are able to discover each other (without the need of a centralized server, knowing one node, any of them, is enough). Download it on: http://download.savannah.gnu.org/releases/liquidwar6/0.0.9beta and http://www.ufoot.org/download/liquidwar/v6/0.0.9beta/ Enjoy, Christian.

My Perfect Data Backup Solution

Tue, 08/24/2010 - 00:00
This might already exist, and I might just be uninformed. When I look for a backup tool that will do what I need, usually I get a scoff in the form of “rsync, dummy”. My computing equipment typically consists of a laptop that I use everywhere, and sometimes even at home, a headless computer at [...]