2011 Annual Report

SDCI Net Improvement: Web10Gig - Taking TCP Instrumentation to the Next Level
Year 1 Annual Report
August 2011

Executive Summary

Significant progress has been made on the Web10G project during the first project year. A new website, Web10G.org, has replaced the Web100.org website from the original project, providing a new focus for information, software distribution and forum discussion. The Web10G kernel instrument set patches have been implemented, tested and distributed. Progress has been made on the full kernel module, although its Alpha release has been pushed back to early in the project’s second year. The application binary interface has been evaluated, and the plan for it has been revised from what was stated in the original proposal. Sample library tools based on the Web10G kernel were also developed and released this year. Plans for Year 2 of the project include the Alpha release of the kernel module, along with rigorous testing and patches as bugs are identified and fixed. The ABI and library tools will be refined and released based on the kernel developments. Outreach activities will pick up significantly, including presentations at appropriate community venues, such as the Internet2 Member Meeting and Joint Tech Workshops. In addition, we will hold the first User Group meeting and make changes to the Web10G suite based on input from the meeting.

Personnel

The Web10G project is a collaboration between PSC and NCSA with additional unfunded support from Matt Mathis (Google) and John Heffner (NetAps). Wendy Huntoon (PSC) and Janet Brown (PSC) have provided project management and general project oversight. Andy Adams (PSC) is responsible for kernel-side development, specifically the kernel instrumentation set (KIS) and application binary interface (ABI). Chris Rapier (PSC) is providing support for software management, public access and presentations, and kernel-side and user-land development and testing. John Estabrook (NCSA) is responsible for the user-land library and user tool development.

Matt Mathis and John Heffner, members of the original Web100 project who have since left PSC and joined other companies, provide additional support. As a Google employee, Mr. Mathis is allowed to contribute 20% of his time to an external project and has selected Web10G as that project. John Heffner continues to be involved in the project as well, in an unpaid advisory role. Specifically, he provides advice on and reviews various components of the Web10G software suite, including the kernel instruments and application binary code.

During the start up phase of the project, PSC’s documentation group provided support and advice on website design, development and implementation. They helped review the initial set of documents released on the website and provided a valuable sounding board for the overall site appearance and organization.

PSC applied for and received Research Experiences for Undergraduates (REU) funding for Web10G. Since the funding came mid-summer, we decided to wait until the beginning of the fall semester to advertise for a student. The student will work with the Web10G staff on building, testing and documenting Web10G under multiple distributions of the Linux kernel.

Year 1 Activities

Start Up – Web10G Website

In addition to assembling the personnel required to support the program, as described above, a major start up activity for the Web10G project was migrating to a new website. The original Web100 website (www.web100.org) was still a significant resource within the community for original Web100 code as well as updates. The new Web10G web site was officially launched in April 2011 to coincide with a presentation on Web10G at the Spring Internet2 Members Meeting. The new web site (see www.web10G.org) was streamlined, with specific sections for News, Developers and Software, as well as a member login area. The Web100 web site has been archived, continuing to provide a repository for the original Web100 software, with an announcement and link pointing to Web10G for new updates and information. A new feature of the Web10G website is a developers forum, which was designed as a mechanism for Web10G users and developers to interact and as a replacement for the Web100 discussion mailing list. To date, the forum has been unsuccessful. As discussed below under Year 2 plans, we will re-evaluate this website feature, most likely replacing it with an email list.

Development

Two of the primary development goals during the initial phase of the Web10G project were to (i) update the Web100 kernel instrumentation set (KIS) within the most recent Linux kernels with RFC 4898's extended TCP statistics, and (ii) identify the new application binary interface (ABI) that Web10G would provide for access to the newly instrumented statistics within the kernel.

Updating the KIS has been straightforward and was implemented early in the initial phase of the project. Eventually the Web10G KIS will be released as a kernel module, which will remove the need to patch the kernel. Until that time, however, the Web10G code must be kept up to date with the current Linux kernel release. Given the rapid pace of Linux kernel development, multiple releases must be made each year. Since the initial release of the kernel patch on Web10G.org, the team has ported the patch through three iterations, for Linux versions 2.6.38, 2.6.39 and, most recently, 3.0. The team expects that upwards of 5 ports will be required each year.

The second goal, the new ABI, proved to be far more difficult. Research based on information gathered from the Linux "netdev" mailing list identified NetLink as the choice for the base kernel service in the new ABI design. Moreover, in keeping with the SNMP characteristics of the RFC 4898 MIB, the new NetLink ABI would emulate canonical SNMP polling, i.e., each request is answered with the next TCP extended statistics entry in an internal list. Finally, the interim solution proposed for Phase 1 of the proposal, restructuring the "/proc ABI" as a dynamically loaded kernel module (DLKM), was abandoned because of additional information garnered from the Linux netdev mailing list: it was postulated that the legacy proc ABI was in the process of being deprecated within the Linux kernel.
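The polling pattern described above can be illustrated with a small user-space model. This is only a sketch of the "return the next entry" idea, not the Web10G NetLink ABI itself: the StatsTable class, the get_next and walk functions, the connection ids and the counter values are all hypothetical and exist only for this example.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Tuple

Entry = Tuple[int, Dict[str, int]]


class StatsTable:
    """Hypothetical stand-in for the kernel's internal list of
    per-connection extended-statistics entries, keyed by connection id."""

    def __init__(self, entries: Dict[int, Dict[str, int]]) -> None:
        self._entries = dict(entries)
        self._cids = sorted(self._entries)

    def get_next(self, last_cid: Optional[int]) -> Optional[Entry]:
        """Return the first entry whose id is greater than last_cid.

        Passing None starts from the beginning of the list; a return
        value of None signals that the end of the list was reached.
        """
        idx = 0 if last_cid is None else bisect_right(self._cids, last_cid)
        if idx >= len(self._cids):
            return None
        cid = self._cids[idx]
        return cid, self._entries[cid]


def walk(table: StatsTable) -> List[Entry]:
    """Poll the table one entry at a time, the way an SNMP manager
    walks a MIB table with repeated get-next requests."""
    results: List[Entry] = []
    cursor: Optional[int] = None
    while True:
        reply = table.get_next(cursor)
        if reply is None:
            break
        cursor, stats = reply
        results.append((cursor, stats))
    return results


# Made-up entries for illustration only.
table = StatsTable({7: {"SegsOut": 120}, 3: {"SegsOut": 45}})
for cid, stats in walk(table):
    print(cid, stats)
```

Because the caller only remembers the last id it received, the walk still terminates cleanly if connections appear or disappear between requests, which is one reason this style of polling suits a table whose contents change continuously.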

In the second phase of the Web10G proposal, the focus was on developing the NetLink ABI designed during the first phase. This was realized near the end of the first year as an external DLKM. The DLKM was submitted to the Web10G developers for thorough review and testing. Although the second phase of the proposal sought to publish the Web10G "alpha release" at the end of the first year, the developers thought it prudent to continue testing and refining the new ABI and KIS before publishing it. Thus, we expect the Web10G alpha release to be published early in the second year of the project. We also expect to present our ABI for formal review within the Linux development community during the second year.

In the initial phase of the Web10G project, a library API and sample tools were produced for the Web10G kernel. This library API was designed to (1) provide access to the updated kernel instrument set, (2) be compatible with the legacy Web100 kernel patch, for those users transitioning from the previous collection of tools, and (3) allow verification of the newly implemented kernel instruments. It was released on the Web10G web site on May 20th of this year.

These tools are also an area of specific interest for Web10G. As we have seen, one of the primary uses of Web100 in the wider world has been to support network analysis and metric tools. These include applications like NPAD, NDT, and Google's MLAB. As Web10G is developed, the goal is to port these tools to the new platform as well, with the introduction of a simplified library helping the transition. We have already done an initial investigation into porting NPAD and NDT. However, the actual work has been placed on hold until the new library is finalized.

This past year, the Web10G team has focused on the validation of various metrics reported by the ESTATS kernel modifications. The end goal is the development of a rigorous test harness for Web10G, which is planned for a later phase of the project. Currently, the team is validating the results by comparing the Web10G metrics of a given flow to the results returned by a TCPTrace analysis of that same flow. Where the returned metrics overlap, there is a significant level of agreement. Until the test harness is completed, the team will continue to use this methodology to validate Web10G metrics. This test method identified several bugs in the Web10G code, which have either been resolved or will be resolved during Year 2 of the project.
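As a rough illustration of this comparison, the sketch below checks the metrics that two sources report in common for one flow. It is not the team's validation code: the compare_metrics function, the 1% tolerance, the metric names and all of the values are assumptions made up for the example, and parsing of real kernel or TCPTrace output is left out.

```python
from typing import Dict, Tuple


def compare_metrics(
    kernel: Dict[str, float],
    trace: Dict[str, float],
    rel_tol: float = 0.01,  # assumed tolerance, chosen only for illustration
) -> Dict[str, Tuple[float, float, bool]]:
    """Compare only the metrics that both sources report.

    Returns, per shared metric name, the kernel value, the trace value,
    and whether they agree within the relative tolerance.
    """
    report: Dict[str, Tuple[float, float, bool]] = {}
    for name in sorted(kernel.keys() & trace.keys()):
        k, t = kernel[name], trace[name]
        scale = max(abs(k), abs(t))
        agree = scale == 0 or abs(k - t) <= rel_tol * scale
        report[name] = (k, t, agree)
    return report


# Made-up values for a single flow; names are illustrative placeholders.
kernel_metrics = {"SegsOut": 1000, "OctetsRetrans": 2896, "CurCwnd": 14480}
trace_metrics = {"SegsOut": 1000, "OctetsRetrans": 3000}

for name, (k, t, ok) in compare_metrics(kernel_metrics, trace_metrics).items():
    print(f"{name}: kernel={k} trace={t} {'agree' if ok else 'DISAGREE'}")
```

Metrics that only one source reports, such as the congestion window in this example, are skipped rather than flagged, since a packet trace cannot observe every value the kernel tracks.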

Dissemination

Web10G was introduced to the community at the 2011 Spring Internet2 Members Meeting during a well-attended BoF session. This presentation outlined the evolutionary transition from Web100 to Web10G and discussed the need for RFC 4898 compliance and the importance of industry acceptance. Much of this acceptance will depend on the successful development of a more robust ABI than the current /proc method. This presentation stimulated a fruitful discussion about the application of Web10G in the commercial world.

In addition, the Web10G team gave presentations to groups associated with 3ROX, the regional aggregation network associated with PSC. In February 2011, Web10G was included in a presentation given at the Penn State Cyber Infrastructure Day. While Web10G was not the focus of the presentation, it was included among the performance tools available to campus IT professionals and researchers to better understand network performance. A presentation on Web10G was also given during the 2011 Spring 3ROX GigaPop meeting. In this presentation we again outlined the evolution of and need for Web10G. The presentation focused on the role of Web10G in the development of more accurate network performance analysis tools and encouraged the use of the Web10G suite by campus IT professionals.

Planned Year 2 Activities

In the proposal, Year 2 of the project was divided into two phases – Project Maturation (13-18 months) and Project Refinement (19 – 24 months). Based on the project progress to date, we have continued to divide the year into two phases, but have revised each phase to better fit the development path and needs of the project. Below are the revised plans for the project.

Phase 3, Project Maturation (13-18 months)

The Alpha release of Web10G will be published early in Phase 3 of the project. While significant testing has already been done on the code prior to release, we still plan to actively collect bug reports and user input. In addition, the library API will be revised for the new kernel ABI. Existing tools and code samples will be updated accordingly. Once the student intern has been hired, we will begin testing the Web10G code under different distributions of the Linux kernel. Based on the results, kernel patches will be generated for the appropriate distributions and then supported by the project.

We will host our first Web10Gig User Group meeting and will review and start implementing the suggestions that result from it. In addition, we will update the Web100 robustness suites, and design and start implementing instrument correctness suites. We will bring the kernel quality up to production grade. On or before month 18 we expect to make our first submission to the kernel main line and will actively solicit input about it.

Phase 4, Project Refinement (19 - 24 months)

During this phase our number one priority will be actively grooming the code for main line inclusion on the basis of feedback from the Linux kernel community. We will also help existing Web100 users migrate to the new software and will phase out existing Web100 tools and support. We will develop educational software based on the new TCP instruments and move all remaining tools and documentation to the public source repository.