January 2000

   XML and Object Persistence


Making Use of XML

Go beyond the hype and learn to use XML to restore COM object state in Web applications.

by Nachi Sendowski

  In some ways, the Web has set software development back light-years. Many luxuries developers take for granted when they create client/server applications aren't available when they create server-based Web applications; this makes developing Web applications a challenge.

What you need:
Visual Basic 5.0 or 6.0 (Professional or Enterprise Edition)
A data source
Internet Information Server (IIS) 4.0
Internet Explorer 5.0 (IE5), which installs version 2.0 of the Microsoft XML parser

When developing server-side applications, you quickly discover the Web's foremost limitation: lack of state. This lack of state results from the absence of a persistent connection between the client and the server. From one user request to the next, you don't know if a particular user will return to issue more requests or to resolve the results of his or her last request, because your application doesn't keep an open connection with the requesting client. Web technology is based on the Hypertext Transfer Protocol (HTTP), which makes communication between Web browsers and Web servers possible. Although connectionless protocols such as HTTP have advantages, they present a problem when you need to maintain information (or "state") about the users visiting a Web site.

Who's Doing What?
HTTP uses a simple request-response cycle: The Web browser sends a request to the Web server, and the Web server sends a response to the browser. Once the response leaves the server and reaches the client browser, the server forgets about the browser and the request. From a Web server's perspective, each HTTP request is separate and distinct, unrelated to previous requests. Further communication between the browser and the server starts from scratch. For this reason, it's difficult to know from one request to the next who the client (user) is and what the client is doing.

The Web was designed for viewing content, not for supporting task-oriented applications. It assumes connections are unreliable and should be made, used, and dropped as quickly as possible. As such, a Web application has to somehow maintain information about connection-related activities. It's not easy to hold onto information for each request, whether it's the data the user is working with or the name of the task the user is performing. It's difficult to keep the information cached on the server and match it with the correct user.

Identifying the user each time a request arrives is challenging because the connection doesn't persist. Scalability is also difficult when trying to hold onto information for a large number of users. When many users access your Web application concurrently, you can keep only a limited number of values or objects waiting around between client requests. If you use multiple servers to scale and handle the load, this is especially troublesome. You quickly learn to minimize, or even eliminate, maintaining session state for users. You'll build stateless Web applications that distribute the load across multiple servers when scalability is a concern. Each transaction can be considered separate and distinct, so a particular transaction is free to execute on any available server.

Stateless applications require stateless components. These components (or COM objects) are called from Active Server Pages (ASP) to access data, execute business logic, and, possibly, return results. Once again, for these objects to scale, instantiate them only when needed; use them quickly—usually for one method call—then dismiss them after each use. By not keeping object state between calls and not keeping any objects around between requests, the server uses its resources efficiently and handles more client requests. Traditional optimization techniques don't work on the Web. You must discard these objects as soon as possible to make room for others. It's futile to optimize access for one user. Optimize concurrent access for many users, remembering that "many" on the Web is a large number.

Writing task-oriented applications without holding onto objects and information between tasks isn't easy. However, these limitations can spur you on to invent new solutions. Some vendors offer session management solutions, and you'll find homegrown implementations as well. Some are better suited for functionality; others, for scalability. You'll have trouble finding one solution to handle both. In this article, I'll show you how to build one such solution; it handles state in a component used to perform Web-application database searches—DBSearch.

Object Contains Data and Metadata
The DBSearch object contains metadata about the database table it's accessing and searching. Simply put, metadata is data that describes other data. In this case, the metadata describes the search's structure and characteristics. The metadata includes information such as the location of the database, the names of tables and columns, and the datatypes of search fields. The object also contains the actual data values, which are retrieved from the database or entered by the user.

The data and metadata values together are known as the object's state. In client/server applications, you hold onto such an object, preserving its state from one call to the next. If you create an object to show an entry form, the object hangs around as long as necessary, to process the form values when they are submitted back to be saved. After all, why dismiss a perfectly good object that already knows everything about the form? However, in a server-based, stateless application, where you don't want to hold onto such an object for too long, you must quickly build the necessary object state each time the object is created and store any state when the object is destroyed.

The DBSearch object provides the user with a form for specifying search criteria. It collects entered criteria, performs the search, then displays the search results. In a typical life cycle for this sample object, the client initially requests a search form, and the object is instantiated and called to provide the search form. In this first step, the object doesn't require data from the user or database—only metadata, such as the columns to search on, datatypes, and labels. You can either hard-code this information in the object or, better yet, restore the information from a repository of sorts to avoid "tight coupling" between the component code and the application it's used for.

This is where XML comes in. Once you restore the metadata, the object can build and return HTML to render a form with the proper fields and labels. After the HTML page is built, the object has filled the request and can be destroyed immediately:

<%
' Sample object cycle, show search form
If Request("Action") = "SearchForm" Then
   Set Search = Server.CreateObject( _
      "VBPJ_WebComponents.DBSearch")

   Response.Write _
      Search.SearchFormHTML( _
         "TableName", _
         "SearchName", _
         "metadata.xml")
   Set Search = Nothing
End If
%>
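To make the first cycle concrete, here is a minimal sketch of how a method such as SearchFormHTML might turn field metadata into a form. It is not the DBSearch source: the function names, the NAME and LABEL attributes, and the inline XML are assumptions chosen for illustration, and the late-bound "Microsoft.XMLDOM" object stands in for the parser so the script can run by itself under Windows Script Host.

```vbscript
Option Explicit

' Hypothetical helper: escape text for use in HTML. Inside ASP,
' Server.HTMLEncode does the same job.
Function HtmlEncode(ByVal s)
    s = Replace(s, "&", "&amp;")
    s = Replace(s, "<", "&lt;")
    s = Replace(s, ">", "&gt;")
    HtmlEncode = Replace(s, """", "&quot;")
End Function

' Compose an empty search form from a list of SEARCHFIELD nodes.
' The NAME and LABEL attribute names are assumptions.
Function BuildSearchFormHTML(fieldNodes)
    Dim html, i, field, fieldName, label
    html = "<form method=""post"">" & vbCrLf & _
           "<input type=""hidden"" name=""Action"" value=""Search"">" & vbCrLf
    For i = 0 To fieldNodes.length - 1
        Set field = fieldNodes.item(i)
        ' Prefixing "" turns a missing attribute (Null) into an empty string.
        fieldName = "" & field.getAttribute("NAME")
        label = "" & field.getAttribute("LABEL")
        html = html & HtmlEncode(label) & ": <input type=""text"" name=""" & _
               HtmlEncode(fieldName) & """><br>" & vbCrLf
    Next
    BuildSearchFormHTML = html & "<input type=""submit"" value=""Search"">" & _
                          vbCrLf & "</form>"
End Function

' Demonstration with inline metadata instead of metadata.xml.
Dim doc
Set doc = CreateObject("Microsoft.XMLDOM")
doc.async = False
doc.loadXML "<SEARCH NAME='ByName'>" & _
            "<SEARCHFIELD NAME='LastName' LABEL='Last name'/>" & _
            "<SEARCHFIELD NAME='State' LABEL='State'/>" & _
            "</SEARCH>"
WScript.Echo BuildSearchFormHTML(doc.documentElement.selectNodes("SEARCHFIELD"))
```

The hidden Action field carries the next step of the cycle back to the server, so nothing about the form has to be remembered there. In an ASP page, Response.Write would take the place of WScript.Echo.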

When the form is submitted with search-criteria values, you need to create the object again to handle this new request. This time, however, the request is for performing the search. At this stage, the object doesn't know its previous state and needs to be completely restored. First, restore the metadata. You can either hard-code the metadata or retrieve it from a repository. If it's hard-coded, then restoring the metadata is as simple as instantiating the object. If the metadata is stored in a repository, you need to read and interpret that storage.

With the metadata restored, you can easily read the submitted form request; you now have the same knowledge you had when you generated the search form. The request to perform the search can include search values entered by the user and submitted with the form. You can read and validate the entries for proper datatypes and data consistency. If you find errors in the submitted values, you must return the form to the user with explanatory error text. As before, you know how to generate the form, although this time you add the error text and persist the user-entered values so you don't lose them. You make the entered data values persistent by adding them to the generated HTML page.
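The validation and redisplay step can be sketched as follows. This is an illustration under stated assumptions rather than the component's actual code: ValidateEntry and RenderField are hypothetical names, and the datatype values "number" and "date" are invented for the example.

```vbscript
Option Explicit

' Hypothetical helper: escape text for use in HTML.
Function HtmlEncode(ByVal s)
    s = Replace(s, "&", "&amp;")
    s = Replace(s, "<", "&lt;")
    s = Replace(s, ">", "&gt;")
    HtmlEncode = Replace(s, """", "&quot;")
End Function

' Return "" when the entry is acceptable, otherwise an error message.
' The datatype names "number" and "date" are assumed values.
Function ValidateEntry(ByVal value, ByVal dataType, ByVal label)
    ValidateEntry = ""
    If Len(value) = 0 Then Exit Function   ' blank means "no criterion"
    Select Case LCase(dataType)
        Case "number"
            If Not IsNumeric(value) Then ValidateEntry = label & " must be a number."
        Case "date"
            If Not IsDate(value) Then ValidateEntry = label & " must be a date."
    End Select
End Function

' Re-render one field, keeping what the user typed and appending any error.
Function RenderField(ByVal fieldName, ByVal label, ByVal value, ByVal errorText)
    Dim html
    html = HtmlEncode(label) & ": <input type=""text"" name=""" & _
           HtmlEncode(fieldName) & """ value=""" & HtmlEncode(value) & """>"
    If Len(errorText) > 0 Then
        html = html & " <b>" & HtmlEncode(errorText) & "</b>"
    End If
    RenderField = html & "<br>"
End Function

' Demonstration: in ASP the value would come from Request.Form("Qty").
Dim entered, message
entered = "twelve"
message = ValidateEntry(entered, "number", "Quantity")
WScript.Echo RenderField("Qty", "Quantity", entered, message)
```

Writing the entered value back into the value attribute is what persists the user's state: the page itself carries it to the next request, so the object holds nothing between calls.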

At this point, the object has once again outlived its usefulness and can be destroyed. Its state is already persisted in the stored or hard-coded metadata and in the values in the HTML page. If, on the other hand, the submitted values contain no errors, you can build the query, issue it against the data source, and return HTML to display the results. You can render the results as an HTML table and possibly include the search form again, above or below the result table, to enable new searches from the same page:

<%
' Sample ASP object cycle, Request a search 
If Request("Action") = "Search" Then
   Set Search = Server.CreateObject( _
      "VBPJ_WebComponents.DBSearch")

   If Search.ReadSearchRequest("metadata.xml") Then
      ' No data entry errors, show results
      Response.Write Search.ExecuteHTML()
   Else
      ' Data entry error: show the
      ' search form again
      Response.Write _
         Search.SearchFormHTML()
   End If
   Set Search = Nothing
End If
%>

 
Figure 1. DBSearch Cycles Through Life.

The idea is to instantiate an object, quickly restore all needed state, perform the work, compose output (possibly HTML), persist any dynamic state to the HTML output, let go of known static information, and let the object be recycled (see Figure 1). At any point in a given cycle, the object might read data from or write data to the database to carry out its work. It might also read data from or write data to the composed HTML to restore or persist state information. However, in all cases, the object must first read the metadata to restore its initial state and identity information (see Figure 2).

 
Figure 2. Data Flows Through DBSearch.

XML to the Rescue
The metadata repository is the core of this functionality and a key part of the scheme. To function, every instantiation of this generic stateless object requires initial information. This information consists of structured, named values, such as tables, with one or more search definitions. The search definitions have attributes, such as name, title, and possible options, and include one or more search-criteria fields, with information such as datatype, label, database column name, and instructions (see Listing 1). These values are collected at development time and used at run time. During development, authoring and maintaining this metadata needs to be easy; at run time, the metadata should load quickly and be easy to access in memory. This is a perfect use for Extensible Markup Language (XML).

XML provides a flexible and easy-to-use object model for accessing structured data programmatically. An application can easily traverse an XML document using the XML parser and object model, and have sequential or random access to any well-formed, structured information. The data kept in XML is text-based, readable, easy to maintain, potentially self-validating, and accessible with a variety of tools. It's basically a fast and flexible runtime object model with a simple authoring tool. Again, XML is the perfect tool for handling application metadata.

Microsoft provides a standard XML parser with IE 4.0 and 5.0. This parser can read any well-formed XML file and return a Document Object Model (DOM) for you to use at run time. You can retrieve data quickly from the XML file, and the object model is simple and easy to work with. Many XML editors also let you load, edit, and save your XML files. With these editors, you don't need to create your own application for authoring or maintaining XML data. You can download a free XML editor from Microsoft's XML Notepad Web site (see Links).

While developing an application that uses the generic DBSearch object, you can add tables, search definitions, search-criteria fields, and field attributes to the metadata, for the object to use in its various incarnations. You can use any text editor or a specialized XML editor to add these values as XML elements to the XML file. The object uses these table and search definitions to restore identity and structure when instantiated. In one instantiation, you might use DBSearch to search for employees by name and department, while in another instantiation, you might use the same DBSearch to search for invoice items by invoice number and product name. In the XML structure, you can find everything you need to know about the table, search, and field attributes to show a search form, build a SQL statement, or display the results.

At run time, the DBSearch object uses objects the XML parser provides to load the XML file and convert it into an XML DOM object. This object model is structured as a tree and contains nodes holding all the XML file's information. You can easily locate specific nodes, node collections, or attributes of these nodes using the object model's simple methods and properties (see Listing 2).
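As a rough illustration of that tree, the sketch below loads a small metadata document and prints an outline of its elements and attributes. The TABLE, SEARCH, and SEARCHFIELD elements and the NAME, DBTABLENAME, and FETCHSIZE attributes come from this article; the METADATA root element, the LABEL attribute, and the DescribeNode function are assumptions made for the example, which uses the late-bound "Microsoft.XMLDOM" object so it can run under Windows Script Host.

```vbscript
Option Explicit

' Return an indented outline of an element, its attributes, and its
' child elements.
Function DescribeNode(node, depth)
    Dim outline, i, attr, child
    outline = String(depth * 2, " ") & node.nodeName
    For i = 0 To node.attributes.length - 1
        Set attr = node.attributes.item(i)
        outline = outline & " " & attr.name & "=" & attr.value
    Next
    outline = outline & vbCrLf
    For i = 0 To node.childNodes.length - 1
        Set child = node.childNodes.item(i)
        If child.nodeType = 1 Then   ' 1 = element node
            outline = outline & DescribeNode(child, depth + 1)
        End If
    Next
    DescribeNode = outline
End Function

' Demonstration with inline metadata; METADATA and LABEL are assumed names.
Dim doc
Set doc = CreateObject("Microsoft.XMLDOM")
doc.async = False
doc.loadXML "<METADATA>" & _
    "<TABLE NAME='Authors' DBTABLENAME='authors'>" & _
    "<SEARCH NAME='ByName' FETCHSIZE='20'>" & _
    "<SEARCHFIELD NAME='LastName' LABEL='Last name'/>" & _
    "<SEARCHFIELD NAME='State' LABEL='State'/>" & _
    "</SEARCH></TABLE></METADATA>"
WScript.Echo DescribeNode(doc.documentElement, 0)
```

Walking childNodes this way visits every element once. The query methods shown later in the article are usually the shorter route when you already know which node you want.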

It's easy to find and read XML elements with the new version of the Microsoft XML parser and DOM (version 2.0, which shipped with IE5). First, create the XML document object, MSXML.DOMDocument, then load the file:


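Set m_XMLDoc = New MSXML.DOMDocument
' Load synchronously so that Load returns
' only after parsing has finished
m_XMLDoc.async = False
If Not m_XMLDoc.Load("metadata.xml") Then
   With m_XMLDoc.parseError
      If .errorCode <> 0 Then
         Err.Raise 1, , "Parsing Error: " & .reason
      Else
         Err.Raise 1, , "Error Occurred.."
      End If
   End With
End If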
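When a load fails, the parseError object can say where and why. The following sketch, with a hypothetical DescribeLoadProblem function and a deliberately malformed XML string, shows one way to build a more useful message from its errorCode, line, linepos, and reason properties. It uses loadXML on a string and the late-bound "Microsoft.XMLDOM" object so that it runs without a file on disk.

```vbscript
Option Explicit

' Load XML text and return "" on success or a description of the problem.
Function DescribeLoadProblem(doc, xmlText)
    DescribeLoadProblem = ""
    doc.async = False
    If Not doc.loadXML(xmlText) Then
        With doc.parseError
            DescribeLoadProblem = "XML error " & .errorCode & " at line " & _
                .line & ", position " & .linepos & ": " & .reason
        End With
    End If
End Function

Dim doc
Set doc = CreateObject("Microsoft.XMLDOM")
' The closing tag is deliberately misspelled.
WScript.Echo DescribeLoadProblem(doc, "<TABLE NAME='Authors'></TABEL>")
```

Because the metadata file is edited by hand during development, a message that names the line and position tends to save time when an element is left unclosed.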

Next, find the element you want to work with. You can start from the root node and process the entire tree, or you can search for a specific node and start there. Searching for a specific node in the XML 2.0 DOM is a breeze when using the new Extensible Stylesheet Language (XSL) querying capabilities (see the sidebar, "What's New in XML 2.0"). In the case of the DBSearch object, you need table information, search information, and information about fields related to the search:

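' Start at the root element (assumed here
' to be the document's top-level element)
Set m_RootElement = m_XMLDoc.documentElement
' Find the TABLE Element by name 
' (case-insensitive), always get the 
' first one found
Set m_TableElement = _
   m_RootElement.selectSingleNode( _
   "TABLE[@NAME $ieq$ '" & TableName & _
   "'][0]")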
' Find the SEARCH Element by 
' name (case-insensitive)
Set m_SearchElement = _
   m_TableElement.selectSingleNode( _
   "SEARCH[@NAME $ieq$ '" & SearchName & _
   "'][0]")
' Get the collection of all fields 
' under this Search (as 
' MSXML.IXMLDOMNodeList)
Set m_FieldNodes = _
   m_SearchElement.selectNodes( _
   "SEARCHFIELD")

Once you have references to all the elements, you can retrieve values for the attributes required to build and run the search, such as the database table name, column names, labels, and titles:

m_DBTableName = _
   m_TableElement.getAttribute( _
   "DBTABLENAME")
m_FetchSize = _
   m_SearchElement.getAttribute( _
   "FETCHSIZE")

You now have the information necessary to perform either DBSearch object function. The metadata allows DBSearch to author a search form in HTML, which enables the user to specify search criteria. DBSearch can also use the metadata with the user-submitted search criteria to build a SQL statement on the fly to retrieve the desired data.
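A minimal sketch of the second function, building SQL from metadata plus submitted values, might look like this. It is an assumption-laden illustration, not the DBSearch implementation: BuildSelectSQL and SqlQuote are hypothetical names, DBCOLUMN and DATATYPE are assumed attribute names, and the submitted values are passed in a Scripting.Dictionary where an ASP page would read Request.Form. Where the data provider supports them, parameterized ADO commands are a safer choice than concatenated SQL.

```vbscript
Option Explicit

' Double any single quotes so a value can sit inside a SQL string literal.
Function SqlQuote(ByVal value)
    SqlQuote = "'" & Replace(value, "'", "''") & "'"
End Function

' Build a SELECT from field metadata plus the submitted values.
' DBCOLUMN and DATATYPE are assumed attribute names; values is a
' Scripting.Dictionary keyed by the field's NAME attribute.
Function BuildSelectSQL(dbTableName, fieldNodes, values)
    Dim sql, whereClause, i, field, fieldName, column, entered, condition
    whereClause = ""
    For i = 0 To fieldNodes.length - 1
        Set field = fieldNodes.item(i)
        fieldName = "" & field.getAttribute("NAME")
        column = "" & field.getAttribute("DBCOLUMN")
        entered = ""
        If values.Exists(fieldName) Then entered = Trim(values.Item(fieldName))
        If Len(entered) > 0 And Len(column) > 0 Then
            condition = ""
            If LCase("" & field.getAttribute("DATATYPE")) = "number" Then
                ' Whole numbers only in this sketch; other input is ignored.
                ' CLng raises an error for values outside the Long range.
                If IsNumeric(entered) Then condition = column & " = " & CLng(entered)
            Else
                ' Text fields match on the leading characters.
                condition = column & " LIKE " & SqlQuote(entered & "%")
            End If
            If Len(condition) > 0 Then
                If Len(whereClause) > 0 Then whereClause = whereClause & " AND "
                whereClause = whereClause & condition
            End If
        End If
    Next
    sql = "SELECT * FROM " & dbTableName
    If Len(whereClause) > 0 Then sql = sql & " WHERE " & whereClause
    BuildSelectSQL = sql
End Function

' Demonstration against two columns of the Pubs authors table.
Dim doc, values
Set doc = CreateObject("Microsoft.XMLDOM")
doc.async = False
doc.loadXML "<SEARCH NAME='ByName'>" & _
    "<SEARCHFIELD NAME='LastName' DBCOLUMN='au_lname' DATATYPE='text'/>" & _
    "<SEARCHFIELD NAME='Contract' DBCOLUMN='contract' DATATYPE='number'/>" & _
    "</SEARCH>"
Set values = CreateObject("Scripting.Dictionary")
values.Add "LastName", "O'Leary"
values.Add "Contract", "1"
WScript.Echo BuildSelectSQL("authors", _
    doc.documentElement.selectNodes("SEARCHFIELD"), values)
```

Only fields with a non-blank entry contribute a condition, so a form submitted with every field empty produces a plain SELECT with no WHERE clause.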

You can write other stateless COM components for Web applications by keeping data in XML and restoring it for each object instantiation. These stateless objects can support task-oriented applications that would otherwise require keeping the object around between tasks. They can handle each request as though it were independent and from a different user. Because every transaction can be considered separate and distinct, a particular transaction is free to execute on any available server, allowing additional computing resources to be freely added as the demand for them increases.

The free online code implements the DBSearch object used in this article. The sample code also includes a simple Web application that uses the DBSearch object to search for records in the famous Pubs database. The VB source code for the DBSearch object is provided, as well as an ASP page and the XML metadata file for accessing a couple of tables in the Pubs database. You can add more Pubs information to the XML metadata with ease or, better yet, modify it to search in your own database. By keeping data in XML and restoring it for each object instantiation, you can create any task-oriented object required for building stateless Web apps.


Nachi Sendowski lives in the San Francisco Bay Area and is a principal partner in The Enticy Group, a consulting LLC. He's also the director of software engineering for Healinx Corp., an Internet startup. Nachi is responsible for the architecture, design, and development of software frameworks that provide the tools to efficiently build scalable, multitier Web apps. Reach Nachi at nachi@enticy.com or nachi@healinx.com.

 
Links
The Enticy Group

Healinx Corp.

XML Editor download
