Contents
Presented at Documation East 1997 by
John Tigue
Senior Software Architect
DataChannel
http://xml.datachannel.com
Moderated by
Robin A. Tomlin
Executive Director
SGML Open
Describe how XML enables Web-native distributed computing
Context
Outline XML based mechanisms for Web-native distributed computing
- assume an environment with only HTTP and XML
- illustrate how distributed computing is possible in this context
- discuss other bootstrap facilities beyond a transport and a syntax
The subject is Web distributed computing not publishing
Definitions
The Web is distinct from the Internet
- publishing is essentially one way communication, distributed computing is two way
- distributed computing implies XML for consumption by software not by humans
- the topic is software which has access only to HTTP on port 80
- the IP Internet in general is not considered e.g. FTP on port 21 is not in scope
- firewalls impose this lowest common denominator constraint
Part 2. Status of the Web in Terms of Distributed Computing
- WebComputing
- for lack of a better name, this is the label for the concepts described in this document
- DC
- an abbreviation of "distributed computing"
- URL
- only HTTP URLs in this discussion
nothing like "ftp://ftp.x.com"
- URL query term
- that which comes after the "?" in an URL
e.g. "http://www.acme.com/dummy?blue=yes"
"blue=yes" is the query term
- URL segment
- e.g. "http://www.acme.com/software/java/index.html"
"software" and "java" are segments
- httpd
- used as shorthand for any generic Web server
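The definitions of "URL query term" and "URL segment" above can be illustrated with a minimal Python sketch. The function name `describe_url` is hypothetical; it rejects anything that is not an HTTP URL, in line with the scope stated above, and it treats every slash-separated piece of the path as a segment, with the final piece naming the document itself.

```python
from urllib.parse import urlsplit, parse_qs

def describe_url(url):
    """Split an HTTP URL into path segments and its query term."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        raise ValueError("only HTTP URLs are in scope here")
    # Segments are the slash-separated pieces of the path; the last
    # piece names the document, the earlier ones enclose it.
    segments = [piece for piece in parts.path.split("/") if piece]
    # The query term is everything after the "?".
    return segments, parts.query, parse_qs(parts.query)

print(describe_url("http://www.acme.com/dummy?blue=yes"))
print(describe_url("http://www.acme.com/software/java/index.html"))
```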
Essentially just HTML and HTTP
Current generation DCOM and CORBA are not native to the Web
No search standards
- forms and URL query terms are the status quo for client to server communications
- CGI is a nondescript and unstructured interface
- firewalls are limiting other current DC options
Nothing like an ORB on the Web
- search engines are not usable as a computing service
- e.g. the old MetaCrawler was heading in the right direction
No native architecture
- hard-wired addresses and names are all there is; cannot request objects by type
- CORBAServices have no analog on the Web
- an interfacing framework, standard services, and other APIs are lacking
They date from before the age of global computing
Need Web native distributed computing
IIOP is an Internet thing not a Web thing
- isolated networks were relatively closed and fixed
- the Internet and Web are open and fluid without centralized global control (except addressing)
- the Web is a different environment for which they were not designed
- both of these technologies will continue to evolve but they are not where they need to be yet
HTTP tunneling is a hack
- not part of the HTTP only environment
- e.g. no global services lookup available to Web-only software
- simply using HTTP as another transport not focused on the Web
- even though CORBA and DCOM can sometimes go through firewalls (e.g. Wonderwall) this way, a Web client cannot globally look up objects through a tunnel, i.e. tunneling is not native to the Web
Don't currently have loosely structured architecture to reflect the nature of the Web
XML enabled distributed computing is nascent
Don't have document centric services
- need ability to reason about objects and services on the Web
- e.g. "where in the world can CD 7-64571-2 be purchased cheaply"
- e.g. "where on the local intranet is the office supplies database"
- the Web is simply a transport for inter-linked documents
- query-able services must be accessible as pages which can be located by type
Few applications of XML related to distributed computing
Part 3. WebComputing Basics
Hot spots CDF & RDF are not distributed computing
- WIDL by webMethods is the best thing yet
- their "services" are analogous to RPC stubs
- currently shipping
- MS’s XML-Data was another start
- not shipping
- CDF: no two way communication except logging
- although RDF will probably be useful to DC
Text is the Web's intrinsic data type
XML based protocols: HTTP POST exchange of XML documents
Well-formedness allows Web structures of arbitrary complexity
- textual nature of XML continues the evolution of the Web along its historical lines of SGML and HTML
- e.g. <img height=123... is really <img height="123"... i.e. the 123 is just a string, which gets interpreted as a number
- verbosity not an issue; with a priori knowledge of the document structure compression can be even better
XML gives type to Web objects
- a document's element tree can be deep
- great inter-operability and versioning
- parsing possible without total knowledge of the structure
DTDs are the header files for the Web’s APIs
- "<!DOCTYPE x…" means different types of object on the Web
- currently little typing; just HTML and some SGML documents
- when SGML is dumbed down to HTML, typing is lost
XML processors are the Web’s data marshallers
- XML based WebComputing is like CORBA in that the DTDs only describe interfaces not implementations
- can search for the string "<!DOCTYPE something…" to find objects by type
- this means software components can find each other by interface name
- linking is inherent to the technology so they're like Web browsers but for software not humans
- e.g. DataChannel's PaxSyntactica is a JavaScript-accessible, HTTP-loading XML processor
- high performance built into the spec
HTTP POST method is the transport for XML based protocols
How other DCs will reflect XML
DTDs are contracts which formalize the interfaces
- POST embeds an arbitrary entity in the HTTP request
- currently used for passing HTML form data to server
- can use this to marshal data, in XML documents, to the server and back
- one document passed from client to server and another is returned
- the protocols will be expressed as DTDs
- validation can be used to decide if responsible/capable of servicing a particular request
- enables loose coupling of client and server
- the nature of XML implies that older servers may understand only part of a request (because they use an earlier version of the DTD) but still be able to honor it
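The POST exchange described above (one XML document goes from client to server and another comes back) can be sketched in Python as follows. This is an illustration under stated assumptions, not a defined protocol: the document types `PriceRequest` and `PriceReply`, the `Item` element, the `/service` path, and the function name `exchange` are all hypothetical. The server checks only the root element name instead of validating against a DTD, and it reads just the elements it knows, which is how an older server can still honor a request that carries newer, unknown elements.

```python
import threading
import urllib.request
import xml.etree.ElementTree as ET
from http.server import BaseHTTPRequestHandler, HTTPServer

class XmlPostHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        request_doc = ET.fromstring(self.rfile.read(length))
        # Crude type check: service only the document type we know.
        if request_doc.tag == "PriceRequest":
            reply, status = ET.Element("PriceReply"), 200
            # Read the known element; unknown siblings are ignored.
            reply.set("item", request_doc.findtext("Item", default=""))
        else:
            reply, status = ET.Element("Unsupported"), 400
        body = ET.tostring(reply, encoding="utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/xml")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the demonstration quiet

def exchange(xml_text):
    """Start a throwaway server, POST one document, return the reply."""
    server = HTTPServer(("127.0.0.1", 0), XmlPostHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = "http://127.0.0.1:%d/service" % server.server_address[1]
        request = urllib.request.Request(
            url, data=xml_text.encode("utf-8"),
            headers={"Content-Type": "text/xml"}, method="POST")
        # Bypass any configured proxy for this loopback call.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(request) as response:
            return ET.fromstring(response.read())
    finally:
        server.shutdown()
        server.server_close()

reply = exchange("<PriceRequest><Item>CD 7-64571-2</Item>"
                 "<AddedInLaterVersion/></PriceRequest>")
print(reply.tag, reply.get("item"))
```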
DCOM and CORBA will interface to the Web via XML
Part 4. HTTP Issues
Microsoft is in position for WebComputing
- WebComputing does not replace them; it is an evolution
- they and their vendors will still do things essentially the way they do now
- an XML "wire protocol" is simply native to the Web
CORBA will continue its evolution
- they are the most active XML developer e.g. XML-Data
- COM+ continues the evolution of OLE
- "MS Windows Distributed interNet Applications (Windows DNA)…enables developers to integrate Web-based and client/server applications in a single, unified architecture."
- Windows and the Web: Ushering in the Era of "Web Computing"
- CORBA will probably soon add XML-based Web-IOP; this will basically be a standardized HTTP tunneling vocabulary (IIOP, the internet enabling of CORBA, was not added until 2.0)
- with some basic tags, DTDs could be generated automatically from IDL
- IDL will still be used but the marshaling will be into two separate XML docs
- will need more than just that, e.g. the interface repository must be exposed as XML documents; then search engines will be able to catalog the CORBA features of a site and globally identify it as participating in certain distributed computing activities
Having a DC callback mechanism implies need for an httpd on client
Migrating to XML technology
HTTP client and server must be integrated
- Web natively, this involves URLs and HTTP
- the caller-back does an HTTP POST to the original client which gave it a callback URL
- being able to hear the incoming callback requires the client to have an httpd
- the received callback may very well be handed off to an ORB or DCOM but it travels over the Web in an HTTP POST
Notes
- consider: a Java applet has initialized in a client browser and started up a conversation with a server
- the applet now wishes to set up a callback with the server
- it locally registers a callback-able interface with its, say, AppletContext
- in so doing, the AppletContext's associated httpd creates a callback URL and gives the string to the applet
- the applet passes the callback URL string to the server
- later the callback happens; an XML document is POSTed from the server to the callback URL on the client
- the AppletContext and associated httpd know the document is for the applet
- the AppletContext calls back to the interface the applet registered and passes it the POSTed document
- so the applet needs to be able to communicate (not via HTTP) with its local httpd i.e. they are integrated
- could make it an applet subclass which implements Servlet
- a similar argument could be made with an ActiveX component
- having an HTTP server on the client enables push
- it can be written in as little as 1500 lines of code e.g. http://www.acme.com/java/software/Acme.Serve.Serve.html
- In the Jigsaw community, the local http server is used as a personal proxy server
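The callback steps in the notes above can be sketched in Python instead of a Java applet. This is a minimal illustration under assumptions: the class `CallbackHttpd`, the `/callback/` path prefix, and the helper `post_callback` (standing in for the remote server) are hypothetical names. The client-side httpd creates a callback URL, the URL string is handed to the other party, and a later POST to that URL is routed to the locally registered interface.

```python
import threading
import urllib.request
import uuid
from http.server import BaseHTTPRequestHandler, HTTPServer

class CallbackHttpd:
    """Client-side httpd that issues callback URLs and dispatches POSTs."""

    def __init__(self):
        self.callbacks = {}
        registry = self

        class Handler(BaseHTTPRequestHandler):
            def do_POST(self):
                length = int(self.headers.get("Content-Length", 0))
                document = self.rfile.read(length).decode("utf-8")
                callback = registry.callbacks.get(self.path)
                if callback is None:
                    self.send_response(404)
                else:
                    callback(document)  # hand the POSTed document over
                    self.send_response(204)
                self.end_headers()

            def log_message(self, fmt, *args):
                pass  # keep the demonstration quiet

        self.server = HTTPServer(("127.0.0.1", 0), Handler)
        threading.Thread(target=self.server.serve_forever,
                         daemon=True).start()

    def register(self, callback):
        """Register a callable and return the callback URL for it."""
        path = "/callback/" + uuid.uuid4().hex
        self.callbacks[path] = callback
        return "http://127.0.0.1:%d%s" % (self.server.server_address[1], path)

    def close(self):
        self.server.shutdown()
        self.server.server_close()

def post_callback(url, xml_text):
    """What the remote server does later: POST a document to the URL."""
    request = urllib.request.Request(
        url, data=xml_text.encode("utf-8"),
        headers={"Content-Type": "text/xml"}, method="POST")
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request) as response:
        return response.status

received = []
httpd = CallbackHttpd()
callback_url = httpd.register(received.append)
status = post_callback(callback_url, "<Event>done</Event>")
httpd.close()
print(status, received)
```

In this sketch the registered callable and the httpd share one process, which mirrors the point that the applet and its local httpd must be integrated.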
Constrained subsets of HTML can be well-formed XML
Part 5. Firewall Issues
Web servers can simultaneously service XML and HTML clients
- the DTD can have the same elements as HTML but with extra attributes and tighter content models
- this way older browsers can show info to end user
- firewalls would be happier
- the resulting documents will be well-formed and parsable by the XML processors
- the Accept field in the HTTP request header enables transitioning
- this field is how to say which media format is preferred
- Accept E.g.:
- "Accept: text/WebContainers, text/html; q=0.6, text/plain; q=0.2"
- Means "would like WebContainers but HTML would work and if you must, give back plain text"
- older clients simply will not request the new protocols
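How a server might act on the Accept field can be sketched in Python. The function names `parse_accept` and `choose` are hypothetical, and the parsing is deliberately simplified: it handles only the `q` parameter and ignores wildcards such as `*/*`.

```python
def parse_accept(header):
    """Return (media_type, q) pairs, most preferred first."""
    preferences = []
    for item in header.split(","):
        fields = [field.strip() for field in item.split(";")]
        media_type, q = fields[0], 1.0  # q defaults to 1 when absent
        for parameter in fields[1:]:
            name, _, value = parameter.partition("=")
            if name.strip() == "q":
                q = float(value)
        preferences.append((media_type, q))
    # sorted() is stable, so equal q values keep their header order.
    return sorted(preferences, key=lambda pair: pair[1], reverse=True)

def choose(header, available):
    """Pick the most preferred media type the server can produce."""
    for media_type, q in parse_accept(header):
        if q > 0 and media_type in available:
            return media_type
    return None

HEADER = "text/WebContainers, text/html; q=0.6, text/plain; q=0.2"
print(parse_accept(HEADER))
print(choose(HEADER, {"text/html", "text/plain"}))
```

An older server that only produces HTML would answer the request above with `text/html`, while a newer one that also offers `text/WebContainers` would return that format.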
WebComputing will require the cooperation of the firewalls
Current firewalls:
XML
- allow packets from point A to point B
- will never allow arbitrary, opaque packets of octets through
- are sometimes set up by IS departments to only allow HTTP on port 80
- don't always allow HTTP requests into the network
XML enables firewalls to be controllably opened
Part 6. WebComputing Mechanisms
Considerations
- the textual, typed, and tagged nature of XML makes document content transparent to firewall
- e.g. email and attachments can be filtered by firewall
- firewall could now selectively say "document type X allowed from A to B"
- Web protocols based on XML's unified syntax means firewalls will be able to adapt quickly to new protocols
- processing an XML document is extra work for the firewall
- first example will probably be a standardization of a CORBA HTTP tunnelling vocabulary
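The rule "document type X allowed from A to B" can be sketched in Python with the standard expat parser, which reports the name in the DOCTYPE declaration without fetching the external DTD. The rule table `ALLOWED`, the document type `PurchaseOrder`, and the function names are hypothetical; a real firewall would need far more than this check.

```python
import xml.parsers.expat

def doctype_name(xml_text):
    """Return the name declared in <!DOCTYPE ...>, or None if absent."""
    names = []
    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = (
        lambda name, system_id, public_id, has_internal_subset:
        names.append(name))
    parser.Parse(xml_text, True)
    return names[0] if names else None

# Hypothetical policy: (document type, source, destination) triples.
ALLOWED = {("PurchaseOrder", "A", "B")}

def firewall_allows(xml_text, source, destination, rules=ALLOWED):
    try:
        name = doctype_name(xml_text)
    except xml.parsers.expat.ExpatError:
        return False  # not well-formed, so not transparent: reject
    return (name, source, destination) in rules

ORDER = ('<?xml version="1.0"?>\n'
         '<!DOCTYPE PurchaseOrder SYSTEM "po.dtd">\n'
         '<PurchaseOrder><Item>pens</Item></PurchaseOrder>')
print(firewall_allows(ORDER, "A", "B"))
print(firewall_allows(ORDER, "B", "A"))
```

Parsing the whole document is the extra work for the firewall noted above; opaque or malformed payloads are simply rejected.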
Web client code needs to orient itself within the local network
WebContainers: what's in an URL
Client code can use localhost to bootstrap to the rest of the local network
- without knowing the local network addresses, the first place to start is the local machine
- the loopback interface, the class A network ID 127, is specially reserved for host-local communication; IP address 127.0.0.1 a.k.a. localhost
- Java, JavaScript, and ActiveX can all get to localhost
- client can loop back to the httpd on localhost
- some standardized subtree will be a rigidly defined graph of XML files; think NT Registry
- e.g. <IntranetsSearchEngine>http://indexor.thiscorp.com/</IntranetsSearchEngine>
- IS can vendor-neutrally publish a map of the intranet for purposes of client navigation
- note: this does not require the integrated HTTP client and server discussed in the section on callbacks
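Reading the published map can be sketched in Python. The `IntranetsSearchEngine` element comes from the example above; the enclosing `IntranetMap` element and the function name `lookup` are assumptions, since the outline does not define the shape of the standardized subtree. The sketch parses a document that client code would have fetched from the httpd on localhost; the fetch itself is omitted.

```python
import xml.etree.ElementTree as ET

# Hypothetical map document, as it might be served by the local httpd.
SAMPLE_MAP = """<IntranetMap>
  <IntranetsSearchEngine>http://indexor.thiscorp.com/</IntranetsSearchEngine>
</IntranetMap>"""

def lookup(map_xml, service):
    """Return the URL published for a named service, or None."""
    root = ET.fromstring(map_xml)
    return root.findtext(service)

print(lookup(SAMPLE_MAP, "IntranetsSearchEngine"))
```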
How to map the contents of an URL
WIP: Web search as distributed computing mechanism
The first step is simply a standardized return of a directory listing
- generic namespace signposts and DC meta-data; "What’s in that box and what can it do"
- a webContainer document describes what’s below or contained within an URL
- the URL segments are the containers
- e.g. "http://xml.datachannel.com/XMLTreeViewer/deploy/index.html"
- deploy and XMLTreeViewer are containers; XMLTreeViewer contains deploy
- kind of like the old WebMaps effort but only for a subtree and for the purposes of distributed computing
The real goal is being able to query about an object's interfaces
- current Web servers are close; they generate an HTML doc on the fly which lists the contents of a directory
- users can browse directories this way but need a standard DTD if software is to use the information
- this way a Web server can act like a traditional file server, e.g. Windows 95 Explorer could show a Web server as just another drive
- for legacy clients the DTD could be a constrained HTML
- hosts need to publish meta-data on objects so others can know what can be interacted with on this machine
- servers need to announce which interfaces/DTDs an object can handle
- the object at a specific URL may handle multiple document types; in software terms, this is how a single object can implement multiple interfaces
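A standardized directory listing for software consumption could look like the following Python sketch. The element names `WebContainer`, `Container`, and `Document` and the `url` and `name` attributes are hypothetical, since no webContainer DTD is given here. The demonstration builds a temporary directory so that it runs anywhere.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

def web_container(directory, url):
    """Describe what is contained within an URL as an XML document."""
    root = ET.Element("WebContainer", {"url": url})
    for name in sorted(os.listdir(directory)):
        is_dir = os.path.isdir(os.path.join(directory, name))
        # Sub-directories are nested containers; files are documents.
        child = ET.SubElement(root, "Container" if is_dir else "Document")
        child.set("name", name)
    return ET.tostring(root, encoding="unicode")

with tempfile.TemporaryDirectory() as site:
    os.mkdir(os.path.join(site, "deploy"))
    with open(os.path.join(site, "index.html"), "w") as page:
        page.write("<html></html>")
    listing = web_container(site, "http://xml.datachannel.com/XMLTreeViewer/")

print(listing)
```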
Not discussing how XML will make searching better
Part 7. Related Notes
A search engine interaction protocol defined by DTDs
- true, XML will improve content indexing
- WIP is about how the search query and results are passed between the search engine server and its clients
WIP on the Internet
- DC architectures require object lookup by type
- search engines are the Web’s native lookup mechanism
- could then do global object lookup
- for lack of a better name: Web Indexor Protocol (WIP).
- note: this is just an introduction; a W3C WG will need to work it out in full to standardize DTDs
WIP in the intranets
- just as hardware standards define markets, DTDs will define online marketplaces
- being in a market on the Web will require having the correct typed XML pages on your site
- products and their specs will be globally located by searching for DTDs and instance fragments
- find me documents which contain "<!DOCTYPE ...." where sub element X contains the string....
- pages will be retrieved and compared by the software agent
- e.g. parsed and put in a grid
WIP is a good example of XML-based protocols and DTDs as API
- corporate intranets could have local/internal Web services indexed
- these would be the interface lookup mechanism
- clients' local httpd will have pointers to private intranet search engine
- this way in an extranet situation, code can come into the local intranet and find its way to the correct information as pointed to by IS
- clients build a request document (complying with a WIP DTD) which specifies what they are searching for
- the request document is POSTed
- the search indexor server reads the request through an XML processor
- the web index on the server is searched for matches
- the results are listed in a response document (which complies with the same WIP DTD)
- can have standard way to submit pages to multiple search engines
- every protocol should have a capabilities DTD, how to ask the server which features it implements
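The WIP request and response steps above can be sketched in Python. Everything named here is an assumption made for illustration, because the outline leaves the WIP DTD to a future working group: the elements `WIP`, `Request`, `DocType`, `Contains`, `Response`, and `Match`, the in-memory `INDEX`, and the example host names. The HTTP POST itself is left out; only the documents and the index search are shown.

```python
import xml.etree.ElementTree as ET

def build_request(doctype, text):
    """Client side: describe what is being searched for."""
    wip = ET.Element("WIP")
    request = ET.SubElement(wip, "Request")
    ET.SubElement(request, "DocType").text = doctype
    ET.SubElement(request, "Contains").text = text
    return ET.tostring(wip, encoding="unicode")

def answer(request_xml, index):
    """Server side: search the index and list the matches.

    index is a list of (url, doctype, page_text) tuples.
    """
    wip = ET.fromstring(request_xml)
    doctype = wip.findtext("Request/DocType")
    text = wip.findtext("Request/Contains") or ""
    # The response is added beside the request, so one document type
    # describes both directions of the protocol.
    response = ET.SubElement(wip, "Response")
    for url, page_type, page_text in index:
        if page_type == doctype and text in page_text:
            ET.SubElement(response, "Match").text = url
    return ET.tostring(wip, encoding="unicode")

# Hypothetical index entries.
INDEX = [
    ("http://shop.example/cd1.xml", "CatalogEntry", "CD 7-64571-2 12.99"),
    ("http://shop.example/about.xml", "CompanyInfo", "We sell CDs"),
]

reply = ET.fromstring(answer(build_request("CatalogEntry", "7-64571-2"), INDEX))
matches = [match.text for match in reply.findall("Response/Match")]
print(matches)
```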
Constrained subsets of the HTML DTD could be used for transitioning
Java
Versioning can be designed into DTDs
- same elements but more structured content model
- add attributes which are not intended to modify display
- use only well-formed tags or cheat e.g. "<br />" seems to work
- this can also be used to allow html subparts for human viewing e.g. advertisement banners
One DTD can be used for both a request and a response of a protocol
- reserve subtrees for future versions
- <!ELEMENT PartXVer1 (SubA, SubB, SubC, PartXChildrenBeyondVer1) >
- <!ELEMENT PartXChildrenBeyondVer1 ANY>
- same DTD could describe two trees of elements
- the response document could contain a copy of the request document
- better for keeping DTDs synchronized
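The reserved-subtree idea can be sketched from the reader's side in Python. The element names come from the two declarations above; the function name `read_part_x_v1` and the `SubD` element inside the reserved subtree are hypothetical. A version 1 reader takes the children it knows and skips whatever a later version placed under `PartXChildrenBeyondVer1`.

```python
import xml.etree.ElementTree as ET

KNOWN_IN_VERSION_1 = ("SubA", "SubB", "SubC")

def read_part_x_v1(xml_text):
    """Read only the version 1 children; ignore the reserved subtree."""
    root = ET.fromstring(xml_text)
    return {name: root.findtext(name) for name in KNOWN_IN_VERSION_1}

# A document from a later version: SubD is a hypothetical addition.
NEWER_DOCUMENT = """<PartXVer1>
  <SubA>1</SubA><SubB>2</SubB><SubC>3</SubC>
  <PartXChildrenBeyondVer1><SubD>4</SubD></PartXChildrenBeyondVer1>
</PartXVer1>"""

print(read_part_x_v1(NEWER_DOCUMENT))
```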
Networking
Part 8. Conclusion
XML
- arbitrary IP networking possible, but applets are most successful if they talk to the server via HTTP POSTs
- Consider using W3C’s HTTP classes for development; more current with full source code
Java Security Model
- multiple XML processors available; come as small as 40K
- Java servlets are the best reference implementation; they run on Netscape servers, Apache, MS IIS (1.0 buggy), Sun’s Java Web Server, Jigsaw, acme.com’s, et alia
- 3.x browsers only allow Java applets to access the server they come from; can't use ServerSocket
- 4.x have more flexible security
- 5.x could have loopback only, local domain wide, and looser centrally configured by IS department
Copyright © 1997 DataChannel, All Rights Reserved.