The Problem
Many portable data collection devices, such as hand-held GPS
receivers, personal organizers, etc., can exchange data with a desktop
computer. They usually do this over a serial connection with
proprietary protocols. Vendors of portable data collection hardware
usually provide PC-based software to manage the data transfer from the
device to the PC, and this software often has a variety of analysis
and display features. These data collection applications would
benefit from having the data transferred to a web server, which can
add capabilities that are impossible or difficult on a PC, such as
data sharing, analysis and comparison, publication and archiving.
Transferring data from portable data collection hardware to a web
server requires solving, or at least working around, a significant problem:
portable data collection devices usually cannot connect
directly to the Internet. They typically do not have ethernet
connections, TCP/IP stacks, or support for the HTTP protocol.
When it is possible to add direct Internet
connectivity to these portable devices with add-on hardware,
it can be expensive.
The obvious
solution to this problem is to use the desktop personal computer as
an intermediary as shown in Figure 1 below. Most PCs are connected to the Internet already, so
it makes sense to transfer data from the portable device to the PC,
then from the PC on to the web server.
Assuming that the PC is being used as the data transfer
intermediary, the next major problem is that existing web browsers are
not well suited to this type of data transfer. Browsers are unable to
obtain data directly from portable data collection devices. They
are largely designed for viewing web pages, not uploading data
streams. It can also be difficult to interface the web browser with
the PC-based data collection software.
An ArsDigita customer needed this type of data transfer capability for their
handheld odor detector, the Cyranose 320 (see http://cyranosciences.com/products/).
Cyrano Sciences Inc. had already developed a couple of versions of PC-based
software used to control and obtain data from this device. Using the techniques
described in this article, ArsDigita extended
their software by adding HTTP data transfer abilities.
The Vision
The solution we chose is to integrate HTTP data
transfer capabilities directly into the PC-based data collection
software. The collection software connects with the device over a
serial port or other direct connection, obtains the data, processes it,
then uses its Internet connection to send it to the web server.
Programming the device interfaces is beyond the scope of this
article, since hardware device interface issues are almost always
device-specific. Instead, we'll focus on the challenges involved in
adding Internet data transfer code to existing applications.
Because we restricted ourselves to modifying existing data collection
programs, we were limited in our choice of
programming language and application architecture.
We used two Microsoft development
environments: Visual Basic 5 Professional (VB) and Visual C++ 6.0
Standard with the Microsoft Foundation Class Library (VC++), which we
will discuss in turn.
The Visual Basic Solution
Development Environment Features and Use
Visual Basic includes an "Internet Transfer Control"
component that supports both HTTP and FTP protocols. It can operate
in either synchronous or asynchronous
mode: in synchronous mode, data transfer must be complete before any
other code can execute, while in asynchronous mode
other code can execute while data is transferred in the background.
Using the asynchronous mode is a little more complicated, since you
need to write additional code that looks for and acts on certain
transfer states.
To use HTTP data transfer capabilities in your program, you first
need to add an Internet Transfer Control component to a form. This
must be done at design time, not run time. It should be given a
descriptive name since the default Inet1, Inet2, etc. names can be
confusing. The remaining steps depend on whether you want synchronous
or asynchronous transfer.
To use synchronous mode, you must:
- Use standard string handling capabilities to create a URL string
that includes the protocol (i.e., HTTP), the host, page, and the form
variables. An example might be
http://scorecard.org/env-releases/state-chemical-detail.tcl?category=cancer&modifier=air&fips_state_code=06
- Call the OpenURL method of the Internet Transfer Control component.
This method will return the text of the page, but not the headers.
No code will be able to execute during the transfer.
Here is some example code using the OpenURL method:
' we get the hostname from the Form1 web_host property
' hostname is name only, no protocol, e.g., "dev.hostname.com"
' login_page has leading slash and no "?" e.g., "/vb/login.tcl"
upload_host$ = Form1.web_host
login_page$ = "/vb/login.tcl"
' start with protocol and append host
login_url$ = "http://" & upload_host$
' add page and question mark
login_url$ = login_url$ & login_page$ & "?"
' add email and ampersand
login_url$ = login_url$ & "email=" & login_email.Text & "&"
' add password
login_url$ = login_url$ & "password=" & login_password.Text
' get page, parse it later
login_result$ = login_inet.OpenURL(login_url$)
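The example above places the text box values into the URL as-is, so an email address or password containing a character such as "&", "=", or a space would produce a malformed query string. A minimal sketch of percent-encoding each value before assembling the query string is shown here in standard C++ rather than VB. The names url_encode and build_query are hypothetical and do not come from the original application.

```cpp
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// Percent-encode one form name or value so that characters such as
// '&', '=', or spaces cannot be mistaken for URL syntax.
// Letters, digits, and "-_.~" are passed through unchanged.
std::string url_encode(const std::string& raw) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : raw) {
        if (std::isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~') {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    return out;
}

// Join name/value pairs into "name=value&name=value" form,
// encoding both names and values.
std::string build_query(
    const std::vector<std::pair<std::string, std::string>>& fields) {
    std::string query;
    for (const auto& field : fields) {
        if (!query.empty()) query += '&';
        query += url_encode(field.first) + "=" + url_encode(field.second);
    }
    return query;
}
```

The result of build_query would then be appended after the "?" in the login URL, in place of the hand-concatenated email and password fields.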
The asynchronous mode requires a few more steps:
- Use standard string handling capabilities to create a URL string
that includes the protocol (i.e., HTTP), the host, and page. An
example might be
http://scorecard.org/env-releases/state-chemical-detail.tcl
- Use standard string handling capabilities to create a second string
that contains the form variables. An example might be
category=cancer&modifier=air&fips_state_code=06
Note that no question mark is needed at the end of the page nor at
the beginning of the form variables.
- Call the Execute method of the Internet Transfer Control component.
Unlike OpenURL, this Execute method takes three arguments: the
host/page, the HTTP method (e.g., POST), and the form variable
string. This method will send an HTTP request to the web server.
Other code will supposedly be able to execute during the transfer.
- The StateChanged event of the Internet Transfer Control component
fires as the data transfer progresses. You need to have code in its
handler that includes a Select Case statement (like a C or Tcl
switch) that looks for certain transfer states. For example, you
might have a case for the state icResponseCompleted whose contents
would execute when the transfer was complete.
Here is some example code using the asynchronous Execute method:
' hardcoded host name, later might want to use registry?
upload_host$ = "dev.hostname.com"
new_session_page$ = "/vb/upload/open.tcl"
' create URL for login. will be using post since some fields may be large
' start with protocol and append host
new_session_url$ = "http://" & upload_host$
' add page
new_session_url$ = new_session_url$ & new_session_page$
' start creating form data for posting ------------------------
' read these right out of the text boxes
strFormData$ = "user_id=" & Form1.web_user_id & "&"
' hardcoded streaming_p
strFormData$ = strFormData$ & "streaming_p=t&"
strFormData$ = strFormData$ & "title=" & Me.title & "&"
strFormData$ = strFormData$ & "description=" & Me.description & "&"
strFormData$ = strFormData$ & "device_info=" & Me.device_info
' post
new_session_inet.Execute new_session_url$, "POST", strFormData$
' the rest of the action takes place in new_session_inet_StateChanged()
And over in the Internet Transfer Control component StateChanged
event handler we have (for the asynchronous mode):
Private Sub new_session_inet_StateChanged(ByVal State As Integer)
Dim res_string As String
Dim data_str As String
Select Case State
' ... Other cases not shown.
Case icResponseCompleted ' 12
' Get the first (and only) chunk from the web server
res_string$ = new_session_inet.GetChunk(1024, icString)
' no need to loop, all fits in 1024 bytes
' parse out data with custom function
data_str$ = parse_data(res_string$)
' pass parsed data back to main form
Form1.web_data = data_str$
' close this window as soon as process is complete
Unload Me
End Select
End Sub
Design Approach
The application needed to stream data to the web server at a rate
of about one serial port reading per second. Primarily for this
reason, we decided to use the asynchronous Execute method. Also, we
decided to use the HTTP POST method since many of our
data strings were very long.
The Visual Basic Internet Transfer Control component does not seem
to have any ability to deal with cookies; it could neither send them
nor store them. It also lacked any ability to read or write headers.
Since the ArsDigita Community System depends on cookies for user
authentication, we needed to develop another security method. We
decided to include user_id as one of the form variables.
To prevent simple URL hacking, we also included a form variable that
contained a secret code that needed to be correct in
order for the data to be acceptable.
In some cases, we needed to pass data from the web server back to
the VB application. The web pages were developed to return non-HTML
text that could be easily parsed. Standard VB does not have much
parsing capability, so we kept the text very simple (e.g., status and
error codes). The web server does need to pass back HTTP headers,
however (see also the header bug description below).
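To illustrate how little parsing such a reply needs, the sketch below handles a body of the form "ad: status=approved; user_id=999;", which is the format visible in the telnet output later in this article. It is written in standard C++ for illustration. The names parse_reply and trim are hypothetical, and the original applications may well have parsed their replies differently.

```cpp
#include <map>
#include <string>

// Trim leading and trailing spaces, tabs, and line breaks.
static std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    const std::string::size_type first = s.find_first_not_of(ws);
    if (first == std::string::npos) return "";
    const std::string::size_type last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Parse a reply such as "ad: status=approved; user_id=999;" into a
// name/value map. A reply without the "ad:" prefix (for example an
// HTML error page) yields an empty map.
std::map<std::string, std::string> parse_reply(const std::string& body) {
    std::map<std::string, std::string> fields;
    const std::string prefix = "ad:";
    const std::string text = trim(body);
    if (text.compare(0, prefix.size(), prefix) != 0) return fields;

    std::string::size_type pos = prefix.size();
    while (pos < text.size()) {
        // Each item runs up to the next ';' (or the end of the text).
        std::string::size_type end = text.find(';', pos);
        if (end == std::string::npos) end = text.size();
        const std::string item = text.substr(pos, end - pos);
        const std::string::size_type eq = item.find('=');
        if (eq != std::string::npos) {
            fields[trim(item.substr(0, eq))] = trim(item.substr(eq + 1));
        }
        pos = end + 1;
    }
    return fields;
}
```

Treating a missing prefix as "no fields" gives the caller one simple test for replies that are not in the expected format.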
In one particular application, performance was an important issue.
We wanted to be able to stream data at about one post per second. The
initial version of the application was developed on a fast, lightly
loaded computer with a high-speed Internet connection. Performance
was not an issue on such a platform, but testing with slower machines
on low-speed connections showed that Visual Basic was dropping data.
We developed two independent solutions:
- concatenate several device readings into one POST operation, and
- have several distinct Internet Transfer Control components which we
would rotate in a round-robin fashion.
Through testing, we empirically determined that
we could get reliable performance by using three separate components,
each posting three concatenated readings at a time.
The web server page that processed the POSTed data
attempted to separate multiple device readings that had been posted
together by splitting on line breaks. So when we needed to concatenate
multiple readings, we used "%0a" to separate readings (this is the
URL-encoded form of a line feed, decimal 10).
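The two techniques can be sketched independently of Visual Basic. The standard C++ below shows one possible shape: a batcher that joins a fixed number of readings with "%0a", and a counter that hands out sender indexes in rotation. The class names ReadingBatcher and RoundRobin are hypothetical. In the VB application the "senders" were Internet Transfer Control components, whereas here they are only index numbers.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Collects readings and releases them in batches joined by "%0a".
class ReadingBatcher {
public:
    explicit ReadingBatcher(std::size_t batch_size) : batch_size_(batch_size) {}

    // Add one reading. Returns true and fills `out` when a full batch
    // is ready to be posted; otherwise returns false.
    bool add(const std::string& reading, std::string& out) {
        pending_.push_back(reading);
        if (pending_.size() < batch_size_) return false;
        out.clear();
        for (std::size_t i = 0; i < pending_.size(); ++i) {
            if (i > 0) out += "%0a";
            out += pending_[i];
        }
        pending_.clear();
        return true;
    }

private:
    std::size_t batch_size_;
    std::vector<std::string> pending_;
};

// Hands out sender indexes 0, 1, ..., count-1, 0, 1, ... in turn.
// count must be greater than zero.
class RoundRobin {
public:
    explicit RoundRobin(std::size_t count) : count_(count), next_(0) {}

    std::size_t next() {
        const std::size_t current = next_;
        next_ = (next_ + 1) % count_;
        return current;
    }

private:
    std::size_t count_;
    std::size_t next_;
};
```

With a batch size of three and three senders, each sender is asked to post only once for every nine readings, which matches the configuration we found reliable.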
Problems, Bugs and Quirks
Visual Basic can be a little tricky to program. Its Internet
Transfer Control component has the feeling of a black box. Its
limitations are not well documented, and in some cases we had to
resort to packet sniffing to diagnose bugs and find limits.
Service Pack Required
The first main bug that we encountered was that VB5's Internet
Transfer Control's HTTP POST method posts garbage rather than data.
As it turns out, this was a known bug that can be fixed by a service
pack (http://msdn.microsoft.com/vstudio/sp/vs97/).
Don't even attempt to use VB5's Internet Transfer Control until you
have applied this service pack.
Undocumented Limit to Number of Components on Form
As mentioned above, to help performance we decided to use several
Internet Transfer Control components in a round-robin fashion.
However, with as many as 10 control objects, data would be dropped
silently. With four or more components on a form, the application
would freeze under certain repeatable conditions. So far, use of
three components has been reliable. We did not experiment to see if
one could get more total components by spreading them out over
several forms.
Execute POST is not really Asynchronous
According to VB5's "Books Online", the Execute POST method is
supposed to be asynchronous:
The OpenURL method
results in a synchronous transmission of data. In this context,
synchronous means that the transfer operation occurs before any other
procedures are executed. Thus the data transfer must be completed
before any other code can be executed. On the other hand, the Execute
method results in an asynchronous transmission. When the Execute
method is invoked, the transfer operation occurs independently of
other procedures. Thus, after invoking the Execute method, other code
can execute while data is received in the background.
Unfortunately, Microsoft's claim that "...other code can execute
while data is received in the background" is not 100% true. We
found that if you try to use the same control object again before the
remote web server responds, then you get an error (35764, isExecuting).
There is a hint about this in VB's help for the control property
StillExecuting. This says:
Returns a value that
specifies if the Internet Transfer control is busy. The control will
return True when it is engaged in an operation such as retrieving a
file from the Internet. The control will not respond to other
requests when it is busy.
So, if you want to post frequently, you either need to wait until
your control has received data, or you must use multiple controls.
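One way to apply this rule is to keep track of which senders still have a request outstanding and to use only an idle one. The sketch below shows that bookkeeping in standard C++. In VB the busy test would be the control's StillExecuting property, whereas this sketch keeps its own flag per sender. SenderPool and its member names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Tracks which of several senders still has a request in flight.
class SenderPool {
public:
    explicit SenderPool(std::size_t count) : busy_(count, false) {}

    // Returns the index of an idle sender and marks it busy,
    // or -1 if every sender is still waiting for a response.
    int acquire() {
        for (std::size_t i = 0; i < busy_.size(); ++i) {
            if (!busy_[i]) {
                busy_[i] = true;
                return static_cast<int>(i);
            }
        }
        return -1;
    }

    // Call when the response for sender `index` has arrived.
    // Out-of-range indexes are ignored.
    void release(int index) {
        if (index >= 0 && static_cast<std::size_t>(index) < busy_.size()) {
            busy_[static_cast<std::size_t>(index)] = false;
        }
    }

private:
    std::vector<bool> busy_;
};
```

When acquire() returns -1, the caller has to hold the reading back (or add it to the next batch) rather than reuse a busy sender and trigger the error described above.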
Visual Basic 5 OpenURL method is sensitive about headers
It turns out that VB5's Inet1.OpenURL(strURL$, icString)
method (the "easy" way to do HTTP in Visual Basic) is very
particular about how it gets data from AOLserver. VB fails if it
hits a page that uses ns_return 200 text/html, but works if it hits
a page that uses ReturnHeaders and ns_write.
The test page that we tried first
used ns_return, and VB generated fatal error messages with it.
We made a variation that was identical, except it used ReturnHeaders,
and it worked fine!
The difference between the two server outputs is very minor, as shown
by the telnet session output below.
The "buggy" ns_return version includes Date:, Server:, and
Content-Length: headers that are not present in the ReturnHeaders
version that works.
It's a mystery why this would cause any problem; this bug is not
described on the Microsoft web page.
-- works (uses ReturnHeaders and ns_write)
GET /test/login-abe2.tcl?email=foobar@arsdigita.com&password=vbsucks HTTP/1.0
HTTP/1.0 200 OK
MIME-Version: 1.0
Content-Type: text/html
Set-Cookie: last_visit=947803872; path=/; expires=Fri, 01-Jan-2010 01:00:00 GMT
ad: status=approved; user_id=999;Connection closed by foreign host.
-- fails with VB Error 13 - Type mismatch (uses ns_return 200 text/html)
GET /test/login.tcl?email=foobar@arsdigita.com&password=vbsucks HTTP/1.0
HTTP/1.0 200 OK
MIME-Version: 1.0
Date: Thu, 13 Jan 2000 22:52:22 GMT
Server: NaviServer/2.0 AOLserver/2.3.3
Content-Type: text/html
Content-Length: 49
ad: status=approved; user_id=999;Connection closed by foreign host.
The Visual C++ Solution
Development Environment Features
Visual C++ 6.0 Standard (VC++) and its Microsoft Foundation Class
Library (MFC) provide a very rich set of classes and functions for
bringing Internet data transfer into desktop PC applications. The
level of control, program expressiveness, documentation, reliability,
debugging capabilities, and development environment far surpass what
is available in Visual Basic. If you have any choice at all,
definitely choose Visual C++ rather than Visual Basic.
VC++/MFC provides a wide variety of tools to work with TCP/IP and
protocols such as HTTP and FTP. The remainder of this article is
limited to using standard MFC features to implement HTTP data
transfer.
Steps in a Typical HTTP Client Application
Here are the steps for doing HTTP
from a desktop application based on MFC:
- Create a CInternetSession object on the stack.
- Use CInternetSession::GetHttpConnection() to allocate a
CHttpConnection object on the heap.
- Open an HTTP request with CHttpConnection::OpenRequest(). This will
allocate a CHttpFile object on the heap.
- Send a request along with the POST data using
CHttpFile::SendRequest().
- Check on the data transfer status with
CHttpFile::QueryInfoStatusCode().
- When the transfer status is HTTP_STATUS_OK, read data returned by
the web server with CHttpFile::ReadString().
- Clean up after yourself: close and delete the heap-allocated
CHttpFile and CHttpConnection objects, and do all of those other
wonderful C++ memory management tasks.
Design Approach
We decided to create a custom class named ad_inet that would
encapsulate the complexity of the HTTP and memory management steps
listed above. The design goal was to be able to post data with a
simple use of ad_inet::post_data(). This was written as an overloaded
public member function that took either two or three arguments. In
both cases, the first argument is the name of the web server page to
post to (relative to page root), and the second argument is the data
to post. The optional third argument is a pointer to a string
variable to be used for storing web server output. If this third
argument is missing, then web server output is ignored (but the data
is still posted).
For example, user code could be as simple as this:
// create a string for posting
CString data_to_post;
data_to_post = "session_id=" + CDataLoggerDoc::m_ad_session_id;
data_to_post += "&session_key=" + CDataLoggerDoc::m_ad_session_key;
data_to_post += "&data=" + web_data_str;
data_to_post += "%0a"; // need that newline for the Tcl parsing, even with only one line
// create a posting object and post data (discarding web server output)
ad_inet inet_post_obj;
inet_post_obj.post_data("/test/deviceX/post.tcl",data_to_post);
The verbosely commented ad_inet class declaration is
shown below; the source code to the class definition (http://www.arsdigita.com/asj/pc-data-collection-to-web/pc-data-collection-to-web-source) is also
available. Note the suggestions for passing string arguments
towards the end.
class ad_inet {
private:
//////////////////////////////////////////////////////////
// Private data members
///////////////////////////////////////////////////////////
// We will need a null pointer to a CHttpConnection
// Later, we will get a value for this pointer by passing
// a server name and port to CInternetSession::GetHttpConnection
CHttpConnection* pServer;
// declare variables for the remote web server name and port
// (arguments for CInternetSession::GetHttpConnection)
CString strServerName;
INTERNET_PORT nPort;
// We will need a pointer to a CHttpFile
// Later, we will assign a value for this pointer
// by passing verb, page, and flags to CHttpConnection::OpenRequest
CHttpFile* pFile;
// This flag is another one of the arguments
// for CHttpConnection::OpenRequest
DWORD dwHttpRequestFlags;
// declare variable to obtain HTTP result status that
// will be filled by CHttpFile::QueryInfoStatusCode
DWORD dwRet;
// declare a variable to hold our string data that
// we will be POSTing with CHttpFile::SendRequest
CString post_data_string;
// we declare a boolean flag that will be used by
// ad_inet::prv_post() to determine whether or not the
// caller wants the string that is returned by the web
// server
//
// a code reviewer pointed out that this could instead
// be an argument to ad_inet::prv_post(), rather than this member
// but it's no big deal
BOOLEAN want_return_string;
//////////////////////////////////////////////////////////
// Private member functions
//////////////////////////////////////////////////////////
// this private function will be called by both versions of
// the public functions ad_inet::post_data(). This private function
// contains the ugly nuts and bolts of the posting process. Each
// version of the public function ad_inet::post_data() is therefore
// greatly simplified: basically all they do is set a flag that
// indicates whether or not they want the return string, and then
// they call this single, private function.
void prv_post(LPCTSTR page_from_root
, LPCTSTR data_to_post
, CString& string_from_web);
public:
////////////////////////////////////////////////////////////////////
//
// constructor and destructor - nothing fancy (yet)
ad_inet();
virtual ~ad_inet();
//
///////////////////////////////////////////////////////////////////
// MS Help Topic "Strings: CString Argument Passing" sez:
// http://msdn.microsoft.com/isapi/msdnlib.idc?
// theURL=/library/devprods/vs6/visualc/vcmfc/_mfc_cinternetsession.htm
//
//
// CString Argument-Passing Conventions
// When you define a class interface, you must determine the
// argument-passing convention for your member functions. There
// are some standard rules for passing and returning CString
// objects. If you follow the rules described in Strings as
// Function Inputs and Strings as Function Outputs, you will
// have efficient, correct code.
//
// Strings as Function Inputs
// If a string is an input to a function, in most cases it
// is best to declare the string function parameter as LPCTSTR.
// Convert to a CString object as necessary within the function
// using constructors and assignment operators. If the string contents
// are to be changed by a function, declare the parameter as a nonconstant
// CString reference (CString&).
//
// Strings as Function Outputs
// Normally you can return CString objects from functions because
// CString objects follow value semantics like primitive types.
// To return a read-only string, use a constant CString reference
// (const CString&). The following example ...
// post data and get string:
// page_from_root and data_to_post
// are constant strings (LPCTSTR),
// while string_from_web will be
// modified (CString&)
void post_data(LPCTSTR page_from_root
, LPCTSTR data_to_post
, CString& string_from_web);
// post data only (do not care about returned string)
// page_from_root and data_to_post
// are constant strings (LPCTSTR)
void post_data(LPCTSTR page_from_root
, LPCTSTR data_to_post);
};
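The declaration above describes two public post_data() overloads that each set a flag and then delegate to a single private function. A stripped-down sketch of that delegation pattern follows. It is not the actual ad_inet source: it uses std::string in place of CString and LPCTSTR, and it replaces the MFC request steps with a hypothetical transport() hook that simply echoes its input, so that the structure can be compiled and tried without MFC. The name PosterSketch is also hypothetical.

```cpp
#include <string>

// Illustrative stand-in for ad_inet; no network code is included.
class PosterSketch {
public:
    virtual ~PosterSketch() {}

    // Post data and copy the server's reply into string_from_web.
    void post_data(const std::string& page, const std::string& data,
                   std::string& string_from_web) {
        want_return_string_ = true;
        prv_post(page, data, string_from_web);
    }

    // Post data and discard the server's reply.
    void post_data(const std::string& page, const std::string& data) {
        std::string ignored;
        want_return_string_ = false;
        prv_post(page, data, ignored);
    }

protected:
    // Hypothetical transport hook. A real class would run the MFC
    // request steps here; this default just echoes its input.
    virtual std::string transport(const std::string& page,
                                  const std::string& data) {
        return "posted " + data + " to " + page;
    }

private:
    bool want_return_string_ = false;

    // Shared implementation used by both public overloads.
    void prv_post(const std::string& page, const std::string& data,
                  std::string& string_from_web) {
        const std::string reply = transport(page, data);
        if (want_return_string_) string_from_web = reply;
    }
};
```

Making transport() virtual is a choice of this sketch only; it lets a test subclass substitute canned replies for the network.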
Problems, Bugs and Quirks
VC++/MFC was surprisingly easy to work with, especially compared
to VB. In general, the development environment behaved as expected,
and took little time to learn.
The MFC Internet classes used here throw CInternetException
exceptions, so you definitely want to catch these. It seemed like a
good idea to code our own exception classes. For example, the
Internet connection and web server might be operating perfectly, but
you might post some data that the logic of the web server decides is
in some way erroneous (e.g., attempted login with an invalid
password). For such conditions, we programmed the web server to return
plain text status codes and error messages. Ideally, you would be
able to write custom exception classes that would interpret such web
server messages and let the C++ client program deal with them in
accordance with conventional C++ exception handling techniques.
However, the Microsoft documentation was either wrong, incomplete, or
impenetrable, since we could not get custom C++ exceptions to work as
advertised within the scope of our projects.
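A minimal sketch of the idea, using standard C++ exceptions (std::runtime_error) rather than the MFC exception classes, might look like the following. ServerRejected, check_reply, and the "error:" reply prefix are all assumptions made for illustration; they are not the names or the reply format used in our projects, and we did not verify this approach inside an MFC application.

```cpp
#include <stdexcept>
#include <string>

// Thrown when the transfer itself succeeded but the web server's
// application logic rejected the posted data.
class ServerRejected : public std::runtime_error {
public:
    explicit ServerRejected(const std::string& server_message)
        : std::runtime_error(server_message) {}
};

// Throws ServerRejected if the reply starts with "error:", carrying
// the text after the prefix. Otherwise returns the reply unchanged.
// The "error:" prefix is an assumed convention.
std::string check_reply(const std::string& reply) {
    const std::string marker = "error:";
    if (reply.compare(0, marker.size(), marker) == 0) {
        throw ServerRejected(reply.substr(marker.size()));
    }
    return reply;
}
```

Client code could then wrap each post in a try block and catch ServerRejected separately from transport failures, calling what() to obtain the server's message.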
The VC++/MFC HTTP classes met our needs in this project. However,
others have found them lacking in certain respects. In particular, a
well-known and highly regarded book on Visual C++ states:
...MFC developers informed us that the CAsyncSocket and CSocket
classes were not appropriate for synchronous programming. The Visual
C++ online help says you can use CSocket for synchronous programming,
but if you look at the source code, you'll see some ugly
message-based code left over from Win16.
(David Kruglinski et
al., Programming Microsoft Visual C++ Fifth Edition, Microsoft Press,
1998, ISBN 1-57231-857-0)
This book is essential for serious work
with VC++/MFC.
Performance of the MFC classes was acceptable in our experience,
even with streaming data. However, if the PC application needs near
real-time capabilities, you should plan on profiling and optimizing
the code. For example, to obtain better performance, one client took
ArsDigita code that used MFC CString classes and replaced them with
standard C-style string buffers to avoid the overhead of the MFC
buffer management.
General Integration Issues
Existing application code
It is usually a challenge to graft HTTP POST capabilities onto an
existing application. The data that is to be sent to the web server
may exist in several locations inside the PC application. For
example, in one project, the existing PC application wrote data to a
text file, and the application needed to send each line of the file to
the web server as soon as it was written. However, the application
was originally designed to write many small chunks of data to the
file, one at a time, eventually followed by writing a line terminator
to the file. Each write to the file was separated from other writes
by many lines of application code. Since we needed to be able to send
the full line of data to the server in a single, atomic operation, we
had to develop a way to gather up and concatenate these small chunks
of data wherever they were generated, then finally send the line
to the web server when it was fully assembled. We suspect that this
type of problem is fairly common, unless the PC application was
designed for this from the beginning.
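One possible shape for such a gathering mechanism is sketched below in standard C++: every place that used to write a chunk to the file also passes it to an assembler object, which invokes a callback once a line terminator arrives. LineAssembler is a hypothetical name, the use of '\n' as the terminator is an assumption, and this is not the code from the project described above.

```cpp
#include <functional>
#include <string>
#include <utility>

// Gathers chunks written at different places in the application and
// hands each completed line to a callback.
class LineAssembler {
public:
    explicit LineAssembler(std::function<void(const std::string&)> on_line)
        : on_line_(std::move(on_line)) {}

    // Append a chunk. Every '\n' in the chunk completes a line, which
    // is passed to the callback without the terminator.
    void write(const std::string& chunk) {
        for (char c : chunk) {
            if (c == '\n') {
                on_line_(buffer_);
                buffer_.clear();
            } else {
                buffer_ += c;
            }
        }
    }

private:
    std::function<void(const std::string&)> on_line_;
    std::string buffer_;
};
```

The callback is where the assembled line would be posted to the web server, so the rest of the application never sees a partially built line leave the PC.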
Visual Basic Option Explicit
By default, Visual Basic does not require variables to be declared
before they are used. Many VB programs are developed in
this fashion, because it is thought by some to be easier. However,
this default behavior usually leads to bugs that are hard to
diagnose, since a typo in a variable name will cause a new variable
to be created. Such typos are hard to spot visually. Most experienced
VB programmers know that you can make VB require variable declaration
(i.e., the Dim statement) by putting Option Explicit at the top of
each module. The time saved by avoiding
variable name typo bugs more than makes up for the small amount of
extra time it takes to plan and declare your variables.
Unfortunately, after an application has been written without variable
declarations, it is very time consuming to suddenly turn on Option
Explicit and mop up after the huge number of errors (undeclared
variables) that are found. This problem is common, especially when
trying to add HTTP POST capabilities to an existing application.
Conclusion
Both Visual Basic and Visual C++ offer the ability to interface a
web service with PC-attached data collection hardware. Of the two
programming environments, Visual C++ was easier to work with, more
capable, and more reliable. The two environments require
completely different programming techniques to accomplish the data
transfer. It is our hope that the tips, advice, and warnings
presented in this article will make your project more efficient.
Many ideas presented themselves during the course of these
projects that we hope to be able to explore later. For example, one
potential solution to the problem of data being dropped by the Visual
Basic application under high load was to spool the data
in a filesystem buffer. Another separate process could read this
spooled data and send it to the web server as needed.
This approach has the potential
to increase reliability, but project scope did not allow time to
implement a spool. Another idea was to use XML to structure the
transferred data: standard XML parsers could then be used to extract
the data, rather than developing custom parsers for each situation.
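Since we did not implement the spool, the following is only a sketch of how its two halves might look in standard C++, with one reading per line. The function names spool_reading and drain_spool are hypothetical. The functions take generic streams so that a std::fstream opened on a spool file or, for experiments, a std::stringstream can be passed in. Coordination between two separate processes (file locking, truncating the spool after a successful post) is deliberately left out.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>  // std::stringstream can stand in for the spool file
#include <string>
#include <vector>

// Producer side: append one reading to the spool, one per line,
// and flush so that a reader sees it promptly.
void spool_reading(std::ostream& spool, const std::string& reading) {
    spool << reading << '\n';
    spool.flush();
}

// Consumer side: read up to max_count spooled readings, in the order
// they were written. Returns fewer if the spool runs out.
std::vector<std::string> drain_spool(std::istream& spool,
                                     std::size_t max_count) {
    std::vector<std::string> readings;
    std::string line;
    while (readings.size() < max_count && std::getline(spool, line)) {
        readings.push_back(line);
    }
    return readings;
}
```

The consumer could hand each drained group to the web server in a single POST, so a slow connection would delay delivery rather than drop readings.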