DAO design for writing big XML file on database [on hold]


I am currently working on a Java EE application (Spring, Hibernate). I
have to load a big XML file (more than 1 gigabyte) into a relational
database (Postgres).

The application does not use batch processing. I've done some
searching, but I did not find any solution for the design of the DAO
layer: if I use only one transaction, the server will not respond to
any request until it finishes inserting the rows (a huge number of
rows). So using a single transaction is not a good idea.
I can split the XML file based on its tags: every tag's content will
be inserted as a row.
The idea is to use multithreading to manage transactions (each
transaction inserts a defined number of rows).
My difficulty is working out how many transactions are needed to keep
the application's response time acceptable. I am also looking into how
to handle the failure of some transactions. For example, if only 3
transactions out of over 1,000,000 fail, should I retry all of them?



While searching, I found that batch-processing frameworks like Spring
Batch manage database records and transaction failures. But my
application does not use batch processing.



Unfortunately, I cannot switch to a NoSQL database or add the Spring
Batch framework to the project.
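The chunked-transaction idea above can be sketched as follows. This is only an illustration of the pattern, written in Python against the standard library's sqlite3 and ElementTree.iterparse rather than Spring/Hibernate and Postgres; all names and the retry policy are my own assumptions:

```python
# Illustrative sketch: stream-parse the XML, commit every chunk_size rows
# in its own transaction, and retry only the chunk that failed.
# sqlite3 stands in for Postgres; the table and tags are made up.
import sqlite3
import xml.etree.ElementTree as ET
from io import BytesIO

XML = b"<movielist>" + b"".join(
    b"<movie><title>T%d</title></movie>" % i for i in range(10)
) + b"</movielist>"

def iter_movies(stream):
    """Stream-parse <movie> elements without loading the whole file."""
    for event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "movie":
            yield elem.findtext("title")
            elem.clear()          # free memory as we go

def load_in_chunks(conn, rows, chunk_size=4, max_retries=2):
    """Commit every chunk_size rows; retry only the failed chunk."""
    chunk = []
    def flush():
        for attempt in range(max_retries + 1):
            try:
                with conn:        # one transaction per chunk
                    conn.executemany("INSERT INTO movie(title) VALUES (?)",
                                     [(t,) for t in chunk])
                return            # committed chunks are never redone
            except sqlite3.OperationalError:
                if attempt == max_retries:
                    raise
    for title in rows:
        chunk.append(title)
        if len(chunk) == chunk_size:
            flush()
            chunk = []
    if chunk:
        flush()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movie(title TEXT)")
load_in_chunks(conn, iter_movies(BytesIO(XML)))
print(conn.execute("SELECT COUNT(*) FROM movie").fetchone()[0])  # prints 10
```

The key point of the sketch is that a failed chunk only forces a retry of its own rows, not of the millions already committed; the chunk size is the knob that trades throughput against how long each transaction holds locks.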

I have an XML file which looks exactly like this, just a lot longer:



<movielist>
  <movie>
    <title>Amazon Quest </title>
    <year>1954</year>
    <length>75 min</length>
    <director>Steve Sekely</director>
    <rating>7</rating>
    <genre>Action</genre>
    <genre>Drama</genre>
    <actor>Tom Neal</actor>
    <actor>Carole Mathews</actor>
    <actor>Carole Donne</actor>
    <actor>Don Zelaya</actor>
    <actor>Ralph Graves</actor>
  </movie>
  <movie>
    <title>American Ninja 3: Blood Hunt </title>
    <year>1989</year>
    <length>89 min</length>
    <certification>R</certification>
    <director>Cedric Sundstrom</director>
    <rating>7</rating>
    <genre>Action</genre>
    <genre>Drama</genre>
    <actor>David Bradley</actor>
    <actor>Steve James</actor>
    <actor>Marjoe Gortner</actor>
    <actor>Michele B. Chan</actor>
    <actor>Yehuda Efroni</actor>
  </movie>
</movielist>


This is my code to read it:



class Program
{
    static void Main(string[] args)
    {
        XmlTextReader reader = null;

        try
        {
            reader = new XmlTextReader("movies.xml");

            while (reader.Read())
            {
                if (reader.IsStartElement())
                {
                    if (reader.IsEmptyElement)
                        Console.WriteLine("<{0}/>", reader.Name);
                    else
                    {
                        reader.Read();
                        Console.WriteLine(reader.ReadString());
                    }
                }
            }
        }
        finally
        {
            if (reader != null)
                reader.Close();
        }
    }
}


but it gives me an error saying:



An unhandled exception of type 'System.Xml.XmlException' occurred
in System.Xml.dll

Additional information: Unexpected end of file has occurred. The
following elements are not closed: movielist. Line ___ (the last line,
which looks exactly like the last line of the sample above), position 1.



What could be the problem? It reads the file just fine until then,
and the reported error line is the line where I say:



if (reader.IsStartElement())
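For cross-language comparison only (not a fix for the C# code above), the same incremental read of the movie list can be sketched in Python with the standard library's ElementTree.iterparse, which hands back each complete element and so cannot run past an end tag:

```python
# Streaming parse of the movie list, shown purely as an illustration of
# incremental XML reading; the sample document is abbreviated.
import xml.etree.ElementTree as ET
from io import StringIO

XML = StringIO("""<movielist>
<movie>
<title>Amazon Quest </title>
<year>1954</year>
<genre>Action</genre>
<genre>Drama</genre>
</movie>
</movielist>""")

titles = []
for event, elem in ET.iterparse(XML, events=("end",)):
    if elem.tag == "movie":                    # fires once per closed <movie>
        titles.append(elem.findtext("title").strip())
        elem.clear()                           # discard the element once processed

print(titles)  # prints ['Amazon Quest']
```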

Web Development

I am using the netCDF4 library in Python and just came across the
issue stated in the title. At first I blamed groups for this, but
it turns out that it is a difference between the NETCDF4 and
NETCDF3_CLASSIC formats.



In the program below (which you will unfortunately not be able to
run, because it requires database access), I create a simple
time-series netCDF file with the same data in 3 different ways: 1) as a
NETCDF3_CLASSIC file, 2) as a flat NETCDF4 file, 3) as a NETCDF4 file
with groups. What I find with simple timing and the ls command is:



1) NETCDF3          1.3483 seconds   1922704 bytes
2) NETCDF4 flat     8.5920 seconds  15178689 bytes
3) NETCDF4 groups   8.5565 seconds  15178896 bytes


It's exactly the same routine that creates 1) and 2); the only
difference is the format argument to the netCDF4.Dataset constructor.
Is this a bug or a feature?



Thanks, Martin



Code:



import sys
import datetime as dt
import numpy as np
from obs_stations_database import ObsStationsDatabase
import netCDF4 as nc

SERVER = "************"

def read_timeseries(user, password, network="GAW", station="GLH",
                    parameter="O3", daterange=None):
    # interpret daterange if given (convert string to datetime, format YYYY-MM-DD)
    if daterange is not None:
        try:
            if isinstance(daterange[0], basestring):
                daterange[0] = dt.datetime.strptime(daterange[0], "%Y-%m-%d")
            if isinstance(daterange[1], basestring):
                daterange[1] = dt.datetime.strptime(daterange[1], "%Y-%m-%d")
        except IOError as e:
            raise IOError(e)
    with ObsStationsDatabase(user_name=user, user_passcode=password,
                             database_host=SERVER) as db:
        station_id = db.get_stations(network_name=network, station_id=station,
                                     key_only=True, as_dict=False)[0][1]
        print "station_id = ", station_id
        series_id = db.get_parameter_series_id(network, station_id, parameter)
        print "series_id = ", series_id
        if series_id is not None:
            t0 = dt.datetime.now()
            data = db.get_hourly_data(series_id, daterange=daterange)
            t1 = dt.datetime.now()
            print "Database loading took %10.4f seconds." % \
                ((t1-t0).total_seconds())
            series_info = db.get_parameter_series_info(series_id)
            return data, series_info


def write_to_netcdf_single(filename, data, series_info, format='NETCDF4'):
    vname = series_info["parameter_name"]
    t0 = dt.datetime.now()
    with nc.Dataset(filename, "w", format=format) as f:
        # define dimensions and variables
        dim = f.createDimension('time', None)
        time = f.createVariable('time', 'f8', ('time',))
        time.units = "days since 1900-01-01 00:00:00"
        time.calendar = "gregorian"
        param = f.createVariable(vname, 'f4', ('time',))
        param.units = "nmol mol-1"  ### replace this with database query result!
        flag = f.createVariable(vname+'_flag', 'i2', ('time',))
        flag.long_name = ("Data quality flag for %s. Values, see WMO "
                          "code table 033 020" % (vname))
        # define global attributes
        for k, v in sorted(series_info.items()):
            if isinstance(v, dt.datetime):
                v = v.isoformat(" ")
            setattr(f, k, v)
        # store data values
        time[:] = nc.date2num(data.time, units=time.units,
                              calendar=time.calendar)
        param[:] = data.value
        flag[:] = data.flag
    t1 = dt.datetime.now()
    print "Writing simple file took %10.4f seconds." % \
        ((t1-t0).total_seconds())


def write_to_netcdf_grouped(filename, data, series_info):
    t0 = dt.datetime.now()
    with nc.Dataset(filename, "w", format='NETCDF4') as f:
        for i, sinfo in enumerate(series_info):
            print i, sinfo
            vname = sinfo["parameter_name"]
            # define groups
            grp = f.createGroup(sinfo["station_id"])
            # define dimensions and variables
            dim = grp.createDimension('time', None)
            time = grp.createVariable('time', 'f8', ('time',))
            time.units = "days since 1900-01-01 00:00:00"
            time.calendar = "gregorian"
            param = grp.createVariable(vname, 'f4', ('time',))
            param.units = "nmol mol-1"  ### replace this with database query result!
            flag = grp.createVariable(vname+'_flag', 'i2', ('time',))
            flag.long_name = ("Data quality flag for %s. Values, see "
                              "WMO code table 033 020" % (vname))
            # define group attributes
            for k, v in sorted(sinfo.items()):
                if isinstance(v, dt.datetime):
                    v = v.isoformat(" ")
                setattr(grp, k, v)
            # store data values
            time[:] = nc.date2num(data[i].time, units=time.units,
                                  calendar=time.calendar)
            param[:] = data[i].value
            flag[:] = data[i].flag
    t1 = dt.datetime.now()
    print "Writing grouped file took %10.4f seconds." % \
        ((t1-t0).total_seconds())


if __name__ == "__main__":
    if len(sys.argv) < 3:
        print "Usage: obs_station_to_netcdf user password"
        print "(username and password for the obs_surface_stations database)"
        exit(2)
    user = sys.argv[1]
    password = sys.argv[2]
    network = "GAW"
    station = raw_input("Enter station code: ")
    data, series_info = read_timeseries(user, password, network, station,
                                        parameter="O3")
    print series_info
    filename = "%s_%s_nc3.nc" % (series_info["station_id"],
                                 series_info["parameter_name"])
    write_to_netcdf_single(filename, data, series_info,
                           format='NETCDF3_CLASSIC')
    filename = "%s_%s.nc" % (series_info["station_id"],
                             series_info["parameter_name"])
    write_to_netcdf_single(filename, data, series_info)
    # note: rstrip(".nc") would strip characters, not the suffix
    filename = filename[:-3] + "_grouped.nc"
    write_to_netcdf_grouped(filename, [data], [series_info])


And to prove that this is really the same data, here are the
ncdumps (global attribute/group attributes truncated):



NETCDF3_CLASSIC:



netcdf ASK123N00_O3_nc3 {
dimensions:
        time = UNLIMITED ; // (120069 currently)
variables:
        double time(time) ;
                time:units = "days since 1900-01-01 00:00:00" ;
                time:calendar = "gregorian" ;
        float O3(time) ;
                O3:units = "nmol mol-1" ;
        short O3_flag(time) ;
                O3_flag:long_name = "Data quality flag for O3. Values, see WMO code table 033 020" ;

// global attributes:
                :comments = "Time range 1-24 detected: Converted to 0-23 assuming data was given at interval endpoints" ;
...
}


NETCDF4 flat:



netcdf ASK123N00_O3 {
dimensions:
        time = UNLIMITED ; // (120069 currently)
variables:
        double time(time) ;
                time:units = "days since 1900-01-01 00:00:00" ;
                time:calendar = "gregorian" ;
        float O3(time) ;
                O3:units = "nmol mol-1" ;
        short O3_flag(time) ;
                O3_flag:long_name = "Data quality flag for O3. Values, see WMO code table 033 020" ;

// global attributes:
...
}


NETCDF4 groups:



netcdf ASK123N00_O3_grouped {

group: ASK123N00 {
  dimensions:
        time = UNLIMITED ; // (120069 currently)
  variables:
        double time(time) ;
                time:units = "days since 1900-01-01 00:00:00" ;
                time:calendar = "gregorian" ;
        float O3(time) ;
                O3:units = "nmol mol-1" ;
        short O3_flag(time) ;
                O3_flag:long_name = "Data quality flag for O3. Values, see WMO code table 033 020" ;

  // group attributes:
  ...
  } // group ASK123N00
}
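One plausible contributor to the size gap (an assumption on my part, not something verified against these files): NETCDF4 stores unlimited-dimension variables in HDF5 chunks, and each chunk carries indexing overhead, so very small chunks inflate the file. A back-of-envelope sketch with made-up overhead constants:

```python
# Back-of-envelope arithmetic: contiguous (NETCDF3-style) storage vs.
# chunked (NETCDF4/HDF5-style) storage.  The 64-byte per-chunk overhead
# is an illustrative assumption, not a value measured from the files.

def flat_size(n, itemsizes):
    """Contiguous layout: data bytes only."""
    return sum(n * s for s in itemsizes)

def chunked_size(n, itemsizes, chunk_elems, per_chunk_overhead=64):
    """Chunked layout: each variable is split into chunks, and every
    chunk carries index/B-tree overhead on top of its payload."""
    nchunks = -(-n // chunk_elems)   # ceiling division
    return sum(nchunks * (chunk_elems * s + per_chunk_overhead)
               for s in itemsizes)

n = 120069            # records in the time series above
sizes = [8, 4, 2]     # f8 time, f4 O3, i2 flag

print(flat_size(n, sizes))                      # ~1.7 MB of raw data
print(chunked_size(n, sizes, chunk_elems=1))    # tiny chunks -> huge overhead
print(chunked_size(n, sizes, chunk_elems=4096)) # sane chunks -> close to flat
```

Under these assumptions, one-element chunks blow the file up by an order of magnitude while 4096-element chunks cost almost nothing, which is at least consistent with the observed 1.9 MB vs. 15 MB sizes.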
Web Development

I am developing a hospital management application in MySQL and PHP.

For brevity, I am only giving a brief outline of the tables.
Among others, I have tables for



medicine



id, medicine_name, medicine_id


patient_table



id, general_regn_no, ipd_regn_no, patient_name


patient_detail_entry(patient_admission_table)



id, general_regn_no, ipd_regn_no, patient_name, room_name_id,
room_category_id, admission_time, ...


I am conceptualizing a form which will have the following
fields.



Primary fields



general_regn_no, ipd_regn_no, patient_name, room_name
date_time


Medicine request fields, for issuing medicine from the medicine
store (billing is done directly by the medicine store, hence
price is not needed):



medicine_name  medicine_id  quantity
medicine_name medicine_id quantity
medicine_name medicine_id quantity


So the medicine request fields will be repeating fields. This request
should go to the medicine store, which in turn will issue the medicine
on receipt of the request.



medicine_store_table



is_delivered, ipd_patient_id, patient_name,
issue_date_time, medicine_name, quantity


Now the question is: should I have separate tables for issue and
request, or one common table with a many-to-many relationship between
patients and medicines?
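The common-table option can be sketched as a single transaction table that records the request and is later updated on delivery, serving as the many-to-many link between patients and medicines. All table and column names below are my own assumptions, and SQLite stands in for MySQL purely for illustration:

```python
# Sketch: one medicine_transaction table covering both request and issue.
# Names are assumptions; SQLite stands in for MySQL.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patient  (id INTEGER PRIMARY KEY, patient_name TEXT);
CREATE TABLE medicine (id INTEGER PRIMARY KEY, medicine_name TEXT);

-- one row per requested medicine; the delivery columns are filled in
-- by the medicine store, so no second issue table is needed
CREATE TABLE medicine_transaction (
    id           INTEGER PRIMARY KEY,
    patient_id   INTEGER NOT NULL REFERENCES patient(id),
    medicine_id  INTEGER NOT NULL REFERENCES medicine(id),
    quantity     INTEGER NOT NULL,
    request_time TEXT    NOT NULL,
    is_delivered INTEGER NOT NULL DEFAULT 0,
    issue_time   TEXT
);
""")
conn.execute("INSERT INTO patient  VALUES (1, 'John Doe')")
conn.execute("INSERT INTO medicine VALUES (1, 'Paracetamol')")
conn.execute("""INSERT INTO medicine_transaction
                (patient_id, medicine_id, quantity, request_time)
                VALUES (1, 1, 2, '2014-12-01 10:00')""")
# the store marks the request delivered instead of inserting a new row
conn.execute("UPDATE medicine_transaction SET is_delivered = 1, "
             "issue_time = '2014-12-01 10:30' WHERE id = 1")
print(conn.execute("SELECT is_delivered "
                   "FROM medicine_transaction").fetchone()[0])  # prints 1
```

The trade-off: one table keeps request and issue trivially in sync (a row is its own audit trail), while separate tables would let a single request be fulfilled by several partial issues.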



Thanks.

Web Development

Good day to everyone. I exported a set of data as XML from a
MySQL database, but I want to split the existing XML file so that each
ROW ends up in its own XML file. The example is as below:



Exported XML result from database:



Filename: result01.xml



Script in file:



<ROWDATA>
  <ROW>
    <DOCKEY>57911</DOCKEY>
    <DOCNO>MY1113</DOCNO>
    <DOCDATE>20141201</DOCDATE>
  </ROW>
  <ROW>
    <DOCKEY>57913</DOCKEY>
    <DOCNO>MY1114</DOCNO>
    <DOCDATE>20141201</DOCDATE>
  </ROW>
  <ROW>
    <DOCKEY>57915</DOCKEY>
    <DOCNO>MY1115</DOCNO>
    <DOCDATE>20141201</DOCDATE>
  </ROW>
  <ROW>
    <DOCKEY>57915</DOCKEY>
    <DOCNO>MY1115</DOCNO>
    <DOCDATE>20141201</DOCDATE>
  </ROW>
  <ROW>
    <DOCKEY>57957</DOCKEY>
    <DOCNO>MY1160</DOCNO>
    <DOCDATE>20141201</DOCDATE>
  </ROW>
</ROWDATA>


But what I need is to create one file per row:



Filename: 57911.MY1113.xml
XML in file:



<ROWDATA>
<ROW DOCKEY="57911" DOCNO="MY1113" DOCDATE="20141201">
</ROW></ROWDATA>


Filename: 57913.MY1114.xml
XML in file:



<ROWDATA>
<ROW DOCKEY="57913" DOCNO="MY1114" DOCDATE="20141201">
</ROW></ROWDATA>


Does anyone know if there's a simple way of creating multiple XML
files as I described? Your feedback is highly appreciated.
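One simple way to do this is with Python's standard library XML module: parse the exported file once, then write one small document per ROW, turning the child elements into attributes as in the desired output. A minimal sketch (the inline SOURCE string abbreviates the exported file):

```python
# Split an exported <ROWDATA> file into one small XML file per <ROW>,
# converting child elements to attributes.  SOURCE abbreviates the export.
import xml.etree.ElementTree as ET

SOURCE = """<ROWDATA>
<ROW><DOCKEY>57911</DOCKEY><DOCNO>MY1113</DOCNO><DOCDATE>20141201</DOCDATE></ROW>
<ROW><DOCKEY>57913</DOCKEY><DOCNO>MY1114</DOCNO><DOCDATE>20141201</DOCDATE></ROW>
</ROWDATA>"""

def split_rows(xml_text):
    """Yield (filename, xml_string) pairs, one per ROW."""
    root = ET.fromstring(xml_text)
    for row in root.findall("ROW"):
        # child elements become attributes on a single <ROW>
        attrs = {child.tag: (child.text or "").strip() for child in row}
        out_root = ET.Element("ROWDATA")
        ET.SubElement(out_root, "ROW", attrs)
        filename = "%s.%s.xml" % (attrs["DOCKEY"], attrs["DOCNO"])
        yield filename, ET.tostring(out_root, encoding="unicode")

for name, doc in split_rows(SOURCE):
    print(name)
    # open(name, "w").write(doc)   # uncomment to actually write the files
```

In a real run you would read the text with `open("result01.xml").read()` instead of the inline string.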



Thank you very much.

Web Development

1) Author

public class Author {
    Integer id;
    String name;
}

2) Editor

public class Editor {
    Integer id;
    String name;
}

3) Book

import java.util.List;

public class Book {
    Integer id;
    String title;

    // A book may have several authors. Note that the order of the authors
    // is important, i.e. we want to be able to tell who's the first author,
    // who's the second author, and so on.
    List<Author> authors;

    // A book is edited by one editor.
    Editor editor;
}



Q-1)
What is the type of the relationship between authors and books?

Q-2)
What is the type of the relationship between editors and books?

Q-3)
How many tables are needed for this database?

Q-4)
Which of the following is the correct schema for the authors
table?

1-(id, name)

2-(id, name, book_id)

3-(id, name, editor_id)

4-(id, name, book_id, editor_id)
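One possible relational mapping these questions point at can be sketched as follows (table names and sample data are my own; SQLite is used purely for illustration): authors-to-books as many-to-many via a junction table with an ordering column, and books-to-editors as many-to-one via a foreign key:

```python
# Sketch of a relational mapping for Author/Editor/Book.  The junction
# table book_author preserves author order; names are assumptions.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE editor (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE book (
    id        INTEGER PRIMARY KEY,
    title     TEXT,
    editor_id INTEGER REFERENCES editor(id)   -- many books per editor
);
-- many-to-many link; position tells first author, second author, ...
CREATE TABLE book_author (
    book_id   INTEGER REFERENCES book(id),
    author_id INTEGER REFERENCES author(id),
    position  INTEGER,
    PRIMARY KEY (book_id, author_id)
);
""")
conn.executemany("INSERT INTO author VALUES (?, ?)",
                 [(1, "Alice"), (2, "Bob")])
conn.execute("INSERT INTO editor VALUES (1, 'Eve')")
conn.execute("INSERT INTO book VALUES (1, 'Databases 101', 1)")
conn.executemany("INSERT INTO book_author VALUES (?, ?, ?)",
                 [(1, 1, 1), (1, 2, 2)])
ordered = conn.execute("""SELECT a.name FROM author a
                          JOIN book_author ba ON ba.author_id = a.id
                          WHERE ba.book_id = 1
                          ORDER BY ba.position""").fetchall()
print([row[0] for row in ordered])   # prints ['Alice', 'Bob']
```

Note that in this mapping the authors table itself stays just (id, name); the book linkage lives entirely in the junction table.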

Web Development

