LOAD DATA INFILE
Syntax
LOAD DATA [LOW_PRIORITY | CONCURRENT] [LOCAL] INFILE 'file_name
'
[REPLACE | IGNORE]
INTO TABLEtbl_name
[CHARACTER SETcharset_name
]
[FIELDS
[TERMINATED BY 'string
']
[[OPTIONALLY] ENCLOSED BY 'char
']
[ESCAPED BY 'char
']
]
[LINES
[STARTING BY 'string
']
[TERMINATED BY 'string
']
]
[IGNOREnumber
LINES]
[(col_name_or_user_var
,...)]
[SETcol_name
=expr
,...]
The LOAD DATA INFILE
statement reads rows from a text file into a table at a very high speed. The filename must be given as a literal string.
LOAD DATA INFILE
is the complement of SELECT ... INTO OUTFILE
. To write data from a table to a file, use
SELECT ... INTO OUTFILE
. To read the file back into a table, use LOAD DATA INFILE
. The syntax of the FIELDS
and LINES
clauses is the same for both statements. Both clauses are optional, but FIELDS
must precede LINES
if both are specified.
ucs2
character set. As of MySQL 5.0.19, the character_set_filesystem
system variable controls the interpretation of the filename.
You can also load data files by using the mysqlimport utility; it operates by sending a LOAD DATA INFILE
statement to the server. The --local
option causes mysqlimport to read data files from the client host. You can specify the --compress
option to get better performance over slow networks if the client and server support the compressed protocol.
If you use LOW_PRIORITY
, execution of the LOAD DATA
statement is delayed until no other clients are reading from the table.
If you specify CONCURRENT
with a MyISAM
table that satisfies the condition for concurrent inserts (that is, it contains no free blocks in the middle), other threads can retrieve data from the table while LOAD DATA
is executing. Using this option affects the performance of LOAD DATA
a bit, even if no other thread is using the table at the same time.
The LOCAL
keyword, if specified, is interpreted with respect to the client end of the connection:
-
If
LOCAL
is specified, the file is read by the client program on the client host and sent to the server. The file can be given as a full pathname to specify its exact location. If given as a relative pathname, the name is interpreted relative to the directory in which the client program was started. -
If
LOCAL
is not specified, the file must be located on the server host and is read directly by the server. The server uses the following rules to locate the file:-
If the filename is an absolute pathname, the server uses it as given.
-
If the filename is a relative pathname with one or more leading components, the server searches for the file relative to the server's data directory.
-
If a filename with no leading components is given, the server looks for the file in the database directory of the default database.
-
Note that, in the non-LOCAL
case, these rules mean that a file named as ./myfile.txt
is read from the server's data directory, whereas the file named as myfile.txt
is read from the database directory of the default database. For example, if db1
is the default database, the following LOAD DATA
statement reads the file data.txt
from the database directory for db1
, even though the statement explicitly loads the file into a table in the db2
database:
LOAD DATA INFILE 'data.txt' INTO TABLE db2.my_table;
Windows pathnames are specified using forward slashes rather than backslashes. If you do use backslashes, you must double them.
For security reasons, when reading text files located on the server, the files must either reside in the database directory or be readable by all. Also, to use LOAD DATA INFILE
on server files, you must have the FILE
privilege.
Using LOCAL
is a bit slower than letting the server access the files directly, because the contents of the file must be sent over the connection by the client to the server. On the other hand, you do not need the FILE
privilege to load local files.
LOCAL
works only if your server and your client both have been enabled to allow it. For example, if mysqld was started with --local-infile=0
, LOCAL
does not work.
On Unix, if you need LOAD DATA
to read from a pipe, you can use the following technique (here we load the listing of the /
directory into a table):
mkfifo /mysql/db/x/x
chmod 666 /mysql/db/x/x
find / -ls > /mysql/db/x/x &
mysql -e "LOAD DATA INFILE 'x' INTO TABLE x" x
Note that you must run the command that generates the data to be loaded and the mysql commands either on separate terminals, or run the data generation process in the background (as shown in the preceding example). If you do not do this, the pipe will block until data is read by the mysql process.
The REPLACE
and IGNORE
keywords control handling of input rows that duplicate existing rows on unique key values:
-
If you specify
REPLACE
, input rows replace existing rows. In other words, rows that have the same value for a primary key or unique index as an existing row. -
If you specify
IGNORE
, input rows that duplicate an existing row on a unique key value are skipped. If you do not specify either option, the behavior depends on whether theLOCAL
keyword is specified. WithoutLOCAL
, an error occurs when a duplicate key value is found, and the rest of the text file is ignored. WithLOCAL
, the default behavior is the same as ifIGNORE
is specified; this is because the server has no way to stop transmission of the file in the middle of the operation.
If you want to ignore foreign key constraints during the load operation, you can issue a SET FOREIGN_KEY_CHECKS=0
statement before executing LOAD DATA
.
If you use LOAD DATA INFILE
on an empty MyISAM
table, all non-unique indexes are created in a separate batch (as for REPAIR TABLE
). Normally, this makes LOAD DATA INFILE
much faster when you have many indexes. In some extreme cases, you can create the indexes even faster by turning them off with ALTER TABLE ... DISABLE KEYS
before loading the file into the table and using ALTER TABLE ... ENABLE KEYS
to re-create the indexes after loading the file.
For both the LOAD DATA INFILE
and SELECT ... INTO OUTFILE
statements, the syntax of the FIELDS
and LINES
clauses is the same. Both clauses are optional, but FIELDS
must precede LINES
if both are specified.
If you specify a FIELDS
clause, each of its subclauses (TERMINATED BY
, [OPTIONALLY] ENCLOSED BY
, and ESCAPED BY
) is also optional, except that you must specify at least one of them.
If you specify no FIELDS
clause, the defaults are the same as if you had written this:
FIELDS TERMINATED BY '\t' ENCLOSED BY '' ESCAPED BY '\\'
If you specify no LINES
clause, the defaults are the same as if you had written this:
LINES TERMINATED BY '\n' STARTING BY ''
In other words, the defaults cause LOAD DATA INFILE
to act as follows when reading input:
-
Look for line boundaries at newlines.
-
Do not skip over any line prefix.
-
Break lines into fields at tabs.
-
Do not expect fields to be enclosed within any quoting characters.
-
Interpret occurrences of tab, newline, or ‘
\
’ preceded by ‘\
’ as literal characters that are part of field values.
Conversely, the defaults cause SELECT ... INTO OUTFILE
to act as follows when writing output:
-
Write tabs between fields.
-
Do not enclose fields within any quoting characters.
-
Use ‘
\
’ to escape instances of tab, newline, or ‘\
’ that occur within field values. -
Write newlines at the ends of lines.
Backslash is the MySQL escape character within strings, so to write FIELDS ESCAPED BY '\\'
, you must specify two backslashes for the value to be interpreted as a single backslash.
Note: If you have generated the text file on a Windows system, you might have to use LINES TERMINATED BY '\r\n'
to read the file properly, because Windows programs typically use two characters as a line terminator. Some programs, such as WordPad, might use \r
as a line terminator when writing files. To read such files, use LINES TERMINATED BY '\r'
.
If all the lines you want to read in have a common prefix that you want to ignore, you can use LINES STARTING BY '
to skip over the prefix, and anything before it. If a line does not include the prefix, the entire line is skipped. Suppose that you issue the following statement: prefix_string
'
LOAD DATA INFILE '/tmp/test.txt' INTO TABLE test
FIELDS TERMINATED BY ',' LINES STARTING BY 'xxx';
If the data file looks like this:
xxx"abc",1
something xxx"def",2
"ghi",3
The resulting rows will be ("abc",1)
and ("def",2)
. The third row in the file is skipped because it does not contain the prefix.
The IGNORE
option can be used to ignore lines at the start of the file. For example, you can use number
LINESIGNORE 1 LINES
to skip over an initial header line containing column names:
LOAD DATA INFILE '/tmp/test.txt' INTO TABLE test IGNORE 1 LINES;
When you use SELECT ... INTO OUTFILE
in tandem with LOAD DATA INFILE
to write data from a database into a file and then read the file back into the database later, the field- and line-handling options for both statements must match. Otherwise, LOAD DATA INFILE
will not interpret the contents of the file properly. Suppose that you use SELECT ... INTO OUTFILE
to write a file with fields delimited by commas:
SELECT * INTO OUTFILE 'data.txt'
FIELDS TERMINATED BY ','
FROM table2;
To read the comma-delimited file back in, the correct statement would be:
LOAD DATA INFILE 'data.txt' INTO TABLE table2
FIELDS TERMINATED BY ',';
If instead you tried to read in the file with the statement shown following, it wouldn't work because it instructs LOAD DATA INFILE
to look for tabs between fields:
LOAD DATA INFILE 'data.txt' INTO TABLE table2
FIELDS TERMINATED BY '\t';
The likely result is that each input line would be interpreted as a single field.
LOAD DATA INFILE
can be used to read files obtained from external sources. For example, many programs can export data in comma-separated values (CSV) format, such that lines have fields separated by commas and enclosed within double quotes. If lines in such a file are terminated by newlines, the statement shown here illustrates the field- and line-handling options you would use to load the file:
LOAD DATA INFILE 'data.txt' INTO TABLEtbl_name
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n';
Any of the field- or line-handling options can specify an empty string (''
). If not empty, the FIELDS [OPTIONALLY] ENCLOSED BY
and FIELDS ESCAPED BY
values must be a single character. The FIELDS TERMINATED BY
, LINES STARTING BY
, and LINES TERMINATED BY
values can be more than one character. For example, to write lines that are terminated by carriage return/linefeed pairs, or to read a file containing such lines, specify a LINES TERMINATED BY '\r\n'
clause.
To read a file containing jokes that are separated by lines consisting of %%
, you can do this
CREATE TABLE jokes
(a INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
joke TEXT NOT NULL);
LOAD DATA INFILE '/tmp/jokes.txt' INTO TABLE jokes
FIELDS TERMINATED BY ''
LINES TERMINATED BY '\n%%\n' (joke);
FIELDS [OPTIONALLY] ENCLOSED BY
controls quoting of fields. For output (SELECT ... INTO OUTFILE
), if you omit the word OPTIONALLY
, all fields are enclosed by the ENCLOSED BY
character. An example of such output (using a comma as the field delimiter) is shown here:
"1","a string","100.20"
"2","a string containing a , comma","102.20"
"3","a string containing a \" quote","102.20"
"4","a string containing a \", quote and comma","102.20"
If you specify OPTIONALLY
, the ENCLOSED BY
character is used only to enclose values from columns that have a string data type (such as CHAR
, BINARY
, TEXT
, or ENUM
):
1,"a string",100.20
2,"a string containing a , comma",102.20
3,"a string containing a \" quote",102.20
4,"a string containing a \", quote and comma",102.20
Note that occurrences of the ENCLOSED BY
character within a field value are escaped by prefixing them with the ESCAPED BY
character. Also note that if you specify an empty ESCAPED BY
value, it is possible to inadvertently generate output that cannot be read properly by LOAD DATA INFILE
. For example, the preceding output just shown would appear as follows if the escape character is empty. Observe that the second field in the fourth line contains a comma following the quote, which (erroneously) appears to terminate the field:
1,"a string",100.20
2,"a string containing a , comma",102.20
3,"a string containing a " quote",102.20
4,"a string containing a ", quote and comma",102.20
For input, the ENCLOSED BY
character, if present, is stripped from the ends of field values. (This is true regardless of whether OPTIONALLY
is specified; OPTIONALLY
has no effect on input interpretation.) Occurrences of the ENCLOSED BY
character preceded by the ESCAPED BY
character are interpreted as part of the current field value.
If the field begins with the ENCLOSED BY
character, instances of that character are recognized as terminating a field value only if followed by the field or line TERMINATED BY
sequence. To avoid ambiguity, occurrences of the ENCLOSED BY
character within a field value can be doubled and are interpreted as a single instance of the character. For example, if ENCLOSED BY '"'
is specified, quotes are handled as shown here:
"The ""BIG"" boss" -> The "BIG" boss
The "BIG" boss -> The "BIG" boss
The ""BIG"" boss -> The ""BIG"" boss
FIELDS ESCAPED BY
controls how to write or read special characters. If the FIELDS ESCAPED BY
character is not empty, it is used to prefix the following characters on output:
-
The
FIELDS ESCAPED BY
character -
The
FIELDS [OPTIONALLY] ENCLOSED BY
character -
The first character of the
FIELDS TERMINATED BY
andLINES TERMINATED BY
values -
ASCII
0
(what is actually written following the escape character is ASCII ‘0
’, not a zero-valued byte)
If the FIELDS ESCAPED BY
character is empty, no characters are escaped and NULL
is output as NULL
, not \N
. It is probably not a good idea to specify an empty escape character, particularly if field values in your data contain any of the characters in the list just given.
For input, if the FIELDS ESCAPED BY
character is not empty, occurrences of that character are stripped and the following character is taken literally as part of a field value. The exceptions are an escaped ‘0
’ or ‘N
’ (for example, \0
or \N
if the escape character is ‘\
’). These sequences are interpreted as ASCII NUL (a zero-valued byte) and NULL
. The rules for NULL
handling are described later in this section.
In certain cases, field- and line-handling options interact:
-
If
LINES TERMINATED BY
is an empty string andFIELDS TERMINATED BY
is non-empty, lines are also terminated withFIELDS TERMINATED BY
. -
If the
FIELDS TERMINATED BY
andFIELDS ENCLOSED BY
values are both empty (''
), a fixed-row (non-delimited) format is used. With fixed-row format, no delimiters are used between fields (but you can still have a line terminator). Instead, column values are read and written using a field width wide enough to hold all values in the field. ForTINYINT
,SMALLINT
,MEDIUMINT
,INT
, andBIGINT
, the field widths are 4, 6, 8, 11, and 20, respectively, no matter what the declared display width is.LINES TERMINATED BY
is still used to separate lines. If a line does not contain all fields, the rest of the columns are set to their default values. If you do not have a line terminator, you should set this to''
. In this case, the text file must contain all fields for each row.Fixed-row format also affects handling of
NULL
values, as described later. Note that fixed-size format does not work if you are using a multi-byte character set.Note: Before MySQL 5.0.6, fixed-row format used the display width of the column. For example,
INT(4)
was read or written using a field with a width of 4. However, if the column contained wider values, they were dumped to their full width, leading to the possibility of a “ragged” field holding values of different widths. Using a field wide enough to hold all values in the field prevents this problem. However, data files written before this change was made might not be reloaded correctly withLOAD DATA INFILE
for MySQL 5.0.6 and up. This change also affects data files read by mysqlimport and written by mysqldump --tab, which useLOAD DATA INFILE
andSELECT ... INTO OUTFILE
.
Handling of NULL
values varies according to the FIELDS
and LINES
options in use:
-
For the default
FIELDS
andLINES
values,NULL
is written as a field value of\N
for output, and a field value of\N
is read asNULL
for input (assuming that theESCAPED BY
character is ‘\
’). -
If
FIELDS ENCLOSED BY
is not empty, a field containing the literal wordNULL
as its value is read as aNULL
value. This differs from the wordNULL
enclosed withinFIELDS ENCLOSED BY
characters, which is read as the string'NULL'
. -
If
FIELDS ESCAPED BY
is empty,NULL
is written as the wordNULL
. -
With fixed-row format (which is used when
FIELDS TERMINATED BY
andFIELDS ENCLOSED BY
are both empty),NULL
is written as an empty string. Note that this causes bothNULL
values and empty strings in the table to be indistinguishable when written to the file because both are written as empty strings. If you need to be able to tell the two apart when reading the file back in, you should not use fixed-row format.
An attempt to load NULL
into a NOT NULL
column causes assignment of the implicit default value for the column's data type and a warning, or an error in strict SQL mode.
Some cases are not supported by LOAD DATA INFILE
:
-
Fixed-size rows (
FIELDS TERMINATED BY
andFIELDS ENCLOSED BY
both empty) andBLOB
orTEXT
columns. -
If you specify one separator that is the same as or a prefix of another,
LOAD DATA INFILE
cannot interpret the input properly. For example, the followingFIELDS
clause would cause problems:FIELDS TERMINATED BY '"' ENCLOSED BY '"'
-
If
FIELDS ESCAPED BY
is empty, a field value that contains an occurrence ofFIELDS ENCLOSED BY
orLINES TERMINATED BY
followed by theFIELDS TERMINATED BY
value causesLOAD DATA INFILE
to stop reading a field or line too early. This happens becauseLOAD DATA INFILE
cannot properly determine where the field or line value ends.
The following example loads all columns of the persondata
table:
LOAD DATA INFILE 'persondata.txt' INTO TABLE persondata;
By default, when no column list is provided at the end of the LOAD DATA INFILE
statement, input lines are expected to contain a field for each table column. If you want to load only some of a table's columns, specify a column list:
LOAD DATA INFILE 'persondata.txt' INTO TABLE persondata (col1,col2,...);
You must also specify a column list if the order of the fields in the input file differs from the order of the columns in the table. Otherwise, MySQL cannot tell how to match input fields with table columns.
Before MySQL 5.0.3, the column list must contain only names of columns in the table being loaded, and the SET
clause is not supported. As of MySQL 5.0.3, the column list can contain either column names or user variables. With user variables, the SET
clause enables you to perform transformations on their values before assigning the result to columns.
User variables in the SET
clause can be used in several ways. The following example uses the first input column directly for the value of t1.column1
, and assigns the second input column to a user variable that is subjected to a division operation before being used for the value of t1.column2
:
LOAD DATA INFILE 'file.txt'
INTO TABLE t1
(column1, @var1)
SET column2 = @var1/100;
The SET
clause can be used to supply values not derived from the input file. The following statement sets column3
to the current date and time:
LOAD DATA INFILE 'file.txt'
INTO TABLE t1
(column1, column2)
SET column3 = CURRENT_TIMESTAMP;
You can also discard an input value by assigning it to a user variable and not assigning the variable to a table column:
LOAD DATA INFILE 'file.txt'
INTO TABLE t1
(column1, @dummy, column2, @dummy, column3);
Use of the column/variable list and SET
clause is subject to the following restrictions:
-
Assignments in the
SET
clause should have only column names on the left hand side of assignment operators. -
You can use subqueries in the right hand side of
SET
assignments. A subquery that returns a value to be assigned to a column may be a scalar subquery only. Also, you cannot use a subquery to select from the table that is being loaded. -
Lines ignored by an
IGNORE
clause are not processed for the column/variable list orSET
clause. -
User variables cannot be used when loading data with fixed-row format because user variables do not have a display width.
When processing an input line, LOAD DATA
splits it into fields and uses the values according to the column/variable list and the SET
clause, if they are present. Then the resulting row is inserted into the table. If there are BEFORE INSERT
or AFTER INSERT
triggers for the table, they are activated before or after inserting the row, respectively.
If an input line has too many fields, the extra fields are ignored and the number of warnings is incremented.
An empty field value is interpreted differently than if the field value is missing:
-
For string types, the column is set to the empty string.
-
For numeric types, the column is set to
0
. -
For date and time types, the column is set to the appropriate “zero” value for the type.
These are the same values that result if you assign an empty string explicitly to a string, numeric, or date or time type explicitly in an INSERT
or UPDATE
statement.
TIMESTAMP
columns are set to the current date and time only if there is a NULL
value for the column (that is, \N
), or if the TIMESTAMP
column's default value is the current timestamp and it is omitted from the field list when a field list is specified.
LOAD DATA INFILE
regards all input as strings, so you cannot use numeric values for ENUM
or SET
columns the way you can with INSERT
statements. All ENUM
and SET
values must be specified as strings.
BIT
values cannot be loaded using binary notation (for example, b'011010'
). To work around this, specify the values as regular integers and use the SET
clause to convert them so that MySQL performs a numeric type conversion and loads them into the BIT
column properly:
shell>cat /tmp/bit_test.txt
2
127
shell>mysql test
mysql>LOAD DATA INFILE '/tmp/bit_test.txt'
->INTO TABLE bit_test (@var1) SET b= CAST(@var1 AS SIGNED);
Query OK, 2 rows affected (0.00 sec)
Records: 2 Deleted: 0 Skipped: 0 Warnings: 0
mysql>SELECT BIN(b+0) FROM bit_test;
+----------+
| bin(b+0) |
+----------+
| 10 |
| 1111111 |
+----------+
2 rows in set (0.00 sec)
When the LOAD DATA INFILE
statement finishes, it returns an information string in the following format:
Records: 1 Deleted: 0 Skipped: 0 Warnings: 0
No comments:
Post a Comment