Reasoning behind shifting over the text whem mismatch occurs in KMP algorithm?


I am not sure that you've got problems with understanding in only this point, so, if you don't mind, I'll just describe (with as much explanation as possible) the whole algorithm. The answer to your question is probably in the last paragraph, but you'd better read it all to understand my terminology better.

During the KMP algorithm, you are, actually, counting nearly the same values as in the table (this is usually called prefix function). So when you get to position i in the text, you need to count the maximum length of a substring in the text ending in position i which equals some prefix of the pattern. It is quite clear that if and only if the length of this substring is equal to length of the pattern, you have found the pattern in the text. So, how do you count this prefix function value fast? (I suppose that you count these for the pattern using some O(n^2) algorithm, which is not enough fast). Let's suppose that we've already did everything for the first i-1 symbols of the text and we are now working with position i. We will also need the prefix-function value for the previous symbol of the text: p[i-1].

Let's compare text[i] and pattern[p[i-1]] (indexation from 0, if you don't mind). We already know that pattern[0:p[i-1]-1] == text[i-1+p[i-1],i-1]: that's the definition of p[i-1]. So, if text[i] == pattern[p[i-1]], we now know that pattern[0:p[i-1]] == text[i-1+p[i-1], i]', and that's why p[i] = p[i - 1]. But the interesting part starts when text[i] != pattern[p[i-1]].

When these symbols are different, we start jumping. The reason for that is that we want to find the next possible prefix as fast as we can. So, how we do it. Just look at the picture here and follow the explanation (the yellow parts are substrings found for text[i-1]). We're trying to find some string s: s:=s1+text[i]. Because of the prefix-function definition, s1=s2, c=test[i]. But we already know (from finding the value for text[i-1]) that two yellow parts in the picture are the same, so s3 actually equals s1, and so s3=s1. So we can find the length of s1: it is table[p[i-1]]. So now if c1=text[i], we should stop: we've found p[i], it is s1.length + 1. And if c1!=text[i], we can just repeat the same jumping, looking now at the first table[table[p[i-1]]] symbols of the pattern, and so we go on until we find the answer, or we get to the first 0 symbols in this case p[i]:=0.



Reasoning behind shifting over the text whem mismatch occurs in KMP algorithm?

I am not sure that you've got problems with understanding in only this point, so, if you don't mind, I'll just describe (with as much explanation as possible) the whole algorithm. The answer to your question is probably in the last paragraph, but you'd better read it all to understand my terminology better.

During the KMP algorithm, you are, actually, counting nearly the same values as in the table (this is usually called prefix function). So when you get to position i in the text, you need to count the maximum length of a substring in the text ending in position i which equals some prefix of the pattern. It is quite clear that if and only if the length of this substring is equal to length of the pattern, you have found the pattern in the text. So, how do you count this prefix function value fast? (I suppose that you count these for the pattern using some O(n^2) algorithm, which is not enough fast). Let's suppose that we've already did everything for the first i-1 symbols of the text and we are now working with position i. We will also need the prefix-function value for the previous symbol of the text: p[i-1].

Let's compare text[i] and pattern[p[i-1]] (indexation from 0, if you don't mind). We already know that pattern[0:p[i-1]-1] == text[i-1+p[i-1],i-1]: that's the definition of p[i-1]. So, if text[i] == pattern[p[i-1]], we now know that pattern[0:p[i-1]] == text[i-1+p[i-1], i]', and that's why p[i] = p[i - 1]. But the interesting part starts when text[i] != pattern[p[i-1]].

When these symbols are different, we start jumping. The reason for that is that we want to find the next possible prefix as fast as we can. So, how we do it. Just look at the picture here and follow the explanation (the yellow parts are substrings found for text[i-1]). We're trying to find some string s: s:=s1+text[i]. Because of the prefix-function definition, s1=s2, c=test[i]. But we already know (from finding the value for text[i-1]) that two yellow parts in the picture are the same, so s3 actually equals s1, and so s3=s1. So we can find the length of s1: it is table[p[i-1]]. So now if c1=text[i], we should stop: we've found p[i], it is s1.length + 1. And if c1!=text[i], we can just repeat the same jumping, looking now at the first table[table[p[i-1]]] symbols of the pattern, and so we go on until we find the answer, or we get to the first 0 symbols in this case p[i]:=0.


KMP algorithm with one mismatch allowed in Java
public class KMPStringSearch {

    /**
     * Searches for all occurances of the word in the sentence. Runs
in O(n+k)
     * where n is the word length and k is the sentence length.
     *
     * @param word The word that is being searched
     * @param sentence The collections of word over which the search
happens.
     * @return The list of starting indices of the matched word in the
sentence.
     * Empty list in case of no match.
     */
    public List<Integer> searchString(final String word, final
String sentence) {
        final List<Integer> matchedIndices = new
ArrayList<>();

        final int sentenceLength = sentence.length();
        final int wordLength = word.length();
        int beginMatch = 0; // the starting position in sentence from
which the match started
        int idxWord = 0; // the index of the character of the word
that is being compared to a character in string
        final List<Integer> partialTable =
createPartialMatchTable(word);
        while (beginMatch + idxWord < sentenceLength) {
            if (word.charAt(idxWord) == sentence.charAt(beginMatch +
idxWord)) {
                // the characters have matched
                if (idxWord == wordLength - 1) {
                    // the word is complete. we have a match.
                    matchedIndices.add(beginMatch);
                    // restart the search
                    beginMatch = beginMatch + idxWord -
partialTable.get(idxWord);
                    if (partialTable.get(idxWord) > -1) {
                        idxWord = partialTable.get(idxWord);
                    } else {
                        idxWord = 0;
                    }
                } else {
                    idxWord++;
                }
            } else {
                // mismatch. restart the search.
                beginMatch = beginMatch + idxWord -
partialTable.get(idxWord);
                if (partialTable.get(idxWord) > -1) {
                    idxWord = partialTable.get(idxWord);
                } else {
                    idxWord = 0;
                }
            }
        }

        return Collections.unmodifiableList(matchedIndices);
    }

    /**
     * Creates the Partial Match Table for the word. Runs in O(n)
where n is the
     * length of the word.
     *
     * @param word The word whose Partial Match Table is required.
     * @return The table as a list of integers.
     */
    public List<Integer> createPartialMatchTable(final String
word) {
        if (StringUtils.isBlank(word)) {
            return Collections.EMPTY_LIST;
        }

        final int length = word.length();
        final List<Integer> partialTable = new
ArrayList<>(length + 1);
        partialTable.add(-1);
        partialTable.add(0);

        final char firstChar = word.charAt(0);
        for (int idx = 1; idx < word.length(); idx++) {
            final int prevVal = partialTable.get(idx);
            if (prevVal == 0) {
                if (word.charAt(idx) == firstChar) {
                    partialTable.add(1);
                } else {
                    partialTable.add(0);
                }
            } else if (word.charAt(idx) == word.charAt(prevVal)) {
                partialTable.add(prevVal + 1);
            } else {
                partialTable.add(0);
            }
        }

        return Collections.unmodifiableList(partialTable);
    }
}

What character-shifting / pseudo-encryption algorithm is used here?

is anybody familiar with this sort of obfuscation scheme in the Windows world?

Once you understand it correctly, it's just a trivial rotation cipher like ROT13.

Why would anyone use this?

Well, in general, this is very common. Let's say you have some data that you need to obfuscate. But the decryption algorithm and key have to be embedded in software that the viewers have. There's no point using something fancy like AES, because someone can always just dig the algorithm and key out of your code instead of cracking AES. An encryption scheme that's even marginally harder to crack than finding the hidden key is just as good as a perfect encryption scheme—that is, good enough to deter casual viewers, and useless against serious attackers. (Often you aren't even really worried about stopping attacks, but about proving after the fact that your attacker must have acted in bad faith for contractual/legal reasons.) So, you use either a simple rotation cipher, or a simple xor cipher—it's fast, it's hard to get wrong and easy to debug, and if worst comes to worst you can even decrypt it manually to recover corrupted data.

As for the particulars:

If you want to handle non-ASCII characters, you pretty much have to use Unicode. If you used some fixed 8-bit charset, or the local system's OEM charset, you wouldn't be able to handle passwords from other machines.

A Python script would almost certainly handle Unicode characters, because in Python you either deal in bytes in a str, or Unicode characters in a unicode. But a Windows C or .NET app would be much more likely to use UTF-16, because Windows native APIs deal in UTF-16-LE code points in a WCHAR * (aka a string of 16-bit words).

So, why 4142? Well, it really doesn't matter what the key is. I'm guessing some programmer suggested 42. His manager then said "That doesn't sound very secure." He sighed and said, "I already explained why no key is going to be any more secure than… you know what, forget it, what about 4142?" The manager said, "Ooh, that sounds like a really secure number!" So that's why 4142.


If it's not a library function, can you think of a better method to de-obfuscate these values without resorting to the magic 142 number.

You do need to resort to the magic 4142, but you can make this a lot simpler:

def decrypt(block):
    return struct.pack('>H', (4142 - int(block, 10)) % 65536)

So, each block of 5 characters is the decimal representation of a UTF-16 code unit, subtracted from 4142, using C unsigned-short wraparound rules.

This would be trivial to implement in native Windows C, but it's slightly harder in Python. The best transformation function I can come up with is:

def decrypt_block(block):
    return struct.pack('>H', (4142 - int(block, 10)) % 65536)

def decrypt(pwd):
    blocks = [pwd[i:i+5] for i in range(0, len(pwd), 5)] 
    return ''.join(map(decrypt_block, blocks)).decode('utf-16-be')

This would be a lot more trivial in C or C#, which is probably what they implemented things in, so let me explain what I'm doing.

You already know how to transform the string into a sequence of 5-character blocks.

My int(block, 10) is doing the same thing as your int(block.lstrip('0')), making sure that a '0' prefix doesn't make Python treat it as an octal numeral instead of decimal, but more explicitly. I don't think this is actually necessary in Jython 2.2 (it definitely isn't in more modern Python/Jython), but I left it just in case.

Next, in C, you'd just do unsigned short x = 4142U - y;, which would automatically underflow appropriately. Python doesn't have unsigned short values, just signed int, so we have to do the underflow manually. (Because Python uses floored division and remainder, the sign is always the same as the divisor—this wouldn't be true in C, at least not C99 and most platforms' C89.)

Then, in C, we'd just cast the unsigned short to a 16-bit "wide character"; Python doesn't have any way to do that, so we have to use struct.pack. (Note that I'm converting it to big-endian, because I think that makes this easier to debug; in C you'd convert to native-endian, and since this is Windows, that would be little-endian.)

So, now we've got a sequence of 2-character UTF-16-BE code points. I just join them into one big string, then decode it as UTF-16-BE.


If you really want to test that I've got this right, you'll need to find characters that aren't just non-ASCII, but non-Western. In particular, you need:

  • A character that's > U+4142 but < U+10000. Most CJK ideographs, like U+7000 (瀀), fit the bill. This should appear as '41006', because that's 4142-0x7000 rolled over as an unsigned short.
  • A character that's >= U+10000. This includes uncommon CJK characters, specialized mathematical characters, characters from ancient scripts, etc. For example, the Old Italic character U+10300 (
While concatenating two columns an output mismatch occurs

Concatenating strings, where one of your strings are NULL will always yield NULL. For example: SELECT NULL + 'Moo' will result in NULL - always. ISNULL() accepts 2 arguments and will return the second when the first resolves to NULL - BOL

Do you mean to do this?

SELECT   [Badge Id]         = v.PassNo
        ,VisitorName        = VName
        ,[Name(Department)] = ISNULL(e.eName, '') + ISNULL(' (' +
d.dtName + ')', '')
        ,v.EntryTime 
FROM Visitorlogo_tbl        V
JOIN DepartmentMaster_tbl   D ON v.Deptid=d.dtId
LEFT 
JOIN EmployeeMaster_tbl     E ON v.empid=e.eId
WHERE v.EntryTime >= '2014-06-29'

Parallel algorithm that does a small insertion/shifting

One possible approach.

Due to the ambiguity of your shifting (0, 1, 0, 1, 0, 1, 1, 1 and 0, 1, 0 ,0 all produce the same data offset pattern, for example) it's not possible to just create a prefix sum of the shift pattern to produce the relative offset at each position. An observation we can make, however, is that a valid offset pattern will be created if each zero in the shift pattern gets replaced by the first non-zero shift value to its left:

0, 1, 0, 0   (shift pattern)
0, 1, 1, 1   (offset pattern)

or

0, 2, 0, 2   (shift pattern)
0, 2, 2, 2   (offset pattern)

So how to do this? Let's assume we have the second test case shift pattern:

      0, 1, 0, 0, 2, 0, 0, 0

Our desired offset pattern would be:

      0, 1, 1, 1, 2, 2, 2, 2
  1. for a given shift pattern, create a binary value, where each bit is one if the value at the corresponding index into the shift pattern is zero, and zero otherwise. We can use a warp vote instruction, called __ballot() for this. Each lane will get the same value from the ballot:

      1  0  1  1  0  1  1  1  (this is a single binary 8-bit
    value in this case)
    
  2. Each warp lane will now take this value, and add a value to it which has a 1 bit at the warp lane position. Using lane 1 for the remainder of the example:

    + 0  0  0  0  0  0  1  0  (the only 1 bit in this value
    will be at the lane index)
    = 1  0  1  1  1  0  0  1
    
  3. We now take the result of step 2, and bitwise exclusive-OR with the result from step 1:

    = 0  0  0  0  1  1  1  0
    
  4. We now count the number of 1 bits in this value (there is a __popc() intrinsic for this), and subtract one from the result. So for the lane 1 example above, the result of this step would be 2, since there are 3 bits set. This gives use the distance to the first value to our left that is non-zero in the original shift pattern. So for the lane 1 example, the first non-zero value to the left of lane 1 is 2 lanes higher, i.e. lane 3.

  5. For each lane, we use the result of step 4 to grab the appropriate offset value for that lane. We can process all lanes at once using a __shfl_down() warp shuffle instruction.

      0, 1, 1, 1, 2, 2, 2, 2
    

    Thus producing our desired "offset pattern".

Once we have the desired offset pattern, the process of having each warp lane use its offset value to appropriately shift its data item is straightforward.

Here is a fully worked example, using your 3 test cases. Steps 1-4 above are contained in the __device__ function mydelta. The remainder of the kernel is performing the step 5 shuffle, appropriately indexing into the data, and copying the data. Due to the usage of the warp shuffle instructions, we must compile this for a cc3.0 or higher GPU. (However, it would not be difficult to replace the warp shuffle instructions with other indexing code that would allow operation on cc2.0 or greater devices.) Also, due to the various intrinsics used, this function cannot work for more than 32 data items, but that was a prerequisite condition stated in your question.

$ cat t475.cu
#include <stdio.h>
#define DSIZE 8

#define cudaCheckErrors(msg) 
    do { 
        cudaError_t __err = cudaGetLastError(); 
        if (__err != cudaSuccess) { 
            fprintf(stderr, "Fatal error: %s (%s at %s:%d)
", 
                msg, cudaGetErrorString(__err), 
                __FILE__, __LINE__); 
            fprintf(stderr, "*** FAILED - ABORTING
"); 
            exit(1); 
        } 
    } while (0)


__device__ int mydelta(const int shift){
  unsigned nz = __ballot(shift == 0);
  unsigned mylane = (threadIdx.x & 31);
  unsigned lanebit = 1<<mylane;
  unsigned temp = nz + lanebit;
  temp = nz ^ temp;
  unsigned delta = __popc(temp);
  return delta-1;
}
__global__ void mykernel(const int *data, const unsigned *shift, int
*result, const int limit){ // limit <= 32
  if (threadIdx.x < limit){
    unsigned lshift = shift[(limit - 1) - threadIdx.x];
    unsigned delta = mydelta(lshift);
    unsigned myshift = __shfl_down(lshift, delta);
    myshift = __shfl(myshift, ((limit -1) - threadIdx.x)); // reverse
offset pattern
    result[threadIdx.x] = 0;
    if ((myshift + threadIdx.x) < limit)
    result[threadIdx.x + myshift] = data[threadIdx.x];
  }
}

int main(){
  int A[DSIZE]         = {3, 6, 7, 8, 1, 2, 3, 5};
  unsigned tc1B[DSIZE] = {0, 1, 0, 0, 0, 0, 0, 0};
  unsigned tc2B[DSIZE] = {0, 1, 0, 0, 2, 0, 0, 0};
  unsigned tc3B[DSIZE] = {0, 1, 0, 0, 1, 0, 0, 0};

  int *d_data, *d_result, *h_result;
  unsigned *d_shift;
  h_result = (int *)malloc(DSIZE*sizeof(int));
  if (h_result == NULL) { printf("malloc fail
"); return 1;}
  cudaMalloc(&d_data, DSIZE*sizeof(int));
  cudaMalloc(&d_shift, DSIZE*sizeof(unsigned));
  cudaMalloc(&d_result, DSIZE*sizeof(int));
  cudaCheckErrors("cudaMalloc fail");
  cudaMemcpy(d_data, A, DSIZE*sizeof(int), cudaMemcpyHostToDevice);
  cudaMemcpy(d_shift, tc1B, DSIZE*sizeof(unsigned),
cudaMemcpyHostToDevice);
  cudaCheckErrors("cudaMempcyH2D fail");
  mykernel<<<1,32>>>(d_data, d_shift, d_result,
DSIZE);
  cudaDeviceSynchronize();
  cudaCheckErrors("kernel fail");
  cudaMemcpy(h_result, d_result, DSIZE*sizeof(int),
cudaMemcpyDeviceToHost);
  cudaCheckErrors("cudaMempcyD2H fail");
  printf("index: ");
  for (int i = 0; i < DSIZE; i++)
    printf("%d, ", i);
  printf("
A:     ");
  for (int i = 0; i < DSIZE; i++)
    printf("%d, ", A[i]);
  printf("
tc1 B: ");
  for (int i = 0; i < DSIZE; i++)
    printf("%d, ", tc1B[i]);
  printf("
tc1 C: ");
  for (int i = 0; i < DSIZE; i++)
    printf("%d, ", h_result[i]);
  cudaMemcpy(d_shift, tc2B, DSIZE*sizeof(unsigned),
cudaMemcpyHostToDevice);
  cudaCheckErrors("cudaMempcyH2D fail");
  mykernel<<<1,32>>>(d_data, d_shift, d_result,
DSIZE);
  cudaDeviceSynchronize();
  cudaCheckErrors("kernel fail");
  cudaMemcpy(h_result, d_result, DSIZE*sizeof(int),
cudaMemcpyDeviceToHost);
  cudaCheckErrors("cudaMempcyD2H fail");
  printf("
tc2 B: ");
  for (int i = 0; i < DSIZE; i++)
    printf("%d, ", tc2B[i]);
  printf("
tc2 C: ");
  for (int i = 0; i < DSIZE; i++)
    printf("%d, ", h_result[i]);
  cudaMemcpy(d_shift, tc3B, DSIZE*sizeof(unsigned),
cudaMemcpyHostToDevice);
  cudaCheckErrors("cudaMempcyH2D fail");
  mykernel<<<1,32>>>(d_data, d_shift, d_result,
DSIZE);
  cudaDeviceSynchronize();
  cudaCheckErrors("kernel fail");
  cudaMemcpy(h_result, d_result, DSIZE*sizeof(int),
cudaMemcpyDeviceToHost);
  cudaCheckErrors("cudaMempcyD2H fail");
  printf("
tc3 B: ");
  for (int i = 0; i < DSIZE; i++)
    printf("%d, ", tc3B[i]);
  printf("
tc2 C: ");
  for (int i = 0; i < DSIZE; i++)
    printf("%d, ", h_result[i]);
  printf("
");
  return 0;
}
$ nvcc -arch=sm_35 -o t475 t475.cu
$ ./t475
index: 0, 1, 2, 3, 4, 5, 6, 7,
A:     3, 6, 7, 8, 1, 2, 3, 5,
tc1 B: 0, 1, 0, 0, 0, 0, 0, 0,
tc1 C: 3, 0, 6, 7, 8, 1, 2, 3,
tc2 B: 0, 1, 0, 0, 2, 0, 0, 0,
tc2 C: 3, 0, 6, 7, 8, 0, 1, 2,
tc3 B: 0, 1, 0, 0, 1, 0, 0, 0,
tc2 C: 3, 0, 6, 7, 8, 1, 2, 3,
$

Finding "decent" numbers algorithm reasoning?

OK, the thinking goes something like this.

  1. Once you've worked out how many 5s and how many 3s you want, you should front-load the 5s. The ordering makes no odds to whether the number is decent; but it'll be bigger if the 5s are at the front.
  2. The number of 3s you want should be the smallest number that satisfies the constraints, because then you'll have more 5s, making for a bigger number.
  3. The number of 3s must be divisible by 5. That means that there's never any point having more than 10 3s: you should only consider 0, 5 or 10 3s. This is because you want the smallest number that leaves the number of remaining digits divisible by 3, to satisfy the other constraint. If having 15 3s works, then so does having 0 3s; if 20 works, then so does 5; if 25 works, then so does 10. In general, subtracting 15 from the number of 3s will leave the constraints both still satisfied if they were before.
  4. The number of 5s must therefore be n-0 or n-5 or n-10, and we want the one that gives a number that's divisible by 3. The calculation c = 5*(2*n%3) will give you 0 3s and therefore n 5s if n was already divisible by 3; and 10 3s and therefore n-10 5s if n was one greater than a multiple of 3, in which case n-10 is still divisible by 3; and so on.
  5. The only thing to test is whether the calculation of c 3s and n-c 5s satisfies the implicit constraint that n-c should be non-negative. If it's negative then there's no solution; if it's non-negative then this is a valid solution and front-loading the 5s will give you the largest such solution.

This is one of quite a wide class of "programming" problems where the test isn't really to see whether you can bash out some code that does the job, but to see whether you can reduce the problem logically to the point where it's trivial and can be solved very efficiently.



- Technology - Languages
+ Webmasters
+ Development
+ Development Tools
+ Internet
+ Mobile Programming
+ Linux
+ Unix
+ Apple
+ Ubuntu
+ Mobile & Tablets
+ Databases
+ Android
+ Network & Servers
+ Operating Systems
+ Coding
+ Design Software
+ Web Development
+ Game Development
+ Access
+ Excel
+ Web Design
+ Web Hosting
+ Web Site Reviews
+ Domain Name
+ Information Security
+ Software
+ Computers
+ Electronics
+ Hardware
+ Windows
+ PHP
+ ASP/ASP.Net
+ C/C++/C#
+ VB/VB.Net
+ JAVA
+ Javascript
+ Programming
Privacy Policy - Copyrights Notice - Feedback - Report Violation 2018 © BigHow